Director of Technology Paul something comes out, surprise, wearing sneakers with a sport coat and jeans.
Three companies for this case study: Amazon Store, Zoox, and Prime Video.
Prime: 200 million+ users, 9 billion items delivered same day or next day, Prime Day is our Super Bowl. Prepare with 26+ service teams, 10 weeks of status reports, 45+ scaling strategies deployed.
On Prime Day, over 40% is powered by Graviton.
ElastiCache serves 1.5 quadrillion daily requests, over a trillion requests a minute.
EBS (Elastic Block Store): 20.3 trillion IO operations a day.
DynamoDB: <10 ms responses, CloudFront: a bajillion.
Outposts: 524 million commands to 7k robots, 8 million commands/hour, up to 160% vs last year.
Rufus uses a lot of the AWS stack: load balancing, VPC (Virtual Private Cloud), etc. It uses Bedrock to pre-generate questions when someone goes to a product page.
How to scale this for Prime Day:
Our core belief: everything should be amplified by AI (not replaced).
Amazon Store SVP comes up and starts talking about AI some more:
Developers spend enormous amounts of time:
AI can help with ALL of this.
Amazon Store internally created Spec Studio: an agentic system that dynamically documents a library's capabilities. Codebase > Spec Studio > makes a spec > give to Kiro > new code. The spec acts like its own SLA (Service Level Agreement).
It has gone viral in Amazon. 107% month over month adoption, 15.4k specs generated.
This can take an existing codebase and services and move them to an AI-native development approach.
4.5x improvement in developer velocity; 75%+ of Store's developer teams expected to use this approach next year.
AI is enabling Amazon to do more for its customers. Agents are substantially increasing our productivity. Developers are rapidly embracing AI native development.
Amazon is reinventing this process: intelligence layer (Bedrock AgentCore), AI-native IDE (Kiro), data foundation (Amazon QuickSight), ALL powered by AWS.
Amazon.com guy concludes with: "Thank you. Let's go build." BARF (so sick of that cliche statement).
Next up Zoox, those self driving car things. Millions of computations to make it work.
What powers a Zoox? A billion sensors. It has to evaluate everything in real time. The AI stack was built from the ground up, since it didn't exist to begin with. Models run continuously on tens of thousands of GPUs. A continuous feedback loop: data from the road improves the models and makes it safer.
S3 (Simple Storage Service) is the source of truth: source of truth for sensor data, highly scalable, reliable, multi tier and cost efficient storage.
Amazon EMR (Elastic MapReduce) and Athena too.
GPUs are the bottleneck here, and they're expensive.
They use Slurm for scheduling (priority-based scheduling, etc.). Minimize transfer costs and reduce latency by using the right region. GPU capacity: you need timely GPU access.
We use SageMaker HyperPod to train our models efficiently. It uses EFA (Elastic Fabric Adapter), optimized for high-throughput, low-latency communication, auto-recovers from failures, and provides rich GPU observability.
EC2 Capacity Blocks help us procure state-of-the-art training-optimized GPUs for a short period of time to handle spiky ML workloads. Efficiency: they give access to the latest GPU types, like the new P6 instances, allowing us to pivot to newer GPUs as they become available.
AWS is able to help us provide on demand reserved capacity for this large number of GPUs, which we need only for a short duration of time. Resourcing GPUs is a delicate balance between cost efficiency and performance, but with the elasticity of AWS, we can optimize for both.
Then some Prime Insights guy talks about NASCAR for like 30 minutes: how they stream sports and use a bunch of AWS stuff for it. Hundreds of gigabits of traffic on AWS Direct Connect.
Then a short video of Allen Iverson (the original AI) talking about how it's cool that Amazon is using AI.
BUILDING BLOCKS, we use that a lot around here. The services, the primitives that we build, are rarely used directly by end consumers. They're used by builders who build products on top of AWS.
500 trillion objects in S3 (Simple Storage Service), 200 million requests per second worldwide.
Fundamentals: security, availability, elasticity, performance, durability. These are necessary to make the tools easy to use.
We've rewritten the entire data path of S3, from start to finish, in Rust. We have to rewrite stuff all the time to match scale; 80% of it is quite innovative work.
e.g. last year we added conditional writes. These help prevent race conditions for customers writing to S3 from many places, without forcing them to add extra complexity to their systems. Roughly: "if this write would overwrite something other than what I expect, do not write it."
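A quick sketch of what conditional writes look like from boto3 (bucket, keys, and the back-off logic are made up; the If-None-Match/If-Match parameters are the documented ones):

```python
import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")

# Create-if-absent: fail instead of clobbering a concurrent writer's object.
try:
    s3.put_object(
        Bucket="my-bucket",
        Key="jobs/leader-lock.json",
        Body=b'{"owner": "worker-42"}',
        IfNoneMatch="*",  # only succeed if no object exists at this key yet
    )
except ClientError as e:
    if e.response["Error"]["Code"] == "PreconditionFailed":
        print("someone else wrote it first; back off")
    else:
        raise

# Compare-and-swap style update: only overwrite the version we last read.
head = s3.head_object(Bucket="my-bucket", Key="state/config.json")
s3.put_object(
    Bucket="my-bucket",
    Key="state/config.json",
    Body=b"new contents",
    IfMatch=head["ETag"],  # fails with 412 if the object changed underneath us
)
```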
Other innovations in last year: object size limit increase (today, to 50TB). Conditional COPY and DELETE; object rename in Express One Zone; batch operations on prefixes.
S3 is built on hard drives. JBOD (Just a Bunch Of Disks) model: there's a server sitting underneath a bunch of hard drives. We push to get more efficiency out of our data centers, so we put larger hard drives in our shells, drive the density per server higher, and optimize cost and efficiency. Put a JBOD in a rack, put that rack in a row, put those rows into a data center.
New metric: bytes per square foot. What? An area metric of storage capacity.
Failed project: BARGE. Single racks of 6PB (petabytes) of capacity.
As we moved to these densely packed servers, we got greater density and improved economics, but also availability exposure and poor flexibility. The scale was getting unwieldy. So: move to something more flexible?
We pulled in folks from EC2 (Elastic Compute Cloud) and EBS (Elastic Block Store) teams, and we completely reinvented the storage construct. We built: METAL VOLUMES.
Nitro cards in there, and striped disks, securely virtualized. So they're running on diskless EC2 servers now, not physical ones. Gives a greater degree of availability: remap the disk to an alternate server when a server fails.
Better flexibility: can scale up, etc.
Challenge: as data is deleted, you get holes. We still have to defragment the disks in the background. Read, scrub, free up space, etc.
With metal volumes, that meant moving data over the network. But we work with Nitro now: use Nitro offloads, extend the NVMe (Non-Volatile Memory Express) space and use it.
We're early here. 53EB (exabytes) of capacity on metal volumes. Metal volumes consume 10% less power than traditional storage rack designs.
S3 Express progress in 2025: price reductions (up to 85% price reductions for Express One Zone storage class). Up to 2M requests per second.
Meta puts their stuff in Amazon S3 Express One Zone: 140 Tbps sustained data transfer to S3 Express One Zone, 1M+ transactions per second, 60PB of storage.
S3 vectors: first cloud object store with native support to store and query vectors. 250k+ vector indexes created, 40B+ vectors ingested, 1B+ queries performed.
Who cares though? Why use vectors? The way we work is changing. Finding the right data, knowing where to step in, choosing the right data to answer a question: that's where vectors step in. You can't label everything yourself, but you can take your data, turn it into a vector representation (just a list of floating point numbers), put that vector in a space, then find your results. Vectors are also used in radiology, fraud detection, etc. Vectors are emerging from being a niche tool.
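To make the "list of floating point numbers" bit concrete, here's a toy nearest-neighbor search in plain numpy; in real life the embeddings would come out of a model like Titan and have hundreds of dimensions, and the 3-dim vectors and filenames here are invented:

```python
import numpy as np

# Each document is embedded as a vector; "search" = find the closest vector.
docs = {
    "invoice-2024.pdf":   np.array([0.9, 0.1, 0.0]),
    "xray-left-knee.png": np.array([0.1, 0.8, 0.2]),
    "fraud-report.txt":   np.array([0.0, 0.2, 0.9]),
}
query = np.array([0.05, 0.15, 0.95])  # embedding of "suspicious transaction"

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

ranked = sorted(docs, key=lambda name: cosine(docs[name], query), reverse=True)
print(ranked[0])  # -> fraud-report.txt, the semantically closest document
```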
New! S3 vectors GA (Generally Available). Billion scale vector store offering up to 90% lower costs for uploading, storing, and querying vectors.
Designed for 100 ms warm-query latency.
Up to 2B vectors stored & queried per index, 40x the preview capacity.
You can upload up to 1,000 vectors/second when streaming single vector updates, up to 100 search results/query. 10,000 indexes per bucket, supporting up to 20 trillion vectors.
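Rough shape of the API, as I understand it from the launch materials (the s3vectors client name is real; the exact field names here are my best reconstruction, and the bucket/index names are placeholders):

```python
import boto3

s3v = boto3.client("s3vectors")

# Ingest: vectors are keyed, carry float32 data, and can have filterable metadata.
s3v.put_vectors(
    vectorBucketName="my-vector-bucket",
    indexName="product-embeddings",
    vectors=[{
        "key": "sku-1234",
        "data": {"float32": [0.12, -0.53, 0.07]},  # real embeddings are much longer
        "metadata": {"category": "shoes"},
    }],
)

# Query: bring your own query embedding, get back the nearest keys.
resp = s3v.query_vectors(
    vectorBucketName="my-vector-bucket",
    indexName="product-embeddings",
    queryVector={"float32": [0.10, -0.50, 0.09]},
    topK=5,
    returnMetadata=True,
)
for match in resp["vectors"]:
    print(match["key"], match.get("metadata"))
```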
Treat vectors like a network of trees. Finding nearby vectors means hopping through memory a lot, and that doesn't work great when your data is entirely in S3 and you have hard-drive-style latencies: many round trips. To get elasticity, we had to approach this as an S3 problem. So we take these neighborhoods, encode them as objects in a vector bucket, and then S3 promotes them to memory, leaving that data resident long enough for you to handle queries; eventually it goes back to using the storage-based vectors.
So: traditional vector DB semantics on a throughput-oriented backend.
BMW does it too: petabyte-scale hybrid search for quality insights. Hybrid semantic and SQL search built on S3 vectors, Amazon Bedrock Titan embeddings, and Amazon Athena.
20PB of structured and unstructured data discoverable through natural language. Millions of records queried in Athena.
"We migrated our 85 petabyte data lake to S3 tables, we streamline the infrastructure and reduce costs."
NEW! Replication support for S3 tables. Replicate Iceberg tables across AWS Regions and accounts.
NEW! Intelligent tiering for S3 tables. Automatically optimize costs based on access patterns.
NEW! Supporting Apache Iceberg V3. Available with Apache Spark on EMR (Elastic MapReduce), AWS Glue, Athena, S3 tables, SageMaker notebooks, etc.
There's enormous value in your data, and opportunities to do stuff with it, but you have to curate it. You end up with metadata layers over the data you have, and 70% of the S3 customer conversations I have are with someone working on exactly this, wanting to remove the undifferentiated heavy lifting of building those metadata layers and get a metadata API structure they can trust other tools to use. That's why we launched S3 Metadata; we continue to mature it, and now it presents a full inventory view of your data.
The managed table of metadata is read-only; it's high integrity.
NEW! S3 access points for FSx. Like ONTAP.
Networking Metrics: Available from EC2 (Elastic Compute Cloud), Transit Gateway, CloudWAN, Direct Connect, PrivateLink, and other networking services.
Various log sources provide detailed network activity information:
Three relatively new monitoring tools:
AWS Network Manager Infrastructure Performance: Provides infrastructure-level performance insights.
Metrics and logs for storage services:
Tools for analyzing and visualizing network data:
Automation tools for responding to network issues:
VPC-Specific Tools:
System-Level Tools: At the system level, you can use:
CloudWatch Dashboard: uses both low-level and high-level probe tools for networking, based on your traffic. Depending on where users are connecting from, Internet Monitor can recommend regions to replicate into for better performance and lower latency, and may suggest CloudFront instead of serving straight out of us-west-1 for content delivery.
ALB telemetry can show you response codes in the logs, such as 403 errors, indicating whether WAF (Web Application Firewall) blocked something. When you see these indicators, you can move over to WAF telemetry to see what happened and why the request was blocked.
This covers the frontend. Now let's look at the backend. ELB (Elastic Load Balancer) connects to Transit Gateway. Is the gateway working?
Transit Gateway Telemetry:
Flow logs could be too much data if you just want to quickly look at whether resources can connect. Use Reachability Analyzer instead. It will evaluate security groups, network rules, and tell you if connections work or are broken, without requiring you to parse through extensive log files.
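This is driveable from boto3 too; a minimal sketch (instance IDs are placeholders, and the analysis is async, so you'd normally poll until Status leaves "running"):

```python
import boto3

ec2 = boto3.client("ec2")

# Define the path you care about: can A reach B on TCP/443?
path = ec2.create_network_insights_path(
    Source="i-0123456789abcdef0",
    Destination="i-0fedcba9876543210",
    Protocol="tcp",
    DestinationPort=443,
)["NetworkInsightsPath"]

analysis = ec2.start_network_insights_analysis(
    NetworkInsightsPathId=path["NetworkInsightsPathId"]
)["NetworkInsightsAnalysis"]

# Fetch the verdict instead of grepping flow logs.
result = ec2.describe_network_insights_analyses(
    NetworkInsightsAnalysisIds=[analysis["NetworkInsightsAnalysisId"]]
)["NetworkInsightsAnalyses"][0]
print(result["Status"], result.get("NetworkPathFound"))
```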
You can also look at Network Flow Monitor, which allows you to install agents onto your target instances and then perform analysis of the traffic. It can identify any timeouts and analyzes TCP (Transmission Control Protocol) traffic at the IP (Internet Protocol) level, providing detailed insights into network performance and connectivity issues.
Exciting time: code smarter, build faster, AI transforms how we build, rapid ideas to implementation (warning this will probably just be a long commercial for Kiro).
AI as a collaborative partner, gets smarter from every interaction, transforms how we build software.
Innovator flywheel: design, implement, testing and integration, deployment, maintenance, planning & analyses.
Traditional editors: developer drives and provides. AI editors: the developer steers the AI agent to author and review code.
Blurring SDLC (Software Development Life Cycle) phases, like planning and delivery. Now you can quickly get prototypes for someone before even any code is written. Now you take someone's designs and tell it to your friendly AI agent and let it do the work. AI-assisted iteration.
Enter Kiro! Bring structure to AI coding with specs. Automate tasks with agent hooks. Built from the ground up for working with agents. Agent hooks automate routine tasks in the background.
Property based testing: measure whether your code matches the behavior defined in your specs. Can generate hundreds of thousands of random test cases to test your code.
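Not Kiro's internals (no idea what those look like), but the idea is the same as classic property-based testing; a sketch with Python's hypothesis library, with the function and bounds invented:

```python
from hypothesis import given, strategies as st

def apply_discount(price_cents: int, percent: int) -> int:
    return price_cents - (price_cents * percent) // 100

# Instead of a handful of hand-picked cases, assert a property over
# thousands of generated inputs: a discount never goes negative or up.
@given(st.integers(min_value=0, max_value=10**9),
       st.integers(min_value=0, max_value=100))
def test_discount_never_negative_or_increasing(price, pct):
    discounted = apply_discount(price, pct)
    assert 0 <= discounted <= price

test_discount_never_negative_or_increasing()  # hypothesis drives the cases
```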
Prompt to code to deployment in your terminal: automate workflows in seconds. Analyze errors and trace bugs with precision.
Unleash custom agents: build task-specific agents optimized for your best practices through pre-defined tool permissions, context, and prompts.
Shows demo of having it write the app for you, put the stuff in S3 (Simple Storage Service), showing a screenshot of an image that needed to be cropped and the agent responding very fast to all of the requests.
Job satisfaction goes up because "they all had fun doing it."
Move from tasks to goal driven direction; scale out concurrent AI tasks to increase velocity; extend agentic AI assistance to every aspect of software delivery.
Kiro powers: empower Kiro agents with specialized expertise; on-demand specialization of agents with dynamic context; curated best practices from partners and experts; full-stack dev to deployment use cases.
New class of frontier agents: Kiro autonomous agent, AWS Security Agent, and AWS DevOps Agent. Massively scalable, multiple concurrent tasks, work independently, sometimes for hours or days without intervention.
The frontier development agent that extends your flow: works autonomously, maintains context, executes across repos.
You can tag Kiro on a GitHub task, and Kiro will respond and start working on it automatically. It'll analyze the repo, validate the plan, explore the project, map the frontend/backend, look at handlers, API (Application Programming Interface) configs, and data models, and align its approach with the way the system was built. The agent is not session based, so it will not forget.
Future vision: human + agents = 10x faster; multi-agent systems; conductor, not just player.
We will be with you through this journey, exciting future, achieve together. Start building with Kiro. 1k Kiro credits to get started, startups can apply for up to one year's worth of Kiro Pro+ tier.
AI and other advancements are changing how products are built, and how we defend them; the traditional approach is no longer good enough.
Watch out for stuff like "prompt injection" (a prompt buried inside content). AI agents are acting more autonomously, so bad guys are planning for this in their threats.
Now we have 99% faster log retrieval; we need to move faster.
We need to change where and how we're carrying out our work so we can scale.
We need to be faster adapting to change.
We need to work side by side with the business we support to stay grounded.
Don't approach security as one size fits all - your security team will own and develop tooling.
At AWS internally we create and invest in: security primitives (so don't have to reinvent), helping across the product lifecycle, and scalable ways for our team to build (we use internal tooling and external tooling).
Making encryption easier: an open-source implementation of the TLS (Transport Layer Security) protocol. We looked at OpenSSL and noticed it was 500k lines of code; how do we know that's secure? So we rewrote it, eliminated a bunch of extra features we didn't need, and took it down to 6k lines, then released it to the community in a way we knew was secure and could understand. That was in 2015. We called it s2n-tls.
Then in 2022 we did s2n-quic, providing support for the QUIC protocol, and also support for post-quantum key exchange.
Simplifying security testing: when a change is made to an API, an internal system (AI?) analyzes the changes to that API and creates a bunch of tests covering the entire landscape of that API, both edge and normal cases, so the builder doesn't have to do all that fiddly work every time.
What do we already know? Assessment preparation, initial compliance assessments. It used to be pretty manual ("vocally self-critical," an Amazon phrase).
Our internal Active Defense tools: Blackfoot (Network Address Translation at scale), MadPot (sensor system and automated response capabilities), Mithra (massive neural-network graph model that evaluates domain reputations), Sonaris (behavioral analysis of network traffic). At Amazon you think you know what scale means; then a month in, you're like, I didn't know what that word meant.
Blackfoot translates 312 trillion flows a day, MadPot finds 550M malicious activities a day, Mithra flags over 200k malicious domains a day, and Sonaris blocks almost 5 billion scans a day.
These guide not only our internal protections; we can provide additional protection to you through AWS Shield, Amazon Route 53 Resolver DNS Firewall, AWS WAF (Web Application Firewall), AWS Network Firewall, GuardDuty, and Amazon Inspector.
AI is helping us achieve our work faster, and that expands the ways we can scale to protect our customers. Gen AI is helping us do the things we were already doing better, and that's how you scale to meet the demands of this new phase we're edging into.
Denied party screening: does this transaction adhere to global sanctions requirements? Every day we answer this question over 2 billion times. 96% overall accuracy, outperforming the previous approach for 60% of our volume. Less figuring out what it can do, more leveraging it at business scale.
Deeply understanding how teams build lets us weave security expertise into the way that they already operate, scaling right alongside them.
Intentions: try harder next time, be more careful, communicate better, remember to do it, pay closer attention. Mechanisms: automate the alert, bake it into the deployment pipeline, trigger an automatic rollback, add a guardrail in the code, enforce it through policy as code. e.g. the lines on a freeway are intentions but the guardrails are mechanisms.
AWS Security Intentions: block public access (implement block public access controls for all AWS services using resource based policies), IAM (Identity and Access Management) integration (Use Daffodil library for IAM authorization in API services), MCP AuthN (all MCP servers must implement client authentication on every incoming request), MCP Logging (all MCP servers that offer write APIs must log caller information for auditing).
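As an illustration of intention-vs-mechanism (not AWS's internal tooling): the "block public access" intention becomes a mechanism once a script or pipeline enforces it. The bucket iteration below is a sketch; the APIs are standard boto3:

```python
import boto3

s3 = boto3.client("s3")

# Mechanism, not intention: enforce S3 Block Public Access on every bucket
# instead of relying on everyone remembering to be careful.
for bucket in s3.list_buckets()["Buckets"]:
    s3.put_public_access_block(
        Bucket=bucket["Name"],
        PublicAccessBlockConfiguration={
            "BlockPublicAcls": True,
            "IgnorePublicAcls": True,
            "BlockPublicPolicy": True,
            "RestrictPublicBuckets": True,
        },
    )
    print(f"locked down {bucket['Name']}")
```

Run that in the deployment pipeline (or as policy as code) and the intention can't quietly drift.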
Yesterday at the keynote we introduced frontier agents: autonomous agents that can work for a long time without requiring human intervention (in Kiro). Three kinds: Kiro Autonomous agent, AWS DevOps Agent, and AWS Security Agent (custom and managed security requirements, automated security design reviews, continuous security code reviews, on-demand penetration testing) - turning intentions into mechanisms.
So the specs it generates (like my design review) become mechanisms. Block public access: non-compliant. Authentication best practices: compliant. Trusted cryptography best practice: insufficient data. In a finding, it will tell you why something is or isn't compliant, plus remediation guidance.
Daffodil is internal to us: a formally verified IAM library that helps other teams integrate IAM correctly; it is mathematically proven correct. But the proof doesn't do them any good if they don't know about Daffodil. You can have the best things to use, but if you don't know they exist, how will you use them? A design review alone would catch this gap.
Now for you: your security intentions. Session limits (all user sessions must timeout after 30 minutes of inactivity), TLS minimums (all external communications must use TLS 1.3 or higher), third party libraries (third party libraries must be from your company's approved library list), Centralized AuthN (all authentication must use your company's centralized authentication service).
Go to AWS Security Agent today, and turn your intentions into your security mechanisms.
You'll need security to be adaptive to the changing realities both outside and inside of your organization.
Check out the blog post: Amazon disrupts watering hole campaign by Russia's APT29. The actor had compromised legit websites and injected JS (JavaScript) that redirected about 10% of visitors to actor-controlled domains (like a mock Cloudflare verification page) built around a Microsoft code authentication flow. Our analysis of the code revealed evasion techniques: using randomization to only redirect 10% of visitors, employing obfuscation to hide the malicious code, and setting cookies to avoid repeated redirects. No compromises of AWS systems happened here, but we saw this was going on, and we need to protect the security of the internet. We worked to isolate the affected instances, disrupt the operation, and share relevant info with our partners. Why care? This is just one example of the increasing evolution and scale of attacks that we will all have to figure out how to defend against together.
Another example: Amazon Inspector detects over 150k malicious packages linked to a token-farming campaign. The packages harvested credentials and used them to self-propagate. Detection is more challenging when the adversary is using legit credentials and doing normal stuff, so we collaborated with security researchers and deployed a new detection rule, paired with AI, to identify suspicious package patterns in the NPM (Node Package Manager) registry. We used indicators like circular dependencies, or the absence of a particular config file, to get a better idea of when a package was suspicious. We identified over 150k linked to one particular campaign, and worked with OpenSSF (Open Source Security Foundation) to coordinate our response. This will continue. Teams need to be adaptive.
What's happening with development right now? Builders are further from the code, so testing is critical. Gen-AI-augmented code development is another example of distance being inserted between the dev and the code running in prod. We've gone from low-level languages to IDEs (Integrated Development Environments), then test-driven development, each step increasing distance; gen AI will do that even more. This is a real opportunity for security teams. Today, if your test is a little off but a human is involved, human judgment can catch when something is going off the rails. In the future, if the tests are off and the human is not near it, things can go wrong from a security perspective. The devs we support will need the tests to be robust. This could let us automate more fixes for builders.
What are we doing to support this shift? Joint security and builder experimentation; iterate automation; adapt and scale together.
Success comes when an agent does one thing well. A collection of specific agents outperforms something more general; then you assemble them into something larger and more magical. If an agent does one thing well, it is easier to reason about securing the agent itself (on-call agent, reliability agent), and therefore you know what permissions it needs (least privilege, etc.).
Are you measuring the right things? Not the number of findings, or what kind, but: how many of them have you fixed? How many can you fix in under 3 minutes? What's your P50 time? How long has the longest tail been open, and what do we do about that? How many builder actions are involved in fixing this stuff? Come up with better ways to automate and scale your practice. You need to measure the right things to stay agile and adapt, even at scale and through fast change. Building together with your business is key to scaling successfully.
Automatically identifying security opportunities. We operate at a massive scale and we need to break our workflow down to manageable chunks.
Our work is tracked through tickets. These tickets get into a follow up queue so we can help builders when we need to, and this happens for all AWS services.
Growing increase in CVE (Common Vulnerabilities and Exposures) publications: went from 40,703 in 2024 up 22% to 49,500 in 2025. A single CVE might impact tens of millions of assets. Need to understand and act quickly.
We're training an AI based assessment engine that evaluates all the CVEs that will assess details like instance types, parameters, and help us identify false positives and evaluate risk across millions of assets.
Security tooling and the security ratchet. At Amazon we're basically an API vendor, so we need to know the security of every API. Back in the day we'd take a static approach: look at the API and what it's trying to do, and refine security from there. But now we're automating helpers that can give us a richer picture of systems' overall context and how they connect to each other.
What is your business trying to achieve? Where can the overall system improve? Deeply understanding these two points is a critical first step. Ask questions end to end and notice when things around you are changing. Cost to build, cost to secure, cost to deploy. Gen AI is changing the first and last of those. Now the middle one is finally catching up.
What needs to be different as we move into an agentic world?
Supporting agents in production: securely execute and scale agent code, remember past interactions + learning, identity and access controls for all agents and tools, agentic tool use for executing complex workflows, discover and connect with custom tools and resources, understand and audit every interaction.
Runtime > Memory > identity > gateway > code interpreter > browser tool > observability/telemetry.
That's why we built AgentCore.
Working hand in hand with the business you support is crucial. You can't defend what you can't see.
Example Compliance checklist for supply chain: SOC 2 Type II certified, annual pen testing, ISO 27001 compliant, NIST (National Institute of Standards and Technology) framework aligned, MEA, encryption at rest and transit, incident response plan, regular security training, vulnerability scanning, access controls documented > all clear!
Security can feel like a wet blanket (well kid you're gonna shoot your eye out). AI can help though….
Work backwards from the business outcomes and reduce developer friction. Security is part of what any business delivers to its customers. It's not easy, but three things that can make it simpler: embedding expertise, keeping focus on the end risk and adapting, and maintaining a culture of building with your business.
Security teams need to be builders too.
Measure the right things to stay agile and adaptable.
And yes, AI can help!
EBS Volumes: Amazon Elastic Block Store (EBS) provides scalable, high-performance block storage volumes that can be attached to Amazon EC2 (Elastic Compute Cloud) instances. EBS Snapshots offer incremental point-in-time copies of volumes, enabling efficient backups and disaster recovery.
Data Services: EBS includes elastic volumes (allowing dynamic resizing), provisioned rate for volume initialization, and time-based snapshot copy capabilities.
Different workloads require different storage characteristics:
io2 Volumes: Best for relational databases requiring very low latency, high IOPS, and medium throughput. Features include:
gp3 Volumes: General Purpose SSD volumes designed for single-digit millisecond latencies 99% of the time. In practice, they often outperform this specification. Best practice: if you don't know which volume to use, start with gp3. You can provision workloads strategically—put journaling on io2 and other data on gp3.
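What that looks like when provisioning, as a sketch (AZ, size, and the IOPS/throughput numbers are placeholders above the gp3 baselines):

```python
import boto3

ec2 = boto3.client("ec2")

vol = ec2.create_volume(
    AvailabilityZone="us-east-1a",
    Size=500,           # GiB
    VolumeType="gp3",
    Iops=6000,          # gp3 baseline is 3,000 IOPS; provision more if needed
    Throughput=500,     # MiB/s; gp3 baseline is 125
    TagSpecifications=[{
        "ResourceType": "volume",
        "Tags": [{"Key": "role", "Value": "db-data"}],
    }],
)
print(vol["VolumeId"], vol["State"])
```

Nice thing about gp3: IOPS and throughput are dialed independently of size, so you can start small and modify the volume later.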
Latency Comparison (Air Traffic Controller Analogy):
Fully Managed Fault Injection: AWS Fault Injection Service is a fully managed service for running fault injection experiments. It's easy to get started, simulates real-world conditions, and includes safeguards. It's part of the resilience lifecycle: test and evaluate.
EBS Volume Fault Injection: FIS can simulate stalled I/O and high I/O latency on EBS volumes to:
Four Pre-Configured Scenarios:
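For the stalled-I/O case specifically, a hedged sketch of an FIS experiment template (the aws:ebs:pause-volume-io action id is the documented one; the ARNs and role are placeholders):

```python
import boto3

fis = boto3.client("fis")

template = fis.create_experiment_template(
    description="Stall I/O on one db volume for 2 minutes",
    roleArn="arn:aws:iam::123456789012:role/fis-role",
    stopConditions=[{"source": "none"}],  # production use wants a real alarm here
    targets={
        "dbVolume": {
            "resourceType": "aws:ec2:ebs-volume",
            "resourceArns": ["arn:aws:ec2:us-east-1:123456789012:volume/vol-0abc"],
            "selectionMode": "ALL",
        }
    },
    actions={
        "pauseIo": {
            "actionId": "aws:ebs:pause-volume-io",
            "targets": {"Volumes": "dbVolume"},
            "parameters": {"duration": "PT2M"},  # ISO 8601 duration
        }
    },
)
fis.start_experiment(experimentTemplateId=template["experimentTemplate"]["id"])
```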
Instance: m8g.4xlarge (16 vCPU Graviton4, 64GB memory)
Storage Configuration:
Queue Architecture: EBS has multiple queues throughout the stack. When your application submits an I/O request:
Torn Write Prevention: Databases use double-write buffers and extra logging to protect against torn writes (partial writes during power failures). Typical block devices only guarantee sector-sized torn write protection. With appropriate file system configuration, Nitro EC2 instances offer torn write protection across larger I/O operations, often allowing reduction or removal of double-write buffers.
Status Checks:
EC2 Auto Recovery: Automatically mitigates issues on your behalf. If EBS fails, you can fail over to a replica or recover in a different Availability Zone (AZ).
New Metrics: volumeAvgThroughput and volumeAvgIOPS help you right-size your EC2 instance and EBS volumes.
Storage:
Instance: m8g.8xlarge (if more headroom needed, use 12xlarge)
Burst Instances: Give you extra performance when needed, but performance can drop off. You must be aware of these limits.
New Metrics - Instance Limit Status: Quickly identify performance issues related to EBS-optimized limits:
Describe Instance API: Provides comprehensive information about CPUs, network, memory metrics, and different CPU configurations (important for licensing considerations).
Updating Instances: Cannot change number of CPUs or memory while instance is running. Must stop and restart (treat like a long reboot). In Auto Scaling, just update the launch template.
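The stop-modify-start dance, sketched in boto3 (instance ID and target type are placeholders):

```python
import boto3

ec2 = boto3.client("ec2")
iid = "i-0123456789abcdef0"

# Treat it like a long reboot: stop, change the instance type, start again.
ec2.stop_instances(InstanceIds=[iid])
ec2.get_waiter("instance_stopped").wait(InstanceIds=[iid])

ec2.modify_instance_attribute(
    InstanceId=iid,
    InstanceType={"Value": "m8g.8xlarge"},
)

ec2.start_instances(InstanceIds=[iid])
```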
2008: Only Standard volumes available - 100 IOPS, clunky hard drives, no Quality of Service (QoS), instance performance was overshared.
2012: Provisioned IOPS volumes and optimized instances introduced.
2017: Nitro system ensures you always get the IOPS you need.
Now (2025): Up to 100 Gbps, 400,000 IOPS on r6in instances.
Specifications:
Before Nitro: Software-based queue stack with many queues (networking has queues, storage has queues).
Nitro Card as DMA Engine: Direct Memory Access (DMA) engine that also handles encryption. As I/O data is pulled from the system, Nitro encrypts the payload. Another DMA engine puts it on the network. Data bounces through the Nitro card quickly. On the AWS side, same process: dequeue, if it's a read, check local SSD; if it's a write, handle replication. Caching occurs (reads cached more than writes, data not persistent). Then data populates back to your instance. Nitro is efficient with hardware offloads. The network is where AWS can take liberties because they own that infrastructure.
TCP Limitations: Single path through network, requires strict ordering, usually involves the Operating System (OS).
AWS Scalable Reliable Datagram (SRD): Multi-path through network, retries in microseconds, application-level ordering, runs on Nitro hardware. AWS built SRD because TCP did more than they needed. By putting more logic in higher-level applications (like database replication), applications have more context about what needs to be transferred.
SRD Benefits:
R8gb Innovation: With R8gb instances, AWS can finally do what they've been planning: only one DMA engine pulls data from the instance, encrypts it, and sends it to the storage server. On the EBS storage server, AWS can now steer requests directly to the CPU responsible for handling your volume data. CPUs on both the instance side and the server side are optimized.
TiKV Cluster Instances: m8g.4xlarge
Cumulative Metrics: Total I/O operations, bytes, and time spent (microseconds) since volume attachment.
Performance Exceeded Metrics: Total time (microseconds) that either volume or instance performance exceeded provisioned performance since attachment.
I/O Latency Histograms: Show total number of I/O operations completed within each bin since volume attachment. Periodically poll to get ongoing statistics. Can be published to Grafana or other endpoints. Can run side-by-side with iostat and compare (they may differ, suggesting you need to tune your application). Can show demarcation points like network drop-off.
Size Increase: Immediate size availability, but must resize filesystem to utilize new space.
IOPS/Throughput Increase: during the optimizing phase for a performance increase, performance will sit somewhere between the original and new values.
Latency Impact: Latency may be impacted during optimizing phase. As optimization occurs, different blocks get sorted around.
Little's Law: L = λ × W
Concurrency: Useful measure of capacity (tells us limits) and also a measure of contention. If concurrency is high, so is contention. Air traffic controllers and runways are good examples of capacity.
Queueing in Storage: If Queue Depth (QD) = 1, latency will dictate IOPS you can achieve.
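Worked example of both points, with made-up but realistic numbers:

```python
# Little's Law: L = lambda * W  (in-flight I/Os = throughput * latency)
iops = 4000          # lambda: completed I/Os per second
latency_s = 0.0005   # W: 0.5 ms average latency
in_flight = iops * latency_s
print(in_flight)     # L = 2.0 -> queue depth ~2 sustains 4000 IOPS at 0.5 ms

# Flip it around: at QD=1, latency caps your IOPS.
qd = 1
max_iops = qd / latency_s
print(max_iops)      # 2000.0 -> you cannot exceed 2000 IOPS at QD=1 and 0.5 ms
```

So a workload that never queues more than one I/O can't saturate a volume provisioned for more, no matter what you pay for.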
Plan: Identify your Key Performance Indicators (KPIs) and select instance and volumes to fit your workload requirements.
Monitor: Use Amazon CloudWatch and other monitoring tools to track performance metrics, identify bottlenecks, and right-size your infrastructure.
Key Takeaways:
What does AI innovation mean for the cloud? We've been delivering what it needs for a while. Security, availability, elasticity, cost, agility.
Blathers on and on about why AI requires big computers, and AI is sweet and booming and bla bla bla "infrastructure is more important than ever"
We always wanted bare metal performance, but there was a "virtualization tax." So we put Jitter in the shitter and developed the AWS Nitro System. Server: customer instances and hypervisor; controller: networking, storage, management, security, and monitoring.
Nitro and Graviton made it into the textbook "Computer Architecture: A Quantitative Approach." A quote: "AWS Nitro and Graviton chips demonstrate how custom architectures, grounded in first principles and measured…" ok, slide finished.
Nitro v6, Graviton4, Trainium3, our latest hardware innovations.
Because we build both the processors and server and OS (Operating System), we can optimize across the full stack.
Traditional cooling approach: heat sink > TIM (Thermal Interface Material) > lid > TIM > silicon.
Heat transfer is straightforward physics: every layer in the thermal path slows heat movement, so more resistance leads to higher junction temperatures; higher temperatures increase leakage, and higher leakage increases power consumption. This is where inefficiency can really build up. Traditional CPUs use this design because they must support many systems and many configurations. But since we control the entire system for Graviton, we can think differently.
Graviton: heat sink > TIM > silicon. We remove the lid and a layer of TIM, which reduces resistance and lets heat move more efficiently. With precision manufacturing and carefully selected materials, our fan power drops by 33%.
Virtuous cycle of silicon development: develop silicon > expand workloads > find bottlenecks > improve design.
Core > cache. Core needs data, it checks the cache. When it's not available, must go all the way out to main memory.
Core > L1 cache > L2 cache > L3 cache > memory.
We love big caches. The more data you can keep close to the core, the fewer slow memory trips you have to take.
Graviton 4: 30% better performance.
L2 instruction misses per thousand instructions improved.
NEW! Graviton5: our most efficient CPU ever. 2x number of cores, 5.3x L3 capacity.
NEW! Amazon EC2 M9g instances: 25% higher performance compared to M8g instances, best price performance in EC2 today.
What if a developer could just hand their code to AWS and have it run?
Serverless is the absence of server management.
AI workflow: prompt > tokenization > prefill > decode > detokenization > response.
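A toy of that pipeline (nothing AWS-specific; the 4-word vocab and the fake next-token rule are obviously invented), just to show where prefill ends and decode begins:

```python
vocab = {0: "<eos>", 1: "hello", 2: "world", 3: "!"}

def tokenize(text):              # stand-in for a real tokenizer
    inv = {w: i for i, w in vocab.items()}
    return [inv[w] for w in text.split() if w in inv]

def model_step(context):         # stand-in for one forward pass
    return (context[-1] + 1) % len(vocab)  # fake "predict next token"

prompt_ids = tokenize("hello world")   # tokenization
context = list(prompt_ids)             # prefill: whole prompt in one pass
generated = []
while True:                            # decode: one token per step
    nxt = model_step(context)
    if vocab[nxt] == "<eos>":
        break
    generated.append(nxt)
    context.append(nxt)

print(" ".join(vocab[t] for t in generated))  # detokenization -> "!"
```

Prefill is one big parallel pass over the prompt; decode is the serial, token-at-a-time loop, which is why TTFT and TPS get tracked as separate numbers later on.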
Requests > latency tiering (priority, standard, flex) > fairness (customer 1, 2, 3, 4) > GPU Cluster.
Journal: Submitted > queued > dispatched > in progress > completed (or failed).
Secure AI: secure model weights, zero operator access, cryptographic attestation for models.
Back to vectors: knowledge is everywhere. Text, document, image, video, audio. You can't search vectors across different embedding models, because those abstract dimensions that we talked about likely hold entirely different concepts. So this is the challenge we set out to handle with:
Nova Multimodal Embeddings: industry's first unified embedding models. Unified understanding of data.
For example, Amazon OpenSearch is now vector driven. Don't have to choose between a traditional keyword or semantic search: hybrid search gives you the keyword based precision, and semantic search allows you to get even better results.
With S3 vectors, we reduce the cost of uploading, storing, and querying vectors by up to 90%: store billions to trillions of vectors without infrastructure setup or provisioning.
We're storing them in S3 to be economical, not in memory; so how do we reduce latency for these vectors? We use vector neighborhoods: a place where a bunch of related clusters are cached. When a user executes a query, a much smaller search finds the nearby neighborhoods, which are loaded from S3 into fast memory, where an approximate-nearest-neighbor algorithm is applied; it can be done quickly. Result: 100ms query latency, up to 2B vectors per index in production.
>250k vector indexes, 40B vectors ingested, 1B queries so far.
90% of data today is unstructured: most of that comes from video. Video is also incurably complex: one million hours = 114 years.
S3 vectors turns every S3 bucket into a potential video search engine.
Trainium 3: TTFT (Time to First Token), great performance; TPS (Tokens Per Second).
EC2 TRN3 ultra servers: up to 144 Trainium 3 chips. Comes with neuron switches. Extremely low latencies. 360 PFLOPS (FP8), 20TB HBM (High Bandwidth Memory) capacity, 700 TB/s bandwidth. 4.4x higher compute performance, 3.9x more memory bandwidth.
This is the first system using all three AWS custom designed chips in the same server board. 4x Trainium 3, 1x Graviton (on the same sled as the training, so heat management easier), and 2x Nitro. Top replaceable design.
Microarchitectural optimizations: microscaling, accelerated softmax instructions, memory addwrite, tensor dereference, traffic shaping, background transpose, MMCAST mode, mesh hashspray.
It's not just the best hardware: you need the right tools to build.
AI Hardware Kernels: extracting every bit of performance.
NEW! Neuron Kernel Interface. Direct access to the AWS Trainium NeuronCore instruction set architecture. We're also open sourcing all aspects of the NKI (Neuron Kernel Interface) stack. Nothing worse than a compiler surprising you.
Neuron profiler: the industry's leading hardware native, AI chip profiler.
TRN3 ultra servers: 5x tokens per megawatt.
New! Trainium has PyTorch native support.
287 billion Amazon EC2 instances. 1.5B instances launched every week; 38 regions globally; 1,000+ instance types.
For example: Zoox. Autonomous vehicles collect petabytes of driving intelligence data across various urban environments, processing data through advanced ML (Machine Learning) models. Uses AWS ML capacity blocks, on demand capacity reservations, Amazon SageMaker HyperPod.
OR: Amazon Leo. 3000 satellites, 17,000 mph, using Graviton4, FPGA (Field Programmable Gate Array).
Intel has been there since the beginning of AWS. We helped Mercedes: a 32TB system, 40k users, 17 different systems; we solved it with a U7inh instance: performance, reliability, security. Not a simple migration; it took 9 months.
Intel Xeon 6: custom processors, unique to AWS. Most powerful Intel processors everywhere.
What does this mean for you? AWS + Intel: 15% better price performance, 2.5x faster data processing, 20% higher performance for workloads. Like C8i, up to 60% faster NGINX performance. R8i, industry leading SAP performance. M8i, up to 30% acceleration for PostgreSQL database.
Flex family: whether compute optimized, general purpose, or memory optimized workloads, you can save an additional 5% on your compute costs. M8i flex, C8i flex, R8i flex.
NEW! X8i instances. 6 TB of memory and 1.5x memory capacity versus prior generation instances. Great for in memory databases, mission critical workloads. Also C8ine instances, based off Nitro v6. 2.5x higher packet performance, 2x higher network bandwidth. Ideal for security appliances, firewalls.
AWS + AMD: another breakthrough.
NEW! C8a instances. 33% higher memory bandwidth and 30% higher performance than prior generation instances. Up to 384GB of memory, 75 Gbps of network bandwidth, 60 Gbps of EBS (Elastic Block Store) bandwidth. Good for batch processing, distributed analytics, high performance computing, multiplayer gaming.
NEW! X8aedz instances. Next gen memory optimized instances, maximum frequency is 5 GHz. 2x higher compute performance, 31% better price performance.
NEW! HPC8a instances. Designed for compute-intensive high performance workloads. Good for computational fluid dynamics, weather forecasting, drug discovery apps.
NEW! M8azn instance. Next gen general purpose instances. 2x higher compute performance, 2x more memory bandwidth, 5x faster packet processing. Ideal for compute sensitive workloads, like gaming, financial services, high frequency trading, simulation modeling.
For e-commerce platforms using these, more concurrent shoppers during peak hours = more revenue. Gaming companies = more concurrent players, no lag means more player retention and in game purchases.
AWS + Apple: new, EC2 M3 Ultra Mac instances, and EC2 M4 Max Mac instances.
AWS + Nvidia, 15 years and counting. AWS is the best place to run Nvidia GPUs. Highest reliability, uptime, availability, scalability for GPU based systems. Like the P6e (GB200 NVL72), and P6 (B200, B300).
AL2023 (Amazon Linux 2023): 12% improvement in instance-launch-to-SSH-ready times. Streamlined random number generation services; boot process improvements. Like the c6g.medium: 9.1 seconds down to 8.05 seconds.
NEW! Amazon EBS Time-Based Snapshot Copy. You can now copy EBS snapshots within or between AWS regions and accounts with a predictable completion time. Up to 350% improvement, like adding a 1TB terminal node.
Have EventBridge let you know when the copy completes. With a guaranteed completion duration, you can address strict RPO (Recovery Point Objective) requirements and improve disaster recovery runbooks and testing cycles with confidence, knowing exactly when your snapshot will be available in the new region or account.
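Sketch of the call, made in the destination region (the snapshot ID is a placeholder, and CompletionDurationMinutes is the parameter name as I recall it from the launch docs):

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-west-2")  # destination region

copy = ec2.copy_snapshot(
    SourceRegion="us-east-1",
    SourceSnapshotId="snap-0123456789abcdef0",
    CompletionDurationMinutes=30,  # the copy must finish within 30 minutes
    Description="DR copy with a deadline, for the RPO runbook",
)
print(copy["SnapshotId"])
```

Pair it with the EventBridge completion event and the runbook step becomes fully deterministic.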
NEW! Amazon EBS provisioned rate for volume initialization. Create fully performant EBS volumes within a predictable timeframe by specifying initialization rates between 100 MiB/s and 300 MiB/s - up to 60% improvement.
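Presumably surfaced on CreateVolume when restoring from a snapshot; a hedged sketch where the VolumeInitializationRate parameter name is my assumption from the announcement and the IDs are placeholders:

```python
import boto3

ec2 = boto3.client("ec2")

vol = ec2.create_volume(
    AvailabilityZone="us-east-1a",
    SnapshotId="snap-0123456789abcdef0",
    VolumeType="gp3",
    VolumeInitializationRate=300,  # MiB/s; 100-300 per the session (assumed name)
)
print(vol["VolumeId"])
```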
NEW! EBS Status Health Check: Detect potential issues and report them in the form of a health check.
NEW! Capacity Reservation Topology API. Determine the placement of your reserved capacity before launching instances.
"There's no compression algorithm for experience." Cringey quote from Andy Jassy that reeks of r/Im14andthisisdeep.
Nitro fundamentally changed how we deliver cloud computing. Innovate faster; reduce costs; enhance security.
Graviton: 40% better price performance, 60% less energy, 20% cost savings.
Migrate your workload to Graviton! It only takes one week, one developer, one workload.
NEW! NitroTPM (Trusted Platform Module) + attestable AMI (Amazon Machine Image). Verify trusted software on EC2 instances. Only trusted software running on your EC2 instances.
TRN3 servers: 144 chips; 362 FP8 PFLOPS; 706 TB/s aggregate bandwidth.
Your power of choice in compute abstractions: serverless, HPC (High Performance Computing), containers, AI, quantum. If there's a good idea, there's at least 12 ways to do it in AWS. But realistically, there's only one way: the way you choose.
NEW! Lambda tenant isolation. Full security isolation; dramatically simpler operations; scale with confidence. Lambda will make sure you get the hard security boundary around your function, so stay in the abstraction you want to be in, all you need is a tenant ID. Isolation achieved.
NEW (but announced yesterday): Lambda durable functions. Simplify developer experience; strengthen application resilience; enhance operational efficiency. Checkpoint between steps in your process. We'll do retry logic. You focus on your problem e.g. payment processing or your manual processing or whatever. Can wait up to a year to continue execution.
NEW! Lambda Managed Instances. Choose your compute; drive cost efficiency; maintain Lambda's operational simplicity. Serverless no longer serverless? Focus on business logic.
NEW! ECS (Elastic Container Service) managed instances. Unlock the breadth of EC2 compute capabilities; improve performance and cost with bin-packing and EC2 node management; all with the same ECS operational experience. We'll now meet you in the middle.
NEW! ECS native deployments. Blue/green, canary, linear; best practices built in, safer deployments.
NEW! ECS Express Mode. Accelerate application deployment; simplify infrastructure configuration; maintain full control and flexibility. Launch everything in one call. Brought down from 300 parameters to 30. Set it and forget it.
NEW! Amazon EKS (Elastic Kubernetes Service) capabilities. Reduce friction/increase velocity; flexibility and choice; enable production ready scale. We'll include stuff like ACK (AWS Controllers for Kubernetes) controller so you don't have to manage it.
EKS ultra scale clusters: earlier we launched ultra scale for AI/ML training. Now we are moving that to new classes of generic workloads (Claude, Amazon Nova). 1.6 million AWS Trainium accelerators.
New: Checkpointless training on HyperPod. Accelerate Gen AI development, reduce training cost, scale with ease. Why do we need checkpoints anyway?
New: Elastic training on HyperPod. Accelerate time to market; simplify operations; get started easily. Additional GPUs for high priority tasks, etc.
New: SageMaker AI model customization. Accelerate end to end workflow; comprehensive customization techniques, serverless infrastructure. Generate datasets, evaluate model for success, etc.
Why can't every organization have their own frontier model?
New: Amazon Nova Forge. Build your own frontier model with the only service that gives you access to open training models. Select starting model + checkpoint; mix your data with curated datasets in SageMaker AI; deploy across SageMaker AI and Amazon Bedrock.
Werner starts out with a corny rip-off of Back to the Future, showing the evolution of coding and the continued belief that "this is the end of developers" and "no more engineers," but the 2025 theme is "GenAI for Devs" and he rewrites the title to "Back to the Beginning" instead of Back to the Future.
He starts out by saying this is his last keynote. He's not leaving Amazon, but 14 keynotes is a lot, and he doesn't need to do any more. "It's time for those younger different voices of AWS to be in front of you."
Will AI take my job? …maybe! Some skills will become obsolete, and some things will be automated. We should rephrase and instead say: will AI make me obsolete? NO! As long as you evolve.
1950s: assemblers; 1960s: compilers; 1970s: structured programming; 1980s: object-oriented programming; 1990s: monolithic; 1998: service-oriented; 2000s: on-premises; 2010s: cloud-computing; 2020s: 'traditional development'; 2025: LLM (Large Language Model)-assisted.
The work is yours, not the tool's
"We are at the epicenter of multiple simultaneous Golden Ages" - Jeff Bezos
The renaissance developer is curious.
Curiosity leads to learning (and invention).
Willingness to fail, fail, and be gently corrected.
"The scary moments are the ones we learn the most from." Yerkes-Dodson Law: increasing attention > optimal point > impaired performance. You have to put yourself in a position that tests you.
"You are not what you know but what you are willing to learn," - Walt Whitman
"Think in systems, not isolated parts."
Every system has: reinforcing loops, and balancing loops.
The renaissance developer thinks in systems: to build resilient systems, understand the bigger picture.
The renaissance developer also communicates. Human to human (natural language) = ambiguous. Human to machine (programming language) = precise.
Well, not anymore: human to machine is turning into natural language, making it ambiguous again. Specifications reduce ambiguity.
They talk about Kiro again for the thousandth time, saying it's "Spec driven development." Spec-based prompting.
Old example of spec driven development: Doug Engelbart invents the mouse.
Clear communication reduces mistakes.
The renaissance developer is an owner.
Vibe coding without human review = gambling.
(Vibe coding is fine! But only if you pay close attention to what is being built. We can't just pull a lever on your DB (database) and hope that something good comes out. That's not software development, that's gambling. And you can go elsewhere for that.)
Code: quick to generate, slower to review.
Challenge #1: verification debt (AI writes code so fast that you can't keep up with understanding and comprehending all of it).
Challenge #2: hallucination (sometimes they over engineer things or come up with solutions that are totally inappropriate).
Solutions: spec-driven development, automated reasoning, automated testing.
Durability reviews: what are all the things that might go wrong?
Human review to restore balance: the control point. Picture a seesaw with generated code on the left side and code review on the right. "Yes, we all hate code reviews. Like being a 12-year-old standing in front of the class. But in an AI-driven world they are crucial: models can generate code faster than we can understand it, so hopefully in the review the control point is used to restore balance. Bring human judgment back into the loop."
Human to human code reviews (mechanism) = knowledge transfer.
TLDR: YOU BUILD IT, YOU OWN IT
The renaissance developer is a polymath.
"Give me the 20 most important questions to ask of your data and I will design a system for you" - Jim Gray
T Shaped Understanding = highly specialized + broad understanding (depth and breadth).
Every "T" is unique: personal skills, functional skills, industry specific.
So broaden your "T".
So the renaissance developer is curious, thinks in systems, communicates, is an owner, and a polymath.
We are hoping we make it to the end without his usual corny rallying cry "Now go build." We're so close, but unfortunately he blasts a "KEEP CALM" meme (what is this, 2015?) that says "Keep calm and now go build!" What?? That doesn't even make sense!
Introducing a new flat rate plan! Deployed through CloudFront.
Pay-As-You-Go Model: Starts at $0, choose features and services, pay only for what you use. Align costs with consumption. Costs vary with usage.
You may be using services like ELB (Elastic Load Balancer), CloudWatch, WAF (Web Application Firewall), and these all have their own prices and costs.
Customer Challenges:
As your usage grows with your business, so do your costs. This is a challenge for budget planning and financial forecasting.
Typical components required:
Cost Estimation Challenges: 5+ pricing pages, non-standardized pricing dimensions, missing inputs.
Bill Predictability Issues: Unexpected traffic, traffic not controllable.
Flat Rate Pricing Plans for Delivery and Security: One monthly price includes:
Available in four tiers. Include features and usage; select tier with features you need that accommodates baseline traffic. No overages for exceeding allowances. Blocked requests and DDoS attacks never count against your allowance.
Plan Tiers:
Monthly Usage Allowances for Requests:
Monthly Data Transfer:
Use Case: Personal projects and getting started. Includes all core capabilities.
Features:
Includes all Free plan features, plus:
Includes all Pro plan features, plus:
Includes all Business plan features, plus:
Bolt onto any internet-facing app: Delivery and security layer. Extend cost savings to other AWS services.
Benefits:
Requests hit CloudFront edge locations, fewer requests reach your app, data transfer costs waived between AWS origins and CloudFront.
Flexibility: If you go through your allowance, you may get routed to a different location with some increased latency (by 20-30ms). Not blocked though. Flexibility is the goal.
Block direct to origin traffic. One flat rate covers plan and data transfer costs.
Traffic Flow: TLS termination > bad traffic is blocked > CloudFront edge locations > CloudFront regional edge cache > CloudFront VPC > private subnet > CloudFront VPC origins fleet > VPC to VPC NAT (Network Address Translation) > ALB in customer VPC, private subnet.
Security: Block direct to origin traffic that bypasses CloudFront. This is not internet-facing.
Will this show up as a recommendation in AWS Cost & Billing panel? NO. Not at this time.
If you apply it to an existing architecture, it counts just as a general S3 discount, not applied to any particular bucket.
Block direct to origin traffic. One flat rate covers plan + data transfer costs.
Traffic Flow: TLS Termination / bad traffic is blocked > CloudFront edge locations > CloudFront regional edge cache > Origin Access Control (OAC) (private S3 bucket, AWS Elemental, Lambda function URL). Only allow access through your designated CloudFront distributions.
Bad Traffic Handling: Bad traffic fails because it lacks cryptographic proof that the request originated from your designated CloudFront distribution.
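The cryptographic proof is CloudFront signing origin requests as the service principal; the bucket policy then pins reads to one distribution. A sketch with placeholder ARNs:

```python
import json
import boto3

# OAC pattern: only cloudfront.amazonaws.com may read, and only when the
# request originates from this specific distribution.
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "AllowCloudFrontServicePrincipalReadOnly",
        "Effect": "Allow",
        "Principal": {"Service": "cloudfront.amazonaws.com"},
        "Action": "s3:GetObject",
        "Resource": "arn:aws:s3:::my-origin-bucket/*",
        "Condition": {
            "StringEquals": {
                "AWS:SourceArn": "arn:aws:cloudfront::123456789012:distribution/EDFDVBD6EXAMPLE"
            }
        },
    }],
}

boto3.client("s3").put_bucket_policy(
    Bucket="my-origin-bucket",
    Policy=json.dumps(policy),
)
```

Anything hitting the bucket directly has no valid service-principal signature, so it fails exactly as described above.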
Own the entire stack: infrastructure > global network > virtual private cloud > application network. Data center is built from the ground up, we control every aspect from hardware to internet edge. Nobody else can say that! (Is he correct there though?)
AI Models: growing more complex and demanding by the day. This is going to push networks. Models have already crossed the trillion-parameter threshold. But it's not just size: they need to be trained and deployed all around the world.
AWS Networking is the backbone that makes it possible. Why does networking matter?
1M+ AI accelerators. They work together to adjust, share, talk; they communicate in nanoseconds. But models are getting massive, so we have no choice: break up the perfect partnership and scale out across hundreds of thousands of accelerators.
When we do that: latency. Server > rack > data center > region.
Hollow core fiber: 30% improvement in latency. Thousands of kilometers already installed.
So infrastructure needs to be fast, reliable, and scalable.
Concrete example: the Trainium2 Ultra Server. It's a category of compute: 64 Trainium2 chips in a single ultra server, 512 NeuronCores working together, arranged in carefully designed topologies, circular pathways that bridge chips. This directly determines what computation is possible.
Dynamic scaling is what makes those parameter counts possible. Every AI model needs chips that talk in microseconds.
This is why we built EC2 UltraClusters. They have already powered some of the largest AI training runs in the world.
Seamless connectivity > low latency > petabit scale.
Network topology > performance, resiliency.
Wired rails: ML (Machine Learning) clusters often used these, connectivity that creates mini networks. Any single link can affect your entire training run, causing work to restart from a checkpoint. Failures will happen; the conventional method is to connect all accelerators at the top of the rack, but this creates interference, and recovering from checkpoints is costly. We need tech that gives us the best of both, which is why we developed Ultra Switch: a new top-tier networking device in our data centers, homegrown at AWS. It uses accelerator index, cluster topology information, and network capacity to align AI accelerator traffic to optimal rails. When faced with failure, it quickly recovers into stability, and it keeps flow collisions to a minimum.
Ultra short reach optics: improves link reliability of the critical server-to-network connection by 3x. Copper is impractical when you need to route hundreds of thousands of thick cables through rack space, but we still want the size and speed benefits of copper, so we use ultra short reach optics, minimizing connectors; the result is reliability that surpasses copper.
But networking is not just hardware. The traditional routing protocol was never built for this; at AI scale, a few seconds of delay could stall an entire training run. So we built SIDR: Scalable Intent Driven Routing. It continuously monitors, detects failures, and adapts paths.
Exactly what large scale AI training demands: ultra switch > predictable paths; SIDR > adaptable network. Both: fast, scalable, resilient.
In just 3 years, AWS has deployed: 300k+ switches, 40M+ ports dedicated to ML traffic, growing faster than our core network traffic itself. Extraordinary demands.
Project Rainier: Anthropic is using this to build and deploy its industry leading AI model Claude. 1M+ Trainium2 chips by the end of 2025.
We handle the physical networking so you don't have to. Enforce security > scale globally.
Everything is built on VPC (Virtual Private Cloud). Meaning you get your own private section of the AWS cloud. You decide what connects to what, when, and how. Strict separation between workloads.
Amazon VPC connectivity: VPC peering, Transit Gateway, NAT Gateway, AWS Internet Gateway.
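To make one of those options concrete, a cross-region VPC peering request is a single boto3 call (both VPC IDs and the regions are placeholders; the peer account still has to accept the connection, and the CIDR ranges must not overlap):

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Request a peering connection from our VPC to a peer VPC in another
# region. The peer side must call accept_vpc_peering_connection, and
# both sides need routes added before traffic flows.
resp = ec2.create_vpc_peering_connection(
    VpcId="vpc-0aaa1111bbb22222c",       # placeholder: our VPC
    PeerVpcId="vpc-0ddd3333eee44444f",   # placeholder: their VPC
    PeerRegion="eu-west-1",
)
print("Requested:", resp["VpcPeeringConnection"]["VpcPeeringConnectionId"])
```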
"I need to pull data from the user service. I need to talk to the payment API. I need to access the recommendation engine." Code doesn't care about network topology, it thinks in business logic.
Connect securely to a database, access object storage, reach a ML model.
Amazon VPC Lattice: intent driven networking. Eliminates the complexity of routing tables and peering configurations. "This service needs to talk to that service": you just define the trust level. It discovers endpoints, enforces zero trust policies, and provides deep observability.
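A minimal sketch of what that looks like with boto3 - the vpc-lattice client and these two calls are the service surface as I understand it, but the names, the IAM auth choice, and the VPC ID are illustrative:

```python
import boto3

lattice = boto3.client("vpc-lattice", region_name="us-east-1")

# Create a service network: the trust boundary that services join.
network = lattice.create_service_network(
    name="orders-network",
    authType="AWS_IAM",  # zero trust: every request is IAM-authenticated
)

# Associate a VPC so workloads inside it can discover and reach services
# in the network - no peering, route tables, or NAT to maintain.
lattice.create_service_network_vpc_association(
    serviceNetworkIdentifier=network["id"],
    vpcIdentifier="vpc-0aaa1111bbb22222c",  # placeholder
)
```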
PrivateLink for SaaS (Software as a Service) services: multi region architecture shouldn't mean multi region complexity. Private connectivity to third party services all over, without anything touching the internet.
NEW! Cross-region PrivateLink for AWS services. Reach AWS services in other regions over private connectivity. Available now! This is layered abstraction.
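The endpoint-creation call itself is long-standing; the hedged sketch below assumes the cross-region flavor surfaces as a ServiceRegion parameter on create_vpc_endpoint (treat that parameter, plus all IDs, as assumptions):

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Interface endpoint in our local VPC pointing at a service that lives
# in another region. ServiceRegion is the assumed cross-region knob.
ec2.create_vpc_endpoint(
    VpcEndpointType="Interface",
    VpcId="vpc-0aaa1111bbb22222c",             # placeholder
    ServiceName="com.amazonaws.eu-west-1.s3",  # remote-region service
    ServiceRegion="eu-west-1",                 # assumption: see lead-in
    SubnetIds=["subnet-0123456789abcdef0"],    # placeholder
)
```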
LLM (Large Language Model) inference load depends on memory utilization, token processing complexity, the type of inference request, and model state - which makes it hard to balance with traditional request counting.
NEW! ALB (Application Load Balancer) Target Optimizer. The load balancing solution for AI inference workloads.
You install an agent on each target; the agent configures the max number of concurrent requests the ALB sends to that target and tracks the in-flight requests.
Amazon API Gateway: 140T+ requests in 2024, 40% increase YOY (Year Over Year), 400k+ unique accounts. NEW! API Gateway Portals: transforms how you discover, document, and scale APIs. Fully managed experience, always up to date documentation, detailed user analytics.
NEW! Response Streaming for API Gateway. Instant first bytes, steady delivery, and support for massive payloads. Good for LLM responses and complex analytics: no more buffering entire responses or staring at loading screens - the first paragraph is delivered immediately. Real time collaborative editing, progressive data visualization, interactive AI conversations.
NEW! Model Context Protocol (MCP) for API Gateway. Integrated with Amazon Bedrock AgentCore Gateway. Supports API keys, both public and private APIs, handles compliance, and enables MCP agentic workflows.
Security: application protection > VPC security > encrypted transit > AWS Nitro System.
Security must be effective at scale, intelligent, and adaptive.
NEW! Amazon VPC Encryption Controls. Built-in, enforceable, auditable. In a few clicks, audit the encryption status of your traffic and enforce encryption across your entire network infrastructure. Uses Nitro encryption through various AWS services. Eliminates the operational overhead and complexity of managing this yourself.
NEW! Network Firewall Proxy: easier than ever to secure your applications. Decrypt and filter internet egress traffic; detect compromised workloads; enforce tighter security controls; prevent data leaks. Domain filtering, IP filtering, TLS (Transport Layer Security) interception, response traffic filtering. AWS handles the scaling, while you get enterprise grade filtering and protection.
The more connections you secure, the larger and more complex your footprint becomes. So our focus is making it easier for you: focus on your app rather than the heavy lifting of configuring complex network setups.
Transit Gateway: Network Firewall native attachment. The firewall now integrates directly at the Transit Gateway level. Control across all VPCs and on-premises networks with no overhead, solving cost allocation problems even in large scale environments.
AWS network worldwide: 9M+ kilometers of fiber; 50% expansion YOY.
NEW! Fastnet. Dedicated high capacity transatlantic cable connecting the US to Ireland. Will be done by 2028?
38 regions, 120 AZs (Availability Zones).
For example, Sophos: 35% improvement in latency globally, and scaled from 146,000 to 2 million requests per second during a traffic surge.
AWS CloudWAN: unified network. Segment A, B, C, etc.
NEW! AWS CloudWAN Routing Policy: fine grained controls to optimize route management, control traffic patterns, and customize network behavior across global wide area networks. No more need for third parties.
150 Direct Connect locations.
Complex multi-layered networks mean higher costs and operational complexity. "Pull data from the user service." So: streamline multi-cloud connectivity. Simple, resilient, private. Radically simplify operations so you can establish private cloud-to-cloud connections in minutes.
NEW! AWS Interconnect - multi cloud. Simple, resilient, high speed private connections from your VPC to other cloud service providers. Preview with Google Cloud; Azure in 2026 (begrudgingly). Connect your VPC directly to other cloud providers.
Pre-cabled capacity, set up with ease, backed by SLA (Service Level Agreement). Stay tuned, because many other providers are in the works.
This is good for customers like Salesforce for instance, they can extend their functionality into Google Cloud, AWS native private connectivity. Access regulated data without internet exposure, unlock zero copy across multi cloud environments, maintain the same trusted security models customers rely on.
NEW! AWS Interconnect - last mile. Faster setup and onboarding; automated network configurations; launching with Lumen, so you can connect your remote locations to AWS in just a few clicks. Many more providers in the works.
750+ POPs (Points of Presence) in 100+ cities across 50+ countries, with direct peering to all major ISPs (Internet Service Providers). 1,100+ embedded POPs in ISPs across 200+ cities.
NEW (but announced a few days ago): Amazon CloudFront flat rate pricing plans. One monthly price with no overage charges. Free ($0/month): for hobbyists, learners, and developers getting started. Pro ($15/month): launch and grow small websites, blogs, and applications. Business ($200/month): protect and accelerate business applications. Premium ($1,000/month): scale and protect business and mission critical applications. Includes CloudFront, CloudWatch, WAF (Web Application Firewall), Lambda, Route 53, S3.
The EC2 instance stack, from top to bottom: Application > Instance OS > Nitro Hypervisor > Nitro System (the hardware layer, with dedicated components for storage, networking, and security) on the physical server. The Nitro System was introduced in 2017, but AWS started working on it as far back as 2012.
Before Nitro, the hypervisor was consuming too many resources. Additionally, there was performance variation in the EC2 stack due to customer demand fluctuations.
Performance: Better performance across CPU, networking, and storage. AWS has better performance since they've offloaded all functions to dedicated hardware.
Security: Enhanced security that continuously monitors, protects, and verifies the instance hardware and firmware. AWS recommends reading the white paper for detailed security information.
Innovation: Building blocks can always be assembled in many different ways, giving AWS the flexibility to design and rapidly deliver EC2 instances.
Networking - before Nitro: barely 1 Gbps of network bandwidth. Now: network-optimized Nitro instances support up to 600 Gbps, and EFA reaches up to 6,400 Gbps; AWS has been doubling EFA performance approximately every year.
Storage - before Nitro: barely 16,000 IOPS (Input/Output Operations Per Second). Now: up to 720,000 IOPS.
AWS now offers 1,000+ instance types; the most powerful AWS-designed CPU is Graviton4.
CPU Options: Intel Xeon, AMD EPYC, AWS Graviton, Apple M1/M2.
AWS delivers the right compute for each application and workload across multiple categories:
Capabilities: Fast processors up to 4.5 GHz, high memory footprint up to 32 TB, storage options (HDD and SSD), accelerated computing (GPUs, FPGA, ASIC), networking up to 6,400 Gbps, bare metal support. Many Operating Systems (OSes) supported. 1,000+ instance types for virtually every workload and business need.
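With a catalog that size, finding the right type is a query problem; a small boto3 sketch against the real DescribeInstanceTypes API (the memory threshold is arbitrary):

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Page through the catalog, keeping current-generation types with at
# least 1 TiB of memory, and show their rated network performance.
paginator = ec2.get_paginator("describe_instance_types")
for page in paginator.paginate(
    Filters=[{"Name": "current-generation", "Values": ["true"]}]
):
    for it in page["InstanceTypes"]:
        mem_gib = it["MemoryInfo"]["SizeInMiB"] / 1024
        if mem_gib >= 1024:
            print(it["InstanceType"], f"{mem_gib:.0f} GiB",
                  it["NetworkInfo"]["NetworkPerformance"])
```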
Nitro Cards: Provide VPC (Virtual Private Cloud) networking, block storage, instance storage, and system controller functionality.
Nitro Hypervisor: Lightweight and secure hypervisor, CPU, memory and device assignment, bare metal-like performance.
Nitro Security Chip: Integrated into motherboard, traps I/O to non-volatile storage, provides hardware root of trust.
Nitro cards provide the VPC implementation. VPC Data Plane Offload: ENI (Elastic Network Interface) attachment, security groups, flow logs, routing, limiters, DHCP (Dynamic Host Configuration Protocol), DNS (Domain Name System).
VPC Encryption: Authentication and transparent end-to-end 256-bit encryption for every packet delivered on VPC.
Extensible host interface with variable number of queues, multi-pathing with ENA Express.
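ENA Express (SRD under the hood) is enabled per network interface; this matches the documented EC2 attribute as I recall it, with the ENI ID as a placeholder:

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Turn on ENA Express (SRD-based multipathing) for one interface.
# Both ends of a flow need it enabled, on supported instance types.
ec2.modify_network_interface_attribute(
    NetworkInterfaceId="eni-0123456789abcdef0",  # placeholder
    EnaSrdSpecification={
        "EnaSrdEnabled": True,
        "EnaSrdUdpSpecification": {"EnaSrdUdpEnabled": True},
    },
)
```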
High Performance Computing (HPC) and AI on Amazon EC2 ultra clusters, Remote Direct Memory Access (RDMA), and GPU RDMA. Multipathing and low latency.
Standard drivers are broadly available for Linux distributions, Windows, and other Operating Systems (OSes).
EBS Data Plane: End-to-end encryption, SRD (Scalable Reliable Datagram) for better tail latency. Encryption using DMA (Direct Memory Access) so there's no performance degradation.
Traditional SSD Design: Has NAND flash, Flash Translation Layer (FTL) that maps logical to physical addresses, performs garbage collection, and manages NAND wear leveling.
Local Instance Storage with AWS Nitro SSD: AWS-designed SSDs with firmware built in-house, so the FTL is managed by the Nitro system itself, giving consistent latency (no surprise garbage-collection stalls) and fleet-wide firmware updates.
The Nitro Hypervisor kernel is smaller since storage and networking are abstracted away to dedicated hardware. AWS can replace the hypervisor on a weekly basis with no interruption to your instances.
AWS doesn't want the hypervisor to have full access to memory.
Commodity hypervisor memory address space: VM A's CPU context, VM A's memory, VM B's CPU context, VM B's memory - all mapped with full access.
Nitro Hypervisor Memory Address Space: Much smaller footprint. Hypervisor maps just enough space to perform functionality, then unmaps it again, so AWS can resume running your application. This provides better security isolation.
The Nitro Security Chip sits on all AWS motherboards. It monitors all transactions and blocks those that are not necessary; only device-related transactions are allowed. The security chip has been extended to provide more functionality over time.
Multi-socket support started with Graviton3: 3 non-coherent processors with decoupled lifecycles, initially to improve density thanks to power efficiency.
It continues with Graviton4: 2 processors, non-coherent or coherent.
With Intel: started with Intel Gen 7, now supports 2 non-coherent or coherent processors.
With AMD: starting with AMD Gen 8: 2 non-coherent or coherent processors.
AWS has a unique differentiator in terms of the Nitro System. It has allowed AWS to innovate faster on behalf of customers and deliver better performance and security versus other cloud providers. EC2 offers 1,000+ instance types across CPU options with many new features, thanks to Nitro. The Nitro System has allowed customers to take advantage of infrastructure that continually improves.
Contact Information: Email with questions to Filippo Sironi (sironi@amazon.de) or Sudden Sharma (svsharma@amazon.com).
The AI thing is real, but it's not magic. Everyone's talking about billions of agents running around doing stuff, but the reality is simpler: AI is becoming infrastructure. It's like how we used to think about databases or web servers - now it's just part of how things work. The cool part? You don't need to build your own model from scratch. Start with something that exists, tweak it with your data, and you're good. That's what Nova Forge is about - taking what works and making it yours without starting over.
Everything is getting faster and cheaper, but you gotta know what you're doing. Graviton5 chips are powerful - twice the cores, way more cache, 30% faster. But here's the thing: you can't just throw more power at a problem and expect it to solve itself. The Nitro system stuff shows AWS is thinking about the whole stack, not just the CPU. It's like having a race car where the engine, transmission, and suspension all work together instead of fighting each other. The result? Stuff that used to take forever now happens in milliseconds, and it costs less.
Security isn't something you add later - it's built in from the start. The Security Agent and DevOps Agent stuff is interesting because they're actually fixing problems before you even know they exist. It's like having a really good assistant who doesn't just tell you about problems, but goes ahead and fixes them. The VPC encryption controls mean you can enforce security across your whole network with a few clicks instead of configuring a million things. It's still security, but it's not a pain in the ass anymore.
Networking is finally catching up to what we actually need. Cross-region PrivateLink means you can connect stuff across different AWS regions privately, which sounds boring but is actually helpful. The CloudFront flat-rate pricing is smart too - instead of getting surprised by a bill because traffic spiked, you pay one price and that's it. It's like switching from a pay-per-use phone plan to unlimited. You might pay a bit more sometimes, but you never get that "oh shit" moment when the bill arrives.
Storage is becoming more like a database, and databases are becoming more like storage. S3 vectors let you search through billions of items in milliseconds, which is wild. EBS snapshots now copy predictably instead of taking forever and you never know when they'll finish. The S3 tables stuff with automatic tiering and replication means you can set it up once and it just works. It's all about making data easier to work with without having to become a storage expert.
Serverless is getting less serverless, and about damn time. Lambda durable functions mean you can do long-running stuff without worrying about timeouts. Lambda managed instances let you pick your compute but keep the operational simplicity. ECS Express Mode went from 300 parameters to 30. The pattern is clear: give people the abstraction they want, but let them peek under the hood when they need to. It's not about hiding complexity - it's about managing it better.
The real story is about agents doing the boring work. Garman said "billions of agents" and everyone's like "sure, whatever." But think about it: the DevOps Agent that fixes your Lambda errors before you even notice? The Security Agent that scans your code automatically? Kiro that remembers your project history and fixes stuff while you sleep? The goal seems to be letting developers focus on the interesting problems instead of the repetitive ones. The agents handle the routine stuff, you handle the hard stuff.
Cost optimization is about predictability. Database savings plans can save 35%, OK cool. But more importantly, you know what you're paying. The CloudFront flat-rate plans, the capacity reservations, the intelligent tiering - it's all about making costs predictable so you can actually plan instead of just hoping. Nobody wants to be surprised by a bill, especially when it's because something worked too well.
Bottom line: AWS is betting big on AI infrastructure, making everything faster and cheaper, building security in from the start, and letting agents handle the routine work. The tools are there. The infrastructure is there. The agents are there. Now it's about figuring out what to build with them, bro.
Walking through the expo floor is like being at a tech-themed flea market where everyone's trying to scan your badge. You know the drill - they get your email, you get a stress ball with their logo, and somehow that's supposed to be a fair trade. The swag tables are stacked with the usual suspects: frisbees, water bottles, those weird fidget spinner things. Like, yeah, a branded koozie is definitely going to make me want to migrate my entire infrastructure to your platform.
Rackspace being there is hilarious. They're basically the ex who shows up to your new partner's birthday party. "Oh hey, I know you're using AWS now, but we can totally help you manage it! We're cool with it, really. We're not bitter at all that you left us for the cloud. Want a sticker?" The audacity is impressive.
But god, the lightning talks. You're just trying to find the coffee stand or figure out where the bathroom is, and suddenly some dork with a microphone is yelling about Kubernetes. Like, bro, I'm just trying to get caffeinated here. I don't need a live demo of your container management solution while I'm waiting in line. It's guerrilla marketing: ambush people when they're vulnerable and distracted. And the worst part is it works, because you're stuck there listening while you wait for your latte.
The whole thing is a weird mix of desperation and confidence. Everyone's trying to convince you their solution is the one that'll change everything, while simultaneously giving away cheap plastic crap like that's going to seal the deal.
Not a single mention of CloudFormation. Not one. GOOD! That thing is a nightmare. You write a 500-line YAML file to spin up a simple app, and by the time you're done debugging why your stack won't update, you've written more CloudFormation than actual application code. It's like they designed it to be as painful as possible.
People are finally moving away from it. Terraform, Pulumi, CDK - anything that doesn't make you want to throw your laptop out the window. AWS knows this. That's why they're pushing CDK so hard, and why they're building all these managed services that don't require you to write infrastructure code at all. ECS Express Mode went from 300 parameters to 30. Lambda managed instances. EKS automation. The pattern is clear: make it so you don't need CloudFormation in the first place.
The fact that they didn't mention it once tells you everything. They're trying to make it irrelevant. Build services that configure themselves, abstract away the infrastructure, and let people focus on building stuff instead of wrestling with YAML syntax errors at 2 AM.
Aurora? Also completely absent. No new Aurora features, no performance improvements, no "look how great Aurora is" demos. Nothing. And again, this is probably for the best because Aurora is expensive as hell.
You want a database? Sure, Aurora will work great. It'll also cost you an arm and a leg. The storage is expensive, the I/O is expensive, the backups are expensive, the cross-region replication is expensive. It's like they designed it to extract maximum value from your wallet. And for what? Slightly better performance than RDS? The ability to scale reads? Cool, but at what cost?
Meanwhile, they're pushing S3 tables, Iceberg, all this data lake stuff. They're talking about S3 vectors for billion-scale searches. They're building new database services that don't have Aurora's baggage. The message is clear: if you need a traditional database, use RDS. If you need something more, use the new stuff. But maybe don't use Aurora unless you really, really need it and have money to burn.
The silence on Aurora is telling. They're not going to kill it - too many people are locked in. But they're not going to promote it either. It's the database equivalent of that old feature that still works but nobody talks about because there are better options now.
He makes many, many speaking errors throughout the speech. The keynote is full, and even every single overflow room is full, forcing a lame livestream. Maybe he's nervous?
Holy crap, he wore an actual collared shirt. No more "trying to be cool by underplaying the formality of the event by wearing a t shirt and a suit jacket with $300 sneakers," although he still had to wear jeans.
We made AWS because people wanted to build stuff but didn't have the servers and compute, so we wanted to make it easier for them to build.
"I believe that in the future we will have billions of agents"
"WHY NOT!?"
A future of billions of agents means inventing new building blocks for agentic systems.
We are good for AI infrastructure since we do not cut corners. "Turns out there are no shortcuts"
GPUs are key; we've collaborated with Nvidia for 15 years, and AWS is the most stable place to run a GPU cluster, because we sweat the details (like debugging BIOS issues to prevent reboots) instead of just accepting them.
Introducing P6e instance, powered by Nvidia bla bla bla. Crowd gives mild applause.
Nvidia runs their large scale gen AI clusters in AWS, and even OpenAI. EC2 ultra servers with hundreds of thousands of chips.
Introducing AWS AI Factory = dedicated AI infrastructure deployed in your own data centers. Like a private AWS region, using power you've already acquired.
"People give us a hard time about product naming in AWS. We named Trainium because it's a chip for AI training, but Trainium2 is the best system in the world for inference."
Introducing Project Rainier = a sprawl of data centers hosting Trainium processors and powering AI's future: multi-campus clusters, multiple buildings acting as one computer, training the next gen of Claude models.
Extremely corny dramatic video with a bunch of AI powered landscapes. Mild, requisite applause from audience.
Trainium3 ultra servers are now generally available, first 3nm AI chip, 5x more tokens per megawatt of power.
Trainium4 is around the corner: 6x the compute, 4x the memory bandwidth, 2x the memory capacity, for the largest models.
New! Nova2 = new models with frontier level intelligence. Reasoning models for workloads, three classes: lite, pro, sonic.
Build your own model from scratch? Too expensive, and it needs expertise. So instead: start with an open source model and modify it - fine tune, reinforce.
New! Nova Forge. A new service that introduces the concept of open training models. You get exclusive access to Nova checkpoints and blend your proprietary data with an Amazon training set, producing a model that deeply understands your information without forgetting its core knowledge. These are called Novellas. Upload one and run it in Bedrock.
Case study: Reddit uses Gen AI to generate content, but they couldn't get the accuracy they wanted. With Forge, though, they made their own model using pre-training. For the first time they had a model that met their cost efficiency and accuracy targets, and it was much easier to deploy. This will completely change what companies do with their AI!
He brings out the Digital CEO of Sony. "KODERA-SAN!" The guy talks for an eternity about why AI is sweet, and why Sony uses it so much to bring 'content to their fans across the world.' (Translation: to replace humans so they can cut costs.)
Finally Garman comes back out. "Agents are exciting because they can take action and get things done." Deploy agents with AgentCore.
Treat agents like teenagers who need to learn how to adult. Need guardrails.
Announcing: policy in AgentCore, providing real time deterministic controls for how your agents interact with your data.
"TRUST, BUT VERIFY" same for teens and agents
"Crush tech debt with AWS Transform."
A few months ago we released Kiro - I think this is an IDE meant to compete with Cursor.
1 year of Kiro for free for startups, apply in the next month!
Kiro now has autonomous agents. Have it fix stuff while you sleep. It remembers the history of your project so you don't have to keep re-explaining it.
New! AWS Security Agent. Build apps that are secure from the very beginning. Shifts security work upstream so your system is secured more often. Scans your code for vulnerabilities, integrates into your pull requests (seriously, he says "git requests on PullHub" instead of GitHub… he's really tripping over his words all day).
New! AWS DevOps Agent. Proactively prevents incidents, improving your reliability. Investigates incidents and identifies operational improvements. Learns from your resources, their relationships, and your existing observability solutions and CI pipelines, then correlates the telemetry across all the sources to understand what's going on.
e.g. it found elevated Lambda error rates talking to your database; by the time you got the notice, the DevOps Agent had already gone in there and fixed it.
New X8i family of memory instances, powered by custom Intel Xeon.
X8aedz up to 8TB of memory.
C8a instances, based on AMD processors with higher performance.
This goes on for an eternity, bunch of random instance classes….
New in Lambda: Lambda durable functions. They manage state, handle long-running workloads, and recover automatically.
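I haven't seen the actual SDK, so here is only a toy model of the execution idea - every name in it is hypothetical, not the real Lambda API. The point is that each step checkpoints its result, so a crash or timeout resumes after the last completed step instead of rerunning the whole function:

```python
# HYPOTHETICAL sketch of the durable-function execution model; none of
# these names come from the real AWS SDK. Each step's result is
# checkpointed, and a re-invocation replays from the checkpoints.

class DurableContext:
    def __init__(self, store):
        self.store = store  # in the real service this state is managed

    def step(self, name, fn):
        if name in self.store:        # replay: skip work already done
            return self.store[name]
        result = fn()                 # first run: execute and checkpoint
        self.store[name] = result
        return result

def handler(event, ctx):
    order = ctx.step("validate", lambda: {"id": event["order_id"]})
    charge = ctx.step("charge", lambda: {"charged": True, **order})
    ctx.step("notify", lambda: print("shipped:", charge))
    return {"status": "complete"}

# Simulating a retry: the second call with the same store replays the
# checkpoints and re-executes nothing.
state = {}
handler({"order_id": 42}, DurableContext(state))
handler({"order_id": 42}, DurableContext(state))
```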
New in S3: S3 max object size is 50TB!
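The arithmetic behind that number is tight: at S3's 10,000-part multipart cap, a 50 TB object needs parts averaging at least 5 GB, just under the 5 GiB per-part ceiling. A sketch with boto3's real transfer config (file and bucket names are placeholders):

```python
import boto3
from boto3.s3.transfer import TransferConfig

# 50 TB / 10,000 parts = 5 GB per part, so a max-size object needs
# near-maximum part sizes. Names below are placeholders.
config = TransferConfig(
    multipart_threshold=64 * 1024 * 1024,  # go multipart above 64 MB
    multipart_chunksize=5 * 1024**3,       # 5 GiB parts
    max_concurrency=16,                    # parallel part uploads
)

boto3.client("s3").upload_file(
    Filename="/data/huge-dataset.bin",
    Bucket="my-bucket",
    Key="huge-dataset.bin",
    Config=config,
)
```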
Also: S3 batch operations are 10x faster.
New! Intelligent tiering for S3 tables. Save up to 80% in costs.
New! Automatic replication for S3 tables across regions.
New! S3 vectors…more cost effective to query.
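The preview SDK exposes an s3vectors client; the operation and parameter names below are from my memory of the preview docs, so treat the whole shape as approximate:

```python
import boto3

# Approximate sketch of an S3 Vectors similarity query; names may differ
# from the shipped API. Bucket, index, and the embedding are placeholders.
vectors = boto3.client("s3vectors", region_name="us-east-1")

resp = vectors.query_vectors(
    vectorBucketName="my-vector-bucket",
    indexName="product-embeddings",
    queryVector={"float32": [0.12, -0.08, 0.33]},  # your query embedding
    topK=5,
    returnDistance=True,
)
for match in resp["vectors"]:
    print(match["key"], match.get("distance"))
```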
New! GuardDuty in ECS AND EC2, enabled for all GuardDuty customers at no additional cost.
NEW! Security Hub generally available. Trends dashboard, streamlined pricing model, etc.
New! Unified data store in CloudWatch.
Biggest new! Database savings plans. These can save you up to 35% across all usage for your database services.
Bars: Still an 11pm last call, and it's lame as hell when they replace the beer with water at all the bars at that time. Like, come on. It's a party. Let people drink. Or at least don't pretend the water is still beer. That's just insulting.
Food: The food was a joke - just the same 5 stands over and over again, with long, long lines for a small plate, and only 2 of them had meat. The vegetarians won this event. LAME! You wait 20 minutes in line for a tiny plate of whatever, and most of the options are plant-based. It's just disappointing. Where's the variety? Where's the actual food? Not one dead cow!
Activities: Activities were generic, they didn't have that cool AI lab thing from last year that would transform your face in real time. Car crushing robot, ice rink, giant slide, yawn snore. It's like they went with the most basic carnival attractions possible. Lazy.
The Music: OK, last year they had Weezer, who are definitely dad rock. This year they had Beck, who is also dad rock, but AWESOME. I walked in as he was finishing up "Loser" (from the mid 90s) and he did some killer songs after that.
The Headliner: The headliner was Kaskade, which I can only describe as dad techno. Same beat over and over again. No edge, no power, no lights making you question reality. Zedd from the last two years has much more raw strength and creativity, and was infinitely more entertaining. zzzzz
Zedd brought the energy. Zedd brought the lights. Zedd made you feel like you were part of something bigger. Kaskade is like they wanted EDM but made it safe. Predictable. Boring. The kind of music that plays in the background at a corporate event where they're trying to seem cool but don't want to offend anyone. Zedd was worth staying until midnight for. Kaskade made me check my watch and wonder if it was over yet.