Director of Technology Paul something comes out, surprise, wearing sneakers with a sport coat and jeans.
Three companies for this case study: Amazon Store, Zoox, and Prime Video.
Prime: 200 million+ users, 9 billion items delivered same day or next day, Prime Day is our Super Bowl. Prepare with 26+ service teams, 10 weeks of status reports, 45+ scaling strategies deployed.
On Prime Day, over 40% is powered by Graviton.
ElastiCache serves 1.5 quadrillion daily requests, over a trillion requests a minute.
EBS (Elastic Block Store): 20.3 trillion IO operations a day.
DynamoDB: <10 ms responses, CloudFront: a bajillion.
Outposts: 524 million commands to 7k robots, 8 million commands/hour, up to 160% vs last year.
Rufus uses a lot of the AWS stack: load balancing, VPC (Virtual Private Cloud), etc. It uses Bedrock to pre-generate questions when someone goes to a product page.
How to scale this for Prime Day:
Our core belief: everything should be amplified by AI (not replaced).
Amazon Store SVP comes up and starts talking about AI some more:
Developers spend enormous amounts of time:
AI can help with ALL of this.
Amazon Store internally created Spec Studio: an agentic system that dynamically documents a library's capabilities. Codebase > Spec Studio > makes a spec > give to Kiro > new code. The spec acts like its own SLA (Service Level Agreement).
It has gone viral in Amazon. 107% month over month adoption, 15.4k specs generated.
This can take an existing codebase and services and move them to an AI-native development approach.
4.5x improvement in developer velocity; 75%+ of Store's developer teams expected to use this approach next year.
AI is enabling Amazon to do more for its customers. Agents are substantially increasing our productivity. Developers are rapidly embracing AI native development.
Amazon is reinventing this process: intelligence layer (Bedrock AgentCore), AI-native IDE (Kiro), data foundation (Amazon QuickSight), ALL powered by AWS.
Amazon.com guy concludes with: "Thank you. Let's go build." BARF (so sick of that cliche statement).
Next up Zoox, those self driving car things. Millions of computations to make it work.
What powers a Zoox? A billion sensors. It has to evaluate everything in real time. The AI stack was built from the ground up, since it didn't exist to begin with. Models run continuously on tens of thousands of GPUs. A continuous feedback loop: data from the road improves the models and makes it safer.
S3 (Simple Storage Service) is the source of truth: source of truth for sensor data, highly scalable, reliable, multi tier and cost efficient storage.
Amazon EMR (Elastic MapReduce) and Athena too.
GPUs are the bottleneck here, and they're expensive.
They use Slurm for scheduling (priority-based scheduling, etc.). Minimize transfer costs and reduce latency by using the right region. GPU capacity: you need timely GPU access.
We use SageMaker HyperPod to train our models efficiently. It uses EFA (Elastic Fabric Adapter), optimized for high-throughput, low-latency communication, auto-recovers from failures, and provides rich GPU observability.
EC2 Capacity Blocks help us procure state-of-the-art training-optimized GPUs for a short period of time to handle spiky ML workloads. Efficiency: they give access to the latest GPU types, like the new P6 instances, allowing us to pivot to newer GPUs as they become available.
AWS is able to help us provide on demand reserved capacity for this large number of GPUs, which we need only for a short duration of time. Resourcing GPUs is a delicate balance between cost efficiency and performance, but with the elasticity of AWS, we can optimize for both.
Then some Prime Insights guy talks about NASCAR for like 30 minutes: how they stream sports and use a bunch of AWS stuff for it. Hundreds of gigabits of traffic on AWS Direct Connect.
Then a short video of Allen Iverson (the original AI) talking about how it's cool that Amazon is using AI.
BUILDING BLOCKS, we use that a lot around here. The services, the primitives that we build, are rarely used directly by end consumers. They're used by builders who build products on top of AWS.
500 trillion objects in S3 (Simple Storage Service), 200 million requests per second worldwide.
Fundamentals: security, availability, elasticity, performance, durability. These are necessary to make the tools easy to use.
We've rewritten the entire data path of S3, from start to finish, in Rust. We have to rewrite stuff all the time to match scale; 80% of it is quite innovative work.
e.g. last year we added conditional writes. These help prevent race conditions for customers writing to S3 from many places, without forcing them to add extra complexity to their systems. Roughly: "if this write would overwrite something other than what I expect, do not write it."
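A quick sketch of what conditional writes look like from boto3 (bucket, keys, and the back-off logic are made up; the If-None-Match/If-Match parameters are the documented ones):

```python
import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")

# Create-if-absent: fail instead of clobbering a concurrent writer's object.
try:
    s3.put_object(
        Bucket="my-bucket",
        Key="jobs/leader-lock.json",
        Body=b'{"owner": "worker-42"}',
        IfNoneMatch="*",  # only succeed if no object exists at this key yet
    )
except ClientError as e:
    if e.response["Error"]["Code"] == "PreconditionFailed":
        print("someone else wrote it first; back off")
    else:
        raise

# Compare-and-swap style update: only overwrite the version we last read.
head = s3.head_object(Bucket="my-bucket", Key="state/config.json")
s3.put_object(
    Bucket="my-bucket",
    Key="state/config.json",
    Body=b"new contents",
    IfMatch=head["ETag"],  # fails with 412 if the object changed underneath us
)
```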
Other innovations in last year: object size limit increase (today, to 50TB). Conditional COPY and DELETE; object rename in Express One Zone; batch operations on prefixes.
S3 is built on hard drives. JBOD (Just a Bunch Of Disks) model: there's a server sitting underneath a bunch of hard drives. We push to get more efficiency out of our data centers, so we put larger hard drives in our shells, drive the density per server higher, and optimize cost and efficiency. Put a JBOD in a rack, put that rack in a row, put those rows into a data center.
New metric: bytes per square foot. What? An area metric of storage capacity.
Failed project: BARGE. Single racks of 6PB (petabytes) of capacity.
As we moved to these densely packed servers, we got greater density and improved economics, but also availability exposure and poor flexibility. The scale was getting unwieldy. So: move to something more flexible?
We pulled in folks from EC2 (Elastic Compute Cloud) and EBS (Elastic Block Store) teams, and we completely reinvented the storage construct. We built: METAL VOLUMES.
Nitro cards in there, and striped disks, securely virtualized. So they're running on diskless EC2 servers now, not physical ones. Gives a greater degree of availability: remap the disk to an alternate server when a server fails.
Better flexibility: can scale up, etc.
Challenge: as data is deleted, you get holes. We still have to defragment the disks in the background. Read, scrub, free up space, etc.
With metal volumes, that meant moving data over the network. But we work with Nitro now: use Nitro offloads, extend the NVMe (Non-Volatile Memory Express) space and use it.
We're early here. 53EB (exabytes) of capacity on metal volumes. Metal volumes consume 10% less power than traditional storage rack designs.
S3 Express progress in 2025: price reductions (up to 85% price reductions for Express One Zone storage class). Up to 2M requests per second.
Meta puts their stuff in Amazon S3 Express One Zone: 140 Tbps sustained data transfer to S3 Express One Zone, 1M+ transactions per second, 60PB of storage.
S3 vectors: first cloud object store with native support to store and query vectors. 250k+ vector indexes created, 40B+ vectors ingested, 1B+ queries performed.
Who cares though? Why use vectors? The way we work is changing. Finding the right data, knowing where to step in, choosing the right data to answer a question: that's where vectors step in. You can't label everything yourself, but you can take your data, turn it into a vector representation (just a list of floating point numbers), put that vector in a space, then find your results. Vectors are also used in radiology, fraud detection, etc. Vectors are emerging from being a niche tool.
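To make the "list of floating point numbers" bit concrete, here's a toy nearest-neighbor search in plain numpy; in real life the embeddings would come out of a model like Titan and have hundreds of dimensions, and the 3-dim vectors and filenames here are invented:

```python
import numpy as np

# Each document is embedded as a vector; "search" = find the closest vector.
docs = {
    "invoice-2024.pdf":   np.array([0.9, 0.1, 0.0]),
    "xray-left-knee.png": np.array([0.1, 0.8, 0.2]),
    "fraud-report.txt":   np.array([0.0, 0.2, 0.9]),
}
query = np.array([0.05, 0.15, 0.95])  # embedding of "suspicious transaction"

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

ranked = sorted(docs, key=lambda name: cosine(docs[name], query), reverse=True)
print(ranked[0])  # -> fraud-report.txt, the semantically closest document
```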
New! S3 vectors GA (Generally Available). Billion scale vector store offering up to 90% lower costs for uploading, storing, and querying vectors.
Designed for 100 ms warm-query latency.
Up to 2B vectors stored & queried per index, 40x the preview capacity.
You can upload up to 1,000 vectors/second when streaming single vector updates, up to 100 search results/query. 10,000 indexes per bucket, supporting up to 20 trillion vectors.
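Rough shape of the API, as I understand it from the launch materials (the s3vectors client name is real; the exact field names here are my best reconstruction, and the bucket/index names are placeholders):

```python
import boto3

s3v = boto3.client("s3vectors")

# Ingest: vectors are keyed, carry float32 data, and can have filterable metadata.
s3v.put_vectors(
    vectorBucketName="my-vector-bucket",
    indexName="product-embeddings",
    vectors=[{
        "key": "sku-1234",
        "data": {"float32": [0.12, -0.53, 0.07]},  # real embeddings are much longer
        "metadata": {"category": "shoes"},
    }],
)

# Query: bring your own query embedding, get back the nearest keys.
resp = s3v.query_vectors(
    vectorBucketName="my-vector-bucket",
    indexName="product-embeddings",
    queryVector={"float32": [0.10, -0.50, 0.09]},
    topK=5,
    returnMetadata=True,
)
for match in resp["vectors"]:
    print(match["key"], match.get("metadata"))
```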
Treat vectors like a network of trees. Finding nearby vectors means hopping through memory a lot, and that doesn't work great when your data is entirely in S3 and you have hard-drive-style latencies: many round trips. To get elasticity, we had to approach this as an S3 problem. So we take these neighborhoods, encode them as objects in a vector bucket, and then S3 promotes them to memory, leaving that data resident long enough for you to handle queries; eventually it goes back to using the storage-based vectors.
So: traditional vector DB semantics on a throughput-oriented backend.
BMW does it too: petabyte-scale hybrid search for quality insights. Hybrid semantic and SQL search built on S3 vectors, Amazon Bedrock Titan embeddings, and Amazon Athena.
20PB of structured and unstructured data discoverable through natural language. Millions of records queried in Athena.
"We migrated our 85 petabyte data lake to S3 tables, we streamline the infrastructure and reduce costs."
NEW! Replication support for S3 tables. Replicate Iceberg tables across AWS Regions and accounts.
NEW! Intelligent tiering for S3 tables. Automatically optimize costs based on access patterns.
NEW! Supporting Apache Iceberg V3. Available with Apache Spark on EMR (Elastic MapReduce), AWS Glue, Athena, S3 tables, SageMaker notebooks, etc.
There's enormous value in your data, and opportunities to do stuff with it, but you have to curate it. You end up with metadata layers over the data you have, and 70% of the S3 customer conversations I have are with someone working on exactly this, wanting to remove the undifferentiated heavy lifting of building those metadata layers and get a metadata API structure they can trust other tools to use. That's why we launched S3 Metadata; we continue to mature it, and now it presents a full inventory view of your data.
The managed table of metadata is read-only; it's high integrity.
NEW! S3 access points for FSx. Like ONTAP.
Networking Metrics: Available from EC2 (Elastic Compute Cloud), Transit Gateway, CloudWAN, Direct Connect, PrivateLink, and other networking services.
Various log sources provide detailed network activity information:
Three relatively new monitoring tools:
AWS Network Manager Infrastructure Performance: Provides infrastructure-level performance insights.
Metrics and logs for storage services:
Tools for analyzing and visualizing network data:
Automation tools for responding to network issues:
VPC-Specific Tools:
System-Level Tools: At the system level, you can use:
CloudWatch Dashboard: uses both low-level and high-level probe tools for networking, based on your traffic. Depending on where users are connecting from, Internet Monitor can recommend regions to replicate into for better performance and lower latency, and may suggest CloudFront instead of serving straight out of us-west-1 for content delivery.
ALB telemetry can show you response codes in the logs, such as 403 errors, indicating whether WAF (Web Application Firewall) blocked something. When you see these indicators, you can move over to WAF telemetry to see what happened and why the request was blocked.
This covers the frontend. Now let's look at the backend. ELB (Elastic Load Balancer) connects to Transit Gateway. Is the gateway working?
Transit Gateway Telemetry:
Flow logs could be too much data if you just want to quickly look at whether resources can connect. Use Reachability Analyzer instead. It will evaluate security groups, network rules, and tell you if connections work or are broken, without requiring you to parse through extensive log files.
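This is driveable from boto3 too; a minimal sketch (instance IDs are placeholders, and the analysis is async, so you'd normally poll until Status leaves "running"):

```python
import boto3

ec2 = boto3.client("ec2")

# Define the path you care about: can A reach B on TCP/443?
path = ec2.create_network_insights_path(
    Source="i-0123456789abcdef0",
    Destination="i-0fedcba9876543210",
    Protocol="tcp",
    DestinationPort=443,
)["NetworkInsightsPath"]

analysis = ec2.start_network_insights_analysis(
    NetworkInsightsPathId=path["NetworkInsightsPathId"]
)["NetworkInsightsAnalysis"]

# Fetch the verdict instead of grepping flow logs.
result = ec2.describe_network_insights_analyses(
    NetworkInsightsAnalysisIds=[analysis["NetworkInsightsAnalysisId"]]
)["NetworkInsightsAnalyses"][0]
print(result["Status"], result.get("NetworkPathFound"))
```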
You can also look at Network Flow Monitor, which allows you to install agents onto your target instances and then perform analysis of the traffic. It can identify any timeouts and analyzes TCP (Transmission Control Protocol) traffic at the IP (Internet Protocol) level, providing detailed insights into network performance and connectivity issues.
Exciting time: code smarter, build faster, AI transforms how we build, rapid ideas to implementation (warning this will probably just be a long commercial for Kiro).
AI as a collaborative partner, gets smarter from every interaction, transforms how we build software.
Innovator flywheel: design, implement, testing and integration, deployment, maintenance, planning & analyses.
Traditional editors: developer drives and provides. AI editors: the developer steers the AI agent to author and review code.
Blurring SDLC (Software Development Life Cycle) phases, like planning and delivery. Now you can quickly get prototypes for someone before even any code is written. Now you take someone's designs and tell it to your friendly AI agent and let it do the work. AI-assisted iteration.
Enter Kiro! Bring structure to AI coding with specs. Automate tasks with agent hooks. Built from the ground up for working with agents. Agent hooks automate routine tasks in the background.
Property based testing: measure whether your code matches the behavior defined in your specs. Can generate hundreds of thousands of random test cases to test your code.
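Not Kiro's internals (no idea what those look like), but the idea is the same as classic property-based testing; a sketch with Python's hypothesis library, with the function and bounds invented:

```python
from hypothesis import given, strategies as st

def apply_discount(price_cents: int, percent: int) -> int:
    return price_cents - (price_cents * percent) // 100

# Instead of a handful of hand-picked cases, assert a property over
# thousands of generated inputs: a discount never goes negative or up.
@given(st.integers(min_value=0, max_value=10**9),
       st.integers(min_value=0, max_value=100))
def test_discount_never_negative_or_increasing(price, pct):
    discounted = apply_discount(price, pct)
    assert 0 <= discounted <= price

test_discount_never_negative_or_increasing()  # hypothesis drives the cases
```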
Prompt to code to deployment in your terminal: automate workflows in seconds. Analyze errors and trace bugs with precision.
Unleash custom agents: build task-specific agents optimized for your best practices through pre-defined tool permissions, context, and prompts.
Shows demo of having it write the app for you, put the stuff in S3 (Simple Storage Service), showing a screenshot of an image that needed to be cropped and the agent responding very fast to all of the requests.
Job satisfaction goes up because "they all had fun doing it."
Move from tasks to goal driven direction; scale out concurrent AI tasks to increase velocity; extend agentic AI assistance to every aspect of software delivery.
Kiro powers: empower Kiro agents with specialized expertise; on-demand specialization of agents with dynamic context; curated best practices from partners and experts; full-stack dev to deployment use cases.
New class of frontier agents: Kiro autonomous agent, AWS Security Agent, and AWS DevOps Agent. Massively scalable, multiple concurrent tasks, work independently, sometimes for hours or days without intervention.
The frontier development agent that extends your flow: works autonomously, maintains context, executes across repos.
You can tag Kiro on a GitHub task, and Kiro will respond and start working on it automatically. It'll analyze the repo, validate the plan, explore the project, map the frontend/backend, look at handlers, API (Application Programming Interface) configs, and data models, and align its approach with the way the system was built. The agent is not session based, so it will not forget.
Future vision: human + agents = 10x faster; multi-agent systems; conductor, not just player.
We will be with you through this journey, exciting future, achieve together. Start building with Kiro. 1k Kiro credits to get started, startups can apply for up to one year's worth of Kiro Pro+ tier.
AI and other advancements are changing how products are built, and how we defend them; the traditional approach is no longer good enough.
Watch out for stuff like "prompt injection" (a prompt buried inside content). AI agents are acting more autonomously, so bad guys are planning for this in their threats.
Now we have 99% faster log retrieval; we need to move faster.
We need to change where and how we're carrying out our work so we can scale.
We need to be faster adapting to change.
We need to work side by side with the business we support to stay grounded.
Don't approach security as one size fits all - your security team will own and develop tooling.
At AWS internally we create and invest in: security primitives (so don't have to reinvent), helping across the product lifecycle, and scalable ways for our team to build (we use internal tooling and external tooling).
Making encryption easier: an open-source implementation of the TLS (Transport Layer Security) protocol. We looked at OpenSSL and noticed it was 500k lines of code; how do we know that's secure? So we rewrote it, eliminated a bunch of extra features we didn't need, and took it down to 6k lines, then released it to the community in a way we knew was secure and could understand. That was in 2015. We called it s2n-tls.
Then in 2022 we did s2n-quic, providing support for the QUIC protocol, and also support for post-quantum key exchange.
Simplifying security testing: when a change is made to an API, an internal system (AI?) analyzes the changes to that API and creates a bunch of tests covering the entire landscape of that API, both edge and normal cases, so the builder doesn't have to do all that fiddly work every time.
What do we already know? Assessment preparation, initial compliance assessments. It used to be pretty manual ("vocally self-critical," an Amazon phrase).
Our internal Active Defense tools: Blackfoot (Network Address Translation at scale), MadPot (sensor system and automated response capabilities), Mithra (massive neural-network graph model that evaluates domain reputations), Sonaris (behavioral analysis of network traffic). At Amazon you think you know what scale means; then a month in, you're like, I didn't know what that word meant.
Blackfoot translates 312 trillion flows a day, MadPot finds 550M malicious activities a day, Mithra flags over 200k malicious domains a day, and Sonaris blocks almost 5 billion scans a day.
These guide not only our internal protections; we can provide additional protection to you through AWS Shield, Amazon Route 53 Resolver DNS Firewall, AWS WAF (Web Application Firewall), AWS Network Firewall, GuardDuty, and Amazon Inspector.
AI is helping us achieve our work faster, and that expands the ways we can scale to protect our customers. Gen AI is helping us do the things we were already doing better, and that's how you scale to meet the demands of this new phase we're edging into.
Denied party screening: does this transaction adhere to global sanctions requirements? Every day we answer this question over 2 billion times. 96% overall accuracy, outperforming the previous approach for 60% of our volume. Less figuring out what it can do, more leveraging it at business scale.
Deeply understanding how teams build lets us weave security expertise into the way that they already operate, scaling right alongside them.
Intentions: try harder next time, be more careful, communicate better, remember to do it, pay closer attention. Mechanisms: automate the alert, bake it into the deployment pipeline, trigger an automatic rollback, add a guardrail in the code, enforce it through policy as code. e.g. the lines on a freeway are intentions but the guardrails are mechanisms.
AWS Security Intentions: block public access (implement block public access controls for all AWS services using resource based policies), IAM (Identity and Access Management) integration (Use Daffodil library for IAM authorization in API services), MCP AuthN (all MCP servers must implement client authentication on every incoming request), MCP Logging (all MCP servers that offer write APIs must log caller information for auditing).
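As an illustration of intention-vs-mechanism (not AWS's internal tooling): the "block public access" intention becomes a mechanism once a script or pipeline enforces it. The bucket iteration below is a sketch; the APIs are standard boto3:

```python
import boto3

s3 = boto3.client("s3")

# Mechanism, not intention: enforce S3 Block Public Access on every bucket
# instead of relying on everyone remembering to be careful.
for bucket in s3.list_buckets()["Buckets"]:
    s3.put_public_access_block(
        Bucket=bucket["Name"],
        PublicAccessBlockConfiguration={
            "BlockPublicAcls": True,
            "IgnorePublicAcls": True,
            "BlockPublicPolicy": True,
            "RestrictPublicBuckets": True,
        },
    )
    print(f"locked down {bucket['Name']}")
```

Run that in the deployment pipeline (or as policy as code) and the intention can't quietly drift.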
Yesterday at the keynote we introduced frontier agents: autonomous agents that can work for a long time without requiring human intervention (in Kiro). Three kinds: Kiro Autonomous agent, AWS DevOps Agent, and AWS Security Agent (custom and managed security requirements, automated security design reviews, continuous security code reviews, on-demand penetration testing) - turning intentions into mechanisms.
So the specs it generates (like my design review) become mechanisms. Block public access: non-compliant. Authentication best practices: compliant. Trusted cryptography best practice: insufficient data. In a finding, it will tell you why something is or isn't compliant, plus remediation guidance.
Daffodil is internal to us: a formally verified IAM library that helps other teams integrate IAM correctly; it is mathematically proven correct. But the proof doesn't do them any good if they don't know about Daffodil. You can have the best things to use, but if you don't know they exist, how will you use them? A design review alone would catch this gap.
Now for you: your security intentions. Session limits (all user sessions must timeout after 30 minutes of inactivity), TLS minimums (all external communications must use TLS 1.3 or higher), third party libraries (third party libraries must be from your company's approved library list), Centralized AuthN (all authentication must use your company's centralized authentication service).
Go to AWS Security Agent today, and turn your intentions into your security mechanisms.
You'll need security to be adaptive to the changing realities both outside and inside of your organization.
Check out the blog post: Amazon disrupts watering hole campaign by Russia's APT29. The actor had compromised legit websites and injected JS (JavaScript) that redirected about 10% of visitors to actor-controlled domains (like a mock Cloudflare verification page) built around a Microsoft code authentication flow. Our analysis of the code revealed evasion techniques: using randomization to only redirect 10% of visitors, employing obfuscation to hide the malicious code, and setting cookies to avoid repeated redirects. No compromises of AWS systems happened here, but we saw this was going on, and we need to protect the security of the internet. We worked to isolate the affected instances, disrupt the operation, and share relevant info with our partners. Why care? This is just one example of the increasing evolution and scale of attacks that we will all have to figure out how to defend against together.
Another example: Amazon Inspector detects over 150k malicious packages linked to a token-farming campaign. The packages harvested credentials and used them to self-propagate. Detection is more challenging when the adversary is using legit credentials and doing normal stuff, so we collaborated with security researchers and deployed a new detection rule, paired with AI, to identify suspicious package patterns in the NPM (Node Package Manager) registry. We used indicators like circular dependencies, or the absence of a particular config file, to get a better idea of when a package was suspicious. We identified over 150k linked to one particular campaign, and worked with OpenSSF (Open Source Security Foundation) to coordinate our response. This will continue. Teams need to be adaptive.
What's happening with development right now? Builders are further from the code, so testing is critical. Gen-AI-augmented code development is another example of distance being inserted between the dev and the code running in prod. We've gone from low-level languages to IDEs (Integrated Development Environments), then test-driven development, each step increasing distance; gen AI will do that even more. This is a real opportunity for security teams. Today, if your test is a little off but a human is involved, human judgment can catch when something is going off the rails. In the future, if the tests are off and the human is not near it, things can go wrong from a security perspective. The devs we support will need the tests to be robust. This could let us automate more fixes for builders.
What are we doing to support this shift? Joint security and builder experimentation; iterate automation; adapt and scale together.
Success comes when an agent does one thing well. A collection of specific agents outperforms something more general; then you assemble them into something larger and more magical. If an agent does one thing well, it is easier to reason about securing the agent itself (on-call agent, reliability agent), and therefore you know what permissions it needs (least privilege, etc.).
Are you measuring the right things? Not the number of findings, or what kind, but: how many of them have you fixed? How many can you fix in under 3 minutes? What's your P50 time? How long has the longest tail been open, and what do we do about that? How many builder actions are involved in fixing this stuff? Come up with better ways to automate and scale your practice. You need to measure the right things to stay agile and adapt, even at scale and through fast change. Building together with your business is key to scaling successfully.
Automatically identifying security opportunities. We operate at a massive scale and we need to break our workflow down to manageable chunks.
Our work is tracked through tickets. These tickets get into a follow up queue so we can help builders when we need to, and this happens for all AWS services.
Growing increase in CVE (Common Vulnerabilities and Exposures) publications: went from 40,703 in 2024 up 22% to 49,500 in 2025. A single CVE might impact tens of millions of assets. Need to understand and act quickly.
We're training an AI based assessment engine that evaluates all the CVEs that will assess details like instance types, parameters, and help us identify false positives and evaluate risk across millions of assets.
Security tooling and the security ratchet. At Amazon we're basically an API vendor, so we need to know the security of every API. Back in the day we'd take a static approach: look at the API and what it's trying to do, and refine security from there. But now we're automating helpers that can give us a richer picture of systems' overall context and how they connect to each other.
What is your business trying to achieve? Where can the overall system improve? Deeply understanding these two points is a critical first step. Ask questions end to end and notice when things around you are changing. Cost to build, cost to secure, cost to deploy. Gen AI is changing the first and last of those. Now the middle one is finally catching up.
What needs to be different as we move into an agentic world?
Supporting agents in production: securely execute and scale agent code, remember past interactions + learning, identity and access controls for all agents and tools, agentic tool use for executing complex workflows, discover and connect with custom tools and resources, understand and audit every interaction.
Runtime > Memory > identity > gateway > code interpreter > browser tool > observability/telemetry.
That's why we built AgentCore.
Working hand in hand with the business you support is crucial. You can't defend what you can't see.
Example Compliance checklist for supply chain: SOC 2 Type II certified, annual pen testing, ISO 27001 compliant, NIST (National Institute of Standards and Technology) framework aligned, MEA, encryption at rest and transit, incident response plan, regular security training, vulnerability scanning, access controls documented > all clear!
Security can feel like a wet blanket (well kid you're gonna shoot your eye out). AI can help though….
Work backwards from the business outcomes and reduce developer friction. Security is part of what any business delivers to its customers. It's not easy, but three things that can make it simpler: embedding expertise, keeping focus on the end risk and adapting, and maintaining a culture of building with your business.
Security teams need to be builders too.
Measure the right things to stay agile and adaptable.
And yes, AI can help!
EBS Volumes: Amazon Elastic Block Store (EBS) provides scalable, high-performance block storage volumes that can be attached to Amazon EC2 (Elastic Compute Cloud) instances. EBS Snapshots offer incremental point-in-time copies of volumes, enabling efficient backups and disaster recovery.
Data Services: EBS includes elastic volumes (allowing dynamic resizing), provisioned rate for volume initialization, and time-based snapshot copy capabilities.
Different workloads require different storage characteristics:
io2 Volumes: Best for relational databases requiring very low latency, high IOPS, and medium throughput. Features include:
gp3 Volumes: General Purpose SSD volumes designed for single-digit millisecond latencies 99% of the time. In practice, they often outperform this specification. Best practice: if you don't know which volume to use, start with gp3. You can provision workloads strategically—put journaling on io2 and other data on gp3.
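What that looks like when provisioning, as a sketch (AZ, size, and the IOPS/throughput numbers are placeholders above the gp3 baselines):

```python
import boto3

ec2 = boto3.client("ec2")

vol = ec2.create_volume(
    AvailabilityZone="us-east-1a",
    Size=500,           # GiB
    VolumeType="gp3",
    Iops=6000,          # gp3 baseline is 3,000 IOPS; provision more if needed
    Throughput=500,     # MiB/s; gp3 baseline is 125
    TagSpecifications=[{
        "ResourceType": "volume",
        "Tags": [{"Key": "role", "Value": "db-data"}],
    }],
)
print(vol["VolumeId"], vol["State"])
```

Nice thing about gp3: IOPS and throughput are dialed independently of size, so you can start small and modify the volume later.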
Latency Comparison (Air Traffic Controller Analogy):
Fully Managed Fault Injection: AWS Fault Injection Service is a fully managed service for running fault injection experiments. It's easy to get started, simulates real-world conditions, and includes safeguards. It's part of the resilience lifecycle: test and evaluate.
EBS Volume Fault Injection: FIS can simulate stalled I/O and high I/O latency on EBS volumes to:
Four Pre-Configured Scenarios:
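For the stalled-I/O case specifically, a hedged sketch of an FIS experiment template (the aws:ebs:pause-volume-io action id is the documented one; the ARNs and role are placeholders):

```python
import boto3

fis = boto3.client("fis")

template = fis.create_experiment_template(
    description="Stall I/O on one db volume for 2 minutes",
    roleArn="arn:aws:iam::123456789012:role/fis-role",
    stopConditions=[{"source": "none"}],  # production use wants a real alarm here
    targets={
        "dbVolume": {
            "resourceType": "aws:ec2:ebs-volume",
            "resourceArns": ["arn:aws:ec2:us-east-1:123456789012:volume/vol-0abc"],
            "selectionMode": "ALL",
        }
    },
    actions={
        "pauseIo": {
            "actionId": "aws:ebs:pause-volume-io",
            "targets": {"Volumes": "dbVolume"},
            "parameters": {"duration": "PT2M"},  # ISO 8601 duration
        }
    },
)
fis.start_experiment(experimentTemplateId=template["experimentTemplate"]["id"])
```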
Instance: m8g.4xlarge (16 vCPU Graviton4, 64GB memory)
Storage Configuration:
Queue Architecture: EBS has multiple queues throughout the stack. When your application submits an I/O request:
Torn Write Prevention: Databases use double-write buffers and extra logging to protect against torn writes (partial writes during power failures). Typical block devices only guarantee sector-sized torn write protection. With appropriate file system configuration, Nitro EC2 instances offer torn write protection across larger I/O operations, often allowing reduction or removal of double-write buffers.
Status Checks:
EC2 Auto Recovery: Automatically mitigates issues on your behalf. If EBS fails, you can fail over to a replica or recover in a different Availability Zone (AZ).
New Metrics: volumeAvgThroughput and volumeAvgIOPS help you right-size your EC2 instance and EBS volumes.
Storage:
Instance: m8g.8xlarge (if more headroom needed, use 12xlarge)
Burst Instances: Give you extra performance when needed, but performance can drop off. You must be aware of these limits.
New Metrics - Instance Limit Status: Quickly identify performance issues related to EBS-optimized limits:
Describe Instance API: Provides comprehensive information about CPUs, network, memory metrics, and different CPU configurations (important for licensing considerations).
Updating Instances: Cannot change number of CPUs or memory while instance is running. Must stop and restart (treat like a long reboot). In Auto Scaling, just update the launch template.
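The stop-modify-start dance, sketched in boto3 (instance ID and target type are placeholders):

```python
import boto3

ec2 = boto3.client("ec2")
iid = "i-0123456789abcdef0"

# Treat it like a long reboot: stop, change the instance type, start again.
ec2.stop_instances(InstanceIds=[iid])
ec2.get_waiter("instance_stopped").wait(InstanceIds=[iid])

ec2.modify_instance_attribute(
    InstanceId=iid,
    InstanceType={"Value": "m8g.8xlarge"},
)

ec2.start_instances(InstanceIds=[iid])
```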
2008: Only Standard volumes available - 100 IOPS, clunky hard drives, no Quality of Service (QoS), instance performance was overshared.
2012: Provisioned IOPS volumes and optimized instances introduced.
2017: Nitro system ensures you always get the IOPS you need.
Now (2025): Up to 100 Gbps, 400,000 IOPS on r6in instances.
Specifications:
Before Nitro: Software-based queue stack with many queues (networking has queues, storage has queues).
Nitro Card as DMA Engine: Direct Memory Access (DMA) engine that also handles encryption. As I/O data is pulled from the system, Nitro encrypts the payload. Another DMA engine puts it on the network. Data bounces through the Nitro card quickly. On the AWS side, same process: dequeue, if it's a read, check local SSD; if it's a write, handle replication. Caching occurs (reads cached more than writes, data not persistent). Then data populates back to your instance. Nitro is efficient with hardware offloads. The network is where AWS can take liberties because they own that infrastructure.
TCP Limitations: Single path through network, requires strict ordering, usually involves the Operating System (OS).
AWS Scalable Reliable Datagram (SRD): Multi-path through network, retries in microseconds, application-level ordering, runs on Nitro hardware. AWS built SRD because TCP did more than they needed. By putting more logic in higher-level applications (like database replication), applications have more context about what needs to be transferred.
SRD Benefits:
R8gb Innovation: With R8gb instances, AWS can finally do what they've been planning: only one DMA engine pulls data from the instance, encrypts it, and sends it to the storage server. On the EBS storage server, AWS can now steer requests directly to the CPU responsible for handling your volume data. CPUs on both the instance side and the server side are optimized.
TiKV Cluster Instances: m8g.4xlarge
Cumulative Metrics: Total I/O operations, bytes, and time spent (microseconds) since volume attachment.
Performance Exceeded Metrics: Total time (microseconds) that either volume or instance performance exceeded provisioned performance since attachment.
I/O Latency Histograms: Show total number of I/O operations completed within each bin since volume attachment. Periodically poll to get ongoing statistics. Can be published to Grafana or other endpoints. Can run side-by-side with iostat and compare (they may differ, suggesting you need to tune your application). Can show demarcation points like network drop-off.
Size Increase: Immediate size availability, but must resize filesystem to utilize new space.
IOPS/Throughput Increase: during the optimizing phase for a performance increase, performance will sit somewhere between the original and new values.
Latency Impact: Latency may be impacted during optimizing phase. As optimization occurs, different blocks get sorted around.
Little's Law: L = λ × W
Concurrency: Useful measure of capacity (tells us limits) and also a measure of contention. If concurrency is high, so is contention. Air traffic controllers and runways are good examples of capacity.
Queueing in Storage: If Queue Depth (QD) = 1, latency will dictate IOPS you can achieve.
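Worked example of both points, with made-up but realistic numbers:

```python
# Little's Law: L = lambda * W  (in-flight I/Os = throughput * latency)
iops = 4000          # lambda: completed I/Os per second
latency_s = 0.0005   # W: 0.5 ms average latency
in_flight = iops * latency_s
print(in_flight)     # L = 2.0 -> queue depth ~2 sustains 4000 IOPS at 0.5 ms

# Flip it around: at QD=1, latency caps your IOPS.
qd = 1
max_iops = qd / latency_s
print(max_iops)      # 2000.0 -> you cannot exceed 2000 IOPS at QD=1 and 0.5 ms
```

So a workload that never queues more than one I/O can't saturate a volume provisioned for more, no matter what you pay for.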
Plan: Identify your Key Performance Indicators (KPIs) and select instance and volumes to fit your workload requirements.
Monitor: Use Amazon CloudWatch and other monitoring tools to track performance metrics, identify bottlenecks, and right-size your infrastructure.
Key Takeaways:
What does AI innovation mean for the cloud? We've been delivering what it needs for a while. Security, availability, elasticity, cost, agility.
Blathers on and on about why AI requires big computers, and AI is sweet and booming and bla bla bla "infrastructure is more important than ever"
We always wanted bare metal performance, but there was a "virtualization tax." So we put Jitter in the shitter and developed the AWS Nitro System. Server: customer instances and hypervisor; controller: networking, storage, management, security, and monitoring.
Nitro and Graviton made it into the textbook "Computer Architecture: A Quantitative Approach." A quote: "AWS Nitro and Graviton chips demonstrate how custom architectures, grounded in first principles and measured…" ok, slide finished.
Nitro v6, Graviton4, Trainium3, our latest hardware innovations.
Because we build both the processors and server and OS (Operating System), we can optimize across the full stack.
Traditional cooling approach: heat sink > TIM (Thermal Interface Material) > lid > TIM > silicon.
Heat transfer is straightforward physics: every layer in the thermal path slows heat movement, so more resistance leads to higher junction temperatures; higher temperatures increase leakage, and higher leakage increases power consumption. This is where inefficiency can really build up. Traditional CPUs use this design because they must support many systems and many configurations. But since we control the entire system for Graviton, we can think differently.
Graviton: heat sink > TIM > silicon. We remove the lid and a layer of TIM, which reduces resistance and lets heat move more efficiently. With precision manufacturing and carefully selected materials, our fan power drops by 33%.
Virtuous cycle of silicon development: develop silicon > expand workloads > find bottlenecks > improve design.
Core > cache. Core needs data, it checks the cache. When it's not available, must go all the way out to main memory.
Core > L1 cache > L2 cache > L3 cache > memory.
We love big caches. The more data you can keep close to the core, the fewer slow memory trips you have to take.
Graviton 4: 30% better performance.
L2 instruction misses per thousand instructions improved.
NEW! Graviton5: our most efficient CPU ever. 2x number of cores, 5.3x L3 capacity.
NEW! Amazon EC2 M9g instances: 25% higher performance compared to M8g instances, best price performance in EC2 today.
What if a developer could just hand their code to AWS and have it run?
Serverless is the absence of server management.
AI workflow: prompt > tokenization > prefill > decode > detokenization > response.
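A toy of that pipeline (nothing AWS-specific; the 4-word vocab and the fake next-token rule are obviously invented), just to show where prefill ends and decode begins:

```python
vocab = {0: "<eos>", 1: "hello", 2: "world", 3: "!"}

def tokenize(text):              # stand-in for a real tokenizer
    inv = {w: i for i, w in vocab.items()}
    return [inv[w] for w in text.split() if w in inv]

def model_step(context):         # stand-in for one forward pass
    return (context[-1] + 1) % len(vocab)  # fake "predict next token"

prompt_ids = tokenize("hello world")   # tokenization
context = list(prompt_ids)             # prefill: whole prompt in one pass
generated = []
while True:                            # decode: one token per step
    nxt = model_step(context)
    if vocab[nxt] == "<eos>":
        break
    generated.append(nxt)
    context.append(nxt)

print(" ".join(vocab[t] for t in generated))  # detokenization -> "!"
```

Prefill is one big parallel pass over the prompt; decode is the serial, token-at-a-time loop, which is why TTFT and TPS get tracked as separate numbers later on.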
Requests > latency tiering (priority, standard, flex) > fairness (customer 1, 2, 3, 4) > GPU Cluster.
Journal: Submitted > queued > dispatched > in progress > completed (or failed).
Secure AI: secure model weights, zero operator access, cryptographic attestation for models.
Back to vectors: knowledge is everywhere. Text, document, image, video, audio. You can't search vectors across different embedding models, because those abstract dimensions that we talked about likely hold entirely different concepts. So this is the challenge we set out to handle with:
Nova Multimodal Embeddings: industry's first unified embedding models. Unified understanding of data.
For example, Amazon OpenSearch is now vector driven. Don't have to choose between a traditional keyword or semantic search: hybrid search gives you the keyword based precision, and semantic search allows you to get even better results.
With S3 vectors, we reduce the cost of uploading, storing, and querying vectors by up to 90%: store billions to trillions of vectors without infrastructure setup or provisioning.
We're storing them in S3 to be economical, not in memory; so how do we reduce latency for these vectors? We use vector neighborhoods: a place where a bunch of related clusters are cached. When a user executes a query, a much smaller search finds the nearby neighborhoods, which are loaded from S3 into fast memory, where an approximate-nearest-neighbor algorithm is applied; it can be done quickly. Result: 100ms query latency, up to 2B vectors per index in production.
>250k vector indexes, 40B vectors ingested, 1B queries so far.
90% of data today is unstructured: most of that comes from video. Video is also incurably complex: one million hours = 114 years.
S3 vectors turns every S3 bucket into a potential video search engine.
Trainium 3: TTFT (Time to First Token), great performance; TPS (Tokens Per Second).
EC2 TRN3 ultra servers: up to 144 Trainium 3 chips. Comes with neuron switches. Extremely low latencies. 360 PFLOPS (FP8), 20TB HBM (High Bandwidth Memory) capacity, 700 TB/s bandwidth. 4.4x higher compute performance, 3.9x more memory bandwidth.
This is the first system using all three AWS custom designed chips in the same server board. 4x Trainium 3, 1x Graviton (on the same sled as the training, so heat management easier), and 2x Nitro. Top replaceable design.
Microarchitectural optimizations: microscaling, accelerated softmax instructions, memory addwrite, tensor dereference, traffic shaping, background transpose, MMCAST mode, mesh hashspray.
It's not just the best hardware: you need the right tools to build.
AI Hardware Kernels: extracting every bit of performance.
NEW! Neuron Kernel Interface. Direct access to the AWS Trainium NeuronCore instruction set architecture. We're also open sourcing all aspects of the NKI (Neuron Kernel Interface) stack. Nothing worse than a compiler surprising you.
Neuron profiler: the industry's leading hardware native, AI chip profiler.
TRN3 ultra servers: 5x tokens per megawatt.
New! Trainium has PyTorch native support.
287 billion Amazon EC2 instances. 1.5B instances launched every week; 38 regions globally; 1,000+ instance types.
For example: Zoox. Autonomous vehicles collect petabytes of driving intelligence data across various urban environments, processing data through advanced ML (Machine Learning) models. Uses AWS ML capacity blocks, on demand capacity reservations, Amazon SageMaker HyperPod.
OR: Amazon Leo. 3000 satellites, 17,000 mph, using Graviton4, FPGA (Field Programmable Gate Array).
Intel has been there since the beginning of AWS. We helped Mercedes: a 32TB system, 40k users, 17 different systems; we solved it with a U7inh instance: performance, reliability, security. Not a simple migration; it took 9 months.
Intel Xeon 6: custom processors, unique to AWS. Most powerful Intel processors everywhere.
What does this mean for you? AWS + Intel: 15% better price performance, 2.5x faster data processing, 20% higher performance for workloads. Like C8i, up to 60% faster NGINX performance. R8i, industry leading SAP performance. M8i, up to 30% acceleration for PostgreSQL database.
Flex family: whether compute optimized, general purpose, or memory optimized workloads, you can save an additional 5% on your compute costs. M8i flex, C8i flex, R8i flex.
NEW! X8i instances. 6 TB of memory and 1.5x memory capacity versus prior generation instances. Great for in memory databases, mission critical workloads. Also C8ine instances, based off Nitro v6. 2.5x higher packet performance, 2x higher network bandwidth. Ideal for security appliances, firewalls.
AWS + AMD: another breakthrough.
NEW! C8a instances. 33% higher memory bandwidth and 30% higher performance than prior generation instances. Up to 384GB of memory, 75 Gbps of network bandwidth, 60 Gbps of EBS (Elastic Block Store) bandwidth. Good for batch processing, distributed analytics, high performance computing, multiplayer gaming.
NEW! X8aedz instances. Next gen memory optimized instances, maximum frequency is 5 GHz. 2x higher compute performance, 31% better price performance.
NEW! HPC8a instances. Designed for compute-intensive high performance workloads. Good for computational fluid dynamics, weather forecasting, drug discovery apps.
NEW! M8azn instance. Next gen general purpose instances. 2x higher compute performance, 2x more memory bandwidth, 5x faster packet processing. Ideal for compute sensitive workloads, like gaming, financial services, high frequency trading, simulation modeling.
For e-commerce platforms using these, more concurrent shoppers during peak hours = more revenue. Gaming companies = more concurrent players, no lag means more player retention and in game purchases.
AWS + Apple: new, EC2 M3 Ultra Mac instances, and EC2 M4 Max Mac instances.
AWS + Nvidia, 15 years and counting. AWS is the best place to run Nvidia GPUs. Highest reliability, uptime, availability, scalability for GPU based systems. Like the P6e (GB200 NVL72), and P6 (B200, B300).
AL2023 (Amazon Linux 2023): 12% improvement in instance-launch-to-SSH-ready times. Streamlined random number generation services; boot process improvements. Like the c6g.medium: 9.1 seconds down to 8.05 seconds.
NEW! Amazon EBS Time-Based Snapshot Copy. You can now copy EBS snapshots within or between AWS regions and accounts with a predictable completion time. Up to 350% improvement, like adding a 1TB terminal node.
Have EventBridge let you know when the copy completes. With a guaranteed completion duration, you can address strict RPO (Recovery Point Objective) requirements and improve disaster recovery runbooks and testing cycles with confidence, knowing exactly when your snapshot will be available in the new region or account.
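Sketch of the call, made in the destination region (the snapshot ID is a placeholder, and CompletionDurationMinutes is the parameter name as I recall it from the launch docs):

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-west-2")  # destination region

copy = ec2.copy_snapshot(
    SourceRegion="us-east-1",
    SourceSnapshotId="snap-0123456789abcdef0",
    CompletionDurationMinutes=30,  # the copy must finish within 30 minutes
    Description="DR copy with a deadline, for the RPO runbook",
)
print(copy["SnapshotId"])
```

Pair it with the EventBridge completion event and the runbook step becomes fully deterministic.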
NEW! Amazon EBS provisioned rate for volume initialization. Create fully performant EBS volumes within a predictable timeframe by specifying initialization rates between 100 MiB/s and 300 MiB/s - up to 60% improvement.
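Presumably surfaced on CreateVolume when restoring from a snapshot; a hedged sketch where the VolumeInitializationRate parameter name is my assumption from the announcement and the IDs are placeholders:

```python
import boto3

ec2 = boto3.client("ec2")

vol = ec2.create_volume(
    AvailabilityZone="us-east-1a",
    SnapshotId="snap-0123456789abcdef0",
    VolumeType="gp3",
    VolumeInitializationRate=300,  # MiB/s; 100-300 per the session (assumed name)
)
print(vol["VolumeId"])
```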
NEW! EBS Status Health Check: Detect potential issues and report them in the form of a health check.
NEW! Capacity Reservation Topology API. Determine the placement of your reserved capacity before launching instances.
"There's no compression algorithm for experience." Cringey quote from Andy Jassy that reeks of r/Im14andthisisdeep.
Nitro fundamentally changed how we deliver cloud computing. Innovate faster; reduce costs; enhance security.
Graviton: 40% better price performance, 60% less energy, 20% cost savings.
Migrate your workload to Graviton! It only takes one week, one developer, one workload.
NEW! NitroTPM (Trusted Platform Module) + attestable AMI (Amazon Machine Image). Verify trusted software on EC2 instances. Only trusted software running on your EC2 instances.
TRN3 servers: 144 chips; 362 FP8 PFLOPS; 706 TB/s aggregate bandwidth.
Your power of choice in compute abstractions: serverless, HPC (High Performance Computing), containers, AI, quantum. If there's a good idea, there's at least 12 ways to do it in AWS. But realistically, there's only one way: the way you choose.
NEW! Lambda tenant isolation. Full security isolation; dramatically simpler operations; scale with confidence. Lambda will make sure you get the hard security boundary around your function, so stay in the abstraction you want to be in, all you need is a tenant ID. Isolation achieved.
NEW (but announced yesterday): Lambda durable functions. Simplify developer experience; strengthen application resilience; enhance operational efficiency. Checkpoint between steps in your process. We'll do retry logic. You focus on your problem e.g. payment processing or your manual processing or whatever. Can wait up to a year to continue execution.
NEW! Lambda Managed Instances. Choose your compute; drive cost efficiency; maintain Lambda's operational simplicity. Serverless no longer serverless? Focus on business logic.
NEW! ECS (Elastic Container Service) managed instances. Unlock the breadth of EC2 compute capabilities; improve performance and cost with bin-packing and EC2 node management; all with the same ECS operational experience. We'll now meet you in the middle.
NEW! ECS native deployments. Blue/green, canary, linear; best practices built in, safer deployments.
NEW! ECS Express Mode. Accelerate application deployment; simplify infrastructure configuration; maintain full control and flexibility. Launch everything in one call. Brought down from 300 parameters to 30. Set it and forget it.
NEW! Amazon EKS (Elastic Kubernetes Service) capabilities. Reduce friction/increase velocity; flexibility and choice; enable production ready scale. We'll include stuff like ACK (AWS Controllers for Kubernetes) controller so you don't have to manage it.
EKS ultra scale clusters: earlier we launched ultra scale for AI/ML training. Now we are moving that to new classes of generic workloads (Claude, Amazon Nova). 1.6 million AWS Trainium accelerators.
New: Checkpointless training on HyperPod. Accelerate Gen AI development, reduce training cost, scale with ease. Why do we need checkpoints anyway?
New: Elastic training on HyperPod. Accelerate time to market; simplify operations; get started easily. Additional GPUs for high priority tasks, etc.
New: SageMaker AI model customization. Accelerate end to end workflow; comprehensive customization techniques, serverless infrastructure. Generate datasets, evaluate model for success, etc.
Why can't every organization have their own frontier model?
New: Amazon Nova Forge. Build your own frontier model with the only service that gives you access to open training models. Select starting model + checkpoint; mix your data with curated datasets in SageMaker AI; deploy across SageMaker AI and Amazon Bedrock.
Werner starts out with a corny rip-off of Back to the Future, showing the evolution of coding and the continued belief that "this is the end of developers" and "no more engineers," but the 2025 theme is "GenAI for Devs" and he rewrites the title to "Back to the Beginning" instead of Back to the Future.
He starts out by saying this is his last keynote. He's not leaving Amazon, but 14 keynotes is a lot, and he doesn't need to do any more. "It's time for those younger different voices of AWS to be in front of you."
Will AI take my job? …maybe! Some skills will become obsolete, and some things will be automated. We should rephrase and instead say: will AI make me obsolete? NO! As long as you evolve.
1950s: assemblers; 1960s: compilers; 1970s: structured programming; 1980s: object-oriented programming; 1990s: monolithic; 1998: service-oriented; 2000s: on-premises; 2010s: cloud-computing; 2020s: 'traditional development'; 2025: LLM (Large Language Model)-assisted.
The work is yours, not the tool's
"We are at the epicenter of multiple simultaneous Golden Ages" - Jeff Bezos
The renaissance developer is curious.
Curiosity leads to learning (and invention).
Willingness to fail, fail, and be gently corrected.
"The scary moments are the ones we learn the most from." Yerkes-Dodson Law: increasing attention > optimal point > impaired performance. You have to put yourself in a position that tests you.
"You are not what you know but what you are willing to learn," - Walt Whitman
"Think in systems, not isolated parts."
Every system has: reinforcing loops, and balancing loops.
The renaissance developer thinks in systems: to build resilient systems, understand the bigger picture.
The renaissance developer also communicates. Human to human (natural language) = ambiguous. Human to machine (programming language) = precise.
Well, not anymore: human to machine is turning into natural language, making it ambiguous again. Specifications reduce ambiguity.
They talk about Kiro again for the thousandth time, saying it's "Spec driven development." Spec-based prompting.
Old example of spec driven development: Doug Engelbart invents the mouse.
Clear communication reduces mistakes.
The renaissance developer is an owner.
Vibe coding without human review = gambling.
(Vibe coding is fine! But only if you pay close attention to what is being built. We can't just pull a lever on your DB (database) and hope that something good comes out. That's not software development, that's gambling. And you can go elsewhere for that.)
Code: quick to generate, slower to review.
Challenge #1: verification debt (AI writes code so fast that you can't keep up with understanding and comprehending all of it).
Challenge #2: hallucination (sometimes they over engineer things or come up with solutions that are totally inappropriate).
Solutions: spec-driven development, automated reasoning, automated testing.
Durability reviews: what are all the things that might go wrong?
Human review to restore balance: the control point. Picture a seesaw with generated code on the left side and code review on the right. "Yes, we all hate code reviews. Like being a 12-year-old standing in front of the class. But in an AI-driven world they are crucial: models can generate code faster than we can understand it, so hopefully in the review the control point is used to restore balance. Bring human judgment back into the loop."
Human to human code reviews (mechanism) = knowledge transfer.
TLDR: YOU BUILD IT, YOU OWN IT
The renaissance developer is a polymath.
"Give me the 20 most important questions to ask of your data and I will design a system for you" - Jim Gray
T Shaped Understanding = highly specialized + broad understanding (depth and breadth).
Every "T" is unique: personal skills, functional skills, industry specific.
So broaden your "T".
So the renaissance developer is curious, thinks in systems, communicates, is an owner, and a polymath.
We are hoping we make it to the end without his usual corny rallying cry "Now go build." We're so close, but unfortunately he blasts a "KEEP CALM" meme (what is this, 2015?) that says "Keep calm and now go build!" What?? That doesn't even make sense!
Introducing a new flat rate plan! Deployed through CloudFront.
Pay-As-You-Go Model: Starts at $0, choose features and services, pay only for what you use. Align costs with consumption. Costs vary with usage.
You may be using services like ELB (Elastic Load Balancer), CloudWatch, WAF (Web Application Firewall), and these all have their own prices and costs.
Customer Challenges:
As your usage grows with your business, so do your costs. This is a challenge for budget planning and financial forecasting.
Typical components required:
Cost Estimation Challenges: 5+ pricing pages, non-standardized pricing dimensions, missing inputs.
Bill Predictability Issues: Unexpected traffic, traffic not controllable.
Flat Rate Pricing Plans for Delivery and Security: One monthly price includes:
Available in four tiers. Include features and usage; select tier with features you need that accommodates baseline traffic. No overages for exceeding allowances. Blocked requests and DDoS attacks never count against your allowance.
Plan Tiers:
Monthly Usage Allowances for Requests:
Monthly Data Transfer:
Use Case: Personal projects and getting started. Includes all core capabilities.
Features:
Includes all Free plan features, plus:
Includes all Pro plan features, plus:
Includes all Business plan features, plus:
Bolt onto any internet-facing app: Delivery and security layer. Extend cost savings to other AWS services.
Benefits:
Requests hit CloudFront edge locations, fewer requests reach your app, data transfer costs waived between AWS origins and CloudFront.
Flexibility: If you go through your allowance, you may get routed to a different location with some increased latency (by 20-30ms). Not blocked though. Flexibility is the goal.
Block direct to origin traffic. One flat rate covers plan and data transfer costs.
Traffic Flow: TLS termination > bad traffic is blocked > CloudFront edge locations > CloudFront regional edge cache > CloudFront VPC > private subnet > CloudFront VPC origins fleet > VPC to VPC NAT (Network Address Translation) > ALB in customer VPC, private subnet.
Security: Block direct to origin traffic that bypasses CloudFront. This is not internet-facing.
Will this show up as a recommendation in AWS Cost & Billing panel? NO. Not at this time.
If you apply it to an existing architecture, it counts just as a general S3 discount, not applied to any particular bucket.
Block direct to origin traffic. One flat rate covers plan + data transfer costs.
Traffic Flow: TLS Termination / bad traffic is blocked > CloudFront edge locations > CloudFront regional edge cache > Origin Access Control (OAC) (private S3 bucket, AWS Elemental, Lambda function URL). Only allow access through your designated CloudFront distributions.
Bad Traffic Handling: Bad traffic fails because it lacks cryptographic proof that the request originated from your designated CloudFront distribution.
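The cryptographic proof is CloudFront signing origin requests as the service principal; the bucket policy then pins reads to one distribution. A sketch with placeholder ARNs:

```python
import json
import boto3

# OAC pattern: only cloudfront.amazonaws.com may read, and only when the
# request originates from this specific distribution.
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "AllowCloudFrontServicePrincipalReadOnly",
        "Effect": "Allow",
        "Principal": {"Service": "cloudfront.amazonaws.com"},
        "Action": "s3:GetObject",
        "Resource": "arn:aws:s3:::my-origin-bucket/*",
        "Condition": {
            "StringEquals": {
                "AWS:SourceArn": "arn:aws:cloudfront::123456789012:distribution/EDFDVBD6EXAMPLE"
            }
        },
    }],
}

boto3.client("s3").put_bucket_policy(
    Bucket="my-origin-bucket",
    Policy=json.dumps(policy),
)
```

Anything hitting the bucket directly has no valid service-principal signature, so it fails exactly as described above.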
Own the entire stack: infrastructure > global network > virtual private cloud > application network. Data center is built from the ground up, we control every aspect from hardware to internet edge. Nobody else can say that! (Is he correct there though?)
AI Models: growing more complex and demanding by the day. This is going to push networks. Models have already crossed the trillion-parameter threshold. But it's not just size: they need to be trained and deployed all around the world.
AWS Networking is the backbone that makes it possible. Why does networking matter?
1M+ AI accelerators. They work together to adjust, share, talk; they communicate in nanoseconds. But models are getting massive, so we have no choice: break up the perfect partnership and scale out across hundreds of thousands of accelerators.
When we do that: latency. Server > rack > data center > region.
Hollow core fiber: 30% improvement in latency. Thousands of kilometers already installed.
So infrastructure needs to be fast, reliable, and scalable.
Concrete example: the Trainium2 Ultra Server. It's a category of compute: 64 Trainium2 chips in a single ultra server, 512 NeuronCores working together, arranged in carefully designed topologies, circular pathways that bridge chips. This directly determines what computation is possible.
Dynamic scaling is what makes those parameter counts possible. Every AI model needs chips that talk in microseconds.
This is why we built EC2 UltraClusters. They have already powered some of the largest AI training runs in the world.
Seamless connectivity > low latency > petabit scale.
Network topology > performance, resiliency.
Wired rails: ML (Machine Learning) clusters often used these, connectivity that creates mini networks. Any single link can affect your entire training run, causing work to restart from a checkpoint. Failures will happen; the conventional method is to connect all accelerators at the top of the rack, but this creates interference, and recovering from checkpoints is costly. We need tech that gives us the best of both, which is why we developed Ultra Switch: a new top-tier networking device in our data centers, homegrown at AWS. It uses accelerator index, cluster topology information, and network capacity to align AI accelerator traffic to optimal rails. When faced with failure, it quickly recovers into stability, and it keeps flow collisions to a minimum.
Ultra short reach optics: improves link reliability of the critical server-to-network connection by 3x. Copper is impractical when you need to route hundreds of thousands of thick cables through rack space, but we still want the size and speed benefits of copper, so we use ultra short reach optics, minimizing connectors; the result is reliability that surpasses copper.
But networking is not just hardware. The traditional routing protocol was never built for this; at AI scale, a few seconds of delay could stall an entire training run. So we built SIDR: Scalable Intent Driven Routing. It continuously monitors, detects failures, and adapts paths.
Exactly what large scale AI training demands: ultra switch > predictable paths; SIDR > adaptable network. Both: fast, scalable, resilient.
In just 3 years, AWS has deployed: 300k+ switches, 40M+ ports dedicated to ML traffic, growing faster than our core network traffic itself. Extraordinary demands.
Project Rainier: Anthropic is using this to build and deploy its industry leading AI model Claude. 1M+ Trainium2 chips by the end of 2025.
We handle the physical networking so you don't have to. Enforce security > scale globally.
Everything is built on VPC (Virtual Private Cloud). Meaning you get your own private section of the AWS cloud. You decide what connects to what, when, and how. Strict separation between workloads.
Amazon VPC connectivity: VPC peering, Transit Gateway, NAT Gateway, AWS Internet Gateway.
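To make one of those options concrete, a cross-region VPC peering request is a single boto3 call (both VPC IDs and the regions are placeholders; the peer account still has to accept the connection, and the CIDR ranges must not overlap):

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Request a peering connection from our VPC to a peer VPC in another
# region. The peer side must call accept_vpc_peering_connection, and
# both sides need routes added before traffic flows.
resp = ec2.create_vpc_peering_connection(
    VpcId="vpc-0aaa1111bbb22222c",       # placeholder: our VPC
    PeerVpcId="vpc-0ddd3333eee44444f",   # placeholder: their VPC
    PeerRegion="eu-west-1",
)
print("Requested:", resp["VpcPeeringConnection"]["VpcPeeringConnectionId"])
```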
"I need to pull data from the user service. I need to talk to the payment API. I need to access the recommendation engine." Code doesn't care about network topology, it thinks in business logic.
Connect securely to a database, access object storage, reach a ML model.
Amazon VPC Lattice: intent driven networking. Eliminates the complexity of routing tables and peering configurations. "This service needs to talk to that service": you just define the trust level. It discovers endpoints, enforces zero trust policies, and provides deep observability.
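A minimal sketch of what that looks like with boto3 - the vpc-lattice client and these two calls are the service surface as I understand it, but the names, the IAM auth choice, and the VPC ID are illustrative:

```python
import boto3

lattice = boto3.client("vpc-lattice", region_name="us-east-1")

# Create a service network: the trust boundary that services join.
network = lattice.create_service_network(
    name="orders-network",
    authType="AWS_IAM",  # zero trust: every request is IAM-authenticated
)

# Associate a VPC so workloads inside it can discover and reach services
# in the network - no peering, route tables, or NAT to maintain.
lattice.create_service_network_vpc_association(
    serviceNetworkIdentifier=network["id"],
    vpcIdentifier="vpc-0aaa1111bbb22222c",  # placeholder
)
```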
PrivateLink for SaaS (Software as a Service) services: multi region architecture shouldn't mean multi region complexity. Private connectivity to third party services all over, without anything touching the internet.
NEW! Cross-region PrivateLink for AWS services. Reach AWS services in other regions over private connectivity. Available now! This is layered abstraction.
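The endpoint-creation call itself is long-standing; the hedged sketch below assumes the cross-region flavor surfaces as a ServiceRegion parameter on create_vpc_endpoint (treat that parameter, plus all IDs, as assumptions):

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Interface endpoint in our local VPC pointing at a service that lives
# in another region. ServiceRegion is the assumed cross-region knob.
ec2.create_vpc_endpoint(
    VpcEndpointType="Interface",
    VpcId="vpc-0aaa1111bbb22222c",             # placeholder
    ServiceName="com.amazonaws.eu-west-1.s3",  # remote-region service
    ServiceRegion="eu-west-1",                 # assumption: see lead-in
    SubnetIds=["subnet-0123456789abcdef0"],    # placeholder
)
```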
LLM (Large Language Model) inference load depends on memory utilization, token processing complexity, the type of inference request, and model state - which makes it hard to balance with traditional request counting.
NEW! ALB (Application Load Balancer) Target Optimizer. The load balancing solution for AI inference workloads.
You install an agent on each target; the agent configures the max number of concurrent requests the ALB sends to that target and tracks the in-flight requests.
Amazon API Gateway: 140T+ requests in 2024, 40% increase YOY (Year Over Year), 400k+ unique accounts. NEW! API Gateway Portals: transforms how you discover, document, and scale APIs. Fully managed experience, always up to date documentation, detailed user analytics.
NEW! Response Streaming for API Gateway. Instant first bytes, steady delivery, and support for massive payloads. Good for LLM responses and complex analytics: no more buffering entire responses or staring at loading screens - the first paragraph is delivered immediately. Real time collaborative editing, progressive data visualization, interactive AI conversations.
NEW! Model Context Protocol (MCP) for API Gateway. Integrated with Amazon Bedrock AgentCore Gateway. Supports API keys, both public and private APIs, handles compliance, and enables MCP agentic workflows.
Security: application protection > VPC security > encrypted transit > AWS Nitro System.
Security must be effective at scale, intelligent, and adaptive.
NEW! Amazon VPC Encryption Controls. Built-in, enforceable, auditable. In a few clicks, audit the encryption status of your traffic and enforce encryption across your entire network infrastructure. Uses Nitro encryption through various AWS services. Eliminates the operational overhead and complexity of managing this yourself.
NEW! Network Firewall Proxy: easier than ever to secure your applications. Decrypt and filter internet egress traffic; detect compromised workloads; enforce tighter security controls; prevent data leaks. Domain filtering, IP filtering, TLS (Transport Layer Security) interception, response traffic filtering. AWS handles the scaling, while you get enterprise grade filtering and protection.
The more connections you secure, the larger and more complex your footprint becomes. So our focus is making it easier for you: focus on your app rather than the heavy lifting of configuring complex network setups.
Transit Gateway: Network Firewall native attachment. The firewall now integrates directly at the Transit Gateway level. Control across all VPCs and on-premises networks with no overhead, solving cost allocation problems even in large scale environments.
AWS network worldwide: 9M+ kilometers of fiber; 50% expansion YOY.
NEW! Fastnet. Dedicated high capacity transatlantic cable connecting the US to Ireland. Will be done by 2028?
38 regions, 120 AZs (Availability Zones).
For example, Sophos: 35% improvement in latency globally, and scaled from 146,000 to 2 million requests per second during a traffic surge.
AWS CloudWAN: unified network. Segment A, B, C, etc.
NEW! AWS CloudWAN Routing Policy: fine grained controls to optimize route management, control traffic patterns, and customize network behavior across global wide area networks. No more need for third parties.
150 Direct Connect locations.
Complex multi-layered networks mean higher costs and operational complexity. "Pull data from the user service." So: streamline multi-cloud connectivity. Simple, resilient, private. Radically simplify operations so you can establish private cloud-to-cloud connections in minutes.
NEW! AWS Interconnect - multi cloud. Simple, resilient, high speed private connections from your VPC to other cloud service providers. Preview with Google Cloud; Azure in 2026 (begrudgingly). Connect your VPC directly to other cloud providers.
Pre-cabled capacity, set up with ease, backed by SLA (Service Level Agreement). Stay tuned, because many other providers are in the works.
This is good for customers like Salesforce for instance, they can extend their functionality into Google Cloud, AWS native private connectivity. Access regulated data without internet exposure, unlock zero copy across multi cloud environments, maintain the same trusted security models customers rely on.
NEW! AWS Interconnect - last mile. Faster setup and onboarding; automated network configurations; launching with Lumen, so you can connect your remote locations to AWS in just a few clicks. Many more providers in the works.
750+ POPs (Points of Presence) in 100+ cities across 50+ countries, with direct peering to all major ISPs (Internet Service Providers). 1,100+ embedded POPs in ISPs across 200+ cities.
NEW (but announced a few days ago): Amazon CloudFront flat rate pricing plans. One monthly price with no overage charges. Free ($0/month): for hobbyists, learners, and developers getting started. Pro ($15/month): launch and grow small websites, blogs, and applications. Business ($200/month): protect and accelerate business applications. Premium ($1,000/month): scale and protect business and mission critical applications. Includes CloudFront, CloudWatch, WAF (Web Application Firewall), Lambda, Route 53, S3.
The EC2 instance stack, from top to bottom: Application > Instance OS > Nitro Hypervisor > Nitro System (the hardware layer, with dedicated components for storage, networking, and security) on the physical server. The Nitro System was introduced in 2017, but AWS started working on it as far back as 2012.
Before Nitro, the hypervisor was consuming too many resources. Additionally, there was performance variation in the EC2 stack due to customer demand fluctuations.
Performance: Better performance across CPU, networking, and storage. AWS has better performance since they've offloaded all functions to dedicated hardware.
Security: Enhanced security that continuously monitors, protects, and verifies the instance hardware and firmware. AWS recommends reading the white paper for detailed security information.
Innovation: Building blocks can always be assembled in many different ways, giving AWS the flexibility to design and rapidly deliver EC2 instances.
Networking - before Nitro: barely 1 Gbps of network bandwidth. Now: network-optimized Nitro instances support up to 600 Gbps, and EFA reaches up to 6,400 Gbps; AWS has been doubling EFA performance approximately every year.
Storage - before Nitro: barely 16,000 IOPS (Input/Output Operations Per Second). Now: up to 720,000 IOPS.
AWS now offers 1,000+ instance types; the most powerful AWS-designed CPU is Graviton4.
CPU Options: Intel Xeon, AMD EPYC, AWS Graviton, Apple M1/M2.
AWS delivers the right compute for each application and workload across multiple categories:
Capabilities: Fast processors up to 4.5 GHz, high memory footprint up to 32 TB, storage options (HDD and SSD), accelerated computing (GPUs, FPGA, ASIC), networking up to 6,400 Gbps, bare metal support. Many Operating Systems (OSes) supported. 1,000+ instance types for virtually every workload and business need.
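With a catalog that size, finding the right type is a query problem; a small boto3 sketch against the real DescribeInstanceTypes API (the memory threshold is arbitrary):

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Page through the catalog, keeping current-generation types with at
# least 1 TiB of memory, and show their rated network performance.
paginator = ec2.get_paginator("describe_instance_types")
for page in paginator.paginate(
    Filters=[{"Name": "current-generation", "Values": ["true"]}]
):
    for it in page["InstanceTypes"]:
        mem_gib = it["MemoryInfo"]["SizeInMiB"] / 1024
        if mem_gib >= 1024:
            print(it["InstanceType"], f"{mem_gib:.0f} GiB",
                  it["NetworkInfo"]["NetworkPerformance"])
```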
Nitro Cards: Provide VPC (Virtual Private Cloud) networking, block storage, instance storage, and system controller functionality.
Nitro Hypervisor: Lightweight and secure hypervisor, CPU, memory and device assignment, bare metal-like performance.
Nitro Security Chip: Integrated into motherboard, traps I/O to non-volatile storage, provides hardware root of trust.
Nitro cards provide the VPC implementation. VPC Data Plane Offload: ENI (Elastic Network Interface) attachment, security groups, flow logs, routing, limiters, DHCP (Dynamic Host Configuration Protocol), DNS (Domain Name System).
VPC Encryption: Authentication and transparent end-to-end 256-bit encryption for every packet delivered on VPC.
Extensible host interface with variable number of queues, multi-pathing with ENA Express.
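ENA Express (SRD under the hood) is enabled per network interface; this matches the documented EC2 attribute as I recall it, with the ENI ID as a placeholder:

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Turn on ENA Express (SRD-based multipathing) for one interface.
# Both ends of a flow need it enabled, on supported instance types.
ec2.modify_network_interface_attribute(
    NetworkInterfaceId="eni-0123456789abcdef0",  # placeholder
    EnaSrdSpecification={
        "EnaSrdEnabled": True,
        "EnaSrdUdpSpecification": {"EnaSrdUdpEnabled": True},
    },
)
```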
High Performance Computing (HPC) and AI on Amazon EC2 ultra clusters, Remote Direct Memory Access (RDMA), and GPU RDMA. Multipathing and low latency.
Standard drivers are broadly available for Linux distributions, Windows, and other Operating Systems (OSes).
EBS Data Plane: End-to-end encryption, SRD (Scalable Reliable Datagram) for better tail latency. Encryption using DMA (Direct Memory Access) so there's no performance degradation.
Traditional SSD Design: Has NAND flash, Flash Translation Layer (FTL) that maps logical to physical addresses, performs garbage collection, and manages NAND wear leveling.
Local Instance Storage with AWS Nitro SSD: AWS-designed SSDs with firmware built in-house, so the FTL is managed by the Nitro system itself, giving consistent latency (no surprise garbage-collection stalls) and fleet-wide firmware updates.
The Nitro Hypervisor kernel is smaller since storage and networking are abstracted away to dedicated hardware. AWS can replace the hypervisor on a weekly basis with no interruption to your instances.
AWS doesn't want the hypervisor to have full access to memory.
Commodity hypervisor memory address space: VM A's CPU context, VM A's memory, VM B's CPU context, VM B's memory - all mapped with full access.
Nitro Hypervisor Memory Address Space: Much smaller footprint. Hypervisor maps just enough space to perform functionality, then unmaps it again, so AWS can resume running your application. This provides better security isolation.
The Nitro Security Chip sits on all AWS motherboards. It monitors all transactions and blocks those that are not necessary; only device-related transactions are allowed. The security chip has been extended to provide more functionality over time.
Multi-socket support started with Graviton3: 3 non-coherent processors with decoupled lifecycles, initially to improve density thanks to power efficiency.
It continues with Graviton4: 2 processors, non-coherent or coherent.
With Intel: started with Intel Gen 7, now supports 2 non-coherent or coherent processors.
With AMD: starting with AMD Gen 8: 2 non-coherent or coherent processors.
AWS has a unique differentiator in terms of the Nitro System. It has allowed AWS to innovate faster on behalf of customers and deliver better performance and security versus other cloud providers. EC2 offers 1,000+ instance types across CPU options with many new features, thanks to Nitro. The Nitro System has allowed customers to take advantage of infrastructure that continually improves.
Contact Information: Email with questions to Filippo Sironi (sironi@amazon.de) or Sudden Sharma (svsharma@amazon.com).
The AI thing is real, but it's not magic. Everyone's talking about billions of agents running around doing stuff, but the reality is simpler: AI is becoming infrastructure. It's like how we used to think about databases or web servers - now it's just part of how things work. The cool part? You don't need to build your own model from scratch. Start with something that exists, tweak it with your data, and you're good. That's what Nova Forge is about - taking what works and making it yours without starting over.
Everything is getting faster and cheaper, but you gotta know what you're doing. Graviton5 chips are powerful - twice the cores, way more cache, 30% faster. But here's the thing: you can't just throw more power at a problem and expect it to solve itself. The Nitro system stuff shows AWS is thinking about the whole stack, not just the CPU. It's like having a race car where the engine, transmission, and suspension all work together instead of fighting each other. The result? Stuff that used to take forever now happens in milliseconds, and it costs less.
Security isn't something you add later - it's built in from the start. The Security Agent and DevOps Agent stuff is interesting because they're actually fixing problems before you even know they exist. It's like having a really good assistant who doesn't just tell you about problems, but goes ahead and fixes them. The VPC encryption controls mean you can enforce security across your whole network with a few clicks instead of configuring a million things. It's still security, but it's not a pain in the ass anymore.
Networking is finally catching up to what we actually need. Cross-region PrivateLink means you can connect stuff across different AWS regions privately, which sounds boring but is actually helpful. The CloudFront flat-rate pricing is smart too - instead of getting surprised by a bill because traffic spiked, you pay one price and that's it. It's like switching from a pay-per-use phone plan to unlimited. You might pay a bit more sometimes, but you never get that "oh shit" moment when the bill arrives.
Storage is becoming more like a database, and databases are becoming more like storage. S3 vectors let you search through billions of items in milliseconds, which is wild. EBS snapshots now copy predictably instead of taking forever and you never know when they'll finish. The S3 tables stuff with automatic tiering and replication means you can set it up once and it just works. It's all about making data easier to work with without having to become a storage expert.
Serverless is getting less serverless, and about damn time. Lambda durable functions mean you can do long-running stuff without worrying about timeouts. Lambda managed instances let you pick your compute but keep the operational simplicity. ECS Express Mode went from 300 parameters to 30. The pattern is clear: give people the abstraction they want, but let them peek under the hood when they need to. It's not about hiding complexity - it's about managing it better.
The real story is about agents doing the boring work. Garman said "billions of agents" and everyone's like "sure, whatever." But think about it: the DevOps Agent that fixes your Lambda errors before you even notice? The Security Agent that scans your code automatically? Kiro that remembers your project history and fixes stuff while you sleep? The goal seems to be letting developers focus on the interesting problems instead of the repetitive ones. The agents handle the routine stuff, you handle the hard stuff.
Cost optimization is about predictability. Database savings plans can save 35%, OK cool. But more importantly, you know what you're paying. The CloudFront flat-rate plans, the capacity reservations, the intelligent tiering - it's all about making costs predictable so you can actually plan instead of just hoping. Nobody wants to be surprised by a bill, especially when it's because something worked too well.
Bottom line: AWS is betting big on AI infrastructure, making everything faster and cheaper, building security in from the start, and letting agents handle the routine work. The tools are there. The infrastructure is there. The agents are there. Now it's about figuring out what to build with them, bro.
Walking through the expo floor is like being at a tech-themed flea market where everyone's trying to scan your badge. You know the drill - they get your email, you get a stress ball with their logo, and somehow that's supposed to be a fair trade. The swag tables are stacked with the usual suspects: frisbees, water bottles, those weird fidget spinner things. Like, yeah, a branded koozie is definitely going to make me want to migrate my entire infrastructure to your platform.
Rackspace being there is hilarious. They're basically the ex who shows up to your new partner's birthday party. "Oh hey, I know you're using AWS now, but we can totally help you manage it! We're cool with it, really. We're not bitter at all that you left us for the cloud. Want a sticker?" The audacity is impressive.
But god, the lightning talks. You're just trying to find the coffee stand or figure out where the bathroom is, and suddenly some dork with a microphone is yelling about Kubernetes. Like, bro, I'm just trying to get caffeinated here. I don't need a live demo of your container management solution while I'm waiting in line. It's guerrilla marketing: ambush people when they're vulnerable and distracted. And the worst part is it works, because you're stuck there listening while you wait for your latte.
The whole thing is a weird mix of desperation and confidence. Everyone's trying to convince you their solution is the one that'll change everything, while simultaneously giving away cheap plastic crap like that's going to seal the deal.
Not a single mention of CloudFormation. Not one. GOOD! That thing is a nightmare. You write a 500-line YAML file to spin up a simple app, and by the time you're done debugging why your stack won't update, you've written more CloudFormation than actual application code. It's like they designed it to be as painful as possible.
People are finally moving away from it. Terraform, Pulumi, CDK - anything that doesn't make you want to throw your laptop out the window. AWS knows this. That's why they're pushing CDK so hard, and why they're building all these managed services that don't require you to write infrastructure code at all. ECS Express Mode went from 300 parameters to 30. Lambda managed instances. EKS automation. The pattern is clear: make it so you don't need CloudFormation in the first place.
The fact that they didn't mention it once tells you everything. They're trying to make it irrelevant. Build services that configure themselves, abstract away the infrastructure, and let people focus on building stuff instead of wrestling with YAML syntax errors at 2 AM.
Aurora? Also completely absent. No new Aurora features, no performance improvements, no "look how great Aurora is" demos. Nothing. And again, this is probably for the best because Aurora is expensive as hell.
You want a database? Sure, Aurora will work great. It'll also cost you an arm and a leg. The storage is expensive, the I/O is expensive, the backups are expensive, the cross-region replication is expensive. It's like they designed it to extract maximum value from your wallet. And for what? Slightly better performance than RDS? The ability to scale reads? Cool, but at what cost?
Meanwhile, they're pushing S3 tables, Iceberg, all this data lake stuff. They're talking about S3 vectors for billion-scale searches. They're building new database services that don't have Aurora's baggage. The message is clear: if you need a traditional database, use RDS. If you need something more, use the new stuff. But maybe don't use Aurora unless you really, really need it and have money to burn.
The silence on Aurora is telling. They're not going to kill it - too many people are locked in. But they're not going to promote it either. It's the database equivalent of that old feature that still works but nobody talks about because there are better options now.
He makes many, many speaking errors throughout the speech. The keynote is full, and even every single overflow room is full, forcing a lame livestream. Maybe he's nervous?
Holy crap, he wore an actual collared shirt. No more "trying to be cool by underplaying the formality of the event by wearing a t shirt and a suit jacket with $300 sneakers," although he still had to wear jeans.
We made AWS because people wanted to build stuff but didn't have the servers and compute, so we wanted to make it easier for them to build.
"I believe that in the future we will have billions of agents"
"WHY NOT!?"
A future of billions of agents means inventing new building blocks for agentic systems.
We are good for AI infrastructure since we do not cut corners. "Turns out there are no shortcuts"
GPUs are key; we've collaborated with Nvidia for 15 years, and AWS is the most stable place to run a GPU cluster, because we sweat the details (like debugging BIOS issues to prevent reboots) instead of just accepting them.
Introducing P6e instance, powered by Nvidia bla bla bla. Crowd gives mild applause.
Nvidia runs their large scale gen AI clusters in AWS, and even OpenAI. EC2 ultra servers with hundreds of thousands of chips.
Introducing AWS AI Factory = dedicated AI infrastructure deployed in your own data centers. Like a private AWS region, using power you've already acquired.
"People give us a hard time about product naming in AWS. We named Trainium because it's a chip for AI training, but Trainium2 is the best system in the world for inference."
Introducing Project Rainier = a sprawl of data centers hosting Trainium processors and powering AI's future: multi-campus clusters, multiple buildings acting as one computer, training the next gen of Claude models.
Extremely corny dramatic video with a bunch of AI powered landscapes. Mild, requisite applause from audience.
Trainium3 ultra servers are now generally available, first 3nm AI chip, 5x more tokens per megawatt of power.
Trainium4 is around the corner: 6x the compute, 4x the memory bandwidth, 2x the memory capacity, for the largest models.
New! Nova2 = new models with frontier level intelligence. Reasoning models for workloads, three classes: lite, pro, sonic.
Build your own model from scratch? Too expensive, and it needs expertise. So instead: start with an open source model and modify it - fine tune, reinforce.
New! Nova Forge. A new service that introduces the concept of open training models. You get exclusive access to Nova checkpoints and blend your proprietary data with an Amazon training set, producing a model that deeply understands your information without forgetting its core knowledge. These are called Novellas. Upload one and run it in Bedrock.
Case study: Reddit uses Gen AI to generate content, but they couldn't get the accuracy they wanted. With Forge, though, they made their own model using pre-training. For the first time they had a model that met their cost efficiency and accuracy targets, and it was much easier to deploy. This will completely change what companies do with their AI!
He brings out the Digital CEO of Sony. "KODERA-SAN!" The guy talks for an eternity about why AI is sweet, and why Sony uses it so much to bring 'content to their fans across the world.' (Translation: to replace humans so they can cut costs.)
Finally Garman comes back out. "Agents are exciting because they can take action and get things done." Deploy agents with AgentCore.
Treat agents like teenagers who need to learn how to adult. Need guardrails.
Announcing: policy in AgentCore, providing real time deterministic controls for how your agents interact with your data.
"TRUST, BUT VERIFY" same for teens and agents
"Crush tech debt with AWS Transform."
A few months ago we released Kiro - I think this is an IDE meant to compete with Cursor.
1 year of Kiro for free for startups, apply in the next month!
Kiro now has autonomous agents. Have it fix stuff while you sleep. It remembers the history of your project so you don't have to keep re-explaining it.
New! AWS Security Agent. Build apps that are secure from the very beginning. Shifts security work upstream so your system is secured more often. Scans your code for vulnerabilities, integrates into your pull requests (seriously, he says "git requests on PullHub" instead of GitHub… he's really tripping over his words all day).
New! AWS DevOps Agent. Proactively prevents incidents, improving your reliability. Investigates incidents and identifies operational improvements. Learns from your resources, their relationships, and your existing observability solutions and CI pipelines, then correlates the telemetry across all the sources to understand what's going on.
e.g. it found elevated Lambda error rates talking to your database; by the time you got the notice, the DevOps Agent had already gone in there and fixed it.
New X8i family of memory instances, powered by custom Intel Xeon.
X8aedz up to 8TB of memory.
C8a instances, based on AMD processors with higher performance.
This goes on for an eternity, bunch of random instance classes….
New in Lambda: Lambda durable functions. They manage state, handle long-running workloads, and recover automatically.
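I haven't seen the actual SDK, so here is only a toy model of the execution idea - every name in it is hypothetical, not the real Lambda API. The point is that each step checkpoints its result, so a crash or timeout resumes after the last completed step instead of rerunning the whole function:

```python
# HYPOTHETICAL sketch of the durable-function execution model; none of
# these names come from the real AWS SDK. Each step's result is
# checkpointed, and a re-invocation replays from the checkpoints.

class DurableContext:
    def __init__(self, store):
        self.store = store  # in the real service this state is managed

    def step(self, name, fn):
        if name in self.store:        # replay: skip work already done
            return self.store[name]
        result = fn()                 # first run: execute and checkpoint
        self.store[name] = result
        return result

def handler(event, ctx):
    order = ctx.step("validate", lambda: {"id": event["order_id"]})
    charge = ctx.step("charge", lambda: {"charged": True, **order})
    ctx.step("notify", lambda: print("shipped:", charge))
    return {"status": "complete"}

# Simulating a retry: the second call with the same store replays the
# checkpoints and re-executes nothing.
state = {}
handler({"order_id": 42}, DurableContext(state))
handler({"order_id": 42}, DurableContext(state))
```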
New in S3: S3 max object size is 50TB!
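The arithmetic behind that number is tight: at S3's 10,000-part multipart cap, a 50 TB object needs parts averaging at least 5 GB, just under the 5 GiB per-part ceiling. A sketch with boto3's real transfer config (file and bucket names are placeholders):

```python
import boto3
from boto3.s3.transfer import TransferConfig

# 50 TB / 10,000 parts = 5 GB per part, so a max-size object needs
# near-maximum part sizes. Names below are placeholders.
config = TransferConfig(
    multipart_threshold=64 * 1024 * 1024,  # go multipart above 64 MB
    multipart_chunksize=5 * 1024**3,       # 5 GiB parts
    max_concurrency=16,                    # parallel part uploads
)

boto3.client("s3").upload_file(
    Filename="/data/huge-dataset.bin",
    Bucket="my-bucket",
    Key="huge-dataset.bin",
    Config=config,
)
```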
Also: S3 batch operations are 10x faster.
New! Intelligent tiering for S3 tables. Save up to 80% in costs.
New! Automatic replication for S3 tables across regions.
New! S3 vectors…more cost effective to query.
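The preview SDK exposes an s3vectors client; the operation and parameter names below are from my memory of the preview docs, so treat the whole shape as approximate:

```python
import boto3

# Approximate sketch of an S3 Vectors similarity query; names may differ
# from the shipped API. Bucket, index, and the embedding are placeholders.
vectors = boto3.client("s3vectors", region_name="us-east-1")

resp = vectors.query_vectors(
    vectorBucketName="my-vector-bucket",
    indexName="product-embeddings",
    queryVector={"float32": [0.12, -0.08, 0.33]},  # your query embedding
    topK=5,
    returnDistance=True,
)
for match in resp["vectors"]:
    print(match["key"], match.get("distance"))
```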
New! GuardDuty in ECS AND EC2, enabled for all GuardDuty customers at no additional cost.
NEW! Security Hub generally available. Trends dashboard, streamlined pricing model, etc.
New! Unified data store in CloudWatch.
Biggest new! Database savings plans. These can save you up to 35% across all usage for your database services.
Bars: Still an 11pm last call, and it's lame as hell when they replace the beer with water at all the bars at that time. Like, come on. It's a party. Let people drink. Or at least don't pretend the water is still beer. That's just insulting.
Food: The food was a joke - just the same 5 stands over and over again, with long, long lines for a small plate, and only 2 of them had meat. The vegetarians won this event. LAME! You wait 20 minutes in line for a tiny plate of whatever, and most of the options are plant-based. It's just disappointing. Where's the variety? Where's the actual food? Not one dead cow!
Activities: Activities were generic, they didn't have that cool AI lab thing from last year that would transform your face in real time. Car crushing robot, ice rink, giant slide, yawn snore. It's like they went with the most basic carnival attractions possible. Lazy.
The Music: OK, last year they had Weezer, who are definitely dad rock. This year they had Beck, who is also dad rock, but AWESOME. I walked in as he was finishing up "Loser" (from the mid 90s) and he did some killer songs after that.
The Headliner: The headliner was Kaskade, which I can only describe as dad techno. Same beat over and over again. No edge, no power, no lights making you question reality. Zedd from the last two years has much more raw strength and creativity, and was infinitely more entertaining. zzzzz
Zedd brought the energy. Zedd brought the lights. Zedd made you feel like you were part of something bigger. Kaskade is like they wanted EDM but made it safe. Predictable. Boring. The kind of music that plays in the background at a corporate event where they're trying to seem cool but don't want to offend anyone. Zedd was worth staying until midnight for. Kaskade made me check my watch and wonder if it was over yet.