AWS re:Invent for Platform Teams, GKE at 130k Nodes, and Killing Staging

AWS re:Invent for Platform Teams, GKE at 130k Nodes, and Killing Staging

Author: Teller's Tech - DevOps, SRE and Cloud Podcast December 4, 2025 Duration: 22:00

In this episode of Ship It Weekly, Brian looks at re:Invent through a platform/SRE lens and pulls out the updates that actually change how you design and run systems.

We talk about regional NAT Gateways and Route 53 Global Resolver on the networking side, ECS Express Mode and EKS Capabilities as new paved roads for app teams, S3 Vectors GA and 50 TB S3 objects for AI and data lakes, Aurora PostgreSQL dynamic data masking, CodeCommit’s return to full GA, and IAM Policy Autopilot for AI-assisted IAM policies. This was recorded mid–re:Invent, so consider it a “what matters so far” pass, not a full recap.

Outside AWS, we get into Google’s 130,000-node GKE cluster and what actually applies if you’re running normal-sized clusters, plus the “It’s time to kill staging” argument and what responsible testing in production looks like with feature flags, progressive delivery, and solid observability.

In the lightning round, we hit Zachary Loeber’s Terraform MCP server and terraform-ingest (letting AI tools speak your real Terraform modules), Runs-On’s EC2 instance rankings so you stop picking instance types by vibes, and Airbnb’s adaptive traffic management for their key-value store. We close with Nolan Lawson’s “The fate of small open source” and what it means when your platform quietly depends on one-maintainer libraries.

Links from this episode:

AWS highlights:

https://aws.amazon.com/about-aws/whats-new/2025/11/aws-nat-gateway-regional-availability

https://aws.amazon.com/blogs/aws/introducing-amazon-route-53-global-resolver-for-secure-anycast-dns-resolution-preview

https://aws.amazon.com/about-aws/whats-new/2025/11/announcing-amazon-ecs-express-mode

https://aws.amazon.com/about-aws/whats-new/2025/12/amazon-s3-vectors-generally-available/

Other topics:

https://cloud.google.com/blog/products/containers-kubernetes/how-we-built-a-130000-node-gke-cluster

https://thenewstack.io/its-time-to-kill-staging-the-case-for-testing-in-production/

https://blog.zacharyloeber.com/article/terraform-custom-module-mcp-server/

https://go.runs-on.com/instances/ranking

https://medium.com/airbnb-engineering/from-static-rate-limiting-to-adaptive-traffic-management-in-airbnbs-key-value-store-29362764e5c2

https://nolanlawson.com/2025/11/16/the-fate-of-small-open-source/


For anyone building or running modern systems, the sheer volume of news, tools, and incident reports can be overwhelming. Ship It Weekly cuts through that noise. This isn't a surface-level scan of headlines. Host Brian Teller digs into the latest significant outages, major software releases, and insightful post-mortems, focusing squarely on the practical implications for DevOps, SRE, and platform engineering work. Each episode of the podcast breaks down a couple of key stories, providing the crucial context often missing from tech news. You'll hear analysis that translates events into actionable insights, answering the "so what?" for your own infrastructure and processes. The show also includes a quick rundown of tools or updates actually worth your attention, saving you hours of browsing. The tone is direct and informed, favoring depth over breadth. It’s designed for engineers and technical leaders who need a concise, reliable filter for the week's most relevant developments. Listen to this podcast for a focused recap that prioritizes what actually matters, delivered without fluff. You get the news, plus the necessary interpretation to understand how it might affect your systems, your team, and your on-call rotation. It's a weekly briefing that respects your time while aiming to make you more effective.
Author: Language: English Episodes: 37

Ship It Weekly - DevOps, SRE, Platform and Cloud Engineering News
Podcast Episodes
Fail Small, IaC Control Planes, and Automated RCA [not-audio_url] [/not-audio_url]

Duration: 17:45
This week on Ship It Weekly, Brian kicks off the new year with one theme: automation is getting faster, and that makes blast radius and oversight matter more than ever.We start with Cloudflare’s “fail small” mindset. The…