Projects and Releases
Google Kubernetes Engine (GKE) supports 1.29
GKE is the first managed Kubernetes service to support Kubernetes 1.29.
Prodzilla - Synthetic Monitoring in Rust
A new monitoring tool, written in Rust.
Loco.rs
Loco is a Rust framework for web/API development aimed at single-person or small teams. The docs say it’s “strongly inspired by Rails.”
What does this have to do with Ops? Rust is increasingly used in ops and infra-related software for its benefits, and this framework seems like a good one to know if you are building internal tools that need the entire stack.
Daggerverse
The Dagger project “enables engineers to write build code (and necessary glue) in a popular language like TypeScript, Go, or Python and execute this on various CD platforms such as Jenkins, GitHub Actions, etc.” Daggerverse is a repository of Dagger modules built by the community.
OpenTofu is going GA
OpenTofu is the community-driven open source fork of Terraform.
Tools
hjacobs/kube-downscaler
Scale down your Kubernetes deployments after work hours. Similar in intent to https://github.com/rekuberate-io/sleepcycles. Not new, but interesting enough to be included.
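To make the idea concrete, here is a minimal Python sketch of the decision such a tool has to make: given the current time, should a deployment run at its normal replica count or be scaled down? This is not kube-downscaler's actual code or configuration format. The working-hours window and the names in_uptime_window and desired_replicas are assumptions made up for illustration.

```python
from datetime import datetime, time

# Hypothetical working-hours window (an assumption for this sketch,
# not kube-downscaler's real configuration format).
WORKDAYS = {0, 1, 2, 3, 4}   # Monday=0 .. Friday=4
UPTIME_START = time(8, 0)    # inclusive
UPTIME_END = time(18, 0)     # exclusive


def in_uptime_window(now: datetime) -> bool:
    """Return True if `now` is on a workday and within working hours."""
    return now.weekday() in WORKDAYS and UPTIME_START <= now.time() < UPTIME_END


def desired_replicas(now: datetime, original_replicas: int,
                     downtime_replicas: int = 0) -> int:
    """Replica count a deployment should have at `now`."""
    if in_uptime_window(now):
        return original_replicas
    return downtime_replicas
```

A real controller would also have to remember each deployment's original replica count, handle time zones, and apply the change through the Kubernetes API. The sketch only covers the scheduling decision.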
KubeStellar
Stage your Kubernetes resources without creating them, and then apply them on any target cluster on demand.
fck-nat
Run your own NAT gateway on AWS - and save money. Before using this, note the bandwidth limit caveat, and the fact that you should be comfortable running something like this on your own instead of using your cloud provider’s packaged solution.
Engineering Stories
An overview of Cloudflare's logging pipeline
An interesting account of Cloudflare’s reliable, high-volume logging pipeline.
Slack's Migration to a Cellular Architecture
From the article - “We have migrated the most critical user-facing services at Slack from a monolithic to a cell-based architecture over the last 1.5 years.” For comparison, see Roblox’s account from last year.
Slashing Data Transfer Costs in AWS by 99%
An interesting story of slashing AWS data transfer costs by using S3’s region granularity. I loved the simplicity of the solution.
Reducing our AWS bill by $100,000
Another well-written AWS cost-optimization story. It covers logs, Internet data transfer, architectural changes, and a few other areas.
SourceHut network outage post-mortem
A no-nonsense post-mortem account following a DDoS attack. Hats off to the SourceHut team for pulling through this grueling experience.
Reports
Observability trends and predictions for 2024 from Grafana employees
The interesting ones here are FinOps (stay tuned for my upcoming article), profiling in OpenTelemetry, CI/CD observability, and observability (and overall) cost optimization.
The SRE Report 2024 | Catchpoint
Some points from this report:
AI’s primary role will be to make work easier, not replace human roles, in the next 2 years. I agree with this in principle in the context of ops, given the current maturity levels, although I don’t think anybody can predict the specifics.
66% of (surveyed) organizations use multiple monitoring tools.
Individual SRE contributors spent 50% of their time (median value) on engineering work when not on call.
And my favorite - 64% of organizations think they should monitor endpoints outside their control (e.g. CDNs, SaaS services used by infra, dev, or business teams, email providers). This is a problem I have been trying to tackle in a uniform way with mixed results.
Tutorials
Creating an EKS Cluster Using CDKTF
A quick introduction to creating an EKS cluster using the CDK for Terraform. The CDK lets you write code in TypeScript, Java, Python, Go, and C# to achieve the same things you would by writing Terraform’s own HCL.
Thank you for reading. If there is something you would like to see more of, or less of, let me know in the comments or on Twitter.