Every few weeks, the Reddit DevOps and SRE subs throw up the same questions:
How do I get into DevOps?
What tools should I learn to get into Ops/SRE?
Which certifications do I need to get into DevOps?
The advice in the comments ranges from following roadmap.sh, to doing and showcasing personal projects, to learning a plethora of tools. There is a strong focus on learning specific tools.
This advice is not useless but it misses the point of Ops/SRE.
If you asked “How do I become a Java developer?” and expected answers in the same vein, you would get:
Master IntelliJ IDEA or Eclipse, memorize the keymaps, learn how to use the debugger.
Learn JUnit and Spring Boot.
Learn Gradle or Maven.
Write a library management app in Java.
This would create toolkit fluency, but does it make you a Java developer? This “roadmap” is missing programming skills, and the experience of working with and learning from other people.
Similar rules apply in the Ops world. Learn all the tools you want, but nothing will prepare you for your first outage, on-call duties, or the growing and seemingly unmanageable complexity of an expanding infrastructure. I’ve learned it the hard way. Outages don’t care about your certifications. Tools just equip you for battle. Acquiring them is the first step. After that, it’s experience - learning on the job.
Ops Is Not About Learning Tools Alone
This focus on tools is prevalent in Ops for historical reasons. The original definition of DevOps was a philosophy and a way of working. DevOps started with a manifesto - like Agile - with good intentions. It increased focus on automation tools and processes that facilitated collaboration between traditionally siloed teams. A lot of great tools have come out of the DevOps movement. At some point, the term “DevOps” got hijacked the way Agile was - and now we have “DevOps engineers”, “DevOps tools”, and even “VP of DevOps”. This created a misconception that DevOps is only about automation and, by implication, that learning automation tools or getting a certificate will make you an expert in DevOps.
Learning tools is a key part, but I want to emphasize that fundamentals are more important.
My advice to folks who want to get into Ops is to:
1. Start with the foundations.
2. Learn by doing.
I wrote about what makes a good ops engineer in a previous post where I covered the non-technical attributes. Learning how to communicate and work well with other people is the top driver for a successful career.
Technical Foundations
If you want to create a roadmap for yourself, I suggest these topics as something to have a strong grounding in:
Operating Systems
Networking
Storage, I/O, and Virtualization basics
Scripting/programming
Security and Cryptography basics
When debugging a hairy problem, or architecting a complex cloud installation, I go back to the building blocks. Sometimes we get bogged down by layers of abstraction and lose sight of the fact that almost everything has the same building blocks underneath.
Take Kubernetes as an example. It is a sophisticated container orchestration system built for modern-day web-scale use cases, but the components it’s built from - cgroups, schedulers, resource managers - are already known to us from existing technologies.
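To make the point about building blocks concrete, here is a minimal sketch that shows which cgroup the current shell belongs to. It assumes a Linux machine with /proc mounted; the function name show_cgroups is made up for illustration, and the output format differs between cgroup v1 and v2.

```shell
#!/bin/sh
# Linux-only sketch. /proc/self refers to whichever process reads it;
# here that is cat, which inherits the cgroup of the calling shell.

# Print the cgroup membership of the current process,
# or a short notice if the information is not exposed.
show_cgroups() {
    if [ -r /proc/self/cgroup ]; then
        cat /proc/self/cgroup
    else
        echo "cgroup information not available (not Linux?)"
    fi
}

show_cgroups
```

Running the same function inside a container and on the host is a quick way to see that a container is an ordinary process placed in a different cgroup.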
Theory or Practice?
Experts push theory over tools - e.g. "Learn cloud concepts without getting into the specifics of AWS". In practice it's hard to learn new things without doing, and learning to use existing tools is the easiest way of "doing".
So is this a chicken and egg problem? Not if you are deliberate about choosing the tools, and the parts of the tools that you wish to learn.
Every tool has a learning curve. If you read the curl man page, the hundreds of options will easily overwhelm you. You won’t use all the options most of the time - only a few. Finding a specific task and focusing on the end goal is a good way to learn the common options.
Fetch a webpage using curl
curl https://api.example.com
Print the request/response headers
curl -v https://api.example.com
Change the request method to POST
curl -X POST -v https://api.example.com
Set the Auth header
curl -X POST -H "Authorization: Bearer xxxxx" https://api.example.com
Upload Post data from a file with the correct Content-Type
curl -X POST -H "Content-Type: application/json" -H "Authorization: Bearer xxxxx" -d "@/home/talonx/apitest.json" https://api.example.com
Don’t get overwhelmed with the landscape of available tools - like the CNCF landscape. Don’t get lost in all the features of one tool either.
Which Tools To Start With?
At the risk of leaving something out or offending proponents of X or Y software, I would say these are good starting points:
Configuration Management - Ansible. I prefer this over Puppet, Chef, etc., as it’s the most straightforward to use.
Continuous Integration and Deployment - If you’re using a managed Git service like GitHub, go for the associated tool like GitHub Actions, which will take care of a lot of the boilerplate.
Infrastructure as Code - OpenTofu/Terraform.
Observability
Prometheus/Alertmanager/Grafana stack for metrics, monitoring and alerting.
Elasticsearch/Logstash/Kibana for log management, though this stack can be hard to manage.
Linux/Unix - Know your way around the system - commands for navigation, networking, filesystem, process management, and text processing on the command line. Sorry, Windows/macOS.
Containers - There’s no escape from containers. Start with Docker because it’s easy, but don’t lose sight of the fact that there are alternatives.
A terminal-based text editor - know the shortcuts.
Security - ssh - from private key generation to setting up your config file.
Web Proxy/Load balancing - nginx remains my favorite here. It’s easy to install and configure, and has a bunch of extensions.
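As a small illustration of the command-line text processing mentioned in the Linux/Unix item above, here is a sketch that counts requests per HTTP status code. The log format (method, path, status), the sample contents, and the function name count_by_status are all made up for illustration; real access logs have more fields, so the column number would need adjusting.

```shell
#!/bin/sh
# Hypothetical sample data: a tiny access log in "method path status" form.
log_file="$(mktemp)"
cat > "$log_file" <<'EOF'
GET /index.html 200
GET /missing 404
POST /api/login 200
GET /api/users 500
GET /index.html 200
EOF

# Count requests per status code, most frequent first.
# awk picks the third column, sort groups identical values,
# uniq -c counts each group, and sort -rn orders by that count.
count_by_status() {
    awk '{ print $3 }' "$1" | sort | uniq -c | sort -rn
}

count_by_status "$log_file"
```

The same four-stage pattern (extract a field, sort, count, rank) covers a surprising number of day-to-day questions, such as the busiest client IPs or the most requested paths.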
Assess your existing knowledge first. Somebody from a dev background will already have a strong base in programming and design. Somebody from a sysadmin background will be an expert in OS, storage, networking, scripting, and monitoring. This is not a prescriptive or complete roadmap. Any roadmap is going to be subjective, just like this one is.
If you are just starting out, I would suggest trying to get into a role in an early-stage startup. The learning curve will be steep and brutal and you will get to do things you would not in bigger companies. The best combination is where you join a startup team that has a veteran Ops engineer.
Some of us laugh inwardly when we have to build the same systems at different companies - but each time it’s an opportunity to break things in a new way, improve on past implementations, and learn something new in the process.
In Conclusion
A career in Ops can be immensely satisfying or immensely stressful, or both. It depends on your expectations. The joy of building and maintaining systems, learning from failure, and having an overarching view of how things work appeals to most of us who have worked in Ops roles.
Tell me in the comments what you would want me to write about in future posts about becoming an Ops engineer.
Image Credits: choose your stories on Unsplash