Going off the brilliant Angie Jones’ tweet, I am going to try and spit out more blog posts, even if they are short, so that they can start interesting conversations.
blog posts are just publishing your notes on things you learn—
Angie Jones (@techgirl1908) January
I am grateful to be a part of some fun Slack groups, and in one of them a question was raised:
What is the difference between site reliability engineering and platform engineering? I think I’ve heard @abby.bangser say she’s a platform engineer. But it seems like that work is also about site reliability. I can’t articulate what it is I want to learn about because I’m not even sure of the right terms! “DevOps-y stuff” or “CD-stuff” doesn’t sound very professional.
From here a few people jumped in with their ideas, but the overwhelming response seemed to be “It all sounds super related” and I have to agree. It is fantastic that the worlds are melding so much more these days, but as someone who felt very much on the outside of it all just 15 months ago, I have a huge amount of empathy for people who want to join the fray and just aren’t sure where to start or what terms mean what.
For some context, by the time I responded it was more of a comparison between SRE and Platform Engineering. In addition, this group is heavily populated by testers or former testers, so I chose to describe my understanding using my testing industry definitions. It seemed to help, but I would be very curious where this muddies the waters and what I can do to better welcome people with testing experience into the world of operations.
So here is the exact quote of what I wrote…
IMO SRE is like the quality analyst but for systems. They are the ones who help identify/quantify, test for, and track quality metrics of the system (hence SLAs being big in that role). But just as a great QA can also be a badass bug hunter because they know the wholistic system, an SRE can be a badass triager for the same reasons. Hence often being thought of during incidents.
Versus a platform engineer. IMO these are more like the automation engineers of the testing space. They understand the values the same as QA, but instead of focusing on how to socialise and define what the concepts look like at the org level, they are the “roll up your sleeves and do it” group. Helping run the tools that make your software teams rock and roll. They will probably run a CI server, a git server, maybe a code eval tool like sonarqube, testing tools like pact etc.
I’m totally open to evolving that. That’s just what I understand right now. I am targeting being an SRE, but by joining a platform team I feel I am gaining some understanding of the underlying system and tools so that I can be more effective in that role (just as I believe people who have some software delivery experience via personal or close collaboration are most effective when testing software).
And given the first person asking the question is in very much the same position I was in where they REALLY want to get involved in exploratory testing in production via observability tooling, I added one more thing…
Sorry actually last thing. I also wanted to learn how to use o11y tools but kept getting stopped by not having them/access. Hence going to platform to get rid of any blockers. BUT given the right job/org etc I wouldn’t have felt a need. And right now I’m actually kinda itching to get to use them for real on software as being on platform limits some usecases even if I’m able to make them available to the software teams.
So this evolved even more and did touch on how reliability engineering is SO MUCH more than just software (please see Nora Jones, John Allspaw and others if you need to learn more here!), but I want to understand: is using these analogies helpful? Harmful? Confusing? A good entry point?