Blog

OMG DevOps at Scale

The Keys to DevOps at Scale

"An organization that’s operating at scale can grow to meet greater demand without too much hassle."

When it comes to DevOps, it's important to know where organizations generally fall short, but every organization is different. We have to identify where there's waste and what inefficiencies prevent you from delivering software rapidly, consistently, and securely. In this post, we'll cover some keys to DevOps at scale so that you can make your DevOps initiative work in a big organization.

 

Set the Foundation to Create a Great Culture

People are at the heart of every DevOps initiative. Making sure they're effectively communicating is the first key to scaled DevOps.

In big organizations, people are used to delivering software in certain ways. These people aren't known for changing process and technologies often. That's because as a company grows, coordination and communication get more complicated.

A key thing then is to change how people work together to deliver software. And I'm not just talking about developers and operations. Marketing, product owners, managers, testing, and especially senior management needs to understand what DevOps means and how their work will be impacted. These people must be engaged in the DevOps journey.

You can start by doing workshops to discover waste and inefficiencies. Then you'd define the initial action items for the first sprint of many iterations that you'll need to do to increase efficiency. AWS developed a Cloud Adoption Framework (CAF) to help organizations get on board with the cloud. I happen to find CAF helpful for the things that also have to adopt DevOps.

Your team should make the effort to agree on how to better work together. They don't need to be best friends. But when they know each other and understand the needs of other teams, then they can find a balance that works for everyone.

Laying the foundation for people to collaborate is the hard part. Luckily, we're about to talk about the easiest problems to solve—technology problems.

Decouple Architecture to Deploy Frequently

When an organization is spending too much time fixing and debugging problems and they don't have time to reduce the technical debt, then applying DevOps might add more complexity.

Architecture is the next key to DevOps at scale. That's because big companies usually have a ton of interconnected systems where if you try to change something, you might break something else. Tightly coupled architectures need the coordination of many people to release software.

To support this section, I'll refer to what I initially heard from Jez Humble in a podcast about architecture in continuous delivery (CD). Testing and deployment are the main focuses of CD architecture. More specifically, it's important to ask these two questions:

  • Is it possible to do testing without requiring an integrated environment?
  • Is it possible to do releases independently of other systems or services?

Decoupled architectures give you the independence to test software without needing to install and configure other parts of the system. Instead, testing is done using mock objects because there's already a contract.

Deployments or releases can be done without having to update other applications. You're not breaking any agreed-upon contract, like response formats. It's also going to be possible to release frequently and in small batches—the two essential features of CD or DevOps.

Solidify Engineering Practices for Releases

The next key to DevOps at scale is to solidify your engineering practices.

The first time I heard about how Google runs production systems was when I read the Site Reliability Engineering (SRE) book. I loved how Ben Treynor defined SRE as "what happens when a software engineer is tasked with what used to be called operations."

A software engineer gets bored easily. Well, maybe that's true of everyone in tech. But the field that changes constantly is the one for programmers. DevOps actually started as a concept of agile infrastructure, where (predictably) some of the agile principles were applied to infrastructure.

As I said before, technology problems are the easiest ones to solve. That's because there are so many tools and practices that were born with the purpose of automating work. In the tools department, we have Jenkins, VSTS, Puppet, Chef, Ansible, Salt, and many more. But more importantly, there are practices like infrastructure as code, production-like environments, canary releases, blue/green deployments, and the most controversial one: trunk-based development.

Everything I mentioned there helps you have continuous integration (CI), which in turn allows CD. Deployments can then be a normal day-to-day operation. At that point, there's no difference when releasing a new feature or emergency bug fixes.

All changes should pass through the same process, forcing you to decrease deployment time by changing things in small batches. This will make developers happy. You'll improve the mean time to recover (MTTR), and the failure rate for deployments will be lowered, making operations folks happy. The outcomes will please both customers and business owners.

The folks heading up DevOps initiatives in big organizations need to have proper tools and practices in place.

Developers Should Be on Call

The next key to DevOps at scale is that developers have access to the code and are responsible for its performance at all times. It's their baby, and they need to keep watch over it.

AWS CTO Werner Vogels summed up the idea behind developers being on call when he said, "you build it, you run it." Operations folks usually end up calling developers when there are unknown problems they can't solve. Who better to fix a bug than those that introduced it in the first place?

That worked for AWS and Netflix, where developers can push their changes by themselves. But this would be ludicrous at some organizations, especially the big ones. Some companies are under regulations specifying that only certain people have access to production. In this case, having a developer on call is useless because they can't do anything to fix it.

So sometimes this becomes more of a mindset than anything else.

Developers could have access to a centralized logging tool or other metrics with read-only access. They can have visibility into what's happening. Then developers will tell operations how to fix it. And because they were interrupted in the middle of the night, they'll create something to fix it automatically. Developers will make changes in code to avoid that happening again.

Too many times I've seen and experienced that operation folks get tired of reporting the same issues over and over again without any response from developers. Having developers on call makes them aware of issues.

This key is related to the first one, about creating culture in the organization. When people are brought together, even if it's forced in cases like this, it makes the team more accountable for their actions. It also fosters continuous improvement, which is essential in every DevOps initiative.

Shift Change Management to the Left

DevOps is about shifting things to the left. Change management is one of those things that can be shifted—and doing so is another key to DevOps at scale.

Change management is very common in large organizations. They usually have a change advisory board (CAB) that evaluates and approves each change going to production. Most of the time, big organizations think DevOps is not for them because of change management and compliance. They can't automate that process because they're regulated.

Jez Humble has a really good talk about the topic. In it, he said he's been involved in some projects for the government, applying continuous delivery principles to highly regulated agencies.

The project cloud.gov was born after this, which is a platform that helps government agencies host projects in the cloud, applying continuous delivery principles. Cloud.gov assures
that all regulatory controls are in place, making the case for you to include automation for compliance and change management purposes.

Auditors actually love this because there's always a trail of every change in the application code that goes live. They can easily see for themselves when changes were approved, who approved them, when they were released, and who released them. This is more effective than sharing screenshots.

But it's not just about automation; it's about including those verifications and sign-offs early in the process, when we can pay attention and delay things cheaply. Pair programming or peer-based review is better than having managers reviewing the changes in the CAB meeting. They only evaluate the risk, not what the developers changed. What's better than having a peer changing the code with you?

What Are the Keys? People, Process, and Technology

You don't need containers, orchestrators, or microservices to do DevOps. Even if you adopt those new, hot technologies but don't apply everything we just talked about, you've simply wasted time and made things more complex. You can apply DevOps with the people and technology you already have. Process and culture will definitely need to change, and that's usually the hardest part. The technology part is usually the easiest.

It's also important that you increase the quality and the amount of feedback. Waste and inefficiencies will always be caught when they're monitored constantly. DevOps is about integrating people, process, and technology. And at scale, things might seem more complicated. You need to find your own way, and your journey might not be the same as others'. But DevOps also works for big organizations—I promise.

Author:  Christian Melendez 

Related Reading: Pitfalls with DevOps at scale.