DevOps Myths & Misconceptions

Common DevOps Myths and Misconceptions

“Wait, what actually is DevOps?”

If only I had a dime for every time someone asked me this. For many, the term DevOps comes loaded with misconceptions and myths. Today, we’re going to look at some of the common myths that surround the term so that you have a better understanding of what it is. Armed with this knowledge, you’ll understand why you need it and be able to explain it clearly. And you’ll be equipped to share its ideas with colleagues or your boss.

So, What Is DevOps?

Before we go through the myths of DevOps, we’ll need to define what DevOps actually is. Put simply, DevOps is the commitment to aligning both development and operations toward a common set of goals. Usually, for a DevOps organization, that goal is to have early and continuous software delivery.

The Three Ways of DevOps

DevOps is not a role. And DevOps is not a team. But why?

We’ll get to that in just a moment. But before we explain the myths, let’s build on our definition of DevOps by looking at “the three ways” of DevOps: flow, feedback, and continual learning.

  1. Flow—This is how long it takes (and how difficult it is) to get your work from code commit to deployment. Flow is your metaphorical factory assembly line for your code. And achieving flow usually means investment in automation and tooling. This often looks like lots of fast-running unit tests, a smattering of integration tests, and then finally some (but only a few!) journey tests. This test setup is what is known as the testing pyramid (there’s a small sketch of it after this list). Additionally, flow is usually facilitated by what’s known as a pipeline.
  2. Feedback—Good flow requires good feedback. To move things through our pipeline quickly, we need to know as early as possible if the work we’re doing will cause an issue. Maybe our code introduces a bug in a different part of the codebase. Or maybe the code causes a serious performance degradation. These things happen. But if they’re going to happen, we want to know about them as early as possible. Feedback is where concepts like “shift left” come from. “Shift left” is the idea that we want to move our testing to as early in the process as possible.
  3. Continual Learning—DevOps isn’t a destination. DevOps is the constant refinement of the process toward the early delivery of software. As we add more team members, productivity should go up, not down. Continual learning comes by having good production analytics in place. In practice, this could look like conducting post-mortems following an outage. Or it could look like performing process retrospectives at periodic intervals.
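
To make the testing pyramid from the “flow” item a little more concrete, here is a minimal sketch using pytest. The shopping_cart module, the Cart class, and the marker names are hypothetical stand-ins, not code from any real project.

    # An illustrative pyramid: many fast unit tests, fewer integration
    # tests, and only a handful of journey tests.
    import pytest

    from shopping_cart import Cart  # hypothetical module under test


    def test_add_item_updates_total():
        """Unit test: fast and isolated -- the wide base of the pyramid."""
        cart = Cart()
        cart.add_item(name="book", price=10.0, quantity=2)
        assert cart.total() == 20.0


    @pytest.mark.integration  # register custom markers in pytest.ini to avoid warnings
    def test_cart_persists_between_sessions(tmp_path):
        """Integration test: touches a real collaborator (a file-backed store)."""
        cart = Cart(storage_path=tmp_path / "cart.json")
        cart.add_item(name="book", price=10.0, quantity=1)
        assert Cart.load(tmp_path / "cart.json").total() == 10.0


    @pytest.mark.journey
    def test_checkout_journey():
        """Journey test: one full user flow, end to end -- keep only a few of these."""
        cart = Cart()
        cart.add_item(name="book", price=10.0, quantity=1)
        assert cart.checkout(payment_token="tok_test").paid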

The three ways are abstract, I’ll concede that. But it’s the process of converting these abstract ideas into concrete practices and tools that has created confusion en masse throughout the industry.

So, without further ado, let’s do some myth busting!

Myth 1: DevOps Is a Role

As we covered in the introduction, DevOps is the commitment to collaboration between development and operations. Based on this definition, it’s fundamentally impossible for DevOps to be a role. We can champion DevOps and we can even teach DevOps practices, but we can’t be DevOps.

Simply hiring people into a position called “DevOps” doesn’t ensure we practice DevOps. Given the wrong organizational constraints, setup, and working practices, your newly hired “DevOps” person will quickly start to look like a traditional operations team member whose goals conflict with development’s. A wolf in sheep’s clothing! DevOps is something you do, not something you are.

DevOps is not a role.

Myth 2: DevOps Is Tooling

For me, this is easily the most frustrating myth.

If you’ve ever opened up the AWS console, you know what it feels like to be overwhelmed by tooling. I’ve worked on cloud software for years, and I still find myself thinking, “Why are there 400 AWS services? What do all of these mean?” If tooling often overwhelms me, it’s definitely hard for non-technical people.

Why do I find this myth so frustrating? Well, not only is describing DevOps through tooling incorrect; it’s also the fastest way to put a non-technical stakeholder to sleep. And if we care at all about implementing DevOps ideas into our work, we desperately need to be able to communicate with these non-technical people on their terms and in their language. Defining DevOps by cryptic-sounding tooling creates barriers for our communication.

Tools are what we use to implement DevOps. We have infrastructure-as-code tools that help us spin up new virtual machines in the cloud, and we have testing tools to check the speed of our apps. The list goes on. Ever heard the phrase “all the gear and no idea”? Defining DevOps by tooling is to do precisely this. Owning lots of hammers doesn’t make you a DIY expert—fixing lots of things makes you a DIY expert! DevOps companies use tooling, but…

DevOps is not tooling.

Myth 3: DevOps Doesn’t Work in Regulated Industries

DevOps comes with a lot of scary, often implausible-sounding practices. When I tell people that I much prefer trunk-based development to branching models, they usually recoil in disgust. “You do what?” they exclaim, acting as if I just popped them square in the jaw. “Everyone pushes changes to master every day? Are you crazy?” they say.

No, I’m definitely not. The proof is in the pudding. When you have a solid testing and deployment pipeline that catches defects well, having every developer commit to the same branch every single day makes a lot of sense. Don’t believe me? Google does it with thousands of engineers.

Many believe that these more radical approaches don’t work in a regulated environment or in scaled environments, like finance. But the evidence is abundantly clear. Applications that are built with agility in mind (meaning it’s easy and fast to make changes) are less risky than their infrequently delivered counterparts.

Yes, it might feel safer to have security checkpoints and to have someone rifle through 100,000 lines of code written over six months. But security checkpoints are little more than theater. They make us feel safe without really making things that much safer. What does reduce security risk is automating your testing process, making small changes, putting them in production frequently, and applying liberal monitoring and observability.

DevOps works in every environment.

Myth 4: DevOps Replaces Ops

Implementing DevOps doesn’t mean you need to go fire your system admins and operations staff. On the contrary, you need their knowledge. Knowing absolutely everything about development and operations is almost impossible, so you’ll need people who have different specialties and interests.

Rather than fire our operations teams, we need to make sure their goals are aligned with the development teams’ goals. Everyone should simultaneously be driving toward faster delivery of high-quality software. A good waiter has tasted the food on the menu, but not every waiter needs to be a chef.

DevOps doesn’t mean removing Ops.

Wrapping Things Up

So, there you have it. The top four myths about DevOps—busted. Hopefully, this clears things up a little and you now know what DevOps is and isn’t. It’s principally a set of beliefs and practices first, with tooling, roles, and teams being secondary.

Every company can and should incorporate ideas of DevOps into their business. It will lead to happier engineers and happier customers.

This post was written by Lou Bichard. Lou is a JavaScript full stack engineer with a passion for culture, approach, and delivery. He believes the best products emerge from high performing teams and practices. Lou is a fan and advocate of old-school lean and systems thinking, XP, continuous delivery, and DevOps.

 

The Cat and the Map

Why Map Your IT Environments?

“Would you tell me, please, which way I ought to go from here?”
“That depends a good deal on where you want to get to,” said the Cat.
“I don’t much care where,” said Alice.
“Then it doesn’t matter which way you go,” said the Cat.
“So long as I get somewhere,” Alice added as an explanation.
“Oh, you’re sure to do that,” said the Cat, “if you only walk long enough.”
— Lewis Carroll, Alice’s Adventures in Wonderland

Preamble

Running a high-functioning IT team or tech company requires you to be clear in your mind about where you want to take your team. If you’re not clear about that, then just like Alice in the quote above, it doesn’t matter which way you go—or, in the context of the increasingly complex tech ecosystem, it doesn’t matter which methodology or tools you adopt. You end up implementing this technology or that methodology halfheartedly, which leads you to switch to yet another technology or methodology, and the cycle repeats. This creates a form of techno-methodology whiplash for your team. Is that what you want for your team? I hope not.

Know Your Destination, Know Your Landscape

What the Cheshire Cat didn’t point out is that for most of us dealing with complex situations, knowing the destination isn’t enough. We need to know the landscape to plot our way to success. In this article, I will cover the top four reasons why you need to properly map your IT and test environments to help your team perform at a high level.

One View to See It All

When you map your IT and test environments, you essentially establish the landscape of the situation. A good map lets you bring together the various priorities and interests of your team and organization in a single view. The benefits of doing so can’t be overstated. Miller’s law states that the average human mind can hold only about seven things at any one time. Without a map of the entire landscape, how could you possibly navigate your team around the risks of deployment, development, and the day-to-day running of the IT and test environments?

In addition, you can build a map that contains multiple levels. Imagine that at the organization overview you map out the various key structures, such as business, ops, IT environment, and test environment. Then you can drill in further by adding in the substructures, such as system instances, applications, data, and infrastructure. All these structures and substructures will interact among themselves, which is why you need to add in the relationships among these structures, the projects, and the teams in your organization.

Now imagine you have this map right now. Wouldn’t that make it a lot easier to think about your decisions and weigh your options? You can almost literally trace how a possible solution would impact which system and which team—so before you even encounter objections, you can anticipate them. That’s the power of a single view of your landscape captured in a map.
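
To show what tracing impact on a map can look like in practice, here is a minimal sketch that treats the environment map as a small dependency graph. The systems, teams, and relationships are made up purely for illustration.

    from collections import deque

    # "X depends on Y" edges: a change to Y potentially impacts X.
    DEPENDS_ON = {
        "billing-app": ["product-catalog", "customer-db"],
        "web-frontend": ["billing-app", "product-catalog"],
        "reporting": ["customer-db"],
    }

    OWNED_BY = {
        "billing-app": "payments team",
        "web-frontend": "web team",
        "reporting": "finance ops",
        "product-catalog": "catalog team",
        "customer-db": "data platform team",
    }


    def impacted_by(changed_system):
        """Walk the map to find every system that directly or indirectly
        depends on the system being changed."""
        impacted, queue = set(), deque([changed_system])
        while queue:
            current = queue.popleft()
            for system, dependencies in DEPENDS_ON.items():
                if current in dependencies and system not in impacted:
                    impacted.add(system)
                    queue.append(system)
        return sorted(impacted)


    for system in impacted_by("product-catalog"):
        print(f"{system} (owned by {OWNED_BY[system]}) needs a heads-up")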

Spotting Existing Gaps and New Opportunities

When you have a map, it almost immediately shows you some low-hanging fruit to pick. Existing gaps and opportunities to improve your operations show themselves easily. This low-hanging fruit can give you and your organization some quick wins.

Some typical quick wins would be:

  1. Identify waste and save costs. For example, you may identify system instances being maintained but not used.
  2. Identify underutilized resources and consolidate them. This happens quite frequently as well. For example, you have a bunch of system instances that constantly have low utilization. You can decide to consolidate them to bring about a better return on your expenditure on these resources (see the sketch after this list).
  3. Identify undersized systems or applications and reallocate buffer resources. Once you reduce waste and consolidate underutilized resources, you can redeploy some of what you’ve freed up to the undersized systems. Typically, people complain that these undersized systems are constantly stretched and that no spare resources can be found in the budget. In other words, you can reallocate your resources better simply by having this map.
  4. Identify the high-growth areas and enable them to grow faster. With a map, you can see which systems or applications are growing quickly because they are driven by fast-growing demand. When you can link these high-growth areas to how they help the organization, you will be able to convince management that adding more budget makes business sense. Or you can redeploy resources from other structures facing slowing growth. In either case, a map bolsters the strength of your decision.
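
Here is a rough sketch of the first two quick wins above: scanning a mapped inventory for instances that look unused or consistently underutilized. The inventory data and thresholds are illustrative assumptions, not real figures.

    instances = [
        {"name": "qa-legacy-01", "monthly_cost": 400, "avg_cpu_pct": 0, "last_login_days": 190},
        {"name": "perf-db-02", "monthly_cost": 900, "avg_cpu_pct": 6, "last_login_days": 3},
        {"name": "uat-web-01", "monthly_cost": 250, "avg_cpu_pct": 55, "last_login_days": 1},
    ]

    UNUSED_AFTER_DAYS = 90      # nobody has touched it for a quarter
    UNDERUTILIZED_CPU_PCT = 10  # consistently close to idle

    unused = [i for i in instances if i["last_login_days"] > UNUSED_AFTER_DAYS]
    underutilized = [
        i for i in instances
        if i["avg_cpu_pct"] < UNDERUTILIZED_CPU_PCT and i not in unused
    ]

    print("Candidates to retire:", [i["name"] for i in unused])
    print("Candidates to consolidate:", [i["name"] for i in underutilized])
    print("Potential monthly saving:", sum(i["monthly_cost"] for i in unused))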

Streamline and Simplify Processes

Everyone has a story about dealing with silly, bureaucratic processes. Still, as organizations grow, more processes are needed for things to run smoothly, and running your IT and test environments successfully means having good processes in place. Think value stream mapping. The key is to know when these processes become less effective or even outright unnecessary, and then to retire or remodel them. Spot these increasingly ineffective processes early and nip them in the bud.

So, study the stats from your troubleshooting and logs and add those to your map. Talk to your various teams from business and customer support. Add anecdotes in as well. In a single view, you would be able to allow both data and personal stories to drive your decision on how to simplify running your IT and test environments. Streamlining and pruning away processes that used to be (but are no longer) necessary would release more resources back to your budget. This kick-starts a virtuous cycle as freed-up resources can then be redeployed for growing opportunities.

Better Impact Analysis and Scenario Planning

Once you take advantage of the single view to quickly exploit new opportunities, uncover waste, improve resource utilization through reallocation, and streamline processes, you have established credibility for mapping. Imagine earning all that success without even using the methodological or technological fad of the day.

Now it’s time for the exciting stuff—planning the future. Once again, the map will help greatly. You can plan several scenarios and strategies in a playbook and then check them against the map. The check would involve some kind of impact analysis. Scenario planning is widely used by some of the top-performing organizations in the world, and having a map of your IT and test environments improves the effectiveness and efficiency of the exercise. No more guessing about the potential impact of brainstormed strategies for future scenarios; you can immediately check and verify the obvious drawbacks and benefits. Scenario planning gets better because impact analysis gets better with a map of your environments.

Conclusion

In enterprise IT intelligence, “environment mapping” is a highly beneficial and foundational exercise that all IT teams and tech companies should perform at least once a quarter or so. It provides high visibility into the many interrelated structures and their relations in your organization. It is not easy to discern these structures and relations without the map. The increase in visibility delivers great benefits. Agility, smooth delivery, greater collaboration, and good operational and business decision-making all flow from greater visibility of the landscape surrounding your team and organization. Buy-in becomes simpler when everybody can be on the same page—and when everybody is looking at the same map as well.

Mapping your environments is key to your organization’s success. Bear in mind that maps are imperfect, but they are still very useful. Mapping helps you and your team become better at your jobs simply because you did the exercise of mapping. The exercise surfaces differences in thinking between the members of your team. Therefore, don’t wait until you come up with the perfect map. Your team automatically becomes better with more practice mapping. Your team and your organization will thank you when they start to see the uptick in results.

Author: TJ Simmons

This post was written by TJ Simmons. Kim Sia writes under the nom de plume T.J. Simmons. He started his own developer firm five years ago, building solutions for professionals in telecoms and the finance industry who were overwhelmed by too many Excel spreadsheets. He’s now proficient with the automation of document generation and data extraction from varied sources.

Just Enough ITSM

Just Enough ITSM (or ITSM for Non-Production)

Preamble

We've all experienced the frustration that comes from too much or too little service management in our test environments. Lately, the DevOps engineer in me has been thinking about how we end up in one of those states. How can we get just enough service management in non-production environments?

Production environments require more service than non-prod environments. But we shouldn't throw the baby out with the bathwater when it comes to service management in non-prod. I'm a software developer who practices DevOps, so I do a lot of work involving operations, deployment, and automation. I interface with many groups to achieve a good workflow within the organization.

Operations and development often have contradictory goals. Fortunately, we can all find common ground by working together. Understanding each other's needs and goals through communication is the key to success!

 

But before we get into that, let's explore the world of IT service management (ITSM) for a bit. In this post, I'll discuss different levels of service management in non-prod environments and borrow some fundamental DevOps principles that can help you get the right amount of ITSM. Let's start with an overview of non-production environments.

What Are Non-Production Environments?

We use non-production environments for development, testing, and demonstrations. It's best to keep them as independent as possible to avoid any crosstalk. We wouldn't want issues in one environment to affect any of the others.

These environments' users are often internal—for the most part, we're talking about developers, testers, and stakeholders. It's safe to assume that anyone in the company is a potential user. It's also safe to assume that anyone providing a service to the company might have access to non-production environments. But there could also be external users accessing these environments, perhaps for testing purposes.

Unless you have the environment in question tightly controlled, you may not know who those users are. That's a big problem. It's important to understand who's using which environments in case someone inadvertently has access to unauthorized information. Or maybe you just need to know who needs to stay informed about changes or outages in a specific environment.

That's where service management comes in. The next section explains how bad things can be when there is no service management in non-production. This exercise should be fun...or it might make you queasy. Better have a seat and buckle up just in case!

When You Have Zero Service Management in Non-Prod

Let's call this the state of anarchy. Here's what it looks like:

  • Servers are running haywire and no one knows it.
  • Patches are missing.
  • Security holes abound!
  • The network is barely serviceable.

Can anyone even use this environment? How did it get like this, anyway? I have a couple of theories...

  1. Evolutionary Chaos: This model was chaos from the start. Someone set up an environment for testing an app a long time ago. It did its job and was later repurposed. Then, it got repurposed again. And again. Eventually, it started to grow hair. Then an arm sprouted out of its back. Then it grew an extra leg. Suddenly, it began to "self-organize." Now it seems to have a mind of its own. It grew out of chaos!
  2. Entropic Chaos: Entropy is always at play. It takes work to keep it from causing decay. In this theory, things were great in the past. But over time, service management became less and less of a priority for this environment. Entropy won the day, and the situation degraded into chaos.

However the environment got into its current chaotic state, the outcomes are the same. Issues are resolved slowly (if at all). Time is wasted digging up information or piecing it together. Data becomes lost, corrupted, and insecure. Owning chaos is a burden and a huge risk in many respects. We don't want to end up here!

If you've made it this far and still have your lunch in tow, you're past the worst of it. You can uncover your eyes, but be wary! Next, we're going to look at a completely locked-down environment and how it can go wrong in other ways.

When You Have Too Much ITSM in Non-Prod

It's better to have too much service management than not enough. But it's still not ideal. For one thing, it's wasteful. For another, it causes morale to suffer. Granted, it's reasonable to default to production-level service management at first. But staying on that default is a symptom of a bigger problem—communications breakdown. The root cause of having too much ITSM lies partly in human nature and partly in organizational legacy.

Here are my two theories on how organizations end up here:

  1. Single-Moded Process: Service delivery, operations, and all other departments focused on service management are hell-bent on making sure the customer is absolutely satisfied with their service. Going the extra mile to make sure the customer is happy is a good thing! Operations folks are trained on production-level service management, so their priority is to keep the trains running. With this in mind, operations management systems are set up for production environments. It's easiest to use that same default everywhere. For better or worse, every environment is treated like a production environment!
  2. Fractured Organization: Organizations are sub-divided into functional groups. When these groups aren't aligned to a shared purpose, they'll align to their own purposes. They even end up competing with each other. They'll center up on their own aims, tossing aside the needs of others.

How You Know When There's a Problem

The fractured organization theory may explain what happened to a friend of mine recently. Let's call him Fabian.

Fabian was the on-call engineer this past June. The overnight support team woke him up several nights in a row for irrelevant issues in the development environment. He brought this up to operations, who were responsible for managing the alert system. Unfortunately, the ops engineer was not sympathetic to his concerns in the slightest. Instead, the ops guy put the onus on Fabian to tell him what the alert system should do. That's understandable, but Fabian had no information to that end. The ops guy wouldn't share anything with Fabian or collaborate with him on putting a plan together.

This story illustrates a misalignment between operations and development. Problems like this crop up all over the place. Usually, we can remedy or even avoid these situations by taking just a bit more time to understand the other side.

The four theories I've presented tell us about extremes. And yes, these extremes push the boundaries and aren't likely to occur. Still, an organization sitting somewhere in the middle may not have the right service management in non-production. As we've seen with Fabian's story, this is often an issue of misaligned goals.

So how do we get to just enough service management? Maybe the answers lie in what's working so well for DevOps! Let's see how.

Just Enough Service Management

IT teams have members with specialties suited to their functional area. Operations folks keep the wheels turning. QA makes sure the applications behave as promised. There are several other specialties—networking, security, and development are just a few examples. Ideally, all of these teams interact and work together toward a well-functioning IT department. But it doesn't just happen. It takes some key ingredients.

Leadership

Working together effectively takes good leadership. Leadership happens at all levels in an organization. Remember, a leader is a person, not a role.

Shared Vision

It's also critical to have a shared vision and shared goals. Creating a shared vision is part of being a leader. Here are a few points to remember about vision:

  • A shared vision creates alignment.
  • The vision should be exciting to everyone.
  • You have to do some selling to get everyone aligned with the vision.

Your vision for the test environment could be something like: "Our test environment will be a well-oiled machine." Use metaphors like "Smooth Operators" or "Pit Crew" to convey the right modes of thinking.

Open Communications

Keep communications open and honest. Open, honest communications can be one of the most significant challenges you'll face in implementing the right amount of service management. Many of us have a hard time being honest for fear of looking weak in the eyes of others. That fear is difficult to overcome, especially in an environment where we don't feel safe and secure. Managers have the vital task of creating an environment where employees feel safe and able to communicate openly. Trust is essential to success.

One Last Look

Getting the wrong amount of service management in any environment is a problem. Too little opens up all kinds of risks. Too much ITSM results in wasted time and resources. In this post, I presented four theories for how an organization might end up with the wrong amount of service management in non-prod and discussed what changes you can make to correct that.

ITSM doesn't happen in a bubble. It takes alignment between many stakeholders. There are three main things we can do to get alignment: wear your leader hat, share the vision, and converse honestly. You can accomplish any goal when you're set up to win—even with something as challenging as achieving just enough service management.

 

Author: Phil Vuollet

This post was written by Phil Vuollet. Phil uses software to automate processes to improve efficiency and repeatability.

The EMMi

The 8 Dimensions of the EMMI (Environment Management Maturity Index)

If you're interested in IT and test environment management, then you have probably heard of the Environment Management Maturity Index (EMMi), the de facto standard for measuring one's test environment management capability.

If not, then let me summarize: the EMMi is a maturity index that provides you with a standard frame of reference to help you assess your strengths, weaknesses, opportunities, and threats.

It's a powerful tool for assessing your environment and operational capability across your enterprise, and it helps you quickly identify opportunities to improve.

As shown in the diagram, the EMMI does this by scoring you on eight key performance areas (KPAs). Today, I've decided to dive deeper into each of those key performance areas so that you can make a well-informed assessment.

The EMMi

KPA 1: Environment Knowledge Management

First up is environment knowledge management. This refers to your ability to understand how your projects move through all your environments, including development, testing, staging, demoing, and production.

However, this is about more than just one software team. This is about understanding how your systems are connected in each environment across multiple software systems and business units. You will likely need a few models of both low-level relationships and higher-level connections of your systems to gain a strong understanding.

When you know how your software systems are connected as they move through environments, you can avoid many problems. You reduce the risk of disruption when a team needs to release to a new environment. For example, if your billing system is dependent upon your product catalog and the product team releases a new version to QA, you may suddenly see network timeouts when you call the service. That timeout is probably due to a performance bug. If you understand how these systems are connected in QA and if you know the process well, you'll avoid hours or days of triaging, trying to figure out why your tests are intermittently failing.

KPA 2: Environment Demand Awareness

Next up we have environment demand awareness. This is not about how much load is on your environments. It's about why you have those environments. Ideally, you should know who's using them and why. Some environments may have obvious uses, like development. However, other uses may be surprising.

Take QA, for example. I was once on an engagement where we developers thought it was our job to test out new features before we released to production. So we kept changing the setup to suit our needs. Eventually, a flock of business analysts came our way, yelling and waving their arms for us to stop. It turned out that many of our customers used QA to test out significant pieces of data before staging them into production, and we were deleting their hard work. Knowing who's using your environments and why will prevent these kinds of things from happening.

When you know who's placing demands on your environments, you can also plan better. You may know of a new group of users coming in the pipeline. Or perhaps your environment is taking a hit from many users at once. If you realize you have two different sets of users in that environment, you can split that environment. You can even tailor each environment depending on those users' needs.

KPA 3: Environment Planning & Coordination

Once you know who's connected to your environments as well as who's making demands of them, you can plan for their needs and yours. It's key to be able to consistently plan and roll out environmental changes to meet upcoming milestones across your enterprise.

Imagine if one of the product team members decided to load test their catalog system and generated five million fake products in their QA environment. This ripples through to your QA, and none of the purchasing testers can actually do any work. This in turn clogs up their deployments and delays your ability to launch. We can avoid these types of problems with good planning and coordination.

It's also important that your planning and coordination is consistent across teams. When you have a consistent process, all the teams will know when to share knowledge and when to synchronize efforts.

KPA 4: Environment IT Service Management

It's not enough to deliver and manage your environments. Since you have users who demand these environments, we need to put on our customer service hat and support their ongoing use. We should diligently manage incidents, changes, and releases to ensure our users are getting what they need. If we neglect the ongoing support and operations of our users in these environments, the growing pile of incidents and user demands may threaten to overwhelm us.

When we spin up a new environment, we need to ensure the appropriate teams own it end to end. They need to have the necessary tooling and operational support to maintain this environment for its entire lifetime. This means well-understood communication on incident resolution and criticality. And it means well-understood processes to manage changing environmental needs.

KPA 5: Application Release Operations

Alright, this one gets a little tricky. It's healthy to have consistent and repeatable processes across your enterprise for releasing applications. But it's an easy risk to read this and interpret it as "standardize your deploys." I want to be clear: application releases and deploys are not the same thing.

Your deploys are all about getting packaged source code to the right place. But application release is about exposing new functionality to customers. At the lowest maturity, this happens only during deployment. But with mature teams, we can use tooling and processes to separate the idea of deploying code from activating it for customers.

This means you want to ensure your software teams are equipped to continually deliver code to production and to do it in as automated a fashion as possible. Once your teams are doing this, we can shift our focus to how to activate—or release—this code to our customers. There are many tools to help you make this change. It's this process that you want to standardize across your organization. That way, customers know what to expect, and they'll understand how to check if new features have arrived.
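
As a minimal sketch of separating deployment from release, here is what a feature toggle can look like. The flag store below is just an in-memory dict and the function names are made up; real setups usually use a config service or a feature-flag product, but the idea is the same: the code ships in the deploy, and customers see it only when the flag is switched on.

    FEATURE_FLAGS = {
        "new-checkout-flow": {"enabled": False, "allowed_users": {"beta-tester-1"}},
    }


    def is_enabled(flag_name, user_id):
        """The new path is deployed but stays dark until released via the flag."""
        flag = FEATURE_FLAGS.get(flag_name, {})
        return bool(flag.get("enabled")) or user_id in flag.get("allowed_users", set())


    def checkout(user_id):
        if is_enabled("new-checkout-flow", user_id):
            return new_checkout(user_id)   # shipped in the deploy, released later
        return legacy_checkout(user_id)


    def new_checkout(user_id):
        return f"new checkout for {user_id}"


    def legacy_checkout(user_id):
        return f"legacy checkout for {user_id}"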

KPA 6: Data Release & Privacy Operations

Let's talk about another key performance area: data release.

Data release across your environments is just as important as application release. But it's often neglected. Each application team ideally owns its own data, but teams need to be explicit about how they manage it across their environments.

Time for another story. I knew a team that was quickly delivering high-value financial software, but they depended on a few backend services. Some of these services had a data refresh that occurred once a quarter or so. However, the owners of those services didn't make this known to the team, and the team had set up their QA environment with a test bed of data to give them a speedy turnaround time on user stories. The data refresh hit them like a punch in the gut. It killed their velocity for weeks.

It's healthy to avoid such problems in your enterprise. We want to ensure data release processes are well known and consistent across teams. It's also a good idea to automate as much as possible to ensure this consistency stays intact, letting our teams work on more valuable efforts.

KPA 7: Infrastructure & Cloud Release Operations

In the same vein as data releases, infrastructure releases have an indirect but profound impact on your teams' applications. How you handle your infrastructure has a ripple effect across multiple applications. If managed well, you can provide a cushion of protection for software systems to run and fail in isolation. If mismanaged, it can bring down a whole ecosystem of applications.

One would think I'd be out of stories by now, but I have another: I was on an engagement at a Fortune 10 company that, as far as I know, is still mismanaging their infrastructure releases. They built an in-house cloud platform from the ground up, but they didn't consider their environmental demand, nor did they create an automated and repeatable system. They instead created a system that requires every application team on it to move every few months. And every move brings with it different problems. They provide no tooling to automate this move. At one point, they would consistently lose a data center every week for three weeks straight. Not only was the platform unstable but it also actively hampered application teams from delivering because they were too busy migrating their infrastructure.

There are many tools to help us manage this effectively. We can take advantage of external cloud platforms. We can practice infrastructure as code principles. Also, we can use configuration management tools to ensure our environments are consistent and we can always go back to a fresh state.
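
To illustrate the infrastructure-as-code principle mentioned above, here is a rough sketch of the underlying idea: describe the desired state of an environment as data, then converge toward it idempotently so you can always rebuild from a fresh state. Real teams would use a dedicated tool for this; the environment names and roles below are assumptions for illustration.

    DESIRED_STATE = {
        "qa": {"web": 2, "worker": 1},
        "staging": {"web": 2, "worker": 2},
    }


    def converge(environment, current_state):
        """Bring an environment to its declared state; running it twice is a no-op."""
        desired = DESIRED_STATE[environment]
        for role, wanted in desired.items():
            have = current_state.get(role, 0)
            if have < wanted:
                print(f"{environment}: creating {wanted - have} {role} instance(s)")
            elif have > wanted:
                print(f"{environment}: removing {have - wanted} {role} instance(s)")
            current_state[role] = wanted
        return current_state


    state = converge("qa", {"web": 1})  # creates one web and one worker instance
    state = converge("qa", state)       # second run changes nothing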

Think of your infrastructure releases as a bed frame, and you want your software teams to feel like they are lying on a comfortable mattress, not a bed of rocks.

KPA 8: Status Accounting & Reporting

Complex systems are quickly becoming table stakes in the world of IT. This complexity makes it harder and more valuable to stay on top of your system health and behavior. Yet the faster you can make decisions about your systems and react to problems, the more competitive you will be.

Throughout your teams, you want to ensure you have ways of understanding team health. That way, you can support troubled areas. You want to monitor system health so that you can triage and fix defects before your customers even know. And you want to get real-time data on your system behavior so that you can react faster than your competitors and get new features out quickly.

This is connected with the infrastructure release key performance area, as you want to equip your software teams with standard tooling to accomplish all of this. The more consistent your tooling, the more you can aggregate data and see behavior across multiple systems.

Multi-Dimensional Success

Getting a handle on these key performance areas across your organization is a potentially tough but worthy endeavor. Mismanaging any of these will cause pain, but handling them well will create a cohesive, value-focused set of teams.

Ready to take the next step? If you're feeling confident about your environments or you're just curious, go ahead and calculate your environmental maturity. The results will give you insight into what area most needs your attention.

Author: Mark Henke

Mark has spent over 10 years architecting systems that talk to other systems, doing DevOps before it was cool, and matching software to its business function. Every developer is a leader of something on their team, and he wants to help them see that.

Test Environment Management and DevOps

Why DevOps Needs Test Environment Management

Testing is an essential component of software development. Modern software developers live by the mantra, "If it isn't tested, it isn't done." There's a lot of focus on unit and acceptance testing, but organizations often slip when it comes to practical system testing. That is, many projects fail to put all the parts together and check their interactions.

Sometimes the reason for this is simple: we don't have a good testing environment.

In this post, I will discuss why this is and detail some of the side effects of missing out on that good environment. I'm going to start by creating some context, then I'll talk about fundamental practices that can be applied to the general problem, and I'll close out with a discussion on testing environments.


Why Don't We Have Good Testing Environments?

Organizations face a number of competing factors when it comes to software development and deployment. There are the first and obvious issues of building the right thing and having a stable solution that users like. Within our organizations, we are always working to balance the cost of development and the cost of operations. We find that cost minimization is hard to achieve when we have these two goals.

Further, the creation and maintenance of each environment is a complicated and time-consuming activity. Coordinating multiple environments has a multiplicative effect on cost. Employees also suffer from fatigue and distraction. Being consistent and thorough becomes more and more difficult as complexity increases. Each of these factors leads to increased cost through waste and rework.

So that's the problem. How do we fix it?

Tradition vs. the New Way

Traditionally we establish one or more test environments. Often our test environment is a smaller version of production. It may not contain the same volume of data. There may not be the same amount of network traffic. The servers might not have the same number of cores or amount of RAM.

This is not the most effective way of testing the system as a whole—we all know that. But there are always reasons that we do it.

The new way of doing things is to create a production-grade environment on demand and run the tests. That is, we automate the infrastructure to the degree that we can create an entire environment with a simple button click and execute our test suite.

We should be building our test environments with the new way in mind as our ideal. That said, there are still many issues that we need to deal with in order to achieve this goal. The following are the heavy hitters on our issue management list.

What Do We Need for Better Testing Environments?

First, it's essential that you carefully lay out what tools you'll need and how they will be used. Having a solid foundation to start with will be helpful later on, so don't skimp on the thinking here. Identify the capabilities you are looking for in setting up an environment. Then ensure that you have the tooling in place to support them. It's much easier to build these things into the system from the start than to retrofit them later.

Having said that, like with all things, plan only for what you know you need. Don't be overly speculative or overly ambitious. Focus on what you know to be true about the end-state and work to make that a reality.

There is an ever-growing list of tools available to help with every aspect of managing your environment. First off, cloud providers universally provide working APIs for every aspect of configuration, allocation, deployment, and provisioning. On top of those APIs, there are often whole SDKs and CLIs to make using them even easier. Beyond that, there are third-party tools that make the use of those SDKs almost transparent.

As we consider how to create a good test environment, there are a number of considerations that we need to keep in mind.

What Makes for a Good Testing Environment?

The problem you might encounter is that there are so many tools you can't keep track of them. Further, not all of the tools and components may be entirely in your control. The difficulty here is balancing a lean solution against the vast array of available tools for managing that solution. Finding a tool that is light and easy to apply is a first-order knowledge problem: how do you make a decision about an ever-changing environment over which you have little control without arriving at a possibly irreconcilable conflict?

This conundrum can be resolved with a light touch. If you can create a lean solution that satisfies your platform requirements, you have a basis for discovering cost savings without sacrificing capabilities.

This is where Test Environment Management (TEM) comes into your plan. At least in part, TEM can help you wrangle all these components and manage their use and deployment.

For good TEM, you'll need the following components.

  1. Testability

Modern software is tested software. Over the last 20 years, we have changed the way we make software by adding a tremendous focus on testability. While the debate rages on about what the most effective means of testing is, one thing we can count on is: there will be tests.

Building a systems infrastructure that supports testability is absolutely necessary for a modern delivery pipeline. So when we think about the capabilities we will need, it is somewhat of a foregone conclusion that we will be able to test the infrastructure before deployment.

So our test environment itself must be testable. Validation of the environment itself is a critical feature of our solution.

  2. Configuration Management

If we're going to use automated releases, we have to have good configuration management. Because we will make all our environments essentially the same way, this should be a straightforward process of identifying the configuration and codifying that into our build process. When we have done this, all environments going forward will be consistent.
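
As a small sketch of what codifying the configuration can look like, here is one common pattern: shared defaults plus per-environment overrides, resolved at build or deploy time so every environment is assembled the same way. The file names and keys are illustrative assumptions.

    import json
    import os


    def load_config(environment):
        """Merge shared defaults with per-environment overrides
        (config/qa.json, config/prod.json, ...)."""
        with open("config/defaults.json") as f:
            config = json.load(f)
        with open(f"config/{environment}.json") as f:
            config.update(json.load(f))
        return config


    # The pipeline decides which environment it is targeting.
    config = load_config(os.environ.get("DEPLOY_ENV", "qa"))
    print(config["database_url"])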

  3. Release Management

Just as we would with a production release, our test environment is going to need release management. We need to know what features and fixes are contained in a release so we understand what we should be testing. This requires us to integrate our change management system, source control tools, and release process.

  4. Networks

Network configuration is a concern we must also address. Each deployed environment needs to function with a minimum amount of customization. That is, just because the deployment is to a test environment doesn't mean we should have to reconfigure every service. Virtual networks, Kubernetes, Docker Compose, and other tooling can minimize these customizations.

  5. Load and Volume Testing

One thing that can be difficult to emulate in our test environment is message volume. In order to test load and performance, we will need some means of creating a transaction volume similar to production. For many web applications, this isn't overly complicated, but for an IoT solution with hundreds of thousands of devices, this can be a daunting task. Careful consideration of these needs is required in planning a test environment. There will be a lot of heavy lifting.
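
Here is a very small sketch of generating transaction volume against a test environment. Real load tests usually rely on a dedicated tool such as JMeter, Gatling, or Locust; this only shows the shape of the problem. The endpoint and request counts are placeholders.

    import time
    from concurrent.futures import ThreadPoolExecutor
    from urllib.request import urlopen

    TARGET_URL = "https://qa.example.internal/api/orders"  # placeholder endpoint
    CONCURRENT_CLIENTS = 20
    REQUESTS_PER_CLIENT = 50


    def client(client_id):
        """One simulated client hitting the endpoint in a loop."""
        started = time.time()
        for _ in range(REQUESTS_PER_CLIENT):
            with urlopen(TARGET_URL, timeout=10) as response:
                response.read()
        return time.time() - started


    with ThreadPoolExecutor(max_workers=CONCURRENT_CLIENTS) as pool:
        durations = list(pool.map(client, range(CONCURRENT_CLIENTS)))

    print(f"slowest simulated client finished in {max(durations):.1f}s")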

  6. Incorporation of Databases

Similar to message volume, test data is often a challenge. When planning a testing environment, we need to accommodate not only the database configuration, but also the volume of data in order to ensure that we have a proper simulation of the real world.

One approach is to develop a data loader that simulates real data. This loader is executed between the environment creation and test execution steps. Of course, this can be a challenging task for large systems. An alternative is to make copies of production systems. There are financial and privacy regulations we need to observe when we copy production data, and data masking can be as challenging as simulated data loading.

For greenfield development, getting ahead of these issues will save you a lot of pain and suffering. In the brownfield, developing a careful plan will help you immensely; organic growth in this area yields results, but often with the consequence of interrupted or delayed deployments as issues arise and data is backfilled into the process.
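
Here is a rough sketch of both approaches just described: a loader that seeds a test database with simulated records, and a masking step for data copied from production. The table layout and masking rules are illustrative only, using SQLite to keep the example self-contained.

    import random
    import sqlite3

    FIRST_NAMES = ["Ada", "Grace", "Alan", "Edsger"]


    def seed_simulated_customers(conn, count):
        """Simulated-data approach: run between environment creation and test execution."""
        conn.execute("CREATE TABLE IF NOT EXISTS customers (id INTEGER, name TEXT, email TEXT)")
        for i in range(count):
            name = random.choice(FIRST_NAMES)
            conn.execute(
                "INSERT INTO customers VALUES (?, ?, ?)",
                (i, name, f"{name.lower()}.{i}@example.test"),
            )
        conn.commit()


    def mask_production_copy(conn):
        """Production-copy approach: strip personally identifiable data before testing."""
        conn.execute("UPDATE customers SET name = 'customer-' || id, email = id || '@masked.test'")
        conn.commit()


    db = sqlite3.connect(":memory:")
    seed_simulated_customers(db, count=1000)
    mask_production_copy(db)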

  7. Production's Security Settings

A final issue to be considered is security. In order to get a realistic test of our system, we need to include all of the security settings our production environment has. This includes establishing users with different roles, server certificates, network restrictions, and all of the other settings and configurations we have in production. Because we have automated the deployment process, this shouldn't be difficult to do, but it does increase the number of things we need to keep track of.

What Are the Risks of Not Having a Good Testing Environment?

I've described a complex system of testing environments and automation in very abstract terms. I'll add to those generalizations an important takeaway: if you don't have a consistent, reliable, and fast testing environment, you are at great risk for failure.

I don't mean your project will fail. I mean you are at risk that any particular deployment won't go well. If you don't manage your test environment well, it's easy to get wrapped up in the test cycle with systems that won't deploy or tests that cannot execute. You might even release bugs because your test environment is tolerant of things that production won't allow.

It is essential that your organization puts effort into the creation, growth, and maintenance of an automated testing environment in order to maximize the effectiveness of your development efforts.

So, Why TEM?

All of the above are necessary components of a modern software delivery pipeline. As organizations move toward continuous deployment, the need for automation grows, and more tooling is necessary to enable that automation. Test environments specifically need additional management in order for things to run smoothly and cost-effectively.

If you want to get into more detail, there are a number of articles and posts elsewhere on the general topic of Test Environment Management (TEM). If you are looking to dig deeper into the topic, I can suggest this article that describes the Use Case for TEM and this one discussing the cost of an inefficient test environment.

Failure to create these test environments puts the organization at risk and can be very costly. In order to create test environments with any reasonable amount of consistency, you must manage them. Therefore, test environment management needs to be a required component of your delivery process.

Author: Rich Dammkoehler

This post was written by  Rich Dammkoehler. Rich has been practicing software development for over 20 years. In the past decade, he has been a Swiss Army Knife of all things agile and a master of agile fu. Always willing to try new things, he’s worked in the manufacturing, telecommunications, insurance and banking industries. In his spare time, Rich enjoys spending time with his family in central Illinois and long-distance motorcycle riding.

How to Achieve Continuous Delivery

How to Achieve Continuous Delivery

So you've decided to get on the continuous delivery train. Congratulations! You're about to turn your deployments from anxiety-inducing to yawn-inducing, which is a luxury non-CD shops will never know. Will it be easy in the short term? Well, let's put it this way: there will need to be some changes to how you approach things. And you'll have to convince your colleagues to change the way they approach things, too. But there are steps you can take to make this transition easier for all. That's what we'll talk about today—your path to successfully doing continuous delivery.


What CD Even Is, Though

First, let me give you a quick summary of the subject at hand. Continuous delivery means packaging every significant code change and pushing it through an automated pipeline of steps until it reaches production. Commonly, these steps act as gatekeepers to the Great Beyond of your prod environment. They're your portcullises and moats. The job of these gatekeepers is to ensure your code is truly ready to enter the wild. Common steps include running automated unit tests, acceptance tests, and smoke tests. A continuous delivery pipeline also includes promoting your package to higher environments and smoke testing them. This is all done so we can eventually make deployments so uneventful that they're boring, reducing risk and saving a boatload of cash and frustration.
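
To picture the pipeline of gatekeepers, here is a toy sketch: each stage has to pass before the package moves on toward production. The stage names and commands are placeholders for whatever your own pipeline runs; a real setup would live in your CI/CD server rather than in a script like this.

    import subprocess
    import sys

    PIPELINE = [
        ("unit tests", ["pytest", "tests/unit"]),
        ("acceptance tests", ["pytest", "tests/acceptance"]),
        ("deploy to staging", ["./deploy.sh", "staging"]),
        ("smoke test staging", ["./smoke_test.sh", "staging"]),
        ("deploy to production", ["./deploy.sh", "production"]),
    ]

    for stage_name, command in PIPELINE:
        print(f"==> {stage_name}")
        if subprocess.run(command).returncode != 0:
            # A gatekeeper failed: stop the package here, well before production.
            sys.exit(f"Pipeline stopped at '{stage_name}'")

    print("Package promoted all the way to production.")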

How Do We Get There?

If you've read the book Continuous Delivery by Jez Humble and David Farley or if you've perused a few blogs, you may feel overwhelmed at first. Continuous delivery can be a lot to take in. But fret not. We will eat this elephant one step at a time, making it a bearable and possibly fun process.

The principles we'll follow, straight out of Humble and Farley's Continuous Delivery, are to document our steps, continue with those steps even if—especially if—they become painful, and then automate them away. This requires a large dose of tenacity. You have to be willing to stick with it, possibly for an extended period of time. In my experience, this tenacity almost always pays off, often faster than you may think.

Document Your Existing Steps

Starting your journey is as simple as documenting all the steps it takes your system to go from code that's committed and pushed to when it's in production and available to consumers. And yes—you need to document every step. You'll be surprised at how many there are. Pull in everyone involved as you need them. Any gaps in the documentation must be fleshed out. Talk to your developers, your QA specialists, your release manager, etc. Talk to your system administrator if you need to. Get it all in one visualization. By the end you should have something like this:

Make sure you understand who owns or commonly performs each step. Sometimes this is a system, but more often, it's a person. If you have trouble showing or understanding the steps, think of your pipeline as a conveyor belt with one thing moving through it: the software package. You may decorate this package with other things, like config files. It may also morph into a different kind of package—for instance, from an executable into a Docker container. But it's still one thing moving through each gated step.

Make Friends

Ultimately, continuous delivery and DevOps are not so much about the tooling as about the people. It's about collaboration and focusing on what matters, and it's about delegating boring deployment work to computers. However, not everyone may take that view. Many people have built up little kingdoms around their role in getting the code to production. They may see your initiative to automate as a threat.

You'll want to understand and have compassion for all the people involved in deploying your system. This may be as easy as giving a heads up. It may be as involved as being vulnerable with them and letting them share their concerns. And at the end of it, you may still have to add a silly button to your deployment server and let them push it. Just remember: it's as much about the people as it is the tooling.

Version Your Package

Ensure your software packages are versioned. You want to know you can grab any build you need and push it through your pipeline. This will make both troubleshooting and tracking easier. Also, ensure that your package is environment agnostic; that is, don't tie your built package to any specific environment, such as dev or prod. We'll wire in the environment-specific stuff later.

Publish Your Package

After you build, version, and unit test your package, publish it to a well-known place. This will make your package available to deploy to multiple environments without rebuilding and unit testing every time. It will also ensure you have a consistent build. You can use something as simple as a shared network drive, but many tools exist to make it easier. Maven and Gradle have the ability to publish built in. Many continuous integration servers also have some sort of publishing mechanism wired in, depending on your language.

Find the Biggest Pain Point

Now the fun part. Document what the biggest pain points are on your diagram, like I did here:

Green is already automated or low pain, and red is the highest pain. You want to ensure the team is doing this painful step as much as possible. The instinct for them will be to run away from it and avoid it. Instead, we want to equip the team with what they need to get rid of it.

Automate the Pain Away

The next step in our path to continuous delivery is to take the pain point from the previous step and figure out how to automate it. There are many, many tools available, depending on the step that's red for you. For testing steps, you can automate your tests. Use a unit testing framework, or Postman, or a more comprehensive testing tool. This is another place where people, namely QA specialists, may think of moving to CD as a threat. It also can be a whole initiative on its own.

For deployment steps, there are many tools to automate the publishing and pushing of your system onto a server. I highly recommend investing in a deployment server. It will save you loads of time automating your pipeline. However, if you're not confident in one or have budget troubles, you can automate with as little as your command shell and some SSH. Something more in the middle can be a task runner like Gradle or even some PowerShell modules. Parameterize these scripts by version number and environment.
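
For the "command shell and some SSH" end of the spectrum, here is a bare-bones sketch of a deploy script parameterized by version and environment. The host names, paths, and package layout are assumptions for illustration; a deployment server would replace most of this.

    import subprocess
    import sys

    HOSTS = {"qa": "qa-app-01.internal", "staging": "stg-app-01.internal"}


    def deploy(version, environment):
        host = HOSTS[environment]
        package = f"myapp-{version}.tar.gz"
        # Copy the already-published, environment-agnostic package to the target host...
        subprocess.run(["scp", f"packages/{package}", f"deploy@{host}:/opt/myapp/"], check=True)
        # ...then unpack and restart the service remotely.
        subprocess.run(
            ["ssh", f"deploy@{host}",
             f"tar -xzf /opt/myapp/{package} -C /opt/myapp && sudo systemctl restart myapp"],
            check=True,
        )


    if __name__ == "__main__":
        deploy(version=sys.argv[1], environment=sys.argv[2])  # e.g. python deploy.py 1.4.2 qa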

You may not be able to automate your most painful step, or you may find it a steep learning curve. That's alright. If it's too difficult, find something similar to start with. You can also mitigate the manual parts down to a few button clicks. That helps. As long as you can remove enough of the pain your red step causes to make it no longer the most painful one, you're moving in the right direction.

Rinse, Repeat

Aggressively and continuously repeat the previous steps. Once you automate most of your painful steps away, your entire team will have an amazing change of attitude when it comes to deployments. It will feel like a large weight has been lifted off your shoulders. Ideally, with one button push, you can get your system from your package repository all the way to production. Most likely, you'll need to press one button per environment. Even then, it'll still be a breath of fresh air compared to the way it was before CD. Nonetheless, you'll want to keep automating until you achieve the one-button-push deployment.

I'm Done Now, Right?

Even though your deployment life will be much easier, I wouldn't stop there. Once you have an effective deployment pipeline, you can do many beautiful things with it. You can make it a zero-downtime deploy. Your team can add feature toggles so you can separate your deployments from your releases. And of course, you should continue to refine and evolve the pipeline as you find new or smaller pain points along the way.

You're on Your Way

As you can see, achieving continuous delivery is well within your reach. It's simple, yet you must persist through all the blockages. Focus on people and showing them how continuous delivery eases their role. Continually and aggressively knock one obstacle down at a time, and you'll be there sooner than you think!

Author: Mark Henke

Mark has spent over 10 years architecting systems that talk to other systems, doing DevOps before it was cool, and matching software to its business function. Every developer is a leader of something on their team, and he wants to help them see that.

 

I booked the Test Environment

Test Environment Booking Forms and Demand Management

A booking form is a way to tell another department what you need and when you need it, plain and simple.

How do you predict when and how your test environment services will be at a premium? The ebb and flow of business cycles will inevitably cause uneven demands. For example, the end of the year is a busy time for HR as they prepare for a new year of benefits and ever-changing regulations. In contrast, the server management department might be in high demand after Patch Tuesday*. A buggy patch can wreak havoc! These business cycles affect the flow of demands on the test environment.

ITIL recommends that we not only measure demand but predict it. Demand management is essential for business. It's a primary part of the Test Environment Management use case: falling short of demand means failing to deliver adequate service, while overshooting causes waste. That waste translates into dollars. If you can predict demand, then you can manage it.

This post provides an introduction to how you can use booking forms to help manage demand. Seasonal trends are the most straightforward to account for, so let's address those first.

Example of a Test Environment Booking Form:
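
As a rough illustration of what such a form captures, here's a sketch of the same information expressed as a data structure. The field names are assumptions, not a standard; your own booking form will reflect your organization's process.

```python
# booking_form.py - illustrative sketch of the fields a test environment booking form might capture.
# Field names and example values are assumptions; adapt them to your own request process.
from dataclasses import dataclass, field
from datetime import date

@dataclass
class TestEnvironmentBooking:
    project: str                       # which project or release needs the environment
    requested_by: str                  # who to contact about the request
    environment_type: str              # e.g., "integration", "performance", "UAT"
    components: list[str] = field(default_factory=list)  # systems, databases, interfaces needed
    needed_from: date = date.today()   # start of the booking window
    needed_until: date = date.today()  # end of the booking window
    special_requirements: str = ""     # specialized hardware, masked data sets, and so on

booking = TestEnvironmentBooking(
    project="Payroll Upgrade",
    requested_by="test.manager@example.com",
    environment_type="integration",
    components=["HR system", "payroll database", "reporting service"],
    needed_from=date(2024, 7, 1),
    needed_until=date(2024, 8, 15),
    special_requirements="Anonymized employee records",
)
```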

 

Handling Seasonal Demand

Many industries use booking forms as part of their process to manage demand. Take the case of when you bring your car to the dealership for repairs and they provide a rental car. The dealership puts in a booking form with the rental company. The rental company uses that form to make sure your vehicular needs are met.

Some industries are highly subject to seasonal demands. Hotels and airlines are notoriously swamped during the holiday season. They commonly use booking forms as part of their reservation system in order to manage and control demand. If you've ever experienced the "joys" of holiday travel, just imagine how much worse it would be without a reservation system. It would be total chaos!

Seasonality is ubiquitous in business. Some experience lulls in February. Others are slow during the summer months. Furthermore, the cadence varies by department. HR may be busiest in October, while the accounting department may be buzzing as the new fiscal year approaches. All of those peaks increase the demand for testing as new processes are put in place. A reservation system combined with past trends helps to manage the test environment under this kind of load.

Looking at past data is a great way to predict the seasonal cycles of a business. However, there are times when demand doesn't follow the typical cycles. We need a way to plan for those increases in demand.

Booking Forms Help Predict Demand

A straightforward way to notify another department goes a long way toward preparing for increased demand. It helps the supply side as well as the demand side. The supply side can adjust to meet future demand. Meanwhile, the demand side has a better chance of having its needs met on time.

Many industries use booking forms to communicate demand for services. For example, imagine a team that's working on a software project due to wrap up in September. They will need to book resources for deployment to production. Let's say they need new servers for deployment. Sometime in July, they'll fill out a booking form describing their needs. The form will go to the server team, so they can plan ahead. The server team can fulfill the request in August or even closer to the deliver-by date if that works better with their schedule.

Similarly, consider a new project that's just getting underway. The first round of testing is planned for two months out. That means now would be a good time for the project manager or test manager to put in a request for the testing resources. Let's say the test environment has unique requirements, such as specialized hardware. That request must be placed in advance so the acquisition can be approved in time. Failure to submit a booking form on schedule would result in project delays. Or, it would cost a significant amount of social capital to recover from the slip-up!

Use Booking Forms for Infrastructure

A booking form is an integral part of a reservation system. When you have finite IT resources, such as physical servers, planning ahead is especially important. Virtualization helps when it comes to shifting demands for servers. But as demand increases up to and beyond physical capacity, something must be done.

In many industries, such as airlines, the cost of service increases with demand. Price adjustment is a standard method of controlling demand. Organizations with fixed assets, such as hotel rooms or airplane seats, need to keep them in use. By reducing prices during times of low demand, these companies increase usage. IT doesn't always have this luxury.

Instead, IT must chase demand. But rather than simply chasing it, we can predict it. In some cases, the PMO will have to adjust priorities to level the demand for limited resources.

Demand Management of Cloud Resources

More and more enterprises are shifting to cloud providers. Complete virtualization of infrastructure and newer technologies like containers are good news when it comes to handling demand. Cloud services make it possible to handle larger fluctuations in demand. And, they do so without reaching the limits of physical capacity. However, it's still important to control resource consumption.

Cost control is one of the primary issues with cloud management. Cloud services offer nearly limitless resource consumption, at a nearly limitless expense. They must be managed effectively to control costs.

Another issue is security. As much as unused servers are wasteful, they're also security liabilities. Cloud providers do a great job of keeping their platforms current. Still, consumers must maintain certain types of resources, like virtual machines, themselves.

More VMs means more time spent keeping them running and secure. The waste starts to multiply as the infrastructure and security groups have to take these additional servers into account. The additional servers not only cost money, but they add to the overall workload. That added workload chips away at the capacity of the operations teams—taking little bits away from the things that really do matter.

Speaking of demand for services, let's talk about that demand for personnel.

Booking Forms for Services

Booking forms can be used to book anything from servers to DBAs and testers. Sometimes, the demand for testers can increase beyond capacity.

Let's say your enterprise is moving to the latest OS. Upgrading an OS is a huge undertaking and doing it all in one go could end in disaster. It can be ruinous to hastily make a drastic change to a single system, let alone the operating system used across the entire organization!

Thankfully, OS version upgrades are a rare occurrence. But when your company makes the switch, it's a fairly massive project. You need to test every application in the IT inventory (and even a few that aren't formally tracked in it). Doing so is the only way to ensure the upgrade will have a limited impact on the business.

However, this kind of massive change presents a problem. You can't throw all your resources into one project. Nor can you hold the project back by limiting supply. There are other projects, and you have to be able to manage resources effectively.

An OS upgrade is a case where you have a sharp increase in demand. Additionally, this fluctuation does not fall on a typical business cycle. You may need additional testers to ensure success. Booking forms would come in from several departments to have their applications tested. You could easily attribute the increase in demand to the OS upgrade project and adjust resources accordingly.

Similarly, you would need to prepare the test environment for the upgrade. Booking forms would flow to the departments that set up the various components within the test environment. A tracking system relates all the requests back to the project. And now we've come full circle—we can see what services are needed to support the project's testing needs.

Form Your Own Conclusions

In this post, we've seen how demand and capacity are related. Managing demand is important for cost control and service delivery. Booking forms are used across industries to communicate demand in advance. They're standardized and can feed into capacity planning.

In a test environment, infrastructure and staffing demands can fluctuate seasonally. At other times, they can change sporadically. When you use booking forms, you're better able to predict and manage demand. Use them and you'll reduce seasonal stresses and keep surprises to a minimum, which keeps your department on top of things!

* Patch Tuesday is the second Tuesday of the month, when Microsoft releases security patches for Windows and other products. A bad patch can cause a lot of problems that look like buggy application code.

Contributor:

This post was written by Phil Vuollet.

Deployment Styles: Blue/Green, Canary, and A/B


These days, we seem to have an overwhelming number of deployment options. DevOps, continuous delivery, and similar practices have encouraged introspection on how to release valuable software to customers. You've probably heard of three popular options: blue/green deployments, canary releases, and A/B testing. And maybe you've wondered, aren't these all the same? Or, when should I use blue/green instead of A/B? They both have slashes in them, so they're probably equivalent, right?

This post will help you differentiate between these three deployment options and understand why they're valuable. And, as you'll see, each one is indeed valuable depending on the situation.

Deployment Styles: Blue/Green, Canary, and A/B

 

Deployment Vs. Release

Before we delve into the different styles, I want to make a couple of terms clear. Often the ideas of "release" and "deployment" are used interchangeably. But don't be deceived. These are related but different, and some strategies hinge on understanding the difference between them.

Deployment

Deployment is likely the term you understand best. It refers to getting executable code into a specific environment and running it. Strong practices like continuous delivery and continuous deployment focus on how to package this code and get it to the appropriate environment. They often encourage automation and eliminating disruption risks for your customers. Often, as soon as we deploy code, customers can see it. However, deploying code need not be the factor that determines whether or not your customers see it. The goal of deployment is to ensure your code can run properly in the appropriate environment.

Release

Releasing is all about making the output of new code visible to your customers. The simplest way to do this is to deploy that code and let it immediately become visible, which is why releasing is often confused with deployment. However, there are many ways to hide code from customers even while it's deployed. The goal of releasing is to ensure a feature meets customer needs and can be turned off when it's defective.
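
One common way to deploy code without releasing it is a feature toggle. The sketch below is a minimal illustration in Python; the flag store, flag name, and checkout functions are hypothetical, and a real system would usually read flags from configuration or a flag service.

```python
# feature_flags.py - minimal sketch of hiding deployed code behind a release toggle.
# The flag store is a plain dictionary here; in practice it might live in a config file,
# a database table, or a feature-flag service. Names and values are hypothetical.
FLAGS = {"new_checkout_flow": False}  # the code is deployed, but not yet released

def is_released(flag_name: str) -> bool:
    return FLAGS.get(flag_name, False)

def legacy_checkout(cart: list[float]) -> float:
    return sum(cart)

def new_checkout(cart: list[float]) -> float:
    return round(sum(cart) * 0.98, 2)  # hypothetical new pricing behavior

def checkout(cart: list[float]) -> float:
    if is_released("new_checkout_flow"):
        return new_checkout(cart)    # released: customers see the new behavior
    return legacy_checkout(cart)     # not released: the deployed code stays invisible

print(checkout([19.99, 5.00]))
```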

Blue/Green

Blue/green deploys are focused purely on deployment as a way of eliminating downtime and disruption for customers.

How Does It Work?

Let's say you're writing a blog post (go figure) and your audience is expecting to see something in an hour. But you don't like the post and want to rewrite it from scratch. Well, you're not sure you can finish in time, so you publish the old one. When you do have time, you can rewrite the post completely and publish it after the due date, replacing the older version. Depending on when customers visit the blog, they will either see the old post (and you'll get credit for meeting your deadline), or they'll see the new one and enjoy some quality content. Either way, no one will ever see an empty space where they were expecting a post.

Instead of blog posts, picture running software. When you need to deploy a new software version, instead of replacing the existing version you run the new one side by side with the old one. Once everything looks good, you can switch traffic to the new service. That way, customers are always able to get to your system.
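
At its core, the switch is just a pointer change: both stacks run, and you repoint traffic once the new one checks out. Here's a minimal Python sketch of that idea; the host names and /health endpoint are assumptions, and in practice the "switch" is usually a load balancer or router configuration change rather than a variable.

```python
# blue_green_switch.py - minimal sketch of the traffic switch behind a blue/green deploy.
# Both versions run side by side; flipping ACTIVE_COLOR points customers at the new one.
# Host names and the health check path are illustrative assumptions.
import urllib.request

UPSTREAMS = {
    "blue": "http://blue.internal.example.com",
    "green": "http://green.internal.example.com",
}
ACTIVE_COLOR = "blue"  # customers currently reach the blue stack

def healthy(base_url: str) -> bool:
    """Check the candidate stack before sending customers to it."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=5) as response:
            return response.status == 200
    except OSError:
        return False

def switch_to(color: str) -> str:
    """Point traffic at the other stack only if it looks good."""
    global ACTIVE_COLOR
    if healthy(UPSTREAMS[color]):
        ACTIVE_COLOR = color  # in practice: update the load balancer or router config
    return ACTIVE_COLOR

print(switch_to("green"))
```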

What's It Good For?

Blue/green deployment drastically reduces risk in many situations. If you run a site where even a few minutes of downtime is costly, this option can save your bacon.

Canary

Canary deployments, also known as canary releasing, are usually release-focused. But, sometimes, they can also be deployment-focused. They are deployment-focused when you use your deployment scripts to only update the code on specific containers or servers. They're release-focused when you can change which canary features are visible to some users without redeploying.

How Does It Work?

Canary releasing can work many ways, but essentially you only release a new version or functionality to a small set of customers to start. You then monitor your system and the customers' responses to see if anything...weird...happens. The odd name for this deployment comes from coal miners lowering canaries into the shafts to detect noxious gases so they can see if it's safe before they descend themselves.

Think of canary releasing like a bottle of pop. You accidentally drop it on the ground. You're not sure if it will start spraying everywhere when you open the bottle, so instead of just turning the cap quickly and risking that the pop will explode, you turn the cap slowly to eke out a little air with every rotation. Eventually you can safely open the bottle and drink a refreshing beverage. Canary releasing is eking that software out to the world, user by user.
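
A common way to implement the "user by user" part is to bucket users with a stable hash and send only a small percentage to the new version. The sketch below illustrates the idea; the percentage and user IDs are placeholders.

```python
# canary_rollout.py - minimal sketch of easing a feature out to a small slice of users first.
# A stable hash of the user ID means the same user always lands in the same group.
import hashlib

CANARY_PERCENT = 5  # start with 5% of users; raise it as confidence grows

def in_canary(user_id: str, percent: int = CANARY_PERCENT) -> bool:
    digest = hashlib.sha256(user_id.encode()).hexdigest()
    bucket = int(digest, 16) % 100
    return bucket < percent

def handle_request(user_id: str) -> str:
    if in_canary(user_id):
        return "new version"  # the canary group: watch your monitoring closely here
    return "current version"

print(handle_request("user-42"))
```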

What's It Good For?

Canary is fantastic for lowering the risk of changes to production. The "no defects in production" mentality is a little overrated, and canary lets you mitigate the cost of defects without spending an enormous amount on preventive testing. (You should absolutely spend some effort on preventive testing, though.)

A/B Testing

A/B testing is a release strategy. It's focused on experimenting with features.

How Does It Work?

With A/B testing, implementation is similar to the process of canary releasing. You have a baseline feature, the "A" feature, and then you deploy or release a "B" feature. You then use monitoring or instrumentation to observe the results of the feature release. Hopefully, this will reveal whether or not you achieved what you wanted.

You're not limited to only two versions in a test. Netflix, for example, displays different cover art for the same show depending on which version it predicts a user will respond to and want to see. But be careful: it's healthy to only run one experiment at a time.
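
Mechanically, the assignment side looks a lot like the canary sketch above: split users deterministically, then record the outcome you care about per variant. Here's a minimal illustration; the experiment name, variants, and conversion metric are assumptions.

```python
# ab_assignment.py - minimal sketch of assigning users to an A/B experiment and recording outcomes.
# The experiment name, variants, and metric are illustrative stand-ins.
import hashlib
from collections import Counter

EXPERIMENT = "checkout-button-copy"
VARIANTS = ["A", "B"]
results = Counter()  # stand-in for real instrumentation or analytics

def variant_for(user_id: str) -> str:
    """Deterministically split users between variants A and B."""
    digest = int(hashlib.sha256(f"{EXPERIMENT}:{user_id}".encode()).hexdigest(), 16)
    return VARIANTS[digest % len(VARIANTS)]

def record_conversion(user_id: str) -> None:
    """Count a successful checkout against the user's variant."""
    results[variant_for(user_id)] += 1

for user in ["alice", "bob", "carol"]:
    record_conversion(user)
print(results)
```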

What's It Good For?

A/B testing from the 1,000-foot view may look a lot like canary releasing: You have a set of users seeing the new stuff, and a set with the old stuff. However, A/B has a much different intent. While the focus of canary releasing is risk mitigation, A/B testing's focus is on how useful a feature is for customers. It's the old argument of "build the thing right" vs. "build the right thing."

 

Working Together

Though these three options all tackle different goals, by no means are they mutually exclusive. You can have a system that's backed by blue/green deploys that set up features you can canary-release by region or customer last name. And you can set up certain key features as A/B experiments with your highest-paying customers. These all can work in harmony.

Get Past the Buzz, Choose a Style

We love throwing out buzzwords in this industry, and it can get confusing fast. This is especially true when these jargon-filled concepts operate similarly on the surface. I hope this article helped clarify the differences between these three deployment strategies. Pick one or more that work for you.

 

Author

This post was written by Mark Henke. Mark has spent over 10 years architecting systems that talk to other systems, doing DevOps before it was cool, and matching software to its business function.

Shakedown Cruise


Shakeout Testing With Synthetics: A Test Environment Management Best Practice

Testing software is pretty easy, right?  You build it to do a thing.  Then you run a test to see if it does that thing.  From there, it's just an uneventful push-button deploy and the only unanswered question is where you're going to spend your performance bonus for finishing ahead of schedule.

Wait.  Is that not how it normally goes in the enterprise?  You mean testing software is actually sort of complicated?

Shakedown Cruise

 

Enterprise-Grade Software Necessarily Fragments the Testing Strategy

I've just described software testing reduced to its most basic and broad mandate.  This is how anyone first encounters software testing, perhaps in a university CS program or a coding bootcamp.  You write "Hello World" and then execute the world's most basic QA script without even realizing that's what you're doing.

  1. Run HelloWorld.
  2. If it prints "Hello World," then pass.
  3. Else, fail.

That's the simplest it ever gets, however.  Even a week later in your education, this will be harder.   By the time you're a seasoned veteran of the enterprise, your actual testing strategy looks nothing like this.

How could it?

Your application is dozens of people working on millions of lines of code for tens of thousands of users.  So if someone asked you "does it do what you programmed it to do," you'd start asking questions.  For which users?  In which timezone and on what hardware?  In what language and under which configuration?  You get the idea.

To address this complexity, the testing strategy becomes specialized.

  1. Developers write automated unit tests and integration tests.
  2. The QA department executes regression tests according to a script, and performs exploratory testing.
  3. The group has automated load tests, stress tests, and other performance evaluations.
  4. You've collaborated with the business on a series of sign-off or acceptance tests.
  5. The security folks do threat modeling and pen testing.

In the end, you have an awful lot of people doing an awful lot of stuff ahead of a release to see if the software not only does what you want it to, but also to see how it responds to adversity.

Sometimes the Most Obvious Part Gets Lost in the Shuffle

But somehow, in spite of all of this sophistication, application development organizations and programs can develop a blind spot.  Let me come back to that in a moment, though.  First, I want to talk about ships.

A massive ship is a notoriously hard thing to test.  Unlike your family car, it seems unlikely that someone is going to put it up on a lift and run it through some simulated paces.  And, while you could test all of its various components outside of the ship or while the ship is docked, that seems insufficient as well.

So you do something profound and, in retrospect, obvious.

You take it out on what's known as a shakedown cruise.  This involves taking an actual ship out into the actual sea with an actual crew, and seeing how it actually performs in an abbreviated simulation of its normal duties.  You test whether the ship is seaworthy by trying it out and seeing.  Does it do what it was built to do?

In the world of software, we have a similar style of test.  All of the other testing that I mentioned above is specialized, specific, and predictive of performance.  But a shakeout test is observational and designed to answer the question, "how will this software behave when asked to do what it's supposed to do."

And it's amazing how often organizations overlook this technique.

Shakeout Testing Is Important

Shakeout testing serves some critical functions for any environment to which you deploy it.  First and foremost, it offers a sanity test.  You've just pushed a new version of your site.  Is the normal stuff working, or have you deployed something that breaks critical path functionality?  It answers the question, "do you need to do an emergency roll-back?"

But beyond that, it also helps you prioritize behavioral components of your system.  If your shakeout testing all passes, but users report intermittent problems or lower priority cosmetic defects, you can make more informed decisions about prioritizing remediation (or not).  The shakeout test, done right, tells you what's important and what isn't.

And, finally, it provides a baseline against which you can continuously evaluate performance.  Do things seem to be slowing down?  Is runtime behavior getting wonky?  Re-run your shakeout testing and see if the results look a lot different.

Shakeout testing is your window into an environment's current, actual behavior.
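
Even before you invest in a synthetics platform, a shakeout run can be a short script that exercises the critical paths and fails loudly.  Here's a hedged sketch; the base URL and endpoints are placeholders for whatever your users actually depend on.

```python
# shakeout.py - minimal sketch of a shakeout run against a freshly deployed environment.
# The base URL and critical-path endpoints are assumptions; list the journeys your
# users actually depend on.
import sys
import urllib.request

BASE_URL = "https://staging.example.com"
CRITICAL_PATHS = ["/login", "/accounts", "/transfer"]  # the "does it do its job?" checks

def check(path: str) -> bool:
    try:
        with urllib.request.urlopen(f"{BASE_URL}{path}", timeout=10) as response:
            return response.status == 200
    except OSError:
        return False

if __name__ == "__main__":
    failures = [path for path in CRITICAL_PATHS if not check(path)]
    if failures:
        print(f"Shakeout FAILED for: {', '.join(failures)} -- consider rolling back")
        sys.exit(1)
    print("Shakeout passed: critical paths look healthy")
```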

Shakeout Testing Is Labor-Intensive, Especially With a Sophisticated Deployment Pipeline

Now, all of this sounds great, but understand that it comes with a cost.  Of course it does -- as the saying goes, "there is no free lunch."

Shakeout testing is generally labor intensive, especially if you're going to be comprehensive about it.  Imagine it for even a relatively simple and straightforward scenario like managing a bank account.  Sure, you need to know if you can log in, check your balance, and such.  But you're probably going to need to check this across all different sorts of bank accounts, each of which might have different features or workflows.

It quickly goes from needing to log in and poke around with an account to needing dozens of people to log in and poke around with their accounts.  Oh, and in an environment like prod, you probably want to do as much of this in parallel as possible, so maybe that's hundreds or even thousands of man-hours.

This becomes time-consuming and expensive, with a lot of potential ROI for making it more efficient.

Low-Code, No-Code, and Synthetics: Helping Yourself

As detailed in the article above, a natural next step is to automate the shakeout testing.  In fact, that's pretty much table stakes for implementing the practice these days.  The standard way to do this would involve writing a bunch of scripts or application code to put your system through the paces.

This is certainly an improvement.  You go from the impractical situation of needing an army of data entry people for each shakeout test run to needing a platoon of scripters who can work prior to deployment.  This makes the effort more effective and more affordable.

But there's still a lot of cost associated with this approach.  As you may have noticed, it isn't cheap to pay people to write and maintain code.

This is where the idea of low-code/no-code synthetics solutions comes into play.  It's actually possible to automate health checks for your system's underlying components in a way that eliminates the need for much of your shakeout testing's end-to-end scripting.

You can have your sanity checks and your fit-for-purpose tests in any environment without brute-force labor or brittle automation.

Shakeout Testing Has a Maturity Spectrum

If you don't yet do any shakeout testing, then you should start.  If you haven't yet automated it, then you should automate it.  And if you haven't yet moved away from code-intensive approaches to automation, you should do that.

But wherever you are on this spectrum, you should actively seek to move along it.

It is critically important to have an entire arsenal of tests that you run against your software as you develop and deploy it.  It's irresponsible not to have these be both specialized and extensive.  But as you do this, it's all too easy to lose track of the most basic kind of testing that I mentioned in the lead-in to this post.  Does the software do what we built it to do?  And the more frequently and efficiently you can answer that, the better.

Contributor: Erik Dietrich

Erik Dietrich is a veteran of the software world and has occupied just about every position in it: developer, architect, manager, CIO, and, eventually, independent management and strategy consultant.  This breadth of experience has allowed him to speak to all industry personas and to write several books and countless blog posts on dozens of sites.

Test Data Management

5 Steps to Better Test Data Management

Foreword

I always say that it's important to test in production because nothing compares to a production environment. But it wouldn't be very professional of you to test only—and directly—in production. Testing in production usually gives the impression that you didn't care enough to test before you reached the production stage.

But I'd say that in order for you to even dare to test something in production, you need to have run a set of tests previously in a similar environment—including all the data you need for testing.

That's where test data management (TDM) comes in.

Test Data Management

TDM is the process of creating test data that's similar to the real data being used in a production environment. Developers and testers use this non-production data to be more confident that the new software changes aren't going to break anything during the release.

 

A good testing strategy is usually accompanied by testing data taken from production. Developers sometimes use funny, dummy, and non-real data as well. But what are the steps that you need to follow in order to create good TDM data?

Top 5 Considerations

#1 Only Use the Data You Really Need

If you don't know where to go for your next vacation, just book the next flight and start packing. You might have the best experience of your life...or the worst. Similarly, if you don't know what data you really need for testing, you might just use it all. That approach has pros and cons: when you test software without knowing which scenarios you need to cover, an exact copy of the production database looks attractive because it's the easiest way to start testing with real data. But you'll end up spending too much time and money waiting to get that copy of the data for testing.

When you start creating your testing process by building the list of test cases you'll need, it becomes pretty obvious how much and what type of data you're going to need. More importantly, think about your testing process as an iterative one. If you start testing the login page, you don't need to have all the information from the user for that test case, such as their birthdate or home address.

As you keep iterating, you're going to need more testing data. And as you find more bugs, you're going to need more real data. Unless you need to run stress tests, subsetting data is going to be enough for the majority of the test cases. And even if you still need to validate that the system can handle high waves of traffic, you can generate synthetic data in volume for that purpose. More on this later.

Taking small sets of your production database should be enough for most of the tests you'll run to validate the software. You'll also reduce costs and complexity when building only the test data you really need at the moment.
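
As a simple illustration of subsetting, here's a sketch that copies only a small, recent slice of one table into a test database. It uses SQLite for brevity; the table and column names are assumptions.

```python
# subset_test_data.py - minimal sketch of pulling only the rows a test case actually needs.
# Uses sqlite3 for illustration; table and column names are assumptions.
import sqlite3

def subset_users(source_db: str, target_db: str, limit: int = 100) -> int:
    """Copy a small, recent slice of users from a production copy into a test database."""
    source = sqlite3.connect(source_db)
    target = sqlite3.connect(target_db)
    target.execute("CREATE TABLE IF NOT EXISTS users (id INTEGER, email TEXT, created_at TEXT)")
    rows = source.execute(
        "SELECT id, email, created_at FROM users ORDER BY created_at DESC LIMIT ?", (limit,)
    ).fetchall()
    target.executemany("INSERT INTO users VALUES (?, ?, ?)", rows)
    target.commit()
    source.close()
    target.close()
    return len(rows)
```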

#2 Avoid Having Sensitive Data for Testing

We've seen a lot of recent GDPR-related lawsuits involving big companies in Europe. Europe is taking data protection more seriously than other regions, and pretty soon, regulations like GDPR will be implemented on other continents too. If GDPR is already affecting companies like these, we'd better avoid having unprotected sensitive data in our testing environments.

SOX compliance regulations foster a separation of duties within an organization. I've worked with these types of regulations. In my experience, auditors mainly want to see that only certain people have access to the production environments. Those people with privileged permissions are legally responsible for what happens with customer data.

Even with regulations in place, data is still leaked. We have to be prepared for that, so you should operate as if you expect the information you're storing will be stolen someday. Mask any data that could identify a person, or what's also called personally identifiable information (PII).

Use irreversible methods to mask data so that it's difficult to unmask it. And make sure you're constantly checking that PII is protected. Managing test data will be simpler and easier if you create subsets of the data to fulfill your different testing needs. And you won't have to worry about giving sensitive data to developers or whoever needs production data for testing.

Ideally, try to avoid having sensitive data. But since sometimes you can't avoid it, try to keep PII data at a minimum, and securely mask the data you need to have.
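
One way to mask irreversibly is a salted one-way hash, which keeps values consistent across tables without exposing the originals. The sketch below is illustrative only; the salt handling and field choices are assumptions, not a compliance recipe.

```python
# mask_pii.py - minimal sketch of irreversibly masking personally identifiable information.
# A one-way hash keeps values consistent across tables without exposing the original;
# the salt and field choices here are assumptions.
import hashlib

SALT = "rotate-me-regularly"  # keep this out of source control in real use

def mask(value: str) -> str:
    """Replace a PII value with a stable, irreversible token."""
    return hashlib.sha256(f"{SALT}:{value}".encode()).hexdigest()[:16]

record = {"name": "Joe Smith", "email": "joe@example.com", "balance": 1520.75}
masked = {
    "name": mask(record["name"]),
    "email": mask(record["email"]),
    "balance": record["balance"],  # non-identifying fields can stay as-is
}
print(masked)
```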

#3 Build Synthetic Data for Better Efficiency

Even though you've decided to mask sensitive data—especially data that's going to be used for testing—you want to make the security gap as small as possible by not including sensitive data in your tests at all (even if it's masked). One way to improve security is to replace real data (like credit card numbers) with autogenerated dummy data. That's synthetic data, and it will help you get more efficient results in testing.

You can take advantage of the synthetic data approach by using more realistic data than just dummy data. For example, you might have a user called Joe in your records, but for testing purposes, you decided Joe will be called Jeremy. This gives you a chance to run machine learning experiments where you can learn more about "Jeremy's" preferences without knowing that Jeremy is actually Joe. You're protecting Joe, even if the data is leaked or misused.

Synthetic data makes real data more shareable because you only have the data you need. Why would you need to know a person's name if you're just trying to replicate a bug in production? You're only interested in knowing which paths through the system's workflow the user took. What matters is why the data ended up in a certain state that caused the software to break. You can then decide either to ignore the person's name or replace it with other "real" data.

If you need to have large amounts of data for performance testing, you can use synthetic data to double the size of the database. Along with the previous benefits we discussed, synthetic data makes your tests more efficient by only using the data you need to cover specific test scenarios.
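
Generating synthetic records can be as simple as a small helper that produces plausible but fake values, repeated as many times as your performance tests need. Here's a minimal sketch using only the standard library; a library like Faker can produce more realistic values.

```python
# synthetic_data.py - minimal sketch of generating synthetic test records.
# Uses only the standard library; the field names and fake values are assumptions.
import random
import string

def synthetic_user(user_id: int) -> dict:
    """Build a user record that looks real enough for tests but maps to no real person."""
    name = "".join(random.choices(string.ascii_lowercase, k=8)).title()
    return {
        "id": user_id,
        "name": name,
        "email": f"{name.lower()}@example.test",
        "credit_card": "4000-0000-0000-0000",  # an obviously fake placeholder number
    }

# Scale up the data set for performance tests by generating as many rows as you need.
users = [synthetic_user(i) for i in range(10_000)]
print(users[0])
```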

#4 Create Test Data As a Self-Service Model

DBAs are in charge of generating testing data. They know the best ways to do it and what data is sharable among teams (as I explained in the previous section), and sometimes they're the only ones who have access to production databases. When this happens, the DBAs become a bottleneck, and the time spent in the testing stage increases.

That's why you should create test data as a self-service model. It's not just so you don't constantly interrupt DBAs when a developer or tester needs data. The ability to automatically have testing data will let you parallelize the boring task of manually generating data for testing. Do you need to reduce the testing time? Fine. Create more subsets of testing data in parallel and distribute the test cases.

Another benefit of having a self-service model is that you can easily drop and re-create environments on demand. By doing this, you ensure repetition and predictable results when preparing testing data. It's also easier to include TDM in your CI/CD pipeline, which brings you closer each day to one-click deployments.

Creating a self-service model is far from an easy task, so it's important that DBAs, developers, and testers work together to create it. They don't all have the same needs and objectives. Combine their experience, knowledge, and skills to create a better model for test data.
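
In practice, the self-service entry point can be as small as a command that testers or a CI job can run to get a fresh, masked subset on demand. The sketch below is only an outline; the provisioning logic is a placeholder that would call your own subsetting and masking steps.

```python
# provision_test_data.py - minimal sketch of test data as a self-service step in a pipeline.
# The provisioning function is a placeholder; the point is that a tester or CI job can
# request a fresh, masked subset without waiting on a DBA.
import argparse

def provision(environment: str, subset: str) -> str:
    """Drop and re-create a named test data subset for an environment."""
    # In practice this would call your subsetting and masking scripts (see steps #1 and #2).
    return f"Provisioned subset '{subset}' into {environment}"

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Self-service test data provisioning")
    parser.add_argument("environment", help="target environment, e.g. qa-3")
    parser.add_argument("subset", help="named data subset, e.g. login-smoke")
    args = parser.parse_args()
    print(provision(args.environment, args.subset))
```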

#5 Keep Testing Data up to Date

Last but not least, keep your testing data up to date. Your software will continue evolving, so the test scenarios and the data they need will keep changing over time too. Some test scenarios will become obsolete, and so will their data. Keep the house clean by making sure you're only generating the test data that's still relevant and truly needed.

This process takes discipline, and good communication within the team always helps. Developers need to inform everyone which tests are no longer needed and when it's OK to remove them. And either DBAs or testers need to keep confirming that the data they're using for tests is still valid and relevant.

Keeping data fresh might seem like common sense. But I've seen delivery pipelines where tests continue to grow, even though some of the features no longer exist. Sometimes we get too extreme about trying to have a high percentage of test coverage, which isn't efficient.

Having up-to-date testing data will help you have higher quality TDM.

Benefits: Better Test Results With Better Testing Data

I'd say that testing is the most important stage of any software release life cycle. The more quickly you can verify that everything is still working, the better. Always keep the mindset that parallelizing testing will help you to speed up the process. For that, you need to have better test data quality, and it's not always necessary to have an exact replica of what you have in production. In fact, if you don't, it may help you in the cost, security, or speed departments.

It's important that you start by defining what you truly need and iterate from there. Automation helps with repetitive and boring tasks, but you need to keep the human side of the equation in mind when generating data for testing.

TDM helps you provide only the data you need, on time and securely.

Author: Christian Melendez

Further reading suggestions: Holistic Test Data Management