Why Test Data Management Is Critical to Software Delivery?

Imagine you are developing a system that will be used by millions of people. In a situation like this, a system has to be very well-tested for any type of error that can cause the system to break while in production. But what’s the best way to test a system for any possible system failure because of bugs? This is where test data management comes in.

In this post, I will explain why test data management is critical in software delivery. To develop high quality software products, you have to continuously test the system as it’s being developed. Let’s dive in straight to understanding how this problem can be solved by using test data management.

What Is Test Data Management?

Well, in simple terms, test data management is the creation of data sets that are similar to the actual data in the organization’s production environment. Software engineers and testers then utilize this data to test and validate the quality of systems under development.

Now, you might be wondering why you need to create new data. Why not just use the existing production data? Well, data is essential to your organization, so you should protect it at all costs. That means developers and testers shouldn’t have access to it. This has nothing to do with the issue of trust but security. Data should be highly regarded, or else there can be a data breach. And as you know, data breaches can cause loss to an organization.

How Can You Create Test Data?

So, now that we know why we need test data that is separate from our production data, how can we create it?

The first thing you must do is understand the type of the business you are dealing with. More specifically, you need to know how your software product will work and the type of end users that will use the software. By doing so, it will be easier to prepare test data. Keep in mind that test data has to be as realistic as the actual data in the production environment.

You can use automated tools to generate test data. Another way of creating test data is by copying or masking production data that your actual end users will use. Here you have to be creative as well and create different types of test data sets. You can’t rely only on the masked data from production data for testing.

Benefits of Test Data Management in Software Delivery

Test data management has many benefits in software delivery. Here are some of the benefits of test data management in software delivery in any software development environment.

High Quality Software Delivery

When you apply test data management to software delivery, it will give software developers and testers to test the systems and make solid validations of the software. This enhances the security of the system and can prevent possible failures of the system in the production environment. Testing systems with test data gives assurance that the system will perform as expected in the production environment without defects or bugs.

Faster Production and Delivery of Software Products to the Market

Imagine that, after some months of hard work of developing a software application, you’ve just released a software application on the market, only for it to fail at the market level. That’s not only a loss of resources, but it’s also a pain.

A system that’s well-tested using test data will have a shorter production time and excel at the production level. That’s because it’s much more likely to perform the way it was intended to. If the system fails to perform in production because it was not tested well, then the system has to be redone. This wastes time and resources for the organization.

Money Needs Speed

Test data management is critical when it comes to software delivery speed. Having data that’s of good quality and is similar to production data makes development easier and faster. System efficiency is cardinal for any organization, and test data management assures that a system will be efficient when released in production. Therefore, you start generating revenue as soon as you deploy the system.

Imagine having to redo a system after release because users discover some bugs. That can waste a lot of time and resources, and you may also lose the market for that product.

Testing With the Correct Test Data

Testing with good quality test data will help in making sure that the tests you run in the development phase will not change the behavior of the application in the production phase. For example, you might test that the system is accepting supported data by entering a username and password in the text box with all types of data that a user can possibly input into the system.

No matter how many times you test the software, if the test data is not correct, you should expect the software to fail in the production phase. This is why it is always important to ensure that test data is of great quality and resembles your actual production data.

Bug and Logical Fixes

How can you know that the text box is accepting invalid input such as unsupported characters or blank fields from users? Well, you find out by validating the system through testing.

The whole point of having test data in software delivery is to make sure that the software performs as expected. Additionally, you need to make sure that the same tests will pass in production and have no loopholes that could damage the organization’s reputation. Therefore, test data becomes a critical part of software delivery life cycle, as it helps to identify errors and logical problems in the system. Thanks to this, you can make fixes before releasing the software.

For example, imagine a loaning system that makes incorrect calculations by increasing the interest rate by a certain percentage. That can be unfair to the borrowers and can backfire for the lending company.

Earning Trust

Trust is earned, and if you want to earn it from the end users or management, you have to deliver a software product that’s bug-free and works as expected. In fact, every software development and testing team should utilize test data management. Test data management enables teams to deliver software products that stand out and earn trust from management. After all, you can’t ship an error-prone system to the market and expect happy users.

Why Test Data Management Matters

Test data management is essential for ensuring that software applications will function as expected in a production environment. By testing with realistic data, organizations can gain assurance that their software will not fail in production, strengthening their relationship with clients and reducing the chances of fixing bugs in production and rollbacks. Test data management also speeds up the software development life cycle, reducing costs and improving the speed of software delivery. This helps organizations stay competitive in a rapidly changing market by detecting errors at an early stage and fixing them before release.

Additionally, test data management helps reduce compliance and security risks, provides Product Owners and their Steering Committees with assurance that the software they are releasing is of high quality, reduces the risk of data breaches by ensuring only valid and secure data is used in testing, and helps them make informed decisions about product features by evaluating the impact of changes on performance, scalability, and usability.

Summary

In simple terms, test data is simply the data used to test a software application that’s under the software testing life cycle. Test data management, on the other hand, is the actual process of administering data that’s necessary for use in the software development test life cycle.

You can’t deny that test data management is an essential part of testing and developing software. It plays a crucial role in helping you produce high quality software that’s bug-free and works as expected.

You should take test data management seriously and apply it when delivering software. If you do so, your organization will gain more revenue because you’ll deliver higher quality software products. Higher quality products make the customers happy instead of giving them a reason to complain about some bug.

What are Test Data Gold Copies

You lean back in your chair with a satisfied grin. You did it. It wasn’t easy, but you did it. You diagnosed and fixed the bug that kept defying your team. And you have the unit tests to prove it.

The grin slowly fades from your face as you realize that you still need your code to pass the integration tests. And you need to get data to use in them. Not your favorite activity.

You can put that grin back on your face because there is another way: using a gold copy.

Read on to learn what a gold copy is and why you want to use one. You will also find out how it can help you work on an application with low test coverage. You know, the dreaded legacy systems.

What Is a Gold Copy

In essence, a gold copy is a set of test data. Nothing more, nothing less. What sets it apart from other sets of test data is the way you use and guard it.

You only change a gold copy when you need to add or remove test cases.
You use a gold copy to set up the initial state of a test environment.
All automated and manual tests work on copies of the gold copy.

A gold copy also functions as the gold standard for all your tests and for everybody testing your application. It contains the data for all the test cases that you need to cover all the features of your product. It may not start out as comprehensive, but that’s the goal.

Building a comprehensive gold copy isn’t easy or quick. But it’s definitely worth it, and it trumps using production data almost every time.

Why You Don’t Want to Test in Production

Continuous delivery adepts rave about testing in production. And yes, that has enormous benefits. However:

It requires the use of feature toggles to restrict access to new features and changed functionality.
Running the automated tests in your builds against a production environment is not going to make you any friends.
The sheer volume of production data usually is prohibitive for a timely feedback loop.
Giving developers access to production data can violate privacy and other data regulations.

There’s more:

Production data changes all the time, and its values are unpredictable, which makes it unsuitable as a base for automated testing.
Finding appropriate test data in production is a challenge. Testing requires edge cases, when users and thus their data tend to be much more alike than they would like to know.
To comply with privacy and other data regulations, extracts need to be anonymized and masked.

Contrived Test Data Isn’t Half as Bad as It Sounds

Contrived examples usually mean that you wouldn’t encounter the example in the real world. However, when it comes to testing, contrived is what you want. A contrived set of test data:

has only one purpose—verifying that your application works as intended and expected and that code changes do not cause regressions
contains a limited amount of data, enabling a faster feedback loop even for end-to-end tests
can be made to be self-identifying and self-descriptive to help understand what specific data is meant to test
contains edge cases that willtrip you up in the real world but are generally absent from production data by their very definition
can be built into a comprehensive, optimized, targeted set of data that fully exercises your application

Of course, production data can be manipulated to achieve the same. But extracting it stresses production, and manipulating it takes time and effort. And you really don’t want to be doing that again and again and again.

That’s why you combine contrived data and gold copies. You start your gold copy with an extract from production data that is of course anonymized and otherwise made to conform to privacy and data regulations. Over time, you manipulate it into that optimized, targeted set of data. But using that initial set of test data as a gold copy will bring you benefits immediately.

Benefits of Gold Copies

In addition to the benefits of contrived data, using a gold copy gets you these benefits:

You can easily set up a test environment with a comprehensive set of test data
You can easily revert the data in a test environment to its original state
The ability to automate spinning up test environments
Automated regression testing for legacy systems

Everyone working on your application will appreciate it. They no longer have to hunt for good data to use in their test cases. And they no longer have to create test data themselves. A good thing, because creating test data and tests that produce false positives (i.e., tests that succeed when they should fail) is incredibly easy. You only have to use the same values a tad too often.

The ability to automate spinning up a test environment is what makes using a gold copy so invaluable for large development shops and shops that need to support many different platforms. Just imagine how much time and effort can be saved when providing teams and individuals with comprehensive, standard test data that can be automated. For example, using containers and a test data management tool like Enov8’s.

Finally, gold copies can help reduce the headaches and anxiety of working with legacy code. Here’s how.

Slaying the Dreaded Legacy Monster

Any system that does not have enough automated unit and integration tests guarding it against regressions is a legacy system. They are hard to change without worrying.

The lack of tests, especially the lack of unit tests, allowed coding practices that now make it hard to bring a legacy system under test. Because bringing it under test requires refactoring the code. And you can’t refactor with any confidence if you have no tests to tell you if you broke something.

Fortunately, a gold copy can bail you out of this one. It allows you to add automated regression testing by using the golden master technique. That technique takes advantage of the fact that any application with value to its users produces all kinds of output.

Steps in the Golden Master Technique

How you implement the golden master technique depends on your environment. But it always follows the same pattern, and it always starts with a gold copy.

Use your current code against the gold copy to generate the output you want to guard against regressions. For example, a CSV export of an order, a PDF print of that order, or even a screenshot of it.
Save that output. It’s your golden master.
Make your changes.
Use your new code against the gold copy to generate the “output under test” again.
Compare the output you just generated to your golden master.
Look for and explain any differences.

If you were refactoring, which by definition means there were no functional changes, the comparison should show that there are no differences.

If you were fixing a bug, the comparison should show a difference. The golden master would have the incorrect value, while the output from the fixed code would have the correct value. No other differences should be found.

If you were changing functionality, you can expect a lot of differences. All of them should be explicable by the change in functionality. Any differences that cannot be explained that way are regressions.

Explaining the differences requires manual assessment by a human. It’s known as the “Guru Checks Output” anti-pattern. And it needs to be done every test run if you want to stay on top of things. Marking differences as expected can help. Especially when you can customize the comparison so it won’t report them as differences.

Go Get Yourself Some Gold

Now that you know what a gold copy is and how you can use it to your advantage, it’s time for action. It’s time to start building toward the goal of a comprehensive set of test data and use it as a gold copy.

Your first step is simple: save the data from the test environment you set up for the issue or feature you’re working on now. That is going to be your gold copy. If your application uses any kind of SQL database, you could use that to generate a DML-SQL script that you can add to a repository.

Use your gold copy to set up the test environment for your next issue. Make sure you don’t (inadvertently) change your gold copy while you’re working on that issue. When you’re finished, and if you needed to add test data for the test cases of this issue, update your gold copy.

Rinse and repeat, and soon enough you’ll be well on your way to a truly useful comprehensive set of test data.

How Many Test Environments Do I Need?

What Are Test Environments?

Before we get into the factors we’ve mentioned, we have some explaining to do. Or, rather, some defining. In this section, we’ll define test environments. You’ll learn what they are and why do you need them.

Of course, if you’re already experienced in managing test environments—or have enough familiarity with the term—feel free to skip to the next section with a clear conscience.

A testing environment is a setup of software, hardware, and data that allows your testing professionals to execute test cases. For the test environment to be effective, you have to configure it, so it closely resembles the production environment.

As we’ve already covered, there are many types of test environments. Which ones your organization will need depends on several factors, such as the test cases itself, the type of the software under test, and many more. Since that’s the main topic of this post, we’ll get there in a minute.

But first, let’s quickly cover some of the main types of test environments available.

How Many Test Environments Do I Need? The Bare Minimum

We’re about to cover the main factors for deciding which and how many environments your organization should adopt. Before we get there, though, let’s talk about the bare minimum number of environments you need.

Development

The first obvious and indispensable one is the development environment. For some of you, it might sound weird to think of the dev environment as a testing environment, but it is. Developers should constantly test the code they write, not only manually (via building the application and performing quick manual tests) but also automatically, through unit tests.

You might consider the development environment an exception in the sense that, unlike most other environments, it doesn’t need to mimic production too closely. For instance, I have seen people argue that developers that create desktop apps shouldn’t use the best machines available. Instead, they should adopt computers that are close in configuration to those their clients use, so they can feel how the software is going to run. That’s nonsense. Developers should use the better and fastest machines their companies can afford, so their work is done most effectively. If performance is an issue, there should be a performance testing phase (and environment) to handle that. The same goes for other characteristics of the production environment that don’t make sense for developers.

CI (Integration)

What I’m calling here the “CI environment” could also be simply called the test environment, or even integration test environment.

Here is the first step in the CI pipeline after the developer commits their code and push it to the server. The CI server builds the application, running whatever additional steps are adequate, such as documentation generation, version number bumping, and so on. Just building the code is already a type of test. It might help detect dependency issues, eliminating the “but it works on my machine!” problem.

If the application is successfully built, unit/integration tests are executed. This step is vital since it might be slow for developers to run all of the existing tests often in their environments. Instead, they might run only a subset of tests on their environments, and the CI server will take care of running the whole suite after each check-in/push.

QA

Then we have what we’ll call the QA environment. Here is where end-to-end tests are run, manually, automatically, or both. End-to-end testing, also called functional tests, are the types of tests that exercise the whole application, from the UI to the database and back again. This type of testing checks whether the integration between different modules of the software work, as well as the integrations between the software and external concerns, such as the database, network, and the filesystem. As such, it’s a really essential type of testing for most types of software.

Production

Finally, we have the production environment. For many years “testing in production” was seen as the worst sin of testing. Not anymore. Testing is production is not only forgivable but desirable. Practices like canary releases are vital for companies that deploy several times a day since it allows them to achieve shorter release cycles while keeping the high quality of the application. A/B testing can also be seen as a form of testing in production, and it’s essential for organizations that need to learn about their users’ experience when using their software. Finally, some forms of testing, like load testing, would be useless if performed on any environment other than production.

Which and How Many Environments Do You Need? Here Are the Criteria You Should Use to Decide

Having covered the bare minimum environments most organizations need, it’s time to move on. Now we’ll cover the main factors you need to weigh when deciding your testing approach. Let’s go.

Organization Size

The size of the organization matters when deciding which environments it needs. One of the ways this matter is in regards to personnel. Since larger companies have more people, they can afford to have entire teams or even departments dedicated to designing, performing, and maintaining certain types of testing, which includes taking care of the required environment.

Companies of different sizes also have different testing needs due to the software they create. It’s likely that larger companies produce more complex software, which would demand a larger pipeline. The inverse is also likely true for smaller companies.

Finally, organization size often correlates with the stage in which the company finds itself. That’s what we’re covering next.

Organization’s Life Phase

Do you remember when Facebook’s motto was, “Move fast and break things?” It’s been a few years since they changed it to “Move fast, with stable infra.” While the new motto is definitely not as catchy as the previous one—some might say it’s even boring—it makes sense, given where the company stands now.

Startups have different testing needs than most established companies. Their priorities aren’t the same since they’re at very different points in their lifecycles.

For startups, beating their competitors to market might be more valuable than releasing flawless products. Established companies, on the other hand, will probably place “stability in the long term” way higher in the scale. They have their reputation at the stakes. If they’re public, they have to generate results for shareholders.

Therefore, more established companies will usually employ a testing strategy that adopts more environment, and it’s probably more expensive, and definitely slower. But such a strategy might give them the reassurance they need. On the other hand, startups that value time to market might choose a more streamlined pipeline, with fewer environments. Such an approach might be cheaper, easier to build and manage, but will give fewer guarantees than the more heavy-weight approach of the enterprise.

Software Type

The type of software developed is a huge factor when it comes to testing. A database-based web application with a rich user interface will require UI and end-to-end testing, for instance, while a library will not.

Similarly, user-acceptance testing makes sense for applications targeted at final users. For libraries and frameworks, unit and integration tests might suffice. You might have even more specific needs, such as integration with custom hardware, which can require more environments.

The type of software will dictate the required types of testing, which, in turn, will help you decide on the environments.

Domain or Industry

Some industries are highly regulated, while others are less regulated or non-regulated at all. That also has a huge impact on an organization’s testing approach. Domains like financial services and healthcare come to mind.

Your company might need to adhere to rules, regulations, or norms that govern whatever industry it operates in. That might require you have an additional environment in order to test that the product is compliant with these rules.

Time for the Verdict

So, based on all that we’ve just seen. How does one choose which test environments their organization needs? We’ll now, as promised in the title, offer you a quick recipe, or a step-by-step guide.

Start with the basics. Meaning, start with the bare minimum environments we’ve mentioned and then build upon it as your requirements change.
Consider the organization’s size and stage in life. Take into account the values and priorities of the organization (time to market vs. stability, disruption vs. market share, etc.), available personnel, and budget.
Take into account the type of software you make and the industry you belong to.

With that in mind, make your decision. If your organization makes a picture editing app for Android and iOs, you might want to have (besides the obvious dev and prod):

The CI environment to perform unit and integration tests.
A QA environment to help you with end-to-end/integration tests, using both emulation and real devices.
An acceptance testing environment, where stakeholders give the final sign-off for the app’s release.

But if you’re creating a banking application, you could add an additional security and compliance environment. (Keep in mind that this is just an example. I’m not well-acquainted with the financial domain.)

Final Considerations

Test environment management is vital for the modern software delivery process. One of the decisions a test environment manager needs to make is how many environments to use. As you’ve seen, there is no one-size-fits-all answer, but that’s no reason to despair. There are objective criteria you should use to help you with your decision.

The journey isn’t easy, but this blog has many articles that can help you master test environment management and take your organization’s testing approach to new levels.