What Are Data Test Gold Copies and Why You Need Them
You lean back in your chair with a satisfied grin. You did it. It wasn’t easy, but you did it. You diagnosed and fixed the bug that kept defying your team. And you have the unit tests to prove it.
The grin slowly fades from your face as you realize that you still need your code to pass the integration tests. And you need to get data to use in them. Not your favorite activity.
You can put that grin back on your face because there is another way: using a gold copy.
Read on to learn what a gold copy is and why you want to use one. You will also find out how it can help you work on an application with low test coverage. You know, the dreaded legacy systems.
What Is a Gold Copy?
In essence, a gold copy is a set of test data. Nothing more, nothing less. What sets it apart from other sets of test data is the way you use and guard it.
- You only change a gold copy when you need to add or remove test cases.
- You use a gold copy to set up the initial state of a test environment.
- All automated and manual tests work on copies of the gold copy.
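To make the last rule concrete, here is a minimal sketch of "work on a copy, never on the gold copy itself". It assumes, purely for illustration, that the gold copy is a single SQLite database file. The `customers` table and the helper names are hypothetical, not part of any particular tool.

```python
import shutil
import sqlite3
import tempfile
from pathlib import Path


def create_demo_gold_copy(path: Path) -> None:
    """Build a tiny stand-in gold copy (hypothetical schema, for illustration only)."""
    con = sqlite3.connect(path)
    con.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    con.execute("INSERT INTO customers VALUES (1, 'Customer without orders')")
    con.commit()
    con.close()


def make_working_copy(gold_copy: Path, work_dir: Path) -> Path:
    """Copy the gold copy into work_dir; tests only ever touch the returned file."""
    work_dir.mkdir(parents=True, exist_ok=True)
    working_copy = work_dir / gold_copy.name
    shutil.copyfile(gold_copy, working_copy)
    return working_copy


def count_customers(db_path: Path) -> int:
    """Return the number of rows in the hypothetical customers table."""
    con = sqlite3.connect(db_path)
    (count,) = con.execute("SELECT COUNT(*) FROM customers").fetchone()
    con.close()
    return count


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    gold = root / "gold.db"
    create_demo_gold_copy(gold)

    # A test run gets its own copy and is free to change or destroy it.
    work = make_working_copy(gold, root / "run-1")
    con = sqlite3.connect(work)
    con.execute("DELETE FROM customers")
    con.commit()
    con.close()

    print(count_customers(work), count_customers(gold))
```

Whatever a test does to its working copy, the gold copy keeps its one customer. With a database server instead of a file, the same idea would apply, but the copy step would be a restore or a script run rather than a file copy.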
A gold copy also functions as the gold standard for all your tests and for everybody testing your application. It contains the data for all the test cases that you need to cover all the features of your product. It may not start out as comprehensive, but that’s the goal.
Building a comprehensive gold copy isn’t easy or quick. But it’s definitely worth it, and it trumps using production data almost every time.
Why You Don’t Want to Test in Production
Continuous delivery adepts rave about testing in production. And yes, that has enormous benefits. However:
- It requires the use of feature toggles to restrict access to new features and changed functionality.
- Running the automated tests in your builds against a production environment is not going to make you any friends.
- The sheer volume of production data is usually prohibitive for a timely feedback loop.
- Giving developers access to production data can violate privacy and other data regulations.
There’s more:
- Production data changes all the time, and its values are unpredictable, which makes it unsuitable as a base for automated testing.
- Finding appropriate test data in production is a challenge. Testing requires edge cases, whereas users, and thus their data, tend to be much more alike than they would like to know.
- To comply with privacy and other data regulations, extracts need to be anonymized and masked.
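To give an idea of what masking an extract involves, here is a small sketch. It assumes a hypothetical customer record with `name` and `email` fields and replaces them with tokens derived from a salted hash, so the same input always maps to the same masked value. Treat it as an illustration of the mechanics only. Whether a scheme like this satisfies the regulations that apply to you is a question for your compliance people, not for a code sample.

```python
import hashlib


def pseudonymize(value: str, salt: str) -> str:
    """Derive a short, stable token from a salted SHA-256 hash of the value."""
    digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
    return digest[:12]


def mask_customer(row: dict, salt: str) -> dict:
    """Return a copy of row with the personal fields masked; other fields are kept."""
    masked = dict(row)
    token = pseudonymize(row["email"], salt)
    masked["name"] = f"Customer {token}"
    # example.test uses a reserved top-level domain, so no real mailbox is hit.
    masked["email"] = f"{token}@example.test"
    return masked


# Hypothetical production row, made up for this example.
production_row = {"id": 7, "name": "Jane Doe", "email": "jane@real-domain.com", "country": "NL"}
print(mask_customer(production_row, salt="keep-this-secret"))
```

Deriving both masked fields from one token keeps a record internally consistent, and using the same salt across tables keeps references between records intact.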
Contrived Test Data Isn’t Half as Bad as It Sounds
Calling an example contrived usually means that you wouldn’t encounter it in the real world. However, when it comes to testing, contrived is what you want. A contrived set of test data:
- has only one purpose—verifying that your application works as intended and expected and that code changes do not cause regressions
- contains a limited amount of data, enabling a faster feedback loop even for end-to-end tests
- can be made to be self-identifying and self-descriptive to help understand what specific data is meant to test
- contains edge cases that will trip you up in the real world but are generally absent from production data by their very definition
- can be built into a comprehensive, optimized, targeted set of data that fully exercises your application
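As a rough illustration of self-identifying, self-descriptive data with deliberate edge cases, consider the sketch below. The order structure, the `TC-ORD-nn` naming scheme, and the specific cases are all made up for this example. The point is only that each record announces which test case it belongs to.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class OrderRecord:
    order_id: int
    customer_name: str  # doubles as the description of the test case
    quantity: int
    unit_price: Decimal


# Hypothetical contrived data: one record per test case, edge cases included.
CONTRIVED_ORDERS = [
    OrderRecord(1001, "TC-ORD-01 typical single-item order", 1, Decimal("19.99")),
    OrderRecord(1002, "TC-ORD-02 zero quantity", 0, Decimal("19.99")),
    OrderRecord(1003, "TC-ORD-03 free item", 3, Decimal("0.00")),
    OrderRecord(1004, "TC-ORD-04 apostrophe in name O'Brien", 2, Decimal("5.00")),
]


def find_case(case_id: str) -> OrderRecord:
    """Look up a record by the test case ID embedded in its customer name."""
    for order in CONTRIVED_ORDERS:
        if order.customer_name.startswith(case_id + " "):
            return order
    raise KeyError(case_id)


print(find_case("TC-ORD-02"))
```

When a test fails on order 1002, nobody has to guess what that record was meant to exercise.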
Of course, production data can be manipulated to achieve the same. But extracting it stresses production, and manipulating it takes time and effort. And you really don’t want to be doing that again and again and again.
That’s why you combine contrived data and gold copies. You start your gold copy with an extract from production data that is of course anonymized and otherwise made to conform to privacy and data regulations. Over time, you manipulate it into that optimized, targeted set of data. But using that initial set of test data as a gold copy will bring you benefits immediately.
Benefits of Gold Copies
In addition to the benefits of contrived data, using a gold copy gets you these benefits:
- You can easily set up a test environment with a comprehensive set of test data.
- You can easily revert the data in a test environment to its original state.
- You can automate spinning up test environments.
- You can add automated regression testing to legacy systems.
Everyone working on your application will appreciate it. They no longer have to hunt for good data to use in their test cases. And they no longer have to create test data themselves. A good thing, because creating test data and tests that produce false positives (i.e., tests that succeed when they should fail) is incredibly easy. You only have to use the same values a tad too often.
The ability to automate spinning up a test environment is what makes using a gold copy so invaluable for large development shops and shops that need to support many different platforms. Just imagine how much time and effort you save when you can automatically provide teams and individuals with comprehensive, standard test data, for example by using containers and a test data management tool like Enov8’s.
Finally, gold copies can help reduce the headaches and anxiety of working with legacy code. Here’s how.
Slaying the Dreaded Legacy Monster
Any system that does not have enough automated unit and integration tests guarding it against regressions is a legacy system. Such systems are hard to change without worrying about breaking something.
The lack of tests, especially the lack of unit tests, allowed coding practices that now make it hard to bring a legacy system under test, because bringing it under test requires refactoring the code. And you can’t refactor with any confidence if you have no tests to tell you if you broke something.
Fortunately, a gold copy can bail you out of this one. It allows you to add automated regression testing by using the golden master technique. That technique takes advantage of the fact that any application with value to its users produces all kinds of output.
Steps in the Golden Master Technique
How you implement the golden master technique depends on your environment. But it always follows the same pattern, and it always starts with a gold copy.
- Use your current code against the gold copy to generate the output you want to guard against regressions. For example, a CSV export of an order, a PDF print of that order, or even a screenshot of it.
- Save that output. It’s your golden master.
- Make your changes.
- Use your new code against the gold copy to generate the “output under test” again.
- Compare the output you just generated to your golden master.
- Look for and explain any differences.
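The steps above can be sketched in a few lines. This is a minimal illustration, not a full tool. It assumes the output you guard is text (a CSV export, say) and that some function produces it from the gold copy. The function names and the file name `order.csv` are hypothetical.

```python
import difflib
from pathlib import Path
from typing import Callable, List


def record_golden_master(generate: Callable[[], str], master_path: Path) -> None:
    """Steps 1 and 2: run the current code and save its output as the golden master."""
    master_path.write_text(generate(), encoding="utf-8")


def compare_to_golden_master(generate: Callable[[], str], master_path: Path) -> List[str]:
    """Steps 4 and 5: run the changed code and diff its output against the master.

    Returns unified-diff lines; an empty list means the outputs are identical.
    """
    expected = master_path.read_text(encoding="utf-8").splitlines()
    actual = generate().splitlines()
    return list(
        difflib.unified_diff(
            expected,
            actual,
            fromfile="golden master",
            tofile="output under test",
            lineterm="",
        )
    )


# Usage sketch (export_order_csv stands in for your own export code):
#   record_golden_master(export_order_csv, Path("order.csv"))   # before the change
#   ...make your changes...
#   for line in compare_to_golden_master(export_order_csv, Path("order.csv")):
#       print(line)                                              # step 6: explain these
```

For binary output such as PDFs or screenshots you would need a different comparison, but the record-then-compare shape stays the same.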
If you were refactoring, which by definition means there were no functional changes, the comparison should show that there are no differences.
If you were fixing a bug, the comparison should show a difference. The golden master would have the incorrect value, while the output from the fixed code would have the correct value. No other differences should be found.
If you were changing functionality, you can expect a lot of differences. All of them should be explicable by the change in functionality. Any differences that cannot be explained that way are regressions.
Explaining the differences requires manual assessment by a human, which is known as the “Guru Checks Output” anti-pattern. It needs to be done on every test run if you want to stay on top of things. Marking differences as expected can help, especially when you can customize the comparison so it won’t report them as differences.
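One way to mark differences as expected is to keep a list of changes that someone has already reviewed and filter those out of the report. The sketch below assumes line-based text output and a deliberately naive line-by-line comparison, so an inserted line would shift everything after it. The names are hypothetical, and a real comparison would likely need something smarter.

```python
from itertools import zip_longest
from typing import List, Optional, Set, Tuple

# (line number, line in golden master, line in output under test)
Difference = Tuple[int, Optional[str], Optional[str]]
# (line in golden master, line in output under test) that a human has approved
Approved = Set[Tuple[Optional[str], Optional[str]]]


def find_differences(master: List[str], actual: List[str]) -> List[Difference]:
    """Compare line by line; a line missing on either side shows up as None."""
    return [
        (number, old, new)
        for number, (old, new) in enumerate(zip_longest(master, actual), start=1)
        if old != new
    ]


def unexpected_differences(differences: List[Difference], approved: Approved) -> List[Difference]:
    """Drop the differences that were reviewed and marked as expected."""
    return [d for d in differences if (d[1], d[2]) not in approved]


master_lines = ["id,total", "1,10.00", "2,5.00"]
actual_lines = ["id,total", "1,10.50", "2,5.00"]

# The bug fix is supposed to change the total of order 1, so that change is approved.
approved_changes = {("1,10.00", "1,10.50")}

remaining = unexpected_differences(find_differences(master_lines, actual_lines), approved_changes)
print(remaining)
```

Anything still in the report after filtering is a difference nobody has explained yet, which is exactly what the guru should be looking at.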
Go Get Yourself Some Gold
Now that you know what a gold copy is and how you can use it to your advantage, it’s time for action. It’s time to start building toward the goal of a comprehensive set of test data and use it as a gold copy.
Your first step is simple: save the data from the test environment you set up for the issue or feature you’re working on now. That is going to be your gold copy. If your application uses any kind of SQL database, you could use its tooling to generate a SQL script of DML statements (the INSERTs that recreate the data) that you can add to a repository.
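How you generate that script depends on your database. As one possible sketch, assuming a SQLite database, Python's standard `sqlite3` module can dump a database as SQL text. Note that this dump contains the schema (CREATE TABLE statements) as well as the INSERT statements. The function name and file names are hypothetical.

```python
import sqlite3
from pathlib import Path


def dump_to_sql_script(db_path: Path, script_path: Path) -> None:
    """Write the contents of a SQLite database to a SQL script file."""
    con = sqlite3.connect(db_path)
    try:
        # iterdump() yields the SQL statements needed to recreate the database.
        script_path.write_text("\n".join(con.iterdump()) + "\n", encoding="utf-8")
    finally:
        con.close()


# Usage sketch:
#   dump_to_sql_script(Path("test_env.db"), Path("gold_copy.sql"))
#   then commit gold_copy.sql to your repository
```

Other database systems ship their own export tools for the same job. A text script has the advantage that changes to the gold copy show up as readable diffs in version control.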
Use your gold copy to set up the test environment for your next issue. Make sure you don’t (inadvertently) change your gold copy while you’re working on that issue. When you’re finished, and if you needed to add test data for the test cases of this issue, update your gold copy.
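A sketch of that routine, again assuming SQLite and a gold copy stored as a SQL script: one helper builds a fresh test database from the script, and another computes a fingerprint of the script so you can check that it did not change while you worked. Both helper names are made up for this example.

```python
import hashlib
import sqlite3
from pathlib import Path


def load_gold_copy(script_path: Path, db_path: Path) -> None:
    """Create a fresh test database from the gold copy script, replacing any old one."""
    if db_path.exists():
        db_path.unlink()
    con = sqlite3.connect(db_path)
    try:
        con.executescript(script_path.read_text(encoding="utf-8"))
    finally:
        con.close()


def fingerprint(path: Path) -> str:
    """Return a SHA-256 checksum of the file, used to detect accidental edits."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


# Usage sketch:
#   before = fingerprint(Path("gold_copy.sql"))
#   load_gold_copy(Path("gold_copy.sql"), Path("test_env.db"))
#   ...work on the issue, run tests against test_env.db...
#   assert fingerprint(Path("gold_copy.sql")) == before, "gold copy was changed"
```

If the gold copy lives in version control, the repository already tells you when it has changed. The checksum is just a cheap way to catch it inside an automated run.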
Rinse and repeat, and soon enough you’ll be well on your way to a truly useful comprehensive set of test data.
Author: Marjan Venema
This post was written by Marjan Venema. Marjan’s specialty is writing engaging copy that takes the terror out of tech: making complicated and complex topics easy to understand and consume. You’ll find samples on her portfolio. Her content is optimized for search engines and attracts more organic traffic for small businesses and independent professionals in IT and other tech industries, whom she also helps with content audits and strategy.