
Why Configuration Management Is at the Heart of ITIL

For many organizations, IT starts small and grows. They don’t plan out how their IT organization will interact with the rest of the business. Instead, they hire a person or two to handle a few computers and maybe set up a server. Over time, those roles grow alongside the business. Eventually, IT leadership recognizes that the business needs more out of the IT organization than what they’re providing. Sometimes it’s because customers aren’t able to get the hardware or software they need.

Whatever the cause, many organizations come to realize that their IT organization just isn’t cutting it.

In many instances, those organizations choose to adopt ITIL, the Information Technology Infrastructure Library.


What’s ITIL?

In the 1980s, the British government established the Information Technology Infrastructure Library (ITIL). In the decades since, they’ve updated it repeatedly. It defines a series of best practices that help IT organizations deliver high-quality IT solutions to their business. ITIL is actually a very large set of guidelines; the original library was more than thirty books! Even though it’s changed many times throughout the years, ITIL still has a core focus on some key principles. Chief among those principles is the idea that IT organizations should focus on providing value, work iteratively, and start from where they are.

This means that organizations shouldn’t have to drastically re-organize the way they do business to adopt ITIL best practices. Instead, they should look at how they’re providing value already. They should then identify ways they can provide more value to the business, and implement those changes over time, a little bit at a time. Short, achievable goals can mean that the business entities who rely on the IT organization see constant improvement, instead of waiting for big, difficult projects that may or may not deliver.

A common early step in this process is to implement a configuration management database.

What’s Configuration Management?

Configuration management is the process of storing information about the IT resources within your organization in a centralized repository. Usually, this takes the form of a relational database. As the name implies, you also store information about the configuration of the system inside that database.
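
To make that concrete, here is a minimal sketch of the kind of information a single configuration item (CI) record might hold. It is expressed as a Python data structure purely for illustration, not as an actual CMDB schema; the class name, field names, and example values are all assumptions.

    from dataclasses import dataclass, field

    @dataclass
    class ConfigurationItem:
        """One record in a hypothetical configuration management database (CMDB)."""
        name: str                 # e.g. "payroll-db-01"
        ci_type: str              # e.g. "server", "laptop", "application"
        owner: str                # team or person responsible for the asset
        location: str             # data center, office, or cloud region
        configuration: dict = field(default_factory=dict)  # OS, patch level, installed software
        related_items: list = field(default_factory=list)  # names of CIs this one depends on

    # Example record for a business-critical server (invented values)
    payroll_db = ConfigurationItem(
        name="payroll-db-01",
        ci_type="server",
        owner="Finance IT",
        location="Sydney DC",
        configuration={"os": "Ubuntu 22.04", "patch_level": "2024-06"},
        related_items=["payroll-app-01"],
    )

Whatever tool you choose, the point is the same: every asset gets one authoritative record, including how it is configured and what it depends on.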

Starting your configuration management project can feel a bit like you’re starting in the deep end. Even in businesses with fewer than 100 employees, it’s likely you have a lot of IT resources. To do configuration management right, you need to find every one of those resources! That said, you should plan to treat creating a configuration management database like any other project. Plan how you’ll undertake asset discovery. Evaluate options for the configuration management database software. Define a realistic picture of success. Then, put that plan into action and execute it to completion.

How Does Configuration Management Highlight Value of Your Assets?

As noted, configuration management takes all the information about your business’s IT assets and brings it to one place. This is a benefit. When information is spread out across multiple silos, it’s difficult to find what you need. If a critical system needs to be replaced, it’s a lot easier to do when you know how that system is supposed to be configured. Likewise, a breakdown in a critical system is much easier to fix when you know how that system works.

Configuration management projects bring additional benefits to IT organizations. It’s common for IT leadership to discover assets they didn’t know existed during the asset discovery phase of a configuration management project. Usually, these assets were quietly doing their jobs, but they were unsupported by IT. Often, IT organizations discover that these business-critical assets haven’t received updates in years, which is a serious security risk. Identifying those assets and establishing a proper support plan is a great side effect of configuration management projects.

How Does Configuration Management Optimize the Value of Your Assets?

Another way that configuration management provides value is by optimizing your IT assets. Once you know where all your IT assets are and how they’re performing, you can standardize on optimal configurations for all of your assets. Configuration management means you know which laptops perform the best for which employees. It’s easy to spot which servers have non-standard configurations when all your configurations are in a single place. Your IT organization can provide value by helping your users get the most out of their systems by standardizing on high-performance configurations.

Finally, IT organizations can minimize the amount of time they spend keeping systems up to date. With standard configurations on all systems, activities like applying patches become a one-step process. That means your business is more secure while your IT organization spends less time updating systems.

How Can We Make Configuration Management Successful?

Even though the goal of configuration management is to store all of the information about your IT assets in one place, the project doesn’t need to be monolithic. You can approach it piecemeal. Instead of trying to gather information about every asset across the whole company, focus on one division at a time. Work with the employees there to identify IT assets and how those assets are configured. Once you’ve done this, train those employees to work with the configuration management system. This means that when they need to change configurations or add new systems, they’ll know how to work with your IT team.

This kind of iterative approach pays off in more ways than one. Not only will you break the project into manageable chunks, but you’ll also learn along the way. It’s guaranteed that you’re going to do some things wrong at first. Instead of doing all those things wrong across the whole organization, you can limit your mistakes to just a few employees. Those employees will be able to provide feedback to your team, and that feedback will mean that your project will do better. This ties into one of the core principles of ITIL, which is being iterative in your processes. You should learn from each step of your implementation to make the next one better.

Another way to be successful is to pick high-quality software. When you centralize your configurations, you want to use software that’s simple and straightforward. Choosing a quality implementation platform like Enov8 will save your team hundreds of hours and make the software easier to use for your business.

How Is Configuration Management the Heart of ITIL?

Good configuration management plays directly into the values that are at the heart of ITIL. It not only provides value to the business, but it makes life easier for IT employees too. You can approach configuration management as an iterative process, implementing it one step at a time. That might start with a basic database that tracks laptops and servers, and wind up with a system that tracks items all the way down to the component level. The heart of ITIL is that your team makes those choices.

ITIL isn’t a monolith. The goal isn’t to say that every organization should implement each part of the system. And at no point should you expect that everyone will implement each system in the same way. You should optimize the implementation for what your business needs. Your first step will always be sitting down with stakeholders in your business and determining what will work best for your team and theirs. That’s the heart of ITIL, and good configuration management is one step on the way to making your IT organization better for everyone.

Author Eric Boersma

This post was written by Eric Boersma. Eric is a software developer and development manager who's done everything from IT security in pharmaceuticals to writing intelligence software for the US government to building international development teams for non-profits. He loves to talk about the things he's learned along the way, and he enjoys listening to and learning from others as well.


5 Reasons IT Service Management is Failing

Today’s IT organizations are busier than ever. They process more data, employ more people, and empower more businesses than at any other time in history. This growth in IT power and responsibility highlights the necessity that IT organizations build upon good processes. Many organizations turn to ITIL and IT service management to provide structure to their IT organization. While ITIL is a terrific framework for managing IT organizations, it’s not a silver bullet. Simply knowing about ITIL and using it to structure your IT organization isn’t enough to ensure success.

If you’re concerned that your IT service management processes might be failing your team or your business, read on. I’ve laid out five red flags that will help you detect if IT service management is failing.

 


#1: You’re Not Properly Scoping Changes

A common mistake among IT organizations is failing to set realistic targets for success of new processes. This can take several different forms. All of them are quite damaging to your business.

One form is scoping that may be insufficiently measurable. For instance, leadership doesn’t provide any specific targets but merely sets a goal that things will “get better.” A goal that relies on relative measures of success like “getting better” means measurement will be subjective. Subjective measurements involve the perception of stakeholders, which can be easily swayed by variations in day-to-day service. You don’t want the business to perceive your team as failing because the CTO’s laptop just happened to have a faulty hard drive the day before an organizational review of Service Management objectives.

Another form is scoping that’s too ambitious. An example might be an IT service manager setting a service level agreement that says you’ll resolve all incidents in one hour. That’s not a realistic timeline. Setting unrealistic timelines for employees degrades morale and makes those goals seem meaningless.

The opposite problem can also be trouble for an IT organization. It’s no good to set goals that won’t accomplish anything at all. Setting a goal that’s too loose means your organization won’t need to change to improve and will fail to provide value to the business.

#2: You’re Using the Wrong Tools

While ITIL is primarily focused around creating good processes for your IT organization, tooling is still very important. Regardless of your role in your business’s IT organization, you need the right information at the right time to do your job. High-quality IT service management software is regularly underrated as a part of a good IT service management implementation. It’s not just about getting the right information to the right people. It’s also about making sure that software is easy to use for business users. An effective IT service management implementation puts customers in a position to succeed, even when other parts of the IT organization are failing.

One way to identify failing tools is by looking for common pain points. Spend some time with key users of your IT service management software. Do they regularly have a hard time finding things? Is their time spent trying to make sure they don’t “mess up” the software? Does the software itself suffer regular outages?

If you suspect that your IT service management tools aren’t living up to their promises, you might want to check out a new platform like Enov8. You may find that you can easily cover gaps in your processes with software instead of painful changes on the process side.

#3: You’re Thinking About Incidents Wrong

One way IT service management systems regularly fail their users is by focusing too much on fixing problems. I know, that seems like an odd response. The truth is that sometimes IT organizations can focus too much on fixing their own problems over solving problems for the business.

IT organizations regularly think about incidents as engineering problems while users think about them as an inability to get work done. I really like the analogy of a broken light bulb. An IT organization sees the broken light bulb as the problem. The business doesn’t see it that way. Instead, they feel that the problem is that they’re trying to work in the dark. Engineers might spend days trying to get a new light bulb to users while a much simpler fix would simply be to open the window shades.

IT service management works best when it focuses on delivering the results the business needs. Oftentimes, that requires quality engineering, but engineering should never be the primary concern. If your team looks at a new incident and immediately jumps to figuring out the technical cause, your IT service management implementation is probably failing. Focus first on fixing the problem for the business before trying to fix the root cause.

#4: Your Processes Are Too Complicated

IT service management is about putting processes into place in order to solve problems for the business. This is a worthy goal! Unfortunately, lots of times organizations lose that vision in the day-to-day running of the team. Something goes wrong as part of incident response, so they add a new step to a process. That new step for the process solves one problem but creates another problem that isn’t immediately apparent. When a problem crops up from that new change, the team adds another step.

You can see where this is going. In trying to fix lots of little problems encountered by your IT service management implementation, you’ve created one big one. Your processes have become much too complicated. The consequences of over-complicated processes are numerous. Employees don’t know what to do while dealing with problems. Management can’t easily understand the state of any given incident’s response. The business is stuck suffering from open issues. Resist the urge to add a new part of the process every time you encounter a problem. If you’re in the habit of doing this, look for what steps of the process you can remove to simplify it.

#5: You’re Not Focused on People

ITIL books and training focus a lot on processes and systems. That’s necessary because people writing books or designing training don’t know the people in your business. But the truth of the matter is that those people are the reason for IT service management. The goal is to make their lives easier. It’s not about implementing a specific process or creating the perfect architecture.

At the end of the day, the true measure of success is whether your IT organization makes working for your business better. Successful IT service management implementations spend a lot of time thinking about their users. They talk with them and listen to the problems those users are facing. Unsuccessful implementations get bogged down by worrying about metrics and tweaks to the process.

The Hardest Part Is Recognizing the Problem

Most IT service management implementations don’t fail because of malice. They don’t fail because of incompetence on the part of the team. Those implementations fail because the team didn’t recognize the warning signs of failure before it became entrenched within their system. The IT organizations pursued their implementation with the best of intentions but didn’t know they were headed toward failure. If you recognize some of these issues within your organization, it’s not too late to start fixing them. It’ll require diligence and critical thinking, but you can absolutely be successful.

Author Eric Boersma

This post was written by Eric Boersma. Eric is a software developer and development manager who's done everything from IT security in pharmaceuticals to writing intelligence software for the US government to building international development teams for non-profits. He loves to talk about the things he's learned along the way, and he enjoys listening to and learning from others as well.


Reasons Enterprise Configuration Management Is Failing


Enterprise configuration management (ECM) is a big topic. Projects that aim to implement ECM are, by their nature, significant endeavors. You’re trying to distill all of your organization’s core configurations into a single source. That’s difficult under the best of circumstances, and like most large projects, ECM projects can fail quickly. When they do fail, they tend to do so spectacularly. As a project or people manager, you’re responsible for big parts of your ECM project. You want to make sure that it’s going to be a success. How can you tell whether your ECM project is in danger of failing?

We’ve collected five major “issues” that will tell you whether your project is on the wrong track.


Issue #1: Eroding Trust

ECM projects, as we’ve noted, are massive undertakings. By necessity, they mean working with people outside of your own team. Different teams within an organization often have different overall goals and varying incentives to meet those goals. Sometimes those goals run contrary to the success of a big, organization-wide project like an ECM project. For a project like this to succeed, teams and individual team members sometimes need to set aside their own goals.

In order to set aside their goals and work together, the teams involved need to trust each other, and that’s true for ECM projects too. They have to be able to trust that the work they’re doing together will be more beneficial to the company in the long run than if they were to prioritize their own team goals.

If you identify that there are trust issues between your teams or members of those teams, that’s a big red flag for your project. It means that those teams, when stressed, do what’s in their best interest, not in the interest of the whole organization.

Issue #2: Losing Sight of the Bigger Picture

Losing sight of the bigger picture is a facet of Issue #1, eroding trust. But it’s not exactly the same thing. While people lose trust in other teams when they lose sight of the bigger picture, losing context doesn’t always mean losing trust.

Sometimes losing sight of the bigger picture produces people who get too focused on one detail of the project. That can manifest in lots of ways. Maybe they’re overly invested in one small part of your implementation. It could be that they’re hyper-critical of a decision that’s been made while ignoring the bigger implications. Whatever the reason, they’re a challenge to work with because they’re too focused on small details. Since ECM stretches across your organization, you need people who see and promote enterprise-wide IT intelligence.

One or two of those kinds of people in your project aren’t going to kill it. But if you’ve got half a dozen, you’re going to have a hard time. Identifying those people and working out their problems early on is important to your implementation’s success.

Issue #3: Things Stop Improving

Any long-term project needs continuous improvement to be successful. If you and your teams already knew how to do all of this, your project would be done already. Stagnation is the enemy of a good ECM implementation. Even if you’ve already delivered your initial ECM project, you need to constantly search for ways to improve what you’re delivering to the business.

Realistically, no matter how good your ECM implementation is, it’s not perfect. There are always ways you can improve, whether as a team or in terms of your technology, because the state of the art is constantly advancing. You need to be able to advance with it. Continuous improvement not only improves your efficiency, but it boosts employee morale too. High employee morale and a good sense of the cutting edge make it easy to recruit talented new employees. It also becomes much easier to retain the good employees you have.

The converse is also true. If you’re stagnating, you’re going to lose your best employees. It’ll be harder to attract top talent because you’re working with outdated processes and technology. That sort of thing leads to a snowball effect: it’s harder to attract top talent, which means it’s harder to tackle bigger challenges, and the cycle repeats.

Issue #4: Users Don’t Use ECM

This issue is a little tougher to detect. By definition, if your users are working around your ECM system, they’re trying to make sure you don’t know about it. That’s not good! If you find out that people are working around your ECM system to store essential configuration in some other way, that’s a warning sign that there’s something wrong with your implementation. If you find someone who’s doing that, it should be a red flag that you need to find a way to improve your systems.

There’s good news, though. If you find someone who’s working around your system, you know just who to ask to make things better. Instead of getting upset that someone is working around your system, your new knowledge presents you with an opportunity. Take the time to sit down with them and ask why it is that the system isn’t working for them. You might not be able to alleviate every issue they have, but you can help make things better. If one person is working around your system, they’re probably not alone.

Because an ECM is supposed to be a central repository of configuration, anyone working around the system is a small failure. The key for you as a dedicated employee is to figure out just how pervasive those workarounds are. You might find, after a bit of investigation, that the employee in question is just lazy—that’s not a red flag. But if you do find that they’re working around your system for good reasons, your ECM implementation is in trouble. Use that as an opportunity to get out there and improve your team and your process.

Issue #5: Lack of Management Buy-in

Even if you avoid every other red flag on this list, a lack of management buy-in will debilitate any project. This issue can also be the root cause of many of the other red flags on this list. If managers aren’t bought in, they’ll task their employees with different priorities than what your project needs to succeed. They might erode the time your team needs to think critically about what you’re doing and improve your processes. Another manager who isn’t bought in will fight over tiny details in an attempt to derail the project. A petty manager will run interference for users who are working around systems instead of taking the time to do the right thing.

As the person responsible for the project, it’s your responsibility to make sure managers understand the what and why of ECM. Old wisdom says that a house divided cannot stand; the same is true in business. Your ECM project is going to have a much tougher time if you have to fight other managers within your organization. If you find yourself fighting other managers, you should be worried about the state of your project. That’s the time to start asking those people if there are ways you can prove the usefulness of the project to them. If you can, that’s a great way to get your project back on track.

To Conclude

None of these issues are fatal for your project. If you find one of them cropping up around your implementation, don’t panic. When you find yourself with a red flag, that’s the time to think critically about the choices that brought you to that point. Evaluate whether there are things you can change, and spend time talking to your users and the leads of other teams. They’ll help you determine the root causes of your problems, and knowing what to fix is half the battle.

Author

This post was written by Eric Boersma. Eric is a software developer and development manager who's done everything from IT security in pharmaceuticals to writing intelligence software for the US government to building international development teams for non-profits. He loves to talk about the things he's learned along the way, and he enjoys listening to and learning from others as well.

Test Environment Management Tools Compared

Five years ago, if you were asked to recommend a “Test Environment Management” platform, you might have struggled. In fact, you might have struggled to identify one at all, particularly if you considered your own DevTest teams’ behaviour: lots of disruption, delays, and misconfiguration, plus the inevitable use of spreadsheets for tracking project bookings, MS Visio documents for system information capture, email for reporting, and perhaps, if you were lucky, some test automation for platform health checks. Not exactly elegant nor scalable, but undoubtedly better than complete chaos.

However, things have changed somewhat. With a raft of solutions now claiming to solve this problem, “The Last Frontier of the SDLC,” the question is no longer “what” but “which”: which platform will meet our needs and address one of the SDLC’s biggest “waste areas”?

At TEM Dot we decided to compare six of the biggest players in this space across 10 key areas:

Key TEM Vendors: Apwide, Enov8, Omnium, Plutora, ServiceNow & Xebia

Key TEM Performance Areas

  1. Modelling
  2. Booking Management
  3. Coordination
  4. Ticketing
  5. Health Monitoring
  6. Automation & DevOps
  7. Data Management
  8. Reporting
  9. Extensibility
  10. Affordability

Test Environment Management Tool Scoring

Area-1 Environment Modelling

The ability to know what your Environments and Systems look like.

Historically think Visio or your CMDB (if you have one).

Gold Medal Position:                  

Enov8 & ServiceNow both offer powerful Visual CMDBs & Component / discovery mapping.

Silver:                

Plutora & Xebia offer modelling capability.

Bronze:              

Apwide & Omnium modelling is achieved via tabular forms.

Area-2 Booking & Contention Management

The ability to capture environment requirements & manage contention on Environments & Systems.

Historically think Email & an attached Word document.

Gold Medal Position:              

Enov8 & Plutora offer advanced booking & contention analysis methods.

Silver:                 

Apwide & ServiceNow offer booking request (via ticketing) capability.

Bronze:              

Xebia has no obvious environment booking or contention mechanism.

Area-3 Environment Coordination

Tracking Events & Release activity across space (Environments) & time (Month, Year etc).

Historically think MS Project plans.

Gold Medal Position:   

Apwide, Enov8, Plutora, ServiceNow offer Environment & Release based calendaring.

Note: Enov8 & Plutora offer Runsheets /Implementation Plans (respectively).

ServiceNow offers checklists.

Silver:                 

Xebia – Calendaring is release centric (as opposed to environment centric).

Bronze:              

Omnium (limited capability identified).

Area-4 Ticketing

Ticketing / IT Service Management to capture Environment Change Requests, Incidents, etc.

Historically think Remedy.

Gold Medal Position:              

ServiceNow has advanced ITSM methods.

Silver:                 

Apwide (using Jira), Enov8, Plutora have solid Ticketing / Requests functionality.

Bronze:              

Omnium & Xebia dependent on other tools.

Area-5 Health Monitoring

The ability to check that Systems, Components, or Interfaces are up.

Historically think Test Automation scripting or your server monitoring solutions like Zabbix.

Gold Medal Position:                  

Enov8 & ServiceNow offer integration methods & native agents to monitor health.

Silver:

Apwide & Plutora have APIs that logically allow system health updates.

Bronze:               

Omnium & Xebia don't play in this space.

Area-6 Automation & DevOps

The ability to automate key Environment Operations using code.

Think Jenkins or Puppet Jobs.

Gold Medal Position:                  

Xebia is a powerful release orchestrator (its primary purpose).

Silver:                 

ServiceNow Orchestration automates IT & Business Processes.

Enov8 offers “agnostic” Scripting Hub (Visual Orchestrate), Webhooks & URL Triggers.

Bronze:              

Apwide integration is very basic, but automation can be achieved with Get/Post methods.

Plutora needs other tools (like Dell Boomi) to automate/integrate properly. The SaaS-only option can also be limiting.

Omnium integrates with other tools to automate.

Area-7 Data Management

The ability to manage one’s data, e.g. extracting data, masking data, provisioning data, etc.

Think Compuware File-Aid.

Gold Medal Position:      

Enov8 seems to be the only solution (albeit a side solution) for Test Data. Enov8 offers support for Data (PII/Risk) Profiling & Masking and Data Bookings. Enov8’s Visual Orchestrate can also be used to schedule other Data Tools.

Silver:                 

Xebia & ServiceNow capabilities are limited but they can leverage their orchestrators.

Bronze:              

Apwide, Omnium & Plutora don’t appear to play in this space.

Area-8 Reporting

The ability to get & share insights about your Environments.

Historically think drawing pretty pictures & graphs with PowerPoint.

Gold Medal Position:                

A lot of the tools have solid reporting; however, the focus here is Environments, so no Gold Medal yet.

Silver:                 

Enov8 seems to have the best out-of-the-box Environment dashboards, but needs simpler customization.

ServiceNow Environment dashboards are limited but ultimately extensible.

Xebia has some solid reports, but they are more deployment focused.

Plutora is reliant on a new “Tableau” extension. Getting there, but it seems disjointed.

Bronze:              

Apwide leverages Jira’s native capabilities.

Omnium’s approach is somewhat “download/export” focused.

Area-9 Extensibility

The ability to have the product do whatever you want.

Think of Salesforce or SAP.

Gold Medal Position:                

ServiceNow – An Extensible Engine. You can use it to build anything.

Enov8 – An “Object Oriented” Extensible Engine. You can use it to build anything.

Silver:                 

Plutora has broad customization features so you can “partially” alter its behaviour.

Bronze:              

Xebia allows customization of your processes but not the platform itself.

With Apwide & Omnium you basically get what you get.

Area-10 Cost

The money ball question. And potentially the most important for some.

Gold Medal Position:           

Low Cost of Entry – Apwide, Enov8 (Free Team Edition) & Omnium

Silver:                 

Medium – Plutora & Xebia

Bronze:             

Expensive - ServiceNow

The "Test Environment Management Tool" Score Card 


Final TEM Tool Positions

Position – Player – Findings

#1 – Enov8 – Very much a Test Environment centric solution.
#2 – ServiceNow – An extensible ITSM solution, expensive but powerful.
#3 – Plutora – More focused on Release Planning.
#4 – Apwide – Simple & elegant TEM/Release tool that has its place at the table.
#4 – Xebia – More focused on Continuous Delivery.
#5 – Omnium – Inexpensive and will be the right fit for some.

 

Note: Scoring was limited to the ten key areas recognised by TEMDOT as the most important for successful Test Environment management. The scores do not reflect broader functionality i.e. functionality that may be deemed more important for your organization. If you feel there are inaccurate statements in this comparison or a tool missing, please reach out using our contact form.

DataOps Explained

Preamble

Companies—especially large internet companies—treat collections of data as an asset. And more and more companies are developing an appetite to leverage their data to compete. There are also increasing customer expectations for the fast release of high-quality products or services.

So how do you balance speed and quality? DataOps is your answer. Let’s take a look at what DataOps is and why it matters.

 

What Is DataOps?

The term DataOps is an abbreviation of the words data operations.

The speed of development and product release has increased in the last 10 years thanks to technologies and practices such as DevOps (development operations). As a result, we have a new problem: data and more data. To help draw insight from loads of raw data, companies use data analytics. Of course, there are various types, such as data mining, that help identify trends, patterns, and relationships in large data sets. Unfortunately, in our need-it-now economy, users of data analytics can’t—or won’t—wait for weeks or months to receive new analytics.

With the increased complexity of the emerging data ecosystem and the need to deliver insights more quickly, a new strategy is essential if we’re to gain value from massive amounts of data.

This is where DataOps comes in. It helps improve the delivery speed and robustness of analytics. In other words, DataOps is an automated, process-oriented methodology that helps analytics and data teams improve the quality of data analytics, as well as reduce its cycle time. To achieve this, DataOps combines agile development, DevOps, and statistical process control.

Similar to how DevOps brought together development and operations teams to handle software delivery problems, DataOps seeks to bring together data practitioners to deliver quality data for applications and business processes.

But do we really need another methodology?

Why DataOps Matters

In our current on-demand economy, a company has to rely on data from various sources to better understand their products, customers, and markets. This all sounds good until you factor in the dynamic nature of data. How do you effectively monitor the flow of a company’s data that includes prediction changes, business anomalies, trend changes, and more?

Someone could argue that we already have analytics to handle all of the data issues. But here’s the problem: Data analytics pipelines are in a deplorable state because of

  • Inadequate automation and orchestration
  • Minimal code and data reuse
  • Or a lack of coordination between the involved parties, such as IT, operations, and even business stakeholders.

In the end, we have poor-quality data that’s delivered too late to meet a business’s needs.

As more and more data is collected, the data pipelines become more complex. At the same time, large, more traditional enterprises realize the need to use all the data their company generates. Such information is becoming important even in everyday decisions.

Needless to say, all of these factors make it necessary for an organization to implement a new approach to govern the flow of data through its life cycle.

And here’s one more reason to consider using DataOps. Companies that have already implemented DevOps practices will find that implementing DataOps gives them a higher competitive edge. This is because the DevOps engineering framework may be regarded as preparation for DataOps. Organizations that rely on data need a similar high-quality and consistent framework that’s useful for fast data analysis.

Implementing DataOps in 7 Steps

DataOps is still a rising approach for data-driven organizations. DataKitchen, a company that developed a DataOps platform for data-driven enterprises, suggests seven steps for implementation. And the good news is you don’t have to discard your existing analytics tools.

Here are the seven steps to implementing DataOps.

Add Data and Logic Tests

This step requires that every time you make changes to an analytics pipeline, you have to add a test for the change. Testing applies to data, models, and logic. The idea is to make sure nothing will be broken in the analytics pipeline. These incremental, automated tests ensure that quality and integrity are built into the final output.
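
As a rough illustration, a data or logic test can be as simple as an automated check that runs whenever the pipeline changes. This is only a sketch: the table name, column names, and thresholds below are invented assumptions, not part of any particular DataOps toolkit.

    import pandas as pd

    def test_orders_extract(orders: pd.DataFrame) -> None:
        """Hypothetical data and logic checks for an 'orders' extract in the pipeline."""
        # Logic test: the extract should never be empty
        assert len(orders) > 0, "orders extract is empty"
        # Data test: the primary key must be non-null and unique
        assert orders["order_id"].notna().all(), "null order_id found"
        assert orders["order_id"].is_unique, "duplicate order_id found"
        # Data test: amounts should fall in a plausible range
        assert (orders["amount"] >= 0).all(), "negative order amount found"

    # Run the checks against a small sample frame
    sample = pd.DataFrame({"order_id": [1, 2, 3], "amount": [9.99, 0.0, 120.50]})
    test_orders_extract(sample)

Checks like these run automatically on every change, so a broken extract is caught before it reaches the final output.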

Use a Version Control System

In order for raw data to produce useful information, it goes through many processing steps. And all of these steps involve coding. In a similar manner to other software projects, the source files that data analysts use in the data pipeline require maintenance in a version control system such as Git. The aim of version control is to help keep track of changes and revisions. Keeping the code in a repository is also important, as it helps when there is a need for disaster recovery.

Branch and Merge

To maintain coding changes, data analytics should borrow the approach that software developers use to maintain their projects, which is to continuously update code source files. For instance, when a developer wishes to make changes, they pull out the relevant code from the repository. Changes are then made on the local copy (also called a branch) pulled from the repository. Once new changes are made and tested, the local copy (branch) is merged back into the repository.

Use Multiple Environments

Data analytics team members should have their own environment to work from. These environments will allow team members to work on subsets of data while isolating the rest of the organization from any effects of the ongoing maintenance or additions to the existing data.

Reuse and Containerize

Breaking down a data analytics pipeline into smaller components facilitates code reuse and containerization. By doing this, the data analytics team can move quickly as they leverage existing libraries or other code whenever they want to extend or develop new code.
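
As a rough illustration of the “smaller components” idea, each pipeline stage below is a small, reusable function that could be tested, packaged, or containerized on its own and recombined into different pipelines. The stage names and logic are invented for the example.

    import pandas as pd

    def extract(path: str) -> pd.DataFrame:
        """Reusable extract stage: read raw data from a CSV file."""
        return pd.read_csv(path)

    def clean(df: pd.DataFrame) -> pd.DataFrame:
        """Reusable transform stage: drop rows with missing values."""
        return df.dropna()

    def summarize(df: pd.DataFrame, group_col: str) -> pd.DataFrame:
        """Reusable aggregate stage: count rows per group."""
        return df.groupby(group_col).size().reset_index(name="count")

    def run_pipeline(path: str, group_col: str) -> pd.DataFrame:
        """Stages compose into a pipeline; each stage can also be reused elsewhere."""
        return summarize(clean(extract(path)), group_col)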

Parameterize Your Processing

Borrowing the idea of parameters from software development will help in designing a robust data pipeline. And a flexible data-analytics pipeline will accommodate varying run-time circumstances.
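
Here is a minimal sketch of what parameterized processing can look like: the same pipeline code handles different run-time circumstances (environment, date range, sample size) through parameters rather than edits to the code itself. The parameter names are assumptions chosen for illustration.

    from dataclasses import dataclass

    @dataclass
    class PipelineParams:
        """Run-time parameters for a hypothetical analytics pipeline."""
        environment: str = "dev"       # "dev", "test", or "prod"
        start_date: str = "2024-01-01"
        end_date: str = "2024-01-31"
        sample_fraction: float = 1.0   # process a subset of data in non-prod runs

    def run(params: PipelineParams) -> None:
        # A real pipeline would read, transform, and load data here;
        # this stub just shows the run-time circumstances being applied.
        print(f"Running against {params.environment} "
              f"for {params.start_date}..{params.end_date} "
              f"on {params.sample_fraction:.0%} of the data")

    # The same code supports a quick dev run and a full production run
    run(PipelineParams(environment="dev", sample_fraction=0.1))
    run(PipelineParams(environment="prod"))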

Use Simple Storage

Simple storage helps make the whole data analytics pipeline readily available, and it eases the updating process.

What About Data Security?

There’s a lot of concern about how to gain insights from raw data in a robust yet fast manner. But we shouldn’t forget the consequences of data breaches across the globe. The cost of mishandling personally identifiable data is becoming too high. As you work toward building more and delivering faster, it’s important to consider the security of the data you handle.

When implementing DataOps, you must protect the data at every stage of its journey. Always keep in mind the bad guys who are ready to grab your data. And don’t forget the issue of accidentally sharing sensitive data that may cause you to fail to meet regulatory compliance.

Thankfully, there are solutions that help take these worries away, such as Data HotSpot—a product specifically designed for those in test data management and those who consume test data. With Data HotSpot, you are assured complete security, customer protection, brand protection, and penalty avoidance. That means you can implement DataOps and stay way ahead of your competitors with real-time or near real-time analytics.

Unlock the Value of Data

Today, there’s a need to make data available in real time or near real time because businesses rely on it to retain a competitive edge. As a result, it became necessary to create analytics methods that can quickly provide data for consumption by users or applications.

DataOps is a multidisciplinary approach that helps data analytics teams overcome the challenges of inflexible and poor-quality data. If an organization can implement DataOps properly, they will experience great improvements in producing robust and adaptive analytics.

As we’ve seen, DataOps matters today because it helps organizations create reliable and readily available data flows. And availability plays an important role in unlocking the value of an organization’s data.

Author: Alice Njenga

This post was written by Alice Njenga. Alice's areas of expertise include technology, artificial intelligence, IoT, cloud computing, security, and telecommunication. She especially enjoys converting dense technical material to articles that are easy for the layman to understand.


Top 5 DevOps Metrics

When people start talking about DevOps, the idea of metrics usually comes along for the ride. To be able to monitor software after release, we need to know what data is important to us. There are so many options, it may seem overwhelming to know where to look. However, we can limit our options based on two key factors: what decisions we’ll make and how customer-focused they are. With that in mind, I’ll share what I believe to be the five most important DevOps metrics.


 

Metrics Are for Decisions

The thing about metrics is that they’re useless on their own. People often say, “We need to track this data!” But you need to ask them only one question: what decisions will you make with that data? You may be surprised how often—usually after some mumbling—the answer is “I don’t know.” Any metric that doesn’t support a decision or set of decisions we may want to make ahead of time is simply noise. We want to eliminate noise from our minds and focus on what guides our decisions for our team.

Customers First, Then Everything Follows

Knowing what decisions our metrics will support is a good start, but it’s not enough. There are millions of decisions we could make about what we’re seeing. We need a North Star, a guiding light, that will be the anchor from which we can derive a strong set of metrics. This anchor is our customers. For any metric we use, we should be able to point back to how it helps our customers. After all, we ultimately owe them our existence.

Top Five Metrics

Without further ado, I give you the top five DevOps metrics you probably should measure for your team:

  • Customer usage
  • Highest and average latency
  • Number of errors per time unit
  • Highest lead time
  • Mean time to recovery

Customer Usage

The first metric on our list is customer usage. This is any measurement that tells us how much our customers, internal or external, are using our features. When delivering new or enhanced features, it’s important to get to production as soon as possible. But we can’t assume customers want or will use a feature just because we put it in production. This is true even if they specifically ask for the feature. We can weigh how popular a feature actually is against how popular someone claimed it would be or what we estimated it would be.

It’s helpful for us to know how often customers use a feature—even one they requested—after we release it to production and inform them of its existence. Customers often think they need something “right away.” This can cause us to scramble, putting this feature on the top of our backlog. The feature might then sit, inert, for weeks or months because the customers reprioritized their desires.

Internal customers commonly are on a longer cadence, unable to use the feature until they get to it in their own backlog. Tracking customer usage allows us to say, “I know you said this is really urgent, but the last time you said that, it took you six weeks to start using it. Please be sure this is as urgent as you say it is.” We can also use this data to enhance the feature, watching usage go up or down, using hypothesis-driven development.

A good application performance monitoring (APM) tool can track this metric for you. It usually comes in the form of request counts or percentage of traffic.

Highest and Average Latency

Knowing how often customers use your features is a great start. But how do we know if customers are delighted or frustrated with our applications? This is a hard question to answer, but our next metric can hint to us that customers may be frustrated. One of the leading causes of frustration is an application’s slowness. When the response time—that is, the latency—is too high, customers are likely to go elsewhere for their needs.

We want to give our applications the best chance to make customers happy. They’ll appreciate it and likely stick around. If you have internal customers, it may be tempting to say, “They have to use my application, so I don’t need to worry about latency.” Putting aside the potential ethics issue of not caring whether your users have a pleasant experience, that mindset is folly. Even if your direct customers are internal, it’s likely that they or a downstream app are responding to external customers. So, slowness for them is still ultimately hurting your organization’s success. Even if this isn’t the case, enough complaints to the right people may get your applications scrapped.

Two major signals to look for when measuring latency are average latency and the slowest five percent or so of requests. Looking at the average gives you a nice bird’s-eye view of the application as a whole. But even one feature or subset of requests can be enough to create disgruntled customers. This is why it’s also important to keep an eye on your slowest requests.

We can decide where to tune performance with this information. An APM tool can handily monitor all of this for you, in addition to usage.
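
An APM tool will normally report these numbers for you, but if you wanted to compute the two signals yourself from raw request timings, a minimal sketch might look like the following. The sample latencies are made up for illustration.

    import statistics

    # Response times for recent requests, in milliseconds (illustrative data)
    latencies_ms = [120, 95, 110, 300, 105, 2400, 130, 98, 115, 1800]

    average = statistics.mean(latencies_ms)

    # "Slowest five percent or so": approximate 95th percentile of the sample
    p95 = statistics.quantiles(latencies_ms, n=20)[-1]

    print(f"average latency: {average:.0f} ms")        # skewed upward by a few slow requests
    print(f"95th percentile latency: {p95:.0f} ms")    # where the worst experiences live

A healthy average can hide a painful tail, which is exactly why both numbers matter.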

Number of Errors Per Time Unit

In the same vein of finding out whether our customers are happy, we have the metric of number of errors per time unit. The benefits of this should be pretty clear. Errors with high business impact not only cost your organization money, but they can erode customer trust. Looking at our error rates helps us nip these in the bud and find abnormalities that even our tests can’t prevent.

Note that I said “errors with high business impact.” Not all errors are created equal. Your error metrics should differentiate between types of errors. Small glitches and errors are unlikely to erode customer trust or cost a lot of money. For example, if the screen is green instead of blue, that usually won’t be a problem for most people. Also, some errors are caused by users and should be expected. User errors are still good to track because they can provide information about how hard a feature is to use. Just be sure to keep them separate in your monitoring tool.

With this metric in hand, we can decide where to enhance our resiliency. If we can’t control the source of an error, we can decide to escalate that error to the appropriate team. For user errors, we can decide where to focus our efforts on increasing usability.

APM tools are also a great fit for this metric.

Highest Lead Time

Ideally, the work you deliver in your team is set up as a value stream, creating a flow of work from inception to customer usage. This lets us easily identify the individual steps it takes for a piece of software, usually a user story, to reach the customer’s hands. Think of it like an assembly line, but for software features. It’s helpful for us to look at the lead time that a user story takes to go through each step. This helps our customers by increasing the speed by which we get features into their hands.

If we adopt a Theory of Constraints approach, there’s always one highest lead time in our value stream. If we keep finding and reducing that highest lead time, we’ll be ever faster in our ability to deliver software. Say, for example, our value stream has a “coding” step and a “QA testing” step. We can record each step as part of a Kanban board and record which user stories are in “coding” versus “QA testing.” At the end of our iteration, we may see that cards sit in “QA testing” for three days on average, whereas cards sit in “coding” for only two days. “QA testing” is our highest lead time. We can then inspect why it takes so long to do QA testing and make improvements from there.

Lead time comprises two factors: process time and wait time. Process time is the time someone is actively doing something with the user story. Wait time is how long the user story sits idle, finished from the previous step and waiting to be picked up by the next step. Knowing both of these values separately will help the team know what actions they can take to improve the lead time. The decisions you may take on this are varied, but it’s good to have a system in place to frequently inspect and adapt to this metric. A sprint retrospective is a great example of such a system. And, as stated earlier, a Kanban board is a great way to track this metric.
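
A tiny worked example of that split, using invented numbers for the board described above, shows why the two components matter separately.

    # Hypothetical timings (in hours) for one user story in each value-stream step
    steps = {
        "coding":     {"process_time": 10, "wait_time": 6},
        "qa_testing": {"process_time": 4,  "wait_time": 20},
    }

    for name, t in steps.items():
        lead_time = t["process_time"] + t["wait_time"]
        print(f"{name}: lead time {lead_time}h "
              f"({t['process_time']}h working, {t['wait_time']}h waiting)")

    # Here "qa_testing" has the highest lead time (24h), and most of it is waiting,
    # so shrinking its queue, not testing faster, is the obvious improvement.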

Mean Time to Recovery

The final metric, mean time to recovery, is somewhat of an extension of our error count metric. While it’s good to know how many errors we’re getting, it’s also important to know how fast we can resolve these errors. This goes back to business impact. Business impact is a function both of how often we receive an error and how long it takes to recover from that error. One error that lingers for minutes could have more impact than 20 errors that last only a few milliseconds.

Having both of these metrics gives us a good line of sight into the business impact of errors. This metric is also a good indicator of how equipped your team is to handle operational issues. It’s an often underinvested portion of a team’s tooling.

We can use this metric to decide where we want to improve our insight into our application, such as by adding more logging context. We can also use this metric to help us decide how to simplify our architecture or make our code more readable.

Many tools specialize in error tracking to make it easy to see how quickly the team resolves issues. Some APM tools also have error tracking features.
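
If you wanted a back-of-the-envelope version rather than a dedicated tool, mean time to recovery is just the average of (resolved minus detected) across incidents. The timestamps below are invented for illustration.

    from datetime import datetime, timedelta

    # (detected, resolved) pairs for recent incidents (illustrative data)
    incidents = [
        (datetime(2024, 6, 1, 9, 0),   datetime(2024, 6, 1, 9, 45)),
        (datetime(2024, 6, 3, 14, 10), datetime(2024, 6, 3, 14, 25)),
        (datetime(2024, 6, 7, 22, 0),  datetime(2024, 6, 8, 1, 0)),
    ]

    durations = [resolved - detected for detected, resolved in incidents]
    mttr = sum(durations, timedelta()) / len(durations)

    print(f"mean time to recovery: {mttr}")  # average of 45m, 15m, and 3h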

Strength in Measurement

The key to good measurement is to understand what decisions we’ll be making. These decisions will be most effective when we center our customers. Drawing from this, we can derive a set of strong metrics that ensure our team operates at its best. With these metrics, no challenges will stand in our way for long.

Author: Mark Henke

Mark has spent over 10 years architecting systems that talk to other systems, doing DevOps before it was cool, and matching software to its business function. Every developer is a leader of something on their team, and he wants to help them see that.


Common DevOps Myths and Misconceptions

“Wait, what actually is DevOps?”

If only I had a dime for every time someone asked me this. For many, the term DevOps comes loaded with misconceptions and myths. Today, we’re going to look at some of the common myths that surround the term so that you have a better understanding of what it is. Armed with this knowledge, you’ll understand why you need it and be able to explain it clearly. And you’ll be equipped to share its ideas with colleagues or your boss.

So, What Is DevOps?

Before we go through the myths of DevOps, we’ll need to define what DevOps actually is. Put simply, DevOps is the commitment to aligning both development and operations toward a common set of goals. Usually, for a DevOps organization, that goal is to have early and continuous software delivery.

The Three Ways of DevOps

DevOps is not a role. And DevOps is not a team. But why?

We’ll get to that in just a moment. But before we explain the myths, let’s build on our definition of DevOps by looking at “the three ways” of DevOps: flow, feedback, and continual learning.

  1. Flow—This is how long it takes (and how difficult it is) for you to get your work from code commit to deployed. Flow is your metaphorical factory assembly line for your code. And achieving flow usually means investment in automation and tooling. This often looks like lots of fast-running unit tests, a smattering of integration tests, and then finally some (but only a few!) journey tests. This test setup is what is known as the testing pyramid. Additionally, flow is usually facilitated by what’s known as a pipeline.
  2. Feedback—Good flow requires good feedback. To move things through our pipeline quickly, we need to know as early as possible if the work we’re doing will cause an issue. Maybe our code introduces a bug in a different part of the codebase. Or maybe the code causes a serious performance degradation. These things happen. But if they’re going to happen, we want to know about them as early as possible. Feedback is where concepts like “shift left” come from. “Shift left” is the idea that we want to move our testing to as early in the process as possible.
  3. Continual Learning—DevOps isn’t a destination. DevOps is the constant refinement of the process toward the early delivery of software. As we add more team members, productivity should go up, not down. Continual learning comes by having good production analytics in place. In practice, this could look like conducting post-mortems following an outage. Or it could look like performing process retrospectives at periodic intervals.

The three ways are abstract, I’ll concede that. But it’s the process of converting these abstract ideas into concrete practices and tools that has created confusion en masse throughout the industry.

So, without further ado, let’s do some myth busting!

Myth 1: DevOps Is a Role

As we covered in the introduction, DevOps is the commitment to collaboration across our development and operations. Based on this definition, it’s fundamentally impossible for DevOps to be a role. We can champion DevOps and we can even teach DevOps practices, but we can’t be DevOps.

Simply hiring people into a position called “DevOps” doesn’t strictly ensure we practice DevOps. Given the wrong organizational constraints, setup, and working practice, your newly hired “DevOps” person will quickly start to look like a traditional operations team member that has conflicting goals with development. A wolf in sheep’s clothing! DevOps is something you do, not something you are.

DevOps is not a role.

Myth 2: DevOps Is Tooling

For me, this is easily the most frustrating myth.

If you’ve ever opened up the AWS console, you know what it feels like to be overwhelmed by tooling. I’ve worked on cloud software for years, and I still find myself thinking, “Why are there 400 AWS services? What do all of these mean?” If tooling is often abhorrent for me, it’s definitely hard for non-technical people.

Why do I find this myth so frustrating? Well, not only is describing DevOps through tooling incorrect; it’s also the fastest way to put a non-technical stakeholder to sleep. And if we care at all about implementing DevOps ideas into our work, we desperately need to be able to communicate with these non-technical people on their terms and in their language. Defining DevOps by cryptic-sounding tooling creates barriers for our communication.

Tools are what we use to implement DevOps. We have infrastructure-as-code tools that help us spin up new virtual machines in the cloud, and we have testing tools to check the speed of our apps. The list goes on. Ever heard the phrase “all the gear and no idea”? Defining DevOps by tooling is to do precisely this. Owning lots of hammers doesn’t make you a DIY expert—fixing lots of things makes you a DIY expert! DevOps companies use tooling, but…

DevOps is not tooling.

Myth 3: DevOps Doesn’t Work in Regulated Industries

DevOps comes with a lot of scary, often implausible sounding practices. When I tell people that I much prefer trunk-based development to branch models, they usually recoil in disgust. “You do what?” they exclaim, acting as if I just popped them square in the jaw. “Everyone pushes changes to master every day? Are you crazy?” they say.

No, I’m definitely not. The proof is in the pudding. When you have a solid testing and deployment pipeline that catches defects well, having every developer commit to the same branch every single day makes a lot of sense. Don’t believe me? Google does it with thousands of engineers.

Many believe that these more radical approaches don’t work in a regulated environment or in scaled environments, like finance. But the evidence is abundantly clear. Applications that are built with agility in mind (meaning it’s easy and fast to make changes) are less risky than their infrequently delivered counterparts.

Yes, it might feel safer to have security checkpoints and to have someone rifle through 100,000 lines of code written over six months. But security checkpoints are little more than theater. They make us feel safe without really making things that much safer. What does reduce security risk is automating your testing process, making small changes, putting them in production frequently, and applying liberal monitoring and observability.

DevOps works in every environment.

Myth 4: DevOps Replaces Ops

Implementing DevOps doesn’t mean you need to go fire your system admins and operations staff. In fact, on the contrary, you need their knowledge. Knowing absolutely everything about development and operations is almost impossible. So, you’ll need people who have different specialties and interests.

Rather than fire our operations teams, we need to make sure their goals are aligned with the development teams’ goals. Everyone simultaneously should be driving toward faster delivery of high-quality software. A good waiter has tasted the food on the menu, but all waiters don’t need to be chefs.

DevOps doesn’t mean removing Ops.

Wrapping Things Up

So, there you have it. The top four myths about DevOps—busted. Hopefully, this clears things up a little and you now know what DevOps is and isn't. It's a set of beliefs and practices first; tooling, roles, and teams come second.

Every company can and should incorporate ideas of DevOps into their business. It will lead to happier engineers and happier customers.

This post was written by Lou Bichard. Lou is a JavaScript full stack engineer with a passion for culture, approach, and delivery. He believes the best products emerge from high performing teams and practices. Lou is a fan and advocate of old-school lean and systems thinking, XP, continuous delivery, and DevOps.

 

The Cat and the Map

“Would you tell me, please, which way I ought to go from here?” “That depends a good deal on where you want to get to,” said the Cat. “I don’t much care where,” said Alice. “Then it doesn’t matter which way you go,” said the Cat. “So long as I get somewhere,” Alice added as an explanation. “Oh, you’re sure to do that,” said the Cat, “if you only walk long enough.” — Lewis Carroll, Alice’s Adventures in Wonderland
Why Map Your IT and Test Environments?

Preamble

Running a high-functioning IT team or tech company requires you to be clear about where you want to take your team. If you’re not clear about that, then just like Alice in the quote above, it doesn’t matter which way you go—or, in the context of the increasingly complex tech ecosystem, it doesn’t matter which methodology or tools you adopt. You end up implementing this technology or that methodology halfheartedly, which leads you to switch to yet another technology or methodology, and the cycle repeats. The result is a form of techno-methodology whiplash for your team. Is that what you want for your team? I hope not.

Know Your Destination, Know Your Landscape

What the Cheshire Cat didn’t point out is that for most of us dealing with complex situations, knowing the destination isn’t enough. We need to know the landscape to plot our way to success. In this article, I will cover the top four reasons why you need to properly map your IT and test environments to help your team perform at a high-functioning level.

One View to See It All

When you map your IT and test environments, you essentially establish the landscape of the situation. A good map lets you bring together the various priorities and interests of your team and organization in a single view. The benefits of doing so can’t be overstated. Miller’s law states that the average human mind can hold only about seven things at any one time. Without a map of the entire landscape, how could you possibly navigate your team around the risks of deployment, development, and the day-to-day running of the IT and test environments?

In addition, you can build a map that contains multiple levels. Imagine that at the organization level you map out the various key structures, such as business, ops, the IT environment, and the test environment. Then you can drill in further by adding the substructures, such as system instances, applications, data, and infrastructure. All these structures and substructures interact among themselves, which is why you also need to add the relationships among these structures, the projects, and the teams in your organization.

Now imagine you have this map right now. Wouldn’t that make it a lot easier to think about your decisions and weigh your options? You can almost literally trace how a possible solution would impact which system and which team—so before you even encounter objections, you can anticipate them. That’s the power of a single view of your landscape captured in a map.
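
To make that concrete, here is a minimal sketch in Python of what such a multi-level map could look like as plain data. The structures, systems, teams, and relationships below are all invented for illustration; a real map would live in a proper mapping tool or CMDB rather than a script.

```python
# A minimal sketch of a multi-level environment map as plain data.
# All structure, system, and team names here are invented for illustration.

environment_map = {
    "it_environment": {"systems": ["billing-prod", "catalog-prod"]},
    "test_environment": {"systems": ["billing-qa", "catalog-qa"]},
}

# Relationships between substructures, plus the team that owns each system.
relationships = [
    ("billing-prod", "depends_on", "catalog-prod"),
    ("billing-qa", "depends_on", "catalog-qa"),
]
owners = {
    "billing-prod": "payments-team",
    "billing-qa": "payments-team",
    "catalog-prod": "product-team",
    "catalog-qa": "product-team",
}

def single_view():
    """Print one consolidated view: structure -> systems -> owner and dependencies."""
    for structure, contents in environment_map.items():
        print(structure)
        for system in contents["systems"]:
            deps = [target for source, _, target in relationships if source == system]
            print(f"  {system} (owner: {owners[system]}, depends on: {deps or 'nothing'})")

if __name__ == "__main__":
    single_view()
```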

Spotting Existing Gaps and New Opportunities

When you have a map, it almost immediately shows you some low-hanging fruit to pick. Existing gaps and opportunities to improve your operations reveal themselves easily, and that low-hanging fruit can yield quick wins for you and your organization.

Some typical quick wins would be (there’s a small code sketch after this list):

  1. Identify waste and save costs. For example, you may identify system instances being maintained but not used.
  2. Identify underutilized resources and consolidate them. This happens quite frequently as well. For example, you have a bunch of system instances that constantly have low utilization. You can decide to consolidate them to bring about a better return on your expenditure on these resources.
  3. Identify undersized systems or applications and reallocate buffer resources. Once you’ve reduced waste and consolidated underutilized resources, you can redeploy some of the freed-up capacity to the undersized systems. Typically, people complain that these undersized systems are constantly stretched and that not enough resources can be spared due to budget. In other words, you can reallocate your resources better simply by having this map.
  4. Identify the high-growth areas and enable them to grow faster. With a map, you can see how certain systems or applications are growing quickly because they’re driven by fast-growing demand. When you can link these high-growth areas to how they help the organization, you’ll be able to convince management that adding more budget makes business sense. Or you can redeploy resources from other structures facing slowing growth. In either case, a map bolsters the strength of your decision.
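
Here is a rough sketch of what the first two quick wins could look like in code, assuming your map also records some utilization data per system instance. The instances, figures, and thresholds are invented; the point is only that once the data sits in one place, flagging waste becomes trivial.

```python
from datetime import date

# Invented utilization data attached to the map; thresholds are arbitrary examples.
instances = {
    "legacy-report-server": {"avg_cpu": 0.02, "last_used": date(2023, 1, 10)},
    "qa-db-3": {"avg_cpu": 0.08, "last_used": date(2024, 5, 2)},
    "billing-qa": {"avg_cpu": 0.55, "last_used": date(2024, 6, 1)},
}

UNUSED_AFTER_DAYS = 180     # idle for ~6 months -> candidate for retirement
LOW_UTILIZATION_CPU = 0.10  # below 10% average CPU -> candidate for consolidation

def classify(instances, today):
    """Split instances into probable waste and consolidation candidates."""
    waste, consolidate = [], []
    for name, stats in instances.items():
        idle_days = (today - stats["last_used"]).days
        if idle_days > UNUSED_AFTER_DAYS:
            waste.append(name)
        elif stats["avg_cpu"] < LOW_UTILIZATION_CPU:
            consolidate.append(name)
    return waste, consolidate

if __name__ == "__main__":
    waste, consolidate = classify(instances, date(2024, 7, 1))
    print("Candidates to retire:", waste)             # ['legacy-report-server']
    print("Candidates to consolidate:", consolidate)  # ['qa-db-3']
```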

Streamline and Simplify Processes

Everyone has a story about dealing with silly, ridiculous bureaucratic processes. However, progress as a civilization means more processes are needed for things to run smoothly, and running your IT and test environments successfully means having good processes in place. Think value stream mapping. The key is to spot when those processes become less effective or even outright unnecessary, and then to retire or remodel them. In short, discover the increasingly ineffective processes and nip them in the bud.

So, study the stats from your troubleshooting and logs and add those to your map. Talk to your various teams in business and customer support, and add their anecdotes in as well. In a single view, you can let both data and personal stories drive your decisions about how to simplify running your IT and test environments. Streamlining and pruning away processes that used to be (but are no longer) necessary releases resources back to your budget. This kick-starts a virtuous cycle, as freed-up resources can then be redeployed toward growing opportunities.

Better Impact Analysis and Scenario Planning

Once you take advantage of the single view to quickly exploit new opportunities, uncover waste, improve resource utilization through reallocation, and streamline processes, you’ve established credibility for mapping. Imagine earning all that success without even using the methodological or technological fad of the day.

Now it’s time for the exciting stuff—planning the future. Once again, the map will help greatly. You can plan several scenarios and strategies in a playbook and then check them against the map. The check involves some kind of impact analysis. Scenario planning is widely used by some of the top-performing organizations in the world, and having a map of your IT and test environments improves the effectiveness and efficiency of the exercise. No more guessing about the potential impact of brainstormed strategies for future scenarios; you can immediately check and verify the obvious drawbacks and benefits. Scenario planning is better because impact analysis becomes better with a map of your environments.
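
To illustrate, here is a toy impact analysis you could run against such a map: given a system you plan to change, walk the dependency relationships and list everything that would feel it. The systems and edges below are invented.

```python
from collections import defaultdict

# Invented dependency edges: "X depends on Y" means changing Y can impact X.
depends_on = {
    "checkout-ui": ["billing-service"],
    "billing-service": ["product-catalog"],
    "reporting": ["billing-service"],
}

def impacted_by(changed_system):
    """Return all systems that directly or transitively depend on the changed system."""
    reverse = defaultdict(list)
    for system, deps in depends_on.items():
        for dep in deps:
            reverse[dep].append(system)

    impacted, frontier = set(), [changed_system]
    while frontier:
        current = frontier.pop()
        for dependent in reverse[current]:
            if dependent not in impacted:
                impacted.add(dependent)
                frontier.append(dependent)
    return impacted

if __name__ == "__main__":
    # Scenario: we plan to replace the product catalog. Who do we need to talk to?
    print(impacted_by("product-catalog"))
    # {'billing-service', 'checkout-ui', 'reporting'}
```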

Conclusion

In enterprise IT, environment mapping is a highly beneficial and foundational exercise that all IT teams and tech companies should perform at least once a quarter or so. It provides visibility into the many interrelated structures in your organization and the relations between them, which are hard to discern without a map. That increased visibility delivers great benefits. Agility, smooth delivery, greater collaboration, and good operational and business decision-making all flow from a clearer view of the landscape surrounding your team and organization. Buy-in becomes simpler when everybody can be on the same page—and when everybody is looking at the same map as well.

Mapping your environments is key to your organization’s success. Bear in mind that maps are imperfect, but they are still very useful. Mapping helps you and your team become better at your jobs simply because you did the exercise of mapping: it surfaces differences in thinking among the members of your team. So don’t wait until you come up with the perfect map. Your team automatically gets better with more practice mapping, and your team and your organization will thank you when they start to see the uptick in results.

Author: TJ Simmons

This post was written by TJ Simmons. Kim Sia writes under the nom de plume T.J. Simmons. He started his own developer firm five years ago, building solutions for professionals in telecoms and the finance industry who were overwhelmed by too many Excel spreadsheets. He’s now proficient with the automation of document generation and data extraction from varied sources.

Just Enough ITSM

Just Enough ITSM (or ITSM for Non-Production)

Preamble

We've all experienced the frustration that comes from too much or too little service management in our test environments. Lately, the DevOps engineer in me has been thinking about how we end up in one of those states. How can we get just enough service management in non-production environments?

Production environments require more service management than non-prod environments. But we shouldn't throw the baby out with the bathwater when it comes to service management in non-prod. I'm a software developer who practices DevOps, so I do a lot of work involving operations, deployment, and automation. I interface with many groups to achieve a good workflow within the organization.

Operations and development often have contradictory goals. Fortunately, we can all find common ground by working together. Understanding each other's needs and goals through communication is the key to success!

 

But before we get into that, let's explore the world of IT service management (ITSM) for a bit. In this post, I'll discuss different levels of service management in non-prod environments and borrow some fundamental DevOps principles that can help you get the right amount of ITSM. Let's start with an overview of non-production environments.

What Are Non-Production Environments?

We use non-production environments for development, testing, and demonstrations. It's best to keep them as independent as possible to avoid any crosstalk. We wouldn't want issues in one environment to affect any of the others.

These environments' users are often internal—for the most part, we're talking about developers, testers, and stakeholders. It's safe to assume that anyone in the company is a potential user. It's also safe to assume that anyone providing a service to the company might have access to non-production environments. But there could also be external users accessing these environments, perhaps for testing purposes.

Unless you have the environment in question tightly controlled, you may not know who those users are. That's a big problem. It's important to understand who's using which environments in case someone inadvertently has access to unauthorized information. Or maybe you just need to know who needs to stay informed about changes or outages in a specific environment.

That's where service management comes in. The next section explains how bad things can be when there is no service management in non-production. This exercise should be fun...or it might make you queasy. Better have a seat and buckle up just in case!

When You Have Zero Service Management in Non-Prod

Let's call this the state of anarchy. Here's what it looks like:

  • Servers have gone haywire and no one knows it.
  • Patches are missing.
  • Security holes abound!
  • The network is barely serviceable.

Can anyone even use this environment? How did it get like this, anyway? I have a couple of theories...

  1. Evolutionary Chaos: This model was chaos from the start. Someone set up an environment for testing an app a long time ago. It did its job and was later repurposed. Then, it got repurposed again. And again. Eventually, it started to grow hair. Then an arm sprouted out of its back. Then it grew an extra leg. Suddenly, it began to "self-organize." Now it seems to have a mind of its own. It grew out of chaos!
  2. Entropic Chaos: Entropy is always at play. It takes work to keep it from causing decay. In this theory, things were great in the past. But over time, service management became less and less of a priority for this environment. Entropy won the day, and the situation degraded into chaos.

However the environment got into its current chaotic state, the outcomes are the same. Issues are resolved slowly (if at all). Time is wasted digging up information or piecing it together. Data becomes lost, corrupted, and insecure. Owning chaos is a burden and a huge risk in many respects. We don't want to end up here!

If you've made it this far and still have your lunch in tow, you're past the worst of it. You can uncover your eyes, but be wary! Next, we're going to look at a wholly buckled-down environment and how it can go wrong in other ways.

When You Have Too Much ITSM in Non-Prod

It's better to have too much service management than not enough. But it's still not ideal. For one thing, it's wasteful. For another, it causes morale to suffer. Granted, it's reasonable to default to production-level service management at first. But staying on that default is a symptom of a bigger problem: a communications breakdown. And the root cause of having too much ITSM lies partly in human nature and partly in organizational legacy.

Here are my two theories on how organizations end up here:

  1. Single-Moded Process: Service delivery, operations, and all other departments focused on service management are hell-bent on making sure the customer is absolutely satisfied with their service. Going the extra mile to make sure the customer is happy is a good thing! Operations folks are trained on production-level service management, so their priority is to keep the trains running. With this in mind, operations management systems are set up for production environments. It's easiest to use that same default everywhere. For better or worse, every environment is treated like a production environment!
  2. Fractured Organization: Organizations are sub-divided into functional groups. When these groups aren't aligned to a shared purpose, they'll align to their own purposes. They even end up competing with each other. They'll center up on their own aims, tossing aside the needs of others.

How You Know When There's a Problem

The fractured organization theory may explain what happened to a friend of mine recently. Let's call him Fabian.

Fabian was the on-call engineer this past June. The overnight support team woke him up several nights in a row for irrelevant issues in the development environment. He brought this up to operations, who were responsible for managing the alert system. Unfortunately, the ops engineer was not sympathetic to his concerns in the slightest. Instead, he put the onus on Fabian to tell him what the alert system should do. That's understandable, but Fabian had no information to go on, and the ops engineer wouldn't share anything with him or collaborate on putting a plan together.

This story illustrates a misalignment between operations and development. Problems like this crop up all over the place. Usually, we can remedy or even avoid these situations by taking just a bit more time to understand the other side.
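
One lightweight fix for this particular kind of misalignment, sketched below with invented rules and alert fields, is to route alerts by environment so that non-production issues become next-business-day tickets rather than overnight pages. This isn't how Fabian's team actually solved it; it's just one possible illustration.

```python
# A sketch of environment-aware alert routing. The rules and alert fields are invented.

ROUTING_RULES = {
    "production": "page-on-call",            # wake someone up
    "staging": "ticket-next-business-day",
    "development": "ticket-next-business-day",
}

def route_alert(alert):
    """Decide what to do with an alert based on the environment it came from."""
    return ROUTING_RULES.get(alert["environment"], "ticket-next-business-day")

if __name__ == "__main__":
    overnight_alert = {"environment": "development", "summary": "disk 85% full on dev-db-1"}
    print(route_alert(overnight_alert))  # ticket-next-business-day, not a 3 a.m. page
```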

The four theories I've presented tell us about extremes. And yes, these extremes push the boundaries and aren't likely to occur. Still, an organization sitting somewhere in the middle may not have the right service management in non-production. As we've seen with Fabian's story, this is often an issue of misaligned goals.

So how do we get to just enough service management? Maybe the answers lie in what's working so well for DevOps! Let's see how.

Just Enough Service Management

IT teams have members with specialties suited to their functional area. Operations folks keep the wheels turning. QA makes sure the applications behave as promised. There are several other specialties—networking, security, and development are just a few examples. Ideally, all of these teams interact and work together toward a well-functioning IT department. But it doesn't just happen. It takes some key ingredients.

Leadership

Working together effectively takes good leadership. Leadership happens at all levels in an organization. Remember, a leader is a person, not a role.

Shared Vision

It's also critical to have a shared vision and shared goals. Creating a shared vision is part of being a leader. Here are a few points to remember about vision:

  • A shared vision creates alignment.
  • The vision should be exciting to everyone.
  • You have to do some selling to get everyone aligned with the vision.

Your vision for the test environment could be something like: "Our test environment will be a well-oiled machine." Use metaphors like "Smooth Operators" or "Pit Crew" to convey the right modes of thinking.

Open Communications

Keep communications open and honest. Open, honest communications can be one of the most significant challenges you'll face in implementing the right amount of service management. Many of us have a hard time being honest for fear of looking weak in the eyes of others. That fear is difficult to overcome, especially in an environment where we don't feel safe and secure. Managers have the vital task of creating an environment where employees feel safe and able to communicate openly. Trust is essential to success.

One Last Look

Getting the wrong amount of service management in any environment is a problem. Too little opens up all kinds of risks. Too much ITSM results in wasted time and resources. In this post, I presented four theories for how an organization might end up with the wrong amount of service management in non-prod and discussed what changes you can make to correct that.

ITSM doesn't happen in a bubble. It takes alignment between many stakeholders. There are three main things we can do to get alignment: wear your leader hat, share the vision, and converse honestly. You can accomplish any goal when you're set up to win—even with something as challenging as achieving just enough service management.

 

Author: Phil Vuollet

This post was written by Phil Vuollet. Phil uses software to automate processes to improve efficiency and repeatability.

The EMMi

The 8 Dimensions of the EMMI (Environment Management Maturity Index)

If you're interested in IT and test environment management, then you have probably heard of the Environment Management Maturity Index (EMMI), the de facto standard for measuring one's test environment management capability.

If not, then let me summarize: the EMMI is a maturity index that provides you with a standard frame of reference to help you assess your strengths, weaknesses, opportunities, and threats.

It's a powerful tool for assessing your environment and operational capability across your enterprise and for quickly identifying opportunities to improve.

As shown in the diagram, the EMMI does this by scoring you on eight key performance areas (KPAs). Today, I've decided to dive deeper into each of those key performance areas so that you can make a well-informed assessment.

[Diagram: the EMMI's eight key performance areas]

KPA 1: Environment Knowledge Management

First up is environment knowledge management. This refers to your ability to understand how your projects move through all your environments, including development, testing, staging, demoing, and production.

However, this is about more than just one software team. This is about understanding how your systems are connected in each environment across multiple software systems and business units. You will likely need a few models of both low-level relationships and higher-level connections of your systems to gain a strong understanding.

When you know how your software systems are connected as they move through environments, you can avoid many problems. You reduce the risk of disruption when a team needs to release to a new environment. For example, if your billing system depends on your product catalog and the product team releases a new version to QA, you may suddenly see network timeouts when you call the service, probably due to a performance bug in the new release. If you understand how these systems are connected in QA and you know the release process well, you'll avoid hours or days of triage trying to figure out why your tests are intermittently failing.
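
As a small illustration, a team could keep a simple registry of which upstream versions are running in each environment and when they changed. The services, versions, and timestamps below are invented; the idea is that "what changed recently in QA?" becomes a one-line query instead of a day of triage.

```python
from datetime import datetime

# Invented registry: which upstream versions each environment is running, and when they changed.
deployments = [
    {"env": "qa", "service": "product-catalog", "version": "2.4.0",
     "deployed_at": datetime(2024, 6, 3, 14, 0)},
    {"env": "qa", "service": "product-catalog", "version": "2.5.0",
     "deployed_at": datetime(2024, 6, 10, 9, 30)},
    {"env": "qa", "service": "billing-service", "version": "1.9.1",
     "deployed_at": datetime(2024, 5, 20, 11, 0)},
]

def recent_changes(env, since):
    """List upstream deployments in `env` since a given time -- the first place to look
    when tests suddenly start failing."""
    return [d for d in deployments if d["env"] == env and d["deployed_at"] >= since]

if __name__ == "__main__":
    for change in recent_changes("qa", datetime(2024, 6, 9)):
        print(change["service"], change["version"], change["deployed_at"])
    # product-catalog 2.5.0 2024-06-10 09:30:00
```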

KPA 2: Environment Demand Awareness

Next up we have environment demand awareness. This is not about how much load is on your environments. It's about why you have those environments. Ideally, you should know who's using them and why. Some environments may have obvious uses, like development. However, other uses may be surprising.

Take QA, for example. I was once on an engagement where we developers thought it was our job to test out new features before we released to production. So we kept changing the setup to suit our needs. Eventually, a flock of business analysts came our way, yelling and waving their arms for us to stop. It turned out that many of our customers used QA to test out significant pieces of data before staging them into production, and we were deleting their hard work. Knowing who's using your environments and why will prevent these kinds of things from happening.

When you know who's placing demands on your environments, you can also plan better. You may know of a new group of users coming in the pipeline. Or perhaps your environment is taking a hit from many users at once. If you realize you have two different sets of users in that environment, you can split that environment. You can even tailor each environment depending on those users' needs.

KPA 3: Environment Planning & Coordination

Once you know who's connected to your environments as well as who's making demands of them, you can plan for their needs and yours. It's key to be able to consistently plan and roll out environmental changes to meet upcoming milestones across your enterprise.

Imagine if one of the product team members decided to load test their catalog system and generated five million fake products in their QA environment. This ripples through to your QA environment, and none of the purchasing testers can actually do any work. This in turn clogs up their deployments and delays your ability to launch. We can avoid these types of problems with good planning and coordination.

It's also important that your planning and coordination is consistent across teams. When you have a consistent process, all the teams will know when to share knowledge and when to synchronize efforts.

KPA 4: Environment IT Service Management

It's not enough to deliver and manage your environments. Since you have users who demand these environments, we need to put on our customer service hat and support their ongoing use. We should diligently manage incidents, changes, and releases to ensure our users are getting what they need. If we neglect the ongoing support and operations of our users in these environments, the growing pile of incidents and user demands may threaten to overwhelm us.

When we spin up a new environment, we need to ensure the appropriate teams own it end to end. They need to have the necessary tooling and operational support to maintain this environment for its entire lifetime. This means well-understood communication on incident resolution and criticality. And it means well-understood processes to manage changing environmental needs.

KPA 5: Application Release Operations

Alright, this one gets a little tricky. It's healthy to have consistent and repeatable processes across your enterprise for releasing applications. But it's an easy risk to read this and interpret it as "standardize your deploys." I want to be clear: application releases and deploys are not the same thing.

Your deploys are all about getting packaged source code to the right place. But application release is about exposing new functionality to customers. At the lowest maturity, this happens only during deployment. But with mature teams, we can use tooling and processes to separate the idea of deploying code from activating it for customers.

This means you want to ensure your software teams are equipped to continually deliver code to production and to do it in as automated a fashion as possible. Once your teams are doing this, we can shift our focus to how to activate—or release—this code to our customers. There are many tools to help you make this change. It's this process that you want to standardize across your organization. That way, customers know what to expect, and they'll understand how to check if new features have arrived.
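
Here is a minimal sketch of that separation, using a plain dictionary as a stand-in for a feature-flag store (in practice you'd more likely use a feature-flag service or library). The flag name, function names, and customer IDs are invented.

```python
# A minimal sketch of separating deploy from release with a feature flag.
# The flag store here is just a dictionary; all names are invented for illustration.

feature_flags = {
    "new-checkout-flow": {"enabled": False, "allowlist": {"pilot-customer-42"}},
}

def is_enabled(flag_name, customer_id):
    flag = feature_flags.get(flag_name, {"enabled": False, "allowlist": set()})
    return flag["enabled"] or customer_id in flag["allowlist"]

def checkout(customer_id):
    # The new code path is deployed to production, but only *released* to flagged customers.
    if is_enabled("new-checkout-flow", customer_id):
        return "new checkout flow"
    return "old checkout flow"

if __name__ == "__main__":
    print(checkout("pilot-customer-42"))  # new checkout flow
    print(checkout("everyone-else"))      # old checkout flow (until the flag flips)
```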

KPA 6: Data Release & Privacy Operations

Let's talk about another key performance area: data release.

Data release across your environments is just as important as application release. But it's often neglected. Each application team ideally owns its own data, but teams need to be explicit about how they manage that data across their environments.

Time for another story. I knew a team that was quickly delivering high-value financial software, but they depended on a few backend services. Some of these services had a data refresh that occurred once a quarter or so. However, the owners of those services didn't make this known to the team, which had set up its QA environment with a test bed of data to get a speedy turnaround time on user stories. The data refresh hit them like a punch in the gut. It killed their velocity for weeks.

It's healthy to avoid such problems in your enterprise. We want to ensure data release processes are well known and consistent across teams. It's also a good idea to automate as much as possible to ensure this consistency stays intact, letting our teams work on more valuable efforts.
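
As one hedged example of what automating a piece of the data release could look like, the sketch below re-seeds a QA test bed after an upstream refresh. The schema and seed records are invented, and a real pipeline would pull curated fixtures rather than hard-coded rows.

```python
import sqlite3

# A sketch of an automated re-seed step a team could run after an upstream data refresh.
# The table, columns, and seed data are invented for illustration.

SEED_ACCOUNTS = [
    ("ACC-1001", "test-merchant-a", "active"),
    ("ACC-1002", "test-merchant-b", "suspended"),
]

def reseed_qa(db_path="qa_testbed.db"):
    conn = sqlite3.connect(db_path)
    conn.execute("""CREATE TABLE IF NOT EXISTS accounts
                    (account_id TEXT PRIMARY KEY, name TEXT, status TEXT)""")
    conn.execute("DELETE FROM accounts")  # the refresh replaced upstream data; start clean
    conn.executemany("INSERT INTO accounts VALUES (?, ?, ?)", SEED_ACCOUNTS)
    conn.commit()
    conn.close()

if __name__ == "__main__":
    reseed_qa()
    print("QA test bed re-seeded")
```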

KPA 7: Infrastructure & Cloud Release Operations

In the same vein as data releases, infrastructure releases have an indirect but profound impact on your teams' applications. How you handle your infrastructure has a ripple effect across multiple applications. If managed well, you can provide a cushion of protection for software systems to run and fail in isolation. If mismanaged, it can bring down a whole ecosystem of applications.

One would think I'd be out of stories by now, but I have another: I was on an engagement at a Fortune 10 company that, as far as I know, is still mismanaging their infrastructure releases. They built an in-house cloud platform from the ground up, but they didn't consider their environment demand, nor did they create an automated and repeatable system. Instead, they created a system that requires every application team on it to move every few months, and every move brings with it different problems. They provide no tooling to automate the move. At one point, they consistently lost a data center every week for three weeks straight. Not only was the platform unstable, but it also actively hampered application teams from delivering because they were too busy migrating their infrastructure.

There are many tools to help us manage this effectively. We can take advantage of external cloud platforms. We can practice infrastructure as code principles. Also, we can use configuration management tools to ensure our environments are consistent and we can always go back to a fresh state.

Think of your infrastructure releases as a bed frame, and you want your software teams to feel like they are lying on a comfortable mattress, not a bed of rocks.
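
For a flavor of what "consistent environments you can always reset" means in code, here is a toy drift check that compares an environment's declared configuration with what's actually observed. Both sides are invented; real teams would lean on proper infrastructure-as-code and configuration management tooling for this.

```python
# A toy drift check: compare the declared configuration of an environment with
# what is actually observed. Both dictionaries are invented for illustration.

declared = {
    "app-server-count": 3,
    "db-version": "postgres-15",
    "tls": "enabled",
}

observed = {
    "app-server-count": 2,  # someone scaled this down by hand
    "db-version": "postgres-15",
    "tls": "enabled",
}

def drift(declared, observed):
    """Return settings whose observed value differs from the declared one."""
    return {
        key: (declared[key], observed.get(key))
        for key in declared
        if observed.get(key) != declared[key]
    }

if __name__ == "__main__":
    for key, (want, have) in drift(declared, observed).items():
        print(f"DRIFT: {key}: declared {want}, observed {have}")
```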

KPA 8: Status Accounting & Reporting

Complex systems are quickly becoming table stakes in the world of IT. This complexity makes it harder and more valuable to stay on top of your system health and behavior. Yet the faster you can make decisions about your systems and react to problems, the more competitive you will be.

Throughout your teams, you want to ensure you have ways of understanding team health. That way, you can support troubled areas. You want to monitor system health so that you can triage and fix defects before your customers even know. And you want to get real-time data on your system behavior so that you can react faster than your competitors and get new features out quickly.

This is connected with the infrastructure release key performance area, as you want to equip your software teams with standard tooling to accomplish all of this. The more consistent your tooling, the more you can aggregate data and see behavior across multiple systems.
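
As a sketch of that kind of aggregation, the snippet below rolls hard-coded health-check results up into one status per environment. The systems and statuses are invented stand-ins for real monitoring data.

```python
# A sketch of rolling up health status across several systems into one view per environment.
# The check results are hard-coded stand-ins for real monitoring data.

checks = [
    {"system": "billing-service", "env": "production", "status": "healthy", "latency_ms": 120},
    {"system": "product-catalog", "env": "production", "status": "degraded", "latency_ms": 900},
    {"system": "billing-service", "env": "qa", "status": "healthy", "latency_ms": 140},
]

def rollup(checks):
    """Aggregate to one status per environment: the worst status wins."""
    severity = {"healthy": 0, "degraded": 1, "down": 2}
    summary = {}
    for check in checks:
        env = check["env"]
        current = summary.get(env, "healthy")
        if severity[check["status"]] > severity[current]:
            summary[env] = check["status"]
        else:
            summary.setdefault(env, current)
    return summary

if __name__ == "__main__":
    print(rollup(checks))  # {'production': 'degraded', 'qa': 'healthy'}
```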

Multi-Dimensional Success

Getting a handle on these key performance areas across your organization is a potentially tough but worthy endeavor. Mismanaging any of these will cause pain, but handling them well will create a cohesive, value-focused set of teams.

Ready to take the next step? If you're feeling confident about your environments or you're just curious, go ahead and calculate your environmental maturity. The results will give you insight into what area most needs your attention.

Author: Mark Henke

Mark has spent over 10 years architecting systems that talk to other systems, doing DevOps before it was cool, and matching software to its business function. Every developer is a leader of something on their team, and he wants to help them see that.