How to manage the growing complexity of cloud infrastructure

Cloud-based workloads need automated tools so that operations can scale. Given the sheer number of workloads, resources, and other moving parts that make cloud-based solutions complex, automation isn’t a nice-to-have; it’s a matter of survival.

It’s been years since you began migrating applications and data to public clouds. A few years ago you had a dozen applications there; last year you added dozens more, and now you’re pushing past 500.

Cost savings may be the main reason for migrating to the cloud, but something happens when you pass 500 applications. What was a simple place to host and operate applications suddenly becomes complex, and operational complexity rises in a way you didn’t see coming. Why?

The short answer is that most enterprises will soon surpass 500 workloads, including SaaS, PaaS, and IaaS, delivered via public and private clouds. According to 451 Research’s latest Voice of the Enterprise: Cloud Transformation survey of IT buyers, 41 percent of all enterprise workloads “are currently running in some type of public or private cloud. By mid-2018, that number is expected to rise to 60 percent, indicating that a majority of enterprise workloads will run in the cloud in the near term.”

The tipping point?

The truth is that the tipping point is lower for some enterprises, say, 150 to 250 workloads, and higher for others, maybe 500 to 700. So 500 is an arbitrary number. The only constant is that a tipping point exists, where the number of workloads outpaces the enterprise’s ability to manage them effectively.

Basically, the tipping point exists because operating cloud-based applications comes with baggage: those applications require automated operations to keep systems up and running.

When enterprises reach the tipping point, their applications are typically also geographically distributed, which makes operations even more complex. Operations must therefore adopt management practices and tools that can span the world.

Clouds can abstract the underlying infrastructure from the platforms and applications. That helps developers, because the infrastructure is hidden from them, but operations must set up processes and technology to ensure that the infrastructure is available and reliable.

You may not care where the physical servers are located, but you must manage them consistently. As the number of workloads grows, so does the number of servers that must be managed. An Excel spreadsheet works well enough for tracking IP addresses, server names, and resources for up to about 150 workloads, but beyond that, the number of things to track quickly grows out of control.
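To illustrate why a spreadsheet stops scaling, here is a minimal Python sketch of an inventory that indexes servers by name and IP address and rejects the duplicate entries a spreadsheet would silently accept. The `Server` and `Inventory` types and their fields are hypothetical, not the API of any particular inventory or CMDB product.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Server:
    # Hypothetical record; real inventories track many more attributes.
    name: str
    ip: str
    workload: str


@dataclass
class Inventory:
    # Two indexes over the same records so lookups by name or IP are O(1).
    by_name: dict[str, Server] = field(default_factory=dict)
    by_ip: dict[str, Server] = field(default_factory=dict)

    def add(self, server: Server) -> None:
        # Reject duplicate names or IPs instead of silently overwriting a row.
        if server.name in self.by_name:
            raise ValueError(f"duplicate server name: {server.name}")
        if server.ip in self.by_ip:
            owner = self.by_ip[server.ip].name
            raise ValueError(f"IP {server.ip} already assigned to {owner}")
        self.by_name[server.name] = server
        self.by_ip[server.ip] = server

    def servers_for(self, workload: str) -> list[Server]:
        # All servers backing one workload, sorted by name for stable output.
        return sorted(
            (s for s in self.by_name.values() if s.workload == workload),
            key=lambda s: s.name,
        )
```

The point is not the data structure itself but that the checks run on every change, which is exactly what a manually edited spreadsheet cannot guarantee once hundreds of workloads are involved.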

Clouds run applications that share common services. For example, a year ago, 10 workloads accessed the same cloud-based database; now more than 100 do. Even when these workloads are decoupled from the database as separate services, they all still depend on it. If the database goes down, so do the workloads. We must operate the database accordingly: its importance changes a great deal when 100 workloads, not 10, need it to work and run.
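One way to see why the shared database matters is to treat dependencies as a graph and compute the “blast radius” of a failure. The sketch below assumes a hypothetical mapping from each component to the components it needs, and runs a breadth-first search over the reversed edges to find everything that goes down, directly or indirectly, when one service fails.

```python
from collections import defaultdict, deque


def blast_radius(depends_on: dict[str, set[str]], failed: str) -> set[str]:
    """Return every component that transitively depends on `failed`.

    `depends_on` maps each component to the set of components it needs.
    """
    # Invert the edges: for each component, record who depends on it.
    dependents: dict[str, set[str]] = defaultdict(set)
    for component, needs in depends_on.items():
        for dep in needs:
            dependents[dep].add(component)

    # Breadth-first walk outward from the failed component.
    impacted: set[str] = set()
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for d in dependents[current]:
            if d != failed and d not in impacted:
                impacted.add(d)
                queue.append(d)
    return impacted
```

For example, if 100 hypothetical workloads `app-0` through `app-99` depend on `orders-db`, and a reporting job depends on `app-0`, then `blast_radius(deps, "orders-db")` returns 101 components: the outage reaches well beyond the workloads that talk to the database directly.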

Most cloud operations already rely on a great deal of automation. From the start, we knew that our cloud-based workloads needed automated tools for operations to scale. Now, faced with the number of workloads, resources, and other things that make our cloud-based solutions complex, we don’t just want automation; we need it to survive.

Usage-based accounting systems track the cloud usage of everyone who consumes cloud resources, so costs can be allocated accordingly through showbacks and chargebacks. For the most part, these systems are afterthoughts, brought in only after the need is obvious, and that’s typically too late. They let you track costs and keep users within budget so additional cloud resources can be obtained later, and they keep the cloud providers honest about your usage and their charges.
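Showback reports each team’s share of the bill for visibility; chargeback actually bills that share back to the team. The sketch below shows the core arithmetic under simple, hypothetical assumptions: one shared bill, a single usage metric per team, proportional splitting, and per-team budgets. Real provider billing data is far more granular.

```python
def allocate_costs(
    total_bill: float,
    usage: dict[str, float],
    budgets: dict[str, float],
) -> tuple[dict[str, float], list[str]]:
    """Split `total_bill` across teams in proportion to metered usage.

    Returns (allocations, over_budget): allocations maps team -> cost,
    and over_budget lists teams whose share exceeds their budget.
    """
    total_usage = sum(usage.values())
    if total_usage == 0:
        raise ValueError("no usage recorded; nothing to allocate")

    # Proportional share, rounded to cents (rounded shares may not sum
    # exactly to the bill; a real system would reconcile the remainder).
    allocations = {
        team: round(total_bill * units / total_usage, 2)
        for team, units in usage.items()
    }

    # Teams without a budget entry are treated as unlimited.
    over_budget = sorted(
        team
        for team, cost in allocations.items()
        if cost > budgets.get(team, float("inf"))
    )
    return allocations, over_budget
```

The same allocation serves both models: a showback report would publish `allocations` and the `over_budget` list, while a chargeback process would post those amounts to each team’s cost center.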

Preparing for the storm

What’s great about this problem is that you know it’s coming. For most enterprises, there’s still time. Get your operations best practices and technology in place as soon as you can, and follow a process that ensures you pick the right processes, people, and technology. I recommend the following strategy:

  • Define requirements. Specifically, workload requirements, existing and future. Keep in mind that you’re building operational best practices and tool sets that will allow you to keep the cloud-based workloads up and running, meaning you need to understand just what they are doing and the technology they are employing. This is perhaps the most important thing you can do. With perfect information, you can come close to making perfect decisions.
  • Define people and processes. Most enterprises have traditional systems, operations, processes, and people in place. When cloud computing comes along, they try to fit those processes and people into their new cloud-based systems. This is a bad idea for many reasons, but the biggest issue is that public and private clouds are managed differently than traditional systems. You need new skill sets and operations technology to provide the right processes and automation. Thus, you’ll need to adjust your processes, and either retrain or replace your people. Most organizations fall down here, because people and long-standing habits are difficult to change.
  • Find the right tools, but understand that they won’t be perfect. Many who deploy clouds and see a tipping point ahead assume that technology alone will save them. That’s almost never the case. Organizations that focus too much on operations and management tools, especially without a good understanding of their requirements, are likely to pick square-peg tools for round-hole problems.

The tools should provide general-purpose capabilities that span all workloads and clouds. If you’re using one tool for a few workloads and a dozen more tools to manage the rest, you’re only making your job more complex.

The idea is to provide a layer of automation and abstraction between you and the workloads, the infrastructure, the network, and so on, so that you can control many things through a single interface. Moreover, everything within those tools should be automatable, enabling auto-recovery, auto-scaling, and other operational processes that kick off when certain conditions are met.
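Processes that “kick off when certain conditions are met” can be pictured as a small rule table: each rule pairs a condition on observed metrics with an action. This is a hypothetical Python illustration, not any specific tool’s API; the metric names, thresholds, and action identifiers are made up, and a real system would call the cloud provider’s APIs rather than return strings.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Rule:
    name: str
    condition: Callable[[dict], bool]  # inspects one metrics snapshot
    action: str                        # hypothetical action identifier


# Hypothetical thresholds, for illustration only.
RULES = [
    Rule("auto-recover", lambda m: not m["healthy"], "restart_instance"),
    Rule("scale-out", lambda m: m["cpu"] > 0.80, "add_instance"),
    Rule(
        "scale-in",
        lambda m: m["cpu"] < 0.20 and m["instances"] > 1,
        "remove_instance",
    ),
]


def evaluate(metrics: dict, rules: list[Rule] = RULES) -> list[str]:
    """Return the actions whose conditions hold, in rule order."""
    return [rule.action for rule in rules if rule.condition(metrics)]
```

Here every rule is evaluated independently, so an unhealthy, overloaded service triggers both a restart and a scale-out. Production tools typically add priorities, cooldown periods, and limits so that automated actions don’t fight each other.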

Sound complicated? We seem to go through these sorts of problems whenever a new technology scales. We saw this with the rise of the PC, with the rise of the Web, and now with the rise of the cloud. New technology always brings new complexity in managing it.

It’s understandable that most companies in the Global 2000 that now leverage cloud-based applications and databases did not see the tipping point coming. Most cloud providers don’t talk about it, and for the most part, enterprises have not yet experienced it, having moved only 50 to 100 workloads, on average, to public and private clouds so far.

Now the bigger complexity problem begins. That said, with a bit of planning and selecting the right technology for the job, complexity is a solvable problem. 

Cloud infrastructure: Lessons for leaders

  • Define your workload requirements, existing and future. With perfect information, you can come close to making perfect decisions.
  • Public and private clouds are managed differently than traditional systems. You need new skill sets and operations technology to provide the right processes and automation. 
  • Find the tools that provide general-purpose capabilities that span all workloads and clouds.
