Getting a Handle on AIOps And Learning What These Platforms and Solutions Can Do for You
By Enzo Signore & Bishnu Nayak
As the headline suggests, we wrote this blog to inform readers like you about AIOps. The first question many of you probably have is: What the heck is AIOps?
The simple answer is that AIOps stands for Artificial Intelligence for IT Operations. It’s the next generation of IT operations analytics (ITOA). And its value is in helping organizations address IT challenges on a number of fronts.
These challenges include:
- The increasing complexity and dynamic nature of IT architectures
- Digital business transformation
- Siloed IT operations
- Exponential data growth
All of the above render traditional, domain-centric monitoring and IT operations management inadequate. Such systems can’t correlate the onslaught of data that the various IT domains create. What’s more, they’re unable to provide the insights IT operations teams need to proactively manage their environments. And that just won’t cut it.
AIOps solutions, however, can address these challenges. They enable enterprises to unify and modernize IT operations. And they allow enterprises to make the most of their existing network investments.
Let’s confront the above-noted IT challenges one at a time. Then we’ll explain how AIOps can help your business conquer them.
The Increasing Complexity and Dynamic Nature of IT Architectures
To increase business agility, IT organizations are deploying dynamic, modern IT architectures enabled by virtualization technologies. That includes containers, elastic clouds, microservices, and virtual machines.
At least a quarter of businesses had adopted containers by late 2017. The application container market was worth $762 million in 2016. By 2022 it will balloon to $2.7 billion. The use of cloud platforms is on the rise, as more businesses migrate more applications. By July 2018, 80 percent of all IT budgets will be committed to cloud solutions.
The dynamism these architectures and technologies enable is important for businesses. It helps them adjust to the fluctuating demands of millions of digital customers around the globe.
However, that often comes at the cost of decreased visibility. That’s because application workloads and flows are now abstracted from their physical infrastructure. And that creates new challenges in pinpointing potential issues.
So without end-to-end correlated data, adoption of these key technologies can be risky and cumbersome. IT staff will be unable to effectively map current workloads to these new environments, and they’ll struggle to manage performance and uptime. Plus, these new technologies can be extremely expensive to purchase, and AIOps can serve as insurance that organizations get maximum ROI from those investments.
“By 2022 the application container market will be worth $2.7 billion.”
Digital Business Transformation
Enterprises across the globe are leveraging digital technology to transform their businesses. Such efforts aim to provide better experiences to their prospects, customers, suppliers, and internal stakeholders.
To succeed as digital companies, businesses need to rethink their entire IT stack and operational strategy. And they need to ground these efforts with business-first considerations.
That should include how they think about application and network uptime.
Enterprises incur an average cost of $300,000 per outage. That’s if no revenue is at stake. If the outage impacts revenues, organizations lose an average of $72,000 per minute. That means companies lose a whopping $5.6 million per outage.
You can see why modern enterprises must make applications assurance and uptime their No. 1 objective. Those that don’t could face catastrophic damage to their revenues and reputation.
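As a back-of-the-envelope check, the two revenue figures above imply a rough outage length. The sketch below assumes the $5.6 million figure is simply the per-minute loss multiplied by the outage duration; the cited research may compute it differently, and the function name is only illustrative.

```python
# Back-of-the-envelope check of the outage figures quoted above.
# Assumption (not stated in the article): total loss = per-minute loss x duration.
COST_PER_MINUTE = 72_000       # USD lost per minute when revenue is affected
COST_PER_OUTAGE = 5_600_000    # USD lost per outage


def implied_outage_minutes(total_cost: float, cost_per_minute: float) -> float:
    """Return the outage length in minutes implied by the two figures."""
    return total_cost / cost_per_minute


minutes = implied_outage_minutes(COST_PER_OUTAGE, COST_PER_MINUTE)
print(f"Implied outage duration: about {minutes:.0f} minutes")
```

Under that assumption, the numbers correspond to an outage of a little over an hour, which shows how quickly per-minute losses add up.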
“Companies lose a whopping $5.6 million per outage.”
The Problem with Siloed IT
Research suggests 41 percent of enterprises use 10 or more tools for IT performance monitoring. Seventy percent use more than six. And you need even more tools to manage a hybrid cloud environment. That will include solutions to monitor workloads running in AWS, Azure, or multi-cloud environments.
Domain-centric tools provide a deep view into a specific domain. But they lack the ability to provide a correlated and end-to-end view across domains.
That’s a problem because cross-domain data collection, correlation, and visibility are key. They can enable you to trace transaction problems, like failed eCommerce orders, back to infrastructure issues, like database timeout errors.
But siloed management tools prevent most organizations from making these important connections. As a result, most enterprises suffer from much longer Mean Time To Repair (MTTR) intervals and unhappy customers.
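To make the idea of cross-domain correlation concrete, here is a minimal sketch that pairs application-level failures with infrastructure events seen shortly before them. The `Event` type, the `correlate` function, the 30-second window, and the sample events are all hypothetical; real platforms use far richer signals than timestamps alone.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Event:
    domain: str          # e.g. "application" or "database"
    source: str          # component that emitted the event
    message: str
    timestamp: datetime


def correlate(symptoms, candidates, window=timedelta(seconds=30)):
    """Pair each symptom with candidate events that occurred within
    `window` before it (or at the same instant)."""
    pairs = []
    for symptom in symptoms:
        for candidate in candidates:
            gap = symptom.timestamp - candidate.timestamp
            if timedelta(0) <= gap <= window:
                pairs.append((symptom, candidate))
    return pairs


# Illustrative data only.
t0 = datetime(2018, 1, 1, 12, 0, 0)
failed_orders = [
    Event("application", "checkout", "order failed", t0 + timedelta(seconds=12)),
]
db_events = [
    Event("database", "orders-db", "query timeout", t0 + timedelta(seconds=10)),
    Event("database", "orders-db", "query timeout", t0 - timedelta(minutes=5)),
]

matches = correlate(failed_orders, db_events)
for symptom, cause in matches:
    print(f"{symptom.source}: {symptom.message} <- {cause.source}: {cause.message}")
```

Only the timeout that happened two seconds before the failed order is paired with it; the one from five minutes earlier falls outside the window.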
MTTR averages 4.2 hours and wastes precious resources. Businesses employ an average of 5.8 full-time equivalent employees to address each incident. That FTE figure is as high as 11 in 15 percent of cases.
This drain of resources and finger pointing occurs as IT staff members struggle to manually correlate data. And often a whole lot of data is involved. Solving a critical business problem often entails using hundreds of data points – imagine how complex it becomes when IT is required to use thousands or millions of data points. That’s a lot.
“Mean time to repair averages 4.2 hours and wastes precious resources.”
The Challenge of Exponential Data Growth
Indeed, millions of data points are now flowing to the IT operations team in real time. This data deluge will only accelerate as adoption of containers, microservices, and virtualization grows.
There are automated ways to collect and process this massive amount of data from an individual domain, but domain-specific teams then need to manually correlate it. (And 79 percent of organizations report that adding more IT staff to address this problem is not an effective strategy.) Manual correlation is not only time consuming but also prone to incorrect interpretation and results. It also requires skilled resources from different domains, which leads to a very long diagnostic process for root cause identification.
“Containers alone generate 18 times more data than traditional IT environments.”
To address these challenges, organizations need a new class of technology to modernize the IT operations process. This technology needs to be able to correlate millions of data points across all IT domains. It should have the smarts to apply machine learning to detect patterns. And it should present that information so organizations can easily see what’s happening and gain insights.
This technology is what we mean when we talk about AIOps.
Gartner recognizes AIOps as a new strategic IT segment.
“Artificial intelligence for IT operations (AIOps) platforms are software systems that combine big data and AI or machine learning functionality to enhance and partially replace a broad range of IT operations processes and tasks, including availability and performance monitoring, event correlation and analysis, IT service management, and automation,” (Gartner – “Market Guide for AIOps Platforms” – Will Cappelli, Colin Fletcher, Pankaj Prasad. Published: 3 August 2017)
Figure 1: Gartner’s visualization of the AIOps platform
AIOps Platform Enabling Continuous Insights Across IT Operations Management
The general process by which AIOps platforms and solutions operate includes three basic steps.
An AIOps platform first needs to observe the nature of data and its behavior. That involves collecting information through data discovery.
AIOps data discovery needs to support big data scale. That way it can address the volume of data from different IT domains and sources. Those sources may include legacy infrastructure or new container, hybrid cloud, or virtualized environment elements.
Whatever the data or source, speed is key to the observation part of the process. So the data must be collected in near real time to detect patterns. Performance- and health-related information is collected from hundreds of sources – using an agentless or agent model. Successful AIOps platforms leverage a combination of mechanisms to collect data from a multi-domain and multi-vendor environment. That environment may include an array of containers, hypervisors, network and storage solutions, public cloud, and other technologies and architectures.
A successful AIOps platform also combines the power of big data and machine learning with domain knowledge to identify data relationships and history to solve this complex problem.
An AIOps platform provides orchestration across key IT operations domains – most importantly IT Service Management.
ITSM activities such as change management and incident management have traditionally been manual. And they’re typically heavily dependent upon the Configuration Management Database. The problem with legacy CMDBs is they are highly unreliable for environments involving frequent change.
The AIOps platform provides analytics and input to make ITSM tasks more automated and reliable. For example, AIOps can update CMDBs using its knowledge of the environment, state, and changes. The AIOps platform’s ability to observe hybrid environments on an end-to-end basis provides this power. That ensures CMDB data is relevant and reliable. That allows for automation and faster and more accurate incident management. The automation also minimizes risks that might otherwise happen due to human error. And pattern recognition allows businesses to see and address problems before they affect end-user experiences.
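As a rough illustration of keeping a CMDB in step with what is actually observed, the sketch below compares stored records against a freshly discovered inventory and reports the differences. The `reconcile` function, the record layout, and the sample hosts are assumptions made for this example, not a description of any particular CMDB or AIOps product.

```python
def reconcile(cmdb: dict, discovered: dict) -> dict:
    """Compare CMDB records with discovered state, keyed by item name."""
    added = {k: v for k, v in discovered.items() if k not in cmdb}
    removed = sorted(k for k in cmdb if k not in discovered)
    changed = {
        k: v for k, v in discovered.items()
        if k in cmdb and cmdb[k] != v
    }
    return {"added": added, "removed": removed, "changed": changed}


# Illustrative data only.
cmdb_records = {
    "web-01": {"ip": "10.0.0.5", "host": "hypervisor-a"},
    "db-01": {"ip": "10.0.0.9", "host": "hypervisor-a"},
}
discovered_state = {
    "web-01": {"ip": "10.0.0.5", "host": "hypervisor-b"},  # VM has moved
    "web-02": {"ip": "10.0.0.6", "host": "hypervisor-b"},  # new VM
}

diff = reconcile(cmdb_records, discovered_state)
print(diff)
```

A diff like this could be reviewed by an operator or applied automatically, depending on how much automation the organization is comfortable with.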
Automation, or closed-loop functionality, is the nirvana of an AIOps platform.
Of course, automating critical IT operations using machine learning is new territory for most organizations. And IT leadership will need to get comfortable with it before they fully embrace automation. But new state-of-the-art automation – which uses advanced human inputs and machine learning – is maturing. And organizations can employ it today to do both simple and more complex jobs.
For example, they can employ it to clean log files to free up space. And they can use it to restart an application. Automation also can change application traffic policy on a router if AIOps sees the need.
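The jobs listed above can be pictured as a simple playbook that maps a detected condition to an action. The following is a dry-run sketch only: the condition names, the `PLAYBOOK` table, and the `remediate` function are hypothetical, and each action merely describes what it would do rather than touching a real system.

```python
from typing import Callable, Dict


def clean_logs(target: str) -> str:
    return f"would clean old log files on {target}"


def restart_app(target: str) -> str:
    return f"would restart application {target}"


def change_traffic_policy(target: str) -> str:
    return f"would update traffic policy on router {target}"


# Hypothetical mapping from detected condition to remediation action.
PLAYBOOK: Dict[str, Callable[[str], str]] = {
    "disk_space_low": clean_logs,
    "app_unresponsive": restart_app,
    "link_congested": change_traffic_policy,
}


def remediate(condition: str, target: str) -> str:
    """Look up the action for a condition; escalate when none is defined."""
    action = PLAYBOOK.get(condition)
    if action is None:
        return f"no automated action for {condition!r}; escalate to an operator"
    return action(target)


print(remediate("disk_space_low", "web-01"))
print(remediate("fan_failure", "rack-7"))
```

Falling back to a human for unknown conditions reflects the point made above: teams tend to automate the simple, well-understood jobs first.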
How and Where AIOps Delivers Value
Enterprises that have deployed AIOps solutions have experienced transformational benefits. They include revenue growth, better customer retention, improved customer experience, lower costs, and enhanced performance.
Their operational teams have been able to:
- Increase end-to-end business application assurance and uptime
- Manage an integrated set of business and operational metrics
- Predict and prevent outages
- Dramatically reduce Mean Time to Detect and Mean Time to Repair
- Lower the number of IT FTEs dedicated to troubleshooting
- Decrease operational noise and alerts
- Optimize IT and reduce IT costs
- Replace older, silo-focused IT monitoring tools
- Auto-discover complex, heterogeneous topologies
- Gain visibility into the hybrid IT environment
- Accelerate migration to the hybrid cloud
- Expedite the adoption of hyper-convergence and microservices architecture
- Reduce risk in consolidating and migrating data centers
- Free up resources to enable IT operations to become a proactive source of innovation
- Automate and reduce the cost of audits and compliance
- Simplify IT processes
- Break down silos across their IT teams
- Enable less experienced staff to become more productive, faster
What the AIOps Architecture Looks Like
An AIOps solution includes the following functional blocks:
We’ll address these building blocks from the bottom up because that’s how AIOps itself works.
Open Data Ingestion
An AIOps platform collects data of all types from various sources. That may include data on faults, logs, performance alerts, and tickets. The ability to ingest data from the most diverse data sources is critical. It allows for an accurate, real-time view of all the moving parts across hybrid IT environments. More about open data ingestion here.
Given the very dynamic nature of modern IT environments, businesses need an auto-discovery process that automatically collects data across all infrastructure and application domains – including on-premises, virtualized, and cloud deployments. It identifies all infrastructure devices, the running applications, and the resulting business transactions. Read the auto-discovery blog.
Then it’s time for the AIOps platform to correlate this data in a contextual form. So it needs to determine the relationships between infrastructure elements, between an application and its infrastructure, and between the business transactions and the applications.
To learn more about the importance of correlation, check out this blog.
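One way to picture these relationships is as a dependency graph that runs from business transactions to applications to infrastructure. The sketch below is a simplified illustration; the `DEPENDS_ON` map, the node names, and both helper functions are made up for this example.

```python
# Hypothetical topology: each key depends on the elements in its list.
DEPENDS_ON = {
    "place-order": ["order-service"],             # business transaction
    "order-service": ["orders-db", "app-vm-1"],   # application
    "orders-db": ["db-vm-1"],
    "app-vm-1": ["hypervisor-a"],                 # infrastructure
    "db-vm-1": ["hypervisor-a"],
}


def dependencies(node: str, graph: dict) -> set:
    """Return every element `node` depends on, directly or indirectly."""
    seen = set()
    stack = [node]
    while stack:
        current = stack.pop()
        for dep in graph.get(current, []):
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen


def impacted_by(element: str, graph: dict) -> list:
    """Return the nodes that would be affected if `element` had a problem."""
    return sorted(n for n in graph if element in dependencies(n, graph))


print(impacted_by("orders-db", DEPENDS_ON))
```

With a map like this, a fault on a database or hypervisor can be tied directly to the applications and business transactions that rely on it.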
Once the end-to-end correlation process is completed, the data needs to be presented in an easy-to-use format. And that’s what visualization is all about.
Visualization is important because it allows IT operations to quickly pinpoint issues and take corrective actions.
Of course, visualization in IT operations has become a commodity. Every solution includes a dashboard of some type. Yet an estimated 71 percent of organizations say data is not actionable. That’s why AIOps is important. It provides a new generation of visualization that makes data actionable.
Because visualization is key, we’ve also put together a blog on this topic. You can find it here.
Finding the root cause of a problem is key. But it’s even more critical to determine recurring patterns and predict likely future events.
AIOps solutions use supervised and unsupervised machine learning to determine patterns of events in time-series data. They also detect anomalies relative to expected behaviors and thresholds, and they predict outages and performance issues. Learn more about machine learning here.
Automation is a key component of AIOps as it delivers the end ROI to the customer. It does so by automating human IT ops tasks, significantly reducing OPEX, and expediting innovation. It also reduces MTTR and can improve customer satisfaction.
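To give a feel for anomaly detection on a time series, here is a deliberately simple statistical stand-in: a rolling z-score that flags points far from the recent average. It is not the machine learning a production AIOps platform would use, and the function name, window size, threshold, and latency values are all assumptions for illustration.

```python
from statistics import mean, stdev


def find_anomalies(series, window=5, threshold=3.0):
    """Return indexes of points more than `threshold` standard deviations
    away from the mean of the preceding `window` points."""
    anomalies = []
    for i in range(window, len(series)):
        history = series[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        # Skip flat history to avoid dividing by zero.
        if sigma > 0 and abs(series[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Illustrative response times in milliseconds, with one obvious spike.
latency_ms = [100, 102, 98, 101, 99, 100, 103, 250, 101, 99]
print(find_anomalies(latency_ms))
```

The spike at index 7 is flagged, while normal variation around 100 ms is not. Real platforms would also account for seasonality and learn thresholds from history rather than using a fixed value.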
AIOps enables IT operations to modernize existing processes. It allows IT operations to move beyond traditional ITOA strategies, abandon old, reactive processes, and become proactive by predicting issues and preventing outages.
By providing an end-to-end correlated view of the entire IT environment, AIOps allows enterprises to accelerate their digital transformation strategies, adopt new technologies faster, and increase business productivity.
To learn more about FixStream, check out our AIOps solution whitepaper.