Monthly Archives : March 2019

2019 Predictions for AIOps

Many enterprises have found that Artificial Intelligence for IT operations (AIOps) platforms help them better manage their hybrid IT environment and improve system availability. Next-generation solutions, which we call AIOps+, now offer advanced capabilities such as automated pattern discovery and prediction, automated IT infrastructure entity discovery and application mapping, and data and event correlation across the IT stack. Based on these advanced features, we predict more firms will adopt AIOps+ in the coming year.

#1 – Enterprises will experiment with AIOps tools but will deploy AIOps+ solutions

While standalone AIOps tools can deliver useful capabilities such as anomaly detection and noise reduction across both legacy and digital environments, organizations may discover that the promise of AIOps is difficult to achieve without understanding the business application context. Enter AIOps+, which works across the IT environment, ingesting and correlating massive amounts of data from a variety of sources, from business transactions to applications to IT infrastructure. The full-stack data correlation delivered by AIOps+ will increase the accuracy of machine learning predictions by enabling the ML algorithms to work on real-time, application-specific events.

#2 – AIOps+ will help organizations succeed with digital transformations

While traditional IT environments can accept MTTR in the range of hours, modern organizations that are increasing their reliance on digital processes will not survive unless they can depend on hybrid IT environments with significantly higher uptime. Imagine an online reservation system, a patient onboarding application, or an eCommerce site: what would be the business impact if it were down for hours or days? AIOps+ solutions provide the advanced real-time and predictive analytics IT teams need to make sense of vast sets of event data and drive faster IT incident response. By extracting meaningful insights from business transaction metrics, application flows and incident data, IT can quickly determine the root cause of an issue and even predict when the next outage will occur. Industry analyst Gartner sees an increasing role for artificial intelligence systems in the next few years as IT departments struggle to support their sprawling digital environments with limited staff.

#3 – MSPs will start deploying AIOps+ to increase their revenues and efficiency

Given the pressing need for 24/7 system availability and reliability, AIOps+ is becoming the future of the agile, digital enterprise. Clients depend on their MSPs to meet their service level agreements (SLAs) and maintain system availability at all times. However, as more mission-critical applications migrate to the cloud, the siloed, fragmented nature of distributed hybrid IT operations can hinder the incident diagnosis and resolution process. Performing root-cause analysis and system diagnostics is complicated by several questions: where is the problem located and who owns it; is the issue related to an application or infrastructure failure; and how is the problem impacting system performance? AIOps+ can provide the end-to-end visibility of the distributed IT environment and the real-time data analysis that MSPs need to automate root-cause analysis and quickly resolve service performance issues.

#4 – Enterprises will appoint centralized teams to take responsibility for AIOps+ deployments

Today, very few organizations have a centralized team responsible for IT operations. Since today’s tools are siloed, IT organizations have structured themselves in the same manner, with compute, network, applications and storage analysts each focused on their niche and each using a different tool. They still are not using cross-functional tools that correlate data across the silos. Viewed in isolation, the data from one silo may not show an issue; when the data is correlated across silos, issues emerge. With AIOps+ solutions, organizations gain access to this cross-silo correlation. Building a centralized team ensures that the organization gets a holistic view of the entire hybrid IT environment.

#5 – Consolidation in the AIOps market will start to occur, with those offering AIOps+ gaining ground

Many AIOps vendors have emerged in the market, and not all are worth the hype. Today, there are big and small AIOps providers with varying capabilities. As the AIOps market matures, we foresee a supplier shakeout, with those offering limited AIOps capabilities not surviving. Rather, we think successful vendors will be those offering robust AIOps+ solutions with advanced features such as full-stack correlation, agentless auto-discovery, automated remediation/restoration, data anomaly detection, and seamless integration with ITSM tools.

#6 – Traditional siloed ITOM/ITOA budgets will freeze and be reallocated to AIOps+

To keep up with the demands of digital business, organizations need to rethink their approach to managing siloed hybrid IT environments and system maintenance. As the pain caused by the volume, value, variety and velocity of operational data grows, organizations need an effective alternative to disparate application monitoring tools and siloed solutions. We predict forward-thinking organizations will reallocate budgets to AIOps+ solutions to better manage, plan, and troubleshoot issues across the entire IT infrastructure in real time.

#7 – An ecosystem of solutions (built by SI/VAR/MSPs) will emerge to enrich existing AIOps+ platforms

The best AIOps solutions can seamlessly integrate with other IT assets, ERP applications, and multiple data sources. For example, FixStream’s new release integrates with ServiceNow and Cherwell ITSM/CMDB platforms, New Relic APM, and SolarWinds and ManageEngine monitoring tools, among others, delivering functions such as auto-ticketing and change management and creating an integrated solution greater than the sum of its parts. Taking it a step further, we see a community of value-added resellers (VARs), system integrators (SIs) and MSPs emerging to create customized solutions for AIOps+ platforms that enhance their capabilities, perhaps to meet the needs of a specific industry or to incorporate advanced security features.

Conclusion

In 2019, we think more IT organizations will recognize how AIOps+ solutions can detect, predict and resolve business issues across an enterprise’s entire hybrid IT environment. For example, the latest AIOps+ offering from FixStream enables customers to detect patterns with 90%+ probability and reduce MTTR to just minutes, so IT operations can achieve appropriate service levels and meet customer expectations.

For more information on how FixStream AIOps+ can help you modernize IT ops this year, view our video.

What’s the Real Cost of Data Center Downtime?

We all know that IT downtime is expensive and damaging to organizations and their productivity. Yet, despite ongoing investments in technology, system outages still bedevil many enterprises, including those running cloud-based environments. In 2018, for example, outages at companies such as Microsoft, Amazon Web Services, and Visa reminded us that even well-maintained IT environments are vulnerable to system disruptions. An extreme example is the 2017 Delta Air Lines 5-hour outage that caused the cancellation of 280 flights and cost the company $150 million.

With numerous IT assets spread across on-premises and cloud environments, downtime today can cause a lot more harm than mere customer inconvenience. System outages can directly impact productivity, throughput, profitability and customer retention. You only need to experience one system outage to realize how much harm it can cause to a company’s reputation and standing in the marketplace. But rather than treating IT downtime as an inevitable cost of doing business, senior IT managers can proactively take steps to minimize its occurrence and impact. And quantifying the cost of system outages can justify spending to prevent them.

How Much do System Outages Really Cost?

Catastrophic airline outages aside, just how costly is downtime to the typical enterprise? There are direct costs for system diagnosis and repair, as well as indirect costs such as the loss of organizational productivity and damage to corporate reputation. Costs may also include damage to (or loss of) mission-critical data and other assets, legal and regulatory impact, and repair costs for core business processes and systems. Add in the lost revenue related to business disruptions and missed sales opportunities, and you can see how even one hour of downtime can cost several hundred thousand or even millions of dollars.

To tally up estimated costs, analyst firms have surveyed clients who experience system downtime and can quantify its impact. Cost estimates vary by industry, size of the organization, and region. For example, according to Statista, in 2017/2018 the average cost of server downtime was approximately $300,000-400,000 per hour, while 44% of survey respondents reported costs of $1M per hour or more. And those costs, and the associated business impact, would be higher if you experienced an unplanned outage during peak traffic time.
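The arithmetic behind such estimates can be sketched in a few lines of Python. This is only a rough illustration: the hourly figures are the Statista range quoted above, while the 90-minute outage duration and the `peak_multiplier` parameter are hypothetical values chosen for the example, not numbers from any survey.

```python
def downtime_cost(hourly_cost_usd: float,
                  outage_minutes: float,
                  peak_multiplier: float = 1.0) -> float:
    """Estimate outage cost as hourly cost x duration in hours x peak factor.

    peak_multiplier is a hypothetical knob for outages that hit during
    peak traffic; 1.0 means an average hour.
    """
    if hourly_cost_usd < 0 or outage_minutes < 0 or peak_multiplier < 0:
        raise ValueError("inputs must be non-negative")
    return hourly_cost_usd * (outage_minutes / 60.0) * peak_multiplier


# Illustrative 90-minute outage priced at both ends of the quoted range.
low = downtime_cost(300_000, 90)
high = downtime_cost(400_000, 90)
print(f"90-minute outage: ${low:,.0f} to ${high:,.0f}")
# prints "90-minute outage: $450,000 to $600,000"
```

Because the estimate scales linearly with duration, every minute shaved off recovery time reduces the total by the same amount, which is why MTTR features so heavily in the discussion that follows.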

The more complex, virtualized and interconnected system environments become, the longer it takes to diagnose and resolve unplanned outages. The siloed, disparate nature of most hybrid IT infrastructures has caused Mean-Time-To-Recovery (MTTR) to escalate along with costs. In fact, ITIC’s Reliability and Hourly Cost of Downtime Trends Survey confirmed that 81% of organizations report the cost of unplanned downtime typically exceeds $300,000/hour, with monetary costs exceeding millions of dollars per minute in extreme cases.

The ITIC study also shows that downtime costs vary by industry and enterprise size. For example, large enterprises with over 1,000 employees could see the costs associated with a single hour of downtime exceed $5 million in nine specific industries, including Banking/Finance; Government; Healthcare; Manufacturing; Media & Communications; Retail; Transportation and Utilities.

Total Cost of System Downtime by Industry

(summary of Ponemon Institute study)

  • Financial Services $994,000
  • Healthcare $918,000
  • eCommerce $909,000
  • Industrial $761,000
  • Retail $758,000
  • Hospitality $514,000
  • Public Sector $476,000

Two charts from the Ponemon Institute study add detail. The first reveals the relatively consistent breakdown of cost categories associated with business disruption. The second illustrates that the average shutdown duration hasn’t changed in the last 6 years.

Why System Outages Happen

A 2018 study by Information Technology Intelligence Consulting points to human error and security issues as the top causes of unplanned downtime, with network interruptions another contributing factor. Outdated processes can also lengthen the time it takes to resolve outages. Pinpointing the root cause of system outages is complicated by manual, time-consuming correlation of massive amounts of siloed operational data. Proactive planning, automation, and better resources can help minimize human error and hardware/software failures. For example, automated real-time correlation of data between business, application, and infrastructure components can help predict potential system issues so they can be resolved before impacting the business.

Proactively Manage Assets to Minimize System Downtime

To prevent downtime, you must be able to effectively monitor and maintain your assets in real time, and arm your IT staff with tools to help make sense of IT complexity. Artificial Intelligence for IT Operations (AIOps) can transform IT Ops by significantly reducing human errors and the tedious repetition of cumbersome manual processes. By providing real-time full-stack data correlation and visualization of the entire system environment, AIOps gives IT staff actionable insights to optimize system performance and meet customer expectations. Viewing and understanding application dependencies can also help employees forecast the potential impact of system changes before they are implemented. This allows you to carefully plan transitions and migrations so they don’t affect the performance of business-critical applications.

AIOps can reduce system downtime by increasing application assurance and uptime. To see how FixStream AIOps can help you improve system availability and reliability, download our free eBook.

BT and FixStream: Mapping the Way to Rapid Issue Resolution

Problem: The Challenge of Diagnosing System Issues in a Virtual Landscape

By Sameer Padhye

In traditional IT environments, various services were managed in silos. As organizations adopted different technologies and migrated to the cloud and microservices, system environments became more complex, more interconnected and more difficult to diagnose. IT teams end up drowning in data and distracted by system noise. Faced with massive numbers of system alerts and incidents to diagnose, IT Operations staff don’t have the insights needed to proactively manage and resolve system issues within their environments.

The situation is complicated by the fact that most hybrid IT environments have multiple technology components, such as network, storage, compute, and application services, provided by a variety of vendors. For example, typical business transactions may use an average of 80 different types of technology. These components are monitored and managed in silos by different teams and tools, making it difficult to uncover the point(s) of failure within the network. Having a distributed, fragmented landscape of various system technologies and suppliers makes it difficult to quickly find and resolve incidents.

The truth is that today’s virtualized dynamic hybrid IT environments can’t be adequately managed with yesterday’s domain-centric monitoring systems. As applications and services shift to the cloud, you need automated resources to monitor and keep track of your network to ensure all is running smoothly. Using manual monitoring approaches no longer makes sense.

Why is that? One reason is that traditional tools and processes deliver a limited awareness of system components and their interdependencies across the hybrid IT infrastructure.

With no end-to-end visibility within an application environment, how can a performance issue with a specific application be quickly uncovered and resolved using manual approaches?

Another shortcoming of manual monitoring is that issues are often diagnosed sequentially (is the failure within the firewall? The storage? How about the database?), dramatically lengthening the time it takes to discover and repair the faulty component.

Incident Triage Requires Real-time Visibility across Entire Infrastructure

To make sense of distributed systems’ alarms and rapidly pinpoint any faults, IT teams need a solution that delivers consolidated management data and an up-to-date visual end-to-end overview of the complete network solution.

One such solution is Service Intelligence, a new managed service from BT based on big data and machine learning technologies provided by FixStream. Using FixStream’s AIOps solution, Service Intelligence monitors the health of each network site and component. Bringing numerous sources of information together, the service delivers a real-time 360-degree view of system availability and performance status across the entire data-center infrastructure and the wide area network connecting the sites.

Faster Fault Discovery, Faster Problem Resolution

The Service Intelligence solution aggregates the data from all the management tools from any device or location and displays the topology and application maps on a context-aware, intuitive, visually rich dashboard. Service Intelligence uses each geographical or network site’s topology and application maps to overlay diagnostic data across the entire infrastructure. Just as Google Maps visualizes traffic jams or accidents in the navigation panel, FixStream builds application maps and alerts its users when there is an outage so operations staff can quickly triage and resolve the issues.

Service Intelligence correlates alarms and consolidates data from multiple sources, providing a single-pane-of-glass display of current infrastructure status. The display highlights trouble spots via a “rooftop view” that pinpoints where the fault is located on a topology or application map. With this information, support teams can quickly repair the exact issue, improve performance or remedy an outage.
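To make the idea of correlating alarms against an application map concrete, here is a minimal Python sketch. It is not FixStream’s or BT’s actual algorithm, which this article does not describe; the component names, the `DEPENDS_ON` map and the `probable_root_causes` function are all hypothetical. The sketch assumes a simple heuristic: an alarmed component is a likely root cause only if nothing it depends on is also alarmed.

```python
from typing import Dict, List, Set

# Hypothetical application map: each component lists what it depends on.
DEPENDS_ON: Dict[str, List[str]] = {
    "checkout-app": ["web-server", "database"],
    "web-server": ["firewall"],
    "database": ["storage-array"],
    "firewall": [],
    "storage-array": [],
}


def probable_root_causes(depends_on: Dict[str, List[str]],
                         alarmed: Set[str]) -> List[str]:
    """Return alarmed components with no alarmed direct or indirect dependency."""

    def has_alarmed_dependency(node: str, seen: Set[str]) -> bool:
        # Walk the dependency graph depth-first; `seen` guards against cycles.
        for dep in depends_on.get(node, []):
            if dep in seen:
                continue
            seen.add(dep)
            if dep in alarmed or has_alarmed_dependency(dep, seen):
                return True
        return False

    return sorted(n for n in alarmed if not has_alarmed_dependency(n, {n}))


# Three alarms from three different tools collapse to one likely culprit.
alarms = {"checkout-app", "database", "storage-array"}
print(probable_root_causes(DEPENDS_ON, alarms))  # prints ['storage-array']
```

In this toy case the application and database alarms are treated as symptoms of the storage fault, so a support team would start with the storage array instead of checking each layer in turn.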

Predictive Diagnostics Can Stop System Failures Before They Happen

Over time, the machine learning capabilities within the Service Intelligence solution begin to recognize patterns, leading to predictive capabilities for potential failures. Staff can identify the business-impacting problem that needs to be prioritized and address it preemptively, rather than waiting to repair system components when they actually fail. By proactively resolving potential issues, teams can avoid damage to the business.

Service Intelligence Delivers a Better Customer Experience

The improved diagnostics and problem resolution capabilities of Service Intelligence are driven by FixStream, the next generation of IT operations analytics. Customers benefit from:

  • Greater speed and accuracy of system issue diagnostics and resolution across a multi-vendor, multi-technology system landscape.
  • Less downtime due to rapid incident diagnosis and speedy resolution.
  • Reduced costs associated with fault management.
  • Seamless completion of business transactions by end users, which reduces revenue lost to system downtime and raises customer satisfaction.

FixStream and BT – United to Deliver Network Availability across the Globe

BT uses FixStream AIOps technology to provide customers with real-time, end-to-end visibility and incident management of their entire ICT estate on a global basis. FixStream’s cloud and visualization platform can prevent outages and identify blind spots in today’s world, one that increasingly depends less on hardware and more on virtualized networks and software.

Large global organizations have mission-critical applications that can’t afford to fail. With the Service Intelligence solution, customers now have the tools they need to automate incident management across their organization.
