Problem: The Challenge of Diagnosing System Issues in a Virtual Landscape
In traditional IT environments, services were managed in silos. As organizations adopted different technologies and migrated to the cloud and microservices, system environments became more complex, more interconnected and more difficult to diagnose. IT teams end up drowning in data and distracted by system noise. Faced with massive volumes of alerts and incidents to diagnose, IT Operations staff lack the insights needed to proactively manage and resolve issues within their environments.
The situation is complicated by the fact that most hybrid IT environments have multiple technology components (network, storage, compute, application services and so on) provided by a variety of vendors. For example, typical business transactions may use, on average, 80 different types of technology. These components are monitored and managed in silos by different teams and tools, making it difficult to uncover the point(s) of failure within the network. A distributed, fragmented landscape of technologies and suppliers makes it hard to find and resolve incidents quickly.
The truth is that today’s virtualized, dynamic hybrid IT environments can’t be adequately managed with yesterday’s domain-centric monitoring systems. As applications and services shift to the cloud, you need automated tools to monitor and keep track of your network and ensure that everything is running smoothly. Manual monitoring approaches no longer make sense.
Why is that? One reason is that traditional tools and processes deliver a limited awareness of system components and their interdependencies across the hybrid IT infrastructure.
With no end-to-end visibility within an application environment, how can a performance issue with a specific application be quickly uncovered and resolved using manual approaches?
Another shortcoming of manual monitoring is that issues are often diagnosed sequentially (for example: is the failure in the firewall? The storage? The database?), dramatically lengthening the time it takes to discover and repair the faulty component.
Incident Triage Requires Real-time Visibility across Entire Infrastructure
To make sense of distributed systems’ alarms and rapidly pinpoint any faults, IT teams need a solution that delivers consolidated management data and an up-to-date visual end-to-end overview of the complete network solution.
One such solution is Service Intelligence, a new managed service from BT based on big data and machine learning technologies provided by FixStream. Using FixStream’s AIOps solution, Service Intelligence monitors the health of each network site and component. Bringing numerous sources of information together, the service delivers a real-time 360-degree view of system availability and performance status across the entire data-center infrastructure and the wide area network connecting the sites.
Faster Fault Discovery, Faster Problem Resolution
The Service Intelligence solution aggregates data from all the management tools, from any device or location, and displays the topology and application maps on a context-aware, intuitive, visually rich dashboard. Service Intelligence uses each site’s geographical and network topology, together with the application maps, to overlay diagnostic data across the entire infrastructure. Just as Google Maps visualizes traffic jams or accidents in the navigation panel, FixStream builds application maps and alerts its users when there is an outage so that operations staff can quickly triage and resolve the issue.
Service Intelligence correlates alarms and consolidates data from multiple sources, providing a single-pane-of-glass display of current infrastructure status. The display highlights trouble spots via a “rooftop view” that pinpoints where the fault is located on a topology or application map. With this information, support teams can quickly repair the exact issue, improve performance or remedy an outage.
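To make the idea of correlating alarms against a topology map concrete, here is a minimal Python sketch. It is not FixStream's or BT's implementation: the component names, the DEPENDS_ON map, the tool names and both functions are hypothetical, chosen only for illustration. It assumes a known map of which component depends on which, groups raw alarms by component, and then reports the alarming components whose own direct dependencies are healthy as the likeliest places to start looking.

```python
from collections import defaultdict

# Hypothetical dependency map: each component lists what it directly depends on.
DEPENDS_ON = {
    "web-app": ["app-server"],
    "app-server": ["database", "firewall"],
    "database": ["storage"],
    "firewall": [],
    "storage": [],
}


def consolidate(alarms):
    """Group raw (source_tool, component) alarms by component."""
    by_component = defaultdict(set)
    for source, component in alarms:
        by_component[component].add(source)
    return dict(by_component)


def likely_root_causes(depends_on, alarming):
    """Return alarming components with no alarming direct dependency.

    Simplification: only direct dependencies are checked, so a healthy
    component sitting between two alarming ones yields two candidates.
    """
    alarming = set(alarming)
    roots = []
    for component in sorted(alarming):
        deps = depends_on.get(component, [])
        if not any(dep in alarming for dep in deps):
            roots.append(component)
    return roots


# Alarms raised by different (hypothetical) monitoring tools.
alarms = [
    ("apm", "web-app"),
    ("netflow", "web-app"),
    ("apm", "app-server"),
    ("db-monitor", "database"),
    ("storage-tool", "storage"),
]

grouped = consolidate(alarms)
print(likely_root_causes(DEPENDS_ON, grouped))  # prints ['storage']
```

In this example four components raise alarms through four different tools, but only the storage layer has no alarming dependency beneath it, so it is the single candidate shown to the operator. A production system would need a discovered, continuously updated topology rather than a hand-written map.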
Predictive Diagnostics Can Stop System Failures Before They Happen
Over time, the machine learning capabilities within the Service Intelligence solution begin to recognize patterns, making it possible to predict potential failures. Staff can identify the business-impacting problem that needs to be prioritized and address it preemptively, rather than waiting to repair system components after they fail. By proactively resolving potential issues, teams can avoid damage to the business.
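As a rough illustration of spotting trouble before a component fails, the sketch below flags metric samples that deviate sharply from their recent history. This is a deliberately simple statistical stand-in (a rolling z-score), not the machine learning used by Service Intelligence; the function name, the window size, the threshold and the sample values are all assumptions made for this example.

```python
from statistics import mean, stdev


def early_warnings(samples, window=5, threshold=3.0):
    """Return indices of samples far outside the preceding window's range."""
    flagged = []
    for i in range(window, len(samples)):
        history = samples[i - window:i]
        spread = stdev(history)
        if spread == 0:
            continue  # flat history gives no meaningful z-score; skip it
        z_score = (samples[i] - mean(history)) / spread
        if abs(z_score) >= threshold:
            flagged.append(i)
    return flagged


# Hypothetical response-time readings (ms) for one component.
latency_ms = [10, 11, 10, 12, 11, 10, 25]
print(early_warnings(latency_ms))  # prints [6]
```

The last reading is flagged because it sits far above the previous five, which is the kind of early signal that lets staff investigate before an outage. Real predictive diagnostics would learn from many metrics and past incidents rather than a single threshold.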
Service Intelligence Delivers a Better Customer Experience
The improved diagnostics and problem resolution capabilities of Service Intelligence are driven by FixStream, the next generation of IT operations analytics. Customers benefit from:
- Greater speed and accuracy in diagnosing and resolving system issues across a multi-vendor, multi-technology system landscape.
- Less downtime due to rapid incident diagnosis and speedy resolution.
- Reduced costs associated with fault management.
- Seamless business transactions for end users, which reduces revenue lost to system downtime. The result is higher customer satisfaction and increased revenue.
FixStream and BT: United to Deliver Network Availability across the Globe
BT uses FixStream AIOps technology to provide customers with real-time, end-to-end visibility and incident management of their entire ICT estate on a global basis. FixStream’s cloud and visualization platform can prevent outages and identify blind spots in a world that depends less and less on hardware and more and more on virtualized networks and software.
Large global organizations have mission-critical applications that can’t afford to fail. With the Service Intelligence solution, customers now have the tools they need to automate incident management across their organization.