IT Infrastructure Insights for Modern IT Operations
Besides collecting all the inventory data, the auto-discovery process also collects all the topology relationship data: for example, the VLANs, the interfaces, and the links between compute, storage, and network entities. It builds a very detailed, dynamic topology map by doing topology analytics. For example, as you see, this topology map belongs to a data center where all the entities across storage, network, and compute are connected with each other. By moving your mouse to any of the domain-specific icons, you can see the corresponding data.
For example, these are the storage devices, and by going to a storage device you can look at all the details and how it is connected with different network entities. Likewise, if you move your mouse to network, it will show you all the network entities across the physical and virtual layers, including routers, switches, load balancers, virtual switches, virtual routers, etc. If you move your mouse to compute, you will see all the compute devices, including bare metal servers, hypervisors, VMs, and any hyperconverged infrastructure, and it shows you how they’re all connected with each other. Each device is connected, and you can see the links between them, which carry meaningful information including the source and target interface and the operational state of the link.
You can also overlay a variety of operational data collected through open API interfaces, including health information, ticketing information, and log analytics from connecting with other tools like Splunk. The data can be collected by the FixStream data collectors, or it can be ingested from your existing tools. For example, if you click on fault, it will very quickly show you, color coded in green, yellow, and red, the critical, major, and minor faults that are reported. You can simply go to the device that has a fault associated with it, drill down into the details, and see exactly what that fault is about. Likewise, you can look at the alerts, and it will quickly show you all the performance issues based on the thresholds set by the end user and the operations teams.
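The topology map and fault overlay described above can be pictured as a small graph structure. This is only a minimal sketch, not the FixStream data model: the class names, device names, and interface names are hypothetical, and the severity-to-color mapping is an assumption, since the demo names the three colors and three severities without saying which pairs with which.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Assumed mapping (not stated in the demo): worst severity is shown in red.
SEVERITY_RANK = {"minor": 1, "major": 2, "critical": 3}
SEVERITY_COLOR = {"minor": "green", "major": "yellow", "critical": "red"}


@dataclass(frozen=True)
class Link:
    """A link between two devices, with the details shown on the map."""
    source: str
    source_interface: str
    target: str
    target_interface: str
    oper_state: str  # e.g. "up" or "down"


@dataclass
class Topology:
    domains: Dict[str, str] = field(default_factory=dict)      # device -> domain
    links: List[Link] = field(default_factory=list)
    faults: Dict[str, List[str]] = field(default_factory=dict)  # device -> severities

    def add_device(self, name: str, domain: str) -> None:
        self.domains[name] = domain

    def connect(self, link: Link) -> None:
        # Both ends must already be known devices.
        for end in (link.source, link.target):
            if end not in self.domains:
                raise KeyError(f"unknown device: {end}")
        self.links.append(link)

    def report_fault(self, device: str, severity: str) -> None:
        if severity not in SEVERITY_RANK:
            raise ValueError(f"unknown severity: {severity}")
        self.faults.setdefault(device, []).append(severity)

    def devices_in(self, domain: str) -> List[str]:
        """Devices for one domain icon (storage, network, or compute)."""
        return sorted(d for d, dom in self.domains.items() if dom == domain)

    def links_of(self, device: str) -> List[Link]:
        """All links that start or end at the given device."""
        return [l for l in self.links if device in (l.source, l.target)]

    def fault_color(self, device: str) -> Optional[str]:
        """Overlay color for a device: the color of its worst fault, if any."""
        severities = self.faults.get(device)
        if not severities:
            return None
        worst = max(severities, key=SEVERITY_RANK.__getitem__)
        return SEVERITY_COLOR[worst]


topo = Topology()
topo.add_device("esx-01", "compute")
topo.add_device("sw-01", "network")
topo.add_device("array-01", "storage")
topo.connect(Link("esx-01", "vmnic0", "sw-01", "Eth1/1", "up"))
topo.connect(Link("array-01", "e0a", "sw-01", "Eth1/2", "down"))
topo.report_fault("sw-01", "minor")
topo.report_fault("sw-01", "critical")

print(topo.devices_in("network"))
print([(l.source, l.oper_state) for l in topo.links_of("sw-01")])
print(topo.fault_color("sw-01"))
```

Keeping faults in a separate overlay dictionary, rather than on the device itself, mirrors the idea that operational data is layered on top of the discovered topology and can come from different tools.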
You get a variety of alerts, for example CPU, disk, and memory alerts, and other information from different entities. Likewise, you can see the open tickets from the ticketing system. You can also filter this view by going to the different devices on the left-hand side and looking at vendor-specific data within the topology, or at different device types.
Once the data is collected and correlated on the topology map, the platform gives you a holistic view into your performance data across the domains. If you go to the performance center, the platform creates a heat map across every entity to show how different entities have been used relative to each other. The input criteria for the heat map are CPU, memory, and disk, each weighted at 33.3% of the score. Each device is assigned a score from 0 to 100 based on its utilization. For example, these are all the different VMs within the environment: this subset of the VMs falls into the 10% category, these are in the 20% category, 30%, and so on, and that shows you how well the VMs are utilized. That gives you enough analytical insight to understand over- or underutilization in different scenarios within your environment, and you can optimize your resource allocations from there.
You can also use this analytics view to do placement. For example, if you have a memory-intensive workload that you need to deploy, you can simply change the input criteria for the heat map to make memory 100%, and immediately the heat map is recomputed based on memory. You can look at the VMs that have low memory utilization and therefore a lower score; these are perfect candidates for that workload placement. On the right-hand side, by contrast, you can see the VMs with a 98.94% score, which are heavily consumed on memory. You can further drill down into the details and look at the historic trend of the memory utilization of those VMs.
So this gives you the analytics input for your resource planning, optimization, and management.
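To make the heat map arithmetic concrete, here is a minimal sketch of a weighted utilization score and the 10% categories. The function names, VM names, and utilization numbers are made up for illustration, and the rule that a category is labeled by its upper bound is an assumption; the demo only states that CPU, memory, and disk are weighted equally by default and that the weights can be changed, for example to 100% memory.

```python
import math
from typing import Dict, List

Metrics = Dict[str, float]  # utilization percentages, 0-100, per resource


def utilization_score(metrics: Metrics, weights: Dict[str, float]) -> float:
    """Weighted average of utilization percentages, on a 0-100 scale."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(metrics[name] * w for name, w in weights.items()) / total


def heat_map(vms: Dict[str, Metrics], weights: Dict[str, float],
             bucket_size: int = 10) -> Dict[int, List[str]]:
    """Group VMs into categories labeled 10, 20, ..., 100 (assumed upper bounds)."""
    buckets: Dict[int, List[str]] = {}
    for name, metrics in vms.items():
        score = utilization_score(metrics, weights)
        bucket = max(bucket_size, math.ceil(score / bucket_size) * bucket_size)
        buckets.setdefault(bucket, []).append(name)
    return dict(sorted(buckets.items()))


def placement_candidates(vms: Dict[str, Metrics],
                         weights: Dict[str, float]) -> List[str]:
    """VM names ordered from least to most utilized under the given weights."""
    return sorted(vms, key=lambda name: utilization_score(vms[name], weights))


# Hypothetical utilization data for three VMs.
vms = {
    "vm-a": {"cpu": 30, "memory": 30, "disk": 30},
    "vm-b": {"cpu": 90, "memory": 99, "disk": 60},
    "vm-c": {"cpu": 80, "memory": 5, "disk": 80},
}

equal = {"cpu": 1, "memory": 1, "disk": 1}        # 33.3% each
memory_only = {"cpu": 0, "memory": 1, "disk": 0}  # memory as 100%

print(heat_map(vms, equal))
print(heat_map(vms, memory_only))
print(placement_candidates(vms, memory_only))
```

Under equal weights vm-c looks moderately busy, but with memory set to 100% it drops to the lowest category, which is the kind of shift that makes it a placement candidate for a memory-heavy workload.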
Likewise, the IP address view gives you detailed insight into the operational data, such as performance metrics and alerts, which are based on the thresholds you have set. For example, if you go to the device view, by device type, you have different device types. Compute gives you the performance metrics of all the servers; you can click on them and go into the details, and different metrics are readily available for you across CPU, disk, interface, and memory. Likewise, if you go to the network device type, these are all the different network entities, and you can further drill down into them to understand their performance characteristics: for example, the availability metrics, the CPU, all the interface utilizations, and all the Ethernet ports. For instance, you can click on them and look at all the meaningful interface utilization metrics being collected by the platform, which are processed to give you a time series view into their health.
The service metrics show you all the details about the application services and the performance metrics that are collected from different service monitoring tools that you may have, such as Nagios. Storage provides you the metrics of different storage entities like LUN, volume, port, controller, and aggregate. You can set thresholds, and those become alerts accordingly. Likewise, the availability metrics show you the overall availability of the different entities across compute, storage, and network devices.
Likewise, the config manager is something that can be set up by your operations team. The network configuration data can be collected and stored in our time series database, and it can be analyzed to understand the differences. It shows whether an issue has happened and alerts you. It can be used as one of the inputs for troubleshooting.
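The idea of storing configuration snapshots over time and analyzing the difference can be illustrated with Python's standard `difflib` module. This is a sketch of the general technique, not how the FixStream config manager is implemented; the function name, the snapshot format (timestamp and config text pairs), and the sample configuration lines are all assumptions.

```python
import difflib
from typing import Dict, List, Optional, Tuple

Snapshot = Tuple[str, str]  # (ISO timestamp, configuration text)


def detect_config_change(snapshots: List[Snapshot]) -> Optional[Dict[str, object]]:
    """Compare the two most recent snapshots.

    Returns None when there are fewer than two snapshots or nothing changed,
    otherwise a dict with the newer timestamp and the removed/added lines.
    """
    if len(snapshots) < 2:
        return None
    # ISO 8601 timestamps sort chronologically as plain strings.
    ordered = sorted(snapshots, key=lambda s: s[0])
    (_, old_text), (new_ts, new_text) = ordered[-2:]

    removed: List[str] = []
    added: List[str] = []
    # ndiff prefixes each line with "- ", "+ ", "  " or "? ".
    for line in difflib.ndiff(old_text.splitlines(), new_text.splitlines()):
        if line.startswith("- "):
            removed.append(line[2:])
        elif line.startswith("+ "):
            added.append(line[2:])

    if not removed and not added:
        return None
    return {"timestamp": new_ts, "removed": removed, "added": added}


# Hypothetical snapshots of one switch interface configuration.
history = [
    ("2024-01-01T00:00:00", "interface Eth1/1\nmtu 1500\nno shutdown"),
    ("2024-01-02T00:00:00", "interface Eth1/1\nmtu 9000\nno shutdown"),
]

change = detect_config_change(history)
if change is not None:
    print("config changed at", change["timestamp"])
    print("removed:", change["removed"])
    print("added:", change["added"])
```

A non-empty result could then be raised as an alert and lined up against fault and performance events from the same time window during troubleshooting.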
We also provide you with an operational dashboard where this data is readily available, organized around the critical events and alerts that you need to act on. For example, you can quickly see the critical events, which include faults and alerts. You can see the health score for compute and network, which shows you how healthy each of the domains is and is calculated based on the different events being collected over time. You can also look at all the tickets that are open for the different domains, as well as the most used devices. So this gives you a landing zone to quickly understand the different operational issues that you need to take action on. In summary, the FixStream platform provides you with infrastructure insight that is highly correlated: different operational data is correlated on top of the topology map to understand the dependencies and root cause of events that are causing issues in business applications. Thank you.