Transforming IT: Application Flow Management

Probably none of you have ever heard of application flow management or AFM. And if you have, or think you have, you most likely associate it with network management technologies like NetFlow, for capturing application “conversations” across the network. While this isn’t entirely off the mark—paired flow analysis of traffic volumes across the network is a part of AFM—it’s very far from being the whole picture.

For a few years now, EMA has been assessing a series of capabilities that tend to be orthogonal (set at right angles) to most approaches to network performance management, and to performance management in general. Traditional approaches, which still dominate the market, and which are especially dominant among platform vendors, have focused on monitoring events and collecting SNMP statistics by polling network and systems devices. These solutions use that information to isolate component-level points of failure in the infrastructure, failures that may or may not affect application service performance.

These traditional monitoring tools are, as a class, getting better, thanks to improved analysis across network and systems devices. In many respects, they represent the next generation of platform-centric performance monitoring tools for integrated network and systems performance. But, as a class, they are still far from offering the complete picture, and they have two very significant limitations.

First of all, they depend on polling for information, and so cannot do real-time analysis (most still collect data only every 10 or 15 minutes). Such a low frequency of information gathering makes sense, since polling for performance data can itself cause congestion; in one or two instances EMA has documented private networks where more than 50% of the network performance issues came from enterprise performance management traffic. But this style of polling provides only limited visibility into service performance, and critical intermittent problems can go completely undetected.
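
To make that blind spot concrete, here is a minimal, illustrative polling loop in Python. The poll_interface_counters() helper and its simulated counter are hypothetical stand-ins for a real SNMP query; the point is simply that anything that spikes and recovers between two polls is never observed.

    import random
    import time

    POLL_INTERVAL = 15 * 60  # a typical 15-minute polling cycle, in seconds

    def poll_interface_counters(device):
        # Hypothetical stand-in for an SNMP GET of an interface octet counter;
        # here it just returns a monotonically growing simulated value.
        poll_interface_counters.total = getattr(poll_interface_counters, "total", 0)
        poll_interface_counters.total += random.randint(10**6, 10**8)
        return poll_interface_counters.total

    def monitor(device):
        previous = None
        while True:
            current = poll_interface_counters(device)   # one sample per cycle
            if previous is not None:
                rate = (current - previous) / POLL_INTERVAL
                print(f"{device}: ~{rate:,.0f} octets/sec, averaged over 15 minutes")
            previous = current
            # Anything that spikes and recovers inside this sleep is never seen:
            # the intermittent-problem blind spot described above.
            time.sleep(POLL_INTERVAL)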

A second problem with traditional performance management tools is that they don't monitor actual application traffic. They monitor the infrastructure supporting it and, through various levels of intelligence (and here the quality varies enormously), deduce what impact a specific component is having on the application. This is something like studying storm systems by monitoring the trees: certainly not irrelevant, but not the most direct approach.

Assessing AFM

AFM is the flip side of this approach. It is EMA's term for the technologies that directly address application traffic at various layers (especially Layers 3 through 7 of the OSI stack, and above). AFM spans a whole host of technologies, but a short list would include:

Analytics for assessing the Quality of Experience (QoE) of applications and transactions. QoE should provide a governing metric for gauging the impact of service performance from the point of view of the end user (a simple scoring sketch follows this list).

Transaction analysis at the data center, or "other end" of the infrastructure, to monitor transactions within the application server, with interfaces into the database server and into the performance of other data center elements such as storage. This is where a parallel and complementary set of capabilities comes into play to round out visibility into the application's performance, from the heart of the data center out to end users in remote locations.

Analysis of flows from servers to end devices across the IT infrastructure, with support for packet-level drill-down. In other words, end-to-end latencies of application traffic across the infrastructure, with packet and protocol drill-down, both for real-time analysis and for network forensics targeted at persistent and difficult-to-diagnose problems (a latency-measurement sketch follows this list).

Analysis of traffic volumes through capabilities such as NetFlow or sFlow, or jFlow or IPFix, or through other types of adapters of probes. This helps to understand how applications are impacting the infrastructure in terms of capacity and potential congestion, and can also expose inappropriate practices such as backing up servers remotely in the middle of the work day, or even security breaches.