Correlation, Aggregation and Suppression: What’s in a Name?

It is clear, however, that this isn’t in fact the case. One vendor’s correlation is another’s aggregation, or another’s suppression. Clearly, there is ample room for misunderstanding and confusion.

Let’s look at these concepts further, but first a couple of baseline terms:

Event: An event is an atomic signal (alarm, fault) from a source (sensor, host, device). An event may be recorded as a single log line entry, say, or a trap.

Alarm: An alarm is a single atomic signal that is escalated to a console. So an alarm may represent a single event, or multiple related events that a solution such as a security or fault manager has somehow related.

Simple consoles treat each individual event as an alarm, and escalate everything to the end user. Users struggle to keep up with the volume, or endure “fault storms” as an event triggers repeated alerts from itself or other systems. Aggregation, suppression and correlation are all approaches to eliminate this problem.

Aggregation

Aggregation is the process of only issuing an alarm after multiple occurrences of an event take place, usually within a fixed timeframe. Aggregation might only alarm about login failures on a firewall if 10 failures happen in 60 seconds, but not on each individual failure.

Aggregation therefore reduces alarm volumes, but risks introducing false negatives, that is, not reporting real threats. If an attacker in the example above cracks a password in nine tries, you will never find out about it, because the aggregation won’t trigger. How you aggregate depends on how sensitive you want the alarm to be, and on your tolerance for false positives (false alarms), which will increase with lower aggregation thresholds.
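To make the trade-off concrete, here is a minimal sketch of threshold aggregation over a sliding time window, using the 10-failures-in-60-seconds figures from the example above. The class and method names are illustrative assumptions, not any vendor's API.

```python
from collections import deque


class ThresholdAggregator:
    """Raise one alarm once `threshold` events fall inside `window` seconds."""

    def __init__(self, threshold=10, window=60.0):
        self.threshold = threshold
        self.window = window
        self._times = deque()  # timestamps of events still inside the window

    def observe(self, timestamp):
        """Record one event; return True if this event should raise an alarm."""
        self._times.append(timestamp)
        # Drop events that have aged out of the window.
        while self._times and timestamp - self._times[0] >= self.window:
            self._times.popleft()
        if len(self._times) >= self.threshold:
            self._times.clear()  # start counting afresh after alarming
            return True
        return False


agg = ThresholdAggregator(threshold=10, window=60.0)
nine_failures = [agg.observe(t) for t in range(9)]  # one failure per second
tenth_failure = agg.observe(9)
```

Here `nine_failures` contains no alarm at all, which is exactly the false negative described above; only the tenth failure trips the alarm. Lowering `threshold` closes that gap at the cost of more false alarms.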

Aggregation implementations vary greatly. Some only aggregate events from one source (e.g., nine failures in one minute for firewall A), while others operate enterprise-wide (e.g., nine failures in one minute across any and all firewalls). Some can also get very granular: alarm when there are nine failures in one minute for user U on any firewall. Increased granularity is better at identifying a real threat and at reducing false positives. Some vendors’ “correlation” is just aggregation.
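The difference between these scopes comes down to the key the events are counted under. The sketch below is a simplified illustration with invented event data and a threshold of five; it assumes every event already falls within the same one-minute window, and the field names `firewall` and `user` are hypothetical.

```python
from collections import Counter

# Hypothetical login failures, all assumed to fall in the same one-minute window.
events = (
    [{"firewall": "fw-a", "user": "mallory"}] * 3
    + [{"firewall": "fw-b", "user": "mallory"}] * 2
    + [{"firewall": "fw-a", "user": u} for u in ("alice", "bob", "carol", "dave")]
)


def alarms(events, key_fields, threshold):
    """Return the scopes whose event count reaches the threshold.

    An empty `key_fields` tuple counts everything together (enterprise-wide).
    """
    counts = Counter(tuple(e[f] for f in key_fields) for e in events)
    return sorted(key for key, n in counts.items() if n >= threshold)


per_firewall = alarms(events, ("firewall",), threshold=5)
enterprise_wide = alarms(events, (), threshold=5)
per_user = alarms(events, ("user",), threshold=5)
```

With this data the per-firewall and enterprise-wide scopes both alarm, but only the per-user scope points at the one account that is actually under attack across two firewalls.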

Suppression

Suppression is also sometimes mislabeled as correlation. There are two major classes of suppression. The first is de-duplication, where repeated events are ignored. Say a router is polled every 60 seconds by a network management system (NMS), and it goes down. Every 60 seconds the NMS will issue a new “router down” event. Suppression eliminates these repeated, duplicate alerts, keeping the operator informed of just the initial fault.
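De-duplication can be sketched as nothing more than remembering which faults are already open. The following is a minimal illustration under that assumption; the class name, the `clear` step and the `router-1` identifier are hypothetical and not drawn from any particular NMS.

```python
class Deduplicator:
    """Pass through only the first report of each (source, condition) fault."""

    def __init__(self):
        self._active = set()  # faults already reported and not yet cleared

    def observe(self, source, condition):
        """Return True only the first time this fault is seen."""
        key = (source, condition)
        if key in self._active:
            return False  # duplicate: suppress it
        self._active.add(key)
        return True

    def clear(self, source, condition):
        """Forget a resolved fault so that a recurrence alarms again."""
        self._active.discard((source, condition))


dedup = Deduplicator()
# Five consecutive polls all find the router down.
polls = [dedup.observe("router-1", "down") for _ in range(5)]
dedup.clear("router-1", "down")               # router comes back up
recurrence = dedup.observe("router-1", "down")  # and later fails again
```

Only the first of the five polls reaches the operator. Whether and when a fault is cleared is a design choice; without some form of reset, a genuine second outage would be silently dropped.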

“Downstream suppression” is the ability to recognize that some events are artifacts, created by another, different event. In the router example above, not only will the router become unavailable, but systems behind the router will also become invisible to the NMS. Polling those servers will raise an alarm for each one, which in large enterprises can cause hundreds or even thousands of false alarms (an “event storm”), all caused by the original failure. Downstream suppression uses network topology to relate different nodes, filtering artifact events without notifying the operator. It’s called “downstream” suppression because the servers generating the false alarms are “downstream” of the router when viewed from the NMS.
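As a rough sketch of the idea, assume the topology is a simple tree in which every node has exactly one next hop toward the NMS. A down node is then an artifact if any node on its path to the NMS is also down. The node names and the `UPSTREAM` map below are invented for illustration; real topologies with redundant paths need more than this.

```python
# Hypothetical topology: each node maps to its next hop toward the NMS.
UPSTREAM = {
    "router-1": "nms",
    "router-2": "nms",
    "server-a": "router-1",
    "server-b": "router-1",
    "server-c": "router-2",
}


def root_causes(down_nodes, upstream):
    """Keep only the down nodes with no other down node between them and the NMS."""
    down = set(down_nodes)
    roots = []
    for node in down_nodes:
        hop = upstream.get(node)
        # Walk toward the NMS until we run out of hops or meet a down node.
        while hop is not None and hop not in down:
            hop = upstream.get(hop)
        if hop is None:
            roots.append(node)  # nothing upstream explains this failure
    return roots


storm = ["router-1", "server-a", "server-b"]
reported = root_causes(storm, UPSTREAM)
```

Here `reported` contains only `router-1`; the two servers behind it are filtered as artifacts. A server that fails while its router is still up has no down node upstream, so it is still reported.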

Correlation

The goal of correlation is to create one alarm for each related set of events, regardless of their source. To do so it may leverage techniques such as suppression and aggregation. What differentiates correlation from these other approaches, however, is its sophistication. Correlation seeks patterns in the event streams coming from a wide variety of event sources, related in time, potentially with Boolean, statistical or other extensions applied.

Correlation takes all the events, links them, and sends a single meta-alarm. Correlation relies greatly on normalized data, where each host’s events have been coerced into a single canonical data representation. Correlation looks for common factors and anomalies across the enterprise, such as an attacker’s IP address, or a surge of attempts to contact a rarely-used IP port. Correlation engines usually add other, more sophisticated techniques, typically proprietary to each vendor, but they all share the same goal: to correctly identify event patterns and so focus the user’s attention.
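A deliberately simplified sketch of the two steps named here, normalization and linking on a common factor, might look like the following. The raw record layouts, field names and the "same source IP on at least two hosts within five minutes" rule are all assumptions made for illustration; commercial correlation engines are considerably more elaborate.

```python
from collections import defaultdict


def normalize(raw):
    """Coerce a source-specific record into one canonical shape."""
    if raw["source_type"] == "firewall":
        return {"time": raw["ts"], "src_ip": raw["src"],
                "host": raw["fw_name"], "kind": "login_failure"}
    if raw["source_type"] == "ids":
        return {"time": raw["epoch"], "src_ip": raw["attacker"],
                "host": raw["sensor"], "kind": raw["signature"]}
    raise ValueError("unknown source type: %r" % raw["source_type"])


def correlate(raw_events, window=300, min_hosts=2):
    """Emit one meta-alarm per source IP seen on several hosts within `window` seconds."""
    by_ip = defaultdict(list)
    for raw in raw_events:
        event = normalize(raw)
        by_ip[event["src_ip"]].append(event)

    meta_alarms = []
    for ip, events in by_ip.items():
        times = [e["time"] for e in events]
        hosts = {e["host"] for e in events}
        # Simplification: all of this IP's events must fit inside one window.
        if len(hosts) >= min_hosts and max(times) - min(times) <= window:
            meta_alarms.append(
                {"src_ip": ip, "hosts": sorted(hosts), "event_count": len(events)}
            )
    return meta_alarms


raw_events = [
    {"source_type": "firewall", "ts": 100, "src": "203.0.113.7", "fw_name": "fw-a"},
    {"source_type": "ids", "epoch": 130, "attacker": "203.0.113.7",
     "sensor": "ids-1", "signature": "port_scan"},
    {"source_type": "firewall", "ts": 160, "src": "203.0.113.7", "fw_name": "fw-b"},
    {"source_type": "firewall", "ts": 170, "src": "198.51.100.4", "fw_name": "fw-a"},
]
meta = correlate(raw_events)
```

Three events from two firewalls and an intrusion sensor collapse into a single meta-alarm for one source address, while the lone failure from the second address raises nothing. Without the `normalize` step the firewall and sensor records could not be compared at all, which is why normalized data matters so much.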

Clearly, with all this potential for confusion, it’s important to understand the terms, their meaning and their potential impact. Whether dealing with vendors, colleagues or your senior management, establishing a consistent vocabulary is a basic necessity for effective communication.

Phil Hollows is the vice president of marketing for OpenService.