NetFlow Niche
NetFlow was originally developed by Cisco Systems (Nasdaq: CSCO) as part of its Internetwork Operating System (IOS). It is, however, an open protocol with several versions, including Juniper Networks' (Nasdaq: JNPR) cflowd and Huawei Technologies' NetStream.
NetFlow involves two elements: a Flow generator and a Flow collector. The switch or router acts as the Flow generator, sending a continuous stream of what it sees on one or more interfaces to a NetFlow collector, which is a server or appliance that collects the data from one or more Flow generators.
The NetFlow generator examines the packets based on seven key fields: source and destination IP addresses, source and destination ports, Layer 3 protocol type, type-of-service byte and input logical interface. If those seven fields are identical for two or more packets, the generator assigns those packets to the same flow, or conversation. When that conversation is complete, it sends the data to the collector.
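The grouping described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how any router implements it. The field names, the `aggregate_flows` helper and the sample packets are all assumptions made for the example.

```python
from collections import defaultdict, namedtuple

# The seven key fields named in the article; the field names are illustrative.
FlowKey = namedtuple(
    "FlowKey",
    "src_ip dst_ip src_port dst_port protocol tos input_if",
)


def aggregate_flows(packets):
    """Group packet dicts into flows keyed by the seven fields."""
    flows = defaultdict(lambda: {"packets": 0, "bytes": 0})
    for pkt in packets:
        key = FlowKey(*(pkt[field] for field in FlowKey._fields))
        flows[key]["packets"] += 1
        flows[key]["bytes"] += pkt["length"]
    return dict(flows)


# Hypothetical packets: the first two share all seven fields, the third
# differs only in destination port and so starts a separate flow.
base = {"src_ip": "10.0.0.5", "dst_ip": "192.0.2.10", "src_port": 51000,
        "dst_port": 443, "protocol": 6, "tos": 0, "input_if": 1}
packets = [
    {**base, "length": 1500},
    {**base, "length": 400},
    {**base, "dst_port": 80, "length": 60},
]

flows = aggregate_flows(packets)
for key, stats in flows.items():
    print(key.src_ip, "->", key.dst_ip, "port", key.dst_port, stats)
```

A real generator also tracks timestamps and expires idle flows before exporting them; that is left out here to keep the sketch short.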
Since NetFlow uses the switch or router's CPU, two approaches can be used to reduce the overhead. One is to sample the packets, for example by analyzing only every tenth packet rather than all of them. The other is to activate NetFlow only on certain key interfaces.
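As a rough illustration of the sampling approach, the sketch below keeps every tenth packet and scales the sampled byte count back up to estimate the total. The function names and the synthetic traffic are assumptions for the example. Real devices may sample randomly rather than at a fixed interval.

```python
def sample_every_nth(packets, n=10):
    """Yield every n-th packet (deterministic 1-in-n sampling)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    for index, pkt in enumerate(packets, start=1):
        if index % n == 0:
            yield pkt


def estimate_total_bytes(sampled_packets, n=10):
    """Scale the sampled byte count by n to estimate the true total."""
    return n * sum(pkt["length"] for pkt in sampled_packets)


# Hypothetical traffic: 1,000 packets of 100 bytes each.
traffic = [{"length": 100} for _ in range(1000)]
sampled = list(sample_every_nth(traffic, n=10))
estimate = estimate_total_bytes(sampled, n=10)
print(len(sampled), "packets examined, estimated bytes:", estimate)
```

With uniform traffic the estimate is exact. With mixed packet sizes it is only an approximation, which is the trade-off for examining a tenth of the packets.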
A single NetFlow collector can receive data from hundreds of network interfaces. The collector, in addition to storing the NetFlow statistics, generally also includes analysis software that can determine:
- The applications seen on the interface
- The hosts communicating on the interface
- Who the hosts are conversing with and with what protocol (and much more)
The above information alone will help answer over 90 percent of questions pertaining to:
- Who: The host causing the problem
- What: The application the host causing the problem was using
- When: The time stamps related to when the issue surfaced
- Where: The router/switch and the interface the traffic was seen on
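The who, what, when and where questions above amount to filtering and summing flow records. The sketch below shows one way a collector might answer them. The record layout, the `top_talkers` helper, the device name `edge-rtr-1` and the sample data are all hypothetical and are not taken from any particular product.

```python
from collections import Counter


def top_talkers(records, exporter, interface, since, until, limit=3):
    """Rank hosts and applications by bytes for one interface and time window."""
    by_host = Counter()
    by_app = Counter()
    for rec in records:
        # Where: restrict to one router/switch and one interface.
        if rec["exporter"] != exporter or rec["interface"] != interface:
            continue
        # When: keep flows that started inside the window [since, until).
        if not (since <= rec["start"] < until):
            continue
        by_host[rec["src_ip"]] += rec["bytes"]                     # Who
        by_app[(rec["protocol"], rec["dst_port"])] += rec["bytes"]  # What
    return by_host.most_common(limit), by_app.most_common(limit)


# Hypothetical flow records; "start" is in seconds since an arbitrary epoch.
common = {"exporter": "edge-rtr-1", "protocol": 6}
records = [
    {**common, "interface": 2, "start": 1000, "src_ip": "10.1.1.20", "dst_port": 80, "bytes": 9_000_000},
    {**common, "interface": 2, "start": 1060, "src_ip": "10.1.1.21", "dst_port": 443, "bytes": 1_000_000},
    {**common, "interface": 2, "start": 1100, "src_ip": "10.1.1.20", "dst_port": 80, "bytes": 3_000_000},
    {**common, "interface": 3, "start": 1100, "src_ip": "10.1.1.99", "dst_port": 80, "bytes": 50_000_000},
    {**common, "interface": 2, "start": 5000, "src_ip": "10.1.1.22", "dst_port": 25, "bytes": 70_000_000},
]

hosts, apps = top_talkers(records, "edge-rtr-1", 2, since=900, until=1200)
print("Top hosts:", hosts)
print("Top applications (protocol, port):", apps)
```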
With NetFlow analysis, a single computer can collect flows from dozens or even hundreds of interfaces. Geographical limitations are dramatically reduced, and the cost is a fraction of the money spent on deploying multiple packet analyzers.
Making the Arrest
NetFlow is included with higher-end hardware from Cisco and other networking vendors, but it must be activated. As mentioned, due to the overhead, it won't always be activated. The state of Maine, for example, decided to use NetFlow on parts of its network which connect about 750 entities around the state.
"NetFlow is integral to Cisco's IOS, so there was no extra cost in turning it on in the routers," said Duncan Bond, the State of Maine's data network supervisor. "Because the reporting capability is there already, it is simple enough to turn on, and we didn't have to install any extra equipment at the edge. It was a no brainer. It is free information."
The state has a backbone using Nortel ATM (asynchronous transfer mode) switches. At the ATM locations, Cisco routers direct traffic to the edge sites, typically over T1 connections. The state also has a SONET (synchronous optical network) ring in the capital area.
"On the entities that are connected right at the core, we don't have a need for flow-related information because we have plenty of bandwidth," said Bond. "But for all of our WAN (wide area network)-based edge locations, we have turned on NetFlow reporting."
Although it was free to set up the generators, he did have to purchase a server to act as the collector. He uses a dual-CPU (3.8 GHz) Windows server with 4 GB of RAM. The data is stored on an internal 400 GB RAID (redundant array of independent disks) array. The software he uses for analyzing the NetFlow data is Scrutinizer from Plixer International of Sanford, Maine. Scrutinizer comes in a free version that monitors an unlimited number of interfaces and stores the data for 24 hours. Commercial versions run from US$1,995 to $8,995.
The NetFlow data is initially available in one-minute intervals, and then rolls up to five-minute and half-hour intervals. Bond has the system set to retain the one-minute data for seven days. Beyond the first few days, that level of granularity is no longer useful. Network administrators can drill down and view the data in real time, but most often it is used after the fact to locate what was causing an earlier slowdown.
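The roll-up from one-minute to coarser intervals can be pictured as summing samples into wider time buckets. The sketch below assumes simple summation of byte counts. The article does not say how Scrutinizer itself consolidates the data, so the `roll_up` helper and the sample values are purely illustrative.

```python
from collections import defaultdict


def roll_up(samples, bucket_seconds):
    """Sum (timestamp, byte_count) samples into fixed-width time buckets."""
    buckets = defaultdict(int)
    for timestamp, byte_count in samples:
        # Align each sample to the start of the bucket that contains it.
        bucket_start = timestamp - (timestamp % bucket_seconds)
        buckets[bucket_start] += byte_count
    return sorted(buckets.items())


# Hypothetical data: one hour of one-minute samples, 1,000 bytes each.
one_minute = [(minute * 60, 1000) for minute in range(60)]

five_minute = roll_up(one_minute, 5 * 60)
half_hour = roll_up(one_minute, 30 * 60)

print(len(five_minute), "five-minute buckets")
print(half_hour)
```

Keeping only the coarser buckets after a few days trades detail for storage, which matches the retention approach Bond describes.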
"Typically users don't call right when they have a problem, but will wait till later in the day to phone in a trouble ticket," Bond says. "At that time the problem has passed so we can't see it in real time any more; that is where this tool really shines."
He gives the example of an office on a relatively low-speed circuit that called in at 9 a.m. to report that the circuit had been slow since staff started arriving around 7 a.m. Looking at the bandwidth graphs showed that the bandwidth was saturated. Clicking into Scrutinizer then showed that it was Windows updates that were scheduled to occur at that time.
"Windows updates and virus updates are particularly expensive in terms of bandwidth," says Bond. "Depending on how their business works, either we can reschedule the updates or they have to put up with it."
In other cases, he has found problems connected with server consolidation. Consolidating the servers to central locations cuts down on maintenance costs, but shifts more traffic to the WAN. NetFlow shows the increase in bandwidth utilization by specific users and applications as those changes occur, so capacity can be added as needed. However, Scrutinizer is primarily used by Bond and his crew of four to spot and address immediate blocks.
"It is in relatively constant use," he said, "and the payback is enormous."
Lesson Learned
The lesson to be learned from this tale is that packet and flow monitoring make it much easier to find and remove the sources of network problems. RMON, however, is not as widely supported currently as NetFlow.
In addition, NetFlow only needs to be activated on the network equipment; it does not require installing software probes or packet capture appliances. Probably the wisest choice, therefore, is to activate NetFlow and set up a server to capture and analyze the data.