Network Flow Analysis

Figure: degree vs. strength for different classes of network traffic
Network flow records are high-level descriptions of Internet connections: they record the endpoints and the volume of data involved, but not the data actually transferred. Because collecting and analyzing flow data is much more tractable on high-speed networks than deep packet inspection, flow data has become popular for a variety of network-management applications, especially anomaly and intrusion detection.
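To make the idea concrete, here is a minimal Python sketch of a flow record. The field names are assumptions, loosely modeled on NetFlow-style exports rather than taken from any particular collector. Note that nothing in the record holds packet payload, only endpoints and volume. The helper `bytes_per_source` is hypothetical and shows the kind of per-host tally that threshold-based detectors typically start from.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowRecord:
    """One unidirectional flow: endpoints and volume, but no payload bytes.

    Field names are illustrative assumptions, not a specific export format.
    """
    src_addr: str
    dst_addr: str
    src_port: int
    dst_port: int
    protocol: int  # IP protocol number, e.g. 6 for TCP
    packets: int
    octets: int    # total bytes carried by the flow


def bytes_per_source(flows):
    """Hypothetical helper: tally total octets sent by each source address."""
    totals = Counter()
    for f in flows:
        totals[f.src_addr] += f.octets
    return totals


flows = [
    FlowRecord("10.0.0.1", "192.0.2.7", 51000, 80, 6, 12, 9_000),
    FlowRecord("10.0.0.1", "192.0.2.8", 51001, 443, 6, 40, 52_000),
    FlowRecord("10.0.0.2", "192.0.2.7", 40000, 80, 6, 3, 1_200),
]
print(dict(bytes_per_source(flows)))  # {'10.0.0.1': 61000, '10.0.0.2': 1200}
```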

Most flow-based security products and research efforts use a relational model of flow data, in which each flow is simply another tuple. Our group’s research takes a different approach: we instead regard each network flow as contributing weight to a directed edge in a graph. For various forms of analysis, the nodes in these graphs may represent clients, servers, entire autonomous systems, or even individual TCP ports. We examine the structural properties and dynamics of graphs derived from flow data in an attempt to develop more robust forms of anomaly detection and improved models of Internet traffic. We believe this structural approach to flow analysis reveals patterns in Internet traffic that are difficult to discern through more traditional analysis.
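The graph view can be sketched in a few lines of Python. This is an illustrative sketch, not the group's code: the `(src_addr, src_port, dst_addr, dst_port, octets)` tuple layout and the helper names are hypothetical. Each flow adds its byte count to the weight of a directed edge, and the `node_of` function decides what a node is (a host by default, or a port, as in the second example). A node's out-degree counts its distinct neighbors, and its out-strength sums the weights on its outgoing edges.

```python
from collections import defaultdict


def host_node(addr, port):
    """Default node mapping: one node per host address (port ignored)."""
    return addr


def build_traffic_graph(flows, node_of=host_node):
    """Aggregate flow tuples into a weighted directed graph.

    flows: iterable of (src_addr, src_port, dst_addr, dst_port, octets).
    node_of: maps an (address, port) endpoint to a graph node.
    Returns a dict of dicts: graph[u][v] = total octets sent from u to v.
    """
    graph = defaultdict(lambda: defaultdict(int))
    for src, sport, dst, dport, octets in flows:
        u, v = node_of(src, sport), node_of(dst, dport)
        graph[u][v] += octets  # each flow adds its volume to edge u -> v
    return graph


def out_degree(graph, node):
    """Number of distinct nodes that `node` sent traffic to."""
    return len(graph.get(node, {}))


def out_strength(graph, node):
    """Total weight (octets) on the outgoing edges of `node`."""
    return sum(graph.get(node, {}).values())


flows = [
    ("10.0.0.1", 51000, "192.0.2.7", 80, 9_000),
    ("10.0.0.1", 51002, "192.0.2.7", 80, 3_000),
    ("10.0.0.1", 51001, "192.0.2.8", 443, 52_000),
]

hosts = build_traffic_graph(flows)
print(out_degree(hosts, "10.0.0.1"), out_strength(hosts, "10.0.0.1"))  # 2 64000

# The same flows viewed as a port-to-port graph.
ports = build_traffic_graph(flows, node_of=lambda addr, port: port)
print(dict(ports[51001]))  # {443: 52000}
```

Two flows between the same pair of hosts collapse into one edge of weight 12000. This is why degree and strength can diverge, and it is what makes comparing them informative.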

Our research has yielded a number of surprising results, chief among them the finding that the distribution of network traffic per host is so broad and so well characterized by a power law that standard threshold-based anomaly detection systems must choose their thresholds arbitrarily, making them inherently susceptible to either over- or under-reporting. We have also found that Web traffic exhibits superlinear scaling between degree and strength: the more servers a Web client contacts, the more data that client tends to exchange with each server. Finally, we have successfully used properties of traffic graphs to construct a taxonomy of network applications based on the behavior of their users, which allows us to classify unknown applications without resorting to packet inspection.
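Both quantitative claims above can be checked with standard estimators. The following is a sketch of one common approach, not necessarily the procedure used in the papers. `power_law_alpha` is the continuous maximum-likelihood estimator of a power-law exponent above a chosen cutoff `x_min`. This estimator is generally preferred over fitting a line to a log-log histogram, which is known to be biased. `scaling_exponent` fits the exponent β in an assumed relation s ∝ k^β between strength s and degree k by least squares in log-log space. Because the traffic per neighbor is s/k ∝ k^(β−1), β > 1 is exactly the superlinear case described above. The function names and the synthetic data are illustrative.

```python
import math


def power_law_alpha(values, x_min):
    """Continuous power-law MLE: alpha = 1 + n / sum(ln(x / x_min)) for x >= x_min."""
    tail = [x for x in values if x >= x_min]
    log_sum = sum(math.log(x / x_min) for x in tail)
    if not tail or log_sum == 0:
        raise ValueError("need at least one value strictly above x_min")
    return 1.0 + len(tail) / log_sum


def scaling_exponent(degrees, strengths):
    """Least-squares slope beta of log(strength) against log(degree).

    Under s ~ k**beta, the mean traffic per neighbor s / k ~ k**(beta - 1),
    so beta > 1 means high-degree nodes exchange more data per neighbor.
    """
    xs = [math.log(k) for k in degrees]
    ys = [math.log(s) for s in strengths]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Synthetic, noise-free data with s = 1000 * k**1.3 (illustrative only).
degrees = [1, 2, 4, 8, 16]
strengths = [1000 * k ** 1.3 for k in degrees]
beta = scaling_exponent(degrees, strengths)
print(f"beta = {beta:.3f}, superlinear: {beta > 1}")  # beta = 1.300, superlinear: True
```

On real traffic the fitted β depends on the application class and on how noisy low-degree nodes are handled. The point of the sketch is only the definition of the exponent being estimated.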

Our more recent efforts in this area address two topics: understanding the statistical biases that packet sampling, performed by the routers that generate flow data, introduces into traffic graphs; and using spectral analysis of the connectivity matrices associated with traffic graphs to identify anomalous hosts.
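One source of sampling bias can be illustrated with a simple model. Assume each packet is sampled independently with probability 1/N (real routers may use deterministic or other sampling schemes). Under that assumption, a flow of p packets appears in the flow data only with probability 1 − (1 − 1/N)^p, so short flows are disproportionately lost. The hypothetical example below compares a host with many single-packet flows, such as a scanner, against one with a few long flows. Both show an expected observed degree of roughly 10, even though their true degrees differ by a factor of 100. The function names, the one-flow-per-peer setup, and the sampling rate are illustrative assumptions, not the group's method.

```python
def detection_probability(packets, n):
    """Chance that a flow of `packets` packets appears in the flow data
    when each packet is sampled independently with probability 1/n."""
    return 1.0 - (1.0 - 1.0 / n) ** packets


def expected_observed_degree(flow_sizes, n):
    """Expected number of a host's flows (one per peer) that survive sampling."""
    return sum(detection_probability(p, n) for p in flow_sizes)


N = 100  # hypothetical 1-in-100 sampling rate

scanner = [1] * 1000     # 1000 peers, one packet each (true degree 1000)
web_client = [500] * 10  # 10 peers, 500 packets each (true degree 10)

print(expected_observed_degree(scanner, N))     # about 10
print(expected_observed_degree(web_client, N))  # just under 10
```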

Results from this project have been presented at WWW2005 (paper), the Workshop on Structure and Function of Complex Networks (slides), the IPAM Workshop on Random and Dynamic Graphs and Networks (slides), the Statphys 23 satellite meeting on Complex Networks: from Biology to Information Technology (paper), and NGDM2007. For an archival publication, see our paper "Properties and Evolution of Internet Traffic Networks from Anonymized Flow Data" in ACM Transactions on Internet Technology (TOIT).

Active Participants

Support

	Mark Meiss is supported by the Advanced Network Management Laboratory, which is one of the Pervasive Technology Labs established at Indiana University with the assistance of the Lilly Endowment.
	This research is also supported in part by the National Science Foundation under awards 0348940 and 0513650.
	Our primary source of anonymized network flow data is the Internet2 (Abilene) network.

Opinions, findings, conclusions, recommendations or points of view of this group are those of the authors and do not necessarily represent the official position of the National Science Foundation, Internet2, or Indiana University.