Carleton University
Technical Report TR-07-06
February 26, 2007

Towards Understanding Network Traffic Through Whole Packet Analysis

Abdulrahman Hijazi, Hajime Inoue, Ashraf Matrawy, P.C. van Oorschot, Anil Somayaji

Abstract

We present ADHIC, an algorithm that hierarchically clusters network traffic without making assumptions about the structure of packets. Packets are judged similar by the (p,n)-grams they share, where a (p,n)-gram is an n-byte string at a fixed offset p. By sampling packets to find high-frequency (p,n)-grams and then applying divisive hierarchical clustering (an unsupervised machine learning method), ADHIC can separate traffic along typical divisions such as IP vs. non-IP traffic, TCP vs. UDP, and standard applications such as web and SSH traffic, all without using domain-specific knowledge. It can also correctly cluster data transmitted on non-standard ports, and can even appropriately segregate the traffic from applications that do not use standard ports (such as peer-to-peer programs). NetADHICT, our implementation of ADHIC, is available for download and is licensed under the GNU GPL.
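
To make the (p,n)-gram idea concrete, the following is a minimal Python sketch, not the NetADHICT implementation. It extracts (p,n)-grams from raw packet bytes, counts how many sampled packets contain each one, and performs a single divisive split on a chosen gram. The function names (`pn_grams`, `frequent_pn_grams`, `split_on`), the `min_fraction` threshold, and the toy packets are all assumptions made for illustration; the abstract does not specify how ADHIC selects or ranks (p,n)-grams.

```python
from collections import Counter


def pn_grams(packet: bytes, n: int):
    """Yield every (p, n)-gram of a packet as (offset p, n-byte string)."""
    for p in range(len(packet) - n + 1):
        yield p, packet[p:p + n]


def frequent_pn_grams(packets, n, min_fraction):
    """Return the (p, n)-grams present in at least min_fraction of packets.

    Hypothetical selection rule: a simple frequency threshold over the sample.
    """
    counts = Counter()
    for pkt in packets:
        # Each offset occurs once per packet, so a gram's count equals
        # the number of packets that contain it.
        counts.update(pn_grams(pkt, n))
    threshold = min_fraction * len(packets)
    return {gram: c for gram, c in counts.items() if c >= threshold}


def split_on(packets, gram):
    """One divisive step: packets that match the gram vs. those that do not."""
    p, s = gram
    match = [pkt for pkt in packets if pkt[p:p + len(s)] == s]
    rest = [pkt for pkt in packets if pkt[p:p + len(s)] != s]
    return match, rest


# Toy byte strings standing in for sampled packets (illustrative only).
sample = [b"\x45\x00AAAA", b"\x45\x00BBBB", b"\x60\x00CCCC", b"\x45\x00DD"]
frequent = frequent_pn_grams(sample, n=2, min_fraction=0.5)
matching, others = split_on(sample, (0, b"\x45\x00"))
```

Applying `split_on` recursively to each resulting group would produce a tree of clusters, which is the general shape of divisive hierarchical clustering; the criteria ADHIC uses to choose each split are described in the full report rather than here.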
