Carleton University
Technical Report TR-07-06
February 26, 2007
Towards Understanding Network Traffic Through Whole Packet Analysis
Abdulrahman Hijazi, Hajime Inoue, Ashraf Matrawy, P.C. van Oorschot, Anil Somayaji
Abstract
We present ADHIC, an algorithm that hierarchically clusters network traffic without making assumptions about the structure of packets. Packets are judged similar using patterns of n-byte strings at fixed offsets p, or (p,n)-grams. By sampling packets to find high-frequency (p,n)-grams and then applying divisive hierarchical clustering (an unsupervised machine learning method), ADHIC can separate traffic along typical divisions such as IP vs. non-IP traffic, TCP vs. UDP, and standard applications such as web and SSH traffic, all without using domain-specific knowledge. It can also correctly cluster data transmitted on non-standard ports, and can even appropriately segregate the traffic of applications that do not use standard ports (such as peer-to-peer programs). NetADHICT, our implementation of ADHIC, is available for download and is licensed under the GNU GPL.
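To make the notion of a (p,n)-gram and of divisive clustering concrete, the following Python fragment is a minimal sketch and not the NetADHICT implementation. It rests on assumptions that the abstract does not state: every packet is counted rather than a sample, the splitting (p,n)-gram is the one whose frequency is closest to half of the packets, and the names `pn_grams`, `best_split_gram`, `has_gram`, and `cluster` together with the parameters `min_size` and `max_depth` are hypothetical.

```python
from collections import Counter


def pn_grams(packet: bytes, n: int):
    """Yield each (p, n)-gram of a packet as (offset p, n-byte string)."""
    for p in range(len(packet) - n + 1):
        yield p, packet[p:p + n]


def best_split_gram(packets, n):
    """Return the (p, n)-gram whose packet frequency is closest to 50%.

    The 50% criterion is an assumption made for this sketch.
    """
    counts = Counter()
    for pkt in packets:
        # Each offset occurs once per packet, so this counts packets.
        counts.update(pn_grams(pkt, n))
    total = len(packets)
    # A gram found in every packet cannot separate anything.
    candidates = {g: c for g, c in counts.items() if c < total}
    if not candidates:
        return None
    # Ties are broken by the gram itself so the result is deterministic.
    return min(candidates, key=lambda g: (abs(candidates[g] - total / 2), g))


def has_gram(packet: bytes, gram) -> bool:
    """Test whether the packet carries the given n-byte string at offset p."""
    p, s = gram
    return packet[p:p + len(s)] == s


def cluster(packets, n=2, min_size=3, max_depth=4):
    """Divisively split packets into a binary tree of clusters.

    Leaves are lists of packets; inner nodes are dicts holding the
    splitting gram and the two subtrees.
    """
    if len(packets) < min_size or max_depth == 0:
        return packets
    gram = best_split_gram(packets, n)
    if gram is None:
        return packets
    match = [pkt for pkt in packets if has_gram(pkt, gram)]
    rest = [pkt for pkt in packets if not has_gram(pkt, gram)]
    return {
        "gram": gram,
        "match": cluster(match, n, min_size, max_depth - 1),
        "rest": cluster(rest, n, min_size, max_depth - 1),
    }
```

Given four toy packets, two beginning with the bytes 0x45 0x00 and two with 0x60 0x00, `cluster` would split on a (0,2)-gram and place each pair in its own leaf. No field names or protocol knowledge enter the procedure; only byte strings at fixed offsets are compared.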