Published by Gerda Zimmermann. Modified over 5 years ago.
1
Transport Layer Identification of P2P Traffic
T. Karagiannis, A. Broido, M. Faloutsos, K. Claffy
2
Outline
- Introduction
- Related work
- Payload analysis & Limitations
- Non-payload identification
- Experiments & Evaluation
- P2P traffic trends
- Conclusions
3
Characteristics of P2P Traffic
- Traffic volume grows rapidly
- Frequent upgrades & emergence of new protocols
- Traffic is disguised to circumvent firewalls & legal issues
- Non-standard, proprietary protocols (poorly documented)
- Operate on arbitrary port numbers
- Support payload encryption
4
Identification Methodology
- Examining packet payload
  - Signature-based methodology
  - Limitations
- Identifying at transport layer
  - Based on flow patterns & P2P behaviors
  - Advantages
5
Contributions
- Develop a methodology for P2P traffic profiling by identifying flow patterns and behavior characteristics
- Evaluate its effectiveness by comparing it with payload analysis
- Show that P2P traffic is growing by analyzing backbone traces
6
Previous Work
- Detailed characterization of a small subset of P2P protocols & networks
  - Properties of topology, bandwidth, caching & availability, etc.
- Signature-based traffic identification
- Traffic estimation of P2P applications with fixed ports
7
Payload Analysis
8
Payload Analysis
- M1: Flag a flow as P2P if its src or dst port matches one of the well-known P2P port numbers.
- M2: Flag a flow as P2P if the 16-byte payload of any packet matches one of the signatures; otherwise flag it as non-P2P.
  - A loose lower bound on P2P volume
- M3: Hash the {src, dst} IP pair of every flow flagged as P2P into a table. Flag flows containing an IP address in the table as “possible P2P” even if no payload matches.
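The three flagging methods can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the `Flow` record, the port set, and the two payload signatures are assumptions chosen for the example (the slides do not list the actual ports or signatures), and M3 is done here in two passes for simplicity.

```python
from dataclasses import dataclass

# Assumed example values; the slides do not list the real ports or signatures.
P2P_PORTS = {6346, 6881}                        # default Gnutella / BitTorrent ports
SIGNATURES = (b"GNUTELLA", b"\x13BitTorrent")   # example payload prefixes


@dataclass(frozen=True)
class Flow:
    """Hypothetical flow record used only for this sketch."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    payloads: tuple  # captured payload bytes, one entry per packet


def m1(flow):
    """M1: port-based flagging."""
    return flow.src_port in P2P_PORTS or flow.dst_port in P2P_PORTS


def m2(flow):
    """M2: some packet's first 16 payload bytes start with a known signature."""
    return any(p[:16].startswith(sig) for p in flow.payloads for sig in SIGNATURES)


def m3(flows):
    """M3: label flows, propagating P2P status through a table of IPs."""
    p2p_ips = set()
    for f in flows:                 # first pass: fill the table from M2 hits
        if m2(f):
            p2p_ips.update((f.src_ip, f.dst_ip))
    labels = []
    for f in flows:                 # second pass: label every flow
        if m2(f):
            labels.append("P2P")
        elif f.src_ip in p2p_ips or f.dst_ip in p2p_ips:
            labels.append("possible P2P")
        else:
            labels.append("non-P2P")
    return labels
```

Because M2 only sees the first 16 bytes, any signature longer than that, or hidden by encryption, goes undetected, which is why M2 yields only a lower bound.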
9
Limitations
- Captured payload size
  - Only first 16 bytes of payload
  - Only 4 bytes in older traces
- HTTP requests
- Encryption
- Other P2P protocols
- Unidirectional traces
10
Non-payload Identification
Two main heuristics:
- {src, dst} IP pairs that use both TCP and UDP to transfer data
- The behavior of peers, studied through the connection characteristics of {IP, port} pairs
11
High-level description
- Data processing
  - Build the flow table
  - Collect information on various characteristics
- Identification of potential P2P pairs
  - Based on the two P2P heuristics
- Eliminate false positives
  - By other heuristics of non-P2P traffic
12
TCP/UDP Heuristic
- Concurrent usage of both TCP & UDP is typical for many P2P protocols
- Look for {src, dst} IP pairs that use both TCP & UDP to identify P2P hosts
- Other protocols also use TCP & UDP concurrently
  - DNS, NetBIOS, IRC, gaming, streaming
  - These use fixed well-known ports
13
TCP/UDP Heuristic
If a {src, dst} IP pair concurrently uses both TCP and UDP, we consider flows between this pair P2P so long as neither the src nor the dst port is in the set in Table 3.
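A rough sketch of the TCP/UDP heuristic follows. Several things here are assumptions: the flow tuple layout, the contents of `EXCLUDED_PORTS` (a stand-in for Table 3, which is not reproduced in this transcript), and the choice to drop a whole IP pair as soon as any of its flows touches an excluded port. The sketch also ignores timing, so "concurrently" is approximated by "within the same trace".

```python
from collections import defaultdict

# Stand-in for the exclusion set of Table 3; these ports are assumed examples
# (DNS, NetBIOS, IRC), not the table's actual contents.
EXCLUDED_PORTS = {53, 137, 138, 139, 6667}


def tcp_udp_pairs(flows, excluded_ports=EXCLUDED_PORTS):
    """Return the IP pairs that used both TCP and UDP on non-excluded ports.

    flows: iterable of (src_ip, dst_ip, src_port, dst_port, proto) tuples,
           where proto is "tcp" or "udp" (hypothetical layout).
    """
    protos = defaultdict(set)   # IP pair -> protocols seen
    excluded = set()            # IP pairs seen on an excluded port
    for src_ip, dst_ip, src_port, dst_port, proto in flows:
        pair = tuple(sorted((src_ip, dst_ip)))   # direction-independent key
        protos[pair].add(proto)
        if src_port in excluded_ports or dst_port in excluded_ports:
            excluded.add(pair)
    return {
        pair
        for pair, seen in protos.items()
        if {"tcp", "udp"} <= seen and pair not in excluded
    }
```

A stricter variant could require the TCP and UDP flows to overlap within a time window; the slides do not say how concurrency is measured.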
14
Connection Pattern Heuristic
- P2P: for an {IP, port} pair, N(distinct connected ports) = N(distinct connected IPs)
- Web: for a {w, 80} pair, N(distinct connected ports) ≥ N(distinct connected IPs), because a host usually opens more than one concurrent connection for parallel downloading
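The connection pattern comparison can be illustrated as below. The function name, the flow tuple layout, and the `min_ips` and `tolerance` parameters are assumptions added for the sketch; the slide itself only states the equality for P2P and the inequality for Web.

```python
from collections import defaultdict


def classify_pairs(flows, min_ips=2, tolerance=0):
    """Label each {IP, port} pair by comparing distinct peer IPs and ports.

    flows: iterable of (src_ip, src_port, dst_ip, dst_port) tuples.
    min_ips and tolerance are hypothetical knobs: pairs with fewer than
    min_ips peers are skipped, and |ports - IPs| <= tolerance counts as equal.
    """
    peer_ips = defaultdict(set)
    peer_ports = defaultdict(set)
    for src_ip, src_port, dst_ip, dst_port in flows:
        # Record the connection from the point of view of both endpoints.
        peer_ips[(src_ip, src_port)].add(dst_ip)
        peer_ports[(src_ip, src_port)].add(dst_port)
        peer_ips[(dst_ip, dst_port)].add(src_ip)
        peer_ports[(dst_ip, dst_port)].add(src_port)

    labels = {}
    for pair, ips in peer_ips.items():
        if len(ips) < min_ips:
            continue  # too few peers to say anything
        diff = len(peer_ports[pair]) - len(ips)
        if abs(diff) <= tolerance:
            labels[pair] = "p2p-like"   # one port per peer IP
        elif diff > 0:
            labels[pair] = "web-like"   # several ports per peer IP
        else:
            labels[pair] = "other"
    return labels
```

For example, a peer listening on port 6881 and contacted once by each of three hosts sees three IPs and three ports, while a web server contacted by two browsers with two parallel connections each sees two IPs and four ports.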
15
False positives
Some heuristics for decreasing false positives:
- Mail server
- DNS
- Gaming
- Malware
- Others
16
Mail Server
- Behavior resembles the {IP, port} heuristic
- Examine the flows with port numbers 25, 110, 113
17
DNS
- Concurrently uses TCP & UDP at port 53
- For flows whose src port equals the dst port and is below 501, both the src & dst {IP, port} pairs are considered non-P2P
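The port rule on this slide is simple enough to write down directly. In this sketch the function name and the flow tuple layout are assumptions; the threshold of 501 comes from the slide.

```python
def dns_like_pairs(flows, port_limit=501):
    """Collect {IP, port} pairs to treat as non-P2P under the DNS rule.

    flows: iterable of (src_ip, src_port, dst_ip, dst_port) tuples.
    A flow qualifies when its source and destination ports are equal
    and below port_limit.
    """
    non_p2p = set()
    for src_ip, src_port, dst_ip, dst_port in flows:
        if src_port == dst_port and src_port < port_limit:
            non_p2p.add((src_ip, src_port))
            non_p2p.add((dst_ip, dst_port))
    return non_p2p
```

Note that a flow from an ephemeral port to port 53 is not caught by this rule; it only covers the case where both ends use the same low port.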
18
Gaming & Malware
- Many flows to different IPs/ports, all carrying same-sized packets
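One way to express the same-sized-packet observation is to count distinct average packet sizes per {IP, port} pair. The sketch below is an assumption about how such a check might look: the flow tuple layout and both thresholds (`min_flows`, `max_distinct_sizes`) are hypothetical and not taken from the slides.

```python
from collections import defaultdict


def same_size_pairs(flows, min_flows=5, max_distinct_sizes=2):
    """Return {IP, port} pairs whose many flows share very few packet sizes.

    flows: iterable of (ip, port, n_packets, n_bytes) tuples, one per flow.
    Both thresholds are hypothetical values chosen for illustration.
    """
    flow_count = defaultdict(int)
    avg_sizes = defaultdict(set)
    for ip, port, n_packets, n_bytes in flows:
        if n_packets == 0:
            continue  # nothing to average
        flow_count[(ip, port)] += 1
        avg_sizes[(ip, port)].add(n_bytes // n_packets)
    return {
        pair
        for pair, count in flow_count.items()
        if count >= min_flows and len(avg_sizes[pair]) <= max_distinct_sizes
    }
```

Pairs returned by such a check would be candidates for removal from the P2P set, since file-sharing flows normally vary widely in size.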
19
Gaming & Malware
20
Other Heuristics
- Scans
  - Count the number of {IP, port} pairs with a specific IP to eliminate port scans
- One packet pairs
  - Remove one-packet flows
21
Other Heuristics
- MSN Messenger server
  - Port 1863
  - 3 distinct dst IPs within the same prefix
- Port history
  - Examine the set of ports connected to an {IP, port} pair
  - Reject if all ports reflect well-known services
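The port history check amounts to a subset test. In the sketch below, the `WELL_KNOWN_PORTS` set is an assumed example (the slides do not enumerate which services count as well known), and the function name is hypothetical.

```python
# Assumed example set of well-known service ports (SMTP, DNS, HTTP, POP3, HTTPS).
WELL_KNOWN_PORTS = {25, 53, 80, 110, 443}


def reject_by_port_history(peer_ports, well_known=WELL_KNOWN_PORTS):
    """Return True if every port connected to an {IP, port} pair is well known.

    peer_ports: the set of ports seen on the other side of the pair's flows.
    An empty history gives no evidence, so it is not rejected.
    """
    peer_ports = set(peer_ports)
    return bool(peer_ports) and peer_ports <= well_known
```

A pair whose history is {80, 443} would be rejected as non-P2P, while a single unfamiliar port such as 6881 in the history keeps the pair as a P2P candidate.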
22
Final Algorithm
- P2PIP: IPs classified as P2P by the TCP/UDP heuristic
- P2PPairs: {IP, port} pairs classified as P2P by the {IP, port} pair heuristic
- Rejected: rejected pairs
- MailServers: rejected IPs
- IPPort: {IP, port} pairs not in MailServers or Rejected
- IPSet: distinct IPs for a specific pair
- PortSet: distinct ports for a specific pair
- Avg_pktssizesSet: distinct average packet sizes
- Transfer_sizesSet: distinct transferred flow sizes
23
Final Algorithm
24
Fraction of Identified P2P Traffic
25
False Positives
26
Robustness
27
Pros & Cons
- Pros
  - Avoids the privacy issues of payload inspection
  - Works with anonymized IP addresses
  - Less storage overhead
  - Less processing overhead
  - Ability to detect unknown protocols
  - Overcomes encryption
- Cons
  - Inability to analyze a specific protocol
28
P2P Traffic Trends
29
Conclusions
- Non-payload identification methodology
  - Ability to identify unknown protocols
  - Misses 5% of flows compared with payload analysis
  - 8%-12% false positives
- Challenges the claims of P2P traffic’s decline
30
Thanks !