Transport Layer Identification of P2P Traffic

1 Transport Layer Identification of P2P Traffic
T. Karagiannis, A. Broido, M. Faloutsos, K. Claffy

2 Outline
Introduction
Related work
Payload analysis & limitations
Non-payload identification
Experiments & evaluation
P2P traffic trends
Conclusions

3 Characteristics of P2P Traffic
Traffic volume grows rapidly
Frequent upgrades & emergence of new protocols
Traffic is disguised to circumvent firewalls & legal issues
Non-standard, proprietary protocols (poorly documented)
Operate on arbitrary port numbers
Support payload encryption

4 Identification Methodology
Examining packet payload
  Signature-based methodology
  Limitations
Identifying at the transport layer
  Based on flow patterns & P2P behaviors
  Advantages

5 Contributions
Develop a methodology for P2P traffic profiling by identifying flow patterns and behavior characteristics
Evaluate its effectiveness by comparing it with payload analysis
Show that P2P traffic is growing by analyzing backbone traces

6 Previous Work
Detailed characterization of a small subset of P2P protocols & networks
  Properties of topology, bandwidth, caching & availability, etc.
Signature-based traffic identification
Traffic estimation of P2P applications with fixed ports

7 Payload Analysis

8 Payload Analysis
M1: Flag a flow as P2P if its src or dst port number matches one of the well-known P2P port numbers.
M2: Flag a flow as P2P if the 16-byte payload of any of its packets matches one of the signatures; otherwise flag it as non-P2P. This gives a loose lower bound on P2P volume.
M3: Hash the {src, dst} IP pair of each flow flagged as P2P into a table. Flag any flow containing an IP address in the table as "possible P2P", even if no payload matches.
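The three payload methods can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the port numbers and signature prefixes are example values (the slides list neither), the `Flow` record and function names are hypothetical, and matching a signature as a prefix of the captured 16 bytes is an assumption.

```python
from dataclasses import dataclass

# Example values only; the slides do not reproduce the real port or signature lists.
KNOWN_P2P_PORTS = {6346, 6881}
SIGNATURES = (b"GNUTELLA", b"\x13BitTorrent")


@dataclass(frozen=True)
class Flow:
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    payloads: tuple = ()  # captured payload bytes, one entry per packet


def m1_port_match(flow):
    """M1: either port is in the well-known P2P port set."""
    return flow.src_port in KNOWN_P2P_PORTS or flow.dst_port in KNOWN_P2P_PORTS


def m2_payload_match(flow):
    """M2: the first 16 payload bytes of any packet start with a signature."""
    return any(p[:16].startswith(SIGNATURES) for p in flow.payloads)


def m3_classify(flows):
    """M3: label flows, widening M2 matches to every flow sharing an IP."""
    p2p_ips = set()
    for f in flows:
        if m2_payload_match(f):
            p2p_ips.update((f.src_ip, f.dst_ip))
    labels = []
    for f in flows:
        if m2_payload_match(f):
            labels.append("P2P")
        elif f.src_ip in p2p_ips or f.dst_ip in p2p_ips:
            labels.append("possible P2P")
        else:
            labels.append("non-P2P")
    return labels
```

The sketch makes two passes over the flows so that a flow seen before its peer's first signature match is still labeled "possible P2P"; whether the original tool works in one pass or two is not stated in the slides.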

9 Limitations
Captured payload size
  Only the first 16 bytes of payload
  Only 4 bytes in older traces
HTTP requests
Encryption
Other P2P protocols
Unidirectional traces

10 Non-payload Identification
Two main heuristics:
  {src, dst} IP pairs that use both TCP and UDP to transfer data
  The behavior of peers, studied through the connection characteristics of {IP, port} pairs

11 High-level description
Data processing
  Build the flow table
  Collect information on various flow characteristics
Identification of potential P2P pairs
  Based on the two P2P heuristics
Elimination of false positives
  Using additional heuristics that recognize non-P2P traffic

12 TCP/UDP Heuristic
Concurrent usage of both TCP & UDP is typical for many P2P protocols
Look for {src, dst} IP pairs that use both TCP & UDP to identify P2P hosts
Other protocols also use TCP & UDP concurrently
  DNS, NETBIOS, IRC, gaming, streaming
  These use fixed, well-known ports

13 TCP/UDP Heuristic If a {src, dst} IP pair concurrently uses both TCP and UDP, we consider flows between this pair P2P so long as the src or dst ports are not in the set in Table 3
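A rough sketch of the TCP/UDP heuristic is shown here. It assumes flows are available as `(src_ip, dst_ip, proto, src_port, dst_port)` tuples, treats an IP pair as unordered, and ignores timing, so "concurrently" is approximated as "within the same batch of flows". The excluded-port set is a stand-in: Table 3 is not reproduced in the slides, so only DNS port 53 is used as an example.

```python
from collections import defaultdict

# Stand-in for the ports listed in Table 3, which the slides do not reproduce.
EXCLUDED_PORTS = {53}


def tcp_udp_p2p_flows(flows, excluded_ports=EXCLUDED_PORTS):
    """Return flows between IP pairs that use both TCP and UDP.

    Each flow is a (src_ip, dst_ip, proto, src_port, dst_port) tuple.
    Flows whose src or dst port is in excluded_ports are not flagged.
    """
    protocols = defaultdict(set)
    for src_ip, dst_ip, proto, _sport, _dport in flows:
        # Unordered pair: a->b over TCP and b->a over UDP count together.
        protocols[frozenset((src_ip, dst_ip))].add(proto)

    flagged = []
    for flow in flows:
        src_ip, dst_ip, _proto, sport, dport = flow
        uses_both = {"TCP", "UDP"} <= protocols[frozenset((src_ip, dst_ip))]
        if uses_both and sport not in excluded_ports and dport not in excluded_ports:
            flagged.append(flow)
    return flagged
```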

14 Connection Pattern Heuristic
P2P: for an {IP, port} pair, N(distinct connected ports) = N(distinct connected IPs)
Web: for a {w, 80} pair, N(distinct connected ports) ≥ N(distinct connected IPs), because a host initiates more than one concurrent connection for parallel downloading
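The counting behind this heuristic might look like the sketch below. The flow tuple layout, the function names, and the `min_peers` guard (added so that a pair with a single connection is not trivially flagged) are assumptions made for illustration; the slide only states the equality test.

```python
from collections import defaultdict


def connection_pattern(flows):
    """Map each {IP, port} pair to (distinct peer IPs, distinct peer ports).

    Each flow is a (src_ip, src_port, dst_ip, dst_port) tuple; both
    endpoints of a flow are recorded.
    """
    peer_ips = defaultdict(set)
    peer_ports = defaultdict(set)
    for src_ip, src_port, dst_ip, dst_port in flows:
        peer_ips[(src_ip, src_port)].add(dst_ip)
        peer_ports[(src_ip, src_port)].add(dst_port)
        peer_ips[(dst_ip, dst_port)].add(src_ip)
        peer_ports[(dst_ip, dst_port)].add(src_port)
    return {pair: (len(peer_ips[pair]), len(peer_ports[pair])) for pair in peer_ips}


def looks_p2p(n_ips, n_ports, min_peers=2):
    """Equal counts suggest P2P; min_peers is a hypothetical guard value."""
    return n_ips >= min_peers and n_ips == n_ports
```

For example, three peers each opening one connection to {p, 6000} give counts (3, 3), while two web clients opening four parallel connections to {w, 80} give (2, 4), so only the first pair passes the test.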

15 False Positives
Some heuristics for decreasing false positives:
  Mail server
  DNS
  Gaming
  Malware
  Others

16 Mail Server
Mail server behavior resembles the pattern flagged by the {IP, port} heuristic
Examine the flows with port number 25, 110, or 113

17 DNS
DNS concurrently uses TCP & UDP at port 53
For flows whose src and dst ports are equal and less than 501, both the src and dst {IP, port} pairs are considered non-P2P
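The DNS rule is simple enough to state directly in code. The sketch below assumes flows arrive as `(src_ip, src_port, dst_ip, dst_port)` tuples; the function name and the `port_limit` parameter name are hypothetical, while the value 501 comes from the slide.

```python
def dns_like_pairs(flows, port_limit=501):
    """Collect {IP, port} pairs to treat as non-P2P.

    A flow qualifies when its source and destination ports are equal
    and below port_limit; both endpoints are then excluded.
    """
    non_p2p = set()
    for src_ip, src_port, dst_ip, dst_port in flows:
        if src_port == dst_port and src_port < port_limit:
            non_p2p.add((src_ip, src_port))
            non_p2p.add((dst_ip, dst_port))
    return non_p2p
```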

18 Gaming & Malware
Many flows to different IPs/ports, all carrying same-sized packets

19 Gaming & Malware

20 Other Heuristics
Scans
  Count the number of {IP, port} pairs with a specific IP to eliminate port scans
One-packet pairs
  Remove one-packet flows

21 Other Heuristics
MSN Messenger server
  Port 1863
  3 distinct dst IPs within the same prefix
Port history
  Examine the set of ports connected to an {IP, port} pair
  Reject the pair if all ports reflect well-known services
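The port-history check can be sketched as a subset test. The set of "well-known service" ports below is only an example assembled from ports that appear elsewhere in these slides (25, 110, 113 for mail, 53 for DNS, 80 for web); the actual list used by the authors is not given, and the function name is hypothetical.

```python
# Example set built from ports named elsewhere in the slides; not the authors' list.
WELL_KNOWN_SERVICE_PORTS = frozenset({25, 53, 80, 110, 113})


def reject_by_port_history(connected_ports, known=WELL_KNOWN_SERVICE_PORTS):
    """Reject an {IP, port} pair if every connected port is a well-known service.

    An empty history gives no evidence either way, so it is not rejected.
    """
    ports = set(connected_ports)
    return bool(ports) and ports <= known
```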

22 Final Algorithm
P2PIP: IPs classified as P2P by the TCP/UDP heuristic
P2PPairs: {IP, port} pairs classified as P2P by the {IP, port} pair heuristic
Rejected: rejected pairs
MailServers: rejected IPs
IPPort: {IP, port} pairs not in MailServers or Rejected
IPSet: distinct IPs with a specific pair
PortSet: distinct ports for a specific pair
Avg_pktssizesSet: distinct average packet sizes
Transfer_sizesSet: distinct transferred flow sizes

23 Final Algorithm

24 Fraction of Identified P2P Traffic

25 False Positives

26 Robustness

27 Pros & Cons
Pros
  Avoids privacy issues
    Works with anonymized IP addresses
  Lower storage overhead
  Lower processing overhead
  Ability to detect unknown protocols
  Overcomes encryption
Cons
  Cannot analyze a specific protocol

28 P2P Traffic Trends

29 Conclusions
Non-payload identification methodology
  Able to identify unknown protocols
  Misses 5% of flows compared with payload analysis
  8% to 12% false positives
Challenges the claims that P2P traffic is declining

30 Thanks !

