Published by Scot Parrish. Modified over 9 years ago.
1
FiG: Automatic Fingerprint Generation. Shobha Venkataraman. Joint work with Juan Caballero, Pongsin Poosankam, Min Gyung Kang, Dawn Song & Avrim Blum. Carnegie Mellon University.
2
2 Fingerprinting. Used to identify: versions of software on hosts; operating systems of hosts; hosts running versions with vulnerabilities. [Figure: a network administrator and hosts running Linux, Solaris, Windows XP SP2, and Windows XP SP1.]
3
3 The Fingerprinting Process. Fingerprint: set of queries sent to host + classification function analyzing queries & responses. Well-known fingerprinting tools: nmap, fpdns. [Figure: the fingerprinting tool sends queries to a host, receives responses, and outputs the OS, e.g. Linux.]
4
4 Finding Fingerprints. How do fingerprinting tools get fingerprints? Existing approach: manual identification, which is incomplete, time-consuming, and difficult to keep up to date. Open questions for a fingerprinting tool: what queries? What classification function? Need automatic, accurate fingerprint generation!
5
5 Our Contribution: FiG. Demonstrate that automatic fingerprint generation is possible. In particular: use machine learning to automatically generate fingerprints; automatically generate accurate fingerprints for distinguishing OS and for distinguishing implementations of DNS servers; find new fingerprints.
6
6 Outline: Fingerprint Generation Problem; Overview of Approach; Automatic Fingerprint Generation; Experimental Results; Conclusion.
7
7 Fingerprint Generation Problem. Goal: find fingerprints, i.e. useful queries and a classification function that distinguishes implementations. [Figure: a fingerprint generator works from Linux, Windows XP, and Solaris hosts and produces fingerprints for the fingerprinting tool.]
8
8 Outline: Fingerprint Generation Problem; Overview of Approach; Automatic Fingerprint Generation; Experimental Results; Conclusion.
9
9 FiG: Overview of Approach. Query exploration: generate candidate queries. Learning: automatically find fingerprints. [Figure: FiG pipeline from query exploration, through candidate queries and learning, to fingerprints for the fingerprinting tool.]
10
10 FiG: Overview of Approach. [Figure: FiG pipeline from query exploration, through candidate queries and learning, to fingerprints for the fingerprinting tool.]
11
11 Query Exploration. Goal: generate candidate queries (query: specially crafted packet sent to host). Infeasible to generate all possible queries: all queries = all possible byte combinations of packet header, e.g., 40 bytes of TCP & IP header => 2^320 queries! Instead, use protocol semantics to design queries.
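The 2^320 figure follows directly from the header size. A minimal sketch of the arithmetic, assuming a 40-byte TCP/IP header with no options; the probing rate below is a hypothetical number chosen only to give a sense of scale:

```python
# 40 bytes of TCP & IP header, 8 bits per byte.
HEADER_BYTES = 40
header_bits = HEADER_BYTES * 8          # 320 bits
total_queries = 2 ** header_bits        # every possible header bit pattern

# Hypothetical probing rate (an assumption, not from the slides).
QUERIES_PER_SECOND = 10 ** 9
SECONDS_PER_YEAR = 365 * 24 * 3600
years_needed = total_queries // (QUERIES_PER_SECOND * SECONDS_PER_YEAR)

print(f"header bits: {header_bits}")
print(f"decimal digits in 2^{header_bits}: {len(str(total_queries))}")
print(f"decimal digits in years needed: {len(str(years_needed))}")
```

Even at a billion queries per second the enumeration is far out of reach, which is why the query set has to be designed from protocol semantics.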
12
12 Query Exploration. Queries: packets with unusual values in fields of header. Explore unusual values for fields independently. Explore fields with rich semantics exhaustively, i.e., all possible values (e.g., TCP flags). Explore other fields selectively, i.e., some valid and some invalid values (e.g., TCP checksum, TCP source port).
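One way to picture this exploration strategy is as a generator that varies one header field at a time around a baseline packet. This is only an illustrative sketch: the field names, the baseline values, the sampled ports, and the choice to treat the TCP flags as a single 8-bit value are all assumptions, not details from the slides.

```python
from typing import Dict, Iterable, List

# Hypothetical baseline query; field names and values are illustrative only.
BASE_QUERY: Dict[str, int] = {
    "tcp_flags": 0x02,        # SYN
    "tcp_sport": 40000,
    "tcp_checksum_ok": 1,     # 1 = correct checksum, 0 = deliberately wrong
}

# Field with rich semantics: explore every value (assumed 8 flag bits).
EXHAUSTIVE: Dict[str, Iterable[int]] = {"tcp_flags": range(256)}

# Other fields: explore a handful of valid and invalid values (assumed samples).
SELECTIVE: Dict[str, Iterable[int]] = {
    "tcp_sport": [0, 1, 1023, 65535],
    "tcp_checksum_ok": [0],
}


def candidate_queries() -> List[Dict[str, int]]:
    """Vary one field at a time; all other fields keep their baseline value."""
    queries = []
    for plan in (EXHAUSTIVE, SELECTIVE):
        for field, values in plan.items():
            for value in values:
                query = dict(BASE_QUERY)
                query[field] = value
                queries.append(query)
    return queries


queries = candidate_queries()
print(len(queries))
```

Varying fields independently keeps the candidate set linear in the number of explored values instead of exponential in the number of fields.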
13
13 FiG: Overview of Approach. Learning consists of data collection, a training phase (learn potential fingerprints), and a testing phase (test accuracy of fingerprints). [Figure: query exploration feeds candidate queries into learning, which outputs fingerprints for the fingerprinting tool.]
14
14 Data Collection. 1. Send candidate queries to hosts. 2. Collect responses from hosts. 3. Split into training & testing data. [Figure: candidate queries and responses pass through data collection and are split into training data and testing data.]
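Step 3 can be sketched as a per-implementation split of hosts. The slides do not say how the split is made, so splitting by host, the 50/50 ratio, and the fixed seed are all assumptions for illustration:

```python
import random
from typing import Dict, List, Tuple

HostsByImpl = Dict[str, List[str]]


def split_hosts(hosts_by_impl: HostsByImpl,
                train_fraction: float = 0.5,
                seed: int = 0) -> Tuple[HostsByImpl, HostsByImpl]:
    """Split each implementation's hosts into training and testing sets.

    Splitting per implementation keeps every implementation represented
    in the training data (an assumed design choice).
    """
    rng = random.Random(seed)
    train: HostsByImpl = {}
    test: HostsByImpl = {}
    for impl, hosts in hosts_by_impl.items():
        shuffled = list(hosts)
        rng.shuffle(shuffled)
        cut = max(1, int(len(shuffled) * train_fraction))
        train[impl] = shuffled[:cut]
        test[impl] = shuffled[cut:]
    return train, test


# Hypothetical host identifiers.
hosts = {
    "Windows": [f"win-{i}" for i in range(6)],
    "Linux": [f"lin-{i}" for i in range(4)],
}
train, test = split_hosts(hosts)
print({impl: len(h) for impl, h in train.items()})
```

Responses collected from the training hosts would then go to the training phase, and responses from the held-out hosts to the testing phase.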
15
15 Training Phase. Goal: learn potential fingerprints from data. Intuition: different implementations differ in bytes of responses. Learn which bytes of responses distinguish between implementations!
16
16 What we’re learning. 1. Extract features. 2. Combine features to distinguish implementations. Outline: features; classification functions; combining into fingerprints. [Figure: training data collected from Windows, Solaris, and Linux hosts enters the training phase.]
17
17 Features. Analyze only bytes of response. Use both value & position of individual bytes in response. Capture this idea with position-substrings. [Figure: response byte sequence abcdefghijk at positions 0-10; some example position-substrings: abcd at positions 0-3, efg at 4-6, hij at 7-9, k at 10.]
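A position-substring can be modelled as a triple (start, end, bytes). The following is a minimal sketch of feature extraction under that assumed representation; the function name and the optional length cap are illustrative, not part of the slides:

```python
from typing import Optional, Set, Tuple

# (start position, end position inclusive, byte values at those positions)
PositionSubstring = Tuple[int, int, bytes]


def position_substrings(response: bytes,
                        max_len: Optional[int] = None) -> Set[PositionSubstring]:
    """Enumerate every contiguous byte run of the response with its position."""
    n = len(response)
    limit = n if max_len is None else max_len
    features: Set[PositionSubstring] = set()
    for start in range(n):
        for end in range(start, min(n, start + limit)):
            features.add((start, end, response[start:end + 1]))
    return features


feats = position_substrings(b"abcdefghijk")
print((0, 3, b"abcd") in feats, (4, 6, b"efg") in feats)
```

Because the position is part of the feature, the same byte values at a different offset count as a different feature, which matches the slide's point that both value and position matter.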
18
18 Classification Functions. A classification function takes the position-substrings of the response to query q and answers YES (comes from Linux) or NO (does not come from Linux), e.g. for query q, for the Linux implementation. Two classes of functions: 1. Conjunctions 2. Decision lists. Analyze each query & each implementation separately.
19
19 Conjunctions. Capture identical behaviour across all hosts: require position-substrings distinctive to Linux to appear in responses from ALL Linux hosts. Example: if (response[4-5]==0x0000 && response[34-35]==0x16d0) then Linux else NotLinux. [Figure: bytes at positions 4-5 and 34-35 in responses from Linux and NotLinux hosts.]
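A simplified way to learn such a conjunction is to intersect the position-substrings of all responses from the target implementation and keep the result only if no other implementation's response satisfies it. This is a sketch under that assumption, not necessarily the learner used in FiG; the short byte strings are made-up toy responses.

```python
from typing import FrozenSet, List, Optional, Tuple

Feature = Tuple[int, int, bytes]


def features(response: bytes) -> FrozenSet[Feature]:
    """All position-substrings (start, end inclusive, bytes) of a response."""
    n = len(response)
    return frozenset((s, e, response[s:e + 1])
                     for s in range(n) for e in range(s, n))


def learn_conjunction(positives: List[bytes],
                      negatives: List[bytes]) -> Optional[FrozenSet[Feature]]:
    """Features shared by ALL positive responses, or None if not distinctive."""
    common = frozenset.intersection(*(features(r) for r in positives))
    if not common:
        return None
    # Reject the conjunction if any negative response also satisfies it.
    if any(common <= features(r) for r in negatives):
        return None
    return common


def matches(conjunction: FrozenSet[Feature], response: bytes) -> bool:
    """True when every feature of the conjunction appears in the response."""
    return conjunction <= features(response)


# Toy responses (illustrative values only).
linux = [bytes.fromhex("000016d0aa"), bytes.fromhex("000016d0bb")]
other = [bytes.fromhex("000416d0aa")]
conj = learn_conjunction(linux, other)
print(matches(conj, linux[0]), matches(conj, other[0]))
```

The intersection step is what makes a conjunction strict: a single host of the target implementation that behaves differently removes the differing bytes from the rule.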
20
20 Decision Lists. Need more expressivity than conjunctions. Capture multiple types of behaviour within an implementation: allow many sets of position-substrings, each distinctive to the implementation (e.g. Windows). Example: if (response[34-35] == 0xffff) then Windows else if (response[34-35] == 0x40e8) then Windows else NotWindows. [Figure: Windows responses carrying either ffff or 40e8 at positions 34-35.]
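The Windows example can be written as an ordered list of rules that is evaluated top to bottom, with a default label when no rule fires. A minimal sketch of evaluating such a list; the rule representation and the helper that pads a toy response are assumptions for illustration:

```python
from typing import List, Tuple

# (start, end inclusive, expected bytes, label returned when the rule fires)
Rule = Tuple[int, int, bytes, str]

WINDOWS_RULES: List[Rule] = [
    (34, 35, bytes.fromhex("ffff"), "Windows"),
    (34, 35, bytes.fromhex("40e8"), "Windows"),
]


def classify(rules: List[Rule], default: str, response: bytes) -> str:
    """Return the label of the first rule whose bytes match, else the default."""
    for start, end, expected, label in rules:
        if response[start:end + 1] == expected:
            return label
    return default


def toy_response(bytes_34_35: bytes) -> bytes:
    """Hypothetical 36-byte response: zeros, then the two bytes of interest."""
    return bytes(34) + bytes_34_35


print(classify(WINDOWS_RULES, "NotWindows", toy_response(bytes.fromhex("40e8"))))
```

Each rule covers one behaviour seen among hosts of the same implementation, which is what a single conjunction cannot express.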
21
21 What we’re learning. 1. Extract features. 2. Combine features to distinguish implementations. Outline: features; classification functions; combining into fingerprints. [Figure: training data collected from Windows, Solaris, and Linux hosts enters the training phase.]
22
22 Binary-fingerprints. Binary-fingerprint for implementation (e.g., Linux) is: single query + classification function (e.g., conjunction or decision list) = boolean (e.g. Linux, or Not Linux?). Binary-fingerprint separates ONE implementation. Learning (so far) finds binary-fingerprints: conjunctions/decision lists of position-substrings (e.g. Linux or Not Linux? Windows or Not Windows?).
23
23 Multi-class Fingerprint. Combine binary-fingerprints for multiple implementations. Multi-class fingerprint is: single query + classification functions (e.g. conjunctions, decision lists) = implementation (e.g. Linux, Windows, Solaris, unknown?). [Figure: binary-fingerprints for query q (Linux or Not Linux? Windows or Not Windows? Solaris or Not Solaris?) combine into a multi-class fingerprint for query q (Linux? Windows? Solaris? unknown?).]
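Combining binary-fingerprints for one query can be sketched as running each implementation's classifier on the same response. The slides do not spell out how conflicts are resolved, so the rule used here (answer "unknown" unless exactly one binary-fingerprint says yes) is an assumption, and the toy classifiers are made up:

```python
from typing import Callable, Dict

# A binary-fingerprint's classification function: response bytes -> yes/no.
BinaryClassifier = Callable[[bytes], bool]


def multi_class(binary: Dict[str, BinaryClassifier], response: bytes) -> str:
    """Name the implementation if exactly one binary classifier accepts."""
    hits = [impl for impl, accepts in binary.items() if accepts(response)]
    return hits[0] if len(hits) == 1 else "unknown"


# Toy classifiers for a single hypothetical query q (illustrative bytes only).
binary_for_q: Dict[str, BinaryClassifier] = {
    "Linux": lambda r: r[:2] == b"\x00\x00",
    "Windows": lambda r: r[:2] in (b"\xff\xff", b"\x40\xe8"),
    "Solaris": lambda r: r[:1] == b"\x10",
}

print(multi_class(binary_for_q, b"\x40\xe8\x00"))
print(multi_class(binary_for_q, b"\x7f\x7f\x00"))
```

A query yields a multi-class fingerprint only when its responses let every implementation's binary-fingerprint fire on its own hosts and on no others.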
24
24 Training Phase Summary. Analyze responses to all queries, one at a time. Use position-substrings of bytes in response. Generate binary-fingerprints & multi-class fingerprints. Send these to testing phase.
25
25 Testing Phase. Which fingerprints are accurate? [Figure: testing data and the binary & multi-class fingerprints enter the testing phase, which outputs fingerprints for the fingerprinting tool.]
26
26 Outline: Fingerprint Generation Problem; Overview of Approach; Automatic Fingerprint Generation (Query Exploration Phase; Learning Phase); Experimental Results (Experimental Setup & Data; Fingerprinting Results: Binary & Multi-class Fingerprints; Examples of New Fingerprints); Conclusion.
27
27 Experiment Setup & Data. OS fingerprint generation: 3 OS (77 Windows, 29 Linux, 22 Solaris hosts); 305 different queries. DNS fingerprint generation: 5 DNS server implementations (10 BIND8, 12 BIND9, 11 Windows Server 2003, 10 MyDNS, 11 TinyDNS hosts); 96 different queries.
28
28 Multi-class Fingerprints (one-query fingerprint distinguishing ALL implementations simultaneously). OS: 66 queries with multi-class fingerprints. DNS: 19 queries with multi-class fingerprints. All these are decision lists! No multi-class fingerprints with conjunctions found. Decision list has greater discriminatory power.
29
29 All Fingerprints: OS. Binary-fingerprints (one-query fingerprint distinguishing ONE implementation from rest): lots more binary-fingerprints! Find conjunctions & decision lists in binary-fingerprints. Again, more fingerprints with more expressive decision lists. Similar results for DNS. [Table: columns OS, Linux, Solaris, Windows, Multi-class; rows Decision list13098 (multi-class 66) and Conjunction4253 (multi-class 0).]
30
30 Examples of New Fingerprints. Invalid value in data offset field: Windows & Solaris hosts respond when value < 5; Linux hosts do not respond. RST+ACK packets in responses: Linux & Solaris hosts set TCP Ack # to 0; Windows hosts set TCP Ack # to Ack # of query.
31
31 Examples of New Fingerprints. Behaviour on ECN & CWR bits: Linux & Windows hosts ignore ECN & CWR bits in queries; Solaris hosts (sometimes) do not ignore them. Behaviour of QdCount field on invalid queries (DNS fingerprinting): some servers copy the field value, others don’t.
32
32 Conclusion. Automatic fingerprint generation is possible. Use machine learning to identify fingerprints. Generate fingerprints automatically for 2 applications: distinguish OS; distinguish implementations of DNS servers. Find multi-class fingerprints using decision lists. Discover new fingerprints for fingerprinting tools.
33
33 Thank You! Questions? shobha@cs.cmu.edu
35
35 Binary-fingerprints: DNS (one-query fingerprint distinguishing ONE implementation from rest). Similar results for DNS binary-fingerprints. More fingerprints with more expressive decision list. No binary-fingerprints with conjunctions for BIND8 & BIND9. [Table: columns DNS, BIND8, BIND9, Microsoft, MyDNS, TinyDNS; rows Conjunction002229 and Decision-list3328322941.]
36
36 Related Work. Active fingerprinting: Comer & Lin ’94 (probing to find differences in TCP); Padhye & Floyd ’01 (compliance testing & protocol violations). Passive fingerprinting: Paxson ’97 (TCP implementation with traffic traces); Beverly ’04, Lippman et al ’03 (classify OS); Franklin et al ’06 (wireless device driver fingerprinting). Tools: OS fingerprinting (Nmap, queso, Xprobe, Snacktime); passive fingerprinting (p0f, siphon). Defeating OS fingerprinting: Smart et al ’00 (TCP fingerprint scrubber); tools: Morph, IPPersonality.