
FiG: Automatic Fingerprint Generation
Shobha Venkataraman
Joint work with Juan Caballero, Pongsin Poosankam, Min Gyung Kang, Dawn Song & Avrim Blum
Carnegie Mellon University

2 Fingerprinting
Used by a network administrator to identify:
- versions of software on hosts
- operating systems of hosts
- hosts running versions with vulnerabilities
Diagram: hosts running Linux, Solaris, Windows XP SP1, and Windows XP SP2.

3 The Fingerprinting Process
Fingerprint: a set of queries sent to a host, plus a classification function that analyzes the queries & responses.
Well-known fingerprinting tools: nmap, fpdns.
Diagram: the fingerprinting tool sends queries to the host, receives responses, and outputs an answer to "what OS?" (e.g., Linux).

4 Finding Fingerprints
How do fingerprinting tools get fingerprints? The tool needs to know:
- What queries?
- What classification function?
Existing approach: manual identification
- Incomplete, time-consuming
- Difficult to keep up-to-date
Need automatic, accurate fingerprint generation!

5 Our Contribution: FiG
Demonstrate that automatic fingerprint generation is possible. In particular:
- Use machine learning to automatically generate fingerprints
- Automatically generate accurate fingerprints for:
  - distinguishing OS
  - distinguishing implementations of DNS servers
- Find new fingerprints

6 Outline
- Fingerprint Generation Problem
- Overview of Approach
- Automatic Fingerprint Generation
- Experimental Results
- Conclusion

7 Fingerprint Generation Problem
Goal: find fingerprints, i.e.
- useful queries
- a classification function that distinguishes implementations
Diagram: the fingerprint generator probes Linux, Windows XP, and Solaris hosts and supplies fingerprints to the fingerprinting tool.

9 FiG: Overview of Approach
Pipeline: Query Exploration -> Candidate Queries -> Learning -> Fingerprints -> Fingerprinting Tool
- Query exploration: generate candidate queries
- Learning: automatically find fingerprints

11 Query Exploration
Goal: generate candidate queries.
- Query: a specially crafted packet sent to a host
Generating all possible queries is infeasible:
- All queries = all possible byte combinations of the packet header
- e.g., 40 bytes of TCP & IP header => 2^320 queries!
Instead, use protocol semantics to design queries.
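
The 2^320 figure follows from counting bits: 40 header bytes are 320 bits, and every bit pattern is a distinct query. A minimal sketch of that arithmetic follows; the probing rate is a hypothetical number chosen only to show the scale, not something from the talk.

```python
# Size of the brute-force query space for a 40-byte TCP/IP header.
HEADER_BYTES = 40                      # from the slide
header_bits = HEADER_BYTES * 8         # 8 bits per byte
total_queries = 2 ** header_bits       # one query per bit pattern

# Hypothetical probing rate (an assumption for illustration only).
QUERIES_PER_SECOND = 10 ** 9
SECONDS_PER_YEAR = 365 * 24 * 3600
years_needed = total_queries // (QUERIES_PER_SECOND * SECONDS_PER_YEAR)

print("header bits:", header_bits)
print("decimal digits in query count:", len(str(total_queries)))
print("years at the assumed rate exceed 10**79:", years_needed > 10 ** 79)
```

Even at a billion probes per second, exhaustive exploration is out of reach, which is why the queries are designed from protocol semantics instead.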

12 Query Exploration
Queries: packets with unusual values in fields of the header.
- Explore unusual values for fields independently
- Explore fields with rich semantics exhaustively
  - i.e., all possible values
  - e.g., TCP flags
- Explore other fields selectively
  - i.e., some valid and some invalid values
  - e.g., TCP checksum, TCP source port
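
The exploration strategy above can be sketched as a small generator of candidate queries. This is an illustrative sketch, not the authors' tool: the baseline packet, the field names, and the selectively chosen values are all assumptions, and queries are represented as plain dictionaries rather than real packets.

```python
# The 8 one-bit TCP flags in the flags byte of the TCP header.
TCP_FLAGS = ["CWR", "ECE", "URG", "ACK", "PSH", "RST", "SYN", "FIN"]

# Hypothetical baseline query; field names and values are illustrative.
BASELINE = {"tcp_flags": 0x02, "tcp_checksum": "valid", "tcp_src_port": 40000}

# Selectively explored fields: a few valid and invalid values (assumed picks).
SELECTIVE_VALUES = {
    "tcp_checksum": ["valid", "zero", "random_invalid"],
    "tcp_src_port": [0, 1, 1023, 1024, 65535],
}


def candidate_queries():
    """Vary one field at a time, keeping every other field at its baseline."""
    queries = []
    # Rich semantics: try every combination of the flag bits.
    for flags in range(2 ** len(TCP_FLAGS)):
        queries.append({**BASELINE, "tcp_flags": flags})
    # Other fields: only a handful of interesting values each.
    for field, values in SELECTIVE_VALUES.items():
        for value in values:
            queries.append({**BASELINE, field: value})
    return queries


if __name__ == "__main__":
    print(len(candidate_queries()), "candidate queries")
```

Because fields are explored independently, the number of queries grows with the sum of the per-field value counts rather than their product.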

13 FiG: Overview of Approach
Pipeline: Query Exploration -> Candidate Queries -> Learning -> Fingerprints -> Fingerprinting Tool
Learning consists of:
- Data Collection
- Training Phase: learn potential fingerprints
- Testing Phase: test accuracy of fingerprints

14 Data Collection
1. Send candidate queries to hosts
2. Collect responses from hosts
3. Split the candidate queries and responses into training data & testing data
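
Step 3 can be sketched as follows. The slides do not say how the split is made, so this sketch assumes a split by host (every response from a given host lands on the same side); the record layout, the 50/50 fraction, and the function name are likewise assumptions.

```python
import random


def split_by_host(records, train_fraction=0.5, seed=0):
    """Split (host, implementation, query_id, response) records by host.

    All records from one host go to the same side, so the testing data
    only contains hosts that were never seen during training.
    """
    hosts = sorted({record[0] for record in records})
    rng = random.Random(seed)          # fixed seed keeps the split repeatable
    rng.shuffle(hosts)
    cut = int(len(hosts) * train_fraction)
    train_hosts = set(hosts[:cut])
    train = [r for r in records if r[0] in train_hosts]
    test = [r for r in records if r[0] not in train_hosts]
    return train, test
```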

15 Training Phase
Goal: learn potential fingerprints from data.
Intuition: different implementations differ in the bytes of their responses.
Learn which bytes of the responses distinguish between implementations!

16 What we’re learning
From the training data (Windows, Solaris, Linux):
1. Extract features
2. Combine features to distinguish implementations
Outline:
- Features
- Classification functions
- Combining into fingerprints

17 Features
- Analyze only the bytes of the response
- Use both the value & the position of individual bytes in the response
- Capture this idea with position-substrings
Example response byte sequence: abcdefghijk (positions 0-10)
Some example position-substrings:
- abcd at positions 0-3
- efg at positions 4-6
- hij at positions 7-9
- k at position 10
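
A minimal sketch of position-substring extraction, using the example response from this slide. The tuple layout (start, inclusive end, substring) and the `max_len` cap are assumptions made for illustration; the slides do not specify a maximum substring length.

```python
def position_substrings(response: bytes, max_len: int = 4):
    """Return every (start, end, substring) with end inclusive, up to max_len bytes."""
    features = set()
    for start in range(len(response)):
        for length in range(1, max_len + 1):
            stop = start + length              # exclusive slice bound
            if stop > len(response):
                break
            features.add((start, stop - 1, response[start:stop]))
    return features


if __name__ == "__main__":
    feats = position_substrings(b"abcdefghijk")
    for feature in [(0, 3, b"abcd"), (4, 6, b"efg"), (7, 9, b"hij"), (10, 10, b"k")]:
        print(feature, feature in feats)
```

Keeping the position in the feature is what separates this from plain substring matching: the same bytes at a different offset form a different feature.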

18 Classification Functions
A classification function takes the position-substrings of the response to query q and answers YES (comes from Linux) or NO (does not come from Linux), e.g. for query q and the Linux implementation.
Two classes of functions:
1. Conjunctions
2. Decision lists
Analyze each query & each implementation separately.

19 Conjunctions
- Capture identical behaviour across all hosts
- Require position-substrings distinctive to Linux to appear in responses from ALL Linux hosts
Example:
  if (response[4-5]==0x0000 && response[34-35]==0x16d0) then Linux else NotLinux
Figure: bytes at positions 4-5 and positions 34-35 in Linux and NotLinux responses.
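
One way to learn such a conjunction is the standard most-specific-conjunction approach: keep every feature shared by all positive responses, then check that no negative response satisfies all of them. The slides do not spell out the learning algorithm, so this is a sketch under that assumption; the toy 6-byte responses and the 2-byte feature cap are invented for illustration.

```python
def features(response: bytes, max_len: int = 2):
    """Position-substrings as (start, substring) pairs, up to max_len bytes."""
    return {
        (start, response[start:start + length])
        for start in range(len(response))
        for length in range(1, max_len + 1)
        if start + length <= len(response)
    }


def learn_conjunction(positives, negatives):
    """Return features common to all positives, or None if they fail to separate."""
    if not positives:
        return None
    common = set.intersection(*(features(r) for r in positives))
    if not common:
        return None
    # A usable conjunction must reject every negative response.
    if any(common <= features(r) for r in negatives):
        return None
    return common


def matches(conjunction, response: bytes) -> bool:
    """True if the response contains every feature in the conjunction."""
    return conjunction <= features(response)


# Toy responses (invented): "Linux" hosts share 0x0000 at 1-2 and 0x16d0 at 4-5.
LINUX = [bytes.fromhex("aa0000bb16d0"), bytes.fromhex("cc0000dd16d0")]
OTHERS = [bytes.fromhex("aa4000bbffff"), bytes.fromhex("cc0000dd40e8")]
```

A conjunction demands identical behaviour from every host of the implementation, so it fails as soon as one implementation answers in more than one way.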

20 Decision Lists
- Need more expressivity than conjunctions
- Capture multiple types of behaviour within an implementation
- Allow many sets of position-substrings, each distinctive to the implementation (e.g. Windows)
Example:
  if (response[34-35] == 0xffff) then Windows
  else if (response[34-35] == 0x40e8) then Windows
  else NotWindows
Figure: Windows responses with 0xffff or 0x40e8 at positions 34-35.
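
The example decision list on this slide can be evaluated with a few lines of code. This sketch covers only evaluation, not how the rules are learned; the rule representation (start offset, expected bytes, label) and the function name are assumptions.

```python
def classify(decision_list, default, response: bytes):
    """Return the label of the first rule whose bytes match, else the default."""
    for start, expected, label in decision_list:
        if response[start:start + len(expected)] == expected:
            return label
    return default


# The rules from the slide: bytes 34-35 of the response.
WINDOWS_RULES = [
    (34, b"\xff\xff", "Windows"),
    (34, b"\x40\xe8", "Windows"),
]

if __name__ == "__main__":
    response = bytes(34) + b"\x40\xe8"      # 34 zero bytes, then 0x40e8
    print(classify(WINDOWS_RULES, "NotWindows", response))
```

Each rule covers one type of behaviour, so two Windows variants that answer differently can still share a single fingerprint.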

21 What we’re learning
From the training data (Windows, Solaris, Linux):
1. Extract features
2. Combine features to distinguish implementations
Outline:
- Features
- Classification functions
- Combining into fingerprints

22 Binary-fingerprints
A binary-fingerprint for an implementation (e.g., Linux) is:
- a single query
- plus a classification function (e.g., a conjunction or decision list)
- yielding a boolean (e.g., Linux or Not Linux?)
A binary-fingerprint separates ONE implementation from the rest.
Learning (so far) finds binary-fingerprints: conjunctions/decision lists of position-substrings (e.g. Linux or Not Linux? Windows or Not Windows?).

23 Multi-class Fingerprint
Combine binary-fingerprints for multiple implementations.
A multi-class fingerprint is:
- a single query
- plus classification functions (e.g. conjunctions, decision lists)
- yielding an implementation (e.g. Linux, Windows, Solaris, or unknown)
Diagram: the binary-fingerprints for query q (Linux or Not Linux? Windows or Not Windows? Solaris or Not Solaris?) combine into the multi-class fingerprint for query q (Linux? Windows? Solaris? unknown?).
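
A minimal sketch of combining binary-fingerprints for one query into a multi-class answer. The combination rule used here (exactly one binary-fingerprint must accept, otherwise "unknown") is an assumption, since the slides do not state how disagreements are resolved. The Linux and Windows rules reuse the byte values from the earlier examples and are assumed to belong to the same query; the Solaris rule is invented.

```python
def bytes_at(response: bytes, start: int, length: int = 2) -> bytes:
    """Bytes of the response at [start, start + length)."""
    return response[start:start + length]


# One binary-fingerprint (response -> bool) per implementation, all for query q.
BINARY_FINGERPRINTS = {
    "Linux": lambda r: bytes_at(r, 4) == b"\x00\x00" and bytes_at(r, 34) == b"\x16\xd0",
    "Windows": lambda r: bytes_at(r, 34) in (b"\xff\xff", b"\x40\xe8"),
    # Hypothetical Solaris rule, invented for this sketch.
    "Solaris": lambda r: bytes_at(r, 34) == b"\xc0\x00",
}


def multi_class(binary_fingerprints, response: bytes) -> str:
    """Name the implementation if exactly one binary-fingerprint accepts."""
    hits = [name for name, accepts in binary_fingerprints.items() if accepts(response)]
    return hits[0] if len(hits) == 1 else "unknown"
```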

24 Training Phase Summary
- Analyze responses to all queries, one at a time
- Use position-substrings of bytes in the response
- Generate binary-fingerprints & multi-class fingerprints
- Send these to the testing phase

25 Testing Phase
Inputs: the testing data and the binary & multi-class fingerprints from the training phase.
Question: which fingerprints are accurate?
Output: fingerprints for the fingerprinting tool.
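
The testing phase can be sketched as measuring each candidate fingerprint on held-out responses and keeping those that are accurate enough. The accuracy threshold, the record layout, and both function names are assumptions; the slides do not state the acceptance criterion.

```python
def accuracy(classify, test_records):
    """Fraction of (response, true_label) pairs that classify() labels correctly."""
    if not test_records:
        raise ValueError("no testing data")
    correct = sum(1 for response, label in test_records if classify(response) == label)
    return correct / len(test_records)


def keep_accurate(fingerprints, test_records, threshold=1.0):
    """Keep the fingerprints (name -> classifier) that reach the threshold."""
    return {
        name: classify
        for name, classify in fingerprints.items()
        if accuracy(classify, test_records) >= threshold
    }
```

Evaluating on data that was not used for training guards against fingerprints that only memorise quirks of the training hosts.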

26 Outline
- Fingerprint Generation Problem
- Overview of Approach
- Automatic Fingerprint Generation
  - Query Exploration Phase
  - Learning Phase
- Experimental Results
  - Experimental Setup & Data
  - Fingerprinting Results: Binary & Multi-class Fingerprints
  - Examples of New Fingerprints
- Conclusion

27 Experiment Setup & Data
OS fingerprint generation:
- 3 OS: 77 Windows, 29 Linux, 22 Solaris hosts
- 305 different queries
DNS fingerprint generation:
- 5 DNS server implementations: 10 BIND8, 12 BIND9, 11 Windows Server 2003, 10 MyDNS, 11 TinyDNS hosts
- 96 different queries

28 Multi-class Fingerprints
Multi-class fingerprint: a one-query fingerprint distinguishing ALL implementations simultaneously.
- OS: 66 queries with multi-class fingerprints
- DNS: 19 queries with multi-class fingerprints
- All of these are decision lists!
  - No multi-class fingerprints with conjunctions found
  - Decision lists have greater discriminatory power

29 All Fingerprints: OS
Binary-fingerprint: a one-query fingerprint distinguishing ONE implementation from the rest.
- Lots more binary-fingerprints!
- Find conjunctions & decision lists in binary-fingerprints
- Again, more fingerprints with the more expressive decision lists
- Similar results for DNS
Table (binary-fingerprints; the values for Linux, Solaris, Windows appear unseparated in the transcript):
- Decision list: 13098
- Conjunction: 4253
Multi-class: 66 with decision lists, 0 with conjunctions

30 Examples of New Fingerprints
Invalid value in the data offset field:
- Windows & Solaris hosts respond when value < 5
- Linux hosts do not respond
RST+ACK packets in responses:
- Linux & Solaris hosts set TCP Ack # to 0
- Windows hosts set TCP Ack # to the Ack # of the query

31 Examples of New Fingerprints
Behaviour on ECN & CWR bits:
- Linux & Windows hosts ignore the ECN & CWR bits in queries
- Solaris hosts (sometimes) do not ignore them
Behaviour of the QdCount field on invalid queries (DNS fingerprinting):
- Some servers copy the field value, others don’t

32 Conclusion
Automatic fingerprint generation is possible:
- Use machine learning to identify fingerprints
- Generate fingerprints automatically for 2 applications:
  - Distinguish OS
  - Distinguish implementations of DNS servers
- Find multi-class fingerprints using decision lists
- Discover new fingerprints for fingerprinting tools

33 Thank You! Questions?

35 Binary-fingerprints: DNS
Binary-fingerprint: a one-query fingerprint distinguishing ONE implementation from the rest.
Table: rows Conjunction and Decision-list; columns BIND8, BIND9, Microsoft, MyDNS, TinyDNS.
- Similar results for DNS binary-fingerprints
- More fingerprints with the more expressive decision lists
- No binary-fingerprints with conjunctions for BIND8 & BIND9

36 Related Work
Active fingerprinting:
- Comer & Lin ’94: probing to find differences in TCP
- Padhye & Floyd ’01: compliance testing & protocol violations
Passive fingerprinting:
- Paxson ’97: TCP implementation with traffic traces
- Beverly ’04, Lippman et al ’03: classify OS
- Franklin et al ’06: wireless device driver fingerprinting
Tools:
- OS fingerprinting: Nmap, queso, Xprobe, Snacktime
- Passive fingerprinting: p0f, siphon
Defeating OS fingerprinting:
- Smart et al ’00: TCP fingerprint scrubber
- Tools: Morph, IPPersonality