Polygraph: Automatically Generating Signatures for Polymorphic Worms

James Newsome*, Brad Karp*†, and Dawn Song*
*Carnegie Mellon University
†Intel Research Pittsburgh

Internet Worms
- Definition: Malicious code that propagates by exploiting software
- No human interaction needed
- Able to spread very quickly
  - Slammer scanned 90% of the Internet in 10 minutes
James Newsome May, 2005

Proposed Defense Strategy
[Diagram callout: "Worm Detected!"]
- Cited systems: Honeycomb [Kreibich2003], Autograph [Kim2004], Earlybird [Singh2004]

Challenge: Polymorphic Worms
- Polymorphic worms minimize invariant content
  - Encrypted payload
  - Obfuscated decryption routine
- Polymorphic tools are already available
  - Clet, ADMmutate
- Do good signatures for polymorphic worms exist?
- Can we generate them automatically?

Good News: Still some invariant content
[Diagram: exploit request layout. Labels: GET, URL, HTTP/1.1, Random Headers, Host:, Payload Part 1, Payload Part 2, NOP slide, Decryption Routine, Key, Encrypted Payload, \xff\xbf]
- Protocol framing
  - Needed to make server go down vulnerable code path
- Overwritten return address
  - Needed to redirect execution to worm code
- Decryption routine
  - Needed to decrypt main payload
  - BUT, code obfuscation can eliminate patterns here

Bad News: Previous Approaches Insufficient
- Previous approaches use a common substring
- Longest substring: “HTTP/1.1”
  - 93% false positive rate
- Most specific substring: “\xff\xbf”
  - .008% false positive rate (10 / 125,301)
[Diagram: exploit request layout]

What to do?
- No one substring is specific enough
- BUT, there are multiple substrings
  - Protocol framing
  - Value used to overwrite return address
  - (Parts of poorly obfuscated code)
- Our approach: combine the substrings

Outline
- Substring-based signatures insufficient
- Generating signatures
  - Perfect (noiseless) classifier case
    - Signature classes & algorithms
    - Evaluation
  - Imperfect classifier case
    - Clustering extensions
- Attacking the system
- Conclusion

Goals
- Identify classes of signatures that can:
  - Accurately describe polymorphic worms
  - Be used to filter a high speed network line
  - Be generated automatically and efficiently
- Design and implement a system to automatically generate signatures of these classes

Polygraph Architecture
[Diagram components: Network Tap, Flow Classifier, Suspicious Flow Pool, Innocuous Flow Pool, Signature Generator, Worm Signatures]

Signature Class (I): Conjunction
- Signature is a set of strings (tokens)
- Flow matches signature iff it contains all tokens in the signature
- O(n) time to match (n is flow length)
- Generated signature: “GET” and “HTTP/1.1” and “\r\nHost:” and “\r\nHost:” and “\xff\xbf”
  - .0024% false positive rate (3 / 125,301)
[Diagram: exploit request layout]
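
The matching rule on this slide can be sketched in a few lines of Python. This is only an illustration: the function name `matches_conjunction` and the two example flows are made up, and because the sketch treats the signature as a plain collection of tokens, the repeated “\r\nHost:” from the slide collapses into a single check. Python's substring search is also not the O(n) matcher the slide refers to.

```python
def matches_conjunction(flow: bytes, tokens: list[bytes]) -> bool:
    """Return True iff every token occurs somewhere in the flow (order is ignored)."""
    return all(token in flow for token in tokens)


# Tokens taken from the generated signature on the slide.
signature = [b"GET", b"HTTP/1.1", b"\r\nHost:", b"\xff\xbf"]

# Hypothetical flows, invented for this example.
worm_like = b"GET /x HTTP/1.1\r\nHost: a\r\nHost: " + b"\x90" * 8 + b"\xff\xbf\r\n\r\n"
benign = b"GET /index.html HTTP/1.1\r\nHost: example.org\r\n\r\n"

print(matches_conjunction(worm_like, signature))
print(matches_conjunction(benign, signature))
```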

Generating Conjunction Signatures
- Use suffix tree to find set of tokens that:
  - Occur in every sample of suspicious pool
  - Are at least 2 bytes long
- Generation time is linear in total byte size of suspicious pool
- Based on a well-known string processing algorithm [Hui1992]
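
To make the token criteria concrete, here is a brute-force sketch that finds substrings of at least 2 bytes occurring in every sample. It deliberately swaps the suffix tree for naive enumeration, so it is far slower than the linear-time method on the slide. The name `common_tokens`, the `min_len` parameter, the rule of keeping only maximal tokens, and the three samples are assumptions made for this example.

```python
def common_tokens(samples: list[bytes], min_len: int = 2) -> set[bytes]:
    """Brute force: maximal substrings (>= min_len bytes) present in every sample."""
    base = min(samples, key=len)  # any common substring must occur in the shortest sample
    common = set()
    for start in range(len(base)):
        for end in range(start + min_len, len(base) + 1):
            candidate = base[start:end]
            if all(candidate in sample for sample in samples):
                common.add(candidate)
            else:
                break  # any longer substring from this start contains the failed one
    # Simplification: drop tokens that are contained in a longer common token.
    return {t for t in common if not any(t != u and t in u for u in common)}


# Hypothetical suspicious pool: same framing, different variable bytes.
pool = [
    b"GET /aaa HTTP/1.1\r\nHost: 111\xff\xbf",
    b"GET /bb HTTP/1.1\r\nHost: 2222\xff\xbfzz",
    b"GET /c HTTP/1.1\r\nHost: 33\xff\xbf",
]
print(sorted(common_tokens(pool)))
```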

Signature Class (II): Token Subsequence
- Signature is an ordered set of tokens
- Flow matches iff it contains all the tokens in signature, in the given order
- O(n) time to match (n is flow length)
- Generated signature: GET.*HTTP/1.1.*\r\nHost:.*\r\nHost:.*\xff\xbf
  - .0008% false positive rate (1 / 125,301)
[Diagram: exploit request layout]
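
Ordered matching can be sketched as a single left-to-right scan: find each token at or after the end of the previous match. The function name `matches_subsequence` and the example flows are hypothetical; the token list mirrors the generated signature on the slide.

```python
def matches_subsequence(flow: bytes, tokens: list[bytes]) -> bool:
    """Return True iff all tokens occur in the flow, in order, without overlapping."""
    position = 0
    for token in tokens:
        index = flow.find(token, position)
        if index < 0:
            return False
        position = index + len(token)  # the next token must start after this one
    return True


signature = [b"GET", b"HTTP/1.1", b"\r\nHost:", b"\r\nHost:", b"\xff\xbf"]

# Hypothetical flows: same tokens, different order.
in_order = b"GET /a HTTP/1.1\r\nHost: x\r\nHost: y\xff\xbf"
out_of_order = b"\xff\xbf GET /a HTTP/1.1\r\nHost: x\r\nHost: y"

print(matches_subsequence(in_order, signature))
print(matches_subsequence(out_of_order, signature))
```

Taking the leftmost occurrence of each token is safe here: if any ordered match exists, the leftmost choices leave at least as much of the flow available for the remaining tokens.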

Generating Token Subsequence Signatures
- Use dynamic programming to find longest common token subsequence (lcseq) between 2 samples in O(n^2) time [SmithWaterman1981]
- Find lcseq of first two samples
- Iteratively find lcseq of intermediate result and next sample
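
The iterative procedure can be sketched with a textbook longest-common-subsequence table folded across the samples. This is plain LCS, not the Smith-Waterman variant cited on the slide, and it assumes each sample has already been reduced to an ordered list of tokens. The names `lcs` and `subsequence_signature` and the sample token lists are invented for illustration.

```python
from functools import reduce


def lcs(a: list[bytes], b: list[bytes]) -> list[bytes]:
    """Longest common subsequence of two token lists via an O(len(a) * len(b)) table."""
    m, n = len(a), len(b)
    # table[i][j] = length of the LCS of a[i:] and b[j:]
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = 1 + table[i + 1][j + 1]
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    # Walk the table from the top-left corner to recover one LCS.
    result, i, j = [], 0, 0
    while i < m and j < n:
        if a[i] == b[j]:
            result.append(a[i])
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            i += 1
        else:
            j += 1
    return result


def subsequence_signature(samples: list[list[bytes]]) -> list[bytes]:
    """Fold lcs over the samples: lcs of the first two, then with each next sample."""
    return reduce(lcs, samples)


# Hypothetical tokenized samples; two of them share a coincidental token.
s1 = [b"GET", b"HTTP/1.1", b"\r\nHost:", b"\xee\xb7", b"\r\nHost:", b"\xff\xbf"]
s2 = [b"GET", b"HTTP/1.1", b"\r\nHost:", b"\xee\xb7", b"\r\nHost:", b"\xff\xbf"]
s3 = [b"GET", b"HTTP/1.1", b"\r\nHost:", b"\r\nHost:", b"\x8b\xf4", b"\xff\xbf"]

print(subsequence_signature([s1, s2]))      # still contains the coincidental token
print(subsequence_signature([s1, s2, s3]))  # the third sample removes it
```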

Experiment: Signature Generation
- How many worm samples do we need?
  - Too few samples -> signature is too specific -> false negatives
- Experimental setup
  - Using a 25 day port 80 trace from lab perimeter
  - Innocuous pool: First 5 days (45,111 streams)
  - Suspicious pool: Using Apache exploit described earlier
    - Non-invariant portions filled with random bytes
- Signature evaluation:
  - False positives: Last 10 days (125,301 streams)
  - False negatives: 1000 generated worm samples
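
The evaluation metrics reduce to two ratios. The sketch below uses hypothetical names (`rate_percent`, `evaluate`) and toy flows; the only figures taken from the slides are the 125,301-stream evaluation trace and the false positive counts quoted elsewhere in the talk (10, 3 and 1).

```python
def rate_percent(hits: int, total: int) -> float:
    """Express hits / total as a percentage."""
    return 100.0 * hits / total


def evaluate(matcher, innocuous_flows, worm_flows):
    """Return (false positive %, false negative %) for a signature matcher."""
    false_positives = sum(1 for flow in innocuous_flows if matcher(flow))
    false_negatives = sum(1 for flow in worm_flows if not matcher(flow))
    return (
        rate_percent(false_positives, len(innocuous_flows)),
        rate_percent(false_negatives, len(worm_flows)),
    )


# Rates quoted in the talk, recomputed from the raw counts.
for count in (10, 3, 1):
    print(count, "/ 125,301 =", f"{rate_percent(count, 125_301):.4f}%")

# Toy example with a single-substring matcher (flows are made up).
toy_matcher = lambda flow: b"\xff\xbf" in flow
print(evaluate(toy_matcher, [b"a", b"b\xff\xbf", b"c", b"d"], [b"x\xff\xbf", b"y"]))
```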

Signature Generation Results
- 2 worm samples: 100% FN
- 3 to 100 worm samples:
  - Conjunction: 0% FN, .0024% FP
  - Subseq: 0% FN, .0008% FP
- Signatures shown on the slide:
  - GET .* HTTP/1.1\r\n.*\r\nHost: .*\xee\xb7.*\xb2\x1e.*\r\nHost: .*\xef\xa3.*\x8b\xf4.*\x89\x8b.*E\xeb.*\xff\xbf
  - GET .* HTTP/1.1\r\n.*\r\nHost: .*\r\nHost:.*\xff\xbf

Also Works for Binary Protocols
- Created polymorphic version of BIND TSIG exploit used by Li0n Worm
- Single substring signatures:
  - 2 bytes of Ret Address: .001% false positives
  - 3 byte TSIG marker: .067% false positives
- Conjunction: 0% false positives
- Subsequence: 0% false positives
- Evaluated using a 1 million request trace from a DNS server that serves a major university and several ccTLDs

Noise in Suspicious Flow Pool
- What if classifier has false positives?
- 3 worm samples:
  - GET .* HTTP/1.1\r\n.*\r\nHost: .*\r\nHost:.*\xff\xbf
- 3 worm samples + 1 legit GET request:
  - GET .* HTTP/1.1\r\n.*\r\nHost:
- 3 worm samples + a non-HTTP request:
  - .*

Our Approach: Hierarchical Clustering
- Used for multiple sequence alignment in Bioinformatics [Gusfield1997]
- Initialization:
  - Each sample is a cluster
  - Each cluster has a signature matching all samples in that cluster
- Greedily merge clusters
  - Minimize false positive rate, using innocuous pool
- Stop when any further merging results in significant false positives
- Output the signature of each final cluster of sufficient size
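
The greedy merge loop can be sketched as follows. Several simplifications are assumed here and are not from the talk: every flow is pre-tokenized into a set of tokens, a cluster's conjunction signature is the intersection of its members' token sets, and `max_fp` and `min_size` are hypothetical knobs standing in for "significant false positives" and "sufficient size".

```python
from itertools import combinations


def fp_rate(signature: frozenset, innocuous: list) -> float:
    """Fraction of innocuous flows containing every token of the signature."""
    if not signature:
        return 1.0  # an empty signature matches everything
    return sum(1 for flow in innocuous if signature <= flow) / len(innocuous)


def cluster_signatures(samples, innocuous, max_fp=0.01, min_size=2):
    """Greedily merge clusters while the best merge keeps the estimated FP rate low."""
    clusters = [(frozenset(sample), 1) for sample in samples]  # (signature, size)
    while len(clusters) > 1:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            merged = clusters[i][0] & clusters[j][0]  # tokens common to both clusters
            rate = fp_rate(merged, innocuous)
            if best is None or rate < best[0]:
                best = (rate, i, j, merged)
        rate, i, j, merged = best
        if rate > max_fp:
            break  # every remaining merge would cause too many false positives
        size = clusters[i][1] + clusters[j][1]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append((merged, size))
    return [signature for signature, size in clusters if size >= min_size]


# Hypothetical pools: three worm samples plus one noise sample.
suspicious = [
    {b"GET", b"HTTP/1.1", b"\xff\xbf", b"\xde\xad"},
    {b"GET", b"HTTP/1.1", b"\xff\xbf", b"\xde\xad"},
    {b"GET", b"HTTP/1.1", b"\xff\xbf"},
    {b"GET", b"HTTP/1.1", b"index.html"},
]
innocuous = [
    {b"GET", b"HTTP/1.1", b"a"},
    {b"GET", b"HTTP/1.1", b"b"},
    {b"POST", b"HTTP/1.1"},
]
print(cluster_signatures(suspicious, innocuous))
```

In this toy run the noise sample is never merged, because the only signature it shares with the worm cluster (GET and HTTP/1.1) matches most of the innocuous pool.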

Hierarchical Clustering
[Diagram: Worm Sample 1, Innoc Sample 1, Worm Sample 2, Innoc Sample 2, Worm Sample 3]
- Merge candidate
  - Common substrings: HTTP/1.1, GET, …
  - High false positive rate!

Hierarchical Clustering
[Diagram: Worm Sample 1, Innoc Sample 1, Worm Sample 2, Innoc Sample 2, Worm Sample 3]
- Merge candidate
  - Common substrings: HTTP/1.1, GET, \xff\xbf, \xde\xad
  - Low false positive rate (but high false negative rate)
- Not enough samples have been merged yet: every merge drops content from the signature, so the over-specific part of the signature goes away in the next merge

Hierarchical Clustering
[Diagram: Worm Sample 1, Innoc Sample 1, Worm Sample 2, Innoc Sample 2, Worm Sample 3]
- Cluster: HTTP/1.1, GET, \xff\xbf, \xde\xad
- Cluster: HTTP/1.1, GET, \xff\xbf

Clustering Evaluation (with noise)
- Suspicious pool consists of:
  - 5 polymorphic worm samples
  - Varying number of noise samples
- Noise samples chosen uniformly at random from evaluation trace
- Clustering uses innocuous pool to estimate false positive rate

Clustering Results

Noise   Conjunction         Subseq
        Fpos      Fneg      Fpos      Fneg
0%      .0024%    0%        .0008%    0%
38%
50%
80%     .7470%    100%      1.109%    100%
90%     .3384%    100%      .4150%    100%
        .6903%    100%      1.716%    100%

Overtraining Attacks
- Conjunction and Subsequence can be tricked into overtraining
- Red herring attack
  - Include extra fixed tokens
  - Remove them over time
  - Result: Have to keep generating new signatures
- Coincidental pattern attack
  - Create ‘coincidental’ patterns given a small set of worm samples
  - Result: more samples needed to generate a low-false-negative signature (50+)

Solution: Threshold matching
- Signature classifies as worm if enough tokens are present
- Implementation: Bayes Signatures
  - Assign each token a score based on Bayes Law
  - Choose highest-acceptable false positive rate
  - Choose threshold that gets at most that rate in innocuous training pool
- Properties:
  - Signatures generated and matched in linear time
  - Not susceptible to overtraining attacks
  - Don’t need clustering
  - You get the false positive rate you specify
  - Currently does not use ordering

Remaining False Positives
- Conjunction signature has 3 false positives
  - 1 of these also matched by subsequence signature
- What is causing these?
- Would it be so bad if 3 legitimate requests were filtered out every 10 days?

The Offending Request
GET /Download/GetPaper.php?paperId=XXX HTTP/1.1
…
Host: nsdi05.cs.washington.edu\r\n
POST /Author/UploadPaper.php HTTP/1.1\r\n
<binary data containing \xff\xbf>

Possible Fixes
- Use protocol knowledge
  - Match on request level instead of TCP flow level
  - Require \xff\xbf be part of Host header
  - Disadvantage: need protocol knowledge
- Use distance between tokens
  - Makes signatures more specific
  - Disadvantage: risks more overtraining attacks

Future Work
- Defending against overtraining
- Further reducing false positives
  - Could be reduced by learning more features (such as offsets)
  - But this increases risk of overtraining
- Promising solution: semantic analysis
  - Automatically analyze how worm exploit works
  - Only use features that must be present
  - First steps in Newsome05 (NDSS)
  - Currently extending this work (Brumley-Newsome-Song)

Conclusions
- Key observation: Content variability is limited by nature of the software vulnerability
- Have shown that:
  - Accurate signatures can be automatically generated for polymorphic worms
  - Demonstrated low false positives with real exploits, on real traffic traces

Thanks! Questions? Contact: jnewsome@ece.cmu.edu

Coincidental Pattern Attack
- Conjunction & Subsequence may overtrain
- Coincidental pattern attack:
  - For non-invariant bytes, choose ‘a’ or ‘b’
  - Result: Suspicious pool has many substrings in common of form: ‘aabba’, ‘babba’…
  - Unseen worm samples will have many of these substrings, but not every one

Results with “Coincidental Pattern Attack”
[Plot: false negatives vs. suspicious pool size]

Results: Multiple Worms + Noise
[Flattened table; column headers: Conjunction, Subseq, Bayes. Values in transcript order:]
0% .0024% 0% .0008% 0% .008% 0% 38% 50% 80% .7470% 100% 1.109% 100% 90% .3384% 100% .4150% 100% .6903% 100% 1.716% 100% 10% 100%

The Innocuous Pool
- Used to determine:
  - How often tokens appear in legit traffic
  - Estimated signature false positive rates
- Goals:
  - Representative of current traffic
  - Does not contain worm flows
- Can be generated by:
  - Taking a relatively old trace
  - Filtering out known worms and exploits

Key Algorithm: Token Extraction
- Need to identify useful tokens
  - Substrings that occur in worm samples
- Problem: Find all substrings that:
  - Occur in at least k out of n samples
  - Are at least x bytes long
- Can be solved in time linear in total length of samples using a suffix tree

Signature Class (III): Bayes
- Use a Bayes classifier
  - Presence of a token is a feature
  - Hence, each token has a score:
- Generated signature: (‘GET’: .0035, ‘Host:’: .0022, ‘HTTP/1.1’: .11, ‘\xff\xbf’: 3.15), Threshold=1.99
  - .008% false positive rate (10 / 125,301)
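
As a worked example of threshold matching, the sketch below sums the scores of the tokens present in a flow and compares the total with the threshold, using the scores and threshold from the generated signature on this slide. The function names, the strict `>` comparison, and the example flows are assumptions.

```python
# Token scores and threshold as given in the generated signature on the slide.
SCORES = {b"GET": 0.0035, b"Host:": 0.0022, b"HTTP/1.1": 0.11, b"\xff\xbf": 3.15}
THRESHOLD = 1.99


def bayes_score(flow: bytes) -> float:
    """Sum the scores of all signature tokens that occur in the flow."""
    return sum(score for token, score in SCORES.items() if token in flow)


def matches_bayes(flow: bytes) -> bool:
    """Assumed rule: classify as worm when the summed score exceeds the threshold."""
    return bayes_score(flow) > THRESHOLD


# Hypothetical flows.
benign = b"GET / HTTP/1.1\r\nHost: example.org\r\n\r\n"
worm_like = benign + b"\xff\xbf"

print(bayes_score(benign), matches_bayes(benign))        # about 0.1157, below threshold
print(bayes_score(worm_like), matches_bayes(worm_like))  # about 3.2657, above threshold
```

With these scores the protocol framing tokens contribute almost nothing; the return address bytes decide the outcome, which is consistent with this signature having the same .008% false positive rate as the single “\xff\xbf” substring.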

Generating Bayes Signatures
- Use suffix tree to find tokens that occur in a significant number of samples
- Determine probabilities:
  - Pr(worm) = Pr(~worm) = .5
  - Pr(substring|worm): use suspicious pool
  - Pr(substring|~worm): use innocuous pool
- Set a “certainty threshold” c
  - Signature matches a flow if the Bayes formula identifies it as more than c% likely to be a worm
  - Choose c that results in few (< 5) false positives in innocuous pool
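
One way to turn these steps into code is sketched below. It is not necessarily the formula used in the talk: it assumes a naive Bayes model in which each token's score is the log of Pr(token|worm) / Pr(token|~worm), estimated from the two pools, and only tokens present in a flow contribute. With equal priors, thresholding the summed score is then equivalent to thresholding the posterior probability. The `floor` value that avoids division by zero, all function names, and the toy pools are assumptions.

```python
import math


def token_scores(tokens, suspicious, innocuous, floor=1e-6):
    """Score each token by the log ratio of its frequency in the two pools."""
    scores = {}
    for token in tokens:
        p_worm = max(sum(token in f for f in suspicious) / len(suspicious), floor)
        p_innocuous = max(sum(token in f for f in innocuous) / len(innocuous), floor)
        scores[token] = math.log(p_worm / p_innocuous)
    return scores


def flow_score(flow, scores):
    """Sum the scores of the tokens that occur in the flow."""
    return sum(score for token, score in scores.items() if token in flow)


def pick_threshold(scores, innocuous, max_false_positives):
    """Lowest threshold that flags at most max_false_positives innocuous flows.

    A flow is flagged when its score is strictly greater than the threshold.
    """
    ranked = sorted((flow_score(f, scores) for f in innocuous), reverse=True)
    if max_false_positives >= len(ranked):
        return float("-inf")
    return ranked[max_false_positives]


# Toy pools, invented for this example.
suspicious = [b"GET /a\xff\xbf", b"GET /b\xff\xbf", b"GET /c\xff\xbf"]
innocuous = [b"GET /1", b"GET /2", b"GET /3", b"GET /upload \xff\xbf"]

scores = token_scores([b"GET", b"\xff\xbf"], suspicious, innocuous)
threshold = pick_threshold(scores, innocuous, max_false_positives=1)
flagged = [f for f in suspicious + innocuous if flow_score(f, scores) > threshold]
print(scores, threshold, len(flagged))
```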

Innocuous Pool Poisoning
- Before releasing worm:
  - Determine what signature of worm is
  - Flood Internet with innocuous requests that match
  - Eventually included in innocuous training pool
- Release worm
- Polygraph will:
  - Generate signature for worm
  - See that it causes many false positives in innocuous pool
  - Reject signature
- Solution: Use a relatively old trace for innocuous pool
  - Drawback: Hierarchical clustering generates more spurious signatures