Fast Port Scan Using Sequential Hypothesis Testing Jaeyeon Jung, Vern Paxson, Arthur W. Berger, and Hari Balakrishnan.



Introduction
• Port scanning: reconnaissance
  - Hackers scan one or more hosts for vulnerable ports as potential avenues of attack
• Not clearly defined
  - Scan sweeps: is a connection to a few addresses, some of which fail, a scan?
  - Granularity: should separate sources count as one scan?
  - Temporal: over what timeframe should activity be tracked?
  - Intent: hard to differentiate between benign scans and scans with malicious intent

Previous Scan Detection Techniques
• Malformed packets
  - Look for packets used for "stealth scanning"
• Connections to ports/hosts per unit time
  - Check whether a source hits more than X ports on Y hosts in Z time
• Failed connections
  - Malicious sources will have a higher ratio of failed connection attempts
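The "more than X ports on Y hosts in Z time" rule can be sketched as a sliding-window counter. This is only an illustration: the class name `WindowedScanCounter`, its parameters, and the simplification of counting distinct (host, port) pairs rather than ports and hosts separately are assumptions, not taken from any particular detector.

```python
from collections import defaultdict, deque


class WindowedScanCounter:
    """Hypothetical detector: flags a source that contacts more than
    max_targets distinct (host, port) pairs within `window` seconds."""

    def __init__(self, max_targets, window):
        self.max_targets = max_targets
        self.window = window
        # source -> deque of (timestamp, (host, port)), oldest first
        self.events = defaultdict(deque)

    def observe(self, source, host, port, timestamp):
        """Record one connection attempt; return True if the source is flagged."""
        queue = self.events[source]
        queue.append((timestamp, (host, port)))
        # Drop attempts that have fallen out of the time window.
        while queue and timestamp - queue[0][0] > self.window:
            queue.popleft()
        distinct_targets = {target for _, target in queue}
        return len(distinct_targets) > self.max_targets
```

A scanner that spaces its probes further apart than the window never accumulates enough targets to be flagged, which is one reason fixed thresholds are hard to tune.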

Bro NIDS
• Current algorithm, in use for years
• High efficiency
• Counts local connections from each remote host
• Differentiates connections by service
• Sets a threshold
• Blocks suspected malicious hosts

Flaws in Bro
• Skewed for little-used servers
  - Example: a private host that one worker remotely logs into from home
• Difficult to choose probabilities
• Difficult to determine never-accessed hosts
  - Needs data to determine appropriate parameters

Threshold Random Walk (TRW)
• Objectives for the new algorithm:
  - Performance near that of Bro
  - High speed
  - Flag a source as a scanner if it makes no useful connection
  - Detect single remote hosts

Data Analysis
• Data analyzed from two sites, LBL and ICSI
  - Research laboratories with minimal firewalling
  - LBL: 6000 hosts, sparse host density
  - ICSI: 200 hosts, dense host density

Separating Possible Scanners
• Which of the remainder are likely, but undetected, scanners?
  - The argument is nearly circular
  - Show that there are properties that can plausibly distinguish likely scanners in the remainder
  - Use that as the ground truth against which to develop an algorithm

Data Analysis (cont.)
• First model
  - Look at remainder hosts making failed connections
  - Compare all of the remainder to the known bad hosts
  - Hope for two modes, where the failed-connection mode resembles the known bad
  - No such modality exists

Data Analysis (cont.)
• Second model
  - For each host, examine the ratio of failed connections to successful connections
  - Known bad hosts have a high percentage of failed connections
  - Conclusion: remainder hosts with <80% failure are potentially benign
  - The rest are suspect
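The 80% split from the second model can be sketched as follows. The function name `split_remainder`, the input layout, and the definition of the failure fraction as failed / (failed + succeeded) are assumptions made for illustration.

```python
def split_remainder(hosts, failure_cutoff=0.80):
    """Split remainder hosts into (potentially benign, suspect) lists.

    hosts: dict mapping remote host -> (failed, succeeded) connection counts.
    A host whose failure fraction is below failure_cutoff is treated as
    potentially benign; all other hosts with traffic are suspect.
    """
    benign, suspect = [], []
    for host, (failed, succeeded) in hosts.items():
        total = failed + succeeded
        if total == 0:
            continue  # no connections observed, nothing to classify
        failure_fraction = failed / total
        if failure_fraction < failure_cutoff:
            benign.append(host)
        else:
            suspect.append(host)
    return benign, suspect
```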

TRW (cont.)
• Detect failed/succeeded connections
• Sequential hypothesis testing
  - Two hypotheses: benign (H_0) and scanner (H_1)
  - Probabilities determined by the model equations (theta_0 and theta_1 are the connection success probabilities under H_0 and H_1)
  - theta_0 > theta_1 (a benign source has a higher chance of a successful connection)
  - Four outcomes: detection, false positive, false negative, nominal
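The equations this slide refers to did not survive in the transcript. A reconstruction in the notation of the underlying paper is given below, where Y_i is an indicator for the i-th connection attempt to a distinct local host (0 for success, 1 for failure) and \Lambda(Y) is the likelihood ratio compared against the thresholds; treat it as a sketch of the model rather than the slide's exact content.

```latex
\begin{align*}
\Pr[Y_i = 0 \mid H_0] &= \theta_0, & \Pr[Y_i = 1 \mid H_0] &= 1 - \theta_0,\\
\Pr[Y_i = 0 \mid H_1] &= \theta_1, & \Pr[Y_i = 1 \mid H_1] &= 1 - \theta_1,
\end{align*}
\[
\Lambda(Y) \;=\; \prod_{i=1}^{n} \frac{\Pr[Y_i \mid H_1]}{\Pr[Y_i \mid H_0]},
\qquad \theta_0 > \theta_1 .
\]
```

Because theta_0 > theta_1, each failure multiplies the ratio by (1 - theta_1)/(1 - theta_0) > 1 and each success multiplies it by theta_1/theta_0 < 1, which is the random walk the algorithm's name refers to.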

Thresholds
• Choose thresholds
  - Set an upper threshold n_1 and a lower threshold n_0
  - Calculate the likelihood ratio
  - Compare it to the thresholds

Choosing Thresholds
• Choose two constants, alpha and beta
  - Probability of false positive (P_f) <= alpha
  - Detection probability (P_d) >= beta
  - Typical values: alpha = 0.01, beta = 0.99
• Thresholds can be defined in terms of P_f and P_d, or alpha and beta
  - n_1 <= P_d/P_f
  - n_0 >= (1-P_d)/(1-P_f)
  - Can be approximated using alpha and beta:
    n_1 = beta/alpha
    n_0 = (1-beta)/(1-alpha)
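As a worked example of the approximation above, the short sketch below computes both thresholds from alpha and beta. The function name `wald_thresholds` is a hypothetical label, not something from the slides.

```python
def wald_thresholds(alpha, beta):
    """Approximate the lower (n_0) and upper (n_1) thresholds from the
    false positive bound alpha and the detection probability bound beta."""
    n_1 = beta / alpha              # upper threshold: accept H_1 (scanner)
    n_0 = (1 - beta) / (1 - alpha)  # lower threshold: accept H_0 (benign)
    return n_0, n_1


n_0, n_1 = wald_thresholds(alpha=0.01, beta=0.99)
print(f"n_0 = {n_0:.4f}, n_1 = {n_1:.1f}")
```

With the typical values on the slide, n_1 comes out at about 99 and n_0 at about 0.0101, so the likelihood ratio must move roughly two orders of magnitude in either direction before a decision is made.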

Evaluation Methodology
• Used the data from the two labs
• Knowledge of whether each connection is established, rejected, or unanswered
• Maintains 3 variables for each remote host s
  - D_s, the set of distinct hosts previously connected to
  - S_s, the decision state (pending, H_0, or H_1)
  - L_s, the likelihood ratio

Evaluation Methodology (cont.)
• For each line in the dataset
  - Skip it if the remote host's state is not pending
  - Determine whether the connection is successful
  - Check whether the destination is already in the connection set D_s; if so, proceed to the next line
  - Update D_s and L_s
  - If L_s goes beyond either threshold, update the state accordingly
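The per-line procedure above can be sketched in a few lines of Python. This is an illustrative reading of the slide, not the authors' code: the class and constant names are hypothetical, and the default values theta_0 = 0.8 and theta_1 = 0.2 are assumed for the example.

```python
from dataclasses import dataclass, field

PENDING, BENIGN, SCANNER = "pending", "H_0", "H_1"


@dataclass
class HostState:
    D: set = field(default_factory=set)  # distinct local hosts contacted (D_s)
    S: str = PENDING                     # decision state (S_s)
    L: float = 1.0                       # likelihood ratio (L_s)


class ThresholdRandomWalk:
    def __init__(self, theta_0=0.8, theta_1=0.2, alpha=0.01, beta=0.99):
        self.theta_0, self.theta_1 = theta_0, theta_1
        self.n_1 = beta / alpha              # upper threshold
        self.n_0 = (1 - beta) / (1 - alpha)  # lower threshold
        self.hosts = {}                      # remote host -> HostState

    def observe(self, source, dest, success):
        """Process one connection record; return the source's decision state."""
        state = self.hosts.setdefault(source, HostState())
        # Skip decided sources and destinations already seen from this source.
        if state.S != PENDING or dest in state.D:
            return state.S
        state.D.add(dest)
        if success:
            state.L *= self.theta_1 / self.theta_0
        else:
            state.L *= (1 - self.theta_1) / (1 - self.theta_0)
        if state.L >= self.n_1:
            state.S = SCANNER
        elif state.L <= self.n_0:
            state.S = BENIGN
        return state.S
```

Under these assumed parameters each failure multiplies L_s by about 4, so four failed first-contact connections in a row push it past n_1 = 99 and the source is declared a scanner.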

Results

TRW Evaluation
• Efficiency: ratio of true positives to all hosts flagged as H_1
• Effectiveness: ratio of true positives to all scanners
• N: average number of hosts probed before detection
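The two ratios can be computed from sets of host identifiers, as in the sketch below. The function name `evaluate` and the use of sets are assumptions for illustration.

```python
def evaluate(flagged, true_scanners):
    """Return (efficiency, effectiveness) for a detector.

    flagged:       hosts the detector declared H_1
    true_scanners: hosts that are scanners according to the ground truth
    """
    flagged, true_scanners = set(flagged), set(true_scanners)
    true_positives = flagged & true_scanners
    efficiency = len(true_positives) / len(flagged) if flagged else 0.0
    effectiveness = (
        len(true_positives) / len(true_scanners) if true_scanners else 0.0
    )
    return efficiency, effectiveness
```

For instance, a detector that flags four hosts, two of which are among three real scanners, has an efficiency of 0.5 and an effectiveness of about 0.67.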

TRW Evaluation (cont.)
• TRW is far more effective than the other two algorithms (Bro and Snort)
• TRW is almost as efficient as Bro
• TRW detects scanners in far less time

Potential Improvements
• Leverage additional information
  - Factor in specific services (e.g. HTTP)
  - Distinguish between unanswered and rejected connections
  - Consider how long the local host has been inactive
  - Consider the connection rate
  - Introduce correlations (e.g. 2 failures in a row are worse than 1 failure, 1 success, 1 failure)
  - Devise a model based on the history of the hosts

Improvements (cont.)
• Managing state
  - Requires a large amount of maintained state for tracking
  - However, capping the state is vulnerable to state overflow attacks
• How to respond
  - What to do when a scanner is detected?
  - Is it worth blocking?
• Evasion and gaming
  - Spoofed IPs
    · Institute "whitelists"
    · Use a honeypot to try to connect
  - Evasion (inserting legitimate connections in a scan)
    · Incorporate other information, such as a model of what is normal for legitimate users, and give less weight to connections not fitting the pattern
• Distributed scans
  - Scans originating from more than one source
  - Difficult to fix in this framework

Conclusion/Summary
• TRW: based on the ratio of failed to succeeded connections
• Sequential hypothesis testing
• Highly accurate
• Quick response