28 September 2006, ARO Kickoff Meeting, Phillip Porras
Cyber-TA: Secure Collaborative Threat Reconnaissance

Slide 1: Cyber-TA: Massive and Distributed Data Correlation
Phillip Porras, Computer Science Laboratory, SRI International
28 September 2006

Outline:
- Introduction
- Approaches to Privacy-Preserving Correlation
- A Cyber-TA Distributed Correlation Example: botHunter

Slide 2: Massive Data Correlation Group
Group: Examining strategies to collect and analyze local network events in search of large-scale attack phenomena, emerging malware threats, and stealth activity across large-scale networks.
Contributors: SRI, Yale, SANS Institute, NCSU, UC Davis, GA-Tech, and others
Perspectives:
- Massive/Passive Analysis Methods: examining large-scale data correlation strategies to apply to incoming security log data from the repository
  - "Data utility requirements" for data privacy services
  - Optimal data sources
  - New (and current) correlation strategies must address data anonymization
- Distributed Analysis Methods: distribute attack detection logic to producers, collect result abstractions, and conduct group consensus analyses
Section topics (Massive Data Correlation): Data Analysis Approaches; Stealth Threats; Massive PPDM; Shifts and Spikes; Highly Predictive Blacklists; Distributed Correlation Techniques

Slide 3: Data Analysis Approaches
Massive/Passive Analysis Methods:
- low-rate ("Stealth") pattern/sequence detection in massive data stores
- massive privacy-preserving data mining strategies (Massive PPDM)
- fast entropy-shift detection in high-volume data streams
- Highly-Predictive Blacklist (HPB) production
Distributed Analysis Methods:
- producer-side behavior-based malware correlation (botHunter v0.9)
- summary statistics, consensus attack detection, and trend analyses

Slide 4: Isolating Stealthy Actions in Massive Data Volumes
Objective:
- "Stealth" as defined in this context: seeking long-duration or short-sequence deterministic behavior patterns in massive data streams
Current Detection Methods:
- lack computational and memory efficiency in processing massive data stores
- Current coordinated attack discovery methods (e.g., attack collaboration) have not been applied in repository-scale applications
We seek data pruning techniques and optimal data attribute selections that will facilitate various deterministic behavior pattern analyses:
- Low-speed scanning, common malware communication patterns, long-duration propagation analyses, regularities in IDS log production patterns that indicate detection redundancies…
Employ massive-data analysis techniques from areas such as streaming algorithmics, very-large databases, and distributed data mining

Slide 5: Example Low-density Pattern Analyzer: Port N-Grams
Provides a basis upon which:
- Automated discovery of emerging malware scan patterns
- Local systems can be compared to global N-Gram patterns
FOUND: On days 1-3 there were sources each day that probed the following 10-port combination (all MS B.O. targets):
  0080 – Web Server
  0135 – MS DCE Locator Service (DHCP, DNS, WINS)
  0139 – MS NetBios
  0445 – MS Win2K SMB
  1025 – CAN MS LSASRV.DLL B.O
  1433 – MS SQL-Server B.O
  (port missing) – MS Bagle Virus Backdoor
  3127 – MS MyDoom Backdoor
  5000 – BioNet, Bubble, Blazer, ICKiller Backdoors
  6129 – MS Dameware Remote Admin
Table (columns: Common SRC_IP cnts, Dst_Port N-Grams; counts and some leading/trailing ports not recoverable):
  80:135:139:445:1025:1433:2745:3127:5000:
  :139:445:1025:1433:2745:3127:5000:
  :445:1025:1433:2745:3127:5000:
  :135:139:445:1025:2745:3127:5000:
Data: M connection over A 56K unused IP
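
The port N-gram tally on this slide can be illustrated with a minimal sketch. It assumes connection records reduce to (source IP, destination port) pairs and that an N-gram is the sorted set of ports a source touched; the function names and sample records are hypothetical and are not taken from the Cyber-TA tooling.

```python
from collections import Counter, defaultdict

def port_ngram(ports):
    """Canonical N-gram: the sorted set of destination ports, colon-joined."""
    return ":".join(str(p) for p in sorted(set(ports)))

def common_port_ngrams(records):
    """Count how many distinct source IPs probed each destination-port combination.

    records: iterable of (src_ip, dst_port) pairs (assumed input shape).
    Returns a Counter mapping N-gram string -> number of sources.
    """
    ports_by_src = defaultdict(set)
    for src_ip, dst_port in records:
        ports_by_src[src_ip].add(dst_port)
    return Counter(port_ngram(ports) for ports in ports_by_src.values())

# Hypothetical records using documentation-range addresses.
records = [
    ("192.0.2.1", 445), ("192.0.2.1", 135), ("192.0.2.1", 139),
    ("192.0.2.2", 139), ("192.0.2.2", 445), ("192.0.2.2", 135),
    ("192.0.2.3", 80),
]
counts = common_port_ngrams(records)
for ngram, n_sources in counts.most_common():
    print(n_sources, ngram)
```

Sorting the ports makes two sources that probe the same ports in a different order map to the same N-gram; an order-sensitive variant would keep the observed sequence instead.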

Slide 6: Massive PPDM Strategies
Current PPDM Methods:
- Peer-based shared encryption schemes (e.g., homomorphic encryption)
- Example capabilities:
  - Privacy-Preserving Set Intersection: all parties want the intersection of their private datasets revealed, without gaining or revealing non-intersecting data
  - Privacy-Preserving Set Matching: each member P_i wants to know which values in its set intersect with values of the other members' sets, without gaining or revealing non-matchers
- Solutions are traced to the 2-party case of private equality testing, among other techniques
Massive PPDM:
- PPDM in non-peer-based environments (e.g., large-scale sensor grids)
- PPDM computational scalability and lightweight key coordination schemes
Usage Concept: N coalition partners wish to compare netflow/intrusion/FW logs to find common attack sources, but have insufficient trust to openly share unrelated connection histories
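
To make the two-party set-intersection idea concrete, here is a toy sketch. It swaps in Diffie-Hellman style commutative blinding rather than the homomorphic-encryption schemes named on the slide, simulates both parties in one process, and uses a small Mersenne-prime modulus that is not suitable for real use. All names are hypothetical.

```python
import hashlib
import math
import secrets

# Toy modulus (a Mersenne prime); an assumption for illustration only.
P = 2**127 - 1

def hash_to_group(item: str) -> int:
    """Map an item to a nonzero element mod P."""
    digest = hashlib.sha256(item.encode()).digest()
    return int.from_bytes(digest, "big") % (P - 1) + 1

def new_secret() -> int:
    """Pick a secret exponent coprime to P-1 so blinding is a bijection."""
    while True:
        k = secrets.randbelow(P - 3) + 2
        if math.gcd(k, P - 1) == 1:
            return k

def blind(values, key):
    """Raise each group element to the secret exponent."""
    return [pow(v, key, P) for v in values]

def private_intersection(items_a, items_b):
    """Simulate both parties locally; returns A's items that B also holds."""
    key_a, key_b = new_secret(), new_secret()
    items_a = list(items_a)
    # A -> B: A's items blinded with A's key (order preserved).
    a_once = blind([hash_to_group(x) for x in items_a], key_a)
    # B -> A: A's values blinded again by B, plus B's own blinded items.
    a_twice = blind(a_once, key_b)
    b_once = blind([hash_to_group(y) for y in items_b], key_b)
    # A finishes blinding B's items and compares doubly blinded values.
    b_twice = set(blind(b_once, key_a))
    return {x for x, v in zip(items_a, a_twice) if v in b_twice}

# Hypothetical attack-source lists from two coalition partners.
common = private_intersection(
    ["198.51.100.7", "203.0.113.9", "192.0.2.1"],
    ["203.0.113.9", "192.0.2.55", "198.51.100.7"],
)
print(sorted(common))
```

Because exponentiation commutes, both parties arrive at the same doubly blinded value for a shared item, while items held by only one party stay hidden behind the other party's secret exponent. IP addresses are a small, guessable domain, so a production design would need stronger protections than this sketch provides.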

Slide 7: Massive Data Efficient Change/Shift Detection
Entropy: let's talk

Slide 8: Highly-Predictive Blacklisting (HPB) - Concept
S. Katti, B. Krishnamurthy, D. Katabi, "Collaborating Against Common Enemies," ACM SIGCOMM'05 Internet Measurement Conference.
- Surveyed data from 1700 DShield sensors
- Introduced Highly Collaborative Groups:
  - Relatively small membership sizes
  - Correlated attacks appear at corr_group networks within small time frames
  - Group relations are long lasting
  - Cross-group relations have small intersections
Implications: blacklist sharing among groups may yield higher relevance rates and more manageable sizes
Figure labels: Internet; Contributor Pool; Sensor Repository; Global Blacklist; Worst Offender List; New Offenders; Correlated Group Blacklist (two instances)

Slide 9: Contributor Pool Cluster Details
Clustering Logic:
- Each node corresponds to a /24 subnet.
- Different colors represent different prefixes.
- Two nodes are connected if more than 10% of the attacks that target one node also go to the other.
- The nodes within a cluster are highly connected, while there is little or no connection between nodes in different clusters.
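
The clustering rule above can be sketched as follows. This is a simplified illustration under stated assumptions: "attacks" are represented as sets of attacker IPs per /24 subnet, an edge is added when the 10% overlap holds in either direction, and clusters are taken to be connected components (the slide's clusters are described as highly connected, which is a stronger property). Function and variable names are hypothetical.

```python
from itertools import combinations

def build_clusters(attackers_by_subnet, threshold=0.10):
    """Group /24 subnets whose attacker sets overlap by more than `threshold`.

    attackers_by_subnet: dict mapping subnet label -> set of attacker IPs.
    Returns a list of clusters (sets of subnet labels), i.e. connected components.
    """
    parent = {s: s for s in attackers_by_subnet}

    def find(s):
        # Union-find root lookup with path halving.
        while parent[s] != s:
            parent[s] = parent[parent[s]]
            s = parent[s]
        return s

    for a, b in combinations(attackers_by_subnet, 2):
        seen_a, seen_b = attackers_by_subnet[a], attackers_by_subnet[b]
        shared = len(seen_a & seen_b)
        linked = (seen_a and shared / len(seen_a) > threshold) or \
                 (seen_b and shared / len(seen_b) > threshold)
        if linked:
            parent[find(a)] = find(b)

    clusters = {}
    for s in attackers_by_subnet:
        clusters.setdefault(find(s), set()).add(s)
    return list(clusters.values())

# Hypothetical observations.
observations = {
    "10.1.1.0/24": {"a", "b", "c"},
    "10.1.2.0/24": {"b", "c", "d"},
    "10.9.9.0/24": {"x", "y"},
}
clusters = build_clusters(observations)
print(sorted(sorted(c) for c in clusters))
```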

Slide 10: HPB: Example Data Assessment
Clusters are constructed using day one's alert reports.
On day one:
- attackers observed by the repository: 976,997
- attackers observed by the cluster: 10,106
On day two:
- over 50% of the attackers seen by any node in the cluster can be predicted by day one's observations from the cluster
Figure labels: Day one repository observation; Day one attack to the cluster; Day two attack
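
The "predicted by day one" figure can be read as a simple overlap ratio. The sketch below assumes that reading: the fraction of day-two attackers that the cluster had already observed on day one. The function name and sample sets are hypothetical.

```python
def prediction_rate(day_one_cluster_attackers, day_two_attackers):
    """Fraction of day-two attackers already seen by the cluster on day one."""
    day_two = set(day_two_attackers)
    if not day_two:
        return 0.0
    return len(day_two & set(day_one_cluster_attackers)) / len(day_two)

# Hypothetical attacker sets.
rate = prediction_rate({"a", "b", "c"}, {"b", "c", "d", "e"})
print(rate)
```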

Slide 11: bØtHunt3r
A behavior-based correlation framework for botnet detection

Slide 12: What is botHunter?
botHunter is a passive bot detection system, consisting of:
- Snort-based sensor suite specialized in malware-specific event detection
  - malware-specific inbound scan detection using a TRW variant
  - comprehensive remote-to-local exploit detection, emphasizing the most common methods
  - PAYL-based session anomaly detection system detecting payload exploits over key TCP protocols
  - botnet-specific egg download banners and bot registration acknowledgements
  - victim-to-C&C communication exchanges, particularly for IRC bot protocols
  - inbound-to-outbound scan monitoring system
- Cyber-TA-based plugin correlator
  - combines information from sensors to recognize bots that infect and coordinate with your internal network assets
  - submits "bot-detection profiles" to the Cyber-TA repository infrastructure
Section topics (botHunter): What is botHunter?; A Real Case Study; Behavior-based Correlation; Architectural Overview; botHunter Sensors; Correlation Framework; Example botHunter Output; Cyber-TA Integration

Slide 13: Bot infection case study: Phatbot
An example "infection lifecycle" of the Phatbot infection captured in a controlled VMWare environment:
A: Attacker, V: Victim, C: C&C Server
E1: A.* -> V.{2745, 135, 1025, 445, 3127, 6129, 139, 5000} (Bagle, DCOM2, DCOM, NETBIOS, DOOM, DW, NETBIOS, UPNP… TCP connections without content transfers)
E2: A.* -> V.135 (Windows DCE RPC exploit in payload)
E3: V.* -> A (transfer a relatively large file via a random A port specified by the exploit)
E4: V.* -> C.6668 (connect to an IRC server)
E5: V.* -> V'.{2745, 135, 1025, 445, 3127, 6129, 139, 5000} (V begins search for new infection targets, listens for future egg downloads)

Slide 14: A Behavioral-based Approach
botHunter abstracts the infection lifecycle into 5 possible stages:
E1: Inbound Scan (A-2-V)
E2: Inbound Infection (A-2-V)
E3: Egg Download (V-2-A)
E4: C&C Comms (V-2-C)
E5: Outbound Scan (V-2-*)
Figure labels: Type I; Type II
- Search for duplex communication sequences that are indicative of the infection-coordination-infection lifecycle
- Under a weighted correlation scheme, external stimulus alone is not enough to declare a bot
- Stimulus does not require strict ordering, but does require temporal locality

Slide 15: Botnets: Architecture Overview
Diagram components:
- Span Port to Ethernet Device
- Snort, hosting three sensors:
  - botHunter Ruleset Signature Engine (e2: Exploits; e3: Egg Downloads; e4: C&C Traffic)
  - SCADE, spp_scade.c|h (e1: Inbound Malware Scans; e5: Outbound Scans)
  - SLADE, spp_slade.c|h (e2: Payload Anomalies)
- botHunter Correlator (Java), with bothunter.config and bothunter.XML
- CTA Anonymizer Plugin
System Requirements: Snort; OS: Linux, MacOS, Win, FreeBSD, Solaris; Java
Bot Infection Profile:
- Confidence Score
- Victim IP
- Attacker IP List (by confidence)
- Coordination Center IP (by confidence)
- Full Evidence Trail: Sigs, Scores, Ports
- Infection Time Range

Slide 16: botHunter Sensor Suite: SCADE
SCADE: ./snort-2.6.0/src/preprocessors/spp_scade.c
Custom malware-specific weighted scan detection system for inbound and outbound sources
Inbound (E1: Initial Scan Phase):
- suspicious port scan weighted TRW score:
$$\sum_i w_i \log \frac{p(\mathrm{ContactPort}_i \mid \mathrm{bot})}{p(\mathrm{ContactPort}_i \mid \neg\mathrm{bot})}$$
  - failed connection to vulnerable port = high weight
  - failed connection to other port = median weight
  - successful connection to vulnerable port = low weight
Outbound (E5: Victim Outbound Scan):
- S1: Scan rate of V over time t
- S2: Scan failed connection rate of V over t
- S3: Scan target entropy (low revisit rate implies bot search) over t
- A majority voting scheme combines the model assessments
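
The two SCADE scoring ideas can be sketched in a few lines. This is an illustration under assumptions, not the spp_scade.c logic: the numeric weights, the probability tables, the event-type names, and the per-model thresholds are all hypothetical, and the score is computed over event categories rather than individual ports.

```python
import math

# Hypothetical weights that only preserve the slide's high / median / low ordering.
WEIGHTS = {"failed_vulnerable": 3.0, "failed_other": 2.0, "success_vulnerable": 1.0}

def inbound_scan_score(events, p_bot, p_benign):
    """Weighted log-likelihood ratio summed over observed connection events.

    events: list of event-type strings (keys of WEIGHTS).
    p_bot / p_benign: dicts giving p(event | bot) and p(event | not bot).
    """
    return sum(WEIGHTS[e] * math.log(p_bot[e] / p_benign[e]) for e in events)

def outbound_majority_vote(scan_rate, fail_rate, target_entropy, thresholds):
    """Flag an outbound scan when at least two of the three models (S1-S3) fire."""
    votes = [
        scan_rate > thresholds["scan_rate"],
        fail_rate > thresholds["fail_rate"],
        target_entropy > thresholds["target_entropy"],
    ]
    return sum(votes) >= 2

# Hypothetical probabilities: every event type is twice as likely under "bot".
p_bot = {k: 0.4 for k in WEIGHTS}
p_benign = {k: 0.2 for k in WEIGHTS}
print(inbound_scan_score(["failed_vulnerable", "failed_other"], p_bot, p_benign))

limits = {"scan_rate": 10.0, "fail_rate": 0.5, "target_entropy": 3.0}
print(outbound_majority_vote(25.0, 0.8, 1.0, limits))
```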

Slide 17: botHunter Sensor Suite: SLADE
SLADE: ./snort-2.6.0/src/preprocessors/spp_slade.c
- Suspicious payload detection: modified PAYL 3-gram byte distribution analyzer over a limited set of network services
- Implements a lossy data structure to capture the 3-gram hash space: default vector size = [missing] (versus the full space for n=3: 2^24 ≈ 16M)
- Current SLADE port set: 21, 53, 80, 135, 1025, 445 TCP
- Auto-transition from train to detect mode: enabled
- Current Status: in development to enable per-port auto-threshold selection
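
A lossy 3-gram structure of the kind described above might look like the following sketch. It is a deliberate simplification: it records only which hashed buckets were seen during training, whereas PAYL models byte-frequency distributions. The vector size, the CRC32 hash, and the class name are assumptions, since the slide's default size is not recoverable.

```python
import zlib

VECTOR_SIZE = 4096  # hypothetical; far smaller than the full 2**24 3-gram space

def gram_buckets(payload: bytes, n: int = 3):
    """Hash every n-byte window of the payload into a bucket index."""
    return [zlib.crc32(payload[i:i + n]) % VECTOR_SIZE
            for i in range(len(payload) - n + 1)]

class LossyGramProfile:
    """Fixed-size record of which hashed 3-gram buckets appeared in training."""

    def __init__(self):
        self.seen = bytearray(VECTOR_SIZE)  # 1 = bucket observed in training

    def train(self, payload: bytes):
        for b in gram_buckets(payload):
            self.seen[b] = 1

    def anomaly_score(self, payload: bytes) -> float:
        """Fraction of the payload's 3-grams whose bucket was never seen in training."""
        buckets = gram_buckets(payload)
        if not buckets:
            return 0.0
        unseen = sum(1 for b in buckets if not self.seen[b])
        return unseen / len(buckets)

# Hypothetical training payload.
normal = b"GET /index.html HTTP/1.0\r\n"
profile = LossyGramProfile()
profile.train(normal)
print(profile.anomaly_score(normal))
```

Hash collisions are what make the structure lossy: distinct 3-grams can share a bucket, so an unusual payload may occasionally score lower than it would against the full 3-gram space.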

Slide 18: botHunter Sensor Suite: Signature Engine
botHunter Signature Set:
- Replaces all standard Snort rules with five custom rulesets: e[1-5].rules
- Scope: known worm/bot exploit general traffic signatures, shell/code/script exploits, update/download/registered rules, C&C command exchanges, outbound scans, and malware exploits
- Rule sources:
  - Bleeding Edge malware rulesets
  - Snort Community Rules
  - Snort Registered Free Set
  - Cyber-TA custom bot-specific rules
- Current Set: 237 rules, operating on SRI/CSL and GA-Tech networks, with a relatively low false positive rate

Slide 19: botHunter - Correlation Framework
Bot-State Correlation Data Structure (table columns: VictimIP | E1 | E2 | E3 | E4 | E5 | Score)
- Rows: valid internal Home_Net IPs
- Columns: bot infection stages
- Entry: IP addresses that contributed alerts to the E-column
- Score Column: cumulative score per row
- Threshold: (row_score >= threshold) declares a bot
- InitTime Triggers: events that initiate the pruning timer
- Pruning Timer: seconds remaining until a row is reinitialized
Characteristics of Bot Declarations:
- states are triggered in any order, but the pruning timer reinitializes row state once an InitTime Trigger is activated
- external stimulus alone cannot trigger a bot alert
- 2 x internal bot behavior triggers a bot alert
- When a bot alert is declared, IP addresses are assigned responsibility based on raw contribution
Defaults:
- E1: Inbound scan detected, weight = .25
- E2: Inbound exploit detected, weight = .25
- E3: Egg download detected, weight = .50
- E4: C&C channel detected, weight = .50
- E5: Outbound scan detected, weight = .50
- Threshold = 1.0
- Pruning Interval = 120 seconds
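
The row scoring and pruning behavior can be sketched with the slide's default weights. This is a minimal illustration, not the botHunter correlator: it assumes each stage counts once per row, that the first alert on a row acts as the InitTime trigger, and that a bot is declared when the score reaches the threshold (consistent with two internal behaviors, 0.50 + 0.50, triggering an alert). Class and method names are hypothetical.

```python
WEIGHTS = {"E1": 0.25, "E2": 0.25, "E3": 0.50, "E4": 0.50, "E5": 0.50}
THRESHOLD = 1.0
PRUNING_INTERVAL = 120.0  # seconds

class BotStateTable:
    """One row per internal (Home_Net) IP; one column per infection stage."""

    def __init__(self):
        # victim_ip -> {"start": float, "stages": {stage: set of source IPs}}
        self.rows = {}

    def record(self, victim_ip, stage, source_ip, now):
        """Add an alert to the victim's row; return True if a bot is declared."""
        row = self.rows.get(victim_ip)
        # Reinitialize the row once its pruning timer has expired.
        if row is None or now - row["start"] > PRUNING_INTERVAL:
            row = {"start": now, "stages": {}}
            self.rows[victim_ip] = row
        row["stages"].setdefault(stage, set()).add(source_ip)
        return self.score(victim_ip) >= THRESHOLD

    def score(self, victim_ip):
        """Cumulative row score: each triggered stage counts once."""
        row = self.rows.get(victim_ip)
        if row is None:
            return 0.0
        return sum(WEIGHTS[s] for s in row["stages"])

# Hypothetical alert stream for one internal host.
table = BotStateTable()
print(table.record("10.0.0.5", "E1", "198.51.100.7", now=0.0))
print(table.record("10.0.0.5", "E2", "198.51.100.7", now=1.0))
print(table.record("10.0.0.5", "E4", "203.0.113.9", now=20.0))
```

With these weights, E1 and E2 together reach only 0.50, so inbound (external) stimulus alone never crosses the threshold, while any two of E3, E4, and E5 do.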

Slide 20: Implementation Status and Example Output
Example VMWare Phatbot Experiment
  Coordination Center:
  Initial Bot Infector:
  Victim System:
Output:
  % ./Run_botHunter.csh -c ./config/phatbot.config
  Starting program...
  Score: 1.5 (>= 1.0)
  Infect Target:
  Infector List:
  C & C List: (25), (3)
  Start: 06/22/ :42:23.33 PDT
  Report End: 06/22/ :44:38.54 PDT
  INBOUND SCAN
    (16:42:23 PDT) E1 scade detected host [ ] scanned by [ ] at ports [ ]
  EXPLOIT (2)
    (16:42:24.67 PDT) E2 SHELLCODE x86 NOOP 135<-4819
    (16:42:24.67 PDT) E2 SHELLCODE x86 0x90 unicode NOOP 135<-4819
  EGG DOWNLOAD
  C and C TRAFFIC (25)
    (16:42:41.34 PDT-16:43:31.20 PDT) E4 COMMUNITY BOT Internal IRC server detected
    E4 BLEEDING-EDGE TROJAN BOT - potential scan/exploit command 1037<-6668
    E4 COMMUNITY BOT GTBot scan command 1037<-6668
  OUTBOUND SCAN
    (16:43:46.85 PDT) E5 scade detected suspicious scanner [ ] scanning 30 IPs at ports [0 2745]

Slide 21: botHunter - born a Cyber-TA plugin
Diagram labels:
- Producer side: Snort Alerts; botHunter Correlator (Java), with bothunter.config and bothunter.XML; CTA Anonymizer Plugin; CTA Anonymizer
- Encrypted Anonymous Log Delivery Protocol: Deliver Daemon; Delivery Ack; TLS Session; TOR Circuit; TCP/IP; MIXNET
- Cyber-TA Threat Ops Center: Anonymization Service; Cyber-TA RDBMS Manager; Delivery Ack; TLS Session; TOR Circuit; TCP/IP; "Bot Profile" Repository

Slide 22: END