Network Asset Discovery & Tracking Vern Paxson University of California Berkeley, California USA August 23, 2010.

Overview
Grounding asset discovery in reality: empirical enterprise data
  – Acquired extensive data from operational environments
Supporting asset discovery and tracking with capture/archive technology
  – VAST = Visibility Across Time and Space
  – Enhancing “time machine” technology towards operational use
Exploration of asset discovery algorithms
  – Mining for unique signatures & clusters

Access To Empirical Enterprise Data
Leveraging ties with operational cybersecurity at Lawrence Berkeley National Lab (LBL), we obtained access to extensive raw internal logs
  – ~4,000 users, ~12,000 internal hosts, Gbps/10Gbps
  – Archive resides beyond OTP portal
  – Exportable to team members we work with, using negotiated anonymization
  – Can also mediate access by running analyses via the portal
  – Ground truth (or at least partial) available
  – Topology, historical DNS also available

Scope of the Data
Netflow: 74B records across 15 months
  – Recorded at 3 internal core routers
  – 5-minute dumps
  – ~1K flows/sec
LDAP: 4.5 years, 5.6B records
DNS: 5 years, 47B records
5 years, 17B records
  – Received, sent, read via {POP,IMAP,HTTP}
DHCP: 2 months, 144M records
Individual systems: 2 months, 1.6B records
Logs are a pain to deal with: written in many distinct formats, meant for human, not machine, consumption

VAST: Motivating Premise
Modern serious attacks often manifest
  – Over a range of time scales
  – Involving numerous system components
Serious =
  – E.g. stolen credentials
  – E.g. insiders, spear-phishers
Detecting these requires broad visibility
  – Across time (into the past; looking to the future)
  – Across space (different forms of sensing; inter-site)

A General Network Time Machine
Policy-neutral data
Uniform data model
VAST Repository
For assets:
  – Extensive uniform logging of activity for mining/discovery
  – Unified asset tracking using general data model

VAST DB System Architecture
[Architecture diagram; labels in extraction order: Event Streams; Dispatcher; Query Engine; Operator; Event Data; Index; Aggr.; Archive; Stream Query Engine; Live Analysis]

Exploring Longitudinal Patterns of Enterprise Activity
Visualization of internal DNS lookups of internal LBL hosts
  – Based on longitudinal DNS logs
X axis: position in LBL address space
Y axis: scaled to number of lookups
(Demo)

Preliminary Exploration of Netflow Data
Single day from LBL
  – 9,702 source hosts, 11,362 destinations
Removed internal scanners
Very simple clustering: Jaccard index on each host’s destinations
  – Note: doesn’t mean host was client
Initial crunch took ~24 CPU hours
  – Coded in Scala, 15 minutes on 17-node cluster
For exact matches, 91% of hosts unique
Remainder exhibit ~ power-law structure
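The clustering step on this slide can be illustrated with a minimal Python sketch. It is not the original Scala code: the function names and the toy flow records below are hypothetical, and treating two empty destination sets as identical is an assumption. It builds each source host's destination set, computes the Jaccard index between two hosts, and groups hosts whose destination sets match exactly.

```python
from collections import defaultdict

def jaccard(a, b):
    """Jaccard index |a & b| / |a | b|; two empty sets are treated as identical (assumption)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def destination_sets(flows):
    """Map each source host to the set of destinations it appears with."""
    dests = defaultdict(set)
    for src, dst in flows:
        dests[src].add(dst)
    return dict(dests)

def exact_match_groups(dests):
    """Group hosts whose destination sets are identical (Jaccard index of 1)."""
    groups = defaultdict(list)
    for host, dset in dests.items():
        groups[frozenset(dset)].append(host)
    return sorted(sorted(g) for g in groups.values())

# Hypothetical toy flows as (source, destination) pairs; not LBL data.
flows = [
    ("h1", "dns"), ("h1", "mail"),
    ("h2", "dns"), ("h2", "mail"),
    ("h3", "dns"), ("h3", "web"),
]
dests = destination_sets(flows)
groups = exact_match_groups(dests)
unique_hosts = [g[0] for g in groups if len(g) == 1]
```

In this toy input h1 and h2 share an identical destination set while h3 is unique. Grouping by `frozenset` finds the exact matches in one pass; comparing hosts with partial overlap requires pairwise `jaccard` calls, whose cost grows quadratically with the number of hosts.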

Preliminary Exploration of Connection Patterns
To what degree does a host’s past activity suffice to distinguish its future activity?
  – Use #1: find hosts that significantly alter their behavior
      E.g., due to failure/failover
  – Use #2: track assets / disambiguate NAT/DHCP aliasing
  – Use #3: understand what makes a host unique (~ “role discovery”) / find similar hosts
Outbound traffic data set: 402 non-NATed source hosts
  – 1,528,619 distinct destinations
  – 168 days
Outbound HTTP data set: 160 non-NATed source hosts
  – 62,031 distinct HTTP host header destinations
  – 137 days

Fingerprinting End Systems, cont’d
So far, two assessments:
  – A: train first 10 days, evaluate on next 10 days
  – B: train first 30 days, evaluate on next 30 days
Classification approach #1: Naïve Bayes
  – Use destinations as symbols for bag-of-words
  – P[Correct system in scenario A]: 53%
  – P[Correct system in scenario B]: 53%
However: in failure instances, often the correct system is near the top …
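A rough Python sketch of the bag-of-words idea follows, under several assumptions the slides do not state: a multinomial model, add-one smoothing, a uniform prior over hosts, and hypothetical host and destination names. Each host's training-window destinations form its "document"; an unlabeled evaluation-window trace is scored against every host, and the hosts are returned ranked, so a near miss shows up as the correct host sitting close to the top.

```python
import math
from collections import Counter

def train(history):
    """history: host -> list of destinations seen in the training window."""
    vocab = {d for dests in history.values() for d in dests}
    counts = {host: Counter(dests) for host, dests in history.items()}
    return counts, vocab

def log_likelihood(counter, vocab_size, observed, alpha=1.0):
    """Multinomial log-likelihood of the observed destinations under one host's model.

    Add-alpha smoothing; the extra +1 slot stands in for destinations never
    seen in training (a modelling assumption, not taken from the slides).
    """
    total = sum(counter.values()) + alpha * (vocab_size + 1)
    return sum(math.log((counter[d] + alpha) / total) for d in observed)

def rank_hosts(model, observed):
    """Return hosts ordered from most to least likely, assuming a uniform prior."""
    counts, vocab = model
    scores = {h: log_likelihood(c, len(vocab), observed) for h, c in counts.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical toy data; not LBL data.
history = {
    "hostA": ["update.example", "mail.example", "mail.example", "news.example"],
    "hostB": ["cdn.example", "cdn.example", "mail.example"],
}
observed = ["mail.example", "news.example", "mail.example"]
ranking = rank_hosts(train(history), observed)
```

Here `ranking[0]` is the predicted host. Because the model scores each destination independently, it cannot reward a distinctive combination of destinations.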

Fingerprinting End Systems, cont’d
Classification approach #2: Jaccard index
  – Destinations weighted by their relative rarity
  – P[Correct for A]: 77%
  – P[Correct for B]: 70%
  – Benefit in considering constellations of destinations rather than just individual destinations in isolation
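The slides do not say how rarity was turned into a weight, so the Python sketch below assumes an IDF-style weight, log(1 + N / n_d), where n_d of the N training hosts contacted destination d. The host names, destination names, and default weight for unseen destinations are likewise hypothetical. The weighted Jaccard index is the weight of the shared destinations divided by the weight of all destinations in either set.

```python
import math

def rarity_weights(train_sets):
    """Weight each destination by log(1 + N / n_d); rarer destinations weigh more (assumed scheme)."""
    n_hosts = len(train_sets)
    host_counts = {}
    for dests in train_sets.values():
        for d in dests:
            host_counts[d] = host_counts.get(d, 0) + 1
    return {d: math.log(1 + n_hosts / n) for d, n in host_counts.items()}

def weighted_jaccard(a, b, weights, default=1.0):
    """Sum of weights over the intersection divided by the sum over the union."""
    union = a | b
    if not union:
        return 0.0
    shared = sum(weights.get(d, default) for d in a & b)
    total = sum(weights.get(d, default) for d in union)
    return shared / total

def best_match(train_sets, observed, weights):
    """Return the training host whose destination set is most similar to the observed set."""
    return max(train_sets, key=lambda h: weighted_jaccard(train_sets[h], observed, weights))

# Hypothetical toy data; not LBL data.
train_sets = {
    "hostA": {"dns", "mail", "rare-a"},
    "hostB": {"dns", "mail", "web", "rare-b"},
    "hostC": {"dns", "web"},
}
weights = rarity_weights(train_sets)
observed = {"dns", "rare-b"}
match = best_match(train_sets, observed, weights)
score = weighted_jaccard(train_sets[match], observed, weights)
```

Unlike the per-destination scoring of Naïve Bayes, this measure compares whole destination sets, so a host is matched on its overall constellation of destinations, with rare ones contributing the most.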

Next Steps
Begin navigating huge LBL logs to determine
  – Extent of information available
  – Efficient & sound ways to sample/slice data
  – Low-hanging fruit for asset identification
Work towards operational VAST deployment to gather future such data in a unified/coherent fashion
Refine clustering techniques towards identifying sets of servers, including backups
Develop/refine fingerprinting techniques for asset tracking