George Varghese (based on Cristi Estan’s work) University of California, San Diego May 2011 Internet traffic measurement: from packets to insight.

Presentation transcript:


Research motivation
- The Internet in 1969. Problems: flexibility, speed, scalability. Measurement & control: ad-hoc solutions suffice.
- The Internet today. Problems: overloads, attacks, failures. Measurement & control: engineered solutions needed.
Research direction: towards a theoretical foundation for systems doing engineered measurement of the Internet

Current solutions
[Diagram labels: Network, Fast link, Router, Memory, Raw data, Analysis Server, Traffic reports, Network Operator]
State of the art: simple counters (SNMP), time series plots of traffic (MRTG), sampled packet headers (NetFlow), top k reports
Concise? Accurate?

Measurement challenges
- Data reduction – performance constraints
  - Memory (Terabytes of data each hour)
  - Link speeds (40 Gbps links)
  - Processing (8 ns to process a packet)
- Data analysis – unpredictability
  - Unconstrained service model (e.g. Napster, Kazaa)
  - Unscrupulous agents (e.g. Slammer worm)
  - Uncontrolled growth (e.g. user growth)

Main contributions
- Data reduction: algorithmic solutions for measurement building blocks
  - Identifying heavy hitters (part 1 of talk)
  - Counting flows or distinct addresses
- Data analysis: traffic cluster analysis automatically finds the dominant modes of network usage (part 2 of talk)
  - AutoFocus traffic analysis system used by hundreds of network administrators

Identifying heavy hitters
[Diagram labels: Network, Fast link, Router, Memory, Raw data, Analysis Server, Traffic reports, Network Operator]
Identifying heavy hitters with multistage filters

Why are heavy hitters important?
- Network monitoring: current tools report top applications, top senders/receivers of traffic
- Security: malicious activities such as worms and flooding DoS attacks generate much traffic
- Capacity planning: largest elements of traffic matrix determine network growth trends
- Accounting: usage based billing most important for most active customers

Problem definition
Identify and measure all streams whose traffic exceeds a threshold (0.1% of link capacity) over a certain time interval (1 minute)
- Streams defined by fields (e.g. destination IP)
- Single pass over packets
- Small worst case per packet processing
- Small memory usage
- Few false positives / false negatives

Measuring the heavy hitters
- Unscalable solution: keep a hash table with a counter for each stream and report the largest entries
- Inaccurate solution: count only sampled packets and compensate in analysis
- Ideal solution: count all packets, but only for the heavy hitters
- Our solution: identify heavy hitters on the fly
  - Fundamental advantage over sampling: relative error proportional to 1/M instead of 1/sqrt(M) (M is available memory)

Why is sample & hold better?
[Figure: uncertainty under sample and hold versus ordinary sampling]
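The contrast with ordinary sampling can be illustrated with a minimal Python sketch of sample and hold. It assumes the usual formulation: each byte is sampled with a small probability, and once a stream has an entry, every later packet of that stream is counted in full. The function name, the parameter names and the toy traffic are hypothetical and not taken from the slides.

```python
import random

def sample_and_hold(packets, byte_prob, rng):
    """Return {stream_id: bytes counted} for an iterable of (stream_id, size).

    byte_prob is the (assumed) per-byte sampling probability.
    """
    table = {}
    for stream, size in packets:
        if stream in table:
            # Hold: once a stream has an entry, count all of its bytes.
            table[stream] += size
        else:
            # Sample: probability that at least one byte of this packet is picked.
            p_packet = 1.0 - (1.0 - byte_prob) ** size
            if rng.random() < p_packet:
                table[stream] = size
    return table

# Toy traffic: one large stream and many one-packet streams (made-up numbers).
traffic = [("elephant", 1500)] * 2000 + [(f"mouse{i}", 40) for i in range(2000)]
random.Random(7).shuffle(traffic)
counts = sample_and_hold(traffic, byte_prob=1e-5, rng=random.Random(7))
print(len(counts), counts.get("elephant"))
```

The only bytes missed for a held stream are those sent before its entry was created, which is why the uncertainty is smaller than when every packet is subject to sampling.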

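How do multistage filters work?
[Figure: a packet of the pink stream is hashed, Hash(Pink), into an array of counters]

How do multistage filters work?
[Figure: collisions are OK]

How do multistage filters work?
[Figure: a counter reached the threshold, so the stream is inserted into stream memory; stream memory holds stream1 (1) and stream2 (1)]

How do multistage filters work?
[Figure: Stage 1 and Stage 2 counter arrays; stream memory holds stream1 (1)]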

Conservative update
[Figure: gray = all prior packets]

Conservative update
[Figure label: Redundant]

Conservative update
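The figures for these slides did not survive, so here is a minimal Python sketch of a parallel multistage filter with conservative update, under stated assumptions. Each stage hashes the stream to one counter. A stream enters stream memory once all of its counters reach the threshold. Conservative update raises each counter only as far as the smallest counter plus the packet size. Streams already in stream memory bypass the filter. The class name, the BLAKE2 hash and the choice to start a new entry at the current packet size are illustrative assumptions.

```python
import hashlib

class MultistageFilter:
    def __init__(self, stages, buckets, threshold):
        self.stages = stages
        self.buckets = buckets
        self.threshold = threshold
        self.counters = [[0] * buckets for _ in range(stages)]
        self.stream_memory = {}  # exact counters for streams that passed

    def _bucket(self, stage, stream):
        # One hash function per stage, derived by salting with the stage number.
        data = f"{stage}:{stream}".encode()
        digest = hashlib.blake2b(data, digest_size=8).digest()
        return int.from_bytes(digest, "big") % self.buckets

    def add(self, stream, size):
        if stream in self.stream_memory:
            # Already tracked exactly; do not touch the filter counters.
            self.stream_memory[stream] += size
            return
        idx = [self._bucket(s, stream) for s in range(self.stages)]
        smallest = min(self.counters[s][idx[s]] for s in range(self.stages))
        new_min = smallest + size
        for s in range(self.stages):
            # Conservative update: never raise a counter above what the
            # smallest counter proves the stream could have sent.
            self.counters[s][idx[s]] = max(self.counters[s][idx[s]], new_min)
        if new_min >= self.threshold:
            # All counters are now at or above the threshold.
            self.stream_memory[stream] = size

flt = MultistageFilter(stages=4, buckets=1000, threshold=100)
for i in range(50):
    flt.add(f"mouse{i}", 1)
for _ in range(150):
    flt.add("big", 1)
print(sorted(flt.stream_memory))
```

A stream that sends at least the threshold always passes, because its smallest counter grows by the packet size on every packet, so the filter has no false negatives.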

Multistage filter analysis
Question: find the probability that a small stream (0.1% of traffic) passes a filter with d = 4 stages × b = 1,000 counters and threshold T = 1%
Analysis (any stream distribution and packet order):
- The stream can pass a stage only if the other streams in its bucket carry ≥ 0.9% of traffic
- At most 111 such buckets in a stage => probability of passing one stage ≤ 111/1,000 = 11.1%
- Probability of passing all 4 stages ≤ 0.111^4 = 0.015%
- Result tight
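The arithmetic of this bound can be checked with a few lines of Python. The function and parameter names are hypothetical, and shares are expressed as fractions of total traffic.

```python
import math

def pass_probability_bound(stream_share, threshold_share, buckets, stages):
    """Worst-case probability that a small stream passes every stage."""
    needed = threshold_share - stream_share  # share the other streams must add
    if needed <= 0:
        return 1.0  # the stream reaches the threshold on its own
    # Buckets that can hold at least `needed` of the other traffic.
    max_full_buckets = math.floor(1.0 / needed)
    per_stage = min(1.0, max_full_buckets / buckets)
    return per_stage ** stages

bound = pass_probability_bound(0.001, 0.01, buckets=1000, stages=4)
print(f"{bound:.4%}")
```

With the slide's numbers the per-stage bound is 111/1,000 and the four-stage bound is 0.111^4, which is the 0.015% quoted above.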

Multistage filter analysis results
Notation:
- d: filter stages
- T: threshold
- h = C/T (C: capacity)
- k = b/h (b: buckets)
- n: number of streams
- M: total memory
Results are stated for three quantities: probability to pass filter, streams passing, relative error

Bounds versus actual filtering
[Figure: average probability of passing the filter for small streams (log scale) versus number of stages; curves: worst case bound, Zipf bound, actual, conservative update]

Comparing to current solution
Trace: 2.6 Gbps link, 43,000 streams in 5 seconds
Multistage filters: 1 Mbit of SRAM (4096 entries)
Sampling: p = 1/16, unlimited DRAM
Average absolute error / average stream size:
Stream size | Multistage filters | Sampling
s > 0.1% | 0.01% | 5.72%
0.1% ≥ s > 0.01% | 0.95% | 20.8%
0.01% ≥ s > 0.001% | 39.9% | 46.6%

Summary for heavy hitters
- Heavy hitters important for measurement processes
- More accurate results than random sampling: relative error proportional to 1/M instead of 1/sqrt(M)
- Multistage filters with conservative update outperform theoretical bounds
- Prototype implemented at 10 Gbps ?

Building block 2, counting streams
Core idea
- Hash streams to bitmap and count bits set
- Sample bitmap to save memory and scale
- Multiple scaling factors to cover wide ranges
Result
- Can count up to 100 million streams with an average error of 1% using 2 Kbytes of memory
[Figure: bitmaps at several scaling factors, each labeled with the range it is accurate for, e.g. 0-7 streams, 8-15 streams]

Bitmap counting
- Hash based on flow identifier
- Estimate based on the number of bits set
- Problem: does not work if there are too many flows
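A minimal Python sketch of this basic bitmap follows. It assumes the standard linear-counting estimator, n ≈ b · ln(b / z), where b is the bitmap size and z the number of bits still zero. The slide only says the estimate is based on the number of bits set, so the exact estimator, the BLAKE2 hash and all names here are assumptions.

```python
import hashlib
import math

def bit_index(flow_id, bitmap_bits):
    # Hash the flow identifier to a bit position (hash choice is illustrative).
    digest = hashlib.blake2b(str(flow_id).encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big") % bitmap_bits

def estimate_flows(flow_ids, bitmap_bits):
    """Estimate the number of distinct flows among the packets' flow ids."""
    bitmap = [False] * bitmap_bits
    for flow_id in flow_ids:
        bitmap[bit_index(flow_id, bitmap_bits)] = True
    zeros = bitmap.count(False)
    if zeros == 0:
        return float("inf")  # bitmap saturated: too many flows for this size
    return bitmap_bits * math.log(bitmap_bits / zeros)

# 1,000 flows with 5 packets each; repeated packets set the same bit.
packets = [f"flow{i}" for i in range(1000)] * 5
print(round(estimate_flows(packets, bitmap_bits=8192)))
```

The saturated case is the "too many flows" problem named on the slide: once every bit is set, the bitmap carries no information about the count.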

Bitmap counting
- Fix: increase bitmap size
- Problem: bitmap takes too much memory

Bitmap counting
- Fix: store only a sample of the bitmap and extrapolate
- Problem: too inaccurate if there are few flows
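Storing only a sample of the bitmap can be sketched as follows, assuming that flows hash into a space `sampling_factor` times larger than the stored bitmap, that only hashes landing in the stored slice set a bit, and that the linear-counting estimate is scaled back up by the same factor. The names and the hash function are hypothetical.

```python
import hashlib
import math

def flow_hash(flow_id):
    digest = hashlib.blake2b(str(flow_id).encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")

def estimate_flows_sampled(flow_ids, bitmap_bits, sampling_factor):
    """Estimate distinct flows while storing 1/sampling_factor of the bitmap."""
    hash_space = bitmap_bits * sampling_factor
    bitmap = [False] * bitmap_bits
    for flow_id in flow_ids:
        position = flow_hash(flow_id) % hash_space
        if position < bitmap_bits:  # only this slice of the space is stored
            bitmap[position] = True
    zeros = bitmap.count(False)
    if zeros == 0:
        return float("inf")  # stored slice saturated
    # Estimate the flows that hit the slice, then extrapolate.
    return sampling_factor * bitmap_bits * math.log(bitmap_bits / zeros)

flows = [f"flow{i}" for i in range(20000)]
print(round(estimate_flows_sampled(flows, bitmap_bits=1024, sampling_factor=16)))
```

With few flows, only a handful land in the stored slice, so the extrapolated estimate becomes noisy, which is the problem the slide names.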

Bitmap counting
- Fix: use multiple bitmaps, each accurate over a different range
- Problem: must update multiple bitmaps for each packet
[Figure: each bitmap labeled with the range of flow counts over which it is accurate]

Bitmap counting
[Figure: multiresolution bitmap; surviving label: 0-32]

Future work

Traffic cluster analysis
[Diagram labels: Network, Fast link, Router, Memory, Raw data, Analysis Server, Traffic reports, Network Operator]
Part 1: Identifying heavy hitters, counting streams
Part 2: Describing traffic with traffic cluster analysis

Finding heavy hitters not enough
Rank | Destination IP | Traffic
1 | jeff.dorm.bigU.edu | 11.9%
2 | lisa.dorm.bigU.edu | 3.12%
3 | risc.cs.bigU.edu | 2.83%
Most traffic goes to the dorms …
Rank | Dest. network | Traffic
1 | library.bigU.edu | 27.5%
2 | cs.bigU.edu | 18.1%
3 | dorm.bigU.edu | 17.8%
Where does the traffic come from? …
Rank | Source IP | Traffic
1 | forms.irs.gov | 13.4%
2 | ftp.debian.org | 5.78%
3 | www.cnn.com | 3.25%
Rank | Source network | Traffic
1 | att.com | 25.4%
2 | yahoo.com | 15.8%
3 | badU.edu | 12.2%
What apps are used?
Rank | Application | Traffic
1 | web | 42.1%
2 | ICMP | 12.5%
3 | kazaa | 11.5%
Which network uses web and which one kazaa?
Aggregating on individual fields is useful, but
- Traffic reports often not at right granularity
- Cannot show aggregates over multiple fields
Traffic analysis tool should automatically find aggregates over right fields at right granularity

Ideal traffic report
Traffic aggregate | Traffic | Annotation
Web traffic | 42.1% | Web is the dominant application
Web traffic to library.bigU.edu | 26.7% | The library is a heavy user of web
Web traffic from forms.irs.gov | 13.4% | That’s a big flash crowd!
ICMP from sloppynet.badU.edu to jeff.dorm.bigU.edu | 11.9% | This is a Denial of Service attack!!
Traffic cluster reports try to give insights into the structure of the traffic mix

Definition
A traffic report gives the size of all traffic clusters above a threshold T and is:
- Multidimensional: clusters defined by ranges from natural hierarchy for each field
- Compressed: omits clusters whose traffic is within error T of more specific clusters in the report
- Prioritized: clusters have unexpectedness labels

Unidimensional report example
Threshold = 100
[Figure: hierarchy of IP prefixes down to /30 with per-prefix traffic; labeled subtrees: AI Lab, 2nd floor, CS Dept]

Unidimensional report example
[Figure: compression of the prefix hierarchy, with nodes marked < 100 or ≥ 100, and the resulting report table with columns Source IP and Traffic]
Rule: omit clusters with traffic within error T of more specific clusters in the report
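Since the numbers in the example figure did not survive, the compression rule is illustrated here with a small Python sketch on a made-up prefix tree. It is one reading of the rule, not necessarily how AutoFocus implements it. The tree is walked bottom-up, and a cluster is reported only if its traffic exceeds the traffic already explained by reported, more specific clusters by at least the threshold. The prefixes, traffic values and function name are hypothetical.

```python
def compress(node, threshold):
    """Return (reported clusters, traffic explained by them) for a subtree.

    node is (name, traffic, children); traffic includes all descendants.
    """
    name, traffic, children = node
    reported = []
    explained = 0
    for child in children:
        child_reported, child_explained = compress(child, threshold)
        reported.extend(child_reported)
        explained += child_explained
    # Report this cluster only if it adds at least `threshold` of traffic
    # beyond what the more specific reported clusters already account for.
    if traffic - explained >= threshold:
        reported.append((name, traffic))
        explained = traffic
    return reported, explained

# Hypothetical hierarchy (made-up prefixes and traffic values).
tree = ("10.0.0.0/28", 500, [
    ("10.0.0.0/29", 380, [
        ("10.0.0.0/30", 305, [
            ("10.0.0.2/32", 270, []),
            ("10.0.0.3/32", 35, []),
        ]),
        ("10.0.0.4/30", 75, []),
    ]),
    ("10.0.0.8/29", 120, [
        ("10.0.0.8/30", 60, []),
        ("10.0.0.12/30", 60, []),
    ]),
])

report, _ = compress(tree, threshold=100)
for prefix, traffic in report:
    print(prefix, traffic)
```

In this toy tree 10.0.0.0/30 is above the threshold but omitted, because all but 35 of its 305 units come from the already reported host 10.0.0.2/32. The root is omitted for the same reason.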

Multidimensional structure
[Figure: cluster hierarchies per field. Source net: All traffic; US, EU; CA, NY, FR, RU. Application: All traffic; Web, Mail. Combined nodes shown: All traffic, EU, RU, Mail, RU Mail, RU Web]

AutoFocus: system structure
[Diagram labels: Packet header trace / NetFlow data, Traffic parser, Cluster miner, Grapher, Web based GUI, categories, names]

Traffic reports for weeks, days, three hour intervals and half hour intervals

Colors – user defined traffic categories Separate reports for each category

Analysis of unusual events
Sapphire/SQL Slammer worm
- Found worm port and protocol automatically

Analysis of unusual events
Sapphire/SQL Slammer worm
- Identified infected hosts

Related work
- Databases [FS+98]: Iceberg queries
  - Limited analysis, no conservative update
- Theory [GM98, CCF02]: Synopses, sketches
  - Less accurate than multistage filters
- Data mining [AIS93]: Association rules
  - No/limited hierarchy, no compression
- Databases [GCB+97]: Data cube
  - No automatic generation of “interesting” clusters