Hailuoto Workshop: A Statistician's Adventures in Internetland. J. S. Marron, Department of Statistics and Operations Research, University of North Carolina.

Presentation transcript:

1 Hailuoto Workshop A Statistician's Adventures in Internetland J. S. Marron Department of Statistics and Operations Research University of North Carolina February 9, 2016

2 A Menu of Interesting Issues: Bin Count Time Series (Long Range Dependence?). Point Process of Flow Start Times. Duration Distributions (heavy tails). Heavy-tail Durations ⇒ LRD. Relationship between Size and Duration? Time series of packets within flows?

3 Long Range Dependence Controversy. Initial Models: Queuing Theory, Short Range Dependence. Aggregation of Mice and Elephants: Heavy-tail Durations ⇒ LRD (Mandelbrot, Taqqu, Paxson, Willinger, …). More recently: Aggregation of Point Processes ⇒ Poisson (Cleveland, et al.).

4 Explanation of Controversy – Zooming. Autocorrelation Depends on “Scale” (i.e. binwidth, m). Fine Scales (< 1 ms): ~ White Noise – Poisson. Medium Scales (~ 10 ms): Dependence “lifts up”. Coarse Scales (> 1 sec): Consistent with L. R. D.
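The dependence on binwidth described on this slide can be explored with a small sketch like the one below. It is only an illustration, not the analysis behind the talk: the function names `bin_counts` and `lag1_autocorr` are hypothetical, and lag-1 autocorrelation is used as a simple stand-in for the full autocorrelation function.

```python
import numpy as np

def bin_counts(arrival_times, binwidth, t_end):
    """Count arrivals in consecutive bins of width `binwidth` on [0, t_end]."""
    n_bins = int(np.floor(t_end / binwidth))
    edges = np.arange(n_bins + 1) * binwidth
    counts, _ = np.histogram(arrival_times, bins=edges)
    return counts

def lag1_autocorr(series):
    """Sample lag-1 autocorrelation of a series (0.0 for a constant series)."""
    s = np.asarray(series, dtype=float)
    s = s - s.mean()
    denom = np.dot(s, s)
    if denom == 0.0:
        return 0.0
    return float(np.dot(s[:-1], s[1:]) / denom)

# Assumed toy input: Poisson arrivals at roughly 1000 packets per second.
rng = np.random.default_rng(0)
times = np.cumsum(rng.exponential(1.0 / 1000.0, size=200_000))

# Re-bin the same arrivals at several scales and compare the dependence.
for m in (0.001, 0.01, 1.0):
    counts = bin_counts(times, m, times[-1])
    print(f"binwidth {m:g} s: lag-1 autocorrelation {lag1_autocorr(counts):+.3f}")
```

For Poisson arrivals the autocorrelation should stay near zero at every binwidth; for a real packet trace, the slide's point is that the value changes as the binwidth moves from milliseconds to seconds.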

5 Long Range Dependence Theory Self-Similar: H: Hurst parameter Increments: LRD: If
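The formulas on this slide did not survive in the transcript. As a reference point only, the standard textbook definitions that match the slide's labels are given below; the symbols $X$, $Y_k$, $a$ and $\rho$ are assumptions and may differ from the notation used on the original slide.

```latex
\begin{align*}
\text{Self-similar:} \quad
  & \{X(at)\}_{t \ge 0} \overset{d}{=} \{a^{H} X(t)\}_{t \ge 0}
    \quad \text{for all } a > 0, \qquad H \text{: Hurst parameter} \\
\text{Increments:} \quad
  & Y_k = X(k) - X(k-1), \qquad k = 1, 2, \ldots \\
\text{LRD:} \quad
  & \text{if } \tfrac{1}{2} < H < 1, \text{ then }
    \rho(k) = \operatorname{corr}(Y_1, Y_{1+k}) \sim H(2H-1)\, k^{2H-2}
    \text{ as } k \to \infty, \\
  & \text{so that } \sum_{k=1}^{\infty} \rho(k) = \infty .
\end{align*}
```

The asymptotic form of $\rho(k)$ holds for the stationary increments of a finite-variance self-similar process, for example fractional Gaussian noise (FGN).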

6 Drawbacks of Conventional Time Series Methods. Clumsy at Modeling L. R. D.: e.g. ARMA etc. are all S. R. D. Assumption of Stationarity: really need “local stationarity”. Assumption of Linear Processes: doesn’t make physical sense, instead have “aggregation of flows”. “Correlation” for heavy tailed distn’s?

7 An H estimation approach: Wavelets : wavelet coefficients

8 Estimation of H, Based on Wavelet Spectrum Properties Weighted linear regression on Estimation of H: Abry & Veitch (1998) Robust to nonstationarities (linear trend) : uncorrelated
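A minimal sketch of a wavelet-spectrum estimate of H in the spirit of Abry & Veitch follows. It relies on the standard relation that, for an LRD series, the mean squared detail coefficient at octave j scales like $2^{j(2H-1)}$, so a weighted linear regression of the log2 energies on j gives a slope of about 2H - 1. Several choices here are assumptions made for brevity and are not taken from the slides: the Haar wavelet (it has only one vanishing moment, so it does not provide the robustness to linear trends mentioned above), weights proportional to the number of coefficients per octave, no bias correction of the log energies, and the function names themselves.

```python
import numpy as np

def haar_wavelet_spectrum(x, n_octaves):
    """Return octaves j, log2 mean squared Haar detail energy, and coefficient counts."""
    approx = np.asarray(x, dtype=float)
    octaves, log2_energy, n_coeffs = [], [], []
    for j in range(1, n_octaves + 1):
        if len(approx) < 2:
            break
        approx = approx[: (len(approx) // 2) * 2]  # drop an odd trailing sample
        even, odd = approx[0::2], approx[1::2]
        detail = (even - odd) / np.sqrt(2.0)   # orthonormal Haar detail coefficients
        approx = (even + odd) / np.sqrt(2.0)   # orthonormal Haar approximation
        octaves.append(j)
        log2_energy.append(np.log2(np.mean(detail ** 2)))
        n_coeffs.append(len(detail))
    return np.array(octaves), np.array(log2_energy), np.array(n_coeffs)

def estimate_hurst(x, n_octaves=8, j_min=1):
    """Weighted least-squares slope of the wavelet spectrum, mapped to H = (slope + 1) / 2."""
    j, s, n = haar_wavelet_spectrum(x, n_octaves)
    keep = j >= j_min
    # np.polyfit minimizes sum((w * residual)**2), so sqrt(n) weights each octave by n.
    slope, _intercept = np.polyfit(j[keep], s[keep], 1, w=np.sqrt(n[keep]))
    return (slope + 1.0) / 2.0

# Toy check: independent Poisson bin counts should give a flat spectrum, H near 0.5.
rng = np.random.default_rng(1)
print(estimate_hurst(rng.poisson(5.0, size=2 ** 14)))
```

In practice the regression range (`j_min` and `n_octaves`) matters a great deal, since real traffic spectra are often only approximately linear over part of the scale range.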

9 Example Wavelet Spectrum (FGN, H=0.9)

10 Experience with Hurst parameter estimation. Toy Data: Excellent (Poisson data is flat, FGN linear). Real Data: More challenging. Studied ~30 two-hour time blocks, 2002. H estimation makes sense (~ 0.8 – 0.9) for many cases, i.e. FGN is a reasonable model. But there were some very strange cases (H >> 1).

11 Real Data (“nice”): 2002 Apr 13 Sat 19:30 – 21:30

12 Real Data (“ugly”): 2002 Apr 13 Sat 1 pm – 3 pm

13 Explanatory Tool: SiZer SIgnificance of ZERo crossings of the derivative of the smooths in scale space: Chaudhuri and Marron (1999) Exploratory smoothing method Are bumps really there? Consider all smoothing levels Study (simultaneous) C. I.s for slope (derivative) of smooth Combine with statistical inference and visualization Blue: slope significantly upwards Red: slope significantly downwards Purple: insignificant slope
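The colour coding described above can be sketched for the regression setting as follows. This is a simplified illustration and not the procedure of Chaudhuri and Marron (1999): it uses pointwise rather than simultaneous confidence limits (the quantile `q=1.96` is an assumption), a single global noise variance estimated from first differences, and a Gaussian kernel. The function name `sizer_map` is hypothetical.

```python
import numpy as np

def sizer_map(x, y, grid, bandwidths, q=1.96):
    """Classify the local linear slope at each (bandwidth, location) as blue, red or purple."""
    order = np.argsort(x)
    x = np.asarray(x, dtype=float)[order]
    y = np.asarray(y, dtype=float)[order]
    # Assumed homoscedastic noise; difference-based variance estimate on sorted data.
    sigma2 = np.mean(np.diff(y) ** 2) / 2.0
    colors = np.full((len(bandwidths), len(grid)), "purple", dtype="<U6")
    for i, h in enumerate(bandwidths):
        for k, x0 in enumerate(grid):
            u = x - x0
            w = np.exp(-0.5 * (u / h) ** 2)           # Gaussian kernel weights
            X = np.column_stack([np.ones_like(u), u])
            A = X.T @ (w[:, None] * X)                # X' W X
            B = X.T @ ((w ** 2)[:, None] * X)         # X' W^2 X
            A_inv = np.linalg.inv(A)
            beta = A_inv @ (X.T @ (w * y))            # local intercept and slope
            cov = sigma2 * (A_inv @ B @ A_inv)        # sandwich covariance of beta
            se = np.sqrt(cov[1, 1])
            if beta[1] - q * se > 0:
                colors[i, k] = "blue"                 # slope significantly upwards
            elif beta[1] + q * se < 0:
                colors[i, k] = "red"                  # slope significantly downwards
    return colors

# Toy usage: a noisy increasing line should be blue at every scale.
rng = np.random.default_rng(3)
x = np.linspace(0.0, 1.0, 400)
y = 2.0 * x + rng.normal(scale=0.1, size=x.size)
print(sizer_map(x, y, grid=np.linspace(0.1, 0.9, 5), bandwidths=[0.05, 0.2]))
```

Each row of the returned array corresponds to one smoothing level, which is what lets the map show whether a bump is significant at some scales and not at others. Very small bandwidths far from any data can make the local fit singular; a full implementation would mark such cells separately.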

14 SiZer Example British Incomes Data Kernel Density Estimation Two modes “really there”! Bralower’s Fossil Data Local Linear Regression Smaller valley “not there”

15 Dependent SiZer Park, Marron, and Rondonotti (2004) SiZer compares data with white noise Inappropriate in time series Dependent SiZer compares data with an assumed model Goodness of fit test

16 Dep’ent SiZer : 2002 Apr 13 Sat 1 pm – 3 pm

17 Zoomed view (to red region, i.e. “flat top”)

18 Further Zoom: finds very periodic behavior!

19 Revisit: 2002 Apr 13 Sat 1 pm – 3 pm

20 Quick Check: “Delete” periodic time block

21 Possible Physical Explanation IP “Port Scan” Common device of hackers Searching for “break in points” Send query to every possible (within UNC domain): IP address Port Number Replies can indicate system weaknesses Internet Traffic is hard to model

22 Experience with Hurst parameter estimation. Studied ~30 two-hour time blocks, 2002: H estimation makes sense (~ 0.8 – 0.9) for many cases, i.e. FGN is a reasonable model. But there were some very strange cases (H >> 1). Studied ~30 two-hour time blocks, 2003: Traffic appears “similar”, using e.g. Dependent SiZer. But H estimates much smaller (~ 0.7), across all time blocks. Why???

23 Wavelet Spectrum: 2003 Sat 9:30 – 11:30 pm

24 Explanation of Shoulder: different protocols. Major Components of Traffic: Transmission Control Protocol (TCP), often ~80%: “acknowledges packets” for “sure transfer”; Web browsing (HTTP), FTP, … User Datagram Protocol (UDP), often ~15%: unacknowledged, for “data streaming”; video, music, …

25 Wavelet Spectra: all 2003 packet TCP vs. UDP Overlay all time blocks, and sub-spectra for TCP and UDP In 2002 TCP Dominated Now UDP creates major hump at medium scales Scale ~ 1 sec

26 Explanation of UDP Bump “Blubster” - File Sharing Application A replacement for Napster Transfers big files by TCP Does “handshaking” by UDP Work around for server (could be shut down) Huge fraction of traffic (just to “stay in touch”)?!?

27 Blubster sub-spectrum: 2003 Sat 9:30

28 Zoomed (convent’al) SiZer View of Blubster

29 Final Blubster Oddity Effect shows up for “packet counts” Not for “byte counts” Reason: Blubster handshake packets are small Thus not significant fraction of total bytes Violation of “conventional wisdom” Usually “byte behavior” ~ “packet behavior”
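The packet-count versus byte-count distinction can be made concrete with a small sketch. The trace below is entirely made up for illustration (a few large TCP-like packets plus a burst of small handshake-like packets), and `packet_and_byte_series` is a hypothetical helper, not part of any tool used in the talk.

```python
import numpy as np

def packet_and_byte_series(timestamps, sizes, binwidth, t_end):
    """Return per-bin packet counts and per-bin byte totals for one trace."""
    n_bins = int(np.floor(t_end / binwidth))
    edges = np.arange(n_bins + 1) * binwidth
    packets, _ = np.histogram(timestamps, bins=edges)
    byte_totals, _ = np.histogram(timestamps, bins=edges, weights=sizes)
    return packets, byte_totals

# Assumed toy trace: four 1500-byte packets spread over 2 s,
# plus twenty 40-byte packets bursting in the second bin.
big_times = np.array([0.1, 0.6, 1.1, 1.6])
small_times = 1.0 + 0.04 * np.arange(1, 21)
timestamps = np.concatenate([big_times, small_times])
sizes = np.concatenate([np.full(4, 1500.0), np.full(20, 40.0)])

packets, byte_totals = packet_and_byte_series(timestamps, sizes, binwidth=1.0, t_end=2.0)
print(packets)      # packet counts jump from 2 to 22
print(byte_totals)  # byte totals move only from 3000 to 3800
```

The burst of small packets multiplies the packet count by eleven while raising the byte total by about a quarter, which is the mechanism the slide gives for the effect appearing in packet counts and not in byte counts.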

30 Wavelet Spectrum : packet vs. byte

31 A deeper look at sampling. Revisit Mice-Elephant Sampling. Over wide range of scales: Random Sampling… But “not representative”… Artifact of: Huge Sample Size, Very Heavy Tails.