A Comparison of Web100, Active, and Passive Methods for Throughput Calculation
I-Heng Mei, 8/30/02



Summary of Methods
Active: throughput reported directly by the application
Passive: Netflow records indicating the time and number of bytes transferred per flow (routing tables)
Web100: exposes TCP connection variables through the /proc file system in Linux
  For throughput: bytes sent and time for each connection

Tasks
Integrate Web100 into bw-tests
Stream-by-stream comparison between passive and Web100 data
Overall correlation between active, passive, and Web100 throughputs

Integrating Web100: Old way
Use Web100 "userland" scripts to:
  Query for existing TCP connections
  Query for the desired variables (~20) for each connection
Why it didn't work for us:
  Lots of process overhead
  Web100 forgets connections after 5 to 10 seconds, so not all variables were recorded for all streams

Integrating Web100: New way
Interact directly with the Web100 API (C/C++)
Dramatically reduces overhead, so all variables can be recorded for all streams

Stream-by-stream
For each transfer, create a table that lists the stream-by-stream stats given by the passive, Web100, and active (if available) methods.

Stream-by-stream: Passive vs. Web100
Netflow always reports more bytes sent
  Expected, since Netflow includes TCP headers and retransmissions, whereas DataBytesOut (a Web100 variable) does not.
Netflow consistently reports a slightly longer elapsed time
  Web100 does not have an "elapsed time" variable
  Sender/Receiver/Congestion Limited States

Stream-by-stream: Active vs. Web100 (Iperf only)
Bytes transferred are nearly identical
  What causes the small discrepancies?
Iperf reports a smaller elapsed time
  Expected: Iperf only considers the time spent transferring data, whereas Web100 considers the entire lifetime of a connection
  Declining pattern

Stream-by-stream: Active vs. Passive (Iperf only)
Iperf reports fewer bytes sent
  Expected, since retransmissions and TCP headers are counted by Netflow
Iperf reports a smaller elapsed time
  Expected, since Netflow considers the entire lifetime of a connection as the elapsed time
  Same declining pattern as for Active/Web100

Overall Correlation: Throughput Calculation Methods
1. SUM(Mbits per stream / Time per stream)
2. SUM(Mbits per stream) / AVG(Time per stream)
3. SUM(Mbits per stream) / MAX(Time per stream)
If all streams have the same elapsed time, then Method 1 == Method 2 == Method 3
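The three calculation methods can be sketched in a few lines of Python. This is only an illustration: the function name `throughput_methods` and the representation of a stream as a `(megabits, seconds)` pair are assumptions, not part of the original bw-tests code.

```python
from statistics import mean

def throughput_methods(streams):
    """Return (method1, method2, method3) in Mbit/s.

    `streams` is a hypothetical list of (megabits, seconds) pairs,
    one pair per stream of a single transfer.
    """
    total_mbits = sum(mbits for mbits, _ in streams)
    times = [seconds for _, seconds in streams]
    method1 = sum(mbits / seconds for mbits, seconds in streams)  # SUM(Mbits/Time)
    method2 = total_mbits / mean(times)                           # SUM(Mbits)/AVG(Time)
    method3 = total_mbits / max(times)                            # SUM(Mbits)/MAX(Time)
    return method1, method2, method3

# Four streams with identical elapsed times: all three methods agree.
print(throughput_methods([(100.0, 10.0)] * 4))

# Unequal elapsed times: the methods diverge.
print(throughput_methods([(100.0, 10.0), (100.0, 20.0)]))
```

With unequal elapsed times, method 3 gives the lowest value because it divides by the slowest stream's time, and method 1 the highest.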

Overall Correlation: Correlation Tables
One for Passive/Web100, one for Active/Web100, and one for Active/Passive
Example row: 103 Iperf test runs (samples) from SLAC to Caltech
  Two data sets:
    X: set of 103 passive throughputs (using method 1)
    Y: set of 103 Web100 throughputs (using method 1)
  Important stats are R and Error
Each row corresponds to a unique combination of
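As a rough sketch of how one row of such a table might be computed: R is taken here to be the Pearson correlation coefficient, and since the slides do not define "Error", the mean relative difference between the two data sets is used purely as an assumed stand-in. The sample values are invented for illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def mean_relative_error(xs, ys):
    """Assumed error metric: average of |x - y| / y over all samples."""
    return sum(abs(x - y) / y for x, y in zip(xs, ys)) / len(xs)

# Hypothetical throughputs (Mbit/s) for the same test runs, both via method 1.
passive = [92.1, 88.4, 95.0, 79.3, 90.2]
web100 = [91.5, 88.0, 94.1, 78.9, 89.8]

print(f"R = {pearson_r(passive, web100):.4f}")
print(f"Error = {mean_relative_error(passive, web100):.4f}")
```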

Passive/Web100 Correlation
Very highly correlated for all tests. Very low error for all tests. (Summary stats)
Almost all rows have R ~ 1
Exceptions are mostly due to "Long Flows"
  A long flow occurs when one or more streams in a transfer report a grossly exaggerated elapsed time
  Occurs most often in bbcp*

Passive/Web100 Correlation: Effects of Long Flows
On several occasions, the bbcpmem transfer to node1.nersc.gov suffered from a long flow (example)
  Method 2 and 3 throughputs are severely lowered.
Why is method 1 still highly correlated?
  Each time a bbcpmem transfer to nersc.gov experienced a long flow, exactly 1 (of 8) streams was affected.
  Method 1 throughput for a transfer is not affected much if there are few long flows compared to the total number of flows
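A small numeric sketch makes the effect concrete. The figures are invented: 8 streams of 100 Mbit each that normally take 10 s, with one stream reporting an exaggerated 1000 s elapsed time. The helper `throughputs` is a hypothetical name.

```python
from statistics import mean

def throughputs(streams):
    """Return (method1, method2, method3) in Mbit/s for (megabits, seconds) pairs."""
    total = sum(mbits for mbits, _ in streams)
    times = [seconds for _, seconds in streams]
    return (
        sum(mbits / seconds for mbits, seconds in streams),  # method 1
        total / mean(times),                                 # method 2
        total / max(times),                                  # method 3
    )

normal = [(100.0, 10.0)] * 8
long_flow = [(100.0, 10.0)] * 7 + [(100.0, 1000.0)]  # one stream with a long flow

for label, streams in (("normal", normal), ("long flow", long_flow)):
    m1, m2, m3 = throughputs(streams)
    print(f"{label:>9}: method1={m1:.1f}  method2={m2:.1f}  method3={m3:.1f}")
```

Under these assumed numbers, method 1 only loses the contribution of the single affected stream (roughly 70 instead of 80 Mbit/s), while methods 2 and 3 fall to about 6 and 0.8 Mbit/s because the exaggerated time dominates the average and the maximum.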

Active/Web100 Correlation
Good correlation and low error for all tests except Bbftp (Summary stats)
For Bbftp, only method 3 works.
  Bbftp considers elapsed time to be the duration of the process, which includes a lot of connection setup time.
  Bbftp streams tend to vary greatly in elapsed time
For the other tests, any method provides good correlation between Active and Web100.
Still, there are cases of low correlation and high error. What causes those?

Low Correlation (Active/Web100)
Not caused by long flows (those only affect Netflow)
Example: Bbcpdisk to node1.mcs.anl.gov
  Low correlation, high error.
  The range of throughput values reported by Bbcpdisk is significantly larger than that of the values calculated with Web100 data
Caused by "lingering sockets" (past study)
  Bbcp makes system calls to close sockets, but they linger for some time while the kernel properly closes them.
  Lingering time tends to be longer for transfers with many simultaneous streams and large RTT
  Not a "random" event like long flows; this is a consistent occurrence.

Active/Passive Correlation
Good correlation and low error for all tests except Bbftp (Summary stats)
Suffers from long flows and lingering sockets
Again, only method 3 works for Bbftp.
For the other tests, method 1 is best (alleviates the long flow problem)

Conclusions
Overall, active, passive, and Web100 throughputs are all well correlated.
Long flows in passive data
  Either ignore transfers with long flows, or
  Use calculation method 1 for those tests, if applicable
The lingering sockets problem
  Passive: unavoidable
  Web100: possible to deal with if we constantly monitor throughput variables during the transfer (currently we only look at the variables at the end of a transfer)
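One possible way to "ignore transfers with long flows" is to flag any transfer in which a stream's elapsed time is far above the median for that transfer. The slides do not say how long flows were identified, so the heuristic below, the threshold `factor=5.0`, and the function names are all assumptions for illustration.

```python
from statistics import median

def has_long_flow(stream_times, factor=5.0):
    """Flag a transfer if any stream's elapsed time exceeds factor x the median.

    `factor` is an arbitrary, hypothetical threshold.
    """
    typical = median(stream_times)
    return any(t > factor * typical for t in stream_times)

def drop_long_flow_transfers(transfers, factor=5.0):
    """Keep only transfers (lists of per-stream elapsed seconds) without long flows."""
    return [times for times in transfers if not has_long_flow(times, factor)]

# Hypothetical per-stream elapsed times (seconds) for three transfers.
transfers = [
    [10.1, 10.0, 9.9, 10.2],
    [10.0, 10.3, 640.0, 10.1],  # one stream reports an exaggerated time
    [12.0, 11.8, 12.1, 12.4],
]
print(drop_long_flow_transfers(transfers))
```

The alternative named on the slide, switching to calculation method 1 for the affected tests, needs no filtering at all.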

Conclusions
Consider how the application calculates throughput
Iperf sums the throughputs for each stream
  Use method 1
Bbftp divides total bytes by elapsed time
  elapsed time = entire process
  Use method 3 (or a variation that uses "absolute time")
Bbcp* also divides total bytes by elapsed time
  elapsed time = entire transfer
  Use method 3 (but consider using method 1 to alleviate the effect of long flows)
For more info