The Measured Network Traffic of Compiler Parallelized Programs Peter A. Dinda Northwestern University, Computer Science

Slides:



Advertisements
Similar presentations
1 ICS 156: Lecture 2 (part 2) Data link layer protocols Address resolution protocol Notes on lab 2.
Advertisements

1 Advancing Supercomputer Performance Through Interconnection Topology Synthesis Yi Zhu, Michael Taylor, Scott B. Baden and Chung-Kuan Cheng Department.
2014 Examples of Traffic. Video Video Traffic (High Definition) –30 frames per second –Frame format: 1920x1080 pixels –24 bits per pixel  Required rate:
SKELETON BASED PERFORMANCE PREDICTION ON SHARED NETWORKS Sukhdeep Sodhi Microsoft Corp Jaspal Subhlok University of Houston.
Geoff Salmon, Monia Ghobadi, Yashar Ganjali, Martin Labrecque, J. Gregory Steffan University of Toronto.
1 Network Traffic Measurement and Modeling Carey Williamson Department of Computer Science University of Calgary.
1 Self-Similar Wide Area Network Traffic Carey Williamson University of Calgary.
An Empirical Study of Real Audio Traffic A. Mena and J. Heidemann USC/Information Sciences Institute In Proceedings of IEEE Infocom Tel-Aviv, Israel March.
Profiling Network Performance in Multi-tier Datacenter Applications
On the Self-Similar Nature of Ethernet Traffic - Leland, et. Al Presented by Sumitra Ganesh.
Copyright © 2005 Department of Computer Science CPSC 641 Winter WAN Traffic Measurements There have been several studies of wide area network traffic.
Responsive Interactive Applications by Dynamic Mapping of Activation Trees February 20, 1998 Peter A. Dinda School of Computer.
Network Traffic Measurement and Modeling CSCI 780, Fall 2005.
1 Yi Qiao Jason Skicewicz Peter A. Dinda Prescience Laboratory Department of Computer Science Northwestern University Evanston, IL An Empirical Study.
Fair Sharing of MAC under TCP in Wireless Ad Hoc Networks Mario Gerla Computer Science Department University of California, Los Angeles Los Angeles, CA.
Recent Results in Resource Signal Measurement, Dissemination, and Prediction App Transport Network Data Link Physical App Transport Network Data Link Physical.
Self-Similarity in Network Traffic Kevin Henkener 5/29/2002.
Efficient Parallelization for AMR MHD Multiphysics Calculations Implementation in AstroBEAR.
Variance of Aggregated Web Traffic Robert Morris MIT Laboratory for Computer Science IEEE INFOCOM 2000’
Copyright © 2005 Department of Computer Science CPSC 641 Winter Network Traffic Measurement A focus of networking research for 20+ years Collect.
Online Prediction of the Running Time Of Tasks Peter A. Dinda Department of Computer Science Northwestern University
Exploiting Packet Header Redundancy for Zero Cost Dissemination of Dynamic Resource Information Peter A. Dinda Prescience Lab Department of Computer Science.
1 Dong Lu, Peter A. Dinda Prescience Laboratory Computer Science Department Northwestern University Virtualized.
Reduced TCP Window Size for VoIP in Legacy LAN Environments Nikolaus Färber, Bernd Girod, Balaji Prabhakar.
A Prediction-based Real-time Scheduling Advisor Peter A. Dinda Prescience Lab Department of Computer Science Northwestern University
Load Analysis and Prediction for Responsive Interactive Applications Peter A. Dinda David R. O’Hallaron Carnegie Mellon University.
Dynamic Topology Adaptation of Virtual Networks of Virtual Machines Ananth I. Sundararaj Ashish Gupta Peter A. Dinda Prescience Lab Department of Computer.
Inferring the Topology and Traffic Load of Parallel Programs in a VM environment Ashish Gupta Peter Dinda Department of Computer Science Northwestern University.
Testing Intrusion Detection Systems: A Critic for the 1998 and 1999 DARPA Intrusion Detection System Evaluations as Performed by Lincoln Laboratory By.
1 WAN Measurements Carey Williamson Department of Computer Science University of Calgary.
Implement a QoS Algorithm for Real-Time Applications in the DiffServ-aware MPLS Network Zuo-Po Huang, *Ji-Feng Chiu, Wen-Shyang Hwang and *Ce-Kuen Shieh.
IP Network Basics. For Internal Use Only ▲ Internal Use Only ▲ Course Objectives Grasp the basic knowledge of network Understand network evolution history.
1 Presenter: Ming-Shiun Yang Sah, A., Balakrishnan, M., Panda, P.R. Design, Automation & Test in Europe Conference & Exhibition, DATE ‘09. A Generic.
1 Chapters 9 Self-SimilarTraffic. Chapter 9 – Self-Similar Traffic 2 Introduction- Motivation Validity of the queuing models we have studied depends on.
Traffic Modeling.
Reference: / Parallel Programming Paradigm Yeni Herdiyeni Dept of Computer Science, IPB.
Predicting performance of applications and infrastructures Tania Lorido 27th May 2011.
RM2D Let’s write our FIRST basic SPIN program!. The Labs that follow in this Module are designed to teach the following; Turn an LED on – assigning I/O.
1 High-Level Carrier Requirements for Cross Layer Optimization Dave McDysan Verizon.
A Framework for Elastic Execution of Existing MPI Programs Aarthi Raveendran Tekin Bicer Gagan Agrawal 1.
ACN: CSFQ1 CSFQ Core-Stateless Fair Queueing Presented by Nagaraj Shirali Choong-Soo Lee ACN: CSFQ1.
A Framework for Elastic Execution of Existing MPI Programs Aarthi Raveendran Graduate Student Department Of CSE 1.
William Stallings Data and Computer Communications 7 th Edition Chapter 1 Data Communications and Networks Overview.
A Measurement Based Memory Performance Evaluation of High Throughput Servers Garba Isa Yau Department of Computer Engineering King Fahd University of Petroleum.
Huirong Fu and Edward W. Knightly Rice Networks Group Aggregation and Scalable QoS: A Performance Study.
Copyright © 2003 OPNET Technologies, Inc. Confidential, not for distribution to third parties. Session 1341: Case Studies of Security Studies of Intrusion.
IPDPS 2005, slide 1 Automatic Construction and Evaluation of “Performance Skeletons” ( Predicting Performance in an Unpredictable World ) Sukhdeep Sodhi.
OS Services And Networking Support Juan Wang Qi Pan Department of Computer Science Southeastern University August 1999.
1 © 2003, Cisco Systems, Inc. All rights reserved. CCNA 3 v3.0 Module 4 Switching Concepts.
The Cosmic Cube Charles L. Seitz Presented By: Jason D. Robey 2 APR 03.
Networking Fundamentals. Basics Network – collection of nodes and links that cooperate for communication Nodes – computer systems –Internal (routers,
Lecture (Mar 23, 2000) H/W Assignment 3 posted on Web –Due Tuesday March 28, 2000 Review of Data packets LANS WANS.
Burst Metric In packet-based networks Initial Considerations for IPPM burst metric Tuesday, March 21, 2006.
Networks. Ethernet  Invented by Dr. Robert Metcalfe in 1970 at Xerox Palo Alto Research Center  Allows group of computers to communicate in a Local.
Chapter Two Fundamentals of Data and Signals Data Communications and Computer Networks: A Business User's Approach Eighth Edition.
An Efficient Gigabit Ethernet Switch Model for Large-Scale Simulation Dong (Kevin) Jin.
Challenges in the Next Generation Internet Xin Yuan Department of Computer Science Florida State University
CCNA3 Module 4 Brierley Module 4. CCNA3 Module 4 Brierley Topics LAN congestion and its effect on network performance Advantages of LAN segmentation in.
1 Transport Layer: Basics Outline Intro to transport UDP Congestion control basics.
Module 16: Distributed System Structures Silberschatz, Galvin and Gagne ©2005 Operating System Concepts – 7 th Edition, Apr 4, 2005 Distributed.
1 Internet Traffic Measurement and Modeling Carey Williamson Department of Computer Science University of Calgary.
Intro to Distributed Systems Hank Levy. 23/20/2016 Distributed Systems Nearly all systems today are distributed in some way, e.g.: –they use –they.
Studies of LHCb Trigger Readout Network Design Karol Hennessy University College Dublin Karol Hennessy University College Dublin.
Accurate WiFi Packet Delivery Rate Estimation and Applications Owais Khan and Lili Qiu. The University of Texas at Austin 1 Infocom 2016, San Francisco.
1 Performance Impact of Resource Provisioning on Workflows Gurmeet Singh, Carl Kesselman and Ewa Deelman Information Science Institute University of Southern.
M. R. Kharazmi Chapter 1 Data Communications and Networks Overview.
Chapter Two Fundamentals of Data and Signals
Department of Computer Science Northwestern University
Ananth I. Sundararaj Ashish Gupta Peter A. Dinda Prescience Lab
Presentation transcript:

The Measured Network Traffic of Compiler Parallelized Programs Peter A. Dinda Northwestern University, Computer Science Brad M. Garcia Laurel Networks Kwok-Shing Leung Magma Design Automation Original Study Done At Carnegie Mellon University

2 Overview Analysis of packet traces of representative HPF-like codes running on a shared Ethernet Traffic very different from typical models –Simple packet size+interarrival behaviors –Correlation between flows –Periodicity within flows and in aggregate Implications for prediction and QoS models Caveats: Data is old, shared media network Current focus

3 Outline Why? CMU Fx Compiler Communication Patterns and programs Methodology Results Implications for prediction and QoS Conclusions and future work

4 Why Study Traffic of Parallel Programs? Networking provisioning –Source models, Aggregated traffic models Adaptive applications –Measurement and prediction –Network Weather Service, Remos, RPS Resource reservation and QoS systems –Source models, framing QoS requests –Intserv, ATM Computational Grids may introduce lots of such traffic …

5 CMU Fx Compiler Variant of High Performance Fortran Both task and data parallelism Sophisticated communication generation –Compile-time and run-time Multiple target platforms –Custom communication back-ends iWarp (original target), Paragon, T3D, T3E, MPI –PVM, MPI on many platforms Alpha/DUX, Sun/Solaris, I386/Linux, …

6 Genesis of Communication Patterns in Fx Programs Parallel Array assignment Parallel Loop iteration input and output Parallel prefix and loop merge Inter-task communication Tasks can themselves be task or data parallel Parallel I/O Distribution of sequential I/O

7 Communication Patterns in Study Run-time communication

8 Beyond Kernels: AIRSHED Air quality modeling application –Used Fx model created by app developers Coupled chemistry and wind simulations on 3D array (layers, species, locations) Data input and distribution Do I=1,h Pre-processing do j=1,k Chemistry/vertical transport Distribution transpose Horizontal transport Distribution transpose

9 Environment Nine DEC 3000/400 workstations at 133 MHz, OSF/1 2.0, DEC’s tcpdump Fx 2.2 on PVM 3.3.3, TCP-based communication 10 mbps half-duplex shared Ethernet LAN Not private: experiments done 4-5 am Recording Host (tcpdump) Execution Hosts

10 Why is this still interesting? (study was done ~1996) Compiler and run-time still representative Communication patterns still common Hard to do a study like this today –Modern Ethernet is switched –Can only see an approximation of the aggregate traffic (SNMP, Remos) not the actual packets Artificial synchronization? BSP anyway Implications for network prediction and QoS models are important

11 Methodology Run program, collecting all packets Arrival time and size (including all headers) Classify packets according to “connections” One-way flow of data between machines All packets from machine A to machine B Includes TCP ACKs for symmetric connection Aggregate and per-connection metrics

12 Metrics Packet size distributions Packet interarrival time distributions Average bandwidth consumed Instantaneous bandwidth consumed –10 ms sliding window –Time and frequency domain (power spectrum) Results for 4 node versions of programs

13 Packet Size Distributions Trimodal MTU-sized packets “leftovers” (n byte message modulo MTU size) ACKs for symmetric connection T2DFFT generates many different sizes Run-time communication system uses multiple PVM pack calls per message Typical LAN traffic has wide range of sizes

14 Packet Interarrival Time Distribution Bursty, but only at only a few timescales –Most are within a burst: closely spaced –Few are between bursts: farther apart Quite deterministic on-off sources –Synchronized communication phases in Fx Typical network source model is heavy-tailed stochastic on-off which mix giving self-similarity

15 Long-term Average Bandwidth Resource demands are often quite light

16 SOR: connection, time domain Bandwidth (KB/s) Time (Seconds) Note Periodicity

17 The Power Spectrum Frequency domain view of signal Density of variance as function of frequency –Power = variance Excellent for seeing periodicities –Periodically appearing feature in the signal turns into a spike in the power spectrum

18 SOR: connection, power spectrum Power Frequency (Hz) Unlike pink noise of typical LAN traffic

19 2DFFT: aggregate, power spectrum Power Frequency (Hz) Periodicity in aggregate Behavior too

20 Airshed: aggregate, power spectrum Power Frequency (0-10 Hz) Periodicity at a limited number of timescales

21 Airshed: aggregate, power spectrum Power Frequency (0-10 Hz) } Power Frequency (0-1 Hz)

22 Airshed: aggregate, power spectrum Power } } Frequency (0-10 Hz)Frequency (0-1 Hz) Frequency (0-0.1 Hz) Power

23 Implications for Network Prediction Networks with significant parallel workloads may be more predictable Typical LAN and WAN look like pink noise Mixing effects unclear, however Source model must be different More deterministic, no heavy tails Parallel applications appear detectable

24 Implications for QoS Models Pattern should be conveyed Show traffic correlation along multiple flows Source model more deterministic Deterministic burst size Deterministic burst interval that depends on proffered bandwidth More degrees of freedom to expose Number of nodes Closer coupling of network’s and app’s optimization problemss

25 Conclusions and Future Work Analysis of packet traces of Fx codes running on a shared Ethernet using 1996 data Traffic very different from typical models –Simple packet size+interarrival behaviors –Correlation between flows –Periodicity within flows and in aggregate QoS models should allow and exploit these differences Parallel traffic appears more predictable than common traffic

26 For More Information Resource Prediction System (RPS) Toolkit Prescience Lab Fx and AIRSHED

27 SOR: aggregate, time domain Bandwidth (KB/s) Time (Seconds)

28 SOR: aggregate, freq domain Power Frequency (Hz)

29 2DFFT: connection, freq domain Power Frequency (Hz)

30 T2DFFT: aggregate, freq domain Power Frequency (Hz)

31 T2DFFT: connection, freq domain Power Frequency (Hz)

32 HIST: aggregate, freq domain Power Frequency (Hz)

33 SEQ: aggregate, freq domain Power Frequency (Hz)