Characterizing NAS Benchmark Performance on Shared Heterogeneous Networks
Jaspal Subhlok, Shreenivasa Venkataramaiah, Amitoj Singh, University of Houston

Rice01, slide 1: Characterizing NAS Benchmark Performance on Shared Heterogeneous Networks
Jaspal Subhlok, Shreenivasa Venkataramaiah, Amitoj Singh
University of Houston
Heterogeneous Computing Workshop, April 15, 2002

Rice01, slide 2: Mapping/Adapting Distributed Applications on Networks
(Figure: an application built from Data, Pre, Stream Model, Sim 1, Sim 2 and Vis components, with a "?" marking how it should be mapped onto the network.)

Rice01, slide 3: Automatic node selection
(Figure: compute nodes m-1 to m-8 connected through routers; some nodes are busy and one route is congested. Legend: compute nodes, routers, busy nodes, selected nodes.)
Select 4 nodes for execution: the choice is easy.

Rice01, slide 4: Automatic node selection
(Figure: the same network of compute nodes m-1 to m-8, with busy nodes and a congested route.)
Select 5 nodes: the choice depends on the application.
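
The trade-off on slides 3 and 4 can be sketched as a greedy ranking. This is a minimal sketch under stated assumptions, not the authors' selection algorithm: each candidate node is described only by a hypothetical `cpu_load` (fraction of CPU consumed by competing work) and a `congested` flag (whether traffic to it crosses a congested route), and the application is summarized by two hypothetical weights saying how much it cares about CPU versus communication.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    cpu_load: float   # hypothetical metric: fraction of CPU used by competing work (0.0 = idle)
    congested: bool   # True if traffic to this node crosses a congested route


def select_nodes(nodes, k, cpu_weight, comm_weight):
    """Return the names of the k nodes with the lowest weighted penalty.

    cpu_weight and comm_weight are hypothetical application weights: a
    compute-bound code weights cpu_load heavily, a communication-bound
    code weights congestion heavily.
    """
    def penalty(node):
        return cpu_weight * node.cpu_load + comm_weight * (1.0 if node.congested else 0.0)

    # sorted() is stable, so nodes with equal penalty keep their input order
    ranked = sorted(nodes, key=penalty)
    return [node.name for node in ranked[:k]]


# Hypothetical cluster: four free nodes, one busy node, one behind a congested route.
cluster = [
    Node("n1", 0.0, False), Node("n2", 0.0, False),
    Node("n3", 0.0, False), Node("n4", 0.0, False),
    Node("busy", 0.5, False), Node("remote", 0.0, True),
]

print(select_nodes(cluster, 4, cpu_weight=1.0, comm_weight=1.0))  # the four free nodes
print(select_nodes(cluster, 5, cpu_weight=1.0, comm_weight=0.2))  # compute-bound: adds "remote"
print(select_nodes(cluster, 5, cpu_weight=0.2, comm_weight=1.0))  # communication-bound: adds "busy"
```

With four nodes, any positive weighting returns the same four free nodes (the easy case of slide 3); with five, the weights decide whether the busy node or the node behind the congested route is taken, which is the application dependence of slide 4.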

Rice01, slide 5: Mapping/Adapting Distributed Applications on Networks
(Figure: the Data, Pre, Stream Model, Sim 1, Sim 2 and Vis application and the network, as on slide 2.)
1) Discover application characteristics and model performance in a shared heterogeneous environment
2) Discover network structure and available resources (e.g., NWS, REMOS)
3) Algorithms to map/remap applications to networks

Rice01, slide 6: Methodology for Building an Application Performance Signature
Performance signature = a model to predict application execution time under given network conditions
1. Execute the application on a controlled testbed
2. Measure system-level activity during execution
– such as CPU, communication and memory usage
3. Analyze and discover program-level activity (message sizes, sequences, synchronization waits)
4. Develop a performance signature
No access to source code or libraries is assumed.

Rice01, slide 7: Discovering application characteristics
(Figure: Executable Application Code → Benchmarking on a controlled testbed and analysis → Model as a Performance Signature.)
Testbed: 500 MHz Pentium Duos, Ethernet switch (crossbar), 100 Mbps links
Capture patterns of CPU loads and traffic during execution

Rice01, slide 8: Results in this paper
(Figure: Executable Application Code → Benchmarking on a controlled testbed → Measure performance with resource sharing.)
Testbed: 500 MHz Pentium Duos, Ethernet switch (crossbar), 100 Mbps links; capture patterns of CPU loads and traffic during execution
Demonstrate that measured resource usage on a testbed is a good predictor of performance on a shared network for NAS benchmarks

Rice01, slide 9: Experiment Procedure
Resource utilization of NAS benchmarks measured on a dedicated testbed
– CPU probes based on the “top” and “vmstat” utilities
– Bandwidth using “iptraf”, “tcpdump” and SNMP queries
Performance of NAS benchmarks measured with competing loads and limited bandwidth
– Employ dummynet and NISTnet to limit bandwidth
All measurements presented are on 500 MHz Pentium Duos, 100 Mbps network, TCP/IP, FreeBSD
All results on Class A, MPI, NAS Benchmarks
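
Slide 9 mentions CPU probes built on `top` and `vmstat`. As a hedged illustration, not the probe the authors used, the function below turns the text of `vmstat <interval>` output into per-interval CPU busy percentages, computed as 100 minus the idle column. It assumes the column header names the idle column `id` (as in the usual Linux and FreeBSD layouts) and drops the first sample row, which usually summarizes activity since boot. The sample input is illustrative only.

```python
def cpu_busy_samples(vmstat_text):
    """Per-interval CPU busy percentages (100 - idle) from `vmstat <interval>` output."""
    idle_col = None
    samples = []
    for line in vmstat_text.splitlines():
        tokens = line.split()
        if "us" in tokens and "id" in tokens:
            # Column header row: use the last "id" column (CPU idle).
            idle_col = len(tokens) - 1 - tokens[::-1].index("id")
        elif idle_col is not None and len(tokens) > idle_col and tokens[idle_col].isdigit():
            samples.append(100 - int(tokens[idle_col]))
    # The first vmstat report averages over the time since boot, so skip it.
    return samples[1:]


def mean_cpu_busy(vmstat_text):
    """Average busy percentage over all interval samples (0.0 if there are none)."""
    samples = cpu_busy_samples(vmstat_text)
    return sum(samples) / len(samples) if samples else 0.0


# Example input in the Linux column layout (values are made up for illustration).
SAMPLE = """\
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  0      0 100000  20000 300000    0    0     5     3  100  200  5  2 93  0  0
 2  0      0 100000  20000 300000    0    0     0     0  500  900 60 10 30  0  0
 1  0      0 100000  20000 300000    0    0     0     0  480  850 50 10 40  0  0
"""

print(cpu_busy_samples(SAMPLE))  # [70, 60]
print(mean_cpu_busy(SAMPLE))     # 65.0
```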

Rice01, slide 10: Discovered Communication Structure of NAS Benchmarks
(Figure: one communication-structure panel per benchmark: BT, CG, IS, EP, LU, MG, SP.)
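
Slide 10 shows communication structures discovered from network traffic. One plausible way to recover such a structure, sketched here under assumptions rather than taken from the slides, is to sum bytes per host pair from packet records (for example, fields extracted from `tcpdump` output) and keep only the pairs that carry a meaningful share of the traffic. The record format `(src, dst, nbytes)` and the 1% threshold are hypothetical choices.

```python
from collections import defaultdict


def traffic_matrix(records):
    """Total bytes exchanged per unordered host pair.

    records: iterable of (src, dst, nbytes) tuples, one per packet or flow.
    """
    totals = defaultdict(int)
    for src, dst, nbytes in records:
        pair = tuple(sorted((src, dst)))  # treat a->b and b->a as the same edge
        totals[pair] += nbytes
    return dict(totals)


def communication_edges(totals, min_fraction=0.01):
    """Host pairs carrying at least min_fraction of all bytes (filters background noise)."""
    grand_total = sum(totals.values())
    return sorted(pair for pair, nbytes in totals.items() if nbytes >= min_fraction * grand_total)


# Illustrative records: a 4-node ring plus a little unrelated traffic.
records = [
    ("m1", "m2", 1000), ("m2", "m3", 1000),
    ("m3", "m4", 1000), ("m4", "m1", 1000),
    ("m1", "m3", 10),
]
print(communication_edges(traffic_matrix(records)))
# ring: [('m1', 'm2'), ('m1', 'm4'), ('m2', 'm3'), ('m3', 'm4')]
```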

Rice01, slide 11: Performance with competing computation loads
Increase beyond 50% due to lack of coordinated (gang) scheduling and synchronization
Correlation between low CPU utilization and smaller increase in execution time (e.g., MG shows only ~60% CPU utilization)
Execution time is lower if the least busy node has a competing load (20% difference in the busyness level for CG)
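
The link between CPU utilization and slowdown on slide 11 can be illustrated with a deliberately simple model; it is an assumption for illustration, not the model from the talk. Suppose the application kept a node's CPU busy for a fraction u of its dedicated run time T, and one always-runnable competing process shares that CPU equally, so the application's compute phases take twice as long. If the waiting time fully overlaps the extra computation, the run takes max(T, 2uT); if nothing overlaps, it takes (1 - u)T + 2uT = (1 + u)T.

```python
def sharing_slowdown_bounds(cpu_util):
    """Bounds on the fractional execution-time increase on one node when a
    single CPU-bound competing process gets an equal share of the CPU.

    cpu_util: fraction u of wall-clock time the application used the CPU on
    the dedicated testbed (0 < u <= 1).
    Returns (low, high): low assumes waits fully hide the extra compute time,
    high assumes no overlap. Scheduler effects such as priority boosts for
    processes that were waiting are ignored.
    """
    if not 0.0 < cpu_util <= 1.0:
        raise ValueError("cpu_util must be in (0, 1]")
    low = max(0.0, 2.0 * cpu_util - 1.0)  # from max(T, 2uT) / T - 1
    high = cpu_util                       # from (1 + u)T / T - 1
    return low, high


for u in (1.0, 0.6):
    low, high = sharing_slowdown_bounds(u)
    print(f"u={u:.1f}: increase between {low:.0%} and {high:.0%}")
```

For a fully CPU-bound code (u = 1) both bounds give a 100% increase, while at MG's roughly 60% utilization the model gives between 20% and 60%, matching the slide's qualitative observation that lower CPU utilization goes with a smaller increase.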

Rice01, slide 12: Performance with Limited Bandwidth (reduced from 100 to 10 Mbps) on one link
Close correlation between link utilization and performance with a shared or slow link

Rice01, slide 13: Performance with Limited Bandwidth (reduced from 100 to 10 Mbps) on all links
Close correlation between total network traffic and performance with all shared or slow links
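
Slides 12 and 13 relate slowdown on slow links to measured traffic. A back-of-the-envelope version, assuming data transfers are not overlapped with computation and ignoring latency and protocol overhead (both simplifications, not claims from the slides), charges every byte on the slowed link the difference in serialization time between the two rates.

```python
def extra_transfer_seconds(bytes_on_link, old_mbps=100.0, new_mbps=10.0):
    """Added transfer time when a link drops from old_mbps to new_mbps.

    Assumes the bytes are sent back to back at the link rate and that
    communication does not overlap computation.
    """
    bits = 8.0 * bytes_on_link
    return bits / (new_mbps * 1e6) - bits / (old_mbps * 1e6)


# 100 MB crossing the slowed link: 80 s at 10 Mbps versus 8 s at 100 Mbps.
print(extra_transfer_seconds(100e6))  # 72.0
```

For the single slow link of slide 12, `bytes_on_link` would be the traffic measured on that link during a dedicated run. When all links are slowed (slide 13), summing the estimate over links gives only a relative indicator that scales with total traffic, since links behind a crossbar switch carry traffic in parallel.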

Rice01, slide 14: Results and Conclusions (not the last slide)
Computation and communication patterns can be captured by passive, nearly non-intrusive monitoring
Benchmarked resource usage pattern is a strong indicator of performance with sharing
– strong correlation between application traffic and performance with low-bandwidth links
– CPU utilization during normal execution is a good indicator of performance with node sharing
Synchronization and timing effects were not dominant for NAS Benchmarks

Rice01, slide 15: Discussion and Ongoing Work (the last slide)
Capture application-level data exchange pattern from network probes (e.g., MPI message sequence, sizes)
– slowdown is different for different message sizes
Infer the main synchronization/waiting patterns
– impact of unbalanced execution and lack of gang scheduling
Capture impact of CPU scheduling policy for accurate prediction with sharing
– policies try to compensate for waits
Goal is to build a quantitative “performance signature” to estimate execution time under any given network conditions, and use it in a resource management prototype system
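
Slide 15 lists recovering application-level messages (sequence and sizes) from network probes as ongoing work. A minimal heuristic, offered as an assumption rather than the authors' technique, merges consecutive data-carrying packets in the same direction into one message when they arrive within a small gap; pure acknowledgements (zero payload) are ignored. The packet tuple layout and the 1 ms gap are hypothetical.

```python
def group_messages(packets, gap=0.001):
    """Group packets into application messages.

    packets: time-ordered iterable of (timestamp_s, src, dst, payload_bytes).
    Returns a list of (start_time, src, dst, total_bytes) tuples.
    """
    messages = []
    last_seen = {}  # (src, dst) -> (index of its latest message, time of its latest packet)
    for ts, src, dst, nbytes in packets:
        if nbytes == 0:
            continue  # skip pure ACKs; they carry no application data
        key = (src, dst)
        if key in last_seen and ts - last_seen[key][1] <= gap:
            idx = last_seen[key][0]
            start, s, d, total = messages[idx]
            messages[idx] = (start, s, d, total + nbytes)  # extend the current message
        else:
            messages.append((ts, src, dst, nbytes))        # start a new message
            idx = len(messages) - 1
        last_seen[key] = (idx, ts)
    return messages


trace = [
    (0.0000, "a", "b", 1448), (0.0002, "a", "b", 1448),
    (0.0003, "b", "a", 0),    (0.0004, "a", "b", 104),
    (0.0500, "b", "a", 64),   (0.1000, "a", "b", 1448),
]
print(group_messages(trace))
# three messages: 3000 bytes a->b, 64 bytes b->a, 1448 bytes a->b
```

The gap threshold is the main design choice: too small and one large message that stalls on a slow link is split, too large and back-to-back small messages are merged.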