The Sensitivity of Communication Mechanisms to Bandwidth and Latency Frederic T. Chong, Rajeev Barua, Fredrik Dahlgren, John D. Kubiatowicz, and Anant Agarwal

Presentation transcript:

The Sensitivity of Communication Mechanisms to Bandwidth and Latency
Frederic T. Chong, Rajeev Barua, Fredrik Dahlgren, John D. Kubiatowicz, and Anant Agarwal
Presented by: Scott Beamer

Overview
Comparison of message passing and shared memory with respect to the latency and bandwidth of the network.
Done on hardware (Alewife) with real applications.
Also considers optimizations such as prefetching, polling, and bulk transfers.

Supported options
Message passing
  with interrupts: active-message style
  with polling: application checks the queue
  with bulk transfer: DMA to the network
Shared memory
  without prefetching: normal
  with prefetching: done in the background
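The difference between the interrupt and polling options can be pictured with a small sketch. This is not Alewife code; the `Node` class, `deliver`, `poll`, and `add_handler` are hypothetical names chosen for illustration. It assumes an active-message style in which each message names a handler, and shows the polling variant, where handlers run only when the application checks its queue.

```python
from collections import deque


class Node:
    """Hypothetical node holding a queue of incoming active messages."""

    def __init__(self):
        self.queue = deque()  # messages that have arrived from the network
        self.total = 0        # application state updated by handlers

    def deliver(self, handler, payload):
        # Network side: enqueue the message. With polling, nothing runs yet.
        # (With interrupts, the handler would instead run on arrival.)
        self.queue.append((handler, payload))

    def poll(self):
        # Application side: drain the queue at a point the app chooses.
        handled = 0
        while self.queue:
            handler, payload = self.queue.popleft()
            handler(self, payload)
            handled += 1
        return handled


def add_handler(node, payload):
    # Example handler: fold the payload into the node's state.
    node.total += payload


node = Node()
node.deliver(add_handler, 3)
node.deliver(add_handler, 4)
before = node.total    # handlers have not run yet
handled = node.poll()  # application checks the queue
```

In this sketch the application pays for message handling only at its own `poll()` calls, which is the trade the polling option makes against interrupt overhead.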

Performance Models

EM3D
Models properties of electromagnetic waves through 3D objects.

UNSTRUC
Simulates fluid flows over unstructured meshes in 3D.

ICCG
General iterative sparse matrix solver using conjugate gradients with preconditioning.
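For readers unfamiliar with preconditioned conjugate gradients, a minimal sketch follows. It is not the ICCG benchmark itself: it assumes a small dense symmetric positive-definite matrix stored as nested lists, and it swaps in a simple Jacobi (diagonal) preconditioner for brevity. The function names `matvec`, `dot`, and `pcg` are illustrative.

```python
def matvec(A, x):
    # Dense matrix-vector product; a real solver would use a sparse format.
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def pcg(A, b, tol=1e-10, max_iter=100):
    """Solve A x = b by conjugate gradients with a Jacobi preconditioner."""
    n = len(b)
    x = [0.0] * n
    r = list(b)                                   # residual b - A x for x = 0
    inv_diag = [1.0 / A[i][i] for i in range(n)]  # Jacobi preconditioner
    z = [d * ri for d, ri in zip(inv_diag, r)]    # preconditioned residual
    p = list(z)                                   # first search direction
    rz = dot(r, z)
    for _ in range(max_iter):
        if dot(r, r) ** 0.5 < tol:                # converged
            break
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)                   # step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = [d * ri for d, ri in zip(inv_diag, r)]
        rz_new = dot(r, z)
        beta = rz_new / rz                        # keeps directions conjugate
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x


A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x = pcg(A, b)  # exact solution is (1/11, 7/11)
```

In a parallel setting the matrix-vector product and the dot products are the steps that need data from other processors, so they are where communication would occur.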

MOLDYN
Computes interactions between molecules within a cut-off distance.
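The cut-off idea can be sketched as a pair search. This is an illustrative brute-force version, not the MOLDYN implementation; the function name `pairs_within_cutoff`, the sample positions, and the cut-off value are assumptions, as is treating the cut-off as a strict inequality.

```python
import math


def pairs_within_cutoff(positions, cutoff):
    """Return index pairs (i, j), i < j, whose distance is below the cut-off."""
    pairs = []
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            # Only pairs closer than the cut-off are considered to interact.
            if math.dist(positions[i], positions[j]) < cutoff:
                pairs.append((i, j))
    return pairs


# Three hypothetical molecules on a line: only the first two are close.
positions = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
pairs = pairs_within_cutoff(positions, cutoff=1.5)
```

Interacting pairs whose molecules live on different processors are the ones that would need communication.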

Communication Volume

Bandwidth Experiment
Controlled the amount of I/O traffic across the 2D mesh to reduce available bandwidth.
I/O message size didn’t affect message passing (MP); bigger messages improved shared memory’s (SM) performance.

Latency Experiment
To decrease latency (as seen by the processor), slowed down the processor clock.
To increase latency, context switched to a delay loop.
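One way to read the clock-scaling step: if the network's latency is fixed in wall-clock time, a slower processor clock means each network trip costs fewer processor cycles. The sketch below only illustrates that arithmetic. The function name `latency_in_cycles` and the numbers are made up for the example and are not measurements from the talk.

```python
def latency_in_cycles(latency_ns, clock_mhz):
    """Convert a wall-clock latency into processor cycles at a given clock."""
    # cycles = seconds * cycles_per_second = (ns * 1e-9) * (MHz * 1e6)
    return latency_ns * clock_mhz / 1000.0


# Hypothetical network latency of 1000 ns, at two hypothetical clock rates.
fast_clock = latency_in_cycles(1000, 20)  # 20.0 cycles at 20 MHz
slow_clock = latency_in_cycles(1000, 10)  # 10.0 cycles at 10 MHz
```

Halving the clock rate halves the latency measured in processor cycles, even though the network itself is unchanged.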

Conclusion
Shared memory required less coding effort.
Message passing was less sensitive to increased latency or reduced bandwidth.
Shared memory can provide good performance if there is enough bandwidth.