Performance-Robust Parallel I/O: Virtual Streams
Z. Morley Mao, Noah Treuhaft
CS258, 5/17/99, Professor Culler
Introduction
- Clusters exhibit performance heterogeneity, both static and dynamic, due to both hardware and software
- Consistent peak performance demands adaptive software: building performance-robust parallel software means keeping heterogeneity in mind
- This work explores:
  - what adaptivity is appropriate for I/O-bound parallel programs
  - how to provide that adaptivity
Heterogeneity demands adaptivity
[Figure: cluster nodes, each with a process reading a physical I/O stream from its local disk]
- Physical I/O streams are simple to build and use
- But their performance is highly variable: different drive models, bad blocks, multizone behavior, file layout, competing programs, host bottlenecks
- I/O-bound parallel programs run at the rate of the slowest disk
Virtual Streams
- Performance-robust programs want virtual streams that:
  - eliminate dependence on individual disk behavior
  - continually equalize throughput delivered to processes
[Figure: processes reading through a virtual streams layer that multiplexes over the disks]
Graduated Declustering (GD): a Virtual Streams implementation
- Data is replicated (mirrored) for availability; use the replicas to provide performance availability, too
- A fast network makes remote disk access comparable to local
- Distributed algorithm for adaptivity:
  - each client provides information about its progress
  - each server reacts by scheduling requests to even out progress
[Figure: processes linked through the GD client library to GD servers holding replicas A and B]
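The progress-based scheduling rule can be sketched as follows. This is an illustrative toy (the names and data layout are assumptions, not the original GD code): each server serves the pending request of the client that reports the least progress, which evens out delivered bandwidth.

```python
def pick_next_client(pending):
    """pending maps client id -> (reported_progress, list_of_queued_requests).

    Progress-based GD rule (sketch): among clients with queued
    requests, serve the one whose reported progress is lowest.
    """
    candidates = [(progress, cid)
                  for cid, (progress, queue) in pending.items() if queue]
    if not candidates:
        return None
    return min(candidates)[1]


# Client B lags behind A, so the server schedules B's request first;
# C has no pending requests and is ignored.
pending = {"A": (120, ["read blk 120"]),
           "B": (80, ["read blk 80"]),
           "C": (200, [])}
assert pick_next_client(pending) == "B"
```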
GD in action
- Local decisions yield global behavior
[Figure: Client0-Client3 and Server0-Server3, before and after perturbation; before, each client receives bandwidth B; after one server is perturbed, per-link rates shift among B/4, 3B/8, B/2, 5B/8, and 7B/8 to even out what each client receives]
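The figure's numbers can be checked with idealized arithmetic, a sketch under the assumption of n equally loaded servers, one slowed to a fraction of full disk bandwidth B, with the shortfall spread evenly across all clients:

```python
def per_client_bw(n_servers, perturbed_fraction, bw=1.0):
    """Ideal per-client bandwidth under GD when one of n_servers runs
    at perturbed_fraction of full bandwidth bw and the lost capacity
    is spread evenly across all clients."""
    aggregate = (n_servers - 1 + perturbed_fraction) * bw
    return aggregate / n_servers


# Four servers, one at half speed: every client gets 7B/8, instead of
# one unlucky client dropping to B/2 as with plain physical streams.
assert per_client_bw(4, 0.5) == 0.875
```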
Evaluation of original GD implementation (progress-based)
[Graph: delivered bandwidth under perturbation, annotated with seek overhead]
- Seek overhead due to reading from all replicas
Deficiency of original GD implementation: seek overhead
- Under sequential data access, seeks occur even when there is no perturbation
- Seeks become more significant as disk transfer rates increase
- Need a new algorithm that:
  - reads mostly from a single disk under no perturbation
  - dynamically adjusts to perturbation when necessary
  - achieves both performance adaptivity and minimal overhead
Proposed solution: response-rate-based GD
- The number of requests a client sends to a server is based on that server's response rate
- Servers use request queue lengths to make scheduling decisions
- Uses implicit information; "historyless": no bandwidth information is transmitted between server and client
- Advantage: each client has a primary server
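One way to sketch the historyless client policy (the names and tie-breaking logic are assumptions, not the actual implementation): the client counts outstanding requests per server, prefers its primary, and diverts to a replica only when the primary's queue grows longer, so queue length serves as the implicit response-rate signal.

```python
def choose_server(outstanding, primary):
    """outstanding maps server -> number of unanswered requests.

    Prefer the primary server; divert to the least-loaded replica
    only when the primary has strictly more requests outstanding
    (i.e. is responding more slowly).
    """
    least = min(outstanding, key=lambda s: (outstanding[s], s != primary))
    if outstanding[primary] <= outstanding[least]:
        return primary
    return least


# Unperturbed: queues are even, so the client keeps reading from its
# primary disk and avoids seek overhead on the replicas.
assert choose_server({"primary": 1, "replica": 1}, "primary") == "primary"
# Perturbed: the primary's queue backs up, so requests shift over.
assert choose_server({"primary": 4, "replica": 1}, "primary") == "replica"
```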
Evaluation of response-rate-based GD
[Graph: delivered bandwidth vs. number of disk nodes perturbed, showing reduced seek overhead]
Historyless vs. history-based adaptivity
- History-based (progress-based):
  - adjustment to perturbation occurs gradually over time
  - close to perfect knowledge, if the information is not outdated
  - extra overhead in sending control information
- Historyless (response-rate-based):
  - primary-server designation
  - possible to increase sensitivity to real perturbation by creating "artificial" perturbation
  - considers the varying performance of data consumers
  - takes longer to converge
Stability and convergence
- How long does it take for the system to converge?
  - linear in the number of nodes
  - depends on the last occurrence of perturbation
  - influenced by the style of communication (implicit vs. explicit)
Server request handoff
- If a server finishes all its requests, it contacts other servers holding the same replicas and helps serve their clients (work stealing)
- Server request handoff keeps all disks busy when possible
- Design decision: how many requests to hand off?
  - depends on the bandwidth history of both servers and on the size of the request queue
  - benefit vs. cost tradeoff
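A hedged sketch of one possible handoff rule (the fraction and threshold are illustrative assumptions, not values from this work): an idle server steals about half of a busy peer's queue, but only when the queue is long enough for the benefit to outweigh the handoff cost.

```python
def requests_to_steal(peer_queue_len, fraction=0.5, min_batch=2):
    """Return how many requests an idle server should take from a
    peer's queue: roughly `fraction` of it, or none if the batch
    would be too small to justify the transfer cost."""
    n = int(peer_queue_len * fraction)
    return n if n >= min_batch else 0


assert requests_to_steal(10) == 5   # plenty of work: take half
assert requests_to_steal(3) == 0    # too little to be worth the handoff
```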
Writes
- Identical to reads, except:
  - create incomplete replicas with "holes"
  - track "holes" in metadata
  - afterward, do "hole-filling", both for availability and for performance robustness
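The hole-tracking idea can be sketched as a toy model (the real metadata would be persistent and block-granular details differ): each replica records which blocks it holds, and the hole-filling pass later copies the missing blocks from a complete replica.

```python
class Replica:
    """Toy model of a GD write replica that may be left with holes."""

    def __init__(self, n_blocks):
        self.n_blocks = n_blocks
        self.present = set()            # block numbers written so far

    def write(self, block):
        self.present.add(block)

    def holes(self):
        """Blocks still missing; this is the metadata consulted by
        the later hole-filling pass."""
        return sorted(set(range(self.n_blocks)) - self.present)

    def fill_from(self, other):
        """Hole-filling: copy missing blocks from another replica."""
        for b in self.holes():
            if b in other.present:
                self.write(b)


# A fast writer skips blocks destined for a slow replica...
r = Replica(5)
for b in (0, 2, 3):
    r.write(b)
assert r.holes() == [1, 4]

# ...and a later pass fills them in from a complete copy.
full = Replica(5)
for b in range(5):
    full.write(b)
r.fill_from(full)
assert r.holes() == []
```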
Conclusions
What did we achieve?
- A new load-balancing algorithm: response-rate-based GD
  - delivers equal bandwidth to parallel-program processes in the face of performance heterogeneity
  - demonstrates the stability of the system
  - reduces seek overhead
- Server request handoff
- Writes
- Creates a useful abstraction for streaming I/O in clusters
Future work
- Hot-file replication: get peak bandwidth after perturbation ceases
- Achieve orderly replies
- Multiple-disks abstraction