Just-in-time Staging of Large Input Data for Supercomputing Jobs Henry Monti, Ali R. ButtSudharshan S. Vazhkudai.

Slides:



Advertisements
Similar presentations
Network Resource Broker for IPTV in Cloud Computing Lei Liang, Dan He University of Surrey, UK OGF 27, G2C Workshop 15 Oct 2009 Banff,
Advertisements

Pastry Peter Druschel, Rice University Antony Rowstron, Microsoft Research UK Some slides are borrowed from the original presentation by the authors.
Digital Library Service – An overview Introduction System Architecture Components and their functionalities Experimental Results.
Peer-to-Peer Systems Chapter 25. What is Peer-to-Peer (P2P)? Napster? Gnutella? Most people think of P2P as music sharing.
The BitTorrent Protocol. What is BitTorrent?  Efficient content distribution system using file swarming. Does not perform all the functions of a typical.
Storage management and caching in PAST, a large-scale, persistent peer-to-peer storage utility Antony Rowstron, Peter Druschel Presented by: Cristian Borcea.
Clayton Sullivan PEER-TO-PEER NETWORKS. INTRODUCTION What is a Peer-To-Peer Network A Peer Application Overlay Network Network Architecture and System.
SplitStream: High- Bandwidth Multicast in Cooperative Environments Monica Tudora.
Efficient and Flexible Parallel Retrieval using Priority Encoded Transmission(2004) CMPT 886 Represented By: Lilong Shi.
Outline for today Structured overlay as infrastructures Survey of design solutions Analysis of designs.
Context-based Information Sharing and Authorization in Mobile Ad Hoc Networks Incorporating QoS Constraints Sanjay Madria, Missouri University of Science.
Peer-to-peer archival data trading Brian Cooper Joint work with Hector Garcia-Molina (and others) Stanford University.
Applications over P2P Structured Overlays Antonino Virgillito.
Peer to Peer File Sharing Huseyin Ozgur TAN. What is Peer-to-Peer?  Every node is designed to(but may not by user choice) provide some service that helps.
Network Coding for Large Scale Content Distribution Christos Gkantsidis Georgia Institute of Technology Pablo Rodriguez Microsoft Research IEEE INFOCOM.
FRIENDS: File Retrieval In a dEcentralized Network Distribution System Steven Huang, Kevin Li Computer Science and Engineering University of California,
1 ITC242 – Introduction to Data Communications Week 12 Topic 18 Chapter 19 Network Management.
Object Naming & Content based Object Search 2/3/2003.
Chord-over-Chord Overlay Sudhindra Rao Ph.D Qualifier Exam Department of ECECS.
1 CS 194: Distributed Systems Distributed Hash Tables Scott Shenker and Ion Stoica Computer Science Division Department of Electrical Engineering and Computer.
High Performance Cooperative Data Distribution [J. Rick Ramstetter, Stephen Jenks] [A scalable, parallel file distribution model conceptually based on.
Wide-area cooperative storage with CFS
Peer-to-peer archival data trading Brian Cooper and Hector Garcia-Molina Stanford University.
Project Mimir A Distributed Filesystem Uses Rateless Erasure Codes for Reliability Uses Pastry’s Multicast System Scribe for Resource discovery and Utilization.
 Structured peer to peer overlay networks are resilient – but not secure.  Even a small fraction of malicious nodes may result in failure of correct.
1CS 6401 Peer-to-Peer Networks Outline Overview Gnutella Structured Overlays BitTorrent.
Middleware for P2P architecture Jikai Yin, Shuai Zhang, Ziwen Zhang.
SIMULATING A MOBILE PEER-TO-PEER NETWORK Simo Sibakov Department of Communications and Networking (Comnet) Helsinki University of Technology Supervisor:
Tapestry GTK Devaroy (07CS1012) Kintali Bala Kishan (07CS1024) G Rahul (07CS3009)
Slicing the Onion: Anonymity Using Unreliable Overlays Sachin Katti Jeffrey Cohen & Dina Katabi.
Exploring VoD in P2P Swarming Systems By Siddhartha Annapureddy, Saikat Guha, Christos Gkantsidis, Dinan Gunawardena, Pablo Rodriguez Presented by Svetlana.
BitTorrent How it applies to networking. What is BitTorrent P2P file sharing protocol Allows users to distribute large amounts of data without placing.
PIC: Practical Internet Coordinates for Distance Estimation Manuel Costa joint work with Miguel Castro, Ant Rowstron, Peter Key Microsoft Research Cambridge.
CS An Overlay Routing Scheme For Moving Large Files Su Zhang Kai Xu.
Distributed Systems Concepts and Design Chapter 10: Peer-to-Peer Systems Bruce Hammer, Steve Wallis, Raymond Ho.
Peer to Peer Research survey TingYang Chang. Intro. Of P2P Computers of the system was known as peers which sharing data files with each other. Build.
1 BitHoc: BitTorrent for wireless ad hoc networks Jointly with: Chadi Barakat Jayeoung Choi Anwar Al Hamra Thierry Turletti EPI PLANETE 28/02/2008 MAESTRO/PLANETE.
UbiStore: Ubiquitous and Opportunistic Backup Architecture. Feiselia Tan, Sebastien Ardon, Max Ott Presented by: Zainab Aljazzaf.
High Throughput Computing on P2P Networks Carlos Pérez Miguel
1 Distributed Hash Tables (DHTs) Lars Jørgen Lillehovde Jo Grimstad Bang Distributed Hash Tables (DHTs)
Optimal Content Delivery with Network Coding Derek Leong, Tracey Ho California Institute of Technology Rebecca Cathey BAE Systems CISS 2009 March 19, 2009.
Communication Paradigm for Sensor Networks Sensor Networks Sensor Networks Directed Diffusion Directed Diffusion SPIN SPIN Ishan Banerjee
An IP Address Based Caching Scheme for Peer-to-Peer Networks Ronaldo Alves Ferreira Joint work with Ananth Grama and Suresh Jagannathan Department of Computer.
Peer-to-Peer Network Tzu-Wei Kuo. Outline What is Peer-to-Peer(P2P)? P2P Architecture Applications Advantages and Weaknesses Security Controversy.
1 ACTIVE FAULT TOLERANT SYSTEM for OPEN DISTRIBUTED COMPUTING (Autonomic and Trusted Computing 2006) Giray Kömürcü.
1 Peer-to-Peer Technologies Seminar by: Kunal Goswami (05IT6006) School of Information Technology Guided by: Prof. C.R.Mandal, School of Information Technology.
A Utility-based Approach to Scheduling Multimedia Streams in P2P Systems Fang Chen Computer Science Dept. University of California, Riverside
DHT-based unicast for mobile ad hoc networks Thomas Zahn, Jochen Schiller Institute of Computer Science Freie Universitat Berlin 報告 : 羅世豪.
1 Secure Peer-to-Peer File Sharing Frans Kaashoek, David Karger, Robert Morris, Ion Stoica, Hari Balakrishnan MIT Laboratory.
6/23/2005 R. GARDNER OSG Baseline Services 1 OSG Baseline Services In my talk I’d like to discuss two questions:  What capabilities are we aiming for.
Plethora: Infrastructure and System Design. Introduction Peer-to-Peer (P2P) networks: –Self-organizing distributed systems –Nodes receive and provide.
Peer to Peer Network Design Discovery and Routing algorithms
Chapter 7: Consistency & Replication IV - REPLICATION MANAGEMENT By Jyothsna Natarajan Instructor: Prof. Yanqing Zhang Course: Advanced Operating Systems.
LOOKING UP DATA IN P2P SYSTEMS Hari Balakrishnan M. Frans Kaashoek David Karger Robert Morris Ion Stoica MIT LCS.
Super Computing 2000 DOE SCIENCE ON THE GRID Storage Resource Management For the Earth Science Grid Scientific Data Management Research Group NERSC, LBNL.
INTERNET TECHNOLOGIES Week 10 Peer to Peer Paradigm 1.
P2P Search COP P2P Search Techniques Centralized P2P systems  e.g. Napster, Decentralized & unstructured P2P systems  e.g. Gnutella.
P2P Content Search: Give the Web Back to the People Matthias Bender Sebastin Michel Peter Triantafillou Gerhard Weikum Christian Zimmer Mariam John CSE.
Large Scale Sharing Marco F. Duarte COMP 520: Distributed Systems September 19, 2004.
Malugo – a scalable peer-to-peer storage system..
1 Plaxton Routing. 2 History Greg Plaxton, Rajmohan Rajaraman, Andrea Richa. Accessing nearby copies of replicated objects, SPAA 1997 Used in several.
OAK RIDGE NATIONAL LABORATORY U. S. DEPARTMENT OF ENERGY The stagesub tool Sudharshan S. Vazhkudai Computer Science Research Group CSMD Oak Ridge National.
NGMAST Mobile DHT Energy1 Optimizing Energy Consumption of Mobile Nodes in Heterogeneous Kademlia-based Distributed Hash Tables Imre Kelényi Budapest.
Presented by Robust Storage Management On Desktop, in Machine Room, and Beyond Xiaosong Ma Computer Science and Mathematics Oak Ridge National Laboratory.
Chord: A Scalable Peer-to-Peer Lookup Service for Internet Applications * CS587x Lecture Department of Computer Science Iowa State University *I. Stoica,
An example of peer-to-peer application
Introduction to BitTorrent
Trustworthiness Management in the Social Internet of Things
Plethora: Infrastructure and System Design
DHT Routing Geometries and Chord
Presentation transcript:

Just-in-time Staging of Large Input Data for Supercomputing Jobs Henry Monti, Ali R. ButtSudharshan S. Vazhkudai

HPC Center Data Stage-in Problem  Data stage-in entails moving all necessary input files for a job to a center’s local storage Requires significant commitment of center resources while waiting for the job to run Storage failures are common, and users may be required to restage data  Delaying input data causes costly job rescheduling  Staging data too early is undesirable From a center standpoint: Wastes scratch space that could be used for other jobs From a user job standpoint: Potential job rescheduling due to storage system failure  Coinciding Input Data Stage-in time with job execution time improves HPC center serviceability 2

Current Methods to Stage-in Data  No standardized method  Utilizes common point-to-point transfer tools: GridFTP, HSI, scp, …  Limitations Entail early stage-in to ensure data availability Do not leverage orthogonal bandwidth Oblivious to deadlines or job start times Not an optimal approach for HPC data stage-in 33

Our Contribution: A Just-in-time Data Stage-in Service  Coincides data stage-in with job start time  Attempts to reduce overall scratch space usage  Uses an army of intermediate storage locations  Allows for quick stage-in during storage failures  Integrates with real-world tools Portable Batch System (PBS) BitTorrent  Provides a fault-tolerant way to stage data while efficiently utilizing scratch space 44

55 Storage system failures may require data to be re-staged! Transfer complete much earlier than job startup time Job Rescheduled!

66 Provides better opportunities for accomplishing JIT Staging Time between stage-in and job start minimal

Challenges Faced in Our Approach 7 1.Obtaining accurate job start times 2.Adapting to dynamic network behavior 3.Ensuring data reliability and availability 4.Managing deadlines during Stage-in 5.Utilizing intermediate nodes 6.Providing incentives to participate

Obtaining Accurate Job Start Times 8  Accurate estimates of job start time needed Prevent job delay with Just-in-time staging  Batch Queue Prediction (BQP) Provides statistical upper bound on job wait Predicts probability of job starting by the deadline  Obtain predictions of job start time from BQP  Stage-in data using this deadline

Adapting Data Distribution To Dynamic Network Behavior  Available bandwidth can change Distribute data randomly - may not be effective Utilize network monitoring  Network Weather Service (NWS) Provides bandwidth Measurement Predicts future bandwidth  Choose dynamically changing data paths  Select enough nodes to satisfy a given deadline  Monitor and update the selected nodes 9 10 Mb/s 5 Mb/s 4 Mb/s 1 Mb/s ?

Protecting Data from Intermediate Storage Location Failure  Problem: Node failure may cause data loss  Solution: 1.Use data replication Achieved through multiple data flow paths 2.Employ Erasure coding Can be done by the user or at the intermediates 10

Managing Deadlines during Stage-in  Use NWS to measure available bandwidths Use Direct if it can meet a deadline Otherwise, perform decentralized stage-in  If end host fails or cannot meet deadline Utilize decentralized stage-in approach 11 T Stage <= T JobStartup

Intermediate Node Discovery  User specifies known and trusted nodes  Utilize P2P Overlay  Nodes advertise their availability to others  Receiving nodes discovers the advertiser  Discovered nodes utilized as necessary 12 Identifier space

P2P Data Storage and Dissemination  P2P-based storage Enables robust storage of data on loosely coupled distributed participants: CFS, PAST, OceanStore, …  P2P-based multicast  Enables application-level one to many communication  Example: BitTorrent Uses a scatter-gather protocol to distribute files Leverages Seeds - peers that store entire files Employs a tracker to maintain lists of peers Uses a “torrent file” containing metadata for data retrieval 13

Incentives to Participate in Stage-in Process  Modern HPC jobs are often collaborative “Virtual Organizations” - set of geographically distributed users from different sites Jobs in TeraGrid usually from such organizations  Resource bartering among participants to facilitate each others stage-in over time  Nodes specified and trusted by the user 14

Integrating Stage-in with PBS  Provide new PBS directives Specifies destination, intermediate nodes, and deadline #PBS -N myjob #PBS -l nodes=128, walltime=12:00 mpirun -np 128 ~/MyComputation #Stagein file://SubmissionSite:/home/user/input1 #InterNode node1.Site1:49665:50GB... #InterNode nodeN.SiteN:49665:30GB #Deadline 1/14/2007:12:00 15

Adapting BitTorrent Functionality to Data Stage-in  Tailor BitTorrent to meet the needs of our stage-in  Restrict the amount of result-data sent to a peer Peers with less storage than the input size can be utilized  Incorporate global information into peer selection Use NWS bandwidth measurements Use knowledge of node capacity from PBS scripts Choose the appropriate nodes with storage capacity  Recipients are not necessarily end-hosts They may simply pass data onward 16

17 Evaluation: Experimental Setup  Objectives Compare with direct transfer, and BitTorrent Validate our method as an alternative to other stage-in methods  PlanetLab test bed 6 PlanetLab nodes: center + end user + 4 intermediate nodes  Experiments: Compare the proposed method with Point-to-point transfer (scp) Standard BitTorrent  Observe the effect of bandwidth changes

18 Results: Data Transfer Times with Respect to Direct Transfer Times are in seconds File Size100 MB240 MB500 MB2.1 GB Direct Client Offload Pull A JIT stage-in is capable of significantly improving transfer times

19 Results: Data Transfer Times with Respect to Standard BitTorrent Times are in seconds Transferring 2.1 GB file PhaseBitTorrentOur Method Send to all intermediate nodes (Client Offload) Submission site download (Pull) Monitoring based stage-in is capable of outperforming standard BitTorrent

Conclusion  A fresh look at Data Stage-in Decentralized approach Monitoring-based adaptation  Considers deadlines and job start times  Integrated with real-world tools  Outperforms direct transfer by 73.3% in our experiments 20

Future Work  Measuring scratch space savings  Measuring potential job delays  Testing other stage-in scenarios  Contact Virginia Tech. Distributed Systems and Storage Lab. {hmonti, ORNL 21

Evaluation Objectives 1.Compare with direct transfer, and BitTorrent 2.Observe how system reacts to failures and bandwidth fluctuations: a.How SLAs are enforced? b.How fault tolerance is achieved? 3.Validate our method as a viable alternative to other stage-in methods 22

23 Overlay Networks P2P networks are self-organizing overlay networks without central control ISP3 ISP1ISP2 Site 1 Site 4 Site 3 Site 2 N NN N N N N

Structured P2P Overlays  Overlays with imposed structure Each node has a unique random nodeId Each message has a key The nodeId and key reside in the same name space  Routing: Takes a message with a key and sends it to a unique node  Implements Distributed Hash Table (DHT) abstraction DHT abstraction is preserved in the presence of node failure/departure Many implementations available, e.g. Pastry, Tapestry, Chord, CAN … 24 Identifier space