Parasol Architecture A mild case of scary asynchronous system stuff.

Initial Design Goals
Handle huge batches of jobs, large clusters
–100,000 blastz jobs on 1000 cpus
Error tolerant
–Transient network glitches, compute node failures, software failures
–Allow easy restart when bugs are fixed
Easy to check status of jobs
–Bundling jobs in batches rather than tracking individual jobs.
Sharing cluster between users
As robust as possible:
–Simple
–Leveraging earlier work on jabba ‘job babysitter’ for Condor scheduler.

Technical Considerations
Very busy networks complicate things:
–Messages may be dropped, so retry logic is required.
–Retry logic means it’s not instant to figure out that a machine is down.
A design where a central scheduler communicates with just one cluster node at a time would be too slow.
Multiple threads/processes can lead to hard-to-debug race conditions.
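The trade-off between retrying and detecting a dead machine can be sketched in a few lines. This is an illustration only, not Parasol's code: `send_with_retry`, `worst_case_delay`, the try count, and the timeout are hypothetical, and the transport is passed in as plain callables so the sketch runs without a network.

```python
from typing import Callable


def send_with_retry(send: Callable[[bytes], None],
                    wait_for_ack: Callable[[float], bool],
                    message: bytes,
                    max_tries: int = 3,
                    timeout: float = 0.5) -> int:
    """Send a message, re-sending until it is acknowledged.

    Returns the number of attempts used, or 0 if every attempt timed out.
    All names and defaults here are assumptions made for illustration.
    """
    for attempt in range(1, max_tries + 1):
        send(message)                # the message may be silently dropped
        if wait_for_ack(timeout):    # True if an ack arrived within timeout
            return attempt
    return 0                         # only now may the peer be presumed down


def worst_case_delay(max_tries: int, timeout: float) -> float:
    """Time spent before a silent machine can be declared down."""
    return max_tries * timeout
```

With three tries and a half-second timeout, a machine that has gone silent is only recognized as down after 1.5 seconds, which is why a scheduler that waits on one node at a time does not scale to a large cluster.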

Process/Thread Architecture
Parasol processes/threads (circles) and message flow (arrows). All processes reside on the scheduling machine except for the node processes. A spoke process can send messages to any node. The hub, spoke, and heartbeat are all threads of a hub process.

Node Process
Runs as root. Forks and changes to user to run job.
Keeps list of last 10 jobs it has finished as well as the ones (one for each CPU) it is working on.
Responds to job-start, job-kill and job-status-query messages.
Sends job-end and job-status messages.
–Job-end message includes error code
–Stores stderr in a local file which it will send to hub on request.
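The node's bookkeeping described above might look roughly like the following. It is a sketch under assumptions: the class `NodeJobTable`, its method names, and the dictionary layout of the job-end message are all hypothetical. Only the limits (one running job per CPU, the last 10 finished jobs) come from the slide.

```python
from collections import deque


class NodeJobTable:
    """Hypothetical per-node job bookkeeping (not Parasol's actual code)."""

    def __init__(self, cpus: int, history: int = 10):
        self.cpus = cpus
        self.running = {}                       # job_id -> command line
        self.finished = deque(maxlen=history)   # (job_id, error_code), oldest dropped

    def start(self, job_id: int, command: str) -> bool:
        """Handle a job-start message; refuse if every CPU is busy."""
        if len(self.running) >= self.cpus:
            return False
        self.running[job_id] = command
        return True

    def end(self, job_id: int, error_code: int) -> dict:
        """Record a finished job and build the job-end message for the hub."""
        self.running.pop(job_id)
        self.finished.append((job_id, error_code))
        return {"type": "job-end", "job": job_id, "error": error_code}

    def status(self, job_id: int) -> str:
        """Answer a job-status-query message."""
        if job_id in self.running:
            return "running"
        if any(done_id == job_id for done_id, _ in self.finished):
            return "finished"
        return "unknown"
```

Keeping a short history lets the node answer a status query about a job whose job-end message was dropped on the network, without the list growing without bound.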

Para client process
The para client manages batches of jobs through the hub. It is designed to catch jobs which may have run into problems of any sort, and give the user a chance to rerun them after the problem is fixed. The major input to para is a job list. Each job can have checks associated with it before and after the job itself is run. Initially para reads the job list and transforms it into a job database.
The central routine of para, paraCycle, reads the job database, queries the hub to see what jobs are running and waiting, looks at the results file to see what jobs are finished, performs output checks on the finished jobs, sends unsubmitted jobs or jobs that need to be rerun to the hub, updates the database in memory, and writes it back out.
The database is in a comma-delimited text format with one job per line. The job database keeps track of the timing and status of each job submission. The code to read and write this database was generated with AutoSql.
para will avoid loading the hub with more than 100,000 jobs at a time, and will only submit failed jobs three times before giving up on them. Para is a direct descendant of the “jabba” wrapper we put around the Condor scheduler.
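The submission step of a cycle like paraCycle can be sketched as below. This is not the AutoSql-generated code and not para's real schema: the three columns (command, status, tries), the status names, and the function names are assumptions chosen for illustration. The two limits are the ones stated above.

```python
import csv
import io

MAX_QUEUED = 100_000   # cap on jobs loaded into the hub at once (from the slide)
MAX_TRIES = 3          # submissions allowed per job before giving up (from the slide)


def load_jobs(text: str) -> list:
    """Parse a comma-delimited job database, one job per line.

    The column layout is hypothetical: command, status, tries.
    """
    return [{"command": row[0], "status": row[1], "tries": int(row[2])}
            for row in csv.reader(io.StringIO(text)) if row]


def pick_jobs_to_submit(jobs: list, in_hub: int) -> list:
    """Choose unsubmitted or failed jobs, respecting both limits."""
    room = MAX_QUEUED - in_hub
    chosen = []
    for job in jobs:
        if room <= 0:
            break
        needs_run = job["status"] in ("unsubmitted", "failed")
        if needs_run and job["tries"] < MAX_TRIES:
            chosen.append(job)
            room -= 1
    return chosen


def save_jobs(jobs: list) -> str:
    """Write the database back out in the same one-job-per-line format."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for job in jobs:
        writer.writerow([job["command"], job["status"], job["tries"]])
    return out.getvalue()
```

Because the whole state lives in a text file that is rewritten each cycle, the client can be stopped and restarted at any point, which fits the goal of easy restart once a bug is fixed.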

Hub Process
Where the rubber really meets the road, the most complex part of the system.
Multithreaded around a central message queue.
Talk goes into chalk-talk mode here

Chalk talk outline
Message queue synchronization
Revisit architecture diagram
Message passing
–UDP between processes
–Message queue between threads of hub
Main thread eats messages from queue, sends messages to spokes, clients.
Other threads are so simple that it is easy to see they don’t write anything the main thread uses other than the message queue.
Main thread designed to respond to any one message quickly, deferring longer stuff to spoke.
Heartbeat messages trigger status checks, cleanup.
Main data structures: machine, user, batch, job
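The threading discipline in this outline can be illustrated with a toy hub. Everything named here is an assumption for the sake of the sketch: `run_hub`, `spoke`, the tuple message format, and the message names are hypothetical, and the spoke merely pretends to talk to a node. What the sketch does reflect is the rule that only the main thread touches shared state, with every other thread communicating through the queue alone.

```python
import queue
import threading


def spoke(work: queue.Queue, hub_queue: queue.Queue) -> None:
    """Spoke thread: does the slow part, reports back only via the hub queue."""
    while True:
        job = work.get()
        if job is None:                    # shutdown signal from the main thread
            break
        # Stand-in for the slow exchange with a node (UDP in the real system).
        hub_queue.put(("job-end", job))


def run_hub(jobs: list, n_spokes: int = 2) -> list:
    """Toy hub main loop; assumes job names are unique."""
    hub_queue = queue.Queue()              # the central message queue
    work = queue.Queue()                   # hand-off from main thread to spokes
    spokes = [threading.Thread(target=spoke, args=(work, hub_queue))
              for _ in range(n_spokes)]
    for thread in spokes:
        thread.start()

    # State below is read and written by the main thread only, so no locks.
    pending = set(jobs)
    done = []

    for job in jobs:
        hub_queue.put(("add-job", job))    # as if sent by a client
    hub_queue.put(("heartbeat", None))     # as if sent by the heartbeat thread

    while pending:
        kind, payload = hub_queue.get()
        if kind == "add-job":
            work.put(payload)              # respond quickly, defer slow work
        elif kind == "job-end":
            pending.discard(payload)
            done.append(payload)
        elif kind == "heartbeat":
            pass                           # status checks and cleanup would go here

    for _ in spokes:
        work.put(None)
    for thread in spokes:
        thread.join()
    return sorted(done)
```

Since the queue is the only object shared between threads, the race conditions mentioned under Technical Considerations are confined to one well-tested data structure.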