Data Pipelines: Real-Life, Fully Automated, Fault-Tolerant Data Movement and Processing
George Kola
Computer Sciences Department, University of Wisconsin-Madison

Slide 2: Outline
› What users want?
› Data pipeline overview
› Real-life data pipelines
  • NCSA and WCER pipelines
› Conclusions

Slide 3: What users want?
› Make data available at different sites
› Process data and make results available at different sites
› Use distributed computing resources for processing
› Full automation and fault tolerance

Slide 4: What users want? (continued)
› Can we press a button and expect it to complete?
› Can we stop worrying about failures?
› Can we get acceptable throughput?
› Yes... the data pipeline is the solution!

Slide 5: Data Pipeline Overview
› Fully automated framework for data movement and processing
› Fault-tolerant and resilient to failures
  • Understands failures and handles them
› Self-tuning
› Rich statistics
› Dynamic visualization of system state

Slide 6: Data Pipeline Design
› Data placement and computation are treated as full-fledged jobs
› Data placement handled by Stork
› Computation handled by Condor / Condor-G
› Dependencies between jobs handled by DAGMan (see the sketch below)
› Tunable statistics generation/collection tool
› Visualization handled by DEVise
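To make the division of labor concrete, here is a minimal, hypothetical DAG for moving and processing a single file. The node and file names are illustrative, and the DATA keyword is how Stork placement jobs were described to DAGMan in that era of Condor; treat the exact syntax as an assumption rather than the pipeline's actual submit files.

    # Hypothetical DAG for one file; names are illustrative.
    # Stork handles both placement steps, Condor runs the processing step,
    # and DAGMan enforces the ordering between them.
    DATA stage_in  stage_in.stork
    JOB  process   process.submit
    DATA stage_out stage_out.stork
    PARENT stage_in CHILD process
    PARENT process  CHILD stage_out

Because placement and computation are both first-class DAG nodes, DAGMan can order, retry, and log them uniformly.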

Slide 7: Fault Tolerance
› Failures make automation difficult
› A variety of failures happen in real life
  • Network, software, hardware
› The system is designed with failure in mind
› Hierarchical fault tolerance (see the retry sketch below)
  • Stork/Condor, DAGMan
› Understands failures
  • Stork switches protocols
› Persistent logging; recovers from machine crashes
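One concrete layer of this hierarchy is DAGMan's per-node retry. A hedged sketch, continuing the hypothetical DAG above (the retry counts are illustrative):

    # Resubmit a failed node a few times before giving up on it.
    # DAGMan also writes a rescue DAG, so a run interrupted by a
    # machine crash can resume where it stopped.
    RETRY stage_in  5
    RETRY process   3
    RETRY stage_out 5

Below DAGMan, Stork applies its own retries and can fall back to a different transfer protocol when one keeps failing, which is the "understands failures" layer on the slide.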

Slide 8: Self-Tuning
› Users are domain experts, not necessarily computer experts
› Data movement is tuned using (see the worked example below)
  • Storage system characteristics
  • Dynamic network characteristics
› Computation is scheduled on data availability
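As one example of what tuning from network characteristics means: on a path with roughly 100 Mb/s of available bandwidth and a 60 ms round-trip time, the bandwidth-delay product is about 100 Mb/s x 0.06 s, roughly 750 KB, so TCP buffers of about that size (or a few parallel streams) are needed to fill the pipe. The numbers and the manual command below are purely illustrative, with made-up host and file names; the pipeline derives such settings automatically rather than asking the user to type them.

    # Illustrative only: hand-applying the kind of tuning the pipeline derives
    # automatically (TCP buffer from bandwidth x RTT, a few parallel streams).
    globus-url-copy -tcp-bs 750000 -p 4 \
        gsiftp://source.host.example/data/image001.fits \
        file:///staging/image001.fits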

Slide 9: Statistics / Visualization
› Network statistics
› Job run times, data transfer times
› Tunable statistics collection
› Statistics are loaded into a Postgres database (sketched below)
› Interesting facts can be derived from the data
› Dynamic system visualization using DEVise
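A minimal sketch of what such a statistics table might look like. The schema and column names are assumptions (the talk does not give the actual one); it only illustrates the kind of per-transfer record that makes later analysis and the DEVise plots possible.

    -- Hypothetical per-transfer record; the actual schema is not given in the talk.
    CREATE TABLE transfer_stats (
        transfer_id  serial PRIMARY KEY,
        src_host     text,
        dest_host    text,
        protocol     text,
        bytes        bigint,
        started_at   timestamp,
        finished_at  timestamp,
        succeeded    boolean
    );

Throughput, per-protocol failure rates, and run-time trends then fall out of simple queries over records like these.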

Slide 10: Real-Life Data Pipelines
› Astronomy data processing pipeline
  • ~3 TB (2611 x 1.1 GB files)
  • Joint work with Robert Brunner, Michael Remijan et al. at NCSA
› WCER educational video pipeline
  • ~6 TB (13 GB files)
  • Joint work with Chris Thorn et al. at WCER

Slide 11: DPOSS Data
The Palomar Digital Sky Survey (DPOSS)
› Palomar-Oschin photographic plates used to map one half of the celestial sphere
› Each photographic plate is digitized into a single image
› Calibration done by a software pipeline at Caltech
› Goal: run SExtractor on the images (a submit-file sketch follows)
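The processing step on each image is an ordinary Condor job. A hedged sketch of what such a submit description could look like; the executable name, arguments, and file names are assumptions, not the actual NCSA submit files.

    # Hypothetical Condor submit description for one SExtractor run;
    # executable, arguments, and file names are assumptions.
    universe                = vanilla
    executable              = sextractor
    arguments               = image001.fits -c default.sex
    transfer_input_files    = image001.fits, default.sex
    should_transfer_files   = YES
    when_to_transfer_output = ON_EXIT
    output                  = sextractor_001.out
    error                   = sextractor_001.err
    log                     = sextractor_001.log
    queue

One such job per image, plus the staging jobs around it, is what the DAG sketched earlier would tie together.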

Slide 12: NCSA Pipeline
[Pipeline diagram: staging, Condor processing, and staging again, with input and output data flows marked.]

Slide 13: NCSA Pipeline
› Moved and processed 3 TB of DPOSS image data in under 6 days
  • Most powerful astronomy data processing facility!
› Can be adapted for other petabyte-scale datasets: Quest2, CARMA, NOAO, NRAO, LSST
› Key component in future astronomy cyberinfrastructure
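For scale: 3 TB in six days is roughly 500 GB per day, i.e. a sustained end-to-end rate on the order of 6 MB/s covering staging, processing, and shipping results, maintained without manual intervention.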

Slide 14: WCER Pipeline
› Need to convert DV videos to MPEG-1, MPEG-2, and MPEG-4
› Each one-hour video is 13 GB
› Videos are made accessible through the Transana software
› Need to stage the original and processed videos to SDSC

Slide 15: WCER Pipeline
› First attempt at distributed video processing at this scale
› Decoder problems with the large 13 GB files
› Uses bleeding-edge technology

Encoding   Resolution          File size   Average time
MPEG-1     Half (320 x 240)    600 MB      2 hours
MPEG-2     Full (720 x 480)    2 GB        8 hours
MPEG-4     Half (320 x 480)    250 MB      4 hours
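Putting the numbers together shows why this is farmed out through Condor rather than run on one machine: at 13 GB per one-hour video, ~6 TB is on the order of 450 videos, and the MPEG-2 pass alone at 8 hours each is well over 3,000 CPU-hours; adding the MPEG-1 and MPEG-4 passes pushes a single-machine run out to many months.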

Slide 16: WCER Pipeline
[Pipeline diagram: WCER, staging, Condor processing, and SRB, with input and output data flows marked.]

Slide 17: Conclusion
› Large-scale data movement and processing can be fully automated!
› Successfully processed terabytes of data
› Data pipelines are useful for diverse fields
› We have shown two working case studies, in astronomy and in educational research
› We are working with our collaborators to make this production quality

[Detached pipeline diagram (apparently the detailed NCSA pipeline): the user submits a DAG at the management site; staging uses DiskRouter, globus-url-copy, MSScmd/UniTree, and the Condor file transfer mechanism; processing runs under Condor; control flow and input/output data flows are marked.]

Slide 18: Questions
› Thanks for listening
› Contact information:
  • George Kola
  • Tevfik Kosar
  • Office: 3361 Computer Science
› Collaborators:
  • NCSA: Robert Brunner
  • NCSA: Michael Remijan
  • WCER: Chris Thorn