Skeleton Key: Sharing Data Across Campus Infrastructures
Suchandra Thapa, Computation Institute / University of Chicago
January 24, 2013

Introduction
Data and software access challenges in campus infrastructures
"Unlocking doors" using Skeleton Key
Future Steps

Campus Infrastructure Considerations
Computation is relatively well understood: Condor and Campus Factory/BOSCO allow jobs to be flocked and moved between loosely coupled clusters, but…
How to handle data and software access?

Considerations for Campus Infrastructure Data Access
Need to have secure access to data
Don't want to force users to use X.509 certificates
Need to be able to expand to support applications running on OSG and accessing data
Must be fairly simple for users

Possible solution using Parrot and Chirp
Software components that provide user applications secure remote access to files on a given system
All done in user space – so no need for root access
Provides a solution to the data problem:
– Users keep their data on local storage and use Chirp and Parrot to allow their applications to access it regardless of where the applications are running (see the sketch below)
– Track record of successful use by other groups in OSG (e.g. the UW-Madison group)
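
As a rough sketch of the idea (illustrative only, not taken from the talk; the host name, paths, and flags are placeholders and should be checked against the CCTools documentation), a user might export a directory with chirp_server and then read it from a job running under parrot_run:

  # On the data host: export a local directory through Chirp, entirely in
  # user space (no root access required); -r selects the exported root.
  chirp_server -r /home/user/mydata &

  # On a worker node, possibly on a different cluster: run the application
  # under Parrot so the remote Chirp export looks like an ordinary path.
  # data.example.edu is a placeholder host name.
  parrot_run ls /chirp/data.example.edu/
  parrot_run ./my_analysis /chirp/data.example.edu/run1.dat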

A Parrot-Chirp system: basic idea

Advantages of using Parrot
Allows applications to use remote file systems as if they were mounted locally
Works behind the scenes to make remote files look like they are present on the local filesystem
Supports remote access to CVMFS repositories and to file systems (when using Chirp)
Can be downloaded and run from a user's home directory or scratch space
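
For example (again an illustrative sketch, not from the slides), Parrot can expose a CVMFS repository to an unprivileged job even on a node with no CVMFS client installed; the repository name below is a placeholder, and depending on the repository Parrot may need extra configuration such as keys or proxy settings:

  # The /cvmfs namespace is provided by Parrot itself, so no FUSE mount or
  # root privileges are needed on the worker node.
  parrot_run ls /cvmfs/software.example.org
  parrot_run /cvmfs/software.example.org/bin/some_tool --version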

Why Chirp?
Server software that acts as a proxy for accessing a local or HDFS filesystem
Run in user space, by the user
Can use several different authentication methods (Unix, tickets, X.509 certificates, hostname verification)
Primarily interested in tickets because they allow access from applications running on arbitrary clusters without the overhead of X.509 certificates
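
A minimal server-side sketch, assuming the stock CCTools tools; the directory and the choice of methods are illustrative, and the exact ticket-creation syntax should be taken from the Chirp manual:

  # Start a Chirp server in user space with Unix and ticket authentication
  # enabled (the -a option can be repeated to allow several methods),
  # exporting a chosen directory as the served root.
  chirp_server -r /home/user/mydata -a unix -a ticket &

  # Access control is per directory; the chirp client tool can then be used
  # to inspect the export and to create tickets (see "ticket_create" in the
  # Chirp documentation) that grant jobs access without X.509 certificates.
  chirp data.example.edu ls /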

Skeleton Key aims to simplify
Skeleton Key in a nutshell:
– Based on Chirp and Parrot
– Hides some of the complexity of using them
– The user specifies application parameters and what needs to be shared in a configuration file
– Skeleton Key then creates a wrapper and invokes the application in a way that hides the details of accessing diverse data resources

Workflow for generated scripts

Using Skeleton Key
Wrapper configuration is done using an easy-to-understand configuration file
Generates a shell script that can then be used in a job manager submit file, or even copied to another system and run
Example run on a data server: skeleton_key -c path_to_config_file
This produces a script (job_script.sh) that can then be used in a Condor submit file
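
Putting this together, a typical invocation might look like the following; the configuration file name is a placeholder, and job_script.sh is the name used in the examples on the following slides:

  # Generate the wrapper script from a Skeleton Key configuration file.
  skeleton_key -c skeleton_key.config    # produces job_script.sh

  # The generated script is self-contained: run it directly, copy it to
  # another system, or name it as the executable in an HTCondor submit file.
  ./job_script.sh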

Example of a Configuration File

[Directories]
chirp_base = /mnt/hadoop/sthapa
write = /, chirp, chirp/stats

[Application]
location =
script = ./benchmark/get_chirp_performance.sh
arguments =

Notes from the slide:
– arguments: arguments passed to the script or binary; arguments can also be given in the Condor submit file
– Location where data and directories can be accessed using a FUSE mount

Using Skeleton Key output in an HTCondor submit file

universe = vanilla
executable = ./job_script.sh
arguments = $(Process)
notification = Error
input =
output = /tmp/chirp_job.out.$(Process)
error = /tmp/chirp_job.err.$(Process)
log = /tmp/chirp_job.log
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
queue 40

Notes from the slide:
– executable: the shell script generated by Skeleton Key
– arguments: additional arguments passed to the user script
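
Saved to a file, the submit description above can be handed to condor_submit to launch the 40 benchmark clients discussed on the next slides; the submit-file name here is arbitrary:

  condor_submit chirp_benchmark.submit
  condor_q
  # Per-job output and error files then appear under /tmp as configured
  # above, e.g. /tmp/chirp_job.out.0 through /tmp/chirp_job.out.39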

What's the performance?
Ran benchmarks to compare data access using Chirp + Parrot against a FUSE-mounted HDFS filesystem
Both cases had 40 clients simultaneously accessing the HDFS filesystem
Clients were run using Condor to schedule jobs onto lightly loaded clusters in order to more closely simulate actual user jobs

Read performance using Parrot/Chirp with HDFS backend

Write performance using Parrot/Chirp with HDFS backend

Outbound data rates using Parrot/Chirp with HDFS

Inbound data rates using Parrot/Chirp with HDFS

Chirp/Parrot network speeds when using HDFS backend
Inbound and outbound bandwidth used is almost identical, since Chirp is acting as a proxy to the HDFS filesystem
Chirp/Parrot utilizes approximately 400 MB/s, although it has extended peaks at 700 MB/s
Currently investigating optimizations to get better performance and even out traffic

Chirp/Parrot read performance using a POSIX FS backend

Chirp/Parrot write performance using a POSIX FS backend

Outbound data rates using Parrot/Chirp with POSIX filesystem
– The benchmark initially writes to the filesystem, so very few reads occur

Inbound data rates using Parrot/Chirp with POSIX filesystem
– Writes from the first half of the benchmark
– Most clients completed writes and are reading from Chirp

Chirp/Parrot network speeds when using POSIX backend
Chirp is serving data from a locally mounted filesystem, so inbound and outbound traffic is not tightly coupled
Limited by the I/O speed of the hardware (2 drives in a RAID1 array): ~400 MB/s

Mathematica runtimes
Used a simple Mathematica script to calculate the Mandelbrot set and compared the runtime when running Mathematica from local disk vs. over CVMFS using Parrot
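
A sketch of the comparison being made (not the actual benchmark script from the talk): time the same Mandelbrot script against a locally installed Mathematica and against one served from a CVMFS repository through Parrot. The repository name, paths, and kernel invocation are placeholders:

  # Locally installed Mathematica kernel.
  time math -script mandelbrot.m

  # Same script, with the Mathematica installation delivered from CVMFS and
  # accessed through Parrot; repository name and path are placeholders.
  time parrot_run /cvmfs/software.example.org/mathematica/bin/math -script mandelbrot.m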

Mathematica runtimes using local filesystem

Mathematica runtime using Parrot/CVMFS

Mathematica runtimes continued
Running Mathematica using Parrot/CVMFS takes 480.7±330.3 s, while running it on the local filesystem takes about 15.9±2.7 s
Roughly a factor of 30 slower when run using Parrot/CVMFS
The runtime drops to below 60 s if Mathematica is run again in the same session; the majority of the runtime of the initial invocation is due to the latency of fetching files and filling Parrot's local cache
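
To see the cache effect directly, one could time two back-to-back runs inside a single Parrot session (illustrative only; the repository path is a placeholder): the first run pays the CVMFS fetch latency, the second hits Parrot's local cache.

  parrot_run bash -c '
    time /cvmfs/software.example.org/mathematica/bin/math -script mandelbrot.m   # cold cache
    time /cvmfs/software.example.org/mathematica/bin/math -script mandelbrot.m   # warm cache
  '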

Conclusion
Skeleton Key provides a convenient way to use Chirp and Parrot to remotely access data and software
Performance is fairly good for client access
Future directions:
– Expand to other users and add enhancements based on user feedback
Questions?

Further information
Skeleton Key:
– Git:
– Documentation:
Chirp, Parrot, HDFS:
– Douglas Thain and Miron Livny, "Parrot: An Application Environment for Data-Intensive Computing", Scalable Computing: Practice and Experience, 6(3), pages 9-18
– Douglas Thain, Christopher Moretti, and Jeffrey Hemmes, "Chirp: A Practical Global Filesystem for Cluster and Grid Computing", Journal of Grid Computing, 7(1), pages 51-72
– Patrick Donnelly, Peter Bui, and Douglas Thain, "Attaching Cloud Storage to a Campus Grid Using Parrot, Chirp, and Hadoop", IEEE International Conference on Cloud Computing Technology and Science

Acknowledgements
CCTools team, Dan (UW-Madison)
Colleagues at UC3:
– Lincoln Bryant, Marco Mambelli, Rob Gardner