Open Science Grid: More compute power
Alan De Smet


[Chart: CHTC Cores In Use, on the order of 1,500 (CPU days each day, averaged over one month)]

[Chart: OSG Cores In Use, on the order of 60,000 (CPU days each day, averaged over one month)]

Open Science Grid

[Chart: CHTC and OSG usage (CPU days each day)]

Challenges Solved

We worry about all of this. You don't have to.

› Authentication
  • X.509 certificates, certificate authorities, VOMS
› Interface
  • Globus, GridFTP, Grid universe
› Validation
  • Linux distribution, glibc version, basic libraries

Using OSG
› Before:

  universe   = vanilla
  executable = myjob
  log        = myjob.log
  queue

Using OSG
› After:

  universe     = vanilla
  executable   = myjob
  log          = myjob.log
  +WantGlidein = true
  queue
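The job is submitted the same way as before, for example with condor_submit (the submit file name myjob.sub is hypothetical):

  condor_submit myjob.sub

The only change is the +WantGlidein = true attribute, which at CHTC flags the job as eligible to run on OSG glidein slots in addition to local machines.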

Challenge: Opportunistic
› OSG computers go away without notice
› Solutions
  • Condor restarts automatically
  • Sub-hour jobs
  • Self-checkpointing
  • Automated checkpointing: Condor's standard universe, DMTCP (sketch below)
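As a minimal sketch of the standard-universe option (the executable name is hypothetical): the program is first relinked with condor_compile, after which Condor checkpoints it periodically and resumes it from the last checkpoint when the job lands on another machine.

  # submit file for an automatically checkpointed job
  # (myjob.condor is the binary relinked with condor_compile)
  universe   = standard
  executable = myjob.condor
  log        = myjob.log
  queue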

Challenge: Local Software

Challenge: Local Software
› Bare-bones Linux systems
› Solution
  • Bring everything with you (sketch below)
  • CHTC-provided MATLAB and R packages
    - RunDagEnv/mkdag
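A minimal sketch of "bring everything with you" (the wrapper script and tarball names are hypothetical): the submit file ships a tarball of the R installation plus the analysis script, and a small wrapper unpacks the tarball on the execute machine before running R.

  # run_r_job.sh is a hypothetical wrapper that unpacks R.tar.gz,
  # then runs myscript.R with the unpacked R
  universe                = vanilla
  executable              = run_r_job.sh
  transfer_input_files    = R.tar.gz, myscript.R
  should_transfer_files   = YES
  when_to_transfer_output = ON_EXIT
  log                     = r_job.log
  +WantGlidein            = true
  queue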

Challenge: Erratic Failures
› Complex systems fail sometimes
› Solution
  • Expect failures and automatically retry (sketch below)
  • DAGMan for retries
  • DAGMan POST scripts to detect problems
    - RunDagEnv/mkdag
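A minimal DAGMan sketch of retries plus a POST check (the file and script names are hypothetical): the POST script inspects node A's output and exits non-zero if something is wrong, and RETRY tells DAGMan to re-run the node up to three times before declaring it failed.

  # pipeline.dag
  # check_stepA_output.sh exits non-zero to mark node A as failed;
  # RETRY re-runs node A up to 3 times on failure
  JOB A stepA.submit
  SCRIPT POST A check_stepA_output.sh
  RETRY A 3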

Challenge: Bandwidth
› Solutions
  • Only send what you need (sketch below)
  • Store large, shared files in our web cache
  • Read small amounts of data on the fly
    - Condor's standard universe
    - Parrot
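A sketch under assumptions (the host and file names are hypothetical): the submit file transfers only the small job-specific input from the submit machine and names the large shared data file by URL, so HTCondor fetches it over HTTP; whether that fetch goes through the site's web cache depends on the local configuration.

  # only the small job-specific input is shipped from the submit machine;
  # the large shared file is fetched by URL (host name is hypothetical)
  universe                = vanilla
  executable              = myjob
  transfer_input_files    = job_input.dat, http://cache.example.edu/shared/reference.tar.gz
  should_transfer_files   = YES
  when_to_transfer_output = ON_EXIT
  log                     = myjob.log
  queue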