HTPC - High Throughput Parallel Computing (on the OSG)
Dan Fraser, UChicago (OSG Production Coordinator)
Horst Severini, OU
(Greg Thain, UWisc)
OU Supercomputing Symposium, Oct 6, 2010

Rough Outline
- What is the OSG? (think eBay)
- HTPC as a new paradigm
- Advantages of HTPC for parallel jobs
- How does HTPC work?
- Who is using it?
- The future
- Conclusions

Making sense of the OSG
OSG = Technology + Process + Sociology
- 70+ sites (& growing) -- Supply: they contribute resources to the OSG
- Virtual Organizations -- Demand: VOs are multidisciplinary research groups
- Sites and VOs often overlap
OSG delivers:
- ~1M CPU hours every day
- ~1 PByte of data transferred every day

eBay (naïve)
[Diagram: eBay sits between Supply and Demand -- sellers list items, buyers search and buy]

eBay (more realistic)
[Diagram: a match maker between Demand and Supply, surrounded by supporting services -- accounting, ratings, complaints, PayPal, banks, UPS endpoints]

OSG-Bay
[Diagram: OSG as the match maker between VOs (Demand) and Resource Providers (Supply), with supporting services -- client stack, server stack, match making, accounting, monitoring, integration & testing, engagement support, software packaging, ticketing, servers & storage, security, coordination, proposal anchoring, ...]

Where does HTPC fit?

The two familiar HPC models
- High Throughput Computing (e.g. OSG): run ensembles of single-core jobs
- Capability Computing (e.g. TeraGrid): a few jobs parallelized over the whole system, using whatever parallel s/w is on the system

HTPC – an emerging model
- Ensembles of small-way parallel jobs (10's – 1000's)
- Use whatever parallel s/w you want (it ships with the job)

Tackling Four Problems
- Parallel job portability
- Effective use of multi-core technologies
- Identifying suitable resources & submitting jobs
- Job management, tracking, accounting, ...

Current plan of attack
- Force jobs to consume an entire processor: today 4-8+ cores, tomorrow 32+ cores, ...
- Package jobs with a parallel library (MPI, OpenMP, your own scripts, ...), so HTPC jobs are as portable as any other job
- Parallel libraries can be optimized for on-board memory access; all memory is available for efficient utilization
- Submit the jobs via OSG (or Condor-G)

Problem areas
- Advertising HTPC capability on OSG
- Adapting OSG job submission/mgmt tools (GlideinWMS)
- Ensuring that Gratia accounting can identify HTPC jobs and apply the correct multiplier
- Supporting more HTPC scientists
- HTPC-enabling more sites

What’s the magic RSL? It is site specific; we’re working on documents/standards.
- PBS: (host_xcount=1)(xcount=8)(queue=?)
- LSF: (queue=?)(exclusive=1)
- Condor: (condorsubmit=(‘+WholeMachine’ true))

Examples of HTPC users: Oceanographers
- Brian Blanton, Howard Lander (RENCI): redrawing flood map boundaries
- ADCIRC, a coastal circulation and storm surge model; runs on 256+ cores for several days
- Parameter sensitivity studies to determine the best settings for large runs: 220 jobs to determine optimal mesh size, each job taking 8 processors for several hours

Examples of HTPC users: Chemists
- UW Chemistry group, running Gromacs
- Jobs take 24 hours on 8 cores
- Steady stream of jobs/day
- Peak usage is 320,000 hours per month
- 9 papers written in 10 months based on this

Chemistry Usage of HTPC

OSG sites that allow HTPC
- OU (the first site to run HTPC jobs on the OSG!)
- Purdue
- Clemson
- Nebraska
- San Diego, CMS Tier-2
Your site can be on this list!

Future Directions
- More sites, more cycles!
- More users: working with ATLAS (AthenaMP) and with Amber 9; there is room for you...
- Use glide-in to homogenize access

Conclusions
- HTPC adds a new dimension to HPC computing: ensembles of parallel jobs
- This approach minimizes portability issues with parallel codes
- It keeps the same job submission model
- Not hypothetical – we're already running HTPC jobs
- Thanks to many helping hands

Additional Slides Some of these are from Greg Thain (UWisc)

The players
- Dan Fraser, Computation Inst., University of Chicago
- Miron Livny, U Wisconsin
- John McGee, RENCI
- Greg Thain, U Wisconsin (key developer)
Funded by NSF-STCI

Configuring Condor for HTPC
Two strategies:
- Suspend/drain jobs to open HTPC slots
- Hold empty cores until an HTPC slot is open
See http://condor-wiki.cs.wisc.edu
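A minimal configuration sketch of the whole-machine-slot approach, assuming the recipe pointed to by the wiki link above; the slot-type number and percentages are illustrative, and the suspend/drain policy expressions are omitted:

# Sketch only: one slot that owns the whole machine, advertised as HTPC-capable.
SLOT_TYPE_1 = cpus=100%, ram=100%, swap=100%, disk=100%
NUM_SLOTS_TYPE_1 = 1

# Custom startd attribute matched by the submit files on the next slides.
CAN_RUN_WHOLE_MACHINE = True
STARTD_ATTRS = $(STARTD_ATTRS) CAN_RUN_WHOLE_MACHINE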

How to submit

universe = vanilla
requirements = (CAN_RUN_WHOLE_MACHINE =?= TRUE)
+RequiresWholeMachine = true
executable = some job
arguments = arguments
should_transfer_files = yes
when_to_transfer_output = on_exit
transfer_input_files = inputs
queue
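Assuming the description above were saved as whole_machine.sub (a file name chosen here for illustration), it would be submitted and watched with the usual Condor tools:

condor_submit whole_machine.sub
condor_q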

MPI on whole machine jobs
Whole machine MPI submit file:

universe = vanilla
requirements = (CAN_RUN_WHOLE_MACHINE =?= TRUE)
+RequiresWholeMachine = true
executable = mpiexec
arguments = -np 8 real_exe
should_transfer_files = yes
when_to_transfer_output = on_exit
transfer_input_files = real_exe
queue

How to submit to OSG

universe = grid
GridResource = some_grid_host
GlobusRSL = MagicRSL
executable = wrapper.sh
arguments = arguments
should_transfer_files = yes
when_to_transfer_output = on_exit
transfer_input_files = inputs
transfer_output_files = output
queue
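For concreteness, a sketch of the same file aimed at a hypothetical PBS-based site, filling in the PBS string from the "What's the magic RSL?" slide; the gatekeeper host, queue name, and exact keyword spellings (grid_resource, globus_rsl) are assumptions and vary by site:

universe = grid
grid_resource = gt2 gatekeeper.example.edu/jobmanager-pbs
globus_rsl = (host_xcount=1)(xcount=8)(queue=workq)
executable = wrapper.sh
should_transfer_files = yes
when_to_transfer_output = on_exit
transfer_input_files = inputs
transfer_output_files = output
queue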