Overview of Wisconsin Campus Grid
Dan Bradley, Center for High-Throughput Computing

Technology

HTCondor

Campus deployment at a glance: pools: 5, submit nodes: 50, user groups: 106, execute nodes: 1,600, cores: 10,000.

Networking: submit machines flock to Condor pools across the campus firewall. Open one port and use shared_port on the submit machine. If execute nodes are behind NAT but have outgoing network access, use CCB.

Example submit file (queues 1,000 jobs):

executable = a.out
RequestMemory = 1000
output = stdout
error = stderr
queue 1000
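A minimal configuration sketch of the networking setup described above (shared_port on the submit machine, flocking across the firewall, CCB for NATed execute nodes); the pool host names are placeholders, not the actual campus configuration:

# condor_config on the submit machine
# shared_port funnels all daemon traffic through one inbound port (9618 by default),
# so only that port needs to be opened in the firewall
USE_SHARED_PORT = TRUE
# flock jobs to other campus pools (hypothetical collector names)
FLOCK_TO = pool1.example.wisc.edu, pool2.example.wisc.edu

# condor_config on execute nodes behind NAT with outgoing connectivity:
# register with the Condor Connection Broker running alongside the pool collector
CCB_ADDRESS = $(COLLECTOR_HOST)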

Accessing Files

No campus-wide shared FS. HTCondor file transfer for most cases (example submit file below):
- Send software + input files to job
- Grind, grind, …
- Send output files back to submit node

Some other cases:
- AFS: works on most of campus, but not across OSG
- httpd + SQUID(s): when transfer from the submit node doesn't scale
- CVMFS: read-only HTTP filesystem (see talk tomorrow)
- HDFS: big datasets on lots of disks
- Xrootd: good for access from anywhere; used on top of HDFS and local FS
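A minimal sketch of the HTCondor file-transfer pattern described above; the executable and file names are hypothetical:

executable = analyze.sh
# ship the software and inputs with the job; no shared filesystem assumed
transfer_input_files = software.tar.gz, input.dat
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
# results come back to the job's directory on the submit node
transfer_output_files = results.dat
output = stdout
error = stderr
queue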

Managing Workflows

A simple submit file works for many users:
- We provide an example job wrapper script to help download and set up common software packages: MATLAB, python, R

DAGMan is used by many others. A common pattern (sketched below):
- User drops files into a directory structure
- Script generates a DAG from that
- Rinse, lather, repeat

Some application portals are also used, e.g. the NEOS Online Optimization Service.
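A minimal sketch of the DAGMan pattern above, where a script writes a .dag file from the dropped-in directories; the job names and submit-file names are hypothetical:

# workflow.dag, generated by the user's script
JOB prepare prepare.sub
JOB analyze analyze.sub
JOB collect collect.sub
PARENT prepare CHILD analyze
PARENT analyze CHILD collect

# submitted with: condor_submit_dag workflow.dag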

Overflowing to OSG

glideinWMS:
- We run a glideinWMS "frontend"
- Uses OSG glidein factories
- Appears to users as just another pool to flock to, but jobs must opt in: +WantGlidein = True (see the snippet below)

We customize glideins to make them look more like other nodes on campus: publish OS version, glibc version, CVMFS availability.

[Usage chart: million hours used]
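A sketch of the opt-in from the job's side: the custom attribute goes in the submit file, and the job can additionally require attributes the glideins publish (the commented requirements line is illustrative, not the site's actual policy):

executable = a.out
# opt in to running on OSG glideins as well as campus nodes
+WantGlidein = True
# optionally key off glidein-published attributes, e.g. CVMFS availability
# requirements = (HAS_CVMFS =?= True)
output = stdout
error = stderr
queue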

A Clinical Health Application

Tyler Churchill: modeling cochlear implants to improve signal processing. Used OSG + campus resources to run simulations that include acoustic temporal fine structure, which is typically ignored because it is difficult to model.

"We can't do much about sound resolution given hardware limitations, but we can improve the integrated software. OSG and distributed high-throughput computing are helping us rapidly produce results that directly benefit CI wearers."

Engaging Users

Engaging Users

Meet with individuals (PI + techs):
- Diagram the workflow
- How much input, output, memory, time?
- Suitable for exporting to OSG?
- Where will the output go?
- What software is needed? Licenses?

Tech support as needed. Periodic reviews.

Training Users

Workshops on campus:
- New users can learn about HTCondor, OSG, etc.
- Existing groups can send new students
- Show examples of what others have done

Classes:
- Scripting for scientific users: python, perl, submitting batch jobs, DAGMan

User Resources

Many bring only their (big) brains:
- Use central or local department submit nodes
- Use only modest scratch space

Some have their own submit node:
- Can attach their own storage
- Control user access
- Install system software packages

Submitting Big

Victor Ruotti, winner of Cycle Computing's Big Science Challenge. A big run in EC2 kick-started the work, which now continues on campus: building a database to quickly classify stem cells and identify important genes active in cell states useful for clinical applications.

Users with Clusters

Three flavors:
- condominium: the user provides cash, we do the rest
- neighborhood association: the user provides space, power, cooling, machines; configuration is standardized
- sister cities: independent pools that people want to share, e.g. student computer labs

Laboratory for Molecular and Computational Genomics

Cluster integrated into the campus grid. Using a whole-genome single-molecule optical mapping technique, the combined resources can map data equivalent to one human genome in 90 minutes, and are tackling challenging cases such as the important maize genome, which is difficult for traditional sequence assembly approaches.

Reaching Further

[Chart: Research Groups by Discipline]