Open Science Grid
Frank Würthwein, OSG Application Coordinator
Experimental Elementary Particle Physics, UCSD
NBCR, 8/7/06

Particle Physics & Computing
- Science driver: event rate = luminosity x cross section.
- LHC revolution starting in 2008:
  - luminosity x 10
  - cross section x 150 (e.g. top quark)
- Computing challenge:
  - 20 PB in the first year of running
  - ~100 MSpecInt2000, i.e. close to 100,000 cores
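
As a worked instance of the rate formula above (the numbers are illustrative ballpark values, not figures from the slide): taking the LHC design luminosity and a top-pair production cross section of roughly 800 pb,

    R = \mathcal{L}\,\sigma
      \approx \left(10^{34}\,\mathrm{cm^{-2}\,s^{-1}}\right)
        \times \left(800 \times 10^{-36}\,\mathrm{cm^{2}}\right)
      \approx 8\ \text{top-pair events per second.}

Scaling rates like this by event size and running time is what leads to the multi-petabyte yearly datasets quoted above.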

Overview
- OSG in a nutshell
- Organization
- "Architecture"
- Using the OSG
- Present utilization & expected growth
- Summary of OSG status

OSG in a nutshell
- High-throughput computing
  - Opportunistic scavenging on cheap hardware.
  - Owner-controlled policies.
- "Open consortium"
  - Add the OSG project to an open consortium to provide cohesion and sustainability.
- Heterogeneous middleware stack
  - Minimal site requirements & optional services.
  - The production grid allows coexistence of multiple OSG releases.
- "Linux rules": mostly RHEL3 on Intel/AMD.
- Grid of clusters
  - Compute & storage (mostly) on private Gb/s LANs.
  - Some sites with (multiple) 10 Gb/s WAN "uplinks".

Organization
- Started in 2005 as a Consortium with contributed effort only; now adding the OSG project to sustain the production grid.
- People coming together to build … people paid to operate …

Consortium & Project
- Consortium Council
  - IT departments & their hardware resources
  - Science application communities
  - Middleware providers
- Funded project (starting 9/06)
  - Operate services for a distributed facility.
  - Improve, extend, expand & interoperate.
  - Engagement, education & outreach.
Council members: Argonne Nat. Lab., Brookhaven Nat. Lab., CCR SUNY Buffalo, Fermi Nat. Lab., Thomas Jefferson Nat. Lab., Lawrence Berkeley Nat. Lab., Stanford Lin. Acc. Center, Texas Adv. Comp. Center, RENCI, Purdue, US ATLAS Collaboration, BaBar Collaboration, CDF Collaboration, US CMS Collaboration, D0 Collaboration, GRASE, LIGO, SDSS, STAR, US ATLAS S&C Project, US CMS S&C Project, Condor, Globus, SRM, OSG Project.
Contributors provide middleware, hardware, and user support.


OSG Management
- Executive Director: Ruth Pordes
- Facility Coordinator: Miron Livny
- Application Coordinators: Torre Wenaus & Frank Würthwein (fkw)
- Resource Managers: P. Avery & A. Lazzarini
- Education Coordinator: Mike Wilde
- Engagement Coordinator: Alan Blatecky
- Council Chair: Bill Kramer
A diverse set of people from universities & national labs, including CS, science applications, & IT infrastructure people.

OSG Management Structure [organization chart]

“Architecture”
- Grid of sites
- Me - my friends - the anonymous grid
- Grid of Grids

Grid of sites
- IT departments at universities & national labs make their hardware resources available via OSG interfaces.
  - CE: (modified) pre-WS GRAM
  - SE: SRM for large volume; gftp & (N)FS for small volume
- Today's scale:
  - "active" sites (depending on the definition of "active")
  - ~5,000 batch slots
  - ~500 TB of storage
  - ~10 "active" sites with shared 10 Gbps or better connectivity
- Expected scale for end of 2008:
  - ~50 "active" sites
  - ~30-50,000 batch slots
  - a few PB of storage
  - ~25-50% of sites with shared 10 Gbps or better connectivity
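
For illustration only (not part of the slide): with a valid grid proxy, a site's pre-WS GRAM CE can be exercised with a trivial test job; the gatekeeper contact string below is a made-up placeholder.

    import subprocess

    # Hypothetical smoke test of a pre-WS GRAM gatekeeper: run /bin/hostname
    # through the "fork" jobmanager (the gatekeeper host is a placeholder).
    subprocess.run(
        ["globus-job-run", "gatekeeper.example-site.org/jobmanager-fork", "/bin/hostname"],
        check=True,
    )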

Making the Grid attractive
- Minimize the entry threshold for resource owners
  - Minimize the software stack.
  - Minimize the support load.
- Minimize the entry threshold for users
  - Feature-rich software stack.
  - Excellent user support.
Resolve the contradiction via a "thick" Virtual Organization layer of services between users and the grid.

Me - My friends - The anonymous grid
- "Me": O(10^4) users, each with a thin client (domain-science specific).
- "My friends": O( ) VOs providing thick VO middleware & support (domain-science specific).
- "The anonymous grid": O( ) sites, reached through a thin "Grid API" (common to all sciences).

Grid of Grids - from local to global
- Campus grids (CS/IT), e.g. GLOW, FermiGrid, …
- Science community infrastructure, e.g. ATLAS, CMS, LIGO, …
- National & international cyberinfrastructure for science, e.g. TeraGrid, EGEE, …
OSG enables its users to operate transparently across grid boundaries globally.

Using the OSG
- Authentication & authorization
- Moving & storing data
- Submitting jobs & "workloads"

Authentication & Authorization
- OSG responsibilities
  - X.509-based middleware.
  - Accounts may be dynamic/static, shared/FQAN-specific.
- VO responsibilities
  - Instantiate a VOMS.
  - Register users & define/manage their roles.
- Site responsibilities
  - Choose the security model (which accounts are supported).
  - Choose which VOs to allow.
  - Default is to accept all users in a VO, but individuals or groups within a VO can be denied.

User Management
- User obtains a DN from a CA that is vetted by the TAGPMA.
- User registers with a VO and is added to that VO's VOMS.
  - The VO is responsible for registering its VOMS with the OSG GOC.
  - The VO is responsible for having its users sign the AUP.
  - The VO is responsible for VOMS operations.
  - Some VOs share a VOMS for operations on multiple grids globally.
  - A default OSG VO exists for new communities & single PIs.
- Sites decide which VOs to support (striving for default admit).
  - The site populates GUMS daily from the VOMSes of all VOs.
  - The site chooses a uid policy for each VO & role: dynamic vs. static vs. group accounts.
- Users use whatever services the VO provides in support of its users.
  - VOs generally hide the grid behind a portal.
- Any and all user support is the responsibility of the VO:
  - helping its users;
  - responding to complaints from grid sites about its users.
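
A minimal sketch (not from the slides) of the user-side credential step, assuming the VOMS client tools and a user certificate are already installed; the VO name "myvo" and the proxy lifetime are placeholders.

    import subprocess

    # Hypothetical: obtain a proxy carrying VOMS attributes for the VO "myvo".
    subprocess.run(["voms-proxy-init", "--voms", "myvo", "--valid", "24:00"], check=True)

    # Show the proxy's identity, lifetime, and the FQANs (groups/roles) granted by the VO.
    subprocess.run(["voms-proxy-info", "--all"], check=True)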

Moving & storing data
- OSG responsibilities
  - Define storage types & their APIs from the WAN & LAN.
  - Define the information schema for "finding" storage.
  - All storage is local to a site; there is no global filesystem!
- VO responsibilities
  - Manage data transfers & catalogues.
- Site responsibilities
  - Choose which storage type to support & how much.
  - Implement the storage type according to OSG rules.
  - Truth in advertisement.
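
An illustrative sketch of a VO-managed WAN transfer into a site's gftp/SRM-controlled area; the host name and paths are invented, and a valid grid proxy is assumed.

    import subprocess

    # Hypothetical: push a local file to a site's GridFTP door over the WAN.
    src = "file:///home/user/input.dat"
    dst = "gsiftp://se.example-site.org/data/myvo/input.dat"
    subprocess.run(["globus-url-copy", src, dst], check=True)

    # When the site exposes an SRM endpoint, an SRM client (e.g. srmcp from the
    # dCache tools) can be used instead; the exact invocation is client-specific.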

Disk areas in some detail
- Shared filesystem as the applications area at a site.
  - Read-only from the compute cluster.
  - Role-based installation via GRAM.
- Batch-slot-specific local work space.
  - No persistency beyond the batch slot lease.
  - Not shared across batch slots.
  - Read & write access (of course).
- SRM/gftp-controlled data area.
  - "Persistent" data store beyond job boundaries.
  - Job-related stage in/out.
  - SRM v1.1 today.
  - SRM v2 expected in late 2006 (space reservation).
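
A sketch only, assuming the conventional OSG environment variables of that era (OSG_APP, OSG_DATA, OSG_WN_TMP) point at the three areas above; treat the variable names and paths as assumptions, not guarantees.

    import os
    import shutil

    # Hypothetical job-side view of the three disk areas described above.
    app_area = os.environ.get("OSG_APP", "/opt/app")       # shared, read-only application area
    data_area = os.environ.get("OSG_DATA", "/opt/data")    # SRM/gftp-controlled persistent area
    work_area = os.environ.get("OSG_WN_TMP", "/tmp")       # batch-slot local scratch, not persistent

    # Stage input from the persistent area into local scratch, work there,
    # then copy results back before the batch slot lease ends.
    shutil.copy(os.path.join(data_area, "myvo", "input.dat"), work_area)
    # ... run the application installed under app_area ...
    shutil.copy(os.path.join(work_area, "output.dat"),
                os.path.join(data_area, "myvo", "output.dat"))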

Securing your data
- Archival storage in your trusted archive
  - You control where your data is archived.
- Data moved by a party you trust
  - You control who moves your data.
  - You control encryption of your data.
- You compute at sites you trust
  - E.g. sites that guarantee a specific unix uid for you.
  - E.g. sites whose security model satisfies your needs.
You decide how secure your data needs to be!

Submitting jobs & workloads
- OSG responsibilities
  - Define the interface to the batch system (today: pre-WS GRAM).
  - Define the information schema.
  - Provide middleware that implements the above.
- VO responsibilities
  - Manage submissions & workflows.
  - Use a VO-controlled workload management system, or a WMS from other grids, e.g. EGEE/LCG.
- Site responsibilities
  - Choose the batch system.
  - Configure the interface according to OSG rules.
  - Truth in advertisement.
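
For illustration (not from the slides): a pre-WS GRAM submission through Condor-G typically uses a grid-universe submit description like the one written below; the gatekeeper contact string and file names are placeholders.

    import subprocess
    from textwrap import dedent

    # Hypothetical Condor-G submit description targeting a pre-WS GRAM (gt2) CE.
    submit = dedent("""\
        universe      = grid
        grid_resource = gt2 gatekeeper.example-site.org/jobmanager-condor
        executable    = analyze.sh
        output        = job.out
        error         = job.err
        log           = job.log
        queue
    """)

    with open("job.sub", "w") as f:
        f.write(submit)

    # Hand the job to Condor-G (requires a local Condor install and a valid grid proxy).
    subprocess.run(["condor_submit", "job.sub"], check=True)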

Simple Workflow
- Install application software at site(s).
  - VO admin installs via GRAM.
  - VO users have read-only access from batch slots.
- "Download" data to site(s).
  - VO admin moves data via SRM/gftp.
  - VO users have read-only access from batch slots.
- Submit job(s) to site(s) (see the DAG sketch below).
  - VO users submit job(s)/DAGs via Condor-G.
  - Jobs run in batch slots, writing output to local disk.
  - Jobs copy output from local disk to the SRM/gftp data area.
- Collect output from site(s).
  - VO users collect output from site(s) via SRM/gftp as part of the DAG.
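
A minimal sketch of how the stage-in / run / stage-out steps above can be expressed as a Condor DAGMan workflow; the node names and submit-file names are invented.

    import subprocess
    from textwrap import dedent

    # Hypothetical three-node DAG: stage data in, run the job, stage output out.
    # Each node points at its own Condor submit file (e.g. grid-universe jobs).
    dag = dedent("""\
        JOB STAGEIN  stagein.sub
        JOB RUN      run.sub
        JOB STAGEOUT stageout.sub
        PARENT STAGEIN CHILD RUN
        PARENT RUN     CHILD STAGEOUT
    """)

    with open("workflow.dag", "w") as f:
        f.write(dag)

    # DAGMan submits the nodes in dependency order and can retry failed nodes.
    subprocess.run(["condor_submit_dag", "workflow.dag"], check=True)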

Some technical details
- Job submission
  - Condor:
    - Condor-G
    - "schedd on the side" (simple multi-site brokering using the Condor schedd)
    - Condor glide-in
  - EGEE workload management system:
    - The OSG CE is compatible with the gLite Classic CE.
    - Submissions work via either the LCG 2.7 RB or the gLite RB, including bulk submission.
  - Virtual Data System (VDS) in use on OSG.
- Data placement using SRM
  - SRM/dCache in use to virtualize many disks into one storage system.
  - Schedule WAN transfers across many gftp servers; typical WAN IO capability today ~10 TB/day, ~2 Gbps.
  - Schedule random access from batch slots to many disks via the LAN; typical LAN IO capability today ~ GByte/sec.
  - Space reservation.

Middleware lifecycle
Domain science requirements -> joint projects between the OSG applications group & middleware developers to develop & test on community grids -> integration into the VDT and deployment on the OSG integration testbed (OSG-ITB) -> inclusion into an OSG release & deployment on (part of) the production grid. EGEE et al. feed into the same cycle.

Status of Utilization
- OSG job = a job submitted via an OSG CE.
- "Accounting" of OSG jobs is not (yet) required!

OSG use by numbers
- 32 Virtual Organizations.
- 3 with >1000 jobs max. (all particle physics).
- 3 with … jobs max. (all outside physics).
- 5 with … jobs max. (particle, nuclear, and astro physics).

[Chart, 5/05-5/06: OSG jobs by community: Experimental Particle Physics, Bio/Eng/Med/Math, Campus Grids, non-HEP physics; annotations "GADU using VDS" and "PI from Campus Grid"; marked levels of 100, 850, and 2250 jobs.]

Example GADU run in April
- Bioinformatics application using VDS across 8 sites on OSG.

Number of running (and monitored) "OSG jobs" in June [plot]

CMS transfers on OSG in June 2006
- All CMS sites have exceeded 5 TB per day in June.
- Caltech, Purdue, UCSD, UFL, and UW exceeded 10 TB/day.
- Hoping to reach 30-40 TB/day capability by end of …
[Plot: transfer rate in MByte/sec]
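
Since the plot's y-axis is in MByte/sec while the goals are quoted in TB/day, a quick illustrative conversion relates the two (sustained averages, not peaks):

    # Convert a daily transfer volume (TB/day, 10^12 bytes) to an average rate (MB/s).
    SECONDS_PER_DAY = 86_400

    def tb_per_day_to_mb_per_s(tb_per_day: float) -> float:
        return tb_per_day * 1e12 / 1e6 / SECONDS_PER_DAY

    for goal in (5, 10, 40):
        print(f"{goal:>2} TB/day ~ {tb_per_day_to_mb_per_s(goal):.0f} MB/s sustained")
    # Output: 5 TB/day ~ 58 MB/s, 10 TB/day ~ 116 MB/s, 40 TB/day ~ 463 MB/s.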

Grid of Grids
- OSG enables single PIs and user communities to operate transparently across grid boundaries globally.
- E.g. CMS, a particle physics experiment.

CMS Experiment - a global community grid
- Data & jobs move locally, regionally & globally within the CMS grid, transparently across grid boundaries from campus to global.
[Map: CMS sites spanning OSG and EGEE, including CERN, Germany, Taiwan, UK, Italy, France, Caltech, Florida, MIT, Purdue, UCSD, UNL, and Wisconsin.]

Grid of Grids - production interoperability
- Job submission: 16,000 jobs per day submitted across EGEE & OSG via the "LCG RB"; jobs are brokered transparently onto both grids.
- Data transfer: peak IO of 5 Gbps from FNAL to 32 EGEE and 7 OSG sites.
  - All 8 CMS sites on OSG have exceeded the 5 TB/day goal.
  - Caltech, FNAL, Purdue, UCSD/SDSC, UFL, and UW exceed 10 TB/day.

CMS transfers from FNAL to the world
- The US CMS center at FNAL transfers data to 39 sites worldwide in the CMS global transfer challenge.
- Peak transfer rates of ~5 Gbps are reached.

Summary of OSG Status
- The OSG facility opened July 22nd.
- The OSG facility is under steady use:
  - ~ … jobs at all times
  - mostly HEP, but large Bio/Eng/Med occasionally
  - moderate other physics (astro/nuclear)
- OSG project
  - 5-year proposal to DOE & NSF, funded starting FY07.
  - Facility & improve/expand/extend/interoperate & E&O.
- Off to a running start … but lots more to do.
  - Routinely exceeding 1 Gbps at 6 sites; scale by x4 by 2008, and many more sites.
  - Routinely exceeding 1000 running jobs per client; scale by at least x10 by 2008.
  - Have reached a 99% success rate for 10,000 jobs per day submission; need to reach this routinely, even under heavy load.