The Open Science Grid
Ted Hesselroth, Fermilab
Nordugrid 2007, September 24-28, 2007
Abhishek Singh Rana and Frank Wuerthwein, UC San Diego
Slide attribution: Ruth Pordes, Miron Livny, Frank Wuerthwein, Paul Avery, Kent Blackburn, CNGrid

Map of OSG Sites
● OSG is a grid organization funded by a SciDAC-2/NSF grant: 30 million dollars over five years, 33 FTE.
● 77 compute elements and 15 storage elements.

OSG Mission Statement
Practical support for end-to-end community systems in a heterogeneous global environment, to transform compute- and data-intensive science through a national cyberinfrastructure that includes organizations from the smallest to the largest.

OSG Goals – Use of Existing Resources
● Enable scientists to use and share a greater percentage of available compute cycles.
● Help scientists to use distributed systems, storage, processors, and software with less effort.
● Enable more sharing and reuse of software, and reduce duplication of effort, by providing effort in integration and extensions.
● Establish an “open-source” community working together to communicate knowledge and experience, and to lower overheads for new participants.

OSG - forming communities
OSG enables community formation to solve compute- and data-intensive scientific problems, playing a coordinating role among:
● Software developers (e.g. Condor, Globus, SRM, …)
● Sites (e.g. BNL, FNAL, LBNL, SLAC, LHC-T2s, DISUN, …)
● Experiments (VOs) (e.g. USCMS, USATLAS, CDF, D0, LIGO, BioTech, NanoTech, …)

Principal Science Drivers (trends: data growth, community growth)
● High energy and nuclear physics
- 100s of petabytes (LHC, 2008)
- Several petabytes (2005)
● LIGO (gravity wave detector)
- Several petabytes (2002)
● Digital astronomy
- 10s of petabytes (2009)
- 10s of terabytes (2001)
● Other sciences coming forward
- Bioinformatics (10s of petabytes)
- Nanoscience
- Environmental
- Chemistry
- Applied mathematics
- Materials science?

The Evolution of the OSG
[Timeline figure: PPDG (DOE), GriPhyN and iVDGL (NSF), and the DOE Science Grid (DOE) evolved through Trillium and Grid3 into the OSG (DOE+NSF), together with campus and regional grids; shown against LIGO preparation leading to LIGO operation, LHC construction and preparation leading to LHC ops, and the European Grid plus the Worldwide LHC Computing Grid.]

VOs in OSG (* = non-physics)

Example Campus Grid: Grid Laboratory of Wisconsin (GLOW)

Institutions (green = contributing staff, * = non-physics)

China National Grid (CNGrid)
● 17 TFlops

Use of Existing Resources
● Many resources are owned by, or statically allocated to, one user community.
- The institutions which own resources typically have ongoing relationships with (a few) particular user communities (VOs).
- The remainder of an organization’s available resources can be “used by everyone or anyone else”; organizations can decide against supporting particular VOs.
- OSG staff are responsible for monitoring and, if needed, managing this usage.
● Our challenge is to maximize good (successful) output from the whole system.

Benefits to Sites
● Increased usage of CPUs and infrastructure alone (i.e. the cost of processing cycles) is not the persuasive cost-benefit argument. The benefits come from reducing risk in, and sharing support for, large, complex systems which must be run for many years with a short-lifetime workforce:
- Opportunity and flexibility to distribute load and address peak needs.
- Savings in effort for integration, system, and software support.
- Maintenance of an experienced workforce in a common system.
- Lowering the cost of entry for new contributors.
- Enabling new computational opportunities for communities that would not otherwise have access to such resources.

The “don’t”s and “do”s of OSG
● The OSG Facility does not:
- “Own” any compute (processing, storage, and communication) resources
- “Own” any middleware
- Fund any site or VO administration/operation personnel
● The OSG Facility does:
- Help sites join the OSG facility and enable effective guaranteed and opportunistic usage of their resources (including data) by remote users
- Help VOs join the OSG facility and enable effective guaranteed and opportunistic harnessing of remote resources (including data)
- Define interfaces which people can use
- Maintain and support an integrated software stack that meets the needs of the stakeholders of the OSG consortium
- Reach out to non-HEP communities to help them use the OSG
- Train new users, administrators, and software developers

What Can the OSG Offer?
● Middleware
- Packaging
- Testing
- Support
- Security operations
● Organizational support
- The OSG Consortium (brings together the stakeholders)
- The OSG Facility (brings together resources and users)
● Technical support
- Troubleshooting distributed computing technologies
● Extensions
- Software capabilities needed by OSG
● Engagement
- Consultation on OSG participation
● Instruction
- Workshops
- Documentation

OSG Project Effort
● Roughly 2/3 of leadership positions filled from outside HEP!

Benefits to HEP thus far
● LHC
- Middleware stack for the LHC distributed computing systems of USATLAS and USCMS
- Strong partner for negotiating technical and operational problems with EGEE and Nordugrid
- Framework for integrating “Tier-3” resources
● Tevatron and other FNAL-based HEP
- CDF: MC production on OSG
- D0: reprocessing on OSG
- Other HEP benefit via the FNAL campus grid
● Other HEP starting to show interest as well.

D0 Reprocessing
● D0’s own resources are committed to the processing of newly acquired data and analysis of the processed datasets.
● In Nov ‘06 D0 asked to use CPUs for 2-4 months for re-processing of an existing dataset (~500 million events), for science results for the summer conferences in July ‘07.
● The Executive Board estimated that there were currently sufficient opportunistically available resources on OSG to meet the request; we also looked into the local storage and I/O needs.

D0 Reprocessing: OSG Portion

LIGO: Search for Gravity Waves
● LIGO Grid
- 6 US sites (LHO, LLO: LIGO observatory sites; LSC: LIGO Scientific Collaboration)
- 3 EU sites (UK & Germany): Birmingham, Cardiff, AEI/Golm

Sloan Digital Sky Survey: Mapping the Sky

Astronomy Experiences on the Grid
● Experience tells us that the Grid is more suitable for CPU-intensive jobs: they achieve parallelism, more jobs can run, and the work finishes sooner.
● Running locally would limit the number of jobs run simultaneously.
● On OSG, several run-reruns, and the camcols within a run-rerun, can be processed in parallel.
● The current workflow will also facilitate further analysis.

                              NEO (data- & CPU-intensive)   Quasar Spectra (CPU-intensive)
Total no. of jobs             180                            ~50000
Data input/job                9 Gigabytes                    1 Megabyte
Data output/job               12 Kilobytes                   2 Megabytes
Avg. rate of job completion   ? per day                      ? per day
Grid match                    Grid not very happy            Ideal for Grid
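The same point can be made with a crude data-movement figure of merit. The sketch below uses only the per-job input sizes from the table above; the equal per-job runtimes are an assumption added purely for illustration.

```python
# Crude figure of merit for grid suitability: how much input data must be
# shipped per unit of computation. Data sizes come from the table above;
# the per-job CPU-hours are assumptions for illustration only.

workloads = {
    #                 (input bytes per job, assumed CPU-hours per job)
    "NEO":            (9 * 1024**3, 1.0),
    "Quasar Spectra": (1 * 1024**2, 1.0),
}

for name, (input_bytes, cpu_hours) in workloads.items():
    mb_per_cpu_hour = input_bytes / 1024**2 / cpu_hours
    print(f"{name}: ~{mb_per_cpu_hour:.0f} MB of input shipped per CPU-hour")

# With equal assumed runtimes, NEO moves roughly 9000x more data per unit of
# computation, which is why it maps poorly onto opportunistic grid slots
# while the quasar-spectra fitting is "ideal for Grid".
```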

Engagement
● Currently the main stakeholders are from physics: the US LHC experiments, LIGO, the STAR experiment, the Tevatron Run II experiments, and astrophysics experiments.
● Active “engagement” effort to add new domains and resource providers to the OSG consortium:
- Rosetta at the Kuhlman Laboratory
- Weather Research and Forecasting (WRF) model
- nanoHUB applications: BioMOCA and nanoWire
- Chemistry at Harvard Molecular Mechanics (CHARMM)

Rosetta Protein Folding Application
● “What impressed me most was how quickly we were able to access the grid and start using it. We learned about it [at RENCI], and we were running jobs about two weeks later.” Brian Kuhlman, PI.
● 3,000 CPU hours per protein.
● CASP similar protein: 3 hours on the 114-teraflops IBM Blue Gene Watson machine.
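A back-of-envelope comparison shows why opportunistic cycles make this workload tractable. Only the 3,000 CPU-hours/protein figure comes from the slide; the core count and efficiency below are illustrative assumptions.

```python
# Back-of-envelope: wall-clock time per protein on opportunistic OSG cores.
# Only CPU_HOURS_PER_PROTEIN is from the slide; the rest are assumptions.

CPU_HOURS_PER_PROTEIN = 3000     # from the slide
opportunistic_cores = 1000       # assumed number of scavenged cores
efficiency = 0.5                 # assumed loss to preemption, staging, retries

wall_clock_hours = CPU_HOURS_PER_PROTEIN / (opportunistic_cores * efficiency)
print(f"~{wall_clock_hours:.0f} wall-clock hours per protein "
      f"on {opportunistic_cores} opportunistic cores")

# Under these assumptions: ~6 wall-clock hours, the same order of magnitude
# as the 3 hours quoted for a similar CASP protein on Blue Gene Watson.
```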

Genome Analysis and Database Update system (GADU)
● Runs across TeraGrid and OSG. Uses the Virtual Data System (VDS) for workflow and provenance.
● 3.1 million protein sequences, 93,000 jobs.
● “During the last run in January (2006), GADU VO jobs had access to only about 8-10 OSG sites and were not authenticated by a large number of sites. With the help of the GOC, we are working on getting more sites to authenticate GADU jobs.” Dinanath Sulakhe, Argonne National Laboratory.
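As a rough sketch of how a sequence set of that size might be split into grid jobs (a simple fixed-size batching scheme with illustrative names and batch size; not GADU's actual implementation):

```python
# Illustrative partitioning of ~3.1 million sequences into ~93,000 grid jobs
# (about 33 sequences per job). Not GADU's actual code.

def batch(sequence_ids, batch_size):
    """Yield consecutive batches of sequence IDs, one batch per grid job."""
    for start in range(0, len(sequence_ids), batch_size):
        yield sequence_ids[start:start + batch_size]

sequence_ids = [f"seq_{i}" for i in range(3_100_000)]  # stand-ins for real IDs
jobs = list(batch(sequence_ids, batch_size=33))

print(f"{len(jobs)} jobs of up to 33 sequences each")  # ~94,000 jobs,
# close to the ~93,000 quoted on the slide.
```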

Bioinformatics: GADU / GNARE (Genome Analysis Research Environment)
● GADU uses the Grid: applications are executed on the Grid (TeraGrid, OSG, DOE SG) as workflows, and results are stored in the Integrated Database.
● GADU performs:
- Acquisition: acquire genome data from a variety of publicly available databases and store it temporarily on the file system.
- Analysis: run different publicly available and in-house tools on the Grid, using the acquired data and data from the Integrated Database.
- Storage: store the parsed data acquired from public databases and the parsed results of the tools and workflows used during analysis.
● Public databases: genomic databases available on the web, e.g. NCBI, PIR, KEGG, EMP, InterPro, etc. (bidirectional data flow).
● The Integrated Database includes:
- Parsed sequence data and annotation data from public web sources.
- Results of the different analysis tools: BLAST, Blocks, TMHMM, …
● Applications (web interfaces) based on the Integrated Database:
- PUMA2: evolutionary analysis of metabolism
- Chisel: protein function analysis tool
- TARGET: targets for structural analysis of proteins
- PATHOS: pathogenic DB for bio-defense research
- Phyloblocks: evolutionary analysis of protein families
● Services to other groups: SEED (data acquisition), Shewanella Consortium (genome analysis), others.

nanoHUB
● BioMOCA (Biology Monte Carlo), a transport Monte Carlo tool.
● Written at the Network for Computational Nanotechnology.
● PI: Umberto Ravaioli, UIUC.
- Ion transfer in artificial membranes.
● A job run is 8-40 days.

Network Collaboration
● Internet2
● National LambdaRail
● UltraLight

OSG Activities
● Facility
- Software
- Operations
- Deployment
- Integration
- Troubleshooting
- Engagement
● Security
● Education
● Extensions
- Middleware improvement
- Workload management
- Scalability testing
- Tools and prototypes
● User support
● Admin

The Software Stack

What is the VDT?
● A collection of software
- Grid software: Condor, Globus, and lots more
- Virtual Data System: origin of the name “VDT” (toolkit)
- Utilities: monitoring, authorization, configuration
- Built for >10 flavors/versions of Linux
● Automated build and test: integration and regression testing
● An easy installation:
- Push a button, everything just works
- Quick update processes
● Responsive to user needs:
- A process to add new components based on community needs
● A support infrastructure:
- Front-line software support
- Triaging between users and software providers for deeper issues
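A sketch of what the "push a button" installation looked like in practice: VDT components were pulled from a cache with the Pacman package manager (the Boston University tool used by the VDT, not the Arch Linux pacman). The cache URL and package names below are illustrative placeholders, not an exact recipe from the talk.

```python
# Minimal sketch of a scripted VDT install driven by Pacman.
# The cache URL and package names are placeholders; consult the VDT
# documentation for the real cache for a given release.
import subprocess

VDT_CACHE = "http://vdt.cs.wisc.edu/vdt_181_cache"  # placeholder cache URL
PACKAGES = ["Condor", "Globus"]                     # placeholder package names

for pkg in PACKAGES:
    # "pacman -get <cache>:<package>" fetches and configures the component.
    subprocess.run(["pacman", "-get", f"{VDT_CACHE}:{pkg}"], check=True)
```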

How we get to a Production Software Stack
Input from stakeholders and OSG directors → VDT Release → test on OSG Validation Testbed → OSG Integration Testbed Release → OSG Production Release

Troubleshooting
● GOC tickets
- Assigns a responsible party
- Interoperability with EGEE
● Mailing lists
● “Office hours”

OSG Storage Activities
● Support for Storage Elements in OSG (4 FTE)
- dCache
- BeStMan
● Validation
- Tier-2-level test stand
- With the UCSD Tier-2
● Packaging
- Installation scripts
- Through the VDT

OSG Storage Activities
● Support
- Mailing list
● Tools
- For site administrators
- Will collect existing tools
● Extensions
- Space reservation file cleaner
- dCache logging