1 Grid3: an Application Grid Laboratory for Science
Rob Gardner, University of Chicago
on behalf of the Grid3 project
CHEP ’04, Interlaken, September 28, 2004

2 Grid2003: an application grid laboratory
Virtual data grid laboratory:
- virtual data research
- end-to-end HENP applications
- CERN LHC: US ATLAS testbeds & data challenges
- CERN LHC: US CMS testbeds & data challenges
→ Grid3

3 Grid3 at a Glance
Grid environment built from core Globus and Condor middleware, as delivered through the Virtual Data Toolkit (VDT)
- GRAM, GridFTP, MDS, RLS, VDS
…equipped with VO and multi-VO security, monitoring, and operations services
…allowing federation with other grids where possible, e.g. the CERN LHC Computing Grid (LCG)
- US ATLAS: GriPhyN VDS execution on LCG sites
- US CMS: storage element interoperability (SRM/dCache)
Delivering the US LHC Data Challenges
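As a concrete illustration of the core middleware layer, the hedged sketch below submits a trivial test job to a Grid3-style compute element through a pre-WS GRAM gatekeeper using the standard globus-job-run client; the gatekeeper contact string is a hypothetical example, not an actual Grid3 site.

```python
# Minimal sketch: run a test job on a compute element via pre-WS GRAM.
# The gatekeeper contact below is a hypothetical placeholder, not a real site.
import subprocess

GATEKEEPER = "ce.example.edu/jobmanager-condor"  # hypothetical CE contact string

def run_test_job() -> None:
    """Submit /bin/hostname through globus-job-run and print the output."""
    result = subprocess.run(
        ["globus-job-run", GATEKEEPER, "/bin/hostname"],
        capture_output=True, text=True, check=True,
    )
    print(result.stdout.strip())

if __name__ == "__main__":
    run_test_job()
```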

4 Grid3 Design
Simple approach:
- Sites consisting of:
  - Computing element (CE)
  - Storage element (SE)
  - Information and monitoring services
- VO-level and multi-VO services:
  - VO information services
  - Operations (iGOC)
- Minimal use of grid-wide systems:
  - No centralized workload manager, replica or data management catalogs, or command line interface
  - Higher-level services are provided by the individual VOs
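To make the per-site service bundle concrete, here is a minimal sketch of how a site's compute element, storage element, and information endpoints could be described in code; the field names and example values are illustrative assumptions, not a published Grid3 site schema.

```python
# Illustrative sketch only: field names are assumptions, not the actual Grid3 schema.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Grid3Site:
    name: str                      # human-readable site name
    gatekeeper: str                # CE contact, e.g. "host.example.edu/jobmanager-condor"
    storage_element: str           # SE endpoint, e.g. a GridFTP URL
    giis_endpoint: str             # site information service (GIIS) contact
    supported_vos: List[str] = field(default_factory=list)

# Hypothetical example entry, as a site catalog might record it:
example_site = Grid3Site(
    name="Example_Tier2",
    gatekeeper="tier2.example.edu/jobmanager-condor",
    storage_element="gsiftp://tier2.example.edu/grid3/data",
    giis_endpoint="ldap://tier2.example.edu:2135",
    supported_vos=["usatlas", "uscms", "ivdgl", "sdss", "ligo", "btev"],
)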

5 Site Services and Installation
- Goal is to install and configure with minimal human intervention
- Use Pacman and distributed software “caches”: %pacman -get iVDGL:Grid3 (a wrapper sketch follows below)
- Registers the site with VO and Grid3-level services (GIIS registration, information providers publishing the Grid3 schema)
- Accounts, application install areas ($app) & working directories ($tmp), log management
- Roughly a four-hour install and validate
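As a rough illustration of a minimally attended install, the sketch below wraps the Pacman command shown above and a simple sanity check in Python; the cache name comes from the slide, while the validation step and directory layout are assumptions.

```python
# Sketch of a minimally attended Grid3 site install, assuming Pacman is on PATH.
# The pacman invocation comes from the slide; the validation step is an assumption.
import os
import shutil
import subprocess
import sys

def install_grid3(install_dir: str = "/opt/grid3") -> None:
    """Fetch the Grid3 cache with Pacman, then run a basic sanity check."""
    os.makedirs(install_dir, exist_ok=True)
    # Pull the VDT-based Grid3 release from the distributed software cache.
    subprocess.run(["pacman", "-get", "iVDGL:Grid3"], cwd=install_dir, check=True)

    # Hypothetical validation: confirm the Globus client tools are now visible.
    if shutil.which("globus-job-run") is None:
        sys.exit("Grid3 install appears incomplete: globus-job-run not found")
    print("Grid3 software installed; basic validation passed")

if __name__ == "__main__":
    install_grid3()
```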

6 Multi-VO Security Model
- DOEGrids Certificate Authority; PPDG or iVDGL Registration Authority
- Authorization service: VOMS (servers for US ATLAS, US CMS, iVDGL, LSC, SDSS, BTeV)
- Each Grid3 site generates a Globus grid-mapfile with an authenticated SOAP query to each VO service (a sketch follows below)
- Site-specific adjustments or mappings
- Group accounts to associate VOs with jobs
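The sketch below illustrates the general idea of assembling a grid-mapfile from per-VO membership lists and mapping each VO to a group account; the DN-fetching stub, VO names, and account names are hypothetical stand-ins for the authenticated VOMS queries described above.

```python
# Illustrative sketch: assemble a Globus grid-mapfile from per-VO DN lists.
# fetch_vo_members() stands in for the authenticated VOMS query;
# the VO names and group accounts here are hypothetical examples.
from typing import Dict, List

GROUP_ACCOUNTS: Dict[str, str] = {"usatlas": "usatlas1", "uscms": "uscms01", "ivdgl": "ivdgl"}

def fetch_vo_members(vo: str) -> List[str]:
    """Stand-in for the per-VO query; returns a list of certificate DNs."""
    return []  # a real site would query the VO's VOMS server here

def build_gridmap(path: str = "/etc/grid-security/grid-mapfile") -> None:
    """Write one '"DN" local_account' line per authorized member."""
    lines = []
    for vo, account in GROUP_ACCOUNTS.items():
        for dn in fetch_vo_members(vo):
            lines.append(f'"{dn}" {account}')
    with open(path, "w") as f:
        f.write("\n".join(lines) + "\n")

if __name__ == "__main__":
    build_gridmap()
```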

7 iVDGL Operations Center (iGOC)
Co-located with the Abilene NOC (Indianapolis)
Hosts/manages multi-VO services:
- Top-level Ganglia and GIIS collectors
- MonALISA web server and archival service
- VOMS servers for iVDGL, BTeV, SDSS
- Site Catalog service, Pacman caches
Trouble ticket systems:
- Phone (24 hr), web and email-based collection and reporting system
- Investigation and resolution of grid middleware problems at the level of 30 contacts per week
Weekly operations meetings for troubleshooting

8 Grid3 – a snapshot of sites, September 2004
Service monitoring view: multi-VO shared resources, ~3000 CPUs (shared)

9 Grid3 Monitoring Framework c.f. M. Mambelli, B. Kim et al., #490

10 Monitors
- Jobs by VO (ACDC)
- Job queues (MonALISA)
- Data IO (MonALISA)
- Metrics (MDViewer)

11 Use of Grid3 – led by US LHC
- 7 scientific applications and 3 CS demonstrators
  - A third HEP experiment and two biology experiments also participated
- Over 100 users authorized to run on Grid3
  - Application execution performed by dedicated individuals
  - Typically only a few users ran the applications from a particular experiment

12 US CMS Data Challenge DC04
Events produced vs. day: CMS-dedicated resources (red) vs. opportunistic use of non-CMS Grid3 resources (blue)
c.f. A. Fanfani, #497

13 Ramp up ATLAS DC2
CPU-days per day, mid July through Sep 10
c.f. R. Gardner et al., #503

14 Shared infrastructure, last 6 months
Usage: CPUs, showing CMS DC04 and ATLAS DC2 activity through Sep 10

15 ATLAS DC2 production on Grid3: a joint activity with LCG and NorduGrid
Number of validated jobs (total) per day (G. Poulard, 9/21/04)
c.f. L. Goossens, #501 & O. Smirnova, #499

16 Typical Job distribution on Grid3 G. Poulard, 9/21/04

17 Beyond LHC applications…
Astrophysics and astronomy:
- LIGO/LSC: blind search for continuous gravitational waves
- SDSS: maxBcg, cluster-finding package
Biochemical:
- SnB: bio-molecular program, analyses of X-ray diffraction data to find molecular structures
- GADU/Gnare: genome analysis, compares protein sequences
Computer science:
- Supporting Ph.D. research:
  - adaptive data placement and scheduling algorithms
  - mechanisms for policy information expression, use, and monitoring

18 Astrophysics: Sloan Sky Survey
Image stripes of the sky from telescope data sources:
- galaxy cluster finding
- redshift analysis, weak lensing effects
Analyze weighted images:
- increase sensitivity by 2 orders of magnitude with object detection and measurement code
Workflow (a sketch follows below):
- replicate sky segment data to Grid3 sites
- average, analyze, send output to Fermilab
- 44,000 jobs, 30% complete
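To make the per-segment workflow concrete, here is a minimal sketch of how the replicate/analyze/return steps could be expressed as a per-segment job list; the segment names, URLs, and site assignment are hypothetical placeholders, not the SDSS production tooling.

```python
# Hypothetical sketch of the per-segment SDSS workflow on Grid3:
# stage a sky segment to a site, run the analysis, return output to the home lab.
from dataclasses import dataclass
from typing import List

@dataclass
class SegmentJob:
    segment: str        # sky segment identifier (placeholder naming)
    input_url: str      # where the segment data is replicated from
    site: str           # Grid3 site chosen to run the analysis
    output_dest: str    # where results are shipped (Fermilab in production)

def plan_jobs(segments: List[str], sites: List[str]) -> List[SegmentJob]:
    """Round-robin segments over available sites; one job per segment."""
    jobs = []
    for i, seg in enumerate(segments):
        jobs.append(SegmentJob(
            segment=seg,
            input_url=f"gsiftp://sdss-data.example.org/stripes/{seg}",
            site=sites[i % len(sites)],
            output_dest="gsiftp://destination.example.org/sdss/output/",
        ))
    return jobs

if __name__ == "__main__":
    plan = plan_jobs([f"stripe{n:03d}" for n in range(6)], ["Site_A", "Site_B"])
    for job in plan:
        print(job)
```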

19 SDSS Job Statistics on Grid3
- Time period: May 1 – Sept. 1, 2004
- Total number of jobs:
- Total CPU time: 774 CPU-days
- Average job runtime: 0.26 hr

20 Structural Biology
SnB is a computer program based on Shake-and-Bake:
- A dual-space direct-methods procedure for determining molecular crystal structures from X-ray diffraction data.
- Difficult molecular structures with as many as 2000 unique non-H atoms have been solved in a routine fashion.
- SnB has been routinely applied to jump-start the solution of large proteins, increasing the number of selenium atoms determined in Se-Met molecules from dozens to several hundred.
- SnB is expected to play a vital role in the study of ribosomes and large macromolecular assemblies containing many different protein molecules and hundreds of heavy-atom sites.

21 Genomic Searches and Analysis
- Searches for and finds new genomes in public databases (e.g. NCBI)
- Each genome is composed of ~4k genes
- Each gene needs to be processed and characterized (a sketch follows below)
  - Each gene is handled by a separate process
- Results are saved for future use
  - also: BLAST protein sequences
- GADU: 250 processors, 3M sequences identified (bacterial, viral, vertebrate, mammal)
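The per-gene fan-out described above can be pictured with the hedged sketch below: one task per gene, with results cached so a gene is not re-analyzed on later runs. The analysis function, cache layout, and file names are illustrative assumptions, not the GADU/Gnare pipeline itself.

```python
# Illustrative sketch of the one-process-per-gene pattern with result caching.
# analyze_gene() and the cache layout are hypothetical stand-ins for the real pipeline.
import json
import os
from typing import Dict, List

CACHE_DIR = "results_cache"  # assumed location for saved per-gene results

def analyze_gene(gene_id: str, sequence: str) -> Dict[str, str]:
    """Placeholder for the characterization step (e.g. a BLAST-style comparison)."""
    return {"gene": gene_id, "length": str(len(sequence))}

def process_genome(genes: Dict[str, str]) -> List[Dict[str, str]]:
    """Process each gene independently, skipping genes already in the cache."""
    os.makedirs(CACHE_DIR, exist_ok=True)
    results = []
    for gene_id, sequence in genes.items():
        cache_file = os.path.join(CACHE_DIR, f"{gene_id}.json")
        if os.path.exists(cache_file):          # reuse a previously saved result
            with open(cache_file) as f:
                results.append(json.load(f))
            continue
        result = analyze_gene(gene_id, sequence)
        with open(cache_file, "w") as f:        # save for future use
            json.dump(result, f)
        results.append(result)
    return results

if __name__ == "__main__":
    print(process_genome({"geneA": "ATGCGT", "geneB": "TTGACA"}))
```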

22 Lessons (1)
- Human interactions in grid building are costly
- Keeping site requirements light led to heavy loads on gatekeeper hosts
- The diverse set of sites made exchanging job requirements difficult
- Single points of failure rarely happened; expiring certificate revocation lists happened twice
- Configuration problems: Pacman helped, but we still spent enormous amounts of time diagnosing problems

23 Lessons (2)
- Software updates were either relatively easy or extremely painful
- Authorization: simple in Grid3, but coarse-grained
- Troubleshooting: efficiency for submitted jobs was not as high as we’d like
  - Complex system with many failure modes and points of failure; need fine-grained monitoring tools
  - Need to improve at both the service level and the user level

24 Operations Experience
iGOC and the US ATLAS Tier1 (BNL) developed an operations response model in support of DC2
Tier1 center:
- core services, “on-call” person always available
- response protocol developed
iGOC:
- coordinates problem resolution for Tier1 “off hours”
- trouble handling for non-ATLAS Grid3 sites
Problems resolved at weekly iVDGL operations meetings:
- ~600 trouble tickets (generic); ~20 ATLAS DC2-specific
Extensive use of email lists

25 Not major problems
- bringing sites into single-purpose grids
- simple computational grids for highly portable applications
- specific workflows as defined by today’s JDL and/or DAG approaches
- centralized, project-managed grids (to a particular scale, yet to be seen)

26 Major problems: two perspectives
Site & service provider perspective:
- maintaining multiple “logical” grids with a given resource; maintaining robustness; long-term management; dynamic reconfiguration; platforms
- complex resource-sharing policies (department, university, projects, collaborative), user roles
Application developer perspective:
- the challenge of building integrated distributed systems
- end-to-end debugging of jobs, understanding faults
- common workload and data management systems developed separately for each VO

27 Grid3 is evolving into OSG
Main features/enhancements:
- Storage Resource Management
- Improve authorization service
- Add data management capabilities
- Improve monitoring and information services
- Service challenges and interoperability with other grids
Timeline:
- Current Grid3 remains stable through 2004
- Service development continues
- Grid3dev platform
c.f. R. Pordes, #192

28 Conclusions
- Grid3 taught us many lessons about how to deploy and run a production grid
- Breakthrough in the demonstrated use of “opportunistic” resources enabled by grid technologies
- Grid3 will be a critical resource for continued data challenges through 2004, and an environment in which to learn how to operate and upgrade large-scale production grids
- Grid3 is evolving to OSG with enhanced capabilities

29 Acknowledgements
R. Pordes (Grid3 co-coordinator) and the rest of the Grid3 team, which did all the work!
- Site administrators
- VO service administrators
- Application developers
- Developers and contributors
- iGOC team
- Project teams