U.S. Grid Projects: Grid3 and Open Science Grid
Paul Avery, University of Florida
Digital Divide Meeting, Daegu, Korea, May 23, 2005


Slide 1: U.S. Grid Projects: Grid3 and Open Science Grid
International ICFA Workshop on HEP, Networking & Digital Divide Issues for Global e-Science, Daegu, Korea, May 23, 2005
Paul Avery, University of Florida

Slide 2: U.S. "Trillium" Grid Partnership
- Trillium = PPDG + GriPhyN + iVDGL
  - Particle Physics Data Grid: $12M (DOE) (1999–2006)
  - GriPhyN: $12M (NSF) (2000–2005)
  - iVDGL: $14M (NSF) (2001–2006)
- Basic composition (~150 people)
  - PPDG: 4 universities, 6 labs
  - GriPhyN: 12 universities, SDSC, 3 labs
  - iVDGL: 18 universities, SDSC, 4 labs, foreign partners
  - Experiments: BaBar, D0, STAR, JLab, CMS, ATLAS, LIGO, SDSS/NVO
- Coordinated internally to meet broad goals
  - GriPhyN: CS research, Virtual Data Toolkit (VDT) development
  - iVDGL: Grid laboratory deployment using VDT, applications
  - PPDG: "end-to-end" Grid services, monitoring, analysis
- Common use of VDT for underlying Grid middleware
- Unified entity when collaborating internationally

Slide 3: Goal: Peta-scale Data Grids for Global Science
(Architecture figure: single researchers, workgroups and production teams use interactive user tools, virtual data tools, request planning & scheduling tools, and request execution & management tools, layered over resource management, security and policy, and other Grid services, on top of distributed resources (code, storage, CPUs, networks) and raw data sources. Targets: PetaOps, Petabytes, performance.)

Slide 4: Grid Middleware: Virtual Data Toolkit
(Build pipeline figure: sources from CVS are patched into GPT source bundles, built and tested on the NMI Condor pool spanning 22+ operating systems, then packaged into a VDT build served from a Pacman cache as RPMs and binaries. Many contributors.)
A unique laboratory for testing, supporting, deploying, packaging, upgrading, and troubleshooting complex sets of software.

Slide 5: VDT Growth Over 3 Years
(Chart: number of VDT components over time, with milestones marked: VDT 1.0 with Globus 2.0b and Condor; the switch to Globus 2.2; the VDT release used by Grid3; and the first real use by LCG.)

Slide 6: Trillium Science Drivers
- Experiments at the Large Hadron Collider
  - New fundamental particles and forces
  - 100s of Petabytes ?
- High Energy & Nuclear Physics experiments
  - Top quark, nuclear matter at extreme density
  - ~1 Petabyte (1000 TB), 1997 – present
- LIGO (gravity wave search)
  - Search for gravitational waves
  - 100s of Terabytes, 2002 – present
- Sloan Digital Sky Survey
  - Systematic survey of astronomical objects
  - 10s of Terabytes, 2001 – present
(Figure arrows: data growth, community growth.)

Slide 7: LHC: Petascale Global Science
- Complexity: millions of individual detector channels
- Scale: PetaOps (CPU), 100s of Petabytes (data)
- Distribution: global distribution of people & resources
CMS example: physicists from 250+ institutes in 60+ countries. BaBar/D0 example: physicists from 100+ institutes in 35+ countries.

Slide 8: CMS Experiment: LHC Global Data Grid (2007+)
(Tiered architecture figure: the online system feeds the Tier 0 CERN Computer Center, which connects at >10 Gb/s to Tier 1 centers (USA, Korea, Russia, UK); Gb/s-scale links continue to Tier 2 centers (Maryland, Iowa, UCSD, Caltech, U Florida) and to Tier 3 and Tier 4 sites such as FIU, with physics caches and PCs at the edge.)
- 5000 physicists, 60 countries
- 10s of Petabytes/yr by 2008
- 1000 Petabytes in < 10 yrs?

Slide 9: University LHC Tier2 Centers
- Tier2 facility
  - Essential university role in extended computing infrastructure
  - 20 – 25% of a Tier1 national laboratory, supported by NSF
  - Validated by 3 years of experience (CMS, ATLAS)
- Functions
  - Perform physics analysis, simulations
  - Support experiment software, smaller institutions
- Official role in the Grid hierarchy (U.S.)
  - Sanctioned by MOU with parent organization (ATLAS, CMS)
  - Selection by the collaboration via a careful process

Slide 10: Grids and Globally Distributed Teams
- Non-hierarchical: chaotic analyses + productions
- Superimpose significant random data flows

Slide 11: CMS: Grid Enabled Analysis Architecture
(Figure: analysis clients (ROOT, Python, Cojac detector visualization, IGUANA CMS visualization, web browsers) talk HTTP, SOAP or XML-RPC to a Clarens "Grid Services Web Server", which fronts a scheduler (Sphinx), catalogs (RefDB, POOL, MOPDB), monitoring (MonALISA, BOSS), metadata, virtual data and replica services (Chimera), partially-abstract, fully-abstract and fully-concrete planners, a grid-wide execution service with an execution priority manager, data management, and applications (ORCA, FAMOS, MCRunjob); access is certificate-based, with ACL management and discovery.)
- Clients talk standard protocols to the "Grid Services Web Server" (sketched below)
- A simple web service API allows simple or complex analysis clients
- Typical clients: ROOT, web browser, ...
- The Clarens portal hides complexity
- Key features: global scheduler, catalogs, monitoring, grid-wide execution service
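As a rough illustration of the "clients talk standard protocols" point above, here is a minimal XML-RPC client in Python. The endpoint URL and the method names (file_list, submit_analysis) are hypothetical stand-ins, not the actual Clarens API.

    # Sketch of an analysis client calling a Clarens-style
    # "Grid Services Web Server" over XML-RPC. The URL and method
    # names below are illustrative placeholders, not the real API.
    import xmlrpc.client

    server = xmlrpc.client.ServerProxy("https://grid-portal.example.edu:8443/clarens")

    # Discover datasets the user's certificate is allowed to see
    # (hypothetical method name).
    datasets = server.file_list("/store/user/demo")

    # Submit an analysis task to the grid-wide execution service
    # (hypothetical method and argument structure).
    job_id = server.submit_analysis({"dataset": datasets[0],
                                     "executable": "root -b -q analyze.C"})
    print("submitted job", job_id)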

Slide 12: Grid3: A National Grid Infrastructure
- 32 sites, 4000 CPUs: universities + 4 national labs
- Part of the LHC Grid, running since October 2003
- Sites in US, Korea, Brazil, Taiwan
- Applications in HEP, LIGO, SDSS, genomics, fMRI, CS
(Site map figure.)

Slide 13: Grid3 Components
- Computers & storage at ~30 sites: 4000 CPUs
- Uniform service environment at each site
  - Globus 3.2: authentication, execution management, data movement (see the sketch after this list)
  - Pacman: installation of numerous VDT and application services
- Global & virtual organization services
  - Certification & registration authorities, VO membership & monitoring services
- Client-side tools for data access & analysis
  - Virtual data, execution planning, DAG management, execution management, monitoring
- IGOC: iVDGL Grid Operations Center
- Grid testbed: Grid3dev
  - Middleware development and testing, new VDT versions, etc.
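For concreteness, a hedged sketch of what the uniform Globus service environment looks like from the client side: the classic pre-web-services command-line tools (globus-job-run for execution, globus-url-copy for GridFTP data movement) driven from Python. Host names and paths are hypothetical, and exact options varied across Globus 2.x/3.x releases.

    # Sketch: driving a Grid3-style site through the classic Globus
    # command-line tools from Python. Host and file names are made up;
    # this assumes a valid grid proxy already exists (grid-proxy-init).
    import subprocess

    SITE = "gatekeeper.site.example.edu"   # hypothetical Grid3 gatekeeper

    # Execution management: run a command on the remote gatekeeper.
    subprocess.run(["globus-job-run", SITE, "/bin/hostname"], check=True)

    # Data movement: GridFTP copy of an input file to the site's
    # storage (hypothetical paths).
    subprocess.run(["globus-url-copy",
                    "file:///home/user/input.tar.gz",
                    f"gsiftp://{SITE}/scratch/user/input.tar.gz"],
                   check=True)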

Slide 14: Grid3 Applications
- CMS experiment: p-p collision simulations & analysis
- ATLAS experiment: p-p collision simulations & analysis
- BTeV experiment: p-p collision simulations & analysis
- LIGO: search for gravitational wave sources
- SDSS: galaxy cluster finding
- Bio-molecular analysis: Shake n Bake (SnB) (Buffalo)
- Genome analysis: GADU/Gnare
- fMRI: functional MRI (Dartmouth)
- CS demonstrators: Job Exerciser, GridFTP, NetLogger

Slide 15: Grid3 Shared Use Over 6 Months
(Chart: CPU usage by VO over six months, highlighting the CMS DC04 and ATLAS DC2 data challenges; snapshot dated Sep 10, 2004.)

Slide 16: Grid3 Production Over 13 Months
(Chart: production usage over 13 months.)

Slide 17: U.S. CMS 2003 Production
- 10M p-p collisions; largest ever
- 2× the simulation sample
- ½ the manpower
- Multi-VO sharing

Slide 18: Grid3 Lessons Learned
- How to operate a Grid as a facility
  - Tools, services, error recovery, procedures, docs, organization
  - Delegation of responsibilities (project, VO, service, site, ...)
  - Crucial role of the Grid Operations Center (GOC)
- How to support people-to-people relations
  - Face-to-face meetings, phone conferences, 1-1 interactions, mail lists, etc.
- How to test and validate Grid tools and applications
  - Vital role of testbeds
- How to scale algorithms, software, process
  - Some successes, but "interesting" failure modes still occur
- How to apply distributed cyberinfrastructure
  - Successful production runs for several applications

Slide 19: Grid3 → Open Science Grid
- Iteratively build & extend Grid3: Grid3 → OSG-0 → OSG-1 → OSG-2 → ...
  - Shared resources, benefiting a broad set of disciplines
  - Grid middleware based on the Virtual Data Toolkit (VDT)
- Consolidate elements of the OSG collaboration
  - Computer and application scientists
  - Facility, technology and resource providers (labs, universities)
- Further develop OSG
  - Partnerships with other sciences, universities
  - Incorporation of advanced networking
  - Focus on general services, operations, end-to-end performance
- Aim for July 2005 deployment


Slide 21: OSG Organization
(Organization chart: the OSG Council (all members above a certain threshold; chair and officers) and an Executive Board (8-15 representatives; chair and officers) sit above a small core OSG staff (a few FTEs plus a manager), with an Advisory Committee. Technical Groups and self-organized Activities carry out the work, drawing on the contributing enterprise: research grid projects, VOs, researchers, sites, service providers, universities and labs.)

Slide 22: OSG Technical Groups & Activities
- Technical Groups address and coordinate technical areas
  - Propose and carry out activities related to their given areas
  - Liaise & collaborate with other peer projects (U.S. & international)
  - Participate in relevant standards organizations
  - Chairs participate in Blueprint, Integration and Deployment activities
- Activities are well-defined, scoped tasks contributing to OSG
  - Each Activity has deliverables and a plan
  - ... is self-organized and operated
  - ... is overseen & sponsored by one or more Technical Groups
TGs and Activities are where the real work gets done.

Slide 23: OSG Technical Groups
- Governance: charter, organization, by-laws, agreements, formal processes
- Policy: VO & site policy, authorization, priorities, privilege & access rights
- Security: common security principles, security infrastructure
- Monitoring and Information Services: resource monitoring, information services, auditing, troubleshooting
- Storage: storage services at remote sites, interfaces, interoperability
- Support Centers: infrastructure and services for user support, helpdesk, trouble tickets
- Education / Outreach: training, interface with various E/O projects
- Networks (new): including interfacing with various networking projects

Slide 24: OSG Activities
- Blueprint: defining principles and best practices for OSG
- Deployment: deployment of resources & services
- Provisioning: connected to deployment
- Incident response: plans and procedures for responding to security incidents
- Integration: testing, validating & integrating new services and technologies
- Data Resource Management (DRM): deployment of specific Storage Resource Management technology
- Documentation: organizing the documentation infrastructure
- Accounting: accounting and auditing use of OSG resources
- Interoperability: primarily interoperability between ...
- Operations: operating Grid-wide services

Slide 25: The Path to the OSG Operating Grid
(Process figure: software & packaging feeds the OSG Integration Activity, which produces a release description and release candidates and exercises functionality & scalability tests, middleware interoperability, application validation and VO application software installation, with feedback loops; once metrics & certification are satisfied and a readiness plan is adopted, the OSG Deployment Activity and the OSG Operations-Provisioning Activity carry out service deployment on the operating grid, supported by effort and resources.)

Slide 26: OSG Integration Testbed
More than 20 sites and rising. (Site map figure, including Brazil.)

Slide 27: Status of OSG Deployment
- OSG infrastructure release "accepted" for deployment
- US CMS application "flood testing" successful
- D0 simulation & reprocessing jobs running on selected OSG sites
- Others in various stages of readying applications & infrastructure (ATLAS, CMS, STAR, CDF, BaBar, fMRI)
- Deployment process underway: end of July?
  - Open OSG and transition resources from Grid3
  - Applications will use growing ITB & OSG resources during the transition

Slide 28: Connections to LCG and EGEE
Many LCG-OSG interactions. (Figure.)

Slide 29: Interoperability & Federation
- Transparent use of federated Grid infrastructures is a goal
  - LCG, EGEE
  - TeraGrid
  - State-wide Grids
  - Campus Grids (Wisconsin, Florida, etc.)
- Some early activities with LCG
  - Some OSG/Grid3 sites appear in the LCG map
  - D0 bringing reprocessing to LCG sites through an adaptor node
  - CMS and ATLAS can run their jobs on both LCG and OSG
- Increasing interaction with TeraGrid
  - CMS and ATLAS sample simulation jobs are running on TeraGrid
  - Plans for a TeraGrid allocation for jobs running in the Grid3 model (group accounts, binary distributions, external data management, etc.)

Slide 30: UltraLight: Advanced Networking in Applications
- 10 Gb/s+ network
- Caltech, UF, FIU, UM, MIT
- SLAC, FNAL
- International partners
- Level(3), Cisco, NLR
- Funded by ITR2004
(Network map figure.)

Slide 31: UltraLight: New Information System
- A new class of integrated information systems
  - Includes networking as a managed resource for the first time
  - Uses "hybrid" packet-switched and circuit-switched optical network infrastructure
  - Monitor, manage & optimize network and Grid systems in real time
- Flagship applications: HEP, eVLBI, "burst" imaging
  - "Terabyte-scale" data transactions in minutes (see the estimate after this list)
  - Extend real-time eVLBI to the 10 – 100 Gb/s range
- Powerful testbed
  - Significant storage, optical networks for testing new Grid services
- Strong vendor partnerships
  - Cisco, Calient, NLR, CENIC, Internet2/Abilene
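To give a feel for the "terabyte-scale transactions in minutes" claim, the short calculation below (an illustrative estimate, not from the slides) shows that a fully utilized 10 Gb/s path moves one terabyte in roughly a quarter of an hour.

    # Back-of-envelope estimate: time to move 1 TB over a 10 Gb/s link
    # at full utilization (ignores protocol overhead and disk limits).
    data_bits = 1e12 * 8          # 1 TB expressed in bits
    link_bps = 10e9               # 10 Gb/s
    seconds = data_bits / link_bps
    print(f"{seconds:.0f} s ~= {seconds/60:.1f} minutes")   # ~800 s, ~13 minutes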

Slide 32: iVDGL, GriPhyN Education/Outreach
Basics:
- $200K/yr
- Led by UT Brownsville
- Workshops, portals, tutorials
- New partnerships with QuarkNet, CHEPREO, LIGO E/O, ...

Slide 33: U.S. Grid Summer School
- First of its kind in the U.S. (June 2004, South Padre Island)
  - 36 students, diverse origins and types (M, F, MSIs, etc.)
- Marks a new direction for U.S. Grid efforts
  - First attempt to systematically train people in Grid technologies
  - First attempt to gather relevant materials in one place
  - Today: students in CS and physics
  - Next: students, postdocs, junior & senior scientists
- Reaching a wider audience
  - Put lectures, exercises, video on the web
  - More tutorials, perhaps 2-3/year
  - Dedicated resources for remote tutorials
  - Create a "Grid Cookbook", e.g. Georgia Tech
- Second workshop: July 11–15, 2005, South Padre Island

Slide 34: QuarkNet/GriPhyN e-Lab Project

Slide 35: Student Muon Lifetime Analysis in GriPhyN/QuarkNet

Slide 36: CHEPREO: Center for High Energy Physics Research and Educational Outreach, Florida International University
- Physics Learning Center
- CMS research
- iVDGL Grid activities
- AMPATH network (S. America)
- Funded September 2003: $4M initially (3 years); MPS, CISE, EHR, INT

Slide 37: Science Grid Communications
Broad set of activities:
- News releases, PR, etc.
- Science Grid This Week
- Katie Yurkewicz talk

Slide 38: Grids and the Digital Divide (Rio de Janeiro + Daegu)
Background:
- World Summit on the Information Society
- HEP Standing Committee on Inter-regional Connectivity (SCIC)
Themes:
- Global collaborations, Grids and addressing the Digital Divide
- Focus on poorly connected regions
- Brazil (2004), Korea (2005)

Slide 39: New Campus Research Grids (e.g., Florida)
(Campus grid figure: a common grid infrastructure layer (user support, middleware, database operations, certificate authority, grid operations, service providers) connects facilities (HPC I, HPC II, HPC III, CMS Tier2) with application communities across campus, including HEP, QTP, Chem, HCS, CISE, Astro, Bio, Geo, Nano, ACIS, MBI, DWI and CNS.)

Slide 40: US HEP Data Grid Timeline
(Timeline figure; milestones include: PPDG approved, $9.5M; GriPhyN approved, $11.9M + $1.6M; iVDGL approved, $13.7M + $2M; VDT 1.0; first US-LHC Grid testbeds; start of Grid3 operations; start of Open Science Grid operations; UltraLight approved, $2M; CHEPREO approved, $4M; DISUN approved, $10M; Grid Communications; Grid Summer Schools I and II; Digital Divide Workshops; LIGO Grid; GLORIAD funded.)

Slide 41: Summary
- Grids enable 21st century collaborative science
  - Linking research communities and resources for scientific discovery
  - Needed by global collaborations pursuing "petascale" science
- Grid3 was an important first step in developing US Grids
  - Value of planning, coordination, testbeds, rapid feedback
  - Value of learning how to operate a Grid as a facility
  - Value of building & sustaining community relationships
- Grids drive the need for advanced optical networks
- Grids impact education and outreach
  - Providing technologies & resources for training, education, outreach
  - Addressing the Digital Divide
- OSG: a scalable computing infrastructure for science?
  - Strategies needed to cope with increasingly large scale

Slide 42: Grid Project References
- Open Science Grid
- Grid3
- Virtual Data Toolkit
- GriPhyN
- iVDGL
- PPDG
- CHEPREO
- UltraLight: ultralight.cacr.caltech.edu
- Globus
- Condor
- LCG
- EU DataGrid
- EGEE

Slide 43: Extra Slides

Slide 44: GriPhyN Goals
- Conduct CS research to achieve the vision
  - Virtual Data as the unifying principle
  - Planning, execution, performance monitoring
- Disseminate through the Virtual Data Toolkit
  - A "concrete" deliverable
- Integrate into GriPhyN science experiments
  - Common Grid tools, services
- Educate, involve, train students in IT research
  - Undergrads, grads, postdocs
  - Underrepresented groups

Slide 45: iVDGL Goals
- Deploy a Grid laboratory
  - Support the research mission of data intensive experiments
  - Provide computing and personnel resources at university sites
  - Provide a platform for computer science technology development
  - Prototype and deploy a Grid Operations Center (iGOC)
- Integrate Grid software tools
  - Into the computing infrastructures of the experiments
- Support delivery of Grid technologies
  - Hardening of the Virtual Data Toolkit (VDT) and other middleware technologies developed by GriPhyN and other Grid projects
- Education and Outreach
  - Lead and collaborate with Education and Outreach efforts
  - Provide tools and mechanisms for underrepresented groups and remote regions to participate in international science projects

Slide 46: "Virtual Data": Derivation & Provenance
- Most scientific data are not simple "measurements"
  - They are computationally corrected/reconstructed
  - They can be produced by numerical simulation
- Science & engineering projects are ever more CPU and data intensive
  - Programs are significant community resources (transformations)
  - So are the executions of those programs (derivations)
  - Management of dataset dependencies is critical (see the sketch after this list)
- Derivation: instantiation of a potential data product
- Provenance: complete history of any existing data product
- Previously: manual methods
- GriPhyN: automated, robust tools
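A minimal sketch of the dependency-tracking idea, assuming nothing about Chimera's actual data model: transformations and derivations are recorded so the catalog can answer which derived products must be recomputed when an input changes (cf. the motivations on slide 49). Names and structure here are illustrative only.

    # Toy virtual-data catalog: record which transformation and inputs
    # produced each dataset, then answer provenance/recompute queries.
    # This illustrates the concept, not Chimera's actual schema.
    from collections import defaultdict

    derivations = {}                     # product -> (transformation, [inputs])
    consumers = defaultdict(list)        # input   -> [products derived from it]

    def record(product, transformation, inputs):
        derivations[product] = (transformation, list(inputs))
        for i in inputs:
            consumers[i].append(product)

    def provenance(product):
        """Complete derivation history of an existing data product."""
        if product not in derivations:
            return [product]                       # raw data
        transform, inputs = derivations[product]
        return [(product, transform)] + [provenance(i) for i in inputs]

    def needs_recompute(changed_input):
        """Everything downstream of a changed input (e.g. a new calibration)."""
        stale = []
        for p in consumers[changed_input]:
            stale.append(p)
            stale.extend(needs_recompute(p))
        return stale

    record("hits.db", "writeHits", ["raw.fz", "muon_calibration"])
    record("digis.db", "writeDigis", ["hits.db"])
    print(needs_recompute("muon_calibration"))     # ['hits.db', 'digis.db']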

Slide 47: Virtual Data Example: HEP Analysis
(Figure: a tree of derived datasets selected by decay channel and cuts, e.g. decay = WW, decay = ZZ, decay = bb, with further branches such as WW → leptons, WW → e with mass = 160, Pt > 20, and other cuts. A scientist adds a new derived data branch and continues the analysis.)

Digital Divide Meeting (May 23, 2005)Paul Avery48  Language: define software environments  Interpreter: create, install, configure, update, verify environments  Version released Jan  LCG/Scram  ATLAS/CMT  CMS DPE/tar/make  LIGO/tar/make  OpenSource/tar/make  Globus/GPT  NPACI/TeraGrid/tar/make  D0/UPS-UPD  Commercial/tar/make Combine and manage software from arbitrary sources. % pacman –get iVDGL:Grid3 “1 button install”: Reduce burden on administrators Remote experts define installation/ config/updating for everyone at once VDT ATLA S NPAC I D- Zero iVDGL UCHEP % pacman VDT CMS/DPE LIGO Packaging of Grid Software: Pacman

Slide 49: Virtual Data Motivations
- "I've detected a muon calibration error and want to know which derived data products need to be recomputed."
- "I've found some interesting data, but I need to know exactly what corrections were applied before I can trust it."
- "I want to search a database for 3 muon events. If a program that does this analysis exists, I won't have to write one from scratch."
- "I want to apply a forward jet analysis to 100M events. If the results already exist, I'll save weeks of computation."
(Figure: these use cases map onto the Virtual Data Catalog (VDC): describe, discover, reuse, validate.)

Slide 50: Background: Data Grid Projects
- U.S. funded projects: GriPhyN (NSF), iVDGL (NSF), Particle Physics Data Grid (DOE), UltraLight, TeraGrid (NSF), DOE Science Grid (DOE), NEESgrid (NSF), NSF Middleware Initiative (NSF)
- EU and Asia projects: EGEE (EU), LCG (CERN), DataGrid, EU national projects, DataTAG (EU), CrossGrid (EU), GridLab (EU), Japanese and Korean projects
- Many projects driven/led by HEP + CS
- Many 10s x $M brought into the field
- Large impact on other sciences, education
Driven primarily by HEP applications.

Slide 51: "Virtual Data": Derivation & Provenance (repeat of slide 46)

Slide 52: Muon Lifetime Analysis Workflow

Slide 53: (Early) Virtual Data Language: CMS "Pipeline"
(Pipeline stages: pythia_input → pythia.exe → cmsim_input → cmsim.exe → writeHits → writeDigis.)

    begin v /usr/local/demo/scripts/cmkin_input.csh
      file i ntpl_file_path
      file i template_file
      file i num_events
      stdout cmkin_param_file
    end
    begin v /usr/local/demo/binaries/kine_make_ntpl_pyt_cms121.exe
      pre cms_env_var
      stdin cmkin_param_file
      stdout cmkin_log
      file o ntpl_file
    end
    begin v /usr/local/demo/scripts/cmsim_input.csh
      file i ntpl_file
      file i fz_file_path
      file i hbook_file_path
      file i num_trigs
      stdout cmsim_param_file
    end
    begin v /usr/local/demo/binaries/cms121.exe
      condor copy_to_spool=false
      condor getenv=true
      stdin cmsim_param_file
      stdout cmsim_log
      file o fz_file
      file o hbook_file
    end
    begin v /usr/local/demo/binaries/writeHits.sh
      condor getenv=true
      pre orca_hits
      file i fz_file
      file i detinput
      file i condor_writeHits_log
      file i oo_fd_boot
      file i datasetname
      stdout writeHits_log
      file o hits_db
    end
    begin v /usr/local/demo/binaries/writeDigis.sh
      pre orca_digis
      file i hits_db
      file i oo_fd_boot
      file i carf_input_dataset_name
      file i carf_output_dataset_name
      file i carf_input_owner
      file i carf_output_owner
      file i condor_writeDigis_log
      stdout writeDigis_log
      file o digis_db
    end
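To connect this to the execution layer mentioned later in the deck (Condor DAGMan under the VDT, slide 56), here is a rough sketch of how the six-stage chain above could be expressed as a concrete DAG description. The submit-file names are placeholders; in GriPhyN the Chimera/Pegasus tooling generated the concrete workflow automatically.

    # Sketch: emit a Condor DAGMan description for the six-stage CMS
    # pipeline above. Submit-file names are hypothetical.
    stages = ["cmkin_input", "cmkin", "cmsim_input", "cmsim",
              "writeHits", "writeDigis"]

    with open("cms_pipeline.dag", "w") as dag:
        for s in stages:
            dag.write(f"JOB {s} {s}.sub\n")          # one Condor submit file per stage
        for parent, child in zip(stages, stages[1:]):
            dag.write(f"PARENT {parent} CHILD {child}\n")
    # Submit with: condor_submit_dag cms_pipeline.dag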

Slide 54: QuarkNet Portal Architecture
- Simpler interface for non-experts
- Builds on the Chiron portal

Slide 55: Integration of GriPhyN and iVDGL
- Both funded by NSF large ITRs, overlapping periods
  - GriPhyN: CS research, Virtual Data Toolkit (9/2000 – 9/2005)
  - iVDGL: Grid laboratory, applications (9/2001 – 9/2006)
- Basic composition
  - GriPhyN: 12 universities, SDSC, 4 labs (~80 people)
  - iVDGL: 18 institutions, SDSC, 4 labs (~100 people)
  - Experiments: CMS, ATLAS, LIGO, SDSS/NVO
- GriPhyN (Grid research) vs. iVDGL (Grid deployment)
  - GriPhyN: 2/3 "CS" + 1/3 "physics" (0% hardware)
  - iVDGL: 1/3 "CS" + 2/3 "physics" (20% hardware)
- Many common elements
  - Common directors, Advisory Committee, linked management
  - Common Virtual Data Toolkit (VDT)
  - Common Grid testbeds
  - Common Outreach effort

Slide 56: GriPhyN Overview
(Architecture figure: researchers, production managers and science reviews drive production and analysis, expressed as virtual-data compositions with parameters, executables and data; planning and execution are handled by the Virtual Data Toolkit, which layers the Chimera virtual data system and the Pegasus planner over DAGMan, the Globus Toolkit, Condor, Ganglia, etc., on top of grid fabric services (storage elements, data, discovery, sharing) connecting applications and instruments.)

Slide 57: Chiron/QuarkNet Architecture

Slide 58: Cyberinfrastructure
"A new age has dawned in scientific & engineering research, pushed by continuing progress in computing, information, and communication technology, & pulled by the expanding complexity, scope, and scale of today's challenges. The capacity of this technology has crossed thresholds that now make possible a comprehensive "cyberinfrastructure" on which to build new types of scientific & engineering knowledge environments & organizations and to pursue research in new ways & with increased efficacy." [NSF Blue Ribbon Panel report, 2003]

Slide 59: Fulfilling the Promise of Next Generation Science
Our multidisciplinary partnership of physicists, computer scientists, engineers, networking specialists and education experts, from universities and laboratories, has achieved tremendous success in creating and maintaining general purpose cyberinfrastructure supporting leading-edge science. But these achievements have occurred in the context of overlapping short-term projects. How can we ensure the survival of valuable existing cyberinfrastructure while continuing to address new challenges posed by frontier scientific and engineering endeavors?

Slide 60: Production Simulations on Grid3
- US-CMS Monte Carlo simulation
- Used = 1.5 × US-CMS resources
(Chart: USCMS vs. non-USCMS contributions.)

Slide 61: Components of VDT
Globus, Condor, RLS 3.0, ClassAds, Replica, DOE/EDG CA certs, ftsh, EDG mkgridmap, EDG CRL Update, GLUE Schema 1.0, VDS 1.3.5b, Java, Netlogger, Gatekeeper-Authz, MyProxy 1.11, KX509, System Profiler, GSI OpenSSH 3.4, MonALISA, PyGlobus, MySQL, UberFTP 1.11, DRM 1.2.6a, VOMS, VOMS Admin, Tomcat, PRIMA 0.2, Certificate Scripts, Apache, jClarens, New GridFTP Server, GUMS 1.0.1

Slide 62: Collaborative Relationships: A CS + VDT Perspective
(Figure: computer science research (Globus, Condor, NMI, iVDGL, PPDG, EU DataGrid, LHC experiments, QuarkNet, CHEPREO, Digital Divide) feeds techniques & software into the Virtual Data Toolkit, guided by requirements and by prototyping & experiments; the VDT is then production-deployed and transferred to partner science, networking and outreach projects and to the larger science community (U.S. Grids, international, outreach). Other linkages: work force, CS researchers, industry.)