
1  Grid Physics Network & Intl Virtual Data Grid Lab
Ian Foster*, for the GriPhyN & iVDGL Projects
SCI PI Meeting, February 18-20, 2004
*Argonne, U. Chicago, Globus; foster@mcs.anl.gov

2  Cyberinfrastructure
“A new age has dawned in scientific & engineering research, pushed by continuing progress in computing, information, and communication technology, & pulled by the expanding complexity, scope, and scale of today’s challenges. The capacity of this technology has crossed thresholds that now make possible a comprehensive ‘cyberinfrastructure’ on which to build new types of scientific & engineering knowledge environments & organizations and to pursue research in new ways & with increased efficacy.” [Blue Ribbon Panel report, 2003]
But how will we learn how to build, operate, & use it?

3  Our Approach: Experimental & Collaborative
Experimental procedure: mix together, and shake well:
- Physicists* with an overwhelming need to pool resources to solve fundamental science problems
- Computer scientists with a vision of a Grid that will enable virtual communities to share resources
Monitor byproducts:
- Heat: sometimes incendiary
- Light: considerable, in eScience, computer science, & cyberinfrastructure engineering
- Operational cyberinfrastructure (hardware & software), with an enthusiastic and knowledgeable user community, and real scientific benefits
* We use “physicist” as a generic term indicating a non-computer scientist

4  Who are the “Physicists”? – GriPhyN/iVDGL Science Drivers
- US-ATLAS, US-CMS (LHC experiments): fundamental nature of matter; 100s of petabytes
- LIGO observatory: gravitational wave search; 100s of terabytes
- Sloan Digital Sky Survey: astronomical research; 10s of terabytes
(Figure: data growth and community growth, 2001-2009)
- Plus a growing number of biologists & other scientists
- Plus computer scientists needing experimental apparatus

5  Common Underlying Problem: Data-Intensive Analysis
Users & resources in many institutions…
- 1000s of users, 100s of institutions, petascale resources
…engage in collaborative data analysis
- Both structured/scheduled & interactive
Many overlapping virtual organizations must:
- Define activities
- Pool resources
- Prioritize tasks
- Manage data
- …

6  Vision & Goals
- Develop the technologies & tools needed to exploit a distributed cyberinfrastructure
- Apply and evaluate those technologies & tools in challenging scientific problems
- Develop the technologies & procedures to support a persistent cyberinfrastructure
- Create and operate a persistent cyberinfrastructure in support of diverse discipline goals
End-to-end: GriPhyN + iVDGL + DOE Particle Physics Data Grid (PPDG) = Trillium

7  Two Distinct but Integrated Projects
Both NSF-funded, with overlapping periods:
- GriPhyN: $11.9M (NSF) + $1.6M (match), 2000-2005, CISE
- iVDGL: $13.7M (NSF) + $2M (match), 2001-2006, MPS
Basic composition:
- GriPhyN: 12 universities, SDSC, 3 labs (~80 people)
- iVDGL: 18 institutions, SDSC, 4 labs (~100 people)
- Large overlap in people, institutions, experiments, software
GriPhyN (Grid research) vs. iVDGL (Grid deployment):
- GriPhyN: 2/3 “CS” + 1/3 “physics” (0% H/W)
- iVDGL: 1/3 “CS” + 2/3 “physics” (20% H/W)
Many common elements:
- Directors, Advisory Committee, linked management
- Virtual Data Toolkit (VDT), Grid testbeds, Outreach effort
- Build on the Globus Toolkit, Condor, and other technologies

8  Project Specifics: GriPhyN
- Develop the technologies & tools needed to exploit a distributed cyberinfrastructure
- Apply and evaluate those technologies & tools in challenging scientific problems
- Develop the technologies & procedures to support a persistent cyberinfrastructure
- Create and operate a persistent cyberinfrastructure in support of diverse discipline goals

9  GriPhyN Overview
(Architecture diagram: researchers, production managers, science reviews, and instruments drive applications that compose and plan virtual data requests for production and analysis (parameters, executables, data); planning and execution services map that work onto the Grid fabric of storage elements and data, with services for discovery and sharing.)
Virtual Data Toolkit components: Chimera virtual data system, Pegasus planner, DAGMan, Globus Toolkit, Condor, Ganglia, etc.

10  (Early) Virtual Data Language: CMS “Pipeline”
Pipeline stages: pythia_input → pythia.exe → cmsim_input → cmsim.exe → writeHits → writeDigis

begin v /usr/local/demo/scripts/cmkin_input.csh
  file i ntpl_file_path
  file i template_file
  file i num_events
  stdout cmkin_param_file
end

begin v /usr/local/demo/binaries/kine_make_ntpl_pyt_cms121.exe
  pre cms_env_var
  stdin cmkin_param_file
  stdout cmkin_log
  file o ntpl_file
end

begin v /usr/local/demo/scripts/cmsim_input.csh
  file i ntpl_file
  file i fz_file_path
  file i hbook_file_path
  file i num_trigs
  stdout cmsim_param_file
end

begin v /usr/local/demo/binaries/cms121.exe
  condor copy_to_spool=false
  condor getenv=true
  stdin cmsim_param_file
  stdout cmsim_log
  file o fz_file
  file o hbook_file
end

begin v /usr/local/demo/binaries/writeHits.sh
  condor getenv=true
  pre orca_hits
  file i fz_file
  file i detinput
  file i condor_writeHits_log
  file i oo_fd_boot
  file i datasetname
  stdout writeHits_log
  file o hits_db
end

begin v /usr/local/demo/binaries/writeDigis.sh
  pre orca_digis
  file i hits_db
  file i oo_fd_boot
  file i carf_input_dataset_name
  file i carf_output_dataset_name
  file i carf_input_owner
  file i carf_output_owner
  file i condor_writeDigis_log
  stdout writeDigis_log
  file o digis_db
end
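To illustrate what a planner conceptually does with such a derivation description, here is a minimal, hypothetical Python sketch (not Chimera or Pegasus code): each transformation declares its input and output files, and an execution order follows from topologically sorting the data dependencies. Step names are abbreviated from the pipeline above.

  # Hypothetical sketch: order pipeline steps by their file dependencies.
  # The logic is a plain topological sort, not the actual Chimera/Pegasus
  # implementation; step names are abbreviated from the VDL above.
  from graphlib import TopologicalSorter

  # transformation -> (input files, output files)
  steps = {
      "cmkin_input.csh": ({"ntpl_file_path", "template_file", "num_events"}, {"cmkin_param_file"}),
      "kine_make_ntpl":  ({"cmkin_param_file"}, {"ntpl_file"}),
      "cmsim_input.csh": ({"ntpl_file", "fz_file_path", "hbook_file_path", "num_trigs"}, {"cmsim_param_file"}),
      "cms121.exe":      ({"cmsim_param_file"}, {"fz_file", "hbook_file"}),
      "writeHits.sh":    ({"fz_file", "detinput", "oo_fd_boot", "datasetname"}, {"hits_db"}),
      "writeDigis.sh":   ({"hits_db", "oo_fd_boot"}, {"digis_db"}),
  }

  # A step depends on every step that produces one of its inputs.
  producers = {f: s for s, (_, outs) in steps.items() for f in outs}
  deps = {s: {producers[f] for f in ins if f in producers} for s, (ins, _) in steps.items()}

  print(list(TopologicalSorter(deps).static_order()))
  # -> cmkin_input.csh, kine_make_ntpl, cmsim_input.csh, cms121.exe, writeHits.sh, writeDigis.sh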

11  Virtual Data Example: High Energy Physics
Search for WW decays of the Higgs boson where only stable, final-state particles are recorded (stability = 1).
(Derivation tree: starting from mass = 200, branches select the decay channel (WW, ZZ, bb), then stability (1, 3), then further derivations such as plot = 1 or event = 8; e.g. mass = 200, decay = WW, stability = 1, LowPt = 20, HighPt = 10000.)
- Scientist discovers an interesting result and wants to know how it was derived.
- The scientist adds a new derived data branch…
- …and continues to investigate.
Work and slide by Rick Cavanaugh and Dimitri Bourilkov, University of Florida
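A hedged sketch of the provenance idea behind this example: if every dataset records the derivation (parameters plus parent dataset) that produced it, then “how was this derived?” is a walk back up the derivation chain. The dataset names and catalog structure below are illustrative, not taken from an actual CMS analysis.

  # Illustrative only: a virtual-data catalog as a dict of
  # dataset -> (parameters, parent dataset). Walking parents answers
  # "how was this result derived?"
  catalog = {
      "higgs_m200":          ({"mass": 200}, None),
      "higgs_m200_ww":       ({"decay": "WW"}, "higgs_m200"),
      "higgs_m200_ww_s1":    ({"stability": 1}, "higgs_m200_ww"),
      "higgs_m200_ww_s1_p1": ({"plot": 1}, "higgs_m200_ww_s1"),
  }

  def provenance(dataset):
      """Return the chain of derivation steps that produced `dataset`."""
      chain = []
      while dataset is not None:
          params, parent = catalog[dataset]
          chain.append((dataset, params))
          dataset = parent
      return list(reversed(chain))

  for name, params in provenance("higgs_m200_ww_s1_p1"):
      print(name, params)
  # higgs_m200 {'mass': 200}
  # higgs_m200_ww {'decay': 'WW'}
  # higgs_m200_ww_s1 {'stability': 1}
  # higgs_m200_ww_s1_p1 {'plot': 1}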

12  Virtual Data Example: Sloan Galaxy Cluster Analysis
(Figure: the task graph of the analysis, applied to Sloan data, and the resulting galaxy cluster size distribution.)
Jim Annis, Steve Kent, Vijay Sehkri, Neha Sharma (Fermilab); Michael Milligan, Yong Zhao (Chicago)

13  Virtual Data Example: NVO/NASA Montage
- A small (1200-node) workflow
- Constructs custom mosaics on demand from multiple data sources
- User specifies projection, coordinates, size, rotation, spatial sampling
Work by Ewa Deelman et al., USC/ISI and Caltech

14  Virtual Data Example: Education (Work in Progress)
“We uploaded the data to the Grid & used the grid analysis tools to find the shower.”

15  Project Specifics: iVDGL
- Develop the technologies & tools needed to exploit a distributed cyberinfrastructure
- Apply and evaluate those technologies & tools in challenging scientific problems
- Develop the technologies & procedures to support a persistent cyberinfrastructure
- Create and operate a persistent cyberinfrastructure in support of diverse discipline goals

16  iVDGL Goals
Deploy a Grid laboratory:
- Support the research mission of data-intensive experiments
- Computing & personnel resources at university sites
- Provide a platform for computer science development
- Prototype and deploy a Grid Operations Center
Integrate Grid software tools:
- Into the computing infrastructures of the experiments
Support delivery of Grid technologies:
- Harden the VDT & other middleware technologies developed by GriPhyN and other Grid projects
Education and outreach:
- Enable underrepresented groups & remote regions to participate in international science projects

17  Virtual Data Toolkit
(Build diagram: sources from CVS are patched into GPT source bundles and, together with contributor packages (VDS, etc.), fed to the NMI build & test facility — a Condor pool of 37 computers — for build, test, and packaging; the resulting VDT release is distributed as a Pacman cache, RPMs, and binaries. Testing will use NMI processes soon.)
A unique laboratory for managing, testing, supporting, deploying, packaging, upgrading, & troubleshooting complex sets of software!
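As a toy illustration of the flow in this diagram (not the actual NMI/VDT build system; component names and stages are invented placeholders), a driver for such a pipeline runs each stage per component and only releases components that pass every stage:

  # Toy pipeline driver: check out, patch, build, test, and package each
  # component; only components that pass every stage reach the release.
  components = ["globus", "condor", "chimera"]          # invented names
  stages = ["checkout", "patch", "build", "test", "package"]

  def run_stage(component, stage):
      """Placeholder for the real work; here every stage simply succeeds."""
      print(f"{component}: {stage} ok")
      return True

  release = []
  for component in components:
      if all(run_stage(component, stage) for stage in stages):
          release.append(component)

  print("release contents:", release)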

18  Virtual Data Toolkit: Tools in VDT 1.1.12
- Condor Group: Condor/Condor-G, DAGMan, Fault Tolerant Shell, ClassAds
- Globus Alliance: Grid Security Infrastructure (GSI), job submission (GRAM), information service (MDS), data transfer (GridFTP), replica location (RLS)
- EDG & LCG: Make Gridmap, Certificate Revocation List Updater, Glue Schema/information provider
- ISI & UC: Chimera & related tools, Pegasus
- NCSA: MyProxy, GSI-OpenSSH
- LBL: PyGlobus, NetLogger
- Caltech: MonALISA
- VDT: VDT System Profiler, configuration software
- Others: KX509 (U. Mich.)

19  VDT Growth
- VDT 1.0: Globus 2.0b, Condor 6.3.1
- VDT 1.1.3, 1.1.4 & 1.1.5: pre-SC 2002
- VDT 1.1.7: switch to Globus 2.2
- VDT 1.1.8: first real use by LCG
- VDT 1.1.11: Grid2003

20  Grid2003: An Operational Grid
- 28 sites (2100-2800 CPUs), in the US and Korea, & growing
- 400-1300 concurrent jobs
- 7 substantial applications + CS experiments
- Running since October 2003
http://www.ivdgl.org/grid2003

21  Grid2003 Components
Computers & storage at 28 sites (to date):
- 2800+ CPUs
Uniform service environment at each site:
- Globus Toolkit provides basic authentication, execution management, data movement
- Pacman installation system enables installation of numerous other VDT and application services
Global & virtual organization services:
- Certification & registration authorities, VO membership services, monitoring services
Client-side tools for data access & analysis:
- Virtual data, execution planning, DAG management, execution management, monitoring
IGOC: iVDGL Grid Operations Center
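As a hedged illustration of the client-side "execution planning" layer (not an actual Grid2003 tool or API), a planner might combine a VO monitoring snapshot with a simple policy to pick a site for each job, for example sending work to the site with the most free CPUs. Site names and numbers below are invented.

  # Invented example data: a snapshot a monitoring service might report.
  sites = [
      {"name": "site_a.example.edu", "cpus": 256, "busy": 240},
      {"name": "site_b.example.edu", "cpus": 512, "busy": 100},
      {"name": "site_c.example.edu", "cpus": 128, "busy": 20},
  ]

  def pick_site(snapshot):
      """Toy policy: choose the site with the most free CPUs."""
      return max(snapshot, key=lambda s: s["cpus"] - s["busy"])

  jobs = [f"analysis_job_{i}" for i in range(3)]
  for job in jobs:
      site = pick_site(sites)
      site["busy"] += 1          # account for the job we just placed
      print(f"{job} -> {site['name']}")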

22  Grid2003 Metrics

  Metric                                   Target     Achieved
  Number of CPUs                           400        2762 (28 sites)
  Number of users                          > 10       102 (16)
  Number of applications                   > 4        10 (+CS)
  Number of sites running concurrent apps  > 10       17
  Peak number of concurrent jobs           1000       1100
  Data transfer per day                    > 2-3 TB   4.4 TB max

23  Grid2003 Applications To Date
- CMS proton-proton collision simulation
- ATLAS proton-proton collision simulation
- LIGO gravitational wave search
- SDSS galaxy cluster detection
- ATLAS interactive analysis
- BTeV proton-antiproton collision simulation
- SnB biomolecular analysis
- GADU/Gnare genome analysis
- Various computer science experiments
www.ivdgl.org/grid2003/applications

24  Grid2003 Usage
(Usage chart.)

25  Grid2003 Scientific Impact: e.g., U.S. CMS 2003 Production
- 10M events produced: largest-ever contribution
- Almost double the number of events in the first 25 days vs. 2002, with half the manpower
  - Production run with 1 person working 50%
  - 400 jobs at once vs. 200 the previous year
- Multi-VO sharing
- Continuing at an accelerating rate into 2004
- Many issues remain: e.g., scaling, missing functionality

26  Grid2003 as CS Research Lab: e.g., Adaptive Scheduling
- Adaptive data placement in a realistic environment (K. Ranganathan)
- Enables comparisons with simulations
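To make the idea concrete, here is a minimal, hypothetical sketch of one adaptive data-placement policy (a simplification for illustration, not the algorithm studied in this work): periodically replicate the most frequently requested datasets to the sites that request them, so that later jobs find their input data locally. All names and numbers are invented.

  # Simplified illustration of adaptive data placement (not the actual study):
  # count where each dataset is requested, and replicate "hot" datasets to the
  # requesting sites that do not yet hold a copy.
  from collections import Counter

  replicas = {"dataset_x": {"site_a"}, "dataset_y": {"site_b"}}   # invented
  requests = [("site_b", "dataset_x"), ("site_b", "dataset_x"),
              ("site_c", "dataset_x"), ("site_a", "dataset_y")]   # invented access log

  HOT_THRESHOLD = 2  # replicate once a (site, dataset) pair is requested this often

  def adapt(replicas, requests):
      counts = Counter(requests)
      for (site, dataset), n in counts.items():
          if n >= HOT_THRESHOLD and site not in replicas[dataset]:
              replicas[dataset].add(site)   # schedule a replication to this site
              print(f"replicate {dataset} -> {site} ({n} recent requests)")

  adapt(replicas, requests)
  # replicate dataset_x -> site_b (2 recent requests)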

27  Grid2003 Lessons Learned
How to operate a Grid:
- Add sites, recover from errors, provide information, update software, test applications, …
- Tools, services, procedures, docs, organization
- Need reliable, intelligent, skilled people
How to scale algorithms, software, process:
- “Interesting” failure modes as scale increases
- Increasing scale must not overwhelm human resources
How to delegate responsibilities:
- At project, virtual organization, service, site, and application levels
- Distribution of responsibilities for future growth
How to apply distributed cyberinfrastructure

28  Summary: We Are Building Cyberinfrastructure…
GriPhyN/iVDGL (+ DOE PPDG & LHC, etc.) are:
- Creating an (inter)national-scale, multi-disciplinary infrastructure for distributed data-intensive science;
- Demonstrating the utility of such infrastructure via a broad set of applications (not just physics!);
- Learning many things about how such infrastructures should be created, operated, and evolved; and
- Capturing best practices in software & procedures, including VDT, Pacman, monitoring tools, etc.
Unique scale & application breadth:
- Grid3: 10 apps (science & CS), 28 sites, 2800 CPUs, 1300 jobs, and growing rapidly
CS-applications-operations partnership:
- Having a strong impact on all three

29  …And Are Open for Business
Virtual Data Toolkit:
- Distributed workflow, data management & analysis
- Data replication, data provenance, etc.
- Virtual organization management
- Globus Toolkit, Condor, and other good stuff
Grid2003:
- Adapt your applications to use VDT mechanisms and obtain order-of-magnitude increases in performance
- Add your site to Grid2003 & join a national-scale cyberinfrastructure
- Propose computer science experiments in a unique environment
- Write an NMI proposal to fund this work

30  For More Information
- GriPhyN: www.griphyn.org
- iVDGL: www.ivdgl.org
- PPDG: www.ppdg.net
- Grid2003: www.ivdgl.org/grid2003
- Virtual Data Toolkit: www.griphyn.org/vdt, www.griphyn.org/chimera
- The Grid, 2nd Edition: www.mkp.com/grid2

