1 Grid Physics Network & International Virtual Data Grid Laboratory
Ian Foster* for the GriPhyN & iVDGL Projects
SCI PI Meeting, February 18-20, 2004
*Argonne, U. Chicago, Globus; foster@mcs.anl.gov
2 Cyberinfrastructure
"A new age has dawned in scientific & engineering research, pushed by continuing progress in computing, information, and communication technology, & pulled by the expanding complexity, scope, and scale of today's challenges. The capacity of this technology has crossed thresholds that now make possible a comprehensive 'cyberinfrastructure' on which to build new types of scientific & engineering knowledge environments & organizations and to pursue research in new ways & with increased efficacy." [Blue Ribbon Panel report, 2003]
But how will we learn how to build, operate, & use it?
3 Our Approach: Experimental & Collaborative
Experimental procedure: mix together, and shake well:
Physicists* with an overwhelming need to pool resources to solve fundamental science problems
Computer scientists with a vision of a Grid that will enable virtual communities to share resources
Monitor byproducts:
Heat: sometimes incendiary
Light: considerable, in eScience, computer science, & cyberinfrastructure engineering
Operational cyberinfrastructure (hardware & software), with an enthusiastic and knowledgeable user community, and real scientific benefits
* We use "physicist" as a generic term indicating a non-computer scientist
4 Who Are the "Physicists"? – GriPhyN/iVDGL Science Drivers
US-ATLAS, US-CMS (LHC experiments): fundamental nature of matter; 100s of petabytes
LIGO observatory: gravitational wave search; 100s of terabytes
Sloan Digital Sky Survey: astronomical research; 10s of terabytes
[Chart: data growth and community growth, 2001-2009]
Plus a growing number of biologists & other scientists, and computer scientists needing experimental apparatus
5 Common Underlying Problem: Data-Intensive Analysis
Users & resources in many institutions (1000s of users, 100s of institutions, petascale resources) engage in collaborative data analysis, both structured/scheduled & interactive
Many overlapping virtual organizations must define activities, pool resources, prioritize tasks, manage data, ...
6 Vision & Goals (end-to-end)
Develop the technologies & tools needed to exploit a distributed cyberinfrastructure
Apply and evaluate those technologies & tools in challenging scientific problems
Develop the technologies & procedures to support a persistent cyberinfrastructure
Create and operate a persistent cyberinfrastructure in support of diverse discipline goals
GriPhyN + iVDGL + DOE Particle Physics Data Grid (PPDG) = Trillium
7 Two Distinct but Integrated Projects
Both NSF-funded, with overlapping periods:
GriPhyN: $11.9M (NSF) + $1.6M (match), 2000-2005, CISE
iVDGL: $13.7M (NSF) + $2M (match), 2001-2006, MPS
Basic composition:
GriPhyN: 12 universities, SDSC, 3 labs (~80 people)
iVDGL: 18 institutions, SDSC, 4 labs (~100 people)
Large overlap in people, institutions, experiments, software
GriPhyN (Grid research) vs. iVDGL (Grid deployment):
GriPhyN: 2/3 "CS" + 1/3 "physics" (0% hardware)
iVDGL: 1/3 "CS" + 2/3 "physics" (20% hardware)
Many common elements: directors, Advisory Committee, linked management; Virtual Data Toolkit (VDT), Grid testbeds, outreach effort; build on the Globus Toolkit, Condor, and other technologies
8 Project Specifics: GriPhyN
Develop the technologies & tools needed to exploit a distributed cyberinfrastructure
Apply and evaluate those technologies & tools in challenging scientific problems
Develop the technologies & procedures to support a persistent cyberinfrastructure
Create and operate a persistent cyberinfrastructure in support of diverse discipline goals
9 GriPhyN Overview
[Architecture diagram: researchers, production managers, and instruments drive production and analysis applications; virtual data composition, planning, and execution services sit between the applications and the Grid fabric of storage and compute elements, together with discovery and sharing services; the Virtual Data Toolkit supplies the Chimera virtual data system, the Pegasus planner, DAGMan, the Globus Toolkit, Condor, Ganglia, etc.]
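To make the execution layer concrete, here is a minimal sketch (not taken from the slides) of the kind of Condor-G submit description that the planning layer ultimately hands to Condor for execution through a GT2-era Globus gatekeeper; the gatekeeper host, executable, and file names are illustrative placeholders.

    # Condor-G submit description for one job, routed through a remote GRAM
    # gatekeeper (host, executable, and file names are placeholders)
    universe        = globus
    globusscheduler = gatekeeper.example.edu/jobmanager-condor
    executable      = analyze.sh
    arguments       = run01
    output          = analyze.out
    error           = analyze.err
    log             = analyze.log
    queue

Submitted with condor_submit, Condor-G then manages the remote GRAM job and the associated GSI credentials on the user's behalf.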
10 (Early) Virtual Data Language: CMS "Pipeline"
Pipeline stages: pythia_input -> pythia.exe -> cmsim_input -> cmsim.exe -> writeHits -> writeDigis

    begin v /usr/local/demo/scripts/cmkin_input.csh
      file i ntpl_file_path
      file i template_file
      file i num_events
      stdout cmkin_param_file
    end
    begin v /usr/local/demo/binaries/kine_make_ntpl_pyt_cms121.exe
      pre cms_env_var
      stdin cmkin_param_file
      stdout cmkin_log
      file o ntpl_file
    end
    begin v /usr/local/demo/scripts/cmsim_input.csh
      file i ntpl_file
      file i fz_file_path
      file i hbook_file_path
      file i num_trigs
      stdout cmsim_param_file
    end
    begin v /usr/local/demo/binaries/cms121.exe
      condor copy_to_spool=false
      condor getenv=true
      stdin cmsim_param_file
      stdout cmsim_log
      file o fz_file
      file o hbook_file
    end
    begin v /usr/local/demo/binaries/writeHits.sh
      condor getenv=true
      pre orca_hits
      file i fz_file
      file i detinput
      file i condor_writeHits_log
      file i oo_fd_boot
      file i datasetname
      stdout writeHits_log
      file o hits_db
    end
    begin v /usr/local/demo/binaries/writeDigis.sh
      pre orca_digis
      file i hits_db
      file i oo_fd_boot
      file i carf_input_dataset_name
      file i carf_output_dataset_name
      file i carf_input_owner
      file i carf_output_owner
      file i condor_writeDigis_log
      stdout writeDigis_log
      file o digis_db
    end
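For illustration (not from the original slide), here is a minimal sketch of the kind of DAG that Chimera/Pegasus could hand to DAGMan for this pipeline; the node and submit-file names are placeholders, and a real plan would also include data staging and catalog registration jobs.

    # pipeline.dag: one node per pipeline stage (submit-file names are placeholders)
    JOB pythia     pythia.sub
    JOB cmsim      cmsim.sub
    JOB writeHits  writeHits.sub
    JOB writeDigis writeDigis.sub
    # dependencies follow the dataflow: ntuple -> simulated events -> hits -> digis
    PARENT pythia     CHILD cmsim
    PARENT cmsim      CHILD writeHits
    PARENT writeHits  CHILD writeDigis

The DAG would be submitted with condor_submit_dag pipeline.dag; DAGMan then releases each node to Condor(-G) once its parents complete.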
11 Virtual Data Example: High Energy Physics
[Diagram: a tree of derived datasets keyed by parameters such as mass = 200; decay = WW, ZZ, or bb; stability = 1 or 3; plot = 1; event = 8; LowPt = 20; HighPt = 10000]
A scientist searches for WW decays of the Higgs boson in which only stable, final-state particles are recorded (stability = 1).
The scientist discovers an interesting result and wants to know how it was derived.
The scientist then adds a new derived data branch (e.g., stability = 3) and continues to investigate.
Work and slide by Rick Cavanaugh and Dimitri Bourilkov, University of Florida
12 Virtual Data Example: Sloan Galaxy Cluster Analysis
[Figures: Sloan data, task graph, galaxy cluster size distribution]
Jim Annis, Steve Kent, Vijay Sehkri, Neha Sharma (Fermilab); Michael Milligan, Yong Zhao (Chicago)
13 Virtual Data Example: NVO/NASA Montage
A small (1200-node) workflow
Construct custom mosaics on demand from multiple data sources
User specifies projection, coordinates, size, rotation, spatial sampling
Work by Ewa Deelman et al., USC/ISI and Caltech
14 Virtual Data Example: Education (Work in Progress) “We uploaded the data to the Grid & used the grid analysis tools to find the shower”
15 Project Specifics: iVDGL
Develop the technologies & tools needed to exploit a distributed cyberinfrastructure
Apply and evaluate those technologies & tools in challenging scientific problems
Develop the technologies & procedures to support a persistent cyberinfrastructure
Create and operate a persistent cyberinfrastructure in support of diverse discipline goals
16 iVDGL Goals
Deploy a Grid laboratory: support the research mission of data-intensive experiments; computing & personnel resources at university sites; provide a platform for computer science development; prototype and deploy a Grid Operations Center
Integrate Grid software tools into the computing infrastructures of the experiments
Support delivery of Grid technologies: harden the VDT & other middleware technologies developed by GriPhyN and other Grid projects
Education and outreach: enable underrepresented groups & remote regions to participate in international science projects
17 Virtual Data Toolkit
[Diagram: VDT build process. Sources (CVS) are patched into GPT source bundles, built and tested on the NMI Build & Test Condor pool (37 computers), then packaged together with contributors' components (VDS, etc.) into a Pacman cache, RPMs, and binaries; will use NMI processes soon]
A unique laboratory for managing, testing, supporting, deploying, packaging, upgrading, & troubleshooting complex sets of software!
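As an illustration of the packaging step (not taken from the slides), a site administrator would pull a packaged VDT component out of a Pacman cache with something like the following; the cache URL and package name are placeholders, and the exact names varied by VDT release.

    # install a VDT package from a Pacman cache into the current directory
    # (cache URL and package name are illustrative placeholders)
    cd /opt/vdt
    pacman -get http://vdt.cs.wisc.edu/vdt_cache:VDT-Client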
18 Virtual Data Toolkit: Tools in VDT 1.1.12
Condor Group: Condor/Condor-G, DAGMan, Fault Tolerant Shell, ClassAds
Globus Alliance: Grid Security Infrastructure (GSI), job submission (GRAM), information service (MDS), data transfer (GridFTP), Replica Location Service (RLS)
EDG & LCG: Make Gridmap, Certificate Revocation List Updater, Glue Schema/Info provider
ISI & UC: Chimera & related tools, Pegasus
NCSA: MyProxy, GSI OpenSSH
LBL: PyGlobus, NetLogger
Caltech: MonALISA
VDT: VDT System Profiler, configuration software
Others: KX509 (U. Mich.)
19 VDT Growth
VDT 1.0: Globus 2.0b, Condor 6.3.1
VDT 1.1.3, 1.1.4 & 1.1.5: pre-SC 2002
VDT 1.1.7: switch to Globus 2.2
VDT 1.1.8: first real use by LCG
VDT 1.1.11: Grid2003
20 Grid2003: An Operational Grid
28 sites (2100-2800 CPUs) & growing, including a site in Korea
400-1300 concurrent jobs
7 substantial applications + CS experiments
Running since October 2003
http://www.ivdgl.org/grid2003
21 Grid2003 Components
Computers & storage at 28 sites (to date): 2800+ CPUs
Uniform service environment at each site: the Globus Toolkit provides basic authentication, execution management, and data movement (see the sketch below); the Pacman installation system enables installation of numerous other VDT and application services
Global & virtual organization services: certification & registration authorities, VO membership services, monitoring services
Client-side tools for data access & analysis: virtual data, execution planning, DAG management, execution management, monitoring
IGOC: iVDGL Grid Operations Center
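To give a feel for the uniform service environment, a Grid2003 user's client-side session with the Globus Toolkit services named above might look like the following; this is an illustration, not from the original slides, and the host names and paths are placeholders.

    # authenticate: create a short-lived GSI proxy from the user's certificate
    grid-proxy-init
    # execution management: run a trivial job through a site's GRAM gatekeeper
    globus-job-run gatekeeper.example.edu/jobmanager-fork /bin/hostname
    # data movement: push an input file to a site's GridFTP server
    globus-url-copy file:///tmp/events.ntpl gsiftp://se.example.edu/data/events.ntpl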
22 Grid2003 Metrics
Metric                                    Target     Achieved
Number of CPUs                            400        2762 (28 sites)
Number of users                           > 10       102 (16)
Number of applications                    > 4        10 (+CS)
Number of sites running concurrent apps   > 10       17
Peak number of concurrent jobs            1000       1100
Data transfer per day                     > 2-3 TB   4.4 TB max
23 Grid2003 Applications To Date
CMS proton-proton collision simulation
ATLAS proton-proton collision simulation
LIGO gravitational wave search
SDSS galaxy cluster detection
ATLAS interactive analysis
BTeV proton-antiproton collision simulation
SnB biomolecular analysis
GADU/Gnare genome analysis
Various computer science experiments
www.ivdgl.org/grid2003/applications
24 Grid2003 Usage
25 Grid2003 Scientific Impact: E.g., U.S. CMS 2003 Production
10M events produced: largest contribution ever
Almost double the number of events in the first 25 days vs. 2002, with half the manpower
Production run with 1 person working 50% time
400 jobs at once vs. 200 the previous year
Multi-VO sharing
Continuing at an accelerating rate into 2004
Many issues remain: e.g., scaling, missing functionality
26 Grid2003 as CS Research Lab: E.g., Adaptive Scheduling
Adaptive data placement in a realistic environment (K. Ranganathan)
Enables comparisons with simulations
27 Grid2003 Lessons Learned
How to operate a Grid: add sites, recover from errors, provide information, update software, test applications, ...; tools, services, procedures, docs, organization; need reliable, intelligent, skilled people
How to scale algorithms, software, process: "interesting" failure modes as scale increases; increasing scale must not overwhelm human resources
How to delegate responsibilities: at the project, virtual organization, service, site, and application level; distribution of responsibilities for future growth
How to apply distributed cyberinfrastructure
28 Summary: We Are Building Cyberinfrastructure …
GriPhyN/iVDGL (+ DOE PPDG & LHC, etc.) are:
Creating an (inter)national-scale, multi-disciplinary infrastructure for distributed data-intensive science;
Demonstrating the utility of such infrastructure via a broad set of applications (not just physics!);
Learning many things about how such infrastructures should be created, operated, and evolved; and
Capturing best practices in software & procedures, including VDT, Pacman, monitoring tools, etc.
Unique scale & application breadth: Grid3: 10 apps (science & CS), 28 sites, 2800 CPUs, 1300 jobs, and growing rapidly
CS-applications-operations partnership, with a strong impact on all three
29 … And Are Open for Business
Virtual Data Toolkit: distributed workflow, data management & analysis; data replication, data provenance, etc.; virtual organization management; the Globus Toolkit, Condor, and other good stuff
Grid2003:
Adapt your applications to use VDT mechanisms and obtain order-of-magnitude increases in performance
Add your site to Grid2003 & join a national-scale cyberinfrastructure
Propose computer science experiments in a unique environment
Write an NMI proposal to fund this work
30 For More Information
GriPhyN: www.griphyn.org
iVDGL: www.ivdgl.org
PPDG: www.ppdg.net
Grid2003: www.ivdgl.org/grid2003
Virtual Data Toolkit: www.griphyn.org/vdt, www.griphyn.org/chimera
The Grid (2nd Edition): www.mkp.com/grid2