Slide 1: Integrating Universities and Laboratories in National Cyberinfrastructure
Paul Avery, University of Florida (avery@phys.ufl.edu)
PASI Lecture, Mendoza, Argentina, May 17, 2005
Slide 2: Outline of Talk
- Cyberinfrastructure and Grids
- Data intensive disciplines and Data Grids
- The Trillium Grid collaboration: GriPhyN, iVDGL, PPDG
- The LHC and its computing challenges
- Grid3 and the Open Science Grid
- A bit on networks
- Education and outreach
- Challenges for the future
- Summary
Presented from a physicist's perspective!
Slide 3: Cyberinfrastructure (cont.)
- Software programs, services, instruments, data, information, and knowledge applicable to specific projects, disciplines, and communities
- A cyberinfrastructure layer of enabling hardware, algorithms, software, communications, institutions, and personnel
- A platform that empowers researchers to innovate and eventually revolutionize what they do, how they do it, and who participates
- Base technologies: computation, storage, and communication components that continue to advance in raw capacity at exponential rates
  [Paraphrased from the NSF Blue Ribbon Panel report, 2003]
- Challenge: creating and operating advanced cyberinfrastructure and integrating it into science and engineering applications
Slide 4: Cyberinfrastructure and Grids
- Grid: geographically distributed computing resources configured for coordinated use
- Fabric: physical resources & networks provide raw capability
- Ownership: resources controlled by owners and shared with others
- Middleware: software ties it all together: tools, services, etc.
- Enhancing collaboration via transparent resource sharing
Example: the US-CMS "Virtual Organization"
Slide 5: Data Grids & Collaborative Research
- Team-based 21st century scientific discovery
  - Strongly dependent on advanced information technology
  - People and resources distributed internationally
- Dominant factor: data growth (1 Petabyte = 1000 TB)
  - 2000: ~0.5 Petabyte
  - 2005: ~10 Petabytes
  - 2010: ~100 Petabytes
  - 2015-17: ~1000 Petabytes?
  - How to collect, manage, access, and interpret this quantity of data?
- Drives the need for powerful linked resources: "Data Grids"
  - Computation: massive, distributed CPU
  - Data storage and access: distributed high-speed disk and tape
  - Data movement: international optical networks
- Collaborative research and Data Grids
  - Data discovery, resource sharing, distributed analysis, etc.
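The volumes projected above imply roughly exponential growth. A minimal sketch, using only the 2000 and 2010 figures quoted on the slide, computes the implied annual growth factor and doubling time:

import math

# Projected data volumes from the slide (petabytes).
volumes = {2000: 0.5, 2010: 100.0}

# Implied annual growth factor over the decade.
factor = (volumes[2010] / volumes[2000]) ** (1 / (2010 - 2000))
doubling_time = math.log(2) / math.log(factor)

print(f"annual growth factor ~ {factor:.2f}x")       # ~1.70x per year
print(f"doubling time ~ {doubling_time:.1f} years")  # ~1.3 years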
Slide 6: Examples of Data Intensive Disciplines
- High energy & nuclear physics
  - Belle, BaBar, Tevatron, RHIC, JLab
  - Large Hadron Collider (LHC): the primary driver
- Astronomy
  - Digital sky surveys, "virtual" observatories
  - VLBI arrays: multiple-Gb/s data streams
- Gravity wave searches
  - LIGO, GEO, VIRGO, TAMA, ACIGA, ...
- Earth and climate systems
  - Earth observation, climate modeling, oceanography, ...
- Biology, medicine, imaging
  - Genome databases
  - Proteomics (protein structure & interactions, drug delivery, ...)
  - High-resolution brain scans (1-10 μm, time dependent)
Slide 7: Our Vision & Goals
- Develop the technologies & tools needed to exploit a Grid-based cyberinfrastructure
- Apply and evaluate those technologies & tools in challenging scientific problems
- Develop the technologies & procedures to support a permanent Grid-based cyberinfrastructure
- Create and operate a persistent Grid-based cyberinfrastructure in support of discipline-specific research goals ("end-to-end")
GriPhyN + iVDGL + DOE Particle Physics Data Grid (PPDG) = Trillium
Slide 8: Our Science Drivers
- Experiments at the Large Hadron Collider: new fundamental particles and forces; 100s of Petabytes; 2007 - ?
- High energy & nuclear physics experiments: top quark, nuclear matter at extreme density; ~1 Petabyte (1000 TB); 1997 - present
- LIGO (gravity wave search): search for gravitational waves; 100s of Terabytes; 2002 - present
- Sloan Digital Sky Survey: systematic survey of astronomical objects; 10s of Terabytes; 2001 - present
(Both data volume and community size grow over 2001-2009.)
Slide 9: Grid Middleware: Virtual Data Toolkit
[Diagram: sources (CVS) from many contributors are patched into GPT source bundles, built and tested on the NMI Build & Test Condor pool (22+ operating systems), then packaged as RPMs and binaries into a Pacman cache as the VDT.]
A unique laboratory for testing, supporting, deploying, packaging, upgrading, & troubleshooting complex sets of software!
Slide 10: VDT Growth Over 3 Years
[Chart: number of VDT components over time. Milestones: VDT 1.0 (Globus 2.0b, Condor 6.3.1); VDT 1.1.7 (switch to Globus 2.2); VDT 1.1.8 (first real use by LCG); VDT 1.1.11 (Grid3).]
www.griphyn.org/vdt/
Slide 11: Components of VDT 1.3.5
Globus 3.2.1, Condor 6.7.6, RLS 3.0, ClassAds 0.9.7, Replica 2.2.4, DOE/EDG CA certs, ftsh 2.0.5, EDG mkgridmap, EDG CRL Update, GLUE Schema 1.0, VDS 1.3.5b, Java, Netlogger 3.2.4, Gatekeeper-Authz, MyProxy 1.11, KX509, System Profiler, GSI OpenSSH 3.4, MonALISA 1.2.32, PyGlobus 1.0.6, MySQL, UberFTP 1.11, DRM 1.2.6a, VOMS 1.4.0, VOMS Admin 0.7.5, Tomcat, PRIMA 0.2, Certificate Scripts, Apache, jClarens 0.5.3, new GridFTP server, GUMS 1.0.1
Slide 12: Collaborative Relationships: A CS + VDT Perspective
[Diagram: computer science research (techniques & software, requirements, prototyping & experiments) feeds the Virtual Data Toolkit, which is deployed to partner science, networking, and outreach projects (Globus, Condor, NMI, iVDGL, PPDG, EU DataGrid, LHC experiments, QuarkNet, CHEPREO, Digital Divide) and on to the larger science community. Other linkages: work force, CS researchers, industry, U.S. Grids, international outreach, production deployment, and technology transfer.]
Slide 13: U.S. "Trillium" Grid Partnership
- Trillium = PPDG + GriPhyN + iVDGL
  - Particle Physics Data Grid: $12M (DOE) (1999-2006)
  - GriPhyN: $12M (NSF) (2000-2005)
  - iVDGL: $14M (NSF) (2001-2006)
- Basic composition (~150 people)
  - PPDG: 4 universities, 6 labs
  - GriPhyN: 12 universities, SDSC, 3 labs
  - iVDGL: 18 universities, SDSC, 4 labs, foreign partners
  - Experiments: BaBar, D0, STAR, JLab, CMS, ATLAS, LIGO, SDSS/NVO
- Coordinated internally to meet broad goals
  - GriPhyN: CS research, Virtual Data Toolkit (VDT) development
  - iVDGL: Grid laboratory deployment using VDT, applications
  - PPDG: "end to end" Grid services, monitoring, analysis
  - Common use of VDT for underlying Grid middleware
  - Unified entity when collaborating internationally
Slide 14: Goal: Peta-scale Data Grids for Global Science
[Architecture diagram: users (production teams, workgroups, single researchers) work through interactive user tools, virtual data tools, request planning & scheduling tools, and request execution & management tools; these rest on resource management, security and policy, and other Grid services, which run over distributed resources (code, storage, CPUs, networks) and raw data sources. Scale: PetaOps, Petabytes, high performance.]
Slide 15: Sloan Digital Sky Survey (SDSS): Using Virtual Data in GriPhyN
[Figure: Sloan data and the derived galaxy cluster size distribution.]
Slide 16: The LIGO Scientific Collaboration (LSC) and the LIGO Grid
- LIGO Grid: 6 US sites plus 3 EU sites (Birmingham and Cardiff in the UK, AEI/Golm in Germany)
  - LHO, LLO: observatory sites
  - LSC: LIGO Scientific Collaboration; iVDGL supported
- iVDGL has enabled the LSC to establish a persistent production grid
Slide 17: Large Hadron Collider & its Frontier Computing Challenges
Slide 18: Large Hadron Collider (LHC) @ CERN
- 27 km tunnel in Switzerland & France; experiments: ATLAS, CMS, ALICE, LHCb, TOTEM
- Search for the origin of mass, new fundamental forces, supersymmetry, other new particles
- 2007 - ?
Slide 19: CMS: "Compact" Muon Solenoid
[Image of the CMS detector, with people shown for scale ("inconsequential humans").]
Slide 20: LHC Data Rates: Detector to Storage
- Detector output (physics filtering): 40 MHz, ~TBytes/sec
- Level 1 Trigger (special hardware): 75 kHz, 75 GB/sec
- Level 2 Trigger (commodity CPUs): 5 kHz, 5 GB/sec
- Level 3 Trigger (commodity CPUs): 100 Hz, 0.15-1.5 GB/sec raw data to storage (+ simulated data)
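A quick reading of these numbers is the implied event size at each stage. The sketch below is added here for clarity and uses only the rates and bandwidths quoted on the slide:

# Implied event size at each trigger stage, from the quoted rates and bandwidths.
stages = [
    ("Level 1 output", 75_000, 75.0),      # Hz, GB/s
    ("Level 2 output", 5_000, 5.0),
    ("Level 3 / storage (low)", 100, 0.15),
    ("Level 3 / storage (high)", 100, 1.5),
]

for name, rate_hz, gb_per_s in stages:
    mb_per_event = gb_per_s * 1000.0 / rate_hz
    print(f"{name}: {gb_per_s} GB/s at {rate_hz} Hz ~ {mb_per_event:.1f} MB per event")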
Slide 21: Complexity: Higgs Decay to 4 Muons
[Event displays: all charged tracks with pT > 2 GeV vs. reconstructed tracks with pT > 25 GeV, with 30 superimposed minimum-bias events.]
- 10^9 collisions/sec; selectivity: 1 in 10^13
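To make the selectivity concrete, a one-line calculation from the two numbers on the slide (a sketch added here, not part of the original deck):

collision_rate = 1e9   # collisions per second
selectivity = 1e-13    # 1 interesting event per 10^13 collisions

per_second = collision_rate * selectivity
per_day = per_second * 86_400
print(f"{per_second:.0e} signal events/s -> about {per_day:.0f} per day")
# ~1e-4 per second, i.e. roughly 9 selected events per day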
Slide 22: LHC: Petascale Global Science
- Complexity: millions of individual detector channels
- Scale: PetaOps (CPU), 100s of Petabytes (data)
- Distribution: global distribution of people & resources
  - BaBar/D0 example (2004): 700+ physicists, 100+ institutes, 35+ countries
  - CMS example (2007): 5000+ physicists, 250+ institutes, 60+ countries
Slide 23: CMS Experiment: LHC Global Data Grid (2007+)
[Tier diagram: the online system feeds the CERN computer center (Tier 0) at 150-1500 MB/s; Tier 0 links to Tier 1 national centers (USA, Korea, Russia, UK) at >10 Gb/s and 10-40 Gb/s; Tier 1 links to Tier 2 university centers (Caltech, UCSD, U Florida, Iowa, Maryland) at 2.5-10 Gb/s; Tier 3 physics caches and Tier 4 PCs (e.g. FIU) sit below.]
- 5000 physicists, 60 countries
- 10s of Petabytes/yr by 2008; 1000 Petabytes in < 10 years?
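As a rough cross-check of the scale quoted above (a sketch; it assumes data flows more or less continuously over a year, which the slide does not state):

# 10s of petabytes per year expressed as a sustained rate.
petabytes_per_year = 10
seconds_per_year = 365 * 24 * 3600

mb_per_s = petabytes_per_year * 1e9 / seconds_per_year  # 1 PB = 1e9 MB
print(f"{petabytes_per_year} PB/yr ~ {mb_per_s:.0f} MB/s sustained")
# ~317 MB/s, comfortably inside the 150-1500 MB/s Tier 0 ingest range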
Slide 24: University Tier2 Centers
- Tier2 facility
  - Essential university role in the extended computing infrastructure
  - 20-25% of a Tier1 national laboratory, supported by NSF
  - Validated by 3 years of experience (CMS, ATLAS, LIGO)
- Functions
  - Perform physics analysis and simulations
  - Support experiment software
  - Support smaller institutions
- Official role in the Grid hierarchy (U.S.)
  - Sanctioned by MOU with the parent organization (ATLAS, CMS, LIGO)
  - Selection by the collaboration via a careful process
  - Local P.I. with reporting responsibilities
Slide 25: Grids and Globally Distributed Teams
- Non-hierarchical: chaotic analyses + productions
- Superimpose significant random data flows
Slide 26: Grid3 and Open Science Grid
Slide 27: Grid3: A National Grid Infrastructure
- 32 sites, 4000 CPUs: universities + 4 national labs
- Part of the LHC Grid; running since October 2003
- Sites in the US, Korea, Brazil, Taiwan
- Applications in HEP, LIGO, SDSS, genomics, fMRI, CS
http://www.ivdgl.org/grid3
Slide 28: Grid3 World Map
[Map of Grid3 sites worldwide.]
Slide 29: Grid3 Components
- Computers & storage at ~30 sites: 4000 CPUs
- Uniform service environment at each site
  - Globus Toolkit: provides basic authentication, execution management, data movement
  - Pacman: installs numerous other VDT and application services
- Global & virtual organization services
  - Certification & registration authorities, VO membership services, monitoring services
- Client-side tools for data access & analysis
  - Virtual data, execution planning, DAG management, execution management, monitoring
- IGOC: iVDGL Grid Operations Center
- Grid testbed: Grid3dev
  - Middleware development and testing, new VDT versions, etc.
Slide 30: Grid3 Applications
- CMS experiment: p-p collision simulations & analysis
- ATLAS experiment: p-p collision simulations & analysis
- BTeV experiment: p-p collision simulations & analysis
- LIGO: search for gravitational wave sources
- SDSS: galaxy cluster finding
- Bio-molecular analysis: Shake-and-Bake (SnB) (Buffalo)
- Genome analysis: GADU/Gnare
- fMRI: functional MRI (Dartmouth)
- CS demonstrators: Job Exerciser, GridFTP, NetLogger
www.ivdgl.org/grid3/applications
Slide 31: Grid3 Shared Use Over 6 Months
[Chart: CPU usage over six months, showing the CMS DC04 and ATLAS DC2 data challenges (Sep 10 marked).]
Slide 32: Grid3 Production Over 13 Months
[Chart: production usage over 13 months.]
Slide 33: U.S. CMS 2003 Production
- 10M p-p collisions: the largest sample ever
- 2x the simulation sample with 1/2 the manpower
- Multi-VO sharing
Slide 34: Grid3 as CS Research Lab: e.g., Adaptive Scheduling
- Adaptive data placement in a realistic environment (K. Ranganathan)
- Enables comparisons with simulations
Slide 35: Grid3 Lessons Learned
- How to operate a Grid as a facility
  - Tools, services, error recovery, procedures, docs, organization
  - Delegation of responsibilities (project, VO, service, site, ...)
  - Crucial role of the Grid Operations Center (GOC)
- How to support people-to-people relations
  - Face-to-face meetings, phone conferences, 1-on-1 interactions, mail lists, etc.
- How to test and validate Grid tools and applications
  - Vital role of testbeds
- How to scale algorithms, software, processes
  - Some successes, but "interesting" failure modes still occur
- How to apply distributed cyberinfrastructure
  - Successful production runs for several applications
Slide 36: Grid3 → Open Science Grid
- Iteratively build & extend Grid3: Grid3 → OSG-0 → OSG-1 → OSG-2 → ...
  - Shared resources, benefiting a broad set of disciplines
  - Grid middleware based on the Virtual Data Toolkit (VDT)
  - Emphasis on "end to end" services for applications
- OSG collaboration
  - Computer and application scientists
  - Facility, technology, and resource providers (labs, universities)
- Further develop OSG
  - Partnerships and contributions from other sciences, universities
  - Incorporation of advanced networking
  - Focus on general services, operations, end-to-end performance
- Aim for Summer 2005 deployment
Slide 37: http://www.opensciencegrid.org
Slide 38: OSG Organization
[Organization chart: the OSG Council (all members above a certain threshold; chair, officers) and an Advisory Committee oversee an Executive Board (8-15 representatives; chair, officers) and a small core OSG staff (a few FTEs, manager). Technical Groups and Activities engage the stakeholders: enterprise and research Grid projects, VOs, researchers, sites, service providers, universities, and labs.]
Slide 39: OSG Technical Groups & Activities
- Technical Groups address and coordinate technical areas
  - Propose and carry out activities related to their given areas
  - Liaise & collaborate with other peer projects (U.S. & international)
  - Participate in relevant standards organizations
  - Chairs participate in the Blueprint, Integration, and Deployment activities
- Activities are well-defined, scoped tasks contributing to OSG
  - Each Activity has deliverables and a plan
  - ... is self-organized and operated
  - ... is overseen & sponsored by one or more Technical Groups
- TGs and Activities are where the real work gets done
Slide 40: OSG Technical Groups
- Governance: charter, organization, by-laws, agreements, formal processes
- Policy: VO & site policy, authorization, priorities, privilege & access rights
- Security: common security principles, security infrastructure
- Monitoring and Information Services: resource monitoring, information services, auditing, troubleshooting
- Storage: storage services at remote sites, interfaces, interoperability
- Support Centers: infrastructure and services for user support, helpdesk, trouble tickets
- Education / Outreach: training, interface with various E/O projects
- Networks (new): including interfacing with various networking projects
Slide 41: OSG Activities
- Blueprint: defining principles and best practices for OSG
- Deployment: deployment of resources & services
- Provisioning: connected to deployment
- Incident response: plans and procedures for responding to security incidents
- Integration: testing, validating & integrating new services and technologies
- Data Resource Management (DRM): deployment of specific Storage Resource Management technology
- Documentation: organizing the documentation infrastructure
- Accounting: accounting and auditing use of OSG resources
- Interoperability: primarily interoperability between ...
- Operations: operating Grid-wide services
Slide 42: Connections to European Projects: LCG and EGEE
Slide 43: The Path to the OSG Operating Grid
[Process diagram: the OSG Integration Activity (release description, middleware interoperability, software & packaging, functionality & scalability tests, application validation, VO application software installation) produces a release candidate and a readiness plan; once the readiness plan is adopted, the OSG Deployment and Operations-Provisioning Activities carry out service deployment with metrics & certification, effort and resources, and feedback into integration.]
Slide 44: OSG Integration Testbed
[Map of integration testbed sites, including Brazil.]
Slide 45: Status of OSG Deployment
- OSG infrastructure release accepted for deployment
  - US CMS MOP "flood testing" successful
  - D0 simulation & reprocessing jobs running on selected OSG sites
  - Others in various stages of readying applications & infrastructure (ATLAS, CMS, STAR, CDF, BaBar, fMRI)
- Deployment process underway: end of July?
  - Open OSG and transition resources from Grid3
  - Applications will use growing ITB & OSG resources during the transition
http://osg.ivdgl.org/twiki/bin/view/Integration/WebHome
Slide 46: Interoperability & Federation
- Transparent use of federated Grid infrastructures is a goal
  - Some sites appear as part of "LCG" as well as part of OSG/Grid3
  - D0 is bringing reprocessing to LCG sites through an adaptor node
  - CMS and ATLAS can run their jobs on both LCG and OSG
- Increasing interaction with TeraGrid
  - CMS and ATLAS sample simulation jobs are running on TeraGrid
  - Plans for a TeraGrid allocation for jobs running in the Grid3 model: group accounts, binary distributions, external data management, etc.
Slide 47: Networks
Slide 48: Evolving Science Requirements for Networks (DOE High Performance Network Workshop)

Science area                  | Today (end-to-end)          | 5 years (end-to-end)             | 5-10 years (end-to-end)              | Remarks
High energy physics           | 0.5 Gb/s                    | 100 Gb/s                         | 1000 Gb/s                            | High bulk throughput
Climate (data & computation)  | 0.5 Gb/s                    | 160-200 Gb/s                     | N x 1000 Gb/s                        | High bulk throughput
SNS NanoScience               | Not yet started             | 1 Gb/s                           | 1000 Gb/s + QoS for control channel  | Remote control and time-critical throughput
Fusion energy                 | 0.066 Gb/s (500 MB/s burst) | 0.2 Gb/s (500 MB / 20 sec burst) | N x 1000 Gb/s                        | Time-critical throughput
Astrophysics                  | 0.013 Gb/s (1 TB/week)      | N*N multicast                    | 1000 Gb/s                            | Computational steering and collaborations
Genomics (data & computation) | 0.091 Gb/s (1 TB/day)       | 100s of users                    | 1000 Gb/s + QoS for control channel  | High throughput and steering

See http://www.doecollaboratory.org/meetings/hpnpw/
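The "today" figures quoted in parentheses are easy to verify. A short sketch, added here for clarity and not part of the original table, converts the quoted data volumes into sustained rates:

# Convert the "today" volumes quoted in the table into sustained rates.
def gb_per_s(bytes_total, seconds):
    """Gigabits per second for a data volume moved over a time window."""
    return bytes_total * 8 / seconds / 1e9

tb = 1e12  # bytes
print(f"1 TB/week ~ {gb_per_s(tb, 7 * 86400):.3f} Gb/s")  # ~0.013 Gb/s (astrophysics)
print(f"1 TB/day  ~ {gb_per_s(tb, 86400):.3f} Gb/s")      # ~0.093 Gb/s (genomics, quoted as 0.091)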
Slide 49: UltraLight: Advanced Networking in Applications
- 10 Gb/s+ network
- Partners: Caltech, UF, FIU, UM, MIT, SLAC, FNAL; international partners; Level(3), Cisco, NLR
- Funded by ITR 2004
Slide 50: UltraLight: New Information System
- A new class of integrated information systems
  - Includes networking as a managed resource for the first time
  - Uses "hybrid" packet-switched and circuit-switched optical network infrastructure
  - Monitor, manage & optimize network and Grid systems in real time
- Flagship applications: HEP, eVLBI, "burst" imaging
  - "Terabyte-scale" data transactions in minutes
  - Extend real-time eVLBI to the 10-100 Gb/s range
- Powerful testbed
  - Significant storage, optical networks for testing new Grid services
- Strong vendor partnerships: Cisco, Calient, NLR, CENIC, Internet2/Abilene
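For scale, moving a terabyte in minutes maps directly onto the 10-100 Gb/s range mentioned above. A quick sketch, with the time windows chosen here purely as illustrations:

# Bandwidth needed to move 1 TB in a given time window.
terabyte_bits = 1e12 * 8

for minutes in (60, 10, 2):
    gbps = terabyte_bits / (minutes * 60) / 1e9
    print(f"1 TB in {minutes:>2} min requires ~ {gbps:5.1f} Gb/s")
# ~2.2 Gb/s in an hour, ~13 Gb/s in 10 minutes, ~67 Gb/s in 2 minutes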
Slide 51: Education and Outreach
Slide 52: iVDGL, GriPhyN Education/Outreach
- Basics: $200K/yr, led by UT Brownsville
- Workshops, portals, tutorials
- New partnerships with QuarkNet, CHEPREO, LIGO E/O, ...
Slide 53: June 2004 Grid Summer School
- First of its kind in the U.S. (South Padre Island, Texas)
  - 36 students, diverse origins and types (M, F, MSIs, etc.)
- Marks a new direction for U.S. Grid efforts
  - First attempt to systematically train people in Grid technologies
  - First attempt to gather relevant materials in one place
  - Today: students in CS and physics
  - Next: students, postdocs, junior & senior scientists
- Reaching a wider audience
  - Put lectures, exercises, and video on the web
  - More tutorials, perhaps 2-3/year
  - Dedicated resources for remote tutorials
  - Create a "Grid Cookbook", e.g. Georgia Tech
- Second workshop: July 11-15, 2005, South Padre Island again
Slide 54: QuarkNet/GriPhyN e-Lab Project
http://quarknet.uchicago.edu/elab/cosmic/home.jsp
Slide 55: Student Muon Lifetime Analysis in GriPhyN/QuarkNet
Slide 56: CHEPREO: Center for High Energy Physics Research and Educational Outreach (Florida International University)
- Physics Learning Center
- CMS research
- iVDGL Grid activities
- AMPATH network (S. America)
- Funded September 2003; $4M initially (3 years); MPS, CISE, EHR, INT
Slide 57: Grids and the Digital Divide (Rio de Janeiro, Feb. 16-20, 2004)
- Background
  - World Summit on the Information Society
  - HEP Standing Committee on Inter-regional Connectivity (SCIC)
- Themes
  - Global collaborations, Grids, and addressing the Digital Divide
  - Focus on poorly connected regions
- Next meeting: Daegu, Korea, May 23-27, 2005
Slide 58: Partnerships Drive Success
- Integrating Grids in scientific research
  - "Lab-centric": activities center around a large facility
  - "Team-centric": resources shared by distributed teams
  - "Knowledge-centric": knowledge generated/used by a community
- Strengthening the role of universities in frontier research
  - Couples universities to frontier data intensive research
  - Brings front-line research and resources to students
  - Exploits intellectual resources at minority or remote institutions
- Driving advances in IT/science/engineering
  - Partnerships among domain sciences, computer science, universities, laboratories, scientists, students, NSF and DOE projects, research communities, and the IT industry
Slide 59: Fulfilling the Promise of Next Generation Science
- Supporting permanent, national-scale Grid infrastructure
  - Large CPU, storage, and network capability crucial for science
  - Support personnel, equipment maintenance, replacement, upgrade
  - Tier1 and Tier2 resources a vital part of the infrastructure
  - Open Science Grid a unique national infrastructure for science
- Supporting the maintenance, testing, and dissemination of advanced middleware
  - Long-term support of the Virtual Data Toolkit
  - Vital for reaching new disciplines & for supporting large international collaborations
- Continuing support for HEP as a frontier challenge driver
  - Huge challenges posed by LHC global interactive analysis
  - New challenges posed by remote operation of the Global Accelerator Network
Slide 60: Fulfilling the Promise (2)
- Creating even more advanced cyberinfrastructure
  - Integrating databases in large-scale Grid environments
  - Interactive analysis with distributed teams
  - Partnerships involving CS research with application drivers
- Supporting the emerging role of advanced networks
  - Reliable, high-performance LANs and WANs necessary for advanced Grid applications
- Partnering to enable stronger, more diverse programs
  - Programs supported by multiple Directorates, a la CHEPREO
  - NSF-DOE joint initiatives
  - Strengthen the ability of universities and labs to work together
- Providing opportunities for cyberinfrastructure training, education & outreach
  - Grid tutorials, Grid Cookbook
  - Collaborative tools for student-led projects & research
Slide 61: Summary
- Grids enable 21st century collaborative science
  - Linking research communities and resources for scientific discovery
  - Needed by global collaborations pursuing "petascale" science
- Grid3 was an important first step in developing US Grids
  - Value of planning, coordination, testbeds, rapid feedback
  - Value of learning how to operate a Grid as a facility
  - Value of building & sustaining community relationships
- Grids drive the need for advanced optical networks
- Grids impact education and outreach
  - Providing technologies & resources for training, education, outreach
  - Addressing the Digital Divide
- OSG: a scalable computing infrastructure for science?
  - Strategies needed to cope with increasingly large scale
Slide 62: Grid Project References
- Open Science Grid: www.opensciencegrid.org
- Grid3: www.ivdgl.org/grid3
- Virtual Data Toolkit: www.griphyn.org/vdt
- GriPhyN: www.griphyn.org
- iVDGL: www.ivdgl.org
- PPDG: www.ppdg.net
- CHEPREO: www.chepreo.org
- UltraLight: ultralight.cacr.caltech.edu
- Globus: www.globus.org
- Condor: www.cs.wisc.edu/condor
- LCG: www.cern.ch/lcg
- EU DataGrid: www.eu-datagrid.org
- EGEE: www.eu-egee.org
Slide 63: Extra Slides
Slide 64: GriPhyN Goals
- Conduct CS research to achieve the vision
  - Virtual Data as a unifying principle
  - Planning, execution, performance monitoring
- Disseminate through the Virtual Data Toolkit
  - A "concrete" deliverable
- Integrate into GriPhyN science experiments
  - Common Grid tools, services
- Educate, involve, and train students in IT research
  - Undergrads, grads, postdocs, underrepresented groups
Slide 65: iVDGL Goals
- Deploy a Grid laboratory
  - Support the research mission of data intensive experiments
  - Provide computing and personnel resources at university sites
  - Provide a platform for computer science technology development
  - Prototype and deploy a Grid Operations Center (iGOC)
- Integrate Grid software tools
  - Into the computing infrastructures of the experiments
- Support delivery of Grid technologies
  - Hardening of the Virtual Data Toolkit (VDT) and other middleware technologies developed by GriPhyN and other Grid projects
- Education and Outreach
  - Lead and collaborate with Education and Outreach efforts
  - Provide tools and mechanisms for underrepresented groups and remote regions to participate in international science projects
Slide 66: CMS: Grid Enabled Analysis Architecture
[Architecture diagram: analysis clients (ROOT, Python, Cojac detector viz / IGUANA CMS viz) connect over HTTP, SOAP, or XML-RPC to a Grid Services Web Server (Clarens) providing discovery, ACL management, and certificate-based access; behind it sit the scheduler (Sphinx), catalogs and metadata (RefDB, POOL, MOPDB), virtual data (Chimera), replica and data management, fully-abstract, partially-abstract, and fully-concrete planners, monitoring (MonALISA, BOSS), an execution priority manager, and a Grid-wide execution service running applications (MCRunjob, ORCA, FAMOS) via the VDT server.]
- Clients talk standard protocols to the "Grid Services Web Server"
- A simple Web service API allows simple or complex analysis clients
- Typical clients: ROOT, web browser, ...
- The Clarens portal hides complexity
- Key features: global scheduler, catalogs, monitoring, Grid-wide execution service
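The point about clients talking standard protocols to a web-service API can be illustrated with a minimal XML-RPC client. This is a hedged sketch only: the server URL and the method name are hypothetical placeholders, not part of Clarens or the original slide.

# Minimal sketch of an analysis client calling a Grid web service over XML-RPC.
# The endpoint URL and the method name are hypothetical placeholders; a real
# Clarens deployment defines its own service names and authentication.
import xmlrpc.client

server = xmlrpc.client.ServerProxy("https://grid-portal.example.edu/xmlrpc")

# Ask the (hypothetical) catalog service for datasets matching a pattern,
# then hand the result to whatever local analysis tool the client wraps.
datasets = server.catalog.list_datasets("higgs-4mu-*")
for name in datasets:
    print("found dataset:", name)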
Slide 67: "Virtual Data": Derivation & Provenance
- Most scientific data are not simple "measurements"
  - They are computationally corrected/reconstructed
  - They can be produced by numerical simulation
- Science & engineering projects are more CPU and data intensive
  - Programs are significant community resources (transformations)
  - So are the executions of those programs (derivations)
  - Management of dataset dependencies is critical!
- Derivation: instantiation of a potential data product
- Provenance: complete history of any existing data product
- Previously: manual methods; GriPhyN: automated, robust tools
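A minimal sketch of the derivation/provenance idea described above: record each transformation and the inputs it consumed, so any data product's complete history can be traced. The class and field names here are illustrative only, not GriPhyN's actual Chimera/VDS schema.

# Illustrative-only provenance record: a "derivation" names the transformation
# (program) applied and the inputs consumed, so the history of any product can
# be traced back to raw data. Not the actual Chimera/VDS data model.
from dataclasses import dataclass, field

@dataclass
class Derivation:
    transformation: str              # program (community resource) that was run
    inputs: list                     # names of input data products
    output: str                      # name of the derived data product
    parameters: dict = field(default_factory=dict)

def provenance(product, derivations):
    """Walk back through derivations to list the full history of a product."""
    history = []
    for d in reversed(derivations):
        if d.output == product:
            history.append(d)
            # Recurse into each input's own history.
            for parent in d.inputs:
                history.extend(provenance(parent, derivations))
    return history

# Tiny example: raw hits -> calibrated hits -> reconstructed tracks.
chain = [
    Derivation("calibrate", ["raw_hits"], "calib_hits", {"version": "v2"}),
    Derivation("reconstruct", ["calib_hits"], "tracks", {"pt_min_gev": 2}),
]
for step in provenance("tracks", chain):
    print(step.transformation, step.inputs, "->", step.output)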
Slide 68: Virtual Data Example: HEP Analysis
[Derivation-tree figure: from a mass = 160 sample, derived branches select decay = WW, decay = ZZ, or decay = bb; the WW branch is further refined (WW → leptons, WW → e..., with cuts such as Pt > 20 and other cuts). A scientist adds a new derived data branch and continues the analysis.]
Slide 69: Packaging of Grid Software: Pacman
- Language: define software environments
- Interpreter: create, install, configure, update, verify environments
- Version 3.0.2 released Jan. 2005
- Combine and manage software from arbitrary sources: LCG/Scram, ATLAS/CMT, CMS DPE/tar/make, LIGO/tar/make, OpenSource/tar/make, Globus/GPT, NPACI/TeraGrid/tar/make, D0/UPS-UPD, Commercial/tar/make
- "1-button install" reduces the burden on administrators; remote experts define installation/config/updating for everyone at once:
  % pacman -get iVDGL:Grid3
[Diagram: pacman pulls from caches such as VDT, ATLAS, NPACI, D-Zero, iVDGL, UCHEP, CMS/DPE, LIGO.]
Slide 70: Virtual Data Motivations
- "I've detected a muon calibration error and want to know which derived data products need to be recomputed."
- "I've found some interesting data, but I need to know exactly what corrections were applied before I can trust it."
- "I want to search a database for 3-muon events. If a program that does this analysis exists, I won't have to write one from scratch."
- "I want to apply a forward jet analysis to 100M events. If the results already exist, I'll save weeks of computation."
The Virtual Data Catalog (VDC) supports describe, discover, reuse, and validate.
Slide 71: Background: Data Grid Projects
- U.S. funded projects: GriPhyN (NSF), iVDGL (NSF), Particle Physics Data Grid (DOE), UltraLight, TeraGrid (NSF), DOE Science Grid (DOE), NEESgrid (NSF), NSF Middleware Initiative (NSF)
- EU, Asia projects: EGEE (EU), LCG (CERN), EU DataGrid, EU national projects, DataTAG (EU), CrossGrid (EU), GridLab (EU), Japanese and Korean projects
- Many projects driven/led by HEP + CS
- Many 10s x $M brought into the field
- Large impact on other sciences and education; driven primarily by HEP applications
Slide 73: Muon Lifetime Analysis Workflow
Slide 74: (Early) Virtual Data Language: CMS "Pipeline"
Pipeline stages: pythia_input → pythia.exe → cmsim_input → cmsim.exe → writeHits → writeDigis

begin v /usr/local/demo/scripts/cmkin_input.csh
  file i ntpl_file_path
  file i template_file
  file i num_events
  stdout cmkin_param_file
end

begin v /usr/local/demo/binaries/kine_make_ntpl_pyt_cms121.exe
  pre cms_env_var
  stdin cmkin_param_file
  stdout cmkin_log
  file o ntpl_file
end

begin v /usr/local/demo/scripts/cmsim_input.csh
  file i ntpl_file
  file i fz_file_path
  file i hbook_file_path
  file i num_trigs
  stdout cmsim_param_file
end

begin v /usr/local/demo/binaries/cms121.exe
  condor copy_to_spool=false
  condor getenv=true
  stdin cmsim_param_file
  stdout cmsim_log
  file o fz_file
  file o hbook_file
end

begin v /usr/local/demo/binaries/writeHits.sh
  condor getenv=true
  pre orca_hits
  file i fz_file
  file i detinput
  file i condor_writeHits_log
  file i oo_fd_boot
  file i datasetname
  stdout writeHits_log
  file o hits_db
end

begin v /usr/local/demo/binaries/writeDigis.sh
  pre orca_digis
  file i hits_db
  file i oo_fd_boot
  file i carf_input_dataset_name
  file i carf_output_dataset_name
  file i carf_input_owner
  file i carf_output_owner
  file i condor_writeDigis_log
  stdout writeDigis_log
  file o digis_db
end
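The pipeline above is a linear chain of transformations, each consuming files produced by the previous step. A minimal sketch (in Python, not the VDL shown above) of how such file dependencies determine execution order; the stage and file names are taken from the pipeline, but the scheduling code itself is illustrative only:

# Illustrative topological ordering of the CMS pipeline stages by their
# file dependencies (inputs must be produced before a stage can run).
stages = {
    "cmkin_input.csh": {"in": set(), "out": {"cmkin_param_file"}},
    "pythia (kine_make_ntpl)": {"in": {"cmkin_param_file"}, "out": {"ntpl_file"}},
    "cmsim_input.csh": {"in": {"ntpl_file"}, "out": {"cmsim_param_file"}},
    "cmsim (cms121.exe)": {"in": {"cmsim_param_file"}, "out": {"fz_file", "hbook_file"}},
    "writeHits.sh": {"in": {"fz_file"}, "out": {"hits_db"}},
    "writeDigis.sh": {"in": {"hits_db"}, "out": {"digis_db"}},
}

available = set()            # files produced so far
remaining = dict(stages)
order = []
while remaining:
    # Run every stage whose inputs are already available.
    ready = [s for s, d in remaining.items() if d["in"] <= available]
    if not ready:
        raise RuntimeError("unsatisfiable dependencies")
    for s in ready:
        order.append(s)
        available |= remaining.pop(s)["out"]

print(" -> ".join(order))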
Slide 75: QuarkNet Portal Architecture
- Simpler interface for non-experts
- Builds on the Chiron portal
Slide 76: Integration of GriPhyN and iVDGL
- Both funded by NSF large ITRs, with overlapping periods
  - GriPhyN: CS research, Virtual Data Toolkit (9/2000-9/2005)
  - iVDGL: Grid laboratory, applications (9/2001-9/2006)
- Basic composition
  - GriPhyN: 12 universities, SDSC, 4 labs (~80 people)
  - iVDGL: 18 institutions, SDSC, 4 labs (~100 people)
  - Experiments: CMS, ATLAS, LIGO, SDSS/NVO
- GriPhyN (Grid research) vs. iVDGL (Grid deployment)
  - GriPhyN: 2/3 "CS" + 1/3 "physics" (0% hardware)
  - iVDGL: 1/3 "CS" + 2/3 "physics" (20% hardware)
- Many common elements
  - Common directors, advisory committee, linked management
  - Common Virtual Data Toolkit (VDT)
  - Common Grid testbeds
  - Common outreach effort
77
Science Review Production Manager Researcher instrument Applications storage element Grid Grid Fabric storage element storage element data Services discovery sharing Virtual Data ProductionAnalysis params exec. data composition Virtual Data planning Planning ProductionAnalysis params exec. data Planning Execution planning Virtual Data Toolkit Chimera virtual data system Pegasus planner DAGman Globus Toolkit Condor Ganglia, etc. GriPhyN Overview Execution
Slide 78: Chiron/QuarkNet Architecture
Slide 79: Cyberinfrastructure
"A new age has dawned in scientific & engineering research, pushed by continuing progress in computing, information, and communication technology, & pulled by the expanding complexity, scope, and scale of today's challenges. The capacity of this technology has crossed thresholds that now make possible a comprehensive 'cyberinfrastructure' on which to build new types of scientific & engineering knowledge environments & organizations and to pursue research in new ways & with increased efficacy." [NSF Blue Ribbon Panel report, 2003]
Slide 80: Fulfilling the Promise of Next Generation Science
Our multidisciplinary partnership of physicists, computer scientists, engineers, networking specialists, and education experts, from universities and laboratories, has achieved tremendous success in creating and maintaining general purpose cyberinfrastructure supporting leading-edge science. But these achievements have occurred in the context of overlapping short-term projects. How can we ensure the survival of valuable existing cyberinfrastructure while continuing to address new challenges posed by frontier scientific and engineering endeavors?
Slide 81: Production Simulations on Grid3
[Chart: US-CMS Monte Carlo simulation usage, split into USCMS and non-USCMS resources; total used = 1.5x the dedicated US-CMS resources.]