US Grid Efforts Lee Lueking D0 Remote Analysis Workshop February 12, 2002
All of these projects are working towards the common goal of providing transparent access to the massively distributed computing infrastructure that is needed to meet the challenges of modern experiments … (From the EU DataTAG proposal)

Grid Projects Timeline
Timeline axis: Q3 00, Q4 00, Q1 01, Q2 01, Q3 01, Q4 01, Q1 02
- GriPhyN: $11.9M + $1.6M
- PPDG: $9.5M
- iVDGL: $13.65M
- EU DataGrid: $9.3M
- EU DataTAG: 4M Euros
- GridPP:

PPDG: Particle Physics Data Grid
- “Vertical Integration” of Grid middleware components into HENP experiments’ ongoing work
- An unparalleled laboratory for “experimental computer science”
iVDGL: International Virtual Data Grid Laboratory
- A place to conduct Data Grid tests “at scale”
- A mechanism to create common Grid infrastructure
- A facility to perform production exercises for LHC experiments

PPDG
- Funded through DOE MICS and HENP programs
- PPDG will develop, acquire and deliver vitally needed Grid-enabled tools for data-intensive requirements of particle and nuclear physics.
- PPDG is a collaboration of computer scientists with a strong record in distributed computing and Grid technology, and physicists with leading roles in the software and network infrastructures for major high-energy and nuclear experiments.
- Our goals and plans are ultimately guided by the immediate, medium-term and longer-term needs and perspectives of the physics experiments.

GriPhyN: Grid Physics Network
- Virtual data technologies. Advances are required in information models and in new methods of cataloging, characterizing, validating, and archiving software components to implement virtual data manipulations.
- Policy-driven request planning and scheduling of networked data and computational resources. We require mechanisms for representing and enforcing both local and global policy constraints and new policy-aware resource discovery techniques.
- Management of transactions and task-execution across national-scale and worldwide virtual organizations. New mechanisms are needed to meet user requirements for performance, reliability, and cost. Agent computing will be important to permit the grid to balance user requirements and grid throughput, with fault tolerance.

iVDGL: International Virtual Data Grid Laboratory
- The iVDGL will provide a global computing resource for several leading international experiments in physics and astronomy.
- Global services and centralized monitoring, management, and support functions will be coordinated by the Grid Operations Center (GOC) located at Indiana University, with technical effort provided by GOC staff, iVDGL site staff, and the CS support teams.
- The GOC will operate iVDGL as a NOC manages a network, providing a single, dedicated point of contact for iVDGL status, configuration, and management, and addressing overall robustness issues.

iVDGL: International Virtual Data Grid Laboratory
- Management of the iVDGL will be integrated with that of the GriPhyN Project, funded by NSF in September 2000 for $11.9M.
- GriPhyN and Particle Physics Data Grid will provide the basic R&D and software toolkits needed for the laboratory.
- The European Union DataGrid is also a major participant and will contribute basic technologies and tools.
- The iVDGL will be based on the open Grid infrastructure provided by the Globus Toolkit and will also build on other technologies such as Condor resource management tools.

So What’s the Difference?
Funding
  PPDG: US DOE approved, 1/1/3/3/3 $M, 99 – 03
  iVDGL: US NSF proposed, 3/3/3/3/3 $M, 02 – 06
Computer Science
  PPDG: Globus (Foster), Condor (Livny), SDM (Shoshani), SRB (Moore)
  iVDGL: Globus (Foster, Kesselman), Condor (Livny)
Physics
  PPDG: BaBar, Dzero, STAR, JLAB, ATLAS, CMS
  iVDGL: ATLAS, CMS, LIGO, SDSS, NVO
National Laboratories
  PPDG: BNL, Fermilab, JLAB, SLAC, ANL, LBNL
  iVDGL: ANL, BNL, Fermilab (all unfunded collaborators)
Universities
  PPDG: Caltech, SDSS, UCSD, Wisconsin
  iVDGL: Florida, Chicago, Caltech, UCSD, Indiana, Boston, Wisconsin at Milwaukee, Pennsylvania State, Johns Hopkins, Wisconsin at Madison, Northwestern, USC, UT Brownsville, Hampton, Salish Kootenai College
Hardware
  PPDG: None
  iVDGL: ~20% of funding (Tier-2 Centers)
Network
  PPDG: No funding requested
  iVDGL: No funding requested; DataTAG complementary

PPDG Collaborators

PPDG Computer Science Groups
- Condor: develop, implement, deploy, and evaluate mechanisms and policies that support High Throughput Computing on large collections of computing resources with distributed ownership.
- Globus: developing fundamental technologies needed to build persistent environments that enable software applications to integrate instruments, displays, computational and information resources that are managed by diverse organizations in widespread locations.
- SDM (Scientific Data Management Research Group): optimized and standardized access to storage systems.
- Storage Resource Broker: client-server middleware that provides a uniform interface for connecting to heterogeneous data resources over a network and cataloging/accessing replicated data sets.

PPDG Project Activities
- CMS GDMP: Grid Data Mirroring Project
- D0 Job Management
- CMS-MOP: Monte Carlo Distributed Production
- STAR-DDM: uses HRM (Hierarchical Resource Manager)
- JLAB-SRB: Storage Resource Broker, Replication
- ATLAS MAGDA: distributed data manager (ATLAS-Globus)

PPDG Cross-cut Activities
- SC2001
- Certificate/Registration Authority
- Collaboration with IEPM, Network Performance Monitoring

Security, Privacy, Legal
Super Computing 2001 in Denver

Common Services
- Job Description Language
- Scheduling and Management of Processing and Data Placement Activities
- Monitoring and Status Reporting
- Storage Resource Management
- Reliable Replica Management Services
- File Transfer Services
- Collect and Document Current Experimental Practices
- R&D, Evaluation
- Authentication, Authorization, and Security
- End-to-End Applications and Testbeds

Delivery of End-to-End Applications & Integrated Production Systems to allow thousands of physicists to share data & computing resources for scientific processing and analyses
Operators & Users
Resources: Computers, Storage, Networks
PPDG Focus:
- Robust Data Replication
- Intelligent Job Placement and Scheduling
- Management of Storage Resources
- Monitoring and Information of Global Services
Relies on Grid infrastructure:
- Security & Policy
- High Speed Data Transfer
- Network management

Project Activities, End-to-End Applications and Cross-Cut Pilots
- Project Activities are focused Experiment – Computer Science collaborative developments.
  - Replicated data sets for science analysis – BaBar, CMS, STAR
  - Distributed Monte Carlo production services – ATLAS, D0, CMS
  - Common storage management and interfaces – STAR, JLAB
- End-to-End Applications used in Experiment data handling systems to give real-world requirements, testing and feedback.
  - Error reporting and response
  - Fault tolerant integration of complex components
- Cross-Cut Pilots for common services and policies
  - Certificate Authority policy and authentication
  - File transfer standards and protocols
  - Resource Monitoring – networks, computers, storage

PPDG activities as part of the Global Grid Community
- Coordination with other Grid Projects in our field:
  - GriPhyN – Grid Physics Network
  - European DataGrid
  - Storage Resource Management collaboratory
  - HENP Data Grid Coordination Committee
- Participation in Experiment and Grid deployments in our field:
  - ATLAS, BaBar, CMS, D0, STAR, JLAB experiment data handling systems
  - iVDGL/DataTAG – International Virtual Data Grid Laboratory
  - Use DTF computational facilities?
- Active in Standards Committees:
  - Internet2 HENP Working Group
  - Global Grid Forum

PPDG and GridPP Projects
- Use of Standard Middleware to Promote Interoperability
  - Move to Globus infrastructure: GSI, GridFTP
  - Use of Condor as a supported system for job submission
  - Publish availability of resources and file catalog
- Additional Grid Functionality for Job Specification, Submission, and Tracking
  - Use Condor for migration and checkpointing
  - Enhanced job specification language and services
- Enhanced Monitoring and Diagnostic Capabilities
- Fabric Management

PPDG Management and Coordination
PIs: Livny, Newman, Mount
Steering Committee:
- Ruth Pordes, Chair
- Doug Olson, Physics Deputy Chair
- Miron Livny, Computer Science Deputy Chair
- Computer Science Group Representatives
- Physics Experiment Representatives
- PIs (ex officio)
STAR, SDM, BaBar, SRB, JLAB, ATLAS, Globus, CMS, Condor, DZero
Executive Team (>1.0 FTE on PPDG):
- Steering Committee Chair
- Steering Committee Physics and CS Deputy Chairs

iVDGL
- International Virtual-Data Grid Laboratory
  - A global Grid laboratory with participation from US, EU, Asia, etc.
  - A place to conduct Data Grid tests “at scale”
  - A mechanism to create common Grid infrastructure
  - A facility to perform production exercises for LHC experiments
  - A laboratory for other disciplines to perform Data Grid tests
“We propose to create, operate and evaluate, over a sustained period of time, an international research laboratory for data-intensive science.” (From NSF proposal, 2001)

iVDGL Summary Information
- Principal components (as seen by USA)
  - Tier1 sites (laboratories)
  - Tier2 sites (universities and other institutes)
  - Selected Tier3 sites (universities)
  - Fast networks: US, Europe, transatlantic
  - International Grid Operations Center (iGOC)
  - Computer Science support teams
  - Coordination, management
- Proposed international partners
  - Initially US, EU, Japan, Australia
  - Other world regions later
  - Discussions w/ Russia, China, Pakistan, India, South America
- Complementary EU project: DataTAG
  - Transatlantic network from CERN to STAR-TAP (+ people)
  - Initially 2.5 Gb/s

US Proposal to NSF
- US proposal approved by NSF Sept. 25, 2001
  - “Part 2” of GriPhyN project
  - Much more application oriented than first GriPhyN proposal
  - $15M, 5 years @ $3M per year (huge constraint)
  - CMS + ATLAS + LIGO + SDSS/NVO + Computer Science
- Scope of US proposal
  - Deploy Grid laboratory with international partners
  - Acquire Tier2 hardware, Tier2 support personnel
  - Integrate Grid software into applications
  - CS support teams (+ 6 UK Fellows) to harden tools
  - Establish International Grid Operations Center (iGOC)
  - Deploy hardware at 3 minority institutions (Tier3)

US iVDGL Proposal Participants
Legend: T2/Software, CS support, T3/Outreach, T1/Labs
- U Florida: CMS
- Caltech: CMS, LIGO
- UC San Diego: CMS, CS
- Indiana U: ATLAS, iGOC
- Boston U: ATLAS
- U Wisconsin, Milwaukee: LIGO
- Penn State: LIGO
- Johns Hopkins: SDSS, NVO
- U Chicago: CS
- U Southern California: CS
- U Wisconsin, Madison: CS
- Salish Kootenai: Outreach, LIGO
- Hampton U: Outreach, ATLAS
- U Texas, Brownsville: Outreach, LIGO
- Fermilab: CMS, SDSS, NVO
- Brookhaven: ATLAS
- Argonne Lab: ATLAS, CS

iVDGL Partners
- National partners
  - PPDG (Particle Physics Data Grid)
  - DTF: Distributed Terascale Facility
  - CAL-IT2 (new California Grid initiative)
- Current international partners
  - EU-DataGrid
  - UK PPARC funding agency
  - UK Core e-Science Program
  - 6 UK Fellowships
  - INFN (Italy)
  - 2 Japanese institutes
  - 1 Australian institute (APAC)

iVDGL Map Circa
Map legend: Tier0/1 facility, Tier2 facility, Tier3 facility, 10 Gbps link, 2.5 Gbps link, 622 Mbps link, Other link

iVDGL Requirements
- Realistic scale
  - In number, diversity, distribution, network connectivity
- Delegated management and local autonomy
  - Management needed to operate as large, single facility
  - Autonomy needed for sites and experiments
- Support large-scale experimentation
  - To provide useful information for building real Data Grids
- Robust operation
  - For long running applications in complex environment
- Instrumentation and monitoring
  - Required for an experimental facility
- Integration with international “cyberinfrastructure”
- Extensibility

Approach
- Define a laboratory architecture
  - Define expected laboratory functions
  - Build in scalability, extensibility, reproducibility
  - Define instrumentation, monitoring
  - Establish CS support teams (develop/harden tools, support users)
  - Define working relationship, coordination with partners
- Create and operate global-scale laboratory
  - Deploy hardware, software, personnel at Tier2, Tier3 sites
  - Establish iGOC, single point of contact for monitoring, support, …
  - Help international partners establish sites
- Evaluate and improve iVDGL through experimentation
  - CS support teams will work with experiments
  - Extend results to partners
- Engage underrepresented groups
  - Integrate minority institutions as Tier3 sites

iVDGL as a Laboratory
- Grid Exercises
  - “Easy”, intra-experiment tests first (10-30%, national, transatlantic)
  - “Harder” wide-scale tests later (30-100% of all resources)
  - CMS is already conducting transcontinental simulation productions
- Operation as a facility
  - Common software, central installation to ensure compatibility
  - CS teams to “harden” tools, support applications
  - iGOC to monitor performance, handle problems

Emphasize Simple Operation
- “Local” control of resources vitally important
  - (Site level or national level)
  - Experiments, politics demand it
- Operate mostly as a “partitioned” testbed
  - (Experiment, nation, etc.)
  - Avoids excessive coordination
  - Allows software tests in different partitions
- Hierarchy of operation must be defined
  - E.g., (1) National + experiment, (2) inter-expt., (3) global tests

Other Disciplines
- Use by other disciplines
  - Expected to be at the 10% level
  - Other HENP experiments
  - Virtual Observatory (VO) community in Europe/US
  - Gravity wave community in Europe/US/Australia/Japan
  - Earthquake engineering
  - Bioinformatics
  - Our CS colleagues (wide scale tests)

US iVDGL Management and Coordination
Project Directors: Avery, Foster
Project Coordination Group:
- Project Coordinator
- Project Directors
- Coordinators of Systems Integration, Education/Outreach
- Physics Experiment Representatives
- University Research Center or Group Representatives
- PACI Representatives
iVDGL Design and Deployment
Integration with Applications
University Research Centers / Groups
International Grid Operations Center
Collaboration Board (Advisory)
External Advisory Board

Conclusion
- Grid involvement offers many challenges and opportunities.
- PPDG and iVDGL are complementary in their approach and deliverables.
- These efforts, along with our European partners, will provide exciting new ways to share data and computing resources in the future.
- Acknowledgements: Richard Mount (SLAC), Paul Avery (University of Florida), and Ruth Pordes (FNAL).