Slide 1: Third LCB Workshop — Distributed Computing and Regional Centres Session
Harvey B. Newman (CIT)
Marseilles, September 29, 1999
http://l3www.cern.ch/~newman/marseillessep29.ppt
http://l3www.cern.ch/~newman/marseillessep29/index.htm
Slide 2: LHC Computing: Different from Previous Experiment Generations
- Geographical dispersion: of people and resources
- Complexity: the detector and the LHC environment
- Scale: Petabytes per year of data
- 1800 physicists, 150 institutes, 32 countries
Major challenges associated with:
- Coordinated use of distributed computing resources
- Remote software development and physics analysis
- Communication and collaboration at a distance
- R&D: new forms of distributed systems
Slide 3: Challenges: Complexity of Events
- The signal event is obscured by ~20 overlapping, uninteresting collisions in the same crossing
- Track reconstruction time at 10^34 luminosity is several times that at 10^33
- Time does not scale from previous generations
Slide 4: HEP Bandwidth Needs & Price Evolution
HEP growth, 1989-1999:
- A factor of one to several hundred on principal transoceanic links
- A factor of up to 1000 in domestic academic and research nets
HEP needs, 1999-2006:
- Continued study by ICFA-SCIC; the 1998 results of ICFA-NTF show a factor of one to several hundred (2x per year)
Costs (to vendors):
- Optical fibers and WDM: a factor > 2/year reduction now?
- Limits of transmission speed, electronics, protocol speed
Price to HEP?
- Complex market, but an increased budget is likely to be needed
- Reference BW/price evolution: ~1.5x per year (see the calculation below)
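As a quick illustration of that last figure (my arithmetic only, not a MONARC or ICFA result), compounding ~1.5x per year over the 1999-2006 horizon of this talk gives roughly a factor of 17 in bandwidth per unit price:

```java
// Illustrative compounding of the ~1.5x/year reference bandwidth/price
// evolution quoted above, over the 1999-2006 horizon of this talk.
public class BandwidthPriceProjection {
    public static void main(String[] args) {
        double factorPerYear = 1.5;   // assumed reference evolution
        int years = 2006 - 1999;
        double totalFactor = Math.pow(factorPerYear, years);
        System.out.printf("Over %d years: ~%.0fx bandwidth per unit price%n",
                          years, totalFactor);   // prints ~17x
    }
}
```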
Slide 5: Cost Evolution: CMS 1996 Versus 1999 Technology Tracking Team
Compare the CMS 1996 estimates to the 1999 Technology Tracking Team projections for 2005:
- CPU: unit cost will be close to the early prediction
- Disk: will be more expensive (by ~2x) than the early prediction
- Tape: currently zero to 10% annual cost decrease (a potential problem)
Slide 6: LHC (and HENP) Computing and Software Challenges
Software: modern languages, methods and tools are the key to managing complexity
- FORTRAN: the end of an era; objects: a coming of age
- "Transparent" access to data: location and storage-medium independence
Data Grids: a new generation of data-intensive, network-distributed systems for analysis
- A deep, heterogeneous client/server hierarchy of up to 5 levels
- An ensemble of tape and disk mass stores
- LHC: object database federations
Interaction of the software and data handling architectures: the emergence of new classes of operating systems
Slide 7: Four Experiments — The Petabyte to Exabyte Challenge
- ATLAS, CMS, ALICE, LHCb: Higgs and new particles; quark-gluon plasma; CP violation
- Data written to tape: ~5 Petabytes/year and up (1 PB = 10^15 bytes)
- Total for the LHC experiments: 0.1 to 1 Exabyte (1 EB = 10^18 bytes) by ~2010 (~2020?)
Slide 8: To Solve: the HENP "Data Problem"
While the proposed future computing and data handling facilities are large by present-day standards, they will not support free access, transport or reconstruction for more than a minute portion of the data.
- Need effective global strategies to handle and prioritise requests, based on both policies and marginal utility (a sketch follows this slide)
- Strategies must be studied and prototyped to ensure viability: acceptable turnaround times; efficient resource utilization
Problems to be explored — how to:
- Meet the demands of hundreds of users who need transparent access to local and remote data, in disk caches and tape stores
- Prioritise hundreds to thousands of requests from local and remote communities
- Ensure that the system is dimensioned "optimally" for the aggregate demand
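A minimal sketch of what "prioritise based on policies and marginal utility" could look like; the scoring formula, field names and numbers are illustrative assumptions, not an agreed MONARC or experiment policy:

```java
import java.util.Comparator;
import java.util.PriorityQueue;

// Sketch of prioritising requests by policy weight and marginal utility.
// The scoring formula and fields are illustrative assumptions only.
public class RequestScheduler {
    static class Request {
        final String owner; final double policyWeight, marginalUtility; final long bytes;
        Request(String owner, double policyWeight, double marginalUtility, long bytes) {
            this.owner = owner; this.policyWeight = policyWeight;
            this.marginalUtility = marginalUtility; this.bytes = bytes;
        }
        double score() { return policyWeight * marginalUtility / Math.max(bytes, 1L); }
    }

    private final PriorityQueue<Request> queue =
        new PriorityQueue<>(Comparator.comparingDouble(Request::score).reversed());

    void submit(Request r) { queue.add(r); }
    Request next()         { return queue.poll(); }   // highest score served first

    public static void main(String[] args) {
        RequestScheduler s = new RequestScheduler();
        s.submit(new Request("analysis-group", 1.0, 0.8, 5_000_000_000L));
        s.submit(new Request("individual",     0.3, 0.9,    50_000_000L));
        System.out.println("Served first: " + s.next().owner);
    }
}
```

In practice the score would come from query estimators and policy tables rather than a fixed formula.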
Slide 9: MONARC — Models Of Networked Analysis At Regional Centers
Caltech, CERN, Columbia, FNAL, Heidelberg, Helsinki, INFN, IN2P3, KEK, Marseilles, MPI, Munich, Orsay, Oxford, Tufts
Goals:
- Specify the main parameters characterizing the Model's performance: throughputs, latencies
- Develop "Baseline Models" in the "feasible" category
- Verify resource requirement baselines: computing, data handling, networks
Corollaries:
- Define the Analysis Process
- Define RC architectures and services
- Provide guidelines for the final Models
- Provide a simulation toolset for further Model studies
Model circa 2006 (diagram): CERN with 6x10^7 MIPS, 2000 TByte and a robot; FNAL/BNL with 4x10^6 MIPS, 200 TByte and a robot; Universities with n x 10^6 MIPS, 100 TByte and a robot; 622 Mbit/s links; desktops at each level.
Slide 10: MONARC General Conclusions on LHC Computing
Following discussions of computing and network requirements, technology evolution, projected costs, support requirements, etc.:
- The scale of LHC "computing" is such that it requires a worldwide effort to accumulate the necessary technical and financial resources
- The uncertainty in the affordable network bandwidth implies that several scenarios of computing resource distribution must be developed
- A distributed hierarchy of computing centres will lead to better use of the financial and manpower resources of CERN, the collaborations, and the nations involved than a highly centralised model focused at CERN
- Hence: the distributed model also provides better use of physics opportunities at the LHC by physicists and students
- At the top of the hierarchy is the CERN Centre, with the ability to perform all analysis-related functions, but not the capacity to do them completely
- At the next step in the hierarchy is a collection of large, multi-service "Tier1 Regional Centres", each with 10-20% of the CERN capacity devoted to one experiment
- There will be Tier2 or smaller special-purpose centres in many regions
Slide 11: Grid-Hierarchy Concept
Matched to the worldwide-distributed collaboration structure of the LHC experiments; best suited for the multifaceted balance between:
- Proximity of the data to centralized processing resources
- Proximity to end-users for frequently accessed data
- Efficient use of limited network bandwidth (especially transoceanic, and in many world regions) through organized caching/mirroring/replication
- Appropriate use of (world-)regional and local computing and data handling resources
- Effective involvement of scientists and students in each world region in the data analysis and the physics
Slide 12: MONARC Phase 1 and 2 Deliverables
- September 1999: benchmark test validating the simulation — milestone completed
- Fall 1999: a Baseline Model representing a possible (somewhat simplified) solution for LHC computing
  - Baseline numbers for a set of system and analysis-process parameters: CPU times, data volumes, frequency and site of jobs and data, ...
  - Reasonable "ranges" of parameters
  - "Derivatives": how the effectiveness depends on some of the more sensitive parameters
- Agreement of the experiments on the reasonableness of the Baseline Model
- Chapter on Computing Models in the CMS and ATLAS Computing Technical Progress Reports
Slide 13: MONARC and Regional Centres
- MONARC RC Representative meetings held in April and August
- Regional Centre planning is well advanced, with an optimistic outlook, in the US (FNAL for CMS; BNL for ATLAS), France (CCIN2P3) and Italy
  - Proposals to be submitted late this year or early next
- Active R&D and prototyping underway, especially in the US, Italy and Japan; also the UK (LHCb), Russia (MSU, ITEP), Finland (HIP/Tuovi)
- Discussions in the national communities also underway in Japan, Finland, Russia, the UK and Germany
  - Varying situations, according to the funding structure and outlook
- Need for more active planning outside of the US, Europe, Japan and Russia — important for R&D and overall planning
- There is a near-term need to understand the level and sharing of support for LHC computing between CERN and the outside institutes, to enable the planning in several countries to advance
  - MONARC + CMS/SCB assumption: "traditional" 1/3 : 2/3 sharing
Slide 14: MONARC Working Groups & Chairs
- "Analysis Process Design": P. Capiluppi (Bologna, CMS)
- "Architectures": Joel Butler (FNAL, CMS)
- "Simulation": Krzysztof Sliwa (Tufts, ATLAS)
- "Testbeds": Lamberto Luminari (Rome, ATLAS)
- "Steering" (and the "Regional Centres Committee"): Laura Perini (Milan, ATLAS), Harvey Newman (Caltech, CMS)
Slide 15: MONARC Architectures WG
Discussion and study of site requirements:
- Analysis task division between CERN and the RCs
- Facilities required with different analysis scenarios, and network bandwidth
- Support required to (a) sustain the Centre, and (b) contribute effectively to the distributed system
Reports:
- Rough sizing estimates for a large LHC experiment facility
- Computing architectures of existing experiments: LEP, FNAL Run 2, CERN fixed target (NA45, NA48), FNAL fixed target (KTeV, FOCUS)
- Regional Centres for LHC computing (functionality and services)
- Computing architectures of future experiments (in progress): BaBar, RHIC, COMPASS
- Conceptual designs, drawings and specifications for a candidate site architecture
Slide 16: Comparisons with an LHC-sized Experiment (CMS or ATLAS)
- [*] Total CPU for CMS or ATLAS: ~1.5-2 million SI95 (current concepts; maybe for 10^33 luminosity)
Slide 17: Architectural Sketch: One Major LHC Experiment, at CERN (L. Robertson)
- Mass-market commodity PC farms
- LAN-SAN and LAN-WAN "stars" (switch/routers)
- Tapes (many drives for ALICE); an archival medium only?
Slide 18: MONARC Architectures WG: Lessons and Challenges for LHC
- SCALE: ~100 times more CPU and ~10 times more data than CDF at Run 2 (2000-2003)
- DISTRIBUTION: mostly achieved in HEP only for simulation; for analysis (and some re-processing) it will not happen without advance planning and commitments
- REGIONAL CENTRES: require coherent support, continuity, and the ability to keep the code base, calibrations and job parameters up to date
- HETEROGENEITY: of facility architecture and mode of use, and of operating systems, must be accommodated
- FINANCIAL PLANNING: analysis of the early planning for the LEP era showed a definite tendency to underestimate the requirements (by more than an order of magnitude), partly due to budgetary considerations
Slide 19: Regional Centre Architecture Example (by I. Gaines)
[Diagram] A Regional Centre comprises tape mass storage and disk servers, database servers, and support facilities (physics software development, R&D systems and testbeds, info/code/web/telepresence servers, training, consulting, help desk), fed by networks from CERN and from Tier2 and simulation centres. Processing flows:
- Production reconstruction: Raw/Sim -> ESD; scheduled and predictable; experiment/physics groups
- Production analysis: ESD -> AOD, AOD -> DPD; scheduled; physics groups
- Individual analysis: AOD -> DPD and plots; chaotic; individual physicists
Results are served to desktops, Tier2 centres and local institutes; archival data goes to tape.
Slide 20: MONARC Architectures WG: Regional Centre Facilities & Services
Regional Centres should provide:
- All technical and data services required to do physics analysis
- All physics objects, tags and calibration data
- A significant fraction of the raw data
- Caching or mirroring of calibration constants
- Excellent network connectivity to CERN and to the region's users
- Manpower to share in the development of common maintenance, validation and production software
- A fair share of post- and re-reconstruction processing
- Manpower to share in the work on common R&D projects
- Service to members of other regions on a (?) best-effort basis
- Excellent support services for training, documentation and troubleshooting, at the Centre or at the remote sites it serves
A long-term commitment for staffing, hardware evolution and support for R&D, as part of the distributed data analysis architecture.
Slide 21: MONARC Analysis Process WG
"How much data is processed, by how many people, how often, in how many places, with which priorities..."
Analysis Process design — initial steps:
- Consider the number and type of processing and analysis jobs, their frequency, number of events, data volumes, CPU, etc.
- Consider physics goals, triggers, signal and background rates
- Studies covered reconstruction, selection/sample reduction (one or more passes), analysis, and simulation
- Lessons from existing experiments are limited: each case is tuned to the detector, run conditions, physics goals and technology of its time
- Limited studies so far, from the user rather than the system point of view; more will follow as feedback from the simulations is obtained
- Limitations on CPU dictate a largely "Physics Analysis Group"-oriented approach to the reprocessing of data
  - and regional ("local") support for individual activities
  - which implies dependence on the RC hierarchy
Slide 22: MONARC Analysis Process: Initial Sharing Assumptions
- Assume similar computing capacity is available outside CERN for re-processing and data analysis [*]
- There is no allowance for event simulation and reconstruction of simulated data, which it is assumed will be performed entirely outside CERN [*]
- Investment, services and infrastructure should be optimised to reduce overall costs [TCO]
- Tape sharing makes sense if ALICE needs much more tape capacity at a different time of the year
[*] The first two assumptions would likely result in "at least" a 1/3 : 2/3 CERN:outside ratio of resources (i.e., the outside share is likely to be larger).
Slide 23: MONARC Analysis Process Example (figure)
Slide 24: MONARC Analysis Process Baseline: Group-Oriented Analysis (figure)
Slide 25: MONARC Baseline Analysis Process: ATLAS/CMS Reconstruction Step (figure)
Slide 26: MONARC Analysis Model Baseline: Event Sizes and CPU Times
Event sizes:
- Raw data: 1 MB/event
- ESD: 100 kB/event
- AOD: 10 kB/event
- TAG or DPD: 1 kB/event
CPU time in SI95 seconds (without ODBMS overhead, ~20%):
- Creating ESD (from Raw): 350
- Selecting ESD: 0.25
- Creating AOD (from ESD): 2.5
- Creating TAG (from AOD): 0.5
- Analyzing TAG or DPD: 3.0
- Analyzing AOD: 3.0
- Analyzing ESD: 3.0
- Analyzing RAW: 350
(A rough cross-check follows below.)
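As a back-of-the-envelope cross-check (a sketch only, combining the baseline figures above with the 1-1.5 PB/year of raw data quoted on the next slide, i.e. roughly 1-1.5 x 10^9 events/year at 1 MB/event):

```java
// Back-of-the-envelope estimate of the sustained CPU needed for a single
// reconstruction pass, using the baseline figures on this slide.
public class ReconstructionCpuEstimate {
    public static void main(String[] args) {
        double eventsPerYear   = 1.0e9;        // 1 PB/yr at 1 MB/event
        double si95SecPerEvent = 350 * 1.2;    // creating ESD, +~20% ODBMS overhead
        double secondsPerYear  = 3.15e7;       // wall-clock seconds in a year
        double sustainedKSI95  = eventsPerYear * si95SecPerEvent / secondsPerYear / 1.0e3;
        System.out.printf("One reconstruction pass, spread over a year: ~%.0f kSI95%n",
                          sustainedKSI95);     // ~13 kSI95 at 100% efficiency
    }
}
```

If these numbers hold, a single reconstruction pass accounts for only a small part of the 520 kSI95 quoted for the CERN Centre on the next slide; the rest presumably covers selection, analysis, re-processing and less-than-ideal scheduling efficiency.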
Slide 27: MONARC Analysis Model Baseline: ATLAS or CMS at the CERN Centre
- CPU power: 520 kSI95
- Disk space: 540 TB
- Tape capacity: 3 PB, 400 MB/s
- Link speed to an RC: 40 MB/s (1/2 of 622 Mbps)
- Raw data: 100%, 1-1.5 PB/year
- ESD data: 100%, 100-150 TB/year
- Selected ESD: 100%, 20 TB/year [*]
- Revised ESD: 100%, 40 TB/year [*]
- AOD data: 100%, 2 TB/year [*]
- Revised AOD: 100%, 4 TB/year [*]
- TAG/DPD: 100%, 200 GB/year
- Simulated data: 100%, 100 TB/year (repository)
[*] Covering all analysis groups; each selects ~1% of the total ESD or AOD data for a typical analysis.
Slide 28: MONARC Analysis Model Baseline: ATLAS or CMS at the CERN Centre (continued)
The same baseline table as the previous slide, with the annotation that some of these basic numbers require further study. Preliminary LHCb figures are shown for comparison: 300 kSI95 (?), 200 TB/yr, 140 TB/yr, ~1-10 TB/yr, ~70 TB/yr.
Slide 29: MONARC Analysis Model Baseline: ATLAS or CMS "Typical" Tier1 RC
- CPU power: ~100 kSI95
- Disk space: ~100 TB
- Tape capacity: 300 TB, 100 MB/s
- Link speed to Tier2: 10 MB/s (1/2 of 155 Mbps)
- Raw data: 1%, 10-15 TB/year
- ESD data: 100%, 100-150 TB/year
- Selected ESD: 25%, 5 TB/year [*]
- Revised ESD: 25%, 10 TB/year [*]
- AOD data: 100%, 2 TB/year [**]
- Revised AOD: 100%, 4 TB/year [**]
- TAG/DPD: 100%, 200 GB/year
- Simulated data: 25%, 25 TB/year (repository)
[*] Covering five analysis groups; each selects ~1% of the total ESD or AOD data for a typical analysis.
[**] Covering all analysis groups.
(A transfer-time sketch follows below.)
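A minimal transfer-time sketch using the link speeds above, assuming serial transfers and ignoring protocol overhead, contention and tape-staging delays:

```java
// Rough transfer-time estimates from the baseline link speeds above.
public class LinkOccupancyEstimate {
    static double daysToTransfer(double terabytes, double megabytesPerSec) {
        return terabytes * 1.0e6 / megabytesPerSec / 86_400.0;   // 1 TB = 1e6 MB
    }
    public static void main(String[] args) {
        System.out.printf("100 TB of ESD over the 40 MB/s CERN link: ~%.0f days%n",
                          daysToTransfer(100, 40));   // ~29 days
        System.out.printf("5 TB of selected ESD over a 10 MB/s Tier2 link: ~%.1f days%n",
                          daysToTransfer(5, 10));     // ~5.8 days
    }
}
```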
Slide 30: MONARC Analysis Process WG: A Short List of Upcoming Issues
- Priorities, schedules and policies
  - Production vs. Analysis Group vs. individual activities
  - Allowed percentage of access to higher data tiers (TAG / physics objects / reconstructed / RAW)
- Improved understanding of the data model and the ODBMS
- Including MC production; simulated data storage and access
- Mapping the analysis process onto heterogeneous distributed resources
- Determining the role of institutes' workgroup servers and desktops in the Regional Centre hierarchy
- Understanding how to manage persistent data: e.g. storage / migration / transport / re-compute strategies
- Deriving a methodology for Model testing and optimisation
  - Metrics for evaluating the global efficiency of a Model: cost vs. throughput; turnaround; reliability of data access
Slide 31: MONARC Testbeds WG
Measurements of the key parameters governing the behavior and scalability of the Models:
- Simple testbed configuration defined and implemented
- Sun Solaris 2.6, C++ compiler version 4.2; Objectivity 5.1 with the /C++, /stl, /FTO and /Java options
- Set up at CNAF, FNAL, Genova, Milano, Padova, Roma, KEK, Tufts, CERN
- Four "Use Case" applications using Objectivity: ATLASFAST++, GIOD/JavaCMS, ATLAS 1 TB Milestone, CMS test beams
- System performance tests; simulation validation milestone carried out: see I. Legrand's talk
Slide 32: MONARC Testbed Systems (figure)
Slide 33: MONARC Testbeds WG: Isolation of Key Parameters
Some parameters have been measured, installed in the MONARC simulation models, and used in the first round of Model validation:
- Objectivity AMS response-time function, and its dependence on
  - object clustering, page size, data class hierarchy and access pattern
  - mirroring and caching (e.g. with the Objectivity DRO option)
- Scalability of the system under "stress": performance as a function of the number of jobs, relative to the single-job performance
- Performance and bottlenecks for a variety of data access patterns
  - frequency of following TAG -> AOD, AOD -> ESD and ESD -> RAW associations
  - data volume accessed remotely: fraction on tape and on disk, as a function of network bandwidth; use of QoS
Slide 34: MONARC Simulation
A CPU- and code-efficient approach to the simulation of distributed systems has been developed for MONARC. It:
- provides an easy way to map the distributed data processing, transport and analysis tasks onto the simulation
- can handle any Model configuration dynamically, including very elaborate ones with hundreds of interacting complex objects
- can run on real distributed computer systems, and may interact with real components
The Java (JDK 1.2) environment is well suited to developing a flexible, distributed, process-oriented simulation. The simulation program is still under development, and dedicated measurements to evaluate realistic parameters and "validate" the simulation program are in progress.
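To illustrate the process-oriented, discrete-event style of simulation described here (a minimal sketch only; it does not use the MONARC toolset or its class names), jobs can be modelled as actions scheduled on a global simulated clock:

```java
import java.util.PriorityQueue;

// Minimal discrete-event simulation sketch in the spirit described above.
// Class and field names are illustrative; this is not the MONARC toolset.
public class MiniSim {
    static class Event implements Comparable<Event> {
        final double time; final Runnable action;
        Event(double time, Runnable action) { this.time = time; this.action = action; }
        public int compareTo(Event o) { return Double.compare(time, o.time); }
    }

    private final PriorityQueue<Event> agenda = new PriorityQueue<>();
    private double now = 0.0;

    void schedule(double delay, Runnable action) { agenda.add(new Event(now + delay, action)); }

    void run() {
        while (!agenda.isEmpty()) {
            Event e = agenda.poll();
            now = e.time;        // advance simulated time, then run the action
            e.action.run();
        }
    }

    public static void main(String[] args) {
        MiniSim sim = new MiniSim();
        double cpuSecPerEvent = 3.0;   // e.g. "Analyzing AOD" from the baseline
        // A toy analysis job: receive a data block over the network, then process it.
        sim.schedule(120.0, () -> {    // 120 s: assumed transfer time for the block
            System.out.printf("t=%.0fs  data block arrived%n", sim.now);
            sim.schedule(1000 * cpuSecPerEvent, () ->
                System.out.printf("t=%.0fs  1000 events analysed%n", sim.now));
        });
        sim.run();
    }
}
```

Per the "Role of Simulation" slide later in this talk, the essential additions on top of such an event agenda are interrupts and the performance/load characteristics of shared resources (CPUs, links, data servers).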
Slide 35: Example: Physics Analysis at Regional Centres
- Similar data processing jobs are performed in several RCs
- Each Centre has the "TAG" and "AOD" databases replicated
- The main Centre provides the "ESD" and "RAW" data
- Each job processes AOD data, and also a fraction of the ESD and RAW
Slide 36: Example: Physics Analysis (figure)
Slide 37: Simple Validation Measurements: The AMS Data Access Case
[Plot] Simulation vs. measurements for a raw-data DB served over a LAN to a client with 4 CPUs.
Slide 38: MONARC Strategy and Tools for Phase 2
Strategy: vary the system capacity and network performance parameters over a wide range.
- Avoid complex, multi-step decision processes that could require protracted study (keep them for a possible Phase 3)
- The majority of the workload should be satisfied in an acceptable time: up to minutes for interactive queries, up to hours for short jobs, up to a few days for the whole workload
- Determine requirements "baselines" and/or flaws in certain Analysis Processes in this way
- Perform a comparison of a CERN-centralised Model with suitable variations of Regional Centre Models
Tools and operations to be designed in Phase 2:
- Query estimators
- Affinity evaluators, to determine the proximity of multiple requests in space or time
- Strategic algorithms for caching, reclustering, mirroring, or pre-emptively moving data (or jobs, or parts of jobs)
Slide 39: MONARC Phase 2 Detailed Milestones
- July 1999: complete Phase 1; begin the second cycle of simulations with more refined Models
Slide 40: MONARC Possible Phase 3
Timeliness and useful impact:
- Facilitate the efficient planning and design of mutually compatible site and network architectures and services, among the experiments, the CERN Centre and the Regional Centres
- Provide modelling consultancy and service to the experiments and Centres
- Provide a core of advanced R&D activities, aimed at LHC computing system optimisation and production prototyping
- Take advantage of this year's work on distributed data-intensive computing for HENP in other "next generation" projects [*]
  - For example in the US: the "Particle Physics Data Grid" (PPDG) of DoE/NGI; "A Physics Optimized Grid Environment for Experiments" (APOGEE) proposed to DoE/HENP; the joint "GriPhyN" proposal to NSF by ATLAS/CMS/LIGO
[*] See H. Newman, http://www.cern.ch/MONARC/progress_report/longc7.html
Slide 41: MONARC Phase 3 — Possible Technical Goal: System Optimisation
Maximise throughput and/or reduce long turnaround times. Include long and potentially complex decision processes in the studies and simulations:
- Potential for substantial gains in the work performed or the resources saved
Phase 3 system design elements:
- RESILIENCE, resulting from flexible management of each data transaction, especially over WANs
- FAULT TOLERANCE, resulting from robust fall-back strategies to recover from abnormal conditions
- SYSTEM STATE & PERFORMANCE TRACKING, to match and co-schedule requests and resources, and to detect or predict faults
Synergy with PPDG and other advanced R&D projects. Potential importance for scientific research and industry: simulation of distributed systems for data-intensive computing.
Slide 42: MONARC Status: Conclusions
- MONARC is well on its way to specifying baseline Models representing cost-effective solutions to LHC computing.
- Initial discussions have shown that LHC computing has a new scale and level of complexity.
  - A Regional Centre hierarchy of networked centres appears to be the most promising solution.
- A powerful simulation system has been developed, and we are confident of delivering a very useful toolset for further model studies by the end of the project.
- Synergy with other advanced R&D projects has been identified; this may be of considerable mutual benefit.
- We will deliver important information and example Models:
  - very timely for the Hoffmann Review and the discussions of LHC computing over the next months
  - in time for the Computing Progress Reports of ATLAS and CMS
Slide 43: LHC Data Models: RD45
HEP data models are complex:
- A rich hierarchy of hundreds of complex data types (classes)
- Many relations between them
- Different access patterns (multiple viewpoints)
LHC experiments rely on OO technology:
- OO applications deal with networks of objects (and containers)
- Pointers (or references) are used to describe relations
Existing solutions do not scale:
- The solution suggested by RD45: an ODBMS coupled to a mass storage system
[Diagram] Example object network: Event -> TrackList -> Track(s) -> HitList -> Hit(s); the Event also references Tracker and Calorimeter objects (see the sketch below).
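A minimal sketch (plain Java, standing in for the persistent-capable classes an ODBMS federation would actually use) of the object network the diagram depicts:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative event data model following the object network on this slide:
// Event -> TrackList -> Track -> HitList -> Hit. In the real system these
// would be persistent-capable classes stored in an ODBMS federation.
class Hit { double x, y, z; }

class Track {
    final List<Hit> hits = new ArrayList<>();      // HitList: relation via references
}

class Tracker { /* detector-specific reconstructed quantities */ }
class Calorimeter { /* detector-specific reconstructed quantities */ }

class Event {
    final List<Track> tracks = new ArrayList<>();  // TrackList
    Tracker tracker = new Tracker();
    Calorimeter calorimeter = new Calorimeter();
}

public class DataModelSketch {
    public static void main(String[] args) {
        Event ev = new Event();
        Track t = new Track();
        t.hits.add(new Hit());
        ev.tracks.add(t);
        System.out.println("Tracks: " + ev.tracks.size()
                         + ", hits on first track: " + ev.tracks.get(0).hits.size());
    }
}
```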
Slide 44: System View of Data Analysis by 2005
A multi-Petabyte object database federation backed by a networked set of archival stores:
- High availability and immunity from corruption
- "Seamless" response to database queries
  - Location independence; storage brokers; caching
  - Clustering and reclustering of objects
  - Transfer only "useful" data: tape/disk; across networks; disk/client
- Access and processing flexibility
  - Resource and application profiling, state tracking, co-scheduling
  - Continuous retrieval/recalculation/storage decisions
  - Trade off data storage, CPU and network capabilities to optimize performance and costs (see the sketch below)
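A minimal sketch of the "continuous retrieval/recalculation" trade-off mentioned above; all cost figures are assumptions for illustration, not measured MONARC parameters:

```java
// Illustrative retrieve-vs-recompute decision with assumed cost figures.
// A real system would use measured network, storage and CPU cost estimates.
public class RetrieveOrRecompute {
    // Estimated seconds to fetch 'bytes' over a link of 'mbPerSec' MB/s.
    static double fetchCost(double bytes, double mbPerSec) {
        return bytes / (mbPerSec * 1.0e6);
    }
    // Estimated seconds to re-derive the object locally.
    static double recomputeCost(double cpuSi95Sec, double cpuCapacitySi95) {
        return cpuSi95Sec / cpuCapacitySi95;
    }
    public static void main(String[] args) {
        double aodBytes = 10_000;        // 10 kB AOD object (baseline size)
        double wanMBs = 0.5;             // assumed share of WAN bandwidth
        double deriveSi95Sec = 2.5;      // "Creating AOD from ESD" baseline
        double localCpuSi95 = 20.0;      // assumed share of a local node

        double fetch = fetchCost(aodBytes, wanMBs);
        double redo  = recomputeCost(deriveSi95Sec, localCpuSi95);
        System.out.printf("fetch: %.3f s, recompute: %.3f s -> %s%n",
                          fetch, redo, fetch < redo ? "fetch" : "recompute");
    }
}
```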
Slide 45: CMS Analysis and Persistent Object Store
[Diagram] The online system (L1, L2/L3 and "L4" filtering; CMS slow control; detector monitoring) feeds a persistent object store (Object Database Management System) through common filters and pre-emptive object creation, with on-demand object creation; the store is also populated and used by filtering, simulation, calibrations, group analyses and user analysis.
Data organized in a(n object) "hierarchy":
- Raw, Reconstructed (ESD), Analysis Objects (AOD), Tags
Data distribution:
- All raw, reconstructed and master parameter DBs at CERN
- All event TAGs and AODs at all regional centres
- Selected reconstructed data sets at each regional centre
- HOT (frequently accessed) data moved to the RCs
Slide 46: GIOD Summary (Caltech/CERN/FNAL/HP/SDSC)
GIOD has:
- Constructed a Terabyte-scale set of fully simulated events and used these to create a large OO database
- Learned how to create large database federations
- Completed the "100" (to 170) MByte/sec CMS milestone
- Developed prototype reconstruction and analysis codes, and Java 3D OO visualization prototypes, that work seamlessly with persistent objects over networks
- Deployed facilities and database federations as useful testbeds for Computing Model studies
Slide 47: BaBar OOFS: Putting the Pieces Together
[Diagram] The layered stack: user -> database protocol layer (Objectivity) -> filesystem logical layer -> filesystem physical layer; the filesystem implementation was designed and developed at SLAC (labels in the figure: Veritas, IBM, DOE, Objectivity).
Slide 48: Dynamic Load Balancing: Hierarchical Secure AMS, Dynamic Selection
- Tapes are transparent; tape resources can reside anywhere
- Defer Request Protocol
  - Transparently delays the client while the data is made available
  - Accommodates high-latency storage systems (e.g., tape)
- Request Redirect Protocol
  - Redirects the client to an alternate AMS
  - Provides for dynamic replication and real-time load balancing
(A conceptual sketch follows this list.)
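A conceptual sketch of the defer/redirect behaviour described above; this is not the Objectivity AMS wire protocol, and the class, method and host names are illustrative:

```java
import java.util.List;

// Conceptual sketch of a data server that serves, defers, or redirects
// requests. Names, fields and thresholds are illustrative only.
public class DataServerSketch {
    enum Kind { SERVE, DEFER, REDIRECT }
    static class Reply {
        final Kind kind; final String detail;
        Reply(Kind k, String d) { kind = k; detail = d; }
    }

    private final List<String> alternates;
    private int load;                         // crude load metric
    DataServerSketch(List<String> alternates) { this.alternates = alternates; }

    Reply handle(String object, boolean onDisk) {
        if (load > 100 && !alternates.isEmpty())   // overloaded: redirect the client
            return new Reply(Kind.REDIRECT, alternates.get(0));
        if (!onDisk) {                             // on tape: defer while staging
            // staging from tape would be started asynchronously here
            return new Reply(Kind.DEFER, "retry in 300 s");
        }
        load++;
        return new Reply(Kind.SERVE, object);
    }

    public static void main(String[] args) {
        DataServerSketch s = new DataServerSketch(List.of("ams2.example.org"));
        System.out.println(s.handle("/raw/run42/evt7", false).kind);  // DEFER
        System.out.println(s.handle("/aod/run42/evt7", true).kind);   // SERVE
    }
}
```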
Slide 49: Regional Centres Concept: A Data Grid Hierarchy
LHC Grid hierarchy example (five levels in total):
- Tier 0: CERN
- Tier 1: national "regional" centre
- Tier 2: regional centre
- Tier 3: institute workgroup server
- Tier 4: individual desktop
Slide 50: Background: Why "Grids"? (I. Foster, ANL/Chicago)
Because the resources needed to solve complex problems are rarely colocated:
- Advanced scientific instruments
- Large amounts of storage
- Large amounts of computing
- Groups of smart people
For a variety of reasons:
- Resource allocations are not optimized for one application
- Required resource configurations change
- Different views of priorities and truth
And for transparent, rapid access and delivery of Petabyte-scale data (and multi-TIPS computing resources).
Slide 51: Grid Services Architecture [*]
Layered view, from bottom to top:
- Grid fabric: data stores, networks, computers, display devices, ...; associated local services
- Grid services: protocols, authentication, policy, resource management, instrumentation, resource discovery, etc.
- Application toolkits: remote visualization, remote computation, remote data, remote sensors, and remote collaboration toolkits
- Applications: a rich set of HEP data-analysis related applications
[*] Adapted from Ian Foster: there are computing grids, data grids, access (collaborative) grids, ...
Slide 52: Roles of HENP Projects for Distributed Analysis (Grids)
- RD45, GIOD: networked object databases
- Clipper/GC: high-speed access to object or file data; FNAL/SAM for processing and analysis
- SLAC/OOFS: distributed file system + Objectivity interface
- NILE, Condor: fault-tolerant distributed computing with heterogeneous CPU resources
- MONARC: LHC Computing Models: architecture, simulation, testbeds; strategy, politics
- PPDG: first distributed data services and Grid system prototype
- ALDAP: OO database structures and access methods for astrophysics and HENP data
- APOGEE: full-scale Grid design; instrumentation, system modeling and simulation, evaluation/optimization
- GriPhyN: production prototype Grid in hardware and software; then production
Slide 53: ALDAP (NSF/KDI) Project: Accessing Large Data Archives in Astronomy and Particle Physics
NSF Knowledge Discovery Initiative (KDI); Caltech, Johns Hopkins, FNAL (SDSS)
- Explore advanced adaptive database structures and physical data storage hierarchies for archival storage of next-generation astronomy and particle physics data
- Develop spatial indexes, novel data organizations, and distribution and delivery strategies for efficient and transparent access to data across networks
- Create prototype network-distributed data query execution systems using autonomous agent workers
- Explore commonalities and find effective common solutions for particle physics and astrophysics data
Slide 54: The China Clipper Project: A Data-Intensive Grid (ANL-SLAC-Berkeley)
China Clipper goal: develop and demonstrate middleware allowing applications transparent, high-speed access to large data sets distributed over wide-area networks.
- Builds on expertise and assets at ANL, LBNL & SLAC; NERSC, ESnet
- Builds on Globus middleware and a high-performance distributed storage system (DPSS from LBNL)
- Initial focus on large DOE HENP applications: RHIC/STAR, BaBar
- Demonstrated data rates up to 57 MBytes/sec
Slide 55: HENP Grand Challenge / Clipper Testbed and Tasks
- High-speed testbed: computing and networking (NTON, ESnet) infrastructure
- Differentiated network services: traffic shaping on ESnet
- End-to-end monitoring architecture (QE, QM, CM): traffic analysis and event monitor agents to support traffic shaping and CPU scheduling
- Transparent data management architecture: OOFS/HPSS, DPSS/ADSM
- Application demonstration: Standard Analysis Framework (STAF); access data at SLAC, LBNL, or ANL (depending on network and data quality)
Slide 56: The Particle Physics Data Grid (PPDG)
DoE/NGI Next Generation Internet project: ANL, BNL, Caltech, FNAL, JLAB, LBNL, SDSC, SLAC, U. Wisc/CS
Goal: to be able to query and partially retrieve data from PB-scale data stores across wide area networks within seconds
- Drive progress in the development of the necessary middleware, networks and fundamental computer science of distributed systems
- Deliver some of the infrastructure for widely distributed data analysis at multi-Petabyte scales by 100s to 1000s of physicists
Slide 57: PPDG First-Year Deliverable: Site-to-Site Replication Service
- Network protocols tuned for high throughput
- Use of DiffServ for (1) predictable, high-priority delivery of high-bandwidth data streams and (2) reliable background transfers
- Use of integrated instrumentation to detect/diagnose/correct problems in long-lived, high-speed transfers [NetLogger + DoE/NGI developments]
- Coordinated reservation/allocation techniques for storage-to-storage performance
- First-year goal: optimized cached read access to 1-10 GBytes, drawn from a total data set of order one Petabyte
Setup: a primary site (data acquisition, CPU, disk, tape robot) replicating to a secondary site (CPU, disk, tape robot).
Slide 58: PPDG Multi-site Cached File Access System
[Diagram] A primary site (data acquisition, tape, CPU, disk, robot) serves several satellite sites (tape, CPU, disk, robot) and universities (CPU, disk, users).
Slide 59: First-Year PPDG "System" Components
Middleware components (initial choice) — see the PPDG proposal, page 15:
- Object- and file-based application services: Objectivity/DB (SLAC enhanced); GC Query Object, Event Iterator, Query Monitor; FNAL SAM system
- Resource management: start with human intervention (but begin to deploy resource discovery & management tools)
- File access service: components of OOFS (SLAC)
- Cache manager: GC Cache Manager (LBNL)
- Mass storage manager: HPSS, Enstore, OSM (site-dependent)
- Matchmaking service: Condor (U. Wisconsin)
- File replication index: MCAT (SDSC)
- Transfer cost estimation service: Globus (ANL)
- File fetching service: components of OOFS
- File mover(s): SRB (SDSC); site-specific
- End-to-end network services: Globus tools for QoS reservation
- Security and authentication: Globus (ANL)
Slide 60: PPDG Middleware Architecture for Reliable High-Speed Data Delivery
[Diagram] Object-based and file-based application services interact with the cache manager, file access service, matchmaking service, cost estimation, file fetching service, file replication index, end-to-end network services, mass storage manager, resource management and file mover, across site boundaries and security domains.
Slide 61: PPDG Developments in 2000-2001
- Co-scheduling algorithms
  - Matchmaking and prioritization
  - Dual-metric prioritization: policy and marginal utility (a sketch follows this list)
- DiffServ on networks to segregate tasks: performance classes
- Transaction management
  - Cost estimators
  - Application/TM interaction
  - Checkpoint/rollback
- Autonomous agent hierarchy
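A sketch of the dual-metric idea (policy plus marginal utility, the latter approximated here by data locality); this illustrates the concept only, and is not the Condor matchmaker or any PPDG interface:

```java
import java.util.Comparator;
import java.util.List;

// Illustrative dual-metric matchmaking: rank candidate sites for a job by a
// weighted combination of policy share and data locality. Names, weights
// and numbers are assumptions for illustration.
public class DualMetricMatchmaker {
    static class Site {
        final String name; final double policyShare; final double dataLocality;
        Site(String n, double p, double d) { name = n; policyShare = p; dataLocality = d; }
        double rank(double policyWeight) {
            return policyWeight * policyShare + (1 - policyWeight) * dataLocality;
        }
    }

    static Site bestSite(List<Site> sites, double policyWeight) {
        return sites.stream()
                    .max(Comparator.comparingDouble((Site s) -> s.rank(policyWeight)))
                    .orElseThrow();
    }

    public static void main(String[] args) {
        List<Site> sites = List.of(
            new Site("CERN",  0.9, 0.3),
            new Site("Tier1", 0.5, 0.8));
        System.out.println("Chosen: " + bestSite(sites, 0.4).name); // Tier1 here
    }
}
```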
Slide 62: Beyond Traditional Architectures: Mobile Agents (Java Aglets)
"Agents are objects with rules and legs" -- D. Taylor
Mobile agents are reactive, autonomous, goal-driven and adaptive; they:
- Execute asynchronously
- Reduce network load: local conversations
- Overcome network latency, and some outages
- Are adaptive, robust and fault tolerant
- Are naturally heterogeneous
- Are an extensible concept: agent hierarchies
(A conceptual sketch follows this list.)
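A conceptual sketch of the agent idea described above; it does not use the Java Aglets API, and real mobile agents serialize and ship their state rather than faking migration with a field:

```java
// Conceptual sketch of a goal-driven agent loop. Class, method and host
// names are illustrative; the "moveTo" step stands in for real migration.
public class QueryAgent implements Runnable {
    private final String goal;             // e.g. "count AOD events with 2 muons"
    private String location = "origin";

    QueryAgent(String goal) { this.goal = goal; }

    private boolean dataIsLocal() { return location.equals("tier1.example.org"); }

    private void moveTo(String host) {     // stand-in for agent migration
        location = host;
        System.out.println("agent migrated to " + host);
    }

    public void run() {
        if (!dataIsLocal())
            moveTo("tier1.example.org");   // converse locally with the data
        System.out.println("executing goal at " + location + ": " + goal);
        // ...would return only the (small) result, not the data, to the origin
    }

    public static void main(String[] args) {
        new Thread(new QueryAgent("count AOD events with 2 muons")).start();
    }
}
```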
Slide 63: Distributed Data Delivery and LHC Software Architecture
LHC software and/or the analysis process must account for data- and resource-related realities:
- Delays for data location, queueing and scheduling, and sometimes for transport and reassembly
- Allowance for long transaction times, performance shifts, errors, and out-of-order arrival of data
Software architectural choices:
- Traditional, single-threaded applications: allow for data arrival and reassembly; OR
- Performance-oriented (complex): I/O requests issued up front; multi-threaded; data-driven; responding to an ensemble of (changing) cost estimates (a minimal sketch follows this list)
  - Possible code movement as well as data movement
  - Loosely coupled, dynamic: e.g. an agent-based implementation
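A minimal sketch of the "performance-oriented" choice (I/O requests issued up front, data processed in whatever order it arrives); block names and timings are invented for illustration:

```java
import java.util.concurrent.*;

// Sketch of a multi-threaded, data-driven analysis loop: all requests are
// submitted up front and blocks are processed as they arrive, out of order.
public class DataDrivenAnalysis {
    public static void main(String[] args) throws Exception {
        ExecutorService io = Executors.newFixedThreadPool(4);
        CompletionService<String> arrivals = new ExecutorCompletionService<>(io);

        String[] blocks = {"AOD-block-1", "AOD-block-2", "ESD-block-7"};
        for (String b : blocks) {
            arrivals.submit(() -> {                          // request issued up front
                Thread.sleep((long) (Math.random() * 500));  // simulated transfer delay
                return b;
            });
        }
        for (int i = 0; i < blocks.length; i++) {
            String block = arrivals.take().get();            // whichever arrives first
            System.out.println("processing " + block);       // out-of-order is fine
        }
        io.shutdown();
    }
}
```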
Slide 64: GriPhyN: First Production-Scale "Grid Physics Network"
Develop a new form of integrated distributed system, while meeting the primary goals of the US LIGO and LHC programs.
- A single, unified Grid system concept; hierarchical structure
- (Sub-)implementations for LIGO, SDSS, US CMS, US ATLAS: ~20 centers, a few each in the US for LIGO, CMS, ATLAS, SDSS
Aspects complementary to centralized DoE funding:
- University-based regional Tier2 centers, partnering with the Tier1 centers
- Emphasis on training, mentoring and remote collaboration
Making the process of search and discovery accessible to students.
GriPhyN web site: http://www.phys.ufl.edu/~avery/mre/
White paper: http://www.phys.ufl.edu/~avery/mre/white_paper.html
Slide 65: APOGEE/GriPhyN Data Grid Implementation
An integrated distributed system of Tier1 and Tier2 centers.
- Flexible, relatively low-cost (PC-based) Tier2 architectures
  - Medium-scale (for the LHC era) data storage and I/O capability
  - Well adapted to local operation; modest system-engineer support
  - Meet changing local and regional needs in the active, early phases of data analysis
- Interlinked with Gbps network links: Internet2 and regional nets, circa 2001-2005
  - State-of-the-art "QoS" techniques to prioritise and shape traffic and to manage bandwidth; preview transoceanic bandwidth within the US
- A working production prototype (2001-2003) for Petabyte-scale distributed computing models
  - Focus on LIGO (+ BaBar and Run 2) handling of real data, and LHC Mock Data Challenges with simulated data
  - Meet the needs, and learn from system performance under stress
Slide 66: VRVS: From Videoconferencing to Collaborative Environments
> 1400 registered hosts, 22 reflectors, 34 countries; running in the US, Europe and Asia:
- Switzerland: CERN (2)
- Italy: CNAF Bologna
- UK: Rutherford Lab
- France: IN2P3 Lyon, Marseilles
- Germany: Heidelberg Univ.
- Finland: FUNET
- Spain: IFCA-Univ. Cantabria
- Russia: Moscow State Univ., Tver Univ.
- US: Caltech, LBNL, SLAC, FNAL, ANL, BNL, Jefferson Lab; DoE HQ Germantown
- Asia: Academia Sinica, Taiwan
- South America: CeCalcula, Venezuela
Slide 68: Role of Simulation for Distributed Systems
Simulations are widely recognized and used as essential tools for the design, performance evaluation and optimisation of complex distributed systems:
- From battlefields to agriculture; from the factory floor to telecommunications systems
- Discrete event simulations with an appropriate, high level of abstraction are powerful tools
- "Time" intervals, interrupts and performance/load characteristics are the essentials
- Not yet an integral part of the HENP culture, but there is some experience in trigger, DAQ and tightly coupled computing systems: the CERN CS2 models
Simulation is a vital part of the study of site architectures, network behavior, and data access/processing/delivery strategies, for HENP Grid design and optimization.
Slide 69: Monitoring Architecture: Use of NetLogger as in Clipper
End-to-end monitoring of grid assets [*] is needed to resolve network throughput problems and to dynamically schedule resources.
- Add precision-timed event monitor agents to: ATM switches, DPSS servers, testbed computational resources
- Produce trend-analysis modules for the monitor agents
- Make the results available to applications
[*] See the talk by B. Tierney
Slide 70: Summary — The HENP/LHC Data Analysis Problem
Worldwide-distributed, Petabyte-scale compacted binary data and computing resources.
- Development of a robust networked data access and analysis system is mission-critical
- An aggressive R&D program is required to develop systems for reliable data access, processing and analysis across an ensemble of networks
- An effective inter-field partnership is now developing through many R&D projects
  - HENP analysis is now one of the driving forces for the development of "Data Grids"
- Solutions to this problem could be widely applicable in other scientific fields and industry by LHC startup
Slide 71: LHC Computing: Upcoming Issues
- Cost of computing at CERN for the LHC program
  - May exceed 100 MCHF at CERN; correspondingly more in total
  - Some ATLAS/CMS basic numbers (CPU, 100 kB reconstructed event) date from 1996 and require further study
  - We cannot "scale up" from previous generations (new methods)
- CERN/outside sharing: MONARC and CMS/SCB use the 1/3 : 2/3 "rule"
- Computing architecture and cost evaluation
  - Integration and "Total Cost of Ownership"
  - Possible role of central I/O servers
- Manpower estimates
  - CERN versus scaled Regional Centre estimates
  - Scope of services and support provided
- Limits of CERN support and service, and the need for Regional Centres
- Understanding that LHC computing is "different": a different scale, and worldwide distributed computing for the first time
Continuing system R&D is required.