MONARC Meeting, CERN, July 26, 1999
Agenda
- Introduction (HN, LP) 10'
- Discussion of results from the RD45 Workshop; items most relevant to MONARC (Eva Arderiu, Youhei Morita) 25'
- Validation Procedure (K. Sliwa) 15'
- Recent Objectivity tests and implications for the validation milestone (Youhei Morita 15', M. Sgaravatto 15')
- Strawman Computing Facility for One Large Experiment at CERN (Les Robertson) 30'
- Tier1 Regional Centre Facility (Irwin Gaines) 20'
- Items from the WG Chairs 15'
- Preparations and Policy for the Marseilles Meeting, Worldwide Computing Session (HN, LP) 20'
- AOB; adjourn by 19:30 or earlier
- BREAK 15'
- Steering Committee; adjourns by 21:00 or earlier
MONARC Phase 1 and 2: Possible Deliverables
- Summer 1999: Benchmark test validating the simulation
- Fall 1999: A Baseline Model representing a possible (somewhat simplified) solution for LHC Computing
  - Baseline numbers for a set of system and analysis-process parameters
  - Reasonable "ranges" of parameters
    - "Derivatives": how the effectiveness depends on some of the more sensitive parameters (a sketch of one such derivative estimate follows this slide)
  - Agreement of the experiments on the reasonableness of the Baseline Model
  - Progress towards the Baseline shown at the Marseilles LCB Meeting, at the end of September
- Chapter on Computing Models in the CMS and ATLAS Computing Progress Reports
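To illustrate what a parameter "derivative" could look like in practice, here is a minimal sketch of estimating the sensitivity of job turnaround to one model parameter by finite differences over simulation runs. Everything here is an assumption for illustration: run_simulation() is a hypothetical stand-in (MONARC's actual simulations come from the project's own simulation tool), and the bandwidth scaling law inside it is invented.

```python
# Minimal sketch, assuming a hypothetical run_simulation() stand-in for a
# full MONARC simulation run; the scaling law below is invented purely
# so the example is self-contained and runnable.

def run_simulation(wan_bandwidth_mbps):
    """Stand-in: returns mean job turnaround (hours) for one model setting."""
    base_turnaround = 4.0  # assumed hours at the baseline setting
    return base_turnaround * (40.0 / wan_bandwidth_mbps) ** 0.5

def sensitivity(param_value, delta_fraction=0.10):
    """Relative change in turnaround per relative change in the parameter,
    estimated by a central finite difference over two extra runs."""
    lo = run_simulation(param_value * (1 - delta_fraction))
    hi = run_simulation(param_value * (1 + delta_fraction))
    mid = run_simulation(param_value)
    return ((hi - lo) / mid) / (2 * delta_fraction)

if __name__ == "__main__":
    # Example: how sensitive is turnaround to WAN bandwidth near 40 Mbps?
    print(f"relative d(turnaround)/d(bandwidth): {sensitivity(40.0):+.2f}")
```

Reporting such derivatives alongside the baseline numbers would show directly which parameters the Baseline Model's effectiveness is most sensitive to.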
MONARC Phase 2 Milestones
- July 1999: Complete Phase 1; begin second cycle of simulations with more refined models
MONARC Issues to Discuss at this Meeting
- Basic parameters: starting with the ESD size
- Architecture: LAN/SAN only, or with an SMP "I/O server"
- Simulation Modelers' Team
  - CERN/US/Japan/Italy/Russia coordination
  - Set-up and maintenance of the simulation software base
  - Division of labor for implementing "objects" and performance/load characteristics
  - Release tool and support for simulation software releases
  - Coordination of runs and reporting of results
    - Repository of models and results on the Web
Projects Aimed at LHC Data Analysis, and Others

APPROVED PROJECTS
- PPDG: Particle Physics Data Grid [DoE/NGI]: HENP labs, ANL (CS), Caltech, UWisc (CS), SDSC
- ALPhAD: Access to Large Physics and Astronomy Databases [NSF/KDI]: Johns Hopkins, Caltech and FNAL (SDSS)

PROPOSAL IN PROGRESS
- HENPVDS: HENP Virtual Data System [DoE/SSI ?]: US ATLAS / US CMS / LIGO

PROPOSAL
- GRAND: Grid Analysis of Networked Data, or
- I(O)DA(LL): Internetworked (Object) Data Analysis (for LHC and LIGO), or ...

Additional Projects or Services
- CLIPPER, NILE, I2-DSI, Condor, GLOBUS; Data Grids
Architectural Sketch: One Major LHC Experiment, at CERN
See
MONARC Analysis Process WG: A "Short" List of Upcoming Issues
- Review event sizes: how much data is stored, and how much is accessed, for different analyses? (a back-of-envelope tier-volume estimate follows this list)
- Review CPU times: tracking at full luminosity
- How much reprocessing, and where (sharing scheme)?
- Priorities, schedules and policies
  - Production vs. analysis-group vs. individual activities
  - Allowed percentage of access to higher data tiers (TAG / Physics Objects / Reconstructed / RAW)
- Including MC production; simulated-data storage and access
- Understanding how to manage persistent data: e.g. storage / migration / transport / re-compute strategies
- Deriving a methodology for Model testing and optimisation
  - Metrics for evaluating the global efficiency of a Model: cost vs. throughput; turnaround; reliability of data access
- Determining the role of institutes' workgroup servers (Tier3) and desktops (Tier4) in the Regional Centre hierarchy
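To make the event-size review concrete, here is a minimal back-of-envelope sketch of yearly storage volume per data tier. The per-event sizes and the 10^9 events/year figure are assumptions of the kind used in planning discussions of this era, not numbers taken from this slide; the review above is precisely about pinning such numbers down.

```python
# Hypothetical per-event sizes (bytes) for each data tier; illustrative
# only, to show how tier volumes scale from an assumed event count.
TIER_EVENT_SIZE = {
    "RAW": 1_000_000,  # full detector readout
    "ESD": 100_000,    # reconstructed event summary data
    "AOD": 10_000,     # physics objects
    "TAG": 1_000,      # event-level summary used for selection
}

EVENTS_PER_YEAR = 1_000_000_000  # assumed yearly recorded-event count

for tier, size in TIER_EVENT_SIZE.items():
    volume_tb = size * EVENTS_PER_YEAR / 1e12
    print(f"{tier}: {volume_tb:8.0f} TB/year")
```

Under these assumptions RAW alone is about 1 PB/year, which is why the allowed percentage of access to the higher tiers matters so much for the Regional Centre hierarchy.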
MONARC Testbeds WG: Some Parameters to Be Measured, Installed in the MONARC Simulation Models, and Used in the First Round of Model Validation

Isolation of "key" parameters via studies of:
- Objectivity AMS response-time function, and its dependence on
  - Object clustering, page size, data class-hierarchy and access pattern
  - Mirroring and caching with the DRO option
- Scalability of the system under "stress"
  - Performance as a function of the number of jobs, relative to the single-job performance (see the sketch after this list)
- Performance and bottlenecks for a variety of data access patterns
  - Frequency of following TAG → AOD, AOD → ESD, and ESD → RAW associations
  - Data volume accessed remotely
    - Fraction on tape, and on disk
    - As a function of network bandwidth; use of QoS
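The shape of the "scalability under stress" measurement might look like the following sketch: run N concurrent client jobs and report the per-job rate relative to the single-job case. run_client_job() is a hypothetical stand-in for a real Objectivity test client; the workload inside it is a placeholder, so the numbers it prints are not meaningful, only the measurement pattern is.

```python
# Minimal sketch, assuming a hypothetical run_client_job() in place of a
# real Objectivity/AMS test client.
import time
from concurrent.futures import ThreadPoolExecutor

def run_client_job(n_events=100_000):
    """Stand-in workload: returns events 'processed' per second."""
    start = time.perf_counter()
    _ = sum(range(n_events))  # placeholder for database reads
    return n_events / (time.perf_counter() - start)

def mean_per_job_rate(n_jobs):
    """Run n_jobs concurrently and return the mean per-job rate."""
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        rates = list(pool.map(lambda _: run_client_job(), range(n_jobs)))
    return sum(rates) / len(rates)

single = mean_per_job_rate(1)
for n in (1, 2, 4, 8):
    print(f"{n:2d} jobs: per-job rate = {mean_per_job_rate(n) / single:5.2f} x single-job")
```

A curve of this ratio versus job count, measured on the testbeds, is exactly the kind of parameter the simulation models need for first-round validation.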
MONARC Strategy and Tools for Phase 2

Strategy: vary system capacity and network performance parameters over a wide range
- Avoid complex, multi-step decision processes that could require protracted study
  - Keep these for a possible Phase 3
- Majority of the workload satisfied in an acceptable time
  - Up to minutes for interactive queries, up to hours for short jobs, up to a few days for the whole workload
- Determine requirements "baselines" and/or flaws in certain analysis processes in this way

Tools and operations to be designed in Phase 2 (a toy affinity evaluator is sketched below)
- Query estimators
- Affinity evaluators, to determine the proximity of multiple requests in space or time
- Strategic algorithms for caching, reclustering, mirroring, or pre-emptively moving data (or jobs)
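As a toy illustration of the affinity-evaluator idea, the sketch below scores two pending requests by the overlap of the datasets they touch ("proximity in space") and by closeness of their submission times ("proximity in time"), so that high-affinity requests could be co-scheduled against a shared cache or replica. The scoring formula and all names are invented for this example; the slide leaves the actual design open.

```python
# Toy affinity evaluator: entirely hypothetical scoring, for illustration.
from dataclasses import dataclass

@dataclass
class Request:
    job_id: str
    datasets: set        # names of datasets (e.g. event collections) touched
    submit_time: float   # seconds since some epoch

def affinity(a: Request, b: Request, time_scale: float = 3600.0) -> float:
    """1.0 = same data at the same moment; 0.0 = nothing in common.
    Jaccard overlap of datasets, damped by the submission-time gap."""
    overlap = len(a.datasets & b.datasets) / len(a.datasets | b.datasets)
    closeness = 1.0 / (1.0 + abs(a.submit_time - b.submit_time) / time_scale)
    return overlap * closeness

r1 = Request("jobA", {"esd_run42", "aod_run42"}, submit_time=0.0)
r2 = Request("jobB", {"esd_run42"}, submit_time=600.0)
print(f"affinity(jobA, jobB) = {affinity(r1, r2):.2f}")
```

A scheduler could then batch requests whose pairwise affinity exceeds a threshold, which is one simple way a caching or reclustering strategy might consume such scores.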
MONARC Possible Phase 3: Timeliness and Useful Impact
- Facilitate the efficient planning and design of mutually compatible site and network architectures, and services
  - Among the experiments, the CERN Centre and the Regional Centres
- Provide modelling consultancy and service to the experiments and Centres
- Provide a core of advanced R&D activities, aimed at LHC computing-system optimisation and production prototyping
- Take advantage of work on distributed data-intensive computing for HENP this year in other "next generation" projects [*]
  - PPDG, ALPhAD, HENPVDS, and our project and joint proposal to NSF by ATLAS/CMS/LIGO in the US

[*] See H. Newman,
MONARC Phase 3 Possible Technical Goal: System Optimisation

Maximise throughput and/or reduce long turnaround
- Include long and potentially complex decision processes in the studies and simulations
  - Potential for substantial gains in the work performed or resources saved

Phase 3 system design elements (a fall-back sketch follows this slide)
- RESILIENCE, resulting from flexible management of each data transaction, especially over WANs
- FAULT TOLERANCE, resulting from robust fall-back strategies to recover from abnormal conditions
- SYSTEM STATE & PERFORMANCE TRACKING, to match and co-schedule requests and resources, and to detect or predict faults

Synergy with PPDG and other advanced R&D projects. Potential importance for scientific research and industry: simulation of distributed systems for data-intensive computing.
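The fall-back idea behind the RESILIENCE and FAULT TOLERANCE elements can be sketched as trying data sources in order of expected cost and retrying or falling back on failure. Everything here is hypothetical: fetch_from(), the source list, and the failure model are stand-ins, and a real system would also feed outcomes back into the state and performance tracking layer.

```python
# Minimal sketch, assuming hypothetical source names and a stand-in
# fetch_from() that fails at random to simulate abnormal conditions.
import random

SOURCES = ["local_disk", "regional_centre_replica", "cern_tape_store"]

def fetch_from(source: str, dataset: str) -> str:
    """Stand-in transfer; raises IOError to simulate a failed access."""
    if random.random() < 0.3:
        raise IOError(f"{source} unavailable")
    return f"{dataset} via {source}"

def resilient_fetch(dataset: str, sources=SOURCES, retries_per_source=2):
    """Try each source in order of expected cost, with per-source retries."""
    for source in sources:
        for attempt in range(retries_per_source):
            try:
                return fetch_from(source, dataset)
            except IOError as err:
                print(f"  fall back: {err} (attempt {attempt + 1})")
    raise RuntimeError(f"all sources failed for {dataset}")

print(resilient_fetch("esd_run42"))
```

Simulating how often such fall-backs fire, and what they cost over WANs, is the kind of long decision process Phase 3 would add to the studies.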