
1 MONARC Project Status Report
http://www.cern.ch/MONARC
Harvey Newman, California Institute of Technology
http://l3www.cern.ch/monarc/monarc_lehman151100.ppt
DOE/NSF Joint Review of Software and Computing, BNL, November 15, 2000

2 MONARC: Common Project
Models Of Networked Analysis At Regional Centers
Participants: Caltech, CERN, Columbia, FNAL, Heidelberg, Helsinki, INFN, IN2P3, KEK, Marseilles, MPI Munich, Orsay, Oxford, Tufts
Project goals:
- Develop "Baseline Models"
- Specify the main parameters characterizing the Models' performance: throughputs, latencies
- Verify resource requirement baselines (computing, data handling, networks)
Technical goals:
- Define the Analysis Process
- Define Regional Centre architectures and services
- Provide guidelines for the final Models
- Provide a simulation toolset for further Model studies
Model circa 2005 or 2006 (diagram): CERN 700k SI95, 1000+ TB disk, tape robot; FNAL/BNL Tier1 167k SI95, 650 TB disk, tape robot; Tier2 centre ~35k SI95, ~100 TB disk, robot; links of 622 Mbit/s, 2.5 Gbit/s and N x 2.5 Gbit/s connecting CERN, the Tier1 and Tier2 centres and the universities (Univ 1 ... Univ M).

3 MONARC History
- Spring 1998: First distributed centre models (Bunn; Von Praun)
- 6/1998: Presentation to LCB; Project Assignment Plan
- Summer 1998: MONARC project startup (ATLAS, CMS, LHCb)
- 9-10/1998: Project Execution Plan; approved by LCB
- 1/1999: First analysis process to be modeled
- 2/1999: First Java-based simulation models (I. Legrand)
- Spring 1999: Java2-based simulations; GUI
- 4/99, 8/99, 12/99: Regional Centre representative meetings
- 6/1999: Mid-project progress report, including MONARC baseline models
- 9/1999: Validation of the MONARC simulation on testbeds; reports at the LCB Workshop (HN, I. Legrand)
- 1/2000: Phase 3 Letter of Intent (4 LHC experiments)
- 2/2000: Six papers and presentations at CHEP2000: D385, F148, D127, D235, C113, C169
- 3/2000: Phase 2 Report
- Spring 2000: New tools: SNMP-based monitoring; S.O.N.N.
- 5/2000: Phase 3 simulation of ORCA4 production; begin studies with tapes
- Spring 2000: MONARC model recognized by the Hoffmann WWC panel; basis of Data Grid efforts in the US and Europe

4 MONARC Working Groups / Chairs
- Analysis Process Design WG, P. Capiluppi (Bologna, CMS): Studied the analysis workload, job mix and profiles, and the time to complete the reconstruction and analysis jobs. Worked with the Simulation WG to verify that the resources specified in the models could handle the workload.
- Architectures WG, Joel Butler (FNAL, CMS): Studied the site and network architectures, the operational modes and services provided by Regional Centres, the data volumes stored and analyzed, and candidate architectures for CERN, Tier1 (and Tier2) centres.
- Simulation WG, K. Sliwa (Tufts, ATLAS): Defined the methodology, then (I. Legrand et al.) designed, built and further developed the simulation system as a toolset for users. Validated the simulation with the Testbeds group.
- Testbeds WG, L. Luminari (Rome, ATLAS): Set up small and larger prototype systems at CERN, at several INFN and US sites and in Japan, and used them to characterize the performance of the main elements that could limit throughput in the simulated systems.
- Steering Group: Laura Perini (Milan, ATLAS) and Harvey Newman (Caltech, CMS); includes the Regional Centres Committee.

5 Regional Center Architecture, Example by I. Gaines
(diagram) Inputs: tapes and network from CERN; network from Tier2 and simulation centers. Components: tape mass storage and disk servers; database servers; info, code and web servers; telepresence servers; physics software development, R&D systems and testbeds; training, consulting and help desk. Outputs to physicists' desktops, Tier2 centers, local institutes, CERN and tapes.
Activities:
- Production reconstruction (Raw/Sim to ESD): scheduled, predictable; experiment / physics groups
- Production analysis (ESD to AOD, AOD to DPD): scheduled; physics groups
- Individual analysis (AOD to DPD and plots): chaotic; physicists

6 MONARC Analysis Model
Hierarchy of processes (experiment, analysis groups, individuals):
- Reconstruction: experiment-wide activity (10^9 events); re-processing 3 times per year, driven by new detector calibrations or understanding; 3000 SI95·s/event, 1 job per year plus re-processing 3 jobs per year
- Selection: ~20 groups' activity (10^9 to 10^7 events); iterative selection, once per month; trigger-based and physics-based refinements; 25 SI95·s/event, ~20 jobs per month
- Analysis: ~25 individuals per group (10^6 to 10^8 events); different physics cuts and MC comparison, ~once per day; algorithms applied to data to get results; 10 SI95·s/event, ~500 jobs per day
- Monte Carlo: 5000 SI95·s/event
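As a rough cross-check of these numbers (not part of the original slides), the reconstruction row alone implies a sustained CPU capacity of order 10^5 SI95. The short sketch below spells out the arithmetic, assuming a 3.15e7-second year and ignoring efficiency losses and all other activities.

// Back-of-envelope check of the reconstruction numbers on slide 6.
public class ReconstructionCpuEstimate {
    public static void main(String[] args) {
        double events = 1e9;              // experiment-wide sample (slide 6)
        double si95SecPerEvent = 3000.0;  // reconstruction cost per event (slide 6)
        double passesPerYear = 3.0;       // re-processing ~3 times per year (slide 6)
        double secondsPerYear = 3.15e7;   // assumed calendar year

        // Sustained CPU power needed if the passes are spread over the year,
        // ignoring efficiency losses, I/O waits and other activities.
        double si95 = events * si95SecPerEvent * passesPerYear / secondsPerYear;
        System.out.printf("Reconstruction alone needs ~%.0fk SI95 sustained%n", si95 / 1000.0);
        // About 95k SI95 per pass and close to 300k SI95 for three passes: the same
        // order of magnitude as the CERN (700k SI95) and Tier1 (167k SI95) capacities on slide 2.
    }
}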

7 Simulation Validation: LAN Measurements (Y. Morita et al.)
Hardware:
- Machine A: Sun Enterprise 450 (400 MHz, 4x CPU)
- Machine B: Sun Ultra 5 (270 MHz): the lock server
- Machine C: Sun Enterprise 450 (300 MHz, 2x CPU)
Tests on raw-data jobs:
(1) Machine A local (2 CPU)
(2) Machine C local (4 CPU)
(3) Machine A (client) and Machine C (server), with 1, 2, 4, ..., 32 client processes
(diagram) Each job alternates CPU and I/O phases per event. Measured figures: CPU 17.4 SI95 with I/O 207 MB/s on a 54 MB file; CPU 14.0 SI95 with I/O 31 MB/s on a 54 MB file.

8 MONARC Simulation System: Multitasking Processing Model
"Interrupt"-driven scheme: whenever a new task arrives or a running task finishes, an interrupt is generated and all "times to completion" are recomputed.
This provides:
- An easy way to apply different load-balancing schemes
- An efficient mechanism to simulate multitask processing
Implementation:
- Active tasks (CPU, I/O, network) are assigned to Java threads
- Concurrently running tasks share resources (CPU, memory, I/O)
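A minimal sketch of the interrupt-driven idea (illustrative only; the class and method names are invented and this is not the MONARC toolset code): whenever a task joins a shared resource, the remaining work of every active task is frozen and the times to completion are recomputed under the new share.

import java.util.ArrayList;
import java.util.List;

// Tasks sharing one CPU resource; remaining work is updated whenever the set of
// active tasks changes, so all "times to completion" can be recomputed.
public class SharedCpu {
    static class Task {
        final String name;
        double remainingWork;          // e.g. SI95-seconds still to be processed
        Task(String name, double work) { this.name = name; this.remainingWork = work; }
    }

    private final double capacity;     // total processing power of the resource (SI95)
    private final List<Task> active = new ArrayList<>();
    private double now = 0.0;          // current simulation time (seconds)

    public SharedCpu(double capacity) { this.capacity = capacity; }

    // The "interrupt": advance the clock to time t, charging each active task
    // its equal share of the resource for the elapsed interval.
    private void advanceTo(double t) {
        double share = active.isEmpty() ? 0.0 : capacity / active.size();
        for (Task task : active) task.remainingWork -= share * (t - now);
        now = t;
    }

    // A new task arriving at time t triggers an interrupt before it is added.
    public void submit(double t, Task task) {
        advanceTo(t);
        active.add(task);
    }

    // Time at which the earliest active task would finish under the current sharing.
    public double nextCompletionTime() {
        double share = capacity / active.size();
        double best = Double.POSITIVE_INFINITY;
        for (Task task : active) best = Math.min(best, now + task.remainingWork / share);
        return best;
    }

    public static void main(String[] args) {
        SharedCpu cpu = new SharedCpu(100.0);           // 100 SI95 of processing power
        cpu.submit(0.0, new Task("jobA", 3000.0));      // 3000 SI95-seconds of work
        cpu.submit(10.0, new Task("jobB", 3000.0));     // arrival triggers a recompute
        System.out.println("next completion at t = " + cpu.nextCompletionTime() + " s");
    }
}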

9 Example: Physics Analysis at Regional Centres
- Similar data processing jobs are performed in each of several RCs
- There is a profile of jobs, each submitted to a job scheduler
- Each centre has the "TAG" and "AOD" databases replicated
- The main centre provides the "ESD" and "RAW" data
- Each job processes AOD data, plus a fraction of the ESD and RAW data
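To make the access pattern concrete, a hypothetical per-job data-volume estimate is sketched below; all event sizes and fractions are invented placeholders, not MONARC parameters.

// Hypothetical per-job data volume for "all AOD plus a fraction of ESD and RAW".
public class AnalysisJobVolume {
    public static void main(String[] args) {
        long events    = 1_000_000;   // events touched by one analysis job (assumed)
        double aodKb   = 10.0;        // assumed AOD size per event (kB)
        double esdKb   = 100.0;       // assumed ESD size per event (kB)
        double rawKb   = 1000.0;      // assumed RAW size per event (kB)
        double esdFrac = 0.05;        // assumed fraction of events needing ESD
        double rawFrac = 0.01;        // assumed fraction of events needing RAW

        double kb = events * (aodKb + esdFrac * esdKb + rawFrac * rawKb);
        System.out.printf("Data read per job: ~%.1f GB%n", kb / 1e6);
    }
}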

10 Example: Physics Analysis

11 Validation Measurements: AMS Data Across a LAN
(diagram) Raw data DB served over the LAN to a 4-CPU client; simulation compared with measurements.
Distribution of 32 jobs' processing time: simulation mean 109.5, measurement mean 114.3.

12 ORCA Production on the CERN/IT-Loaned Event Filter Farm Test Facility
(diagram) A farm of 140 processing nodes served by pile-up and signal Objectivity databases: 24 pile-up servers in total, 6 servers for signal, 2 output servers and 2 lock servers (SUN), with HPSS mass storage and 2 Objectivity federations.
The strategy is to use many commodity PCs as database servers.

13 Network Traffic and Job Efficiency
(figure) Measured vs. simulated network traffic and job efficiency for the Jet and Muon productions; mean measured value ~48 MB/s.

14 Total Time for Jet and Muon Production Jobs

15 Job Scheduling in Distributed Systems: Self-Organizing Neural Network (SONN)
- Finding efficient job scheduling policies in large, dynamically evolving distributed systems is one of the challenging tasks in HEP computing. It requires analyzing a large number of parameters describing the jobs and the time-dependent state of the system. The problem is harder when not all of these parameters are correctly identified, and when knowledge about the system state is incomplete and/or available only with a certain delay.
- This study aims to develop tools able to generate effective job scheduling policies in distributed architectures, based on a "Self-Organizing Neural Network" (SONN) system able to dynamically learn and cluster information in a high-dimensional parameter space: an adaptive middleware layer, aware of the currently available resources, learning from "past experience" and developing decision rules heuristically.
- We applied the SONN approach to the problem of distributing jobs among regional centres. The evaluation of this job scheduling procedure has been done with the MONARC simulation tool.

16 Intuitive Scheduling Model
(diagram) An incoming job, described by parameters {J}, is combined with the local RC state {S}, the state description {R} of the external RCs, and accumulated knowledge and experience (plus constants) to produce a scheduling decision {D}. After execution the job is evaluated, and the performance quantification {X} feeds back into the knowledge base.
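An illustrative sketch of this loop (all names and the least-loaded heuristic are invented for illustration; the actual SONN learns the mapping rather than hard-coding it): the decision {D} is computed from {J}, {S}, {R} and accumulated experience, and the performance {X} is fed back afterwards.

import java.util.HashMap;
import java.util.Map;

public class IntuitiveScheduler {
    private final Map<String, Double> experience = new HashMap<>();  // site -> smoothed observed performance

    // {J}, {S}, {R} -> {D}: here simply the site with the lowest reported load,
    // biased by past experience. A SONN would learn this mapping instead.
    public String decide(Map<String, Double> job,
                         Map<String, Double> localState,
                         Map<String, Double> remoteLoads) {
        String best = "local";
        double bestScore = localState.getOrDefault("load", 0.0) - experience.getOrDefault("local", 0.0);
        for (Map.Entry<String, Double> rc : remoteLoads.entrySet()) {
            double score = rc.getValue() - experience.getOrDefault(rc.getKey(), 0.0);
            if (score < bestScore) { best = rc.getKey(); bestScore = score; }
        }
        return best;
    }

    // {X}: performance quantification of the finished job, folded back into the knowledge base.
    public void feedback(String site, double performance) {
        experience.merge(site, performance, (old, x) -> 0.9 * old + 0.1 * x);
    }

    public static void main(String[] args) {
        IntuitiveScheduler s = new IntuitiveScheduler();
        Map<String, Double> job = Map.of("cpuNeeded", 3000.0);
        Map<String, Double> local = Map.of("load", 0.8);
        Map<String, Double> remotes = Map.of("cern", 0.3, "kek", 0.7);
        String d = s.decide(job, local, remotes);
        System.out.println("Decision {D}: run at " + d);
        s.feedback(d, 1.0);   // {X} reported after the job finishes
    }
}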

17 A Simple Toy Example
We assume that the time to execute a job in the local farm, having a certain load λ, is
    t = t0 · f(λ)
where t0 is the theoretical time to perform the job and f(λ) describes the effect of the farm load on the job execution time. If the job is executed on a remote site, an extra factor α is introduced in the response time:
    t_remote = α · t0 · f(λ)
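A small numeric sketch of the toy model (the linear form of f(λ) and the values of t0, the loads and α below are illustrative assumptions, not numbers from the slides):

public class ToyResponseTime {
    // Effect of the farm load on execution time (assumed linear form).
    static double f(double lambda) { return 1.0 + lambda; }

    static double localTime(double t0, double lambda) { return t0 * f(lambda); }

    static double remoteTime(double t0, double lambda, double alpha) {
        return alpha * t0 * f(lambda);   // extra factor for running at a remote site
    }

    public static void main(String[] args) {
        double t0 = 100.0;               // theoretical job time (e.g. seconds)
        double localLoad = 0.7, remoteLoad = 0.3, alpha = 1.3;
        System.out.println("local : " + localTime(t0, localLoad));
        System.out.println("remote: " + remoteTime(t0, remoteLoad, alpha));
        // With these numbers the lightly loaded remote farm wins despite the
        // penalty: exactly the trade-off the scheduler has to learn.
    }
}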

18 Evaluating the SONN Scheduling Scheme with the MONARC Simulation
(diagram) The MONARC simulation generates jobs from the regional centres' activities and passes them to the "Self-Organizing Neural Net", which returns the scheduling DECISION, following the intuitive scheduling scheme of slide 16.
Warming up: the learning process of the self-organizing network dynamically adapts itself to changes in the system configuration. It may require a "classical" scheduling algorithm as a starting point, with the aim of dynamically improving on it over time.
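A standalone sketch of the warming-up idea (the round-robin fallback, the sample threshold and all names are illustrative placeholders, not MONARC code): the scheduler delegates to a classical policy until it has accumulated enough feedback, after which the learned policy takes over.

import java.util.List;

public class WarmedUpScheduler {
    private static final int WARMUP_SAMPLES = 500;  // assumed switch-over point

    private final List<String> sites;
    private int feedbackSeen = 0;
    private int rr = 0;

    public WarmedUpScheduler(List<String> sites) { this.sites = sites; }

    public String decide() {
        if (feedbackSeen < WARMUP_SAMPLES) {
            return sites.get(rr++ % sites.size());   // classical starting point (round-robin)
        }
        return learnedChoice();                      // learned policy after warm-up
    }

    public void feedback(double performance) {
        feedbackSeen++;                              // here a SONN would also update its clusters/weights
    }

    private String learnedChoice() {
        return sites.get(0);                         // placeholder for the SONN's output
    }

    public static void main(String[] args) {
        WarmedUpScheduler s = new WarmedUpScheduler(List.of("cern", "caltech", "kek"));
        System.out.println(s.decide());              // still round-robin while warming up
    }
}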

19 2 RCs Learning to Export Jobs
(figure, day 9 of the simulation) CERN: 30 CPUs; Caltech: 25 CPUs, no activity of its own; KEK: 20 CPUs; links of 0.8 MB/s with 200 ms RTT.

20 2 RCs Learning to Export Jobs
(figure) Evolution of the job distribution among caltech, kek and cern at Day 0, 1, 2, 6 and 9.

21 MONARC Simulation: I. Legrand Workplan (2000)
- May 2000: CMS HLT simulation
  http://www.cern.ch/MONARC/sim_tool/Publish/CMS/publish/
  http://home.cern.ch/clegrand/MONARC/CMS_HLT/sim_cms_hlt.pdf
- June 2000: Tape usage study
  http://www.cern.ch/MONARC/sim_tool/Publish/TAPE/publish/
- Aug 2000: Update of the simulation tool for large-scale simulations
  http://home.cern.ch/clegrand/MONARC/WSC/wsc_final.pdf (to be presented at the IEEE Winter Simulation Conference, WSC2000)
  http://home.cern.ch/clegrand/MONARC/ACAT/sim.ppt
- Oct 2000: A study in using SONN for job scheduling
  http://www.cern.ch/MONARC/sim_tool/Publish/SONN/publish/
  http://home.cern.ch/clegrand/MONARC/ACAT/sonn.ppt
- Nov 2000: Update of the CMS computing needs; based on the new requirements data, update the baseline models for CMS computing
- Dec 2000: Simulation of the current CMS Higher Level Trigger production

22 MONARC Simulation: Workplan (2001)
- Jan 2001: Update of the MONARC simulation system. New release, including dynamic scheduling and replication modules (policies); improved documentation
- Feb 2001: Role of disk and tapes in Tier1 and Tier2 centers. More elaborate studies to describe the Tier2-Tier1 interaction and to evaluate data storage needs
- May 2001: Complex Tier0 - Tier1 - Tier2 simulation; study the role of Tier2 centers. The aim is to perform a complete CMS data processing scenario including all major tasks distributed among regional centers
- Jul 2001: Real SONN module for job scheduling, based on mobile agents. Create a mobile-agents framework able to provide the basic mechanism for scheduling between regional centers
- Sep 2001: Add monitoring agents for network and system states, based on SNMP. Collect system-dependent parameters using SNMP and integrate them into the mobile agents used for scheduling
- Dec 2001: Study of the correlation between data replication and job scheduling. Combine the scheduling policies with data replication to optimize different cost functions; integrate this into the mobile-agents framework

23 MONARC Status
- MONARC is on the way to specifying baseline Models representing cost-effective solutions to LHC computing.
- MONARC's Regional Centre hierarchy model has been accepted by all four LHC experiments, and is the basis of HEP Data Grid work.
- A powerful simulation system has been developed, and is being used for further computing-model and strategy development as well as for Grid-component studies.
- There is strong synergy with other advanced R&D projects: PPDG, GriPhyN, the EU HEP Data Grid, ALDAP and others.
- Example computing models have been provided, and are being updated; this is important input for the Hoffmann LHC Computing Review.
- The MONARC simulation system is now being applied to key Data Grid issues, and to Grid-tool design and development.

24 MONARC Future: Some "Large" Grid Issues to Be Studied
- Query estimation and transaction design for replica management
- Queueing and co-scheduling strategies
- Strategy for the use of tapes
- Strategy for resource sharing among sites and activities
- Packaging of object collections into blocks for transport across networks; integration with databases
- Effect on networks of large windows, QoS, etc.
- Behavior of the Grid services to be developed

25 From UserFederation To Private Copy
(diagram from the ORCA 4 tutorial, part II, 14 October 2000, showing the user federation UF.boot, the private copy MyFED.boot, the user collection and the AMS data server)

