September, 1999. MONARC - Distributed System Simulation. I.C. Legrand.
MONARC: Models Of Networked Analysis at Regional Centers
Distributed System Simulation
Iosif C. Legrand (CERN/CIT)



Contents
- Design and development of a simulation program for large-scale distributed computing systems.
- Performance measurements based on an Object Oriented data model for specific HEP applications.
- Measurements vs. simulation.
- An example of using the simulation program: distributed physics analysis.

The GOALS of the Simulation Program
- To perform realistic simulation and modelling of the distributed computing systems, customised for specific HEP applications.
- To reliably model the behaviour of the computing facilities and networks, using the specific application software (OODB model) and the usage patterns.
- To offer a dynamic and flexible simulation environment.
- To provide a design framework to evaluate the performance of a range of possible computer systems, as measured by their ability to provide the physicists with the requested data in the required time, and to optimise the cost.
- To narrow down a region in this parameter space in which viable models can be chosen by any of the LHC-era experiments.

The GOALS of the TestBed Working Group
- To understand the performance and the limitations of the major software components intended to be used in LHC computing.
- To provide reliable measurements, to be used in the simulation design, of the time response functions of the major components in the system and of how they interact.
- To test and validate simulation schemes.

Design Considerations of the Simulation Program
- The simulation and modelling task for the MONARC project requires describing complex programs running in a distributed architecture, and therefore selecting tools which allow an easy mapping of the logical model into the simulation environment.
- A process oriented approach to discrete event simulation is well suited to describing concurrently running programs.
- "Active objects" (each having an execution thread, a program counter, a stack...) provide an easy way to map the structure of a set of distributed running programs into the simulation environment.

Design Considerations of the Simulation Program (2)
- This simulation project is based on Java(TM) technology, which provides adequate tools for developing a flexible, distributed, process oriented simulation. Java has built-in multi-thread support for concurrent processing, which can be used for simulation purposes by providing a dedicated scheduling mechanism.
- The distributed-objects support (through RMI or CORBA) can be used for distributed simulations, or for an environment in which parts of the system are simulated and interfaced through such a mechanism with other parts which actually run the real application. The distributed object model can also provide the environment to be used for autonomous mobile agents.

Simulation Model
- It is necessary to abstract all components and their time dependent interactions from the real system.
- THE MODEL has to be equivalent to the simulated system in all important respects.
Categories of simulation models:
- Continuous time: usually solved by sets of differential equations.
- Discrete time: systems which are considered only at selected moments in time.
- Continuous time + discrete event: discrete event simulations (DES), which may be EVENT ORIENTED or PROCESS ORIENTED.
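The event-oriented flavour of DES mentioned above can be sketched in a few lines of Java: a kernel keeps a time-ordered queue of pending events and advances the simulation clock from one event to the next. This is an illustrative sketch only; the class and method names are invented here and are not part of the MONARC code.

```java
import java.util.PriorityQueue;

// Minimal event-oriented DES kernel: events are (time, action) pairs
// ordered by simulation time; run() advances the clock event by event.
public class DesKernel {
    public interface Action { void run(DesKernel kernel); }

    static final class Event implements Comparable<Event> {
        final double time; final Action action;
        Event(double time, Action action) { this.time = time; this.action = action; }
        public int compareTo(Event o) { return Double.compare(time, o.time); }
    }

    private final PriorityQueue<Event> queue = new PriorityQueue<>();
    private double now = 0.0;

    public double now() { return now; }

    // Schedule an action `delay` simulated seconds from the current time.
    public void schedule(double delay, Action action) {
        queue.add(new Event(now + delay, action));
    }

    // Process events in time order until none remain; actions may
    // schedule further events, which is how a simulation unfolds.
    public void run() {
        while (!queue.isEmpty()) {
            Event e = queue.poll();
            now = e.time;
            e.action.run(this);
        }
    }
}
```

A process-oriented kernel, as used by MONARC, hides this queue behind threads; the event queue above is the simpler of the two designs.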

Simulation Model (2)
Process oriented DES, based on "ACTIVE OBJECTS":
- Thread: execution of a piece of code that occurs independently of, and possibly concurrently with, another one.
- Execution state: the set of state information needed to allow concurrent execution.
- Mutual exclusion: a mechanism that allows an action to be performed on an object without interruption.
- Asynchronous interaction: signals / semaphores to interrupt thread execution.
Each "active object" combines a thread, its execution state and I/O queues, together with methods for communication with other objects; a kernel provides the thread scheduling mechanism.
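The "active object" idea above can be illustrated with plain Java threads: the object owns a thread and a message queue, messages are processed one at a time (mutual exclusion), and interrupting the thread gives the asynchronous interaction. A minimal sketch with hypothetical names, not taken from the MONARC framework:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// A minimal "active object": its own thread plus an inbox of messages.
// Other objects communicate only by enqueueing messages via send().
public class ActiveObject {
    private final BlockingQueue<Runnable> inbox = new LinkedBlockingQueue<>();
    private final Thread thread;
    private volatile boolean running = true;

    public ActiveObject(String name) {
        thread = new Thread(() -> {
            try {
                while (running) {
                    Runnable msg = inbox.take(); // execution state kept by the JVM thread
                    msg.run();                   // one message at a time: mutual exclusion
                }
            } catch (InterruptedException e) {
                // asynchronous interaction: an interrupt ends the thread
            }
        }, name);
        thread.start();
    }

    public void send(Runnable msg) { inbox.add(msg); }

    public void stop() { running = false; thread.interrupt(); }
}
```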

Data Model
It provides:
- Realistic mapping for an object database.
- Specific HEP data structure.
- Transparent access to any data.
- Automatic storage management.
- An efficient way to handle a very large number of objects.
- Emulation of clustering factors for different types of access patterns.
- Handling of related objects in different databases.

Multitasking Processing Model
Concurrent running tasks share resources (CPU, memory, I/O).
"Interrupt" driven scheme: for each new task, or when a task finishes, an interrupt is generated and all "processing times" are recomputed.
It provides:
- Handling of concurrent jobs with different priorities.
- An efficient mechanism to simulate multitask processing.
- An easy way to apply different load balancing schemes.
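The recomputation step of this interrupt-driven scheme can be sketched as one function: whenever a job arrives or finishes, the CPU power is re-divided among the active jobs and every remaining wall-clock time is recalculated. Splitting the CPU in proportion to priority weights is an assumption made here for illustration; the names are hypothetical, not MONARC code.

```java
// Sketch of the "interrupt" recomputation for the multitasking model:
// called whenever the set of active jobs changes.
public class CpuSharing {
    // Remaining wall-clock times (seconds) for concurrent jobs on one
    // CPU of power `cpuSi95` (SI95), where each job still needs
    // workSi95s[i] SI95*s of computing and gets a share of the CPU
    // proportional to weights[i] (a stand-in for job priority).
    public static double[] remainingTimes(double[] workSi95s, double[] weights,
                                          double cpuSi95) {
        double totalWeight = 0;
        for (double w : weights) totalWeight += w;
        double[] t = new double[workSi95s.length];
        for (int i = 0; i < t.length; i++) {
            double share = cpuSi95 * weights[i] / totalWeight; // this job's CPU slice
            t[i] = workSi95s[i] / share;
        }
        return t;
    }
}
```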

LAN/WAN Simulation Model
"Interrupt" driven simulation: for each new message an interrupt is created, and for all the active transfers the speed and the estimated time to complete the transfer are recalculated.
This is an efficient and realistic way to simulate concurrent transfers having different sizes / protocols.
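The same recomputation idea applied to the network can be sketched as follows. Equal sharing of the link among the active transfers is an assumption made for illustration (real protocols share less evenly), and the names are hypothetical:

```java
// Sketch of the network "interrupt" step: when a transfer starts or
// finishes, every remaining transfer's speed becomes a share of the
// link, and its completion time is re-estimated from the bytes left.
public class LinkInterrupt {
    // Estimated completion times (seconds from now) for the active
    // transfers on a link of `bandwidth` bytes/s, assuming the link is
    // split equally among them.
    public static double[] reestimate(double[] bytesLeft, double bandwidth) {
        double share = bandwidth / bytesLeft.length; // new per-transfer speed
        double[] eta = new double[bytesLeft.length];
        for (int i = 0; i < eta.length; i++) eta[i] = bytesLeft[i] / share;
        return eta;
    }
}
```

At the next interrupt (e.g. the shorter transfer completing) the function is simply called again with the updated remaining sizes.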

Regional Centre Model
A regional centre is modelled as a complex composite object.

Arrival Patterns
A flexible mechanism to define the stochastic process of data processing: dynamic loading of "Activity" tasks, which are threaded objects controlled by the simulation scheduling mechanism. Each "Activity" thread generates data processing jobs and submits them to the farms:

for( int k = 0; k < jobs_per_group; k++ ) {
    Job job = new Job( this, Job.ANALYSIS, "TAG"+rc_name, 1,
                       events_to_process-1, null, null );
    job.setAnalyzeFactors( 0.01, /* second factor lost in transcript */ );
    farm.addJob( job );
    sim_hold( /* inter-arrival time lost in transcript */ );
}

The Structure of the Simulation Program
Built around a central SIMULATION ENGINE:
- Monarc package: Regional Center, Farm, AMS, CPU Node, Disk, Tape.
- Processing package: Job, Active Jobs, Physics Activities, Init Functions.
- Network package: LAN, WAN, Messages.
- Data Model package: Data Container, Database, Database Index.
- Auxiliary tools: GUI, graphics, statistics.
- User directory: config files, initializing data, parameters, prices..., and dynamically loadable modules that define the activities (jobs).

Input Parameters for the Simulation Program
It is important to correctly identify and describe the time response functions of all active components in the system. This should be done using realistic measurements. The simulation frame allows one to introduce any time dependent response function for the interacting components.
Response functions are based on "the previous state" of the component, a set of system related parameters (SysP) and the parameters of a specific request (ReqP). Such a time response function makes it possible to describe correctly highly nonlinear processes or "chaotic" system behaviour (typical for caching, swapping...).
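A toy example of such a state-dependent response function: a hypothetical disk component where a request falling inside the previously read (cached) window skips the seek, making the response time depend on the previous state in exactly the caching sense described above. The component, its parameters and the numbers are invented for illustration.

```java
// Toy time-response function: response = f(previous state, SysP, ReqP).
// Previous state: the last window read (a one-entry "cache").
// SysP: seek time and transfer rate. ReqP: request offset and size.
public class DiskResponse {
    private double cachedFrom = -1, cachedTo = -1; // previous state

    private final double seekTime;   // SysP: seconds per cold access
    private final double mbPerSec;   // SysP: sustained transfer rate

    public DiskResponse(double seekTime, double mbPerSec) {
        this.seekTime = seekTime;
        this.mbPerSec = mbPerSec;
    }

    // ReqP: offset and size in MB. A request inside the cached window
    // skips the seek, so identical requests can take different times.
    public double responseTime(double offsetMB, double sizeMB) {
        boolean cached = offsetMB >= cachedFrom && offsetMB + sizeMB <= cachedTo;
        cachedFrom = offsetMB;               // update the state for the
        cachedTo = offsetMB + sizeMB;        // next request
        return (cached ? 0 : seekTime) + sizeMB / mbPerSec;
    }
}
```

The first access to a window is slow, a repeat is fast: a simple source of the nonlinear behaviour the slide mentions.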

Simulation GUI
One may dynamically add and (re)configure Regional Centre parameters.

Simulation GUI (2)
Parameters can be dynamically changed, saved to or loaded from files.

Simulation GUI (3)
On-line monitoring of the major parameters in the simulation; tools to analyse the data.

Simulation GUI (4)
Zoom in/out of all the parameters traced in the simulation.

Validation Measurements I
Multiple jobs read objects concurrently from a database (object size = 10.8 MB). Two access schemes are compared:
- Local DB access: the database is on a local disk. Machines used: "monarc01" (a 4-CPU SMP machine, 400 MHz, 17.4 SI95 per CPU) and "atlobj02" (a 2-CPU SMP machine, 300 MHz).
- DB access via AMS: the AMS server runs on "atlobj02", the client on "monarc01".

Validation Measurements I - Simulation Code
The simulation code used for the parallel read test:

public void RUN() {
    int jobs_to_doit = 1;
    int events_per_job = 5;
    Vector jobs = new Vector();
    double[] results = new double[128];
    double delta_t = 10;
    double start_t;

    for ( int tests = 0; tests < 6; tests++ ) {  // simulate 1,2,4,8,16,32 parallel jobs
        start_t = clock();
        for ( int k = 0; k < jobs_to_doit; k++ ) {          // job submission
            Job job = new Job( this, Job.CREATE_AOD, "ESD",
                               k*events_per_job+1, (k+1)*events_per_job, null, null );
            job.setMaxEvtPerRequest(1);
            jobs.addElement(job);
            farm.addJob(job);
        }
        boolean done = false;
        while ( !done ) {              // wait for all submitted jobs to finish
            done = true;
            for ( int k = 0; k < jobs_to_doit; k++ ) {
                Job j = (Job) jobs.elementAt(k);
                if ( !j.isDone() )
                    done = false;
                else
                    results[k] = j.tf - start_t;  // keep the processing time per job
            }
            sim_hold( delta_t/10 );    // wait between polls
        }
        // compute and print the results here
        sim_hold( delta_t );           // wait before the next case
        jobs_to_doit *= 2;             // double the number of parallel jobs
        jobs.removeAllElements();      // clear the list of running jobs
    }
}

Validation Measurements I
The same "user code" was used for the different configurations:
- local database access,
- AMS database access,
- different CPU power.
LAN parameterization:
- RTT ~ 1 ms,
- maximum bandwidth 12 MB/s.

Validation Measurements I - The AMS Data Access Case
Simulation vs. measurements for a 4-CPU client reading the raw data database from the AMS server over the LAN.

Validation Measurements I - Simulation Results
Simulation results for AMS and local data access: 1, 2, 4, 8, 16, 32 parallel jobs executed on a 4-CPU SMP system.

Validation Measurements I
Local DB access, 32 jobs: the distribution of the job processing times, comparing the simulation mean with the measured mean (114.3).

Validation Measurements I - Measurements & Simulation
Side-by-side comparison of the simulated and measured results.

Validation Measurements II
Running multiple concurrent client programs to read events stored in an OODB:
- Event size 40 KB, normally distributed (SD = 20%).
- Processing time per event ~ 0.17 SI95*s.
- Each job reads 2976 events.
- One AMS server is used for all the clients.
- The same "measurements" are performed for different network connectivity between the server and the clients.
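The event-size model on this slide (mean 40 KB, normally distributed with a 20% standard deviation) can be reproduced with a one-line sampler. The clamp at a small positive minimum is an added assumption here, to avoid the non-physical negative sizes a Gaussian can otherwise produce:

```java
import java.util.Random;

// Samples event sizes as described on the slide: Gaussian with mean
// 40 KB and standard deviation 20% of the mean (8 KB), clamped at 1 KB.
public class EventSize {
    public static double sampleKB(Random rnd) {
        double size = 40.0 * (1.0 + 0.2 * rnd.nextGaussian());
        return Math.max(size, 1.0); // clamp: assumption, not from the slide
    }
}
```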

Validation Measurements II - Test Configurations
- Test 1 (CNAF, 1000BaseT): server "sunlab1", client "gsun" (Sun Ultra5, 333 MHz).
- Test 2 (PADOVA, 10BaseT): "cmssun4" (Sun Ultra10, 333 MHz) and "vlsi06".
- Test 3 (CNAF-CERN, 2 Mbps WAN): server "sunlab1" (Sun Ultra, 333 MHz), client "monarc01" (Sun Enterprise, 4 x 450 MHz); a Sun SPARC20 (125 MHz) was also used.

Validation Measurements II - Test 1
Gigabit Ethernet client - server: simulation vs. measurements, per number of jobs.

Validation Measurements II - Test 2
Ethernet client - server.

Validation Measurements II - Test 3
2 Mbps WAN client - server: CPU usage and data traffic.

New Test Bed Measurements - "Test 4"
Sites: CNAF, Padova, Milano, CERN, Genova; connectivity at 1000BaseT, 8 Mbps and 2 Mbps.
- Server "monarc01": Sun Enterprise 450, 4 x 400 MHz, 512 MB.
- Clients "sunlab1", "gsun", "cmssun4", "atlsun1", "atlas4": Sun Ultra5/10, 333 MHz, 128 MB.
- Client "vlsi06": Sun SPARC20, 125 MHz, 128 MB.

Test Bed Measurements - "Test 4"
Mean wall clock time [s] vs. the number of concurrent jobs, for the clients monarc01, atlas4, atlsun1, gsun, cmssun4 and vlsi06.

TestBed Measurements - "Test 4"
- Client CPU: never 100% used.
- Server CPU: never 100% used.
- Many jobs crash: "Timeout with AMS Server".
- The wall clock time for a workstation connected via gigabit Ethernet is very high (e.g. for 10 jobs, > 1000'; 400' without other clients in the WAN).
- Do slow clients degrade the performance of fast clients?

Physics Analysis Example
Similar data processing jobs are performed in three Regional Centres:
- "CERN" has RAW, ESD, AOD and TAG data.
- "CALTECH" has ESD, AOD and TAG.
- "INFN" has AOD and TAG.

Physics Analysis Example
One physics analysis group:
- Analyzes 4 x 10^6 events per day.
- Submits 100 jobs (~ 4 x 10^4 events per job).
- Each group starts at 8:30 local time; more jobs are submitted in the first part of the day.
- A job analyzes AOD data, and requires ESD data for 2% of the events and RAW data for 0.5% of the events.
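The access pattern above (every event reads AOD, ~2% of events also need ESD, ~0.5% also need RAW) can be emulated by independent per-event sampling. Whether MONARC samples independently per event is an assumption here; the class and its names are illustrative only.

```java
import java.util.Random;

// Emulates the per-event data access of an analysis job: AOD is read
// for every event, ESD and RAW only for random fractions of them.
public class EventAccess {
    // Returns {aodReads, esdReads, rawReads} for `events` analysed
    // events, with ESD needed with probability esdFrac and RAW with
    // probability rawFrac (independently per event).
    public static long[] countReads(int events, double esdFrac,
                                    double rawFrac, long seed) {
        Random rnd = new Random(seed);
        long esd = 0, raw = 0;
        for (int i = 0; i < events; i++) {
            if (rnd.nextDouble() < esdFrac) esd++;
            if (rnd.nextDouble() < rawFrac) raw++;
        }
        return new long[]{ events, esd, raw };
    }
}
```

With the slide's fractions, a 4 x 10^6-event day implies roughly 8 x 10^4 ESD and 2 x 10^4 RAW accesses, which is what drives the inter-tier traffic in the model.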

Physics Analysis Example
- "CERN" centre (RAW, ESD, AOD, TAG): 10 physics analysis groups -> access to 40 x 10^6 events; 200 CPU units at 50 SI95; 1000 jobs to run; half of the RAW data is on tape.
- "CALTECH" centre (ESD, AOD, TAG): 5 physics analysis groups -> access to 20 x 10^6 events; 100 CPU units; 500 jobs to run.
- "INFN" centre (AOD, TAG): 2 physics analysis groups -> access to 8 x 10^6 events; 40 CPU units; 200 jobs to run.

Physics Analysis Example - "CERN"
CPU & memory usage; jobs in the system.

Physics Analysis Example - "CALTECH"
CPU & memory usage; jobs in the system.

Physics Analysis Example - "INFN"
CPU & memory usage; jobs in the system.

Physics Analysis Example
Local data traffic at "CALTECH" and "INFN".

Physics Analysis Example
Job efficiency distribution for "CERN", "CALTECH" and "INFN"; means of 0.83, 0.42 and 0.57 are shown.

Resource Utilisation vs. Job Response Time
"CERN" physics analysis example, for configurations with 180, 200 and 250 CPUs; means of 0.55, 0.72 and 0.93 are shown.

The Plan for Short Term Developments
- Implement the database replication mechanism.
- Inter-site scheduling functions to allow job migration.
- Improve the evaluation of the estimated cost function, and define additional "cost functions" to be considered in further optimisation procedures.
- Implement the "desktop" data processing model.
- Improve the mass storage model and the procedures to optimise the data distribution.
- Improve the functionality of the GUIs.
- Add functions for saving and analysing the results, and provide a systematic way to compare different models.
- Build a model repository and a library of "dynamically loadable activities" and typical configuration files.

Summary
A CPU- and code-efficient approach to the simulation of distributed systems has been developed for MONARC. It:
- provides an easy way to map the distributed data processing, transport and analysis tasks onto the simulation;
- can handle dynamically any model configuration, including very elaborate ones with hundreds of interacting complex objects;
- can run on real distributed computer systems, and may interact with real components.
The Java (JDK 1.2) environment is well suited for developing a flexible and distributed process oriented simulation. This simulation program is still under development; new dedicated measurements to evaluate realistic parameters and "validate" the simulation program are in progress.