July 2000. Simulation of distributed computing systems. MONARC: Models Of Networked Analysis at Regional Centers. Iosif C. Legrand (CALTECH)


Slide 1: MONARC — Models Of Networked Analysis at Regional Centers. Simulation of distributed computing systems. Iosif C. Legrand (CALTECH)

Slide 2: Contents
- The design and development of a simulation program for large-scale distributed computing systems
- Validation tests:
  - queuing theory
  - performance measurements based on an object-oriented data model for specific HEP applications
- A large system simulation: the CMS HLT farm
- A proposal for a dynamic, self-organising scheduling system
- Summary

Slide 3: The Goals of the Simulation Program
- To perform realistic simulation and modelling of large-scale distributed computing systems, customised for specific HEP applications.
- To provide a design framework to evaluate the performance of a range of possible computer systems, as measured by their ability to provide the physicists with the requested data in the required time, and to optimise the cost.
- To narrow down a region in this parameter space in which viable models can be chosen by any of the LHC-era experiments.
- To offer a dynamic and flexible simulation environment.

Slide 4: The MONARC Simulation Program
- This simulation program is not intended to be a detailed simulator of basic components such as operating systems, database servers or routers. Instead, based on realistic mathematical models and parameters measured on test-bed systems for all the basic components, it aims to correctly describe the performance and limitations of large distributed systems with complex interactions.
- At the same time it provides a flexible framework for evaluating different middleware design strategies, providing dynamic load balancing and optimising resource utilisation as well as turnaround time for high-priority tasks.

Slide 5: Design Considerations of the Simulation Program
- The simulation and modelling task for the MONARC project requires describing many complex programs running concurrently in a distributed architecture.
- A process-oriented approach to discrete-event simulation is well suited to describing concurrently running programs.
- "Active objects" (having an execution thread, a program counter, a stack...) provide an easy way to map the structure of a set of distributed running programs into the simulation environment.
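The "active object" mapping can be sketched in plain Java as a thread-per-process discrete-event engine. This is a minimal illustration, not the actual MONARC classes; all names here (SimEngine, simHold, ActiveObjectDemo) are invented for the example:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;
import java.util.concurrent.Semaphore;

// Minimal thread-based discrete-event engine: each "active object" owns a
// Java thread and calls simHold(dt) to advance simulated time.  Only one
// object runs at a time; the engine wakes the earliest pending event.
class SimEngine {
    private static final class Wakeup implements Comparable<Wakeup> {
        final double time;
        final Semaphore resume = new Semaphore(0);
        Wakeup(double t) { time = t; }
        public int compareTo(Wakeup o) { return Double.compare(time, o.time); }
    }
    private final PriorityQueue<Wakeup> agenda = new PriorityQueue<>();
    private final Semaphore turn = new Semaphore(0); // one permit per yield
    private int active = 0;  // threads currently executing
    private int alive = 0;   // threads not yet finished
    volatile double now = 0.0;

    synchronized void register() { active++; alive++; }

    // Called from an active object's thread: sleep dt simulated time units.
    void simHold(double dt) throws InterruptedException {
        Wakeup w = new Wakeup(now + dt);
        synchronized (this) { agenda.add(w); active--; }
        turn.release();      // hand control back to the engine ...
        w.resume.acquire();  // ... and block until time reaches w.time
    }

    void finished() {
        synchronized (this) { active--; alive--; }
        turn.release();
    }

    // Engine loop: wake the object with the earliest pending event.
    void run() throws InterruptedException {
        while (true) {
            turn.acquire();
            Wakeup next;
            synchronized (this) {
                if (active > 0) continue;              // someone still running
                if (alive == 0 && agenda.isEmpty()) return;
                next = agenda.poll();
                if (next == null) continue;
                active++;
            }
            now = next.time;
            next.resume.release();
        }
    }
}

public class ActiveObjectDemo {
    static final List<String> log = new ArrayList<>();

    static Thread job(SimEngine e, String name, double d1, double d2) {
        e.register();
        return new Thread(() -> {
            try {
                e.simHold(d1); synchronized (log) { log.add(name + "@" + e.now); }
                e.simHold(d2); synchronized (log) { log.add(name + "@" + e.now); }
            } catch (InterruptedException ex) { Thread.currentThread().interrupt(); }
            e.finished();
        });
    }

    public static void main(String[] args) throws InterruptedException {
        SimEngine e = new SimEngine();
        Thread a = job(e, "A", 3, 4), b = job(e, "B", 2, 4);
        a.start(); b.start();
        e.run();  // events fire in time order: B@2, A@3, B@6, A@7
        System.out.println(log + " now=" + e.now);
    }
}
```

The key point of the mapping is that the simulated program keeps its natural sequential structure (do work, hold, do work) while the engine serialises all threads along simulated time.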

Slide 6: Design Considerations of the Simulation Program (2)
- This simulation project is based on Java(TM) technology, which provides adequate tools for developing a flexible, distributed, process-oriented simulation. Java has built-in multi-thread support for concurrent processing, which can be used for simulation purposes by providing a dedicated scheduling mechanism.
- The distributed-objects support (through RMI or CORBA) can be used for distributed simulations, or for an environment in which parts of the system are simulated and interfaced through such a mechanism with other parts that are actually running the real application. The distributed object model can also provide the environment for autonomous mobile agents.

Slide 7: Data Model
It provides:
- Realistic mapping for an object database
- Specific HEP data structures
- Transparent access to any data
- Automatic storage management
- An efficient way to handle a very large number of objects
- Emulation of clustering factors for different types of access patterns
- Handling of related objects in different databases

Slide 8: Multitasking Processing Model
Concurrently running tasks share resources (CPU, memory, I/O).
"Interrupt"-driven scheme: for each new task, or when one task finishes, an interrupt is generated and all "processing times" are recomputed.
It provides:
- Handling of concurrent jobs with different priorities
- An efficient mechanism to simulate multitask processing
- An easy way to apply different load-balancing schemes
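The interrupt-driven recomputation can be illustrated with a small processor-sharing sketch (illustrative names, not MONARC code): every time a task finishes, the remaining tasks' completion times are recomputed with the new CPU share.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the "interrupt"-driven multitasking model: tasks share one CPU
// under processor sharing; whenever a task finishes, the remaining work of
// every running task is updated and completion times are recomputed.
public class ProcessorSharing {
    static final class Task {
        final String name;
        double remaining;        // CPU-seconds of work left
        double finishTime = -1;
        Task(String n, double w) { name = n; remaining = w; }
    }

    // Run tasks that all arrive at t=0; returns them with finish times set.
    static List<Task> run(List<Task> tasks) {
        double now = 0;
        List<Task> running = new ArrayList<>(tasks);
        while (!running.isEmpty()) {
            int n = running.size();           // each task gets 1/n of the CPU
            // next "interrupt": the task with the least remaining work ends
            Task first = running.get(0);
            for (Task t : running) if (t.remaining < first.remaining) first = t;
            double dt = first.remaining * n;  // wall time until it completes
            now += dt;
            for (Task t : running) t.remaining -= dt / n;
            first.finishTime = now;
            first.remaining = 0;
            running.remove(first);            // interrupt: recompute with n-1
        }
        return tasks;
    }

    public static void main(String[] args) {
        List<Task> ts = new ArrayList<>();
        ts.add(new Task("short", 1.0));  // 1 CPU-second of work
        ts.add(new Task("long", 3.0));   // 3 CPU-seconds of work
        run(ts);
        // Sharing the CPU, "short" finishes at t=2; "long" then runs alone
        // and finishes at t=2+2=4.
        for (Task t : ts) System.out.println(t.name + " done at " + t.finishTime);
    }
}
```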

Slide 9: LAN/WAN Simulation Model
"Interrupt"-driven simulation: for each new message an interrupt is created, and for all active transfers the speed and the estimated time to complete the transfer are recalculated.
An efficient and realistic way to simulate concurrent transfers with different sizes and protocols.
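The same interrupt logic for the network can be sketched as fair bandwidth sharing on a single link; speeds and completion estimates are recalculated whenever a transfer starts or ends. This is an illustration under simplifying assumptions (equal sharing, no protocol effects), not the MONARC network model:

```java
import java.util.ArrayList;
import java.util.List;

// Interrupt-driven link model: concurrent transfers split the link capacity
// equally; every arrival or completion triggers a recalculation of each
// active transfer's speed and estimated completion time.
public class LinkSharing {
    static final class Transfer {
        final String name;
        final double start;   // arrival time (s)
        double bytesLeft;     // bytes still to send
        double finish = -1;
        Transfer(String n, double s, double b) { name = n; start = s; bytesLeft = b; }
    }

    // Simulate transfers over one link of the given capacity (bytes/s).
    static void run(List<Transfer> all, double capacity) {
        double now = 0;
        List<Transfer> pending = new ArrayList<>(all);
        List<Transfer> active = new ArrayList<>();
        while (!pending.isEmpty() || !active.isEmpty()) {
            // next interrupt = earliest of: next arrival, next completion
            double tArrive = Double.MAX_VALUE, tDone = Double.MAX_VALUE;
            Transfer arriving = null, finishing = null;
            for (Transfer t : pending)
                if (t.start < tArrive) { tArrive = t.start; arriving = t; }
            double share = active.isEmpty() ? 0 : capacity / active.size();
            for (Transfer t : active) {
                double eta = now + t.bytesLeft / share;
                if (eta < tDone) { tDone = eta; finishing = t; }
            }
            double next = Math.min(tArrive, tDone);
            for (Transfer t : active) t.bytesLeft -= share * (next - now);
            now = next;
            if (tArrive <= tDone) { pending.remove(arriving); active.add(arriving); }
            else { finishing.finish = now; active.remove(finishing); }
        }
    }

    public static void main(String[] args) {
        List<Transfer> ts = new ArrayList<>();
        ts.add(new Transfer("a", 0, 100));  // 100 bytes from t=0
        ts.add(new Transfer("b", 5, 100));  // 100 bytes from t=5
        run(ts, 10);                        // 10 bytes/s link
        // "a" runs alone for 5 s (50 bytes), then shares the link: each gets
        // 5 B/s, so "a" ends at 5 + 50/5 = 15 and "b" at 15 + 50/10 = 20.
        for (Transfer t : ts) System.out.println(t.name + " done at " + t.finish);
    }
}
```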

Slide 10: Arrival Patterns
A flexible mechanism to define the stochastic process of how users perform data processing tasks.
Dynamic loading of "Activity" tasks, which are threaded objects controlled by the simulation scheduling mechanism.
Physics activities injecting "Jobs": each "Activity" thread generates data processing jobs and submits them to a regional centre farm. These dynamic objects are used to model the users' behaviour.

    for (int k = 0; k < jobs_per_group; k++) {
        Job job = new Job(this, Job.ANALYSIS, "TAG", 1, events_to_process);
        farm.addJob(job);   // submit the job
        sim_hold(1000);     // wait 1000 s
    }

Slide 11: Regional Centre Model
A complex composite object (diagram).

Slide 12: Input Parameters for the Simulation Program
It is important to correctly identify and describe the time-response functions of all active components in the system. This should be done using realistic measurements.
Response functions are based on "the previous state" of the component, a set of system-related parameters (SysP) and parameters for a specific request (ReqP).
Such a time-response function makes it possible to correctly describe highly nonlinear processes or "chaotic" system behaviour (typical for caching, swapping...).
The simulation frame allows one to introduce any time-dependent response function for the interacting components.
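A time-response function of this shape might look as follows in Java. The interface, the parameter names and the nonlinear disk-read example are assumptions made for illustration, not the MONARC API:

```java
import java.util.Map;

// Sketch of the time-response abstraction described above: a component
// estimates how long a request takes from its previous state, a set of
// system parameters (SysP) and request parameters (ReqP).
interface ResponseFunction {
    double responseTime(Map<String, Double> state,  // previous state
                        Map<String, Double> sysP,   // system parameters
                        Map<String, Double> reqP);  // request parameters
}

public class ResponseFunctionDemo {
    // A deliberately nonlinear example: read time jumps once the working set
    // exceeds the cache, mimicking caching/swapping behaviour.
    static final ResponseFunction diskRead = (state, sysP, reqP) -> {
        double bytes = reqP.get("bytes");
        double rate = sysP.get("diskRate");           // bytes/s from disk
        double speedup = state.get("workingSet") <= sysP.get("cacheSize")
                ? sysP.get("cacheSpeedup") : 1.0;     // >1 when data is cached
        return bytes / (rate * speedup);
    };

    public static void main(String[] args) {
        Map<String, Double> sys = Map.of("diskRate", 1e8, "cacheSize", 1e9,
                                         "cacheSpeedup", 20.0);
        double cold = diskRead.responseTime(Map.of("workingSet", 2e9), sys,
                                            Map.of("bytes", 1e8)); // 1.0 s
        double warm = diskRead.responseTime(Map.of("workingSet", 5e8), sys,
                                            Map.of("bytes", 1e8)); // 0.05 s
        System.out.println("cold=" + cold + " warm=" + warm);
    }
}
```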

Slide 13: Simulation GUI
One may dynamically add and (re)configure Regional Centre parameters.

Slide 14: Simulation GUI (2)
On-line monitoring of the major parameters in the simulation; tools to analyse the data.

Slide 15: Results repository and the "publishing" procedure
(Diagram: simulation results are written as objects to an RMI server, stored on an AFS/NFS file system and published through a web server.)

Slide 16: Queueing theory (1) — the M|M|1 model
(Diagram: arrivals enter a single queue — jobs waiting, then in service — with mean service time E[S].)
With arrival rate λ and utilisation ρ = λ·E[S] < 1:
- E[N] = ρ / (1 − ρ) — mean number of jobs in the system
- E[R] = E[S] / (1 − ρ) — mean response time
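For reference, the closed-form M|M|1 results used in the validation can be checked numerically; Little's law, E[N] = λ·E[R], ties the two formulas together. These are the standard queueing-theory expressions, not MONARC code:

```java
// Standard M|M|1 results: with arrival rate lambda, mean service time E[S]
// and utilisation rho = lambda * E[S] < 1,
//   E[N] = rho / (1 - rho)     (mean number of jobs in the system)
//   E[R] = E[S] / (1 - rho)    (mean response time)
public class MM1 {
    static double utilisation(double lambda, double meanService) {
        return lambda * meanService;
    }
    static double meanJobs(double lambda, double meanService) {
        double rho = utilisation(lambda, meanService);
        return rho / (1 - rho);
    }
    static double meanResponse(double lambda, double meanService) {
        return meanService / (1 - utilisation(lambda, meanService));
    }
    public static void main(String[] args) {
        double lambda = 8.0, meanService = 0.1;  // 8 jobs/s, 100 ms service
        // rho = 0.8, E[N] = 4 jobs, E[R] = 0.5 s; Little's law: 8 * 0.5 = 4
        System.out.printf("rho=%.2f E[N]=%.2f E[R]=%.2fs%n",
                utilisation(lambda, meanService),
                meanJobs(lambda, meanService),
                meanResponse(lambda, meanService));
    }
}
```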

Slide 17: Queueing theory (2) — M|M|1 network queue model (diagram)

Slide 18: Validation Measurements I — the AMS Data Access Case
(Diagram: measurement and simulation set-ups — a client accessing raw data in a DB over a LAN, with 4 CPUs.)

Slide 19: Validation Measurements I
Local DB access, 32 jobs: the distribution of the jobs' processing time.
(Plot: simulation mean and a measurement mean of 114.3.)

Slide 20: Validation Measurements I — Measurements & Simulation
(Side-by-side plots: simulation vs. measurements.)

Slide 21: Validation Measurements II
Client-server over a 2 Mbps WAN: CPU usage and data traffic.

Slide 22: Simple Example — Resource Utilisation vs. Jobs' Response Time
Physics analysis example with farm sizes of 180, 200 and 250 CPUs (panel means: 0.55, 0.72, 0.93).

Slide 23: The CMS HLT Farm Simulation
(Diagram of the simulated configuration: pile-up DBs backed by HPSS; signal DBs with 6 servers for signal; output servers; lock servers (SUN); a farm of 140 processing nodes; groups of 17 and 9 servers; 24 pile-up servers in total; 2 Objectivity federations.)

Slide 24: The HLT Farm Configuration
The strategy is to use many commodity PCs as database servers.
- Pile-up distributed over 24 Linux servers
- Other data over 6 Linux servers (70 GB disk each)
- 2 Linux stations used for federations (metadata, catalogue, etc.)
- 2 Linux stations used for journal files (used in locking)
- shift20 (SUN) used for lock serving and output (2 × ~250 GB disks)

Slide 25: Network Traffic & Job Efficiency
(Plots: measurement vs. simulation for jet and muon jobs; mean measured value ~48 MB/s.)

Slide 26: Total Time for Jet & Muon Production Jobs (plot)

Slide 27: CPU Load
(Plots: CPU load for a jet production job and a muon production job; read and write activity.)

Slide 28: A Self-Organising Job Scheduling System
The aim of this proposal is to describe a possible approach to the scheduling task: a system able to dynamically learn and cluster information in a large-dimensional parameter space.
This dynamic scheduling system should be seen as adaptive middle-layer software, aware of the currently available resources and using "past experience" to optimise job performance and resource utilisation.

Slide 29: A Self-Organising Model
This approach uses the "past experience" from jobs that have been executed to create a dynamic decision-making scheme. A competitive-learning algorithm can be used to "cluster" correlated information in the multi-dimensional space: in our case, a feature-mapping architecture able to map a high-dimensional input space into a much lower-dimensional structure, in such a way that most similarly correlated patterns in the original data remain close in the mapped space. Such a clustering scheme seems feasible for this decision-making task, as we expect a strong correlation between the parameters involved.
Compared with an "intuitive" model, we expect that such an approach will offer a better way to analyse possible options, and that it can evolve and improve itself dynamically.
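The competitive-learning step can be sketched as winner-take-all prototype updates. This is a simplification (a full self-organising map would also pull the winner's map neighbours), and the two-feature data and parameters are invented for illustration:

```java
// Competitive learning sketch: prototype vectors compete for each input
// record; the winner (nearest prototype) moves toward the input, so
// correlated records end up clustered around the same prototype.
public class CompetitiveLearning {
    static double[][] cluster(double[][] data, double[][] protos,
                              double rate, int epochs) {
        for (int e = 0; e < epochs; e++)
            for (double[] x : data) {
                int best = 0;
                double bestD = Double.MAX_VALUE;
                for (int p = 0; p < protos.length; p++) { // find the winner
                    double d = 0;
                    for (int i = 0; i < x.length; i++)
                        d += (x[i] - protos[p][i]) * (x[i] - protos[p][i]);
                    if (d < bestD) { bestD = d; best = p; }
                }
                for (int i = 0; i < x.length; i++)        // move winner to x
                    protos[best][i] += rate * (x[i] - protos[best][i]);
            }
        return protos;
    }

    public static void main(String[] args) {
        // Two obvious groups of (cpuTime, dataVolume)-like records:
        double[][] data = { {0.9, 1.1}, {1.1, 0.9}, {1.0, 1.0},
                            {4.9, 5.1}, {5.1, 4.9}, {5.0, 5.0} };
        double[][] protos = { {0.0, 0.0}, {6.0, 6.0} };   // initial prototypes
        cluster(data, protos, 0.2, 50);
        // The prototypes drift to roughly (1,1) and (5,5).
        System.out.printf("p0=(%.1f,%.1f) p1=(%.1f,%.1f)%n",
                protos[0][0], protos[0][1], protos[1][0], protos[1][1]);
    }
}
```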

Slide 30: A Simple Toy Example
We assume that the time to execute a job in the local farm under a given load ν is

    T = t0 · f(ν)

where t0 is the theoretical time to perform the job and f(ν) describes the effect of the farm load on the job execution time. If the job is executed on a remote site, an extra factor α is introduced into the response time:

    T = t0 · f(ν) · α

Slide 31: Clustering of the Self-Organising Map (figure)

Slide 32: Evaluating the Self-Organising Scheduling with the Simulation Tool
(Diagram: activities feed the simulation of the regional centres; jobs pass through an intuitive scheduling scheme and a "self-organising map" during a warming-up phase, feeding the decision.)
The decision for future jobs is based on identifying the clusters in the total parameter space that are close to the hyper-plane defined in this space by the {Job}, {System} subset. In this way the decision can be made by evaluating the typical performance of this list of close clusters and choosing a decision set that meets the expected/available performance, resources and cost.

Slide 33: Summary
- A CPU- and code-efficient approach to the simulation of distributed systems has been developed for the MONARC project. It:
  - provides an easy way to map the distributed data processing, transport and analysis tasks onto the simulation;
  - can dynamically handle any model configuration, including very elaborate ones with hundreds of interacting complex objects;
  - can run on real distributed computer systems, and may interact with real components.
- These results are encouraging and motivate us to continue developing this simulation tool.
- Modelling and understanding current systems, their performance and limitations, is essential for the design of future large-scale distributed processing systems.
- A framework to evaluate different strategies for the middle-layer software to optimise resource utilisation in such distributed systems is under development.