LEGO train limits
Offline week, 31/03/2016, LB



Incident of 22/03/2016
A train operator submitted 107 LEGO trains in one go
– These resulted in more than 4K master jobs (jobs that contain many sub-jobs and need to be split and optimized)
– They kept the optimizer busy for 10 hours, blocking all other requests (the remaining PWGs, users, production)
The train operator was not aware of the limits
– Although he was warned about them when he became an operator
The trains were all on ESDs, making them heavier
Not much is going on these days, so the impact was limited; only one user complained directly

Current train agreement
LEGO trains have absolute priority (over all other jobs)
There is a gentlemen's agreement to
– Run at most 30K jobs/day (not enforced, often surpassed)
– Keep self-imposed limits, agreed among the PWG operators, on the number and content of trains
– Give AOD trains priority over ESD trains (they are submitted first, enforced by the system) – but this does not help if a single PWG 'grabs' all available resources before everyone else
All trains run under a single Grid user
– No need for 'per-PWG' job quotas (advantage)
– If one PWG does something 'bad', all suffer (disadvantage)
– Requires communication and respect
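The AOD-before-ESD ordering described above can be sketched as a simple priority queue: within a priority class the order is first come, first served, which is why a single PWG submitting early can still take all the slots. This is an illustrative sketch, not the actual AliEn optimizer code; the PWG names are examples.

```python
import heapq

# Lower value = served first: AOD trains go ahead of ESD trains,
# but within the same class it is plain submission order (FIFO).
PRIORITY = {"AOD": 0, "ESD": 1}

def submission_order(trains):
    """trains: list of (pwg, input_type) tuples in submission order.
    Returns the order in which the optimizer would pick them up."""
    heap = [(PRIORITY[itype], seq, pwg, itype)
            for seq, (pwg, itype) in enumerate(trains)]
    heapq.heapify(heap)
    result = []
    while heap:
        _, _, pwg, itype = heapq.heappop(heap)
        result.append((pwg, itype))
    return result

order = submission_order([("PWG-HF", "ESD"), ("PWG-GA", "AOD"),
                          ("PWG-HF", "AOD"), ("PWG-CF", "ESD")])
# All AOD trains come out before any ESD train, each class in FIFO order.
```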

Train operation
No job limits at present, which is very favorable in periods of high demand
On average, trains use ~16% of Grid resources
Uneven usage profile
Low predictability (except around conferences)
This will not work well with an 'analysis facility', i.e. steady-state analysis

Train limits – implementation
Already discussed and agreed last year
– Per-PWG accounts for train operation
– Per-PWG limits
This is mostly implemented but not activated, as it has several disadvantages
– It would actually require putting a cap on running LEGO jobs
– It would not compensate for under/overuse as well as the current 'single user' model when a PWG leaves its quota unused – the usual principle is 'use it or lose it'
– It would require a long and difficult discussion on quotas per PWG
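A minimal sketch of why hard per-PWG caps behave worse than the single shared user: quota idle in one PWG cannot be used by another (the 'use it or lose it' principle above). The quota values and group names are illustrative, not the real numbers.

```python
# Illustrative per-PWG running-job caps; the real quotas were never agreed.
QUOTA = {"PWG-GA": 5000, "PWG-HF": 5000, "PWG-CF": 5000}

running = {pwg: 0 for pwg in QUOTA}

def can_start(pwg, jobs):
    """A batch starts only if it fits within its own PWG's cap,
    even when the other PWGs leave their quota completely idle."""
    return running[pwg] + jobs <= QUOTA[pwg]

def start(pwg, jobs):
    if not can_start(pwg, jobs):
        raise RuntimeError(f"{pwg} over quota")
    running[pwg] += jobs
```

Under the current single-user scheme, a busy PWG would simply absorb the idle capacity instead of being blocked at its own cap.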

Alternative proposal
Put caps on submission, not on use: a global (or per-PWG) number of trains per day
– Advantage: smooths train running over time – more even resource use
– Advantage: avoids mistakes and the system being blocked by a single operator
– Advantage: allows us not to impose job limits, keeping the current analysis priority high
– Advantage: automatically favors more compact AOD trains (these are submitted first), plus additional optimization (see 'statistics' later)
– Disadvantage: requires planning in case of a heavy analysis programme, but only within individual PWGs

Alternative proposal – implementation
Limit the number of LEGO trains per day to x, either as a
– Soft limit – the operator may still submit all trains in one go; the excess over x is submitted and executed automatically on the next day(s)
OR
– Hard limit – once x is reached, the operator receives a "You have reached your daily quota" error
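The two options can be sketched as follows (illustrative Python, with an assumed daily limit of 20 trains; the real value of x is to be tuned). Note how the soft limit would have spread the 107-train incident over six days instead of blocking the optimizer.

```python
from collections import deque

DAILY_LIMIT = 20  # the 'x' from the proposal; illustrative value only

class SoftLimit:
    """Excess submissions are accepted but queued and released on later days."""
    def __init__(self):
        self.backlog = deque()

    def submit(self, train):
        # Always accepted; ordering is preserved.
        self.backlog.append(train)

    def release_for_today(self):
        # Called once per day: hands at most DAILY_LIMIT trains to the system.
        released = []
        while self.backlog and len(released) < DAILY_LIMIT:
            released.append(self.backlog.popleft())
        return released

class HardLimit:
    """Submissions beyond the daily quota are rejected outright."""
    def __init__(self):
        self.today = 0

    def submit(self, train):
        if self.today >= DAILY_LIMIT:
            raise RuntimeError("You have reached your daily quota")
        self.today += 1
        return train
```

With the soft variant, submitting 107 trains in one go yields batches of 20 per day, the last (seventh) train batch finishing on day 6.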

Alternative proposal – limits
[Plot: LEGO trains submitted per day; averages of 19/day and 30/day]

Average number of wagons (w) and runs/day (r)

Group   w       r       w       r
GA      8       7.3     13      6.5
HF      6       3.2     7.6     2.8
CF      5.6     1.1     7.4     1.4
DQ      2.6     9.9     4.8     9.5
JE      5       2.6     5.3     10
LF      3.5     0.9     3.1*    0.9
UD      1.4     5.1     2.3     4.6
Tot     20.3            30.3

Subtracted from the trains: tender/centrality/PID wagons
– Overall increase of runs/day by 50%
– Overall decrease of wagons/train by 10%
* biased by a few 500-wagon-long tests

Wagons per train (global)
[Table: wagons-per-train distribution for two periods (train count and percentage per bucket, from 1 wagon up to >=20); averages of 5.13 and 4.68 wagons/train]

Efficiency
[Plot: train CPU efficiency, 53% and 57%]

Resources
– Train jobs: +60% year over year
– Growth of Grid resources: +25% per year
– Individual user analysis: 9% of total in …, 8.4% of total in …
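A back-of-the-envelope check of this mismatch (illustrative arithmetic, using the ~16% Grid share of trains quoted on an earlier slide): if capacity keeps growing at 25%/year while the train load keeps growing at 60%/year, the trains' share of the Grid grows by a factor of 1.60 / 1.25 = 1.28 every year.

```python
capacity_growth = 1.25   # Grid resources: +25% per year
train_growth = 1.60      # train jobs/CPU: +60% in the last year

share = 0.16             # trains' current share of Grid resources
years = 0
while share < 0.5:       # years until trains alone would need half the Grid
    share *= train_growth / capacity_growth
    years += 1
# Under these assumptions the 50% mark is crossed after 5 years.
```

This is of course an extrapolation of one year's growth, but it illustrates why some limit has to come before the demand does.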

Why now
Period of low activity – we can discuss and test limits without impact on PWG work
– PWGs can work on train optimization
Avoid incidents in periods of high demand
– An incident of the 22/03 type just before QM would have much larger consequences
Some limits must be imposed, as the appetite for trains grows
– Last year saw 50% more trains than the year before, using 60% more CPU
– Resources are limited, growing by only +25% CPU per year
Preliminary feedback is positive, but tuning is necessary and will take time

Summary
We have started (again) the discussion on LEGO train limits
We favor simple limit(s), easily tunable and independent of PWG specifics
– Hopefully they can be formalized and agreed upon without too much delay
– They will be fair and uniform, and prevent accidental blocking
– They leave room for cross-PWG agreements (as at present)
– They will not limit the overall resources used for analysis
They will spur some optimization of the train setup, and actions from the PWGs to fit within reasonable constraints
– Hopefully the ESD-to-AOD migration will receive a boost
– As well as train-set optimization