Level 3 Trigger Leveling
D. Cutts (Brown), A. Garcia-Bellido (UW), A. Haas (Columbia), G. Watts (UW)

The Problem
Events take much longer to filter in Level 3 at high luminosity. The original design goal was 50 nodes at 1 kHz, i.e. 50 ms/event; we are now talking about >1 s/event at high luminosity, so we would need >1000 farm nodes to keep up. Simply increasing the farm power (as we have been doing) will not be sufficient! Thanks to Rick.
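
A back-of-the-envelope check of the farm sizing quoted above (a minimal Python sketch; the rates and per-event times are the ones on the slide, the helper function itself is purely illustrative):

```python
# Rough farm sizing: a node processes one event at a time, so the number of
# node-equivalents needed is (input rate) x (CPU time per event).
def nodes_needed(input_rate_hz, cpu_seconds_per_event):
    return input_rate_hz * cpu_seconds_per_event

print(nodes_needed(1000, 0.050))  # original design point: 50 ms/event -> 50 nodes
print(nodes_needed(1000, 1.0))    # high luminosity: >1 s/event -> >1000 nodes
```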

Idea: Use Idle CPU Cycles
(New monitor plots are being put together by Aran.) At the start of a store the farm CPU is 100% busy, causing L3 disables; later in the store the CPU usage quickly drops. The idea: cache events on the L3 farm for later processing, and drain the cache using the unused CPU cycles.

Driving The Solution is…
The initial part of the store, with steeply falling luminosity and steeply falling CPU time per event. This will not address the 1 kHz input rate limit! There are two bottlenecks: the event rate and the CPU limit, and the event rate is also limited by the L3 readout speed. An event early in the store takes longer to process than one later in the store, and fewer events arrive later in the store, which frees up the farm to process the excess events from the start. The current farm must process every single event as it arrives, so the input rate is limited by the CPU available at the start of the store; we currently get around this with short initial runs and quick drops in prescale sets. The longer the run and the higher the starting luminosity, the more effective trigger leveling will be.

Simulation Of Performance
- Use a falling exponential model of luminosity.
- Use a rising exponential model of CPU required per event.
- Use a falling exponential model of trigger rate.
Thanks to Rick.
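
A minimal sketch of what such a simulation could look like, assuming the CPU per event rises exponentially with the instantaneous luminosity and using made-up decay constants and farm size (the real simulation used parameters pulled from an actual store, which are not reproduced here, so these numbers will not match the results quoted below):

```python
import numpy as np

# Toy store model: luminosity and trigger rate fall exponentially in time,
# CPU time per event rises exponentially with instantaneous luminosity.
# All constants are illustrative guesses, not the fitted values.
dt = 10.0                                        # time step [s]
t = np.arange(0, 2 * 3600, dt)                   # a 2 hour run
lumi = 160.0 * np.exp(-t / (7 * 3600.0))         # instantaneous lumi [E30]
rate = 800.0 * np.exp(-t / (9 * 3600.0))         # L3 input rate [Hz]
cpu_per_event = 0.050 * np.exp(lumi / 60.0)      # CPU time per event [s]
n_nodes = 200                                    # assumed farm size

demand = rate * cpu_per_event                    # CPU-seconds requested per second
capacity = float(n_nodes)                        # CPU-seconds available per second

# Without leveling the farm must keep up instant by instant (the peak matters);
# with leveling it only has to keep up on average over the whole run.
print("peak    demand/capacity: %.2f" % (demand.max() / capacity))
print("average demand/capacity: %.2f" % (demand.mean() / capacity))
```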

What Does CPU Buy You?
Currently we can run at 160E30, with an 800 Hz input rate and 100% busy CPU. If you only reduce the CPU usage per event by 20% (no trigger leveling), the simulation predicts 171E30; a 40% reduction in CPU would get you to about 180E30.

Running at 950 Hz
I pulled all the parameters for this simulation from a store that appeared to have a maximum input rate of 800 Hz. Marco wants to run with 950 Hz as the nominal input rate. With the current farm, the maximum luminosity you could expect to run at would be 150E30, compared to 160E30 at 800 Hz.

Adding Trigger Leveling
The length of the run now becomes important: the longer the run, the more extra CPU cycles are available to process the initial events. The store I looked at had a 2 hour initial run with the prescale set; keep the 2 hour time for now… The current farm could get up to 180E30, the same as a 40% reduction in CPU usage (172E30 at a 950 Hz input rate). The dual-core, dual-chip (4 CPU) machines now being added to the farm are thought to add 40% to the current farm's power. Without trigger leveling that will get you up to 180E30, and with trigger leveling up to 205E30 (or 950 Hz). Trigger leveling is worth a 40% increase in farm power (for a 2 hour run).

The 4 Hour Run
Increase the simulated run length to 4 hours. This allows more time to process the events that come in at the beginning; note that your prescale set still has to keep you to 800 Hz! This may be close to reality: Marco is pushing for run transitions at instantaneous luminosity thresholds, to adjust prescales (first run: 230->170E30, 3.3 hours). The current farm without trigger leveling remains unchanged at a maximum of 160E30. With trigger leveling the current farm could now handle 205E30 (or 950 Hz). With the additional 40% increase in farm power from new nodes, this would get you to 250E30 (or Hz). Trigger leveling is like doubling the farm power (for a 4 hour run).

What Kind Of Trigger Leveling?
Option 1: all events must be processed by the end of the run, with no spill-over from run to run. This needs only minor modifications to L3 and no changes to the lumi system, CR, or DL, though it does change how the control room looks at things...
Option 2: even the CPU cycles between stores can be used. This requires extensive changes to L3, the online system, lumi, offline, databases, etc., but it is only a software obstacle.

Event Latency
Events are cached in the L3 nodes (in memory, and on disk!). The easiest thing for us, and certainly for the rest of the system, is to keep them time-ordered, i.e. first-in, first-out. This means there will be a latency: the time between the L1 trigger and arrival at the online system! What does this mean for the CR, DL, etc.? Probably no problem. What does this mean for the luminosity system? No problem. For a 2 hour run with the current farm you can expect at worst a 10 minute delay; for a 4 hour run, the delay could be as long as 45 minutes!! The delay is maximal near the middle of the run...
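
A rough way to estimate that delay within the toy model sketched earlier (same caveat: the constants are made up and not calibrated to the store data behind the 10 and 45 minute numbers above):

```python
import numpy as np

# FIFO latency estimate. Backlog is tracked in queued CPU-seconds; an event
# arriving at time t waits roughly backlog(t)/capacity before being filtered,
# since everything queued ahead of it must be processed first.
dt = 10.0
t = np.arange(0, 4 * 3600, dt)                   # a 4 hour run [s]
lumi = 160.0 * np.exp(-t / (7 * 3600.0))         # instantaneous lumi [E30] (assumed)
rate = 800.0 * np.exp(-t / (9 * 3600.0))         # L3 input rate [Hz] (assumed)
cpu_per_event = 0.050 * np.exp(lumi / 60.0)      # CPU time per event [s] (assumed)
capacity = 200.0                                 # farm CPU-seconds per second (assumed)

demand = rate * cpu_per_event
backlog = np.zeros_like(demand)
for i in range(1, len(demand)):
    backlog[i] = max(0.0, backlog[i - 1] + (demand[i] - capacity) * dt)

latency_min = backlog / capacity / 60.0
print("worst-case delay: %.0f minutes" % latency_min.max())
print("reached %.1f hours into the run" % (t[np.argmax(latency_min)] / 3600.0))
```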

Event Latency (continued)
What does this mean for the Examines and CR data quality monitoring? Si, CFT, etc. do online data checks (tick/turn number matching, etc.). Could that sort of thing be moved into the event builder? Could a small section of the farm be dedicated to running in copy mode, passing a subset of current events and routing them to the distributor while the data logger ignores them? Those would be used for monitoring. The generation of any run-pausing alarm from an Examine is bound to be problematic; Geoff Savage will look at a catalog of these.

Early Run End
Worst-case scenario: 1 hour into a planned 2 hour trigger-leveled run, the captain decides to end the run. We would have to wait ~10 minutes for the farm cache to be processed... Often runs are stopped early soon after they start, when a bad condition of the beam or detector is realized; such short runs will have few events cached anyway. Prescales are tuned now so that the initial luminosity doesn't overload the CPU capacity of the farm; with L3 leveling, prescales will be tuned so that the queues are drained by the planned run end.
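
One way to phrase the "queues drained by the planned run end" condition, as a hypothetical helper on top of the toy model above (the function name and interface are illustrative):

```python
# Leveling feasibility: a prescale set for a planned run is acceptable only if
# the total CPU work arriving over the run fits within the total CPU the farm
# can deliver, so the queues drain by the planned run end.
# `demand` is CPU-seconds requested per second, sampled every `dt` seconds
# (e.g. from the toy model above); `capacity` is the farm's CPU-seconds per second.
def queues_drain_by_run_end(demand, capacity, dt):
    total_work = sum(d * dt for d in demand)          # CPU-seconds requested
    total_available = capacity * len(demand) * dt     # CPU-seconds the farm can supply
    return total_work <= total_available
```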

Event Losses
Node crash (not a filter shell crash!!...). Worst case: we could lose 8 minutes divided by 200 nodes worth of events; the lost events are randomly distributed. Is the current lumi DAQ good enough to catch this when it happens? No! But it's not a big deal.

Software: L3
We are already doing trigger leveling! Nodes contain an internal queue of events which is probably never more than ¼ second deep (3 events). 4 events/second * 2500 seconds = 10,000 events; 10,000 events * 2 MB/event = 20,000 MB = 20 GB. That is a lot of memory... but not a lot of disk space! To first order, turning on trigger leveling is just making this queue deeper. Queue depths of 8 minutes or so are within the Linux address space capability; longer queue times will require writing a backing store in a disk file (instead of swap space). It would probably take someone in our group one week of full-time work to get this up, running, and debugged, but there could be synchronization issues for the DL/CR etc. which would require more work from us. The routing master would have to be more clever about which nodes it sends events to...
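
A minimal sketch of the "deeper queue with a disk backing store" idea, assuming a strict FIFO buffer that keeps a bounded number of events in memory and spills the overflow to files (this is illustrative only; the class name, sizes, and file format are invented, and it is not the actual FilterShell code):

```python
import collections, os, pickle, tempfile

class SpillingEventQueue:
    """FIFO event buffer: oldest events stay in RAM, overflow is spilled to disk."""

    def __init__(self, max_in_memory=10000, spill_dir=None):
        self.mem = collections.deque()           # oldest events, held in RAM
        self.overflow = collections.deque()      # paths of newer events on disk
        self.max_in_memory = max_in_memory
        self.spill_dir = spill_dir or tempfile.mkdtemp(prefix="l3_spill_")
        self._n_spilled = 0

    def put(self, event):
        # Keep events in memory until the RAM budget is used up; after that,
        # newer events go to disk so that FIFO order is preserved.
        if not self.overflow and len(self.mem) < self.max_in_memory:
            self.mem.append(event)
        else:
            path = os.path.join(self.spill_dir, "evt_%010d.pkl" % self._n_spilled)
            with open(path, "wb") as f:
                pickle.dump(event, f)
            self.overflow.append(path)
            self._n_spilled += 1

    def get(self):
        # Strict FIFO: hand out the oldest event, then promote one spilled
        # event from disk back into memory.
        event = self.mem.popleft()
        if self.overflow:
            path = self.overflow.popleft()
            with open(path, "rb") as f:
                self.mem.append(pickle.load(f))
            os.remove(path)
        return event

    def __len__(self):
        return len(self.mem) + len(self.overflow)
```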

Monitoring
Features will be added to FuMon:
- Number of free / full buffers in each node queue
- Processing rate of the queue in each node
- Estimated time until the queue is empty, for each node
- Total farm number of free / full buffers
- Processing rate of the total farm queue
- Estimated time until the last node queue is empty
- Time stamps for events as they are being sent out of L3?
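
A purely illustrative container for the per-node quantities such monitoring might publish (the class, field, and method names are invented, not the real FuMon variables):

```python
from dataclasses import dataclass

@dataclass
class NodeQueueStats:
    free_buffers: int
    full_buffers: int
    events_per_second: float            # processing rate of this node's queue

    def seconds_until_empty(self):
        # Estimated time until this node's queue drains at the current rate.
        if self.events_per_second <= 0:
            return float("inf")
        return self.full_buffers / self.events_per_second
```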

Tests
We have already done some work on the L3 node code and may test it this afternoon:
- Additional monitoring variables for the FilterShell queue sizes
- Code fixes to allow for more event buffers
- Increasing the number of buffers (on a couple of nodes)
Tests will be done while we're commissioning the 48 nodes borrowed from CAB.

Conclusions
We (the L3DAQ group) think this is worth a try... and no showstoppers have been found so far. You get a huge gain in the L3 filtering power needed for running at high luminosity (about a doubling of farm power, depending on run conditions, worth nearly $1M). It should be pretty easy to implement (tests will begin soon...). There are downsides in terms of control-room flexibility and simplicity...