The BaBar Event Building and Level-3 Trigger Farm Upgrade


The BaBar Event Building and Level-3 Trigger Farm Upgrade
S. Luitz, R. Bartoldus, S. Dasu, G. Dubois-Felsmann, B. Franek, J. Hamilton, R. Jacobsen, D. Kotturi, I. Narsky, C. O'Grady, A. Perazzo, R. Rodriguez, E. Rosenberg, A. Salnikov, M. Weaver, M. Wittgen, for the BaBar Computing Group
CHEP 2003, San Diego

3/24/03 BaBar Farm Upgrade S. Luitz CHEP 2003

Outline
- BaBar Data Acquisition Overview
- The Old System
- Why Upgrade? – Upgrade Options
- Adapting the Software
- Choosing Hardware
- Testing in the Real Environment
- Installation and Tests
- Other Performance Improvements
- Results – Summary – Plans


The Old System
- Ca. 150 Read-Out Modules (ROMs) in 23 crates, 300 MHz PowerPC
- 100 Mbit/s Ethernet ROM → switch
- 100 Mbit/s Ethernet switch → farm nodes
- 32 Sun Ultra 5 machines in the Level-3 trigger farm
- Ca. 12 ms CPU/event/node (75% CPU)
- Various other limitations in the system
- 2 kHz maximum L1 trigger rate

Why Upgrade the Farm?
- Increasing luminosities from PEP-II
- Detailed projections for trigger rates and event sizes
- At decision time: not sure about L1 trigger upgrades
- Factor 2 headroom desirable
- Absorb background spikes and non-ideal machine conditions
- Have more CPU-intensive Level-3 trigger algorithms
- Better statistics for fast monitoring
- Sun hardware (bought 98/99) end of life?
- Increased hardware failure rate
- Reclaim rack space

Farm Upgrade Requirements
- Target: 10× the CPU power of the original 32-node Sun Ultra 5 farm (for our specific application)
- Gigabit Ethernet on the event building network
- Farm side first; ROM side to be upgraded later
- Fit in the existing 32-node rack space

Upgrade Option 1 (at decision time in 2001)
- Sun UltraSPARC-II 440 MHz single-CPU nodes replace existing nodes
- Add more nodes, maybe replace farm later
- ×1.1 per CPU
- Pro: re-use BaBar offline machines?
- Pro: no software modifications
- Con: very large number of machines
- Con: factor 10 in total CPU difficult to achieve (300 machines!)
- Con: expensive if new machines

Upgrade Option 2 (at decision time in 2001)
- Dual-CPU Pentium-III 1.3 GHz running Linux
- ×2.6 per CPU
- Pro: relatively low hardware costs
- Pro: small number of nodes
- Pro: 1u form factor
- Con: little-endian (byte-swapping modifications needed)
- Con: mixed system

Upgrade Option 3 (at decision time in 2001)
- Dual-CPU UltraSPARC-III 750 MHz
- ×1.8 per CPU
- Pro: no software modifications necessary
- Con: high cost (factors higher; only server hardware available)
- Con: 4u form factor
- 4-CPU (or more) machines not considered because of UDP network stack and SMP scaling issues

The Choice
- After extensive consideration of all options
- Decision to go ahead with Pentium-III and Linux
- Plan for 50 dual-CPU Pentium-III machines

Adapting the Software
- Data flow: retrofit endian conversion
- PPC and SPARC are big-endian; the original design did not foresee byte swapping, for performance reasons
- All byte reordering done on the Linux side
- Bulk 32-bit swapping of whole datagrams
- Takes care of control and navigational information
- Accessing the data from Linux:
- Payload contains byte- and 2-byte-aligned data
- Data are 32-bit pre-swapped
- Fix up byte- and 2-byte-aligned structures on demand
- Keep on-disk formats big-endian
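The bulk-swap-plus-fixup scheme above can be sketched roughly as follows. This is a hypothetical C illustration, not the actual BaBar code; all function names are invented. On a little-endian node, swapping a whole datagram 32 bits at a time makes every big-endian 32-bit control/navigation word directly readable, while byte- and 16-bit-aligned payload fields end up position-mirrored within their word and are fixed up only when accessed:

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Swap the byte order of one 32-bit word. */
static inline uint32_t swap32(uint32_t w)
{
    return (w >> 24) | ((w >> 8) & 0x0000ff00u)
         | ((w << 8) & 0x00ff0000u) | (w << 24);
}

/* Bulk-swap a whole datagram in place, 32 bits at a time. After this,
   every big-endian 32-bit word reads correctly as a native integer on a
   little-endian host (one linear pass handles all control/navigation
   information). */
void swap_datagram(void *datagram, size_t nbytes)
{
    uint8_t *p = datagram;
    for (size_t i = 0; i + 4 <= nbytes; i += 4) {
        uint32_t w;
        memcpy(&w, p + i, sizeof w);   /* alignment-safe load */
        w = swap32(w);
        memcpy(p + i, &w, sizeof w);
    }
}

/* On-demand fixups for the "pre-swapped" payload: inside a bulk-swapped
   word the two 16-bit halves have traded places (each already in host
   order), and single bytes appear mirrored within the word. The index
   arguments refer to positions in the ORIGINAL big-endian layout. */
uint16_t read_u16(const uint8_t *word_base, int halfword)   /* 0 or 1 */
{
    uint16_t v;
    memcpy(&v, word_base + 2 * (1 - halfword), sizeof v);
    return v;
}

uint8_t read_u8(const uint8_t *word_base, int byte_offset)  /* 0..3 */
{
    return word_base[3 - byte_offset];
}
```

Only one linear pass per datagram is paid up front; the finer-grained fixups cost nothing unless a byte- or halfword-aligned field is actually read, matching the "fix up on demand" approach on the slide. (The sketch assumes a little-endian host, as on the Pentium-III farm nodes.)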

Choosing the Hardware
- Limited resources and time for evaluation
- Start out with systems known to be reliable for the Windows group at SLAC: Dell PowerEdge 1550
- Optical Gigabit Ethernet (then: no experience with copper at SLAC)
- Acquire a few machines for testing

Testing in Lab and Real System
- Test-stand testing of all software
- Parasitic operation of a few nodes in the real system for a few months
- Used the port monitoring (SPAN) feature of the switch
- Feed copies of production datagrams to Linux nodes – no reply required
- Run event building software on the mirrored events
- No stability problems observed

Purchasing, Installation and Tests
- By the time testing was completed, the hardware of choice was no longer available
- Re-tested the next-generation machines: Dell PowerEdge, 1.4 GHz → OK
- Purchase of 50 machines in late spring 2002, installation in the summer shutdown
- Keep enough Ultra 5s in place for shutdown DAQ needs
- New farm: 2½ water-cooled racks
- Regular shelves, machines stacked two high
- No significant hardware problems (1 disk, 1 main board dead on arrival)

1u Farm Nodes

Other Improvements
- In parallel: ROM Gigabit Ethernet
- Originally planned for later, but we realized that this could be done by the end of the shutdown too
- Develop optimized zero-copy UDP stack
- Install optical Gigabit Ethernet PMC on readout modules
- Split crates to balance the amounts of data
- Improve feature-extraction ROM software

Result and Summary
- Very smooth transition
- System now capable of a 5.5 kHz L1-accept rate at current backgrounds
- Original design performance: 2 kHz
- System working very well in routine data taking
- No crashes
- No system stability problems
- No hardware problems

Further Improvements and Longer-Term Plans
Improvements
- Multi-CPU support: the L3 worker is a single thread, so run more than one L3 process per node (currently being implemented)
- Migrating more software to Linux
Longer-Term Plans
- Keep Sun server infrastructure, however look into Linux as file servers
- Replace more systems with Linux machines
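The "more than one L3 process per node" idea can be illustrated with a minimal sketch. This is hypothetical, not the BaBar implementation; `run_l3_worker` and `spawn_l3_workers` are invented names, with the former standing in for the real single-threaded trigger loop. Forking one worker per CPU lets a dual-CPU node use both processors without making the L3 code itself thread-safe:

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Stand-in for the real single-threaded L3 trigger loop: process events
   until end of run, then return 0 on success. */
static int run_l3_worker(int worker_id)
{
    (void)worker_id;
    /* ... event loop would go here ... */
    return 0;
}

/* Fork one single-threaded L3 worker per CPU and wait for all of them.
   Returns the number of workers that did not exit cleanly, or -1 on a
   setup error. */
int spawn_l3_workers(int nworkers)
{
    pid_t pids[32];
    if (nworkers < 1 || nworkers > 32)
        return -1;

    for (int i = 0; i < nworkers; ++i) {
        pid_t pid = fork();
        if (pid < 0)
            return -1;                    /* fork failed */
        if (pid == 0)
            _exit(run_l3_worker(i));      /* child: run one worker, exit */
        pids[i] = pid;
    }

    int failures = 0;
    for (int i = 0; i < nworkers; ++i) {
        int status = 0;
        waitpid(pids[i], &status, 0);
        if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
            ++failures;
    }
    return failures;
}
```

Running two such processes on a dual-CPU node sidesteps the thread-safety and SMP-scaling work a multi-threaded worker would require, which fits the slide's choice of this as the near-term improvement.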