Event Building at Multi-kHz Rates: Lessons Learned from the PHENIX Event Builder
David L. Winter, for the PHENIX Collaboration
Real Time 2005, Stockholm, Sweden, 5-10 June 2005

Slide 2: Outline
– RHIC and PHENIX
– DAQ & Event Builder
  – DAQ and EvB overview
  – Design and history of the Event Builder
  – How to reach 5 kHz (or die trying…)
– Summary

Slide 3: RHIC
– RHIC == Relativistic Heavy-Ion Collider
– At Brookhaven National Laboratory on Long Island, NY, 70 miles from NYC
– Visible from space
– Not visible from space: my apartment

Slide 4: Taking Data at RHIC
– Two independent rings (3.83 km circumference)
– Center-of-mass energy: 500 GeV for p-p, 200 GeV for Au-Au (per N-N collision)
– Luminosity
  – Au-Au: 2 x 10^26 cm^-2 s^-1
  – p-p: 2 x 10^32 cm^-2 s^-1 (polarized)
– PHENIX faces unparalleled challenges
  – Needs to handle a wide range of collision systems
  – Designed for rare probes: requires a high-statistics, deadtime-less DAQ
  – Large, complicated detector: 14 subsystems, ~200 kB/event (Au-Au)
– Run-4: 1.5 billion events, ~200 kB/event, kHz-scale rates
– Run-5: ~170 (90) kB/event for Cu-Cu (p-p) at a 5 kHz rate => GB/s!!
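The headline throughput follows directly from event size times event rate; a trivial check of the Run-5 figure in C++ (numbers taken from this slide):

```cpp
#include <cstdio>

int main() {
    // Approximate Run-5 figures quoted above.
    const double eventSizeKB = 170.0;   // Cu-Cu event size
    const double rateKHz     = 5.0;     // event rate
    // kB/event x 1000 events/s = MB/s (decimal units).
    const double mbPerSec = eventSizeKB * rateKHz;
    std::printf("%.0f kB/event x %.0f kHz = %.0f MB/s, approaching 1 GB/s\n",
                eventSizeKB, rateKHz, mbPerSec);
    return 0;
}
```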

Slide 5: PHENIX
– PHENIX == Pioneering High-Energy Nuclear Interaction eXperiment
– Two central arms for measuring hadrons, photons, and electrons
– Two forward arms for measuring muons
– Event-characterization detectors in the center
[Photo: PHENIX, circa Jan 1999]

Slide 6: PHENIX DAQ Overview
– A granule is a detector subsystem, including its readout electronics
– A partition is one or more granules that receive the same triggers & busies
– 1 Data Collection Module (DCM) Group => 1 Partitioner Module
– 14 granules, 38 DCM groups / partitioners, 1-15 kB per group
– 6 data loggers, 32 TB of local disk space
[Diagram: data flows from the front-end modules (e.g. the Drift Chamber FEMs) to the DCMs and on to the EvB]

Slide 7: PHENIX EvB at-a-glance
– Suite of multithreaded client/server applications
  – SubEvent Buffer (SEB): receives data from the Data Collection Modules
  – Event Builder Controller (EBC): receives event notification, assigns events, acknowledges event completion
  – Assembly Trigger Processor (ATP): polls SEBs for subevents, optional stage for Level-2, writes assembled events to local storage
– All connected by a gigabit network backbone
– Control provided by Run Control (Java GUI)
[Diagram: data streams from the granules enter the SEBs, cross a Gbit switch to the ATPs and on to storage; a single EBC coordinates; Partition 1 and Partition 2 shown]
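That division of labor can be summarized in a short structural sketch of the pull-style flow as seen from one ATP. The EbcClient, SebClient, and Storage types below are hypothetical stand-ins, not the real PHENIX classes:

```cpp
// Structural sketch only: EbcClient, SebClient, and Storage are hypothetical stubs.
#include <vector>

struct SubEvent { std::vector<char> payload; };

struct EbcClient {                       // Event Builder Controller proxy
    int nextAssignedEvent() { return -1; }          // stub: event id, or -1 if none pending
};
struct SebClient {                       // SubEvent Buffer proxy
    SubEvent pull(int /*eventId*/) { return SubEvent{}; }  // stub: fetch one subevent
};
struct Storage {
    void write(const std::vector<SubEvent>&) {}     // stub: archive the assembled event
};

// The ATP asks the EBC for an assigned event, polls every SEB for the matching
// subevent, assembles the pieces, and writes the result to local storage.
void atpLoop(EbcClient& ebc, std::vector<SebClient>& sebs, Storage& out, int maxEvents) {
    for (int n = 0; n < maxEvents; ++n) {
        const int id = ebc.nextAssignedEvent();
        if (id < 0) continue;                       // nothing assigned yet
        std::vector<SubEvent> parts;
        parts.reserve(sebs.size());
        for (auto& seb : sebs) parts.push_back(seb.pull(id));
        out.write(parts);
        // completion would be acknowledged back to the EBC here
    }
}

int main() {
    EbcClient ebc; Storage storage;
    std::vector<SebClient> sebs(4);
    atpLoop(ebc, sebs, storage, 10);   // tiny dry run with the stub components
}
```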

Slide 8: Software and Design
Key features of the EvB:
– Pull architecture with automatic ATP load-balancing
– Extensive use of C++, the STL, and the Boost template library
– Slow control via CORBA (IONA's Orbix)
– 105 1U rack-mounted dual-CPU x86 servers
  – 1.0 GHz PIII & 2.4 GHz P4 Xeon
  – Gigabit NIC on PCI-X (Intel PRO/1000 MT Server)
– Foundry FastIron 1500 Gigabit switch
  – 480 Gbps total switching capacity
  – 15 slots, 12 in use (includes 128 Gigabit ports)
– JSEB: custom-designed PCI card
  – Interface between the EvB and the incoming data stream
  – Dual 1 MB memory banks (allow simultaneous read/write; see the double-buffering sketch below)
  – Programmable FPGA (Altera 10K)
  – Latest firmware enables continuous DMA: up to 100 MB/s I/O
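The dual banks allow one bank to be drained over PCI while the card fills the other. A minimal double-buffering sketch, in which jseb_read_bank() and deliverToSeb() are hypothetical placeholders rather than the real Jungo-based driver interface:

```cpp
// Double-buffering sketch for a two-bank readout card such as the JSEB.
// jseb_read_bank() and deliverToSeb() are invented placeholders, not the real driver API.
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t kBankSize = 1 << 20;   // 1 MB per on-board memory bank

// Pretend driver call: block until 'bank' (0 or 1) is full, then copy it out.
std::size_t jseb_read_bank(int /*bank*/, uint8_t* dst) {
    std::fill(dst, dst + kBankSize, 0);      // stub: a real driver would DMA into 'dst'
    return kBankSize;
}

void deliverToSeb(const uint8_t*, std::size_t) { /* hand the data to the SEB stage */ }

int main() {
    std::vector<uint8_t> buf(kBankSize);
    int bank = 0;
    for (int i = 0; i < 8; ++i) {            // a few iterations for illustration
        // While the card fills the other bank, drain this one:
        const std::size_t n = jseb_read_bank(bank, buf.data());
        deliverToSeb(buf.data(), n);
        bank ^= 1;                           // ping-pong between banks 0 and 1
    }
}
```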

Slide 9: The PHENIX EvB is an evolutionary tale
– Circa 2000: started as a 24x4 system at 20 Hz
  – TCP data transfer, DCMs => SEBs
  – EvB using Win32 + ATM
  – (EvB development started in 1997: Linux was new and ATM poorly supported in it; GbitE was but a gleam in the netops' eye)
– Run3 (2002-3): GbitE switch deployed for data transfer (control remained in the ATM domain)
– Run4 (2003-4):
  – Introduced LZO compression in the data stream (at the ATP)
  – Completed the conversion to GbitE (for data AND control)
    – Multicast used for event requests and event flushing
    – UDP used for data communication and the remaining control tasks
– Run5 (2004-5): continued growth to a 40x40 system at >5 kHz
  – Completed the port from Win32 to Linux (FNAL SL3.0)
  – All communication switched to TCP
– Overarching driver: the need to keep the EvB in operation while evolving its design to exploit advancing technology
  – We now have 5 years of development history under our belts

Slide 10: Run5 Goal: 5 kHz "or bust"
– Strategic identification of bottlenecks, continued from Run4
  – Requires targeted testing and the ability to profile in "real time"
– Major decision to port 90k+ lines of code to Linux
  – Con: the darling of the physics community does have its limitations
  – Pro: superior performance, far beyond what we expected
– Jumbo frames: used in Run4, but dropped for Run5
– Level-2 triggers to be used as a filter on archived data
  – Last used in the ATP in Run2
  – The ability to log data at >300 MB/s "obsoletes" Level-2 in the ATP

Slide 11: Performance bottlenecks
– What is our primary limitation? Our ability to:
  – Move data across the network
  – Move data across the bus (PCI or memory)
[Plot: write bandwidth (bytes/s) for Win32 asynchronous, Win32 synchronous, Win32 completion-port, and Linux synchronous sockets]
– What about the JSEB?
[Plot: JSEB read speed (MB/s) vs. read length (MB) with DMA burst mode off, 4-word DMA burst, and continuous DMA burst; continuous DMA reaches ~100 MB/s, versus ~60 MB/s otherwise]
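Measurements like the socket comparison above come down to timing blocking writes over a saturated TCP connection. A generic POSIX sketch (not the actual PHENIX test program; the receiver address is a placeholder, and any TCP sink such as `nc -l 5000 > /dev/null` on the peer will do):

```cpp
// Generic blocking-socket write-throughput probe (POSIX); peer address is a placeholder.
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <vector>

int main() {
    const char* host = "192.168.0.2";                 // placeholder receiver
    const int port = 5000;
    const std::size_t chunk = 64 * 1024;              // 64 kB per write()
    const std::size_t total = 256UL * 1024 * 1024;    // send 256 MB in total

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) { std::perror("socket"); return 1; }
    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    inet_pton(AF_INET, host, &addr.sin_addr);
    if (connect(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) != 0) {
        std::perror("connect");
        return 1;
    }

    std::vector<char> buf(chunk, 0);
    const auto t0 = std::chrono::steady_clock::now();
    std::size_t sent = 0;
    while (sent < total) {
        const ssize_t n = write(fd, buf.data(), buf.size());   // blocking, synchronous write
        if (n <= 0) { std::perror("write"); break; }
        sent += static_cast<std::size_t>(n);
    }
    const double secs =
        std::chrono::duration<double>(std::chrono::steady_clock::now() - t0).count();
    std::printf("%.1f MB in %.2f s -> %.1f MB/s\n", sent / 1e6, secs, sent / 1e6 / secs);
    close(fd);
    return 0;
}
```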

Slide 12: Monitoring Performance
– Each component keeps track of various statistics
– Data served via CORBA calls
– Java client displays the stats in "real time"
  – Strip charts display data as a function of time
  – Histograms display data as a function of component
– A critical evaluation tool in a rate-dependent application
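Each such statistic boils down to counters that the data path updates and a monitoring call snapshots into a rate. The RateCounter class below is an illustrative sketch (using modern C++ atomics rather than the era's threading libraries), not the actual PHENIX monitoring code; a CORBA servant's stats method could simply call snapshot():

```cpp
// Illustrative per-component rate counter; class and method names are invented.
#include <atomic>
#include <chrono>
#include <cstddef>
#include <cstdio>

class RateCounter {
public:
    // Called from the data path for every event handled.
    void addEvent(std::size_t bytes) {
        events_.fetch_add(1, std::memory_order_relaxed);
        bytes_.fetch_add(bytes, std::memory_order_relaxed);
    }
    // Called from the monitoring thread (e.g. inside a CORBA stats call):
    // returns events/s and MB/s since the previous snapshot, then resets.
    void snapshot(double& evtRate, double& mbRate) {
        const auto now = std::chrono::steady_clock::now();
        const double dt = std::chrono::duration<double>(now - last_).count();
        const auto ev = events_.exchange(0);
        const auto by = bytes_.exchange(0);
        last_ = now;
        evtRate = dt > 0 ? ev / dt : 0.0;
        mbRate  = dt > 0 ? by / 1e6 / dt : 0.0;
    }
private:
    std::atomic<std::size_t> events_{0}, bytes_{0};
    std::chrono::steady_clock::time_point last_{std::chrono::steady_clock::now()};
};

int main() {
    RateCounter atpStats;
    for (int i = 0; i < 5000; ++i) atpStats.addEvent(170 * 1024);  // fake 170 kB events
    double evps = 0.0, mbps = 0.0;
    atpStats.snapshot(evps, mbps);
    std::printf("%.0f events/s, %.1f MB/s (trivially fast in this toy loop)\n", evps, mbps);
}
```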

Slide 13: No lack of technical issues…
– [Vanilla] Linux: easier said than done
  – No asynchronous [socket] I/O (later this became a feature)
  – No native system-level timers (available to users)
  – No OS event architecture (as available in Win32)
  – Not all of the TCP/IP stack is necessarily implemented
    – e.g. the TCP high watermark doesn't work, and it fails silently!
  – The proprietary Jungo driver (for the JSEB) exposed an insidious memory-mapping bug in the 2.4 kernel
    – Learned the true meaning of kernel "panic" as beam time approached…
– Jumbo frames: incompatible with the NFS drivers (BUT unnecessary, given Linux's performance)
– Very little sensitivity to the parameters of the Intel PRO/1000 driver
  – Interrupt coalescence, Tx/Rx descriptors, etc.
– Spin locks vs. mutexes: choose carefully! (see the sketch below)
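On the last point, the trade-off is simple to state: a spin lock busy-waits, which is cheap for very short critical sections but wastes CPU and interacts badly with scheduling if the holder is preempted, whereas a mutex puts the waiter to sleep. A pthreads toy comparison (illustrative only, not EvB code):

```cpp
// Toy comparison of a spin lock and a mutex protecting the same trivial work.
// Build with: g++ -O2 -pthread spin_vs_mutex.cpp
#include <pthread.h>
#include <cstdio>

static pthread_spinlock_t g_spin;
static pthread_mutex_t    g_mutex = PTHREAD_MUTEX_INITIALIZER;
static long g_counter = 0;

void* spinWorker(void*) {                 // very short critical section: spinning is cheap
    for (int i = 0; i < 1000000; ++i) {
        pthread_spin_lock(&g_spin);
        ++g_counter;
        pthread_spin_unlock(&g_spin);
    }
    return nullptr;
}

void* mutexWorker(void*) {                // same work, but waiters sleep instead of spinning
    for (int i = 0; i < 1000000; ++i) {
        pthread_mutex_lock(&g_mutex);
        ++g_counter;
        pthread_mutex_unlock(&g_mutex);
    }
    return nullptr;
}

int main() {
    pthread_spin_init(&g_spin, PTHREAD_PROCESS_PRIVATE);
    pthread_t a, b;

    pthread_create(&a, nullptr, spinWorker, nullptr);
    pthread_create(&b, nullptr, spinWorker, nullptr);
    pthread_join(a, nullptr); pthread_join(b, nullptr);
    std::printf("spin-lock pass done, counter = %ld\n", g_counter);

    g_counter = 0;
    pthread_create(&a, nullptr, mutexWorker, nullptr);
    pthread_create(&b, nullptr, mutexWorker, nullptr);
    pthread_join(a, nullptr); pthread_join(b, nullptr);
    std::printf("mutex pass done,     counter = %ld\n", g_counter);
    return 0;
}
```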

Slide 14: Figure of Merit: Livetime vs. Rate
[Plot: DAQ livetime as a function of event rate, with the hardware limit indicated]
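For context on how such a curve behaves, a simple non-paralyzable dead-time model gives livetime = 1 / (1 + rate * tau). The snippet below tabulates that model for an arbitrary example tau, purely to illustrate the shape; it is not the measured PHENIX curve:

```cpp
// Textbook non-paralyzable dead-time model: livetime = 1 / (1 + rate * tau).
// tau here is an arbitrary illustrative value, not a measured PHENIX number.
#include <cstdio>

int main() {
    const double tau = 40e-6;   // example per-event dead time, in seconds
    for (double rateKHz = 1.0; rateKHz <= 8.0; rateKHz += 1.0) {
        const double rate = rateKHz * 1e3;               // convert to Hz
        const double livetime = 1.0 / (1.0 + rate * tau);
        std::printf("%3.0f kHz -> livetime %3.0f %%\n", rateKHz, 100.0 * livetime);
    }
    return 0;
}
```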

Slide 15: Summary
– RHIC and PHENIX face unparalleled requirements on rate and data volume, which drive the design and implementation of the DAQ and the Event Builder
– The PHENIX Event Builder is a fully object-oriented system that is critical to the success of our DAQ
  – All control and real-time monitoring achieved via CORBA
  – Introduces nearly zero overhead
  – Steady-state operation at >~5 kHz and >~400 MB/s in Run5
  – Can do Level-2 triggering
    – The ability to archive hundreds of MB/s makes this unnecessary (for now)
    – Future performance limitations may require it
  – Significant evolution over the past 5 years, successfully integrating new technologies as they have emerged
– The world's fastest, highest-data-volume DAQ, and it will most likely remain so until the LHC era
– Thanks to all past and present PHENIX EvB developers: B. Cole, P. Steinberg, S. Kelly, S. Batsouli, J. Jia, F. Matathias
– Additional thanks to W. A. Zajc for excellent background material