The PHENIX Event Builder. David Winter, Columbia University, for the PHENIX Collaboration. CHEP 2004, Interlaken, Switzerland.

Slide 2: Overview
Introduction
– Heavy Ion Collisions
– The Relativistic Heavy-Ion Collider
– The PHENIX experiment and its DAQ
The Event Builder
– Hardware
– Software
– Monitoring & Performance
Present and Future Development
Summary

Slide 3: Begin at the Beginning
Big Bang: formation of the universe approximately 13.7 billion years ago
Condensation of "normal" matter (protons and neutrons) microseconds later
Water: steam → water → ice
Hadronic matter: free quarks and gluons → protons, neutrons, etc.
$64,000 Question: Can we recreate the hot and dense matter that existed at the beginning of the universe?

Slide 4: Torturing the Nucleus: Heavy Ion Collisions
"Cartoon" of what we imagine to be the phase diagram of hadronic matter (temperature vs. baryon density)
Lattice QCD calculations have long indicated the existence of a phase transition

Slide 5: Relativistic Heavy Ion Collider
3.83 km circumference, two independent rings
Capable of colliding ~any nuclear species on ~any other species
Energy:
– 500 GeV for p-p
– 200 GeV for Au-Au (per N-N collision)
Luminosity:
– Au-Au: 2 x 10^26 cm^-2 s^-1
– p-p: 2 x 10^32 cm^-2 s^-1 (polarized)
As if that weren't enough, it's visible from space!

Slide 6: PHENIX
The Pioneering High-Energy Nuclear Interaction Experiment
Two central arms for measuring hadrons, photons and electrons
Two forward arms for measuring muons
Event characterization detectors in the center

Slide 7: Data Collection: The Challenge
High rates
– (Au-Au) Collisions occur at about 10 kHz, while the beam crossing rate is 9.6 MHz
Interest in rare physics processes
– J/ψ (μ and e channels), high-pT spectra, others
Large event sizes (Run-4: >200 kB/event)
How do we address these problems?
– Triggering: most crossings don't interact, so we need to select those that do. Selection is based on "interesting" physics: the event has muon tracks, or high energy or momentum charged or neutral particles.
– Buffering & pipelining: need a "deadtime-less" DAQ
– High bandwidth (Run-4: up to 400 MB/s archiving)
– Fast processing (Level-2 triggering, for example)
– Granules and Partitions: a Granule is a subsystem unit; a Partition consists of one or more granules. All granules in a partition receive the same triggers. Multiple partitions can run in parallel, independent of each other.
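(For scale, taking the quoted numbers at face value: 400 MB/s of archiving at >200 kB/event corresponds to roughly 2 kHz of archived events, consistent with the peak event-building rate quoted later in this talk.)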

Slide 8: The PHENIX DAQ
[Block diagram of the PHENIX DAQ: FEMs, GTM, LL1, GL1 and DCMs feed the Event Builder (SEBs, Gigabit switch, ATPs, EBC) and temporary event storage. Some systems are used to make triggers, some aren't.]

Slide 10: The Event Builder at-a-glance
[Diagram: Front-End Electronics and Data Collection Modules feed the Sub-Event Buffers; the Event Builder Controller coordinates the Assembly Trigger Processors, which write to Event Storage; the whole system is steered by Run Control.]

Slide 11: Event Builder Overview
Multithreaded & distributed: three functionally distinct programs derived from the same basic object
SubEvent Buffer (SEB)
– One per granule
– Physically connected to DCMs
– Collects data for a single subsystem – a "subevent"
Event Builder Controller (EBC)
– Receives event notification from the GL1 SEB
– Assigns events to ATPs
– Flushes the system when an event is archived
Assembly Trigger Processor (ATP)
– Assigned the task of assembling events
– Requests subevent data from each SEB
– Writes assembled events to short-term storage
– Notifies the EBC of event completion
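To make the division of labor concrete, here is a minimal C++ sketch of an ATP-style assembly loop. All class and method names are hypothetical (this is not the PHENIX Event Builder API), and the remote calls are reduced to stubs so that only the control flow is visible:

```cpp
// Sketch of an ATP-style assembly loop. All names are hypothetical.
#include <cstddef>
#include <vector>

struct SubEvent { int eventId; std::vector<char> payload; };

class SebProxy {
public:
    // Request the subevent for a given event id from one SEB.
    // (Stub; the real system exchanges messages over the Gigabit network.)
    SubEvent requestSubEvent(int eventId) { SubEvent s; s.eventId = eventId; return s; }
};

class EbcProxy {
public:
    EbcProxy() : lastId_(0) {}
    int  nextAssignedEvent() { return ++lastId_; }   // event id assigned by the EBC (stub)
    void notifyComplete(int /*eventId*/) {}          // tell the EBC the event is done (stub)
private:
    int lastId_;
};

void writeToShortTermStorage(int /*eventId*/, const std::vector<char>& /*event*/) {}

void assemblyLoop(EbcProxy& ebc, std::vector<SebProxy>& sebs)
{
    for (;;) {
        const int id = ebc.nextAssignedEvent();

        // Collect one subevent per granule and concatenate into a full event.
        std::vector<char> event;
        for (std::size_t i = 0; i < sebs.size(); ++i) {
            SubEvent s = sebs[i].requestSubEvent(id);
            event.insert(event.end(), s.payload.begin(), s.payload.end());
        }

        writeToShortTermStorage(id, event);  // write to the ATP's local buffer disk
        ebc.notifyComplete(id);              // lets the EBC flush the event from the SEBs
    }
}
```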

Slide 12: CPUs
105 1U rack-mounted dual-CPU servers
28 Dell PowerEdge 1000
– 1.0 GHz dual PIII
– 256 MB ECC RAM
– 8 GB Seagate U160 SCSI HDD
77 Microway machines
– 2.4 GHz dual P4 Xeon
– 1 GB ECC RAM
– 37 GB Seagate ATA100 HDD
– 45 with Tyan motherboards (400 MHz FSB)
– 32 with Supermicro motherboards (533 MHz FSB)

Slide 13: Network Interfaces
Dell PowerEdge servers
– On-board dual Intel Fast Ethernet (10/100 Mbps) controllers
– 17 retrofitted with Gigabit add-on cards: 16 Intel PRO/1000 MT Server Adapters, 1 SysKonnect SK-98xx Gigabit Adapter
Tyan-based Microways
– On-board combination Intel Fast Ethernet and Gigabit network controllers
Supermicro-based Microways
– On-board dual Intel PRO/1000 MT Server Adapter controllers

Slide 14: Gigabit Switch
Foundry FastIron 1500 Gigabit Switch
Total switching capacity = 480 Gbps
Jumbo Frame capable (9014-byte MTU used in Run-4; Run-5 configuration under study)
15 slots (10 in use)
– 2 control modules, each includes Mbps ports (module spans two slots)
– 6 16-port Gigabit modules (95 ports in use): 39 SEBs, 52 ATPs, 1 EBC, phnxbox2, phnxbox3, corba

Slide 15: JSEB Interface Card
PLX 9080 PCI controller
Input for JSEB cable from Partitioner Module
PCI 2.1 interface: 32-bit, 33 MHz, with 3.3 V or 5 V signaling
2 banks of 1 MB static RAM
Altera FLEX10K FPGA
JTAG port for programming the FPGA

Slide 16: JSEB Cont'd
Interfaces the Partitioner Module (PM) to the SEB
Data from the DCMs are transmitted via a 50-pair, 32-bit parallel cable
Firmware contained in the FPGA
– Versions 0x11 and 0x13 used in Run 4
– Version 0x27 written for Run 5
(Pseudo-)driver provided by Jungo
DMA burst mode not available in Run 4
– Version 0x27 fixes this problem
[Plot: JSEB read throughput with DMA burst mode off, 4-word DMA burst, and continuous DMA burst]

Slide 17: Software
Run 4
– Windows NT/2000
– Visual C++
– Iona Orbix ASP 6.0
– Java 1.[34].x (Run Control, Monitoring)
– Boost 1.30.x (threads, mutexes, etc.)
– Code developed on a dedicated machine, distributed to the cluster via FTP
Run 5
– Linux (SL 3.0.2)
– GCC/G++
– Iona Orbix ASP 6.1
– Java 1.[34].x
– Boost 1.30.x
– Code developed on any machine with the proper versions of CORBA and GCC; common file systems are NFS-mounted, use make install

Slide 18: CORBA
Common Object Request Broker Architecture
The networking protocol by which the run control software components talk to each other
Based on a client/server architecture across a heterogeneous computing environment
– VxWorks, Linux, Windows NT/2000
Servers implement "CORBA objects" that execute the functionality described in the member functions of the object
Clients invoke the local CORBA object's methods; this causes the server to execute the code of its corresponding member function
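As a rough illustration of the client side of this pattern, here is a minimal sketch using the standard CORBA C++ mapping. The EvbComponent interface and its configure() method are made up for illustration (they would normally come from IDL-generated stubs); only the ORB calls themselves are standard:

```cpp
// Hypothetical CORBA client: obtain a remote object reference and invoke a
// method on it. Requires the ORB-specific header (e.g. Orbix's or omniORB's)
// plus the IDL-generated stubs for the invented EvbComponent interface.
#include <iostream>

int main(int argc, char* argv[])
{
    try {
        // Initialize the ORB (Orbix, TAO, omniORB all share this call).
        CORBA::ORB_var orb = CORBA::ORB_init(argc, argv);

        // Resolve the server object, e.g. from a stringified IOR given on
        // the command line (a naming-service lookup is the other option).
        CORBA::Object_var obj = orb->string_to_object(argv[1]);

        // Narrow the generic reference to the IDL-defined interface.
        EvbComponent_var component = EvbComponent::_narrow(obj);

        // Invoking the method here causes the server process (possibly on
        // another host or OS) to execute the corresponding member function.
        component->configure();
    }
    catch (const CORBA::Exception&) {
        std::cerr << "CORBA exception caught" << std::endl;
        return 1;
    }
    return 0;
}
```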

Slide 19: Basic Component Design
State machine whose transitions are controlled by CORBA methods: Booted → Configured → Initialized → Running
Configuration data comes from Run Control; monitor data is served to client(s)
Specific EvB components are derived from this model
Event-driven, multithreaded
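A minimal sketch of what such a component base class could look like is shown below. The names and transition methods are hypothetical; the actual PHENIX base class and its CORBA bindings are not shown in the talk:

```cpp
// Hypothetical EvB component base class: a small state machine whose
// transitions would be triggered by incoming CORBA calls from Run Control.
#include <stdexcept>
#include <string>

class EvbComponent {
public:
    enum State { Booted, Configured, Initialized, Running };

    EvbComponent() : state_(Booted) {}
    virtual ~EvbComponent() {}

    // Each transition is exposed as a (CORBA-callable) method.
    void configure(const std::string& configData) {
        require(Booted, "configure");
        applyConfiguration(configData);   // configuration data from Run Control
        state_ = Configured;
    }
    void initialize() {
        require(Configured, "initialize");
        state_ = Initialized;
    }
    void start() {
        require(Initialized, "start");
        state_ = Running;                 // event-driven processing begins here
    }

    State state() const { return state_; }

protected:
    // Specific EvB components (SEB, EBC, ATP) fill in the details.
    virtual void applyConfiguration(const std::string& configData) = 0;

private:
    void require(State expected, const char* op) {
        if (state_ != expected)
            throw std::runtime_error(std::string("illegal transition: ") + op);
    }
    State state_;
};
```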

Slide 20: Level-2
Logical place to run Level-2: the ATP (contains the whole event)
Primitives: perform calculations, record themselves, available to all algorithms
Algorithms: high-level decisions based on primitive calculations
Primitives/algorithms are saved in the ATP data stream
Database primitives can read from a database or from ASCII files
Can be run during event building or as a later pass over the data stream
[Diagram: raw data at ~GB/s is reduced to ~300-400 MB/s of physics events to storage; the Level-1 throttle uses single-detector info and scaledowns, the Level-2 throttle makes decisions based on multiple-detector info]
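One way to picture the primitive/algorithm split is sketched below; this is a hypothetical interface, not the actual PHENIX Level-2 framework:

```cpp
// Hypothetical Level-2 interfaces: primitives compute and record quantities
// once per event; algorithms make accept/reject decisions from them.
#include <vector>

struct Event;          // assembled event held by the ATP (opaque here)
struct DataStream;     // ATP output stream (opaque here)

class L2Primitive {
public:
    virtual ~L2Primitive() {}
    virtual void compute(const Event& evt) = 0;     // perform calculations
    virtual void record(DataStream& out) const = 0; // save itself in the ATP stream
};

class L2Algorithm {
public:
    virtual ~L2Algorithm() {}
    // High-level decision based on the already-computed primitives.
    virtual bool accept(const std::vector<const L2Primitive*>& primitives) = 0;
};
```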

Slide 21: Performance Monitoring
Each component keeps track of various statistics
Data served via CORBA calls
A Java client displays the stats in "real time"
– Strip charts display data as a function of time
– Histograms display data as a function of component
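A minimal sketch of how a component might keep such statistics thread-safe, using the Boost mutexes mentioned on the Software slide (the class and its members are hypothetical):

```cpp
// Hypothetical per-component statistics block, protected for access from the
// worker threads and from the CORBA servant that serves monitoring calls.
#include <boost/thread/mutex.hpp>

class EvbStatistics {
public:
    EvbStatistics() : eventsBuilt_(0), bytesReceived_(0) {}

    void addEvent(unsigned long bytes) {
        boost::mutex::scoped_lock lock(mutex_);
        ++eventsBuilt_;
        bytesReceived_ += bytes;
    }

    // Called from the CORBA method that the Java monitoring client invokes.
    unsigned long eventsBuilt() const {
        boost::mutex::scoped_lock lock(mutex_);
        return eventsBuilt_;
    }

private:
    mutable boost::mutex mutex_;
    unsigned long eventsBuilt_;
    unsigned long bytesReceived_;
};
```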

Slide 22: Run Control

Slide 23: Where does the future lie?
The most important improvement we can make: port to Linux
Linux is a "sane" platform (saner than Win32)
– Win32 WAS the right platform when using ATM (ATM-free in '03!)
– Each piece performed well, but when put together the system suffered degraded performance
– We are at the limit of what Win32 can provide (peak stable performance ~2.2 kHz; the DAQ is spec'd to at least 8 kHz)
– Exploit the vast body of Linux knowledge that already exists in PHENIX
– A common development platform eases development and maintenance
– Re-evaluate the architecture and cut away the "fat"
Integrate the improvements made in Run-4 but never used
– ATPs requesting multiple events at a time
– SEB load-balancing algorithm

Slide 24: Transition to Linux
Architecture issues
– Fundamentally the architecture is sound; the primary issue is excessive overhead in the OS. Therefore we are taking a direct-porting approach wherever possible
– Use the Boost template library to enhance portability and performance
– Use of ACE is under study, though not planned for the start of Run-5
Chosen platform is Fermilab's Scientific Linux (currently using version 3.0.1, moving to 3.0.2)
– Based on Red Hat Enterprise Linux
– GCC
– Linux's new Native POSIX Thread Library (NPTL)
Linux and asynchronous I/O
– The Win32 implementation relied heavily on overlapped I/O
– Asynchronous (socket) I/O is basically not usable in Linux
– (Pseudo-)real-time event timers are not available

Slide 25: The Impact of a Linux Port
A picture is worth a thousand words: Linux beats Win32 hands-down in simple socket tests
[Plots: throughput for Win32 asynchronous, Win32 synchronous, Win32 completion-port, and Linux synchronous sockets; JSEB read test without network writing (top curves) and with network writing (bottom curves)]

Slide 26: Implementing Under Linux
All I/O is now synchronous
– At first this was a major concern, BUT it greatly simplifies the architecture, and Linux synchronous performance is much better than anticipated
Timeouts (event timers)
– In the Win32 version, all sockets were UDP
– The unreliability of UDP requires some form of timeout
– Level-1 messages from GL1 to the EBC now use TCP
– Communication between the ATPs and the EBC now uses TCP
– TCP greatly simplifies the code
– Data messages are still UDP, so a preliminary timeout mechanism has been developed (see the sketch below)
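The timeout mechanism itself is not shown in the talk; the sketch below illustrates one common way to build it on Linux, a select()-based timeout around a blocking UDP receive. It is illustrative only, not the PHENIX implementation:

```cpp
// Minimal sketch: receive a UDP datagram with a timeout using select().
#include <sys/types.h>
#include <sys/select.h>
#include <sys/socket.h>

// Returns the number of bytes received, 0 on timeout, -1 on error.
ssize_t recvWithTimeout(int sock, void* buf, size_t len, long timeoutMs)
{
    fd_set readSet;
    FD_ZERO(&readSet);
    FD_SET(sock, &readSet);

    timeval tv;
    tv.tv_sec  = timeoutMs / 1000;
    tv.tv_usec = (timeoutMs % 1000) * 1000;

    // Wait until the socket is readable or the timer expires.
    int ready = select(sock + 1, &readSet, 0, 0, &tv);
    if (ready < 0)  return -1;   // select() failed
    if (ready == 0) return 0;    // timed out: caller decides what to do next

    return recv(sock, buf, len, 0);
}
```

In the Event Builder context, a timeout on a data message would presumably trigger a re-request of the missing subevent rather than a fatal error.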

Slide 27: The Payoff
To date, all components have been ported
Approximately 35% of the servers have been converted
– distcc is a godsend: 90k lines of code build in <3 min
The timeout mechanism is in the works
Preliminary tests with 1- and 2-granule partitions have been performed

Slide 28: Summary