Implementation of the STAR Data Acquisition System using a Myrinet Network
J.M. Landgraf, M.J. LeVine, A. Ljubicic, Jr., M.W. Schulz (Brookhaven National Laboratory)
J.M. Nelson (University of Birmingham)
C. Adler, J.S. Lange (University of Frankfurt)
First Collisions at RHIC! STAR Control Room, June 12th, 2000, 9:00 pm
Outline
  - The STAR DAQ System
    - Components
    - Event Building Network
  - Introduction to Myrinet
    - Myrinet Implementation
    - Myrinet Software (GM)
  - STAR DAQ Software
    - myriLib
  - Year 2 Event Builder
  - Performance & Reliability
STAR DAQ
  - DAQ Readout Units
    - VME crate-based
    - Custom RBs with ASICs & i960 CPUs
    - Motorola MVME Detector Broker
  - L3
    - Linux farm (Compaq Alpha workstations)
    - Physics-based build decision
  - Event Building Network
    - Token management
    - Event building
    - Event storage and buffering
DAQ / L3 Event Building Network
  - Squares: MVME / VxWorks
  - Circles: Alpha workstations / Linux
  - Diamonds: Ultrasparc workstations / Solaris
What is Myrinet?
  - Commercial network from Myricom (www.myri.com)
  - Low cost (~$1K / card, $4-6K / switch)
  - PCI / PMC network interface cards
  - High bandwidth (1.28 + 1.28 Gb/sec, i.e. 1.28 Gb/sec in each direction)
  - Low latency (13 µs)
  - Scalable switched topology
  - Network control performed in software
  - Open-source MCP / driver software
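To get a rough feel for what the quoted latency and bandwidth mean for different message sizes, the following sketch uses a deliberately simple model: every message pays a fixed 13 µs and is then serialized at 1.28 Gb/sec. This is an assumption made for illustration, not a measurement. It ignores PCI bus limits, host overhead and switch contention, and the function names are hypothetical.

```c
#include <stddef.h>

/* Figures quoted on the slide. */
#define LINK_GBIT_PER_SEC 1.28   /* assumed to apply per direction */
#define LATENCY_USEC      13.0

/* Link rate in bytes per microsecond: 1.28 Gb/s = 160 MB/s = 160 B/us. */
static double link_bytes_per_usec(void)
{
    return LINK_GBIT_PER_SEC * 1e9 / 8.0 / 1e6;
}

/* Modelled one-way transfer time for one message, in microseconds. */
double transfer_time_usec(size_t nbytes)
{
    return LATENCY_USEC + (double)nbytes / link_bytes_per_usec();
}

/* Modelled throughput in MB/s (1 MB = 10^6 bytes) when messages of
   nbytes are sent one after another with no overlap.
   Bytes per microsecond is numerically equal to MB/s. */
double effective_mb_per_sec(size_t nbytes)
{
    return (double)nbytes / transfer_time_usec(nbytes);
}
```

Under this model a 2080-byte message takes 26 µs and so reaches half of the 160 MB/sec link rate; shorter messages are dominated by the fixed latency.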
Myrinet Architecture
  - Network card interface (PCI64B)
    - LANai processor controls network
    - Local memory buffer
    - Both network & PCI DMA engines
  - Switches
    - Cut-through wormhole routing
    - CRC is recalculated at each stage, including header
    - Stop/Go flow control mediated with small slack buffer
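One way to see why the CRC has to be recalculated at each stage is to model a source-routed packet whose header shrinks at every switch. The sketch below assumes a hypothetical packet layout (leading route bytes, payload, one trailing CRC byte) and a generic CRC-8 with polynomial 0x07; neither is taken from the slides. It also handles whole packets in memory, which a real cut-through switch does not do.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Generic CRC-8, polynomial x^8 + x^2 + x + 1 (0x07), initial value 0.
   Chosen for illustration only. */
uint8_t crc8(const uint8_t *data, size_t len)
{
    uint8_t crc = 0;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 0x80) ? (uint8_t)((crc << 1) ^ 0x07)
                               : (uint8_t)(crc << 1);
    }
    return crc;
}

/* Hypothetical layout: [route bytes...][payload...][crc].
   Writes the CRC of everything before the last byte into the last byte. */
void seal_packet(uint8_t *pkt, size_t len)
{
    if (len >= 1)
        pkt[len - 1] = crc8(pkt, len - 1);
}

/* Returns 1 if the trailing CRC matches the rest of the packet. */
int packet_ok(const uint8_t *pkt, size_t len)
{
    return len >= 1 && crc8(pkt, len - 1) == pkt[len - 1];
}

/* One switch stage: check the CRC, take the leading route byte as the
   output port, strip it, and recompute the CRC over the shorter packet.
   Returns the new length, or 0 if the packet is too short or corrupt. */
size_t switch_hop(uint8_t *pkt, size_t len, uint8_t *out_port)
{
    if (len < 2 || !packet_ok(pkt, len))
        return 0;
    *out_port = pkt[0];
    memmove(pkt, pkt + 1, len - 1);   /* header changed: old CRC is stale */
    len -= 1;
    seal_packet(pkt, len);
    return len;
}
```

Because the route byte consumed at each hop is covered by the CRC, the value computed by the sender is no longer valid after the first switch, so each stage has to produce a new one.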
Myrinet Throughput
  We tested 32 / 64 bit Myrinet cards on:
  - VxWorks: MVME 2604, MVME 2306
  - Linux: Compaq Alpha
  - Linux: Intel
  - Solaris: Ultrasparc
Myrinet Software
  - Network mapping
    - Each Myrinet node maintains a list of port offsets to each other node
    - Dynamic and static mapping supported
    - Alternate routes can be forced by user
  - Myrinet driver (GM)
    - Variable-length messages
      - Sender / receiver provide buffers in advance for each size
      - Sender / receiver notified and must return buffer to GM
    - Directed sends
      - DMA directly to host memory
      - Receiver not notified
    - GM imposes structure on user program
      - Poll / block on gm_receive()
      - GM is not thread-safe
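The requirement that receivers provide buffers in advance for each size can be illustrated with a small bookkeeping model. The sketch below is not the GM API: the power-of-two size-class rule, the limit of 2^20 bytes and all names are assumptions made for illustration. It only tracks how many buffers of each class are currently posted.

```c
#include <stddef.h>
#include <string.h>

#define MAX_SIZE_CLASS 20   /* hypothetical limit: 2^20 bytes */

/* Smallest class c such that len bytes fit in 2^c bytes.
   Returns MAX_SIZE_CLASS + 1 if the message is too large.
   (Assumed rule; the real driver's rule may differ.) */
unsigned size_class_for_length(size_t len)
{
    unsigned c = 0;
    while (c <= MAX_SIZE_CLASS && ((size_t)1 << c) < len)
        c++;
    return c;
}

/* Number of receive buffers currently posted, per size class. */
typedef struct {
    unsigned posted[MAX_SIZE_CLASS + 1];
} recv_pool;

void pool_init(recv_pool *p)
{
    memset(p, 0, sizeof *p);
}

/* The receiver hands one buffer of the given class to the network layer.
   Also used to return a buffer after its message has been processed. */
int pool_provide(recv_pool *p, unsigned size_class)
{
    if (size_class > MAX_SIZE_CLASS)
        return 0;
    p->posted[size_class]++;
    return 1;
}

/* A message of len bytes arrives. It can be delivered only if a buffer
   of the matching class was posted beforehand. Returns 1 on delivery. */
int pool_receive(recv_pool *p, size_t len)
{
    unsigned c = size_class_for_length(len);
    if (c > MAX_SIZE_CLASS || p->posted[c] == 0)
        return 0;
    p->posted[c]--;
    return 1;
}
```

In this model a 120-byte message falls into class 7 (128 bytes), and a receiver that forgets to return its buffers eventually stops accepting messages of that class.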
DAQ Software
  - Software is message based
      for(;;) { msgQReceive(&msg); switch(msg.cmd) { ... } }
  - ICCP Message Protocol
    - 120 byte messages
    - Standard header
    - Sending is routed to the proper network
      - daqMsgSend(node, &msg)
      - node/task/domain
  - Destinations: Local Queue, Myrinet, Ethernet, VME
  - Each network has an associated daemon
  - Diagram labels: myriLib, ethComLib, vmComLib, que[task]
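The routing idea behind daqMsgSend(node, &msg) can be sketched as a lookup from destination node to transport, with local messages placed on a per-task queue. Everything below is an assumption made for illustration: the header fields, the routing table, the queue depth and the function names are hypothetical, and only the 120-byte message size and the four destinations come from the slide.

```c
#include <stdint.h>
#include <string.h>

#define MAX_NODES   8
#define MAX_TASKS   4
#define QUEUE_DEPTH 8

/* Hypothetical 120-byte message: 8-byte header plus 112-byte payload. */
typedef struct {
    uint16_t dst_node, dst_task, cmd, payload_len;
    uint8_t  payload[112];
} iccp_msg;

typedef enum {
    VIA_NONE = 0, VIA_LOCAL_QUEUE, VIA_MYRINET, VIA_ETHERNET, VIA_VME
} transport;

typedef struct {
    uint16_t  local_node;
    transport route[MAX_NODES];   /* transport used to reach each node */
} router;

/* Per-task ring buffers standing in for que[task]. */
typedef struct {
    iccp_msg slot[QUEUE_DEPTH];
    unsigned head, count;
} msg_queue;
static msg_queue que[MAX_TASKS];

int queue_put(uint16_t task, const iccp_msg *m)
{
    if (task >= MAX_TASKS || que[task].count == QUEUE_DEPTH)
        return 0;
    msg_queue *q = &que[task];
    q->slot[(q->head + q->count) % QUEUE_DEPTH] = *m;
    q->count++;
    return 1;
}

int queue_get(uint16_t task, iccp_msg *out)
{
    if (task >= MAX_TASKS || que[task].count == 0)
        return 0;
    msg_queue *q = &que[task];
    *out = q->slot[q->head];
    q->head = (q->head + 1) % QUEUE_DEPTH;
    q->count--;
    return 1;
}

/* Stand-in for the send call: picks the transport for the node. Local
   messages are queued directly; for remote nodes the sketch only reports
   which network daemon would carry the message. */
transport sketch_msg_send(const router *r, uint16_t node, const iccp_msg *m)
{
    if (node == r->local_node)
        return queue_put(m->dst_task, m) ? VIA_LOCAL_QUEUE : VIA_NONE;
    if (node >= MAX_NODES)
        return VIA_NONE;
    return r->route[node];
}
```

A receiving task would then loop on queue_get() for its own task index and switch on cmd, mirroring the for(;;) loop shown on the slide.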
myriLib
  - DAQ library which wraps GM
  - myriMsgSend(), myriMemCpy()
  - What does it do?
    - Manages the DMA message buffers
    - Handles callback functions
    - Thread synchronization
    - Misc. (byte order, 32 vs. 64 bit, etc.)
    - Bypasses DMA limitations on Solaris
  - Several flavors
    - Threaded vs. process
    - Buffered vs. unbuffered DMA copies
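The byte-order item matters because the network mixes CPU architectures with different endianness. One common way to handle this, shown here as a sketch and not as what myriLib actually does, is to have the sender write a known marker word first so that the receiver can detect a mismatch and swap every 32-bit word. The marker value and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical marker the sender writes as the first word. */
#define ORDER_MARK 0x01020304u

/* Reverses the byte order of one 32-bit word. */
uint32_t swap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

/* Converts a received buffer of 32-bit words to native byte order.
   words[0] must hold ORDER_MARK as written by the sender.
   Returns 0 if the data was already native, 1 if it was swapped,
   and -1 if the marker is not recognised (buffer left untouched). */
int normalize_words(uint32_t *words, size_t nwords)
{
    if (nwords == 0)
        return -1;
    if (words[0] == ORDER_MARK)
        return 0;
    if (swap32(words[0]) != ORDER_MARK)
        return -1;
    for (size_t i = 0; i < nwords; i++)
        words[i] = swap32(words[i]);
    return 1;
}
```

Swapping on the receiver only when the marker is reversed keeps the common case, where sender and receiver share a byte order, free of any extra work.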
myriLib Operations Threaded (VxWorks tasks) myriLib Process myriLib with Buffering These lead to extra latency/reduced throughput for directed sends
myriMemCopy() Throughput
Multi-Sender myriLib Throughput
  - 32-bit card MVME 2306 senders
  - 64-bit Ultrasparc receiver
Year 2 Event Building
  - Solaris Myrinet cards allow us to implement the EVB on the BB node
    - Removes a node from the network
    - Simplifies software
    - Replaces point-to-point transfer with many-to-point transfer
    - More memory (1.5 GB vs. 256 MB)
    - Throughput increase via multiple pftp streams (30-35 MB/sec vs. 25 MB/sec)
    - Multi-CPU Ultrasparc machine
    - Compression on built events?
  - Preliminary results show
    - Improved small-event performance (from 25 evts/sec to 140 evts/sec)
    - Improved throughput to BB (from 28 MB/sec to 100 MB/sec)
Year 1 Performance & Reliability
  - RHIC data run
    - 3 months data taking
    - ~15 days integrated stable beam
    - Little down time due to DAQ
  - STAR performance
    - ~10 TB data
    - ~2.03 million events
  - Myrinet performance
    - 4 known message failures (out of >10^8)
      - Cause not known
      - Reported by software
      - Resulted in aborted run
    - No known data corruption
Au-Au Central Collision
  - 130 GeV Au-Au collision viewed through the L3 Event Display
  - Several thousand tracks
  - Tracking in real time (~100 msec)