
Jan 3, 2001  Brian A Cole  Page #1

EvB 2002: Major Categories of Issues/Work
- Performance (event rate)
  - Hardware
    - Next generation of PCs
    - Network upgrade
- Control (Sean)
  - Solving CORBA speed/reliability problems
  - Detailed monitoring of components
- Infrastructure (Sean)
  - Compiler, STL implementation
  - Error logging

Page #2

EvB Performance: Current Status
- Data rate
  - Have achieved rates of > 150 Mbyte/s.
  - No throughput limitations observed so far.
  - Have x3 headroom w/ existing hardware.
- Event rate
  - Currently limited to ~1.3 kHz for the full system.
  - Individual SEBs can run as fast as 2.3 kHz (e.g. the GL1 SEB, on 933 MHz machines).
  - More typically ~1.8 kHz for "larger" granules.
  - Evidence that event rate depends on data volume (an issue w/ PCI performance?).

Page #3

EvB Performance: Event Rates
How to increase event rates?
- Use full functionality of JSEB
  - Multiple event buffering
  - Interrupts instead of polling
  - Effects of these changes unknown, but will be studied in the existing system post-run.
- Code improvements
  - Have a number of code improvements "in hand" that we have not been able to fully test.
  - Many more places for optimization.
- Reasonably achievable goal: > 3 kHz with existing hardware.

Page #4

EvB Performance: Event Rates (2)
- Hardware upgrade
  - Dual-processor machines: factor of 2 (will verify after the run using ATP PCs).
  - Faster CPUs, e.g. > 2 GHz P4 w/ DDR memory: SEB performance scales with CPU clock speed, so another factor of 2.
- Plan: 3 kHz x 2 (dual) x 2 (cpu) = 12 kHz
- Contingency
  - JSEB optimization
  - Better compiler (?) and improved STL
  - Improved PCI performance (66 MHz, 64 bit)
  - Will study after the run using ATP PCs.
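The upgrade plan above is pure multiplicative scaling, and the slide itself flags both factors as still to be verified. As a one-line sanity check of the arithmetic:

```python
# Back-of-envelope check of the slide's event-rate plan. The factors are
# taken from the slide; treating them as independent and multiplicative
# is the slide's own assumption, not a measured result.
base_rate_khz = 3.0      # goal with existing hardware after code tuning
dual_cpu_factor = 2.0    # dual-processor machines (to be verified on ATP PCs)
clock_factor = 2.0       # SEB rate scales w/ CPU clock (~1 GHz -> >2 GHz P4)

projected_khz = base_rate_khz * dual_cpu_factor * clock_factor
print(projected_khz)  # 12.0, matching the slide's 3 kHz x 2 x 2 = 12 kHz
```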

Page #5

Upgrade Goals as of Nov 01
- Goal of original PHENIX CDR: 2 Gbyte/s
  - Based on x10 "design" luminosity.
  - Since then the "design" has increased by a factor of 2 (4 if 116 bunches works).
  - Reasonable to expect that RHIC may reach x10 the original design in the next few years.
- Nominally: increase bandwidth by a factor of 4
  - Increase data throughput of the SEBs while keeping their number constant.
  - Increase number and throughput of ATPs.
- Tony's figure: 10 kHz x ~150 kbyte/event = 1.5 Gbyte/s
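The 1.5 Gbyte/s figure follows from simple arithmetic; the ~150 kbyte average event size is my reading of the slide, since that is the size that makes a 10 kHz event rate come out to 1.5 Gbyte/s:

```python
# Back-of-envelope bandwidth target. The event size is an assumed
# average consistent with the slide's 1.5 Gbyte/s result, not a number
# quoted directly in the transcript.
event_rate_hz = 10_000        # Tony's 10 kHz target
event_size_bytes = 150_000    # ~150 kbyte/event (assumed average)

throughput_gbyte_s = event_rate_hz * event_size_bytes / 1e9
print(throughput_gbyte_s)  # 1.5
```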

Page #6

EvB Upgrade Goals: How to Achieve
- Nominally:
  - Upgrade to a 40 Gbit/s switch.
  - SEBs w/ OC-12 connections (> 60 Mbyte/s).
  - 128 ATPs w/ OC-12 connections.
- Problem: prohibitively expensive if we buy from FORE (~$500k for ATM hardware).
- Alternative: switch to Gbit Ethernet.
  - Better than the ATM solution, at 1/4 the cost.

Page #7

EvB Upgrade: Gbit Ethernet
Possible implementation:
- HP Procurve (256 Gbit/s)
  - Provides 64 Gbit Ethernet ports.
  - Inefficient to equip w/ fast Ethernet: use Gbit for the ATPs, or split w/ smaller switches.
- In principle, get 4 Gbyte/s throughput.
- Cost estimate
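The "4 Gbyte/s in principle" figure is consistent with splitting the ports evenly between senders and receivers; that split is my reading, not stated on the slide:

```python
# Back-of-envelope check of the 4 Gbyte/s figure, assuming half of the
# 64 Gbit ports carry SEB->switch traffic and the other half carry
# switch->ATP traffic (an assumption; the slide does not spell this out).
ports = 64
gbit_per_port = 1.0

one_way_gbit_s = (ports / 2) * gbit_per_port   # 32 Gbit/s each direction
one_way_gbyte_s = one_way_gbit_s / 8
print(one_way_gbyte_s)  # 4.0
```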

Page #8

Practical Considerations: Technical Issues
There were (are) some good reasons for preferring ATM over Ethernet:
1) Multicasting
2) Flow control (to protect against output-port overloading)
3) Hardware CRC and frame re-assembly
Professional performance studies:
- Layer-3 switches now available can handle #1 and #2.
- #3 is still an issue, but some NICs are moving this functionality into hard(firm)ware.
Would still choose ATM but for the cost difference.

Page #9

Practical Considerations: How Hard Would the Switch to Gbit Be?
In principle, not that hard:
- The implementation of ATM communication uses the socket interface (Winsock2).
- The change to TCP/UDP is "straightforward", as we already have C++ classes that wrap Winsock2 TCP sockets.
- Winsock provides mechanisms for establishing multicast groups and flow control.
If we stick with NT for now, migration to Gbit Ethernet should be easy.
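The migration argument rests on already having socket-wrapper classes. The sketch below illustrates the idea over loopback in Python; the class and method names (`EventSocket`, `send_fragment`) are illustrative, not the actual PHENIX C++/Winsock2 classes, and the length-prefix framing stands in for the frame re-assembly that ATM (AAL5) did in hardware:

```python
import socket
import threading

class EventSocket:
    """Illustrative wrapper around a connected TCP socket, in the spirit
    of the C++ Winsock2 wrappers the slide mentions (names are mine)."""
    def __init__(self, sock):
        self.sock = sock

    def send_fragment(self, payload: bytes):
        # TCP is a byte stream, so the receiver needs the fragment size
        # up front to re-assemble it; a 4-byte length prefix does this.
        self.sock.sendall(len(payload).to_bytes(4, "big") + payload)

    def recv_fragment(self) -> bytes:
        size = int.from_bytes(self._recv_exact(4), "big")
        return self._recv_exact(size)

    def _recv_exact(self, n: int) -> bytes:
        buf = b""
        while len(buf) < n:
            chunk = self.sock.recv(n - len(buf))
            if not chunk:
                raise ConnectionError("peer closed mid-fragment")
            buf += chunk
        return buf

# Loopback demo: one "SEB" sends a fragment to one "ATP".
server = socket.socket()
server.bind(("127.0.0.1", 0))
server.listen(1)
port = server.getsockname()[1]

received = []
def atp():
    conn, _ = server.accept()
    received.append(EventSocket(conn).recv_fragment())

t = threading.Thread(target=atp)
t.start()

client = socket.socket()
client.connect(("127.0.0.1", port))
EventSocket(client).send_fragment(b"GL1 event 42")
t.join()
print(received[0])  # b'GL1 event 42'
```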

Page #10

Gbit Ethernet: Study Plan
How to make sure Gbit Ethernet will work:
1. Study single-PC send/receive rates
   a) Use the ATP machines (66 MHz, 64-bit PCI).
   b) Evaluate for both TCP and UDP:
      i. achievable rates
      ii. CPU utilization
   c) Check performance of different NICs.
   d) Learn how to use flow control in Winsock.
2. Experiment w/ fast Ethernet using Martin's switch (?)
   a) In the same machines, compare w/ ATM.
3. Test JSEB reading/sending to evaluate PCI.
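Step 1 of the plan might be sketched as below (Python over loopback for illustration; the real study would use Winsock on the ATP machines, and a loopback number only bounds the software stack, since NIC and PCI behaviour needs the actual hardware):

```python
import socket
import threading
import time

def measure_tcp_throughput(n_chunks=128, chunk=64 * 1024):
    """Rough single-PC TCP send/receive rate test, in the spirit of
    step 1 of the study plan. Returns the achieved rate in Mbyte/s."""
    total = n_chunks * chunk
    server = socket.socket()
    server.bind(("127.0.0.1", 0))
    server.listen(1)
    port = server.getsockname()[1]

    def sink():
        # Receiver side: drain exactly `total` bytes, then close.
        conn, _ = server.accept()
        got = 0
        while got < total:
            data = conn.recv(chunk)
            if not data:
                break
            got += len(data)
        conn.close()

    t = threading.Thread(target=sink)
    t.start()

    client = socket.socket()
    client.connect(("127.0.0.1", port))
    payload = b"\x00" * chunk
    start = time.perf_counter()
    for _ in range(n_chunks):
        client.sendall(payload)
    client.close()
    t.join()
    elapsed = time.perf_counter() - start
    return total / elapsed / 1e6  # Mbyte/s

rate = measure_tcp_throughput()
print(f"loopback TCP: {rate:.0f} Mbyte/s")
```

The same harness, with the socket switched to `SOCK_DGRAM` and a loss counter added, would cover the UDP half of step 1b.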

Page #11

EvB Robustness Issues
- SEBs
  - Hanging/crashing (unhandled exception) on bad FEM data; simply needs further debugging.
  - GL1 data errors, FEM bad event numbers: need a logging system to alert the operator.
  - Unhandled exception in the transmit thread (rare, but more frequent at high rate); needs debugging.
- EBC
  - Problem with end-run in optimized code; disappears in v3_0c.

Page #12

EvB Robustness (cont.)
- EBC (cont.)
  - Occasional crashes w/ high-rate Au-Au running.
  - Source of the problem understood; needs work to clean up the end-run procedure in the EBC.
- ATP
  - Problem with ET failures largely solved.
  - Problem w/ ATP thread killing/restarting (Sean).
  - Problem with ATPs starting to generate unhandled exceptions on every event at high rate; needs debugging time.
  - ATP access to the Obj database: need to delay database initialization.

Page #13

EvB Open Issues / Other Work
- CORBA
  - Slow setup of CORBA connections (Sean).
  - Upgrading Orbix on the EvB machines.
- Logging
  - Critically important for reporting failures (GL1, FEM, EvB, ...) to the operator.
- Flow control
  - Need to automate flow-control settings.
- Use a (redundant) application/file server
  - Simplify system setup.
  - Avoid the need to manually distribute code.
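In outline, the logging item amounts to tagging every error with the component that raised it and routing it to the operator. A minimal sketch, with Python's logger hierarchy standing in for whatever the NT implementation would use, and all component and message strings illustrative:

```python
import logging

# Sketch of per-component error reporting: each EvB component (GL1 SEB,
# EBC, ATP, ...) logs under a name identifying the source, and a single
# handler forwards errors to the operator. The handler here just collects
# strings; a real one would drive the run-control display or an alarm.
alerts = []

class OperatorAlertHandler(logging.Handler):
    """Stand-in for whatever actually notifies the shift operator."""
    def emit(self, record):
        alerts.append(f"{record.name}: {record.getMessage()}")

evb_log = logging.getLogger("evb")
evb_log.setLevel(logging.ERROR)
evb_log.addHandler(OperatorAlertHandler())

# Child loggers propagate up to "evb", so components need no extra setup.
logging.getLogger("evb.seb.gl1").error("bad FEM event number in packet")
logging.getLogger("evb.ebc").error("end-run cleanup incomplete")

print(alerts)
```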