The BaBar Event Building and Level-3 Trigger Farm Upgrade S.Luitz, R. Bartoldus, S. Dasu, G. Dubois-Felsmann, B. Franek, J. Hamilton, R. Jacobsen, D. Kotturi,


1 The BaBar Event Building and Level-3 Trigger Farm Upgrade S.Luitz, R. Bartoldus, S. Dasu, G. Dubois-Felsmann, B. Franek, J. Hamilton, R. Jacobsen, D. Kotturi, I. Narsky, C. O’Grady, A. Perazzo, R. Rodriguez, E. Rosenberg, A. Salnikov, M. Weaver, M. Wittgen for the BaBar Computing Group CHEP 2003 San Diego

2 3/24/03 BaBar Farm Upgrade S.Luitz CHEP 2003
Outline
- BaBar Data Acquisition Overview
- The Old System
- Why upgrade? – Upgrade Options
- Adapting the Software
- Choosing Hardware
- Testing in the Real Environment
- Installation and Tests
- Other Performance Improvements
- Results – Summary – Plans


4 The Old System
- Ca. 150 Read-Out Modules (ROMs) in 23 crates, 300 MHz PPC
- 100 Mbit/s Ethernet: ROM → switch
- 100 Mbit/s Ethernet: switch → farm nodes
- 32 Sun Ultra-5 machines (333 MHz) in the level-3 trigger farm
- Ca. 12 ms CPU per event per node (75% CPU)
- Various other limitations in the system
- 2 kHz maximum L1 trigger rate
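The CPU figures on this slide are consistent with each other if one assumes that events are spread evenly over the farm nodes and that the 75% refers to per-node CPU use at the 2 kHz maximum rate. Neither assumption is stated on the slide, and the function names below are illustrative only. A minimal sketch of the arithmetic:

```python
def cpu_utilization(l1_rate_hz, nodes, cpu_ms_per_event):
    """Fraction of one node's CPU in use, assuming events are shared evenly."""
    events_per_node_per_s = l1_rate_hz / nodes
    return events_per_node_per_s * cpu_ms_per_event / 1000.0


def max_rate_hz(nodes, cpu_ms_per_event, utilization=1.0):
    """L1 rate the farm can absorb at a given per-node CPU utilization."""
    return nodes * utilization * 1000.0 / cpu_ms_per_event


# Slide values: 32 nodes, ca. 12 ms CPU per event per node, 2 kHz L1 rate.
print(cpu_utilization(2000, 32, 12))       # per-node CPU fraction at 2 kHz
print(max_rate_hz(32, 12, utilization=0.75))  # rate reached at 75% CPU
```

Under these assumptions the farm CPU alone would saturate somewhat above 2 kHz, which fits the slide's remark that other limitations in the system also played a role.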

5 Why Upgrade the Farm?
- Increasing luminosities from PEP-II
- Detailed projections for trigger rates and event sizes
- At decision time: not sure about L1 trigger upgrades
- Factor 2 headroom desirable
- Absorb background spikes and non-ideal machine conditions
- Have more CPU-intensive level-3 trigger algorithms
- Better statistics for fast monitoring
- Sun hardware (bought 98/99) end of life?
- Increased hardware failure rate
- Reclaim rack space

6 Farm Upgrade Requirements
- Target: 10x as much CPU power as the original 32-node Sun Ultra-5 farm (for our specific application)
- Gigabit Ethernet on the event building network
- Farm side first
- ROM side to be upgraded later
- Fit in existing 32-node rack space

7 Upgrade Option 1 (at decision time in 2001)
- Sun UltraSPARC-II 440 MHz single-CPU nodes replace existing nodes
- Add more nodes, maybe replace farm later
- x 1.1 per CPU
- Re-use BaBar offline machines?
- No software modifications
- Very large number of machines
- Factor 10 in total CPU difficult to achieve (300 machines!)
- Expensive if new machines

8 Upgrade Option 2 (at decision time in 2001)
- Dual-CPU Pentium-III 1.3 GHz
- Linux
- x 2.6 per CPU
- Relatively low hardware costs
- Small number of nodes
- 1u form factor
- Little endian (byte swapping modifications)
- Mixed system

9 Upgrade Option 3 (at decision time in 2001)
- Dual-CPU UltraSPARC-III 750 MHz
- x 1.8 per CPU
- No software modifications necessary
- High cost (factors, only server hardware available)
- 4u form factor
- 4-CPU (or more) machines not considered because of UDP network stack and SMP scaling issues

10 The Choice
- After extensive consideration of all options: decision to go ahead with Pentium-III and Linux
- Plan for 50 dual-CPU Pentium-III machines
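The three options can be compared with a rough node count, using the per-CPU speed-up factors quoted on the option slides and the target of 10x the original 32-node farm. This is only a sketch: it assumes that total CPU power scales linearly with the number of CPUs, which the slides do not claim, and the names `OPTIONS`, `nodes_needed` and `farm_factor` are illustrative.

```python
import math

BASELINE_NODES = 32   # original single-CPU Sun Ultra-5 farm
TARGET_FACTOR = 10    # requirement: 10x the original CPU power

# (speed-up per CPU relative to an Ultra-5, CPUs per node), from the slides
OPTIONS = {
    "Option 1: UltraSPARC-II 440 MHz": (1.1, 1),
    "Option 2: Pentium-III 1.3 GHz": (2.6, 2),
    "Option 3: UltraSPARC-III 750 MHz": (1.8, 2),
}


def nodes_needed(speedup_per_cpu, cpus_per_node,
                 baseline_nodes=BASELINE_NODES, target=TARGET_FACTOR):
    """Nodes required to reach the target, assuming linear scaling."""
    required = baseline_nodes * target  # in Ultra-5 CPU equivalents
    return math.ceil(required / (speedup_per_cpu * cpus_per_node))


def farm_factor(nodes, speedup_per_cpu, cpus_per_node,
                baseline_nodes=BASELINE_NODES):
    """CPU power of a farm relative to the original farm."""
    return nodes * speedup_per_cpu * cpus_per_node / baseline_nodes


for name, (speedup, cpus) in OPTIONS.items():
    print(name, nodes_needed(speedup, cpus))
```

With these inputs, Option 1 needs 291 nodes, in line with the "300 machines!" remark, Option 2 needs 62 and Option 3 needs 89. At the 1.3 GHz factor, `farm_factor(50, 2.6, 2)` puts the planned 50 dual-CPU machines at roughly 8x the original farm, provided both CPUs in a node are fully used.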

11 Adapting the Software
- Data Flow: retrofit endian conversion
- PPC and SPARC are big endian; the original design did not foresee byte swapping, for performance reasons
- All byte reordering done on the Linux side
- Bulk 32-bit swapping of whole datagrams
- Takes care of control and navigational information
- Accessing the data from Linux
- Payload contains byte and 2-byte aligned data
- Data 32-bit pre-swapped
- Fix up byte and 2-byte aligned structures on demand
- Keep on-disk formats as big endian
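The swapping scheme can be illustrated with a small sketch. It is not the BaBar data flow code: the function names and the example datagram layout are hypothetical. The bulk step reverses the bytes of every 32-bit word, which makes 32-bit fields directly readable on a little-endian host. As a side effect, the two 16-bit halves of a word change places and the four bytes of a word appear in reverse order, so such fields need a fix-up when they are accessed.

```python
import struct


def swap32_bulk(datagram: bytes) -> bytes:
    """Reverse the byte order of every 32-bit word in a datagram."""
    if len(datagram) % 4:
        raise ValueError("datagram length must be a multiple of 4 bytes")
    n = len(datagram) // 4
    words = struct.unpack(f">{n}I", datagram)  # read words as big endian
    return struct.pack(f"<{n}I", *words)       # write them as little endian


def u16_pair(swapped: bytes, word_index: int):
    """Return the two 16-bit fields of one word in their original order.

    After the bulk swap each 16-bit value is valid little endian, but the
    two halves of the word have exchanged positions.
    """
    second, first = struct.unpack_from("<2H", swapped, 4 * word_index)
    return first, second


def byte_at(swapped: bytes, original_offset: int) -> int:
    """Return the byte that sat at original_offset before the bulk swap."""
    return swapped[original_offset ^ 3]  # position within the word is mirrored


# Hypothetical big-endian datagram: one 32-bit word, two 16-bit values, four bytes.
datagram = struct.pack(">I2H4B", 0x11223344, 0xAABB, 0xCCDD, 1, 2, 3, 4)
swapped = swap32_bulk(datagram)

header = struct.unpack_from("<I", swapped, 0)[0]   # 32-bit field: no fix-up needed
pair = u16_pair(swapped, 1)                        # 16-bit fields: fixed up on demand
payload = [byte_at(swapped, 8 + i) for i in range(4)]  # byte fields: fixed up on demand
```

Doing the bulk swap once per datagram keeps the common case (32-bit control and navigational words) cheap, while the rarer byte and 2-byte accesses pay the fix-up cost only when they are used.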

12 Choosing the Hardware
- Limited resources and time for evaluation
- Start out with systems known to be reliable for the Windows group at SLAC: Dell PowerEdge 1550
- Optical Gigabit (then: no experience with copper at SLAC)
- Acquire a few machines for testing

13 Testing in Lab and Real System
- Test stand testing of all software
- Parasitic testing of a few nodes in the real system for a few months
- Port monitoring (SPAN) feature of the switch
- Feed copies of production datagrams to Linux nodes – no reply required
- Run event building software on mirrored events
- No stability problems observed

14 Purchasing, Installation and Tests
- By the time the testing was completed, hardware of choice no longer available
- Re-test next generation machines: Dell PowerEdge 1650 @ 1.4 GHz → OK
- Purchase 50 machines late spring 2002 and install in summer shutdown
- Keep enough Ultra-5 in place for shutdown DAQ needs
- New farm: 2 ½ water cooled racks
- Regular shelves, stack 2 machines
- No significant hardware problems (1 disk, 1 main board dead on arrival)

15 50 1u Farm Nodes

16 Other Improvements
- In parallel: ROM Gigabit Ethernet
- Originally planned for later, but we realized that this could be done by the end of the shutdown too
- Develop optimized zero-copy UDP stack
- Install optical Gigabit Ethernet PMC on readout modules
- Split crates to balance amounts of data
- Improve feature extraction ROM software

17 Result and Summary
- Very smooth transition
- System now capable of 5.5 kHz L1 accept rate at current backgrounds
- Original design + performance: 2 kHz
- System working very well in routine data taking
- No crashes
- No system stability problems
- No hardware problems

18 Further Improvements and Longer Term Plans
Improvements
- Multi-CPU support
- Single L3 worker thread
- Run more than 1 L3 process per node
- Currently being implemented
- Migrating more software to Linux
Longer Term Plans
- Keep Sun server infrastructure, however look into Linux as file servers
- Replace more systems with Linux machines

