AliRoot - Pub/Sub Framework Analysis Component Interface
M. Richter (University of Bergen), S. Kalcher, J. Thäder, T. M. Steinbeck (University of Heidelberg)

Presentation transcript:

Slide 1: AliRoot - Pub/Sub Framework Analysis Component Interface
Matthias Richter (University of Bergen), Sebastian Kalcher, Jochen Thäder & Timm M. Steinbeck (University of Heidelberg)

Slide 2: Overview
● The system consists of three main parts:
  – A C++ shared library with a component handler class
    ● Compiled and callable directly from AliRoot
  – A number of C++ shared libraries with the actual reconstruction components themselves
    ● Compiled as part of AliRoot and directly callable from it
  – A C wrapper API that provides access to the component handler and the reconstruction components (sketched below)
    ● Contained in the component handler shared library
    ● Called by the Pub/Sub wrapper component
    ● Makes Pub/Sub and AliRoot compiler independent
      – Binary compatibility
      – No recompile of the reconstruction code
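The compiler independence rests on a small set of extern "C" functions exposing the component handler. A minimal sketch, assuming hypothetical function and type names (AliHLTExt*), not the actual AliRoot/Pub/Sub interface:

```cpp
// Hypothetical C wrapper API sketch; the function and type names are
// illustrative assumptions, not the real AliRoot/Pub/Sub interface.
extern "C" {

typedef void* AliHLTExtComponentHandle;  // opaque handle to a C++ component object

// Load a component shared library; its global component objects register
// themselves with the component handler during library initialization.
int AliHLTExtLoadComponentLibrary(const char* libraryPath);

// Look up a registered component by its ID string and return a handle to it.
int AliHLTExtCreateComponent(const char* componentID,
                             AliHLTExtComponentHandle* handle);

// Process one event: input blocks in, output buffer out.
int AliHLTExtProcessEvent(AliHLTExtComponentHandle handle,
                          const void* inputBlocks, unsigned long inputSize,
                          void* outputBuffer, unsigned long* outputSize);

// Release the component again.
int AliHLTExtDestroyComponent(AliHLTExtComponentHandle handle);

}  // extern "C"
```

Since only plain C types cross this boundary, the Pub/Sub framework and the reconstruction libraries can be built with different C++ compilers and still interoperate at the binary level.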

Slide 3: Overview (diagram)
Block diagram: the HLT TPC shared library contains the Clusterfinder and Tracker C++ classes, whose global objects register themselves with the component handler ("Register Component"). The component handler shared library contains the component handler C++ class and the C wrapper functions. The Pub/Sub framework wrapper processing component loads the component library, gets and initializes a component and calls ProcessEvent via the C wrapper; AliRoot loads the component library, initializes and gets the components and calls ProcessEvent on them directly.

Slide 4: Components
● Components have to implement a set of abstract functions:
  – Return the component ID (string)
  – Return the set of required input data types
  – Return the produced output data type(s)
  – Process one event
  – Close to what is needed for Pub/Sub components, but simplified
● One global instance of each component has to be present in the shared component library
  – "Automagic" registration with the global component handler object (see the sketch below)
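A minimal sketch of such a component interface and of the registration through a global object could look like the following; all class, method and data type names are illustrative assumptions and simpler than the real AliRoot HLT classes:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical component base class (the real AliHLTComponent differs in detail).
class ExampleComponent {
public:
  virtual ~ExampleComponent() {}
  virtual const char* GetComponentID() const = 0;                  // ID string
  virtual std::vector<std::string> GetInputDataTypes() const = 0;  // required inputs
  virtual std::string GetOutputDataType() const = 0;               // produced output
  virtual int ProcessEvent(const void* input, unsigned long inputSize,
                           void* output, unsigned long& outputSize) = 0;
};

// Hypothetical global component handler keeping a registry of components.
class ExampleComponentHandler {
public:
  static ExampleComponentHandler& Instance() {
    static ExampleComponentHandler handler;
    return handler;
  }
  void RegisterComponent(ExampleComponent* component) {
    fComponents.push_back(component);
  }
  ExampleComponent* GetComponent(const std::string& id) const {
    for (std::size_t i = 0; i < fComponents.size(); ++i)
      if (id == fComponents[i]->GetComponentID()) return fComponents[i];
    return 0;
  }
private:
  std::vector<ExampleComponent*> fComponents;
};

// "Automagic" registration: the constructor of the global object runs when the
// shared component library is loaded and registers with the handler.
class ExampleClusterFinder : public ExampleComponent {
public:
  ExampleClusterFinder() {
    ExampleComponentHandler::Instance().RegisterComponent(this);
  }
  const char* GetComponentID() const { return "TPCClusterFinder"; }
  std::vector<std::string> GetInputDataTypes() const {
    return std::vector<std::string>(1, "TPC_RAW_DATA");
  }
  std::string GetOutputDataType() const { return "TPC_CLUSTERS"; }
  int ProcessEvent(const void*, unsigned long, void*, unsigned long& outputSize) {
    outputSize = 0;  // the real cluster finder would fill the output buffer here
    return 0;
  }
};

// One global instance of each component in the shared library.
ExampleClusterFinder gExampleClusterFinder;
```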

Slide 5: AliRoot
● AliRoot code accesses the classes in the component handler shared library
● It obtains the component objects from the handler class
● It then accesses the component objects directly (see the sketch below)
(Diagram: AliRoot loads the HLT TPC component library, the global Clusterfinder and Tracker objects register with the component handler, and AliRoot calls ProcessEvent on them.)
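Continuing the illustrative classes from the slide 4 sketch, the AliRoot side might then look roughly like this (again assumed names, not actual AliRoot code):

```cpp
// Sketch of the AliRoot side, reusing the hypothetical classes from above.
void RunExampleReconstruction(const void* rawData, unsigned long rawSize) {
  ExampleComponentHandler& handler = ExampleComponentHandler::Instance();

  // The global clusterfinder object registered itself when its shared
  // library was loaded, so it can simply be looked up by ID ...
  ExampleComponent* clusterFinder = handler.GetComponent("TPCClusterFinder");
  if (!clusterFinder) return;

  // ... and used directly as a C++ object, without going through the C API.
  static unsigned char output[1024 * 1024];
  unsigned long outputSize = sizeof(output);
  clusterFinder->ProcessEvent(rawData, rawSize, output, outputSize);
  // The cluster output would then be handed to the tracker component in the same way.
}
```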

Slide 6: Publisher/Subscriber
● The Pub/Sub framework uses ONE wrapper component
● It accesses the handler and the components via the C wrapper API
● It can call multiple components in different libraries
  – One component per wrapper instance (see the sketch below)
(Diagram: the Pub/Sub wrapper processing component loads the component library via the C wrapper functions, gets and initializes a component, and calls ProcessEvent.)
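On the Pub/Sub side, a single wrapper instance could drive any one component through the assumed C wrapper API from the slide 2 sketch (again purely illustrative names):

```cpp
// Sketch of what one Pub/Sub wrapper instance does, using the hypothetical
// C wrapper API declared in the slide 2 sketch.
int RunWrapperInstance(const char* libraryPath, const char* componentID,
                       const void* inputBlocks, unsigned long inputSize,
                       void* outputBuffer, unsigned long* outputSize) {
  AliHLTExtComponentHandle component = 0;

  if (AliHLTExtLoadComponentLibrary(libraryPath) != 0) return -1;
  if (AliHLTExtCreateComponent(componentID, &component) != 0) return -1;

  // Exactly one reconstruction component per wrapper instance; the same
  // wrapper binary serves the clusterfinder, the tracker, and so on.
  int result = AliHLTExtProcessEvent(component, inputBlocks, inputSize,
                                     outputBuffer, outputSize);

  AliHLTExtDestroyComponent(component);
  return result;
}
```

In the actual framework, loading the library and creating the component correspond to the initialization sequence, and only the event processing call is repeated per event (cf. slides 7-9).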

Slide 7: Publisher/Subscriber - Initialization Sequence
(Sequence diagram between AliRootWrapperSubscriber, the C Wrapper, AliHLTComponent and AliHLTComponentHandler.)

Slide 8: Publisher/Subscriber - Processing Sequence
(Sequence diagram between AliRootWrapperSubscriber, the C Wrapper, AliHLTComponent and AliHLTComponentHandler.)

Slide 9: Publisher/Subscriber - Termination Sequence
(Sequence diagram between AliRootWrapperSubscriber, the C Wrapper, AliHLTComponent and AliHLTComponentHandler.)

Slide 10: Current Status
● Basic implementation done
● Base library with ComponentHandler and Component base class implemented
● Pub/Sub wrapper component done and working
● HLT TPC reconstruction code ported and working
● Basic AliRoot HLT configuration scheme implemented
● Ongoing work on integration of the ComponentHandler into the data processing scheme of AliRoot

Slide 11: ClusterFinder Benchmarks
● pp events, 14 TeV, 0.5 T
● Number of events: 1200
● Iterations: 100
● Test bench: SimpleComponentWrapper
● Test nodes:
  – HD cluster nodes e304, e307 (PIII, 733 MHz)
  – HD cluster nodes e106, e107 (PIII, 800 MHz)
  – HD gateway node alfa (PIII, 1.0 GHz)
  – HD cluster node eh001 (Opteron, 1.6 GHz)
  – CERN cluster node eh000 (Opteron, 1.8 GHz)

Slide 12: Cluster Distribution (plot)

Slide 13: Signal Distribution (plot)

Slide 14: File Size Distribution (plot)

Slide 15: Total Distributions (plots)

Slide 16: Padrows & Pads per Patch (plot)

Slide 17: Basic Results (table)
Averages per patch and event: file size [Byte], number of signals, and number of clusters for each patch.

Slide 18: Timing Results
ClusterFinder processing time per patch (all values in ms):

CPU              Patch 0  Patch 1  Patch 2  Patch 3  Patch 4  Patch 5  Average
PIII 733 MHz        6.57     8.82     6.14     6.67     6.61     6.54     6.90
PIII 800 MHz        6.04     8.10     5.64     6.12     6.06     6.01     6.33
PIII 1.0 GHz        4.95     6.65     4.51     4.90     4.87     4.81     5.11
Opteron 1.8 GHz     3.96     5.32     3.66     3.98     3.94     3.99     4.13
Opteron 1.6 GHz     2.93     3.92     2.73     2.96     2.93     2.90     3.06
Xeon IV 3.2 GHz

Slide 19: Timing Results (plot)

Slide 20: Timing Results
● Memory streaming benchmarks:
  – 1.6 GHz Opteron system: ca. 4.3 GB/s
  – 1.8 GHz Opteron system: ca. 3 GB/s
● This explains the performance drop of the 1.8 GHz system compared to the 1.6 GHz system
● The cause of the memory performance difference itself is unknown and currently being investigated
● It may be related to a NUMA parameter (cf. slide 23)

Slide 21: Tracker Timing Results
● Slice tracker, average times per slice
● Opteron 1.8 GHz (dual MP, dual core):
  – 1 process: ca. 3.6 ms/slice, independent of the CPU it runs on
  – 2 processes, different chips: ca. factor 1 (no slowdown)
  – 2 processes, same chip, different cores: ca. factor 1.75
  – 4 processes, all cores: ca. factor 1.83
● Xeon 3.2 GHz (dual MP, HyperThreading):
  – Mapping of processes to CPUs unknown for more than 1 process (see the affinity sketch below)
  – 1 process: ca. ms/slice
  – 2 processes: ca. factor 2 slower
  – 3 processes: ca. factor 3.5 slower
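As a side note on such multi-process measurements: on Linux the process-to-core mapping can be fixed explicitly with CPU affinity, which makes the "same chip" vs. "different chips" cases reproducible. A generic sketch, not part of the presented setup:

```cpp
// Pin the calling process to a given core (Linux-specific); with explicit
// pinning the mapping of benchmark processes to cores/chips is known.
#define _GNU_SOURCE  // needed for cpu_set_t / sched_setaffinity when compiling as C
#include <sched.h>

int PinToCore(int core) {
  cpu_set_t mask;
  CPU_ZERO(&mask);
  CPU_SET(core, &mask);
  // pid 0 means the calling process; returns 0 on success.
  return sched_setaffinity(0, sizeof(mask), &mask);
}
```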

Slide 22: Timing Results - Opteron Memory
● Floating point/memory microbenchmarks:
  – CPU loop: no effect with multiple processes
  – Linear memory read: almost no effect
  – Random memory read: runtime factors 1.33, 1.01, 1.43
  – Linear memory read and write: runtime factors 1.57, 1.12, 2.31
  – Random memory read and write: runtime factors 1.91, 1.92, 2.78
  – Linear memory write: runtime factors 1.71, 1.72, 3.48
  – Random memory write: runtime factors 1.97, 1.90, 3.76
● Runtime factors are given for two processes on the same chip, two processes on different chips, and four processes on all cores, each relative to a single process (a microbenchmark sketch follows below)
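To illustrate the kind of measurement behind these factors, the following is a generic sketch of a linear vs. random memory-write microbenchmark; it illustrates the access patterns only and is not the actual benchmark code used for the numbers above:

```cpp
#include <chrono>
#include <cstdlib>
#include <iostream>
#include <vector>

// Generic linear vs. random memory-write microbenchmark sketch; running
// several instances in parallel (same chip / different chips / all cores)
// and comparing runtimes yields factors like the ones quoted above.
int main() {
  const std::size_t n = 16 * 1024 * 1024;  // large enough to defeat the caches
  std::vector<unsigned int> buffer(n, 0);

  // Pre-computed index streams for the two access patterns.
  std::vector<std::size_t> linear(n), random(n);
  for (std::size_t i = 0; i < n; ++i) {
    linear[i] = i;
    random[i] = static_cast<std::size_t>(std::rand()) % n;
  }

  const std::vector<std::size_t>* patterns[2] = {&linear, &random};
  const char* names[2] = {"linear write", "random write"};
  for (int p = 0; p < 2; ++p) {
    const std::vector<std::size_t>& idx = *patterns[p];
    auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < n; ++i)
      buffer[idx[i]] = static_cast<unsigned int>(i);  // the memory write under test
    auto stop = std::chrono::steady_clock::now();
    std::cout << names[p] << ": "
              << std::chrono::duration<double>(stop - start).count() << " s\n";
  }
  // Use the buffer so the compiler cannot optimize the writes away.
  std::cout << "checksum " << buffer[n / 2] << "\n";
  return 0;
}
```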

Slide 23: Timing Results - Opteron Memory
● Floating point/memory microbenchmarks:
  – The memory results, in particular for memory writes, are the likely explanation for the tracker behaviour
  – Tasks:
    ● Examine system memory parameters (e.g. BIOS, Linux kernel)
      – One critical parameter: kernel NUMA awareness was found to be not activated
    ● Re-evaluate/optimize the tracker code with respect to memory writes
      – A likely problem has already been found: the Conformal Mapping tracker uses a large memory array with widely (quasi-)randomly distributed write and read accesses
  – Lesson for system procurement:
    ● If possible, evaluate systems/architectures with respect to both pure performance AND scalability

Slide 24: Price Comparison
● Opteron 1.8 GHz:
  – Single core: ca. 180,- €
  – Dual core: ca. 350,- €
● Xeon 3.2 GHz:
  – Single core: ca. 330,- €
  – Dual core: ca. 350,- €
● Mainboard prices are comparable
  – ca. for dual-MP, dual-core capable boards
● For Opterons, per-core prices for full systems:
  – Assumption: 1 GB memory per core
  – 1.8 GHz single/dual core, dual MP: ca. 800/600,- €
  – 2.4 GHz single/dual core, dual MP: ca. 1000/880,- €
  – 2.4 GHz single/dual core, quad MP: ca. 1700/1250,- €