Using the Trigger Test Stand at CDF for Benchmarking CPU (and eventually GPU) Performance Wesley Ketchum (University of Chicago) 10.27.2010.

Presentation transcript:

Outline
- Overview of previous work on calculations done by a CPU
  - Description of the test stand and the components in our setup
  - Latency measurements for a track fitting algorithm, measured by PULSARS and by internal timing in the CPU
- Preliminary studies of latency measurements for calculations done by a GPU
  - Comparisons with CPU
  - Future work

Goals of Previous Work Done with CPU
- Goals:
  - Restore the CDF L2 test stand to a working state
  - Configure Pulsar boards to transmit and receive test patterns
  - Run a simplified linear track fitting algorithm on the CPU
    - Input read in from test patterns sent via S-LINK
  - Measure latency using internal CPU timing functions and PULSAR boards
- Work served as the required experimental project for Ho Ling Li (now a 2nd-year UChicago grad student)
  - Help from Jian Tang (UChicago), Pierluigi Catastini and Ted Liu (FNAL)

Flow Chart of Test Stand Setup
[Figure: flow chart with blocks S-LINK Tx, AUX Card, S-LINK Rx, FILAR, SOLAR, CPU, Memory, GPU]

Physical Test Stand Setup
- Pulsars housed in VME crate
  - Tools exist to communicate with the crate and load code into it
  - That code controls run configurations
- PC is a retired L2 Linux machine
  - Equipped with FILAR and SOLAR cards to receive/send S-LINK packets
- "Runs" occur using CDF RunControl DAQ software
  - Level 1 Accept prompts sending of loaded test patterns

The PULSARS
- PULSARS
  - PULSer And Recorder
  - Highly configurable: special-purpose firmware loaded into FPGAs defines the board function
  - Used for a variety of purposes in the L2 trigger at CDF
- S-LINK Tx
  - Test patterns loaded into board, sent on L1A (Level 1 Accept)
- AUX card
  - Attached to back of Tx
  - Sends out multiple copies of S-LINK packets
- S-LINK Rx
  - Fitted with 4 mezzanine cards that read in S-LINK packets
  - Measures the time after L1A (to 100 ns) at which a packet was received
[Figure labels: S-LINK Tx, AUX Card, S-LINK Rx, S-LINK Card]

FILAR and SOLAR Cards
- FILAR
  - Four Input Links for ATLAS Readout
  - Accepts S-LINK packets, which are stored into PC memory on arrival
- SOLAR
  - Single Output Link for ATLAS Readout
  - Sends out specified memory in S-LINK format
- FILAR and SOLAR cards connect to the PC via PCI-X slots
[Figure labels: PC, FILAR, SOLAR]

PC and Track Fitting Algorithm
- The PC
  - 2.4 GHz processor speed
  - Pre-developed tools from L2 testing for:
    - Reading in from FILAR
    - Sending out along SOLAR
    - Internal timing
- Track Fitting Procedure
  1. Copy in "track" data from the S-LINK packet
  2. Retrieve the constant set used for evaluating fit parameters
  3. Run the (linear) track fitting algorithm to calculate fit parameters
  4. Store the calculated parameters (and internal timing info) to be sent on SOLAR
[Figure labels: PC, FILAR, SOLAR]
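Step 3 of the procedure can be sketched as below. The talk does not spell out the fit, so this is a minimal sketch under an assumed form: each fit parameter is a linear combination of the hit coordinates plus an offset, p_i = sum_j C_ij * x_j + q_i, with C and q standing in for the retrieved constant set. The function name, the array shapes, and the numbers are all hypothetical.

```python
def linear_track_fit(hits, constants, offsets):
    """Evaluate p_i = sum_j constants[i][j] * hits[j] + offsets[i].

    hits      -- hit coordinates of one track (assumed layout)
    constants -- one row of coefficients per fit parameter (assumed layout)
    offsets   -- one additive constant per fit parameter (assumed layout)
    """
    if len(constants) != len(offsets):
        raise ValueError("need one offset per row of constants")
    params = []
    for row, q in zip(constants, offsets):
        if len(row) != len(hits):
            raise ValueError("each row of constants must match the number of hits")
        # Dot product of one coefficient row with the hits, plus the offset.
        params.append(sum(c * x for c, x in zip(row, hits)) + q)
    return params


# Made-up inputs, for illustration only (not CDF data or constants).
hits = [1.0, 2.0, 3.0]
constants = [[0.5, 0.0, 0.5], [1.0, -1.0, 0.0]]
offsets = [0.0, 2.0]
fit_params = linear_track_fit(hits, constants, offsets)
```

A fit of this form needs only multiplications and additions per parameter, which is what makes it a convenient benchmark kernel.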

Latency Measurement Strategy
- From PULSARS
  - Record arrival time of the packet coming straight from the AUX card
  - Record arrival time of the packet coming from the PC
    - Check that the fit parameter evaluation has been done
  - Difference is the time for PC evaluation (neglecting extra cable time, which is small)
- From PC
  - Place time stamps around the running of the algorithm
  - Output the difference along S-LINK
- Determine latency for various numbers of iterations of the fitting algorithm (only step 3 from the previous slide)
  - Model as T_PC = n * T_alg + T_O, where n is the number of iterations
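The two ingredients of this strategy can be sketched as follows: the PULSAR-side latency as a difference of arrival times, and a straight-line fit of latency against the number of iterations n to extract T_alg and T_O. Reading T_alg as the time per iteration and T_O as a constant overhead is an interpretation of the model, and the ordinary least-squares fit is an assumed method (the talk does not say how the model was fitted). Function names and numbers are hypothetical.

```python
def pulsar_pc_latency(t_direct, t_via_pc):
    """PC evaluation time: arrival time via the PC minus arrival time straight from the AUX card."""
    return t_via_pc - t_direct


def fit_latency_model(iterations, latencies):
    """Least-squares fit of latency = n * t_alg + t_o; returns (t_alg, t_o)."""
    count = len(iterations)
    if count < 2 or count != len(latencies):
        raise ValueError("need at least two (n, latency) pairs of equal length")
    mean_n = sum(iterations) / count
    mean_t = sum(latencies) / count
    s_nn = sum((n - mean_n) ** 2 for n in iterations)
    if s_nn == 0:
        raise ValueError("need at least two distinct iteration counts")
    s_nt = sum((n - mean_n) * (t - mean_t) for n, t in zip(iterations, latencies))
    t_alg = s_nt / s_nn              # slope: time per iteration
    t_o = mean_t - t_alg * mean_n    # intercept: constant overhead
    return t_alg, t_o


# Made-up numbers generated from t_alg = 0.5 and t_o = 20.0 (arbitrary units),
# NOT measurements from the test stand.
iterations = [0, 1, 10, 100]
latencies = [20.0, 20.5, 25.0, 70.0]
t_alg, t_o = fit_latency_model(iterations, latencies)
```

With real data the points would scatter around the line, and the n = 0 case (read-in then read-out) gives a direct check on the fitted overhead.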

Sample PULSAR Latency Measurements
[Figure: two panels, "Track fitting algorithm run once" and "Track fitting algorithm not run (read-in then read-out)"]

Algorithm Times as Measured in PULSAR and PC
[Figure: two panels, "Linear Scale" and "Log Scale"]

Internal Timing Measurements
- Having validated the CPU internal timing, place time stamps around the various steps of the track fitting procedure
[Figure: two panels, "Fitting algorithm run only once" and "Fitting algorithm run 100 times"]
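Placing time stamps around each step can be sketched as below. This is only an illustration of the bookkeeping: it uses Python's `time.perf_counter_ns`, not the internal CPU timing functions used on the test stand, and the four step functions are hypothetical stand-ins for the steps of the track fitting procedure.

```python
import time


def timed_steps(steps):
    """Run (name, callable) pairs in order; return a list of (name, elapsed_ns)."""
    timings = []
    for name, step in steps:
        start = time.perf_counter_ns()   # time stamp before the step
        step()
        stop = time.perf_counter_ns()    # time stamp after the step
        timings.append((name, stop - start))
    return timings


# Hypothetical stand-ins for the four steps; data values are made up.
state = {}

def copy_in():
    state["hits"] = [1.0, 2.0, 3.0]

def get_constants():
    state["constants"] = [[0.5, 0.0, 0.5]]
    state["offsets"] = [0.0]

def run_fit():
    state["params"] = [
        sum(c * x for c, x in zip(row, state["hits"])) + q
        for row, q in zip(state["constants"], state["offsets"])
    ]

def store_output():
    state["output"] = list(state["params"])

timings = timed_steps([
    ("copy in", copy_in),
    ("retrieve constants", get_constants),
    ("fit", run_fit),
    ("store", store_output),
])
```

Keeping one time stamp pair per step makes it possible to see which step dominates as the number of fit iterations grows.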

New Work with GPU
- Recently got a new machine capable of housing a GPU
  - NVIDIA GTX 285 (for computations)
  - eVGA e-GeForce 9500 GT (for display)
  - Intel Core i7 processor, 2.80 GHz
  - 6 GB RAM
  - 2 PCIe slots (GPUs) and 2 PCI-X slots (FILAR and SOLAR)
- Use CUDA tools/framework to run the same linear track fitting algorithm for multiple tracks on a GPU
  - Focus so far on getting things running with the same simple code
  - Plenty of optimization left to do even with the simple code, and more once we complicate the fitting procedure
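Because each track's linear fit is independent of the others, fitting multiple tracks decomposes naturally into one unit of work per track. The plain-Python sketch below only illustrates that decomposition; it is not CUDA code and not the code used in this work. `fit_one_track` plays the role a per-thread body might have, indexed by a hypothetical `track_index`, and the linear form of the fit is the same assumption as in a fit of the form p_i = sum_j C_ij * x_j + q_i.

```python
def fit_one_track(track_index, all_hits, constants, offsets, out):
    """Fit a single track; touches only its own input and output slot."""
    hits = all_hits[track_index]
    out[track_index] = [
        sum(c * x for c, x in zip(row, hits)) + q
        for row, q in zip(constants, offsets)
    ]


def fit_all_tracks(all_hits, constants, offsets):
    """Fit every track. The loop iterations are independent of each other,
    so on a GPU they could in principle run concurrently (assumed mapping)."""
    out = [None] * len(all_hits)
    for track_index in range(len(all_hits)):
        fit_one_track(track_index, all_hits, constants, offsets, out)
    return out


# Made-up tracks and constants, for illustration only.
all_hits = [[1.0, 2.0, 3.0], [0.0, 0.0, 0.0], [2.0, 2.0, 2.0]]
constants = [[0.5, 0.0, 0.5], [1.0, -1.0, 0.0]]
offsets = [0.0, 2.0]
all_params = fit_all_tracks(all_hits, constants, offsets)
```

The constant set is shared by all tracks while hits and outputs are per track, which is the property that matters when moving such a loop to a parallel device.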

Recent Results with Internal Timing Measurements

Conclusion and Outlook
- Developed a setup at the test stand to measure the latency of a track fitting algorithm in the CPU
  - Can include full readout times via timing information in PULSARS
- Have a new machine capable of housing GPU, FILAR, and SOLAR cards
  - Makes it possible to do latency measurements for calculations done in the GPU
  - Can compare with similar calculations in the CPU
- Near future
  - Set up the new machine at the test stand in place of the old L2 PC and provide a performance benchmark

BACKUP SLIDES

[Figure: diagram of the test stand at CDF; labels as extracted: Cluster Electron Trigger Test stand at CDF GPU SLINK Merger SVT TX SVT Rx Slink to PCI mem CPU PCI to Slink SLINK]

Flow Chart of Test Stand Setup
[Figure: flow chart with blocks S-LINK Tx, AUX Card, S-LINK Rx, FILAR, SOLAR, GPU, Memory, CPU]