M. Bellato INFN Padova and U. Marconi INFN Bologna

Presentation transcript:

Status of the ROM. M. Bellato (INFN Padova) and U. Marconi (INFN Bologna). SuperB meeting - Frascati 4/7 April 2011

Parameters. The estimated event size at the FE is 500 kB. The maximum trigger rate is 150 kHz. We set the aggregate ROM input rate to 10 Gb/s. Number of FEE 2 Gb/s data links: 325. With these assumptions, the number of ROMs needed to manage the SuperB data flux can be estimated to be of the order of $(8 \times 500 \times 10^{3}) \times (150 \times 10^{3}) / (10 \times 10^{9}) = 60$ boards. Each ROM will handle a fragment size of ∼10 kB at 150 kHz.
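The board-count estimate on this slide can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: the function and constant names are illustrative and not part of the talk, and the inputs are the slide's own figures.

```python
import math

# Figures quoted on the slide.
EVENT_SIZE_BYTES = 500_000           # estimated event size at the FE (500 kB)
TRIGGER_RATE_HZ = 150_000            # maximum trigger rate (150 kHz)
ROM_INPUT_RATE_BPS = 10_000_000_000  # aggregate input rate per ROM (10 Gb/s)


def rom_count(event_size_bytes, trigger_rate_hz, rom_input_bps):
    """Number of ROM boards needed to absorb the total data flux."""
    total_bps = 8 * event_size_bytes * trigger_rate_hz  # bytes -> bits
    return math.ceil(total_bps / rom_input_bps)


def fragment_size_bytes(event_size_bytes, n_roms):
    """Average event fragment handled by one ROM per trigger."""
    return event_size_bytes / n_roms


n_roms = rom_count(EVENT_SIZE_BYTES, TRIGGER_RATE_HZ, ROM_INPUT_RATE_BPS)
fragment = fragment_size_bytes(EVENT_SIZE_BYTES, n_roms)
print(f"ROM boards: {n_roms}, fragment per ROM: {fragment / 1e3:.1f} kB")
```

The average fragment comes out slightly below the ∼10 kB quoted on the slide, which is consistent with that figure being a rounded upper estimate.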

18 June 2009

ROM implementation. We foresee two possible approaches to the implementation of the ROM. (1) Based on custom electronics and field buses: an FPGA receives data from the FEE, performs “simple” synchronous processing, and formats the data according to a suitable industrial standard. (2) Based on custom electronics and host PC/CPUs: an FPGA receives data from the FEE (possibly performing synchronous processing) and injects them into the PC via PCIe. The CPUs perform complex data processing and data transfer, using standard protocols and on-board network interface cards.

Implementation (I). Based on a field bus (VME, xTCA, …). Input: 5 FEE optical links at 2 Gb/s. Output: 1 link at 10 Gb/s. The FCTS interface is not an issue. FPGA-centric design: input deserialization; synchronous processing (feature extraction); fragment formatting according to the event builder protocol (TCP/UDP, Converged Ethernet, custom); output serialization. Once the design is fixed, changes are difficult, and improvements are limited unless foreseen in advance.
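As a cross-check of the link budget above, the sketch below counts the boards needed when each board takes 5 FEE links, and compares the input and output bandwidth of one board. The names are illustrative; the figure of 325 FEE links is taken from the Parameters slide.

```python
import math

# Figures from the slides.
N_FEE_LINKS = 325                 # total number of 2 Gb/s FEE data links
LINKS_PER_BOARD = 5               # FEE optical links per board
LINK_RATE_BPS = 2_000_000_000     # 2 Gb/s per FEE link
OUTPUT_RATE_BPS = 10_000_000_000  # one 10 Gb/s output link per board


def boards_for_links(n_links, links_per_board):
    """Boards needed so that every FEE link has an input port."""
    return math.ceil(n_links / links_per_board)


def board_input_bps(links_per_board, link_rate_bps):
    """Maximum aggregate input rate of one board."""
    return links_per_board * link_rate_bps


n_boards = boards_for_links(N_FEE_LINKS, LINKS_PER_BOARD)
input_bps = board_input_bps(LINKS_PER_BOARD, LINK_RATE_BPS)
print(f"boards: {n_boards}")
print(f"input {input_bps / 1e9:.0f} Gb/s vs output {OUTPUT_RATE_BPS / 1e9:.0f} Gb/s")
```

Counting by links gives a board count close to the estimate obtained from the data flux, and five fully loaded input links exactly saturate the single output link, so the output has no headroom for protocol overhead in that limit.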

Implementation (I) – Block Diagram

Implementation (II). The solution based on FPGA and PC/CPUs appears to be more versatile than the one based on a stand-alone custom board. The required functionality is implemented by means of high-level languages. The transmission protocol to the HLT farm is no longer a constraint. The trigger rate of 150 kHz constrains the available processing time per CPU core to $N_{CPU}/150$ ms, $N_{CPU}$ being the number of CPU cores per PC box. A processing time of the order of 1 ms per core would therefore require of the order of 100 CPU cores per box. Availability of many-core CPUs? See next slides.
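The processing-time budget can be made concrete with the sketch below. It assumes that events are distributed evenly over the cores of one box and that each core works on one event at a time; the function names are illustrative, and the 32-core example simply reuses the core count quoted for Knights Ferry later in the talk.

```python
import math

TRIGGER_RATE_KHZ = 150  # maximum trigger rate from the Parameters slide


def time_budget_ms(n_cores, trigger_rate_khz):
    """Processing time available per event on each core, in ms.

    Each core receives one event out of every n_cores, so it has
    n_cores trigger periods to finish it.
    """
    return n_cores / trigger_rate_khz


def cores_needed(processing_time_ms, trigger_rate_khz):
    """Cores required per box to sustain the trigger rate."""
    return math.ceil(processing_time_ms * trigger_rate_khz)


for n_cores in (8, 32, 150):
    print(f"{n_cores:4d} cores -> {time_budget_ms(n_cores, TRIGGER_RATE_KHZ):.3f} ms per event")

print("cores needed for 1 ms per event:", cores_needed(1.0, TRIGGER_RATE_KHZ))
```

With these assumptions a 1 ms processing time needs 150 cores, which matches the slide's order-of-magnitude statement of about 100 cores per box.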

Implementation (II) – Block Diagram. Blocks shown: Front-Ends, SNAP12 RX, FPGA (Virtex 6/7), dual-port RAM, PCI Express 4x, SFP RTX, FCTS I/F, clock management, power management.

How to get high performance and energy efficiency for highly parallel workloads? Small, extremely energy-efficient cores + high floating-point performance (VPU/SIMD): many integrated small, energy-efficient, high-performance cores. The newest addition to the Intel server family. The industry's first general-purpose many-core architecture.

Knights Ferry - Aubrey Isle Processor. Block diagram: multi-threaded wide-SIMD cores, each with I$ and D$; shared coherent L2 cache; memory controllers with GDDR; fixed-function logic; system & I/O interface; PCIe interface. Features: multiple IA cores; in-order, short pipeline; multi-thread support; 16-wide vector units (512b); extended instruction set; fully coherent caches; 1024-bit ring bus; GDDR memory; supports virtual memory. Standard IA shared-memory programming. Reference: http://download.intel.com/technology/architecture-silicon/Siggraph_Larrabee_paper.pdf Future options subject to change without notice.

The “Knights” Family. Knights Ferry: development platform. Knights Corner: 1st Intel® MIC product, 22nm process, >50 Intel Architecture cores. Future Knights products. Future options subject to change without notice.

“Knights Ferry” Development Platform. Software development platform for Intel® MIC architecture. Growing availability through 2011. Up to 32 cores, up to 1.2 GHz. Up to 128 threads at 4 threads/core. Up to 8 MB shared coherent cache. 1-2 GB GDDR5 shared memory. PCIe card. Bundled with Intel HPC SW tools.

ROM PCIe Architecture. The interface board (FEE to PCIe) is plugged into the PC motherboard. A PCIe Gen2 x4 link has enough bandwidth: $(4 \times 5 \times 10^{9}\ \mathrm{bit/s}) \times (150\ \mathrm{kHz})^{-1} = 133\ \mathrm{kbit} = 16.7\ \mathrm{kB}$ (> fragment size ~10 kB). Adequate cooling of the interface board may be an issue (but the estimated power consumption is < 40 W). Mechanical fit is poor. Should the FCTS interface be integrated in the ROM?
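The bandwidth check above uses the raw PCIe Gen2 line rate of 5 Gb/s per lane. The sketch below repeats it and also applies the 8b/10b line encoding used by PCIe Gen2, which the slide does not account for. Packet (TLP) header overhead is not modelled here and would reduce the usable figure further; the names are illustrative.

```python
LANES = 4
GEN2_LINE_RATE_BPS = 5_000_000_000  # raw line rate per lane (5 Gb/s)
TRIGGER_RATE_HZ = 150_000           # maximum trigger rate (150 kHz)
ENCODING_EFFICIENCY = 8 / 10        # 8b/10b encoding (not on the slide)
FRAGMENT_SIZE_BYTES = 10_000        # fragment size per ROM from the slides


def bytes_per_trigger(lanes, line_rate_bps, trigger_rate_hz, efficiency=1.0):
    """Bytes that can cross the link during one trigger period."""
    bits = lanes * line_rate_bps * efficiency / trigger_rate_hz
    return bits / 8


raw_budget = bytes_per_trigger(LANES, GEN2_LINE_RATE_BPS, TRIGGER_RATE_HZ)
payload_budget = bytes_per_trigger(
    LANES, GEN2_LINE_RATE_BPS, TRIGGER_RATE_HZ, ENCODING_EFFICIENCY
)

print(f"raw line rate: {raw_budget / 1e3:.1f} kB per trigger")
print(f"after 8b/10b:  {payload_budget / 1e3:.1f} kB per trigger")
print("fits a 10 kB fragment:", payload_budget > FRAGMENT_SIZE_BYTES)
```

Even with the encoding overhead included, the per-trigger capacity stays above the ~10 kB fragment size, so the slide's conclusion holds under this assumption, although with a smaller margin.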

ROM PCIe Architecture (II). The event builder farm may host the ROMs as plug-in cards, exploiting existing hardware and reducing costs. The event builder farm may also host many-core CPU cards as add-ons. The HLT may be performed on the same farm + filter farm.

Ongoing Tests. Proof of concept performed on a Xilinx ML605 board with Virtex-6 LX240T and PCIe Gen2 4x.

Ongoing tests

Plans. The legacy implementation based on field buses requires investigation of the 10 Gb/s output links in terms of protocol implementations on FPGA. The PCIe-based implementation requires: development of a Linux driver to test the performance; evaluation of the latency of getting and transmitting data, which determines the useful time left for processing; evaluation of Knights Ferry or GP-GPU for feature extraction?