
ICE-DIP Project: Research on data transport for manycore processors for next-generation DAQs
Aram Santogidis › 5/12/2014


The points of my research: Data Transfer
› Move data fast to manycore processors
› Develop an easy-to-use intra/inter-machine communication interface
› Study the trade-offs of the related communication patterns and mechanisms
› Conduct this research in the context of next-generation DAQs

Distributed application for online processing
[Figure: the detector streams data over the network to an online processing cluster of nodes (CPU, RAM, OS, application), whose results end up in storage.]

Heterogeneous computing with manycore co-processors
[Figure: two networked hosts (CPU, RAM, OS, application), each extended with manycore co-processors that run their own OS.]

We need highly efficient transport
[Figure: the same two hosts; data moves between machines over the network via RDMA, and between a host and its co-processor via intra-machine communication (IMC).]
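On the Xeon Phi the intra-machine path runs over PCIe, for which Intel provides the SCIF API that the later slides build on. As a rough illustration only (not code from this project; the node and port numbers are made up), a host-side endpoint connects and sends like this:

```c
/* Minimal host-side SCIF connect/send sketch (link with -lscif).
 * Node 1 / port 2050 are assumed values for a co-processor endpoint. */
#include <scif.h>
#include <stdio.h>

int main(void)
{
    scif_epd_t ep = scif_open();                   /* create an endpoint */
    if (ep == SCIF_OPEN_FAILED) { perror("scif_open"); return 1; }

    struct scif_portID dst = { .node = 1, .port = 2050 }; /* Phi side */
    if (scif_connect(ep, &dst) < 0) { perror("scif_connect"); return 1; }

    const char msg[] = "hello phi";
    if (scif_send(ep, (void *)msg, sizeof msg, SCIF_SEND_BLOCK) < 0)
        perror("scif_send");

    scif_close(ep);
    return 0;
}
```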

Data transport of ALFA/FairROOT and the Xeon Phi
[Figure: FairMQ is the data-transport layer connecting ALFA/FairROOT, the ALICE O2 software, and the Xeon Phi.]
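FairMQ builds its devices on ZeroMQ-style message queues. Purely to illustrate the underlying pattern (this is plain ZeroMQ in C, not FairMQ's actual device API, and the endpoint address is invented):

```c
/* Minimal ZeroMQ PUSH sender: one stage of a FairMQ-style pipeline. */
#include <zmq.h>

int main(void)
{
    void *ctx  = zmq_ctx_new();
    void *push = zmq_socket(ctx, ZMQ_PUSH);
    zmq_connect(push, "tcp://localhost:5556");  /* downstream device */

    const char payload[] = "event data";
    zmq_send(push, payload, sizeof payload, 0); /* hand off to next stage */

    zmq_close(push);
    zmq_ctx_term(ctx);
    return 0;
}
```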

Boosted nanomsg's intra-machine communication performance
[Figure: a producer (P) on the host pushes messages to workers (w) and a consumer (C) on the co-processor over PCIe using NN_PIPELINE (Push/Pull); routed over TCP the pipeline reaches 400 MB/s, over SCIF 5.7 GB/s. INPROC and IPC connect the stages within each side.]
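A sketch of what the push side of such a benchmark can look like; switching between TCP and SCIF is just a change of address string. The scif:// scheme shown is an assumed syntax for the author's SCIF transport, not part of stock nanomsg:

```c
/* Push side of a pipeline throughput test: time the send loop to get MB/s.
 * e.g. ./push tcp://mic0:5555  vs  ./push scif://0:5555 (assumed syntax). */
#include <nanomsg/nn.h>
#include <nanomsg/pipeline.h>
#include <string.h>

int main(int argc, char **argv)
{
    const char *addr = (argc > 1) ? argv[1] : "tcp://127.0.0.1:5555";
    int s = nn_socket(AF_SP, NN_PUSH);  /* pipeline scalability protocol */
    nn_connect(s, addr);                /* transport chosen by URL scheme */

    char buf[1 << 16];                  /* 64 KiB messages */
    memset(buf, 'x', sizeof buf);
    for (int i = 0; i < 100000; i++)
        nn_send(s, buf, sizeof buf, 0);

    nn_close(s);
    return 0;
}
```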

nanomsg extended with SCIF support
[Figure: the nanomsg stack. Applications sit on the API; the scalability protocols (PAIR, REQ/REP, PIPELINE, BUS, PUB/SUB, SURVEY) run over the transport protocols (TCP, IPC, INPROC, and now SCIF, with room for more), all on top of the C runtime and OS.]
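Because the scalability protocols and the transports are orthogonal layers, one socket can serve several transports at once. A minimal sketch of a receiving end (the scif:// line again assumes the author's extension):

```c
/* One PULL socket bound over several transports simultaneously. */
#include <nanomsg/nn.h>
#include <nanomsg/pipeline.h>
#include <stdio.h>

int main(void)
{
    int s = nn_socket(AF_SP, NN_PULL);       /* scalability protocol */
    nn_bind(s, "tcp://*:5555");              /* inter-machine */
    nn_bind(s, "ipc:///tmp/pipeline.ipc");   /* same-host processes */
    nn_bind(s, "inproc://pipeline");         /* same-process threads */
    /* nn_bind(s, "scif://0:5555"); */       /* host<->Phi (assumed syntax) */

    char buf[4096];
    for (;;) {
        int n = nn_recv(s, buf, sizeof buf, 0); /* from any transport */
        if (n < 0) break;
        printf("received %d bytes\n", n);
    }
    nn_close(s);
    return 0;
}
```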

Future plans
› Focus more on ZeroMQ
› Develop a message-passing-to-RDMA mapping protocol (see the sketch below)
› Study ALFA and O2
› Experiment on InfiniBand hardware
› Work on getting early access to KNL and Omni-Path
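For the message-passing-to-RDMA mapping, SCIF's one-sided primitives hint at the bookkeeping such a protocol must do: register buffers into a window, then write directly into the peer's memory. A hedged sketch under the assumption of a connected endpoint `ep` (as in the earlier SCIF example) and an agreed remote offset:

```c
/* Sketch of a one-sided transfer with SCIF RMA; offsets and sizes are
 * illustrative, and error handling is reduced to the essentials. */
#include <scif.h>
#include <stdlib.h>

#define LEN (1 << 20)  /* 1 MiB, a page-aligned length */

int rdma_write(scif_epd_t ep, off_t remote_off)
{
    void *buf = NULL;
    if (posix_memalign(&buf, 0x1000, LEN))      /* SCIF wants page alignment */
        return -1;

    /* Expose the local buffer in the endpoint's registered address space. */
    off_t local_off = scif_register(ep, buf, LEN, 0,
                                    SCIF_PROT_READ | SCIF_PROT_WRITE, 0);
    if (local_off == SCIF_REGISTER_FAILED) { free(buf); return -1; }

    /* One-sided write: the remote side posts no receive call. */
    int rc = scif_writeto(ep, local_off, LEN, remote_off, 0);

    scif_unregister(ep, local_off, LEN);
    free(buf);
    return rc;
}
```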