Latency Measurement Testing

Latency Measurement Testing
John Kroon, G. Lamanna, R. Fantechi
12/07/2011

Outline
Lab Setup
Hardware
Tools
Measurements
Looking Forward

Hardware
PCATE: Pentium 4, 2.4 GHz
  Cache: L1 8 kB, L2 512 kB (no L3)
GPU1: 2 × 4 core Xeon E5630, 2.53 GHz (16 processors)
  Cache: L1 256 kB, L2 1024 kB, L3 12288 kB
Direct Ethernet connection on a hidden network
Each PC has a parallel port interface used for generating timing pulses
LeCroy scope for time measurements, histograms, and saving screenshots
[Pictures: PCATE, GPU1, adapter for parallel port]

Hardware
LKRPN0: Intel Xeon, 2.0 GHz (2 processors)
  Cache: L1 4096 kB, L2 64 kB
[Picture: LKRPN0]

Test Structure
[Diagram: GPU1, PCATE, LKRPN0]

Latency GPU1PCATE

Opposite Way (PCATEGPU1)

Busy Script Running on Same CPU (PCATE → GPU1)

Selected Points
50 microsec pulse
Package Size (Bytes)   CPU Not Busy   CPU Busy   ∆t = Busy – Not Busy
300                    60.6           58.8       -1.8
700                    66.7           64.9       -1.8
1100                   71.3           69.2       -2.1

30 microsec pulse
Package Size (Bytes)   CPU Not Busy   CPU Busy   ∆t = Busy – Not Busy
300                    50.8           49.9       -0.9
700                    61.5           59.3       -2.2
1100                   70.1           69.0       -1.1
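
The ∆t column follows directly from the two latency columns, so it can be recomputed for every row as a consistency check. The sketch below is only an illustration: the numbers are transcribed from the slide, the latency unit is whatever the slide used, and the names `SELECTED_POINTS`, `delta_t`, and `delta_table` are hypothetical helpers, not part of the original test setup.

```python
# Values transcribed from the "Selected Points" slide.
# Layout: pulse label -> {package size in bytes: (CPU not busy, CPU busy)}
SELECTED_POINTS = {
    "50 microsec pulse": {300: (60.6, 58.8), 700: (66.7, 64.9), 1100: (71.3, 69.2)},
    "30 microsec pulse": {300: (50.8, 49.9), 700: (61.5, 59.3), 1100: (70.1, 69.0)},
}


def delta_t(not_busy, busy):
    """Return busy - not_busy, rounded to one decimal place like the slide."""
    return round(busy - not_busy, 1)


def delta_table(points):
    """Compute the delta for every pulse width and package size."""
    return {
        pulse: {size: delta_t(not_busy, busy) for size, (not_busy, busy) in rows.items()}
        for pulse, rows in points.items()
    }


if __name__ == "__main__":
    for pulse, rows in delta_table(SELECTED_POINTS).items():
        for size, dt in rows.items():
            print(f"{pulse}: {size} bytes -> {dt:+.1f}")
```

Every recomputed value is negative, which matches the slide: in these selected points the latency with the busy script running is slightly lower than with the CPU idle.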

Latency GPULKRPN0

Opposite Way (LKRPN0GPU) Interesting and Unexpected!

Latency PCATELKRPN0

Opposite Way (LKRPN0PCATE)

Future Testing
We will investigate the latency after changing various kernel settings using "sysctl" and/or "insmod". Swappiness?
Further tests on latency → is the cache important?
Test latency over different protocols (TCP, etc.)
Test with TELL1 (FPGA)
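
For quick comparisons while kernel settings are being changed, a software-only round-trip test can complement the scope measurements. The sketch below is an assumption-laden illustration, not the method used in these slides: it timestamps in software instead of using parallel port pulses and the scope, it runs over the loopback interface instead of the direct Ethernet link, and UDP is chosen only as an example protocol. The function names and the package sizes in the usage loop (taken from the "Selected Points" slide) are hypothetical choices.

```python
import socket
import statistics
import threading
import time


def _echo_server(sock, stop):
    """Echo every UDP datagram back to its sender until stop is set."""
    sock.settimeout(0.1)
    while not stop.is_set():
        try:
            data, addr = sock.recvfrom(65535)
        except socket.timeout:
            continue
        sock.sendto(data, addr)


def udp_round_trip_us(payload_size, repeats=200):
    """Return the median loopback round-trip time in microseconds."""
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    stop = threading.Event()
    thread = threading.Thread(target=_echo_server, args=(server, stop), daemon=True)
    thread.start()

    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    client.settimeout(1.0)
    payload = bytes(payload_size)  # zero-filled payload of the requested size
    samples = []
    try:
        for _ in range(repeats):
            start = time.perf_counter_ns()
            client.sendto(payload, server.getsockname())
            client.recvfrom(65535)  # wait for the echoed datagram
            samples.append((time.perf_counter_ns() - start) / 1000.0)
    finally:
        stop.set()
        thread.join()
        client.close()
        server.close()
    return statistics.median(samples)


if __name__ == "__main__":
    for size in (300, 700, 1100):
        print(f"{size} bytes: {udp_round_trip_us(size):.1f} us round trip")
```

The median is used rather than the mean so that occasional scheduler hiccups do not dominate the result. Absolute values from this sketch are not comparable to the scope measurements, but relative changes before and after a "sysctl" adjustment may still be informative.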