PowerPC / Intel Benchmark


Reflective Memory Recorder Upgrade: an opportunity to benchmark PowerPC and Intel architectures for real time
Roberto Abuter, Helmut Tischer, Robert Frahm
European Southern Observatory
07/11/2018

History & Status
- MVME167: SBC based on the 68k family (MC68040), 25 MHz (1979)
- MVME2600: SBC based on the MPC604, 200 MHz (1994)
- MVME2700: SBC based on the MPC750, 366 MHz (1997)
- MVME6100: SBC based on the MPC7457, 1.3 GHz (2001)
Used on ~300 computers. Only a small fraction have hard real-time requirements; the rest are pseudo-real-time.

Why do we need new, more powerful CPUs?
In fast control loops:
- Reduce latency and pure computational delay, which dramatically improves control-system bandwidth.
- Simplify the software architecture (less asynchronous processing, less distribution, fewer communication requirements, etc.).
In general:
- Confront imminent obsolescence.
- Reduce heat dissipation per unit of processing (the MVME6100, a.k.a. the "VME toaster").
- Improve system response time.
- Support newer, more demanding applications (wavefront control, huge detector arrays, interferometry, etc.).

Why do we need new, more powerful CPUs? (continued)
Figure: dramatic effect of computational delay in control loops.
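Why delay limits bandwidth can be made concrete with a standard control-theory fact: a pure delay τ (transfer function e^(-sτ)) adds no gain but a phase lag of ωτ radians at angular frequency ω. The sketch below assumes a hypothetical phase budget (30°) that the loop can afford to lose to delay and computes the highest crossover frequency compatible with it. The delays and the budget are illustrative values, not measurements from this benchmark.

```python
import math


def delay_phase_lag_deg(freq_hz, delay_s):
    """Phase lag (degrees) that a pure delay exp(-s*delay_s) adds at freq_hz."""
    omega = 2.0 * math.pi * freq_hz          # angular frequency, rad/s
    return math.degrees(omega * delay_s)      # lag = omega * tau (rad), in degrees


def max_crossover_hz(phase_budget_deg, delay_s):
    """Highest crossover frequency at which the delay alone uses no more
    than phase_budget_deg of the loop's phase margin."""
    return math.radians(phase_budget_deg) / (2.0 * math.pi * delay_s)


if __name__ == "__main__":
    budget_deg = 30.0                          # hypothetical phase budget for delay
    for delay_us in (100.0, 50.0, 10.0):       # illustrative computational delays
        f_c = max_crossover_hz(budget_deg, delay_us * 1e-6)
        print(f"delay {delay_us:5.1f} us -> crossover <= {f_c:7.1f} Hz")
```

With a 30° budget this gives roughly 833 Hz for 100 µs of delay and about 8.3 kHz for 10 µs: the achievable bandwidth scales inversely with the delay, so every microsecond removed from the loop pays off directly.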

Benchmark Hardware
PowerPC:
- MVME6100, MPC7457 PowerPC, 1.267 GHz, VME, 0.5 GB RAM, Gigabit Ethernet
- GE Reflective Memory PMC-5565
- Cost: ~6000 euros
Intel MultiCore:
- Intel Xeon E5-1410, 2.8 GHz, 4 cores, PCI Express, 12 GB RAM
- Cost: ~1400 euros

Benchmark Application: VLTI Reflective Memory Network (RMN) Recorder
- Runs on the VxWorks real-time operating system.
- Built on the standard VLT instrument software framework.
- Performs a sequence of DMA read accesses to sparse pieces of reflective memory.
- Streams the collected data at up to 0.5 Gbit/s.
Tasks:
- trfmCtrl: DMA-reads reflective memory into local memory.
- trfmXfer: transfers data from local memory to a remote system via TCP/IP.
- tNet0: the system network stack task.
- trfmMon: background task running at 1 Hz.


RMN recorder configurations: Fragmented
Fragmented configuration at 10 kHz, giving a transfer rate of 68.3 Mbit/s.

RMN recorder configurations: Big Data
Big Data configuration at 10 kHz, giving a transfer rate of 290 Mbit/s.
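The two quoted rates can be turned into an average payload per read-in cycle and extrapolated to 16 kHz. The sketch below assumes 1 Mbit = 10^6 bits and that each configuration reads the same amount of reflective memory per cycle regardless of frequency; the function names are hypothetical.

```python
def bytes_per_cycle(rate_mbit_s, cycle_hz):
    """Average payload per read-in cycle, assuming 1 Mbit = 1e6 bits."""
    return rate_mbit_s * 1e6 / 8.0 / cycle_hz


def scaled_rate_mbit_s(rate_mbit_s, from_hz, to_hz):
    """Transfer rate at a new cycle frequency, assuming an unchanged
    payload per cycle (rate grows linearly with frequency)."""
    return rate_mbit_s * to_hz / from_hz


if __name__ == "__main__":
    # Transfer rates at 10 kHz, as quoted on the configuration slides.
    configs = {"Fragmented": 68.3, "Big Data": 290.0}
    for name, rate in configs.items():
        payload = bytes_per_cycle(rate, 10_000)
        at_16k = scaled_rate_mbit_s(rate, 10_000, 16_000)
        print(f"{name}: {payload:.1f} bytes/cycle, {at_16k:.1f} Mbit/s at 16 kHz")
```

Under these assumptions the Fragmented configuration moves about 854 bytes per cycle and the Big Data configuration 3625 bytes per cycle; at 16 kHz the Big Data stream would need about 464 Mbit/s, just below the 0.5 Gbit/s streaming figure quoted for the recorder.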

Cycles to benchmark
Read-in cycle (control task), higher priority:
- Woken by the system clock interrupt.
- Performs sequential DMA accesses.
- Only a few µs are available (bounded by the cycle period, e.g. 100 µs at 10 kHz).
- Cannot be delayed beyond its µs budget without losing samples.
Transfer cycle (transfer + network tasks), lower priority:
- Woken when the read-in cycle buffers are filled (i.e. every 3 seconds).
- Sends the data over a TCP/IP socket.
- Can be delayed, as long as the full buffer is transmitted within the available 3 seconds.
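The split between a hard-deadline read-in cycle and a deferrable transfer cycle is the classic double-buffering (ping-pong) pattern. The sketch below is a single-threaded model under that assumption, not the recorder's actual code: the class and method names are hypothetical, and it ignores task priorities and concurrency. It shows the one deadline that matters for the transfer side: a full buffer must be drained before the next one completes, otherwise it is overwritten.

```python
class PingPongRecorder:
    """Hypothetical two-buffer (ping-pong) model: the read-in cycle fills
    one buffer while the transfer cycle drains the other."""

    def __init__(self, cycles_per_buffer):
        self.cycles_per_buffer = cycles_per_buffer
        self.active = []     # buffer being filled by the read-in (control) cycle
        self.full = None     # completed buffer waiting for the transfer cycle
        self.overruns = 0    # full buffers lost because the transfer was too late

    def read_in_cycle(self, sample):
        """High priority, clock driven: store one cycle's sample."""
        self.active.append(sample)
        if len(self.active) == self.cycles_per_buffer:
            if self.full is not None:
                # The previous buffer was not transmitted in time; with only
                # two buffers it is overwritten and its data is lost.
                self.overruns += 1
            self.full, self.active = self.active, []

    def transfer_cycle(self):
        """Low priority: drain the full buffer, if any; return bytes sent."""
        if self.full is None:
            return 0
        sent = sum(len(s) for s in self.full)  # stand-in for a TCP/IP send
        self.full = None
        return sent


if __name__ == "__main__":
    cycles = 3 * 10_000                         # 3 s of read-in cycles at 10 kHz
    rec = PingPongRecorder(cycles_per_buffer=cycles)
    sent = 0
    for cycle in range(1, 4 * cycles + 1):
        rec.read_in_cycle(bytes(4))             # placeholder 4-byte sample
        if cycle % cycles == 0:
            sent += rec.transfer_cycle()        # transfer keeps up: no overrun
    print(sent, rec.overruns)
```

Because the transfer side only has to finish before the next swap, it can run at lower priority and absorb delays of up to one buffer period, whereas the read-in side has no such slack.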

Read-in cycle, PowerPC, 10 kHz, Fragmented

Read-in cycle, Intel (1 core), 10 kHz, Fragmented

Transfer cycle, PowerPC, 10 kHz, Fragmented

Transfer cycle, Intel (1 core), 10 kHz, Fragmented

Read-in cycle, PowerPC, 16 kHz, Fragmented
The read-in cycle (control task) cannot complete within the 62.5 µs cycle period, which leads to loss of data or a crash of the computer.
Possible ways to overcome this:
- Redesign the reflective memory map to reduce fragmentation: high development and integration costs, since it affects many systems.
- Reprogram the reflective memory driver to use chained DMA access: high development cost and complex maintenance.
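Why fragmentation hurts, and why chained DMA would help, can be illustrated with a simple cost model: each separately programmed DMA transfer pays a fixed setup overhead, while a chained (scatter-gather) transfer pays it roughly once. The model and all its numbers (30 fragments of 32 bytes, 2 µs setup, 40 MB/s) are hypothetical and were chosen only to show the shape of the problem; they are not measurements of the MVME6100 or the PMC-5565.

```python
def per_cycle_dma_time_us(fragment_bytes, setup_us, bandwidth_mb_s, chained=False):
    """Hypothetical DMA time for one read-in cycle, in microseconds.

    fragment_bytes: sizes of the sparse reflective-memory pieces to read.
    setup_us: fixed cost of programming one DMA transfer.
    bandwidth_mb_s: sustained DMA bandwidth in MB/s (1 MB/s == 1 byte/us).
    chained: if True, all fragments go in one chained transfer (one setup).
    """
    transfer_us = sum(fragment_bytes) / bandwidth_mb_s
    setups = 1 if chained else len(fragment_bytes)
    return setups * setup_us + transfer_us


def fits_in_period(time_us, cycle_hz):
    """True if the read-in work fits inside one cycle period."""
    return time_us <= 1e6 / cycle_hz


if __name__ == "__main__":
    fragments = [32] * 30                      # hypothetical sparse memory map
    for chained in (False, True):
        t = per_cycle_dma_time_us(fragments, setup_us=2.0,
                                  bandwidth_mb_s=40.0, chained=chained)
        for hz in (10_000, 16_000):
            print(f"chained={chained} {hz} Hz: {t:.1f} us, fits={fits_in_period(t, hz)}")
```

With these illustrative values, separate transfers take 84 µs: enough headroom at 10 kHz (100 µs period) but not at 16 kHz (62.5 µs). Chaining cuts the time to 26 µs because the setup cost is paid once, which is the trade-off behind the two options listed on the slide.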

Read-in cycle, Intel (1 core), 16 kHz, Big Data

Read-in + transfer cycle, Intel MultiCore, 16 kHz, Big Data

Idle time (%)
16 kHz operation is not possible on the PowerPC.
The Intel MultiCore system remains mostly idle in both configurations and at both frequencies.

Conclusions
- The RMN recorder's range of operation can be extended to at least 16 kHz by moving to Intel MultiCore.
- As expected, VxWorks isolates the application software from the hardware architecture, so the migration cost is low.
- Performance scales with the available CPU power and I/O bandwidth.
- Application access to the VME bus is very slow.
- Cost per unit of CPU power is dramatically reduced by moving from PowerPC to Intel MultiCore.
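A rough sense of the cost claim can be obtained from the approximate prices on the hardware slide. The sketch below uses a crude proxy, euros per core-GHz, which is an assumption of this example: it ignores differences in instructions per clock, memory and I/O, and whether the quoted prices include the reflective memory boards, so it is only an order-of-magnitude indicator, not a benchmark result.

```python
def euros_per_core_ghz(cost_eur, cores, clock_ghz):
    """Crude cost-per-CPU-power proxy: price divided by (cores x clock).

    Ignores IPC, memory and I/O differences; a rough indicator only.
    """
    return cost_eur / (cores * clock_ghz)


if __name__ == "__main__":
    ppc = euros_per_core_ghz(6000, 1, 1.267)   # MVME6100, single-core MPC7457
    intel = euros_per_core_ghz(1400, 4, 2.8)   # 4-core Intel system
    print(f"PowerPC: {ppc:.0f} EUR/core-GHz, Intel: {intel:.0f} EUR/core-GHz, "
          f"ratio ~{ppc / intel:.0f}x")
```

By this proxy the PowerPC board costs roughly 4700 euros per core-GHz against about 125 for the Intel system, a factor of almost 40, consistent in direction with the "dramatically reduced" conclusion.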