
Presentation transcript:

I/O bound Jobs
Multiple processes accessing the same disk compete for the position of the read head. A multi-threaded process can stream events in sequence, dispatching them to individual threads, resulting in faster event processing rates for I/O bound jobs, as shown in these test results.

CPU bound Jobs
CPU bound jobs benefit from the parallelism achieved through multi-threading. This benefit can be lost if the framework requires frequent mutex (un)locking. These plots show event processing rates for CPU bound jobs. The linear shape indicates proper scaling with the number of threads, which implies virtually no overhead is imposed by the framework, which is designed to minimize mutex (un)locking.

Hyperthreading
These plots also show how "hyperthreading" in Intel CPUs can improve performance for CPU bound jobs.

Alternate Factory Model
Reconstruction code is built into factory classes as callback methods. Inputs come from other factories. In JANA's model, ownership of the objects stays with the factory. Only const pointers are passed, ensuring data integrity.

Threading Model
A JANA application consists of a single JApplication object and multiple JEventLoop objects (one for each thread). Each thread has its own complete set of factory objects that, together, can completely process an event. JANA uses POSIX pthreads.

Notice: Authored by Jefferson Science Associates, LLC under U.S. DOE Contract No. DE-AC05-06OR. The U.S. Government retains a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce this manuscript for U.S. Government purposes.

JANA
JANA is a multithreaded event reconstruction framework written in C++. It is designed to utilize all of the available cores of a CPU while processing high-volume HENP data.

Data On Demand
JANA's modified factory model produces data only on demand. This avoids wasting CPU cycles on reconstruction that is not needed for that particular event.
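As a rough illustration of the factory model sketched above (callback-based reconstruction, data produced only on demand, ownership kept by the factory, and only const pointers handed out), here is a minimal, self-contained C++ sketch. It is not the actual JANA API; the class and member names are invented for illustration.

```cpp
// Minimal sketch of an on-demand factory: not the actual JANA API; all class
// and member names here are invented for illustration.
#include <cstdio>
#include <vector>

struct Cluster { double energy; };          // example reconstructed object

class ClusterFactory {
public:
    // Data on demand: reconstruct only the first time somebody asks this event.
    const std::vector<const Cluster*>& Get() {
        if (!evaluated_) {
            Process();                       // the reconstruction "callback"
            evaluated_ = true;
        }
        return objects_;                     // const pointers only; callers cannot modify
    }

    // Ownership stays with the factory: it frees its objects between events.
    void NextEvent() {
        for (const Cluster* c : objects_) delete c;
        objects_.clear();
        evaluated_ = false;
    }

    ~ClusterFactory() { NextEvent(); }

private:
    void Process() {
        // A real factory would pull its inputs from other factories here.
        objects_.push_back(new Cluster{1.2});
        objects_.push_back(new Cluster{3.4});
    }

    std::vector<const Cluster*> objects_;
    bool evaluated_ = false;
};

int main() {
    ClusterFactory clusters;                 // one set of factories per thread
    for (const Cluster* c : clusters.Get())  // produced only because we asked
        std::printf("cluster energy = %.1f\n", c->energy);
    clusters.NextEvent();
    return 0;
}
```

Because each thread carries its own complete set of such factories, the cached objects never need mutex protection during event processing.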
Test Machines
- Dual Intel Xeon 5560 processors, 2.8 GHz, 8 cores total, with hyperthreading. For this test, each hyperthread gave the equivalent of 15% of a full core.
- Dual AMD Opteron 2352 processors, 2.1 GHz, 4 cores/processor, no hyperthreading. Without hyperthreading, a modest improvement (2% of a core) is observed when processing threads are over-subscribed.
- Dual Intel Xeon processors (circa 2004), 2.8 GHz, 1 core/processor, with hyperthreading. This older machine shows hyperthreads gaining only about 8% of a full core.

Presented at CHEP 2009, Prague. Website:
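To make the threading model described above more concrete (a single shared event source, one event loop per thread, POSIX pthreads), here is a minimal C++ sketch. The names and structure are illustrative assumptions rather than JANA's actual classes; in the real framework each thread would also own its complete set of factories.

```cpp
// Minimal pthreads sketch of "one event loop per thread": the names are
// illustrative assumptions, not JANA's actual classes.
#include <pthread.h>
#include <cstdint>
#include <cstdio>

// The shared event source is the only place these threads synchronize.
static pthread_mutex_t g_source_lock = PTHREAD_MUTEX_INITIALIZER;
static int g_next_event = 0;
static const int g_max_events = 100;

// Hand out event numbers in sequence; return -1 when the source is exhausted.
static int GetNextEvent() {
    pthread_mutex_lock(&g_source_lock);
    int evt = (g_next_event < g_max_events) ? g_next_event++ : -1;
    pthread_mutex_unlock(&g_source_lock);
    return evt;
}

// One "event loop" per thread. In the model described above, each thread also
// owns a complete set of factory objects, so the per-event reconstruction
// itself needs no locking at all.
static void* EventLoop(void* arg) {
    intptr_t thread_id = reinterpret_cast<intptr_t>(arg);
    int evt;
    while ((evt = GetNextEvent()) >= 0)
        std::printf("thread %ld processed event %d\n", static_cast<long>(thread_id), evt);
    return nullptr;
}

int main() {
    const int nthreads = 4;                  // e.g. one thread per core
    pthread_t threads[nthreads];
    for (intptr_t i = 0; i < nthreads; ++i)
        pthread_create(&threads[i], nullptr, EventLoop, reinterpret_cast<void*>(i));
    for (int i = 0; i < nthreads; ++i)
        pthread_join(threads[i], nullptr);
    return 0;
}
```

With the only lock confined to fetching the next event number, the per-event work in this sketch scales with the number of threads, which is the behavior the CPU bound plots above describe.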