A performance analysis of multicore computer architectures, Michel Schelske

Presentation transcript:

A performance analysis of multicore computer architectures, Michel Schelske

2 Parallel Algorithms for Multicore Benchmarking, 10. Apr
Structure
1. Observations & Theory
2. Problems
3. Solution

3 Observation I
[Figure. Labels: clock rate, frequency, 3-4 GHz, cores, performance, multi performance]

4 Observation II
[Figure. Labels: program, thread, partitioning, granularity]

5 Theory
Optimum: depends on the hardware and the problem to be solved.
[Figure. Axes: granularity (coarse-grain, fine-grain), performance]

6 Example

7 Observation III
Optimum: depends on the hardware and the problem to be solved.
[Figure. Axes: granularity (coarse-grain, fine-grain), performance]

8 Observation IV
Optimum: depends on the hardware and the problem to be solved.
[Figure. Axes: granularity (coarse-grain, fine-grain), performance]

9 The problems
Granularity is only one performance parameter.
Find the optimal parallelization parameters with respect to
– the algorithm
– the computer architecture
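The search for optimal parameters stated on this slide can be sketched as a simple timing sweep. This is a minimal illustration in Python, not the benchmark presented in the talk: the function name `sweep`, the callable `run`, and the `repeats` parameter are all hypothetical, and only a single parameter (granularity) is varied.

```python
import time


def sweep(run, grains, repeats=3, timer=time.perf_counter):
    """Time run(grain) for every candidate grain size.

    Returns (best_grain, timings), where timings maps each grain
    to the fastest wall-clock time observed over `repeats` runs.
    `run` is a hypothetical callable that executes the workload
    with the given granularity.
    """
    timings = {}
    for grain in grains:
        fastest = float("inf")
        for _ in range(repeats):
            start = timer()
            run(grain)
            fastest = min(fastest, timer() - start)
        timings[grain] = fastest  # keep the fastest repeat to damp noise
    best_grain = min(timings, key=timings.get)
    return best_grain, timings
```

Because the optimum depends on both the algorithm and the machine, a sweep like this would have to be repeated for each workload and each target computer; its result on one system says little about another.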

10 Our Solution
[Figure. Labels: hardware, core, operating system, Application, Profiler, Benchmark]

11 Thank you for your attention

12 Result
Calculation of prime numbers on a computer with two Intel Xeon single-core CPUs
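A prime-number workload with adjustable granularity might look like the following sketch. The slides do not show the actual code, so everything here is an assumption: the trial-division test, the names `count_primes` and `parallel_count`, and the default of two workers (chosen only to mirror the two CPUs mentioned above). Note also that standard CPython threads share a global interpreter lock, so this version illustrates the partitioning rather than a real speedup; measuring one would require swapping in `ProcessPoolExecutor`.

```python
from concurrent.futures import ThreadPoolExecutor


def count_primes(lo, hi):
    """Count primes p with lo <= p < hi using trial division."""
    count = 0
    for n in range(max(lo, 2), hi):
        d = 2
        is_prime = True
        while d * d <= n:
            if n % d == 0:
                is_prime = False
                break
            d += 1
        count += is_prime
    return count


def parallel_count(limit, grain, workers=2):
    """Count primes below `limit`, split into chunks of `grain` numbers.

    A small `grain` yields many short tasks (fine-grain); a large
    `grain` yields few long tasks (coarse-grain).
    """
    chunks = [(lo, min(lo + grain, limit)) for lo in range(0, limit, grain)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(lambda c: count_primes(*c), chunks))
```

For example, `parallel_count(100, 10)` returns 25, the number of primes below 100, regardless of the grain chosen; only the running time is expected to change with granularity.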