XEON PHI. TOPICS What are multicore processors? Intel MIC architecture Xeon Phi Programming for Xeon Phi Performance Applications.

Presentation transcript:

XEON PHI

TOPICS
What are multicore processors?
Intel MIC architecture
Xeon Phi
Programming for Xeon Phi
Performance
Applications

WHAT ARE MULTICORE PROCESSORS? ApplicationsPerformanceProgramming for Xeon Phi Xeon PhiIntel MIC architectureWhat are multicore processors?

WHAT ARE MULTICORE PROCESSORS?
Advantages:
I/O latency reduction: many operations are performed inside the die.
Power efficiency: a dual-core processor requires less power than two single-core processors.
Area reduction: more shared circuitry means less redundancy.
Higher performance: parallel programming techniques increase overall performance.

WHAT ARE MULTICORE PROCESSORS?
Disadvantages:
Parallelization overhead: taking advantage of multiple cores requires a suitable OS and optimized application code.
SW development difficulties: multiple cores and threads make code harder to develop.
HW development difficulties: integrating multiple cores on one chip reduces production yield compared with less dense single-core designs.

INTEL’S MIC ARCHITECTURE
The Larrabee project (2006)
Originally designed as a GPU.
Introduced very wide 512-bit SIMD units to the x86 processor design.
Cache-coherent multiprocessor system.
Up to 4 threads per core.
Ultra-wide ring memory bus.
The project was terminated in May 2010.

XEON PHI!
The Larrabee project gave birth to the Xeon Phi family of processors:
Knights Ferry (May 2010): 32 cores, up to 750 GFLOPS
Knights Corner (Nov. 2011): 60 cores, up to 1.2 TFLOPS
Knights Landing (June 2013): 72 cores, up to 3 TFLOPS
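The peak figures quoted above can be related to core count and SIMD width with simple arithmetic. The sketch below is an illustration only: the 512-bit SIMD width and the 60-core count come from the slides, while the 1.25 GHz clock and the choice to count a fused multiply-add as 2 FLOPs per lane per cycle are assumptions made for this example, not values from the presentation. The function name peak_gflops is hypothetical.

```c
#include <assert.h>

/* Theoretical peak in GFLOPS:
 *   cores x clock (GHz) x SIMD lanes x FLOPs per lane per cycle
 * where SIMD lanes = SIMD register width / element width.
 * All inputs are supplied by the caller; nothing here is measured. */
double peak_gflops(int cores, double clock_ghz,
                   int simd_bits, int element_bits,
                   int flops_per_lane_per_cycle)
{
    assert(cores > 0 && clock_ghz > 0.0);
    assert(simd_bits > 0 && element_bits > 0);
    assert(simd_bits % element_bits == 0);
    assert(flops_per_lane_per_cycle > 0);

    /* e.g. 512-bit registers hold 8 double-precision (64-bit) values */
    int lanes = simd_bits / element_bits;

    return cores * clock_ghz * lanes * flops_per_lane_per_cycle;
}
```

Under these assumptions, peak_gflops(60, 1.25, 512, 64, 2) gives 1200 GFLOPS in double precision, which is in line with the "up to 1.2 TFLOPS" quoted for Knights Corner. Real chips differ in clock speed and usable cores, so this should be read as an order-of-magnitude check, not a specification.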

PROGRAMMING FOR XEON PHI (MIC)
Programming for a MIC processor is almost transparent: it works much as it does on a normal CPU.
Standard programming languages: C/C++ and Fortran.
Standard parallel programming tools: OpenMP and MPI.
MPI can run on both the host and the coprocessor.
Any code can run on MIC, not just kernels.
Optimizing for MIC is similar to optimizing for normal CPUs.
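As a minimal sketch of the "standard tools" point, the following C function uses a plain OpenMP pragma and contains nothing specific to the coprocessor. The function name dot and the dot-product workload are choices made for this example, not part of the presentation. With an OpenMP-enabled compiler (for instance GCC with -fopenmp) the loop is split across threads; without OpenMP the pragma is ignored and the loop runs serially with the same result.

```c
#include <assert.h>
#include <stddef.h>

/* Dot product of two arrays of length n.
 * The pragma asks OpenMP to divide the loop iterations among threads
 * and to combine the per-thread partial sums into `sum` at the end. */
double dot(const double *x, const double *y, size_t n)
{
    assert(x != NULL && y != NULL);

    double sum = 0.0;

    /* A signed loop index keeps this valid for older OpenMP versions. */
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < (long)n; ++i) {
        sum += x[i] * y[i];
    }

    return sum;
}
```

Calling dot on {1, 2, 3, 4} and {4, 3, 2, 1} with n = 4 returns 20. The design point is that the same source is meant to be compiled for either the host CPU or the coprocessor; only the build target changes, which is what the slide means by programming being almost transparent.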

XEON PHI VS. NVIDIA TESLA
Criteria          Xeon Phi                  Tesla
HPC programming   C++/C/Fortran/OpenCL      CUDA/OpenCL
Threading         OpenMP, multithreading    Hardware threads
MPI support       Host and coprocessor      Host only
Code types        Serial, scripts, etc…     Kernel

PERFORMANCE

APPLICATIONS

SUPERCOMPUTING
Xeon Phi coprocessors provide 8 of the 10 PFLOPS of the “Stampede” supercomputer.
Tianhe-2, the world’s fastest supercomputer in 2013, is based on Knights Corner technology.

BIBLIOGRAPHY www1.cse.wustl.edu