“SMT Capable CPU-GPU Systems for Big Data”

Presentation transcript:

“SMT Capable CPU-GPU Systems for Big Data” WSU Analytics and Big Data Meet Up, April 3, 2017. http://webs.wichita.edu/?u=eecs&p=/faculty/abu/ Less than seven minutes, a lot to say/prove… huge pressure! I’ll start by explaining my title. Next presenter, please come start when it is your time… Dr. Zaman; WSU-5261

WSU – Wichita State University; SMT – Simultaneous Multi-Threading; CPU – Central Processing Unit; GPU – Graphics Processing Unit. A Computer System [figure: a CPU in a Von Neumann architecture and a CPU in a pure Harvard architecture]. Name of the Game: performance, energy consumption, cost, …

Time-Efficient Computing. Compute Unified Device Architecture (CUDA) is a parallel computing platform and application programming interface (API) model created by Nvidia.

SMT Capable CPU-GPU System: a multicore CPU plus a many-core GPU card. A process is a running program. A process can create many threads of execution (sometimes described as lightweight processes), which the cores execute instruction by instruction. … Intel Xeon Phi has up to 72 cores [http://www.intel.com/content/www/us/en/products/processors/xeon-phi/xeon-phi-processors.html]. Nvidia Tesla K80 has up to 4992 cores [http://www.nvidia.com/object/tesla-k80.html].
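
The process/thread relationship described on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `worker` and `run_threads` are hypothetical, and the example shows threads sharing one process's memory, not how SMT hardware schedules them.

```python
import os
import threading


def worker(thread_index, results):
    # All threads run inside the same process, so os.getpid() is identical
    # for each of them and they can all write into the shared `results` list.
    results[thread_index] = (os.getpid(), thread_index * thread_index)


def run_threads(num_threads=4):
    """Start `num_threads` threads in the current process and collect their results."""
    results = [None] * num_threads
    threads = [
        threading.Thread(target=worker, args=(i, results))
        for i in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait until every thread has finished
    return results


if __name__ == "__main__":
    for pid, value in run_threads():
        print(f"process {pid} -> thread result {value}")
```

Every result carries the same process ID, which is the point: one running program, several threads of execution inside it.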

Energy-Efficient Computing. CL1 – Level-1 Cache; CL2 – Level-2 Cache. Cache and memory are very power-hungry: more energy consumption means more heat dissipation! Name of the Game: performance, energy consumption, cost, …

CPU-GPU Systems for Big Data: Matrix Multiplication (MM). 4 x 4 matrix: 4^3 = 64 multiplications. 2 x 2 matrix: 2^3 = 8 multiplications. Are we reducing the number of multiplications? (No.) What is the message? Given many 2 x 2 matrix solvers (8 MULT each) working in parallel, a 4 x 4 product takes “only” 2 * 8 MULT time units (64 → 16?). Do we have many solvers/cores? (Yes, the GPU.) [Figure: matrices A, B, and C partitioned into 2 x 2 blocks such as A1,1 and A1,2.]
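
The counting argument on this slide can be checked with a small sketch. It assumes the 4 x 4 product is split into four 2 x 2 output blocks, with one hypothetical "solver" per output block; the function names are illustrative and nothing here is taken from the speaker's own code. It runs sequentially and only tallies how many multiplications each solver would perform.

```python
from itertools import product


def multiply_2x2(a, b):
    """Schoolbook 2 x 2 product; returns the block and its multiplication count (8)."""
    c = [[0, 0], [0, 0]]
    mults = 0
    for i, j, k in product(range(2), repeat=3):
        c[i][j] += a[i][k] * b[k][j]
        mults += 1
    return c, mults


def block(m, bi, bj):
    """Extract the 2 x 2 block at block-row bi, block-column bj of a 4 x 4 matrix."""
    return [row[2 * bj:2 * bj + 2] for row in m[2 * bi:2 * bi + 2]]


def blocked_multiply_4x4(a, b):
    """Return (C, total multiplications, multiplications on the busiest solver)."""
    c = [[0] * 4 for _ in range(4)]
    mults_per_solver = {}
    # Assumption: each output block (bi, bj) is handled by its own solver/core.
    for bi, bj in product(range(2), repeat=2):
        solver_mults = 0
        for bk in range(2):  # two 2 x 2 block products per output block
            partial, mults = multiply_2x2(block(a, bi, bk), block(b, bk, bj))
            solver_mults += mults
            for i, j in product(range(2), repeat=2):
                c[2 * bi + i][2 * bj + j] += partial[i][j]
        mults_per_solver[(bi, bj)] = solver_mults
    total = sum(mults_per_solver.values())
    # If the four solvers ran at the same time, elapsed time is set by the busiest one.
    parallel_time_units = max(mults_per_solver.values())
    return c, total, parallel_time_units


if __name__ == "__main__":
    a = [[4 * i + j for j in range(4)] for i in range(4)]
    identity = [[int(i == j) for j in range(4)] for i in range(4)]
    c, total, parallel_time_units = blocked_multiply_4x4(a, identity)
    print(total, parallel_time_units)
```

The total stays at 64 multiplications, so no work is saved; only the elapsed time drops to 2 * 8 = 16 multiplication steps when the four solvers run concurrently.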

CUDA/MM for Graph Problems. Number of paths of length 4 between C and J? 7. Row (# of paths of length 1) x Column (# of paths of length 3) → Column (# of paths of length 4). CUDA (parallel programming) improves performance. [Figure: path-count matrices for length 1, length 3, and length 4.]
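
The row-times-column rule on this slide can be sketched as follows. The slide's own graph (with nodes such as C and J) is not reproduced in the transcript, so this example uses a hypothetical 4-node cycle and hypothetical function names. Note that powers of an adjacency matrix count walks, in which nodes may repeat; on a graph without cycles these coincide with paths.

```python
def mat_mul(a, b):
    """Plain square-matrix product on lists of lists."""
    n = len(a)
    return [
        [sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def mat_power(a, p):
    """Raise the adjacency matrix to the p-th power by repeated multiplication."""
    n = len(a)
    result = [[int(i == j) for j in range(n)] for i in range(n)]
    for _ in range(p):
        result = mat_mul(result, a)
    return result


def count_walks(adj, src, dst, length):
    """Entry (src, dst) of adj**length is the number of walks of that length."""
    return mat_power(adj, length)[src][dst]


def row_times_column(adj, src, dst):
    """Row of length-1 counts times column of length-3 counts gives the length-4 count."""
    length3 = mat_power(adj, 3)
    return sum(adj[src][k] * length3[k][dst] for k in range(len(adj)))


if __name__ == "__main__":
    # Hypothetical undirected 4-cycle: 0-1, 1-2, 2-3, 3-0.
    adj = [
        [0, 1, 0, 1],
        [1, 0, 1, 0],
        [0, 1, 0, 1],
        [1, 0, 1, 0],
    ]
    print(count_walks(adj, 0, 2, 4), row_times_column(adj, 0, 2))
```

Each entry of the product is an independent dot product, which is why this computation maps naturally onto many GPU cores.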

CAPPLab Research Activities. CAPPLab earned Nvidia's top research designation (GPU Research Center) in 2015. Research supported by Kansas NSF EPSCoR, Nvidia, CybertronPC, … Asaduzzaman, A., Chidella, K.K., Mitra, P., Islam, M.F., Cluff, K., Islam, A., and Saeed, K.A., “A Novel Image Conversion and Processing Technique for Analyzing Breast Cancer,” under review, IEEE Transactions on Medical Imaging, Vol. X, No. Y, 2017. Asaduzzaman, A., Gummadi, D., and Yip, C.M., “A Talented CPU-to-GPU Memory Mapping Technique,” in IEEE SoutheastCon 2014, Lexington, KY, March 13-16, 2014. Asaduzzaman, A., Yip, C.M., Kumar, S., and Asmatulu, R., “Fast, Effective, and Adaptable Computer Modeling and Simulation of Lightning Strike Protection on Composite Materials,” in IEEE SoutheastCon Conference 2013, Jacksonville, FL, April 4-7, 2013.

“SMT Capable CPU-GPU Systems for Big Data” Dr. Zaman; WSU-5261