CS 732: Advanced Machine Learning Usman Roshan Department of Computer Science NJIT

Parallel computing Why parallel computing in an advanced machine learning course? Some machine learning programs take a long time to finish, for example large neural networks and kernel methods. Dataset sizes are also getting larger. While linear classification and regression programs are generally very fast, they can be slow on large datasets.

Examples Dot product evaluation Gradient descent algorithms Cross-validation – Evaluating many folds in parallel – Parameter estimation
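To make the first example concrete, here is a minimal sketch of a parallel dot product using OpenMP (introduced on the next slide); the array size and values are arbitrary choices for illustration.

```c
#include <stdio.h>
#include <stdlib.h>

/* Parallel dot product: each thread sums a chunk of the index range,
   and OpenMP combines the partial sums via the reduction clause. */
double dot(const double *x, const double *y, int n) {
    double s = 0.0;
    #pragma omp parallel for reduction(+:s)
    for (int i = 0; i < n; i++)
        s += x[i] * y[i];
    return s;
}

int main(void) {
    int n = 1000000;                 /* arbitrary size for illustration */
    double *x = malloc(n * sizeof(double));
    double *y = malloc(n * sizeof(double));
    for (int i = 0; i < n; i++) { x[i] = 1.0; y[i] = 2.0; }
    printf("dot = %f\n", dot(x, y, n));   /* expect 2000000.0 */
    free(x); free(y);
    return 0;
}
```

Compiled with, e.g., gcc -fopenmp dot.c. Cross-validation parallelizes the same way: assign each fold its own thread or process.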

Parallel computing Multi-core programming – OpenMP: ideal for running the same program on different inputs – MPI: master-slave setup that allows message passing between processes Graphics Processing Units: – Equipped with hundreds to thousands of cores – Designed for running hundreds of short functions called threads in parallel
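A minimal sketch of the MPI master-slave pattern, assuming an MPI installation with mpicc and mpirun; the task values and tag are hypothetical.

```c
#include <stdio.h>
#include <mpi.h>

/* Master-slave message passing: rank 0 sends each worker a number,
   each worker squares it and sends the result back. */
int main(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0) {                       /* master */
        for (int w = 1; w < size; w++) {
            int task = w * 10;             /* arbitrary work item */
            MPI_Send(&task, 1, MPI_INT, w, 0, MPI_COMM_WORLD);
        }
        for (int w = 1; w < size; w++) {
            int result;
            MPI_Recv(&result, 1, MPI_INT, w, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("worker %d returned %d\n", w, result);
        }
    } else {                               /* worker */
        int task, result;
        MPI_Recv(&task, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        result = task * task;
        MPI_Send(&result, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
    }
    MPI_Finalize();
    return 0;
}
```

Run with, e.g., mpicc masterslave.c && mpirun -np 4 ./a.out.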

GPU programming Memory comes in four types with different sizes and access times – Global: largest, ranges from 3 to 6GB, slow access time – Local: same speed as global but private to a single thread – Shared: on-chip, fastest, limited to threads within a block – Constant: cached global memory, readable by all threads Coalesced memory access is key to fast GPU programs. The main idea is that consecutive threads should access consecutive memory locations.
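A minimal sketch contrasting coalesced and strided access; the array size, block size, and stride are arbitrary illustration choices.

```cuda
#include <stdio.h>

/* Coalesced copy: thread i reads in[i], so consecutive threads in a
   warp touch consecutive addresses and their loads combine into a
   small number of memory transactions. */
__global__ void copy_coalesced(const float *in, float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i];
}

/* Strided copy: consecutive threads read addresses 'stride' floats
   apart, defeating coalescing; reads scatter across many cache lines. */
__global__ void copy_strided(const float *in, float *out, int n, int stride) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[(long long)i * stride % n];
}

int main(void) {
    int n = 1 << 20;                        /* arbitrary size */
    float *d_in, *d_out;
    cudaMalloc(&d_in,  n * sizeof(float));
    cudaMalloc(&d_out, n * sizeof(float));
    cudaMemset(d_in, 0, n * sizeof(float));
    int threads = 256, blocks = (n + threads - 1) / threads;
    copy_coalesced<<<blocks, threads>>>(d_in, d_out, n);
    copy_strided<<<blocks, threads>>>(d_in, d_out, n, 32);
    cudaDeviceSynchronize();
    cudaFree(d_in); cudaFree(d_out);
    return 0;
}
```

Timing the two kernels (for example with cudaEvent timers or a profiler) typically shows the strided version running several times slower, purely because of the access pattern.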

GPU programming Designed for running hundreds of short functions called threads in parallel Threads are organized into blocks, which are in turn organized into grids Ideal for running the same function on millions of different inputs
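A minimal, self-contained vector addition showing the thread/block/grid organization; the vector size and the block size of 256 are arbitrary choices.

```cuda
#include <stdio.h>
#include <stdlib.h>

/* Each thread computes one output element; its global index comes
   from its block index, the block size, and its index in the block. */
__global__ void vec_add(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];   /* guard: last block may overhang */
}

int main(void) {
    int n = 1 << 20;                 /* one million inputs */
    size_t bytes = n * sizeof(float);
    float *a = (float*)malloc(bytes), *b = (float*)malloc(bytes);
    float *c = (float*)malloc(bytes);
    for (int i = 0; i < n; i++) { a[i] = 1.0f; b[i] = 2.0f; }

    float *da, *db, *dc;
    cudaMalloc(&da, bytes); cudaMalloc(&db, bytes); cudaMalloc(&dc, bytes);
    cudaMemcpy(da, a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(db, b, bytes, cudaMemcpyHostToDevice);

    /* Launch a grid of blocks, each block holding 256 threads. */
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    vec_add<<<blocks, threads>>>(da, db, dc, n);

    cudaMemcpy(c, dc, bytes, cudaMemcpyDeviceToHost);
    printf("c[0] = %f\n", c[0]);     /* expect 3.0 */
    cudaFree(da); cudaFree(db); cudaFree(dc);
    free(a); free(b); free(c);
    return 0;
}
```

Compiled with, e.g., nvcc vecadd.cu: the same short function runs over a million inputs, one thread per element, matching the point above.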

Languages CUDA: – C-like language introduced by NVIDIA – CUDA programs run only on NVIDIA GPUs OpenCL: – OpenCL programs run on GPUs from all major vendors – Kernel language is essentially C – Requires no special compiler: a standard C compiler plus the OpenCL headers and library (both easily available)

CUDA We will compile and run a program for determining interacting SNPs in a genome-wide association study Location: