SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA INSTITUTE OF TECHNOLOGY HPCDB 2012 - Satisfying Data-Intensive Queries Using GPU Clusters November.

Slides:



Advertisements
Similar presentations
Accelerators for HPC: Programming Models Accelerators for HPC: StreamIt on GPU High Performance Applications on Heterogeneous Windows Clusters
Advertisements

1 Qilin: Exploiting Parallelism on Heterogeneous Multiprocessors with Adaptive Mapping Chi-Keung (CK) Luk Technology Pathfinding and Innovation Software.
SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA INSTITUTE OF TECHNOLOGY Characterization and Transformation of Unstructured Control Flow in GPU.
The Development of Mellanox - NVIDIA GPUDirect over InfiniBand A New Model for GPU to GPU Communications Gilad Shainer.
SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA INSTITUTE OF TECHNOLOGY Sponsors: National Science Foundation, LogicBlox Inc., and NVIDIA Kernel.
GPU Virtualization Support in Cloud System Ching-Chi Lin Institute of Information Science, Academia Sinica Department of Computer Science and Information.
GPU System Architecture Alan Gray EPCC The University of Edinburgh.
PVOCL: Power-Aware Dynamic Placement and Migration in Virtualized GPU Environments Palden Lama, Xiaobo Zhou, University of Colorado at Colorado Springs.
University of Michigan Electrical Engineering and Computer Science Transparent CPU-GPU Collaboration for Data-Parallel Kernels on Heterogeneous Systems.
Cyberinfrastructure for Scalable and High Performance Geospatial Computation Xuan Shi Graduate assistants supported by the CyberGIS grant Fei Ye (2011)
GPUs on Clouds Andrew J. Younge Indiana University (USC / Information Sciences Institute) UNCLASSIFIED: 08/03/2012.
OpenFOAM on a GPU-based Heterogeneous Cluster
Evaluating GPU Passthrough in Xen for High Performance Cloud Computing Andrew J. Younge 1, John Paul Walters 2, Stephen P. Crago 2, and Geoffrey C. Fox.
MSSG: A Framework for Massive-Scale Semantic Graphs Timothy D. R. Hartley, Umit Catalyurek, Fusun Ozguner, Andy Yoo, Scott Kohn, Keith Henderson Dept.
CUDA Programming Lei Zhou, Yafeng Yin, Yanzhi Ren, Hong Man, Yingying Chen.
Synergistic Execution of Stream Programs on Multicores with Accelerators Abhishek Udupa et. al. Indian Institute of Science.
Big Kernel: High Performance CPU-GPU Communication Pipelining for Big Data style Applications Sajitha Naduvil-Vadukootu CSC 8530 (Parallel Algorithms)
Abstract Shortest distance query is a fundamental operation in large-scale networks. Many existing methods in the literature take a landmark embedding.
SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA INSTITUTE OF TECHNOLOGY Sponsors: National Science Foundation, LogicBlox Inc., and NVIDIA Red Fox:
To GPU Synchronize or Not GPU Synchronize? Wu-chun Feng and Shucai Xiao Department of Computer Science, Department of Electrical and Computer Engineering,
Accelerating SQL Database Operations on a GPU with CUDA Peter Bakkum & Kevin Skadron The University of Virginia GPGPU-3 Presentation March 14, 2010.
1 Down Place Hammersmith London UK 530 Lytton Ave. Palo Alto CA USA.
Motivation “Every three minutes a woman is diagnosed with Breast cancer” (American Cancer Society, “Detailed Guide: Breast Cancer,” 2006) Explore the use.
Supporting GPU Sharing in Cloud Environments with a Transparent
SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA INSTITUTE OF TECHNOLOGY Sponsors: National Science Foundation, LogicBlox Inc., and NVIDIA Kernel.
SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA INSTITUTE OF TECHNOLOGY Accelerating Simulation of Agent-Based Models on Heterogeneous Architectures.
Architectural Optimizations David Ojika March 27, 2014.
Secure & flexible monitoring of virtual machine University of Mazandran Science & Tecnology By : Esmaill Khanlarpour January.
Revisiting Kirchhoff Migration on GPUs Rice Oil & Gas HPC Workshop
SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA INSTITUTE OF TECHNOLOGY Optimizing Data Warehousing Applications for GPUs Using Kernel Fusion/Fission.
Algorithm Engineering „GPGPU“ Stefan Edelkamp. Graphics Processing Units  GPGPU = (GP)²U General Purpose Programming on the GPU  „Parallelism for the.
Taking the Complexity out of Cluster Computing Vendor Update HPC User Forum Arend Dittmer Director Product Management HPC April,
HPCLatAm 2013 HPCLatAm 2013 Permutation Index and GPU to Solve efficiently Many Queries AUTORES  Mariela Lopresti  Natalia Miranda  Fabiana Piccoli.
GPU Architecture and Programming
Data-Intensive Computing: From Clouds to GPUs Gagan Agrawal June 1,
Next Generation Operating Systems Zeljko Susnjar, Cisco CTG June 2015.
GPU-Accelerated Computing and Case-Based Reasoning Yanzhi Ren, Jiadi Yu, Yingying Chen Department of Electrical and Computer Engineering, Stevens Institute.
Harmony: A Run-Time for Managing Accelerators Sponsor: LogicBlox Inc. Gregory Diamos and Sudhakar Yalamanchili.
GPUs: Overview of Architecture and Programming Options Lee Barford firstname dot lastname at gmail dot com.
OpenCL Programming James Perry EPCC The University of Edinburgh.
Data-Intensive Computing: From Clouds to GPUs Gagan Agrawal December 3,
Compiler and Runtime Support for Enabling Generalized Reduction Computations on Heterogeneous Parallel Configurations Vignesh Ravi, Wenjing Ma, David Chiu.
An Execution Model for Heterogeneous Multicore Architectures Gregory Diamos, Andrew Kerr, and Sudhakar Yalamanchili Computer Architecture and Systems Laboratory.
1)Leverage raw computational power of GPU  Magnitude performance gains possible.
Full and Para Virtualization
Euro-Par, 2006 ICS 2009 A Translation System for Enabling Data Mining Applications on GPUs Wenjing Ma Gagan Agrawal The Ohio State University ICS 2009.
PDAC-10 Middleware Solutions for Data- Intensive (Scientific) Computing on Clouds Gagan Agrawal Ohio State University (Joint Work with Tekin Bicer, David.
Introduction to CUDA CAP 4730 Spring 2012 Tushar Athawale.
Shangkar Mayanglambam, Allen D. Malony, Matthew J. Sottile Computer and Information Science Department Performance.
University of Michigan Electrical Engineering and Computer Science Adaptive Input-aware Compilation for Graphics Engines Mehrzad Samadi 1, Amir Hormati.
AUTO-GC: Automatic Translation of Data Mining Applications to GPU Clusters Wenjing Ma Gagan Agrawal The Ohio State University.
Parallel IO for Cluster Computing Tran, Van Hoai.
3/12/2013Computer Engg, IIT(BHU)1 CUDA-3. GPGPU ● General Purpose computation using GPU in applications other than 3D graphics – GPU accelerates critical.
Enabling Dynamic Memory Management Support for MTC on NVIDIA GPUs Proposed Work This work aims to enable efficient dynamic memory management on NVIDIA.
An Asymmetric Distributed Shared Memory Model for Heterogeneous Parallel Systems Isaac Gelado, Javier Cabezas. John Stone, Sanjay Patel, Nacho Navarro.
Fast and parallel implementation of Image Processing Algorithm using CUDA Technology On GPU Hardware Neha Patil Badrinath Roysam Department of Electrical.
Scaling up R computation with high performance computing resources.
Introduction to Data Analysis with R on HPC Texas Advanced Computing Center Feb
Accelerating K-Means Clustering with Parallel Implementations and GPU Computing Janki Bhimani Miriam Leeser Ningfang Mi
Hadoop Javad Azimi May What is Hadoop? Software platform that lets one easily write and run applications that process vast amounts of data. It includes:
Connected Maintenance Solution
CS427 Multicore Architecture and Parallel Computing
Enabling machine learning in embedded systems
Connected Maintenance Solution
Texas Instruments TDA2x and Vision SDK
به نام خدا Big Data and a New Look at Communication Networks Babak Khalaj Sharif University of Technology Department of Electrical Engineering.
The Yin and Yang of Processing Data Warehousing Queries on GPUs
Key Manager Domains February, 2019.
Panel on Research Challenges in Big Data
6- General Purpose GPU Programming
Presentation transcript:

SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA INSTITUTE OF TECHNOLOGY HPCDB Satisfying Data-Intensive Queries Using GPU Clusters November 11 th, Satisfying Data-Intensive Queries Using GPU Clusters Haicheng Wu, Jeff Young Sudhakar Yalamanchili Computer Architecture and Systems Laboratory Center for Experimental Research in Computer Systems School of Electrical and Computer Engineering Georgia Institute of Technology Sponsors: AIC, AMD, LogicBlox Inc., National Science Foundation, NEC, NVIDIA

SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA INSTITUTE OF TECHNOLOGY HPCDB Satisfying Data-Intensive Queries Using GPU Clusters November 11 th, Application: Data Warehousing On-line and off-line analysis Retail analysis Forecasting Pricing Etc… Combination of relational data queries and computational kernels Current applications process 1 to 50 TBs of data [1] Techniques can be applied to other “Big Data” problems like irregular graphs, sorting [1] Independent Oracle Users Group. A New Dimension to Data Warehousing: 2011 IOUG Data Warehousing Survey. …… LargeQty(p) <- Qty(q), q > ……

SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA INSTITUTE OF TECHNOLOGY HPCDB Satisfying Data-Intensive Queries Using GPU Clusters November 11 th, Proposed System Model Red Fox: Compilation and optimization of queries for GPUs Remove need for application developer to optimize applications to run on GPUs Oncilla: Global Address Space (GAS) layer Create an API to simplify data movement and scheduling

SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA INSTITUTE OF TECHNOLOGY HPCDB Satisfying Data-Intensive Queries Using GPU Clusters November 11 th, Red Fox Compilation Flow RA-to-PTX ( nvcc + RA-Lib) Primitives Library Runtime Manager LogicBlox Front-End Language Front-End Translation Layer Back-End Datalog Queries Query PlanPTX/Binary Kernel Kernel Weaver

SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA INSTITUTE OF TECHNOLOGY HPCDB Satisfying Data-Intensive Queries Using GPU Clusters November 11 th, Relational Algebra Primitives on GPUs Multi-stage algorithm Simple primitives are close to maximum performance More complex primitives could show better performance with newer implementations (in progress with NVIDIA Research) Raw Performance (NVIDIA C2050) Fastest known for GPUs! Practical MAX

SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA INSTITUTE OF TECHNOLOGY HPCDB Satisfying Data-Intensive Queries Using GPU Clusters November 11 th, Red Fox: TPC-H Q1 Results GPU computation scales well with problem size Improved primitives could lead to further 10x speedup

SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA INSTITUTE OF TECHNOLOGY HPCDB Satisfying Data-Intensive Queries Using GPU Clusters November 11 th, Oncilla: Fabrics for Accelerator Clouds Goal: Transparent, efficient host memory aggregation across node for accelerators Solution: Use Global Address Spaces (GAS) and commodity fabrics (HT, QPI, PCIe, 10GE, IB) Support in-core databases using software from Red Fox project Companies: LogicBlox, NVIDIA, AIC

SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA INSTITUTE OF TECHNOLOGY HPCDB Satisfying Data-Intensive Queries Using GPU Clusters November 11 th, Oncilla aims to combine support for multiple types of data transfer and CUDA- based optimizations under a simplified runtime. Ex: “oncilla_malloc(2 GB, node2, gpumem)” Enable application developers and schedulers to take advantage of high- performance GAS without needing to be experts in specialized hardware Oncilla: Efficient Data Movement

SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA INSTITUTE OF TECHNOLOGY HPCDB Satisfying Data-Intensive Queries Using GPU Clusters November 11 th, Questions? For more information: Red Fox: H. Wu, G. Diamos, H. Cadambi, and S. Yalamanchili, “KernelWeaver: Automatically Fusing Database Primitives for Efficient GPU Computation,” MICRO, December Oncilla: J. Young, S. Yalamanchili, Commodity Converged Fabrics for Global Address Spaces in Accelerator Clouds,” HPCC, June, 2012