Sparse Computations: Better, Faster, Cheaper!
Padma Raghavan, Department of Computer Science and Engineering, The Pennsylvania State University

Presentation transcript:

WHAT IS SPARSITY?
Consider N interacting variables with N x N pair-wise interactions.
 Dense: N^2 elements, stored as an N x N matrix/array.
 Sparse: c·N elements, where c is a small number.

SPARSITY PROPERTIES
 Compact representation.
 Memory and computational costs scale as O(N) per sweep through the data.

WHERE DOES SPARSITY COME FROM?
Data mining, approximations, and discretization. One sparse dataset takes three equivalent forms: matrix (algebraic), graph (combinatorial), and mesh (geometric).

CHALLENGES IN PROCESSING HIGH-DIMENSIONAL SPARSE DATA
 User: Will my application speed up and compute high-quality solutions?
 Developer: Can I make sparse codes better, faster, cheaper?

FASTER: SpEED-SMV
Issue: Can we make sparse matrix-vector multiplication (SMV) faster on multicores?
Focus: Performance.
Method: Identify and use smaller dense sub-matrices 'hidden' in the sparse matrix: sparsity-exploiting, effectively dense SMV (SpEED-SMV).
 Decrease memory loads by reducing index overheads.
 Obtain a supernodal ordered matrix.
 Order the supernodal matrix using the RCM ordering scheme.
 Compare performance: SpEED-SMV vs. RxC-SMV.
Result: SpEED-SMV improves the performance of SMV by as much as 59.35% and 50.32%, and on average by 37.35% and 6.07%, compared to traditional compressed sparse row and blocked compressed sparse row schemes.

BETTER: FST-K-Means
Issue: Can we transform data for better classification?
Focus: Quality of classification.
Method: Feature subspace transformation (FST) by force-directed graph embedding; apply K-means on the transformed data (FST-K-Means).
Metrics: Classification accuracy P = (number of correctly classified documents) / (total number of documents). Cluster cohesiveness J, the standard K-means objective, where c_i is the centroid of cluster M_i and d_j is the j-th document of cluster M_i:
J = Σ_i Σ_{d_j ∈ M_i} ‖d_j − c_i‖².
Results: FST-K-Means improves the classification accuracy on 5 out of 8 datasets. It improves the cluster cohesiveness of K-means and is competitive with GraClus on the benchmark datasets.

CHEAPER: Helper-Thread Scheduling
Issue: Can we schedule for energy efficiency?
Focus: Reduce the energy-delay product, EDP = energy × time; of two systems using equal energy, the faster one has the lower EDP.
Why it matters: runaway leakage on idle cores, thermal emergencies, transient errors.
Method: Dynamic scheduling with helper threads that learn the application profile to select the number of cores, threads, and voltage/frequency levels. Best EDP adaptivity comes from a monitor-model-adapt loop. The helper thread's recipe:
 Collect performance/energy statistics as the execution progresses.
 Calculate EDP values.
 Fit the available data and interpolate to predict the best configuration.
Result: up to 66.3% and 83.3% savings in EDP when all parameters are adjusted properly in the applications FFT and MG, respectively.

SUMMARY
Software for sparse computations can deliver high-quality solutions that are scalable and energy-efficient by exploiting latent structure in the data and adapting to match hardware attributes.

This research was funded in part by the NSF grants CSR-SMA, CCF, and OCI.

REFERENCES
Anirban Chatterjee, Sanjukta Bhowmick, Padma Raghavan: Feature subspace transformations for enhancing k-means clustering. CIKM 2010.
Yang Ding, Mahmut T. Kandemir, Padma Raghavan, Mary Jane Irwin: A helper thread based EDP reduction scheme for adapting application execution in CMPs. IPDPS 2008: 1-14.
Manu Shantharam, Anirban Chatterjee, Padma Raghavan: Exploiting dense substructures for fast sparse matrix vector multiplication. IJHPCA 25(3) (2011).
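To make the storage claim under WHAT IS SPARSITY concrete, here is a minimal Python sketch (not from the poster; the matrix size, the constant c, and the random nonzero pattern are illustrative assumptions) comparing dense and compressed sparse row (CSR) storage:

```python
import numpy as np
from scipy.sparse import csr_matrix

# Dense storage of an N x N interaction matrix needs N^2 entries,
# while a matrix with about c*N nonzeros needs only O(N) storage in CSR form.
N, c = 10_000, 5
rng = np.random.default_rng(0)
rows = rng.integers(0, N, size=c * N)
cols = rng.integers(0, N, size=c * N)
vals = rng.random(c * N)
A = csr_matrix((vals, (rows, cols)), shape=(N, N))

dense_bytes = N * N * 8                 # 8-byte floats
csr_bytes = A.data.nbytes + A.indices.nbytes + A.indptr.nbytes
print(f"dense: {dense_bytes / 1e6:.0f} MB, CSR: {csr_bytes / 1e6:.1f} MB")

# One sweep through the data (an SpMV) touches each nonzero once: O(c*N).
y = A @ rng.random(N)
```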
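The RCM ordering step in the SpEED-SMV pipeline can be illustrated, though not reproduced, with SciPy's reverse Cuthill-McKee routine. The tile-density count below is a simplified stand-in for identifying "effectively dense" sub-matrices, not the published SpEED-SMV algorithm; the matrix size, density, and 2x2 tile threshold are assumptions for the sketch:

```python
import numpy as np
from scipy.sparse import random as sparse_random
from scipy.sparse.csgraph import reverse_cuthill_mckee

# Reorder a random symmetric sparse pattern with RCM, then count how
# many small tiles become "effectively dense" after reordering.
A = sparse_random(2000, 2000, density=0.002, format="csr", random_state=1)
A = ((A + A.T) > 0).astype(float).tocsr()   # symmetrize the pattern

perm = reverse_cuthill_mckee(A, symmetric_mode=True)
B = A[perm, :][:, perm]

def dense_tiles(M, r=2, c=2, threshold=0.5):
    """Count r x c tiles whose fill ratio is at least `threshold`."""
    M = M.tocoo()
    counts = {}
    for i, j in zip(M.row, M.col):
        key = (i // r, j // c)
        counts[key] = counts.get(key, 0) + 1
    return sum(1 for n in counts.values() if n / (r * c) >= threshold)

print("dense 2x2 tiles before RCM:", dense_tiles(A))
print("dense 2x2 tiles after  RCM:", dense_tiles(B))
```

Tiles that clear the threshold can be stored and multiplied as small dense blocks, cutting the index overhead that plain CSR pays per nonzero.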
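The two FST-K-Means evaluation metrics, P and J, follow directly from their definitions above. A small NumPy sketch with toy data (the arrays and cluster assignments are hypothetical, not the authors' code or datasets):

```python
import numpy as np

def accuracy_P(true_labels, predicted_labels):
    """P = number of correctly classified documents / total documents."""
    return np.mean(np.asarray(true_labels) == np.asarray(predicted_labels))

def cohesiveness_J(X, assignments, centroids):
    """J = sum over clusters M_i and documents d_j in M_i of ||d_j - c_i||^2.
    Lower J means tighter, more cohesive clusters."""
    return sum(np.sum((X[assignments == i] - c) ** 2)
               for i, c in enumerate(centroids))

# Toy usage: 6 documents in a 2-D feature space, 2 clusters.
X = np.array([[0., 0.], [0., 1.], [1., 0.], [5., 5.], [5., 6.], [6., 5.]])
assign = np.array([0, 0, 0, 1, 1, 1])
cents = np.array([X[assign == k].mean(axis=0) for k in range(2)])
print(accuracy_P([0, 0, 0, 1, 1, 1], assign), cohesiveness_J(X, assign, cents))
```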
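Finally, the helper thread's monitor-model-adapt recipe reduces to: sample a few configurations, fit a model to the observed EDP, and pick the predicted best. A hedged sketch, where measure(f) is a hypothetical probe standing in for reading hardware energy/time counters at frequency f, and the quadratic fit and toy energy model are my simplifications (the paper's scheme also varies core and thread counts):

```python
import numpy as np

def edp(energy_joules, time_seconds):
    """Energy-Delay Product: energy x time; lower is better."""
    return energy_joules * time_seconds

def pick_frequency(measure, freqs_ghz):
    """Monitor-model-adapt: sample EDP at a few frequencies, fit a
    quadratic model, interpolate over all settings, pick the minimum."""
    freqs_ghz = np.asarray(freqs_ghz, dtype=float)
    probes = freqs_ghz[::2]                      # monitor: sparse samples
    samples = [edp(*measure(f)) for f in probes]
    coeffs = np.polyfit(probes, samples, deg=2)  # model: data fitting
    predicted = np.polyval(coeffs, freqs_ghz)    # interpolate all settings
    return freqs_ghz[int(np.argmin(predicted))]  # adapt: best predicted EDP

# Toy measure(): dynamic energy grows with f, leakage energy and runtime
# shrink with f, so EDP has an interior minimum across the frequency range.
toy_measure = lambda f: (f + 0.5 / f + 0.05, 1.0 / f + 0.1)
print(pick_frequency(toy_measure, [1.0, 1.4, 1.8, 2.2, 2.6, 3.0]))
```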