Interactive Machine Learning with a GPU-Accelerated Toolkit

Presentation transcript:

Interactive Machine Learning with a GPU-Accelerated Toolkit
Biye Jiang, Huasha Zhao, John Canny
Computer Science Division, University of California, Berkeley
{bjiang, hzhao, jfc}@cs.berkeley.edu
Berkeley Institute of Design

Dashboard
- Model overview (topic matrix, image cluster centers)
- Main loss and other evaluation metrics
- Sliders to change model hyper-parameters
- Other visualized metrics: cluster centers, pairwise distances between clusters (MNIST dataset), silhouette graph

References
[1] Huasha Zhao, Biye Jiang, and John Canny. SAME but Different: Fast and High-Quality Gibbs Parameter Estimation. arXiv:1409.5402, 2014.
[2] John Canny and Huasha Zhao. Big Data Analytics with Small Footprint: Squaring the Cloud. In ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), 2013.
[3] John Canny and Huasha Zhao. BIDMach: Large-scale Learning with Zero Memory Allocation. BigLearn Workshop, Neural Information Processing Systems (NIPS), 2013.

Motivation
- ML algorithms optimize mathematical criteria, but people have informal notions of what makes a "good" model.
- Real-world ML applications often involve trade-offs between multiple criteria (business logic): revenue, advertiser satisfaction, user metrics.
- These goals should be addressed during training, not after. Interactive ML lets users understand the effects of these trade-offs on model quality and structure.
- Example: tuning the sizeWeight parameter against the metric "histogram of cluster sizes", observing how the histogram changes as sizeWeight increases.

Interactive Interface
- Minibatch learning supports continuous data streaming; model updates happen many times per second.
- Parameters and visualizations are updated in real time.
- Algorithms converge after a few updates: less than 10 s to reach a stable result on MNIST8M (20 GB).

Encoding constraints as mixin functions
- Users first choose a secondary optimization goal: sparseness, consistency, or independence.
- Its gradient is then used to update the model alongside the primary loss.
- The mixin weight is a hyper-parameter; the mixin value is a metric.
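The mixin update described above can be sketched as an ordinary SGD step whose gradient combines the primary loss with the weighted gradient of the secondary goal. This is a minimal illustrative sketch, not BIDMach's actual API; `mixin_update`, `sparseness_grad`, and all variable names are hypothetical.

```python
import numpy as np

def mixin_update(model, data_grad, mixin_grad_fn, mixin_weight, lr):
    """One SGD step combining the primary loss gradient with a secondary
    'mixin' gradient (e.g. a sparseness penalty). Illustrative only."""
    g = data_grad + mixin_weight * mixin_grad_fn(model)
    return model - lr * g

# Example mixin: an L1-sparseness goal, whose gradient is sign(model).
sparseness_grad = lambda m: np.sign(m)

model = np.array([0.5, -0.2, 0.0, 1.0])
data_grad = np.array([0.1, -0.1, 0.05, 0.2])
new_model = mixin_update(model, data_grad, sparseness_grad,
                         mixin_weight=0.01, lr=0.1)

# The mixin *value* is the metric shown on the dashboard,
# here the L1 norm of the model:
sparseness_metric = np.abs(new_model).sum()
```

The mixin weight plays the role of the slider in the dashboard: raising it trades primary-loss progress for the secondary goal, and the corresponding metric (here the L1 norm) is what the user watches while adjusting it.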
Users can interactively adjust the controls and see the effects on the corresponding metrics.

Visualization in the Browser
- Web server sits between the browser and BIDMach: it grabs data from the GPU 10 times per second and passes changed hyper-parameters back.
- Model overview and evaluation metrics are rendered with D3.js.
- Each hyper-parameter corresponds to a visualization of its metric.

Data structure
- User-defined logging: only data that is actually used gets logged.
- Efficient internal matrix format in BIDMach.
- JSON is used to communicate between server and browser.
- Can easily support deep neural networks.

Temperature / learning-rate control
- Controls the window size for the moving-average update.
- Controls the variance of the Gibbs sampler.
- User-defined annealing learning schedule.
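The interaction loop above can be sketched as a minibatch trainer that re-reads a shared hyper-parameter dictionary between batches and serializes metrics to JSON for the browser. This is a hypothetical sketch, not BIDMach's actual API; the scalar "model", `train_minibatch`, and `metrics_json` are placeholders for the real GPU kernels and D3.js payloads.

```python
import json

# Hyper-parameters live in a dict the web UI can overwrite at any time;
# edits take effect on the very next minibatch.
hyperparams = {"learning_rate": 0.1, "sizeWeight": 0.0}

def train_minibatch(model, batch, hp):
    # Placeholder scalar update; the real kernels run on the GPU.
    return model + hp["learning_rate"] * (sum(batch) / len(batch))

def metrics_json(model, hp):
    # Only the metrics the UI actually displays are logged and serialized.
    return json.dumps({"model": model, "sizeWeight": hp["sizeWeight"]})

model = 0.0
for batch in ([1, 2, 3], [2, 2, 2]):
    # Hyper-parameter edits arriving from the browser would be merged
    # into `hyperparams` here, between minibatches.
    model = train_minibatch(model, batch, hyperparams)

payload = metrics_json(model, hyperparams)  # sent to D3.js ~10 times/s
```

Polling between minibatches is what makes the sliders feel live: because BIDMach updates the model many times per second, a changed hyper-parameter is reflected in the visualized metrics almost immediately.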