Interactive Machine Learning with a GPU-Accelerated Toolkit

Presentation transcript:

Interactive Machine Learning with a GPU-Accelerated Toolkit
Biye Jiang, Huasha Zhao, John Canny
Computer Science Division, University of California, Berkeley
{bjiang, hzhao, jfc}@cs.berkeley.edu
Berkeley Institute of Design

Dashboard
- Model overview (topic matrix, image cluster centers)
- Main loss and other evaluation metrics
- Sliders to change model hyper-parameters
- Other visualized metrics: cluster centers, pairwise distances between clusters (MNIST dataset), silhouette graph

References
[1] Huasha Zhao, Biye Jiang, and John Canny. SAME but Different: Fast and High-Quality Gibbs Parameter Estimation. arXiv:1409.5402, 2014.
[2] John Canny and Huasha Zhao. Big Data Analytics with Small Footprint: Squaring the Cloud. In ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), 2013.
[3] John Canny and Huasha Zhao. BIDMach: Large-scale Learning with Zero Memory Allocation. BigLearn Workshop, Neural Information Processing Systems (NIPS), 2013.

Motivation
- ML algorithms optimize mathematical criteria, but people have informal notions of what makes a "good" model.
- Real-world ML applications often involve trade-offs between multiple criteria (business logic): revenue, advertiser satisfaction, user metrics.
- These goals should be addressed during training, not after. Interactive ML lets users understand the effects of these trade-offs on model quality and structure.
- Example: tuning the sizeWeight parameter against the metric "histogram of cluster sizes", observing how the histogram changes as sizeWeight increases.

Interactive Interface
- Minibatch learning supports continuous data streaming; model updates happen many times per second.
- Parameters and visualizations are updated in real time.
- Algorithms converge after a few updates: less than 10 s to reach a stable result on MNIST8M (20 GB).

Encoding constraints as mixin functions
- Users first choose a secondary optimization goal: sparseness, consistency, or independence.
- Its gradient is then used to update the model alongside the primary loss.
- The mixin weight is a hyper-parameter; the mixin value is a metric.
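The mixin update described above can be sketched as an ordinary SGD step whose gradient combines the primary loss with the weighted gradient of the secondary goal. This is a minimal illustrative sketch, not BIDMach's actual API; `mixin_update`, `sparseness_grad`, and all variable names are hypothetical.

```python
import numpy as np

def mixin_update(model, data_grad, mixin_grad_fn, mixin_weight, lr):
    """One SGD step combining the primary loss gradient with a secondary
    'mixin' gradient (e.g. a sparseness penalty). Illustrative only."""
    g = data_grad + mixin_weight * mixin_grad_fn(model)
    return model - lr * g

# Example mixin: an L1-sparseness goal, whose gradient is sign(model).
sparseness_grad = lambda m: np.sign(m)

model = np.array([0.5, -0.2, 0.0, 1.0])
data_grad = np.array([0.1, -0.1, 0.05, 0.2])
new_model = mixin_update(model, data_grad, sparseness_grad,
                         mixin_weight=0.01, lr=0.1)

# The mixin *value* is the metric shown on the dashboard,
# here the L1 norm of the model:
sparseness_metric = np.abs(new_model).sum()
```

The mixin weight plays the role of the slider in the dashboard: raising it trades primary-loss progress for the secondary goal, and the corresponding metric (here the L1 norm) is what the user watches while adjusting it.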
Users can interactively adjust the controls and see the effects on the corresponding metrics.

Visualization in the Browser
- Web server sits between the browser and BIDMach: it grabs data from the GPU 10 times per second and passes changed hyper-parameters back.
- Model overview and evaluation metrics are rendered with D3.js.
- Each hyper-parameter corresponds to a visualization of its metric.

Data structure
- User-defined logging: only data that is actually used gets logged.
- Efficient internal matrix format in BIDMach.
- JSON is used to communicate between server and browser.
- Can easily support deep neural networks.

Temperature / learning-rate control
- Controls the window size for the moving-average update.
- Controls the variance of the Gibbs sampler.
- User-defined annealing learning schedule.
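The interaction loop above can be sketched as a minibatch trainer that re-reads a shared hyper-parameter dictionary between batches and serializes metrics to JSON for the browser. This is a hypothetical sketch, not BIDMach's actual API; the scalar "model", `train_minibatch`, and `metrics_json` are placeholders for the real GPU kernels and D3.js payloads.

```python
import json

# Hyper-parameters live in a dict the web UI can overwrite at any time;
# edits take effect on the very next minibatch.
hyperparams = {"learning_rate": 0.1, "sizeWeight": 0.0}

def train_minibatch(model, batch, hp):
    # Placeholder scalar update; the real kernels run on the GPU.
    return model + hp["learning_rate"] * (sum(batch) / len(batch))

def metrics_json(model, hp):
    # Only the metrics the UI actually displays are logged and serialized.
    return json.dumps({"model": model, "sizeWeight": hp["sizeWeight"]})

model = 0.0
for batch in ([1, 2, 3], [2, 2, 2]):
    # Hyper-parameter edits arriving from the browser would be merged
    # into `hyperparams` here, between minibatches.
    model = train_minibatch(model, batch, hyperparams)

payload = metrics_json(model, hyperparams)  # sent to D3.js ~10 times/s
```

Polling between minibatches is what makes the sliders feel live: because BIDMach updates the model many times per second, a changed hyper-parameter is reflected in the visualized metrics almost immediately.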