Declarative Transfer Learning from Deep CNNs at Scale


Declarative Transfer Learning from Deep CNNs at Scale
Supun Nakandala and Arun Kumar ({snakanda, arunkk}@eng.ucsd.edu)

Growth of Unstructured Data
Data growth is mainly driven by unstructured data (source: ETC@USC). "Dark data" spans both unstructured and structured sources across e-Commerce, healthcare, and social media (source: 2017 Kaggle Survey).

Opportunity: CNNs
Deep Convolutional Neural Networks (CNNs) provide an opportunity to holistically integrate image data into analytics pipelines.

CNN: Hierarchical Feature Extractors
Successive layers learn low-level features, then mid-level features, then high-level features.

CNN: Training Limitations
Training CNNs from scratch requires:
- Lots of labeled training data
- Lots of compute power
- Long training times
- The "dark art" of hyperparameter tuning
"Transfer learning" mitigates these limitations.

Outline
- Example and Motivations
- Our System: Vista
- Experimental Evaluation

Transfer Learning: CNNs for the Other 90%
Features extracted from a pre-trained CNN on the image data are combined with structured data (brand, tags, price) to train and evaluate an ML model.

Transfer Learning: Bottleneck
Which CNN layer's features will result in the best accuracy? Every candidate layer must be extracted, combined with the structured data (brand, tags, price), and used to train and evaluate the ML model.
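The bottleneck above amounts to a layer-selection loop. A minimal sketch in plain Python, where `extract_features` and `train_and_score` are illustrative stand-ins (toy deterministic functions, not Vista's actual API or a real CNN):

```python
# Sketch of the layer-selection loop: try each candidate layer of a
# pre-trained CNN as the feature source, train the downstream model on
# [CNN features + structured data], and keep the most accurate layer.

def extract_features(image, layer):
    # Stand-in for CNN inference up to `layer`; here a toy deterministic
    # transform of an integer "image".
    return (image * (layer + 1)) % 11

def train_and_score(features, structured, labels):
    # Stand-in for training an ML model and returning validation
    # accuracy in [0, 1].
    correct = sum(1 for f, s, y in zip(features, structured, labels)
                  if ((f + s) % 2) == y)
    return correct / len(labels)

def pick_best_layer(images, structured, labels, layers):
    best_layer, best_acc = None, -1.0
    for layer in layers:
        feats = [extract_features(img, layer) for img in images]
        acc = train_and_score(feats, structured, labels)
        if acc > best_acc:
            best_layer, best_acc = layer, acc
    return best_layer, best_acc
```

Note that each loop iteration implicitly re-runs CNN inference, which is exactly the redundancy the later slides attack.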

Transfer Learning: Current Practice
Deep learning tools: + efficient CNN inference; - no support for scalable data processing.
Data-parallel systems: + scalable processing; + fault tolerant; - no support for CNNs.
Neither alone covers the full pipeline of extracting image features, combining them with structured data (brand, tags, price), and training and evaluating the ML model.

Problems with Current Practice
- Usability: manual management of CNN features.
- Efficiency: running inference from the raw image for every feature layer has computational redundancies.
- Reliability: CNN layers are big and require careful memory configuration; otherwise, disk spills or even system crashes result.

Outline
- Example and Motivations
- Our System: Vista
  - Overview
  - System Architecture
  - System Optimizations
- Experimental Evaluation

Vista: Overview
Vista is a declarative system for scalable feature transfer from deep CNNs for multimodal analytics. Vista takes in:
- A pre-trained CNN and the layers of interest (l1, l2, l3)
- Image data
- Structured data (brand, tags, price)
- An ML model
Vista then optimizes the CNN feature transfer workload and runs it reliably.

Outline
- Example and Motivations
- Our System: Vista
  - Overview
  - System Architecture
  - System Optimizations
- Experimental Evaluation

Vista: Architecture
- Declarative API over pre-trained CNNs (benefit: usability)
- Optimizer with logical plan, physical plan, and system configuration optimizations (benefit: efficiency and reliability)
- Scalable execution (benefit: efficiency, scalability, and fault tolerance)

Outline
- Our System: Vista
  - System Optimizations
    - Logical Plan Optimizations
    - Physical Plan Optimizations
    - System Configuration Optimizations

Current Practice: Lazy Materialization
For each layer of interest, run CNN inference on the images from scratch, join the image features {l1} with the structured data {S}, and train the ML model on the multimodal features {S, l1}.
Problem: repeated inference; the lower layers are re-computed for every layer of interest.

Alternative: Eager Materialization
Extract all layers in one inference pass, join the image features {l1, l2, l3} with the structured data {S}, and train one ML model per layer on {S, l1}, {S, l2}, and {S, l3}.
Problem: high memory footprint, leading to disk spills/cache misses and even system crashes.

Our Novel Plan: Staged CNN Inference
A partial CNN operator materializes one layer at a time: join {l1} with the structured data {S}, train on {S, l1}, then resume inference from l1 to produce l2 and train on {S, l2}, and so on.
Benefits: no redundant computation and a minimal memory footprint.

Our Novel Plan: Staged CNN Inference (continued)
Benefits: no redundant computation and a minimal memory footprint.
Problem: high join overhead; the two sides of the join differ greatly in size (14 KB vs. 784 KB in the example), and the join is repeated at every stage.

Our Novel Plan: Staged CNN Inference, Reordered
Reordering the plan performs the join with the structured data {S} before the staged materialization, so the large CNN feature layers no longer flow through a join at every stage.
Benefits: no redundant computation, a minimal memory footprint, and the high join overhead is avoided.
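The three logical plans above can be contrasted with a toy cost model that counts CNN layer evaluations per image. The model of "one unit of work per layer" is illustrative, not a measurement from the Vista paper:

```python
# Toy per-image compute costs for the three materialization plans,
# counting how many CNN layer evaluations each plan performs when the
# last `num_layers` layers are all of interest.

def lazy_cost(num_layers):
    # Lazy: getting layer k's features re-runs inference through layers
    # 1..k from scratch, separately for each layer of interest.
    return sum(range(1, num_layers + 1))

def eager_cost(num_layers):
    # Eager: a single pass computes every layer once, but all layer
    # outputs must be materialized together (high memory footprint).
    return num_layers

def staged_cost(num_layers):
    # Staged: also a single pass, but a partial CNN operator resumes
    # from the previous layer's output, so only one layer is
    # materialized at a time (minimal memory footprint).
    return num_layers
```

For three layers of interest, lazy does 6 layer evaluations while eager and staged both do 3; staged additionally keeps only one layer's features in memory at a time.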

Outline
- Our System: Vista
  - System Optimizations
    - Logical Plan Optimizations
    - Physical Plan Optimizations
    - System Configuration Optimizations

Vista: Physical Plan Optimizations
- Join operator. Options: broadcast vs. shuffle join. Trade-off: memory footprint vs. network cost.
- Storage format. Options: compressed vs. uncompressed. Trade-off: memory footprint vs. compute cost.
Benefit: Vista automatically picks the physical plan choices.
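The broadcast-vs-shuffle trade-off can be sketched as a simple planner rule. The threshold and safety factor below are invented for the sketch and are not Vista's actual policy:

```python
# Illustrative join-operator choice: a broadcast join replicates the
# smaller (structured) table into every worker's memory, avoiding a
# network shuffle; a shuffle join bounds memory but pays network cost.

def choose_join(structured_table_bytes, worker_memory_bytes, safety=0.1):
    # Broadcast only if a full copy of the structured table fits
    # comfortably (within a safety fraction) in each worker's memory.
    if structured_table_bytes <= safety * worker_memory_bytes:
        return "broadcast"
    return "shuffle"
```

A 10 MiB structured table on 32 GiB workers would be broadcast; a 16 GiB table would force a shuffle join.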

Outline
- Our System: Vista
  - System Optimizations
    - Logical Plan Optimizations
    - Physical Plan Optimizations
    - System Configuration Optimizations

Vista: System Configuration Optimizations
- Memory allocation
- Query parallelism
- Data partition size

Memory Allocation
Challenge: default configurations won't work; CNN features are big, and CNN model inference itself needs non-trivial memory. A worker's core memory must be divided among OS-reserved memory, CNN inference (model footprints), in-memory storage (intermediate data), query processing (e.g., join processing), and UDF execution.
Benefit: Vista frees the data scientist from manual memory and system configuration tuning.

Query Parallelism and Data Partition Size
Increasing query parallelism increases CNN inference memory and leaves less storage memory. Vista sets query parallelism to improve utilization and reduce disk spills.
Data partition size: too big risks a system crash; too small causes high overhead. Vista sets the data partition size to reduce overheads and avoid crashes.
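The partition-size trade-off can be sketched as a toy sizing rule: cap each partition so its materialized CNN features fit in a task's memory budget (avoiding crashes), while aiming for a target size that amortizes per-task overhead. The numbers below are illustrative, not Vista's policy:

```python
# Toy data-partition sizing under "too big -> crash, too small -> high
# overhead": choose the largest row count up to a target that still
# fits the per-task memory budget once CNN features are materialized.

def pick_partition_rows(feature_bytes_per_row, task_memory_bytes,
                        target_rows=10_000):
    max_rows = task_memory_bytes // feature_bytes_per_row
    if max_rows < 1:
        raise ValueError("one row's features exceed the task memory budget")
    return min(target_rows, max_rows)
```

With 784 KB of features per row (a large CNN layer) and a 1 GiB task budget, the memory cap binds; with small features, the overhead-driven target binds instead.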

Outline
- Example and Motivations
- Our System: Vista
- Experimental Evaluation

Experimental Setup
- Cluster: 8 worker nodes and 1 master node
- Per node: Intel Xeon @ 2.00 GHz CPU with 8 cores, 32 GB RAM, 300 GB HDD
- Spark version 2.2.0, running in standalone mode; TensorFlow version 1.3.0

Dataset & Workloads
Amazon product reviews dataset:
- Number of records: 200,000
- Structured features: 200 (price, category embedding, review embedding)
- Image: one image per product item
- Target: predict whether each product is popular or not
ML model: logistic regression for 10 iterations.
Pre-trained CNNs: AlexNet (last 4 layers), VGG16 (last 3 layers), ResNet50 (last 5 layers).
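A minimal sketch of the downstream model in this workload: logistic regression trained with plain stochastic gradient descent for a fixed number of iterations (10 in the workload). The learning rate and toy data are illustrative; the real workload trains on the 200 structured features plus the transferred CNN features:

```python
import math

def train_logistic_regression(X, y, iterations=10, lr=0.5):
    # Plain SGD on the logistic loss for a fixed iteration budget.
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(iterations):
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            p = 1.0 / (1.0 + math.exp(-z))   # sigmoid
            g = p - yi                        # dLoss/dz for log loss
            w = [wj - lr * g * xj for wj, xj in zip(w, xi)]
            b -= lr * g
    return w, b

def predict_proba(w, b, xi):
    z = sum(wj * xj for wj, xj in zip(w, xi)) + b
    return 1.0 / (1.0 + math.exp(-z))
```

In the experiments this simple model is what gets retrained once per candidate CNN layer, which is why feature transfer, not model training, dominates the cost.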

End-to-End Reliability and Efficiency
[Chart: runtime (min) across configurations; some baseline configurations suffer system crashes due to mismanaged memory.] Experimental results for other data systems, datasets, and drill-down experiments can be found in our paper.

Summary of Vista
- Declarative system for scalable feature transfer from CNNs.
- Performs DBMS-inspired logical plan, physical plan, and system configuration optimizations.
- Improves efficiency by up to 90% and avoids unexpected system crashes.
Thank you!
Project webpage: https://adalabucsd.github.io/vista.html