TensorFlow and Clipper (Lecture 24, cs262a)

Slides:



Advertisements
Similar presentations
1 Machine Learning: Lecture 4 Artificial Neural Networks (Based on Chapter 4 of Mitchell T.., Machine Learning, 1997)
Advertisements

Neural Network I Week 7 1. Team Homework Assignment #9 Read pp. 327 – 334 and the Week 7 slide. Design a neural network for XOR (Exclusive OR) Explore.
Master/Slave Architecture Pattern Source: Pattern-Oriented Software Architecture, Vol. 1, Buschmann, et al.
Piccolo – Paper Discussion Big Data Reading Group 9/20/2010.
Lecture 14 – Neural Networks
September 28, 2010Neural Networks Lecture 7: Perceptron Modifications 1 Adaline Schematic Adjust weights i1i1i1i1 i2i2i2i2 inininin …  w 0 + w 1 i 1 +
Neural Networks. Background - Neural Networks can be : Biological - Biological models Artificial - Artificial models - Desire to produce artificial systems.
Tyson Condie.
Classification Part 3: Artificial Neural Networks
Distributed shared memory. What we’ve learnt so far  MapReduce/Dryad as a distributed programming model  Data-flow (computation as vertex, data flow.
MATRIX MULTIPLY WITH DRYAD B649 Course Project Introduction.
CSC321 Introduction to Neural Networks and Machine Learning Lecture 3: Learning in multi-layer networks Geoffrey Hinton.
By Jeff Dean & Sanjay Ghemawat Google Inc. OSDI 2004 Presented by : Mohit Deopujari.
M. Wang, T. Xiao, J. Li, J. Zhang, C. Hong, & Z. Zhang (2014)
MapReduce: Simplified Data Processing on Large Clusters By Dinesh Dharme.
Neural networks (2) Reminder Avoiding overfitting Deep neural network Brief summary of supervised learning methods.
Data Summit 2016 H104: Building Hadoop Applications Abhik Roy Database Technologies - Experian LinkedIn Profile:
BIG DATA/ Hadoop Interview Questions.
Hadoop Javad Azimi May What is Hadoop? Software platform that lets one easily write and run applications that process vast amounts of data. It includes:
Big data classification using neural network
TensorFlow CS 5665 F16 practicum Karun Joseph, A Reference:
TensorFlow– A system for large-scale machine learning
Deep Learning Software: TensorFlow
Big Data is a Big Deal!.
CS 388: Natural Language Processing: LSTM Recurrent Neural Networks
Big Data A Quick Review on Analytical Tools
Tensorflow Tutorial Homin Yoon.
Large-scale Machine Learning
Chilimbi, et al. (2014) Microsoft Research
Enabling machine learning in embedded systems
Matt Gormley Lecture 16 October 24, 2016
Deep Learning Platform as a Service
Abstract Major Cloud computing companies have started to integrate frameworks for parallel data processing in their product portfolio, making it easy for.
Deep Learning Libraries
Intro to NLP and Deep Learning
CS 224S: TensorFlow Tutorial
What is an ANN ? The inventor of the first neuro computer, Dr. Robert defines a neural network as,A human brain like system consisting of a large number.
Classification with Perceptrons Reading:
Intelligent Information System Lab
Apache Spark Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing Aditya Waghaye October 3, 2016 CS848 – University.
Overview of TensorFlow
A VERY Brief Introduction to Convolutional Neural Network using TensorFlow 李 弘
Comparison Between Deep Learning Packages
Introduction to CuDNN (CUDA Deep Neural Nets)
Neural Networks and Backpropagation
INF 5860 Machine learning for image classification
Introduction to Neural Networks
Deep Learning Packages
Introduction to Tensorflow
An open-source software library for Machine Intelligence
CS110: Discussion about Spark
MXNet Internals Cyrus M. Vahid, Principal Solutions Architect,
Some Takeaways… (Lecture 26, cs262a)
Overview of big data tools
CSC 578 Neural Networks and Deep Learning
Long Short Term Memory within Recurrent Neural Networks
Neural Networks Geoff Hulten.
Neural Networks ICS 273A UC Irvine Instructor: Max Welling
Vinit Shah, Joseph Picone and Iyad Obeid
History of Deep Learning 1/16/19
TensorFlow: A System for Large-Scale Machine Learning
EE 193: Parallel Computing
David Kauchak CS51A Spring 2019
Automatic Handwriting Generation
Deep Learning Libraries
MapReduce: Simplified Data Processing on Large Clusters
PYTHON Deep Learning Prof. Muhammad Saeed.
CSC 578 Neural Networks and Deep Learning
Parallel Systems to Compute
Overall Introduction for the Lecture
Presentation transcript:

TensorFlow and Clipper (Lecture 24, cs262a) Ali Ghodsi and Ion Stoica, UC Berkeley April 18, 2018 3-D graph Checklist What we want to enable What we have today How we’ll get there

Today’s lecture Abadi et al., “TensorFlow: A System for Large-Scale Machine Learning”, OSDI 2016 (https://www.usenix.org/system/files/conference/osdi16/osdi16-abadi.pdf) Crankshaw et al., “Clipper: A Low-Latency Online Prediction Serving System”, NSDI 2017 (https://www.usenix.org/conference/nsdi17/technical-sessions/presentation/crankshaw)

A short history of Neural Networks 1957: Perceptron (Frank Rosenblatt): one layer network neural network 1959: first neural network to solve a real world problem, i.e., eliminates echoes on phone lines (Widrow & Hoff) 1988: Backpropagation (Rumelhart, Hinton, Williams): learning a multi-layered network

A short history of NNs 1989: ALVINN: autonomous driving car using NN (CMU) 1989: (LeCun) Successful application to recognize handwritten ZIP codes on mail using a “deep” network 2010s: near-human capabilities for image recognition, speech recognition, and language translation

Perceptron Invented by Frank Rosenblatt (1957):  simplified mathematical model of how the neurons in our brains operate From: http://www.andreykurenkov.com/writing/ai/a-brief-history-of-neural-nets-and-deep-learning/

Perceptron Could implement AND, OR, but not XOR From: http://www.andreykurenkov.com/writing/ai/a-brief-history-of-neural-nets-and-deep-learning/

Hidden layers Hidden layers can find features within the data and allow following layers to operate on those features Can implement XOR From: http://www.andreykurenkov.com/writing/ai/a-brief-history-of-neural-nets-and-deep-learning/

Learning: Backpropagation From: http://www.andreykurenkov.com/writing/ai/a-brief-history-of-neural-nets-and-deep-learning/

Context (circa 2015) Deep learning already claiming big successes Imagenet challenge classification task From: http://www.wsdm-conference.org/2016/slides/WSDM2016-Jeff-Dean.pdfz

Context (circa 2015) Deep learning already claiming big successes Number of developers/researchers exploding A “zoo” of tools and libraries, some of questionable quality…

What is TensorFlow? Open source library for numerical computation using data flow graphs Developed by Google Brain Team to conduct machine learning research Based on DisBelief used internally at Google since 2011 “TensorFlow is an interface for expressing machine learning algorithms, and an implementation for executing such algorithms”

What is TensorFlow Key idea: express a numeric computation as a graph Graph nodes are operations with any number of inputs and outputs Graph edges are tensors which flow between nodes

Programming model Neural network with one hidden layer The function we are computing is on the left. Relu is an activation function that takes the max of 0 and your input, applies a nonlinearity to your network. The graph on the right is how this function would look like in a tensorflow graph. Let us break down the nodes in the graph into three types

Programming model Variables are stateful nodes which output their current value. State is retained across multiple executions of a graph (mostly parameters) We have two variables in this graph, the bias and W matrix. What we mean by saying that variables are stateful is that they retain their current value over multiple executions, and it’s easy to restore saved values to variables. Variables have a number of other useful features. They can be saved to disk during and after training, and gradient updates will by default apply over all the variables in your graph Variables are still operations (nodes = operations!). When you run a variable you get the value of its state. The variables in your graph are going to be your parameters

Programming model Placeholders are nodes whose value is fed in at execution time (inputs, labels, …) The second important node type is placeholders. If you have inputs into your network that depend on some external data, you want to be able to build your graph without depending on any real values yet, as those are passed in at run time. Placeholders, unlike variables that have initial assignment, expect as arguments a data type and a shape of a tensor. Inputs, labels

Mathematical operations: Programming model Mathematical operations: MatMul: Multiply two matrices Add: Add elementwise ReLU: Activate with elementwise rectified linear function 0, x <= 0 ReLu(x) = The other nodes are mathematical operations that you are familiar with. Matmul, addition, and relu are all operations in your graph that combine your inputs and parameters to produce new nodes. x, x > 0

Code import tensorflow as tf b = tf.Variable(tf.zeros((100,))) W = tf.Variable(tf.random_uniform((784, 100), -1, 1)) x = tf.placeholder(tf.float32, (1, 784)) h = tf.nn.relu(tf.matmul(x, W) + b)

CPU GPU Running the graph Deploy graph with a session: a binding to a particular execution context (e.g. CPU, GPU) CPU GPU

End-to-end So far: Next: train model Built a graph using variables and placeholders Deploy the graph onto a session, i.e., execution environment Next: train model Define loss function Compute gradients

Defining loss Use placeholder for labels Build loss node using labels and prediction prediction = tf.nn.softmax(...) #Output of neural network label = tf.placeholder(tf.float32, [100, 10]) cross_entropy = -tf.reduce_sum(label * tf.log(prediction), axis=1)

Gradient computation: Backpropagation train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy) tf.train.GradientDescentOptimizer is an Optimizer object tf.train.GradientDescentOptimizer(lr).minimize(cross_entropy) adds optimization operation to computation graph TensorFlow graph nodes have attached gradient operations Gradient with respect to parameters computed with backpropagation … automatically

Design Principles Dataflow graphs of primitive operators Deferred execution (two phases) Define program i.e., symbolic dataflow graph w/ placeholders Executes optimized version of program on set of available devices Common abstraction for heterogeneous accelerators Issue a kernel for execution Allocate memory for inputs and outputs Transfer buffers to and from host memory

Dynamic Flow Control Problem: support ML algos that contain conditional and iterative control flow, e.g. Recurrent Neural Networks (RNNs) Long-Short Term Memory (LSTM) Solution: Add conditional (if statement) and iterative (while loop) programming constructs

TensorFlow high-level architecture Core in C++ Very low overhead Different front ends for specifying/driving the computation Python and C++ today, easy to add more From: http://www.wsdm-conference.org/2016/slides/WSDM2016-Jeff-Dean.pdf

TensorFlow architecture Core in C++ Very low overhead Different front ends for specifying/driving the computation Python and C++ today, easy to add more From: http://www.wsdm-conference.org/2016/slides/WSDM2016-Jeff-Dean.pdf

Detailed architecture From: https://www.tensorflow.org/extend/architecture

Key components Similar to MapReduce, Apache Hadoop, Apache Spark, … From: https://www.tensorflow.org/extend/architecture

Client From: https://www.tensorflow.org/extend/architecture

Master From: https://www.tensorflow.org/extend/architecture

Computation graph partition From: https://www.tensorflow.org/extend/architecture

Computation graph partition From: https://www.tensorflow.org/extend/architecture

Execution From: https://www.tensorflow.org/extend/architecture

Fault Tolerance Assumptions: Fine grain operations: “It is unlikely that tasks will fail so often that individual operations need fault tolerance” ;-) “Many learning algorithms do not require strong consistency” Solution: user-level checkpointing (provides 2 ops) save(): writes one or more tensors to a checkpoint file restore(): reads one or more tensors from a checkpoint file

Discussion Eager vs. deferred (lazy) execution Transparent vs. user-level fault tolerance Easy of use

Discussion OpenMP/Cilk MPI MapReduce / Spark Environment, Assumptions Single node, multiple core, shared memory Supercomputers Sophisticate programmers High performance Hard to scale hardware Commodity clusters Java programmers Programmer productivity Easier, faster to scale up cluster Computation Model Fine-grained task parallelism Message passing Data flow / BSP Strengths Simplifies parallel programming on multi-cores Can write very fast asynchronous code Fault tolerance Weaknesses Still pretty complex, need to be careful about race conditions Easy to end up with non-deterministic code (if not using barriers) Not as high performance as MPI