TensorFlow: The Deep Learning Library You Should Be Using
Overview
- Caveats
- Ecosystem of Tools
- High-Level Overview
- In Development
Goals: "I should learn how to use this tool" and "I know where to start."

In Development
- v1.0 announced last Wednesday
- Evolving API, limited dev support
- The magic is in the C++ core

Ecosystem of Tools
- TensorFlow; TF Learn; TensorBoard; XLA Compiler; TensorFlow Serving; TF Debug
- Plus third-party development

Community Development
- Google; Reddit; StackOverflow; GitHub; Udacity; Slack
Mathematical Caveats
- Neural network math consists of:
  - Forward propagation
  - Backward propagation
  - Parameter optimization
- Single-/multi-threadedness varies from package to package

Machine Learning Is a Pipeline
- CPU
- CPU (system) memory
- I/O (HDD, SSD, NVMe)
- GPU
- GPU memory
Step Back: Neural Networks
- Why do we use them? They can learn highly complex patterns and apply transformations to features we would never think of.
- How do they work?
  - Forward prop makes predictions
  - Backward prop adjusts parameters
- Demo on http://playground.tensorflow.org:
  - cluster data with 2x2 hidden layers
  - cluster data with no hidden layers
  - concentric-circle data with no hidden layers
  - concentric-circle data with squared inputs, no hidden layers
  - concentric-circle data with 2x2 hidden layers
  - concentric-circle data with 2x4 hidden layers
Forward Prop
The network maps $X \to H \to Y$; each layer is a matrix multiply, a bias, and a nonlinearity:

$$H = \mathrm{relu}\left(W \begin{bmatrix} X_1 \\ X_2 \\ X_3 \end{bmatrix} + b\right)
\qquad
Y = \mathrm{relu}\left(U \begin{bmatrix} H_1 \\ H_2 \\ H_3 \\ H_4 \end{bmatrix} + c\right)$$
Forward Prop: Matrix Multiplication
- Extremely parallel
- Changes dimensionality: $(m \times n)(n \times p) = (m \times p)$, so here $(4 \times 3)(3 \times 1) = (4 \times 1)$

$$H = \mathrm{relu}\left(\begin{bmatrix} W_{11} & W_{12} & W_{13} \\ W_{21} & W_{22} & W_{23} \\ W_{31} & W_{32} & W_{33} \\ W_{41} & W_{42} & W_{43} \end{bmatrix} \begin{bmatrix} X_1 \\ X_2 \\ X_3 \end{bmatrix} + \begin{bmatrix} b_1 \\ b_2 \\ b_3 \\ b_4 \end{bmatrix}\right)$$
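To make the shapes concrete, here is a minimal NumPy sketch of the forward pass above. The 4x3 weight matrix, 3x1 input, and 4x1 bias follow the slides; the single-row output matrix U is an assumption, since the slides leave Y's size unspecified.

```python
import numpy as np

def relu(z):
    # Element-wise rectified linear unit: max(0, z)
    return np.maximum(0, z)

# Dimensions from the slides: W is 4x3, X is 3x1, b is 4x1
rng = np.random.RandomState(0)
W = rng.randn(4, 3)
X = rng.randn(3, 1)
b = rng.randn(4, 1)

# H = relu(W X + b): a (4x3)(3x1) = (4x1) multiply, then bias and relu
H = relu(W @ X + b)

# Second layer, Y = relu(U H + c); the output size of 1 here is an
# assumption, not given on the slides
U = rng.randn(1, 4)
c = rng.randn(1, 1)
Y = relu(U @ H + c)
print(H.shape, Y.shape)  # (4, 1) (1, 1)
```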
High-Level API
Step Back: scikit-learn
scikit-learn allows for rapid machine learning prototyping:
- in Python
- with an easy API
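For reference, a minimal sketch of that API's construct/fit/score rhythm; the toy dataset and the choice of LogisticRegression are illustrative, not from the talk.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Load a toy dataset and split it into train and test sets
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The "easy API": construct, fit, score
clf = LogisticRegression()
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))
```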
Parallelization
- Neural networks are fancy matrix operations
- Matrix operations run best on a GPU
- So let's GPU-accelerate…
Parallelization
We want a library that can run on:
- CPU and GPU (and TPU!)
- different types of CPUs and GPUs
- different operating systems and environments
- distributed systems
…all without modifying code (see the sketch below).
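A minimal sketch of what device-agnostic code looks like in the TensorFlow 1.x API of the time: the graph definition is identical whatever hardware is present, and explicit placement is opt-in. The soft-placement flag is real TF 1.x API; using it here is my choice, not necessarily the speaker's.

```python
import tensorflow as tf

# Build a small graph; TensorFlow places each op on the best available
# device (GPU if present, otherwise CPU) without any code changes.
a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
b = tf.constant([[1.0, 1.0], [0.0, 1.0]])
c = tf.matmul(a, b)

# Manual placement is available when you want it; soft placement falls
# back to the CPU if the requested device doesn't exist.
with tf.device('/gpu:0'):
    d = tf.matmul(a, b)

config = tf.ConfigProto(allow_soft_placement=True, log_device_placement=True)
with tf.Session(config=config) as sess:
    print(sess.run([c, d]))
```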
Performance Gains

| Hostname | Hippie | Fireball | Frank | Cobalt |
|---|---|---|---|---|
| CPU | i3-2330M, 4 cores @ 2.2GHz | E5-2603, 12 cores @ 1.6GHz | i5-2400, 4 cores @ 3.1GHz | i7-6800k, 12 cores @ 4.0GHz |
| RAM | 4 GB | 32 GB | 8 GB | — |
| GPU | N/A | N/A | GTX 750 Ti, 640 cores @ 1.2GHz | 2x GTX 1070, 1920 cores each @ 1.7GHz |
| CPU load | 400% (98% ea) | 1130% (90% ea) | 270% (60% ea) | 730% (40% ea) |
| System memory load | 17% | 3% | 7% | — |
| GPU load | — | — | 49% | 34%, 32% |
| GPU memory load | — | — | 96% | 96%, 96% |
| Time per batch | 0.95s | 0.29s | 0.10s | 0.04s |
| Time to complete | 325m 36s | 95m 26s | 33m 37s | 13m 55s |
TensorFlow
Large-Scale Machine Learning Across Heterogeneous Distributed Systems
- Normally, we run programs (largely) sequentially
- TensorFlow instead:
  - builds a computation graph,
  - assigns operations to the optimal hardware,
  - then runs the computations
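A minimal sketch of that two-phase model in the TF 1.x API: the layer sizes are arbitrary, but the build-then-run split is exactly the graph/session distinction described above.

```python
import tensorflow as tf

# Phase 1: describe the computation as a graph -- nothing runs yet
x = tf.placeholder(tf.float32, shape=[None, 3])
W = tf.Variable(tf.random_normal([3, 4]))
b = tf.Variable(tf.zeros([4]))
h = tf.nn.relu(tf.matmul(x, W) + b)

# Phase 2: a Session assigns the ops to hardware and executes the graph
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(h, feed_dict={x: [[1.0, 2.0, 3.0]]}))
```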
TF Learn
TensorFlow with a scikit-learn-style API:
- model.fit()
- model.predict()
- model.evaluate()
- model.save() and model.load()
GPU-accelerated deep learning in 8 lines of Python.
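Roughly what those "8 lines" look like; the MNIST loader and layer sizes are illustrative choices, not necessarily the ones used in the talk.

```python
import tflearn
import tflearn.datasets.mnist as mnist

# Load MNIST as flat 784-pixel vectors with one-hot labels
X, Y, testX, testY = mnist.load_data(one_hot=True)

# Define the network, then wrap it in the scikit-learn-style model
net = tflearn.input_data(shape=[None, 784])
net = tflearn.fully_connected(net, 64, activation='relu')
net = tflearn.fully_connected(net, 10, activation='softmax')
net = tflearn.regression(net, optimizer='adam', loss='categorical_crossentropy')

model = tflearn.DNN(net)
model.fit(X, Y, n_epoch=1, validation_set=(testX, testY))
print(model.evaluate(testX, testY))
```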
Workflow Demo
Are the performance advantages significant (6 hr vs. 6 min)?
- If you're training one model, no
- If you're training 6,000, yes
- How many should you train? As many as possible.

Workflow:
1. Preprocess the dataset
2. Identify candidate models
3. Implement candidate models as model generation functions (see the sketch below)
4. "Dumb" candidate evaluation
5. "Smart" candidate evaluation
See the example notebook.
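A sketch of what a model generation function plus "dumb" (exhaustive) candidate evaluation might look like in TFLearn; the make_model helper and the hyperparameter grid are hypothetical, not taken from the example notebook.

```python
import itertools
import tensorflow as tf
import tflearn

def make_model(n_layers, width, learning_rate):
    # Build one candidate network, sized by the given hyperparameters
    net = tflearn.input_data(shape=[None, 784])
    for _ in range(n_layers):
        net = tflearn.fully_connected(net, width, activation='relu')
    net = tflearn.fully_connected(net, 10, activation='softmax')
    net = tflearn.regression(net, optimizer='adam',
                             learning_rate=learning_rate,
                             loss='categorical_crossentropy')
    return tflearn.DNN(net)

# "Dumb" evaluation: exhaustively train and score every combination
for n_layers, width, lr in itertools.product([1, 2, 3], [32, 64], [0.01, 0.001]):
    tf.reset_default_graph()  # each candidate needs a fresh graph
    model = make_model(n_layers, width, lr)
    # model.fit(X, Y, ...) and model.evaluate(testX, testY) go here
```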
Now What
- Get your hands on a TensorFlow-compatible GPU (CUDA compute capability 3.0+)
- Make friends with UNIX
- Install tensorflow and tflearn (harder than you think…)
- Learn more about supported deep learning methods
- Do the quickstart tutorial: http://tflearn.org/tutorials/quickstart.html
- Explore some examples: http://tflearn.org/examples/
- Explore the documentation too (*gasp*): http://tflearn.org/optimizers/
- Follow along with some additional tutorials: Sentdex, Siraj Raval, Martin Görner