Deep Learning Software: TensorFlow
Emre Kavak
Technische Universität München, Fakultät für Informatik
München, 04.06.2017
Content
1. What is a deep learning framework?
   - How do we express machine learning models?
   - Why do TF and co. exist?
2. TensorFlow
   - Short facts
   - First steps
   - Architecture
3. TensorBoard and MNIST
1. What is a deep learning framework?
a) What is deep learning?
Recap: What is the aim of machine learning?
- Given a data set, we want to make predictions for data points that have not been observed.
- E.g.: predict the price of a building from its size, location and year of construction.
- Set up a model and test it.
- Tools: linear algebra, probability theory and optimization.
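The recap above ("set up a model and test it") can be sketched in a few lines. This is a minimal illustration, not a recommended workflow: the data is invented and perfectly linear, only one feature (size) is used, and `fit_line` and `predict` are hypothetical helper names.

```python
# Toy data, invented for illustration: size in m^2 -> price in 1000 EUR.
sizes = [30.0, 45.0, 60.0, 75.0, 90.0, 105.0]
prices = [95.0, 140.0, 185.0, 230.0, 275.0, 320.0]


def fit_line(xs, ys):
    """Ordinary least squares for y = w * x + b (closed form, one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    w = cov_xy / var_x
    b = mean_y - w * mean_x
    return w, b


def predict(w, b, x):
    """Evaluate the fitted line at x."""
    return w * x + b


# Set up the model on the first four points, test it on the two unseen ones.
train_x, train_y = sizes[:4], prices[:4]
test_x, test_y = sizes[4:], prices[4:]
w, b = fit_line(train_x, train_y)
test_error = max(abs(predict(w, b, x) - y) for x, y in zip(test_x, test_y))
print(w, b, test_error)
```

The held-out points play the role of "data points that have not been observed": the model never sees them during fitting.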
1. What is a deep learning framework?
a) What is deep learning?
- Problem: How can my computer decide whether there is a dinosaur or a cat in the picture?
- Unstructured data: what are my independent variables? What should I look for?
- Let the model learn the relevant features itself, e.g. with neural networks.
- A non-linear transformation sits between consecutive layers.
- If there are many (>2) layers, we speak of a deep network.
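The layer structure described above can be sketched as a plain forward pass. This is only an illustration under stated assumptions: the weights are arbitrary rather than learned, ReLU is chosen as the non-linearity (the slide does not name one), and `dense`, `relu` and `forward` are hypothetical helper names.

```python
def relu(vector):
    """Element-wise non-linearity: negative entries become 0."""
    return [max(0.0, v) for v in vector]


def dense(weights, bias, inputs):
    """Affine map: one output per weight row."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, bias)
    ]


def forward(layers, inputs):
    """Apply every layer in turn, with a ReLU between consecutive layers."""
    activation = inputs
    for index, (weights, bias) in enumerate(layers):
        activation = dense(weights, bias, activation)
        if index < len(layers) - 1:  # no non-linearity after the last layer
            activation = relu(activation)
    return activation


# Three layers (> 2, so "deep" in the sense of the slide); weights are arbitrary.
layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]),
    ([[1.0, 1.0], [-1.0, 1.0]], [0.0, -1.0]),
    ([[2.0, -1.0]], [0.5]),
]
scores = forward(layers, [2.0, 1.0])
print(scores)
```

Without the `relu` calls, the three affine maps would collapse into a single affine map, which is why the non-linearity between layers matters.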
1. What is a deep learning framework?
b) What is a framework?
- A platform for developing applications
- A collection of predefined tools, functionalities, ...
- Handles tasks in the 'background': do not reinvent the wheel
- Not the same as a library: a framework defines the control flow
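The difference in control flow can be sketched as follows. `MiniFramework` is a hypothetical toy and is not modeled on any real API; `mean` merely stands in for a library function.

```python
# Library style: our code owns the control flow and calls into the helper.
def mean(values):
    """Stands in for a function provided by a library."""
    return sum(values) / len(values)


def run_with_library(batches):
    results = []
    for batch in batches:  # we write the loop ourselves
        results.append(mean(batch))
    return results


# Framework style: the framework owns the loop and calls back into our code.
class MiniFramework:
    """Hypothetical toy framework, for illustration only."""

    def __init__(self):
        self._handlers = []

    def register(self, handler):
        self._handlers.append(handler)
        return handler

    def run(self, batches):
        results = []
        for batch in batches:  # the framework decides when we are called
            for handler in self._handlers:
                results.append(handler(batch))
        return results


app = MiniFramework()


@app.register
def on_batch(batch):
    return sum(batch) / len(batch)


batches = [[1.0, 3.0], [2.0, 4.0, 6.0]]
library_results = run_with_library(batches)
framework_results = app.run(batches)
print(library_results, framework_results)
```

Both variants compute the same numbers; what changes is who drives the loop.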
1. What is a deep learning framework?
How do we express machine learning models?
- General-purpose computation
- Machine learning
- Deep learning
1. What is a deep learning framework?
There are many libraries for scientific computing in Matlab or Python. Why use TensorFlow (Torch, Caffe and co.)?
- Specialization: optimizations and computation accelerators for ML and DL applications
- Easy to build complex and long computations
- Distributed execution
- Visualization (TensorBoard)
2. TensorFlow – general facts
- Developed by Google and now open source
- Mostly used for deep learning applications, but also applicable to various other computational tasks
- Supports different devices: GPU, CPU, ...
- Build and deploy large-scale applications on distributed machines with different computational units
Notes: developed by the Google Brain team; main applications in DL; can utilize CPUs, GPUs and more; large-scale application and distribution
2. TensorFlow – first steps
- Define two constants and print them:
  Result: (<tf.Tensor 'Const_1:0' shape=() dtype=float32>, <tf.Tensor 'Const_2:0' shape=() dtype=float32>)
- TensorFlow gave us tensors, but we want the constant values:
  Result: [3.0, 4.0]
Notes: Python script; define two constants and print them; create a Session and run the desired output
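The behaviour on this slide (printing yields a handle, running yields the value) can be imitated without TensorFlow. The sketch below is a pure-Python toy: `ToyTensor` and `ToySession` are hypothetical classes invented for illustration and do not reproduce the real `tf.Tensor` or `tf.Session` API.

```python
class ToyTensor:
    """Symbolic handle for a value; printing it does not reveal the value."""

    def __init__(self, name, value):
        self.name = name
        self._value = value

    def __repr__(self):
        return "<ToyTensor '%s'>" % self.name


class ToySession:
    """Evaluates handles only when run() is called explicitly."""

    def run(self, fetches):
        return [tensor._value for tensor in fetches]


node1 = ToyTensor("Const_1", 3.0)
node2 = ToyTensor("Const_2", 4.0)

printed = repr((node1, node2))     # handles, not numbers
sess = ToySession()
values = sess.run([node1, node2])  # the explicit run yields the numbers
print(printed)
print(values)
```

The point carried over from the slide is the two-step pattern: first describe, then ask a session to evaluate.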
2. TensorFlow – Architecture
Questions/observations:
- The user has to tell TensorFlow explicitly to run
- What is a tensor?
- Is there a reason for calling our constants node1 and node2?
- We used Python
Notes: the most obvious point is that we use Python; what is a tensor?; there is a reason for naming our variables nodes
2. TensorFlow – Architecture
Actors: Client
- We were just the user (client) defining our program:
    node1 = tf.constant(3.0, tf.float32)
    node2 = tf.constant(4.0)  # also tf.float32 implicitly
Notes: first understand the main actors of the TensorFlow framework, only briefly and at a high level
2. TensorFlow – User
Computational graph:
- The user defines the computation symbolically (declarative paradigm), using placeholders and symbolic representatives that are not executed directly
- Graph:
  - nodes: operations, variables, constants
  - edges: tensors (informally: multidimensional arrays that generalize vectors and matrices)
- Example: adding two nodes
Notes: here it becomes obvious why we speak of nodes and edges; symbolic/declarative programming approach; the program is a data flow graph
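A data flow graph with constants, a placeholder and an add operation can be sketched in plain Python. This is a toy under stated assumptions: `Node`, `constant`, `placeholder`, `add` and `run` are hypothetical names chosen to echo the slide, not TensorFlow's implementation, and values are plain floats rather than tensors.

```python
class Node:
    """A graph node; its inputs are the incoming edges."""

    def __init__(self, op, inputs=(), value=None, name=None):
        self.op = op  # "const", "placeholder" or "add"
        self.inputs = list(inputs)
        self.value = value
        self.name = name


def constant(value):
    return Node("const", value=value)


def placeholder(name):
    return Node("placeholder", name=name)


def add(a, b):
    return Node("add", inputs=[a, b])


def run(node, feed=None):
    """Evaluate a node by recursively evaluating the nodes it depends on."""
    feed = feed or {}
    if node.op == "const":
        return node.value
    if node.op == "placeholder":
        return feed[node.name]  # value supplied only at run time
    if node.op == "add":
        return run(node.inputs[0], feed) + run(node.inputs[1], feed)
    raise ValueError("unknown operation: %s" % node.op)


# Building the graph computes nothing yet.
node1 = constant(3.0)
node2 = constant(4.0)
node3 = add(node1, node2)
x = placeholder("x")
total = add(node3, x)

sum_of_constants = run(node3)
fed_result = run(total, {"x": 1.5})
print(sum_of_constants, fed_result)
```

Because the program is data (a graph of `Node` objects), it can be inspected, optimized or split up before anything is executed.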
2. TensorFlow – Architecture
Actors: Client, Master (Coordinator)
- The client creates a session (the bridge to the master) and calls Session.run()
- The master performs optimizations and decides, based on certain metrics, how to distribute execution
- The master orchestrates execution
Client program:
    node1 = tf.constant(3.0, tf.float32)
    node2 = tf.constant(4.0)  # also tf.float32 implicitly
TensorFlow – (distributed) Master
- Prune the computational graph
- Perform optimizations
- Distribute graph partitions across the available machines and devices
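The pruning step can be sketched as a reachability search. This sketch assumes that pruning means dropping every node the requested outputs do not depend on; the dictionary-based graph format and the `prune` function are hypothetical and only illustrate the idea.

```python
def prune(graph, fetches):
    """Keep only the nodes that the fetched outputs depend on.

    graph maps a node name to the list of its input node names.
    """
    needed = set()
    stack = list(fetches)
    while stack:
        name = stack.pop()
        if name in needed:
            continue
        needed.add(name)
        stack.extend(graph[name])  # walk backwards along the edges
    return {name: inputs for name, inputs in graph.items() if name in needed}


graph = {
    "a": [],
    "b": [],
    "c": [],
    "sum_ab": ["a", "b"],
    "unused": ["b", "c"],  # nothing we fetch depends on this node
    "out": ["sum_ab"],
}
pruned = prune(graph, ["out"])
print(sorted(pruned))
```

Here `unused` and `c` disappear, because the fetched node `out` can be computed without them.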
2. TensorFlow – Architecture
Actors: Client, Master (Coordinator), Worker
- Client: calls Session.run()
- Master (Coordinator): optimizations + (distribution)
- Worker (MacBook Pro): executes the model on its CPU and GPU
Client program:
    node1 = tf.constant(3.0, tf.float32)
    node2 = tf.constant(4.0)  # also tf.float32 implicitly
Notes: the code is going to be executed on the existing computational units
2. TensorFlow – Architecture
Actors: Client, Master (Coordinator), Workers
- Client: calls Session.run()
- Master (Coordinator)
- Worker Process 1 and Worker Process 2
[Diagram device labels: CPU1, GPU1, CPU2, GPU2, GPU, CPU]
Notes: a more general, distributed setting
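Splitting a graph across workers and devices can be sketched as grouping nodes by an assigned device. Everything here is an assumption for illustration: the placement is written by hand (the slides do not say how the master chooses it), the device names are invented, and `partition` is a hypothetical helper.

```python
def partition(graph, placement):
    """Group nodes by device and list the edges that cross device boundaries.

    graph maps a node name to its input node names;
    placement maps a node name to a device name.
    """
    parts = {}
    cross_edges = []
    for node, inputs in graph.items():
        device = placement[node]
        parts.setdefault(device, []).append(node)
        for source in inputs:
            if placement[source] != device:
                # The value must travel from one device to another.
                cross_edges.append((source, node))
    return parts, cross_edges


graph = {
    "a": [],
    "b": [],
    "sum_ab": ["a", "b"],
    "out": ["sum_ab"],
}
# Hand-written placement with invented device names.
placement = {
    "a": "worker1/cpu",
    "b": "worker1/cpu",
    "sum_ab": "worker1/gpu",
    "out": "worker2/gpu",
}
parts, cross_edges = partition(graph, placement)
print(parts)
print(cross_edges)
```

The cross-device edges are the places where data has to be moved between computational units, which is one reason a coordinator is needed.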
2. TensorFlow – User
A few operations (the table of operation categories did not survive; one recoverable category: stateful operations)
2. TensorFlow – How can Python be efficient?
- Actually, Python is not efficient
- But we only describe our computation with Python
- The implementations are handled by so-called kernels
- Kernels are pre-built code for specific devices (CPU: C++, GPU: CUDA)
- New operations need new kernels (if they cannot be composed from existing operations)
[Diagram: Python client, ..., kernel implementations (ReLU, Softmax, MatMul, ...)]
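The split between describing an operation and executing it through a device-specific kernel can be sketched with a lookup table. This is a toy under stated assumptions: `KERNELS`, `register_kernel` and `run_op` are hypothetical names, and both "kernels" are plain Python stand-ins for what would be C++ or CUDA code.

```python
KERNELS = {}  # (operation name, device) -> implementation


def register_kernel(op_name, device):
    """Decorator that files an implementation under (op_name, device)."""
    def decorator(func):
        KERNELS[(op_name, device)] = func
        return func
    return decorator


@register_kernel("relu", "cpu")
def relu_cpu(values):
    # Stand-in for a C++ CPU kernel.
    return [max(0.0, v) for v in values]


@register_kernel("relu", "gpu")
def relu_gpu(values):
    # Stand-in for a CUDA GPU kernel; here it is still plain Python.
    return [v if v > 0.0 else 0.0 for v in values]


def run_op(op_name, device, *args):
    """Look up the kernel for this operation on this device and call it."""
    try:
        kernel = KERNELS[(op_name, device)]
    except KeyError:
        raise NotImplementedError("no kernel for %s on %s" % (op_name, device))
    return kernel(*args)


cpu_result = run_op("relu", "cpu", [-1.0, 2.0])
gpu_result = run_op("relu", "gpu", [-1.0, 2.0])
print(cpu_result, gpu_result)
```

Asking for an operation without a registered kernel, e.g. `run_op("softmax", "cpu", [1.0])`, raises `NotImplementedError`, which mirrors the slide's point that new operations need new kernels.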
3. TensorFlow – TensorBoard
- Deep networks can become extremely complex
- Key feature of TensorFlow: visualize models
Case Study: MNIST with TensorFlow
- MNIST is a data set of handwritten digits
- It is the 'Hello World!' of deep learning
- Given: 55,000 training examples of 28x28 pixel images
- Each picture is flattened into a vector with 784 (28*28) dimensions
- Values lie in [0,1] and describe the pixel intensity
- One data point is a [1,784] tensor
- All examples form a [55000,784] tensor
- We also have label tensors (one-hot vectors) that let us compare our results with the actual values, e.g. [0,0,1,0,0,0,0,0,0,0] for the digit 2 and [0,0,0,0,0,0,0,0,0,1] for the digit 9
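The shapes above can be checked with a short sketch. It uses a synthetic 28x28 image instead of real MNIST data, and `flatten` and `one_hot` are hypothetical helper names.

```python
def flatten(image):
    """Turn a 2-D image (list of rows) into one flat list, row by row."""
    return [pixel for row in image for pixel in row]


def one_hot(digit, num_classes=10):
    """Label vector with a 1 at the index of the digit and 0 elsewhere."""
    vector = [0] * num_classes
    vector[digit] = 1
    return vector


# Synthetic stand-in for one MNIST picture: all black except one bright pixel.
image = [[0.0] * 28 for _ in range(28)]
image[14][14] = 1.0

flat = flatten(image)
batch = [flat, flat, flat]  # three examples -> shape [3, 784]
batch_shape = (len(batch), len(batch[0]))

label_two = one_hot(2)
label_nine = one_hot(9)
print(len(flat), batch_shape, label_two, label_nine)
```

The pixel at row 14, column 14 ends up at flat index 14 * 28 + 14, which shows that flattening keeps all values but discards the explicit 2-D layout.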