IST 597 Tensorflow Tutorial
Presented by: Ankur Mali


Numerical Computation: Overflow and Underflow

Integer arithmetic in NumPy wraps around silently: adding 1 to the largest int64 value overflows to the most negative value, and the sum of two such values comes out as -2, while the smaller 2**61 - 1 values stay comfortably in range.

>>> import numpy as np
>>> a = np.array([2**63 - 1, 2**63 - 1], dtype=int)
>>> a
array([9223372036854775807, 9223372036854775807])
>>> a.dtype
dtype('int64')
>>> a + 1
array([-9223372036854775808, -9223372036854775808])
>>> a.sum()
-2
>>> b = np.array([2**61 - 1, 2**61 - 1], dtype=int)
>>> b
array([2305843009213693951, 2305843009213693951])
>>> b + 1
array([2305843009213693952, 2305843009213693952])
>>> b.sum()
4611686018427387902
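Integer overflow wraps around, but the floating-point failure modes the slide title refers to are overflow (values rounding to inf) and underflow (tiny values rounding to 0). A minimal sketch of both, plus the usual max-subtraction trick that keeps softmax numerically stable; this example is illustrative and not from the original slides:

import numpy as np

print(np.float32(1e38) * 10)                  # inf  (overflow past the float32 range)
print(np.float32(1e-30) * np.float32(1e-30))  # 0.0  (underflow below the float32 range)

# Naive softmax overflows for large logits ...
z = np.array([1000., 1001., 1002.])
print(np.exp(z) / np.exp(z).sum())            # [nan nan nan]

# ... but subtracting max(z) leaves the result unchanged and keeps exp() in range
z_s = z - z.max()
print(np.exp(z_s) / np.exp(z_s).sum())        # approx [0.09 0.24 0.67]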

Floating Point Error

Repeatedly adding a small increment to a large value accumulates rounding error:

x = 1e9
eps = 1e-6
for _ in range(int(1e6)):
    x += eps
print(x)   # 1000000000.95, not the exact 1000000001.0

Adding and then subtracting the same eps in each iteration does not return to the starting value either; the rounding error drifts the other way:

x = 1e9
eps = 1e-6
for _ in range(int(1e6)):
    x += eps
    x -= eps
print(x)   # 999999999.046, not the exact 1000000000.0
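One standard way to reduce this drift, not covered on the original slide, is compensated summation. A minimal Kahan-summation sketch that accumulates the same one million increments while tracking the lost low-order bits in a separate compensation term:

x = 1e9
eps = 1e-6
c = 0.0                    # compensation: running estimate of lost low-order bits
for _ in range(int(1e6)):
    y = eps - c            # correct the increment by what was lost previously
    t = x + y              # big + small: low-order bits of y are lost here
    c = (t - x) - y        # recover exactly the amount that was lost
    x = t
print(x)                   # very close to the exact answer 1000000001.0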

Stochastic Gradient Descent

[Figure: SGD optimization on loss surface contours (source: ruder.io)]

Stochastic Gradient Descent in Tensorflow

# assumes: import tensorflow as tf
def sgd(cost, params, lr=0.002):
    # one vanilla gradient-descent step per parameter: param <- param - lr * grad
    g_params = tf.gradients(cost, params)
    updates = []
    for param, g_param in zip(params, g_params):
        updates.append(param.assign(param - lr * g_param))
    return updates
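A minimal usage sketch for the helper above (cost, params, feed, and num_steps are placeholders for whatever graph and data you have built): sgd() returns a list of assign ops, so one training step is simply running those ops in a session.

update_ops = sgd(cost, params, lr=0.002)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for step in range(num_steps):
        # evaluating the assign ops applies one gradient-descent step
        loss_value, _ = sess.run([cost, update_ops], feed_dict=feed)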

SGD with Gradient Clipping

def sgd_clip(cost, params, lr=0.002, thld=0.99989):
    g_params = tf.gradients(cost, params)
    updates = []
    for param, g_param in zip(params, g_params):
        # where |g| exceeds the threshold, rescale g by thld / ||g||_1
        g_param = tf.where(tf.greater(tf.abs(g_param), thld),
                           thld / tf.norm(g_param, ord=1) * g_param,
                           g_param)
        updates.append(param.assign(param - lr * g_param))
    return updates
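For comparison (not on the original slide), TensorFlow 1.x also ships ready-made clipping ops. A hedged sketch of the same idea using tf.clip_by_global_norm, which rescales the whole gradient list when its combined norm exceeds a threshold, whereas the tf.where version above rescales element-wise within each gradient tensor:

g_params = tf.gradients(cost, params)
clipped, _ = tf.clip_by_global_norm(g_params, clip_norm=1.0)
updates = [p.assign(p - 0.002 * g) for p, g in zip(params, clipped)]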

Momentum

# assumes: import numpy as np, import tensorflow as tf
def momentum(cost, params, lr=0.003, gamma=0.900015):
    g_params = tf.gradients(cost, params)
    updates = []
    for param, g_param in zip(params, g_params):
        # velocity accumulator, one per parameter
        v = tf.Variable(np.zeros(param.get_shape().as_list(), dtype='float32'), name='v')
        updates.append(v.assign(gamma * v - lr * g_param))
        # apply the parameter update only after the velocity has been refreshed
        with tf.control_dependencies(updates):
            updates.append(param.assign(param + v))
    return updates

RMSProp

def rmsprop(cost, params, lr=0.002, gamma=0.900078019, eps=1e-8):
    g_params = tf.gradients(cost, params)
    updates = []
    for param, g_param in zip(params, g_params):
        # running mean of squared gradients
        ms_g = tf.Variable(np.zeros(param.get_shape().as_list(), dtype='float32'), name='ms_g')
        updates.append(ms_g.assign(gamma * ms_g + (1. - gamma) * g_param**2))
        # scale the step by the root of the running mean
        with tf.control_dependencies(updates):
            updates.append(param.assign(param - lr / tf.sqrt(ms_g + eps) * g_param))
    return updates

Adam

def adam(cost, params, alpha=0.002, beta_1=0.900005000178, beta_2=0.998999912, eps=1e-8):
    g_params = tf.gradients(cost, params)
    t = tf.Variable(0.0, dtype=tf.float32, name='t')
    updates = []
    updates.append(t.assign(t + 1))
    with tf.control_dependencies(updates):
        for param, g_param in zip(params, g_params):
            # first- and second-moment estimates
            m = tf.Variable(np.zeros(param.get_shape().as_list(), dtype='float32'), name='m')
            v = tf.Variable(np.zeros(param.get_shape().as_list(), dtype='float32'), name='v')
            # bias-corrected step size
            alpha_t = alpha * tf.sqrt(1. - beta_2**t) / (1. - beta_1**t)
            updates.append(m.assign(beta_1 * m + (1. - beta_1) * g_param))
            updates.append(v.assign(beta_2 * v + (1. - beta_2) * g_param**2))
            updates.append(param.assign(param - alpha_t * m / (tf.sqrt(v) + eps)))
    return updates

See also: "Improving Generalization Performance by Switching from Adam to SGD" (Keskar & Socher, 2017).
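For reference, TensorFlow 1.x provides built-in optimizers implementing the same update rules as the hand-rolled versions above. A minimal sketch (the hyperparameter values here are illustrative, not the ones used in this tutorial):

opt = tf.train.AdamOptimizer(learning_rate=0.002, beta1=0.9, beta2=0.999, epsilon=1e-8)
train_op = opt.minimize(cost)
# Likewise: tf.train.GradientDescentOptimizer, tf.train.MomentumOptimizer,
# and tf.train.RMSPropOptimizer cover the SGD, Momentum, and RMSProp rules.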

Linear Regression in Tensorflow

Part 1: Import libraries and define parameters for training

from __future__ import print_function

import tensorflow as tf
import numpy
import matplotlib.pyplot as plt
rng = numpy.random

# Parameters
learning_rate = 0.01
training_epochs = 1000
display_step = 50

Part 2: Training data, graph inputs, and model weights

# Training Data
train_X = numpy.asarray([3.3, 4.4, 5.5, 6.71, 6.93, 4.168, 9.779, 6.182, 7.59, 2.167,
                         7.042, 10.791, 5.313, 7.997, 5.654, 9.27, 3.1])
train_Y = numpy.asarray([1.7, 2.76, 2.09, 3.19, 1.694, 1.573, 3.366, 2.596, 2.53, 1.221,
                         2.827, 3.465, 1.65, 2.904, 2.42, 2.94, 1.3])
n_samples = train_X.shape[0]

# tf Graph Input
X = tf.placeholder("float")
Y = tf.placeholder("float")

# Set model weights
W = tf.Variable(rng.randn(), name="weight")
b = tf.Variable(rng.randn(), name="bias")

Part 3: Model, cost, and optimizer

# Construct a linear model
pred = tf.add(tf.multiply(X, W), b)

# Mean squared error
cost = tf.reduce_sum(tf.pow(pred - Y, 2)) / (2 * n_samples)

# Gradient descent
# Note: minimize() knows to modify W and b because Variable objects are trainable=True by default
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)

# Initialize the variables (i.e. assign their default values)
init = tf.global_variables_initializer()

Part 4: Training loop

# Start training
with tf.Session() as sess:
    # Run the initializer
    sess.run(init)

    # Fit all training data
    for epoch in range(training_epochs):
        for (x, y) in zip(train_X, train_Y):
            sess.run(optimizer, feed_dict={X: x, Y: y})

        # Display logs per epoch step
        if (epoch + 1) % display_step == 0:
            c = sess.run(cost, feed_dict={X: train_X, Y: train_Y})
            print("Epoch:", '%04d' % (epoch + 1), "cost=", "{:.9f}".format(c),
                  "W=", sess.run(W), "b=", sess.run(b))

    print("Optimization Finished!")
    training_cost = sess.run(cost, feed_dict={X: train_X, Y: train_Y})
    print("Training cost=", training_cost, "W=", sess.run(W), "b=", sess.run(b), '\n')
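The inner loop above feeds one (x, y) pair at a time, i.e. stochastic gradient descent with batch size 1. A minimal variant sketch, not part of the original tutorial: because X and Y are placeholders with no fixed shape, the same graph also accepts the whole training set in one call, which turns the per-example loop into full-batch gradient descent.

    # replaces the per-example inner loop, inside the same Session block
    for epoch in range(training_epochs):
        sess.run(optimizer, feed_dict={X: train_X, Y: train_Y})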

Part 5: Display and test the model (continues inside the Session block from Part 4)

    # Graphic display
    plt.plot(train_X, train_Y, 'ro', label='Original data')
    plt.plot(train_X, sess.run(W) * train_X + sess.run(b), label='Fitted line')
    plt.legend()
    plt.show()

    # Testing example, as requested (Issue #2)
    test_X = numpy.asarray([6.83, 4.668, 8.9, 7.91, 5.7, 8.7, 3.1, 2.1])
    test_Y = numpy.asarray([1.84, 2.273, 3.2, 2.831, 2.92, 3.24, 1.35, 1.03])

    print("Testing... (Mean square loss Comparison)")
    testing_cost = sess.run(
        tf.reduce_sum(tf.pow(pred - Y, 2)) / (2 * test_X.shape[0]),
        feed_dict={X: test_X, Y: test_Y})  # same function as cost above
    print("Testing cost=", testing_cost)
    print("Absolute mean square loss difference:", abs(training_cost - testing_cost))

    plt.plot(test_X, test_Y, 'bo', label='Testing data')

Output

[Figure: scatter plot of the training data with the fitted regression line produced by the model above]

Second Order Optimization

Part 1: Data, placeholders, and model parameters

data_x = [0., 1., 2.]
data_y = [-1., 1., 3.]
batch_size = len(data_x)

x = tf.placeholder(shape=[batch_size], dtype=tf.float32, name="x")
y = tf.placeholder(shape=[batch_size], dtype=tf.float32, name="y")

W = tf.Variable(tf.ones(shape=[1]), dtype=tf.float32, name="W")
b = tf.Variable(tf.zeros(shape=[1]), dtype=tf.float32, name="b")

Part 2: Model, loss, gradients, and Hessians

pred = x * W + b
loss = tf.reduce_mean(0.5 * (y - pred)**2)

# Preprocessing for the weight update
wrt_variables = [W, b]
grads = tf.gradients(loss, wrt_variables)
hess = tf.hessians(loss, wrt_variables)
inv_hess = [tf.matrix_inverse(h) for h in hess]

Part 3: Second-order weight update rule

# 2nd order weight update rule
update_directions = [
    -tf.reduce_sum(h) * g
    for h, g in zip(inv_hess, grads)
]
op_apply_updates = [
    v.assign_add(up)
    for v, up in zip(wrt_variables, update_directions)
]
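For context (this note is not on the original slides), the update above is a per-variable Newton step. The general Newton update is

\theta_{t+1} = \theta_t - \left[\nabla^2_\theta L(\theta_t)\right]^{-1} \nabla_\theta L(\theta_t)

Here W and b are each 1-element variables, so every Hessian returned by tf.hessians is a 1x1 matrix; tf.reduce_sum simply extracts its single entry, and the matrix-vector product collapses to a scalar multiply. Treating W and b separately ignores the cross term of the joint Hessian, so this is a block-diagonal approximation of a full Newton step.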

Part 4: Session setup and initial loss

sess = tf.Session()
sess.run(tf.global_variables_initializer())

# First loss
initial_loss = sess.run(
    loss,
    feed_dict={
        x: data_x,
        y: data_y
    }
)
print("Initial loss:", initial_loss)

Part 5: Run the updates and check the result

for iteration in range(100):
    new_loss, _ = sess.run(
        [loss, op_apply_updates],
        feed_dict={
            x: data_x,
            y: data_y
        }
    )
    print("Loss after iteration {}: {}".format(iteration, new_loss))

# Results:
print("Prediction:", sess.run(pred, feed_dict={x: data_x}))
print("Expected:", data_y)

Sample output:
Prediction: [-0.99999994  1.          3.        ]
Expected: [-1.0, 1.0, 3.0]

The data points (0, -1), (1, 1), (2, 3) lie exactly on the line y = 2x - 1, and the per-variable Newton updates drive the loss to (numerically) zero, so the final prediction matches the targets up to float32 precision.

Questions?