Matteo Fischetti, University of Padova

Slides:

Advertisements

Similar presentations

CSC321: 2011 Introduction to Neural Networks and Machine Learning Lecture 7: Learning in recurrent networks Geoffrey Hinton.

Advertisements

Support Vector Machines

CSCI 347 / CS 4206: Data Mining Module 07: Implementations Topic 03: Linear Models.

CS771 Machine Learning : Tools, Techniques & Application Gaurav Krishna Y Harshit Maheshwari Pulkit Jain Sayantan Marik

Machine Learning Neural Networks

Lecture 14 – Neural Networks

Neural Networks. R & G Chapter Feed-Forward Neural Networks otherwise known as The Multi-layer Perceptron or The Back-Propagation Neural Network.

Neural Networks Chapter Feed-Forward Neural Networks.

A Decomposition Heuristic for Stochastic Programming Natashia Boland +, Matteo Fischetti*, Michele Monaci*, Martin Savelsbergh + + Georgia Institute of.

Artificial Neural Networks (ANN). Output Y is 1 if at least two of the three inputs are equal to 1.

Appendix B: An Example of Back-propagation algorithm

Thinning out Steiner Trees

An informal description of artificial neural networks John MacCormick.

Machine Learning Using Support Vector Machines (Paper Review) Presented to: Prof. Dr. Mohamed Batouche Prepared By: Asma B. Al-Saleh Amani A. Al-Ajlan.

Yang, Luyu.  Postal service for sorting mails by the postal code written on the envelop  Bank system for processing checks by reading the amount of.

Dünaamiliste süsteemide modelleerimine Identification for control in a nonlinear system world Eduard Petlenkov, Automaatikainstituut, 2013.

Handwritten digit recognition

Integer LP In-class Prob

Analysis of Classification Algorithms In Handwritten Digit Recognition Logan Helms Jon Daniele.

Supervised Machine Learning: Classification Techniques Chaleece Sandberg Chris Bradley Kyle Walsh.

Intersection Cuts for Bilevel Optimization

CSC321: Introduction to Neural Networks and Machine Learning Lecture 23: Linear Support Vector Machines Geoffrey Hinton.

Essential components of the implementation are:  Formation of the network and weight initialization routine  Pixel analysis of images for symbol detection.

Neural Networks Lecture 11: Learning in recurrent networks Geoffrey Hinton.

1 Convolutional neural networks Abin - Roozgard. 2  Introduction  Drawbacks of previous neural networks  Convolutional neural networks  LeNet 5 

Combining Models Foundations of Algorithms and Machine Learning (CS60020), IIT KGP, 2017: Indrajit Bhattacharya.

Neural networks and support vector machines

Handwritten Digit Recognition Using Stacked Autoencoders

Neural Network Architecture Session 2

Learning Deep Generative Models by Ruslan Salakhutdinov

From Mixed-Integer Linear to Mixed-Integer Bilevel Linear Programming

Artificial Neural Networks

Exact Algorithms for Mixed-Integer Bilevel Linear Programming

Role and Potential of TAs for Industrial Scheduling Problems

Applications of Deep Learning and how to get started with implementation of deep learning Presentation By : Manaswi Advisor : Dr.Chinmay.

Neural Networks CS 446 Machine Learning.

Classification with Perceptrons Reading:

Matteo Fischetti, University of Padova

Intelligent Information System Lab

Understanding the Difficulty of Training Deep Feedforward Neural Networks Qiyue Wang Oct 27, 2017.

Random walk initialization for training very deep feedforward networks

Deep Learning and Mixed Integer Optimization

#post_modern Branch-and-Cut Implementation

Non-linear hypotheses

Neural Networks Advantages Criticism

Research Interests.

Recurrent Neural Networks

Introduction to Deep Learning with Keras

A Kaggle Project By Ryan Bambrough

network of simple neuron-like computing elements

8-3 RRAM Based Convolutional Neural Networks for High Accuracy Pattern Recognition and Online Learning Tasks Z. Dong, Z. Zhou, Z.F. Li, C. Liu, Y.N. Jiang,

Convolutional neural networks Abin - Roozgard.

Emre O. Neftci iScience Volume 5, Pages (July 2018) DOI: /j.isci

Matteo Fischenders, University of Padova

CSC 578 Neural Networks and Deep Learning

Optimization for Fully Connected Neural Network for FPGA application

Deep Learning and Mixed Integer Optimization

On Convolutional Neural Network

Matteo Fischetti, University of Padova

Autoencoders hi shea autoencoders Sys-AI.

CSC321: Neural Networks Lecture 11: Learning in recurrent networks

Intersection Cuts from Bilinear Disjunctions

Introduction to Neural Networks

Deep Neural Networks as Scientific Models

Modeling IDS using hybrid intelligent systems

EE 193/Comp 150 Computing with Biological Parts

CSC 578 Neural Networks and Deep Learning

Intersection Cuts for Quadratic Mixed-Integer Optimization

Presentation transcript:

Deep Neural Networks as 0-1 Mixed Integer Linear Programs: A feasibility study Matteo Fischetti, University of Padova Jason Jo, Montreal Institute for Learning Algorithms (MILA) CPAIOR 2018, Delft, June 2018

Machine Learning Example (MIPpers only!): Continuous 0-1 Knapack Problem with a fixed n. of items CPAIOR 2018, Delft, June 2018

Implementing the ? in the box CPAIOR 2018, Delft, June 2018

Implementing the ? in the box #differentiable_programming (Yann LeCun) CPAIOR 2018, Delft, June 2018

Deep Neural Networks (DNNs) Parameters w’s are organized in a layered feed-forward network (DAG = Directed Acyclic Graph) Each node (or “neuron”) makes a weighted sum of the outputs of the previous layer  no “flow splitting/conservation” here! CPAIOR 2018, Delft, June 2018

Role of nonlinearities We want to be able to play with a huge n. of parameters, but if everything stays linear we actually have n+1 parameters only  we need nonlinearities somewhere! Zooming into neurons we see the nonlinear “activation functions” Each neuron acts as a linear SVM, however … … its output is not interpreted immediately … … but it becomes a new feature … #automatic_feature_detection … to be forwarded to the next layer for further analysis #SVMcascade CPAIOR 2018, Delft, June 2018

Modeling a DNN with fixed param.s Assume all the parameters (weights/biases) of the DNN are fixed We want to model the computation that produces the output value(s) as a function of the inputs, using a MINLP #MIPpersToTheBone Each hidden node corresponds to a summation followed by a nonlinear activation function CPAIOR 2018, Delft, June 2018

Modeling ReLU activations Recent work on DNNs almost invariably only use ReLU activations Easily modeled as plus the bilinear condition or, alternatively, the indicator constraints CPAIOR 2018, Delft, June 2018

A complete 0-1 MILP CPAIOR 2018, Delft, June 2018

Adversarial problem: trick the DNN … CPAIOR 2018, Delft, June 2018

… by changing few well-chosen pixels CPAIOR 2018, Delft, June 2018

Experiments on small DNNs The MNIST database (Modified National Institute of Standards and Technology database) is a large database of handwritten digits that is commonly used for training various image processing systems We considered the following (small) DNNs and trained each of them to get a fair accuracy (93-96%) on the test-set CPAIOR 2018, Delft, June 2018

Computational experiments Instances: 100 MNIST training figures (each with its “true” label 0..9) Goal: Change some of the 28x28 input pixels (real values in 0-1) to convert the true label d into (d + 5) mod 10 (e.g., “0”  “5”, “6”  “1”) Metric: L1 norm (sum of the abs. differences original-modified pixels) MILP solver: IBM ILOG CPLEX 12.7 (as black box) Basic model: only obvious bounds on the continuous var.s Improved model: apply a MILP-based preprocessing to compute tight lower/upper bounds on all the continuous variables, as in P. Belotti, P. Bonami, M. Fischetti, A. Lodi, M. Monaci, A. Nogales-Gomez, and D. Salvagnin. On handling indicator constraints in mixed integer programming. Computational Optimization and Applications, (65):545–566, 2016. CPAIOR 2018, Delft, June 2018

Differences between the two models CPAIOR 2018, Delft, June 2018

Effect of bound-tightening preproc. CPAIOR 2018, Delft, June 2018

Reaching 1% optimality CPAIOR 2018, Delft, June 2018

Thanks for your attention! Slides available at http://www.dei.unipd.it/~fisch/papers/slides/ Paper: M. Fischetti, J. Jo, "Deep Neural Networks as 0-1 Mixed Integer Linear Programs: A Feasibility Study", Constraints 23(3), 296-309, 2018. . CPAIOR 2018, Delft, June 2018