Deep Neural Networks as 0-1 Mixed Integer Linear Programs: A feasibility study
Matteo Fischetti, University of Padova
Jason Jo, Montreal Institute for Learning Algorithms (MILA)
CPAIOR 2018, Delft, June 2018

Machine Learning Example (MIPpers only!): Continuous 0-1 Knapsack Problem with a fixed number of items
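For reference (a standard formulation, added here because the slide's own figures are not in the transcript): with n items, profits p_j, weights w_j, and capacity C, the continuous 0-1 knapsack problem is the LP

    \max \sum_{j=1}^{n} p_j x_j \quad \text{s.t.} \quad \sum_{j=1}^{n} w_j x_j \le C, \qquad 0 \le x_j \le 1 \;\; (j = 1, \dots, n)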

Implementing the ? in the box

Implementing the ? in the box #differentiable_programming (Yann LeCun)

Deep Neural Networks (DNNs) The parameters w are organized in a layered feed-forward network (DAG = Directed Acyclic Graph). Each node (or “neuron”) makes a weighted sum of the outputs of the previous layer → no “flow splitting/conservation” here!
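In formulas (the notation here is ours, chosen for illustration): neuron j of layer k computes

    x_j^k = \sigma\Big( \sum_i w_{ij}^{k-1} \, x_i^{k-1} + b_j^{k-1} \Big)

where \sigma is the activation function and the w's and b's are the trained weights and biases.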

Role of nonlinearities We want to be able to play with a huge number of parameters, but if everything stays linear we actually have only n+1 parameters → we need nonlinearities somewhere! Zooming into neurons we see the nonlinear “activation functions”. Each neuron acts as a linear SVM; however, its output is not interpreted immediately but becomes a new feature (#automatic_feature_detection) to be forwarded to the next layer for further analysis (#SVMcascade).

Modeling a DNN with fixed parameters Assume all the parameters (weights/biases) of the DNN are fixed. We want to model the computation that produces the output value(s) as a function of the inputs, using a MINLP #MIPpersToTheBone. Each hidden node corresponds to a summation followed by a nonlinear activation function.

Modeling ReLU activations Recent work on DNNs almost invariably uses only ReLU activations. A ReLU is easily modeled as a linear equation plus a bilinear (complementarity) condition or, alternatively, via indicator constraints; see the reconstruction below.
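The formulas on this slide were images; the following is a reconstruction along the lines of the companion Fischetti-Jo paper. Writing a unit as x = ReLU(w^T y + b), where y is the previous-layer output vector, one imposes

    w^\top y + b = x - s, \qquad x \ge 0, \; s \ge 0

plus the bilinear condition

    x \, s = 0 \quad (\text{equivalently } x\,s \le 0)

or, alternatively, the indicator constraints with a binary activation variable z:

    z = 1 \Rightarrow x \le 0, \qquad z = 0 \Rightarrow s \le 0, \qquad z \in \{0,1\}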

A complete 0-1 MILP
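The model on this slide was also an image; a reconstruction along the lines of the paper (the objective coefficients c_j^k and the variable bounds are instance-dependent):

    \min \; \sum_{k=0}^{K} \sum_{j} c_j^k \, x_j^k
    \text{s.t.} \; \sum_i w_{ij}^{k-1} x_i^{k-1} + b_j^{k-1} = x_j^k - s_j^k \qquad (k = 1, \dots, K; \; \text{all } j)
    x_j^k \ge 0, \quad s_j^k \ge 0, \quad z_j^k \in \{0,1\}
    z_j^k = 1 \Rightarrow x_j^k \le 0, \qquad z_j^k = 0 \Rightarrow s_j^k \le 0
    lb_j^0 \le x_j^0 \le ub_j^0 \quad (\text{bounds on the input layer})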

Adversarial problem: trick the DNN …

… by changing a few well-chosen pixels
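In MILP terms this can be sketched as follows (the L1 linearization and the margin \epsilon are our illustrative choices; the exact dominance constraint used in the paper may differ): given an image \bar{x} with true label d and target label d' = (d+5) \bmod 10,

    \min \sum_j t_j \quad \text{s.t.} \quad t_j \ge x_j^0 - \bar{x}_j, \;\; t_j \ge \bar{x}_j - x_j^0,
    x_{d'}^K \ge (1+\epsilon)\, x_j^K \quad (j \ne d'),

plus the DNN constraints above.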

Experiments on small DNNs The MNIST database (Modified National Institute of Standards and Technology database) is a large database of handwritten digits that is commonly used for training various image processing systems. We considered the following (small) DNNs and trained each of them to get a fair accuracy (93-96%) on the test set.

Computational experiments
Instances: 100 MNIST training figures (each with its “true” label 0..9)
Goal: change some of the 28x28 input pixels (real values in 0-1) to convert the true label d into (d + 5) mod 10 (e.g., “0” → “5”, “6” → “1”)
Metric: L1 norm (sum of the absolute differences between original and modified pixels)
MILP solver: IBM ILOG CPLEX 12.7 (as a black box)
Basic model: only obvious bounds on the continuous variables
Improved model: apply a MILP-based preprocessing to compute tight lower/upper bounds on all the continuous variables (a code sketch follows below), as in P. Belotti, P. Bonami, M. Fischetti, A. Lodi, M. Monaci, A. Nogales-Gomez, D. Salvagnin, “On handling indicator constraints in mixed integer programming”, Computational Optimization and Applications 65:545-566, 2016.
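A minimal sketch of the two modeling ingredients, written here with gurobipy for illustration (the talk used CPLEX 12.7; the helper names, the fixed upper bound ub, and the encoding details are our assumptions, not the authors' code):

    # Sketch only: encodes a fixed ReLU network as a 0-1 MILP with indicator
    # constraints, then tightens variable bounds by optimization-based
    # preprocessing (cf. Belotti et al. 2016). Not the authors' code.
    import gurobipy as gp
    from gurobipy import GRB

    def build_dnn_milp(Ws, bs, ub=100.0):
        """Ws[k], bs[k]: numpy weight matrix / bias vector of layer k."""
        m = gp.Model("dnn_milp")
        # input layer: 28x28 pixels, each a continuous variable in [0, 1]
        x_prev = m.addVars(Ws[0].shape[1], lb=0.0, ub=1.0, name="x0")
        layers = [x_prev]
        for k, (W, b) in enumerate(zip(Ws, bs), start=1):
            x = m.addVars(W.shape[0], lb=0.0, ub=ub, name=f"x{k}")  # ReLU output
            s = m.addVars(W.shape[0], lb=0.0, ub=ub, name=f"s{k}")  # negative part
            z = m.addVars(W.shape[0], vtype=GRB.BINARY, name=f"z{k}")
            for j in range(W.shape[0]):
                pre = gp.quicksum(W[j, i] * x_prev[i]
                                  for i in range(W.shape[1])) + b[j]
                m.addConstr(pre == x[j] - s[j])  # linear part of the ReLU
                m.addGenConstrIndicator(z[j], True, x[j] <= 0.0)   # z=1 -> unit off
                m.addGenConstrIndicator(z[j], False, s[j] <= 0.0)  # z=0 -> unit on
            layers.append(x)
            x_prev = x
        return m, layers

    def tighten_bounds(m, layers):
        """'Improved model': shrink each unit's bounds by maximizing and
        minimizing it subject to the current model, layer by layer."""
        m.Params.OutputFlag = 0
        for x in layers[1:]:
            for v in x.values():
                for sense in (GRB.MAXIMIZE, GRB.MINIMIZE):
                    m.setObjective(v, sense)
                    m.optimize()
                    if m.Status == GRB.OPTIMAL:
                        if sense == GRB.MAXIMIZE:
                            v.UB = min(v.UB, m.ObjVal)
                        else:
                            v.LB = max(v.LB, m.ObjVal)

Tightened bounds computed on the lower layers make the indicator/big-M relaxation of the upper layers much stronger, which is the point of the "improved model" comparison on the next slides.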

Differences between the two models

Effect of bound-tightening preprocessing

Reaching 1% optimality

Thanks for your attention! Slides available at http://www.dei.unipd.it/~fisch/papers/slides/ Paper: M. Fischetti, J. Jo, “Deep Neural Networks as 0-1 Mixed Integer Linear Programs: A Feasibility Study”, Constraints 23(3), 296-309, 2018.