Data Mining, Neural Network and Genetic Programming

Data Mining, Neural Network and Genetic Programming: Deep Learning and Transfer Learning for Object Recognition. Yi Mei (Yi.mei@ecs.vuw.ac.nz)

Outline: Object Recognition; Deep Learning Overview; Recipes of Deep Learning; Automated ANN Architecture Design; Transfer Learning (DLID, DeCAF, DAN).

Object Recognition. "Object recognition" usually refers to object classification, but sometimes it refers to the whole procedure of finding objects in large pictures; "object detection" likewise has several meanings or interpretations. The task family includes object classification, object localisation, one-class object detection, and multi-class object detection.

Methods for Object Detection. Neural methods (classification only, or classification and localisation): feed-forward networks, shared-weight networks, SOMs, high-order networks, deep learning, and more. Genetic methods (classification only, or classification and localisation): genetic algorithms and genetic programming.

Deep Learning. Machine learning algorithms based on learning multiple levels of representation/abstraction. (Figure: I. Goodfellow)

Deep Learning. Has been successful in many areas: object recognition, object detection, speech recognition, natural language processing, …

Deep Learning. LeNet: 7 layers [LeCun et al. 1998], alternating convolution and subsampling (max-pooling) layers.
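For concreteness, a LeNet-style network fits in a few lines. The PyTorch sketch below is an assumed re-implementation, not the original code; layer sizes follow LeNet-5 for 32x32 inputs, and max-pooling is used as on the slide (LeNet-5 itself used an average-pooling-like subsampling):

    import torch.nn as nn

    # LeNet-style 7-layer CNN (illustrative sketch)
    lenet = nn.Sequential(
        nn.Conv2d(1, 6, kernel_size=5), nn.Tanh(),   # C1: convolution
        nn.MaxPool2d(2),                             # S2: subsampling
        nn.Conv2d(6, 16, kernel_size=5), nn.Tanh(),  # C3: convolution
        nn.MaxPool2d(2),                             # S4: subsampling
        nn.Flatten(),
        nn.Linear(16 * 5 * 5, 120), nn.Tanh(),       # C5
        nn.Linear(120, 84), nn.Tanh(),               # F6
        nn.Linear(84, 10),                           # output layer
    )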

Deep Learning. Subsampling does not change the identity of the object, but over-subsampling discards too much detail.

Deep Learning. AlexNet: 8 layers [Krizhevsky et al. 2012]. Similar to LeNet but a bigger model (7 hidden layers, 650k units, 60M parameters). Error: 15.315% in the ImageNet 2012 challenge (1st place). (Source: image-net.org)

Deep Learning. VGGNet: 19 layers [Simonyan and Zisserman 2014]. Error: 7.32% in the ImageNet 2014 challenge.

Deep Learning. GoogLeNet: 22 layers [Szegedy et al. 2014]. Error: 6.67% in the ImageNet 2014 challenge (1st place).

Deep Learning. ResNet: 152 layers [He et al. 2015]. Error: 3.57% in the ImageNet 2015 challenge (1st place).

Automated CNN Architecture Design. Manually designing a CNN architecture requires a lot of domain knowledge and trial and error. Genetic programming can be used to automatically evolve an architecture.

Automated CNN Architecture Design. Cartesian GP (CGP).

Automated CNN Architecture Design. Functions (operators): ConvBlock (stride 1 with padding, keeps the input size); ResBlock (stride 1 with padding, keeps the input size); max (or average) pooling (2x2 filter, stride 2); summation (element-wise addition). Constraint on the architecture: only two feature maps with the same size may be summed. A sketch of how such a genome can be decoded follows below.
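To make the encoding concrete, here is a hedged sketch of decoding a Cartesian GP genome into an architecture. The function names follow the slide, but the genome format is an illustrative assumption, not the paper's exact representation:

    # Function set from the slide; genome encoding below is illustrative.
    FUNCTIONS = ["ConvBlock", "ResBlock", "MaxPool", "AvgPool", "Sum"]

    # Each gene: (function index, input node ids); node 0 is the input image.
    genome = [
        (0, [0]),      # node 1: ConvBlock(input)
        (1, [1]),      # node 2: ResBlock(node 1)
        (0, [1]),      # node 3: ConvBlock(node 1)
        (4, [2, 3]),   # node 4: Sum(node 2, node 3); same-size inputs only
    ]

    def decode(genome):
        # Emit one line per node; in real CGP only the nodes reachable from
        # the output are active, which lets a fixed-length genome encode
        # networks of different depths.
        lines = []
        for i, (f, inputs) in enumerate(genome, start=1):
            args = ", ".join("node%d" % j for j in inputs)
            lines.append("node%d = %s(%s)" % (i, FUNCTIONS[f], args))
        return "\n".join(lines)

    print(decode(genome))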

Automated CNN Architecture Design. Results: the evolved ConvSet architectures performed much better than VGG, and the evolved ResSet architectures much better than ResNet.

Automated CNN Architecture Design. Using ResSet, much simpler architectures can be evolved.

Why is Deep Learning Hard? The vanishing gradient: as errors are backpropagated through many layers, the gradient signal shrinks, so the early layers learn very slowly.
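A toy calculation (not from the slides) shows why: with sigmoid units, each layer multiplies the backpropagated gradient by sigma'(z) <= 0.25, so the signal reaching the early layers shrinks geometrically with depth:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    grad = 1.0
    for layer in range(20):
        z = 0.0                                 # best case: sigma'(0) = 0.25
        grad *= sigmoid(z) * (1 - sigmoid(z))   # chain rule through one layer
    print(grad)                                 # 0.25**20, about 9.1e-13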

Recipes of Deep Learning: mini-batch (online learning); a proper loss function: cross entropy; a new activation function: ReLU; adaptive learning rate; regularisation (weight decay); dropout; data augmentation.

Mini-batch. Offline learning updates the weights once after using all the training examples in one epoch. Online learning updates the weights after each training example. Mini-batch learning splits the training examples into a number of batches and updates the weights after each batch, e.g. 100 examples per mini-batch. Online and offline learning are the two extremes of this spectrum.
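A minimal NumPy sketch of the schedule, assuming a linear model with squared error (the data and model here are hypothetical):

    import numpy as np

    rng = np.random.default_rng(0)
    X, y = rng.normal(size=(1000, 5)), rng.normal(size=1000)
    w, lr, batch_size = np.zeros(5), 0.01, 100     # 100 examples per mini-batch

    for epoch in range(10):
        idx = rng.permutation(len(X))              # reshuffle every epoch
        for start in range(0, len(X), batch_size):
            b = idx[start:start + batch_size]      # one mini-batch
            grad = 2 * X[b].T @ (X[b] @ w - y[b]) / len(b)
            w -= lr * grad                         # update once per batch
    # batch_size = 1 gives online learning; batch_size = len(X) gives
    # offline (full-batch) learning.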

Mini-batch. The objective function (error) changes from one batch to another, so we are not minimising the true error exactly; in practice this often yields better performance (cf. cross-validation).

Proper Loss Function. For a 2-layer network (1 hidden layer), cross entropy produces much larger gradients than squared error when the prediction is far from the target, so training escapes the flat region of the error surface faster.
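The difference is easy to see on a single sigmoid output unit (an assumed simplification of the slide's 2-layer network): when the prediction is badly wrong, the squared-error gradient is killed by the saturated sigmoid, while the cross-entropy gradient stays large:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    z, t = -10.0, 1.0            # output unit is badly wrong: y ~ 0, target 1
    y = sigmoid(z)

    # Squared error E = (y - t)^2 / 2: dE/dz = (y - t) * sigma'(z),
    # and sigma'(z) ~ 0 when saturated, so learning stalls.
    grad_squared = (y - t) * y * (1 - y)

    # Cross entropy E = -t*log(y) - (1-t)*log(1-y): the sigma'(z) factor
    # cancels, leaving dE/dz = y - t, which stays large.
    grad_cross_entropy = y - t

    print(grad_squared, grad_cross_entropy)   # ~ -4.5e-05 vs ~ -1.0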

Rectified Linear Unit (ReLU). Fast to compute; the gradient is a fixed 1 when the input is positive; no learning (zero gradient) when the input is negative.
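A sketch of the unit and its gradient, in plain NumPy for illustration:

    import numpy as np

    def relu(z):
        # max(0, z), elementwise: cheap compared with exp-based activations
        return np.maximum(0.0, z)

    def relu_grad(z):
        # Gradient is exactly 1 for positive inputs, so it never shrinks the
        # backpropagated signal; it is 0 for negative inputs (no learning).
        return (z > 0).astype(float)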

Adaptive Learning Rate NN performance heavily relies on the learning rate (step size) Even with the correct direction (gradient), it is unknown how far we should go along that direction

Adaptive Learning Rate. Decrease the learning rate over time: at the beginning we are far from the destination, so we use a larger learning rate; after several epochs we are close to the destination, so we reduce the learning rate. Adagrad: parameters with smaller past derivatives get a larger learning rate, and vice versa.
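A minimal Adagrad update, as a sketch (the variable names are illustrative):

    import numpy as np

    def adagrad_update(w, grad, accum, lr=0.01, eps=1e-8):
        # Accumulate the squared gradients seen so far for each parameter.
        accum += grad ** 2
        # Parameters with a history of small derivatives get a larger
        # effective step; those with large derivatives get a smaller one.
        w -= lr * grad / (np.sqrt(accum) + eps)
        return w, accum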

Regularisation. L2 (weight decay): prevents weights from growing too large, and pushes more weights towards zero so they can be ignored. Each update multiplies the weight by a factor slightly below 1 (e.g. 0.99) before the usual gradient step.
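As a sketch, minimising E(w) + (lambda/2)*||w||^2 by gradient descent adds lambda*w to the gradient, which is the same as shrinking each weight by a constant factor first; with lr = 0.01 and lambda = 1 that factor is the 0.99 mentioned above:

    def sgd_weight_decay(w, grad, lr=0.01, weight_decay=1.0):
        # (1 - lr * weight_decay) = 0.99 here: weights decay towards zero
        # each step unless the data gradient pushes them elsewhere.
        return (1 - lr * weight_decay) * w - lr * grad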

Dropout. Each time before updating the weights, each neuron is dropped with probability p%. The network structure therefore changes, and for each mini-batch we resample the dropped neurons.

Dropout. Effectively trains a set (ensemble) of different networks. Ideally we would test by averaging the outputs y1, y2, y3, y4, … of all the trained networks, but we cannot enumerate them all.

Dropout. Approximation: test with the full network. If a neuron is dropped with probability p% during training, then at test time its weights are multiplied by (1 - p).
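A NumPy sketch of both phases, assuming a drop probability p (most libraries use the "inverted" variant that rescales at training time instead, but this matches the slide):

    import numpy as np

    rng = np.random.default_rng(0)

    def dropout_train(a, p=0.5):
        # Each unit is dropped independently with probability p; the mask is
        # resampled for every mini-batch, so each batch trains a different
        # sub-network.
        mask = rng.random(a.shape) >= p
        return a * mask

    def dropout_test(a, p=0.5):
        # Test-time approximation: keep every unit but scale activations
        # (equivalently, the outgoing weights) by (1 - p).
        return a * (1 - p)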

Data Augmentation. Create synthetic training images by transformations: rotation, scaling, flipping, cropping, noise, …
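One common way to implement these transformations is with torchvision (an assumed library choice; the slides do not name one):

    from torchvision import transforms

    augment = transforms.Compose([
        transforms.RandomRotation(15),        # rotation (up to +/- 15 degrees)
        transforms.RandomResizedCrop(224),    # scaling + cropping
        transforms.RandomHorizontalFlip(),    # flipping
        transforms.ColorJitter(0.2, 0.2),     # mild photometric noise
        transforms.ToTensor(),
    ])
    # Applying `augment` to each training image yields a fresh synthetic
    # variant every epoch, effectively enlarging the training set.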

Transfer Learning/Domain Adaptation. Use knowledge learned in the past (the source domain) to help solve the problem at hand (the target domain).

Transfer Learning/Domain Adaptation. Standard supervised learning assumes that the training and test examples (x, y) are drawn i.i.d. from a single distribution D. In domain adaptation, the source and target domains have different but related distributions D_S and D_T, and one can extract/use knowledge from more than one source domain at the same time. Settings: Unsupervised: labelled and unlabelled source examples + unlabelled target examples. Semi-supervised: also consider a small set of labelled target examples. Supervised: all examples are labelled.

Transfer Learning/Domain Adaptation. DLID (Deep Learning for domain adaptation by Interpolating between Domains); DeCAF; DAN (Deep Adaptation Networks).

Dataset. Three domains: Amazon, DSLR, and Webcam (the Office benchmark).

DLID [Chopra et al. 2013]. Discrete interpolation from the source domain to the target domain: each interpolation point uses a different proportion of training examples from the source and target domains. Features are extracted at each point by an unsupervised trainer, and the new feature representation is formed by combining all the extracted features.

DeCAF [Donahue et al. ICML 2014]. 1. Take the AlexNet trained on ImageNet. 2. Run a feed-forward pass with AlexNet on the new images. 3. Take the 6th- or 7th-layer activations as "features" (DeCAF6 and DeCAF7). 4. Apply any classifier (e.g. SVM or logistic regression).
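A hedged sketch of this recipe using torchvision's pretrained AlexNet (the original work used a Caffe model; the layer slicing below is an assumption that approximates the DeCAF6 features):

    import torch
    from torchvision import models

    alexnet = models.alexnet(weights="IMAGENET1K_V1")
    alexnet.eval()                              # dropout off at test time

    def decaf6_features(images):                # images: (N, 3, 224, 224)
        with torch.no_grad():
            x = alexnet.features(images)        # convolutional layers
            x = alexnet.avgpool(x).flatten(1)
            # classifier[:3] = Dropout, Linear (fc6), ReLU: its output
            # approximates the 6th-layer "DeCAF6" activations.
            return alexnet.classifier[:3](x)
    # The resulting features can be fed to any off-the-shelf classifier,
    # e.g. a linear SVM or logistic regression.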

DAN [Long et al. ICML 2015]. Fine-tune the AlexNet trained on ImageNet: freeze layers 1-3 and fine-tune layers 4-5. Encourage similar distributions of the hidden representations in layers 6-8 across the source and target domains, via a regulariser computed by MK-MMD (multi-kernel maximum mean discrepancy).
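For intuition, here is a simplified single-kernel MMD estimate between source and target features (DAN itself uses the multi-kernel variant MK-MMD; this sketch only conveys the idea):

    import numpy as np

    def rbf(a, b, gamma=1.0):
        # Gaussian kernel matrix between two sets of feature vectors.
        d = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d)

    def mmd2(Xs, Xt, gamma=1.0):
        # MMD^2 = E[k(s, s')] + E[k(t, t')] - 2 E[k(s, t)]: zero when the
        # two distributions match (in the kernel's feature space).
        return (rbf(Xs, Xs, gamma).mean() + rbf(Xt, Xt, gamma).mean()
                - 2 * rbf(Xs, Xt, gamma).mean())
    # Adding this penalty on the layer 6-8 activations pushes the source
    # and target representations towards the same distribution.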

Summary. Deep learning is hard (standard backpropagation does not work well) because of the vanishing gradient problem. Many good ideas make it work in practice (e.g. mini-batch, cross entropy, ReLU, adaptive learning rates, regularisation, dropout, data augmentation). Transfer learning/domain adaptation is a trending direction (similar to how humans learn). Strategies in transfer learning: learn shared hidden representations (e.g. DLID); share features (e.g. DeCAF, DAN).