Data Mining, Neural Network and Genetic Programming Deep Learning and Transfer Learning for Object Recognition Yi Mei Yi.mei@ecs.vuw.ac.nz
Outline Object Recognition Deep Learning Overview Recipes of deep learning Automated ANN Architecture Design Transfer Learning DLID DeCAF DAN
Object Recognition Object recognition usually refers to object classification, but sometimes it refers to the whole procedure of finding objects in large pictures. Object detection also has other meanings/interpretations. Object Recognition: Object Classification; Object Localization; One-class object detection; Multi-class object detection
Methods for Object Detection Neural Methods and Genetic Methods (Genetic Algorithms, Genetic Programming); each is used either for classification only or for classification and localisation. Neural methods include Feed Forward Networks, Shared Weight Networks, SOMs, High Order Networks, Deep Learning, …
Deep Learning Machine learning algorithms based on learning multiple levels of representation / abstraction. Fig: I. Goodfellow
Deep Learning Has been successful in many areas: object recognition, object detection, speech recognition, natural language processing, …
Deep Learning LeNet: 7 layers [LeCun et al. 1998] Subsampling (Max-pooling)
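To make the layer structure concrete, here is a minimal sketch of a LeNet-style network in PyTorch; the channel sizes follow the classic LeNet-5 description, while the use of ReLU and max-pooling (rather than the original tanh and average subsampling) is an assumption for illustration.

```python
import torch
import torch.nn as nn

class LeNetStyle(nn.Module):
    """A LeNet-style CNN: conv -> subsample -> conv -> subsample -> fully connected."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5),   # 1x32x32 -> 6x28x28
            nn.ReLU(),
            nn.MaxPool2d(2),                  # subsampling: 28x28 -> 14x14
            nn.Conv2d(6, 16, kernel_size=5),  # 6x14x14 -> 16x10x10
            nn.ReLU(),
            nn.MaxPool2d(2),                  # subsampling: 10x10 -> 5x5
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(16 * 5 * 5, 120),
            nn.ReLU(),
            nn.Linear(120, 84),
            nn.ReLU(),
            nn.Linear(84, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

out = LeNetStyle()(torch.randn(4, 1, 32, 32))  # batch of 4 grey-scale 32x32 images -> (4, 10)
```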
Deep Learning Subsampling will not change the object (the class label is preserved), but over-subsampling throws away too much information
Deep Learning AlexNet: 8 layers [Krizhevsky et al. 2012] Similar to LeNet but Bigger model (7 hidden layers, 650k units, 60M params) Error: 15.315% for ImageNet 2012 challenge (No. 1) image-net.org
Deep Learning VGGNet: 19 layers [Simonyan and Zisserman 2014] Error: 7.32% for ImageNet 2014 challenge
Deep Learning GoogLeNet: 22 layers [Szegedy et al. 2014] Error: 6.67% for ImageNet 2014 challenge (No. 1)
Deep Learning ResNet: 152 layers [He et al. 2015] Error: 3.57% for ImageNet 2015 challenge (No. 1)
Automated CNN Architecture Design Manually designing a CNN architecture requires a lot of domain knowledge and trial-and-error. Use genetic programming to automatically evolve an architecture
Automated CNN Architecture Design Cartesian GP
Automated CNN Architecture Design Functions (Operators): ConvBlock (stride 1 with padding, keeps input size); ResBlock (stride 1 with padding, keeps input size); Max (average) pooling (2x2 filter, stride 2); Summation (element-wise addition). Constraint on the architecture: only sum two feature maps with the same size
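To illustrate how Cartesian GP can represent such an architecture, here is a minimal sketch; the node format, function names and the active-node decoding are illustrative assumptions, not the exact encoding used in the cited work.

```python
# Each node: (function, input_node_1, input_node_2); a node is inactive
# if no path connects it to the output node.
FUNCTIONS = ["ConvBlock32", "ConvBlock64", "ResBlock32", "MaxPool", "AvgPool", "Sum"]

genotype = [
    ("ConvBlock32", 0, 0),  # node 1: input is node 0 (the image)
    ("MaxPool",     1, 1),  # node 2
    ("ResBlock32",  2, 2),  # node 3
    ("ConvBlock64", 1, 1),  # node 4 (inactive: not used by the output)
    ("Sum",         2, 3),  # node 5: element-wise sum of two same-size maps
]
output_node = 5

def active_nodes(genotype, output_node):
    """Trace back from the output to find which nodes are actually used."""
    active, stack = set(), [output_node]
    while stack:
        i = stack.pop()
        if i == 0 or i in active:
            continue
        active.add(i)
        _, a, b = genotype[i - 1]
        stack.extend([a, b])
    return sorted(active)

print(active_nodes(genotype, output_node))  # -> [1, 2, 3, 5]
```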
Automated CNN Architecture Design Results ConvSet much better than VGG ResSet much better than ResNet
Automated CNN Architecture Design Using ResSet can evolve a much simpler architecture
Why is Deep Learning Hard? The vanishing gradient problem
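A sketch of the standard argument, assuming sigmoid activations and, schematically, a chain with one unit per layer: the sigmoid derivative is at most 1/4, and backpropagation multiplies one such factor (times a weight) per layer, so the gradient reaching the early layers shrinks roughly geometrically with depth.

```latex
\sigma(z) = \frac{1}{1 + e^{-z}}, \qquad
\sigma'(z) = \sigma(z)\bigl(1 - \sigma(z)\bigr) \le \tfrac{1}{4}

\frac{\partial E}{\partial w^{(1)}}
\;\propto\; \prod_{\ell=2}^{L} w^{(\ell)}\,\sigma'\bigl(z^{(\ell)}\bigr)
\quad\Longrightarrow\quad
\Bigl|\frac{\partial E}{\partial w^{(1)}}\Bigr| \lesssim \Bigl(\tfrac{1}{4}\,\lvert w\rvert_{\max}\Bigr)^{L-1}
```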
Recipes of Deep Learning Mini-batch (online learning) Proper loss function: Cross entropy New activation function: ReLU Adaptive learning rate Regularisation (Weight decay) Dropout Data augmentation
Mini-batch Offline (full-batch) learning updates the weights once after using all the training examples in one epoch. Online learning updates the weights after each training example. Mini-batch splits the training examples into a number of batches and updates the weights after each batch (e.g. 100 examples in a mini-batch)
Mini-batch The objective function (error) changes from one mini-batch to another, so we are not minimising the real (full training) error, yet it can still give better performance in practice (e.g. on cross-validation/test data)
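A minimal sketch of the three update schemes, assuming a toy linear model and squared error purely for illustration:

```python
import numpy as np

def grad(w, X, y):
    """Gradient of mean squared error for a linear model (placeholder example)."""
    return 2 * X.T @ (X @ w - y) / len(y)

rng = np.random.default_rng(0)
X, y = rng.normal(size=(1000, 5)), rng.normal(size=1000)
w, eta, batch_size = np.zeros(5), 0.01, 100

for epoch in range(10):
    idx = rng.permutation(len(y))           # shuffle each epoch
    for start in range(0, len(y), batch_size):
        b = idx[start:start + batch_size]   # one mini-batch (e.g. 100 examples)
        w -= eta * grad(w, X[b], y[b])      # update after each mini-batch

# batch_size = 1      -> online learning
# batch_size = len(y) -> offline (full-batch) learning
```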
Proper Loss Function For a 2-layer network (1 hidden layer), compare the error surfaces of cross entropy and square error (figure)
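For reference, the two losses being compared, for network output y-hat and one-hot target y over C classes; with a softmax output, the cross-entropy gradient avoids the extra sigmoid-derivative factor that flattens the square-error surface far from the target:

```latex
E_{\text{sq}} = \sum_{c=1}^{C} \bigl(\hat{y}_c - y_c\bigr)^2,
\qquad
E_{\text{ce}} = -\sum_{c=1}^{C} y_c \log \hat{y}_c
```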
Rectified Linear Unit (ReLU) Fast to compute; fixed gradient (1) when the input is positive; no learning (gradient 0) when the input is negative
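The unit and its gradient, for reference:

```latex
a = \operatorname{ReLU}(z) = \max(0, z),
\qquad
\frac{\partial a}{\partial z} =
\begin{cases}
1, & z > 0\\
0, & z < 0
\end{cases}
```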
Adaptive Learning Rate NN performance heavily relies on the learning rate (step size) Even with the correct direction (gradient), it is unknown how far we should go along that direction
Adaptive Learning Rate Decrease learning rate over time At the beginning, we are far from the destination, so we use larger learning rate After several epochs, we are close to the destination, so we reduce the learning rate Adagrad Smaller derivatives, larger learning rate, and vice versa
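The standard Adagrad update that implements this per-parameter adaptation (epsilon is a small constant for numerical stability; this is the textbook form, given here for reference):

```latex
w_{t+1} = w_t - \frac{\eta}{\sqrt{\sum_{\tau=0}^{t} g_\tau^{2}} + \epsilon}\, g_t,
\qquad
g_t = \left.\frac{\partial E}{\partial w}\right|_{w = w_t}
```

Each parameter accumulates its own sum of squared gradients, so parameters with consistently small derivatives get a relatively larger effective learning rate, and vice versa.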
Regularisation L2 (weight decay): prevents weights from growing too large; pushes more weights towards zero so they can effectively be ignored (each update multiplies the weights by a factor slightly below 1, e.g. 0.99)
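The L2-regularised objective and the resulting gradient-descent update, which is where a decay factor such as 0.99 comes from (with learning rate eta and regularisation strength lambda; e.g. eta*lambda = 0.01 gives the factor 0.99):

```latex
E'(w) = E(w) + \frac{\lambda}{2}\lVert w \rVert^{2}
\quad\Longrightarrow\quad
w_{t+1} = (1 - \eta\lambda)\, w_t - \eta\, \frac{\partial E}{\partial w}
```

Weights that receive little gradient signal therefore decay towards zero over many updates.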
Dropout Each time before updating the weights, each neuron has probability p% of being dropped out, so the network structure changes; for each mini-batch, we resample the dropped-out neurons
Dropout This effectively trains a set (ensemble) of different thinned networks; ideally we would test by averaging the outputs y1, y2, y3, y4, … of all trained networks, but we cannot actually compute them all
Dropout Approximation: test with the full network; if a neuron has probability p% of being dropped out, then at test time its weights are multiplied by (1 - p%)
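A minimal NumPy sketch of this train/test asymmetry; the layer shape, ReLU activation and dropout rate are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
p = 0.5  # dropout probability for each neuron

def forward_train(a, W):
    """Training: each input neuron is dropped with probability p,
    and the dropout mask is resampled for every mini-batch."""
    mask = rng.random(a.shape) >= p
    return np.maximum(0, (a * mask) @ W)

def forward_test(a, W):
    """Testing: keep all neurons but scale their outgoing weights by (1 - p),
    approximating the average over the ensemble of thinned networks."""
    return np.maximum(0, a @ (W * (1 - p)))

a = rng.normal(size=(4, 8))   # a mini-batch of 4 activation vectors
W = rng.normal(size=(8, 3))
print(forward_train(a, W).shape, forward_test(a, W).shape)  # (4, 3) (4, 3)
```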
Data Augmentation Create synthetic training images by transformations Rotation, scaling, flipping, cropping, noise, …
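A hedged sketch of such transformations using torchvision; the particular transforms and parameters are assumptions, and any image library with similar operations would do.

```python
from torchvision import transforms

# Each training image is randomly perturbed, so the network sees a
# label-preserving synthetic variant on every epoch.
augment = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.8, 1.0)),   # random crop + rescale
    transforms.RandomHorizontalFlip(),                     # flipping
    transforms.RandomRotation(15),                         # small rotation
    transforms.ColorJitter(brightness=0.2, contrast=0.2),  # photometric noise
    transforms.ToTensor(),
])

# e.g. augmented = augment(pil_image)  # apply to a PIL image at load time
```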
Transfer Learning/Domain Adaptation Use knowledge learned from past (source domain) to help solve the problem at hand (target domain)
Transfer Learning/Domain Adaptation Standard supervised learning assumes that the training and test examples (x, y) are drawn i.i.d. from a distribution D. In domain adaptation, the source and target domains have different but related distributions D_S and D_T. One can extract/use knowledge from more than one source domain at the same time. Unsupervised: labelled and unlabelled source examples + unlabelled target examples. Semi-supervised: also consider a small set of labelled target examples. Supervised: all examples are labelled
Transfer Learning/Domain Adaptation DLID (Deep Learning by Interpolation between Domain) DeCAF DAN (Deep Adaptation Networks)
Dataset The Office dataset with three domains: Amazon, DSLR, Webcam
DLID [Chopra et al. 2013] Discrete interpolation from the source domain to the target domain; each interpolation point uses a different proportion of training examples from the source and target domains; features are extracted at each point by an unsupervised trainer; the new feature representation combines all the extracted features
DeCAF [Donahue et al. ICML 2014] 1. Take the AlexNet trained on ImageNet 2. Do feed-forward operation using AlexNet on new images 3. Get the 6th or 7th layer activations as “features” (DeCAF6 and DeCAF7) 4. Apply any classifier (e.g. SVM and logistic regression)
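A hedged sketch of this pipeline with torchvision and scikit-learn (assuming a recent torchvision; the truncation point inside AlexNet's classifier and the linear SVM are illustrative choices, not the exact DeCAF setup):

```python
import torch
import torch.nn as nn
from torchvision import models
from sklearn.svm import LinearSVC

# 1. AlexNet pretrained on ImageNet
alexnet = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1)
alexnet.eval()

# 3. Truncate the classifier so its output is the fc6 activation ("DeCAF6")
decaf6 = nn.Sequential(alexnet.features, alexnet.avgpool, nn.Flatten(),
                       *list(alexnet.classifier.children())[:3])

@torch.no_grad()
def extract(images):
    # 2. Feed-forward only, no fine-tuning; images: tensor (N, 3, 224, 224)
    return decaf6(images).numpy()

# 4. Train any off-the-shelf classifier on the extracted features, e.g.
# clf = LinearSVC().fit(extract(X_train), y_train)
```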
DAN [Long et al. ICML 2015] Fine-tune the AlexNet trained on ImageNet: freeze layers 1-3, fine-tune layers 4-5, and enforce similar distributions in the source and target domains for the hidden representations in layers 6-8 via a regulariser computed by MK-MMD
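The regulariser is the (multi-kernel) maximum mean discrepancy between source and target activations of each adapted layer; in its basic single-kernel form, with feature map phi inducing kernel k:

```latex
\mathrm{MMD}^{2}(D_S, D_T) =
\Bigl\lVert \mathbb{E}_{x^{s} \sim D_S}\bigl[\phi(x^{s})\bigr]
          - \mathbb{E}_{x^{t} \sim D_T}\bigl[\phi(x^{t})\bigr] \Bigr\rVert_{\mathcal{H}}^{2},
\qquad
k(x, x') = \langle \phi(x), \phi(x') \rangle
```

In MK-MMD the kernel is a convex combination of several (e.g. Gaussian) kernels, and DAN adds the MMD terms for layers 6-8, weighted by a trade-off parameter, to the classification loss.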
Summary Deep learning is hard (standard BP does not work well) because of the vanishing gradient problem. Many good ideas in deep learning (e.g. mini-batch, cross entropy, ReLU, adaptive learning rate, regularisation, dropout, data augmentation). Transfer learning/domain adaptation is a trending direction (similar to human learning). Strategies in transfer learning: learn shared hidden representations (e.g. DLID); share features (e.g. DeCAF, DAN)