
An Evolutionary Method for Training Autoencoders for Deep Learning Networks
Master's Thesis Defense
Sean Lander, Master's Candidate
Advisor: Yi Shang
University of Missouri, Department of Computer Science
University of Missouri, Informatics Institute

Agenda
o Overview
o Background and Related Work
o Methods
o Performance and Testing
o Results
o Conclusion and Future Work

Overview: Deep Learning classification/reconstruction
o Since 2006, Deep Learning Networks (DLNs) have changed the landscape of classification problems
o Strong ability to create and utilize abstract features
o Easily lends itself to GPU and distributed systems
o Does not require labeled data – VERY IMPORTANT
o Can be used for feature reduction and classification

Overview: Problem and proposed solution
o Problems with DLNs:
  o Costly to train with large data sets or high feature spaces
  o Local minima systemic with Artificial Neural Networks
  o Hyper-parameters must be hand selected
o Proposed solutions:
  o Evolutionary based approach with local search phase
  o Increased chance of global minimum
  o Optimizes structure based on abstracted features
  o Data partitions based on population size (large data only)
  o Reduced training time
  o Reduced chances of overfitting

Background: Perceptrons
o Started with the Perceptron in the 1950s
o Only capable of linear separability
o Failed on XOR

Background: Artificial Neural Networks (ANNs)
o ANNs went out of favor until the Multilayer Perceptron (MLP) was introduced
  o Pro: Non-linear classification
  o Con: Time consuming
o Advance in training: Backpropagation
  o Increased training speeds
  o Limited to shallow networks
  o Error propagation diminishes as the number of layers increases

Background: Backpropagation using Gradient Descent
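The gradient-descent equations shown on this slide are not captured in the transcript. A reference sketch of the usual textbook form, assuming sigmoid units and a squared-error loss (not necessarily the exact notation used on the slide):

\[
E = \frac{1}{2}\sum_k (t_k - o_k)^2, \qquad
w_{ij} \leftarrow w_{ij} - \eta \frac{\partial E}{\partial w_{ij}},
\]

with the error terms propagated backwards through the network as

\[
\delta_k = (t_k - o_k)\, o_k (1 - o_k), \qquad
\delta_j = o_j (1 - o_j) \sum_k w_{jk}\, \delta_k, \qquad
\frac{\partial E}{\partial w_{ij}} = -\delta_j\, o_i .
\]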

Background: Deep Learning Networks (DLNs)
o Allows for deep networks with multiple layers
o Layers pre-trained using unlabeled data
o Layers are "stacked" and fine-tuned
o Minimizes error degradation for deep neural networks (many layers)
o Still costly to train
o Manual selection of hyper-parameters
o Local, not global, minimum
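As a rough illustration of the layer-wise pre-training and stacking just listed (a minimal sketch only; train_autoencoder, encode, stack and finetune are hypothetical helper names, not the thesis implementation):

    def pretrain_stack(X_unlabeled, layer_sizes):
        # Greedy layer-wise pre-training: each layer is trained as an autoencoder
        # on the (unlabeled) output of the layer below it.
        features, encoders = X_unlabeled, []
        for size in layer_sizes:
            ae = train_autoencoder(features, n_hidden=size)   # unsupervised
            encoders.append(ae)
            features = ae.encode(features)                    # hidden output feeds the next layer
        return encoders

    # The pre-trained layers are then "stacked" and fine-tuned end-to-end, e.g.
    # network = stack(encoders); finetune(network, X_labeled, y_labeled)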

Background: Autoencoders for reconstruction
o Autoencoders can be used for feature reduction and clustering
o "Classification error" is the ability to reconstruct the sample input
o Abstracted features – output from the hidden layer – can be used to replace raw input for other techniques
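To make the reconstruction-error idea concrete, here is a minimal single-hidden-layer autoencoder sketch in NumPy (illustrative only, not the thesis code; the N(0, 0.5) weight initialization follows the testing parameters listed later):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    class Autoencoder:
        """Single-hidden-layer autoencoder: x -> h (abstracted features) -> x'."""
        def __init__(self, n_in, n_hidden, seed=0):
            rng = np.random.default_rng(seed)
            self.W1 = rng.normal(0.0, 0.5, (n_in, n_hidden))   # weights drawn from N(0, 0.5)
            self.W2 = rng.normal(0.0, 0.5, (n_hidden, n_in))
            self.b1 = np.zeros(n_hidden)
            self.b2 = np.zeros(n_in)

        def encode(self, X):
            # Hidden-layer output: the abstracted features that can replace raw input
            return sigmoid(X @ self.W1 + self.b1)

        def reconstruct(self, X):
            return sigmoid(self.encode(X) @ self.W2 + self.b2)

        def reconstruction_error(self, X):
            # "Classification error" for an autoencoder = how well it rebuilds its input
            return np.mean((X - self.reconstruct(X)) ** 2)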

Related Work: Evolutionary and genetic ANNs
o First use of Genetic Algorithms (GAs) in 1989
  o Two layer ANN on a small data set
  o Tested multiple types of chromosomal encodings and mutation types
o Late 1990s and early 2000s introduced other techniques
  o Multi-level mutations and mutation priority
  o Addition of local search in each generation
  o Inclusion of hyper-parameters as part of the mutation
o Issue of competing conventions starts to appear
  o Two ANNs produce the same results by sharing the same nodes but in a permuted order

Related Work: Hyper-parameter selection for DLNs
o Majority of the work explored using newer technologies and methods such as GPU and distributed (MapReduce) training
o Improved versions of Backpropagation, such as Conjugate Gradient or Limited-memory BFGS, were tested under different conditions
o Most conclusions pointed toward manual parameter selection via trial-and-error

Method 1: Evolutionary Autoencoder (EvoAE)
o IDEA: An autoencoder's power is in its feature abstraction, the hidden node output
o Training many AEs will produce more potential abstracted features
o The best AEs will contain the best features
o Joining these features should create a better AE

Method 1: Evolutionary Autoencoder (EvoAE)
o A population of autoencoders (AEs) is initialized with a semi-random number of hidden nodes
o Each AE is trained for a small number of epochs using Backpropagation
o The AEs are ranked based on reconstruction error
o The top AEs are selected for crossover
o New AEs are mutated based on the mutation rate, with an even chance of gaining or losing a node
o New nodes are selected randomly from the population, not randomly initialized
o Continue until the convergence criteria are met
(A rough sketch of this generation loop is given below.)
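A hedged sketch of the loop just described (the Autoencoder class is the one sketched earlier; train_backprop, crossover, add_or_remove_node and random_hidden_size are hypothetical helper names, and the real thesis code may differ in its details):

    import random

    def evolve_autoencoders(X, pop_size=30, generations=50,
                            epochs_per_gen=20, mutation_rate=0.1):
        # Initialize a population of AEs with semi-random hidden sizes
        population = [Autoencoder(X.shape[1], random_hidden_size())
                      for _ in range(pop_size)]
        for gen in range(generations):
            # Local search: a few epochs of backpropagation per AE
            for ae in population:
                train_backprop(ae, X, epochs=epochs_per_gen)
            # Rank by reconstruction error and keep the best half as parents
            population.sort(key=lambda ae: ae.reconstruction_error(X))
            parents = population[: pop_size // 2]
            # Crossover: children combine hidden nodes from two parents
            children = [crossover(random.choice(parents), random.choice(parents))
                        for _ in range(pop_size - len(parents))]
            # Mutation: even chance of gaining or losing a node; new nodes are
            # copied from the population rather than randomly initialized
            for child in children:
                if random.random() < mutation_rate:
                    add_or_remove_node(child, population)
            population = parents + children
        return min(population, key=lambda ae: ae.reconstruction_error(X))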

Method 1: Evolutionary Autoencoder (EvoAE)
[Diagram: a population of autoencoders, each mapping input x through hidden nodes h (labeled A1–A4, B1–B3, C2) to reconstruction x', moving through the Initialization, Local Search, Crossover, and Mutation phases]

Method 1A: Distributed learning and Mini-batches
o Training time of the generic EvoAE increases linearly with the size of the population
o ANN training time increases drastically with data size
o To combat this, mini-batches can be used, where each AE is trained against a batch and updated
o Batch size << total data
(A small sketch of this batching scheme follows below.)
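A small sketch of the mini-batch partitioning idea above (make_batches is an illustrative helper, not the thesis code):

    import numpy as np

    def make_batches(X, n_batches, seed=0):
        # Split the data into n_batches small batches (batch size << total data)
        idx = np.random.default_rng(seed).permutation(len(X))
        return [X[part] for part in np.array_split(idx, n_batches)]

    # Each AE in the population is then trained against one batch per generation:
    # for ae, batch in zip(population, make_batches(X, len(population))):
    #     train_backprop(ae, batch, epochs=epochs_per_gen)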

Method 1A: Distributed learning and Mini-batches
o EvoAE lends itself to a distributed system
o Data duplication and storage now become an issue
[Diagram: per-batch pipeline of Train (forward propagation, backpropagation) → Rank (calculate error, sort) → GA (crossover, mutate), repeated for Batch 1, Batch 2, …, Batch N]

Method 2: EvoAE Evo-batches
o IDEA: When data is large, small batches can be representative
o Prevents overfitting, as the nodes being trained are almost always introduced to new data
o Scales well with large amounts of data even when parallel training is not possible
o Works well on limited-memory systems: increasing the size of the population reduces the data per batch
o Quick training of large populations, equivalent to training a single autoencoder using traditional methods
(A rough sketch of the batch rotation follows below.)
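A rough sketch of the Evo-batch rotation (the exact scheduling here is an assumption; the transcript only states that trained nodes keep being introduced to new data):

    def evo_batch_assignment(batches, generation):
        # Rotate fixed data partitions among population members each generation,
        # so every autoencoder is repeatedly exposed to new data.
        n = len(batches)
        return [batches[(i + generation) % n] for i in range(n)]

    # for gen in range(generations):
    #     for ae, batch in zip(population, evo_batch_assignment(batches, gen)):
    #         train_backprop(ae, batch, epochs=epochs_per_gen)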

Method 2: EvoAE Evo-batches
[Diagram: the original data is split into partitions Data A–Data D, and the partitions assigned to each population member shift across the Local Search, Crossover, and Mutate phases]

Performance and Testing: Hardware and testing parameters
o Lenovo Y500 laptop
  o Intel i7 3rd generation, 2.4 GHz
  o 12 GB RAM
o All weights randomly initialized to N(0, 0.5)

Parameter        Wine   Iris   Heart Disease   MNIST
Hidden Size      32     32     12              200
Hidden Std Dev   NULL   NULL   NULL            80
Hidden +/-       16     16     6               NULL
Mutation Rate    0.1    0.1    0.1             0.1

Parameter        Default
Learning Rate    0.1
Momentum         2
Weight Decay     0.003
Population Size  30
Generations      50
Epochs/Gen       20
Train/Validate   80/20

Performance and Testing: Baseline
o Baseline is a single AE with 30 random initializations
o Two learning rates to create two baseline measurements
  o Base learning rate
  o Learning rate * 0.1

Performance and Testing: Data partitioning
o Three data partitioning methods were used:
  o Full data
  o Mini-batch
  o Evo-batch

Performance and Testing: Post-training configurations
o Post-training is run in the following ways:
  o Full data (All)
  o Batch data (Batch)
  o None
o All sets below are using the Evo-batch configuration

Results: Parameters review

Parameter        Wine   MNIST
Hidden Size      32     200
Hidden Std Dev   NULL   80
Hidden +/-       16     NULL
Mutation Rate    0.1    0.1

Parameter        Default
Learning Rate    0.1
Momentum         2
Weight Decay     0.003
Population Size  30
Generations      50
Epochs/Gen       20
Train/Validate   80/20

Results: Datasets
o UCI Wine dataset
  o 178 samples
  o 13 features
  o 3 classes
o Reduced MNIST dataset
  o 6000/1000 and 24k/6k training/testing samples
  o 784 features
  o 10 classes (0–9)

Results: Small datasets – UCI Wine

Parameter       Wine
Hidden Size     32
Hidden Std Dev  NULL
Hidden +/-      16
Mutation Rate   0.1

Results: Small datasets – UCI Wine
o Best error-to-speed: Baseline 1
o Best overall error: Full data All
o Full data is fast on small-scale data
o Evo- and mini-batch not good on small-scale data

Parameter       Wine
Hidden Size     32
Hidden Std Dev  NULL
Hidden +/-      16
Mutation Rate   0.1

Results: Small datasets – MNIST 6k/1k

Parameter       MNIST
Hidden Size     200
Hidden Std Dev  80
Hidden +/-      NULL
Mutation Rate   0.1

Results: Small datasets – MNIST 6k/1k
o Best error-to-time: Mini-batch None
o Best overall error: Mini-batch Batch
o Full data slows exponentially on large-scale data
o Evo- and mini-batch close to baseline speed

Parameter       MNIST
Hidden Size     200
Hidden Std Dev  80
Hidden +/-      NULL
Mutation Rate   0.1

Results: Medium datasets – MNIST 24k/6k

Parameter       MNIST
Hidden Size     200
Hidden Std Dev  80
Hidden +/-      NULL
Mutation Rate   0.1

Results: Medium datasets – MNIST 24k/6k
o Best error-to-time: Evo-batch None
o Best overall error: Evo-batch Batch OR Mini-batch Batch
o Full data too slow to run on this dataset
o EvoAE with population 30 trains as quickly as a single baseline AE when using Evo-batch

Parameter       MNIST
Hidden Size     200
Hidden Std Dev  80
Hidden +/-      NULL
Mutation Rate   0.1

Conclusions: Good for large problems
o Traditional methods are still the preferred choice for small problems and toy problems
o EvoAE with Evo-batch produces effective and efficient feature reduction given a large volume of data
o EvoAE is robust against poorly chosen hyper-parameters, specifically learning rate

Future Work
o Immediate goals:
  o Transition to a distributed system, MapReduce-based or otherwise
  o Harness GPU technology for increased speed (~50% in some cases)
o Long-term goals:
  o Open the system for use by novices and non-programmers
  o Make the system easy to use and transparent to the user, for both modification and training purposes

Thank you

Background: Backpropagation with weight decay
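The weight-decay equations on this backup slide are not captured in the transcript; a standard form, consistent with the weight-decay value 0.003 listed in the testing parameters, penalizes large weights:

\[
E_{\text{wd}} = E + \frac{\lambda}{2} \sum_{i,j} w_{ij}^2, \qquad
w_{ij} \leftarrow w_{ij} - \eta \left( \frac{\partial E}{\partial w_{ij}} + \lambda\, w_{ij} \right).
\]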

Background: Conjugate Gradient Descent
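This slide's equations are likewise not in the transcript; the usual conjugate gradient direction update (Fletcher–Reeves form, given here only as a reference sketch) is

\[
d_0 = -g_0, \qquad
d_{k+1} = -g_{k+1} + \beta_{k+1} d_k, \qquad
\beta_{k+1} = \frac{g_{k+1}^{\top} g_{k+1}}{g_k^{\top} g_k},
\]

where \(g_k\) is the gradient of the error at step \(k\) and the step along \(d_k\) is chosen by a line search.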

Background: Architecture and hyper-parameters

Results: Small datasets – UCI Iris
o The UCI Iris dataset has 150 samples with 4 features and 3 classes
o Best error-to-speed: Baseline 1
o Best overall error: Full data None

Parameter       Iris
Hidden Size     32
Hidden Std Dev  NULL
Hidden +/-      16
Mutation Rate   0.1

Results: Small datasets – UCI Heart Disease
o The UCI Heart Disease dataset has 297 samples with 13 features and 5 classes
o Best error-to-time: Baseline 1
o Best overall error: Full data None

Parameter       Heart Disease
Hidden Size     12
Hidden Std Dev  NULL
Hidden +/-      6
Mutation Rate   0.1