The Updated Experiment Based on LSTM


The Updated Experiment Based on LSTM
2018-11-06
Raymond ZHAO Wenlong

Content
- Introduction
- The updated experiments based on LSTM (Long Short-Term Memory)
- TODO

Introduction
Develop a new product configuration approach for the e-commerce industry to elicit customer needs.
- Collect online user reviews (laptops) as inputs.
- Query-to-attributes mapping: map user inputs (functional requirements expressed as an unstructured query) into product parameters or features (structured attributes).
- Text classification: similar to sentiment classification (SentiC) on the Stanford Sentiment Treebank of movie reviews.
Example query: "A large screen size laptop"
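The slides do not show the model itself; as an illustration only, here is a minimal sketch of a Keras-style LSTM classifier for this kind of query-to-attribute mapping. The vocabulary size, layer widths, and number of attribute classes are placeholder assumptions, not the actual experimental settings.

```python
# Hypothetical sketch of an LSTM text classifier for query-to-attribute mapping.
# All sizes below are illustrative placeholders, not the real experiment's values.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense

VOCAB_SIZE = 20000   # assumed vocabulary size of the review corpus
NUM_CLASSES = 10     # assumed number of structured attribute classes

model = Sequential([
    Embedding(VOCAB_SIZE, 128),                # map word ids to dense vectors
    LSTM(128),                                 # encode the query into a single vector
    Dense(NUM_CLASSES, activation="softmax"),  # probabilities over attribute classes
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```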

The Updated Experiments
epoch = 4 in this experiment. An epoch is generally defined as "one pass over the entire training dataset" (reference: Keras).
But why do we use more than one epoch? In ML the data is usually too big to feed to the computer at once, so we divide it into a number of batches; each step updates the weights based on the loss function.
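To make the epoch/batch terms concrete, here is a hedged example of how the sketched model could be trained for 4 epochs in Keras. The data arrays and the batch size of 32 are placeholder assumptions, not values taken from the slides.

```python
import numpy as np

# Dummy placeholder data: 1000 queries of 100 token ids each, one attribute
# label per query (purely illustrative, not the real dataset).
x_train = np.random.randint(0, 20000, size=(1000, 100))
y_train = np.random.randint(0, 10, size=(1000,))

history = model.fit(x_train, y_train,
                    epochs=4,        # one epoch = one pass over the entire training dataset
                    batch_size=32,   # weights updated once per batch of 32 samples (assumed)
                    validation_split=0.1)
```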

Gradient Descent Algorithms (reference: Quora)
When the data is too big to pass to the computer in a single epoch (all at once), we divide it into a number of batches, feed them to the computer batch by batch, and update the weights of the neural network at the end of every step to fit the model to the data.
A limited dataset (split into batches) and an iterative optimization algorithm (such as SGD or AdaGrad) are used in ML to find the best result (the minimum of the loss curve).
The loss function decreases during learning, while the learning-rate parameter in the gradient descent algorithm is reduced so that the steps become smaller.
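As a rough, from-scratch illustration of the batch-by-batch weight update described above (not the actual training code of the experiment), a minimal mini-batch SGD loop for a linear least-squares model in NumPy:

```python
import numpy as np

def minibatch_sgd(X, y, lr=0.01, epochs=4, batch_size=32):
    """Illustrative mini-batch SGD for a linear least-squares model."""
    w = np.zeros(X.shape[1])
    for epoch in range(epochs):                        # one epoch = one pass over all batches
        idx = np.random.permutation(len(X))            # reshuffle the data each epoch
        for start in range(0, len(X), batch_size):
            batch = idx[start:start + batch_size]
            Xb, yb = X[batch], y[batch]
            grad = 2 * Xb.T @ (Xb @ w - yb) / len(Xb)  # gradient of the mean squared error
            w -= lr * grad                             # weight update at the end of each step
    return w
```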

Gradient Descent Algorithm
The experiment on our server
Why do we use more than one epoch?

Epoch (reference: Quora)
Updating the weights with one epoch (a single pass) is not enough. Because we use a limited dataset (in batches) and optimise learning with gradient descent, which is an iterative process, a single epoch leads to underfitting of the curve.
We need to pass the full dataset through the same neural network multiple times. As the number of epochs increases, the weights are updated more often, and the curve goes from underfitting to optimal and then to overfitting.
=> What is the right number of epochs? It has to come from experiments on your own data; one common heuristic is sketched below.
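One common way to let "experiments on your own data" pick the number of epochs is early stopping on a validation split. A hedged Keras sketch, reusing the placeholder model and data from the earlier snippets; the patience value and the epoch upper bound are assumptions:

```python
from tensorflow.keras.callbacks import EarlyStopping

# Stop when the validation loss has not improved for 2 consecutive epochs and
# keep the weights from the best epoch; this chooses the epoch count empirically.
early_stop = EarlyStopping(monitor="val_loss", patience=2, restore_best_weights=True)
history = model.fit(x_train, y_train,
                    epochs=50,               # upper bound; training usually stops earlier
                    batch_size=32,
                    validation_split=0.1,
                    callbacks=[early_stop])
```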

TODO
- All experiments
- RNN-LSTM
- LSTM with attention

Thanks