The Updated Experiment Based on LSTM


The Updated Experiment Based on LSTM
2018-11-06
Raymond ZHAO Wenlong

Content
- Introduction
- The updated experiments based on LSTM (Long Short-Term Memory)
- TODO

Introduction
Develop a new product configuration approach for the e-commerce industry to elicit customer needs.
- Collect online user reviews (laptops) as inputs.
- Query-to-attributes mapping: map user inputs (functional requirements expressed as an unstructured query) into product parameters or features (structured attributes).
- Text classification: similar to sentiment classification (SentiC) on the Stanford Sentiment Treebank of movie reviews.
Example query: "A large screen size laptop"
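The slides do not show the model itself; as an illustration only, here is a minimal sketch of a Keras-style LSTM classifier for this kind of query-to-attribute mapping. The vocabulary size, layer widths, and number of attribute classes are placeholder assumptions, not the actual experimental settings.

```python
# Hypothetical sketch of an LSTM text classifier for query-to-attribute mapping.
# All sizes below are illustrative placeholders, not the real experiment's values.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense

VOCAB_SIZE = 20000   # assumed vocabulary size of the review corpus
NUM_CLASSES = 10     # assumed number of structured attribute classes

model = Sequential([
    Embedding(VOCAB_SIZE, 128),                # map word ids to dense vectors
    LSTM(128),                                 # encode the query into a single vector
    Dense(NUM_CLASSES, activation="softmax"),  # probabilities over attribute classes
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```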

The Updated Experiments
epoch = 4 in this experiment. An epoch is generally defined as "one pass over the entire training dataset" (reference: Keras).
But why do we use more than one epoch? In ML the data is usually too big to feed to the computer at once, so we divide it into a number of batches; each step updates the weights based on the loss function.
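To make the epoch/batch terms concrete, here is a hedged example of how the sketched model could be trained for 4 epochs in Keras. The data arrays and the batch size of 32 are placeholder assumptions, not values taken from the slides.

```python
import numpy as np

# Dummy placeholder data: 1000 queries of 100 token ids each, one attribute
# label per query (purely illustrative, not the real dataset).
x_train = np.random.randint(0, 20000, size=(1000, 100))
y_train = np.random.randint(0, 10, size=(1000,))

history = model.fit(x_train, y_train,
                    epochs=4,        # one epoch = one pass over the entire training dataset
                    batch_size=32,   # weights updated once per batch of 32 samples (assumed)
                    validation_split=0.1)
```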

Gradient Descent Algorithms (reference: Quora)
When the data is too big to pass to the computer in a single epoch (all at once), we divide it into a number of batches, feed them to the computer batch by batch, and update the weights of the neural network at the end of every step to fit the model to the data.
A limited dataset (split into batches) and an iterative optimization algorithm (such as SGD or AdaGrad) are used in ML to find the best result (the minimum of the loss curve).
The loss function decreases during learning, while the learning-rate parameter in the gradient descent algorithm is reduced so that the steps become smaller.
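As a rough, from-scratch illustration of the batch-by-batch weight update described above (not the actual training code of the experiment), a minimal mini-batch SGD loop for a linear least-squares model in NumPy:

```python
import numpy as np

def minibatch_sgd(X, y, lr=0.01, epochs=4, batch_size=32):
    """Illustrative mini-batch SGD for a linear least-squares model."""
    w = np.zeros(X.shape[1])
    for epoch in range(epochs):                        # one epoch = one pass over all batches
        idx = np.random.permutation(len(X))            # reshuffle the data each epoch
        for start in range(0, len(X), batch_size):
            batch = idx[start:start + batch_size]
            Xb, yb = X[batch], y[batch]
            grad = 2 * Xb.T @ (Xb @ w - yb) / len(Xb)  # gradient of the mean squared error
            w -= lr * grad                             # weight update at the end of each step
    return w
```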

Gradient Descent Algorithm
The experiment on our server
Why do we use more than one epoch?

Epoch (reference: Quora)
Updating the weights with one epoch (a single pass) is not enough. Because we use a limited dataset (in batches) and optimise learning with gradient descent, which is an iterative process, a single epoch leads to underfitting of the curve.
We need to pass the full dataset through the same neural network multiple times. As the number of epochs increases, the weights are updated more often, and the curve goes from underfitting to optimal and then to overfitting.
=> What is the right number of epochs? It has to come from experiments on your own data; one common heuristic is sketched below.
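One common way to let "experiments on your own data" pick the number of epochs is early stopping on a validation split. A hedged Keras sketch, reusing the placeholder model and data from the earlier snippets; the patience value and the epoch upper bound are assumptions:

```python
from tensorflow.keras.callbacks import EarlyStopping

# Stop when the validation loss has not improved for 2 consecutive epochs and
# keep the weights from the best epoch; this chooses the epoch count empirically.
early_stop = EarlyStopping(monitor="val_loss", patience=2, restore_best_weights=True)
history = model.fit(x_train, y_train,
                    epochs=50,               # upper bound; training usually stops earlier
                    batch_size=32,
                    validation_split=0.1,
                    callbacks=[early_stop])
```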

TODO
- All experiments
- RNN-LSTM
- LSTM with attention

Thanks