1
Semantic Similarity Detection
Pankaj Kabra
2
Executive Summary
This project addresses the problem of detecting "semantic similarity" in a pair of texts. Here "semantic similarity" means similar in meaning; the actual wording may or may not be similar. A Siamese neural network is built for this task using TensorFlow and Keras. The data is analyzed before and after modelling, and the findings are presented. I achieved an accuracy of ~82%; according to Quora's official blog post, the maximum achieved accuracy is 87%.
3
Dataset - Quora Question Pairs
Figure: Wordmap of the Quora Question Pairs dataset
The data was obtained from Quora's official competition to detect duplicate questions. Many questions are duplicated many times on Quora; if Quora can find an already existing question and suggest it to the user, it can save a lot of storage space. As seen above in the example from the dataset, the model needs to capture a lot of variation. The problem is similar to face recognition in the image domain: even though almost all the pixel values change, the person is still the same.
4
Siamese Neural Network with Negative Exponential L1 Distance
Siamese networks generally consist of two side networks that share the same weights. A higher-level feature representation is obtained on each side, and the two representations are compared using the negative exponential L1 distance, as described in the figure. I used TensorFlow and Keras to build a simple version of the Siamese network depicted in the figure, and a hybrid stacked version that gave higher accuracy. I programmed the network on my own, with help from various resources on the web, and trained it on a GPU on euler.
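The comparison step can be illustrated in a few lines. This is a minimal sketch in plain Python, not the TensorFlow/Keras code used in the project: the function name `exp_neg_l1_similarity` and the example vectors are made up for illustration, and the vectors simply stand in for the features produced by the two sides of the network.

```python
import math

def exp_neg_l1_similarity(left, right):
    """Return exp(-||left - right||_1), a score in (0, 1]."""
    if len(left) != len(right):
        raise ValueError("feature vectors must have the same length")
    # L1 distance: sum of absolute element-wise differences.
    l1_distance = sum(abs(a - b) for a, b in zip(left, right))
    # The negative exponential maps distance 0 to score 1 and
    # large distances to scores near 0.
    return math.exp(-l1_distance)

# Hypothetical feature vectors standing in for the two encoder outputs.
same = exp_neg_l1_similarity([0.2, -0.5, 1.0], [0.2, -0.5, 1.0])
close = exp_neg_l1_similarity([0.2, -0.5, 1.0], [0.1, -0.4, 0.9])
far = exp_neg_l1_similarity([0.2, -0.5, 1.0], [2.0, 1.5, -1.0])
```

Identical features give a score of exactly 1, and the score falls toward 0 as the features move apart, so it can be read directly as a similarity between 0 and 1.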
5
Results – Accuracy vs Epoch
The graph shows the training and validation accuracy as a function of epoch. I chose the model from iteration 11 because after that point the training accuracy kept increasing while the validation accuracy either decreased slightly or stayed the same, which suggests that the model was overfitting from then on.
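The selection above was made by reading the graph. As a rough sketch of how the same rule could be automated, the function below picks the epoch with the best validation accuracy and stops scanning once the accuracy has failed to improve for a few epochs. The name `select_epoch`, the `patience` parameter, and the accuracy values are assumptions for illustration; they are not the numbers from the graph.

```python
def select_epoch(val_accuracy, patience=3):
    """Return the 1-based epoch with the best validation accuracy,
    stopping the scan once `patience` epochs pass without improvement."""
    best_epoch, best_acc, waited = 0, float("-inf"), 0
    for epoch, acc in enumerate(val_accuracy, start=1):
        if acc > best_acc:
            # New best: remember it and reset the waiting counter.
            best_epoch, best_acc, waited = epoch, acc, 0
        else:
            waited += 1
            if waited >= patience:
                break
    return best_epoch

# Made-up numbers for illustration only; not the results from the slide.
illustrative_val_acc = [0.70, 0.74, 0.78, 0.80, 0.81, 0.805, 0.81, 0.80]
chosen = select_epoch(illustrative_val_acc)
```

Only the validation curve is used here, since a rising training accuracy alone says nothing about how the model behaves on unseen data.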
6
Model Analysis – Testing on Unseen Data
Question 1 | Question 2 | Score
are dogs better than cats? | which is better dog or cat? | .89
how can i overcome fear? | how can i overcome fear of calculus? | .35
do you like eating pizza? | how much pizza do you eat? | .07
why are there so many duplicated questions on quora? | why do people ask similar questions on quora multiple times? | .98
how can i dance? | what can i do to learn daning? | .99
can you earn in youtube | how much do you earn in youtube? | .04
7
Miscellaneous Learnings
Interesting learning: if the two sides of the network do not share parameters, even exactly identical strings do not get high scores.
The model does not handle out-of-vocabulary words well.
Siamese networks can be applied to many more problems, such as finding similar images, or similar music and movies for recommender systems.
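The first learning can be reproduced on a toy scale. The sketch below is an assumption-laden illustration, not the project's model: `make_encoder` builds a hypothetical one-layer tanh encoder with random, untrained weights, and the input vector is made up. With one encoder used on both sides, identical inputs map to identical features and the score is 1; with two independently initialized encoders, the same input lands on different features and the score drops.

```python
import math
import random

def make_encoder(input_dim, output_dim, seed):
    """Build a toy one-layer tanh encoder with random weights."""
    rng = random.Random(seed)
    weights = [[rng.uniform(-1, 1) for _ in range(input_dim)]
               for _ in range(output_dim)]

    def encode(x):
        return [math.tanh(sum(w * v for w, v in zip(row, x)))
                for row in weights]

    return encode

def similarity(a, b):
    """Negative exponential of the L1 distance between two vectors."""
    return math.exp(-sum(abs(p - q) for p, q in zip(a, b)))

x = [0.3, -1.2, 0.8, 0.5]            # made-up input, used on both sides
shared = make_encoder(4, 3, seed=0)  # one set of weights for both sides
other = make_encoder(4, 3, seed=1)   # a second, independent set of weights

shared_score = similarity(shared(x), shared(x))   # same weights on both sides
unshared_score = similarity(shared(x), other(x))  # different weights per side
```

Training could bring two unshared encoders closer together, but nothing forces them to agree on identical inputs, whereas shared weights guarantee it by construction.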