INITIAL GOAL: Detecting personality based on interaction with Alexa Krithika Ganesh CS2310 Multimedia Software Engineering
Detecting personality based on interaction with Alexa
Detect the user's personality by passing the user's voice commands to a machine learning model that predicts an emotional and interest quotient for each voice command. At the end of the week, the system gives an analysis of the user's interest and emotional quotients.
Objectives
- Predict an emotion and interest quotient for each voice command
- At the end of the week, give an overall analysis of the person's mood and interest
- Extension of Exercise 4
Machine Learning Training Component: Building the Model
Training data: <sentence, emotion> pairs
Pipeline:
- Data scrubbing -> clean data
- Vectorization -> word vectors and sentence vectors
- Embedding layer
- LSTM -> predicted vectors
- Dense layer: changes the vector to the output dimension
- Softmax layer: probability distribution vector
- Max of the probability distribution vector predicts the emotion
- Loss function: categorical cross-entropy
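The pipeline above can be sketched in Keras (the training library cited in the references). The vocabulary size, layer widths, and six-class output below are illustrative assumptions, not values from the original model.

```python
# Sketch of the slide's pipeline: Embedding -> LSTM -> Dense -> softmax,
# trained with categorical cross-entropy. All sizes are illustrative.
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense

VOCAB_SIZE = 10000   # assumed vocabulary size after data scrubbing
NUM_EMOTIONS = 6     # assumed number of emotion classes

def build_model():
    model = Sequential([
        Embedding(input_dim=VOCAB_SIZE, output_dim=128),   # word vectors
        LSTM(64),                                          # sentence-level representation
        Dense(NUM_EMOTIONS, activation="softmax"),         # probability distribution vector
    ])
    model.compile(optimizer="adam", loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model

model = build_model()
# "Max of the probability distribution vector" = argmax over the softmax output:
probs = model.predict(np.zeros((1, 5), dtype="int32"), verbose=0)
predicted_emotion = int(np.argmax(probs[0]))
```

In training, `model.fit` would receive the vectorized sentences and one-hot emotion labels; the untrained model here only illustrates the layer stack and output shape.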
Challenges and Solutions
Challenge 1: With the Amazon Echo Dot there was no way to capture the voice commands for analysis.
Solution: Google Speech Recognizer, which lets the voice commands be saved for analysis.
Challenge 2: My LSTM model: predictions were slow, not accurate enough, needed better training data, and could not capture frequency or loudness.
Solution: Use a state-of-the-art AI emotion predictor, which gives excellent predictions.
Component-based SE GOAL: Focus on the components working together rather than the algorithm itself!
PLAN B: Detecting personality based on interactions with Google Speech Recognizer
Krithika Ganesh CS2310 Multimedia Software Engineering
Architecture
- PRJ Remote component
- Machine Learning Trained component (super component)
- Flask server
- Input processor component
- SIS Server
- Uploader component
- UI component
System Design
- Voice commands -> Input processor -> Google Speech Recognizer
- Trained Model (super component) -> predicts emotion
- PRJ Remote: test voice samples
- Uploader: share personality results
Input Processor
Voice commands are recorded through the laptop microphone. A voice file converter produces WAV (PCM, 8 kHz, 16-bit, mono), and the converted file is saved locally for later analysis. The Google Cloud Speech API enables developers to convert audio to text by applying powerful neural network models in an easy-to-use API; the Google Speech Recognizer returns the text.
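The converter step can be sketched with ffmpeg (listed in the references) targeting the format named above; the file names and helper functions are hypothetical, and the Google Cloud Speech call is omitted since it needs credentials.

```python
# Build the ffmpeg command that converts a recording to the format the
# slide specifies: WAV, PCM, 8 kHz, 16-bit, mono. File names are examples.
import subprocess

def conversion_command(src, dst):
    return [
        "ffmpeg", "-y",
        "-i", src,
        "-acodec", "pcm_s16le",  # 16-bit PCM
        "-ar", "8000",           # 8 kHz sample rate
        "-ac", "1",              # mono
        dst,
    ]

def convert(src, dst):
    # Runs ffmpeg; the converted file is also the copy saved locally
    # for later analysis, as described above.
    subprocess.run(conversion_command(src, dst), check=True)

cmd = conversion_command("command.webm", "command.wav")
```

Usage would be `convert("command.webm", "command.wav")` before handing the WAV file to the speech recognizer.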
Super Component: Trained ML Component
Input: a raw voice sample. The Emotions Analytics engine (Beyond Verbal) measures the speaker's current mood; it requires at least 13 seconds of continuous voice to render an emotional analysis. It reports attitudes and emotion:
- Temper value: measures aggressiveness
- Valence value: positive / negative / neutral
- Arousal value: measures degree of energy
But the attitude values are numbers that a layperson cannot easily interpret, so for this project I decided to extract only the emotion data.
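Because only the emotion data is kept, the component boils down to a small parser over the analysis response. The payload shape below is a simplified assumption loosely modeled on Beyond Verbal's sample responses, not their exact schema.

```python
# Hypothetical, simplified analysis payload: numeric attitudes (Temper,
# Valence, Arousal) plus a mood-group emotion. Only the emotion is kept.
sample_response = {
    "analysis": {
        "Temper": {"Value": 62.0},          # aggressiveness
        "Valence": {"Group": "positive"},   # positive / negative / neutral
        "Arousal": {"Value": 55.0},         # degree of energy
        "Mood": {"Group7": {"Primary": {"Phrase": "Friendliness and Warmth"}}},
    }
}

def extract_emotion(response):
    # Skip the numeric attitude values (hard for a layperson to read)
    # and return only the human-readable mood phrase.
    return response["analysis"]["Mood"]["Group7"]["Primary"]["Phrase"]

emotion = extract_emotion(sample_response)
```

The numeric attitudes stay available in the response if a later version wants to aggregate them.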
Emotions: Mood Groups
Mood groups are an indicator of a speaker's emotional state during the analyzed voice section.
Aggressive / Confrontational mood groups: Supremacy and Arrogance; Hostility and Anger; Criticism and Cynicism
Self-Control mood group: Self-control and practicality
Embracive mood groups: Leadership and Charisma; Creativeness and Passion; Friendliness and Warmth; Love and Happiness
Depressive / Gloomy mood groups: Loneliness and Unfulfillment; Sadness and Sorrow; Defensiveness and Anxiety
PRJ Remote: Testing Phase
The user can upload any test voice sample and check the results.
Uploader: Share the emotion results Mail results to user Hostname: smtp.gmail.com Port: 587
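The uploader can be sketched with Python's standard smtplib and email modules, matching the host and port above (the referenced GeeksforGeeks recipe uses the same approach); the addresses and credentials below are placeholders.

```python
# Mail the emotion results through Gmail's SMTP server
# (smtp.gmail.com, port 587, STARTTLS). Addresses are placeholders.
import smtplib
from email.mime.text import MIMEText

def build_message(sender, recipient, emotion):
    msg = MIMEText(f"Your emotion analysis result: {emotion}")
    msg["Subject"] = "Your emotion results"
    msg["From"] = sender
    msg["To"] = recipient
    return msg

def send_results(sender, password, recipient, emotion):
    msg = build_message(sender, recipient, emotion)
    with smtplib.SMTP("smtp.gmail.com", 587) as server:
        server.starttls()               # upgrade the connection to TLS
        server.login(sender, password)  # Gmail app password, not account password
        server.send_message(msg)

msg = build_message("me@gmail.com", "user@example.com", "Love and Happiness")
```

Port 587 implies STARTTLS (plain connection upgraded to TLS), which is why the sketch calls `starttls()` before `login()`.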
UI Component
- Records the user's voice; the Google Speech Recognizer translates it to text; the voice sample is saved
- The user uploads a voice file, can listen to it, analyze the voice sample, and check the emotion results
- Shares the emotion results via Gmail or Facebook
Interaction Among Components and the SIS Server
Start the SIS server, then start the UI component; the interaction then begins:
1. Record voice: the INPUT PROCESSOR runs (record-voice component, file-converter component, Google Speech Recognizer); the UI displays the voice-to-text result
2. Upload the converted voice file: PRJ REMOTE runs and renders the voice file
3. Click start analysis: the ML Trained Component runs and the analysis is rendered
4. Click share: the UPLOADER component runs
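The interaction sequence above can be sketched as routes on the Flask server named in the architecture slide; the endpoint names, stand-in outputs, and in-memory state are my assumptions, not the project's actual API.

```python
# Hypothetical Flask endpoints mirroring the interaction sequence:
# record/convert -> analyze -> share. State is kept in memory.
from flask import Flask, jsonify

app = Flask(__name__)
state = {"text": None, "emotion": None}

@app.route("/record", methods=["POST"])
def record():
    # Input processor: record voice, convert the file, run speech recognition.
    state["text"] = "turn on the lights"   # stand-in for recognizer output
    return jsonify(text=state["text"])

@app.route("/analyze", methods=["POST"])
def analyze():
    # ML trained component: predict the emotion for the saved sample.
    state["emotion"] = "Friendliness and Warmth"  # stand-in prediction
    return jsonify(emotion=state["emotion"])

@app.route("/share", methods=["POST"])
def share():
    # Uploader component: mail or post the results.
    return jsonify(shared=True, emotion=state["emotion"])

client = app.test_client()
```

The UI component would drive these endpoints in order, which is what the numbered steps above describe.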
Screenshots of a Scenario
SIS Server Running
Google Speech Recognizer Running
Record voice here
Choose the recorded file Play the recorded file PRJ Component Tester Running
After clicking on start, the results are displayed here. More visual :D
Share on Facebook Mail Results
Demo link: https://www.youtube.com/watch?v=CzMNhcfvvgE
Future Work
- Improve my LSTM model
- Share results directly on Facebook: "feeling surprised", "feeling blessed", …
- Maintain a history of previous results by saving them to a database
- Aggregate the emotion results and analyze them
References
- My demo video: https://www.youtube.com/watch?v=CzMNhcfvvgE
- Input processor: https://cloud.google.com/speech/ and https://ffmpeg.org/
- Trained model (Beyond Verbal): http://www.beyondverbal.com/api-quick-integration-guide/ and https://github.com/BeyondVerbal-V3/JavaScript-Samples
- Uploader: http://www.geeksforgeeks.org/send-mail-gmail-account-using-python/
- Training data <sentence, emotion>: https://www.crowdflower.com/
- Training model (LSTM): https://keras.io/
DEMO