INITIAL GOAL: Detecting personality based on interaction with Alexa


INITIAL GOAL: Detecting personality based on interaction with Alexa
Krithika Ganesh, CS2310 Multimedia Software Engineering

Detecting personality based on interaction with Alexa
Detect the user's personality by passing their voice commands to a machine learning model that predicts an emotional and interest quotient for each voice command. At the end of the week, the system gives an analysis of the user's interest and emotional quotient.
Objectives
- Predict an emotion and interest quotient for each voice command
- At the end of the week, give an overall analysis of the person's mood and interest
- Extension of Exercise 4

Machine Learning Training Component: Building the Model
- Input: <sentence, emotion> pairs
- Data scrubbing → clean data
- Vectorization → word vectors
- Embedding layer → sentence vectors
- LSTM → predicted vectors
- Dense layer → changes the vector to the output dimension
- Softmax layer → probability distribution vector
- Max of the probability distribution vector → predicted emotion
- Loss function: categorical cross-entropy
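The layer stack above can be sketched in Keras (the library listed in the references). A minimal sketch; the vocabulary size, embedding width, sequence length, and number of emotion classes below are illustrative assumptions, not the project's actual values.

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense

VOCAB_SIZE = 5000  # assumption: size of the scrubbed vocabulary
MAX_LEN = 30       # assumption: padded sentence length in tokens
N_EMOTIONS = 13    # assumption: number of emotion classes

model = Sequential([
    Embedding(VOCAB_SIZE, 64),                # word index -> 64-dim word vector
    LSTM(128),                                # word-vector sequence -> sentence vector
    Dense(N_EMOTIONS, activation="softmax"),  # probability distribution over emotions
])
# Categorical cross-entropy, matching the loss named on the slide.
model.compile(loss="categorical_crossentropy", optimizer="adam")

# The predicted emotion is the argmax of the softmax output:
probs = model.predict(np.zeros((1, MAX_LEN)))
predicted_class = int(np.argmax(probs))
```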

Challenges and Solutions
Challenge 1: With the Amazon Echo Dot there was no way to capture the voice commands for analysis. Solution: switch to the Google Speech Recognizer, which can save the voice commands for analysis.
Challenge 2: My LSTM model's predictions were slow and not accurate enough; it needed better training data and could not capture frequency or loudness. Solution: use a state-of-the-art AI emotion predictor, which gives excellent predictions.
Component-based SE goal: focus on the components working together rather than on the algorithm itself.

PLAN B: Detecting personality based on interactions with Google Speech Recognizer
Krithika Ganesh, CS2310 Multimedia Software Engineering

Architecture
- Super component: machine learning trained component + PRJ Remote component
- Flask server
- Input processor component
- SIS Server
- Uploader component
- UI component

System Design
Voice commands → Input processor (Google Speech Recognizer) → Trained model (super component) predicts emotion → Uploader shares the personality results. PRJ Remote supplies the test voice samples.
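The flow above can be sketched as plain Python functions chained in the same order. The function bodies are hypothetical stand-ins for the real components, included only to show the wiring.

```python
def input_processor(voice_command: str) -> str:
    """Stand-in for the Google Speech Recognizer: voice -> text."""
    return voice_command.lower().strip()

def trained_model(text: str) -> str:
    """Stand-in for the emotion predictor (real component is Beyond Verbal)."""
    return "Love and Happiness" if "thanks" in text else "Self-control and practicality"

def uploader(emotion: str) -> str:
    """Stand-in for sharing the emotion result."""
    return f"Shared result: {emotion}"

def super_component(voice_command: str) -> str:
    # Components chained in the order shown in the system design.
    return uploader(trained_model(input_processor(voice_command)))
```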

Input Processor
Voice commands are captured by the laptop microphone, converted to the required format (WAV PCM, 8 kHz, 16-bit, mono), and sent to the Google Speech Recognizer, which returns text. The Google Cloud Speech API enables developers to convert audio to text by applying powerful neural network models in an easy-to-use API. The converted file is saved locally to be analyzed.
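The file-conversion step can be sketched with ffmpeg (listed in the references). The function below only builds the command line; the file names in the usage comment are illustrative.

```python
import subprocess

def to_pcm_wav_cmd(src_path: str, dst_path: str) -> list:
    """Build the ffmpeg command that converts an audio file to the
    format the recognizer expects: WAV PCM, 8 kHz, 16-bit, mono."""
    return [
        "ffmpeg", "-y",           # overwrite the destination if it exists
        "-i", src_path,
        "-ar", "8000",            # 8 kHz sample rate
        "-ac", "1",               # one channel (mono)
        "-acodec", "pcm_s16le",   # 16-bit signed PCM
        dst_path,
    ]

# Example (requires ffmpeg on PATH):
# subprocess.run(to_pcm_wav_cmd("command.webm", "command.wav"), check=True)
```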

Super Component: Trained ML Component
The Emotions Analytics engine (Beyond Verbal) measures the speaker's current mood from a raw voice sample. It requires at least 13 seconds of continuous voice to render an emotional analysis. It returns both attitudes and emotion:
- Temper value: measures aggressiveness
- Valence value: positive / negative / neutral
- Arousal value: measures degree of energy
But the attitude values are numbers that a layman cannot understand, so for this project I decided to extract only the emotion data.
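If the attitude numbers were kept, turning them into readable labels could look like the sketch below. The 0–100 scale and the thresholds are illustrative assumptions, not Beyond Verbal's documented ranges.

```python
def describe_attitudes(temper: float, valence: float, arousal: float) -> str:
    """Map numeric attitude values to plain-language labels.
    Assumes each value is on a 0-100 scale; thresholds are hypothetical."""
    def level(value, low, mid, high):
        # Split the assumed 0-100 range into three bands.
        if value < 34:
            return low
        if value < 67:
            return mid
        return high

    temper_label = level(temper, "calm", "moderate", "aggressive")
    valence_label = level(valence, "negative", "neutral", "positive")
    arousal_label = level(arousal, "low energy", "medium energy", "high energy")
    return f"{temper_label}, {valence_label}, {arousal_label}"
```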

Emotions: Mood Groups
Mood groups are an indicator of a speaker's emotional state during the analyzed voice section.
- Aggressive / Confrontational: Supremacy and Arrogance; Hostility and Anger
- Self-Control: Self-control and Practicality; Criticism and Cynicism
- Embracive: Leadership and Charisma; Creativeness and Passion; Friendliness and Warmth; Love and Happiness
- Depressive / Gloomy: Loneliness and Unfulfillment; Defensiveness and Anxiety; Sadness and Sorrow

PRJ Remote: Testing Phase
The user can upload any test voice sample and check the results.

Uploader: Share the Emotion Results
Results are mailed to the user via Gmail SMTP (hostname: smtp.gmail.com, port: 587).
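The mailer can be sketched with Python's standard smtplib and email modules, matching the host and port on the slide. The subject line, addresses, and app-password handling are illustrative assumptions.

```python
import smtplib
from email.mime.text import MIMEText

def build_results_mail(sender: str, recipient: str, results: str) -> MIMEText:
    """Wrap the emotion results in a plain-text mail message."""
    msg = MIMEText(results)
    msg["Subject"] = "Your emotion analysis results"  # illustrative subject
    msg["From"] = sender
    msg["To"] = recipient
    return msg

def send_results(msg: MIMEText, password: str) -> None:
    """Send via Gmail SMTP with STARTTLS on port 587, as on the slide."""
    with smtplib.SMTP("smtp.gmail.com", 587) as server:
        server.starttls()
        server.login(msg["From"], password)
        server.send_message(msg)
```

In practice a Gmail account needs an app password (not the account password) for SMTP logins like this.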

UI Component
- Records the user's voice; the Google Speech Recognizer translates it to text and the voice sample is saved
- The user uploads a voice file, can listen to it, analyze the voice sample, and check the emotion results
- Shares the emotion results via Gmail or Facebook

Interaction Among Components and the SIS Server
Start the SIS server, then start the UI component; the interaction then begins.
1. Record voice → the Input Processor runs (record-voice component, file-converter component, Google Speech Recognizer) → the voice-to-text result is displayed.
2. Upload the converted voice file → PRJ Remote runs → the voice file is rendered.
3. Click "start analysis" → the ML trained component runs → the analysis is rendered.
4. Click "share" → the Uploader component runs.

Screenshots of a Scenario

SIS Server Running

Google Speech Recognizer running (record voice here)

PRJ Component Tester running: choose the recorded file and play it

After clicking Start, the results are displayed here (more visual :D)

Share on Facebook / mail the results

Demo link: https://www.youtube.com/watch?v=CzMNhcfvvgE

Future Work
- Improve my LSTM model
- Share results directly on Facebook ("feeling surprised", "feeling blessed", …)
- Maintain a history of previous results by saving them to a database
- Aggregate the emotion results and analyze them

References
- My demo video: https://www.youtube.com/watch?v=CzMNhcfvvgE
- Input processor: https://cloud.google.com/speech/ and https://ffmpeg.org/
- Trained model (Beyond Verbal): http://www.beyondverbal.com/api-quick-integration-guide/ and https://github.com/BeyondVerbal-V3/JavaScript-Samples
- Uploader: http://www.geeksforgeeks.org/send-mail-gmail-account-using-python/
- Training data <sentence, emotion>: https://www.crowdflower.com/
- Training model (LSTM): https://keras.io/

DEMO