ASAP and Deep ASAP: End-to-End Audio Sentiment Analysis Pipelines

ASAP and Deep ASAP: End-to-End Audio Sentiment Analysis Pipelines Master’s Project Defense By: Avery Wells Advisor: Dr. Yi Shang

Current Problem Sentiment analysis – identifying emotions or thoughts towards something. Audio sentiment analysis has recently become more popular. Researchers in the field must re-implement experimental steps that are common across problem domains. Partners in the Business School have data from a company that they cannot analyze.

Solution and Contributions ASAP (Audio Sentiment Analysis Pipeline): an end-to-end pipeline that automates the sentiment analysis process. Can use a variety of classical machine learning methods for sentiment classification. Easily extensible to various use cases. Deep ASAP: same features as ASAP but uses a deep recurrent neural network for classification.

Motivations Saves researchers time – they can focus on their models. The pipelines can be applied to several application areas. Can these methods use acoustic features to accurately predict whether a salesperson will successfully schedule a sales meeting from a cold call?

Roadmap Background and Previous Works ASAP and Deep ASAP Experiments Results Conclusions and Future Works

Background Sentiment analysis – identifying a person’s attitude towards something through computational methods. Examples: “That red car is awesome!” ✅ “The blue car is ugly.” ❌ “I really like you!” ❓❓

Previous Works Audio sentiment analysis used to be done by transcribing audio into text. Lakshmish Kaushik, “Sentiment extraction from natural audio streams,” in ICASSP. IEEE, 2013. Requires accurate transcription and a large database of tagged words. IBM Watson currently uses this approach.

Previous Works In 2011, Morency used acoustic features extracted from audio to classify sentiment. Louis-Philippe Morency. Towards multimodal sentiment analysis: harvesting opinions from the web. (ICMI '11). Used a Hidden Markov Model to classify the sentiment of 47 YouTube videos. 3-class problem; achieved 55% accuracy.

Previous Works Rosas used a similar set of features on a similar YouTube dataset (Spanish videos). Veronica Rosas. Multimodal Sentiment Analysis of Spanish Online Videos. IEEE Intelligent Systems (May 2013). Used a Support Vector Machine with a linear kernel to classify sentiment. 3-class problem; achieved only 46.75% accuracy.

Previous Works More recent approaches use deep neural networks. Chernykh used a deep recurrent neural network (RNN) on the IEMOCAP database, reaching 54% accuracy on a 6-class problem. Chernykh (2017). Emotion Recognition From Speech With Recurrent Neural Networks. Wang used deep convolutional neural networks (CNNs) on the previous YouTube datasets, reaching 56%-61% accuracy on a 2-class problem. Wang (2016). Select-Additive Learning: Improving Cross-individual Generalization.

Roadmap Background and Previous Works ASAP and Deep ASAP (Problem Redefinition; Current Pipelines; ASAP Design and Implementation; Deep ASAP Design and Implementation) Experiments Results Conclusions and Future Works

Problem Redefinition Input: a collection of labeled phone conversations. Output: a sentiment label. There are two speakers per call, whereas the other datasets had one speaker. A call is labeled as positive if a meeting was obtained, negative otherwise.

Current Methodology The current audio sentiment analysis pipeline is: divide the audio signal into utterances; extract features from each utterance; select and train a classification model; test and validate the trained model.

ASAP - Design

ASAP - Input Can handle any number of WAV files. A utility script is provided to convert MP3 to WAV in batch. Each audio file should have a sentiment label. Takes a CSV as input where: the first column is the path to a WAV file; the last column is the label for that WAV file; each row corresponds to a different WAV file. A minimal example is shown below.
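
For illustration, a minimal input CSV might look like this (the paths and labels here are hypothetical):

    calls/call_001.wav,positive
    calls/call_002.wav,negative
    calls/call_003.wav,positive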

ASAP - Preprocessing Trained an HMM with the pyAudioAnalysis tool to detect ringtones in the audio signal. Only audio after the last ringtone is kept; the goal is to eliminate any receptionist speech. The remaining audio signal is split into utterances based on speaker turns, using speaker diarization from the pyAudioAnalysis package. Utterances from the same original audio file are given the same label. All utterances are saved to disk for potential later use.
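
A minimal sketch of the diarization step, assuming pyAudioAnalysis's speaker diarization API (named speakerDiarization in older releases and speaker_diarization in newer ones; exact return values vary by version):

    from pyAudioAnalysis import audioSegmentation as aS

    # Assign each short-term frame of the call to one of two speakers;
    # the result is one cluster label per frame.
    flags = aS.speaker_diarization("call_001.wav", 2)

    # Consecutive frames with the same label form one utterance, so cutting
    # the signal wherever the label changes yields the utterance boundaries.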

ASAP - Feature Extraction Features are extracted from each utterance using openSMILE. openSMILE allows for a great amount of versatility: one can choose from a number of predefined feature sets, and features can be summarized over the entire signal or over custom windows throughout the signal. ASAP uses the INTERSPEECH 2009 Emotion Challenge feature set: 384 acoustic features compiled based on past research. The feature file is saved as a CSV for later use.
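
As an illustration, the extraction step could shell out to openSMILE's SMILExtract binary with the INTERSPEECH 2009 configuration; the config path and file names below are assumptions and vary across openSMILE releases:

    import subprocess

    # Extract the IS09 Emotion Challenge feature set for one utterance.
    # IS09_emotion.conf ships with openSMILE; its location varies by release,
    # and the output format (ARFF vs. CSV) depends on the configuration.
    subprocess.run([
        "SMILExtract",
        "-C", "config/IS09_emotion.conf",      # feature-set definition
        "-I", "utterances/call_001_u03.wav",   # input utterance
        "-O", "features/call_001_u03.csv",     # output feature row
    ], check=True)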

ASAP - Model Training Currently, two machine learning methods can be chosen: an HMM implemented based on the methods described in Morency, and the Random Forest classifier from Python’s scikit-learn. All hyperparameters for these methods can be specified by the user. Any classification method found in scikit-learn can be easily added: SVM, MLP, KNN, etc.
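
A minimal sketch of the Random Forest option, with random stand-in data in place of the real feature rows (the hyperparameter values here are illustrative, not the thesis settings):

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    # Stand-in for the 384-dimensional IS09 feature rows, one per utterance.
    X_train = np.random.rand(100, 384)
    y_train = np.random.randint(0, 2, 100)   # 0 = negative, 1 = positive

    clf = RandomForestClassifier(n_estimators=100, random_state=42)
    clf.fit(X_train, y_train)                          # train on utterances
    utterance_preds = clf.predict(np.random.rand(10, 384))  # label new ones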

ASAP – Classification/Validation Several accuracy statistics are calculated for each run: classification accuracy of individual utterances; classification accuracy of whole audio files; max and min classification accuracies; standard deviation of classification accuracy. The label of a full file is determined by a majority vote over its utterances. Overall model accuracy is the average validation accuracy.
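
A sketch of the file-level majority vote, assuming per-utterance predictions have already been grouped by source file (the function and variable names are hypothetical):

    from collections import Counter

    def file_label(utterance_preds):
        """Label a whole call by the majority label of its utterances."""
        return Counter(utterance_preds).most_common(1)[0][0]

    # A file whose utterances were mostly judged positive:
    print(file_label(["positive", "positive", "negative"]))  # -> positive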

Deep ASAP - Design

Deep ASAP – Feature Extraction openSMILE is used to extract features from a sliding window over the audio signal, with the same INTERSPEECH feature set as ASAP. The sliding window settings can be changed (default: 5-second windows with no overlap). This produces a sequence of acoustic feature vectors.
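
Illustratively, stacking the per-window feature rows in time order yields the input sequence for the recurrent model (shapes below are assumptions based on the defaults above):

    import numpy as np

    def to_sequence(window_rows):
        """Stack per-window feature rows (in time order) into one
        (num_windows, num_features) sequence."""
        return np.vstack(window_rows)

    # e.g. a 30-second call with 5-second windows and no overlap -> 6 windows:
    seq = to_sequence([np.random.rand(384) for _ in range(6)])
    print(seq.shape)  # (6, 384)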

Deep ASAP – Model Training Trains a basic Bidirectional Long Short-Term Memory network (BLSTM) on the feature sequences. LSTMs have been shown to perform well in the sequence/time-series domain: H. Harutyunyan, a top-10 finisher in a TopCoder audio signal classification competition; Chernykh (2017), Emotion Recognition From Speech With Recurrent Neural Networks. The LSTM was implemented in Lasagne, a Python deep learning library.
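
A minimal Lasagne sketch of a BLSTM of this kind (the layer sizes and 384-feature input are illustrative assumptions, not the exact thesis configuration):

    import lasagne
    from lasagne.layers import InputLayer, LSTMLayer, ConcatLayer, DenseLayer

    # Input: (batch, time steps, features per window)
    l_in = InputLayer(shape=(None, None, 384))
    # Forward and backward passes over time; keep only each final state.
    l_fwd = LSTMLayer(l_in, num_units=128, only_return_final=True)
    l_bwd = LSTMLayer(l_in, num_units=128, backwards=True,
                      only_return_final=True)
    # Concatenate both directions, then classify positive vs. negative.
    l_cat = ConcatLayer([l_fwd, l_bwd], axis=1)
    l_out = DenseLayer(l_cat, num_units=2,
                       nonlinearity=lasagne.nonlinearities.softmax)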

Roadmap Background and Previous Works ASAP and Deep ASAP Experiments (Datasets; Experiment Design) Results Conclusions and Future Works

Experiments - Datasets YouTube dataset from Morency: only videos with positive or negative sentiment were kept; 27 videos with a total of 169 utterances. Penske phone conversations: labeled as positive if a meeting was obtained, negative otherwise; 145 calls with a total of 2943 utterances.

Experiments - Datasets

                      Positive   Negative
    Penske calls          57         88
      Utterances        1444       1499
    YouTube videos        12         15
      Utterances          94         75

Experiments The Penske and YouTube datasets were put through both pipelines. Both models from ASAP were tested, as was the LSTM from Deep ASAP. Training and validation were done in a leave-one-group-out fashion. Input data was shuffled. Hyperparameters for each model were varied. All random seeds were set for reproducibility.
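
A sketch of this split with scikit-learn's LeaveOneGroupOut, where each group is one source audio file so that a file's utterances never appear in both the training and test sets (the data here is an illustrative stand-in):

    import numpy as np
    from sklearn.model_selection import LeaveOneGroupOut

    X = np.random.rand(12, 384)           # feature rows for 12 utterances
    y = np.random.randint(0, 2, 12)       # utterance labels
    groups = np.repeat([0, 1, 2, 3], 3)   # source file of each utterance

    for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups):
        pass  # train on all files but one, validate on the held-out file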

Roadmap Background and Previous Works ASAP and Deep ASAP Experiments Results (YouTube Dataset; Penske Dataset) Conclusions and Future Works

Results - YouTube HMM: 70.37% accuracy. Random Forest: 70.37% accuracy. LSTM: 51.85% accuracy.

Results – YouTube – HMM – 70%

Results – YouTube – RF – 70%

Results – YouTube – LSTM – 51%

Results – Call Center (Penske) HMM: 60.41% accuracy. Random Forest: 61.11% accuracy. LSTM: 62.93% accuracy.

Results – Penske – HMM – 60%

Results – Penske – RF – 61%

Results – Penske – LSTM – 63%

Roadmap Background and Previous Works ASAP and Deep ASAP Experiments Results Conclusions and Future Works

Conclusions Methods from both pipelines were able to produce results in line with recent publications. However, classification accuracies may not tell the whole story. Automation of the pipeline will save researchers time in the future. It can be directly applied to the call center sentiment analysis problem.

Future Works Test different deep learning architectures such as CNNs. Use a combination of models to create an ensemble classifier. More accurate automated segmentation. Try classification with a more balanced dataset.

Thank you!