Improving Upon Semantic Classification of Spoken Diary Entries Using Pragmatic Context Information Daniel J. Rayburn Reeves Curry I. Guinn University of North Carolina Wilmington

Overview Introduction Problem definition Hypotheses – Hypothesis 1: Using Context – Hypothesis 2: Using Thresholds Limitations and Future Work

EPA Chemical Exposure Study Create models of exposure to various chemicals Activity/Location/Time/Energy expenditure database Requires data

Database Necessary data from study: – Date/Time – Location – Activity Activity and location representation: CHAD – Consolidated Human Activity Database – Designed by EPA – Single representation for location and activity

Background on Data collection Recall Data Real-time Paper Diaries Direct Observation

Digital voice diaries Sony Voice Recorder Subject recorded daily locations/activities 1220 utterances Transcribed and classified

Database Sample
Time Recorded | Utterance | CHAD Location | CHAD Activity
8:57 AM | in the bedroom starting housework | 30125 - Bedroom | 11200 - Indoor chores
8:59 AM | carrying clothes to the laundry room | 30128 - Utility room / Laundry room | 11410 - Wash clothes
9:00 AM | the bedroom getting more clothes | 30125 - Bedroom | 11410 - Wash clothes
9:05 AM | loading the washing machine in the laundry room | 30128 - Utility room / Laundry room | 11410 - Wash clothes
9:06 AM | sitting down going to watch twenty minutes of Regis | 30122 - Living room / family room | 17223 - Watch TV
9:23 AM | I'm going to be brushing the dog in the family room | 30122 - Living room / family room | 11800 - Care for pets/animals

Problem Definition Difficulties in human encoding: – Error prone – Inefficient – Expensive Computer classification assistance Possible Solution: – statistical language processing to perform text abstraction

Solution Strategies – Word-only system Word n-grams at utterance level to identify the most likely semantic categories – Probabilistic relationship between words

N-grams Diary entry substrings Word relationships These relationships are used in the word-only n-gram model Example: “I am walking to the store” – Trigram: “I am walking” – Bigram: “am walking”
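
The n-gram idea on this slide can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function name `word_ngrams`, whitespace tokenization, and lowercasing are all assumptions made here.

```python
def word_ngrams(utterance, n):
    """Return all contiguous word n-grams in an utterance, in order."""
    # Assumption: tokens are whitespace-separated and case is ignored.
    words = utterance.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


trigrams = word_ngrams("I am walking to the store", 3)
bigrams = word_ngrams("I am walking to the store", 2)
```

Here `trigrams` starts with "i am walking" and `bigrams` contains "am walking", matching the slide's example.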

Leave-one-out testing Problems with single data set – Small database size – Single test/training set bias – More data sets with better diversity Leave-one-out testing – 1 test set = 1 day of recordings from 1 subject – 42 training/testing sets in all
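
One way to build such splits is sketched below, assuming each diary entry carries a subject and a day. The field names `subject`, `day`, and `text`, the helper `leave_one_out_splits`, and the toy entries are hypothetical; only the scheme (hold out one subject-day, train on the rest) comes from the slide.

```python
from collections import defaultdict


def leave_one_out_splits(entries):
    """Yield (held_out_key, train, test), one split per (subject, day) group."""
    groups = defaultdict(list)
    for entry in entries:
        groups[(entry["subject"], entry["day"])].append(entry)
    keys = sorted(groups)
    for key in keys:
        test = groups[key]
        # Training data is every entry from every other subject-day.
        train = [e for k in keys if k != key for e in groups[k]]
        yield key, train, test


# Hypothetical toy data: three subject-days, so three splits.
entries = [
    {"subject": "A", "day": 1, "text": "in the bedroom starting housework"},
    {"subject": "A", "day": 1, "text": "carrying clothes to the laundry room"},
    {"subject": "A", "day": 2, "text": "the bedroom getting more clothes"},
    {"subject": "B", "day": 1, "text": "in the office at the computer"},
]
splits = list(leave_one_out_splits(entries))
```

With the study's data, the same grouping would produce the 42 training/testing sets mentioned on the slide.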

Word-only system results Leave-one-out test sets – Location: 65.5% correct – Activity: 55.3% correct

Hypothesis 1 Word + context system – Performing statistical NLP text abstraction using multi-diary entry contextual information will improve the disambiguation of human speech diary entries over the word-only n-gram model applied to single diary entries in the word-only study.

Reasoning for using context Information humans use when encoding Relationship between activities and locations – Relationship between current location and current activity – Relationship between current location and previous location

Previous context information Past context information helps disambiguate Diary Entry: “in the office at the computer” – Correct Location: Study or Home Office – Previous Location: Living room / family room – Top 3 Location Word-only Choices (w/ probability) 0.904 - Office building/bank/post office 0.217 - Public building/library/museum/theater 0.053 - Public garage / parking

6 context relationships Current location given: – Current activity – Previous activity – Previous location Current activity given – Current location – Previous location – Previous activity

Context incorporation How much do we weight the words in the utterance versus the context information? We assumed a linear combination of weights We applied a brute force search of coefficients to achieve the optimal results
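
A rough sketch of a linear combination with a brute-force coefficient search follows. It is an illustration under assumptions, not the authors' implementation: the function names, the grid resolution `steps`, and the constraint that weights sum to 1 are choices made here, and every score in the example except the 0.904 word-only score from the earlier slide is hypothetical.

```python
from itertools import product


def combine(scores_by_source, weights):
    """Weighted linear combination of per-source category scores."""
    categories = set()
    for scores in scores_by_source.values():
        categories.update(scores)
    return {
        c: sum(weights[s] * scores_by_source[s].get(c, 0.0) for s in weights)
        for c in categories
    }


def classify(scores_by_source, weights):
    """Pick the category with the highest combined score."""
    combined = combine(scores_by_source, weights)
    return max(sorted(combined), key=combined.get)


def grid_search(examples, sources, steps=10):
    """Try every weight vector on a grid whose entries sum to 1."""
    best_weights, best_accuracy = None, -1.0
    for ticks in product(range(steps + 1), repeat=len(sources)):
        if sum(ticks) != steps:
            continue
        weights = {s: t / steps for s, t in zip(sources, ticks)}
        correct = sum(classify(ex["scores"], weights) == ex["label"]
                      for ex in examples)
        accuracy = correct / len(examples)
        if accuracy > best_accuracy:
            best_weights, best_accuracy = weights, accuracy
    return best_weights, best_accuracy


# "in the office at the computer"; only 0.904 is taken from the slides.
example = {
    "label": "Study or Home Office",
    "scores": {
        "words": {"Office building/bank/post office": 0.904,
                  "Study or Home Office": 0.600},
        "prev_location": {"Study or Home Office": 0.500,
                          "Office building/bank/post office": 0.010},
    },
}
word_only = classify(example["scores"], {"words": 1.0, "prev_location": 0.0})
with_context = classify(example["scores"], {"words": 0.5, "prev_location": 0.5})
best_weights, best_accuracy = grid_search([example], ["words", "prev_location"])
```

In this toy case the word-only weights pick the office building, while giving the previous location half the weight flips the choice to the home office. The cost of the exhaustive search grows quickly with the number of context sources and the grid resolution.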

Average activity results Word-only – 55.3% Word+context – 66.1% % improvement – 19.5% Weights – Word-only: 0.354 – Previous Location: 0.177 – Current Location: 0.201 – Previous Activity: 0.268

Average location results Word-only – 65.5% Word+context – 76.0% % improvement – 16.0% Weights – Word-only: 0.294 – Previous Location: 0.146 – Current Activity: 0.286 – Previous Activity: 0.274
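
The "% improvement" figures on the activity and location results slides appear to be relative improvements over the word-only baseline. That reading is an inference from the numbers, which it reproduces:

```latex
\[
\text{improvement} = \frac{\text{word+context} - \text{word-only}}{\text{word-only}}
\]
\[
\text{Activity: } \frac{66.1 - 55.3}{55.3} = \frac{10.8}{55.3} \approx 19.5\%
\qquad
\text{Location: } \frac{76.0 - 65.5}{65.5} = \frac{10.5}{65.5} \approx 16.0\%
\]
```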

Hypothesis 2: Thresholds Threshold System: – “Thresholds can be found experimentally in the data to balance trade-offs between precision and recall.” Threshold – A level at which the computer can classify diary entries with a certain level of precision – Level will be computed using precision and recall Guesses – Computer can either classify or not classify – If classifies, considered a guess – Ex: SAT tests

Threshold Example Difference of top 2 scores – “going to lay [sic] in bed for 20 to 30 minutes” Correct Location: 30125 – Bedroom Top Score: 30122 - Living room / family room: 0.6448 Second Score: 30125 – Bedroom: 0.6296 Relative Difference: (0.6448 - 0.6296) / 0.6448 = 0.0235
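
The relative-difference test from this example might look like the sketch below. The function names and the rule "abstain when the relative difference falls below the threshold" are assumptions for illustration; the sketch also assumes at least two candidate scores and a positive top score.

```python
def relative_difference(scores):
    """(top - second) / top for the two highest category scores."""
    top, second = sorted(scores.values(), reverse=True)[:2]
    return (top - second) / top


def classify_or_abstain(scores, threshold):
    """Return the top category, or None when the top two are too close."""
    if relative_difference(scores) < threshold:
        return None  # not confident enough to guess
    return max(scores, key=scores.get)


# Scores from the slide's "going to lay in bed" example.
scores = {
    "30122 - Living room / family room": 0.6448,
    "30125 - Bedroom": 0.6296,
}
diff = relative_difference(scores)
guess = classify_or_abstain(scores, threshold=0.05)
```

With a threshold of 0.05 the system would decline to classify this entry, which is the desired outcome here because the top-scoring location is wrong.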

Precision & Recall Precision – The accuracy of the computer system on the diary entries it chose to encode Recall – The number of diary entries the computer classified correctly, relative to the total number of entries in the data set Relationship between – Generally as precision goes up recall goes down

Example: Precision and Recall Student takes 10-question test – Guesses at 7 questions – Answers 6 questions right Precision – 86%, 6 out of 7 attempted answers correct Recall – 60%, 6 answers correct out of all questions
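
The test example can be checked with a small helper. The name `precision_recall` and its parameters are made up for this sketch; the definitions follow the slides (precision over attempted items, recall over all items).

```python
def precision_recall(attempted, correct, total):
    """Precision = correct / attempted; recall = correct / total."""
    precision = correct / attempted if attempted else 0.0
    recall = correct / total if total else 0.0
    return precision, recall


# Student attempts 7 of 10 questions and gets 6 right.
precision, recall = precision_recall(attempted=7, correct=6, total=10)
```

This gives a precision of about 0.857 (the slide's 86%) and a recall of 0.6.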

Appropriate threshold levels Done experimentally – Step size of 0.05 Attempt to determine tradeoff between precision and recall Relationship between scores – Difference between top 2 scores
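
A sweep of this kind could be sketched as follows, assuming each classified entry is reduced to its top-two relative difference plus whether the top choice was correct. The function name, the sweep range of 0 to 1, and the toy `results` list are hypothetical; only the step size of 0.05 comes from the slide.

```python
def sweep_thresholds(results, step=0.05):
    """results: list of (relative_difference, is_correct) pairs, one per entry.

    Returns (threshold, precision, recall) rows; precision is None when the
    system attempts no entries at that threshold.
    """
    rows = []
    n_steps = int(round(1.0 / step))
    for i in range(n_steps + 1):
        threshold = round(i * step, 10)  # avoid float drift such as 0.30000000000000004
        attempted = [ok for diff, ok in results if diff >= threshold]
        correct = sum(attempted)
        precision = correct / len(attempted) if attempted else None
        recall = correct / len(results)
        rows.append((round(threshold, 2), precision, recall))
    return rows


# Hypothetical outcomes for five diary entries.
results = [(0.02, False), (0.04, True), (0.30, True), (0.60, True), (0.08, False)]
rows = sweep_thresholds(results)
```

On this toy data, raising the threshold from 0.0 to 0.10 lifts precision from 0.6 to 1.0 while recall drops from 0.6 to 0.4, the trade-off the slides describe.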

Threshold results

Limitations Optimal Classifier – Neural Network and Markov modeling Database – Increased size Context Information – Utilize more information from context

Questions?
