Improving Upon Semantic Classification of Spoken Diary Entries Using Pragmatic Context Information Daniel J. Rayburn Reeves Curry I. Guinn University of.


1 Improving Upon Semantic Classification of Spoken Diary Entries Using Pragmatic Context Information Daniel J. Rayburn Reeves Curry I. Guinn University of North Carolina Wilmington

2 Overview Introduction Problem definition Hypotheses – Hypothesis 1: Using Context – Hypothesis 2: Using Thresholds Limitations and Future Work

3 EPA Chemical Exposure Study Create models of exposure to various chemicals Activity/Location/Time/Energy expenditure database Requires data

4 Database Necessary data from study: – Date/Time – Location – Activity Activity and location representation: CHAD – Consolidated Human Activity Database – Designed by EPA – Single representation for location and activity

5 Background on Data collection Recall Data Real-time Paper Diaries Direct Observation

6 Digital voice diaries Sony Voice Recorder Subject recorded daily locations/activities 1220 utterances Transcribed and classified

7 Database Sample

Time | Recorded Utterance | CHAD Location | CHAD Activity
8:57 AM | in the bedroom starting housework | 30125 - Bedroom | 11200 - Indoor chores
8:59 AM | carrying clothes to the laundry room | 30128 - Utility room / Laundry room | 11410 - Wash clothes
9:00 AM | the bedroom getting more clothes | 30125 - Bedroom | 11410 - Wash clothes
9:05 AM | loading the washing machine in the laundry room | 30128 - Utility room / Laundry room | 11410 - Wash clothes
9:06 AM | sitting down going to watch twenty minutes of Regis | 30122 - Living room / family room | 17223 - Watch TV
9:23 AM | I'm going to be brushing the dog in the family room | 30122 - Living room / family room | 11800 - Care for pets/animals

8 Problem Definition Difficulties in human encoding: – Error prone – Inefficient – Expensive Computer classification assistance Possible Solution: – statistical language processing to perform text abstraction

9 Solution Strategies – Word-only system Word n-grams at utterance level to identify the most likely semantic categories – Probabilistic relationship between words

10 N-grams Diary entry substrings Word relationships These relationships used in word-only n-gram model Example: “I am walking to the store” – Trigram: “I am walking” – Bigram: “am walking”
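A minimal sketch of extracting word n-grams from a diary entry, using the slide's example sentence. The function name `ngrams` and the lowercasing step are assumptions made for illustration; the slides do not say how the original system tokenized utterances.

```python
def ngrams(utterance, n):
    """Return all contiguous word n-grams of an utterance as tuples."""
    # Assumption: whitespace tokenization and lowercasing are good enough here.
    words = utterance.lower().split()
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


trigrams = ngrams("I am walking to the store", 3)
bigrams = ngrams("I am walking to the store", 2)
```

Here `trigrams` starts with ("i", "am", "walking") and `bigrams` contains ("am", "walking"), matching the two examples on the slide.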

11 Leave one out testing Problems with single data set – Database small size – Single test/training set bias – More data sets with better diversity Leave-one-out testing – 1 test set = 1 day of recordings from 1 subject – 42 training/testing sets in all
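The splitting scheme described above, where each test set is one day of recordings from one subject, could be sketched as follows. The entry fields `subject` and `day` and the toy entries are hypothetical; the slides do not describe the actual data layout.

```python
from collections import defaultdict


def leave_one_day_out(entries):
    """Yield (held_out_key, train, test), one split per (subject, day) pair."""
    by_day = defaultdict(list)
    for entry in entries:
        by_day[(entry["subject"], entry["day"])].append(entry)
    for key in sorted(by_day):
        test = by_day[key]
        # Train on every entry that does not belong to the held-out subject-day.
        train = [e for k, group in by_day.items() if k != key for e in group]
        yield key, train, test


# Hypothetical toy entries: two subjects, three subject-days in total.
toy_entries = [
    {"subject": "s1", "day": 1, "text": "in the bedroom starting housework"},
    {"subject": "s1", "day": 1, "text": "carrying clothes to the laundry room"},
    {"subject": "s1", "day": 2, "text": "loading the washing machine"},
    {"subject": "s2", "day": 1, "text": "brushing the dog in the family room"},
]
splits = list(leave_one_day_out(toy_entries))
```

With the study's data this procedure would produce the 42 training/testing sets mentioned on the slide; the toy data yields three.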

12 Word-only system results Leave-one-out test sets – Location: 65.5% correct – Activity: 55.3% correct

13 Hypothesis 1 Word + context system – Performing statistical NLP text abstraction using multi-diary entry contextual information will improve the disambiguation of human speech diary entries over the word-only n-gram model applied to single diary entries in the word-only study.

14 Reasoning for using context Information a human uses when encoding Relationship between activities and locations – Relationship between current location and current activity – Relationship between current location and previous location

15 Previous context information Past context information helps disambiguate Diary Entry: “in the office at the computer” – Correct Location: Study or Home Office – Previous Location: Living room / family room – Top 3 Location Word-only Choices (w/ probability) 0.904 - Office building/bank/post office 0.217 - Public building/library/museum/theater 0.053 - Public garage / parking

16 6 context relationships Current location given: – Current activity – Previous activity – Previous location Current activity given – Current location – Previous location – Previous activity
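One plausible way to estimate a context relationship such as "current location given previous location" is by relative frequency over consecutive diary entries. This is an assumption for illustration; the slides do not state how the original system estimated these relationships. The location sequence below is taken from the database sample on slide 7, with shortened labels.

```python
from collections import Counter, defaultdict


def conditional_probs(pairs):
    """Estimate P(target | given) from (given, target) pairs by relative frequency."""
    counts = defaultdict(Counter)
    for given, target in pairs:
        counts[given][target] += 1
    return {
        given: {t: c / sum(targets.values()) for t, c in targets.items()}
        for given, targets in counts.items()
    }


# Locations of the six sample entries on slide 7, in recorded order.
locations = ["Bedroom", "Utility room", "Bedroom",
             "Utility room", "Living room", "Living room"]
# Pair each entry's previous location with its current location.
pairs = list(zip(locations, locations[1:]))
p_loc_given_prev = conditional_probs(pairs)
```

The same helper would cover the other five relationships by changing which fields are paired, for example (current activity, current location).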

17 Context incorporation How much do we weight the words in the utterance versus the context information? We assumed a linear combination of weights We applied a brute force search of coefficients to achieve the optimal results
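A minimal sketch of the linear combination and brute-force coefficient search described above. The grid resolution, the constraint that weights sum to 1, the source names `words` and `prev_location`, and all scores in the toy examples are assumptions made for illustration, not values from the study.

```python
from itertools import product


def combine(scores_by_source, weights):
    """Linearly combine per-source score dicts into one score per label."""
    labels = set()
    for scores in scores_by_source.values():
        labels.update(scores)
    return {
        label: sum(weights[src] * scores_by_source[src].get(label, 0.0)
                   for src in scores_by_source)
        for label in labels
    }


def classify(scores_by_source, weights):
    """Pick the label with the highest combined score."""
    combined = combine(scores_by_source, weights)
    return max(combined, key=combined.get)


def weight_grid(sources, steps=10):
    """Yield every weight assignment on a 1/steps grid whose weights sum to 1."""
    for combo in product(range(steps + 1), repeat=len(sources)):
        if sum(combo) == steps:
            yield {src: c / steps for src, c in zip(sources, combo)}


def brute_force_weights(examples, sources, steps=10):
    """Try every grid point and keep the weights with the best accuracy."""
    best_weights, best_accuracy = None, -1.0
    for weights in weight_grid(sources, steps):
        correct = sum(classify(s, weights) == gold for s, gold in examples)
        accuracy = correct / len(examples)
        if accuracy > best_accuracy:
            best_weights, best_accuracy = weights, accuracy
    return best_weights, best_accuracy


# Hypothetical scores: (per-source scores, correct label) for two entries.
examples = [
    ({"words": {"home_office": 0.3, "office_building": 0.9},
      "prev_location": {"home_office": 0.8, "office_building": 0.05}},
     "home_office"),
    ({"words": {"bedroom": 0.7, "living_room": 0.6},
      "prev_location": {"bedroom": 0.45, "living_room": 0.5}},
     "bedroom"),
]
best_weights, best_accuracy = brute_force_weights(
    examples, ["words", "prev_location"])
```

In this toy data neither source alone gets both entries right, but a mixed weighting does, which is the effect the word+context system relies on. Exhaustive search grows quickly with the number of sources and grid steps, so it is practical only for a handful of coefficients.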

18 Average activity results Word-only – 55.3% Word+context – 66.1% % improvement – 19.5% Weights – Word-only: 0.354 – Previous Location: 0.177 – Current Location: 0.201 – Previous Activity: 0.268

19 Average location results Word-only – 65.5% Word+context – 76.0% % improvement – 16.0% Weights – Word-only: 0.294 – Previous Location: 0.146 – Current Activity: 0.286 – Previous Activity: 0.274

20 Hypothesis 2: Thresholds Threshold System: – “Thresholds can be found experimentally in the data to balance trade-offs between precision and recall.” Threshold – A level at which the computer can classify diary entries with a certain level of precision – Level will be computed using precision and recall Guesses – Computer can either classify or not classify – If classifies, considered a guess – Ex: SAT tests

21 Threshold Example Difference of top 2 scores – “going to lay [sic] in bed for 20 to 30 minutes” Correct Location: 30125 - Bedroom Top Score: 30122 - Living room / family room: 0.6448 Second Score: 30125 - Bedroom: 0.6296 Relative Difference: (0.6448 - 0.6296) / 0.6448 ≈ 0.0236
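The relative difference between the top two scores can be computed as in this small sketch. The function name `relative_margin` is hypothetical, and the code assumes at least two candidate labels and a positive top score.

```python
def relative_margin(scores):
    """Return (top - second) / top for a dict mapping labels to scores."""
    # Assumes at least two labels and a top score greater than zero.
    top, second = sorted(scores.values(), reverse=True)[:2]
    return (top - second) / top


# Scores from the slide's example utterance.
margin = relative_margin({
    "30122 - Living room / family room": 0.6448,
    "30125 - Bedroom": 0.6296,
})
```

A small margin like this one signals that the two top candidates are nearly tied, which is exactly the case where the system should consider declining to guess.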

22 Precision & Recall Precision – The accuracy of the computer system on the diary entries it chose to encode Recall – The number of diary entries the computer guessed correctly, relative to the entire data set Relationship between the two – Generally, as precision goes up, recall goes down

23 Example: Precision and Recall Student takes a 10-question test – Guesses at 7 questions – Answers 6 questions right Precision – 86%, 6 out of 7 attempted answers correct Recall – 60%, 6 answers correct out of all 10 questions
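The test-taking example can be checked with a few lines of code. The function name and argument names are chosen for illustration only.

```python
def precision_recall(attempted, correct, total):
    """Precision = correct / attempted; recall = correct / total."""
    precision = correct / attempted if attempted else 0.0
    recall = correct / total if total else 0.0
    return precision, recall


# The student guesses at 7 of 10 questions and gets 6 right.
precision, recall = precision_recall(attempted=7, correct=6, total=10)
```

Here `precision` is 6/7, which rounds to 86%, and `recall` is 6/10, or 60%.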

24 Appropriate threshold levels Done experimentally – Step size of 0.05 Attempt to determine tradeoff between precision and recall Relationship between scores – Difference between top 2 scores
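A sketch of how such a sweep might look, assuming the system guesses only when the relative difference between its top two scores is at least the threshold. That decision rule, the 0-to-1 threshold range, and the toy results are assumptions; the slides give only the step size of 0.05.

```python
def sweep_thresholds(results, step=0.05):
    """Return (threshold, precision, recall) rows for thresholds from 0 to 1.

    `results` is a list of (margin, is_correct) pairs, one per diary entry.
    """
    rows = []
    n_steps = int(round(1.0 / step))
    for i in range(n_steps + 1):
        threshold = i * step
        # The system only guesses when the margin clears the threshold.
        attempted = [ok for margin, ok in results if margin >= threshold]
        correct = sum(attempted)
        # Precision is undefined (None) when nothing is attempted.
        precision = correct / len(attempted) if attempted else None
        recall = correct / len(results)
        rows.append((round(threshold, 2), precision, recall))
    return rows


# Hypothetical (margin, is_correct) pairs for six diary entries.
toy_results = [(0.02, False), (0.12, True), (0.30, True),
               (0.04, True), (0.60, True), (0.07, False)]
rows = sweep_thresholds(toy_results)
```

On the toy data, raising the threshold from 0 to 0.05 to 0.10 lifts precision from 4/6 to 3/4 to 3/3 while recall drops from 4/6 to 3/6, illustrating the trade-off the threshold is meant to balance.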

25 Threshold results

26 Limitations Optimal Classifier – Neural Network and Markov modeling Database – Increased size Context Information – Utilize more information from context

27 Questions?

