
1 The JDPA Sentiment Corpus for the Automotive Domain Miriam Eckert, Lyndsie Clark, Nicolas Nicolov J.D. Power and Associates Jason S. Kessler Indiana University

2 Overview
335 blog posts containing opinions about cars
– 223K tokens of blog data
Goal of annotation project:
– Examples of how words interact to evaluate entities
– Annotations encode these interactions
Entities are invoked physical objects and their properties
– Not just cars, car parts
– People, locations, organizations, times

3 Excerpt from the corpus
“last night was nice. sean bought me caribou and we went to my house to watch the baseball game …”
“… yesturday i helped me mom with brians house and then we went and looked at a kia spectra. it looked nice, but when we got up to it, i wasn't impressed...”

4 Outline
Motivating example
Overview of annotation types
– Some statistics
Potential uses of corpus
Comparison to other resources

5 [Motivating example, shown graphically on the original slide: “John recently purchased a Honda Civic …” The Civic has a great engine, a disappointing stereo, and is grippy; John also considered a BMW which, while highly priced, had a better stereo. Mentions are labeled with semantic types (PERSON, CAR, CAR-PART, CAR-FEATURE) and coreferent mentions are connected by REFERS-TO links.]

6 [The same example with TARGET links from each sentiment expression (great, disappointing, grippy, highly, better) to the mention it evaluates.]

7 [The same example with REFERS-TO coreference links plus PART-OF and FEATURE-OF relations connecting the cars to their parts and features.]

8 [The same example with a comparison annotation: a DIMENSION and MORE/LESS links between the compared entities.]

9 [All annotation layers combined (REFERS-TO, TARGET, PART-OF, FEATURE-OF, DIMENSION, MORE/LESS), yielding entity-level sentiment labels for the two cars: one positive, one mixed.]

10 Outline
Motivating example
Overview of annotation types
– Some statistics
Potential uses of corpus
Comparison to other resources

11 Entity annotations
Example: “John recently purchased a Civic. It had a great engine and was priced well.”
– John: PERSON; Civic and It: CAR (linked by REFERS-TO); engine: CAR-PART; priced: CAR-FEATURE
>20 semantic types from the ACE Entity Mention Detection Task, plus generic automotive types
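A minimal sketch of how this entity-annotation layer could be represented in code; the class and field names here are hypothetical, not the corpus's actual release format:

from dataclasses import dataclass, field

# Hypothetical representation: mentions carry a semantic type, and mentions
# joined by REFERS-TO links form a single entity.
@dataclass
class Mention:
    text: str           # surface string, e.g. "Civic" or "It"
    start: int          # character offsets within the document
    end: int
    semantic_type: str  # e.g. "PERSON", "CAR", "CAR-PART", "CAR-FEATURE"

@dataclass
class Entity:
    """One real-world entity: the set of mentions linked by REFERS-TO."""
    mentions: list[Mention] = field(default_factory=list)

sentence = "John recently purchased a Civic. It had a great engine and was priced well."
civic = Mention("Civic", sentence.index("Civic"), sentence.index("Civic") + 5, "CAR")
it = Mention("It", sentence.index("It"), sentence.index("It") + 2, "CAR")
car = Entity([civic, it])  # "Civic" and "It" refer to the same car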

12 Entity-relation annotations
Relations between entities: the engine (CAR-PART) is PART-OF the Civic (CAR); priced (CAR-FEATURE) is FEATURE-OF the Civic
Entity-level sentiment annotations (e.g. positive)
Sentiment flows between entities through relations:
– My car has a great engine.
– Honda, known for its high standards, made my car.
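A toy sketch, not the authors' algorithm, of how sentiment attached to a part or feature might flow to the containing entity along PART-OF / FEATURE-OF relations; the damping factor and scores are assumptions for illustration:

# Toy propagation: sentiment on a part/feature also contributes, damped, to the
# entity it is PART-OF / FEATURE-OF. Illustrative semantics only.
relations = {            # child mention -> parent entity
    "engine": "my car",  # PART-OF
    "priced": "my car",  # FEATURE-OF
}
expression_sentiment = {"engine": +1.0, "priced": -0.5}  # from sentiment expressions

DAMPING = 0.8  # assumed: sentiment weakens as it flows up a relation

entity_sentiment: dict[str, float] = {}
for mention, score in expression_sentiment.items():
    parent = relations.get(mention)
    if parent is not None:
        entity_sentiment[parent] = entity_sentiment.get(parent, 0.0) + DAMPING * score

print({entity: round(score, 2) for entity, score in entity_sentiment.items()})
# {'my car': 0.4}, i.e. mildly positive overall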

13 Entity annotation type: statistics
61K mentions and 43K entities in the corpus
103 documents annotated by around 3 annotators each
Inter-annotator agreement: 83% among mentions, 68% for REFERS-TO
[Agreement example, shown graphically: two annotators' “…Kia Rio…” annotations count as a MATCH only when the marked spans are identical; differing spans are NOT A MATCH.]
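A small sketch of how an exact-span agreement figure like the one above could be computed between two annotators; the F1-style metric is an assumption, not necessarily the paper's exact procedure:

# Agreement over mentions as F1 between two annotators' (start, end, type)
# triples: only identical spans with identical types count as a match.
def mention_agreement(a1: set[tuple[int, int, str]],
                      a2: set[tuple[int, int, str]]) -> float:
    if not a1 and not a2:
        return 1.0
    overlap = len(a1 & a2)
    precision = overlap / len(a2) if a2 else 0.0
    recall = overlap / len(a1) if a1 else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

annotator1 = {(10, 17, "CAR"), (30, 36, "CAR-PART")}
annotator2 = {(10, 17, "CAR"), (30, 40, "CAR-PART")}  # second span differs: not a match
print(round(mention_agreement(annotator1, annotator2), 2))  # 0.5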

14 Sentiment expressions
Evaluations linked to target mentions
Prior polarity: semantic orientation given the target (positive, negative, neutral, or mixed)
– great, target engine: prior polarity positive
– highly, target priced: prior polarity negative
– highly, target spec’ed (“… a highly spec’ed …”): prior polarity positive
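Since the prior polarity is defined relative to the target, a word-only lexicon is not enough; a tiny hypothetical illustration (keys and labels are made up for this example):

# Prior polarity keyed on (expression, target) rather than the expression alone.
prior_polarity = {
    ("great", "engine"): "positive",
    ("highly", "priced"): "negative",   # a highly priced car is usually bad
    ("highly", "spec'ed"): "positive",  # a highly spec'ed car is usually good
}

def lookup(expression: str, target: str) -> str:
    return prior_polarity.get((expression, target), "neutral")

print(lookup("highly", "priced"))   # negative
print(lookup("highly", "spec'ed"))  # positive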

15 Sentiment expressions
Occurrences in corpus: 10K
13% are multi-word (like no other, get up and go)
49% are headed by adjectives, 22% by nouns (damage, good amount), 20% by verbs (likes, upset), 5% by adverbs (highly)

16 Sentiment expressions
75% of sentiment expression occurrences have non-evaluative uses in the corpus, e.g. “light”:
– …the car seemed too light to be safe…
– …vehicles in the light truck category…
77% of sentiment expression occurrences are positive
Inter-annotator agreement: 75% spans, 66% targets, 95% prior polarity

17 Modifiers -> contextual polarity
NEGATORS: not a good car; not a very good car
INTENSIFIERS: a very good car (upward); a kind of good car (downward)
NEUTRALIZERS: if the car is good; I hope the car is good
COMMITTERS: I am sure the car is good (upward); I suspect the car is good (downward)
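A toy sketch of how these modifier classes could combine with a prior polarity to yield a contextual polarity; the numeric scores and combination rules are assumptions for illustration, not the annotation scheme's official semantics:

# Toy contextual-polarity computation over a signed prior-polarity score.
NEGATORS = {"not"}
INTENSIFIERS = {"very": 1.5, "kind of": 0.5}   # upward / downward
NEUTRALIZERS = {"if", "hope"}
COMMITTERS = {"sure": 1.2, "suspect": 0.6}     # upward / downward

def contextual_polarity(prior: float, modifiers: list[str]) -> float:
    score = prior
    for m in modifiers:
        if m in NEGATORS:
            score = -score             # "not a good car"
        elif m in INTENSIFIERS:
            score *= INTENSIFIERS[m]   # "a very good car" / "a kind of good car"
        elif m in NEUTRALIZERS:
            score = 0.0                # "if the car is good"
        elif m in COMMITTERS:
            score *= COMMITTERS[m]     # "I am sure / I suspect the car is good"
    return score

print(contextual_polarity(+1.0, ["not"]))      # -1.0
print(contextual_polarity(+1.0, ["very"]))     #  1.5
print(contextual_polarity(+1.0, ["if"]))       #  0.0
print(contextual_polarity(+1.0, ["suspect"]))  #  0.6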

18 Other annotations
Speech events (not sourced from the author):
– John thinks the car is good.
Comparisons:
– Car X has a better engine than car Y.
– Handles a variety of cases

19 Outline
Motivating example
Overview of annotation types
– Some statistics
Potential uses of corpus
Comparison to other resources

20 Possible tasks
Detecting mentions, sentiment expressions, and modifiers
Identifying targets of sentiment expressions and modifiers
Coreference resolution
Finding part-of, feature-of, etc. relations
Identifying errors/inconsistencies in the data

21 Possible tasks
Exploring how elements interact:
– Some idiot thinks this is a good car.
Evaluating unsupervised sentiment systems, or systems trained on other domains
How do relations between entities transfer sentiment?
– The car’s paint job is flawless but the safety record is poor.
A solution to one task may be useful in solving another.
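As a toy illustration of the evaluation idea above, a minimal loop comparing a system's entity-level sentiment predictions against gold labels; the entity names, labels, and dict format are hypothetical, not the corpus's release format:

# Hypothetical entity-level evaluation: gold and predicted labels per entity.
gold = {"civic": "positive", "bmw": "mixed", "stereo": "negative"}
predicted = {"civic": "positive", "bmw": "negative", "stereo": "negative"}

correct = sum(predicted.get(entity) == label for entity, label in gold.items())
print(f"entity-level accuracy: {correct / len(gold):.2f}")  # 0.67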

22 But wait, there’s more!
180 digital camera blog posts were annotated
Total: 223,001 + 108,593 = 331,594 tokens

23 Outline
Motivating example
– Elements combine to render entity-level sentiment
Overview of annotation types
– Some statistics
Potential uses of corpus
Comparison to other resources

24 Other resources
MPQA Version 2.0
– Wiebe, Wilson and Cardie (2005)
– Largely professionally written news articles
– Subjective expressions: “beliefs, emotions, sentiments, speculations, etc.”
– Attitude, contextual sentiment on subjective expressions
– Target, source annotations
– 226K tokens (JDPA: 332K)

25 Other resources
Data sets provided by Bing Liu (2004, 2008)
– Customer-written consumer electronics product reviews
– Contextual sentiment toward mentions of the product
– Comparison annotations
– 130K tokens (JDPA: 332K)

26 Thank you!
Obtaining the corpus:
– Research and educational purposes
– ICWSM.JDPA.corpus@gmail.com
– June 2010
– Annotation guidelines: http://www.cs.indiana.edu/~jaskessl
Thanks to: Prof. Michael Gasser, Prof. James Martin, Prof. Martha Palmer, Prof. Michael Mozer, William Headden

27 Top 20 annotations by type

28 Inter-annotator agreement

