1 NLP Found Helpful (at least for one Text Categorization Task)
Carl Sable, Kathleen McKeown, and Kenneth W. Church

2 Overview
I. Task
– Categorizing images based on their captions
– Predicate-argument relations are important; bag-of-words approaches are not good enough
II. Experiments with Human Subjects
– Verification that NLP is important for people
III. Operational NLP-based System
– Takes advantage of these findings; outperforms standard systems
IV. Conclusions: NLP is important for certain tasks!

3 Sample Image with Caption
"Philippine rescuers carry a fire victim March 19 who perished in a blaze at a Manila disco."

4 Categories From EMNLP-01
Politics, Struggle, Disaster, Crime, Other

5 Categories From EMNLP-01 (cont.)
Politics, Struggle, Disaster, Crime, Other

Category    F1
Politics    89%
Struggle    88%
Disaster    97%
Crime       90%
Other       59%

Disaster subcategories: Affected People, Other, Wreckage, Workers Responding

6 Disaster Image Categories
Affected People, Other, Wreckage, Workers Responding

7 Manual Categorization Tool

8 Lots of Agreement for Categories
Agreement for 248 out of 296 images

Category             Images with agreement
Workers Responding   98 (40%)
Affected People      72 (29%)
Wreckage             55 (22%)
Other                23 (9%)
Total                248 (100%)

9 Performance of Standard Systems Not Very Satisfying

10 Words are Ambiguous: Workers Responding vs. Affected People
Original caption (Workers Responding): "Philippine rescuers carry a fire victim March 19 who perished in a blaze at a Manila disco."
Hypothetical alternative caption (Affected People): "A fire victim who perished in a blaze at a Manila disco is carried by Philippine rescuers March 19."
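To make the ambiguity concrete, here is a minimal Python sketch (not part of the original system) that reduces both captions to a bag of content words. The stopword list is a hypothetical choice made for illustration. The two bags differ only in "carry" vs. "carried", which a stemmer would merge, even though the captions belong to different categories.

```python
import re

# Hypothetical, deliberately tiny stopword list used only for this illustration.
STOPWORDS = {"a", "at", "by", "in", "is", "who"}


def bag_of_words(caption: str) -> set[str]:
    """Lowercased content words of a caption; word order and syntax are discarded."""
    tokens = re.findall(r"[a-z0-9]+", caption.lower())
    return {t for t in tokens if t not in STOPWORDS}


active = ("Philippine rescuers carry a fire victim March 19 "
          "who perished in a blaze at a Manila disco.")
passive = ("A fire victim who perished in a blaze at a Manila disco "
           "is carried by Philippine rescuers March 19.")

a, p = bag_of_words(active), bag_of_words(passive)
print(sorted(a ^ p))  # ['carried', 'carry']  (the only words that differ)
print(len(a & p))     # 10 shared content words
```

The subject of the main clause ("rescuers" vs. "victim") is exactly the information the bag throws away.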

11 Summary of Observations About the Task
– Need to distinguish foreground from background and determine the focus of the image
– Not all words are important; some are misleading
  – Problematic for bag-of-words approaches
– Hypothesis: subject and verb are useful clues
  – Need linguistic analysis to determine predicate-argument relationships
Example caption: "Philippine rescuers carry a fire victim March 19 who perished in a blaze at a Manila disco."

12 Hypothesis: Subject and Verb are Useful Clues

Subject      Verb      Category             Guessable?
Truck        makes     Wreckage             No
couple       mourn     Affected People      Yes
blocks       suffered  Wreckage             Yes
NAME         gather    Affected People      No
child        sleeps    Affected People      Yes
inspectors   search    Workers Responding   Yes
NAME         observes  Workers Responding   No
workers      confer    Workers Responding   Yes
child        covers    Affected People      Yes
chimney      stands    Wreckage             Yes

14 Experiments with Human Subjects: 4 Conditions
Test hypothesis: subject and verb are useful clues
– SENT: first sentence of the caption
  "Philippine rescuers carry a fire victim March 19 who perished in a blaze at a Manila disco."
– RAND: all words from the first sentence, in random order
  "At perished disco who Manila a a in 19 carry Philippine blaze victim a rescuers March fire"
– IDF: top two TF*IDF words
  "disco rescuers"
– S-V: subject and verb
  subject = "rescuers", verb = "carry"

15 Experiments with Human Subjects: Results
Hypothesis: Subject and Verb are Useful Clues
– More words are better than fewer words: SENT, RAND > S-V, IDF
– Syntax is important: SENT > RAND; S-V > IDF

16 RAND is Very Slow!
Perhaps human subjects unscrambled the words, regaining syntactic information.

Condition   Average time (seconds)
RAND        68
SENT        34
IDF         22
S-V         20

17 Using Just Two Words (S-V) Almost as Good as All the Words (Bag of Words)

19 Operational NLP-based System
Extract subjects and verbs from all documents in the training set.
For each test document:
– Extract the subject and verb
– Compare them to those from the training set using some method of word-to-word similarity
– Based on the similarities, generate a score for every category
Subject/verb extraction: Sentence → POS tagger → CASS shallow parser → Perl script + WordNet → Output
Subjects: 83.9%   Verbs: 80.6%
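The extraction step can be approximated, very roughly, by a heuristic over POS-tagged tokens. The Python sketch below is HYPOTHETICAL and does not reproduce the CASS-based pipeline: it assumes input already tagged with Penn Treebank tags (hand-tagged here), takes the first VB* token as the verb and the last NN* token before it as the subject, and maps proper-noun subjects to the placeholder NAME, an assumption suggested by the NAME rows in the slide 12 table.

```python
def extract_subject_verb(tagged):
    """Return (subject, verb) from a list of (word, Penn Treebank tag) pairs.

    HYPOTHETICAL heuristic, not the POS tagger + CASS pipeline:
    the verb is the first VB* token, the subject is the last NN* token
    before it, and proper nouns (NNP/NNPS) become the placeholder "NAME".
    """
    subject = None
    for word, tag in tagged:
        if tag.startswith("VB"):
            return subject, word
        if tag.startswith("NN"):
            subject = "NAME" if tag.startswith("NNP") else word.lower()
    return subject, None


# Hand-tagged caption (tags are an assumption; no tagger is run here).
caption = [("Philippine", "NNP"), ("rescuers", "NNS"), ("carry", "VBP"),
           ("a", "DT"), ("fire", "NN"), ("victim", "NN"), ("March", "NNP"),
           ("19", "CD"), ("who", "WP"), ("perished", "VBD"), ("in", "IN"),
           ("a", "DT"), ("blaze", "NN"), ("at", "IN"), ("a", "DT"),
           ("Manila", "NNP"), ("disco", "NN"), (".", ".")]

print(extract_subject_verb(caption))  # ('rescuers', 'carry')
```

A heuristic this naive breaks on the passive alternative caption: in "A fire victim who perished ...", the first verb is "perished" inside the relative clause, so it returns ("victim", "perished") rather than the main verb "carried". Handling such cases is what real syntactic analysis is for.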

20 Choosing a Category
For a given test document d, calculate a total score for every category c:
Choose the category with the highest score.
– If the subject is NAME, this is a bit more complicated.
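The exact scoring formula is not given here, so the following Python sketch assumes a simple additive score: for each training document t in category c, add sim(subject of d, subject of t) + sim(verb of d, verb of t), then pick the highest-scoring category. The additive form, the exact-match placeholder similarity, and the function names are all hypothetical; the special handling of NAME subjects is not modeled. The training triples are the rows of the slide 12 table, reused purely as toy data.

```python
from collections import defaultdict
from typing import Callable

# A word-to-word similarity function: two words in, a score out.
Similarity = Callable[[str, str], float]


def exact_match(w1: str, w2: str) -> float:
    """Placeholder similarity: 1.0 for identical words, else 0.0."""
    return 1.0 if w1 == w2 else 0.0


def choose_category(subject, verb, training, sim: Similarity = exact_match):
    """Score every category and return (best_category, scores).

    ASSUMPTION: score(c) = sum over training docs t in c of
    sim(subject, subject_t) + sim(verb, verb_t). NAME is not special-cased.
    """
    scores = defaultdict(float)
    for t_subject, t_verb, category in training:
        scores[category] += sim(subject, t_subject) + sim(verb, t_verb)
    best = max(scores, key=scores.get)
    return best, dict(scores)


# Subject/verb/category triples from the slide 12 table, used as toy training data.
TRAINING = [
    ("Truck", "makes", "Wreckage"),
    ("couple", "mourn", "Affected People"),
    ("blocks", "suffered", "Wreckage"),
    ("NAME", "gather", "Affected People"),
    ("child", "sleeps", "Affected People"),
    ("inspectors", "search", "Workers Responding"),
    ("NAME", "observes", "Workers Responding"),
    ("workers", "confer", "Workers Responding"),
    ("child", "covers", "Affected People"),
    ("chimney", "stands", "Wreckage"),
]

best, scores = choose_category("workers", "search", TRAINING)
print(best)  # Workers Responding
```

With exact matching, unseen words contribute nothing; swapping in a graded word-to-word similarity is what lets related but non-identical words ("rescuers" vs. "workers") contribute to the score.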

21 Word Similarity
Examine a large "extended corpus" to generate many subject/verb pairs.
Use these pairs to compute similarities:
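The similarity formula is not given here. Since the related-work slide cites the Jaccard and Dice coefficients, this Python sketch assumes one plausible use of them: two subjects are similar if they take similar sets of verbs in the extended corpus, and two verbs are similar if they take similar sets of subjects. The subject/verb pairs below are invented for illustration, and the helper names are hypothetical.

```python
from collections import defaultdict


def context_sets(pairs):
    """Map each subject to the verbs it takes and each verb to its subjects."""
    verbs_of = defaultdict(set)
    subjects_of = defaultdict(set)
    for subj, verb in pairs:
        verbs_of[subj].add(verb)
        subjects_of[verb].add(subj)
    return verbs_of, subjects_of


def jaccard(a, b):
    """|A ∩ B| / |A ∪ B|, or 0.0 if both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def dice(a, b):
    """2|A ∩ B| / (|A| + |B|), or 0.0 if both sets are empty."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


def subject_similarity(s1, s2, verbs_of, measure=jaccard):
    """Similarity of two subjects via the verbs they co-occur with (ASSUMPTION)."""
    return measure(verbs_of.get(s1, set()), verbs_of.get(s2, set()))


# HYPOTHETICAL subject/verb pairs standing in for the extended corpus.
pairs = [
    ("rescuers", "carry"), ("rescuers", "search"),
    ("workers", "carry"), ("workers", "search"), ("workers", "clear"),
    ("villagers", "mourn"), ("villagers", "flee"),
]
verbs_of, subjects_of = context_sets(pairs)

print(subject_similarity("rescuers", "workers", verbs_of))        # 0.6666666666666666
print(subject_similarity("rescuers", "workers", verbs_of, dice))  # 0.8
print(jaccard(subjects_of["carry"], subjects_of["search"]))       # 1.0
```

Dice weights the overlap more generously than Jaccard (0.8 vs. about 0.67 here); either could serve as the word-to-word similarity when comparing a test document's subject and verb with those in the training set.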

22 NLP-based System Outperforms Others
The Right Two Words Beat All the Words
NLP Found Helpful for at least one Text Categorization Task!

24 Conclusions
NLP is important for our task!
– Not all words are important; some are misleading
– Need to distinguish foreground from background and determine the focus of the image
– Subject and verb: clues for focus
Verified in two ways:
– Experiments with human subjects
– Operational NLP-based system outperforms others
Example caption: "Philippine rescuers carry a fire victim March 19 who perished in a blaze at a Manila disco."

25 Related Work
– NLP and IR (Strzalkowski 1998, 1999)
– NLP and retrieval of images (Smeaton and Quigley 1996; Elworthy 2000)
– NLP and text categorization (Riloff and Lorenzen 1999)
– Word similarity
  – Using WordNet (Sussna 1993; Resnik 1999; Richardson et al. 1994)
  – Jaccard coefficient and Dice coefficient (Radecki 1982; van Rijsbergen 1979; Smadja et al. 1996)
