Term Informativeness for Named Entity Detection
Jason D. M. Rennie (MIT), Tommi Jaakkola (MIT)


Information Extraction
President Bush signed the Central America Free Trade Agreement into law Tuesday…
Who, What, When

Named Entity Detection
President Bush signed the Central America Free Trade Agreement into law Tuesday, hailing the seven-nation pact as an open-door policy that will benefit U.S. exporters and seed prosperity and democracy in Central America and the Dominican Republic.

Informal Communication
Other Sources of Information
–E-mail
–Web Bulletin Boards
–Mailing Lists
More specialized, up-to-date information
But harder to extract

IE for Informal Comm. SUBJECT: Two New Ipswich Seafood Joints to Open Soon. ALL HOUNDS ON DECK! #1 Across from the new HS, at the old White Cap Seafood is a renovated new joint and the sign says "Salt Box". I suspect they are opening soon; they look ready. Lets hope its great as there is too much 'just average' around here. #2: In the…

NED for Informal Comm. Subject: finale harvard square has anyone been to the recently opened finale in harvard square?

Restaurant Bulletin Board
Gathered from a Restaurant BBoard
–6 sets of ~100 posts
–132 threads
–Applied Ratnaparkhi's POS tagger
–Hand-labeled each token In/Out of restaurant name

Detecting Named Entities
[Diagram relating the labels: Named Entity, Informative, Bursty]

Quantifying Informativeness
[Figure: the example terms “the”, “clandestine”, and “Brazil” across Document 1, Document 2, Document 3]

A Little History…
–Z-measure [Brookes, 1968]
–Inverse Doc. Freq. [Jones, 1973]
–x^I [Bookstein & Swanson, 1974]
–Residual IDF [Church & Gale, 1995]
–Gain [Papineni, 2001]

Main Idea
Informative words are:
–Rare (IDF)
–Modal (Mixture Score)
Rarity and Modality are independent qualities
We quantify informativeness using a product of IDF and Mixture Score
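
A minimal sketch of the product score, under stated assumptions: IDF is taken as the negative natural log of the fraction of documents containing the term (the slides do not give the log base), and the Mixture Score is passed in as a number computed elsewhere. The function names `idf` and `informativeness` are hypothetical.

```python
import math


def idf(doc_freq: int, num_docs: int) -> float:
    """Inverse document frequency: -log of the fraction of documents with the term."""
    if not 0 < doc_freq <= num_docs:
        raise ValueError("doc_freq must be between 1 and num_docs")
    return -math.log(doc_freq / num_docs)


def informativeness(doc_freq: int, num_docs: int, mixture_score: float) -> float:
    """Product of rarity (IDF) and modality (Mixture Score)."""
    return idf(doc_freq, num_docs) * mixture_score
```

A term that appears in every document gets IDF 0, so its product score is 0 no matter how modal it is.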

Binomial Distribution

Term Frequency Distributions
[Figure: term frequency distributions for “the” and “Brazil”]

Mixture Models
[Figure: mixture model illustration; surviving labels: 0.1%, 5%, 10%, 90%]

Modality
Modal words fit a mixture much better than a single binomial
We separately fit the binomial and mixture models to each term frequency distribution
We quantify modality by comparing the fitness of the two models

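Learning Mixture Parameters
Use Gradient Descent to learn the mixture parameters: the mixing weight and the two component parameters (subscripts 1 and 2 on the slide; the symbols were lost in extraction)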

Comparing Fitness Use log-odds to compare fitness of the two models
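
The fit-and-compare procedure can be sketched as follows. This is an illustration, not the authors' code: the slides fit the mixture with gradient descent, whereas this sketch swaps in EM because it is shorter and needs no step size. The score is read here as the log-likelihood of a two-component binomial mixture minus that of a single binomial; the names `mixture_score`, `counts`, and `lengths` are hypothetical.

```python
import math


def _clip(p, eps=1e-9):
    # Keep probabilities strictly inside (0, 1) so logs stay finite.
    return min(max(p, eps), 1.0 - eps)


def _log_binom(h, n, theta):
    # Log-probability of h occurrences in a document of n tokens.
    log_choose = math.lgamma(n + 1) - math.lgamma(h + 1) - math.lgamma(n - h + 1)
    return log_choose + h * math.log(theta) + (n - h) * math.log(1.0 - theta)


def _log_mix(h, n, lam, t1, t2):
    # Log-probability under the mixture, plus responsibility of component 1.
    a = math.log(lam) + _log_binom(h, n, t1)
    b = math.log(1.0 - lam) + _log_binom(h, n, t2)
    m = max(a, b)
    ea, eb = math.exp(a - m), math.exp(b - m)
    return m + math.log(ea + eb), ea / (ea + eb)


def mixture_score(counts, lengths, iters=200):
    """Log-likelihood of a two-binomial mixture minus that of one binomial."""
    docs = list(zip(counts, lengths))
    base = _clip(sum(counts) / sum(lengths))
    single = sum(_log_binom(h, n, base) for h, n in docs)

    # Assumed initialisation: one low-rate and one high-rate component.
    lam, t1, t2 = 0.5, _clip(base * 0.1), _clip(base * 3.0)
    for _ in range(iters):
        resp = [_log_mix(h, n, lam, t1, t2)[1] for h, n in docs]  # E-step
        lam = _clip(sum(resp) / len(resp))                        # M-step
        t1 = _clip(sum(r * h for r, (h, _) in zip(resp, docs))
                   / max(sum(r * n for r, (_, n) in zip(resp, docs)), 1e-12))
        t2 = _clip(sum((1 - r) * h for r, (h, _) in zip(resp, docs))
                   / max(sum((1 - r) * n for r, (_, n) in zip(resp, docs)), 1e-12))

    mixed = sum(_log_mix(h, n, lam, t1, t2)[0] for h, n in docs)
    return mixed - single
```

For example, a term that is absent from most documents but frequent in a few, such as `mixture_score([0] * 8 + [10, 12], [100] * 10)`, scores well above zero, while a term with the same count in every document scores approximately zero.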

Top Mixture Score Words
Token     Score    Rest. Occur.
sichaun   99.62    31/52
fish      50.59    7/73
was       48.79    0/483
speed     44.69    16/19
tacos     43.77    4/19

Independence
Are Rareness (IDF) and Modality (Mixture Score) independent?

Correlation Coefficient
Score Pair      Corr. Coefficient
IDF/Mixture     -.0139
IDF/RIDF        .4113
Mixture/RIDF    .7380
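
The slide does not say which correlation coefficient was used. A minimal sketch, assuming the Pearson coefficient computed over per-term score pairs (the function name `pearson` is hypothetical):

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant list")
    return cov / math.sqrt(var_x * var_y)
```

Here `xs` would hold one score (say IDF) for every term and `ys` the other score (say Mixture Score) for the same terms in the same order.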

Top Words Overlap Plot
Two sorted lists
–Sorted by IDF
–Sorted by Mixture Score
Look at % overlap among top N in both lists
Plot % overlap as we vary N
Independent scores would produce line along diagonal
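
The overlap computation can be sketched as below, assuming each score is available as a dict from token to value over the same vocabulary. The function name `overlap_curve` is hypothetical, and ties are broken by whatever order the sort leaves them in.

```python
def overlap_curve(scores_a, scores_b):
    """Percent overlap between the top-N tokens of two rankings, for N = 1..V."""
    if set(scores_a) != set(scores_b):
        raise ValueError("both score dicts must cover the same tokens")
    ranked_a = sorted(scores_a, key=scores_a.get, reverse=True)
    ranked_b = sorted(scores_b, key=scores_b.get, reverse=True)

    curve = []
    top_a, top_b = set(), set()
    for n in range(1, len(ranked_a) + 1):
        # Grow both top-N sets by one token and measure what they share.
        top_a.add(ranked_a[n - 1])
        top_b.add(ranked_b[n - 1])
        curve.append(100.0 * len(top_a & top_b) / n)
    return curve
```

With a vocabulary of V tokens and two unrelated rankings, the expected overlap at N is about N/V, which is one way to read the diagonal mentioned on the slide.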

Overlap Plot
[Figure: percent overlap versus number of top words, for IDF/Mixture and IDF/RIDF]

Top IDF*Mixture Words
Token     Score     Rest. Occur.
sichaun   379.97    31/52
villa     197.08    10/11
tokyo     191.72    7/11
ribs      181.57    0/13
speed     156.23    16/19

Intro to NED Experiments
Task: Identify Restaurant Names
Use standard NED features (capitalization, punctuation, POS) as “Baseline”
Add informativeness score as an additional feature
Use F1 Breakeven as performance metric
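
The slides do not say how the F1 Breakeven was computed. One common reading, sketched here as an assumption: rank tokens by classifier score and cut the list at the number of true positives, where precision, recall, and F1 all coincide. The function name `f1_breakeven` is hypothetical, and ties in score are broken arbitrarily.

```python
def f1_breakeven(scores, labels):
    """Precision (= recall = F1) when predicting as many positives as truly exist."""
    num_pos = sum(labels)  # labels are 1 (in a restaurant name) or 0 (out)
    if num_pos == 0:
        raise ValueError("need at least one positive label")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    true_pos = sum(label for _, label in ranked[:num_pos])
    return true_pos / num_pos
```

For instance, `f1_breakeven([0.9, 0.8, 0.7, 0.1], [1, 0, 1, 0])` cuts after the top two tokens, of which one is a true positive.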

NED Experiments
Feature Set     F1 Breakeven
Baseline        55.0%
IDF             56.0%
Mixture         56.0%
IDF,Mixture     56.9%
Residual IDF    57.4%
IDF*RIDF        58.5%
IDF*Mixture     59.3%
(better toward the bottom)

Summary
Traditional syntax-based features are not enough for IE in e-mail & bulletin boards
We used term occurrence statistics to construct an informativeness score (IDF*Mixture)
We found IDF*Mixture to be useful for identifying topic-centric words and named entities

Discussion
–Phrases
–Foreign languages, Speech
–Co-reference resolution, context tracking
–Collaborative filtering
