Published by Maud Waters. Modified over 9 years ago.
1
Automatic Detection of Tags for Political Blogs
Khairun-nisa Hassanali and Vasileios Hatzivassiloglou
Human Language Technology Research Institute, The University of Texas at Dallas
June 6, 2010
NAACL-HLT 2010: Computational Linguistics in a World of Social Media
Los Angeles, California
2
06/06/2010 Hassanali and Hatzivassiloglou: Automatic Detection of Tags for Political Blogs
Goal
- Goal: Identify topic tags of political blog posts
  - Tags are single words or groups of words
- Motivation: Build a system that
  - Collates information across blog posts
  - Combines evidence to numerically rate attitudes of blogs on different topics
  - Traces the evolution of attitudes over time
- Tags assigned to a post are collectively the post's topical signature
3
Our Approach
- Train a Support Vector Machine for each possible tag
- Select the five strongest votes
- Investigated several features
  - Single words (baseline)
  - Syntactic groups (noun phrases and proper nouns, detected with shallow parsing)
  - Named Entity Recognition
  - Co-reference Resolution
  - Synonyms (using WordNet)
  - Word position (title versus body)
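The per-tag classification and top-five selection described on this slide can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: it assumes each tag's trained SVM is linear and can be represented as a hypothetical `(weights, bias)` pair, it covers only the single-word and word-position features, and the tags, weights, and example post are invented for demonstration.

```python
from collections import Counter

def extract_features(title, body):
    """Count single words, marking each with its position (title versus body)."""
    feats = Counter()
    for word in title.lower().split():
        feats["title:" + word] += 1
    for word in body.lower().split():
        feats["body:" + word] += 1
    return feats

def decision_score(model, feats):
    """Linear decision value w.x + b for one tag's classifier (assumed linear)."""
    weights, bias = model
    return sum(weights.get(name, 0.0) * count for name, count in feats.items()) + bias

def top_tags(models, feats, k=5):
    """Score the post with every tag's classifier and keep the k strongest votes."""
    scored = [(decision_score(model, feats), tag) for tag, model in models.items()]
    # Highest score first; ties are broken alphabetically so the result is deterministic.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [tag for _, tag in scored[:k]]

# Hypothetical per-tag models: tag -> (feature weights, bias). Values are made up.
models = {
    "iraq": ({"title:iraq": 2.0, "body:iraq": 1.0}, -0.5),
    "elections": ({"body:election": 1.5}, -0.5),
    "congress": ({"body:senate": 1.0, "body:congress": 1.0}, -0.5),
    "economy": ({"body:economy": 1.0}, -0.5),
    "health care": ({"body:health": 1.0}, -0.5),
    "media": ({"body:press": 1.0}, -0.5),
}

feats = extract_features("Iraq vote", "senate election debate on iraq and the economy")
post_tags = top_tags(models, feats)
print(post_tags)
```

Ranking by decision value rather than thresholding at zero means every post receives exactly five tags, even when some classifiers vote weakly or negatively; whether the original system also applied a threshold is not stated on the slide.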
4
Data
- Collected data from two major political blogs
  - Daily Kos (100,000 blog posts)
  - Red State (70,000 blog posts)
- 787,780 tags across both blogs
- Covers the period 2003-2010 for Daily Kos and 2007-2010 for Red State
5
Results
- Baseline precision/recall (Single Words): 25.84%/54.97%
  - +Stemming precision/recall: -0.46%/-0.62%
  - +Proper Nouns precision/recall: +12.84%/+1.95%
  - +Named Entities precision/recall: +12.23%/-8.53%
- All features
  - Automated Scoring precision/recall: 20.95%/65.123%
  - Manual Scoring precision/recall: 63.49%/72.71%
- Syntactic noun phrases help a lot
- Named entity recognition and proper nouns are excellent features
- Effect of co-reference resolution is marginal
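For readers unfamiliar with the metrics, precision and recall for a single post can be computed by comparing the predicted tags with the reference tags. The sketch below is illustrative only: it assumes exact string matching between tags, and the slides do not say how scores were averaged across posts or how the automated and manual scoring procedures differed. The example tags are invented.

```python
def precision_recall(predicted, gold):
    """Precision and recall of predicted tags against reference tags for one post."""
    predicted, gold = set(predicted), set(gold)
    correct = len(predicted & gold)  # tags that appear in both sets
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall

# Hypothetical post: five predicted tags, three reference tags, two in common.
predicted = ["iraq", "elections", "congress", "economy", "health care"]
gold = ["iraq", "congress", "foreign policy"]
p, r = precision_recall(predicted, gold)
print(f"precision={p:.2%} recall={r:.2%}")
```

Always emitting five tags per post tends to favor recall over precision whenever a post has fewer than five reference tags, which is one plausible reading of the low-precision, high-recall pattern in the automated scores.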
6
Earlier Work
- Wang and Davison followed a similar approach with SVMs, but for the purpose of query expansion and suggestion
  - Tags are assigned to web pages, whereas we assign tags to individual posts
  - They report a precision of 45.25% and a recall of 23.24%, compared to our precision of 20.95% and recall of 65.123%
- Sood et al. find similar blog posts and filter tags
  - They report a precision of 13.11% and a recall of 22.83%, compared to our precision of 20.95% and recall of 65.123%
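Because the systems above trade precision against recall differently, one way to put the reported figures on a single scale is the balanced F-measure, F1 = 2PR / (P + R). The slides do not report F1; the values below are derived here purely from the precision and recall figures quoted above, and since the systems were evaluated on different data and tasks, the comparison is only indicative.

```python
def f1(precision, recall):
    """Balanced F-measure (harmonic mean of precision and recall)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Precision/recall percentages as quoted on the slide.
reported = {
    "This work (automated scoring)": (20.95, 65.123),
    "Wang and Davison": (45.25, 23.24),
    "Sood et al.": (13.11, 22.83),
}

f1_scores = {name: f1(p, r) for name, (p, r) in reported.items()}
for name, score in f1_scores.items():
    print(f"{name}: F1 = {score:.2f}%")
```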
7
Conclusion
- Described and evaluated a tool for automatically tagging political blogs for topics
- Tagging benefits from named entity recognition and proper nouns
- Using a hybrid approach (statistical and grammatical) yields better results
- Recall exceeds numbers reported for other domains
- Next step: Aggregate post opinion data, using the content tags as anchor points