Automatic Detection of Tags for Political Blogs Khairun-nisa Hassanali and Vasileios Hatzivassiloglou Human Language Technology Research Institute The.

Presentation transcript:

Automatic Detection of Tags for Political Blogs
Khairun-nisa Hassanali and Vasileios Hatzivassiloglou
Human Language Technology Research Institute, The University of Texas at Dallas
June 6, 2010
NAACL-HLT 2010: Computational Linguistics in a World of Social Media
Los Angeles, California

Slide 2: Goal
Slide footer: 06/06/2010, Hassanali and Hatzivassiloglou: Automatic Detection of Tags for Political Blogs
- Goal: Identify topic tags of political blog posts
  - Tags are single words or groups of words
- Motivation: Build a system that
  - Collates information across blog posts
  - Combines evidence to numerically rate attitudes of blogs on different topics
  - Traces the evolution of attitudes over time
- Tags assigned to a post are collectively the post's topical signature

Slide 3: Our Approach
- Train a Support Vector Machine for each possible tag
- Select the five strongest votes
- Investigated several features
  - Single words (baseline)
  - Syntactic groups (noun phrases and proper nouns, detected with shallow parsing)
  - Named Entity Recognition
  - Co-reference Resolution
  - Synonyms (using WordNet)
  - Word position (title versus body)
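The per-tag setup on this slide can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the slides do not name an SVM package, so a small Pegasos-style subgradient trainer for a linear SVM stands in for it. The function names (`featurize`, `train_linear_svm`, `train_tag_models`, `top_tags`), the hyperparameters, and the toy posts are all assumptions made for the example. Only two of the listed features are shown: single words and word position (title versus body).

```python
import random
from collections import defaultdict


def featurize(title, body):
    """Bag of words, keeping title words and body words as separate features."""
    feats = defaultdict(float)
    for word in title.lower().split():
        feats["title:" + word] += 1.0
    for word in body.lower().split():
        feats["body:" + word] += 1.0
    return dict(feats)


def score(weights, feats):
    """Linear decision value w . x for one tag (no bias term in this sketch)."""
    return sum(weights.get(f, 0.0) * v for f, v in feats.items())


def train_linear_svm(examples, epochs=50, lam=0.01, seed=0):
    """Pegasos-style subgradient descent on the regularized hinge loss.

    examples: list of (feature dict, label), label is +1 or -1.
    epochs, lam and seed are illustrative defaults, not values from the slides.
    """
    rng = random.Random(seed)
    w = defaultdict(float)
    t = 0
    for _ in range(epochs):
        order = list(examples)
        rng.shuffle(order)
        for x, y in order:
            t += 1
            eta = 1.0 / (lam * t)              # decaying step size
            margin = y * sum(w[f] * v for f, v in x.items())
            for f in list(w):                  # shrink step from the L2 regularizer
                w[f] *= 1.0 - eta * lam
            if margin < 1.0:                   # hinge loss active: move toward y * x
                for f, v in x.items():
                    w[f] += eta * y * v
    return dict(w)


def train_tag_models(posts):
    """Train one binary classifier per tag. posts: list of (title, body, tags)."""
    all_tags = sorted({tag for _, _, tags in posts for tag in tags})
    feats = [featurize(title, body) for title, body, _ in posts]
    models = {}
    for tag in all_tags:
        examples = [(x, 1 if tag in tags else -1)
                    for x, (_, _, tags) in zip(feats, posts)]
        models[tag] = train_linear_svm(examples)
    return models


def top_tags(models, title, body, k=5):
    """Return the k tags whose classifiers give the strongest votes."""
    x = featurize(title, body)
    ranked = sorted(models, key=lambda tag: score(models[tag], x), reverse=True)
    return ranked[:k]


# Toy usage with invented posts (not data from the paper).
toy_posts = [
    ("Senate health care vote", "the senate debates health care reform",
     {"health care", "senate"}),
    ("Iraq troop withdrawal", "troops leave iraq this year", {"iraq"}),
    ("Health care costs", "insurance costs rise under reform", {"health care"}),
    ("Senate hearing on Iraq", "senate hears iraq war testimony",
     {"iraq", "senate"}),
]
models = train_tag_models(toy_posts)
suggested = top_tags(models, "Health care reform", "reform of insurance")
```

With only three distinct tags in the toy data, `suggested` holds all three in ranked order; on a realistic tag vocabulary the default `k=5` would keep just the five strongest votes, as the slide describes.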

Slide 4: Data
- Collected data from two major political blogs
  - Daily Kos (100,000 blog posts)
  - Red State (70,000 blog posts)
- 787,780 tags across both blogs
- Covers the period for Daily Kos and for Red State

Slide 5: Results
- Baseline precision/recall (Single Words): 25.84%/54.97%
- +Stemming precision/recall: -0.46%/-0.62%
- +Proper Nouns precision/recall: +12.84%/+1.95%
- +Named Entities precision/recall: +12.23%/-8.53%
- All features
  - Automated Scoring precision/recall: 20.95%/65.123%
  - Manual Scoring precision/recall: 63.49%/72.71%
- Syntactic noun phrases help a lot
- Named entity recognition and proper nouns are excellent features
- Effect of co-reference resolution is marginal
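For reference, precision and recall over suggested tags can be computed along these lines. This is a sketch under assumptions: the slides do not say how the automated scores were averaged or how a suggested tag was matched against an author tag, so the example pools counts over all posts (micro-averaging) and uses exact string match. The function name `precision_recall` is hypothetical.

```python
def precision_recall(predicted, gold):
    """Micro-averaged precision and recall over a collection of posts.

    predicted, gold: parallel lists, one collection of tags per post.
    A suggested tag counts as correct only on exact string match
    (an assumption; the slides do not specify the matching rule).
    """
    true_pos = pred_total = gold_total = 0
    for pred_tags, gold_tags in zip(predicted, gold):
        pred_set, gold_set = set(pred_tags), set(gold_tags)
        true_pos += len(pred_set & gold_set)
        pred_total += len(pred_set)
        gold_total += len(gold_set)
    precision = true_pos / pred_total if pred_total else 0.0
    recall = true_pos / gold_total if gold_total else 0.0
    return precision, recall


# Invented example: five suggested tags, two author tags, both recovered.
p, r = precision_recall(
    [["iraq", "senate", "obama", "war", "budget"]],
    [["iraq", "senate"]],
)
```

In this invented case precision is 2/5 and recall is 2/2: a system that always suggests five tags cannot reach full precision on a post whose author assigned only two.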

Slide 6: Earlier Work
- Wang and Davison followed a similar approach with SVMs, but for the purpose of query expansion and suggestion
  - Their tags are assigned to web pages, whereas we assign tags to individual posts
  - They report a precision of 45.25% and recall of 23.24%, compared to our precision of 20.95% and recall of %
- Sood et al. find similar blog posts and filter tags
  - They report a precision of 13.11% and recall of 22.83%, compared to our precision of 20.95% and recall of %

Slide 7: Conclusion
- Described and evaluated a tool for automatically tagging political blogs for topics
- Tagging benefits from named entity recognition and proper nouns
- Using a hybrid approach (statistical and grammatical) yields better results
- Recall exceeds numbers reported for other domains
- Next step: Aggregate post opinion data, using the content tags as anchor points