Fine-Grained Location Extraction from Tweets with Temporal Awareness. Date: 2015/03/19. Authors: Chenliang Li, Aixin Sun. Source: SIGIR '14. Advisor: Jia-ling Koh.

Presentation transcript:

Fine-Grained Location Extraction from Tweets with Temporal Awareness. Date: 2015/03/19. Authors: Chenliang Li, Aixin Sun. Source: SIGIR '14. Advisor: Jia-ling Koh. Speaker: LIN, CI-JIE.

Outline: Introduction, Method, Experiment, Conclusion.

Introduction: Twitter is a popular platform for sharing activities, plans, and opinions. Users often reveal their location information and short-term visiting plans.

Introduction: Goal: extract from a tweet all pairs of a POI (point of interest) and its temporal awareness label.

Challenges: Tweets contain grammar errors, misspellings, and informal abbreviations. POI names are ambiguous: for example, "mac" may refer to Apple's products or to the McDonald's restaurant chain.

Overview of PETAR (figure).

POI Inventory Construction: POI names are extracted, using a regular expression, from tweets that are associated with Foursquare check-ins.
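The slide does not show the regular expression itself. As a rough sketch only, the extraction step might look like the following, assuming check-in tweets follow the common "I'm at <venue> (<address>) <link>" template; the pattern `CHECKIN_RE` and the helper `extract_poi_name` are hypothetical and are not taken from the paper.

```python
import re

# ASSUMPTION: check-in tweets look like "I'm at <venue> (<address>) <url>"
# or "I'm at <venue> w/ ...". The authors' actual pattern is not shown.
CHECKIN_RE = re.compile(r"I'm at (?P<name>.+?)(?= \(| w/ | https?://|$)")


def extract_poi_name(tweet):
    """Return the venue name from a check-in style tweet, or None."""
    match = CHECKIN_RE.search(tweet)
    return match.group("name").strip() if match else None


if __name__ == "__main__":
    print(extract_poi_name("I'm at Paragon (290 Orchard Rd) http://4sq.com/abc"))
    print(extract_poi_name("heading to gucci at paragon now!"))
```

Tweets that do not match the template simply yield `None`, so ordinary tweets contribute nothing to the inventory at this stage.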

POI Inventory Construction: Partial POI names are extracted by taking all the sub-sequences of the names (up to 5 words). Stopwords are ignored and used as separators. Filtering is then conducted to remove infrequent candidate POI names.
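The sub-sequence step can be sketched as below. This is an illustration under stated assumptions, not the authors' code: sub-sequences are taken to be contiguous word runs, the stopword list `STOPWORDS` is a small made-up sample, and the frequency threshold `min_count` is a hypothetical parameter (the slide gives no cutoff).

```python
from collections import Counter

# ASSUMPTION: a tiny illustrative stopword list; the real list is not given.
STOPWORDS = {"the", "of", "at", "by", "and", "in", "a"}


def partial_names(name, max_len=5):
    """All contiguous word runs (up to max_len words) between stopwords."""
    segments, current = [], []
    for word in name.lower().split():
        if word in STOPWORDS:
            # A stopword is dropped and closes the current segment.
            if current:
                segments.append(current)
            current = []
        else:
            current.append(word)
    if current:
        segments.append(current)

    candidates = set()
    for seg in segments:
        for i in range(len(seg)):
            for j in range(i + 1, min(i + max_len, len(seg)) + 1):
                candidates.add(" ".join(seg[i:j]))
    return candidates


def build_inventory(poi_names, min_count=2):
    """Keep candidates seen at least min_count times (hypothetical cutoff)."""
    counts = Counter()
    for name in poi_names:
        counts.update(partial_names(name))
    return {cand for cand, n in counts.items() if n >= min_count}


if __name__ == "__main__":
    print(sorted(partial_names("Marina Bay Sands")))
    print(sorted(build_inventory(["Marina Bay Sands", "Marina Bay"])))
```

Counting here is per extracted name; the paper may instead count occurrences over tweets, which would change only the input to `build_inventory`.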

POI Inventory Construction: Not all candidate POI names are valid. Noisy entries are included as well, such as “my room”, “my work place”, and “my bed”.

Data Analysis and Observations: Data sets: 4.33M tweets from 19,256 unique Singapore-based users during June. 222,201 tweets, by 13,758 unique users, mention at least one candidate POI name. Observation 1: Many users reveal their fine-grained locations in their tweets. The 222,201 tweets were published by 71.4% of all users in the dataset, and by 91.3% of the users who had published at least 20 tweets.

Data Analysis and Observations: Observation 2: The candidate POI mentions are mostly very short, with one or two words, and many of the mentions are partial location names. 46.7% of the candidate POI names are unigrams. 41.6%+ of the candidate POI names are partial POI names. POI names with 3 or more words account for only about 2.5%.

Data Analysis and Observations: Observation 3: About half of the candidate POI mentions indeed refer to locations, and their associated temporal awareness can be determined. Tweets are sampled from these 222,201 tweets for manual annotation.

Data Analysis and Observations: Example tweet: “heading to gucci at paragon now!” We infer that the user is going to visit “paragon” within 2 hours.

Time-Aware POI Tagger: Predicting whether a candidate POI mention is truly a POI, and what its temporal awareness is, relies largely on the context expressed in the tweet.

Time-Aware POI Tagger: Example sentences: The dog runs. A dog jumps. The dog jumps. A cat runs. The cat jumps.

Time-Aware POI Tagger: Grammatical Feature: Time-trend score of a tweet
1. A dictionary D contains 36 commonly used English words with manually assigned time-trend scores: 1, 0, or -1. Example word: "yesterday".
2. Verbs tagged VBN or VBD are assigned a score of -1; verbs tagged VBZ, VBP, VBG, or VB are assigned a score of 0.
3. The time-trend score of a tweet t is the average of the scores assigned.
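A minimal sketch of the three steps above, assuming the tweet is already POS-tagged with Penn Treebank tags. The entries in `TIME_DICT` and their scores are hypothetical stand-ins for the 36-word dictionary D, which the slide does not list. Two further assumptions are made: a dictionary hit takes precedence over the verb-tag rule, and a tweet with no scored token gets 0.0.

```python
# ASSUMPTION: hypothetical sample of dictionary D; the real 36 words and
# their scores are not given on the slide.
TIME_DICT = {"yesterday": -1, "now": 0, "tomorrow": 1}

# From the slide: VBN/VBD score -1; VBZ/VBP/VBG/VB score 0.
VERB_TAG_SCORE = {"VBN": -1, "VBD": -1, "VBZ": 0, "VBP": 0, "VBG": 0, "VB": 0}


def time_trend_score(tagged_tweet):
    """Average the scores of dictionary words and verbs in a tagged tweet.

    tagged_tweet is a list of (word, pos_tag) pairs.
    """
    scores = []
    for word, tag in tagged_tweet:
        key = word.lower()
        if key in TIME_DICT:
            # ASSUMPTION: the dictionary wins over the verb-tag rule.
            scores.append(TIME_DICT[key])
        elif tag in VERB_TAG_SCORE:
            scores.append(VERB_TAG_SCORE[tag])
    # ASSUMPTION: neutral score when nothing in the tweet is scored.
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    tweet = [("went", "VBD"), ("to", "TO"), ("paragon", "NNP"), ("yesterday", "NN")]
    print(time_trend_score(tweet))
```

A negative score then leans toward a past visit and a positive score toward a planned one, under the hypothetical dictionary scores used here.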

Time-Aware POI Tagger: Grammatical Feature: The closest verb. The features are the verb closest to a candidate location name (based on TwitterNLP POS tagging), the tense label of that verb, the distance from the verb to the candidate location name, and whether the verb is to the left of the candidate location name. Example: closest verb "went", tense "past", left "True".
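The closest-verb features could be computed along these lines. This sketch takes already tagged tokens as input rather than calling TwitterNLP. The tag-to-tense mapping (VBD and VBN as "past", other verb tags as "present"), the token-count distance, and the function name `closest_verb_features` are all assumptions for illustration.

```python
# ASSUMPTION: a simple tag-to-tense mapping; the slide does not define one.
PAST_TAGS = {"VBD", "VBN"}


def closest_verb_features(tagged_tweet, start, end):
    """Features of the verb nearest to the mention tagged_tweet[start:end].

    Returns a dict with verb, tense, distance (in tokens), and left,
    or None if the tweet contains no verb outside the mention.
    """
    best = None
    for i, (word, tag) in enumerate(tagged_tweet):
        if not tag.startswith("VB") or start <= i < end:
            continue
        left = i < start
        # Distance to the nearest boundary token of the mention.
        distance = start - i if left else i - (end - 1)
        if best is None or distance < best["distance"]:
            best = {
                "verb": word.lower(),
                "tense": "past" if tag in PAST_TAGS else "present",
                "distance": distance,
                "left": left,
            }
    return best


if __name__ == "__main__":
    tweet = [("went", "VBD"), ("to", "TO"), ("paragon", "NNP"), ("yesterday", "NN")]
    print(closest_verb_features(tweet, 2, 3))
```

On ties, this version keeps the first verb encountered, which favors the one to the left; the paper may break ties differently.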

Time-Aware POI Tagger: BILOU Schema Feature: Because of the POI inventory, the candidate POI mentions in a tweet can be pre-labeled with the BILOU schema.
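As an illustration of the pre-labeling, the sketch below tags tokens with B (begin), I (inside), L (last), O (outside), and U (unit-length) by looking them up in the inventory. Greedy longest-match scanning from left to right is an assumption; the slide does not say how overlapping candidates are resolved. The function name `bilou_prelabel` is hypothetical.

```python
def bilou_prelabel(tokens, inventory, max_len=5):
    """Pre-label tokens with BILOU tags using a set of candidate POI names.

    ASSUMPTION: greedy longest match, scanning left to right.
    """
    labels = ["O"] * len(tokens)
    lowered = [t.lower() for t in tokens]
    i = 0
    while i < len(tokens):
        matched = 0
        # Try the longest span first, down to a single token.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            if " ".join(lowered[i:i + n]) in inventory:
                matched = n
                break
        if matched == 1:
            labels[i] = "U"
        elif matched > 1:
            labels[i] = "B"
            for k in range(i + 1, i + matched - 1):
                labels[k] = "I"
            labels[i + matched - 1] = "L"
        # Skip past the matched span, or advance by one token.
        i += max(matched, 1)
    return labels


if __name__ == "__main__":
    tokens = "heading to gucci at paragon now".split()
    print(bilou_prelabel(tokens, {"gucci", "paragon"}))
```

These labels are only a feature for the tagger: a pre-labeled mention such as "gucci" may still be judged not to be a POI in context.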

Experiment: Experiment setup: 4000 manually annotated tweets serve as ground truth, and 5-fold cross validation is applied. Evaluation metrics: Precision, Recall, and F1. Comparative methods: Random Annotation (RA), K-Nearest Neighbor (KNN), and StanfordNER (CRF-Classifier).
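For reference, the three metrics can be computed over predicted and gold (POI, label) pairs as sketched below. Exact matching of pairs is an assumption about the evaluation protocol, and the label strings in the example are hypothetical.

```python
def precision_recall_f1(predicted, gold):
    """Precision, recall, and F1 over sets of (poi, label) pairs.

    ASSUMPTION: a prediction counts as correct only on an exact pair match.
    """
    predicted, gold = set(predicted), set(gold)
    true_pos = len(predicted & gold)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical labels, for illustration only.
    pred = {("paragon", "future"), ("gucci", "future")}
    gold = {("paragon", "future")}
    print(precision_recall_f1(pred, gold))
```

Dropping the label from each pair before calling the function would give the POI-only variant, where temporal awareness is ignored.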

Experiment: Results are reported for two tasks: POI extraction with temporal awareness, and disambiguating POIs (ignoring temporal awareness).

Experiment: Lexical features are better for POI mention disambiguation. Grammatical features are better for resolving temporal awareness.

Experiment: Lexical + Grammatical features are better in most cases.

Conclusion: The work facilitates fine-grained location-based services/marketing and personalization. PETAR exploits the crowd wisdom of the Foursquare community to enable fine-grained location extraction. The time-aware POI tagger conducts location extraction and temporal awareness resolution in an effective and efficient way.

Thanks for listening.