Published by Morgan Atkinson. Modified over 8 years ago.
1
TWinner: Understanding News Queries with Geo-content using Twitter
Satyen Abrol, Latifur Khan
University of Texas at Dallas, Department of Computer Science
GIR ’10
29 April 2011
Sengyu Rim
2
Outline
– Introduction
– Related Work
– Twitter as News-wire
– Determining News Intent
– Assigning Weights to Tweets
– Experiments and Results
– Conclusion
3
Introduction
Motivations
– Users find news through search engines
– The results returned by common search engines often differ from what the user expected:
non-critical information
unorganized content
– Search engines therefore need to understand the intent of the user query
4
Introduction
Motivation
E.g., what event in Korea attracted the most attention in 2002?
A naive user searches the news with the keyword “korea” on 18 June 2002. Candidate intents:
– Map: Korea
– Wiki: Korea
– News: Korea vs. Italy, 2:1
– Food: Kimchi
5
Introduction
Analyze the content of the popular social networking site Twitter to infer the intent of the user query
– Twitter surfaces popular news topics
– Twitter provides keywords that may enhance the user query
TWinner makes two novel contributions to the field of geographic information retrieval
– Identifying the intent of the user query
– Adding keywords to the query
6
Introduction
[Figure: Architecture of the news intent system TWinner]
8
Related Work
Identifying and disambiguating the locations of users
– Natural language processing
– Data mining
Establishing the relationship between the location of a news event and the news content
– A model using NLP techniques
10
Twitter as News-wire
Twitter
– Free social networking service
– Micro-blogging service
– Medium for news updates
12
Determining News Intent
Identification of location
– Geo-tags the query to a location with a certain confidence
Frequency-Population Ratio (FPR)
– In the absence of a news-making event, the FPR remains constant irrespective of the location
– Used to assign a news-intent confidence to the query
– $\mathrm{FPR} = (\alpha + \beta) \cdot N_t$
$\alpha$: population density factor
$\beta$: location-type constant
$N_t$: number of tweets per minute at that instant
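The FPR definition above can be sketched in Python as follows. The function names, the baseline FPR, and the mapping from FPR to a confidence value are hypothetical illustrations; the slide defines only the FPR itself and says it is used to assign a news-intent confidence.

```python
def frequency_population_ratio(alpha: float, beta: float, tweets_per_minute: float) -> float:
    """FPR = (alpha + beta) * N_t, following the slide's definition.

    alpha: population density factor of the location
    beta: location-type constant
    tweets_per_minute: N_t, tweets per minute at this instant
    """
    return (alpha + beta) * tweets_per_minute


def news_intent_confidence(fpr: float, baseline_fpr: float) -> float:
    """Hypothetical mapping (not from the slides) from FPR to a confidence in [0, 1).

    Because FPR stays constant without a news-making event, this sketch measures how
    far the observed FPR rises above a baseline: 0 at or below the baseline,
    approaching 1 as the observed FPR grows.
    """
    if fpr <= baseline_fpr:
        return 0.0
    return 1.0 - baseline_fpr / fpr


fpr = frequency_population_ratio(alpha=0.5, beta=1.5, tweets_per_minute=10)
print(fpr)  # 20.0
print(news_intent_confidence(fpr, baseline_fpr=10.0))  # 0.5
```

Here the observed FPR is twice the (assumed) baseline, which the hypothetical mapping turns into a confidence of 0.5.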
13
Determining News Intent
[Figure: Experiments on the effect of geo-type and population density]
14
Determining News Intent
Drawback of FPR
– Fails to take into account the geographical relatedness of features
Modified FPR
– $\mathrm{FPR} = \sum_i \delta_i (\alpha_i + \beta_i) \cdot N_t$
$\delta_i$: factor by which geo-location $i$ is related to the primary search query
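A minimal sketch of the modified FPR, summing the per-location terms with their relatedness factors. Representing each related location as a `(delta, alpha, beta)` tuple, and giving the queried location itself a relatedness of 1.0 in the example, are assumptions made for illustration.

```python
from typing import Iterable, Tuple


def modified_fpr(locations: Iterable[Tuple[float, float, float]], tweets_per_minute: float) -> float:
    """Modified FPR = sum_i delta_i * (alpha_i + beta_i) * N_t.

    Each entry of `locations` is (delta_i, alpha_i, beta_i) for a geo-location
    related to the primary search query; N_t is shared, as on the slide.
    """
    return sum(delta * (alpha + beta) * tweets_per_minute for delta, alpha, beta in locations)


# Hypothetical example: the queried location itself (delta = 1.0)
# plus one related location that is half as strongly related (delta = 0.5).
print(modified_fpr([(1.0, 0.5, 1.5), (0.5, 1.0, 1.0)], tweets_per_minute=10))  # 30.0
```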
16
Assigning Weights to Tweets
Detecting spam messages
– Spam messages carry little or no relevant information
– Nature of spam messages
– A formula tags, with a certain level of confidence, whether a message is spam, using:
$N_p$: number of followers
$N_q$: number of people the user is following
$\mu$: an arbitrary constant
$N_r$: ratio of the number of tweets containing a reply to the total number of tweets
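The spam-confidence formula itself is not reproduced here, but one of its inputs, $N_r$, is fully defined above. The sketch below computes it from a user's tweet texts; treating a tweet as a reply when it starts with an @-mention is an assumption, not something the slide specifies.

```python
from typing import List


def reply_ratio(tweets: List[str]) -> float:
    """N_r: fraction of a user's tweets that contain a reply.

    Assumption (not from the slides): a tweet counts as a reply if it
    starts with an @-mention, e.g. "@alice thanks!".
    """
    if not tweets:
        return 0.0
    replies = sum(1 for text in tweets if text.lstrip().startswith("@"))
    return replies / len(tweets)


print(reply_ratio(["@bob see you there", "Breaking: storm warning", "@amy agreed", "hello"]))  # 0.5
```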
17
Assigning Weights to Tweets
On the basis of user location
– An experiment was conducted to understand the relation between Twitter messages and the location of the user
18
Assigning Weights to Tweets
Using hyperlinks mentioned in tweets
– 30-50% of general Twitter messages contain a hyperlink to an external website
– For news-related Twitter messages, this percentage rises to 70-80%
– We use this pointer when assigning weights to tweets
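A minimal sketch of using hyperlinks as a weighting signal. The URL pattern is a simple approximation, and the boost factor `LINK_BOOST` is a hypothetical value: the slide says hyperlinks are used to assign weights but not by how much.

```python
import re

# Simple URL pattern: "http://" or "https://" followed by non-space characters.
URL_PATTERN = re.compile(r"https?://\S+")

# Hypothetical boost factor; the slides do not give the actual weighting.
LINK_BOOST = 1.5


def has_hyperlink(tweet: str) -> bool:
    """True if the tweet text contains at least one http(s) URL."""
    return URL_PATTERN.search(tweet) is not None


def link_weight(tweet: str, base_weight: float = 1.0) -> float:
    """Multiply the tweet's weight by LINK_BOOST when it links to an external page."""
    return base_weight * LINK_BOOST if has_hyperlink(tweet) else base_weight


print(link_weight("Earthquake hits Haiti http://example.com/story"))  # 1.5
print(link_weight("good morning everyone"))  # 1.0
```

A multiplicative boost keeps tweets without links in play while favoring the link-bearing tweets that, per the figures above, are much more common among news messages.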
19
Assigning Weights to Tweets
Semantic similarity
– Summarize the Twitter messages into a few keywords
– A naive approach picks the top k keywords, ignoring their semantic similarity
– Semantic similarity is defined in terms of:
$M$: total number of articles searched in the New York Times corpus
$f(x)$: number of articles containing term x
$f(y)$: number of articles containing term y
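The similarity formula itself is not reproduced above. Its variables ($M$, $f(x)$, $f(y)$) are the ones used by the Normalized Google Distance (NGD), computed here over an article corpus instead of web hits. The sketch below assumes that form, which additionally needs $f(x, y)$, the number of articles containing both terms (not listed on the slide). Both the NGD assumption and the `exp(-NGD)` conversion to a similarity score are ours, not the slide's.

```python
import math


def normalized_google_distance(f_x: int, f_y: int, f_xy: int, m: int) -> float:
    """NGD(x, y) = (max(log f(x), log f(y)) - log f(x, y)) / (log M - min(log f(x), log f(y))).

    f_x, f_y: number of articles containing term x / term y
    f_xy: number of articles containing both terms (assumed input)
    m: total number of articles searched (M)
    """
    if min(f_x, f_y, f_xy) <= 0:
        return math.inf  # a term never occurs, or the terms never co-occur: maximally distant
    log_x, log_y, log_xy = math.log(f_x), math.log(f_y), math.log(f_xy)
    denominator = math.log(m) - min(log_x, log_y)
    if denominator == 0:
        return 0.0  # both terms appear in every article
    return (max(log_x, log_y) - log_xy) / denominator


def semantic_similarity(f_x: int, f_y: int, f_xy: int, m: int) -> float:
    """Hypothetical conversion of NGD into a similarity in [0, 1]: exp(-NGD)."""
    return math.exp(-normalized_google_distance(f_x, f_y, f_xy, m))


print(semantic_similarity(100, 100, 100, 10_000))  # 1.0 (terms always co-occur)
print(semantic_similarity(100, 100, 0, 10_000))  # 0.0 (terms never co-occur)
```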
20
Assigning Weights to Tweets
Reassigns the weight of every keyword using
– $W_i^* = W_i + \sum_{j \neq i} S_{ij} W_j$
$W_i^*$: new weight of keyword i
$W_i$: weight of keyword i without semantic similarity
$S_{ij}$: semantic similarity between keywords i and j, derived from the semantic similarity formula
$W_j$: initial weight of each other keyword being considered
Identifies the k keywords that are semantically dissimilar but together contribute the maximum weight
– $S_{pq} < S_{\text{threshold}}$: the similarity between any two words p and q in the set of k is below a threshold
– $W_1 + W_2 + \dots + W_k$ is maximal among all groups satisfying the condition above
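The reweighting and selection steps above can be sketched as follows. The similarity matrix, keywords, and threshold in the example are hypothetical, and applying the selection to the reweighted values $W_i^*$ is our reading of the slide. The selection is an exhaustive search over all k-subsets, which is a direct (not necessarily the authors') implementation of the two conditions.

```python
from itertools import combinations
from typing import List, Optional, Sequence, Tuple


def reweight(weights: Sequence[float], sim: Sequence[Sequence[float]]) -> List[float]:
    """W_i* = W_i + sum over j != i of S_ij * W_j."""
    n = len(weights)
    return [
        weights[i] + sum(sim[i][j] * weights[j] for j in range(n) if j != i)
        for i in range(n)
    ]


def select_keywords(
    weights: Sequence[float],
    sim: Sequence[Sequence[float]],
    k: int,
    threshold: float,
) -> Optional[Tuple[int, ...]]:
    """Pick the k keyword indices with maximal total weight such that every
    pair (p, q) in the group has S_pq < threshold; None if no group qualifies."""
    best: Optional[Tuple[int, ...]] = None
    best_score = float("-inf")
    for group in combinations(range(len(weights)), k):
        if all(sim[p][q] < threshold for p, q in combinations(group, 2)):
            score = sum(weights[i] for i in group)
            if score > best_score:
                best, best_score = group, score
    return best


keywords = ["earthquake", "quake", "relief"]  # hypothetical keywords
base = [1.0, 2.0, 3.0]  # hypothetical initial weights W_i
sim = [[1.0, 0.5, 0.0], [0.5, 1.0, 0.0], [0.0, 0.0, 1.0]]  # hypothetical S_ij
new_weights = reweight(base, sim)
print(new_weights)  # [2.0, 2.5, 3.0]
chosen = select_keywords(new_weights, sim, k=2, threshold=0.3)
print([keywords[i] for i in chosen])  # ['quake', 'relief']
```

The near-synonyms "earthquake" and "quake" exceed the similarity threshold, so at most one of them can be chosen. Exhaustive search grows combinatorially with the number of keywords; it is practical only for the small keyword sets extracted from tweets.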
22
Experiments and Results
Experiments to test the validity of the hypothesis
– First: a naive user looks for the latest happenings related to the Fort Hood incident on 12 November 2009
– Second: a naive user looks for the latest happenings related to ‘Russia’ on 5 December 2009
– Third: a naive user looks for the latest happenings related to ‘Haiti’ on 18 January 2010
23
Experiments and Results
Results
24
Experiments and Results
Result: contrast between the search results produced by the original query and by the query after adding the keywords obtained by TWinner
26
Conclusion
We present a system to predict a user’s news intent
– Takes the location mentioned and the time of the query into consideration
– Makes use of the social networking site Twitter to understand the relationship between geo-information and the news intent of the query
Future work
– Understanding the content of social media messages
– Sentiment conveyed by the messages
– Enhancing the accuracy of the system
27
Thank you!