TWinner: Understanding News Queries with Geo-content using Twitter
Satyen Abrol, Latifur Khan
University of Texas at Dallas, Department of Computer Science

1 TWinner: Understanding News Queries with Geo-content using Twitter
Satyen Abrol, Latifur Khan
University of Texas at Dallas, Department of Computer Science
GIR ’10
29 April, 2011
Sengyu Rim

2 Outline
• Introduction
• Related Work
• Twitter as News-wire
• Determining News Intent
• Assigning Weights to Tweets
• Experiments and Results
• Conclusion

3 Introduction
• Motivations
  – Users find news through search engines
  – The results returned by common search engines differ from what the user expects
    • Non-critical information
    • Unorganized content
  – Search engines need to understand the intent of the user query

4 Introduction
• Motivation
  – E.g., what event in Korea attracted the most attention in 2002?
  – A naive user searches for news with the keyword “korea” on 2002-06-18
    • Map: korea
    • Wiki: Korea
    • News: Korea:Italy 2:1
    • Food: Kimchi

5 Introduction
• Analyze the content of a popular social networking site, Twitter, to infer the intent of the user query
  – Twitter provides popular news topics
  – Twitter provides keywords that may enhance the user query
• TWinner makes two novel contributions to the field of geographic information retrieval
  – Identifying the intent of the user query
  – Adding additional keywords to the query

6 Introduction
• The architecture of the news intent system TWinner [figure]

8 Related Work
• To identify and disambiguate the locations of users
  – Natural Language Processing
  – Data Mining
• To establish the relationship between the location of the news and news content
  – A model using NLP techniques

10 Twitter as News-wire
• Twitter
  – Free social networking
  – Micro-blogging service
  – Medium for news updates

12 Determining News Intent
• Identification of Location
  – Geo-tags the query to a location with a certain confidence
• Frequency-Population Ratio (FPR)
  – In the absence of a news-making event, FPR remains constant irrespective of the location
  – Used to assign a news intent confidence to the query
  – FPR = (α + β) × N_t
    • α: the population density factor
    • β: location type constant
    • N_t: the number of tweets per minute at that instant
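The FPR formula on this slide can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the function name and the numeric values for α, β and the tweet rate are assumptions chosen for the example, not figures from the talk, and the slide does not say how the FPR is turned into a confidence score.

```python
def frequency_population_ratio(alpha: float, beta: float,
                               tweets_per_minute: float) -> float:
    """FPR = (alpha + beta) * Nt, as written on the slide.

    alpha: population density factor of the location
    beta: location type constant
    tweets_per_minute: Nt, tweets per minute mentioning the location
    """
    return (alpha + beta) * tweets_per_minute


# Hypothetical values: the same location on a quiet day and during a burst.
quiet_fpr = frequency_population_ratio(alpha=0.5, beta=0.25, tweets_per_minute=4.0)
burst_fpr = frequency_population_ratio(alpha=0.5, beta=0.25, tweets_per_minute=40.0)
print(quiet_fpr, burst_fpr)
```

Because the slide states that the FPR stays constant without a news-making event, a value well above a location's usual level is what would suggest news intent; where that cut-off lies is not given in the transcript.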

13 Determining News Intent
• Experiments on determining the effect of geo-type and population density [figure]

14 Determining News Intent
• The drawback of FPR
  – Fails to take into account the geographical relatedness of features
• Modified FPR
  – FPR = Σ_i δ_i (α_i + β_i) × N_t
    • δ_i: factor expressing how strongly geo-location i is related to the primary search query
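A minimal Python sketch of the modified FPR follows. It assumes, as the slide's notation suggests, a single tweet rate shared by all related locations; the function name, the tuple layout and the example values are hypothetical.

```python
from typing import Iterable, Tuple


def modified_fpr(locations: Iterable[Tuple[float, float, float]],
                 tweets_per_minute: float) -> float:
    """Modified FPR = sum_i delta_i * (alpha_i + beta_i) * Nt.

    Each entry of `locations` is (delta_i, alpha_i, beta_i):
    delta_i: relatedness of geo-location i to the primary search query
    alpha_i: population density factor of geo-location i
    beta_i: location type constant of geo-location i
    """
    weighted_sum = sum(delta * (alpha + beta) for delta, alpha, beta in locations)
    return weighted_sum * tweets_per_minute


# Hypothetical example: the queried location (delta = 1.0) plus one
# related location that counts half as much (delta = 0.5).
related_locations = [(1.0, 0.5, 0.25), (0.5, 0.25, 0.25)]
combined_fpr = modified_fpr(related_locations, tweets_per_minute=8.0)
print(combined_fpr)
```

With a single location and δ = 1 this reduces to the original FPR = (α + β) × Nt.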

16 Assigning Weights to Tweets
• Detecting Spam Messages
  – Spam messages carry little or no relevant information
  – Nature of spam messages
  – A formula tags each message as spam or not with a certain level of confidence [formula not included in the transcript]
    • Np: the number of followers
    • Nq: the number of people the user is following
    • μ: an arbitrary constant
    • Nr: the ratio of the number of tweets containing a reply to the total number of tweets

17 Assigning Weights to Tweets
• On the basis of user location
  – An experiment was conducted to understand the relation between Twitter messages and the location of the user [figure]

18 Assigning Weights to Tweets
• Using Hyperlinks Mentioned in Tweets
  – 30-50% of general Twitter messages contain a hyperlink to an external website
  – For news-related Twitter messages this percentage rises to 70-80%
  – The presence of a hyperlink is therefore also used as a signal when assigning weights to tweets
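The slide does not say how a hyperlink changes a tweet's weight, so the following Python sketch is only one possible reading: it detects a link with a simple regular expression and multiplies the weight by a bonus. The pattern, the function names and the `link_bonus` value are all assumptions made for illustration.

```python
import re

# Simple pattern for http/https links; real tweets may need a stricter parser.
URL_PATTERN = re.compile(r"https?://\S+")


def contains_hyperlink(tweet: str) -> bool:
    """Return True if the tweet text contains an http(s) link."""
    return URL_PATTERN.search(tweet) is not None


def link_adjusted_weight(tweet: str, base_weight: float,
                         link_bonus: float = 1.5) -> float:
    """Scale the tweet's weight by a hypothetical bonus when it carries a link."""
    if contains_hyperlink(tweet):
        return base_weight * link_bonus
    return base_weight


with_link = link_adjusted_weight("Latest update http://example.com/story", 2.0)
without_link = link_adjusted_weight("Just heard the news", 2.0)
print(with_link, without_link)
```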

19 Assigning Weights to Tweets
• Semantic Similarity
  – Summarize the Twitter messages into a couple of keywords
  – The naïve approach picks k keywords and ignores semantic similarity
  – The definition of semantic similarity [formula not included in the transcript]
    • M: the total number of articles searched in the New York Times corpus
    • f(x): the number of articles for term x
    • f(y): the number of articles for term y

20 Assigning Weights to Tweets
• Reassign the weight of every keyword using the following formula
  – W_i* = W_i + Σ_{j≠i} S_ij × W_j
    • W_i*: the new weight of keyword i
    • W_i: the weight without semantic similarity
    • S_ij: the semantic similarity derived from the semantic similarity formula
    • W_j: the initial weight of the other words being considered
• Identify k keywords that are semantically dissimilar but together contribute the maximum weight
  – S_pq < S_threshold: the similarity between any two words p and q in the set of k is below a threshold
  – W_1 + W_2 + W_3 + … + W_k is the maximum over all groups satisfying the condition above
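The two steps on this slide can be sketched in Python as below. This is a brute-force illustration, not the authors' implementation: the keywords, weights, similarity values and threshold are hypothetical, similarities are supplied directly rather than computed from a corpus, and unlisted pairs are assumed to have similarity 0.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Optional, Tuple

Similarity = Dict[FrozenSet[str], float]


def pair_similarity(similarity: Similarity, a: str, b: str) -> float:
    """Look up S_ab; pairs that are not listed are treated as dissimilar (0.0)."""
    return similarity.get(frozenset((a, b)), 0.0)


def reweight(weights: Dict[str, float], similarity: Similarity) -> Dict[str, float]:
    """W_i* = W_i + sum over j != i of S_ij * W_j."""
    return {
        i: w_i + sum(pair_similarity(similarity, i, j) * w_j
                     for j, w_j in weights.items() if j != i)
        for i, w_i in weights.items()
    }


def select_keywords(weights: Dict[str, float], similarity: Similarity,
                    k: int, threshold: float) -> Optional[Tuple[str, ...]]:
    """Pick k keywords whose pairwise similarity is below the threshold
    and whose total weight is maximal (exhaustive search over all groups)."""
    best_group, best_total = None, float("-inf")
    for group in combinations(weights, k):
        if all(pair_similarity(similarity, p, q) < threshold
               for p, q in combinations(group, 2)):
            total = sum(weights[word] for word in group)
            if total > best_total:
                best_group, best_total = group, total
    return best_group


# Hypothetical keywords and values.
initial = {"earthquake": 4.0, "quake": 3.0, "relief": 2.0, "aid": 1.0}
sims = {frozenset(("earthquake", "quake")): 0.5,
        frozenset(("relief", "aid")): 0.5}

new_weights = reweight(initial, sims)
chosen = select_keywords(new_weights, sims, k=2, threshold=0.25)
print(new_weights, chosen)
```

The exhaustive search examines every group of k keywords, which is fine for the handful of candidate keywords in this example but grows quickly with the vocabulary size.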

22 Experiments and Results
• Experiments to test the validity of the hypothesis
  – First: a naïve user is looking for the latest on the Fort Hood incident on 12th November 2009
  – Second: a naïve user is looking for the latest on ‘Russia’ on 5th December 2009
  – Third: a naïve user is looking for the latest on ‘Haiti’ on 18th January 2010

23 Experiments and Results
• Results [figure]

24 Experiments and Results
• Results: the contrast between the search results produced by the original query and those produced after adding the keywords obtained by TWinner [figure]

26 Conclusion
• We present a system to predict a user’s news intent
  – Takes the location mentioned and the time of the query into consideration
  – Makes use of the social networking site Twitter to understand the relationship between geo-information and the news intent of the query
• Future work
  – Understanding the content of social media messages
  – Sentiment conveyed by the messages
  – Enhancing the accuracy of the system

27 Thank you!

