Kira Radinsky, Sagie Davidovich, Shaul Markovitch. Computer Science Department, Technion – Israel Institute of Technology.

1 Kira Radinsky, Sagie Davidovich, Shaul Markovitch. Computer Science Department, Technion – Israel Institute of Technology

2 Goal

3 Solution Outline. Identify events that occur today: more than 0.5 billion daily searches on the web (2008); many queries are related to current events. Analyze what events tended to follow today’s events in the past: history repeats itself; query log archives.

4 Knowledge Sources: Google Hot Trends, Technorati, online news (Newzingo). [Figure labels: July 08, Aug 08, Sep 08.]

5 Identifying Events. [Figure labels: July 08, Aug 08, Sep 08; Hurricane Ivan, Hurricane Wilma, Hurricane Dean, Hurricane Gustav, Hurricane Katrina.]
Peak Detection Algorithm: each maximum point m_y has at most two neighboring minimum points. We consider a maximum point a peak if:
1. The local maximum m_y > Δ_1 (high-pass filter).
2. The difference between m_y and the lowest of its neighboring minimum points is above Δ_2.
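The two peak criteria above can be sketched in a few lines of Python. This is only an illustrative reading of the slide, not the authors' implementation: the function name `detect_peaks`, the treatment of plateaus and series endpoints, and the sample numbers are all assumptions.

```python
from typing import List, Sequence


def detect_peaks(volume: Sequence[float], delta1: float, delta2: float) -> List[int]:
    """Return indices of local maxima that satisfy both criteria from the slide.

    Assumptions (not stated in the slides): a plateau counts once, at its
    leftmost point, and a maximum at a series boundary has one neighbor only.
    """
    peaks = []
    n = len(volume)
    for i in range(n):
        left_blocks = i > 0 and volume[i - 1] >= volume[i]
        right_blocks = i < n - 1 and volume[i + 1] > volume[i]
        if left_blocks or right_blocks:
            continue  # not a local maximum

        # Walk downhill on each side to reach the neighboring minimum points.
        neighbors = []
        j = i
        while j > 0 and volume[j - 1] <= volume[j]:
            j -= 1
        if j != i:
            neighbors.append(volume[j])
        j = i
        while j < n - 1 and volume[j + 1] <= volume[j]:
            j += 1
        if j != i:
            neighbors.append(volume[j])
        if not neighbors:
            continue  # single-point series: nothing to compare against

        high_enough = volume[i] > delta1                       # criterion 1
        prominent = volume[i] - min(neighbors) > delta2        # criterion 2
        if high_enough and prominent:
            peaks.append(i)
    return peaks


# Hypothetical daily search volume for one term.
sample = [10, 12, 11, 40, 90, 35, 20, 22, 21, 15]
strict = detect_peaks(sample, delta1=30, delta2=25)
loose = detect_peaks(sample, delta1=5, delta2=5)
```

With the strict thresholds only the large spike at index 4 survives; loosening both thresholds also admits the smaller bump at index 7.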

6 Prediction. Goal: for each candidate term, evaluate the likelihood that it will appear in the future, given today’s terms.
Indication Weight:
1. How many of the peaks of w_2 (future candidate) appeared k days after w_1 (today’s term).
2. Saliency of w_1: significance of the peak in the search volume.
[Figure labels: today’s salient terms; future candidate terms; indication weight on the candidate; likelihood to appear in k days. Terms shown: hurricane, storm, flood, weather, evacuation, gas, economics, Taliban, war, South Asia, China, pope, Texans. Weights shown: 0.85, 0.40, 0.10, 0.36, 0.12, 0.30, 0.05, 0.01, 0.08, 0.9, 0.7.]
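One way to read the two ingredients of the indication weight is sketched below. The slides do not say how the two quantities are combined, so the saliency-weighted sum used here is an assumption, as are the function names, the use of a fraction rather than a raw count, and the sample peak days.

```python
from typing import Dict, Iterable, Set


def follow_fraction(peaks_w1: Set[int], peaks_w2: Set[int], k: int) -> float:
    """Fraction of w2's peak days that fall exactly k days after a peak of w1."""
    if not peaks_w2:
        return 0.0
    hits = sum(1 for day in peaks_w2 if (day - k) in peaks_w1)
    return hits / len(peaks_w2)


def score_candidates(
    today_saliency: Dict[str, float],
    peak_days: Dict[str, Set[int]],
    candidates: Iterable[str],
    k: int,
) -> Dict[str, float]:
    """Score each candidate as a saliency-weighted sum over today's terms.

    The combination rule is an assumption made for illustration only.
    """
    scores = {}
    for w2 in candidates:
        scores[w2] = sum(
            saliency * follow_fraction(peak_days.get(w1, set()), peak_days.get(w2, set()), k)
            for w1, saliency in today_saliency.items()
        )
    return scores


# Hypothetical historical peak days (day indices) per term.
peak_days = {
    "hurricane": {10, 50, 90},
    "gas": {12, 52, 70, 92},
    "pope": {30},
}
scores = score_candidates({"hurricane": 0.8}, peak_days, ["gas", "pope"], k=2)
```

In this toy data three of the four "gas" peaks occur two days after a "hurricane" peak, so "gas" scores well above "pope".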

7 [Figure labels: Hurricane, Gas.]

8 Empirical Methodology. Testing on an aggregation of 4500 online news sources. What does “to appear in the news” mean? A term appears significantly more times than its average in the past year. Metric: precision at 100.
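The evaluation criterion can be sketched as follows. The slides do not quantify "significantly more", so the multiplicative `factor` below is a hypothetical threshold, and both function names are illustrative.

```python
from statistics import mean
from typing import Sequence, Set


def appears_in_news(count_on_day: float, past_year_counts: Sequence[float], factor: float = 2.0) -> bool:
    """True if the day's count exceeds `factor` times the past-year average.

    `factor` is an assumed stand-in for "significantly more".
    """
    if not past_year_counts:
        return False
    return count_on_day > factor * mean(past_year_counts)


def precision_at_n(ranked_terms: Sequence[str], appeared: Set[str], n: int = 100) -> float:
    """Share of the top-n predicted terms that actually appeared in the news."""
    top = ranked_terms[:n]
    if not top:
        return 0.0
    # Divides by the number of terms actually returned, which equals n
    # whenever at least n predictions are available.
    return sum(1 for term in top if term in appeared) / len(top)


# Hypothetical ranked predictions and ground truth.
p = precision_at_n(["a", "b", "c", "d"], {"a", "c", "x"}, n=4)
```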

9 Empirical Evaluation. Baseline method: what happens today happens tomorrow. Each point is how many of the 100 appeared. A total of 30 days of experiments.

10 Empirical Evaluation. Baseline method: what happens today happens tomorrow. Each point is an average of results from 30 days of tests.

11 Empirical Evaluation. Baseline-related: 100 terms related to today’s terms are selected at random. Each point is how many of the 100 appeared. A total of 30 days of experiments. [Figure label: Baseline - Related.]

12 Empirical Evaluation. Cross-correlation: not using indication weights. Each point is how many of the 100 appeared. A total of 30 days of experiments.
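The slides do not define the cross-correlation variant, so the following is only a generic sketch of lagged cross-correlation between two search-volume series: the Pearson correlation of today's term against a candidate term shifted k days later. The function name and the sample series are assumptions.

```python
from math import sqrt
from typing import Sequence


def lagged_correlation(x: Sequence[float], y: Sequence[float], k: int) -> float:
    """Pearson correlation between x[t] and y[t + k] over the overlapping range."""
    if k < 0 or len(x) != len(y) or len(x) - k < 2:
        raise ValueError("need equal-length series with at least 2 overlapping points")
    a = x[: len(x) - k]  # today's term
    b = y[k:]            # candidate term, k days later
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a constant series carries no correlation signal
    return cov / sqrt(var_a * var_b)


# Hypothetical series: y repeats x's spikes two days later.
x = [0, 1, 0, 0, 1, 0, 0]
y = [0, 0, 0, 1, 0, 0, 1]
at_lag_2 = lagged_correlation(x, y, 2)
at_lag_0 = lagged_correlation(x, y, 0)
```

Here the correlation is highest at a lag of two days, which is the kind of signal a cross-correlation method would use to pair a term with a later one.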

13 Conclusions. A new method for predicting global future events from their patterns in the past. A novel application of an aggregated collection of search queries, represented as a time series for each search term. A testing methodology for evaluating such news prediction algorithms.

14 Problems. Data collection: how do we collect large masses of data representing events over time? Identifying events: the search volume contains navigational queries (popular websites), transactional queries, etc. Prediction issues: data mining of large numbers of candidates for prediction, noise in the data, finding patterns, and coping with periodic patterns.

15 Future Work. Causality model: extraction from hyperlinks between news articles. Abstraction and generalizations: holonyms, hypernyms, synonyms. Going beyond first-order (direct) prediction: Bayesian networks, HMM.

16 Parameter Tuning

