Kira Radinsky, Sagie Davidovich, Shaul Markovitch. Computer Science Department, Technion – Israel Institute of Technology.

Presentation transcript:

Goal

Solution Outline
- Identify events that occur today
  - More than 0.5 billion daily searches on the web (2008)
  - Many queries are related to current events
- Analyze which events tended to follow events like today's in the past
  - History repeats itself
  - Query log archives

Knowledge Sources
- Google Hot Trends
- Technorati
- Online news (Newzingo)
[Figure: time axis labeled July 08, Aug 08, Sep 08]

Identifying Events
[Figure: time axis labeled July 08, Aug 08, Sep 08; labels: Hurricane Ivan, Hurricane Wilma, Hurricane Dean, Hurricane Gustav, Hurricane Katrina]

Peak Detection Algorithm
Each maximum point m_y has at most two neighboring minimum points. We consider a maximum point a peak if:
1. The local maximum m_y > Δ_1 (high-pass filter).
2. The difference between the point m_y and the lowest of its neighboring minimum points is above Δ_2.
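
The two peak conditions above can be sketched in Python. This is a minimal illustration, not the authors' implementation: the function name `find_peaks`, the use of strict comparisons to find local maxima, and the choice to treat series endpoints as neighboring minima are all assumptions.

```python
from typing import List


def find_peaks(series: List[float], delta1: float, delta2: float) -> List[int]:
    """Return indices of local maxima that satisfy both peak conditions.

    Assumptions (not stated on the slide): a local maximum is strictly
    greater than both neighbors, and a series endpoint counts as a
    neighboring minimum when the descent from the maximum reaches it.
    """
    peaks = []
    n = len(series)
    for i in range(1, n - 1):
        value = series[i]
        if not (value > series[i - 1] and value > series[i + 1]):
            continue  # not a local maximum

        # Walk downhill to the neighboring minimum on the left.
        left = i
        while left > 0 and series[left - 1] < series[left]:
            left -= 1

        # Walk downhill to the neighboring minimum on the right.
        right = i
        while right < n - 1 and series[right + 1] < series[right]:
            right += 1

        lowest_neighbor = min(series[left], series[right])

        # Condition 1: the maximum exceeds delta1 (high-pass filter).
        # Condition 2: its rise above the lowest neighboring minimum exceeds delta2.
        if value > delta1 and value - lowest_neighbor > delta2:
            peaks.append(i)
    return peaks


volume = [1, 2, 10, 3, 2, 4, 3, 9, 1]
print(find_peaks(volume, delta1=5, delta2=4))  # [2, 7]
```

The small bump at index 5 is rejected by the first condition, which is the role the slide assigns to the high-pass filter.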

Prediction
Goal: For each candidate term, evaluate its likelihood of appearing in the future, given today's terms.

Indication Weight
1. How many of the peaks of w_2 (future candidate) appeared k days after w_1 (today's term).
2. Saliency of w_1: significance of the peak in the search volume.

[Figure labels: Today's salient terms; Future candidate terms; Indication weight on the candidate; Likelihood to appear in k days]
[Example terms in figure: hurricane, Storm, Flood, Weather, Evacuation, Gas, Economics, Taliban, War, South Asia, china, pope, texans]
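
One way to read the indication weight is sketched below. The slide does not give a formula, so several choices here are assumptions: peaks are represented as sets of day indices, the count is normalized into a fraction of w_2's peaks, and the candidate score is a saliency-weighted sum over today's terms. The names `indication_weight` and `score_candidate` are hypothetical.

```python
from typing import List, Set, Tuple


def indication_weight(peaks_w1: Set[int], peaks_w2: Set[int], k: int) -> float:
    """Fraction of w2's peak days that fall exactly k days after a peak of w1.

    Assumption: the slide says "how many"; normalizing by the number of
    w2 peaks is our choice, made so that weights are comparable across terms.
    """
    if not peaks_w2:
        return 0.0
    followed = sum(1 for day in peaks_w2 if (day - k) in peaks_w1)
    return followed / len(peaks_w2)


def score_candidate(
    candidate_peaks: Set[int],
    todays_terms: List[Tuple[Set[int], float]],
    k: int,
) -> float:
    """Combine indication weights from today's terms into one score.

    Each entry of todays_terms is (historical peak days, saliency today).
    Assumption: a saliency-weighted sum; the slides do not state the
    combination rule.
    """
    return sum(
        saliency * indication_weight(peaks, candidate_peaks, k)
        for peaks, saliency in todays_terms
    )


hurricane_peaks = {1, 10, 20}
gas_peaks = {3, 12, 30}
print(indication_weight(hurricane_peaks, gas_peaks, k=2))
```

In this toy example two of the three "gas" peaks occur two days after a "hurricane" peak, so the weight is two thirds.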

[Figure: example involving the terms Hurricane and Gas]

Empirical Methodology
- Testing on an aggregation of 4500 online news sources
- What does it mean "to appear in the news"?
  - The term appears significantly more times than its average over the past year
- Metric: precision at 100
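
The evaluation criterion can be sketched as follows. The slide does not define "significantly more", so the multiplicative threshold `factor` is an assumption, as are the function names and the choice to divide by the number of predictions actually made when fewer than k are available.

```python
from typing import List, Set


def appears_in_news(
    count_today: float, past_year_counts: List[float], factor: float = 2.0
) -> bool:
    """True if today's mention count exceeds `factor` times the past-year average.

    Assumption: `factor` stands in for "significantly more"; the source
    does not give the actual significance test.
    """
    if not past_year_counts:
        return False
    average = sum(past_year_counts) / len(past_year_counts)
    return count_today > factor * average


def precision_at_k(predicted: List[str], appeared: Set[str], k: int = 100) -> float:
    """Share of the top-k predicted terms that did appear in the news."""
    top = predicted[:k]
    if not top:
        return 0.0
    hits = sum(1 for term in top if term in appeared)
    return hits / len(top)


predicted_terms = ["gas", "flood", "pope", "evacuation"]
terms_in_news = {"gas", "evacuation", "storm"}
print(precision_at_k(predicted_terms, terms_in_news, k=4))  # 0.5
```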

Empirical Evaluation
- Baseline method: what happens today happens tomorrow
- Each point is how many of the 100 predicted terms appeared
- A total of 30 days of experiments

Empirical Evaluation
- Baseline method: what happens today happens tomorrow
- Each point is an average of results from 30 days of tests

Empirical Evaluation
- Baseline-related: 100 terms related to today's terms are selected at random
- Each point is how many of the 100 predicted terms appeared
- A total of 30 days of experiments

Empirical Evaluation
- Cross-correlation: not using indication weights
- Each point is how many of the 100 predicted terms appeared
- A total of 30 days of experiments
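
The slide names cross-correlation without giving a formula. One common reading, shown here purely as an assumption, is the Pearson correlation between today's term series and the candidate's series shifted by k days. The function name `lagged_correlation` is hypothetical.

```python
from math import sqrt
from typing import Sequence


def lagged_correlation(x: Sequence[float], y: Sequence[float], k: int) -> float:
    """Pearson correlation between x[t] and y[t + k] for a lag of k >= 0 days.

    Returns 0.0 when either aligned series is constant (correlation undefined).
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    n = min(len(x), len(y) - k)  # number of aligned (x[t], y[t + k]) pairs
    if n < 2:
        raise ValueError("not enough overlapping points for this lag")

    a = list(x[:n])
    b = list(y[k:k + n])
    mean_a = sum(a) / n
    mean_b = sum(b) / n

    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)


today_term = [0, 1, 0, 0, 1, 0, 0]
candidate = [0, 0, 0, 1, 0, 0, 1]  # same shape, shifted two days later
print(lagged_correlation(today_term, candidate, k=2))
```

Unlike the indication weight, this measure uses the full volume series rather than detected peaks only.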

Conclusions
- A new method for predicting global future events using their patterns in the past.
- A novel application of an aggregated collection of search queries, represented as a time series for each search term.
- A testing methodology for evaluating such news prediction algorithms.

Problems
- Data collection: how do we collect large masses of data representing events over time?
- Identifying events: the search volume contains navigational queries (popular websites), transactional queries, etc.
- Prediction issues: data mining over large numbers of candidates for prediction, noise in the data, finding patterns, and coping with periodic patterns.

Future Work
- Causality model
  - Extraction from hyperlinks between news articles
- Abstraction and generalizations:
  - Holonyms
  - Hypernyms
  - Synonyms
- Going beyond first-order (direct) prediction
  - Bayesian networks, HMM

Parameter Tuning