1 Blog Cascade Affinity: Analysis and Prediction 2009 ACM Advisor : Dr. Koh Jia-Ling Speaker : Chou-Bin Fan Date : 2009.11.30.

Slides:



Advertisements
Similar presentations
Term Level Search Result Diversification DATE : 2013/09/11 SOURCE : SIGIR’13 AUTHORS : VAN DANG, W. BRUCE CROFT ADVISOR : DR.JIA-LING, KOH SPEAKER : SHUN-CHEN,
Advertisements

Presented by Xinyu Chang
Linking Named Entity in Tweets with Knowledge Base via User Interest Modeling Date : 2014/01/22 Author : Wei Shen, Jianyong Wang, Ping Luo, Min Wang Source.
Time-sensitive Personalized Query Auto-Completion
Query Dependent Pseudo-Relevance Feedback based on Wikipedia SIGIR ‘09 Advisor: Dr. Koh Jia-Ling Speaker: Lin, Yi-Jhen Date: 2010/01/24 1.
Truth Discovery with Multiple Confliction Information Providers on the Web Xiaoxin Yin, Jiawei Han, Philip S.Yu Industrial and Government Track short paper.
Existing tools to analyze Blogosphere. IceRocket Ice Spy – Spy on what others are searching. Blog Trends – Identifies the trend of particular terms in.
1 Learning to Detect Objects in Images via a Sparse, Part-Based Representation S. Agarwal, A. Awan and D. Roth IEEE Transactions on Pattern Analysis and.
MANISHA VERMA, VASUDEVA VARMA PATENT SEARCH USING IPC CLASSIFICATION VECTORS.
(ACM KDD 09’) Prem Melville, Wojciech Gryc, Richard D. Lawrence
Using Friendship Ties and Family Circles for Link Prediction Elena Zheleva, Lise Getoor, Jennifer Golbeck, Ugur Kuter (SNAKDD 2008)
Title Extraction from Bodies of HTML Documents and its Application to Web Page Retrieval Microsoft Research Asia Yunhua Hu, Guomao Xin, Ruihua Song, Guoping.
1 Opinion Spam and Analysis (WSDM,08)Nitin Jindal and Bing Liu Date: 04/06/09 Speaker: Hsu, Yu-Wen Advisor: Dr. Koh, Jia-Ling.
Tag Clouds Revisited Date : 2011/12/12 Source : CIKM’11 Speaker : I- Chih Chiu Advisor : Dr. Koh. Jia-ling 1.
Leveraging Conceptual Lexicon : Query Disambiguation using Proximity Information for Patent Retrieval Date : 2013/10/30 Author : Parvaz Mahdabi, Shima.
1 Entity Discovery and Assignment for Opinion Mining Applications (ACM KDD 09’) Xiaowen Ding, Bing Liu, Lei Zhang Date: 09/01/09 Speaker: Hsu, Yu-Wen Advisor:
Probabilistic Question Recommendation for Question Answering Communities Mingcheng Qu, Guang Qiu, Xiaofei He, Cheng Zhang, Hao Wu, Jiajun Bu, Chun Chen.
1 Retrieval and Feedback Models for Blog Feed Search SIGIR 2008 Advisor : Dr. Koh Jia-Ling Speaker : Chou-Bin Fan Date :
When Experts Agree: Using Non-Affiliated Experts To Rank Popular Topics Meital Aizen.
1 ENTROPY-BASED CONCEPT SHIFT DETECTION PETER VORBURGER, ABRAHAM BERNSTEIN IEEE ICDM 2006 Speaker: Li HueiJyun Advisor: Koh JiaLing Date:2007/11/6 1.
Exploring Online Social Activities for Adaptive Search Personalization CIKM’10 Advisor : Jia Ling, Koh Speaker : SHENG HONG, CHUNG.
A Probabilistic Graphical Model for Joint Answer Ranking in Question Answering Jeongwoo Ko, Luo Si, Eric Nyberg (SIGIR ’ 07) Speaker: Cho, Chin Wei Advisor:
Retrieval Models for Question and Answer Archives Xiaobing Xue, Jiwoon Jeon, W. Bruce Croft Computer Science Department University of Massachusetts, Google,
Date: 2013/8/27 Author: Shinya Tanaka, Adam Jatowt, Makoto P. Kato, Katsumi Tanaka Source: WSDM’13 Advisor: Jia-ling Koh Speaker: Chen-Yu Huang Estimating.
CS 533 Information Retrieval Systems.  Introduction  Connectivity Analysis  Kleinberg’s Algorithm  Problems Encountered  Improved Connectivity Analysis.
Date: 2012/4/23 Source: Michael J. Welch. al(WSDM’11) Advisor: Jia-ling, Koh Speaker: Jiun Jia, Chiou Topical semantics of twitter links 1.
Binxing Jiao et. al (SIGIR ’10) Presenter : Lin, Yi-Jhen Advisor: Dr. Koh. Jia-ling Date: 2011/4/25 VISUAL SUMMARIZATION OF WEB PAGES.
Exploiting Context Analysis for Combining Multiple Entity Resolution Systems -Ramu Bandaru Zhaoqi Chen Dmitri V.kalashnikov Sharad Mehrotra.
Improving Web Search Results Using Affinity Graph Benyu Zhang, Hua Li, Yi Liu, Lei Ji, Wensi Xi, Weiguo Fan, Zheng Chen, Wei-Ying Ma Microsoft Research.
LOGO Finding High-Quality Content in Social Media Eugene Agichtein, Carlos Castillo, Debora Donato, Aristides Gionis and Gilad Mishne (WSDM 2008) Advisor.
BioSnowball: Automated Population of Wikis (KDD ‘10) Advisor: Dr. Koh, Jia-Ling Speaker: Lin, Yi-Jhen Date: 2010/11/30 1.
Enhancing Cluster Labeling Using Wikipedia David Carmel, Haggai Roitman, Naama Zwerdling IBM Research Lab (SIGIR’09) Date: 11/09/2009 Speaker: Cho, Chin.
Google’s Deep-Web Crawl By Jayant Madhavan, David Ko, Lucja Kot, Vignesh Ganapathy, Alex Rasmussen, and Alon Halevy August 30, 2008 Speaker : Sahana Chiwane.
Retrieval of Highly Related Biomedical References by Key Passages of Citations Rey-Long Liu Dept. of Medical Informatics Tzu Chi University Taiwan.
Algorithmic Detection of Semantic Similarity WWW 2005.
1 Blog site search using resource selection 2008 ACM CIKM Advisor : Dr. Koh Jia-Ling Speaker : Chou-Bin Fan Date :
Finding Experts Using Social Network Analysis 2007 IEEE/WIC/ACM International Conference on Web Intelligence Yupeng Fu, Rongjing Xiang, Yong Wang, Min.
LOGO Identifying the Influential Bloggers in a Community Nitin Agarwal, Huan Liu, Lei Tang and Philip S. Yu WSDM 2008 Advisor : Dr. Koh Jia-Ling Speaker.
LOGO Identifying Opinion Leaders in the Blogosphere Xiaodan Song, Yun Chi, Koji Hino, Belle L. Tseng CIKM 2007 Advisor : Dr. Koh Jia-Ling Speaker : Tu.
A Classification-based Approach to Question Answering in Discussion Boards Liangjie Hong, Brian D. Davison Lehigh University (SIGIR ’ 09) Speaker: Cho,
Intelligent Database Systems Lab 國立雲林科技大學 National Yunlin University of Science and Technology 1 Improving the performance of personal name disambiguation.
Improved Video Categorization from Text Metadata and User Comments ACM SIGIR 2011:Research and development in Information Retrieval - Katja Filippova -
1 Adaptive Subjective Triggers for Opinionated Document Retrieval (WSDM 09’) Kazuhiro Seki, Kuniaki Uehara Date: 11/02/09 Speaker: Hsu, Yu-Wen Advisor:
LINDEN : Linking Named Entities with Knowledge Base via Semantic Knowledge Date : 2013/03/25 Resource : WWW 2012 Advisor : Dr. Jia-Ling Koh Speaker : Wei.
1 Friends and Neighbors on the Web Presentation for Web Information Retrieval Bruno Lepri.
CONTEXTUAL SEARCH AND NAME DISAMBIGUATION IN USING GRAPHS EINAT MINKOV, WILLIAM W. COHEN, ANDREW Y. NG SIGIR’06 Date: 2008/7/17 Advisor: Dr. Koh,
11 A Classification-based Approach to Question Routing in Community Question Answering Tom Chao Zhou 1, Michael R. Lyu 1, Irwin King 1,2 1 The Chinese.
Object Recognition as Ranking Holistic Figure-Ground Hypotheses Fuxin Li and Joao Carreira and Cristian Sminchisescu 1.
KAIST TS & IS Lab. CS710 Know your Neighbors: Web Spam Detection using the Web Topology SIGIR 2007, Carlos Castillo et al., Yahoo! 이 승 민.
Extracting Query Facets From Search Results Date : 2013/08/20 Source : SIGIR’13 Authors : Weize Kong and James Allan Advisor : Dr.Jia-ling, Koh Speaker.
Refined Online Citation Matching and Adaptive Canonical Metadata Construction CSE 598B Course Project Report Huajing Li.
Predicting User Interests from Contextual Information R. W. White, P. Bailey, L. Chen Microsoft (SIGIR 2009) Presenter : Jae-won Lee.
Learning to Rank: From Pairwise Approach to Listwise Approach Authors: Zhe Cao, Tao Qin, Tie-Yan Liu, Ming-Feng Tsai, and Hang Li Presenter: Davidson Date:
Predicting Short-Term Interests Using Activity-Based Search Context CIKM’10 Advisor: Jia Ling, Koh Speaker: Yu Cheng, Hsieh.
1 Patterns of Cascading Behavior in Large Blog Graphs Jure Leskoves, Mary McGlohon, Christos Faloutsos, Natalie Glance, Matthew Hurst SDM 2007 Date:2008/8/21.
LOGO Comments-Oriented Blog Summarization by Sentence Extraction Meishan Hu, Aixin Sun, Ee-Peng Lim (ACM CIKM’07) Advisor : Dr. Koh Jia-Ling Speaker :
1 Discovering Web Communities in the Blogspace Ying Zhou, Joseph Davis (HICSS 2007)
CiteData: A New Multi-Faceted Dataset for Evaluating Personalized Search Performance CIKM’10 Advisor : Jia-Ling, Koh Speaker : Po-Hsien, Shih.
Meta-Path-Based Ranking with Pseudo Relevance Feedback on Heterogeneous Graph for Citation Recommendation By: Xiaozhong Liu, Yingying Yu, Chun Guo, Yizhou.
Ontology Engineering and Feature Construction for Predicting Friendship Links in the Live Journal Social Network Author:Vikas Bahirwani 、 Doina Caragea.
Using Blog Properties to Improve Retrieval Gilad Mishne (ICWSM 2007)
PREDICTION ON TWEET FROM DYNAMIC INTERACTION Group 19 Chan Pui Yee Wong Tsz Wing Yeung Chun Kit.
QUERY-PERFORMANCE PREDICTION: SETTING THE EXPECTATIONS STRAIGHT Date : 2014/08/18 Author : Fiana Raiber, Oren Kurland Source : SIGIR’14 Advisor : Jia-ling.
An Effective Statistical Approach to Blog Post Opinion Retrieval Ben He, Craig Macdonald, Jiyin He, Iadh Ounis (CIKM 2008)
Using Friendship Ties and Family Circles for Link Prediction
Location Recommendation — for Out-of-Town Users in Location-Based Social Network Yina Meng.
Jinhong Jung, Woojung Jin, Lee Sael, U Kang, ICDM ‘16
Measuring the Latency of Depression Detection in Social Media
Discovery of Blog Communities based on Mutual Awareness
Bug Localization with Combination of Deep Learning and Information Retrieval A. N. Lam et al. International Conference on Program Comprehension 2017.
Presentation transcript:

1 Blog Cascade Affinity: Analysis and Prediction 2009 ACM Advisor : Dr. Koh Jia-Ling Speaker : Chou-Bin Fan Date :

2 Outline Introduction Data Preparation Analysis of Cascade Features  Elapsed Time  Number of Friends  Popularity of Participants  Number of Participants  Citing Factor Cascade Affinity Prediction Experiments Conclusions & Future Work

3 Introduction A blog cascade is a group of posts linked together discussing about the same topic, and cascade a ffi nity refers to the phenomenon of a blog’s inclination to join a specific cascade. blog b 2 visited b 1 and wrote a post p 2 in response to this topic of discussion and explicitly created a hyperlink to p1.

4 Introduction

5 Data Preparation We extracted our blog data set in September, 2008 using Technorati API. We first selected the group of top 100 blogs indexed by Technorati as seeds. From these seeds, we retrieved the blogs that had linked to these seeds in their posts, and then we iteratively retrieve the posts that linked to the previous level. We additionally extracted blog-to-blog relationships with weighted edges where the weight of an edge from b i to b j indicates the number of times b i has cited b j ’s posts.

6 Data Preparation

7 The next step is to post-process the extracted cascades to eliminate the ones which have been there not more than a month till the time T ∗. C = {c i | T i ≦ T* - 30 } Many participants may join these cascades after time T ∗ and consequently adversely a ff ect the modeling of the ground truth based on the features set.

8 Features -Elapsed Time It refers to the di ff erence between the time a blogger joins a cascade and the cascade creation time.

9 Features -Number of Friends Introduce the notion of quasi-friend to model friendship within blogosphere based on post citings. A quasi-friend indicates that b 2 probably often reads b 1 ’s blog. (Directed) Given a value of K,we denote the set of quasi-friends of a blog b j as Fj = {f 1,f 2,...,f r }, where each element fr is a blog.

10 Features -Number of Friends (contd.) Γ i (α) : the set of blogs having α quasi-friends in a cascade c i, consideration the time t i (j). F j t i (j) denotes the set of blogs that became a quasi-friend of j’s before time t i (j). Φ i t i (j) is the set of blogs that appeared in c i before time t i (j).

11 Features -Popularity of Participants The e ff ect of popularity of cascade participants on the cascade a ffi nity of a blogger. The popularity of a cascade c i is the highest rank of the blogs in the cascade. The rank of each blog is based on its in-degree.

12 Features -Number of Participants A blogger may have stronger a ffi nity to a cascade which has absorbed a lot of participants. To compute the probability of joining a cascade as a function of the number of participants existing in the cascade. The probability of joining a cascade with β participants, referred to as cascade a ffi nity probability (denoted as ( β)), can be computed as follows.

13 Features -Citing Factor This feature is based on the hypothesis that a blogger b j is more inclined to join a cascade if b j likes to cite others’ posts.

14 Features - ANOVA Test For each feature, we compare the values between blogs which finally joined a cascade and those did not using the one-way analysis of variance (ANOVA) to test whether the di ff erence is really caused by the feature values or just by noise in the data. The result shows that the p-value for citation factor is while other features are all less than It indicates that the di ff erent values of citation factor in both groups should only be considered as noise.

15 Candidate Blog Extraction The candidate blogs are those that have at least one quasi-friend in the given cascade. For a given cascade,the candidate blogs cand(c i ) that may join c i is given by the following equation.

16 Cascade Joining Prediction We would like to compute a score for each candidate blog extracted with respect to a cascade indicating its likelihood of joining the cascade. Based on the features identified in the preceding section,the prediction task can be naturally formulated as a binary classification task. We adopted Support Vector Machines (svm) classifier.

17 Experimental Setting We conducted experiments on our data set using 5-fold cross validation. Precision, denoted by Pr, is the percentage of blogs that eventually joined the target cascade among all blogs predicted to be joining. Recall, denoted by Re, is the percentage of the correct predictions among all blogs that eventually joined the target cascade. F 1 measure = (2 x Pr x Re) / (Pr + Re)

18 Experimental Results

19 Experimental Results Significance of time for modeling number of friends  If we discard temporal issue in modeling quasi-friends then the definition of Γ i (α) is modified as follows:

20 Conclusions Our proposed features are derived from structural information of the cascades without any content analysis of posts/blogs. We performed ANOVA test on these features and showed that all of them, except citation factor, have significant impact on cascade a ffi nity. The cascade a ffi nity prediction is then formulated as a classification task and SVM classifier is employed in our experiments. Using the prediction scores from svm, the candidate blogs can be ranked according to their probability of joining a cascade. We have evaluated di ff erent combinations of the features and our results on cascade a ffi nity prediction is consistent with the ANOVA test.

21 Future Work We intend to investigate how to exploit cascade contents e ff ectively along with structural features for predicting cascade a ffi nity. In particular, recent results showed that cascade types may indicate the genre of the content in a cascade. We wish to exploit them in the context of cascade a ffi nity prediction.