Prediction of Retweet Cascade Size over Time

Presentation transcript:

Andrey Kupavskii, Liudmila Ostroumova, Alexey Umnov, Svyatoslav Usachev, Pavel Serdyukov, Gleb Gusev, Andrey Kustarev
{kupavskiy, ostroumova-la, umnov, kaathewise, pavser, gleb57, kustarev}@yandex-team.ru

Takeaway: waiting for 30 seconds before making the prediction makes it much more precise.

Prediction task, second variant: we also utilize the information about the spread of the cascade up to the moment T0.

Algorithm: We train gradient boosted decision tree models. One of them approximates the natural logarithm of the size of the cascade at the moment T, minimizing the mean squared error. Two others perform binary classification that sorts out large epidemics: tweets that gained more than 4000 retweets and tweets in the [1600, 3999] range.

Features: Social and time-sensitive features of the initial node, content features, and features of the infected nodes up to the moment T0. PageRank in the retweet graph can be used as a measure of user influence.

New features:
- PageRank in the retweet graph: the vertices of the retweet graph are users, and there is an edge (A, B) with weight w if user B retweeted user A w times. We calculate PageRank for both the weighted and the unweighted graph.
- The flow of the cascade: for each edge from a participating user to one of his followers, we define a time-dependent activity of the follower and of the edge. Informally, the flow of the initial part of a cascade is the sum of activities over all edges between participating users and their followers.

Conclusions: The predictions have high precision. Using the initial spread of the tweet increases the quality of the prediction significantly. New features such as PageRank in the retweet graph and the flow of the cascade are important for the prediction.

Future work:
- Analysis of other measures of tweet popularity
- Study of the cascade growth in more detail
- Comparison of different measures of user influence
- Modeling the tweet spread from the epidemiological point of view
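The retweet-graph PageRank described under the new features can be sketched with a small power-iteration routine. This is a minimal illustration, not the authors' implementation; the edge convention follows the slide, and the damping factor 0.85 and iteration count are assumed defaults:

```python
def pagerank(edges, damping=0.85, iters=100):
    """Weighted PageRank by power iteration over a retweet graph.

    `edges` maps (src, dst) pairs to weights; per the slide, an edge
    (A, B) with weight w means user B retweeted user A w times.
    Setting every weight to 1 gives the unweighted variant.
    """
    nodes = sorted({v for edge in edges for v in edge})
    n = len(nodes)
    out_weight = {v: 0.0 for v in nodes}
    for (src, _dst), w in edges.items():
        out_weight[src] += w
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        nxt = {v: (1.0 - damping) / n for v in nodes}
        # Each node passes its rank along outgoing edges, proportionally
        # to the edge weights.
        for (src, dst), w in edges.items():
            nxt[dst] += damping * rank[src] * w / out_weight[src]
        # Nodes with no outgoing edges redistribute their rank uniformly.
        dangling = sum(rank[v] for v in nodes if out_weight[v] == 0.0)
        for v in nodes:
            nxt[v] += damping * dangling / n
        rank = nxt
    return rank

# Toy retweet graph: bob retweeted alice 3 times, carol retweeted
# alice once and bob twice.
edges = {("alice", "bob"): 3, ("alice", "carol"): 1, ("bob", "carol"): 2}
ranks = pagerank(edges)
```

The returned ranks form a probability distribution over users, so they can be fed directly into the model as per-user influence features.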
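The flow of the cascade is only defined informally on the slide. The sketch below sums a time-dependent activity over edges from participating users to their followers; the exponential decay exp(-age/tau), the decay constant, and all names are illustrative assumptions, since the slide does not specify the activity function:

```python
import math

def cascade_flow(participants, followers, tweet_time, now, tau=60.0):
    """Sum of edge activities from participating users to their followers.

    participants: set of users who have (re)tweeted by `now`
    followers:    dict user -> iterable of follower ids
    tweet_time:   dict user -> time (seconds) of that user's (re)tweet
    The activity exp(-(now - t) / tau) is an assumed stand-in; the
    slide only says the activity depends on time.
    """
    flow = 0.0
    for user in participants:
        age = now - tweet_time[user]
        for follower in followers.get(user, ()):
            # Count only edges to followers not yet in the cascade
            # (an assumed design choice).
            if follower not in participants:
                flow += math.exp(-age / tau)
    return flow

# Toy cascade: "a" tweeted at t=0, "b" retweeted at t=10.
participants = {"a", "b"}
followers = {"a": ["b", "c"], "b": ["d"]}
tweet_time = {"a": 0.0, "b": 10.0}
flow = cascade_flow(participants, followers, tweet_time, now=10.0)
```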
Motivation: sociology, breaking news detection, viral marketing, freshness of the search engine layout.

Viral marketing example: You spread an advertisement and want to get 1000 retweets within a day. You choose the set of initial users and then try to predict whether you will get 1000 retweets or not. If you wait for some time and use the information about the initial spread of the cascade, you can make the prediction more accurate.

Prediction: We predict the number of retweets the tweet will gain during the time T since the initial tweet. There are two variants of the prediction task; the first one utilizes only the information available at the moment of the initial tweet (the second, described above, also uses the spread of the cascade up to the moment T0).

Other features: Average local and global retweet ratios of the initial user up to the moment T; the number of retweets at the moment T0; the sum of average retweet ratios, PageRanks, and the total number of followers of the infected users at the moment T0.

Experimental results:

F1-score for the binary classification of the two groups of tweets that gained the largest number of retweets, using different sets of features:

  Tweet class     Baseline   All     No flow   No PR
  [1600, 3999]    0.659      0.775   0.760     0.761
  >= 4000         0.436      0.670   0.657     0.632

Mean squared error of the logarithm of the predicted cascade size at the moment T (if the error equals x then, roughly speaking, the actual and predicted numbers of retweets differ on average by a factor of e^x):

  T0 = 0s,   T = 15m:  0.981 (baseline),  0.957 (+ new features)
  T0 = 0s,   T = 1w:   1.243 (baseline),  1.226 (+ new features)
  T0 = 15s,  T = 15m:  0.796
  T0 = 15s,  T = 1w:   1.050
  T0 = 30s,  T = 15m:  0.588
  T0 = 30s,  T = 1w:   0.838
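The multiplicative reading of the log-scale error can be made concrete with the reported T = 15m numbers: per the slide's own interpretation, an error of x corresponds to the actual and predicted retweet counts differing by roughly a factor of e^x.

```python
import math

# Log-scale errors reported in the table for T = 15 minutes.
error_at_posting = 0.981   # T0 = 0, baseline features
error_after_30s = 0.588    # T0 = 30s, with the initial spread

# Typical multiplicative gap between actual and predicted counts.
factor_at_posting = math.exp(error_at_posting)
factor_after_30s = math.exp(error_after_30s)
# Waiting 30 seconds shrinks the typical gap from about 2.7x to about 1.8x,
# which is the slide's takeaway in concrete terms.
```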