Predicting Long-Term Impact of CQA Posts: A Comprehensive Viewpoint
Yuan Yao, joint work with Hanghang Tong, Feng Xu, and Jian Lu
KDD 2014, Aug 24-27
Roadmap
- Background and Motivations
- Modeling Multi-aspect
- Computation Speedup
- Empirical Evaluations
- Conclusions
CQA (Community Question Answering)
What is the Long-Term Impact of a Q/A post?
Q: How many users will find it beneficial?
Survey: we conducted a survey asking users which metric best captures long-term impact; 79.4% of users chose the voting score.
Analogy: to some extent, the voting score resembles the number of citations a research paper receives in the scientific publication domain. It reflects the net number of users who have a positive attitude toward the post.
Challenges
Q: Why not off-the-shelf data mining algorithms?
Challenge 1: Multi-aspect
- C1.1 Coupling between questions and answers
- C1.2 Feature non-linearity
- C1.3 Posts dynamically arrive
Challenge 2: Efficiency
C1.1 Coupling
The predicted question impact Yq and the predicted answer impact Ya exhibit a strong positive correlation (voting consistency), linking the question features Fq and the answer features Fa [Yao+@ASONAM'14].
[Yao+@ASONAM'14] Y Yao, H Tong, T Xie, L Akoglu, F Xu, J Lu. Joint Voting Prediction for Questions and Answers in CQA. ASONAM 2014.
C1.1 Coupling
LIP-M [Yao+@ASONAM'14] combines three terms: (1) question prediction, (2) answer prediction, and (3) voting-consistency regularization.
C1.2 Non-linearity
Solution: the kernel trick (as in SVMs): apply a Mercer kernel and use the kernel matrix as the new feature matrix.
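The kernel trick above can be sketched in a few lines. This is a minimal illustration (not the paper's feature pipeline), assuming an RBF kernel and toy data; the kernel matrix takes the place of the raw feature matrix in a ridge regression:

```python
import numpy as np

def rbf_kernel(X, Z, gamma=0.5):
    """Mercer (RBF) kernel: K[i, j] = exp(-gamma * ||x_i - z_j||^2)."""
    sq = (X**2).sum(1)[:, None] + (Z**2).sum(1)[None, :] - 2.0 * X @ Z.T
    return np.exp(-gamma * sq)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))                     # 50 posts, 3 features (toy data)
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=50)  # a non-linear target

# Use the kernel matrix as the "new feature matrix":
# solve (K + lam*I) alpha = y in the dual instead of the primal ridge problem.
K = rbf_kernel(X, X)
alpha = np.linalg.solve(K + 1.0 * np.eye(50), y)
y_hat = K @ alpha                                # in-sample predictions
```

The model stays linear in the kernel space while capturing non-linear structure in the original features.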
C1.3 Dynamics
Solution: recursive least squares regression [Haykin2005]: the current model (Model_t), built from the existing examples (x1, y1), ..., (xt, yt), is updated with the new example (x_{t+1}, y_{t+1}) to obtain Model_{t+1}, without retraining from scratch.
[Haykin2005] S Haykin. Adaptive Filter Theory. 2005.
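A minimal recursive least squares step, in the spirit of [Haykin2005] (variable names are illustrative, not the paper's): each new example refines the model in O(d^2) time, and streaming all examples reproduces the batch ridge solution exactly:

```python
import numpy as np

def rls_update(P, w, x, y):
    """One recursive-least-squares step on a new example (x, y).

    P is the current inverse covariance (lam*I + X^T X)^{-1};
    w is the current weight vector. Both are updated in O(d^2).
    """
    Px = P @ x
    k = Px / (1.0 + x @ Px)      # gain vector
    w = w + k * (y - x @ w)      # correct the model with the new example
    P = P - np.outer(k, Px)      # Sherman-Morrison rank-one update of the inverse
    return P, w

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 4))
y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + 0.01 * rng.normal(size=100)

lam = 0.1
P, w = np.eye(4) / lam, np.zeros(4)   # start from the prior (lam*I)^{-1}, w = 0
for xi, yi in zip(X, y):
    P, w = rls_update(P, w, xi, yi)

# Batch ridge solution for comparison: identical up to round-off.
w_batch = np.linalg.solve(lam * np.eye(4) + X.T @ X, X.T @ y)
```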
This Paper
Q1: How can we comprehensively capture all three aspects (coupling, non-linearity, and dynamics) in one algorithm?
Q2: How can we make the long-term impact prediction algorithm efficient?
Roadmap → Modeling Multi-aspect
Modeling Non-linearity
Basic idea: kernelize LIP-M.
Details (LIP-KM): the three terms (question prediction, answer prediction, voting-consistency regularization) are kernelized; a closed-form solution exists, with complexity O(n^3).
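Where the O(n^3) comes from: the closed-form solution requires solving a dense n x n linear system. A sketch with plain kernel ridge regression, omitting LIP-KM's voting-consistency coupling between questions and answers:

```python
import numpy as np

def kernel_ridge_fit(K, y, lam=1.0):
    """Closed-form dual solution: alpha = (K + lam*I)^{-1} y.

    The dense n x n solve is the O(n^3) bottleneck removed later.
    """
    n = K.shape[0]
    return np.linalg.solve(K + lam * np.eye(n), y)

rng = np.random.default_rng(2)
X = rng.normal(size=(80, 5))
y = rng.normal(size=80)

K = X @ X.T                      # linear kernel, for simplicity
alpha = kernel_ridge_fit(K, y)

# Prediction for a new post x: k(x)^T alpha, where k(x)[i] = <x, x_i>
x_new = rng.normal(size=5)
y_new = (X @ x_new) @ alpha
```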
Modeling Dynamics
Basic idea: recursively update LIP-KM.
Details (LIP-KIM): the matrix inversion lemma reduces the per-update complexity from O(n^3) to O(n^2).
Roadmap → Computation Speedup
Approximation Method (1)
Basic idea: compress the kernel matrix.
Details (LIP-KIMA): 1) decompose the kernel matrix separately (Nyström approximation, with SVD on X1 and eigen-decompositions, compressing n*n to n*r); 2) make the decomposition reusable across updates; 3) apply the decomposition in LIP-KIM.
Complexity: O(n^2) → O(n).
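A minimal Nyström sketch (illustrative only; the paper additionally makes the decomposition reusable across updates): sample r landmark columns C and the r x r landmark block W, and approximate K ≈ C W^+ C^T, replacing the n x n matrix with a rank-r factorization:

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(200, 5))
K = X @ X.T                              # full n x n kernel (rank <= 5 here)

r = 20                                   # number of landmark columns
idx = rng.choice(200, size=r, replace=False)
C = K[:, idx]                            # n x r sampled columns
W = K[np.ix_(idx, idx)]                  # r x r landmark block
K_approx = C @ np.linalg.pinv(W) @ C.T   # rank-r Nystrom reconstruction

rel_err = np.linalg.norm(K - K_approx) / np.linalg.norm(K)
```

Because the toy kernel has rank 5 and r = 20 landmarks, the reconstruction here is essentially exact; in general the error depends on the kernel's effective rank relative to r.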
Approximation Method (2)
Basic idea: filter out less informative examples.
Details (LIP-KIMAA): built upon LIP-KIMA, each arriving example (x_{t+1}, y_{t+1}) passes through a filtering step; only the examples that survive filtering are used to update Model_t into Model_{t+1}.
Complexity: O(n) → sub-linear.
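One simple instantiation of the filtering idea (an assumption on our part; the paper's actual criterion may differ): skip an incoming example when the current model already predicts it within a tolerance, so only informative examples trigger the costlier model update:

```python
import numpy as np

def filter_stream(examples, predict, update, tol=0.25):
    """Process a stream, updating the model only on informative examples."""
    n_updates = 0
    for x, y in examples:
        if abs(predict(x) - y) > tol:    # current model is wrong enough
            update(x, y)                 # pay the update cost
            n_updates += 1
    return n_updates                     # examples that were NOT filtered out

# Toy usage with a running-mean "model" (purely illustrative).
state = {"mean": 0.0, "n": 0}
def predict(x):
    return state["mean"]
def update(x, y):
    state["n"] += 1
    state["mean"] += (y - state["mean"]) / state["n"]

rng = np.random.default_rng(5)
data = [(None, 1.0 + 0.1 * rng.normal()) for _ in range(100)]
n_updates = filter_stream(data, predict, update)
```

Once the model stabilizes, most arriving examples are filtered out, which is where the sub-linear behavior comes from.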
Summary
Naming convention: K = non-linearity (kernel), I = dynamics (incremental), M = coupling, A = approximation.
Existing work: Ridge Regression; LIP-K (Kernel Ridge Regression); LIP-I (Recursive Ridge Regression); LIP-M (CoPs); LIP-KI (Recursive Kernel Ridge Regression); LIP-IM.
Proposed in this paper: LIP-KM, O(n^3); LIP-KIM, O(n^2); LIP-KIMA, O(n); LIP-KIMAA, sub-linear.
Each added aspect (K, I, M) makes the model more comprehensive; each approximation step (A) makes it more scalable.
Roadmap → Empirical Evaluations
Experiment Setup
Datasets: Stack Overflow and Mathematics Stack Exchange (http://blog.stackoverflow.com/category/cc-wiki-dump/).
Features: content (bag-of-words) and contextual features.
Data split over time: an initial set plus an incremental set form the training set, followed by the test set.
Evaluation Objectives
O1: Effectiveness: how accurate are the proposed algorithms for long-term impact prediction?
O2: Efficiency: how scalable are the proposed algorithms?
Effectiveness Results
Comparisons with existing models (root mean squared error): our methods achieve lower error.
Efficiency Results
Speed comparisons: our methods are faster, and LIP-KIMAA scales sub-linearly.
Quality-Speed Trade-off
Our methods achieve a better balance between prediction quality and speed.
Roadmap → Conclusions
Conclusions
A family of algorithms for long-term impact prediction of CQA posts.
Q1: How to capture coupling, non-linearity, and dynamics? A1: voting consistency + the kernel trick + recursive updating.
Q2: How to make the algorithms scalable? A2: approximation methods.
Empirical evaluations: effectiveness, up to 35.8% improvement; efficiency, up to 390x speedup with sub-linear scalability.
Thanks! Q&A
Yuan Yao, yyao@smail.nju.edu.cn
Authors: Yuan Yao, Hanghang Tong, Feng Xu, and Jian Lu