Yuan Yao Joint work with Hanghang Tong, Feng Xu, and Jian Lu Predicting Long-Term Impact of CQA Posts: A Comprehensive Viewpoint 1 Aug 24-27, KDD 2014.

Slides:

Advertisements

Similar presentations

Statistical Machine Learning- The Basic Approach and Current Research Challenges Shai Ben-David CS497 February, 2007.

Advertisements

Correlation Search in Graph Databases Yiping Ke James Cheng Wilfred Ng Presented By Phani Yarlagadda.

Reachability Analysis for AMS Verification using Hybrid Support Function and SMT- based Method Honghuang Lin, Peng Li Dept. of ECE, Texas A&M University.

Yue Han and Lei Yu Binghamton University.

1 Social Influence Analysis in Large-scale Networks Jie Tang 1, Jimeng Sun 2, Chi Wang 1, and Zi Yang 1 1 Dept. of Computer Science and Technology Tsinghua.

Yuan Yao Joint work with Hanghang Tong, Xifeng Yan, Feng Xu, and Jian Lu MATRI: A Multi-Aspect and Transitive Trust Inference Model 1 May 13-17, WWW 2013.

A Constraint Generation Approach to Learning Stable Linear Dynamical Systems Sajid M. Siddiqi Byron Boots Geoffrey J. Gordon Carnegie Mellon University.

Lecture 11: Recursive Parameter Estimation

SCS CMU Joint Work by Hanghang Tong, Spiros Papadimitriou, Jimeng Sun, Philip S. Yu, Christos Faloutsos Speaker: Hanghang Tong Aug , 2008, Las Vegas.

© 2011 IBM Corporation IBM Research SIAM-DM 2011, Mesa AZ, USA, Non-Negative Residual Matrix Factorization w/ Application to Graph Anomaly Detection Hanghang.

Reduced Support Vector Machine

1 Introduction to Kernels Max Welling October (chapters 1,2,3,4)

SCS CMU Proximity Tracking on Time- Evolving Bipartite Graphs Speaker: Hanghang Tong Joint Work with Spiros Papadimitriou, Philip S. Yu, Christos Faloutsos.

Sparsity, Scalability and Distribution in Recommender Systems

Fast Random Walk with Restart and Its Applications

Algorithms for Efficient Collaborative Filtering Vreixo Formoso Fidel Cacheda Víctor Carneiro University of A Coruña (Spain)

Machine Learning in Simulation-Based Analysis 1 Li-C. Wang, Malgorzata Marek-Sadowska University of California, Santa Barbara.

Mathematical Programming in Support Vector Machines

Overview of Kernel Methods Prof. Bennett Math Model of Learning and Discovery 2/27/05 Based on Chapter 2 of Shawe-Taylor and Cristianini.

1 1 Chenhao Tan, 1 Jie Tang, 2 Jimeng Sun, 3 Quan Lin, 4 Fengjiao Wang 1 Department of Computer Science and Technology, Tsinghua University, China 2 IBM.

Gwangju Institute of Science and Technology Intelligent Design and Graphics Laboratory Feature-Aware Filtering for Point-Set Surface Denoising Min Ki Park*Seung.

Presented By Wanchen Lu 2/25/2013

WEMAREC: Accurate and Scalable Recommendation through Weighted and Ensemble Matrix Approximation Chao Chen ⨳ , Dongsheng Li

Group Recommendations with Rank Aggregation and Collaborative Filtering Linas Baltrunas, Tadas Makcinskas, Francesco Ricci Free University of Bozen-Bolzano.

Methodology in Research Md Yazid Mohd Saman Date: 01-Sep-15.

Kernel Methods A B M Shawkat Ali 1 2 Data Mining ¤ DM or KDD (Knowledge Discovery in Databases) Extracting previously unknown, valid, and actionable.

Learning a Kernel Matrix for Nonlinear Dimensionality Reduction By K. Weinberger, F. Sha, and L. Saul Presented by Michael Barnathan.

CHAPTER 4 Adaptive Tapped-delay-line Filters Using the Least Squares Adaptive Filtering.

Multi-Task Learning for Boosting with Application to Web Search Ranking Olivier Chapelle et al. Presenter: Wei Cheng.

Approximation of Protein Structure for Fast Similarity Measures Fabian Schwarzer Itay Lotan Stanford University.

Exploiting Context Analysis for Combining Multiple Entity Resolution Systems -Ramu Bandaru Zhaoqi Chen Dmitri V.kalashnikov Sharad Mehrotra.

Virtual Vector Machine for Bayesian Online Classification Yuan (Alan) Qi CS & Statistics Purdue June, 2009 Joint work with T.P. Minka and R. Xiang.

Zibin Zheng DR 2 : Dynamic Request Routing for Tolerating Latency Variability in Cloud Applications CLOUD 2013 Jieming Zhu, Zibin.

Kernel adaptive filtering Lecture slides for EEL6502 Spring 2011 Sohan Seth.

Fast Random Walk with Restart and Its Applications Hanghang Tong, Christos Faloutsos and Jia-Yu (Tim) Pan ICDM 2006 Dec , HongKong.

Nonlinear Learning Using Local Coordinate Coding K. Yu, T. Zhang and Y. Gong, NIPS 2009 Improved Local Coordinate Coding Using Local Tangents K. Yu and.

Jun Li, Peng Zhang, Yanan Cao, Ping Liu, Li Guo Chinese Academy of Sciences State Grid Energy Institute, China Efficient Behavior Targeting Using SVM Ensemble.

Lei Li Computer Science Department Carnegie Mellon University Pre Proposal Time Series Learning completed work 11/27/2015.

1 Panther: Fast Top-K Similarity Search on Large Networks Jing Zhang 1, Jie Tang 1, Cong Ma 1, Hanghang Tong 2, Yu Jing 1, and Juanzi Li 1 1 Department.

Sparse Kernel Methods 1 Sparse Kernel Methods for Classification and Regression October 17, 2007 Kyungchul Park SKKU.

An Efficient Greedy Method for Unsupervised Feature Selection

Patch Based Prediction Techniques University of Houston By: Paul AMALAMAN From: UH-DMML Lab Director: Dr. Eick.

Unscented Kalman Filter 1. 2 Linearization via Unscented Transform EKF UKF.

Pairwise Preference Regression for Cold-start Recommendation Speaker: Yuanshuai Sun

Panther: Fast Top-k Similarity Search in Large Networks JING ZHANG, JIE TANG, CONG MA, HANGHANG TONG, YU JING, AND JUANZI LI Presented by Moumita Chanda.

GeoMF: Joint Geographical Modeling and Matrix Factorization for Point-of-Interest Recommendation Defu Lian, Cong Zhao, Xing Xie, Guangzhong Sun, EnhongChen,

OBJECT TRACKING USING PARTICLE FILTERS. Table of Contents Tracking Tracking Tracking as a probabilistic inference problem Tracking as a probabilistic.

Paper Title Authors names Conference and Year Presented by Your Name Date.

Kijung Shin Jinhong Jung Lee Sael U Kang

Support Vector Machines Exercise solutions Ata Kaban The University of Birmingham.

INFOMGP Student names and numbers Papers’ references Title.

Facets: Fast Comprehensive Mining of Coevolving High-order Time Series Hanghang TongPing JiYongjie CaiWei FanQing He Joint Work by Presenter:Wei Fan.

Nawanol Theera-Ampornpunt, Seong Gon Kim, Asish Ghoshal, Saurabh Bagchi, Ananth Grama, and Somali Chaterji Fast Training on Large Genomics Data using Distributed.

Arizona State University1 Fast Mining of a Network of Coevolving Time Series Wei FanHanghang TongPing JiYongjie Cai.

Incremental Reduced Support Vector Machines Yuh-Jye Lee, Hung-Yi Lo and Su-Yun Huang National Taiwan University of Science and Technology and Institute.

SCS CMU Speaker Hanghang Tong Colibri: Fast Mining of Large Static and Dynamic Graphs Speaking Skill Requirement.

Arizona State University Fast Eigen-Functions Tracking on Dynamic Graphs Chen Chen and Hanghang Tong - 1 -

TOPIC: TOward Perfect InfluenCe Graph Summarization Lei Shi, Sibai Sun, Yuan Xuan, Yue Su, Hanghang Tong, Shuai Ma, Yang Chen.

SketchVisor: Robust Network Measurement for Software Packet Processing

Experience Report: System Log Analysis for Anomaly Detection

Tijl De Bie John Shawe-Taylor ECS, ISIS, University of Southampton

CS 9633 Machine Learning Support Vector Machines

Data Driven Resource Allocation for Distributed Learning

Predicting Long-Term Impact of CQA Posts: A Comprehensive Viewpoint

R-CNN region By Ilia Iofedov 11/11/2018 BGU, DNN course 2016.

RECOMMENDER SYSTEMS WITH SOCIAL REGULARIZATION

Y2Seq2Seq: Cross-Modal Representation Learning for 3D Shape and Text by Joint Reconstruction and Prediction of View and Word Sequences 1, Zhizhong.

Recursively Adapted Radial Basis Function Networks and its Relationship to Resource Allocating Networks and Online Kernel Learning Weifeng Liu, Puskal.

Modeling IDS using hybrid intelligent systems

Presentation transcript:

Yuan Yao Joint work with Hanghang Tong, Feng Xu, and Jian Lu Predicting Long-Term Impact of CQA Posts: A Comprehensive Viewpoint 1 Aug 24-27, KDD 2014

Roadmap Background and Motivations Modeling Multi-aspect Computation Speedup Empirical Evaluations Conclusions 2

Roadmap Background and Motivations Modeling Multi-aspect Computation Speedup Empirical Evaluations Conclusions 3

CQA 4

Long-Term Impact Q: How many users will find it beneficial? 5 What is the Long-Term Impact of a Q/A post?

Challenges Q: Why not off-the-shell data mining algorithms? Challenge 1: Multi-aspect  C1.1. Coupling between questions and answers  C1.2. Feature non-linearity  C1.3. Posts dynamically arrive Challenge 2: Efficiency 6

C1.1 Coupling 7 Strong positive correlation! Y Yao, H Tong, T Xie, L Akoglu, F Xu, J Lu. Joint Voting Prediction for Questions and Answers in CQA. ASONAM FqFq Predicted Q Impact Y q Question Features FaFa Predicted A Impact Y a Answer Features Voting Consistency 1 2 3

C1.1 Coupling LIP-M: Question predictionAnswer prediction Voting ConsistencyRegularization

C1.2 Non-linearity The kernel trick (e.g., SVM)  Mercer kernel  Kernel matrix as new feature matrix 9

C1.3 Dynamics Solution: recursive least squares regression [Haykin2005] 10 (x 1, y 1 ) (x 2, y 2 ) … (x t, y t ) (x t+1, y t+1 ) Model t Model t+1 + [Haykin2005] S Haykin. Adaptive filter theory Existing examples New examples Current model New model

This Paper Q1: how to comprehensively capture the multi-aspect in one algorithm?  Coupling, non-linearity, and dynamics Q2: how to make the long-term impact prediction algorithm efficient? 11

Roadmap Background and Motivations Modeling Multi-aspect Computation Speedup Empirical Evaluations Conclusions 12

Modeling Non-linearity Basic Idea: kernelize LIP-M Details - LIP-KM:  Closed-form solution: Complexity: O(n 3 ) Question predictionAnswer prediction Voting ConsistencyRegularization

Modeling Dynamics Basic idea: recursively update LIP-KM Details - LIP-KIM: Complexity: O(n 3 ) -> O(n 2 ) (matrix inverse lemma) 14

Modeling Dynamics THEOREM 1. Correctness of LIP-KIM. Let β 1 be the output of LIP-KM at time t+1, and β 2 be the output of LIP-KIM updated from time t to t+1, we have that β 1 = β 2. 15

Roadmap Background and Motivations Modeling Multi-aspect Computation Speedup Empirical Evaluations Conclusions 16

Approximation Method (1) Basic idea: compress the kernel matrix Details – LIP-KIMA:  1) Separate decomposition  2) Make decomposition reusable  3) Apply decomposition on LIP-KIM Complexity: O(n 2 ) -> O(n) (Nyström approximation) (Eigen-decomposition) (SVD on X 1 ) (Eigen-decomposition) 17

Approximation Method (2) Basic idea: filter less informative examples Details - LIP-KIMAA: Complexity: O(n) -> <O(n) ? (x 1, y 1 ) (x 2, y 2 ) … (x t, y t ) Model t Existing examples Current model (x t+1, y t+1 ) New examples Model t+1 + New model Filtering 18

LIP-KIM O(n 2 ) LIP-M (CoPs) Ridge Regression Coupling Non-linearityDynamics LIP-KI (Recursive Kernel Ridge Regression) LIP-KM O(n 3 ) LIP-IM LIP-KIMA O(n) LIP-KIMAA <O(n) K: Non-linearity I: Dynamics M: Coupling A: Approximation LIP-K (Kernel Ridge Regression) LIP-I (Recursive Ridge Regression) Summary 19

Roadmap Background and Motivations Modeling Multi-aspect Computation Speedup Empirical Evaluations Conclusions 20

Experiment Setup Datasets ( )  Stack Overflow, Mathematics Stack Exchange Features  Content (bag-of-words) & contextual features Test setTraining set Initial setIncremental set Time 21

Evaluation Objectives O1: Effectiveness  How accurate are the proposed algorithms for long-term impact prediction? O2: Efficiency  How scalable are the proposed algorithms? 22

Effectiveness Results Comparisons with existing models. (better) Our methods 23

Effectiveness Results Performance gain analysis. K: Non-linearity I: Dynamics M: Coupling (lower is better) 24

Efficiency Results The speed comparisons. (better) LIP-KIMAA (sub-linear) Ours 25

Quality-Speed Balance-off Our methods (better) 26

Roadmap Background and Motivations Modeling Multi-aspect Computation Speedup Empirical Evaluations Conclusions 27

Conclusions A family of algorithms for long-term impact prediction of CQA posts  Q1: how to capture coupling, non-linearity, and dynamics?  A1: voting consistency + kernel trick + recursive updating  Q2: how to make the algorithms scalable?  A2: approximation methods Empirical Evaluations  Effectiveness: up to 35.8% improvement  Efficiency: up to 390x speedup and sub-linear scalability 28

Q&A Thanks ！ 29 Authors: Yuan Yao, Hanghang Tong, Feng Xu, and Jian Lu Yuan Yao,