Fine-tuning Ranking Models: A Two-Step Optimization Approach. Vitor R. Carvalho. Text Learning Meeting, CMU, Jan 29, 2008. With invaluable ideas from ….


Motivation: Rank, Rank, Rank…
– Web retrieval, movie recommendation, NFL draft, etc.
– Einat's contextual search
– Richard's set expansion (SEAL)
– Andy's context-sensitive spelling correction algorithm
– Selecting seeds in Frank's political blog classification algorithm
– Ramnath's Thunderbird extension for leak prediction and recipient suggestion

Help your brothers! Try Cut Once, our Thunderbird extension.
– Works well with Gmail accounts
– It's working reasonably well
– We need feedback!

Recipient Recommendation: Cut Once, a Thunderbird plug-in with classifiers/rankers written in JavaScript.
[Screenshot of the plug-in: leak warnings (hit x to remove a recipient), recipient suggestions (hit + to add), and a timer that sends the message after 10 seconds by default, with pause/cancel controls.]

[Results: recipient recommendation over 36 Enron users]

[Results: recipient recommendation on threaded messages; Carvalho & Cohen, ECIR-08]

Aggregating Rankings: many "Data Fusion" methods, of two types:
– Normalized scores: CombSUM, CombMNZ, etc.
– Unnormalized scores: BordaCount, Reciprocal Rank Sum, etc.
Reciprocal Rank: the sum of the inverse of the rank of the document in each ranking.
[Aslam & Montague, 2001]; [Ogilvie & Callan, 2003]; [Macdonald & Ounis, 2006]
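A minimal sketch (our illustration, not code from the talk) of two of the fusion methods named above: CombSUM over normalized scores and Reciprocal Rank Sum over raw ranks. The function names and the dict/list input formats are assumptions.

```python
from collections import defaultdict

def comb_sum(score_lists):
    """CombSUM: sum each document's (already normalized) scores across rankers."""
    fused = defaultdict(float)
    for scores in score_lists:          # scores: dict doc -> normalized score
        for doc, s in scores.items():
            fused[doc] += s
    return sorted(fused, key=fused.get, reverse=True)

def reciprocal_rank_sum(rankings):
    """Reciprocal Rank Sum: sum 1/rank of each document across rankings."""
    fused = defaultdict(float)
    for ranking in rankings:            # ranking: list of docs, best first
        for r, doc in enumerate(ranking, start=1):
            fused[doc] += 1.0 / r
    return sorted(fused, key=fused.get, reverse=True)
```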

[Aggregated ranking results; Carvalho & Cohen, ECIR-08]

[Results: intelligent auto-completion, for the TO+CC+BCC and CC+BCC prediction tasks]

[Carvalho & Cohen, ECIR-08]

Can we do better? Not by using other features, but by better ranking methods.
Machine learning to improve ranking (learning to rank):
– Many (recent) methods: ListNet, Perceptrons, RankSVM, RankBoost, AdaRank, Genetic Programming, Ordinal Regression, etc.
– Mostly supervised
– Generally small training sets
– Workshop at SIGIR-07 (Einat was on the PC)

Pairwise-based Ranking: given a query q and documents ranked d_1 ≻ d_2 ≻ … ≻ d_T, we assume a linear scoring function f(d) = w·d.
Goal: induce a ranking function f such that f(d_i) > f(d_j) whenever d_i is ranked above d_j.
Therefore, the constraints are: w·(d_i - d_j) > 0 for all pairs with i < j (see the sketch below).
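The pairwise reduction above turns each ordered pair into one difference-vector constraint. A small sketch (ours, with hypothetical names) that materializes those vectors for the learners on the next slides:

```python
import numpy as np

def pairwise_differences(docs):
    """docs: feature vectors ordered by relevance, best first.
    Each ordered pair (d_i above d_j) yields the constraint w·(d_i - d_j) > 0,
    so we return the difference vectors a correct w must score positively."""
    diffs = []
    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            diffs.append(docs[i] - docs[j])
    return np.array(diffs)
```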

Ranking with Perceptrons
– Nice convergence properties and mistake bounds (a bound on the number of mistakes/misranks)
– Fast and scalable
– Many variants [Collins 2002; Gao et al., 2005; Elsas et al., 2008]: voting, averaging, committee, pocket, etc.
– General update rule: on a misranked pair (d_i should outrank d_j but w·(d_i - d_j) <= 0), update w <- w + eta·(d_i - d_j)
– Here: the averaged version of the perceptron (see the sketch below)
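A minimal sketch of the averaged ranking perceptron described above, operating on the pairwise difference vectors. The epoch count and learning rate eta are illustrative assumptions, not values from the talk.

```python
import numpy as np

def averaged_rank_perceptron(pair_diffs, epochs=10, eta=1.0):
    """pair_diffs: array of (d_i - d_j) vectors, d_i preferred to d_j.
    On a misrank (w·diff <= 0) apply the update w <- w + eta * diff;
    return the average of w over all steps (the averaged perceptron)."""
    w = np.zeros(pair_diffs.shape[1])
    w_sum, steps = np.zeros_like(w), 0
    for _ in range(epochs):
        for diff in pair_diffs:
            if w @ diff <= 0:           # misranked pair
                w = w + eta * diff
            w_sum += w
            steps += 1
    return w_sum / steps
```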

Rank SVM [Joachims, KDD-02; Herbrich et al., 2000]: equivalent to maximizing AUC. Equivalent to the constrained optimization:
  minimize (1/2)||w||^2 + C · sum_{i,j} xi_ij
  subject to w·(d_i - d_j) >= 1 - xi_ij and xi_ij >= 0, for each pair where d_i should outrank d_j.
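One standard way to realize this objective is the Herbrich et al. reduction: train a linear SVM on the pairwise difference vectors labeled +1, together with their negations labeled -1. A hedged sketch using scikit-learn, which is our tooling choice here, not the talk's:

```python
import numpy as np
from sklearn.svm import LinearSVC

def rank_svm(pair_diffs, C=1.0):
    """Reduce ranking to classification: each difference vector is a positive
    example, its negation a negative one; the SVM weight vector ranks."""
    X = np.vstack([pair_diffs, -pair_diffs])
    y = np.array([1] * len(pair_diffs) + [-1] * len(pair_diffs))
    svm = LinearSVC(C=C, fit_intercept=False).fit(X, y)
    return svm.coef_.ravel()            # the ranking weight vector w
```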

Loss Functions
[Plot: the sigmoid loss vs. the SVM (hinge) rank loss as approximations of the 0/1 misrank loss; the sigmoid loss is not convex.]

Fine-tuning Ranking Models: a two-step approach.
– Step 1: train a base ranking model, e.g., RankSVM, Perceptron, etc.
– Step 2 (SigmoidRank): starting from the base model, minimize a very close (but non-convex) approximation of the number of misranks, yielding the final model:
  L(w) = sum over pairs (i, j) of 1 / (1 + exp(a · w·(d_i - d_j))), where d_i should outrank d_j.

Gradient Descent: fine-tune by gradient descent on the sigmoid loss, starting from the base model's weights.
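A sketch of what this fine-tuning step could look like under the sigmoid loss written above, initialized at the base ranker's weights w0 (e.g., the perceptron or RankSVM output from the earlier sketches). The steepness a, learning rate, and step count are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def finetune_sigmoid_rank(w0, pair_diffs, a=1.0, lr=0.1, steps=100):
    """Minimize sum over pairs of sigmoid(-a * w·diff), a smooth proxy for
    the number of misranked pairs, initialized at the base model w0."""
    w = w0.copy()
    for _ in range(steps):
        s = pair_diffs @ w                       # margins w·(d_i - d_j)
        sig = sigmoid(-a * s)                    # per-pair loss values
        # d/dw of sigmoid(-a*s) is -a * sig * (1 - sig) * diff
        grad = -(a * sig * (1.0 - sig)) @ pair_diffs
        w -= lr * grad
    return w
```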

[Results: CC prediction over 36 Enron users]

Set Expansion (SEAL) Results [ListNet: Cao et al., ICML-07]; [Wang & Cohen, ICDM-2007]

Results on the LETOR benchmark

[Learning curve: TO+CC+BCC prediction, Enron user lokay-m]

[Learning curve: CC+BCC prediction, Enron user campbel-m]

[Plot: effect of the regularization parameter on the TREC3, TREC4, and OHSUMED datasets; the highlighted setting is 2.]

Some Ideas
– Instead of the number of misranks, optimize other loss functions: Mean Average Precision, MRR, etc.
– Rank term: the rank of a document can itself be smoothed, rank(d_i) ≈ 1 + sum over j ≠ i of sigmoid(f(d_j) - f(d_i)), making rank-based metrics differentiable (see the sketch below)
– Some preliminary results with Sigmoid-MAP
– Does it work for classification?
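For the rank term above, a sketch of the smoothed (sigmoid) rank, the building block a Sigmoid-MAP-style loss would differentiate through; the steepness a is an illustrative assumption.

```python
import numpy as np

def smoothed_ranks(scores, a=1.0):
    """scores: f(d) for each document. Replace the hard count
    rank(d_i) = 1 + #{j : f(d_j) > f(d_i)} with a sigmoid of score
    differences, giving a differentiable rank estimate."""
    diffs = scores[None, :] - scores[:, None]    # diffs[i, j] = f(d_j) - f(d_i)
    sig = 1.0 / (1.0 + np.exp(-a * diffs))
    np.fill_diagonal(sig, 0.0)                   # a document doesn't outrank itself
    return 1.0 + sig.sum(axis=1)
```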

Thanks