Fine-tuning Ranking Models: A Two-step Optimization Approach
Vitor
Jan 29, 2008, Text Learning Meeting, CMU
With invaluable ideas from …
Motivation
Rank, rank, rank…
– Web retrieval, movie recommendation, NFL draft, etc.
– Einat’s contextual search
– Richard’s set expansion (SEAL)
– Andy’s context-sensitive spelling correction algorithm
– Selecting seeds in Frank’s political blog classification algorithm
– Ramnath’s Thunderbird extension for leak prediction and recipient suggestion
Help your brothers! Try Cut Once!, our Thunderbird extension
– Works well with Gmail accounts
– It’s working reasonably well, but we need feedback.
Recipient Recommendation
Cut Once: a Thunderbird plug-in, with classifier/rankers written in JavaScript
– Leak warnings: hit x to remove a recipient
– Suggestions: hit + to add a recipient
– Timer: message is sent after 10 sec by default; pause or cancel sending
Recipient recommendation results: 36 Enron users
Recipient Recommendation Threaded [Carvalho & Cohen, ECIR-08]
Aggregating Rankings
Many “data fusion” methods, of two types:
– Normalized scores: CombSUM, CombMNZ, etc.
– Unnormalized scores: BordaCount, Reciprocal Rank sum, etc.
Reciprocal Rank: the sum, over all input rankings, of the inverse of the rank of the document in each ranking.
[Aslam & Montague, 2001]; [Ogilvie & Callan, 2003]; [Macdonald & Ounis, 2006]
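The Reciprocal Rank sum above can be sketched in a few lines. This is a minimal illustration with made-up document ids, not the deck's actual implementation:

```python
# Sketch of unnormalized-score data fusion via Reciprocal Rank sum.
# Each input ranking is a list of document ids, best first (hypothetical data).

def reciprocal_rank_fusion(rankings):
    """Score each document by the sum of 1/rank over all input rankings."""
    scores = {}
    for ranking in rankings:
        for pos, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / pos
    # Sort by fused score, highest first.
    return sorted(scores, key=scores.get, reverse=True)

# Two base rankers disagree; fusion rewards documents ranked high in both.
fused = reciprocal_rank_fusion([["d1", "d2", "d3"], ["d2", "d3", "d1"]])
print(fused)  # ['d2', 'd1', 'd3']
```

Because the scores are unnormalized (pure rank positions), the base rankers' score scales never need to be calibrated against each other.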
Aggregated Ranking Results [Carvalho & Cohen, ECIR-08]
Intelligent Auto-completion
Tasks: TOCCBCC (predict TO+CC+BCC) and CCBCC (predict CC+BCC)
[Carvalho & Cohen, ECIR-08]
Can we do better?
Not by using other features, but by better ranking methods.
Machine learning to improve ranking, i.e., learning to rank:
– Many (recent) methods: ListNet, Perceptrons, RankSVM, RankBoost, AdaRank, Genetic Programming, Ordinal Regression, etc.
– Mostly supervised
– Generally small training sets
– Learning-to-rank workshop at SIGIR-07 (Einat was on the PC)
Pairwise-based Ranking
For a query q with documents ranked d1 > d2 > … > dT, we assume a linear scoring function f(d) = w · d.
Goal: induce a ranking function f such that f(di) > f(dj) whenever di is ranked above dj.
Therefore, the constraints are: w · (di − dj) > 0 for every preference pair di > dj.
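Turning one ranked list into pairwise constraints can be sketched as follows. The two-dimensional feature vectors are made up for illustration; each emitted difference vector x = di − dj is one constraint w · x > 0:

```python
# Sketch: building pairwise training constraints from a ranked list.
# If d_i is ranked above d_j, a linear ranker f(d) = w . d must satisfy
# w . d_i > w . d_j, i.e. w . (d_i - d_j) > 0.

def pairwise_constraints(ranked_docs):
    """Yield difference vectors x = d_i - d_j for every pair i < j."""
    pairs = []
    for i in range(len(ranked_docs)):
        for j in range(i + 1, len(ranked_docs)):
            diff = [a - b for a, b in zip(ranked_docs[i], ranked_docs[j])]
            pairs.append(diff)
    return pairs

ranked = [[3.0, 1.0], [2.0, 0.5], [1.0, 0.0]]  # d1 > d2 > d3 (made-up features)
print(pairwise_constraints(ranked))  # [[1.0, 0.5], [2.0, 1.0], [1.0, 0.5]]
```

Note the reduction: ranking with a linear function becomes binary classification of difference vectors, which is why perceptrons and SVMs apply directly.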
Ranking with Perceptrons
– Nice convergence properties and mistake bounds (bound on the number of misranks)
– Fast and scalable
– Many variants [Collins 2002; Gao et al., 2005; Elsas et al., 2008]: voting, averaging, committee, pocket, etc.
– General update rule: on a misranked pair (di, dj), w ← w + (di − dj)
– Here: averaged version of the perceptron
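A minimal sketch of the averaged perceptron on difference-vector pairs, assuming the simplest variant (one additive update per misranked pair, weights averaged over all steps); the training pairs are hypothetical:

```python
# Sketch of an averaged perceptron for pairwise ranking.

def averaged_perceptron_rank(pairs, epochs=10):
    """pairs: difference vectors x = d_i - d_j with d_i preferred.
    Returns the average of the weight vector over all update steps."""
    dim = len(pairs[0])
    w = [0.0] * dim
    w_sum = [0.0] * dim
    steps = 0
    for _ in range(epochs):
        for x in pairs:
            if sum(wi * xi for wi, xi in zip(w, x)) <= 0:  # misranked pair
                w = [wi + xi for wi, xi in zip(w, x)]       # update rule
            w_sum = [s + wi for s, wi in zip(w_sum, w)]
            steps += 1
    return [s / steps for s in w_sum]

w_avg = averaged_perceptron_rank([[1.0, 0.5], [2.0, 1.0], [1.0, 0.5]])
```

Averaging makes the learned ranker much less sensitive to the order in which pairs are presented, which matters with the small training sets mentioned above.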
RankSVM
Equivalent to maximizing AUC [Joachims, KDD-02], [Herbrich et al., 2000].
Equivalent to:
  min (1/2)||w||² + C Σ ξij
  s.t. w · (di − dj) ≥ 1 − ξij,  ξij ≥ 0,  for every preference pair di > dj
Loss Functions
– SVMrank: convex hinge loss on each pair’s margin
– SigmoidRank: sigmoid loss, a close approximation to the 0/1 misrank count, but not convex
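The two losses can be compared on a pair's margin m = w · (di − dj). This is a sketch; the sigmoid steepness k is an assumed parameter, not a value from the deck:

```python
import math

# SVM hinge loss max(0, 1 - m) is convex in m; the sigmoid loss
# 1/(1 + e^{k*m}) is a smooth but non-convex approximation of the
# 0/1 misrank indicator (it tends to 1 as m -> -inf, 0 as m -> +inf).

def hinge(m):
    return max(0.0, 1.0 - m)

def sigmoid_rank(m, k=1.0):
    return 1.0 / (1.0 + math.exp(k * m))

for m in [-2.0, 0.0, 2.0]:
    print(m, hinge(m), round(sigmoid_rank(m), 3))
```

Note the key difference: a badly misranked pair (m = −2) costs 3.0 under the hinge but at most 1.0 under the sigmoid, so outlier pairs cannot dominate the sigmoid objective; the price is losing convexity.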
Fine-tuning Ranking Models: two steps
– Step 1: learn a base ranking model (e.g., RankSVM, Perceptron, etc.)
– Step 2: fine-tune with SigmoidRank, starting from the base model: minimize a very close (but non-convex) approximation to the number of misranks to obtain the final model
Gradient Descent
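The gradient-descent fine-tuning step can be sketched as follows. The initialization w0 stands in for a base-ranker solution; the difference vectors, learning rate, and steepness k are all made-up illustrative values:

```python
import math

def sigmoid_loss(w, pairs, k=2.0):
    """Sum of 1/(1 + exp(k * w.x)) over difference vectors x = d_i - d_j."""
    return sum(1.0 / (1.0 + math.exp(k * sum(wi * xi for wi, xi in zip(w, x))))
               for x in pairs)

def fine_tune(w0, pairs, k=2.0, lr=0.1, steps=200):
    """Gradient descent on the non-convex sigmoid rank loss, started at w0.
    Starting near the convex base-ranker solution is what makes the
    non-convex search tractable."""
    w = list(w0)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for x in pairs:
            m = sum(wi * xi for wi, xi in zip(w, x))  # pair margin
            s = 1.0 / (1.0 + math.exp(k * m))         # per-pair loss
            coeff = -k * s * (1.0 - s)                # d(loss)/d(margin)
            grad = [g + coeff * xi for g, xi in zip(grad, x)]
        w = [wi - lr * g for wi, g in zip(w, grad)]
    return w

pairs = [[1.0, 0.5], [2.0, 1.0], [-0.2, 0.6]]  # made-up difference vectors
w0 = [0.3, -0.1]                               # pretend base-ranker weights
w = fine_tune(w0, pairs)
```

Because the loss is non-convex, descent only finds a local minimum, but one at least as good as the base model it starts from on the training pairs.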
Results in CC Prediction (36 Enron users)
Set Expansion (SEAL) Results [ListNet: Cao et al., ICML-07] [Wang & Cohen, ICDM-2007]
Results on LETOR
Learning Curve: TOCCBCC (Enron user lokay-m)
Learning Curve: CCBCC (Enron user campbel-m)
Regularization Parameter = 2 (TREC3, TREC4, Ohsumed)
Some Ideas
Instead of the number of misranks, optimize other loss functions:
– Mean Average Precision, MRR, etc.
– Rank term:
– Some preliminary results with Sigmoid-MAP
Does it work for classification?
Thanks