
RANKING David Kauchak CS 451 – Fall 2013

Admin Assignment 4 Assignment 5

Course feedback Thanks!

Course feedback Difficulty

Course feedback Time/week

Course feedback Experiments/follow-up questions More interesting data sets More advanced topics I’m hoping to get to:
- large margin classifiers
- probabilistic modeling
- unsupervised learning/clustering
- ensemble learning
- collaborative filtering
- MapReduce/Hadoop

Java tip for the day static ArrayList<Example> data = dataset.getData(); How can I iterate over it?

Java tip for the day

ArrayList<Example> data = dataset.getData();

for( int i = 0; i < data.size(); i++ ){
    Example ex = data.get(i);
}

OR

// can do on anything that implements the Iterable interface
for( Example ex: data ){
}

An aside: text classification Raw data → labels: Chardonnay, Pinot Grigio, Zinfandel

Text: raw data Raw data → Features? → labels: Chardonnay, Pinot Grigio, Zinfandel

Feature examples Raw data: “Clinton said pinot repeatedly last week on tv, ‘pinot, pinot, pinot’” Features (occurrence of words: clinton, said, california, across, tv, wrong, capital, pinot, …): (1, 1, 1, 0, 0, 1, 0, 0, …) labels: Chardonnay, Pinot Grigio, Zinfandel

Feature examples Raw data: “Clinton said pinot repeatedly last week on tv, ‘pinot, pinot, pinot’” Features (frequency of word occurrences: clinton, said, california, across, tv, wrong, capital, pinot, …): (4, 1, 1, 0, 0, 1, 0, 0, …) labels: Chardonnay, Pinot Grigio, Zinfandel This is the representation we’re using for assignment 5
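A minimal Java sketch of this frequency-of-words representation; the vocabulary and the simple tokenizer here are illustrative assumptions, not the assignment's actual feature-index scheme.

```java
import java.util.*;

// Sketch of the frequency-of-word-occurrences representation above.
// The vocabulary and the simple tokenizer are illustrative assumptions,
// not the assignment's actual feature-index scheme.
public class WordFeatures {
    public static int[] frequencyFeatures(String text, List<String> vocabulary) {
        // count how often each token occurs in the text
        Map<String, Integer> counts = new HashMap<>();
        for (String token : text.toLowerCase().split("[^a-z]+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        // one feature per vocabulary word: its frequency in the text
        int[] features = new int[vocabulary.size()];
        for (int i = 0; i < vocabulary.size(); i++) {
            features[i] = counts.getOrDefault(vocabulary.get(i), 0);
        }
        return features;
    }
}
```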

Decision trees for text Each internal node represents whether or not the text has a particular word

Decision trees for text wheat is a commodity that can be found in states across the nation

Decision trees for text The US views technology as a commodity that it can export by the buschl.

Printing out decision trees
(wheat
    (buschl
        predict: not wheat
        (export
            predict: not wheat
            predict: wheat))
    (farm
        (commodity
            (agriculture
                predict: not wheat
                predict: wheat)
            predict: wheat)))
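A recursive printer producing this parenthesized format could look like the sketch below; the DTNode class is a hypothetical stand-in for the assignment's tree node, not its actual interface.

```java
// Sketch of a recursive printer for the parenthesized format above.
// DTNode is a hypothetical stand-in for the assignment's tree node class:
// an internal node prints as (feature <word-absent subtree> <word-present subtree>),
// a leaf prints as "predict: label".
public class TreePrinter {
    public static class DTNode {
        public String feature;      // splitting word; null for a leaf
        public String prediction;   // leaf label, used when feature == null
        public DTNode left, right;  // word-absent branch, word-present branch

        public DTNode(String prediction) { this.prediction = prediction; }
        public DTNode(String feature, DTNode left, DTNode right) {
            this.feature = feature; this.left = left; this.right = right;
        }
    }

    public static String print(DTNode node) {
        if (node.feature == null) {
            return "predict: " + node.prediction;
        }
        return "(" + node.feature + " " + print(node.left) + " " + print(node.right) + ")";
    }
}
```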

Ranking problems Suggest a simpler word for the word below: vital

Suggest a simpler word Suggest a simpler word for the word below: vital

word        frequency
important   13
necessary   12
essential   11
needed      8
critical    3
crucial     2
mandatory   1
required    1
vital       1

Suggest a simpler word Suggest a simpler word for the word below: acquired

Suggest a simpler word Suggest a simpler word for the word below: acquired

word         frequency
gotten       12
received     9
gained       8
obtained     5
got          3
purchased    2
bought       2
got hold of  1
acquired     1

Suggest a simpler word training data: vital → important, necessary, essential, needed, critical, crucial, mandatory, required, vital; acquired → gotten, received, gained, obtained, got, purchased, bought, got hold of, acquired; … → train ranker → given a list of synonyms, output the list ranked by simplicity

Ranking problems in general ranking1, ranking2, ranking3 f1, f2, …, fn training data: a set of rankings where each ranking consists of a set of ranked examples … train ranker → ranking/ordering of examples f1, f2, …, fn

Ranking problems in general ranking1, ranking2, ranking3 f1, f2, …, fn training data: a set of rankings where each ranking consists of a set of ranked examples … Real-world ranking problems?

Netflix My List

Search

Ranking Applications reranking N-best output lists:
- machine translation
- computational biology
- parsing
- …
flight search
…

Black box approach to ranking Abstraction: we have a generic binary classifier (output: +1/-1, optionally also a confidence/score); how can we use it to solve our new problem? Can we solve our ranking problem with this?

Predict better vs. worse ranking1 f1, f2, …, fn Train a classifier to decide if the first input is better than the second:
- Consider all possible pairings of the examples in a ranking
- Label as positive if the first example is higher ranked, negative otherwise

Predict better vs. worse ranking1 f1, f2, …, fn Train a classifier to decide if the first input is better than the second:
- Consider all possible pairings of the examples in a ranking
- Label as positive if the first example is higher ranked, negative otherwise
f1, f2, …, fn new examples → binary label: +1, +1, …
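The pairing step above can be sketched in Java. Representing an example as a plain double[] feature vector, and the Pair class itself, are simplifying assumptions for illustration.

```java
import java.util.*;

// Sketch of turning one ranking into binary training examples, as described
// above: every ordered pair of examples becomes a new example, labeled +1 if
// the first is ranked higher, -1 otherwise. Representing an example as a
// plain double[] feature vector is a simplifying assumption.
public class PairwiseData {
    public static class Pair {
        public final double[] first, second;
        public final int label; // +1 if first is ranked higher, -1 otherwise
        public Pair(double[] first, double[] second, int label) {
            this.first = first; this.second = second; this.label = label;
        }
    }

    // ranking is ordered best-first
    public static List<Pair> allPairs(double[][] ranking) {
        List<Pair> pairs = new ArrayList<>();
        for (int j = 0; j < ranking.length; j++) {
            for (int k = 0; k < ranking.length; k++) {
                if (j != k) {
                    pairs.add(new Pair(ranking[j], ranking[k], j < k ? +1 : -1));
                }
            }
        }
        return pairs;
    }
}
```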

Predict better vs. worse binary classifier +1 Our binary classifier only takes one example as input

Predict better vs. worse binary classifier +1 Our binary classifier only takes one example as input a1, a2, …, an b1, b2, …, bn → f’1, f’2, …, f’n’ How can we do this? We want features that compare the two examples.

Combined feature vector Many approaches! Will depend on domain and classifier Two common approaches:
1. difference: f’i = ai - bi
2. greater than/less than: f’i = 1 if ai > bi, 0 otherwise
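Both options can be sketched directly; the exact encodings (especially what "greater than" maps unequal and equal values to) are illustrative choices, not the only reasonable ones.

```java
// Sketches of the two combined-feature options above. The exact encodings
// (especially what "greater than" maps unequal/equal values to) are
// illustrative choices, not the only reasonable ones.
public class CombinedFeatures {
    // 1. difference: f'_i = a_i - b_i
    public static double[] difference(double[] a, double[] b) {
        double[] f = new double[a.length];
        for (int i = 0; i < a.length; i++) {
            f[i] = a[i] - b[i];
        }
        return f;
    }

    // 2. greater than/less than: f'_i = 1 if a_i > b_i, 0 otherwise
    public static double[] greaterThan(double[] a, double[] b) {
        double[] f = new double[a.length];
        for (int i = 0; i < a.length; i++) {
            f[i] = a[i] > b[i] ? 1 : 0;
        }
        return f;
    }
}
```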

Training f1, f2, …, fn new examples, label → extract features → f’1, f’2, …, f’n’ → train classifier → binary classifier

Testing unranked f1, f2, …, fn → binary classifier → ranking?

Testing unranked f1, f2, …, fn → extract features → f’1, f’2, …, f’n’

Testing f1, f2, …, fn → extract features → f’1, f’2, …, f’n’ → binary classifier → +1, +1, …

Testing f1, f2, …, fn What is the ranking? Algorithm?

Testing f1, f2, …, fn

for each binary example e_jk:
    label[j] += f(e_jk)
    label[k] -= f(e_jk)

rank according to label scores
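The scoring algorithm above can be sketched in Java. The BiFunction argument is a stand-in for the trained binary classifier applied to a combined feature vector; examples as double[] arrays are a simplifying assumption.

```java
import java.util.*;
import java.util.function.BiFunction;

// Sketch of the scoring algorithm above: run the binary classifier on every
// ordered pair, credit the first example and debit the second by the
// classifier's output, then rank by total score. The BiFunction argument is
// a stand-in for the trained classifier applied to a combined feature vector.
public class RankByScores {
    // returns example indices, best-ranked first
    public static Integer[] rank(double[][] examples,
                                 BiFunction<double[], double[], Double> classifier) {
        double[] score = new double[examples.length];
        for (int j = 0; j < examples.length; j++) {
            for (int k = 0; k < examples.length; k++) {
                if (j == k) continue;
                double out = classifier.apply(examples[j], examples[k]);
                score[j] += out;  // label[j] += f(e_jk)
                score[k] -= out;  // label[k] -= f(e_jk)
            }
        }
        Integer[] order = new Integer[examples.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        // highest total score first
        Arrays.sort(order, (x, y) -> Double.compare(score[y], score[x]));
        return order;
    }
}
```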

An improvement? ranking1 f1, f2, …, fn new examples → binary label Are these two examples the same?

Weighted binary classification ranking1 f1, f2, …, fn new examples → weighted label Weight based on distance in ranking

Weighted binary classification ranking1 f1, f2, …, fn new examples → weighted label In general, can weight with any consistent distance metric Can we solve this problem?

Testing If the classifier outputs a confidence, then we’ve learned a distance measure between examples During testing we want to rank the examples based on the learned distance measure Ideas?

Testing If the classifier outputs a confidence, then we’ve learned a distance measure between examples During testing we want to rank the examples based on the learned distance measure Sort the examples and use the output of the binary classifier as the similarity between examples!
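One way to realize this: sort the examples, using the classifier's signed confidence as the comparison function. A Java sketch, where the confidence argument is a stand-in for the trained classifier and is assumed antisymmetric (positive means "first is better"):

```java
import java.util.*;
import java.util.function.BiFunction;

// Sketch of the sort-based idea above: sort the test examples, using the
// binary classifier's signed confidence to decide which of two examples
// comes first. The confidence argument is a stand-in for the trained
// classifier and is assumed antisymmetric: positive means "first is better".
public class SortByClassifier {
    public static void sortExamples(List<double[]> examples,
                                    BiFunction<double[], double[], Double> confidence) {
        // if confidence(a, b) > 0, a should sort before b
        examples.sort((a, b) -> Double.compare(confidence.apply(b, a), confidence.apply(a, b)));
    }
}
```

Note that a learned pairwise comparator is not guaranteed to be transitive, so treating it as a sort key is a heuristic rather than an exact ordering.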

Ranking evaluation ranking f1, f2, …, fn prediction Ideas?

Idea 1: accuracy ranking f1, f2, …, fn prediction 1/5 = 0.2 Any problems with this?

Doesn’t capture “near” correct ranking f1, f2, …, fn prediction 1/5 = 0.2 prediction

Idea 2: correlation ranking prediction Look at the correlation between the ranking and the prediction
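One common choice of rank correlation is Spearman's rho; the slide doesn't commit to a particular coefficient, so the sketch below is an illustrative option, assuming each example's position (1..n) in both orderings is known and there are no ties.

```java
// Sketch of one common rank correlation, Spearman's rho; the slide doesn't
// commit to a particular coefficient, so this is an illustrative choice.
// Assumes trueRank[i] and predictedRank[i] give example i's positions
// (1..n) in the two orderings, with no ties.
public class RankCorrelation {
    public static double spearman(int[] trueRank, int[] predictedRank) {
        int n = trueRank.length;
        double sumSquaredDiff = 0;
        for (int i = 0; i < n; i++) {
            double d = trueRank[i] - predictedRank[i];
            sumSquaredDiff += d * d;
        }
        // rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1));
        // +1 = identical orderings, -1 = exactly reversed
        return 1.0 - 6.0 * sumSquaredDiff / (n * (double) (n * n - 1));
    }
}
```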