Learning to Rank: A Machine Learning Approach to Static Ranking. 049011 - Algorithms for Large Data Sets Student Symposium. Presented by Li-Tal Mashiach.

Presentation transcript:

Learning to Rank: A Machine Learning Approach to Static Ranking. Algorithms for Large Data Sets Student Symposium. Speaker: Li-Tal Mashiach.

References: Learning to Rank Using Gradient Descent, Burges et al., ICML 2005. Beyond PageRank: Machine Learning for Static Ranking, Richardson et al., WWW 2006.

Today's topics: Motivation & Introduction; RankNet; fRank; Discussion; Future work suggestion: Predict Popularity Rank (PP-Rank).

Motivation: The Web is growing exponentially in size, and the number of incorrect, spamming, and malicious sites is growing with it. Having a good static ranking is therefore crucially important. Recent work showed that PageRank may not perform any better than other simple measures on certain tasks.

Motivation (cont.): A combination of many features is more accurate than any single feature; PageRank is only a link-structure feature. It is also harder for malicious users to manipulate the ranking when a machine learning approach is used.

Introduction: Neural networks; training (cost function, gradient descent).

Neural Networks: Like the brain, a neural network is a massively parallel collection of small and simple processing units, where the interconnections form a large part of the network's intelligence.

Training a neural network: The task is similar to teaching a student. First, show the student some examples; then ask them to solve some problems; finally, correct them and start the whole process again. Hopefully, they will get it right after a couple of rounds.

Training a neural network (cont.): The cost function is an error function to minimize, such as sum-squared error or cross entropy. Gradient descent: take the derivative of the cost function with respect to the network parameters and change those parameters in a gradient-related direction.
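As a concrete illustration of the gradient-descent step described above, here is a minimal NumPy sketch. It uses a plain linear model and a sum-squared-error cost purely for simplicity; the function names, learning rate, and toy data are assumptions for illustration, not anything specified in the slides.

```python
import numpy as np

def sum_squared_error(predictions, targets):
    """Sum-squared-error cost over the training set."""
    return 0.5 * np.sum((predictions - targets) ** 2)

def gradient_descent_step(weights, features, targets, learning_rate=0.01):
    """One gradient-descent update for a linear model y = X @ w.

    The gradient of the sum-squared-error cost w.r.t. the weights is
    X^T (X w - t); we move the weights a small step against it.
    """
    predictions = features @ weights
    gradient = features.T @ (predictions - targets)
    return weights - learning_rate * gradient

# toy usage: 5 examples, 3 features (hypothetical data)
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
t = rng.normal(size=5)
w = np.zeros(3)
for _ in range(100):
    w = gradient_descent_step(w, X, t)
print(sum_squared_error(X @ w, t))
```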

Static ranking as a classification problem: x_i represents a set of features of Web page i, and y_i is its rank. The classification problem is to learn the function that maps each page's features to its rank. But all we really care about is the order of the pages.

RankNet: Optimize the order of objects rather than the values assigned to them. RankNet is given a collection of pairs of items Z = {(i, j)} together with target probabilities that Web page i is to be ranked higher than page j. RankNet learns the order of the items, using a probabilistic cost function (cross entropy) for training.
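To make the probabilistic cost concrete, here is a minimal sketch of the pairwise cross-entropy cost from the Burges et al. reference. The function name and the toy scores are illustrative assumptions; the full method also back-propagates this cost through a neural network, which is omitted here.

```python
import numpy as np

def ranknet_pair_cost(score_i, score_j, target_prob):
    """Cross-entropy cost for one pair of pages.

    score_i, score_j : model outputs o_i, o_j for the two pages
    target_prob      : target probability that page i should be
                       ranked above page j
    """
    o_ij = score_i - score_j
    # With P_ij = sigmoid(o_ij), the cross entropy simplifies to
    # C_ij = -target * o_ij + log(1 + exp(o_ij))
    return -target_prob * o_ij + np.log1p(np.exp(o_ij))

# toy usage: the model scores page i slightly above page j, and the
# label says i should indeed be ranked higher (target_prob = 1.0)
print(ranknet_pair_cost(score_i=2.0, score_j=1.5, target_prob=1.0))
```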

fRank: Uses RankNet to learn the static ranking function. Training is done according to human judgments: for each query, a rating is assigned manually to a number of results, and the rating measures how relevant the result is for the query.

fRank (cont.): Uses a set of features from each page: PageRank; Popularity (number of visits); Anchor text and inlinks (total amount of text in links, number of unique words, etc.); Page (number of words, frequency of the most common term, etc.); Domain (various averages across all pages in the domain: PageRank, number of outlinks, etc.).
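For illustration, here is a sketch of how such a per-page feature vector might be assembled before being handed to RankNet. The field names and example values are assumptions for readability, not the exact features or scales used in the paper.

```python
from dataclasses import dataclass

@dataclass
class PageFeatures:
    """Illustrative per-page features, grouped as on the slide."""
    pagerank: float            # link-structure score
    popularity: float          # number of visits to the page
    anchor_text_words: int     # total words in anchor text of inlinks
    inlink_count: int
    page_word_count: int
    top_term_frequency: float  # frequency of the most common term
    domain_avg_pagerank: float
    domain_avg_outlinks: float

    def as_vector(self):
        """Flatten into the feature vector x_i consumed by RankNet."""
        return [
            self.pagerank, self.popularity, self.anchor_text_words,
            self.inlink_count, self.page_word_count,
            self.top_term_frequency, self.domain_avg_pagerank,
            self.domain_avg_outlinks,
        ]

# hypothetical example page
x_i = PageFeatures(0.003, 1250, 840, 57, 1900, 0.04, 0.002, 23.5).as_vector()
```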

fRank results: fRank performs significantly better than PageRank. The Page and Popularity feature sets were the most significant contributors. As more popularity data is collected, fRank's performance continues to improve.

Discussion: The training for static ranking cannot depend on queries, yet per-query human judgments are used to train a static ranking (?). PageRank has its advantages, such as protection from spam. fRank is not useful for directing the crawl.

Future work, PP-Rank: Train the machine to predict the popularity of a Web page, using popularity data for training: amount of visits, how long users stay on the page, whether they leave by clicking back, ... The data should be normalized to the pattern of each user; a sketch of one possible normalization follows.
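One possible reading of the per-user normalization suggested above: z-score each user's dwell times against that user's own history, so that habitually fast and slow browsers become comparable. This is only a sketch of the suggestion; the function and the sample data are hypothetical.

```python
import statistics

def normalize_dwell_times(user_dwell_times):
    """Z-score each user's dwell times against that user's own history.

    user_dwell_times : dict mapping user id -> list of seconds spent on
                       pages; returns the same shape with each value
                       expressed relative to that user's habits.
    """
    normalized = {}
    for user, times in user_dwell_times.items():
        mean = statistics.mean(times)
        stdev = statistics.pstdev(times) or 1.0  # guard against zero spread
        normalized[user] = [(t - mean) / stdev for t in times]
    return normalized

# hypothetical visits: user "a" skims quickly, user "b" reads slowly
print(normalize_dwell_times({"a": [5, 7, 6, 40], "b": [60, 75, 70, 300]}))
```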

PP-Rank advantages: Can predict the popularity of pages that were just created (no page points to them yet). Can be a measure for directing the crawler. The rank will be determined not by what webmasters find interesting (PageRank), but by what users find interesting.

Summary: Ranking is the key to a search engine. The learning-based approach to static ranking is a promising new field: RankNet, fRank, PP-Rank.

ANY QUESTIONS?