26/01/20161Gianluca Demartini Ranking Categories for Faceted Search Gianluca Demartini L3S Research Seminars Hannover, 09 June 2006.

Slides:



Advertisements
Similar presentations
A Comparison of Implicit and Explicit Links for Web Page Classification Dou Shen 1 Jian-Tao Sun 2 Qiang Yang 1 Zheng Chen 2 1 Department of Computer Science.
Advertisements

Query Classification Using Asymmetrical Learning Zheng Zhu Birkbeck College, University of London.
Pseudo-Relevance Feedback For Multimedia Retrieval By Rong Yan, Alexander G. and Rong Jin Mwangi S. Kariuki
Albert Gatt Corpora and Statistical Methods Lecture 13.
Bringing Order to the Web: Automatically Categorizing Search Results Hao Chen SIMS, UC Berkeley Susan Dumais Adaptive Systems & Interactions Microsoft.
Query Chains: Learning to Rank from Implicit Feedback Paper Authors: Filip Radlinski Thorsten Joachims Presented By: Steven Carr.
Search Engines Information Retrieval in Practice All slides ©Addison Wesley, 2008.
Ke Liu1, Junqiu Wu2, Shengwen Peng1,Chengxiang Zhai3, Shanfeng Zhu1
The Disputed Federalist Papers : SVM Feature Selection via Concave Minimization Glenn Fung and Olvi L. Mangasarian CSNA 2002 June 13-16, 2002 Madison,
Software Quality Ranking: Bringing Order to Software Modules in Testing Fei Xing Michael R. Lyu Ping Guo.
1 Entity Ranking Using Wikipedia as a Pivot (CIKM 10’) Rianne Kaptein, Pavel Serdyukov, Arjen de Vries, Jaap Kamps 2010/12/14 Yu-wen,Hsu.
Explorations in Tag Suggestion and Query Expansion Jian Wang and Brian D. Davison Lehigh University, USA SSM 2008 (Workshop on Search in Social Media)
Creating Concept Hierarchies in a Customer Self-Help System Bob Wall CS /29/05.
Dept. of Computer Science & Engineering, CUHK Pseudo Relevance Feedback with Biased Support Vector Machine in Multimedia Retrieval Steven C.H. Hoi 14-Oct,
Topic-Sensitive PageRank Taher H. Haveliwala. PageRank Importance is propagated A global ranking vector is pre-computed.
Text Classification With Labeled and Unlabeled Data Presenter: Aleksandar Milisic Supervisor: Dr. David Albrecht.
Improving web image search results using query-relative classifiers Josip Krapacy Moray Allanyy Jakob Verbeeky Fr´ed´eric Jurieyy.
1 Text Categorization  Assigning documents to a fixed set of categories  Applications:  Web pages  Recommending pages  Yahoo-like classification hierarchies.
CS344: Introduction to Artificial Intelligence Vishal Vachhani M.Tech, CSE Lecture 34-35: CLIR and Ranking in IR.
Active Learning for Class Imbalance Problem
1 Context-Aware Search Personalization with Concept Preference CIKM’11 Advisor : Jia Ling, Koh Speaker : SHENG HONG, CHUNG.
The identification of interesting web sites Presented by Xiaoshu Cai.
A Simple Unsupervised Query Categorizer for Web Search Engines Prashant Ullegaddi and Vasudeva Varma Search and Information Extraction Lab Language Technologies.
Xiaoying Gao Computer Science Victoria University of Wellington Intelligent Agents COMP 423.
Improving Web Spam Classification using Rank-time Features September 25, 2008 TaeSeob,Yun KAIST DATABASE & MULTIMEDIA LAB.
Detecting Semantic Cloaking on the Web Baoning Wu and Brian D. Davison Lehigh University, USA WWW 2006.
UOS 1 Ontology Based Personalized Search Zhang Tao The University of Seoul.
When Experts Agree: Using Non-Affiliated Experts To Rank Popular Topics Meital Aizen.
CIKM’09 Date:2010/8/24 Advisor: Dr. Koh, Jia-Ling Speaker: Lin, Yi-Jhen 1.
Personalized Web Search by Mapping User Queries to Categories Fang Liu Presented by Jing Zhang CS491CXZ February 26, 2004.
Topical Crawlers for Building Digital Library Collections Presenter: Qiaozhu Mei.
« Pruning Policies for Two-Tiered Inverted Index with Correctness Guarantee » Proceedings of the 30th annual international ACM SIGIR, Amsterdam 2007) A.
Universit at Dortmund, LS VIII
Presenter: Lung-Hao Lee ( 李龍豪 ) January 7, 309.
Machine Learning in Ad-hoc IR. Machine Learning for ad hoc IR We’ve looked at methods for ranking documents in IR using factors like –Cosine similarity,
Reduction of Training Noises for Text Classifiers Rey-Long Liu Dept. of Medical Informatics Tzu Chi University Taiwan.
Binxing Jiao et. al (SIGIR ’10) Presenter : Lin, Yi-Jhen Advisor: Dr. Koh. Jia-ling Date: 2011/4/25 VISUAL SUMMARIZATION OF WEB PAGES.
Improving Web Search Results Using Affinity Graph Benyu Zhang, Hua Li, Yi Liu, Lei Ji, Wensi Xi, Weiguo Fan, Zheng Chen, Wei-Ying Ma Microsoft Research.
Diversifying Search Result WSDM 2009 Intelligent Database Systems Lab. School of Computer Science & Engineering Seoul National University Center for E-Business.
Enhancing Cluster Labeling Using Wikipedia David Carmel, Haggai Roitman, Naama Zwerdling IBM Research Lab (SIGIR’09) Date: 11/09/2009 Speaker: Cho, Chin.
1 Opinion Retrieval from Blogs Wei Zhang, Clement Yu, and Weiyi Meng (2007 CIKM)
21/11/20151Gianluca Demartini Ranking Clusters for Web Search Gianluca Demartini Paul–Alexandru Chirita Ingo Brunkhorst Wolfgang Nejdl L3S Info Lunch Hannover,
CIKM Opinion Retrieval from Blogs Wei Zhang 1 Clement Yu 1 Weiyi Meng 2 1 Department of.
Gang WangDerek HoiemDavid Forsyth. INTRODUCTION APROACH (implement detail) EXPERIMENTS CONCLUSION.
Neural Text Categorizer for Exclusive Text Categorization Journal of Information Processing Systems, Vol.4, No.2, June 2008 Taeho Jo* 報告者 : 林昱志.
Ranking Definitions with Supervised Learning Methods J.Xu, Y.Cao, H.Li and M.Zhao WWW 2005 Presenter: Baoning Wu.
Advantages of Query Biased Summaries in Information Retrieval by A. Tombros and M. Sanderson Presenters: Omer Erdil Albayrak Bilge Koroglu.
Text Categorization With Support Vector Machines: Learning With Many Relevant Features By Thornsten Joachims Presented By Meghneel Gore.
Post-Ranking query suggestion by diversifying search Chao Wang.
1 What Makes a Query Difficult? David Carmel, Elad YomTov, Adam Darlow, Dan Pelleg IBM Haifa Research Labs SIGIR 2006.
11 A Classification-based Approach to Question Routing in Community Question Answering Tom Chao Zhou 1, Michael R. Lyu 1, Irwin King 1,2 1 The Chinese.
Multilingual Information Retrieval using GHSOM Hsin-Chang Yang Associate Professor Department of Information Management National University of Kaohsiung.
Feature Selction for SVMs J. Weston et al., NIPS 2000 오장민 (2000/01/04) Second reference : Mark A. Holl, Correlation-based Feature Selection for Machine.
Identifying “Best Bet” Web Search Results by Mining Past User Behavior Author: Eugene Agichtein, Zijian Zheng (Microsoft Research) Source: KDD2006 Reporter:
Support-Vector Networks C Cortes and V Vapnik (Tue) Computational Models of Intelligence Joon Shik Kim.
Learning to Rank: From Pairwise Approach to Listwise Approach Authors: Zhe Cao, Tao Qin, Tie-Yan Liu, Ming-Feng Tsai, and Hang Li Presenter: Davidson Date:
Predicting Short-Term Interests Using Activity-Based Search Context CIKM’10 Advisor: Jia Ling, Koh Speaker: Yu Cheng, Hsieh.
Bringing Order to the Web : Automatically Categorizing Search Results Advisor : Dr. Hsu Graduate : Keng-Wei Chang Author : Hao Chen Susan Dumais.
1 An introduction to support vector machine (SVM) Advisor : Dr.Hsu Graduate : Ching –Wen Hong.
SUPPORT VECTOR MACHINES Presented by: Naman Fatehpuria Sumana Venkatesh.
1 Text Categorization  Assigning documents to a fixed set of categories  Applications:  Web pages  Recommending pages  Yahoo-like classification hierarchies.
11 Automated multi-label text categorization with VG-RAM weightless neural networks Presenter: Guan-Yu Chen A. F. DeSouza, F. Pedroni, E. Oliveira, P.
© NCSR, Frascati, July 18-19, 2002 CROSSMARC big picture Domain-specific Web sites Domain-specific Spidering Domain Ontology XHTML pages WEB Focused Crawling.
1 Clustering Web Queries John S. Whissell, Charles L.A. Clarke, Azin Ashkan CIKM ’ 09 Speaker: Hsin-Lan, Wang Date: 2010/08/31.
Using ODP Metadata to Personalize Search Presented by Lan Nie 09/21/2005, Lehigh University.
Source: Procedia Computer Science(2015)70:
Machine Learning Week 1.
Author: Kazunari Sugiyama, etc. (WWW2004)
Text Categorization Assigning documents to a fixed set of categories
Using Link Information to Enhance Web Page Classification
Presentation transcript:

26/01/20161Gianluca Demartini Ranking Categories for Faceted Search Gianluca Demartini L3S Research Seminars Hannover, 09 June 2006

26/01/20162Gianluca Demartini Outline Introduction Basic Concepts Rankings Algorithms considered Experimental Setup Results Conclusions

26/01/20163Gianluca Demartini Introduction Search the Web: Ranked list or Categories based organization? Clustering Search vs Faceted Search  Clustering: Grouping documents according to some measure of similarity computed using associations among features (typically words and phrases) Result - 1 big hierarchy  Faceted: Creating a set of category hierarchies each of which corresponds to a different facet (dimension or feature type) relevant to the collection to be navigated Result - a set of category hierarchies each of which corresponds to a different facet Supporting Vector Machines Classifiers

26/01/20164Gianluca Demartini SVM text classification A linear SVM is a hyperplane that separates a set of positive examples from a set of negative examples with maximum margin

26/01/20165Gianluca Demartini The formula for the output of a linear SVM is Where w is the normal vector to the hyperplane, and x is the input vector Given training examples labeled either "yes" or "no", a maximum-margin hyperplane is identified which splits the "yes" from the "no" training examples SVM text classification

Clustering Search Engines SVM better than Bayesian for Text Classification Many clustering algorithms proposed How to rank the resulting Categories?  Algorithm independent We analyze 9 different metrics used to order the clusters 26/01/2016Gianluca Demartini6

26/01/20167Gianluca Demartini Outline Introduction Basic Concepts Rankings Algorithms considered Experimental Setup Results Conclusions

26/01/20168Gianluca Demartini Category Ranking Algorithms 9 different ranking algorithms considered:  Rank based metrics  Text Similarity metrics  Other Metrics

26/01/20169Gianluca Demartini Category Ranking Algorithms - Rank Based Metrics PageRank computation: p at position x Average PageRank Total PageRank Average Rank Minimal Rank

26/01/201610Gianluca Demartini Category Ranking Algorithms - Text Similarity Metrics Similarity between pages and categories (title + description)  Values returned by the SVM classifiers Average Similarity Score (AvgValue)  Over all the pages that belong to a category Maximum Similarity Score (MaxValue)  Over all the pages that belong to a category

26/01/201611Gianluca Demartini Category Ranking Algorithms - Other Metrics Order by Size: using the number of docs belonging to the category  Used by most of the Clustering Search Engines (Vivisimo) Alphabetical Order  Used in Faceted Search (Flamenco) Random Order  To compare the other metrics

26/01/201612Gianluca Demartini Outline Introduction Basic Concepts Rankings Algorithms considered Experimental Setup Results Conclusions

26/01/201613Gianluca Demartini Experimental Setup 9 algorithms, 18 people Supporting Vector Machines (SVM) as Text Classifiers ODP categories (top 3 levels) most frequent terms in DMOZ titles and descriptions of web pages English categories Each user evaluated each algorithm once We measure the time spent for search the relevant result and the position of the results

26/01/2016Gianluca Demartini14

26/01/201615Gianluca Demartini Experimental Results Time to find the relevant result:

26/01/201616Gianluca Demartini Experimental Results Average of the position of the algo for each user:

26/01/201617Gianluca Demartini Experimental Results Average Rank of the Result and of the Cluster

26/01/201618Gianluca Demartini Conclusions & Future Work MaxValue seems to be the best way to rank the clusters in a Clustering Search Engine Alphabetical and Size Ranking are not so good We want to test other algorithms  Using query-based metrics (similarity between q and p)  Click-thorought data

26/01/201619Gianluca Demartini Thanks for your attention! Q&A