Scott Wen-tau Yih
Joint work with Kristina Toutanova, John Platt, Chris Meek
Microsoft Research
[Figure: cross-language retrieval — an English query matched against a Spanish document set]
Query: "ACL in Portland"
Example results:
- ACL Construction LLC (Portland) — "ACL Construction LLC in Portland, OR -- Map, Phone Number, Reviews, ..."
- ACL HLT 2011 — "The 49th Annual Meeting of the Association for Computational Linguistics..." (acl2011.org)
- Ad: "Don't Have ACL Surgery — Used By Top Athletes Worldwide. Don't Let Them Cut You. See Us First. Expert Knee Surgeons. Get the best knee doctor for your torn ACL surgery." (EverettBoneAndJoint.com/Knee)
- ACL: Anterior Cruciate Ligament injuries
Traditional approach: represent text objects as vectors
- Word/phrase: term co-occurrence vectors
- Document: term vectors with TFIDF/BM25 weighting
Similarity is computed with functions like the cosine of the corresponding vectors (v_q, v_d)
Weaknesses
- Different but related terms cannot be matched, e.g., (buy, used, car) vs. (purchase, pre-owned, vehicle)
- Not suitable for cross-lingual settings
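The term-matching weakness can be made concrete with a small sketch (plain Python, with toy unit weights standing in for real TFIDF scores): two closely related phrases that share no surface terms get a cosine of exactly zero.

```python
import math

def cosine(u, v):
    # Cosine similarity between two sparse term vectors (dict: term -> weight).
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Two related phrases with no terms in common:
q = {"buy": 1.0, "used": 1.0, "car": 1.0}
d = {"purchase": 1.0, "pre-owned": 1.0, "vehicle": 1.0}
print(cosine(q, d))  # 0.0 -- surface term matching cannot relate them
```

Any similarity function that only looks at shared terms has this blind spot, which is what motivates mapping the vectors into a shared low-dimensional space first.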
[Figure: mapping from high-dimensional space to low-dimensional space]
Dimensionality-reduction models (two axes: probabilistic vs. projection, unsupervised vs. supervised):
- Probabilistic, unsupervised: PLSA, LDA
- Probabilistic, supervised: JPLSA, CPLSA, PLTM
- Projection, unsupervised: PCA, LSA
- Projection, supervised: CL-LSI, OPCA, CCA, HDLR, S2Net
Outline
- Introduction
- Problem & Approach
- Experiments
  - Cross-language document retrieval
  - Ad relevance measures
  - Web search ranking
- Discussion & Conclusions
Approach: Siamese neural network architecture
- Train the model using labeled (query, doc) pairs
- Optimize for a pre-selected similarity function (cosine) over the output vectors v_qry and v_doc
- The query side and the doc side share the same model
Model form is the same as LSA/PCA: a linear projection mapping raw term vectors to v_qry and v_doc
The difference: the projection matrix is learned discriminatively
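As a sketch of this shared model form (with a random matrix standing in for the learned projection, purely for illustration), a linear map A takes a high-dimensional term vector to a k-dimensional concept vector, and cosine is computed in the projected space:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, k = 1000, 50  # high-dim term space -> k-dim concept space

# Projection matrix; S2Net learns this discriminatively, while LSA/PCA
# derive it from unsupervised matrix factorization. Random here.
A = rng.standard_normal((vocab_size, k))

def project(f):
    # v = A^T f : map a term vector into the low-dimensional space
    return A.T @ f

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy query and document term vectors that share only one term (index 17):
f_qry = np.zeros(vocab_size); f_qry[[3, 17, 42]] = 1.0
f_doc = np.zeros(vocab_size); f_doc[[5, 17, 99]] = 1.0
print(cosine(project(f_qry), project(f_doc)))
```

The point of the architecture is that only A changes between LSA, PCA, OPCA, and S2Net; the projection-then-cosine pipeline is identical.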
[Chart: cross-language document retrieval results — 14.2% increase over the baseline (higher is better)]
Web search ranking setup
- Training: parallel corpus from clicks — 82,834,648 (query, doc) pairs
- Evaluation: human relevance judgments (Good/Fair/Bad) — 16,510 queries, 15 docs per query on average
- Train latent semantic models on the click pairs; evaluate on the labeled data
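For illustration, graded judgments like the Good/Fair/Bad labels above can be expanded into preference pairs, the form a pairwise objective consumes; `preference_pairs` is a hypothetical helper for this sketch, not part of S2Net:

```python
# Map graded relevance labels to an ordinal scale.
LABELS = {"Good": 2, "Fair": 1, "Bad": 0}

def preference_pairs(judged_docs):
    # judged_docs: list of (doc_id, label) for one query.
    # Emit (preferred, dispreferred) for every strictly ordered pair.
    pairs = []
    for d1, l1 in judged_docs:
        for d2, l2 in judged_docs:
            if LABELS[l1] > LABELS[l2]:
                pairs.append((d1, d2))
    return pairs

print(preference_pairs([("doc1", "Good"), ("doc2", "Fair"), ("doc3", "Bad")]))
# [('doc1', 'doc2'), ('doc1', 'doc3'), ('doc2', 'doc3')]
```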
Among the projection models compared, only S2Net outperforms the VSM baseline
When combined with VSM, all results improve
More details, along with interesting results on generative topic models, can be found in [SIGIR-11]
S2Net vs. generative topic models
- Can handle explicit negative examples
- No special constraints on input vectors
S2Net vs. linear projection methods
- Loss function designed to closely match the true objective
- Computationally more expensive
S2Net vs. metric learning
- Targets a high-dimensional input space
- Scales well as the number of examples increases
Loss function
- Closer to the true evaluation objective
- Slight nonlinearity: cosine instead of inner product
Leverages a large amount of training data
- Easily parallelizable: distributed gradient computation
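A minimal sketch of a loss with these properties: a pairwise logistic loss on the gap between the cosine scores of a relevant and an irrelevant document. The exact functional form and the `gamma` scaling here are assumptions for illustration, not necessarily the precise S2Net objective.

```python
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def pairwise_loss(v_qry, v_pos, v_neg, gamma=10.0):
    # Logistic loss on the cosine-score gap: near zero when the relevant
    # document outscores the irrelevant one by a clear margin, large when
    # the ranking is inverted. Smooth, so gradients can be distributed.
    delta = cosine(v_qry, v_pos) - cosine(v_qry, v_neg)
    return float(np.log1p(np.exp(-gamma * delta)))

# Toy projected vectors: the loss penalizes the inverted ranking more.
q = np.array([1.0, 0.0])
good = np.array([0.9, 0.1])
bad = np.array([0.1, 0.9])
print(pairwise_loss(q, good, bad) < pairwise_loss(q, bad, good))  # True
```

Because the loss is defined directly on cosine differences over document pairs, it tracks a ranking-style evaluation metric far more closely than a reconstruction objective does.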
S2Net: a discriminative learning framework for dimensionality reduction
- Learns a projection matrix that leads to robust text similarity measures
- Strong empirical results on different tasks
Future work
- Model improvements: handle Web-scale parallel corpora more efficiently; convex loss function
- Explore more applications, e.g., word/phrase similarity