Semantic Matching by Non-Linear Word Transportation for Information Retrieval
Jiafeng Guo*, Yixing Fan*, Qingyao Ai+, W. Bruce Croft+
*CAS Key Lab of Network Data Science and Technology, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
+Center for Intelligent Information Retrieval, University of Massachusetts Amherst, MA, USA
Outline Introduction Non-Linear Word Transportation Model Discussion Experiments Conclusions
Introduction. Effective retrieval models are largely Bag-of-Words (BoW) models and suffer from vocabulary mismatch: the relevance score rewards exact matching of words but misses semantically related words.
Techniques: query expansion, latent models, translation models, word embedding, Word Mover’s Distance.
Query Expansion. Global methods: expand using the corpus being searched or a hand-crafted thesaurus. Local methods: expand using top-ranked documents (pseudo-relevance feedback, PRF). Problem: query drift.
Latent Models. Match queries and documents in a latent space of reduced dimensionality (e.g., LDA-based document models). Problems: loss of many detailed word-level matching signals; do not improve performance on their own (need to be combined with traditional models).
Translation Models. “Translate” documents into queries to model word dependencies: mixture and binomial models (Berger et al.), title-document pairs (Jin et al.), mutual information between words (Karimzadehgan et al.). Problem: how to formalize and estimate the translation probabilities.
Word Embedding. Distributed representations that capture semantic and syntactic properties of words. Their potential in IR needs to be further explored: Bag of Word Embeddings (BoWE), monolingual and bilingual retrieval (Vulic et al.), generalized language model (Ganguly et al.).
Word Mover’s Distance. A transportation problem: the Earth Mover’s Distance originated in urban planning and civil engineering, was later applied to image retrieval and multimedia search, and the Word Mover’s Distance adapts it to document classification.
Non-Linear Word Transportation (Our Approach). Built on the Bag of Word Embeddings (BoWE) representation; a non-linear transportation problem (inspired by WMD) with fixed document-word capacities and non-fixed query-word capacities; efficiently approximated via neighborhood pruning and indexing strategies.
Bag of Word Embeddings (BoWE). A richer representation that captures similarity between words (e.g., “car” and “auto”). Given a word embedding matrix $W \in \mathbb{R}^{K \times V}$, a document is $D = \{(w_1^d, tf_1), \dots, (w_m^d, tf_m)\}$ and a query is $Q = \{(w_1^q, qtf_1), \dots, (w_n^q, qtf_n)\}$.
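As a minimal sketch of this representation (the toy whitespace tokenizer and 3-d vectors are illustrative assumptions, not the paper's setup), a BoWE pairs each distinct word with its term frequency and embedding vector:

```python
from collections import Counter

import numpy as np

def to_bowe(text, embeddings):
    """Map a text to its BoWE form: a list of (word, tf, vector) triples.

    Out-of-vocabulary words are simply skipped in this sketch.
    """
    counts = Counter(text.lower().split())
    return [(w, tf, embeddings[w]) for w, tf in counts.items() if w in embeddings]

# Toy 3-d embeddings; a real system would use trained word vectors.
emb = {
    "car": np.array([0.9, 0.1, 0.0]),
    "auto": np.array([0.8, 0.2, 0.1]),
    "repair": np.array([0.0, 0.9, 0.4]),
}

doc = to_bowe("Car repair car", emb)  # [("car", 2, ...), ("repair", 1, ...)]
```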
Non-Linear Word Transportation. Information capacity: fixed for document words, unlimited for query words, reflecting the vague nature of query intent. Information gain (profit): follows the law of diminishing marginal returns.
Non-Linear Word Transportation. Find the optimal flows $F = \{f_{ij}\}$ from document words to query words.
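A hedged reconstruction of the optimization problem, assuming the diminishing-marginal-returns gain is realized by a logarithm over the profit each query word collects ($qtf_j$ is the query term frequency, and $c_i$ and $r_{ij}$ are the document word capacity and transportation profit defined on the next slide; the paper's exact notation may differ):

```latex
\max_{F=\{f_{ij}\}} \; \sum_{w_j^q \in Q} qtf_j \cdot
    \log\!\Big( \sum_{w_i^d \in D} f_{ij}\, r_{ij} \Big)
\quad \text{s.t.} \quad \sum_{j} f_{ij} \le c_i \;\; \forall i,
\qquad f_{ij} \ge 0 \;\; \forall i, j
```

The capacity constraint applies only on the document side; the query side is deliberately left unconstrained, which is the relaxation discussed later.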
Non-Linear Word Transportation. Document word capacity: $c_i = \frac{tf_i + \mu\, cf_i / |C|}{|D| + \mu}$, where $cf_i$ is the collection frequency of the word, $|C|$ the collection size, and $\mu$ a smoothing parameter. Transportation profit: $r_{ij} = \max(\cos(\vec{w}_i^{\,d}, \vec{w}_j^{\,q}), 0)$.
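These two quantities can be sketched directly from their definitions (the parameter names and the toy values are illustrative, not from the paper):

```python
import numpy as np

def capacity(tf, doc_len, cf, corpus_len, mu=1000.0):
    """Document word capacity c_i with Dirichlet-style smoothing:
    (tf + mu * cf / |C|) / (|D| + mu)."""
    return (tf + mu * cf / corpus_len) / (doc_len + mu)

def profit(vec_d, vec_q):
    """Transportation profit r_ij: cosine similarity clipped at zero,
    so anti-correlated word pairs contribute no profit."""
    cos = float(np.dot(vec_d, vec_q) /
                (np.linalg.norm(vec_d) * np.linalg.norm(vec_q)))
    return max(cos, 0.0)

c = capacity(tf=3, doc_len=100, cf=50, corpus_len=1_000_000)  # (3 + 0.05) / 1100
r_same = profit(np.array([1.0, 0.0]), np.array([1.0, 0.0]))   # exact match -> 1.0
r_anti = profit(np.array([1.0, 0.0]), np.array([-1.0, 0.0]))  # clipped    -> 0.0
```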
Transportation Profit. Risk parameter $\alpha$: an exact word match should bring more profit than a semantically related word matched multiple times (e.g., “salmon” and “fish” have cosine similarity 0.72). The higher $\alpha$, the less profit a semantic (non-exact) transportation can bring.
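One simple form with this behavior, used here as an assumption consistent with the slide's description, raises the clipped cosine to the power $\alpha$: an exact match (cosine 1) keeps full profit for any $\alpha$, while related words are discounted more as $\alpha$ grows.

```python
def discounted_profit(cos_sim, alpha):
    """Profit with risk parameter alpha, assumed form max(cos, 0) ** alpha."""
    return max(cos_sim, 0.0) ** alpha

exact = discounted_profit(1.0, 3)      # exact match: profit stays 1.0
related = discounted_profit(0.72, 3)   # "salmon" vs "fish": 0.72**3 ~ 0.373
```

With $\alpha = 1$ the related pair would still earn 0.72, so the exponent is what makes exact matching strictly safer than accumulating semantic matches.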
Model Summary. A non-linear word transportation model that combines exact and semantic matching signals. Two damping effects: the document word capacity and the transportation profit. Efficiency: neighborhood pruning restricts the $|V| \times |Q|$ candidate word pairs to indexed nearest neighbors (e.g., via kNN).
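The pruning step above can be sketched as a precomputed kNN index over the vocabulary (brute-force cosine here for illustration; a real system would use an approximate nearest-neighbor index, and the toy 2-d vectors are assumptions):

```python
import numpy as np

def build_knn_index(embeddings, k):
    """Precompute each word's k nearest neighbors by cosine similarity.

    At query time only these indexed neighbors are considered as semantic
    matches, avoiding a scan over all |V| x |Q| word pairs.
    """
    words = list(embeddings)
    mat = np.array([embeddings[w] for w in words], dtype=float)
    mat /= np.linalg.norm(mat, axis=1, keepdims=True)  # unit-normalize rows
    sims = mat @ mat.T                                 # all-pairs cosine
    index = {}
    for i, w in enumerate(words):
        order = np.argsort(-sims[i])                   # most similar first
        index[w] = [words[j] for j in order if j != i][:k]
    return index

# Toy 2-d vectors; a real index would be built from trained embeddings.
emb = {
    "car": np.array([1.0, 0.0]),
    "auto": np.array([0.9, 0.1]),
    "fish": np.array([0.0, 1.0]),
    "salmon": np.array([0.1, 0.9]),
}
knn = build_knn_index(emb, k=1)  # car -> [auto], fish -> [salmon]
```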
Model Discussion. Due to the relaxation of constraints on the query side and the diminishing marginal effect, the model shows a word alignment effect: a document is assigned a higher score if it can interpret more distinct query words.
Semantic Matching: Relation to Prior Work. Query expansion by local analysis is orthogonal to our work. Unlike latent models, ours represents the document as a bag of word embeddings and keeps word-level matching signals. Compared with statistical translation models, ours offers more flexibility, allowing multiple features in estimation.
Word Mover’s Distance vs. NWT. NWT: measures relevance between queries and documents; a maximum-profit, non-linear transportation problem. WMD: measures dissimilarity between documents; a minimum-cost, linear transportation problem.
Experiments
Word Embeddings and Evaluation. Word embeddings: corpus-specific (CBOW and Skip-Gram) and corpus-independent (GloVe). Evaluation measures: MAP, NDCG@20 and P@20.
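For concreteness, a minimal NDCG@k sketch (the linear-gain, log2-discount convention is an assumption; in practice TREC evaluation tools such as trec_eval compute these measures):

```python
import math

def ndcg_at_k(ranked_gains, all_gains, k=20):
    """NDCG@k: DCG of the ranking divided by the DCG of the ideal ranking."""
    def dcg(gains):
        # Rank r (0-based) is discounted by log2(r + 2).
        return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains[:k]))
    ideal = dcg(sorted(all_gains, reverse=True))
    return dcg(ranked_gains) / ideal if ideal > 0 else 0.0

perfect = ndcg_at_k([3, 2, 1, 0], [3, 2, 1, 0], k=4)  # ideal order -> 1.0
swapped = ndcg_at_k([2, 3, 1, 0], [3, 2, 1, 0], k=4)  # one swap   -> < 1.0
```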
Retrieval Performance and Analysis
Case Studies. Named entities: for the query “brazil america relation”, embedding neighbors bring “argentina” and “spain” for “brazil”, and “europe” and “africa” for “america”. Ambiguous acronyms: for “Find information on taking the SAT college entrance exam”, “sat” matches weekday abbreviations such as “fri”, “tue” and “wed”.
Impact of Word Embeddings
Different Dimensionality
Indexed Neighbor Size
Linear vs. Non-Linear
Conclusions. A transportation model based on BoWE captures detailed semantic matching signals. The non-linear formulation, through the relaxation of query-side constraints and the diminishing marginal effect, yields a word alignment effect. The model definition is flexible: both the word capacity and the transportation profit can be adapted.