Published by Amy Horn. Modified over 9 years ago.
1
Conceptual structures in modern information retrieval
Claudio Carpineto
Fondazione Ugo Bordoni, Roma
carpinet@fub.it
2
Overview
- Keyword-based IR and early conceptual approaches
- Context and concepts in modern topical IR
- Emerging IR tasks requiring knowledge structures
- Research at FUB
- Conclusions
3
Vector-based IR
[Diagram: documents are represented as vectors of weighted keywords and the query as a vector of weighted keywords; matching the two yields the retrieved documents.]
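The matching step named on this slide can be sketched as a cosine comparison between weighted keyword vectors. This is only an illustration under assumptions that are not in the slides: the toy documents, the plain tf.idf weights (with 1 added to the idf so that weights stay positive), and the function names `tfidf_vectors`, `cosine`, and `rank` are all hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(texts):
    """Return one {term: tf * idf} dict per text, plus the idf table."""
    tokenized = [t.lower().split() for t in texts]
    n_docs = len(tokenized)
    # Document frequency: number of texts containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Assumption: idf = ln(N / df) + 1, so that no term gets weight zero.
    idf = {term: math.log(n_docs / df[term]) + 1.0 for term in df}
    vectors = [
        {term: tf * idf[term] for term, tf in Counter(tokens).items()}
        for tokens in tokenized
    ]
    return vectors, idf


def cosine(u, v):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank(query, texts):
    """Indices of texts with a nonzero match, best match first."""
    doc_vectors, idf = tfidf_vectors(texts)
    q_counts = Counter(query.lower().split())
    # Query terms unseen in the collection get weight zero.
    q_vec = {t: tf * idf.get(t, 0.0) for t, tf in q_counts.items()}
    scored = [(cosine(q_vec, d), i) for i, d in enumerate(doc_vectors)]
    return [i for s, i in sorted(scored, key=lambda p: (-p[0], p[1])) if s > 0]


texts = ["bank account finance", "river bank water", "credit finance finance"]
print(rank("finance credit", texts))
```

A document that shares no keyword with the query scores zero and is never retrieved, which is the vocabulary problem raised later in the talk.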
4
Term weighting
- tf.idf and the vector space model (Salton) were very popular in the 70s and 80s
- BM25 (Robertson) was the state of the art in the 90s
- Several recent term-weighting functions are based on statistical language modeling (Ponte, Lafferty)
- A new weighting framework based on deviation from randomness + information gain (FUB + UG)
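To make the BM25 entry concrete, here is a minimal sketch of the scoring function in one commonly published form. The slides give no formula, so everything specific is an assumption: the parameter defaults `k1=1.2` and `b=0.75` are conventional values, the idf uses the non-negative `ln(1 + ...)` variant, and the function name `bm25_scores` is hypothetical.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each tokenized document in `docs` against `query_terms`."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue
            # Assumed idf variant; it never goes negative.
            idf = math.log(1.0 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates with k1 and is length-normalized by b.
            denom = tf[term] + k1 * (1.0 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1.0) / denom
        scores.append(score)
    return scores


docs = [
    ["finance", "bank", "account"],
    ["river", "bank"],
    ["finance", "finance", "credit"],
]
print(bm25_scores(["finance"], docs))
```

With these toy documents the third one scores highest because it repeats the query term, and the second scores zero because it does not contain it.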
6
Inherent limitations of keyword-based IR
- Vocabulary problem
- Relations are ignored
7
Early approaches to conceptual IR
- n-grams (Salton 1975, Maarek 1989)
- parse tree (Dillon 1983, Metzler 1989)
- case relations (Fillmore 1968, Somers 1987)
- conceptual graphs (Dick 1991)
8
Why early conceptual IR was not successful
- No best representation scheme
- Manual coding too costly
- Automated coding too hard
- Training required both for the indexer and the user
- Effectiveness not clearly demonstrated
- Retrieval task often not appropriate
9
Overview
- Vector-based IR and early conceptual approaches
- Context and concepts in modern topical IR
- Emerging IR tasks requiring knowledge structures
- Research at FUB
- Conclusions
10
Evolution of topical IR
- Very short queries
- Heterogeneous collections
- Unreliable sources
- Interactive sessions
11
Model of modern topical IR
[Diagram: components labeled Docs, Query, Context, Indexing, Ranking, Visualization, Interaction, and Use.]
13
Performance of retrieval feedback versus query difficulty
14
Ranking based on interdocument similarity
Cluster hypothesis (van Rijsbergen 1978)
Approaches
- Matching the query against document clusters (Willett 1988)
- Matching the query against transformed document representations (GVSM, Wong 1987, LSI, Deerwester 1990)
- Computing the conceptual distance between query and documents (Order-theoretical ranking, Carpineto 2000)
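The first approach, matching the query against document clusters, can be illustrated with a small sketch. It is not taken from the cited work: the clusters are given by hand rather than computed, each cluster is summarized by the centroid of its binary term vectors, and the names `centroid`, `cosine`, and `rank_clusters` are hypothetical.

```python
import math
from collections import Counter


def centroid(term_sets):
    """Average of the binary term vectors of the documents in one cluster."""
    counts = Counter(term for terms in term_sets for term in terms)
    return {term: c / len(term_sets) for term, c in counts.items()}


def cosine(u, v):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_clusters(query_terms, clusters):
    """Cluster names ordered by similarity between query and centroid."""
    q_vec = {term: 1.0 for term in query_terms}
    scored = [
        (cosine(q_vec, centroid(list(members.values()))), name)
        for name, members in clusters.items()
    ]
    return [name for score, name in sorted(scored, key=lambda p: (-p[0], p[1]))]


# Hand-made clusters (an assumption; a real system would compute them).
clusters = {
    "money": {"d1": {"finance", "bank", "account"}, "d2": {"finance", "credit"}},
    "nature": {"d3": {"river", "bank"}, "d4": {"river", "waters"}},
}
print(rank_clusters({"account"}, clusters))
```

Here document d2 does not contain the query term, yet it is reachable through its cluster, which is the effect the cluster hypothesis relies on.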
15
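Order-theoretical ranking
[Figure: concept lattice built from the query and documents D1 to D7. Each node carries a numeric label, shown here before the node's terms.]
- 0 NNS FINANCE (Query)
- 1 NNS FINANCE CREDIT KBS (D7)
- 1 NNS FINANCE BANK ACCOUNT (D1)
- 1 NNS
- 1 FINANCE
- 2 NNS BANK
- 2 NNS BANK ACCOUNT (D3)
- 2 FINANCE CREDIT KBS (D4)
- 3 CREDIT KBS (D5)
- 3 NNS BANK RIVER (D2)
- 3 BANK
- 4 KBS
- 4 KBS WATERS (D6)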
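One way to read the lattice on this slide is as a shortest-path computation, sketched below. The sketch rests on assumptions that the slides do not state: lattice nodes are taken to be the term sets of the query and documents closed under intersection (the all-terms bottom node is left out), edges are the covering relation between those sets, and a document's distance is the number of edges on the shortest path from the query node. The function names are hypothetical; only the term sets come from the slide.

```python
from collections import deque
from itertools import combinations

# Term sets as they appear on the slide.
DOCS = {
    "D1": {"NNS", "FINANCE", "BANK", "ACCOUNT"},
    "D2": {"NNS", "BANK", "RIVER"},
    "D3": {"NNS", "BANK", "ACCOUNT"},
    "D4": {"FINANCE", "CREDIT", "KBS"},
    "D5": {"CREDIT", "KBS"},
    "D6": {"KBS", "WATERS"},
    "D7": {"NNS", "FINANCE", "CREDIT", "KBS"},
}
QUERY = {"NNS", "FINANCE"}


def lattice_nodes(term_sets):
    """Close the given term sets under pairwise intersection."""
    nodes = {frozenset(s) for s in term_sets}
    while True:
        new = {a & b for a, b in combinations(nodes, 2)} - nodes
        if not new:
            return nodes
        nodes |= new


def cover_edges(nodes):
    """Link a to b when a is a proper subset of b with no node in between."""
    neighbours = {n: set() for n in nodes}
    for a in nodes:
        for b in nodes:
            if a < b and not any(a < c < b for c in nodes):
                neighbours[a].add(b)
                neighbours[b].add(a)
    return neighbours


def distances_from(start, neighbours):
    """Breadth-first search: number of edges from `start` to each node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for other in neighbours[node]:
            if other not in dist:
                dist[other] = dist[node] + 1
                queue.append(other)
    return dist


def rank_documents(query, docs):
    """Sorted (distance, name) pairs; unreachable documents get infinity."""
    nodes = lattice_nodes([query, *docs.values()])
    dist = distances_from(frozenset(query), cover_edges(nodes))
    return sorted(
        (dist.get(frozenset(terms), float("inf")), name)
        for name, terms in docs.items()
    )


print(rank_documents(QUERY, DOCS))
```

Under these assumptions the sketch gives distance 1 for D1 and D7, 2 for D3 and D4, 3 for D2 and D5, and 4 for D6, which agrees with the numbers printed next to those documents on the slide. Note that D5 and D6 share no term with the query and would be missed by keyword matching, yet they still receive a finite distance.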
16
Performance of order-theoretical ranking
- Better than hierarchic clustering and comparable to best matching on the whole collection
- Markedly better than both hierarchic clustering and best matching on non-matching relevant documents
- Order-theoretical ranking does not scale up well, but it is synergistic with best-matching document ranking
17
Overview
- Vector-based IR and early conceptual approaches
- Context and concepts in modern topical IR
- Emerging IR tasks requiring knowledge structures
- Research at FUB
- Conclusions
18
Question Answering
Task: closed-class questions in unrestricted domains, with no guarantee that an answer exists and with results possibly scattered over multiple documents
19
Question Answering
Approach:
1. Recognize the type of query
2. Retrieve relevant documents
3. Find the sought entities near the question words
4. Fall back to best-matching passage retrieval in case of failure
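The four steps above can be sketched as a toy pipeline. Nothing here reflects the actual system behind the slides: the question types, the regular expressions standing in for entity recognition, the stopword list, and the function names `keywords` and `answer` are all assumptions made for illustration.

```python
import re

# Hypothetical mapping from question word to a pattern for the sought entity.
ANSWER_PATTERNS = {
    "when": re.compile(r"\b\d{4}\b"),  # four-digit years
    "who": re.compile(r"\b[A-Z][a-z]+(?: [A-Z][a-z]+)+\b"),  # capitalized names
}
STOPWORDS = {"when", "who", "why", "was", "is", "did", "the", "a", "of", "in"}


def keywords(text):
    """Lowercased alphanumeric tokens minus stopwords."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS]


def answer(question, passages):
    # Step 1: recognize the question type from its first word.
    q_type = question.lower().split()[0]
    q_terms = set(keywords(question))

    # Step 2: keep passages sharing a keyword, largest overlap first.
    def overlap(passage):
        return len(q_terms & set(keywords(passage)))

    ranked = [p for p in sorted(passages, key=lambda p: -overlap(p)) if overlap(p)]

    # Step 3: pick the candidate entity closest to a question word.
    pattern = ANSWER_PATTERNS.get(q_type)
    if pattern:
        for passage in ranked:
            anchors = [
                m.start()
                for m in re.finditer(r"[A-Za-z0-9]+", passage)
                if m.group().lower() in q_terms
            ]
            if not anchors:
                continue
            candidates = [
                (min(abs(m.start() - a) for a in anchors), m.group())
                for m in pattern.finditer(passage)
            ]
            if candidates:
                return min(candidates)[1]

    # Step 4: fall back to the best-matching passage, or None if nothing matched.
    return ranked[0] if ranked else None


passages = [
    "The Eiffel Tower was built in 1889 for the World Fair.",
    "The Louvre opened in 1793.",
]
print(answer("When was the Eiffel Tower built?", passages))
print(answer("Why is the tower famous?", passages))
```

The first call returns an entity of the expected type; the second has no pattern for its question type, so it falls back to returning the best-matching passage.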
20
Web Information Retrieval
21
Current tasks:
- named-entity finding task
- topic distillation task
Approach:
1. Use of multiple methods
2. Combination of results via interpolation and normalization schemes
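As an illustration of combining results through normalization and interpolation, the sketch below fuses two score lists. The slides do not say which schemes were used, so min-max normalization, a single linear interpolation weight, and the names `min_max` and `interpolate` are assumptions, as are the toy scores.

```python
def min_max(scores):
    """Rescale scores to [0, 1]; a constant or empty list maps to zeros."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 0.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def interpolate(scores_a, scores_b, weight=0.5):
    """Rank documents by weight * a + (1 - weight) * b on normalized scores."""
    a, b = min_max(scores_a), min_max(scores_b)
    fused = {
        doc: weight * a.get(doc, 0.0) + (1.0 - weight) * b.get(doc, 0.0)
        for doc in set(a) | set(b)
    }
    # Ties are broken by document name so the order is deterministic.
    return sorted(fused, key=lambda doc: (-fused[doc], doc))


# Hypothetical scores from a content-based and a link-based method.
content = {"p1": 12.0, "p2": 8.0, "p3": 4.0}
links = {"p2": 0.9, "p3": 0.1, "p4": 0.5}
print(interpolate(content, links))
```

Normalizing first matters because the two methods score on different scales; without it the content scores here would dominate whatever the weight.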
22
XML document retrieval
Goal: Use document structure to improve precision and recall of unstructured queries
Example query: “concerts this weekend at Sofia under 20 euros”
Approaches:
- Automatic inference of query structure
- Semi-automatic query annotation
- Hybrid query languages
23
Overview
- Vector-based IR and early conceptual approaches
- Context and concepts in modern topical IR
- Emerging IR tasks requiring knowledge structures
- Research at FUB
- Conclusions
24
Recommender systems
“Related keyword” feature versus context-dependent query reformulation
27
Combining text retrieval and text mining with concept lattices
Goal: integration of multiple search strategies (querying, browsing, thesaurus climbing, bounding) into a single Web interface
28
Conclusions
The use of conceptual structures surfaces in traditional topic relevance retrieval, and it is at the heart of many non-topical retrieval tasks
Towards conceptual search:
- Understand term meaning
- Adapt to the user
- Can translate between applications
- Explainable
- Capable of filtering and summarization