Even More TopX: Relevance Feedback
Ralf Schenkel
Joint work with Osama Samodi, Martin Theobald
TopX Results with INEX
– Collection: XMLified English Wikipedia articles
– 107 topics, each with
  – a structural query (CAS)
  – a non-structural (keyword) query (CO)
  – an informal description of the information need
  – assessed answers (text passages)
– Evaluation metric based on recall/precision: fraction of relevant characters retrieved
  – consider the result list up to 1% recall
  – C: #characters retrieved, R: #relevant characters retrieved
  – P[0.01] = R/C
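A minimal sketch (not the official INEX evaluation tool) of how this character-based precision at a 1% recall level could be computed, assuming each ranked result is given as its character count plus the number of relevant characters it contains:

```python
def precision_at_recall(results, total_relevant_chars, recall_level=0.01):
    """results: ranked list of (chars_retrieved, relevant_chars_retrieved) per result.
    Returns P[recall_level] = R / C over the shortest prefix of the ranked list
    that retrieves the given fraction of all relevant characters."""
    target = recall_level * total_relevant_chars
    chars_seen, relevant_seen = 0, 0
    for c, r in results:
        chars_seen += c
        relevant_seen += r
        if relevant_seen >= target:          # recall level reached
            return relevant_seen / chars_seen
    return relevant_seen / chars_seen if chars_seen else 0.0  # recall level never reached

# hypothetical example: three results, 10,000 relevant characters in the pool
print(precision_at_recall([(300, 120), (500, 10), (200, 80)], 10_000))
```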
Results with INEX 2007
– [charts: retrieval quality for structure queries vs. keyword queries]
– Structural constraints can improve result quality:
  – document retrieval: improved
  – structure queries: improved
  – keyword queries: (unchecked)
Users vs. Structural XML IR
– Information need: "I need information about a professor in SB who teaches IR."
– As a structured query: //professor[contains(., "SB") and contains(.//course, "IR")]
– Structural query languages do not work in practice:
  – the schema is unknown or heterogeneous
  – the language is too complex
  – humans don't think in XPath
  – results are often unsatisfying
– System support to generate good structured queries:
  – user interfaces (advanced search)
  – natural language processing
  – interactive query refinement
Relevance Feedback for Interactive Query Refinement
– [diagram: query evaluation against an IR index; example expanded query: XML not(Fagin)]
1. User submits a query
2. User marks relevant and nonrelevant docs
3. System finds the best terms to distinguish between relevant and nonrelevant docs
4. System submits the expanded query
– Feedback for XML IR:
  – start with a keyword query
  – find structural expansions
  – create a structural query
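A minimal sketch of one round of this loop, assuming hypothetical search, judgment, and term-scoring functions (the TopX machinery itself is not shown):

```python
def feedback_round(search, score_terms, is_relevant, query, k=20, n_expansions=10):
    """One round of relevance feedback (sketch).
    search(q)         -> ranked result list
    is_relevant(r)    -> the user's judgment for result r
    score_terms(R, N) -> expansion terms ranked by how well they separate
                         relevant (R) from nonrelevant (N) results"""
    results = search(query)[:k]                               # 1. evaluate the initial query
    relevant = [r for r in results if is_relevant(r)]         # 2. user marks results
    nonrelevant = [r for r in results if not is_relevant(r)]
    expansions = score_terms(relevant, nonrelevant)[:n_expansions]  # 3. best terms
    expanded_query = query + " " + " ".join(expansions)       # 4. resubmit expanded query
    return search(expanded_query)
```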
Structural Features
– [example XML tree: article with frontmatter, body (sec, subsec, and p elements with text such as "Semistructured data…", "XML has evolved…", "With the advent of XSLT…"), backmatter, and an author element "Baeza-Yates"]
– User marks a relevant result. Possible features:
  – C: content of the result (e.g., XML)
  – D: tag + content of descendants (e.g., p[XSLT])
  – A: tag + content of ancestors (e.g., sec[data])
  – AD: tag + content of descendants of ancestors (e.g., article//author[Baeza])
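A rough sketch (not the TopX implementation) of how the four feature classes could be collected from a marked result element, using Python's standard ElementTree; the tokenisation and feature syntax are simplified:

```python
import xml.etree.ElementTree as ET

def extract_features(root, result):
    """Collect C/D/A/AD candidate features for one marked result element (sketch)."""
    parent = {c: p for p in root.iter() for c in p}        # ElementTree has no parent links

    def terms(e):                                          # naive content tokenisation
        return (e.text or "").split()

    feats = {"C": set(terms(result)), "D": set(), "A": set(), "AD": set()}

    result_subtree = set(result.iter())                    # result plus its descendants
    for d in result_subtree - {result}:                    # D: tag+content of descendants
        feats["D"].update(f"{d.tag}[{t}]" for t in terms(d))

    ancestors, node = [], result
    while node in parent:                                  # A: tag+content of ancestors
        node = parent[node]
        ancestors.append(node)
        feats["A"].update(f"{node.tag}[{t}]" for t in terms(node))

    for a in ancestors:                                    # AD: descendants of ancestors,
        for d in a.iter():                                 #     outside the result subtree
            if d not in result_subtree and d not in ancestors:
                feats["AD"].update(f"{a.tag}//{d.tag}[{t}]" for t in terms(d))
    return feats

root = ET.fromstring("<article><frontmatter><author>Baeza-Yates</author></frontmatter>"
                     "<body><sec>data <p>XSLT</p></sec></body></article>")
feats = extract_features(root, root.find("body/sec"))
print(feats["D"], feats["AD"])   # {'p[XSLT]'} {'article//author[Baeza-Yates]'}
```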
Feature Selection
– Compute the Robertson-Sparck-Jones weight for each feature f (also used as the feature's weight in the query):
  w_f = log [ (r_f + 0.5)(E − e_f − R + r_f + 0.5) / ((R − r_f + 0.5)(e_f − r_f + 0.5)) ]
  where r_f = number of relevant results with f, R = number of relevant results, e_f = number of elements that contain f, E = number of all elements
– Order features by the Robertson Selection Value:
  RSV_f = w_f · (p_f − q_f)
  where p_f = probability that f occurs in a relevant result, q_f = probability that f occurs in a nonrelevant result
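A small sketch of these two quantities in code, following the definitions above; the 0.5 smoothing terms are the usual RSJ choice, and the function names and example numbers are illustrative:

```python
import math

def rsj_weight(r_f, R, e_f, E):
    """Robertson-Sparck-Jones weight of feature f.
    r_f: relevant results containing f, R: relevant results,
    e_f: elements containing f, E: all elements."""
    return math.log(((r_f + 0.5) * (E - e_f - R + r_f + 0.5)) /
                    ((R - r_f + 0.5) * (e_f - r_f + 0.5)))

def selection_value(r_f, R, e_f, E, n_f, N):
    """Robertson Selection Value: weight times (p_f - q_f), with
    p_f = r_f / R (feature in relevant results) and
    q_f = n_f / N (feature in nonrelevant results)."""
    p_f, q_f = r_f / R, n_f / N
    return rsj_weight(r_f, R, e_f, E) * (p_f - q_f)

# hypothetical numbers: 3 of 5 relevant results contain f, 40 of 10,000 elements
# contain f, 1 of 15 nonrelevant results contains f
print(selection_value(r_f=3, R=5, e_f=40, E=10_000, n_f=1, N=15))
```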
Query Construction
– Initial query: query evaluation
– Selected features: C: XML, D: p[XSLT], A: sec[data], AD: article//author[Baeza]
– The expanded query combines them around the target element:
  – content of result: *[query evaluation] becomes *[query evaluation XML]
  – tag + content of descendants: p[XSLT] as a descendant condition
  – tag + content of ancestors: sec[data] as an ancestor condition (needs schema information!)
  – tag + content of descendants of ancestors: article//author[Baeza], attached via the descendant-or-self axis
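A hedged sketch of assembling a NEXI-style query string from the selected features; the exact query syntax TopX accepts is not reproduced here, and AD features (which would attach via a descendant-or-self axis) are omitted for brevity:

```python
def build_query(keywords, features):
    """Assemble a NEXI-style query from the initial keywords and
    expansion features grouped by class (C, D, A)."""
    target_terms = keywords + features.get("C", [])        # C terms join the target's about()
    preds = [f"about(., {' '.join(target_terms)})"]
    for d in features.get("D", []):                        # D: conditions on descendants
        tag, term = d.rstrip("]").split("[")
        preds.append(f"about(.//{tag}, {term})")
    target = f"//*[{' and '.join(preds)}]"
    for a in features.get("A", []):                        # A: ancestor conditions
        tag, term = a.rstrip("]").split("[")               # (placement needs schema knowledge)
        target = f"//{tag}[about(., {term})]" + target
    return target

q = build_query(["query", "evaluation"],
                {"C": ["XML"], "D": ["p[XSLT]"], "A": ["sec[data]"]})
print(q)  # //sec[about(., data)]//*[about(., query evaluation XML) and about(.//p, XSLT)]
```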
More Fancy Query Construction
– [query DAG: target *[query evaluation XML] with descendant p[XSLT], ancestor sec[data], and article//author[Baeza]]
– No valid NEXI query, but expressible in XPath (ancestor axis)
– DAG queries in TopX: needs disjunctive evaluation
Example: query "pyramids of egypt"
Architecture
– [diagram] The TopX Search Engine evaluates the query and returns results
– Feedback (from the INEX tools & assessments) and the results are passed to the C, D, A, and AD modules, which produce candidate feature classes
– Weighting + selection chooses the expansion features and builds the expanded query, which is submitted back to TopX
RF in the TopX 2.0 Interface
Evaluation Methodology
– Goal: avoid training on the data
– Freeze known results at the top
– Remove known results + X from the collection:
  – resColl-result: remove results only (~ document retrieval)
  – resColl-desc: remove results + descendants
  – resColl-anc: remove results + ancestors
  – resColl-path: remove results + descendants + ancestors
  – resColl-doc: remove the whole document with known results
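A minimal sketch of these residual-collection variants, assuming each element is identified by a (doc, path) pair so that ancestor/descendant tests reduce to path-prefix checks; the names and path syntax are illustrative:

```python
def keep(candidate, known, mode):
    """Decide whether a candidate element stays in the residual collection.
    Elements are (doc_id, path) pairs, e.g. ("d1", "/article/body/sec[2]").
    mode is one of: result, desc, anc, path, doc."""
    cdoc, cpath = candidate
    for kdoc, kpath in known:
        if cdoc != kdoc:
            continue
        if cpath == kpath:                       # the known result itself
            return False
        if mode == "doc":                        # drop the whole document
            return False
        is_desc = cpath.startswith(kpath + "/")  # candidate below a known result
        is_anc = kpath.startswith(cpath + "/")   # candidate above a known result
        if mode == "desc" and is_desc:
            return False
        if mode == "anc" and is_anc:
            return False
        if mode == "path" and (is_desc or is_anc):
            return False
    return True

known = [("d1", "/article/body/sec[2]")]
coll = [("d1", "/article/body/sec[2]/p[1]"), ("d1", "/article/body/sec[3]"), ("d2", "/article")]
print([e for e in coll if keep(e, known, mode="desc")])   # drops the descendant p[1]
```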
Evaluation: INEX 2003 & 2004
– INEX collection (IEEE-CS journal and conference articles):
  – 12,107 XML docs with 12 million elements
  – queries with manual relevance assessments
– 52 keyword queries from 2003 & 2004, run with our TopX Search Engine [VLDB05]
– Baseline run with MAP ~ 0.1
– Automatic feedback for the top-k results from the relevance assessments
– Evaluation ignores results used for feedback and their descendants (resColl-desc)
INEX 2003 & 2004, resColl-desc
– [chart: retrieval quality per feature dimension and for their combination]
– All dimensions together are best
– Reasonable results for the INEX 2005 RF Track
Results for the INEX 2005 RF Track
– INEX IEEE collection (scientific articles)
– Feedback for the top-20 results from the assessments (strict quantisation: only relevant vs. nonrelevant)
– Top 10 expansion features
– Runs with the top 1500 results
– MAP measured with inex_eval (strict quantisation)
(Some) Results for the INEX 2006 RF Track
– INEX Wikipedia collection
– Feedback for the top-20 results from the assessments (generalised quantisation: graded relevance)
– Top 10 expansion features
– Runs with the top 100 results for the first 50 topics (time…)
– MAP measured with inex_eval (generalised quantisation)
– Significance tests (Wilcoxon signed-rank, t-test)
Conclusions
– Queries with structural constraints can improve result quality
– Relevance feedback can be used to create such queries
– The structure of the collection matters a lot