XML Keyword Search Refinement 郭青松. Outline: Introduction; Query Refinement in Traditional IR; XML Keyword Query Refinement; My work.

Presentation transcript:

XML Keyword Search Refinement 郭青松

Outline
- Introduction
- Query Refinement in Traditional IR
- XML Keyword Query Refinement
- My work

Why do we need query refinement?
- Users express their query intent through keywords, but they often do not know how to formulate a good query:
  - Lack of experience
  - Too many ways to express the same thing
  - Unfamiliarity with the system
  - No knowledge of the underlying data
- Query refinement: refine the query so that it returns good results

What is Query Refinement?
- Query expansion (query reformulation)
- Given an ill-formed query from the user, the system refines the query to help the user retrieve documents more effectively.
- The goal is to improve precision and/or recall.
- Example: “cars” → “car”, “automobile”, “auto”
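
The “cars” example can be sketched as a lookup in a table of related terms. This is only an illustration: the `SYNONYMS` table and the `expand_query` function are hypothetical names, and a real system would draw related terms from a thesaurus or a stemmer rather than a hand-written dictionary.

```python
# Hypothetical table of related terms, copied from the slide's example.
SYNONYMS = {
    "cars": ["car", "automobile", "auto"],
}


def expand_query(terms, synonyms=SYNONYMS):
    """Return the query terms followed by their related terms, without duplicates."""
    expanded = []
    for term in terms:
        for candidate in [term] + synonyms.get(term.lower(), []):
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded


print(expand_query(["cars"]))
```

Terms with no entry in the table pass through unchanged, so the expanded query always contains the original one.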

XML Search
- Tag + keyword search: book: xml
- Path expression + keyword search (CAS queries): /book[./title about “xml db”]
- Structured query: XPath, XQuery
- Keyword search (CO queries): “xml”

XML Keyword Search vs. IR
- IR
  - Flat HTML pages
  - Whole page returned
- XML
  - Data model: tree or graph
  - Structured (semi-structured) data
  - Semantics-based queries (LCA, SLCA, …)
  - Information fragments returned

Need for XML Keyword Query Refinement
- It is hard to know the XML content in advance, especially for large XML documents
- Results are information fragments (LCA/SLCA)
  - The query easily affects the results (precision)
  - Query results can differ hugely
- IR-style refinement methods are not suitable for XML
  - They consider only content
  - Structural information is needed to form a good query


Tasks
- Spelling correction
- Word splitting / word merging
- Phrase segmentation
- Word stemming
- Acronym expansion
- Adding/deleting terms
- Substitution

Classes of Query Refinement
- Relevance feedback
  - Users mark documents as relevant or nonrelevant
  - The terms in the query are reweighted
- Automatic query refinement
  - The system analyzes the relevance of documents to the query and produces a refined query automatically
  - Global analysis
  - Local analysis

Relevance Feedback
- Began in the 1960s
- Improves recall and precision
- Basic process:
  1. The user issues an initial query.
  2. The system returns an initial result set.
  3. The user marks some returned documents as relevant or nonrelevant.
  4. The system re-weights the terms and refines the query results.

Relevance Feedback Models
- Boolean: a document is relevant if the query terms appear in it
- Vector space: q = (t_1, t_2, …, t_n), d = (w_1, w_2, …, w_n)
- Probabilistic: the relevance of a document to a query is estimated as a probability (probability ranking principle)

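Rocchio algorithm for the vector-space model
\[ q_m = \alpha q_0 + \beta \frac{1}{|D_r|} \sum_{d_j \in D_r} d_j - \gamma \frac{1}{|D_{nr}|} \sum_{d_j \in D_{nr}} d_j \]
- q_m: the refined query vector
- q_0: the original query vector
- D_r: the relevant documents; D_nr: the nonrelevant documents
- α, β, γ: weights attached to each term of the formula
- The β term is the average relevant-document vector; the γ term is the average non-relevant document vector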
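
A minimal sketch of the Rocchio update, assuming documents and queries are plain lists of term weights of equal length. It computes α·q_0 plus β times the average relevant-document vector minus γ times the average non-relevant document vector. The function name `rocchio`, the default weights, and the clipping of negative weights to zero are assumptions made for illustration, not details from the slides.

```python
def rocchio(q0, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Return q_m = alpha*q0 + beta*mean(relevant) - gamma*mean(nonrelevant)."""

    def centroid(docs):
        # Average vector of a set of documents; zero vector if the set is empty.
        if not docs:
            return [0.0] * len(q0)
        return [sum(d[i] for d in docs) / len(docs) for i in range(len(q0))]

    rel_centroid = centroid(relevant)
    nonrel_centroid = centroid(nonrelevant)
    qm = [
        alpha * q + beta * r - gamma * n
        for q, r, n in zip(q0, rel_centroid, nonrel_centroid)
    ]
    # Assumption: negative term weights are clipped to zero.
    return [max(0.0, w) for w in qm]


refined = rocchio(
    q0=[1.0, 0.0, 0.0],
    relevant=[[1.0, 1.0, 0.0], [1.0, 0.0, 0.0]],
    nonrelevant=[[0.0, 0.0, 1.0]],
    alpha=1.0, beta=0.5, gamma=0.25,
)
print(refined)
```

In this toy run the second term gains weight because it appears in a relevant document, while the third term, which appears only in the nonrelevant document, stays at zero.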

Global analysis (1)
- Uses all documents to compute the similarity between the query q and the terms in the documents
- Based on a similarity thesaurus

Global analysis (2)
- Select the r terms with the highest sim value, add them to the initial query, and reformulate the new query
- [Formulas on the slide, not preserved in the transcript: similarity of terms; query vector; similarity of query and terms]
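
The selection step can be sketched as follows. Since the slide's formulas did not survive, this sketch substitutes a simple stand-in: each term is represented by its vector of counts across all documents, term-to-term similarity is the cosine of those vectors, and the similarity between the query and a candidate term is the sum of its similarities to the query terms (all query weights set to 1). These choices and the names `term_vectors`, `cosine` and `expand` are assumptions, not the slide's definitions.

```python
import math


def term_vectors(docs):
    """Map each term to its vector of raw counts over all documents."""
    vocab = sorted({t for d in docs for t in d})
    return {t: [d.count(t) for d in docs] for t in vocab}


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def expand(query, docs, r=2):
    """Add the r terms most similar to the query (stand-in similarity measure)."""
    vecs = term_vectors(docs)
    query_terms = [t for t in query if t in vecs]
    scores = {}
    for cand, cand_vec in vecs.items():
        if cand in query_terms:
            continue
        scores[cand] = sum(cosine(vecs[t], cand_vec) for t in query_terms)
    # Highest score first; ties broken alphabetically for determinism.
    best = sorted(scores, key=lambda t: (-scores[t], t))[:r]
    return list(query) + best


docs = [
    ["xml", "keyword", "search"],
    ["xml", "keyword", "query"],
    ["image", "retrieval"],
]
print(expand(["xml"], docs, r=1))
```

Because the similarities are computed over the whole collection rather than over the results of one query, they can be precomputed once and reused for every query.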

Local analysis
- Uses the results of the initial query (especially the top-ranked documents, the "local documents") to refine the query
- Local clustering
  - Cluster the terms of the local documents
  - Refine the query with the relevant cluster
  - Based on the similarity between terms in the query and terms in the documents
- Local context analysis (LCA)
  - Expand the query q with the terms in the local documents that are most similar to it
  - Based on the similarity between q and terms in the documents


XML Refinement Approaches (1)
- Form of the refined query
  - Keyword query → new keyword query
    - Treat as a traditional IR problem
    - IR with XML keyword search semantics
  - Keyword query → structured query
- User participation
  - Manual (user-interactive)
    - Structural feedback
  - Automatic

XML Refinement Approaches (2)
- Manual refinement to a new keyword query
  - IR (considering the structure of the XML)
- Manual transformation to a structured query
  - Relevance feedback
- Automatic refinement to a new keyword query
  - Lu Jiaheng:
- Automatic transformation to a structured query
  - NLP

Automatic Refinement to a New Keyword Query (1)
- Query → refined query
- Rule-based
- Operations
  - Term merging:
  - Term splitting:
  - Term substitution:
  - Term deletion

  Original query          Refined query
  IR, 2003, Mike          Information Retrieval, 2003, Mike
  Mike, publication       Mike, publications
  Database, paper         Database, in-proceedings
  XML, John, 2003         XML, John
  machin, learn           machine, learning
  Hobby, news, paper      Hobby, newspaper
  On, line, data, base    Online, database

Automatic Refinement to a New Keyword Query (2)
- Ranking the set of refined query candidates S(RQ)
  - Refinement cost
  - Cost: the number of “op” steps from “Q” to “RQ”
  - Dynamic programming
- Efficient refinement algorithms
  - Avoid scanning the inverted lists multiple times
  - Stack-based and short-list-eager approaches
- RQ candidates can have the same refinement cost
  - Q = {XML, Jim, 2001} → {XML, 2001}, {Jim, 2001} or {XML, Jim}
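
A rough sketch of how a refinement cost could be computed with dynamic programming, assuming every operation (merge, split, substitute, delete) costs one step and that a term found in the data vocabulary is kept for free. The `vocab` set, the `subs` dictionary, the unit costs and the tie-breaking order are all hypothetical; the slides do not give the actual cost model or algorithm.

```python
def refine(query, vocab, subs):
    """Return (cost, refined_terms) with the fewest refinement steps.

    best[i] holds the cheapest refinement of the first i query terms.
    """
    n = len(query)
    best = [(0, [])] + [None] * n
    for i in range(1, n + 1):
        term = query[i - 1]
        prev_cost, prev_terms = best[i - 1]
        candidates = []
        if term in vocab:  # keep the term unchanged, no cost
            candidates.append((prev_cost, prev_terms + [term]))
        if term in subs:  # term substitution
            candidates.append((prev_cost + 1, prev_terms + [subs[term]]))
        for k in range(1, len(term)):  # term splitting
            left, right = term[:k], term[k:]
            if left in vocab and right in vocab:
                candidates.append((prev_cost + 1, prev_terms + [left, right]))
        for j in range(i - 2, -1, -1):  # merging terms j..i-1
            merged = "".join(query[j:i])
            if merged in vocab:
                cost, terms = best[j]
                candidates.append((cost + (i - j - 1), terms + [merged]))
        candidates.append((prev_cost + 1, prev_terms))  # term deletion
        # min() keeps the first cheapest candidate, so deletion is a last resort.
        best[i] = min(candidates, key=lambda c: c[0])
    return best[n]


VOCAB = {"online", "database", "machine", "learning", "hobby", "newspaper", "xml"}
SUBS = {"machin": "machine", "learn": "learning"}

print(refine(["on", "line", "data", "base"], VOCAB, SUBS))
print(refine(["hobby", "news", "paper"], VOCAB, SUBS))
```

This sketch returns a single cheapest refinement. As the slide notes, several candidates can share the same cost, so a fuller version would keep all of them and rank them by another criterion.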

NLPX
- Natural language query (NLQ) → NEXI
- NEXI (Narrowed Extended XPath I)
  - //A[about(//B,C)]
  - A: a path expression
  - B: a path expression relative to A
  - C: the content requirement
- An ‘about’ clause represents an individual information request.

NLPX: Lexical and Semantic Tagging
- Structural words: content requirements
- Boundary words: path expression
- Instruction words
  - R: return request; S: support request
- Example: Find sections about compression in articles about information retrieval
- Tagged: Find/XIN sections/XST about/XBD compression/NN in/IN articles/XST about/XBD information/NN retrieval/NN
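
The tagged example can be reproduced with a toy lookup tagger. The tag names XIN, XST, XBD, NN and IN come from the slide; the word lists and the `tag` function are hypothetical, and the sketch makes no claim about how NLPX actually performs tagging.

```python
# Hypothetical word lists; only the tag names are taken from the slide.
INSTRUCTION_WORDS = {"find", "retrieve", "return"}
STRUCTURE_WORDS = {"section", "sections", "article", "articles"}
BOUNDARY_WORDS = {"about"}
PREPOSITIONS = {"in", "of", "from"}


def tag(query):
    """Attach a tag to every word of a natural language query."""
    tagged = []
    for word in query.split():
        key = word.lower()
        if key in INSTRUCTION_WORDS:
            label = "XIN"
        elif key in STRUCTURE_WORDS:
            label = "XST"
        elif key in BOUNDARY_WORDS:
            label = "XBD"
        elif key in PREPOSITIONS:
            label = "IN"
        else:
            label = "NN"  # everything else is treated as a content noun
        tagged.append(f"{word}/{label}")
    return " ".join(tagged)


print(tag("Find sections about compression in articles about information retrieval"))
```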

NLPX: Template Matching
- Most queries correspond to a small set of patterns
- Grammar templates are formulated from these patterns

Grammar templates:
  Query: Request+
  Request: CO_Request | CAS_Request
  CO_Request: NounPhrase+
  CAS_Request: SupportRequest | ReturnRequest
  SupportRequest: Structure [Bound] NounPhrase+
  ReturnRequest: Instruction Structure [Bound] NounPhrase+

Information requests:
                Request 1       Request 2
  Structural:   /article/sec    /article
  Content:      compression     information retrieval
  Instruction:  R               S

NLPX: NEXI Query Production
- Merge the information requests into a NEXI query.
- Each request takes the form A[about(.,C)]
  - A: the request's structural attribute
  - C: the request's content attribute
- Result: //article[about(.,information retrieval)]//sec[about(.,compression)]
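
The merge step for this example can be sketched as below, assuming support requests (S) are emitted before return requests (R) and that the return path is nested under the support path. The `Request` type and the `to_nexi` function are hypothetical names, and the real NLPX merge rules may differ.

```python
from typing import NamedTuple


class Request(NamedTuple):
    structure: str    # structural attribute, e.g. "/article/sec"
    content: str      # content attribute, e.g. "compression"
    instruction: str  # "R" (return request) or "S" (support request)


def to_nexi(requests):
    """Merge information requests into one NEXI query string."""
    supports = [r for r in requests if r.instruction == "S"]
    returns = [r for r in requests if r.instruction == "R"]
    parts = []
    prefix = ""
    for req in supports + returns:
        path = req.structure
        # Keep only the part of the path below the previous request's path.
        if prefix and path.startswith(prefix + "/"):
            path = path[len(prefix):]
        steps = "//" + path.strip("/").replace("/", "//")
        parts.append(f"{steps}[about(.,{req.content})]")
        prefix = req.structure
    return "".join(parts)


requests = [
    Request("/article/sec", "compression", "R"),
    Request("/article", "information retrieval", "S"),
]
print(to_nexi(requests))
```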

Query generation process
- Create target components
  - Break up the query into units
- Generate initial targets
  - Combinations of input target components
- Generate queries
  - Modifying a target component
  - Combining two components

Initialization
- Break up the input query into terms
  - Structure terms (XML tags or attributes)
  - Content terms (referring to text)
- Create components
  - Structure term → unbound target
  - Content term → bound target
- Probability enumeration

Target components and target sets
Query: Papers by jennifer widom (keywords: jennifer widom papers)

Target components:
  {//author[~'jennifer widom']}
  {//editor[~'jennifer widom']}
  {//title[~'jennifer widom']}
  {//article}
  {//inproceedings}

Target sets:
  {//article} {//author[~'jennifer widom']}
  {//inproceedings} {//author[~'jennifer widom']}
  {//inproceedings} {//editor[~'jennifer widom']}
  {//article} {//editor[~'jennifer widom']}
  {//inproceedings} {//title[~'jennifer widom']}
  {//article} {//title[~'jennifer widom']}
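
The six target sets are the combinations of one unbound structure target with one bound content target, which can be sketched with a Cartesian product. The tag lists are copied from the slide; how a real system decides that "papers" maps to article and inproceedings, or which tags may contain the phrase, is not shown here. The function name `initial_target_sets` is hypothetical.

```python
from itertools import product


def initial_target_sets(structure_tags, content_tags, phrase):
    """Pair every unbound structure target with every bound content target."""
    unbound = [f"//{tag}" for tag in structure_tags]
    bound = [f"//{tag}[~'{phrase}']" for tag in content_tags]
    return list(product(unbound, bound))


target_sets = initial_target_sets(
    structure_tags=["article", "inproceedings"],   # from the keyword "papers"
    content_tags=["author", "editor", "title"],    # tags that may hold the name
    phrase="jennifer widom",
)
for unbound_target, bound_target in target_sets:
    print("{" + unbound_target + "}", "{" + bound_target + "}")
```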

Transformation Operators (1)
- Aggregation: merge targets with the same tag
  - {//a}, {//a[~'x']} → {//a[~'x']}
  - {//a[~'x']}, {//a[~'y']} → {//a[~'x y']}
- Prefix expansion: add an ancestor condition
  - {//b} → {//a//b}
  - {//b[~'x']} → {//a//b[~'x']}
- Ordering: combine targets
  - {//a}, {//b} → {//a//b} or {//a[//b]}
  - {//a}, {//b[~'x']} → {//a//b[~'x']} or {//a[//b[~'x']]}
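
The three operators can be sketched on a small target representation, assuming a target is a path plus an optional content condition. The `Target` type and the function names are hypothetical. The ordering function only covers the case shown on the slide, where the outer target carries no content condition.

```python
from typing import NamedTuple, Optional


class Target(NamedTuple):
    path: str                      # e.g. "//a"
    content: Optional[str] = None  # keywords of the ~'...' condition, if any


def render(t):
    return t.path if t.content is None else f"{t.path}[~'{t.content}']"


def aggregate(t1, t2):
    """Merge two targets with the same tag, joining their content keywords."""
    if t1.path != t2.path:
        raise ValueError("aggregation needs targets with the same tag")
    words = " ".join(c for c in (t1.content, t2.content) if c)
    return Target(t1.path, words or None)


def prefix_expand(t, ancestor_tag):
    """Add an ancestor condition in front of the target's path."""
    return Target(f"//{ancestor_tag}{t.path}", t.content)


def order(outer, inner):
    """Combine two targets as a descendant step or as a predicate."""
    if outer.content is not None:
        raise ValueError("this sketch only handles an unbound outer target")
    return [f"{outer.path}{render(inner)}", f"{outer.path}[{render(inner)}]"]


print(render(aggregate(Target("//a", "x"), Target("//a", "y"))))
print(render(prefix_expand(Target("//b", "x"), "a")))
print(order(Target("//a"), Target("//b", "x")))
```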

Conclusion
- Two strong assumptions
  - Keyword queries are unambiguous
  - An XML thesaurus is available
- Accuracy
  - Term classification does not consider the specific XML context
- Time cost
  - Term classification
  - Creating targets requires scanning the XML documents
