Information Retrieval and Extraction 資訊檢索與擷取 Chia-Hui Chang National Central University

Information Retrieval (IR)
Problem Definition and Generic IR System
- Select and return to the user the desired documents from a large set of documents, in accordance with criteria specified by the user
Functions
- Document search: the selection of documents from an existing collection of documents
- Document routing: the dissemination of incoming documents to appropriate users on the basis of user interest profiles

Document Detection: Search

Search: Text Operation
Document Corpus
- the content of the corpus may have a significant effect on performance in some applications
Preprocessing of Document Corpus
- stemming
- stop-word removal
- phrases, multi-term items
- ...
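The preprocessing steps listed above can be sketched in a few lines of Python. This is a minimal illustration, not the method used in the course: the stop-word list, the suffix list, and the function names `naive_stem` and `preprocess` are all assumptions made for the example, and the suffix stripping is only a stand-in for a real stemmer such as Porter's.

```python
import re

# Hypothetical, deliberately tiny stop-word list; real systems use longer ones.
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "of", "the", "to"}

# Naive suffix stripping standing in for a real stemming algorithm.
SUFFIXES = ("ing", "ed", "es", "s")


def naive_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text: str) -> list[str]:
    """Lowercase, tokenize, drop stop words, then stem each remaining token."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [naive_stem(t) for t in tokens if t not in STOP_WORDS]


tokens = preprocess("The indexing of documents is routed to users")
print(tokens)
```

Stop words are removed before stemming so that the stop-word list can be written in plain surface forms.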

Search: Indexing
Building Index from Stems
- key place for optimizing run-time performance
- cost to build the index for a large corpus
Document Index
- a list of terms, stems, phrases, etc.
- frequency of terms in the document and corpus
- frequency of the co-occurrence of terms within the corpus
- index may be as large as the original document corpus
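One way to picture such a document index is an inverted index that stores per-document and corpus-wide term frequencies. The sketch below assumes documents have already been reduced to lists of stems; `build_index` and the sample data are hypothetical, and co-occurrence frequencies are left out for brevity.

```python
from collections import Counter, defaultdict


def build_index(docs: dict[str, list[str]]):
    """Build an inverted index from already-preprocessed documents.

    Returns (postings, corpus_freq):
      postings[term][doc_id] = frequency of the term in that document
      corpus_freq[term]      = frequency of the term in the whole corpus
    """
    postings: dict[str, dict[str, int]] = defaultdict(dict)
    corpus_freq: Counter = Counter()
    for doc_id, terms in docs.items():
        for term, count in Counter(terms).items():
            postings[term][doc_id] = count
            corpus_freq[term] += count
    return dict(postings), corpus_freq


# Hypothetical two-document corpus of stems.
docs = {
    "d1": ["index", "document", "index"],
    "d2": ["document", "rout", "user"],
}
postings, corpus_freq = build_index(docs)
print(postings["index"], corpus_freq["document"])
```

Because every term occurrence is recorded, the index grows with the corpus, which is consistent with the remark that it may be as large as the corpus itself.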

Search: Query Operation
Detection Need
- the user's criteria for a relevant document
Convert Detection Need to System Specific Query
- first transformed into a detection query, and then a retrieval query
- detection query: specific to the retrieval engine, but independent of the corpus
- retrieval query: specific to the retrieval engine, and to the corpus

Search: Query Model
Compare query with index
Rank the list of relevant documents
- Return the top N documents
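The compare, rank, and return-top-N steps might look like the following sketch. The slides do not say how documents are scored; the TF-IDF weighting used here is an assumption chosen because it is a common baseline, and the function name `rank` and the sample corpus are hypothetical.

```python
import heapq
import math
from collections import Counter


def rank(query_terms, docs, n=10):
    """Return the top-n (doc_id, score) pairs for a query.

    docs maps doc_id -> list of preprocessed terms. The score is a plain
    TF-IDF sum, used for illustration only.
    """
    doc_count = len(docs)
    term_freqs = {doc_id: Counter(terms) for doc_id, terms in docs.items()}
    unique_terms = set(query_terms)
    # Document frequency: number of documents containing each query term.
    doc_freq = {
        t: sum(1 for tf in term_freqs.values() if t in tf) for t in unique_terms
    }
    scores = {}
    for doc_id, tf in term_freqs.items():
        score = 0.0
        for t in unique_terms:
            if tf[t] and doc_freq[t]:
                score += tf[t] * math.log(1 + doc_count / doc_freq[t])
        if score > 0:  # documents sharing no term with the query are dropped
            scores[doc_id] = score
    return heapq.nlargest(n, scores.items(), key=lambda item: item[1])


# Hypothetical three-document corpus of stems.
docs = {
    "d1": ["index", "document", "index"],
    "d2": ["document", "rout", "user"],
    "d3": ["user", "profil"],
}
top = rank(["index", "document"], docs, n=2)
print([doc_id for doc_id, _ in top])
```

A production engine would read term statistics from the prebuilt index rather than recounting the corpus on every query; the recount is kept here only to make the example self-contained.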

Routing

Routing: Detection Needs
Profile of Multiple Detection Needs
- A Profile is a group of individual Detection Needs that describes a user's areas of interest.
- All Profiles will be compared to each incoming document (via the Profile index).
- If a document matches a Profile, the user is notified about the existence of a relevant document.

Routing: Query Index
Convert detection need to system specific query
Building Index from Queries
- The index will be system specific and will make use of all the preprocessing techniques employed by a particular detection system
Routing Profile Index
- similar to building the corpus index for searching
- the quantity of source data (Profiles) is usually much less than a document corpus
- Profiles may have more specific, structured data in the form of SGML tagged fields

Routing: Document Preprocessing
Document to be routed
- A stream of incoming documents is handled one at a time to determine where each should be directed
- A routing implementation may handle multiple document streams and multiple Profiles
Preprocessing of Document
- A document is preprocessed in the same manner that a query would be set up in a search
- The document and query roles are reversed compared with the search process

Routing: Ranking
Compare Document with Index
- Identify which Profiles are relevant to the document
- Given a document, which of the indexed Profiles match it?
Resultant List of Profiles
- The list of Profiles identifies which users should receive the document
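The role reversal in routing, where Profiles are indexed and each incoming document acts as the query, can be sketched as follows. The names `build_profile_index` and `route`, the overlap-count scoring, and the `min_overlap` threshold are assumptions made for this illustration; the slides do not specify a matching function.

```python
from collections import defaultdict


def build_profile_index(profiles):
    """Map each term to the set of profile ids whose detection needs use it.

    profiles maps profile_id -> list of detection needs, each a list of terms.
    """
    index = defaultdict(set)
    for profile_id, needs in profiles.items():
        for need_terms in needs:
            for term in need_terms:
                index[term].add(profile_id)
    return index


def route(doc_terms, profile_index, min_overlap=1):
    """Return (profile_id, shared_term_count) pairs, best match first."""
    hits = defaultdict(int)
    for term in set(doc_terms):
        for profile_id in profile_index.get(term, ()):
            hits[profile_id] += 1
    matched = [(p, c) for p, c in hits.items() if c >= min_overlap]
    # Sort by descending overlap, then by profile id for a stable order.
    return sorted(matched, key=lambda item: (-item[1], item[0]))


# Hypothetical Profiles, each holding one or more detection needs.
profiles = {
    "alice": [["search", "engine"], ["index"]],
    "bob": [["pars", "grammar"]],
}
profile_index = build_profile_index(profiles)
matches = route(["index", "search", "corpus"], profile_index)
print(matches)
```

Each incoming document is processed on its own, so the same `route` call can serve several document streams against one Profile index.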

Summary
- Generate a representation of the meaning or content of each object based on its description.
- Generate a representation of the meaning of the information need.
- Compare these two representations to select those objects that are most likely to match the information need.

Basic Architecture of an Information Retrieval System
[Figure: Documents lead to a Document Representation and Queries lead to a Query Representation; the two representations meet in a Comparison step.]

Research Issues
Issue 1
- What makes a good document representation?
- How can a representation be generated from a description of the document?
- What are retrievable units and how are they organized?
Issue 2
How can we represent the information need and how can we acquire this representation?
- from a description of the information need, or
- through interaction with the user?

Research Issues (Continued)
Issue 3
How can we compare representations to judge the likelihood that a document matches an information need?
Issue 4
How can we evaluate the effectiveness of the retrieval process?
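Issue 4 is left open on the slide. As one possible illustration, set-based precision and recall are widely used effectiveness measures; they are an assumption here, not something the slide names. The function `precision_recall` and the document ids below are hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Compute set-based precision and recall for one query.

    precision = fraction of retrieved documents that are relevant
    recall    = fraction of relevant documents that were retrieved
    """
    retrieved_set, relevant_set = set(retrieved), set(relevant)
    hits = len(retrieved_set & relevant_set)
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Hypothetical run: four documents returned, three judged relevant.
precision, recall = precision_recall(["d1", "d2", "d3", "d4"], ["d1", "d3", "d7"])
print(precision, recall)
```

Both measures need relevance judgments for the query, which in practice have to come from human assessors.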

Information Extraction
Definition
An information extraction system is a cascade of transducers or modules that at each step add structure and often lose information, hopefully irrelevant, by applying rules that are acquired manually and/or automatically.
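The cascade idea in this definition can be sketched as plain function composition. Everything in the block is an assumption for illustration: `cascade` is a hypothetical helper, and the two toy modules only mimic "adding structure" and "losing information" on a made-up sentence pair.

```python
from functools import reduce
from typing import Any, Callable

Module = Callable[[Any], Any]


def cascade(*modules: Module) -> Module:
    """Compose modules left to right into a single pipeline function."""
    def run(data: Any) -> Any:
        return reduce(lambda value, module: module(value), modules, data)
    return run


# Hypothetical toy modules.
def split_sentences(text: str) -> list[str]:
    # Adds structure: a flat string becomes a list of sentences.
    return [s.strip() for s in text.split(".") if s.strip()]


def keep_relevant(sentences: list[str]) -> list[str]:
    # Loses information: sentences without the keyword are discarded.
    return [s for s in sentences if "acquired" in s]


pipeline = cascade(split_sentences, keep_relevant)
result = pipeline("Acme acquired Beta Corp. The weather was fine.")
print(result)
```

Keeping each module a function from one representation to the next makes it easy to swap a hand-written rule set for a learned one without touching the rest of the cascade.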

Information Extraction (Continued)
- What are the transducers or modules?
- What are their input and output?
- What structure is added?
- What information is lost?
- What is the form of the rules?
- How are the rules applied?
- How are the rules acquired?

Example: Parser
- Transducer: parser
- Input: the sequence of words or lexical items
- Output: a parse tree
- Information added: predicate-argument and modification relations
- Information lost: none
- Rule form: unification grammars
- Application method: chart parser
- Acquisition method: manual

Modules
- Text Zoner: turn a text into a set of text segments
- Preprocessor: turn a text or text segment into a sequence of sentences, each of which is a sequence of lexical items, where a lexical item is a word together with its lexical attributes
- Filter: turn a set of sentences into a smaller set of sentences by filtering out the irrelevant ones
- Preparser: take a sequence of lexical items and try to identify various reliably determinable, small-scale structures
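The first three modules might be sketched as below. This is a rough illustration under stated assumptions: zoning on blank lines, sentence splitting on end punctuation, the two lexical attributes (`capitalized`, `numeric`), and keyword-based filtering are all choices made for the example, and the function names are hypothetical. The preparser is omitted.

```python
import re


def zone(text):
    """Text Zoner: split a text into segments at blank lines."""
    return [seg.strip() for seg in re.split(r"\n\s*\n", text) if seg.strip()]


def preprocess_segment(segment):
    """Preprocessor: segment -> sentences -> lexical items.

    A lexical item here is a dict holding the word and two toy attributes.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", segment) if s]
    return [
        [
            {"word": w, "capitalized": w[0].isupper(), "numeric": w.isdigit()}
            for w in re.findall(r"\w+", sentence)
        ]
        for sentence in sentences
    ]


def filter_sentences(sentences, keywords):
    """Filter: keep sentences containing at least one keyword."""
    return [
        s for s in sentences
        if any(item["word"].lower() in keywords for item in s)
    ]


# Hypothetical input: a headline segment followed by a body segment.
text = "HEADLINE\n\nAcme acquired Beta in 1999. Shares rose."
segments = zone(text)
sentences = preprocess_segment(segments[1])
relevant = filter_sentences(sentences, {"acquired"})
print([item["word"] for item in relevant[0]])
```

Each function returns a richer or smaller structure than it received, which matches the pattern of adding structure and discarding irrelevant material.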

Modules (Continued)
- Parser: input a sequence of lexical items and perhaps small-scale structures (phrases) and output a set of parse tree fragments, possibly complete
- Fragment Combiner: turn a set of parse tree or logical form fragments into a parse tree or logical form for the whole sentence
- Semantic Interpreter: generate a semantic structure or logical form from a parse tree or from parse tree fragments

Modules (Continued)
- Lexical Disambiguation: turn a semantic structure with general or ambiguous predicates into a semantic structure with specific, unambiguous predicates
- Coreference Resolution, or Discourse Processing: turn a tree-like structure into a network-like structure by identifying different descriptions of the same entity in different parts of the text
- Template Generator: derive the templates from the semantic structures