Document Databases for Information Management
Gregor Erbach
FTW, Wien; DFKI, Saarbrücken; ETL, Tsukuba

IM = DM?
- Is Information Management the same as Document Management?
  - No, because the relevant information may be distributed across several documents, or may be only a small part of a document.
- Then what is information management?
  - Extraction, storage, indexing and retrieval of information units contained in documents.

IM Applications
- Document Retrieval
- Routing
- Question Answering
- Factual Database Construction
- Summarisation

Document Annotation
- Document annotation adds information to documents
- Annotation formats: SGML, XML, LaTeX, ...
- Annotation standards: HTML, NITF, TEI, CES, GDA, Map Task, TreeBank, DublinCore

Formal Properties of XML
- Tree structures
- Nodes with attribute/value pairs
- Node content is a string which can contain XML trees
- Nodes can have identifiers
- No type hierarchy
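The properties listed above can be illustrated with a minimal sketch using Python's standard `xml.etree.ElementTree` module. The sample sentence, the tag names (`s`, `name`) and the attribute names (`id`, `lang`, `type`) are assumptions made up for illustration; they are not taken from any of the annotation standards mentioned in the slides.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotated sentence; tag and attribute names are illustrative only.
SOURCE = '<s id="s1" lang="en">The <name id="n1" type="person">Erbach</name> talk</s>'

# Parsing yields a tree: <s> is the root, <name> is its child.
root = ET.fromstring(SOURCE)


def find_by_id(node, ident):
    """Return the first node in the subtree whose 'id' attribute equals ident, or None."""
    for candidate in node.iter():
        if candidate.get("id") == ident:
            return candidate
    return None


name = find_by_id(root, "n1")

# Nodes carry attribute/value pairs.
root_attributes = dict(root.attrib)

# Node content is a string that can contain further trees:
# the text before the child, the child itself, and the text after it.
mixed_content = (root.text, name.text, name.tail)

# Flattening the tree recovers the plain text.
plain_text = "".join(root.itertext())
```

Note that the tag names are plain strings with no relation between them, which is what "no type hierarchy" amounts to in practice: nothing in the tree says that a `name` is a kind of anything else.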

Language Technologies
Think of language technologies as processes that add annotations to documents, based on an analysis of the documents' linguistic content. This point of view allows a uniform treatment of human-generated and LT-generated annotations.
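As a rough sketch of this view, a language technology can be modelled as a function that takes a document and returns it with an extra annotation. The example below is a toy language identifier based on stopword overlap; the stopword lists, the `doc` tag and the `topic` and `lang` attribute names are all assumptions for illustration, and a real identifier would use a trained model.

```python
import xml.etree.ElementTree as ET

# Tiny stopword lists, purely illustrative.
STOPWORDS = {
    "en": {"the", "is", "of", "and"},
    "de": {"der", "die", "das", "ist", "und"},
}


def annotate_language(doc):
    """Add a 'lang' attribute to doc, chosen by stopword overlap with its text."""
    words = "".join(doc.itertext()).lower().split()
    scores = {
        lang: sum(word in stops for word in words)
        for lang, stops in STOPWORDS.items()
    }
    doc.set("lang", max(scores, key=scores.get))
    return doc


# The 'topic' attribute stands in for a human-supplied annotation.
doc = ET.fromstring('<doc topic="retrieval">Das Dokument ist kurz und klar</doc>')
annotate_language(doc)
```

After the call, the human-supplied `topic` and the machine-supplied `lang` sit side by side as attributes of the same node, so downstream processing does not need to know which one came from where.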

Document-Level LT
- Language Identification
- Categorisation
- Summarisation
All of these can also be applied to parts of documents.

Collection-Level LT
- Clustering
- Topic detection and tracking
- Multi-document summarisation

Fine-Grained LT
- Morphology
- Part-of-speech tagging
- (Shallow) parsing
- Coreference resolution
- Information extraction

LT and Document Annotation
[Diagram: (Annotated) Text Document -> LT -> Annotated Text Document]

Information Retrieval
Retrieval of information units in response to an information need
- How is the information need stated (keywords, questions, examples)?
- How is the information need represented?
- How are information units represented?
- How are the representations matched?

How are documents represented?
- XML trees
- Index of word/phrase occurrences
- Index of relations (represented as feature structures)
- The word, phrase and relation indexes should have pointers to text locations
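A minimal sketch of an occurrence index with pointers to text locations might look as follows. It assumes plain-text documents keyed by a hypothetical document identifier and uses character offsets as the pointers; the slides do not specify the pointer format, so this is one possible choice, and the relation index is left out.

```python
import re
from collections import defaultdict


def build_index(documents):
    """Map each lowercased word to a list of (doc_id, char_offset) pointers."""
    index = defaultdict(list)
    for doc_id, text in documents.items():
        for match in re.finditer(r"\w+", text):
            index[match.group().lower()].append((doc_id, match.start()))
    return index


def find_phrase(index, documents, phrase):
    """Return pointers where the phrase occurs verbatim, ignoring case.

    The index narrows the search to occurrences of the first word; the
    pointer is then used to check the rest of the phrase in the text.
    """
    first_word = phrase.lower().split()[0]
    hits = []
    for doc_id, offset in index.get(first_word, []):
        candidate = documents[doc_id][offset:offset + len(phrase)]
        if candidate.lower() == phrase.lower():
            hits.append((doc_id, offset))
    return hits


# Hypothetical two-document collection.
docs = {
    "d1": "Information retrieval finds information units.",
    "d2": "Document retrieval is one application.",
}
index = build_index(docs)
retrieval_hits = index["retrieval"]
phrase_hits = find_phrase(index, docs, "information units")
```

Because every index entry points back into the text, a hit can be shown in context or used to locate the enclosing XML node, rather than only naming the document.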

How are queries represented?
- Words / phrases
- Relations (expressed as feature structures)

How are representations matched?
- Unification
- Apparent mismatches between query and representation can be resolved by relaxation of the query.
- Inference by forward or backward chaining, as required.
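Matching by unification, with relaxation as a fallback, can be sketched as below. This is a simplified illustration and not the author's formalism: feature structures are modelled as nested Python dicts with atomic leaves, there is no structure sharing (reentrancy), `None` is reserved to signal failure, and relaxation simply drops top-level query features in a caller-supplied order. The feature names and values are hypothetical.

```python
def unify(a, b):
    """Unify two feature structures (nested dicts with atomic leaves).

    Returns the merged structure, or None if the two are incompatible.
    """
    if isinstance(a, dict) and isinstance(b, dict):
        result = dict(a)
        for feature, value in b.items():
            if feature in result:
                merged = unify(result[feature], value)
                if merged is None:
                    return None
                result[feature] = merged
            else:
                result[feature] = value
        return result
    # Atomic values (or a dict against an atom) unify only if they are equal.
    return a if a == b else None


def match_with_relaxation(query, fact, relaxable):
    """Unify query with fact; on failure, drop relaxable features one at a time.

    Returns (result, dropped), where dropped lists the features removed
    from the query before the match succeeded or the options ran out.
    """
    relaxed = dict(query)
    dropped = []
    result = unify(relaxed, fact)
    for feature in relaxable:
        if result is not None:
            break
        if feature in relaxed:
            del relaxed[feature]
            dropped.append(feature)
            result = unify(relaxed, fact)
    return result, dropped


# Hypothetical indexed relation and a query that disagrees on the year.
fact = {"pred": "acquire", "agent": {"name": "Acme"}, "year": 1995}
query = {"pred": "acquire", "agent": {"name": "Acme"}, "year": 1996}

strict = unify(query, fact)
relaxed_result, dropped = match_with_relaxation(query, fact, ["year"])
```

Returning the list of dropped features keeps the relaxation visible, so a system built this way could rank relaxed matches below exact ones.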

Research Issues
- Relevance ranking for feature-structure based queries
- Efficient indexing and matching of feature structures is required (hence fast unification)
- Information content (ontologies) to be represented in the formalism