Document Databases for Information Management
Gregor Erbach
FTW, Wien; DFKI, Saarbrucken; ETL, Tsukuba
erbach@ftw.at

IM = DM?
Is Information Management the same as Document Management?
– No, because the relevant information may be distributed across several documents, or may be only a small part of a document.
Then what is information management?
– Extraction, storage, indexing and retrieval of information units contained in documents.

IM Applications
– Document Retrieval
– Routing
– Question Answering
– Factual Database Construction
– Summarisation

Document Annotation
– Document annotation adds information to documents
– Annotation formats: SGML, XML, LaTeX, ...
– Annotation standards: HTML, NITF, TEI, CES, GDA, Map Task, TreeBank, Dublin Core

Formal Properties of XML
– Tree structures
– Nodes with attribute/value pairs
– Node content is a string which can contain XML trees
– Nodes can have identifiers
– No type hierarchy
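
The properties listed on this slide can be illustrated with a short sketch using Python's standard `xml.etree.ElementTree` module. The sample sentence and the element and attribute names (`s`, `np`, `id`, `type`, `lang`) are assumptions made for illustration; they do not come from any of the annotation standards named in the talk.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotated sentence; tag and attribute names are illustrative only.
SAMPLE = (
    '<s id="s1" lang="en">The <np id="np1" type="org">DFKI</np> is in '
    '<np id="np2" type="loc">Saarbrucken</np>.</s>'
)

root = ET.fromstring(SAMPLE)


def by_id(tree, ident):
    """Return the first node whose 'id' attribute equals ident, or None."""
    for node in tree.iter():
        if node.get("id") == ident:
            return node
    return None


# Nodes carry attribute/value pairs.
print(root.attrib)

# Node content is a string interleaved with child trees (mixed content):
# ElementTree exposes it as .text, the children, and each child's .tail.
print(repr(root.text))
print([(child.tag, child.text, child.tail) for child in root])

# Nodes can be addressed by identifier.
print(by_id(root, "np2").attrib)

# The plain text is recovered by concatenating all string content.
print("".join(root.itertext()))
```

XML itself says nothing about how `np` relates to other element names, which is the "no type hierarchy" point: any such hierarchy has to be maintained outside the markup.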

Language Technologies
Think of language technologies (LT) as processes that add annotations to documents, based on an analysis of the documents' linguistic content.
This point of view allows a uniform treatment of human-generated and LT-generated annotations.

Document-Level LT
– Language Identification
– Categorisation
– Summarisation
All of these can also be applied to parts of documents.

Collection-Level LT
– Clustering
– Topic detection and tracking
– Multi-document summarisation

Fine-Grained LT
– Morphology
– Part-of-speech tagging
– (Shallow) parsing
– Coreference resolution
– Information extraction

LT and Document Annotation
(Annotated) Text Document → LT → Annotated Text Document
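
The view of LT as a process from a (possibly already annotated) document to an annotated document can be sketched as a function over XML trees. The example below uses a toy language identifier; the stopword lists, the `p` element, the `lang` attribute and the function names are all assumptions for illustration, and a real identifier would use a trained model rather than word counts.

```python
import xml.etree.ElementTree as ET

# Tiny illustrative stopword lists (an assumption, not a real language model).
STOPWORDS = {
    "en": {"the", "is", "and", "of", "in"},
    "de": {"der", "die", "das", "ist", "und"},
}


def guess_language(text):
    """Pick the language whose stopwords occur most often; 'unknown' if none do."""
    words = text.lower().split()
    scores = {lang: sum(w in sw for w in words) for lang, sw in STOPWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"


def annotate_language(doc):
    """LT step: add a 'lang' attribute to every <p> that does not have one yet.

    Existing annotations (human- or LT-generated) are left untouched, so both
    kinds are treated uniformly.
    """
    for p in doc.iter("p"):
        if p.get("lang") is None:
            p.set("lang", guess_language("".join(p.itertext())))
    return doc


doc = ET.fromstring(
    "<doc>"
    "<p>The index is stored in the database.</p>"
    "<p>Das Dokument ist in der Datenbank.</p>"
    '<p lang="fr">Bonjour.</p>'
    "</doc>"
)
annotate_language(doc)
langs = [p.get("lang") for p in doc.iter("p")]
print(langs)
```

Because input and output are both XML trees, several such steps (tagging, parsing, extraction) can in principle be chained, each one reading the annotations left by the previous one.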

Information Retrieval
Retrieval of information units in response to an information need
– How is the information need stated (keywords, questions, examples)?
– How is the information need represented?
– How are information units represented?
– How are the representations matched?

How are documents represented?
– XML trees
– Index of word/phrase occurrences
– Index of relations (represented as feature structures)
– Word, phrase, and relation indexes should have pointers to text locations
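
A minimal sketch of a word index with pointers to text locations, assuming plain-text documents keyed by a document identifier and character offsets as the pointers. The function names and the sample documents are hypothetical; the relation index is not covered here.

```python
import re
from collections import defaultdict


def build_index(docs):
    """Map each lowercased word to a list of (doc_id, char_offset) pointers."""
    index = defaultdict(list)
    for doc_id, text in docs.items():
        for match in re.finditer(r"\w+", text):
            index[match.group().lower()].append((doc_id, match.start()))
    return index


def find_phrase(index, docs, phrase):
    """Return pointers to occurrences of a non-empty phrase.

    The postings of the first word supply candidate locations; the text at
    each location is then compared with the whole phrase. This simple check
    does not test for a word boundary after the phrase.
    """
    first_word = phrase.lower().split()[0]
    hits = []
    for doc_id, start in index.get(first_word, []):
        segment = docs[doc_id][start:start + len(phrase)]
        if segment.lower() == phrase.lower():
            hits.append((doc_id, start))
    return hits


docs = {
    "d1": "Document retrieval and question answering",
    "d2": "Question answering over document databases",
}
index = build_index(docs)
print(index["document"])
print(find_phrase(index, docs, "question answering"))
```

Storing offsets rather than just document identifiers is what lets a retrieved unit be shown in its original context, or mapped back to the XML node that contains it.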

How are queries represented?
– Words / phrases
– Relations (expressed as feature structures)

How are representations matched?
– Unification
– Apparent mismatches between query and representation can be resolved by relaxation of the query.
– Inference by forward or backward chaining, as required.
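
Matching by unification, and query relaxation, can be sketched with feature structures encoded as nested Python dictionaries whose leaves are atomic strings. This encoding is an assumption: it has no structure sharing (reentrancy) and no type hierarchy. Relaxation is modelled here simply as dropping one constraint from the query, which is only one possible reading of the slide; the feature names and values are hypothetical, and chaining is not shown.

```python
def unify(a, b):
    """Unify two feature structures (nested dicts with atomic string leaves).

    Returns the merged structure, or None if the two are incompatible.
    """
    if isinstance(a, dict) and isinstance(b, dict):
        result = dict(a)
        for key, b_val in b.items():
            if key in result:
                merged = unify(result[key], b_val)
                if merged is None:
                    return None  # clash somewhere below this feature
                result[key] = merged
            else:
                result[key] = b_val
        return result
    # Atomic values (or an atom against a dict) unify only if they are equal.
    return a if a == b else None


def relax(query, path):
    """Return a copy of query with the feature at the non-empty path removed."""
    head, rest = path[0], path[1:]
    if not rest:
        return {k: v for k, v in query.items() if k != head}
    out = dict(query)
    if isinstance(out.get(head), dict):
        out[head] = relax(out[head], rest)
    return out


# Hypothetical indexed relation and query.
fact = {
    "rel": "located_in",
    "arg1": {"type": "org", "name": "DFKI"},
    "arg2": {"type": "loc", "name": "Saarbrucken"},
}
query = {"rel": "located_in", "arg1": {"type": "company"}}

strict = unify(query, fact)            # fails: "company" clashes with "org"
relaxed_query = relax(query, ("arg1", "type"))
relaxed = unify(relaxed_query, fact)   # succeeds once the constraint is dropped
print(strict)
print(relaxed)
```

A successful unification returns the fact with all its features, so the answer to the query (here the value of `arg2`) can be read directly from the result.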

Research Issues
– Relevance ranking for feature-structure-based queries
– Efficient indexing and matching of feature structures is required (fast unification)
– Information content (ontologies) to be represented in the formalism
