Searching the WWW today: document retrieval and keyword-based search

2 Searching the WWW today
[Figure labels: user, query, search engine, result, document(s) (text, images, video, audio); document retrieval, keyword-based search.]

Searching today:
- A query consists of (some) keywords, sometimes connected by Boolean operators.
- The search engine provides a list of "related" documents (related = the document contains the keyword).

How to connect keywords to documents? → search engine index
- Documents are analysed by information retrieval techniques (data mining).
- Keywords are extracted from the text (word stemming, relevance weights, etc.).

Problems:
- Language-inherent polysemy (ambiguity, synonyms, …)
- Missing keyword context
- Document context is not extracted by data mining

The Semantic Web relies on metadata, i.e. data that describes documents in a machine-accessible format.
- Metadata solves disambiguation problems.
- Metadata might solve context problems.
- But where does the metadata come from?

[Figure labels: keywords + metadata, search engine index, document content, documents to read.]
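
The keyword-to-document mapping described on this slide can be sketched as a small inverted index. This is a minimal illustration, not the method of any particular search engine: the `stem` function is a deliberately crude stand-in for real word stemming, and the three sample documents are invented for the example.

```python
from collections import defaultdict

def stem(word: str) -> str:
    """Crude stand-in for word stemming (assumption): lowercase,
    strip punctuation, and drop a trailing plural 's'."""
    word = word.lower().strip(".,;:!?()[]\"'")
    if len(word) > 3 and word.endswith("s"):
        word = word[:-1]
    return word

def build_index(documents: dict) -> dict:
    """Map every stemmed keyword to the set of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in text.split():
            term = stem(token)
            if term:
                index[term].add(doc_id)
    return index

def search(index: dict, query: str) -> set:
    """Return the documents that contain ALL query keywords (Boolean AND)."""
    terms = [stem(t) for t in query.split()]
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

# Invented sample documents (hypothetical content).
DOCS = {
    "d1": "The field mouse is a small rodent.",
    "d2": "Plug the mouse into the USB port.",
    "d3": "Beavers are large rodents.",
}
INDEX = build_index(DOCS)
```

Searching for `mouse` returns both `d1` and `d2`, which illustrates the polysemy problem: keyword containment alone cannot tell the animal from the input device. Adding a second keyword (`mouse rodent`) narrows the result to `d1`, but only because the user supplied the missing context by hand.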

3 There is already (often unused) metadata
Documents contain inherent semantic annotation:
- Index → internal conceptual knowledge
- TOC → internal structural knowledge
- References → referential knowledge
[Figure labels: document; index (conceptual knowledge), TOC (structural knowledge), references (referential knowledge); basis of semantic document annotation.]

4 Documents, Tags, and Annotations
[Figure: sample document text with embedded tags]
<b> Lorem</b> ipsum dolor sit amet, <br/>
consectetuer adipiscing elit. <br/>
<a href=“……“ title=“..“/> Sed orci purus, semper eget, <br/>
tristique quis, adipiscing <br/>
<!--<rdf:annotation user=“…“ tag=“…“…/> posuere, erat. Aenean <br/>
ultricies odio id sem. Sed <br/>
<h1> nec felis sit amet ante </h1> tempor sagittis. Vestibulum <br/>
est nunc, lobortis cursus, <br/>
semper vel, pulvinar sed, <br/>
odio. Vestibulum blandit…
[Figure label: smallest addressable document unit]

- A document consists of strings; annotations associate distinguished document parts with metadata.
- Documents might carry inherent semantic annotation (tags)
    - for metadata association
    - for user/creator association
- Metadata can be provided by the author (→ document inherent) or by the user (→ document addendum, see Web 2.0).
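
One way to picture an annotation as "a distinguished document part plus metadata" is a small record holding character offsets, a tag, and its creator. The field names and sample values below are hypothetical, chosen only to mirror the distinctions made on this slide (author-provided versus user-provided metadata).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Annotation:
    start: int      # offset of the first annotated character
    end: int        # offset one past the last annotated character
    tag: str        # metadata associated with the document part
    creator: str    # who provided the metadata
    inherent: bool  # True: part of the document (author); False: addendum (user)

def annotations_at(annotations, offset):
    """Return all annotations whose span covers the given character offset."""
    return [a for a in annotations if a.start <= offset < a.end]

TEXT = "Lorem ipsum dolor sit amet"

# Hypothetical annotations: one inherent (author), one added by a reader.
ANNOTATIONS = [
    Annotation(0, 5, "emphasis", "author", True),
    Annotation(6, 17, "placeholder text", "reader-1", False),
]
```

Here character offsets play the role of the smallest addressable unit; a real system might address words, paragraphs, or video frames instead.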

5 Documents, Tags, and Annotations
Examples:
- Book
    - smallest document unit: word
    - higher-order units: sentence, paragraph, page, chapter, part, …
- Video
    - smallest document unit: pixel
    - higher-order units: blocks, macroblocks, slices, frames, objects, scenes, acts, …

Different kinds of contexts:
- Syntactic structural: physical units
- Semantic structural: logical units

6 Logical Document Structure
[Figure: pages 1 to 15 of a document, grouped into the chapters and sections of the table of contents below.]

Table of Contents (TOC) from structural tags:
- can be specified explicitly (structural information) or implicitly (formatting information)
- can be associated with names/titles
- can be used for document navigation

Also for video:
- Automatic scene detection (start/end point)
- Automatic segmentation of the video
- Annotate segments manually or with multimedia mining techniques
- Encode the table of contents (TOC) with MPEG-7

Example TOC (title, start page):
1. Basic Description Logics, 1
    1.1 Introduction, 1
    1.2 Definition of the basic formalism, 5
    1.3 Reasoning algorithms, 7
2. Complexity of Reasoning, 11
    2.1 Introduction, 11
        OR-Branching: finding a model, 12
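
Using the TOC for navigation amounts to answering "which section am I in?" for a given page. The sketch below stores the example TOC from this slide as (depth, start page, title) entries and walks it in document order. The nesting depth of "OR-Branching: finding a model" is an assumption, since its section number is garbled in the source.

```python
# (depth, start_page, title) in document order; start pages never decrease.
TOC = [
    (1, 1, "Basic Description Logics"),
    (2, 1, "Introduction"),
    (2, 5, "Definition of the basic formalism"),
    (2, 7, "Reasoning algorithms"),
    (1, 11, "Complexity of Reasoning"),
    (2, 11, "Introduction"),
    (3, 12, "OR-Branching: finding a model"),  # depth 3 is an assumption
]

def section_path(toc, page):
    """Return the chain of headings (outermost first) that contains `page`."""
    path = []
    for depth, start, title in toc:
        if start > page:
            break                # all later entries start after this page
        del path[depth - 1:]     # close headings at this depth or deeper
        path.append(title)
    return path
```

For example, `section_path(TOC, 6)` yields the chapter "Basic Description Logics" followed by the section "Definition of the basic formalism". The same lookup would work for video if pages were replaced by time codes of detected scenes.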

7 Conceptual Document Structure
[Figure: the conceptual structure, a concept graph below "root" with SUB and SEA edges over the concepts rodent, field mouse, hamster, dentition, beaver, habitat, meadow vole, prairie vole, incisor and rotation of teeth, linked to the logical structure (pages 1 to 16, Paragraph 1.1 to Paragraph 1.3).]

Legend:
- SUB → subsumption
- SEA → see also

- The explicitly given conceptual document structure is used together with the logical document structure to define the document index.
- It can be considered a kind of ontological skeleton.
- It covers the concepts of the document and their relationships.
- Using the implicitly given conceptual structure requires understanding of the document content.
- The explicitly given conceptual structure (only a small fraction of the entire conceptual structure) can be defined by
    - the document author (e.g., index entries, external metadata)
    - document users (e.g., social tagging)
- The conceptual document structure can also be used for document navigation.
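
The SUB and SEA relations can be held in two plain adjacency maps. The edges below are reconstructed from the flattened figure, so the exact tree shape is an assumption; in particular, attaching "dentition" below "rodent" and "habitat" directly below "root" is a guess. The two see-also links are the ones that appear in the document index.

```python
# Subsumption edges (parent -> children). Reconstructed, partly assumed.
SUB = {
    "root": ["rodent", "habitat"],
    "rodent": ["beaver", "dentition", "hamster",
               "field mouse", "prairie vole", "meadow vole"],
    "dentition": ["incisor", "rotation of teeth"],
}

# "See also" edges (concept -> related concepts).
SEA = {
    "hamster": ["meadow vole"],
    "habitat": ["rodent"],
}

def subconcepts(concept):
    """Return all concepts subsumed by `concept`, in depth-first order."""
    found = []
    stack = list(reversed(SUB.get(concept, [])))
    while stack:
        current = stack.pop()
        found.append(current)
        stack.extend(reversed(SUB.get(current, [])))
    return found

def see_also(concept):
    """Return the concepts linked from `concept` by a see-also edge."""
    return list(SEA.get(concept, []))
```

Navigating the conceptual structure then means following these edges: from "rodent" down to "incisor" via subsumption, or sideways from "hamster" to "meadow vole" via see also.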

8 Conceptual Document Structure
Document index:
rodent, 1
beaver, 10, 11
dentition
    incisor, 4
    rotation of teeth, 5
hamster, 2-4
    see also meadow vole
field mouse, 13, 15
prairie vole, 16
meadow vole, 16
habitat, 15
    see also rodent

- The conceptual structure also comprises explicit links to complex ontologies that describe the meaning of the document's content.
- With simple data mining techniques, a fraction of the conceptual structure of multimedia documents (e.g. MPEG video streams, lecture videos) can also be made explicit (→ the other talk about automated annotation).
- The document index (MPEG index) can be encoded as an MPEG-7 file and can also be used for navigation.

9 Referential Document Structure
- Internal links: references between parts of the same document, e.g. see / see also, footnotes, figures, comments, …
- External links: references between different documents, e.g. bibliographic references and citations, …
- Only a fraction of the entire referential document structure is given explicitly.
- Graph visualization (link graph) together with the logical document structure → table of figures, references, …
- Implicitly given referential structure → requires text understanding capabilities
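
A link graph over explicit references can be sketched as a list of typed edges between document units. The document names, unit names, and reference kinds below are hypothetical; the only point is the distinction between internal links (same document) and external links (different documents).

```python
from typing import NamedTuple

class Reference(NamedTuple):
    source_doc: str
    source_unit: str
    target_doc: str
    target_unit: str
    kind: str  # e.g. "see also", "footnote", "citation"

def is_internal(ref: Reference) -> bool:
    """A reference is internal if it stays within the same document."""
    return ref.source_doc == ref.target_doc

def link_graph(refs):
    """Build an adjacency map: (doc, unit) -> list of referenced (doc, unit)."""
    graph = {}
    for ref in refs:
        source = (ref.source_doc, ref.source_unit)
        graph.setdefault(source, []).append((ref.target_doc, ref.target_unit))
    return graph

# Hypothetical explicit references.
REFS = [
    Reference("book-a", "page 2", "book-a", "page 16", "see also"),
    Reference("book-a", "page 2", "book-b", "chapter 3", "citation"),
    Reference("book-a", "page 15", "book-a", "page 1", "see also"),
]
GRAPH = link_graph(REFS)
```

Only explicitly marked references end up in such a graph; implicit ones ("as discussed earlier") would first have to be found by text understanding.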

10 The Structures in Concert
[Figure: the conceptual structure (concept graph below "root" with SUB edges over rodent, field mouse, hamster, dentition, beaver, habitat, meadow vole, prairie vole, incisor and rotation of teeth), the referential structure, and the logical structure (pages 1 to 16, Paragraph 1.1 to Paragraph 1.3) shown together.]

- Suppose you want to know something about rodents: what do you have to read, within one document or book, or even within a collection of documents?
- The goal is to read as little as possible while understanding everything as well as possible.

11 Conclusion
- Documents have intrinsic logical, conceptual and referential characteristics.
- There are complex dependencies among the document structures carrying those characteristics.
- Logical, conceptual, and referential structures, along with their interdependencies, should be made explicit (→ metadata).
- Applications should maintain and use this metadata, e.g. for
    - authoring
    - navigation
    - searching

Thank you for your attention!

12 The Structures in Concert
All three structures in concert can be used for:
- Document reading tours (extended document retrieval)
    - goal-oriented selection of documents (what is mandatory to understand the topic under consideration?)
    - with additional reading directions (which document unit to read in what order)
    - by also considering user annotations, personalized reading tours can be suggested (depending on the prior knowledge of the user)
- Collaborative authoring (avoiding ambiguities or duplicates, supporting index generation and cross-referencing, …)
- Computing answers … (with the help of sophisticated reasoning and additional means of data mining and content understanding)
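
A reading tour can be sketched by combining the conceptual structure (which concepts belong to a topic) with the document index (which pages cover each concept). The page numbers come from the document index slide; the subsumption edges are reconstructed from the flattened figure and partly assumed (notably the position of "dentition"). The `known` parameter is a hypothetical way to model the user's prior knowledge.

```python
# Subsumption edges (parent -> children). Reconstructed, partly assumed.
SUB = {
    "rodent": ["beaver", "dentition", "hamster",
               "field mouse", "prairie vole", "meadow vole"],
    "dentition": ["incisor", "rotation of teeth"],
}

# Document index: concept -> pages (from the document index slide).
INDEX = {
    "rodent": [1],
    "beaver": [10, 11],
    "incisor": [4],
    "rotation of teeth": [5],
    "hamster": [2, 3, 4],
    "field mouse": [13, 15],
    "prairie vole": [16],
    "meadow vole": [16],
    "habitat": [15],
}

def reading_tour(topic, known=frozenset()):
    """Return the sorted pages covering `topic` and all its subconcepts,
    skipping the pages of concepts the reader already knows."""
    concepts = [topic]
    position = 0
    while position < len(concepts):  # breadth-first walk over SUB edges
        concepts.extend(SUB.get(concepts[position], []))
        position += 1
    pages = set()
    for concept in concepts:
        if concept not in known:
            pages.update(INDEX.get(concept, []))
    return sorted(pages)
```

Calling `reading_tour("dentition")` selects only pages 4 and 5, while `reading_tour("rodent", known={"hamster"})` drops the pages that are indexed solely under "hamster". A fuller version would also order the pages by the logical structure and follow see-also and referential links.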

13 Related Work → Topic Maps
- Topic Maps represent concepts and relationships (the conceptual structure and part of the referential structure).
- Topic Maps do not include the logical document structure.
[Figure labels: Topic Map; topics rodent, beaver, dentition; association "part of" with association type and role types whole and part; resources 1, 10, 11.]

