Searching the WWW today: document retrieval and keyword-based search

Presentation transcript:

Searching the WWW today

Document retrieval by keyword-based search: the user sends a query to a search engine, which returns the matching document(s) (text, images, video, audio, …) as the result.

Searching today:
- A query consists of (some) keywords, sometimes connected by Boolean operators.
- The search engine provides a list of "related" documents (related = the document contains the keyword).
- How are keywords connected to documents? Through the search engine index.
- Documents are analysed by information retrieval techniques (data mining).
- Keywords are extracted from the text (word stemming, relevance weighting, etc.).

Problems:
- Language-inherent polysemy (ambiguity, synonyms, …)
- Missing keyword context
- Document context is not extracted by data mining

The Semantic Web relies on metadata, i.e. data that describes documents in a machine-accessible format:
- Metadata solves disambiguation problems.
- Metadata might solve context problems.
- But where does the metadata come from?

[Diagram labels: keywords + metadata, documents to read, search engine index, document content]
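The keyword-to-document connection described on this slide can be illustrated with a tiny inverted index. This is a minimal sketch, not how any particular search engine works: the three sample documents, the function names, and the simple lowercase tokenizer (no stemming or relevance weighting) are assumptions made for illustration. It also shows the polysemy problem: a query for "mouse" matches both the animal and the input device.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_index(documents):
    """Map every keyword to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search_all(index, query):
    """Boolean AND: ids of the documents that contain every query keyword."""
    keywords = tokenize(query)
    if not keywords:
        return set()
    result = set(index.get(keywords[0], set()))
    for keyword in keywords[1:]:
        result &= index.get(keyword, set())
    return result


# Hypothetical sample documents (not from the slides).
documents = {
    "d1": "The field mouse is a small rodent.",
    "d2": "Move the mouse to click a button on the screen.",
    "d3": "The beaver is a large rodent with strong incisors.",
}
index = build_index(documents)

# "mouse" alone is ambiguous: both the animal and the device match.
print(sorted(search_all(index, "mouse")))
# Adding a second keyword supplies some of the missing context.
print(sorted(search_all(index, "mouse rodent")))
```

The second query only works because the user happened to add a disambiguating keyword; metadata attached to the documents would let the index carry that context itself.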

There is already (often unused) metadata

A document contains inherent semantic annotation:
- Index → internal conceptual knowledge
- TOC → internal structural knowledge
- References → referential knowledge

Together, the index, the TOC, and the references form the basis of semantic document annotation.

Documents, Tags, and Annotations

Sample annotated document:

    <b> Lorem</b> ipsum dolor sit amet, <br/>
    consectetuer adipiscing elit. <br/>
    <a href="……" title=".."/> Sed orci purus, semper eget, <br/>
    tristique quis, adipiscing <br/>
    <!--<rdf:annotation user="…" tag="…"…/> posuere, erat. Aenean <br/>
    ultricies odio id sem. Sed <br/>
    <h1> nec felis sit amet ante </h1> tempor sagittis. Vestibulum <br/>
    est nunc, lobortis cursus, <br/>
    semper vel, pulvinar sed, <br/>
    odio. Vestibulum blandit…

- A document consists of strings and annotations; annotations associate distinguished document parts with metadata.
- Documents might carry inherent semantic annotation (tags):
  - for metadata association
  - for user/creator association
- Metadata can be provided by the author (document-inherent) or by the user (document addendum, see Web 2.0).
- An annotation can address parts down to the smallest addressable document unit.
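One way to picture "annotations associate distinguished document parts with metadata" is a stand-off annotation list kept next to the text. The sketch below is only an illustration under assumptions: the `Annotation` class, its field names, and the sample creators and tags are hypothetical, and words are used as the smallest addressable unit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    start: int    # index of the first word covered
    end: int      # index one past the last word covered
    creator: str  # who attached the metadata (author or user)
    tag: str      # the metadata itself


def annotations_at(annotations, word_index):
    """Return all annotations whose span covers the given word."""
    return [a for a in annotations if a.start <= word_index < a.end]


# The document as a sequence of its smallest addressable units (words).
words = "Lorem ipsum dolor sit amet consectetuer adipiscing elit".split()

# Hypothetical annotations: one from the author, one added later by a user.
annotations = [
    Annotation(0, 1, "author", "emphasis"),
    Annotation(0, 5, "user-1", "greeting"),
]

for a in annotations_at(annotations, 0):
    print(a.creator, a.tag, words[a.start:a.end])
```

Keeping the creator on every annotation is what allows author-supplied and user-supplied metadata to coexist on the same document part.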

Documents, Tags, and Annotations: Examples

Book:
- smallest document unit: word
- higher-order units: sentence, paragraph, page, chapter, part, …

Video:
- smallest document unit: pixel
- higher-order units: blocks, macro blocks, slices, frames, objects, scenes, acts, …

Different kinds of contexts:
- Syntactic structural: physical units
- Semantic structural: logical units

Logical Document Structure

Table of Contents (TOC) from structural tags

[Figure: pages 1 to 15 of a document, grouped under the TOC entries below]

    1. Basic Description Logics ............... 1
       1.1 Introduction ....................... 1
       1.2 Definition of the basic formalism .. 5
       1.3 Reasoning algorithms ............... 7
    2. Complexity of Reasoning ............... 11
       2.1 Introduction ...................... 11
       2.2 OR-Branching: finding a model ..... 12

The logical structure:
- can be specified explicitly (structural information) or implicitly (formatting information)
- can be associated with names/titles
- can be used for document navigation

Also for video:
- Automatic scene detection (start/end point)
- Automatic segmentation of video
- Annotate segments manually or with multimedia mining techniques
- Encode the Table of Contents (TOC) with MPEG-7
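The use of the TOC for navigation can be sketched by treating it as an ordered list of units with a nesting level and a first page. The titles and page numbers below come from the slide; the tuple layout and the function name `section_path` are assumptions made for this illustration.

```python
# (nesting level, title, first page), in reading order, as listed in the slide's TOC.
TOC = [
    (1, "Basic Description Logics", 1),
    (2, "Introduction", 1),
    (2, "Definition of the basic formalism", 5),
    (2, "Reasoning algorithms", 7),
    (1, "Complexity of Reasoning", 11),
    (2, "Introduction", 11),
    (2, "OR-Branching: finding a model", 12),
]


def section_path(toc, page):
    """Return the titles (chapter, then section) of the units containing `page`."""
    path = {}
    for level, title, first_page in toc:
        if first_page > page:
            break  # all remaining units start after the requested page
        path[level] = title
        # A new unit at this level closes any deeper units opened before it.
        for deeper in [lvl for lvl in path if lvl > level]:
            del path[deeper]
    return [path[lvl] for lvl in sorted(path)]


print(section_path(TOC, 6))
print(section_path(TOC, 11))
```

The same lookup would apply to video if scene start points took the place of first pages.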

Conceptual Document Structure

[Figure: conceptual structure drawn as a graph with a root node and the concepts rodent, field mouse, hamster, dentition, beaver, habitat, meadow vole, prairie vole, incisor, and rotation of teeth, connected by SUB and SEA edges; below it, the logical structure with pages 1 to 16 and Paragraph 1.1, Paragraph 1.2, Paragraph 1.3]

Edge types: SUB = subsumption, SEA = see also

- The explicitly given conceptual document structure is used together with the logical document structure to define the document index.
- It can be considered a kind of ontological skeleton.
- It covers the concepts of the document and their relationships.
- Using the implicitly given conceptual structure requires understanding of the document content.
- The explicitly given conceptual structure (only a small fraction of the entire conceptual structure) can be defined by:
  - the document author (e.g., index entries, external metadata)
  - document users (e.g., social tagging)
- The conceptual document structure can also be used for document navigation.

Conceptual Document Structure

Document Index:

    rodent, 1
    beaver, 10, 11
    dentition
      incisor, 4
      rotation of teeth, 5
    hamster, 2-4
      see also meadow vole
    …
    field mouse, 13, 15
    prairie vole, 16
    meadow vole, 16
    habitat, 15
      see also rodent

- The conceptual structure also comprises explicit links to complex ontologies that describe the meaning of the document's content.
- With simple data mining techniques, a fraction of the conceptual structure of multimedia documents (e.g. MPEG video streams, lecture videos) can also be made explicit (see the other talk about automated annotation).
- The document index (MPEG index) can be encoded as an MPEG-7 file and can also be used for navigation.
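The document index on this slide can be read as a mapping from concepts to pages plus "see also" links. The sketch below uses the entries and page numbers from the slide; the dictionary layout and the function name `lookup` are assumptions, and "see also" links are followed one step only.

```python
# Index entries from the slide: pages where a concept occurs, plus "see also" links.
INDEX = {
    "rodent": {"pages": [1], "see_also": []},
    "beaver": {"pages": [10, 11], "see_also": []},
    "incisor": {"pages": [4], "see_also": []},
    "rotation of teeth": {"pages": [5], "see_also": []},
    "hamster": {"pages": [2, 3, 4], "see_also": ["meadow vole"]},
    "field mouse": {"pages": [13, 15], "see_also": []},
    "prairie vole": {"pages": [16], "see_also": []},
    "meadow vole": {"pages": [16], "see_also": []},
    "habitat": {"pages": [15], "see_also": ["rodent"]},
}


def lookup(index, concept, follow_see_also=True):
    """Return the sorted pages for a concept, optionally including 'see also' targets."""
    entry = index.get(concept)
    if entry is None:
        return []
    pages = set(entry["pages"])
    if follow_see_also:
        # Follow each link one step; links of the target are not followed further.
        for other in entry["see_also"]:
            target = index.get(other)
            if target is not None:
                pages.update(target["pages"])
    return sorted(pages)


print(lookup(INDEX, "hamster"))
print(lookup(INDEX, "hamster", follow_see_also=False))
```

For a video, the page numbers would be replaced by segment positions, which is the kind of information the slide proposes to store in an MPEG-7 file.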

Referential Document Structure

- Internal links: references between parts of the same document, e.g., see / see also, footnotes, figures, comments, …
- External links: references between different documents, e.g., bibliographic references and citations, …
- Only a fraction of the entire referential document structure is given explicitly.
- Graph visualization (link graph) together with the logical document structure: table of figures, references, …
- The implicitly given referential structure requires text understanding capabilities.

The Structures in Concert

[Figure: the conceptual structure (root, rodent, field mouse, hamster, dentition, beaver, habitat, meadow vole, prairie vole, incisor, rotation of teeth, connected by SUB edges), the referential structure, and the logical structure (pages 1 to 16; Paragraph 1.1, Paragraph 1.2, Paragraph 1.3) shown together]

Suppose I want to know something about rodents: what do I have to read (in one document or book, or even within a collection of documents)? I want to read as little as possible, but understand everything as well as possible.

Conclusion

- Documents have intrinsic logical, conceptual, and referential characteristics.
- There are complex dependencies among the document structures carrying those characteristics.
- Logical, conceptual, and referential structures, along with their interdependencies, should be made explicit (as metadata).
- Applications should maintain and use this metadata, e.g. for:
  - authoring
  - navigation
  - searching

Thank you for your attention!

The Structures in Concert

All three structures in concert can be used for:
- Document reading tours (extended document retrieval)
  - goal-oriented selection of documents (what is mandatory to understand the topic under consideration?)
  - with additional reading directions (which document units to read in what order)
  - by also considering user annotations, personalized reading tours can be suggested (dependent on the prior knowledge of the user)
- Collaborative authoring (avoiding ambiguities or duplicates, supporting index generation and cross-referencing, …)
- Computing answers (with the help of sophisticated reasoning and additional means of data mining and content understanding)
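A reading tour for a topic could, for example, be assembled by walking the subsumption hierarchy and collecting the indexed pages of every concept reached. This is a rough sketch under assumptions, not the method of the talk: the SUB edges below are hypothetical (the exact edges of the slide's graph are not recoverable from the transcript), the page numbers are taken from the document index, and the `known` parameter is an illustrative stand-in for the user's prior knowledge.

```python
# Hypothetical subsumption edges: broader concept -> narrower concepts.
SUB = {
    "rodent": ["hamster", "beaver", "field mouse"],
    "field mouse": ["meadow vole", "prairie vole"],
}

# Pages per concept, following the document index.
PAGES = {
    "rodent": [1],
    "hamster": [2, 3, 4],
    "beaver": [10, 11],
    "field mouse": [13, 15],
    "meadow vole": [16],
    "prairie vole": [16],
}


def reading_tour(topic, sub, pages, known=frozenset()):
    """Pages to read for a topic and all narrower concepts, in page order.

    Concepts listed in `known` contribute no pages, but their narrower
    concepts are still visited.
    """
    to_visit = [topic]
    seen = set()
    tour = set()
    while to_visit:
        concept = to_visit.pop()
        if concept in seen:
            continue
        seen.add(concept)
        if concept not in known:
            tour.update(pages.get(concept, []))
        to_visit.extend(sub.get(concept, []))
    return sorted(tour)


print(reading_tour("rodent", SUB, PAGES))
print(reading_tour("rodent", SUB, PAGES, known={"hamster"}))
```

Sorting by page number is the simplest reading direction; the logical and referential structures could refine the order, for instance by reading a referenced unit before the unit that cites it.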

Related Work: Topic Maps

- Topic Maps represent concepts and relationships (the conceptual structure and part of the referential structure).
- Topic Maps do not include the logical document structure.

[Figure: a Topic Map in which the topics rodent, beaver, and dentition are linked by a "part of" association (association type, with the roles whole and part) and point to resources on pages 1, 10, and 11]