© 2001 Franz J. Kurfess Knowledge Retrieval 1 CPE/CSC 580: Knowledge Management Dr. Franz J. Kurfess Computer Science Department Cal Poly.

Course Overview
- Introduction
- Knowledge Processing
- Knowledge Acquisition, Representation and Manipulation
- Knowledge Organization
- Classification, Categorization
- Ontologies, Taxonomies, Thesauri
- Knowledge Retrieval
- Information Retrieval
- Knowledge Navigation
- Knowledge Presentation
- Knowledge Visualization
- Knowledge Exchange
- Knowledge Capture, Transfer, and Distribution
- Usage of Knowledge
- Access Patterns, User Feedback
- Knowledge Management Techniques
- Topic Maps, Agents
- Knowledge Management Tools
- Knowledge Management in Organizations

Overview Knowledge Retrieval
- Motivation
- Objectives
- Finding Out About
  - Keywords and Queries
  - Documents
  - Indexing
- Data Retrieval
  - Access via Address, Field, Name
- Information Retrieval
  - Parsing
  - Matching Against Indices
  - Retrieval Assessment
- Knowledge Retrieval
  - Context
  - Usage
- Knowledge Discovery
  - Data Mining
  - Rule Extraction
- Important Concepts and Terms
- Chapter Summary

Keywords
- linguistic atoms used to characterize the subject or content of a document
  - words
  - pieces of words (stems)
  - phrases
- provide the basis for a match between
  - the user’s characterization of information need
  - the contents of the document
- problems
  - ambiguity
  - choice of keywords
[Belew 2000]

Queries
- formulated in a query language
  - natural language
    - interaction with human information providers
  - artificial language
    - interaction with computers
    - especially search engines
- vocabulary
  - controlled
    - limited set of keywords may be used
  - uncontrolled
    - any keywords may be used
- syntax
  - often Boolean operators (AND, OR)
  - sometimes regular expressions
[Belew 2000]
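The Boolean query syntax mentioned on this slide can be sketched in a few lines. This is only an illustration under assumptions: the toy corpus `docs`, the tuple-based query format, and the function names `matches` and `search` are hypothetical and do not come from the slides.

```python
# Hypothetical toy corpus: each document is reduced to its set of keywords.
docs = {
    "doc1": {"knowledge", "retrieval", "index"},
    "doc2": {"information", "retrieval"},
    "doc3": {"knowledge", "management"},
}

def matches(keywords, query):
    """Evaluate a nested Boolean query against one document's keyword set.

    A query is either a keyword string or a tuple
    ("AND" | "OR", subquery, subquery, ...).
    """
    if isinstance(query, str):
        return query in keywords
    op, *operands = query
    results = (matches(keywords, q) for q in operands)
    if op == "AND":
        return all(results)
    if op == "OR":
        return any(results)
    raise ValueError(f"unknown operator: {op}")

def search(query):
    """Return the sorted names of all documents satisfying the query."""
    return sorted(name for name, kws in docs.items() if matches(kws, query))

print(search(("AND", "knowledge", ("OR", "retrieval", "management"))))
```

With an uncontrolled vocabulary any string is accepted as a keyword; a controlled vocabulary would additionally reject keywords outside a fixed set.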

Documents
- general interpretation
  - any document that can be represented digitally
  - text, image, music, video, program, etc.
- practical interpretation
  - passage of text
  - strings of characters in an alphabet
  - written natural language
  - length may vary
  - longer documents may be composed of shorter ones

Aboutness of Documents
- describes the suitability of a document as answer to a query
- assumptions
  - all documents have equal aboutness
    - the probability of any document in a corpus to be considered relevant is equal for all documents
    - simplistic; not valid in reality
  - a paragraph is the smallest unit of text with appreciable aboutness
[Belew 2000]

Structural Aspects of Documents
- documents may be composed of documents
  - paragraphs, subsections, sections, chapters, parts
  - footnotes, references
- documents may contain meta-data
  - information about the document
  - not part of the content of the document itself
  - may be used for organization and retrieval purposes
  - can be abused by creators
    - usually to increase the perceived relevance

Document Proxies
- surrogates for the real document
- abridged representations
  - catalog, abstract
- pointers
  - bibliographical citation, URL
- different media
  - microfiches
  - digital representations

Indexing
- a vocabulary of keywords is assigned to all documents of a corpus
- an index maps each document $doc_i$ to the set of keywords $\{kw_j\}$ it is about
  $\mathrm{Index}: doc_i \xrightarrow{\text{about}} \{kw_j\}$
  $\mathrm{Index}^{-1}: \{kw_j\} \xrightarrow{\text{describes}} doc_i$
- indexing of a document / corpus
  - manual: humans select appropriate keywords
  - automatic: a computer program selects the keywords
- building the index relation between documents and sets of keywords is critical for information retrieval
[Belew 2000]
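The index relation and its inverse can be sketched with plain dictionaries. The document names, keywords, and the helper `invert` below are hypothetical examples, not material from the slides.

```python
from collections import defaultdict

# Hypothetical manual indexing: document -> set of keywords it is about.
index = {
    "doc1": {"keyword", "query"},
    "doc2": {"query", "index"},
    "doc3": {"index"},
}

def invert(index):
    """Build the inverse relation: keyword -> set of documents it describes."""
    inverse = defaultdict(set)
    for doc, keywords in index.items():
        for kw in keywords:
            inverse[kw].add(doc)
    return dict(inverse)

inverse_index = invert(index)
print(sorted(inverse_index["query"]))
```

Retrieval mostly consults the inverse direction, since a query supplies keywords and asks for documents.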

Data Retrieval
- access to specific data items
  - access via address, field, name
  - typically used in databases
- user asks for items with specific features
  - absence or presence of features
  - values
- system returns data items
  - no irrelevant items
  - deterministic retrieval method

Information Retrieval (IR)
- access to documents
  - also referred to as document retrieval
  - access via keywords
- IR aspects
  - parsing
  - matching against indices
  - retrieval assessment

Parsing
- extraction of lexical features from documents
  - mostly words
- may require some manipulation of the extracted features
  - e.g. stemming of words
- used as the basis for automatic compilation of indices
[Belew 2000]
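A minimal sketch of parsing with stemming follows. It assumes English text and uses a deliberately crude suffix-stripping rule; the suffix list and the function names are illustrative assumptions, and a real system would use an established stemming algorithm instead.

```python
import re

# Deliberately crude suffix list; an illustration, not a real stemmer.
SUFFIXES = ("ing", "ed", "es", "s")

def tokenize(text):
    """Extract lowercase word tokens (runs of letters) from a text."""
    return re.findall(r"[a-z]+", text.lower())

def stem(word):
    """Strip the first matching suffix, keeping at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def parse(text):
    """Return the set of stemmed lexical features found in a text."""
    return {stem(token) for token in tokenize(text)}

print(sorted(parse("Indexing indexed documents")))
```

Here "Indexing" and "indexed" collapse to the same feature, which is what lets a query for one form match documents containing the other.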

Matching Against Indices
- identification of documents that are relevant for a particular query
- keywords of the query are compared against the keywords that appear in the document
  - either in the data or meta-data of the document
- in addition to queries, other features of documents may be used
  - descriptive features provided by the author or cataloger
    - usually meta-data
  - derived features computed from the contents of the document
[Belew 2000]
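One simple way to compare query keywords against document keywords is to count shared keywords through an inverted index. The sketch below assumes that scoring rule; the index contents and the function name `match` are hypothetical.

```python
from collections import Counter

# Hypothetical inverted index: keyword -> documents containing it.
inverted_index = {
    "knowledge": {"doc1", "doc3"},
    "retrieval": {"doc1", "doc2"},
    "index": {"doc1"},
}

def match(query_keywords):
    """Rank documents by how many query keywords they contain."""
    hits = Counter()
    for kw in query_keywords:
        for doc in inverted_index.get(kw, ()):
            hits[doc] += 1
    # Sort by descending hit count, then by name for a stable order.
    return sorted(hits.items(), key=lambda item: (-item[1], item[0]))

print(match(["knowledge", "retrieval", "ontology"]))
```

Keywords missing from the index, such as "ontology" here, simply contribute no hits.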

Vector Space
- interpretation of the index matrix
  - relates documents and keywords
- can grow extremely large
  - binary matrix of 100,000 words × 1,000,000 documents
  - sparsely populated: most entries will be 0
- can be used to determine similarity of documents
  - overlap in keywords
  - proximity in the (virtual) vector space
- associative memories can be used as hardware implementation
  - extremely fast, but expensive to build
[Belew 2000]
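Because the binary matrix is sparse, each document row can be stored as the set of keywords whose entry is 1. The sketch below uses cosine similarity as one possible proximity measure; that choice, the sample data, and the function names are assumptions, since the slide does not prescribe a measure.

```python
import math

# Sparse binary vectors: store only the keywords whose entry is 1.
doc_vectors = {
    "doc1": {"knowledge", "retrieval", "index"},
    "doc2": {"knowledge", "retrieval"},
    "doc3": {"music", "video"},
}

def cosine(a, b):
    """Cosine similarity of two binary vectors given as keyword sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))

def most_similar(name):
    """Return the other document closest to `name` in the vector space."""
    others = (d for d in doc_vectors if d != name)
    return max(others, key=lambda d: cosine(doc_vectors[name], doc_vectors[d]))

print(most_similar("doc1"))
```

For binary vectors the dot product equals the keyword overlap, so both similarity notions on the slide appear in the same formula.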

Retrieval Assessment
- subjective assessment
  - how well do the retrieved documents satisfy the request of the user
- objective assessment
  - idealized omniscient expert determines the quality of the response
[Belew 2000]

Relevance Feedback
- subjective assessment of retrieval results
  - often used to iteratively improve retrieval results
  - may be collected by the retrieval system for statistical evaluation
- can be viewed as a variant of object recognition
  - the object to be recognized is the prototypical document the user is looking for
    - this document may or may not exist
  - the difference between the retrieved document(s) and the idealized prototype indicates the quality of the retrieval results
[Belew 2000]

Relevance Feedback in Vector Space
- relevance feedback is used to move the query towards the cluster of positive documents
  - moving away from bad documents does not necessarily improve the results
- it can also be used as a filter for a constant stream of documents
  - as in news channels or similar situations
[Belew 2000]
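Moving the query towards the positive documents can be sketched with a Rocchio-style update, a standard technique that the slide does not name and that is swapped in here as an assumption. Only the positive term is used, in line with the remark that moving away from bad documents does not necessarily help. The weights `alpha` and `beta` and the sample vectors are illustrative.

```python
def move_query(query, positive_docs, alpha=1.0, beta=0.5):
    """Shift a query vector toward the centroid of positively rated documents.

    Vectors are dicts mapping keyword -> weight. alpha and beta are
    illustrative weights, not values taken from the slides.
    """
    updated = {kw: alpha * w for kw, w in query.items()}
    if not positive_docs:
        return updated
    for doc in positive_docs:
        for kw, w in doc.items():
            updated[kw] = updated.get(kw, 0.0) + beta * w / len(positive_docs)
    return updated

query = {"retrieval": 1.0}
liked = [{"retrieval": 1.0, "index": 1.0}, {"retrieval": 1.0, "ontology": 1.0}]
print(move_query(query, liked))
```

The updated query gains small weights for "index" and "ontology", so the next retrieval round favors documents resembling those the user liked.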

Consensual Relevance
- relevance feedback from multiple users
  - identifies documents that many users found useful or interesting
  - used by some Web sites
  - related to collaborative filtering
- can also be used as an evaluation method for search engines
  - performance criteria must be carefully considered
    - precision and recall, plus many others
[Belew 2000]
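Precision and recall, the two criteria named on this slide, can be computed as follows. The relevance judgments and the retrieved list are made-up sample data, and treating the consensus of several users as the relevant set is an assumption of this sketch.

```python
def precision_recall(retrieved, relevant):
    """Compute precision and recall for one query.

    precision = |retrieved ∩ relevant| / |retrieved|
    recall    = |retrieved ∩ relevant| / |relevant|
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical judgments: documents that several users marked as useful.
relevant = {"doc1", "doc2", "doc3", "doc4"}
retrieved = ["doc1", "doc2", "doc5"]
print(precision_recall(retrieved, relevant))
```

Two of the three retrieved documents are relevant, and two of the four relevant documents were found, which is why the two measures differ.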

[Figure: IR Diagram, with elements labeled Query, Keywords (Term 1 to Term 4), Index, Documents (Doc. 1 to Doc. 5), and Corpus]

[Figure: KR Diagram, with elements labeled Query, Keywords (Term 1 to Term 4), Index, Documents (Doc. 1 to Doc. 5), Corpus, and Ontology (Term A to Term M)]

Context in Knowledge Retrieval
- in addition to keywords, relationships between keywords and documents are exploited
  - explicit links
    - hypertext
  - related concepts
    - thesaurus, ontology
  - proximity
    - spatial: place, directory
    - temporal: creation date/time
  - intermediate relations
    - author/creator
    - organization
    - project

Inference beyond the Index
- determines relationships between documents
- citations are explicit references to relevant documents
  - bibliographic references
  - legal citations
  - hypertext
- example: NEC CiteSeer

Hypertext
- inter-document links provide explicit relationships between documents
  - can be used to determine the relevance of a document for a query
  - example: Google
- intra-document links may offer additional context information for some terms
  - footnotes, glossaries, related terms
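As a rough illustration of how inter-document links can inform relevance, the sketch below ranks pages by their number of incoming links. This is a far simpler stand-in for link analysis methods such as PageRank and is not a description of how any particular search engine works; the link graph and function names are hypothetical.

```python
from collections import Counter

# Hypothetical link graph: page -> pages it links to.
links = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}

def inlink_counts(links):
    """Count incoming links per page, ignoring duplicate and self links."""
    counts = Counter({page: 0 for page in links})
    for source, targets in links.items():
        for target in set(targets):
            if target != source:
                counts[target] += 1
    return counts

def rank_by_inlinks(candidates, links):
    """Order candidate pages by descending in-link count, then by name."""
    counts = inlink_counts(links)
    return sorted(candidates, key=lambda p: (-counts[p], p))

print(rank_by_inlinks(["a", "b", "c", "d"], links))
```

In practice such a link-based score would be combined with keyword matching rather than used on its own.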

Adaptive Retrieval Techniques
- fine-tuning the matching between queries and retrieved documents
- learning of relationships between terms
  - training with term pairs (thesaurus)
  - pattern detection in past queries
- automatic grouping of documents according to common features
  - clustering of documents
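Grouping documents by common features can be sketched with a single-pass clustering over keyword sets. The slide does not specify an algorithm, so the Jaccard measure, the `threshold` value, and the sample documents are all assumptions chosen for brevity.

```python
def jaccard(a, b):
    """Keyword overlap of two sets: |a ∩ b| / |a ∪ b|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cluster(docs, threshold=0.5):
    """Single-pass grouping: join the first cluster whose seed is similar enough.

    `docs` maps document name -> keyword set; `threshold` is an
    illustrative cut-off, not a value from the slides.
    """
    clusters = []  # list of (seed keyword set, member names)
    for name, keywords in docs.items():
        for seed, members in clusters:
            if jaccard(seed, keywords) >= threshold:
                members.append(name)
                break
        else:
            clusters.append((keywords, [name]))
    return [members for _, members in clusters]

docs = {
    "doc1": {"heart", "surgery", "risk"},
    "doc2": {"heart", "surgery"},
    "doc3": {"ontology", "thesaurus"},
    "doc4": {"ontology", "thesaurus", "index"},
}
print(cluster(docs))
```

The result depends on the order in which documents arrive, which is a known weakness of single-pass schemes.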

Query Model
- query types (templates)
  - frequently used types of queries
  - e.g. problem/solution, symptoms/diagnosis, problem/further checks, ...
- category types
  - abstractions of query types
  - used to determine categories or topics for the grouping of search results
- context information
  - current working document/directory
  - previous queries
[Pratt, Hearst, Fagan 2000]

Terminology Model
- individual terms are connected to related terms
  - thesaurus/ontology
  - synonyms, super-/sub-classes, related terms
- identifies labels for the category types
[Pratt, Hearst, Fagan 2000]
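The connection between a term and its related terms can be sketched as a small thesaurus lookup. The entries, the relation names, and the function `expand` are hypothetical and only illustrate the idea; they are not taken from any real medical terminology.

```python
# Hypothetical miniature thesaurus: term -> related terms by relation type.
thesaurus = {
    "cancer": {"synonyms": {"neoplasm"}, "subclasses": {"carcinoma", "sarcoma"}},
    "carcinoma": {"synonyms": set(), "subclasses": set()},
}

def expand(term, relations=("synonyms", "subclasses")):
    """Return the term together with its directly related terms."""
    entry = thesaurus.get(term, {})
    expanded = {term}
    for relation in relations:
        expanded |= entry.get(relation, set())
    return expanded

print(sorted(expand("cancer")))
```

A term that is absent from the thesaurus is returned unchanged, so the lookup never loses the original keyword.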

Matching
- categorizer
  - determines the categories to be selected for the grouping of results
  - assigns retrieved documents to the categories
- organizer
  - arranges categories into a hierarchy
  - should be balanced and easy to browse by the user
  - depends on the distribution of the search results
[Pratt, Hearst, Fagan 2000]
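The division of labor between categorizer and organizer can be illustrated with a toy example. This is not the algorithm of Pratt, Hearst, and Fagan: the category labels, their trigger terms, and both functions are assumptions, and the organizer here only orders a flat list of categories instead of building a hierarchy.

```python
from collections import defaultdict

# Hypothetical category labels and the keywords that signal them.
category_terms = {
    "treatment": {"surgery", "chemotherapy"},
    "diagnosis": {"biopsy", "imaging"},
}

def categorize(results):
    """Assign each retrieved document to every category whose terms it mentions."""
    groups = defaultdict(list)
    for name, keywords in results.items():
        labels = [c for c, terms in category_terms.items() if terms & keywords]
        for label in labels or ["other"]:
            groups[label].append(name)
    return dict(groups)

def organize(groups):
    """Order categories from largest to smallest, then by name."""
    return sorted(groups.items(), key=lambda item: (-len(item[1]), item[0]))

results = {
    "doc1": {"surgery", "risk"},
    "doc2": {"biopsy", "surgery"},
    "doc3": {"prevention"},
}
print(organize(categorize(results)))
```

A document may land in several categories, as "doc2" does here, and documents matching no category fall into a catch-all group.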

Results
- retrieved documents are grouped into hierarchically arranged categories meaningful for the user
  - the categories are related to the query
  - the categories are related to each other
  - all categories have similar size
    - not always achievable due to the distribution of documents
- reduced search times
- higher user satisfaction
[Pratt, Hearst, Fagan 2000]

DynaCat
- knowledge-based approach to the organization of search results
  - categorizes results into meaningful groups that correspond to the user’s query
  - uses knowledge of query types and of the domain terminology to generate hierarchical categories
- applied to the domain of medicine
  - MEDLINE is an on-line repository of medical abstracts
    - 9.2 million bibliographic entries from 3800 journals
  - PubMed is a web-based search tool
    - returns titles as a relevance-ranked list
    - links to “related articles”

Information vs. Knowledge Retrieval
- IR
  - keywords as main components of the query
  - index as match-making facility
  - statistical basis for selection of relevant documents
  - (ordered) list of results
- KR
  - keywords plus context information for the query
  - index plus ontology for matching query and documents
  - relationships between keywords and documents influence the selection of relevant documents
  - results are grouped into meaningful categories

Important Concepts and Terms
- agent
- automated reasoning
- belief network
- cognitive science
- computer science
- hidden Markov model
- intelligence
- knowledge representation
- linguistics
- Lisp
- logic
- machine learning
- microworlds
- natural language processing
- neural network
- predicate logic
- propositional logic
- rational agent
- rationality
- Turing test
