© 2001 Franz J. Kurfess Knowledge Retrieval 1 CPE/CSC 580: Knowledge Management Dr. Franz J. Kurfess Computer Science Department Cal Poly.

1 © 2001 Franz J. Kurfess Knowledge Retrieval 1 CPE/CSC 580: Knowledge Management Dr. Franz J. Kurfess Computer Science Department Cal Poly

2 Course Overview
- Introduction
- Knowledge Processing
  - Knowledge Acquisition, Representation and Manipulation
- Knowledge Organization
  - Classification, Categorization
  - Ontologies, Taxonomies, Thesauri
- Knowledge Retrieval
  - Information Retrieval
  - Knowledge Navigation
- Knowledge Presentation
  - Knowledge Visualization
- Knowledge Exchange
  - Knowledge Capture, Transfer, and Distribution
- Usage of Knowledge
  - Access Patterns, User Feedback
- Knowledge Management Techniques
  - Topic Maps, Agents
- Knowledge Management Tools
- Knowledge Management in Organizations

3 Overview Knowledge Retrieval
- Motivation
- Objectives
- Finding Out About
  - Keywords and Queries
  - Documents
  - Indexing
- Data Retrieval
  - Access via Address, Field, Name
- Information Retrieval
  - Parsing
  - Matching Against Indices
  - Retrieval Assessment
- Knowledge Retrieval
  - Context
  - Usage
- Knowledge Discovery
  - Data Mining
  - Rule Extraction
- Important Concepts and Terms
- Chapter Summary

4 Logistics
- Term Project
- APIs
- Lab and Homework Assignments
  - Deadline HW 1: May 1
- Exams
  - Midterm: Thursday, May 3

5 Finding Out About [Belew 2000]

6 Pre-Test

7 Motivation

8 Objectives

9 Evaluation Criteria

10 Finding Out About
- Keywords
- Queries
- Documents
- Indexing
[Belew 2000]

11 Keywords
- linguistic atoms used to characterize the subject or content of a document
  - words
  - pieces of words (stems)
  - phrases
- provide the basis for a match between
  - the user’s characterization of information need
  - the contents of the document
- problems
  - ambiguity
  - choice of keywords
[Belew 2000]

12 Queries
- formulated in a query language
  - natural language
    - interaction with human information providers
  - artificial language
    - interaction with computers
    - especially search engines
- vocabulary
  - controlled
    - limited set of keywords may be used
  - uncontrolled
    - any keywords may be used
- syntax
  - often Boolean operators (AND, OR)
  - sometimes regular expressions
[Belew 2000]
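
A Boolean query over keywords can be sketched as follows. This is only an illustration under stated assumptions: the toy corpus `DOCS`, the nested-tuple query syntax, and the function names `matches` and `search` are hypothetical and do not come from the slides or from [Belew 2000].

```python
# Hypothetical toy corpus: document id -> set of keywords.
DOCS = {
    "d1": {"knowledge", "retrieval", "index"},
    "d2": {"information", "retrieval"},
    "d3": {"knowledge", "management"},
}


def matches(query, keywords):
    """Evaluate a query against one document's keyword set.

    A query is either a keyword string or a tuple of the form
    ("AND" | "OR", subquery, subquery, ...).
    """
    if isinstance(query, str):
        return query in keywords
    op, *args = query
    results = (matches(arg, keywords) for arg in args)
    if op == "AND":
        return all(results)
    if op == "OR":
        return any(results)
    raise ValueError(f"unknown operator: {op}")


def search(query, docs=DOCS):
    """Return the sorted ids of all documents that satisfy the query."""
    return sorted(doc_id for doc_id, kws in docs.items() if matches(query, kws))
```

For example, `search(("AND", "retrieval", ("OR", "information", "index")))` selects the documents about retrieval that also mention information or index.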

13 Documents
- general interpretation
  - any document that can be represented digitally
  - text, image, music, video, program, etc.
- practical interpretation
  - passage of text
  - strings of characters in an alphabet
  - written natural language
  - length may vary
  - longer documents may be composed of shorter ones

14 Aboutness of Documents
- describes the suitability of a document as an answer to a query
- assumptions
  - all documents have equal aboutness
    - the probability of being considered relevant is the same for every document in a corpus
    - simplistic; not valid in reality
  - a paragraph is the smallest unit of text with appreciable aboutness
[Belew 2000]

15 Structural Aspects of Documents
- documents may be composed of documents
  - paragraphs, subsections, sections, chapters, parts
  - footnotes, references
- documents may contain meta-data
  - information about the document
  - not part of the content of the document itself
  - may be used for organization and retrieval purposes
  - can be abused by creators
    - usually to increase the perceived relevance

16 Document Proxies
- surrogates for the real document
  - abridged representations
    - catalog, abstract
  - pointers
    - bibliographical citation, URL
  - different media
    - microfiches
    - digital representations

17 Indexing
- a vocabulary of keywords is assigned to all documents of a corpus
- an index maps each document $doc_i$ to the set of keywords $\{kw_j\}$ it is about
  - $\mathrm{Index}: doc_i \xrightarrow{\text{about}} \{kw_j\}$
  - $\mathrm{Index}^{-1}: \{kw_j\} \xrightarrow{\text{describes}} doc_i$
- indexing of a document / corpus
  - manual: humans select appropriate keywords
  - automatic: a computer program selects the keywords
- building the index relation between documents and sets of keywords is critical for information retrieval
[Belew 2000]
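
The two directions of the index relation can be illustrated with a small sketch. The dictionary layout, the sample entries, and the function name `invert` are assumptions made for illustration; the slides do not prescribe a data structure.

```python
from collections import defaultdict


def invert(index):
    """Build Index^-1 (keyword -> documents) from Index (document -> keywords)."""
    inverted = defaultdict(set)
    for doc, keywords in index.items():
        for kw in keywords:
            inverted[kw].add(doc)
    return dict(inverted)


# Hypothetical index: each document maps to the keywords it is about.
index = {
    "doc1": {"keyword", "query"},
    "doc2": {"query", "index"},
}

# Inverse index: each keyword maps to the documents it describes.
inverted_index = invert(index)
```

Search engines typically consult the inverse direction, since a query supplies keywords and asks for documents.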

18 FOA Conversation Loop [Belew 2000]

19 Data Retrieval
- access to specific data items
  - access via address, field, name
- typically used in databases
- user asks for items with specific features
  - absence or presence of features
  - values
- system returns data items
  - no irrelevant items
- deterministic retrieval method
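
Deterministic access by field value might look like the sketch below. The record layout, the sample records, and the helper `select` are hypothetical; a real database would use its own query language.

```python
# Hypothetical toy records; field names are assumptions for illustration.
RECORDS = [
    {"id": 1, "name": "Belew", "year": 2000},
    {"id": 2, "name": "Kochen", "year": 1975},
    {"id": 3, "name": "Pratt", "year": 2000},
]


def select(records, **criteria):
    """Return the records whose fields equal every given value exactly."""
    return [
        record
        for record in records
        if all(record.get(field) == value for field, value in criteria.items())
    ]
```

Because the match is exact, the same request always returns the same items, and none of them is irrelevant to the stated criteria.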

20 Information Retrieval (IR)
- access to documents
  - also referred to as document retrieval
  - access via keywords
- IR aspects
  - parsing
  - matching against indices
  - retrieval assessment

21 Diagram Search Engine [Belew 2000]

22 Parsing
- extraction of lexical features from documents
  - mostly words
- may require some manipulation of the extracted features
  - e.g. stemming of words
- used as the basis for automatic compilation of indices
[Belew 2000]
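
A minimal sketch of parsing with stemming follows. The suffix list is a deliberately crude assumption and not a real stemming algorithm (such as Porter's); the names `stem` and `parse` are hypothetical.

```python
import re

# Assumption: a deliberately crude suffix list, checked in this order.
SUFFIXES = ("ing", "ed", "es", "s")


def stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def parse(text):
    """Extract lowercase word tokens from text and reduce each to a crude stem."""
    return [stem(token) for token in re.findall(r"[a-z]+", text.lower())]
```

With this sketch, "Indexing", "indexed", and "indexes" all reduce to the same lexical feature, which is what lets an automatically compiled index treat them as one keyword.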

23 Matching Against Indices
- identification of documents that are relevant for a particular query
- keywords of the query are compared against the keywords that appear in the document
  - either in the data or meta-data of the document
- in addition to queries, other features of documents may be used
  - descriptive features provided by the author or cataloger
    - usually meta-data
  - derived features computed from the contents of the document
[Belew 2000]
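
One simple way to compare query keywords against an index is to count, per document, how many query keywords it is indexed under. The sketch below assumes an inverted index stored as a dictionary; the sample data, the overlap-count ranking, and the name `match` are illustrative choices, not the method described in [Belew 2000].

```python
def match(query_keywords, inverted_index):
    """Rank documents by how many query keywords they are indexed under."""
    hits = {}
    for kw in query_keywords:
        for doc in inverted_index.get(kw, ()):
            hits[doc] = hits.get(doc, 0) + 1
    # Highest overlap first; ties are broken by document id for a stable order.
    return sorted(hits, key=lambda doc: (-hits[doc], doc))


# Hypothetical inverted index: keyword -> documents it describes.
INVERTED = {
    "retrieval": {"doc1", "doc2"},
    "knowledge": {"doc1", "doc3"},
    "ontology": {"doc3"},
}
```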

24 Retrieved and Relevant Documents
- $\text{recall} = \dfrac{|\text{retrieved} \cap \text{relevant}|}{|\text{relevant}|}$
- $\text{precision} = \dfrac{|\text{retrieved} \cap \text{relevant}|}{|\text{retrieved}|}$
[Belew 2000]
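
The two measures translate directly into set operations. In this sketch the document ids are made up, and returning 0.0 for an empty denominator is an assumed convention that the slide does not address.

```python
def recall(retrieved, relevant):
    """Fraction of the relevant documents that were retrieved."""
    if not relevant:
        return 0.0  # assumed convention for an empty relevant set
    return len(retrieved & relevant) / len(relevant)


def precision(retrieved, relevant):
    """Fraction of the retrieved documents that are relevant."""
    if not retrieved:
        return 0.0  # assumed convention for an empty result set
    return len(retrieved & relevant) / len(retrieved)


# Hypothetical example: four documents retrieved, three actually relevant.
retrieved = {"d1", "d2", "d3", "d4"}
relevant = {"d2", "d4", "d5"}
```

Here two of the three relevant documents were found (recall 2/3), and two of the four retrieved documents are relevant (precision 1/2).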

25 Specificity vs. Exhaustivity [Belew 2000]

26 Vector Space
- interpretation of the index matrix
  - relates documents and keywords
- can grow extremely large
  - binary matrix of 100,000 words * 1,000,000 documents
  - sparsely populated: most entries will be 0
- can be used to determine similarity of documents
  - overlap in keywords
  - proximity in the (virtual) vector space
- associative memories can be used as hardware implementation
  - extremely fast, but expensive to build
[Belew 2000]
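
Because the binary matrix is sparse, each document row can be stored as the set of keywords whose entry is 1. The sketch below uses cosine similarity as one common proximity measure; the slide does not name a specific measure, so that choice and the function name `cosine` are assumptions.

```python
import math


def cosine(a, b):
    """Cosine similarity of two binary keyword vectors stored as sets.

    For binary vectors the dot product is the number of shared keywords,
    and each vector's length is the square root of its keyword count.
    """
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))
```

Storing only the nonzero entries avoids materializing the full matrix, which at 100,000 words by 1,000,000 documents would have 10^11 cells.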

27 Vector Space Diagram [Belew 2000]

28 Document Retrieval [Belew 2000]

29 Retrieval Assessment
- subjective assessment
  - how well do the retrieved documents satisfy the request of the user
- objective assessment
  - idealized omniscient expert determines the quality of the response
[Belew 2000]

30 Retrieval Assessment Diagram [Belew 2000]

31 Relevance Feedback
- subjective assessment of retrieval results
- often used to iteratively improve retrieval results
- may be collected by the retrieval system for statistical evaluation
- can be viewed as a variant of object recognition
  - the object to be recognized is the prototypical document the user is looking for
    - this document may or may not exist
  - the difference between the retrieved document(s) and the idealized prototype indicates the quality of the retrieval results
[Belew 2000]

32 Relevance Feedback in Vector Space
- relevance feedback is used to move the query towards the cluster of positive documents
  - moving away from bad documents does not necessarily improve the results
- it can also be used as a filter for a constant stream of documents
  - as in news channels or similar situations
[Belew 2000]
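
Moving the query towards the positive documents can be sketched as a weighted step towards their centroid. This resembles the positive part of a Rocchio-style update, which is a technique brought in here for illustration and not named on the slide; the weights `alpha` and `beta` and the function name `move_toward` are assumptions.

```python
def move_toward(query, positive_docs, alpha=1.0, beta=0.5):
    """Shift the query vector toward the centroid of the positive documents.

    query and every document are equal-length lists of term weights.
    Negative feedback is ignored on purpose in this sketch.
    """
    if not positive_docs:
        return list(query)
    n = len(positive_docs)
    centroid = [sum(doc[i] for doc in positive_docs) / n for i in range(len(query))]
    return [alpha * q + beta * c for q, c in zip(query, centroid)]
```

Repeating the step after each round of feedback gives the iterative improvement mentioned for relevance feedback; applied to a stream of incoming documents, the adjusted query can serve as a filter.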

33 Query Session Example [Belew 2000]

34 Consensual Relevance
- relevance feedback from multiple users
  - identifies documents that many users found useful or interesting
- used by some Web sites
  - related to collaborative filtering
- can also be used as an evaluation method for search engines
  - performance criteria must be carefully considered
  - precision and recall, plus many others
[Belew 2000]
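
Pooling feedback from several users can be as simple as counting how many of them marked each document as relevant. The feedback format and the function name `consensual_ranking` below are hypothetical; real systems would weight and normalize votes in more elaborate ways.

```python
from collections import Counter


def consensual_ranking(feedback):
    """Rank documents by the number of users who marked them as relevant.

    feedback maps a user id to the documents that user judged relevant.
    Each user counts at most once per document.
    """
    votes = Counter(doc for docs in feedback.values() for doc in set(docs))
    # Most votes first; ties are broken by document id for a stable order.
    return sorted(votes, key=lambda doc: (-votes[doc], doc))


# Hypothetical feedback from three users.
feedback = {
    "u1": ["d1", "d2"],
    "u2": ["d2"],
    "u3": ["d2", "d3", "d1"],
}
```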

35-39 IR Diagram [figure, repeated over five slides: Query, Keywords (Term 1 to Term 4), Index, Documents, and Corpus (Doc. 1 to Doc. 5)]

40 KR Diagram [figure: Query, Keywords (Term 1 to Term 4), Index, Documents, Corpus (Doc. 1 to Doc. 5), and Ontology (Term A to Term M)]

41 Knowledge Retrieval
- Context
- Usage

42 Context in Knowledge Retrieval
- in addition to keywords, relationships between keywords and documents are exploited
  - explicit links
    - hypertext
  - related concepts
    - thesaurus, ontology
  - proximity
    - spatial: place, directory
    - temporal: creation date/time
  - intermediate relations
    - author/creator
    - organization
    - project

43 Inference beyond the Index
- determines relationships between documents
- citations are explicit references to relevant documents
  - bibliographic references
  - legal citations
  - hypertext
- example: NEC CiteSeer

44 Additional Information Sources [Belew 2000, after Kochen 1975]

45 Hypertext
- inter-document links provide explicit relationships between documents
  - can be used to determine the relevance of a document for a query
  - example: Google
- intra-document links may offer additional context information for some terms
  - footnotes, glossaries, related terms
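
As a rough illustration of using inter-document links as a relevance signal, the sketch below counts incoming links per document. This is a much simpler stand-in and not Google's actual ranking algorithm; the link graph format and the name `inlink_counts` are assumptions.

```python
def inlink_counts(links):
    """Count, for every document, how many other documents link to it.

    links maps a source document to the documents it links to.
    Self-links and repeated links from the same source are ignored.
    """
    counts = {doc: 0 for doc in links}
    for source, targets in links.items():
        for target in set(targets):
            if target != source:
                counts[target] = counts.get(target, 0) + 1
    return counts


# Hypothetical link graph.
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
```

A document that many others point to, such as "c" here, would be treated as more likely to be relevant when it also matches the query keywords.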

46 Adaptive Retrieval Techniques
- fine-tuning the matching between queries and retrieved documents
- learning of relationships between terms
  - training with term pairs (thesaurus)
  - pattern detection in past queries
- automatic grouping of documents according to common features
  - clustering of documents
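
Grouping documents by common features can be sketched with a greedy, single-pass clustering over keyword sets. The Jaccard measure, the threshold of 0.5, and the names `jaccard` and `cluster` are assumptions chosen for brevity; the slide does not specify a clustering method.

```python
def jaccard(a, b):
    """Share of keywords two documents have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(docs, threshold=0.5):
    """Greedy single-pass clustering of documents by keyword overlap.

    Each document joins the first cluster whose seed document is similar
    enough; otherwise it starts a new cluster with itself as the seed.
    """
    clusters = []  # list of (seed_keywords, [doc_ids])
    for doc_id, keywords in docs.items():
        for seed, members in clusters:
            if jaccard(seed, keywords) >= threshold:
                members.append(doc_id)
                break
        else:
            clusters.append((keywords, [doc_id]))
    return [members for _, members in clusters]


# Hypothetical documents with their keyword sets.
docs = {
    "d1": {"a", "b", "c"},
    "d2": {"a", "b"},
    "d3": {"x", "y"},
    "d4": {"x", "y", "z"},
}
```

The result depends on the order in which documents arrive, which is a known weakness of single-pass schemes.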

47 Document Classification

48 Query Model
- query types (templates)
  - frequently used types of queries
  - e.g. problem/solution, symptoms/diagnosis, problem/further checks, ...
- category types
  - abstractions of query types
  - used to determine categories or topics for the grouping of search results
- context information
  - current working document/directory
  - previous queries
[Pratt, Hearst, Fagan 2000]

49 Terminology Model
- individual terms are connected to related terms
  - thesaurus/ontology
  - synonyms, super-/sub-classes, related terms
- identifies labels for the category types
[Pratt, Hearst, Fagan 2000]
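
One use of connections between related terms is to widen a query with them. The sketch below shows that idea only; the thesaurus entries and the function name `expand` are hypothetical, and it does not reproduce how [Pratt, Hearst, Fagan 2000] use their terminology model.

```python
# Hypothetical thesaurus: term -> directly related terms.
THESAURUS = {
    "car": {"automobile", "vehicle"},
    "automobile": {"car"},
}


def expand(query_terms, thesaurus=THESAURUS):
    """Add directly related terms to the query (one step, not transitive)."""
    expanded = set(query_terms)
    for term in query_terms:
        expanded |= thesaurus.get(term, set())
    return expanded
```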

50 Matching
- categorizer
  - determines the categories to be selected for the grouping of results
  - assigns retrieved documents to the categories
- organizer
  - arranges categories into a hierarchy
  - should be balanced and easy to browse by the user
  - depends on the distribution of the search results
[Pratt, Hearst, Fagan 2000]
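
The categorizer step can be illustrated by assigning each retrieved document to every category its keywords map to. The term-to-category table, the sample labels, and the function name `categorize` are assumptions; this is a toy sketch and not the algorithm of [Pratt, Hearst, Fagan 2000].

```python
from collections import defaultdict

# Hypothetical terminology model: term -> category label.
TERM_CATEGORY = {
    "aspirin": "Treatment",
    "surgery": "Treatment",
    "fever": "Symptoms",
    "x-ray": "Diagnosis",
}


def categorize(results, term_category=TERM_CATEGORY):
    """Group retrieved documents under every category their keywords map to.

    results maps a document id to its keyword set. A document may appear
    in several categories; documents without known terms are left out.
    """
    groups = defaultdict(list)
    for doc_id, keywords in results.items():
        labels = {term_category[k] for k in keywords if k in term_category}
        for label in sorted(labels):
            groups[label].append(doc_id)
    return dict(groups)
```

An organizer would then arrange the resulting groups into a browsable hierarchy, which this sketch leaves out.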

51 Results
- retrieved documents are grouped into hierarchically arranged categories meaningful for the user
  - the categories are related to the query
  - the categories are related to each other
  - all categories have similar size
    - not always achievable due to the distribution of documents
- reduced search times
- higher user satisfaction
[Pratt, Hearst, Fagan 2000]

52 DynaCat
- knowledge-based approach to the organization of search results
  - categorizes results into meaningful groups that correspond to the user’s query
  - uses knowledge of query types and of the domain terminology to generate hierarchical categories
- applied to the domain of medicine
  - MEDLINE is an on-line repository of medical abstracts
    - 9.2 million bibliographic entries from 3800 journals
  - PubMed is a web-based search tool
    - returns titles as a relevance-ranked list
    - links to “related articles”

53 DynaCat Results

54 DynaCat Query Types

55 DynaCat Search

56 Information vs. Knowledge Retrieval
- IR
  - keywords as main components of the query
  - index as match-making facility
  - statistical basis for selection of relevant documents
  - (ordered) list of results
- KR
  - keywords plus context information for the query
  - index plus ontology for matching query and documents
  - relationships between keywords and documents influence the selection of relevant documents
  - results are grouped into meaningful categories

57 KR Diagram

58 Knowledge Discovery
- Data Mining
- Rule Extraction

60 Reference [Kearns 00]

61 Reference [Sommerville 01]

62 Post-Test

63 Evaluation
- Criteria

64 Important Concepts and Terms
- agent
- automated reasoning
- belief network
- cognitive science
- computer science
- hidden Markov model
- intelligence
- knowledge representation
- linguistics
- Lisp
- logic
- machine learning
- microworlds
- natural language processing
- neural network
- predicate logic
- propositional logic
- rational agent
- rationality
- Turing test

65 Summary Chapter-Topic
