Computer Science Department California Polytechnic State University San Luis Obispo, CA, U.S.A. Franz J. Kurfess Knowledge Retrieval.

Some of the material in these slides was developed for a lecture series sponsored by the European Community under the BPD program with Vilnius University as host institution Acknowledgements

5 Use and Distribution of these Slides These slides are primarily intended for the students in classes I teach. In some cases, I only make PDF versions publicly available. If you would like to get a copy of the originals (Apple Keynote or Microsoft PowerPoint), please contact me via email. I hereby grant permission to use them in educational settings. If you do so, it would be nice to send me an email about it. If you’re considering using them in a commercial environment, please contact me.

This lecture series has been sponsored by the European Community under the BPD program with Vilnius University as host institution Acknowledgements

7 © Franz J. Kurfess Overview Knowledge Retrieval ❖ Finding Out About  Keywords and Queries; Documents; Indexing ❖ Data Retrieval  Access via Address, Field, Name ❖ Information Retrieval  Access via Content (Values); Parsing; Matching Against Indices; Retrieval Assessment ❖ Knowledge Retrieval  Access via Structure; Meaning; Context; Usage ❖ Knowledge Discovery  Data Mining; Rule Extraction

8 © Franz J. Kurfess Logistics

9 Preliminaries

10 © Franz J. Kurfess Bridge-In ❖ How do you organize your knowledge?  brain  paper  computer

11 © Franz J. Kurfess Finding Out About [Belew 2000]

12 Motivation and Objectives

13 © Franz J. Kurfess Motivation

14 © Franz J. Kurfess Objectives

15 © Franz J. Kurfess Evaluation Criteria

16 Finding Out About

17 Finding Out About [Belew 2000] ❖ Keywords ❖ Queries ❖ Documents ❖ Indexing

18 © Franz J. Kurfess Keywords [Belew 2000] ❖ linguistic atoms used to characterize the subject or content of a document  words  pieces of words (stems)  phrases ❖ provide the basis for a match between  the user’s characterization of information need  the contents of the document ❖ problems  ambiguity  choice of keywords

19 © Franz J. Kurfess Queries [Belew 2000] ❖ formulated in a query language  natural language  interaction with human information providers  artificial language  interaction with computers  especially search engines  vocabulary  controlled  limited set of keywords may be used  uncontrolled  any keywords may be used  syntax  often Boolean operators (AND, OR)  sometimes regular expressions
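The Boolean operators mentioned on this slide can be sketched as set operations. This is a minimal illustration only: the inverted index `postings` and the helpers `boolean_and` and `boolean_or` are hypothetical names, not taken from the slides or from any particular search engine.

```python
# Hypothetical inverted index: keyword -> set of documents containing it.
postings = {
    "knowledge": {"doc1", "doc2", "doc4"},
    "retrieval": {"doc2", "doc3", "doc4"},
    "mining": {"doc3"},
}

def boolean_and(index, *keywords):
    """Documents that contain every one of the keywords."""
    sets = [index.get(kw, set()) for kw in keywords]
    return set.intersection(*sets) if sets else set()

def boolean_or(index, *keywords):
    """Documents that contain at least one of the keywords."""
    return set().union(*(index.get(kw, set()) for kw in keywords))

print(sorted(boolean_and(postings, "knowledge", "retrieval")))  # ['doc2', 'doc4']
```

With an uncontrolled vocabulary, an unknown keyword simply yields an empty set; a controlled vocabulary would instead reject it up front.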

20 © Franz J. Kurfess Documents ❖ general interpretation  any document that can be represented digitally  text, image, music, video, program, etc. ❖ practical interpretation  passage of text  strings of characters in an alphabet  written natural language  length may vary  longer documents may be composed of shorter ones

21 © Franz J. Kurfess Aboutness of Documents [Belew 2000] ❖ describes the suitability of a document as answer to a query ❖ assumptions  all documents have equal aboutness  the probability of any document in a corpus to be considered relevant is equal for all documents  simplistic; not valid in reality  a paragraph is the smallest unit of text with appreciable aboutness

22 © Franz J. Kurfess Structural Aspects of Documents ❖ documents may be composed of other smaller pieces, or other documents  paragraphs, subsections, sections, chapters, parts  footnotes, references ❖ documents may contain meta-data  information about the document  not part of the content of the document itself  may be used for organization and retrieval purposes  can be abused by creators  usually to increase the perceived relevance

23 © Franz J. Kurfess Document Proxies ❖ surrogates for the real document  abridged representations  catalog, abstract  pointers  bibliographical citation, URL  different media  microfiches  digital representations

24 © Franz J. Kurfess Indexing [Belew 2000] ❖ a vocabulary of keywords is assigned to all documents of a corpus ❖ an index maps each document doc_i to the set of keywords {kw_j} it is about  Index: doc_i → (about) {kw_j}  Index⁻¹: {kw_j} → (describes) doc_i ❖ indexing of a document / corpus  manual: humans select appropriate keywords  automatic: a computer program selects the keywords ❖ building the index relation between documents and sets of keywords is critical for information retrieval
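The index relation and its inverse can be sketched with two dictionaries. This is a minimal sketch under stated assumptions: the corpus, the keyword sets, and the function name `invert` are invented for illustration.

```python
from collections import defaultdict

# Hypothetical corpus: each document is mapped to the keywords it is about.
index = {
    "doc1": {"retrieval", "index"},
    "doc2": {"index", "ontology"},
    "doc3": {"retrieval", "query"},
}

def invert(index):
    """Build the inverse relation: keyword -> set of documents it describes."""
    inverted = defaultdict(set)
    for doc, keywords in index.items():
        for kw in keywords:
            inverted[kw].add(doc)
    return dict(inverted)

inverted_index = invert(index)
print(sorted(inverted_index["index"]))  # ['doc1', 'doc2']
```

Queries are answered from the inverted direction, which is why search engines typically store that form.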

25 © Franz J. Kurfess FOA Conversation Loop [Belew 2000]

26 Data Retrieval ❖ access to specific data items  access via address, field, name  typically used in databases ❖ user asks for items with specific features  absence or presence of features  values ❖ system returns data items  no irrelevant items  deterministic retrieval method

27 Information Retrieval (IR) ❖ access to documents  also referred to as document retrieval  access via keywords ❖ IR aspects  parsing  matching against indices  retrieval assessment

28 © Franz J. Kurfess Diagram Search Engine [Belew 2000]

29 © Franz J. Kurfess Parsing [Belew 2000] ❖ extraction of lexical features from documents  mostly words ❖ may require some manipulation of the extracted features  e.g. stemming of words ❖ used as the basis for automatic compilation of indices
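A minimal sketch of feature extraction with a deliberately crude suffix stripper. The suffix list and the function names are assumptions made for illustration; this is not the Porter stemmer or any stemming algorithm referenced in the slides.

```python
import re

# Illustrative suffix list only; a real stemmer uses far more careful rules.
SUFFIXES = ("ing", "ed", "es", "s")

def tokenize(text):
    """Extract lowercase alphabetic words as lexical features."""
    return re.findall(r"[a-z]+", text.lower())

def crude_stem(word):
    """Strip the first matching suffix if at least three letters remain."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def extract_features(text):
    """Tokenize a passage and reduce each token to its crude stem."""
    return [crude_stem(token) for token in tokenize(text)]

print(extract_features("Indexing indexed documents"))  # ['index', 'index', 'document']
```

The resulting stems are what an automatic indexer would count and store as keywords.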

30 © Franz J. Kurfess Parsing Tools ❖ MontyTagger  Python and Java ❖ fnTBL (C++)  fast ❖ Brill Tagger (C)  the original; influenced several later ones ❖ Natural Language Toolkit  good starting point for basics of NLP algorithms

31 © Franz J. Kurfess Matching Against Indices [Belew 2000] ❖ identification of documents that are relevant for a particular query ❖ keywords of the query are compared against the keywords that appear in the document  either in the data or meta-data of the document ❖ in addition to queries, other features of documents may be used  descriptive features provided by the author or cataloger  usually meta-data  derived features computed from the contents of the document
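Keyword matching can be sketched as counting the overlap between query keywords and each document's keywords. The function name `match` and the sample data are hypothetical; real systems usually weight terms rather than just count them.

```python
def match(query_keywords, doc_keywords):
    """Return (doc, overlap) pairs, best match first; non-matching docs are dropped."""
    query = set(query_keywords)
    scored = [(doc, len(query & keywords)) for doc, keywords in doc_keywords.items()]
    hits = [(doc, overlap) for doc, overlap in scored if overlap > 0]
    # Sort by descending overlap, then by name so ties are deterministic.
    return sorted(hits, key=lambda pair: (-pair[1], pair[0]))

# Hypothetical keyword sets, drawn from document data or meta-data.
docs = {"a": {"x", "y"}, "b": {"y"}, "c": {"z"}}
print(match(["x", "y"], docs))  # [('a', 2), ('b', 1)]
```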

32 © Franz J. Kurfess Vector Space [Belew 2000] ❖ interpretation of the index matrix  relates documents and keywords ❖ can grow extremely large  binary matrix of 100,000 words * 1,000,000 documents  sparsely populated: most entries will be 0 ❖ can be used to determine similarity of documents  overlap in keywords  proximity in the (virtual) vector space ❖ associative memories can be used as hardware implementation  extremely fast, but expensive to build
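Because the binary index matrix is sparse, each row can be stored as the set of keywords that are present. The sketch below computes cosine similarity on such sets as one possible notion of proximity in the vector space; the function name is an assumption, and the slides do not prescribe a particular similarity measure.

```python
import math

def cosine_binary(a, b):
    """Cosine similarity of two binary vectors stored as keyword sets."""
    if not a or not b:
        return 0.0
    # For 0/1 vectors the dot product is the overlap, and each norm is sqrt(size).
    return len(a & b) / math.sqrt(len(a) * len(b))

print(cosine_binary({"index", "query"}, {"query", "corpus"}))  # 0.5
```

Storing only the non-zero entries avoids materializing a matrix of 100,000 by 1,000,000 cells.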

33 © Franz J. Kurfess Vector Space Diagram [Belew 2000]

34 © Franz J. Kurfess Measuring Retrieval ❖ ideally, all relevant documents should be retrieved  relative to the query posed by the user  relative to the set of documents available (corpus)  relevance can be subjective ❖ precision and recall  relevant documents vs. retrieved documents

35 © Franz J. Kurfess Document Retrieval [Belew 2000]

36 © Franz J. Kurfess Precision and Recall [Belew 2000] recall = |retrieved ∩ relevant| / |relevant|; precision = |retrieved ∩ relevant| / |retrieved|
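Both measures follow directly from the sets of retrieved and relevant documents. A minimal sketch, with an invented function name and invented document identifiers:

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Four documents retrieved, three relevant in the corpus, two in common.
p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d2", "d4", "d5"])
print(p)  # 0.5
```

Here half of the retrieved documents are relevant (precision 0.5), and two of the three relevant documents were found (recall 2/3).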

37 © Franz J. Kurfess Specificity vs. Exhaustivity [Belew 2000]

38 © Franz J. Kurfess Retrieval Assessment [Belew 2000] ❖ subjective assessment  how well do the retrieved documents satisfy the request of the user ❖ objective assessment  idealized omniscient expert determines the quality of the response

39 © Franz J. Kurfess Retrieval Assessment Diagram [Belew 2000]

40 © Franz J. Kurfess Relevance Feedback [Belew 2000] ❖ subjective assessment of retrieval results ❖ often used to iteratively improve retrieval results ❖ may be collected by the retrieval system for statistical evaluation ❖ can be viewed as a variant of object recognition  the object to be recognized is the prototypical document the user is looking for  this document may or may not exist  the difference between the retrieved document(s) and the idealized prototype indicates the quality of the retrieval results

41 © Franz J. Kurfess Relevance Feedback in Vector Space [Belew 2000] ❖ relevance feedback is used to move the query towards the cluster of positive documents  moving away from bad documents does not necessarily improve the results ❖ it can also be used as a filter for a constant stream of documents  as in news channels or similar situations
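Moving the query towards the positive documents can be sketched as interpolating between the query vector and the centroid of the documents judged relevant. This is in the spirit of Rocchio-style updates with the negative term left out; the function name, the `rate` parameter, and the term weights are assumptions, not taken from the slides.

```python
def move_toward_positives(query, positives, rate=0.5):
    """Shift query term weights toward the centroid of relevant documents."""
    if not positives:
        return dict(query)
    terms = set(query) | {term for doc in positives for term in doc}
    updated = {}
    for term in terms:
        centroid = sum(doc.get(term, 0.0) for doc in positives) / len(positives)
        old = query.get(term, 0.0)
        updated[term] = old + rate * (centroid - old)
    return updated

query = {"jaguar": 1.0}
positives = [{"jaguar": 1.0, "cat": 1.0}, {"jaguar": 1.0, "cat": 0.0}]
new_query = move_toward_positives(query, positives)
print(new_query["cat"])  # 0.25
```

Negative documents are ignored here, consistent with the slide's remark that moving away from bad documents does not necessarily help.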

42 © Franz J. Kurfess Query Session Example [Belew 2000]

43 © Franz J. Kurfess Consensual Relevance [Belew 2000] ❖ relevance feedback from multiple users  identifies documents that many users found useful or interesting  used by some Web sites  related to collaborative filtering  can also be used as an evaluation method for search engines  performance criteria must be carefully considered  precision and recall, plus many others
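Consensual relevance can be sketched as counting how many distinct users marked each document as useful. The data layout, the threshold `min_users`, and the function name are hypothetical.

```python
from collections import Counter

def consensual_relevance(feedback, min_users=2):
    """Documents marked relevant by at least min_users users, most popular first."""
    # set(docs) ensures each user counts at most once per document.
    counts = Counter(doc for docs in feedback.values() for doc in set(docs))
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [doc for doc, votes in ranked if votes >= min_users]

feedback = {"ann": ["d1", "d2"], "bob": ["d2", "d3"], "cy": ["d2", "d1"]}
print(consensual_relevance(feedback))  # ['d2', 'd1']
```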

44 © Franz J. Kurfess IR Diagram [figure, repeated on slides 44 to 48: Query; Keywords (Term 1 to Term 4); Index; Documents (Doc. 1 to Doc. 5); Corpus]

49 Knowledge Retrieval ❖ Context ❖ Usage ❖ exploratory search ❖ faceted search

50 © Franz J. Kurfess Context in Knowledge Retrieval ❖ in addition to keywords, relationships between keywords and documents are exploited  explicit links  hypertext  related concepts  thesaurus, ontology  proximity  spatial: place, directory  temporal: creation date/time  intermediate relations  author/creator  organization  project

51 © Franz J. Kurfess Inference beyond the Index ❖ determines relationships between documents ❖ citations are explicit references to relevant documents  bibliographic references  legal citations  hypertext ❖ examples  NEC CiteSeer  Google Scholar

52 © Franz J. Kurfess Additional Information Sources [Belew 2000, after Kochen 1975]

53 © Franz J. Kurfess Hypertext ❖ inter-document links provide explicit relationships between documents  can be used to determine the relevance of a document for a query  example: Google ❖ intra-document links may offer additional context information for some terms  footnotes, glossaries, related terms

54 © Franz J. Kurfess Adaptive Retrieval Techniques ❖ fine-tuning the matching between queries and retrieved documents  learning of relationships between terms  training with term pairs (thesaurus)  pattern detection in past queries  automatic grouping of documents according to common features  clustering of similar documents  pre-defined categories  metadata  overlap in keywords  consensual relevance  source

55 © Franz J. Kurfess Document Classification

56 © Franz J. Kurfess Query Model [Pratt, Hearst, Fagan 2000] ❖ query types (templates)  frequently used types of queries  e.g. problem/solution, symptoms/diagnosis, problem/further checks,... ❖ category types  abstractions of query types  used to determine categories or topics for the grouping of search results ❖ context information  current working document/directory  previous queries

57 © Franz J. Kurfess Terminology Model [Pratt, Hearst, Fagan 2000] ❖ individual terms are connected to related terms  thesaurus/ontology  synonyms, super-/sub-classes, related terms ❖ identifies labels for the category types

58 © Franz J. Kurfess Matching [Pratt, Hearst, Fagan 2000] ❖ categorizer  determines the categories to be selected for the grouping of results  assigns retrieved documents to the categories ❖ organizer  arranges categories into a hierarchy  should be balanced and easy to browse by the user  depends on the distribution of the search results
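The categorizer step can be sketched as assigning each retrieved document to the categories suggested by its terms. The terminology model below (`TERM_TO_CATEGORY`), the category labels, and the fallback "Other" group are invented for illustration and do not reproduce DynaCat's actual models.

```python
from collections import defaultdict

# Hypothetical terminology model: term -> category label.
TERM_TO_CATEGORY = {
    "aspirin": "Treatments",
    "surgery": "Treatments",
    "fever": "Symptoms",
}

def categorize(results, term_to_category=TERM_TO_CATEGORY):
    """Group (title, terms) results by category; unmatched results go to 'Other'."""
    groups = defaultdict(list)
    for title, terms in results:
        labels = {term_to_category[t] for t in terms if t in term_to_category}
        for label in labels or {"Other"}:
            groups[label].append(title)
    # Sort categories and titles so the grouping is easy to browse.
    return {label: sorted(titles) for label, titles in sorted(groups.items())}

results = [("A", ["aspirin", "fever"]), ("B", ["surgery"]), ("C", ["unknown"])]
grouped = categorize(results)
```

A document may land in more than one category, as result "A" does here; an organizer step would then arrange the categories into a hierarchy.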

59 © Franz J. Kurfess Results [Pratt, Hearst, Fagan 2000] ❖ retrieved documents are grouped into hierarchically arranged categories meaningful for the user  the categories are related to the query  the categories are related to each other  all categories have similar size  not always achievable due to the distribution of documents ❖ reduced search times ❖ higher user satisfaction

60 © Franz J. Kurfess Examples

61 © Franz J. Kurfess DynaCat [DynaCat, 2000] ❖ knowledge-based approach to the organization of search results ❖ categorizes results into meaningful groups that correspond to the user’s query  uses knowledge of query types and of the domain terminology to generate hierarchical categories ❖ applied to the domain of medicine  MEDLINE is an on-line repository of medical abstracts  9.2 million bibliographic entries from 3800 journals  PubMed is a web-based search tool  returns titles as a relevance-ranked list  links to “related articles”

62 © Franz J. Kurfess DynaCat Results [DynaCat, 2000]

63 © Franz J. Kurfess DynaCat Query Types [DynaCat, 2000]

64 © Franz J. Kurfess DynaCat Search [DynaCat, 2000]

65 © Franz J. Kurfess Information vs. Knowledge Retrieval
IR | KR
keywords as main components of the query | keywords plus context information for the query
index as match-making facility | index plus ontology for matching query and documents
statistical basis for selection of relevant documents | relationships between keywords and documents influence the selection of relevant documents
(ordered) list of results | results are grouped into meaningful categories

66 © Franz J. Kurfess KR Diagram [figure: Query (keyword input); Keywords (Term 1 to Term 4); Ontology (Term A to Term M) with synonym expansion and relation expansion; Index; Documents (Doc. 1 to Doc. 5); Corpus]
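The synonym and relation expansion named in this diagram can be sketched as enlarging the keyword set before it is matched against the index. The mini-ontology and the function name `expand_query` are hypothetical.

```python
# Hypothetical mini-ontology; the terms and links are invented for illustration.
SYNONYMS = {"car": {"automobile"}, "automobile": {"car"}}
RELATED = {"car": {"engine", "vehicle"}}

def expand_query(keywords, synonyms=SYNONYMS, related=RELATED):
    """Add synonyms and directly related terms to the user's keywords."""
    expanded = set(keywords)
    for kw in keywords:
        expanded |= synonyms.get(kw, set())  # synonym expansion
        expanded |= related.get(kw, set())   # relation expansion
    return expanded

print(sorted(expand_query(["car"])))  # ['automobile', 'car', 'engine', 'vehicle']
```

Only one expansion step is applied here; following relations transitively would widen recall further at the cost of precision.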

67 © Franz J. Kurfess Exploratory Search ❖ finding knowledge through association ❖ hypothesis: Human-made associations between knowledge items are valuable for others  especially if the associations are made by experts or experienced users

68 © Franz J. Kurfess Activity: Modern Exploratory Search ❖ What are current concepts, methods and tools that enable exploratory search?

69 © Franz J. Kurfess Vannevar Bush: Memex ❖ better knowledge management for scientific document collections  build, maintain, and share paths through the document space containing knowledge (“knowledge trails”)  see Vannevar Bush, “As We May Think”, Atlantic Monthly, July 1945; www.theatlantic.com/194507/bush

70 © Franz J. Kurfess Faceted Search ❖ exploration of a domain via attributes  select a relevant attribute, and display the elements of the domain ordered according to the attribute
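Faceted exploration can be sketched with three small operations: counting the values of an attribute, ordering items by it, and narrowing to one value. The track list and the function names are invented for illustration.

```python
from collections import Counter

# Hypothetical music library; titles, genres, and years are made up.
TRACKS = [
    {"title": "Track A", "genre": "Jazz", "year": 1999},
    {"title": "Track B", "genre": "Rock", "year": 1985},
    {"title": "Track C", "genre": "Jazz", "year": 2004},
]

def facet_counts(items, attribute):
    """How many items fall under each value of the chosen attribute."""
    return dict(Counter(item[attribute] for item in items))

def order_by(items, attribute):
    """Display order of the domain elements according to the attribute."""
    return sorted(items, key=lambda item: item[attribute])

def narrow(items, attribute, value):
    """Keep only the items whose attribute has the selected value."""
    return [item for item in items if item[attribute] == value]

print(facet_counts(TRACKS, "genre"))  # {'Jazz': 2, 'Rock': 1}
```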

71 © Franz J. Kurfess Activity: Faceted Search Outside of Web Browsers ❖ What are tools or applications that employ faceted search to display items to the user?

72 © Franz J. Kurfess Faceted Search in iTunes

73 © Franz J. Kurfess Variations on Faceted Search ❖ displaying lists of items ordered according to an attribute can get quite boring ❖ attributes often lend themselves to alternative presentation methods  visual  static  color, size, shape  dynamic  movement, changes over time  auditory  often for supplementary information

74 Knowledge Discovery ❖ combination of  Data Mining  Knowledge Extraction  Knowledge Fusion

75 © Franz J. Kurfess Data Mining ❖ identification of interesting “nuggets” in huge quantities of data  often relations between subsets  automatic or semi-automatic ❖ techniques  classification, correlation (e.g. temporal, spatial)

76 © Franz J. Kurfess Knowledge Extraction ❖ conversion of internal representations of knowledge into human-understandable format  extraction of rules from neural networks is one example

77 © Franz J. Kurfess Knowledge Fusion ❖ multiple pieces of information are combined into one  redundancy  do several pieces contain the same type of information  compatibility  do the individual pieces have similar formats and interpretations  are there mappings to convert values into the same format  consistency  are the values of the individual pieces close
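The compatibility and consistency checks can be sketched with temperature readings that arrive in different units. The unit codes, the tolerance, and the choice to fuse by averaging are all assumptions made for this example.

```python
def to_celsius(value, unit):
    """Compatibility: map each reading into one common format."""
    if unit == "C":
        return value
    if unit == "F":
        return (value - 32) * 5 / 9
    raise ValueError(f"no mapping for unit {unit!r}")

def consistent(values, tolerance=1.0):
    """Consistency: are the converted values close to each other?"""
    return max(values) - min(values) <= tolerance

def fuse(readings, tolerance=1.0):
    """Combine redundant readings into one value, or None if they conflict."""
    values = [to_celsius(value, unit) for value, unit in readings]
    if not consistent(values, tolerance):
        return None  # conflicting pieces are left unfused
    return sum(values) / len(values)

print(fuse([(20.0, "C"), (68.0, "F")]))  # 20.0
```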

78 © Franz J. Kurfess Post-Test

79 © Franz J. Kurfess Evaluation ❖ Criteria

80 Image Search ❖ contextual  meta-data  text in the same or close-by documents  e.g. on the same Web page, or in the same directory ❖ content-based  analysis and comparison of images  feature extraction

81 © Franz J. Kurfess Contextual Image Search ❖ images are indexed via keywords  captions of images  metadata (tags)  surrounding text ❖ relies on the assumption that the indexed text is correlated to the image and a good description of its content ❖ basis for most current image search engines

82 © Franz J. Kurfess Content-Based Image Search ❖ images are compared against query images  no text elements as proxies  or at least not in a “pure” content-based image search  relies on feature extraction or object recognition  direct comparison of pictures on a pixel-by-pixel basis is impractical  only yields identical pictures, not similar ones  computationally very challenging  especially if the query image is a subset of a target image  allows the use of a picture as a “template” to find related pictures

83 © Franz J. Kurfess Example: UC Santa Cruz Image Search Using a single image as a template, computer software can find similar images in a large database of photos, as shown in these examples. Images courtesy of P. Milanfar. ❖ feature extraction and object recognition from images and video

84 © Franz J. Kurfess Example: Einstein and Picasa ❖ color ❖ black & white ❖ age ❖ changes in appearance ❖ artistic images ❖ pictures of artifacts resembling the person ❖ recognition of a person’s face across a wide spectrum

85 Music Search

86 © Franz J. Kurfess Shazam ❖ compares music perceived by a device against a database with pre-recorded music  e.g. music playing on the radio or TV perceived through the microphone of a mobile device ❖ does not determine melodies or lyrics  doesn’t work with humming ❖ sources  projects/proposals/dfg/p44-wang.pdf

87 © Franz J. Kurfess The Four Basic Steps ❖ Shazam fills a database of fingerprinted songs. ❖ When the user tags a song, it takes a ten-second fingerprint. ❖ It uploads the fingerprint and tries to find a match. ❖ The results are then sent back to the phone.
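The matching step can be illustrated with a toy model in which a fingerprint is simply a set of hash values and the best match is the song sharing the most hashes with the sample. This is a heavy simplification for illustration only: it performs no audio analysis, the hash tuples and names are invented, and it is not Shazam's actual algorithm.

```python
def best_match(sample_hashes, database, min_overlap=2):
    """Return the song sharing the most hashes with the sample, or None."""
    best_song, best_score = None, 0
    for song, hashes in sorted(database.items()):
        score = len(sample_hashes & hashes)
        if score > best_score:
            best_song, best_score = song, score
    return best_song if best_score >= min_overlap else None

# Step 1: a database of fingerprinted songs (toy hash tuples).
database = {
    "song_a": {(1, 2, 3), (4, 5, 6), (7, 8, 9)},
    "song_b": {(1, 2, 3), (9, 9, 9)},
}
# Steps 2 to 4: fingerprint a short sample, look for a match, report the result.
sample = {(4, 5, 6), (7, 8, 9), (0, 0, 0)}
print(best_match(sample, database))  # song_a
```

The `min_overlap` threshold keeps a single coincidental hash from being reported as a match.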

88 © Franz J. Kurfess Shazam Fingerprints

89 © Franz J. Kurfess Shazam Constellation Map

90 © Franz J. Kurfess Pandora

91 Search for Computers

92 © Franz J. Kurfess Search for Computers ❖ result of a search query is intended for a computer, not a human  agent, program ❖ purpose  de-duplication  syntactic  semantic  validation and verification  check veracity or consistency of statements

93 © Franz J. Kurfess Syntactic De-duplication ❖ search for entities that are identical at the representation level  same representation format  bits, bytes, records, objects  specialized schemes like MP3, TIFF, JPEG, RAW, ...  same content within the same format  fully identical  partially identical ❖ straightforward comparison  comparison of bits, bytes, strings, or other basic blocks  computationally challenging for large collections
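For fully identical content, comparing every pair of items byte by byte becomes expensive in large collections. One common shortcut, sketched here as an assumption rather than something the slides prescribe, is to group items by a cryptographic digest of their bytes; the file names and contents are invented.

```python
import hashlib
from collections import defaultdict

def find_identical(blobs):
    """Group item names whose byte content is fully identical."""
    by_digest = defaultdict(list)
    for name, data in blobs.items():
        by_digest[hashlib.sha256(data).hexdigest()].append(name)
    return [sorted(names) for names in by_digest.values() if len(names) > 1]

blobs = {"a.mp3": b"abc", "b.mp3": b"abc", "c.mp3": b"xyz"}
print(find_identical(blobs))  # [['a.mp3', 'b.mp3']]
```

Each item is read once, so the cost grows linearly with the collection instead of quadratically. Partially identical content needs a different approach, such as hashing fixed-size blocks.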

94 © Franz J. Kurfess Semantic De-duplication ❖ identical content across different representation formats  movies stored in different video formats, resolutions  images in different formats ❖ difficult comparison  mapping or conversion to a common format  meaning of identical may be unclear  “lossless” music format vs. MP3  similarity measures  sometimes as “signatures”

95 © Franz J. Kurfess Examples: Semantic De-duplication ❖ Netflix stores movies in dozens of different formats  different resolutions, different target players, different devices ❖ duplication removal for music libraries  “semantic” aspects may be limited  meta-information like title, artist, duration
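De-duplicating a music library by meta-information can be sketched as building a signature from normalized title, artist, and a coarse duration bucket. The field names, the bucket width, and the sample tracks are assumptions for illustration.

```python
from collections import defaultdict

def signature(track, bucket_s=5):
    """Format-independent signature from title, artist, and rough duration."""
    title = " ".join(track["title"].lower().split())
    artist = " ".join(track["artist"].lower().split())
    # Durations within the same bucket_s-second bucket are treated as equal.
    return (title, artist, int(track["duration_s"] // bucket_s))

def group_duplicates(tracks):
    """Lists of files that share a signature despite different formats."""
    groups = defaultdict(list)
    for track in tracks:
        groups[signature(track)].append(track["file"])
    return [files for files in groups.values() if len(files) > 1]

tracks = [
    {"file": "a.mp3", "title": "Song  One", "artist": "The Band", "duration_s": 201},
    {"file": "a.flac", "title": "song one", "artist": "the band", "duration_s": 203},
    {"file": "b.mp3", "title": "Other", "artist": "The Band", "duration_s": 201},
]
print(group_duplicates(tracks))  # [['a.mp3', 'a.flac']]
```

Fixed buckets can split two nearly equal durations that straddle a boundary, so a production version would compare durations with a tolerance instead.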

96 Wrap-Up

97 © Franz J. Kurfess Post-Test

98 © Franz J. Kurfess Evaluation ❖ Criteria

99 © Franz J. Kurfess KP/KM Activity ❖ select a domain that requires significant human involvement for dealing with knowledge ❖ identify at least two candidates for  knowledge representation  reasoning ❖ evaluate their suitability  human perspective  understandable and usable for humans  computational perspective  storage, processing

100 © Franz J. Kurfess KP/KM Activity Outcomes 2007 ❖ Images with Metadata ❖ Extracting contact information from text ❖ Qualitative and quantitative knowledge about cheese making ❖ Visualization of astronomy data ❖ Surveillance/security KM ❖ Marketing ❖ Face recognition ❖ Visual marketing

101 © Franz J. Kurfess Important Concepts and Terms ❖ automated reasoning ❖ belief network ❖ cognitive science ❖ computer science ❖ deduction ❖ frame ❖ human problem solving ❖ inference ❖ intelligence ❖ knowledge acquisition ❖ knowledge representation ❖ linguistics ❖ logic ❖ machine learning ❖ natural language ❖ ontology ❖ ontological commitment ❖ predicate logic ❖ probabilistic reasoning ❖ propositional logic ❖ psychology ❖ rational agent ❖ rationality ❖ reasoning ❖ rule-based system ❖ semantic network ❖ surrogate ❖ taxonomy ❖ Turing machine

102 © Franz J. Kurfess Summary Knowledge Retrieval ❖ identification, selection, and presentation of documents relevant to a user query ❖ utilization of structural information, context, meta-data in addition to keyword search ❖ organized presentation of results  categories, visual arrangement ❖ internal representations may be converted to human-understandable ones

103 © Franz J. Kurfess