Stephen Brown Museums and the Web Asia 9-12 December 2013 Hong Kong


Where are the pictures? Linking photographic records across collections using fuzzy logic

Research question: Can fuzzy-logic-based data mining algorithms be used to identify matches between different online collections? Answer: yes.

erps16578
From the Library of Congress: [Royal Museum, the court (i.e. Bargello Museum, the courtyard), Florence, Italy]. No person listed. 1 photomechanical print : photochrom color. [between ca. 1890 and ca. 1900].
From the ERPS collection: The Courtyard of the Bargello, Florence. Henry Little. Bromide (Print). 1895.

Method
Data preparation: correcting typographical errors, standardizing data such as dates, removing duplicate entries and mapping data to a common metadata schema.
Data aggregation: combining standardized records in a single XML database where they can be mined for similarities.
Query expansion: extending the range of keywords that are searched for.
Field comparison: comparing the contents of individual fields and combining these to produce an overall similarity metric.
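As an illustration of the data preparation step, here is a minimal Python sketch that standardizes free-text dates into year ranges so that they can be compared. The function names and the regular expression are assumptions made for this example, not the project's own code; the two date strings are taken from the Bargello records shown earlier.

```python
import re

def year_range(text):
    """Return (first_year, last_year) found in a free-text date, or None.

    Hypothetical helper: it simply collects every four-digit year
    between 1500 and 2099 that appears in the string.
    """
    years = [int(y) for y in re.findall(r"\b(1[5-9]\d{2}|20\d{2})\b", text)]
    if not years:
        return None
    return (min(years), max(years))

def ranges_overlap(a, b):
    """True when two inclusive (start, end) year ranges share a year."""
    return a[0] <= b[1] and b[0] <= a[1]

loc_date = year_range("[between ca. 1890 and ca. 1900]")   # (1890, 1900)
erps_date = year_range("1895")                             # (1895, 1895)
print(loc_date, erps_date, ranges_overlap(loc_date, erps_date))
```

Once both dates are in the same form, a single year such as 1895 can be recognized as falling inside a range such as 1890 to 1900, even though the original strings have no words in common.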

Alternative computing logics
Classic logic: binary (true/false, zero/one); set theory.
Fuzzy logic: degrees of truth; fuzzy set theory.

The concept of tall people: Ben Youngs 5’10”, Toby Flood 6’2”, Geoff Parling 6’6”.

The concept of tall people. Classical approach: anyone over 6’ is tall. Ben Youngs 5’10”, Toby Flood 6’2”, Geoff Parling 6’6”.

Classical computing: the membership function of the set of tall people. [Figure: membership (0 to 1) plotted against height (5’, 6’, 7’), with Ben Youngs 5’10”, Toby Flood 6’2” and Geoff Parling 6’6” marked.]
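A minimal Python sketch of the classical (crisp) membership function, assuming the 6-foot threshold from the slide and heights expressed in inches. The function names are hypothetical.

```python
def height_in_inches(feet, inches):
    """Convert a feet-and-inches height to inches."""
    return 12 * feet + inches

def is_tall_crisp(height, threshold=72):
    """Classical set membership: 1 if strictly over the threshold, else 0.

    The default threshold of 72 inches is the 6-foot cut-off.
    """
    return 1 if height > threshold else 0

players = {
    "Ben Youngs": (5, 10),
    "Toby Flood": (6, 2),
    "Geoff Parling": (6, 6),
}

for name, (feet, inches) in players.items():
    print(name, is_tall_crisp(height_in_inches(feet, inches)))
# Ben Youngs 0
# Toby Flood 1
# Geoff Parling 1
```

Membership is all or nothing here: a player at 5’10” is simply outside the set, however close to the threshold.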

The concept of tall people. Fuzzy approach: everyone is tall to some degree (as measured by the membership function). Ben Youngs 5’10”, Toby Flood 6’2”, Geoff Parling 6’6”.

Fuzzy computing: the membership function of the set of tall people. [Figure: membership (0 to 1) plotted against height (5’, 6’, 7’), with Ben Youngs 5’10”, Toby Flood 6’2” and Geoff Parling 6’6” marked.]

Soft computing: the membership function of the set of tall people. [Figure: membership (0 to 1) plotted against height (5’, 6’, 7’), with membership values 0.95, 0.7 and 0.45 marked for Ben Youngs 5’10”, Toby Flood 6’2” and Geoff Parling 6’6”.]
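A minimal Python sketch of a fuzzy membership function for the set of tall people. The shape used here, a straight ramp from 5’0” (membership 0) to 7’0” (membership 1), is an assumption chosen for simplicity; the slide does not state the curve it uses, so this sketch does not reproduce the slide's values.

```python
def tall_degree(height_in, low=60, high=84):
    """Degree of membership in 'tall', between 0.0 and 1.0.

    Assumed linear ramp: 0 at `low` inches, 1 at `high` inches,
    clipped outside that interval. Both bounds are hypothetical.
    """
    return min(1.0, max(0.0, (height_in - low) / (high - low)))

heights = {
    "Ben Youngs": 70,     # 5'10"
    "Toby Flood": 74,     # 6'2"
    "Geoff Parling": 78,  # 6'6"
}

for name, inches in heights.items():
    print(f"{name}: {tall_degree(inches):.2f}")
```

Every player now belongs to the set to some degree, and the degree rises smoothly with height instead of jumping from 0 to 1 at a single threshold.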

Fuzzy computing: allows for vagueness in concepts; soft boundaries; partial degrees of truth.

Lightweight Semantic Similarity. A: Chrysanthemum. B: Flower. [Figure: A and B plotted as vectors on the Chrysanthemum and Flower axes.]

Lightweight Semantic Similarity. [Figure: A and B plotted as vectors on the Chrysanthemum and Flower axes.] Cosine of the angle between A and B = 0. Therefore, no similarity between A and B.
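The zero result can be checked with a minimal Python sketch, assuming each term is represented as a vector with 1 on its own axis and 0 elsewhere. The `cosine` helper is written out here for illustration.

```python
import math

def cosine(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Axes: [chrysanthemum, flower]
a = [1.0, 0.0]  # A: "Chrysanthemum"
b = [0.0, 1.0]  # B: "Flower"

print(cosine(a, b))  # 0.0
```

Because the two vectors share no non-zero component, exact term matching treats "Chrysanthemum" and "Flower" as entirely unrelated.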

Lightweight Semantic Similarity Chrysanthemum Flower

Lightweight Semantic Similarity: fuzzy term vectors using synset similarity values from WordNet. [Figure: A and B plotted as vectors on the Chrysanthemum and Flower axes.] Cosine of the angle between A and B > 0. Therefore, some similarity between A and B.
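A minimal Python sketch of fuzzy term vectors. It assumes that each term's vector holds its similarity to every axis term, and it uses a made-up similarity of 0.8 between "chrysanthemum" and "flower" in place of a real WordNet synset score. The table, the value and the function names are all hypothetical.

```python
import math

# Hypothetical synset similarity scores; real values would come from WordNet.
SIMILARITY = {frozenset(["chrysanthemum", "flower"]): 0.8}

def similarity(term1, term2):
    """Similarity in [0, 1]; identical terms score 1, unknown pairs 0."""
    if term1 == term2:
        return 1.0
    return SIMILARITY.get(frozenset([term1, term2]), 0.0)

def fuzzy_vector(term, vocabulary):
    """One component per axis term, holding the term's similarity to it."""
    return [similarity(term, axis) for axis in vocabulary]

def cosine(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

vocabulary = ["chrysanthemum", "flower"]
a = fuzzy_vector("chrysanthemum", vocabulary)  # [1.0, 0.8]
b = fuzzy_vector("flower", vocabulary)         # [0.8, 1.0]

print(cosine(a, b) > 0)  # True
```

Each vector now leans towards the other term's axis, so the angle between them is less than 90 degrees and the cosine is positive.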

Combined similarity metric
IF title is good AND person is good THEN match is good.
IF title is good AND (date is good OR process is good) THEN match is ok.
IF person is good AND title is bad THEN match is ok.
IF title is bad AND person is bad THEN match is bad.
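The four rules can be sketched in Python as follows. This is one common way to evaluate fuzzy rules, not necessarily the one used in the project: it assumes AND is the minimum, OR is the maximum, "bad" is 1 minus "good", and the final score is a weighted average with good = 1.0, ok = 0.5 and bad = 0.0. All of those choices, and the function names, are assumptions.

```python
def bad(good_degree):
    """Assumed complement: the degree to which a field match is 'bad'."""
    return 1.0 - good_degree

def combined_match(title, person, date, process):
    """Apply the four rules to per-field 'good' degrees in [0, 1].

    Returns the firing strength of each outcome and a single score.
    """
    strengths = {
        # IF title is good AND person is good THEN match is good.
        "good": min(title, person),
        # Two rules conclude 'ok'; keep the stronger of the two.
        "ok": max(
            min(title, max(date, process)),
            min(person, bad(title)),
        ),
        # IF title is bad AND person is bad THEN match is bad.
        "bad": min(bad(title), bad(person)),
    }
    weights = {"good": 1.0, "ok": 0.5, "bad": 0.0}  # hypothetical weights
    total = sum(strengths.values())
    if total == 0:
        return strengths, 0.0  # no rule fired
    score = sum(strengths[k] * weights[k] for k in strengths) / total
    return strengths, score

strengths, score = combined_match(title=0.9, person=0.8, date=0.6, process=0.2)
print(strengths, round(score, 3))
```

With a strong title match and a strong person match, the "good" rule fires most strongly, so the combined score sits well above the midpoint.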

Conclusion: Large numbers of small amounts of text are common in collections records. Text volumes too small for corpus linguistics analysis. Need for query expansion. Text volumes too small for established Semantic Similarity analysis. Lightweight semantic