Copyright © 2001 eMotion, Inc. All Rights Reserved r Metadata Issues Sharon Flank eMotion, Inc.

Slides:



Advertisements
Similar presentations
Multilinguality & Semantic Search Eelco Mossel (University of Hamburg) Review Meeting, January 2008, Zürich.
Advertisements

AM Queries and Views. Overview Asset Manager provides sophisticated querying and reporting capability, from simple filters to a complex language that.
Complex queries in the PATENTSCOPE search system Cyberspace September 2013 Sandrine Ammann Marketing & Communications Officer.
COMMERCIAL METADATA APPROACH BY ANDREA DE POLO (ALINARI)
INSTRUCTOR: DR.NICK EVANGELOPOULOS PRESENTED BY: QIUXIA WU CHAPTER 2 Information retrieval DSCI 5240.
Query Languages. Information Retrieval Concerned with the: Representation of Storage of Organization of, and Access to Information items.
UCLA : GSE&IS : Department of Information StudiesJF : 276lec1.ppt : 5/2/2015 : 1 I N F S I N F O R M A T I O N R E T R I E V A L S Y S T E M S Week.
PubMed and its search options Jan Emmerich, Sonja Jacobi, Kerstin Müller (5th Semester Library Management)
Overview of PubWEST Patent and Trademark Depository Library Training Seminar April 2006.
IS530 Lesson 12 Boolean vs. Statistical Retrieval Systems.
1 Texmex – November 15 th, 2005 Strategy for the future Global goal “Understand” (= structure…) TV and other MM documents Prepare these documents for applications.
Engineering Village ™ ® Basic Searching On Compendex ®
Copyright © 2003 Americas’ SAP Users’ Group Simple Document Management in Project Systems Kent Bettisworth BETTISWORTH & ASSOCIATES, INC. Tuesday, May.
Information Retrieval in Practice
Search Engines and Information Retrieval
Basic IR: Queries Query is statement of user’s information need. Index is designed to map queries to likely to be relevant documents. Query type, content,
Search Strategies Online Search Techniques. Universal Search Techniques Precision- getting results that are relevant, “on topic.” Recall- getting all.
Intelligent Information Retrieval CS 336 –Lecture 2: Query Language Xiaoyan Li Spring 2006 Modified from Lisa Ballesteros’s slides.
Parametric search and zone weighting Lecture 6. Recap of lecture 4 Query expansion Index construction.
Information Retrieval in Practice
The Wharton School of the University of Pennsylvania OPIM 101 2/16/19981 The Information Retrieval Problem n The IR problem is very hard n Why? Many reasons,
Overview of Search Engines
Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe Slide 1- 1 Chapter 1 - Introduction: Databases and Database Users - Outline Types of Databases and.
CoopIS, Trento, Italy, 05/09/2001 Schema design and query processing in a federated multimedia database system Henrike Berthold & Klaus Meyer-Wegener.
Search Engines and Information Retrieval Chapter 1.
CS621 : Seminar-2008 DEEP WEB Shubhangi Agrawal ( )‏ Jayalekshmy S. Nair ( )‏
CIG Conference Norwich September 2006 AUTINDEX 1 AUTINDEX: Automatic Indexing and Classification of Texts Catherine Pease & Paul Schmidt IAI, Saarbrücken.
Collection Level Data Problems … & Suggestions for Avoiding Them.
Types of Periodicals in Literature Professional Scholarly Literary.
Multilingual Information Exchange APAN, Bangkok 27 January 2005
ICS-FORTH January 11, Thesaurus Mapping Martin Doerr Foundation for Research and Technology - Hellas Institute of Computer Science Bath, UK, January.
Thanks to Bill Arms, Marti Hearst Documents. Last time Size of information –Continues to grow IR an old field, goes back to the ‘40s IR iterative process.
Modern Information Retrieval: A Brief Overview By Amit Singhal Ranjan Dash.
Metadata and Geographical Information Systems Adrian Moss KINDS project, Manchester Metropolitan University, UK
Querying Structured Text in an XML Database By Xuemei Luo.
Databases. Databases Database Searching Database Searching Definition: A database is any organized collection of data that can be retrieved using organized.
Never-ending Search: (What you REALLY need to know about online searching) Ms. Emili school year.
IL Step 3: Using Bibliographic Databases Information Literacy 1.
WJEC Applied ICT Databases – Queries and Database Practice Queries When you create a database – one of the main strengths of it is the ability to.
XP New Perspectives on The Internet, Sixth Edition— Comprehensive Tutorial 3 1 Searching the Web Using Search Engines and Directories Effectively Tutorial.
The Internet 8th Edition Tutorial 4 Searching the Web.
Search Engines. Search Strategies Define the search topic(s) and break it down into its component parts What terms, words or phrases do you use to describe.
Introduction to Digital Libraries hussein suleman uct cs honours 2003.
Talk Schedule Question Answering from Bryan Klimt July 28, 2005.
Data and Applications Security Developments and Directions Dr. Bhavani Thuraisingham The University of Texas at Dallas Lecture #22 Secure Web Information.
AL-MAAREFA COLLEGE FOR SCIENCE AND TECHNOLOGY INFO 232: DATABASE SYSTEMS CHAPTER 1 DATABASE SYSTEMS Instructor Ms. Arwa Binsaleh.
How Do We Find Information?. Key Questions  What are we looking for?  How do we find it?  Why is it difficult? “A prudent question is one-half of wisdom”
Search Tools and Search Engines Searching for Information and common found internet file types.
Web Directories: Group 5 Jack Baker Laura Bingham Morgan Stewart.
Topic Maps introduction Peter-Paul Kruijsen CTO, Morpheus software ISOC seminar, april 5 th 2005.
Using OARE Search Engines. Environmental Index (EBSCO) Advanced Search.
Digital Video Library Network Supervisor: Prof. Michael Lyu Student: Ma Chak Kei, Jacky.
3 Copyright © 2010, Oracle. All rights reserved. Product Data Hub: PIM Functional Training Program Setup Workbench Fundamentals.
A Patent Document Retrieval System Addressing Both Semantic and Syntactic Properties Liang Chen*,Naoyuki Tokuda+, Hisahiro Adachi+ *University of Northern.
SEPTEMBER 2015 Databases. Database (review) A database is a collection of data arranged for ease and speed of search and retrieval (The American Heritage.
Information Retrieval in Practice
Information Retrieval in Practice
Education 499-R01 Search Basics.
Digital Video Library - Jacky Ma.
Designing Cross-Language Information Retrieval System using various Techniques of Query Expansion and Indexing for Improved Performance  Hello everyone,
Information Retrieval (in Practice)
Searching for and Accessing Information
Multimedia Information Retrieval
Search Techniques and Advanced tools for Researchers
Information Retrieval
Data Resource Management
Data Model.
International Marketing and Output Database Conference 2005
Data Resource Management
Introduction to Search Engines
Presentation transcript:

Copyright © 2001 eMotion, Inc. All Rights Reserved r Metadata Issues Sharon Flank eMotion, Inc.

Overview The value of metadata Possible structures for metadata Dynamic Semantic Expansion Managing Hierarchical Information Optimizing for the long term What’s wrong with keywords?

Why Metadata? Image recognition is immature –Colors –Fingerprint matching Transcripts do not speak for themselves –Creation info –History –Implicit info

Possible Metadata Structures Fielded Data –Automatically collected –Added by a human indexer –Checkboxes, fill-ins, pull-downs Descriptive Data Channel Data –Closed caption –Emerging types (e.g. KLV)

Fielded Data Attribute-value pairs Independent of database and data structure Various search methods –Exact match-- Wildcard –Keyword-- Numeric –Boolean-- Date May have “smart” fields

Smart Features Synonym matching – puma, mountain lion, cougar Related term matching – big cat, carnivore Name identification – Hilary, Hillary, H.R. Clinton, George W. Bush, Al, Albert, Gore not gore, Bush not bush

More Smart Features Word roots/stemming – geese/goose, not geeses or gooses Combines with fielded search –Can specify categories with a pull-down choice –Can specify dates, locations, equipment –Can include free-form query text at the same time

More Smart Features Locations – Boston, Massachusetts, MA, New England, Northeast, U.S., North America Relative frequency –Greater weight on rarer terms Phrase recognition – fire engine vs. engine fire – tiger should not retrieve tiger shark

Smart Features: Summary

Places burden on system, not on cataloguer or searcher Based on standard resources 400,000+ terms Extensible –New terms –New relationships –Multilingual Dynamic Semantic Expansion

Managing Hierarchical Information Fields vs. objects Object model –Can represent more complex information –Brittle –Difficult to maintain, extend Fields –Simpler –Can represent most information efficiently

Objects Person Composer Talent Director Producer Actress Singer Barbra Streisand

Object Model: Complications For every new role Barbra Streisand plays: Thesaurus manager enters each new role in thesaurus. Cataloguers then enter Barbra Streisand (in new role) in metadata. No variants permitted. Search processing requires traversing the hierarchy, i.e. multiple actions.

Fields Actor/Actress: Barbra Streisand Singer: Barbra Streisand Producer: Barbra Streisand Director: Barbra Streisand Composer: Barbra Streisand

Field Model Cataloguers enter name. Variants automatically handled. –Barbara, Barb, Ms. B. Streisand Search processing is one simple transaction.

Optimizing for the Long Term Select a standard that is –Long-lasting –Well known –Easy to learn

Comparing Standards

Retrieval of Keywords Sparse –<5 keywords –Boolean keyword search is fine –Many assets may have the same keywords –May be hard to guess keywords Thorough –>5 keywords –Phrases are lost –Booleans OK –Semantic matching still useful –Less need to guess keywords

Comparing Search Methods

What’s Wrong with Keywords? Hard to write –Haphazard –Need tools to standardize even a little –Cataloguers need extensive training, QA Hard to maintain –Difficult to keep up standards over time Hard to search –Searchers need to guess at keywords

Why English? With Natural Language search, system takes on the burden of relating terms 50% cheaper/faster to catalogue Easy to train cataloguers No search training needed Higher precision even for advanced users

XML is Not Enough Simply a labeling mechanism Washington vs. Location: Washington Person: Washington Doesn’t tell us which one matches George Washington or Washington DC or East

Conclusion Select a standard that is –Low-maintenance –Easy for cataloguers –Easy to extend –Supportive of naïve users –Supportive of expert searchers