Elizabeth D. Liddy Center for Natural Language Processing School of Information Studies Syracuse University Developing & Evaluating Metadata for Improved Information Access

Background

Breaking the Metadata Generation Bottleneck
– 1st NSDL project ( )
– Adapted Natural Language Processing technology for automatic metadata generation
– 15 Dublin Core + 8 GEM education elements

Project had a modest evaluation study
– Results suggested that automatically generated metadata was qualitatively nearly equal to manually generated metadata

NLP-Based Metadata Generation: Types of Features

Linguistic
– Root forms of words
– Part-of-speech tags
– Phrases (Noun, Verb, Proper Noun)
– Categories (Person, Geographic, Organization)
– Concepts (sense-disambiguated words / phrases)
– Semantic Relations
– Events

Non-linguistic
– Length of document
– HTML and XML tags
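The non-linguistic features, at least, are easy to illustrate. Below is a toy sketch in Python: it pulls document length and markup tags from raw HTML and counts word frequencies with a crude tokenizer. The function name and heuristics are hypothetical; the linguistic features (POS tags, phrases, entities) would come from a full NLP pipeline, not from regexes like these.

```python
import re
from collections import Counter

def extract_features(html: str) -> dict:
    """Toy feature extractor: non-linguistic features plus naive word counts.
    (Hypothetical sketch; the actual system used an NLP pipeline.)"""
    # Non-linguistic features: markup tags present and document length.
    tags = re.findall(r"</?([a-zA-Z][a-zA-Z0-9]*)", html)
    text = re.sub(r"<[^>]+>", " ", html)          # strip markup
    words = re.findall(r"[a-z']+", text.lower())  # crude tokenizer
    return {
        "length_chars": len(text),
        "length_words": len(words),
        "tags": Counter(tags),        # open and close tags both counted
        "word_freq": Counter(words),
    }

feats = extract_features("<html><title>Photosynthesis Lesson</title>"
                         "<p>A lesson plan on photosynthesis.</p></html>")
```

Even counts this shallow are useful downstream, e.g. tag structure hints at where a title or description lives in the page.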

MetaExtract Architecture (diagram; labels recovered from the slide)

A gathering program collects an HTML document, which a preprocessor and HTML converter normalize using configuration data. A Metadata Retrieval Module supplies Date, Rights, Publisher, Format, Language, and Resource Type; the eQuery Extraction Module extracts Creator, Grade/Level, Duration, Date, Pedagogy, Audience, and Standard; tf/idf analysis yields potential Keywords, plus Title, Description, Essential Resources, and Relation. The output is an XML document with metadata, passed to a cataloger for the catalog.
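The "Tf/idf Keywords" step in the pipeline can be sketched in a few lines: score each term by its frequency in the document, discounted by how many documents in the collection contain it, and keep the top-ranked terms. This is a minimal illustration of the general tf-idf technique, not the actual MetaExtract module; the function name and the smoothing constants are my own.

```python
import math
from collections import Counter

def tfidf_keywords(doc_tokens, corpus, k=3):
    """Rank a document's terms by tf-idf against a small corpus and
    return the top k as candidate keywords. (Illustrative sketch.)"""
    n_docs = len(corpus)
    df = Counter()                      # document frequency of each term
    for d in corpus:
        df.update(set(d))
    tf = Counter(doc_tokens)
    scores = {
        term: (count / len(doc_tokens))                       # term frequency
              * math.log((1 + n_docs) / (1 + df[term]))       # smoothed idf
        for term, count in tf.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]

corpus = [["lesson", "plan", "photosynthesis", "photosynthesis"],
          ["lesson", "plan", "algebra"],
          ["lesson", "geometry"]]
top = tfidf_keywords(corpus[0], corpus, k=1)
```

Terms that appear in every record ("lesson") score zero, so the rarer, topical term surfaces as the keyword.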

MetaTest Research Questions

Do we need metadata for information access?
– Why?
How much metadata do we need?
– For what purposes?
Which elements do we need?
– For which digital library tasks?
How is metadata utilized by information-seekers?
– When browsing / searching / previewing?
Can automatically generated metadata perform as well as manually assigned metadata?
– For browsing / searching / previewing?

Three Types of Evaluation of Metadata

1. Human expert qualitative review
2. Eye-tracking in searching & browsing tasks
3. Quantitative information retrieval experiment with 3 conditions:
– Automatically assigned metadata
– Manually assigned metadata
– Full-text indexing

Evaluation Methodology

1. System automatically meta-tagged a digital library collection that had already been manually tagged.
2. Solicited subject pool of teachers via listservs.
3. Had users qualitatively evaluate metadata tags.
4. Conducted searching & browsing experiments.
5. Monitored with eye-tracking & post-search interviews.
6. Observed relative utility of each metadata element for both tasks.
7. Are now preparing for an IR experiment to compare 2 types of metadata generation + full-text indexing.

Who Were the Respondents?

Type of Educator
– Elementary Teacher: 6%
– Middle School Teacher: 6%
– High School Teacher: 66%
– Higher Education Teacher: 6%
– Instructional Designer: 3%
– School Media: 3%
– Other: 11%

Experience with Lesson Plans
– <1 Year: 6%
– 1-3 Years: 29%
– 3-9 Years: 29%
– 10+ Years: 37%

Subject Taught
– Science: 69%
– Math: 6%
– Engineering: 3%
– Combination: 11%
– Other: 11%

Metadata Element Coverage

For 35 lesson plans & learning activities from the GEM Gateway: metadata elements present on automatically vs. manually generated records.

Qualitative Statistical Analysis

35 subjects evaluated 7 resources + metadata records (234 total cases).
Ordinal-level data measuring metadata quality:
– Unsure, Very Poorly, Poorly, Well, Very Well
Mann-Whitney Test of Independent Pairs:
– Non-parametric test
– Accepts ordinal data
– Does not require normal distribution, homogeneity of variance, or same sample size
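The Mann-Whitney test named on the slide compares two independent samples by pooled ranks, which is why it suits ordinal quality ratings. Below is a minimal pure-Python computation of the U statistic with average ranks for ties; in practice one would use `scipy.stats.mannwhitneyu`, which also supplies the p-value. The function name and example data here are illustrative, not the study's data.

```python
def mann_whitney_u(x, y):
    """U statistics (U1, U2) for two independent samples,
    using average ranks for tied values. Minimal sketch."""
    pooled = sorted((v, i) for i, v in enumerate(x + y))
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1                      # extend over a run of tied values
        avg = (i + j) / 2 + 1           # average rank, 1-based
        for k in range(i, j + 1):
            ranks[pooled[k][1]] = avg
        i = j + 1
    r1 = sum(ranks[:len(x)])            # rank sum of the first sample
    u1 = r1 - len(x) * (len(x) + 1) / 2
    return u1, len(x) * len(y) - u1
```

With ordinal ratings coded as integers, U far from n1*n2/2 signals a systematic quality difference between the manual and automatic conditions; U near n1*n2/2 signals none.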

Medians of Metadata Element Quality

Elements compared (manual quality vs. automatic quality): Title, Description, Grade, Keyword, Duration, Material, Pedagogy Method, Pedagogy Process, Pedagogy Assessment, Pedagogy Group. The table reported median score, inter-quartile range, and mean rank per element.

– No statistical difference for 8 of 10 elements
– Minimally statistically significant better manual metadata for the Title and Keyword elements

Eye Tracking in Digital Libraries

How do users of digital libraries use and process metadata?
– Test on three conditions:
Records with descriptions
Records with metadata
Records with both descriptions and metadata

What the Eyes Can Tell Us

Indices of ocular behavior are used to infer cognitive processing, e.g.:
– Attention
– Decision making
– Knowledge organization
– Encoding and access

The longer an individual fixates on an area, the more difficult or complex that information is to process. The first few fixations indicate areas of particular importance or informativeness.

User Study: Data Collection

User wears an eye-tracking device while browsing or searching STEM educational resources. The eye fixations (stops) and saccades (gaze paths) are recorded. Fixations enable a person to gather information; no information can be acquired during saccades. In the visualizations, the colors represent different intervals of time (from green through red).

Methods

Pre-exposure search attempt
– 3 trials of entering search terms using free text, modifiers, Boolean expressions, etc.
Exposure to test stimuli
– Information in 1 of 3 formats: metadata only, description only, metadata and description
– Eye tracking during exposure
Post-exposure search & follow-up interview

Scanpath of Metadata Only Condition

Graphically Understanding the Data

Contour map shows the aggregate of eye fixations. Peak fixation areas are the Description element, with some interest in the URL and Subject elements. Note the dominance of the upper left side.

LookZone view shows the amount of time spent in each zone of the record. This user spent 27 seconds, or 54% of the time, looking at the Description metadata element; very little time was spent on other elements.
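The LookZone percentages are just fixation durations aggregated per zone and normalized by total viewing time. A minimal sketch, with illustrative numbers chosen to match the slide's 27 s / 54% figure (which implies about 50 s of total viewing); the function name and zone breakdown beyond Description are assumptions, not the eye-tracker vendor's software.

```python
def zone_dwell_shares(fixations):
    """Sum fixation durations per LookZone and return each zone's share
    of total viewing time. `fixations` is a list of
    (zone_name, duration_seconds) pairs. Illustrative sketch."""
    totals = {}
    for zone, dur in fixations:
        totals[zone] = totals.get(zone, 0.0) + dur
    grand = sum(totals.values())
    return {zone: dur / grand for zone, dur in totals.items()}

# Hypothetical record: Description dominates, as on the slide.
shares = zone_dwell_shares([("Description", 27.0), ("Title", 13.0),
                            ("URL", 6.0), ("Subject", 4.0)])
```

Summing zone shares across subjects is what yields the per-element utility comparisons reported in the findings.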

Preliminary Findings: Eye Tracking

– Narrative resources are viewed in linear order, but metadata is not.
– Titles and sources are the most-viewed metadata.
– The first few sentences in a resource are read more carefully; the rest is skimmed.
– Before selecting a resource, users re-visit the record for confirmation.
– Subjects focus on narrative descriptions when both descriptions & metadata are on the same page.

Preliminary Findings: Interview Data

– 65% changed their initial search terms after exposure to test stimuli.
– 20% indicated they would use their chosen document for the intended purpose.
– 60% said they learned something from the retrieved document that helped them restructure their next search.
– 100% indicated they use Google when searching for lecture / lesson information.
– Less than half of the participants knew what metadata was.

Preliminary Findings: Search Attempts

– On post-exposure search attempts, the mean number of search terms increased by 25% for those in the combined condition; the number of search terms decreased for both of the other conditions.
– Men used more search terms on their first query attempts, while women used more on their 2nd query attempts.
– Men were more likely to use modifiers and full-text queries, while women tended to use more Boolean expressions.

Upcoming Retrieval Experiment

Real users: STEM teachers
– Queries
– Relevance assessments
Information retrieval experiment with 3, possibly 4, conditions:
1. Automatically assigned metadata
2. Manually assigned metadata
3. Full-text indexing
4. Fielded searches
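Comparing indexing conditions like these comes down to scoring each condition's ranked results against the teachers' relevance assessments. Precision at k is one standard such measure, sketched below; the slides do not specify which metrics the experiment will use, so the function and the tiny example run are illustrative only.

```python
def precision_at_k(ranked_ids, relevant_ids, k=10):
    """Fraction of the top-k retrieved documents judged relevant.
    A standard IR measure; illustrative sketch for comparing conditions."""
    top = ranked_ids[:k]
    return sum(1 for d in top if d in relevant_ids) / len(top)

# Hypothetical rankings from two conditions against one query's judgments.
judged_relevant = {"d1", "d3", "d7"}
p_auto = precision_at_k(["d1", "d2", "d3", "d4"], judged_relevant, k=4)
p_manual = precision_at_k(["d3", "d1", "d7", "d9"], judged_relevant, k=4)
```

Averaging such scores over all queries per condition gives the head-to-head comparison of automatic metadata, manual metadata, and full-text indexing.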

Concluding Thoughts

Provocative findings
– Need replication on other document types
A digital library is a networked structure, but that structure is not captured in the linear world of metadata
– Rich annotation by users is a type of metadata that is not currently captured but could be captured automatically
Consider information extraction technologies
– Entities, Relations, and Events
Metadata can be useful in multiple ways
– Not just for discovery
– Have not experimented with management aspects of use
Results of the retrieval experiment will be key to understanding the need for metadata for access