
1 Developing & Evaluating Metadata for Improved Information Access
Elizabeth D. Liddy
Center for Natural Language Processing, School of Information Studies, Syracuse University

2 Background
Breaking the Metadata Generation Bottleneck
– 1st NSDL project (2000-2002)
– Adapted Natural Language Processing technology for automatic metadata generation
– 15 Dublin Core + 8 GEM education elements
The project included a modest evaluation study
– Results suggested that automatically generated metadata was qualitatively nearly equal to manually generated metadata

3 NLP-Based Metadata Generation
Types of features:
Linguistic
– Root forms of words
– Part-of-speech tags
– Phrases (noun, verb, proper noun)
– Categories (Person, Geographic, Organization)
– Concepts (sense-disambiguated words / phrases)
– Semantic relations
– Events
Non-linguistic
– Length of document
– HTML and XML tags
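
As a concrete illustration of these linguistic features, a modern off-the-shelf pipeline such as spaCy can produce most of them. This is a minimal sketch for readers, not the NLP system used in the project:

```python
# Illustrative only -- not the project's own metadata-generation system.
import spacy

nlp = spacy.load("en_core_web_sm")  # small English pipeline (assumes it is installed)

doc = nlp("Dr. Ada Lovelace taught a geometry lesson in London.")

# Root forms of words (lemmas) and part-of-speech tags
for token in doc:
    print(token.text, token.lemma_, token.pos_)

# Phrases: noun chunks approximate the noun / proper-noun phrases above
print([chunk.text for chunk in doc.noun_chunks])

# Categories: named entities such as Person, Geographic, Organization
print([(ent.text, ent.label_) for ent in doc.ents])
```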

4 MetaExtract Architecture
[Flow diagram: an HTML document is collected by a gathering program, run through an HTML converter and pre-processor, and passed to the eQuery Extraction Module, which extracts Title, Description, tf-idf Keywords, Creator, Grade/Level, Duration, Date, Pedagogy, Audience, Standard, Essential Resources, and Relation. A Metadata Retrieval Module, with a cataloger and catalog, supplies Date, Rights, Publisher, Format, Language, and Resource Type. The output is an XML document with metadata.]
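
The sketch below illustrates one extraction step of such a pipeline: ranking tf-idf keywords for a document and placing them in a Dublin-Core-style record. The corpus, field names, and the top_keywords helper are invented for illustration; this is not the project's actual code or schema.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy corpus standing in for gathered lesson-plan documents (hypothetical)
corpus = [
    "Photosynthesis lesson plan for middle school biology students.",
    "Algebra activities and lesson plans for grades 6-8.",
    "A hands-on engineering design activity for high school.",
]

vectorizer = TfidfVectorizer(stop_words="english")
tfidf = vectorizer.fit_transform(corpus)
terms = vectorizer.get_feature_names_out()

def top_keywords(doc_index, k=5):
    """Return the k highest-weighted tf-idf terms for one document."""
    row = tfidf[doc_index].toarray().ravel()
    return [terms[i] for i in row.argsort()[::-1][:k] if row[i] > 0]

# A Dublin-Core-style record; element names are illustrative assumptions
record = {
    "dc:title": "Photosynthesis Lesson Plan",   # e.g., from the <title> tag
    "dc:subject": top_keywords(0),              # tf-idf keywords
    "dc:description": corpus[0],                # e.g., first sentences
    "gem:gradeLevel": "6-8",                    # e.g., from extraction rules
}
print(record)
```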

5 MetaTest Research Questions
Do we need metadata for information access?
– Why?
How much metadata do we need?
– For what purposes?
Which elements do we need?
– For which digital library tasks?
How is metadata utilized by information-seekers?
– When browsing / searching / previewing?
Can automatically generated metadata perform as well as manually assigned metadata?
– For browsing / searching / previewing?

6 Three Types of Evaluation of Metadata
1. Human expert qualitative review
2. Eye-tracking in searching & browsing tasks
3. Quantitative information retrieval experiment with 3 conditions:
– Automatically assigned metadata
– Manually assigned metadata
– Full-text indexing

7 Evaluation Methodology
1. System automatically meta-tagged a digital library collection that had already been manually tagged.
2. Solicited a subject pool of teachers via listservs.
3. Had users qualitatively evaluate metadata tags.
4. Conducted searching & browsing experiments.
5. Monitored with eye-tracking & post-search interviews.
6. Observed the relative utility of each metadata element for both tasks.
7. Are now preparing an IR experiment to compare 2 types of metadata generation + full-text indexing.

8 Who Were the Respondents?
Type of Educator
– Elementary Teacher: 6%
– Middle School Teacher: 6%
– High School Teacher: 66%
– Higher Education Teacher: 6%
– Instructional Designer: 3%
– School Media: 3%
– Other: 11%
Experience with Lesson Plans
– <1 Year: 6%
– 1-3 Years: 29%
– 3-9 Years: 29%
– 10+ Years: 37%
Subject Taught
– Science: 69%
– Math: 6%
– Engineering: 3%
– Combination: 11%
– Other: 11%

9 [Image-only slide: no transcript text]

10 Metadata Element Coverage
For 35 lesson plans & learning activities from the GEM Gateway: metadata elements present on automatically vs. manually generated records.

11 Qualitative Statistical Analysis
35 subjects evaluated 7 resources + metadata records, yielding 234 total cases.
Ordinal-level data measuring metadata quality:
– Unsure, Very Poorly, Poorly, Well, Very Well
Mann-Whitney U test for independent samples:
– Non-parametric test
– Accepts ordinal data
– Does not require a normal distribution, homogeneity of variance, or equal sample sizes
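
A sketch of this kind of analysis: comparing ordinal quality ratings for manually vs. automatically generated metadata with a Mann-Whitney U test. The ratings below are made-up illustrative data, not the study's results.

```python
from scipy.stats import mannwhitneyu

# Ratings coded 0=Unsure, 1=Very Poorly, 2=Poorly, 3=Well, 4=Very Well (hypothetical)
manual = [3, 4, 3, 2, 4, 3, 3, 4, 2, 3]
automatic = [3, 3, 2, 3, 4, 2, 3, 3, 1, 3]

stat, p = mannwhitneyu(manual, automatic, alternative="two-sided")
print(f"U = {stat}, p = {p:.3f}")  # p >= 0.05 -> no detectable difference
```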

12 Medians of Metadata Element Quality
Each cell: median score, inter-quartile range, mean rank.

Element               Manual            Automatic
Title                 3, 2-4, 132       3, 1-4, 105
Description           3, 3-4, 122       3, 2-4, 113
Grade                 3, 2-4, 73        3, 3-4, 80
Keyword               3, 3-4, 127       3, 2-4, 99
Duration              3, 2.75-4, 29     3, 2-3.25, 25
Material              3.5, 3-4, 49      3, 2-4, 39
Pedagogy Method       3, 0.5-3, 30      3, 2-4, 33
Pedagogy Process      --                3, 1-3, 53
Pedagogy Assessment   3, 1.5-3, 14      3, 2-3.5, 9
Pedagogy Group                          3, 2.5-4, 18

No statistical difference for 8 of 10 elements. Manual metadata was minimally but statistically significantly better for the Title and Keyword elements.
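
To make the cell values concrete, this hypothetical snippet computes a median, inter-quartile range, and mean rank for one element, with ranks taken over the pooled manual + automatic ratings as in a Mann-Whitney analysis. The ratings are invented.

```python
import numpy as np
from scipy.stats import rankdata

manual = np.array([3, 4, 3, 2, 4, 3])     # illustrative ratings
automatic = np.array([3, 3, 2, 3, 1, 3])

# Mean rank: rank the pooled samples, then average the manual group's ranks
ranks = rankdata(np.concatenate([manual, automatic]))
mean_rank_manual = ranks[: len(manual)].mean()

q1, q3 = np.percentile(manual, [25, 75])  # inter-quartile range
print(f"median={np.median(manual)}, IQR={q1}-{q3}, "
      f"mean rank={mean_rank_manual:.1f}")
```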

13 Eye Tracking in Digital Libraries
How do users of digital libraries use and process metadata?
– Tested under three conditions:
– Records with descriptions
– Records with metadata
– Records with both descriptions and metadata

14 What the Eyes Can Tell Us
Indices of ocular behavior are used to infer cognitive processing, e.g.:
– Attention
– Decision making
– Knowledge organization
– Encoding and access
The longer an individual fixates on an area, the more difficult or complex that information is to process.
The first few fixations indicate areas of particular importance or informativeness.

15 User Study: Data Collection
Users wear an eye-tracking device while browsing or searching STEM educational resources.
Eye fixations (stops) and saccades (rapid movements between fixations) are recorded.
Fixations enable a person to gather information; no information can be acquired during saccades.
The colors represent different intervals of time (from green through red).
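
For readers unfamiliar with how fixations and saccades are separated, the sketch below applies a simple velocity threshold to raw gaze samples (the common I-VT approach). The tracker's own software performs this classification; the threshold and samples here are illustrative assumptions.

```python
import math

def classify_gaze(samples, threshold_deg_per_s=30.0):
    """samples: list of (t_seconds, x_deg, y_deg) gaze points.
    Labels each inter-sample movement 'fixation' or 'saccade'."""
    labels = []
    for (t0, x0, y0), (t1, x1, y1) in zip(samples, samples[1:]):
        velocity = math.hypot(x1 - x0, y1 - y0) / (t1 - t0)
        labels.append("saccade" if velocity > threshold_deg_per_s else "fixation")
    return labels

# Invented gaze samples: a small drift, a large jump, another small drift
samples = [(0.00, 1.0, 1.0), (0.01, 1.1, 1.0), (0.02, 5.0, 4.0), (0.03, 5.1, 4.1)]
print(classify_gaze(samples))  # ['fixation', 'saccade', 'fixation']
```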

16 Methods
Pre-exposure search attempt
– 3 trials of entering search terms using free text, modifiers, Boolean expressions, etc.
Exposure to test stimuli
– Information in 1 of 3 formats: metadata only, description only, metadata and description
– Eye tracking during exposure
Post-exposure search & follow-up interview

17 Scanpath of the Metadata-Only Condition

18 Graphically Understanding the Data
The contour map shows the aggregate of eye fixations. Peak fixation areas are on the Description element, with some interest in the URL and Subject elements. Note the dominance of the upper left side.
The LookZone view shows the amount of time spent in each zone of the record. This user spent 27 seconds, or 54% of viewing time, looking at the Description metadata element. Very little time was spent on other elements.
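
A LookZone-style dwell-time figure can be derived by summing the durations of fixations that fall inside each rectangular zone of the record, as in this sketch. The zone coordinates and fixation data are invented for illustration.

```python
# Zones of a metadata record as (left, top, right, bottom) pixel rectangles (hypothetical)
zones = {
    "Title": (0, 0, 600, 40),
    "Description": (0, 40, 600, 200),
    "URL": (0, 200, 600, 230),
}

fixations = [(120, 100, 0.45), (300, 150, 0.80), (90, 215, 0.25)]  # (x, y, seconds)

dwell = {name: 0.0 for name in zones}
for x, y, duration in fixations:
    for name, (l, t, r, b) in zones.items():
        if l <= x <= r and t <= y <= b:
            dwell[name] += duration
            break

total = sum(f[2] for f in fixations)
for name, seconds in dwell.items():
    print(f"{name}: {seconds:.2f}s ({100 * seconds / total:.0f}% of viewing time)")
```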

19 Preliminary Findings: Eye Tracking
Narrative resources are viewed in linear order, but metadata is not.
Titles and sources are the most-viewed metadata.
The first few sentences of a resource are read carefully; the rest is skimmed.
Before selecting a resource, users re-visit the record for confirmation.
Subjects focus on narrative descriptions when both descriptions & metadata are on the same page.

20 Preliminary Findings: Interview Data
65% changed their initial search terms after exposure to the test stimuli.
20% indicated they would use their chosen document for the intended purpose.
60% said they learned something from the retrieved document that helped them restructure their next search.
100% indicated they use Google when searching for lecture / lesson information.
Less than half of the participants knew what metadata was.

21 Preliminary Findings: Search Attempts
On post-exposure search attempts, the mean number of search terms increased by 25% for those in the combined condition; it decreased for both of the other conditions.
Men used more search terms on their first query attempts, while women used more on their second attempts.
Men were more likely to use modifiers and full-text queries, while women tended to use more Boolean expressions.

22 Upcoming Retrieval Experiment
Real users: STEM teachers
– Queries
– Relevance assessments
Information retrieval experiment with 3, possibly 4, conditions:
1. Automatically assigned metadata
2. Manually assigned metadata
3. Full-text indexing
4. Fielded searches
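
As a sketch of how such conditions might be scored, the hypothetical snippet below computes mean precision-at-k per condition over teachers' queries, given their relevance judgments. The metric, data, and names are assumptions, not the study's actual design.

```python
def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k retrieved documents judged relevant."""
    return sum(1 for d in ranked_ids[:k] if d in relevant_ids) / k

# results[condition][query] -> ranked doc ids returned by that index (invented)
results = {
    "automatic_metadata": {"q1": ["d3", "d7", "d1"]},
    "manual_metadata":    {"q1": ["d7", "d2", "d3"]},
    "full_text":          {"q1": ["d9", "d3", "d4"]},
}
relevant = {"q1": {"d3", "d7"}}  # teachers' relevance assessments (invented)

for condition, runs in results.items():
    scores = [precision_at_k(docs, relevant[q], k=3) for q, docs in runs.items()]
    print(condition, sum(scores) / len(scores))
```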

23 Concluding Thoughts
Provocative findings
– Need replication on other document types
A digital library is a networked structure, but this is not captured in the linear world of metadata.
– Rich annotation by users is a type of metadata that is not currently captured but could be captured automatically.
Consider information extraction technologies
– Entities, relations, and events
Metadata can be useful in multiple ways
– Not just for discovery
– We have not yet experimented with the management aspects of use
Results of the retrieval experiment will be key to understanding the need for metadata for access.

