Controlled Vocabulary Working Group Activities 2005-2006.


Controlled Vocabulary Working Group Activities

The Problem
► Inconsistent, disjunct and sparse keywords negatively impact data discovery
  ▪ 72.2% of all keywords are used at only a single LTER site
  ▪ 90% of all keywords are used at 4 or fewer LTER sites

The Problem
► Good “Browse” interfaces require some organization of keywords
► E.g. BIOSPHERE
  ▪ PLANTS
    ► VASCULAR PLANTS
      ▪ OAK
    ► NON-VASCULAR PLANTS
  ▪ ANIMALS
    ► VERTEBRATES
    ► INVERTEBRATES

Possible Solutions
1. Create an LTER Controlled Vocabulary or Thesaurus or Ontology
  ▪ Advantages:
    ► Absolute control on contents
    ► Ability to customize to meet LTER needs
  ▪ Disadvantages:
    ► Development will be time and resource expensive
    ► Such development can be a highly technical field requiring specialists

Possible Solutions
2. Adopt an existing controlled vocabulary, thesaurus or ontology
  ▪ Advantages:
    ► Minimal cost to LTER
    ► Aids in linking LTER to a larger world of data systems
  ▪ Disadvantages:
    ► Lack of control
    ► Existing systems may not be suitable for LTER use
      ▪ Lack desirable terms

2005 LTER IM Meeting
► At the 2005 IM meeting we decided that the best option to explore was Option 2 (use an existing resource)
  ▪ Rationale:
    ► Could potentially save lots of time, trouble and money!
    ► Helps forge links with other groups
    ► Could make LTER systems interact better with other similar systems

Plan of Action

General Steps
► Identify existing resources that LTER could use
  ▪ NBII Thesaurus
  ▪ GEMET (GEneral Multilingual Environmental Thesaurus)
  ▪ Global Change Master Directory (GCMD)
  ▪ SEEK Ontology
► Evaluate the usability of existing systems
► Develop tools and relationships needed to exploit and improve the system(s) of choice

Assembling Resources
► Assemble list of existing keywords
  ▪ EML
    ► Keywords
    ► Title words
    ► Attribute definition words
    ► Taxonomy keywords
      ▪ ITIS SPIRE web service from UMD.BaltCo....
  ▪ DTOC
  ▪ Publication titles, keywords and abstracts
  ▪ Site keyword lists - e.g., AND-LTER
  ▪ Need to count word and site frequency and number of keywords per document
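The counting step named at the end of the slide above (word frequency, site frequency, and keywords per document) could be sketched roughly as follows. This is only an illustration, not the working group's actual tooling: the function name `keyword_frequencies`, the `(site, keywords)` input shape, the lowercase normalization, and the sample data are all assumptions.

```python
from collections import Counter, defaultdict


def keyword_frequencies(documents):
    """Count uses, distinct sites, and keywords per document.

    `documents` is assumed to be an iterable of (site, keywords) pairs;
    this input shape is hypothetical, not taken from the slides.
    """
    uses = Counter()            # keyword -> total number of uses
    sites = defaultdict(set)    # keyword -> set of sites using it
    per_document = []           # number of keywords in each document
    for site, keywords in documents:
        # Assumed normalization: trim whitespace and ignore case.
        normalized = [k.strip().lower() for k in keywords if k.strip()]
        per_document.append(len(normalized))
        for keyword in normalized:
            uses[keyword] += 1
            sites[keyword].add(site)
    site_counts = {keyword: len(s) for keyword, s in sites.items()}
    return uses, site_counts, per_document


# Made-up sample data, for illustration only.
docs = [
    ("AND", ["Temperature", "LTER"]),
    ("VCR", ["temperature", "salt marsh"]),
    ("AND", ["LTER", "Forest"]),
]
uses, site_counts, per_document = keyword_frequencies(docs)
```

Tracking sites as a set per keyword keeps the site count correct when one site uses the same keyword in many documents.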

Some Statistics
Source | Number of Terms | Number used at 5 or more sites | Most Frequently used
EML Keywords | 2,711 | 86 | LTER (1002), Temperature (701)
EML Titles | 2, | | And (768), Data (394), LTER (350)
EML Attributes | 6, | | The (4,207), Data (1,621), Carbon (328)
DTOC Keywords | 2, | | ARC (1645), Temperature (732)
Bibliography Titles | 13,538 | 1,855 | Of (12,611), Forest (2,050)

Consolidated List
► The consolidated list includes 21,153 words or terms along with
  ▪ Number of “lists” on which it appeared (max 5)
  ▪ Number of sites and uses from each list
  ▪ Max and Min number of sites using (0-26)
  ▪ Max and Min number of uses (0-12,611)
  ▪ Is it a multi-word term?
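One way to picture a single entry of the consolidated list is as a small record with the per-list counts stored and the summary fields derived from them. The sketch below is an assumption about layout, not the group's actual data format: the class name `ConsolidatedTerm`, its field names, and the example counts are hypothetical. The five source names are taken from the statistics slide, and a list on which the term does not appear is treated as 0, which would be consistent with the minimum of 0 shown above.

```python
from dataclasses import dataclass, field

# The five source lists named on the statistics slide.
SOURCE_LISTS = (
    "EML Keywords",
    "EML Titles",
    "EML Attributes",
    "DTOC Keywords",
    "Bibliography Titles",
)


@dataclass
class ConsolidatedTerm:
    """Hypothetical record for one word or term in the consolidated list."""
    term: str
    sites: dict = field(default_factory=dict)  # list name -> number of sites
    uses: dict = field(default_factory=dict)   # list name -> number of uses

    def _values(self, counts):
        # A list on which the term never appears counts as 0 (assumption).
        return [counts.get(name, 0) for name in SOURCE_LISTS]

    @property
    def list_count(self):
        return sum(1 for v in self._values(self.uses) if v > 0)

    @property
    def max_sites(self):
        return max(self._values(self.sites))

    @property
    def min_sites(self):
        return min(self._values(self.sites))

    @property
    def max_uses(self):
        return max(self._values(self.uses))

    @property
    def min_uses(self):
        return min(self._values(self.uses))

    @property
    def is_multiword(self):
        return len(self.term.split()) > 1


# Made-up counts, for illustration only.
example = ConsolidatedTerm(
    term="soil moisture",
    sites={"EML Keywords": 4, "DTOC Keywords": 2},
    uses={"EML Keywords": 10, "DTOC Keywords": 3},
)
```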

Ranking/Rating Words
► Terms were sorted by:
  ▪ Number of Lists
  ▪ Max. number of sites on any single list
  ▪ Min. number of sites on any single list
  ▪ Number of uses
► The top 1010 terms were then rated as “useful” (U), “marginal/not sure” (M) or “not useful” (N) by volunteers
  ▪ Needed for abbreviations e.g., CO2 and words that are too general (e.g., “Above”, “Total”)
  ▪ The resulting list was then additionally sorted by a term score T = ((U*1) + (M*0) + (N*-1)) / (U+M+N)
  ▪ Always “Useful” = 1.00, Always “Not Useful” = -1.00
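The term score above is simple enough to check with a few lines of code. The sketch below implements the formula as written on the slide. The function name `term_score`, the rating tallies, and the choice to raise an error for unrated terms are illustrative assumptions.

```python
def term_score(useful, marginal, not_useful):
    """T = ((U*1) + (M*0) + (N*-1)) / (U+M+N), as given on the slide."""
    total = useful + marginal + not_useful
    if total == 0:
        # Assumption: a term nobody rated has no defined score.
        raise ValueError("term has no ratings")
    return (useful * 1 + marginal * 0 + not_useful * -1) / total


# Made-up tallies of (U, M, N) votes per term, for illustration only.
ratings = {
    "temperature": (5, 0, 0),
    "above": (0, 0, 4),
    "carbon": (3, 1, 1),
}

# Highest-scoring terms first.
ranked = sorted(ratings, key=lambda t: term_score(*ratings[t]), reverse=True)
```

With these tallies, a term everyone marks useful scores 1.0, one everyone marks not useful scores -1.0, and mixed votes fall in between. "Marginal" votes pull the score toward 0 because they count in the denominator only.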

Top of the list

Bottom of the list

Preliminary Evaluation
► Volunteers have used highly ranked words from the “list of 1000” to test retrieval from various thesauri
  ▪ So far NBII seems to be preferred, but we need additional testers
► Inigo San Gil has been working on automated queries of the NBII Thesaurus

Tasks for this meeting
► Once we have a controlled vocabulary, how shall we use it? What tools do we need to develop?
► What additional testing/evaluation is required (bring in PIs?)? What institutional relationships need to be pursued? What actions do we need to take to improve the usability of resources for LTER use?