LTER Controlled Vocabulary Virtual WaterCooler - July, 2018

VTC - Objectives
- Set stage for working groups and panel discussions of lexical tools (including the controlled vocabulary) at the 2018 LTER All-Scientists’ Meeting
- Goal: “Scientists seeking data should be able to efficiently and reliably locate Ecological datasets through searching, and browsing …”

Why not be Eclectic? Pick your own words?
- Eclectic use of terms for discovering data makes it difficult to perform reliable or efficient searches
- Often several terms for one concept
  - One site uses CO2, another Carbon Dioxide, another Carbon-dioxide
  - Carbon to Nitrogen Ratio, C:N, C:N Ratio, Carbon-to-nitrogen Ratio
- No way to relate broader terms with narrower terms
  - Searching on “Landscape Change” doesn’t find data sets related to “desertification” even though desertification is a kind of landscape change
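The synonym problem on this slide can be illustrated with a minimal Python sketch. The mapping table and the function names `normalize` and `preferred_term` are hypothetical and exist only for this illustration; they are not the LTER thesaurus or its web services.

```python
import re

# Hypothetical synonym table: every known variant maps to one preferred term.
# The entries come from the examples on the slide; the table is illustrative.
PREFERRED = {
    "co2": "carbon dioxide",
    "carbon dioxide": "carbon dioxide",
    "c:n": "carbon to nitrogen ratio",
    "c:n ratio": "carbon to nitrogen ratio",
    "carbon to nitrogen ratio": "carbon to nitrogen ratio",
}


def normalize(keyword: str) -> str:
    """Lower-case, turn hyphens into spaces, and collapse repeated whitespace."""
    cleaned = keyword.strip().lower().replace("-", " ")
    return re.sub(r"\s+", " ", cleaned)


def preferred_term(keyword: str) -> str:
    """Return the preferred term, or the normalized keyword if it is unknown."""
    key = normalize(keyword)
    return PREFERRED.get(key, key)


for variant in ["CO2", "Carbon Dioxide", "Carbon-dioxide", "C:N Ratio"]:
    print(variant, "->", preferred_term(variant))
```

With a table like this, all three spellings of carbon dioxide resolve to the same search key, so a search on any one of them can retrieve data sets tagged with the others.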

Goals for Development of THE LTER THESAURUS
- Identify a list of preferred terms that would be used by sites in creating metadata documents
- Focused on LTER-wide searches
  - Want to facilitate cross-site synthesis
  - People searching EDI rather than individual sites are interested in relevant data from multiple sites
- Wanted to hit the “sweet spot” for the number of terms (currently have ~700 terms)
  - Too many terms make keywording documents difficult, and result in searches with too few datasets
  - Too few terms make it hard to locate usably small numbers of datasets

Steps Taken (2011 & 2013)
- Assembled list of words already in LTER Metadata (EML documents)
- Selected using criteria:
  - Keywords shared with GCMD and NBII, or
  - Keywords used at more than one LTER site
- Reviewed by Information Managers
  - Removals and additions were suggested
  - Edited based on voting
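The selection criteria above could be sketched roughly as follows. This is not the working group's actual procedure: the function name `select_candidates`, the input shapes, and the sample data are all hypothetical. "Shared with GCMD and NBII" is read here as "present in both external lists", which is an assumption; replace the intersection with a union if either list should count.

```python
def select_candidates(site_keywords, gcmd_terms, nbii_terms):
    """Pick candidate terms from keywords already used in site metadata.

    site_keywords: dict mapping a site code to the set of keywords it uses.
    gcmd_terms, nbii_terms: iterables of terms from the external vocabularies.
    """
    # Record which sites use each keyword (case-insensitive).
    sites_using = {}
    for site, keywords in site_keywords.items():
        for keyword in keywords:
            sites_using.setdefault(keyword.lower(), set()).add(site)

    # Assumption: "shared" means the term appears in both external lists.
    external = {t.lower() for t in gcmd_terms} & {t.lower() for t in nbii_terms}

    return sorted(
        keyword
        for keyword, sites in sites_using.items()
        if keyword in external or len(sites) > 1
    )


# Made-up sample data, for illustration only.
sample_sites = {
    "AND": {"Temperature", "Moss"},
    "VCR": {"temperature", "Salt Marsh"},
    "NTL": {"Lakes"},
}
print(select_candidates(sample_sites, {"Lakes", "Salt Marsh"}, {"lakes"}))
```

The resulting list would only be a starting point; per the slide, Information Managers then reviewed it, suggested removals and additions, and voted on edits.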

Some STATISTICS (new)
- 96% of LTER Data Packages contain one or more terms found in the thesaurus
  - Important for browsing! Only 4% can’t be browsed
- 9X Data - Simple searches using terms in the thesaurus return a median of 18 datasets (non-thesaurus terms return only 2)
- 5X Sites - Searches using terms in the thesaurus retrieve data from a median of 5 sites (non-thesaurus terms return data from only a median of 1 site)
- Of the 824 terms used for 5 or more data packages at 2 or more sites, 632 (77%) are in the Thesaurus
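The "9X" and "5X" labels and the 77% figure follow directly from the numbers quoted on this slide. A short arithmetic check in Python (the variable names are ours; the values are taken from the slide):

```python
# Values quoted on the slide.
median_datasets_thesaurus = 18
median_datasets_other = 2
median_sites_thesaurus = 5
median_sites_other = 1
frequent_terms = 824
frequent_terms_in_thesaurus = 632

# Ratios behind the "9X Data" and "5X Sites" labels.
data_ratio = median_datasets_thesaurus / median_datasets_other
site_ratio = median_sites_thesaurus / median_sites_other

# Share of frequently used terms covered by the thesaurus, and the remainder.
percent_in_thesaurus = round(100 * frequent_terms_in_thesaurus / frequent_terms)
terms_not_in_thesaurus = frequent_terms - frequent_terms_in_thesaurus

print(data_ratio, site_ratio, percent_in_thesaurus, terms_not_in_thesaurus)
# prints: 9.0 5.0 77 192
```

The remainder of 192 matches the count of frequently used terms reported as not in the Thesaurus on the Recent Activities slide.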

KEYWORDS USED ACROSS SITES
[Figure] Truncated at 100; the max is 295 (mostly species names)

Preferred Terms Across Sites
[Figure] The median number of preferred terms per dataset is 5

Recent Activities
- Statistical Analysis of Keywords in LTER documents
- Survey requesting information on how keywords are incorporated into LTER Data Packages
  - IMs play lead role 77% of the time, researchers 23%
- Identification of additional candidate terms
  - Only 192 frequently used terms are NOT in the Thesaurus
  - Many are synonyms of terms that are already in the thesaurus, or places or taxonomic terms

Lexical Structures
- Goal: Improve Searching & Browsing
  - Reliability (of all the suitable target documents, what percentage did you find)
  - Efficiency (of the documents your search returned, what percentage were suitable)
- A list alone is not sufficient to support browsing and sophisticated searching of data; more structure is needed
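The two measures defined on this slide correspond to what information retrieval usually calls recall (reliability) and precision (efficiency). A minimal sketch, assuming documents are identified by hypothetical string IDs and that the set of suitable documents is known in advance:

```python
def reliability(suitable: set, returned: set) -> float:
    """Percent of all suitable documents that the search found (recall)."""
    if not suitable:
        return 0.0
    return 100.0 * len(suitable & returned) / len(suitable)


def efficiency(suitable: set, returned: set) -> float:
    """Percent of the returned documents that were suitable (precision)."""
    if not returned:
        return 0.0
    return 100.0 * len(suitable & returned) / len(returned)


# Made-up example: four suitable data sets exist; the search returns three
# documents, two of which are suitable.
suitable_docs = {"d1", "d2", "d3", "d4"}
returned_docs = {"d1", "d2", "d9"}
print(reliability(suitable_docs, returned_docs))  # 50.0
print(round(efficiency(suitable_docs, returned_docs), 1))  # 66.7
```

Both functions return 0.0 for an empty denominator, which is a convention chosen for this sketch rather than anything stated in the presentation.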

Currently the LTER Controlled Vocabulary is contained in a Thesaurus
- Synonyms (use-for terms)
- Broader -> Narrower
- A few non-hierarchical relationships
- Integrated into PASTA
  - Browse search
  - Advanced searches
- Has been incorporated into EnvThes and some other thesauri
- Web services for aiding searching and selecting terms are available
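To show how broader -> narrower relationships help searching, here is a small Python sketch of query expansion. It is not the PASTA implementation or the thesaurus web service: the `NARROWER` table, the data set IDs, and the function names are hypothetical, and "deforestation" is an invented sibling term added only to give the hierarchy a second branch.

```python
# Hypothetical broader -> narrower table (all terms lower-case).
NARROWER = {
    "landscape change": ["desertification", "deforestation"],
    "desertification": [],
    "deforestation": [],
}

# Hypothetical data sets and the keywords they were tagged with.
DATASETS = {
    "ds1": {"desertification"},
    "ds2": {"landscape change"},
    "ds3": {"primary production"},
}


def expand(term, narrower):
    """Return the term plus all of its narrower terms, at any depth."""
    seen = []
    stack = [term]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # guards against cycles in the table
        seen.append(current)
        stack.extend(narrower.get(current, []))
    return seen


def search(term, narrower, datasets):
    """Find data sets tagged with the term or any narrower term."""
    terms = set(expand(term.lower(), narrower))
    return sorted(ds for ds, keywords in datasets.items() if terms & keywords)


print(search("Landscape Change", NARROWER, DATASETS))  # ['ds1', 'ds2']
```

A search on the broad term now also retrieves the data set tagged only with "desertification", which is the case the earlier slide said a plain keyword list cannot handle.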

Structures
[Figure] Lexical structures arranged by complexity: List, Synonym Ring, Taxonomy, Thesaurus, Ontology, with the LTER status marked
- Multiple taxonomies are a Polytaxonomy

ISSUES FOR THE ALL-SCIENTISTS’ MEETING
- Do we need to move to use of an Ontology or other lexical structure?
- Should we abandon the LTER Controlled Vocabulary in favor of another, existing resource?
  - If not, what upgrades are needed (updated software, additional terms)?
- How do we deal with place names (Gazetteer) and Taxonomic Names as Keywords?

THANKS! Members of the Controlled Vocabulary Working Group have all made major contributions to the work of the group. Henshaw, Donald; Jones, Julia; Laundre, James; Ruess, Roger; Downing, Jason; Costa, Duane; Servilla, Mark; San Gil, Inigo; Brunt, James; Melendez-Colom, Eda; Crowl, Todd; Gries, Corinna; O'Brien, Margaret; Vanderbilt, Kristin; and Porter, John