Objective: Researchers need access to data, regardless of the language used in the metadata. Our objective is to facilitate discovery of ILTER data regardless.

Presentation transcript:

Objective: Researchers need access to data regardless of the language used in the metadata. Our objective is to facilitate discovery of ILTER data regardless of the languages used during their creation.

Abstract: Data in ILTER data archives are only useful if they can be found by researchers. At the international level, the challenge of searching datasets is compounded by the need to deal with multiple languages. A series of workshops in China explored the challenges of creating an information management system for the International LTER. During a 2008 workshop at Lake Taihu, participants recommended that each ILTER region host a Metacat to house network metadata as Ecological Metadata Language documents. Since that time, Metacat-based metadata catalogs have been established in Taiwan, Japan, Spain, Brazil, and Malaysia. A second workshop, held in Shanghai, China, in 2012, explored the options for using a multilingual controlled vocabulary that would allow researchers to discover ILTER data on an international scale. That workshop led to the development of prototype search tools that incorporate both translation and search enrichment services. The search enrichment services allow automated search on more specific terms (i.e., "narrower terms"). Thus a search on "forest ecosystems" would also include datasets whose metadata include "boreal forests", "clearcuts", "forests", "old-growth forests", and "old growth". Adding a translation layer contributes further search terms ("Bosque", "Foresta", "Forst", "Forêt", "Las", "Metsä", "Skog", "Wald", "森林", and "皆伐"). The prototype tools use EnvThes (an existing multilingual thesaurus that already fully incorporates the U.S. LTER controlled vocabulary) as their thesaurus, but are web-service based, allowing them to be incorporated into a wide array of customized searching applications. Prototype tools can be seen at:
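The search enrichment and translation steps described in the abstract can be sketched as a simple query expansion. This is a minimal illustration only, not the prototype's actual code: the `Concept` class, the `THESAURUS` table, the `expand_query` function, and the language codes assigned to each translation are all assumptions made for the example. The terms themselves are taken from the abstract.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """One thesaurus entry (hypothetical structure, not the EnvThes schema)."""
    preferred: str
    narrower: list[str] = field(default_factory=list)
    translations: dict[str, list[str]] = field(default_factory=dict)


# Toy thesaurus holding a single concept; language codes are assumed.
THESAURUS = {
    "forest ecosystems": Concept(
        preferred="forest ecosystems",
        narrower=["boreal forests", "clearcuts", "forests",
                  "old-growth forests", "old growth"],
        translations={"es": ["Bosque"], "de": ["Forst", "Wald"],
                      "fr": ["Forêt"]},
    ),
}


def expand_query(term, languages=()):
    """Return the term plus its narrower terms and requested translations."""
    concept = THESAURUS.get(term.strip().lower())
    if concept is None:
        # Unknown terms are searched as typed.
        return [term]
    terms = [concept.preferred, *concept.narrower]
    for lang in languages:
        terms.extend(concept.translations.get(lang, []))
    # Remove duplicates while preserving order.
    return list(dict.fromkeys(terms))


if __name__ == "__main__":
    print(expand_query("forest ecosystems", languages=["de", "fr"]))
```

Because the expansion returns a plain list of strings, it could sit behind a web service and feed any search front end, which matches the web-service design mentioned above.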
Scientists seeking data should be able to efficiently and reliably locate ILTER datasets through searching…

Goals:
- Identify solutions that will permit data discovery by a linguistically diverse set of researchers
- Identify a list of preferred terms and create a controlled vocabulary that could be used by sites in creating metadata documents
- Focus on ILTER-wide searches: we want to facilitate cross-site synthesis on an international basis
- Translate preferred terms into multiple languages to create a multilingual controlled vocabulary
- Enable locating datasets through searching
- Enhance existing data discovery tools by organizing terms in ways that facilitate linkages among them

Solutions:
- US LTER adopted a 634-term controlled vocabulary in
- It has been incorporated into EnvThes, the EnvEurope project's multilingual thesaurus (Figure 2)
- Providing a rich context for terms by exploiting interrelationships with other terms helps overcome some of the ambiguity introduced through translation

Products:
- The ILTER data discovery prototype interface is accessible at:
- More about the EnvThes thesaurus is available here:
- The US LTER Controlled Vocabulary Working Group has developed a number of applications and web services to support the use of the LTER Controlled Vocabulary. This page provides a listing of resources:
- Copies of code etc. are available on the LTER SVN site, in the "vocab" tree.

Focus of U.S. Controlled Vocabulary WG and EnvThes

Multilingual and Thesaurus-Based Search Tools for International Long-Term Ecological Research Data

Kristin Vanderbilt 1, Nicolas Bertrand 2, David Blankman 3, Xuebing Guo 4, Don Henshaw 5, Honglin He 4, Karpjoo Jeong 6, Eun-Shik Kim 7, Chau-Chin Lin 8, Sheng-Shan Lu 8, Margaret O'Brien 9, Éamonn Ó Tuama 10, Takeshi Osawa 11, John Porter 12, Wen Su 4 and other participants of the 2008 and 2012 ILTER workshops.

1 Sevilleta LTER, University of New Mexico, Albuquerque, NM, USA
2 CEH Lancaster, Lancaster Environment Centre, Lancaster, United Kingdom
3 Israel LTER, Jerusalem, Israel
4 Institute of Geographical Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing, China
5 H.J. Andrews LTER, U.S. Forest Service Pacific Northwest Research Station, Corvallis, OR, USA
6 Department of Advanced Technology Fusion, Konkuk University, Seoul, Korea
7 Department of Forestry, Environment, and Systems, College of Forest Science, Kookmin University, Seoul, Korea
8 Taiwan Forestry Research Institute, Taipei, Taiwan
9 Santa Barbara Coast LTER, Marine Science Institute, University of California at Santa Barbara, Santa Barbara, CA, USA
10 Global Biodiversity Information Facility Secretariat, Copenhagen, Denmark
11 National Institute for Agro-Environmental Sciences, Tsukuba, Japan
12 Virginia Coast Reserve LTER, Department of Environmental Sciences, University of Virginia, Charlottesville, VA, USA

Figure 1. Terms selected from more than one natural language vary in the extent to which they represent the same concepts. These variations can be seen as forming a continuum that ranges from exact equivalence in meaning through a) inexact equivalence, b) partial equivalence, c) single-to-many equivalence, to d) non-equivalence.

[Figure 3 diagram components: Language and Search System Selection; Search Form with Autocomplete; Preferred Term Lists; Translator and Search Enhancer; Multilingual Thesaurus; Metacat Data Catalog; List of Datasets]

Figure 3. Prototype system for multilingual searching of ecological data. The user selects the language they wish to use for search input and the Metacat to be searched. Based on that selection, an appropriate autocompletion list helps guide the searcher towards terms in the thesaurus. The thesaurus can then be used to select additional terms, such as synonyms or narrower terms, before the search is prepared. The enhanced search "pathQuery" is then translated into the language appropriate for the selected Metacat and sent to it.
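The last step of Figure 3, translating the enhanced term list and packaging it as a "pathQuery" for a Metacat, might look roughly like the sketch below. Everything here is an assumption made for illustration: the element and attribute names (`pathquery`, `querygroup`, `queryterm`, `value`, `pathexpr`) follow the general layout of Metacat path queries as best recalled and should be checked against the Metacat documentation, and the `TRANSLATIONS` table, `translate_terms`, and `build_pathquery` are hypothetical helpers rather than the prototype's code.

```python
import xml.etree.ElementTree as ET

# Hypothetical translation table: English term -> {language code: term}.
TRANSLATIONS = {
    "forests": {"zh": "森林"},
    "clearcuts": {"ja": "皆伐"},
}


def translate_terms(terms, lang):
    """Translate each term where possible; keep the original otherwise."""
    return [TRANSLATIONS.get(t, {}).get(lang, t) for t in terms]


def build_pathquery(terms, title="multilingual search", pathexpr="keyword"):
    """Build an XML query that matches any of the given terms (assumed layout)."""
    root = ET.Element("pathquery", {"version": "1.2"})
    ET.SubElement(root, "querytitle").text = title
    # UNION: a dataset matches if any single term matches.
    group = ET.SubElement(root, "querygroup", {"operator": "UNION"})
    for term in terms:
        qterm = ET.SubElement(
            group, "queryterm",
            {"casesensitive": "false", "searchmode": "contains"},
        )
        ET.SubElement(qterm, "value").text = term
        ET.SubElement(qterm, "pathexpr").text = pathexpr
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    expanded = ["forests", "old growth"]
    print(build_pathquery(translate_terms(expanded, "zh")))
```

Terms with no translation fall back to the original string, so a partially translated thesaurus still produces a usable query.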
[Figure 4 panel labels: Select languages for input and searching; Groups of terms can be selected based on their relationship to the input term and searched; Select a Metacat to search; Results for a search of the TFRI Metacat using Chinese terms; Autocomplete helps guide choices to words in the thesaurus]

Figure 4. Sequence of web pages used to conduct a multilingual search for data. The search process can be simplified by adding "default" choices.

Table 1. Inexact equivalence of US English and Japanese "wetlands" concepts

Prototype Multilingual Data Discovery System: A system for searching multilingual Metacats has been developed. A user selects a term from the controlled vocabulary in their language of choice, and the query is translated via the multilingual thesaurus (Figure 3) to find datasets documented in other languages (Figure 4).

Figure 2. Lexical resources such as taxonomies, thesauri, and ontologies help provide linkages between concepts. These linkages can be exploited to interconnect concepts in different languages. The EnvThes thesaurus fully incorporates the U.S. LTER Thesaurus along with other resources. It also has translations of many of the terms into 13 languages, including French, German, Chinese, Japanese, Italian, Finnish, Polish, Portuguese, Swedish and English.

Acknowledgements: The two workshops supporting development of the ILTER Multilingual Information Management System were held at Lake Taihu Field Station and East China Normal University in Shanghai, with the generous support of the Chinese Ecosystem Research Network (CERN) and the National Ecosystem Research Network of China (CNERN). The U.S. National Science Foundation supported travel of U.S. participants.
Challenges:
- Often several terms are used for one concept in a single network, making it necessary to search on multiple equivalent terms
  - One site uses CO2, another Carbon Dioxide, another Carbon-dioxide
  - Carbon to Nitrogen Ratio, C:N, C:N Ratio, Carbon-to-nitrogen Ratio
- Searching on a broader term does not return narrower terms
  - Searching on "Landscape Change" doesn't find data sets related to "desertification", even though desertification is a kind of landscape change
- The meaning of a term in one language may fall anywhere along the continuum from exact equivalence to non-equivalence in another language (Figure 1 and Table 1)
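The first two challenges can be illustrated with a small sketch: variant spellings are mapped to one preferred term, and a search on a broader term is widened to include its narrower terms. The `VARIANTS` and `NARROWER` tables, and the function names, are hypothetical and built only from the examples listed above; a real controlled vocabulary would supply these relationships.

```python
import re

# Hypothetical lookup: normalized variant -> preferred term.
VARIANTS = {
    "co2": "carbon dioxide",
    "carbon dioxide": "carbon dioxide",
    "c:n": "carbon to nitrogen ratio",
    "c:n ratio": "carbon to nitrogen ratio",
    "carbon to nitrogen ratio": "carbon to nitrogen ratio",
}

# Hypothetical hierarchy: broader term -> directly narrower terms.
NARROWER = {
    "landscape change": ["desertification"],
}


def normalize_key(term):
    """Lowercase, treat hyphens as spaces, and collapse repeated whitespace."""
    return re.sub(r"\s+", " ", term.lower().replace("-", " ")).strip()


def to_preferred(term):
    """Map a keyword as typed to its preferred form, if one is known."""
    key = normalize_key(term)
    return VARIANTS.get(key, key)


def search_terms(term):
    """Return the preferred term followed by all of its narrower terms."""
    start = to_preferred(term)
    result, seen, stack = [start], {start}, [start]
    while stack:
        current = stack.pop()
        for child in NARROWER.get(current, []):
            if child not in seen:  # guard against cycles in the hierarchy
                seen.add(child)
                result.append(child)
                stack.append(child)
    return result


if __name__ == "__main__":
    print(to_preferred("Carbon-dioxide"))
    print(search_terms("Landscape Change"))
```

The third challenge, inexact equivalence between languages, is not addressed by this kind of lookup and still depends on the quality of the thesaurus mappings.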