iPlant's Taxonomic Name Resolution Service Naim Matasci BIO5 / The iPlant Collaborative tnrs.iplantc.org

Presentation transcript:

iPlant's Taxonomic Name Resolution Service Naim Matasci BIO5 / The iPlant Collaborative tnrs.iplantc.org

What is iPlant?

Empowering a New Plant Biology

TMU* Growth of Biological Collections (1600 – 2012) *TMU: Totally Made Up

If you can't find it, it doesn't exist

Data Reuse
What's the correlation between leaf morphology and leaf economy? (R. Walls)
Evolution of pit domatia (M. Donoghue)

iPlant Data Store
Based on iRODS
  – Metadata driven
  – Storing, sharing and distributing
Redundant (mirrors at TACC and UoA)
Really, really, really big (6 PB + 40 PB LTS)
Really, really, really fast

iPlant Data Store Performance
100 GB: 29m15s
UC Berkeley to iDS: 1 GB / 17.5 seconds
Desktop PC (UA): Mac with 7.2K internal hard drive
External drive: USB 2.0, 5.4K hard drive
Flash drive: USB 2.0 Patriot XT
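The two timings on this slide can be compared by converting each to an average transfer rate. The sketch below is only a back-of-the-envelope check; it assumes decimal units (1 GB = 1000 MB), which the slide does not specify, and the function name is made up for illustration.

```python
def throughput_mb_per_s(gigabytes: float, seconds: float) -> float:
    """Average transfer rate in MB/s, assuming 1 GB = 1000 MB."""
    return gigabytes * 1000 / seconds


# 100 GB in 29m15s, as stated on the slide
bulk_rate = throughput_mb_per_s(100, 29 * 60 + 15)

# 1 GB in 17.5 seconds (UC Berkeley to iDS), as stated on the slide
single_rate = throughput_mb_per_s(1, 17.5)

print(f"100 GB transfer: {bulk_rate:.1f} MB/s")   # 57.0 MB/s
print(f"1 GB transfer:   {single_rate:.1f} MB/s")  # 57.1 MB/s
```

Both figures work out to roughly 57 MB/s, so they appear to describe the same sustained rate.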

PhytoBisque features
Rich internet application (completely web based)
Draws upon features from popular large-scale photo sharing sites and high-resolution aerial imagery (Google Maps)
Ability to import and export over 100 image formats, movies
Ability to import extremely large image sets using the iPlant Data Store
Can display a 20K x 20K image using a standard web browser
Manage data sets with tags, metadata management
Utilizes distributed computing (connected to the iPlant execute environment)

Taxonomic uncertainty
1. Non-existent names
  – Misspellings
  – Contamination
  – Annotations
  – Morphospecies
  – Digitization issues (frame shifts, character encoding)
  – Lexical variants (digitization conventions)
2. Synonymy
  – Nomenclatural synonyms
  – Taxonomic synonyms / concepts
3. Misidentifications, incomplete identifications

Non-existent names: Herbarium specimens*
Total specimens: 1.1 million
Unique species names: 53,052
Published names (legitimate & illegitimate): 44,532
Misspelled names: 9,371 (18%)
Specimens with misspelled names: 101,237 (9%)
*New World plant specimens, 34 herbaria, simple match against IPNI and TROPICOS, excluding authors
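The two percentages on this slide follow directly from the counts. A minimal check, assuming "1.1 million" means exactly 1,100,000 specimens (the slide gives only the rounded figure):

```python
# Counts as given on the slide
unique_names = 53_052
misspelled_names = 9_371
specimens_with_misspellings = 101_237

# Assumption: the slide says "1.1 million"; the exact total is not given
total_specimens = 1_100_000

name_rate = misspelled_names / unique_names
specimen_rate = specimens_with_misspellings / total_specimens

print(f"Misspelled names: {name_rate:.0%}")              # 18%
print(f"Specimens with misspellings: {specimen_rate:.0%}")  # 9%
```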

Taxonomic Name Resolution Service
Computer-assisted standardization of plant names
Corrects spelling errors and alternative spellings to a standard list of names
Converts out-of-date names to currently accepted names
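The two steps named on this slide (spelling correction against a standard list, then conversion of out-of-date names) can be sketched in a few lines. This is not the TNRS implementation, whose matching algorithm is not described here; it substitutes Python's standard-library `difflib` for the fuzzy matcher, and the reference names, the `resolve` function, and the 0.85 cutoff are all illustrative assumptions.

```python
from difflib import get_close_matches

# Toy reference data for illustration only; not drawn from the TNRS sources.
ACCEPTED = {"Solanum lycopersicum", "Zea mays", "Quercus alba"}
SYNONYMS = {"Lycopersicon esculentum": "Solanum lycopersicum"}


def resolve(name, cutoff=0.85):
    """Return the accepted name for a submitted name, or None if no match.

    Step 1: fuzzy-match the submitted name against all known names.
    Step 2: if the match is a synonym, convert it to the accepted name.
    """
    known = sorted(ACCEPTED | set(SYNONYMS))
    cleaned = " ".join(name.split())  # collapse stray whitespace
    matches = get_close_matches(cleaned, known, n=1, cutoff=cutoff)
    if not matches:
        return None
    matched = matches[0]
    return SYNONYMS.get(matched, matched)


print(resolve("Zea mais"))                 # Zea mays
print(resolve("Lycopersicon esculentem"))  # Solanum lycopersicum
print(resolve("Homo sapiens"))             # None
```

Returning `None` for names below the cutoff leaves unmatched names for manual review rather than forcing a guess, which fits the "computer-assisted" framing of the slide.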

Future
More sources
  – Standard source import with DwC support
Better performance
TNRastic API
Integration with Global Names components

Web:
Code: ource/TNRS
API (provisional):
TNRastic API:

Brad Boyle
Brian Enquist
Juan Antonio Raygoza Garay
Nicole Hopkins
Zhenyuan Lu
Martha Narro
Shannon Oliver
William Piel
Jill Yarmchuk
Bob Magill (Missouri Botanical Garden)
Chris Freeland (Missouri Botanical Garden)
Chuck Miller (Missouri Botanical Garden)
Peter Jorgensen (Missouri Botanical Garden)
Amy Zanne (University of Missouri, St. Louis)
Peter Stevens (Missouri Botanical Garden)
Jay Paige (Missouri Botanical Garden)
Bob Peet (University of North Carolina at Chapel Hill)
Paul Morris (Harvard University)
Alan Paton (Kew Royal Botanic Gardens and their International Plant Names Index)
Tony Rees (Commonwealth Scientific and Industrial Research Organisation)
Michael Giddens (
Dmitry Mozzherin (Global Biodiversity Information Facility)
David Remsen (Global Biodiversity Information Facility)
David Patterson (Encyclopedia of Life)
Cam Webb (Harvard University)
Missouri Botanical Garden (Tropicos)
Funding provided by the National Science Foundation Plant Cyberinfrastructure Program (grant #DBI ).