
Advanced Computing and Information Systems laboratory
iDigBio Cloud and Appliances: Concept, Processes and Progress
Jose Fortes (on behalf of the iDigBio IT team)

iDigBio (idigbio.org)
Goal: make data and images for millions of biological specimens available in electronic format to the biological research community, agencies, students, educators, and the public.
Mission: provide leadership, coordination, and outreach in the digitization of collections by implementing resources for communication, use of technology, access to data, research, and education.
A resource: a permanent cloud computing infrastructure that links biological data from collections across the USA, so that search and analytics tools can be used to mine and reference the data.

Seven Thematic Collections Networks (TCNs)
- InvertNet: An Integrative Platform for Research on Environmental Change, Species Discovery and Identification (Illinois Natural History Survey, University of Illinois), invertnet.org
- Plants, Herbivores, and Parasitoids: A Model System for the Study of Tri-Trophic Associations (American Museum of Natural History), tcn.amnh.org
- North American Lichens and Bryophytes: Sensitive Indicators of Environmental Quality and Change (University of Wisconsin), symbiota.org/nalichens/index.php and symbiota.org/bryophytes/index.php
- Digitizing Fossils to Enable New Syntheses in Biogeography: Creating a PALEONICHES-TCN (University of Kansas)
- The Macrofungi Collection Consortium: Unlocking a Biodiversity Resource for Understanding Biotic Interactions, Nutrient Cycling and Human Affairs (New York Botanical Garden)
- Mobilizing New England Vascular Plant Specimen Data to Track Environmental Change (Yale University)
- Southwest Collections of Arthropods Network (SCAN): A Model for Collections Digitization to Promote Taxonomic and Ecological Research (Northern Arizona University)
More than 130 participating institutions.

iDigBio IT Vision
Cyberinfrastructure to enable the collaborative creation, integration, and management of digitized biocollections, and their use in scientific research, education, and outreach.
Visible as a collection of persistent, Internet-accessible services, data, and resources for biocollection “producers”, “consumers”, and “service providers”, as well as for cyberinfrastructure providers and national/global data aggregators.

CI Stakeholders (with iDigBio at the center)
Stakeholder groups: domain data producers, infrastructure providers, domain service providers, domain data consumers, and national/global data aggregators.
Examples shown: museums, TCNs, collectors, researchers, government, teachers, citizens; Amazon Web Services, Google, Microsoft Azure, Amazon Mechanical Turk; georeferencing, imaging, data quality, mapping, translation, and OCR services; DataONE, GBIF, ALA, EOL, BISON, NESCent, Data Conservancy, iPlant.
Domain-level data

Evolution of iDigBio capabilities over time
- Data ingestion
- Data access, provision, and visualization
- Providing and enabling data feedback
- Data linking and federation
- Processing and visualizing integrated data
In support of all of the above: increasing storage and server hosting, an increasing number of appliances, and a web site for interaction with the public, the community, and education.

iDigBio.org
Sections: News, Events, Forums, Documents, Links, Data portal, Working groups

Building the iDigBio Cloud
- Useful services and APIs (programmatic and web-based)
- Scalable object storage and information processing
- Digitization-oriented virtual appliances
- Standards, proven solutions, and software reuse where possible
- Input from stakeholders (surveys, summit, workshops, …)
- Needs: storage, server hosting, data feedback, transformations, …

iDigBio data portal v0 at work

iDigBio Data Portal: Tutorial

iDigBio data portal v0: search

iDigBio data portal v0: record info

Storage hosting
“… able to facilitate storage of images on a case-by-case basis.”
“iDigBio currently does not provide archival storage, and hosting of images in iDigBio should not be seen as such.”
Approximately 30 TB of space is currently committed to storing and disseminating images and derivatives produced by these TCNs:
- North American Lichens and Bryophytes
- The Macrofungi Collection Consortium
- Plants, Herbivores, and Parasitoids
If you would like iDigBio to store and disseminate your TCN data as well, please contact us.
iDigBio also provides limited storage space along with its hosting services; this space currently totals approximately 8 TB.

Appliances, Virtual Private Servers
iDigBio packages and distributes pre-configured software tools and environments as software “appliances”, for deployment in an end-user environment or in a hosted server environment.
The iDigBio cloud hosts virtual private servers that expose services to the biocollections community; proposals are requested through the iDigBio portal interface.
- Virtual private servers on the iDigBio cloud: Symbiota, FilteredPush, VertNet, BioGeomancer
- Virtual appliances under development: media ingestion; augmenting OCR (workshop and hackathon)
- Community interactions: image-to-record services (OCR, NLP, duplicate discovery, workflow), Kepler Kurator, Specify

Short term: ingestion appliance
Purpose: facilitate data ingestion and interface with iDigBio.
Images are captured (e.g., on HD/flash media) into a local directory tree such as /images/1/100.tif, /1/101.tif, /2/200.tif, …
The appliance offers a web-based UI (web server) and a cloud client, and performs batch uploads through cloud APIs to the iDigBio object storage cloud (Swift).
A file interface associates each uploaded file with a GUID, e.g. /1/100.tif with GUID1 and /1/101.tif with GUID2.
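The file interface on this slide pairs each captured image path with a GUID. The Python sketch below illustrates that bookkeeping step under stated assumptions: the helper name `build_guid_map` is hypothetical, minting identifiers locally with `uuid.uuid4()` is an assumption rather than iDigBio's documented GUID scheme, and the actual batch upload to Swift is not shown.

```python
import tempfile
import uuid
from pathlib import Path


def build_guid_map(image_root):
    """Return {relative_path: guid} for every file under image_root.

    Hypothetical helper. ASSUMPTION: GUIDs are minted locally with uuid4;
    the slide only shows that each file (e.g. /1/100.tif) gets a GUID.
    """
    root = Path(image_root)
    guid_map = {}
    # Sorted traversal keeps the output order stable between runs.
    for path in sorted(root.rglob("*")):
        if path.is_file():
            # Key on the path relative to the capture root, e.g. "/1/100.tif".
            key = "/" + path.relative_to(root).as_posix()
            guid_map[key] = str(uuid.uuid4())
    return guid_map


if __name__ == "__main__":
    # Recreate the slide's example layout: /images/1/100.tif, /1/101.tif, /2/200.tif
    with tempfile.TemporaryDirectory() as tmp:
        images = Path(tmp, "images")
        for rel in ["1/100.tif", "1/101.tif", "2/200.tif"]:
            p = images / rel
            p.parent.mkdir(parents=True, exist_ok=True)
            p.write_bytes(b"")  # empty placeholder instead of real image data
        for name, guid in build_guid_map(images).items():
            print(name, guid)
```

Keying on the path relative to the capture root, rather than the absolute path, mirrors the slide's file interface, where /images/1/100.tif appears simply as /1/100.tif.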

Initial Setup

Initial Screen: Sign In

Fill Out Sign-In Form

Settings Pane After Signing In

Fill Out Settings

Move Next to Uploader Pane

Copy and Paste Path, Upload

Upload Started

Case 1: Ingestion Successful on the First Attempt

Upload Finishes Successfully

Case 2: Ingestion Successful After Several Attempts

Network Failed: Upload Aborted

Upload Resumes

Upload Finished with Some Errors

Resume Again

Now the Entire Batch Is Successful
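Case 2 shows an upload aborted by a network failure, a resumed upload that ends with some per-file errors, and a final resume that completes the batch. This suggests the appliance remembers which files have already been ingested and re-sends only the rest. The Python sketch below illustrates that resume logic under stated assumptions: `resumable_batch_upload`, `upload_one`, and `FlakyUploader` are hypothetical names, and the slides do not describe the appliance's actual retry policy or client API.

```python
def resumable_batch_upload(files, upload_one, max_attempts=5):
    """Upload `files`, re-sending only those not yet confirmed on each resume.

    upload_one(name) is a HYPOTHETICAL stand-in for the appliance's cloud
    client: it returns True on success, False on a per-file error, and raises
    ConnectionError when the network drops. Returns (completed, pending, attempts).
    """
    pending = list(files)
    completed = []
    attempts = 0
    while pending and attempts < max_attempts:
        attempts += 1
        failed = []
        for i, name in enumerate(pending):
            try:
                ok = upload_one(name)
            except ConnectionError:
                # Network failure aborts this attempt; this file and all
                # files not yet tried stay pending for the next resume.
                failed.extend(pending[i:])
                break
            if ok:
                completed.append(name)
            else:
                failed.append(name)  # per-file error: retry on the next resume
        pending = failed
    return completed, pending, attempts


class FlakyUploader:
    """Simulated uploader replaying scripted outcomes per file (illustration only)."""

    def __init__(self, script):
        # script: file name -> outcomes for successive tries:
        # "ok", "error" (per-file failure) or "net" (network drops).
        self.script = {name: list(outcomes) for name, outcomes in script.items()}

    def __call__(self, name):
        outcomes = self.script.get(name, [])
        outcome = outcomes.pop(0) if outcomes else "ok"
        if outcome == "net":
            raise ConnectionError("network failed")
        return outcome == "ok"


if __name__ == "__main__":
    batch = ["/1/100.tif", "/1/101.tif", "/2/200.tif", "/2/201.tif"]
    # Attempt 1 aborts on a network failure, attempt 2 finishes with one
    # per-file error, attempt 3 completes the batch (the "Case 2" sequence).
    uploader = FlakyUploader({"/2/200.tif": ["net"], "/2/201.tif": ["error"]})
    done, left, n = resumable_batch_upload(batch, uploader)
    print(f"{len(done)} uploaded, {len(left)} pending after {n} attempts")
```

With the scripted failures in the demo, all four files are uploaded on the third attempt. The `max_attempts` cap is an added safeguard so that a file that keeps failing is reported as pending instead of being retried forever.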

Summary
iDigBio cloud
- Service-oriented, standards-based, focused on ADBC needs
- Scalable data management and information processing using standard interfaces, data formats, protocols, and tools
Toolboxes as appliances
- An evolving collection of community-selected tools
- Built-in interfaces for effortless iDigBio integration
- Embedding best practices and standards in biocollections work
After the first year, we have a functional web site, a data portal, and storage and server hosting services.
Ingestion appliances and ingestion APIs for images and data will be available soon.
For feedback: and “Contacts” at

Linking Collections to…
Ecology, paleontology, genomics, living collections, other repositories, PRAGMA activities

Acknowledgments
National Science Foundation: Judith Skog and Anne Maglia
iDigBio IT team at U. of Florida: Renato Figueiredo and Andrea Matsunaga (Senior Personnel); Alex Thompson, Kevin Love, and Matt Collins (IT Experts); Jiangyan Xu (Graduate Student)
iDigBio IT team at Florida State U.: Greg Riccardi (Director for Informatics); Austin Mast (Senior Personnel); Gil Nelson and Deb Paul (Digitization Specialists); Guillaume Pierre (IT Expert)