IDigBio Cyberinfrastructure Working Group Andréa Matsunaga (on behalf of the WG) iDigBio Summit, Gainesville October 23- 24, 2012.

Slides:



Advertisements
Similar presentations
12 October 2011 Andrew Brown IMu Technology EMu Global Users Group 12 October 2011 IMu Technology.
Advertisements

EMu New Features 2013 Bernard Marshall KE Software.
An iRODS-based Distributed Data Management System for CyberSKA Cameron Kiddle, Arne Grimstrup, Russ Taylor – University of Calgary Venkat Mahadevan, Erik.
IDigBio Minimum Information Standards for Scientific Collections (MISC)/Authority Files Working Group Gil Nelson Andréa Matsunaga (on behalf of the WG)
1 The IIPC Web Curator Tool: Steve Knight The National Library of New Zealand Philip Beresford and Arun Persad The British Library An Open Source Solution.
Web Servers How do our requests for resources on the Internet get handled? Can they be located anywhere? Global?
Advanced Web 2012 Lecture 2 Sean Costain How the Web Works - Refresh Sean Costain 2012 The web is a matrix of servers that handle client requests.
Browsing the World Wide Web. Spring 2002Computer Networks Applications Browsing Service Allows one to conveniently obtain and display information that.
Operational Dataset Update Functionality Included in the NCAR Research Data Archive Management System 1 Zaihua Ji Doug Schuster Steven Worley Computational.
File Systems (2). Readings r Silbershatz et al: 11.8.
Application for Internet Radio Directory 19/06/2012 Industrial Project (234313) Kickoff Meeting Supervisors : Oren Somekh, Nadav Golbandi Students : Moran.
Roles and Goals Greg Riccardi. iDigBio People University of Florida o Larry Page, Jose Fortes, Pamela Soltis, Bruce McFadden, Renato Figueiredo, Reed.
Linux Operations and Administration
1st iDigBio – BRIT Hackathon iDigBio Augmenting Optical Character Recognition Working Group (AOCR wg) February 13 – 14, 2013.
Migrating Applications to Windows Azure Virtual Machines Michael Washam Senior Technical Evangelist Microsoft Corporation.
Overview of the ODP Data Provider Sergey Sukhonosov National Oceanographic Data Centre, Russia Expert training on the Ocean Data Portal technology, Buenos.
Windows Server MIS 424 Professor Sandvig. Overview Role of servers Performance Requirements Server Hardware Software Windows Server IIS.
Project Rickshaw SEARCH - FIND - GO. Project Rickshaw TEAM MEMBERS KEVIN AUGUSTINO – MATT FOX – DAVID MOORE SPONSORS KARASU TECHNOLOGIES - ERIK PAUL -
Advanced Computing and Information Systems laboratory iDigBio Cloud and Appliances: Concept, Processes and Progress Jose Fortes (on behalf of the iDigBio.
Copyright © Texas Education Agency, All rights reserved.1 Web Technologies Web Administration.
Discussion and conclusion The OGC SOS describes a global standard for storing and recalling sensor data and the associated metadata. The standard covers.
Update from the Entomological Society of America (ESA) Systematics, Evolution, and Biodiversity (SysEB) Section Symposium: From Voucher.
© 2005,2006 NeoAccel Inc. Partners Presentation SSL VPN-Plus 2.0 Quick Start Guide.
Customized cloud platform for computing on your terms !
Building service testbeds on FIRE D5.2.5 Virtual Cluster on Federated Cloud Demonstration Kit August 2012 Version 1.0 Copyright © 2012 CESGA. All rights.
EMu New Features 2015 Ian Brown. EMu 4.2 Edit in a single language 4.2 (Previously for multi-lingual systems all languages had to be edited simultaneously)
Metadata Harvesting The Hague, 13 & 14 January 2009 Julie Verleyen Scientific Coordinator, Europeana Office EuropeanaLocal Knowledge Sharing Workshop.
Hands-On Virtual Computing
SITools Enhanced Use of Laboratory Services and Data Romain Conseil
Codeigniter is an open source web application. It occupies a very small amount of space in the memory and is most useful for developers who aim to develop.
Public Domain/Open Source Software Evaluation Photo Organizer.
DDN & iRODS at ICBR By Alex Oumantsev History of ICBR  Campus wide Interdisciplinary Center for Biotechnology Research  Core Facility  Funded by the.
University of Florida Florida State University
CSE 548 Advanced Computer Network Security Document Search in MobiCloud using Hadoop Framework Sayan Cole Jaya Chakladar Group No: 1.
Data Management BIRN supports data intensive activities including: – Imaging, Microscopy, Genomics, Time Series, Analytics and more… BIRN utilities scale:
Digital Library Syllabus Uploader Will Cameron CSC 8530 October 19, 2006 Project Presentation 2.
PROGRESS: ICCS'2003 GRID SERVICE PROVIDER: How to improve flexibility of grid user interfaces? Michał Kosiedowski.
Experts Workshop on the IPT, v. 2, Copenhagen, Denmark The Pathway to the Integrated Publishing Toolkit version 2 Tim Robertson Systems Architect Global.
Large Scale Parallel File System and Cluster Management ICT, CAS.
Ubuntu, SUSE, OpenSUSE, CentOS & Oracle EL + hundreds on VM Depot Bring your own framework! Ecosystem Supported Microsoft 1st Party Support.
CLASS Information Management Presented at NOAATECH Conference 2006 Presented by Pat Schafer (CLASS-WV Development Lead)
Presented by Scientific Annotation Middleware Software infrastructure to support rich scientific records and the processes that produce them Jens Schwidder.
Omeka software exploration for LOR repository. Omeka  Omeka is opensource – that means there is no cost to use the software  Omeka can be hosted on.
IODE Ocean Data Portal - ODP  The objective of the IODE Ocean Data Portal (ODP) is to facilitate and promote the exchange and dissemination of marine.
Enabling Grids for E-sciencE EGEE and gLite are registered trademarks Usage of virtualization in gLite certification Andreas Unterkircher.
Windows Azure Virtual Machines Anton Boyko. A Continuous Offering From Private to Public Cloud.
Med-CORDEX database Med-CORDEX database = = netcdf files+ their info = File System + relational database = XFS+ mysql db = file server + LAMP server Linux,
Enabling Grids for E-sciencE EGEE and gLite are registered trademarks Tools and techniques for managing virtual machine images Andreas.
CSI 3125, Preliminaries, page 1 SERVLET. CSI 3125, Preliminaries, page 2 SERVLET A servlet is a server-side software program, written in Java code, that.
Zvezdan Pavković. Storage Non-Persistent Storage Persistent Storage Easily add additional storage. Networking Internal and Input Endpoints configured.
INFSO-RI Enabling Grids for E-sciencE ARDA Experiment Dashboard Ricardo Rocha (ARDA – CERN) on behalf of the Dashboard Team.
A.Abhari CPS1251 Topic 1: Introduction to Computers Computer Hardware Computer components Connecting Computers Computer Software Operating System (OS)
Oct HPS Collaboration Meeting Jeremy McCormick (SLAC) HPS Web 2.0 OR Web Apps and Databases (Oh My!) Jeremy McCormick (SLAC)
PROGRESS: GEW'2003 Using Resources of Multiple Grids with the Grid Service Provider Michał Kosiedowski.
Where are my files? Discoveries in establishing a digital archive workflow Sally McDonald Archivist/Librarian Western History/Genealogy, Denver Public.
WMarket For Adminstrators Manual Installation. Basic Dependencies To install your own WMarket instance, you are required to install the following software:
Amazon Web Services. Amazon Web Services (AWS) - robust, scalable and affordable infrastructure for cloud computing. This session is about:
A Web Based Job Submission System for a Physics Computing Cluster David Jones IOP Particle Physics 2004 Birmingham 1.
Python Driven Sensor Observation Service Benjamin Welton NASA USRP.
Building GoDaddy.com’s Compute Cloud
Using E-Business Suite Attachments
Working With Azure Batch AI
Data Bridge Solving diverse data access in scientific applications
Introduction to Data Management in EGI
OpenStack Ani Bicaku 18/04/ © (SG)² Konsortium.
CS6604 Digital Libraries IDEAL Webpages Presented by
Operational Dataset Update Functionality Included in the NCAR Research Data Archive Management System Zaihua Ji Doug Schuster Steven Worley Computational.
Cloud computing mechanisms
PHP and Forms.
Web Application Development Using PHP
Presentation transcript:

iDigBio Cyberinfrastructure Working Group Andréa Matsunaga (on behalf of the WG) iDigBio Summit, Gainesville October , 2012

Advanced Computing and Information Systems laboratory Cyberinfrastructure WG ure_Working_Group ure_Working_Group Established as an outcome of the iDigBio IT Standards Workshop The focus of the initial group will be on the iDigBio data ingestion procedures related to Application Programming Interface (API) or appliance specification, implementation and test Produce material with concrete data ingestion use cases from TCNs, provide input about the existing cyberinfrastructure, produce data ingestion requirements, and help evaluate iDigBio services and appliances implementation 2

Advanced Computing and Information Systems laboratory iDigBio Data Portal v0 API Retrieval-only operations (REST GET operations) Endpoints: List all endpoints: List collections: List specimens: List media metadata: List media objects: Individual records: Example: 231ed28a5bcahttp://api.idigbio.org/v0/records/eac2e4ec-5dbb-4c34-b56f- 231ed28a5bca {"idigbio:data":{ "dwc:county":"Liberty", "dwc:recordedBy":"Loran C. Anderson", "dwc:scientificNameAuthorship":"(Nees) Small", "id":" ”dwc:eventDate":" :00:00.0", "dwc:scientificName":"Yeatesia viridiflora"}, "idigbio:etag":"c3113b3aa2612ce8af46cde267c355ba ", "idigbio:links":{ "thumbnailurl":" "mediarecord":" "recordset":" "idigbio:uuid":"eac2e4ec-5dbb-4c34-b56f-231ed28a5bca"} 3

Advanced Computing and Information Systems laboratory iDigBio Data Portal v0 Presentation 4

Advanced Computing and Information Systems laboratory Virtual Private Server (VPS) Total: 7 VMs, 17 cores, 39GB RAM, 1.7TB storage Symbiota: 2VMs 1 production, 2 cores, 8GB RAM, 200GB disk, 1 pub IP, apache, php, java, MySQL, SVN, tomcat, 1user 1 for FP testing/development, 2 cores, 8GB RAM, 200GB disk, 1 pub IP, apache, php, java, MySQL, SVN, tomcat, 3 users FilteredPush: 2VMs 1 core, 1024MB RAM, 40GB storage, fp-lite SCAN testbed 2 cores, 4GB RAM, 80 GB storage, mysql, apache, php, tomcat for Symbiota, Morphbank, and FilteredPush Vertnet: 1VM 2 cores, 2 GB RAM, 500 GB storage, 1 pub IP, CentOS6, 5 users, Tomcat, IPT Biogeomancer: 1VM 4 core, 8GB RAM, 500GB storage, 1 public IP, apache, tomcat, postgres and postgis, 3 users aOCR hackathon: 1VM 4 cores, 8 GB RAM, 250 GB storage, Linux (Ubuntu 12.04), Java, PHP, Python, Perl, MySQL, Apache HTTP server, FTP server, ImageMagick, Tesseract, OCRopus, GOCR/JOCR, ZBar 5

Advanced Computing and Information Systems laboratory Databases/DwC-A Examined DatasetDateFormatOccurrencesMediaTaxon TCN-BryophytesJun/01/2012Symbiota-MySQL TCN-LichensJun/01/2012Symbiota-MySQL TCN-MycologyJun/01/2012Symbiota-MySQL TCN-InvertNetMar/14/2012DwC-A TCN-TTD-AMNHJun/21/2012AMNH-MySQL TCN-TTD-NYBGApr/26/2012CSV TCN-PALEONICHESJul/12/2012Specify-MySQL FLMNH-IchthyologyDec/19/2011DwC-A FLMNH-IchthyologyApr/27/2012DwC-A ValdostaApr/16/2012Specify-MySQL MorphbankNov/22/2011DwC-A MorphbankJun/29/2012DwC-A Total 5,338, ,528640,941 6

Advanced Computing and Information Systems laboratory Relational Databases 7 Specimen omoccurrences Specimen omoccurrences Taxon taxa Taxon taxa Media Images Media Images Collection omcollections Collection omcollections Person omcollectors Person omcollectors detHis Dataset Exsiccati title, number, links Dataset Exsiccati title, number, links Symbiota Specify EMu Specimen omoccurrences Specimen omoccurrences Taxon mnl U flora_mnl Taxon mnl U flora_mnl Media images Media images Collection institution Collection institution Person collector Person collector subfamily tribe genus species CollectingEvent colevent CollectingEvent colevent TTD-AMNH Geography locality Geography locality hostFor parent Geography esites Geography esites Specimen collectionobject Specimen collectionobject Taxon taxon Taxon taxon Media attachments Media attachments Collection collection Collection collection detHis Geology paleocontext Geology paleocontext CollectingEvent colevent CollectingEvent colevent parent hybrid Person agent Person agent Geography Locality U geography Geography Locality U geography parent Specimen ecatalogue Specimen ecatalogue Taxon etaxonomy Taxon etaxonomy Media emultimedia Media emultimedia Geology esites Geology esites Person eparties Person eparties CollectingEvent ecollectionevents CollectingEvent ecollectionevents

Advanced Computing and Information Systems laboratory Image ingestion appliance First instance of an application built upon the iDigBio APIs Enables easy, reliable bulk-ingestion of media records User selects image directory folder; appliance takes care of Traversing sub-directories Uploading individual images through iDigBio API Transparently recovering from various failure conditions Providing to user the mapping of file name to iDigBio URLs Ingested images can be accessed by the URLs Appliance runs a lightweight Web server to expose a Web- based UI to users 8

Advanced Computing and Information Systems laboratory Image Ingestion APIs Used By Appliance 9 Create RecordSet URL: RecordSet collection level endpoint e.g. POST Request content: JSON ["idigbio:data"]["ac:variant"]: "IngestionTool" ["idigbio:providerId"]: Currently, client generated random UUID Response content: JSON ["idigbio:uuid"]: RecordSet iDigBio UUID

Advanced Computing and Information Systems laboratory Image Ingestion APIs Used By Appliance 10 Create MediaRecord URL: MediaRecord collection level endpoint e.g. POST Request content: JSON ["idigbio:data"]["ac:variant"]: "IngestionTool" ["idigbio:data"]["dc:rights"]: One of {"cc0", "cc-by", "cc-by-sa", "cc-by-nc", "cc-by-nc-sa"} ["idigbio:data"]["idigbio:localpath"]: Full local path ["idigbio:data"]["idigbio:relationships"]["recordset"]: The iDigBio UUID of the RecordSet the media record belongs to, from the previous step. ["idigbio:providerId"]: User defined GUID prefix + (full local path or file name) ["idigbio:data"]["idigbio:relationships"]["owner"]: Organizational owner of the record, otherwise the signed-in user is saved as the owner. (Optional) Response content: JSON ["idigbio:uuid"]: iDigBio MediaRecord UUID

Advanced Computing and Information Systems laboratory Image Ingestion APIs Used By Appliance 11 Upload Media Object URL: API sub-collection level endpoint e.g. POST 4ddd-b433-ab99141ed405/media (the UUID in the middle is the “iDigBio MediaRecord UUID" returned in the previous step) Request content: Binary multipart/form-data with the image as the "file" Response content: JSON ["idigbio:links"]["media"]: The URL where the image is accessible online ['idigbio:uuid']: The Media Access Point UUID ['idigbio:data']['idigbio:imageEtag']: The hash (MD5) of the image stored at the server, which is compared with the MD5 of the local image to verify the success of the upload

Advanced Computing and Information Systems laboratory Image Ingestion v1 – Call for beta-testers 12 Features already implemented Reliable uploads Automatically retry of failed upload of individual files Keep local record of unsuccessful transfers ; resume after failure of network or service Skip already uploaded files when the same directory is uploaded multiple times Allow user to specify a license for the media object being uploaded Save, Export (local path : URL) mappings for individual files in the batch upload Contact: Andréa Matsunaga Features to add Integrated user authentication Currently, application keys can be provided to beta testers Improved user interface, error reporting, and encoded best practices on the UI Feedback from early adopters Help guide UI improvements and prioritize features to be incorporated Help fine-tune performance (e.g. parallel uploads) and failure handling

Advanced Computing and Information Systems laboratory Current Members Andréa Matsunaga (Co-Lead), iDigBio IT Joanna McCaffrey (Co-Lead), iDigBio Program Manager Renato Figueiredo, iDigBio IT Alex Thompson, iDigBio IT Jiangyan Xu, iDigBio IT Guillaume Jimenez, iDigBio IT Casey McLaughlin, iDigBio IT Michael Bevans, NYBG Photographer James Beach, Paleoniches TCN IT/Co-PI, University of Kansas, Specify Andrew Brown, KE Software Edward Gilbert, Lichens & Bryophytes and SCAN TCNs IT, Symbiota Corinna Gries, Lichens & Bryophytes TCN, PI Tony Kirchgessner, Tri-trophic TCN IT, NYBG Nahil Sobh, InvertNet TCN IT, Co-PI Omar Sobh, InvertNet TCN IT Ex Officio: José Fortes, iDigBio Director for Computational Activities 13 Thank you!