Open Source Technologies at the National Agricultural Library Ursula Pieper IT Specialist – Web Team Lead National Agricultural Library Agricultural Research.

Presentation transcript:

Open Source Technologies at the National Agricultural Library
Ursula Pieper
IT Specialist – Web Team Lead
National Agricultural Library
Agricultural Research Service
United States Department of Agriculture
Feb 17, 2016

Acknowledgements:
Knowledge Services Division (Susan McCarthy)
Monica Poelchau and Chris Childers (i5k Workspace)
Peter Arbuckle and Ezra Kahn (LCA Commons)
Jeffrey Campbell (LTAR)
Cynthia Parr (Ag Data Commons)
Information Services Division (Vernon Chapman)
Chuck Schoppet, NAL (Fedora Commons/Islandora)

Why Open Source?
Benefit from community contributions and support
Security managed by community
Cost – Vendor lock-in
Can be customized locally
Interoperability
Re-use of skills

Available NAL Subject Matter Experts: PHP, Drupal, Python, Grails, Java, Solr, Django

Open Source based Projects (Selection)
Technologies: Drupal, Python, Grails, Java, Solr, Django
Ag Data Commons
–Scientific data catalog/repository
LCA Commons
–Life Cycle Assessment repository and tools
PubAg
–Catalog of agricultural scientific literature
i5k Workspace
–Repository and workspace for arthropod genomes
Long Term Agro-ecosystem Research
–Historical and future agricultural research data
National Nutrient Database
Dr. Duke's Phytochemical and Ethnobotanical Databases

Open Source based Projects (Selection) Drupal Grails Java Based Ag Data Commons Workspace LCA Commons PubAg – Data Management System LCA Commons National Nutrient Database Phytochem Database (Duke) Long-term Agro-ecosystem Research

Ag Data Commons
Requirements
–Public access to USDA-funded research results
–Support scientific research and evidence-based policy
–Re-use / re-analysis
–REE Action Plan: 2012 goals
–Journal submission requirements
Mandates
–America COMPETES Act
–OSTP Memorandum
–M-13-13, Open Data Policy

Ag Data Commons
A data catalog and repository based on the Drupal DKAN distribution

Summary of Required Capabilities
Comprehensive catalog of research results
–Support for compliance reporting
–Feeds Data.gov
–Enhanced dataset description for discovery and reuse
Flexibility to support distributed data repositories
–Some disciplines already have repositories (e.g. GenBank)
Preservation of valuable data for long-term research
Supportive infrastructure for small agencies & labs
Link scholarly literature to its supporting data
Sustainable business model

Ag Data Commons Pilot: Standard DKAN Features
Drupal 7 installation profile
Fulfills Project Open Data requirements
–Dataset content type: POD 1.1 metadata schema
–Unlimited number of resources can be uploaded
–data.json and RDF available
Additional features
–Social media links
–Some data analysis tools (map, graph through recline library)
–License display
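As a rough illustration of the POD 1.1 dataset metadata that a data.json feed carries, the Python sketch below checks one record for the fields the schema treats as required. The field list is recalled from the Project Open Data v1.1 schema and should be verified against the official specification; the sample record and the helper name `missing_pod_fields` are hypothetical and are not taken from Ag Data Commons.

```python
import json

# Fields the POD v1.1 schema marks as always required for a dataset.
# Assumption: verify against the official schema; federal agencies
# must also supply bureauCode and programCode.
REQUIRED_POD_FIELDS = (
    "title", "description", "keyword", "modified",
    "publisher", "contactPoint", "identifier", "accessLevel",
)


def missing_pod_fields(dataset):
    """Return the required POD fields that are absent or empty."""
    return [field for field in REQUIRED_POD_FIELDS if not dataset.get(field)]


# Hypothetical record, for illustration only.
sample = json.loads("""
{
  "title": "Example soil moisture dataset",
  "description": "Illustrative record only.",
  "keyword": ["soil", "moisture"],
  "modified": "2016-02-17",
  "publisher": {"name": "Example Agency"},
  "contactPoint": {"fn": "Jane Doe", "hasEmail": "mailto:jane.doe@example.gov"},
  "identifier": "example-0001",
  "accessLevel": "public"
}
""")

print(missing_pod_fields(sample))
```

A check of this kind could run before a record is published to the catalog, so that incomplete entries are caught before they reach an aggregator.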

Ag Data Commons Pilot: What’s missing from DKAN?
DKAN’s main use case: government and organizational documents and datasets
General improvements
–Large file upload, virus checking, file size display
–Harvest Dashboard: for harvesting external POD datasets or data using other standards
–Solr search
–Versioning
–Data curation workflow
Scientific data require additional functionality
–DOI assignments to datasets
–Identity management for authors (ORCID, etc.)
–Citation information (primary citation, methods citation, related publications)
–Collection of additional metadata
–Long-term archiving capabilities
–Funding source reference
–Embargo period
–Specialized taxonomies
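To make a few of the scientific-data items concrete (DOI, citation information, embargo period), here is a minimal Python sketch of a dataset record with those attributes. It is an illustration under stated assumptions, not the Ag Data Commons data model: every field name, the citation layout, and the DOI (which uses a placeholder prefix) are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class DatasetRecord:
    # All field names are hypothetical, chosen for this sketch only.
    authors: list
    year: int
    title: str
    publisher: str
    doi: str
    embargo_until: Optional[date] = None


def is_public(record, today):
    """A dataset is public once its embargo date, if any, has been reached."""
    return record.embargo_until is None or today >= record.embargo_until


def format_citation(record):
    """Build a simple 'Authors (Year). Title. Publisher. DOI URL' citation."""
    names = "; ".join(record.authors)
    return (f"{names} ({record.year}). {record.title}. "
            f"{record.publisher}. https://doi.org/{record.doi}")


# Placeholder values only; the DOI does not identify a real dataset.
record = DatasetRecord(
    authors=["Doe, J.", "Roe, R."],
    year=2016,
    title="Example dataset",
    publisher="Example Repository",
    doi="10.5555/example",
    embargo_until=date(2016, 6, 1),
)

print(format_citation(record))
print(is_public(record, date(2016, 2, 17)))
```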

Ag Data Commons Pilot: Lessons learned
Keeping codebase compliant with standard DKAN
–All configuration changes need to be committed to code
–Codebase cannot clash with standard DKAN (which requires discipline when under time pressure)
–Significant pain merging NAL customizations with new DKAN releases
–Local programming and systems support is necessary (our model)
Contributing back to DKAN and Drupal
–Many of NAL’s customizations are adopted (and then maintained) by standard DKAN
–General Drupal functionality: Open Data Schema Mapper, NALT Thesaurus
Taking advantage of customizations by other organizations
–Workflow, Stories, Visualizations

i5k Workspace
Provides tools and resources for scientists working on insect genomes.
Goal:
–store insect genome sequences
–visualize them
–enable their curation
–make them accessible to scientists
Designed specifically to handle and support genomic data.
Website:

Key open-source software used by the i5k Workspace
1. Main portal/website
–built with Drupal/Tripal
2. Key web application for genome visualization and feature annotation
–JBrowse/Apollo

1. Drupal + Tripal
Chado is a database schema for biological data.
Tripal allows Drupal to access data stored in the Chado database to populate web pages using Drupal functionality.
Community: small and academic
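The idea of a relational schema for genome features can be sketched in a few lines of Python. The example below is a heavily simplified stand-in for Chado, kept in an in-memory SQLite database: the real schema has many more tables and columns and stores feature and relationship types as references to a controlled-vocabulary table rather than as plain text. The feature names and the helper `transcripts_of` are hypothetical.

```python
import sqlite3

# In-memory database holding a simplified, Chado-like layout.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE feature (
    feature_id INTEGER PRIMARY KEY,
    uniquename TEXT NOT NULL,
    type       TEXT NOT NULL   -- Chado stores a type_id pointing at cvterm
);
CREATE TABLE feature_relationship (
    subject_id INTEGER REFERENCES feature(feature_id),  -- child feature
    object_id  INTEGER REFERENCES feature(feature_id),  -- parent feature
    type       TEXT NOT NULL   -- also a cvterm reference in Chado
);
INSERT INTO feature VALUES (1, 'GENE0001', 'gene'),
                           (2, 'GENE0001-RA', 'mRNA'),
                           (3, 'GENE0001-RB', 'mRNA');
INSERT INTO feature_relationship VALUES (2, 1, 'part_of'), (3, 1, 'part_of');
""")


def transcripts_of(gene_name):
    """Return the names of features recorded as part_of the given gene."""
    rows = conn.execute(
        """
        SELECT child.uniquename
        FROM feature AS gene
        JOIN feature_relationship AS rel ON rel.object_id = gene.feature_id
        JOIN feature AS child ON child.feature_id = rel.subject_id
        WHERE gene.uniquename = ? AND rel.type = 'part_of'
        ORDER BY child.uniquename
        """,
        (gene_name,),
    ).fetchall()
    return [row[0] for row in rows]


print(transcripts_of("GENE0001"))
```

A web layer such as Tripal would run queries of this general shape to populate a gene page with its transcripts.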

2. Apollo
Apollo is a web application that allows interactive, instantaneous editing of genome features.
It is one of the key features of the i5k Workspace.
Community: small and academic

Customized Resources
Registration module for Apollo application
–Completely built in house
–Integrates notifications, account creation, and captcha
Visualizing custom data types: gene pages
–Hierarchical view to display gene/transcript relationships
Search website (many thousands of nodes)
–Apache Solr search
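For readers unfamiliar with Solr, a site search of this kind ultimately becomes an HTTP request to a Solr select handler. The Python sketch below only builds such a request URL using standard Solr query parameters (`q`, `rows`, `wt`, `fq`). The host, the core name `i5k_nodes`, and the filter field `bundle` are assumptions for illustration, not the actual i5k Workspace configuration.

```python
from urllib.parse import urlencode


def solr_select_url(base_url, core, query, rows=10, filters=()):
    """Build a URL for Solr's /select handler with JSON output."""
    params = [("q", query), ("rows", rows), ("wt", "json")]
    # Each filter query narrows the result set without affecting scoring.
    params += [("fq", f) for f in filters]
    return f"{base_url.rstrip('/')}/solr/{core}/select?{urlencode(params)}"


# Hypothetical host, core, and filter field.
url = solr_select_url(
    "http://localhost:8983",
    "i5k_nodes",
    "vitellogenin",
    rows=5,
    filters=["bundle:gene"],
)
print(url)
```

Keeping URL construction in one helper makes it easier to add paging or further filters later without touching the calling code.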

Tripal: Lessons learned
Customization requires one full-time developer at the NAL
Because our customizations are forked off the main repository, any updates in the main branch require more updates on our part
Customizations are too specific to our website to be fully contributed back to or integrated with the main project

Apollo: Customized resources
Instead of building customized resources, we contributed financially to the salary of the lead developer.
Improvements were not specific to the NAL’s goals, but were aimed at improving the stability of the application.
Even without a financial contribution, bug reports and feature requests from the entire user community are usually addressed very quickly, thanks to an active development team and a lead developer solely focused on this project.

Apollo: Lessons learned
How you interact with the development community of an OSS project depends on
–1) the community itself
–2) the specificity of the customization required

Life Cycle Assessment (LCA) Commons
LCA Commons is a repository that provides access to data and tools that support life cycle assessment of agricultural products.
We collect, curate, and provide access to data edited and formatted explicitly for use in LCA.
The LCA Commons is designed specifically to handle and support unit process data for LCA.
Website:

LCA Commons Technology Stack
Three separate applications accessed through the Drupal web content management system.
–Discovery and Editorial Applications: Groovy/Grails web implementation of the domain-specific openLCA data model/modeling tool
–LCA Collection on Ag Data Commons: DKAN catalog and datastore

LCA Commons Technology Stack (diagram)
Applications at lcacommons.gov: Discovery Application, Editorial Application, LCA Collection on Ag Data Commons
Layers: Application, Technology, Database
Components shown: Groovy/Grails Framework, Solr Index, openLCA API, Activiti BPM, DKAN, Drupal, Drupal Custom User Mgt., openLCA, MySQL, DKAN Datastore, DKAN Catalog

LCA Commons: Customized Resources
The openLCA datastore is not designed explicitly for data management beyond what is necessary for desktop modeling.
–This has required developing custom “work-arounds” for data management
Activiti BPM has required significant customization for the editorial workflow for LCA data
We will need to develop customized search capabilities that enable search across all three applications through Drupal

LCA Commons: Lessons learned
Technology selection based on clearly defined functional requirements is critical
–Using openLCA for an application for which it was not exactly designed has required custom development
–AND innovation in the field
Spurred the openLCA developer to build functionality that more closely meets our needs and pushed the domain forward in terms of data sharing and management

PubAg Data Management System
PubAg is the National Agricultural Library's search system for agricultural information.
Content:
–Full-text articles relevant to the agricultural sciences
–Citations to peer-reviewed journal articles
Repository (Data Management):
–Fedora Commons/Islandora/Drupal
Public Interface:
–Apache Solr and Java application layer

PubAg Data Management System

From Islandora (

PubAg Data Management System: Lessons learned
Customization needed to accommodate NAL quality assurance and workflow
Performance tuning is necessary and non-trivial for large repositories

PubAg Data Management System Internal Access Only

Long-Term Agroecosystem Research Network
Historical and future agricultural research data
Aims to ensure sustained crop and livestock production and ecosystem services from agroecosystems.
Aims to forecast and verify the effects of environmental trends, public policies, and emerging technologies.

Long-Term Agroecosystem Research Network
Historical and future agricultural research data
18 sites across the country
Aim: 30 to 100+ years of data

Long-Term Agroecosystem Research Network: Lessons learned
The project is still in the initial stages
Lesson learned: we still have a lot to learn

Conclusion
What have we learned?
Use of open source technology
–Allows us to test out technology in depth without a huge initial investment
–Gives us access to community development (avoids reinventing the wheel)
–Is mainly useful when customized?