1 Open Source Technologies at the National Agricultural Library
Ursula Pieper, IT Specialist – Web Team Lead
National Agricultural Library, Agricultural Research Service, United States Department of Agriculture
Feb 17, 2016

2 Ursula Pieper
Ursula.Pieper@ars.usda.gov
301-504-7379
Acknowledgements:
Knowledge Services Division (Susan McCarthy)
Monica Poelchau and Chris Childers (i5K Workspace)
Peter Arbuckle and Ezra Kahn (LCA Commons)
Jeffrey Campbell (LTAR)
Cynthia Parr (Ag Data Commons)
Information Services Division (Vernon Chapman)
Chuck Schoppet, NAL (Fedora Commons/Islandora)

3 Why Open Source?
Benefit from community contributions and support
Security managed by community
Cost – Vendor lock-in
Can be customized locally
Interoperability
Re-use of skills

4 Available Expertise @ NAL
PHP, Drupal, Python, Django, Grails, Java, Solr, Subject Matter Experts

5 Open Source based Projects (Selection)
Drupal, Python, Django, Grails, Java, Solr
Ag Data Commons
  – Scientific data catalog/repository
LCA Commons
  – Life Cycle Assessment repo and tools
PubAg
  – Catalog of agricultural scientific literature
I5K@NAL Workspace
  – Repository and workspace for Arthropod Genomes
Long Term Agro-ecosystem Research
  – Historical and future agricultural research data
National Nutrient Database
Dr. Duke's Phytochemical and Ethnobotanical Databases

6 Open Source based Projects (Selection)
Drupal, Grails, Java Based
Ag Data Commons: http://data.nal.usda.gov
i5K@NAL Workspace: http://i5k.nal.usda.gov
LCA Commons: http://lcacommons.gov
PubAg – Data Management System: http://pubag.nal.usda.gov
LCA Commons: http://lcacommons.gov
National Nutrient Database: http://ndb.nal.usda.gov/ndb/
Phytochem Database (Duke): http://phytochem.nal.usda.gov
Long-term Agro-ecosystem Research: http://ltar.nal.usda.gov

7 Ag Data Commons
Requirements
  – Public access to USDA funded research results
  – Support scientific research and evidence-based policy
  – Re-use / re-analysis
  – REE Action Plan: 2012 goals
  – Journal submission requirements
Mandates
  – America COMPETES Act
  – OSTP Memorandum
  – M-13-13, Open Data Policy

8 Ag Data Commons
A data catalog and repository based on the Drupal DKAN distribution

9 Summary of Required Capabilities
Comprehensive catalog of research results
  – Support for compliance reporting
  – Feeds Data.gov
  – Enhanced dataset description for discovery and reuse
Flexibility to support distributed data repositories
  – Some disciplines already have repositories (e.g. GenBank)
Preservation of valuable data for long-term research
Supportive infrastructure for small agencies & labs
Link scholarly literature to its supporting data
Sustainable business model

10 Ag Data Commons Pilot
Standard DKAN Features
Drupal 7 Installation Profile
Fulfills Project Open Data requirements
  – Dataset content type: POD 1.1 metadata schema
  – Unlimited number of resources can be uploaded
  – data.json and RDF available
Additional Features
  – Social media links
  – Some data analysis tools (map, graph through recline library)
  – License display
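
The data.json feed listed on this slide follows the Project Open Data 1.1 catalog layout: a top-level object with a `dataset` array, where each dataset carries a `title`, an `identifier`, and an optional `distribution` list of resources. A minimal Python sketch of reading such a catalog is shown here. The sample record, its example.org URLs and the helper name `list_resources` are illustrative assumptions, not content taken from the Ag Data Commons feed.

```python
import json

# Hypothetical, hand-written sample in the POD 1.1 catalog layout.
SAMPLE_CATALOG = """
{
  "conformsTo": "https://project-open-data.cio.gov/v1.1/schema",
  "dataset": [
    {
      "title": "Example soil moisture dataset",
      "identifier": "example-0001",
      "distribution": [
        {"downloadURL": "https://example.org/soil.csv", "mediaType": "text/csv"},
        {"accessURL": "https://example.org/soil-api"}
      ]
    },
    {"title": "Example dataset without resources", "identifier": "example-0002"}
  ]
}
"""


def list_resources(catalog_text):
    """Return (dataset title, resource URL) pairs from a POD-style catalog."""
    catalog = json.loads(catalog_text)
    pairs = []
    for dataset in catalog.get("dataset", []):
        # "distribution" is optional, so fall back to an empty list.
        for dist in dataset.get("distribution", []):
            # A resource may expose a direct download or only an access page.
            url = dist.get("downloadURL") or dist.get("accessURL")
            if url:
                pairs.append((dataset["title"], url))
    return pairs


if __name__ == "__main__":
    for title, url in list_resources(SAMPLE_CATALOG):
        print(f"{title}: {url}")
```

In practice the catalog text would come from the site's data.json endpoint rather than an inline string; the parsing step stays the same.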

11 Ag Data Commons Pilot
What’s missing from DKAN?
DKAN’s main use case: Government and organizational documents and datasets
General improvements
  – Large file upload, virus checking, file size display
  – Harvest Dashboard, for harvesting external POD datasets or data using other standards
  – Solr search
  – Versioning
  – Data curation workflow
Scientific data require additional functionality
  – DOI assignments to datasets
  – Identity management for authors (ORCID, etc.)
  – Citation information (Primary citation, Methods citation, Related publications)
  – Collection of additional metadata
  – Long-term archiving capabilities
  – Funding source reference
  – Embargo period
  – Specialized taxonomies

12 Ag Data Commons Pilot
Lessons learned
Keeping codebase compliant with standard DKAN
  – All configuration changes need to be committed to code
  – Codebase cannot clash with standard DKAN (which requires discipline when under time pressure)
  – Significant pain merging NAL customizations with new DKAN releases
  – Local programming and systems support is necessary (our model)
Contributing back to DKAN and Drupal
  – Many of NAL’s customizations are adopted (and then maintained) by standard DKAN
  – General Drupal functionality: Open data schema mapper, NALT Thesaurus
Taking advantage of customizations by other organizations
  – Workflow, Stories, Visualizations

13 Ag Data Commons Pilot https://data.nal.usda.gov

14 I5k Workspace@NAL
Provides tools and resources for scientists working on insect genomes.
Goal:
  – store insect genome sequences
  – visualize them
  – enable their curation
  – make them accessible to scientists
Designed specifically to handle and support genomic data.
Website: https://i5k.nal.usda.gov

15 Key open-source software used by the i5k Workspace
1. Main portal/website
  – built with Drupal/Tripal
2. Key web application for genome visualization and feature annotation
  – JBrowse/Apollo

16 Key open-source software used by the i5k Workspace

17 I5K Workspace @ NAL
1. Drupal + Tripal
Chado is a database schema for biological data.
Tripal allows Drupal to access data stored in the Chado database to populate web pages using Drupal functionality.
Community: small and academic

18 I5K Workspace @ NAL
2. Apollo
Apollo is a web application that allows interactive, instantaneous editing of genome features.
It is one of the key features of the i5k Workspace.
Community: small and academic

19 I5K Workspace @ NAL
Customized Resources
Registration module for Apollo application
  – Completely built in house
  – Integrates notifications, account creation, and captcha
Visualizing custom data types: gene pages
  – Hierarchical view to display gene/transcript relationships
Search website (many thousands of nodes)
  – Apache Solr search
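
Site search backed by Apache Solr is usually issued against the `/select` request handler with standard parameters such as `q`, `fq`, `rows` and `wt`. The sketch below only builds such a request URL in Python. The host, the core name `i5k` and the field name `type` are hypothetical placeholders, not the actual i5k Workspace configuration.

```python
from urllib.parse import urlencode


def build_solr_query(base_url, core, text, node_type=None, rows=10):
    """Build a Solr /select URL for a full-text query.

    base_url, core and the "type" filter field are assumptions made for
    illustration; a real deployment defines its own core and schema.
    """
    params = [("q", text), ("rows", rows), ("wt", "json")]
    if node_type:
        # fq narrows the result set without affecting relevance scoring.
        params.append(("fq", f"type:{node_type}"))
    return f"{base_url.rstrip('/')}/solr/{core}/select?{urlencode(params)}"


if __name__ == "__main__":
    print(build_solr_query("http://localhost:8983", "i5k",
                           "heat shock protein", node_type="gene", rows=5))
```

Keeping the content-type restriction in `fq` rather than in `q` lets Solr cache the filter separately, which tends to help on sites with many thousands of nodes.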

20 I5K Workspace @ NAL
Tripal: Lessons learned
Customization requires one full-time developer at the NAL.
Because our customizations are forked off the main repository, any updates in the main branch require more updates on our part.
Customizations are too specific to our website to be able to fully contribute back to/integrate with the main project.

21 I5K Workspace @ NAL
Apollo: Customized resources
Instead of building customized resources, we contributed financially to the salary of the lead developer.
Improvements were not specific to the NAL’s goals, but were aimed at improving the stability of the application.
Even without a financial contribution, bug reports and feature requests from the entire user community are usually addressed very quickly due to an active development team, and a lead developer solely focused on this project.

22 I5K Workspace @ NAL
Apollo: Lessons learned
How you interact with the development community of an OSS project depends on
  – 1) the community itself
  – 2) the specificity of the customization required

23 I5K Workspace @ NAL https://i5k.nal.usda.gov

24 Life Cycle Assessment (LCA) Commons
LCA Commons is a repository that provides access to data and tools that support life cycle assessment of agricultural products.
We collect, curate, and provide access to data edited and formatted explicitly for use in LCA.
The LCA Commons is designed specifically to handle and support unit process data for LCA.
Website: www.lcacommons.gov

25 LCA Commons Technology Stack
Three separate applications accessed through the Drupal web content management system.
  – Discovery and Editorial Applications: Groovy/Grails web implementation of the domain-specific openLCA data model/modeling tool
  – LCA Collection on Ag Data Commons: DKAN catalog and datastore

26 LCA Commons Technology Stack

27 LCA Commons Technology Stack
[Diagram of lcacommons.gov. Applications: Discovery Application, Editorial Application, LCA Collection on Ag Data Commons. Technology labels: Drupal, Groovy/Grails Framework, Solr Index, openLCA API, Activiti BPM, Custom User Mgt., DKAN. Database labels: openLCA MySQL, DKAN Datastore, DKAN Catalog.]

28 LCA Commons Customized Resources
The openLCA datastore is not designed explicitly for data management beyond what is necessary for desktop modeling.
  – This has required developing custom “work-arounds” for data management
Activiti BPM has required significant customization for the editorial workflow for LCA data.
We will need to develop customized search capabilities that enable search across all three applications through Drupal.

29 LCA Commons Lessons learned
Technology selection based on clearly defined functional requirements is critical
  – Using openLCA for an application for which it was not exactly designed has required custom development
  – AND innovation in the field
    Spurred openLCA developer to build functionality that more closely meets our needs and pushed the domain forward in terms of data sharing and management

30 LCA Commons http://lcacommons.gov

31 PubAg Data Management System
PubAg is the National Agricultural Library's search system for agricultural information.
Content:
  – Full-text articles relevant to the agricultural sciences
  – Citations to peer-reviewed journal articles
Repository (Data Management):
  – Fedora Commons/Islandora/Drupal
Public Interface:
  – Apache Solr and Java application layer

32 PubAg Data Management System

33 From Islandora (https://wiki.duraspace.org/)

34 PubAg Data Management System
Lessons learned
Customization needed to accommodate NAL Quality Assurance and workflow
Performance tuning is necessary and non-trivial for large repositories

35 PubAg Data Management System Internal Access Only

36 Long-Term Agroecosystem Research Network
Historical and future agricultural research data
https://ltar.nal.usda.gov
Aims to ensure sustained crop and livestock production and ecosystem services from agroecosystems.
Aims to forecast and verify the effects of environmental trends, public policies, and emerging technologies.

37 Long-Term Agroecosystem Research Network
Historical and future agricultural research data
18 sites across the country
Aim: 30 to 100+ years of data

38 Long-Term Agroecosystem Research Network

39 Long-Term Agroecosystem Research Network
Lessons learned
The project is still in the initial stages.
The lesson learned so far: we still have a lot to learn.

40 Long-Term Agroecosystem Research Network http://ltar.nal.usda.gov

41 Conclusion
What have we learned?
Use of open source technology
  – Allows us to test out technology in depth without a huge initial investment
  – Gives us access to community development (avoids reinventing the wheel)
  – Is mainly useful when customized ?

