CyVerse Tools and Services: Jason Williams (Education, Outreach, Training Lead), Cold Spring Harbor Laboratory; williams@cshl.edu; @JasonWilliamsNY
CyVerse vision: Transforming science through data-driven discovery. More than 40K users, PBs of data, and hundreds of publications, courses, and discoveries.
CyVerse evolution: iPlant 2008, Empowering a New Plant Biology; iPlant 2013, Cyberinfrastructure for Life Sciences; CyVerse 2016, Transforming Science Through Data-Driven Discovery. Timeline markers: public launch, funding renewal; axis years 2006, 2010, 2015, 2017.
CyVerse growth: user accounts
CyVerse growth: publications/acknowledgements
Community-focused cyberinfrastructure: platforms, tools, and datasets; storage and compute; training and support
CyVerse is built for data: Microbial, Plant, Animal, Biomedical, Ecological; Sequence, Images, Other datatypes
CyVerse product stack (Ease of Use versus Flexibility): Ready-to-use Platforms; Extensible Services; Foundational Capabilities; Established CI Components
Data Store: The resources you need to share and manage data with your lab, colleagues, and community. Initial 100 GB allocation (TB allocations available); automatic data backup; easy upload, download, and sharing. Focus here is on genomics data, but not restricted to genomics data.
Data lifecycle support
Discovery: Data Commons Repository (DCR), Elasticsearch
Upload: Discovery Environment, iCommands, Cyberduck
Metadata: Add, delete, copy; metadata templates; bulk metadata
Publication: Data Commons Repository (DCR), NCBI-SRA
Analysis: Discovery Environment, Atmosphere, Agave API, BisQue, DNA Subway
Sharing: Community Data folders, Data Commons, quick share links
Discovery Environment: Hundreds of bioinformatics Apps in an easy-to-use interface. A platform that can run almost any bioinformatics application; seamlessly integrated with data and high-performance computing; user extensible (add your own applications).
Example Workflows (figure labels): Sequence Read Processing, Data Publication, HTProcess, SRA Submission, Data Commons, Assembly, Genome, Transcriptome, Variation Analysis, Assembly Analysis, Genome Annotation, Association, Association Pipeline, Validate Pipeline, RNA-Seq, Methylation. Platforms shown: Discovery Environment, Agave API, Atmosphere.
Atmosphere: Cloud computing for the life sciences. Simple: access to hundreds of virtual machine images. Flexible: fully customize your software setup. Powerful: integrated with CyVerse computing and data resources.
On-demand Cloud: CyVerse Cloud Atmosphere Instance (virtual machine) = (Disk + CPU + Memory) + (Image); address shown: 128.196.34.158
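The composition on this slide, an instance as a resource allocation plus an image, can be sketched as a small data model. This is only an illustrative sketch, not the Atmosphere API: the class names, field names, the `launch` helper, the image name, and the resource sizes are all hypothetical. Only the IP address is taken from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    """Hardware allocation for an instance (hypothetical field names)."""
    cpus: int
    memory_gb: int
    disk_gb: int


@dataclass(frozen=True)
class Instance:
    """An instance pairs a resource allocation with a machine image."""
    resources: Resources
    image: str       # name of the virtual machine image (made up here)
    ip_address: str  # address assigned to the running instance


def launch(resources: Resources, image: str, ip_address: str) -> Instance:
    """Combine (disk + CPU + memory) with an image into one instance record.

    Hypothetical helper: it only builds a record and contacts no cloud service.
    """
    return Instance(resources=resources, image=image, ip_address=ip_address)


# Illustrative values; only the IP address appears on the slide.
instance = launch(
    Resources(cpus=4, memory_gb=8, disk_gb=60),
    image="example-genomics-image",
    ip_address="128.196.34.158",
)
```

Keeping the image separate from the resource allocation mirrors the slide's point that the same image can be started on differently sized virtual machines.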
Science APIs: Fully customize CyVerse resources. Science-as-a-service platform; define your own compute and storage resources (local and CyVerse); build your own app store of scientific codes and workflows.
API-enabled federation: RENCI, CSHL, NASA, Arizona, TACC; Powered by CyVerse
DNA Subway: Educational workflows for Genomes, DNA Barcoding, RNA-Seq. Commonly used bioinformatics tools in streamlined workflows; teach important concepts in biology and bioinformatics; inquiry-based experiments for novel discovery and publication of data.
Support for Course-Based Research Experiences
BisQue: Image analysis, management, and metadata. Secure image storage, analysis, and data management; integrate existing applications or create new ones; custom visualization and image handling routines and APIs.
Image and GxE-driven collaboration
Looking ahead: Future funding. Division of Biological Infrastructure: $100 million, 10-year investment; year 9 of 10 (end date Sept 30, 2018). “The NSF BIO Directorate and NSF leadership are pleased with the progress of the project, and will be inviting an application for continued funding to support advances in life science research.” Discussions with other agencies and foundations.
Future-focused mission goals
Enable data-driven discovery: enable “deep” data integration and analysis; support sophisticated data expeditions defined by users or user groups.
Foster interoperability across computational resources and platforms: deliver CyVerse as a self-contained platform to public and private sector entities; encourage “Powered by CyVerse”; align with other resources (Amazon, Google, NIH Commons, many other federal projects).
Train the next generation of data scientists: develop a sophisticated workforce for academia and industry.
Looking ahead: Transitioning to Stampede2. This summer, high-performance computing (HPC) systems used by CyVerse applications will transition to Stampede2. Improved speed, memory, and overall performance; longer wait times on these jobs until early fall.
Looking ahead: Improved user support. Live chat feature to help you when you are stuck; project-based interfaces to help you organize data, analyses, and collaborators for a more collaborative experience.
Looking ahead: All-new CyVerse Learning Center. Improved, easier-to-navigate guides and tutorials; organized through GitHub and Read the Docs, making it easier to contribute to our documentation or make your own.
Looking ahead: Introducing SciApps. Streamlined workflows for the most common analysis needs; extensible compute in an easy-to-navigate interface.
CyVerse is a collaborative virtual organization: CyVerse Institutions, CyVerse UK