CyVerse Overview
Joslynn Lee – Data Science Educator
DNA Learning Center, Cold Spring Harbor Laboratory
jolee@cshl.edu
CyVerse evolution
From plant science, to life science, and beyond…
Transforming Science Through Data-Driven Discovery
iPlant 2008: Empowering a New Plant Biology
iPlant 2013: Cyberinfrastructure for Life Science
iPlant was established by the U.S. National Science Foundation (NSF) in 2008 to develop cyberinfrastructure for life sciences research and democratize access to U.S. supercomputing capabilities.
iPlant's original mission was to provide the cyberinfrastructure needed by the plant science research community to address Grand Challenge problems that could not be addressed with single-lab research funding.
iPlant developed a species-generic cyberinfrastructure platform that is now in high demand across research domains covering many species. At the recommendation of the NSF, iPlant has extended its scope beyond plants. The broader (non-human) life science research community is quickly adopting iPlant's platform, expanding the user base, and leveraging additional domain knowledge and technical expertise to support the collaborative while maintaining the founding principles and vision of the project.
We are funded by the National Science Foundation
Directorate for Biological Sciences
$100 million in investment
We are your colleagues and collaborators
Freely available to the community
Spur national/international collaboration
Cite CyVerse: CyVerse.org/acknowledge-cite-cyverse
DBI-0735191 and DBI-1265383
CyVerse evolution
Vision: Transforming science through data-driven discovery
Mission: Design, develop, deploy, and expand a national cyberinfrastructure for life science research, and train scientists in its use
More than 30K users, PB of data, and hundreds of publications, courses, and discoveries
CyVerse 2016: Transforming Science Through Data-Driven Discovery
1024 bytes = 1 KB
1024 KB = 1 MB
1024 MB = 1 GB
1024 GB = 1 TB
1024 TB = 1 PB
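The unit ladder on this slide can be checked with a few lines of Python. This is only an illustrative sketch: the function names `to_bytes` and `humanize` are made up for this example and are not part of any CyVerse tool, and it follows the slide's convention that each step is a factor of 1024 (some sources write these binary units as KiB, MiB, and so on).

```python
# Unit ladder from the slide: each step up is a factor of 1024.
UNITS = ["bytes", "KB", "MB", "GB", "TB", "PB"]


def to_bytes(value: float, unit: str) -> int:
    """Convert a value in the given unit to bytes (1024-based, as on the slide)."""
    exponent = UNITS.index(unit)  # 0 for bytes, 1 for KB, ..., 5 for PB
    return int(value * 1024 ** exponent)


def humanize(n_bytes: int) -> str:
    """Express a byte count in the largest unit that keeps the value below 1024."""
    value = float(n_bytes)
    for unit in UNITS:
        # Stop at the first unit where the value fits, or at PB (the top of the ladder).
        if value < 1024 or unit == UNITS[-1]:
            return f"{value:g} {unit}"
        value /= 1024


if __name__ == "__main__":
    print(to_bytes(1, "PB"))         # number of bytes in one PB
    print(humanize(100 * 1024**3))   # a 100 GB allocation expressed back in GB
```

For scale, one PB on this ladder is 1024 × 1024 = 1,048,576 GB, so a single 100 GB allocation is roughly one ten-thousandth of a petabyte.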
What is cyberinfrastructure?
Cyberinfrastructure (CI) provides solutions to challenges in large-scale computational science that were previously unapproachable because the computational requirements were too large, too complex, or simply unknown.
Platforms, tools, datasets
Storage and compute
Training and support
Software, HPC, People
CyVerse supports all domains of life science
Plant / Microbial, Animal, Biomedical, Ecological/Climate
CyVerse provides life scientists with powerful computational infrastructure to handle huge datasets and complex analyses.
CyVerse supports all levels of users
User perspectives and potential applications: Bench Scientist, Bioinformatician, Core Facilities (Welch et al. 2013)
CyVerse collaborators
Arabidopsis Information Portal
CyVerse collaborates to enable access to the solutions that work best for you.
CyVerse institutions
CyVerse is a collaborative virtual organization.
CyVerse products
Ease of Use / Flexibility
Ready-to-use Platforms
Extensible Services
Established CI Components
Foundational Capabilities
CyVerse products
Data Store, Science APIs, Discovery Environment, Bisque, Atmosphere, DNA Subway
Data Store
The resources you need to share and manage data with your lab, colleagues, and community
Initial 100 GB allocation; TB allocations available
Automatic data backup
Easy upload, download, and sharing
Discovery Environment
Hundreds of bioinformatics apps in an easy-to-use interface
A user interface for access to the tools and computing resources
Run existing bioinformatics software apps on CyVerse clusters or TACC supercomputers quickly, easily, and efficiently
User extensible: add your own applications
Supports the bioinformatics workflow: data management, analysis, and sharing of large datasets
Atmosphere
Cloud computing for the life sciences
Simple: one-click access to more than 200 virtual machine images
Publish your own software suites, create your own work environments, and run the software for community use
Access CyVerse's core infrastructure resources, including high-performance computing (HPC) and grid computing environments
Science APIs (Application Programming Interfaces)
Fully customize CyVerse resources
Science-as-a-service platform
Define your own compute and storage resources (local and CyVerse)
Build your own app store of scientific codes and workflows and share it with anyone (developers and bioinformaticians)
Bisque (Bio-Image Semantic Query User Environment)
Image analysis, management, and metadata
Secure image storage, analysis, and data management
Integrate existing applications or create new ones
100+ biological image formats
DNA Subway
Educational workflows for Genomes, DNA Barcoding, and RNA-Seq
Commonly used bioinformatics tools in streamlined workflows
Teach important concepts in biology and bioinformatics
Inquiry-based experiments for novel discovery and publication of data