The iPlant Collaborative Community Cyberinfrastructure for Life Science Arthropod Genomics Research in ARS Workshop Jason Williams Cold Spring Harbor Laboratory, iPlant
Goals for today’s talk Begin the process of adopting/adapting iPlant to build your own community capacity Learn about what you hope iPlant may be able to offer Highlight existing capabilities of the platform Explain some of the context and rationale behind iPlant
The iPlant Collaborative Vision Enable life science researchers and educators to use and extend iPlant's foundational cyberinfrastructure to understand and ultimately predict the complexity of biological systems and their dynamic nature under various environmental conditions.
The iPlant Collaborative What is Cyberinfrastructure?
The iPlant Collaborative What is Cyberinfrastructure? Platforms, tools, datasets Storage and compute Training and support
The iPlant Collaborative What problems can iPlant Solve? Crops and model plant systems Animal and livestock Agronomic microbes, insects…
The iPlant Collaborative What problems can iPlant Solve? iPlant is built for Data
The iPlant Collaborative How was iPlant built?
The iPlant Collaborative Landscape of community identified priorities Genomic data and analysis: Reference guided assembly De novo assembly RNA-Seq (expression; gene/isoform discovery) Variant calling Genome/Transcriptome annotation ChIP-Seq/Integration of epigenetic information Multiple sequencing platforms New and evolving technologies
The iPlant Collaborative Landscape of community identified priorities Genotypic Environmental Phenotypic Comparative Genomics Sequencing & Assembly Annotation Environmental datasets Climate model products Image-based Phenotyping Molecular Phenotyping Trait Data In planning In progress Foundation in place Evolutionary Models Ecological Models Association Studies Pathway Analysis
iPlant is a collaborative virtual organization The iPlant Collaborative Who makes up iPlant?
The iPlant Collaborative How is iPlant funded? Funded by NSF First funding ($50 Million) in 2008 Renewal funding ($50.3 Million) in 2013: Scientific Advisory Board; Focus on Genotype-Phenotype science; NSF recommended expansion of scope beyond plants
Ultracentrifuge - Electrophoresis Cycle sequencing – HTS ~20 years Technology… Transition… Enablement…
The iPlant Collaborative What a unified platform gets you Ability to access and manage data Software to analyze data Computing resources Skills and help to use software and interpret results Get Science Done
The iPlant Collaborative What a unified platform gets you Metadata management Ability to share data and workflows Open source sustainable tools Reproducibility
The iPlant Collaborative What a unified platform gets you High-performance and scalable computing Ability to automate and collaborate Funding spent on science, not software or hardware Productivity
The iPlant Collaborative Support for a diverse user base Bioinformatics Users: Easy-to-use tools/interfaces (little or no command-line) Generous data storage, end-to-end workflows Access to training and support
The iPlant Collaborative Support for a diverse user base Bioinformaticians: (More) access to HPC Make tools and algorithms more accessible to users Better ways to manage large-project metadata
The iPlant Collaborative Support for a diverse user base Bioinformatics Engineers (community/core support): Ways to scale support for community or institutional users Optimization of software Shared data storage and user portals
The iPlant Collaborative Products What do you get with your account?
The iPlant Collaborative Products We strive to be the CI Lego blocks Danish 'leg godt' - 'play well' Also translates as 'I put together' in Latin If a solution is not available you can craft your own using iPlant CI components
iPlant Data Store Initial 100 GB allocation – TB allocations available Automatic data backup Easy upload/download and sharing The resources you need to share and manage data with your lab, colleagues and community
Discovery Environment Hundreds of bioinformatics Apps in an easy-to-use interface A platform that can run almost any bioinformatics application Seamlessly integrated with data and high performance computing User extensible – add your own applications
Atmosphere Cloud computing for the life sciences Simple: One-click access to more than 200 virtual machine images Flexible: Fully customize your software setup Powerful: Integrated with iPlant computing and data resources
Science APIs Fully customize iPlant resources Science-as-a-service platform Define your own compute, and storage resources (local and iPlant) Build your own app store of scientific codes and workflows
DNA Subway Educational workflows for Genomes, DNA Barcoding, RNA-Seq Commonly used bioinformatics tools in streamlined workflows Teach important concepts in biology and bioinformatics Inquiry-based experiments for novel discovery and publication of data
Bisque Image analysis, management, and metadata Secure image storage, analysis, and data management Integrate existing applications or create new ones Custom visualization and image handling routines and APIs
The iPlant Collaborative Genome Assembly and Annotation
The iPlant Collaborative Genome Assembly and Annotation Annotation of the Loblolly Pine Mega genome (Jill Wegrzyn): Gb assembly, split into 40 jobs, 216 CPU/job (8640 CPU total), 17 hours. 22,656 CPU cores on 1,888 nodes. Genome / Assembly / Size (Mb) / CPU / Run Time: Arabidopsis thaliana, TAIR, :44; Arabidopsis thaliana, TAIR, :27; Zea mays, RefGen_v, :53. TACC Lonestar Supercomputer. Campbell et al. Plant Physiology. December 4, 2013, DOI: /pp
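As a quick sanity check on the figures in this slide, the arithmetic can be sketched in Python. The function names are hypothetical, and the CPU-hour figure is only an upper bound that assumes every CPU was busy for the full 17 hours, which the slide does not state.

```python
def total_cpus(jobs: int, cpus_per_job: int) -> int:
    """Total CPUs used when every job gets the same CPU count."""
    return jobs * cpus_per_job


def cores_per_node(total_cores: int, nodes: int) -> float:
    """Average number of cores on each node of the machine."""
    return total_cores / nodes


def cpu_hours_upper_bound(cpus: int, wall_hours: float) -> float:
    """Upper bound on CPU-hours, ASSUMING all CPUs ran for the whole wall time."""
    return cpus * wall_hours


if __name__ == "__main__":
    cpus = total_cpus(jobs=40, cpus_per_job=216)
    print(cpus)                                # 8640, matches the slide
    print(cores_per_node(22_656, 1_888))       # 12.0 cores per node
    print(cpu_hours_upper_bound(cpus, 17))     # 146880 CPU-hours at most
```

Under these assumptions the 40-job annotation run occupied 8640 of the machine's 22,656 cores, a little over a third of the system.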
The iPlant Collaborative An Evolving Data Commons specimen collection analysis project creation publication data discovery and re-use
The iPlant Collaborative Challenge: Transform existing datasets to do custom queries
The iPlant Collaborative Leveraging iPlant Data Store and iRODS
The iPlant Collaborative Collaborating with us
The iPlant Collaborative “Powered by iPlant” supports a variety of ways of using the iPlant infrastructure underneath another application that communicates with users, usually outside the iPlant project. Other major projects have adopted the iPlant CI as their underlying infrastructure (some completely, some in limited ways – more on this later).
Example “Powered by iPlant” Impact: CoGE usage and user count after federation and interoperability with iPlant
Extended Support Make bioinformatics tools better We find example after example of codes that get well below .01% of peak on a single core. By the end of the year, it will be difficult to get a server below 20 cores. There is little sympathy for data/computing challenges when the software is willing to ignore at least % of available performance. D. Stanzione, Director TACC
The iPlant Collaborative Getting tools out there GenSel installed by developers, made available through the DE For whole-genome predictions, widely used in breeding Dorian Garrick, Iowa State University
The iPlant Collaborative Solving problems faster iAnimal genotyping pipeline developed for 1000 Bulls processes two terabytes (TB) of raw sequence data to DNA variants in less than 8 hours James Koltes, Iowa State University
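To put the 1000 Bulls figure in perspective, a rough lower bound on sustained throughput follows from the numbers above. This is a back-of-the-envelope sketch, not a measurement: it assumes decimal terabytes (10^12 bytes) and treats 8 hours as the full run time, and the function name is hypothetical.

```python
def min_throughput_mb_per_s(terabytes: float, hours: float) -> float:
    """Average MB/s needed to process `terabytes` of data within `hours`.

    ASSUMPTION: 1 TB = 1e12 bytes and 1 MB = 1e6 bytes (decimal units).
    """
    total_bytes = terabytes * 1e12
    seconds = hours * 3600
    return total_bytes / seconds / 1e6


if __name__ == "__main__":
    # 2 TB of raw sequence data in (at most) 8 hours
    rate = min_throughput_mb_per_s(terabytes=2, hours=8)
    print(f"{rate:.1f} MB/s")  # about 69.4 MB/s sustained, end to end
```

Because the slide says "less than 8 hours", the true average rate would be somewhat higher than this bound.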
Where to go from here: iPlant Learning Center Get Started Guide Tutorials and Videos Documentation Upcoming Events Workshops Webinars
iPlant can come to you… Tools & Services Workshops: Targeted to researchers; Hands-on learning modules; Individual consultations. Genomics in Education Workshops: Targeted to educators; Pair bioinformatics with classroom labs; Help for generating lesson plans. Webinars: Pairs with asynchronous learning; Reach broader audiences; Follow up with workshop learners.
Where to go from here: If iPlant can, we’ll help show you how… If iPlant can’t, we’ll find the path that gets you what you need. Don’t hesitate to ask “Can iPlant do this?” Keep asking at ask.iplantcollaborative.org
Staff: Greg Abram Sonali Aditya Ritu Arora Roger Barthelson Rob Bovill Brad Boyle Gordon Burleigh John Cazes Mike Conway Victor Cordero Rion Dooley Aaron Dubrow Andy Edmonds Dmitry Fedorov Melyssa Fratkin Michael Gatto Utkarsh Gaur Cornel Ghiban Executive Team Steve Goff - UA Matthew Vaughn - TACC Nirav Merchant – UA Eric Lyons - UA Doreen Ware – CSHL Current and Former: Faculty Advisors & Collaborators: Ali Akoglu Kobus Barnard Timothy Clausner Brian Enquist Damian Gessler Ruth Grene John Hartman Matthew Hudson David Lowenthal B.S. Manjunath Students: Peter Bailey Jeremy Beaulieu Devi Bhattacharya Storme Briscoe YaDi Chen David Choi Barbara Dobrin David Neale Brian O’Meara Sudha Ram David Salt Mark Schildhauer Doug Soltis Pam Soltis Edgar Spalding Alexis Stamatakis Steve Welch Zhenyuan Lu Aaron MarcuseKubitz Robert McLay Nathan Miller Steve Mock Martha Narro Benoit Parmentier Jmatt Peterson Dennis Roberts Paul Sarando Jerry Schneider Bruce Schumaker Steve Gregory Matthew Hanlon Natalie Henriques Uwe Hilgert Nicole Hopkins EunSook Jeong Logan Johnson Chris Jordan Kathleen Kennedy Mohammed Khalfan David Knapp Lars Koersterk Sangeeta Kuchimanchi Kristian Kvilekval Sue Lauter Tina Lee Edwin Skidmore Brandon Smith Mary Margaret Sprinkle Sriram Srinivasan Josh Stein Lisa Stillwell Jonathan Strootman Peter Van Buren Hans VasquezGross Rebeka Villarreal Ramona Walls Liya Wang Anton Westveld Jason Williams John Wregglesworth Weijia Xu Andrew Predoehl Sathee Ravindranath Kyle Simek Gregory Striemer Jason Vandeventer Nicholas Woodward Kuan Yang Postdocs: Barbara Banbury Christos Noutsos Solon Pissis Brad Ruhfel John Donoghue Yekatarina Khartianova Chris La Rose Amgad Madkour Aniruddha Marathe Andre Mercer Kurt Michaels Zack Pierce The iPlant Collaborative Who makes up iPlant?
Download these Jason Williams –