www.iplantcollaborative.org: The iPlant Collaborative, Pollen RCN, March 2nd, 2013. Steve Goff, BIO5 Institute.


The iPlant Collaborative, Pollen RCN, March 2nd, 2013. Steve Goff, BIO5 Institute, University of Arizona.

The iPlant Collaborative: Cyberinfrastructure for the Plant Sciences
9:00 - 9:20 AM: Steve Goff, Director, iPlant Collaborative: iPlant Overview, Data Store, Discovery Environment
9:20 - 9:30 AM: Martha Narro, Sr. Project Coordinator, iPlant Collaborative: Bisque
9:30 - 9:40 AM: Naim Matasci, iPlant Collaborative: Atmosphere
9:40 - 9:50 AM: Matt Bomhoff, University of Arizona: CoGe
9:… - …:00 AM: iPlant Presenters: Questions and Discussion
11:… - …:00 NOON: Poster session / booth demonstrations by presenters in the previous session (Tutorials: PollenTubeTracker in Bisque, RNAseq in Discovery Environment)

NSF’s PSCIC Program
PSCIC Goals:
- “to create a new type of organization - a cyberinfrastructure collaborative for plant science”
- “to enable new conceptual advances through integrative, computational thinking”
- “to address an evolving array of grand challenge questions in plant science: the driving force and organizing principles for the collaborative”

The iPlant Collaborative: Cyberinfrastructure for the Plant Sciences
- NSF-funded project: finished its 5th year, recommended for a second 5-year term
- iPlant is a cyberinfrastructure platform, and the platform is extensible by users
- NSF recommended extending the scope beyond plants
- iPlant supports plant and animal breeding
- iPlant will bridge the genomics-breeding gap

NSF Cyberinfrastructure Vision
- High Performance Computing
- Data and Data Analysis
- Virtual Organizations
- Learning and Workforce
Ref: “Cyberinfrastructure Vision for 21st Century Discovery”, NSF Cyberinfrastructure Council, March 2007.

Grand Challenge Projects + Added Efforts
Plant Tree of Life (iPToL), May ’09
+ Taxonomic Intelligence (TNRS)
+ Scientific Networking Website (MyPlant)
+ Perpetually Updated Trees
+ Species Distribution Maps
Genotype to Phenotype (iPG2P), Aug ’09
+ Image Analysis Platform (Bisque)
+ GLM/PLM, Association
+ Integrated Breeding Platform (GCP/Gates)
+ Comparative Genomics Platform (CoGe)
+ Semantic Web Development

NAR Databases & Tools Over Time

PubMed Publications Over Time Accounts for ~70% - Currently >2,500/day

Biology’s “Big Data” Instruments: Ultra-High-Throughput Sequencers
- Example: Illumina HiSeq 2000, >1 terabyte of sequence data per 11 days
- Estimated >1k analysis jobs/day
- Analysis is the new bottleneck
- New technology is being introduced rapidly
[Slide graphic: a long run of repeated example DNA sequence (AGGCCTTGCAAATGACGCCTGTATCAATGCT…)]
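To put the HiSeq 2000 figure in perspective, here is a back-of-the-envelope sketch that converts “>1 terabyte per 11 days” into a sustained data rate. It assumes decimal units (1 TB = 10^12 bytes) and a steady output rate. It is plain arithmetic on the slide's numbers, not a measurement, and the function name is hypothetical.

```python
# Sustained data rate implied by ">1 terabyte of sequence data / 11 days"
# for one HiSeq 2000. Assumption: decimal units, 1 TB = 10**12 bytes.

SECONDS_PER_DAY = 24 * 60 * 60


def daily_and_per_second_rate(total_bytes: float, days: float) -> tuple[float, float]:
    """Return (bytes per day, bytes per second) for a steady output rate."""
    per_day = total_bytes / days
    per_second = per_day / SECONDS_PER_DAY
    return per_day, per_second


if __name__ == "__main__":
    per_day, per_sec = daily_and_per_second_rate(1e12, 11)
    print(f"~{per_day / 1e9:.0f} GB/day, ~{per_sec / 1e6:.2f} MB/s")
```

At roughly 91 GB per day, a single instrument produces about 2.7 TB in a 30-day month, so a lab with a few sequencers quickly accumulates data faster than it can analyze it on a workstation.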

What iPlant has to offer:
- Data Management Resources
- High-Performance Computing Resources
- Tool Integration System
- Application Programming Interfaces
- Cloud Computing Resources
- Image Analysis Platform
- Molecular Breeding Platform (with IBP)

The iPlant Collaborative Web site – entry point to tools & documentation

The iPlant Discovery Environment: iPlant needs to empower researchers to use next-generation sequencing, but also to point out its pitfalls.

The iPlant Data Store
- Fast data transfers via parallel, non-TCP file transfer (iDrop)
- Move large (>2 GB) files with ease
- Multiple, consistent access modes: iPlant API, iPlant web apps, desktop mount (FUSE/DAV), Java applet (iDrop), command line
- Fine-grained ACL permissions: sharing made simple
- “Cloud storage”… but it’s not Amazon
- Access and a storage allocation are automatic with your iPlant account
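Parallel transfer speeds up large files by splitting them into byte ranges and moving the ranges concurrently over independent streams. The sketch below illustrates only that splitting idea, copying a file on local disk with a thread pool and the Unix-only `os.pread`/`os.pwrite` calls. It is not iDrop's or the Data Store's actual implementation; the function names (`plan_chunks`, `copy_range`, `parallel_copy`) and the 64 MiB chunk size are hypothetical.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 64 * 1024 * 1024  # hypothetical chunk size: 64 MiB per range


def plan_chunks(file_size: int, chunk_size: int = CHUNK_SIZE) -> list[tuple[int, int]]:
    """Split the byte range [0, file_size) into (offset, length) pieces."""
    return [
        (offset, min(chunk_size, file_size - offset))
        for offset in range(0, file_size, chunk_size)
    ]


def copy_range(src_fd: int, dst_fd: int, offset: int, length: int) -> int:
    """Copy one byte range. pread/pwrite take explicit offsets, so
    concurrent workers never race on a shared file position."""
    copied = 0
    while copied < length:
        data = os.pread(src_fd, length - copied, offset + copied)
        if not data:
            break  # source ended early (e.g. truncated while copying)
        copied += os.pwrite(dst_fd, data, offset + copied)
    return copied


def parallel_copy(src: str, dst: str, workers: int = 4, chunk_size: int = CHUNK_SIZE) -> int:
    """Copy src to dst by moving byte ranges concurrently; returns bytes copied."""
    size = os.path.getsize(src)
    src_fd = os.open(src, os.O_RDONLY)
    dst_fd = os.open(dst, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        os.ftruncate(dst_fd, size)  # pre-size the target so ranges can land in any order
        with ThreadPoolExecutor(max_workers=workers) as pool:
            futures = [
                pool.submit(copy_range, src_fd, dst_fd, off, length)
                for off, length in plan_chunks(size, chunk_size)
            ]
            return sum(f.result() for f in futures)
    finally:
        os.close(src_fd)
        os.close(dst_fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "data.bin")
        dst = os.path.join(tmp, "copy.bin")
        payload = os.urandom(1_000_000)
        with open(src, "wb") as f:
            f.write(payload)
        n = parallel_copy(src, dst, workers=4, chunk_size=100_000)
        with open(dst, "rb") as f:
            assert f.read() == payload
        print(f"copied {n} bytes in {len(plan_chunks(n, 100_000))} ranges")
```

Over a network, each range would travel on its own connection, which is where the gain comes from; on a single local disk, the same pattern mainly demonstrates that ranges can be written out of order into a pre-sized file.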

iPlant Data Store Transfer Performance
Data transfer from UC Berkeley to the iPlant Data Store (UA), Dec 5th, 2011: 100 GB in <30 min
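Moving 100 GB in under 30 minutes implies a minimum average throughput. The sketch below works that out and, for scale, estimates how long the ~1 TB from one HiSeq 2000 run would take at the same rate. It assumes decimal units (1 GB = 10^9 bytes) and a steady rate; the function names are hypothetical.

```python
GB = 10**9  # decimal gigabyte; an assumption about the slide's units


def min_throughput(num_bytes: float, max_seconds: float) -> float:
    """Lower bound on average rate (bytes/s) when num_bytes moved in under max_seconds."""
    return num_bytes / max_seconds


def transfer_time(num_bytes: float, bytes_per_second: float) -> float:
    """Seconds needed to move num_bytes at a steady bytes_per_second."""
    return num_bytes / bytes_per_second


if __name__ == "__main__":
    rate = min_throughput(100 * GB, 30 * 60)
    print(f">= {rate / 1e6:.1f} MB/s (>= {rate * 8 / 1e6:.0f} Mbit/s)")
    hours = transfer_time(1000 * GB, rate) / 3600
    print(f"1 TB at that rate: <= {hours:.1f} h")
```

At about 55.6 MB/s (roughly 444 Mbit/s), a terabyte moves in about 5 hours or less.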

The iPlant Data Store: >100 petabytes available; fast transfer; storage near HPC; replicated.

Leveraging XSEDE: iPlant Access to HPC via XSEDE
- TACC, SDSC, PSC, EBI
- >500,000 compute cores
- 1-4 TB shared memory
- Resources: TACC Stampede, PSC Blacklight, TACC Corral, EBI Web Services, TACC Lonestar, SDSC CI
- Scalable computation for high-throughput analysis

Bisque: Image Management, Analysis, and Sharing System. Martha Narro will describe.

Customized cloud platform for computing on your terms! Naim Matasci will describe Atmosphere.

Accelerating Analysis: an Example
Code parallelization for biallelic SNP association: estimated 1,600 years, reduced to 4 hours.
Challenges:
- Months of communication
- A few weeks of development
- Only used once to date
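The slide does not say how the association code was parallelized. For scale, its two numbers imply a speed-up of roughly 3.5 million-fold: 1,600 years × 8,766 h/year ≈ 1.4 × 10^7 hours, compared with 4 hours. One common reason such work parallelizes well is that independent per-SNP tests can be cut into chunks and scored concurrently. The sketch below illustrates that pattern only. It uses a plain Pearson correlation between allele dosage (0/1/2) and phenotype as a stand-in test statistic, which is an assumption and not iPlant's method. All function names are hypothetical.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation; returns 0.0 when either vector has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def score_chunk(start: int, snps: list[list[int]], phenotype: list[float]) -> list[tuple[int, float]]:
    """Score one contiguous block of SNPs; no SNP depends on any other."""
    return [(start + i, pearson_r(g, phenotype)) for i, g in enumerate(snps)]


def parallel_scan(
    genotypes: list[list[int]],
    phenotype: list[float],
    chunk_size: int = 1000,
    workers: int = 4,
    executor_cls=ThreadPoolExecutor,
) -> list[tuple[int, float]]:
    """Return (snp_index, r) for every SNP, computed chunk by chunk in parallel."""
    results: list[tuple[int, float]] = []
    with executor_cls(max_workers=workers) as pool:
        futures = [
            pool.submit(score_chunk, start, genotypes[start:start + chunk_size], phenotype)
            for start in range(0, len(genotypes), chunk_size)
        ]
        for f in futures:  # collect in submission order, so indices stay sorted
            results.extend(f.result())
    return results


if __name__ == "__main__":
    phenotype = [1.0, 2.0, 3.0, 4.0]               # toy phenotype for 4 individuals
    genotypes = [[0, 1, 1, 2], [2, 2, 0, 0], [1, 1, 1, 1]]  # allele dosages per SNP
    for idx, r in parallel_scan(genotypes, phenotype, chunk_size=2, workers=2):
        print(idx, round(r, 3))
```

Threads keep the sketch portable. For CPU-bound scoring in CPython, real parallel speed-up would come from passing `executor_cls=ProcessPoolExecutor` when run as a script, or from distributing chunks across HPC nodes, because threads share the interpreter lock.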

The Integrated Breeding Portal: also in Chinese; French and Spanish coming soon.

OneKP
The problem: OneKP is a consortium formed to sequence the transcriptomes of 1,000 phylogenetically diverse plant species. Needs: storage, access to compute resources and expertise, and distribution.
Our approach:
- Assigned personnel with expertise in the required fields to the project
- Covered storage and computational needs
- Scrubbed all names to match NCBI taxon names (20% could not originally be matched)
- iPlant will be offering BLAST and search services against the OneKP results in the next DE release
- The optimized BLASTX and translation pipeline is available to the community through the Discovery Environment
Results:
- iPlant is replicating the entire dataset, including raw reads, assemblies, and analysis results
- Annotated 86 million contigs against NCBI's RefSeq using BLASTX
- Identified the open reading frames and estimated the protein sequences, resulting in 19,556,877 potential genes
- Will increase the number of plant genes in GenBank by a factor of 100
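The slide does not describe how the open reading frames were identified. One common minimal approach scans all six reading frames of a contig and translates the longest ATG-initiated stretch with the standard genetic code (NCBI translation table 1). The sketch below illustrates that approach as an assumption, not the OneKP pipeline. The function names and the default 30-codon minimum are hypothetical. A stop codon is not required at the contig end, because transcript contigs are often truncated.

```python
from itertools import product
from typing import Optional

# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: aa for (a, b, c), aa in zip(product(_BASES, repeat=3), _AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate codon by codon; unknown codons (e.g. containing N) become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def longest_orf(contig: str, min_codons: int = 30) -> Optional[str]:
    """Protein of the longest ATG-initiated ORF over all six frames, or None if too short."""
    contig = contig.upper()
    best = ""
    for strand in (contig, reverse_complement(contig)):
        for frame in range(3):
            protein = translate(strand[frame:])
            # Each stop-free segment is a candidate; its ORF begins at the first Met.
            for segment in protein.split("*"):
                start = segment.find("M")
                if start != -1 and len(segment) - start > len(best):
                    best = segment[start:]
    return best if len(best) >= min_codons else None


if __name__ == "__main__":
    contig = "CC" + "ATG" + "GCT" * 40 + "TAA" + "GG"  # toy contig with one planted ORF
    print(longest_orf(contig, min_codons=10))
```

Real pipelines add scoring for coding potential and handle frameshifts from assembly errors. This sketch only shows the six-frame scan itself.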

Assembly and Annotation Results
The problem:
- Full-scale genome and transcriptome sequencing is affordable and accessible
- Assembly and knowledge extraction remain challenging: extremely computationally intensive; complex, low-efficiency software; command-line only
Our approach:
- Provide HPC resources: >100k CPUs, multi-TB RAM, petascale storage
- Optimize workflows and algorithms
- Provide access via the Discovery Environment
Results:
- Diverse species assembled/annotated: rice, diploid switchgrass, Ceratopteris, several Solanaceae, mulberry, maize accessions, Thellungiella, barley, wheat, and soybean
- Laboratory groups engaged: >30, including Cornell, Iowa State University, University of Florida, JCVI, Penn State University, CSIRO, and Purdue
- Applications deployed to HPC: ALLPATHS, Velvet, Oases, ABySS, Newbler, SOAPdenovo, SOAPdenovo-Trans, Trinity, Celera Assembler
- HPC applications available via DE: Velvet, ABySS, Newbler, SOAPdenovo, Trinity, InterProScan
- Current deployment and optimization efforts: Trinity, InterProScan, MAKER
- HPC systems used: PSC Blacklight, TACC Ranger, TACC Lonestar, SDSC Trestles
- Usage statistics: 7,000 HPC jobs and 1.5 million computing hours in Y1 of this initiative; >1,000 HPC-backed assembly/annotation jobs run by iPlant DE users in 8 months
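Several of the assemblers listed (Velvet, ABySS, SOAPdenovo, Trinity) are built on de Bruijn graphs, while Newbler and the Celera Assembler use overlap-layout-consensus. The toy sketch below shows only the de Bruijn idea. Reads are cut into k-mers, and each distinct k-mer links its (k-1)-length prefix to its suffix. Maximal non-branching paths are then spelled out as contigs. It ignores sequencing errors, reverse complements, and repeats, which is where real assemblers spend most of their compute. Isolated cycles are skipped, and the function names and the choice of k are hypothetical.

```python
from collections import defaultdict


def build_graph(reads: list[str], k: int) -> dict[str, list[str]]:
    """Map each (k-1)-mer to the (k-1)-mers that follow it, using distinct k-mers only."""
    graph: dict[str, list[str]] = defaultdict(list)
    seen: set[str] = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if kmer in seen:
                continue  # each k-mer contributes one edge, however often it occurs
            seen.add(kmer)
            graph[kmer[:-1]].append(kmer[1:])
    return graph


def unitigs(graph: dict[str, list[str]]) -> list[str]:
    """Walk maximal non-branching paths and spell them out as contigs."""
    indeg: dict[str, int] = defaultdict(int)
    for succs in graph.values():
        for s in succs:
            indeg[s] += 1

    def one_in_one_out(node: str) -> bool:
        return indeg[node] == 1 and len(graph.get(node, [])) == 1

    contigs = []
    for node in list(graph):
        if one_in_one_out(node):
            continue  # interior node: covered by a path that starts elsewhere
        for nxt in graph[node]:
            path = node + nxt[-1]
            while one_in_one_out(nxt):
                nxt = graph[nxt][0]
                path += nxt[-1]
            contigs.append(path)
    return contigs


if __name__ == "__main__":
    reads = ["ACGTTGCA", "GTTGCATG"]  # two overlapping toy reads
    print(unitigs(build_graph(reads, k=4)))
```

In this toy run the two overlapping reads collapse into a single contig. With real data, memory use grows with the number of distinct k-mers, which is one reason these tools were deployed on large-memory systems such as PSC Blacklight.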

iPlant Cyberinfrastructure Strengths
- Extensible, flexible platform architecture
- Not limited to plant science (iAnimal, iArthropod)
- Diverse community collaborations
- Experienced staff working in a distributed fashion
- Unified access to iPlant (single sign-on)
- Genotype to Phenotype & Phylogenetics tools
- Various levels of support, from novice to expert user
- Developing semantic web effort

The iPlant Collaborative - Acknowledgments
Executive Team: Steve Goff, Dan Stanzione
Staff: Greg Abram, Sonali Aditya, Roger Barthelson, Brad Boyle, Todd Bryan, Gordon Burleigh, John Cazes, Mike Conway, Karen Cranston, Rion Doodey, Andy Edmonds, Dmitry Fedorov, Michael Gatto, Utkarsh Gaur, Steven Gregory, Matthew Hanlon, Anthony Heath, Barbara Heath, Natalie Henriques, Uwe Hilgert, Nicole Hopkins, Eun-Sook Jeong, Logan Johnson, Chris Jordan, B.D. Kim, Kathleen Kennedy, Mohammed Khalfan, Lars Koersterk, Sangeeta Kuchimanchi, Kristian Kvilekval, Aruna Lakshmanan, Sue Lauter, Tina Lee, Andrew Lenards, Monica Lent, Zhenyuan Lu, Eric Lyons, Naim Matasci, Sheldon McKay, Robert McLay, Angel Mercer, Dave Micklos, Nathan Miller, Steve Mock, Martha Narro, Praveen Nuthulapati, Shannon Oliver, Shiran Pasternak, William Peil, Dennis Roberts, Jerry Schneider, Bruce Schumaker, Sriramu Singaram, Edwin Skidmore, Brandon Smith, Mary Margaret Sprinkle, Sriram Srinivasan, Josh Stein, Lisa Stillwell, Kris Urie, Peter Van Buren, Hans Vasquez-Gross, Matthew Vaughn, Jason Williams, John Wregglesworth, Weijia Xu
Postdocs: Barbara Banbury, Jamie Estill, Bindu Joseph, Christos Noutsos, Brad Ruhfel, Stephen A. Smith, Chunlao Tang, Lin Wang, Liya Wang, Norman Wickett
Students: Peter Bailey, Jeremy Beaulieu, Devi Bhattacharya, Storme Briscoe, Yi-Da Chen, John Donoghue, Yekatarina Khartianova, Chris La Rose, Amgad Madkour, Aniruddha Marathe, Andrew Mercer, Kurt Michaels, Dhanesh Prasad, Andrew Predoehl, Jose Salcedo, Shalini Sasidharan, Gregory Striemer, Jason Vandeventer, Kuan Yang
Faculty Advisors & Collaborators: Ali Akoglu, Greg Andrews, Kobus Barnard, Sue Brown, Thomas Brutnell, Michael Donoghue, Casey Dunn, Brian Enquist, Damian Gessler, Ruth Grene, John Hartman, Matthew Hudson, Dan Kliebenstein, Jim Leebens-Mack, David Lowenthal, Robert Martienssen, B.S. Manjunath, Nirav Merchant, David Neale, Brian O’Meara, Sudha Ram, David Salt, Mark Schildhauer, Doug Soltis, Pam Soltis, Edgar Spalding, Alexis Stamatakis, Ann Stapleton, Lincoln Stein, Val Tannen, Todd Vision, Doreen Ware, Steve Welch, Mark Westneat