The iPlant Collaborative IBP Annual Meeting – June 1st 2011 Steve Goff

Presentation transcript:

The iPlant Collaborative
IBP Annual Meeting – June 1st, 2011
Steve Goff
iPlant Collaborative, BIO5 Institute
School of Plant Science, University of Arizona

What is iPlant?
– iPlant’s mission is to build the CI to support plant biology’s Grand Challenge solutions
– Phase I: Community Input
– Phase II: Building the CI Foundation
– Next Phase: Enabling Plant Science Discovery
– Now need to integrate workflows and test theories
– Will support tool integration and synthesis activities

NSF Cyberinfrastructure Vision
– High Performance Computing
– Data and Data Analysis
– Virtual Organizations
– Learning and Workforce
Ref: “Cyberinfrastructure Vision for 21st Century Discovery”, NSF Cyberinfrastructure Council, March 2007.

CI for Plant Science: Observations
– Investment in data creation is high
– Sources of data are disparate
– Investment in existing tools is significant
– Tools shouldn’t be discarded or reproduced, but they lack:
  – Interoperability with other tools
  – Data standards
  – Scalability
  – Consistency of interface access & use
  – Experimental reproducibility

iPlant is a process and a platform (or set of platforms, depending on your point of view).

Computational & Storage Capability
– Compute: Ranger, Lonestar, Stampede (UT/TeraGrid); Saguaro, Sonora (ASU); Marin, Ice (UA); ~700 Teraflops
– Storage: Corral, Ranch (UT); Ocotillo (ASU); > 10 Petabytes of storage available for the project
– Visualization: Spur, Stallion (UT); Matinee (ASU); UA-Cave; among the world’s largest visualization systems
– Virtualized/Cloud Services: iPlant, TeraGrid, vendor clouds; cloud tech to deliver persistent gateways and user services
Thanks to large-scale NSF investments, iPlant has excellent CI access

iPlant Cyberinfrastructure
[Diagram labels: Bench Biologists; Computational Biologists; Discovery Environment; Data Store; Atmosphere; Semantic Web Layer; APIs; Data; Algorithms]

Overview of Components
– iPlant Discovery Environment: Core Software
– iRODS Integration: Core Services
– Atmosphere Cloud: Core Services
– Semantic Web Tech: SSWAP Team
– iPlant Tool/Workflow API: Core Software & Engagement Teams

[Diagram labels]
– Discovery Environment; DNA Subway; 3rd Party Science Gateways; User Scripts & Applications
– Public APIs
– Low-Level Services: Event, I/O, Data, Apps, Job, Profile, Auth
– Condor, PBS, SGF, LSF, LL, iRODS, MySQL, LDAP, Eucalyptus, Action Folders, Shibboleth, Globus/Unicore, GPIR, MyProxy, XSEDE
– iPlant Hardware Resources: High Perf Computing, Databases, Storage, Cloud Systems, Semantic Web

iRODS: Integrated Rule-Oriented Data System
Why iRODS?
– Large data storage in simple format
– Sharing of large data among iPlant CI Resources
– Sharing of large data with colleagues and collaborators
– Processing large data with TACC resources
General information on iRODS:
Access iPlant’s iRODS: irodsweb.iplantcollaborative.org
Documentation:

Atmosphere: iPlant’s Cloud Computing Resources
Tutorial: re/Demo+with+picture+walkthrough
Why Atmosphere?
– Use a virtual machine (VM) with preinstalled software
– Create a VM to install complex software
– Create and share an image of a VM (VMI)
– Mount data from iPlant iRODS for use by your VM

Semantic Web
Why Semantic Web Technology?
– Provides a means for web services to communicate and be aware of one another
[Diagram labels: iPlant Consumer; Remote Service; User-Created Service in Atmosphere; iPlant’s Discovery Environment; iPlant Service; Remote Consumer; Semantic Web]

iPG2P: From Genotype to Phenotype
– Visual Analytics (R. Grene and G. Abram): Information visualization tools capable of displaying diverse types of data from laboratory, field, in silico analyses and simulations
– Data Integration (D. Ware and C. Jordan): Methods for describing and unifying data sets into systems that support iPG2P activities
– Statistical Inference (D. Kliebenstein and E. Buckler): Platform for using advanced computational approaches to statistically link genotype to phenotype
– Modeling Tools (J. White, C. Myers, S. Welch): Framework for the construction, simulation and analysis of computational models of plant
– Ultra High Throughput Sequencing (T. Brutnell and M. Vaughn): HPC resources and applications to process large-volume sequence data

Genome Services: Ultra High-Throughput Sequencing
– Scalable computing
– Data: NCBI SRA, Desktop, Amazon S3, FTP, HTTP
– Data Wrangling: Quality Control, Preprocessing, Rescaling, Barcoding
– Alignments: BWA, TopHat, Cufflinks, SAMTools
– SAM Alignments; Expression Levels (RPKM); Genome Variants (VCF3.3)
– Community Use Cases: Expression studies, Forward genetic screens, Association studies
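The slide lists expression levels in RPKM (reads per kilobase of transcript per million mapped reads) as a pipeline output. As a minimal sketch of that unit, assuming raw read counts per gene are already available, the calculation might look like the following. The gene names, counts, and lengths are made-up illustration values, and `rpkm` is a hypothetical helper, not part of any iPlant tool.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads Per Kilobase of transcript per Million mapped reads.

    RPKM = reads * 1e9 / (gene length in bp * total mapped reads)
    """
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and total mapped reads must be positive")
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


# Hypothetical input: gene -> (reads mapped to the gene, gene length in bp)
counts = {"geneA": (500, 2000), "geneB": (500, 1000)}
total_mapped = 10_000_000  # assumed total mapped reads in the library

levels = {gene: rpkm(n, length, total_mapped) for gene, (n, length) in counts.items()}

for gene, value in levels.items():
    print(f"{gene}: {value:.1f} RPKM")
```

Both genes have the same read count here, but the shorter gene gets the higher RPKM, which is the point of normalizing by length.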

High Throughput Image Analysis
Scope: Enable image-based plant sciences research by incorporating image processing algorithms, grid computing, and databasing into an analysis pipeline
Objectives
1. Integrate Phytomorph and BISQUE as PhytoBisque
2. Broaden access to algorithms that benefit the community
3. Automate workflows so that plant biologists need not be computer scientists
[Diagram labels: Storage; Authentication; APIs; Compute cluster]
E. U of Wisconsin, B.S Majunath and K. UCSB

Phytobisque: Example Use Case
– Given a flatbed scanner image of Arabidopsis seeds, measure the length, width, and area of each seed and produce a population estimate for each trait
– Seed trait QTL can be mapped when applied to mapped populations like Ler x CVI
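The use case above (per-seed length, width, and area, then a population estimate per trait) can be sketched roughly as follows. This is not the Phytomorph or BISQUE algorithm: it assumes the scan has already been thresholded into a binary mask, uses a plain 4-connected flood fill to separate seeds, and approximates length and width by the axis-aligned bounding box in pixels. `label_seeds`, `measure`, and the toy `MASK` are hypothetical.

```python
from statistics import mean


def label_seeds(mask):
    """Group '#' pixels into seeds using a 4-connected flood fill."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    seeds = []
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] != "#" or seen[r][c]:
                continue
            stack, pixels = [(r, c)], []
            seen[r][c] = True
            while stack:
                y, x = stack.pop()
                pixels.append((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] == "#" and not seen[ny][nx]):
                        seen[ny][nx] = True
                        stack.append((ny, nx))
            seeds.append(pixels)
    return seeds


def measure(pixels):
    """Bounding-box length and width plus pixel-count area for one seed."""
    ys = [y for y, _ in pixels]
    xs = [x for _, x in pixels]
    height = max(ys) - min(ys) + 1
    width = max(xs) - min(xs) + 1
    return {"length": max(height, width), "width": min(height, width), "area": len(pixels)}


# Toy binary mask standing in for a thresholded scanner image ('#' = seed pixel)
MASK = [
    "..........",
    ".####.....",
    ".####..##.",
    ".......##.",
    ".......##.",
    "..........",
]

traits = [measure(p) for p in label_seeds(MASK)]
summary = {name: mean(t[name] for t in traits) for name in ("length", "width", "area")}
print(traits)
print(summary)
```

A real pipeline would convert pixels to physical units using the scanner resolution and would handle touching or rotated seeds, which a bounding box does not.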

A Strategy for Association Studies
– Basic QTL/GWAS analysis: R/qtl, QTLcartographer, et al.; community can integrate these into the CI
– Iterative analyses: iPlant workflow management simplifies automation; compare methods!
– Exploratory methods: hand-built R, Python, SAS, C codes; easy integration into iPlant CI via API; adopt common data model
– Scalability challenges: high-density markers, large populations, combinatorial analyses; iPlant-authored parallel GLM (etc.) implementations; common data model; utilize workflow framework

Statistical Inference: Scalable GLM
– 6 traits of interest
– 40 million markers in maize NAM
– 1000 replicate analyses
– Epistasis testing
– Simplest case*: a few minutes using GLM on desktop TASSEL
– 1000-replicate bootstrap: hours / trait
– Runtimes only get larger (days to years) for more complex analyses
* One trait x 40 million markers with no bootstrapping or epistasis testing
[Diagram labels: Genotype; Phenotype; ANOVA]
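To make the scale concrete: 6 traits x 40 million markers x 1000 bootstrap replicates is 2.4 x 10^11 single-marker tests, before any epistasis testing. The sketch below shows that count and a toy bootstrap scan, assuming a simple linear regression of phenotype on genotype per marker as a stand-in for the GLM. It is not TASSEL's implementation; `marker_r2`, `bootstrap_scan`, and the toy data are hypothetical.

```python
import random

TRAITS = 6
MARKERS = 40_000_000
REPLICATES = 1000

# Total single-marker tests for the full bootstrap (no epistasis)
bootstrap_tests = TRAITS * MARKERS * REPLICATES
print(f"{bootstrap_tests:.1e} single-marker tests")


def marker_r2(genotypes, phenotypes):
    """R^2 of a simple linear regression of phenotype on one marker."""
    n = len(genotypes)
    mg = sum(genotypes) / n
    mp = sum(phenotypes) / n
    sxx = sum((g - mg) ** 2 for g in genotypes)
    syy = sum((p - mp) ** 2 for p in phenotypes)
    if sxx == 0 or syy == 0:  # no variation in this resample
        return 0.0
    sxy = sum((g - mg) * (p - mp) for g, p in zip(genotypes, phenotypes))
    return sxy * sxy / (sxx * syy)


def bootstrap_scan(marker_matrix, phenotypes, replicates, seed=0):
    """Resample individuals with replacement; count how often each marker scores best."""
    rng = random.Random(seed)
    n = len(phenotypes)
    wins = [0] * len(marker_matrix)
    for _ in range(replicates):
        idx = [rng.randrange(n) for _ in range(n)]
        y = [phenotypes[i] for i in idx]
        scores = [marker_r2([m[i] for i in idx], y) for m in marker_matrix]
        wins[scores.index(max(scores))] += 1  # ties go to the first marker
    return wins


# Toy data: marker 0 tracks the phenotype, marker 1 does not
markers = [
    [0, 0, 1, 1, 0, 1, 0, 1],
    [0, 1, 0, 1, 1, 0, 0, 1],
]
phenotype = [1.0, 1.1, 2.0, 2.1, 0.9, 1.9, 1.0, 2.2]

wins = bootstrap_scan(markers, phenotype, replicates=200)
print(wins)
```

Each marker and each replicate is scored independently of the others, which is why the workload is a natural fit for parallel implementations.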

GPU-based QTL Mapping
– Aspects of the problem are highly parallel
– Re-architect data flow and mapping algorithms for GPU architecture
– Interface for C and GPU implementations will be identical
– Ali Akoglu and Dave Lowenthal, UArizona
– Alignment-based protein searches sped up 6-10x

iPlant Tree of Life (iPToL)
– Large phylogenetic inference: Building a tree of life for up to 500,000 green plants
– Tree Visualization: Scalable visualization for small to large trees
– Data Assembly and Integration: Acquisition, organization and processing of the data
– Taxonomic Intelligence: Sorting out different names for the same species
– Tree Reconciliation: Resolving discordant gene and species trees
– Trait Evolution: Using the tree to understand how traits evolved

Phyloviewer: visualization of large phylogenetic trees

My-Plant
– Social networking for plant biologists
– Organized by clade
– Used to organize the data collection for the “big tree”

Taxonomic Name Resolution Service

Integration of New Tools w/o Programming This part is done!!! This part is coming soon!

Related Activities
– Integrated Breeding Platform
– Social networking portal for plant breeders
– R analysis packages
– Breeders fieldbook
– 1kp (1,000 plant transcriptomes)
– DOE’s Knowledgebase (Kbase)
– Seed projects
– Elixir
– CoGe

Future Workshop Activities
– Small tool/workflow integration meetings
– 2-3 days each, local participants
– 4-5 meetings starting in June 2011
– Addressing specific biological questions
– With appropriate test data and available software
– Building on iPlant’s cyberinfrastructure
– Complementary tools and additional data access
– Preference for broad use, high impact tools & workflows
– Can be kept private until published
– Positive results will stimulate additional support

iPlant’s Building Blocks
Metadata, Data, Tools, Workflows, Viz
Executive Team: Steve Goff, Dan Stanzione
Faculty Advisors: Greg Andrews, Kobus Barnard, Susan Brown, Vicki Chandler, John Hartman, Nirav Merchant, Sudha Ram, Ann Stapleton, Lincoln Stein, Doreen Ware, Sue Wessler, Ramin Yadegari
Students: Storme Briscoe, Steven Gregory, Monica Lent, Bansri Poduval, Pavithra Ravi, Shannon Wermes, Jill Yarmchuk
Staff: Greg Abram, Victoria Bryan, Rion Dooley, Andy Edmonds, Juan Antonio Raygoza Garay, Karla Gendler, Damian Gessler, Cornel Ghiban, Michael Gonzales, Hariolf Häfele, Matthew Helmke, Natalie Henriques, Uwe Hilgert, Nicole Hopkins, Lisa Howells, Kathleen Kennedy, Mohammed Khalfan, Seung-jin Kim, Adam Kubach, Sangeeta Kuchimanchi, Tina Lee, Andrew Lenards, Sonya Lowry, Jerry Lu, Eric Lyons, Naim Matasci, Sheldon McKay, Dave Micklos, Andy Muir, Martha Narro, Christos Noutos, Dennis Roberts, Bernice Rogowitz, Jerry Schneider, Bruce Schumaker, Edwin Skidmore, Sriram Srinivasan, Mary Margaret Sprinkle, Matthew Vaughn, Liya Wang, Sharon Wei, Jason Williams, Frank Willmore, John Wregglesworth, Weijia Xu