The iPlant Collaborative IBP Annual Meeting – June 1st 2011 Steve.


1 The iPlant Collaborative
IBP Annual Meeting – June 1st 2011
Steve Goff (sgoff@iplantcollaborative.org)
iPlant Collaborative, BIO5 Institute, School of Plant Science, University of Arizona
www.iplantcollaborative.org

2 What is iPlant?
iPlant’s mission is to build the cyberinfrastructure (CI) to support plant biology’s Grand Challenge solutions.
– Phase I: Community Input
– Phase II: Building the CI Foundation
– Next Phase: Enabling Plant Science Discovery
– Now need to integrate workflows and test theories
– Will support tool integration and synthesis activities

3 NSF Cyberinfrastructure Vision
– High Performance Computing
– Data and Data Analysis
– Virtual Organizations
– Learning and Workforce
Ref: “Cyberinfrastructure Vision for 21st Century Discovery”, NSF Cyberinfrastructure Council, March 2007.

4 CI for Plant Science: Observations
– Investment in data creation is high
– Sources of data are disparate
– Investment in existing tools is significant
– Tools shouldn’t be discarded
– Tools shouldn’t be reproduced, but they lack:
  – Interoperability with other tools
  – Data standards
  – Scalability
  – Consistency of interface access and use
  – Experimental reproducibility

5 iPlant is a process and a platform (or set of platforms, depending on your point of view).

6 Computational & Storage Capability
– Compute: Ranger, Lonestar, Stampede (UT/TeraGrid); Saguaro, Sonora (ASU); Marin, Ice (UA); ~700 teraflops
– Storage: Corral, Ranch (UT); Ocotillo (ASU); more than 10 petabytes of storage available for the project
– Visualization: Spur, Stallion (UT); Matinee (ASU); UA-Cave; among the world’s largest visualization systems
– Virtualized/Cloud Services: iPlant, TeraGrid, vendor clouds; cloud technology to deliver persistent gateways and user services
Thanks to large-scale NSF investments, iPlant has excellent CI access.

7 iPlant Cyberinfrastructure (diagram labels): Bench Biologists; Computational Biologists; APIs; Data; Algorithms; Discovery Environment; Data Store; Atmosphere; Semantic Web Layer

8 Overview of Components
– iPlant Discovery Environment: Core Software
– iRODS Integration: Core Services
– Atmosphere Cloud: Core Services
– Semantic Web Tech: SSWAP Team
– iPlant Tool/Workflow API: Core Software & Engagement Teams

9 Architecture diagram (labels in reading order)
– Discovery Environment; DNA Subway; 3rd Party Science Gateways; User Scripts & Applications
– Public APIs
– Low-Level Services: Event, I/O, Data, Apps, Job, Profile, Auth
– Condor, PBS, SGE, LSF, LL; iRODS; MySQL; LDAP; Eucalyptus; Action Folders; Shibboleth; Globus/Unicore; GPIR; MyProxy; XSEDE
– iPlant Hardware Resources: High Perf Computing, Databases, Storage, Cloud Systems, Semantic Web

10 iRODS: Integrated Rule-Oriented Data System
Why iRODS?
– Large data storage in simple format
– Sharing of large data among iPlant CI resources
– Sharing of large data with colleagues and collaborators
– Processing large data with TACC resources
General information on iRODS: www.irods.org
Access iPlant’s iRODS: irodsweb.iplantcollaborative.org
Documentation: https://pods.iplantcollaborative.org/wiki/display/systems/iRODS

11 Atmosphere: iPlant’s Cloud Computing Resources
http://atmosphere.iplantcollaborative.org
Tutorial: https://pods.iplantcollaborative.org/wiki/display/atmosphere/Demo+with+picture+walkthrough
Why Atmosphere?
– Use a virtual machine (VM) with preinstalled software
– Create a VM to install complex software
– Create and share an image of a VM (VMI)
– Mount data from iPlant iRODS for use by your VM

12 Semantic Web
http://www.iplantcollaborative.org/communities/developers/semanticweb
Why Semantic Web Technology?
– Provides a means for web services to communicate and be aware of one another
Diagram labels:
– iPlant Consumer / Semantic Web / Remote Service
– User-Created Service in Atmosphere / Semantic Web / iPlant’s Discovery Environment
– iPlant Service / Semantic Web / Remote Consumer

13 iPG2P: From Genotype to Phenotype
– Visual Analytics (R. Grene and G. Abram): information visualization tools capable of displaying diverse types of data from laboratory, field, and in silico analyses and simulations
– Data Integration (D. Ware and C. Jordan): methods for describing and unifying data sets into systems that support iPG2P activities
– Statistical Inference (D. Kliebenstein and E. Buckler): platform for using advanced computational approaches to statistically link genotype to phenotype
– Modeling Tools (J. White, C. Myers, S. Welch): framework for the construction, simulation and analysis of computational plant models
– Ultra High Throughput Sequencing (T. Brutnell and M. Vaughn): HPC resources and applications to process large-volume sequence data

14 Genome Services: Ultra High-Throughput Sequencing
Scalable computing (pipeline diagram labels):
– Data: NCBI SRA, Desktop, Amazon S3, FTP, HTTP
– Data Wrangling: Quality Control, Preprocessing, Rescaling, Barcoding
– Alignments: BWA, TopHat, Cufflinks, SAMTools
– SAM Alignments; Expression Levels (RPKM); Genome Variants (VCF3.3)
Community Use Cases: expression studies, forward genetic screens, association studies
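The "Expression Levels (RPKM)" output on this slide can be illustrated with the standard RPKM formula: reads mapped to a gene, scaled by gene length in kilobases and by total mapped reads in millions. The sketch below is only an illustration of that formula; the gene names, counts, and the `rpkm` helper are hypothetical and are not taken from iPlant's pipeline.

```python
def rpkm(read_count: int, gene_length_bp: int, total_mapped_reads: int) -> float:
    """Reads Per Kilobase of gene per Million mapped reads.

    RPKM = reads * 1e9 / (gene length in bp * total mapped reads)
    """
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and total mapped reads must be positive")
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


# Hypothetical per-gene (read count, gene length in bp) pairs.
counts = {"geneA": (500, 2000), "geneB": (500, 1000)}
total_mapped = 10_000_000  # hypothetical library size

expression = {
    gene: rpkm(reads, length, total_mapped)
    for gene, (reads, length) in counts.items()
}
```

With equal read counts, the shorter gene gets the higher RPKM, which is the length normalization the measure is designed to provide.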

15 High Throughput Image Analysis
Scope: Enable image-based plant sciences research by incorporating image processing algorithms, grid computing, and databasing into an analysis pipeline
Objectives:
1. Integrate Phytomorph and BISQUE as PhytoBisque
2. Broaden access to algorithms that benefit the community
3. Automate workflows so that plant biologists need not be computer scientists
Diagram labels: Storage, Authentication, APIs, Compute cluster
E. Spalding (U of Wisconsin); B.S. Manjunath and K. Kvilekval (UCSB)

16 PhytoBisque: Example Use Case
– Given a flatbed scanner image of Arabidopsis seeds, measure the length, width, and area, and produce a population estimate for each trait
– Seed trait QTL can be mapped when applied to mapped populations like Ler x CVI
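The use case above (measure each seed, then summarize the population) can be sketched on a toy binary mask. This is not PhytoBisque's algorithm: it assumes the scan has already been thresholded into foreground (`#`) and background (`.`) pixels, uses bounding-box extents as stand-ins for length and width, and every name here (`measure_seeds`, `population_estimate`, `MASK`) is hypothetical.

```python
from statistics import mean, stdev


def measure_seeds(mask):
    """Label 4-connected '#' regions and measure each one in pixels."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    seeds = []
    for r0 in range(rows):
        for c0 in range(cols):
            if mask[r0][c0] != "#" or seen[r0][c0]:
                continue
            # Flood fill from this unvisited foreground pixel.
            stack = [(r0, c0)]
            seen[r0][c0] = True
            pixels = []
            while stack:
                r, c = stack.pop()
                pixels.append((r, c))
                for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                    if (0 <= nr < rows and 0 <= nc < cols
                            and mask[nr][nc] == "#" and not seen[nr][nc]):
                        seen[nr][nc] = True
                        stack.append((nr, nc))
            height = max(r for r, _ in pixels) - min(r for r, _ in pixels) + 1
            breadth = max(c for _, c in pixels) - min(c for _, c in pixels) + 1
            seeds.append({
                "area": len(pixels),             # pixel count
                "length": max(height, breadth),  # longer bounding-box side
                "width": min(height, breadth),   # shorter bounding-box side
            })
    return seeds


def population_estimate(seeds, trait):
    """Return (mean, sample standard deviation) of one trait across seeds."""
    values = [s[trait] for s in seeds]
    return mean(values), (stdev(values) if len(values) > 1 else 0.0)


# Toy mask with two "seeds"; a real scan would contain many more.
MASK = [
    "..........",
    ".####.....",
    ".####..##.",
    ".......##.",
    ".......##.",
    "..........",
]

seeds = measure_seeds(MASK)
area_mean, area_sd = population_estimate(seeds, "area")
```

A real pipeline would convert pixels to physical units using the scanner resolution and would likely fit an ellipse rather than a bounding box; both steps are left out of this sketch.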

17 A Strategy for Association Studies
– Basic QTL/GWAS analysis: R/qtl, QTL Cartographer, et al. Community can integrate these into the CI
– Iterative analyses: iPlant workflow management simplifies automation. Compare methods!
– Exploratory methods: hand-built R, Python, SAS, C codes. Easy integration into iPlant CI via API. Adopt common data model
– Scalability challenges: high-density markers, large populations, combinatorial analyses. iPlant-authored parallel GLM (etc.) implementations. Common data model. Utilize workflow framework

18 Statistical Inference: Scalable GLM
– 6 traits of interest
– 40 million markers in maize NAM
– 1000 replicate analyses
– Epistasis testing
Diagram labels: Genotype, Phenotype, ANOVA
– Simplest case*: a few minutes using GLM on desktop TASSEL
– 1000-replicate bootstrap: 75-150 hours / trait
– Runtimes only get larger (days to years) for more complex analyses
* One trait x 40 million markers with no bootstrapping or epistasis testing
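The runtime figures on this slide are consistent with each other, which a few lines of arithmetic make visible. The sketch below uses only the numbers quoted above (75-150 hours per trait, 1000 replicates, 6 traits). The worker count and the assumption of ideal scaling across independent replicates are hypothetical and serve only to show why a parallel GLM is attractive.

```python
def per_replicate_minutes(hours_per_trait: float, replicates: int) -> float:
    """Average minutes spent on one bootstrap replicate for one trait."""
    return hours_per_trait * 60 / replicates


def sequential_hours(hours_per_trait: float, traits: int) -> float:
    """Total hours if every trait is bootstrapped one after another."""
    return hours_per_trait * traits


def ideal_parallel_hours(total_hours: float, workers: int) -> float:
    """Wall-clock hours under perfect scaling (an optimistic assumption)."""
    return total_hours / workers


REPLICATES = 1000
TRAITS = 6
LOW_HOURS, HIGH_HOURS = 75, 150   # per-trait bootstrap range from the slide
WORKERS = 1000                    # hypothetical number of parallel workers

minutes_low = per_replicate_minutes(LOW_HOURS, REPLICATES)    # 4.5
minutes_high = per_replicate_minutes(HIGH_HOURS, REPLICATES)  # 9.0
total_low = sequential_hours(LOW_HOURS, TRAITS)               # 450 hours
total_high = sequential_hours(HIGH_HOURS, TRAITS)             # 900 hours
parallel_low = ideal_parallel_hours(total_low, WORKERS)
parallel_high = ideal_parallel_hours(total_high, WORKERS)
```

Each replicate costs roughly 4.5 to 9 minutes, in line with the "few minutes" quoted for the simplest case, and six traits run back to back would take about 19 to 38 days before any epistasis testing is added.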

19 GPU-based QTL Mapping
– Aspects of the problem are highly parallel
– Re-architect data flow and mapping algorithms for GPU architecture
– Interface for C and GPU implementations will be identical
– Alignment-based protein searches sped up 6-10x
Ali Akoglu and Dave Lowenthal, UArizona
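One reason the problem is "highly parallel" is that a single-marker scan tests every marker independently of the others. The sketch below shows that structure on the CPU with a per-marker one-way ANOVA F statistic. It is an illustration under stated assumptions, not the GPU implementation described on the slide; the data and the function names (`anova_f`, `scan_markers`) are hypothetical.

```python
from collections import defaultdict


def anova_f(genotypes, phenotypes):
    """One-way ANOVA F statistic of phenotype grouped by genotype class."""
    groups = defaultdict(list)
    for g, y in zip(genotypes, phenotypes):
        groups[g].append(y)
    n, k = len(phenotypes), len(groups)
    if k < 2 or n - k <= 0:
        return 0.0  # not enough groups or residual degrees of freedom
    grand_mean = sum(phenotypes) / n
    ss_between = 0.0
    ss_within = 0.0
    for values in groups.values():
        group_mean = sum(values) / len(values)
        ss_between += len(values) * (group_mean - grand_mean) ** 2
        ss_within += sum((y - group_mean) ** 2 for y in values)
    if ss_within == 0:
        return float("inf")  # groups separate the phenotype perfectly
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def scan_markers(marker_matrix, phenotypes):
    """Score each marker on its own; no call depends on any other marker,
    so this loop could be spread across threads, nodes, or GPU blocks."""
    return [anova_f(marker, phenotypes) for marker in marker_matrix]


# Hypothetical data: six individuals, two biallelic markers coded 0/1.
phenotypes = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
markers = [
    [0, 0, 0, 1, 1, 1],  # splits low and high phenotypes
    [0, 1, 0, 1, 0, 1],  # nearly uninformative
]

f_stats = scan_markers(markers, phenotypes)
best_marker = max(range(len(f_stats)), key=f_stats.__getitem__)
```

Because `scan_markers` is a pure map over markers, swapping the list comprehension for a parallel map leaves the calling code unchanged.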

20 iPlant Tree of Life (iPToL)
– Large phylogenetic inference: building a tree of life for up to 500,000 green plants
– Tree Visualization: scalable visualization for small to large trees
– Data Assembly and Integration: acquisition, organization and processing of the data
– Taxonomic Intelligence: sorting out different names for the same species
– Tree Reconciliation: resolving discordant gene and species trees
– Trait Evolution: using the tree to understand how traits evolved

21 Phyloviewer: visualization of large phylogenetic trees

22 My-Plant
– Social networking for plant biologists
– Organized by clade
– Used to organize the data collection for the “big tree”

23 Taxonomic Name Resolution Service
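Name resolution means mapping synonyms and misspellings onto one accepted name per species. The sketch below illustrates the idea with a tiny in-memory lookup table and fuzzy matching from Python's standard `difflib` module. It does not use or reproduce the Taxonomic Name Resolution Service's API or matching algorithm; the table contents, the similarity cutoff, and the `resolve` helper are all assumptions made for illustration.

```python
from difflib import get_close_matches

# Hypothetical lookup table: submitted name -> accepted name.
ACCEPTED = {
    "Solanum lycopersicum": "Solanum lycopersicum",
    "Lycopersicon esculentum": "Solanum lycopersicum",  # older tomato synonym
    "Zea mays": "Zea mays",
    "Arabidopsis thaliana": "Arabidopsis thaliana",
}


def resolve(name, cutoff=0.85):
    """Return the accepted name for a submitted name, or None if unmatched."""
    # Normalize whitespace and case: capitalized genus, lowercase epithet.
    cleaned = " ".join(name.split())
    cleaned = cleaned[:1].upper() + cleaned[1:].lower()
    if cleaned in ACCEPTED:
        return ACCEPTED[cleaned]
    # Fall back to the closest known name above the similarity cutoff.
    match = get_close_matches(cleaned, list(ACCEPTED), n=1, cutoff=cutoff)
    return ACCEPTED[match[0]] if match else None


synonym = resolve("Lycopersicon esculentum")
misspelled = resolve("Arabidopsis thalliana")
messy = resolve("  zea   MAYS ")
unknown = resolve("Quercus alba")
```

Returning `None` for unmatched names, rather than the nearest guess, keeps uncertain cases visible so a curator can review them.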

24 Integration of New Tools without Programming
Diagram callouts: “This part is done!!!” and “This part is coming soon!”

25 Related Activities
– Integrated Breeding Platform
– Social networking portal for plant breeders
– R analysis packages
– Breeders fieldbook
– 1kp (1,000 plant transcriptomes)
– DOE’s Knowledgebase (Kbase)
– Seed projects
– Elixir
– CoGe

26 Future Workshop Activities
– Small tool/workflow integration meetings
– 2-3 days each, 10-20 local participants
– 4-5 meetings starting in June 2011
– Addressing specific biological questions
– With appropriate test data and available software
– Building on iPlant’s cyberinfrastructure
– Complementary tools and additional data access
– Preference for broad use, high impact tools & workflows
– Can be kept private until published
– Positive results will stimulate additional support

27 iPlant’s Building Blocks: Metadata, Data, Tools, Workflows, Viz
Executive Team: Steve Goff, Dan Stanzione
Faculty Advisors: Greg Andrews, Kobus Barnard, Susan Brown, Vicki Chandler, John Hartman, Nirav Merchant, Sudha Ram, Ann Stapleton, Lincoln Stein, Doreen Ware, Sue Wessler, Ramin Yadegari
Staff: Greg Abram, Victoria Bryan, Rion Dooley, Andy Edmonds, Juan Antonio Raygoza Garay, Karla Gendler, Damian Gessler, Cornel Ghiban, Michael Gonzales, Hariolf Häfele, Matthew Helmke, Natalie Henriques, Uwe Hilgert, Nicole Hopkins, Lisa Howells, Kathleen Kennedy, Mohammed Khalfan, Seung-jin Kim, Adam Kubach, Sangeeta Kuchimanchi, Tina Lee, Andrew Lenards, Sonya Lowry, Jerry Lu, Eric Lyons, Naim Matasci, Sheldon McKay, Dave Micklos, Andy Muir, Martha Narro, Christos Noutos, Dennis Roberts, Bernice Rogowitz, Jerry Schneider, Bruce Schumaker, Edwin Skidmore, Sriram Srinivasan, Mary Margaret Sprinkle, Matthew Vaughn, Liya Wang, Sharon Wei, Jason Williams, Frank Willmore, John Wregglesworth, Weijia Xu
Students: Storme Briscoe, Steven Gregory, Monica Lent, Bansri Poduval, Pavithra Ravi, Shannon Wermes, Jill Yarmchuk

