
The iPlant Tree of Life Project and Toolkit: Building a Cyberinfrastructure for Plant Science Research – Naim Matasci, The iPlant Collaborative – Evolution 2011, Jun 17-21, 2011

What is iPlant?

Discovery Environment NEW RELEASE COMING SOON!

Physical Infrastructure – Computation: 63K-core cluster, 20K-core cluster, 1 TB RAM – Storage: 2 PB, 20 PB archive

Cloud Storage – Store, access and share large datasets – Multiple points of entry: web interface, mounted file system, API – Free and secure. AVAILABLE NOW!

Cloud Computing Virtual Machines – Up to 4 cores, 32 GB RAM, 100 GB dedicated disk – Run any x86-compatible OS (even Windows) – Persistent or on-demand – Log in via SSH or secure VNC Use Cases – Internet-enabled Servers – Database management appliances – Virtual desktops – … The sky is the limit! AVAILABLE NOW!

Consumer Applications – iPlant's CI

iPlant Tree of Life Grand Challenge – Large phylogenetic inference: building a tree of life for up to 500,000 green plant species – Tree Visualization: scalable visualization for small to large trees – Data Assembly and Integration: acquisition, organization and processing of the data – Taxonomic Intelligence: sorting out different names for the same species – Tree Reconciliation: resolving discordant gene and species trees – Trait Evolution: using trees to understand how traits evolved

BIG TREES To optimize existing methods to construct phylogenetic trees on the order of 500K taxa.

Big Trees – NINJA/WINDJAMMER (Travis Wheeler): Neighbor-Joining implementation that can analyze > 200K species; six-day run time reduced 32-fold to 4.5 hours for a 220K-species data set; two- to three-day run time reduced 1,800-fold to 2 minutes for the distance matrix calculation on the 220K set – RAxML-Light (Alexandros Stamatakis): large-scale Maximum Likelihood implementation; 55K-taxon tree published (Stephen A. Smith et al., "Understanding angiosperm diversification using small and large phylogenetic trees," American Journal of Botany 98, no. 3 (2011): ) AVAILABLE NOW!
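
For readers unfamiliar with neighbor joining, the following is a minimal Python sketch of the textbook algorithm (Saitou and Nei 1987) that NINJA accelerates; it is not NINJA's code. It keeps the full distance matrix in memory and runs in O(n^3) time, which illustrates why naive implementations struggle with 220K-taxon data sets. The nested-dict input format and the internal node names (node0, node1, ...) are assumptions made for this example, and the five-taxon matrix is a small additive toy data set.

```python
from itertools import combinations


def neighbor_joining(dist):
    """Textbook neighbor joining (Saitou and Nei 1987).

    dist: symmetric dict of dicts, dist[a][b] = distance between taxa a and b
    (no diagonal entries). Returns a list of (parent, child, branch_length)
    edges; internal nodes get hypothetical names "node0", "node1", ...
    """
    d = {a: dict(row) for a, row in dist.items()}  # working copy
    edges = []
    counter = 0
    while len(d) > 2:
        nodes = list(d)
        n = len(nodes)
        # r[i] = sum of distances from i to all other current nodes
        r = {i: sum(d[i][k] for k in nodes if k != i) for i in nodes}
        # Join the pair minimizing Q(i, j) = (n - 2) d(i, j) - r(i) - r(j);
        # ties are broken by node name through tuple comparison
        _, i, j = min(
            ((n - 2) * d[i][j] - r[i] - r[j], i, j)
            for i, j in combinations(nodes, 2)
        )
        u = f"node{counter}"
        counter += 1
        # Branch lengths from the new node u to i and j
        li = 0.5 * d[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        edges += [(u, i, li), (u, j, lj)]
        # Distances from u to every remaining node k
        du = {k: 0.5 * (d[i][k] + d[j][k] - d[i][j])
              for k in nodes if k not in (i, j)}
        del d[i], d[j]
        for k, row in d.items():
            del row[i], row[j]
            row[u] = du[k]
        d[u] = du
    a, b = d  # the last two nodes are joined by a single edge
    edges.append((a, b, d[a][b]))
    return edges


D = {
    "a": {"b": 5, "c": 9, "d": 9, "e": 8},
    "b": {"a": 5, "c": 10, "d": 10, "e": 9},
    "c": {"a": 9, "b": 10, "d": 8, "e": 7},
    "d": {"a": 9, "b": 10, "c": 8, "e": 3},
    "e": {"a": 8, "b": 9, "c": 7, "d": 3},
}
for parent, child, length in neighbor_joining(D):
    print(parent, child, length)
```

On this matrix the sketch recovers terminal branch lengths a = 2, b = 3, c = 4, d = 2 and e = 1, which reproduce the input distances exactly because the matrix is additive.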

TREE VISUALIZATION To develop an application for viewing, analyzing and exploring large phylogenetic trees.

Tree Visualization – > 500K taxa – Fast – Web-based, platform independent – Semantic zooming – Metadata-driven display of information
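
Semantic zooming means that the level of detail, not just the scale, changes as the user zooms. One simple way to sketch this, assuming clades are collapsed by size, is shown below. The collapse rule (hide clades with fewer than total/2^zoom leaves), the Node class and the bracketed Newick-like output are hypothetical choices for illustration, not the iPlant Tree Viewer's design.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    children: list[Node] = field(default_factory=list)


def count_leaves(node: Node, cache: dict[int, int]) -> int:
    """Number of leaves below node, memoized by object identity."""
    key = id(node)
    if key not in cache:
        cache[key] = (1 if not node.children
                      else sum(count_leaves(c, cache) for c in node.children))
    return cache[key]


def semantic_view(root: Node, zoom: int) -> str:
    """Newick-like view in which clades below a zoom-dependent size are collapsed.

    Hypothetical rule: at zoom level z, clades with fewer than
    total_leaves / 2**z leaves are drawn as a single labelled summary.
    """
    cache: dict[int, int] = {}
    min_leaves = count_leaves(root, cache) / 2 ** zoom

    def walk(node: Node) -> str:
        if not node.children:
            return node.name
        n = count_leaves(node, cache)
        if n < min_leaves:
            return f"{node.name}[{n} taxa]"  # collapsed summary of the clade
        return "(" + ",".join(walk(c) for c in node.children) + ")" + node.name

    return walk(root) + ";"


tree = Node("root", [
    Node("X", [Node("a"), Node("b")]),
    Node("Y", [Node("Y1", [Node("c"), Node("d"), Node("e")]), Node("f")]),
])
for z in range(3):
    print(z, semantic_view(tree, z))
```

At zoom 0 only the two top-level clades are shown, as summaries; each further zoom level halves the size threshold, so progressively smaller clades are expanded. For trees with hundreds of thousands of tips, the recursion would need to be made iterative to stay within Python's recursion limit.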

iPlant Tree Viewer Prototype AVAILABLE NOW!

1KP Collaboration (1KP) – To support the data analysis of the Thousand Plant Transcriptomes Project

1KP: unexplored territory. [Figure: number of genes, N(genes), plotted against number of species, N(species). Completed genomes: dozens of species. PCR: dozens of genes in 10^4 species.]

Broad phylogenetic coverage: algae, non-flowering plants, flowering plants (angiosperms) – The role of polyploidy in Darwin's "abominable mystery" – Phylogenomics of 1000 species across plant taxa

TREE RECONCILIATION To reconcile the evolutionary history of genes and species.

Tree Reconciliation (gene family data courtesy of John Bowers)
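
One standard way to explain discordance between a gene tree and a species tree is least-common-ancestor (LCA) mapping, sketched here under the assumption of rooted, binary trees in which every gene is labelled with the species it was sampled from. The tree encodings and gene IDs are hypothetical, and the slides do not say which reconciliation method iPlant uses.

```python
from __future__ import annotations


def ancestors(node: str, parent: dict[str, str]) -> list[str]:
    """Path from node up to the root of the species tree."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def lca(a: str, b: str, parent: dict[str, str]) -> str:
    """Least common ancestor of two species-tree nodes."""
    above_a = set(ancestors(a, parent))
    for x in ancestors(b, parent):
        if x in above_a:
            return x
    raise ValueError(f"{a} and {b} are not in the same species tree")


def reconcile(gene_tree, gene_to_species, parent):
    """LCA-map a binary gene tree onto a species tree.

    Returns (species node the gene-tree root maps to, number of duplications).
    A gene-tree node is a duplication when it maps to the same species node
    as one of its children.
    """
    if isinstance(gene_tree, str):  # leaf: a gene ID
        return gene_to_species[gene_tree], 0
    left, right = gene_tree
    m_left, dups_left = reconcile(left, gene_to_species, parent)
    m_right, dups_right = reconcile(right, gene_to_species, parent)
    m = lca(m_left, m_right, parent)
    is_dup = m in (m_left, m_right)
    return m, dups_left + dups_right + int(is_dup)


# Species tree ((A,B)AB,C)root, encoded as child -> parent
parent = {"A": "AB", "B": "AB", "AB": "root", "C": "root"}
gene_to_species = {"a1": "A", "a2": "A", "b1": "B", "b2": "B", "c1": "C"}

# Gene tree (((a1,b1),(a2,b2)),c1), encoded as nested 2-tuples
gene_tree = ((("a1", "b1"), ("a2", "b2")), "c1")
print(reconcile(gene_tree, gene_to_species, parent))
```

The example prints `('root', 1)`: the two A/B gene pairs both map to the AB ancestor, so the node joining them is inferred as one duplication mapped to AB, while the rest of the gene tree agrees with the species tree.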

TAXONOMIC NAME RESOLUTION Collaboration (BIEN) - To unify and resolve synonymous, erroneous, or other conflicting taxonomic names.

Taxonomic uncertainty
1. Non-existent names: misspellings, contamination, annotations, morphospecies, digitization issues (frame shifts, character encoding), lexical variants (digitization conventions)
2. Synonymy: nomenclatural synonyms, taxonomic synonyms / concepts
3. Misidentifications, incomplete identifications

AS SEEN IN NATURE! AVAILABLE NOW!

Taxonomic Name Resolution Service – Computer-assisted standardization of plant names – Corrects spelling errors and alternative spellings to a standard list of names – Converts out-of-date names to currently accepted names
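
The two core steps listed above, correcting misspellings against a reference list and converting out-of-date names to accepted ones, can be sketched with Python's standard difflib module. This is only an illustration: the TNRS uses its own fuzzy-matching algorithm and a full taxonomic database, whereas the three-name reference list, the one-entry synonym table and the 0.8 similarity cutoff below are assumptions for the example.

```python
from __future__ import annotations

import difflib

# Tiny illustrative reference list (a real service uses a full taxonomic database)
ACCEPTED = {"Arabidopsis thaliana", "Solanum lycopersicum", "Zea mays"}
# Out-of-date name -> currently accepted name
SYNONYMS = {"Lycopersicon esculentum": "Solanum lycopersicum"}


def normalize(name: str) -> str:
    """Fix lexical variants: collapse whitespace, capitalize genus, lowercase the rest."""
    parts = name.split()
    if not parts:
        return ""
    return " ".join([parts[0].capitalize()] + [p.lower() for p in parts[1:]])


def resolve(name: str, cutoff: float = 0.8) -> str | None:
    """Return the currently accepted name for `name`, or None if nothing matches."""
    known = sorted(ACCEPTED | set(SYNONYMS))
    # Spelling correction: closest known name by difflib similarity ratio
    matches = difflib.get_close_matches(normalize(name), known, n=1, cutoff=cutoff)
    if not matches:
        return None
    # Synonymy: map an out-of-date name to its accepted name
    return SYNONYMS.get(matches[0], matches[0])


for raw in ["arabidopsis  thalina", "Lycopersicon esculentum", "Quercus robur"]:
    print(repr(raw), "->", resolve(raw))
```

The first name combines a lexical variant with a misspelling and resolves to Arabidopsis thaliana; the second is an out-of-date name converted to Solanum lycopersicum; the third is absent from the toy reference list, so the sketch returns None instead of guessing.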

TRAIT EVOLUTION To develop an infrastructure for downstream analysis of large trees.

Trait Evolution Toolkit to study the evolution of traits of interest on very large phylogenies – Diversification – Biogeographic patterns – Adaptation – Co-evolution – …

Current analyses (proof of concept) – Phylogenetically Independent Contrasts (Felsenstein 1985) – Continuous Ancestral Character Estimation (Schluter et al. 1997; Paradis 2004) – Discrete Ancestral Character Estimation (Pagel 1994; Paradis 2004)
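
Felsenstein's (1985) independent contrasts can be computed in a single post-order pass over the tree. The sketch below assumes a rooted binary tree encoded as nested (label, branch_length) tuples with strictly positive branch lengths; the encoding and the trait values are hypothetical. Besides the standardized contrasts, it returns the inverse-branch-length weighted value at the root, the root-state estimate produced by the same pass.

```python
import math


def _contrasts(node, traits, out):
    """Post-order pass; returns (node value, effective branch length)."""
    label, length = node
    if isinstance(label, str):  # tip: (name, branch_length)
        return traits[label], length
    left, right = label         # internal node: ((left, right), branch_length)
    x1, v1 = _contrasts(left, traits, out)
    x2, v2 = _contrasts(right, traits, out)
    # Standardized contrast: difference scaled by sqrt(v1 + v2), which is
    # proportional to its standard deviation under Brownian motion
    out.append((x1 - x2) / math.sqrt(v1 + v2))
    # Node value: average of the children weighted by inverse branch length
    x = (x1 / v1 + x2 / v2) / (1 / v1 + 1 / v2)
    # Lengthen this node's branch to account for the uncertainty in x
    return x, length + v1 * v2 / (v1 + v2)


def independent_contrasts(tree, traits):
    """Felsenstein (1985) contrasts for a binary tree with positive branch lengths."""
    contrasts = []
    root_value, _ = _contrasts(tree, traits, contrasts)
    return contrasts, root_value


# Tree ((A:1,B:1):1,C:2); with hypothetical trait values
ab = ((("A", 1.0), ("B", 1.0)), 1.0)
tree = ((ab, ("C", 2.0)), 0.0)
print(independent_contrasts(tree, {"A": 1.0, "B": 3.0, "C": 6.0}))
```

For this example the contrasts are -√2 ≈ -1.414 and -4/√3.5 ≈ -2.138, and the root estimate is 26/7 ≈ 3.714.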

Community Integrated (2½-day workshop): EUtils, Lopper, RAxML, Ninja, Phyml, Muscle, PHYLIP, VCF to GFF script, LRmaqqtl, FASTX quality stats, FASTX quality boxplot, FASTX nucleotide distribution, Cuffcompare, ERMINEJ, progressiveMauve, iPlantBorda (mlpy), iPlantCanberra (mlpy), vbay, MECPM, OUCH, Picante, Ontologize, BOWTIE, BWA, TopHat, SHRiMP, Cuffdiff, GNU Core Text utilities, GeneMania, SRA import, PARS, PL, DTT, BBC biclustering

MY-PLANT.ORG To easily share information and research, collaborate, and stay on top of the latest news in the field.

Collaborative Tool AVAILABLE NOW! NEW AND IMPROVED!