DNA Subway Green Line Overview

DNA Subway Green Line Overview

Growth of the Sequence Read Archive (SRA): 2.2 quadrillion bases (plotted on a log scale)

DNA Subway Green Line: Transcriptome analysis
Green Line
- Examine RNA-Seq data for differential expression or annotate a sequenced genome
- Use high-performance computing to analyze complete datasets
- Generate lists of genes and fold-changes; add results to Red Line projects
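To make the "lists of genes and fold-changes" concrete, here is a minimal sketch of the arithmetic behind a fold-change table. It is not the Green Line's own method, which also involves statistical testing. The gene names, the abundance values, and the pseudocount of 1.0 are all made up for illustration.

```python
import math

def log2_fold_change(control, treatment, pseudocount=1.0):
    """Return log2(treatment / control); the pseudocount (an assumption)
    avoids division by zero for genes with no measured expression."""
    return math.log2((treatment + pseudocount) / (control + pseudocount))

def rank_genes(abundances, pseudocount=1.0):
    """abundances maps gene -> (control, treatment) abundance values.
    Returns (gene, log2 fold-change) pairs, largest absolute change first."""
    rows = [(gene, log2_fold_change(c, t, pseudocount))
            for gene, (c, t) in abundances.items()]
    return sorted(rows, key=lambda row: abs(row[1]), reverse=True)

# Hypothetical abundance values, not taken from any real dataset.
example = {"geneA": (7.0, 31.0), "geneB": (15.0, 15.0), "geneC": (63.0, 7.0)}
ranked = rank_genes(example)

for gene, lfc in ranked:
    print(f"{gene}\t{lfc:+.2f}")
```

A positive value means higher abundance in the treatment sample, a negative value means lower, and zero means no change.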

RNA-Seq Overview
Green Line: Differential expression
- RNA collected from multiple samples/time points
- Library prep and sequencing
- QC of reads
- Assembly and mapping
- Abundance estimation

Next Generation RNA-Sequencing for Undergraduates
1. Design experiment (differential expression or genome annotation)
2. Isolate RNA
3. Sequence RNA
4. Analyze RNA sequence datasets using the Green Line of DNA Subway and other bioinformatics tools
5. Follow-up validations

Working Group Faculty Projects 2014
- Agnes Ayme-Southgate, College of Charleston, SC: Gene expression changes in Apis mellifera flight muscle during life-stage transitions
- Judy Brusslan, California State University, Long Beach, CA: Gene expression changes during leaf development and senescence in Arabidopsis thaliana
- Raymond Enke, James Madison University, VA: Gene expression changes during retina development in Gallus gallus
- Shaye Lewis, Prairie View A&M University, TX: Gene expression in caprine testes during juvenile development to puberty
- Irina Makarevitch, Hamline University, MN: Gene expression changes in maize in response to cold stress
- Judith Ogilvie, Saint Louis University, MO: Gene expression changes in the retinas of mice with retinitis pigmentosa
- Jeremy Seto, New York City College of Technology – CUNY, NY: Gene expression changes during differentiation of rat pheochromocytoma line cells (PC12) to a neuronal-like phenotype
- Carrie Thurber, Abraham Baldwin Agricultural College, IL: Gene expression changes during seed abscission in Sorghum bicolor
- George Ude, Bowie State University, MD: Transcriptome analysis of floral inflorescence genes in banana/plantains
- Deirdre Vaden, Prairie View A&M University, TX: Gene expression changes in peripheral blood mononuclear cells from hypertensive rats treated with captopril
- Scott Woody, University of Wisconsin, WI: Gene expression changes upon gibberellic acid exposure in Brassica rapa (Fast Plants, self-compatible) gibberellic acid (gad) mutants

RNA-Seq Overview
Green Line: Differential expression and genome annotation
- Biologically rich: molecular techniques, sequencing technologies
- Technical challenge (bottleneck)
- Biologically rich: expression plots, validation experiments
Image from Advanced Sequencing Technologies & Applications

RNA-Seq Overview
Green Line: Technical challenges
1. Crowded field of choices

RNA-Seq Overview
Green Line: Technical challenges
2. Bioinformatics skill
HWUSI-EAS455:3:1:1:1096 length=41
CAAGGCCCGGGAACGAATTCACCGCCGTATGGCTGACCGGC
HWUSI-EAS455:3:1:2:1592 length=41
GAGGCGTTGACGGGAAAAGGGATATTAGCTCAGCTGAATCT
+
@SRR HWUSI-EAS455:3:1:2:869 length=41
TGCCAGTAGTCATATGCTTGTCTCAAAGATTAAGCCATGCA
+
Bioinformatician
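The raw reads on this slide are in FASTQ format, where each read takes four lines: an `@` header, the sequence, a `+` separator, and a quality string of the same length as the sequence. Below is a minimal parsing sketch under that assumption. The accession `example.1` and the quality string are placeholders, because the slide's own identifiers and quality lines did not survive extraction.

```python
import io
from typing import Iterator, NamedTuple, TextIO

class FastqRecord(NamedTuple):
    header: str
    sequence: str
    quality: str

def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a four-line-per-read FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:  # end of file (or a trailing blank line)
            return
        sequence = handle.readline().rstrip("\n")
        separator = handle.readline().rstrip("\n")
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed FASTQ record near: {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in: {header!r}")
        yield FastqRecord(header[1:], sequence, quality)

# One read from the slide; header prefix and quality values are placeholders.
demo = (
    "@example.1 HWUSI-EAS455:3:1:1:1096 length=41\n"
    "CAAGGCCCGGGAACGAATTCACCGCCGTATGGCTGACCGGC\n"
    "+\n"
    + "I" * 41 + "\n"
)
records = list(read_fastq(io.StringIO(demo)))
for rec in records:
    gc = sum(base in "GC" for base in rec.sequence) / len(rec.sequence)
    print(rec.header, len(rec.sequence), f"GC={gc:.2f}")
```

Checks like this one (read length, GC content) are the kind of step that read QC tools automate.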

RNA-Seq Overview
Green Line: Technical challenges
3. Data and computation

DNA Subway Green Line: Transcriptome analysis
Tuxedo Workflow
- Simple layout
- HPC powered
- Integrated with iPlant Data Store

DNA Subway Green Line: Tuxedo Workflow
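For readers who have not seen the Tuxedo tools on the command line, the sketch below assembles the usual sequence of calls (TopHat, Cufflinks, Cuffmerge, Cuffdiff) as argument lists without running them. It assumes the standard Tuxedo protocol; the slides do not say which options DNA Subway passes, and every file name, sample name, and output directory here is hypothetical.

```python
def tuxedo_commands(bowtie_index, annotation_gtf, genome_fasta, samples):
    """Build (not run) the command lines for a basic Tuxedo analysis.

    samples maps a sample label to its FASTQ path. Returns the list of
    commands plus the lines of the manifest file that cuffmerge reads.
    """
    commands, bam_files, manifest = [], [], []
    for label, fastq in samples.items():
        # Align reads to the reference genome.
        commands.append(["tophat", "-G", annotation_gtf,
                         "-o", f"{label}_tophat", bowtie_index, fastq])
        bam = f"{label}_tophat/accepted_hits.bam"
        # Assemble transcripts and estimate abundance per sample.
        commands.append(["cufflinks", "-o", f"{label}_cufflinks", bam])
        bam_files.append(bam)
        manifest.append(f"{label}_cufflinks/transcripts.gtf")
    # Merge the per-sample assemblies into one annotation.
    commands.append(["cuffmerge", "-g", annotation_gtf, "-s", genome_fasta,
                     "-o", "merged", "assemblies.txt"])
    # Test for differential expression between the samples.
    commands.append(["cuffdiff", "-o", "diff_out", "-L", ",".join(samples),
                     "merged/merged.gtf", *bam_files])
    return commands, manifest

# Hypothetical inputs for illustration only.
commands, manifest = tuxedo_commands(
    "genome_index", "genes.gtf", "genome.fa",
    {"control": "control.fastq", "treated": "treated.fastq"},
)
for cmd in commands:
    print(" ".join(cmd))
```

The manifest lines would be written to `assemblies.txt` before the `cuffmerge` step runs.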

DNA Subway Green Line: HPC through iPlant Agave API
Stampede
Base Cluster (Dell/Intel/Mellanox):
- Intel Sandy Bridge processors
- Dell dual-socket nodes w/32GB RAM (2GB/core)
- 6,400 nodes
- 56 Gb/s Mellanox FDR InfiniBand interconnect
- More than 100,000 cores, 2.2 PF peak performance
Co-Processors:
- Intel Xeon Phi “MIC” Many Integrated Core processors
- Special release of “Knight’s Corner” (61 cores)
- All MIC cards are on site at TACC: more than 6000 installed
- 7+ PF peak performance
Max Total Concurrency:
- exceeds 500,000 cores
- 1.8M threads
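The base-cluster figures on this slide can be cross-checked with simple arithmetic. The sketch below uses only numbers stated on the slide; the per-node and per-socket core counts are derived from them and are not stated in the source.

```python
# Figures taken from the slide.
nodes = 6400
ram_per_node_gb = 32
ram_per_core_gb = 2
sockets_per_node = 2  # "dual-socket"

# Derived values (inferred from the slide's numbers, not stated on it).
cores_per_node = ram_per_node_gb // ram_per_core_gb    # 16
cores_per_socket = cores_per_node // sockets_per_node  # 8
host_cores = nodes * cores_per_node                    # 102400

print(f"cores per node:   {cores_per_node}")
print(f"cores per socket: {cores_per_socket}")
print(f"host cores:       {host_cores:,}")
```

The result, 102,400 host cores, agrees with the slide's "more than 100,000 cores" for the base cluster.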

DNA Subway Green Line: iPlant Data Store
- Initial 100 GB allocation; TB allocations available
- Automatic data backup
- Easy upload/download and sharing

DNA Subway Public Maize RNA-Seq Dataset

DNA Subway Green Line: Differential expression

DNA Subway “Power Desktop”
- Intuitive interface to support seamless genome “round trip” for eukaryote of choice
- Access high performance computing to analyze whole genome data (RNA-seq, initially)
- Scaffold data to sequenced genomes available in iPlant Data Store
- Directly upload RNA-seq reads as biological evidence for genome annotation using Red Line

The iPlant Collaborative is funded by a grant from the National Science Foundation Plant Cyberinfrastructure Program (#DBI ).