Published by Domenic Caldwell. Modified over 9 years ago.
1
NextGen Pipeline: Enabling the Plant Science Community
Tom Brutnell (lead), Steve Rounsley (co-lead), Matt Vaughn (engagement lead)
Ed Buckler, Justin Borevitz, Todd Mockler, Pat Schnable, Bob Schmitz, Matt Hudson, Brad Barbazuk, Damian Gessler
2
What is NextGen?
Ultra high-throughput sequence analysis (UHTS)
– Several platforms, including 454, ABI SOLiD, and Illumina/Solexa, can generate from 1 to hundreds of Gb of DNA sequence in a single run.
– Library preparations are relatively simple, and kits are available.
– Data analysis is computationally challenging (Tb of data must be processed) and beyond the reach of many experimental biologists.
3
UHTS-RNA UHTS-DNA
5
How will UHTS change plant science?
– Makes phenotyping, not genotyping, rate limiting
  – Genome-wide association studies
  – Allele mining
– Enables a much deeper understanding of “non-model” species
  – 1000 genomes project (transcriptome of 1000 plant species)
  – Genome sequences now available for B. distachyon and S. italica, and for RILs of maize and rice
– Provides detailed transcriptional resolution on a global scale
  – Map 5’ and 3’ UTRs, TSS, transcript isoforms
  – Examine smRNA populations
  – Map methylation, TF binding sites, etc.
6
NextGen 1.0 Pipeline
– Develop a computational pipeline to process ultra-high-throughput sequence datasets.
– The first iteration, the NextGen 1.0 Pipeline, will perform simple variant detection or transcript quantification starting from DNA- and RNA-derived datasets.
– Designed explicitly to support modularity and extensibility.
– Imports FASTQ files and exports data in SAM/BAM format.
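The import step and the modular design described on this slide can be illustrated with a minimal sketch. It assumes standard four-line FASTQ records with Phred+33 quality encoding. The names `read_fastq`, `mean_quality`, `quality_filter`, and `run_pipeline` are hypothetical and are not taken from the NextGen pipeline itself; alignment and SAM/BAM export are deliberately left out.

```python
import io
from typing import Callable, Iterable, Iterator, List, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str
    sequence: str
    quality: str


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ stream (assumes four lines per record)."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of input
        header = header.rstrip("\r\n")
        if not header:
            continue  # tolerate blank lines between records
        sequence = handle.readline().rstrip("\r\n")
        separator = handle.readline().rstrip("\r\n")
        quality = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield FastqRecord(header[1:], sequence, quality)


def mean_quality(record: FastqRecord) -> float:
    """Mean Phred score, assuming Phred+33 (Sanger) encoding."""
    scores = [ord(ch) - 33 for ch in record.quality]
    return sum(scores) / len(scores) if scores else 0.0


# A stage is any callable that consumes and produces a stream of records,
# so new steps can be plugged in without changing the runner.
Stage = Callable[[Iterable[FastqRecord]], Iterator[FastqRecord]]


def quality_filter(min_mean: float) -> Stage:
    """Hypothetical example stage: drop reads below a mean quality."""
    def stage(records: Iterable[FastqRecord]) -> Iterator[FastqRecord]:
        for record in records:
            if mean_quality(record) >= min_mean:
                yield record
    return stage


def run_pipeline(records: Iterable[FastqRecord],
                 stages: List[Stage]) -> List[FastqRecord]:
    """Chain the stages in order and collect the surviving records."""
    for stage in stages:
        records = stage(records)
    return list(records)


if __name__ == "__main__":
    example = "@r1\nACGT\n+\nIIII\n@r2\nACGT\n+\n!!!!\n"
    kept = run_pipeline(read_fastq(io.StringIO(example)), [quality_filter(20)])
    print([r.name for r in kept])
```

Treating every step as a function over a stream of records is one simple way to get the modularity the slide asks for: an aligner or a variant caller would be another stage with a different output type.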
7
NextGen 2.0 Pipeline
Subsequent versions will add functionality that may include:
– Ability to process and compare multiple samples
– Support for variant detection in non-reference genomes
– Support for multiple methods of analysis (BWA, SOAP2/Bowtie)
– Support for additional workflows (smRNA annotation, ChIP-seq, de novo assembly)
Input from working groups is imperative:
– What is the decision tree for subsequent iterations?
– What do modeling/stats/viz groups need as NextGen deliverables?
– How can NextGen exploit tools under development for G2P?
8
Meeting the needs of biological use cases
– Flowering time and photosynthesis: How can NextGen inform modeling efforts?
– Abiotic stress: Should we develop a smRNA pipeline for 2.0?
Input from working groups is imperative:
– What is the decision tree for subsequent iterations?
– What do modeling/stats/viz groups need as NextGen deliverables?
– How can NextGen exploit tools under development for G2P?
9
Integrating NextGen/Viz Pipeline
10
Workflow
A pathway of operations
Entities:
– Operation
– Data
– Flow
The flow through the operations is managed by the workflow software (e.g., VisTrails).
Candidate software and packages are named.
/ber = Bernice Rogowitz
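The three entities on this slide can be made concrete with a small sketch. This is an illustrative stand-in written in plain Python, not the API of VisTrails or any other workflow package: `Operation` and `Workflow` are hypothetical names, data items are named values in a dictionary, and the flow is implied by which named outputs each operation consumes.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class Operation:
    """One step: reads named data items and writes one named data item."""
    name: str
    func: Callable[..., Any]
    inputs: List[str]
    output: str


@dataclass
class Workflow:
    """Runs operations in an order determined by their data dependencies."""
    operations: List[Operation] = field(default_factory=list)

    def add(self, operation: Operation) -> "Workflow":
        self.operations.append(operation)
        return self

    def run(self, data: Dict[str, Any]) -> Dict[str, Any]:
        results = dict(data)  # do not mutate the caller's dictionary
        pending = list(self.operations)
        while pending:
            # An operation is ready once all of its inputs exist.
            ready = [op for op in pending
                     if all(key in results for key in op.inputs)]
            if not ready:
                stuck = ", ".join(op.name for op in pending)
                raise ValueError(f"unsatisfied inputs for: {stuck}")
            for op in ready:
                results[op.output] = op.func(*(results[k] for k in op.inputs))
                pending.remove(op)
        return results


if __name__ == "__main__":
    wf = Workflow()
    # Added out of order on purpose: the flow is resolved from the data names.
    wf.add(Operation("normalize",
                     lambda counts, total: [c / total for c in counts],
                     ["counts", "total"], "fractions"))
    wf.add(Operation("total", sum, ["counts"], "total"))
    print(wf.run({"counts": [2, 6, 2]})["fractions"])
```

Because operations declare their inputs and outputs by name, steps from different pipelines (sequence processing, visualization, modeling) can be combined as long as they agree on the names and types of the data they exchange.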
11
Integrating NextGen/Viz/Modeling Pipelines
– List of 20 homologous maize gene IDs
– Find expression values for these genes (e.g., NextGen)
– For each, examine structure of transcripts and expression over time (e.g., eFP Maize Genome Browser)
– 5 genes of interest
– List of homologous Arabidopsis gene IDs
– Modeling and Statistical Inference
– Literature search
– Homolog Finder (e.g., CoGe)
– Candidate maize gene
– Co-Expression Analysis (e.g., ATTED-II)
– Expression Network of 10 Arabidopsis Genes
– Homolog Finder (e.g., CoGe)
– Expression data for 20 maize genes
– Examine clusters with tools that can handle maize data (e.g., eNorthern, MapMan) (note: very limited data for maize, so may need to go to rice)
– Iterate
/ber/tb