Gene Expression in Loblolly Pine Early Development Keithanne Mockaitis Carol Loopstra Indiana University Center for Genomics.

Gene Expression in Loblolly Pine Early Development
Keithanne Mockaitis, Carol Loopstra
Indiana University Center for Genomics and Bioinformatics; Texas A&M University

Progressive Transcript Profiling
Build a useful transcriptome reference early in project:
- generate long reads for ease of assembly, scaffolding of existing shorter data
- integrate community data into assemblies
Vegetative Organs: vegetative buds, candles, stems, needles, roots
Early Stress Signaling Responses: cold, heat, elevated UV, compression
Reproductive Development: megastrobili, microstrobili
Early Development: seeds, young seedlings

Sequencing of Early Development Collections, Stage 1
- embryos dissected from germinating seeds
- seeds immediately after stratification
- megagametophytes dissected from germinating seeds
Lib 1, Lib 2: cDNA libraries optimized for 454 sequencing, partially normalized
GS – XLR Plus

Sequence read length distribution of libraries (figure panels: seed/embryo pool; megagametophyte pool)
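A read length distribution like the one plotted for the two pools can be tabulated directly from the reads. This is a minimal sketch, assuming the reads are available as FASTA text; the function names and the 100 bp bin width are illustrative choices, not taken from the slides.

```python
from collections import Counter


def parse_fasta(text):
    """Yield (name, sequence) pairs from FASTA-formatted text."""
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].strip(), []
        elif name is not None:  # ignore sequence lines before the first header
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def length_histogram(text, bin_width=100):
    """Count reads per length bin; each key is the lower bound of its bin."""
    counts = Counter()
    for _, seq in parse_fasta(text):
        counts[(len(seq) // bin_width) * bin_width] += 1
    return dict(sorted(counts.items()))


example = ">r1\nACGT\n>r2\n" + "A" * 150 + "\n>r3\n" + "C" * 120 + "G" * 30 + "\n"
print(length_histogram(example))  # {0: 1, 100: 2}
```

Running it once per library gives one histogram per pool, which is the shape of comparison the slide shows.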

Data Assembled

Coverage of Assembled Transcripts > 1 kb (figure axes: average coverage, length)
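Average coverage is commonly computed as the total number of read bases aligned to a transcript divided by its length. A minimal sketch under that assumption; the dictionary inputs and isotig names are hypothetical, and only the 1 kb cutoff comes from the slide title.

```python
def average_coverage(transcript_lengths, aligned_bases, min_length=1000):
    """Return {transcript: mean per-base coverage} for transcripts longer than min_length.

    transcript_lengths: {name: length in bp}
    aligned_bases: {name: total read bases aligned to that transcript}
    """
    result = {}
    for name, length in transcript_lengths.items():
        if length > min_length:
            # Transcripts with no aligned reads get coverage 0.0.
            result[name] = aligned_bases.get(name, 0) / length
    return result


# Hypothetical example data.
lengths = {"isotig00001": 2000, "isotig00002": 800, "isotig00003": 1500}
bases = {"isotig00001": 50000, "isotig00002": 9000, "isotig00003": 3000}
print(average_coverage(lengths, bases))  # {'isotig00001': 25.0, 'isotig00003': 2.0}
```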

Estimated Gene Discovery
Transcripts with no blastx hit to NCBI dbEST: 2,173
Transcripts with blastx hit to NCBI dbEST: 49,386
  Hits not to Pinus genus: 6,322
    Hits not to gymnosperm: 653
  Hits to Pinus transcripts in dbEST: 43,064
Most transcripts from the new assembly contribute substantial length to older data (figure: ~2000 selected Pinus transcripts; axis: length)
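The categories above form a nested partition: 43,064 Pinus hits plus 6,322 non-Pinus hits gives the 49,386 transcripts with a hit, and the 653 non-gymnosperm hits are a subset of the non-Pinus group. A minimal sketch of that tally, assuming each transcript has already been reduced to the taxonomic lineage of its best blastx hit. The input format and the use of the NCBI taxon name "Acrogymnospermae" to identify gymnosperms are assumptions, not the authors' documented method.

```python
GYMNOSPERM_TAXON = "Acrogymnospermae"  # assumed NCBI Taxonomy name for gymnosperms


def classify_hits(best_hit_lineage):
    """Tally transcripts by the taxonomy of their best blastx hit.

    best_hit_lineage: {transcript: list of taxon names in the best hit's
    lineage, or None when the transcript had no hit} (hypothetical input).
    """
    counts = {"no hit": 0, "Pinus": 0, "gymnosperm, not Pinus": 0, "not gymnosperm": 0}
    for lineage in best_hit_lineage.values():
        if lineage is None:
            counts["no hit"] += 1
        elif "Pinus" in lineage:
            counts["Pinus"] += 1
        elif GYMNOSPERM_TAXON in lineage:
            counts["gymnosperm, not Pinus"] += 1
        else:
            counts["not gymnosperm"] += 1
    return counts


example = {
    "t1": None,
    "t2": ["Acrogymnospermae", "Pinaceae", "Pinus"],
    "t3": ["Acrogymnospermae", "Pinaceae", "Picea"],
    "t4": ["Magnoliopsida", "Brassicaceae", "Arabidopsis"],
}
print(classify_hits(example))
# {'no hit': 1, 'Pinus': 1, 'gymnosperm, not Pinus': 1, 'not gymnosperm': 1}

# The slide's figures partition the same way: Pinus + non-Pinus = all transcripts with a hit.
assert 43_064 + 6_322 == 49_386
```

"Hits not to Pinus genus" corresponds to the sum of the last two categories here.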

Estimated Maternal Expression
Full Assembly Isogroups: 24,688
Megagametophyte Isogroups Mapped (>80% length, 98% id): 12,478 (51%)

Homology Estimation
Full Assembly Transcripts (Isotigs): 51,513
Transcripts with significant blastx hit to TAIR10: 41,187 (80%); Unique: 12,233
Transcripts with significant blastx hit to Populus trichocarpa v2: 41,291; Unique: 12,768
Unique OrthoMCL groups represented: 7,075
Paralog Groups: 5,362
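The mapped-isogroup count applies two thresholds at once, aligned length and identity. A minimal sketch of that filter, assuming alignments are already summarised as (isogroup, aligned fraction of length, identity) tuples; the tuple format is hypothetical, and reading ">80% length, 98% id" as "length > 0.80 and identity >= 0.98" is an assumption.

```python
def mapped_fraction(alignments, total_isogroups, min_length_frac=0.80, min_identity=0.98):
    """Count distinct isogroups with at least one alignment passing both thresholds.

    alignments: iterable of (isogroup, aligned_fraction_of_length, identity) tuples.
    Returns (number mapped, fraction of total_isogroups).
    """
    mapped = {
        iso
        for iso, length_frac, identity in alignments
        if length_frac > min_length_frac and identity >= min_identity
    }
    return len(mapped), len(mapped) / total_isogroups


# Hypothetical alignments: iso1 passes twice (counted once), iso2 is too short, iso3 too divergent.
alns = [("iso1", 0.95, 0.99), ("iso1", 0.90, 0.985), ("iso2", 0.70, 0.99), ("iso3", 0.85, 0.97)]
print(mapped_fraction(alns, 4))  # (1, 0.25)

# Percentages on the slide, recomputed from its own counts.
print(round(100 * 12_478 / 24_688))  # 51
print(round(100 * 41_187 / 51_513))  # 80
```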

Most Highly Represented Gene Families
Table columns: Orthologous Group | protein family/superfamily | members (group IDs and most member counts not recoverable)
- histone
- PPR or TPR containing
- heat shock
- LRR transmembrane protein kinase
- ABC transporter
- transducin family, WD40 repeat containing
- plasma membrane intrinsic (18)
OrthoMCL: Li et al., 2003 Genome Res. 13, 2178
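Ranking gene families by representation amounts to sorting orthologous groups by member count. A minimal sketch, assuming a groups file with one group per line in the form "GROUP_ID: taxon|id taxon|id ..."; that layout is modelled on OrthoMCL-style output but should be treated as an assumption, and the group IDs and taxon prefixes in the example are hypothetical.

```python
def top_groups(groups_text, n=3, taxon=None):
    """Rank orthologous groups by member count, largest first.

    If taxon is given, only members whose ID starts with 'taxon|' are counted
    (for example, to count only pine transcripts in each group).
    """
    sizes = []
    for line in groups_text.splitlines():
        if ":" not in line:
            continue
        group_id, members = line.split(":", 1)
        member_list = members.split()
        if taxon is not None:
            member_list = [m for m in member_list if m.startswith(taxon + "|")]
        sizes.append((group_id.strip(), len(member_list)))
    # Largest groups first; ties broken by group ID for stable output.
    sizes.sort(key=lambda item: (-item[1], item[0]))
    return sizes[:n]


text = "g1: pta|a pta|b ath|c\ng2: pta|d\ng3: pta|e pta|f ath|g ptr|h\n"
print(top_groups(text, 2))               # [('g3', 4), ('g1', 3)]
print(top_groups(text, 2, taxon="pta"))  # [('g1', 2), ('g3', 2)]
```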

Many expected transcripts are well covered (Vuosku et al., 2009 J Exp Bot 60, 1375):
- RAD51: 98.5%
- KU80: 99.4%
- DNA ligase IV: 67.3%
- TatD DNase: 63.9%
- MCA: 100%
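Percentages like these are presumably the share of each expected transcript's length represented in the assembly. A minimal sketch of one way to compute such a figure, assuming the aligned regions on the reference are given as 0-based, half-open (start, end) intervals; this is an illustration, not necessarily how the values above were obtained.

```python
def percent_covered(ref_length, intervals):
    """Percent of a reference transcript covered by aligned intervals.

    intervals: (start, end) pairs, 0-based half-open, in reference coordinates.
    Overlapping or touching intervals are merged so no base is counted twice.
    """
    covered = 0
    current_start = current_end = None
    for start, end in sorted(intervals):
        # Clip to the reference and skip empty intervals.
        start, end = max(0, start), min(ref_length, end)
        if start >= end:
            continue
        if current_end is None or start > current_end:
            if current_end is not None:
                covered += current_end - current_start
            current_start, current_end = start, end
        else:
            current_end = max(current_end, end)
    if current_end is not None:
        covered += current_end - current_start
    return 100.0 * covered / ref_length


print(percent_covered(1000, [(0, 400), (300, 700), (900, 1000)]))  # 80.0
```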

Progressive Transcript Profiling
Early Development, Stage 2:
- seeds
- embryos from seedlings
- young tissues, stages from
Build a useful transcriptome reference early in project:
- generate long reads for ease of assembly, scaffolding of existing shorter data
- integrate community data into assemblies
- generate deeper stage-specific sequencing of samples within original pools, additional collections
- attribute source specificities through comparative mapping
- refine assemblies of alternatively spliced transcripts
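Attributing source specificity through comparative mapping can be framed as asking which library contributes most of a transcript's mapped reads. A minimal sketch, assuming per-library read counts per transcript are already available; the input format, the library names, the counts-per-million normalisation, and the 0.9 share threshold are all illustrative assumptions rather than the project's method.

```python
def attribute_source(read_counts, min_share=0.9):
    """Label each transcript with the library supplying most of its normalised reads.

    read_counts: {transcript: {library: mapped read count}} (hypothetical input).
    A transcript gets a library label only when that library supplies at least
    min_share of its normalised signal; otherwise it is labelled 'shared'.
    """
    # Library sizes, used to normalise counts so a deeper library does not dominate.
    library_totals = {}
    for by_library in read_counts.values():
        for library, count in by_library.items():
            library_totals[library] = library_totals.get(library, 0) + count

    labels = {}
    for transcript, by_library in read_counts.items():
        # Counts per million mapped reads in each library.
        cpm = {
            lib: 1e6 * count / library_totals[lib]
            for lib, count in by_library.items()
            if library_totals[lib] > 0
        }
        total = sum(cpm.values())
        if total == 0:
            labels[transcript] = "no reads"
            continue
        library, value = max(cpm.items(), key=lambda item: item[1])
        labels[transcript] = library if value / total >= min_share else "shared"
    return labels


counts = {
    "t1": {"embryo": 95, "megagametophyte": 5},
    "t2": {"embryo": 5, "megagametophyte": 95},
    "t3": {"embryo": 0, "megagametophyte": 0},
    "t4": {"embryo": 50, "megagametophyte": 50},
}
print(attribute_source(counts))
# {'t1': 'embryo', 't2': 'megagametophyte', 't3': 'no reads', 't4': 'shared'}
```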

Progressive Transcript Profiling
Reproductive Development:
- megastrobili: 4 stages
- microstrobili: 4 stages

Thanks
IU CGB: James Ford, Zach Smith, Aaron Buechlein
Texas A&M: Jeff Puryear