wfleabase.org/docs/tilexseq0904.pdf What is all this genome expression? Observations and statistics for expression at the base level. April 2009, Don Gilbert.

Presentation transcript:

What is all this genome expression?
Observations and statistics for expression at the base level
April 2009
Don Gilbert, Biology Dept., Indiana University

2007: Tile expression
DrosMel tiled by Affymetrix finds new genes (blue) and known genes (orange).

Gene or Tiled expression?
Tiled genes find expression better than the gene average.
Gene median: sensitivity = 38%, specificity = 37%
Tiled gene: sensitivity = 42%, specificity = 37%
(in reference to a separate sex expression study)
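The slide does not say how sensitivity and specificity were computed. As a minimal sketch, assuming the standard definitions (sensitivity = TP / (TP + FN), specificity = TN / (TN + FP)) and a per-gene yes/no call compared against a reference study, the calculation could look like this. The function name and the boolean-list input format are illustrative assumptions, not taken from the presentation.

```python
def sensitivity_specificity(called, truth):
    """Return (sensitivity, specificity) for paired boolean calls.

    called[i] -- True if the method calls gene i differentially expressed
    truth[i]  -- True if the reference study reports gene i as differential
    Both are hypothetical inputs; the slides do not describe the data layout.
    """
    if len(called) != len(truth):
        raise ValueError("called and truth must have the same length")

    tp = sum(1 for c, t in zip(called, truth) if c and t)
    fn = sum(1 for c, t in zip(called, truth) if not c and t)
    tn = sum(1 for c, t in zip(called, truth) if not c and not t)
    fp = sum(1 for c, t in zip(called, truth) if c and not t)

    # Guard against empty classes rather than dividing by zero.
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec


if __name__ == "__main__":
    called = [True, True, False, False, True]
    truth = [True, False, True, False, True]
    print(sensitivity_specificity(called, truth))
```

If the presentation's "specificity" was instead meant as a false-discovery measure, the second ratio would be TP / (TP + FP); the sketch above does not cover that reading.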

Tiles find new genes
Daphnia tile expression with gene finding calls 26% coding bases over the genome, compared to 17% from gene predictions, or 5, ,000 new genes.
Manak et al. 2006, with DrosMel, also found 24% CDS/genome, up from 18% CDS/genome from the reference gene set.
Computational tools need to mature; gene finding is preliminary.
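A coding-bases-over-genome percentage of this kind can be computed by merging overlapping coding intervals and dividing the covered length by the genome length. The sketch below is only an illustration of that arithmetic, assuming 0-based half-open (start, end) intervals on a single sequence; the function name and input format are hypothetical and do not reflect the pipeline used in the presentation.

```python
def covered_fraction(intervals, genome_length):
    """Fraction of a sequence covered by the union of (start, end) intervals.

    Intervals are 0-based and half-open; overlapping or touching intervals
    are merged so that no base is counted twice.
    """
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")

    covered = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # Close the previous merged block, if any, and open a new one.
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlaps or touches the current block: extend it.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start

    return covered / genome_length


if __name__ == "__main__":
    cds = [(0, 10), (5, 20), (30, 40)]
    print(covered_fraction(cds, 100))
```

For a multi-chromosome genome the same merge would be run per sequence and the covered lengths summed before dividing by the total length.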

Tiles find rarely used genes
New tile-genes are expressed in a subset of conditions; most known genes are expressed in all conditions.
But some known genes have the same low, spotty expression as tile-genes.

Tiles find rarely used genes
Average expression is high for genes found in all treatments.
Accurate gene detection at low expression levels remains a challenge.

2008: Precision improves
Nimblegen has higher base precision than Affymetrix.

2009: Precision improves
RNA-Seq has higher base precision yet.
And RNA-Seq here is strand-specific expression.
Figure labels: stranded mRNA, ncRNA

Intron-Exon Detection

Gene End Detection

Differential expression
The gene end (3') has more expression, but the differential is constant over the gene span, on average.

Differential expression
Genes vary in DE pattern over the gene span; i.e., a few parts may not capture full gene expression.

Differential expression
Gene models miss much expression.
Known sex genes capture DE, but unknown regions capture environmental stress expression, in Daphnia.

Differential expression distribution
Introns show a null DE distribution; genes and TAR regions deviate from this.
Use introns as baseline?
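One way to act on the "introns as baseline" question is to treat intron DE scores as an empirical null and ask how extreme each gene or TAR score is relative to it. The following is a sketch under that assumption only; the presentation poses this as an open question and does not describe a method. The function name, the score inputs, and the add-one correction are illustrative choices.

```python
from bisect import bisect_left


def empirical_p_values(null_scores, test_scores):
    """Two-sided empirical p-values of test scores against a null sample.

    null_scores -- DE scores (e.g. log ratios) from intron regions,
                   assumed here to represent "no real differential expression"
    test_scores -- DE scores from gene or TAR regions
    The p-value is the fraction of null scores at least as extreme in
    absolute value, with an add-one correction so it is never exactly zero.
    """
    null_abs = sorted(abs(s) for s in null_scores)
    n = len(null_abs)
    if n == 0:
        raise ValueError("null_scores must not be empty")

    p_values = []
    for score in test_scores:
        # Count null values with |null| >= |score|.
        n_at_least = n - bisect_left(null_abs, abs(score))
        p_values.append((n_at_least + 1) / (n + 1))
    return p_values


if __name__ == "__main__":
    intron_de = [-0.2, 0.1, 0.0, 0.3, -0.1]
    region_de = [2.0, 0.1]
    print(empirical_p_values(intron_de, region_de))
```

With real data the null sample would need to be large, and the resulting p-values would still need multiple-testing correction across regions.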

Stranded expression oddity
Expression in the WRONG direction?
800 (3%) DrosMel coding exons are expressed in the wrong direction?

Stranded expression oddity
DrosMel transcript stranded expression (mRNA-Seq) picks up some odd cases.
Expression in BOTH directions?

Summary
1. Base-level expression (tiles, RNA-Seq) measures gene expression better.
   Balances sensitivity (false rejection) with specificity (false discovery).
2. Genome-wide base-level expression finds new genes.
   As an alternative to gene-level studies, it has values and drawbacks.
   Computational methods need to improve to use this data well.
3. Base-level expression measures gene structures well.
   On average, and precision is improving for individual genes.
   Gene oddities are found also.

End note
Summary pages
  wfleabase.org/genome-summaries/tile-expression/
  insects.eugenes.org/species/data/dmel5/modencode/
Genome expression maps
  insects.eugenes.org:8091/gbrowse/cgi-bin/gbrowse/drosmelme/
    expression in 52 cell lines (affy) and more precise solexa & nimblegen for a few cell lines
  insects.eugenes.org:8091/gbrowse/cgi-bin/gbrowse/daphnia_pulex8/
    expression among 4 treatment groups (sex, metal stress, biotic predator); nimblegen

Gene Boundary Detection.