Published by Farida Dharmawijaya. Modified over 6 years ago.
1
Gene expression estimation from RNA-Seq data
刘学军
2
Outline
Background
RPKM
Poisson model
N-URD model
Improved Poisson model
3
The Cycle of Forward Genetics
Sequencing; Genotype; Observation; Thinking; Phenotype; Hypothesis; Test hypothesis by genetic manipulation (gene deletion/replacement, recombinant technology)
4
Central Dogma
DNA -> (transcription) -> mRNA -> (translation) -> Protein
5
RNA-Seq protocol
RNA is isolated from a sample.
RNA is converted to cDNA fragments.
The cDNA fragments undergo high-throughput sequencing.
Reads are mapped to a reference genome, giving 'digital' read counts.
Gene expression is estimated from the counts.
6
An example
Reference: ACGTCCCC
12 reads of ACGTC
8 reads of CGTCC
9 reads of GTCCC
5 reads of TCCCC
This gene can be summarized by the sequence of counts 12, 8, 9, 5.
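The counting in this example can be sketched in Python. This is only an illustration: it assumes fixed-length reads that match the reference exactly and uniquely, and `count_reads_by_start` is a hypothetical helper, not part of any aligner.

```python
from collections import Counter

def count_reads_by_start(reference, reads):
    """Count exact-match reads by their 0-based start position on the reference.

    Assumes all reads share one length and each reference k-mer is unique
    (a simplification; real data needs an aligner).
    """
    read_len = len(reads[0])
    n_positions = len(reference) - read_len + 1
    # Map each reference k-mer to the position where it starts.
    kmer_to_pos = {reference[i:i + read_len]: i for i in range(n_positions)}
    tallies = Counter(kmer_to_pos[r] for r in reads if r in kmer_to_pos)
    return [tallies.get(i, 0) for i in range(n_positions)]

reference = "ACGTCCCC"
reads = ["ACGTC"] * 12 + ["CGTCC"] * 8 + ["GTCCC"] * 9 + ["TCCCC"] * 5
counts = count_reads_by_start(reference, reads)
print(counts)  # [12, 8, 9, 5]
```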
7
Advantages of RNA-Seq
Large dynamic range
Low background noise
Requirement of less sample RNA
Ability to detect novel transcripts
8
Challenges of RNA-Seq
Sequencing non-uniformity
Read mapping uncertainty
Paired-end sequencing data
9
Sequencing non-uniformity
10
Sources of read mapping uncertainty
Paralogous gene families
Low-complexity sequences
Alternatively spliced isoforms of the same gene
Uncertainty in read alignment
Gene multireads and isoform multireads
11
Alternatively spliced isoforms
12
Read mapping uncertainty
13
Paired-end sequencing
14
RPKM
Reads per kilobase of transcript per million reads mapped to the transcriptome
-- gene expression level
-- isoform expression level?
Mortazavi et al. (2008) Nature Methods.
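The definition can be written as a short calculation. The sketch below simply divides the read count by the transcript length in kilobases and by the number of mapped reads in millions; the function name and the input numbers are hypothetical.

```python
def rpkm(read_count, transcript_length_bp, total_mapped_reads):
    """RPKM = reads / (transcript length in kb) / (mapped reads in millions)."""
    length_kb = transcript_length_bp / 1_000
    mapped_millions = total_mapped_reads / 1_000_000
    return read_count / length_kb / mapped_millions

# Hypothetical numbers: 500 reads on a 2,000 bp transcript, 10 million mapped reads.
example = rpkm(500, 2_000, 10_000_000)
print(example)  # 25.0
```

Because the value is normalized by both length and sequencing depth, it can be compared across transcripts and samples, which is the motivation given for the measure.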
15
Jiang et al. (2009) Bioinformatics
Notation:
$f_{g,i}$: the $i$th isoform of gene $g$.
$l_f$: the length of isoform $f$.
$k_f$: the number of transcript copies of isoform $f$.
The total length of the transcripts is $\sum_{f} k_f l_f$.
The probability that a read comes from isoform $f$ is $p_f = k_f l_f / \sum_{f'} k_{f'} l_{f'}$.
Define $\theta_f = k_f / \sum_{f'} k_{f'} l_{f'}$ as the expression index of isoform $f$.
16
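Model assumption
$w$: the total number of mapped reads.
Given a region of length $l$ in isoform $f$, the number of reads coming from that region can be approximated by a Poisson distribution with parameter $\lambda = l w \theta_f$, where $\theta_f$ is the expression index of isoform $f$.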
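To make the notation concrete, here is a small numeric sketch. It assumes the definitions from Jiang et al. (2009): expression index theta_f = k_f / sum(k_f * l_f), read-origin probability k_f * l_f / sum(k_f * l_f), and Poisson rate lambda = l * w * theta_f for a region of length l. The function names and all numbers are hypothetical.

```python
def expression_indices(copies, lengths):
    """Return theta_f = k_f / sum(k_f * l_f) for each isoform f."""
    total_length = sum(k * l for k, l in zip(copies, lengths))
    return [k / total_length for k in copies]

def region_poisson_rate(region_length, total_reads, theta_f):
    """Poisson rate lambda = l * w * theta_f for a region of length l in isoform f."""
    return region_length * total_reads * theta_f

# Hypothetical transcriptome with two isoforms.
copies = [30, 10]        # k_f: transcript copies per isoform
lengths = [1000, 2000]   # l_f: isoform lengths in bp
theta = expression_indices(copies, lengths)

# Probability that a uniformly sampled read comes from each isoform.
read_probs = [t * l for t, l in zip(theta, lengths)]

w = 1_000_000            # total number of mapped reads
lam = region_poisson_rate(100, w, theta[0])
print(theta, read_probs, lam)
```

The read-origin probabilities sum to one by construction, while the expression indices do not; they are per-base rates.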
17
Poisson model
Consider a gene with $m$ exons with lengths $L = [l_1, \dots, l_m]$ and $n$ isoforms with expressions $\Theta = [\theta_1, \dots, \theta_n]$.
Observations $X_s$: the number of reads mapped to an exon.
18
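Poisson model
For every observation $X_s$ on exon $j$, the Poisson parameter is
$\lambda_s = l_j w \sum_{i=1}^{n} c_{ij} \theta_i$,
where $c_{ij}$ is 1 if isoform $i$ contains exon $j$ and 0 otherwise.
Data likelihood:
$L(\Theta \mid x) = \prod_{s} \frac{e^{-\lambda_s} \lambda_s^{x_s}}{x_s!}$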
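The exon-level model can be sketched numerically. The code below assumes the exon rate is l_j * w * sum_i(c_ij * theta_i), with w the total number of mapped reads, and that exon counts are independent Poisson variables; the gene structure, counts, and function names are hypothetical.

```python
import math

def exon_rates(exon_lengths, total_reads, inclusion, theta):
    """lambda_j = l_j * w * sum_i c_ij * theta_i for each exon j."""
    return [
        l_j * total_reads * sum(c_i[j] * t_i for c_i, t_i in zip(inclusion, theta))
        for j, l_j in enumerate(exon_lengths)
    ]

def poisson_log_likelihood(counts, rates):
    """Sum over exons of x*log(lam) - lam - log(x!); requires every rate > 0."""
    return sum(
        x * math.log(lam) - lam - math.lgamma(x + 1)
        for x, lam in zip(counts, rates)
    )

# Hypothetical gene: 3 exons, 2 isoforms; isoform 2 skips exon 2.
exon_lengths = [200, 100, 300]      # l_j in bp
inclusion = [[1, 1, 1],             # c_1j
             [1, 0, 1]]             # c_2j
theta = [2e-7, 1e-7]                # isoform expression indices
w = 1_000_000                       # total mapped reads
counts = [58, 21, 92]               # x_j: observed reads per exon

rates = exon_rates(exon_lengths, w, inclusion, theta)
print(rates)
print(poisson_log_likelihood(counts, rates))
```

Maximizing this log-likelihood over theta (subject to theta >= 0) gives the isoform expression estimates; the optimizer itself is left out of the sketch.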
19
Wu et al. (2011) Bioinformatics
URD (uniform read distribution) model -> N-URD (non-uniform read distribution) model
Global bias curve (GBC)
Local bias curve (LBC)
20
Global bias curve
21
Local bias curve
22
Usage of the bias curve
23
The N-URD models
GN-URD: cij -> Gij
LN-URD: cij -> Lij
MN-URD: cij -> a*Gij + (1-a)*Lij
1-M: the number of iterations for the LBC calculation is 1
5-M: the number of iterations for the LBC calculation is 5
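The MN-URD substitution can be sketched as an elementwise mix of two weight matrices. The sketch assumes that G and L have the same shape as the indicator matrix (isoforms by exons) and that a lies in [0, 1]; how G and L are derived from the bias curves is not reproduced here, and the values below are made up.

```python
def mixed_weights(global_w, local_w, a):
    """Return a*G_ij + (1 - a)*L_ij elementwise (the MN-URD replacement for c_ij)."""
    if not 0.0 <= a <= 1.0:
        raise ValueError("mixing weight a must lie in [0, 1]")
    return [
        [a * g + (1.0 - a) * l for g, l in zip(g_row, l_row)]
        for g_row, l_row in zip(global_w, local_w)
    ]

# Hypothetical bias weights for 2 isoforms x 3 exons (0 where the exon is absent).
G = [[0.8, 1.0, 1.2],
     [0.9, 0.0, 1.1]]
L = [[1.0, 1.0, 1.0],
     [1.2, 0.0, 0.8]]

M = mixed_weights(G, L, a=0.5)
print(M)
```

Setting a = 1 recovers the GN-URD weights and a = 0 recovers the LN-URD weights, so the mixed model contains the other two as special cases.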
25
Li et al. (2010) Genome Biology
Use variable rates for different positions. Poisson linear model,
26
Non-linear model
Use empirical data to obtain the non-linear relationship between sequencing preference (ai) and the surrounding sequences.
Gene expression level with length L,