Gene expression estimation from RNA-Seq data

Presentation on theme: "Gene expression estimation from RNA-Seq data"— Presentation transcript:

1 Gene expression estimation from RNA-Seq data
刘学军

2 Outline
Background
RPKM
Poisson model
N-URD model
Improved Poisson model

3 The Cycle of Forward Genetics
[Diagram labels: Sequencing; Genotype; Observation; Thinking; Phenotype; Hypothesis; Test Hypothesis by Genetic Manipulation; Gene Deletion/Replacement; Recombinant Technology]

4 Central Dogma
DNA -> (transcription) -> mRNA -> (translation) -> Protein

5 RNA-Seq protocol
RNA is isolated from a sample.
RNA is converted to cDNA fragments.
The cDNA fragments undergo high-throughput sequencing.
Reads are mapped to a reference genome (read counts give a 'digital' measurement).
Gene expression is estimated from the mapped read counts.

6 An example
Reference: ACGTCCCC
Reads: 12 ACGTC, 8 CGTCC, 9 GTCCC, 5 TCCCC
This gene can be summarized by the sequence of counts 12, 8, 9, 5.
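The example can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: reads are matched exactly, each read is assigned to its first matching position (so multireads are ignored), and the function name `position_counts` is hypothetical.

```python
def position_counts(reference, reads, read_len):
    """Count reads by the reference position at which they match exactly."""
    n_positions = len(reference) - read_len + 1
    counts = [0] * n_positions
    for read in reads:
        if len(read) != read_len:
            continue  # skip reads of unexpected length
        start = reference.find(read)  # first exact match, or -1 if unmapped
        if start != -1:
            counts[start] += 1
    return counts


# The reads from the slide: 12 ACGTC, 8 CGTCC, 9 GTCCC, 5 TCCCC
reads = ["ACGTC"] * 12 + ["CGTCC"] * 8 + ["GTCCC"] * 9 + ["TCCCC"] * 5
print(position_counts("ACGTCCCC", reads, 5))  # [12, 8, 9, 5]
```

A real pipeline would use an aligner that tolerates mismatches and reports multiple hits; the sketch only shows how mapped reads collapse into a vector of counts.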

7 Advantages of RNA-Seq
Large dynamic range
Low background noise
Less sample RNA required
Ability to detect novel transcripts

8 Challenges of RNA-Seq
Sequencing non-uniformity
Read mapping uncertainty
Paired-end sequencing data

9 Sequencing non-uniformity

10 Sources of read mapping uncertainty
Paralogous gene families
Low-complexity sequences
Alternatively spliced isoforms of the same gene
Uncertainty in read alignment
Gene multireads and isoform multireads

11 Alternatively spliced isoforms

12 Read mapping uncertainty

13 Paired-end sequencing

14 RPKM
Reads per kilobase of transcript per million reads mapped to the transcriptome
-- gene expression level
-- isoform expression level?
Mortazavi et al. (2008) Nature Methods.
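As a worked illustration, RPKM for one gene can be computed directly from its definition: the read count, divided by the transcript length in kilobases and by the total number of mapped reads in millions. The function name and the example numbers below are hypothetical.

```python
def rpkm(read_count, transcript_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    length_kb = transcript_length_bp / 1e3
    mapped_millions = total_mapped_reads / 1e6
    return read_count / (length_kb * mapped_millions)


# Hypothetical gene: 500 reads on a 2,000 bp transcript, 10 million mapped reads
print(rpkm(500, 2000, 10_000_000))  # 25.0
```

The same formula applies at the gene level only; assigning reads to individual isoforms is what the models on the following slides address.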

15 Jiang et al. (2009) Bioinformatics
Notation:
$f_{g,i}$: the $i$th isoform of gene $g$.
$l_f$: the length of isoform $f$.
$k_f$: the number of transcript copies of isoform $f$.
The total length of the transcripts is $\sum_{f'} k_{f'} l_{f'}$.
The probability that a read comes from isoform $f$ is $p_f = k_f l_f / \sum_{f'} k_{f'} l_{f'}$.
Define $\theta_f = k_f / \sum_{f'} k_{f'} l_{f'}$ as the expression index of isoform $f$.
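A small sketch of this notation, assuming the expression index of an isoform is its copy number divided by the total transcript length, and that reads are sampled uniformly over all transcript nucleotides. The isoform names, copy numbers, and lengths are made-up values for illustration.

```python
def expression_indices(copies, lengths):
    """Return (theta, read_prob) per isoform.

    theta[f]     = k_f / sum(k * l)        (assumed expression index)
    read_prob[f] = k_f * l_f / sum(k * l)  (probability a read comes from f)
    """
    total_length = sum(copies[f] * lengths[f] for f in copies)
    theta = {f: copies[f] / total_length for f in copies}
    read_prob = {f: copies[f] * lengths[f] / total_length for f in copies}
    return theta, read_prob


# Hypothetical sample with two isoforms
copies = {"f1": 10, "f2": 30}        # k_f: transcript copies
lengths = {"f1": 1000, "f2": 2000}   # l_f: isoform lengths in bp
theta, read_prob = expression_indices(copies, lengths)
print(theta)
print(read_prob)
```

Note that the read probabilities sum to 1, while the expression indices do not: a long isoform attracts more reads than a short one at the same copy number, which is why the index normalizes by length.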

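16 Model assumption
$w$: the total number of mapped reads.
Given a region of length $l$ in $f$, the number of reads coming from that region follows a binomial distribution with parameters $w$ and $p = l k_f / \sum_{f'} k_{f'} l_{f'} = l\theta_f$, which can be approximated by a Poisson distribution with rate $\lambda = l w \theta_f$.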
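The quality of this approximation can be checked numerically. The sketch below assumes the read count of a region is binomial with $w$ trials and success probability $p = l\theta_f$, and compares it with a Poisson distribution of rate $\lambda = l w \theta_f$. The values of `w`, `region_length`, and `theta_f` are hypothetical.

```python
import math


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1.0 - p) ** (n - k)


def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)


w = 1_000_000          # total mapped reads (hypothetical)
region_length = 500    # l, in bp (hypothetical)
theta_f = 1e-8         # expression index of the isoform (hypothetical)

p = region_length * theta_f        # per-read probability of hitting the region
lam = region_length * w * theta_f  # Poisson rate

for k in range(11):
    print(k, binomial_pmf(k, w, p), poisson_pmf(k, lam))
```

With many reads and a small per-read probability the two columns agree closely, which is the regime RNA-Seq data usually falls in.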

17 Poisson model
For a gene with $m$ exons, with lengths $L = [l_1, l_2, \ldots, l_m]$, and $n$ isoforms, with expressions $\Theta = [\theta_1, \theta_2, \ldots, \theta_n]$.
Observations $X_j$: the number of reads mapped to exon $j$.

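18 Poisson model
For the read count $X_j$ of exon $j$, the Poisson parameter is $\lambda_j = l_j w \sum_{i=1}^{n} c_{ij}\theta_i$, where $c_{ij}$ is 1 if isoform $i$ contains exon $j$ and 0 otherwise.
Data likelihood: $\mathcal{L}(\Theta \mid x) = \prod_{j=1}^{m} e^{-\lambda_j} \lambda_j^{x_j} / x_j!$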
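The exon-level model can be sketched as follows, assuming the Poisson rate of exon $j$ is its length times $w$ times the summed expression of the isoforms that contain it, and that exon counts are independent. The gene structure, counts, and parameter values are invented for illustration, and the function names are hypothetical.

```python
import math


def exon_rates(exon_lengths, theta, c, w):
    """lambda_j = l_j * w * sum_i c[i][j] * theta[i] for each exon j."""
    n_isoforms = len(theta)
    return [
        l_j * w * sum(c[i][j] * theta[i] for i in range(n_isoforms))
        for j, l_j in enumerate(exon_lengths)
    ]


def log_likelihood(counts, rates):
    """Sum of Poisson log-probabilities over exons (all rates must be > 0)."""
    return sum(
        x * math.log(lam) - lam - math.lgamma(x + 1)
        for x, lam in zip(counts, rates)
    )


# Hypothetical gene: 3 exons, 2 isoforms; isoform 1 skips the middle exon
exon_lengths = [100, 200, 150]
c = [[1, 1, 1],   # isoform 0 contains exons 0, 1, 2
     [1, 0, 1]]   # isoform 1 contains exons 0, 2
w = 10_000_000
counts = [3, 4, 5]

for theta in ([2e-9, 1e-9], [1e-9, 2e-9]):
    rates = exon_rates(exon_lengths, theta, c, w)
    print(theta, rates, log_likelihood(counts, rates))
```

Maximizing this log-likelihood over the isoform expressions (subject to non-negativity) gives the estimate; the sketch only evaluates it at two candidate values rather than running an optimizer.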

19 Wu et al. (2011) Bioinformatics
URD (uniform read distribution) model -> N-URD (non-uniform read distribution) model
Global bias curve (GBC)
Local bias curve (LBC)

20 Global bias curve

21 Local bias curve

22 Usage of the bias curve

23 The N-URD models
GN-URD: $c_{ij} \to G_{ij}$
LN-URD: $c_{ij} \to L_{ij}$
MN-URD: $c_{ij} \to a G_{ij} + (1-a) L_{ij}$
1-M: the number of iterations for the LBC calculation is 1
5-M: the number of iterations for the LBC calculation is 5
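The mixed (MN-URD) substitution can be illustrated with a short sketch that blends a global and a local bias matrix entry by entry. The matrices and the mixing weight `a` below are made-up values, and how the bias curves themselves are estimated is not shown.

```python
def mixed_bias(G, L, a):
    """Entrywise a * G[i][j] + (1 - a) * L[i][j], replacing the 0/1 indicator c[i][j]."""
    return [
        [a * g + (1.0 - a) * l for g, l in zip(g_row, l_row)]
        for g_row, l_row in zip(G, L)
    ]


# Hypothetical bias weights for 2 isoforms x 2 exons
G = [[1.2, 0.8], [0.0, 1.0]]   # from a global bias curve
L = [[0.6, 1.0], [0.0, 1.4]]   # from a local bias curve
print(mixed_bias(G, L, 0.5))
```

Setting `a = 1` recovers the GN-URD weights and `a = 0` the LN-URD weights, so the mixed model interpolates between the two.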

24

25 Li et al. (2010) Genome Biology
Use variable rates for different positions. Poisson linear model,

26 Non-linear model
Use empirical data to obtain the non-linear relationship between the sequencing preference ($a_i$) and the surrounding sequence.
Gene expression level for a gene of length L,

