Iterative resolution of multi-reads in multiple genomes

1 Iterative resolution of multi-reads in multiple genomes
Main source for background material and slide backgrounds: Eran Halperin's "Accurate Estimation of Expression Levels of Homologous Genes in RNA-Seq Experiments"

2 2 SeqEm – Eran Halperin 1/12/11

3 Microarrays: Known Issues
- Background hybridization
- Genes with low expression levels
- Different hybridization properties
- Relative expression levels
- Limited set of probes

4 RNA-Seq Procedure
- Isolate total RNA (e.g., by poly(A) binding)
- Sequence short reads (25-40 bp)
- Map to reference genome (Eland, MAQ, BWA, Bowtie, etc.)
- QC, splice variants, etc.
- Estimate concentration of mRNA in sample
- Statistics/analysis

5

6 Homologous Genes

7

8 MULTIREADS - Current Standard
- Discard
- Uniformly distribute
- Map according to unique read distribution (Erange)
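The three standard treatments of multi-reads can be compared on toy counts. The sketch below is a minimal illustration, not Erange's actual implementation: the function name `assign_multireads`, the input layout, and the fallback to a uniform split when no candidate gene has unique reads are all assumptions made for the example.

```python
from collections import Counter

def assign_multireads(unique_counts, multireads, strategy):
    """Return per-gene read counts after handling multi-reads.

    unique_counts: dict gene -> number of uniquely mapped reads
    multireads:    list of tuples; each tuple lists the genes one read maps to
    strategy:      "discard", "uniform", or "proportional"
    """
    counts = Counter({g: float(c) for g, c in unique_counts.items()})
    if strategy == "discard":
        return dict(counts)  # multi-reads are simply ignored
    for genes in multireads:
        if strategy == "uniform":
            weights = {g: 1.0 for g in genes}
        elif strategy == "proportional":
            # Weight each candidate gene by its unique-read count.
            weights = {g: float(unique_counts.get(g, 0)) for g in genes}
            if sum(weights.values()) == 0:
                weights = {g: 1.0 for g in genes}  # assumed fallback: uniform
        else:
            raise ValueError("unknown strategy: " + strategy)
        total = sum(weights.values())
        for g in genes:
            counts[g] += weights[g] / total  # fractional assignment
    return dict(counts)

unique = {"A": 30, "B": 10}
multi = [("A", "B")] * 4
for s in ("discard", "uniform", "proportional"):
    print(s, assign_multireads(unique, multi, s))
```

With these inputs the proportional rule gives gene A three quarters of each shared read, because A holds 30 of the 40 unique reads.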

9 Generative Model + Algorithm
Notation:
- G = (G_1, G_2, ..., G_n)
- P = (P_1, P_2, ..., P_n); Σ P_i = 1
- R = (r_1, r_2, ..., r_m)
Model for RNA-Seq:
- Choose G_i from distribution P
- Generate short reads: copy (with errors) a random substring of G_i
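The generative model on this slide can be simulated directly. The following is a rough sketch under stated assumptions: the function name `simulate_reads`, the uniform choice of start position, the substitution-only error model, and the default read length and error rate are all illustrative choices, not details taken from the slides.

```python
import random

def simulate_reads(genes, probs, n_reads, read_len=30, error_rate=0.01, seed=0):
    """Simulate reads: pick a gene from P, then copy a random substring with errors.

    genes: dict name -> sequence (each at least read_len long)
    probs: dict name -> expression probability (sums to 1)
    Returns a list of (source_gene_name, read_sequence) pairs.
    """
    rng = random.Random(seed)
    names = list(genes)
    reads = []
    for _ in range(n_reads):
        # Choose G_i from distribution P.
        name = rng.choices(names, weights=[probs[n] for n in names])[0]
        seq = genes[name]
        # Copy a random substring of the chosen gene.
        start = rng.randrange(len(seq) - read_len + 1)
        read = []
        for base in seq[start:start + read_len]:
            if rng.random() < error_rate:
                # Assumed error model: substitute a different base.
                base = rng.choice([b for b in "ACGT" if b != base])
            read.append(base)
        reads.append((name, "".join(read)))
    return reads
```

Two genes that share a long common substring will produce reads that cannot be attributed to either one from the sequence alone, which is the multi-read problem.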

10 SeqEm
[Figure: reads R generated from genes G with expression probabilities P1, P2, P3]

11 SeqEm: Problem 1

12 SeqEm: Likelihood
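The formula on this slide did not survive in the transcript. Under the generative model of slide 9, and assuming reads are generated independently, the likelihood would take the usual mixture form below. Here $\Pr(r_j \mid G_i)$ is an assumed shorthand for the probability of observing read $r_j$ when copying (with errors) from a random position in gene $G_i$; it is not notation from the slides.

```latex
L(P) = \prod_{j=1}^{m} \sum_{i=1}^{n} P_i \, \Pr(r_j \mid G_i),
\qquad
\log L(P) = \sum_{j=1}^{m} \log \sum_{i=1}^{n} P_i \, \Pr(r_j \mid G_i),
\qquad
\sum_{i=1}^{n} P_i = 1,\; P_i \ge 0 .
```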

13 The log-likelihood is shown to be concave in P, so EM converges to a global maximum
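A minimal sketch of EM for the expression proportions follows. It assumes the per-read likelihoods Pr(read j | gene i) are already computed and that every read has nonzero likelihood for at least one gene; the function name `em_expression` and the stopping rule are illustrative, and this is not SeqEm's actual code.

```python
def em_expression(read_likelihoods, n_iter=200, tol=1e-10):
    """Estimate gene proportions P by EM.

    read_likelihoods[j][i] = Pr(read j | gene i), assumed precomputed.
    Returns a list of proportions that sums to 1.
    """
    n_genes = len(read_likelihoods[0])
    n_reads = len(read_likelihoods)
    p = [1.0 / n_genes] * n_genes  # start from the uniform distribution
    for _ in range(n_iter):
        new_p = [0.0] * n_genes
        for row in read_likelihoods:
            # E-step: posterior probability that this read came from each gene.
            weighted = [p[i] * row[i] for i in range(n_genes)]
            total = sum(weighted)
            for i in range(n_genes):
                new_p[i] += weighted[i] / total
        # M-step: new proportions are the average posterior memberships.
        new_p = [v / n_reads for v in new_p]
        converged = max(abs(a - b) for a, b in zip(p, new_p)) < tol
        p = new_p
        if converged:
            break
    return p

# 6 reads unique to gene 0, 2 unique to gene 1, 2 reads shared equally.
reads = [[1.0, 0.0]] * 6 + [[0.0, 1.0]] * 2 + [[1.0, 1.0]] * 2
print(em_expression(reads))
```

In this toy input the shared reads carry no information about the proportions, so the estimate settles at the ratio of unique reads, 6 to 2.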

14

15 MGMR motivation
[Cartoon]

16 MGMR intuition
- Assume same gene structures
- Most expression levels expected to be similar...

17 New Generative Model
Notation:
- G = (G_1, G_2, ..., G_M): genes
- S = (S_1, S_2, ..., S_N): samples (i.e., genomes)
- P = (P_11, P_12, ..., P_MN); for each sample, the Ps sum to 1 over genes
- For the i-th sample, R = (r_1, r_2, ..., r_Ri)
Model for RNA-Seq:
- Sample a vector of Ps from a Dirichlet distribution
- The Ps define the probability of sampling each gene
- Generate short reads: copy (with errors) a random substring of G
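The multi-sample model can be sketched as a two-level simulation. The code assumes a single alpha vector shared by all samples, which fits the intuition that most expression levels are similar across genomes but is not stated on the slide; the function names are hypothetical, and read generation is reduced to counting which gene each read came from.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw one probability vector from Dir(alpha) via normalized gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def simulate_samples(alpha, n_samples, reads_per_sample, seed=0):
    """For each sample, draw P from Dir(alpha), then draw a source gene per read.

    Returns a list of (P, counts) pairs, one per sample, where counts[i] is
    the number of reads that originated from gene i.
    """
    rng = random.Random(seed)
    out = []
    for _ in range(n_samples):
        p = sample_dirichlet(alpha, rng)  # per-sample expression profile
        genes = rng.choices(range(len(alpha)), weights=p, k=reads_per_sample)
        counts = [0] * len(alpha)
        for g in genes:
            counts[g] += 1
        out.append((p, counts))
    return out
```

Larger alpha values tie each sample's profile more tightly to the shared mean, while smaller values let samples differ more.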

18

19 Why Dirichlet?
- The distribution's parameters (alphas) define a distribution over multinomial parameter vectors (e.g., the P_iks you draw)
- Conjugate prior of the multinomial distribution, i.e., Mult(x|Θ) Dir(Θ|α) ∝ Dir(Θ|x+α)
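Conjugacy means the posterior update is just an addition of counts to the prior parameters. A small sketch, with hypothetical helper names:

```python
def dirichlet_posterior(alpha, counts):
    """Posterior parameters after observing multinomial counts: alpha + x."""
    return [a + x for a, x in zip(alpha, counts)]

def dirichlet_mean(alpha):
    """Mean of Dir(alpha): each alpha_i divided by the sum of the alphas."""
    total = sum(alpha)
    return [a / total for a in alpha]

prior = [1.0, 1.0, 1.0]   # uniform prior over three genes
counts = [8, 2, 0]        # observed reads per gene
post = dirichlet_posterior(prior, counts)
print(post)                   # [9.0, 3.0, 1.0]
print(dirichlet_mean(post))   # posterior mean of the proportions
```

The gene with zero observed reads keeps a small nonzero posterior mean, because the prior acts like one pseudo-read per gene.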

20 Dirichlet distribution
Spend time here: this slide gives the intuition, and the math comes next.
- Point out that each point is a probability mass function: its entries sum to one and are all positive
- Colors represent values of the pdf
- Explain that gamma is the factorial of n-1, i.e., Γ(n) = (n-1)! for a positive integer n; explain where the density should be high or low in each case, and why
- The right side shows points drawn from this distribution in each case
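The high and low regions of the density can be checked numerically. The sketch below evaluates the Dirichlet log-density with the standard library's `math.lgamma`; the function name `dirichlet_logpdf` and the example alpha values are chosen for illustration and are not the cases plotted on the slide.

```python
import math

def dirichlet_logpdf(p, alpha):
    """Log-density of Dir(alpha) at a point p on the simplex (all p_i > 0)."""
    # Normalizer: Gamma(sum of alphas) / product of Gamma(alpha_i), in log space.
    log_norm = math.lgamma(sum(alpha)) - sum(math.lgamma(a) for a in alpha)
    return log_norm + sum((a - 1.0) * math.log(x) for a, x in zip(alpha, p))

center = [1 / 3, 1 / 3, 1 / 3]
corner = [0.8, 0.1, 0.1]

print(math.gamma(5))  # Gamma(5) = 4! = 24
for alpha in ([1, 1, 1], [5, 5, 5], [0.5, 0.5, 0.5]):
    print(alpha, dirichlet_logpdf(center, alpha), dirichlet_logpdf(corner, alpha))
```

With all alphas equal to 1 the density is flat; with alphas above 1 the mass concentrates near the center of the simplex; with alphas below 1 it moves toward the corners.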

21

22

23 Estimating alpha given P
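The derivation on this slide is not in the transcript. One simple option, which is not necessarily the one the author used, is moment matching: the mean of each component fixes the ratios of the alphas, and the variance fixes their sum. The function names below are hypothetical, and averaging the per-component precision estimates is an assumed design choice.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw one probability vector from Dir(alpha) via normalized gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def estimate_alpha_moments(p_vectors):
    """Method-of-moments estimate of alpha from observed probability vectors."""
    n = len(p_vectors)
    k = len(p_vectors[0])
    means = [sum(p[i] for p in p_vectors) / n for i in range(k)]
    precisions = []
    for i in range(k):
        var = sum((p[i] - means[i]) ** 2 for p in p_vectors) / (n - 1)
        # Var[P_i] = m_i (1 - m_i) / (alpha_0 + 1), solved for alpha_0.
        precisions.append(means[i] * (1.0 - means[i]) / var - 1.0)
    alpha0 = sum(precisions) / k  # assumed: average the k precision estimates
    return [m * alpha0 for m in means]

rng = random.Random(42)
true_alpha = [2.0, 5.0, 3.0]
draws = [sample_dirichlet(true_alpha, rng) for _ in range(5000)]
print(estimate_alpha_moments(draws))
```

A maximum-likelihood fit would generally be more accurate; the moment estimate is often used as its starting point.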

24 Project status
Current status: math done (I hope!), coding...
Plans:
- Simulation: small in silico genomes with a known percentage of homologs and differential expression
- Compare the method's results to discarding reads, uniform assignment, and weighted assignment
- Test on real data
- Sanity check: multiple lanes of the same subject
- Population studies, e.g., genomes project
- Issue: do more mixed pools lead to less accuracy?
- Deal with SNPs, transcripts instead of genes
- Your suggestions...

