Published by Hamdani Agus Widjaja. Modified over 5 years ago.
1
Iterative resolution of multi-reads in multiple genomes
Main source for Background Material, slide backgrounds: Eran Halperin's Accurate Estimation of Expression Levels of Homologous Genes In RNA-Seq Experiments
2
2 SeqEm – Eran Halperin 1/12/11
3
Microarrays: Known Issues
- Background hybridization
- Genes with low expression levels
- Different hybridization properties
- Relative expression levels
- Limited set of probes
4
RNA-Seq Procedure
- Isolate total RNA (e.g., by poly(A) binding)
- Sequence short reads (25-40bp)
- Map to reference genome (Eland, MAQ, BWA, Bowtie, etc.)
- QC, splice variants, etc.
- Estimate concentration of mRNA in sample
- Statistics/analysis
6
Homologous Genes
8
MULTIREADS - Current Standard
- Discard
- Uniformly distribute
- Map according to unique read distribution (Erange)
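The three standard treatments can be sketched as a single allocation function. This is a minimal illustration, not the Erange implementation: the function name, the `strategy` labels, and the use of raw unique-read counts as weights are all assumptions made for the example.

```python
def allocate_multiread(candidates, unique_counts, strategy):
    """Return {gene: fraction of the read} for one multiread.

    candidates    -- genes the read maps to equally well
    unique_counts -- {gene: number of uniquely mapped reads} (hypothetical input)
    strategy      -- "discard", "uniform", or "unique_weighted" (labels are assumptions)
    """
    if strategy == "discard":
        return {}  # the read contributes to no gene
    if strategy == "uniform":
        share = 1.0 / len(candidates)
        return {g: share for g in candidates}
    if strategy == "unique_weighted":
        total = sum(unique_counts.get(g, 0) for g in candidates)
        if total == 0:
            # No unique evidence for any candidate: fall back to uniform (an assumption).
            return allocate_multiread(candidates, unique_counts, "uniform")
        return {g: unique_counts.get(g, 0) / total for g in candidates}
    raise ValueError(f"unknown strategy: {strategy}")
```

With 30 unique reads on gene A and 10 on gene B, the weighted strategy gives A three quarters of each shared read, while the uniform strategy gives each gene half.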
9
Generative Model + Algorithm
Notation:
- G = (G1, G2, …, Gn)
- P = (P1, P2, …, Pn); ΣPi = 1
- R = (r1, r2, …, rm)
Model for RNA-Seq:
- Choose Gi from distribution P
- Generate short reads: copy (with errors) a random substring of G
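The generative model above can be simulated in a few lines. This is a sketch under stated assumptions: the function name, the substitution-only error model, the default read length of 30 bp, and the 1% error rate are illustrative choices, not details from the slides.

```python
import random

def simulate_reads(genes, proportions, n_reads, read_len=30, error_rate=0.01, seed=0):
    """Simulate reads: pick gene i with probability P_i, copy a random substring with errors.

    Every gene must be at least read_len bases long.
    Returns a list of (true_gene_index, read_sequence) pairs.
    """
    rng = random.Random(seed)
    reads = []
    for _ in range(n_reads):
        # Choose G_i from distribution P.
        i = rng.choices(range(len(genes)), weights=proportions)[0]
        gene = genes[i]
        start = rng.randrange(len(gene) - read_len + 1)
        # Copy the substring, substituting a different base with probability error_rate.
        read = [
            rng.choice("ACGT".replace(base, "")) if rng.random() < error_rate else base
            for base in gene[start:start + read_len]
        ]
        reads.append((i, "".join(read)))
    return reads
```

Keeping the true gene index alongside each read makes it possible to score any multiread-resolution method against the known answer.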
10
SeqEm
[Figure: reads R, genes G, and proportions P1, P2, P3]
11
SeqEm: Problem 1
12
SeqEm: Likelihood
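The slide's own formula is not preserved in this transcript. Under the generative model stated earlier (choose Gi with probability Pi, then emit a read), the likelihood of the reads would take the usual mixture form below. This is a reconstruction from the notation, and the actual SeqEm likelihood may include further terms such as gene length.

```latex
L(P) = \prod_{j=1}^{m} \sum_{i=1}^{n} P_i \,\Pr(r_j \mid G_i),
\qquad
\log L(P) = \sum_{j=1}^{m} \log \sum_{i=1}^{n} P_i \,\Pr(r_j \mid G_i),
\qquad
\sum_{i=1}^{n} P_i = 1 .
```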
13
The log-likelihood was shown to be concave in P, so EM converges to a global maximum
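A minimal EM sketch for the mixture proportions, assuming the per-read likelihoods Pr(r_j | G_i) are already computed. The function name and input layout are hypothetical, and this is the textbook mixture-weight EM rather than the SeqEm code itself.

```python
def em_proportions(read_likelihoods, n_iter=200):
    """Estimate P by EM.

    read_likelihoods[j][i] = Pr(r_j | G_i), assumed precomputed (hypothetical input).
    Returns the estimated proportions as a list summing to 1.
    """
    n_genes = len(read_likelihoods[0])
    p = [1.0 / n_genes] * n_genes  # start from the uniform distribution
    for _ in range(n_iter):
        expected = [0.0] * n_genes
        for row in read_likelihoods:
            # E-step: posterior probability that this read came from each gene.
            weights = [p[i] * row[i] for i in range(n_genes)]
            total = sum(weights)
            if total == 0:
                continue  # read is incompatible with every gene; skip it
            for i in range(n_genes):
                expected[i] += weights[i] / total
        # M-step: new proportions are the normalized expected read counts.
        norm = sum(expected)
        p = [e / norm for e in expected]
    return p
```

For example, with six reads unique to gene 1, two unique to gene 2, and four reads that fit both equally, the estimate converges to (0.75, 0.25), the same split the unique reads suggest.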
15
MGMR motivation Cartoon:
16
MGMR intuition
- Assume same gene structures
- Most expression levels expected to be similar...
17
New Generative Model
Notation:
- G = (G_1, G_2, …, G_M): genes
- S = (S_1, S_2, …, S_N): samples (i.e., genomes)
- P = (P_11, P_12, …, P_MN); for each sample, Σ(genes)P = 1
- For the i-th sample, R = (r_1, r_2, …, r_Ri)
Model for RNA-Seq:
- Sample a vector of Ps from a Dirichlet distribution
- The Ps define the probability of sampling each gene
- Generate short reads: copy (with errors) a random substring of G
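The multi-sample model can be sketched as follows: each sample draws its own proportion vector from a shared Dirichlet, then draws reads by gene. The function names are hypothetical, and the read step is reduced to counting which gene each read came from. Dirichlet draws use the standard construction of normalizing independent gamma variates.

```python
import random

def sample_dirichlet(alphas, rng):
    """Draw one probability vector from Dir(alphas) by normalizing gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]

def simulate_samples(alphas, n_samples, reads_per_sample, seed=0):
    """For each sample: draw P from the Dirichlet, then assign each read to a gene.

    Returns a list of (proportions, per_gene_read_counts), one entry per sample.
    """
    rng = random.Random(seed)
    n_genes = len(alphas)
    samples = []
    for _ in range(n_samples):
        p = sample_dirichlet(alphas, rng)
        gene_ids = rng.choices(range(n_genes), weights=p, k=reads_per_sample)
        counts = [gene_ids.count(g) for g in range(n_genes)]
        samples.append((p, counts))
    return samples
```

Larger alphas keep the per-sample proportions close to one another, which matches the intuition that most expression levels are similar across samples.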
19
Why Dirichlet?
- The distribution's parameters (alphas) define a distribution over multinomial parameter vectors (e.g., the P_ik values you draw)
- It is the conjugate prior of the multinomial distribution, i.e., Mult(x|Θ) Dir(Θ|α) ∝ Dir(Θ|x+α)
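Conjugacy means the posterior update is just addition: observed counts x are added to the prior alphas. A small sketch, with hypothetical function names:

```python
def dirichlet_posterior(alphas, counts):
    """Posterior Dirichlet parameters after observing multinomial counts."""
    return [a + x for a, x in zip(alphas, counts)]

def dirichlet_mean(alphas):
    """Mean of Dir(alphas): each alpha divided by the sum of all alphas."""
    total = sum(alphas)
    return [a / total for a in alphas]
```

Starting from a flat prior (1, 1, 1) and observing counts (6, 2, 0), the posterior is Dir(7, 3, 1), so the third gene keeps a small nonzero expected proportion even though it received no reads.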
20
Dirichlet distribution
Spend time here because it gives intuition, and the math comes next
- Point out that each point is a probability mass function: all entries are positive and sum to one
- Colors represent values of the pdf
- Explain that Γ(n) = (n-1)! for positive integers n; explain where the density should be high or low in each case and why
- Right side shows points drawn from this distribution in each case
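For reference, the standard Dirichlet density over a probability vector p = (p_1, …, p_K) is given below (the slide's own rendering is not preserved here). When every α_k is greater than 1 the density peaks in the interior of the simplex; when every α_k is less than 1 it piles up near the corners.

```latex
f(p_1, \dots, p_K \mid \alpha_1, \dots, \alpha_K)
  = \frac{\Gamma\!\left(\sum_{k=1}^{K} \alpha_k\right)}{\prod_{k=1}^{K} \Gamma(\alpha_k)}
    \prod_{k=1}^{K} p_k^{\alpha_k - 1},
\qquad p_k > 0,\ \sum_{k=1}^{K} p_k = 1,
\qquad \Gamma(n) = (n-1)! \ \text{for positive integers } n .
```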
23
Estimating alpha given P
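The derivation on this slide is not preserved in the transcript. As one possible approach, and not necessarily the one presented, alpha can be estimated from several observed P vectors by moment matching. It uses E[p_k] = α_k/α_0 and Var[p_k] = E[p_k](1 - E[p_k])/(α_0 + 1), where α_0 is the sum of the alphas. The function name is hypothetical.

```python
def estimate_alpha_moments(p_vectors):
    """Moment-matching estimate of Dirichlet alphas from observed probability vectors.

    p_vectors -- list of at least two probability vectors of equal length.
    This is a simple stand-in (an assumption), not the slide's estimator.
    """
    n = len(p_vectors)
    k = len(p_vectors[0])
    means = [sum(p[j] for p in p_vectors) / n for j in range(k)]
    variances = [
        sum((p[j] - means[j]) ** 2 for p in p_vectors) / (n - 1) for j in range(k)
    ]
    # Solve Var[p_j] = m_j (1 - m_j) / (alpha_0 + 1) for alpha_0, once per component.
    a0_estimates = [
        means[j] * (1 - means[j]) / variances[j] - 1
        for j in range(k)
        if variances[j] > 0
    ]
    # Average the per-component estimates of alpha_0 for stability.
    alpha0 = sum(a0_estimates) / len(a0_estimates)
    return [alpha0 * m for m in means]
```

A maximum-likelihood fit would typically be more accurate, but moment matching is closed-form and gives a reasonable starting point for an iterative method.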
24
Project status
Current status: math done (I hope!), coding...
Plans:
- Simulation: small in silico genomes with a known percentage of homologs and differential expression
- Compare results of the method to discarding reads, uniform assignment, weighted assignment
- Test on real data
  - Sanity check: multiple lanes of same subject
  - Population studies, e.g., genomes project
- Issue: do more mixed pools lead to less accuracy?
- Deal with SNPs, transcripts instead of genes
- Your suggestions...