Published by James West. Modified over 9 years ago.
1
27/06/2005, ISMB 2005
GenXHC: A Probabilistic Generative Model for Cross-hybridization Compensation in High-density Genome-wide Microarray Data
Jim Huang (1)
Joint work with Quaid Morris (1),(2), Tim Hughes (2) and Brendan Frey (1),(2)
(1) Probabilistic and Statistical Inference Group, University of Toronto
(2) Banting & Best Department of Medical Research, University of Toronto
2
Genome-wide profiling using high-density microarrays
The move towards high-density arrays for genome-wide profiling presents challenges…
[Figure: probes tiled across coding regions of the genome; expression measured per probe across conditions]
3
Cross-hybridization in high-density microarrays
As we move to higher-density arrays, cross-hybridization noise becomes significant and unavoidable.
[Figure: oligonucleotide probes and an mRNA transcript; labels: Hybridization, Cross-hybridization]
4
Cross-hybridization in high-density microarrays (cont'd)
Large cross-hybridization noise component in high-density data!
5
Cross-hybridization compensation
State-of-the-art methods for cross-hybridization compensation are designed for Affymetrix GeneChips:
Affymetrix MAS 5.0
Robust Multi-array Analysis (RMA/GC-RMA) (1),(2)
(1) Wu, Z. and Irizarry, R.A. (2004) Stochastic models inspired by hybridization theory for short oligonucleotide arrays. Proc. Ninth International Conference on Research in Computational Molecular Biology (RECOMB), March 2004, pp. 98-106.
(2) Irizarry, R.A. et al. (2003) Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics, 4, pp. 249-264.
6
Bilinear model for cross-hybridization
[Figure: the matrices Z, Λ and X]
Each probe is assigned a set of cross-hybridizing transcript expression profiles.
Each transcript has a hybridization weight λ that determines its contribution.
7
The probabilistic generative model for cross-hybridization
Model the data probabilistically as X = ΛZ + V, where X = [x_1 x_2 … x_T] is N × T, Z = [z_1 z_2 … z_T] is M × T, Λ is the N × M hybridization matrix, and V is additive noise.
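To make the dimensions of X = ΛZ + V concrete, here is a minimal simulation sketch. The slide does not state the noise distribution or how Λ and Z are drawn; the Gaussian noise, the exponential draws, and the name `simulate_probe_data` are assumptions made only for illustration.

```python
import numpy as np

def simulate_probe_data(lam, z, noise_std, rng):
    """Draw X = Lambda @ Z + V with i.i.d. Gaussian V (the noise law is an assumption)."""
    n_probes, n_transcripts = lam.shape
    if z.shape[0] != n_transcripts:
        raise ValueError("Lambda must have one column per row of Z")
    v = rng.normal(0.0, noise_std, size=(n_probes, z.shape[1]))
    return lam @ z + v

rng = np.random.default_rng(0)
N, M, T = 6, 3, 12                       # probes, transcripts, conditions
lam = rng.exponential(1.0, size=(N, M))  # hypothetical non-negative hybridization weights
z = rng.exponential(1.0, size=(M, T))    # hypothetical transcript expression profiles
x = simulate_probe_data(lam, z, noise_std=0.1, rng=rng)
print(x.shape)                           # one row per probe, one column per condition
```

Each row of `x` mixes the expression profiles of all transcripts with a non-zero weight in the matching row of `lam`, which is the cross-hybridization effect the model is meant to undo.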
8
Sparsity of the Λ matrix
Force many of the weights λ_ij to 0.
Denote by S the set of weights which are non-zero; the prior then becomes [equation not preserved in the transcript].
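One way to picture the sparsity constraint is as a boolean mask over Λ. The sketch below assumes S is given as a list of (probe, transcript) index pairs; that representation and the helper names `mask_from_pairs` and `apply_sparsity` are illustrative choices, not taken from the slides.

```python
import numpy as np

def mask_from_pairs(pairs, n_probes, n_transcripts):
    """Build a boolean N x M mask that is True exactly at the pairs in S."""
    mask = np.zeros((n_probes, n_transcripts), dtype=bool)
    rows, cols = zip(*pairs)
    mask[list(rows), list(cols)] = True
    return mask

def apply_sparsity(lam, s_mask):
    """Force every weight outside S to zero, leaving weights inside S unchanged."""
    return np.where(s_mask, lam, 0.0)

# Hypothetical S: probe 1 cross-hybridizes to transcripts 0 and 1.
pairs = [(0, 0), (1, 0), (1, 1), (2, 2)]
s_mask = mask_from_pairs(pairs, n_probes=3, n_transcripts=3)
sparse_lam = apply_sparsity(np.ones((3, 3)), s_mask)
print(sparse_lam)
```

With the mask in place, only the weights indexed by S remain free parameters during inference.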
9
The probabilistic generative model for cross-hybridization (cont'd)
The probabilistic model p(X,Z,Λ|S) for cross-hybridization is therefore [equation not preserved in the transcript].
10
Variational inference
To perform inference, minimize the KL-divergence with respect to a distribution q for the given probabilistic model p.
The optimum is the posterior distribution q(Z,Λ) = p(Z,Λ|X,S).
Difficult to compute exactly! Use a surrogate which approximates the true posterior.
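A toy discrete sketch of the criterion: the KL-divergence from q to the posterior is zero when q equals the posterior and positive for any other surrogate. The three-state distributions below are made-up numbers, not quantities from the GenXHC model.

```python
import math

def kl_divergence(q, p):
    """KL(q || p) for two discrete distributions given as equal-length lists."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)

posterior = [0.7, 0.2, 0.1]   # hypothetical true posterior
surrogate = [0.6, 0.3, 0.1]   # hypothetical approximating distribution q

print(kl_divergence(posterior, posterior))  # zero: q matches the posterior
print(kl_divergence(surrogate, posterior))  # positive: q is only an approximation
```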
11
Variational EM for approximate inference and parameter estimation
Use exponential distributions parameterized by variational parameters for q.
Minimize the KL-divergence via variational EM (2),(3) to get the estimate β_jt of the transcript expression profiles:
Variational E-step [equation not preserved in the transcript]
Variational M-step [equation not preserved in the transcript]
(2) Neal, R.M. and Hinton, G.E. (1998) A view of the EM algorithm that justifies incremental, sparse, and other variants. Learning in Graphical Models, Kluwer Academic Publishers, pp. 355-368.
(3) Jaakkola, T. and Jordan, M.I. (2000) Bayesian parameter estimation via variational methods. Statistics and Computing, 10:1, January 2000, pp. 25-37.
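As background on why alternating E- and M-steps reduces the KL-divergence, the standard identity from the Neal and Hinton view of EM can be written in the notation of these slides. This is a general result, not the specific update equations from the talk, which are not preserved here.

```latex
\log p(X \mid S)
  = \underbrace{\int q(Z,\Lambda)\,
      \log \frac{p(X,Z,\Lambda \mid S)}{q(Z,\Lambda)}\, dZ\, d\Lambda}_{\text{lower bound}}
  \;+\;
  \mathrm{KL}\!\left( q(Z,\Lambda) \,\middle\|\, p(Z,\Lambda \mid X,S) \right)
```

Because the left-hand side does not depend on q, raising the lower bound with respect to the variational parameters (E-step) is the same as lowering the KL term, while raising it with respect to the model parameters (M-step) increases a bound on the data log-likelihood.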
12
Variational Expectation-Maximization algorithm
Variational E-step [equation not preserved in the transcript]
Variational M-step [equation not preserved in the transcript]
13
Results
Agilent exon-tiling microarray data with 26,486 60-mer probes across 12 tissue pools.
Each probe was matched to full-length RefSeq cDNAs via BLAST search to determine the sparsity structure S.
The resulting data set contains 9,904 probes matched to 2,905 mouse transcripts.
14
Results (cont'd)
15
Significance testing of inferred expression profiles
Randomly permute the rows of the S matrix and perform inference.
Mean SNR is significantly lower for permuted data than for unpermuted data.
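The permutation step can be sketched as follows, assuming S is stored as a boolean probes-by-transcripts matrix. The `empirical_pvalue` helper is a common way to summarize such a test and is an added assumption: the slides do not say how the SNR comparison was scored, and the SNR computation itself is not shown here.

```python
import numpy as np

def permute_rows(s_mask, rng):
    """Return a copy of S with its rows (probes) shuffled, breaking probe-transcript matches."""
    return s_mask[rng.permutation(s_mask.shape[0])]

def empirical_pvalue(observed, null_values):
    """Share of null statistics at least as large as the observed one, with a +1 correction."""
    null_values = np.asarray(null_values)
    return (1 + np.sum(null_values >= observed)) / (1 + null_values.size)

rng = np.random.default_rng(0)
s_mask = np.array([[1, 0, 0],
                   [1, 1, 0],
                   [0, 0, 1],
                   [0, 1, 1]], dtype=bool)   # hypothetical 4 probes x 3 transcripts
s_null = permute_rows(s_mask, rng)           # same rows, assigned to different probes
```

Running inference once per permuted `s_null` would give a null distribution of mean SNR values to compare against the value obtained with the true S.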
16
Gene Ontology-Biological Process (GO-BP) enrichment using denoised data
Perform agglomerative hierarchical clustering and compute a hypergeometric p-value for each cluster to evaluate the statistical significance of the clustering.
The majority of clusters have increased significance in denoised data compared to clustering using noisy data.
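A minimal sketch of a hypergeometric enrichment p-value for one cluster and one GO-BP category, using only the standard library. The upper-tail form P(X >= k) is the usual convention for enrichment and is assumed here; the function name and the example counts are hypothetical.

```python
from math import comb

def hypergeom_pvalue(k, cluster_size, annotated, population):
    """P(X >= k): chance of seeing at least k annotated genes in a cluster of
    cluster_size genes drawn without replacement from population genes,
    annotated of which carry the category."""
    upper = min(cluster_size, annotated)
    total = comb(population, cluster_size)
    tail = sum(
        comb(annotated, i) * comb(population - annotated, cluster_size - i)
        for i in range(k, upper + 1)
    )
    return tail / total

# Hypothetical counts: 8 of 20 clustered genes share a category
# that annotates 100 of 2,905 genes overall.
print(hypergeom_pvalue(8, 20, 100, 2905))
```

Smaller values indicate that the cluster contains more genes from the category than random sampling would explain.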
17
Comparison to Robust Multi-array Analysis
Unlike RMA, GenXHC models the explicit sparse structure of the set of probe-transcript interactions.
This increases statistical power when doing functional prediction.
18
Summary
Cross-hybridization compensation using prior knowledge about the transcript population doubles the number of probes on the array.
The problem of inferring latent transcript profiles is one of variational inference.
Functional annotation using denoised data yields functional categories with higher statistical significance than noisy expression data.
Taking the set of probe-transcript binding interactions into account generally yields greater statistical power than ignoring them.