27/06/2005, ISMB 2005. GenXHC: A Probabilistic Generative Model for Cross-hybridization Compensation in High-density Genome-wide Microarray Data


GenXHC: A Probabilistic Generative Model for Cross-hybridization Compensation in High-density Genome-wide Microarray Data. Jim Huang (1). Joint work with Quaid Morris (1),(2), Tim Hughes (2) and Brendan Frey (1),(2). (1) Probabilistic and Statistical Inference Group, University of Toronto; (2) Banting & Best Department of Medical Research, University of Toronto.

Genome-wide profiling using high-density microarrays: The move towards high-density arrays for genome-wide profiling presents challenges… [Figure labels: Probes, Conditions, Expression, Coding regions, Genome]

Cross-hybridization in high-density microarrays: As we move to higher-density arrays, cross-hybridization noise becomes significant and unavoidable. [Figure labels: Oligonucleotide probes, mRNA transcript, Hybridization, Cross-hybridization]

Cross-hybridization in high-density microarrays (cont’d): Large cross-hybridization noise component in high-density data!

Cross-hybridization compensation: State-of-the-art methods for cross-hybridization compensation are designed for Affymetrix GeneChips: Affymetrix MAS 5.0; Robust Multi-array Analysis (RMA/GC-RMA) (1),(2). (1) Wu, Z. and Irizarry, R.A. (2004) Stochastic models inspired by hybridization theory for short oligonucleotide arrays. Proc. Ninth International Conference on Research in Computational Molecular Biology (RECOMB), March 2004. (2) Irizarry, R.A. et al. (2003) Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics, 4.

Bilinear model for cross-hybridization: Each probe is assigned a set of cross-hybridizing transcript expression profiles. Each transcript has a hybridization weight λ that determines its contribution. [Figure labels: X, Λ, Z]

The probabilistic generative model for cross-hybridization: Model the data probabilistically as X = ΛZ + V, where X = [x_1 x_2 … x_T] is N × T, Z = [z_1 z_2 … z_T] is M × T, Λ is the N × M hybridization matrix, and V is additive noise.
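
The relation X = ΛZ + V can be illustrated with a small simulation. This is only a sketch: the slide says V is additive noise without giving its distribution, so the i.i.d. Gaussian noise used here is an assumption, and the function name `simulate` and the toy matrices are made up for illustration.

```python
import random

def simulate(lam, z, noise_sd, rng):
    """Return X = lam @ z + V, with V drawn i.i.d. Gaussian (an assumed noise form)."""
    m, t = len(z), len(z[0])
    return [
        [sum(row[j] * z[j][c] for j in range(m)) + rng.gauss(0.0, noise_sd)
         for c in range(t)]
        for row in lam
    ]

rng = random.Random(0)

# Toy sizes: N = 3 probes, M = 2 transcripts, T = 4 conditions.
lam = [[1.0, 0.0],   # probe 0 binds transcript 0 only
       [0.5, 0.5],   # probe 1 cross-hybridizes to both transcripts
       [0.0, 2.0]]   # probe 2 binds transcript 1 only
z = [[1.0, 2.0, 3.0, 4.0],
     [4.0, 3.0, 2.0, 1.0]]

x_clean = simulate(lam, z, noise_sd=0.0, rng=rng)  # noise-free mixture
x_noisy = simulate(lam, z, noise_sd=0.1, rng=rng)  # same mixture plus noise
```

In the noise-free case, the cross-hybridizing probe 1 reports a flat profile even though both underlying transcripts vary across conditions, which is the kind of distortion the model is meant to undo.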

Sparsity of the Λ matrix: Force many of the weights λ_ij to 0. Denote by S the set of weights which are non-zero; the prior becomes [equation and accompanying definition not preserved in the transcript].
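
The zero constraint can be sketched as a mask over Λ. The prior itself is not preserved in this transcript, so the snippet below illustrates only the structural part (weights outside S are forced to 0); representing S as a set of (probe, transcript) index pairs, and the name `apply_sparsity`, are assumptions made for the example.

```python
def apply_sparsity(lam, support):
    """Zero every weight lam[i][j] whose index pair (i, j) is not in `support`."""
    return [[w if (i, j) in support else 0.0 for j, w in enumerate(row)]
            for i, row in enumerate(lam)]

# S: index pairs (probe i, transcript j) that are allowed to be non-zero.
support = {(0, 0), (1, 0), (1, 1), (2, 1)}

dense = [[1.0, 0.3],
         [0.5, 0.5],
         [0.2, 2.0]]
sparse = apply_sparsity(dense, support)
```

Here `sparse` keeps four of the six weights; the entries at (0, 1) and (2, 0) are set to 0.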

The probabilistic generative model for cross-hybridization (cont’d): The probabilistic model p(X,Z,Λ|S) for cross-hybridization is therefore [equation not preserved in the transcript].

Variational inference: To perform inference, minimize the KL-divergence with respect to a distribution q for the given probabilistic model p. The optimum is the posterior distribution q(Z,Λ) = p(Z,Λ|X,S). Difficult to compute exactly! Use a surrogate which approximates the true posterior.
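
For reference, the objective can be written out in the standard form. The slide's own equation is not preserved, so the notation below (integrals over Z and Λ, the symbol F for the free energy) is an assumption; the identities themselves are the usual ones for variational inference.

```latex
\begin{aligned}
\mathrm{KL}\big(q \,\|\, p\big)
  &= \int q(Z,\Lambda)\,
     \log\frac{q(Z,\Lambda)}{p(Z,\Lambda \mid X,S)}\; dZ\, d\Lambda \;\ge\; 0, \\
F(q)
  &= \int q(Z,\Lambda)\,
     \log\frac{q(Z,\Lambda)}{p(X,Z,\Lambda \mid S)}\; dZ\, d\Lambda
   \;=\; \mathrm{KL}\big(q \,\|\, p\big) \;-\; \log p(X \mid S).
\end{aligned}
```

Because log p(X|S) does not depend on q, minimizing F over q is the same as minimizing the KL-divergence, and the minimum KL = 0 is reached exactly when q(Z,Λ) = p(Z,Λ|X,S). Restricting q to a tractable family gives the approximate surrogate.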

Variational EM for approximate inference and parameter estimation: Use exponential distributions parameterized by variational parameters for q. Minimize KL-divergence via variational EM (2),(3) to get the estimate β_jt of the transcript expression profiles. [Variational E-step and M-step equations not preserved in the transcript.] (2) Neal, R. M. and Hinton, G. E. (1998) A view of the EM algorithm that justifies incremental, sparse, and other variants. Learning in Graphical Models, Kluwer Academic Publishers. (3) Jaakkola, T. and Jordan, M.I. (2000) Bayesian parameter estimation via variational methods. Statistics and Computing, 10:1, January 2000.

Variational Expectation-Maximization algorithm: Variational E-step; Variational M-step. [Update equations not preserved in the transcript.]
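
Since the E-step and M-step updates are not preserved, the alternating structure can only be illustrated with a stand-in. The sketch below is not the GenXHC variational EM: it swaps in Lee-Seung multiplicative updates for non-negative matrix factorization, a non-probabilistic technique that likewise alternates between updating Z with Λ fixed and updating Λ with Z fixed. It is chosen because multiplicative updates keep zero entries at zero, so a sparsity pattern placed in the initial Λ is preserved. All names and the toy data are hypothetical.

```python
import random

def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    cols = list(zip(*b))
    return [[sum(u * v for u, v in zip(row, col)) for col in cols] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def squared_error(x, lam, z):
    recon = matmul(lam, z)
    return sum((xv - rv) ** 2 for xr, rr in zip(x, recon) for xv, rv in zip(xr, rr))

def scale(mat, num, den, eps=1e-12):
    """Elementwise mat * num / (den + eps); entries that are exactly 0 stay 0."""
    return [[m * n / (d + eps) for m, n, d in zip(mr, nr, dr)]
            for mr, nr, dr in zip(mat, num, den)]

def alternate(x, lam, z, n_iter=200):
    """Alternate multiplicative updates of Z (Lambda fixed) and Lambda (Z fixed)."""
    for _ in range(n_iter):
        lam_t = transpose(lam)
        z = scale(z, matmul(lam_t, x), matmul(matmul(lam_t, lam), z))
        z_t = transpose(z)
        lam = scale(lam, matmul(x, z_t), matmul(lam, matmul(z, z_t)))
    return lam, z

rng = random.Random(0)
# Toy data: 3 probes x 4 conditions, generated from 2 transcripts.
x = [[1.0, 2.0, 3.0, 4.0],
     [2.5, 2.5, 2.5, 2.5],
     [8.0, 6.0, 4.0, 2.0]]
# The zeros in the initial Lambda encode the sparsity pattern and never change.
lam0 = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
z0 = [[rng.uniform(0.5, 1.5) for _ in range(4)] for _ in range(2)]

err_before = squared_error(x, lam0, z0)
lam_hat, z_hat = alternate(x, lam0, z0)
err_after = squared_error(x, lam_hat, z_hat)
```

The reconstruction error after the updates is lower than at initialization, and the masked entries of `lam_hat` remain exactly zero. A faithful implementation would replace these point estimates with the variational distributions described on the preceding slides.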

Results: Agilent exon-tiling microarray data with 26, mer probes across 12 tissue pools. Matched each probe to full-length RefSeq cDNAs via BLAST search to determine the sparsity structure S. Resulting data set contains 9,904 probes matched to 2,905 mouse transcripts.
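
Turning probe-to-transcript matches into the sparsity structure S might look like the following sketch. It assumes the BLAST results have already been reduced to (probe ID, transcript ID) pairs; that input format, the function name `build_support` and the IDs are hypothetical, and no BLAST parsing or match threshold is shown.

```python
def build_support(hits):
    """Map (probe_id, transcript_id) matches to index pairs (i, j) forming S."""
    probes = sorted({p for p, _ in hits})
    transcripts = sorted({t for _, t in hits})
    probe_index = {p: i for i, p in enumerate(probes)}
    transcript_index = {t: j for j, t in enumerate(transcripts)}
    support = {(probe_index[p], transcript_index[t]) for p, t in hits}
    return probes, transcripts, support

# Hypothetical matches: probe_B matches two transcripts (cross-hybridization).
hits = [("probe_A", "tx_1"),
        ("probe_B", "tx_1"),
        ("probe_B", "tx_2"),
        ("probe_C", "tx_2")]
probes, transcripts, support = build_support(hits)
```

Each pair in `support` marks one weight λ_ij that is allowed to be non-zero; every other entry of Λ is fixed at 0.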

Results (cont’d)

Significance testing of inferred expression profiles: Randomly permute the rows of the S matrix and perform inference. Mean SNR is significantly lower for permuted data than for unpermuted data.
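
The row-permutation test can be sketched as below. The slides do not preserve how SNR is computed, so `score_fn` is a hypothetical placeholder for "run inference with this S and return the mean SNR", and the empirical p-value at the end is a common convention added for illustration rather than something stated in the slides.

```python
import random

def permute_rows(s_matrix, rng):
    """Return a copy of the binary S matrix with its rows randomly shuffled."""
    rows = [row[:] for row in s_matrix]
    rng.shuffle(rows)
    return rows

def permutation_null(s_matrix, score_fn, n_perm, rng):
    """Score n_perm row-permuted copies of S to build a null distribution."""
    return [score_fn(permute_rows(s_matrix, rng)) for _ in range(n_perm)]

# Toy S matrix: rows are probes, columns are transcripts.
s_true = [[1, 0], [1, 1], [0, 1], [1, 0], [0, 1]]

def toy_score(s_matrix):
    # Placeholder score: number of probes whose row still matches s_true.
    return sum(row == ref for row, ref in zip(s_matrix, s_true))

rng = random.Random(0)
observed = toy_score(s_true)
null_scores = permutation_null(s_true, toy_score, n_perm=200, rng=rng)

# Empirical p-value: fraction of permutations scoring at least as high.
p_value = (1 + sum(s >= observed for s in null_scores)) / (1 + len(null_scores))
```

With a real scoring function, a small `p_value` would indicate that the unpermuted probe-transcript assignments give higher SNR than random assignments.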

Gene Ontology-Biological Process (GO-BP) enrichment using denoised data: Perform agglomerative hierarchical clustering and compute a hypergeometric p-value for each cluster to evaluate statistical significance of the clustering. A majority of clusters have increased significance in denoised data compared to clustering using noisy data.
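
The per-cluster hypergeometric p-value can be computed as in this sketch. It assumes the usual upper-tail form (probability of seeing at least the observed number of annotated genes in a cluster drawn without replacement); the slides do not spell out the exact test variant, and the argument names are hypothetical.

```python
from math import comb

def hypergeom_pvalue(pop, annotated, cluster, hits):
    """P(at least `hits` annotated genes in a cluster of size `cluster`).

    pop:       total number of genes considered
    annotated: genes in the population carrying the GO-BP category
    cluster:   number of genes in the cluster
    hits:      genes in the cluster carrying the category
    """
    upper = min(cluster, annotated)
    tail = sum(comb(annotated, i) * comb(pop - annotated, cluster - i)
               for i in range(hits, upper + 1))
    return tail / comb(pop, cluster)

# Toy example: 10 genes, 3 annotated, a cluster of 3 containing all 3.
p_small = hypergeom_pvalue(pop=10, annotated=3, cluster=3, hits=3)
```

In the toy example only one of the C(10, 3) = 120 possible clusters contains all three annotated genes, so `p_small` equals 1/120. In practice the p-values across many clusters and categories would also need a multiple-testing correction, which is not shown.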

Comparison to Robust Multi-array Analysis: Unlike RMA, GenXHC models the explicit sparse structure of the set of probe-transcript interactions. This increases statistical power when doing functional prediction.

Summary: Cross-hybridization compensation using prior knowledge about the transcript population doubles the number of probes on the array. The problem of inferring latent transcript profiles is one of variational inference. Functional annotation using denoised data yields functional categories which have higher statistical significance compared to noisy expression data. Taking into account the set of probe-transcript binding interactions generally yields greater statistical power versus ignoring them.