Published by Everett Allen. Modified over 9 years ago.
1
Detection and Compensation of Cross-Hybridization in DNA Microarray Data
Jim Huang (1). Joint work with Quaid Morris (1), Tim Hughes (2) and Brendan Frey (1).
(1) Probabilistic and Statistical Inference Group, University of Toronto; (2) Banting & Best Department of Medical Research, University of Toronto
2
Description and Applications of DNA Microarrays
Microarrays consist of a 2-D array of probes, each with a short DNA sequence attached. These sequences are called oligonucleotide sequences. The output of each probe is approximately proportional to the amount of DNA from a given tissue that binds to the probe. The data for each probe form an N-dimensional expression profile vector, where N is the number of tissues used on the array. DNA microarrays can therefore be used to measure the level of gene expression across these N tissues.
3
Hybridization and cross-hybridization
The binding of two complementary DNA strands is called hybridization. Ideally, an oligonucleotide probe binds only to the DNA sequence for which it was designed and to which it is complementary. However, many DNA sequences are similar to one another and can bind to other probes on the array. This phenomenon is called cross-hybridization.
[Figure: two panels showing DNA from a tissue sample bound to an oligonucleotide probe. Hybridization: AGCTAGGAT paired with TCGATCCTA. Cross-hybridization: ATCTAGAAT paired with the same TCGATCCTA.]
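The two panels of the figure can be checked with a small sketch. It assumes a simplified, position-by-position view of base pairing (A with T, C with G) and ignores strand orientation, which is how the sequences appear on the slide. The function names `complement` and `mismatches` are illustrative and do not come from the presentation.

```python
# Watson-Crick base pairs, applied position by position (simplified view).
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}

def complement(seq):
    """Return the base-by-base complement of a DNA sequence."""
    return "".join(PAIRS[base] for base in seq)

def mismatches(sample, probe):
    """Count positions where the sample does not pair with the probe."""
    if len(sample) != len(probe):
        raise ValueError("sequences must have equal length")
    return sum(1 for s, p in zip(sample, probe) if PAIRS[s] != p)

probe = "TCGATCCTA"
intended = "AGCTAGGAT"   # sequence from the hybridization panel
similar = "ATCTAGAAT"    # sequence from the cross-hybridization panel

print(mismatches(intended, probe))  # perfect pairing
print(mismatches(similar, probe))   # a few unpaired positions
```

Under this simplified pairing rule the intended sequence pairs at every position, while the similar sequence differs at only two of nine positions, which illustrates why it can still bind to the probe.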
4
The trouble with cross-hybridization
With cross-hybridization, a probe may signal the presence of sequences other than the one it was designed for. This skews the observed data away from the expected data.
[Figure: the observed expression profile vector (cross-hybridized) shown as a sum that includes the expected expression profile vector (no cross-hybridization).]
5
Detecting cross-hybridization (1)
To test whether cross-hybridization affects the gene expression data, we perform a BLAST sequence match on all oligonucleotide probe sequences used on the microarray. Many probes are matched with sequences for which they were not specifically designed.
6
Detecting cross-hybridization (2)
We compute the Pearson correlation coefficient ρ between the expression profiles of BLAST-matched probes and between the profiles of randomly paired probes. Approximately 33% of the BLAST-matched probe pairs have ρ > 0.95, whereas only 2% of randomly paired probes have ρ > 0.95. This difference between the two distributions indicates that cross-hybridization has a significant impact on the observed gene expression data.
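The comparison described on this slide can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the toy expression profiles are invented for the example, and the helper names `pearson` and `fraction_above` are hypothetical. Only the 0.95 threshold comes from the slide.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def fraction_above(pairs, threshold=0.95):
    """Fraction of profile pairs whose correlation exceeds the threshold."""
    hits = sum(1 for x, y in pairs if pearson(x, y) > threshold)
    return hits / len(pairs)

# Toy data (assumed for illustration): each profile spans N = 4 tissues.
matched_pairs = [
    ([1.0, 2.0, 3.0, 4.0], [1.1, 2.1, 2.9, 4.2]),  # nearly identical shape
    ([0.5, 0.1, 0.9, 0.3], [2.0, 1.0, 0.5, 3.0]),  # unrelated shape
]
random_pairs = [
    ([1.0, 2.0, 3.0, 4.0], [4.0, 1.0, 3.0, 2.0]),
    ([0.2, 0.8, 0.4, 0.6], [0.9, 0.1, 0.5, 0.3]),
]

print(fraction_above(matched_pairs))
print(fraction_above(random_pairs))
```

On real data, `matched_pairs` would hold the profiles of probe pairs reported by BLAST and `random_pairs` the profiles of probes paired at random. A much larger fraction in the first group is the signature of cross-hybridization described above.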
7
Compensating for cross-hybridization
We model the observed, cross-hybridized expression profile vector x as the product x = Λz of a hybridization matrix Λ and an unobserved expression profile vector z in which there is no cross-hybridization. The elements λ_ij of Λ are set as parameterized functions of the Gibbs free energy ΔG_ij between probes i and j. To compensate for cross-hybridization, we use a generalized Expectation-Maximization algorithm in which we solve for z and Λ iteratively.
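The linear model x = Λz can be illustrated with a small sketch. It is not the generalized Expectation-Maximization algorithm from the presentation: it assumes Λ is already known and recovers z with a plain Jacobi fixed-point iteration, which corresponds only loosely to the step that solves for z with Λ held fixed. The matrix values, the function names `mix` and `compensate`, and the iteration count are all assumptions made for illustration. The dependence of λ_ij on ΔG_ij is not modelled, because the slide does not give its functional form.

```python
def mix(lam, z):
    """Forward model: x_i = sum_j lam[i][j] * z[j]."""
    return [sum(l_ij * z_j for l_ij, z_j in zip(row, z)) for row in lam]

def compensate(lam, x, iters=200):
    """Recover z from x = lam @ z by Jacobi iteration.

    Assumes lam is strictly diagonally dominant, i.e. each probe
    responds mainly to its own target, so the iteration converges.
    """
    n = len(x)
    z = list(x)  # start from the observed values
    for _ in range(iters):
        z = [
            (x[i] - sum(lam[i][j] * z[j] for j in range(n) if j != i)) / lam[i][i]
            for i in range(n)
        ]
    return z

# Hypothetical hybridization matrix for three probes, one tissue.
lam = [
    [1.0, 0.3, 0.0],  # probe 0 also picks up some of target 1
    [0.2, 1.0, 0.1],  # probe 1 picks up targets 0 and 2
    [0.0, 0.0, 1.0],  # probe 2 is specific
]
z_true = [5.0, 1.0, 2.0]          # expression without cross-hybridization
x_obs = mix(lam, z_true)          # what the array would report
z_est = compensate(lam, x_obs)    # compensated estimate

print(x_obs)
print(z_est)
```

In the approach described on the slide, Λ is itself unknown and is re-estimated in alternation with z. This sketch covers only the simpler case in which Λ is given.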