Rennie, C1 Noyes,HA2 Kemp, SJ2 Hulme, H1 Brass, A1,3 Hoyle, DC4

Rennie C¹, Noyes HA², Kemp SJ², Hulme H¹, Brass A¹,³, Hoyle DC⁴

¹ Faculty of Life Sciences, University of Manchester, Smith Building, Oxford Road, Manchester, M13 9PT, UK
² Biosciences Building, School of Biological Sciences, University of Liverpool, Crown Street, Liverpool, L69 7ZB, UK
³ School of Computer Science, Kilburn Building, University of Manchester, Oxford Road, Manchester, M13 9PL, UK
⁴ North West Institute of Bio-Health Informatics, School of Medicine, Stopford Building, Oxford Road, Manchester, M13 9PT, UK

Mismatches between probe and target sequences have a strong position-dependent effect on signal ratios from aCGH using 60mer oligonucleotide microarrays

Abstract

Sequence mismatches between probe and fluorescently-labelled target strands are known to affect the stability of the probe-target duplex formed, and hence the strength of the observed fluorescent signals in microarray experiments. However, the exact effects of sequence mismatches on microarray hybridisations are not well characterised. Array-based Comparative Genomic Hybridisation (aCGH) is a common technique for identifying DNA copy number variations. aCGH data are particularly suitable for analysing the effects of probe-target sequence mismatches because using genomic DNA avoids complications due to the large ranges of intracellular mRNA levels. A previous study provided data for hybridisations comparing three mouse strains to a C57BL/6 reference, using Agilent 60mer oligonucleotide arrays. Sequence mismatches between targets and probes were identified using the Perlegen 8 million mouse SNP dataset, and their effect on log2 signal ratio between test and reference strains was assessed. Observations indicated a strong effect of sequence mismatches on log2 signal ratio, dependent on the number of mismatches and on their position relative to the probe sequence.
Ratios for probes with 1 mismatch or 2 mismatches were strongly correlated when probes were matched on the maximum length of perfectly matched sequence between probe and target. An existing model of nucleic acid melting was tested, but its predictions did not correspond to these results. Progress has been made in developing a new computational model to reproduce these findings.

Background

Nucleic acid hybridisation is the formation of a double helix from two single strands by complementary base-pairing. It is the basis for many key biological techniques. An obvious example is microarrays, which use hybridisation to probes attached to a surface. Sequence mismatches are often present, for example in cross-species hybridisation, or due to ordinary variation between strains, breeds or individuals. They are known to have a strong effect on results from short oligonucleotide probes and less effect on cDNA probes.

Experimental dataset

Log2 signal ratios (see equation 1) were obtained from hybridisations of gDNA from three mouse test strains against a C57BL/6 reference using Agilent 244K whole mouse genome and 56K custom CGH array platforms. The probe sequences and the Perlegen mouse SNP dataset were compared to identify SNP loci that would cause sequence mismatches between the probes and the test strain targets. 15206 probes on the whole genome array and 3710 probes on the custom array overlapped 1 or more polymorphic loci (see table 1).
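The comparison of probe coordinates with SNP loci described above can be sketched as a simple interval lookup. This is a minimal illustration and not the authors' pipeline: the function name, the tuple layouts and the 1-based inclusive coordinates are assumptions, and the SNP list is assumed to be already filtered to loci where the test strain differs from the C57BL/6 reference.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


def count_snps_per_probe(probes, snps):
    """Count SNP loci falling inside each probe's genomic interval.

    probes: iterable of (probe_id, chrom, start, end), 1-based inclusive (assumed layout).
    snps:   iterable of (chrom, pos), already filtered to test-vs-reference differences.
    Returns {probe_id: n_snps} for probes that overlap at least one SNP.
    """
    # Group SNP positions by chromosome and sort them for binary search.
    by_chrom = defaultdict(list)
    for chrom, pos in snps:
        by_chrom[chrom].append(pos)
    for positions in by_chrom.values():
        positions.sort()

    counts = {}
    for probe_id, chrom, start, end in probes:
        positions = by_chrom.get(chrom, [])
        # Number of sorted positions p with start <= p <= end.
        n = bisect_right(positions, end) - bisect_left(positions, start)
        if n > 0:
            counts[probe_id] = n
    return counts


# Hypothetical coordinates, for illustration only.
example_probes = [("p1", "chr1", 100, 159), ("p2", "chr1", 200, 259)]
example_snps = [("chr1", 100), ("chr1", 159), ("chr1", 160)]
example_counts = count_snps_per_probe(example_probes, example_snps)
```

Tallying the values of the returned dictionary by count (1, 2, 3 SNPs) would give per-hybridisation totals of the kind reported in table 1.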
Array               Test strain   1 SNP (% of probes)   2 SNP (% of probes)   3 SNP (% of probes)
244K whole genome   A/J           8032 (3.41)           803 (0.34)            36 (0.02)
244K whole genome   BALB/cJ       7417 (3.15)           724 (0.31)            45 (0.02)
244K whole genome   129P3/J       8106 (3.44)           868 (0.37)            41 (0.02)
244K whole genome   All strains   13984 (5.94)          1546 (0.66)           80 (0.03)
56K custom          A/J           1343 (2.51)           120 (0.22)            5 (0.01)
56K custom          BALB/cJ       1834 (3.43)           178 (0.33)            8 (0.01)
56K custom          129P3/J       2273 (4.25)           233 (0.44)            11 (0.02)
56K custom          All strains   5199 (9.71)           536 (1.00)            23 (0.04)

Table 1: Number of probes in each hybridisation overlapping 1, 2 or 3 SNP loci that would cause a mismatch in the probe-target duplex. There were also 2 probes that overlapped 4 SNP loci, but these were omitted from the analysis.

Equation 1: Log2 signal ratio. A higher ratio indicates lower intensity for the test strain (hence possible destabilisation of the duplex between the probe and the test strain target).

Key observations from experimental data

The mean log2 signal ratio for each number of mismatches was plotted for each hybridisation (see figure 1). Larger numbers of mismatches were associated with higher mean log2 signal ratios, and there was a strong correlation between number of known mismatches and log2 signal ratio (r² = 0.94), indicating that mismatches do have an effect on the results from long oligonucleotide probes.

For the probe-target pairs with 1 mismatch, the mean log2 ratio was plotted for each possible mismatch position, measured from the nearest end of the probe (see figure 2). Mismatches further from the end of the probe were associated with higher mean log2 signal ratios, and there was a strong correlation between mismatch position and log2 signal ratio (r² = 0.92). Moving the mismatch position nearer to the centre of the probe reduces the length of continuous complementary duplex that can be formed. It is possible that this length of perfect match could be a factor.
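The three quantities used in these observations can be written down compactly. The sketch below is illustrative only. The form of the log2 signal ratio (reference over test) is an assumption inferred from the caption of equation 1, which says that a higher ratio indicates lower test-strain intensity. The function names and the 0-based mismatch offsets are likewise assumptions.

```python
import math

PROBE_LENGTH = 60  # 60mer probes, as stated in the text


def log2_signal_ratio(reference_intensity, test_intensity):
    """Assumed form of equation 1: log2(reference / test).

    Positive values mean the test strain gave the weaker signal.
    """
    return math.log2(reference_intensity / test_intensity)


def distance_from_nearest_end(offset, probe_length=PROBE_LENGTH):
    """1-based distance of a mismatch from the nearest probe end.

    offset is the 0-based index of the mismatched base within the probe.
    """
    return min(offset, probe_length - 1 - offset) + 1


def longest_perfect_match(mismatch_offsets, probe_length=PROBE_LENGTH):
    """Length of the longest run of consecutive matched bases."""
    # Sentinels just outside the probe let the end runs be handled uniformly.
    boundaries = [-1] + sorted(mismatch_offsets) + [probe_length]
    return max(b - a - 1 for a, b in zip(boundaries, boundaries[1:]))
```

For example, a single mismatch at offset 20 and a pair of mismatches at offsets 10 and 20 both leave a longest perfect match of 39 bases, so under this definition the two probes would be compared with each other when pairs are matched on length of perfect match.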
When the mean log2 signal ratios for probe-target pairs with 1 mismatch were compared with those for pairs with 2 mismatches and the same length of perfect match (see figure 3), there was a correlation between the results for pairs with 1 mismatch and pairs with 2 mismatches (Pearson's correlation coefficient 0.65, r² = 0.43, indicating that length of perfect match accounts for approximately 43% of the variance in log2 signal ratio).

Attempts to replicate experimental observations with DINAMelt simulations

Simulations of hybridising a 60mer probe to perfect and mismatched targets were carried out using DINAMelt, an existing model of nucleic acid hybridisation. The difference in Gibbs free energy (between a perfectly matched probe-target duplex and one with mismatches) predicted by DINAMelt is approximately equivalent to the log2 signal ratio in the experimental data. For each simulation, the actual probe sequences from the experimental data were used and perfect or mismatched targets were generated.

DINAMelt appeared to replicate the effect of the number of mismatches (see figure 5). The simulations indicated that a larger number of mismatches would lead to lower thermodynamic stability of the duplex, consistent with the experimental results. There was a correlation between number of mismatches and difference in Gibbs free energy (r² = 0.663).

The results of analysing the effect of mismatch position on the DINAMelt simulations showed much less similarity to the experimental data (see figure 6). Rather than stability continuing to fall as the mismatch is moved further from the end of the probe, a plateau is reached after around 6 bases.
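The correspondence drawn above between the free-energy difference and the log2 signal ratio can be motivated with a short derivation. It is a sketch under stated assumptions rather than a result from the poster: hybridisation is treated as a two-state equilibrium, signal intensity I is taken to be proportional to the equilibrium constant K (that is, far from probe saturation), the reference target is the perfect match (PM) and the test target carries the mismatch (MM). All symbols are introduced here for illustration.

```latex
\begin{align}
K &= \exp\!\left(-\frac{\Delta G}{RT}\right), \\
\log_2\frac{I_{\mathrm{ref}}}{I_{\mathrm{test}}}
  &\approx \log_2\frac{K_{\mathrm{PM}}}{K_{\mathrm{MM}}}
  = \frac{\Delta G_{\mathrm{MM}} - \Delta G_{\mathrm{PM}}}{RT\,\ln 2}.
\end{align}
```

Under these assumptions the two quantities agree up to the scale factor RT ln 2, which is roughly 0.47 kcal/mol at 65 °C, so correlations and trends can be compared directly even though absolute values differ.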
Figure 1: Mean log2 signal ratio for each number of mismatches.
Figure 5: Mean difference in Gibbs free energy for each number of mismatches.
Figure 2: Mean log2 signal ratio for each mismatch position (measured from the end of the probe), only for probe-target pairs containing 1 mismatch.
Figure 6: Mean difference in Gibbs free energy for each mismatch position (measured from the end of the probe), only for pairs with 1 mismatch.

Possible reason why the position effect was not reproduced by the simulations

There are two main differences between the DINAMelt model and the hybridisation conditions that produced the experimental data used in this analysis.

Firstly, DINAMelt models hybridisations in solution. There are several ways in which this affects the thermodynamics of hybridisation, but factors such as the presence of an array surface or probe density are unlikely to lead to the observed dependence on mismatch position and length of perfect match.

Secondly, DINAMelt uses parameters derived from melting experiments. These largely used very short nucleic acids and were carried out at much lower temperatures than the 65 °C used for the Agilent long oligonucleotide microarray hybridisations. At higher temperatures, entropy and the range of many possible partially-bound duplex configurations will make a much greater contribution to the total energy of the duplex. If a mismatch lies within an unbound section of the duplex, it will have no effect on the duplex stability. If the majority of duplex configurations are fully-bound or have internal loops (see figure 7a), the mismatch position does not alter the likelihood that it will affect duplex stability. However, if the majority of configurations are partially-bound and melt from the ends (see figures 7b and 7c), mismatches nearer the ends will be less likely to affect duplex stability. This will lead to a greater effect from mismatches near the middle of the probe, as observed in the experimental data.
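The end-melting argument above can be illustrated with a toy partition function. This is a deliberately simplified sketch and not DINAMelt or any published model. It assumes that every bound state is a single contiguous stretch of base pairs (the duplex frays only from its two ends), that each matched base pair contributes the same statistical weight, and that a mismatch inside the bound stretch contributes a smaller weight. The function names and the numerical weights are hypothetical.

```python
import math


def binding_partition_function(n, mismatches=(), match_weight=1.05, mismatch_weight=0.2):
    """Sum Boltzmann weights over all states with one contiguous bound stretch i..j.

    n: probe length in bases.
    mismatches: 0-based probe offsets that face a mismatched target base.
    match_weight / mismatch_weight: per-position statistical weights (hypothetical values).
    """
    weights = [mismatch_weight if k in mismatches else match_weight for k in range(n)]
    total = 0.0
    for i in range(n):          # first bound position
        stretch = 1.0
        for j in range(i, n):   # last bound position
            stretch *= weights[j]
            total += stretch    # weight of the state bound from i to j
    return total


def predicted_log2_ratio(n, mismatches, **weights):
    """log2 of perfect-match over mismatched binding, a stand-in for the log2 signal ratio."""
    z_perfect = binding_partition_function(n, (), **weights)
    z_mismatch = binding_partition_function(n, mismatches, **weights)
    return math.log2(z_perfect / z_mismatch)


# Predicted effect of a single mismatch at each offset of a 60mer.
position_profile = [predicted_log2_ratio(60, (p,)) for p in range(60)]
```

With a match weight close to 1 (weakly stable base pairs, as might be expected near the melting temperature), the predicted ratio keeps rising as the mismatch moves towards the centre of the probe. With a much larger match weight the fully bound state dominates and the profile flattens within a few bases of the end, which is qualitatively the plateau behaviour described for the simulations.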
The end-melting configurations (figures 7b and 7c) are the basis for a new model of hybridisation that is being developed. This model is an extension of the Poland-Scheraga model that restricts the partition function to states where the duplex opens only from the two ends. In initial testing, the model successfully replicates the position-dependent effect of mismatches observed in the experimental results.

Figure 3: Mean log2 signal ratio for probe-target pairs with 1 mismatch compared to pairs with 2 mismatches that contain the same length of perfect match.
Figure 7: Possible probe-target binding configurations. (a) Duplex with internal loops. Mismatch position does not alter the likelihood of the mismatch lying within a loop, and so of affecting duplex stability. (b) Duplex with terminal unclosed loop or dangling end, and mismatch near the end. (c) As 7b, but the mismatch is further from the end and so less likely to lie within an unbound section and more likely to affect duplex stability.
Figure 4: Mean log2 signal ratio for each substitution type.

Mismatch position explains more log2 signal ratio variation than polymorphism type

Mean log2 signal ratio was plotted for each type of substitution (see figure 4). All substitutions were associated with increased log2 signal ratio, and the largest effect was seen for changes from pyrimidines to a G. A two-way ANOVA was performed to compare the scale of the polymorphism type effect and the position effect. The majority of the variation in log2 signal ratio (94.4%) was explained by neither factor. However, both factors were significant and mismatch position explained over 5 times as much variation as polymorphism type.

Conclusion

Mismatches affect results from long oligonucleotide probes. The effect depends on the mismatch position and, for small numbers of mismatches, on the maximum length of perfect match between probe and target. These observations have implications for data analysis and probe design.
They have informed the initial design of a new model of nucleic acid hybridisation that is currently being developed and that successfully replicates the qualitative aspects of these results.

Acknowledgements

Thanks to Tara Hill (Agilent) and Leanne Wardlesworth (University of Manchester Core Services Unit) for excellent technical assistance. This research was partly funded by the Wellcome Trust and by BBSRC.