Rennie, C¹; Noyes, HA²; Kemp, SJ²; Hulme, H¹; Brass, A¹,³; Hoyle, DC⁴

¹ Faculty of Life Sciences, University of Manchester, Smith Building, Oxford Road, Manchester, M13 9PT, UK
² Biosciences Building, School of Biological Sciences, University of Liverpool, Crown Street, Liverpool, L69 7ZB, UK
³ School of Computer Science, Kilburn Building, University of Manchester, Oxford Road, Manchester, M13 9PL, UK
⁴ North West Institute of Bio-Health Informatics, School of Medicine, Stopford Building, Oxford Road, Manchester, M13 9PT, UK

Mismatches between probe and target sequences have a strong position-dependent effect on signal ratios from aCGH using 60mer oligonucleotide microarrays

Abstract

Sequence mismatches between probe and fluorescently-labelled target strands are known to affect the stability of the probe-target duplex formed, and hence the strength of the observed fluorescent signals in microarray experiments. However, the exact effects of sequence mismatches on microarray hybridisations are not well characterised. Array-based Comparative Genomic Hybridisation (aCGH) is a common technique for identifying DNA copy number variations. aCGH data are particularly suitable for analysing the effects of probe-target sequence mismatches because using genomic DNA avoids complications due to the large ranges of intracellular mRNA levels. A previous study provided data for hybridisations comparing three mouse strains to a C57BL/6 reference, using Agilent 60mer oligonucleotide arrays. Sequence mismatches between targets and probes were identified using the Perlegen 8 million mouse SNP dataset, and their effect on log2 signal ratio between test and reference strains was assessed. Observations indicated a strong effect of sequence mismatches on log2 signal ratio, dependent on the number of mismatches and on their position relative to the probe sequence.
Ratios for probes with 1 mismatch and with 2 mismatches were strongly correlated when probes were matched on the maximum length of perfectly matched sequence between probe and target. An existing model of nucleic acid melting (DINAMelt) was tested, but its predictions did not reproduce the position-dependent effect. A new computational model intended to reproduce these findings is under development.

Background

Nucleic acid hybridisation is the formation of a double helix from two single strands by complementary base-pairing. It is the basis for many key biological techniques; an obvious example is microarrays, which rely on hybridisation of targets to probes attached to a surface. Sequence mismatches between probe and target are often present, for example in cross-species hybridisation or because of ordinary variation between strains, breeds or individuals. Mismatches are known to have a strong effect on results from short oligonucleotide probes and a smaller effect on cDNA probes.

Experimental dataset

Log2 signal ratios (see equation 1) were obtained from hybridisations of gDNA from three mouse test strains against a C57BL/6 reference, using Agilent 244K whole mouse genome and 56K custom CGH array platforms. The probe sequences were compared with the Perlegen mouse SNP dataset to identify SNP loci that would cause sequence mismatches between the probes and the test strain targets. In total, … probes on the whole genome array and 3710 probes on the custom array overlapped one or more polymorphic loci (see table 1).
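The overlap step described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the `Probe` record, the 1-based inclusive coordinate convention, and the dictionaries standing in for the Perlegen SNP calls are all hypothetical. A SNP locus only counts as a mismatch when the test strain's allele differs from the C57BL/6 reference allele. Offsets are reported along the genomic forward strand; for minus-strand probes they would need reversing, although a position measured from the nearest probe end is unaffected.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass


@dataclass
class Probe:
    """Hypothetical probe record: genomic placement of one array probe."""
    probe_id: str
    chrom: str
    start: int          # 1-based coordinate of the first probe base (assumed convention)
    length: int = 60    # the arrays used here carry 60mer probes


def mismatch_offsets(probe, snp_positions, ref_alleles, strain_alleles):
    """Return 0-based offsets within the probe where the test strain differs from the reference.

    snp_positions:  dict chrom -> sorted list of 1-based SNP coordinates
    ref_alleles:    dict (chrom, pos) -> reference (C57BL/6) base
    strain_alleles: dict (chrom, pos) -> test-strain base (absent if not called)
    """
    positions = snp_positions.get(probe.chrom, [])
    end = probe.start + probe.length - 1  # last base covered by the probe (inclusive)
    # Binary search for the SNP loci falling inside [start, end]
    lo = bisect_left(positions, probe.start)
    hi = bisect_right(positions, end)
    offsets = []
    for pos in positions[lo:hi]:
        key = (probe.chrom, pos)
        alt = strain_alleles.get(key)
        # Count the locus only if the test strain is known to carry a different allele
        if alt is not None and alt != ref_alleles[key]:
            offsets.append(pos - probe.start)
    return offsets
```

For example, a probe starting at chr1:1000 whose test-strain alleles differ from the reference at 1005 and 1059 yields offsets [5, 59]; the number of mismatches for that probe-target pair is then the length of the list.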
Array              Test strain   1 SNP (% of probes)   2 SNPs (% of probes)   3 SNPs (% of probes)
244K whole genome  A/J           8032 (3.41)           803 (0.34)             36 (0.02)
244K whole genome  BALB/cJ       7417 (3.15)           724 (0.31)             45 (0.02)
244K whole genome  129P3/J       8106 (3.44)           868 (0.37)             41 (0.02)
244K whole genome  All strains   (5.94)                1546 (0.66)            80 (0.03)
56K custom         A/J           1343 (2.51)           120 (0.22)             5 (0.01)
56K custom         BALB/cJ       1834 (3.43)           178 (0.33)             8 (0.01)
56K custom         129P3/J       2273 (4.25)           233 (0.44)             11 (0.02)
56K custom         All strains   5199 (9.71)           536 (1.00)             23 (0.04)

Table 1. Number of probes in each hybridisation overlapping 1, 2 or 3 SNP loci that would cause a mismatch in the probe-target duplex (% of probes in parentheses). Two further probes overlapped 4 SNP loci, but these were omitted from the analysis.

Equation 1. Log2 signal ratio. A higher ratio indicates lower intensity for the test strain (and hence possible destabilisation of the duplex between the probe and the test strain target).

Key observations from experimental data

The mean log2 signal ratio for each number of mismatches was plotted for each hybridisation (see figure 1). Larger numbers of mismatches were associated with higher mean log2 signal ratios, and there was a strong correlation between the number of known mismatches and log2 signal ratio (r² = 0.94), indicating that mismatches do affect results from long oligonucleotide probes.

For probe-target pairs with 1 mismatch, the mean log2 signal ratio was plotted for each possible mismatch position, measured from the nearest end of the probe (see figure 2). Mismatches further from the end of the probe were associated with higher mean log2 signal ratios, and there was a strong correlation between mismatch position and log2 signal ratio (r² = 0.92). Moving a mismatch towards the centre of the probe reduces the maximum length of continuous complementary duplex that can be formed, so this length of perfect match may be a contributing factor.
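The per-probe quantities used in these comparisons can be sketched as below. Equation 1 itself is not reproduced in this transcript; from its caption, the sketch assumes the form log2(reference intensity / test intensity), so that a weaker test-strain signal gives a higher ratio. Counting position from 1 at either terminal base is likewise an assumption, and all function names are hypothetical.

```python
import math
from collections import defaultdict
from statistics import mean

PROBE_LENGTH = 60


def log2_ratio(reference_intensity, test_intensity):
    # Assumed form of Equation 1: log2(reference / test).
    # A weaker test-strain signal therefore gives a higher ratio.
    return math.log2(reference_intensity / test_intensity)


def position_from_nearest_end(offset, length=PROBE_LENGTH):
    # offset is 0-based; a mismatch at either terminal base is position 1 (assumed convention)
    return min(offset + 1, length - offset)


def longest_perfect_match(offsets, length=PROBE_LENGTH):
    # Longest run of consecutive probe bases that contains no mismatch
    best, prev = 0, -1
    for off in sorted(offsets):
        best = max(best, off - prev - 1)  # gap before this mismatch
        prev = off
    return max(best, length - prev - 1)   # gap after the last mismatch


def mean_ratio_by(key_fn, records):
    # records: iterable of (mismatch_offsets, log2_ratio); groups by key_fn(offsets)
    groups = defaultdict(list)
    for offsets, ratio in records:
        groups[key_fn(offsets)].append(ratio)
    return {key: mean(values) for key, values in sorted(groups.items())}
```

With this layout, `mean_ratio_by(len, records)` gives per-mismatch-count means of the kind shown in figure 1, keying single-mismatch records on `position_from_nearest_end(offsets[0])` gives a position profile like figure 2, and keying on `longest_perfect_match` groups pairs by length of perfect match.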
When the mean log2 signal ratios for probe-target pairs with 1 mismatch were compared with those for pairs with 2 mismatches and the same length of perfect match (see figure 3), the two were correlated (Pearson's correlation coefficient 0.65, r² = 0.43), indicating that length of perfect match accounts for approximately 43% of the variance in log2 signal ratio.

Attempts to replicate experimental observations with DINAMelt simulations

Simulations of hybridising a 60mer probe to perfectly matched and mismatched targets were carried out using DINAMelt, an existing model of nucleic acid hybridisation. The difference in Gibbs free energy between a perfectly matched probe-target duplex and one with mismatches, as predicted by DINAMelt, was taken as the approximate counterpart of the log2 signal ratio in the experimental data. For each simulation, the actual probe sequences from the experimental data were used, and perfectly matched or mismatched targets were generated.

DINAMelt appeared to replicate the effect of the number of mismatches (see figure 5): a larger number of mismatches led to lower predicted thermodynamic stability of the duplex, consistent with the experimental results, and there was a correlation between the number of mismatches and the difference in Gibbs free energy (r² = 0.663). However, the effect of mismatch position in the DINAMelt simulations showed much less similarity to the experimental data (see figure 6). Rather than stability decreasing steadily as the mismatch moves further from the end of the probe, the predicted effect reaches a plateau after around 6 bases.
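The matched comparison behind figure 3, and the r² values quoted above, can be illustrated with a plain Pearson correlation over group means that share a key (for figure 3, the longest perfect-match length). This is a sketch with hypothetical names; restricting the comparison to lengths observed for both 1- and 2-mismatch pairs is an assumption about how the pairing was done. Note that r² is simply the square of Pearson's r (0.65² ≈ 0.42, so the reported 0.43 presumably comes from an unrounded r).

```python
import math


def pearson_r(xs, ys):
    # Sample Pearson correlation coefficient of two equal-length series
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)


def matched_correlation(means_a, means_b):
    """Pearson r and r^2 between two dicts of group means, using only keys present in both.

    e.g. means_a = mean log2 ratio of 1-mismatch pairs keyed by longest perfect match,
         means_b = the same for 2-mismatch pairs (hypothetical inputs)
    """
    shared = sorted(set(means_a) & set(means_b))
    xs = [means_a[k] for k in shared]
    ys = [means_b[k] for k in shared]
    r = pearson_r(xs, ys)
    return r, r * r
```

Keys present in only one dictionary (for instance, perfect-match lengths that cannot occur with 2 mismatches) are silently dropped, which keeps the two series aligned point by point.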
Figure 1. Mean log2 signal ratio for each number of mismatches.
Figure 5. Mean difference in Gibbs free energy for each number of mismatches.
Figure 2. Mean log2 signal ratio for each mismatch position (measured from the end of the probe), only for probe-target pairs containing 1 mismatch.
Figure 6. Mean difference in Gibbs free energy for each mismatch position (measured from the end of the probe), only for pairs with 1 mismatch.

Possible reason why the position effect was not reproduced by the simulations

There are two main differences between the DINAMelt model and the hybridisation conditions that produced the experimental data used in this analysis.

Firstly, DINAMelt models hybridisation in solution. This affects the thermodynamics of hybridisation in several ways, but factors such as the presence of an array surface or probe density are unlikely to produce the observed dependence on mismatch position and length of perfect match.

Secondly, DINAMelt uses parameters derived from melting experiments. These largely used very short nucleic acids and were carried out at much lower temperatures than the 65 °C used for the Agilent long oligonucleotide microarray hybridisations. At higher temperatures, entropy and the large number of possible partially-bound duplex configurations are expected to make a much greater contribution to the free energy of the duplex.

If a mismatch lies within an unbound section of the duplex, it has no effect on duplex stability. If most duplex configurations are fully bound or have internal loops (see figure 7a), the mismatch position does not alter the likelihood that it will affect duplex stability. However, if most configurations are partially bound and melt from the ends (see figures 7b and 7c), mismatches nearer the ends will be less likely to affect duplex stability. This would lead to a greater effect from mismatches near the middle of the probe, as observed in the experimental data.
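The argument above can be made concrete with a toy zipper-style calculation in which the duplex may only open from its two ends, in the spirit of the end-restricted model described in the next paragraph but not the authors' implementation. Every parameter is hypothetical: a single sequence-independent weight `s` per bound base pair, a single penalty factor for a mismatch that falls inside the bound section, and no loop, nucleation or strand-association terms. With `s` close to 1, mimicking conditions near the melting transition where many partially-bound configurations contribute, the destabilisation grows steadily towards the probe centre; with a large `s` the fully bound state dominates and the effect levels off within a few bases, resembling the DINAMelt plateau.

```python
import math


def end_fraying_states(length):
    # (a, b): a unbound bases at the left end, b at the right end; at least one base stays paired
    for a in range(length):
        for b in range(length - a):
            yield a, b


def partition_function(length, s, mismatch=None, penalty=1.0):
    """Zipper-style partition function: the duplex may only open from its two ends.

    s:        hypothetical statistical weight of one bound base pair (s = exp(-dG_bp / kT))
    mismatch: 0-based probe offset of a single mismatch, or None for a perfect match
    penalty:  hypothetical weight factor (< 1) applied when the mismatch lies in the bound section
    """
    z = 0.0
    for a, b in end_fraying_states(length):
        bound = length - a - b          # bound bases occupy offsets a .. length - b - 1
        weight = s ** bound
        if mismatch is not None and a <= mismatch < length - b:
            weight *= penalty           # mismatch sits inside the paired section
        z += weight
    return z


def mismatch_free_energy(length, s, offset, penalty):
    # Destabilisation in units of kT: -ln(Z_mismatch / Z_perfect); larger means less stable
    return -math.log(partition_function(length, s, offset, penalty) /
                     partition_function(length, s))
```

For a 60mer, `[mismatch_free_energy(60, 1.0, p, 0.5) for p in range(30)]` rises at every step from the end to the centre, whereas the same profile with `s = 3.0` is essentially flat beyond the first few bases.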
The partially-bound configurations described above are the basis for a new model of hybridisation that is being developed. This model is an extension of the Poland-Scheraga model that restricts the partition function to states in which the duplex opens only from the two ends. In initial testing, the model successfully replicates the position-dependent effect of mismatches observed in the experimental results.

Figure 3. Mean log2 signal ratio for probe-target pairs with 1 mismatch compared with pairs with 2 mismatches that contain the same length of perfect match.
Figure 7. Possible probe-target binding configurations. (a) Duplex with internal loops: mismatch position does not alter the likelihood of lying within a loop and so affecting duplex stability. (b) Duplex with a terminal unclosed loop or dangling end, and a mismatch near the end. (c) As 7b, but the mismatch is further from the end and so is less likely to lie within an unbound section and more likely to affect duplex stability.
Figure 4. Mean log2 signal ratio for each substitution type.

Mismatch position explains more log2 signal ratio variation than polymorphism type

The mean log2 signal ratio was plotted for each type of substitution (see figure 4). All substitutions were associated with increased log2 signal ratio, and the largest effect was seen for changes from a pyrimidine to G. A two-way ANOVA was performed to compare the sizes of the polymorphism type effect and the position effect. Most of the variation in log2 signal ratio (94.4%) was explained by neither factor. However, both factors were significant, and mismatch position explained over 5 times as much variation as polymorphism type.

Conclusion

Mismatches affect results from long oligonucleotide probes. The effect depends on the mismatch position and, for small numbers of mismatches, on the maximum length of perfect match between probe and target. These observations have implications for data analysis and probe design.
They have informed the initial design of a new model of nucleic acid hybridisation that is currently being developed and that successfully replicates the qualitative aspects of these results.

Acknowledgements

Thanks to Tara Hill (Agilent) and Leanne Wardlesworth (University of Manchester Core Services Unit) for excellent technical assistance. This research was partly funded by the Wellcome Trust and by BBSRC.
