Gene Co-expression on Microarrays: Fiction or Fact?

Tineke Casneuf (1,2), Yves Van de Peer (1) and Wolfgang Huber (2)

(1) VIB / Ghent University, Bioinformatics & Evolutionary Genomics, Technologiepark 927, B-9052 Gent, Belgium
(2) EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK

Tineke Casneuf, PhD student. European Bioinformatics Institute, June 2007.

Introduction

Microarray co-expression signatures of genes are an important tool for studying gene function and relations between genes. In addition to real biological co-expression, correlated signals can result from technical deficiencies such as hybridisation of probes with off-target transcripts. We investigated the nature and scale of off-target transcript hybridisation in relation to signal correlation, using data from Affymetrix GeneChips.

Acknowledgements

This work was supported by a grant from the Fund for Scientific Research, Flanders (3G031805) and by the European Commission through a Marie Curie Host Fellowship program (MEST-CT BIOSTAR).

Highly correlated pairs returned by our custom-made definition show longer common paths than those returned by Affymetrix' definition. We propose that these correlation relationships result from real biological co-expression rather than from cross-hybridisation. The latter is likely the case for Affymetrix probe sets, as reporters with perfect sequence identity to off-target genes are retained.

Conclusions

We reveal a positive relation between off-target reporter alignment strength and expression correlation. This relation is present even between gene pairs that do not share longer stretches of sequence similarity and where the reporter to off-target alignment is based only on short near-matches. Furthermore, the effect can be observed within probe sets. We show that omitting reporters liable to cross-hybridisation results in biologically more relevant expression relationships.
Applying this finding is essential to enrich for true biological expression correlations, and it ensures that reliable co-expression links are identified for downstream co-expression analyses.

More stringent probe set definitions return biologically more relevant co-expression links

In addition to probe sets defined and annotated by the manufacturer Affymetrix, we evaluated a more stringent custom-made definition, in which probe sets were constructed solely from perfect-matching reporters while excluding the reporters most liable to cross-hybridisation (with a_i = 23 to an off-target).

Reporters with unequal off-target responses

We also studied the behaviour of different reporters within a probe set and found a positive relation between the alignment score a_i of reporter x_i to Y's transcript sequence and the Pearson correlation coefficient of the reporter's signal pattern with the expression pattern of Y. We illustrate this finding with an example:

The summarised expression values of probe set _at, designed to target AT5G04790, and of off-target gene AT1G: ρ_XY = 0.7.

The background-corrected, normalised signal profiles of _at's reporters. The colour of each profile corresponds to its a_i and is explained in the legend.

For each of these reporters, the Pearson correlation coefficient ρ_XiY between its signal profile and that of Y is plotted against its off-target sensitivity score a_i.

These plots demonstrate that _at's reporters show unequal responses to AT1G75180: four of them have perfect sequence identity to this off-target and show a signal profile with ρ > 0.8 to it. This contrasts with the reporters of lower alignment strength. The relation between off-target sensitivity and signal correlation therefore differs between the reporters of a probe set.

Expression correlation and off-target sensitivity

These boxplots depict the expression correlation ρ as a function of off-target sensitivity Q75_XY.
They reveal a positive relation between the two variables: a gene whose expression is measured by reporters that align well to a different transcript tends to have an expression signal that is correlated with that of the other transcript. Figure A shows the data for all probe set pairs; for Figure B, gene pairs with a BLAST hit in at least one direction with an E-value < were omitted.

We compared gene pairs with considerably different correlation coefficients in the two probe set definitions: a first set with a high ρ in Affymetrix' definition and a low ρ in the custom-made definition (blue), and a second set with a high ρ in the custom-made definition and a low ρ in Affymetrix' (orange). This plot shows the cumulative distribution of the lengths of the longest common path down the Biological Process branch of the Gene Ontology tree for the annotations of the gene pairs in both sets.

Illustration of our approach
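
The per-reporter analysis described under "Reporters with unequal off-target responses" can be sketched in a few lines of Python. This is a hypothetical illustration, not the code used for the poster: the function name reporter_correlations, the rescaling of the scores a_i to the range 0 to 1, and the synthetic data are all assumptions made for the example. It computes, for every reporter x_i of a probe set, the Pearson correlation between its signal profile and the expression profile of an off-target gene Y, so that the coefficients can be compared with the alignment scores.

```python
import numpy as np


def reporter_correlations(reporter_signals, off_target_profile):
    """Pearson correlation of each reporter's signal profile with profile Y.

    reporter_signals: array of shape (n_reporters, n_arrays)
    off_target_profile: array of shape (n_arrays,)
    """
    x = np.asarray(reporter_signals, dtype=float)
    y = np.asarray(off_target_profile, dtype=float)
    xc = x - x.mean(axis=1, keepdims=True)  # centre each reporter profile
    yc = y - y.mean()                       # centre the off-target profile
    return (xc @ yc) / (np.linalg.norm(xc, axis=1) * np.linalg.norm(yc))


# Synthetic toy data, NOT taken from the poster.
rng = np.random.default_rng(0)
n_arrays = 50
y_profile = rng.normal(size=n_arrays)        # expression pattern of Y
a = np.array([1.0, 0.9, 0.6, 0.3, 0.0])      # hypothetical scores a_i, rescaled to 0..1
noise = rng.normal(size=(a.size, n_arrays))
# Assumption for the toy model: a higher a_i means more of Y's signal is picked up.
signals = a[:, None] * y_profile + (1.0 - a[:, None]) * noise

rho = reporter_correlations(signals, y_profile)
for a_i, rho_i in zip(a, rho):
    print(f"a_i = {a_i:.1f}  rho = {rho_i:+.2f}")
```

With real data, reporter_signals would hold the background-corrected, normalised reporter intensities across arrays and off_target_profile the summarised expression values of Y. Plotting rho against the alignment scores would then give the kind of scatter plot described for the example probe set.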