Effect of Single Nucleotide Polymorphism in Affymetrix probes Olivia Sanchez-Graillet Departments of Biological Sciences and Mathematical Sciences University.


1 Effect of Single Nucleotide Polymorphism in Affymetrix probes
Olivia Sanchez-Graillet
Departments of Biological Sciences and Mathematical Sciences, University of Essex (UK)
December 2008

2 Single Nucleotide Polymorphisms (SNPs)
SNP: a single base pair differs between one individual and another.
Polymorphism: a variation for which at least two variants have frequencies > 1% in a population.
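The 1% criterion on this slide can be written as a small check. This is only an illustrative sketch: the function name `is_polymorphic` and the dictionary of allele frequencies are hypothetical and do not come from the presentation.

```python
def is_polymorphic(allele_freqs, min_freq=0.01):
    """Return True if at least two variants have frequency > min_freq.

    allele_freqs: hypothetical mapping of allele -> population frequency.
    min_freq: the 1% threshold quoted on the slide.
    """
    common = [allele for allele, freq in allele_freqs.items() if freq > min_freq]
    return len(common) >= 2


# Two variants above 1%: counts as a polymorphism.
common_site = is_polymorphic({"C": 0.97, "T": 0.03})
# The minor variant is below 1%: does not count.
rare_site = is_polymorphic({"C": 0.995, "T": 0.005})
```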

3 SNPs are the most common type of sequence variation between individuals. SNPs are markers of phenotypes and diseases. SNPs may alter gene expression and may or may not change the amino acid sequence.

4 Other common variations:
DIP: deletion/insertion polymorphism, e.g. -/T, C/-
STR: short tandem repeat (microsatellite) polymorphism, e.g. (CA)19/20/21/22/23/24/25/26
MIXED: cluster containing submissions from 2 or more allelic classes, e.g. -/AAA/AAAAA/AAAACCAAAAAAAAAAAAAAA
MNP: multiple nucleotide polymorphism with alleles of common length > 1, e.g. AAA/CCC

5 We are studying the relationships between probe intensities on Affymetrix GeneChips. Affymetrix GeneChips contain thousands of probes.

6 Probes map to different exons. Because of alternative splicing, some of the exons may be upregulated whereas others may be downregulated. We therefore focus on probes within exons.

7 Probes mapping to the same exon should behave similarly. What causes Affymetrix probes to behave as outliers with respect to other probes within a single exon?
Objective:
• Study the impact of SNPs and other common variation upon Affymetrix probes on GeneChips.
• Explore whether the existence of a SNP causes a probe to behave differently from other probes which map uniquely to a single exon.

8 Previous research on how SNPs might affect gene expression:
• Allele A is over-expressed compared to allele B, or vice versa, or both alleles are equally expressed (Kumari et al., 2007).
• Hybridization differences resulting from variation might mislead the interpretation of data from individual genes, even if a single probe is affected (Alberts et al., 2007).
• In 15 of 25 probesets, SNPs caused a difference in hybridization. Not every SNP causes a difference in hybridization (Alberts et al., 2007).
• When a SNP is located at the very beginning or end of a probe, it might have little or no effect on hybridization (Hughes et al., 2001).

9 Method:
A) Generation of exon heatmaps.
B) Identification of probes containing SNPs.
C) Study of SNP-probes which are outliers.

10 (A) Generation of exon heatmaps
1. CEL files are downloaded from the GEO database.
2. Calibration of microarray data: quality control (detection of spatial flaws) and Row Quantile Normalisation.
3. Correlate the intensities for groups of probes, using many thousands of GeneChip experiments.

11 Example flaw in CEL file. Source: W. B. Langdon et al. (2008). A Survey of Spatial Defects in Homo Sapiens Affymetrix GeneChips. IEEE/ACM Transactions on Computational Biology and Bioinformatics.

12 Probe correlations: the correlation in log intensities between Probe 9 and Probe 11 from probeset 208772_at, obtained from 5,638 HG-U133A GeneChips.

13 [Figure: exon heatmap.] The number in each square is the correlation multiplied by 10 and rounded. Blue = low correlation; yellow = high correlation. Annotations on the heatmap: probe number, relative probe position on exon, average intensity in GEO, standard deviation in GEO.
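As a rough sketch of how the numbers in such a heatmap could be produced, the snippet below correlates log intensities between probes and scales the result by 10. It assumes Pearson correlation and base-2 logarithms, neither of which is stated on the slides; the function name `heatmap_values` and the synthetic intensities are illustrative only.

```python
import numpy as np


def heatmap_values(intensities):
    """Correlate log intensities between probes and scale to heatmap cells.

    intensities: array of shape (n_chips, n_probes) with positive raw values.
    Returns an integer matrix: correlation multiplied by 10 and rounded.
    Assumption: Pearson correlation of log2 intensities.
    """
    log_int = np.log2(np.asarray(intensities, dtype=float))
    corr = np.corrcoef(log_int, rowvar=False)  # columns are probes
    return np.rint(corr * 10).astype(int)


# Synthetic data: probes 0 and 1 follow a shared signal, probe 2 does not.
rng = np.random.default_rng(0)
signal = rng.normal(8.0, 1.0, size=500)
probes = np.column_stack([
    2 ** (signal + rng.normal(0, 0.1, 500)),
    2 ** (signal + rng.normal(0, 0.1, 500)),
    2 ** rng.normal(8.0, 1.0, 500),
])
cells = heatmap_values(probes)
```

In this toy data the first two probes share a high cell value while the third stays near zero, which is the kind of pattern the heatmap colours are meant to expose.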

14 4. Unique mappings (alignments) of probes to individual exons (Sanchez-Graillet et al., 2008. Widespread existence of uncorrelated probe intensities from within the same probeset on Affymetrix GeneChips. Journal of Integrative Bioinformatics, 5(2):98):
• Avoid cross-hybridization and multiple targeting.
• Sense direction only (antisense is avoided).
[Diagram labels: probe 1, exon 1, transcript 1 (25 bases, 100% identity); probe 2, exon 2, transcript 2 (25 bases, 100% identity); probe 2, exon 3, transcript 3 (25 bases, 96% identity), marked X.]

15 (B,C) Identification of probes containing SNPs and outlier SNP probes

16 1. SNP data downloaded from Ensembl 48: 3' untranslated region (3' UTR), 5' UTR, and coding regions.
[Diagram: chromosome 10 with Gene 1 and Gene 2 and transcripts 1 to 3, annotated 5' upstream, 5', 3', 3' UTR and 3' downstream.]
Example record:
snp_id: rs11000776
chrom_name: 10
chrom_position: 75213225
allele: G/A
biotype: 3downstream, 3utr, 5upstream
trans_id: ENST00000372837, ENST00000372833, ENST00000391642
gene_id: ENSG00000172586, ENSG00000212959

17 2. Identification of exons with SNPs by using transcript information and chromosomal positions.
3. Selection of unique exons and probes:
• Only unique exons with more than 4 probes.
• SNP positions on the probes uniquely mapping to exons are obtained.

18 4. Identification of SNP-probes which are outliers:
• The overall correlation matrix median (OMM) is compared with each SNP-probe median (SPM).
• If OMM - SPM >= 0.15, the SNP-probe is treated as an outlier.

19 Example with OMM = 0.87:
Probe   SPM    Difference (OMM - SPM)   Result
SPM_8   0.84   0.03 < 0.15              SNP in a non-outlier probe
SPM_9   0.21   0.66 > 0.15              SNP in an outlier probe
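The outlier rule can be sketched in a few lines. The slides do not say exactly which entries go into each median, so this sketch assumes OMM is the median of all off-diagonal correlations for the exon and SPM is the median of one probe's correlations with the other probes. The function name `snp_probe_outliers` and the toy matrix are hypothetical.

```python
import numpy as np

THRESHOLD = 0.15  # threshold quoted on the slides


def snp_probe_outliers(corr, snp_probes, threshold=THRESHOLD):
    """Flag SNP-probes whose median correlation falls well below the exon's.

    corr: square probe-by-probe correlation matrix for one exon.
    snp_probes: indices of probes that contain a SNP.
    Assumption: medians exclude the diagonal (self-correlation).
    """
    corr = np.asarray(corr, dtype=float)
    off_diagonal = ~np.eye(corr.shape[0], dtype=bool)
    omm = float(np.median(corr[off_diagonal]))
    flags = {}
    for probe in snp_probes:
        spm = float(np.median(np.delete(corr[probe], probe)))
        flags[probe] = bool(omm - spm >= threshold)
    return omm, flags


# Toy exon: probes 0 to 3 agree with each other, probe 4 does not.
corr = np.full((5, 5), 0.9)
corr[4, :] = corr[:, 4] = 0.2
np.fill_diagonal(corr, 1.0)
omm, flags = snp_probe_outliers(corr, snp_probes=[0, 4])
```

Here probe 4 is flagged because its median correlation (0.2) sits 0.7 below the overall median (0.9), while probe 0 is not.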

20 Results

21 ENSE00001454795, HG_U133_Plus_2. Outlier (O) / non-outlier (N) labels shown on the heatmap: O N N N N O N N N N

22 ENSE00001191156, HG_U95A: SNP in overlapped probes. The same SNP is in outlier probes and non-outlier probes.
probe_position_heatmap  probe_id           snp_id      snp_position  allele  sequence                   outlier
10                      1045_s_at-109-625  rs45612038  14            T/C     CTTCAAGAGCATCATGAAGAAGAGT  O
9                       1045_s_at-237-557  rs45612038  16            T/C     ACCTTCAAGAGCATCATGAAGAAGA  O
8                       1045_s_at-357-497  rs45612038  18            T/C     AGACCTTCAAGAGCATCATGAAGAA  N
7                       1045_s_at-586-137  rs45612038  20            T/C     TGAGACCTTCAAGAGCATCATGAAG  N
6                       1045_s_at-233-503  rs45612038  23            T/C     ATATGAGACCTTCAAGAGCATCATG  N
5                       1045_s_at-153-611  rs45612038  25            T/C     ACATATGAGACCTTCAAGAGCATCA  N

23 ENSE0000129003, HG_U133A: SNPs in only non-outlier probes.
snp_id     probe_id             probe_position_heatmap  snp_position_probe  allele  seq                        outlier
rs11038    221667_s_at-512-441  10                      13                  A/G     GTTTATGATCTGACCTAGGTCCCCC  N
rs6413487  221667_s_at-570-641  9                       7                   C/G     TAAGGACGCTGGGAGCCTGTCAGTT  N

24 ENSE00001416163, HG_U133A (5,374 CEL files): SNP in only outlier probes.
snp_id   probe_id           probe_position_heatmap  snp_position_probe  allele  sequence                   outlier
rs13505  219768_at-2-233    8                       24                  C/A     CTGAATTTAGATCTCCAGACCCTGC  O
rs13505  219768_at-602-267  9                       4                   C/A     CCTGCCTGGCCACAATTCAAATTAA  O

25 ENSE00001416163, HG_U133_Plus_2 (2,572 CEL files): SNP in both outlier and non-outlier probes.
snp_id   probe_id           probe_position_heatmap  snp_position_probe  allele  sequence                   outlier
rs13505  219768_at-765-395  8                       24                  C/A     CTGAATTTAGATCTCCAGACCCTGC  N
rs13505  219768_at-507-443  9                       4                   C/A     CCTGCCTGGCCACAATTCAAATTAA  O

26 ENSE00001416163, HG_U133A_2 (159 CEL files): SNP in only non-outlier probes.
snp_id   probe_id           probe_position_heatmap  snp_position_probe  allele  sequence                   outlier
rs13505  219768_at-432-225  8                       24                  C/A     CTGAATTTAGATCTCCAGACCCTGC  N
rs13505  219768_at-534-259  9                       4                   C/A     CCTGCCTGGCCACAATTCAAATTAA  N

27 ~60,000 SNPs distributed in unique exons of ten array designs:
• 11% are in unique exons in which all probes that contain the same SNP are outliers.
• 5% are in unique exons in which not all the probes containing the same SNP are outliers.
• 84% are in unique exons in which none of the probes containing the SNP are outliers.
These numbers may vary according to the Ensembl version used and the threshold chosen for outliers.
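The three categories above can be illustrated with the rs13505 rows from slides 24 to 26. This is a minimal sketch, not the authors' pipeline: it groups by (array design, SNP) rather than by exon for brevity, and the function name `classify_snps` and the category labels are made up for the example.

```python
from collections import defaultdict


def classify_snps(records):
    """Group outlier flags per (array, snp_id) and label each group.

    records: iterable of (array, snp_id, flag) with flag 'O' (outlier)
    or 'N' (non-outlier), as in the result tables.
    """
    seen = defaultdict(set)
    for array, snp_id, flag in records:
        if flag not in ("O", "N"):
            raise ValueError(f"unexpected flag: {flag!r}")
        seen[(array, snp_id)].add(flag)
    labels = {}
    for key, flags in seen.items():
        if flags == {"O"}:
            labels[key] = "all outliers"
        elif flags == {"N"}:
            labels[key] = "no outliers"
        else:
            labels[key] = "mixed"
    return labels


# Flags for rs13505 as listed for exon ENSE00001416163 on three arrays.
records = [
    ("HG_U133A", "rs13505", "O"), ("HG_U133A", "rs13505", "O"),
    ("HG_U133_Plus_2", "rs13505", "N"), ("HG_U133_Plus_2", "rs13505", "O"),
    ("HG_U133A_2", "rs13505", "N"), ("HG_U133A_2", "rs13505", "N"),
]
labels = classify_snps(records)
```

The same SNP lands in a different category on each array design, which matches the caveat that the percentages depend on the data and threshold used.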

28 The most frequent variation found was SNP, followed by indels and mixed variation. The most common allele was C/T. If more than one SNP-probe maps to the same exon, the probes may have partially or totally overlapping sequences.
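To make the overlap between SNP-probes concrete, the sketch below measures how many bases two probe sequences share end to start. The helper `suffix_prefix_overlap` is hypothetical; the two sequences are probes 9 and 10 of 1045_s_at from slide 22.

```python
def suffix_prefix_overlap(left, right):
    """Length of the longest suffix of `left` that is also a prefix of `right`."""
    for size in range(min(len(left), len(right)), 0, -1):
        if left.endswith(right[:size]):
            return size
    return 0


# Probes 9 and 10 of probeset 1045_s_at (HG_U95A), both 25 bases long.
probe_9 = "ACCTTCAAGAGCATCATGAAGAAGA"
probe_10 = "CTTCAAGAGCATCATGAAGAAGAGT"
overlap = suffix_prefix_overlap(probe_9, probe_10)  # 23 shared bases
```

A 23-base overlap between 25-base probes means the two probes are offset by only two bases, consistent with the reported SNP positions (16 and 14) differing by two.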

29 Cross-validation for HG_U133_Plus_2 Examination of SNP-Outlier Associations

30 Median differences and positions of SNPs on probes in HG_U133_Plus_2

31 Median differences and main alleles (A,C,T,G) found in SNPs in HG_U133_Plus_2

32 We have identified other causes of outlier probes:
• Probes containing a contiguous run of 4 or more guanines: formation of G-quadruplexes occurring on the surface of a GeneChip (Upton et al., BMC Genomics, in press).
• Probes located next to bright probes, such as at the edge of the GeneChip, are affected by blur.
• Motifs or any other “problematic” subsequences.

33 Outlier SNP-probes in HG_U133_Plus_2 with “problematic” subsequences (PS): runs of G (>= 4), CCTCC, CCACC, GGTGG.
[Figure: two panels, outlier probes and non-outlier probes, each showing the PS categories G runs, CCTCC, CCACC, GGTGG.]
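A probe sequence can be screened for the subsequences listed above with a few regular expressions. This is a minimal sketch; the names `PROBLEMATIC` and `problematic_subsequences` are illustrative, and the test sequences are probe sequences quoted elsewhere in the slides.

```python
import re

# Patterns taken from the slide: runs of 4 or more G, CCTCC, CCACC, GGTGG.
PROBLEMATIC = {
    "G-run (>=4)": re.compile(r"G{4,}"),
    "CCTCC": re.compile(r"CCTCC"),
    "CCACC": re.compile(r"CCACC"),
    "GGTGG": re.compile(r"GGTGG"),
}


def problematic_subsequences(sequence):
    """Return the names of the listed subsequences found in a probe sequence."""
    sequence = sequence.upper()
    return [name for name, pattern in PROBLEMATIC.items()
            if pattern.search(sequence)]


# Probe 894_g_at-453-277 (backup slide) contains CCTCC.
hits = problematic_subsequences("TAGTGGGCTCTCTTCCTCCTTCCAC")
# Probe 221667_s_at-512-441 contains none of the listed subsequences.
clean = problematic_subsequences("GTTTATGATCTGACCTAGGTCCCCC")
```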

34 Conclusions

35 We have not found a common behaviour when SNPs are present in a probe. SNPs do not seem to cause outliers in groups of probes representing individual exons. SNPs may influence other biological events such as alternative poly(A). The genomic region where SNPs are found, the position of the SNP in a probe, the main allele, and the number of SNPs in a probe do not make a probe an outlier in the correlation heatmap.

36 Bioinformatics Group
Dr Andrew Harrison (Physics)
Dr Berthold Lausen (Statistics)
Dr Abdel Salhi (Mathematics)
Professor Graham Upton (Statistics)
Dr William Langdon (Physics and Computer Sc.)
Dr Olivia Sanchez (Computer Sc.)
Dr Maria Stalteri (Inorganic Chemistry & Bioinformatics)
Jose Arteaga-Salas (Statistics)
Rohmatul Fajriyah (Statistics)
Abdelhak Kheniche (Pharmacology & Mathematics)
Rahim Bux Khokhar (Mathematics)
Zain-Ul-Abdin Khurho (Mathematics)
Farhat Memon (Computer Sc.)
Joanna Rowsell (Mathematics)

37 Thank you!

38 ENSE00001187103, HG_U95Av2: probes with several SNPs.
snp_id      probe_id          probe_position_heatmap  snp_position_probe  allele  sequence
rs3813031   894_g_at-144-579  7                       2                   C/T     TGCGGCGGCTGTAGTGGGCTCTCTT
rs17842533  894_g_at-144-579  7                       19                  C/T     TGCGGCGGCTGTAGTGGGCTCTCTT
rs1058994   894_g_at-144-579  7                       21                  G/A     TGCGGCGGCTGTAGTGGGCTCTCTT
rs14658     894_g_at-144-579  7                       24                  T/C     TGCGGCGGCTGTAGTGGGCTCTCTT
rs17842533  40619_at-143-579  8                       17                  C/T     CGGCGGCTGTAGTGGGCTCTCTTCC
rs1058994   40619_at-143-579  8                       19                  G/A     CGGCGGCTGTAGTGGGCTCTCTTCC
rs14658     40619_at-143-579  8                       22                  T/C     CGGCGGCTGTAGTGGGCTCTCTTCC
rs17842533  894_g_at-453-277  9                       8                   C/T     TAGTGGGCTCTCTTCCTCCTTCCAC
rs1058994   894_g_at-453-277  9                       10                  G/A     TAGTGGGCTCTCTTCCTCCTTCCAC
rs14658     894_g_at-453-277  9                       13                  T/C     TAGTGGGCTCTCTTCCTCCTTCCAC
rs11538407  894_g_at-453-277  9                       20                  T/C     TAGTGGGCTCTCTTCCTCCTTCCAC
rs1064555   894_g_at-453-277  9                       24                  T/G     TAGTGGGCTCTCTTCCTCCTTCCAC

39 Adjacent probes within a cell on a GeneChip have the same sequence – a run of Guanines will result in closely packed DNA with just the right properties to form quadruplexes.

