Effect of Single Nucleotide Polymorphism in Affymetrix probes Olivia Sanchez-Graillet Departments of Biological Sciences and Mathematical Sciences University.


Effect of Single Nucleotide Polymorphism in Affymetrix probes. Olivia Sanchez-Graillet, Departments of Biological Sciences and Mathematical Sciences, University of Essex (UK). December 2008

Single Nucleotide Polymorphisms (SNPs). SNP: a single base pair differs between one individual and another. Polymorphism: at least two variants have frequencies > 1% in a population.

SNPs are the most common type of sequence variation between individuals. SNPs are markers of phenotypes and diseases. SNPs may alter gene expression and may or may not change the amino acid sequence.

Other common variations:
- DIP: deletion/insertion polymorphism, e.g. -/T, C/-
- STR: short tandem repeat (microsatellite) polymorphism, e.g. (CA)19/20/21/22/23/24/25/26
- MIXED: cluster containing submissions from 2 or more allelic classes, e.g. -/AAA/AAAAA/AAAACCAAAAAAAAAAAAAAA
- MNP: multiple nucleotide polymorphism with alleles of common length > 1, e.g. AAA/CCC

We are studying the relationships between probe intensities on Affymetrix GeneChips. Affymetrix GeneChips contain thousands of probes.

Probes map to different exons. Because of alternative splicing, some of the exons may be upregulated whereas others may be downregulated. We therefore focus on probes within exons.

Probes mapping to the same exon should behave similarly. What causes Affymetrix probes to behave as outliers with respect to other probes within a single exon?
Objective:
- Study the impact of SNPs and other common variation upon Affymetrix probes on GeneChips.
- Explore whether the existence of a SNP causes a probe to behave differently to other probes which map uniquely to a single exon.

Previous research on how SNPs might affect gene expression:
- Allele A is over-expressed compared to allele B, or vice versa, or both alleles are equally expressed (Kumari et al., 2007).
- Hybridization differences resulting from sequence variation might mislead the interpretation of data from individual genes, even if only a single probe is affected (Alberts et al., 2007).
- In 15 of 25 probesets, SNPs caused a difference in hybridization. Not every SNP causes a difference in hybridization (Alberts et al., 2007).
- When a SNP is located at the very beginning or end of a probe, it might have little or no effect on hybridization (Hughes et al., 2001).

Method: (A) Generation of exon heatmaps. (B) Identification of probes containing SNPs. (C) Study of SNP-probes which are outliers.

(A) Generation of exon heatmaps
1. CEL files are downloaded from the GEO database.
2. Calibration of microarray data: quality control (detection of spatial flaws) and Row Quantile Normalisation.
3. Correlate the intensities for groups of probes, using many thousands of GeneChip experiments.
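The calibration step names a quantile normalisation without defining it. As a rough illustration only, the sketch below implements ordinary quantile normalisation across chips in plain Python. The slides say "Row Quantile Normalisation", so this standard variant, the function name `quantile_normalise`, and the toy data are all assumptions and may differ from the procedure actually used.

```python
from statistics import mean


def quantile_normalise(arrays):
    """Force every chip to share the same intensity distribution.

    arrays: list of equal-length lists, one list of probe intensities per chip.
    Assumption: plain quantile normalisation; ties are broken by probe order.
    """
    n = len(arrays[0])
    # Mean intensity at each rank, taken across all chips.
    sorted_chips = [sorted(chip) for chip in arrays]
    rank_means = [mean(chip[i] for chip in sorted_chips) for i in range(n)]

    normalised = []
    for chip in arrays:
        # Probe indices ordered from the dimmest to the brightest probe.
        order = sorted(range(n), key=lambda i: chip[i])
        out = [0.0] * n
        for rank, probe_index in enumerate(order):
            out[probe_index] = rank_means[rank]
        normalised.append(out)
    return normalised


# Two hypothetical chips with three probes each.
chips = [[5, 2, 3], [4, 1, 6]]
result = quantile_normalise(chips)
```

After normalisation each chip contains the same set of values (the rank means), placed according to that chip's own probe ranking.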

Example flaw in CEL file. Source: W. B. Langdon et al. (2008). A Survey of Spatial Defects in Homo Sapiens Affymetrix GeneChips. In IEEE/ACM Transactions on Computational Biology and Bioinformatics.

Probe correlations. The correlation in log intensities between Probe 9 and Probe 11 from probeset _at, obtained from 5,638 HG-U133A GeneChips.
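A minimal sketch of how a correlation in log intensities between two probes could be computed. The slides do not say which correlation coefficient or log base was used, so Pearson correlation and log2 are assumptions here (the Pearson value does not depend on the log base), and the function name and intensities are made up for illustration.

```python
import math


def log_intensity_correlation(probe_a, probe_b):
    """Pearson correlation of log2 intensities of two probes across chips.

    probe_a, probe_b: equal-length lists of positive intensities,
    one value per GeneChip experiment.
    """
    la = [math.log2(v) for v in probe_a]
    lb = [math.log2(v) for v in probe_b]
    n = len(la)
    mean_a = sum(la) / n
    mean_b = sum(lb) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(la, lb))
    var_a = sum((a - mean_a) ** 2 for a in la)
    var_b = sum((b - mean_b) ** 2 for b in lb)
    return cov / math.sqrt(var_a * var_b)


# Hypothetical intensities of two probes over four chips.
r_same = log_intensity_correlation([2, 4, 8, 16], [4, 16, 64, 256])
r_opposite = log_intensity_correlation([2, 4, 8, 16], [256, 64, 16, 4])
```

Probes that track the same transcript across chips should give a value near 1; the second call shows two probes moving in opposite directions.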

Heatmap legend: the number in each square is the correlation multiplied by 10 and rounded. Blue = low correlation; yellow = high correlation. Annotations: average intensity in GEO, standard deviation in GEO, relative probe position on exon, probe number on heatmap.
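The labelling rule for the heatmap squares can be sketched in a couple of lines. The function name `heatmap_labels` and the example matrix are hypothetical, and Python's built-in `round` (which rounds exact halves to the nearest even integer) is an assumption about how ties were handled.

```python
def heatmap_labels(corr):
    """Turn a correlation matrix into the integer labels shown in each square."""
    return [[round(10 * r) for r in row] for row in corr]


# Hypothetical 2 x 2 correlation matrix for two probes.
labels = heatmap_labels([[1.0, 0.87], [0.87, 1.0]])
```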

4. Unique mappings (alignments) of probes to individual exons (Sanchez-Graillet et al., 2008. Widespread existence of uncorrelated probe intensities from within the same probeset on Affymetrix GeneChips. In Journal of Integrative Bioinformatics, 5(2):98):
- avoid cross-hybridization and multiple targeting;
- sense direction only (antisense is avoided).
Example: probe 1, exon 1, transcript 1 (25 bases, 100% identity); probe 2, exon 2, transcript 2 (25 bases, 100% identity); probe 2, exon 3, transcript 3 (25 bases, 96% identity), marked X.
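The identity figures in the mapping example (25 of 25 bases = 100%, 24 of 25 = 96%) can be illustrated with a small ungapped comparison. This is only a sketch: the slides do not describe the alignment tool or the exact acceptance rule, so the helper names, the rule "exactly one exon contains the probe perfectly", and the toy exon sequences are assumptions.

```python
def best_identity(probe, exon):
    """Highest fraction of matching bases over all ungapped placements."""
    k = len(probe)
    best = 0.0
    for start in range(len(exon) - k + 1):
        window = exon[start:start + k]
        matches = sum(p == e for p, e in zip(probe, window))
        best = max(best, matches / k)
    return best


def unique_perfect_exon(probe, exons):
    """Return the exon id if exactly one exon contains the probe, else None.

    exons: dict mapping exon id -> sense-strand sequence (assumed input).
    """
    hits = [exon_id for exon_id, seq in exons.items() if probe in seq]
    return hits[0] if len(hits) == 1 else None


probe = "CTTCAAGAGCATCATGAAGAAGAGT"            # 25-mer taken from the slides
one_mismatch = probe[:12] + "A" + probe[13:]   # base 13 changed from C to A
exons = {
    "exon_a": "GG" + probe + "TT",             # hypothetical flanking bases
    "exon_b": "GG" + one_mismatch + "TT",
}
identity_b = best_identity(probe, exons["exon_b"])
mapped = unique_perfect_exon(probe, exons)
```

Here `exon_b` reaches only 24/25 identity, so the probe maps perfectly to `exon_a` alone.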

(B,C) Identification of probes containing SNPs and outlier SNP probes

1. SNP data downloaded from Ensembl 48: 3' untranslated region (UTR), 5' UTR, and coding regions.
[Figure: Chromosome 10 with Gene 1 and Gene 2, transcripts 1 to 3, and the regions 5' upstream, 5', 3', 3' UTR and 3' downstream.]
Table columns: snp_id (rs), chrom_name, chrom_position, allele (e.g. G/A), biotype (3downstream, 3utr, 5upstream), trans_id (ENST), gene_id (ENSG).

2. Identification of exons with SNPs by using transcript information and chromosomal positions.
3. Selection of unique exons and probes:
- Only unique exons with more than 4 probes.
- SNP positions on the probes uniquely mapping to exons are obtained.
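Steps 2 and 3 amount to coordinate arithmetic plus a filter. A minimal sketch, assuming 1-based chromosomal coordinates, probes aligned on the forward strand without gaps, and hypothetical names and numbers throughout (reverse-strand genes and probes spanning exon junctions are ignored here):

```python
PROBE_LENGTH = 25  # Affymetrix probes are 25 bases long


def snp_position_on_probe(probe_start, snp_pos, probe_length=PROBE_LENGTH):
    """1-based position of the SNP within the probe, or None if outside it.

    probe_start: chromosomal coordinate of the probe's first base (assumed).
    snp_pos: chromosomal coordinate of the SNP.
    """
    offset = snp_pos - probe_start + 1
    return offset if 1 <= offset <= probe_length else None


def exons_with_enough_probes(exon_to_probes, min_probes=5):
    """Keep exons with more than 4 uniquely mapping probes."""
    return {exon: probes for exon, probes in exon_to_probes.items()
            if len(probes) >= min_probes}


# Hypothetical coordinates.
inside = snp_position_on_probe(probe_start=1000, snp_pos=1012)
outside = snp_position_on_probe(probe_start=1000, snp_pos=1025)
kept = exons_with_enough_probes({"exon_a": [1, 2, 3, 4, 5], "exon_b": [1, 2, 3, 4]})
```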

4. Identification of SNP-probes which are outliers:  The overall correlation matrix median (OMM) is compared with each SNP-probe median (SPM).  If OMM – SPM >= 0.15

0.66> <0.15Difference SPM_ SPM_ OMM 0.87 SNP in an outlier probe SNP in an no-outlier probe
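The outlier rule can be sketched as follows. The threshold of 0.15 comes from the slides; how the two medians are taken is an assumption (OMM over all off-diagonal correlations of the exon's matrix, SPM over the SNP-probe's correlations with the other probes), and the function names and the matrix are invented for illustration.

```python
from statistics import median

OUTLIER_THRESHOLD = 0.15  # threshold stated on the slide


def overall_matrix_median(corr):
    """Median of all off-diagonal probe-probe correlations (assumed OMM)."""
    n = len(corr)
    return median(corr[i][j] for i in range(n) for j in range(i + 1, n))


def probe_median(corr, k):
    """Median correlation of probe k with every other probe (assumed SPM)."""
    return median(corr[k][j] for j in range(len(corr)) if j != k)


def is_outlier(corr, k, threshold=OUTLIER_THRESHOLD):
    return overall_matrix_median(corr) - probe_median(corr, k) >= threshold


# Hypothetical exon with five probes: probes 0-3 agree, probe 4 does not.
corr = [
    [1.0, 0.9, 0.9, 0.9, 0.2],
    [0.9, 1.0, 0.9, 0.9, 0.2],
    [0.9, 0.9, 1.0, 0.9, 0.2],
    [0.9, 0.9, 0.9, 1.0, 0.2],
    [0.2, 0.2, 0.2, 0.2, 1.0],
]
flags = ["O" if is_outlier(corr, k) else "N" for k in range(len(corr))]
```

In this toy matrix only the last probe is flagged, because its median correlation with the others sits far below the median of the whole matrix.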

Results

ENSE HG_U133_Plus_2 ONNNNONNNN

ENSE HG_U95A. SNP in overlapped probes: the same SNP is in outlier and non-outlier probes.
Columns: probe position on heatmap, probe_id, snp_id, SNP position, allele, sequence, outlier (O = outlier, N = non-outlier)
_s_at  rs  T/C  CTTCAAGAGCATCATGAAGAAGAGT  O
_s_at  rs  T/C  ACCTTCAAGAGCATCATGAAGAAGA  O
_s_at  rs  T/C  AGACCTTCAAGAGCATCATGAAGAA  N
_s_at  rs  T/C  TGAGACCTTCAAGAGCATCATGAAG  N
_s_at  rs  T/C  ATATGAGACCTTCAAGAGCATCATG  N
_s_at  rs  T/C  ACATATGAGACCTTCAAGAGCATCA  N

ENSE HG_U133A. SNPs in non-outlier probes only.
Columns: snp_id, probe_id, probe_position_heatmap, snp_position_probe, allele, sequence, outlier
rs  _s_at  A/G  GTTTATGATCTGACCTAGGTCCCCC  N
rs  _s_at  C/G  TAAGGACGCTGGGAGCCTGTCAGTT  N

ENSE HG_U133A (5,374 CEL files). SNP in outlier probes only.
Columns: snp_id, probe_id, probe_position_heatmap, snp_position_probe, allele, sequence, outlier
rs  _at  C/A  CTGAATTTAGATCTCCAGACCCTGC  O
rs  _at  C/A  CCTGCCTGGCCACAATTCAAATTAA  O

ENSE HG_U133_Plus_2 (2,572 CEL files). SNP in both outlier and non-outlier probes.
Columns: snp_id, probe_id, probe_position_heatmap, snp_position_probe, allele, sequence, outlier
rs  _at  C/A  CTGAATTTAGATCTCCAGACCCTGC  N
rs  _at  C/A  CCTGCCTGGCCACAATTCAAATTAA  O

ENSE HG_U133A_2 (159 CEL files). SNP in non-outlier probes only.
Columns: snp_id, probe_id, probe_position_heatmap, snp_position_probe, allele, sequence, outlier
rs  _at  C/A  CTGAATTTAGATCTCCAGACCCTGC  N
rs  _at  C/A  CCTGCCTGGCCACAATTCAAATTAA  N

~60,000 SNPs distributed in unique exons of ten array designs:
- 11% in unique exons in which all probes that contain the same SNP are outliers.
- 5% in which not all the probes containing the same SNP are outliers.
- 84% in which none of the probes are outliers.
These numbers may vary according to the Ensembl version used and the threshold chosen for outliers.
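The three categories above can be expressed as a small classifier over the O/N flags of the probes that contain a given SNP. This is a sketch with hypothetical names; the flag lists in the example are the ones shown on the earlier result slides.

```python
def classify_snp(outlier_flags):
    """Classify a SNP from the outlier flags of the probes containing it.

    outlier_flags: iterable of "O" (outlier) or "N" (non-outlier).
    """
    flags = list(outlier_flags)
    if not flags:
        raise ValueError("a SNP must fall in at least one probe")
    if all(flag == "O" for flag in flags):
        return "all outliers"
    if any(flag == "O" for flag in flags):
        return "mixed"
    return "no outliers"


overlapped = classify_snp(["O", "O", "N", "N", "N", "N"])  # HG_U95A example
only_outliers = classify_snp(["O", "O"])                   # HG_U133A example
no_outliers = classify_snp(["N", "N"])                     # HG_U133A_2 example
```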

The most frequent variation found was SNPs, followed by indels and mixed variation. The most common allele was C/T. If more than one SNP-probe maps to the same exon, the probes may have partially or totally overlapping sequences.

Cross-validation for HG_U133_Plus_2 Examination of SNP-Outlier Associations

Median differences and positions of SNPs on probes in HG_U133_Plus_2

Median differences and main alleles (A,C,T,G) found in SNPs in HG_U133_Plus_2

We have identified other causes of outlier probes:
- Probes containing a contiguous run of 4 or more guanines: formation of G-quadruplexes occurring on the surface of a GeneChip (Upton et al., BMC Genomics (in press)).
- Probes located next to bright probes, such as at the edge of the GeneChip, are affected by blur.
- Motifs or any other "problematic" subsequences.

Outlier SNP-probes in HG_U133_Plus_2 with "problematic" subsequences (PS): runs of G (>= 4), CCTCC, CCACC, GGTGG. [Figure panels: outlier probes; non-outlier probes.]
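Screening a probe for the subsequences listed above is a plain pattern search. A minimal sketch using Python's `re` module; the pattern labels and the function name are hypothetical, and the first and third test sequences are probes quoted elsewhere in these slides.

```python
import re

# Subsequences named on the slide; the dictionary keys are labels for illustration.
PROBLEM_PATTERNS = {
    "G-run (>=4)": re.compile(r"G{4,}"),
    "CCTCC": re.compile(r"CCTCC"),
    "CCACC": re.compile(r"CCACC"),
    "GGTGG": re.compile(r"GGTGG"),
}


def problematic_subsequences(probe_seq):
    """Return the labels of all problematic patterns found in the probe."""
    seq = probe_seq.upper()
    return [label for label, pattern in PROBLEM_PATTERNS.items()
            if pattern.search(seq)]


found_motif = problematic_subsequences("TAGTGGGCTCTCTTCCTCCTTCCAC")
found_g_run = problematic_subsequences("ACGGGGTCA")  # made-up sequence
found_none = problematic_subsequences("GTTTATGATCTGACCTAGGTCCCCC")
```

A run of only three guanines, as in the first probe, does not trigger the G-run pattern; that probe is reported for CCTCC alone.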

Conclusions

We have not found a common behaviour when SNPs are present in a probe. SNPs do not seem to cause outliers in groups of probes representing individual exons. SNPs may influence other biological events such as alternative poly(A). The genomic region where SNPs are found, the position of the SNP in a probe, the main allele, and the number of SNPs in a probe do not make a probe an outlier in the correlation heatmap.

Bioinformatics Group: Dr Andrew Harrison (Physics); Dr Berthold Lausen (Statistics); Dr Abdel Salhi (Mathematics); Professor Graham Upton (Statistics); Dr William Langdon (Physics and Computer Sc.); Dr Olivia Sanchez (Computer Sc.); Dr Maria Stalteri (Inorganic Chemistry & Bioinformatics); Jose Arteaga-Salas (Statistics); Rohmatul Fajriyah (Statistics); Abdelhak Kheniche (Pharmacology & Mathematics); Rahim Bux Khokhar (Mathematics); Zain-Ul-Abdin Khurho (Mathematics); Farhat Memon (Computer Sc.); Joanna Rowsell (Mathematics)

Thank you!

ENSE HG_U95Av2. Probes with several SNPs.
Columns: snp_id, probe_id, probe_position_heatmap, snp_position_probe, allele, sequence
rs  _g_at  C/T  TGCGGCGGCTGTAGTGGGCTCTCTT
rs  _g_at  C/T  TGCGGCGGCTGTAGTGGGCTCTCTT
rs  _g_at  G/A  TGCGGCGGCTGTAGTGGGCTCTCTT
rs  _g_at  T/C  TGCGGCGGCTGTAGTGGGCTCTCTT
rs  _at  C/T  CGGCGGCTGTAGTGGGCTCTCTTCC
rs  _at  G/A  CGGCGGCTGTAGTGGGCTCTCTTCC
rs  _at  T/C  CGGCGGCTGTAGTGGGCTCTCTTCC
rs  _g_at  C/T  TAGTGGG C TCTCTTCCTCCTTCCAC
rs  _g_at  G/A  TAGTGGGCTCTCTTCCTCCTTCCAC
rs  _g_at  T/C  TAGTGGGCTCTCTTCCTCCTTCCAC
rs  _g_at  T/C  TAGTGGGCTCTCTTCCTCCTTCCAC
rs  _g_at  T/G  TAGTGGGCTCTCTTCCTCCTTCCAC

Adjacent probes within a cell on a GeneChip have the same sequence – a run of Guanines will result in closely packed DNA with just the right properties to form quadruplexes.