Functional Genomics I - Microarrays
Transcriptomics, Proteomics, Metabolomics, Genomics, SNP (Single Nucleotide Polymorphisms), CNV (Copy Number Variation, CGH), Epigenomics
Technology that provides measurements of thousands of molecules in the same experiment at reasonable price and precision. Generally the size of a typical microscope slide (75 x 25 mm (3" x 1") and about 1.0 mm thick).
Biological Question → Experimental Design → Microarray Experiment → Pre-processing → Differential Expression, Clustering, Prediction, … → Biology: Verification and Interpretation. Pre-processing: Image Analysis, Background, Normalization, Summarization, Transformation
Gene Expression. Molecular Cell Biology, 5th Ed. [Lodish, Berk, Matsudaira, Kaiser, Krieger, Scott, Zipursky, Darnell]
PCR and QPCR. [Figure: gel with bp ladder (100 bp, 200 bp) for samples RWPE-1, DU-145, PC; mRNA of Gene X at 10^6, 10^5, 10^4, 10^3, 10^2 and 10 copies]
Microarrays Bioinformatics, Dov Stekel, Cambridge, 2003
Images: Affymetrix (1 dye) vs. two dyes
Affymetrix, Spotted Arrays, Inkjet Arrays (Microarrays Bioinformatics, Dov Stekel, Cambridge, 2003)
Dr. Hugo Barrera Microarrays Course EMBO-INER 2005, Mexico City
PROCESS: mRNA Extraction (and amplification), Labelling, Hybridization, Scanning, Image Analysis & Data Processing, Statistical Analysis
PRODUCT: mRNA/cDNA, Labeled mRNA, Digital Image, Microarray Data, Selected Genes
TWO-DYES (REFERENCE: Healthy/Control; TEST: Disease/Treatment), values per gene as REFERENCE-TEST: Gene A 1-1, B 1-0, C 3-3, D 0-3, E 3-0, F 0-1, G 1-1, H 2-0, I 2-2, J 0-0, K 3-0, L 2-1. Selected genes: Gene D, Gene E, Gene K
ONE-DYE (TEST sample), one value per gene: Gene A 1, B 1, C 1, D 0, E 4, F 1, G 1, H 2, I 2, J 0, K 5, L 2. Selected genes: Gene D, Gene E, Gene K, Gene J
[Figure: scanning with … µm laser vs. 10 µm laser] Dr. Hugo Barrera, Microarrays Course EMBO-INER 2005, Mexico City; Microarrays Bioinformatics, Dov Stekel, Cambridge
Microarray Pre-Processing: Image Analysis, Background, Normalization, Summarization, Transformation
Purpose. Input: Scanned Image File. Output: Data File (unique "global relative" measure of expression for every gene with minimal experimental error)
TECHNOLOGIES (left: oligonucleotide arrays such as Affymetrix | right: spotted two-dye arrays)
DNA Probes: Oligos ~20 40nt | Target (cDNA, PCR products, etc.)
Copies per gene: Usually 1 | Usually 3
Organization: n x m probesets, n probesets (~100) by m probesets (~100) | Sectors (print-tip), x sectors (~=3) by y sectors (~=3), each sector i x j spots (18x20)
Probeset: perfect match probes (pm), mismatch probes (mm)
Controls: Empty spots, landing lights
TECHNOLOGIES: RAW DATA → Image Analysis → Pre-processing
Two dyes: 10,000 genes * 2 dyes * 3 copies/gene * ~40 pixels/gene = 2,400,000 values → only 10,000 values
Affymetrix: 10,000 genes * 20 oligos * 2 (pm, mm) * ~36 pixels/gene = 14,400,000 values → only 10,000 values
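The data-reduction arithmetic on this slide can be checked with a short sketch. The function and parameter names are illustrative (not from the slides), and the pixel counts are the slide's approximate figures.

```python
# Rough count of raw pixel values per array, using the slide's approximate figures.
GENES = 10_000

def raw_values_two_dye(genes=GENES, dyes=2, copies_per_gene=3, pixels_per_spot=40):
    """Raw pixel values on a spotted two-dye array."""
    return genes * dyes * copies_per_gene * pixels_per_spot

def raw_values_affymetrix(genes=GENES, oligos_per_gene=20, probe_types=2, pixels_per_probe=36):
    """Raw pixel values on an Affymetrix array (probe_types counts PM and MM)."""
    return genes * oligos_per_gene * probe_types * pixels_per_probe

two_dye = raw_values_two_dye()
affy = raw_values_affymetrix()
print(f"two-dye:    {two_dye:,} raw values -> {GENES:,} expression values")
print(f"Affymetrix: {affy:,} raw values -> {GENES:,} expression values")
```

In both cases image analysis and pre-processing reduce millions of pixel values to one expression value per gene.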
Addressing: Estimate location of spot centers. Segmentation: Classify pixels as foreground or background. Extraction: For each spot on the array and each dye, obtain foreground intensities, background intensities, and quality measures.
Addressing: done by the Affymetrix GeneChip software
Addressing (by grid, GenePix)
Segmentation: circular feature or irregular feature shape. Finally, compute the average.
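A minimal sketch of segmentation with a fixed circular feature, assuming the spot center and radius are already known from addressing. The synthetic pixel window, the function name, and the choice of the median for the background are assumptions for illustration, not the method of any particular image-analysis package.

```python
from statistics import mean, median

def segment_spot(image, center_row, center_col, radius):
    """Split the pixels of a spot window into foreground (inside the circle) and background."""
    foreground, background = [], []
    for r, row in enumerate(image):
        for c, value in enumerate(row):
            inside = (r - center_row) ** 2 + (c - center_col) ** 2 <= radius ** 2
            (foreground if inside else background).append(value)
    return foreground, background

# Synthetic 7x7 window: a bright spot (1000) of radius 2 on a dim background (100).
window = [[1000 if (r - 3) ** 2 + (c - 3) ** 2 <= 4 else 100 for c in range(7)]
          for r in range(7)]

fg, bg = segment_spot(window, center_row=3, center_col=3, radius=2)
fg_intensity = mean(fg)    # average of the foreground pixels
bg_intensity = median(bg)  # median chosen here as a robust background estimate (assumption)
print(len(fg), fg_intensity, bg_intensity)
```

A fixed circle is the simplest case; irregular feature shapes would need a segmentation that adapts to the pixel intensities.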
Background Reduction. Extraction: Determining Background
Image Analysis results
2-Color Results (GenePix): .gpr file, "results" for one array, 10,000 genes ~ 30,000 values (.gal files: 1 file for a "list" of arrays)
Affymetrix Results: .cel file, "results" for one array (raw, no background reduced), 10,000 genes ~ 400,000 values
Segmentation (Spot detection) → Background Estimation → Value. Value = Spot Intensity – Spot Background. [Table: rows Gene 1, Gene 2, Gene 3, …, Gene k, …, Gene N; one column per sample]
Log2 transformation. [Table: rows Gene 1, Gene 2, Gene 3, …, Gene k, …, Gene N; columns per sample, each with a G and an R channel]
MA-Plot (log2 scale): R and G per gene, 1 value? [Table: rows Gene 1, Gene 2, Gene 3, …, Gene k, …, Gene N; columns R and G per sample. Plot axes: A, M]
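The MA transformation turns the two channel intensities of a spot into a log-ratio M = log2(R/G) and an average log-intensity A = log2(R*G)/2. A minimal sketch, assuming background-corrected, strictly positive intensities; the gene names and values are hypothetical.

```python
from math import log2

def ma_transform(red, green):
    """Return (M, A) for one spot from background-corrected red and green intensities."""
    m = log2(red / green)      # M: log-ratio between the two channels
    a = log2(red * green) / 2  # A: average log-intensity of the spot
    return m, a

# Hypothetical (R, G) intensities for three genes.
spots = {"Gene 1": (800.0, 200.0), "Gene 2": (500.0, 500.0), "Gene 3": (100.0, 400.0)}
ma = {gene: ma_transform(r, g) for gene, (r, g) in spots.items()}
for gene, (m, a) in ma.items():
    print(f"{gene}: M={m:+.2f} A={a:.2f}")
```

M is the single value per gene used downstream; A is kept because dye bias often depends on intensity.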
Normalization – 2 dyes, "within" (2-color technologies). Assumption: the majority of genes do not change. [MA-plot: axes A, M]
Normalization – 2 dyes, "within" (2-color technologies). Assumption: the majority of genes do not change. [Figure: before and after]
Normalization – 2 dyes, "within", spatial (2-color technologies). [Figure panels: before normalization; after global loess normalization; after loess normalization by sector (print-tip)]
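The difference between global and per-sector (print-tip) normalization can be sketched with a deliberately simplified stand-in: this example swaps loess for plain median centering of M, which ignores any dependence on the intensity A. It relies on the same assumption as the slides (the majority of genes do not change, so the typical M should be 0). The data and function names are hypothetical.

```python
from collections import defaultdict
from statistics import median

def center_global(m_values):
    """Subtract the array-wide median M so the typical log-ratio becomes 0."""
    shift = median(m_values)
    return [m - shift for m in m_values]

def center_by_sector(m_values, sectors):
    """Subtract each sector's (print-tip group's) own median M."""
    by_sector = defaultdict(list)
    for m, s in zip(m_values, sectors):
        by_sector[s].append(m)
    shifts = {s: median(vals) for s, vals in by_sector.items()}
    return [m - shifts[s] for m, s in zip(m_values, sectors)]

# Hypothetical log-ratios: sector 2 carries a larger dye bias than sector 1.
m_values = [0.5, 0.6, 0.4, 1.1, 0.9, 1.0]
sectors = [1, 1, 1, 2, 2, 2]

global_norm = center_global(m_values)
sector_norm = center_by_sector(m_values, sectors)
print([round(m, 2) for m in global_norm])
print([round(m, 2) for m in sector_norm])
```

Global centering leaves the two sectors offset from each other, while per-sector centering removes the sector-specific bias. A loess fit of M on A would additionally remove intensity-dependent bias.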
[Table: rows Gene 1, Gene 2, Gene 3, …, Gene k, …, Gene N; one column per sample; values on log2 scale]
Between-slides Normalization – 1 or 2 dyes. Methods: quantile, MAD (median absolute deviation) scale, qspline, invariantset, loess. [Figure: before normalization and after normalization]
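Of the between-slides methods listed, quantile normalization is the easiest to sketch: every array is forced to share the same distribution, taken as the mean of the sorted values across arrays. This is a minimal illustration that assumes equal-length arrays without ties or missing values; the data are hypothetical.

```python
def quantile_normalize(arrays):
    """Quantile-normalize equal-length arrays (one list of log intensities per array)."""
    n = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    # Reference distribution: mean of the k-th smallest value across all arrays.
    reference = [sum(s[k] for s in sorted_arrays) / len(arrays) for k in range(n)]
    normalized = []
    for a in arrays:
        order = sorted(range(n), key=lambda i: a[i])  # gene indices, smallest value first
        out = [0.0] * n
        for rank, i in enumerate(order):
            out[i] = reference[rank]  # replace each value by the reference value of its rank
        normalized.append(out)
    return normalized

arrays = [[5.0, 2.0, 3.0, 4.0],
          [4.0, 1.0, 6.0, 9.0]]
normalized = quantile_normalize(arrays)
print(normalized)
```

After normalization both arrays contain exactly the same set of values, and each gene keeps its rank within its own array.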
Summarization – Affymetrix (oligonucleotide-dependent technologies). Summarization = "Average"(Intensities) over the PM and MM probes of a probeset. Usual methods: tukey-biweight, av-diff, median-polish. The "summarization" equivalent in two-dye technologies is the average of gene replicates within the slide.
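As an illustration of one of the listed methods, here is a minimal sketch of Tukey's median polish applied to a single probeset. It assumes a probes x arrays matrix of log2 PM intensities (MM probes are ignored here), and the numbers are hypothetical. The summarized expression for each array is the overall level plus that array's effect.

```python
from statistics import median

def median_polish(matrix, n_iter=10):
    """Tukey median polish on a probes x arrays matrix of log2 intensities.

    Returns one summarized value per array (overall level + array effect).
    """
    resid = [list(row) for row in matrix]
    n_rows, n_cols = len(resid), len(resid[0])
    overall = 0.0
    row_eff = [0.0] * n_rows  # probe effects
    col_eff = [0.0] * n_cols  # array effects
    for _ in range(n_iter):
        # Sweep row (probe) medians out of the residuals.
        for i in range(n_rows):
            med = median(resid[i])
            row_eff[i] += med
            resid[i] = [v - med for v in resid[i]]
        med = median(col_eff)
        overall += med
        col_eff = [c - med for c in col_eff]
        # Sweep column (array) medians out of the residuals.
        for j in range(n_cols):
            med = median([resid[i][j] for i in range(n_rows)])
            col_eff[j] += med
            for i in range(n_rows):
                resid[i][j] -= med
        med = median(row_eff)
        overall += med
        row_eff = [r - med for r in row_eff]
    return [overall + c for c in col_eff]

# Hypothetical probeset: 3 probes (rows) measured on 2 arrays (columns).
probeset = [[8.0, 10.0],
            [9.0, 11.0],
            [7.0, 9.0]]
expression = median_polish(probeset)
print(expression)
```

Because medians are used instead of means, a single outlying probe has little influence on the summarized value.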
Problems: some spots may be defective from the printing process; some spots could not be detected; some spots may be damaged during the assay; artefacts may be present (bubbles, etc.).
Remedies: use replicated spots as averages; remove unrecoverable genes; remove problematic spots in all arrays; infer values using computational methods (warning).
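A minimal sketch of the first two remedies: averaging the usable replicate spots of each gene and dropping genes with no usable spot. Marking a defective or undetected spot with `None` is an assumption of this example, as are the gene names and values.

```python
from statistics import mean

def summarize_replicates(replicates):
    """Average the usable replicate spots; None marks a defective or undetected spot."""
    usable = [v for v in replicates if v is not None]
    return mean(usable) if usable else None  # None: gene unrecoverable on this array

# Hypothetical log-ratios for three replicate spots per gene.
spots = {
    "Gene A": [1.0, 1.2, None],    # one damaged spot, two usable
    "Gene B": [None, None, None],  # no usable spot
    "Gene C": [3.0, 2.0, 4.0],
}
summary = {gene: summarize_replicates(values) for gene, values in spots.items()}
kept = {gene: value for gene, value in summary.items() if value is not None}
print(kept)
```

Inferring the missing values computationally is the remaining option, which the slide flags with a warning, so it is not sketched here.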
More than 10,000 genes: too much data increases computation time and analysis complexity.
Remove: genes that do not change significantly, undefined genes, low expression.
Keep: large signal-to-noise ratio, large statistical significance, large variability, large expression.
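A minimal filtering sketch that keeps genes with large expression and large variability. The two thresholds and the example data are hypothetical and would need tuning for a real data set.

```python
from statistics import mean, stdev

def filter_genes(expression, min_mean=6.0, min_sd=0.5):
    """Keep genes whose mean log2 expression and standard deviation reach the thresholds."""
    return {gene: values for gene, values in expression.items()
            if mean(values) >= min_mean and stdev(values) >= min_sd}

# Hypothetical log2 expression values across four samples.
expression = {
    "Gene 1": [8.0, 10.0, 9.0, 11.0],  # expressed and variable: kept
    "Gene 2": [4.0, 4.2, 3.9, 4.1],    # low expression: removed
    "Gene 3": [9.0, 9.1, 9.0, 9.1],    # does not change: removed
}
kept = filter_genes(expression)
print(list(kept))
```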
Data Processing: Image Analysis, Background Subtraction, Normalization, Summarization, Transformation
Panels (a–d):
Image Analysis and Background Subtraction: Microarray → Image Scanning → Spot Detection → Intensity Value → Background Detection & Subtraction (Affymetrix, two dyes)
Transformation: A=log2(R*G)/2, M=log2(R/G)
Normalization: Within, Between
Filtering
Microarray Technology Through Applications, F. Falciani, Taylor & Francis 2007
[Data matrix: rows Gene 1, Gene 2, Gene 3, …, Gene N; columns Class A Samples (Normal Tissue, Cancer A, Untreated, Reference, …) and Class B Samples (Tumour Tissue, Cancer B, Treated, Strains, …)]
Differential Expression; Unsupervised Classification; Biomarker Detection; Identifying Genes Related to Survival Times; Regression Analysis; Gene Copy Number and Comparative Genomic Hybridization; Epigenetics and Methylation; Genetic Polymorphisms and SNPs; Chromatin Immuno-Precipitation On-Chip; Pathogen Detection; …
Differential Expression. Gene selection from the data matrix (rows Gene 1 … Gene N; columns Class A Samples and Class B Samples). [Figure: expression level in Samples A vs. Samples B for a positive and a negative gene]. Statistics: p-value, FDR, q-value
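Testing thousands of genes at once is why the slide lists FDR next to the p-value. A minimal sketch of the Benjamini-Hochberg adjustment, one common way to control the FDR, is shown here as an illustration rather than as the method used in the course. It assumes the per-gene p-values have already been computed; the values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original gene order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])  # gene indices, smallest p first
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted

p_values = [0.01, 0.04, 0.03, 0.20]  # hypothetical p-values for four genes
adjusted = benjamini_hochberg(p_values)
selected = [i for i, q in enumerate(adjusted) if q <= 0.05]
print([round(q, 4) for q in adjusted], selected)
```

With these numbers three genes have a raw p-value below 0.05, but only the first remains selected after the adjustment.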
Biomarker Detection (Biomarker Discovery). Gene selection from the data matrix (rows Gene 1 … Gene N; columns Class A Samples and Class B Samples). [Figure: expression level in Samples Class A vs. Samples Class B for a positive and a negative gene]
Unsupervised Sample Classification; Co-Expressed Genes. [Figure, panels a and b: genes (A, C, G, B, H, E, D, I, K, M, L) by samples, expression shown from low to high]
Genes Associated to Survival Times and Risk. Gene selection from the data matrix (rows Gene 1 … Gene N; columns Class A Samples and Class B Samples). [Figure: Kaplan-Meier plots (axes: Time, Hazard) for a positive and a negative gene]
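The curve in a Kaplan-Meier plot can be computed with the product-limit estimator. A minimal sketch with hypothetical follow-up data; in a gene-selection setting one curve would be computed per group of samples, for instance high versus low expression of a gene (that grouping is an assumption of this example).

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each time with an observed event.

    events[i] is 1 if the event was observed at times[i] and 0 if the sample was censored.
    """
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        at_risk = sum(1 for x in times if x >= t)
        deaths = sum(1 for x, e in zip(times, events) if x == t and e == 1)
        if deaths:
            survival *= 1 - deaths / at_risk  # product-limit update
            curve.append((t, survival))
    return curve

# Hypothetical follow-up times (months) and event indicators for five samples.
times = [2, 3, 3, 5, 8]
events = [1, 1, 0, 1, 0]
curve = kaplan_meier(times, events)
for t, s in curve:
    print(f"t={t}: S(t)={s:.2f}")
```

Censored samples contribute to the number at risk until their last follow-up but never trigger a drop in the curve.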
Regression: Gene Association to Outcome. Gene selection from the data matrix (rows Gene 1 … Gene N; columns Class A Samples and Class B Samples). [Figure: dependent variable vs. gene expression; positive gene: slope ≠ 0; negative gene: slope = 0]
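A minimal sketch of the per-gene regression test: fit a least-squares line of the dependent variable on the expression of one gene and compute the t statistic for the hypothesis that the slope is 0. The data are hypothetical. To get a p-value, the statistic would be compared with a t distribution with n - 2 degrees of freedom, which is omitted here.

```python
from statistics import mean

def slope_and_t(x, y):
    """Least-squares slope of y on x and its t statistic for H0: slope = 0."""
    n = len(x)
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    se = (rss / (n - 2) / sxx) ** 0.5  # standard error of the slope
    t_stat = slope / se if se > 0 else float("inf")
    return slope, t_stat

expression = [1.0, 2.0, 3.0, 4.0, 5.0]        # hypothetical expression of one gene
outcome_assoc = [2.1, 3.9, 6.2, 7.8, 10.1]    # outcome that follows expression
outcome_flat = [5.0, 4.9, 5.1, 5.0, 5.0]      # outcome unrelated to expression

slope_pos, t_pos = slope_and_t(expression, outcome_assoc)
slope_neg, t_neg = slope_and_t(expression, outcome_flat)
print(f"associated: slope={slope_pos:.2f}, t={t_pos:.1f}")
print(f"flat:       slope={slope_neg:.2f}, t={t_neg:.1f}")
```

A large absolute t statistic marks a "positive" gene (slope ≠ 0); a value near 0 marks a "negative" one.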
[Figure, panels a, b, c: SNP genotyping on arrays for SNP 1, SNP 2, SNP 3 (example genotypes AA, CG, CC). Steps: Labelling, Hybridisation, Detection. Variants shown: products of 1-nt primer extension (in solution) followed by capture; transcribed RNA + reverse transcriptase, extension with ddNTPs (one labelled); PCR products + DNA polymerase, extension (1 nt) with labelled ddNTPs]
Chromatin Immuno-Precipitation (ChIP-on-Chip): Fusion of Tag sequence into TF gene → Incubation of DNA with the tagged transcription factor (TF) → Precipitation of Antibody-TF-DNA complex (antibody against tag peptide) → Labelling of precipitated DNA → Microarray Hybridisation
Spotted sequences: (1) ACGGCTAGTCACAAC... (2) GCTAGTCACAACCCA... (3) GCTAGTCCGGCACAG. [Figure: sample; spotted and hybridized features (1), (2), (3)]
Placenta 1, Placenta 2, Reference Pool: mRNA Extraction → Labelling (Green, Red) → Microarray Hybridization (in duplicate) → Scanning & Data Processing (Image Analysis; Within Normalization (per array); Between Normalization (all arrays)) → Detection of Differentially Expressed Genes (t-test, H0: µ = 0; p-value correction: False Discovery Rate) → Validation and Analysis (Comparison With Known Tissue-Specific Genes) (controls) (Dr. Hugo Barrera)
[Figure, panels a–d: Placenta/Reference and Control/Control]
(a) Microarray Experiment: Ratio (log2) for Placenta. Arrays: Placenta 1 Replicate 1, Placenta 1 Replicate 2, Placenta 2 Replicate 1, Placenta 2 Replicate 2
(b) T1dbase T1 score (0 to 1) by tissue: Lung, Thalamus, Amygdala, Spinal Cord, Testis, Kidney, Liver, Pituitary, Thyroid, Cerebellum, Hypothalamus, Caudate Nucleus, Exocrine Pancreas, Lymph Node, Frontal Cortex, Stomach, Breast, Bone Marrow, Pancreatic Islets, Uterus, Ovary, Skin, Heart, Skeletal Muscle, Prostate, Thymus, Salivary Gland, Trachea, Placenta