1
Microarrays and Gene Expression
DTC Bioinformatics Course 9th February 2010 Helen Lockstone
2
Overview
Background
Array design
Applications of array technology
Steps in data analysis
Finding differentially expressed genes
Biological interpretation
3
Schedule
Introduction to microarray technology and applications
Break
Microarray data analysis
Practical 1
Lunch
Biological interpretation
Practical 2
4
Microarrays in the Literature
5
The Central Dogma
Transcriptome measured by microarrays
6
Premise of Microarrays
Compare gene expression between groups
Differentially expressed genes may provide some biological insight
But not magical solutions!
7
Typical Microarray Designs
Disease vs control
Good prognosis vs poor prognosis
Different tumour types
Effect of treatment
Effect of stimulus
Time course
Different tissues/stages of development
8
Criticism of Microarrays
Non-hypothesis driven “fishing expeditions”
Because microarray experiments are expensive and time-consuming to interpret, they are often published as stand-alone experiments
Produce large amounts of data, and interpretations can be very different (but equally valid)
Further experimental work, following up hypotheses suggested by array data, can produce elegant studies
Perception that the data are unreliable, so findings need validation
9
Microarray Repositories
GEO (Gene Expression Omnibus)
ArrayExpress
Excellent resources of microarray data
MIAME guidelines
10
What is a Microarray?
Glass slide consisting of hundreds of thousands of probes arranged in a grid layout
Each probe detects a particular RNA species (transcript)
Hybridisation occurs by complementary base-pairing
Make quantitative measurements – signal from each probe is proportional to the amount of hybridised RNA
Interrogate entire genome in a single experiment
11
Microarray Technology
Probes
  cDNA
  Oligonucleotides
  PCR products
Design
  Targeted to genes
  Tiling (chromosomes, promoters)
Fabrication Method
  Spotted (robotic printing)
  Photolithography (synthesised in-situ)
Type
  One-colour (log intensities)
  Two-colour (log ratios)
Labelling molecules
  Cy3 (green), Cy5 (red), biotin
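The difference between one-colour and two-colour arrays is mainly in what gets analysed: a log intensity per probe, or a log ratio between the two dye channels. A minimal sketch, assuming the ratio is taken as Cy5 over Cy3 (a common convention, not stated on the slide) and using made-up signal values:

```python
import math

def log2_intensity(signal):
    """One-colour arrays: analyse the log2 of a single channel."""
    return math.log2(signal)

def log2_ratio(cy5_signal, cy3_signal):
    """Two-colour arrays: analyse the log2 ratio of the two channels.

    The Cy5/Cy3 direction is an assumption made for this sketch.
    """
    return math.log2(cy5_signal / cy3_signal)

# Hypothetical signals for one probe
red, green = 4000.0, 1000.0
print(log2_intensity(green))
print(log2_ratio(red, green))   # 2.0  (four-fold higher in the Cy5 channel)
print(log2_ratio(green, red))   # -2.0 (same change, opposite direction)
```

On the log scale a four-fold increase and a four-fold decrease are symmetric around zero, which is one reason log ratios are preferred over raw ratios.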
12
Experimental Protocol
13
Microarray Manufacturers
Company | Established | Main Microarray Technology | Human Whole-Genome Array released | Headquarters
Affymetrix | 1992 | GeneChip | 1994 | Santa Clara, CA
Illumina | 1998 | BeadChip | 2005 | San Diego, CA
Roche NimbleGen | 1999 | High-density tiling arrays | | Madison, WI
Agilent | | aCGH, ChIP-chip, custom | 2004 |
14
Array design
15
Affymetrix Microarrays
Manufacturing microarrays for >15 years
25bp probes – 11 individual probes comprise a probe-set, and their signals are combined to estimate gene expression
Whole human genome array has >50,000 probe-sets
Array surface size: 1.28 cm²
3’ expression arrays – probes designed to the 3’ end of the transcript
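Combining the probes of a probe-set into one expression value can be sketched as below. This is a deliberately simple stand-in (median of log2 probe signals on a single array), not the Affymetrix algorithm; established methods such as RMA fit a model across arrays using median polish. The probe values are made up.

```python
import math
from statistics import median

def summarise_probeset(probe_signals):
    """Summarise one probe-set as the median of its log2 probe signals.

    The median is used because it is robust to a few badly behaving probes.
    """
    return median(math.log2(s) for s in probe_signals)

# Hypothetical signals for the 11 probes of one probe-set;
# the last probe is much brighter than the rest (e.g. cross-hybridisation).
probes = [128.0, 256.0, 256.0, 512.0, 512.0, 512.0,
          512.0, 1024.0, 1024.0, 2048.0, 8192.0]

print(summarise_probeset(probes))  # 9.0, i.e. log2(512)
```

The single very bright probe does not move the summary, whereas it would pull a plain mean upwards.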
16
Recent Developments
Limitations of 3’ array design
  Assumes representative of entire gene
  Assumes well-defined 3’ end of gene
  Can’t assess splicing events
  Can be difficult to distinguish homologous genes
Whole transcript arrays
  4-probe probesets designed to each exon
  Gene 1.0 and Exon 1.0 arrays
17
Exon Array Design
[Figure: picture from Affymetrix]
18
Illumina Beadchip Arrays
Beads randomly occupy wells on the surface of the array
30-40 replicates of each bead type (probe)
Longer probe length – typically one probe per gene
19
Applications of Microarray Technology
20
Microarray Applications
ChIP-chip
Gene Expression
Alternative Splicing
DNA Methylation
microRNA expression
Comparative Genomic Hybridisation
SNP Genotyping
21
Gene Expression
Still the most common use for microarrays
Aim to determine differential expression between groups of samples, e.g. disease and control
Generate hypotheses about the mechanisms underlying the disease of interest
22
Alternative Splicing
Up to 75% of human genes may produce alternative transcripts
Increases protein diversity from a given set of genes
Alternative transcripts from the same gene can produce proteins with different, even opposite, functions (e.g. Bcl-x)
Role in disease - mutations can disrupt splice sites or splicing machinery
23
Alternative Splicing
Affymetrix exon array allows investigation of alternative splicing
Custom arrays with junction probes
Additional layer of analysis
24
Alternative Poly-A Sites
Alters length of 3’ UTR - may change which target regions for miRNAs are present
25
Alternative Splicing
26
MicroRNAs
Small non-coding RNAs (~22 nt)
Sequence-specific binding to 3’ UTRs
Post-transcriptional gene silencing
Picture from He et al. Nature Reviews Cancer 7, (2007)
27
SNP Arrays
Illumina and Affymetrix
~6 million SNPs genome-wide
Genotype individuals in a high-throughput and cost-effective manner
Genome-wide association studies
eQTL studies
28
Tiling Arrays
The applications described so far use arrays with probes designed to genes/miRNAs/SNPs of interest
Tiling arrays consist of high-density probes covering one or more regions of the genome
Identify novel transcripts and exons
29
DNA Methylation
Methylation of cytosine bases (CpG islands) in gene promoter regions can silence transcription
Epigenetic mechanism
Two-colour hybridisation
30
ChIP-chip
Method to identify transcription factor binding sites in an unbiased fashion
Cross-link protein (TF) of interest with DNA
Use immuno-precipitation to pull down DNA fragments bound to the protein (enriched sample)
Hybridise with genomic DNA to obtain log-ratio
Again looking for large positive ratios
31
Comparative Genomic Hybridisation
[Figure: trisomy 13 in female compared to reference male]
Detect regions of amplification/deletion (copy number changes)
Feature of cancer – hybridise sample with reference DNA (copy number=2)
Potential dosage effects on genes in affected regions
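The size of the signal expected from a copy number change follows directly from the log ratio against the reference. A small sketch, assuming an idealised array with no noise, no compression of ratios and a pure (non-mosaic) sample; real data usually show smaller shifts than these theoretical values:

```python
import math

def expected_log2_ratio(sample_copies, reference_copies=2):
    """Theoretical log2(sample/reference) ratio for a region.

    Idealised: assumes signal is exactly proportional to copy number.
    """
    return math.log2(sample_copies / reference_copies)

print(round(expected_log2_ratio(3), 3))     # 0.585  single-copy gain, e.g. a trisomy
print(expected_log2_ratio(2))               # 0.0    no change
print(expected_log2_ratio(1))               # -1.0   single-copy loss
# Female sample against a male reference: two copies of X versus one
print(expected_log2_ratio(2, reference_copies=1))  # 1.0
```

A gain of one copy gives a smaller shift (about +0.58) than a loss of one copy (-1), so amplifications are somewhat harder to detect than deletions at the same noise level.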
32
Analysing Gene Expression Data
33
R and Bioconductor
Powerful, open-source software for statistical analysis and graphical visualisation
Greater functionality provided by software packages contributed by researchers
Bioconductor packages are specifically for genomic data
Example packages: affy, limma, vsn
34
Analysis Steps
Check quality of the data
Decide if any samples are outliers
Preprocessing and normalisation
Statistical analysis to find differentially expressed genes
Tools for biological interpretation
35
Data Quality
Looking for good signal and similar metrics across all arrays in the experiment (after normalisation between arrays)
Poor signal could indicate a hybridisation problem or degraded sample
Control probes for hybridisation, labelling and sample can help identify problems
36
Illumina Array Metrics
Average signal
Number of detected genes
Housekeeping genes signal
Biotin controls
Hybridisation controls
Negative control probe signal
37
Processing Data
Background correction
Transform data to log scale (more suitable for statistical analysis)
Normalisation between arrays (adjust for systematic differences such as overall brightness)
Probe-set summarisation (Affymetrix) or summarisation across replicate probes (Illumina)
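Two of the processing steps above, log transformation and normalisation between arrays, can be sketched as follows. Median centring is used here only because it is the simplest way to remove a difference in overall brightness; it is an assumption of this sketch rather than the method used in the course, and packages such as limma, affy and vsn offer more sophisticated options (e.g. quantile normalisation). Background correction is not shown. The intensities are made up.

```python
import math
from statistics import median

def log2_transform(array):
    """Put raw intensities on the log2 scale."""
    return [math.log2(x) for x in array]

def median_centre(arrays):
    """Shift each log-scale array so that all arrays share the same median.

    Removes array-wide brightness differences; leaves within-array
    differences between probes untouched.
    """
    medians = [median(a) for a in arrays]
    target = sum(medians) / len(medians)
    return [[x - m + target for x in a] for a, m in zip(arrays, medians)]

# Hypothetical raw data: the second array is the same sample scanned
# twice as brightly, so every probe is doubled.
raw = [
    [100.0, 200.0, 400.0, 800.0, 1600.0],
    [200.0, 400.0, 800.0, 1600.0, 3200.0],
]

logged = [log2_transform(a) for a in raw]
normalised = median_centre(logged)

for before, after in zip(logged, normalised):
    print([round(x, 2) for x in before], "->", [round(x, 2) for x in after])
```

On the log scale the two-fold brightness difference becomes a constant offset of 1, which the centring step removes.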
38
Exploring Data – Boxplots
[Figure: boxplots of signal intensity per array]
39
Exploring Data - PCA
40
Outlier Samples
Potential outlier samples will look different to others in the experiment
No definitive rules to decide when to exclude a sample from analysis
Depends on size of experiment
Can be useful to run the analysis with and without the outlier to assess its effect on results
Always re-normalise data excluding any outlier samples before proceeding
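One simple way to see which sample "looks different" is to compare each sample's expression profile with all the others. The sketch below ranks samples by their mean Pearson correlation with the rest; this is an illustrative heuristic chosen for this example, not a rule from the course, and it deliberately applies no cut-off since there are no definitive rules for exclusion. The data are made up.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def mean_correlation_with_others(samples):
    """For each sample, the average correlation with every other sample."""
    result = {}
    for name, profile in samples.items():
        others = [pearson(profile, other)
                  for other_name, other in samples.items()
                  if other_name != name]
        result[name] = sum(others) / len(others)
    return result

# Hypothetical log2 expression for five probes in four samples
samples = {
    "s1": [8.0, 9.1, 10.2, 11.0, 12.1],
    "s2": [8.1, 9.0, 10.0, 11.2, 12.0],
    "s3": [7.9, 9.2, 10.1, 10.9, 12.2],
    "s4": [10.5, 8.2, 11.9, 9.0, 10.1],  # behaves differently from the rest
}

scores = mean_correlation_with_others(samples)
candidate = min(scores, key=scores.get)
for name, score in scores.items():
    print(name, round(score, 2))
print("Sample to inspect more closely:", candidate)
```

A low score only flags a sample for closer inspection (boxplots, PCA, QC metrics); whether to exclude it remains a judgement call.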
41
Outlier Sample
42
PCA indicating outlier sample
43
Filtering
Filtering loses data, but signal from low-intensity probes is noisy and can give false positives
Detection p-values are calculated for each probe based on the overlap of its signal with the negative control probe signal distribution
Criteria
  Detected in all samples/at least one sample
  Detected in at least one group
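The filtering idea can be sketched as below. The detection p-value is computed here as the fraction of negative control probes with a signal at least as high as the probe, which is a simplified version of the "overlap" idea and not necessarily the manufacturer's exact formula. The 0.05 threshold, the reading of "detected in at least one group" as "detected in every sample of at least one group", and all values are assumptions of this sketch.

```python
def detection_p_value(signal, negative_controls):
    """Fraction of negative control signals at or above the probe signal."""
    exceed = sum(1 for c in negative_controls if c >= signal)
    return exceed / len(negative_controls)

def keep_probe(detected, groups, rule):
    """Apply one of the filtering criteria to a single probe.

    detected: one True/False flag per sample
    groups:   the group label of each sample, in the same order
    rule:     "all", "any" or "any_group"
    """
    if rule == "all":
        return all(detected)
    if rule == "any":
        return any(detected)
    if rule == "any_group":
        # Assumed meaning: detected in every sample of at least one group
        return any(
            all(flag for flag, g in zip(detected, groups) if g == group)
            for group in set(groups)
        )
    raise ValueError(f"unknown rule: {rule}")

# Hypothetical negative control signals and one probe across four samples
negative_controls = [90, 95, 100, 105, 110, 98, 102, 97, 103, 100]
probe_signals = [150, 160, 99, 101]
groups = ["control", "control", "disease", "disease"]

threshold = 0.05  # assumed cut-off for calling a probe "detected"
p_values = [detection_p_value(s, negative_controls) for s in probe_signals]
detected = [p < threshold for p in p_values]

print(p_values)                                  # [0.0, 0.0, 0.6, 0.4]
print(keep_probe(detected, groups, "all"))       # False
print(keep_probe(detected, groups, "any"))       # True
print(keep_probe(detected, groups, "any_group")) # True
```

The group-based criterion keeps genes that are switched on in one condition only, which the "all samples" criterion would discard.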
44
Detecting Differentially Expressed Genes
Linear Models for Microarray Analysis (limma)
Handles analysis of simple and complex experimental designs
For two-group comparisons, analogous to t-test, otherwise ANOVA
Uses information from all genes to estimate variance
Reduces chance of false positives from very low variance genes
More robust for small sample sizes
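"Uses information from all genes to estimate variance" refers to limma's moderated variance: each gene's own variance is shrunk towards a value typical of all genes. The sketch below shows only the shrinkage formula, a weighted average of the gene-wise variance and a prior variance. In limma the prior variance and its degrees of freedom are estimated from the data by empirical Bayes; here they are simply assumed (prior variance set to the mean of the gene variances, prior degrees of freedom fixed at 4), and the gene variances are made up.

```python
from statistics import mean

def moderated_variance(s2, d, s0_sq, d0):
    """Shrink a gene-wise variance s2 (d residual degrees of freedom)
    towards a prior variance s0_sq (d0 prior degrees of freedom)."""
    return (d0 * s0_sq + d * s2) / (d0 + d)

# Hypothetical gene-wise variances; geneA is very small by chance
gene_variances = {"geneA": 0.001, "geneB": 0.20, "geneC": 0.30, "geneD": 0.25}

d = 4                                    # e.g. 3 vs 3 samples gives 4 residual df
d0 = 4.0                                 # assumed prior df (limma estimates this)
s0_sq = mean(gene_variances.values())    # assumed stand-in for the prior variance

for gene, s2 in gene_variances.items():
    print(gene, s2, "->", round(moderated_variance(s2, d, s0_sq, d0), 4))
```

geneA's tiny variance is pulled up substantially, so a small difference in means no longer produces a huge t-statistic, while genes with typical variances change little.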
45
limma
Fits linear model for each gene
Test whether slope = 0 for each gene and assign p-values
Multiple testing correction - FDR
[Figure: log normalised intensity for Group 1 and Group 2]
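The two ideas on this slide can be sketched in plain Python. First, with groups coded 0 and 1, the fitted slope is simply the difference between the group means, so testing slope = 0 is testing for no difference. Second, the Benjamini-Hochberg procedure is one common way to control the FDR (assumed here; the slide does not name the method). The t-statistic uses the ordinary residual variance, not limma's moderated variance, and the expression values and p-values are made up.

```python
import math

def fit_two_group(group1, group2):
    """Least-squares fit of y = intercept + slope * x, with x = 0 for
    group 1 and x = 1 for group 2. Returns (slope, t_statistic)."""
    x = [0.0] * len(group1) + [1.0] * len(group2)
    y = list(group1) + list(group2)
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)   # standard error of the slope
    return slope, slope / se

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):          # from largest p-value to smallest
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical log2 intensities for one gene
slope, t_stat = fit_two_group([8.0, 8.2, 7.8], [10.0, 10.3, 9.7])
print(round(slope, 3), round(t_stat, 2))  # slope equals the difference in means

# Hypothetical raw p-values for four genes
print([round(p, 3) for p in benjamini_hochberg([0.001, 0.01, 0.03, 0.5])])
```

With tens of thousands of probes tested at once, raw p-values below 0.05 are expected by chance alone, which is why the adjusted values are the ones to report.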
46
Effect of other variables
WT and Mut groups
Three different litters
Top gene ~5x higher expression in WT compared to Mut
Similarly expressed across litters in both genotypes
47
Strong litter effect
Overlap between groups
Within litters, consistent pattern of higher expression in WT vs Mut
Within genotypes, B>C>A – expression depends on litter
Accounting for this variance increases power
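Why accounting for litter increases power can be shown with a small worked example. The sketch compares the genotype t-statistic from a model with genotype only against an additive model with genotype plus litter. It assumes a balanced design with exactly one animal per genotype and litter, and uses made-up values with a litter pattern of B > C > A; in practice this would be done in limma by adding litter to the design matrix.

```python
import math

def mean(values):
    values = list(values)
    return sum(values) / len(values)

def genotype_t(data, block_on_litter):
    """t-statistic for WT vs Mut.

    data maps (genotype, litter) -> log2 expression, one value per cell.
    With block_on_litter=True the litter means are included in the fit,
    which is valid as written only for this balanced layout.
    """
    genotypes = sorted({g for g, _ in data})
    litters = sorted({l for _, l in data})
    grand = mean(data.values())
    g_mean = {g: mean(v for (gg, _), v in data.items() if gg == g) for g in genotypes}
    l_mean = {l: mean(v for (_, ll), v in data.items() if ll == l) for l in litters}

    rss = 0.0
    for (g, l), v in data.items():
        fitted = g_mean[g] + ((l_mean[l] - grand) if block_on_litter else 0.0)
        rss += (v - fitted) ** 2

    n = len(data)
    n_params = len(genotypes) + ((len(litters) - 1) if block_on_litter else 0)
    mse = rss / (n - n_params)            # residual variance
    per_group = n / len(genotypes)
    diff = g_mean["WT"] - g_mean["Mut"]
    return diff / math.sqrt(mse * 2 / per_group)

# Hypothetical log2 expression: WT about 1 unit above Mut in every litter
data = {
    ("WT", "A"): 10.0, ("WT", "B"): 12.1, ("WT", "C"): 11.0,
    ("Mut", "A"): 9.0, ("Mut", "B"): 11.0, ("Mut", "C"): 10.1,
}

t_naive = genotype_t(data, block_on_litter=False)
t_blocked = genotype_t(data, block_on_litter=True)
print(round(t_naive, 2), round(t_blocked, 2))
```

The difference between genotypes is the same in both models; only the residual variance changes. Without litter in the model, litter-to-litter variation is counted as noise and hides a consistent within-litter difference.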
48
Limma Output
49
Limma Output
Small sample size and subtle effects can mean no probes would be considered statistically significant
Probes are ranked in order of evidence for differential expression, so the list can still be explored
Biological interpretation can be the most difficult step – tools are available