LIMMA: Linear Models for Microarray Data
Difficulties with microarray data
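- The variability of expression values differs between genes
- Expression values are not identically distributed across genes, and genes are not independent of one another
- Tens of thousands of genes are tested at once, which creates a multiple testing problem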
Correct for multiple comparisons
- Multiple testing corrections: family-wise error rate (FWER), false discovery rate (FDR), etc.
- The parallel nature of the inference, with the same model fitted to every gene, also offers a compensating possibility: borrowing information from the ensemble of genes to assist inference about each individual gene
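As a minimal sketch with hypothetical p-values, the two kinds of correction named above can be computed as follows: Bonferroni adjustment controls the family-wise error rate, and the Benjamini-Hochberg step-up procedure controls the false discovery rate. The function names are illustrative, not part of limma.

```python
def bonferroni(pvalues):
    """Family-wise error rate control: multiply each p-value by the number of tests."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg(pvalues):
    """False discovery rate control with the Benjamini-Hochberg step-up procedure."""
    m = len(pvalues)
    # Visit p-values from largest to smallest so a running minimum keeps the result monotone.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, i in enumerate(order):
        rank = m - position  # 1-based rank of pvalues[i] in ascending order
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for five genes.
raw = [0.001, 0.008, 0.039, 0.041, 0.60]
print(bonferroni(raw))          # FWER-adjusted
print(benjamini_hochberg(raw))  # FDR-adjusted, same convention as R's p.adjust(p, "BH")
```

Bonferroni guards against any single false positive and becomes very strict with tens of thousands of genes; Benjamini-Hochberg instead controls the expected proportion of false positives among the genes called significant.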
Empirical Bayes
- In frequentist methods, a hypothesis is typically rejected or not rejected without directly assigning it a probability
- Bayesian methods specify a prior probability, which is then updated in the light of new data
- In Bayesian techniques, the prior distribution is specified independently of the data and fixed before any data are observed
Empirical Bayes
- Superficially similar to Bayesian methods, in that a prior distribution is assigned
- However, the prior distribution is estimated from the data
- For this reason, empirical Bayes is often regarded as a frequentist technique
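A minimal sketch of the empirical Bayes idea, under a simplified normal-normal model assumed here for illustration (it is not limma's model): each gene's observed value x_i is normal around its own mean theta_i with known variance sigma2, the theta_i come from a normal prior N(mu, tau2), and mu and tau2 are estimated from all genes rather than fixed in advance. The function name is hypothetical.

```python
import statistics


def empirical_bayes_means(x, sigma2):
    """Shrink each gene's observed value towards a prior estimated from all genes.

    Assumed model: x_i ~ N(theta_i, sigma2) with sigma2 known, and theta_i ~ N(mu, tau2).
    Marginally x_i ~ N(mu, sigma2 + tau2), so mu and tau2 can be estimated from the
    ensemble of genes by the method of moments.
    """
    mu_hat = statistics.fmean(x)
    # Spread of the observed values beyond what sampling noise explains, floored at zero.
    tau2_hat = max(0.0, statistics.pvariance(x, mu_hat) - sigma2)
    weight = tau2_hat / (tau2_hat + sigma2)  # weight given to the gene's own value
    posterior_means = [mu_hat + weight * (xi - mu_hat) for xi in x]
    return mu_hat, tau2_hat, posterior_means


# Hypothetical values for five genes with known sampling variance 1.0.
print(empirical_bayes_means([1.0, 2.0, 3.0, 4.0, 5.0], 1.0))
# (3.0, 1.0, [2.0, 2.5, 3.0, 3.5, 4.0])
```

Here the estimated prior variance equals the sampling variance, so each gene's estimate is pulled halfway towards the overall mean: the prior is learned from the data, which is what separates empirical Bayes from a fully Bayesian analysis.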
LIMMA
- Empirical Bayes techniques had previously been applied to microarray data, but those analyses were specific to particular experiments and very difficult to implement
- LIMMA uses a simple model that yields a simple expression for the posterior odds of differential expression
- It allows linear modelling to be applied to microarray data
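limma's moderated t-statistic replaces each gene's residual variance s2 with a posterior variance that is shrunk towards a common prior value s0_2; this is the practical meaning of borrowing information across genes. The minimal sketch below handles a single gene. The prior parameters s0_2 and d0 are simply passed in here, whereas limma estimates them from all genes, and the function names are hypothetical.

```python
import math


def moderated_t(beta_hat, s2, d, v, s0_2, d0):
    """Moderated t-statistic for one gene and one coefficient or contrast.

    beta_hat : estimated log2 fold change
    s2, d    : the gene's residual variance and residual degrees of freedom
    v        : unscaled variance of the coefficient, determined by the design
    s0_2, d0 : prior variance and prior degrees of freedom, taken as given here
    Returns the statistic and its degrees of freedom, d0 + d.
    """
    # Posterior variance: a weighted average of the prior and the gene's own variance.
    s2_tilde = (d0 * s0_2 + d * s2) / (d0 + d)
    return beta_hat / math.sqrt(s2_tilde * v), d0 + d


def ordinary_t(beta_hat, s2, v):
    """Ordinary t-statistic using only the gene's own variance."""
    return beta_hat / math.sqrt(s2 * v)


# Hypothetical gene whose tiny sample variance would inflate the ordinary t.
print(ordinary_t(1.0, 0.01, 1.0))                 # about 10.0
print(moderated_t(1.0, 0.01, 4, 1.0, 0.09, 4.0))  # about 4.47, with 8.0 degrees of freedom
```

With eight arrays and four model parameters each gene has only d = 4 residual degrees of freedom, so a gene can easily show a spuriously small variance; shrinking towards the prior tempers such genes while adding d0 degrees of freedom to the test.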
Estrogen Data
- 2x2 factorial experiment on MCF7 breast cancer cells using Affymetrix HGU95av2 arrays
- Factors: estrogen (present/absent) and length of exposure (10 h/48 h)
- The aim of the study is to identify genes that respond to estrogen treatment
Read in the Data
- Load the estrogen data
- Normalise the data
- Define the targets (factors) for the linear model
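The slides do not say which normalisation method was used. For Affymetrix arrays a common choice is RMA, one of whose steps is quantile normalisation; the minimal sketch below shows that step alone, assuming each array is a list of probe intensities of equal length in the same probe order. The function name is hypothetical.

```python
def quantile_normalize(arrays):
    """Give every array the same intensity distribution (quantile normalisation).

    arrays: list of arrays, each a list of intensities for the same probes in the
    same order. Each value is replaced by the mean, across arrays, of the values
    with the same rank. Ties are broken by probe position for simplicity.
    """
    n = len(arrays[0])
    if any(len(a) != n for a in arrays):
        raise ValueError("all arrays must contain the same number of probes")
    sorted_arrays = [sorted(a) for a in arrays]
    # Reference distribution: mean of the k-th smallest value across arrays.
    reference = [sum(s[k] for s in sorted_arrays) / len(arrays) for k in range(n)]
    normalised = []
    for a in arrays:
        order = sorted(range(n), key=lambda i: a[i])  # probe indices by rank
        out = [0.0] * n
        for rank, i in enumerate(order):
            out[i] = reference[rank]
        normalised.append(out)
    return normalised


# Hypothetical intensities: two arrays, three probes.
print(quantile_normalize([[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]))
# [[5.5, 1.5, 3.5], [3.5, 1.5, 5.5]]
```

After normalisation both arrays contain exactly the same set of values, so array-wide differences in brightness no longer masquerade as differential expression.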
Design Matrix
Array  File          Estrogen  Time (h)
1      low10-1.cel   absent    10
2      low10-2.cel   absent    10
3      high10-1.cel  present   10
4      high10-2.cel  present   10
5      low48-1.cel   absent    48
6      low48-2.cel   absent    48
7      high48-1.cel  present   48
8      high48-2.cel  present   48
- Eight arrays: four pairs of replicates
- Four parameters in the linear model
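The slide gives the number of parameters but not the matrix itself. One natural parametrisation, assumed here, is one coefficient per treatment group (cell-means coding, the kind of matrix `model.matrix(~0 + group)` produces in R for a four-level factor). The sketch builds that matrix from the targets table and fits a single gene by least squares; the group names and expression values are hypothetical.

```python
# Targets from the table above: (file, estrogen, exposure time in hours).
TARGETS = [
    ("low10-1.cel", "absent", 10),
    ("low10-2.cel", "absent", 10),
    ("high10-1.cel", "present", 10),
    ("high10-2.cel", "present", 10),
    ("low48-1.cel", "absent", 48),
    ("low48-2.cel", "absent", 48),
    ("high48-1.cel", "present", 48),
    ("high48-2.cel", "present", 48),
]
# Hypothetical names for the four treatment groups (one model parameter each).
GROUPS = ["absent10", "present10", "absent48", "present48"]


def design_matrix(targets, groups):
    """One row per array and one 0/1 indicator column per group (cell-means coding)."""
    return [[1 if f"{estrogen}{hours}" == g else 0 for g in groups]
            for _, estrogen, hours in targets]


def fit_gene(y, design):
    """Least-squares fit for one gene's log2 expression values y.

    With cell-means coding X'X is diagonal (the replicate counts), so each
    coefficient is the mean of its group. Returns the coefficients and the
    residual variance on n - p degrees of freedom.
    """
    n, p = len(design), len(design[0])
    coefs = []
    for j in range(p):
        members = [y[i] for i in range(n) if design[i][j] == 1]
        coefs.append(sum(members) / len(members))
    fitted = [sum(x * b for x, b in zip(row, coefs)) for row in design]
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return coefs, rss / (n - p)


X = design_matrix(TARGETS, GROUPS)
# Hypothetical log2 expression values for a single probe on the eight arrays.
coefs, s2 = fit_gene([1.0, 3.0, 5.0, 7.0, 2.0, 2.0, 6.0, 8.0], X)
print(coefs, s2)  # [2.0, 6.0, 2.0, 7.0] 1.5
```

Eight arrays and four parameters leave 4 residual degrees of freedom per gene, which is why the residual variance is divided by n - p = 4.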
Contrast Matrix
Three contrasts are of interest:
- Estrogen effect at 10 hours
- Time effect without estrogen
- Estrogen effect at 48 hours
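Assuming one coefficient per treatment group (an assumption, since the slide does not show the matrix), each contrast is a vector of weights on the four group coefficients. The sketch applies the three contrasts to hypothetical coefficients and computes each contrast's unscaled variance c'(X'X)^-1 c, which scales the residual variance into a standard error. The group and contrast names are hypothetical.

```python
# Hypothetical coefficient order: one parameter per treatment group.
GROUPS = ["absent10", "present10", "absent48", "present48"]

# The three contrasts, written as weights on the group coefficients.
CONTRASTS = {
    "E10": {"present10": 1, "absent10": -1},  # estrogen effect at 10 hours
    "T48": {"absent48": 1, "absent10": -1},   # time effect without estrogen
    "E48": {"present48": 1, "absent48": -1},  # estrogen effect at 48 hours
}


def contrast_matrix(contrasts, groups):
    """Rows are contrasts, columns are group coefficients."""
    return {name: [w.get(g, 0) for g in groups] for name, w in contrasts.items()}


def apply_contrasts(coefs, replicates, contrasts, groups):
    """Estimate each contrast and its unscaled variance c'(X'X)^-1 c.

    For cell-means coding (X'X)^-1 is diagonal with entries 1 / replicates.
    """
    out = {}
    for name, c in contrast_matrix(contrasts, groups).items():
        estimate = sum(ci * bi for ci, bi in zip(c, coefs))
        unscaled_var = sum(ci * ci / r for ci, r in zip(c, replicates))
        out[name] = (estimate, unscaled_var)
    return out


print(apply_contrasts([2.0, 6.0, 2.0, 7.0], [2, 2, 2, 2], CONTRASTS, GROUPS))
# {'E10': (4.0, 1.0), 'T48': (0.0, 1.0), 'E48': (5.0, 1.0)}
```

With two replicates per group, every difference of two group means has unscaled variance 1/2 + 1/2 = 1, so all three contrasts are estimated with the same precision.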
Differential Expression
- Extract the linear model fit for the contrasts
- Obtain the list of differentially expressed genes for each contrast
- Look for overlap among the differentially expressed genes
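A minimal sketch of the last two steps, assuming each contrast's results are available as (logFC, adjusted p-value) pairs per gene: genes are called up or down at an adjusted p-value cut-off of 0.05, and the overlap between contrasts is counted by Venn region. This mirrors, in simplified form, what limma's decideTests() and vennDiagram() summarise; the data, cut-off and function names here are hypothetical.

```python
from collections import Counter


def classify(logfc, adj_p, alpha=0.05):
    """+1 for up, -1 for down, 0 for not significant at the adjusted p-value cut-off."""
    if adj_p >= alpha:
        return 0
    return 1 if logfc > 0 else -1


def de_genes(results, alpha=0.05):
    """results: {contrast: {gene: (logFC, adjusted p)}} -> {contrast: set of DE genes}."""
    return {
        contrast: {g for g, (lfc, p) in genes.items() if classify(lfc, p, alpha) != 0}
        for contrast, genes in results.items()
    }


def venn_counts(sets):
    """Number of genes in each Venn region, keyed by a tuple of membership flags."""
    names = list(sets)
    universe = set().union(*sets.values())
    return names, Counter(tuple(g in sets[n] for n in names) for g in universe)


# Hypothetical results for three genes in two contrasts.
results = {
    "E10": {"g1": (1.2, 0.01), "g2": (-0.8, 0.03), "g3": (0.5, 0.40)},
    "E48": {"g1": (1.5, 0.002), "g2": (0.1, 0.90), "g3": (2.0, 0.001)},
}
names, counts = venn_counts(de_genes(results))
print(names, dict(counts))
```

The key (True, True) counts genes called in both contrasts, here the genes responding to estrogen at both time points.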
Linear Model Fit
- logFC: estimate of the log2 fold change corresponding to the effect or contrast
- AveExpr: average log2 expression of the probe over all arrays/channels
- t: moderated t-statistic
- P.Value: raw p-value
- adj.P.Val: adjusted p-value
- B: log-odds that the gene is differentially expressed
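Two of these columns are on transformed scales. Assuming B is the natural logarithm of the posterior odds, as in limma, it converts to a probability through the logistic function, and logFC converts to an ordinary fold change by raising 2 to that power. The probability depends on the prior proportion of differentially expressed genes assumed by eBayes() (its proportion argument, 0.01 by default), so it is best read as a ranking aid. The function names are hypothetical.

```python
import math


def prob_de(b):
    """Posterior probability of differential expression from B (natural-log odds)."""
    return 1.0 / (1.0 + math.exp(-b))


def fold_change(logfc):
    """Convert a log2 fold change to a fold change on the original scale."""
    return 2.0 ** logfc


print(prob_de(0.0))       # 0.5: even odds
print(fold_change(1.0))   # 2.0: twofold up-regulation
print(fold_change(-1.0))  # 0.5: twofold down-regulation
```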
Annotating Data
- Probe arrays can be annotated with external data
- Multiple sources of gene annotation are available
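Annotation amounts to joining probe identifiers to gene-level information. In R this information usually comes from Bioconductor annotation packages (for HGU95av2 arrays, for example, hgu95av2.db). The Python sketch below shows only the join itself, with hypothetical probe IDs and symbols; real probe sets can map to several genes or to none, and this one-symbol-per-probe sketch handles only the unmapped case, by returning None.

```python
def annotate(rows, probe_to_symbol):
    """Attach a gene symbol to each result row; probes without a mapping get None."""
    return [{**row, "symbol": probe_to_symbol.get(row["probe"])} for row in rows]


# Hypothetical probe IDs, symbols and results.
rows = [{"probe": "probe_1", "logFC": 2.1}, {"probe": "probe_2", "logFC": -0.7}]
mapping = {"probe_1": "GENE_A"}
for row in annotate(rows, mapping):
    print(row)
```

The original rows are left unchanged; each annotated row is a new dictionary.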
Gene Set Enrichment
- Biochemical pathways are carried out by sets of genes acting together
- Gene sets are defined by prior biological knowledge relating to co-expression, function, location or known biochemical pathways
- If a pathway is related to a biological trait, its co-functioning genes should show a higher degree of enrichment than the rest of the transcriptome
- Gene Set Enrichment (GSE) is a computational technique that determines whether an a priori defined set of genes shows a statistically significant overlap with the differentially expressed genes
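One common way to test such overlap, assumed here because the slides do not name a test, is over-representation analysis with a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test). limma's own gene set tests, such as geneSetTest(), roast() and camera(), work differently. The function name and numbers below are hypothetical.

```python
from math import comb


def overlap_p_value(k, n_genes, set_size, n_de):
    """One-sided hypergeometric p-value P(X >= k) for a gene set / DE list overlap.

    n_genes  : genes on the array (the universe)
    set_size : genes in the a priori defined set
    n_de     : differentially expressed genes
    k        : differentially expressed genes that are also in the set
    """
    total = comb(n_genes, n_de)
    # Sum the probabilities of observing k or more set members among the DE genes.
    tail = sum(
        comb(set_size, x) * comb(n_genes - set_size, n_de - x)
        for x in range(k, min(set_size, n_de) + 1)
    )
    return tail / total


# Hypothetical numbers: 10 genes, 3 in the set, 3 called DE, all 3 DE genes in the set.
print(overlap_p_value(3, 10, 3, 3))  # 1/120, about 0.0083
```

The choice of universe matters: it should be the genes actually measured on the array, not the whole genome.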
Estrogen receptor (ER) gene set
- When estrogen is present, the estrogen receptor binds it and becomes activated
- The activated receptor gains the ability to regulate gene expression, producing differential expression between cells with and without estrogen
- This is expected to lead to up-regulation of the genes in the ER gene set
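The expectation of up-regulation can be checked with a directional test. A minimal sketch, assumed here rather than taken from the slides, is a competitive gene-sampling permutation test: the mean moderated t of the ER set for an estrogen contrast is compared with the means of random gene sets of the same size. Random gene sampling ignores correlation between genes, which is one reason limma provides camera(); the statistics, gene names and function name below are hypothetical.

```python
import random
import statistics


def gene_set_up_test(t_stats, gene_set, n_perm=9999, seed=1):
    """One-sided competitive test: are the set's t-statistics larger than random sets'?

    t_stats  : {gene: moderated t for the estrogen contrast}
    gene_set : genes in the set (genes absent from t_stats are ignored)
    Returns the observed mean t of the set and a permutation p-value.
    """
    members = [g for g in gene_set if g in t_stats]
    if not members:
        raise ValueError("no genes from the set were measured")
    observed = statistics.fmean([t_stats[g] for g in members])
    values = list(t_stats.values())
    rng = random.Random(seed)
    # Count random gene sets of the same size whose mean t is at least as large.
    hits = sum(
        statistics.fmean(rng.sample(values, len(members))) >= observed
        for _ in range(n_perm)
    )
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical statistics: 100 genes, with the five largest t-values in the set.
t_stats = {f"g{i}": (i - 50) / 10 for i in range(100)}
er_set = {"g95", "g96", "g97", "g98", "g99"}
print(gene_set_up_test(t_stats, er_set, n_perm=999))
```

Adding one to both the hit count and the number of permutations keeps the p-value strictly positive, as is usual for permutation tests.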