Microarray Data Analysis
Roy Williams PhD; Burnham Institute for Medical Research
Aims
– Load normalised Illumina data into GeneSpring (undifferentiated versus differentiated stem cells)
– Cancel the GeneSpring normalisation
– Define biological replicates
– Discover significantly differentially regulated genes (undifferentiated versus differentiated stem cells)
– Compare this list to Gene Ontologies
– Attempt to draw a conclusion
Essential Tools
– GeneSpring: download the demo version
– Quantile-normalised stem cell dataset. Select data from either:
   – StemCellCommunity.org database
   – NIH Gene Expression Omnibus database
Stem Cell Microarray Database
Automatically exports a normalised data table!
The GCT table format is widely supported!
After QC for low-confidence genes (P<0.99)
Note: ~50 replicate beads per array
[Figure: boxplot representation of data spread. x-axis: chip number; y-axis: signal intensity. Annotations: median, 25% quartile, 75% quartile, outliers, and one chip marked BAD CHIP.]
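The quantities annotated on the boxplot can be computed per chip. This is a minimal sketch, not the method used to draw the slide: it assumes the common 1.5 x IQR rule for outliers (the slide does not say which rule was used), and `boxplot_summary` and the toy intensities are hypothetical.

```python
import statistics

def boxplot_summary(values, whisker=1.5):
    """Return quartiles, median and outliers for one chip's signal intensities."""
    # Inclusive quartiles treat the data as the whole population of probes on the chip.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - whisker * iqr, q3 + whisker * iqr
    # Assumption: anything beyond the whiskers is an outlier (1.5 x IQR rule).
    outliers = [v for v in values if v < low or v > high]
    return {"q1": q1, "median": median, "q3": q3, "outliers": outliers}

# Hypothetical intensities for a single chip.
chip_signal = [100.0, 120.0, 130.0, 140.0, 150.0, 160.0, 170.0, 180.0, 2000.0]
summary = boxplot_summary(chip_signal)
print(summary)
```

Comparing these summaries across chips is one way to spot a chip whose median or spread sits far from the rest.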
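The effect of quantile normalisation on the 36 filtered data sets
IMPORTANT: use a non-linear normalisation
> library(preprocessCore)  # current Bioconductor home of normalize.quantiles(); older affy releases exported it directly
> Qdata <- normalize.quantiles(as.matrix(Rawdata))  # numeric matrix: rows = probes, columns = arrays
After normalisation all arrays share the same intensity range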
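To show why every array ends up with the same range, here is a minimal standard-library sketch of quantile normalisation. It is an illustration of the idea, not the Bioconductor implementation: ties are broken by position rather than averaged, and `quantile_normalise` and the toy values are hypothetical.

```python
def quantile_normalise(columns):
    """Quantile-normalise a list of arrays (one list of intensities per array)."""
    n = len(columns[0])
    if any(len(col) != n for col in columns):
        raise ValueError("all arrays must contain the same number of probes")
    # Reference distribution: the mean of the k-th smallest value across arrays.
    sorted_cols = [sorted(col) for col in columns]
    reference = [sum(col[k] for col in sorted_cols) / len(columns) for k in range(n)]
    normalised = []
    for col in columns:
        # Rank each probe within its own array, then substitute the reference value.
        order = sorted(range(n), key=lambda i: col[i])
        out = [0.0] * n
        for rank, index in enumerate(order):
            out[index] = reference[rank]
        normalised.append(out)
    return normalised

# Hypothetical intensities: three arrays, four probes each.
arrays = [[5.0, 2.0, 3.0, 4.0], [4.0, 1.0, 4.5, 2.0], [3.0, 4.0, 6.0, 8.0]]
normalised = quantile_normalise(arrays)
for column in normalised:
    print(column)
```

Each output array contains exactly the same set of values, only in a different probe order, which is why the boxplots line up after normalisation.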
Normalised Tutorial Dataset
This tool generated the dataset: outputTueJul gct (.gct is portable!)
Samples:
– AES_derived Neural stem cells
– AES_derived Neural stem cells
– AES_derived Neural stem cells
– EES cells_ undifferentiated
– FES cells_ undifferentiated
– DES cells_ undifferentiated
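Because .gct is a plain tab-delimited table, it is easy to read outside GeneSpring. The sketch below assumes the GCT 1.2 layout (a `#1.2` version line, a line with the row and column counts, a header starting with the name and description columns, then one row per probe). `read_gct` and the toy table are hypothetical and are not taken from the tutorial file.

```python
import csv
import io

def read_gct(handle):
    """Parse a GCT 1.2 table into (sample names, {probe: [values]})."""
    reader = csv.reader(handle, delimiter="\t")
    version = next(reader)[0]
    if not version.startswith("#1.2"):
        raise ValueError("expected a GCT 1.2 file, got %r" % version)
    n_rows, n_cols = (int(x) for x in next(reader)[:2])
    header = next(reader)
    samples = header[2:]          # first two columns are name and description
    if len(samples) != n_cols:
        raise ValueError("sample count does not match the header line")
    data = {}
    for row in reader:
        if not row:
            continue              # tolerate blank trailing lines
        data[row[0]] = [float(x) for x in row[2:2 + n_cols]]
    if len(data) != n_rows:
        raise ValueError("row count does not match the header line")
    return samples, data

# Hypothetical two-probe, three-sample table.
toy_gct = (
    "#1.2\n"
    "2\t3\n"
    "NAME\tDescription\tsample_1\tsample_2\tsample_3\n"
    "probe_a\tgene A\t10.5\t11.0\t9.5\n"
    "probe_b\tgene B\t200.0\t210.0\t190.0\n"
)
samples, data = read_gct(io.StringIO(toy_gct))
print(samples, data)
```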
Genome Import: File -> import genome
Load the Illumina chip information into GeneSpring
Drag and drop the data file to load it
Define columns using drop down boxes
Define sample attributes (or not)
Give new experiment a name and save
Define the new experiment's normalisations and parameters
The newly loaded data set appears here
Delete the default normalisations
ALL GONE!
IMPORTANT! DEFINE REPLICATES (PARAMETERS)
Define the custom parameter “group”
Groups define the replicates (use exactly the same text!)
Change the interpretation to look at the “group” replicates
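The reason the group text must match exactly is that replicates are pooled by label. A minimal sketch of that idea follows, assuming one label per sample column; `average_by_group`, the labels and the values are hypothetical, and this is not GeneSpring's internal code.

```python
from collections import defaultdict
from statistics import mean

def average_by_group(sample_groups, expression):
    """Average replicate columns that share exactly the same group label."""
    columns = defaultdict(list)
    for index, label in enumerate(sample_groups):
        columns[label].append(index)   # "diff" and "Diff" would be two groups
    return {
        gene: {label: mean(values[i] for i in idx) for label, idx in columns.items()}
        for gene, values in expression.items()
    }

# Hypothetical labels for six samples and one toy gene.
groups = ["diff", "diff", "diff", "undiff", "undiff", "undiff"]
expression = {"gene_x": [8.0, 9.0, 10.0, 2.0, 3.0, 4.0]}
averages = average_by_group(groups, expression)
print(averages)

# A single typo silently creates a third group.
typo_groups = ["diff", "diff", "dif", "undiff", "undiff", "undiff"]
print(sorted(average_by_group(typo_groups, expression)["gene_x"]))
```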
Filter on expression level: remove the genes that are not expressed (i.e. absent)
Filtering leaves 25,923 of 49,009 genes: save list as GenesPresent
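An expression-level filter of this kind can be sketched as follows. The rule is an assumption, since the slides do not state the cut-off: a gene counts as present if its group average exceeds a threshold in at least one group. `filter_present`, the threshold of 50.0 and the toy averages are all hypothetical.

```python
def filter_present(group_means, threshold):
    """Keep genes whose average signal exceeds the threshold in at least one group."""
    return [
        gene
        for gene, means in group_means.items()
        if max(means.values()) > threshold
    ]

# Hypothetical group averages for three genes.
toy_means = {
    "gene_a": {"diff": 500.0, "undiff": 20.0},
    "gene_b": {"diff": 30.0, "undiff": 25.0},   # low everywhere: treated as absent
    "gene_c": {"diff": 10.0, "undiff": 300.0},
}
genes_present = filter_present(toy_means, threshold=50.0)
print(genes_present)
```

Keeping genes that are expressed in only one group matters here, because those are exactly the genes that differ between undifferentiated and differentiated cells.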
Reset the colour bar range: right-click on the colour bar
With the new default interpretation, the data fall into 2 groups
To find differentially expressed genes, filter on the Volcano Plot
These filters give a list of ~1800 genes: save the list
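A volcano plot combines an effect-size cut-off (fold change) with a significance cut-off. The sketch below illustrates the same two-threshold idea using only the standard library, so it thresholds the absolute Welch t statistic instead of a p-value; that substitution, the cut-off values, `volcano_filter` and the toy data are all assumptions, not the settings used in the tutorial.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic for two groups of replicate measurements."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    diff = mean(a) - mean(b)
    if se == 0:
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / se

def volcano_filter(expression, idx_a, idx_b, min_abs_log2_fc=1.0, min_abs_t=4.0):
    """Return {gene: (log2 fold change, t)} for genes passing both cut-offs."""
    selected = {}
    for gene, values in expression.items():
        a = [values[i] for i in idx_a]
        b = [values[i] for i in idx_b]
        log2_fc = math.log2(mean(a) / mean(b))   # assumes positive raw-scale signals
        t = welch_t(a, b)
        if abs(log2_fc) >= min_abs_log2_fc and abs(t) >= min_abs_t:
            selected[gene] = (log2_fc, t)
    return selected

# Hypothetical data: columns 0-2 are one group, columns 3-5 the other.
toy_expression = {
    "gene_up": [800.0, 820.0, 780.0, 100.0, 110.0, 90.0],    # large and consistent
    "gene_flat": [100.0, 105.0, 95.0, 100.0, 98.0, 102.0],   # no change
    "gene_noisy": [420.0, 50.0, 150.0, 100.0, 110.0, 90.0],  # large but inconsistent
}
selected = volcano_filter(toy_expression, [0, 1, 2], [3, 4, 5])
print(selected)
```

The noisy gene shows why fold change alone is not enough: its average doubles, but the replicates disagree too much for the change to be trusted.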
Export the list with the averaged data: copy annotated GeneList
Select annotations to export
Paste average data into an Excel worksheet
Export all the data to see that the replicates are highly reproducible
Check the list against Gene Ontologies: development is the major group
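Whether a Gene Ontology category is over-represented in a gene list is commonly judged with a hypergeometric test. The sketch below shows that calculation under the assumption that this is the test being applied (the slides do not name it). `enrichment_p_value` and the toy counts are hypothetical.

```python
from math import comb

def enrichment_p_value(population, annotated, selected, overlap):
    """P(at least `overlap` annotated genes) when drawing `selected` genes at random.

    population: genes that passed the expression filter
    annotated:  genes in that population carrying the GO term
    selected:   size of the differentially expressed list
    overlap:    genes in the list carrying the GO term
    """
    upper = min(annotated, selected)
    total = comb(population, selected)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total

# Hypothetical counts: 20 genes, 5 annotated, 4 selected, 3 of them annotated.
p_value = enrichment_p_value(population=20, annotated=5, selected=4, overlap=3)
print(p_value)
```

A small p-value means the category appears in the list more often than chance would explain. With many categories tested at once, a multiple-testing correction is normally applied on top.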
Look for pathways that are significantly differentially expressed
– Load the list into Nextbio and Ingenuity
– GSEA (Gene Set Enrichment Analysis) is a good free alternative for pathways
– GenePattern is a good free alternative to GeneSpring