Expression profiling & functional genomics Exercises
Differential expression
Use the normalized data to find statistically significant differentially expressed genes
– Software: CyberT
– File: oefnbaldi.xls
– The file contains the 4 normalised ratios (see SNOMAD)
– T test on the ratios
– Per gene, per condition, 4 measurements are available
– Paired samples
Array 1: Condition 1 Dye1 Replica L, Condition 1 Dye1 Replica R, Condition 2 Dye2 Replica L, Condition 2 Dye2 Replica R
Array 2: Condition 2 Dye1 Replica L, Condition 2 Dye1 Replica R, Condition 1 Dye2 Replica L, Condition 1 Dye2 Replica R
Results CyberT
– Mn: mean ratio
– # obs: number of ratios available to calculate the statistics
– SD: standard deviation on the ratio estimates
– T, p: calculated t and p values that indicate the significance of the measurement
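As an illustration of how the Mn, # obs, SD and T columns relate to each other, here is a minimal sketch of an ordinary one-sample t statistic on the ratios of a single gene. It assumes the ratios have been log transformed, so that "no change" corresponds to 0. The function name and the example values are hypothetical, and CyberT itself may estimate the variance differently (for instance with a regularized estimate), which this sketch does not attempt to reproduce. The p value is left out because it needs the t distribution, which the Python standard library does not provide.

```python
import math
import statistics


def ratio_t_statistic(log_ratios):
    """Summarise the log ratios of one gene and test their mean against 0."""
    # Missing spots are assumed to be encoded as None and are skipped.
    values = [r for r in log_ratios if r is not None]
    n = len(values)
    if n < 2:
        return None  # a standard deviation needs at least two observations
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1)
    if sd == 0:
        return None  # t is undefined when all ratios are identical
    t = mean / (sd / math.sqrt(n))
    return {"Mn": mean, "n_obs": n, "SD": sd, "T": t}


# Hypothetical log ratios: 4 measurements for one gene.
example = [1.0, 1.2, 0.8, 1.0]
result = ratio_t_statistic(example)
print(result)
```

With 4 measurements per gene the statistic has only 3 degrees of freedom, so a large T can still correspond to a modest p value.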
SAM
MARAN (ANOVA based)
– Filtering
– Linearisation
– Bootstrapping
– Log transformation
Two typical cDNA designs
Reference design (Spellman data set)
– Reference: unsynchronized cells
– Condition: synchronized cells during cell cycle at distinct time intervals (18)
Experimental design, one row per array (starting with Array 1):
  Condition 1 Dye1 Replica L    Condition 19 Dye2 Replica L
  Condition 2 Dye1 Replica L    Condition 19 Dye2 Replica L
  Condition 3 Dye1 Replica L    Condition 19 Dye2 Replica L
  Condition 4 Dye1 Replica L    Condition 19 Dye2 Replica L
  …
MARAN
Data were precalculated.
Login: username userGGS, password Njoedel
Uploaded data:
– Spellman: test cell cycle (reference design)
– Mouse: Latin square design (log transformed)
Spellman non log transformed
MARAN
Complex cDNA design: Latin square (mouse data set)
– Reference: normal mouse
– Condition: pygmy mouse
– Two experiments: T=1 and T=2 reflect two sample time points
– 2 batches: not all genes of the genome are on one array

Array  Time  Batch  Test  Ref
A1     T1    B1     R     G
A2     T1    B1     G     R
A5     T2    B1     R     G
A6     T2    B1     G     R
A3     T1    B2     R     G
A4     T1    B2     R     G
A7     T2    B2     R     G
A8     T2    B2     G     R

Experimental design: 8 arrays, 2 batches, 2 dyes, 2 conditions

Clustering of expression profiling experiments
Dataset: yeast cell cycle data set
– Data set is preprocessed (slide by slide)
– Expression level of each gene is expressed as the log of the ratio
– 15 experiments, 7000 genes
– Filtering based on variance => retain 3000 genes
– Rescaling (mean variance)
– Cluster the experiment using
  K-means (EPCLUST)
  Hierarchical clustering (EPCLUST)
  AQBC (INCLUsive)
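The filtering and rescaling steps listed above can be sketched as follows. This is only an illustration under stated assumptions: "filtering based on variance" is taken to mean keeping the genes with the highest variance across experiments, and "rescaling (mean variance)" is taken to mean shifting each profile to mean 0 and scaling it to standard deviation 1. The function names and the three toy profiles are hypothetical; the real data set would go from 7000 to 3000 genes.

```python
import statistics


def filter_by_variance(profiles, keep):
    """Keep the `keep` genes whose profiles vary most across experiments."""
    ranked = sorted(
        profiles.items(),
        key=lambda item: statistics.pvariance(item[1]),
        reverse=True,
    )
    return dict(ranked[:keep])


def rescale(profile):
    """Shift a profile to mean 0 and scale it to standard deviation 1."""
    mean = statistics.mean(profile)
    sd = statistics.pstdev(profile)
    if sd == 0:
        # A flat profile carries no shape information; return zeros.
        return [0.0 for _ in profile]
    return [(x - mean) / sd for x in profile]


# Hypothetical log ratios for three genes over four experiments.
profiles = {
    "geneA": [0.1, 0.1, 0.1, 0.1],    # flat, should be filtered out
    "geneB": [-1.0, 0.0, 1.0, 2.0],
    "geneC": [0.5, -0.5, 0.5, -0.5],
}
kept = filter_by_variance(profiles, keep=2)
scaled = {gene: rescale(values) for gene, values in kept.items()}
print(sorted(scaled))
```

After rescaling, genes with the same profile shape but different amplitudes become comparable, which matters when a distance such as the Euclidean distance is used for clustering.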
Exercises Clustering INCLUsive
Exercises Clustering INCLUsive: average profile
Exercises EPCLUST
Exercises EPCLUST: remember the ID of the file
Exercises EPCLUST: check whether your data were uploaded. Go back and refresh the page to return to the original page, and continue there.
Exercises EPCLUST: make a selection of the most interesting genes. Because filtering was already performed, select all data.
Exercises EPCLUST: try hierarchical clustering and K-means clustering
Exercises EPCLUST: result K-means (30 clusters, Euclidean distance)
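To make clear what K-means with Euclidean distance does with the expression profiles, here is a minimal sketch of the standard algorithm (assign each profile to the nearest center, then recompute each center as the mean of its members). It is not EPCLUST's implementation. The function names, the fixed random seed and the six toy profiles are assumptions made for the example, and the example uses 2 clusters instead of 30 to stay readable.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two profiles of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(profiles, k, iterations=100, seed=0):
    """Plain K-means; returns one cluster label per profile and the centers."""
    rng = random.Random(seed)
    # Start from k distinct profiles chosen at random.
    centers = [list(p) for p in rng.sample(profiles, k)]
    labels = [None] * len(profiles)
    for _ in range(iterations):
        # Assignment step: nearest center for every profile.
        new_labels = [
            min(range(k), key=lambda c: euclidean(p, centers[c]))
            for p in profiles
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: each center becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


# Hypothetical rescaled profiles: three "up" genes and three "down" genes.
profiles = [
    [1.0, 2.0, 1.0], [1.1, 2.1, 0.9], [0.9, 1.9, 1.1],
    [-1.0, -2.0, -1.0], [-1.1, -2.1, -0.9], [-0.9, -1.9, -1.1],
]
labels, centers = kmeans(profiles, k=2)
print(labels)
```

The result depends on the random starting centers, so on real data different runs can give different clusters; that is one reason to compare the K-means result with the hierarchical clustering.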
The comparison between the content of these two clusters can be seen in the file vergelijkingcluster.xls
Exercises EPCLUST: hierarchical clustering
– Analyze the tree
– Try to detect the number of clusters in the dataset
– Click on a node and view the profile of a subcluster
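One way to think about "detecting the number of clusters" in the tree is to look at the height at which branches are joined: a large jump in merge distance suggests that two genuinely different groups are being forced together. The sketch below shows this with a small agglomerative clustering. Average linkage and Euclidean distance are assumptions chosen for the example (EPCLUST offers its own choice of settings), and the function names and toy profiles are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two profiles of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(profiles, n_clusters):
    """Merge the two closest clusters until n_clusters remain.

    Returns the final clusters (lists of profile indices) and the merge
    distances in the order in which the merges happened.
    """
    clusters = [[i] for i in range(len(profiles))]
    heights = []
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Average distance over all pairs between the two clusters.
                dists = [
                    euclidean(profiles[a], profiles[b])
                    for a in clusters[i]
                    for b in clusters[j]
                ]
                d = sum(dists) / len(dists)
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        heights.append(d)
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return clusters, heights


# Hypothetical rescaled profiles: three "up" genes and three "down" genes.
profiles = [
    [1.0, 2.0, 1.0], [1.1, 2.1, 0.9], [0.9, 1.9, 1.1],
    [-1.0, -2.0, -1.0], [-1.1, -2.1, -0.9], [-0.9, -1.9, -1.1],
]
two_clusters, _ = average_linkage(profiles, n_clusters=2)
_, all_heights = average_linkage(profiles, n_clusters=1)
print(two_clusters)
print(all_heights)
```

In this toy example the last merge distance is much larger than all earlier ones, which points to two clusters. On real data the jumps are usually less clear-cut, so viewing the profile of a subcluster remains a useful check.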
Exercises EPCLUST: automatic linking to other tools
FATIGO: calculating statistical overrepresentation using GO
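Statistical overrepresentation of a GO term in a cluster is commonly assessed with a one-sided hypergeometric test: given how many genes in the whole data set carry the term, how likely is it to find at least this many in the cluster by chance? The sketch below illustrates that idea only. Whether FatiGO computes exactly this test, and how it corrects for testing many terms, is not stated here, and the function name and all counts in the example are hypothetical.

```python
from math import comb


def overrepresentation_p(cluster_size, cluster_hits, total_genes, total_hits):
    """One-sided hypergeometric p value: P(at least cluster_hits annotated genes).

    cluster_size: number of genes in the cluster
    cluster_hits: genes in the cluster annotated with the GO term
    total_genes:  number of genes in the whole data set
    total_hits:   genes in the whole data set annotated with the GO term
    """
    all_draws = comb(total_genes, cluster_size)
    max_hits = min(cluster_size, total_hits)
    tail = sum(
        comb(total_hits, k) * comb(total_genes - total_hits, cluster_size - k)
        for k in range(cluster_hits, max_hits + 1)
    )
    return tail / all_draws


# Hypothetical counts: 3000 genes in total, 60 of them annotated with the
# term; a cluster of 50 genes contains 10 annotated genes (1 expected).
p_value = overrepresentation_p(50, 10, 3000, 60)
print(p_value)
```

Because many GO terms are tested for every cluster, the individual p values would normally be adjusted for multiple testing before a term is called overrepresented.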