Exercise 1: Importing Illumina data

Using the Import tool
  Choose File / Import folder and select the folder IlluminaTeratospermiaHuman6v1_BS1.
  In the Import files window, choose the action "Use import tool" and click OK.
  Click the Mark title row button and click on the title row of the data file. Click Next.
  Click the Identifier button and click on the TargetID column.
  Click the Sample button and click on the AVG column.
  Click Finish.

Alternative: Importing a whole BeadStudio data file directly
  Choose File / Import files and select the file IlluminaForLumiHuman6v1_BS1.tsv.
  In the Import files window, choose the action "Import directly" and click OK. The file is then imported as it is.

Exercise 2: Normalizing Illumina data

Using the IlluminaTeratospermiaHuman6v1_BS1 dataset (separate files)
  In the workflow view, double click on the box "13 files" to select all of them.
  In the analysis tool section, choose Normalization and Illumina.
  Click Show parameters and set the chiptype to Human-6v1.
  Click Run.
  Repeat the run using the same chiptype, but set normalize.chips to none.

Using the file IlluminaForLumiHuman6v1_BS1.tsv (one whole BS file)
  Select the file IlluminaForLumiHuman6v1_BS1.tsv.
  Choose Normalization and Illumina - lumi pipeline.
  Click Show parameters and set the chiptype to Human-6v1.
  Click Run.
  Repeat the run using the same chiptype, but set normalize.chips to none.
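
To get a feel for what between-chip normalization does, compared with the run where normalize.chips is set to none, here is a minimal sketch of quantile normalization in plain Python. This is an assumption for illustration only: the exercise does not say which method the tool applies by default, and the function name and toy data are hypothetical.

```python
from statistics import mean

def quantile_normalize(chips):
    """Force all chips to share the same value distribution.

    chips: list of equal-length lists, one list of probe values per chip.
    Ties are broken by probe position, which is a simplification.
    """
    n = len(chips[0])
    # Sort each chip, then average the values found at each rank across chips.
    sorted_chips = [sorted(chip) for chip in chips]
    rank_means = [mean(column) for column in zip(*sorted_chips)]

    normalized = []
    for chip in chips:
        # Probe indices ordered from the lowest to the highest value on this chip.
        order = sorted(range(n), key=lambda i: chip[i])
        out = [0.0] * n
        for rank, probe in enumerate(order):
            out[probe] = rank_means[rank]
        normalized.append(out)
    return normalized

# Two toy chips with three probes each (made-up numbers).
toy = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
result = quantile_normalize(toy)
```

After this step every chip contains exactly the same set of values, only in a different probe order, which is why systematic chip-to-chip intensity differences disappear.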

Exercise 3: Describe the experiment

Using the IlluminaTeratospermiaHuman6v1_BS1 dataset (separate files)
  Double click the phenodata file.
  In the phenodata editor, enter 1 in the group column for the control samples and 2 for the affected samples.

Using the file IlluminaForLumiHuman6v1_BS1.tsv (one whole BS file)
  Double click the phenodata file.
  In the phenodata editor, click on the original name column to sort the samples.
  In the group column, mark the replicates with the same number (1, 2 and 3).

Exercise 4: Illumina quality control

Using the IlluminaTeratospermiaHuman6v1_BS1 dataset
  Run the tools Statistics / NMDS and Visualization / Dendrogram for both the normalized and the "mock-normalized" data files.
  View the result files side by side (use the Detach button).

Using the IlluminaForLumiHuman6v1_BS1.tsv dataset
  As above.

Exercise 5: Filtering

Select the normalized data and try out the different filters:
  Preprocessing / Filter by SD
  Preprocessing / Filter by CV
  Preprocessing / Filter by IQR
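
The three filters rank genes by different measures of variability across samples. The sketch below shows the general idea in plain Python. The function names, the keep_fraction parameter and the toy expression values are hypothetical, and the real tools may define their cutoffs differently.

```python
from statistics import mean, quantiles, stdev

def sd(values):
    """Standard deviation of one gene's expression values."""
    return stdev(values)

def cv(values):
    """Coefficient of variation: standard deviation relative to the mean."""
    return stdev(values) / mean(values)

def iqr(values):
    """Interquartile range: distance between the 1st and 3rd quartile."""
    q1, _median, q3 = quantiles(values, n=4)
    return q3 - q1

def filter_top(expression, score, keep_fraction):
    """Return the names of the most variable genes according to `score`."""
    ranked = sorted(expression, key=lambda gene: score(expression[gene]), reverse=True)
    n_keep = max(1, round(len(ranked) * keep_fraction))
    return ranked[:n_keep]

# Toy data: gene name -> expression values over four samples (made-up numbers).
expression = {
    "flat": [8.0, 8.1, 7.9, 8.0],
    "variable": [5.0, 9.0, 6.0, 10.0],
    "mild": [7.0, 7.5, 6.5, 7.2],
}
top_by_sd = filter_top(expression, sd, 2 / 3)
top_by_iqr = filter_top(expression, iqr, 1 / 3)
```

SD and IQR work on the absolute scale of the data, whereas CV relates the spread to the mean, so the three filters can keep different genes when expression levels differ a lot.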

Exercise 6: Statistical testing

t-test
  Select the sd-filter.tsv of the teratospermia dataset.
  Run Statistics / Two group test using the method t-test.

Empirical Bayes
  Select the normalized.tsv of the teratospermia dataset.
  Run Statistics / Two group test using the method empirical Bayes, with the P-value adjustment turned off.
  Run Preprocessing / Filter by SD on the result file two-group.tsv.
  Run Statistics / Adjust P-values on the result file sd-filter.tsv (you have to specify the P-value column in the parameters).

Compare the results using the Venn diagram.

Save the analysis session
  File / Save session
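
The order of filtering and P-value adjustment matters because the adjustment depends on how many tests are included. The sketch below implements the Benjamini-Hochberg adjustment in plain Python as an illustration. Treating Benjamini-Hochberg as the method is an assumption, since the exercise does not name the adjustment method, and the P-values are made up.

```python
def adjust_bh(pvalues):
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    m = len(pvalues)
    # Indices of the P-values from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so adjusted values never increase
    # as the raw P-values get smaller.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Raw P-values for four genes (made-up numbers).
all_genes = [0.01, 0.04, 0.03, 0.005]
adjusted_all = adjust_bh(all_genes)

# Adjusting only the two genes that survived a filter: fewer tests,
# so the same raw P-values are penalized less.
adjusted_filtered = adjust_bh([0.01, 0.005])
```

In this toy case the gene with raw P-value 0.005 gets an adjusted value of about 0.02 when all four genes are included, but about 0.01 when only the two filtered genes are adjusted.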

Exercise 7: Linear modelling: taking several covariates into account at the same time

Use a kidney cancer dataset of 17 samples
  Start a new session.
  Choose File / Import folder, select the folder AffyNormalized and choose Import directly.
  Right-click the normalized.tsv and link it to the phenodata.tsv.
  Check which columns you have in the phenodata.

Linear modelling
  Select the normalized.tsv and run Statistics / Linear modelling. Set group, kidney side and gender as the three main effects. Set donor as the pairing information.
  Select the result file pvalues.tsv and run the tool Utilities / Extract genes using a P-value once for each of the main effect P-value columns (three times in total).

Save the session.

Exercise 8: Clustering

Open your Illumina session.

Hierarchical clustering
  Select the adjust-pvalues.tsv.
  Run Clustering / Hierarchical with default parameters.
  Repeat the run using bootstrapping: set the resampling parameter to bootstrap and the number of replicates to 10. How reliable are the branches?

K-means clustering
  Select the adjust-pvalues.tsv.
  Run the tool "K-means - estimate K".
  Run K-means clustering, setting the parameter number of clusters according to your estimated K.
  View the clusters using the visualization method Expression profiles.
  Extract the genes from cluster 1 using Utilities / Extract genes from clustering.
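
For readers who want to see what K-means does with expression profiles, here is a minimal plain-Python sketch. It is not the tool's implementation: the starting centroids are supplied by hand to keep the example deterministic, and the function name and toy profiles are hypothetical.

```python
def kmeans(profiles, centroids, iterations=10):
    """Assign each profile to its nearest centroid and update the centroids.

    profiles: list of expression profiles (lists of numbers).
    centroids: list of starting centroids, one per cluster (K = len(centroids)).
    Returns (assignments, centroids), where assignments[i] is the cluster
    index of profiles[i].
    """
    assignments = []
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        assignments = []
        for profile in profiles:
            distances = [
                sum((a - b) ** 2 for a, b in zip(profile, centroid))
                for centroid in centroids
            ]
            assignments.append(distances.index(min(distances)))

        # Update step: each centroid becomes the mean profile of its members.
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, a in zip(profiles, assignments) if a == k]
            if members:
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return assignments, centroids

# Four toy profiles measured in two samples, clustered with K = 2.
profiles = [[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.2, 4.8]]
assignments, centroids = kmeans(profiles, centroids=[[0.0, 0.0], [6.0, 6.0]])
```

The number of starting centroids plays the role of the number of clusters parameter, which is why estimating K first is a separate step in the exercise.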

Exercise 9: Annotation

Annotate genes
  Select the file adjust-pvalues.tsv.
  Run Annotation / Illumina gene list.
  Open the result file annotations.html and click the links in the gene and pathway columns to read more about one of the genes.
  Open the result file annotations.tsv and sort it by the pathway column. Slide the pathway column next to the description column and make it wider.

Exercise 10: Pathway analysis

Gene enrichment analysis
  Select the file adjust-pvalues.tsv.
  Run Pathways / Hypergeometric test for KEGG.
  Are any KEGG pathways enriched in your list of differentially expressed genes?
  Using the file annotations.tsv, work out which genes contributed to the top pathway.

Gene set test
  Select the file normalized.tsv.
  Run Pathways / Gene set test and set the parameter pathways.or.genelist to KEGG.
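
The hypergeometric test asks how likely it is to see at least the observed number of pathway genes in your gene list by chance alone. A minimal sketch of that calculation in plain Python follows. The function and parameter names are hypothetical, the counts are made up, and the actual tool may apply additional filtering or corrections.

```python
from math import comb

def hypergeom_pvalue(total, in_pathway, in_list, overlap):
    """P(at least `overlap` pathway genes in a random list of `in_list` genes).

    total: number of genes on the chip
    in_pathway: genes on the chip that belong to the pathway
    in_list: size of the differentially expressed gene list
    overlap: genes in the list that belong to the pathway
    """
    upper = min(in_pathway, in_list)
    favourable = sum(
        comb(in_pathway, i) * comb(total - in_pathway, in_list - i)
        for i in range(overlap, upper + 1)
    )
    return favourable / comb(total, in_list)

# Toy numbers: 10 genes on the chip, 4 in the pathway,
# 3 genes in the list, 2 of which are in the pathway.
p = hypergeom_pvalue(total=10, in_pathway=4, in_list=3, overlap=2)
```

With these toy numbers the P-value is 40/120, or one third, so an overlap of 2 would not be surprising. In real data the chip contains thousands of genes, and many pathways are tested at once.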

Exercise 11: Promoter analysis

Pattern discovery: do the promoters of similarly expressed genes share a sequence motif?
  Select the file extract.tsv containing the genes from cluster 1.
  Run Promoter analysis / Weeder. What is the most interesting motif? Check in the matrix (Best occs) which positions are most conserved.
  Run Promoter analysis / Cosmo. As judged by the sequence logo, do you find similar motifs?
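
One common way to quantify how conserved a motif position is, and what the letter heights in a sequence logo reflect, is the information content in bits. The sketch below computes it from nucleotide counts per position. The count matrix is made up and does not reproduce the actual Weeder output format, and the function name is hypothetical.

```python
from math import log2

def information_content(counts):
    """Information content (bits) of one motif position.

    counts: dict mapping each nucleotide to how often it occurs at this
    position. 2 bits means fully conserved, 0 bits means all four
    nucleotides are equally frequent. No small-sample correction is applied.
    """
    total = sum(counts.values())
    entropy = -sum(
        (c / total) * log2(c / total) for c in counts.values() if c > 0
    )
    return 2.0 - entropy

# A made-up three-position motif built from 8 occurrences.
motif = [
    {"A": 8, "C": 0, "G": 0, "T": 0},  # fully conserved
    {"A": 4, "C": 4, "G": 0, "T": 0},  # two nucleotides, equally frequent
    {"A": 2, "C": 2, "G": 2, "T": 2},  # no preference
]
bits = [information_content(position) for position in motif]
```

Positions with values close to 2 bits are the most conserved ones, and they correspond to the tallest stacks in a sequence logo.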

Exercise 12: Saving and running a workflow

Save a workflow
  Prune your teratospermia dataset workflow if necessary.
  Select the file normalized.tsv and choose Workflow / Save starting from selected.
  Give your workflow a meaningful name and save it.

Run a workflow
  Open the session called sessionIlluminaTeratospermia.cs.
  Select the file normalized.tsv and choose Workflow / Run recent.