Microarray Data Analysis Illumina Gene Expression Data Analysis Yun Lian
Introduction The large volumes of microarray data generated by different technologies are too big to analyze by simply sorting in spreadsheets, or by manually comparing values and plotting graphs. Each type and/or platform of microarray has its own unique analysis features.
General Procedures Normalization: remove or reduce systematic (non-biological) variation between arrays or chips by equalizing the overall signal across the arrays/chips to be compared. Examples of normalization: whole chip, per gene, quantile, Lowess, dye swap, RMA, …
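As an illustration of one method from the list above, here is a minimal sketch of quantile normalization in plain Python. The function name `quantile_normalize` and the input layout (one list of probe intensities per array, all the same length and in the same probe order) are assumptions made for this example. It is not the implementation used by any particular package, and ties are handled naively.

```python
def quantile_normalize(arrays):
    """Quantile-normalize a list of equal-length intensity lists (one per array).

    Hypothetical helper for illustration; ties are broken by probe order
    rather than averaged, which a production implementation would refine.
    """
    n_probes = len(arrays[0])
    # Sort each array's intensities independently.
    sorted_arrays = [sorted(a) for a in arrays]
    # Reference distribution: the mean intensity across arrays at each rank.
    reference = [
        sum(s[rank] for s in sorted_arrays) / len(arrays)
        for rank in range(n_probes)
    ]
    normalized = []
    for a in arrays:
        # Probe indices ordered from lowest to highest intensity on this array.
        order = sorted(range(n_probes), key=lambda i: a[i])
        out = [0.0] * n_probes
        for rank, probe in enumerate(order):
            # Replace each value with the reference value of the same rank.
            out[probe] = reference[rank]
        normalized.append(out)
    return normalized


arrays = [[5.0, 2.0, 3.0, 4.0], [40.0, 10.0, 30.0, 20.0]]
print(quantile_normalize(arrays))
```

After this step every array has the same distribution of values, so overall signal differences between chips no longer drive the comparison.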
Cont. Comparative: compare gene expression across two or more samples to determine a list of significantly differentially expressed genes. Example methods: t-test, ANOVA, fold change, rank order (MAS 5 etc.), permutation (SAM).
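Two of the comparative methods above, fold change and the t-test, can be sketched for a single gene as follows. The function names and the replicate values are hypothetical. The t statistic shown is the Welch (unequal variance) form, chosen here as an assumption; turning it into a p-value requires the t distribution, which is left to a statistics library.

```python
import math
from statistics import mean, variance


def log2_fold_change(treated, control):
    """Log2 ratio of the mean treated signal to the mean control signal."""
    return math.log2(mean(treated) / mean(control))


def welch_t(treated, control):
    """Welch t statistic for two groups of replicate intensities.

    Needs at least two replicates per group and non-zero variance overall.
    """
    standard_error = math.sqrt(
        variance(treated) / len(treated) + variance(control) / len(control)
    )
    return (mean(treated) - mean(control)) / standard_error


# Hypothetical replicate intensities for one gene.
treated = [400, 420, 380]
control = [100, 110, 90]
print(log2_fold_change(treated, control))
print(welch_t(treated, control))
```

Fold change alone ignores replicate variability, while the t statistic scales the difference by it, which is why the two are often reported together.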
Cont. Clustering: identifies significant correlations in expression data across experiments/conditions. Example methods: hierarchical clustering, k-means clustering, self-organizing maps, …
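Hierarchical clustering, the first method listed above, can be sketched in a few lines. This example assumes average linkage and a distance of 1 minus the Pearson correlation between expression profiles; both are common choices but are assumptions here, as are the function names and the toy profiles.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (must not be constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def hierarchical_clusters(profiles, n_clusters):
    """Agglomerative clustering of genes; stops when n_clusters remain."""
    names = list(profiles)
    # Pairwise correlation distance between all genes.
    dist = {
        (a, b): 1.0 - pearson(profiles[a], profiles[b])
        for a in names for b in names if a != b
    }

    def linkage(c1, c2):
        # Average linkage: mean distance over all cross-cluster gene pairs.
        return sum(dist[(a, b)] for a in c1 for b in c2) / (len(c1) * len(c2))

    clusters = [[name] for name in names]
    while len(clusters) > n_clusters:
        # Find the closest pair of clusters and merge them.
        pairs = [(i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        i, j = min(pairs, key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters


# Hypothetical expression profiles across four conditions.
profiles = {
    "geneA": [1, 2, 3, 4],
    "geneB": [2, 4, 6, 8.5],
    "geneC": [4, 3, 2, 1],
    "geneD": [8, 6, 4, 2.5],
}
print(hierarchical_clusters(profiles, 2))
```

Using correlation rather than Euclidean distance groups genes by the shape of their profile across conditions, not by absolute intensity.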
Cont. Biological overlay: identify functions for given genes and functional clusters of genes; generate hypotheses. Example methods: multi-database access (Source), functional grouping (Gene Ontology, KEGG, GenMAPP), PubMed correlations (PubGene).
GenomeStudio A software tool for analyzing Illumina gene expression data from scanned microarray images collected from the Illumina BeadArray Reader. The resulting BeadStudio files can be used by third-party analysis programs.
The normalization uses quantiles of sample intensities to fit smoothing B-splines. It’s a non-linear method. Different scaling factors are applied to different parts of the population of genes.
Differential Expression Algorithm This is used to compare a group of samples to a reference group. Illumina custom: assumes that signal intensity is normally distributed among replicates. The variation has three components: biological, non-biological, and technical error. Mann-Whitney: also called the Wilcoxon rank-sum test. It is a non-parametric test for assessing whether two samples of observations come from the same distribution. T-test:
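The Mann-Whitney test described above can be sketched directly from its definition: U counts how often a value from one group exceeds a value from the reference group, with ties counted as one half. The function names are hypothetical, and the p-value uses a normal approximation without tie or continuity correction, which is an assumption of this sketch and is rough for small replicate counts. It does not reproduce the calculation inside any specific software.

```python
import math


def mann_whitney_u(group, reference):
    """U statistic: pairs where a group value beats a reference value."""
    u = 0.0
    for x in group:
        for y in reference:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5  # ties contribute half a count
    return u


def mann_whitney_p(group, reference):
    """Two-sided p-value from the normal approximation (no corrections)."""
    n1, n2 = len(group), len(reference)
    u = mann_whitney_u(group, reference)
    expected = n1 * n2 / 2.0
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - expected) / sd
    # Two-sided tail probability of a standard normal variable.
    return math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical intensities for one gene in a sample group and a reference group.
group = [12, 15, 18, 20]
reference = [5, 7, 9, 11]
print(mann_whitney_u(group, reference))
print(mann_whitney_p(group, reference))
```

Because only the ordering of values matters, the test makes no normality assumption, in contrast to the Illumina custom model and the t-test.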
Output Files of BeadStudio XXXXXX_gene_profile: Intensity data and various quality scores reported at the gene level. Signals from probes for the same gene are combined to give a single value for the gene. XXXXXX_qcinfo: Intensity data for categories of experimental control probes. XXXXXX_gene_diff: Intensity data for determining whether gene expression levels have changed between two experimental groups.
Cluster Analysis
System Controls Housekeeping controls: these monitor the intactness of the biological specimen. Biotin control: successful secondary staining is indicated by a positive hybridization signal from these probes. Negative controls: these represent the measurement of background, non-specific binding, or cross-hybridization.
Cont. Controls for Hybridization: Cy3-Labeled Hyb Control, Low Stringency Hyb Control, High Stringency Hyb Control.