A Short Overview of Microarrays Tex Thompson Spring 2005
Raw Data ● Microarray data at its most raw consists of a spotted image, plus information on what each spot represents (spot intensities and metadata). ● Genes may be spotted in replicate. ● Affymetrix chips pair each perfect-match probe with a mismatch probe to guard against non-specific hybridization.
Normalizing Data ● Normalization of microarray data is the process of removing array-specific bias in order to make results between arrays comparable. ● Intensity data relevant to a single gene needs to be combined and normalized in order to define “expression levels” for each gene. ● The basic idea is that the expression level is proportional to the number of mRNA transcripts of that gene within the tissue of interest.
RMA Normalization ● RMA (Robust Multi-array Average) begins with a background correction: each array is assumed to have a common amount of “background noise.” ● Normalization is then performed by quantile normalization, which adjusts the intensities on each chip so that all chips end up with identical distributions. ● A statistician (or Google) could tell you much more about this.
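The quantile normalization step can be illustrated with a minimal sketch. This is not the full RMA procedure (it omits background correction and probe-set summarization), and the function name `quantile_normalize` and the toy intensities are assumptions made for illustration only.

```python
def quantile_normalize(arrays):
    """Quantile-normalize equal-length intensity lists, one list per array."""
    n_probes = len(arrays[0])
    if any(len(a) != n_probes for a in arrays):
        raise ValueError("all arrays must have the same number of probes")

    # Sort each array, then average across arrays at each rank.
    sorted_arrays = [sorted(a) for a in arrays]
    rank_means = [sum(col) / len(arrays) for col in zip(*sorted_arrays)]

    normalized = []
    for a in arrays:
        # order[r] is the index of the probe that holds rank r in this array.
        # Ties are broken by probe position, a simplification of this sketch.
        order = sorted(range(n_probes), key=lambda i: a[i])
        out = [0.0] * n_probes
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]
        normalized.append(out)
    return normalized


# Hypothetical intensities: three arrays, four probes each.
arrays = [
    [5.0, 2.0, 3.0, 4.0],
    [4.0, 1.0, 4.5, 2.0],
    [3.0, 4.0, 6.0, 8.0],
]
normalized = quantile_normalize(arrays)
```

After this step every array contains the same set of values, so the arrays share one distribution. Only the rank of each probe within its own array is preserved.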
Diagram of Microarray Analysis [Figure: flow diagram with stages labeled mRNA, Raw Data, Normalized Data, and “????????????”]
What Sorts of Questions Can We Ask? ● What are the most highly/lowly expressed genes in a sample of interest? ● What are the differentially expressed genes across two (or more) samples of interest? ● What sets of genes are always upregulated or downregulated as a set? ● What do you think?
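As one way to approach the differential expression question, the sketch below ranks genes by a Welch t-statistic between two groups of samples and also reports the log2 fold change. The gene names and expression values are hypothetical, and a real analysis would also need p-values and a correction for testing many genes at once.

```python
import math
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch t-statistic for two groups with possibly unequal variances."""
    se = math.sqrt(variance(group_a) / len(group_a)
                   + variance(group_b) / len(group_b))
    return (mean(group_a) - mean(group_b)) / se


# Hypothetical log2 expression values: (condition A replicates, condition B replicates).
expression = {
    "gene_1": ([8.1, 8.3, 7.9], [8.0, 8.2, 8.1]),
    "gene_2": ([5.0, 5.2, 4.9], [7.1, 6.8, 7.0]),
}

results = {}
for gene, (a, b) in expression.items():
    results[gene] = {
        # On a log2 scale, the fold change is a difference of means.
        "log2_fc": mean(b) - mean(a),
        "t": welch_t(b, a),
    }

# Genes with the largest absolute t-statistic come first.
ranked = sorted(results, key=lambda g: abs(results[g]["t"]), reverse=True)
```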
Clustering ● Clustering is the process of assembling N objects into K “clusters” based on a set of measured characteristics. ● For example, a common application is grouping individual samples into clusters based on their gene expression. ● Alternatively, clustering can be used to group together individual genes that have similar expression patterns.
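To make the idea concrete, here is a minimal K-means sketch that groups samples by their expression profiles. The data are hypothetical, and the initialization (taking the first K samples as starting centroids) is a simplification chosen to keep the example deterministic. It can give poor results on real data, where random restarts or a smarter seeding are usually preferred.

```python
import math


def kmeans(points, k, max_iter=100):
    """Assign each point to one of k clusters; returns (labels, centroids)."""
    # Naive deterministic initialization: the first k points (an assumption
    # of this sketch, not a recommended practice).
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so we have converged
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(v) / len(members) for v in zip(*members)]
    return labels, centroids


# Hypothetical expression profiles: four samples, three genes each.
samples = [
    [1.0, 1.2, 0.9],
    [5.0, 5.2, 4.8],
    [1.1, 1.0, 1.0],
    [5.1, 4.9, 5.0],
]
labels, centroids = kmeans(samples, k=2)
```

To cluster genes instead of samples, the same function can be applied to the transposed data, so that each point is one gene's expression across all samples.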
Prediction ● Prediction is the process of creating an algorithm for taking an unknown sample and putting it in a known classification scheme. ● For example, a predictor might measure the gene expression levels of an unknown tissue sample and match it to the most probable classification. ● This protocol is very common in studies of different types of cancer.
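A k-nearest-neighbors (KNN) classifier is one simple predictor of this kind: an unknown sample receives the most common label among its k closest known samples. The sketch below uses hypothetical two-gene expression profiles and hypothetical "normal" and "tumor" labels.

```python
import math
from collections import Counter


def knn_predict(train, train_labels, query, k=3):
    """Predict the label of `query` by majority vote of its k nearest neighbors."""
    # Indices of the k training samples closest to the query (Euclidean distance).
    nearest = sorted(
        range(len(train)), key=lambda i: math.dist(train[i], query)
    )[:k]
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical training data: expression of two genes per sample.
train = [
    [2.0, 8.0], [2.2, 7.5], [1.8, 8.3],   # labeled "normal"
    [7.9, 2.1], [8.2, 1.8], [7.6, 2.4],   # labeled "tumor"
]
train_labels = ["normal"] * 3 + ["tumor"] * 3

unknown_sample = [7.8, 2.0]
prediction = knn_predict(train, train_labels, unknown_sample, k=3)
```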
Algorithms Of Interest ● Principal Component Analysis (PCA) ● Self-Organizing Maps (SOM) ● Support Vector Machines (SVM) ● Linear Discriminant Analysis (LDA) ● K-Means Clustering ● KNN Classifiers ● Differential Expression Statistics ● Assumptions of RMA Normalization
Looking At The Data ● Each array falls into one of four types: – Young – Middle-aged – Old, Mild Presbycusis – Old, Severe Presbycusis
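Looking At The Data [Table: expression values per probe set (row IDs ending in _at) for the arrays X13_Frisina_S2_M430A.CEL and X1_b_Frisina_S2_M430A.CEL; the values did not survive extraction]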
Go To Work! ● I'll be available for questions via [...] until 9:30am and via [...]. ● These slides will be made available on the course website.