1 L15: Microarray analysis (Classification)

2 The Biological Problem Two conditions need to be differentiated because they call for different treatments. Example: ALL (Acute Lymphocytic Leukemia) vs. AML (Acute Myelogenous Leukemia). Possibly, the set of over-expressed genes differs between the two conditions.

3 Geometric formulation Each sample is a vector with dimension equal to the number of genes. We have two classes of vectors (AML, ALL) and would like to separate them, if possible, with a hyperplane.
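As a concrete illustration in Python: the values and labels below are invented for the sketch (treating ALL as the +ve class is an arbitrary choice), not real expression data.

```python
import numpy as np

# Toy expression matrix: rows are samples, columns are genes.
# Values are made up for illustration; real arrays have thousands of genes.
X = np.array([
    [2.1, 0.3, 5.0],   # ALL sample
    [1.9, 0.4, 4.8],   # ALL sample
    [0.2, 3.1, 1.0],   # AML sample
    [0.3, 2.9, 1.2],   # AML sample
])
y = np.array([-1, -1, +1, +1])   # later slide's convention: y_i = -1 for +ve examples

print(X.shape)   # (4, 3): four sample vectors in a 3-dimensional gene space
```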

4 Hyperplane properties Given an arbitrary point x, what is the distance from x to the plane L? –D(x,L) = βᵀx − β₀ When are points x₁ and x₂ on different sides of the hyperplane? Ans: If D(x₁,L)·D(x₂,L) < 0
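A minimal numpy sketch of these two properties, using the slide's unnormalized D (divide by ‖β‖ to get the true Euclidean distance):

```python
import numpy as np

def signed_distance(x, beta, beta0):
    """D(x, L) = beta^T x - beta0. This is the signed distance scaled by
    ||beta||; use beta / np.linalg.norm(beta) for actual distances."""
    return beta @ x - beta0

beta, beta0 = np.array([1.0, -1.0]), 0.5
x1, x2 = np.array([2.0, 0.0]), np.array([0.0, 2.0])

# x1 and x2 lie on opposite sides exactly when the product is negative.
print(signed_distance(x1, beta, beta0) * signed_distance(x2, beta, beta0) < 0)
```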

5 Separating by a hyperplane Input: A training set of +ve & -ve examples Recall that a hyperplane is represented by –{x : −β₀ + β₁x₁ + β₂x₂ = 0} or –(in higher dimensions) {x : βᵀx − β₀ = 0} Goal: Find a hyperplane that ‘separates’ the two classes. Classification: A new point x is +ve if it lies on the +ve side of the hyperplane (D(x,L) > 0), -ve otherwise.

6 Hyperplane separation What happens if we have many choices of a hyperplane? –We try to maximize the distance of the points from the hyperplane. What happens if the classes are not separable by a hyperplane? –We define a function based on the amount of misclassification, and try to minimize it.

7 Error in classification Sample Function: sum of distances of all misclassified points –Let yᵢ = −1 for +ve example i, yᵢ = +1 otherwise. Point i is then misclassified exactly when yᵢ·D(xᵢ,L) > 0, so the error is D(β,β₀) = Σ over misclassified i of yᵢ(βᵀxᵢ − β₀). The best hyperplane is one that minimizes D(β,β₀). Other definitions are also possible.
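A sketch of this error function in numpy, under the slide's sign convention:

```python
import numpy as np

def D_error(X, y, beta, beta0):
    """Sum of signed-distance terms over misclassified points. With
    y_i = -1 for +ve examples and +1 otherwise, point i is misclassified
    exactly when y_i * (beta^T x_i - beta0) > 0, and each such term is
    positive, so the total is 0 only for a perfect separation."""
    margins = y * (X @ beta - beta0)
    return margins[margins > 0].sum()
```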

8 Restating Classification The (supervised) classification problem can now be reformulated as an optimization problem. Goal: Find the hyperplane (β, β₀) that optimizes the objective D(β, β₀). No efficient algorithm is known for this problem, but a simple generic optimization can be applied. Start with a randomly chosen (β, β₀) Move to a neighboring (β′, β′₀) if D(β′, β′₀) < D(β, β₀)
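A sketch of this generic neighbor-improvement scheme; `error_fn` stands in for D (e.g. the `D_error` sketch above), and the Gaussian neighborhood step is an arbitrary choice made for the illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def local_search(X, y, error_fn, steps=1000, step=0.1):
    """Random start, then move to a random neighbour (beta', beta0')
    whenever it has strictly lower error than the current (beta, beta0)."""
    beta, beta0 = rng.normal(size=X.shape[1]), float(rng.normal())
    best = error_fn(X, y, beta, beta0)
    for _ in range(steps):
        nb = beta + step * rng.normal(size=beta.shape)   # neighbouring beta'
        nb0 = beta0 + step * float(rng.normal())         # neighbouring beta0'
        e = error_fn(X, y, nb, nb0)
        if e < best:                                     # accept only improvements
            beta, beta0, best = nb, nb0, e
    return beta, beta0
```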

9 Gradient Descent The function D(β) defines the error. We follow an iterative refinement: in each step, refine β so the error is reduced. Gradient descent is an approach to such iterative refinement. (Figure: the error curve D(β) and its derivative D′(β).)
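A minimal sketch of the idea. The update rule β ← β − ε·D′(β) and the step size are the standard formulation, not taken verbatim from the slide:

```python
def gradient_descent(grad_D, beta, lr=0.01, steps=1000):
    """Iterative refinement: step against the derivative so D decreases.
    grad_D(beta) must return D'(beta); D is assumed differentiable."""
    for _ in range(steps):
        beta = beta - lr * grad_D(beta)   # beta <- beta - lr * D'(beta)
    return beta

# Toy usage on D(b) = (b - 3)^2, whose derivative is 2*(b - 3):
print(gradient_descent(lambda b: 2 * (b - 3), beta=0.0))   # -> approx. 3.0
```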

10 Rosenblatt’s perceptron learning algorithm
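The algorithm itself was on the slide image; the following is a sketch of the standard formulation, here with yᵢ = +1 for the +ve class (the opposite of the error-function slide's convention):

```python
import numpy as np

def perceptron(X, y, max_epochs=100):
    """Rosenblatt-style updates: cycle through the data, nudging
    (beta, beta0) after each mistake. Terminates only if the data
    are linearly separable (see the caveats on the next slide)."""
    beta, beta0 = np.zeros(X.shape[1]), 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            if yi * (xi @ beta - beta0) <= 0:   # misclassified (or on the plane)
                beta = beta + yi * xi           # rotate the plane toward/away from xi
                beta0 = beta0 - yi              # shift the offset
                mistakes += 1
        if mistakes == 0:
            break                               # all points on their correct sides
    return beta, beta0
```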

11 Classification based on perceptron learning Use Rosenblatt’s algorithm to compute the hyperplane L = (β, β₀). Assign x to class 1 if f(x) = βᵀx − β₀ ≥ 0, and to class 2 otherwise.

12 Perceptron learning If many solutions are possible, it does not choose between them. If the data are not linearly separable, it does not terminate, and this is hard to detect. Its time of convergence is not well understood.

13 Linear Discriminant Analysis Provides an alternative approach to classification with a linear function. Project all points, including the means, onto a vector α. We want to choose α such that –Difference of projected means is large. –Variance within each group is small.

14 LDA cont’d What is the projection of a point x onto α? –Ans: αᵀx What is the distance between projected means? –Ans: αᵀm₁ − αᵀm₂ = αᵀ(m₁ − m₂)

15 LDA Cont’d Fisher Criterion: choose α to maximize J(α) = (αᵀ(m₁ − m₂))² / (αᵀ S_W α), the squared distance between projected means divided by the within-group variance (S_W is the pooled within-class scatter matrix).

16 LDA The maximizing direction is α ∝ S_W⁻¹(m₁ − m₂). Therefore, a simple computation (a matrix inverse) is sufficient to compute the ‘best’ separating hyperplane.
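A numpy sketch of that computation (the derivation on the slides' lost figure is not reproduced; this is the standard closed form):

```python
import numpy as np

def lda_direction(X1, X2):
    """Fisher LDA: alpha = S_W^{-1} (m1 - m2), with S_W the pooled
    within-class scatter. np.linalg.solve avoids an explicit inverse."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    S_W = (np.cov(X1, rowvar=False) * (len(X1) - 1)
           + np.cov(X2, rowvar=False) * (len(X2) - 1))
    alpha = np.linalg.solve(S_W, m1 - m2)
    return alpha / np.linalg.norm(alpha)
```

A new point x can then be assigned by comparing its projection αᵀx with, e.g., the projected midpoint αᵀ(m₁ + m₂)/2.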

17 Maximum Likelihood discrimination Suppose we knew the distribution of points in each class. –We can compute Pr(x | class i) for every class i, and take the maximum.

18 ML discrimination recipe We know the distribution for each class, but not its parameters. Estimate the mean and variance for each class. For a new point x, compute the discrimination function gᵢ(x) for each class i. Choose argmaxᵢ gᵢ(x) as the class for x.
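A 1-D sketch of the recipe with gᵢ(x) taken to be the Gaussian log-likelihood; the data values are hypothetical:

```python
import numpy as np

def fit_class(values):
    # Estimate the class's mean and standard deviation (normal assumption).
    return values.mean(), values.std(ddof=1)

def g(x, mu, sd):
    # Discrimination function: log of the normal density of x under (mu, sd).
    return -0.5 * ((x - mu) / sd) ** 2 - np.log(sd * np.sqrt(2 * np.pi))

# Hypothetical 1-D expression values for the two classes:
params = [fit_class(np.array([1.0, 1.2, 0.9, 1.1])),   # class 0
          fit_class(np.array([2.9, 3.1, 3.0, 3.2]))]   # class 1

x = 2.5
print(np.argmax([g(x, mu, sd) for mu, sd in params]))  # -> 1 (closer to class 1)
```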

19 ML discrimination Suppose all the points were in 1 dimension, and all classes were normally distributed. (Figure: two normal densities with means μ₁ and μ₂ along the x axis.)

20 ML discrimination (multi-dimensional case) Not part of the syllabus.

21 Dimensionality reduction Many genes have highly correlated expression profiles. By discarding some of the genes, we can greatly reduce the dimensionality of the problem. There are other, more principled ways to do such dimensionality reduction.

22 Principal Components Analysis Consider the expression values of 2 genes over 6 samples. Clearly, the expression of the two genes is highly correlated. Projecting all the points onto a single line could explain most of the data. This is a generalization of “discarding the gene”.

23 PCA Suppose all of the data were to be reduced by projecting onto a single line α through the mean m. How do we select the line α?

24 PCA cont’d Let each point xₖ map to x′ₖ. We want to minimize the error Σₖ ‖xₖ − x′ₖ‖² Observation 1: Each point xₖ maps to x′ₖ = m + (αᵀ(xₖ − m))α
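A numpy sketch tying the pieces together. The SVD route is one standard way to find the best line; the slides do not prescribe a method:

```python
import numpy as np

def first_principal_axis(X):
    """The error-minimizing line through the mean is the top right singular
    vector of the centred data (equivalently, the leading eigenvector of
    the sample covariance matrix)."""
    m = X.mean(axis=0)
    Xc = X - m
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    alpha = Vt[0]                             # unit vector along the best line
    # Observation 1: x'_k = m + (alpha^T (x_k - m)) alpha, for every point at once.
    X_hat = m + np.outer(Xc @ alpha, alpha)
    return alpha, X_hat
```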

