Microarray analysis Algorithms in Computational Biology Spring 2006 Written by Itai Sharon.

2 Microarrays
- Measure the expression of genes in the cell: "count" the number of mRNA molecules that attach to biological probes
- Expression data is gathered for many (thousands of) genes at once
- Data is gathered over several experiments, either at several time points or under different conditions

3 The Expression Matrix
Entry e_ij of the expression matrix gives the relative expression of gene i in experiment j.

4 Detecting Patterns in Expression Data
Genes may have similar expression patterns because:
- They are part of the same complex (protein-protein interactions)
- They are part of the same pathway
- They have similar regulatory elements
- They have similar functions (e.g., part of a fail-safe mechanism)
A popular solution: clustering (seen already) - hierarchical clustering, K-means, agglomerative, ...
Today: dimensionality reduction - PCA and SVD

5 Why Dimensionality Reduction?
- Using irrelevant dimensions may harm accuracy
- Clustering algorithms do not perform well on high-dimensional data
- High-dimensional data is hard to visualize without reduction

6 Principal Component Analysis (PCA)
- PCA seeks a linear projection that best describes the data in a least-mean-squares sense
- It finds a set of principal components (PCs); each PC defines a projection that captures the maximum amount of variation in the dataset
- Each PC is orthogonal to all other PCs
- Reduce dimensionality by keeping the most informative PCs: to reduce from dimension d to dimension d', pick the d' most informative PCs

7 PCA - Steps
Input: a dataset S of n points in d dimensions.
- Subtract the mean from each dimension
- Compute the covariance matrix Σ for the d dimensions
The covariance of two variables X and Y:
cov(X, Y) = Σ_{i=1}^{n} (x_i − x̄)(y_i − ȳ) / (n − 1)
The covariance matrix Σ is the d × d matrix whose (i, j) entry is the covariance of dimensions i and j.

8 PCA - Steps (cont.)
- Compute the eigenvectors and eigenvalues of the covariance matrix
- Choose the most informative PCs and construct a feature vector: eigenvectors with the highest eigenvalues carry the most information; the feature vector is simply the combination of the chosen eigenvectors, FeatureVector = (eig_1, eig_2, ..., eig_d')
- Transform the dataset to the new axis system: for each s ∈ S, the transformed point is FeatureVector^T × (s − mean)
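The steps above can be sketched in a few lines of numpy. This is a minimal illustration, not the lecture's own code; the function name and the toy random dataset are assumptions.

```python
import numpy as np

def pca(data, d_new):
    """PCA following the slides' steps: center, covariance,
    eigendecompose, project. data: (n, d) array; d_new: target d'."""
    # Step 1: subtract the mean from each dimension
    centered = data - data.mean(axis=0)
    # Step 2: covariance matrix (d x d); columns are dimensions
    cov = np.cov(centered, rowvar=False)
    # Step 3: eigenvectors/eigenvalues (eigh: for symmetric matrices)
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Step 4: keep the d' eigenvectors with the largest eigenvalues
    order = np.argsort(eigvals)[::-1][:d_new]
    feature_vector = eigvecs[:, order]   # FeatureVector = (eig_1, ..., eig_d')
    # Step 5: transform each (centered) sample into the new axis system
    return centered @ feature_vector

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))   # 100 samples, d = 5
Y = pca(X, 2)                   # reduced to d' = 2
print(Y.shape)                  # (100, 2)
```

By construction, the first output column has the largest variance, the second the next largest.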

9 When Things Get Messy…
- PCA is fine when the initial dimension d is not too big: the covariance matrix alone has O(d²) entries to compute and store
- Otherwise we have a problem: e.g., when d = 10⁴, that is already O(10⁸) entries…
- Luckily an alternative exists: SVD
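The reason SVD helps can be seen directly in numpy: the SVD of the centered data yields the principal directions and eigenvalues without ever forming the d × d covariance matrix. A minimal sketch (the toy sizes and random data are assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 50, 200                        # more dimensions than samples
X = rng.normal(size=(n, d))
Xc = X - X.mean(axis=0)               # center each dimension

# SVD of the centered data: Xc = U S Vt.
# Rows of Vt are the eigenvectors of the covariance matrix, and
# S**2 / (n - 1) are its eigenvalues -- no d x d matrix is formed.
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
eigvals_svd = S**2 / (n - 1)

# Cross-check against the direct (expensive) covariance route
eigvals_cov = np.sort(np.linalg.eigvalsh(np.cov(Xc, rowvar=False)))[::-1]
print(np.allclose(eigvals_svd[:49], eigvals_cov[:49]))  # True
```

The SVD route works on the n × d data matrix directly, which is why it stays feasible when d is large but n is modest.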

10 Eigengenes, Eigenarrays and SVD
The idea: use the singular value decomposition (SVD) theorem to transform the dataset from the gene/array space to the eigengene/eigenarray space.
- Each dimension is represented by an eigengene/eigenarray/eigenvalue triplet
- Eigenvalues are used for ranking dimensions
Paper: Alter et al., 2000

11 Singular Value Decomposition (SVD)
Theorem: if E is a real M × N matrix, then there exist orthogonal matrices U (M × M) and V (N × N) such that
E = U W V^T
where W is an M × N diagonal matrix carrying the singular values σ_1 ≥ σ_2 ≥ … ≥ σ_p ≥ 0 on its diagonal, with p = min(M, N).

12 SVD
σ_i is the i-th singular value of E; u_i and v_i are the i-th left and right singular vectors of E, respectively. It holds that
E = Σ_{i=1}^{p} σ_i u_i v_i^T
Efficient algorithms for calculating the SVD exist.
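The theorem and the rank-one expansion above can be checked numerically; a minimal sketch on a small random matrix (the sizes are assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)
E = rng.normal(size=(6, 4))        # a real M x N matrix (M=6, N=4)

U, S, Vt = np.linalg.svd(E)        # full SVD: U is 6x6, Vt is 4x4
W = np.zeros((6, 4))
np.fill_diagonal(W, S)             # singular values on the diagonal, descending

# The theorem: E = U W V^T, with U and V orthogonal
assert np.allclose(E, U @ W @ Vt)
assert np.allclose(U.T @ U, np.eye(6))    # columns of U are orthonormal
assert np.allclose(Vt @ Vt.T, np.eye(4))  # columns of V are orthonormal

# Rank-one expansion: E = sum_i sigma_i * u_i v_i^T
E_sum = sum(S[i] * np.outer(U[:, i], Vt[i]) for i in range(4))
print(np.allclose(E, E_sum))  # True
```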

13–14 Orthogonality of Decomposition
(Figure slides: the columns of U and the columns of V each form an orthonormal set.)

15 SVD and Microarray Analysis
- Reduction from the N-genes × M-arrays space to a p-eigengenes × p-eigenarrays space
- W is the eigenexpression matrix
- U represents the expression of genes over eigenarrays
- V represents the expression of eigengenes over arrays
The "fraction of eigenexpression" of the k-th eigengene:
p_k = σ_k² / Σ_{i=1}^{p} σ_i²
"Shannon entropy" of the dataset (0 = all expression in one eigengene, 1 = spread evenly):
d = −(1 / log p) Σ_{k=1}^{p} p_k log p_k
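Both quantities follow directly from the singular values; a minimal sketch on a toy expression matrix (the random data is an assumption, the formulas follow Alter et al., 2000):

```python
import numpy as np

rng = np.random.default_rng(3)
E = rng.normal(size=(30, 8))              # toy matrix: 30 genes x 8 arrays
sigma = np.linalg.svd(E, compute_uv=False)
p = len(sigma)

# Fraction of eigenexpression captured by each eigengene
frac = sigma**2 / np.sum(sigma**2)

# Shannon entropy of the dataset, normalized to [0, 1]:
# 0 = all expression in one eigengene, 1 = spread evenly
entropy = -np.sum(frac * np.log(frac)) / np.log(p)

print(round(frac.sum(), 6))  # 1.0 -- the fractions sum to one
```

Ranking eigengenes by `frac` is what "eigenvalues are used for ranking dimensions" means in practice.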

16 Example: Cell Cycle of Saccharomyces cerevisiae
- Data is available for 5981 genes over 14 time steps (at ½-hour intervals)
- 784 genes were classified as cell-cycle regulated (with no missing values)

17 Data Sorting
- For eigengenes φ_1 and φ_2, plot the correlation of each gene g with both on a 2-D plot: the X-axis represents the correlation with φ_2, the Y-axis the correlation with φ_1
- Sort genes by angular distance
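The sorting step can be sketched as follows: correlate each gene with the first two eigengenes, then order genes by the angle of the resulting (x, y) point. This is an illustrative sketch on random data, not the paper's pipeline; the variable names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
n_genes, n_arrays = 20, 14
E = rng.normal(size=(n_genes, n_arrays))
E -= E.mean(axis=1, keepdims=True)      # center each gene's profile

# Eigengenes are the rows of Vt (expression patterns over arrays)
U, S, Vt = np.linalg.svd(E, full_matrices=False)
phi1, phi2 = Vt[0], Vt[1]

def corr(a, b):
    """Pearson correlation of two expression profiles."""
    return np.corrcoef(a, b)[0, 1]

# x = correlation with eigengene 2, y = correlation with eigengene 1
x = np.array([corr(g, phi2) for g in E])
y = np.array([corr(g, phi1) for g in E])

# Sort genes by angular distance around the origin of the (x, y) plot
angles = np.arctan2(y, x)
order = np.argsort(angles)
print(len(order) == n_genes)  # True
```

Genes adjacent in `order` have similar phase in the plane spanned by the two leading eigengenes, which is what makes this sort useful for cyclic (cell-cycle) expression data.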

18 Further Reading
PCA: L. Smith, A Tutorial on Principal Components Analysis.
Eigengenes, eigenarrays and SVD: O. Alter, P. O. Brown & D. Botstein, Singular Value Decomposition for Genome-wide Expression Data Processing and Modeling, PNAS 97(18), 2000.