DNA Microarray Bioinformatics - #27611 Program Normalization exercise (from last week) Dimension reduction theory (PCA/Clustering) Dimension reduction.

Presentation transcript:

DNA Microarray Bioinformatics - #27611
Program
- Normalization exercise (from last week)
- Dimension reduction theory (PCA/Clustering)
- Dimension reduction exercise

The DNA Array Analysis Pipeline
[Diagram. Boxes shown: Sample Preparation, Hybridization, Array design, Probe design, Question, Experimental Design, Buy Chip/Array, Statistical Analysis, Fit to Model (time series), Expression Index Calculation, Advanced Data Analysis, Clustering, PCA, Classification, Promoter Analysis, Meta analysis, Survival analysis, Regulatory Network, Comparable Gene Expression Data, Normalization, Image analysis]

Dimension reduction methods
- Principal component analysis (PCA)
- Cluster analysis
- Multidimensional scaling
- Correspondence analysis
- Singular value decomposition
Slides stolen more or less from Agnieszka Juncker.

Principal Component Analysis (PCA)
- used for visualization of complex data
- developed to capture as much of the variation in data as possible

Principal components
- 1st principal component (PC1): the direction along which there is greatest variation
- 2nd principal component (PC2): the direction with maximum variation left in the data, orthogonal to PC1
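The two definitions above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function names are made up for this example, the directions are found by power iteration with deflation rather than by a library eigen-solver, and rows are taken to be samples (e.g. patients) while columns are variables (e.g. genes).

```python
import math

def center(rows):
    """Subtract each column's mean so the data cloud is centred on the origin."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]

def covariance(centered):
    """Sample covariance matrix of already-centred rows (needs at least 2 rows)."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def leading_direction(cov, iterations=500):
    """Power iteration: unit vector along which the variance is greatest."""
    d = len(cov)
    # Arbitrary start vector (assumption); it must not be orthogonal
    # to the direction we are looking for.
    v = [float(i + 1) for i in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    # Variance of the data along v.
    variance = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
    return v, variance

def principal_components(rows, k=2):
    """Return [(direction, variance), ...] for the first k components."""
    cov = covariance(center(rows))
    d = len(cov)
    components = []
    for _ in range(k):
        v, var = leading_direction(cov)
        components.append((v, var))
        # Deflation: remove the variation already captured, so the next
        # direction has "maximum variation left" and is orthogonal to v.
        cov = [[cov[i][j] - var * v[i] * v[j] for j in range(d)]
               for i in range(d)]
    return components
```

Calling `principal_components(rows, k=2)` on a small samples-by-variables table returns PC1 and PC2 with the variance along each. For a real expression matrix with thousands of genes, a dedicated linear-algebra routine would be the more practical choice.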

Principal components

PCA on all genes
Leukemia data, precursor B and T
Plot of 34 patients, dimension of 8973 genes reduced to 2

PCA of genes (Leukemia data)
Plot of 8973 genes, dimension of 34 patients reduced to 2

Principal components
General about principal components
- summary variables
- linear combinations of the original variables
- uncorrelated with each other
- capture as much of the original variance as possible
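These properties can be written compactly in standard textbook notation. The symbols are assumptions introduced here, not taken from the slides: x is a centred data vector with p variables, S is its sample covariance matrix, and w_k and λ_k are the eigenvectors and eigenvalues of S.

```latex
% Assumed notation: x centred data vector, S sample covariance matrix (p x p).
\[
  S\,w_k = \lambda_k w_k, \qquad
  \lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_p \ge 0, \qquad
  w_j^{\top} w_k = \delta_{jk}
\]
% Each principal component is a linear combination of the original variables;
% the components are uncorrelated and their variances are the eigenvalues.
\[
  z_k = w_k^{\top} x, \qquad
  \operatorname{Var}(z_k) = \lambda_k, \qquad
  \operatorname{Cov}(z_j, z_k) = 0 \quad (j \ne k)
\]
% Share of the original variance captured by the first m components.
\[
  \frac{\sum_{k=1}^{m} \lambda_k}{\sum_{k=1}^{p} \lambda_k}
\]
```

The last ratio is the quantity usually plotted per component to decide how many components are worth keeping.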

Principal components - Variance

Clustering methods
- Hierarchical
  - agglomerative (bottom-up)
  - divisive (top-down)
- Partitioning
  - e.g. K-means clustering

Hierarchical clustering
- Representation of all pairwise distances
- Parameters: none (other than the choice of distance measure)
- Results:
  - all items end up in one large cluster
  - hierarchical tree (dendrogram)
- Deterministic

Hierarchical clustering – UPGMA algorithm
1. Assign each item to its own cluster
2. Join the nearest clusters
3. Re-estimate the distances between clusters
4. Repeat steps 2 and 3 until a single cluster remains (n - 1 joins for n items)
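The UPGMA steps can be sketched as follows. This is a minimal illustration, assuming the input is a precomputed symmetric distance matrix; the function name `upgma` and the nested-tuple tree representation are choices made for this example, not something the slides prescribe.

```python
from itertools import combinations

def upgma(distances, labels):
    """Agglomerative clustering with average linkage (UPGMA).

    distances: symmetric matrix (list of lists) of pairwise item distances.
    labels:    one name per item.
    Returns (joins, tree): joins lists (subtree_a, subtree_b, distance) in
    the order the clusters were merged; tree is a nested tuple.
    """
    n = len(labels)
    # Step 1: every item starts in its own cluster: id -> (subtree, member indices).
    clusters = {i: (labels[i], [i]) for i in range(n)}
    # Current distances between clusters, keyed by a pair of cluster ids.
    dist = {(i, j): distances[i][j] for i, j in combinations(range(n), 2)}
    joins = []
    next_id = n

    while len(clusters) > 1:
        # Step 2: join the two nearest clusters.
        (a, b), d = min(dist.items(), key=lambda item: item[1])
        tree_a, members_a = clusters.pop(a)
        tree_b, members_b = clusters.pop(b)
        joins.append((tree_a, tree_b, d))
        merged = members_a + members_b

        # Step 3: re-estimate distances to the new cluster as the mean of
        # all item-to-item distances between the two clusters.
        dist = {pair: v for pair, v in dist.items()
                if a not in pair and b not in pair}
        for c, (_, members_c) in clusters.items():
            total = sum(distances[i][j] for i in merged for j in members_c)
            dist[(c, next_id)] = total / (len(merged) * len(members_c))

        clusters[next_id] = ((tree_a, tree_b), merged)
        next_id += 1

    tree, _ = next(iter(clusters.values()))
    return joins, tree
```

The list of joins, with the distance at which each join happened, is the information a dendrogram draws. The procedure has no random element, which is why the result is deterministic for a given distance matrix.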

Hierarchical clustering

Hierarchical clustering
- Data with clustering order and distances
- Dendrogram representation

Leukemia data - clustering of patients

Leukemia data - clustering of patients on top 100 significant genes

Leukemia data - clustering of genes

K-means clustering
- Partition data into K clusters
- Parameter: the number of clusters (K) must be chosen
- Randomized initialization:
  - can give different clusters on each run

K-means - Algorithm
1. Assign each item a class in 1 to K (randomly)
2. For each class 1 to K:
   - calculate the centroid (one of the K means)
   - calculate the distance from the centroid to each item
3. Assign each item to the nearest centroid
4. Repeat steps 2 and 3 until no items are re-assigned (convergence)
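The algorithm can be sketched in plain Python as below. This is an illustration only: the function names, the `seed` and `max_rounds` parameters, the use of squared Euclidean distance, and the rule for an empty class (re-seed it with a randomly chosen item) are all assumptions made for this example, since the slides do not specify them.

```python
import random

def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return [sum(col) / len(points) for col in zip(*points)]

def k_means(items, k, seed=0, max_rounds=100):
    """Return (labels, centroids) for items given as lists of numbers."""
    rng = random.Random(seed)
    # Step 1: assign each item a class in 0..k-1 at random.
    labels = [rng.randrange(k) for _ in items]
    centroids = []
    for _ in range(max_rounds):
        # Step 2: calculate the centroid of each class.
        centroids = []
        for c in range(k):
            members = [p for p, label in zip(items, labels) if label == c]
            # Assumption: an empty class is re-seeded with a random item.
            centroids.append(centroid(members) if members
                             else list(rng.choice(items)))
        # Step 3: assign each item to the nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in items
        ]
        # Step 4: stop when no item is re-assigned (convergence).
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centroids
```

Because the starting classes are random, different seeds can lead to different final clusters on less clearly separated data. A common remedy is to run the procedure several times and keep the partition with the smallest total within-cluster distance.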

K-means clustering, K=3

K-means clustering of Leukemia data

Steen Knudsen: A Biologist’s guide to Analysis of microarray data.
- Chapter 4: Visualization by Reduction of Dimensionality (PCA)
- Chapter 5: Cluster Analysis

Bioinformatics – Real science or fortune-telling?
1. Changing paradigm
   - From final answer to qualified guessing
   - From student to “real” scientist
   - YOU are evolving with this course
2. Always cheating
   - NOT real biology - approximation to the truth
   - No final answer (in our lifetime)