DNA Microarray Bioinformatics - #27611 Program Normalization exercise (from last week) Dimension reduction theory (PCA/Clustering) Dimension reduction.


1 DNA Microarray Bioinformatics - #27611
Program
- Normalization exercise (from last week)
- Dimension reduction theory (PCA/Clustering)
- Dimension reduction exercise

2 The DNA Array Analysis Pipeline
Flowchart boxes: Sample Preparation; Hybridization; Array design; Probe design; Question; Experimental Design; Buy Chip/Array; Statistical Analysis; Fit to Model (time series); Expression Index Calculation; Advanced Data Analysis; Clustering; PCA; Classification; Promoter Analysis; Meta analysis; Survival analysis; Regulatory Network; Comparable Gene Expression Data; Normalization; Image analysis

3 The DNA Array Analysis Pipeline (same flowchart as slide 2)

4 Dimension reduction methods
- Principal component analysis (PCA)
- Cluster analysis
- Multidimensional scaling
- Correspondence analysis
- Singular value decomposition
Slides stolen more or less from Agnieszka Juncker.

5 Dimension reduction methods (same list as slide 4)

6 Principal Component Analysis (PCA)
- Used for visualization of complex data
- Developed to capture as much of the variation in the data as possible

7 Principal components
- First principal component (PC1): the direction along which there is the greatest variation
- Second principal component (PC2): the direction with the maximum variation left in the data, orthogonal to PC1
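The two definitions above can be tried out on a small example. The following is only a sketch, assuming power iteration on the sample covariance matrix as the way to find the directions (the slides do not say how the components are computed); the toy data and all function names are made up for illustration.

```python
import math

def center(rows):
    """Subtract the column mean from every column."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]

def covariance(rows):
    """Sample covariance matrix of already-centered rows."""
    n, p = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(p)]
            for i in range(p)]

def leading_direction(cov, iters=500):
    """Power iteration: returns (unit vector, variance along that vector)."""
    p = len(cov)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    var = sum(v[i] * cov[i][j] * v[j] for i in range(p) for j in range(p))
    return v, var

def deflate(cov, v, var):
    """Remove the variance already captured along direction v."""
    p = len(cov)
    return [[cov[i][j] - var * v[i] * v[j] for j in range(p)] for i in range(p)]

# Toy data (assumption): 6 samples measured on 3 variables.
data = [[2.5, 2.4, 0.5], [0.5, 0.7, 1.9], [2.2, 2.9, 0.8],
        [1.9, 2.2, 1.1], [3.1, 3.0, 0.3], [2.3, 2.7, 1.0]]

cov = covariance(center(data))
pc1, var1 = leading_direction(cov)                       # greatest variation
pc2, var2 = leading_direction(deflate(cov, pc1, var1))   # greatest remaining variation
dot = sum(a * b for a, b in zip(pc1, pc2))               # close to 0: PC2 is orthogonal to PC1
```

The sign of each direction is arbitrary, so `pc1` and `-pc1` describe the same component.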

8 Principal components

9 PCA on all genes
Leukemia data, precursor B and T. Plot of 34 patients; dimension of 8973 genes reduced to 2.

10 PCA of genes (Leukemia data)
Plot of 8973 genes; dimension of 34 patients reduced to 2.

11 Principal components
General properties of principal components:
- summary variables
- linear combinations of the original variables
- uncorrelated with each other
- capture as much of the original variance as possible
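The properties listed above can be checked numerically in the two-variable case, where the eigenvalues of the covariance matrix have a closed form. This is a minimal sketch on made-up data; the variable names are illustrative and not taken from the course material.

```python
import math

# Toy data (assumption): 5 samples measured on 2 variables.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.2, 1.9, 3.2, 3.8, 5.1]

def mean(v):
    return sum(v) / len(v)

def cov(u, v):
    """Sample covariance of two equally long lists."""
    mu, mv = mean(u), mean(v)
    return sum((p - mu) * (q - mv) for p, q in zip(u, v)) / (len(u) - 1)

a, b, c = cov(x, x), cov(x, y), cov(y, y)   # covariance matrix [[a, b], [b, c]]

# Eigenvalues of a symmetric 2x2 matrix in closed form.
half_trace = (a + c) / 2
radius = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
lam1, lam2 = half_trace + radius, half_trace - radius

def unit(v):
    n = math.hypot(v[0], v[1])
    return (v[0] / n, v[1] / n)

# Eigenvectors (this formula is valid because b != 0 for this data).
pc1 = unit((b, lam1 - a))
pc2 = unit((b, lam2 - a))

# Each score is a linear combination of the centered original variables.
xc = [v - mean(x) for v in x]
yc = [v - mean(y) for v in y]
score1 = [pc1[0] * p + pc1[1] * q for p, q in zip(xc, yc)]
score2 = [pc2[0] * p + pc2[1] * q for p, q in zip(xc, yc)]

# Fraction of the total variance captured by each component.
explained = [lam1 / (lam1 + lam2), lam2 / (lam1 + lam2)]
```

Here `cov(score1, score2)` is zero up to rounding (the components are uncorrelated), and the variance of `score1` equals `lam1`, the largest variance any single direction can capture.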

12 Principal components - Variance

13 Clustering methods
Hierarchical
- agglomerative (bottom-up)
- divisive (top-down)
Partitioning
- e.g. K-means clustering

14 Hierarchical clustering
- Representation of all pairwise distances
- Parameters: none (except the choice of distance measure)
- Results: all items joined in one large cluster; a hierarchical tree (dendrogram)
- Deterministic
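Since the distance measure is the one choice left open above, it may help to see how much it matters. The sketch below builds the matrix of all pairwise distances for three hypothetical expression profiles, once with Euclidean distance and once with a correlation-based distance (1 minus the Pearson correlation). Both measures are common choices but are assumptions here; the slides do not name one.

```python
import math

def euclidean(u, v):
    """Straight-line distance between two profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def correlation_distance(u, v):
    """1 - Pearson correlation: 0 for identical shapes, 2 for opposite shapes."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    du = [a - mu for a in u]
    dv = [b - mv for b in v]
    num = sum(a * b for a, b in zip(du, dv))
    den = math.sqrt(sum(a * a for a in du) * sum(b * b for b in dv))
    return 1.0 - num / den

def distance_matrix(items, dist):
    """Full symmetric matrix of all pairwise distances."""
    n = len(items)
    return [[dist(items[i], items[j]) for j in range(n)] for i in range(n)]

# Hypothetical expression profiles for three genes over four samples.
profiles = [[1.0, 2.0, 3.0, 4.0],
            [10.0, 20.0, 30.0, 40.0],   # same shape as the first, larger scale
            [4.0, 3.0, 2.0, 1.0]]       # opposite trend to the first

d_euc = distance_matrix(profiles, euclidean)
d_cor = distance_matrix(profiles, correlation_distance)
```

With Euclidean distance the first profile is closest to the third (similar magnitude); with correlation distance it is closest to the second (same shape), so the resulting tree can differ with the measure.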

15 Hierarchical clustering - UPGMA algorithm
- Assign each item to its own cluster
- Join the nearest clusters
- Re-estimate the distances between clusters
- Repeat until one cluster remains (n - 1 joins for n items)
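The steps above can be sketched as follows. This is an illustrative implementation, not the course's own code: it assumes the input is a full symmetric distance matrix and re-estimates distances as the size-weighted average of the two joined clusters, which is the UPGMA (average linkage) rule. The function name and toy matrix are made up.

```python
def upgma(dist):
    """Average-linkage agglomerative clustering on a full distance matrix.

    Returns the list of joins as (members_a, members_b, distance).
    """
    # Step 1: every item starts in its own cluster.
    clusters = {i: [i] for i in range(len(dist))}
    # Distances between current clusters, keyed by an ordered id pair.
    d = {(i, j): dist[i][j] for i in clusters for j in clusters if i < j}
    merges = []
    next_id = len(dist)
    while len(clusters) > 1:
        # Step 2: join the nearest pair of clusters.
        (a, b), height = min(d.items(), key=lambda kv: kv[1])
        merges.append((clusters[a], clusters[b], height))
        size_a, size_b = len(clusters[a]), len(clusters[b])
        # Step 3: re-estimate the distance from the new cluster to all others.
        new_d = {}
        for c in clusters:
            if c in (a, b):
                continue
            dac = d[(min(a, c), max(a, c))]
            dbc = d[(min(b, c), max(b, c))]
            new_d[c] = (size_a * dac + size_b * dbc) / (size_a + size_b)
        # Drop every distance that involved one of the two joined clusters.
        d = {k: v for k, v in d.items() if a not in k and b not in k}
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        for c, v in new_d.items():
            d[(c, next_id)] = v   # existing ids are always smaller than next_id
        next_id += 1
    return merges

# Toy distance matrix (assumption) for four items.
dist = [[0.0, 2.0, 6.0, 10.0],
        [2.0, 0.0, 6.0, 10.0],
        [6.0, 6.0, 0.0, 8.0],
        [10.0, 10.0, 8.0, 0.0]]
merges = upgma(dist)
heights = [m[2] for m in merges]
```

For the four items here the loop performs three joins, and the join distances in `heights` are the heights at which the branches of the dendrogram meet.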

16 Hierarchical clustering

17 Hierarchical clustering
Data with clustering order and distances; dendrogram representation

18 Leukemia data - clustering of patients

19 Leukemia data - clustering of patients on top 100 significant genes
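The slide does not say how the top genes were chosen. As one possible reading, the sketch below ranks genes by the absolute value of Welch's t-statistic between two patient classes and keeps the top k. The choice of test, the gene names, and the expression values are all assumptions for illustration.

```python
import math

def welch_t(a, b):
    """Welch's t-statistic for two groups of expression values."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    va = sum((x - ma) ** 2 for x in a) / (len(a) - 1)
    vb = sum((x - mb) ** 2 for x in b) / (len(b) - 1)
    return (ma - mb) / math.sqrt(va / len(a) + vb / len(b))

def top_genes(expr, labels, k):
    """Return the k gene names with the largest |t| between the two classes."""
    classes = sorted(set(labels))
    scores = {}
    for gene, values in expr.items():
        g0 = [v for v, lab in zip(values, labels) if lab == classes[0]]
        g1 = [v for v, lab in zip(values, labels) if lab == classes[1]]
        scores[gene] = abs(welch_t(g0, g1))
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Hypothetical data: 3 genes measured in 6 patients (3 per class).
labels = ["B", "B", "B", "T", "T", "T"]
expr = {
    "geneA": [5.1, 4.9, 5.0, 9.0, 9.2, 8.9],   # clearly different between classes
    "geneB": [7.0, 6.5, 7.4, 7.1, 6.8, 7.2],   # no real difference
    "geneC": [3.0, 3.4, 2.8, 4.1, 4.5, 3.9],   # moderately different
}
selected = top_genes(expr, labels, k=2)
```

Because the genes are selected using the class labels, a clustering on the selected genes will tend to separate those classes more cleanly than a clustering on all genes.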

20 The DNA Array Analysis Pipeline (same flowchart as slide 2)

21 Leukemia data - clustering of patients on top 100 significant genes

22 Leukemia data - clustering of genes

24 K-means clustering
- Partition data into K clusters
- Parameter: the number of clusters (K) must be chosen
- Randomized initialization: may give different clusters each time

25 K-means - Algorithm
- Assign each item a class in 1 to K (randomly)
- For each class 1 to K:
  - Calculate the centroid (one of the K means)
  - Calculate the distance from the centroid to each item
- Assign each item to the nearest centroid
- Repeat until no items are re-assigned (convergence)
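The algorithm above translates fairly directly into code. The following is a minimal sketch under a few assumptions not stated on the slide: Euclidean distance, a fixed random seed so the run is repeatable, an iteration cap, and re-seeding of a class that ends up empty. The function name and the toy points are made up.

```python
import math
import random

def kmeans(items, k, seed=0, max_iter=100):
    """Return (labels, centroids) after K-means with random initial classes."""
    rng = random.Random(seed)
    # Assign each item a class in 0..k-1 at random.
    labels = [rng.randrange(k) for _ in items]
    centroids = []
    for _ in range(max_iter):
        # Calculate the centroid (mean point) of each class.
        centroids = []
        for c in range(k):
            members = [x for x, lab in zip(items, labels) if lab == c]
            if not members:
                # Assumption: an empty class is re-seeded with a random item.
                members = [rng.choice(items)]
            centroids.append([sum(col) / len(members) for col in zip(*members)])
        # Assign each item to the nearest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(x, centroids[c]))
                      for x in items]
        if new_labels == labels:   # no item was re-assigned: convergence
            break
        labels = new_labels
    return labels, centroids

# Toy data (assumption): two well-separated groups of points in 2D.
points = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
          [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
labels, centroids = kmeans(points, k=2, seed=1)
```

Which group receives which class number depends on the random start, so results are best compared by group membership rather than by label. On less clearly separated data, different seeds can also lead to different final clusters.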

26 K-means clustering, K=3

27 K-means clustering, K=3

28 K-means clustering, K=3

29 K-means clustering of Leukemia data

30 Steen Knudsen: A Biologist’s guide to Analysis of microarray data.
- Chapter 4: Visualization by Reduction of Dimensionality (PCA)
- Chapter 5: Cluster Analysis

31 Bioinformatics - Real science or fortune-telling?
1. Changing paradigm
- From final answer to qualified guessing
- From student to “real” scientist
- YOU are evolving with this course
2. Always cheating
- NOT real biology - approximation to the truth
- No final answer (in our lifetime)

