Microarrays.

Microarrays

OUTLINE: Microarrays; Processing Microarray Data; K-Means Clustering; Hierarchical Clustering; SOM

Microarrays Gene Expression: We see differences between cells because of differential gene expression. A gene is expressed by transcribing DNA into single-stranded mRNA; the mRNA is later translated into a protein. Microarrays measure the level of mRNA expression.

Microarrays Gene Expression: mRNA expression represents the dynamic aspects of the cell. mRNA is isolated and labeled with a fluorescent material, then hybridized to the target; the level of hybridization corresponds to the light emitted under laser excitation, which is what the scanner measures.

Microarrays Animation (by A. Malcolm Campbell): http://www.bio.davidson.edu/Courses/genomics/chip/chip.html

Microarrays Sample Application (oncotypeDX): http://www.oncotypedx.com/en-US/Breast.aspx

Microarrays Sample Application (oncotypeDX): VIDEO http://www.oncotypedx.com/en-US/Breast.aspx

Processing Microarray Data Problems: Extract data from microarrays, Analyze the meaning of the multiple arrays.

Processing Microarray Data Differentiating gene expression (R and G are the red and green channel intensities of a spot): R = G → not differentiated; R > G → up-regulated; R < G → down-regulated
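The comparison of R and G above can be sketched in a few lines. This is only an illustration: the function name `classify_spot` and the two-fold cutoff (`threshold=1.0` on the log2 ratio) are assumptions made here, not values taken from the slides. A log ratio is used because exact equality of two measured intensities almost never occurs in practice.

```python
import math

def classify_spot(red, green, threshold=1.0):
    """Classify one spot from its red (R) and green (G) intensities.

    threshold is a hypothetical cutoff on |log2(R/G)|; 1.0 means two-fold.
    """
    log_ratio = math.log2(red / green)
    if log_ratio > threshold:
        return "up-regulated"
    if log_ratio < -threshold:
        return "down-regulated"
    return "not differentiated"

# Toy intensities, chosen only for illustration.
for red, green in [(800.0, 200.0), (100.0, 400.0), (300.0, 300.0)]:
    print(red, green, classify_spot(red, green))
```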

Processing Microarray Data Characteristics of microarray data: Experiment = (gene1, gene2, …, geneN); Gene = (experiment1, experiment2, …, experimentM); N is often on the order of 10^4; M is often on the order of 10^1
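One way to picture the two views is a matrix with one row per gene and one column per experiment. The sketch below uses made-up toy values and hypothetical helper names (`gene_profile`, `experiment_profile`); real data would have roughly 10^4 rows and 10^1 columns.

```python
# Toy expression matrix (made-up values): rows are genes, columns are experiments.
expression = [
    [2.1, 0.4, 1.7],
    [0.3, 0.2, 0.5],
    [1.9, 0.5, 1.6],
    [0.1, 3.2, 0.2],
]

def gene_profile(matrix, gene_index):
    """Gene = (experiment1, ..., experimentM): one row of the matrix."""
    return list(matrix[gene_index])

def experiment_profile(matrix, experiment_index):
    """Experiment = (gene1, ..., geneN): one column of the matrix."""
    return [row[experiment_index] for row in matrix]

print(gene_profile(expression, 0))
print(experiment_profile(expression, 1))
```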

Processing Microarray Data Data Analysis: Clustering: Which genes have similar functions? Subdivide genes or experiments into meaningful classes. Classification: Can we correctly classify an unknown experiment or gene into a known class? For example: can we make better treatment decisions for a cancer patient based on the gene expression profile?

Processing Microarray Data Clustering: Find classes in the data, Identify new classes, Identify gene correlations. Methods: K-means clustering, Hierarchical clustering, Self-Organizing Maps (SOM)

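Processing Microarray Data Distance Measures: Euclidean Distance: $d(x, y) = \sqrt{\sum_{i=1}^{M} (x_i - y_i)^2}$; Manhattan Distance: $d(x, y) = \sum_{i=1}^{M} |x_i - y_i|$, where x and y are two expression profiles of length M.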
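Both distance measures are short enough to write out directly. The function names below are chosen for this sketch, and the two vectors stand for any pair of expression profiles of equal length.

```python
import math

def euclidean(x, y):
    """Square root of the summed squared coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def manhattan(x, y):
    """Sum of the absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(x, y))

print(euclidean([0.0, 0.0], [3.0, 4.0]))  # 5.0
print(manhattan([0.0, 0.0], [3.0, 4.0]))  # 7.0
```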

Processing Microarray Data K-means Clustering: Break the data into K clusters, Start with random partitioning, Improve it by iterating.

Processing Microarray Data K-means Clustering: Select the number of clusters, say k; select k random centroids, {m1, m2,…, mk}; then repeat: assign points (genes in this case) to the cluster of the closest centroid by using a distance measure, and compute new centroids, {m1, m2,…, mk}; until no centroid changes.

Processing Microarray Data K-means Clustering: Select the number of clusters, say k. There are some methods to determine the optimum k; here we assume k is given.

Processing Microarray Data K-means Clustering: Select k random centroids, {m1, m2,…, mk}, Just randomly assign the centroids.

Processing Microarray Data K-means Clustering: Assign points (genes in this case) to the cluster of the closest centroid by using a distance measure. Use the following formula to find the closest centroid to the gene $g_i$: $c(g_i) = \arg\min_{j \in \{1, \dots, k\}} d(g_i, m_j)$, where d is the chosen distance measure. Then assign gene $g_i$ to the cluster of that centroid.
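The assignment step can be sketched as follows. Euclidean distance (via `math.dist`, Python 3.8+) is assumed here as the distance measure, and `closest_centroid` is a name invented for this example.

```python
import math

def closest_centroid(gene, centroids):
    """Return the index j of the centroid m_j nearest to the gene profile."""
    distances = [math.dist(gene, m) for m in centroids]
    return distances.index(min(distances))

centroids = [[0.0, 0.0], [5.0, 5.0]]       # toy centroids
print(closest_centroid([1.0, 1.0], centroids))
print(closest_centroid([4.0, 6.0], centroids))
```

On ties, `index(min(...))` picks the lowest-numbered centroid; any fixed tie-breaking rule would do.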

Processing Microarray Data K-means Clustering: Compute new centroids, {m1, m2,…, mk}. Find the average in the cluster: $m_c = \frac{1}{N_c} \sum_{g_i \in c} g_i$, where $m_c$ is the centroid of cluster c, $N_c$ is the number of points in cluster c, and $g_i$ are the points in cluster c.
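The centroid update is a coordinate-wise mean over the points of one cluster. A minimal sketch, with the helper name `mean_profile` chosen here and a non-empty cluster assumed:

```python
def mean_profile(points):
    """Coordinate-wise average of the points in one (non-empty) cluster."""
    n = len(points)
    return [sum(column) / n for column in zip(*points)]

cluster = [[1.0, 2.0], [3.0, 4.0]]  # toy cluster with N_c = 2
print(mean_profile(cluster))        # [2.0, 3.0]
```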

Processing Microarray Data K-means Clustering: Repeat until no centroid changes. At that point the centroids are in their proper places and no further improvement can be observed, so the algorithm stops.
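Putting the steps together, the whole loop might look like the sketch below. Several details are assumptions of this example rather than part of the slides: the initial centroids are drawn from the data points, Euclidean distance is used, an empty cluster keeps its old centroid, and `max_iter` caps the number of passes.

```python
import math
import random

def k_means(points, k, seed=0, max_iter=100):
    """Plain k-means on a list of equal-length numeric vectors."""
    rng = random.Random(seed)
    # Step 1: pick k random data points as the initial centroids.
    centroids = [list(p) for p in rng.sample(points, k)]
    clusters = [[] for _ in range(k)]
    for _ in range(max_iter):
        # Step 2: assign every point to the cluster of its closest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[j].append(p)
        # Step 3: recompute each centroid as the mean of its cluster.
        new_centroids = []
        for j, members in enumerate(clusters):
            if members:
                new_centroids.append(
                    [sum(col) / len(members) for col in zip(*members)]
                )
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        # Step 4: stop when no centroid has changed.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters

# Toy data: two well-separated groups of points.
points = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
          [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
centroids, clusters = k_means(points, k=2)
print(centroids)
print(clusters)
```

The result of k-means generally depends on the random start, so running it with several seeds and keeping the best partition is a common precaution.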

Processing Microarray Data K-means Clustering DEMO: Our points:

Processing Microarray Data K-means Clustering DEMO: Centroids (4 centroids, squares):

Processing Microarray Data K-means Clustering DEMO: Assign each point to the closest centroid:

Processing Microarray Data K-means Clustering DEMO: Re-evaluate the centroids:

Processing Microarray Data K-means Clustering DEMO: Iterate until no change.

Processing Microarray Data K-means Clustering DEMO 2:

Processing Microarray Data Hierarchical Clustering: Similar to the construction of a phylogenetic tree. A distance matrix for all genes is constructed based on the distances between their expression profiles. Neighbor-joining or UPGMA can be applied to this matrix to get a hierarchical clustering. Linkage options: single linkage, complete linkage, average linkage.

Processing Microarray Data Hierarchical Clustering: Single linkage: the distance between two clusters is given by the value of the shortest link between the clusters

Processing Microarray Data Hierarchical Clustering: Complete linkage: the distance between two clusters is given by the value of the longest link between the clusters

Processing Microarray Data Hierarchical Clustering: Average linkage: the distance between two clusters is defined as the average of the distances between all pairs of objects, one taken from each cluster, as in UPGMA
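The three linkage rules differ only in how they summarize the pairwise distances between two clusters. The sketch below writes them out and plugs them into a simple bottom-up merge loop. The function names, the Euclidean point distance, and the `n_clusters` stopping rule are assumptions made for this example.

```python
import math

def single_linkage(a, b):
    """Shortest link between any point of a and any point of b."""
    return min(math.dist(p, q) for p in a for q in b)

def complete_linkage(a, b):
    """Longest link between any point of a and any point of b."""
    return max(math.dist(p, q) for p in a for q in b)

def average_linkage(a, b):
    """Average over all pairs with one point from a and one from b."""
    return sum(math.dist(p, q) for p in a for q in b) / (len(a) * len(b))

def agglomerate(points, linkage, n_clusters):
    """Start with singletons and keep merging the two closest clusters."""
    clusters = [[p] for p in points]
    while len(clusters) > n_clusters:
        pairs = [(i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        i, j = min(pairs, key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters

a = [[0.0, 0.0], [1.0, 0.0]]
b = [[4.0, 0.0], [6.0, 0.0]]
print(single_linkage(a, b), complete_linkage(a, b), average_linkage(a, b))
print(agglomerate(a + b, single_linkage, n_clusters=2))
```

Recording the order of the merges and the distance at each merge, rather than stopping at `n_clusters`, would give the information needed to draw a dendrogram.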

Processing Microarray Data Hierarchical Clustering: Linkage Criteria:

Processing Microarray Data Agglomerative Hierarchical Clustering:

Processing Microarray Data Agglomerative Hierarchical Clustering DEMO:

Processing Microarray Data Self-Organizing Feature Maps: by Teuvo Kohonen, a data visualization technique which helps to understand high dimensional data by reducing the dimensions of data to a map.

Processing Microarray Data Self-Organizing Feature Maps: Humans simply cannot visualize high-dimensional data as is; a SOM helps us understand such data.

Processing Microarray Data Self-Organizing Feature Maps: Based on competitive learning. A SOM helps us by producing a map of usually 1 or 2 dimensions, and it plots the similarities in the data by grouping similar data items together.

Processing Microarray Data Self-Organizing Feature Maps:

Processing Microarray Data Self-Organizing Feature Maps: Input vector: $x = [x_1, x_2, \dots, x_m]^T$. Synaptic weight vector: $w_j = [w_{j1}, w_{j2}, \dots, w_{jm}]^T$, $j = 1, 2, \dots, l$. Best-matching (winning) neuron: $i(x) = \arg\min_j \|x - w_j\|$, $j = 1, 2, \dots, l$. The weights $w_i$ of the winning neuron are then updated.
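One training step of a SOM can be sketched as below. The slide gives the winner selection but not the update formula, so the rule used here, $w_j \leftarrow w_j + \eta (x - w_j)$, is the standard Kohonen update and is an assumption of this example. The neurons are assumed to lie on a 1-dimensional map, and every neuron within `radius` positions of the winner is moved by the same amount (a simple "bubble" neighborhood). The names `som_step`, `learning_rate` and `radius` are illustrative.

```python
import math

def best_matching_unit(x, weights):
    """Index i(x) of the neuron whose weight vector is closest to x."""
    return min(range(len(weights)), key=lambda j: math.dist(x, weights[j]))

def som_step(x, weights, learning_rate=0.5, radius=1):
    """Move the winner and its map neighbors toward x (in place)."""
    i = best_matching_unit(x, weights)
    for j in range(len(weights)):
        if abs(j - i) <= radius:  # neighbor on the 1-D map
            weights[j] = [w + learning_rate * (xk - w)
                          for w, xk in zip(weights[j], x)]
    return i

# Toy map with l = 3 neurons and m = 2 inputs.
weights = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
winner = som_step([0.9, 0.8], weights, learning_rate=0.5, radius=0)
print(winner, weights)
```

In a full training run the learning rate and the radius are usually decreased over time, and the step is repeated over many input vectors.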

Processing Microarray Data Self-Organizing Feature Maps (EXAMPLE): Assume that we want to cluster countries according to their economic potential. Each country has N properties (such as export and import amounts, population, …)

Processing Microarray Data Self-Organizing Feature Maps (EXAMPLE): Each country is a point in N dimensions, that is, a vector of size N. We want to cluster the countries according to the similarities in their economic potential.

Processing Microarray Data Self-Organizing Feature Maps: A SOM is used to analyze microarray data in the same way: genes with similar expression profiles end up close together on the map, so they are easy to spot.
