Gene Expression. Methods: unsupervised clustering (hierarchical clustering, K-means clustering). Expression data: GEO, UCSC. EPCLUST.

Presentation transcript:

Gene Expression

Methods
–Unsupervised clustering: hierarchical clustering, K-means clustering
Expression data
–GEO
–UCSC
EPCLUST

Microarray - Reminder

Expression Data Matrix
Each column represents all the gene expression levels from a single experiment.
Each row represents the expression of a gene across all experiments.
[Table: one row per gene, columns Exp1 to Exp6; values not recovered]

Expression Data Matrix
Each element is a log ratio: log2(T/R).
T - the gene expression level in the testing sample
R - the gene expression level in the reference sample
[Table: one row per gene, columns Exp1 to Exp6; values not recovered]
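A minimal sketch of how one matrix element could be computed from the definition above. The function name `log_ratio` and the intensity values are illustrative assumptions, not part of the slides.

```python
import math

def log_ratio(test_level, reference_level):
    """Return log2(T/R) for one gene in one experiment."""
    return math.log2(test_level / reference_level)

# Hypothetical intensities, for illustration only.
print(log_ratio(400.0, 100.0))  # T > R, so the log ratio is positive
print(log_ratio(100.0, 400.0))  # T < R, so the log ratio is negative
print(log_ratio(250.0, 250.0))  # T = R, so the log ratio is zero
```

Using a log ratio makes a fourfold increase and a fourfold decrease symmetric around zero (+2 and -2), which a raw ratio (4 and 0.25) does not.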

Microarray Data Matrix
Black indicates a log ratio of zero, i.e. T ≈ R
Green indicates a negative log ratio, i.e. T < R
Red indicates a positive log ratio, i.e. T > R
Grey indicates missing data
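The colour scheme above can be written as a small lookup. This is a sketch only: the function name `heatmap_colour`, the use of `None` for missing data, and the `tolerance` that decides what counts as "approximately zero" are assumptions made for illustration.

```python
def heatmap_colour(log_ratio, tolerance=0.1):
    """Map a log2(T/R) value to the colour used in the data matrix picture."""
    if log_ratio is None:            # missing measurement
        return "grey"
    if abs(log_ratio) <= tolerance:  # T is approximately equal to R
        return "black"
    return "red" if log_ratio > 0 else "green"

# One hypothetical gene across four experiments, one value missing.
row = [1.8, -2.3, 0.02, None]
print([heatmap_colour(v) for v in row])
```

Real heat maps usually scale colour intensity with the size of the log ratio; the sketch only picks the hue.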

Microarray Data: Different representations
[Figure: two panels with axes Exp and Log ratio; regions labeled T<R and T>R]

Microarray Data: Clusters

How to determine the similarity between two genes (for clustering)?
Source: Patrik D'haeseleer, "How does gene expression clustering work?", Nature Biotechnology 23 (2005)
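The slide poses the question without fixing an answer. Two measures commonly used to compare expression profiles are Euclidean distance and Pearson correlation; the sketch below implements both in plain Python. The function names and the three gene profiles are hypothetical and chosen only to show how the two measures differ.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def pearson_correlation(a, b):
    """Correlation of two profiles: +1 same shape, -1 opposite shape."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

# Hypothetical log-ratio profiles across four experiments.
gene_a = [1.0, 2.0, 3.0, 4.0]
gene_b = [2.0, 4.0, 6.0, 8.0]  # same trend as gene_a, larger amplitude
gene_c = [4.0, 3.0, 2.0, 1.0]  # opposite trend to gene_a

print(euclidean_distance(gene_a, gene_b), pearson_correlation(gene_a, gene_b))
print(euclidean_distance(gene_a, gene_c), pearson_correlation(gene_a, gene_c))
```

Euclidean distance treats `gene_a` and `gene_b` as far apart because their magnitudes differ, while correlation treats them as perfectly similar because their shapes match. Which behaviour is wanted depends on the biological question.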

Microarray Data: Clustering
Hierarchical Clustering

Microarray Data: Clustering
Hierarchical Clustering: genes with similar expression patterns are grouped together and are connected by a series of branches (dendrogram).
Leaves (the shapes in our case) represent genes, and the length of the path between two leaves represents the distance between the genes.
Similar genes lie within the same sub-trees.

Hierarchical clustering finds an entire hierarchy of clusters.
If we want a certain number of clusters, we need to cut the tree at the level that yields that number (in this case, four).
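A minimal sketch of bottom-up (agglomerative) hierarchical clustering in plain Python. It assumes Euclidean distance and average linkage, neither of which is specified on the slides, and it stops merging once the requested number of clusters remains, which has the same effect as cutting the finished tree at that level. All names and data are hypothetical.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def average_linkage(cluster_a, cluster_b, profiles):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(euclidean(profiles[i], profiles[j])
                for i in cluster_a for j in cluster_b)
    return total / (len(cluster_a) * len(cluster_b))

def agglomerate(profiles, n_clusters):
    """Merge the two closest clusters until n_clusters remain.

    Returns the final clusters (lists of gene indices) and the merge
    history, which is the information a dendrogram is drawn from.
    """
    clusters = [[i] for i in range(len(profiles))]  # every gene starts alone
    merges = []
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = average_linkage(clusters[a], clusters[b], profiles)
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        merged = clusters[a] + clusters[b]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (a, b)]
        clusters.append(merged)
    return clusters, merges

# Hypothetical log-ratio profiles: six genes, three experiments.
profiles = [
    [2.0, 2.1, 1.9],
    [2.1, 2.0, 2.0],
    [-1.5, -1.4, -1.6],
    [-1.6, -1.5, -1.5],
    [0.0, 0.1, -0.1],
    [0.1, 0.0, 0.0],
]
clusters, merges = agglomerate(profiles, n_clusters=3)
print(sorted(sorted(c) for c in clusters))
```

This naive version recomputes all pairwise cluster distances at every step, so it is only suitable for small examples; library implementations are far more efficient.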

Hierarchical clustering result: five clusters

Microarray Data: Clustering
K-means clustering is an algorithm that partitions the data into K groups (in this example, K=4).

Microarray Data: Clustering
How?
The algorithm iteratively divides the genes into K groups and calculates the center of each group. The result is a set of K groups in which every gene is closest to the center of its own group; the outcome can depend on the initial means.
1. k initial "means" (in this case k=3) are randomly selected from the data set (shown in color).
2. k clusters are created by associating every observation with the nearest mean.
3. The centroid of each of the k clusters becomes the new mean.
4. Steps 2 and 3 are repeated until convergence has been reached.
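The four steps can be sketched in plain Python as follows. This is an illustrative implementation under stated assumptions: squared Euclidean distance is used to find the nearest mean, an empty cluster keeps its previous mean, and the optional `initial_means` parameter is added here only so that a run can be made reproducible. None of these details come from the slides.

```python
import random

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, initial_means=None, seed=0, max_iter=100):
    """Return (labels, means) after running the k-means iteration."""
    rng = random.Random(seed)
    # Step 1: pick k initial means, randomly from the data unless supplied.
    start = initial_means if initial_means is not None else rng.sample(points, k)
    means = [list(m) for m in start]
    labels = None
    for _ in range(max_iter):
        # Step 2: associate every observation with its nearest mean.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, means[c]))
            for p in points
        ]
        # Step 4: stop when the assignment no longer changes (convergence).
        if new_labels == labels:
            break
        labels = new_labels
        # Step 3: the centroid of each cluster becomes its new mean.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its old mean
                means[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, means

# Hypothetical two-experiment profiles forming two obvious groups.
points = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
          [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
labels, means = kmeans(points, k=2, initial_means=[[0.0, 0.0], [5.0, 5.0]])
print(labels)
print(means)
```

With random starting means, different seeds can lead to different final clusters, so in practice the algorithm is often run several times and the best result kept.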

Different types of clustering give different results

How to search for expression profiles
–GEO (Gene Expression Omnibus)
–Human genome browser

Searching for expression profiles in GEO
[Figure: GEO search page; the annotations on its entry types read:]
–Like Series, but further curated and suitable for analysis with GEO tools
–Expression profiles by gene
–Microarray experiments
–Probe sets
–Groups of related microarray experiments

Download dataset
Clustering
Statistical analysis

The expression distribution for different lines in the cluster


Searching for expression profiles in the Human Genome browser

Keratin 10 is highly expressed in skin

What can we do with all the expression profiles? Clusters!
How? EPCLUST


Edit the input matrix: Transpose, Normalize, Randomize
Hierarchical clustering
K-means clustering
In the input matrix, each column should represent a gene and each row should represent an experiment (or individual).
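A sketch of what transposing and normalizing an expression matrix can look like. It is not a description of how EPCLUST implements these options: the function names are hypothetical, and normalization is assumed here to mean rescaling each row to mean 0 and standard deviation 1.

```python
import statistics

def transpose(matrix):
    """Swap rows and columns of a matrix given as a list of rows."""
    return [list(column) for column in zip(*matrix)]

def standardize_row(row):
    """Rescale one row to mean 0 and standard deviation 1."""
    mean = statistics.mean(row)
    sd = statistics.pstdev(row)
    if sd == 0:  # a constant row carries no pattern to rescale
        return [0.0 for _ in row]
    return [(value - mean) / sd for value in row]

# Hypothetical matrix: two rows, three columns.
matrix = [[1.0, 2.0, 3.0],
          [4.0, 5.0, 6.0]]

flipped = transpose(matrix)
normalized = [standardize_row(row) for row in matrix]
print(flipped)
print(normalized)
```

Transposing is what switches between clustering the rows of the original matrix and clustering its columns.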

Graphical representation of the cluster
Samples found in cluster

Initial seeds
Final seeds

10 clusters, as requested

Gene Expression
Methods
–Unsupervised clustering: hierarchical clustering, K-means clustering
Expression data
–GEO
–UCSC
EPCLUST

FINAL PROJECT - Key dates
–last day to decide on a project!
18,23,24/1 - Presenting a proposed project in small groups
A very short presentation (max 5 minutes):
Title
Background
Main question
Major tools you are planning to use to answer the questions
6.3 - Final submission