Health and CS Philip Chan. DNA, Genes, Proteins What is the relationship among DNA Genes Proteins ?

Presentation transcript:

Health and CS Philip Chan

DNA, Genes, Proteins: What is the relationship among DNA, genes, and proteins? Some DNA regions are called genes, which are blueprints for making proteins.

Gene Expression: How “active” a gene is. Measuring gene expression can help characterize diseases.

Cancer Subtypes: Why do we want to find subtypes? For each cancer patient, we measure gene expression. How can we find out the cancer subtypes?

Problem Formulation. Input: the expression levels (values) of each gene for multiple patients, and the number of subtypes (clusters). Output: the cancer subtypes (clusters).
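One way to picture this input is as a table with one row per patient and one column per gene, plus the number of clusters k. The sketch below is illustrative only: the name `check_input` and all expression values are made up, not taken from the slides.

```python
def check_input(expression, k):
    """Validate a clustering input: one row per patient, one column per gene.

    Returns (number of patients, number of genes).
    """
    if not expression:
        raise ValueError("need at least one patient")
    n_genes = len(expression[0])
    if any(len(row) != n_genes for row in expression):
        raise ValueError("every patient needs a value for every gene")
    if not 1 <= k <= len(expression):
        raise ValueError("k must be between 1 and the number of patients")
    return len(expression), n_genes


# Made-up values, for illustration only: 4 patients x 3 genes.
expression = [
    [2.0, 0.5, 7.1],  # patient 0: expression levels of genes 0, 1, 2
    [1.8, 0.7, 6.9],  # patient 1
    [9.3, 4.2, 0.4],  # patient 2
    [9.0, 4.5, 0.2],  # patient 3
]
k = 2  # number of subtypes (clusters) to look for

n_patients, n_genes = check_input(expression, k)
```

The output of clustering would then be one cluster label per patient.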

Clustering Ideas?

Clusters (Subtypes): Patients are similar within a cluster and different across clusters. We need to define a distance (similarity) between two patients in terms of gene expression.

Distance Function: a and b are two patients; a_i and b_i are the expression levels of gene i in patients a and b.

Distance Function
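A common choice for the distance between two expression profiles is the Euclidean distance. The sketch below assumes that choice; the slides may have used a different function, and the name `euclidean_distance` is hypothetical.

```python
import math


def euclidean_distance(a, b):
    """Distance between patients a and b.

    a[i] and b[i] are the expression levels of gene i.
    Assumption: Euclidean distance, sqrt(sum_i (a_i - b_i)^2).
    """
    if len(a) != len(b):
        raise ValueError("both patients must have the same number of genes")
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))
```

A smaller value means the two patients have more similar gene expression; identical profiles have distance 0.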

K-means Clustering Algorithm
1. Pick k random patients as centroids
2. Assign each patient to the cluster with the closest centroid
3. Repeat
   a. Calculate the centroid for each cluster
   b. Assign each patient to the cluster with the closest centroid
   Until no changes in cluster membership
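The steps above can be sketched in plain Python as follows. This is a minimal illustration, not the presenter's implementation: it assumes Euclidean distance, keeps the old centroid if a cluster becomes empty, and adds a `max_iter` safety cap and a `seed` parameter that are not part of the slides. All function names are hypothetical.

```python
import math
import random


def distance(a, b):
    # Assumption: Euclidean distance between two expression profiles.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_profile(members):
    # Average expression of each gene over the patients in one cluster.
    n = len(members)
    return [sum(values) / n for values in zip(*members)]


def closest(patient, centroids):
    # Index of the centroid nearest to this patient.
    return min(range(len(centroids)),
               key=lambda c: distance(patient, centroids[c]))


def k_means(patients, k, seed=None, max_iter=100):
    rng = random.Random(seed)
    # Step 1: pick k random patients as the initial centroids.
    centroids = [list(p) for p in rng.sample(patients, k)]
    # Step 2: assign each patient to the cluster with the closest centroid.
    labels = [closest(p, centroids) for p in patients]
    # Step 3: repeat until cluster membership stops changing.
    for _ in range(max_iter):
        # 3a: recalculate the centroid of each cluster.
        for c in range(k):
            members = [p for p, label in zip(patients, labels) if label == c]
            if members:  # assumption: an empty cluster keeps its old centroid
                centroids[c] = mean_profile(members)
        # 3b: reassign each patient to the closest centroid.
        new_labels = [closest(p, centroids) for p in patients]
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centroids
```

Calling `k_means(patients, k)` returns one cluster label per patient together with the final centroids. Because the initial centroids are chosen at random, different runs can produce different clusterings on real data.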

Calculating Centroid: Let centroid_i be the expression level of gene i for the centroid. Then centroid_i = the average expression of gene i over the patients in the cluster.
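As a small worked sketch of this averaging, assuming each patient is stored as a list of expression levels (the function name `centroid` and this representation are illustrative choices, not from the slides):

```python
def centroid(cluster):
    """Centroid of a cluster of patients.

    cluster: non-empty list of patients, each a list of expression levels.
    Entry i of the result is the average expression of gene i in the cluster.
    """
    if not cluster:
        raise ValueError("cannot compute the centroid of an empty cluster")
    n = len(cluster)
    n_genes = len(cluster[0])
    return [sum(patient[i] for patient in cluster) / n for i in range(n_genes)]
```

For example, a cluster of two patients with profiles `[1.0, 2.0]` and `[3.0, 6.0]` has centroid `[2.0, 4.0]`. The centroid need not coincide with any actual patient.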

Animation