Recent Research and Development on Microarray Data Mining Shin-Mu Tseng 曾新穆 Dept. Computer Science and Information Engineering.

Recent Research and Development on Microarray Data Mining
Shin-Mu Tseng (曾新穆), Dept. of Computer Science and Information Engineering, National Cheng Kung University, Taiwan, R.O.C.
August 13, 2001

2 Outline
Microarray Techniques; Goal of Microarray Data Mining; Clustering Methods; Efficient Microarray Data Mining; Conclusions

3 Current Status
The Human Genome Project is in its finishing stage, revealing that there are about 30,000 functional genes in the human genome. For more than 90% of these genes, we know little about their actual functions.

4 Microarray Techniques
Main advantage: microarray techniques allow the expression of thousands of genes to be studied simultaneously in a single experiment.
Microarray process: arrayer; experiments (hybridization); image capture of the results; analysis.

5 Goal of Microarray Mining
Multi-condition expression analysis.
[Figure: expression matrix of genes versus test conditions A, B, C, …]

7 Sample Clustering Results

8 Clustering Methods
Types of clustering methods:
Partitioning: K-Means, K-Medoids, PAM, CLARA, …
Hierarchical: HAC, BIRCH, CURE, ROCK
Density-based: CAST, DBSCAN, OPTICS, CLIQUE, …
Grid-based: STING, CLIQUE, WaveCluster, …
Model-based: COBWEB, SOM, CLASSIT, AutoClass, …
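As an illustration of the partitioning family, and of the K-means baseline that appears in the later comparison, here is a minimal sketch of Lloyd-style K-means. It assumes squared Euclidean distance and initial centroids drawn at random with a fixed seed; the names `kmeans` and `sq_dist` are hypothetical, and the slides do not say which K-means variant or distance was used (microarray work often uses a correlation-based distance instead).

```python
import random
from typing import List, Sequence, Tuple


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    # Squared Euclidean distance between two profiles.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: Sequence[Sequence[float]], k: int, max_iter: int = 100,
           seed: int = 0) -> Tuple[List[int], List[List[float]]]:
    """Lloyd-style K-means; returns (cluster label per point, centroids)."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    rng = random.Random(seed)
    # Initial centroids: k distinct points chosen at random (fixed seed).
    centroids = [list(points[i]) for i in rng.sample(range(len(points)), k)]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```

Running it for several values of k and keeping the best-scoring result is one way to read the "K-means (k = 3 ~ 21)" style baselines in the comparison.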

9 Clustering Methods (cont.)
[Figure: examples of partitioning and hierarchical clustering]

10 Clustering Methods (cont.)
[Figure: examples of density-based and grid-based clustering]

11 CAST Clustering
Input: S, a symmetric n × n similarity matrix with S(i, j) ∈ [0, 1]; t, an affinity threshold (0 < t < 1).
Method:
1. Choose a seed to start a new cluster.
2. ADD: add qualified items to the cluster.
3. REMOVE: remove unqualified items from the cluster; ADD and REMOVE alternate until the cluster is stable.
4. Repeat Steps 1-3 until no more clusters can be generated.
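The CAST steps above can be sketched as follows. This follows the common formulation of CAST, in which an item's affinity to the open cluster C is the sum of its similarities to the members of C, and an item qualifies when that sum is at least t·|C| (that is, its average similarity to C is at least t). The seeding rule (highest total similarity to the remaining items), the iteration cap `max_iter`, and the name `cast` are illustrative assumptions; this is plain CAST, not the improved variant proposed in this talk, and it assumes S(i, i) = 1.

```python
from typing import List, Set


def cast(S: List[List[float]], t: float, max_iter: int = 1000) -> List[List[int]]:
    """Plain CAST clustering over items 0..n-1.

    Assumes S is a symmetric n x n similarity matrix with values in [0, 1],
    S[i][i] == 1, and 0 < t < 1.
    """
    n = len(S)
    unassigned: Set[int] = set(range(n))
    clusters: List[List[int]] = []

    def affinity(x: int, cluster: Set[int]) -> float:
        # Sum of similarities between x and the cluster's members
        # (includes S[x][x] when x is itself a member).
        return sum(S[x][y] for y in cluster)

    while unassigned:
        # Step 1 (assumed seeding rule): the unassigned item with the largest
        # total similarity to the unassigned items; ties -> lowest index.
        seed = max(sorted(unassigned), key=lambda i: sum(S[i][j] for j in unassigned))
        cluster = {seed}
        unassigned.discard(seed)

        for _ in range(max_iter):
            # Step 2 (ADD): the outside item with the highest affinity joins
            # if its average similarity to the cluster is at least t.
            if unassigned:
                x = max(sorted(unassigned), key=lambda i: affinity(i, cluster))
                if affinity(x, cluster) >= t * len(cluster):
                    cluster.add(x)
                    unassigned.discard(x)
                    continue
            # Step 3 (REMOVE): the member with the lowest affinity returns to
            # the unassigned pool if it falls below t * |cluster|.
            if len(cluster) > 1:
                y = min(sorted(cluster), key=lambda i: affinity(i, cluster))
                if affinity(y, cluster) < t * len(cluster):
                    cluster.discard(y)
                    unassigned.add(y)
                    continue
            break  # neither ADD nor REMOVE applies: the cluster is stable

        # Step 4: close this cluster and start the next one.
        clusters.append(sorted(cluster))
    return clusters
```

For example, with S = [[1, .75, .25, .125], [.75, 1, .125, .25], [.25, .125, 1, .75], [.125, .25, .75, 1]], `cast(S, 0.5)` yields [[0, 1], [2, 3]], while the looser `cast(S, 0.1)` puts all four items in one cluster. The cap on ADD/REMOVE rounds guards against oscillation; a cluster that hits the cap is closed as it stands.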

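12 Similarity Measurements: Correlation Coefficients
The most popular correlation coefficient is the Pearson correlation coefficient (1892). The correlation between X = {X_1, X_2, …, X_n} and Y = {Y_1, Y_2, …, Y_n} is
$$ r(X, Y) = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2}\,\sqrt{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}} $$
where $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$ and $\bar{Y} = \frac{1}{n}\sum_{i=1}^{n} Y_i$.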

13 Similarity Measurements: Correlation Coefficients (cont.)
It captures the similarity of the "shapes" of two expression profiles while ignoring differences in their magnitudes.
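To make the shape-versus-magnitude point concrete, here is a small sketch of the Pearson coefficient r(X, Y) as defined on the previous slide; the function name `pearson` is hypothetical. Scaling a profile by a positive constant or shifting it by an offset leaves r unchanged, while reversing its trend flips the sign.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient r(X, Y) of two expression profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Numerator: sum of products of deviations from the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Denominator: product of the root sums of squared deviations.
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("r is undefined for a constant profile")
    return cov / (norm_x * norm_y)


base = [1.0, 2.0, 3.0, 4.0]
print(pearson(base, [10 * v for v in base]))  # same shape, 10x the magnitude
print(pearson(base, [v + 5 for v in base]))   # same shape, shifted up
print(pearson(base, base[::-1]))              # opposite trend
```

The first two calls should print 1 and the third −1, up to floating-point rounding.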

14 Problems in Microarray Mining
How can microarray data be clustered so that the following requirements are met simultaneously: efficiency, accuracy, and automation?

15 Problems in Microarray Mining (cont.)
Meeting efficiency, accuracy, and automation simultaneously calls for good clustering methods combined with validation techniques.

16 Efficient Microarray Mining
Improved CAST algorithm for clustering; Hubert's Γ statistic for validation; iterative sampled computation for automatic clustering.
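Hubert's Γ statistic, used here for validation, exists in several forms. The sketch below assumes the normalized form: the correlation, over all pairs i < j, between the similarity S(i, j) and an indicator that is 1 when i and j fall in the same cluster and 0 otherwise, so that a higher Γ means the clustering agrees better with the similarity matrix. Whether the talk uses exactly this normalization is an assumption, and `hubert_gamma` is a hypothetical name.

```python
import math
from typing import List, Sequence


def hubert_gamma(S: List[List[float]], labels: Sequence[int]) -> float:
    """Normalized Hubert's Gamma between a similarity matrix and a clustering."""
    n = len(S)
    xs: List[float] = []  # similarity S(i, j) for each pair i < j
    ys: List[float] = []  # 1.0 if i and j share a cluster, else 0.0
    for i in range(n):
        for j in range(i + 1, n):
            xs.append(S[i][j])
            ys.append(1.0 if labels[i] == labels[j] else 0.0)
    m = len(xs)
    if m == 0:
        raise ValueError("need at least two items")
    mean_x, mean_y = sum(xs) / m, sum(ys) / m
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / m)
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / m)
    if sd_x == 0 or sd_y == 0:
        # All items in one cluster, all singletons, or constant similarities.
        raise ValueError("Gamma is undefined for this input")
    mean_xy = sum(x * y for x, y in zip(xs, ys)) / m
    return (mean_xy - mean_x * mean_y) / (sd_x * sd_y)
```

With the 4 × 4 matrix used for CAST above, grouping {0, 1} and {2, 3} gives a positive Γ, while the mismatched grouping {0, 2} and {1, 3} gives a negative one.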

17 Reduce the Computation
1. Narrow down the threshold range.
2. Split and conquer: find a "nearly-best" result.
[Figure: threshold range from 0 to 100% with m = 4, marking the left margin (LM) and right margin (RM)]
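One way to read the split-and-conquer idea is as an iterative grid refinement over the threshold range: evaluate a clustering score (for example, Γ of the CAST result at threshold t) at m + 1 evenly spaced points between LM and RM, keep the neighborhood of the best point as the new range, and repeat until the range is narrower than a tolerance. This is a hedged sketch, not the authors' exact procedure: `refine_threshold`, `score`, and `tol` are hypothetical, m is assumed here to be the number of sub-intervals, and the search only reliably finds the optimum when the score is unimodal in t. A cache counts the distinct thresholds evaluated, since each evaluation stands for one full clustering run.

```python
from typing import Callable, Dict, Tuple


def refine_threshold(
    score: Callable[[float], float],
    lm: float = 0.0,
    rm: float = 1.0,
    m: int = 4,
    tol: float = 0.01,
) -> Tuple[float, float, int]:
    """Grid-refinement search for the threshold t in [lm, rm] maximizing score(t).

    Returns (best threshold, its score, number of distinct score evaluations).
    """
    if m < 3:
        raise ValueError("m must be at least 3 so the range shrinks every round")
    if not lm < rm or tol <= 0:
        raise ValueError("need lm < rm and tol > 0")
    cache: Dict[float, float] = {}

    def evaluate(t: float) -> float:
        # Each distinct threshold costs one clustering run; reuse earlier results.
        key = round(t, 12)
        if key not in cache:
            cache[key] = score(key)
        return cache[key]

    while True:
        step = (rm - lm) / m
        # m + 1 evenly spaced candidate thresholds from the left to the right margin.
        candidates = [lm + k * step for k in range(m + 1)]
        best_t = max(candidates, key=evaluate)
        if rm - lm <= tol:
            return best_t, evaluate(best_t), len(cache)
        # Keep only the neighborhood of the best candidate as the new [LM, RM].
        lm, rm = max(lm, best_t - step), min(rm, best_t + step)
```

For instance, `refine_threshold(lambda t: -(t - 0.62) ** 2)` returns a threshold within 0.01 of 0.62 after roughly 20 evaluations, compared with the 101 runs a fixed 0.01-step sweep over [0, 1] would need.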

18 Experimental Results
Dataset source: Michael Eisen's Lab, Lawrence Berkeley National Lab (LBNL).
Microarray expression data of the yeast Saccharomyces cerevisiae, containing 6221 genes under 80 conditions.
The similarity matrix was obtained in advance.

19 Experimental Results (cont.)
Without range narrowing: 19 executions; execution time 246 sec; Γ statistic: (not given)
With range narrowing: 13 executions; execution time 27 sec; Γ statistic: (not given)

20 Experimental Results (cont.)
Comparison of our method with K-means (k = 3 to 21) and K-means (k = 3 to 39) by execution time (sec), number of clusters, and best Γ statistic.

21 Conclusions
Microarray data analysis is an emerging field that needs the support of data mining techniques delivering accuracy, efficiency, and automation.