Lecture 4 Microarray & Analysis Alizadeh et al. Nature 403 (2000) 503-511.



Microarrays revolutionized biology and medical research
One gene at a time before, now tens of thousands simultaneously - PROTEOMICS
– Gene expression
– Gene-disease relation
– Gene-gene interaction
– Finding co-regulated genes
– Understanding gene regulatory networks
– Many, many more

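Basic idea of Microarray
Construction principle
– Complementary base sequences that characterize genes, called probes, are arrayed on a microchip
Application principle
– A liquid sample containing gene sequences is applied to the microchip
– Using complementary base hybridization, the required information is extracted from how the sample interacts with the gene sequences on the microchip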

Basic idea of Microarray
Construction
– Place an array of probes on a microchip
– A probe is (for example) an oligonucleotide ~25 bases long that characterizes a gene or genome
– Each probe is present in many, many copies
– The chip is about 2 cm by 2 cm
Application principle
– Put a (liquid) sample containing genes on the microarray, allow probe and gene sequences to hybridize, and wash away the rest
– Analyze the hybridization pattern

cDNA microarray schema (cDNA chip fabrication principle)

Microarray analysis
Operation principle: samples are tagged with fluorescent material to show the pattern of sample-probe interaction (hybridization)
A microarray may have 60K probes

Microarray Processing sequence From: Shin-Mu Tseng

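Gene Expression Data
Gene expression data on p genes for n samples, arranged as a matrix: rows are genes, columns are mRNA samples (sample1, sample2, sample3, sample4, sample5, …)
Gene expression level of gene i in mRNA sample j =
– Log(Red intensity / Green intensity) for two-color arrays
– Log(Avg. PM - Avg. MM) for arrays with perfect-match (PM) and mismatch (MM) probes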
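The two-color expression level can be sketched in a few lines. This is a minimal illustration, not the lecture's own code: the function name `log_ratio_matrix`, the toy intensities, and the choice of base-2 logarithm are all assumptions (the slide only says "Log").

```python
import math


def log_ratio_matrix(red, green):
    """Return the p x n matrix of log2(red / green) expression levels.

    red, green: lists of p rows (genes), each with n intensities (samples).
    Base 2 is an assumption; the slide does not name the base.
    """
    if len(red) != len(green):
        raise ValueError("red and green must have the same number of genes")
    return [
        [math.log2(r / g) for r, g in zip(red_row, green_row)]
        for red_row, green_row in zip(red, green)
    ]


# Hypothetical intensities for p = 2 genes and n = 3 samples.
red = [[800.0, 200.0, 400.0], [100.0, 100.0, 50.0]]
green = [[200.0, 200.0, 100.0], [100.0, 400.0, 200.0]]
expression = log_ratio_matrix(red, green)
```

With a log ratio, a gene that is four times brighter in red gives +2 and one that is four times brighter in green gives -2, so over- and under-expression are treated symmetrically.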

Some possible applications
– Sample from a specific organ to show which genes are expressed
– Compare samples from healthy and sick hosts to find gene-disease connections
– Probes are sets of human pathogens for disease detection

Amount of data from a single microarray is huge
– If just two colors, then the amount of data on an array with N probes is 2^N
– Cannot analyze pixel by pixel
– Analyze by pattern: cluster analysis

Major Data Mining Techniques
Link Analysis
– Associations Discovery
– Sequential Pattern Discovery
– Similar Time Series Discovery
Predictive Modeling
– Classification
– Clustering

Cluster Analysis: grouping similarly expressed genes, cell samples, or both
– Strengthens signal when averages are taken within clusters of genes (Eisen)
– Useful (essential?) when seeking new subclasses of cells, tumours, etc.
– Leads to readily interpreted figures

Some clustering methods and software
– Partitioning: K-Means, K-Medoids, PAM, CLARA, …
– Hierarchical: Cluster, HAC, BIRCH, CURE, ROCK
– Density-based: CAST, DBSCAN, OPTICS, CLIQUE, …
– Grid-based: STING, CLIQUE, WaveCluster, …
– Model-based: SOM (self-organizing map), COBWEB, CLASSIT, AutoClass, …
– Two-way clustering
– Block clustering

A review paper assessing various methods
Algorithmic Approaches to Clustering Gene Expression Data, Ron Shamir, School of Computer Science, Tel-Aviv University, Tel-Aviv – orithmic.html
Conclusion: hierarchical clustering exceptional

Partitioning

Density-based clustering

Hierarchical (used most often)
– Agglomerative
– Divisive

Hierarchical Clustering: grouping similarly expressed genes
[Figure: Gene Expression Profile Analysis; rows are genes, columns are samples A, B, C, …]
From: Shin-Mu Tseng

After Clustering
[Figure: Gene Expression Profile Analysis; rows are genes, columns are samples A, B, C, …]
From: Shin-Mu Tseng

Eisen et al. Proc. Natl. Acad. Sci. USA 95 (1998)
[Figure: clustered data compared with data randomized by row, by column, and by both; axis: time]

Types of Similarity Measurements
– Distance measurements
– Correlation coefficients
– Association coefficients
– Probabilistic similarity coefficients

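Correlation Coefficients
The most popular correlation coefficient is the Pearson correlation coefficient (1892)
Correlation between $X = \{X_1, X_2, \ldots, X_n\}$ and $Y = \{Y_1, Y_2, \ldots, Y_n\}$:
$$ s_{XY} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2 \, \sum_{i=1}^{n} (Y_i - \bar{Y})^2}} $$
– where $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$ and $\bar{Y} = \frac{1}{n}\sum_{i=1}^{n} Y_i$
$s_{XY}$ is the similarity between X & Y
From: Shin-Mu Tseng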
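The Pearson correlation between two expression profiles can be sketched directly from its standard definition (covariance divided by the product of the spreads). The function name `pearson` and the toy profiles are assumptions made for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length profiles x and y."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Numerator: sum of products of deviations from the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Denominator: square root of the product of the summed squared deviations.
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / math.sqrt(ss_x * ss_y)


# Hypothetical profiles: gene_b rises with gene_a, gene_c falls as gene_a rises.
gene_a = [1.0, 2.0, 3.0, 4.0]
gene_b = [2.0, 4.0, 6.0, 8.0]
gene_c = [4.0, 3.0, 2.0, 1.0]
similar = pearson(gene_a, gene_b)   # close to +1
opposite = pearson(gene_a, gene_c)  # close to -1
```

Because the means are subtracted and the result is scaled, two genes with the same shape of profile score near +1 even when their absolute expression levels differ.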

Now can use similarity for tree construction
– Normalize similarity so that $s_{XX} = 1$
– Then have an n×n similarity matrix S whose diagonal elements are 1
– Define the distance matrix by (for example) $D = 1 - S$
– Diagonal elements of D are 0
– Now use the distance matrix to build a tree (using some tree-building software; recall the lecture on Phylogeny)
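The step from similarity to distance can be sketched as follows. This is a minimal illustration that assumes the Pearson correlation is the similarity measure; the helper names and the toy profiles are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length, non-constant profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


def similarity_matrix(profiles):
    """n x n matrix S with S[i][j] = similarity of profiles i and j."""
    n = len(profiles)
    return [[pearson(profiles[i], profiles[j]) for j in range(n)]
            for i in range(n)]


def distance_matrix(similarity):
    """D = 1 - S, so identical profiles are at distance 0."""
    return [[1.0 - s for s in row] for row in similarity]


# Hypothetical expression profiles for three genes over four samples.
profiles = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],
    [4.0, 3.0, 2.0, 1.0],
]
S = similarity_matrix(profiles)
D = distance_matrix(S)
```

With a correlation as the similarity, D ranges from 0 (perfectly correlated) to 2 (perfectly anti-correlated), and the matrix is symmetric with a zero diagonal, which is what tree-building programs expect.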

A dendrogram (tree) for clustered genes
E.g. p = 5:
– Cluster 6 = (1,2)
– Cluster 7 = (1,2,3)
– Cluster 8 = (4,5)
– Cluster 9 = (1,2,3,4,5)
Let p = number of genes.
1. Calculate within-class correlation.
2. Perform hierarchical clustering, which will produce (2p-1) clusters of genes.
3. Average within clusters of genes.
4. Perform testing on averages of clusters of genes as if they were single genes.
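Steps 2 and 3 can be sketched with a small agglomerative procedure. This is an illustration under stated assumptions, not the software used in the lecture: average linkage is assumed as the merge rule, and the distance matrix and expression values are made up so that the merges reproduce the p = 5 example (clusters 6 to 9).

```python
def agglomerate(distance):
    """Average-linkage agglomerative clustering on a p x p distance matrix.

    Returns a dict mapping cluster id to a tuple of 1-based gene numbers.
    Ids 1..p are the single genes; each merge creates the next id, so the
    result holds 2p - 1 clusters in total.
    """
    p = len(distance)
    clusters = {gene: (gene,) for gene in range(1, p + 1)}
    active = list(clusters)
    next_id = p + 1

    def average_distance(a, b):
        pairs = [(g, h) for g in clusters[a] for h in clusters[b]]
        return sum(distance[g - 1][h - 1] for g, h in pairs) / len(pairs)

    while len(active) > 1:
        candidates = [(a, b) for k, a in enumerate(active)
                      for b in active[k + 1:]]
        a, b = min(candidates, key=lambda pair: average_distance(*pair))
        clusters[next_id] = tuple(sorted(clusters[a] + clusters[b]))
        active = [c for c in active if c not in (a, b)] + [next_id]
        next_id += 1
    return clusters


def cluster_average(expression, members):
    """Average the expression profiles of the genes in one cluster."""
    rows = [expression[g - 1] for g in members]
    return [sum(column) / len(rows) for column in zip(*rows)]


# Hypothetical distances for p = 5 genes.
distance = [
    [0.0, 0.1, 0.2, 0.9, 0.9],
    [0.1, 0.0, 0.2, 0.9, 0.9],
    [0.2, 0.2, 0.0, 0.9, 0.9],
    [0.9, 0.9, 0.9, 0.0, 0.3],
    [0.9, 0.9, 0.9, 0.3, 0.0],
]
# Hypothetical expression levels: 5 genes x 2 samples.
expression = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [-1.0, 0.0], [-3.0, -2.0]]

clusters = agglomerate(distance)
averages = {cid: cluster_average(expression, members)
            for cid, members in clusters.items()}
```

The averaged profiles in `averages` can then be tested as if each cluster were a single gene (step 4); which test to apply is left open here.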

A real case
Nature, Feb. 2000: paper by Alizadeh, A. A. et al., "Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling"

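Validation Techniques: Hubert's Γ Statistic
X = [X(i, j)] and Y = [Y(i, j)] are two n × n matrices
– X(i, j): similarity of gene i and gene j
– Y(i, j) = 1 if genes i and j are in the same cluster, 0 otherwise
– Hubert's Γ statistic represents the point serial correlation:
$$ \Gamma = \frac{1}{M} \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \frac{\left(X(i,j) - \bar{X}\right)\left(Y(i,j) - \bar{Y}\right)}{\sigma_X \, \sigma_Y} $$
where M = n (n - 1) / 2, and $\bar{X}$, $\bar{Y}$, $\sigma_X$, $\sigma_Y$ are the means and standard deviations of X(i, j) and Y(i, j) over the M pairs i < j
– A higher value of Γ represents better clustering quality.
From: Shin-Mu Tseng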
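A hedged sketch of the statistic follows. It assumes the normalized (correlation) form of Hubert's Γ, computed over the M = n (n - 1) / 2 gene pairs, with Y(i, j) = 1 for genes in the same cluster and 0 otherwise. The function name, the similarity values, and the two labelings are hypothetical.

```python
import math


def hubert_gamma(similarity, labels):
    """Normalized Hubert's Gamma for one clustering.

    similarity: n x n matrix X of gene-gene similarities.
    labels: cluster label of each gene; Y(i, j) = 1 if labels match, else 0.
    """
    n = len(labels)
    xs, ys = [], []
    for i in range(n - 1):
        for j in range(i + 1, n):
            xs.append(similarity[i][j])
            ys.append(1.0 if labels[i] == labels[j] else 0.0)
    m = len(xs)  # M = n (n - 1) / 2
    mean_x, mean_y = sum(xs) / m, sum(ys) / m
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / m)
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / m)
    if sd_x == 0 or sd_y == 0:
        raise ValueError("Gamma is undefined when X or Y is constant")
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / m
    return cov / (sd_x * sd_y)


# Hypothetical similarities: genes 0 and 1 are alike, genes 2 and 3 are alike.
similarity = [
    [1.0, 0.9, 0.1, 0.1],
    [0.9, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.8],
    [0.1, 0.1, 0.8, 1.0],
]
good = hubert_gamma(similarity, [0, 0, 1, 1])  # clusters follow the similarity
bad = hubert_gamma(similarity, [0, 1, 0, 1])   # clusters cut across it
```

A clustering that groups similar genes together scores close to 1, while one that splits them scores lower, which is how the statistic is used to compare candidate clusterings.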

Discovering sub-groups

Time Course Data
Gene expression is time-dependent

Sample of time course of clustered genes
[Figure: expression of clustered genes; axis: time]

Limitations
Cluster analyses:
– Usually outside the normal framework of statistical inference
– Less appropriate when only a few genes are likely to change
– Need lots of experiments
Single gene tests:
– May be too noisy in general to show much
– May not reveal coordinated effects of positively correlated genes
– Hard to relate to pathways

Some useful links
– Affymetrix
– Michael Eisen Lab at LBL (hierarchical clustering software "Cluster" and "Tree View" (Windows)): rana.lbl.gov/
– Stanford MicroArray Database ("Xcluster" (Linux)): genome-www4.stanford.edu/MicroArray/SMD/
– Review of Currently Available Microarray Software
– Microarray DB

Some papers
– Eisen, M. B. et al. (1998). "Cluster analysis and display of genome-wide expression patterns." Proc Natl Acad Sci U S A 95(25).
– Wen, X. et al. (1998). "Large-scale temporal gene expression mapping of central nervous system development." Proc Natl Acad Sci U S A 95(1).
– Alon, U. et al. (1999). "Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays." Proc Natl Acad Sci U S A 96, June.
– Spellman, P. T. et al. (1998). "Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization." Mol Biol Cell 9(12).