Biology-Driven Clustering of Microarray Data: Applications to the NCI60 Data Set
K.R. Coombes, K.A. Baggerly, D.N. Stivers, J. Wang, D. Gold, H.G. Sung, and S.J. Lee

Introduction
Microarray data is more than a large, unstructured matrix.
– We already know many genes that are important for studying cancer because of their involvement in specific biological processes.
– We also know that reproducible chromosomal abnormalities play an important role in cancer.
We need analytical methods that use this biological information early in the analysis.

Methods
First, we updated the annotations of the genes on the microarray.
We then performed separate analyses:
– using the genes on individual chromosomes
– using the genes involved in different biological processes
We developed ways to assess how well each set of genes classified the samples.
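The per-chromosome and per-process analyses share one pattern: partition the genes by an annotation field, then run the same assessment on each subset. The Python sketch below assumes a hypothetical `annotation` dictionary (gene ID to a dict of field values) and a caller-supplied `score` function; these names and the `min_size` cutoff are illustrative, not taken from the original work.

```python
from collections import defaultdict

import numpy as np


def gene_sets_by(annotation, key):
    """Group gene IDs by one annotation field, e.g. "chromosome" or "process".

    A gene listed under several values (several GO processes, say) is placed
    in every corresponding set.
    """
    sets = defaultdict(list)
    for gene, fields in annotation.items():
        for value in fields.get(key, []):
            sets[value].append(gene)
    return dict(sets)


def analyze_by_gene_set(x, gene_ids, annotation, key, score, min_size=2):
    """Apply `score` to the rows of x (genes by samples) in each gene set."""
    row_of = {gene: i for i, gene in enumerate(gene_ids)}
    results = {}
    for name, genes in gene_sets_by(annotation, key).items():
        rows = [row_of[g] for g in genes if g in row_of]
        if len(rows) >= min_size:  # hypothetical cutoff for tiny gene sets
            results[name] = score(x[rows])
    return results


# Toy example: four genes, two chromosomes, two processes, three samples.
x = np.arange(12, dtype=float).reshape(4, 3)
gene_ids = ["g1", "g2", "g3", "g4"]
annotation = {
    "g1": {"chromosome": ["1"], "process": ["cell cycle"]},
    "g2": {"chromosome": ["1"], "process": ["cell cycle", "apoptosis"]},
    "g3": {"chromosome": ["2"], "process": ["apoptosis"]},
    "g4": {"chromosome": ["2"], "process": ["cell cycle"]},
}
# Here the "score" just counts genes; a real analysis would cluster x[rows].
by_chrom = analyze_by_gene_set(x, gene_ids, annotation, "chromosome",
                               score=lambda sub: sub.shape[0])
by_process = analyze_by_gene_set(x, gene_ids, annotation, "process",
                                 score=lambda sub: sub.shape[0])
```

Swapping `score` for any of the assessment methods (PCA, MDS, dendrogram grading) reuses the same loop for both chromosomes and functional categories.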

Quality of Annotations
Problem:
– I.M.A.G.E. clone IDs and GenBank accession numbers are archival (they do not change).
– UniGene clusters, gene names, descriptions, functions, etc., change over time.
Solution:
– Download the latest UniGene (build 137) and LocusLink releases to update the annotations.

How many genes on the array have good annotations?
We trust only the 7478 spots where the UniGene clusters match.
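The slides do not say exactly which clusters must match. One plausible reading, assumed here, is that each spot's two archival identifiers (I.M.A.G.E. clone ID and GenBank accession) are looked up in the updated UniGene build and the spot is kept only when both lead to the same cluster. The lookup tables and all identifiers below are made up for illustration; parsing the real UniGene files is not shown.

```python
def trusted_spots(spots, clone_to_cluster, accession_to_cluster):
    """Keep spots whose two archival IDs map to the same current UniGene cluster.

    spots: iterable of (spot_id, image_clone_id, genbank_accession) tuples.
    Returns a dict from spot ID to the agreed UniGene cluster.
    """
    kept = {}
    for spot_id, clone_id, accession in spots:
        via_clone = clone_to_cluster.get(clone_id)
        via_accession = accession_to_cluster.get(accession)
        # Drop spots that are unmapped or whose identifiers disagree.
        if via_clone is not None and via_clone == via_accession:
            kept[spot_id] = via_clone
    return kept


# Hypothetical identifiers, for illustration only.
spots = [("s1", "IMAGE:1", "AA001"), ("s2", "IMAGE:2", "AA002"),
         ("s3", "IMAGE:3", "AA003")]
clone_to_cluster = {"IMAGE:1": "Hs.A", "IMAGE:2": "Hs.B", "IMAGE:3": "Hs.C"}
accession_to_cluster = {"AA001": "Hs.A", "AA002": "Hs.X"}
kept = trusted_spots(spots, clone_to_cluster, accession_to_cluster)
```

Here `s2` is dropped because its identifiers disagree and `s3` because its accession is not in the (toy) build.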

Where are the genes located?

How do we determine the functions of genes?
UniGene -> LocusLink -> Gene Ontology
Gene Ontology is a structured, hierarchical vocabulary for describing gene functions in three broad areas:
– biological process (why)
– molecular function (what)
– cellular component (where)
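The UniGene -> LocusLink -> Gene Ontology chain amounts to two successive lookups, followed by grouping the GO annotations by aspect. A minimal sketch, assuming the two mapping tables have already been loaded into dictionaries; the table names and the gene-to-term associations in the toy data are hypothetical, although the three GO IDs are real terms in the stated aspects.

```python
ASPECTS = ("biological_process", "molecular_function", "cellular_component")


def go_terms_for_cluster(cluster, unigene_to_locus, locus_to_go):
    """Follow UniGene cluster -> LocusLink ID -> GO annotations.

    locus_to_go maps a LocusLink ID to a list of (GO ID, aspect) pairs.
    Returns GO IDs grouped by the three aspects; unmapped clusters get
    empty sets.
    """
    by_aspect = {aspect: set() for aspect in ASPECTS}
    locus = unigene_to_locus.get(cluster)
    for go_id, aspect in locus_to_go.get(locus, []):
        by_aspect[aspect].add(go_id)
    return by_aspect


# Toy tables: the cluster and LocusLink IDs are made up.
unigene_to_locus = {"Hs.A": "LL1"}
locus_to_go = {"LL1": [("GO:0007165", "biological_process"),   # signal transduction
                       ("GO:0003677", "molecular_function"),   # DNA binding
                       ("GO:0005634", "cellular_component")]}  # nucleus
terms = go_terms_for_cluster("Hs.A", unigene_to_locus, locus_to_go)
missing = go_terms_for_cluster("Hs.Z", unigene_to_locus, locus_to_go)
```

Because GO is hierarchical, a full analysis would usually also propagate each term to its ancestors before grouping genes into broad categories such as "cell cycle"; that step is not shown.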

What kinds of genes are on the microarray?

Data Preprocessing
– Remove spots with poor annotations and spots whose median intensity is below the 97th percentile of the empty spots.
– Normalize each array so that the median log ratio between channels is zero (that is, the median ratio is one).
– Center each gene so that its mean log ratio across experiments is zero.
– Use (1 - correlation)/2 as the distance metric.
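The preprocessing steps can be sketched with NumPy as follows. Two details are assumptions, since the slides do not pin them down: a spot's "median intensity" is taken as its median across arrays, and the threshold is the 97th percentile of those medians over the empty spots. The function names and the simulated data are illustrative only.

```python
import numpy as np


def preprocess(log_ratio, spot_intensity, is_empty, good_annotation):
    """Filter, normalize, and center a spots-by-arrays matrix of log ratios."""
    # Assumption: summarize each spot by its median intensity across arrays.
    spot_median = np.median(spot_intensity, axis=1)
    # Detection threshold: 97th percentile of the empty spots' medians.
    threshold = np.percentile(spot_median[is_empty], 97)
    keep = good_annotation & ~is_empty & (spot_median >= threshold)
    x = log_ratio[keep]
    # Normalize each array (column) so its median log ratio is zero.
    x = x - np.median(x, axis=0, keepdims=True)
    # Center each gene (row) so its mean log ratio across arrays is zero.
    x = x - x.mean(axis=1, keepdims=True)
    return x, keep


def sample_distance(x):
    """(1 - Pearson correlation) / 2 between arrays; values lie in [0, 1]."""
    corr = np.corrcoef(x, rowvar=False)  # correlate columns (arrays)
    return (1.0 - corr) / 2.0


# Simulated data: 200 spots on 6 arrays, the first 20 spots empty (dim).
rng = np.random.default_rng(0)
n_spots, n_arrays = 200, 6
log_ratio = rng.normal(size=(n_spots, n_arrays))
intensity = rng.lognormal(mean=6.0, sigma=1.0, size=(n_spots, n_arrays))
is_empty = np.zeros(n_spots, dtype=bool)
is_empty[:20] = True
intensity[is_empty] /= 50.0
good = np.ones(n_spots, dtype=bool)

x, keep = preprocess(log_ratio, intensity, is_empty, good)
d = sample_distance(x)
```

Dividing by 2 maps the correlation range [-1, 1] onto distances in [0, 1], so perfectly anticorrelated samples are at distance 1.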

How well does a set of genes distinguish types of cancer?
Three methods for assessment:
– Qualitative (PCA, MDS)
– Quantitative (PCA + ANOVA)
– Semi-quantitative (grading dendrograms)

Multidimensional Scaling
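For the qualitative MDS view, classical (Torgerson) MDS is one standard way to place samples in two dimensions from a distance matrix such as the (1 - correlation)/2 matrix. The slides do not say which MDS variant was used, so this is an illustrative sketch; for a non-Euclidean distance, the small negative eigenvalues are simply clipped to zero.

```python
import numpy as np


def classical_mds(d, k=2):
    """Embed n objects in k dimensions from an n-by-n distance matrix d."""
    n = d.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n    # centering matrix
    b = -0.5 * j @ (d ** 2) @ j            # double-centered squared distances
    evals, evecs = np.linalg.eigh(b)       # eigh returns ascending eigenvalues
    top = np.argsort(evals)[::-1][:k]      # indices of the k largest
    scale = np.sqrt(np.clip(evals[top], 0.0, None))  # guard tiny negatives
    return evecs[:, top] * scale


# Four corners of a 3-by-4 rectangle: a 2-D embedding should reproduce
# their pairwise distances exactly (up to rotation and reflection).
pts = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 4.0], [3.0, 4.0]])
d = np.sqrt(((pts[:, None, :] - pts[None, :, :]) ** 2).sum(axis=-1))
coords = classical_mds(d, k=2)
d_hat = np.sqrt(((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=-1))
```

Coloring the resulting points by cancer type gives the kind of qualitative picture the MDS view provides.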

PCANOVA
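The quantitative assessment is listed as PCA + ANOVA, but the slides do not give the exact statistic. One plausible reading, assumed here: compute each sample's scores on the first few principal components of a gene set, then run a one-way ANOVA of each score against cancer type, so that a large F means the component separates the types. The function names and the choice of `k` are illustrative.

```python
import numpy as np


def pc_scores(x, k):
    """Scores of samples (rows of x) on the first k principal components."""
    xc = x - x.mean(axis=0)
    u, s, _ = np.linalg.svd(xc, full_matrices=False)
    return u[:, :k] * s[:k]


def anova_f(values, labels):
    """One-way ANOVA F statistic of a 1-D array grouped by labels."""
    values = np.asarray(values, dtype=float)
    labels = np.asarray(labels)
    groups = [values[labels == g] for g in np.unique(labels)]
    n, g = values.size, len(groups)
    grand = values.mean()
    ssb = sum(len(v) * (v.mean() - grand) ** 2 for v in groups)  # between
    ssw = sum(((v - v.mean()) ** 2).sum() for v in groups)       # within
    return (ssb / (g - 1)) / (ssw / (n - g))


def pcanova(x, labels, k=3):
    """F statistic for cancer type on each of the first k PC scores.

    x is samples by genes, i.e. the transpose of a genes-by-arrays matrix.
    """
    scores = pc_scores(x, k)
    return [anova_f(scores[:, i], labels) for i in range(scores.shape[1])]


# Simulated data: 10 samples, 50 genes, two types differing by a shift.
rng = np.random.default_rng(1)
x = rng.normal(scale=0.1, size=(10, 50))
x[:5] += 2.0
labels = ["a"] * 5 + ["b"] * 5
f = pcanova(x, labels, k=2)
```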

How good is a dendrogram?
A = a cluster contains all samples of one kind of cancer and nothing else
B = all, with extras
C = all except one
D = all except one, with extras
E = all except two
F = all except two, with extras
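The grading scheme can be expressed directly in code. This sketch assumes that a cancer type's grade for a dendrogram is the best grade achieved by any single cluster (node) in the tree, and that the tree is given as SciPy-style linkage rows `(a, b, distance, count)`, where row i merges nodes `a` and `b` into new node n + i (the format returned by `scipy.cluster.hierarchy.linkage`). Both are assumptions about how the grading was applied; the function names are hypothetical.

```python
# Grade letters keyed by (number of samples missing, whether extras present).
GRADES = {(0, False): "A", (0, True): "B", (1, False): "C",
          (1, True): "D", (2, False): "E", (2, True): "F"}


def grade_cluster(members, labels, cancer):
    """Grade one cluster for one cancer type; None if it is worse than F."""
    total = sum(1 for lab in labels if lab == cancer)
    inside = sum(1 for i in members if labels[i] == cancer)
    if inside == 0:
        return None  # the cluster contains none of this cancer type
    missing = total - inside
    has_extras = len(members) > inside
    return GRADES.get((missing, has_extras))


def clusters_from_linkage(merges, n):
    """Leaf sets of every node (leaves first), from linkage rows."""
    members = [{i} for i in range(n)]
    for a, b, *_ in merges:
        # Row i merges nodes a and b into new node n + i.
        members.append(members[int(a)] | members[int(b)])
    return members


def grade_dendrogram(merges, labels, cancer):
    """Best grade over all clusters in the tree (letters sort A best)."""
    grades = [grade_cluster(c, labels, cancer)
              for c in clusters_from_linkage(merges, len(labels))]
    grades = [g for g in grades if g is not None]
    return min(grades) if grades else None


# Toy tree: colon samples 0-2 merge first, renal samples 3-4 separately.
labels5 = ["colon", "colon", "colon", "renal", "renal"]
merges5 = [(0, 1, 0.1, 2), (5, 2, 0.2, 3), (3, 4, 0.15, 2), (6, 7, 0.9, 5)]
# Toy tree where a renal sample joins one colon sample before the other.
labels3 = ["colon", "renal", "colon"]
merges3 = [(0, 1, 0.1, 2), (3, 2, 0.2, 3)]
```

In the second tree the best colon cluster is the root, which holds both colon samples plus the renal one, so colon gets grade B.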

Can cancers be distinguished by genes on one chromosome?

Heterogeneity of different types of cancer
– Some cancers (colon, leukemia) are fairly easy to distinguish from others.
– Some (breast, lung) are so heterogeneous as to be almost impossible to distinguish.
– Some chromosomes (1, 2, 6, 7, 9, 12, 17) can distinguish many cancers.
– Others (16, 21) do essentially no better than random.

Can cancers be distinguished by genes of one function?
The table for functional categories looks a lot like the table for chromosomes.
– Some biological process categories (signal transduction, cell proliferation, cell cycle, protein metabolism) can distinguish many types of cancer.
– Others (apoptosis, energy pathways) cannot.

Conclusions (I)
Multiple views of the data provide substantial insight into the differences between cancer types and between gene sets.
Cancer types differ greatly in their degree of heterogeneity, ranging from homogeneous (colon, leukemia) through moderately heterogeneous (renal, melanoma) to extremely heterogeneous (breast, lung).

Conclusions (II)
Homogeneous cancers exhibit strong identifying signals across most views of the data.
There are large differences in the ability of genes on different chromosomes, or of genes involved in different biological processes, to distinguish cancer types.

Supplementary Material
Complete results of each analysis by chromosome and by function are available on our web site: /depts/cancergenomics