Published by Aubrie Nash. Modified over 9 years ago.
1
Biology-Driven Clustering of Microarray Data: Applications to the NCI60 Data Set
K.R. Coombes, K.A. Baggerly, D.N. Stivers, J. Wang, D. Gold, H.G. Sung, and S.J. Lee
2
Introduction
Microarray data is more than a large, unstructured matrix.
– We already know many genes important for studying cancer through their involvement in specific biological processes
– We also know that reproducible chromosomal abnormalities play an important role in cancer
Need analytical methods that use biological information early
3
Methods
First, updated the annotations of the genes on the microarray
Performed separate analyses
– using genes on individual chromosomes
– using genes involved in different biological processes
Developed ways to assess how well each set of genes classified samples
4
Quality of Annotations
Problem:
– I.M.A.G.E. clone IDs and GenBank accession numbers are archival
– UniGene clusters, gene names, descriptions, functions, etc., are changeable
Solution:
– Download latest UniGene (build 137) and LocusLink to update annotations
5
How many genes on the array have good annotations? Only trust the 7478 spots where the UniGene clusters match.
6
Where are the genes located?
7
How do we determine the functions of genes?
UniGene -> LocusLink -> GeneOntology
GeneOntology is a structured, hierarchical vocabulary to describe gene functions in three broad areas:
– biological process (why)
– molecular function (what)
– cellular component (where)
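As a rough illustration of the UniGene -> LocusLink -> GeneOntology chain, the lookup can be sketched as a sequence of table joins. The function name, the three mapping tables, and every identifier in the usage note are hypothetical placeholders rather than real database entries, and the slides do not say how the authors actually implemented this step.

```python
def annotate_spots(spot_to_unigene, unigene_to_locus, locus_to_go):
    """Follow spot -> UniGene cluster -> LocusLink entry -> GO terms.

    All three arguments are plain dicts (hypothetical lookup tables).
    A spot whose chain breaks at any step gets an empty list of terms.
    """
    annotations = {}
    for spot, cluster in spot_to_unigene.items():
        locus = unigene_to_locus.get(cluster)
        if locus is None:
            annotations[spot] = []
        else:
            annotations[spot] = list(locus_to_go.get(locus, []))
    return annotations
```

For example, with toy tables `{"spot1": "clusterA", "spot2": "clusterB"}`, `{"clusterA": "locus1"}` and `{"locus1": ["cell cycle"]}`, the first spot is annotated with "cell cycle" and the second, whose cluster has no LocusLink entry, gets an empty list.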
8
What kinds of genes are on the microarray?
9
Data Preprocessing
Remove spots with poor annotations and spots with median intensity below the 97th percentile of empty spots.
Normalize each array so the median log ratio between channels is zero (that is, a median ratio of one)
Center each gene so the mean log ratio across experiments is zero
Use (1-correlation)/2 as the distance metric
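The gene-centering step and the distance metric on this slide can be sketched in a few lines of Python. This is a minimal sketch and not the authors' code: it assumes "correlation" means the Pearson correlation between two expression profiles, and the function names are illustrative. With that assumption, (1-correlation)/2 ranges from 0 for perfectly correlated profiles to 1 for perfectly anti-correlated ones.

```python
import math
from statistics import mean

def center_gene(log_ratios):
    """Shift one gene's log ratios so their mean across experiments is zero."""
    mu = mean(log_ratios)
    return [x - mu for x in log_ratios]

def pearson(x, y):
    """Pearson correlation of two equal-length profiles.

    Raises ZeroDivisionError if either profile is constant.
    """
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_distance(x, y):
    """Distance (1 - correlation) / 2, which lies between 0 and 1."""
    return (1.0 - pearson(x, y)) / 2.0
```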
10
How well does a set of genes distinguish types of cancer?
Three methods for assessment:
– Qualitative (PCA, MDS)
– Quantitative (PCA + ANOVA)
– Semi-quantitative (Grading Dendrograms)
11
Multidimensional Scaling
12
PCANOVA
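The slides describe the quantitative assessment only as "PCA + ANOVA". One plausible reading, offered here as an assumption, is to project the samples onto a principal component and then test whether the scores differ by cancer type with a one-way ANOVA. The sketch below covers only the ANOVA half: it takes per-sample scores that are assumed to have been computed by a PCA elsewhere, and the function name is illustrative.

```python
from statistics import mean

def anova_f(scores, labels):
    """One-way ANOVA F statistic for scores grouped by class label.

    Needs at least two groups, more samples than groups, and some
    within-group variation (otherwise the division fails).
    """
    groups = {}
    for score, label in zip(scores, labels):
        groups.setdefault(label, []).append(score)
    n, k = len(scores), len(groups)
    grand_mean = mean(scores)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values())
    # Variation of the samples around their own group mean.
    ss_within = sum((s - mean(g)) ** 2 for g in groups.values() for s in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))
```

A large F value on a given component would suggest that the component separates the cancer types; how the authors combined results across components is not stated in the slides.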
13
How good is a dendrogram?
A = cluster contains all and only one kind of cancer
B = all, with extras
C = all except one
D = all except one, with extras
E = all except two
F = all except two, with extras
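The grading scale above can be written as a small function. This is a sketch under stated assumptions: the slides do not say how the cluster to be graded is selected from the dendrogram, so the function simply grades one given cluster for one cancer type, and it returns None when the cluster holds no samples of that type or misses more than two of them. The names are illustrative.

```python
def grade_cluster(cluster_labels, cancer_type, total_of_type):
    """Grade one dendrogram cluster for one cancer type on the A-F scale.

    cluster_labels: cancer type of every sample in the cluster.
    total_of_type:  number of samples of cancer_type in the whole data set.
    """
    hits = sum(1 for label in cluster_labels if label == cancer_type)
    extras = len(cluster_labels) - hits   # samples of other types
    missing = total_of_type - hits        # samples of this type left out
    if hits == 0 or missing < 0 or missing > 2:
        return None
    grades = {0: ("A", "B"), 1: ("C", "D"), 2: ("E", "F")}
    pure, with_extras = grades[missing]
    return pure if extras == 0 else with_extras
```

For instance, a cluster holding six of seven colon samples plus one breast sample is graded D: all except one, with extras.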
14
Can cancers be distinguished by genes on one chromosome?
15
Heterogeneity of different types of cancer
Some cancers (colon, leukemia) are fairly easy to distinguish from others
Some (breast, lung) are so heterogeneous as to be almost impossible to distinguish
Some chromosomes (1, 2, 6, 7, 9, 12, 17) can distinguish many cancers.
Some (16, 21) are essentially random
18
Can cancers be distinguished by genes of one function?
Table for functional categories looks a lot like the table for chromosomes
Some biological process categories (signal transduction, cell proliferation, cell cycle, protein metabolism) can distinguish many types of cancer
Others (apoptosis, energy pathways) cannot
23
Conclusions (I)
Multiple views into the data provide substantial insight into differences in cancer types and gene sets.
Cancer types differ greatly in their degree of heterogeneity, ranging from homogeneous (colon, leukemia) through moderately heterogeneous (renal, melanoma) to extremely heterogeneous (breast and lung).
24
Conclusions (II)
Homogeneous cancers exhibit strong identifying signals across most views of the data.
There are large differences in the ability of genes on different chromosomes, or involved in different biological processes, to distinguish cancer types.
25
Supplementary Material
Complete results of each analysis by chromosome and by function are available on our web site: http://www.mdanderson.org/depts/cancergenomics