Emergent Biology Through Integration and Mining of Microarray Datasets
Lance D. Miller
GIS Microarray & Expression Genomics
FOCUS: Mining of expression data to understand the molecular composition of human cancers and to define components of the tumor molecular profile with mechanistic and clinical importance.
2001, PNAS
Molecular classes are predictive of outcome (figure panels: overall survival; relapse-free survival)
70-gene prognosis classifier for predicting risk of distant metastasis within 5 years (van 't Veer et al.)
Sotiriou et al.
Though each tumor is molecularly unique, there exist common transcriptional cassettes that underlie the biological and clinical properties of tumors and may be of diagnostic, prognostic, and therapeutic significance.
GOAL: Mining of expression data to understand the molecular composition of human cancers and to define components of the tumor molecular profile with mechanistic and clinical importance.
The GIS Perpetual Array Platform
Integration of Independent Datasets
Perou et al., 1999; Sorlie et al., 2001; West et al., 2001
Meta-Analysis of Breast Cancer Datasets
   dataset                     source             sample size                   array format
1. Miller-Liu                  unpublished        61 tumors: 39 ER+, 22 ER-     19K spotted oligo
2. Sotiriou-Liu                submitted, PNAS    99 tumors: 34 ER+, 65 ER-     7.6K spotted cDNA
3. Gruvberger-Meltzer          Cancer Research    47 tumors: 23 ER+, 24 ER-     6.7K spotted cDNA
4. Sorlie-Borresen-Dale        PNAS               74 tumors: 56 ER+, 18 ER-     8.1K spotted cDNA
5. van 't Veer-Friend          Nature             98 tumors: 59 ER+, 39 ER-     25K spotted oligo
6. West-Nevins                 PNAS               49 tumors: 25 ER+, 24 ER-     7.1K Affymetrix
total: 428 tumors, ~73,500 probes
(Adaikalavan Ramasamy et al.)
META MADB: The Construct
Building the Matrix
1. Extract and Format the Data
2. Link sample/probe info via unique keys
3. Log Transform and Normalize
4. Filter Genes and Arrays
5. Apply Statistical Tests
Creating a Universe
1. Apply UniGene ID as Unifying Key
2. Remove Gene Redundancy
3. Extract p values, d values, z-scores
4. Set p value threshold
5. Merge Datasets
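The steps listed for META MADB can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the actual META MADB code: the function names, the choice of median centering for normalization, averaging as the way to remove gene redundancy, and keeping only UniGene IDs shared by every dataset when merging are all assumptions made here, and the probe names and UniGene IDs in the usage note are placeholders.

```python
import math
from statistics import mean, median


def log_transform_and_center(array_values):
    """Log2-transform one array's intensities and subtract the array median.

    Median centering is an assumed normalization; the slides do not say
    which method was used.
    """
    logged = {probe: math.log2(v) for probe, v in array_values.items() if v > 0}
    center = median(logged.values())
    return {probe: x - center for probe, x in logged.items()}


def collapse_to_unigene(probe_stats, probe_to_unigene):
    """Map probes to UniGene IDs and average probes that share an ID.

    Averaging is one assumed way to remove gene redundancy.
    Probes without a UniGene mapping are dropped.
    """
    grouped = {}
    for probe, stat in probe_stats.items():
        unigene_id = probe_to_unigene.get(probe)
        if unigene_id is not None:
            grouped.setdefault(unigene_id, []).append(stat)
    return {unigene_id: mean(stats) for unigene_id, stats in grouped.items()}


def merge_datasets(datasets):
    """Merge per-dataset statistics on UniGene ID.

    Assumption: only IDs present in every dataset are kept.
    Returns {unigene_id: {dataset_name: statistic}}.
    """
    shared = set.intersection(*(set(d) for d in datasets.values()))
    return {
        unigene_id: {name: d[unigene_id] for name, d in datasets.items()}
        for unigene_id in sorted(shared)
    }
```

For example, `collapse_to_unigene({"p1": 1.0, "p2": 3.0}, {"p1": "Hs.1", "p2": "Hs.1"})` reduces two probes for the same gene to a single entry, and `merge_datasets` then lines up that entry across studies so that statistics from different array formats can be compared gene by gene.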
d values (difference of average expression)
For each gene, expression values e1, e2, e3, e4, e5, ..., en are measured across tumors T1, T2, T3, T4, T5, ..., Tn in each class (ER+ and ER-).
$d = \bar{e}_{\mathrm{ER+}} - \bar{e}_{\mathrm{ER-}}$, where $\bar{e}$ is the average expression of the gene within the class.
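The d value for a single gene can be computed as below. This is a minimal sketch: the function name `d_value` and the use of the string labels "ER+" and "ER-" to mark each tumor are assumptions for illustration, and the expression values are taken to be already log-transformed and normalized.

```python
from statistics import mean


def d_value(expression, er_status):
    """Difference of average expression between ER+ and ER- tumors for one gene.

    expression: one expression value per tumor (e1 ... en)
    er_status:  matching labels, "ER+" or "ER-", one per tumor (T1 ... Tn)
    """
    er_pos = [e for e, status in zip(expression, er_status) if status == "ER+"]
    er_neg = [e for e, status in zip(expression, er_status) if status == "ER-"]
    return mean(er_pos) - mean(er_neg)
```

A positive d indicates higher average expression in ER+ tumors, a negative d higher average expression in ER- tumors.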
Identifying Grade-Specific Genes in Hepatocellular Carcinoma
Sample: 10 cases of each class
Sample collection: HBV(+)
Array: Human 19K oligonucleotide array
Analysis: 50 arrays
HCC Progression: pre-neoplastic lesions (adenomatous hyperplasia: ordinary [OAH], atypical [AAH]), then HCC Grade 1, 2, 3 (G1, G2, G3)
Breast Cancer Grade-Associated Genes as Predictors of HCC Grade? (figure panels: HCC, BC)
Estrogen Responsive Genes in vitro (Chin-Yo Lin)
Estrogen-Responsive in vitro and ER Status-Associated in vivo (p<0.001)
Treatment conditions: E2, E2 + ICI, E2 + CHX
Identifying Cancer-Linked Genes in Epithelial Adenocarcinomas
Datasets: 3 gastric, 3 prostate, 2 liver, 1 lung
Genes that Distinguish Tumor from Normal at p<0.001 in at least 3 of the 4 Tumor Types
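The "at least 3 of the 4 tumor types" selection can be expressed as a simple count over per-type p values. The sketch below assumes that each tumor type has already been reduced to one tumor-versus-normal p value per gene; how the multiple datasets within a tumor type were combined is not stated here, and the function name and input layout are illustrative only.

```python
def genes_in_at_least(p_values_by_type, threshold=0.001, min_types=3):
    """Return genes with p < threshold in at least min_types tumor types.

    p_values_by_type: {tumor_type: {gene: p_value}} (assumed layout)
    """
    counts = {}
    for p_values in p_values_by_type.values():
        for gene, p in p_values.items():
            if p < threshold:
                counts[gene] = counts.get(gene, 0) + 1
    return sorted(gene for gene, n in counts.items() if n >= min_types)
```

Genes missing from a tumor type simply do not count toward the total for that type, so a gene absent from one platform can still be selected if it passes in the other three.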
Summary
An Integrated Database for Pan-Cancer Meta-Analysis of Gene Expression Data
database components: internal and external datasets derived from:
- tumor studies (clinical samples)
- in vitro, pathway studies (e.g., time course)
- SAGE data
- mouse studies (in vitro/in vivo)
Future Directions
- Derive expression signatures for all major factors known or suspected to have prognostic value
- Determine the reliability of expression signatures in outcome prediction
- Expand integrated database for pan-cancer meta-analysis
- Integrate expression profiling into clinical decision making
Acknowledgements
Catholic University of Korea: Suk-Woo Nam, Jung Yong Lee
GIS: Adai Ramasamy, Liza Vergara, Phil Long, Chin-Yo Lin, Benjamin Mow