CELL INDEX DATABASE (CELLX): A WEB TOOL FOR CANCER PRECISION MEDICINE
Pacific Symposium on Biocomputing (PSB) 2015, January 4-8, 2015, The Big Island of Hawaii
Keith Ching, Senior Principal Scientist, Computational Biology, Pfizer, Oncology Research Unit, San Diego, CA
What is CELLX?
Web interface to a database of molecular profiling data:
– Cell lines (CCLE, Broad, Sanger, GSK, Pfizer)
– TCGA: The Cancer Genome Atlas
– Published studies (GSE series from NCBI GEO)
– GTEx: Genotype-Tissue Expression project
– Custom data (internal studies)
Datatypes:
– Microarray expression
– RNA-Seq expression (RSEM)
– Mutation (COSMIC, TCGA, CCLE)
– Copy number variation (CNV)
– Compound activity (limited)
– Protein array, RPPA (limited)
– Metadata, annotations
Pfizer Confidential │ 2
Architecture
– MySQL, Apache / Tomcat, Rserve, Perl, Java
– Amazon Web Services; minimum requirements: t2.micro VM, 1 GB RAM, 1 CPU, 150 GB disk space
Demo:
Open source:
YouTube tutorials:
Why CELLX?
For each analysis, half the time is spent on data collection and formatting:
– getting the most recent dataset
– matching identifiers and merging datatypes
Analyses developed to answer a specific question are abstracted and generalized, because as new data is generated the same analysis is repeated over and over.
Generalized query
For target gene X:
– what kinds of alterations: mutation, fusion, amplification, deletion, over/under-expression
– where alterations are found: cell lines, primary samples, PDX models
– which gene alterations associate (or not) with gene X alterations: KRAS mutation, ALK fusion, CCND1 amplification, PD1 expression
– which sample characteristics associate with gene X alterations: tissue type, subtype, compound sensitivity
For target genes W, X, Y, Z:
– which tumor types have W and X alterations but not Y or Z
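A minimal sketch of the last query type (tumor types with W and X alterations but not Y or Z), assuming each sample has already been reduced to a tumor type plus a set of altered genes. The sample IDs, tumor-type labels, gene names and the function `matching_tumor_types` are hypothetical toy choices, not CELLX data or code.

```python
from collections import defaultdict

# Hypothetical toy input: sample ID -> (tumor type, set of altered genes).
# In practice the gene sets would be derived from mutation, fusion, CNV and
# expression calls; these values are made up for illustration.
SAMPLES = {
    "S1": ("BRCA", {"W", "X"}),
    "S2": ("BRCA", {"W", "X", "Y"}),
    "S3": ("LUAD", {"W", "X"}),
    "S4": ("LUAD", {"W"}),
    "S5": ("COAD", {"X", "Z"}),
}


def matching_tumor_types(samples, required, excluded):
    """Count, per tumor type, the samples that carry every gene in
    `required` and none of the genes in `excluded`."""
    counts = defaultdict(int)
    for tumor_type, altered in samples.values():
        # Subset test for "has all of", empty intersection for "has none of".
        if required <= altered and not (excluded & altered):
            counts[tumor_type] += 1
    return dict(counts)


# W and X altered, Y and Z not altered.
print(matching_tumor_types(SAMPLES, {"W", "X"}, {"Y", "Z"}))
# prints {'BRCA': 1, 'LUAD': 1}
```

Set operations keep the inclusion/exclusion logic explicit; the single-gene questions fall out of the same function, e.g. `required={"X"}` with an empty `excluded` set.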
Precision Medicine
Support pre-clinical and translational programs for late-stage targeted oncology agents (small molecules or antibodies):
– cell line or patient-derived xenograft (PDX) selection: mutation status, CNV amplification or deletion, high/low expression
– cell line / PDX correlates of agent activity: tissue type, mutation, CNV, expression, metadata
– estimating the size / frequency of potential responder indications: presence / absence of biomarkers under one or more constraints (tissue type, subtype, subgroup, viral status)
– hypothesis testing: confirming literature reports and investigator results in public datasets
– easy data access and merging for custom analyses; adding custom analyses as new queries
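To illustrate the responder-frequency item, here is a minimal sketch that keeps only samples matching one or more constraints and reports the fraction of biomarker-positive samples per tissue type. The `Sample` fields, the function `responder_frequency`, the labels and the values are all hypothetical; the slide does not describe how CELLX computes these frequencies.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Sample:
    # Hypothetical sample annotation; field names are illustrative only.
    tissue: str
    subtype: str
    viral_status: str
    biomarker_positive: bool


def responder_frequency(samples, **constraints):
    """Fraction of biomarker-positive samples per tissue type, counting only
    samples whose attributes match every keyword constraint exactly."""
    kept = [s for s in samples
            if all(getattr(s, key) == value for key, value in constraints.items())]
    total = Counter(s.tissue for s in kept)
    positive = Counter(s.tissue for s in kept if s.biomarker_positive)
    return {tissue: positive[tissue] / n for tissue, n in total.items()}


# Made-up toy cohort for illustration.
cohort = [
    Sample("HNSC", "A", "HPV+", True),
    Sample("HNSC", "A", "HPV-", True),
    Sample("HNSC", "B", "HPV-", False),
    Sample("HNSC", "A", "HPV-", False),
    Sample("CESC", "A", "HPV+", True),
    Sample("CESC", "B", "HPV+", False),
]

print(responder_frequency(cohort))                       # {'HNSC': 0.5, 'CESC': 0.5}
print(responder_frequency(cohort, viral_status="HPV+"))  # {'HNSC': 1.0, 'CESC': 0.5}
```

Passing constraints as keyword arguments lets the same call express "tissue type only" or "tissue type plus subtype plus viral status" without a separate query per combination.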
Expression
CNV
Expression vs. CNV
Matrix
Expression / mutation
Breast cell line panel screening: CDK4i (palbociclib*) IC50 values
– gene expression: sensitive vs. resistant
– CNV / mutations: RB1, CCNE1
*Finn RS, et al. Breast Cancer Res. 2009;11(5):R77. doi: /bcr2419.
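The sensitive vs. resistant comparison can be sketched as follows, assuming cell lines are split at a single IC50 cutoff and genes are ranked by Welch's t statistic. The cutoff, the choice of Welch's t, the function names, cell line names, gene names and all values are hypothetical; the slide does not state which test was used.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for mean(a) - mean(b) (unequal variances).
    Each group needs at least two values for the sample variance."""
    return (mean(a) - mean(b)) / math.sqrt(variance(a) / len(a) + variance(b) / len(b))


def sens_vs_resist(ic50, expression, cutoff):
    """Call lines with IC50 < cutoff sensitive and the rest resistant, then
    return (gene, t) pairs sorted from highest in sensitive lines to
    highest in resistant lines."""
    sens = [line for line, v in ic50.items() if v < cutoff]
    res = [line for line, v in ic50.items() if v >= cutoff]
    scores = {
        gene: welch_t([values[line] for line in sens], [values[line] for line in res])
        for gene, values in expression.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Made-up IC50 values (nM) and expression values for six hypothetical lines.
ic50 = {"L1": 20, "L2": 50, "L3": 80, "L4": 900, "L5": 1500, "L6": 2000}
expression = {
    "GENE_A": {"L1": 10, "L2": 11, "L3": 12, "L4": 4, "L5": 5, "L6": 6},
    "GENE_B": {"L1": 3, "L2": 4, "L3": 5, "L4": 8, "L5": 9, "L6": 10},
}

for gene, t in sens_vs_resist(ic50, expression, cutoff=150):
    print(gene, round(t, 2))
```

A positive t marks genes expressed higher in sensitive lines and a negative t marks genes higher in resistant lines; converting t to a p-value would additionally need the Welch-Satterthwaite degrees of freedom.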
Metadata test vs. expression of RB1
Metadata association with EGFR expression
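One common way to test whether expression of a gene such as RB1 or EGFR differs across a categorical metadata field (tissue type, subtype) is a one-way ANOVA. The sketch below computes only the F statistic; the function name, group labels and values are hypothetical, and the slides do not state which test CELLX applies.

```python
from statistics import mean


def one_way_anova_f(groups):
    """One-way ANOVA F statistic.

    groups: metadata category -> list of expression values.
    F = (between-group mean square) / (within-group mean square).
    """
    values = [x for vals in groups.values() for x in vals]
    grand_mean = mean(values)
    k, n = len(groups), len(values)
    ss_between = sum(len(v) * (mean(v) - grand_mean) ** 2 for v in groups.values())
    ss_within = sum(sum((x - mean(v)) ** 2 for x in v) for v in groups.values())
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Made-up expression values for two hypothetical subtypes.
print(one_way_anova_f({"subtype A": [1, 2, 3], "subtype B": [4, 5, 6]}))  # 13.5
```

With two groups F equals the square of the pooled two-sample t statistic; a library routine such as SciPy's `scipy.stats.f_oneway` returns the same F together with its p-value.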
RB1, CDKN2A, CCND1 in TCGA breast
Cutoffs
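The slide does not say how cutoffs are defined. As one possible example, a minimal sketch that labels each sample high, low or normal from its z-score within the cohort; the z-score rule, the default cutoff of 2, the function name `call_expression` and the toy values are assumptions, not CELLX defaults.

```python
from statistics import mean, pstdev


def call_expression(values, z_cutoff=2.0):
    """Label each sample 'high', 'low' or 'normal' by its z-score.

    values: sample ID -> expression value. The z-score rule and z_cutoff
    are illustrative assumptions.
    """
    data = list(values.values())
    mu, sd = mean(data), pstdev(data)
    calls = {}
    for sample, v in values.items():
        z = (v - mu) / sd if sd > 0 else 0.0  # constant input: everything normal
        if z >= z_cutoff:
            calls[sample] = "high"
        elif z <= -z_cutoff:
            calls[sample] = "low"
        else:
            calls[sample] = "normal"
    return calls


# Made-up cohort: one low outlier, one high outlier, eight typical samples.
cohort = {f"S{i}": v for i, v in enumerate([0, 10, 10, 10, 10, 10, 10, 10, 10, 20])}
print(call_expression(cohort))
```

Other cutoff schemes (fixed percentiles, a log-expression threshold, or a comparison against matched normal tissue) would slot into the same function by replacing the z-score line.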
Across all TCGA
Genes correlated with RB1 expression (TCGA-BRCA-RSEM)
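Ranking genes by their correlation with a target gene's expression across samples can be sketched as below. Pearson correlation, ranking by absolute value, the function names and the gene names and values are assumptions made for illustration; the slide only names the target gene and dataset.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx and syy else 0.0


def top_correlated(expression, target, n=10):
    """Return the n genes whose expression correlates most strongly (by |r|)
    with `target`, assuming every gene lists samples in the same order."""
    ref = expression[target]
    scores = {g: pearson(ref, v) for g, v in expression.items() if g != target}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)[:n]


# Toy matrix: gene -> expression in four samples (same sample order for every gene).
expression = {
    "TARGET": [1, 2, 3, 4],
    "G1": [2, 4, 6, 8],
    "G2": [4, 3, 2, 1],
    "G3": [1, 3, 2, 4],
}
print(top_correlated(expression, "TARGET", n=3))
# [('G1', 1.0), ('G2', -1.0), ('G3', 0.8)]
```

Ranking by |r| keeps both positively and negatively correlated genes in the list; ranking by r alone would return only the positively correlated ones.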
GLI1
ACRG HCC – ACVRL1 correlation (data: George Kan; classes: Kai Wang)
Multiple correlations across TCGA (PD-L1)
– runs the correlation across 32 TCGA datasets
– summary table of the number of times a gene appears among the top 100 genes per dataset, and among the top 1000 genes per dataset
– zip file of each correlation table
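The summary table amounts to counting, for each gene, how many datasets place it among the top N correlated genes. A minimal sketch, assuming each dataset's correlation table is a gene-to-r mapping and that "top" means largest |r|; the function name, dataset names and values are hypothetical.

```python
from collections import Counter


def top_n_counts(correlation_tables, n):
    """For each gene, count the datasets in which it ranks in the top n by |r|.

    correlation_tables: dataset name -> {gene: correlation with the query gene}.
    """
    counts = Counter()
    for table in correlation_tables.values():
        ranked = sorted(table, key=lambda gene: abs(table[gene]), reverse=True)
        counts.update(ranked[:n])
    return counts


# Made-up correlation tables for three hypothetical datasets.
tables = {
    "dataset_1": {"G1": 0.9, "G2": 0.8, "G3": 0.1, "G4": -0.2},
    "dataset_2": {"G1": 0.7, "G2": 0.1, "G3": 0.6, "G4": 0.0},
    "dataset_3": {"G1": 0.5, "G2": -0.9, "G3": 0.2, "G4": 0.3},
}
print(top_n_counts(tables, n=2).most_common())
# [('G1', 3), ('G2', 2), ('G3', 1)]
```

On the 32 TCGA tables the same function would be called once with n=100 and once with n=1000 to produce the two summary columns.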
CD274 / JAK2 / PDCD1LG2: same locus, 9p24 (ACRG157T, IGV)
Survival: genome-wide ranking of genes by the association between their expression and survival (TCGA-HNSC)
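The slide does not say how genes are ranked against survival. One simple approach, used here as a stand-in: split samples at each gene's median expression, compute a two-group log-rank statistic, and rank genes by its absolute value (a Cox proportional-hazards model on continuous expression is a common alternative). The function names and the toy survival data are hypothetical.

```python
import math
from statistics import median


def logrank_z(times, events, group1):
    """Two-group log-rank z score, (O1 - E1) / sqrt(V), for group 1.

    times: follow-up times; events: True if the event was observed
    (False = censored); group1: True for samples in group 1.
    A positive z means more events than expected in group 1.
    """
    o1 = e1 = var = 0.0
    for t in sorted({ti for ti, ei in zip(times, events) if ei}):
        at_risk = [i for i, ti in enumerate(times) if ti >= t]
        n = len(at_risk)
        n1 = sum(1 for i in at_risk if group1[i])
        d = sum(1 for i in at_risk if times[i] == t and events[i])
        d1 = sum(1 for i in at_risk if times[i] == t and events[i] and group1[i])
        o1 += d1
        e1 += d * n1 / n  # expected group-1 events under no difference
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return (o1 - e1) / math.sqrt(var) if var > 0 else 0.0


def rank_genes_by_survival(expression, times, events):
    """Split samples at each gene's median (above median = 'high' group) and
    rank genes by |log-rank z|, strongest association first."""
    scores = {}
    for gene, values in expression.items():
        cut = median(values)
        scores[gene] = logrank_z(times, events, [v > cut for v in values])
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)


# Made-up survival data for six samples (all events observed) and two genes.
times = [1, 2, 3, 4, 5, 6]
events = [True] * 6
expression = {
    "GENE_A": [6, 5, 4, 3, 2, 1],  # high expression in the earliest events
    "GENE_B": [1, 6, 2, 5, 3, 4],
}
for gene, z in rank_genes_by_survival(expression, times, events):
    print(gene, round(z, 2))
```

A median split keeps the sketch short; it discards information compared with modeling expression as a continuous covariate, which is one reason a Cox model is often preferred for genome-wide ranking.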
Acknowledgments
– Paul Rejto: Exec. Dir., Precision Medicine CompBio
– Kai Wang: ACRG subclasses
– Zhengyan (George) Kan: ACRG data
– Julio Fernandez: CCLE data
– Wenyan Zhong: requirements; expression, CNV and mutation correlations
– Jarek Kostrowicki: R optimization
– Tao Xi: tumor vs. normal plots
– Zhou Zhu: METABRIC data
Requirements
Oncology Business Unit
– Jean-François Martini, Sr. Dir.: biomarker reports, Venn diagrams, frequencies
– Maria Koehler, VP: multiple-datatype scatter plot
Integrative Biology and Biochem
– Kim Arndt, VP IBB: biomarker frequencies by subtype