Anthony Gitter Cancer Bioinformatics (BMI 826/CS 838) May 5, 2015

Slides:



Advertisements
Similar presentations
Original Figures for "Molecular Classification of Cancer: Class Discovery and Class Prediction by Gene Expression Monitoring"
Advertisements

Microarray Data Analysis (Lecture for CS397-CXZ Algorithms in Bioinformatics) March 19, 2004 ChengXiang Zhai Department of Computer Science University.
Instance-based Classification Examine the training samples each time a new query instance is given. The relationship between the new query instance and.
Oncomine Database Lauren Smalls-Mantey Georgia Institute of Technology June 19, 2006 Note: This presentation contains animation.
Supervised Learning Recap
TCGA(The cancer genome atlas) catalogue genetic mutations responsible for cancer, using genome sequencing and bioinformatics The TCGA is sequencing the.
Generative Topic Models for Community Analysis
A Bayesian Approach to Joint Feature Selection and Classifier Design Balaji Krishnapuram, Alexander J. Hartemink, Lawrence Carin, Fellow, IEEE, and Mario.
Linking Proteomic and Transcriptional Data through the Interactome and Epigenome Reveals a Map of Oncogene- induced Signaling Anthony Gitter Cancer Bioinformatics.
Network-based stratification of tumor mutations Matan Hofree.
Introduction Integrative Analysis of Genomic Variants in Carcinogenesis Syed Haider, Arek Kasprzyk, Pietro Lio Artificial Intelligence and Computational.
Multidimensional Analysis If you are comparing more than two conditions (for example 10 types of cancer) or if you are looking at a time series (cell cycle.
Predictive sub-typing of subjects Retrospective and prospective studies Exploration of clinico-genomic data Identify relevant gene expression patterns.
3 rd Summer School in Computational Biology September 10, 2014 Frank Emmert-Streib & Salissou Moutari Computational Biology and Machine Learning Laboratory.
Integrative omics analysis Qi Liu Center for Quantitative Sciences Vanderbilt University School of Medicine
Chapter 5 Data mining : A Closer Look.
Introduction The goal of translational bioinformatics is to enable the transformation of increasingly voluminous genomic and biological data into diagnostics.
Health and CS Philip Chan. DNA, Genes, Proteins What is the relationship among DNA Genes Proteins ?
Supplementary Figure 1. Somatic mutation spectrum # Substitutions # Substitutions per Mb b c a Repeats Pseudogenes Whole genome Splice sites Non-coding.
Habil Zare Department of Genome Sciences University of Washington
Gene expression profiling identifies molecular subtypes of gliomas
Cancer is heterogeneous disease! -> enabled characterization of new tumor subtypes for improving personalized treatment and ultimately achieving better.
Radiogenomics in glioblastoma multiforme
1 A Presentation of ‘Bayesian Models for Gene Expression With DNA Microarray Data’ by Ibrahim, Chen, and Gray Presentation By Lara DePadilla.
Systematic reviews of genetic association studies Robert Walton Fiona Fong 15 March 2013.
Molecular Diagnosis Florian Markowetz & Rainer Spang Courses in Practical DNA Microarray Analysis.
Networks and Interactions Boo Virk v1.0.
Diagnosis of multiple cancer types by shrunken centroids of gene expression Course: Topics in Bioinformatics Presenter: Ting Yang Teacher: Professor.
Transfer Learning Task. Problem Identification Dataset : A Year: 2000 Features: 48 Training Model ‘M’ Testing 98.6% Training Model ‘M’ Testing 97% Dataset.
Inferring transcriptional and microRNA-mediated regulatory programs in glioblastma Setty, M., et al.
1 Generative and Discriminative Models Jie Tang Department of Computer Science & Technology Tsinghua University 2012.
Gene set analyses of genomic datasets Andreas Schlicker Jelle ten Hoeve Lodewyk Wessels.
Lecture 11. Topics in Omic Studies (Cancer Genomics, Transcriptomics and Epignomics) The Chinese University of Hong Kong CSCI5050 Bioinformatics and Computational.
A comparative study of survival models for breast cancer prognostication based on microarray data: a single gene beat them all? B. Haibe-Kains, C. Desmedt,
The Broad Institute of MIT and Harvard Differential Analysis.
Pan-cancer analysis of prognostic genes Jordan Anaya Omnes Res, In this study I have used publicly available clinical and.
CBioPortal Web resource for exploring, visualizing, and analyzing multidimentional cancer genomics data.
Case Study: Characterizing Diseased States from Expression/Regulation Data Tuck et al., BMC Bioinformatics, 2006.
Multiplatform Analysis of 12 Cancer Types Reveals Molecular Classification within and across Tissues of Origin Hoadley, KA et al. Cell 158(4):
Review of statistical modeling and probability theory Alan Moses ML4bio.
(1) Genotype-Tissue Expression (GTEx) Largest systematic study of genetic regulation in multiple tissues to date 53 tissues, 500+ donors, 9K samples, 180M.
Advances and challenges in computational modeling and statistical learning of biological systems Qi Liu Department of Biomedical Informatics Vanderbilt.
miRNA-targets cross-talks: key players in glioblastoma multiforme
Cancer Genomics and Class Discovery
Transcriptional heterogeneity of breast cancer subtypes,
Knowledge Engine for Genomics (KnowEnG):
Hallett, et al., - Supplementary Figure 1
Global Transcriptional Dysregulation in Breast Cancer
Functional Genomics Analysis Reveals a MYC Signature Associated with a Poor Clinical Prognosis in Liposarcomas  Dat Tran, Kundan Verma, Kristin Ward,
Dept of Biomedical Informatics University of Pittsburgh
A bioinformatic analysis of microRNAs role in osteoarthritis
Focus on central nervous system neoplasia
Volume 17, Issue 1, Pages (January 2010)
Design and Multiseries Validation of a Web-Based Gene Expression Assay for Predicting Breast Cancer Recurrence and Patient Survival  Ryan K. Van Laar 
Dimension reduction : PCA and Clustering
Class Prediction Based on Gene Expression Data Issues in the Design and Analysis of Microarray Experiments Michael D. Radmacher, Ph.D. Biometric Research.
Volume 20, Issue 5, Pages (November 2014)
Recurrence-Associated Long Non-coding RNA Signature for Determining the Risk of Recurrence in Patients with Colon Cancer  Meng Zhou, Long Hu, Zicheng.
Volume 4, Issue 3, Pages (August 2013)
V13 Multi-omics data integration
JNK signature in breast cancer cells is linked to ECM, stem cell, and wound healing gene networks and is enriched in basal‐like breast cancer JNK signature.
Volume 20, Issue 5, Pages (November 2014)
Altered Caspase-8 Expression
Volume 26, Issue 12, Pages e5 (March 2019)
Single-cell phenotyping with Traitar.
Breast Cancer Subtype Identification Using RNA-Seq Data
Knowledge-Guided Sample Clustering
Figure 1. Identification of three tumour molecular subtypes in CIT and TCGA cohorts. We used CIT multi-omics data ( Figure 1. Identification of.
Highly metastatic PDAC cells have a unique gene signature, which is not preserved in metastases but predicts poor patient outcome. Highly metastatic PDAC.
Volume 28, Issue 4, Pages e6 (July 2019)
Presentation transcript:

Dissecting cancer heterogeneity with a probabilistic genotype-phenotype model Anthony Gitter Cancer Bioinformatics (BMI 826/CS 838) May 5, 2015 All figures from Cho2013 unless noted otherwise

Class business Project presentations Thursday Guidelines on website Project report due May 11 How to schedule presentation order?

Inspiration from CMapBatch Chris rank 1 Jiayue rank 4 Network stratification project rank √4 (1) Anita rank 7 Vee rank 6 Survival prediction project rank √42 (3) Taylor rank 3 Haixiang rank 5 Erkin rank 2 Clustering pipeline project rank √15 (2) Outlier

Subtyping in cancer Substantial differences across tumors even within one type of cancer Molecular alterations Survival outcomes Response to therapy

Traditional subtyping Learn gene expression signature to distinguish classes AML vs ALL PAM50 for breast cancer Glioblastoma (GBM) Verhaak2010

GBM subtypes Learn class centroids with ClaNC (classification to nearest centroids) t-test statistic to identify genes 210 genes per class in GBM Neural subtype has been criticized Verhaak2010

Many analyses depend on subtypes MutSig or other enrichment tests

Many analyses depend on subtypes Group lasso in regulator regression Setty2012

Many analyses depend on subtypes DIGGIT functional CNV association test Chen2014

Problem with subtype classifiers Cancer and individual tumors are heterogeneous Ding2014

Heterogeneity in expression classification Single-cell RNA-seq shows a single GBM tumor is composed of cells from multiple subtypes Patel2014

Prob_GBM: mixtures of subtypes Patients are mixtures of subtypes Subtypes are mixtures of genomic factors Sound familiar?

Relation to Non-negative Matrix Factorization Network-based stratification Similar concepts, different strategies Hoffree2013

Prob_GBM model Gene expression is a molecular level phenotype Treated as effect of disease, not cause Patient-patient similarity based on expression Genomic factors cause disease Mutations, CNV, miRNAs Expression similarities explained by genomic similarities

Build patient-patient similarity network

Choose co-expression threshold

Learn subtype distributions

Likelihood of edge between similar patients from subtype assignments

Inspired by relational topic model Documents are bags of words Document-document citation network Chang2010

Mapping to cancer domain Documents = patients Bag of words = bag of genomic alterations Document citation link = patient-patient co- expression above some threshold

Generative probabilistic model patient subtype “gene” patients d -> p w -> g Chang2010

Generative probabilistic model Chang2010 γ

Prob_GBM distributions Joint distribution Posterior distribution of the latent variables

Model estimation Cannot maximize posterior exactly Gibbs sampling generates samples from this distribution Two Gibbs sampling references: 1 page summary 231 slide tutorial

Latent variables of interest Subtype distributions per patient p Distributions of genomic alteration n under subtype k

Visualizing patient distributions

Visualizing genomic alteration distributions

Assigning patients to subtypes

Neural is mixture of subtypes

Stability of subtype assignments

Ultimate patient-subtype, alteration-subtype associations