Tumor Genome Sequencing Xiaole Shirley Liu STAT115, STAT215, BIO298, BIST512.

Cancer
Cancer will affect 1 in 2 men and 1 in 3 women in the United States, and the number of new cases of cancer is set to nearly double by the year
Cancer is a genetic disease caused by mutations in the DNA
Clinically, tumors can look the same, but most differ genetically

Different Sequencing Approaches
Capture-seq ($ )
  – Can focus on well-known mutations
Exome-seq ($700-2K)
  – All the exons in genes; promoters and lncRNA genes?
RNA-seq ($500-2K)
  – Expression and mutations together; miss anything?
Whole genome sequencing ($3-4K)
  – Majority of mutations are non-coding, function unknown
  – Better at detecting structural changes (translocations, fusions)
  – Cost-vs-benefit balance

Two Major Cancer Genome Projects
TCGA: The Cancer Genome Atlas (US)
  – > 30 cancer types and > 10K tumor samples
  – Primary tumors, fewer death events
  – Genome, transcriptome, DNA methylome, proteomics
  – Rigorous tumor sample QC, consistent profiling platform
ICGC: International Cancer Genome Consortium (11 countries)
  – 20 cancer types × 500 tumor samples each

Tumor Gene Expression
Microarrays or RNA-seq
Data analysis?
Differential expression between cancer and normal
Cluster the tumor samples into sub-types
  – Consensus clustering: repeatedly sample genes or tumors to get a robust clustering
Predict patient outcome (survival or recurrence)

Survival Analysis
Do patients receiving the treatment live longer?
Are smokers more likely to have cancer recurrence?
Censored data: the value of a measurement or observation is only partially known
  – Some patients left the study
  – Study concluded

Survival Without Censoring

Survival With Censoring

Kaplan-Meier Curve
More individuals in each group and better separation of the groups both give a better p-value
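
The survival curve on this slide can be computed by hand. The following is a minimal sketch of the standard Kaplan-Meier product-limit estimator, assuming each patient is described by a follow-up time and an event flag (1 = event observed, 0 = censored). The function name `kaplan_meier` and the toy data are illustrative and do not come from the lecture.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one event.

    times  -- follow-up time for each patient
    events -- 1 if the event (e.g. death) was observed, 0 if censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Collect everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1.0 - deaths / n_at_risk
            curve.append((t, survival))
        # Censored patients leave the risk set without lowering the curve.
        n_at_risk -= removed
    return curve


# Toy data (hypothetical): five patients, two of them censored.
for t, s in kaplan_meier([2, 3, 3, 5, 8], [1, 1, 0, 1, 0]):
    print(t, round(s, 3))
```

Censored patients count toward the number at risk up to their censoring time, which is how the estimator uses partially known follow-up instead of discarding it.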

Log Rank Test
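
The slide content for the log-rank test did not survive in the transcript, so here is a hedged sketch of the usual two-group version: at each event time, the observed number of events in group A is compared with the number expected if both groups shared one hazard, and the summed difference is tested against a chi-square distribution with 1 degree of freedom. The function name and inputs are assumptions made for illustration.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test. Returns (chi-square statistic, p-value)."""
    group_a = list(zip(times_a, events_a))
    group_b = list(zip(times_b, events_b))
    # Distinct times at which at least one event was observed in either group.
    event_times = sorted({t for t, e in group_a + group_b if e})

    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        n_a = sum(1 for time, _ in group_a if time >= t)  # at risk in A
        n_b = sum(1 for time, _ in group_b if time >= t)  # at risk in B
        d_a = sum(1 for time, e in group_a if time == t and e)
        d_b = sum(1 for time, e in group_b if time == t and e)
        n, d = n_a + n_b, d_a + d_b
        # Expected events in A under the null (hypergeometric mean).
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:
            # Hypergeometric variance of the event count in A.
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)

    if variance == 0:
        raise ValueError("no informative event times")
    chi2 = observed_minus_expected ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Toy data (hypothetical): group A fails early, group B fails late.
print(logrank_test([1, 2, 3], [1, 1, 1], [4, 5, 6], [1, 1, 1]))
```

With only three patients per group the chi-square approximation is rough; the toy data is there to show the mechanics, not to support a conclusion.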

More Variables
50-signature?
Logistic regression:
  – Estimates the odds ratio: a ratio of odds
  – Linear combination of all the genes to separate outcome (0, 1)
Cox regression:
  – Estimates the hazard ratio: a ratio of incidence rates
  – Models the effect of covariates on the hazard rate but leaves the baseline hazard rate unspecified
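
For reference, the Cox model described above is usually written in the following form. The notation is the standard textbook one and is not taken from the slides: h_0(t) is the unspecified baseline hazard, x_1, ..., x_p are the covariates (for example, gene expression values), and the beta terms are the fitted coefficients.

```latex
\begin{align}
  h(t \mid x) &= h_0(t)\,\exp\!\left(\beta_1 x_1 + \cdots + \beta_p x_p\right) \\
  \mathrm{HR}_j &= \frac{h(t \mid x_j + 1,\ \text{other covariates fixed})}
                       {h(t \mid x_j,\ \text{other covariates fixed})}
                = e^{\beta_j}
\end{align}
```

Because h_0(t) cancels in the ratio, the hazard ratio for a covariate can be estimated without ever specifying the baseline hazard.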

Use Cox Regression to Separate Two Groups by Gene Signature

Caution About Gene Signature’s Predictive Power

Mutations in the Tumor Genome
Help us identify important genes for tumorigenesis and cancer progression
Drivers: a.k.a. gatekeepers, mutations that cause and accelerate cancers
Passengers: accidental by-products and thwarted DNA-repair mechanisms
Recurrent mutations on genes or pathways are likely drivers

High Throughput Driver Detection
Differential gene expression
Copy number aberration (CNA) or variation (CNV) using CGH, tiling or SNP arrays

Comparative genomic hybridization (CGH)

GISTIC
G-score: combines the frequency of occurrence and the amplitude of the aberration
Statistical significance is evaluated by permutation
FDR adjustment for multiple hypothesis testing
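
The three steps above can be sketched in a few lines. This is a deliberately simplified illustration of the idea, not the published GISTIC implementation: it assumes a matrix of log2 copy-number ratios (samples by markers), scores each marker as frequency times mean amplitude of amplification, builds a null distribution by shuffling marker positions within each sample, and applies a Benjamini-Hochberg adjustment. The function names and the 0.3 amplification threshold are hypothetical.

```python
import random
from bisect import bisect_left


def g_scores(cn, threshold=0.3):
    """G-score per marker: (fraction of samples amplified) x (mean amplitude)."""
    n_samples = len(cn)
    scores = []
    for j in range(len(cn[0])):
        amplitudes = [sample[j] for sample in cn if sample[j] > threshold]
        # frequency * mean amplitude simplifies to sum(amplitudes) / n_samples
        scores.append(sum(amplitudes) / n_samples)
    return scores


def permutation_pvalues(cn, n_perm=200, threshold=0.3, seed=0):
    """Empirical p-values from shuffling marker positions within each sample."""
    rng = random.Random(seed)
    observed = g_scores(cn, threshold)
    null = []
    for _ in range(n_perm):
        shuffled = [rng.sample(sample, len(sample)) for sample in cn]
        null.extend(g_scores(shuffled, threshold))
    null.sort()
    total = len(null)
    # Fraction of null scores >= observed, with a +1 pseudo-count.
    return [(total - bisect_left(null, g) + 1) / (total + 1) for g in observed]


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (q-values), in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    qvals = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        qvals[i] = running_min
    return qvals


# Toy data (hypothetical): 10 tumors, 20 markers, marker 5 amplified in all.
cn = [[0.0] * 20 for _ in range(10)]
for sample in cn:
    sample[5] = 1.0
qvals = benjamini_hochberg(permutation_pvalues(cn, n_perm=50))
print(min(range(20), key=lambda j: qvals[j]))  # index of the most significant marker
```

Deletions could be handled the same way by scoring values below a negative threshold; that part is omitted to keep the sketch short.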

GATK
FASTQ -> BAM
BAM -> VCF
Annotate

MAF and VCF Formats
VCF (GWAS format) and MAF (TCGA format)
Both can annotate somatic mutations and germline variants
Tab-delimited text file
  – CHROM, POS
  – ID (SNP id, gene symbol, or ENTREZ gene id)
  – REF (reference sequence), ALT (altered sequence)
  – QUAL (quality score)
  – FILTER (PASS vs “q10;s50”: quality <= 10; <= 50% of samples have data here)
  – INFO (allele counts, total counts, number of samples with data, somatic or not, validated, etc.)
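
To make the column layout concrete, here is a minimal sketch that splits one VCF data line into the eight fixed fields listed above. It assumes a tab-delimited line with no per-sample genotype columns to interpret; the function name `parse_vcf_line` and the example record are illustrative only, and real pipelines would normally use an established VCF library instead.

```python
def parse_vcf_line(line):
    """Parse the eight fixed VCF columns of one data line into a dict."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, vid, ref, alt, qual, flt, info = fields[:8]

    info_dict = {}
    for item in info.split(";"):
        if not item or item == ".":
            continue
        key, sep, value = item.partition("=")
        # Flags such as "DB" carry no value; record them as True.
        info_dict[key] = value if sep else True

    return {
        "chrom": chrom,
        "pos": int(pos),
        "id": None if vid == "." else vid,
        "ref": ref,
        "alt": alt.split(","),               # several ALT alleles are comma-separated
        "qual": None if qual == "." else float(qual),
        "filter": flt.split(";"),            # e.g. ["PASS"] or ["q10", "s50"]
        "info": info_dict,
    }


# Illustrative record (hypothetical values).
record = parse_vcf_line("20\t14370\trs6054257\tG\tA\t29\tPASS\tNS=3;DP=14;AF=0.5;DB")
print(record["chrom"], record["pos"], record["filter"], record["info"])
```

INFO values are kept as strings here because their types are declared in the VCF header, which this sketch does not read.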

Example of a Cancer Genome Mutations Profile
Circos Plot: how messed up a cancer genome is

Total alterations affecting protein-coding genes in selected tumors
Vogelstein et al., Science

Somatic Mutation Frequency in 3K Tumor-Normal Pairs
Typical tumors: median 45 mutations / tumor
More mutations in tumors of tissues that face the outside environment

TS vs Oncogenes, GoF vs LoF
Tumor suppressors (TS) vs oncogenes
Gain of Function (GoF) or Loss of Function (LoF) mutations
  – Phenotypes
How to tell?
  – From mutation patterns
  – From expression patterns
  – Functional studies
Some genes can be both TS and oncogenes

Mutation Rate Heterogeneity
Mutation rate is correlated with replication timing, gene expression, and gene length
Tumor evolution and selection
Lawrence et al., Nature 2013

Recurrent Mutations
Figure legend: known; novel with clear cancer association; novel
Lawrence et al., Nature 2014

How Much Should We Sequence?
Need ~200 patients for a 20% mutation rate, ~550 patients for 10%, and ~1200 patients for 5%
Most driver mutations have been found; there is a pressing need in basic cancer research to study their function
Biggest surprise: mutations on chromatin regulators
  – > 50% new and strong cancer driver genes
  – Oncogenes: DNMT3A, IDH1
  – Tumor suppressors: MLL, ATRX, ARID1A, SNF5
  – Both: EZH2
Sequencing metastasized or drug-resistant tumors might yield insights on tumor progression

Resources
MSKCC cBioPortal
  – GUI for experimental biologists
Broad FireHose
  – API for accessing processed TCGA data
UCSC CGHub
  – API for accessing raw and processed cancer data
Sanger COSMIC
  – Catalog of Somatic Mutations in Cancer
Many also provide software tools

Summary
Different sequencing approaches
Gene expression, tumor sub-typing
Survival analysis: KM vs Cox regression
Different mutation types and distributions
Gain or loss of function mutations
Tumor suppressors vs oncogenes

Acknowledgement
Aleksandar Milosavljevic
Kristin Sainani
Linda Staub & Alexandros Gekenidis
Yin Bun Cheung, Paul Yip
John Pack
Cheng Li
Xujun Wang
Peng Jiang