Human Cancer Genome Project Computational Systems Biology of Cancer: (III)

Slides:



Advertisements
Similar presentations
Linkage and Genetic Mapping
Advertisements

We processed six samples in triplicate using 11 different array platforms at one or two laboratories. we obtained measures of array signal variability.
Chapter 10 Section 2 Hypothesis Tests for a Population Mean
Sampling distributions of alleles under models of neutral evolution.
Basics of Linkage Analysis
Bioinformatics lectures at Rice University Li Zhang Lecture 10: Networks and integrative genomic analysis-2 Genome instability and DNA copy number data.
Yanxin Shi 1, Fan Guo 1, Wei Wu 2, Eric P. Xing 1 GIMscan: A New Statistical Method for Analyzing Whole-Genome Array CGH Data RECOMB 2007 Presentation.
1. Estimation ESTIMATION.
Clustering short time series gene expression data Jason Ernst, Gerard J. Nau and Ziv Bar-Joseph BIOINFORMATICS, vol
Detecting Differentially Expressed Genes Pengyu Hong 09/13/2005.
Chapter 9 Hypothesis Testing Testing Hypothesis about µ, when the s.t of population is known.
STAC: A multi-experiment method for analyzing array-based genomic copy number data Sharon J. Diskin, Thomas Eck, Joel P. Greshock, Yael P. Mosse, Tara.
Algorithms for Smoothing Array CGH data
Association Mapping of Complex Diseases with Ancestral Recombination Graphs: Models and Efficient Algorithms Yufeng Wu UC Davis RECOMB 2007.
Evaluation.
PSY 307 – Statistics for the Behavioral Sciences
Comparative Genomic Hybridization (CGH). Outline Introduction to gene copy numbers and CGH technology DNA copy number alterations in breast cancer (Pollack.
Section 9.2 Testing the Mean .
Sullivan – Fundamentals of Statistics – 2 nd Edition – Chapter 11 Section 2 – Slide 1 of 25 Chapter 11 Section 2 Inference about Two Means: Independent.
Inference for a Single Population Proportion (p).
Estimating a Population Proportion
Copyright © Cengage Learning. All rights reserved. Hypothesis Testing 9.
Computational research for medical discovery at Boston College Biology Gabor T. Marth Boston College Department of Biology
1 Statistical Distribution Fitting Dr. Jason Merrick.
Estimating a Population Proportion
1 SMU EMIS 7364 NTU TO-570-N Inferences About Process Quality Updated: 2/3/04 Statistical Quality Control Dr. Jerrell T. Stracener, SAE Fellow.
1 A Robust Framework for Detecting Structural Variations February 6, 2008 Seunghak Lee 1, Elango Cheran 1, and Michael Brudno 1 1 University of Toronto,
Biostatistics, statistical software VII. Non-parametric tests: Wilcoxon’s signed rank test, Mann-Whitney U-test, Kruskal- Wallis test, Spearman’ rank correlation.
CHAPTER 17: Tests of Significance: The Basics
Bioinformatics Expression profiling and functional genomics Part II: Differential expression Ad 27/11/2006.
CS5263 Bioinformatics Lecture 20 Practical issues in motif finding Final project.
Inference for Regression Simple Linear Regression IPS Chapter 10.1 © 2009 W.H. Freeman and Company.
1 Psych 5500/6500 The t Test for a Single Group Mean (Part 1): Two-tail Tests & Confidence Intervals Fall, 2008.
BLAST: Basic Local Alignment Search Tool Altschul et al. J. Mol Bio CS 466 Saurabh Sinha.
Economics 173 Business Statistics Lecture 4 Fall, 2001 Professor J. Petry
Copy Number Variation Eleanor Feingold University of Pittsburgh March 2012.
The Practice of Statistics, 5th Edition Starnes, Tabor, Yates, Moore Bedford Freeman Worth Publishers CHAPTER 10 Comparing Two Populations or Groups 10.1.
Statistical Inference for the Mean Objectives: (Chapter 9, DeCoursey) -To understand the terms: Null Hypothesis, Rejection Region, and Type I and II errors.
Identification of Copy Number Variants using Genome Graphs
Introduction to Inference: Confidence Intervals and Hypothesis Testing Presentation 4 First Part.
Lecture 3: Statistics Review I Date: 9/3/02  Distributions  Likelihood  Hypothesis tests.
Lecture PowerPoint Slides Basic Practice of Statistics 7 th Edition.
Computational Laboratory: aCGH Data Analysis Feb. 4, 2011 Per Chia-Chin Wu.
Simple examples of the Bayesian approach For proportions and means.
De novo discovery of mutated driver pathways in cancer Discussion leader: Matthew Bernstein Scribe: Kun-Chieh Wang Computational Network Biology BMI 826/Computer.
CGH Data BIOS Chromosome Re-arrangements.
Stat 31, Section 1, Last Time Distribution of Sample Means –Expected Value  same –Variance  less, Law of Averages, I –Dist’n  Normal, Law of Averages,
Computational Biology and Genomics at Boston College Biology Gabor T. Marth Department of Biology, Boston College
1 Probability and Statistics Confidence Intervals.
Sullivan – Fundamentals of Statistics – 2 nd Edition – Chapter 11 Section 3 – Slide 1 of 27 Chapter 11 Section 3 Inference about Two Population Proportions.
Learning Objectives After this section, you should be able to: The Practice of Statistics, 5 th Edition1 DESCRIBE the shape, center, and spread of the.
Copy Number Analysis in the Cancer Genome Using SNP Arrays Qunyuan Zhang, Aldi Kraja Division of Statistical Genomics Department of Genetics & Center for.
Hypothesis Testing. Statistical Inference – dealing with parameter and model uncertainty  Confidence Intervals (credible intervals)  Hypothesis Tests.
CSE280Stefano/Hossein Project: Primer design for cancer genomics.
SECTION 7.2 Estimating a Population Proportion. Where Have We Been?  In Chapters 2 and 3 we used “descriptive statistics”.  We summarized data using.
Statistical Inference for the Mean Objectives: (Chapter 8&9, DeCoursey) -To understand the terms variance and standard error of a sample mean, Null Hypothesis,
Copyright © 2009 Pearson Education, Inc t LEARNING GOAL Understand when it is appropriate to use the Student t distribution rather than the normal.
CHAPTER 10 Comparing Two Populations or Groups
Chapter 8: Inference for Proportions
Copyright © Cengage Learning. All rights reserved.
Discrete Event Simulation - 4
CHAPTER 10 Comparing Two Populations or Groups
CHAPTER 10 Comparing Two Populations or Groups
CHAPTER 10 Comparing Two Populations or Groups
Iuliana Ionita, Raoul-Sam Daruwala, Bud Mishra 
CHAPTER 10 Comparing Two Populations or Groups
CHAPTER 10 Comparing Two Populations or Groups
CHAPTER 10 Comparing Two Populations or Groups
CHAPTER 10 Comparing Two Populations or Groups
CHAPTER 10 Comparing Two Populations or Groups
Presentation transcript:

Human Cancer Genome Project Computational Systems Biology of Cancer: (III)

Human Cancer Genome Project Bud Mishra Professor of Computer Science, Mathematics and Cell Biology ¦ Courant Institute, NYU School of Medicine, Tata Institute of Fundamental Research, and Mt. Sinai School of Medicine

Human Cancer Genome Project Bliss in Ignorance “ Learn to say ‘I don't know.’ If used when appropriate, it will be often.” –US Secretary of Defense, Mr. Donald Rumsfeld.

Human Cancer Genome Project Mishra’s Mystical 3M’s Rapid and accurate solutions –Bioinformatic, statistical, systems, and computational approaches. –Approaches that are scalable, agnostic to technologies, and widely applicable Promises, challenges and obstacles— Measure Mine Model

Human Cancer Genome Project “Mine” What we can compute and what we cannot

Human Cancer Genome Project Critical Innovations Detecting LOH and TSG –Multi-point Scoring using a novel Relative-Risk Function (based on the earlier Generative Model) –Accurate detection of End-points of Intervals/genes –P-values using Scan Statistics Implementations –NYU Array CGH Toolkit With Access to Genome-variants, UCSC-Browser, NCBI-Browsers, Kegg, etc.

Human Cancer Genome Project Genetics of LOH

Human Cancer Genome Project Finding Cancer Genes LOH/Deletion Analysis analysis Hypothesize a TSG (Tumor Suppressor Gene) Score function for each possible genomic region containing the TSG –Evolutionary history –Interactions –Parameters This score can be computed using estimation from data and also prior information on how the deletions arise. We use a simple approximation; we assume there is a Poisson process that generates breakpoints along the genome and an Exponential process that models the length of the deletions.

Human Cancer Genome Project Loss of Heterozygousity

Human Cancer Genome Project Relative Risk Score For an interval I (set of consecutive probes) we define a multipoint score quantifying the strength of associations between disease and copy number changes in I. where A is the event “I amplified’’ (for oncogenes) and “I deleted’’ (for tumor suppressor genes).

Human Cancer Genome Project Relative Risk Score (cont) The first part can be estimated from data: The second part depends on the marginal probability of amplification (for oncogenes) and deletion (for tumor suppressor genes)

Human Cancer Genome Project Relative Risk Score: Marginal In order to compute the marginal, we rely on the generative model assumed to have produced the data, as follows: –Breakpoints occur as a Poisson process at a certain rate  a,  d – At each of these breakpoints, there is an amplification/ deletion with length distributed as an Exponential random variable with parameter a, d

Human Cancer Genome Project Relative Risk Score: Marginal Assuming the generative process above, we can compute the second part. It depends on the parameters of the Poisson and Exponential random variables.These parameters are estimated from data.

Human Cancer Genome Project Prior Score OncogeneTSG

Human Cancer Genome Project Finding the Cancer Genes So far we have shown how to compute the score for a certain genomic intervals Intervals with high scores are interesting –Given a larger genomic region, for example a chromosome arm, we compute the scores for all possible intervals up to a certain length –The maximum scoring interval in a region is the most likely location for a cancer gene We propose two methods to estimate the location of possible cancer genes in this region: The Max method & The LR (left-right) method

Human Cancer Genome Project High Scoring Intervals High scoring intervals are obvious candidates for cancer genes. –We assign significance based on the estimated number of breakpoints in a genomic region with high score. –We obtain an approximate p-value using results from scan statistics.

Human Cancer Genome Project Significance Testing We now know how to estimate the most likely location of a cancer gene in a genomic region of interest. Let us say the interval is I max Is this finding statistically significant? We rely on an empirical way to compute an approximate p-value

Human Cancer Genome Project Significance Testing (for TSG) The p-value is estimated from the observed distribution of breakpoints along the chromosome –Intuitively, in the null hypothesis, which assumes that no tumor suppressor gene resides on the chromosome, the breakpoints are expected to be uniformly distributed –However if indeed I tsg is a tumor suppressor gene, then its neighborhood should contain an unusually large number of breakpoints, signifying a region with many deletions

Human Cancer Genome Project Scan Statistics If N is the total number of breakpoints on the chromosome and k is the number of breakpoints in I tsg, then we can compute the probability of observing k out of N breakpoints in a window of length |I tsg |(=w) if these breakpoints are uniformly distributed ¸ p-value

Human Cancer Genome Project Scan Statistics

Human Cancer Genome Project Simulation Model Pre-cancerous Cell with a Causative Copy-Number Aberration Random Copy-Number Variations 70% Tumor Cells 30% Normal Cells

Human Cancer Genome Project Results – Simulated Data We simulated data on diseased people assuming different scenarios. We vary the relative proportions of types of patients in a sample; some patients are diseased because of homozygous deletions of the tumor suppressor gene (a), other because of hemizygous deletions (b) and the rest are diseased because of other causes (c). We measure the performance using the Jaccard measure of overlap between the estimated TSG and the true position:

Human Cancer Genome Project Results

Results

Lung Cancer Dataset Dataset of Zhao et al –70 lung tumors –DNA copy number changes across 115,000 SNPs First, we infer the copy-number values at these probes and decide which of them are deleted or amplified…

Human Cancer Genome Project Results Most of the regions detected have been previously reported as implicated in lung cancer (e.g. 5q21, 14q11). Most significantly, some of the intervals found overlap some good candidate genes, that may play a role in lung cancer (e.g. MAGI3, HDAC11, PLCB1). Also, the regions 3q25 and 9p23 have been found for the first time to be homozygously deleted by Zhao et al. (2005).

Human Cancer Genome Project ChromosomeComments 1p13.2MAGI3 3p25.1HDAC11 3q25.1Homozygous Del 4q34.1Del Lung Cancer 5q14.1 5q21.3Del Lung Cancer 16q24CDH13 17q21BRCA1, HDAC5 19p13.3LKB1 20p12PLCB1 21q21.2Del Lung Cancer TSG

Human Cancer Genome Project ChromosomeComments 3q28Over-expression in LC 5p15.3LOC (similar to MUC4) 6p22.3 8q24PVT1/MYC 11p15OR51A2 12p11Amplification in Zhao et al. (2005) 20q11.23Amplification in Zhao et al. (2004) Onco- gene

Human Cancer Genome Project A Tree of Patients

Human Cancer Genome Project Copy Number Data

Human Cancer Genome Project All 70 Patients

Human Cancer Genome Project Extensions Combining with Gene Expression data –Uses a pathway model (Inferred with a Truncated Stein Shrinkage) –Uses Scan-statistics on a graph to determine genes associated with CNVs, cascades in pathways, & “Others” (mutations, translocations or epigenomics) Generalizes to other datasets (e.g. proteomics etc.)

Human Cancer Genome Project Software Architecture

Human Cancer Genome Project Current Implementation BuddhaCGH Agnostic to Technology: –Works well for BAC array, ROMA, Agilent, Affymetrix –Raw Affymetrix data is noisier, but our new algorithm for “background correction and summarization (BCS)” makes Affymetrix-data significantly better. Scalable: Fast implementation, with visualization and integration (Publicly Available) Generalized Global Analysis (LOH analysis, detecting TSG and oncogenes)

Human Cancer Genome Project

Demo BuddhaCGH… AdrenoCorticalCancer

Human Cancer Genome Project Discussions Q&A…

Human Cancer Genome Project Answer to Cancer “If I know the answer I'll tell you the answer, and if I don't, I'll just respond, cleverly.” –US Secretary of Defense, Mr. Donald Rumsfeld.

Human Cancer Genome Project To be continued… Break…