1 Modelling of CGH arrays experiments Philippe Broët Faculté de Médecine, Université de Paris-XI Sylvia Richardson Imperial College London CGH = Competitive.

Presentation transcript:

1 Modelling of CGH arrays experiments Philippe Broët Faculté de Médecine, Université de Paris-XI Sylvia Richardson Imperial College London CGH = Competitive Genomic Hybridization

2 Outline
- Background
- Mixture model with spatial allocations
- Performance, comparison with CGH-Miner
- Analyses of CGH-array cancer data sets
- Extensions

3 Aim: study genomic alterations in oncology
- The development of solid tumors is associated with the acquisition of complex genetic alterations that modify normal cell growth and survival.
- Many of these changes involve gains and/or losses of parts of the genome: amplification of an oncogene and deletion of a tumor suppressor gene are considered important mechanisms of tumorigenesis.
- [Figure labels: Loss (tumor suppressor gene); Gain (oncogene)]

4 CGH = Competitive Genomic Hybridization
- Steps, applied to case and control samples: 1. DNA extraction; 2. Labelling (fluo); 3. Co-hybridization; 4. Scanning
- Array containing short sequences of DNA bound to a glass slide
- Fluorescein-labeled normal and pathologic samples are co-hybridised to the array

5 Once hybridization has been performed, the signal intensities of the fluorophores are quantified. This provides a means to quantitatively measure DNA copy-number alterations and to map them directly onto the genomic sequence.

6 MCF7 cell line investigated in Pollack et al. (2002)
- 23 chromosomes and 6691 cDNA sequences
- Data log transformed: difference between MCF7 and reference

7 Types of alterations observed
- (Single) gain or deletion of sequences, occurring for contiguous regions: low-level changes in the ratio, ± log 2 in principle, but attenuated (dye bias) to a ratio of about ± 0.4
- Multiple gains (small regions): high-level change, easy to pick up
- Focus the modelling on the first, common type of alterations

8 [Figure: Chromosome 1, with regions annotated "Deletion?", "Multiple gains?", "Normal?"]

Mixture model

10 Specificity of CGH array experiment
- A priori biological knowledge from conventional CGH: limited number of states for a genomic sequence (GS), corresponding to different intensity ratios on the array:
  - presence (modal)
  - deletion
  - gain(s)
- Mixture model to capture the underlying discrete states
- GS located contiguously on chromosomes are likely to carry alterations of the same type: use clone spatial location in the allocation model
- 3-component mixture model with spatial allocation

11 Mixture model
For chromosome k:
- $Z_{gk}$: log ratio of measurement of normal versus tumoral change, genomic sequence (GS) g, chromosome k
- Dye bias is estimated by using a reference array (normal/normal), and the bias is then subtracted from $Z_{gk}$
- $Z_{gk} \sim w^1_{gk} N(\mu_1, \sigma_1^2) + w^2_{gk} N(\mu_2, \sigma_2^2) + w^3_{gk} N(\mu_3, \sigma_3^2)$
- Components: 1 = deletion, 2 = presence, 3 = gain
- For unique labelling: $\mu_1 < 0$, $\mu_2 = 0$ (dye bias has been adjusted)
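As a minimal sketch of the three-component density on this slide, the following Python evaluates it at a single log ratio. The function names and the numerical values in the usage note are illustrative assumptions, not taken from the slides.

```python
import math


def normal_pdf(z, mean, sd):
    """Density of N(mean, sd^2) evaluated at z."""
    return math.exp(-0.5 * ((z - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def mixture_density(z, weights, means, sds):
    """Density of a finite normal mixture at z.

    weights, means and sds are ordered as (deletion, presence, gain),
    i.e. components 1, 2, 3 of the slide.
    """
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("mixture weights must sum to 1")
    return sum(w * normal_pdf(z, m, s) for w, m, s in zip(weights, means, sds))
```

For example, `mixture_density(-0.3, (0.2, 0.7, 0.1), (-0.4, 0.0, 0.4), (0.3, 0.3, 0.3))` gives the density of a log ratio of -0.3 for a GS whose weights favour the modal state; all of these numbers are hypothetical.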

12 Mixture model with spatial allocation
- $Z_{gk} \sim w^1_{gk} N(\mu_1, \sigma_1^2) + w^2_{gk} N(\mu_2, \sigma_2^2) + w^3_{gk} N(\mu_3, \sigma_3^2)$
- Define mixture proportions to depend on the chromosomic location via a logistic model: $w^c_{gk} = \exp(u^c_{gk}) / \sum_m \exp(u^m_{gk})$
- Spatial structure on the weights (c.f. Fernandez and Green, 2002): introduce 3 centred Markov random fields $\{u^m_{gk}\}$, m = 1, 2, 3, with nearest neighbours along the chromosomes; this favours allocation of nearby GS to the same component
- [Figure: spatial neighbours of GS g are g-1 and g+1]
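The logistic transformation of the three field values into mixture weights can be sketched as follows. This is only an illustration: the function name is hypothetical, and subtracting the maximum before exponentiating is a standard numerical safeguard that is not mentioned in the slides.

```python
import math


def allocation_weights(u):
    """Map the field values (u^1, u^2, u^3) of one GS to mixture weights.

    Implements w^c = exp(u^c) / sum_m exp(u^m). The maximum is subtracted
    first, which leaves the weights unchanged but avoids overflow.
    """
    u_max = max(u)
    exps = [math.exp(v - u_max) for v in u]
    total = sum(exps)
    return [e / total for e in exps]
```

Equal field values give equal weights of 1/3, and raising one field value relative to the others shifts weight towards that component.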

13 Prior structure
- $w^c_{gk} = \exp(u^c_{gk}) / \sum_m \exp(u^m_{gk})$ with Gaussian Conditional AutoRegressive (CAR) model: $u^c_{gk} \mid u^c_{-gk} \sim N\left( \sum_h u^c_{hk} / n_g,\ \sigma_{ck}^2 / n_g \right)$ for h = neighbour of g ($n_g$ = #h, one or two in this simple case), with constraint $\sum_g u^c_{gk} = 0$
- The variance parameters $\sigma_{ck}^2$ of the CAR act as a smoothing prior; they are indexed by the chromosome, so the switching structure between the states can differ between chromosomes
- Means and variances $(\mu_c, \sigma_c^2)$ of the mixture components are common to all chromosomes: borrowing information
- Inverse gamma priors for the variances, uniform priors for the means
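To make the CAR conditional concrete, here is a small sketch that computes the conditional mean and variance of one field value given its nearest neighbours along a chromosome. The function name and argument layout are assumptions made for illustration; the sum-to-zero constraint is not enforced here.

```python
def car_conditional(u, g, car_variance):
    """Conditional mean and variance of u[g] given the rest of the field.

    u            : field values of one component along one chromosome
    g            : 0-based position of the GS of interest
    car_variance : variance parameter of the CAR prior for this
                   component and chromosome
    Neighbours are g-1 and g+1 when they exist, so n_g is 1 at the two
    ends of the chromosome and 2 elsewhere.
    """
    if len(u) < 2:
        raise ValueError("need at least two GS on the chromosome")
    if not 0 <= g < len(u):
        raise IndexError("g is outside the chromosome")
    neighbours = [h for h in (g - 1, g + 1) if 0 <= h < len(u)]
    n_g = len(neighbours)
    mean = sum(u[h] for h in neighbours) / n_g
    return mean, car_variance / n_g
```

A smaller variance parameter pulls each value more strongly towards the average of its neighbours, which is the smoothing role described on the slide.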

14 Posterior quantities of interest
- Bayesian inference via MCMC, implemented using WinBUGS
- In particular, latent allocations $L_{gk}$ of GS g on chromosome k to state c are sampled during the MCMC run
- Compute posterior allocation probabilities: $p^c_{gk} = P(L_{gk} = c \mid \text{data})$, c = 1, 2, 3
- Probabilistic classification of each GS using a threshold on $p^c_{gk}$:
  -- Assign g to a modified state, deletion (c=1) or gain (c=3), if the corresponding $p^c_{gk} > 0.8$
  -- Otherwise allocate to the modal state
- This yields a subset S of genomic sequences classified as modified (this subset depends on the chosen threshold)
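The classification rule can be sketched in a few lines of Python. The sketch assumes that the sampled allocations for one GS are available as a plain list of labels in {1, 2, 3}; how they are exported from the MCMC run is not specified in the slides, and both function names are hypothetical.

```python
from collections import Counter


def allocation_probabilities(sampled_labels):
    """Estimate (p^1, p^2, p^3) for one GS as the relative frequencies
    of the sampled allocations (labels 1, 2, 3)."""
    counts = Counter(sampled_labels)
    n = len(sampled_labels)
    return tuple(counts[c] / n for c in (1, 2, 3))


def classify(probs, threshold=0.8):
    """Return 1 (deletion), 3 (gain) or 2 (modal) for one GS.

    A GS is assigned to a modified state only if the corresponding
    posterior probability exceeds the threshold; otherwise it is
    allocated to the modal state.
    """
    p_deletion, _, p_gain = probs
    if p_deletion > threshold:
        return 1
    if p_gain > threshold:
        return 3
    return 2
```

With a threshold of at least 0.5, at most one of the two modified states can pass the test, so the order of the two checks does not matter.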

15 False Discovery Rate
- Using the posterior allocation probabilities, one can compute an estimate of the FDR for the list S: $\text{Bayes FDR}(S) \mid \text{data} = \frac{1}{\text{card}(S)} \sum_{g \in S} p^2_{gk}$, where $p^2_{gk}$ is the posterior probability of allocation to the modal (c=2) state
- Note: the threshold can be adjusted to get a desired FDR, and vice versa
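A short sketch of the Bayes FDR estimate, assuming the posterior allocation probabilities are held as a list of (p1, p2, p3) tuples, one per GS. The function names are hypothetical, and returning 0 for an empty list S is a convention chosen here, not something stated in the slides.

```python
def bayes_fdr(modal_probs_in_s):
    """Average posterior probability of the modal state over the list S."""
    if not modal_probs_in_s:
        return 0.0  # assumption: an empty list contains no false discoveries
    return sum(modal_probs_in_s) / len(modal_probs_in_s)


def select_and_estimate_fdr(probs, threshold=0.8):
    """Build S with the threshold rule and return (indices in S, Bayes FDR).

    probs is a list of (p1, p2, p3) tuples: posterior probabilities of
    deletion, modal state and gain for each GS.
    """
    selected = [
        i for i, (p1, _, p3) in enumerate(probs)
        if p1 > threshold or p3 > threshold
    ]
    return selected, bayes_fdr([probs[i][1] for i in selected])
```

Calling `select_and_estimate_fdr` over a grid of thresholds is one simple way to look for the threshold that matches a target FDR.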

Performance

17 Simulation set-up
- 200 fake GS with:
  - $Z \sim N(0, 0.3^2)$: modal
  - $Z \sim N(\log 2, 0.3^2)$: deletion, a block of 30 GS
  - $Z \sim N(-\log 2, 0.3^2)$: gains, blocks of 20 and 10 GS
- Reference array with $Z \sim N(0, 0.3^2)$
- 50 replications
- [Figure: simulated profile with regions Modal, Deletion, Modal, Gain, Modal, Gain, Modal]

18 CGH-Miner
- Data mining approach to select gains and losses (Wang et al., 2005):
  - Hierarchical clustering with a spatial constraint (i.e. only spatially adjacent clusters are joined)
  - Subtree selection according to predefined rules, with a focus on selecting large consistent gain/loss regions and small (big spike) regions
  - Implemented in the CGH-Miner Excel plug-in
- Estimation of FDR using a reference (normal/normal) array and the same set of rules to prune the tree. Declared target: 1%
- Simulation set-up is similar to Wang et al.

19 Classification obtained by CGH-Miner and CGHmix
[Figure: simulated profile with regions Modal, Deletion, Modal, Gain, Modal, Gain, Modal]

Posterior probabilities of allocation to the 3 components

21 Comparative performance between CGHmix and CGH-Miner (50 simulations)
[Table: columns CGHmix, CGH-Miner; rows Realised false positive (mean), Realised false positive (range), Realised false negative (mean), Realised false negative (range), Realised FDR (%), Estimated FDR (%)]

Analyses of CGH-array cancer data sets

23 Breast cancer cell line MCF7
- Data from Pollack et al., 6691 GS on 23 chromosomes
- Estimates: $\hat{\mu}_1 = -0.35$, $\hat{\sigma}_1 = 0.37$; ($\mu_2 = 0$), $\hat{\sigma}_2 = 0.27$; $\hat{\mu}_3 = 0.44$, $\hat{\sigma}_3 = 0.54$
- Estimated FDR CGHmix = 2.6%
- Estimated FDR CGH-Miner = 1.5%


25 Classification of GS obtained by CGHmix

26 [Figure legend: known alterations found by both methods; additional known alterations found by CGHmix]

27 Neuroblastoma KCNR cell line
- Curie Institute CGH custom array for chromosome genomic clones, mostly on the short arm
- 3 replicate spots for each
- $\hat{\mu}_1 =$ , loss component
- $\hat{\mu}_3 = 0.04$, not plausible: no gain in this case
- Estimate FDR by regrouping the c=2 and c=3 classes
- Substantial number of deletions on the short arm
- No deletion found for the long arm by CGHmix, a result confirmed by classical cytogenetic information

28 Long arm

29 Extensions
- Account for variability in the case of repeated measurements: add a measurement model with GS-specific noise, with an exchangeable prior
- Refine the spatial model:
  - Incorporate genomic sequence location in the neighbourhood definition of the CAR model: from 0-1 contiguity to spatial weights
  - In particular, account for overlapping sequences by using weights that depend on the overlap