Population Approaches to Detecting and Genotyping Copy Number Variation. Lachlan Coin, July 2010
Outline: population-haplotype approach to CNV detection and genotyping; application to SNP and CGH data; application to NGS data
cnvHap approach to CNV discovery and genotyping (Coin et al., Nature Methods 7, 2010)
Example of trained model
cnvHap models haploid copy-number (CN) transitions: specify a per-base global transition rate matrix $Q = (q_{ij})$, where $q_{ij}$ is the rate from copy number $i$ to copy number $j$; the rate matrix is multiplied by a position-specific scalar rate; values trained using EM, following the approach of Klosterman et al. used in Xrate for estimating substitution rates
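A minimal sketch of how a per-base rate matrix and a position-specific scalar rate could combine, assuming copy number along the genome is treated as a continuous-time Markov chain so that the transition matrix between two markers $d$ bases apart is $P(d) = \exp(\lambda_s Q d)$. This is an illustration under that assumption, not cnvHap's code; `expm`, `transition_matrix` and the toy rates are hypothetical. The matrix exponential is written with NumPy only (scaling and squaring); `scipy.linalg.expm` would be the usual choice.

```python
import numpy as np


def expm(a: np.ndarray, terms: int = 30) -> np.ndarray:
    """Matrix exponential via scaling and squaring with a truncated Taylor series."""
    # Scale A by 2**-s so its infinity-norm is below 1 and the series converges quickly.
    norm = np.abs(a).sum(axis=1).max()
    s = max(0, int(np.ceil(np.log2(norm))) + 1) if norm > 0 else 0
    b = a / (2 ** s)
    result = np.eye(a.shape[0])
    term = np.eye(a.shape[0])
    for k in range(1, terms):
        term = term @ b / k          # accumulates B^k / k!
        result = result + term
    # Undo the scaling: exp(A) = exp(A / 2**s) ** (2**s).
    for _ in range(s):
        result = result @ result
    return result


def transition_matrix(q: np.ndarray, position_rate: float, distance_bp: float) -> np.ndarray:
    """Hypothetical P(d) = exp(position_rate * Q * d) for markers distance_bp apart."""
    return expm(position_rate * distance_bp * q)


# Hypothetical per-base rates between haploid copy numbers 0, 1 and 2.
# Off-diagonal q_ij >= 0 is the rate from CN i to CN j; each row sums to zero.
Q = np.array([
    [-2e-6, 2e-6, 0.0],
    [1e-6, -2e-6, 1e-6],
    [0.0, 2e-6, -2e-6],
])

P = transition_matrix(Q, position_rate=1.5, distance_bp=50_000)
print(P.round(4))
```

Each row of `P` is a probability distribution over the next copy number; a larger position-specific rate (a breakpoint hotspot, say) makes the matrix drift further from the identity over the same distance.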
cnvHap joint model of CNV + SNP haplotypes
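To illustrate why CNV and SNP information are modelled jointly, the sketch below enumerates the genotype states at a single SNP when each haplotype carries its own copy number and B-allele count. It assumes haploid copy number up to a chosen maximum and an expected B-allele frequency (BAF) of B count / total copy number; the function names are hypothetical and this is not cnvHap's actual parameterisation (which also involves haplotype clusters).

```python
from fractions import Fraction
from itertools import product


def haplotype_states(max_cn):
    """(copy number, number of B alleles) pairs one haplotype can carry at a SNP."""
    return [(cn, b) for cn in range(max_cn + 1) for b in range(cn + 1)]


def genotype_states(max_cn):
    """Combine two haplotypes into diploid (total CN, B count, expected BAF) states."""
    states = set()
    for (cn1, b1), (cn2, b2) in product(haplotype_states(max_cn), repeat=2):
        cn, b = cn1 + cn2, b1 + b2
        baf = Fraction(b, cn) if cn > 0 else None  # BAF is undefined with zero copies
        states.add((cn, b, baf))
    return sorted(states, key=lambda s: (s[0], s[1]))


for cn, b, baf in genotype_states(2):
    print(cn, b, baf)
```

A duplication (total CN 3) produces BAF levels of 1/3 and 2/3 that no diploid state can explain, which is the kind of signal a joint CN + SNP model can exploit.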
Cluster positions modelled using a linear model; model fitted by ridge regression at each iteration of the EM algorithm
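A hedged sketch of what one such M-step fit could look like. It assumes each row pairs an observed intensity with a copy-number state, the E-step supplies a posterior weight per row, and the cluster position is linear in the state features $[1, \text{CN}]$. The weighted ridge solution is $\beta = (X^\top W X + \alpha I')^{-1} X^\top W y$ with the intercept left unpenalised. The feature design, `alpha`, the toy data and the name `weighted_ridge` are all assumptions.

```python
import numpy as np


def weighted_ridge(X, y, w, alpha):
    """Minimise sum_i w_i (y_i - X_i @ beta)^2 + alpha * ||beta[1:]||^2 (intercept unpenalised)."""
    penalty = alpha * np.eye(X.shape[1])
    penalty[0, 0] = 0.0
    xtw = X.T * w                      # X^T W without building the diagonal matrix W
    return np.linalg.solve(xtw @ X + penalty, xtw @ y)


rng = np.random.default_rng(0)
true_beta = np.array([-1.0, 0.5])                     # hypothetical intercept and per-copy shift
cn = rng.integers(1, 5, size=200).astype(float)       # copy-number state attached to each row
X = np.column_stack([np.ones_like(cn), cn])           # design matrix: [1, copy number]
y = X @ true_beta + rng.normal(0, 0.05, size=cn.size) # observed intensities
w = rng.uniform(0.2, 1.0, size=cn.size)               # stand-in for E-step posterior weights

beta = weighted_ridge(X, y, w, alpha=1.0)
cluster_positions = np.column_stack([np.ones(4), np.arange(1, 5)]) @ beta  # CN 1..4
print(beta, cluster_positions)
```

One motivation for a linear model: a copy-number state with few observations still gets a sensible cluster position interpolated from the others, and the ridge penalty keeps the fit stable when posteriors are diffuse.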
Using Illumina SNP arrays
Panels: Illumina; Agilent; combined Illumina and Agilent arrays
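One simple way two platforms can be combined, assuming their measurements are conditionally independent given the hidden copy-number state: multiply the emission likelihoods, i.e. add log-likelihoods, before computing posteriors. The Gaussian emission models, their means and SDs, and the probe values below are hypothetical, chosen only to show the effect.

```python
import math


def gaussian_loglik(x, mean, sd):
    """Log density of N(mean, sd^2) at x."""
    return -0.5 * math.log(2 * math.pi * sd ** 2) - (x - mean) ** 2 / (2 * sd ** 2)


def posterior(logliks, log_prior):
    """Normalise log-likelihood + log-prior scores into posterior probabilities."""
    scores = [ll + lp for ll, lp in zip(logliks, log_prior)]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]  # subtract the max for numerical stability
    total = sum(weights)
    return [wt / total for wt in weights]


# Hypothetical emission models for diploid copy numbers 1, 2 and 3.
ILLUMINA = {"means": [-0.45, 0.0, 0.3], "sd": 0.25}  # log R ratio
AGILENT = {"means": [-1.0, 0.0, 0.58], "sd": 0.4}    # aCGH log2 ratio
log_prior = [math.log(1 / 3)] * 3


def platform_logliks(x, model):
    return [gaussian_loglik(x, m, model["sd"]) for m in model["means"]]


ill = platform_logliks(-0.3, ILLUMINA)  # one Illumina probe in the region
agi = platform_logliks(-0.7, AGILENT)   # one Agilent probe in the same region
illumina_only = posterior(ill, log_prior)
agilent_only = posterior(agi, log_prior)
combined = posterior([a + b for a, b in zip(ill, agi)], log_prior)  # log-likelihoods add
print(illumina_only, agilent_only, combined)
```

When both platforms lean towards the same state, the combined posterior is more confident than either alone, which is one mechanism by which combining platforms can raise sensitivity.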
Some CNVs exhibit shared structure
Improved CNV genotyping accuracy (figure: cumulative frequency of squared Pearson correlation)
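The accuracy metric can be sketched as follows, assuming that for each CNV the called copy numbers are compared across samples with reference calls from an independent assay, and that the curve shows the fraction of CNVs with $r^2 \le t$ for each threshold $t$. What the reference is, and the toy values, are assumptions.

```python
import numpy as np


def squared_pearson(calls, reference):
    """r^2 between called and reference copy numbers for one CNV (NaN if either is constant)."""
    r = np.corrcoef(calls, reference)[0, 1]
    return r * r


def cumulative_frequency(values, thresholds):
    """Fraction of CNVs whose r^2 is at or below each threshold."""
    values = np.asarray(values)
    return np.array([(values <= t).mean() for t in thresholds])


# Hypothetical (calls, reference) pairs over five samples for three CNVs.
calls_by_cnv = {
    "cnv_a": ([2, 1, 2, 0, 1], [2, 1, 2, 0, 1]),
    "cnv_b": ([2, 2, 1, 1, 2], [2, 1, 1, 1, 2]),
    "cnv_c": ([1, 2, 2, 2, 1], [2, 2, 1, 2, 1]),
}
r2 = [squared_pearson(c, ref) for c, ref in calls_by_cnv.values()]
curve = cumulative_frequency(r2, np.linspace(0, 1, 11))
print(r2, curve)
```

Under this reading, a method whose curve rises later (stays low until high $r^2$) genotypes more CNVs accurately.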
A deletion at 16p11.2 in a patient with ‘extreme obesity’: estimated by aCGH to be 546–700 kb, flanked by segmental duplications (>99% sequence identity); it probably arises by NAHR, implying the deletion is 739 kb. BMI = 29.2 kg·m⁻² at age 7½; learning difficulties, delayed speech. (Figure: log2 ratio and MLPA probes across chromosome 16, 28.9–30.7 Mb, with segmental duplications marked.) RG Walters et al., Nature 463 (2010), doi: /nature08727
16p11.2 deletions in obesity and population cohorts (deletion carriers/total)
Cohort | Lean/normal weight | Obese
British extreme early-onset obesity (SCOOP) | - | 3/931
French child obesity case:control | 0/530 | 4/643
French adult obesity case:control | 0/669 | 4/705
Population cohorts (NFBC1966, CoLaus, EGPUT) | 1/6235 | 3/1592
Swedish discordant siblings | 0/140 | 2/159
French bariatric surgery patients | - | 2/141
Obesity: P = 5.8×10⁻⁷, OR = 29.8 [3.9–225]
Morbid obesity: P = 6.4×10⁻⁸, OR = 43.0 [5.6–329]
Coverage affected by GC content
Regression model fitted to correct for GC bias
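A minimal sketch of GC correction, assuming read depth is counted in fixed windows and modelled as a polynomial in each window's GC fraction; the slide does not specify the regression form, so the cubic degree, `gc_correct`, and the simulated data are assumptions. Depth is divided by the fitted value and rescaled by the median so the output stays in coverage units.

```python
import numpy as np


def gc_correct(depth, gc, degree=3):
    """Divide out a polynomial fit of depth on GC fraction, keeping median-depth units."""
    depth = np.asarray(depth, dtype=float)
    coeffs = np.polyfit(gc, depth, degree)
    expected = np.clip(np.polyval(coeffs, gc), 1e-9, None)  # guard against non-positive fits
    return depth / expected * np.median(depth)


rng = np.random.default_rng(1)
gc = rng.uniform(0.3, 0.6, size=500)            # GC fraction per window
bias = 1 - 4 * (gc - 0.45) ** 2                 # hypothetical GC effect on coverage
depth = rng.poisson(30 * bias).astype(float)    # read count per window
corrected = gc_correct(depth, gc)
print(np.corrcoef(gc, depth)[0, 1], np.corrcoef(gc, corrected)[0, 1])
```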
Loess curves fitted to remove residual spatial variation in coverage
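A basic LOESS can be sketched as local linear regression with tricube weights over the nearest fraction of windows (no robustness iterations). Fitting it to depth against genomic position and dividing it out leaves a normalised signal near 1. The span `frac`, the function name and the simulated coverage are assumptions, not the settings used here.

```python
import numpy as np


def loess(x, y, frac=0.3):
    """Local linear regression with tricube weights (a basic LOESS)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    k = min(n, max(3, int(np.ceil(frac * n))))   # neighbours used for each local fit
    fitted = np.empty(n)
    for i in range(n):
        dist = np.abs(x - x[i])
        h = np.sort(dist)[k - 1]                 # bandwidth: distance to k-th nearest point
        h = h if h > 0 else 1.0
        w = (1 - np.clip(dist / h, 0, 1) ** 3) ** 3
        sw = np.sqrt(w)
        design = np.column_stack([np.ones(n), x - x[i]])
        beta, *_ = np.linalg.lstsq(design * sw[:, None], y * sw, rcond=None)
        fitted[i] = beta[0]                      # local intercept = fitted value at x[i]
    return fitted


positions = np.arange(0, 2_000_000, 20_000, dtype=float)  # 100 windows
rng = np.random.default_rng(2)
depth = 30 + 4 * np.sin(positions / 3e5) + rng.normal(0, 1, positions.size)
trend = loess(positions, depth, frac=0.3)
normalised = depth / trend                                 # about 1 where coverage follows the trend
```

The span has to be wide relative to the CNVs of interest; a narrow span would partly absorb a real deletion or duplication into the "spatial trend".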
Detecting CNVs with NGS data (panels: depth/haploid coverage; B-allele frequency)
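The two NGS signals on this slide can be computed as below, assuming haploid coverage is estimated as half the median window depth (most windows diploid) and the B allele is whichever allele is counted in `b_reads`. The function names and toy counts are hypothetical.

```python
import numpy as np


def copy_number_ratio(depth, haploid_coverage):
    """Depth in units of single-copy coverage: about 2 in a normal diploid region."""
    return np.asarray(depth, dtype=float) / haploid_coverage


def b_allele_frequency(a_reads, b_reads):
    """Fraction of reads supporting the B allele at each SNP (NaN where there is no coverage)."""
    a = np.asarray(a_reads, dtype=float)
    b = np.asarray(b_reads, dtype=float)
    total = a + b
    return np.divide(b, total, out=np.full_like(total, np.nan), where=total > 0)


window_depth = np.array([30, 31, 15, 16, 45, 44])
haploid = np.median(window_depth) / 2          # assumes most windows are diploid
print(copy_number_ratio(window_depth, haploid))
print(b_allele_frequency([15, 16, 14, 0, 30], [15, 14, 0, 0, 15]))
```

A ratio near 1 with BAF only at 0 or 1 is consistent with a hemizygous deletion; a ratio near 3 with BAF near 1/3 or 2/3 is consistent with a duplication.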
NGS versus CGH data (panels: NGS data, chrom1:350mb-351mb; CGH data, chrom1:350mb-351mb)
NGS vs CGH data
Haplotype structure of deletion
NGS amplification (figure: depth/coverage)
With consistent breakpoints across the population
Polyploid phasing and imputation (panels: imputation error rate; switch error rate)
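For reference, the two error metrics can be sketched for the diploid case: the switch error rate is the fraction of consecutive heterozygous-site pairs whose relative phase is inferred wrongly, and the imputation error rate is the fraction of masked genotypes imputed incorrectly. The generalisation to ploidy > 2 used on this slide is not shown and is more involved; function names and toy data are hypothetical.

```python
import numpy as np


def switch_error_rate(true_hap, inferred_hap, genotypes):
    """Diploid switch error over heterozygous sites.

    true_hap / inferred_hap: allele (0/1) on haplotype 1 at each site;
    genotypes: B-allele counts 0/1/2.
    """
    het = np.asarray(genotypes) == 1
    same = np.asarray(true_hap)[het] == np.asarray(inferred_hap)[het]  # phase orientation per het site
    if same.size < 2:
        return 0.0
    switches = np.count_nonzero(same[1:] != same[:-1])  # orientation flips between neighbours
    return switches / (same.size - 1)


def imputation_error_rate(true_genotypes, imputed_genotypes, masked):
    """Fraction of masked genotypes that were imputed incorrectly."""
    masked = np.asarray(masked, dtype=bool)
    truth = np.asarray(true_genotypes)[masked]
    imputed = np.asarray(imputed_genotypes)[masked]
    return float(np.mean(truth != imputed))


print(switch_error_rate([0, 1, 0, 0, 1], [0, 1, 0, 1, 0], [1, 1, 0, 1, 1]))
print(imputation_error_rate([0, 1, 2, 1], [0, 2, 2, 1], [True, True, False, True]))
```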
Conclusions: the population-haplotype model enables joint CNV discovery and genotyping using array data; preliminary results indicate it will also help with NGS data; combining information from multiple platforms improves sensitivity; imputation still works for ploidy > 2, but phasing becomes more difficult
Acknowledgements: Evangelos Bellos, Shu-Yi Su, Robin Walters, Julian Asher, Alex Blakemore, Adam de Smith, Philippe Froguel, Julia El-Sayed Moustafa, David Balding (UCL), Rob Sladek (McGill)