Population Approaches to Detecting and Genotyping Copy Number Variation Lachlan Coin July 2010.



Outline: population-haplotype approach to CNV detection and genotyping; application to SNP and CGH data; application to NGS sequence data.

cnvHap approach to CNV discovery and genotyping. Coin et al., Nature Methods 7, (2010)

Example of trained model

cnvHap models haploid CN transitions. Specify a per-base global transition rate matrix whose entries (q_00, q_10, …) are indexed by the copy number moved from and the copy number moved to. The rate matrix is multiplied by a position-specific scalar rate. Values are trained using EM, following the approach of Klosterman et al., used in Xrate for finding substitution rates.
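A minimal sketch of how a rate matrix of this kind could be turned into transition probabilities between two positions. It assumes the standard continuous-time Markov chain relation P = exp(Q · r · d), which the slide does not state. The names `transition_probs`, `position_rate`, and `distance_bp`, and the example rate values, are hypothetical and are not taken from cnvHap.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_exp(m, squarings=10, terms=12):
    """Matrix exponential via scaling and squaring with a truncated Taylor series."""
    n = len(m)
    scale = 2.0 ** squarings
    small = [[x / scale for x in row] for row in m]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        # term becomes small^k / k!
        term = [[x / k for x in row] for row in mat_mul(term, small)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result


def transition_probs(rate_matrix, position_rate, distance_bp):
    """Assumed relation: P = exp(Q * r * d), with Q per-base and r position-specific."""
    scaled = [[q * position_rate * distance_bp for q in row] for row in rate_matrix]
    return mat_exp(scaled)


# Hypothetical per-base rates between haploid copy numbers 0, 1 and 2.
# Each row sums to zero, as required for a rate matrix.
Q = [
    [-1e-6, 1e-6, 0.0],
    [5e-7, -1e-6, 5e-7],
    [0.0, 1e-6, -1e-6],
]
P = transition_probs(Q, position_rate=1.0, distance_bp=10_000)
```

Each row of `P` is a probability distribution over the next haploid copy number. A larger position-specific rate or a longer distance moves probability mass away from the diagonal.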

cnvHap joint model of CNV + SNP haplotypes

Cluster positions are modelled using a linear model. The model is fitted using ridge regression, carried out at each iteration of the EM algorithm.
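A minimal sketch of one such ridge fit, as it might appear in an M-step. It assumes a single predictor (copy number), a single response (probe intensity), and weights equal to E-step posterior probabilities; none of these choices is given on the slide. The function `weighted_ridge_line` and its penalty on the slope only are illustrative assumptions.

```python
def weighted_ridge_line(x, y, weights, ridge_penalty):
    """Fit y ~ b0 + b1 * x by weighted least squares with a ridge penalty on b1.

    Minimises sum_i w_i * (y_i - b0 - b1 * x_i)^2 + ridge_penalty * b1^2.
    The intercept is left unpenalised (an assumption of this sketch).
    """
    sw = sum(weights)
    swx = sum(w * a for w, a in zip(weights, x))
    swy = sum(w * b for w, b in zip(weights, y))
    swxx = sum(w * a * a for w, a in zip(weights, x))
    swxy = sum(w * a * b for w, a, b in zip(weights, x, y))
    # Solve the 2x2 normal equations in closed form.
    det = sw * (swxx + ridge_penalty) - swx * swx
    intercept = (swy * (swxx + ridge_penalty) - swx * swxy) / det
    slope = (sw * swxy - swx * swy) / det
    return intercept, slope


# Hypothetical data: intensity rises linearly with copy number.
copy_number = [0, 1, 2, 3]
intensity = [1.0, 3.0, 5.0, 7.0]
posterior = [1.0, 1.0, 1.0, 1.0]  # stand-in for E-step responsibilities

unpenalised = weighted_ridge_line(copy_number, intensity, posterior, 0.0)
shrunk = weighted_ridge_line(copy_number, intensity, posterior, 5.0)
```

With a zero penalty the fit is ordinary weighted least squares. A positive penalty shrinks the slope towards zero, which keeps cluster positions stable when few samples carry a given copy number.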

Using Illumina SNP arrays

Illumina and Agilent arrays. Panels: Illumina; Agilent; combined.

Some CNVs exhibit shared structure

Improved CNV genotyping accuracy. Figure: cumulative frequency of squared Pearson correlation.

A deletion at 16p11.2 in a patient with ‘extreme obesity’: estimated by aCGH to be 546 kb-700 kb; flanked by segmental duplication (>99% sequence identity); probably arises by NAHR, implying the deletion is 739 kb. Patient: BMI = 29.2 kg/m² at age 7½; learning difficulties, delayed speech. Figure: log2 ratio along chromosome 16 (28.9 Mb to 30.7 Mb, band p11.2), with MLPA probes and segmental duplications marked. RG Walters et al. Nature 463, (2010) doi: /nature08727

16p11.2 deletions in obesity and population cohorts
Cohort | Obese | Lean/normal weight
British extreme early-onset obesity (SCOOP) | 3/931 | -
French child obesity case:control | 4/643 | 0/530
French adult obesity case:control | 4/705 | 0/669
Population cohorts (NFBC1966, CoLaus, EGPUT) | 3/1592 | 1/6235
Swedish discordant siblings | 2/159 | 0/140
French bariatric surgery patients | 2/141 | -
Obesity: P = 5.8 × 10^-7, OR = 29.8 [3.9–225]
Morbid obesity: P = 6.4 × 10^-8, OR = 43.0 [5.6–329]

Coverage affected by GC content

Regression model fit to correct for GC bias
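A minimal sketch of a regression-based GC correction. The slide does not say which regression model was used, so this assumes a straight-line fit of window depth on GC fraction followed by rescaling each window by the ratio of mean depth to predicted depth. The function names and data are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y ~ intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def correct_gc_bias(gc_fraction, depth):
    """Rescale each window's depth by mean depth / depth predicted from its GC."""
    intercept, slope = fit_line(gc_fraction, depth)
    mean_depth = sum(depth) / len(depth)
    return [d * mean_depth / (intercept + slope * g)
            for g, d in zip(gc_fraction, depth)]


# Hypothetical windows whose depth depends only on GC content.
gc = [0.3, 0.4, 0.5, 0.6]
raw_depth = [20.0, 30.0, 40.0, 50.0]
corrected = correct_gc_bias(gc, raw_depth)

# Two window pairs sharing the same GC: the 2:1 depth ratio survives correction.
paired = correct_gc_bias([0.3, 0.3, 0.5, 0.5], [20.0, 10.0, 40.0, 20.0])
```

Dividing by the predicted depth rather than subtracting it keeps depth ratios between windows of equal GC content unchanged, so a genuine halving of depth is still visible after correction.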

Loess curves fit to remove residual spatial variation of coverage
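A minimal sketch of loess detrending along the chromosome, assuming local linear fits with tricube weights over a nearest-neighbour window. The span `frac`, the function names, and the choice to add the mean back after subtracting the trend are assumptions of this sketch rather than details from the slides.

```python
def loess(x, y, frac=0.5):
    """Local linear regression with tricube weights, evaluated at each x."""
    n = len(x)
    k = max(3, int(round(frac * n)))  # number of neighbours in each local fit
    fitted = []
    for x0 in x:
        dists = sorted(abs(xi - x0) for xi in x)
        bandwidth = dists[min(k, n) - 1] or 1e-12
        w = [(1 - min(abs(xi - x0) / bandwidth, 1.0) ** 3) ** 3 for xi in x]
        sw = sum(w)
        swx = sum(wi * xi for wi, xi in zip(w, x))
        swy = sum(wi * yi for wi, yi in zip(w, y))
        swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
        swxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
        det = sw * swxx - swx * swx
        if abs(det) < 1e-12:
            fitted.append(swy / sw)  # degenerate window: fall back to weighted mean
        else:
            slope = (sw * swxy - swx * swy) / det
            intercept = (swy - slope * swx) / sw
            fitted.append(intercept + slope * x0)
    return fitted


def remove_spatial_trend(position, depth, frac=0.5):
    """Subtract the loess trend and add back the mean depth."""
    trend = loess(position, depth, frac)
    mean_depth = sum(depth) / len(depth)
    return [d - t + mean_depth for d, t in zip(depth, trend)]


# Hypothetical windows with a smooth drift in coverage along the chromosome.
pos = [float(i) for i in range(10)]
drifting_depth = [2.0 * p + 1.0 for p in pos]
flattened = remove_spatial_trend(pos, drifting_depth)
```

The span controls the trade-off: a wide span removes only slow drift, while a narrow span risks flattening short CNVs along with the noise.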

Detecting CNVs with NGS data. Panels: depth/haploid coverage; B-allele frequency.
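A minimal sketch of the two signals named on this slide, assuming that depth divided by haploid coverage is read as a copy number estimate and that B-allele frequency is the fraction of reads carrying the B allele at a SNP. The function names and read counts are hypothetical.

```python
def copy_number_estimate(depth, haploid_coverage):
    """Depth divided by the expected depth of a single copy."""
    return depth / haploid_coverage


def b_allele_frequency(a_reads, b_reads):
    """Fraction of reads supporting the B allele, or None when there are no reads."""
    total = a_reads + b_reads
    if total == 0:
        return None
    return b_reads / total


def expected_bafs(copy_number):
    """B-allele frequencies expected for each possible count of B copies."""
    if copy_number <= 0:
        return []
    return [b / copy_number for b in range(copy_number + 1)]


# Hypothetical window: 45x depth where one copy is expected to give 15x.
cn = copy_number_estimate(45.0, 15.0)
baf = b_allele_frequency(a_reads=10, b_reads=20)
bands = expected_bafs(3)
```

The two signals complement each other. Here the depth ratio suggests three copies, and an observed B-allele frequency near 2/3 matches one of the bands expected for three copies.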

NGS versus CGH data. Panels: NGS data, chrom1:350mb-351mb; CGH data, chrom1:350mb-351mb.

NGS vs CGH data

Haplotype structure of deletion

NGS amplification. Panel: depth/coverage.

With consistent break-points in population

Polyploid phasing and imputation. Panels: imputation error rate; switch error rate.
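A minimal sketch of the two error measures, simplified to the diploid case; the slides concern ploidy > 2, where counting switches requires matching several haplotypes at once. It assumes a switch error is a change in phase agreement between consecutive heterozygous sites, and an imputation error is a masked genotype imputed incorrectly. Function names and data are hypothetical.

```python
def switch_error_rate(true_hap, inferred_hap):
    """Fraction of consecutive heterozygous site pairs whose relative phase is wrong.

    Both arguments give one haplotype's alleles at heterozygous sites only;
    the other haplotype is taken to be its complement (diploid assumption).
    """
    if len(true_hap) != len(inferred_hap):
        raise ValueError("haplotypes must cover the same sites")
    agree = [t == i for t, i in zip(true_hap, inferred_hap)]
    if len(agree) < 2:
        return 0.0
    switches = sum(1 for a, b in zip(agree, agree[1:]) if a != b)
    return switches / (len(agree) - 1)


def imputation_error_rate(true_genotypes, imputed_genotypes):
    """Fraction of masked genotypes that were imputed incorrectly."""
    if len(true_genotypes) != len(imputed_genotypes):
        raise ValueError("genotype lists must have the same length")
    if not true_genotypes:
        return 0.0
    wrong = sum(1 for t, i in zip(true_genotypes, imputed_genotypes) if t != i)
    return wrong / len(true_genotypes)


# Hypothetical examples.
no_switch = switch_error_rate("0101", "1010")   # complement: same phasing
one_switch = switch_error_rate("0000", "0011")  # phase flips once in three steps
imputation = imputation_error_rate([0, 1, 2, 3], [0, 1, 2, 2])
```

Comparing against the complement gives zero switches because labelling the two haplotypes the other way round is the same phasing.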

Conclusions: the population-haplotype model enables joint CNV discovery and genotyping using array data; preliminary results indicate this will also help when using NGS data; combining information from multiple platforms improves sensitivity; imputation still works for ploidy > 2, but phasing becomes more difficult.

Acknowledgements: Evangelos Bellos, Shu-Yi Su, Robin Walters, Julian Asher, Alex Blakemore, Adam de Smith, Philippe Froguel, Julia El-Sayed Moustafa, David Balding (UCL), Rob Sladek (McGill)