Copy Number Variants: detection and analysis Manuel Ferreira & Shaun Purcell Boulder, 2009.



Large chromosomal rearrangements can cause sporadic disease: Down syndrome, Duchenne muscular dystrophy (DMD), DiGeorge/velocardiofacial syndrome (VCFS)... Lupski 2007 Nat Genet 39: s43

Sebat et al 2004 Science 305: 525 Iafrate et al 2004 Nat Genet 36: 949

Outline
1. What is a Copy Number Variant (CNV)?
2. Genome-wide detection of CNVs
3. Association analysis of CNVs
4. Online databases

1. What is a CNV?

What is a CNV?
1. Classes of structural variants
Structural variants: genomic alterations involving a segment of DNA >1kb
  Quantitative (CNVs): deletions, duplications, insertions
  Positional: translocations
  Orientational: inversions
A Copy Number Polymorphism (CNP) is a CNV that occurs in >1% of the population.

[Figure: a reference sequence with no CNV shown alongside schematic examples of a deletion, duplication, insertion, inversion, translocation and segmental duplication. Size range: 1bp - Mb]

Scherer 2007 Nat Genet 39: s7

What is a CNV?
2. Origins of CNVs
(A) Non-allelic homologous recombination
(B) Non-homologous end joining
(C) Tandem repeat sequences
(D) Retrotransposons
Bailey & Eichler 2006 Nat Rev Genet 7: 552

What is a CNV?
3. CNVs are abundant in the genome

Human vs human    SNPs          CNVs
Base pairs        2.5 Mb        4 Mb
                  1/1,200 bp    1/800
% genome          0.08%         0.12%

Sebat 2007 Nat Genet 39: s3
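The three rows of the SNP vs CNV comparison are different views of the same quantity, which a quick calculation makes visible. The sketch below assumes a genome length of 3.2 billion bp; that figure is not on the slide and is chosen only because it roughly reconciles the rounded values shown. The function names are illustrative.

```python
# Back-of-envelope check of the slide's SNP vs CNV figures.
# ASSUMPTION: genome length of 3.2e9 bp (not stated on the slide).
GENOME_BP = 3.2e9


def percent_of_genome(differing_bp, genome_bp=GENOME_BP):
    """Share of the genome (in percent) covered by differing base pairs."""
    return 100.0 * differing_bp / genome_bp


def bp_per_difference(differing_bp, genome_bp=GENOME_BP):
    """Average spacing: one differing base pair every N bp."""
    return genome_bp / differing_bp


if __name__ == "__main__":
    for label, differing_bp in (("SNPs", 2.5e6), ("CNVs", 4.0e6)):
        print(
            f"{label}: {percent_of_genome(differing_bp):.3f}% of genome, "
            f"about 1 per {bp_per_difference(differing_bp):,.0f} bp"
        )
```

Under this assumption the results land close to, but not exactly on, the rounded slide values, which is expected for summary figures of this kind.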

4. CNVs significantly overlap with known genes Cooper et al 2007 Nat Genet 39: s22 What is a CNV?

5. CNVs influence gene expression Stranger et al 2007 Science 315: %17.7% What is a CNV?

6. In healthy individuals, most CNVs are inherited… Rare CNVs 10% de novo Common CNVs 90% inherited McCarroll 2008 Hum Mol Genet 17: R135 McCarroll et al 2008 Nat Genet 40: 1166 What is a CNV? >99% <1% >1% population

2. Detection of CNVs

Detection of CNVs
A. Using intensity data from whole-genome arrays
SNPs: genotype known common variants
CNVs: (A) genotype known common variants; (B) identify and genotype new, potentially rarer variants

Detection of CNVs
(A) Genotype known common CNVs using whole-genome arrays
Affy 6.0: >940,000 CNV non-polymorphic probes; high density in ~5,600 CNV regions in DGV, plus extended to whole genome
Illumina 1M: 36,000 CNV non-polymorphic probes covering ~4,000 CNV regions in DGV
Nimblegen: array-CGH, CNV only, test vs reference; custom or whole-genome (up to 2.1M probes)

Detection of CNVs
Segment S

Ind    Genotype (Mat/Pat)    Copy number at S (amount of DNA at S)
1      S/S                   2
2      SS/S                  3
3      S/-                   1
4      -/S                   1
5      -/-                   0
6      SSS/S                 4

Detection of CNVs Non-polymorphic probes McCarroll et al 2008 Nat Genet 40: 1166

Detection of CNVs
(B) Identify and genotype new, potentially rarer CNVs from whole-genome array data (CGH, Affymetrix/Illumina)
Example: rs A/G
probe 1: ...AGCCCGAAATGTTTTCAGA...
probe 2: ...AGCCCGAAGTGTTTTCAGA...
[Figure: intensity of probe 1 plotted against intensity of probe 2, with genotype clusters AA, AG and GG]

Detection of CNVs
rs000, A/G

Ind    Genotype (Mat/Pat)    Copy number for: A    G    Total
1      A/G                   1                     1    2
2      A/-                   1                     0    1
3      AA/-                  2                     0    2
4      -/G                   0                     1    1
5      -/-                   0                     0    0
6      AAA/G                 3                     1    4

[The Pattern column was a schematic of the allele copies carried by each individual]
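The genotype table above can be reproduced mechanically: each letter in a Mat/Pat genotype is one copy of that allele and "-" is a deleted copy. The following is a minimal sketch of that bookkeeping; the function names and the "deletion"/"duplication" labels (total copy number below or above 2) are illustrative, not part of any genotyping package.

```python
from collections import Counter


def allele_copy_numbers(genotype, alleles=("A", "G")):
    """Count allele copies in a 'Mat/Pat' genotype string such as 'AA/-'.

    '-' marks a deleted copy and contributes nothing to the counts.
    """
    maternal, paternal = genotype.split("/")
    counts = Counter(ch for ch in maternal + paternal if ch != "-")
    result = {allele: counts.get(allele, 0) for allele in alleles}
    result["total"] = sum(result.values())
    return result


def cnv_state(total_copies):
    """Label a total copy number relative to the usual diploid value of 2."""
    if total_copies < 2:
        return "deletion"
    if total_copies > 2:
        return "duplication"
    return "normal"


if __name__ == "__main__":
    for genotype in ("A/G", "A/-", "-/G", "-/-", "AAA/G"):
        copies = allele_copy_numbers(genotype)
        print(genotype, copies, cnv_state(copies["total"]))
```

Passing `alleles=("S",)` handles the non-polymorphic segment example in the same way, for instance `allele_copy_numbers("SS/S", alleles=("S",))`.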

Detection of CNVs
Polymorphic probe in CNV region (rs000, A/G)
[Figure: normalized intensity of allele A plotted against normalized intensity of allele G. Clusters: A/A, A/G, G/G; individuals with deletion(s), i.e. total CN < 2; individuals with duplication(s), i.e. total CN > 2]

Detection of CNVs
Combine information across probes to identify new CNVs

For example...           Cases       Controls
100kb deletion chr. 2    10/5,000    1/5,000

Birdseye (Affy 5.0, 6.0): Korn et al 2008 Nat Genet 40: 1253
PennCNV (Affymetrix and Illumina): Wang et al 2007 Genome Res 17: 1665
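To illustrate what "combining information across probes" means in the simplest possible terms, the sketch below flags runs of consecutive probes whose intensity is consistently low. This is a deliberately naive run-length rule and not the method used by Birdseye or PennCNV; the function name, the log-ratio threshold of -0.3 and the minimum of 3 probes are all assumptions made for illustration.

```python
def call_low_intensity_segments(log_ratios, threshold=-0.3, min_probes=3):
    """Return (start, end) index pairs, end exclusive, for candidate deletions.

    A segment is a run of at least `min_probes` consecutive probes whose
    log ratio is below `threshold`. Single noisy probes are ignored because
    a run shorter than `min_probes` is never reported.
    """
    segments = []
    run_start = None
    for i, value in enumerate(log_ratios):
        if value < threshold:
            if run_start is None:
                run_start = i  # a new low-intensity run begins here
        else:
            if run_start is not None and i - run_start >= min_probes:
                segments.append((run_start, i))
            run_start = None
    # Close a run that extends to the last probe.
    if run_start is not None and len(log_ratios) - run_start >= min_probes:
        segments.append((run_start, len(log_ratios)))
    return segments


if __name__ == "__main__":
    probes = [0.0, 0.1, -0.5, -0.6, -0.4, 0.0, -0.5, 0.1]
    print(call_low_intensity_segments(probes))
```

The isolated low probe at index 6 is not reported, which is the point of requiring agreement across neighbouring probes.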

Detecting CNVs through GWAS arrays is challenging…
Lots of confounders: DNA quality, concentration, source, batch effects, date effects.
Genotype calling software is often platform-specific and not very user-friendly.
Arrays have poor resolution for CNVs (>100kb).
Genotype calling is computationally demanding, as it requires analysis of very large ‘raw’ CEL files.

Detection of CNVs
B. Identifying CNVs through genotyping errors
Mendelian inconsistencies
Failure of Hardy-Weinberg equilibrium
[Figure: pedigree with genotypes GG, A-, G-, AA, GG]
Conrad et al 2006 Nat Genet 38: 75
McCarroll et al 2006 Nat Genet 38: 86
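The idea that a deletion can show up as a genotyping "error" can be made concrete with a trio. In the sketch below, a hemizygous genotype such as A- is assumed to be called as a homozygote (AA) by software that expects two copies; the called trio then looks Mendelian-inconsistent, while the true genotypes are consistent. The function names and the two-character genotype encoding are illustrative assumptions.

```python
from itertools import product


def called_genotype(true_genotype):
    """Genotype a standard diploid caller would report.

    ASSUMPTION: a single remaining allele ('A-') is read as a homozygote
    ('AA'); a homozygous deletion ('--') gives no call (None).
    """
    present = [allele for allele in true_genotype if allele != "-"]
    if not present:
        return None
    if len(present) == 1:
        return present[0] * 2
    return "".join(sorted(present))


def mendelian_consistent(father, mother, child):
    """True if the child can be formed from one paternal and one maternal allele."""
    target = sorted(child)
    return any(sorted(f + m) == target for f, m in product(father, mother))


if __name__ == "__main__":
    # True genotypes: the father carries a deletion and transmits it.
    father, mother, child = "G-", "AA", "A-"
    calls = [called_genotype(g) for g in (father, mother, child)]
    print("called trio:", calls, "consistent:", mendelian_consistent(*calls))
    print("true trio consistent:", mendelian_consistent(father, mother, child))
```

A cluster of such inconsistencies at neighbouring SNPs within one family is the kind of signal the slide refers to.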

Detection of CNVs C. Targeted or whole-genome sequencing Korbel et al 2007 Science 318: 420

Summary so far…
CNVs are abundant, often overlap genes, can influence gene expression, and most are inherited in healthy individuals.
Known and new CNVs can be identified and genotyped in large-scale studies using whole-genome genotyping arrays, such as the Affy 6.0 and Illumina 1M, but with low resolution (>100kb) and a low signal/noise ratio.
What are the particular strategies and challenges for association analysis of CNVs?
More accurate CNV genotyping maps/arrays/algorithms are expected in the next few years.

3. Association analysis of CNVs

Association analysis of CNVs
1. Some of the relevant questions
(A) Are CNPs associated with variation in human traits or diseases?
(B) Can we identify rare CNVs associated with a large increase in disease risk? Are these de novo or inherited in cases?
(C) When considering the whole genome, do cases have more CNV events than controls, i.e. an increased burden?
(D) How to test SNPs in copy number regions?
(E) Are most CNVs tagged by SNPs in genotyping arrays?

Example 1: Autism whole-genome CNV analysis
Weiss et al. N Engl J Med 2008; 358: 667

Sample                            16p11          Cases      Controls    P
Discovery [Affy 500K]             Del (600kb)    5/1,441    3/4,234     …
                                  Dup            7/1,441    2/4,234
Replication 1 (CHB) [array-CGH]   Del            5/512      0/434       …
                                  Dup            4/512      0/434
Replication 2 (deCODE) [Illumina] Del            3/299      2/18,834    …
                                  Dup            0/299      5/18,834

Deletion frequency (Iceland)
Autism                  1%
Psychiatric disorder    0.1%
General population      0.01%

CNV calling: COPPER, Birdseye, CNAT

             del    dup
inherited    2      6
de novo      10     1
unknown      1      4
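Carrier counts like these are typically compared between cases and controls with an exact test on a 2x2 table. The sketch below implements a two-sided Fisher's exact test with the standard library and applies it to the discovery-sample duplication counts (7/1,441 cases vs 2/4,234 controls). It is an illustration of the kind of test involved; it does not attempt to reproduce the P values reported in the paper, and the function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]].

    a, b: carriers and non-carriers among cases
    c, d: carriers and non-carriers among controls
    Sums the probabilities of all tables (with the same margins) that are
    no more likely than the observed one.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x carriers among cases.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    total = sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)  # tolerance for floating-point ties
    )
    return min(1.0, total)


if __name__ == "__main__":
    # Discovery sample, 16p11 duplication: 7/1,441 cases vs 2/4,234 controls.
    p = fisher_exact_two_sided(7, 1441 - 7, 2, 4234 - 2)
    print(f"two-sided Fisher exact P = {p:.2g}")
```

An exact test is the natural choice here because the carrier counts are far too small for a chi-square approximation to be reliable.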

Example 2: SCZ whole-genome CNV analysis
[Figure: CNVs in cases and in controls plotted by chromosome, leading to specific loci]

Specific large (>500kb) rare deletions (cases : controls)

22q11.2 (VCFS), 11 : 0
  A “positive control”
  1:4000 live births; ~30% develop psychosis
  In ~0.6-2% of SCZ patients
  3Mb and 1.5Mb variants; 2 additional atypical deletions observed

15q13.3, 9 : 0
  CHRNA7, alpha 7 nicotinic acetylcholine receptor
  5 cases w/ impaired cognition; 1 w/ epilepsy
  Previously seen in mental retardation with seizures

1q21.1, 10 : 1
  3 cases had cognitive abnormalities; 1 with epilepsy
  Also seen in a patient with MR and seizures and two patients with autism

Genome-wide burden of rare CNVs in SCZ
3,391 patients with SCZ, 3,181 controls
Filter for 100kb: 6,753 CNVs
Cases have a greater rate of CNVs than controls: 1.15-fold increase, P = 3×10^-5
Cases have more genes intersected by CNVs than controls: 1.14-fold increase, P = 2×10^-6
Rate of genic CNVs in cases versus controls: 1.18-fold increase, P = 5×10^-6
Rate of non-genic CNVs in cases versus controls: 1.09-fold increase, P = 0.16
Results invariant to obvious statistical controls: array type, genotyping plate, sample collection site, mean probe intensity
True for singleton events (observed only once in dataset): 1.45-fold increase (~15% cases versus 11% controls), P = 5×10^-6
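A burden comparison of this kind asks whether the per-person CNV rate is higher in cases than in controls, and one common way to attach a P value is to permute case/control labels. The sketch below shows that idea on toy data; the counts are invented for illustration and are not the study data, and the function names, number of permutations and random seed are assumptions.

```python
import random


def fold_increase(case_counts, control_counts):
    """Ratio of the mean CNV count per person in cases versus controls."""
    case_rate = sum(case_counts) / len(case_counts)
    control_rate = sum(control_counts) / len(control_counts)
    return case_rate / control_rate


def burden_permutation_p(case_counts, control_counts, n_perm=2000, seed=0):
    """One-sided permutation P value for a higher CNV rate in cases.

    Case/control labels are shuffled while each person's CNV count is kept,
    and the observed rate difference is compared with the shuffled ones.
    """
    rng = random.Random(seed)
    n_cases = len(case_counts)
    pooled = list(case_counts) + list(control_counts)

    def rate_difference(values):
        cases, controls = values[:n_cases], values[n_cases:]
        return sum(cases) / len(cases) - sum(controls) / len(controls)

    observed = rate_difference(pooled)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if rate_difference(pooled) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # TOY DATA (hypothetical): number of rare CNVs carried by each person.
    cases = [1] * 30 + [0] * 70
    controls = [1] * 10 + [0] * 90
    print("fold increase:", fold_increase(cases, controls))
    print("permutation P:", burden_permutation_p(cases, controls))
```

Permutation keeps the overall distribution of CNV counts fixed, so it needs no assumption about how counts are distributed across individuals.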

Two other studies supporting a genome-wide increase in rare CNVs in schizophrenia
– Walsh et al (2008) Science: 5% controls, 15% cases, 20% early onset cases; neurodevelopmental genes disrupted
– Xu et al (2008) Nature Genetics: strong increased de novo rate in sporadic cases; but increased inherited rate also
1q21.1 and 15q13.3 also identified by SGENE consortium

Association analysis of CNVs 2. Testing SNPs in CNV regions

Allele-specific risk CNV
[Figure: normalized intensity of allele A plotted against normalized intensity of allele B, with clusters labelled B/B, A/B, A/A, AA/B, AA/A, BB/B, BB/A, A-B]
Korn et al 2008 Nat Genet 40: 1253

Association analysis of CNVs
3. Testing CNVs through the analysis of SNPs in LD
[Figure: tagging of common CNVs by SNPs in HapMap, the 1M and the 6.0]
Coverage limited by lack of SNPs in CNV regions (poor genotyping)
McCarroll et al 2008 Nat Genet 40: 1166
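Whether a SNP "tags" a common CNV is usually summarised by r-squared between the two. The sketch below computes the squared correlation between SNP allele dosages (0, 1, 2) and CNV copy numbers across individuals, a genotype-level stand-in for haplotype-based r-squared. The example data are invented, and the function name is illustrative.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two equal-length numeric lists."""
    if len(x) != len(y) or not x:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a monomorphic marker carries no tagging information
    return sxy * sxy / (sxx * syy)


if __name__ == "__main__":
    # TOY DATA (hypothetical): one entry per individual.
    snp_dosage = [0, 1, 2, 1, 0, 2]    # copies of the SNP's minor allele
    cnv_copies = [2, 1, 0, 1, 2, 0]    # copy number at a nearby deletion CNP
    print("r^2 =", r_squared(snp_dosage, cnv_copies))
```

In this toy case every copy of the SNP allele goes with a deleted copy, so the SNP is a perfect proxy; testing the SNP is then equivalent to testing the CNV.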

4. Online databases

Database of Genomic Variants: comprehensive summary of structural variation in the human genome. Healthy control samples.

DECIPHER: database of submicroscopic chromosomal imbalances, from array-CGH data. Focuses on data from patients with developmental delay, learning disabilities or congenital anomalies.