DNA copy number variation and cancer risk
John F Pearson
Canterbury Statistics Open Day, University of Canterbury, 2/10/2012
2 Breast Cancer Foulkes WD. N Engl J Med 2008; 359:
3 Missing heritability TA Manolio et al. Nature 461, (2009) doi: /nature08494
4 Evan E. Eichler.
5 Copy number variation: Allele 1, Allele 2; copy number loss, copy number gain; whole gene, partial gene, contiguous genes, regulatory effects
6 Copy number variants (CNVs)
o 16,000 copy number variant loci cover >50% of the human genome
o CNVs are associated with cancer risk
o Rare CNVs detected in ~50% of familial cancer genes, e.g. BRCA1, BRCA2
o Genome-wide association studies of cancer: prostate cancer, hepatocarcinoma, nasopharyngeal carcinoma, and neuroblastoma
o Increased CNV load: Li-Fraumeni syndrome (cancer-related genes?), breast cancer (TP53 pathway, ESR1 pathway)
7 SNP arrays
$\mathrm{LRR} = \log_2(R_{\text{observed}} / R_{\text{expected}})$
The B Allele Frequency (BAF) is a somewhat confusing term that actually refers to a normalized measure of relative signal intensity ratio of the B and A alleles.
Wang et al. Genome Res November; 17(11): 1665–1674.
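A minimal sketch of the LRR definition above. The function name `log_r_ratio` and the example intensities are illustrative assumptions, not values from the talk, and the numbers are idealised: they only show how the formula behaves when the observed intensity is half of, equal to, or 1.5 times the expected intensity.

```python
import math


def log_r_ratio(r_observed: float, r_expected: float) -> float:
    """Return LRR = log2(R_observed / R_expected) for one SNP.

    Both intensities must be positive; the names are hypothetical.
    """
    if r_observed <= 0 or r_expected <= 0:
        raise ValueError("intensities must be positive")
    return math.log2(r_observed / r_expected)


# Idealised intensities (assumed for illustration only).
lrr_equal = log_r_ratio(1.0, 1.0)  # observed equals expected
lrr_half = log_r_ratio(0.5, 1.0)   # observed is half of expected
lrr_gain = log_r_ratio(1.5, 1.0)   # observed is 1.5 times expected

print(lrr_equal, lrr_half, round(lrr_gain, 3))
```

An LRR of zero means the observed intensity matches the expectation; negative values point towards loss and positive values towards gain.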
8 Genomic location
9 Copy number: AA, AB, B. Normal, copy neutral LOH, copy number loss
10 Copy number gain AAA AAB ABB BBB
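As a rough illustration of why the genotype clusters separate on a BAF plot, the idealised BAF of a genotype can be taken as the fraction of B alleles among all copies. This is a simplifying assumption (real BAF values are normalised from signal intensities, as noted on the SNP arrays slide), and `expected_baf` is a hypothetical helper name.

```python
from fractions import Fraction


def expected_baf(genotype: str) -> Fraction:
    """Idealised BAF: number of B alleles divided by total copies.

    `genotype` is a string of 'A' and 'B' characters, e.g. "AAB".
    """
    if not genotype or set(genotype) - {"A", "B"}:
        raise ValueError("genotype must be a non-empty string of A and B")
    return Fraction(genotype.count("B"), len(genotype))


# Two-copy genotypes and the copy number gain genotypes from the slide.
for g in ["AA", "AB", "BB", "AAA", "AAB", "ABB", "BBB"]:
    print(g, expected_baf(g))
```

Under this idealisation a three-copy region shows heterozygous bands near 1/3 and 2/3 rather than the single band at 1/2 seen for two copies.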
11 CNV calling: Illumina bead arrays
CNV calling algorithms:
o CNVision (workflow software)
o GNOSIS
o PennCNV
o QuantiSNP
o cnvPartition
12 Hidden Markov Model
Estimate the copy number at each SNP from the Log R ratio, the B allele frequency, and the transition probability from the state at the previous SNP.
Used by PennCNV and QuantiSNP.
13 PennCNV
14 PennCNV
$r_i$ = LRR and $b_i$ = BAF at SNP $i$ ($1 \le i \le M$); $z_i$ = copy number state.
The likelihood of the observed data is:
15 PennCNV
$r_i$ = LRR and $b_i$ = BAF at SNP $i$ ($1 \le i \le M$); $z_i$ = copy number state.
The likelihood of the observed data is:
o The LRR emission probability model includes a term for chemical fluctuations and misannotation/assembly.
o The BAF emission probability is a complicated mixture model.
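A sketch of the likelihood in the slide's notation, assuming the standard hidden Markov factorisation with LRR and BAF treated as conditionally independent given the copy number state. This is an assumed form for illustration, not a reproduction of the slide's own equation; $P(z_1 \mid z_0)$ stands for the initial state probability.

```latex
P(r, b) \;=\; \sum_{z_1, \dots, z_M} \; \prod_{i=1}^{M}
    P(z_i \mid z_{i-1}) \, P(r_i \mid z_i) \, P(b_i \mid z_i)
```

Here $P(r_i \mid z_i)$ is the LRR emission probability, $P(b_i \mid z_i)$ is the BAF emission probability, and $P(z_i \mid z_{i-1})$ is the transition probability between adjacent SNPs.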
16 PennCNV
$r_i$ = LRR and $b_i$ = BAF at SNP $i$ ($1 \le i \le M$); $z_i$ = copy number state.
Transition probabilities between 2 adjacent SNPs $i-1$ and $i$, with copy numbers $z_{i-1}$ and $z_i$, at distance $d_i$.
$D$ = 100 Mb for state 4, 100 kb for other states.
The $p$ are unknowns, estimated by the Baum-Welch algorithm.
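A minimal sketch of a distance-dependent transition row, under stated assumptions: the probability of moving from state j to a different state l is taken as p[j][l] * (1 - exp(-d_i / D)), the probability of staying is whatever remains, and D is chosen by the state at the previous SNP. This functional form, the helper name `transition_row`, and the uniform base probabilities are assumptions for illustration; only the D values and the six-state numbering are taken from the slides.

```python
import math

N_STATES = 6          # states numbered 1..6
D_STATE_4 = 100e6     # 100 Mb, in base pairs (from the slide)
D_OTHER = 100e3       # 100 kb, in base pairs (from the slide)


def transition_row(prev_state, distance_bp, p):
    """Return {next_state: probability} for one step of the chain.

    p[j][l] is the base probability of moving from state j to l (l != j);
    in the talk these are unknowns fitted by Baum-Welch.
    Assumed form: off-diagonal entries scale with 1 - exp(-d / D).
    """
    d_const = D_STATE_4 if prev_state == 4 else D_OTHER
    scale = 1.0 - math.exp(-distance_bp / d_const)
    row = {
        l: p[prev_state][l] * scale
        for l in range(1, N_STATES + 1)
        if l != prev_state
    }
    row[prev_state] = 1.0 - sum(row.values())  # probability of staying
    return row


# Hypothetical base probabilities: 0.01 for every change of state.
base_p = {
    j: {l: 0.01 for l in range(1, N_STATES + 1) if l != j}
    for j in range(1, N_STATES + 1)
}

near = transition_row(3, 1_000, base_p)      # SNPs 1 kb apart
far = transition_row(3, 1_000_000, base_p)   # SNPs 1 Mb apart
print(near[3], far[3])
```

With this form, closely spaced SNPs are very likely to share a state, while widely spaced SNPs approach the base change probabilities.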
17 PennCNV
$r_i$ = LRR and $b_i$ = BAF at SNP $i$ ($1 \le i \le M$); $z_i$ = copy number state.
o Baum-Welch is used to train the model.
o The Viterbi algorithm is used to infer the most likely path.
o A CNV is called whenever a stretch of states is different from normal (usually state 3 or 4).
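The decoding and calling steps can be sketched with a toy model. This is not PennCNV's implementation: it assumes only three states (0 = loss, 1 = normal, 2 = gain), Gaussian emissions on LRR alone with made-up means and standard deviation, fixed transition probabilities, and no Baum-Welch training. All names and numbers are hypothetical.

```python
import math


def gaussian_log_pdf(x, mean, sd):
    """Log density of a normal distribution."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)


def viterbi(log_init, log_trans, log_emit):
    """Most likely state path; log_emit[i][s] is the log emission at SNP i."""
    n_states = len(log_init)
    score = [log_init[s] + log_emit[0][s] for s in range(n_states)]
    back = []
    for i in range(1, len(log_emit)):
        new_score, pointers = [], []
        for s in range(n_states):
            best = max(range(n_states), key=lambda q: score[q] + log_trans[q][s])
            pointers.append(best)
            new_score.append(score[best] + log_trans[best][s] + log_emit[i][s])
        score = new_score
        back.append(pointers)
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):  # trace back from the last SNP
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


def call_segments(path, normal_states):
    """Return (start, end, state) for each run of non-normal states (end inclusive)."""
    segments, start = [], None
    for i, s in enumerate(path):
        if start is not None and s != path[start]:
            segments.append((start, i - 1, path[start]))
            start = None
        if start is None and s not in normal_states:
            start = i
    if start is not None:
        segments.append((start, len(path) - 1, path[start]))
    return segments


# Toy parameters (assumed): LRR means per state and a common sd.
means, sd = [-0.6, 0.0, 0.4], 0.2
log_init = [math.log(p) for p in (0.01, 0.98, 0.01)]
log_trans = [[math.log(0.98 if a == b else 0.01) for b in range(3)] for a in range(3)]

lrr = [0.0, 0.05, -0.6, -0.7, -0.55, 0.02, 0.0]  # toy LRR values
log_emit = [[gaussian_log_pdf(x, m, sd) for m in means] for x in lrr]

path = viterbi(log_init, log_trans, log_emit)
segments = call_segments(path, normal_states={1})
print(path, segments)
```

The three consecutive low LRR values are decoded as one loss segment, because paying the transition cost twice is cheaper than explaining them under the normal state.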
18 Copy number gain AAA AAB ABB BBB
19 Noisy data
20 Breast cancer
A characteristic of breast tumour cells is genomic instability.
BRCA1, BRCA2
21 BRCA1: known large deletions

Sample ID   BRCA1 mutation
EMB         del exons 2-24
EMB         del exons 3-19
EMB         del exons 1-23
EMB         del exons 1-21
EMB         del exons 1-23
EMB         del exons 1-23
EMB         del exons 1-21
GEM         del exons
PAD         del exons 9-19
EMB         del exons 1-17
EMB         del exons 1-17
KCO         del exons 1-17
EMB         del exons 8-13
GEM         del exons 8-13

Sample ID   BRCA1 mutation
EMB         del exons 3-19
EMB         del exons 1-17

Legend: Detected, Not detected

CNV prediction summary:
o cnvPartition: 25% (4/16)
o GNOSIS: 19% (3/16)
o PennCNV: 88% (14/16)
o QuantiSNP: 81% (13/16)
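The percentages in the prediction summary follow directly from the counts on the slide (deletions detected out of 16 known deletions). A short check, with `detection_rate` as a hypothetical helper name:

```python
TOTAL_KNOWN_DELETIONS = 16

# Detected counts per algorithm, as given on the slide.
detected = {"cnvPartition": 4, "GNOSIS": 3, "PennCNV": 14, "QuantiSNP": 13}


def detection_rate(n_detected: int, n_total: int) -> float:
    """Percentage of known deletions that an algorithm called."""
    return 100.0 * n_detected / n_total


rates = {name: detection_rate(n, TOTAL_KNOWN_DELETIONS) for name, n in detected.items()}
for name, pct in rates.items():
    print(f"{name}: {pct:.2f}% ({detected[name]}/{TOTAL_KNOWN_DELETIONS})")
```

The slide reports these values rounded to whole percentages.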
22 Endometrial cancer
o 1343 cases: ANECS, SEARCH
o 655 female controls: Hunter Community Study
o CNV calling by 4 algorithms
o QC(1) – GWAS criteria
o Case vs. control analyses
o 1279 cases, 619 controls
o 1210 cases, 612 controls
Want to find:
1. CNVs overlapping known susceptibility genes
2. Novel CNVs in the mismatch repair pathway
3. Common or rare CNV associations
23 CNV frequency: all
Columns: Case, Control, Difference, P
Total CNVs: NS
Deletions: NS
Duplications: NS
Exons: NS
Mean CNV per sample
24 CNV frequency: rare (< 1%)
Columns: Case, Control, Difference, P
Total CNVs: E-05
Deletions: E-06
Duplications: NS
Exons: E-04
Mean rare CNV per sample
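The slides do not state which test produced the P values. As one possible way to compare the mean number of rare CNVs per sample between cases and controls, here is a sketch of a two-sided permutation test on the difference in means. The choice of test, the function name, and the per-sample counts are all assumptions for illustration; none of the data comes from the study.

```python
import random


def permutation_p_value(cases, controls, n_permutations=2000, seed=0):
    """Two-sided permutation p-value for the difference in group means."""
    def mean(values):
        return sum(values) / len(values)

    rng = random.Random(seed)
    observed = abs(mean(cases) - mean(controls))
    pooled = list(cases) + list(controls)
    n_cases = len(cases)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # reassign case/control labels at random
        diff = abs(mean(pooled[:n_cases]) - mean(pooled[n_cases:]))
        if diff >= observed - 1e-12:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_permutations + 1)


# Hypothetical rare-CNV counts per sample (not study data).
toy_cases = [5, 6, 5, 7, 6, 5, 6, 7]
toy_controls = [1, 0, 2, 1, 0, 1, 2, 1]

p_different = permutation_p_value(toy_cases, toy_controls)
p_same = permutation_p_value(toy_controls, toy_controls)
print(p_different, p_same)
```

A permutation test makes no distributional assumption about CNV counts, which tend to be skewed, at the cost of more computation than a closed-form test.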
26 Association study: CNV regions
Columns: Chr, Case, Control, P adjusted
Chr: X X X X X X
27 Association study: CNV overlapping genes
Columns: Chr, Case, Control, P adjusted
Chr: X X
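The association tables report case and control counts with an adjusted P value, but the slides do not name the test or the adjustment. One common approach, sketched here purely as an assumption, is a two-sided Fisher's exact test on carriers versus non-carriers followed by a Bonferroni correction. The function names and the carrier counts are hypothetical; only the group sizes (1210 cases, 612 controls) come from the slides.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    a, b = case carriers, case non-carriers
    c, d = control carriers, control non-carriers
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x carriers among the cases.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    total = sum(
        prob(x) for x in range(lo, hi + 1) if prob(x) <= p_observed * (1 + 1e-9)
    )
    return min(1.0, total)


def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical carrier counts for three CNV regions (not study data).
N_CASES, N_CONTROLS = 1210, 612
carriers = [(12, 0), (5, 3), (9, 1)]  # (case carriers, control carriers)

raw = [
    fisher_exact_two_sided(k_case, N_CASES - k_case, k_ctrl, N_CONTROLS - k_ctrl)
    for k_case, k_ctrl in carriers
]
adjusted = bonferroni(raw)
print(raw, adjusted)
```

Fisher's exact test suits rare CNVs because carrier counts are small; Bonferroni is conservative, and other adjustments could equally have been used.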
28
29 Acknowledgements
University of Otago
Gemma Moir-Meyer, Logan Walker
Mackenzie Cancer Research Group
Queensland Institute of Medical Research
Mandy Spurdle, Felicity Lose, Yen Tan, Alex Metcalf
Australian National Endometrial Cancer Study
Bryony Thompson
University of Cambridge
Deborah Thompson, Paul Pharoah, Alison Dunning, Douglas Easton
Studies of Epidemiology and Risk Factors in Cancer Heredity (SEARCH)
University of Newcastle
Rodney Scott, Mark McEvoy, John Attia, Elizabeth Holliday
The Hunter Community Study
CIMBA consortium
Mayo Clinic
Fergus Couch