CZ5225: Modeling and Simulation in Biology Lecture 10: Copy Number Variations Prof. Chen Yu Zong Tel: 6516-6877


1 CZ5225: Modeling and Simulation in Biology. Lecture 10: Copy Number Variations. Prof. Chen Yu Zong. Tel: 6516-6877. Email: phacyz@nus.edu.sg. http://bidd.nus.edu.sg. Room 08-14, level 8, S16, NUS.

2 Copy number variation (CNV): what is it? A form of human genetic variation: instead of 2 copies of each region of each chromosome (diploid), some people have amplifications or losses (> 1 kb) in different regions –this doesn’t include translocations or inversions. We all have such regions –the publicly available genome NA15510 has between 5 and 240 by various estimates –they are only rarely harmful (but rare things do happen).

3 Copy-number probes are used to quantify the amount of DNA at known loci. CN locus: ...CGTAGCCATCGGTAAGTACTCAATGATAG... PM: ATCGGTAGCCATTCATGAGTTACTA. CN = 1: PM = c; CN = 2: PM = 2c; CN = 3: PM = 3c.

4 Copy number variation: population genomics. The genomes of two humans differ more in a structural sense than at the nucleotide level; a recent paper estimates that on average two of us differ by ~4-24 Mb of genetic material due to copy number variation and ~2.5 Mb due to single nucleotide polymorphisms.

5 Abundance of CNVs in the human population? Still an open question, but probably thousands, at low allelic frequency (<20%).

6 Abundance of deletion CNVs in the human population. Comparison of overlapping CNVs identified by Conrad et al. (2006) and McCarroll et al. (2006). Source: Freeman et al., Genome Res 2006.

7 Non-allelic homologous recombination events between low-copy repeats (LCR-NAHR) Lupski & Inoue, TIG 2002

8 Duplications and deletions of LCRs mediated by NAHR. [Figure panels: LCRs in direct orientation; LCRs in inverted orientation (inversions)]

9 Intrachromatid recombination between LCRs. [Figure panels: LCRs in direct orientation (deletion); LCRs in inverted orientation (inversion)]

10 Mechanisms generating genomic deletions

11 Copy number variation: relations to human disease. Responsible for a number of rare genetic conditions, for example Down syndrome (trisomy 21) and Cri du chat syndrome (a partial deletion of 5p). Implicated in complex diseases, for example: ↑ CCL3L1 CN → ↓ HIV/AIDS susceptibility. Also, some sporadic (non-inherited) CN variants are strongly associated with autism. Tumors typically have a lot of chromosomal abnormalities, including recurrent CN changes.

12 Evolutionary and medical implications of CNVs: CCL3L1 as an example. When CCL3L1 occupies the CCR5 receptor on CD4 cells, it blocks HIV's entry (Gonzalez et al., Science, 2005).

13 Copy-number variation of CCL3L1 within and among human and chimp populations

14 CCL3L1 and HIV infection. Individuals with a high CCL3L1 gene copy number relative to their population average are more resistant to HIV infection than those with a low copy number, presumably because there is more ligand to compete with HIV during binding to CCR5 (Gonzalez et al., Science, 2005).

15 Trisomy 21

16 Partial deletion of chr 5p

17 A cytogeneticist’s story. “The story is about diagnosis of a 3-month-old baby with macrocephaly and some heart problems. The doctors questioned a couple of syndromes which we tested for and found negative. Rather than continue this ‘shot in the dark’ approach, we put the case on an array and found a 2 Mb deletion which notably deletes the gene NSD1 on chr 5, mutations in which are known to cause Sotos syndrome. This is an overgrowth syndrome and fits with the macrocephaly. The bottom line is that we are able to diagnose quicker by this approach and delineate exactly the underlying genetic change.”

18 A cytogeneticist’s story. [Figure: 2 Mb deletion on chromosome 5]

19 Many tumors have gross CN changes. [Figure: a lung cancer cell line vs matched normal lymphoblast, from Nannya et al., Cancer Res 2005;65:6071-6079]

20 Research into gonad dysfunction: human sex reversal. 20% of 46,XY females have mutations in SRY; 80% of 46,XY females are unexplained! 90% of 46,XX males are due to translocation of SRY; 10% of 46,XX males are unexplained! This suggests that loss-of-function and gain-of-function mutations in other genes may cause sex reversal. We’re looking at shared deletions.

21 Affymetrix SNP chip terminology. Genomic DNA: TAGCCATCGGTA [A/G] GTACTCAATGAT (a SNP with alleles A and G). Perfect Match probe for allele A: ATCGGTAGCCATTCATGAGTTACTA. Perfect Match probe for allele B: ATCGGTAGCCATCCATGAGTTACTA. Genotyping: answering the question about the two copies of the chromosome on which the SNP is located: is a sample AA (AA), AB (AG) or BB (GG) at this SNP?

22 Affymetrix GeneChip: 1.28 cm × 1.28 cm; 6.4 million features/chip; 5 µ features; > 1 million identical 25 bp probes/feature.

23 GeneChip Mapping Assay overview: 250 ng genomic DNA → RE digestion (Xba) → adaptor ligation → PCR: one-primer amplification (complexity reduction) → fragmentation and labeling → hyb & wash. Genotypes: AA, BB, AB.

24 Principal low-level analysis steps. Background adjustment and normalization at probe level: these steps are to remove lab/operator/reagent effects. Combining probe-level summaries into a probe-set-level summary, best done robustly, on many chips at once: this is to remove probe affinity effects and discordant observations (gross errors, non-responding probes, etc.). Possibly further rounds of normalization (probe-set level), as lab/cohort/batch/other effects are frequently still visible. Derive the relevant copy-number quantities. Finally, quality assessment is an important low-level task.

25 Preprocessing for total CN using SNP probe pairs (250K chip). Modification by H Bengtsson of a method due to A Wirapati developed some years ago for microsatellite genotyping; similar to the approach used by Illumina. [Figure: genotype groups AA, AT, TT]

26 Background adjustment and normalization. Outcome similar to that achieved by quantile normalization.
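For readers unfamiliar with the benchmark named on this slide, quantile normalization can be sketched as below. This is a minimal pure-Python illustration, not the preprocessing code used in the lecture; the function name `quantile_normalize` and the toy intensities are assumptions, and ties are simply broken by position.

```python
def quantile_normalize(arrays):
    """Force every array (a list of probe intensities) onto a common distribution.

    All arrays must have the same length. Ties are broken by position.
    """
    n = len(arrays[0])
    if any(len(a) != n for a in arrays):
        raise ValueError("all arrays must have the same number of probes")
    # Sort each array, then average across arrays rank by rank.
    sorted_arrays = [sorted(a) for a in arrays]
    target = [sum(s[r] for s in sorted_arrays) / len(arrays) for r in range(n)]
    normalized = []
    for a in arrays:
        # order[r] is the index of the r-th smallest value in this array.
        order = sorted(range(n), key=lambda idx: a[idx])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = target[rank]
        normalized.append(out)
    return normalized


# Toy data (hypothetical): two chips, three probes each.
chips = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
normalized_chips = quantile_normalize(chips)
```

After the call, both chips contain the same set of values and differ only in which probe carries which value.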

27 Low-level analysis problems remain unsolved; why? The feature size keeps ↓ and so the # features/chip keeps ↑. Fewer and fewer features are used for a given measurement, allowing more measurements to be made using a single chip. These considerations all place more and more demands on the low-level analysis: to maintain the quality of existing measurements, and to obtain good new ones.

28 SNP probes can be used to estimate total copy numbers. For genotypes AA, AB and BB: $PM = PM_A + PM_B = 2c$. For genotype AAB: $PM = PM_A + PM_B = 3c$.
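The arithmetic on this slide can be sketched as follows, under the idealized assumption that each DNA copy contributes the same signal c to the summed allele signal. The function name `total_copy_number` and the value of `c` are hypothetical; in practice c is not known and has to come from calibration against reference samples.

```python
def total_copy_number(pm_a, pm_b, c):
    """Estimate total copy number from allele-specific PM signals.

    Assumes the idealized model PM_A + PM_B = CN * c, where c is the
    signal contributed by a single DNA copy (a hypothetical constant here).
    """
    if c <= 0:
        raise ValueError("c must be positive")
    return (pm_a + pm_b) / c


c = 100.0                                       # assumed per-copy signal
cn_aa = total_copy_number(200.0, 0.0, c)        # genotype AA
cn_ab = total_copy_number(100.0, 100.0, c)      # genotype AB
cn_aab = total_copy_number(200.0, 100.0, c)     # genotype AAB (one extra copy)
```

The point of summing the two allele signals is that the estimate no longer depends on the genotype, only on the number of copies.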

29 SNP probe tiling strategy: central probe quartet. [Figure: target TAGCCATCGGTA N GTACTCAATGAT with the SNP (A/G) at position 0; PM and MM probes for allele A and allele B]

30 SNP probe tiling strategy: +4 offset probe quartet. [Figure: target with the SNP (A/G) at position +4; PM and MM probes for allele A and allele B]

31 SNP chips for identifying copy number variations. Using SNP chips to identify change in total copy number (i.e. CN ≠ 2). Outline a new method (CRMA). Evaluate and compare it with other methods. Make some closing remarks on further issues.

32 Copy-number estimation using Robust Multichip Analysis (CRMA). Preprocessing (probe signals): allelic crosstalk (or quantile). Total CN: $PM = PM_A + PM_B$. Summarization (SNP signals $\theta$): log-additive, PM only. Post-processing: fragment-length (GC-content). Raw total CNs: $M_{ij} = \log_2(\theta_{ij} / \theta_{Rj})$, where R = reference, i = chip, j = SNP. A few details are passed over. Ask me later if you care about them.

33 Crosstalk between alleles adds significant artifacts to signals. Cross-hybridization: allele A: TCGGTAAGTACTC; allele B: TCGGTATGTACTC. AA: $PM_A \gg PM_B$; AB: $PM_A \approx PM_B$; BB: $PM_A \ll PM_B$.

34 There are six possible allele pairs. Nucleotides: {A, C, G, T}. Unordered pairs: –(A,C), (A,G), (A,T), (C,G), (C,T), (G,T). Because different nucleotides bind differently, the crosstalk from A to C might be very different from that from A to T.
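The count of six follows from choosing 2 of the 4 nucleotides without regard to order. A short check using the standard library (variable names are illustrative):

```python
from itertools import combinations

nucleotides = ["A", "C", "G", "T"]
# Every unordered pair of distinct nucleotides, in alphabetical order.
allele_pairs = list(combinations(nucleotides, 2))
print(allele_pairs)
```

Each of these pairs would get its own crosstalk estimate, since the cross-hybridization strength depends on which two nucleotides are involved.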

35 Crosstalk between alleles is easy to spot. Example: data from one array; probe pairs ($PM_A$, $PM_B$) for nucleotide pair (A,T). [Figure: $PM_B$ vs $PM_A$, showing the offset and the genotype groups AA, AB, BB]

36 Crosstalk between alleles can be estimated and corrected for. What is done: offset is removed from SNPs and CN units; crosstalk is removed from SNPs. [Figure: $PM_B$ vs $PM_A$ after correction, with no offset; genotype groups AA, AB, BB]
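One way to picture the correction step is a 2×2 linear model: the observed pair equals an offset plus a crosstalk matrix applied to the true allele signals, so correcting means subtracting the offset and applying the inverse matrix. The sketch below assumes the offset and matrix have already been estimated for one nucleotide pair; the function name, the model form and all numbers are illustrative assumptions, not the lecture's estimation procedure.

```python
def correct_crosstalk(pm_a, pm_b, offset, crosstalk):
    """Remove offset and allelic crosstalk from one probe pair.

    Assumed model (illustrative): observed = offset + W * true, where
    W = [[w_aa, w_ab], [w_ba, w_bb]] and w_ab is the share of allele B
    signal that leaks into the allele A probe.
    """
    (w_aa, w_ab), (w_ba, w_bb) = crosstalk
    det = w_aa * w_bb - w_ab * w_ba
    if det == 0:
        raise ValueError("crosstalk matrix is singular")
    # Step 1: remove the offset.
    ya = pm_a - offset[0]
    yb = pm_b - offset[1]
    # Step 2: apply the inverse of the 2x2 crosstalk matrix.
    true_a = (w_bb * ya - w_ab * yb) / det
    true_b = (-w_ba * ya + w_aa * yb) / det
    return true_a, true_b


# Hypothetical parameters for one nucleotide pair.
offset = (100.0, 100.0)
W = [[1.0, 0.25], [0.5, 1.0]]
# An AA sample with true signals (1000, 0) would be observed as (1100, 600).
corrected = correct_crosstalk(1100.0, 600.0, offset, W)
```

In this toy case the corrected pair lands back on the A axis, which is what the "no offset" figure on the slide depicts for homozygous samples.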

37 Copy-number estimation using Robust Multichip Analysis (CRMA). Preprocessing (probe signals): allelic crosstalk (or quantile). Total CN: $PM = PM_A + PM_B$. Summarization (SNP signals $\theta$): log-additive, PM only. Postprocessing: fragment-length (GC-content). Raw total CNs: $M_{ij} = \log_2(\theta_{ij} / \theta_{Rj})$. Already briefly described.

38 Copy-number estimation using Robust Multichip Analysis (CRMA). Preprocessing (probe signals): allelic crosstalk (quantile). Total CN: $PM = PM_A + PM_B$. Summarization (SNP signals $\theta$): log-additive, PM only. Postprocessing: fragment-length (GC-content). Raw total CNs: $M_{ij} = \log_2(\theta_{ij} / \theta_{Rj})$. That’s it!

39 Copy-number estimation using Robust Multichip Analysis (CRMA). Preprocessing (probe signals): allelic crosstalk (quantile). Total CNs: $PM = PM_A + PM_B$. Summarization (SNP signals $\theta$): log-additive, PM only. Postprocessing: fragment-length (GC-content). Raw total CNs: $M_{ij} = \log_2(\theta_{ij} / \theta_{Rj})$. Model: $\log_2(PM_{ijk}) = \log_2 \theta_{ij} + \log_2 \phi_{jk} + \varepsilon_{ijk}$. Fit using rlm.
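The summarization step can be illustrated for a single SNP j, reading the slide's model as a chip effect plus a probe-affinity effect on the log2 scale. The slide fits it with rlm (a robust linear model routine in R); the sketch below deliberately swaps in Tukey's median polish, a different robust fit that needs only the standard library, so it illustrates the idea and is not the CRMA implementation. The function name and the toy matrix are assumptions.

```python
from statistics import median


def median_polish(log_pm, n_iter=10):
    """Fit log_pm[i][k] ~ chip_effect[i] + probe_effect[k] robustly.

    Rows are chips i, columns are probes k of one SNP. Probe effects are
    shifted to have median zero; the shift is absorbed by the chip effects,
    which play the role of log2(theta_ij).
    """
    n_rows, n_cols = len(log_pm), len(log_pm[0])
    resid = [row[:] for row in log_pm]
    chip_eff = [0.0] * n_rows
    probe_eff = [0.0] * n_cols
    for _ in range(n_iter):
        # Sweep row medians into the chip effects.
        for i in range(n_rows):
            m = median(resid[i])
            chip_eff[i] += m
            resid[i] = [v - m for v in resid[i]]
        # Sweep column medians into the probe effects.
        for k in range(n_cols):
            m = median(resid[i][k] for i in range(n_rows))
            probe_eff[k] += m
            for i in range(n_rows):
                resid[i][k] -= m
    shift = median(probe_eff)
    probe_eff = [p - shift for p in probe_eff]
    chip_eff = [c + shift for c in chip_eff]
    return chip_eff, probe_eff


# Toy log2(PM) values: chip effects 10, 11, 12 and probe effects -1, 0, 1,
# with one gross outlier (16.0 instead of 11.0) on chip 0, probe 2.
log_pm = [[9.0, 10.0, 16.0],
          [10.0, 11.0, 12.0],
          [11.0, 12.0, 13.0]]
chip_effects, probe_effects = median_polish(log_pm)
theta = [2.0 ** e for e in chip_effects]  # SNP signals on the intensity scale
```

The outlier ends up in the residuals rather than in the chip effect, which is the reason for fitting robustly across many chips at once.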

40 Copy-number estimation using Robust Multichip Analysis (CRMA). Preprocessing (probe signals): allelic crosstalk (quantile). Total CN: $PM = PM_A + PM_B$. Summarization (SNP signals $\theta$): log-additive, PM-only. Postprocessing: fragment-length (GC-content). Raw total CNs: $M_{ij} = \log_2(\theta_{ij} / \theta_{Rj})$. 100K: longer fragments get less well amplified by PCR and so give weaker SNP signals.

41 Copy-number estimation using Robust Multichip Analysis (CRMA). Preprocessing (probe signals): allelic crosstalk (quantile). Total CN: $PM = PM_A + PM_B$. Summarization (SNP signals $\theta$): log-additive, PM-only. Postprocessing: fragment-length (GC-content). Raw total CNs: $M_{ij} = \log_2(\theta_{ij} / \theta_{Rj})$. 500K: longer fragments get less well amplified by PCR and so give weaker SNP signals.

42 Copy-number estimation using Robust Multichip Analysis (CRMA). Preprocessing (probe signals): allelic crosstalk (quantile). Total CN: $PM = PM_A + PM_B$. Summarization (SNP signals $\theta$): log-additive, PM-only. Postprocessing: fragment-length (GC-content). Raw total CNs: $M_{ij} = \log_2(\theta_{ij} / \theta_{Rj})$. 500K: longer fragments get less well amplified by PCR and so give weaker SNP signals.

43 Copy-number estimation using Robust Multichip Analysis (CRMA). Preprocessing (probe signals): allelic crosstalk (quantile). Total CN: $PM = PM_A + PM_B$. Summarization (SNP signals $\theta$): log-additive, PM-only. Postprocessing: fragment-length (GC-content). Raw total CNs: $M_{ij} = \log_2(\theta_{ij} / \theta_{Rj})$. Care required with the number and nature of Reference samples used.
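The final step, the raw log-ratio against a reference, can be sketched as below. How the reference signal is built is an assumption here: this sketch takes the per-SNP median over a chosen set of reference chips, which is one common choice and not necessarily the lecture's. The function name and the data are hypothetical.

```python
from math import log2
from statistics import median


def raw_log_ratios(theta, reference_rows):
    """Compute M[i][j] = log2(theta[i][j] / theta_ref[j]).

    theta[i][j] is the SNP signal for chip i and SNP j. The reference
    theta_ref[j] is taken as the median over the chips listed in
    reference_rows (an assumption of this sketch).
    """
    n_snps = len(theta[0])
    theta_ref = [median(theta[i][j] for i in reference_rows)
                 for j in range(n_snps)]
    return [[log2(row[j] / theta_ref[j]) for j in range(n_snps)]
            for row in theta]


# Three normal chips (used as reference) and one test chip, two SNPs each.
theta = [[100.0, 200.0],
         [100.0, 200.0],
         [100.0, 200.0],
         [150.0, 100.0]]
M = raw_log_ratios(theta, reference_rows=[0, 1, 2])
```

For the test chip, the first SNP gives log2(3/2) ≈ 0.58, as expected for a gain from 2 to 3 copies, and the second gives log2(1/2) = -1, as expected for a loss to 1 copy. If the reference set itself contains samples with CN changes at a locus, these values shift, which is why the choice of reference samples matters.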

44 Comparison of 4 methods: CRMA; dChip (Li & Wong 2001); CNAG* (Nannya et al 2005); CNAT v4 (Affymetrix 2006).
Preprocessing (probe signals): allelic crosstalk (quantile); quantile; scale; quantile.
Total CN: $PM = PM_A + PM_B$; $PM = PM_A + PM_B$, $MM = MM_A + MM_B$; $PM = PM_A + PM_B$; “log-additive” PM-only.
Summarization (SNP signals $\theta$): log-additive, PM only; multiplicative, PM-MM; = A + B; = A + B.
Post-processing: fragment-length (GC-content); fragment-length (GC-content); fragment-length (GC-content).
Raw total CNs: $M_{ij} = \log_2(\theta_{ij} / \theta_{Rj})$.

45 Further bioinformatic issues. Estimating copy number: needs calibration data. Segmentation (of chromosomes into constant copy number regions): an HMM-like algorithm. Analyzing family CN data: a different HMM. Incorporating non-polymorphic probes: independent HMM observations to be weighted and combined. Dealing with mixed normal-abnormal samples. Utilizing poor quality DNA samples. Estimating allele-specific copy number.
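To make the segmentation item concrete, here is a generic Viterbi decode over log-ratios ordered along a chromosome. The slide says only "an HMM-like algorithm", so everything specific below is an assumption of this sketch: the three states (CN = 1, 2, 3), their mean log-ratios, the Gaussian emission with a fixed sigma, and the single stay probability. The input is assumed to be non-empty.

```python
from math import log


def viterbi_segment(log_ratios, state_means, sigma=0.2, stay_prob=0.98):
    """Most likely copy-number state for each locus along a chromosome.

    state_means maps a copy-number state to its expected log2 ratio.
    Emissions are Gaussian with a shared sigma; transitions keep the
    current state with probability stay_prob (all assumed values).
    """
    states = list(state_means)
    switch_prob = (1.0 - stay_prob) / (len(states) - 1)

    def log_emit(x, s):
        # Gaussian log-density up to a constant shared by all states.
        return -((x - state_means[s]) ** 2) / (2 * sigma ** 2)

    def log_trans(prev, cur):
        return log(stay_prob if prev == cur else switch_prob)

    score = {s: log(1.0 / len(states)) + log_emit(log_ratios[0], s)
             for s in states}
    backpointers = []
    for x in log_ratios[1:]:
        new_score, pointers = {}, {}
        for s in states:
            best_prev = max(states, key=lambda p: score[p] + log_trans(p, s))
            new_score[s] = score[best_prev] + log_trans(best_prev, s) + log_emit(x, s)
            pointers[s] = best_prev
        backpointers.append(pointers)
        score = new_score
    # Trace back from the best final state.
    path = [max(states, key=lambda s: score[s])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Hypothetical log-ratios with a three-locus loss in the middle.
means = {1: -1.0, 2: 0.0, 3: 0.585}
ratios = [0.05, -0.1, 0.0, -0.9, -1.1, -1.0, 0.1, 0.0]
segments = viterbi_segment(ratios, means)
```

With these toy values the decoded path is two copies, then one copy for the three middle loci, then two copies again. A high stay probability is what keeps isolated noisy loci from being called as separate segments.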

