CZ5225: Modeling and Simulation in Biology Lecture 10: Copy Number Variations Prof. Chen Yu Zong Tel: 6516-6877

CZ5225: Modeling and Simulation in Biology Lecture 10: Copy Number Variations Prof. Chen Yu Zong Tel: Room 08-14, level 8, S16, NUS

Copy number variation (CNV): what is it?
A form of human genetic variation: instead of 2 copies of each region of each chromosome (diploid), some people have amplifications or losses (> 1 kb) in different regions
– this doesn’t include translocations or inversions
We all have such regions
– the publicly available genome NA15510 has between 5 and 240, by various estimates
– they are only rarely harmful (but rare things do happen)

Copy-number probes are used to quantify the amount of DNA at known loci
CN locus: ...CGTAGCCATCGGTAAGTACTCAATGATAG...
PM:       ATCGGTAGCCATTCATGAGTTACTA
CN = 1: $PM = c$
CN = 2: $PM = 2c$
CN = 3: $PM = 3c$

Copy number variation: population genomics
The genomes of two humans differ more in a structural sense than at the nucleotide level; a recent paper estimates that on average two of us differ by
– ~ Mb of genetic material due to copy number variation
– ~ 2.5 Mb due to single nucleotide polymorphisms

Abundance of CNVs in the human population? Still an open question, but probably thousands, at low allelic frequency (<20%).

Abundance of deletion CNVs in the human population Comparison of overlapping CNVs identified by Conrad et al. (2006) and McCarroll et al. (2006). Freeman et al. Genome Res 2006

Non-allelic homologous recombination events between low-copy repeats (LCR-NAHR) Lupski & Inoue, TIG 2002

Duplications and deletions of LCRs mediated by NAHR
[Figure panels: LCRs in direct orientation; LCRs in inverted orientation; inversions.]

Intrachromatid recombination between LCRs
[Figure panels: LCRs in direct orientation (deletion); LCRs in inverted orientation (inversion).]

Mechanisms generating genomic deletions

Copy number variation: relations to human disease
Responsible for a number of rare genetic conditions. For example, Down syndrome (trisomy 21) and Cri du chat syndrome (a partial deletion of 5p).
Implicated in complex diseases. For example: lower CCL3L1 CN → higher HIV/AIDS susceptibility; also, some sporadic (non-inherited) CN variants are strongly associated with autism.
Tumors typically have a lot of chromosomal abnormalities, including recurrent CN changes.

Evolutionary and medical implications of CNVs: CCL3L1 as an example
When CCL3L1 occupies the CCR5 receptor on CD4 cells, it blocks HIV's entry. Gonzalez et al., Science, 2005

Copy-number variation of CCL3L1 within and among human and chimp populations

CCL3L1 and HIV infection
Individuals with a high CCL3L1 gene copy number relative to their population average are more resistant to HIV infection than those with a low copy number, presumably because there is more ligand to compete with HIV during binding to CCR5. Gonzalez et al., Science, 2005

Trisomy 21

Partial deletion of chr 5p

A cytogeneticist’s story
“The story is about diagnosis of a 3 month old baby with macrocephaly and some heart problems. The doctors questioned a couple of syndromes which we tested for and found negative. Rather than continue this ‘shot in the dark’ approach, we put the case on an array and found a 2 Mb deletion which notably deletes the gene NSD1 on chr 5, mutations in which are known to cause Sotos syndrome. This is an overgrowth syndrome and fits with the macrocephaly. The bottom line is that we are able to diagnose quicker by this approach and delineate exactly the underlying genetic change.”

A cytogeneticist’s story
[Figure: 2 Mb deletion on chromosome 5.]

Many tumors have gross CN changes
[Figure: a lung cancer cell line vs matched normal lymphoblast, from Nannya et al., Cancer Res 2005;65:]

Research into gonad dysfunction: human sex reversal
– 20% of 46,XY females have mutations in SRY
– 80% of 46,XY females unexplained!
– 90% of 46,XX males due to translocation of SRY
– 10% of 46,XX males unexplained!
Suggests loss of function and gain of function mutations in other genes may cause sex reversal. We’re looking at shared deletions.

Affymetrix SNP chip terminology
Genomic DNA:                      TAGCCATCGGTA [A/G] GTACTCAATGAT   (SNP: A or G)
Perfect Match probe for Allele A: ATCGGTAGCCATTCATGAGTTACTA
Perfect Match probe for Allele B: ATCGGTAGCCATCCATGAGTTACTA
Genotyping: answering the question about the two copies of the chromosome on which the SNP is located: is a sample AA (AA), AB (AG) or BB (GG) at this SNP?

Affymetrix GeneChip
– chip size: 1.28 cm × 1.28 cm
– 6.4 million features / chip
– feature size: 5 µ
– > 1 million identical 25 bp probes / feature

GeneChip Mapping Assay overview
ng genomic DNA → RE digestion (Xba) → adaptor ligation → PCR: one-primer amplification (complexity reduction) → fragmentation and labeling → hyb & wash → genotypes AA, AB, BB

Principal low-level analysis steps
– Background adjustment and normalization at probe level. These steps are to remove lab/operator/reagent effects.
– Combining probe level summaries to probe set level summary: best done robustly, on many chips at once. This is to remove probe affinity effects and discordant observations (gross errors, non-responding probes, etc).
– Possibly further rounds of normalization (probe set level), as lab/cohort/batch/other effects are frequently still visible.
– Derive the relevant copy-number quantities.
– Finally, quality assessment is an important low-level task.

Preprocessing for total CN using SNP probe pairs (250K chip)
Modification by H Bengtsson of a method due to A Wirapati, developed some years ago for microsatellite genotyping; similar to the approach used by Illumina.
[Figure: probe-pair signals with genotype groups AA, AT and TT.]

Background adjustment and normalization
Outcome similar to that achieved by quantile normalization.
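Since the slide compares its outcome to quantile normalization, a minimal sketch of quantile normalization itself may help. It is not the crosstalk-based method of the lecture; the function name and the probes × arrays layout are assumptions made for illustration, and ties are broken arbitrarily.

```python
import numpy as np

def quantile_normalize(x):
    """Force every column (array) of x to share one empirical distribution.

    x: 2-D array, rows = probes, columns = arrays (assumed layout).
    """
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, axis=0)                    # rank order within each array
    sorted_x = np.take_along_axis(x, order, axis=0)  # each column sorted ascending
    target = sorted_x.mean(axis=1)                   # mean of the k-th smallest values
    out = np.empty_like(x)
    # Put the k-th target quantile where the k-th smallest value of each column was.
    np.put_along_axis(out, order, target[:, None], axis=0)
    return out

signals = np.array([[5.0, 4.0],
                    [2.0, 1.0],
                    [3.0, 6.0]])
normalized = quantile_normalize(signals)
```

After normalization both arrays contain exactly the same set of values, while the rank order of probes within each array is preserved.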

Low-level analysis problems remain unsolved; why?
– The feature size keeps decreasing, and so the number of features per chip keeps increasing.
– Fewer and fewer features are used for a given measurement, allowing more measurements to be made using a single chip.
These considerations all place more and more demands on the low-level analysis: to maintain the quality of existing measurements, and to obtain good new ones.

SNP probes can be used to estimate total copy numbers
AA:  $PM = PM_A + PM_B = 2c$
AB:  $PM = PM_A + PM_B = 2c$
BB:  $PM = PM_A + PM_B = 2c$
AAB: $PM = PM_A + PM_B = 3c$
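The relation on this slide can be sketched in a few lines. This is an idealized illustration only: it assumes no offset and no crosstalk, and the per-copy signal `c` and the function name are hypothetical.

```python
def total_copy_number(pm_a, pm_b, c):
    """Idealized total CN from allele-specific signals: (PM_A + PM_B) / c.

    c is the (assumed known) signal contributed by one DNA copy.
    """
    if c <= 0:
        raise ValueError("c must be positive")
    return (pm_a + pm_b) / c

c = 100.0                                         # hypothetical signal per copy
cn_aa = total_copy_number(2 * c, 0.0, c)          # genotype AA
cn_ab = total_copy_number(c, c, c)                # genotype AB
cn_aab = total_copy_number(2 * c, c, c)           # genotype AAB
```

The genotype changes how the signal is split between the two alleles, but only the sum enters the total copy number.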

SNP probe tiling strategy: central probe quartet (SNP at position 0)
Target: TAGCCATCGGTA N GTACTCAATGAT   (N = SNP, A / G)
PM A:   ATCGGTAGCCAT T CATGAGTTACTA
MM A:   ATCGGTAGCCAT A CATGAGTTACTA
PM B:   ATCGGTAGCCAT C CATGAGTTACTA
MM B:   ATCGGTAGCCAT G CATGAGTTACTA

SNP probe tiling strategy: +4 offset probe quartet (probe centre at position +4 from the SNP)
Target: TAGCCATCGGTA N GTA C TCAATGATCAGCT   (N = SNP, A / G)
PM A:   GTAGCCAT T CAT G AGTTACTAGTCG
MM A:   GTAGCCAT T CAT C AGTTACTAGTCG
PM B:   GTAGCCAT C CAT G AGTTACTAGTCG
MM B:   GTAGCCAT C CAT C AGTTACTAGTCG
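The tiling strategy can be illustrated with a short sketch that derives a probe quartet from the target sequence. It assumes, as the sequences on these slides suggest, that a PM probe is the base-wise complement of a 25-base window of the target and that the MM probe differs from its PM only by the complementary base at the central position. The function and variable names are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def complement(seq):
    """Base-wise complement, written in the same orientation as the slides."""
    return "".join(COMPLEMENT[base] for base in seq)

def probe_quartet(left, right, alleles=("A", "G"), offset=0, length=25):
    """Return PM/MM probes for alleles A and B, centred `offset` bases from the SNP."""
    half = length // 2
    center = len(left) + offset          # index of the probe's central base
    start, stop = center - half, center + half + 1
    if start < 0 or stop > len(left) + 1 + len(right):
        raise ValueError("flanking sequence too short for this offset")
    quartet = {}
    for label, base in zip("AB", alleles):
        target = left + base + right     # target strand carrying this allele
        pm = complement(target[start:stop])
        # Mismatch probe: complementary base at the central position only.
        mm = pm[:half] + COMPLEMENT[pm[half]] + pm[half + 1:]
        quartet["PM_" + label] = pm
        quartet["MM_" + label] = mm
    return quartet

left_flank = "TAGCCATCGGTA"
right_flank = "GTACTCAATGATCAGCT"
central = probe_quartet(left_flank, right_flank, offset=0)
shifted = probe_quartet(left_flank, right_flank, offset=4)
```

With offset 0 the mismatch falls on the SNP itself; with offset +4 the allele-specific base and the mismatch base sit at different positions within the probe.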

SNP for identifying copy number variations
– Using SNP chips to identify change in total copy number (i.e. CN ≠ 2)
– Outline a new method (CRMA)
– Evaluate and compare it with other methods
– Make some closing remarks on further issues

Copy-number estimation using Robust Multichip Analysis (CRMA)
Preprocessing (probe signals): allelic crosstalk (or quantile)
Total CN: $PM = PM_A + PM_B$
Summarization (SNP signals $\theta$): log-additive, PM only
Post-processing: fragment-length (GC-content)
Raw total CNs: $M_{ij} = \log_2(\theta_{ij} / \theta_{Rj})$, with R = reference, chip $i$, SNP (probe set) $j$
A few details are passed over. Ask me later if you care about them.

Crosstalk between alleles adds significant artifacts to signals
Cross-hybridization:
Allele A: TCGGTAAGTACTC
Allele B: TCGGTATGTACTC
AA: $PM_A \gg PM_B$
AB: $PM_A \approx PM_B$
BB: $PM_A \ll PM_B$

There are six possible allele pairs
Nucleotides: {A, C, G, T}
Ordered pairs:
– (A,C), (A,G), (A,T), (C,G), (C,T), (G,T)
Because different nucleotides bind differently, the crosstalk from A to C might be very different from A to T.
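The count of six follows from choosing 2 of the 4 nucleotides. A short check using the standard library:

```python
from itertools import combinations

NUCLEOTIDES = "ACGT"

# Every unordered pair of distinct nucleotides, each listed in alphabetical order.
allele_pairs = list(combinations(NUCLEOTIDES, 2))
```

Crosstalk would then be handled separately for each of these six groups of SNPs.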

Crosstalk between alleles is easy to spot
Example: data from one array. Probe pairs ($PM_A$, $PM_B$) for nucleotide pair (A,T).
[Figure: scatter plot of $PM_B$ against $PM_A$, showing the AA, AB and BB groups and a common offset.]

Crosstalk between alleles can be estimated and corrected for
What is done:
– Offset is removed from SNPs and CN units.
– Crosstalk is removed from SNPs.
[Figure: scatter plot of $PM_B$ against $PM_A$ after correction, showing the AA, AB and BB groups with no offset.]
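One way to picture the correction step is as undoing a linear mixing of the two allele signals. The sketch below assumes the model observed = offset + W · true, with the offset vector and the 2×2 crosstalk matrix W already known; how CRMA estimates them from the data is not shown here, and all names and numbers are hypothetical.

```python
import numpy as np

def correct_crosstalk(pm, offset, crosstalk):
    """Remove offset and allelic crosstalk from probe-pair signals.

    pm:        (n, 2) array of observed (PM_A, PM_B) pairs
    offset:    length-2 background offset (assumed already estimated)
    crosstalk: 2x2 mixing matrix W, with observed = offset + W @ true
    """
    pm = np.asarray(pm, dtype=float)
    offset = np.asarray(offset, dtype=float)
    # Solve W @ true = observed - offset for every probe pair at once.
    return np.linalg.solve(crosstalk, (pm - offset).T).T

# Hypothetical parameters for one nucleotide pair.
W = np.array([[1.0, 0.2],
              [0.3, 1.0]])
offset = np.array([100.0, 120.0])
true_signal = np.array([[2000.0, 0.0],      # AA
                        [1000.0, 1000.0],   # AB
                        [0.0, 2000.0]])     # BB
observed = offset + true_signal @ W.T
corrected = correct_crosstalk(observed, offset, W)
```

In the corrected signals the AA and BB groups fall on the axes and the AB group on the diagonal, which matches the "no offset" picture described on the slide.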

Copy-number estimation using Robust Multichip Analysis (CRMA)
Preprocessing (probe signals): allelic crosstalk (or quantile). Already briefly described.

Copy-number estimation using Robust Multichip Analysis (CRMA)
Total CN: $PM = PM_A + PM_B$. That’s it!

Copy-number estimation using Robust Multichip Analysis (CRMA)
Summarization (SNP signals $\theta$): log-additive, PM only
$\log_2(PM_{ijk}) = \log_2\theta_{ij} + \log_2\phi_{jk} + \varepsilon_{ijk}$
Fit using rlm.
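A minimal sketch of fitting the log-additive model for a single SNP may make the summarization step more concrete. The slide names rlm (robust linear model fitting); the sketch below does not call rlm but swaps in median polish, a different robust fit of the same row-plus-column structure. The constraint that the probe effects have median zero, the function name and the test values are assumptions.

```python
import numpy as np

def median_polish(log_pm, n_iter=10):
    """Robustly fit log_pm[i, k] ~ overall + chip[i] + probe[k] for one SNP.

    log_pm: 2-D array of log2 PM signals, rows = chips i, columns = probes k.
    Returns (log2_theta, log2_phi): chip-level signals and probe affinities.
    """
    resid = np.array(log_pm, dtype=float)
    chip = np.zeros(resid.shape[0])
    probe = np.zeros(resid.shape[1])
    overall = 0.0
    for _ in range(n_iter):
        row_med = np.median(resid, axis=1)   # sweep out chip effects
        resid -= row_med[:, None]
        chip += row_med
        shift = np.median(probe)
        probe -= shift
        overall += shift
        col_med = np.median(resid, axis=0)   # sweep out probe effects
        resid -= col_med[None, :]
        probe += col_med
        shift = np.median(chip)
        chip -= shift
        overall += shift
    return overall + chip, probe

# Three chips, five probes, noise-free except for one gross outlier.
log2_theta_true = np.array([10.0, 11.0, 12.0])
log2_phi_true = np.array([-1.0, 0.0, 1.0, 0.5, -0.5])
log_pm = log2_theta_true[:, None] + log2_phi_true[None, :]
log_pm[0, 0] += 5.0                          # discordant observation
log2_theta, log2_phi = median_polish(log_pm)
```

The single outlier ends up in the residuals rather than distorting the chip-level estimate, which is the property that motivates a robust fit for this step.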

Copy-number estimation using Robust Multichip Analysis (CRMA)
Post-processing: fragment-length (GC-content). Example: 100K.
Longer fragments get less well amplified by PCR and so give weaker SNP signals.

Copy-number estimation using Robust Multichip Analysis (CRMA)
Post-processing: fragment-length (GC-content). Example: 500K.
Longer fragments get less well amplified by PCR and so give weaker SNP signals.
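The idea of removing a fragment-length effect can be sketched as fitting a smooth trend of log signal against fragment length on one array and subtracting it. The quadratic polynomial used here is an assumption chosen for brevity, not the smoother used by CRMA, and the names and simulated values are hypothetical.

```python
import numpy as np

def normalize_fragment_length(log_theta, fragment_length, degree=2):
    """Remove a smooth fragment-length trend from one array's log2 SNP signals."""
    log_theta = np.asarray(log_theta, dtype=float)
    length_kb = np.asarray(fragment_length, dtype=float) / 1000.0  # rescale for a stable fit
    coeffs = np.polyfit(length_kb, log_theta, degree)
    trend = np.polyval(coeffs, length_kb)
    # Subtract the trend but keep the array's overall signal level.
    return log_theta - trend + trend.mean()

# Simulated array: signal drops linearly with fragment length (in bp).
lengths = np.linspace(200.0, 2000.0, 50)
log_theta = 12.0 - 0.001 * lengths
corrected = normalize_fragment_length(log_theta, lengths)
```

After correction the simulated signals no longer depend on fragment length, while their mean level is unchanged.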

Copy-number estimation using Robust Multichip Analysis (CRMA)
Raw total CNs: $M_{ij} = \log_2(\theta_{ij} / \theta_{Rj})$
Care required with the number and nature of reference samples used.
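The raw copy-number formula can be sketched as follows. Taking the reference $\theta_{Rj}$ as the per-SNP median over a chosen set of reference samples is an assumption made for this illustration; the function name and the numbers are hypothetical.

```python
import numpy as np

def raw_log_ratios(theta, reference_rows=None):
    """Compute M[i, j] = log2(theta[i, j] / theta_R[j]).

    theta:          2-D array, rows = chips i, columns = SNPs j
    reference_rows: indices of the chips forming the reference (default: all chips)
    """
    theta = np.asarray(theta, dtype=float)
    reference = theta if reference_rows is None else theta[reference_rows]
    theta_ref = np.median(reference, axis=0)   # assumed choice of reference signal
    return np.log2(theta / theta_ref)

theta = np.array([[100.0, 200.0],    # reference sample
                  [100.0, 200.0],    # reference sample
                  [150.0, 100.0]])   # test sample: gain at SNP 0, loss at SNP 1
M = raw_log_ratios(theta, reference_rows=[0, 1])
```

A sample with three copies where the reference has two gives a log ratio of about 0.58, and one copy gives -1. If the reference set itself carries copy-number changes at a SNP, every log ratio at that SNP is shifted, which is why the choice of reference samples matters.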

Comparison of 4 methods
Methods, in column order: CRMA; dChip (Li & Wong 2001); CNAG* (Nannya et al 2005); CNAT v4 (Affymetrix 2006)
Preprocessing (probe signals): allelic crosstalk (quantile); quantile; scale; quantile
Total CN: $PM = PM_A + PM_B$; $PM = PM_A + PM_B$ and $MM = MM_A + MM_B$; $PM = PM_A + PM_B$; “log-additive” PM-only
Summarization (SNP signals $\theta$): log-additive, PM only; multiplicative, PM-MM; $\theta = \theta_A + \theta_B$; $\theta = \theta_A + \theta_B$
Post-processing: fragment-length (GC-content); fragment-length (GC-content); fragment-length (GC-content)
Raw total CNs: $M_{ij} = \log_2(\theta_{ij} / \theta_{Rj})$

Further bioinformatic issues
– Estimating copy number: needs calibration data
– Segmentation (of chromosomes into constant copy number regions): an HMM-like algorithm
– Analyzing family CN data: a different HMM
– Incorporating non-polymorphic probes: independent HMM observations to be weighted and combined
– Dealing with mixed normal-abnormal samples
– Utilizing poor quality DNA samples
– Estimating allele-specific copy number
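To give a flavour of the segmentation issue listed above, here is a small sketch of HMM-based segmentation using the Viterbi algorithm. It is a generic three-state HMM (loss, normal, gain) with Gaussian emissions, not the specific algorithm referred to in the lecture; the state means, standard deviation and stay probability are hypothetical values.

```python
import numpy as np

def viterbi_segment(m, means=(-1.0, 0.0, 0.58), sd=0.2, stay=0.9):
    """Most likely state path (0 = loss, 1 = normal, 2 = gain) for log ratios m."""
    m = np.asarray(m, dtype=float)
    means = np.asarray(means, dtype=float)
    n_states = len(means)
    # Transition matrix: probability `stay` on the diagonal, the rest shared equally.
    log_trans = np.full((n_states, n_states), np.log((1.0 - stay) / (n_states - 1)))
    np.fill_diagonal(log_trans, np.log(stay))
    # Gaussian log-likelihoods, up to a constant shared by all states.
    log_emit = -0.5 * ((m[:, None] - means[None, :]) / sd) ** 2
    score = np.log(1.0 / n_states) + log_emit[0]
    back = np.zeros((len(m), n_states), dtype=int)
    for t in range(1, len(m)):
        cand = score[:, None] + log_trans          # cand[i, j]: move from state i to j
        back[t] = np.argmax(cand, axis=0)          # best predecessor of each state
        score = cand[back[t], np.arange(n_states)] + log_emit[t]
    path = np.empty(len(m), dtype=int)
    path[-1] = np.argmax(score)
    for t in range(len(m) - 1, 0, -1):             # trace the best path backwards
        path[t - 1] = back[t, path[t]]
    return path

log_ratios = [0.0] * 5 + [0.58] * 5 + [0.0] * 5    # a gain flanked by normal regions
states = viterbi_segment(log_ratios)
```

Runs of equal states in the returned path correspond to regions of constant copy number along the chromosome.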