Published by Deborah Small. Modified over 8 years ago.
1
Copy Number Analysis in the Cancer Genome Using SNP Arrays
Qunyuan Zhang, Aldi Kraja
Division of Statistical Genomics, Department of Genetics & Center for Genome Sciences, Washington University School of Medicine
Statistical Genetics Forum, 02-12-2007
3
What is Copy Number?
Gene Copy Number: The gene copy number (also "copy number variants" or CNVs) is the number of copies of a particular gene in the genotype of an individual. Recent evidence shows that the gene copy number can be elevated in cancer cells... (from Wikipedia, www.wikipedia.org)
DNA Copy Number: A Copy Number Variant (CNV) represents a copy number change involving a DNA fragment that is ~1 kilobase (kb) or larger. (from Nature Reviews Genetics, Feuk et al. 2006)
Chromosomal Copy Number: refers to DNA copy number in most publications.
4
Why Study Copy Number?
“Chromosomal copy number alterations can lead to activation of oncogenes and inactivation of tumor suppressor genes (TSGs) in human cancers. … identification of cancer-specific copy number alterations will not only provide new insight into understanding the molecular basis of tumorigenesis but will also facilitate the discovery of new TSGs and oncogenes.”
5
DNA Copy Number Changes in Tumor Cells
- Homologous repeats
- Segmental duplications
- Chromosomal rearrangements
- Duplicative transpositions
- Non-allelic recombinations
- …
[Figure: a normal cell (CN=2) compared with tumor cells showing deletion (CN=0, CN=1), no change (CN=2), and amplification (CN=3, CN=4)]
6
Why Use SNP Arrays?
CGH Array (CGH: comparative genomic hybridization)
“Array-based CGH makes it possible to scan the genome for copy number with high resolution by hybridizing to arrayed genomic DNA or cDNA clones. … However, currently available array CGH methods cannot simultaneously detect chromosomal loss of heterozygosity (LOH).”
SNP Array
“… to combine the detection of cancer copy number with cancer-specific LOH in the same experiments, we have developed an analytical method to detect DNA copy number changes by hybridization of representations of genomic DNA to commercially available single nucleotide polymorphism (SNP) arrays.”
SNP arrays simultaneously detect DNA copy number changes and genotype changes (LOH) in tumor cells.
7
Materials & Methods
Samples:
- 5 samples for validation, with known copy numbers of chromosome X (1, 2, 3, 4, 5 copies of chrom. X)
- 2 diploid cell lines containing cytogenetically mapped partial or whole-chromosome copy number gains or losses
- 18 lung and breast cancer cell lines
- 15 normal blood control cell lines
Array: Affymetrix XbaI mapping array 130 (10,043 SNPs)
Procedure:
- Chip scanning and image processing by MAS 5.0
- Intensity normalization and summarization
- Raw/observed copy numbers of cancer samples
- Segmentation and copy number estimation (Hidden Markov Model, HMM)
8
Normalization & Summarization
Normalization (reducing technical variation between chips, making intensities from different chips comparable)
- Baseline array method
Summarization (combining the multiple probe intensities for each SNP to produce a summarized signal value for each SNP)
- Perfect match: pm = pmA + pmB
- Mismatch: mm = mmA + mmB
- Model-based summarization: pm/mm difference multiplicative model (Li & Wong, 2001)
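The pm/mm difference multiplicative model can be illustrated with a small sketch. It assumes the commonly cited form of the Li & Wong model, pm - mm ≈ theta_i * phi_j (theta_i: summarized signal of array i, phi_j: sensitivity of probe j), with phi rescaled so that the sum of phi_j^2 equals the number of probes. The function name and the alternating least-squares fit are illustrative assumptions, not the dChip implementation, and outlier handling is omitted.

```python
import math

def fit_multiplicative_model(y, n_iter=50):
    """Fit y[i][j] ~ theta[i] * phi[j] by alternating least squares.

    y: one row per array; each row holds the pm - mm differences of the
       probes in one probe set (hypothetical input layout).
    Returns (theta, phi) with sum(phi_j ** 2) equal to the number of probes.
    """
    n_arrays, n_probes = len(y), len(y[0])
    phi = [1.0] * n_probes  # start from equal probe sensitivities
    for _ in range(n_iter):
        # Given phi, the least-squares theta_i projects row i onto phi.
        phi_ss = sum(p * p for p in phi)
        theta = [sum(y[i][j] * phi[j] for j in range(n_probes)) / phi_ss
                 for i in range(n_arrays)]
        # Given theta, the least-squares phi_j projects column j onto theta.
        theta_ss = sum(t * t for t in theta)
        phi = [sum(y[i][j] * theta[i] for i in range(n_arrays)) / theta_ss
               for j in range(n_probes)]
        # Rescale phi so that sum(phi^2) equals the number of probes.
        scale = math.sqrt(n_probes / sum(p * p for p in phi))
        phi = [p * scale for p in phi]
    # Final theta update so that theta matches the rescaled phi.
    phi_ss = sum(p * p for p in phi)
    theta = [sum(y[i][j] * phi[j] for j in range(n_probes)) / phi_ss
             for i in range(n_arrays)]
    return theta, phi
```

Here each entry of y would be pm - mm for one probe, with pm = pmA + pmB and mm = mmA + mmB as defined on the slide; theta is the summarized signal passed on to the copy number step.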
9
Observed/Raw Copy Number Data
For each SNP of each cancer sample:
Observed CN = (observed signal / mean signal of two-copy normal samples) x 2
[Figure: log2-transformed intensities and raw CNs. Black: normal; red: tumor; green: tumor/normal]
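The raw copy number ratio can be written out as a short sketch. The function names and the example signal values are hypothetical; the only thing taken from the slide is the ratio of the tumor signal to the mean two-copy normal signal, multiplied by 2.

```python
import math

def observed_copy_number(tumor_signal, normal_signals):
    """Raw CN of one SNP: tumor signal relative to the mean signal of
    two-copy normal samples, scaled so that a normal SNP gives CN = 2."""
    mean_normal = sum(normal_signals) / len(normal_signals)
    return 2.0 * tumor_signal / mean_normal

def log2_ratio(tumor_signal, normal_signals):
    """log2(tumor / mean normal); 0 corresponds to two copies."""
    mean_normal = sum(normal_signals) / len(normal_signals)
    return math.log2(tumor_signal / mean_normal)

# Hypothetical signals for one SNP: one tumor sample, three normal samples.
raw_cn = observed_copy_number(1500.0, [900.0, 1000.0, 1100.0])
```

A tumor signal 1.5 times the normal mean gives a raw CN of 3, i.e. one extra copy under the assumption that signal scales linearly with copy number.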
10
Segmentation & Estimation
[Figure: raw CN profile divided into segments labeled CN=2, CN=1, CN=4, CN=3]
11
CN Estimation: Hidden Markov Model (HMM)
Software: CNAT (www.affymetrix.com); dChip (www.dchip.org); CNAG (www.genome.umin.jp)
[Figure: chain of SNPs … SNP_i, SNP_i+1, SNP_i+2, SNP_i+3, SNP_i+4 …, each with a hidden status (unknown CN) and an observed status (observed/raw CN)]
CN estimation: finding the sequence of CN values that maximizes the likelihood of the observed raw CN.
Algorithm: Viterbi algorithm
The following information/assumptions are needed:
- Background probabilities: overall probabilities of possible CN values. P(CN=x); x=0,1,2,3,…,n (usually n<10)
- Transition probabilities: probabilities of CN values of each SNP conditional on the previous one. P(CN_(i+1)=x | CN_i=y); x=0,1,2,3,…,n; y=0,1,2,3,…,n
- Emission probabilities: probabilities of observed raw CN values of each SNP conditional on the hidden/unknown/true CN status. P(observed CN | CN=y); y=0,1,2,3,…,n
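A minimal sketch of the Viterbi step may help make the three ingredients concrete. It works in log space and takes the background, transition, and emission probabilities as inputs. The function names, the Gaussian emission, and the toy numbers are assumptions for illustration only; they are not the settings used by CNAT, dChip, or CNAG.

```python
import math

def viterbi(obs, states, log_start, log_trans, log_emit):
    """Most probable hidden CN sequence for the observed raw CNs.

    log_start[s]    : log background probability of state s
    log_trans[p][s] : log P(CN_next = s | CN_prev = p)
    log_emit(s, x)  : log P(observed x | CN = s)
    """
    # score[s]: best log-probability of any path that ends in state s
    score = {s: log_start[s] + log_emit(s, obs[0]) for s in states}
    back = []  # one dict of back-pointers per SNP after the first
    for x in obs[1:]:
        prev_score, score, pointers = score, {}, {}
        for s in states:
            best = max(states, key=lambda p: prev_score[p] + log_trans[p][s])
            score[s] = prev_score[best] + log_trans[best][s] + log_emit(s, x)
            pointers[s] = best
        back.append(pointers)
    # Trace back from the best final state.
    path = [max(states, key=lambda s: score[s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path

def gaussian_log_emit(state, x, sd=0.3):
    """Toy emission (assumption): raw CN ~ Normal(true CN, sd)."""
    return -0.5 * ((x - state) / sd) ** 2 - math.log(sd * math.sqrt(2 * math.pi))

# Toy setup: three CN states, uniform start, "sticky" transitions.
states = [1, 2, 3]
log_start = {s: math.log(1 / 3) for s in states}
log_trans = {p: {s: math.log(0.9 if s == p else 0.05) for s in states}
             for p in states}
raw_cn = [2.0, 2.1, 1.9, 3.0, 3.1, 2.9, 3.0]
estimated_cn = viterbi(raw_cn, states, log_start, log_trans, gaussian_log_emit)
```

The sticky transition matrix is what produces segmentation: a single noisy SNP rarely justifies paying the cost of two state changes, so the estimated CN changes only where several consecutive SNPs support it.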
12
Prior Information for HMM
Background Probabilities (B): overall probabilities of possible CN values.
  P(CN=2)=0.9
  P(CN=i)=0.1/(N-1), i=0,1,3,4,…,N; N=max CN allowed, e.g. P(CN=i)=0.01 when N=11
Transition Probabilities (T): probabilities of CN values of each SNP conditional on the previous one.
  P(CN_(i+1)=x | CN_i=y); x=0,1,2,3,…,n; y=0,1,2,3,…,n
  Based on genetic distance (Haldane map function)
  Transition matrix:
       0    1    2    3    …   n
  0    p00  p01  p02  p03  …   p0n
  1    p10  p11  p12  p13  …   p1n
  2    p20  p21  p22  p23  …   p2n
  3    p30  p31  p32  p33  …   p3n
  …
  n    pn0  pn1  pn2  pn3  …   pnn
Emission Probabilities (E): probabilities of observed raw CN values of each SNP conditional on the hidden/unknown/true CN status.
  Signal | CN ~ t distribution with df=40
Maximum likelihood (Observed CN | B, T, E); iterative
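The three priors can be sketched as small functions. Several choices here are assumptions rather than facts from the slide: the number of CN states is used as the divisor base so that the background probabilities sum to 1; the transition form (with probability 2*theta the next SNP is redrawn from the background, otherwise it keeps the previous CN) is one plausible way to use the Haldane map function; and the location-scale form of the t emission, including the `scale` value, is hypothetical.

```python
import math

def background_probs(n_states=11, p_normal=0.9):
    """P(CN=2) = p_normal; the rest is split evenly over the other states
    (states are CN = 0 .. n_states - 1; an assumption)."""
    other = (1.0 - p_normal) / (n_states - 1)
    return [p_normal if cn == 2 else other for cn in range(n_states)]

def haldane_theta(d_morgans):
    """Haldane map function: recombination fraction for genetic distance d."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_morgans))

def transition_matrix(background, d_morgans):
    """Assumed form: row y, column x holds P(CN_next = x | CN_prev = y)."""
    restart = 2.0 * haldane_theta(d_morgans)
    n = len(background)
    return [[restart * background[x] + (1.0 - restart) * (1.0 if x == y else 0.0)
             for x in range(n)] for y in range(n)]

def t_log_density(z, df=40):
    """Log density of the standard Student t distribution."""
    return (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
            - 0.5 * math.log(df * math.pi)
            - (df + 1) / 2 * math.log1p(z * z / df))

def emission_log_prob(observed_cn, true_cn, scale=0.3, df=40):
    """log P(observed CN | true CN), t-distributed around the true CN
    (the scale parameter is a hypothetical value)."""
    z = (observed_cn - true_cn) / scale
    return t_log_density(z, df) - math.log(scale)

B = background_probs()                      # 11 states: CN = 0 .. 10
T = transition_matrix(B, d_morgans=0.001)   # two SNPs 0.1 cM apart
```

With this form, closely spaced SNPs (small genetic distance) almost always share the same CN, while widely separated SNPs are nearly independent draws from the background distribution.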
13
HMM CN estimation for the samples with known CN of Chr. X
14
Errors of HMM (1-99.2%=0.8%) “… our criteria for homozygous deletion require the presence of at least 2 SNPs that cover an area of 1 kb in addition to an inferred copy number of 0 …”
15
HMM CN estimation for the samples with known loss or gain regions
16
HMM CN estimation for cancer cell lines
17
Contamination Problem
18
Disadvantages of HMM
- No significance test
- Computationally intensive
- Individual-level analysis
19
Software
- Affymetrix Chips (www.affymetrix.com)
- Illumina Chips (www.illumina.com)
- CNAT (www.affymetrix.com)
- dChip (www.dchip.org)
- CNAG (www.genome.umin.jp)
- GenePattern (www.broad.mit.edu/cancer/software/genepattern/)
- BioConductor R Packages (www.bioconductor.org)
  - GLAD package, adaptive weights smoothing (AWS) method
  - DNAcopy package, circular binary segmentation method
20
References
- JL Freeman et al. Genome Research 2006; 16:949-961
- J Huang et al. Hum Genomics 2004; 1(4):287-299
- X Zhao et al. Cancer Research 2004; 64:3060-3071
- Y Nannya et al. Cancer Research 2005; 65:6071-6079
- … see google …
21
Genome-wide Raw CN Changes (Pair #105)
22
Genome-wide Raw CN Changes (average over ~400 pairs)
23
Raw CN Changes of Chr. 14 (average over ~400 pairs)
24
Sliding Window Analysis
[Figure: overlapping windows along the SNP sequence: Window 1, Window 2, …, Window k, …, Window N]
Each window (k) contains 30 consecutive SNPs (k, k+1, k+2, k+3, …, k+29)
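The window definition can be sketched as a running mean. The function name is hypothetical, and averaging is an assumption about what is computed per window; the slide only fixes the window membership (SNP k through SNP k+29, advancing one SNP at a time).

```python
def sliding_window_means(values, window=30):
    """Mean of each run of `window` consecutive values.

    Window k (0-based) covers values[k] .. values[k + window - 1], so a
    sequence of n values yields n - window + 1 windows.
    """
    if len(values) < window:
        return []
    total = sum(values[:window])
    means = [total / window]
    for k in range(1, len(values) - window + 1):
        # Slide by one SNP: add the entering value, drop the leaving one.
        total += values[k + window - 1] - values[k - 1]
        means.append(total / window)
    return means

# Hypothetical raw CN changes for 100 SNPs ordered along a chromosome.
raw_changes = [0.0] * 40 + [1.0] * 30 + [0.0] * 30
window_means = sliding_window_means(raw_changes)  # 71 windows
```

Averaging over 30 neighboring SNPs trades resolution for noise reduction: a gain shorter than the window is diluted, while a gain spanning the whole window shows up at full height.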
25
Genome-wide Raw Copy Number Changes (sliding window plot, averaged over ~400 pairs)
26
Sliding Window Test of Significance of CN Changes -log(p) values, based on ~ 400 pairs
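The slide does not say which test produces the p values, so the following is only one possible sketch: a one-sample t statistic on the per-pair CN changes in a window, tested against zero, with a normal approximation for the p value (reasonable for ~400 pairs) and a base-10 logarithm. The function name, the choice of test, and the log base are all assumptions.

```python
import math

def window_neg_log_p(changes):
    """Test whether the mean CN change in one window differs from 0.

    changes: one value per tumor/normal pair, e.g. the mean raw CN change
             of the SNPs in this window for that pair (hypothetical input).
             Needs at least two values that are not all identical.
    Returns (t, -log10(p)) using a two-sided normal approximation.
    """
    n = len(changes)
    mean = sum(changes) / n
    var = sum((c - mean) ** 2 for c in changes) / (n - 1)
    t = mean / math.sqrt(var / n)
    p = math.erfc(abs(t) / math.sqrt(2.0))
    p = max(p, 1e-300)  # avoid log10(0) when p underflows
    return t, -math.log10(p)
```

The sign of t separates amplification (positive mean change) from deletion (negative mean change), which is one way the significant-amplification and significant-deletion calls could be split.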
27
CN Change Frequencies in Population (Chr. 14, ~400 pairs)
Black: Freq.(CN>0)
Red: Freq.(CN>0, significant amplification at 0.01 level)
Green: Freq.(CN<0, significant deletion at 0.01 level)
28
Microarray: From Image to Copy Number
Affymetrix Mapping 250K Sty I chip: ~250K probe sets, ~250K SNPs; each probe set contains 24 probes
[Figure: tumor and normal chip images with probe-set intensities for CN=0 and CN=1 (deletion), CN=2, and CN>2 (amplification)]
More DNA copy number → more DNA hybridization → higher intensity
29
General Procedures for Copy Number Analysis
1. Finished chips
2. (scanner) Raw image data [.DAT files], with experiment info [.EXP]
3. (image processing software) Probe-level raw intensity data [.CEL files]
4. Preprocessing, using the chip description file [.CDF]: background adjustment, normalization, summarization
5. Summarized intensity data
6. Raw copy number (CN) data [log ratio of tumor/normal intensities]
7. Downstream analyses:
   - Significance test of CN changes
   - Estimation of CN
   - Smoothing and boundary determination
   - Concurrent regions among population
   - Amplification and deletion frequencies among populations
   - Association analysis