Copy Number Analysis in the Cancer Genome Using SNP Arrays Qunyuan Zhang, Aldi Kraja Division of Statistical Genomics Department of Genetics & Center for.

Presentation transcript:

Copy Number Analysis in the Cancer Genome Using SNP Arrays Qunyuan Zhang, Aldi Kraja Division of Statistical Genomics Department of Genetics & Center for Genome Sciences Washington University School of Medicine Statistical Genetics Forum

What is Copy Number? Gene Copy Number: The gene copy number (also "copy number variants" or CNVs) is the amount of copies of a particular gene in the genotype of an individual. Recent evidence shows that the gene copy number can be elevated in cancer cells... (from Wikipedia). DNA Copy Number: A Copy Number Variant (CNV) represents a copy number change involving a DNA fragment that is ~1 kilobase or larger (from Nature Reviews Genetics, Feuk et al. 2006). Chromosomal Copy Number: refers to DNA copy number in most publications.

Why Study Copy Number? “Chromosomal copy number alterations can lead to activation of oncogenes and inactivation of tumor suppressor genes (TSGs) in human cancers. … identification of cancer-specific copy number alterations will not only provide new insight into understanding the molecular basis of tumorigenesis but will also facilitate the discovery of new TSGs and oncogenes.”

DNA Copy Number Changes in Tumor Cells: homologous repeats; segmental duplications; chromosomal rearrangements; duplicative transpositions; non-allelic recombinations; … [Figure: a normal cell (CN=2) compared with tumor cells showing deletion (CN=0, CN=1) and amplification (CN=3, CN=4)]

Why Use SNP Arrays? CGH Array (CGH: comparative genomic hybridization): “Array-based CGH makes it possible to scan the genome for copy number with high resolution by hybridizing to arrayed genomic DNA or cDNA clones. … However, currently available array CGH methods cannot simultaneously detect chromosomal loss of heterozygosity (LOH).” SNP Array: “… to combine the detection of cancer copy number with cancer-specific LOH in the same experiments, we have developed an analytical method to detect DNA copy number changes by hybridization of representations of genomic DNA to commercially available single nucleotide polymorphism (SNP) arrays.” SNP arrays simultaneously detect DNA copy number changes and genotype changes (LOH) in tumor cells.

Materials & Methods Samples: 5 samples for validation, with known copy numbers of chromosome X (1, 2, 3, 4, 5 copies of chrom. X); 2 diploid cell lines containing cytogenetically mapped partial or whole-chromosome copy number gains or losses; 18 lung and breast cancer cell lines; 15 normal blood control cell lines. Array: Affymetrix XbaI mapping array 130 (10,043 SNPs). Workflow: chip scanning and image processing by MAS 5.0 → intensity normalization and summarization → raw/observed copy numbers of cancer samples → segmentation and copy number estimation (Hidden Markov Model, HMM).

Normalization & Summarization Normalization (reducing technical variation between chips, making intensities from different chips comparable): baseline array method. Summarization (combining the multiple probe intensities for each SNP to produce a summarized signal value for each SNP): Perfect match: pm = pmA + pmB; Mismatch: mm = mmA + mmB. Model-based summarization: pm/mm difference multiplicative model (Li & Wong, 2001).
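The model-based summarization step can be sketched as follows. This is a minimal illustration, assuming the usual multiplicative form of the Li & Wong model, y_ij = theta_i * phi_j + error, where y_ij is the pm/mm difference of probe j on chip i, theta_i is the summarized signal of chip i, and phi_j is a probe affinity constrained so that the sum of phi_j^2 equals the number of probes. The function name `fit_multiplicative_model`, the alternating least-squares fit, and the toy data are assumptions for illustration, not the dChip implementation.

```python
import math


def fit_multiplicative_model(diffs, n_iter=50):
    """Fit y_ij = theta_i * phi_j by alternating least squares.

    diffs[i][j] is the pm - mm difference of probe j on chip i for one SNP.
    Returns (theta, phi) with sum(phi_j ** 2) == number of probes.
    Assumes the data are not all zero.
    """
    n_chips = len(diffs)
    n_probes = len(diffs[0])
    phi = [1.0] * n_probes  # starting value already satisfies the constraint
    theta = [0.0] * n_chips
    for _ in range(n_iter):
        # Least-squares chip signals given the probe affinities.
        denom = sum(p * p for p in phi)
        theta = [sum(y * p for y, p in zip(row, phi)) / denom for row in diffs]
        # Least-squares probe affinities given the chip signals.
        denom = sum(t * t for t in theta)
        phi = [
            sum(diffs[i][j] * theta[i] for i in range(n_chips)) / denom
            for j in range(n_probes)
        ]
        # Rescale so that sum(phi_j ** 2) equals the number of probes.
        scale = math.sqrt(n_probes / sum(p * p for p in phi))
        phi = [p * scale for p in phi]
    # Final chip signals consistent with the rescaled affinities.
    denom = sum(p * p for p in phi)
    theta = [sum(y * p for y, p in zip(row, phi)) / denom for row in diffs]
    return theta, phi


# Toy data (hypothetical): 3 chips x 3 probes, exactly rank one.
true_theta = [100.0, 200.0, 300.0]
true_phi = [0.5, 1.0, 1.5]
diffs = [[t * p for p in true_phi] for t in true_theta]
theta, phi = fit_multiplicative_model(diffs)
```

The fitted theta values are the per-chip summarized signals; only their ratios between chips are identified by the data, which is why the constraint on phi is needed.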

Observed/Raw Copy Number Data For each SNP of each cancer sample: Observed CN = (observed signal / mean signal of two-copy normal samples) × 2. [Figure: log2-transformed intensities and raw CNs. Black: normal; red: tumor; green: tumor/normal]
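As a small worked example of the raw copy number calculation, the sketch below divides a tumor signal by the mean signal of the two-copy normal samples and multiplies by 2. The function names and the signal values are hypothetical.

```python
import math


def raw_copy_number(tumor_signal, normal_signals):
    """Observed CN = 2 * tumor signal / mean signal of two-copy normals."""
    normal_mean = sum(normal_signals) / len(normal_signals)
    return 2.0 * tumor_signal / normal_mean


def log2_ratio(tumor_signal, normal_signals):
    """Log2 of tumor signal over the mean normal signal (0 means two copies)."""
    normal_mean = sum(normal_signals) / len(normal_signals)
    return math.log2(tumor_signal / normal_mean)


# One SNP, summarized signals from three normal samples (made-up values).
normals = [900.0, 1000.0, 1100.0]
cn_gain = raw_copy_number(1500.0, normals)
ratio_loss = log2_ratio(500.0, normals)
```

A log2 ratio of 0 corresponds to a raw CN of 2, so the two scales carry the same information.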

Segmentation & Estimation [Figure: segments labeled CN=2, CN=1, CN=4, CN=3]

CN Estimation: Hidden Markov Model (HMM) Software: CNAT; dChip; CNAG. [Diagram: along consecutive SNPs (…, SNP_i, SNP_i+1, SNP_i+2, SNP_i+3, SNP_i+4, …), the hidden status is the unknown CN (CN=?) and the observed status is the observed/raw CN (Obs. CN).] CN estimation: finding the sequence of CN values that maximizes the likelihood of the observed raw CN. Algorithm: Viterbi algorithm. The following information/assumptions are needed. Background probabilities: overall probabilities of possible CN values, P(CN=x); x=0,1,2,3,…,n (usually n<10). Transition probabilities: probabilities of the CN value of each SNP conditional on the previous one, P(CN_i+1=x|CN_i=y); x=0,1,2,3,…,n; y=0,1,2,3,…,n. Emission probabilities: probabilities of the observed raw CN value of each SNP conditional on the hidden/unknown/true CN status, P(observed CN | CN=y); y=0,1,2,3,…,n.
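The Viterbi step can be sketched as below. This is a generic log-space Viterbi decoder, not the code used by CNAT, dChip, or CNAG. The three-state toy model, the Gaussian emission with standard deviation 0.3, and all probability values are assumptions chosen only to make the example run.

```python
import math


def viterbi(observations, states, log_start, log_trans, log_emit):
    """Return the most likely hidden state path.

    log_start[s]      : log background probability of state s
    log_trans[p][s]   : log probability of moving from state p to state s
    log_emit(obs, s)  : log probability of the observation given state s
    """
    score = [{s: log_start[s] + log_emit(observations[0], s) for s in states}]
    back = []
    for obs in observations[1:]:
        prev = score[-1]
        cur, ptr = {}, {}
        for s in states:
            best_prev = max(states, key=lambda p: prev[p] + log_trans[p][s])
            cur[s] = prev[best_prev] + log_trans[best_prev][s] + log_emit(obs, s)
            ptr[s] = best_prev
        score.append(cur)
        back.append(ptr)
    # Trace back from the best final state.
    path = [max(states, key=lambda s: score[-1][s])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return path


def log_gauss(obs, cn, sd=0.3):
    """Toy emission: raw CN is Gaussian around the true CN (an assumption)."""
    return -0.5 * ((obs - cn) / sd) ** 2 - math.log(sd * math.sqrt(2.0 * math.pi))


states = [1, 2, 3]
log_start = {1: math.log(0.05), 2: math.log(0.9), 3: math.log(0.05)}
log_trans = {
    p: {s: math.log(0.98 if p == s else 0.01) for s in states} for p in states
}
raw_cn = [2.1, 1.9, 2.0, 3.1, 2.9, 3.2, 2.0, 2.1]
path = viterbi(raw_cn, states, log_start, log_trans, log_gauss)
```

Working in log space avoids numerical underflow when the product of many small probabilities is taken over thousands of SNPs.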

Prior Information for HMM
Background Probabilities (B): overall probabilities of possible CN values. P(CN=2)=0.9; P(CN=i)=0.1/(N-1), i=0,1,3,4,…,N; N=max CN allowed. e.g. P(CN=i)=0.01 when N=11.
Transition Probabilities (T): probabilities of the CN value of each SNP conditional on the previous one. P(CN_i+1=x|CN_i=y); x=0,1,2,3,…,n; y=0,1,2,3,…,n. Genetic distance (Haldane map function).
      0    1    2    3    …   n
  0   p00  p01  p02  p03  …   p0n
  1   p10  p11  p12  p13  …   p1n
  2   p20  p21  p22  p23  …   p2n
  3   p30  p31  p32  p33  …   p3n
  …
  n   pn0  pn1  pn2  pn3  …   pnn
Emission Probabilities (E): probabilities of the observed raw CN value of each SNP conditional on the hidden/unknown/true CN status. Signal | CN ~ t distribution with df=40.
Maximum likelihood (Observed CN | B, T, E); iterative.
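The three kinds of prior information can be sketched as below. Several choices here are assumptions rather than details given on the slide: N is read as the number of CN states (so that 11 states give 0.01 for each non-2 state), the Haldane recombination fraction theta = (1 - exp(-2d)) / 2 is turned into a transition matrix by letting the chain return to the background distribution with probability 2 * theta, 1 Morgan is taken as roughly 100 Mb, and the t distribution is centered on the true CN with a hypothetical scale of 0.3.

```python
import math


def background_probs(n_states, p_normal=0.9):
    """P(CN=2) = p_normal; the remainder is split equally over other states."""
    other = (1.0 - p_normal) / (n_states - 1)
    return [p_normal if cn == 2 else other for cn in range(n_states)]


def haldane_theta(distance_morgans):
    """Haldane map function: recombination fraction for a genetic distance."""
    return 0.5 * (1.0 - math.exp(-2.0 * distance_morgans))


def transition_matrix(distance_bp, background, bp_per_morgan=1e8):
    """T[y][x] = P(CN_next = x | CN_prev = y) for two SNPs distance_bp apart.

    Assumption: with probability 2 * theta the next state is drawn from the
    background distribution; otherwise the state is unchanged.
    """
    two_theta = 2.0 * haldane_theta(distance_bp / bp_per_morgan)
    n = len(background)
    return [
        [(1.0 - two_theta) * (x == y) + two_theta * background[x] for x in range(n)]
        for y in range(n)
    ]


def log_t_emission(observed_cn, true_cn, scale=0.3, df=40):
    """Log density of a location-scale Student t centered on the true CN."""
    z = (observed_cn - true_cn) / scale
    return (
        math.lgamma((df + 1) / 2.0)
        - math.lgamma(df / 2.0)
        - 0.5 * math.log(df * math.pi)
        - math.log(scale)
        - (df + 1) / 2.0 * math.log1p(z * z / df)
    )


B = background_probs(11)
T = transition_matrix(50_000, B)
```

Under this construction, SNPs that are close together almost always share a CN state, while distant SNPs become nearly independent draws from the background distribution.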

HMM CN estimation for the samples with known CN of Chr. X

Errors of HMM (1-99.2%=0.8%) “… our criteria for homozygous deletion require the presence of at least 2 SNPs that cover an area of 1 kb in addition to an inferred copy number of 0 …”
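The quoted homozygous deletion criterion can be sketched as a filter over runs of SNPs with inferred CN 0. The reading of "cover an area of 1 kb" as a span of at least 1,000 bp between the first and last SNP of the run is an assumption, as are the function name and the toy positions.

```python
def homozygous_deletions(positions, inferred_cn, min_snps=2, min_span_bp=1000):
    """Return (start, end) positions of runs of CN=0 that pass both filters."""
    calls = []
    run_start = None
    # A trailing sentinel closes a run that reaches the last SNP.
    for i, cn in enumerate(list(inferred_cn) + [None]):
        if cn == 0:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            run_end = i - 1
            n_snps = run_end - run_start + 1
            span = positions[run_end] - positions[run_start]
            if n_snps >= min_snps and span >= min_span_bp:
                calls.append((positions[run_start], positions[run_end]))
            run_start = None
    return calls


# Toy data: three CN=0 runs; only the first satisfies both criteria.
positions = [100, 5000, 9000, 12000, 12400, 12900, 20000, 25000]
inferred_cn = [2, 0, 0, 2, 0, 0, 2, 0]
calls = homozygous_deletions(positions, inferred_cn)
```

In the toy data the second run spans only 500 bp and the third contains a single SNP, so both are rejected.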

HMM CN estimation for the samples with known loss or gain regions

HMM CN estimation for cancer cell lines

Contamination Problem

Disadvantages of HMM: no significance test; intensive computation; individual-level analysis

Software Affymetrix chips; Illumina chips; CNAT; dChip; CNAG; GenePattern; Bioconductor R packages: GLAD package (adaptive weights smoothing (AWS) method), DNAcopy package (circular binary segmentation method)

References JL Freeman et al. Genome Research 2006; 16: J Huang et al. Hum Genomics. 2004;1(4): X Zhao et al. Cancer Research 2004; 64: Y Nannya et al. Cancer Research 2005, 65: … see google …

Genome-wide Raw CN Changes (Pair #105)

Genome-wide Raw CN Changes (average over ~400 pairs )

Raw CN Changes of Chr. 14 (average over ~400 pairs )

Sliding Window Analysis [Diagram: overlapping windows along the SNP sequence: Window 1, Window 2, …, Window 10, …, Window k, …, Window N] Each window (k) contains 30 consecutive SNPs (k, k+1, k+2, k+3, …, k+29)
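A minimal sketch of the windowing, assuming the quantity of interest is the mean raw CN change over each window of 30 consecutive SNPs (the slide does not state the summary statistic). The function name is hypothetical.

```python
def sliding_window_means(values, window_size=30):
    """Mean of each window of consecutive SNPs: window k covers k .. k+window_size-1."""
    if len(values) < window_size:
        return []
    window_sum = sum(values[:window_size])
    means = [window_sum / window_size]
    for k in range(window_size, len(values)):
        # Slide by one SNP: add the new value, drop the oldest one.
        window_sum += values[k] - values[k - window_size]
        means.append(window_sum / window_size)
    return means


# 40 SNPs give 40 - 30 + 1 = 11 windows.
values = list(range(40))
means = sliding_window_means(values)
```

Updating a running sum keeps the cost linear in the number of SNPs, which matters when the same pass is repeated for hundreds of tumor/normal pairs.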

Genome-wide Raw Copy Number Changes (sliding window plot, averaged over ~400 pairs )

Sliding Window Test of Significance of CN Changes -log(p) values, based on ~ 400 pairs
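The slide does not name the test behind the -log(p) values. The sketch below is one plausible reading, stated as an assumption: for a given window, take the window-mean CN change of each tumor/normal pair, test whether the mean across pairs differs from zero, and use a normal approximation to the t statistic (reasonable for about 400 pairs). The function name and toy data are hypothetical.

```python
import math


def neg_log10_p(pair_values):
    """-log10 of a two-sided p-value for H0: mean CN change across pairs is 0.

    pair_values holds one window-level CN change per tumor/normal pair.
    Uses a normal approximation to the one-sample t statistic.
    """
    n = len(pair_values)
    mean = sum(pair_values) / n
    var = sum((x - mean) ** 2 for x in pair_values) / (n - 1)
    se = math.sqrt(var / n)
    if se == 0.0:
        return 0.0 if mean == 0.0 else float("inf")
    z = mean / se
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return -math.log10(max(p, 1e-300))  # guard against underflow to zero


# Toy data for 400 pairs: no shift versus a small consistent gain.
no_change = [0.1 * ((i % 5) - 2) for i in range(400)]
small_gain = [x + 0.05 for x in no_change]
score_null = neg_log10_p(no_change)
score_gain = neg_log10_p(small_gain)
```

With many pairs, even a small but consistent shift produces a large -log10(p), which is the motivation for testing at the population level rather than per sample.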

CN Change Frequencies in Population ( Chr.14,~400 pairs) Black: Freq.(CN>0) Red: Freq.(CN>0, significant amplification at 0.01 level) Green: Freq.(CN<0, significant deletion at 0.01 level)

Microarray: From Image to Copy Number [Figure: tumor and normal samples on the Affymetrix Mapping 250K Sty I chip (~250K probe sets, ~250K SNPs; 24 probes per probe set); regions labeled CN=0 and CN=1 (deletion), CN=2, and CN>2 (amplification)] More DNA copy number → more DNA hybridization → higher intensity

General Procedures for Copy Number Analysis Finished chips → (scanner) → raw image data [.DAT files] (experiment info [.EXP]) → (image processing software) → probe-level raw intensity data [.CEL files] → preprocessing, with the chip description file [.CDF]: background adjustment, normalization, summarization → summarized intensity data → raw copy number (CN) data [log ratio of tumor/normal intensities] → significance test of CN changes; estimation of CN; smoothing and boundary determination → concurrent regions among population; amplification and deletion frequencies among populations; association analysis