Genome-wide Copy Number Analysis Qunyuan Zhang,Ph.D. Division of Statistical Genomics Department of Genetics & Center for Genome Sciences Washington University.

Presentation transcript:

Genome-wide Copy Number Analysis Qunyuan Zhang,Ph.D. Division of Statistical Genomics Department of Genetics & Center for Genome Sciences Washington University School of Medicine – 2006 Course: M Computational Statistical Genetics

Four Questions
- What is Copy Number?
- What can Copy Number tell us?
- How to measure/quantify Copy Number?
- How to analyze Copy Number?

What is Copy Number?
Gene Copy Number: The gene copy number (also "copy number variants" or CNVs) is the number of copies of a particular gene in the genotype of an individual. Recent evidence shows that the gene copy number can be elevated in cancer cells. For instance, the EGFR copy number can be higher than normal in non-small cell lung cancer. … Elevating the gene copy number of a particular gene can increase the expression of the protein that it encodes. (From Wikipedia)

DNA Copy Number
A Copy Number Variant (CNV) represents a copy number change involving a DNA fragment that is ~1 kilobase or larger. (From Nature Reviews Genetics, Feuk et al.)
DNA Copy Number ≠ DNA Tandem Repeat Number (e.g. microsatellites, <10 bases)
DNA Copy Number ≠ RNA Copy Number; RNA Copy Number = Gene Expression Level (DNA → transcription → mRNA)
Copy Number is the number of copies of a particular fragment of a nucleic acid molecular chain. In most publications it refers to DNA Copy Number.

What can Copy Number tell us?
Genetic Diversity/Polymorphisms
- restriction fragment length polymorphism (RFLP)
- amplified fragment length polymorphism (AFLP)
- random amplification of polymorphic DNA (RAPD)
- variable number of tandem repeat (VNTR; e.g., mini- and microsatellite)
- single nucleotide polymorphism (SNP)
- presence/absence of transposable elements
- …
- structural alterations (e.g., deletions, duplications, inversions …)
- DNA copy number variant (CNV)
Association with phenotypes/diseases; genes/genetic factors

Genetic Alterations in Tumor Cells (DNA Copy Number Changes)
- Homologous repeats
- Segmental duplications
- Chromosomal rearrangements
- Duplicative transpositions
- Non-allelic recombinations
- ……
[Figure: a normal cell (CN=2) compared with tumor cells showing deletion (CN=0, CN=1), unchanged regions (CN=2), and amplification (CN=3, CN=4).]

How to measure/quantify Copy Number?
Quantitative Polymerase Chain Reaction (Q-PCR): DNA amplification (dNTPs, primers, Taq polymerase, fluorescent dye), one fragment at a time
- lower CN → PCR amplification → less DNA → low fluorescent intensity
- higher CN → PCR amplification → more DNA → high fluorescent intensity
Microarray: DNA hybridization (dNTPs, primers, Taq polymerase, fluorescent dye), multiple different fragments in a mixed pool
- lower CN → PCR amplification → less DNA → hybridization to arrayed probes → low intensities
- higher CN → PCR amplification → more DNA → hybridization to arrayed probes → high intensities

Microarray: From Image to Copy Number
Affymetrix Mapping 250K Sty-I chip: ~250K probe sets, ~250K SNPs; probe set (24 probes)
Higher DNA copy number → more DNA hybridization → higher intensity
[Figure: tumor and normal chip images; probe-set intensities illustrating deletion (CN=0, CN=1), CN=2, and amplification (CN>2).]

How to Analyze Copy Number? A Real Example
~400 cancer patients; normal tissue & tumor tissue (~400 pairs, ~800 DNA samples)
→ Affymetrix 250K Sty-I Human Mapping SNP Array
→ DNA hybridization signals (intensities on chip images)
→ Genotype calling → SNP genotypes → LOH analysis (genotypic changes)
→ DNA copy number analysis (DNA copy number changes)

General Procedures for Copy Number Analysis
- Finished chips → (scanner) → raw image data [.DAT files] (experiment info [.EXP])
- Raw image data → (image processing software) → probe-level raw intensity data [.CEL files]
- Preprocessing (with the chip description file [.CDF]): background adjustment, normalization, summarization → summarized intensity data
- Raw copy number (CN) data [log ratio of tumor/normal intensities]
- Significance test of CN changes; estimation of CN; smoothing and boundary determination
- Concurrent regions among population; amplification and deletion frequencies among populations; association analysis

Background Adjustment/Correction
Reduces unevenness of a single chip; makes intensities of different positions on a chip comparable.
Corrected Intensity (S’) = Observed Intensity (S) - Background Intensity (B)
For each region i, B(i) = mean of the lowest 2% of intensities in region i (Affymetrix MAS 5.0)
[Figure: chip image before and after adjustment.]
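The region-wise subtraction above can be sketched in a few lines of Python. This is only an illustration of the formula S’ = S - B(i): the function name `background_adjust`, the rectangular region layout, and the default grid size are assumptions made here, and the real MAS 5.0 procedure additionally smooths the background between regions, which this sketch omits.

```python
def background_adjust(grid, n_rows=2, n_cols=2, fraction=0.02):
    """Subtract a per-region background from a 2-D grid of intensities.

    The chip is split into n_rows x n_cols rectangular regions (a hypothetical
    layout). For each region i, B(i) is the mean of the lowest `fraction` of
    its intensities, and every cell in the region becomes S' = S - B(i).
    """
    height, width = len(grid), len(grid[0])
    corrected = [list(row) for row in grid]
    for r in range(n_rows):
        for c in range(n_cols):
            rows = range(r * height // n_rows, (r + 1) * height // n_rows)
            cols = range(c * width // n_cols, (c + 1) * width // n_cols)
            values = sorted(grid[i][j] for i in rows for j in cols)
            if not values:
                continue  # region is empty when the grid is smaller than the layout
            k = max(1, int(len(values) * fraction))  # at least one cell
            background = sum(values[:k]) / k
            for i in rows:
                for j in cols:
                    corrected[i][j] = grid[i][j] - background
    return corrected
```

With very small regions the lowest 2% rounds down to zero cells, so the sketch falls back to the single lowest intensity.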

Background Adjustment/Correction
Eliminates non-specific hybridization signal; obtains accurate intensity values for specific hybridization.
Methods: PM only, PM-MM, Ideal MM, etc. (PM = perfect match probe, MM = mismatch probe)
[Figure: quartet probe set; sense or antisense strands; 25 oligonucleotide probes.]

Normalization
Reduces technical variation between chips; makes intensities from different chips comparable.
Methods: Base Line Array (linear); Quantile Normalization; Contrast Normalization; etc.
$S' = \dfrac{S - \text{mean}(S)}{\text{SD}(S)}, \qquad S' \sim N(0, 1)$
[Figure: intensity distributions before and after normalization.]
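Two of the ideas on this slide can be sketched with the Python standard library: the standardization S' = (S - mean) / SD, and a bare-bones quantile normalization. The function names are illustrative, each chip is assumed to be a plain list of intensities of equal length, and ties are broken by original order rather than averaged as production implementations often do.

```python
import statistics

def standardize(intensities):
    """Center and scale one chip: S' = (S - mean(S)) / SD(S)."""
    mean = statistics.fmean(intensities)
    sd = statistics.stdev(intensities)
    return [(s - mean) / sd for s in intensities]

def quantile_normalize(chips):
    """Give every chip the same empirical distribution.

    The reference distribution is the mean of the sorted intensities across
    chips; each value is replaced by the reference value of its rank.
    """
    n = len(chips[0])
    sorted_chips = [sorted(chip) for chip in chips]
    reference = [statistics.fmean(column) for column in zip(*sorted_chips)]
    normalized = []
    for chip in chips:
        order = sorted(range(n), key=lambda i: chip[i])  # indices by rank
        out = [0.0] * n
        for rank, index in enumerate(order):
            out[index] = reference[rank]
        normalized.append(out)
    return normalized
```

Standardization only matches location and scale between chips, whereas quantile normalization forces the whole distribution to agree, which is why the two are listed as separate options.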

Summarization
Combines the multiple probe intensities for each probe set to produce a summarized value for subsequent analyses.
Average methods: PM only or PM-MM, allele specific or non-specific
Model-based method: Li & Wong, 2001 (Gene Expression Index)
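As a rough illustration of the average methods only (the model-based Li & Wong approach is not shown), a probe set can be summarized as below. The function name and the list-based inputs are assumptions made for this sketch.

```python
import statistics

def summarize_probe_set(pm, mm=None):
    """Average-based summary of one probe set.

    pm: perfect-match probe intensities for the probe set.
    mm: optional mismatch intensities, paired with pm by position.
    Returns mean(PM) when mm is None, otherwise mean(PM - MM).
    """
    if mm is None:
        signals = list(pm)
    else:
        signals = [p - m for p, m in zip(pm, mm)]
    return statistics.fmean(signals)
```

An allele-specific summary would call the same function separately on the probes for each allele.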

Raw Copy Number Data
S: summarized raw intensity
S': log transformation, $S' = \log_2(S)$
Raw CN: log ratio of tumor/normal intensities, $CN = S'_{\text{tumor}} - S'_{\text{normal}} = \log_2(S_{\text{tumor}} / S_{\text{normal}})$
Pair design: $S_{\text{normal}}$ = S of the paired normal sample
Group design: $S_{\text{normal}}$ = average S of the group of normal samples
[Figure: distributions of S before log transformation, log(S) after log transformation, and raw CN.]
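The pair and group designs differ only in how the normal reference is chosen, which a short Python sketch makes concrete. The function names are illustrative, and the inputs are assumed to be lists of summarized intensities aligned by SNP.

```python
import math

def raw_cn_pair(s_tumor, s_normal):
    """Pair design: raw CN = log2(S_tumor / S_normal) for each SNP."""
    return [math.log2(t / n) for t, n in zip(s_tumor, s_normal)]

def raw_cn_group(s_tumor, normal_group):
    """Group design: the reference is the average S over all normal samples.

    normal_group: list of normal samples, each a list of intensities.
    The average is taken on the raw scale, before the log transformation.
    """
    reference = [sum(column) / len(column) for column in zip(*normal_group)]
    return raw_cn_pair(s_tumor, reference)
```

A raw CN of 0 means no change relative to the reference; positive values suggest amplification and negative values suggest deletion.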

Individual Level Analysis
Analysis for each individual sample (or each sample pair)
- Significance test of CN amplification and deletion
- Boundary finding (smoothing and segmentation)
- CN estimation

Intensities and Raw CNs, Chr. 1 (Pair #101) Black: Normal, Red: Tumor, Green: Tumor - Normal

Significance Test for Copy Number Changes: -log(p) values, Chr. 1, Pair #101
Window-based t test; window size = 0.5 Mbp (~30 SNPs); N = number of SNPs in the window
$t = \dfrac{\text{mean CN of window}}{\text{SD of window}} \times \sqrt{N} \sim t(df = N - 1)$
[Figure: -log(p) plotted against window position (Mbp).]
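The window statistic can be sketched as follows. The helper names and the grouping of SNPs into fixed 0.5 Mbp bins by integer division are assumptions of this sketch. Only the t statistic and its degrees of freedom are returned; turning them into a p-value needs a t-distribution function, for example `scipy.stats.t.sf`, which is left out to keep the example dependency-free.

```python
import math
import statistics

def window_t(raw_cn):
    """One-sample t statistic for H0: mean raw CN = 0 within one window.

    Returns (t, df) with t = mean / SD * sqrt(N) and df = N - 1.
    """
    n = len(raw_cn)
    mean = statistics.fmean(raw_cn)
    sd = statistics.stdev(raw_cn)
    return mean / sd * math.sqrt(n), n - 1

def t_by_window(positions_bp, raw_cn, window_bp=500_000):
    """Group SNPs into fixed windows by position and test each window.

    Returns {window_start_bp: (t, df)}; windows with fewer than two SNPs
    or zero variance are skipped because t is undefined there.
    """
    windows = {}
    for position, value in zip(positions_bp, raw_cn):
        windows.setdefault(position // window_bp, []).append(value)
    results = {}
    for index, values in sorted(windows.items()):
        if len(values) >= 2 and statistics.stdev(values) > 0:
            results[index * window_bp] = window_t(values)
    return results
```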

Genome-wide Raw CN Changes (Pair #105)

Genome-wide Window-based Test of CN Changes (Pair #105): -log(p)

Segmentation
Bioconductor R packages:
- GLAD package: adaptive weights smoothing (AWS) method
- DNAcopy package: circular binary segmentation method
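To give a feel for what boundary finding does, here is a deliberately simplified sketch of plain recursive binary segmentation in Python. It is not the circular binary segmentation implemented in DNAcopy, nor the AWS method in GLAD: it looks for a single change point per step and uses a fixed, hypothetical `threshold` instead of a permutation-based significance test.

```python
import math

def best_split(values):
    """Find the split index with the largest scaled difference in means.

    Returns (index, score); index is None when no split has a positive score.
    """
    n = len(values)
    total = sum(values)
    best_index, best_score = None, 0.0
    left_sum = 0.0
    for i in range(1, n):
        left_sum += values[i - 1]
        diff = left_sum / i - (total - left_sum) / (n - i)
        score = abs(diff) * math.sqrt(i * (n - i) / n)
        if score > best_score:
            best_index, best_score = i, score
    return best_index, best_score

def segment(values, threshold, start=0):
    """Recursively split raw CN values into segments of roughly constant mean.

    Returns a list of (start, end, mean) with end exclusive.
    """
    index, score = best_split(values) if len(values) >= 2 else (None, 0.0)
    if index is None or score < threshold:
        return [(start, start + len(values), sum(values) / len(values))]
    left = segment(values[:index], threshold, start)
    right = segment(values[index:], threshold, start + index)
    return left + right
```

The score is in the units of the raw CN values, so a sensible threshold depends on the noise level of the data.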

CN Estimation: Hidden Markov Model (HMM)
Software: CNAT; dChip; CNAG
Hidden status (unknown CN): CN=? at each of … SNP_i, SNP_i+1, SNP_i+2, SNP_i+3, SNP_i+4 … along the chromosome
Observed status: raw CN = log ratio of intensities
CN estimation: finding the sequence of CN values that maximizes the likelihood of the observed raw CN.
Algorithm: Viterbi algorithm (can be iterative)
The following information/assumptions are needed:
- Background probabilities: overall probabilities of possible CN values. P(CN=x); x = -2, -1, 0, 1, 2, 3, …, n (usually n < 10)
- Transition probabilities: probabilities of the CN value of each SNP conditional on the previous one. P(CN_i+1=x | CN_i=y); x = -2, -1, 0, 1, 2, 3, …, or n; y = -2, -1, 0, 1, 2, 3, …, or n
- Emission probabilities: probabilities of the observed raw CN value of each SNP conditional on the hidden/unknown/true CN status. P(log ratio<x | CN=y) = f(x | CN=y); x = any real number; y = -2, -1, 0, 1, 2, 3, …, or n
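The three ingredients listed above plug directly into the Viterbi recursion, sketched here in Python. Several choices are assumptions of this sketch rather than anything the slides or the named tools specify: emissions are Gaussian with a common standard deviation, the expected log ratio for copy number c is log2(c / 2), only states CN = 1, 2, 3 are modeled, and all probability values are made up for illustration.

```python
import math

def gaussian_logpdf(x, mean, sd):
    """Log density of a normal distribution (assumed emission model)."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)

def viterbi(log_ratios, states, background, transition, emission_mean, emission_sd):
    """Most probable hidden CN sequence for the observed log ratios."""
    # score[s]: best log-probability of any path ending in state s
    score = {
        s: math.log(background[s])
        + gaussian_logpdf(log_ratios[0], emission_mean[s], emission_sd)
        for s in states
    }
    backpointers = []
    for x in log_ratios[1:]:
        new_score, pointers = {}, {}
        for s in states:
            prev = max(states, key=lambda p: score[p] + math.log(transition[p][s]))
            pointers[s] = prev
            new_score[s] = (
                score[prev]
                + math.log(transition[prev][s])
                + gaussian_logpdf(x, emission_mean[s], emission_sd)
            )
        backpointers.append(pointers)
        score = new_score
    # Trace back from the best final state.
    state = max(states, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path

# Illustrative parameters (hypothetical values, not taken from the slides).
states = [1, 2, 3]
background = {1: 0.1, 2: 0.8, 3: 0.1}
transition = {a: {b: 0.98 if a == b else 0.01 for b in states} for a in states}
emission_mean = {c: math.log2(c / 2) for c in states}
observed = [0.0, 0.05, -0.02, -1.0, -0.95, -1.05, 0.0, 0.03]
estimated_cn = viterbi(observed, states, background, transition, emission_mean, 0.2)
```

The high self-transition probability is what smooths the estimate: a single noisy SNP is not enough to switch state, while a run of consistently shifted log ratios is.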

HMM Estimation of CN for Chr. 1 (Pair #101) Black: Normal Intensities, Red: Tumor Intensities, Green: Tumor - Normal, Blue: HMM-estimated CNs in Tumor Tissue (levels shown: CN=1, CN=2, CN=3, CN=4)

Population Level Analysis
Analysis for the whole group (or sub-group) of samples
- Overall significance test
- Amplification and deletion frequencies summarization
- Common/concurrent region finding
- Associations (with mutations, LOHs, clinical variables …)

Genome-wide Raw CN Changes (average over ~400 pairs )

Raw CN Changes of Chr. 14 (average over ~400 pairs )

Sliding Window Analysis
Windows 1, 2, 3, …, k, …, N slide along the chromosome one SNP at a time. Each window k contains 30 consecutive SNPs (k, k+1, k+2, k+3, …, k+29).
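A sliding-window average over consecutive SNPs can be sketched in Python as below. The function name is illustrative, the input is assumed to be raw CN values ordered by chromosomal position, and the code uses 0-based window indices where the slide counts from 1.

```python
def sliding_window_means(raw_cn, window=30):
    """Mean raw CN of every window of `window` consecutive SNPs.

    Window k (0-based) covers SNPs k .. k + window - 1, so a chromosome with
    M SNPs yields M - window + 1 windows. A running total keeps the cost
    linear in the number of SNPs.
    """
    if len(raw_cn) < window:
        return []
    total = sum(raw_cn[:window])
    means = [total / window]
    for k in range(window, len(raw_cn)):
        total += raw_cn[k] - raw_cn[k - window]  # slide by one SNP
        means.append(total / window)
    return means
```

The same windows can be fed to a per-window significance test instead of a mean.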

Genome-wide Raw Copy Number Changes (sliding window plot, averaged over ~400 pairs )

Sliding Window Test of Significance of CN Changes -log(p) values, based on ~ 400 pairs

CN Change Frequencies in Population (Chr. 14, ~400 pairs)
Black: Freq.(CN>0)
Red: Freq.(CN>0, significant amplification at 0.01 level)
Green: Freq.(CN<0, significant deletion at 0.01 level)
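The three plotted frequencies can be tallied per position as sketched below. The function name and the input layout are assumptions: `raw_cn[i][j]` is the raw CN of sample i at position j, and `significant[i][j]` is a precomputed flag saying whether that change passed the 0.01-level test, which this sketch does not perform itself.

```python
def change_frequencies(raw_cn, significant):
    """Per-position frequencies of CN changes across samples.

    Returns one tuple per position:
    (Freq(CN > 0), Freq(significant amplification), Freq(significant deletion)).
    """
    n_samples = len(raw_cn)
    n_positions = len(raw_cn[0])
    frequencies = []
    for j in range(n_positions):
        gain = sum(1 for i in range(n_samples) if raw_cn[i][j] > 0)
        sig_amp = sum(
            1 for i in range(n_samples) if raw_cn[i][j] > 0 and significant[i][j]
        )
        sig_del = sum(
            1 for i in range(n_samples) if raw_cn[i][j] < 0 and significant[i][j]
        )
        frequencies.append(
            (gain / n_samples, sig_amp / n_samples, sig_del / n_samples)
        )
    return frequencies
```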

Population Level Segmentation Analysis (~400 pairs) Circular Binary Segmentation approach, Bioconductor Package DNAcopy

Segmentation of Chr. 14 (average result of ~400 pairs)

Visualization of Concurrent Regions of Chr. 14 (~400 pairs) [Figure axes: positions, samples]

Group-specific Analysis Black: non-smokers, Red: non-smokers

Separate Tumor Samples from Normal Samples Using Six Chromosomal Peaks with Significant CN Changes (Classification Based on RAW CN) Tumor Normal

Mapping Known Cancer-related Genes onto the Copy Number Map

Software
- Affymetrix Chips
- Illumina Chips
- CNAT; dChip; CNAG
- GenePattern
- Bioconductor R packages: GLAD package (adaptive weights smoothing (AWS) method); DNAcopy package (circular binary segmentation method)
Windows? Unix? Parallel computation?

References
- R Gentleman et al. Bioinformatics and computational biology solutions using R and Bioconductor. Springer, 2005
- JL Freeman et al. Genome Research 2006; 16:
- J Huang et al. Hum Genomics. 2004;1(4):
- X Zhao et al. Cancer Research 2004; 64:
- Y Nannya et al. Cancer Research 2005, 65:
- … see google …

Acknowledgements Aldi Kraja Li Ding Ingrid Borecki John Osborne Michael Province Ken Chen Division of Statistical Genomics Medical Sequencing Group Center for Genome Sciences Washington University School of Medicine