Spatial Smoothing and Hot Spot Detection for CGH Data Using the Fused Lasso (Pei Wang, ICSA, 6/2007)

Presentation transcript:

ICSA, 6/2007. Spatial Smoothing and Hot Spot Detection for CGH Data Using the Fused Lasso. Pei Wang, Cancer Prevention Research Program, PHS, FHCRC. Joint work with Robert Tibshirani, Stanford University, CA.

Outline: 1. DNA copy number alterations and array CGH experiments. 2. Detecting copy number alterations using fused lasso regression. 3. Simulation and real data examples. 4. Jointly modeling copy number alterations and disease outcomes using fused lasso regression.

DNA Copy Number: In normal human cells, DNA copy number = 2. Genome instability => copy number alterations. (Albertson and Pinkel, Hum. Mol. Gen., 2003)

DNA Copy Number: In cancer research, knowledge of copy number aberrations helps to:
- identify important cancer genes;
- reveal tumor subtypes with different mechanisms of initiation and/or progression;
- predict tumor prognosis and improve clinical diagnosis.

Array CGH (array Comparative Genomic Hybridization): the scanner reports the … for each spot on the chip, which corresponds to: …

Array CGH: Array CGH has been implemented using a wide variety of techniques.
- BAC array: produced from bacterial artificial chromosomes;
- cDNA microarray: made from cDNAs;
- oligo array: made from oligonucleotides (Affy, Agilent, Illumina).
(Figure: output from an array CGH experiment.)

Goal: Identify genome regions with DNA copy number alterations. (Figure: an example segment of CGH data from a GBM primary tumor; Bredel et al. 2005.)

Goal: Identify genome regions with DNA copy number alterations. (Figure panels: raw CGH data; estimated copy number from fused lasso regression, showing the regions of copy number alteration.)

Method: Denote the log2 ratio measurements of a chromosome (or chromosome arm) as $y_1, \dots, y_n$. Assume $y_i = \log_2(\text{true copy number}_i / 2) + e_i = \beta_i + e_i$. We are interested in recovering $\beta_i$. Properties of $\beta_i$: (1) $\beta_i = 0$ for genome regions without alterations; $\beta_i > 0$ or $\beta_i < 0$ for regions of gain or loss. (2) The profile $\{\beta_i\}$ has strong spatial correlation along the index $i$.
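As a small illustration of the measurement model on this slide, the sketch below converts true copy numbers to log2 ratios and adds Gaussian noise. The function names, the noise standard deviation of 0.25, and the example copy-number profile are assumptions chosen for illustration, not values taken from the talk.

```python
import numpy as np

def log2_ratio(copy_number):
    """Noise-free log2 ratio of a copy number against the normal value of 2."""
    return np.log2(np.asarray(copy_number, dtype=float) / 2.0)

def simulate_profile(copy_numbers, noise_sd=0.25, seed=0):
    """Return (true signal, noisy measurements) for a copy-number profile."""
    rng = np.random.default_rng(seed)
    signal = log2_ratio(copy_numbers)
    noisy = signal + rng.normal(0.0, noise_sd, size=signal.shape)
    return signal, noisy

# Hypothetical profile: normal, single-copy gain, normal, single-copy loss.
copy_numbers = [2] * 10 + [3] * 5 + [2] * 10 + [1] * 5
signal, y = simulate_profile(copy_numbers)
print(signal[[0, 10, 25]])  # true log2 ratios for copy numbers 2, 3 and 1
```

Unaltered regions sit at exactly zero on this scale, which is why a sparsity-inducing penalty is a natural fit for the problem.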

Method: We seek coefficients that satisfy (1) a lasso constraint, to detect alteration regions; (2) a fusion constraint, to account for the spatial correlation.

Lasso and fused lasso: lasso regression (Tibshirani 1996); fused lasso regression (Tibshirani et al. 2004).
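As a reference point for the two estimators named here, their usual constraint-form statements are sketched below. The notation ($x_{ij}$, $\beta_j$, $p$, and the bounds $s_1$, $s_2$) is the standard one and is assumed here rather than copied from the slide.

```latex
\hat{\beta}^{\text{lasso}}
  = \arg\min_{\beta} \sum_{i=1}^{n} \Bigl( y_i - \sum_{j=1}^{p} x_{ij}\beta_j \Bigr)^{2}
  \quad \text{subject to} \quad \sum_{j=1}^{p} |\beta_j| \le s_1

\hat{\beta}^{\text{fused}}
  = \arg\min_{\beta} \sum_{i=1}^{n} \Bigl( y_i - \sum_{j=1}^{p} x_{ij}\beta_j \Bigr)^{2}
  \quad \text{subject to} \quad \sum_{j=1}^{p} |\beta_j| \le s_1
  \ \text{ and } \ \sum_{j=2}^{p} |\beta_j - \beta_{j-1}| \le s_2
```

For a single CGH profile there is one coefficient per probe, so the design matrix is the identity and the fit term reduces to $\sum_i (y_i - \beta_i)^2$.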

Method: Applying the fused lasso to aCGH data: (1) solve the optimization; (2) choose the tuning parameters; (3) control the false discovery rate (FDR).

1. Solve the optimization. For the general fused lasso regression: use SQOPT by Gill et al. to solve the quadratic programming problem with sparse linear constraints (Tibshirani et al., 2004).

1. Solve the optimization. For the special application to CGH arrays: pathwise coordinate optimization (Jerome Friedman et al., Tech Report), a modification of the original coordinate-wise descent algorithm (the shooting procedure; Fu 1998, Daubechies et al. 2004). The running time is only 1/100 of that of the quadratic programming approach.
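For a single CGH profile the design matrix is the identity, and the problem is often written in Lagrangian form: minimize $\tfrac12\sum_i (y_i-\beta_i)^2 + \lambda_1\sum_i|\beta_i| + \lambda_2\sum_{i\ge 2}|\beta_i-\beta_{i-1}|$. The sketch below is not the pathwise coordinate algorithm named on the slide. It is a simpler stand-in that solves the fusion-only problem by projected gradient on its dual and then soft-thresholds the result by $\lambda_1$, a two-step shortcut that holds for this identity-design case. Function names, step size, iteration count, and the toy data are all assumptions.

```python
import numpy as np

def _dt(u):
    """Apply the transpose of the first-difference operator D to u."""
    return -np.diff(np.concatenate(([0.0], u, [0.0])))

def tv_denoise_1d(y, lam2, n_iter=5000):
    """Approximately minimize 0.5*||y - b||^2 + lam2 * sum|b[i] - b[i-1]|."""
    y = np.asarray(y, dtype=float)
    u = np.zeros(len(y) - 1)  # one dual variable per adjacent pair of probes
    for _ in range(n_iter):
        b = y - _dt(u)  # primal estimate implied by the current dual variable
        # Gradient step (1/4 is a safe step since ||D D^T|| < 4),
        # then projection onto the box [-lam2, lam2].
        u = np.clip(u + 0.25 * np.diff(b), -lam2, lam2)
    return y - _dt(u)

def soft_threshold(b, lam1):
    """Shrink every entry toward zero by lam1, setting small entries to zero."""
    return np.sign(b) * np.maximum(np.abs(b) - lam1, 0.0)

def fused_lasso_1d(y, lam1, lam2, n_iter=5000):
    """Fused lasso fit for an identity design: fuse first, then sparsify."""
    return soft_threshold(tv_denoise_1d(y, lam2, n_iter), lam1)

# Toy usage on a hypothetical profile with one gained segment.
rng = np.random.default_rng(1)
truth = np.concatenate([np.zeros(40), np.full(20, 1.0), np.zeros(40)])
y = truth + rng.normal(0.0, 0.25, size=truth.size)
beta_hat = fused_lasso_1d(y, lam1=0.05, lam2=1.0)
print(np.round(beta_hat[35:45], 2))  # estimates around the left breakpoint
```

The iteration count is fixed for simplicity; a practical implementation would stop on a convergence criterion and would be considerably faster with a specialized algorithm.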

2. Choose the tuning parameters. Estimate $s_1$ and $s_2$ from pre-smoothed versions of the data: $s_1$ controls the overall amount of copy number alteration on the target chromosome and is estimated from heavily smoothed $Y$; $s_2$ controls the frequency of copy number alterations on the target chromosome and is estimated from moderately smoothed $Y$.
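One way to read this slide is to smooth the data and plug the smoothed profile $\tilde y$ into the two constraint sums: $s_1 = \sum_i |\tilde y_i|$ from a heavy smooth and $s_2 = \sum_{i \ge 2} |\tilde y_i - \tilde y_{i-1}|$ from a moderate smooth. The sketch below follows that reading, with a moving average standing in for whatever smoother the authors used. The window widths and function names are assumptions.

```python
import numpy as np

def moving_average(y, window):
    """Centered moving average with edge padding; window must be odd."""
    y = np.asarray(y, dtype=float)
    half = window // 2
    padded = np.pad(y, half, mode="edge")  # repeat end values to keep length
    return np.convolve(padded, np.ones(window) / window, mode="valid")

def estimate_bounds(y, heavy_window=21, moderate_window=5):
    """Plug-in estimates of s1 (overall amount) and s2 (amount of jumping)."""
    heavy = moving_average(y, heavy_window)
    moderate = moving_average(y, moderate_window)
    s1 = float(np.sum(np.abs(heavy)))
    s2 = float(np.sum(np.abs(np.diff(moderate))))
    return s1, s2

# Hypothetical noisy profile with one gained segment.
rng = np.random.default_rng(2)
y = np.concatenate([np.zeros(40), np.full(20, 1.0), np.zeros(40)])
y = y + rng.normal(0.0, 0.25, size=y.size)
s1, s2 = estimate_bounds(y)
print(s1, s2)
```

A heavier smooth suppresses noise before the absolute values are summed, which keeps $s_1$ from being inflated by probe-level noise; the lighter smooth preserves real jumps for $s_2$.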

Other Methods: Lai et al. provide a thorough review of statistical methods for aCGH analysis.
- Simple smoothing with Lowess
- Hidden Markov Model (Fridlyand et al. 2004)
- Top-down: Circular Binary Segmentation (Olshen et al. 2004, Venkatraman et al. 2007)
- Bottom-up: clustering along chromosomes (Wang et al. 2005)
- Dynamic programming: CGHseg (Picard et al. 2005)
- Denoising using wavelets (Hsu et al. 2005)
- And many others.

General smoothing methods are typically not useful for analyzing CGH data, because their results can be difficult to interpret. Fused lasso regression can also be viewed as a smoothing approach, but it captures the structure of CGH data very well.

Comparison of the fused lasso with three segmentation methods: CGHseg (Picard et al. 2005), CLAC (Wang et al. 2005), and CBS (Olshen et al. 2004).

Simulation Example: Further comparison of fused lasso results with the three segmentation methods on simulated data sets from Lai et al. Total length of chromosome segment: 100. Four different aberration widths: 5, 10, 20, 40. The signal-to-noise ratio equals 1. Normal region: x ~ N(0, 0.25); altered region: x ~ N(0.25, 0.25). For each width, simulate 100 independent chromosomes. Evaluation process: 1. Estimate copy number using each method. 2. Apply different thresholds to the estimated copy numbers, and calculate TPR = (# of correct calls) / (# of total aberration probes) and FPR = (# of false calls) / (# of total normal probes).
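The evaluation loop described on this slide can be sketched as follows. Two readings are assumed here and should be checked against Lai et al.: the second argument of N(·, 0.25) is treated as a standard deviation, since that is what makes the stated signal-to-noise ratio equal 1, and the aberration is placed in the center of the segment. Function names are illustrative, and the raw data stand in for a method's estimate.

```python
import numpy as np

def simulate_chromosome(width, length=100, mean=0.25, sd=0.25, rng=None):
    """Simulate one segment with a single gained region of the given width."""
    rng = np.random.default_rng() if rng is None else rng
    truth = np.zeros(length, dtype=bool)
    start = (length - width) // 2  # centered aberration (assumption)
    truth[start:start + width] = True
    x = rng.normal(0.0, sd, size=length) + mean * truth
    return x, truth

def tpr_fpr(estimate, truth, threshold):
    """Call a probe altered when its estimate exceeds the threshold."""
    estimate = np.asarray(estimate, dtype=float)
    truth = np.asarray(truth, dtype=bool)
    calls = estimate > threshold
    tpr = np.sum(calls & truth) / np.sum(truth)
    fpr = np.sum(calls & ~truth) / np.sum(~truth)
    return float(tpr), float(fpr)

# Trace a TPR-FPR curve for one width, averaged over 100 simulated chromosomes.
rng = np.random.default_rng(3)
thresholds = np.linspace(-0.5, 1.0, 31)
curve = np.zeros((len(thresholds), 2))
for _ in range(100):
    x, truth = simulate_chromosome(width=20, rng=rng)
    estimate = x  # placeholder: substitute the output of a smoothing method
    curve += [tpr_fpr(estimate, truth, t) for t in thresholds]
curve /= 100
print(curve[:3])
```

Replacing the placeholder `estimate` with the output of each method under comparison yields one curve per method and width.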

(Figure: TPR-FPR curves for the four methods under different window sizes.)

Real Data Example: breast cancer cell line MDA157 (Pollack 2002).

Computation Time: comparison of the speed of the four methods, in seconds, as mean (sd).
Method: P=100; P=500; P=1000; P=2000
CBS (DNAcopy 1.10.0): … (0.113); … (0.804); … (1.135); … (2.854)
CGHseg: 0.063 (0.008); … (0.016); … (0.041); … (0.104)
CLAC: 0.049 (0.003); … (0.013); … (0.037); … (0.073)
cghFLasso: 0.025 (0.013); … (0.017); … (0.036); … (0.056)
Data simulation: 1. Pre-specify chromosome length p = 100, 500, 1000, 2000. 2. Randomly sample 50 genome segments of length p from 17 breast cancer CGH arrays. 3. Apply each method to the 50 segments and record the CPU time.

Applying the fused lasso to CGH data:
- provides an appropriate model for aCGH data;
- performs favorably compared with other methods;
- is computationally efficient;
- provides a flexible framework for aCGH analysis in more complicated settings.

Joint Model: Study copy number alterations and disease outcomes. Model: … The aim is to find disease-associated genes. Naive method (two steps): 1. call gains and losses for each individual array; 2. use the estimated copy numbers to look for disease-associated genes.

Joint Model: Drawbacks of the naive two-step method: 1. Information is lost after the first round of data processing. 2. Smoothing adds to the already existing correlation among neighboring values, thus causing the within-class covariance to be even more jagged … and increases the computational cost with zero benefit in classification performance (Hastie et al., Ann. of Stat.).

Joint Model: joint modeling: …

Comparing different approaches on a simulated data set: Simulate a genome segment with p = 50 genes for n = 30 samples (true copy numbers and noisy CGH measurements). Generate a pseudo-phenotype for each sample using two pre-selected non-adjacent genes. Look for disease-associated genes with the different methods. Vary the tuning parameter t to produce an ROC curve for each method. Repeat 200 times and plot the mean ROC curve.

Summary: Fused lasso regression can be used to characterize the spatial structure of array CGH data.
- Tibshirani & Wang, Biostatistics (in press)
- Software: google -> tibshirani -> click on cghFLasso under software
The flexible framework of the regression model can be easily extended to solve other problems involving CGH data.

Acknowledgment: Stanford University, Department of Statistics: Robert Tibshirani, Jerry Friedman, Trevor Hastie. Stanford University, Department of Pathology: Jonathan Pollack.