Published by Brook Drusilla Ford. Modified over 9 years ago.
1
Algorithms for Denoising aCGH Data
Charalampos (Babis) E. Tsourakakis
Joint work with Gary Miller, Richard Peng, Russell Schwartz, Stanley Shackney, Dave Tolliver, Maria A. Tsiarli
Speaking Skills, Machine Learning Journal Club, 23 Feb. 2010
2
Motivation & Problem Definition
Related Work
Our Problem Formulation and an O(n^2) Solution
Experimental Results
Theoretical Ramifications: an O(n^1.5) Algorithm within Additive ε Error
Conclusion & Future Work
3
Test DNA: patient. Reference DNA: healthy subject.
For each probe we obtain a noisy measurement of log(T/R), where T is the true DNA copy number and R = 2 for humans (diploid organisms).
4
Ideal scenario:
Copy number   log(T/R), base 2
0             -Inf
1             -1
2             0 (healthy probe)
3             0.58
4             1
In practice, for a variety of reasons (e.g., sample impurity, measurement noise), we obtain only a noisy measurement of log(T/R) per probe.
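As a quick check of the ideal-scenario values, here is a minimal sketch assuming base-2 logarithms (which is what the listed values imply) and R = 2; the function name is hypothetical.

```python
import math

R = 2  # reference copy number for a diploid (healthy) genome

def ideal_log_ratio(t: int) -> float:
    """Noise-free log2(T/R) for a probe whose true copy number is t."""
    if t == 0:
        return -math.inf  # log2(0) is undefined; the ratio tends to -infinity
    return math.log2(t / R)

for t in range(5):
    print(t, round(ideal_log_ratio(t), 2))
```

For T = 3 this prints 0.58, since log2(3/2) ≈ 0.585.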
5
Input: a vector (t_1, t_2, …, t_n), where t_i is the measurement at the i-th probe.
Output: a vector (t̃_1, t̃_2, …, t̃_n) with discrete values corresponding to the true DNA copy numbers.
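One simple way to map a denoised log-ratio level back to a discrete copy number is to invert the ideal relation level = log2(T/R) and round. This is our own illustrative assumption, not necessarily how CGHTrimmer assigns its discrete values; the helper name and the example levels are hypothetical.

```python
import math

R = 2  # diploid reference copy number

def copy_number_from_log_ratio(level: float) -> int:
    """Hypothetical helper: invert level = log2(T/R) and round to the nearest integer."""
    if level == -math.inf:
        return 0  # the ideal value for a homozygous deletion
    return max(0, round(R * 2 ** level))

denoised_levels = [-1.05, 0.03, 0.61, 0.97]  # made-up example values
print([copy_number_from_log_ratio(x) for x in denoised_levels])  # → [1, 2, 3, 4]
```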
6
[Figure: log(T/R) vs. probe id. Blue ×: noisy measurements (input); red □: true values (output).]
7
[Figure: log(T/R) vs. probe id.]
Nearby probes tend to have the same DNA copy number!
1) Treat the data as a 1D time series.
2) Fit piecewise constant segments.
This is a well-studied problem with many applications.
9
Lasso (Tibshirani et al., Huang et al.)
Kalman filters (Xing et al.)
Hidden Markov models (Fridlyand et al.)
Bayesian hidden Markov models (Guha et al.)
Wavelets (Hsu et al.)
Hierarchical clustering (Tibshirani et al.)
Circular binary segmentation (Olshen et al.)
Statistical likelihood tests
Lowess, i.e., locally weighted regression
Genetic local search (Jong et al.)
(Not exhaustive)
10
Gaussian mixture fitting using EM (Hodgson et al.)
Variable-bandwidth kernel methods (Muller et al.)
Variable-knot splines (Stone et al.)
Fused quantile regression (Wang et al.)
Nonparametric regression
Thresholding
(Not exhaustive)
11
CBS: Matlab toolbox; a modification of binary segmentation.
CGHSEG: Gaussianity assumption, AIC & BIC, dynamic programming (DP).
Both methods consistently perform better than the others on real data (Lai et al., Willenbrock et al.).
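For readers unfamiliar with binary segmentation, here is a minimal sketch of the plain (non-circular) version that CBS builds on. It is an illustration under our own assumptions: it splits wherever the reduction in squared error is largest and stops below a hypothetical threshold `min_gain`, whereas CBS considers circular splits and decides with a statistical test.

```python
from typing import List

def sse(xs: List[float]) -> float:
    """Sum of squared deviations from the mean."""
    if not xs:
        return 0.0
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs)

def binary_segmentation(xs: List[float], min_gain: float, offset: int = 0) -> List[int]:
    """Return sorted breakpoints b (a new segment starts at index b).

    Split at the position that reduces the SSE the most, recurse on both
    halves, and stop when the best reduction is below min_gain.
    """
    n = len(xs)
    if n < 2:
        return []
    total = sse(xs)
    best_gain, best_k = 0.0, None
    for k in range(1, n):
        gain = total - sse(xs[:k]) - sse(xs[k:])
        if gain > best_gain:
            best_gain, best_k = gain, k
    if best_k is None or best_gain < min_gain:
        return []
    left = binary_segmentation(xs[:best_k], min_gain, offset)
    right = binary_segmentation(xs[best_k:], min_gain, offset + best_k)
    return left + [offset + best_k] + right

print(binary_segmentation([0, 0, 0, 1, 1, 1, 0, 0, 0], min_gain=0.4))  # → [3, 6]
```

The greedy top-down split is what makes binary segmentation fast but not globally optimal; the dynamic program in the next section optimizes over all segmentations instead.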
13
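For the vector (p_1, …, p_n) we define the following recurrence, where j is the last breakpoint before i, the middle term is the squared error of the segment p_{j+1}, …, p_i around its mean, and λ is the penalty per segment (it sets the tradeoff between goodness of fit and number of segments):
$$OPT_i = \min_{0 \le j < i} \Big( OPT_j + \sum_{k=j+1}^{i} (p_k - \mu_{j+1,i})^2 + \lambda \Big), \qquad OPT_0 = 0, \qquad \mu_{j+1,i} = \frac{1}{i-j} \sum_{k=j+1}^{i} p_k.$$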
14
Keep the first- and second-order moments (prefix sums of p_k and p_k^2) in an "online" way, so the squared error of any candidate segment is evaluated in O(1) time. Run time: O(n^2).
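A minimal sketch of the O(n^2) dynamic program, assuming the recurrence OPT_i = min over j < i of OPT_j + (squared error of p_{j+1}, …, p_i around its mean) + λ, with OPT_0 = 0. The names `segment_dp`, `lam`, `s1`, `s2` are hypothetical; the prefix sums `s1` and `s2` play the role of the online first and second moments.

```python
from typing import List, Tuple

def segment_dp(p: List[float], lam: float) -> Tuple[float, List[int]]:
    """O(n^2) DP: OPT_i = min_{0 <= j < i} OPT_j + SSE(p[j:i]) + lam, OPT_0 = 0.

    Returns (OPT_n, list of segment start indices, 0-based).
    """
    n = len(p)
    s1 = [0.0] * (n + 1)  # s1[i] = p_1 + ... + p_i       (first moment)
    s2 = [0.0] * (n + 1)  # s2[i] = p_1^2 + ... + p_i^2   (second moment)
    for i, x in enumerate(p, start=1):
        s1[i] = s1[i - 1] + x
        s2[i] = s2[i - 1] + x * x

    def sse(j: int, i: int) -> float:
        # Squared error of p[j:i] around its mean: sum x^2 - (sum x)^2 / count.
        return (s2[i] - s2[j]) - (s1[i] - s1[j]) ** 2 / (i - j)

    opt = [0.0] + [float("inf")] * n
    arg = [0] * (n + 1)  # arg[i] = best breakpoint j for prefix i
    for i in range(1, n + 1):
        for j in range(i):
            cand = opt[j] + sse(j, i) + lam
            if cand < opt[i]:
                opt[i], arg[i] = cand, j

    starts, i = [], n  # walk the breakpoints back from n
    while i > 0:
        starts.append(arg[i])
        i = arg[i]
    return opt[n], starts[::-1]

print(segment_dp([0.0, 0.0, 0.0, 1.0, 1.0, 1.0], lam=0.2))  # → (0.4, [0, 3])
```

A larger λ merges segments: with λ = 10 the same input collapses to a single segment.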
15
λ = 0.2: we trained on synthetic data generated by the realistic simulator of Willenbrock et al.; λ = 0.2 results in precision 0.98 and recall 0.91.
17
[Figure: A) CBS, B) CGHTrimmer, C) CGHSEG on the Lai et al. (2005) dataset, available from http://compbio.med.harvard.edu]
18
Snijders et al., 15 cell lines: a gold-standard dataset with two main characteristics:
a) the ground truth is known, after tedious biological tests;
b) it is an "easy" dataset (low noise levels).
Description                                               #Cell lines
CGHTrimmer performs worse than at least one competitor    0
CGHTrimmer performs as well as both competitors           9
CGHTrimmer performs better than one competitor            5
CGHTrimmer performs better than both competitors          1
19
[Figure: A) CBS, B) CGHTrimmer, C) CGHSEG on breast cancer cell line BT474, chromosome 1.]
20
[Figure: A) CBS, B) CGHTrimmer, C) CGHSEG on breast cancer cell line BT474, chromosome 17.]
Results supported by oncology literature.
21
[Figure: A) CBS, B) CGHTrimmer, C) CGHSEG on breast cancer cell line T47D, chromosome 1.]
22
Dataset         CGHTrimmer   CGHSEG     CBS
Coriell         5.78 sec     8.15 min   47.7 min
Breast Cancer   22.76        23.3 min   4.95 hours
Remarks:
A) CGHTrimmer is 1 to 3 orders of magnitude faster.
B) Reason for the speedup: a different approach from the competitors'. Instead of statistical assumptions, tests, and likelihood functions, it uses an intuitive formulation and a simple dynamic programming algorithm.
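The speedup claim can be checked with simple arithmetic on the Coriell row, where all three run times come with units. A minimal sketch (variable names are ours):

```python
import math

# Coriell run times from the table, converted to seconds.
trimmer_s = 5.78
competitors_s = {"CGHSEG": 8.15 * 60, "CBS": 47.7 * 60}

for name, secs in competitors_s.items():
    factor = secs / trimmer_s
    print(f"{name}: {factor:.0f}x slower ({math.log10(factor):.1f} orders of magnitude)")
```

This gives about 85x for CGHSEG and about 495x for CBS, i.e., roughly 1.9 and 2.7 orders of magnitude.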
24
Is O(n^2) tight? Probably not…
Lemma: If |p_i - p_j| > 2, then in the optimal solution points i and j belong to different segments.
Question: Can we use one of the existing "tricks" to speed up our dynamic program?
25
Unfortunately, existing "tricks" for speeding up dynamic programming (e.g., the Monge property) do not work for us. But we can find a good approximation algorithm!
26
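Define an objective function shifted by a constant, i.e., $DP_i = OPT_i - \sum_{k=1}^{i} p_k^2$.
Claim: DP_i satisfies the following optimization formula, where S_i = p_1 + … + p_i:
$$DP_i = \min_{0 \le j < i} \Big( DP_j - \frac{(S_i - S_j)^2}{i - j} + \lambda \Big), \qquad DP_0 = 0.$$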
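The shifted formula follows from expanding the squared error of a segment around its mean. A short derivation, assuming the recurrence $OPT_i = \min_{0 \le j < i} ( OPT_j + \sum_{k=j+1}^{i} (p_k - \mu_{j+1,i})^2 + \lambda )$ with $\mu_{j+1,i} = (S_i - S_j)/(i - j)$ and the shift $DP_i = OPT_i - \sum_{k=1}^{i} p_k^2$ (our reading of the formulation, not a verbatim copy of the slide):

```latex
\begin{aligned}
\sum_{k=j+1}^{i} (p_k - \mu_{j+1,i})^2
  &= \sum_{k=j+1}^{i} p_k^2 - \frac{(S_i - S_j)^2}{i - j}, \\
OPT_i &= \min_{0 \le j < i} \Big( OPT_j + \sum_{k=j+1}^{i} p_k^2 - \frac{(S_i - S_j)^2}{i - j} + \lambda \Big), \\
OPT_i - \sum_{k=1}^{i} p_k^2
  &= \min_{0 \le j < i} \Big( OPT_j - \sum_{k=1}^{j} p_k^2 - \frac{(S_i - S_j)^2}{i - j} + \lambda \Big), \\
DP_i &= \min_{0 \le j < i} \Big( DP_j - \frac{(S_i - S_j)^2}{i - j} + \lambda \Big).
\end{aligned}
```

The subtracted sum depends only on i, so it can be moved inside the minimum and split between the j-prefix and the segment j+1..i, which is exactly why the shift is "by a constant".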
27
Find the maximum and minimum values that the shifted objective DP_i can take.
Claim: DP_i takes values in [0, U^2 n], where …
28
Perform binary search over this range, guessing for each index i an approximate value of DP_i.
29
Checking a guess x for DP_i amounts to comparing the dot product of two 4D points, (x, i, S_i, -1) and (j, DP_j, 2S_j, S_j^2 + j·DP_j), against a constant.
Tool: reporting points in halfspaces (Matousek, FOCS 1992).
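The reduction can be checked numerically. A minimal sketch, assuming the shifted recurrence DP_i = min over j < i of DP_j - (S_i - S_j)^2/(i - j) + λ. We fold λ into the guess by writing y = x - λ; that is our assumption about how the slide's x is defined. Multiplying the test by i - j > 0 and expanding gives a dot product of a query point that depends only on (i, y) with a point that depends only on j. Exact rationals avoid floating-point ties; all function names are hypothetical.

```python
import random
from fractions import Fraction

def direct_test(x, lam, i, j, S_i, S_j, DP_j):
    """Does index j certify DP_i <= x, i.e. DP_j - (S_i - S_j)^2 / (i - j) + lam <= x ?"""
    return DP_j - (S_i - S_j) ** 2 / (i - j) + lam <= x

def dot_test(x, lam, i, j, S_i, S_j, DP_j):
    """Same test as a 4D dot product against a threshold that depends only on (i, x)."""
    y = x - lam  # assumption: the per-segment penalty is folded into the guess
    query = (y, i, S_i, -1)                           # depends only on i and the guess
    point = (j, DP_j, 2 * S_j, S_j ** 2 + j * DP_j)  # depends only on j
    dot = sum(a * b for a, b in zip(query, point))
    return dot <= y * i + S_i ** 2

def tests_agree(trials: int = 10_000, seed: int = 0) -> bool:
    """Compare both tests on random exact rationals."""
    rng = random.Random(seed)
    for _ in range(trials):
        i = rng.randint(1, 50)
        j = rng.randint(0, i - 1)
        S_i, S_j, DP_j, x, lam = (
            Fraction(rng.randint(-100, 100), rng.randint(1, 10)) for _ in range(5)
        )
        if direct_test(x, lam, i, j, S_i, S_j, DP_j) != dot_test(x, lam, i, j, S_i, S_j, DP_j):
            return False
    return True

print(tests_agree())  # → True
```

Because the points (j, DP_j, 2S_j, S_j^2 + j·DP_j) are fixed once DP_j is known, a halfspace query structure over them can answer "is there a j below the threshold?" without scanning every j.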
30
Remember: we want DP_i, not its approximation. We approximate each DP_i within additive error ε/n by performing log(U^2 n / (ε/n)) iterations of binary search. These errors add up along the recurrence, at most n of them, so setting i = n shows that our algorithm is an ε-additive approximation algorithm.
Run time: …
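A minimal sketch of the binary-search scheme, assuming the shifted recurrence DP_i = min over j < i of DP_j - (S_i - S_j)^2/(i - j) + λ with DP_0 = 0. The feasibility test is a brute-force O(i) scan standing in for the halfspace reporting query, so this sketch does not achieve the O(n^1.5) bound; it only illustrates how per-index error ε/n accumulates to ε at i = n. The caller supplies an interval [lo, hi] assumed to contain every DP_i; under this shift, one valid choice is [-U^2 n, λ] with U = max_k |p_k|, because 0 ≤ OPT_i ≤ p_1^2 + … + p_i^2 + λ. All names are hypothetical.

```python
from typing import List

def approx_dp(p: List[float], lam: float, lo: float, hi: float, tol: float) -> List[float]:
    """Approximate DP_i = min_{0<=j<i} (DP_j - (S_i - S_j)^2 / (i - j) + lam), DP_0 = 0.

    A guess x is feasible if some earlier j satisfies the inequality with the
    approximate dp[j]. Assumption: brute-force scan instead of halfspace queries.
    """
    n = len(p)
    S = [0.0] * (n + 1)  # prefix sums S_i = p_1 + ... + p_i
    for k, v in enumerate(p, start=1):
        S[k] = S[k - 1] + v

    dp = [0.0] * (n + 1)  # dp[0] = DP_0 = 0
    for i in range(1, n + 1):
        def feasible(x: float) -> bool:
            return any(dp[j] - (S[i] - S[j]) ** 2 / (i - j) + lam <= x for j in range(i))

        # Invariant: feasible(b) holds (needs hi >= lam, since the j = 0 term is at most lam).
        a, b = lo, hi
        while b - a > tol:  # about log2((hi - lo) / tol) iterations
            mid = (a + b) / 2
            if feasible(mid):
                b = mid
            else:
                a = mid
        dp[i] = b  # at most tol above the minimum over the (already approximate) dp[j]
    return dp

p = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
n, lam, eps = len(p), 0.2, 1e-3
U = max(abs(v) for v in p)
dp = approx_dp(p, lam, lo=-U * U * n, hi=lam, tol=eps / n)
print(dp[n])  # within eps above the exact DP_n = 0.4 - 3 = -2.6
```

By induction, each dp[i] overestimates DP_i by at most i·tol, so with tol = ε/n the final value is within ε of DP_n.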
32
CGHTrimmer: a simple, intuitive dynamic programming algorithm that outperforms state-of-the-art competitors:
it finds important biological biomarkers in aCGH data;
it achieves good precision-recall results;
it is significantly faster.
A new paradigm for dynamic programming: reduce the problem to a halfspace query problem from computational geometry.
33
Structure: O(n^2) is unlikely to be tight.
Extend our approximation to 2D recurrences (this would benefit many applications).
Preprocess the breast cancer data with CGHTrimmer to get a discretized version, which is essential for standard tumor phylogenetics.
Make the data structure (DS) practical.
34
CGHTrimmer: Discretizing Noisy Array CGH Data. C.E.T., D. Tolliver, M. A. Tsiarli, S. Shackney, R. Schwartz.
Algorithms for Denoising aCGH Data. G.L. Miller, R. Peng, R. Schwartz, C.E.T.
Code will be made available at http://www.cs.cmu.edu/~ctsourak/