Computing and Mathematical Sciences Liverpool John Moores University Robust methodologies for partition clustering Paulo Lisboa Terence Etchells, Ian Jarman.


1 Computing and Mathematical Sciences Liverpool John Moores University Robust methodologies for partition clustering Paulo Lisboa Terence Etchells, Ian Jarman and Simon Chambers

2 Overview
- Partition clustering - critique
- Decomposition of the covariance matrix
- Landscape mapping of cluster solutions
- Validation for two synthetic data sets and metabolic sub-typing

3 Bioinformatics: Nottingham Tenovus Primary Breast Carcinoma Series
- Consecutive series of 1,944 cases of primary operable invasive breast cancer (n=1,076 with all markers present)
- Patients presenting during 1986-98
- Protein expression comprising 25 immunohistochemical markers related to tumour malignancy, derived through high-throughput protein expression using tissue microarrays (TMA)
- Abd El-Rehim et al, Int J Cancer, 116, 340-350, 2005.

4 Partition clustering – relevance to bioinformatics
Markers shown: C-erbB-2, p53, PgR, ER, CK 5/6, BRCA1

5 Partition clustering – open issues
Identify a suitable algorithm:
- Model-based or model-free?
- Hierarchical, K-means, PAM?
Return {S_a, ..., S_z} solutions
Validate & interpret each solution
K-means:
i. Assume #K
ii. Initialise #N?
iii. Sort by optimality?
iv. Select best for #K?
v. Select #K(s)?
vi. Single cluster or ensemble?

6 Decomposition of the scatter matrix
Scatter matrices: S_B, S_W1, S_W2
Separation index: [formula not preserved in the transcript]

7 Decomposition of the scatter matrix
Invariant separation matrix and index
Scatter matrices: S_B, S_W1, S_W2
Separation index: [formula not preserved in the transcript]
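The formulas on slides 6 and 7 did not survive extraction, so the following is only a minimal sketch of the standard scatter decomposition S_T = S_W + S_B, where S_W sums the per-cluster within scatters (S_W1, S_W2, ...). The index J = trace(S_T^{-1} S_B) used here is an assumption: it is one classical criterion that is invariant to invertible linear transformations, not necessarily the index defined on the slides. The function names and the test matrix A are hypothetical.

```python
import numpy as np

def scatter_matrices(X, labels):
    """Return (S_T, S_W, S_B) for data X (n x d) and cluster labels."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    grand_mean = X.mean(axis=0)
    centred = X - grand_mean
    S_T = centred.T @ centred                      # total scatter
    d = X.shape[1]
    S_W = np.zeros((d, d))                         # pooled within-cluster scatter
    S_B = np.zeros((d, d))                         # between-cluster scatter
    for k in np.unique(labels):
        Xk = X[labels == k]
        mk = Xk.mean(axis=0)
        dev = Xk - mk
        S_W += dev.T @ dev                         # adds S_Wk for this cluster
        diff = (mk - grand_mean)[:, None]
        S_B += len(Xk) * (diff @ diff.T)
    return S_T, S_W, S_B

def separation_index(X, labels):
    """ASSUMED index J = trace(S_T^{-1} S_B); not taken from the slides."""
    S_T, _, S_B = scatter_matrices(X, labels)
    return float(np.trace(np.linalg.solve(S_T, S_B)))

# Toy data: three Gaussian clusters in 3-D.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=m, size=(50, 3)) for m in (-3.0, 0.0, 3.0)])
labels = np.repeat([0, 1, 2], 50)

S_T, S_W, S_B = scatter_matrices(X, labels)

# A fixed invertible matrix (determinant 5) to check linear invariance of J.
A = np.array([[2.0, 1.0, 0.0],
              [0.0, 1.0, 3.0],
              [1.0, 0.0, 1.0]])
J = separation_index(X, labels)
J_transformed = separation_index(X @ A.T, labels)
```

Under these assumptions J and J_transformed agree up to rounding, which is the kind of invariance the slide title refers to.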

8 [Figure: axes a1, a2, a3]
N.B. If |S_T| = 0 → project onto the subspace of cohort means
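One possible reading of the note on slide 8 is sketched below: when the total scatter matrix is singular, the data are projected onto the subspace spanned by the centred cohort means, where the scatter matrix can be inverted again. This is an interpretation, not the authors' stated procedure. The function name `project_onto_mean_subspace` and the tolerance are hypothetical.

```python
import numpy as np

def project_onto_mean_subspace(X, labels, tol=1e-10):
    """Project centred data onto the span of the centred cohort means."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    grand_mean = X.mean(axis=0)
    means = np.array([X[labels == k].mean(axis=0) for k in np.unique(labels)])
    # Right singular vectors with non-negligible singular values form an
    # orthonormal basis for the subspace spanned by the centred means.
    _, s, Vt = np.linalg.svd(means - grand_mean, full_matrices=False)
    basis = Vt[s > tol * s.max()]
    return (X - grand_mean) @ basis.T

# Toy example: 3-D data that lie in a plane, so |S_T| = 0.
rng = np.random.default_rng(0)
offsets = np.repeat(np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 5.0]]), 20, axis=0)
X2 = rng.normal(size=(60, 2)) + offsets
X = np.column_stack([X2, X2[:, 0] + X2[:, 1]])   # third column is redundant
labels = np.repeat([0, 1, 2], 20)

Z = project_onto_mean_subspace(X, labels)
S_T_projected = (Z - Z.mean(axis=0)).T @ (Z - Z.mean(axis=0))
```

In this toy case the three cohort means span a plane, so Z has two columns and its scatter matrix has full rank.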

9 [Figure: axes ã1, ã2, ã3]
Theorem: [symbol not preserved in the transcript] is invariant to dimensionality reduction under Mahalanobis rotations

10 K-means clustering

11 Adaptive Resonance Theory (ART) clustering

12

13 Concordance measure
Cluster membership cross-tabulation:

        1      …    M
1       O_11   …    O_1M
…       …           …
N       O_N1   …    O_NM
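The matrix on slide 13 reads as a cross-tabulation of observed counts O_ij between two cluster solutions. The sketch below assumes that the concordance measure C_V is Cramér's V computed from that table, which the transcript does not state explicitly. The function names are hypothetical.

```python
import numpy as np

def contingency(a, b):
    """Cross-tabulate two label vectors into a table of observed counts O_ij."""
    _, a_idx = np.unique(np.asarray(a).ravel(), return_inverse=True)
    _, b_idx = np.unique(np.asarray(b).ravel(), return_inverse=True)
    table = np.zeros((a_idx.max() + 1, b_idx.max() + 1))
    np.add.at(table, (a_idx, b_idx), 1)
    return table

def cramers_v(a, b):
    """Cramér's V between two partitions (ASSUMED to be the slides' C_V)."""
    O = contingency(a, b)
    n = O.sum()
    E = np.outer(O.sum(axis=1), O.sum(axis=0)) / n   # expected counts
    chi2 = ((O - E) ** 2 / E).sum()
    k = min(O.shape) - 1
    return float(np.sqrt(chi2 / (n * k))) if k > 0 else 0.0

# Same partition with permuted labels, and two unrelated partitions.
v_same = cramers_v([0, 0, 1, 1, 2, 2], [2, 2, 0, 0, 1, 1])
v_indep = cramers_v([0, 0, 1, 1], [0, 1, 0, 1])
```

Because the statistic depends only on the table of counts, relabelling the clusters of either solution leaves it unchanged.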

14 Optimality principle
Reproducibility under repeated initialisations, with:
- Best separation: max(J)
- Best concordance: max(C_V)

i. N initialisations
ii. Sort by J
iii. Select top p%
iv. Calculate pairwise C_V
v. Retain med(C_V)
vi. Plot (J, med_C_V)
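Steps i to v above can be sketched as follows, leaving the plot in step vi to the reader. Several parts are assumptions rather than the authors' implementation: a plain Lloyd's k-means stands in for the clustering algorithm, J is taken as trace(S_T^{-1} S_B), and C_V is taken to be Cramér's V. The names `landscape_points`, `n_init` and `top_fraction` are hypothetical.

```python
import numpy as np

def kmeans(X, k, rng, n_iter=100):
    """Plain Lloyd's k-means from one random initialisation; returns labels."""
    centres = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        dists = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        new_centres = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centres[j]
            for j in range(k)
        ])
        if np.allclose(new_centres, centres):
            break
        centres = new_centres
    return labels

def separation_index(X, labels):
    """ASSUMED index J = trace(S_T^{-1} S_B)."""
    mean = X.mean(axis=0)
    S_T = (X - mean).T @ (X - mean)
    S_B = np.zeros_like(S_T)
    for j in np.unique(labels):
        diff = (X[labels == j].mean(axis=0) - mean)[:, None]
        S_B += np.sum(labels == j) * (diff @ diff.T)
    return float(np.trace(np.linalg.solve(S_T, S_B)))

def cramers_v(a, b):
    """Cramér's V between two partitions (ASSUMED to be C_V)."""
    _, ai = np.unique(a, return_inverse=True)
    _, bi = np.unique(b, return_inverse=True)
    O = np.zeros((ai.max() + 1, bi.max() + 1))
    np.add.at(O, (ai, bi), 1)
    E = np.outer(O.sum(axis=1), O.sum(axis=0)) / O.sum()
    k = min(O.shape) - 1
    if k == 0:
        return 0.0
    return float(np.sqrt(((O - E) ** 2 / E).sum() / (O.sum() * k)))

def landscape_points(X, k, n_init=30, top_fraction=0.2, seed=0):
    """Return (J, median C_V) pairs for the best-separated runs."""
    rng = np.random.default_rng(seed)
    runs = [kmeans(X, k, rng) for _ in range(n_init)]                    # i
    runs.sort(key=lambda lab: separation_index(X, lab), reverse=True)    # ii
    top = runs[:max(2, int(round(top_fraction * n_init)))]               # iii
    points = []
    for i, lab in enumerate(top):
        cvs = [cramers_v(lab, other)                                     # iv
               for j, other in enumerate(top) if j != i]
        points.append((separation_index(X, lab), float(np.median(cvs)))) # v
    return points                                                        # vi: plot these

# Toy data: three well-separated 2-D clusters.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(40, 2))
               for c in ((0, 0), (5, 0), (0, 5))])
points = landscape_points(X, k=3, n_init=10, top_fraction=0.3)
```

Each returned pair is one point of the landscape map: solutions towards the top right combine high separation with high agreement with the other top-ranked runs.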

15 Synthetic data (6 clusters) Fig 1(a) Fig 1(b)

16 Synthetic data (6 clusters)

17

18 Synthetic data (10 cohorts)

19

20 Mean, covariance matrix (i,j) and size N for each cohort

Cohort       x       y       z      11      12      13      21      22      23      31      32      33     N
C1      -0.799  -1.011  -3.336   0.336   0.044   0.074   0.044   0.371   0.210   0.074   0.210   0.582    64
C2      -0.441  -0.569  -2.331   0.428   0.060  -0.002   0.060   0.123   0.157  -0.002   0.157   0.648    42
C3       0.649  -0.344  -4.154   0.620   0.023  -0.035   0.023   0.137   0.070  -0.035   0.070   0.446    61
C4       1.077   0.072  -2.815   0.366  -0.002   0.076  -0.002   0.043   0.104   0.076   0.104   0.563    32
C5      -0.390  -0.242   0.256   0.536   0.013   0.031   0.013   0.348  -0.117   0.031  -0.117   0.689   197
C6      -1.358  -0.658   1.639   0.309  -0.060  -0.055  -0.060   0.245  -0.013  -0.055  -0.013   0.532   131
C7       1.261   0.125   0.862   0.323   0.017   0.027   0.017   0.386  -0.060   0.027  -0.060   0.403   163
C8      -0.593   3.024  -0.498   0.776   0.033   0.175   0.033   0.491   0.003   0.175   0.003   0.695    97
C9       0.251  -0.539  -0.530   0.711  -0.025   0.055  -0.025   0.352  -0.081   0.055  -0.081   0.576   106
C10      0.374  -0.267   1.973   0.390  -0.097   0.041  -0.097   0.343  -0.014   0.041  -0.014   0.322   183

Pairwise values between cohorts (lower triangle; the measure is not labelled in the transcript):

           C1      C2      C3      C4      C5      C6      C7      C8      C9
C2     0.7805
C3     1.2105  1.4828
C4     1.5054  1.1924  1.0687
C5     2.4975  1.7636  3.0649  2.3119
C6     3.3913  2.8294  4.476   3.8029  1.1757
C7     3.2516  2.5575  3.7002  2.7302  1.2151  2.2233
C8     2.9776  2.4341  3.0901  2.4774  2.025   2.6082  2.2314
C9     2.0388  1.2969  2.4543  1.6846  0.7109  1.8176  1.2393  2.2086
C10    3.7087  3.0487  4.4727  3.5977  1.2717  1.4141  1.233   2.5497  1.6952

Solution with 8 clusters (columns) against the original cohorts (rows); "." marks an empty cell:

Cohort     2     4     7     1     3     5     8     6   Total
1         58     2     .     4     .     .     .     .      64
2         28     1     .    13     .     .     .     .      42
3         11    50     .     .     .     .     .     .      61
4          1    26     .     5     .     .     .     .      32
5          .     .   109    43    13    16    15     1     197
9          2     .    23    64     .    14     3     .     106
6          .     .    25     .   103     .     3     .     131
7          .     .     4     4     .   134    21     .     163
10         .     .    10     .    16     9   148     .     183
8          .     .     1     .     .     .     .    96      97
Total    100    79   172   133   132   173   190    97    1076

(The entry for cohort 10 in cluster 7 is garbled in the transcript; the value 10 is inferred from the row and column totals.)

21 Synthetic data – mixing structure (Sammon Map)

22 Synthetic data – Visualisation in data space

23 Synthetic data (10 cohorts)
[Figure: cluster sizes for solutions with 2 to 10 clusters; layout not recoverable from the transcript]

24 Max J SeCo Max Cv

25 Bioinformatics: Nottingham Tenovus Primary Breast Carcinoma Series
- Consecutive series of 1,944 cases of primary operable invasive breast cancer (n=1,076 with all markers present)
- Patients presenting during 1986-98
- Protein expression comprising 25 immunohistochemical markers related to tumour malignancy, derived through high-throughput protein expression using tissue microarrays (TMA)
- Abd El-Rehim et al, Int J Cancer, 116, 340-350, 2005.

26 Marginal distributions

27 Landscape map (SeCo)

28 Stability index (Cv)

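29 Cross-tabulation of solution A (columns) against solution B (rows)

B \ A      1     2     3     8     7     5     6     4   Total
1        118     4     0     1     6     1     0    12     142
5         21   125     0    33     0     0     0     0     179
7         37     0   122     4     0     2     0     2     167
6          0     0    29   145     0     0     0     0     174
8          0     2     6     0    98     0     0     0     106
2          0     0     6     0     0    94     1     5     106
3          1     0     0     0     1     0    64    42     108
4          0     0     0     0     1     0    61    32      94
Total    177   131   163   183   106    97   126    93    1076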

30 Landscape map (SeCo)

31 Cluster hierarchy (1)

32 Cluster hierarchy (2)

33 Solution A

34

35 Solution B

36 Solution A

37 Sub-type profiling Clusters A Clusters B Luminal N Luminal New 2

38 Sub-type profiling Clusters A Clusters B HER2 Luminal A

39 Sub-type profiling Basal p53 - Basal muc1 - Basal muc1 + Basal p53 + Clusters A Clusters B

40 Consistency with consensus clustering
Rows: clusters in Green et al 2007. Columns: CoRe 5-cluster solution.

           2     3     1     4     5
C1       129     4     0     3    66
C2         1   138     0     7     7
C3        14    11    37    16     2
C4         0     0    65    17     0
C5         0     0    56    13     0
C6         1     8     1    37    30
NC        58    72    54   119   110

41 Molecular sub-typing

42

43 Summary
- Partition clustering - critique
- Decomposition of the covariance matrix
- Landscape mapping of cluster solutions
- Validation for two synthetic data sets and metabolic sub-typing

44 Ferrara data (n=633)
Markers: er, pr, PROLIND, neu, P53

45 Ferrara data (n=633)

46 Ferrara data (n=633)
Rows: Ambrogi et al [7]. Columns: SeCo method.

           1     2     3     4     5   Total
1        213    13     0     4    26     256
2          0   203     0     1     3     207
3          0     1    68     0    22      91
4          0     2     0    77     0      79
Total    213   219    68    82    51     633

47 JMU Cluster 3/5 JMU Cluster 4/5 JMU Cluster 5/5 JMU Cluster 1/5 JMU Cluster 2/5

48 Ferrara data (n=633)

49

