OPTIMCLASS: Simultaneous identification of optimal clustering method and optimal number of clusters in vegetation classification studies Tichy Lubomír.


1 OPTIMCLASS: Simultaneous identification of optimal clustering method and optimal number of clusters in vegetation classification studies
Tichy Lubomír (1), Chytry Milan (1), Botta-Dukát Zoltán (2), Hájek Michal (1), Talbot Stephen S. (3)
(1) Masaryk University, Brno, Czech Republic; (2) Hungarian Academy of Sciences, Vácrátot, Hungary; (3) U.S. Fish and Wildlife Service, Anchorage, USA

2 Why do we need a method for identification of optimal clustering algorithm and optimal number of clusters? The same dataset

3 Why do we need a method for identification of optimal clustering algorithm and optimal number of clusters?
- A huge variety of clustering methods produce “reasonable” results.
- Subjective selection of the clustering method and number of clusters is usually based on empirical experience.
Methods published: Most algorithms identify the optimal partition mathematically, without considering ecological interpretation.

4 The Method
A posteriori description of phytosociological tables is based on diagnostic species. Diagnostic species describe a cluster. Therefore, the number of diagnostic species determines whether the classified table can be sufficiently interpreted.
Species1  98788  12112  3.211
Species2  51123  1223.  11132
Species3  23132  .....  .....
Species4  ..2.4  112..  1..5.
Species5  .....  .1.1.  1.213

5 The Method The same dataset:

6 The Method
Measure of the classification quality: the total sum of diagnostic species.
Fisher’s Exact Test gives the probability of the observed occurrence of a species across clusters under a right-tailed test; a species whose probability falls below a chosen threshold counts as diagnostic for the cluster.
- The measure reduces the importance of very small clusters.
- Easy interpretation: the more diagnostic species in the dataset, the better the description of the clusters.
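The measure on this slide can be sketched in a few lines. The following is a minimal illustration, not the authors' JUICE implementation: it assumes presence/absence data, computes the right-tailed Fisher probability directly from the hypergeometric distribution, and counts every species-cluster pair below the threshold. The function names `fisher_right_tail` and `count_diagnostic` are hypothetical, and the toy data is the example table from slide 4 with its three groups of five plots treated as three clusters.

```python
from math import comb

def fisher_right_tail(a, b, c, d):
    """Right-tailed Fisher probability for the 2x2 table
    [[a, b], [c, d]]: a = present in cluster, b = absent in cluster,
    c = present outside, d = absent outside."""
    n_in = a + b            # plots in the cluster
    n_sp = a + c            # plots containing the species
    n = a + b + c + d       # all plots
    upper = min(n_in, n_sp)
    favourable = sum(comb(n_sp, k) * comb(n - n_sp, n_in - k)
                     for k in range(a, upper + 1))
    return favourable / comb(n, n_in)

def count_diagnostic(presence, labels, alpha=1e-3):
    """Count species-cluster pairs with Fisher probability below alpha."""
    total = 0
    for cluster in set(labels):
        inside = [lab == cluster for lab in labels]
        for occ in presence.values():
            a = sum(1 for o, i in zip(occ, inside) if o and i)
            b = sum(inside) - a
            c = sum(1 for o in occ if o) - a
            d = len(labels) - a - b - c
            if fisher_right_tail(a, b, c, d) < alpha:
                total += 1
    return total

# Toy table from slide 4; "." means the species is absent from the plot.
table = {
    "Species1": "98788121123.211",
    "Species2": "511231223.11132",
    "Species3": "23132..........",
    "Species4": "..2.4112..1..5.",
    "Species5": "......1.1.1.213",
}
presence = {sp: [ch != "." for ch in row] for sp, row in table.items()}
labels = [1] * 5 + [2] * 5 + [3] * 5   # assumed: three clusters of five plots

print(count_diagnostic(presence, labels, alpha=1e-3))  # prints 1
```

Only Species3 passes the 10^-3 threshold here (it occurs in all five plots of the first cluster and nowhere else, giving a probability of 1/3003). Repeating the count for each candidate partition and comparing the totals is the comparison the following slides plot.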

7 The Method Test on three different datasets Southern Siberia, Sayan Mountains (310 plots; forest, steppe and tundra vegetation) Central Europe, Carpathians (241 plots; mire vegetation) Alaska, Kenai Peninsula (171 plots; wetlands)

8 The Method
Classifications tested:
- Flexible beta clustering, Ward's clustering, UPGMA (PC-ORD). Cover transformations (percentages, log percentages, Braun-Blanquet, presence/absence). Distance measures (Bray-Curtis, Manhattan, Euclidean).
- Ordinal cluster analysis (SYN-TAX). Distance measures (Kruskal-Wallis, Kendall, Gower-Podani coefficient).
- Modified TWINSPAN classification (JUICE). The sequence of splits in divisive classification is determined by internal heterogeneity of clusters. Therefore, any number of clusters is possible (three modifications of pseudospecies cut levels).
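For readers unfamiliar with the three distance measures compared for the PC-ORD methods, the sketch below shows their standard textbook definitions on two invented plots. The cover values are illustrative only, not data from the study, and the presence/absence step is just one of the four transformations listed above.

```python
from math import sqrt

def manhattan(x, y):
    """Sum of absolute differences between two plots."""
    return sum(abs(a - b) for a, b in zip(x, y))

def euclidean(x, y):
    """Square root of the sum of squared differences."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def bray_curtis(x, y):
    """Manhattan distance scaled by total abundance (non-negative data)."""
    total = sum(x) + sum(y)
    return manhattan(x, y) / total if total else 0.0

def presence_absence(x):
    """Transform cover values to 1 (present) or 0 (absent)."""
    return [1 if v > 0 else 0 for v in x]

# Two invented plots with percentage covers of three species.
plot_a = [40, 10, 0]
plot_b = [20, 10, 10]

print(manhattan(plot_a, plot_b))
print(euclidean(plot_a, plot_b))
print(bray_curtis(plot_a, plot_b))
# The same plots after a presence/absence transformation.
print(bray_curtis(presence_absence(plot_a), presence_absence(plot_b)))
```

The transformation changes how much weight dominant species carry, which is why each combination of transformation and distance measure is tested separately.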

9 Results Sayan Mountains, Siberia (310 plots, 1036 species) [Figure: number of diagnostic species vs. number of clusters, one panel per probability threshold: 10^-3, 10^-6, 10^-9]

10 Results Sayan Mountains, Siberia (310 plots, 1036 species) Untransformed cover data [Figure: number of diagnostic species vs. number of clusters]

11 Results Sayan Mountains, Siberia (310 plots, 1036 species) Euclidean distance measure [Figure: number of diagnostic species vs. number of clusters]

12 Results Sayan Mountains, Siberia (310 plots, 1036 species) Manhattan distance measure [Figure: number of diagnostic species vs. number of clusters]

13 Results Sayan Mountains, Siberia (310 plots, 1036 species) Bray-Curtis distance measure [Figure: number of diagnostic species vs. number of clusters]

14 Results Sayan Mountains, Siberia (310 plots, 1036 species) UPGMA [Figure: number of diagnostic species vs. number of clusters]

15 Results Sayan Mountains, Siberia (310 plots, 1036 species) Ward's method [Figure: number of diagnostic species vs. number of clusters]

16 Results Sayan Mountains, Siberia (310 plots, 1036 species) Flexible beta (β = -0.25) [Figure: number of diagnostic species vs. number of clusters]

17 Results Sayan Mountains, Siberia (310 plots, 1036 species) Ordinal cluster analyses (SYN-TAX) [Figure: number of diagnostic species vs. number of clusters]

18 Results Sayan Mountains, Siberia (310 plots, 1036 species) Modified TWINSPAN [Figure: number of diagnostic species vs. number of clusters]

19 The Method Test on three different datasets Southern Siberia, Sayan Mountains (310 plots; forest, steppe and tundra vegetation) Central Europe, Carpathians (241 plots; mire vegetation) Alaska, Kenai Peninsula (171 plots; wetlands) Similar results:

20 Conclusions
- Classifications based on transformed cover values give better results than those based on percentage covers.
- Euclidean distance gives slightly poorer results than Manhattan or Bray-Curtis distances.
- The UPGMA clustering method gives poorer results than Ward’s and flexible beta methods.
- No significant difference was found between the ordinal cluster analysis proposed by Podani (SYN-TAX 2000) and other clustering methods.
- Modified TWINSPAN performs well with small numbers of clusters.


23 Modified TWINSPAN classification [Figure: number of diagnostic species occurrences vs. number of clusters]

24 Modified TWINSPAN classification [Figure: sum of diagnostic species vs. number of clusters]

25 Modified TWINSPAN classification [Figure: number of clusters with more than 4 diagnostic species vs. number of clusters]
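The last plot uses a different criterion: the number of clusters that have more than 4 diagnostic species. A minimal sketch of that count follows, assuming the Fisher probabilities for each species-cluster pair have already been computed. The function name `clusters_above`, the default threshold of 10^-3, and the p-values are all hypothetical and serve only to illustrate the counting step.

```python
from collections import Counter

def clusters_above(pvalues, alpha=1e-3, min_species=4):
    """Count clusters with more than `min_species` diagnostic species.

    `pvalues` maps (species, cluster) pairs to Fisher probabilities;
    a pair counts as diagnostic when its probability is below alpha.
    """
    per_cluster = Counter(cluster
                          for (species, cluster), p in pvalues.items()
                          if p < alpha)
    return sum(1 for n in per_cluster.values() if n > min_species)

# Invented p-values for three clusters (illustration only).
pvalues = {}
for i in range(5):                       # cluster A: 5 diagnostic species
    pvalues[(f"sp{i}", "A")] = 1e-4
for i in range(4):                       # cluster B: 4 diagnostic species
    pvalues[(f"sp{i}", "B")] = 1e-4
for i in range(6):                       # cluster C: none below threshold
    pvalues[(f"sp{i}", "C")] = 0.5

print(clusters_above(pvalues))  # prints 1
```

Unlike the total sum of diagnostic species, this count rewards partitions in which many clusters are each well characterised, rather than ones where a few clusters hold most of the diagnostic species.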

