Classifications of circulation patterns from the COST733 database: An assessment of synoptic-climatological applicability by two-sample Kolmogorov-Smirnov test
Radan HUTH, Monika CAHYNOVÁ
Institute of Atmospheric Physics, Prague, Czech Republic
COST733 database (collection)
COST733 Action: “Harmonization and Applications of Weather Types Classifications for European Regions”
(very) large number of classifications produced on unified data
  – SLP at 12 UTC
  – ERA40 (Sep 1957 – Aug 2002)
  – ~9, ~18, ~27 types wherever possible
  – 12 European domains
COST733 database (collection)
version 2.0 of the database
  – released this spring
18 methods for each domain
  – threshold-based: GWT (Beck), Litynski, Lamb (Jenkinson-Collison), P27 (Kruizinga), WLK
  – leader algorithm: Lund, Kirchhofer, Erpicum
  – PCA-based: T-mode PCA
  – optimization algorithms: CKMEANS, PCACA (k-means), Petisco, PCAXTRKMS, SANDRA, SANDRA-S, NNW (SOMs), PCAXTR
  – pseudo-random: random centroids
plus 7 subjective and objectivized classifications not attributable to any domain
  – ignored today
COST733 database (collection)
different attributes of classifications
  – number of types (9 x 18 x 27)
  – sequencing (no vs. 4-day sequences)
  – seasonal vs. year-round definition
  – variable: all based on SLP, several additional variables used
GOAL
assess the synoptic-climatological applicability of classifications, i.e., how well they stratify surface weather (climate) conditions
demonstrate the effect of
  – sequencing
  – seasonal vs. annual definition
  – adding more variables: 500 hPa height, 500 hPa vorticity, 850/500 hPa thickness
  – number of types
Classifications examined
11 methods
  – 30 classifications available for each of them
  – differing in: sequencing (no x 4 days); additional variables (Z500, THICK850/500, VOR500, all together); number of types (9, 18, 27)
5 methods
  – additional 6 classifications available
  – differing in seasonality of definition (year-round x seasonal)
TOOL
2-sample Kolmogorov-Smirnov test
tests the equality of the distribution of the climate element under one type against its distribution under all the other types
TOOL
at each station, the types for which the K-S test rejects the equality of distributions are counted
the larger the count, the better the stratification, and hence the better the synoptic-climatological applicability
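The counting procedure at one station can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the 5% significance level `alpha`, and the asymptotic p-value approximation are all assumptions, since the slides do not state them. In practice a library routine such as `scipy.stats.ks_2samp` would normally be used instead of the hand-written test.

```python
import math
from bisect import bisect_right

def ks_2samp_asymptotic(a, b):
    """Two-sample K-S statistic D and an approximate (asymptotic) p-value."""
    a, b = sorted(a), sorted(b)
    n1, n2 = len(a), len(b)
    # D = largest gap between the two empirical CDFs, evaluated at every data value
    d = max(abs(bisect_right(a, x) / n1 - bisect_right(b, x) / n2) for x in a + b)
    ne = n1 * n2 / (n1 + n2)  # effective sample size
    lam = (math.sqrt(ne) + 0.12 + 0.11 / math.sqrt(ne)) * d
    if lam < 0.2:
        return d, 1.0  # the series below converges slowly here; p is ~1
    # Kolmogorov distribution: Q(lam) = 2 * sum_j (-1)^(j-1) * exp(-2 j^2 lam^2)
    p = 2 * sum((-1) ** (j - 1) * math.exp(-2 * j * j * lam * lam) for j in range(1, 101))
    return d, min(max(p, 0.0), 1.0)

def count_separated_types(values, types, alpha=0.05):
    """Count types whose distribution differs from that of all other days.

    values: daily climate element at one station (e.g. maximum temperature)
    types:  circulation type assigned to each day (same length as values)
    alpha:  significance level (assumed here; not given in the slides)
    """
    rejected = 0
    for t in sorted(set(types)):
        inside = [v for v, k in zip(values, types) if k == t]
        outside = [v for v, k in zip(values, types) if k != t]
        if not inside or not outside:
            continue  # a single-type classification cannot be tested
        _, p = ks_2samp_asymptotic(inside, outside)
        if p < alpha:
            rejected += 1
    return rejected
```

Dividing the returned count by the number of types gives the percentage of rejected tests used for ranking the classifications.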
ANALYSIS
preliminary results
maximum temperature
  – minimum temperature: very similar results
  – precipitation: different
domain 07 (central Europe)
39 stations from ECA&D database
winter (DJF)
Jan 1961 – Dec 2000
RANKING OF CLASSIFICATIONS
at all stations individually:
  – for each classification: the number of rejected K-S tests is counted
  – classifications are ranked by the percentage of rejected K-S tests (= well separated classes)
  – higher percentage means better, i.e. lower rank
for each classification: ranks are averaged over stations to give the area mean rank, which determines the ranking of the classification
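A minimal sketch of this ranking step, assuming the percentages of rejected K-S tests are already available per classification and station. The function name, the input layout, and the treatment of ties (tied classifications share the average rank) are assumptions for illustration; the slides do not say how ties were handled.

```python
def area_mean_ranks(pct_rejected):
    """Average per-station ranks of each classification.

    pct_rejected: dict mapping classification name -> list of percentages of
                  rejected K-S tests, one entry per station (same station order).
    Returns a dict mapping classification name -> area mean rank.
    """
    names = list(pct_rejected)
    n_stations = len(next(iter(pct_rejected.values())))
    rank_sums = {name: 0.0 for name in names}
    for s in range(n_stations):
        scores = [pct_rejected[name][s] for name in names]
        for name in names:
            x = pct_rejected[name][s]
            higher = sum(1 for v in scores if v > x)
            equal = sum(1 for v in scores if v == x)  # includes the item itself
            # Highest percentage gets rank 1; ties share the average rank (assumption)
            rank_sums[name] += higher + (equal + 1) / 2
    return {name: rank_sums[name] / n_stations for name in names}

# Hypothetical percentages for three classifications at three stations
example = {"X": [100, 80, 90], "Y": [50, 80, 60], "Z": [75, 20, 30]}
mean_ranks = area_mean_ranks(example)
final_order = sorted(mean_ranks, key=mean_ranks.get)  # best classification first
```

Sorting by the area mean rank, as in the last line, yields the final ranking of the classifications.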
Result 1: comparison of methods area mean ranks averaged over 30 realizations of each method result: order of the method, independent of any attribute (no. of types, sequencing, variable)
Result 1: comparison of methods so the winner is…
Result 1: comparison of methods
 1  cluster analysis of PCs
 2  SANDRA (optimized k-means)
 3  C-k-means
 4  random centroids
 5  Lund correlation-based
 6  Kruizinga
 7  PCA-extreme scores reassigned by k-means
 8  obliquely rotated T-mode PCA
 9  PCA-extreme scores reassigned by Eucl. distance
10  Erpicum
11  orthogonally rotated T-mode PCA
NOTE: not all methods participated in the race!
Result 2: sensitivity to the number of types
all pairs of classifications
  – differing in no. of types (9 vs. 18 vs. 27)
  – with all other attributes equal
difference in rank is calculated
histogram of differences
t-test: equality of the difference to zero
-106 ± 17
Result 2: sensitivity to the number of types
all pairs of classifications
  – differing in no. of types (9 vs. 18 vs. 27)
  – with all other attributes equal
difference in rank is calculated
histogram of differences
t-test: equality of the difference to zero
-55 ± 12
Result 3: effect of sequencing
all pairs of classifications
  – differing in sequencing (no vs. 4-day)
  – with all other attributes equal
difference in rank is calculated
histogram of differences
t-test: equality of the difference to zero
-30 ± 11
Result 4: effect of seasonality
all pairs of classifications
  – differing in the seasonality in their definition
  – with all other attributes equal
difference in rank is calculated
histogram of differences
t-test: equality of the difference to zero
-44 ± 24
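The paired comparison used in Results 2 to 4 can be sketched as below: for classifications that differ in a single attribute, the rank differences are collected and their mean is tested against zero with a one-sample t-test. The function name and the example numbers are hypothetical, and the slides do not say whether the reported "±" is a standard error or a confidence half-width, so the sketch simply returns the mean, its standard error, and the t statistic.

```python
import math
from statistics import mean, stdev

def paired_rank_difference(ranks_a, ranks_b):
    """Mean rank difference (a minus b), its standard error, and the t statistic.

    ranks_a, ranks_b: ranks of paired classifications that differ in exactly one
    attribute (e.g. 9 types vs. 27 types), in the same pair order.
    Needs at least two pairs.
    """
    diffs = [a - b for a, b in zip(ranks_a, ranks_b)]
    n = len(diffs)
    m = mean(diffs)
    se = stdev(diffs) / math.sqrt(n)  # standard error of the mean difference
    if se == 0:
        # All differences identical: t is undefined (0/0) or infinite
        t = 0.0 if m == 0 else math.copysign(math.inf, m)
    else:
        t = m / se  # compare with Student's t distribution, n - 1 degrees of freedom
    return m, se, t

# Hypothetical ranks for four pairs of classifications
m, se, t = paired_rank_difference([10, 12, 9, 11], [20, 18, 21, 19])
```

A clearly negative mean difference with a large |t| indicates that the first group of classifications systematically attains lower (better) ranks than the second.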
Result 5: effect of additional variables +42 ± ± 18
Result 5: effect of additional variables +61 ± ± 18
CONCLUSIONS
various kinds of cluster analysis perform well
fewer types mean better performance
sequencing adds value: surface temperature is better described by types of 4-day sequences than by types of instantaneous fields
seasonal definition is better than annual, but:
  – there is a systematic difference in the number of types (7 vs. 9)
additional variables bring no benefit; in fact, they worsen the synoptic-climatological applicability
OUTLOOK
analysis to be extended to
  – all domains
  – more variables (Tmin, Precip)
more comparisons will be possible
results may be more general
several other criteria as well
other datasets (gridded: ENSEMBLES, reanalyses)