HOME-ES601 WG-1 Report to the 2nd MC, Vienna 23/11/2007
WG1 REPORT TO THE 2nd MC
Enric Aguilar, URV, Tarragona, Spain

SURVEY
General Data
Monthly to Annual data
Daily data
Benchmark Dataset
–Real data
–Simulated data

SURVEY. General Data
–39 surveys received
–3 of them provided by recognized homogenization experts from outside HOME
–Responses from 24 countries
–Thanks!

SURVEY. Data elements/resolutions of interest
Each survey answer has been given 100 points.
Temperature, precipitation and pressure are the elements of greatest interest and should be analyzed HOME-wide.
Other elements (e.g. sunshine duration, wind, indices) are mentioned a number of times, and HOME may consider analyzing them in smaller groups of interest.
Regarding resolution, daily data gathers the most interest, closely followed by monthly data.

SURVEY. Monthly data. Analysed elements
Different approaches, not individuals, are considered here.
11 different data elements or types have been mentioned.
Again, temperature, precipitation and pressure are dominant.

SURVEY. Monthly data. Detection and correction principles
Four approaches are dominant:
–Iterative t-test over reference series: SNHT
–Hypothesis testing over reference series: MASH
–Regression-based, with different correction principles: Vincent, RHTest
–Penalized likelihood using ANOVA on non-homogeneous series: Caussinus-Mestre

SURVEY. Monthly data. METHODS MENTIONED
Many surveys did not specifically mention a method.
SNHT, MASH and C-M are dominant.
SNHT users are the largest group.

SURVEY. Monthly data. REFERENCE CALCULATION
Reference series are widely used for homogenization.
Weighted averaging is the dominant approach for calculating them.
Relying on a-priori homogeneous series, regression-based references and multiple series are the other approaches mentioned.
WG1 encourages WG2, WG3 (and also WG4) to test different approaches to calculating references when the method allows.
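
The weighted-average reference mentioned above can be sketched as follows. This is a minimal illustration, not the procedure of any specific method in the survey: the choice of squared correlation as weight, the use of anomalies from each series' own mean, and the function names `pearson`, `reference_series` and `difference_series` are all assumptions made for the example.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length sequences.

    Raises ZeroDivisionError if either sequence is constant.
    """
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def reference_series(candidate, neighbours):
    """Weighted average of neighbour anomalies.

    Assumption: each neighbour is weighted by its squared
    correlation with the candidate series.
    """
    weights = [pearson(candidate, nb) ** 2 for nb in neighbours]
    anomalies = [[v - mean(nb) for v in nb] for nb in neighbours]
    total = sum(weights)
    return [
        sum(w * a[t] for w, a in zip(weights, anomalies)) / total
        for t in range(len(candidate))
    ]


def difference_series(candidate, neighbours):
    """Candidate anomalies minus the reference: the series a relative test inspects."""
    ref = reference_series(candidate, neighbours)
    c_mean = mean(candidate)
    return [(c - c_mean) - r for c, r in zip(candidate, ref)]
```

A detection test would then be run on the output of `difference_series` rather than on the raw candidate, so that the regional climate signal shared with the neighbours is largely removed.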

WG1 PROPOSAL FOR MONTHLY TO ANNUAL DATA METHODS (based on climatology literature)
Subjective methods: Craddock Test + Metadata
  They are very difficult to test comprehensively and their performance varies depending on the expert. This also applies to other methods mentioned in the survey.
Two Phase Regression: Wang (RHTest)
Hypothesis testing: MASH
Bayesian approaches: Caussinus-Mestre
Likelihood ratio tests: SNHT
Hierarchical regression models: Vincent, Reeves et al.
Combination of the last 3: Menne and Williams (in press)

EXISTING REVIEWS (based on climatology literature, 2003 onwards)
Reeves et al (2007)
Domonkos (in press)
Menne and Williams (2005)
De Gaetano (2005)
Ducré-Robitaille et al (2003)

RECENT LITERATURE SURVEY HIGHLIGHTS THE EXISTENCE OF MODIFICATIONS TO OUR SELECTED METHODS
An example: SNHT
Formulated by Alexandersson (1986) and Alexandersson et al (1997)
Recently, Khaliq and Ouarda have recalculated the critical values using improved MCS
Re-formulated by Reeves et al, 2006
–Non parametric SNHT
–Different standardization procedures
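
For orientation, the single-shift SNHT statistic in its original formulation can be sketched as below. The series is standardized, and for every candidate break position k the statistic T(k) = k·z̄₁² + (n−k)·z̄₂² is computed from the means before and after k; the test statistic T₀ is the maximum. The function name `snht` and the use of the sample standard deviation (n − 1 divisor) are assumptions of this sketch; the standardization is exactly the kind of detail that the modifications listed above vary. Critical values are deliberately left out, since they depend on series length and on which published table is used.

```python
from statistics import mean, stdev


def snht(q):
    """Return (T0, k) for the single-shift SNHT applied to series q.

    q is typically a candidate-minus-reference (or ratio) series.
    k is the number of values before the most likely break.
    """
    n = len(q)
    m = mean(q)
    s = stdev(q)  # assumption: sample standard deviation (n - 1 divisor)
    z = [(v - m) / s for v in q]

    best_t, best_k = float("-inf"), None
    for k in range(1, n):
        z1 = mean(z[:k])   # mean of standardized values before the break
        z2 = mean(z[k:])   # mean of standardized values after the break
        t = k * z1 ** 2 + (n - k) * z2 ** 2
        if t > best_t:
            best_t, best_k = t, k
    return best_t, best_k
```

The returned T₀ would then be compared against a critical value for the given n and significance level to decide whether the break at position k is accepted.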

WG1 PROPOSAL TO WG4. Methods
Interpolation of monthly factors
–MASH
–Vincent et al (2002)
Nearest neighbour resampling models, by Brandsma and Können (2006)
Higher Order Moments (HOM), by Della Marta and Wanner (2006)
Two phase non-linear regression (Mestre)

SURVEY. Daily data. Detection
Most approaches use detection on lower resolution data.

SURVEY. Daily data. Detection
Again, temperature, precipitation and pressure are the most analyzed elements.

SURVEY. Daily data. Correction
Very few approaches actually calculate special corrections for daily data.
Most approaches either
–Do nothing (discard data)
–Apply monthly factors
–Interpolate monthly factors
The survey points out several other alternatives that WG5 needs to investigate.
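
As an illustration of the third option, interpolating monthly factors, the sketch below pins each monthly adjustment to the middle day of its month and interpolates linearly between those points, wrapping from December to January. It is a simplification under stated assumptions: the function name `daily_adjustments` is hypothetical, and plain linear interpolation does not guarantee that the mean of the daily adjustments within a month equals the monthly adjustment, which published schemes may enforce by choosing the mid-month targets differently.

```python
import calendar
from datetime import date


def daily_adjustments(monthly, year):
    """Linearly interpolate 12 monthly adjustments to every day of `year`.

    monthly: sequence of 12 adjustments (January..December).
    Returns one adjustment per day of the year.
    """
    # Day-of-year (1-based) of the middle day of each month.
    anchors = []
    for month in range(1, 13):
        ndays = calendar.monthrange(year, month)[1]
        mid = date(year, month, (ndays + 1) // 2)
        anchors.append(mid.timetuple().tm_yday)

    year_len = 366 if calendar.isleap(year) else 365
    # Pad with the previous December and the following January
    # so the first and last days of the year can be interpolated.
    xs = [anchors[-1] - year_len] + anchors + [anchors[0] + year_len]
    ys = [monthly[-1]] + list(monthly) + [monthly[0]]

    out = []
    seg = 0
    for day in range(1, year_len + 1):
        while day > xs[seg + 1]:
            seg += 1
        frac = (day - xs[seg]) / (xs[seg + 1] - xs[seg])
        out.append(ys[seg] + frac * (ys[seg + 1] - ys[seg]))
    return out
```

Compared with applying one constant factor per month, this avoids artificial jumps in the adjusted series at month boundaries.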

METHODS: COMPLEMENTARY APPROACHES ARISING FROM OTHER SCIENTIFIC FIELDS
Other fields, such as applied statistics, econometrics, biometrics, medicine and genetics, have been sampled to find new methods or improvements to those already known to us.

Examples
CART
Mixture Models (EM)
Asymmetric kernels
Bayesian methods
Adaptive penalized likelihood
These methods are directly adapted to the problem of multiple change-points.

Comparative analysis of algorithms for identifying amplifications and deletions in array CGH data (Lai et al., Bioinformatics 21, 2005)

SURVEY. Real Dataset. Potential Datasets
Through the survey, we have been offered a large number of datasets.
Some of them are very specific (for example, rain and snow separated, or upper air data) and will be very useful for testing our general recommendations on other elements, regions, etc.
WG1 will make a recommendation based on
–Availability of basic elements
–Metadata availability
–Number of stations
–Data period
–Homogenization results (especially for daily data)

SURVEY. Potential Datasets to use action-wide
Table columns: Individual | Country | Stats | First year | Last year | Metadata | D-Pcp | D-StP | D-Tm | D-Txn | M-Pcp | M-StP | M-Tm | M-Txn
Table rows (Individual, Country): Allan, WG-SP ISPD; Auer, Austria; Brandsma, Netherlands; Cheval, Romania; Della Marta, Switzerland; Mestre, France; METEOCAT, Spain

–Cheval (Romania) & Prohom (Meteocat, North Eastern Spain)
  Cover all the pre-selected elements
  Metadata available
  Large number of stations and a sufficiently long period
–Mestre (France): complementary to Prohom's dataset; will help to assess border effects
–Auer
  Dense precipitation network
  Metadata available
–Allan
  Large dataset for pressure with exceptionally long station records
  Metadata available
–Brandsma (Netherlands) and Della Marta (Switzerland)
  Homogenization on a daily basis has been performed on these data

Benchmark dataset: most important parameters
sd: standard deviation
inhom: inhomogeneities
length: length of data record
global tr.: global trend
x-corr: cross correlations between stations
missing: missing data
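
To show how these parameters could interact in the simulated part of the benchmark, here is a minimal sketch of a synthetic network generator. It is not the benchmark design itself: the function name `simulate_network`, every default value, the Gaussian noise model, and the restriction of inhomogeneities to step changes are assumptions chosen only for illustration.

```python
import random


def simulate_network(n_stations=5, length=100, sd=1.0, trend=0.01,
                     x_corr=0.8, n_breaks=2, break_sd=0.8,
                     missing=0.05, seed=0):
    """Generate a toy station network with known inserted breaks.

    Returns (network, truth): network is a list of series (None marks a
    missing value), truth lists the break positions inserted per station.
    """
    rng = random.Random(seed)
    # Regional signal shared by all stations.
    regional = [rng.gauss(0.0, 1.0) for _ in range(length)]

    network, truth = [], []
    for _ in range(n_stations):
        # Mixing shared and local noise gives an expected pairwise
        # correlation of x_corr before trend and breaks are added.
        series = [
            sd * (x_corr ** 0.5 * regional[t]
                  + (1.0 - x_corr) ** 0.5 * rng.gauss(0.0, 1.0))
            + trend * t  # global trend, identical at every station
            for t in range(length)
        ]
        # Inhomogeneities: step changes at random positions.
        breaks = sorted(rng.sample(range(1, length), n_breaks))
        for b in breaks:
            shift = rng.gauss(0.0, break_sd)
            for t in range(b, length):
                series[t] += shift
        # Missing data: each value is dropped independently.
        values = [None if rng.random() < missing else v for v in series]
        network.append(values)
        truth.append(breaks)
    return network, truth
```

Because the break positions are returned alongside the data, a homogenization method run on `network` can be scored against `truth`, which is the point of using simulated data next to real datasets.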

SOFTWARE
Some of the proposed methods and other approaches have software ready to be tried on the surrogate data and real data networks:
–MASH-MISH (by Tamas Szentimrey)
–PRODIGE (Caussinus-Mestre, by Olivier Mestre)
–CLIMATOL (different methods, in R, by J.A. Guijarro)
–AnClim (different methods, by Petr Štěpánek)
–RHTest, in R (by Xiaolan Wang)
–CCRG suite (SNHT + Vincent's daily, by Enric Aguilar)
–THOMAS (by Paul Della Marta)
–…
Need to recode some of the algorithms into R for testing on idealized time series
Need to produce a final piece of software inside WG5

WG-1 Status
Survey groups or individuals using homogenization techniques. List currently used techniques together with their strengths and weaknesses. This will be achieved through a questionnaire and personal contact with recognized experts; an evaluation of results will be presented: 100%
Comprehensively search the literature (climate journals, grey literature and proceedings, non-climate sources): 50%
Classification of the methods according to statistical nature (parametric, non-parametric, etc.), data requirements (direct, absolute, relative) and time scope (annual, monthly, daily): 30%
Compilation of the Benchmark Dataset (catalogue of expected inhomogeneity situations, list of suitable real datasets, selection of real datasets, creation of simulated time series reproducing expected problems): 80%

WG1-Progress
Identify an expert team within WG1 to complete:
–Grey, tedious but very important!!!!
–Comprehensively search the literature (climate journals, grey literature and proceedings, non-climate sources): 50%
–Classification of the methods according to statistical nature (parametric, non-parametric, etc.), data requirements (direct, absolute, relative) and time scope (annual, monthly, daily): 30%
–Web site repository (private network): 0%

THANK YOU!!