SCIENZE CHIMICHE E SINDONE


Robust statistical analysis on the 14-C dating by the three official laboratories. Marco Riani, Director of the Interdepartmental Research Centre of Robust Statistics of the University of Parma (http://rosa.unipr.it). Joint work with A.C. Atkinson (LSE), F. Crosilla (Univ. of Udine), G. Fanti (Univ. of Padua).

Starting point: strip of material cut from one corner of the Turin Shroud (TS) in 1988.

Starting point: strip of material cut from one corner of the TS. Arizona: 4 measurements (only A1, or A1+A2?). Oxford: 3 measurements. Zurich: 5 measurements. Notation: A1 means Arizona dated only A1; A1+A2 means Arizona dated both A1 and A2.

Estimated ages of the individual samples with calculated s.e. (source: Nature). The general model for observation j at site i is:
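A form of the model consistent with the symbols used on the following slides (site means µi and known quantities vij that become weights 1/vij) can be sketched as below. This is an assumption reconstructed from that notation, not a quotation; σ and the error term ε are introduced here for illustration.

```latex
y_{ij} = \mu_i + \sigma \, v_{ij} \, \varepsilon_{ij},
\qquad i = 1, 2, 3, \quad j = 1, \dots, n_i ,
```

Here the ε_ij are assumed to be independent errors with mean 0 and unit variance, so that vij = 1 gives the ordinary unweighted case.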

Question: are the µi all equal? There are 3 possibilities for the error structure:
1. Unweighted analysis: standard analysis of variance (ANOVA) with vij = 1.
2. Original weights: we weight the observations by 1/vij (weighted ANOVA).
3. Modified weights for Arizona: taking into account the fact that the S.E. for Arizona are roughly 2/3 of those for the other sites.
Damon et al. (1988): “the errors include the statistical (counting) error, the scatter of results for standards and blanks, and the uncertainty in the δ13C determination (Arizona includes the δ13C error at a later stage, when combining subsample results)”.

Question: are the µi all equal?

Summary: are the µi all equal?
1. Unweighted analysis: standard ANOVA.
2. Original weights: weighted ANOVA.
3. Modified weights: weighted ANOVA with correction for Arizona.
Remark: ANOVA can correctly be applied only if the variances across the three sites are similar (homogeneous). Box test.
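The unweighted and weighted ANOVA options can be illustrated with a minimal sketch. It assumes that the vij are standard errors and that "weighting by 1/vij" corresponds to weighted least squares with weights 1/vij². The function name `weighted_anova` and all numbers are hypothetical and synthetic; they are not the published datings.

```python
import numpy as np
from scipy import stats


def weighted_anova(groups, ses=None):
    """One-way ANOVA F-test.

    groups: list of 1-D sequences, one per site.
    ses:    optional list of standard errors with the same shape;
            observation weights are taken as 1 / se**2 (an assumption).
    """
    y = np.concatenate([np.asarray(g, dtype=float) for g in groups])
    if ses is None:
        w = np.ones_like(y)  # unweighted case, v_ij = 1
    else:
        w = 1.0 / np.concatenate([np.asarray(s, dtype=float) for s in ses]) ** 2
    labels = np.concatenate([np.full(len(g), i) for i, g in enumerate(groups)])
    k, n = len(groups), len(y)

    grand_mean = np.sum(w * y) / np.sum(w)
    ss_between = 0.0
    ss_within = 0.0
    for i in range(k):
        m = labels == i
        mean_i = np.sum(w[m] * y[m]) / np.sum(w[m])
        ss_between += np.sum(w[m]) * (mean_i - grand_mean) ** 2
        ss_within += np.sum(w[m] * (y[m] - mean_i) ** 2)

    f_stat = (ss_between / (k - 1)) / (ss_within / (n - k))
    p_value = stats.f.sf(f_stat, k - 1, n - k)
    return f_stat, p_value


# Synthetic illustration: three sites with 4, 3 and 5 measurements.
groups = [[10.0, 12.0, 11.0, 13.0], [14.0, 15.0, 13.0], [9.0, 10.0, 11.0, 10.0, 12.0]]
ses = [[1.0, 1.2, 0.9, 1.1], [1.5, 1.4, 1.6], [1.3, 1.2, 1.5, 1.4, 1.3]]

f_unweighted, p_unweighted = weighted_anova(groups)
f_weighted, p_weighted = weighted_anova(groups, ses)
print(f_unweighted, p_unweighted)
print(f_weighted, p_weighted)
```

With all weights equal, the function reduces to the standard one-way ANOVA, which gives a simple way to check it against `scipy.stats.f_oneway`.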

P-values of the results of the ANOVA test. Conclusions: the variability is homogeneous among sites; the means are significantly different.

In addition to the TS, each laboratory dated 3 controls: linen from a Nubian tomb; an Egyptian mummy from Thebes; threads from a cope from Var (France). None of the datings of these samples was controversial. Question: can the difference in means found for the TS also be found for the 3 controls?

P-values of the results of the ANOVA test for the three controls. Conclusions: the variability is homogeneous among sites; the means are not significantly different.

Shroud sample and control samples. Shroud sample: homogeneity of variances, heterogeneity of means. Control samples: homogeneity of variances, so no evidence of systematic differences between laboratories; homogeneity of means, so no evidence of heterogeneity in the means. Question: where does the marked heterogeneity in the TS datings come from?

Spatial layout. We now check whether there is any trend in the age of the material. x1 = horizontal (longitudinal) coordinate; x2 = vertical (lateral) coordinate.

More formally, we want to test whether the following regression model holds: y = Xβ + ε. Here y is a 12 × 1 vector containing the raw dates obtained from the 3 laboratories, and X is a 12 × 3 matrix containing, respectively, the intercept, the longitudinal (horizontal) coordinate (x1) and the lateral (vertical) coordinate (x2).

The structure of the regression model. In our case we have Y = β0 + β1 X1 + β2 X2 + ε, where Y = radiocarbon dating, X1 = horizontal coordinate, X2 = vertical coordinate, ε = error term. For example, if β1 = 0 we expect to find an estimated value of β1 which is centred around zero and is not significant. In order to test the significance of X1 and X2 we use the t-statistics.

Recap about t-statistics. A t-statistic is used to test the significance of the corresponding regressor. If the true value of the regression parameter β is equal to 0, then the sampling distribution of the t-statistic is the Student's t-distribution with ν = n − k degrees of freedom, where n is the number of observations and k is the number of regressors (including the intercept).
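The recap above can be sketched in code: each t-statistic is the estimated coefficient divided by its standard error, compared with a Student's t-distribution on n − k degrees of freedom. The function name `ols_t_statistics` and the data are hypothetical; the data are simulated purely for illustration.

```python
import numpy as np
from scipy import stats


def ols_t_statistics(X, y):
    """Return OLS coefficients, their t-statistics and two-sided p-values.

    X must already contain the intercept column; k counts all columns.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    s2 = resid @ resid / (n - k)                  # residual variance estimate
    cov_beta = s2 * np.linalg.inv(X.T @ X)        # covariance of the estimates
    se = np.sqrt(np.diag(cov_beta))
    t = beta / se
    p = 2.0 * stats.t.sf(np.abs(t), df=n - k)     # two-sided p-values
    return beta, t, p


# Simulated example: 12 observations, a real effect of x1 and none of x2.
rng = np.random.default_rng(0)
n = 12
x1 = rng.uniform(0.0, 10.0, n)
x2 = rng.uniform(0.0, 2.0, n)
y = 5.0 - 0.5 * x1 + rng.normal(0.0, 0.5, n)
X = np.column_stack([np.ones(n), x1, x2])

beta, t, p = ols_t_statistics(X, y)
print("coefficients:", beta)
print("t-statistics:", t)
print("p-values:    ", p)
```

With n = 12 and k = 3, as in the model of the talk, the reference distribution has 9 degrees of freedom.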

Problem: we do not know how the material was cut inside each laboratory.

Problem: we do not know how the material was cut inside each laboratory. Solution: we consider all the 387072 possible ways in which the material could have been cut, that is, 387072 regression models of the kind Y = β0 + β1 X1 + β2 X2 + ε.

Arrangements investigated for the Oxford sample (24=3!×4)

Arrangements investigated for the Zurich sample (96=24×4)

Arrangements investigated for Arizona (72 if Arizona dated only A1)

Arrangements investigated for Arizona (96 if Arizona dated both A1 and A2)
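The per-laboratory counts on these slides are consistent with the total of 387072 configurations, assuming that the choices for the three laboratories combine freely (so the counts multiply) and that the two Arizona hypotheses are alternatives (so their counts add). A quick check, with variable names chosen here for illustration:

```python
import math

oxford = math.factorial(3) * 4      # 24 arrangements
zurich = 24 * 4                     # 96 arrangements
arizona_a1 = 72                     # Arizona dated only A1
arizona_a1_a2 = 96                  # Arizona dated both A1 and A2

total = oxford * zurich * (arizona_a1 + arizona_a1_a2)
print(total)  # 387072
```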

We consider 387072 possible bivariate regressions. For each regression we store the values of the t-statistic of the horizontal (x1) and vertical (x2) coordinate. How to interpret the 387072 numbers for the significance of x1? How to interpret the 387072 numbers for the significance of x2? We have to produce a confidence band!
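The idea of fitting one regression per configuration and storing the t-statistics can be sketched on a toy problem. This is a deliberately simplified illustration: the piece positions are fixed and only the assignment of dates to positions within each laboratory is permuted, whereas the real study also varies the cutting layout. The laboratory names, positions and dates are all hypothetical.

```python
import itertools
import numpy as np


def t_stats(X, y):
    """t-statistics of all OLS coefficients (X includes the intercept)."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    s2 = resid @ resid / (n - k)
    se = np.sqrt(s2 * np.diag(np.linalg.inv(X.T @ X)))
    return beta / se


# Hypothetical (x1, x2) positions of the pieces held by each laboratory.
positions = {
    "lab_a": [(1, 0), (2, 1)],
    "lab_b": [(4, 0), (5, 1), (6, 0)],
    "lab_c": [(8, 1), (9, 0), (10, 1)],
}
# Hypothetical measurements; which piece gave which value is unknown.
dates = {
    "lab_a": [10.0, 9.5],
    "lab_b": [8.0, 7.6, 7.9],
    "lab_c": [6.1, 5.8, 6.3],
}

labs = list(positions)
coords = np.array([p for lab in labs for p in positions[lab]], dtype=float)
X = np.column_stack([np.ones(len(coords)), coords])  # intercept, x1, x2

t_x1, t_x2 = [], []
per_lab_orders = (itertools.permutations(dates[lab]) for lab in labs)
for combo in itertools.product(*per_lab_orders):
    y = np.array([d for lab_dates in combo for d in lab_dates])
    t = t_stats(X, y)
    t_x1.append(t[1])
    t_x2.append(t[2])

print("configurations:", len(t_x1))
print("share of negative t for x1:", np.mean(np.array(t_x1) < 0))
print("share of negative t for x2:", np.mean(np.array(t_x2) < 0))
```

The two stored lists play the role of the 387072 values per coordinate described above, and they are what the later histograms and boxplots summarise.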

Ordered p-values for the vertical coordinate: significance level of the t-statistic from 387072 configurations, and envelopes from 100 simulations of each configuration.

Ordered p-values for the horizontal coordinate: significance level of the t-statistic from 387072 configurations, and envelopes from 100 simulations of each configuration.

Histograms of the values of t-statistics from 387072 possible configurations for the vertical coordinate. If the data were random, the histogram of the t-statistics would be centred around 0.

Histograms of the values of t-statistics from 387072 possible configurations for the horizontal coordinate. In all the 387072 configurations we obtain a negative value of the t-statistic!!!

An important point to remark. If there is no effect of the x1 coordinate, we expect that roughly half of the 387072 values of the t-stat are positive and roughly half are negative. All the 387072 values of the t-stat ARE NEGATIVE.

Conclusions and questions up to now. x2 is not significant but x1 is! This result is expected because the sample is long (in x1) and thin (in x2). What gives rise to the two peaks for x1?

Projections of cases to consider. Whether Arizona analyzed A2 or not clearly has a large effect on the range of values of x1 and so on the t-statistic. Reduction of cases: for A1+A2 the original 96 cases become 52; for A1 the original 72 become 31. Just 83 possible configurations for Arizona. The horizontal projection of the 387072 configurations gives 42081 possibilities.

Arrangements investigated for Arizona (72 if Arizona dated only A1). Considering just the horizontal projection we have only 31 cases: C4,2 = 6 cases, 1 case, and 4! = 24 cases.

Summary of the results. For each of the 83 possible horizontal configurations for Arizona there are 507 different ways to obtain another configuration for Oxford or Zurich. We show the distribution of the t-statistics for regression only on x1 for each of these 83 configurations using boxplots.
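The counts quoted for the horizontal projection fit together arithmetically, as the following check shows. The groupings under the braces are labels added here for readability.

```latex
\underbrace{6 + 1 + 24}_{\text{A1}} = 31,
\qquad
\underbrace{52}_{\text{A1+A2}} + \underbrace{31}_{\text{A1}} = 83,
\qquad
83 \times 507 = 42081 .
```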

Recap on the boxplot. It is based on the first quartile, the median, the third quartile and a rule for detecting outliers.
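The ingredients of a boxplot can be computed directly. The sketch below assumes the common Tukey convention, in which points further than 1.5 interquartile ranges beyond the quartiles are flagged as outliers; the slide does not say which rule was used, so the 1.5 factor and the function name `boxplot_summary` are assumptions.

```python
import numpy as np


def boxplot_summary(values, whisker=1.5):
    """Quartiles, fences and flagged outliers for a 1-D sample."""
    x = np.asarray(values, dtype=float)
    q1, median, q3 = np.percentile(x, [25, 50, 75])
    iqr = q3 - q1
    lower_fence = q1 - whisker * iqr
    upper_fence = q3 + whisker * iqr
    outliers = x[(x < lower_fence) | (x > upper_fence)]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "lower_fence": lower_fence,
        "upper_fence": upper_fence,
        "outliers": outliers,
    }


# Made-up sample with one value far from the rest.
summary = boxplot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 50])
print(summary)
```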

Recap on the boxplot. Example of histogram and boxplot for normal data.

Recap on the boxplot. Example of histogram and boxplot for asymmetric data.

Summary of the results. For each of the 83 possible horizontal configurations for Arizona there are 507 different ways to obtain another configuration for Oxford or Zurich. In the plot of the next slide we present the boxplots of the t-statistics for regression only on x1, divided according to these 83 configurations.

Boxplots of the distribution of the t-statistics for x1 for each of the 83 configurations of Arizona (horizontal axis: index number, from 1 to 83). On top of the boxplots for A1+A2 we put the value of y (radiocarbon dating) associated with x1=41.

Question: which configurations are plausible? Histograms of values of the t-stat, divided according to the value of y from A1+A2 associated with x1=41.

A1: analysis of robust residuals for a representative configuration. LTS residuals.

A1: analysis of residuals for a representative configuration. Monitoring residuals plot.

A1+A2: analysis of residuals for a representative configuration when x1=41 is associated with y=591. Monitoring residuals plot. The observation x1=41 and y=591 is a clear outlier.

A1+A2: analysis of residuals for a representative configuration when x1=41 is associated with y=591. LTS robust residuals.
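For readers unfamiliar with LTS (least trimmed squares), a minimal sketch follows. LTS chooses the coefficients that minimise the sum of the h smallest squared residuals. With only 12 observations the exact solution can be found by fitting ordinary least squares to every subset of size h. This brute-force version, the default h, and the raw scale estimate without a consistency factor are illustrative assumptions and not necessarily what was used for the slides; the data are made up.

```python
import itertools
import numpy as np


def lts_fit(X, y, h=None):
    """Exact least trimmed squares by enumerating all subsets of size h.

    Returns the coefficients and the residuals of all observations,
    scaled by a raw (uncorrected) estimate of the error scale.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    if h is None:
        h = (n + k + 1) // 2          # common high-breakdown choice
    best_ssr, best_beta = np.inf, None
    for subset in itertools.combinations(range(n), h):
        idx = list(subset)
        beta, *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)
        r = y[idx] - X[idx] @ beta
        ssr = r @ r
        if ssr < best_ssr:
            best_ssr, best_beta = ssr, beta
    scale = np.sqrt(best_ssr / h)     # raw scale, no consistency factor
    scaled_resid = (y - X @ best_beta) / scale
    return best_beta, scaled_resid


# Made-up data: a linear trend with small deterministic noise and one outlier.
x = np.arange(12, dtype=float)
y = 2.0 + 0.5 * x + 0.1 * np.sin(3.0 * x)
y[5] += 10.0                          # contaminated observation
X = np.column_stack([np.ones_like(x), x])

beta_lts, resid_lts = lts_fit(X, y)
print("LTS coefficients:", beta_lts)
print("largest scaled residual at index:", int(np.argmax(np.abs(resid_lts))))
```

Because the fit is driven by the best-behaved subset, a contaminated observation does not pull the line towards itself and shows up with a large residual, which is how an observation like the one above is flagged.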

Arizona dated only A1 or A1+A2? The configurations with A1+A2 lead to detection of the presence of one or more outliers! Arizona is likely to have dated only A1!

Confirmation from Arizona researchers (two years after our analysis): Arizona only dated A1.

Overall conclusions. The 12 measurements of the age of the TS cannot be considered as repeated measurements from a single unknown quantity! Evidence of a strong linear trend. Arizona only dated A1. Heterogeneity of the data.

Overall conclusions. The statement of Damon et al. (1988) that “The results provide conclusive evidence that the linen of the TS is mediaeval” must be reconsidered in the light of the strong evidence produced by our use of robust statistical techniques. Any analysis of the 12 measurements which does not take into account the spatial trend ignores an important component!

More about the TS in «Statistics and Computing»

Fit and forecasts (with 95% confidence bands) for one of the plausible configurations (axis in millimetres).

Other datings. It is also worth mentioning the results from mechanical, chemical and numismatic dating (Fanti G., P. Malfi, Pan Stanford 2015; Fanti G. P. Malfi, F. Crosilla & Baraldi P., A. Tinti, MATEC Web of Conf, 36, 2015). While chemical dating (FT-IR and Raman) gives dates of 300 B.C. ±400 and 200 B.C. ±400, mechanical dating gives 400 A.D. ±400 and numismatic dating gives an age before 692 A.D. These datings are fully compatible with the age in which Jesus Christ lived in Palestine.