Computing in Archaeology Session 9. Sampling Assemblages © Richard Haddlesey www.medievalarchitecture.net
Aims To become familiar with sampling practices in an archaeological context
Introduction to Sampling An area of excavation is a sample of the complete site, which is itself a sample of all sites of that type. The same goes for artefact assemblages. The essence of all sampling is to gain the maximum amount of information by measuring or testing just a part of the available material. Fletcher & Lock 2005, 66
Archaeological sample Sampled population Target population
Formal definitions Population: the whole group or set of objects about which inference is to be made. Sampling frame: a list of the items, units or objects that could be sampled. Variable: a characteristic which is to be measured for the units, such as weight of spearheads. Fletcher & Lock 2005, 66
Formal definitions Sample: the subset or part of the population that is selected. Sample size: the number in the sample. A sample size of 5 is considered small, while, formally, a sample size of 50 is large. The sample size may be stated as a percentage of the sampling frame, e.g. a 10% sample. Fletcher & Lock 2005, 67
Sampling strategies a simple random sample (probability sample USA) a systematic sample a stratified sample a cluster sample
population – 100 units: 100 obsidian spearheads
population – 100 units [figure: the units numbered 1 to 100]
A simple random number sample
Random sampling If we have a population of 100 spearheads, we simply pick 10 random numbers (i.e. a 10% sample). Computers can help generate random sequences, but are not necessary. You must avoid bias in your selection, as a biased selection can attract criticism from others.
a simple random number sample [figure: the units numbered 1 to 100]
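The 10% random draw described above can be sketched in a few lines of Python. This is only an illustration: the list `frame` stands in for the sampling frame of 100 numbered spearheads, and the seed value is an arbitrary assumption used so that the same draw can be repeated.

```python
import random

# Hypothetical sampling frame: spearheads labelled 1 to 100.
frame = list(range(1, 101))

# A fixed seed (arbitrary choice) makes the draw repeatable for checking.
rng = random.Random(9)

# Draw 10 distinct units (a 10% sample) without replacement.
sample = sorted(rng.sample(frame, k=10))

print(sample)
```

Leaving the selection to a random number generator means that no unit is favoured by the person doing the sampling.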
A systematic sample
Systematic sampling To take a systematic approach, we could choose every number ending in 4. Once again this would give us our 10%. This method has the advantage of being easy to design, but it can mislead if the units have inherent patterning in their order.
a systematic sample [figure: the units numbered 1 to 100]
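As a minimal sketch of the "every number ending in 4" rule, the selection can be written as a fixed start and a fixed interval. The names `frame`, `start` and `interval` are illustrative assumptions, not part of the original slides.

```python
# Hypothetical sampling frame: spearheads labelled 1 to 100.
frame = list(range(1, 101))

interval = 10  # take one unit in every ten
start = 4      # first unit selected; often this would be chosen at random

# Slice from the start position in steps of the interval.
sample = frame[start - 1::interval]

print(sample)  # [4, 14, 24, 34, 44, 54, 64, 74, 84, 94]
```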
A stratified sample
Stratified sampling Here we divide the units into groups (strata) and take a random sample from each: 5 from the top and 5 from the bottom, or 5 from the left and 5 from the right, etc.
a stratified sample [figure: the units numbered 1 to 100]
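The "5 from the top, 5 from the bottom" idea might be sketched as follows. The sketch assumes that "top" means units 1 to 50 and "bottom" means units 51 to 100; that split, the stratum names and the seed are assumptions made for illustration only.

```python
import random

# Hypothetical sampling frame: spearheads labelled 1 to 100.
frame = list(range(1, 101))

# Assumed strata: top half (1-50) and bottom half (51-100).
strata = {"top": frame[:50], "bottom": frame[50:]}

rng = random.Random(9)  # arbitrary seed so the draw is repeatable

# Draw a simple random sample of 5 units within each stratum.
sample = {name: sorted(rng.sample(units, k=5)) for name, units in strata.items()}

for name, units in sample.items():
    print(name, units)
```

Every stratum is guaranteed to be represented, which a simple random sample of 10 cannot promise.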
A cluster sample
Cluster sampling Rather than select individual items, select clusters or groups of items that are close together. This may result in biased values.
a cluster sample [figure: the units numbered 1 to 100]
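One way to sketch cluster sampling is to group neighbouring units and then select whole groups at random. The cluster size of 5 consecutive units and the choice of 2 clusters are assumptions picked so that the result is again a 10% sample; the slides do not specify them.

```python
import random

# Hypothetical sampling frame: spearheads labelled 1 to 100.
frame = list(range(1, 101))

# Assumed clusters: 20 groups of 5 consecutively numbered units.
cluster_size = 5
clusters = [frame[i:i + cluster_size] for i in range(0, len(frame), cluster_size)]

rng = random.Random(9)  # arbitrary seed so the draw is repeatable

# Select 2 whole clusters at random, then take every unit inside them.
chosen = rng.sample(clusters, k=2)
sample = sorted(unit for cluster in chosen for unit in cluster)

print(sample)
```

Because the 10 units come from only two places, units that resemble their neighbours will make the sample less varied than the population.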
Downside to systematic sampling It can totally miss this context
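A small sketch can show how a regular interval misses a patterned context. It assumes the 100 units are laid out in rows of 10 and that a hypothetical context occupies one column (every unit ending in 7); neither detail comes from the slides.

```python
# Hypothetical sampling frame: spearheads labelled 1 to 100.
frame = list(range(1, 101))

# Assumed context: one column of a 10-wide layout (units ending in 7).
context = {u for u in frame if u % 10 == 7}

# Systematic sample: every unit ending in 4.
sample = [u for u in frame if u % 10 == 4]

# Units of the context that the sample actually reached.
hits = [u for u in sample if u in context]

print(len(context), len(hits))  # 10 0
```

The interval lines up with the layout, so the sample never lands in the context however many rows there are.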
Common sample statistics: x̄ – the sample mean; s – the sample standard deviation; p – the sample proportion (i.e. the proportion of the sample having a particular characteristic)
Stats The true population values for these statistics are usually unknown, and formally denoted by Greek letters
Common sample statistics:
known value                          estimate for
x̄ – the sample mean                  μ – the population mean
s – the sample standard deviation    σ – the population standard deviation
p – the sample proportion            π – the population proportion
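The three sample statistics can be computed with Python's standard `statistics` module. The weights below are invented purely for illustration, and the 40 g threshold used for the proportion is likewise a hypothetical characteristic.

```python
import statistics

# Invented weights (in grams) for a sample of 10 spearheads.
weights = [41.2, 38.5, 44.0, 39.8, 42.1, 40.3, 37.9, 43.6, 41.0, 39.6]

# Sample mean: the estimate for the population mean (mu).
x_bar = statistics.mean(weights)

# Sample standard deviation (n - 1 denominator): the estimate for sigma.
s = statistics.stdev(weights)

# Sample proportion heavier than a hypothetical 40 g threshold: the estimate for pi.
p = sum(w > 40 for w in weights) / len(weights)

print(round(x_bar, 2), round(s, 2), p)
```

Note that `statistics.stdev` is the sample version; `statistics.pstdev` would treat the data as a complete population.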
The central-limit theorem (the law of averages) In order to comment on how good an estimate the sample statistics are, the nature of their distribution needs to be known. See Fletcher & Lock 2005, Digging Numbers (2nd ed.), Oxbow, 70-9
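The theorem can be illustrated by simulation: the means of repeated samples cluster around μ with a spread close to σ/√n, even when the population itself is skewed. The sketch below is a rough demonstration under invented assumptions (a simulated population of 1000 weights, samples of 50 drawn with replacement, 2000 repeats); it is not taken from Fletcher & Lock.

```python
import random
import statistics

rng = random.Random(9)  # arbitrary seed so the simulation is repeatable

# Invented, strongly skewed population of 1000 weights (mean around 40).
population = [rng.expovariate(1 / 40) for _ in range(1000)]
mu = statistics.mean(population)
sigma = statistics.pstdev(population)

# Draw many samples of size n (with replacement) and record each sample mean.
n = 50
sample_means = [
    statistics.mean(rng.choices(population, k=n)) for _ in range(2000)
]

# The sample means should centre on mu ...
print(round(mu, 2), round(statistics.mean(sample_means), 2))
# ... with a spread close to sigma / sqrt(n).
print(round(sigma / n ** 0.5, 2), round(statistics.stdev(sample_means), 2))
```

The second line of output is the practical point: the larger the sample size n, the smaller the spread of the sample mean around the population mean.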