Sampling and monitoring the environment-I Marian Scott August 2008.



Outline Variation. General sampling principles. Methods of sampling: simple random sampling; stratified sampling; systematic sampling; how many samples (power calculations).

Variation Natural variation in the attribute of interest might be due to, for example, feeding habits if measuring sheep, or rainfall patterns if measuring plants. There is also variation/uncertainty due to analytical measurement techniques. Natural variation may well exceed the analytical uncertainty. Expect, therefore, that if you measure a series of replicate samples, they will vary, and if there are enough of them you may be able to define the distribution of the attribute of interest.

from Gilbert and Pulsipher (2007)

Variation Figure: activity (log10) of particles (Bq Cs-137), with a Normal (Gaussian) density superimposed.

What is statistical sampling? Statistical sampling is a process that allows inferences about properties of a large collection of things (commonly described as the population) to be made from observations on a relatively small number of individuals belonging to the population (the sample). In conducting statistical sampling, one is attempting to make inferences about the population.

Statistical sampling The use of valid statistical sampling techniques increases the chance that a set of specimens (the sample, in the collective sense) is collected in a manner that is representative of the population. Statistical sampling also allows a quantification of the precision with which inferences or conclusions can be drawn about the population.

Statistical sampling The issue of representativeness is important because of the variability that is characteristic of environmental measurements. Because of variability within the population, its description from an individual sample is imprecise, but this precision can be described in quantitative terms and improved by the choice of sampling design and sampling intensity (Peterson and Calvin, 1986).

Good books The general sampling textbooks by Cochran (1977) and Thompson (1992), the environmental statistics textbook by Gilbert (1987), and papers by Anderson-Sprecher et al. (1994), Crepin and Johnson (1993), Peterson and Calvin (1986), and Stehman and Overton (1994).

Know what you are setting out to do before you start: describing a characteristic of interest (usually the average); describing the magnitude of variability of a characteristic; describing spatial patterns of a characteristic; mapping the spatial distribution; quantifying contamination above a background or specified intervention level; detecting temporal or spatial trends; assessing human health or environmental impacts of specific facilities, or of events such as accidental releases; assessing compliance with regulations.

Rules Rule 1: specify the objective

Rules Rule 1: specify the objective what is the average concentration? are there trends in space and time?

Rules Rule 1: specify the objective Rule 2: use your knowledge of the environmental context

Use your scientific knowledge Consider: the nature of the population, such as the physical or biological material of interest, its spatial extent, its temporal stability, and other important characteristics; the expected behaviour and environmental properties of the compound of interest in the population members; the sampling unit (i.e., individual sample or specimen); the expected pattern and magnitude of variability in the observations.

What is the population? The concept of the population is important. The population is the set of all items that could be sampled, such as all fish in a lake, all people living in the UK, all trees in a spatially defined forest, or all 20-g soil samples from a field. Appropriate specification of the population includes a description of its spatial extent and perhaps its temporal stability

What is the sampling unit? The environmental context helps define the sampling unit. It is not practical to consider sampling units so small that their concentration cannot be easily measured, nor to consider extremely large sampling units that are too difficult to manipulate or process. A sampling unit is a unique element of the population that can be selected as an individual sample for collection and measurement.

Sampling units In some cases, sampling units are discrete entities (e.g., animals, trees), but in others the sampling unit might be investigator-defined and arbitrarily sized. Statistical sampling leads to a description of the sampled members of the population and to inferences and conclusions about the population as a whole.

Example: cyanotoxins in shellfish The objective here is to provide a measure (the average) of cyanotoxins in shellfish (e.g., mussels for human consumption) for the west coast of Scotland. What is the population? What is the sampling unit?

Example: cyanotoxins in shellfish The population would be all mussels on the west coast. One problem with this study scenario is that the target population is all mussels on the west coast, but the sampled population may be just those mussels large enough to be caught by the prevailing commercial fishing methods. What is the sampling unit?

representativeness An essential concept is that the taking of a sufficient number of individual samples should provide a collective sample that is representative of all samples that could be taken and thus provides a true reflection of the population.

Representativeness A representative collective sample should reflect the population not only in terms of the attribute of interest, but also in terms of any incidental factors that affect the attribute of interest. Representativeness of environmental samples is difficult to demonstrate. Usually, representativeness is considered justified by the procedure used to select the samples.

5 step approach 1. Define the objectives and questions to be answered. 2. Summarize the environmental context for the quantities being measured. 3. Identify the population, including spatial and temporal extent. 4. Select an appropriate sampling design. 5. Document the sampling design and its rationale.

Methods Judgmental sampling Non-probability based: units are selected on judgment alone. Problems include that the sample may be biased, that precision cannot be quantified, and that representativeness is unknown. Ultimately, it is not possible to evaluate the accuracy or bias of an estimator based on such a sample. Expert knowledge, allied with probability sampling, is far superior to judgmental sampling.

Methods Simple random sampling With simple random sampling, every sampling unit in the population has, in theory, an equal probability of being included in the sample. The resulting estimator based on such a sample will be unbiased, but it may not be efficient, in either the statistical or practical senses. Simple random sampling designs are easy to describe but may be difficult to achieve in practice.

Population of N units, 10 randomly selected. Random digits: 5, 17, 23, 25, 31, 33, 42, 45, 46, 51

Methods Two-stage sampling This design involves definition of primary units, some fraction of which is selected randomly; the selected primary units are then sub-divided and a fraction of the sub-units is selected randomly. At each stage, the units in the design may be sub-divided and randomly selected. This design is useful for estimating components of variation, and it can be cost-effective.
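The two stages described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `two_stage_sample`, the field/plot labels and the sample sizes are hypothetical and do not come from the slides.

```python
import random

def two_stage_sample(primary_units, n_primary, n_sub, seed=None):
    """Pick n_primary primary units at random, then n_sub sub-units in each.

    primary_units maps a primary-unit label to the list of its sub-units.
    """
    rng = random.Random(seed)
    # Stage 1: simple random sample of primary units, without replacement.
    chosen = rng.sample(sorted(primary_units), n_primary)
    # Stage 2: simple random sample of sub-units inside each chosen unit.
    return {
        p: rng.sample(primary_units[p], min(n_sub, len(primary_units[p])))
        for p in chosen
    }

# Hypothetical population: 6 fields (primary units), each split into 10 plots.
fields = {f"field_{i}": [f"field_{i}/plot_{j}" for j in range(10)] for i in range(6)}
sample = two_stage_sample(fields, n_primary=3, n_sub=4, seed=1)
```

Because sub-units are measured within several primary units, the resulting data allow between-unit and within-unit variation to be separated.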

Methods Stratified sampling The population is divided into strata, each of which is likely to be more homogeneous than the entire population. In other words, the individual strata have characteristics that allow them to be distinguished from the other strata, and such characteristics are known to affect the measured attribute of interest. Some ordinary sampling method (e.g., a simple random sample or systematic sample) is used to estimate the properties of each stratum.

Methods Stratified sampling Usually, the proportion of sample observations taken in each stratum is similar to the stratum proportion of the population, but this is not a requirement. If good estimates are wanted for rare strata that have a small occurrence frequency in the population, then the number of samples taken from the rare strata can be increased. Stratified sampling is more complex and requires more prior knowledge than simple random sampling, and estimates of the population quantities can be biased if the stratum proportions are incorrectly specified.
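Proportional allocation, where each stratum receives a share of the sample matching its share of the population, might be sketched as follows. The stratum names and sizes are invented for illustration, and the largest-remainder rounding rule is one reasonable choice rather than anything prescribed in the slides.

```python
def proportional_allocation(stratum_sizes, n_total):
    """Split n_total samples across strata in proportion to stratum size."""
    N = sum(stratum_sizes.values())
    # Ideal (possibly fractional) share for each stratum.
    shares = {h: n_total * N_h / N for h, N_h in stratum_sizes.items()}
    alloc = {h: int(share) for h, share in shares.items()}
    # Hand out the samples lost to rounding down, largest remainder first.
    leftover = n_total - sum(alloc.values())
    by_remainder = sorted(shares, key=lambda h: shares[h] - alloc[h], reverse=True)
    for h in by_remainder[:leftover]:
        alloc[h] += 1
    return alloc

# Hypothetical strata (number of sampling units in each).
sizes = {"upland": 500, "lowland": 300, "wetland": 200}
allocation = proportional_allocation(sizes, n_total=50)
```

Oversampling a rare stratum is then a deliberate departure from this allocation; it is harmless provided the true stratum proportions are still used as weights when the estimates are combined.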

Methods Systematic sampling Systematic sampling is probably the most commonly used method for field sampling. It is generally unbiased as long as the starting point is randomly selected and the systematic rules are followed with care. Line transects and two dimensional grids are specific types of systematic samples that are described in more detail in the spatial section.

Methods Systematic sampling Systematic sampling is often more practical than random sampling because the procedures are relatively easy to implement in practice, but this approach may miss important features if the quantity being sampled varies with regular periodicity and the sampling scheme has similar periodicity.

Population of N (9 x 6 = 54) units, 9 systematically selected. Systematic selection: 6, 12, 18, 24, 30, 36, 42, 48, 54
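A systematic selection like this one can be generated as below. The function `systematic_sample` is a hypothetical helper; it assumes units are numbered 1 to N and that the start is drawn at random from the first k units unless it is fixed explicitly.

```python
import random

def systematic_sample(N, k, start=None, seed=None):
    """Return unit numbers (1..N) taken every k units from a random start."""
    if start is None:
        # A random start among the first k units is what keeps the design unbiased.
        start = random.Random(seed).randint(1, k)
    return list(range(start, N + 1, k))

# A 9 x 6 grid numbered 1..54 with sampling interval k = 6 and start fixed at 6.
units = systematic_sample(N=54, k=6, start=6)
```

With N = 54 and k = 6, any start between 1 and 6 yields nine units, evenly spread over the population.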

Methods Cluster sampling Cluster sampling is most frequently applied in situations where members of the population are found in clusters or colonies. Clusters of individuals are selected randomly, and all individuals within each selected cluster are measured. Another variant involves random selection of a fraction of the individuals within a cluster. Cluster sampling is a convenient and practical design if individuals naturally group within the population. Adaptive sampling is a form of cluster sampling in which decisions are made during the survey, particularly when a cluster, such as a community or herd, is detected unexpectedly.

Methods Double sampling A procedure known as double sampling can be useful when one characteristic may be difficult or expensive to measure but another related characteristic is simple or easy to measure. This might involve making a relatively large number of analyses using the more efficient technique, and selecting a few specimens from this sample on which to make the more expensive analysis. Then, if the two techniques yield a reasonably strong predictive relationship, one can use data from the efficient technique and the relationship to make an inference to the entire sample.
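One common way to exploit such a predictive relationship is a regression estimator: fit a straight line on the few specimens measured both ways, then adjust the expensive-method mean using the cheap-method mean of the whole sample. The sketch below assumes a linear relationship; the function name and the data are hypothetical, and the slides do not say which estimator was intended.

```python
def double_sampling_mean(x_all, x_sub, y_sub):
    """Regression estimate of the mean of y under double sampling.

    x_all: cheap measurements on the full (large) sample
    x_sub, y_sub: cheap and expensive measurements on the subsample
    """
    n = len(x_sub)
    x_bar_sub = sum(x_sub) / n
    y_bar_sub = sum(y_sub) / n
    # Least-squares slope of y on x, fitted on the subsample only.
    sxy = sum((x - x_bar_sub) * (y - y_bar_sub) for x, y in zip(x_sub, y_sub))
    sxx = sum((x - x_bar_sub) ** 2 for x in x_sub)
    slope = sxy / sxx
    # Shift the subsample mean of y by the difference in cheap-method means.
    x_bar_all = sum(x_all) / len(x_all)
    return y_bar_sub + slope * (x_bar_all - x_bar_sub)

# Hypothetical data: six cheap readings, three of which also have the costly analysis.
estimate = double_sampling_mean(x_all=[1, 2, 3, 4, 5, 6], x_sub=[1, 2, 3], y_sub=[2, 4, 6])
```

The gain over using the subsample alone depends on how strongly the two measurements are related; with a weak relationship the adjustment adds little.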

How many samples do I need? How many samples to do what? Estimate the mean (with a specified precision) Estimate the difference between two treatment groups Commonly classed as power calculations

Practical issues Qn 1: not being able to follow exactly the pre-determined statistical sampling design. Qn 2: Absence of suitable material is a common source of missing values in environmental sampling. What do we do?

So we have sampled, what next? Two of the most common sampling objectives are estimation of the mean and estimation of a proportion (e.g., the unknown fraction of a population exceeding a specified value). We consider how to achieve these under different sampling schemes.

Estimate the population mean Simple random sampling every sampling unit in the population is expected to have an equal probability of being included in the sample. The first step requires complete enumeration of the population members. In the simple random-sampling scheme, one generates a set of random digits that are used to objectively identify the individuals to be sampled and measured.

Estimate the population mean The sampling frame In simple random sampling, one might assume a population of N units (N areas of 100 cm² each) and use simple random sampling to select n of these units. This typically involves generating n random integers between 1 and N, which identify the units to sample. If a number is repeated, one simply generates a replacement.
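The draw-and-redraw procedure described above might look like this in Python. The function name and the values of N and n are illustrative assumptions.

```python
import random

def simple_random_sample(N, n, seed=None):
    """Select n distinct unit numbers from a frame numbered 1..N."""
    if n > N:
        raise ValueError("cannot sample more units than the population holds")
    rng = random.Random(seed)
    chosen = set()
    while len(chosen) < n:
        # A repeated number leaves the set unchanged, so it is simply redrawn.
        chosen.add(rng.randint(1, N))
    return sorted(chosen)

# Hypothetical frame of 54 units, 10 of which are to be sampled.
selected = simple_random_sample(N=54, n=10, seed=42)
```

In practice `random.sample(range(1, N + 1), n)` gives the same kind of sample in one call; the explicit loop is shown only because it mirrors the redraw rule.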

Estimate the population mean In the salmon example, if we imagine a fish farm cage, this would require a conceptual view of the population as N fish numbered consecutively from 1 to N. The random digits generated then identify the fish to be sampled from the cage. From the n units sampled, suppose that the PCB concentration is measured in each sample, any one of which is denoted y_i; then the sample average, ȳ, is an unbiased estimate of the population mean PCB concentration, and the sample variance, s², provides an unbiased estimate of the population variance:

Sample mean and variance
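For reference, the usual definitions of the sample mean and sample variance for n observations y_i are given below. They are the standard textbook forms (e.g., Cochran, 1977) and are assumed, not confirmed, to be what this slide displayed.

```latex
\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i ,
\qquad
s^{2} = \frac{1}{n-1}\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^{2}
```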

Sampling error the sampling fraction f is usually very small and given by n/N.
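The sampling fraction enters the estimated standard error of the mean through the finite population correction, (1 - f). A minimal sketch, assuming the standard simple-random-sampling result se(ȳ) = sqrt((1 - f) s²/n); the data and the population size N below are hypothetical.

```python
import math
import statistics

def srs_summary(y, N):
    """Mean, variance and standard error of the mean for a simple random sample."""
    n = len(y)
    mean = statistics.mean(y)
    var = statistics.variance(y)       # sample variance, divisor n - 1
    f = n / N                          # sampling fraction
    se = math.sqrt((1 - f) * var / n)  # with finite population correction
    return mean, var, se

# Hypothetical concentrations from 4 of N = 100 units.
mean, var, se = srs_summary([2.0, 4.0, 6.0, 8.0], N=100)
```

When f is very small the correction is close to 1 and the familiar s/√n is recovered.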

Example: ¹³⁷Cs contained activity (inventory) in the sediment of an estuary Suppose we want to estimate the inventory of ¹³⁷Cs in the sediments of an estuary whose boundaries have been clearly defined. Assume a precise estimate of the area (m²) involved is available, so it becomes necessary to measure ¹³⁷Cs areal activity densities (Bq m⁻²), which are multiplied by the area to estimate the contained activity in Bq.

What is? What is the population? What is the sampling unit? What is the context?

Stratified random sampling In stratified sampling, the population is divided into two or more strata that individually are more homogeneous than the entire population, and a sampling method is used to estimate the properties of each stratum. Usually, the proportion of sample observations in each stratum is similar to the stratum proportion in the population.

Stratified random sampling In stratified sampling, the population of N units is first divided into sub-populations of N_1, N_2, …, N_L units. These sub-populations are non-overlapping and together comprise the whole population. The sub-populations are called strata. They need not have the same number of units, but, to obtain the full benefit of stratification, the sub-population sizes or areas must be known. In stratified sampling, a sample is drawn from each of the strata, the size of each sample ideally in proportion to the population size or area of that stratum.

Sample mean and variance
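For stratified sampling, the standard estimator weights each stratum mean by W_h = N_h/N and combines the within-stratum variances. The sketch below uses those textbook formulas (Cochran, 1977) on the assumption that they match the slide; the stratum labels, sizes and data are hypothetical.

```python
import math
import statistics

def stratified_estimate(samples, stratum_sizes):
    """Stratified mean and its standard error.

    samples: stratum label -> list of observations from that stratum
    stratum_sizes: stratum label -> N_h, the number of units in the stratum
    """
    N = sum(stratum_sizes.values())
    mean = 0.0
    var = 0.0
    for h, y in samples.items():
        N_h = stratum_sizes[h]
        n_h = len(y)
        W_h = N_h / N  # stratum weight
        mean += W_h * statistics.mean(y)
        # Variance contribution, with the stratum's finite population correction.
        var += W_h ** 2 * (1 - n_h / N_h) * statistics.variance(y) / n_h
    return mean, math.sqrt(var)

# Hypothetical two-stratum example.
st_mean, st_se = stratified_estimate(
    {"a": [1.0, 3.0], "b": [10.0, 14.0]},
    {"a": 60, "b": 40},
)
```

The weights must be the true stratum proportions; as noted earlier, incorrectly specified proportions bias the estimate.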

Systematic sampling Systematic sampling differs from the methods of random sampling in terms of practical implementation and in terms of coverage. Again, assume there are N (= nk) units in the population. To sample n units, the first unit is selected at random from the first k units, and subsequent samples are then taken every k units. A systematic sample is thus spread more evenly over the population. Systematic sampling has a number of advantages over simple random sampling, not least of which is convenience of collection.

Systematic sampling Data from systematic designs are more difficult to analyze, especially in the most common case of a single systematic sample. Consider first the simpler case of multiple systematic samples. For example, xxx in pond sediment could be sampled using transects across the pond from one shoreline to the other. Samples are collected every 5m along the transect. The locations of the transects are randomly chosen. Each transect is a single systematic sample.

Systematic sampling Each sample is identified by the transect number and the location along the transect. Suppose there are i = 1, …, t systematic samples (i.e., transects in the pond example) and y_ij is the jth observation on the ith systematic sample, for j = 1, …, n_i. The average of the samples from the ith transect is calculated.

Population mean and variance estimates
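One way to obtain these estimates is to treat each randomly located transect as a cluster: the overall mean is the total of all observations divided by the total number of observations, and its variance comes from the variation between transects. The sketch below uses the ratio-type cluster estimator and ignores any finite population correction; it is an assumed form, not necessarily the formula shown on the slide, and the data are hypothetical.

```python
import math

def transect_estimate(transects):
    """Overall mean and standard error from t systematic samples (transects).

    transects: list of lists; transects[i][j] is y_ij.
    """
    t = len(transects)
    totals = [sum(y) for y in transects]
    sizes = [len(y) for y in transects]
    mean = sum(totals) / sum(sizes)  # weighted mean of the transect means
    n_bar = sum(sizes) / t           # average number of samples per transect
    # Between-transect variation of totals around what the overall mean predicts.
    ss = sum((tot - mean * n_i) ** 2 for tot, n_i in zip(totals, sizes))
    var = ss / (t * n_bar ** 2 * (t - 1))
    return mean, math.sqrt(var)

# Hypothetical data: three transects of three samples each.
sy_mean, sy_se = transect_estimate([[1, 2, 3], [2, 3, 4], [3, 4, 5]])
```

With equal n_i this reduces to the variance of the transect means divided by t, which is why at least two transects are needed to estimate precision.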

How many samples are needed to ?

Number of samples needs you to state the desired limits of precision for the population inference (how precisely does one want to know the average PCB concentration, or, what size of difference is needed to be detected and with what precision?), state the inherent population variability of the attribute of interest, and derive an equation which relates the number (n) of samples with the desired precision of the parameter estimator and the degree of significance (the chance of being wrong in the inference).

Number of samples Testing mean = null (versus not = null). Calculating power for mean = null + difference. Alpha = 0.05. Assumed standard deviation = 1. Table columns: Difference, Sample Size, Target Power, Actual Power.

Number of samples What is the power? Power is the probability that we correctly reject the null hypothesis, that is, that we detect a difference, effect or trend when one really exists. The null hypothesis says there is no difference/no effect/no trend. We want high power.
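To make the idea concrete, the sketch below computes the power of a two-sided one-sample test of mean = null against mean = null + difference, using a Normal (z) approximation with known standard deviation. The function names and the example difference of 0.5 are assumptions for illustration; software that uses a t-test will report slightly larger sample sizes.

```python
from statistics import NormalDist

def z_test_power(difference, n, sigma=1.0, alpha=0.05):
    """Power of a two-sided one-sample z-test for a given true difference."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = difference * n ** 0.5 / sigma
    # Probability that the test statistic falls in either rejection region.
    return z.cdf(-z_crit + shift) + z.cdf(-z_crit - shift)

def sample_size_for_power(difference, target_power, sigma=1.0, alpha=0.05):
    """Smallest n whose power reaches target_power (difference must be non-zero)."""
    if difference == 0:
        raise ValueError("a zero difference can never be detected")
    n = 2
    while z_test_power(difference, n, sigma, alpha) < target_power:
        n += 1
    return n

# Hypothetical target: detect a difference of 0.5 standard deviations with 80% power.
n_needed = sample_size_for_power(0.5, 0.80)
```

Evaluating `z_test_power` over a range of differences for several fixed n gives the kind of power curves used to compare candidate sample sizes.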


Power Curves

PCB Objective: estimate the mean concentration with an estimated standard error (e.s.e.) of 0.1 mg kg⁻¹. The variation of PCB in salmon flesh is […]. Therefore, how many samples would be required? Since the e.s.e. of the sample mean is s/√n, one must solve for n, for example:
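Setting s/√n equal to the target standard error and solving gives n = (s / e.s.e.)². A small sketch of that arithmetic follows; the standard deviation of 1.2 mg kg⁻¹ is purely hypothetical and stands in for a value that is not available here.

```python
import math

def n_for_standard_error(s, target_se):
    """Smallest n such that s / sqrt(n) does not exceed target_se."""
    return math.ceil((s / target_se) ** 2)

# Hypothetical standard deviation (mg/kg); the target e.s.e. of 0.1 is from the slide.
n_required = n_for_standard_error(s=1.2, target_se=0.1)
```

Because n grows with the square of s / e.s.e., halving the target standard error quadruples the number of samples.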

Sample size: too big Thus this degree of improvement in precision can only be achieved by increasing the number of samples taken to approximately […]. This may well be impractical; therefore the only solution may be to accept a lower precision.

Final comments Sampling is an important step in being able to answer the scientific questions of interest. There are many different approaches. Spatial sampling will be covered in a later session, but many of the same ideas will be revisited.