SAMPLING
OUTLINE:
- sampling and census
- sampling surveys, frame, size
- probability and non-probability sampling methods
- census

SAMPLING AND CENSUS
Collection methods for data
- Sampling: any data collection that is not a controlled experiment
- e.g. the percentage of greenhouse gases in the atmosphere above Winnipeg

SAMPLING AND CENSUS
Census:
- a survey whose domain is the characteristics of an entire population
- any study of the entire population of a particular set of ‘objects’
- e.g. female polar bears in western Hudson Bay; human residents of Heidelberg; the number of Epacris impressa plants on a single hillside in Riding Mountain National Park

SAMPLING AND CENSUS
Survey:
- if we collect, analyse or study only some members of a population, we are carrying out a survey
- the aim is to make observations at a limited number of carefully chosen locations that are representative of a distribution
- the sample is used to predict the overall character of the population; accuracy depends on the quality of the sample

SAMPLING SURVEYS
Done for several reasons:
- they cost less than a census of the equivalent population
- they are carried out to answer specific questions
- a sample survey will usually offer greater scope than a census (larger geographical area, greater variety of questions)

SAMPLING SURVEYS
Development of a sampling survey:
1. state the objectives of the survey
2. define the target population
3. define the data to be collected
4. define the required precision and accuracy
5. define the measurement ‘instrument’
6. define the sample frame, sample size and sampling method, then select the sample

SAMPLING SURVEYS
The process of generating a sample requires several critical decisions:
- sample frame
- sample size
- sampling method
An error in any of these decisions will compromise the entire survey.

SAMPLE FRAME
- if the frame is wrongly defined, the sample may not be representative of the target population
- the frame might be ‘wrong’ in three ways:
  - it contains too many individuals (membership is under-defined)
  - it contains too few individuals (membership is over-defined)
  - it contains the wrong set of individuals (membership is ill-defined)

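The three kinds of frame error can be illustrated by comparing a frame with the target population as sets. This is only a sketch: in practice the full target population is rarely known in advance. The function name `frame_errors` and the household labels are hypothetical, and treating "wrong set" as "both extra and missing units" is an assumption made for illustration.

```python
def frame_errors(target_population, sample_frame):
    """Compare a sample frame with the target population (iterables of unit ids)."""
    target = set(target_population)
    frame = set(sample_frame)
    extra = frame - target      # units in the frame that are not in the target population
    missing = target - frame    # target units that the frame leaves out
    if extra and missing:
        verdict = "wrong set of individuals"   # assumed reading of 'ill-defined'
    elif extra:
        verdict = "too many individuals"
    elif missing:
        verdict = "too few individuals"
    else:
        verdict = "frame matches target"
    return verdict, extra, missing


# Hypothetical example: households H1..H5 form the target population.
target = {"H1", "H2", "H3", "H4", "H5"}
print(frame_errors(target, {"H1", "H2", "H3", "H4", "H5", "H6"}))  # frame too broad
print(frame_errors(target, {"H1", "H2", "H3"}))                    # frame too narrow
print(frame_errors(target, {"H1", "H2", "H8", "H9"}))              # frame mis-specified
```
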
SAMPLE FRAME
Two-stage process:
1. divide the target population into sampling units, e.g. households, trees, light bulbs, soil samples, cities, individuals
2. create a finite list of the sampling units that make up the target population, e.g. names, addresses, identity numbers, number of 50 mL sample bottles

SAMPLING UNITS
- a sampling unit is a member of a sample or sample frame
- in geomatics: points, lines (transects) and areas (quadrats)
- e.g. measuring snow depth at 10 cm intervals along a 10 m line; measuring all features that fall within 10 m of a line

SAMPLE SIZE
- quantity is not better than quality
- in statistics: a sample size of 30 or greater is considered ideal
- in geomatics: the appropriate sample size is directly related to a distribution’s variability

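One common way to make the link between variability and sample size concrete is the textbook formula for estimating a mean, n = (z * sigma / E)^2. The slides do not give this formula; it is brought in here as an illustration, and the function name, the 95% confidence level and the snow-depth figures are assumptions.

```python
import math
from statistics import NormalDist


def required_sample_size(std_dev, margin_of_error, confidence=0.95):
    """Sample size needed to estimate a mean to within +/- margin_of_error.

    Uses n = (z * sigma / E)^2, rounded up, where z is the two-sided
    standard normal critical value for the chosen confidence level.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return math.ceil((z * std_dev / margin_of_error) ** 2)


# Hypothetical snow-depth surveys, both needing +/- 2 cm precision:
print(required_sample_size(std_dev=10, margin_of_error=2))  # less variable snowpack
print(required_sample_size(std_dev=20, margin_of_error=2))  # more variable snowpack
```

Doubling the standard deviation roughly quadruples the required sample size, which is the sense in which sample size follows variability.
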
SAMPLING METHOD
- the aim is to obtain a sample that is representative of the target population
- when selecting a sampling method, we need some minimal prior knowledge of the target population
- how we actually decide which sampling units are chosen makes up the sampling method

SAMPLING METHOD
- probability sampling methods: most sampling methods attempt to select units such that each has a definable probability of being chosen
- non-probability sampling methods: we can ignore the probability of selection and choose samples on some other criterion

NON-PROBABILITY SAMPLING
The units that make up the sample are collected with no specific probability structure in mind, e.g.
- units are self-selected
- units are most easily accessible
- units are selected on economic grounds
- units are considered to be typical of the population
- units are chosen without an obvious design

NON-PROBABILITY SAMPLING
- considered inferior to other methods: there is no statistical basis upon which the success of the sampling method can be evaluated
- may be unavoidable; regard it as a ‘last resort’ when designing a sampling scheme

PROBABILITY SAMPLING
- the basis is selecting the sampling units that make up the sample by defining the chance that each unit in the sample frame will be included
- e.g. we have 100 students and need 10 to fill out a survey: each student has a 1 in 10 chance of being selected (probability of selection is 0.1)

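The 1-in-10 figure can be checked numerically. The sketch below computes the inclusion probability n/N and compares it with the frequency observed over many repeated draws; the function names, the number of trials and the seed are illustrative choices, not part of the slides.

```python
import random


def inclusion_probability(frame_size, sample_size):
    """Chance that any one unit is included when sample_size units are drawn at random."""
    if not 0 <= sample_size <= frame_size:
        raise ValueError("sample_size must be between 0 and frame_size")
    return sample_size / frame_size


def observed_inclusion_frequency(frame_size, sample_size, unit, trials=20000, seed=0):
    """Fraction of repeated random samples that contain the given unit."""
    rng = random.Random(seed)
    frame = range(1, frame_size + 1)
    hits = sum(unit in rng.sample(frame, sample_size) for _ in range(trials))
    return hits / trials


print(inclusion_probability(100, 10))                    # 10 of 100 students
print(observed_inclusion_frequency(100, 10, unit=42))    # close to the value above
```
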
PROBABILITY SAMPLING
- each time we apply the same method to the same frame, we will generate a different sample
- we are concerned with the probability of each sample being chosen, rather than with the probability of choosing individual units
- there are a number of probability sampling strategies

PROBABILITY SAMPLING
Simple random sampling
- the simplest way: select n units such that every one of the possible samples has an equal chance of being chosen
- generate a sample by selecting from the sample frame by any method that guarantees that each sampling unit has a specified probability of being included
- how we do the sampling is of no significance (e.g. random number tables, dice, ...)

PROBABILITY SAMPLING
Simple random sampling
- e.g. use a random number table to generate six random numbers between 1 and 14: 4, 6, 7, 9, 11, 13

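The same draw can be made with a pseudo-random number generator instead of a random number table. This is a minimal sketch using Python's standard `random.sample`, which selects without replacement so that every subset of the requested size is equally likely; the function name `simple_random_sample` is an assumption.

```python
import math
import random


def simple_random_sample(frame, n, seed=None):
    """Draw n distinct units from the frame, each possible sample being equally likely."""
    rng = random.Random(seed)
    return sorted(rng.sample(list(frame), n))


frame = range(1, 15)                     # sampling units numbered 1 to 14
sample = simple_random_sample(frame, 6)
print(sample)                            # differs from run to run unless a seed is given
print(math.comb(len(frame), 6))          # number of distinct samples of size 6
```

There are 3003 possible samples of six units from fourteen, so the set 4, 6, 7, 9, 11, 13 is just one of many equally likely outcomes.
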
PROBABILITY SAMPLING
Stratified sampling
- used when you suspect the target population actually consists of a series of separate ‘sub-populations’
- stratification is the process of splitting the sample to take account of possible sub-populations
- the total population is first divided into a set of mutually exclusive sub-populations (strata)
- the strata may or may not be equal in size

PROBABILITY SAMPLING
Stratified sampling
- within each stratum, select a sample, usually ensuring that the probability of selection is the same for each unit in each sub-population: a stratified random sample
- e.g. national polls and rating surveys

PROBABILITY SAMPLING
Stratified sampling, e.g.
1. first split the population into sub-populations (based on the second number in this example)
2. then sample from these sub-populations (three from each, using a random number table: 1, 2, 5)

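The two steps above can be sketched in code. The original figure is not reproduced here, so the units below are hypothetical pairs whose second number gives the stratum; the function name `stratified_sample` and the choice of three units per stratum are likewise illustrative.

```python
import random
from collections import defaultdict


def stratified_sample(units, stratum_of, per_stratum, seed=None):
    """Split units into strata, then draw a simple random sample within each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)     # step 1: mutually exclusive strata
    return {
        name: sorted(rng.sample(members, per_stratum))   # step 2: sample inside each stratum
        for name, members in strata.items()
    }


# Hypothetical population: 15 units written as (unit number, stratum number).
units = [(i, s) for s in (1, 2, 3) for i in range(1, 6)]
result = stratified_sample(units, stratum_of=lambda u: u[1], per_stratum=3, seed=1)
for stratum, chosen in result.items():
    print(stratum, chosen)
```

Because every stratum here has five units and three are drawn from each, every unit has the same selection probability of 3/5.
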
PROBABILITY SAMPLING
Systematic sampling
- decide the sample size from the population size
- the population has to be organized in some way, e.g. points along a river, simple numerical order
- simpler in design and easier to administer

PROBABILITY SAMPLING
Systematic sampling
- choose a starting point along the sequence by selecting the r-th unit from one end of the sequence
- then take the rest of the sample by repeatedly adding a fixed interval to r

PROBABILITY SAMPLING
Systematic sampling, e.g.
1. first order the sample units (in this case in decreasing numerical order)
2. next, select the first point (r value): 2
3. then take every third unit after this (2, 5, 8, 11, 14)

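The procedure above reduces to taking a fixed stride through an ordered list. A minimal sketch follows, assuming for simplicity that the 14 units are listed in ascending order so that a unit's position equals its label; the function name `systematic_sample` is hypothetical.

```python
def systematic_sample(ordered_units, start, interval):
    """Take the unit at 1-based position `start`, then every `interval`-th unit after it."""
    if start < 1 or interval < 1:
        raise ValueError("start and interval must be positive")
    return list(ordered_units[start - 1::interval])


units = list(range(1, 15))                            # 14 ordered sampling units
print(systematic_sample(units, start=2, interval=3))  # [2, 5, 8, 11, 14]
```

In practice the starting position r is usually itself chosen at random between 1 and the interval, which is what keeps the method a probability sample.
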
CENSUS
- the aim is to identify and record all members of a population
- most countries routinely carry out a census of their population
- e.g. Canada performs a census every 5 years (1981, 1986, 1991, 1996, 2001)
- the original function was to enumerate for electoral purposes, but the census now encompasses a large range of information about national populations

CENSUS
Collects important information about the social and economic situation of people living in an area:
- Population Counts
- Age, Sex, Marital Status, Families (number, type and structure)
- Structural Type of Dwelling and Household Size
- Immigration and Citizenship, Education, Mobility, Migration
- Mother Tongue, Home Language and Official/Non-Official Languages
- Ethnic Origin and Population Group (visible minorities)
- Labor Market Activities, Household Activity, Place of Work and Mode of Transportation
- Sources of Income, Total Income and Family and Household Income
- Families: Social and Economic Characteristics, Occupied Dwellings and Household Costs

CENSUS
Disadvantages of a census:
- time consuming: requires years of planning
- laborious: requires thousands of workers/volunteers
- costly: millions of dollars to survey everyone

CENSUS
Errors in census data:
- people respond dishonestly due to lack of confidence in confidentiality
- a full accounting of residences is difficult to document (e.g. the homeless)
- poorly qualified people may be recruited to conduct the survey

CENSUS REGIONS
- a census consists of “enumeration” data: counts tabulated or ‘aggregated’ by geographic areas
- census regions/enumeration areas are not distributed uniformly and vary in shape, size and orientation
- Canada is divided into 51,500 enumeration areas
- census regions are defined by political boundaries and natural and cultural landmarks

CENSUS REGIONS
Enumeration Area (EA)
- smallest reported census area
- canvassed by one census representative
- number of dwellings depends on the situation (rural or urban area)
Census Tract (CT)
- represent urban or rural communities in CMAs and CAs
- populations range between 2, ,000
Census Subdivision (CSD)
- term applied to municipalities or equivalent

CENSUS REGIONS
Census Division (CD)
- areas intermediate between the municipality (CSD) and province level
- represent counties, regional districts, regional municipalities
Census Metropolitan Area/Census Agglomeration (CMA/CA)
- CMAs and CAs are very large urban cores together with adjacent integrated urban and rural areas
- urban core population >100,000 for a CMA, >10,000 for a CA
- a CMA may be combined with adjacent CAs to form a ‘consolidated CMA’
Federal Electoral Districts (FED)
- area entitled to elect a representative member to the House of Commons

CENSUS REGIONS
- census information is aggregated within the boundaries of the data collection regions
- reasons for aggregating:
  - reduce costs
  - confidentiality
- GIS concerns:
  - census region totals are more abstract and spatially inaccurate
  - they mask the true nature of population distribution

REPORTING METHOD
- aggregated data are reported as census region totals: data presentation is a count by region
- census totals are also reported at region centroids:
  - center of area: the balance point of the census region shape
  - center of population: found by averaging the x and y coordinates of the individual population

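The two centroid definitions can give quite different points for the same region. The sketch below computes the center of population as the mean of individual coordinates, and the center of area with the standard polygon-centroid (shoelace) formula; the slides do not name a formula, so that choice, the function names and the coordinates are assumptions for illustration.

```python
def center_of_population(points):
    """Average the x and y coordinates of the individual locations."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return sum(xs) / len(xs), sum(ys) / len(ys)


def center_of_area(polygon):
    """Balance point of a simple polygon given as a list of (x, y) vertices in order."""
    twice_area = 0.0
    cx = 0.0
    cy = 0.0
    for (x0, y0), (x1, y1) in zip(polygon, polygon[1:] + polygon[:1]):
        cross = x0 * y1 - x1 * y0          # shoelace term for this edge
        twice_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if twice_area == 0:
        raise ValueError("polygon has zero area")
    return cx / (3.0 * twice_area), cy / (3.0 * twice_area)


# Hypothetical rectangular census region with residents clustered on its east side.
region = [(0, 0), (4, 0), (4, 2), (0, 2)]
residents = [(3.5, 0.5), (3.8, 1.0), (3.6, 1.5), (3.9, 0.4)]
print(center_of_area(region))            # midpoint of the rectangle
print(center_of_population(residents))   # pulled toward the cluster of residents
```
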
Map of Census divisions
CENSUS AND GIS
The census represents a very important source of data for GIS because:
- it provides data of use in many areas of human geography: social, economic, political
- the census goes back to Confederation, so historical analyses can be performed
- the census provides data in a large variety of readily mapped spatial zones (e.g. CMA, county)