Download presentation
Presentation is loading. Please wait.
Published byRonald Small Modified over 9 years ago
1
1 Introduction to Statistics
2
2 What is Statistics? The gathering, organization, analysis, and presentation of numerical information
3
Important Terminology * You can find these terms in your text (p. 91 – 100) 3
4
4 Population – is a set of all items of interest and can be infinitely large. E.g. Every person who watches television.
5
5 Sample – is a set of data drawn from a population. It must be representative of the population of interest and be randomly drawn. E.g. The small number of actual viewers on which Nielsen ratings are based.
6
6 Data are the observed values of a variable and can be measurements or observations. Examples: Student marks {67, 74, 71, 83, 93, 55, 48} Popular car colours {silver, black, white, red} Datum – singular form of the word data.
7
7 Variable The quantity being measured (ie. marks, car colours) Population Sample Variable
8
8 Types of Data 1. Qualitative data – are categorical observations that can be measured on the nominal or ordinal scale. Nominal scale data – generated results are names or categories with no implied order. (e.g. yes or no) Ordinal scale data – data that falls into distinct categories that can be ranked in some order. (e.g. good, fair, poor)
9
9 2. Quantitative (Interval) Data – are numerical observations measured on the interval scale (real numbers). Many types of calculations can be performed on such data. This data can be: Discrete – numerical responses arise from a counting process. Continuous – numerical responses arise from a measuring process.
10
10 E.g. Representing Student Grades… Categorical? Data Interval Data e.g. {0..100} Nominal Data e.g. {Pass | Fail} Ordinal Data e.g. {F, D, C, B, A} N Ordered? Y Y N Categorical Data Rank order to data NO rank order to data
11
Day 1 Ends Here 11
12
12 Data Collection and Sampling
13
13 Sources of Data Published data, which can be either primary data (collected and published by the same organization, such as Statistics Canada), or secondary data (published by an organization that did not collect it). Observational studies – obtained by direct observation. Experimental studies – controls are established and participants are randomly selected.
14
14 Surveys… A survey solicits information from people; e.g. Gallup polls; pre-election polls; marketing surveys. The Response Rate (i.e. the proportion of all people selected who complete the survey) is a key survey parameter. Surveys may be administered in a variety of ways, e.g. Personal Interview, Telephone Interview, and Self Administered Questionnaire.
15
15 Questionnaire Design… Over the years, a lot of thought has been put into the science of the design of survey questions. Key design principles: 1. Keep the questionnaire as short as possible. 2. Ask short, simple, and clearly worded questions. 3. Start with demographic questions to help respondents get started comfortably. 4. Use dichotomous (yes|no) and multiple choice questions. 5. Use open-ended questions cautiously. 6. Avoid using leading-questions. 7. Pretest a questionnaire on a small number of people. 8. Think about the way you intend to use the collected data when preparing the questionnaire.
16
16 Sampling It is a way of gathering information. The target population, for which inferences are to be made, must be the same as the sampled population. Samples must be made in a pattern similar to the characteristic make-up of the target population.
17
17 Sampling Plans A sampling plan is a method or procedure for specifying how a sample will be taken from a population. Three methods: Simple Random Sampling, Stratified Random Sampling, and Cluster Sampling.
18
18 Simple Random Sampling… A simple random sample is a sample selected in such a way that every possible sample of the same size is equally likely to be chosen. Drawing three names from a hat containing all the names of the students in the class is an example of a simple random sample: any group of three names is as equally likely as picking any other group of three names.
19
19 Stratified Random Sampling… A stratified random sample is obtained by separating the population into mutually exclusive sets, or strata, and then drawing simple random samples from each stratum. Strata 1 : Gender Male Female Strata 2 : Age < 20 20-30 31-40 41-50 51-60 > 60 Strata 3 : Occupation professional clerical blue collar other We can acquire information about the total population, make inferences within a stratum or make comparisons across strata.
20
20 Stratified Random Sampling… After the population has been stratified, we can use simple random sampling to generate the complete sample: If we only have sufficient resources to sample 400 people total, we would draw 100 of them from the low income group… …if we are sampling 1000 people, we’d draw 50 of them from the high income group.
21
21 Cluster Sampling… A cluster sample is a simple random sample of groups or clusters of elements (vs. a simple random sample of individual objects). All elements of the cluster are then included in the sample. This method is useful when it is difficult or costly to develop a complete list of the population members or when the population elements are widely dispersed geographically. Cluster sampling may increase sampling error due to similarities among cluster members.
22
Day 2 Ends Here 22
23
23 Sample Size… Numerical techniques for determining sample sizes will be described later, but suffice it to say that the larger the sample size is, the more accurate we can expect the sample estimates to be.
24
24 Errors in Sampling Techniques 1. Sampling error refers to differences between the sample and the population that exist only because of the observations that happened to be selected for the sample. It is expected to occur. E.g. Two samples of size 10 of 1,000 households. If we happened to get the highest income level data points in our first sample and all the lowest income levels in the second, this delta is due to sampling error. Note: Increasing the sample size will reduce this type of error.
25
25 Errors in Data Acquisition These errors arise from the recording of incorrect responses, due to: ● incorrect measurements being taken because of faulty equipment, ● mistakes made during transcription from primary sources, ● inaccurate recording of data due to misinterpretation of terms, or ● inaccurate responses to questions concerning sensitive issues.
26
26 Non-Response Bias Refers to error (or bias) introduced when responses are not obtained from some members of the sample, i.e. the sample observations that are collected may not be representative of the target population. Recall that the Response Rate (i.e. the proportion of all people selected who complete the survey) is a key survey parameter and helps in the understanding the validity of the survey and sources of nonresponse error.
27
27 Sampling Bias Occurs when the sampling frame is such that some members of the target population cannot possibly be selected for inclusion in the sample. E.g. Randomly selecting names from a phone book would exclude individuals without a phone or whose number is not listed.
28
Measurement Bias Occurs when the data-collection method consistently either under- or over-estimates a characteristic of the population Results from a data-collection process that affects the variable it is measuring (ie. You can tell by the way the question is worded what most respondents will answer. There are often leading questions or loaded questions.) 28
29
Response Bias Occurs when participants in a survey deliberately give false or misleading answers to influence results unduly or they may be afraid or embarrassed to answer sensitive questions honestly 29
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.