
Lecture 10 Sections 4.2 – 4.3 Objectives: Obtaining Data


1 Lecture 10 Sections 4.2 – 4.3 Objectives: Obtaining Data
Data from Sampling; Data from Experiments

2 Obtaining Data Where do data come from?
Sampling: A study based on sampling is called an observational study. The researcher observes individuals and measures variables of interest, but does not attempt to influence the response (the outcome of the study). The researcher is a passive observer who documents the process.
Experiment: A study based on an experiment is called an experimental study. The researcher deliberately imposes some treatment on individuals to observe their response. The researcher actively intervenes to control the study conditions and records the responses.

3 Data from Sampling
Population: the entire collection of objects of interest (it could be small or infinitely large).
Can we study the whole population? Yes, we could do a census (an inspection of the entire population). But it is often too costly and too time-consuming to reach every member of a population, so we may choose to inspect a small portion of it.
Sample: a subset of a population that is actually observed.
Why sample? A sample is a collection of individuals selected from a population, so it takes less time and money to observe. For example, sampling and testing 20 items from a batch of 1000 items involves less labor than testing the entire batch. The trade-off is that we no longer know everything about the population exactly, and hence we have uncertainty about the truth.

4 Sampling Techniques We seek information on a population by choosing a sample. So, it is important to take:
a representative sample,
a large enough sample,
a sample drawn by a method that is not biased.
Note: A sampling method is biased if some part of the population has no chance of being selected.
Sampling techniques for a single population:
Random sampling (simple random sampling)
Stratified sampling
Cluster sampling

5 Random vs. Nonrandom Samples
Nonrandom samples: No randomization technique is used to obtain the sample. A nonrandom sample therefore cannot be assumed to represent the population well and, consequently, the information in such data cannot be generalized to larger populations. What should you do when nonrandomly collected data arise in practice? It is acceptable to apply simple descriptive statistical measures to the data (e.g., mean, histogram, and so forth). But be aware that such measures cannot legitimately be generalized, and the statistical techniques presented in the following chapters may not be valid when applied to such data.

6 Simple Random Sample A Simple Random Sample (SRS) is made of randomly selected individuals. Each individual in the population has the same probability of being in the sample. All possible samples of size n have the same chance of being drawn. Choosing an SRS Step 1: assign a numerical value to every individual in the population. Step 2: use a table of random digits or random number generator to select a random sample of these numerical values.
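The two steps above can be sketched in Python. This is only an illustration, not part of the original lecture: the function name `simple_random_sample`, the fixed seed, and the choice of 20 labels out of 1000 (borrowed from the batch example on the sampling slide) are assumptions.

```python
import random


def simple_random_sample(population_size, n, seed=None):
    """Return n distinct labels drawn at random from 1..population_size.

    Step 1: every individual is identified by a numeric label.
    Step 2: a random number generator selects n labels without
    replacement, so every subset of size n is equally likely.
    """
    rng = random.Random(seed)  # seed is optional; fixed here for reproducibility
    labels = range(1, population_size + 1)
    return sorted(rng.sample(labels, n))


# Hypothetical use: pick 20 items to test from a batch of 1000.
sample = simple_random_sample(1000, 20, seed=42)
print(sample)
```

Sampling without replacement is what makes every individual appear at most once in the sample.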

7 Stratified Sampling There is a slightly more complex form of random sampling: A stratified random sample is essentially a series of SRSs performed on subgroups (strata) of a given population. The subgroups are chosen to contain all the individuals with a certain characteristic. For example:
Divide the population of UCI students into males and females.
Divide the population of California by major ethnic group.
Divide the counties in America into urban and rural based on population density.
The SRSs taken within each group in a stratified random sample need not be of the same size. For example: a stratified random sample of 100 male and 150 female UCI students.
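A stratified sample of this kind can be sketched as one SRS per stratum. The rosters below are made up for illustration (the stratum sizes 500 and 700 are assumptions, as is the function name `stratified_sample`); only the sample sizes of 100 and 150 come from the example above.

```python
import random


def stratified_sample(strata, sizes, seed=None):
    """Draw an independent SRS from each stratum.

    strata: dict mapping stratum name -> list of individuals
    sizes:  dict mapping stratum name -> sample size for that stratum
    """
    rng = random.Random(seed)
    return {
        name: rng.sample(members, sizes[name])
        for name, members in strata.items()
    }


# Hypothetical rosters; real strata would come from an enrollment list.
strata = {
    "male": [f"M{i}" for i in range(1, 501)],
    "female": [f"F{i}" for i in range(1, 701)],
}
chosen = stratified_sample(strata, {"male": 100, "female": 150}, seed=1)
print(len(chosen["male"]), len(chosen["female"]))  # 100 150
```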

8 Cluster Sampling
Simple random sampling and stratified sampling work best when relatively complete lists of population elements and strata sizes are available before sampling. However, in some applications, such information is difficult or impossible to obtain. In such cases, some form of cluster sampling is used instead. Like stratified sampling, cluster sampling requires that we first divide the population into non-overlapping groups, called clusters. However, we do not need to know the number of population elements in each cluster. Instead, we simply take an SRS of the clusters and then measure all elements within the selected clusters.
Example: US Census. Since complete lists of city inhabitants are not available, a city is divided into blocks (clusters) using maps, then a random sample of these blocks is selected and all residences in the sampled blocks are contacted.
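The block-sampling idea can be sketched as follows. The city layout, the block and house names, and the function name `cluster_sample` are all hypothetical; the point is only that the random draw happens at the cluster level and every element of a chosen cluster is then included.

```python
import random


def cluster_sample(clusters, n_clusters, seed=None):
    """Take an SRS of clusters, then keep every element in the chosen clusters.

    clusters: dict mapping cluster name -> list of elements
    Returns (chosen cluster names, all elements in those clusters).
    """
    rng = random.Random(seed)
    picked = rng.sample(sorted(clusters), n_clusters)
    units = [unit for name in picked for unit in clusters[name]]
    return picked, units


# Hypothetical city of 12 blocks, each holding 3 to 5 residences.
# Block sizes are not used when choosing which blocks to sample.
city = {
    f"block_{b}": [f"block_{b}/house_{h}" for h in range(1, 4 + b % 3)]
    for b in range(1, 13)
}
picked, residences = cluster_sample(city, 3, seed=5)
print(picked)
print(len(residences), "residences to contact")
```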

9 Data from Experiments The individuals in an experiment are the experimental units. If they are human, we call them subjects. In an experiment, we do something to the subject and measure the response. The “something” we do is called a treatment. For example, one group of people may be placed on a diet/exercise program for six months (treatment), and their blood pressure (response variable) would be compared with that of people who did not diet or exercise.

10 Factors: the explanatory variables in an experiment.
Level: a specific value of the factor.
Example: A bleach experiment was run to study the effect of different bleach concentrations and the effect of the type of stain on the speed (in seconds) of stain removal from a piece of cloth. The bleach concentration was to be observed at 3, 5, and 7 teaspoonfuls of bleach per cup of water, and three types of stain (blue ink, jam, tomato sauce) were of interest.
a. What are the explanatory and response variables?
b. How many factors are there? How many levels for each factor?
c. How many treatments are there? List them.
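One way to check an answer to part (c) is to enumerate every combination of factor levels, assuming each treatment is one bleach concentration paired with one stain type. The variable names below are illustrative only.

```python
from itertools import product

# Levels of the two factors, taken from the bleach example.
concentrations = [3, 5, 7]  # teaspoonfuls of bleach per cup of water
stains = ["blue ink", "jam", "tomato sauce"]

# Each treatment is one (concentration, stain) combination.
treatments = list(product(concentrations, stains))

for concentration, stain in treatments:
    print(f"{concentration} tsp bleach on {stain}")

print(len(treatments))  # 3 levels x 3 levels = 9
```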

11 Principles of Experimental Design
Basic tools of experimental design:
Control: control the effects of lurking variables on the response, most simply by comparing two or more treatments.
Randomize: use impersonal chance to assign subjects to treatments.
Replicate: apply each treatment to enough subjects to reduce chance variation in the results.
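The randomization principle can be sketched as a shuffle followed by dealing subjects out to treatments in turn. The subject labels, the group names, and the function name `randomize` are hypothetical; the two groups loosely follow the diet/exercise example from the earlier slide.

```python
import random


def randomize(subjects, treatments, seed=None):
    """Assign subjects to treatments by chance, keeping group sizes balanced.

    Subjects are shuffled, then dealt to the treatments in rotation,
    so group sizes differ by at most one.
    """
    rng = random.Random(seed)
    shuffled = list(subjects)  # copy so the caller's list is not modified
    rng.shuffle(shuffled)
    groups = {treatment: [] for treatment in treatments}
    for i, subject in enumerate(shuffled):
        groups[treatments[i % len(treatments)]].append(subject)
    return groups


# Hypothetical study: 20 subjects, a treatment group and a comparison group.
subjects = [f"subject_{i}" for i in range(1, 21)]
groups = randomize(subjects, ["diet/exercise", "control"], seed=7)
for treatment, members in groups.items():
    print(treatment, len(members))
```

With 20 subjects and two treatments, each group receives 10 subjects, which also gives each treatment some replication.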

12 More Examples What type of study is this?
In order to assess the effects of exercise on reducing cholesterol, a researcher sampled 50 people from a local gym who exercise regularly and 50 people from the surrounding community who do not exercise regularly. Each subject reported to a clinic to have their cholesterol measured. The subjects were unaware of the purpose of the study, and the technician measuring the cholesterol was not aware of whether the subject exercises regularly or not.

13 More Examples Olivia is planning to take a foreign language class. To research how satisfied other students are with their foreign language classes, she decides to take a sample of 20 such students. The university offers classes in four languages: Spanish, German, French, and Japanese. She will select a simple random sample of five students from each language. What sampling technique is Olivia using?

