Introduction to Data Analysis.

Introduction to Data Analysis. Sampling

Today’s lecture Sampling (A&F 2): Why sample? Random sampling. Other sampling methods. Stata stuff in Lab.

Sampling introduction Last week we were talking about populations (albeit in some cases small ones, such as my friends). Often the numbers we see do not relate to a population, but to a sample of that population. Newspapers report the percentage of the electorate thinking Tony Blair is trustworthy, but this is really the percentage of their sample (say 1000 people) that they asked about Blair’s trustworthiness.

Samples and populations For that statistic from the newspaper’s sample to be useful, the sample has to be ‘representative’, i.e. the % saying Blair is trustworthy in the newspaper’s survey (the sample of 1000 people) needs to be similar to the % in the electorate (the population of 40 million people). An intuitively obvious way of doing this is to pick 1000 people at random. For the survey, metaphorically (or literally with a big hat) put every elector’s name into a hat and pull out 1000 names. For a sample of people in a large classroom, I could sample every 10th person along each row (strictly a systematic sample, which behaves like a random one only if the seating order is unrelated to what we are measuring).

Why sample? Cost. We could ask all 40 million people who are eligible to vote in Britain. This would prove somewhat expensive. The last British census cost £220 million… Speed. Equally, the last British census took 5 years to process the data… Impossibility. Consuming every bottle of wine from a vineyard to assess its quality leaves no wine to sell…

Why random? Random sampling allows us to apply probability theory to our samples. This means that we can assess how likely it is (given how big our sample is) that our sample is representative. Deal with this in more detail later on. Intuitively, non-random sampling doesn’t seem a very good idea. Who’s heard of Alf Landon?
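The link between sample size and representativeness can be sketched with a small simulation. Everything here is an assumption made for illustration: the true population share (40%), the tolerance of 3 percentage points, and the helper name `share_close`. It is a rough sketch, not the method used later in the course.

```python
import random

TRUE_SHARE = 0.40   # assumed population share, made up for illustration
REPS = 500          # number of repeated samples drawn per sample size


def share_close(n, rng, tolerance=0.03):
    """Fraction of repeated random samples of size n whose sample share
    lands within `tolerance` of the true population share."""
    hits = 0
    for _ in range(REPS):
        # With a very large population, sampling is approximated
        # by n independent draws.
        yes = sum(rng.random() < TRUE_SHARE for _ in range(n))
        if abs(yes / n - TRUE_SHARE) <= tolerance:
            hits += 1
    return hits / REPS


rng = random.Random(1)  # fixed seed so the run is reproducible
close_100 = share_close(100, rng)
close_1000 = share_close(1000, rng)
print(f"n=100:  {close_100:.2f} of samples within 3 points of the truth")
print(f"n=1000: {close_1000:.2f} of samples within 3 points of the truth")
```

Under these assumptions the larger random sample lands close to the population figure far more often than the smaller one.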

Alf vs. FDR In 1936 the Literary Digest magazine predicted that the Republican Presidential candidate (Landon) would beat FDR. The LD sent out 10 million questionnaires; of the 2½ million that were sent back, a large majority said they would vote Republican at the election. The LD wanted to estimate the % of voters for each candidate (the parameter), and used the proportion from their sample (the statistic) to estimate this. But FDR won…

Why did the LD get it wrong? LD’s sample was large, but unrepresentative. They did not send questionnaires to randomly selected people, but rather to people on lists of club members and of car/telephone owners. These people were wealthier and therefore more likely to vote Republican; the sample was not representative of the US electorate as a whole. The LD’s sampling frame was not the population (the electorate), but a wealthy subset of the population.
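The point that a large sample from the wrong sampling frame can do worse than a small random one can be illustrated with a toy simulation. The electorate below is entirely made up (30% ‘wealthy’, with assumed voting probabilities); the numbers are not the 1936 figures.

```python
import random

rng = random.Random(1936)  # fixed seed for reproducibility

# Made-up electorate: 30% are "wealthy" and lean Republican,
# the rest lean Democrat. Each voter is (wealthy, votes_republican).
population = []
for _ in range(100_000):
    wealthy = rng.random() < 0.30
    votes_republican = rng.random() < (0.65 if wealthy else 0.30)
    population.append((wealthy, votes_republican))


def republican_share(voters):
    """Proportion of the given voters who vote Republican."""
    return sum(rep for _, rep in voters) / len(voters)


true_share = republican_share(population)

# Large sample, but drawn only from the wealthy sampling frame.
frame = [voter for voter in population if voter[0]]
biased_share = republican_share(rng.sample(frame, k=20_000))

# Much smaller simple random sample from the whole electorate.
random_share = republican_share(rng.sample(population, k=1_000))

print(f"population:          {true_share:.3f}")
print(f"biased frame (20k):  {biased_share:.3f}")
print(f"random sample (1k):  {random_share:.3f}")
```

In this sketch the sample of 1,000 lands near the population share, while the sample of 20,000 reflects only the wealthy frame.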

Non-probability sampling The moral being… If we don’t sample randomly, and instead use non-probability sampling, then we are likely to get sample statistics that are not similar to the population. e.g. Newspapers and TV regularly invite readers/viewers to ring up and ‘register their opinion’. The Scottish Daily Mirror ran a poll in 2001 on who should be the new First Minister. One of Jack McConnell’s fellow MSPs rang up 169 times to indicate he should take the position… If the Daily Mail and Independent hold phone-in polls on the same issue, the results will be different as the samples are different to one another in a non-random way (social class, ideology, etc.).

Experimental designs Randomness is useful in the experimental sciences, just as it is with observational data. If we are giving one group of subjects a treatment and another group nothing, then ideally we would randomly select who is in each group. e.g. psychiatrists studying a drug for manic depressives would give the drug to one group and a placebo to the other. Their results are no good if the groups are initially different (say by age, sex, etc.). Random selection into a group makes these differences unlikely, and allows us to test how likely it is that the drug has a real effect.
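Random selection into groups can be sketched as a shuffle followed by a split. The subject labels and the group size of 10 are hypothetical.

```python
import random

# Hypothetical list of 20 study subjects.
subjects = [f"subject_{i:02d}" for i in range(1, 21)]

rng = random.Random(7)  # fixed seed so the allocation can be reproduced

# Shuffle a copy, then split it in half: every subject is equally
# likely to end up in either group, whatever their age, sex, etc.
shuffled = subjects[:]
rng.shuffle(shuffled)
half = len(shuffled) // 2
treatment = shuffled[:half]
placebo = shuffled[half:]

print("treatment:", sorted(treatment))
print("placebo:  ", sorted(placebo))
```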

Simple random sampling (SRS) ‘Names out of a hat’ sampling. Choose the sample size n that we want, and then randomly pick n observations from the population. Each member of the population is equally likely to be sampled. e.g. if I wanted a sample from the room, then I might give everyone a number, and then use a table of random numbers to pick out 10 people. Any method that picks people randomly is acceptable.
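The ‘number everyone, then pick 10’ procedure might look like this in code, with a pseudo-random number generator standing in for the table of random numbers. The room size of 60 is an assumption.

```python
import random

# Hypothetical room of 60 people, numbered 1 to 60.
population = list(range(1, 61))

rng = random.Random(42)  # fixed seed so the draw is reproducible

# Random.sample draws without replacement: nobody is picked twice,
# and every person is equally likely to be selected.
sample = rng.sample(population, k=10)
print(sorted(sample))
```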

Problems with SRS A random sample may not include enough of a particular interesting group for analysis. If we are interested in experiences of racism, 100 random people will on average include 85 whites, and an individual sample may include even fewer (maybe even zero) non-whites. Can be costly and difficult. A random sample of 50 school-children might include 49 in England and Wales, and one in the Orkneys. A complete list of every school-child might be possible to obtain, but what of every person living in Britain? A list of the population of interest is not always available.

Solutions (1) Stratified random sampling. Two stages: classify population members into groups, then select by SRS within those groups. e.g. ‘over-sample’ non-whites for our racism study. Once we had divided the population by race, we would SRS within those racial groups. Might take 50 whites and 50 non-whites for our sample if we were interested in comparing experiences of racism.
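The two stages can be sketched as follows. The population of 10,000 with an 85/15 split and the helper name `stratified_sample` are assumptions for illustration.

```python
import random

rng = random.Random(3)  # fixed seed for reproducibility

# Hypothetical population of 10,000 people: 85% white, 15% non-white.
population = [
    {"id": i, "group": "white" if i < 8_500 else "non-white"}
    for i in range(10_000)
]


def stratified_sample(people, key, n_per_stratum, rng):
    """Stage 1: classify people into strata by `key`.
    Stage 2: take a simple random sample within each stratum."""
    strata = {}
    for person in people:
        strata.setdefault(person[key], []).append(person)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, k=n_per_stratum))
    return sample


sample = stratified_sample(population, "group", 50, rng)
counts = {}
for person in sample:
    counts[person["group"]] = counts.get(person["group"], 0) + 1
print(counts)
```

Because non-whites are deliberately over-sampled here, the pooled 100 people no longer mirror the population; estimates for the whole population would need each group weighted back to its population share.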

Solutions (2) Cluster random sampling. If population members are naturally clustered, then we SRS those clusters and then SRS the population members within those clusters. Pupils in schools are naturally grouped by school. We may not have a list of every school-child, but we do have a list of every school. Again two stages. We randomly pick 5 schools, and then randomly pick 10 children in each school.
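A minimal sketch of the two stages, assuming a made-up list of 40 schools with 200 pupils each. Note that the pupil lists are only needed for the schools that get picked.

```python
import random

rng = random.Random(5)  # fixed seed for reproducibility

# Hypothetical data: 40 schools, each with 200 pupils.
schools = {
    f"school_{s:02d}": [f"s{s:02d}_pupil_{p:03d}" for p in range(200)]
    for s in range(40)
}

# Stage 1: simple random sample of 5 schools (the clusters).
chosen_schools = rng.sample(sorted(schools), k=5)

# Stage 2: simple random sample of 10 children within each chosen school.
sample = {
    school: rng.sample(schools[school], k=10)
    for school in chosen_schools
}

for school, pupils in sample.items():
    print(school, pupils[:3], "...")
```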

One further problem This is not to say that all problems with random sampling are soluble. Non-response. Not all members of our chosen sample may respond, particularly when sanctions are nil and incentives are low (or in fact usually negative…). This can matter if non-response is non-random, i.e. if certain types of people tend to respond and others do not.

Non-response In 1992 opinion polls predicted a Labour victory, yet the Conservatives were returned by a large majority of votes (if not seats). One of the (many) factors that may have caused this bias in the polls was that Conservative voters were less likely to respond to surveys than other voters. If the members of the sample who choose not to respond are different to those who do, then we have a biased sample. More on bias later on. Ultimately, tricky to deal with. Some more on this later this semester.
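How non-random non-response biases an otherwise random sample can be sketched with a simulation. The Conservative share (45%) and the response rates (50% vs. 70%) are invented for illustration and are not estimates for 1992.

```python
import random

rng = random.Random(1992)  # fixed seed for reproducibility

# Assumed values, made up for illustration only.
TRUE_CON_SHARE = 0.45
RESPONSE_RATE = {True: 0.50, False: 0.70}  # keyed by "is Conservative"

# A random sample of 20,000 voters; True means a Conservative voter.
sample = [rng.random() < TRUE_CON_SHARE for _ in range(20_000)]

# Each sampled voter decides whether to respond; Conservatives
# respond less often, so non-response is non-random.
respondents = [
    is_con for is_con in sample if rng.random() < RESPONSE_RATE[is_con]
]

sample_share = sum(sample) / len(sample)
respondent_share = sum(respondents) / len(respondents)
print(f"Conservative share in full sample:  {sample_share:.3f}")
print(f"Conservative share in respondents:  {respondent_share:.3f}")
```

Under these assumptions the full sample is close to the population, but the respondents understate Conservative support, and drawing a bigger sample would not fix that.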

Sampling – a summary Sampling is an easy way of collecting information about a population. SRS means everyone in the population of interest has the same chance of being selected. We often use methods slightly different from SRS to overcome certain problems. Random sampling allows us to estimate the probability of the sample being similar to the population.