Presentation is loading. Please wait.

Presentation is loading. Please wait.

1 Research Methods Winter 2008 Winter 2008 Review for Test 1 Review for Test 1 Chapter s 1- 4 Del Balso Instructor: Dr. Harry Webster.

Similar presentations


Presentation on theme: "1 Research Methods Winter 2008 Winter 2008 Review for Test 1 Review for Test 1 Chapter s 1- 4 Del Balso Instructor: Dr. Harry Webster."— Presentation transcript:

1 1 Research Methods Winter 2008 Winter 2008 Review for Test 1 Review for Test 1 Chapter s 1- 4 Del Balso Instructor: Dr. Harry Webster

2 2 1. Observational Study: Uses predetermined categories (target behaviors/events) and observes frequency. Seeks to describe behavior or event. Ex., observe children for helpful behavior. Have target behaviors identified before beginning observation. Define precisely helpful behavior. Ex., p. 7 textbook. Compared proximity of residence to power lines for children with leukemia to those without leukemia.

3 3 2. Survey Studies: Offer a series of questions usually to a large number of people. Ex., ability and attitude/opinion questionnaires. Looks at how many people gave a certain answer (frequency & percents). 2a) Sample surveys: A survey where much attention is given to securing the sample in a random manner where, as much as possible, everyone has an equal chance of being picked. Ex., public opinion polls, marketing research.

4 4 3. Census Tries to gather data from everyone in a country. Underestimates the homeless and some minority groups. Governments do this to establish voting districts, economic and social trends. For Canadian CensusCanadian Census 4. Experiments esearcher changes Researcher changes one thing (independent variable, IV; called treatment in textbook); keeps everything else the same, and determines if behavior (dependent variable, DV) is affected.

5 5 Basic design is two groups where one group gets one level of the IV and the other gets another level of the IV (this includes its absence). Since everything else kept constant, then any change in behavior (DV) is due to the change in the IV. Can make cause and effect (what causes what) statements with experiments. Cause and effect statements can only be made about groups; not individuals. They are probable causes not certain ones (remember: statistics is about probability not certainty).

6 6

7 7 Ex., Does a tutor impact academic performance? Group ANo tutor in RM Group BA tutor in RM Both groups are otherwise the same. Compare grades in RM.

8 8 Vocabulary Vocabulary Individual: anything that is measured; includes people, objects, animals. The ‘thing’ that provides the measurement. Ex., person Variable: any characteristic of an individual; anything that we actually measure. Ex., height, weigh t, schooling, singing ability.

9 9 Population: The entire group of individuals about which we want information. Ex, students, Canadians, Montrealers, workers, dads. Sample: The part of the population that we actually measure on a variable. Too time consuming & difficult to measure the whole population. Ex., 100 students, 1000 Canadians, 300 Montrealers, 500 dads.

10 10 Exercises Chapter 1 1. Fewer women than men vote for the PQ. A political scientist interviewed 400 male and female voters and asked how they voted in the last election. a) Is this an experiment? b) What is the population? c) What is the sample? d) What is the individual? e) What is the variable under study?

11 11 2. Identify the individual, variable, population, sample and type of study: Researchers investigated how many homeless people have chronic diseases by interviewing 100 homeless in the downtown Montreal area. Individual: Population: Sample: Type of Study:

12 12 3. As a researcher you are interested in whether students who sleep one hour more per night would improve their academic performance. Design an experiment to investigate this. Be sure to identify the independent variable, the dependent variable, the population and the sample.

13 13 Chapter 4 - continued Chapter 4 - continued Samples, Good and Bad Samples, Good and Bad Probability Sample Any sample that is chosen by chance. Includes Simple Random Sample (SRS) and Stratified Samples. 1. Simple Random Sample Gives each individual in the population an equal chance of being chosen.

14 14 Use table of random digits (handout) or computer program to choose individuals or samples. Random: every individual or sample has an equal chance of being picked. This ensures the sample is representative of the population. Therefore, we can generalize the results back to the population.

15 15 Choose a Simple Random Sample (SRS) Step 1. Assign a number to every individual in the population (or to every sample). Assign numbers beginning with 0 and keep going until everyone has a number. Go down columns of lists. Step 2. Use random digits to select the numbers and choose those people.

16 16 The first two lines of the Random Digit Table p 550: Line 10119223950340575628713 96409125314254482853 10273676471509940001927 27754426488242536290

17 17 Ex., Choose a Simple Random Sample of 4 from the following past players of the Canadiens. (number down beginning with 0). Begin at line 101. Ribeiro BriseboisMarkov SouraySundstromHainsey RyderRivetHossa ZednikDackellGratton KoivuBouillonLangdon Perreault KilgerTheodore BulisQuintalDwyer Dagenais WardHiggins JuneauBeginPlekanec GaronKomisarek

18 18 The sample of 4 is: Komisarek (19); Hossa (22); Perreault (05); Dackell (13).

19 19 Exercises 1. Choose another simple random sample of 4 from the Canadiens hockey players starting at line 103.

20 20 2.You want to survey people about the recent election in the greater Montreal area. Let’s say there are 25 electoral districts in the greater Montreal area. Choose an SRS of 5 districts using line 110 of the Random Digit table. Explain each step.

21 21 Chapter 4 Chapter 4 What Do Samples Tell Us What Do Samples Tell Us

22 22 Vocabulary: Parameter: The population’s data/result. Very rarely done as we have to approach the entire population. Example: Population is Vanier college student body. Parameter: Average age of student body. (Here we would get everyone’s age and find the average). Statistic: The sample’s result. In Social Science and Commerce we mostly deal with statistics. Example: Choosing 300 Vanier College students would be a sample. Statistic: The sample’s average age. Would only ask the 300 students.

23 23 Sampling Variability: When we take many samples from the same population, their results (statistics) vary slightly from one another. The results do so in a pattern called the normal distribution unless they are not properly selected samples. When the sample is large and well selected, the sample statistic is close to the the population parameter.

24 24 variability The larger the sample, the less the variability up to sample size of about 1500. Sample sizes beyond 1500 do not help to reduce the variability. Variability: describes how spread out the values of the sample statistic are when taking many samples. To reduce variability: Use larger samples.

25 25 Exercises for Chapter 4 1. A Crop/LaPresse poll found that 780 of 1000 elderly Montrealers were worried about falling into poverty due to fixed incomes and depressed stock market. a) What is the sample? b) What is the population? c) Can we determine the statistic? d) Can we determine the parameter?

26 26 2. A newspaper presents a survey where 500 of 1100 people coming out of the cinema said they went to see an action movie. a) What is the sample? b) What is the population? c) Can we determine a statistic? d) Can we determine a parameter?

27 27 3. 1200 college students, evenly divided per gender, were asked about part-time work. 850 said they worked at least 12 hours per week. a) What is the sample? b) What is the population? c) Can we determine a statistic? d) Can we determine a parameter?

28 28 Chapter 4 Chapter 4 How sample surveys go wrong: 1. Sampling errors: 1. Sampling errors: errors that occur as we secure the sample causing the sample to be different from the population. Biased samples. a) convenience sampling: a) convenience sampling: grab them where you can. ex., Malls, stores. b) voluntary/self -initiated response sample: b) voluntary/self -initiated response sample: interested people decide to participate given a public request. ex., Call in votes; internet polls.

29 29 c) Undercoverage: when some groups in the population are left out of the process of choosing a sample. Ex., homeless, people without phones or unlisted numbers. d) Nonresponse - selected people who can’t be reached or decline to participate; this can be a serious problem as certain groups tend to decline more than others (ex., elderly, very busy women). Nonresponse can exceed 70% of initial sample. Public opinion polls don’t give the percent of nonrespondents.

30 30 2. Random sampling errors: 2. Random sampling errors: difference between the sample result (statistic) and the population result (parameter) caused by chance in selecting the random sample. Ex., margin of error in a confidence statement

31 31 3. Nonsampling errors: 3. Nonsampling errors: errors that occur after securing the sample. a) Processing errors - mistakes when entering data. Easier now with computers. b) Response error - person offers an incorrect response; makes a mistake. c) Wording of questions - can influence the response. Ex., Should we maintain day care places at $5 per day? Should the state subsidize day cares so they can charge $5 per day?

32 32 When we want to ensure specific groups are represented: Use Stratified Random Sampling: When we want to ensure specific groups are represented: Use Stratified Random Sampling: Proportional Stratified Random Sample 1. Divide the population into distinct groups according to an important factor (ex., gender, age, residence location). Each group, or layer, called a stratum (strata is plural). proportional stratified random sample To have a proportional stratified random sample we choose the same percentage of each strata as is present in the population.

33 33 2. Take a separate SRS in each stratum and combine them to make the whole sample. Ex., Social Science and Commerce Program Want to know opinion about computer facilities. Know we should divide the groups into Social Science and Commerce because each group uses the computer facilities differently. Want a sample of 200 students.

34 34 Step 1 Have 1000 Social Science and 250 Commerce students. These are the two strata. To find proportion/percent in each group: Total 1000 + 250 = 1250 Social Science = 1000/1250 = 80% or.8 proportion. Commerce = 250/1250 = 20% or.2 proportion. 20% are Commerce students and 80% are Social Science students.

35 35 Step 2. Number each stratum separately. Number the 1000 Social Science students from 0 to 999. Number the Commerce students from 0 to 249. Final sample is 200. Social Science stratum: 200 x.8 (80%) = 160 Commerce stratum: 200 x.2 (20%) = 40 Choose 160 Social Science students then 40 Commerce students.

36 36 3. Recent survey results show that 35% of college students graduate with over $1500 in debt. This is very different from the 65% usually reported. The survey used 1500 respondents after attempting to contact 4000. a) What is the response rate for the survey?

37 37 CHAPTER 2 CHAPTER 2 MEASURING MEASURING Measurement: assigning a number to represent a quality of a thing/person. Ex, age, height, speed, depression, elation. A measurement is valid when it measures an object/person in an appropriate manner. It is a plausible and reasonable reflection of the meaning of the broader theory: Ex., measure height in inches or centimeters; not in speed, time; Social status in terms of wealth or political rank. What about Conrad Black?

38 38 A measurement is reliable when the same answer is obtained over multiple measurements of the same variable. Ex., height. What is a valid measurement for school performance? What is a valid measurement for helping behavior?

39 39 A measurement can be in a count (frequency) or in a rate (proportion or percent). Usually, a rate is better. Offers the advantage of comparing two different measurements. Ex., Metro is 2 minutes late 10 times = count. Bus is 2 minutes late 25 times = count. Metro is 2 minutes late 10% of the time = rate. Bus is 2 minutes late 15% of the time = rate.

40 40 Predictive Validity: A measurement has predictive validity if it can be used to predict success or performance on tasks that are related to what we measured. Ex., Do college grades predict university grades? To some extent yes, but motivation and drive are missing from the prediction. (ref: p. 141) SAT (Scholastic Aptitude Tests) and high school grades predict about 34% of the variation in college grades. This means that about 1/3 rd of the grade differences in college are predicted by the grades in high school and on the SAT tests.

41 41 Three types of errors in measurement Three types of errors in measurement. 1. Measuring something in an invalid manner. 2. A biased measurement - repeatedly overestimating or underestimating the value. Ex., How many hours I will study next week. 3. Unreliable - the results of repeated measurements change from one time to the next. If the differences in results are small can use an average to get a good approximation of the true score. Ex., the clocks in the college.

42 42 Exercises Chapter 1 1. Give a valid way to measure determination. 2. Give an example of a biased measurement. 3. Give an example of a reliable measurement.

43 43 CHAPTER 4 CHAPTER 4 1. Introduction to statistical inference. 2. What is a Confidence Interval. 3. What is a Test of Significance.

44 44 INTRODUCTION TO STATISTICAL INFERENCE INTRODUCTION TO STATISTICAL INFERENCE Statistical inference is a technique to make decisions regarding the probability that the population would behave in the same way as the sample. As it is based on probability, then the rules of probability must be followed. Therefore, the assumptions which must be met are: 1) Randomness: the predictable pattern of outcomes after very many trials. 1a) If samples are chosen randomly, then the pattern of outcomes is a normal distribution. This is called a sampling distribution.

45 45 2) We assume the mean of the normal distribution reflects the mean of the population parameter. Statistical inference helps us determine how confident we are about where a result falls on the sampling distribution in two ways: 1. Confidence Intervals: 1. Confidence Intervals: How confident we are that our sample’s result captured the population parameter within a certain range (margin of error). 2. Tests of significance: 2. Tests of significance: We make a claim about the population and use the sample’s results to test that claim. Want to determine the probability of our claim being right.

46 46 Review - CHAPTER 4 Review - CHAPTER 4 WHAT IS A CONFIDENCE INTERVAL? WHAT IS A CONFIDENCE INTERVAL? confidence interval A confidence interval estimates a population parameter from a sample statistic at a certain level of confidence. It represents a range within which we expect the true value to fall, eg + 1.2 or -1.2 percentage points. margin of error margin of error (the margin within which we believe the population’s answer falls. (Determined theoretically by taking many samples; Some of which fall above the true population mean and some fall below).

47 47 Here confidence means the probability of being right. confidence level We take the sample’s statistic (data) and estimate what the population’s answer would be. Involves how sure we are (confidence level). We take the samples’ statistic (data) and estimate what the population’s answer would be.

48 48 Involves how sure we are confidence level – a range within which we expect the true value to fall (confidence level – a range within which we expect the true value to fall), and margin of error (or sampling error): margin of error (or sampling error): the difference between the true value of a characteristic in a population (eg; support for student fee hikes in college and university amongst ALL the students) and the value estimated from a random sample of the population.

49 49

50 50 A) When the statistic is given in percents or proportions. A) When the statistic is given in percents or proportions. confidence interval for any level of confidence is: The formula to find a confidence interval for any level of confidence is:

51 51 = sample statistic (proportion or percent) z* = z scores (standard scores) n = number of subjects in the sample

52 52 Example: Mayor Villeneuve is two weeks from election day. He wants to know his chances of winning the election. A polling company asks 1000 people who they would vote for if the election were held today and 57% say Villeneuve. Villeneuve wants to be 90% confident that he will win..57 + -.0256 or 57% plus and minus 2.5%

53 53 The margin of error is 2.5%. By subtracting and adding it to the percent of people who said they would vote for Mayor Villeneuve (57%) we find the range of scores (margin of error) within which we are 90% confident lies the population parameter. Confidence Statement Mayor Villeneuve can be 90% confident that between 54.4% and 59.5% of all voters will vote for him if the election were held today. (The all reflects the population parameter) The confidence statement is the whole sentence; the margin of error is between 54.4% and 59.5%; the confidence level is 90%.

54 54 Exercises 1. A sample of 500 students in Montreal was asked if they graduated from college in debt and 58% of them said yes. Give a 99% confidence interval of all Montreal students who are in debt.

55 55

56 56 2. A sample of 1000 students in Montreal was asked if they graduated from college in debt and 58% of them said yes. Give a 99% confidence interval of all Montreal students who are in debt. Using the same formula:

57 57

58 58 3. A sample of 1000 students in Montreal was asked if they graduated from college in debt and 58% of them said yes. Give a 90% confidence interval of all Montreal students who are in debt.

59 59 4. How does the sample size affect the margin of error? 5. How does the confidence level influence the margin of error?

60 60 B) Confidence Statements when the statistic is B) Confidence Statements when the statistic is given as a mean given as a mean proportion/percent We have seen how to determine a confidence interval for a population on the basis of a sample’s proportion/percent. mean To establish a confidence interval for a population on the basis of a sample’s mean we do the same thing but with a different mathematical formula.

61 61 The formula:

62 62 Example: The average debt of 500 students is $750 with a standard deviation of $50. Make a confidence statement about the debt level of all students at the 99% confidence level. Using the formula:

63 63 750 +/- 2.58 (2.24) 750 +/- 5.77 Confidence statement: We are 99% confident that all students are between $744.23 and $755.77 in debt.

64 64 Exercises 6. A sample of 200 mice yields an average weight of 3 ounces. Make a confidence statement at the 99% confidence level. The standard deviation for mice weight is.2 ounces.

65 65 7. Using the mice data in question #1, make a confidence statement at the 85% confidence level.

66 66 8. Compare the margin of error for questions 1 & 2. Explain why they are different.

67 67 CHAPTER 22 CHAPTER 22 What is a Test of Significance? What is a Test of Significance? Statistical tests determine if the sample statistic (result) gives good evidence against a claim of no effect or no difference. This claim is the null hypothesis. There are two types of hypotheses. 1. Null hypothesis (Ho): There is no difference or no effect of what we are studying. The statistical test assesses the strength of the evidence against the null hypothesis. This is the claim we seek evidence against.

68 68 2. Alternative (Experimental) Hypotheses (Ha) The alternative hypothesis states there is an effect or a difference. 2a) One-tailed alternative hypothesis: States that the data states the direction of the difference; the population’s answer (parameter) is either bigger or smaller than the null hypothesis.

69 69 2b) Two-tailed alternative hypothesis states there will be a difference between the sample’s data (statistic) and the null hypothesis but we don’t know if is larger or smaller.

70 70

71 71 A)How to do this with sample proportions or percents The Formula:

72 72 Interpreting the formula

73 73 Ex. Perform a study on coffee preference: fresh or instant. The null hypothesis would be that half the population would prefer fresh and half would prefer instant. The alternative hypothesis is that more people prefer fresh coffee. Result of study: 72% said they preferred fresh coffee. Sample size is 50.

74 74 ALWAYS TEST THE NULL HYPOTHESIS!!! Null Hypothesis: Ho p =.5 (claims that 50% of the population would prefer fresh; 50% would prefer instant.) The statistical test attempts to determine how much evidence (sample statistic) exists against that claim. How much evidence is needed to reject the null hypothesis claim is called the alpha level or significance level. The alpha level is usually set at.05

75 75 The alpha level is usually set at.05 This means that the P-value (result of statistical test of significance) must be in the last 5% of the normal curve. We are determining whether the result obtained from the research is sufficiently different from the large assumed population parameter so as to invalidates the null hypothesis. NB: The null hypothesis always assumes no effect or no difference between preferences, opinions, groups.

76 76 Steps to Determining Statistical Significance. Steps to Determining Statistical Significance. Step 1: Develop a null hypothesis that states no effect or no difference. Ex., no coffee preference. Ho =.5 (50% fresh; 50% instant) This is the assumed population parameter. Step 2: Develop an alternative hypothesis that makes sense given the results from the sample. Ex., People prefer fresh coffee. p >.5 More than 50% of people. NB: We hypothesize larger or smaller than the null hypothesis without putting a specific number.

77 77 Step 3: Determine an alpha level (level of significance) at which you will test your null hypothesis. Can’t be any higher than.05. Ex., Can’t establish an alpha level of.3.

78 78 Step 4: The Z score for proportions/percents.

79 79 The Z score cont.

80 80 The Z score is.22 = 3.1.0707 Z scores are standard scores.

81 81 Step 5. Find the percentile ranking for the standard score. (Use Table B in Chapter 13). The percentile ranking for a standard score of 3.1 is 99.9 th percentile. A normal distribution has 1 as the total area below the curve. The percentile of 99.9 is put as a decimal =.999 This means the 72% statistic of people preferring fresh coffee (st. score of 3.1) is beyond.999 of the normal curve.

82 82 That leaves.001 or.1% of the total area that is further than a st. score of 3.1 The.001 is called the P-value The.001 is called the P-value (result of statistical test). Step 6 To reject the null hypothesis that there is no difference between the number of people in the population who prefer fresh and instant coffee we need to have a P-value smaller than.05 of the curve beyond the statistic.

83 83 As.001 is much smaller.05 (the level of significance), then we reject the null hypothesis that there is no difference between the number of people who like fresh and instant. Expressed as p<.05. We then support the alternative hypothesis that there are a greater number of people in the population who prefer fresh coffee.

84 84 We have taken a sample statistic (72%) and compared it to the estimated population parameter (50%) for the null hypothesis and found that it permits us to reject that claim of no difference between number of people in the population who prefer fresh and who prefer instant. But we are only 95% confident (.05 significance level) that we are right; there is no certainty.

85 85 Exercises 1. It is known that 75% of all apple crops are profitable. This year a sample of 200 orchards showed that 76% of the crops were profitable. 1a) Develop a null hypothesis. 1b) Develop an appropriate alternative hypothesis.

86 86 1c) Is there a statistically significant greater percentage profit this year than previous years. Use the.05 significance level.

87 87

88 88 Exercises 1d) Would there have been a statistically significant difference had you used the.01 significance level.

89 89 2. When asking a sample of 250 people if they believe increasing the size of the gas filter would results in greater gas mileage; 55% said yes. 2a) Develop a null hypothesis. 2b) Develop an appropriate alternative hypothesis.

90 90 2c) Using the.05 significance level, determine if we can reject the null hypothesis.

91 91

92 92 B)How to test for statistical significance with sample means B)How to test for statistical significance with sample means The idea is the same: does the sample mean reflect a score sufficiently far away from the center of a normal distribution to be able to say that the population parameter would not be in the center?

93 93 The null hypothesis and alternative hypotheses still mean the same thing. The null hypothesis proposes there is no significant difference between the assumed population parameter and the sample result. The alternative hypotheses propose a difference between the sample’s result and the assumed population parameter.

94 94 How to do this with a Sample Mean

95 95 Example: A sample of 100 students yields an average r score of 28.5 with a standard deviation of 3. The assumed or estimated provincial average r score is 27.5. Determine if the assumed provincial average appears correct. Null hypothesis: There is no difference between sample’s r score and the provincial average. H 0 : = 27.5 Alternative hypothesis: The sample’s average is significantly higher than 27.5. H a : > 27.5

96 96 We will set the alpha level (level of significance) at.05. Steps to calculating the P-value for a sample mean. Step 1: Finding the Z score

97 97

98 98 Step 2: Finding the P-Value: Z Score (is a standard score) = 3.33 To translate a St. Score to a P-value we go to Table B (p. 547) St. Score = 3.33 becomes a percentile of 99.95. Put the percentile in decimal form before subtracting from 1. 99.95 =.9995 in decimal form. The P-Value is 1 -.9995 =.0005 (Reminder: the complete area under a normal distribution is equal to 1)

99 99 Step 3: Comparing our P-value to the alpha level. significantly lower The alpha level is.05 and the P-value is.0005 which is significantly lower and therefore we reject the null hypothesis that the sample’s r score is not significantly different from the assumed population’s score. We support the alternative hypothesis that the population’s r score is significantly higher than 27.

100 100 Exercises 1.The average bike ride is 11 kilometers. A sample of 100 adults rode an average of 12 kilometers per bike ride with a standard deviation of 2 kilometers. Using the.05 level of significance, determine if the sample’s average bike ride is statistically significantly different from the population’s average.

101 101 2. A researcher finds that an increase in the size of the gas filter results in a mean of 32 miles per gallon with a standard deviation of 3. The sample size was 14. It is known that car model gets 30 miles per gallon with a regular size filter. 2a) Develop a null hypothesis. 2b) Develop an appropriate alternative hypothesis.

102 102 2c. Determine if there is a statistically significant difference between the gas filters. Use the.05 significance (alpha) level.

103 103

104 104 CHAPTER 14 CHAPTER 14 DESCRIBING RELATIONSHIPS: DESCRIBING RELATIONSHIPS: SCATTERPLOTS AND CORRELATIONS SCATTERPLOTS AND CORRELATIONS Scatterplots: Involves the relationship between two or more quantitative (ordinal, interval, ratio) variables measured on the same individuals/objects. (For our course, we will deal with two variables.)

105 105 scatterplot. The graph that depicts this relationship is called a scatterplot. Sometimes, scatterplots have an explanatory variable (on the horizontal axis) and a response variable (on the vertical axis). The explanatory variable is the independent variable. The response variable is the dependent variable.

106 106 Each dot in a scatterplot reflects two pieces of information (variables) about an individual. In this example, the individuals are countries. The graph depicts the relationship between gross domestic product per person and longevity. (p. 271)

107 107 Some scatterplots have no explanatory and response variables; only the relationship between two variables. Ex., The Archaeopteryx: the femur (leg bone) and humerus (arm bone); the size of one does not ‘explain’ or ‘contribute’ to the size of the other. (p. 274)

108 108 This scatterplot has a definite shape: as one variable increases, the other tends to increase. positive association This is called a positive association.

109 109 negative association When one variable decreases and the other increases, it is called a negative association.

110 110 When there is no relationship between the change in one variable and the change in another variable, there is no association.

111 111 To examine a scatterplot: 1. Look at the overall pattern and any important deviations. 2. Describe the scatterplot using the form, direction and strength of the relationship. 3. Look for outliers 4. The closer the data are to forming a linear line, the stronger the association. Ex., The Archaeopteryx: There is a strong positive association between the size of the femur and the humerus with no outliers.

112 112 correlation. When the association between two variables is expressed mathematically, it is called a correlation. Features of Correlations 1. It is expressed as r. 2. The range is from -1.00 to +1.00. 3. -1.00 is a perfect negative correlation; +1.00 is a perfect positive correlation. These are never seen with real data. Zero is no correlation - there is no relationship between the variables.

113 113 4. Correlations use standard scores so we can compute them for any two variables (doesn’t have to be the same unit of measurement). 5. Correlations measures the strength of straight-line (linear) associations between variables. 6. Correlations are affected by outliers. The more data there is, the less an outlier will influence the correlation.

114 114

115 115 Correlations between:Are considered:.8 - 1.00Very Strong.6 -.79Strong.4 -.59Moderate.2 -.39 Weak 0.0 -.19Very Weak

116 116 Exercises 1. Develop a scatterplot for the following data: Running times for 100 meter dash. Trial 11011 10.510.2 10 Trial 29.910.9 10.6 10.2 9.8

117 117

118 118 2. What is wrong with the following statements: a) The correlation between number of hours worked by students and number of times they answer a question in class is r =.5 b) The correlation between the first snow storm of any given year and the number of car accidents that day is r = - 1.3

119 119 c) The correlation between gender and income is about r =.66 3. Give an example for each of the following: a) A strong positive correlation b) A strong negative correlation c) No correlation

120 120 CHAPTER 15 CHAPTER 15 DESCRIBING RELATIONSHIPS: DESCRIBING RELATIONSHIPS: REGRESSION, PREDICTION AND CAUSATION REGRESSION, PREDICTION AND CAUSATION Regression and Prediction Regression and Prediction Uses two or more ordinal, interval or ratio variables. Formula used to predict the value of one variable by using the value of a second variable.

121 121 A regression line describes how a response variable changes as the explanatory variable takes on different meanings. (here explanatory variable means the variable that we use to predict the other variable) We predict the response variable for a given explanatory value. To do this: Least Squares Regression: Formula: Y = a + bx Y Y is the value you want to predict x x is the value of the explanatory variable

122 122 a a is the intercept - the value of the response variable (y) when the explanatory value (x) is 0. It is where the least square regression line cuts across the x axis. b bis the slope of the least square regression line. The amount by which the response variable (y) changes when the explanatory variable (x) increases by one unit (of whatever is being measured). Minitab calculates the slope and intercept for us; we plug data into the formula. Can place a least square regression line on a scatterplot by putting a line through the data points.

123 123 Example of Scatterplot with a least regression line: Correlation: r =.992 Regression: - 3.60 + 1.20Femur

124 124 Scatterplot showing effect of outlier on regression line

125 125 Scatterplot of a negative association with regression line

126 126 We cannot predict response values for explanatory values that are out of the range of the data. Ex., gas and temperature. We should not predict beyond the given temperatures. As prediction involves only associations between variables, they do not address cause and effect; therefore, no statements about causality.

127 127 Coefficient of determination (r 2 ) : Coefficient of determination (r 2 ) : The square of the correlation (r 2 ) gives the total amount of change in one variable that is due to changes in the other variable. Does not mean one variable causes the other; only that some of the change in the first is due to change in the second. Ex., r =.5 between temperature and level of outdoor activity. r 2 =.25 which means that 25% of the change (variation) in the outdoor activity variable is linked to change in the temperature variable.

128 128 EXERCISES 1. Professor Lively, runs every day for at least 30 minutes and checks her pulse rate. Some of the data, from p. 276: Time Pulse Time Pulse 34.12152 35.52124 34.52140 34.05152 34.13146 35.52128 36.17136

129 129 1a) Draw a scatterplot for these data.

130 130 1b) The correlation, r = -.815 Briefly describe what this means. 1c) Draw a least square regression line in the scatterplot (eyeball method).

131 131 1d) The least square regression formula is: 510 - 10.6 time Predict the pulse rate for 34.5 minutes eyeballing the scatterplot. Then predict the pulse rate using the formula. Compare the results.

132 132 2. The following are the closing quotes for the Nasdaq and Microsoft for ten trading days. Nasdaq:Microsoft: 174254 178557 177055 178956 178456 180457 186260 184560 182659 182459

133 133 The scatterplot with regression line for these data.

134 134 2a) The correlation is r =.974 Describe what this means.

135 135 4. The correlation between hours of sunshine and number of strawberries on a plant is r =.67. What percentage of strawberry production is linked to hours of sunshine?

136 136 Causation: The reason something occurs; what makes it happen. Requires experimental research designs where there is a great deal of control of all variables. Philosophically, causation requires a ‘leap of faith’ from excluding all other possible explanations to granting the independent variable the power to have caused the behavior.

137 137 a) Simple Causation: Very rare in real life. A causes B to happen. Ex., paying students $250 to get 80%+ in a course. This would increase the number of students who get 80%+. caused If everything else is kept constant, we could say that the $250 had an effect on students’ behavior; it caused an increase in grades. AB

138 138 b) Common Response A causes B and C When changes in two variables are caused by a third, common, variable. Ex., July is season for highest ice cream sales; July is also the month where the most people drown. Ice cream does not cause drowning; the warm weather increases both sales and drownings. A B C

139 139 c) Confounding Response: We know two variables cause a change in a third but we don’t know the ‘weight’ of each variable. Ex., person smokes and drinks too much. Heart is affected; we know that both contribute but do not know how much each contribute. Need to do experimental research to ‘sort out’ the influences. Helps to isolate each variable’s effect on heart.

140 140 When experimentation is not possible, we can approach causation if the following conditions are met: 1. The association between two variables is strong. 2. The association between two variables is consistent. 3. The alleged cause precedes the effect. 4. The alleged cause is plausible.

141 141 3. People who drink diet soft drinks tend to gain more weight over a one year period than people who do not. Does drinking diet drinks make people gain weight? Give a more plausible explanation.

142 142 Chapter 24 Chapter 24 Two-Way Tables and the Chi-Square Test Two-Way Tables and the Chi-Square Test Two- Way Tables: Two- Way Tables: Displays results as percents or frequency for two variables. Use for nominal (categorical) variables or with one nominal variable and one ordinal or interval variable. Good for surveys.

143 143 Example Distribution of Employed in a Very Small Town Ages 15-24 25-34 35-4445+ F20255050 M30356075 These data are put in bar graphs. First thing to do is to sum all rows, columns and grand total.

144 144 Types of questions asked of two-way tables: A) What percent of the sample is male? Total sample= 345 Total male= 200 Percent male= 58% B) What percent of the sample is between 15-24 years old? Total sample= 345 Total 15-24= 50 Percent 15-24= 14.5%

145 145 C) What percent of the sample are males between 35- 44 years old? Total sample= 345 Males 35-44= 60 % males 35-44= 17.4% D) What percent of females are 25-34 years old? Total females= 145 Females 25-34= 25 % females 25-34= 17.2%

146 146 The other way to explore data in a two-way table is to investigate whether there is an association between the two variables that would also be reflected in the population. To do this use a test of significance. (Hint: whenever we want to generalize to the population we have to use a confidence interval or a test of significance). The test of significance is called the Chi-Square Test. It is very often used when analyzing surveys.

147 147 (Moore p. 470) Treating cocaine addiction GroupTxSubj.Succ.Percent 1 Desipramine 241458.3% 2 Lithium 24 625.0% 3 Placebo 24416.7% Want to determine if these results will also be reflected in the population.

148 148 The Chi-Square Test squared Good for nominal and ordinal data. All numbers are squared; there are no negatives. Step 1: The hypotheses. Null Hypothesis: Null Hypothesis: There is no association between treatments and outcomes in the population of cocaine addicts. Alternative Hypothesis: Alternative Hypothesis: There is an association between treatment and outcome in the population of cocaine addicts. We do not calculate Chi Square. It will be given. In this example, Chi Square X 2 = 10.50

149 149 For the curious, here is how to calculate a Chi Square For the curious, here is how to calculate a Chi Square Step 2: Reorganize data into a two-way table to determine the degrees of freedom. Success Failure Total Despiramine 14 10 24 Lithium 6 18 24 Placebo 4 20 24 Total 24 48 72 Formula: (r-1) (c-1) gives degrees of freedom (df). Do not include Totals when determining df.

150 150 Degrees of freedom is a statistical concept that identifies how many categories must be known for all of them to be known. Ex. If something is in either a red box or a blue box and you know it can’t be in the red box then by elimination it must be in the blue box. Here the df is 1; only one thing must be fixed for you to know the other thing. We must know how to use Degrees of Freedom The df for Chi Square involves both rows and columns. Formula: (r-1) (c-1) gives degrees of freedom (df).

151 151 We ‘place’ our chi-square result on the distribution to see if to falls far enough to reject the null hypothesis of no difference between treatment and outcomes. The distribution for the chi-square is not a normal distribution; it is a curve that is skewed to the right. As the formula uses squares, then there can be no negative values. This curve changes depending upon the degrees of freedom.

152 152 The curves below depict the chi-square distribution for differing degrees of freedom. We can see that as the df increases, the curve becomes less skewed to the right and we get larger values.

153 153 For our cocaine example, the degrees of freedom are: (3-1) (2-1) = 2 df We use a table to determine if we attain statistical significance.

154 154

155 155 Step 3 Determining if we can reject the null hypothesis. The Chi Square X 2 = 10.50 With a chi-square of 10.50, we attain statistical significance at the.01 alpha level but not the.001 alpha level.

156 156 Therefore, we can say that there are statistically significant relationship between treatments and outcomes for the population beyond the.01 alpha level (p >.01). When we observe statistical significance, we then go back to the table to determine where it lies. In the cocaine example, it is obvious that desipramine treatment fares better than the others.

157 157 Exercises 1. The following table reflects the number of nails made at three different hours using two production methods. 9 AM 1 PM 3 PM Robotics 20 18 21 Human 30 24 15

158 158 1a) What percent of nails are produced at 1 PM? 1b) What percent of nails are produced with robotics?

159 159 1c) What percent of the nails produced at 3 PM are human made? 1d) What percent of the robotic made nails are produced at 9 AM?

160 160 2. A researcher investigates whether there is any association between time of production and method of production. 2a) Give a null hypothesis. 2b) Give an alternative hypothesis.

161 161 2c) The chi-square statistic is X 2 = 6.04 Is this statistically significant at or beyond the.05 alpha level?

162 162 4. Using the following table, answer these questions. Students’ Dream Cars JaguarBMWMercedes Male253540 Female302060

163 163 4a) What percent of women want a Jaguar? 4b) What percent of the sample want a BMW?

164 164 4c) What percent of people who want Mercedes are men?

165 165 4d) The Chi Square X 2 is 8.088 for the relationship between gender and car preference. Determine whether this is statistically significant beyond the.05 alpha level. 4e) Describe the association.

166 166 CHAPTER 16 CHAPTER 16 CONSUMER PRICE INDEX CONSUMER PRICE INDEX Index number: Describes the percent change from a base (comparison) period. Use index points to be able to compare ‘apples and oranges’. Anything can be put into an index number. Most frequently, it is the cost of a service/good. Formula for index number: value x 100 base value Index number for base value is always 100.

167 167 Example: Price of gasoline Tuesday is base value: $.70 Thursday is value: $.77 index number:.77 x 100 = 110.70 Gasoline’s index number for Thursday is 110% of the base value (of 100). index points Therefore, there has been an increase of 10% in index points since Tuesday.

168 168 Usually, an index number does not reflect the price change of one item; most often it reflects the change in a ‘basket of services/goods’ over a period of time. fixed These baskets are fixed; once you determine the items and quantities in the basket for the base year they cannot change for the comparison years.

169 169 Example of a fixed market basket price index. Gerry’s fixed market basket: 2001 quantity 2001 price 2001 cost Good/service 2001 quantity 2001 price 2001 cost Poutine100$1.75$175.00 Gas1000 liters $.73$730.00 Cell1$40.00 mth $480.00 Total cost: $1385.00 Gerry’s expenses for 2001 were $1385.00.

170 170 2003 quantity 2003 price2003 cost Good/service 2003 quantity 2003 price2003 cost Poutine100$1.85 $185.00 Gas1000 liters $.75$750.00 Cell1$45.00 mth $540.00 Total cost $1475.00 Gerry’s expenses for 2003 are $1475.00. Notice: The quantity stayed the same for both years; this is the fixed market basket - it cannot change. Index number for 2003: 1475 x 100 = 106.5 1385

171 171 Gerry’s index points rose 6.5% in two years. Consumer Price Index The most famous index number for a fixed market basket of goods is the Consumer Price Index. It is also called inflation, or the percent increase for living expenses given a fixed basket. Important that pensions be indexed to the CPI; go up as much as the CPI to not lose purchasing power. CPI is used to establish the relative value of money (the purchasing power) for two different time periods.

172 172 2001 Canadian CPI Weights

173 173 If inflation (CPI) goes up from year to year and salary does not, then we don’t have the as much purchasing power. If inflation goes down and salary stays stable, then we have more purchasing power. We will use the Canadian CPI. Base year is 1992 = 100 index points. Formula: $ time B = $ time A x CPI time B CPI time A We are changing the money value of time A into its value at time B.

174 174

175 175 Example: A person makes $120 per week in 1998. What salary would it take to have the same purchasing power in 2003. $ time 2003 = $120.00 x 122.7 = $135.58 108.6 If salary did not increase during those years: $120 x 52 = $6240 (1998) $135,58 x 52 = $7050.17. (2003) Lost $810.17 in purchasing power in the 2003 year. This does not take into account the loss of purchasing power in the previous years.

176 176 Example 2: An apartment cost $450 in 2000. In 2003 that apartment cost $650. Calculate what the apartment would have cost if inflation were the only rent increase factor. $ time 2003 = $450 x 122.7 = $495.65 111.4 In 2003, the apartment costs $650; given inflation it should cost $495.65. The difference, $154.35, is due to ‘supply and demand’ factors. Bank of Canada: Calculation Web pageCalculation Web page

177 177 Exercises 1. Suzy is a woman of habits. Her fixed market basket for an afternoon with friends is: Good/service 2003 quantity 2003 priceTotal cost Expresso 2 2.004.00 Croissant 1 2.752.75 Movie 1 10.00 10.00 16.75 In 2004, all prices increased by 10%. 1a) Calculate Suzy’s fixed market basket price index for 2004.

178 178 2. If a person earned $10.00 per hour in 1980, how much would they have to earn in 2002 to keep the same purchasing power?

179 179 3. In 2001, the minimum wage in Quebec was $7.00. How much would you have to earn in 2004 to keep the same purchasing power?


Download ppt "1 Research Methods Winter 2008 Winter 2008 Review for Test 1 Review for Test 1 Chapter s 1- 4 Del Balso Instructor: Dr. Harry Webster."

Similar presentations


Ads by Google