Why include statistics as part of Psychology? Doing psychology research; reading psychology research articles; analytical reasoning and critical thinking. Statistics: a fundamental tool for all scientific inquiry; a way of making sense out of data.
Populations and Samples. Population: the group of individuals (or things) of interest in a particular study. For example, a researcher may be interested in the relation between class size (variable 1) and academic performance (variable 2) for the population of third-grade children. Sample: populations are usually so large that a researcher cannot examine the entire group, so a sample is selected to represent the population in a research study. Sample size depends on the type of research. The goal is to use the results obtained from the sample to help answer questions about the population.
Sampling from a Population
Figure 1-1 (p. 6) The relationship between a population and a sample. The sample is selected to be representative of the population, and the sample results are used to make an inference about the population.
Variables And Data. A variable is a characteristic or condition that can change or take on different values. Most research begins with a general question about the relationship between two variables for a specific group of individuals (similar to forming a hypothesis). Data are measurements or observations. The measurements obtained in a research study are called the data or data set. Each measurement is a datum (singular) or score. The goal of statistics is to help researchers organize and interpret the data.
Sources of Data: observational research, survey research, experiments. Observational research: naturalistic, no intervention; poor control; correlational data. Survey research: a correlational method of collecting data; does not exercise any control over time order; poor control of alternatives; can show relationships. Experiments: exercise control over covariation, time order and alternatives; can help establish causation.
Using Statistics in Psychology. Carrying out psychological research using an empirical approach means collecting data. Statistics are a way of making use of these data. Descriptive statistics: used to describe characteristics of our sample. Inferential statistics: used to generalise from our sample to our population. Any samples used should therefore be representative of the target population.
Descriptive Statistics Descriptive statistics are methods for organizing and summarizing data. For example, tables or graphs are used to organize data, and descriptive values such as the average score are used to summarize data. A descriptive value for a population is called a parameter and a descriptive value for a sample is called a statistic.
Inferential Statistics: methods for using sample data to make general conclusions (inferences) about populations. A sample is only a part of the population, so sample data provide only limited information about the population. Sample statistics are imperfect representatives of the population parameters because of sampling error.
Sampling Error The discrepancy between a sample statistic and its population parameter is called sampling error. Defining and measuring sampling error is a large part of inferential statistics.
Sampling from a Population
Figure 1-2 (p. 9) A demonstration of sampling error. Two samples are selected from the same population. Notice that the sample statistics are different from one sample to another, and all of the sample statistics are different from the corresponding population parameters. The natural differences that exist, by chance, between a sample statistic and a population parameter are called sampling error.
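As a rough illustration of sampling error, the Python sketch below builds a small made-up population, draws two random samples from it, and compares each sample mean (a statistic) with the population mean (a parameter). The population values, the sample size of 10, and the names `population`, `draw_sample_mean` and `sampling_error` are assumptions chosen for illustration; they are not taken from Figure 1-2.

```python
import random
from statistics import mean

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical population: 1,000 made-up test scores (not data from the text)
population = [random.gauss(100, 15) for _ in range(1000)]
mu = mean(population)  # population parameter


def draw_sample_mean(pop, n):
    """Return the mean of a simple random sample of size n (a sample statistic)."""
    return mean(random.sample(pop, n))


def sampling_error(statistic, parameter):
    """Discrepancy between a sample statistic and its population parameter."""
    return statistic - parameter


m1 = draw_sample_mean(population, 10)
m2 = draw_sample_mean(population, 10)
print(f"population mean = {mu:.2f}")
print(f"sample 1 mean = {m1:.2f}, sampling error = {sampling_error(m1, mu):+.2f}")
print(f"sample 2 mean = {m2:.2f}, sampling error = {sampling_error(m2, mu):+.2f}")
```

The two sample means will generally differ from each other and from the population mean, which is the pattern the figure describes.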
Margin of Error (Box 1.1). Margin of error is an example of sampling error. Terminology used in polling data such as political polls. Amount of error between a sample statistic and a population parameter. There will always be sampling error in survey research and in experiments.
Figure 1-3 (p. 10) The role of statistics in experimental research.
Relationship Between Variables: Correlational Method. Measure two variables for each individual, for example height and weight, SAT and GPA, or wake-up time and academic performance (Figure 1.4), then determine the relationship between the variables. Limitation of the correlational method: it cannot demonstrate cause-and-effect relationships.
Figure 1.4 One of two data structures for studies evaluating the relationship between variables. Note that there are two separate measurements for each individual (wake-up time and academic performance). The same scores are shown in a table (a) and in a graph (b).
Hypothetical data showing results from a correlational study evaluating the relationship between exposure to TV violence and aggressive behavior for a sample of 10 children. Note that we have measured two different variables, obtaining two different scores, for each child. The data show a tendency for higher levels of TV violence to be associated with higher levels of aggressive behavior.
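A correlational data set of this kind can be summarized with a correlation coefficient. The sketch below computes the Pearson correlation, one common choice, by hand. The ten pairs of scores and the function name `pearson_r` are made up for illustration and are not the values shown on the slide.

```python
from math import sqrt

# Made-up scores for 10 children (illustrative only, not the slide's data)
tv_violence = [1, 2, 2, 3, 4, 5, 5, 6, 7, 8]
aggression = [2, 1, 3, 3, 5, 4, 6, 6, 8, 7]


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sp = sum((a - mx) * (b - my) for a, b in zip(x, y))  # sum of products
    ssx = sum((a - mx) ** 2 for a in x)  # sum of squared deviations for x
    ssy = sum((b - my) ** 2 for b in y)  # sum of squared deviations for y
    return sp / sqrt(ssx * ssy)


r = pearson_r(tv_violence, aggression)
print(f"r = {r:.2f}")
```

A positive value of r reflects the tendency for higher scores on one variable to go with higher scores on the other; it does not show that one variable causes the other.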
Relationship Between Variables: Comparing two groups of scores. Experimental (see Figure 1.5): one variable defines the groups (violence vs no violence); this is the independent variable. Another variable is the measurement, the scores from the groups; this is the dependent variable. Nonexperimental (“quasi-experimental”): natural or pre-existing groups, such as gender, which are selected, not manipulated; or before-and-after measurements, for example before and after therapy. Do not confuse before-and-after measurements with control vs experimental groups: there is only one group of participants, who are measured twice.
Figure 1.6 The Structure of an experiment. Participants are randomly assigned to one of two treatment conditions: counting money or counting blank pieces of paper. Later, each participant is tested by placing one hand in a bowl of hot (122 F) water and rating the level of pain. A difference between the ratings for the two groups is attributed to the treatment (paper vs money).
The structure of an experiment. Volunteers are randomly assigned to one of two treatment conditions: a 70° room or a 90° room. A list of words is presented and the participants are tested by writing down as many words as they can remember from the list. A difference between groups is attributed to the treatment (the temperature of the room).
In this experiment, the effect of instructional method (the independent variable) on test performance (the dependent variable) is examined. However, any difference between groups in performance cannot be attributed to the method of instruction. In this experiment, there is a confounding variable. The instructor teaching the course varies with the independent variable, so that the treatment of the groups differs in more ways than one (instructional method and instructor vary).
Figure 1-7 (p. 17) Two examples of nonexperimental studies that involve comparing two groups of scores. In (a) the study uses two preexisting groups (boys/girls) and measures a dependent variable (verbal scores) in each group. In (b), time is the variable used to define the two groups, and the dependent variable (depression) is measured at each of the two times.
Nonexperimental (“quasi-experimental”) Terminology. The data structure is similar to that of experiments: one variable identifies the groups (independent) and a second variable is measured to obtain data (dependent). In nonexperimental studies, the variable that defines the groups, such as gender, is not manipulated, so it is called a “quasi-independent variable”.
Data Structures and Statistical Method. Data structure is used to classify statistical methods. One group with two variables measured for each individual: survey research, for example collecting GPA and SAT scores for each person; use correlational statistics to describe the data. Survey or observational research that counts the number of individuals in a group: groups based on “natural” categories such as gender, or on some activity such as “talk” vs “text” (Table 1.1); use the chi-square statistic to describe the data. See scales of measurement on page 23.
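When the data are counts of individuals in categories, a chi-square statistic compares the observed frequencies with the frequencies expected if the two classifications were unrelated. The sketch below works this out for a 2 x 2 table. The counts are invented for illustration and are not those of Table 1.1, and the function name `chi_square` is an assumption.

```python
# Made-up frequency counts (illustrative only, not Table 1.1):
# rows are two natural groups, columns are "talk" vs "text"
observed = [[30, 20],
            [20, 30]]


def chi_square(table):
    """Chi-square statistic for a table of observed frequencies."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, f_obs in enumerate(row):
            f_exp = row_totals[i] * col_totals[j] / n  # expected frequency
            stat += (f_obs - f_exp) ** 2 / f_exp
    return stat


chi2 = chi_square(observed)
print(f"chi-square = {chi2:.2f}")
```

A larger statistic means the observed counts depart further from what would be expected with no relationship between group and activity.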
Data Structures and Statistical Method. Data structure is used to classify statistical methods. Two or more groups of scores: compare two groups such as “Money” and “Paper” (Fig. 1.6), that is, two groups of individuals; compare the average from each group of scores. Several different statistical tests are used, such as the t-test or ANOVA, depending on the number of groups.
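For two separate groups of scores, the comparison of averages can be sketched as an independent-measures t statistic with a pooled variance. The pain ratings below are invented for illustration and are not results from the money vs paper study; the helper names `mean`, `sum_of_squares` and `independent_t` are assumptions.

```python
from math import sqrt

# Made-up pain ratings for two treatment groups (illustrative only)
money = [7, 8, 6, 9, 7]
paper = [9, 10, 8, 11, 9]


def mean(scores):
    """Average of a list of scores."""
    return sum(scores) / len(scores)


def sum_of_squares(scores):
    """Sum of squared deviations from the group mean."""
    m = mean(scores)
    return sum((x - m) ** 2 for x in scores)


def independent_t(g1, g2):
    """t statistic for two independent groups, using the pooled variance."""
    n1, n2 = len(g1), len(g2)
    pooled_var = (sum_of_squares(g1) + sum_of_squares(g2)) / (n1 + n2 - 2)
    std_error = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(g1) - mean(g2)) / std_error


t = independent_t(money, paper)
print(f"mean difference = {mean(money) - mean(paper):.2f}, t = {t:.2f}")
```

With more than two groups, the same question about differences between group averages is usually handled with ANOVA instead.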
Constructs and Operational Definitions. To form a hypothesis from a research question, the researcher needs to define the variables. What are the effects of drug “Bulk-O” on weight gain? The independent variable is drug or no drug; the dependent variable is weight gain, which is “concrete”. What are the effects of drug PQX1450 on anxiety? The dependent variable is “anxiety”, which is a construct, so we need an operational definition for anxiety. How intelligent are students taking the Methods course?
Variables And Measurement. Discrete variable: discrete categories such as students, cars, houses. Usually a count of the number of individuals or things: number of students in class, number of cars in the parking lot, number of houses along the street. Also called “categorical variables”. Sometimes referred to as “qualitative variables”, which is confusing because qualitative is just description, not counting. Continuous variable: can be divided into an infinite number of values, for example height, weight, time.
Use of Real Limits with Continuous Variables. When working with a continuous variable, precision can be adjusted by changing units (hours vs minutes vs seconds), up to the limit of accuracy of the measuring device, such as a wall clock or a stopwatch. Because a variable such as weight is infinitely divisible, the researcher needs to set boundaries: real limits, which are boundaries located exactly halfway between adjacent categories. The researcher decides where to set limits as a practical matter, such as recording weight to the nearest pound. So someone with a weight of 149.6 is recorded as 150. Each value such as “150” is an interval with upper and lower real limits. Values that fall on a boundary, such as 150.5, can be rounded up or down; just be consistent with the rounding rule.
Figure 1.8 p.21 When measuring weight to the nearest whole pound, 149.6 and 150.3 are assigned the value of 150 (top). Any value in the interval between 149.5 and 150.5 is given the value of 150.
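The real limits of a recorded value can be sketched in a few lines of Python. The function names `real_limits` and `record` and the `unit` parameter are assumptions for illustration; the weights are the ones used in the figure.

```python
def real_limits(recorded, unit=1.0):
    """Lower and upper real limits: half a unit either side of the recorded value."""
    half = unit / 2
    return recorded - half, recorded + half


def record(value, unit=1.0):
    """Assign a continuous measurement to the nearest recorded value."""
    return round(value / unit) * unit


lower, upper = real_limits(150)
print(f"real limits of 150: {lower} to {upper}")
print(record(149.6), record(150.3))
```

Python's built-in `round` sends values exactly on a boundary to the nearest even number, which is one consistent rounding rule; any other rule works as long as it is applied consistently.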
Measuring Variables. To establish relationships between variables, researchers must observe the variables and record their observations. This requires that the variables be measured. The process of measuring a variable requires a set of categories called a scale of measurement and a process that classifies each individual into one category.
Four Types of Measurement Scales: Nominal (by name / category); Ordinal (by order / rank); Interval (meaningful, equal interval scaling); Ratio (interval with a “real” zero point, e.g. degrees Kelvin)
Scales of Measurement: Nominal Scale (“names”). Classifying subjects into categories. No category is “more” or “less,” just different. Categories can be labeled by words (e.g., Male, Female) or numbers (e.g., 0, 1), which can be confusing. A nominal scale always yields a discrete variable.
Scales of Measurement: Ordinal Scale (“ordered”). Categories are in an ordered sequence, ranked. Examples: gold, silver, bronze medals (we don’t know how far gold was from silver, or silver from bronze); class standing (33rd out of 108). An ordinal scale technically yields discrete variables (one cannot be ranked 33rd and a half). Different statistical procedures are required.
Scales of Measurement: Interval Scales. The distance between two values is the same at any point on the scale: the difference between scores of 6 and 10 is 4 units, and the difference between scores of 26 and 30 is 4 units. An interval scale does not have an absolute zero. Attitudinal scales: on a scale from 1 (not at all) to 10 (a great deal), how much do you like anchovy pizza? Example 1.2 (page 25): convert a ratio scale to an interval scale. Height measurements (ratio scale) can be converted to difference scores, i.e. the difference from the average score. With an average height of 50 inches, a height of 52 becomes a difference score of +2, which is an interval scale measurement.
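The height conversion in Example 1.2 can be sketched as follows. The five heights are made up so that their average is 50 inches; only the average of 50 and the height of 52 come from the slide.

```python
# Made-up heights in inches, chosen so that the average is 50 (illustrative only)
heights = [48, 52, 50, 47, 53]

average = sum(heights) / len(heights)
difference_scores = [h - average for h in heights]

print(f"average height = {average}")
for h, d in zip(heights, difference_scores):
    print(f"height {h} -> difference score {d:+.0f}")
```

Differences between the converted scores are still meaningful, but a difference score of 0 marks the average rather than an absence of height, so ratios of difference scores are not.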
Scales of Measurement: Ratio Scales. In addition to having even intervals, we can calculate ratios, so a ratio scale has a meaningful, absolute zero. Distance: zero distance. Weight: zero weight. Temperature: absolute zero, but not zero degrees Fahrenheit. Time: zero time. But what about an IQ score? Interval. A score on a test of neuroticism? Interval.
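The temperature point can be checked with a short calculation: a ratio of two Fahrenheit readings changes once the same temperatures are expressed on a scale with an absolute zero. The two readings of 40 and 80 degrees and the function name `fahrenheit_to_kelvin` are assumptions for illustration.

```python
def fahrenheit_to_kelvin(deg_f):
    """Convert degrees Fahrenheit to kelvin, a scale with an absolute zero."""
    return (deg_f - 32) * 5 / 9 + 273.15


cool, warm = 40.0, 80.0  # illustrative readings in degrees Fahrenheit

ratio_f = warm / cool
ratio_k = fahrenheit_to_kelvin(warm) / fahrenheit_to_kelvin(cool)

print(f"ratio of Fahrenheit readings: {ratio_f:.2f}")
print(f"ratio of the same temperatures in kelvin: {ratio_k:.2f}")
```

The Fahrenheit ratio suggests one temperature is twice the other, while the kelvin ratio shows they are only a few percent apart, which is why ratios are reserved for scales with an absolute zero.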
Scales of Measurement. In practice, many psychological variables give more than ordinal-level information, but it is not possible to clearly establish that they are interval-level. They are generally treated as interval data. Many statistical procedures assume at least interval-level data, but function reasonably well with ordinal-level data.
Statistical Notation. X is a discrete variable, where 0 = men and 1 = women. Y is a continuous variable, representing years of age. Subject# Gender (X) Age (Y) 1 8 2 10 3 7 4 6 5 12 N refers to the number of subjects; N = 6. X_i refers to the i-th person’s score on variable X. For this data set, X_4 = 1 and Y_2 = 10.
Statistical Notation. Σ (the Greek letter sigma) symbolizes summation. Subject# Gender (X) Age (Y) 1 8 2 10 3 7 4 6 5 12 Gender (X): Girl, Boy. ΣY = ? 53. ΣX = ? Illegal operation.
Statistical Notation: Examples 1.3, 1.4 & 1.5 (p. 28)

X    X²   (X-1)  (X-1)²
3    9    2      4
1    1    0      0
7    49   6      36
4    16   3      9

ΣX = 3+1+7+4 = 15
ΣX² = 9+1+49+16 = 75
Σ(X-1) = 2+0+6+3 = 11
Σ(X-1)² = 4+0+36+9 = 49
However, ΣX - 1 = 15 - 1 = 14, because the order of operations is: 1. parentheses 2. exponents 3. multiply / divide 4. summation (Σ) 5. addition / subtraction
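The sums on this slide can be reproduced in Python, where the placement of the parentheses makes the order of operations explicit. The variable names are assumptions for illustration; the four X values are the ones used in the examples.

```python
X = [3, 1, 7, 4]

sum_x = sum(X)                                   # ΣX
sum_x_squared = sum(x ** 2 for x in X)           # ΣX²: square each score, then add
sum_x_minus_1 = sum(x - 1 for x in X)            # Σ(X-1): subtract inside, then add
sum_x_minus_1_sq = sum((x - 1) ** 2 for x in X)  # Σ(X-1)²: subtract, square, add
sum_x_then_minus_1 = sum(X) - 1                  # ΣX - 1: add first, then subtract 1

print(sum_x, sum_x_squared, sum_x_minus_1, sum_x_minus_1_sq, sum_x_then_minus_1)
```

Subtracting inside the sum removes 1 from every score, while subtracting after the sum removes 1 only once, which is why Σ(X-1) and ΣX - 1 differ.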
Statistical Notation: Example 1.6 (p. 29)

Person  X  Y  XY
A       3  5  15
B       1  3  3
C       7  4  28
D       4  2  8

ΣX = 3+1+7+4 = 15
ΣY = 5+3+4+2 = 14
ΣXY = 15+3+28+8 = 54
Σ(X + Y) = 8+4+11+6 = 29
ΣX + ΣY = 15+14 = 29, so Σ(X + Y) = ΣX + ΣY
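The same example can be checked in Python using the four pairs of scores from the slide. The variable names are assumptions for illustration.

```python
X = [3, 1, 7, 4]
Y = [5, 3, 4, 2]

sum_x = sum(X)                                   # ΣX
sum_y = sum(Y)                                   # ΣY
sum_xy = sum(x * y for x, y in zip(X, Y))        # ΣXY: multiply each pair, then add
sum_x_plus_y = sum(x + y for x, y in zip(X, Y))  # Σ(X + Y): add each pair, then add

print(sum_x, sum_y, sum_xy, sum_x_plus_y)
print(sum_x_plus_y == sum_x + sum_y)  # Σ(X + Y) equals ΣX + ΣY
print(sum_xy == sum_x * sum_y)        # ΣXY does not equal (ΣX)(ΣY)
```

Summation distributes over addition but not over multiplication: here ΣXY is 54, while the product of ΣX and ΣY is 210.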