Intro to Statistics for the Behavioral Sciences PSYC 1900 Lecture 2: Basic Concepts and Data Visualization
Primary Goal Statistics Statistics
Why do we use statistics? Is this difference meaningful? Do statistics lie? Adherence to the scientific method; specific assumptions; long-term replicability.
Definition of Terms. Variable: a concept or entity of interest on which variability exists. The goal of behavioral science research is to explain why scores differ. Sample: the set of observations used in analysis; a subset of the population. Population: the entire set of relevant observations. Findings from the sample are used to generalize to the population. What is the Harvard Student Body?
Definitions Continued. Statistics: numerical values summarizing sample data. Examples: mean, median, variance. Parameters: numerical values summarizing population data. We estimate population parameters based on sample statistics. Random Sample: a sample in which each member of the population has an equal chance of inclusion.
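The distinction between parameters and statistics can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population is simulated (10,000 scores with a mean near 100 and a standard deviation near 15), and none of these numbers come from the lecture.

```python
import random
import statistics

# Hypothetical population: simulated scores, not data from the lecture.
rng = random.Random(1900)  # fixed seed so the sketch is repeatable
population = [rng.gauss(100, 15) for _ in range(10_000)]

# Parameters summarize the entire population.
mu = statistics.fmean(population)
sigma = statistics.pstdev(population)

# Simple random sample: every member has an equal chance of inclusion.
sample = rng.sample(population, k=100)

# Statistics summarize the sample and serve as estimates of the parameters.
x_bar = statistics.fmean(sample)
s = statistics.stdev(sample)

print(f"population mean (parameter): {mu:.2f}   sample mean (statistic): {x_bar:.2f}")
print(f"population SD   (parameter): {sigma:.2f}   sample SD   (statistic): {s:.2f}")
```

The sample values will be close to, but not identical to, the population values; that gap is what inferential statistics has to account for.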
Descriptive vs. Inferential Statistics. Distinct types for distinct purposes. Descriptive: the purpose is to provide statistics that summarize or capture the nature of the sample. The mean is the average score. The standard deviation is a measure of average dispersion, or deviation from the mean (i.e., how well the mean captures the scores of the sample). Inferential: the purpose is to calculate the probability that differences in statistics across groups, or levels of relationships among variables, reflect the operation of chance alone.
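A minimal sketch of the two purposes, using made-up scores for two groups. The descriptive part computes the mean and standard deviation. The inferential part uses a permutation test, which is one simple way to estimate how often chance alone would produce a difference this large; the lecture does not name this technique, so treat it as an illustrative stand-in.

```python
import random
import statistics

# Hypothetical scores for two groups (illustrative only).
control = [4, 5, 6, 5, 4, 6, 5, 5]
treatment = [6, 7, 5, 8, 7, 6, 7, 8]

def describe(scores):
    """Descriptive statistics: (mean, sample standard deviation)."""
    return statistics.fmean(scores), statistics.stdev(scores)

def permutation_p_value(a, b, n_shuffles=5000, seed=0):
    """Share of random relabelings whose mean difference is at least
    as large as the observed one (two-sided)."""
    rng = random.Random(seed)
    observed = abs(statistics.fmean(a) - statistics.fmean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)
        diff = abs(statistics.fmean(pooled[:len(a)]) - statistics.fmean(pooled[len(a):]))
        if diff >= observed - 1e-12:  # small tolerance for float rounding
            hits += 1
    return hits / n_shuffles

p_value = permutation_p_value(control, treatment)
print("control   mean, SD:", describe(control))
print("treatment mean, SD:", describe(treatment))
print("estimated probability under chance alone:", p_value)
```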
Measurement. In order to conduct analyses, we have to assign “values” or “codes” to observations. Different types of data require different types of scales. Scale types determine which analytic procedures are appropriate.
Measurement Scales. There are two broad types containing four subtypes. Qualitative: nominal scales. Quantitative: ordinal, interval, and ratio scales.
Nominal Scales. Categorical in nature. No ordering is possible. Examples: religion, ethnicity, gender. We can assign numerical codes, but they do not represent any magnitude or ordering information.
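To see why nominal codes carry no magnitude information, the sketch below codes the same observations two different ways. The category labels "A", "B", "C" and both coding schemes are hypothetical placeholders. The mean of the codes changes with the arbitrary coding, while the most frequent category does not.

```python
from collections import Counter
import statistics

# Hypothetical observations on a nominal variable (labels are placeholders).
observations = ["A", "B", "A", "C", "B", "A"]

# Two equally valid, arbitrary numerical codings of the same categories.
coding_1 = {"A": 1, "B": 2, "C": 3}
coding_2 = {"A": 3, "B": 1, "C": 2}

def summarize(coding):
    """Return (mean of the codes, label of the most frequent category)."""
    codes = [coding[label] for label in observations]
    decode = {code: label for label, code in coding.items()}
    modal_code = Counter(codes).most_common(1)[0][0]
    return statistics.fmean(codes), decode[modal_code]

print("coding 1:", summarize(coding_1))
print("coding 2:", summarize(coding_2))
```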
Ordinal Scales. Order is provided. No information is provided about the magnitudes of differences between points on the scale. Examples: rankings. We can again use numerical codes, but they do not offer information on levels of difference or additivity.
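A small sketch of what a ranking keeps and what it discards. The names and race times are invented for illustration. The ranks are evenly spaced even though the underlying gaps are very different.

```python
# Hypothetical race times in seconds (illustrative only).
times = {"Ana": 61.0, "Ben": 61.2, "Cy": 75.0}

# Ordinal information: who finished 1st, 2nd, 3rd.
finish_order = sorted(times, key=times.get)
ranks = {name: position + 1 for position, name in enumerate(finish_order)}

# Magnitude information that the ranks do not carry.
gap_1_to_2 = times[finish_order[1]] - times[finish_order[0]]
gap_2_to_3 = times[finish_order[2]] - times[finish_order[1]]

print("ranks:", ranks)
print(f"gap between ranks 1 and 2: {gap_1_to_2:.1f} s")
print(f"gap between ranks 2 and 3: {gap_2_to_3:.1f} s")
```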
Interval Scales. Order is provided. Equivalence of differences between points is provided. Examples: Fahrenheit, Likert scales (?). The majority of statistical techniques we will cover are designed for use with interval or ratio data.
Ratio Scales. Order is provided. Equivalence of differences between points is provided. The scale has an absolute and meaningful zero point. Examples: Kelvin, salary, hormone levels. For ratio-scaled data, we tend to use “raw” data descriptors. For interval data, we often use “standardized” descriptors (e.g., z-scores).
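The difference between interval and ratio scales can be illustrated with temperature. In the sketch below, 80°F divided by 40°F gives 2, but that ratio is not meaningful because the Fahrenheit zero is arbitrary; converting to Kelvin gives the physically meaningful ratio. The second function computes z-scores using the sample standard deviation, which is one common convention and an assumption here, since the slide does not specify it. The example scores are hypothetical.

```python
import statistics

def fahrenheit_to_kelvin(f):
    """Convert an interval-scale temperature to the ratio-scale Kelvin."""
    return (f - 32.0) * 5.0 / 9.0 + 273.15

ratio_fahrenheit = 80.0 / 40.0  # arithmetic works, interpretation does not
ratio_kelvin = fahrenheit_to_kelvin(80.0) / fahrenheit_to_kelvin(40.0)

def z_scores(values):
    """Standardize scores: distance from the mean in standard deviation units."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # assumption: sample SD (n - 1 denominator)
    return [(x - mean) / sd for x in values]

ratings = [2, 4, 4, 5, 7]  # hypothetical interval-scale scores

print(f"Fahrenheit ratio: {ratio_fahrenheit:.2f}  Kelvin ratio: {ratio_kelvin:.2f}")
print("z-scores:", [round(z, 2) for z in z_scores(ratings)])
```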
More Definitions. Discrete Variables: take on a small set of possible values. Continuous Variables: can take any value. Independent Variables: variables that are controlled by the experimenter or designated as possible causal factors. Dependent Variables: variables measured as data and theorized to be caused by the independent variables.
Random Sampling. Used to ensure that the composition of the sample “matches” the composition of the population. If the sample deviates from the population, generalizability is threatened. Randomization happens in many ways: randomization programs, random number tables. Note that chance is lumpy. Convenience samples.
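A sketch of drawing random samples with a randomization program, here Python's standard `random` module. The population (600 members labeled "F" and 400 labeled "S") is hypothetical. One reading of "chance is lumpy" is that individual random samples, especially small ones, can miss the population composition by a wide margin; the small samples below tend to scatter more around the true share than the large ones.

```python
import random
from collections import Counter

# Hypothetical population (not from the lecture): 600 "F" and 400 "S" members.
population = ["F"] * 600 + ["S"] * 400

def proportion_f(sample_size, rng):
    """Draw a simple random sample without replacement; return its share of "F"."""
    sample = rng.sample(population, k=sample_size)
    return Counter(sample)["F"] / sample_size

rng = random.Random(14)  # fixed seed so the sketch is repeatable
small_samples = [proportion_f(10, rng) for _ in range(5)]
large_samples = [proportion_f(400, rng) for _ in range(5)]

print("population share of F: 0.60")
print("n = 10 :", small_samples)
print("n = 400:", [round(p, 3) for p in large_samples])
```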
Random Assignment. Used to ensure that the composition of the groups is equivalent. If groups deviate on relevant variables, the validity of the experiment is reduced. The purpose of the control group is to “match” the treatment group in every way except the experimental manipulation.
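Random assignment can be sketched as shuffling the participant list and splitting it in half. The participant IDs and the function name `randomly_assign` are hypothetical, and real studies may use other schemes (for example, blocked assignment).

```python
import random

def randomly_assign(participants, seed=None):
    """Shuffle participants and split them into two equal-sized groups
    (the second group gets the extra person if the count is odd)."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical participant IDs.
participants = [f"P{i:02d}" for i in range(1, 21)]
treatment_group, control_group = randomly_assign(participants, seed=15)

print("treatment:", sorted(treatment_group))
print("control:  ", sorted(control_group))
```

Because group membership depends only on the shuffle, no participant characteristic can systematically steer people into one group.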
Notation. Sigma (Σ) is the symbol for summation. Rules of summation.
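The slide does not list the rules themselves. As a reference, these are the standard rules of summation usually covered at this point, written for scores X and Y indexed by i = 1, ..., N and a constant c (the symbol names are assumptions, not taken from the slide):

```latex
\begin{align*}
\sum_{i=1}^{N} X_i &= X_1 + X_2 + \cdots + X_N \\
\sum_{i=1}^{N} (X_i + Y_i) &= \sum_{i=1}^{N} X_i + \sum_{i=1}^{N} Y_i \\
\sum_{i=1}^{N} c\,X_i &= c \sum_{i=1}^{N} X_i \\
\sum_{i=1}^{N} c &= N c \\
\sum_{i=1}^{N} X_i^2 &\neq \Bigl(\sum_{i=1}^{N} X_i\Bigr)^2 \quad \text{in general}
\end{align*}
```

The last line is the one most often confused: squaring each score and then summing is not the same as summing and then squaring.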
Sample Data
Visualizing Data. One of the most useful things you can do is display data visually. As we’ll see, a picture is worth a thousand words when it comes to checking assumptions about data.
Frequency Distributions. Present data in a logical order that is easy to see. Values of the variable are plotted against their frequency of occurrence.
Data: 1,1,1,1,1,2,2,2,3
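A frequency distribution for the data on this slide can be tallied with Python's `collections.Counter`. The text bar chart is only a rough stand-in for the plotted figure.

```python
from collections import Counter

data = [1, 1, 1, 1, 1, 2, 2, 2, 3]

# Count how often each value occurs.
frequencies = Counter(data)

# Print each value against its frequency, in value order.
for value in sorted(frequencies):
    count = frequencies[value]
    print(f"{value}: {'*' * count} ({count})")
```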
Problems with Frequency Distributions. They are sensitive to individual frequencies as opposed to general patterns. With a highly variable scale, there may be very few instances of any specific value. In such cases, a histogram provides a better description of the data.
Histograms. Graphs in which bars represent frequencies of observations within specific intervals.
No true optimal number of intervals. Ten is a good rule of thumb. [Figure: the same data plotted at each observed frequency and binned into 6 intervals (34.5 – 38.5; 38.5 – 42.5; etc.)]
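Binning can be sketched as follows. The interval boundaries (starting at 34.5, width 4, six intervals) follow the slide; the scores themselves are hypothetical, because the plotted data are not reproduced in the text. Placing boundaries at half units means whole-number scores never land exactly on an edge.

```python
def histogram_counts(values, start, width, n_bins):
    """Count how many values fall into each of n_bins equal-width intervals
    beginning at `start`. Values outside the covered range are ignored."""
    counts = [0] * n_bins
    for v in values:
        index = int((v - start) // width)
        if 0 <= index < n_bins:
            counts[index] += 1
    edges = [(start + i * width, start + (i + 1) * width) for i in range(n_bins)]
    return list(zip(edges, counts))

# Hypothetical scores (not the data behind the slide's figure).
scores = [35, 36, 38, 39, 40, 41, 42, 43, 44, 45, 47, 50, 52, 55, 58]

table = histogram_counts(scores, start=34.5, width=4.0, n_bins=6)
for (low, high), count in table:
    print(f"{low:4.1f} to {high:4.1f}: {'#' * count} ({count})")
```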
Stem-and-Leaf Displays. The benefit of stem-and-leaf displays is that they show both the pattern of frequencies and the actual individual-level data. As the name implies, the data are separated into “stems” (i.e., leading digits) and “leaves” (i.e., following digits marking each data point).
Stem: vertical axis comprised of leading digits.
Leaves: horizontal axis of trailing digits (the digits to the right of the leading ones).
Data: 6, 9, 10, 11, 12, 12, 12, 16, 17, 17, 18, 19, 21, 22, 22, 23, 25, 27
Stem-and-Leaf Plot
Frequency   Stem & Leaf
  2.00        0 . 69
  5.00        1 . 01222
  5.00        1 . 67789
  4.00        2 . 1223
  2.00        2 . 57
Stem width: 10.00
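The display on this slide can be reproduced with a short function. This is a minimal sketch that assumes non-negative whole numbers and a stem width of 10, with each stem split at the midpoint (leaves 0 to 4 on one row, 5 to 9 on the next). Empty rows are omitted, which is why the slide's plot has no row for 0 to 4.

```python
from collections import defaultdict

def stem_and_leaf(values, stem_width=10):
    """Return rows of (frequency, stem, leaves) with two rows per stem.
    Assumes non-negative integers and single-digit leaves."""
    rows = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, stem_width)
        half = 0 if leaf < stem_width // 2 else 1  # lower or upper half of the stem
        rows[(stem, half)].append(leaf)
    return [
        (len(leaves), stem, "".join(str(leaf) for leaf in leaves))
        for (stem, _half), leaves in sorted(rows.items())
    ]

data = [6, 9, 10, 11, 12, 12, 12, 16, 17, 17, 18, 19,
        21, 22, 22, 23, 25, 27]

plot_rows = stem_and_leaf(data)
print("Frequency   Stem & Leaf")
for frequency, stem, leaves in plot_rows:
    print(f"{frequency:6.2f}      {stem:3d} . {leaves}")
```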
The nature of the stems is determined by visual ease. Here, there are two stems for each digit, broken at the midpoint.
Stem-and-leaf of RxTime   N = 300
Leaf Unit = 1.0
  7   3  6788999
 27   4  00001112223333344444
 62   4  55555566666666666777777777888899999
103   5  00000111111111111222222222233333333444444
150   5  55555556666666666777777788888888888899999999999
150   6  000000000000111111111112222222222222233333333334444444
 96   6  555555556666666677777777777777889999999
 57   7  0111122222222333444444
 35   7  5566667788899
 22   8  000112333
 13   8  5678
  9   9  044
  6   9  558
  3  10  44
  1  10
  1  11
  1  12
  1  12  5   (Outlier)
Looking for Volunteers!!! Height Stem & Leaf
Modality & Skewness. Modality: number of meaningful peaks (unimodal = 1, bimodal = 2). Skewness: measure of the asymmetry of a distribution. Positive skew: tail to the right. Negative skew: tail to the left.
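The sign convention for skewness can be checked numerically. The sketch below uses the moment coefficient of skewness (third central moment divided by the 1.5 power of the second), which is one common definition and an assumption here, since the slide does not give a formula. It reuses the data from the frequency distribution slide (1,1,1,1,1,2,2,2,3), whose tail points to the right.

```python
import statistics

def skewness(values):
    """Moment coefficient of skewness: m3 / m2 ** 1.5.
    Positive means a tail to the right; negative, a tail to the left."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    return m3 / m2 ** 1.5

right_tailed = [1, 1, 1, 1, 1, 2, 2, 2, 3]
left_tailed = [4 - x for x in right_tailed]  # mirror image of the same data
symmetric = [1, 2, 3, 4, 5]

print(f"right-tailed data: {skewness(right_tailed):+.3f}")
print(f"left-tailed data:  {skewness(left_tailed):+.3f}")
print(f"symmetric data:    {skewness(symmetric):+.3f}")
```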