Anthony J Greene1 Distributions of Variables I. Properties of Variables II. Nominal Data & Bar Charts III. Ordinal Data IV. Interval & Ratio Data, Histograms & Frequency Distributions V. Cumulative Frequency Distributions & Percentile Ranks
Anthony J Greene2 Variables Variable: A characteristic that takes on multiple values, i.e., it varies from one person or thing to another.
Anthony J Greene3 Variables Cause and Effect The Independent Variable The Dependent Variable
Anthony J Greene4 Distributions The distribution of population data is called the population distribution or the distribution of the variable. The distribution of sample data is called a sample distribution.
Anthony J Greene5 Variables
Anthony J Greene6 Variables Kinds of Variables (any of which can be an independent or dependent variable) Qualitative variable: A nonnumerically valued variable. Quantitative variable: A numerically valued variable. Discrete Variable: A quantitative variable whose possible values form a finite (or countably infinite) set of numbers. Continuous variable: A quantitative variable whose possible values form some interval of numbers.
Anthony J Greene7 Quantitative Variables Discrete data: Data obtained by observing values of a discrete variable. Continuous data: Data obtained by observing values of a continuous variable.
Anthony J Greene8 The Four Scales Nominal: Categories Ordinal: Sequence Interval: Mathematical Scale w/o a true zero Ratio: Mathematical Scale with a true zero
Anthony J Greene9 The Four Scales Nominal: Classes or Categories. Also called a Categorical scale. E.g., Catholic, Methodist, Jewish, Hindu, Buddhist, … Qualitative Data
Anthony J Greene10 The Four Scales Ordinal: Sequential Categories. e.g., 1st, 2nd, 3rd, … with no indication of the distance between classes Discrete Data
Anthony J Greene11 The Four Scales Interval: Data where equal spacing in the variable corresponds to equal spacing in the scale. E.g., decades (1940s, 1950s, 1960s, …) or SAT scores. Discrete or Continuous
Anthony J Greene12 The Four Scales Ratio: An interval scale with a mathematically meaningful zero. E.g., latencies of 1252 ms, 1856 ms, …; or mg of Prozac. Discrete or Continuous
Anthony J Greene13 The Four Scales Nominal: no mathematical operations. Ordinal: <, >, =. Interval: +, −, and ordinal operations. Ratio: ×, ÷, and interval operations.
Anthony J Greene14 Nominal Variables Classes: Categories for grouping data. Frequency: The number of observations that fall in a class. Frequency distribution: A listing of all classes along with their frequencies. Relative frequency: The ratio of the frequency of a class to the total number of observations. Relative-frequency distribution: A listing of all classes along with their relative frequencies.
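The definitions above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function names and the sample of party affiliations are hypothetical.

```python
from collections import Counter

def frequency_distribution(observations):
    """Return {class: frequency}, the number of observations in each class."""
    return dict(Counter(observations))

def relative_frequency_distribution(observations):
    """Return {class: frequency / total number of observations}."""
    total = len(observations)
    return {cls: f / total for cls, f in Counter(observations).items()}

# Hypothetical nominal sample (illustrative only, not data from the slides).
sample = ["Democrat", "Republican", "Democrat", "Other",
          "Democrat", "Republican", "Other", "Democrat"]

freq = frequency_distribution(sample)
rel_freq = relative_frequency_distribution(sample)

for cls in freq:
    print(f"{cls:<12} f = {freq[cls]}   relative f = {rel_freq[cls]:.3f}")
```

The relative frequencies always sum to 1, which is why they can later be read as probabilities when the data cover a whole population.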
Anthony J Greene15 Frequencies of Nominal Variables
Anthony J Greene16 Sample Pie Charts and Bar Charts of Nominal Data
Anthony J Greene17 Frequency Bar Charts Frequency bar chart: A graph that displays the independent variable on the horizontal axis -- categories -- and the frequencies -- dependent variable -- on the vertical axis. The frequency is represented by a vertical bar whose height is equal to the frequency of cases that fall within a given class of the I.V.
Anthony J Greene18 Frequency Charts of Nominal Data
Anthony J Greene19 Relative Frequency Bar Charts Relative-frequency bar chart: A graph that displays the I.V. on the horizontal axis -- categories -- and the relative frequencies -- D.V. -- on the vertical axis. The relative frequency of each class is represented by a vertical bar whose height is equal to the relative frequency of the class. The difference between this and a frequency bar chart is that the proportion (always between 0 and 1) or percentage (always between 0% and 100%) of cases in each class is plotted instead of the number of cases.
Anthony J Greene20 Relative Frequency Charts of Nominal Data
Anthony J Greene21 Probability Distribution and Probability Bar Chart Frequency Distributions and Charts for a whole population Probability distribution: A listing of the possible values and corresponding probabilities of a discrete random variable; or a formula for the probabilities. Probability bar chart: A graph of the probability distribution that displays the possible values of a discrete random variable on the horizontal axis and the probabilities of those values on the vertical axis. The probability of each value is represented by a vertical bar whose height is equal to the probability.
Anthony J Greene22 Probability Charts of Nominal Data
Anthony J Greene23 Bar Chart
Anthony J Greene24 The Bar Graph: Nominal Data
Anthony J Greene25 Sum of the Probabilities of a Discrete Random Variable For any discrete random variable X, the sum of the probabilities of its possible values equals 1; in symbols, ΣP(X = x) = 1. For example, Republicans 32.5% + Democrats 45.0% + Other 22.5% = 100%, or 1.00.
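A quick way to check this property is sketched below, using the party percentages from the example written as proportions. The function name and tolerance are assumptions made for illustration.

```python
import math

# Probabilities from the slide's example, expressed as proportions.
party_probabilities = {"Republicans": 0.325, "Democrats": 0.450, "Other": 0.225}

def is_valid_distribution(probabilities, tol=1e-9):
    """True if every probability is in [0, 1] and the probabilities sum to 1."""
    values = list(probabilities.values())
    in_range = all(0.0 <= p <= 1.0 for p in values)
    # Compare with a tolerance because floating-point sums are rarely exact.
    return in_range and math.isclose(sum(values), 1.0, abs_tol=tol)

valid = is_valid_distribution(party_probabilities)
print(valid)
```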
Anthony J Greene26 Ordinal Variables Note that “Rank” is the ordinal variable. “Mortality” is a ratio variable but can easily be downgraded to an ordinal variable with a loss of information
Anthony J Greene27 Distributions and Charts for Ordinal Data Frequency distributions, relative-frequency distributions, and probability distributions are constructed exactly as they were for nominal data. Bar charts are used.
Anthony J Greene28 Distribution of Education Level
Level          P(x)
Elementary     0.03
High School    0.45
Associates     0.12
Bachelors      0.28
Masters        0.10
Doctorate      0.02
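Because the education levels are ordered, the probabilities can also be accumulated from the lowest level upward. The sketch below uses the values from the table; reading the running total as "this level or lower" is an illustration added here, not something the slide states.

```python
from itertools import accumulate

# Ordered levels and probabilities from the education-level table.
levels = ["Elementary", "High School", "Associates",
          "Bachelors", "Masters", "Doctorate"]
probabilities = [0.03, 0.45, 0.12, 0.28, 0.10, 0.02]

# Running total: probability of having reached this level or a lower one.
cumulative = list(accumulate(probabilities))

for level, p, cum in zip(levels, probabilities, cumulative):
    print(f"{level:<12} P(x) = {p:.2f}   cumulative = {cum:.2f}")
```

The same running total would be meaningless for nominal data, where the classes have no order.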
Anthony J Greene29 Interval and Ratio Data Frequency: The number of observations that fall in a class. Frequency distribution: A listing of all classes along with their frequencies. Relative frequency: The ratio of the frequency of a class to the total number of observations. Relative-frequency distribution: A listing of all classes along with their relative frequencies.
Anthony J Greene30 Histograms Frequency histogram: A graph that displays the independent variable on the horizontal axis and the frequencies -- dependent variable -- on the vertical axis. The frequency is represented by a vertical bar whose height is equal to the frequency of cases that fall within a given range of the I.V.
Anthony J Greene31 Interval and Ratio Variables Years of Education Avg. Income (in thousands)
Anthony J Greene32 Enrollment in Milwaukee Public Elementary Schools
Anthony J Greene33 Relative Frequency distribution of Enrollments in MPS
Anthony J Greene34 Probability distribution of a randomly selected elementary- school student
Anthony J Greene35 Probability distribution of the age of a randomly selected student
Anthony J Greene36 Probability Histogram
Anthony J Greene37 Another Example
Anthony J Greene38 Frequency vs. Relative Frequency
Anthony J Greene39 Frequency vs. Relative Frequency This is also the Probability Distribution
Anthony J Greene40 More Examples: Frequency Histogram
Anthony J Greene41 More Examples: Grouped Frequency Histogram
Anthony J Greene42 Grouped Frequency Histogram
Anthony J Greene44 Proportions and Frequency
Anthony J Greene45 Frequency Groupings 9 intervals with each interval 5 points wide. The frequency column (f) lists the number of individuals with scores in each of the class intervals.
Groupings: There had to be a catch. What to do with the in-betweens? This is only a concern for continuous variables. Real limits: the scores in the “14” bar really range from 13.5 to 14.5. Upper and lower real limits: for whole-number scores, simply add 0.5 to the highest score in the class and subtract 0.5 from the lowest score in the class (the observed scores themselves are the “apparent limits”).
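The rule for real limits can be written as a small helper. This is a sketch: the function name is hypothetical, and the `unit` parameter, which generalizes the rule to other measurement units, is an assumption that goes beyond the whole-number case on the slide.

```python
def real_limits(apparent_low, apparent_high, unit=1.0):
    """Extend the apparent limits by half a measurement unit on each side."""
    half = unit / 2
    return (apparent_low - half, apparent_high + half)

# A single whole-number score: the "14" bar really covers 13.5 to 14.5.
single = real_limits(14, 14)

# A class interval with apparent limits 20 and 29.
grouped = real_limits(20, 29)

print(single, grouped)  # (13.5, 14.5) (19.5, 29.5)
```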
Anthony J Greene47 Understanding Real Limits
Anthony J Greene48 Real Limits & Apparent Limits
Anthony J Greene49 Frequency & Cumulative Frequency [Table columns: I.Q. Range, Real Limits, Frequency, Cumulative Frequency]
Frequency (Normal Distribution)
Cumulative Frequency (Ogive)
Anthony J Greene52 Computing Percentile Ranks Remember that each value has real limits. What is the 90th %ile? 2.5, because 90% of the scores are at or below “2”, but “2” includes everything from 1.5 to 2.5. What is the 20th %ile? 0.5. [Table columns: Pounds (x), Real Limits, Freq (f), Relative Freq., Cuml. Freq., %ile]
Anthony J Greene57 Computing Percentile Ranks What about the in-betweens? What is the 80th %ile? What %ile corresponds to 2 lbs? [Table columns: Pounds (x), Real Limits, Freq (f), Relative Freq., Cuml. Freq., %ile]
Anthony J Greene58 Linear Interpolation
Anthony J Greene59 Linear Interpolation and Percentiles What is the 80th %ile? It falls in the “2” interval (real limits 1.5 to 2.5), which runs from the 62.5th to the 90th %ile and so holds 27.5% of the scores. We need 80% − 62.5% = 17.5% out of that 27.5%: 17.5/27.5 ≈ 0.636. The interval is 1.0 lb wide, so the 80th %ile is 1.5 + 1.0(0.636) ≈ 2.14 lbs. What %ile corresponds to 2 lbs? 2 lbs is halfway into the interval (0.5), so it is halfway between the 62.5th and 90th %iles. Since 27.5% of the scores are in this interval, we need to go up 0.5(27.5%) = 13.75%: 62.5% + 13.75% = 76.25%. [Table columns: Pounds (x), Real Limits, Freq (f), Relative Freq., Cuml. Freq., %ile]
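Both directions of the interpolation can be sketched in code. The three (upper real limit, cumulative %) points are the ones the slides state or imply (0.5 lb at 20%, 1.5 lb at 62.5%, 2.5 lb at 90%). The function names are hypothetical and the rest of the table is not reproduced.

```python
def percentile_to_score(points, pct):
    """Score at a given percentile, interpolating between real limits."""
    for (x0, p0), (x1, p1) in zip(points, points[1:]):
        if p0 <= pct <= p1:
            fraction = (pct - p0) / (p1 - p0)   # how far into the interval
            return x0 + fraction * (x1 - x0)
    raise ValueError("percentile outside the tabulated range")

def score_to_percentile(points, score):
    """Percentile rank of a given score, interpolating between real limits."""
    for (x0, p0), (x1, p1) in zip(points, points[1:]):
        if x0 <= score <= x1:
            fraction = (score - x0) / (x1 - x0)
            return p0 + fraction * (p1 - p0)
    raise ValueError("score outside the tabulated range")

# (upper real limit in lbs, cumulative percent), taken from the slides.
points = [(0.5, 20.0), (1.5, 62.5), (2.5, 90.0)]

p80 = percentile_to_score(points, 80.0)      # 1.5 + 17.5/27.5, about 2.14
rank_2lb = score_to_percentile(points, 2.0)  # 62.5 + 0.5 * 27.5 = 76.25
print(round(p80, 2), rank_2lb)
```

Interpolating in both directions with the same table keeps the two answers consistent: feeding the 76.25th percentile back into `percentile_to_score` returns 2 lbs.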
Anthony J Greene63 The Stem & Leaf Diagram
Anthony J Greene64 Stem & Leaf Plots
Anthony J Greene65 Comparison of Frequency Histogram vs. Stem & Leaf Diagram
Anthony J Greene66 The Blocked Frequency Histogram
Anthony J Greene67 The Frequency Distribution Polygon –or– Line Graph
Anthony J Greene68 Grouped Frequency Polygon
Anthony J Greene69 The Normal Distribution
Anthony J Greene70 Variants on the Normal Distribution
Anthony J Greene71 Comparing Two Distributions Number of Sentences recalled from each category
Anthony J Greene72 Comparing Distributions
Anthony J Greene73 Distributions
Anthony J Greene74 Variables and Distributions In Class Exercise
Anthony J Greene75 The Math You’ll Need To Know Calculate for the data column X: ΣX = 7, ΣX² = 21, (ΣX)² = 49
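The difference between ΣX² and (ΣX)² is purely one of order of operations, as the sketch below shows. The slide's data column is not preserved here, so the scores are hypothetical values chosen only because they reproduce the totals 7, 21, and 49.

```python
# Hypothetical scores (not the slide's data); they reproduce the slide totals.
X = [1, 2, 4]

sum_x = sum(X)                          # ΣX: add the scores
sum_x_squared = sum(x ** 2 for x in X)  # ΣX²: square each score, then add
squared_sum_x = sum(X) ** 2             # (ΣX)²: add the scores, then square

print(sum_x, sum_x_squared, squared_sum_x)  # 7 21 49
```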
Anthony J Greene79 The Math You’ll Need To Know Calculate for the data columns X and Y: ΣX = 6, ΣY = −2, ΣX ΣY = −12, ΣXY = −2, ΣX² = 14, (ΣX)² = 36, ΣY² = 30, (ΣY)² = 4
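The same order-of-operations distinction separates ΣXY from ΣX ΣY. In this sketch the (X, Y) pairs are hypothetical: the slide's data columns are not preserved, and these values were picked only because they reproduce every total listed above.

```python
# Hypothetical (X, Y) pairs (not the slide's data); they reproduce the totals.
pairs = [(1, -5), (3, 1), (0, 2), (2, 0)]
X = [x for x, _ in pairs]
Y = [y for _, y in pairs]

sum_x = sum(X)                               # ΣX
sum_y = sum(Y)                               # ΣY
product_of_sums = sum_x * sum_y              # ΣX ΣY: sum each column, then multiply
sum_of_products = sum(x * y for x, y in pairs)  # ΣXY: multiply each pair, then sum
sum_x_squared = sum(x ** 2 for x in X)       # ΣX²
sum_y_squared = sum(y ** 2 for y in Y)       # ΣY²

print(sum_x, sum_y, product_of_sums, sum_of_products)  # 6 -2 -12 -2
print(sum_x_squared, sum_x ** 2, sum_y_squared, sum_y ** 2)  # 14 36 30 4
```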
Anthony J Greene88 The Math You’ll Need To Know The Mean: M = Σx/n, where n = sample size
Anthony J Greene89 The Math You’ll Need To Know Calculate for the data column X with mean M: Σ(x − M) = 0, Σ(x − M)² = 26, Σ(x² − M²) = 26
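The two results of 26 are no coincidence: expanding the square gives Σ(x − M)² = Σx² − 2MΣx + nM² = Σx² − nM², which equals Σ(x² − M²). The sketch below checks this numerically. The scores are hypothetical (the slide's data column is not preserved) and were chosen only because they reproduce the totals 0, 26, and 26.

```python
# Hypothetical scores (not the slide's data); they reproduce the slide totals.
scores = [1, 3, 4, 8]
n = len(scores)
M = sum(scores) / n                               # the mean, M = Σx / n

sum_dev = sum(x - M for x in scores)              # Σ(x − M): always 0
sum_sq_dev = sum((x - M) ** 2 for x in scores)    # Σ(x − M)²
sum_diff_sq = sum(x ** 2 - M ** 2 for x in scores)  # Σ(x² − M²)

print(M, sum_dev, sum_sq_dev, sum_diff_sq)  # 4.0 0.0 26.0 26.0
```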
Anthony J Greene93 The Math You’ll Need To Know Calculate: s_p = 13, n_1 = 8, n_2 = 10
Anthony J Greene96 What Type of Data? Years Spent in the Military
Anthony J Greene97 What Type of Data? Military Rank: Lieutenant Captain Major Lt. Colonel Colonel General
Anthony J Greene98 What Type of Data? Branch of Service: Army Air Force Navy Marine Corps Coast Guard
Anthony J Greene99 What Type of Data? Time taken to complete a 30 mile bicycle race
Anthony J Greene100 What Type of Data? Finishing place in a 30 mile bicycle race
Anthony J Greene101 Frequency Dist. & Percentile Raw Scores: 15, 18, 21, 23, 27, 33, 33, 35, 36, 36, 39, 41, 44, 47, 49, 50
Anthony J Greene102 Frequency Dist. & Percentile Compute the 52%ile
X        f    Cum f
10–19    2      2
20–29    3      5
30–39    6     11
40–49    4     15
50–59    1     16
The 52%ile is somewhere between the cumulative proportions 5/16 = 0.3125 and 11/16 = 0.6875. That interval is 0.375 wide. To get from 0.3125 to 0.52 we go 0.2075 into the interval. That’s 0.5533 of the way into the interval (0.2075/0.375). These cumulative proportions belong to the 30–39 class, whose real limits are from 29.5 to 39.5 (a range of 10). The 52%ile is 29.5 + 0.5533(10) = 35.03. This process is called Linear Interpolation.
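The whole exercise can be reproduced from the raw scores. This is a minimal sketch, assuming class intervals 10 points wide starting at 10 and whole-number scores (so real limits sit 0.5 beyond the apparent limits). The function names are hypothetical.

```python
raw_scores = [15, 18, 21, 23, 27, 33, 33, 35, 36, 36, 39, 41, 44, 47, 49, 50]

def grouped_cumulative(scores, start, width):
    """Rows of (apparent_low, apparent_high, f, cum_f) for equal-width classes."""
    rows, cum, low = [], 0, start
    while low <= max(scores):
        high = low + width - 1
        f = sum(low <= s <= high for s in scores)  # scores in this class
        cum += f
        rows.append((low, high, f, cum))
        low += width
    return rows

def percentile(rows, n, proportion):
    """Score at a cumulative proportion, by linear interpolation."""
    cum_below = 0
    for low, high, f, cum in rows:
        if f > 0 and cum / n >= proportion:
            # Fraction of the way into the class needed to reach the target.
            fraction = (proportion - cum_below / n) / (f / n)
            lower_real, upper_real = low - 0.5, high + 0.5
            return lower_real + fraction * (upper_real - lower_real)
        cum_below = cum
    raise ValueError("proportion must be between 0 and 1")

rows = grouped_cumulative(raw_scores, start=10, width=10)
p52 = percentile(rows, len(raw_scores), 0.52)
print(rows)
print(round(p52, 2))
```

With these scores the 52nd percentile lands in the 30–39 class: 29.5 + (0.2075/0.375)(10), about 35.03.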