Published by Patricia Joseph. Modified over 9 years ago.
1
Descriptive statistics I
Distributions, summary statistics
2
Frequency distributions
Frequency means the number of cases at a single value of a variable
A “distribution” depicts the frequency (number of cases) at every value of a variable
– Frequency distributions illustrate how values disperse
– For categorical variables use a BAR graph
– For continuous variables use a HISTOGRAM (also try AREA)
Open DEMO PLUS.SAV
– For categorical choose variable SEX (1=Male, 2=Female)
– For continuous choose variable AGE
Open Height weight gender age.sav (or .xls), choose a categorical and a continuous variable, and display their distributions as above
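Outside SPSS, the same idea can be sketched in a few lines of Python. This is only an illustration: the values below are made up, not taken from DEMO PLUS.SAV, and the 10-year bin width for age is an arbitrary assumption.

```python
from collections import Counter

# Hypothetical coded values for a categorical variable (1 = Male, 2 = Female).
sex = [1, 2, 2, 1, 2, 2, 1, 2]

# A frequency distribution: the number of cases at each value.
sex_freq = Counter(sex)

# Hypothetical ages for a continuous variable, grouped into 10-year bins
# the way a histogram groups nearby values together.
ages = [19, 23, 27, 31, 34, 35, 42, 58]
age_freq = Counter((age // 10) * 10 for age in ages)

# Text version of a bar graph (categorical) and a histogram (continuous).
for value in sorted(sex_freq):
    print(f"SEX={value}: {'#' * sex_freq[value]} ({sex_freq[value]} cases)")
for start in sorted(age_freq):
    print(f"AGE {start}-{start + 9}: {'#' * age_freq[start]}")
```

Each row of `#` marks plays the role of one bar: its length is the frequency at that value or bin.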
3
Summarizing distributions
Producing a single statistic that best depicts a distribution
For categorical variables, use the statistic “proportion”
– Proportions with a base of 100 are called a “percentage” (per 100)
For continuous variables, use a measure of central tendency
– The statistic “mean” (arithmetic average)
– The statistic “median” (midpoint value – half of cases above, half below)
– The statistic “mode” (most frequent value – can be more than one)
Open DEMO PLUS.SAV
– For categorical choose variable SEX (1=Male, 2=Female): Analyze|Descriptive Statistics|Frequencies, ask for a Bar Chart
– For continuous choose variable AGE: Analyze|Descriptive Statistics|Frequencies, ask for a Histogram
Open Height weight gender age.sav (or .xls), choose a categorical and a continuous variable, and proceed as above
4
Categorical variables
“Percent” is a summary statistic – it summarizes a distribution
“Percent” – per cent – per hundred. 100 is always the denominator
Increases in percentage are computed off the base amount
Example: a jail population that starts at a base of 100 prisoners
– 100 percent increase: 100 percent of 100 is 100; 100 + 100 = 200 (2 times the base amount)
– 150 percent increase: 150 percent of 100 is 150; 150 + 100 = 250 (2.5 times the base amount)
– 200 percent increase: 200 percent of 100 is 200; 200 + 100 = 300 (3 times the base amount)
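The jail population arithmetic can be checked with a short Python sketch. The function name `apply_percent_increase` is made up for this illustration.

```python
def apply_percent_increase(base, percent):
    """Return the new total after increasing `base` by `percent` percent."""
    increase = base * percent / 100  # the increase is computed off the base
    return base + increase


# Base jail population of 100 prisoners, as in the slide.
for pct in (100, 150, 200):
    new_total = apply_percent_increase(100, pct)
    print(f"{pct} percent increase: 100 prisoners become {new_total:.0f}")
```

Note that a 200 percent increase triples the base; it does not double it.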
5
Percentages of less than 1 percent are described as a fraction
– Example: 0.2 percent is 2/10ths of 1 percent
– Do not confuse decimals and percentages
Decimal .20 = 20/100 = 20 percent
Decimal .0020 = 20/10,000 = .20 percent
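A small Python sketch may help keep decimals and percentages apart. It uses exact fractions so that no rounding noise creeps in; the helper name `to_percent` is an assumption of this example.

```python
from fractions import Fraction


def to_percent(proportion):
    """Convert a proportion (a decimal or fraction) to a percentage."""
    return proportion * 100


twenty_hundredths = Fraction(20, 100)          # the decimal .20
twenty_ten_thousandths = Fraction(20, 10_000)  # the decimal .0020

# .20 is 20 percent; .0020 is only .20 percent (2/10ths of 1 percent).
print(float(to_percent(twenty_hundredths)), "percent")
print(float(to_percent(twenty_ten_thousandths)), "percent")
```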
6
Percentages (proportions) are usually the best way to summarize datasets using categorical variables
– 70 percent of students are employed
– 60 percent of parolees recidivate
Percentages can be used to summarize findings when large numbers are involved
– 50,000 persons were asked whether crime is a serious problem: 32,700 said “yes”
Compute…
7
Divide 32,700 by 50,000 and multiply by 100
32,700 / 50,000 = .654
.654 X 100 = 65.4%, or about 65%
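The same computation as a Python sketch, with a hypothetical helper named `percent`. The exact result is 65.4 percent, which rounds to about 65 percent.

```python
def percent(part, whole):
    """Return `part` as a percentage of `whole` (cases per 100)."""
    return part * 100 / whole


said_yes = 32_700
asked = 50_000

exact = percent(said_yes, asked)
print(f"{exact}% said yes, or about {round(exact)}%")
```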
8
Percentages can be used to compare datasets
– This year, 65% of 10,000 people polled said crime is a serious problem
– Last year, 12,000 people were polled and 9,000 said crime is a serious problem
Compute…
9
9,000 / 12,000 = .75
.75 X 100 = 75%
Because both samples were standardized (responses per 100 persons) they are directly comparable even though different numbers of persons were polled
– 65% v. 75%
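To see why standardizing matters, the two polls can be put side by side in Python. This is a sketch with made-up variable names; the figures are the ones given in the slides.

```python
def percent(part, whole):
    """Return `part` as a percentage of `whole` (cases per 100)."""
    return part * 100 / whole


this_year_polled, this_year_pct = 10_000, 65
last_year_polled, last_year_yes = 12_000, 9_000

# Convert this year's percentage back to a raw count for comparison.
this_year_yes = this_year_polled * this_year_pct // 100
last_year_pct = percent(last_year_yes, last_year_polled)

# Raw counts depend on how many people were polled.
print(f"Raw counts:  {this_year_yes} v. {last_year_yes}")
# Percentages are responses per 100 persons, so they compare directly.
print(f"Percentages: {this_year_pct}% v. {last_year_pct:.0f}%")

change_in_points = this_year_pct - last_year_pct
print(f"Change: {change_in_points:+.0f} percentage points")
```

The raw counts differ partly because the sample sizes differ; only the percentages isolate the change in opinion.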
10
Percentages can magnify differences when raw numbers are small
Percentages can deflate differences when numbers are large
– Increase from 1 to 3 convictions is …
– Increase from 5,000 to 6,000 convictions is …
Compute both...
11
Increase from 1 to 3 convictions is 200 percent
– 3 - 1 = 2
– 2 / 1 (base) X 100 = 200%
Increase from 5,000 to 6,000 convictions is 20 percent
– 6,000 - 5,000 = 1,000
– 1,000 / 5,000 (base) X 100 = 20%
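Both answers follow from one formula: subtract the base from the new amount, divide by the base, and multiply by 100. A minimal Python sketch, with `percent_change` as an illustrative name:

```python
def percent_change(base, new):
    """Return the change from `base` to `new` as a percentage of `base`."""
    return (new - base) * 100 / base


small = percent_change(1, 3)          # 2 more convictions
large = percent_change(5_000, 6_000)  # 1,000 more convictions

print(f"1 to 3 convictions: {small:.0f}% increase")
print(f"5,000 to 6,000 convictions: {large:.0f}% increase")
```

The smaller raw change produces the larger percentage, which is why percentages built on small bases should be read with care.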
12
Summarizing a distribution for ordinal variables
Ordinal variables are categorical variables whose categories reflect an inherent rank or order
Can summarize the distribution of an ordinal variable two ways:
– As a categorical variable, using proportions / percentages
– As a continuous variable, treating categories as points on a scale: assign a numerical value to each category and calculate a mean
Open DEMO PLUS.SAV
– Variable “class” is ordinal
– Display and summarize the distribution both ways: as a categorical/ordinal variable, and as a continuous variable
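The two approaches can be sketched in Python. Everything in this example is assumed for illustration: the category labels, their 1 to 4 scores, and the eight cases are invented and do not come from the "class" variable in DEMO PLUS.SAV.

```python
from collections import Counter
from statistics import mean

# Hypothetical ordinal categories, scored in rank order.
scores = {"lower": 1, "working": 2, "middle": 3, "upper": 4}

# Hypothetical cases.
cases = ["lower", "working", "working", "middle",
         "middle", "middle", "middle", "upper"]

# Way 1: treat as categorical and report the percentage in each category.
counts = Counter(cases)
percentages = {cat: counts[cat] * 100 / len(cases) for cat in scores}

# Way 2: treat as continuous, replace each category by its score, take a mean.
mean_score = mean(scores[c] for c in cases)

for cat, pct in percentages.items():
    print(f"{cat}: {pct}%")
print(f"Mean score on the 1-4 scale: {mean_score}")
```

The mean only makes sense if the steps between categories are treated as equal in size, which is an assumption rather than a property of ordinal data.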
13
Continuous variables
If variables are continuous, can summarize a distribution with one or more measures of “central tendency”
– Mean, median, mode
Mean: arithmetic average of scores
– Pulled in the direction of extreme scores
– Experiment with Height weight gender age.sav
Median: middle score – half higher, half lower
– If there is an even number of scores, average the two center scores
– If there is an odd number of scores, use the center score
Exercise 1: 2, 3, 5, 5, 8, 12, 17, 19, 21
Exercise 2: 2, 3, 5, 5, 8, 12, 17, 19, 21, 21
14
Exercise 1: 2, 3, 5, 5, 8, 12, 17, 19, 21
Answer: 8
Exercise 2: 2, 3, 5, 5, 8, 12, 17, 19, 21, 21
Answer: 10
– The two center scores are 8 and 12: 12 - 8 = 4; 4 / 2 = 2; 8 + 2 (or 12 - 2) = 10
Median is a useful summary statistic when there are extreme scores
– Extreme scores make the mean a misleading summary measure of a distribution
Median can be used with continuous or ordinal variables
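Both exercises can be checked with Python's standard `statistics` module. The third list is a made-up variation on Exercise 1, with the top score changed from 21 to 210, to show how an extreme score moves the mean but not the median.

```python
from statistics import mean, median

exercise_1 = [2, 3, 5, 5, 8, 12, 17, 19, 21]
exercise_2 = [2, 3, 5, 5, 8, 12, 17, 19, 21, 21]

# Odd number of scores: the center score.
print("Exercise 1 median:", median(exercise_1))
# Even number of scores: the average of the two center scores (8 and 12).
print("Exercise 2 median:", median(exercise_2))

# Hypothetical variation: one extreme score replaces the 21.
with_extreme = [2, 3, 5, 5, 8, 12, 17, 19, 210]
print("Mean without / with extreme score:",
      round(mean(exercise_1), 2), "/", round(mean(with_extreme), 2))
print("Median without / with extreme score:",
      median(exercise_1), "/", median(with_extreme))
```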
15
Mode: score that occurs most often (with the greatest frequency)
– There can be more than one mode (bi-modal, tri-modal, etc.)
Exercise 1: 2, 3, 5, 5, 8, 12, 17, 19, 21
Exercise 2: 2, 3, 5, 5, 8, 12, 17, 19, 21, 21
16
Exercise 1: 2, 3, 5, 5, 8, 12, 17, 19, 21
Mode = 5 (uni-modal)
Exercise 2: 2, 3, 5, 5, 8, 12, 17, 19, 21, 21
Modes = 5, 21 (bi-modal)
Modes are a useful summary statistic for distributions where cases cluster at particular scores – an interesting condition that would be missed by the mean or median
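As a quick check, `statistics.multimode` (available in Python 3.8 and later) returns every score that ties for the greatest frequency, so it handles both the uni-modal and the bi-modal exercise.

```python
from statistics import multimode

exercise_1 = [2, 3, 5, 5, 8, 12, 17, 19, 21]
exercise_2 = [2, 3, 5, 5, 8, 12, 17, 19, 21, 21]

# multimode returns a list with every most-frequent score.
modes_1 = multimode(exercise_1)
modes_2 = multimode(exercise_2)

print("Exercise 1 modes:", modes_1)
print("Exercise 2 modes:", modes_2)
```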
17
Range
Another way to describe a distribution of a continuous variable
– Not a measure of central tendency
Range depicts the lowest and highest scores in a distribution
2, 3, 5, 5, 8, 12, 17, 19, 21
Range is 2 to 21, or 19 (21 - 2)
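The range can be reported either as the two end points or as their difference. A minimal Python sketch using the scores above:

```python
scores = [2, 3, 5, 5, 8, 12, 17, 19, 21]

lowest, highest = min(scores), max(scores)
score_range = highest - lowest  # the distance between the two end points

print(f"Range is {lowest} to {highest}, or {score_range}")
```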