Published by Loraine Hines. Modified over 9 years ago.
1
Numeric Summaries and Descriptive Statistics
2
populations vs. samples
we want to describe both samples and populations
the latter is a matter of inference…
3
“outliers”
minority cases, so different from the majority that they merit separate consideration
– are they errors?
– are they indicative of a different pattern?
think about possible outliers with care, but beware of mechanical treatments…
significance of outliers depends on your research interests
5
summaries of distributions
graphic vs. numeric
– graphic may be better for visualization
– numeric are better for statistical/inferential purposes
resistance to outliers is usually an advantage in either case
6
general characteristics
kurtosis [“peakedness”]
‘leptokurtic’: more sharply peaked than a normal distribution
‘platykurtic’: flatter than a normal distribution
7
skew (skewness)
right (positive) skew: the long tail points toward high values
left (negative) skew: the long tail points toward low values
9
central tendency
measures of central tendency
– provide a sense of the value expressed by multiple cases, overall…
mean
median
mode
10
mean
center of gravity
evenly partitions the sum of all measurements among all cases; average of all measures
11
mean – pro and con
crucial for inferential statistics
mean is not very resistant to outliers
a “trimmed mean” may be better for descriptive purposes
12
mean
R: mean(x)
13
trimmed mean
R: mean(x, trim=.1)
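A minimal Python sketch of what a 10% trimmed mean does, assuming the same rule R's `mean(x, trim=.1)` applies: drop floor(n × trim) cases from each tail, then average the rest. The function name `trimmed_mean` and the `lengths_mm` batch are hypothetical, made up for illustration.

```python
import math
from statistics import mean


def trimmed_mean(values, trim=0.1):
    """Mean after dropping floor(n * trim) cases from each tail."""
    if not 0 <= trim < 0.5:
        raise ValueError("trim must be in [0, 0.5)")
    ordered = sorted(values)
    k = math.floor(len(ordered) * trim)  # cases cut from each end
    return mean(ordered[k:len(ordered) - k])


# Hypothetical batch of lengths (mm) with one extreme case (60).
lengths_mm = [18, 19, 20, 21, 22, 22, 23, 24, 25, 60]

plain = mean(lengths_mm)                       # pulled upward by the 60
trimmed = trimmed_mean(lengths_mm, trim=0.1)   # drops 18 and 60 first

print(f"mean = {plain}, 10% trimmed mean = {trimmed}")
```

With this batch the single extreme case moves the plain mean from about 22 to 25.4, while the trimmed mean stays at 22.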
14
median
50th percentile…
less useful for inferential purposes
more resistant to effects of outliers…
15
median
16
mode
the most numerous category
for ratio data, often implies that data have been grouped in some way
can be more or less created by the grouping procedure
for theoretical distributions: simply the location of the peak on the frequency distribution
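The point that the mode can be "created by the grouping procedure" can be sketched in Python. The helper `modal_class` and the `values` batch are hypothetical; the sketch assumes equal-width bins starting at zero.

```python
from collections import Counter


def modal_class(values, width):
    """Return (lower_bound, count) of the most numerous bin of the given width."""
    bins = Counter((v // width) * width for v in values)
    # most_common keeps first-seen order for ties
    return bins.most_common(1)[0]


# Hypothetical ratio-scale measurements.
values = [2, 2, 2, 5, 6, 7, 8, 9, 10, 11]

ungrouped = modal_class(values, width=1)  # each value is its own class
grouped = modal_class(values, width=5)    # classes 0-4, 5-9, 10-14

print("width 1:", ungrouped)
print("width 5:", grouped)
```

Ungrouped, the mode sits at 2. Grouped into classes of width 5, the modal class becomes 5 to 9, so the answer depends on how the data were binned.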
17
[Figure: site classes (isolated scatters, hamlets, villages, regional centers), axis ticks 1.0, 1.5, 2.0, 2.5; modal class = ‘hamlets’]
18
dispersion
measures of dispersion
– summarize degree of clustering of cases, esp. with respect to central tendency…
range
variance
standard deviation
19
range
it would be better to use the midspread (interquartile range)…
R: range(x)
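A short Python sketch of why the midspread is preferred: the range is set entirely by the two most extreme cases, while the midspread ignores them. The function names and both batches are hypothetical. The quartiles use the "inclusive" interpolation method, which is assumed here to be the quartile rule intended; other quartile rules give slightly different values.

```python
from statistics import quantiles


def value_range(values):
    """Smallest and largest case, like R's range(x)."""
    return min(values), max(values)


def midspread(values):
    """Distance between the first and third quartiles."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return q3 - q1


# Hypothetical batches that differ only in the largest case.
without_outlier = [18, 19, 20, 21, 22, 22, 23, 24, 25, 26]
with_outlier = [18, 19, 20, 21, 22, 22, 23, 24, 25, 60]

for batch in (without_outlier, with_outlier):
    low, high = value_range(batch)
    print(f"range = {high - low}, midspread = {midspread(batch)}")
```

The range jumps from 8 to 42 when the extreme case is present; the midspread is 3.5 in both batches.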
20
variance
analogous to average deviation of cases from mean
in fact, based on sum of squared deviations from the mean: the “sum-of-squares”
R: var(x)
21
variance computational form:
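The formula image for this slide did not survive extraction. A likely reconstruction, assuming the slide used the usual sample variance with an n − 1 denominator (the one R's `var(x)` computes), is the definitional form followed by the algebraically equivalent computational form:

```latex
s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}
    = \frac{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n - 1}
```

Here x_i are the individual measurements, x̄ is their mean, and n is the number of cases. The computational form needs only the sum of the values and the sum of their squares.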
22
note: units of variance are squared…
this makes variance hard to interpret
ex.: projectile point sample:
mean = 22.6 mm
variance = 38 mm²
what does this mean???
23
standard deviation square root of variance:
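The formula image is missing here as well. Assuming the sample variance with an n − 1 denominator, the standard deviation would read:

```latex
s = \sqrt{s^2} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}
```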
24
standard deviation
units are in same units as base measurements
ex.: projectile point sample:
mean = 22.6 mm
standard deviation = 6.2 mm
mean +/- sd (16.4–28.8 mm)
– should give at least some intuitive sense of where most of the cases lie, barring major effects of outliers
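The arithmetic behind the projectile point example can be checked in a few lines of Python. Only the mean (22.6 mm) and the variance (38 mm²) come from the slides; the variable names are illustrative.

```python
import math

mean_mm = 22.6        # from the slides
variance_mm2 = 38.0   # from the slides, in squared millimetres

# Taking the square root brings the units back to millimetres.
sd_mm = math.sqrt(variance_mm2)
rounded_sd = round(sd_mm, 1)

# mean +/- one standard deviation, using the rounded sd as the slide does
interval = (round(mean_mm - rounded_sd, 1), round(mean_mm + rounded_sd, 1))

print(f"sd = {rounded_sd} mm, mean +/- sd = {interval[0]} to {interval[1]} mm")
```

This reproduces the slide's values: an sd of about 6.2 mm and an interval of 16.4 to 28.8 mm.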
26
trimmed dispersion measures
variance and sd are even more sensitive to extreme values (outliers) than the mean… why??
you can calculate a trimmed version of the variance simply by eliminating cases from the tails, and calculating the variance in the normal way…
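One way to answer the "why??" is to note that deviations are squared before they are summed, so a single distant case can dominate the sum of squares. The Python sketch below illustrates this and the simple trimmed variance described above; the helper names and both batches are hypothetical, and trimming is assumed to drop floor(n × trim) cases from each tail.

```python
import math
from statistics import mean, variance


def outlier_share_of_ss(values):
    """Fraction of the sum of squares contributed by the most deviant case."""
    m = mean(values)
    squared = [(v - m) ** 2 for v in values]
    return max(squared) / sum(squared)


def trimmed_variance(values, trim=0.1):
    """Drop floor(n * trim) cases from each tail, then take the usual variance."""
    ordered = sorted(values)
    k = math.floor(len(ordered) * trim)
    return variance(ordered[k:len(ordered) - k])


# Hypothetical batches that differ only in the largest case.
base = [18, 19, 20, 21, 22, 22, 23, 24, 25, 26]
with_outlier = base[:-1] + [60]

mean_ratio = mean(with_outlier) / mean(base)
variance_ratio = variance(with_outlier) / variance(base)

print(f"mean grows by a factor of {mean_ratio:.2f}")
print(f"variance grows by a factor of {variance_ratio:.1f}")
print(f"outlier share of sum of squares: {outlier_share_of_ss(with_outlier):.2f}")
print(f"10% trimmed variance: {trimmed_variance(with_outlier)}")
```

In this batch the outlier raises the mean by roughly 15% but multiplies the variance more than twentyfold, because that one case supplies most of the sum of squares.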
27
trimmed standard deviation
trimmed sd is calculated differently
s_T = trimmed standard deviation
n = number of cases in untrimmed batch
s²_w = variance of trimmed (winsorized) batch
n_T = number of cases in the trimmed batch
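The formula itself is missing from the extracted slide. Given the symbols defined above, a plausible reconstruction is s_T = sqrt((n − 1) · s²_w / (n_T − 1)); this is an assumption, not something recoverable from the slide text. The Python sketch below implements that assumed formula. The function name `trimmed_sd`, the trimming rule (floor(n × trim) cases from each tail), and the sample batch are all hypothetical.

```python
import math
from statistics import stdev, variance


def trimmed_sd(values, trim=0.1):
    """Trimmed sd via the winsorized variance (assumed formula, see lead-in)."""
    ordered = sorted(values)
    n = len(ordered)
    k = math.floor(n * trim)      # cases cut from each tail
    n_t = n - 2 * k               # size of the trimmed batch
    if n_t < 2:
        raise ValueError("too few cases left after trimming")
    trimmed = ordered[k:n - k]
    # Winsorize: replace each removed case with the nearest retained value,
    # so the batch keeps its original size n.
    winsorized = [trimmed[0]] * k + trimmed + [trimmed[-1]] * k
    s2_w = variance(winsorized)
    return math.sqrt((n - 1) * s2_w / (n_t - 1))


# Hypothetical batch with one extreme case.
lengths_mm = [18, 19, 20, 21, 22, 22, 23, 24, 25, 60]

print(f"ordinary sd = {stdev(lengths_mm):.2f}")
print(f"trimmed sd  = {trimmed_sd(lengths_mm, trim=0.1):.2f}")
```

Under these assumptions the trimmed sd is much smaller than the ordinary sd, since the extreme case no longer dominates the sum of squares.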