
Presentation on theme: "Distributions."— Presentation transcript:

1 Distributions

2 Outline
  Distributions
  Frequency
  Histograms
  Cumulative frequency
  Quantiles
  Continuous variables
  Shape of a distribution

3 Distribution
The set of values present in a sample or population
  Which values occur
  How often
Starting point for statistics
  Every statistic is computed from the sample distribution
  Every parameter is a property of the population distribution
Need ways of representing or talking about distributions

4 Frequency
Easiest way to characterize a distribution
  How often each value occurs
  f(x) = frequency of value x
  Sample: {1, 6, 3, 8, 6, 4}. f(6) = ?
Frequency table
  Shows frequencies of all values
  1st column for value, 2nd column for frequency
  Sample: {5,7,3,7,2,5,5,3,7,5,3,11,7,5,3,5}

  x     f(x)
  2     1
  3     4
  5     6
  7     4
  11    1
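
The frequency table above can be reproduced with a few lines of Python. This is a minimal sketch using the standard library's `collections.Counter`; the function name `frequency_table` is illustrative and not part of the slides.

```python
from collections import Counter

# Sample taken from the frequency-table slide.
sample = [5, 7, 3, 7, 2, 5, 5, 3, 7, 5, 3, 11, 7, 5, 3, 5]

def frequency_table(values):
    """Return (value, frequency) pairs sorted by value."""
    counts = Counter(values)          # f(x) for every x that occurs
    return sorted(counts.items())     # order rows by x

table = frequency_table(sample)
for x, fx in table:
    print(f"{x:>3} {fx:>4}")          # 1st column: value, 2nd column: frequency
```

The same `Counter` answers single lookups too: for the first sample on the slide, `Counter([1, 6, 3, 8, 6, 4])[6]` gives f(6).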

5 Histogram
Graphical representation of a distribution, showing frequency of each value
[Figure: histogram with callouts "Frequency of this value", "Values", "Units", "Variable Label"]
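
As a rough illustration of the idea, a histogram can be drawn as text, one row per value with a bar whose length is the frequency. This sketch assumes integer-valued data and reuses the sample from the Frequency slide; `text_histogram` is a hypothetical helper name.

```python
from collections import Counter

# Sample from the Frequency slide (integer scores).
sample = [5, 7, 3, 7, 2, 5, 5, 3, 7, 5, 3, 11, 7, 5, 3, 5]

def text_histogram(values):
    """Return one text row per integer value from min to max.

    Values that never occur get an empty bar, so gaps in the
    distribution stay visible along the axis.
    """
    counts = Counter(values)
    rows = []
    for x in range(min(counts), max(counts) + 1):
        bar = "#" * counts[x]               # bar length = frequency of x
        rows.append(f"{x:>3} | {bar}".rstrip())
    return rows

rows = text_histogram(sample)
print("\n".join(rows))
```

Including the empty rows is a design choice: a frequency table lists only values that occur, while a histogram's axis covers the whole range.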

6 Cumulative Frequency
Number of scores below or equal to a given value
F(x) = cumulative frequency for value x
Sample: {4,3,4,5,3,4,2,4,3,4}
  f(3) = ? 3
  F(3) = ? 4

  x    f(x)   F(x)
  2    1      1
  3    3      4
  4    5      9
  5    1      10
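
Cumulative frequency is a running sum of the frequency column. The sketch below computes both columns for the sample on this slide using `itertools.accumulate`; the function name `cumulative_table` is an illustrative choice.

```python
from collections import Counter
from itertools import accumulate

# Sample from the cumulative-frequency slide.
sample = [4, 3, 4, 5, 3, 4, 2, 4, 3, 4]

def cumulative_table(values):
    """Return rows (x, f(x), F(x)) sorted by x."""
    pairs = sorted(Counter(values).items())
    xs = [x for x, _ in pairs]
    fs = [f for _, f in pairs]
    Fs = list(accumulate(fs))      # F(x) = sum of f(v) for all v <= x
    return list(zip(xs, fs, Fs))

rows = cumulative_table(sample)
for x, fx, Fx in rows:
    print(f"{x:>3} {fx:>4} {Fx:>4}")
```

The last value of F(x) always equals the sample size, which is a quick sanity check on any cumulative table.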

7 Quantile
Quantile: the value of X that is greater than a certain fraction of the data
Percentile: a quantile defined by a certain percentage
Sample: {8,2,5,5,7,1,8,2,4,8}
Sorted: {1,2,2,4,5,5,7,8,8,8}
  50th percentile = 5
  90th percentile = 8
[Figure: sorted sample annotated with "25th %ile", "Interpolation", "90th %ile"]
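
The slide does not say which interpolation rule it uses, and several conventions exist. The sketch below assumes one common convention: sort the data, place the p-th percentile at position p/100 × (n − 1) in the sorted list, and interpolate linearly between the two neighbouring scores. Under that assumption it reproduces the slide's 50th and 90th percentiles; the function name `percentile` is illustrative.

```python
import math

# Sample from the quantile slide.
sample = [8, 2, 5, 5, 7, 1, 8, 2, 4, 8]

def percentile(values, p):
    """p-th percentile (0 <= p <= 100) by linear interpolation.

    Assumed convention: position p/100 * (n - 1) in the sorted data.
    Other conventions can give slightly different answers.
    """
    s = sorted(values)
    pos = p / 100 * (len(s) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(s) - 1)     # stay inside the list at p = 100
    frac = pos - lo                  # how far between the two neighbours
    return s[lo] + frac * (s[hi] - s[lo])

print(percentile(sample, 50))
print(percentile(sample, 90))
print(percentile(sample, 25))        # falls between two scores, so it is interpolated
```

With this rule the 25th percentile lands between the scores 2 and 4, which is the kind of case where interpolation matters.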

8 Continuous vs. Discrete Variables
Discrete variable
  Can only take certain values (usually integers)
  Counts: people, test score, stories, …
Continuous variable
  Infinite set of values, in principle
  Height, weight, temp, IQ, …
  For any two scores, there are other possible scores in between

9 Histograms of Continuous Variables
Plotting unique scores isn’t useful
Bins or intervals
  Ranges for grouping continuous variables
  Best width depends on the amount of data
[Figure: histogram with axis values 71.5, 72.5, 73.5]
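
Binning can be sketched as follows: each score is assigned to the interval it falls in, and frequencies are counted per interval rather than per unique score. The heights below are made-up illustrative data, and the starting edge of 71.5 with a width of 1 is an assumption suggested by the axis values in the figure.

```python
import math
from collections import Counter

# Hypothetical heights, invented for illustration only.
heights = [71.6, 72.1, 72.4, 72.9, 73.2, 71.9]

def bin_counts(values, start, width):
    """Count values per bin [start + k*width, start + (k+1)*width)."""
    idx = Counter(math.floor((v - start) / width) for v in values)
    return {
        (start + k * width, start + (k + 1) * width): n
        for k, n in sorted(idx.items())
    }

counts = bin_counts(heights, start=71.5, width=1.0)
for (lo, hi), n in counts.items():
    print(f"[{lo}, {hi}): {n}")
```

Rerunning with a smaller `width` on the same six scores leaves most bins with zero or one score, which shows why the best width depends on how much data there is.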

10 Density
Frequency only well-defined for discrete variables
  f(x): scores exactly equal to x
  0 almost everywhere for continuous variables
Density function
  Describes theoretical distribution of a continuous variable
  Allows determination of number of scores in any range, by integration
  Usually shown as proportion of total population (probability), not frequency
[Figure: density of household income, with regions marked 2% and 100%]
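
To make "by integration" concrete, the sketch below integrates a density numerically over a range to get the proportion of scores in that range. The slides do not give a formula for the income density, so an exponential density with rate 1 is used purely as a stand-in; `density` and `proportion_between` are hypothetical names, and the trapezoid rule is just one simple integration method.

```python
import math

def density(x):
    """Stand-in density (assumption): exponential with rate 1, for x >= 0."""
    return math.exp(-x) if x >= 0 else 0.0

def proportion_between(f, a, b, steps=10_000):
    """Approximate the integral of f from a to b with the trapezoid rule."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for i in range(1, steps):
        total += f(a + i * h)
    return total * h

# Proportion of the population with scores between 1 and 2.
p = proportion_between(density, 1.0, 2.0)
# Integrating over (nearly) the whole range should give about 100%.
whole = proportion_between(density, 0.0, 50.0)
print(round(p, 4), round(whole, 4))
```

For this stand-in density the exact answer is known (e^-1 − e^-2), which makes it easy to check the numerical result.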

11 Shape of a Distribution
Information beyond average score & variability
Broad, often qualitative property
Need "nice" shape to do statistics
Normal distribution
  Gold standard for good shape
  Symmetric, unimodal, thin tails

12 Bad Shape
Skew: Asymmetric distribution
  Extreme scores in one direction bias results
  Positive skew vs. negative skew: which tail is longer (right tail for positive, left tail for negative)
Solutions
  Only consider order of scores (“ordinal data”)
  Transform: Do statistics on new variable
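
The slide does not name a particular transform. As one assumed example, the sketch below applies a logarithm to positively skewed data and compares a moment-based skewness measure before and after. The data are invented for illustration, and `skewness` is a hypothetical helper implementing the third standardized moment.

```python
import math

def skewness(values):
    """Third standardized moment: positive for a longer right tail."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n   # variance (population form)
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Invented, positively skewed scores (each is double the previous one).
raw = [1, 2, 4, 8, 16, 32, 64]
# New variable: the log of each score (assumed transform, not from the slides).
logged = [math.log(v) for v in raw]

print(skewness(raw))      # clearly positive
print(skewness(logged))   # essentially zero: the logs are evenly spaced
```

Statistics would then be computed on `logged` instead of `raw`. A log transform only applies to positive scores, and real data will rarely become this perfectly symmetric.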

13 Bad Shape
Multimodal: More than one peak
  Suggests there are multiple constituent populations
  Learners vs. non-learners
Solution: discretize
  Do statistics on proportion of learners
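
Discretizing can be sketched as replacing each score with a yes/no label and summarizing the labels by a proportion. The scores and the cutoff of 50 below are invented for illustration; how to choose the cutoff is not covered by the slides.

```python
# Invented bimodal scores: one cluster near 10-15, another near 85-92.
scores = [12, 15, 11, 88, 92, 14, 90, 85]

def proportion_at_or_above(values, threshold):
    """Label each score as learner (>= threshold) and return the proportion."""
    labels = [v >= threshold for v in values]
    return sum(labels) / len(labels)

# Hypothetical cutoff placed in the gap between the two peaks.
p_learners = proportion_at_or_above(scores, threshold=50)
print(p_learners)
```

The mean of these scores falls between the two peaks, where no one actually scored, which is why the proportion is the more informative summary here.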
