Published by Juliet Alexander. Modified over 9 years ago.
1
Chapter 2 Characterizing Your Data Set Allan Edwards: “Before you analyze your data, graph your data.”
2
Chapter 2 Characterizing Your Data Set Allan Edwards: “Before you analyze your data, graph your data.” Francis Galton, Father of Intelligence Testing: “Whenever you can, count!”
3
Frequency Table Variable is Continuous
4
Grouped Frequency Table & Distribution Continuous variable, Data from Same 100 Subjects Constant Interval “Class Interval”
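A grouped frequency table can be sketched in a few lines. This is only an illustration: the scores and the interval width of 10 are made up, since the slide's 100-subject data set is not reproduced in the text.

```python
from collections import Counter

# Hypothetical scores (the slide's actual data are not available).
scores = [52, 57, 61, 63, 64, 68, 70, 71, 75, 79, 83, 88]
width = 10  # assumed constant class interval width

# Assign each score to the lower limit of its class interval, then count.
counts = Counter((x // width) * width for x in scores)

for lower in sorted(counts):
    upper = lower + width - 1
    midpoint = (lower + upper) / 2  # the value that represents the interval
    print(f"{lower}-{upper}  midpoint={midpoint}  f={counts[lower]}")
```

Every interval has the same width, and each interval is represented by its middle value, as on the slide.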
5
Grouped Frequency Histogram For Continuous Variable Bars “Touch”, the end of one interval is beginning of next Value is middle value of Interval Spatz says the bars don’t touch – Whaaaaaa?????
6
Bar Chart for Categorical Variable Bars are separated – a lot of Biology is not almost English
7
Standard Normal Distribution The more Extreme your score the more unusual, improbable you are Remember this relationship -- it’s the basis of 90% of statistics Typical of many characteristics -- E.G., height, intelligence, speed
8
Rectangular Distribution Never Seen One Extreme Scores are NOT less usual/frequent/probable
9
Non-Normal Distribution Example: Income -- Where is the mean? How would you characterize these data?
10
Negative Skew
11
Bimodal Distribution
Is the Mean appropriate/representative?
E.g., Mean age of onset for Anorexia is 17 yrs
One Peak is at 14 yrs -- Onset of Puberty
One Peak is at 18 yrs -- Going away to college
12
Bimodal Distribution, cont.
13
Characterizing Your Data: Measures of Central Tendency
Characterizing your Data: Shorthand notation for all of your values
Central Tendency: A representative value
Where Your Scores tend to “Hang Out”
Where you go to find your data
1. Mean -- What is the definition & why do you use it?
2. Median -- Middle Value. What if you have an even # of values?
3. Mode -- Most frequent value
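The three measures can be computed with Python's standard `statistics` module. The scores below are hypothetical and were chosen to have an even number of values, so the median question on the slide gets an answer: with an even N, the median is the average of the two middle values.

```python
import statistics

scores = [2, 3, 3, 5, 7, 10]  # hypothetical data, even number of values

mean = statistics.mean(scores)      # sum of scores / N = 30 / 6
median = statistics.median(scores)  # even N: average of the two middle values, (3 + 5) / 2
mode = statistics.mode(scores)      # most frequent value

print(mean, median, mode)
```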
14
Which Central Tendency is Best?
Mean: Ratio Data (People allow Interval Data); Symmetrical Distributions
Median: Skewed Distributions; Ordinal (Ranked) Data -- A mean cannot be computed
Mode: Nominal (Qualitative) Data; Bimodal Data
15
If you Had to Guess the Value of Each (Quantitative) Data Point
Mode: Highest # of correct guesses
Median: Errors would be symmetrical; as many Overestimations as Underestimations
Mean: Squared Errors of Estimation will be smallest, overall
Two Unique Properties of the Mean:
1. Squared deviations sum to less around the mean than around any other value
2. Deviation scores sum to zero
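A small check of these guessing properties, using made-up scores with one extreme value. The helper names `sum_sq` and `sum_abs` are illustrative only. "Smallest" for the mean is measured here by the sum of squared errors; by the sum of absolute errors, the median does better.

```python
import statistics

scores = [1, 2, 2, 3, 12]  # hypothetical data, skewed by one extreme score
mean = statistics.mean(scores)
median = statistics.median(scores)

def sum_sq(guess):
    """Total squared error if every score were guessed to be `guess`."""
    return sum((x - guess) ** 2 for x in scores)

def sum_abs(guess):
    """Total absolute error if every score were guessed to be `guess`."""
    return sum(abs(x - guess) for x in scores)

deviation_sum = sum(x - mean for x in scores)  # deviations from the mean

print(deviation_sum)                 # deviations from the mean cancel out
print(sum_sq(mean), sum_sq(median))  # the mean gives the smaller squared error
print(sum_abs(mean), sum_abs(median))  # the median gives the smaller absolute error
```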
16
How Strong Is Your Tendency? Measures of Heterogeneity (Chapter 3)
Two Data Sets with nearly identical Ns, Means, Medians, and Modes
Are these two data sets similar?
17
Are They The Same?
18
Some Data Sets are More Heterogeneous
Jockeys: Very Low average height, Very Homogeneous
Presbyterians: Medium average height, Very Heterogeneous
NBA Players: Very High average height, Very Homogeneous
How do you characterize a data set’s Heterogeneity?
The Greater the Heterogeneity, the Weaker the Central Tendency
19
Quantifying Heterogeneity
Range: Highest Score minus Lowest Score
Very sensitive to a single Extreme Score
Interquartile Range: 75th percentile minus 25th percentile
Captures the middle 50% of the scores
How wide do you have to go to capture 50% of values?
The wider you have to go, the more Heterogeneity
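The sensitivity of the range can be seen by changing one score. In this sketch the two data sets are hypothetical and differ only in their highest value. Quartiles are computed with `statistics.quantiles` using its `"inclusive"` method, which is one of several quartile conventions; textbooks may use a slightly different rule.

```python
import statistics

typical = [1, 2, 3, 4, 5, 6, 7, 8, 9]        # hypothetical scores
with_extreme = [1, 2, 3, 4, 5, 6, 7, 8, 90]  # same scores, one extreme value

def score_range(data):
    """Highest score minus lowest score."""
    return max(data) - min(data)

def iqr(data):
    """75th percentile minus 25th percentile (inclusive quartile method)."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return q3 - q1

print(score_range(typical), score_range(with_extreme))  # the range changes a lot
print(iqr(typical), iqr(with_extreme))                  # the IQR does not change
```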
20
Heterogeneity, cont. The more Heterogeneity, the more the scores will deviate from The mean
21
Heterogeneity, cont.
Two Unique properties of the Mean:
1. All deviation scores sum to zero
2. Raw scores Deviate Less from the mean than from any other Value (measured by the sum of squared deviations)
This makes the mean the Best Representative of the data Set, if the distribution is symmetrical
22
Heterogeneity, cont.
Problem: All deviation scores sum to zero no matter how Heterogeneous the raw scores
You Cannot average deviation scores to quantify heterogeneity
Solution: Make all deviation scores Positive
23
Heterogeneity, cont.
Two ways to make all deviation scores Positive:
Take the Absolute Value of the Deviation Scores
Average of absolute values = Average Deviation (AD)
Mean +/- AD Captures a bit more than 50% of raw scores (about 58% if the scores are normally distributed)
Take the Square of the Deviation Scores
Average of squared deviation scores = Variance
σ² for Population
S² for Sample
S²-“hat” for estimating Population from Sample
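Both ways of removing the signs can be worked through by hand on a small, made-up data set. The sketch divides by N in both cases, which describes the scores at hand rather than estimating a population.

```python
scores = [2, 4, 4, 6, 9]  # hypothetical data
n = len(scores)
mean = sum(scores) / n

deviations = [x - mean for x in scores]  # these always sum to zero

# Option 1: absolute values, then average -> Average Deviation
average_deviation = sum(abs(d) for d in deviations) / n

# Option 2: squares, then average -> Variance (dividing by N)
variance = sum(d ** 2 for d in deviations) / n

print(sum(deviations), average_deviation, variance)
```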
24
Variance
Population: σ² = Σ(X − μ)² / N
Estimate of Population from Sample: S²-“hat” = Σ(X − X̄)² / (N − 1)
To Describe sample use N: S² = Σ(X − X̄)² / N = Sample Variance
Problem: Magnitude of Variance is large relative to individual Deviation scores -- Quantifies but not very descriptive
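The difference between dividing by N and by N − 1 can be checked with the standard `statistics` module, where `pvariance` divides by N and `variance` divides by N − 1. The sample below is hypothetical. The last line shows why the variance is hard to read directly: it is in squared units, and taking the square root brings it back to the scale of the scores.

```python
import math
import statistics

sample = [3, 5, 7, 9, 11]  # hypothetical sample; mean is 7, sum of squared deviations is 40

describe_sample = statistics.pvariance(sample)     # divide by N: 40 / 5
estimate_population = statistics.variance(sample)  # divide by N - 1: 40 / 4

# Square root returns to the original units of the scores.
sd_of_sample = math.sqrt(describe_sample)

print(describe_sample, estimate_population, sd_of_sample)
```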
25
Standard Deviation
Population, Sample, Population Estimate: each is the square root of the corresponding Variance
Mean +/- SD captures 68% of Data Points (for a normal distribution)
26
Standard Deviation, cont.
27
The Concept: Standard Deviation
Standard Deviation from the Mean
“Average” Deviation from the Mean
Expected Deviation from the Mean
Expect 68% of your data to be within 1 SD of the mean
Expect 95% of your data to be within 2 SD of the mean
If your score is beyond 2 SDs of the mean:
You are very infrequent
You are very unusual
You are very improbable
Associate: Infrequent with Improbable
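The 68% and 95% figures come from the normal curve and can be recovered from its cumulative distribution. This sketch assumes the scores are normally distributed; for other shapes the percentages will differ.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal distribution: mean 0, SD 1

within_1_sd = z.cdf(1) - z.cdf(-1)  # proportion within 1 SD of the mean
within_2_sd = z.cdf(2) - z.cdf(-2)  # proportion within 2 SDs of the mean
beyond_2_sd = 1 - within_2_sd       # proportion more extreme than 2 SDs

print(round(within_1_sd, 3), round(within_2_sd, 3), round(beyond_2_sd, 3))
```

Scores beyond 2 SDs make up only about 5% of a normal distribution, which is why they count as infrequent and improbable.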
28
Interpreting a Value
Transforming a score to make it more interpretable:
Comparing two scores: Two tests of Equal Difficulty but of Different Length
Pretend both tests were 100 items long. How many would you have gotten right?
Percent Correct is a Transformed Score
Comparing one score to everybody else: Pretend there were 100 people, where would you rank?
Percentile is a Transformed Score
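Both transformations are simple rescalings to a base of 100. In this sketch the function names and all numbers are hypothetical, and the percentile rank uses one common definition (percentage of scores strictly below yours); other definitions handle ties differently.

```python
def percent_correct(num_right, num_items):
    """Rescale a raw score as if the test were 100 items long."""
    return 100 * num_right / num_items

def percentile_rank(score, all_scores):
    """Percentage of scores strictly below `score` (one common definition)."""
    below = sum(1 for s in all_scores if s < score)
    return 100 * below / len(all_scores)

# Two tests of different length: 18 of 20 versus 40 of 50.
print(percent_correct(18, 20), percent_correct(40, 50))

# One score compared to everybody else.
class_scores = [55, 60, 65, 70, 75, 80, 85, 90, 95, 100]
print(percentile_rank(85, class_scores))
```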
29
Z-scores & Z-transformations
Take each score (Xi) and convert it to Zi: Zi = (Xi − Mean) / SD
Mean of z-scores = 0
Standard Deviation of z-scores = 1
Units of z-scores are in Standard Deviations
Z-score compares Your Deviation (numerator) to the “Average Deviation” (denominator)
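A quick check that the z-transformation produces a mean of 0 and a standard deviation of 1. The scores are hypothetical, and the sketch uses the SD that divides by N (`statistics.pstdev`), treating the scores as the whole group being described.

```python
import statistics

scores = [3, 5, 7, 9, 11]  # hypothetical data
mean = statistics.mean(scores)
sd = statistics.pstdev(scores)  # SD computed with N in the denominator

# Your deviation (numerator) divided by the standard deviation (denominator).
z_scores = [(x - mean) / sd for x in scores]

print([round(z, 2) for z in z_scores])
print(statistics.mean(z_scores), statistics.pstdev(z_scores))
```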
30
Where you are relative to Population Think Percentile
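If the population is assumed to be normally distributed, a z-score converts directly to a percentile through the normal curve's cumulative distribution. The function name `z_to_percentile` is illustrative, and the conversion only holds to the extent that the normality assumption does.

```python
from statistics import NormalDist

def z_to_percentile(z):
    """Percentage of a normal population scoring below the given z-score."""
    return 100 * NormalDist().cdf(z)

for z in (-2, -1, 0, 1, 2):
    print(z, round(z_to_percentile(z), 1))
```

A z-score of 0 sits at the 50th percentile, and each step away from the mean moves you into a rarer part of the distribution.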
31
Interpreting Your Z-Score
32
Interpreting Your Z-Score, cont.