Published by Delilah Chandler. Modified over 8 years ago.
1
II. Descriptive Statistics (Zar, Chapters 1 - 4)
2
Statistics and Randomization
[Diagram: subjects are randomized into Group 1 ($y_1, y_2, \dots, y_m$) and Group 2 ($z_1, z_2, \dots, z_n$); a statistical test yields a conclusion, which is extrapolated to describe the population.]
3
Hypothesis
Null hypothesis: $H_0$: Group 1 = Group 2
Alternative hypothesis, two-sided: $H_A$: Group 1 ≠ Group 2
Or alternative hypotheses, one-sided: $H_{A1}$: Group 1 < Group 2; $H_{A2}$: Group 1 > Group 2
Statistical Test
4
Types of Data. Discrete. Binary (examples: alive or dead; heads or tails; Drug "A" or Drug "B"; male or female; normal or disease). Representation as data: 0 = alive, 1 = dead; or "A" for "alive" and "D" for "dead". The sample is then $\{x_1, x_2, \dots, x_n\}$, with each $x_i$ having only two choices.
5
Summarize by
(1) Table
Factor   Level   Number   %
Status   Alive   25       71%
         Dead    10       29%
(2) Histogram: (a) Numbers, (b) Percent
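The table above can be reproduced with a short tally. This is a minimal sketch, assuming the 0 = alive, 1 = dead coding from the previous slide; the sample list is rebuilt from the slide's counts (25 and 10), and `frequency_table` is a hypothetical helper name.

```python
from collections import Counter

# Hypothetical sample rebuilt from the slide's counts: 25 alive (0), 10 dead (1).
sample = [0] * 25 + [1] * 10
labels = {0: "Alive", 1: "Dead"}

def frequency_table(xs):
    """Return {value: (count, percent)} for a discrete sample."""
    counts = Counter(xs)
    n = len(xs)
    return {value: (count, 100.0 * count / n)
            for value, count in sorted(counts.items())}

table = frequency_table(sample)
for value, (count, percent) in table.items():
    # Percentages are rounded to whole numbers, as on the slide.
    print(f"{labels[value]:<6}{count:>4}{percent:>6.0f}%")
```

The same tally works for coded data with more than two levels, since `Counter` does not care how many distinct codes appear.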
8
Coded (ex.: diagnosis, genus/species, race, TNM, stage, color). Representation as data: by name or coded name. 1 = Caucasian, Non-Hispanic; 2 = Black (African American); 3 = Hispanic; or just "C", "B" (or "A"), and "H". If 4 = Oriental, then C, B (A), H, O.
9
Summarize by
(1) Table
Race   Number   %
W      10       29%
B       5       15%
H      12       34%
O       7       21%
(2) Histogram: Numbers, Percent
13
Ordered Scale. Examples: date; severity scales (Benign, Possible Ca, Probable Ca, Cancer); agreement/preference (Likert: Strongly Disagree, Disagree, Neutral, Agree, Strongly Agree); stage; strength scales (0, +, ++, +++). Represented by an integer scale: 1 = Benign; 2 = Possible; 3 = Not Sure or Neutral; 4 = Probable; 5 = Cancer.
14
Summarized by: (2) Histogram: Percent, Cumulative Percent
16
Continuous. Ratio scales: scale differences are the same (ex.: most data that have a zero). True ratio data (ex.: normalized data: raw data / background, treated / control, effector / target). Continuous log scale.
17
Representation of data: real number; scientific notation; real number with significant digits. Sample: $\{x_1, x_2, \dots, x_n\}$. Summarized by (1) Table. Ex: 10 data points $x_1, \dots, x_{10}$ [table of values; only $x_4 = 1$ and the values 4, 3.5, and 5.5 remain legible]. (a) Point plot [figure: the ten points $x_1, \dots, x_{10}$ marked along the x-axis].
18
(b) Histogram: (1) form "bins", ex.: 0-2, 2-4, 4-6, 6-8; (2) count the number of data points in each bin and plot the number or the %. Panels: (a) Count, (b) Percent.
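The two binning steps can be sketched as follows. The ten data values are hypothetical: the slide's own values did not survive, so these were chosen only to respect the ordering and the tie shown on the order-statistics slide. The half-open bin convention (a point on an edge goes to the upper bin) is also an assumption, since the slide does not say how edge values are handled.

```python
# Hypothetical stand-in for the slide's ten data points x_1..x_10.
data = [2.5, 7.5, 4.0, 1.0, 4.5, 5.0, 3.5, 4.5, 6.5, 5.5]
edges = [0, 2, 4, 6, 8]  # bins 0-2, 2-4, 4-6, 6-8

def histogram(xs, edges):
    """Count points in half-open bins [lo, hi); the last bin also includes hi."""
    counts = [0] * (len(edges) - 1)
    for x in xs:
        for k in range(len(counts)):
            lo, hi = edges[k], edges[k + 1]
            is_last = k == len(counts) - 1
            if lo <= x < hi or (is_last and x == hi):
                counts[k] += 1
                break
    return counts

counts = histogram(data, edges)                      # panel (a): counts
percents = [100.0 * c / len(data) for c in counts]   # panel (b): percent
print(counts, percents)
```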
19
(c) Cumulative Histogram: (1) form bins as before: 0-2, 2-4, 4-6, 6-8; (2) count the number ≤ each upper bin edge (ranges 0-2, 0-4, 0-6, 0-8, 0-10) or the number ≥ each lower bin edge (ranges 0-10, 2-10, 4-10, 6-10, 8-10).
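Both cumulative directions are running sums of the per-bin counts. A minimal sketch, assuming hypothetical counts of 1, 2, 5, 2 for the bins 0-2, 2-4, 4-6, 6-8 (the slide's actual counts are not recoverable):

```python
from itertools import accumulate

# Hypothetical per-bin counts for bins 0-2, 2-4, 4-6, 6-8.
counts = [1, 2, 5, 2]

# Points falling in bins up through bin k (ranges 0-2, 0-4, 0-6, 0-8).
at_most = list(accumulate(counts))

# Points falling in bin k or any later bin (ranges 0-8, 2-8, 4-8, 6-8).
at_least = list(accumulate(reversed(counts)))[::-1]

print(at_most, at_least)
```

The last entry of `at_most` and the first entry of `at_least` both equal the sample size, which is a quick check that no point was dropped.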
24
What else can we do to summarize, or describe, the data? (1) Define where the center of the data lies (measures of central tendency). (2) Describe how the data vary from that center (measures of dispersion). Center and dispersion: two numbers instead of all n.
25
Chapter 3: Measures of Central Tendency. Where is the middle of the data? Random sample: $x_1, x_2, \dots, x_n$. (1) The arithmetic mean (average): $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$. [Point plot of the ten example points, with the mean marked as the center of gravity.]
26
(2) The order statistics: $x_{(1)} = \min(x_i) \le x_{(2)} \le x_{(3)} \le \dots \le x_{(n)} = \max(x_i)$.
In the example: $x_4 \le x_1 \le x_7 \le x_3 \le x_5 = x_8 \le x_6 \le x_{10} \le x_9 \le x_2$,
which corresponds to $x_{(1)} \le x_{(2)} \le x_{(3)} \le x_{(4)} \le x_{(5.5)} = x_{(5.5)} \le x_{(7)} \le x_{(8)} \le x_{(9)} \le x_{(10)}$.
For ties, sum the indices and divide by the number of ties. Ex.: $x_5$ and $x_8$ are tied (4.5), so the order statistic index is (5+6)/2 = 5.5 and the order statistic is $x_{(5.5)}$.
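The tie rule can be sketched in a few lines. The data below are hypothetical values chosen to reproduce the ordering on the slide (with $x_5 = x_8$); `tied_ranks` is an illustrative helper name, not something from the source.

```python
# Hypothetical values for x_1..x_10 that respect the slide's ordering.
data = [2.5, 7.5, 4.0, 1.0, 4.5, 5.0, 3.5, 4.5, 6.5, 5.5]

def tied_ranks(xs):
    """Return 1-based order-statistic indices, averaging the indices of ties."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and xs[order[end + 1]] == xs[order[start]]:
            end += 1
        # Mean of the consecutive 1-based positions start+1 .. end+1.
        average = (start + 1 + end + 1) / 2
        for pos in range(start, end + 1):
            ranks[order[pos]] = average
        start = end + 1
    return ranks

ranks = tied_ranks(data)
print(ranks)  # ranks[4] and ranks[7] (x_5 and x_8) share index 5.5
```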
27
Median: the middle order statistic. If n is odd, it is the middle order statistic, $x_{((n+1)/2)}$. If n is even, it is the average of the two middle order statistics, $(x_{(n/2)} + x_{(n/2+1)})/2$.
28
If we want a formula that covers even and odd n together, we can use the greatest integer function, for example: median $= \left(x_{([(n+1)/2])} + x_{([n/2]+1)}\right)/2$, where $[\cdot]$ is the "greatest integer in …" function. In the example above, n = 10, so $[n/2] = 5$.
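One way to combine the odd and even cases is to average the order statistics at positions $[(n+1)/2]$ and $[n/2]+1$, which coincide when n is odd. The sketch below uses that form as an assumption (the slide's own formula is not legible) and checks it against the standard library; the data are hypothetical.

```python
import statistics

def median_floor(xs):
    """Median via the greatest-integer function, for odd and even n alike."""
    if not xs:
        raise ValueError("median of an empty sample is undefined")
    s = sorted(xs)
    n = len(s)
    lo = (n + 1) // 2   # [(n+1)/2], 1-based position
    hi = n // 2 + 1     # [n/2] + 1, 1-based position
    return (s[lo - 1] + s[hi - 1]) / 2

# Hypothetical ten-point sample (n = 10, so [n/2] = 5).
data = [2.5, 7.5, 4.0, 1.0, 4.5, 5.0, 3.5, 4.5, 6.5, 5.5]
print(median_floor(data), statistics.median(data))
```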
29
Plot the order statistic index against the corresponding order statistic ($i$ on the y-axis, $x_{(i)}$ on the x-axis). The plot is called a frequency polygon.
30
(3) The Mode: the x where the histogram is maximal. Usually use the midpoint of the box (bin) where the histogram is maximal. Ex.: in our continuous example, the mode is in the box 4-6, so the mode is (4+6)/2 = 5.0.
31
(4) The mid-range: $(x_{(1)} + x_{(n)})/2$. (5) The geometric mean: $\left(\prod_{i=1}^{n} x_i\right)^{1/n}$.
32
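Derivation of the geometric mean. Let $y_i = \log_{10}(x_i)$. Then $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} \log_{10}(x_i) = \log_{10}\left[\left(\prod_{i=1}^{n} x_i\right)^{1/n}\right]$, so the geometric mean is $10^{\bar{y}}$.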
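The log derivation translates directly into code: average the base-10 logs, then raise 10 to that average. A minimal sketch with a hypothetical function name, valid only for strictly positive data:

```python
import math

def geometric_mean_via_logs(xs):
    """Geometric mean computed as 10 ** (mean of log10(x_i))."""
    if not xs or any(x <= 0 for x in xs):
        raise ValueError("geometric mean needs a non-empty, strictly positive sample")
    y_bar = sum(math.log10(x) for x in xs) / len(xs)
    return 10 ** y_bar

xs = [1, 10, 100]  # illustrative values, not from the slides
direct = math.prod(xs) ** (1 / len(xs))  # n-th root of the product
print(geometric_mean_via_logs(xs), direct)
```

Working in logs avoids overflow when the product of many values would be very large, which is one practical reason to prefer this route.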
33
(6) The harmonic mean: $n \big/ \sum_{i=1}^{n} (1/x_i)$, the reciprocal of the mean of the reciprocals.
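As a quick check of the definition, the sketch below computes the harmonic mean as n divided by the sum of reciprocals and compares it with `statistics.harmonic_mean` from the Python standard library. The input values are illustrative only.

```python
import statistics

def harmonic_mean(xs):
    """n divided by the sum of reciprocals; requires strictly positive data."""
    if not xs or any(x <= 0 for x in xs):
        raise ValueError("harmonic mean needs a non-empty, strictly positive sample")
    return len(xs) / sum(1 / x for x in xs)

xs = [1, 2, 4]  # illustrative values, not from the slides
print(harmonic_mean(xs), statistics.harmonic_mean(xs))
```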
34
SUMMARY: Measures of Central Tendency. (1) Mean: data evenly weighted. Average of salaries in a lab: 4 hard-working G.R.A.s at 20,000 each and 1 faculty member at 100,000; total 180,000; mean = 180,000/5 = 36,000.
35
(2) Median: center of the data; 50% above, 50% below. Median = 20,000. (3) Mode: requires bin sizes to be about the same. Mode = 20,000. (4) Midrange: uses only the endpoints. (100,000 + 20,000)/2 = 60,000.
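The salary example is small enough to verify directly. The sketch below uses the standard `statistics` module; note that `statistics.mode` returns the most common value rather than a histogram-bin midpoint, which happens to agree with the slide here.

```python
import statistics

# Salaries from the example: four G.R.A.s at 20,000 and one faculty member at 100,000.
salaries = [20_000] * 4 + [100_000]

mean = statistics.mean(salaries)
median = statistics.median(salaries)
mode = statistics.mode(salaries)                 # most common value
midrange = (min(salaries) + max(salaries)) / 2   # uses only the endpoints

print(mean, median, mode, midrange)
```

The single large salary pulls the mean and the midrange well above the median and mode, which is the contrast the summary is drawing.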
36
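Chapter 4: Measures of Dispersion and Variability. (1) Range: Range = $x_{(n)} - x_{(1)}$. (2) Mean deviation: $\frac{1}{n}\sum_{i=1}^{n} |x_i - \bar{x}|$. (3) Variance: $s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$. Sometimes called the sample variance. Sometimes called the moment of inertia.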
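The first three dispersion measures can be computed side by side. This is a sketch under two assumptions: the mean deviation divides by n, and the variance divides by n - 1 (consistent with the degrees-of-freedom discussion that follows). The data and the function name are illustrative.

```python
import statistics

def dispersion(xs):
    """Return (range, mean deviation, sample variance) of a sample with n >= 2."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two data points")
    mean = sum(xs) / n
    value_range = max(xs) - min(xs)
    mean_deviation = sum(abs(x - mean) for x in xs) / n
    variance = sum((x - mean) ** 2 for x in xs) / (n - 1)
    return value_range, mean_deviation, variance

xs = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative values, not from the slides
value_range, mean_deviation, variance = dispersion(xs)
print(value_range, mean_deviation, variance)
```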
37
Variance (cont.) Each data point is selected randomly and independently of all other points, so each represents a degree of freedom. A sample of n points is a vector in n-dimensional space. The new statistics used by $s^2$ are the deviations $x_i - \bar{x}$.
38
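Variance (cont.) The deviations satisfy $\sum_{i=1}^{n} (x_i - \bar{x}) = 0$, so that the $x_i - \bar{x}$ are not independent. The estimate $\bar{x}$ of the true mean $\mu$ costs one degree of freedom, leaving (n-1) degrees of freedom. (4) The standard deviation: $s = \sqrt{s^2}$. The units of $s$ are the same as those of the $x_i$. (5) The standard error of the mean: $s_{\bar{x}} = s/\sqrt{n}$.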
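A short numerical check of these points: the deviations from the sample mean sum to zero (up to rounding), which is why only n - 1 of them are free, and the standard error is the standard deviation divided by the square root of n. The data are hypothetical.

```python
import math
import statistics

# Hypothetical ten-point sample.
data = [2.5, 7.5, 4.0, 1.0, 4.5, 5.0, 3.5, 4.5, 6.5, 5.5]
n = len(data)

mean = statistics.mean(data)
deviations = [x - mean for x in data]
deviation_sum = sum(deviations)      # zero up to floating-point rounding

s = statistics.stdev(data)           # square root of the (n-1)-denominator variance
standard_error = s / math.sqrt(n)    # standard error of the mean

print(deviation_sum, s, standard_error)
```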
40
(6) The coefficient of variation: $CV = s/\bar{x}$, often expressed as a percentage. (7) Quartiles (divide the data into quarters).
41
Interquartile range: IQR = $q_3 - q_1$. Percentiles (divide the data into percents).
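Quartiles and the IQR can be sketched with `statistics.quantiles` from the Python standard library. The slides do not say which interpolation convention they use, so the library default (`method="exclusive"`) is an assumption here, and other conventions can give slightly different $q_1$ and $q_3$. The data are hypothetical.

```python
import statistics

# Hypothetical ten-point sample.
data = [2.5, 7.5, 4.0, 1.0, 4.5, 5.0, 3.5, 4.5, 6.5, 5.5]

# n=4 asks for the three cut points that divide the data into quarters.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

# Percentiles work the same way with n=100 (99 cut points).
percentiles = statistics.quantiles(data, n=100)

print(q1, q2, q3, iqr)
```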
42
(8) Indices of diversity: the "Shannon Index" (information theory), $H' = -\sum_i p_i \log p_i$, where $p_i = h_i/n$ and $h_i$ = the number of data points in the ith bin. The $p_i$'s represent and estimate the "true" probabilities in the bins ($\sum p_i = 100\%$).
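The Shannon index can be computed from bin counts in a few lines. This sketch assumes the usual form $H' = -\sum p_i \log p_i$ with $p_i = h_i/n$; the log base (10 by default here) is an assumption, and empty bins are skipped because they contribute nothing. The counts are illustrative.

```python
import math

def shannon_index(counts, base=10):
    """Shannon diversity index H' from per-bin counts h_i."""
    n = sum(counts)
    if n == 0:
        raise ValueError("need at least one data point")
    proportions = [h / n for h in counts if h > 0]  # p_i = h_i / n
    return -sum(p * math.log(p, base) for p in proportions)

even = shannon_index([5, 5], base=2)    # two equally likely bins
single = shannon_index([10], base=2)    # everything in one bin
print(even, single)
```

Diversity is largest when the bins are equally filled and drops to zero when all points fall in one bin.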
43
So, how do we use a measure of the center and a measure of dispersion to represent the data? (1) Mean ± SD or SE, in a table.
44
In a Graph
45
More common: histogram bars with whiskers. Problem: perception of lower limits. Who is similar?
49
Choices: Standard deviation: shows population variability. Standard error: shows mean comparisons.
50
Confidence interval: shows the result of the t-test. Box plot: median with quartiles; whiskers for min and max; circles or asterisks for outliers.
56
Extrapolation to the Universe. Universe: probability density function. Sample space: histogram. The probability density function is estimated by the histogram as the sample size n gets large and the bin width gets small.
57
Parameter in the Universe; Statistic in the Sample Space. F(x) is called the distribution function and is also approximated by the frequency polygon.