II. Descriptive Statistics (Zar, Chapters 1 - 4).

1 II. Descriptive Statistics (Zar, Chapters 1 - 4)

2 Statistics and Randomization: Group 1 ($y_1, y_2, \dots, y_m$) and Group 2 ($z_1, z_2, \dots, z_n$) → Randomize → Statistical Test → Conclusion → Extrapolate: describe the population.

3 Hypotheses. Null hypothesis $H_0$: Group 1 = Group 2. Two-sided alternative $H_A$: Group 1 ≠ Group 2. Or one-sided alternatives $H_{A1}$: Group 1 < Group 2; $H_{A2}$: Group 1 > Group 2. The hypotheses are evaluated with a statistical test.

4 Types of Data. Discrete: Binary (examples: alive or dead; heads or tails; Drug "A" or Drug "B"; male or female; normal/disease). Representation as data: 0 = alive, 1 = dead, or "A" for "alive" and "D" for "dead". The sample is then $x_1, x_2, \dots, x_n$, with each $x_i$ having only two possible values.

5 Summarize by (1) a table and (2) a histogram of (a) numbers or (b) percents:

Factor: Status   Number   %
Alive            25       71%
Dead             10       29%
Total            35       100%
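The binary summary table can be reproduced with a short Python sketch. The function name `summarize_binary` and the 0/1 coding are assumptions for illustration; the sample is just 25 zeros and 10 ones, matching the counts in the table.

```python
from collections import Counter

def summarize_binary(data):
    """Return {value: (count, percent rounded to a whole number)} for each outcome."""
    counts = Counter(data)
    n = len(data)
    return {value: (count, round(100 * count / n)) for value, count in counts.items()}

# Hypothetical sample coded 0 = alive, 1 = dead, built to match the table's counts.
status = [0] * 25 + [1] * 10
print(summarize_binary(status))  # {0: (25, 71), 1: (10, 29)}
```

The same function works for letter codes such as "A" and "D", since `Counter` accepts any hashable value.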

8 Coded (nominal) data (e.g., diagnosis, genus/species, race, TNM stage, color). Representation as data: by name or by a coded name, e.g., 1 = Caucasian, non-Hispanic; 2 = Black (African American); 3 = Hispanic; or just the letters "C", "B" (or "A"), and "H". If a fourth category is added, 4 = Oriental, the codes become C, B (A), H, O.

9 Summarize by (1) a table and (2) a histogram of numbers or percents:

Race    Number   %
W       10       29%
B        5       15%
H       12       35%
O        7       21%
Total   34       100%
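A sketch of the same kind of frequency table for coded data, assuming the letter codes W, B, H, O; the list is rebuilt from the table's counts, and `frequency_table` is a hypothetical helper. Percents are printed to one decimal place so the rounding is visible.

```python
from collections import Counter

def frequency_table(codes, order):
    """Return (code, count, percent to one decimal) rows in the given category order."""
    counts = Counter(codes)
    n = len(codes)
    return [(code, counts[code], round(100 * counts[code] / n, 1)) for code in order]

# Sample rebuilt from the table's counts (W = 10, B = 5, H = 12, O = 7).
race = ["W"] * 10 + ["B"] * 5 + ["H"] * 12 + ["O"] * 7
for row in frequency_table(race, order=["W", "B", "H", "O"]):
    print(row)
```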

13 Ordered Scale. Examples: date; severity scales (Benign, Possible Ca, Probable Ca, Cancer); agreement/preference (Likert: Strongly Disagree, Disagree, Neutral, Agree, Strongly Agree); stage; strength scales (0, +, ++, +++). Represented by an integer scale, e.g., 1: Benign; 2: Possible; 3: Not Sure or Neutral; 4: Probable; 5: Cancer.

14 Summarized by: (2) Histogram: (a) Percent, (b) Cumulative Percent
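For an ordered scale, the cumulative percent is a running total taken over the levels in their natural order. A minimal sketch, with made-up ratings on the 1-5 scale above and a hypothetical helper `cumulative_percent`:

```python
def cumulative_percent(scores, levels):
    """Rows of (level, percent, cumulative percent), with levels in their natural order."""
    n = len(scores)
    rows, running = [], 0
    for level in levels:
        count = scores.count(level)
        running += count  # running total up to and including this level
        rows.append((level, 100 * count / n, 100 * running / n))
    return rows

# Made-up ratings on the 1-5 scale (1 = Benign, ..., 5 = Cancer).
ratings = [1, 1, 2, 3, 3, 3, 4, 5, 5, 5]
for level, pct, cum in cumulative_percent(ratings, levels=range(1, 6)):
    print(level, pct, cum)
```

The last cumulative percent is always 100%, because every observation falls at or below the top level.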

16 Continuous. Ratio scales: scale differences are the same (Ex: most data that have a zero). True ratio data (Ex: normalized data, a ratio built from the raw data of the treated sample, the effector background, and the control target). Continuous log scale.

17 Representation of data: real numbers, scientific notation, or real numbers with significant digits: $\{x_1, x_2, \dots, x_n\}$. Summarized by (1) a table. Ex: 10 data points $x_1, \dots, x_{10}$, with $x_4 = 1$. (a) Point plot: along the axis the points fall in the order $x_4, x_1, x_7, x_3, x_5 = x_8, x_6, x_{10}, x_9, x_2$.

18 (b) Histogram: (1) form "bins", Ex: 0-2, 2-4, 4-6, 6-8; (2) count the number of data points in each bin and plot the count (#) or the percent (%): (a) Count, (b) Percent.
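The two binning steps can be sketched as follows. The data are hypothetical, and the convention that each bin includes its left edge but not its right edge (except the last bin, which also includes 10) is an assumption; the slide does not say how boundary values are assigned.

```python
def histogram(data, edges):
    """Counts and percents per bin.

    Bins are [edges[k], edges[k+1]); the last bin also includes its right edge.
    Points outside all bins are not counted.
    """
    counts = [0] * (len(edges) - 1)
    for x in data:
        for k in range(len(counts)):
            is_last = k == len(counts) - 1
            if edges[k] <= x < edges[k + 1] or (is_last and x == edges[-1]):
                counts[k] += 1
                break
    n = len(data)
    percents = [100 * c / n for c in counts]
    return counts, percents

data = [0.5, 1.5, 2.5, 3.5, 4.5, 4.5, 5.0, 5.5, 6.5, 8.5]  # hypothetical values
counts, percents = histogram(data, edges=[0, 2, 4, 6, 8, 10])
print(counts)    # [2, 2, 4, 1, 1]
print(percents)  # [20.0, 20.0, 40.0, 10.0, 10.0]
```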

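19 (c) Cumulative Histogram: 1) form bins as before: 0-2, 2-4, 4-6, 6-8, 8-10; 2) at each bin boundary, count the number of points ≤ (or ≥) that boundary. Each entry below gives the range of values counted; a lone 0 means no points are counted.

Boundary     0      2      4      6      8      10
≤ boundary   0      0-2    0-4    0-6    0-8    0-10
≥ boundary   0-10   2-10   4-10   6-10   8-10   0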
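A sketch of the ≤ and ≥ counts at each bin boundary, using ten hypothetical values; `cumulative_counts` is an illustrative name, not from the source.

```python
def cumulative_counts(data, boundaries):
    """For each boundary b, count the points <= b and the points >= b."""
    at_or_below = [sum(1 for x in data if x <= b) for b in boundaries]
    at_or_above = [sum(1 for x in data if x >= b) for b in boundaries]
    return at_or_below, at_or_above

data = [0.5, 1.5, 2.5, 3.5, 4.5, 4.5, 5.0, 5.5, 6.5, 8.5]  # hypothetical values
below, above = cumulative_counts(data, [0, 2, 4, 6, 8, 10])
print(below)  # [0, 2, 4, 8, 9, 10]
print(above)  # [10, 8, 6, 2, 1, 0]
```

The ≤ counts start at 0 and climb to n; the ≥ counts fall from n to 0, mirroring the two rows of the table.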

24 What else can we do to summarize, or describe, the data? (1) Define where the center of the data lies (measures of central tendency); (2) describe how the data vary about that center (measures of dispersion). A center and a dispersion: two numbers instead of all n.

25 Chapter 3: Measures of Central Tendency. Where is the middle of the data? Random sample: $x_1, x_2, \dots, x_n$. (1) The arithmetic mean (average): $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$. On the point plot, the mean is the center of gravity of the data.

26 (2) The order statistics: $x_{(1)} = \min(x_i) \le x_{(2)} \le x_{(3)} \le \dots \le x_{(n)} = \max(x_i)$. In the example, $x_4 \le x_1 \le x_7 \le x_3 \le x_5 = x_8 \le x_6 \le x_{10} \le x_9 \le x_2$, so $x_{(1)} \le x_{(2)} \le x_{(3)} \le x_{(4)} \le x_{(5.5)} = x_{(5.5)} \le x_{(7)} \le x_{(8)} \le x_{(9)} \le x_{(10)}$. For ties, sum the indices and divide by the number of ties! Ex: $x_5$ and $x_8$ are tied and occupy positions 5 and 6, so each gets the order-statistic index $(5+6)/2 = 5.5$; both are $x_{(5.5)}$.
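Assigning order-statistic indices with ties averaged (often called midranks) can be sketched as below. The ten values are hypothetical, chosen only so that they sort in the same order as the slide's example, with $x_5 = x_8$; `order_statistic_indices` is an illustrative name.

```python
def order_statistic_indices(data):
    """Order-statistic index (rank) of each x_i; tied values share the average of their positions."""
    order = sorted(range(len(data)), key=lambda i: data[i])
    ranks = [0.0] * len(data)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and data[order[end + 1]] == data[order[pos]]:
            end += 1
        # Sorted positions pos..end are 0-based; their 1-based indices are pos+1..end+1.
        avg = (pos + 1 + end + 1) / 2
        for k in range(pos, end + 1):
            ranks[order[k]] = avg
        pos = end + 1
    return ranks

# Hypothetical x_1, ..., x_10 that sort as x_4 <= x_1 <= x_7 <= x_3 <= x_5 = x_8 <= ...
x = [2.0, 8.0, 3.5, 1.0, 4.5, 5.5, 3.0, 4.5, 7.0, 6.0]
print(order_statistic_indices(x))
# [2.0, 10.0, 4.0, 1.0, 5.5, 7.0, 3.0, 5.5, 9.0, 8.0]
```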

27 Median: the middle order statistic. If n is odd, it is the middle order statistic, $x_{((n+1)/2)}$; if n is even, it is the average of the two middle ones, $\frac{1}{2}\left(x_{(n/2)} + x_{(n/2+1)}\right)$.

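28 To have one formula that covers both even and odd n, use the greatest integer function: $\text{median} = \frac{1}{2}\left(x_{(n-[n/2])} + x_{([n/2]+1)}\right)$, where $[\cdot]$ is the "greatest integer in …". For odd n both indices equal $(n+1)/2$. In the example above, n = 10 and [n/2] = 5, so the median is $\frac{1}{2}\left(x_{(5)} + x_{(6)}\right)$.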
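The greatest-integer formula translates directly to code, with Python's floor division `n // 2` playing the role of [n/2] and indices shifted by one because Python lists start at 0. A minimal sketch; the data are hypothetical.

```python
def median(data):
    """Median as (x_(n-[n/2]) + x_([n/2]+1)) / 2, where [.] is the greatest integer function."""
    xs = sorted(data)         # xs[i - 1] is the order statistic x_(i)
    n = len(xs)
    half = n // 2             # [n/2]
    lower = xs[n - half - 1]  # x_(n - [n/2])
    upper = xs[half]          # x_([n/2] + 1)
    return (lower + upper) / 2

print(median([2.0, 8.0, 3.5, 1.0, 4.5, 5.5, 3.0, 4.5, 7.0, 6.0]))  # n = 10: 4.5
print(median([3, 1, 2]))                                         # n = 3: 2.0
```

For odd n, `lower` and `upper` pick the same element, so averaging them returns the middle order statistic unchanged.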

29 Plot the order statistic index (i on the y-axis) against the corresponding order statistic ($x_{(i)}$ on the x-axis); the resulting plot is called a frequency polygon.
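The points of the frequency polygon are the pairs $(x_{(i)}, i)$. A minimal sketch with a hypothetical helper name; for simplicity, tied values keep consecutive indices here rather than the averaged indices used for ties.

```python
def frequency_polygon_points(data):
    """Return (x_(i), i) pairs: order statistic on the x-axis, its index on the y-axis."""
    # Simplification: tied values keep consecutive indices instead of averaged ones.
    return [(x, i) for i, x in enumerate(sorted(data), start=1)]

print(frequency_polygon_points([2.0, 8.0, 3.5, 1.0]))
# [(1.0, 1), (2.0, 2), (3.5, 3), (8.0, 4)]
```

Dividing each index by n rescales the y-axis to run from 1/n up to 1.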

30 (3) The Mode: the x where the histogram is maximal. Usually use the midpoint of the bin where the histogram is maximal. Ex: in our continuous example the maximal bin is 4-6, so the mode is (4 + 6)/2 = 5.0.
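A sketch of the histogram mode, assuming left-closed bins (the last bin also includes its right edge); when several bins tie for the maximum, this version returns the first one, which is an arbitrary choice. The data and the name `histogram_mode` are hypothetical; the values fall mostly in the 4-6 bin, as in the example.

```python
def histogram_mode(data, edges):
    """Midpoint of the bin with the largest count (the first such bin if several tie)."""
    counts = [0] * (len(edges) - 1)
    for x in data:
        for k in range(len(counts)):
            in_last_bin = k == len(counts) - 1 and x == edges[-1]
            if edges[k] <= x < edges[k + 1] or in_last_bin:
                counts[k] += 1
                break
    k = counts.index(max(counts))
    return (edges[k] + edges[k + 1]) / 2

data = [0.5, 1.5, 2.5, 3.5, 4.5, 4.5, 5.0, 5.5, 6.5, 8.5]  # hypothetical values
print(histogram_mode(data, edges=[0, 2, 4, 6, 8, 10]))  # 5.0
```

Because the result depends on where the bin edges fall, the mode is only comparable across data sets binned the same way.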

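31 (4) The mid-range: $\frac{x_{(1)} + x_{(n)}}{2}$. (5) The geometric mean: $GM = \left(\prod_{i=1}^{n} x_i\right)^{1/n} = \sqrt[n]{x_1 x_2 \cdots x_n}$, defined for $x_i > 0$.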

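32 Derivation of the geometric mean. Let $y_i = \log_{10}(x_i)$. Then $\bar{y} = \frac{1}{n}\sum_{i=1}^{n}\log_{10} x_i = \log_{10}\left(\prod_{i=1}^{n} x_i\right)^{1/n} = \log_{10} GM$, so $GM = 10^{\bar{y}}$: the geometric mean is the antilog of the mean of the logs.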
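The derivation suggests computing the geometric mean through logarithms, which avoids overflow in the product for large samples. A minimal sketch comparing both routes; the function names are illustrative, the data are hypothetical, and `math.prod` needs Python 3.8 or later.

```python
import math

def geometric_mean_log(data):
    """10 ** (mean of log10 x_i); every x_i must be positive."""
    ybar = sum(math.log10(x) for x in data) / len(data)
    return 10 ** ybar

def geometric_mean_product(data):
    """(x_1 * x_2 * ... * x_n) ** (1/n), the definition; the product can overflow for large n."""
    return math.prod(data) ** (1 / len(data))

values = [1.0, 10.0, 100.0]  # hypothetical positive data
print(geometric_mean_log(values), geometric_mean_product(values))  # both approximately 10
```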

33 (6) The harmonic mean: $H = \frac{n}{\sum_{i=1}^{n} 1/x_i}$, the reciprocal of the mean of the reciprocals.
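A sketch of the harmonic mean, with a check of the standard inequality HM ≤ GM ≤ AM for positive data (strict when the values are not all equal). The data are made up; `statistics.geometric_mean` and `statistics.harmonic_mean` are from the Python standard library (3.8+).

```python
import math
import statistics

def harmonic_mean(data):
    """n / sum(1 / x_i); every x_i must be nonzero."""
    return len(data) / sum(1 / x for x in data)

data = [1.0, 2.0, 4.0]  # hypothetical positive values
hm = harmonic_mean(data)
gm = statistics.geometric_mean(data)
am = statistics.mean(data)
print(hm, gm, am)
# For positive data that are not all equal: HM < GM < AM.
assert hm < gm < am
assert math.isclose(hm, statistics.harmonic_mean(data))
```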

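34 SUMMARY: Measures of Central Tendency. (1) Mean: every data point is weighted evenly. Ex: the average salary in a lab with 4 hard-working graduate research assistants (G.R.A.) at 20,000 each (4 × 20,000 = 80,000) and 1 faculty member at 100,000: total 180,000, so mean = 180,000/5 = 36,000.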

35 (2) Median: the center of the data, 50% above and 50% below. Median = 20,000. (3) Mode: (needs bin sizes to be about the same) Mode = 20,000. (4) Midrange: uses only the endpoints, (100,000 + 20,000)/2 = 60,000.
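The lab-salary example can be checked in a few lines of Python; the salaries are exactly those on the slides, and only the variable names are illustrative.

```python
from collections import Counter
import statistics

salaries = [20_000, 20_000, 20_000, 20_000, 100_000]  # 4 G.R.A.s and 1 faculty member

mean = sum(salaries) / len(salaries)                # 180,000 / 5
median = statistics.median(salaries)                # middle order statistic
mode = Counter(salaries).most_common(1)[0][0]       # most frequent value
midrange = (min(salaries) + max(salaries)) / 2      # endpoints only

print(mean, median, mode, midrange)  # 36000.0 20000 20000 60000.0
```

The single large salary pulls the mean and especially the midrange upward, while the median and mode stay at the typical value.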

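36 Chapter 4: Measures of Dispersion and Variability. (1) Range: $x_{(n)} - x_{(1)}$. (2) Mean deviation: $\frac{1}{n}\sum_{i=1}^{n} |x_i - \bar{x}|$. (3) Variance: $s^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}$. Sometimes called the sample variance; sometimes called the moment of inertia.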
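A sketch of the first three dispersion measures on hypothetical data. It also returns the sum of the deviations, which is always zero (up to rounding); that constraint is what costs the degree of freedom behind the n - 1 divisor. The function name `dispersion` is illustrative.

```python
def dispersion(data):
    """Return (range, mean deviation, sample variance, sum of deviations)."""
    n = len(data)
    xbar = sum(data) / n
    deviations = [x - xbar for x in data]
    data_range = max(data) - min(data)                     # x_(n) - x_(1)
    mean_deviation = sum(abs(d) for d in deviations) / n   # mean of |x_i - xbar|
    variance = sum(d * d for d in deviations) / (n - 1)    # s^2 with n - 1 in the denominator
    return data_range, mean_deviation, variance, sum(deviations)

x = [2.0, 8.0, 3.5, 1.0, 4.5, 5.5, 3.0, 4.5, 7.0, 6.0]  # hypothetical data, mean 4.5
print(dispersion(x))
```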

37 Each data point is selected randomly and independently of all other points, so each represents one degree of freedom. Variance (cont.): a sample of n points is a vector in n-dimensional space. The new statistics used by $s^2$ are the deviations $x_i - \bar{x}$.

38 (4) The standard deviation: $s = \sqrt{s^2}$. Because $\sum_{i=1}^{n}(x_i - \bar{x}) = 0$, the deviations $x_i - \bar{x}$ are not independent: using $\bar{x}$ as the estimate of the true mean $\mu$ costs one degree of freedom, leaving (n - 1) degrees of freedom. The units of s are the same as those of $x_i$. (5) The standard error of the mean: $s_{\bar{x}} = \frac{s}{\sqrt{n}}$.
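The standard deviation and the standard error of the mean follow directly from $s^2$. A minimal sketch using the standard library's `statistics.stdev`, which divides by n - 1; the data and the helper name `sd_and_se` are hypothetical.

```python
import math
import statistics

def sd_and_se(data):
    """Sample standard deviation s (n - 1 divisor) and standard error of the mean s / sqrt(n)."""
    s = statistics.stdev(data)
    se = s / math.sqrt(len(data))
    return s, se

x = [2.0, 8.0, 3.5, 1.0, 4.5, 5.5, 3.0, 4.5, 7.0, 6.0]  # hypothetical data
s, se = sd_and_se(x)
print(round(s, 3), round(se, 3))
```

Quadrupling the sample size halves the standard error but leaves the expected standard deviation unchanged.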

40 (6) The coefficient of variation: $V = \frac{s}{\bar{x}}$ (often expressed as a percent, $100\,s/\bar{x}$). (7) Quartiles: divide the data into quarters.

41 Interquartile range: $IQR = q_3 - q_1$. Percentiles: divide the data into hundredths (%).
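A sketch of the coefficient of variation, quartiles, and IQR. Quartile conventions differ between textbooks and software; this uses Python's `statistics.quantiles` with its default `method="exclusive"`, which may not match the procedure in Zar exactly. The data and the helper name `cv_and_iqr` are hypothetical.

```python
import statistics

def cv_and_iqr(data):
    """Coefficient of variation (percent), quartiles (q1, q2, q3), and IQR = q3 - q1."""
    mean = statistics.mean(data)
    s = statistics.stdev(data)
    cv_percent = 100 * s / mean
    q1, q2, q3 = statistics.quantiles(data, n=4)  # default method="exclusive"
    return cv_percent, (q1, q2, q3), q3 - q1

x = [2.0, 8.0, 3.5, 1.0, 4.5, 5.5, 3.0, 4.5, 7.0, 6.0]  # hypothetical data
cv, quartiles, iqr = cv_and_iqr(x)
print(round(cv, 1), quartiles, iqr)
```

The middle quartile $q_2$ is the median, so it agrees with the median formula for any choice of quartile method that interpolates symmetrically.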

42 Here $p_i = h_i/n$, where $h_i$ = the number of data points in the ith bin. The $p_i$'s represent and estimate the "true" probabilities of the bins ($\sum p_i = 1$, i.e., 100%). (8) Indices of diversity: the Shannon index, from information theory, $H' = -\sum_{i} p_i \log p_i$.
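A minimal sketch of the Shannon index computed from bin counts $h_i$. Natural logarithms are an assumption here (base 2 or base 10 are also used and only rescale H'); empty bins are skipped because $p \log p \to 0$ as $p \to 0$. The name `shannon_index` is illustrative.

```python
import math

def shannon_index(counts):
    """H' = -sum(p_i * ln p_i) with p_i = h_i / n; empty bins contribute 0."""
    n = sum(counts)
    return -sum((h / n) * math.log(h / n) for h in counts if h > 0)

print(shannon_index([5, 5, 5, 5]))  # equal counts in 4 bins: ln 4, the maximum for 4 bins
print(shannon_index([17, 1, 1, 1]))  # dominated by one bin: much lower diversity
```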

43 So, how do we use a measure of the center and a measure of dispersion to represent the data? (1) Mean ± SD or SE, in a table.

44 In a Graph

45 More common: histogram bars with whiskers. Problem: the perception of the lower limits; which groups look similar?

49 Choices: the standard deviation, which shows population variability; the standard error, which shows mean comparisons.

50 Confidence interval: shows the result of the t-test. Box plot: median with quartiles, whiskers for min and max, circles/asterisks for outliers.
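A box-plot summary can be sketched as below. The slide draws whiskers to the min and max and marks outliers separately; the 1.5 × IQR fence rule used here to decide which points are outliers is Tukey's common convention and is an assumption, as is the quartile method (`statistics.quantiles` with its default). The data, including one deliberately large value, and the name `box_plot_summary` are hypothetical.

```python
import statistics

def box_plot_summary(data, k=1.5):
    """Quartiles, whisker ends, and outliers using fences at q1 - k*IQR and q3 + k*IQR."""
    q1, med, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    inside = [x for x in data if low_fence <= x <= high_fence]
    outliers = sorted(x for x in data if x < low_fence or x > high_fence)
    return {"q1": q1, "median": med, "q3": q3,
            "whisker_low": min(inside), "whisker_high": max(inside),
            "outliers": outliers}

data = [1.0, 2.0, 3.0, 3.5, 4.5, 4.5, 5.5, 6.0, 7.0, 8.0, 20.0]  # hypothetical
print(box_plot_summary(data))
```

Without the outlier, the whiskers would simply run to the min and max, as on the slide.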

56 Extrapolation to the universe: the probability density function of the universe is estimated by the histogram of the sample space, as the sample size n gets large and the bin width gets small.

57 Parameter in the universe ↔ statistic in the sample space. $F(x) = \int_{-\infty}^{x} f(t)\,dt$, where f is the probability density function, is called the distribution function and is also approximated by the frequency polygon.
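The claim that the sample approximates the universe can be illustrated with a simulation: the fraction of sample points ≤ x (the sample counterpart of F(x), which the frequency polygon traces once the index is divided by n) approaches F(x) as n grows. This sketch assumes a Uniform(0, 1) universe, for which F(x) = x; the seed, the sample size, and the name `ecdf` are arbitrary choices.

```python
import random

def ecdf(sample, x):
    """Fraction of sample points <= x: the sample estimate of F(x)."""
    return sum(1 for v in sample if v <= x) / len(sample)

random.seed(1)
sample = [random.random() for _ in range(10_000)]  # Uniform(0, 1), where F(x) = x
for x in (0.1, 0.5, 0.9):
    print(x, ecdf(sample, x))  # each estimate should be close to x
```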


Download ppt "II. Descriptive Statistics (Zar, Chapters 1 - 4)."

Similar presentations


Ads by Google