II. Descriptive Statistics (Zar, Chapters 1 - 4).


Statistics and Randomization
Group 1: {y_1, y_2, …, y_m}
Group 2: {z_1, z_2, …, z_n}
[Diagram labels: Describe the Population, Randomize, Statistical Test, Conclusion, Extrapolate]

Hypothesis
Null Hypothesis: H_0: Group 1 = Group 2
Alternative, Two-sided: H_A: Group 1 ≠ Group 2
Or Alternative, One-sided: H_A1: Group 1 < Group 2, or H_A2: Group 1 > Group 2
The hypotheses are evaluated by a Statistical Test.

Types of Data
Discrete
Binary (Examples: alive or dead; heads or tails; Drug "A" or Drug "B"; Male or Female; Normal/Disease)
Representation as data: 0 = alive, 1 = dead; or "A" for "alive", "D" for "dead"
The sample is then {x_1, x_2, …, x_n}, with each x_i having only two choices.

Summarize by
(1) Table

Factor   Level   Number   %
Status   Alive   25       71%
         Dead    10       29%

(2) Histogram: (a) Numbers (b) Percent
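The table above can be reproduced with a few lines of Python. This is only a sketch: the counts (25 alive, 10 dead) and the "A"/"D" coding come from the slides, while the function name `summarize_binary` is made up for illustration.

```python
from collections import Counter

# Sample coded as on the slide: "A" for alive, "D" for dead (25 alive, 10 dead).
sample = ["A"] * 25 + ["D"] * 10

def summarize_binary(values):
    """Return {category: (count, percent)}, percent rounded to a whole number."""
    counts = Counter(values)
    n = len(values)
    return {k: (c, round(100 * c / n)) for k, c in counts.items()}

table = summarize_binary(sample)
for status, (count, pct) in table.items():
    print(f"{status}: {count} ({pct}%)")
```

The same function works for coded data with more than two categories, since it only counts labels.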

Coded (ex. diagnosis, genus/species, race, TNM, stage, color)
Representation as data: by name or coded name
1 = Caucasian, Non Hispanic
2 = Black (African American)
3 = Hispanic
or just "C", "B" or "A", and "H"; if 4 = Oriental, then C, B (A), H, O.

Summarize by
(1) Table

Race   Number   %
W      10       29%
B       5       15%
H      12       35%
O       7       21%

(2) Histogram: (a) Numbers (b) Percent

Ordered Scale
Examples: Date; Severity Scales (Benign, Possible Ca, Probable Ca, Cancer); Agreement/Preference (Likert: Strongly Disagree, Disagree, Neutral, Agree, Strongly Agree); Stage; Strength Scales (0, +, ++, +++)
Represented by an Integer Scale: 1: Benign; 2: Possible; 3: Not Sure or Neutral; 4: Probable; 5: Cancer

Summarized by: (2) Histogram: (a) Percent (b) Cumulative Percent

Continuous
Ratio Scales
Scale differences are the same (Ex: most data that have a zero)
True ratio data (Ex: normalized data: raw data/background, treated/control, effector/target)
Continuous Log Scale

Representation of data: real number; scientific notation; real number w/ significant digits
Sample: {x_1, x_2, …, x_n}
Summarized by
(1) Table. Ex: 10 data points x_1, …, x_10 [most values were lost in the transcript; surviving entries include x_4 = 1]
(a) Point Plot [points along the axis, left to right: x_4, x_1, x_7, x_3, x_5 and x_8, x_6, x_10, x_9, x_2]

(b) Histogram
(1) form "bins". Ex: 0-2, 2-4, 4-6, 6-8
(2) count the number of data points in each bin and plot the number (#) or the percent (%)
Plots: (a) Count (b) Percent

(c) Cumulative Histogram
1) form bins as before: 0-2, 2-4, 4-6, 6-8
2) count the number ≤ or ≥ each bin edge
[Plot: x-axis ticks 0, 2, 4, 6, 8, 10; cumulative ≤ ranges: 0, 0-2, 0-4, 0-6, 0-8, 0-10; cumulative ≥ ranges: 0-10, 2-10, 4-10, 6-10, 8-10, 0]
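The binning and cumulative counting steps can be sketched as follows. The ten data values are hypothetical (the slide's values did not survive), the bin edges are extended to 10 to match the cumulative axis, and the choice of left-closed bins with a closed last bin is an assumption, not something the slides specify.

```python
from itertools import accumulate

# Hypothetical data: 10 points on a 0-10 scale (not the slide's actual values).
data = [2.5, 9.0, 4.0, 1.0, 4.5, 5.5, 3.0, 4.5, 7.0, 6.0]
edges = [0, 2, 4, 6, 8, 10]

def bin_counts(values, edges):
    """Count values per bin; bins are [lo, hi) except the last, which is [lo, hi]."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            is_last = i == len(counts) - 1
            if edges[i] <= v < edges[i + 1] or (is_last and v == edges[-1]):
                counts[i] += 1
                break
    return counts

counts = bin_counts(data, edges)                    # histogram (a): count
percents = [100 * c / len(data) for c in counts]    # histogram (b): percent
cum_le = list(accumulate(counts))                   # number <= each upper bin edge
cum_ge = list(accumulate(reversed(counts)))[::-1]   # number >= each lower bin edge

print(counts, percents, cum_le, cum_ge)
```

With these made-up values the tallest bar falls in the 4-6 bin, which matches the bin the slides later use for the mode.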

What else can we do to summarize, or describe, the data?
(1) define where the center of the data lies (measures of central tendency)
(2) define how the data vary from that center (measures of dispersion)
Center and Dispersion: two numbers instead of all n

Chapter 3 Measures of Central Tendency
Where is the middle of the data?
Random Sample: x_1, x_2, …, x_n
(1) The arithmetic mean (average): $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
[Point plot of the 10 data points; the mean is the Center of Gravity]

(2) The order statistics
x_(1) = min(x_i) ≤ x_(2) ≤ x_(3) ≤ … ≤ x_(n) = max(x_i)
Ex: x_4 ≤ x_1 ≤ x_7 ≤ x_3 ≤ x_5 = x_8 ≤ x_6 ≤ x_10 ≤ x_9 ≤ x_2
    x_(1) ≤ x_(2) ≤ x_(3) ≤ x_(4) ≤ x_(5.5) = x_(5.5) ≤ x_(7) ≤ x_(8) ≤ x_(9) ≤ x_(10)
For ties, sum up the indices and divide by the number of ties.
Ex: x_5 and x_8 are tied (4.5); the order statistic index is (5+6)/2 = 5.5, so the order statistic is x_(5.5).
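The tie rule (average the indices of the tied positions) can be sketched in Python. The data values below are hypothetical; they were chosen only so that their ordering matches the example above, with x_4 smallest and x_5 = x_8 tied. The helper name `average_ranks` is also made up for illustration.

```python
# Hypothetical values for x_1..x_10, ordered like the example:
# x_4 <= x_1 <= x_7 <= x_3 <= x_5 = x_8 <= x_6 <= x_10 <= x_9 <= x_2
x = [2.5, 9.0, 4.0, 1.0, 4.5, 5.5, 3.0, 4.5, 7.0, 6.0]

def average_ranks(values):
    """Return the order-statistic index of each value, averaging indices over ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = ((pos + 1) + (end + 1)) / 2   # mean of the 1-based positions in the run
        for k in range(pos, end + 1):
            ranks[order[k]] = avg
        pos = end + 1
    return ranks

ranks = average_ranks(x)
print(ranks)  # x_5 and x_8 both get index 5.5
```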

Median - middle order statistic:
If n is odd, it's the middle statistic: $M = x_{((n+1)/2)}$
If n is even, it's the average of the two middles: $M = \frac{1}{2}\left(x_{(n/2)} + x_{(n/2+1)}\right)$

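If we want a formula that has even and odd together, we can use the greatest integer function. One such form is
$M = \frac{1}{2}\left(x_{([(n+1)/2])} + x_{([n/2]+1)}\right)$
where [-] is the "greatest integer in …". In the example above, n = 10, [n/2] = 5, so M is the average of x_(5) and x_(6).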
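A minimal sketch of a combined odd/even median, assuming the form M = (x_([(n+1)/2]) + x_([n/2]+1)) / 2, where [-] is the greatest integer function. This is one way to write such a formula and is not necessarily the one shown on the original slide; the function name and the data are hypothetical.

```python
def median_order_stat(values):
    """Median from order statistics using integer (floor) division for both cases."""
    if not values:
        raise ValueError("median of an empty sample is undefined")
    xs = sorted(values)            # xs[k-1] is the order statistic x_(k)
    n = len(xs)
    lo = (n + 1) // 2              # [(n+1)/2]
    hi = n // 2 + 1                # [n/2] + 1
    return (xs[lo - 1] + xs[hi - 1]) / 2

# Hypothetical 10-point sample: n = 10, [n/2] = 5, so x_(5) and x_(6) are averaged.
x = [2.5, 9.0, 4.0, 1.0, 4.5, 5.5, 3.0, 4.5, 7.0, 6.0]
print(median_order_stat(x))           # even n
print(median_order_stat([3, 1, 2]))   # odd n: both indices point at the middle value
```

For odd n the two indices coincide, so the average returns the middle order statistic unchanged.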

Plot the order statistic index (i on the y-axis) against the corresponding order statistic (x_(i) on the x-axis). The plot is called a frequency polygon.

(3) The Mode
The x where the histogram is maximal. Usually use the midpoint of the box where the histogram is maximal.
Ex: In our continuous example, the mode is in the box 4-6, so mode = (4+6)/2 = 5.0

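(4) The mid-range: $\text{midrange} = \frac{1}{2}\left(x_{(1)} + x_{(n)}\right)$
(5) The geometric mean: $\bar{x}_G = \left(\prod_{i=1}^{n} x_i\right)^{1/n}$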

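Derivation of the geometric mean. Let y_i = log_10(x_i). Then
$\bar{y} = \frac{1}{n}\sum_{i=1}^{n} \log_{10}(x_i) = \log_{10}\left[\left(\prod_{i=1}^{n} x_i\right)^{1/n}\right]$, so that $\bar{x}_G = 10^{\bar{y}}$.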
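The log-scale derivation can be checked numerically: averaging y_i = log_10(x_i) and transforming back with 10^ȳ should agree with the nth root of the product. The sketch below uses made-up values, and both function names are hypothetical.

```python
import math

def geometric_mean_via_logs(values):
    """Average the base-10 logs, then back-transform: 10 ** mean(log10(x_i))."""
    logs = [math.log10(v) for v in values]   # requires every value > 0
    y_bar = sum(logs) / len(logs)
    return 10 ** y_bar

def geometric_mean_direct(values):
    """nth root of the product of the values."""
    return math.prod(values) ** (1 / len(values))

values = [1.0, 10.0, 100.0]   # hypothetical data spanning two decades
print(geometric_mean_via_logs(values), geometric_mean_direct(values))
```

The log route is usually preferable for long samples because the raw product can overflow or underflow.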

(6) The harmonic mean: $\bar{x}_H = \dfrac{n}{\sum_{i=1}^{n} 1/x_i}$, i.e. the reciprocal of the mean of the reciprocals
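As a small sketch, the harmonic mean can be computed as n divided by the sum of reciprocals. The data are made up and the function name is hypothetical; Python's standard library also offers `statistics.harmonic_mean`, used here only as a cross-check.

```python
import statistics

def harmonic_mean(values):
    """n divided by the sum of reciprocals; every value must be nonzero."""
    return len(values) / sum(1 / v for v in values)

values = [1.0, 2.0, 4.0]   # hypothetical data
h = harmonic_mean(values)
print(h, statistics.harmonic_mean(values))
```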

SUMMARY: Measures of Central Tendency
(1) MEAN: data evenly weighted
Ex: average of salaries in lab
4 hard working G.R.A.s at 20,000 each = 80,000
1 Faculty member at 100,000 = 100,000
Total = 180,000
Mean = 180,000/5 = 36,000

(2) Median: center of the data, 50% above, 50% below. Median = 20,000
(3) Mode: needs bin sizes to be about the same. Mode = 20,000
(4) Midrange: uses only the endpoints. (100,000 + 20,000)/2 = 60,000
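The salary example can be verified with Python's `statistics` module. The five salaries are the ones from the slides (four at 20,000 and one at 100,000); the variable names are illustrative.

```python
import statistics

# Four G.R.A. salaries and one faculty salary, as in the slide example.
salaries = [20_000] * 4 + [100_000]

mean = statistics.mean(salaries)
median = statistics.median(salaries)
mode = statistics.mode(salaries)
midrange = (min(salaries) + max(salaries)) / 2

print(mean, median, mode, midrange)
```

The single large salary pulls the mean and, even more, the midrange upward, while the median and mode stay at the typical value.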

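Chapter 4 Measures of Dispersion and Variability
(1) Range: Range = x_(n) - x_(1)
(2) Mean Deviation: $\frac{1}{n}\sum_{i=1}^{n} |x_i - \bar{x}|$
(3) Variance: $s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$
Sometimes called the sample variance. Sometimes called the moment of inertia.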
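A short sketch of the first three dispersion measures, assuming the usual definitions: mean deviation as the average absolute deviation from the mean, and the sample variance with divisor n - 1. The ten data values are hypothetical.

```python
# Hypothetical 10-point sample (not the slide's actual values).
x = [2.5, 9.0, 4.0, 1.0, 4.5, 5.5, 3.0, 4.5, 7.0, 6.0]
n = len(x)
x_bar = sum(x) / n

data_range = max(x) - min(x)                               # x_(n) - x_(1)
mean_deviation = sum(abs(xi - x_bar) for xi in x) / n      # average |x_i - mean|
variance = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)    # sample variance s^2

print(data_range, mean_deviation, variance)
```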

Variance (cont.)
Each data point is selected randomly and independently of all other points. It represents a degree of freedom. A sample of n points is a vector in n-dimensional space.
The new statistics used by s^2 are the deviations $d_i = x_i - \bar{x}$, with $\sum_{i=1}^{n} d_i = 0$, so that the d_i are not independent. The estimate $\bar{x}$ for the true mean costs one degree of freedom, leaving (n-1) degrees of freedom.

(4) The standard deviation: $s = \sqrt{s^2}$. The units of s are the same as those of the x_i.
(5) The standard error of the mean: $s_{\bar{x}} = s/\sqrt{n}$
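The degrees-of-freedom argument can be illustrated numerically: the deviations from the sample mean always sum to zero, so only n - 1 of them are free. The sketch below also computes the standard deviation s and the standard error of the mean, taken here as s divided by the square root of n. The data are hypothetical.

```python
import math

# Hypothetical 10-point sample (not the slide's actual values).
x = [2.5, 9.0, 4.0, 1.0, 4.5, 5.5, 3.0, 4.5, 7.0, 6.0]
n = len(x)
x_bar = sum(x) / n

# The deviations sum to zero (up to rounding): one degree of freedom is used by x_bar.
deviations = [xi - x_bar for xi in x]
s2 = sum(d * d for d in deviations) / (n - 1)   # sample variance, n - 1 degrees of freedom
s = math.sqrt(s2)                               # standard deviation, same units as x_i
se = s / math.sqrt(n)                           # standard error of the mean

print(sum(deviations), s, se)
```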

(6) The coefficient of variation: $CV = s/\bar{x}$ (often reported as a percent)
(7) Quartiles (divide the data in quarters): q_1, q_2 (the median), q_3

Interquartile range: IQR = q_3 - q_1
Percentiles (divide into %)
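Items (6) and (7) can be sketched with the standard library. Quartile conventions differ between textbooks and software; the example uses `statistics.quantiles` with `method="inclusive"`, which is an assumption and may not match the convention used in Zar. The data are hypothetical.

```python
import statistics

# Hypothetical 10-point sample (not the slide's actual values).
x = [2.5, 9.0, 4.0, 1.0, 4.5, 5.5, 3.0, 4.5, 7.0, 6.0]

# Coefficient of variation: standard deviation relative to the mean.
cv = statistics.stdev(x) / statistics.mean(x)

# Quartiles: three cut points dividing the sorted data into four parts.
q1, q2, q3 = statistics.quantiles(x, n=4, method="inclusive")
iqr = q3 - q1

print(f"CV = {100 * cv:.1f}%, q1 = {q1}, q2 = {q2}, q3 = {q3}, IQR = {iqr}")
```

Under any of the common conventions the second quartile q_2 coincides with the median.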

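Bin percentages: $p_i = 100\% \times h_i / n$, where n = the total number of data points and h_i = the number of data pts in the ith bin. The p_i's represent and estimate the "true" probabilities in the bins (∑p_i = 100%).
(8) Indices of diversity
"Shannon Index" (Information Theory): $H' = -\sum_i p_i \log p_i$, with the p_i expressed as proportions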
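A minimal sketch of the bin percentages and a Shannon-type diversity index, assuming the common form H' = -Σ p_i log(p_i) with p_i taken as proportions (h_i / n). The bin counts are hypothetical, the function name is made up, and the log base is a free choice (base 10 is used as the default here as an assumption).

```python
import math

def shannon_index(counts, base=10):
    """H' = -sum(p_i * log(p_i)) over non-empty bins, with p_i = h_i / n."""
    n = sum(counts)
    props = [h / n for h in counts if h > 0]   # empty bins contribute nothing
    return -sum(p * math.log(p, base) for p in props)

h = [1, 2, 4, 2, 1]                            # hypothetical bin counts h_i
n = sum(h)
percents = [100 * hi / n for hi in h]          # the p_i as percentages

print(percents, sum(percents))
print(shannon_index(h))
print(shannon_index([5, 5, 5, 5], base=2))     # four equally likely bins, in bits
```

The index is largest when the bins are equally filled and zero when all data fall in one bin.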

So, how do we use a measure of the Center and a measure of Dispersion to represent the data?
(1) Mean ± SD or SE
In a Table

In a Graph

More common: Histogram Bars with whiskers
Problem: perception of lower limits -- who is similar?

Choices:
Standard Deviation: shows population variability
Standard Error: shows mean comparisons

Confidence Interval: shows the result of the t-test
Box Plot: median with quartiles; whiskers for min & max; circles/asterisks for outliers

Extrapolation to the Universe
The Universe is estimated by the Sample Space.
The Probability Density Function is estimated by the Histogram, as the sample size n gets large and the bin width gets small.

Parameter in the Universe ↔ Statistic in the Sample Space
F(x) is called the distribution function and is also approximated by the frequency polygon.
