II. Descriptive Statistics (Zar, Chapters 1 - 4).


Statistics and Randomization. Group 1: $\{y_1, y_2, \ldots, y_m\}$; Group 2: $\{z_1, z_2, \ldots, z_n\}$. Diagram labels: Randomize, Statistical Test, Conclusion, Extrapolate, Describe the Population.

Hypothesis. Null hypothesis $H_0$: Group 1 = Group 2. Alternative (two-sided) $H_A$: Group 1 ≠ Group 2. Or alternative (one-sided) $H_{A1}$: Group 1 < Group 2; $H_{A2}$: Group 1 > Group 2. Statistical Test.

Types of Data. Discrete. Binary (Examples: alive or dead; heads or tails; Drug "A" or Drug "B"; male or female; normal or disease). Representation as data: 0 = alive, 1 = dead; or "A" for "alive", "D" for "dead". The sample is then $\{x_1, x_2, \ldots, x_n\}$, with each $x_i$ having only two choices.

Summarize by (1) Table:

Status   Number   %
Alive    25       71%
Dead     10       29%

(2) Histogram of (a) numbers or (b) percent.
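
A table like this can be tallied directly from the coded sample. The following is a minimal sketch in Python; the list `sample` is a made-up sample built only to match the counts in the table (25 alive, 10 dead), not data from the slides.

```python
from collections import Counter

# Hypothetical binary sample matching the table: 25 alive, 10 dead.
sample = ["alive"] * 25 + ["dead"] * 10

counts = Counter(sample)
n = len(sample)

# Percent of the sample in each category, rounded to whole percents.
percents = {status: round(100 * k / n) for status, k in counts.items()}

for status in counts:
    print(f"{status:<6} {counts[status]:>3} {percents[status]:>3}%")
```

The same tally works for any coded categorical variable, since `Counter` only needs the category labels.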

Coded (Ex: diagnosis, genus/species, race, TNM, stage, color). Representation as data: by name or coded name. 1 = Caucasian, Non-Hispanic; 2 = Black (African American); 3 = Hispanic; or just "C", "B" or "A", and "H". If 4 = Oriental, then C, B (A), H, O.

Summarize by (1) Table:

Race   Number   %
W      10       29%
B       5       15%
H      12       35%
O       7       21%

(2) Histogram of numbers or percent.

Ordered Scale. Examples: date; severity scales (Benign, Possible Ca, Probable Ca, Cancer); agreement/preference (Likert: Strongly Disagree, Disagree, Neutral, Agree, Strongly Agree); stage; strength scales (0, +, ++, +++). Represented by an integer scale: 1 = Benign; 2 = Possible; 3 = Not Sure or Neutral; 4 = Probable; 5 = Cancer.

Summarized by: (2) Histogram of percent or cumulative percent.

Continuous. Ratio scales: scale differences are the same (Ex: most data that have a zero). True ratio data (Ex: normalized data: raw data/background, treated/control, effector/target). Continuous log scale.

Representation of data: real number, scientific notation, or real number with significant digits: $\{x_1, x_2, \ldots, x_n\}$. Summarized by (1) Table. Ex: 10 data points, among them $x_3 = 4$, $x_4 = 1$, $x_7 = 3.5$, $x_{10} = 5.5$. (a) Point plot: the points lie along the axis in the order $x_4, x_1, x_7, x_3, x_5 = x_8, x_6, x_{10}, x_9, x_2$.

 b  histogram (1) form “bins” Ex: 0-2, 2-4, 4-6, 6-8 (2) count number of data points in each bin and plot # or % (a) Count(b) Percent

(c) Cumulative Histogram: (1) form bins as before (0-2, 2-4, 4-6, 6-8); (2) count the number of data points ≤ (or ≥) each bin boundary.
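
The binning and counting steps can be sketched in a few lines of Python. The ten values in `data` are hypothetical (the slide's own values are not reproduced here), and the rule that a point on a bin boundary goes into the upper bin is an assumption; the slides do not say how boundary points are assigned.

```python
# Hypothetical sample of 10 points; not the values from the slides.
data = [2.5, 7.5, 4.0, 1.0, 4.5, 5.0, 3.5, 4.5, 6.5, 5.5]
edges = [0, 2, 4, 6, 8]  # bins 0-2, 2-4, 4-6, 6-8


def bin_counts(values, edges):
    """Count values in half-open bins [lo, hi); the last bin also includes its upper edge."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            lo, hi = edges[i], edges[i + 1]
            is_last = i == len(counts) - 1
            if lo <= v < hi or (is_last and v == hi):
                counts[i] += 1
                break
    return counts


counts = bin_counts(data, edges)
percents = [100 * c / len(data) for c in counts]

# Cumulative histogram: running total of the counts, bin by bin.
cumulative = []
running = 0
for c in counts:
    running += c
    cumulative.append(running)

print(counts, percents, cumulative)
```

Counting in the other direction (the number of points at or above each boundary) is the same loop run over the reversed list of counts.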

What else can we do to summarize, or describe, the data? (1) Define where the center of the data lies (measures of central tendency). (2) Describe how the data vary from that center (measures of dispersion). Center and dispersion: two numbers instead of all n.

Chapter 3 Measures of Central Tendency. Where is the middle of the data? Random sample: $x_1, x_2, \ldots, x_n$. (1) The arithmetic mean (average): $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$. On the point plot, the mean is the center of gravity of the points.

(2) The order statistics: $x_{(1)} = \min(x_i) \le x_{(2)} \le x_{(3)} \le \ldots \le x_{(n)} = \max(x_i)$. In the example, $x_4 \le x_1 \le x_7 \le x_3 \le x_5 = x_8 \le x_6 \le x_{10} \le x_9 \le x_2$, that is, $x_{(1)} \le x_{(2)} \le x_{(3)} \le x_{(4)} \le x_{(5.5)} = x_{(5.5)} \le x_{(7)} \le x_{(8)} \le x_{(9)} \le x_{(10)}$. For ties, sum the indices and divide by the number of ties. Ex: $x_5$ and $x_8$ are tied (both 4.5), so the order statistic index is (5+6)/2 = 5.5 and the order statistic is $x_{(5.5)}$.
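
The tie rule (average the indices that the tied values would occupy) can be sketched as follows. The helper name `average_ranks` and the ten values are illustrative assumptions; the values were chosen only so that the fifth and eighth points tie at 4.5, as in the example.

```python
def average_ranks(values):
    """Return 1-based order-statistic indices, averaging the indices of tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        # Extend `end` over the run of values equal to the one at `pos`.
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + 1 + end + 1) / 2  # mean of the 1-based positions pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


# Hypothetical sample; x_5 and x_8 (list positions 4 and 7) are tied at 4.5.
data = [2.5, 7.5, 4.0, 1.0, 4.5, 5.0, 3.5, 4.5, 6.5, 5.5]
ranks = average_ranks(data)
print(ranks)
```

With this sample the two tied points would occupy positions 5 and 6, so each receives the index 5.5.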

Median: the middle order statistic. If n is odd, it is the middle order statistic, $x_{((n+1)/2)}$. If n is even, it is the average of the two middle order statistics, $\frac{1}{2}\left(x_{(n/2)} + x_{(n/2+1)}\right)$.

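If we want a formula that covers even and odd n together, we can use the greatest integer function, for example median $= \frac{1}{2}\left(x_{([(n+1)/2])} + x_{([n/2]+1)}\right)$, where $[\cdot]$ is the "greatest integer in ...". In the example above, n = 10 and [n/2] = 5, so the median is the average of $x_{(5)}$ and $x_{(6)}$.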
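
One way to write a single even-and-odd formula is to average the order statistics with indices [(n+1)/2] and [n/2]+1, where [ ] is the greatest integer function. This particular form is an assumption (the slide's own formula may differ); the sketch below checks it against Python's `statistics.median`. The function name `median_floor` is made up for illustration.

```python
import statistics


def median_floor(values):
    """Median via order statistics and integer (floor) division."""
    xs = sorted(values)
    n = len(xs)
    lower = (n + 1) // 2   # [(n+1)/2], as a 1-based order-statistic index
    upper = n // 2 + 1     # [n/2] + 1, as a 1-based order-statistic index
    # For odd n the two indices coincide; for even n they are the two middles.
    return (xs[lower - 1] + xs[upper - 1]) / 2


odd_sample = [3, 1, 2]
even_sample = [4, 1, 3, 2]
print(median_floor(odd_sample), statistics.median(odd_sample))
print(median_floor(even_sample), statistics.median(even_sample))
```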

Plot the order statistic index (i on the y-axis) against the corresponding order statistic ($x_{(i)}$ on the x-axis). The plot is called a frequency polygon.

(3) The Mode: the x where the histogram is maximal. Usually use the midpoint of the bin where the histogram is maximal. Ex: in our continuous example, the histogram is maximal in the bin 4-6, so the mode is (4+6)/2 = 5.0.
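
As a small illustration, the midpoint rule can be applied to a list of bin counts. The counts below are hypothetical; they are chosen only so that the 4-6 bin is the tallest, as in the example.

```python
edges = [0, 2, 4, 6, 8]   # bins 0-2, 2-4, 4-6, 6-8
counts = [1, 2, 5, 2]     # hypothetical bin counts, tallest bin is 4-6

# Index of the bin where the histogram is maximal (the first one, if several tie).
peak = max(range(len(counts)), key=lambda i: counts[i])

# Mode taken as the midpoint of that bin.
mode = (edges[peak] + edges[peak + 1]) / 2
print(mode)
```

The result depends on the choice of bins, so a different bin width can move the mode.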

(4) The mid-range: $\frac{1}{2}\left(x_{(1)} + x_{(n)}\right)$. (5) The geometric mean: $\left(\prod_{i=1}^{n} x_i\right)^{1/n}$.

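Derivation of the geometric mean. Let $y_i = \log_{10}(x_i)$. Then $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} \log_{10}(x_i) = \log_{10}\left[\left(\prod_{i=1}^{n} x_i\right)^{1/n}\right]$, so the geometric mean of the $x_i$ is $10^{\bar{y}}$.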

(6) The harmonic mean: $n \Big/ \sum_{i=1}^{n} \frac{1}{x_i}$.
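
The mid-range, geometric mean, and harmonic mean can be sketched with their standard textbook definitions. The geometric mean here goes through base-10 logarithms, following the derivation via $y_i = \log_{10}(x_i)$. The function names and the three sample values are illustrative assumptions, and all values must be positive.

```python
import math


def midrange(xs):
    """Average of the smallest and largest values."""
    return (min(xs) + max(xs)) / 2


def geometric_mean(xs):
    """Back-transform the mean of the base-10 logs (requires positive values)."""
    logs = [math.log10(x) for x in xs]
    return 10 ** (sum(logs) / len(logs))


def harmonic_mean(xs):
    """n divided by the sum of reciprocals (requires positive values)."""
    return len(xs) / sum(1 / x for x in xs)


values = [1.0, 10.0, 100.0]  # hypothetical positive data
arithmetic = sum(values) / len(values)
print(midrange(values), arithmetic, geometric_mean(values), harmonic_mean(values))
```

For positive data the harmonic mean is never larger than the geometric mean, which is never larger than the arithmetic mean.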

SUMMARY: Measures of Central Tendency. (1) Mean: data evenly weighted. Ex: average of salaries in a lab with 4 hard-working G.R.A.s at 20,000 each and 1 faculty member at 100,000. Mean = (4 × 20,000 + 100,000)/5 = 36,000.

(2) Median: center of the data, 50% above and 50% below. Median = 20,000. (3) Mode: needs bin sizes to be about the same. Mode = 20,000. (4) Midrange: uses only the endpoints. Midrange = (100,000 + 20,000)/2 = 60,000.
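
The salary example can be checked with Python's standard `statistics` module. This is only a sketch: the mode here is the most frequent value rather than the midpoint of a histogram bin, which gives the same answer for these five salaries.

```python
import statistics

# Four G.R.A. salaries of 20,000 and one faculty salary of 100,000.
salaries = [20_000] * 4 + [100_000]

mean = statistics.mean(salaries)
median = statistics.median(salaries)
mode = statistics.mode(salaries)               # most frequent value
midrange = (min(salaries) + max(salaries)) / 2

print(mean, median, mode, midrange)
```

The single large salary pulls the mean and the midrange upward, while the median and mode stay at the typical salary.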

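Chapter 4 Measures of Dispersion and Variability. (1) Range: Range $= x_{(n)} - x_{(1)}$. (2) Mean deviation: $\frac{1}{n}\sum_{i=1}^{n} |x_i - \bar{x}|$. (3) Variance: $s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$. Sometimes called the sample variance. Sometimes called the moment of inertia.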

Variance (cont.) Each data point is selected randomly and independently of all other points, so it represents a degree of freedom. A sample of n points is a vector in n-dimensional space. The new statistics used by $s^2$ are the deviations $x_i - \bar{x}$, which satisfy $\sum_{i=1}^{n} (x_i - \bar{x}) = 0$, so that the deviations are not independent. Estimating the true mean $\mu$ by $\bar{x}$ costs one degree of freedom, leaving (n-1) degrees of freedom.

(4) The standard deviation: $s = \sqrt{s^2}$. The units of s are the same as those of the $x_i$. (5) The standard error of the mean: $SE = s/\sqrt{n}$.
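
The dispersion measures in this chapter can be sketched step by step in Python, using the usual textbook definitions (variance with divisor n-1, standard error as s over the square root of n). The ten values in `data` are hypothetical, not the data from the slides.

```python
import math
import statistics

# Hypothetical sample of 10 points; not the values from the slides.
data = [2.5, 7.5, 4.0, 1.0, 4.5, 5.0, 3.5, 4.5, 6.5, 5.5]
n = len(data)
mean = sum(data) / n

# Deviations from the sample mean; they sum to zero (up to rounding),
# which is why only n-1 of them are free to vary.
deviations = [x - mean for x in data]

data_range = max(data) - min(data)
mean_deviation = sum(abs(d) for d in deviations) / n
variance = sum(d * d for d in deviations) / (n - 1)   # sample variance, divisor n-1
sd = math.sqrt(variance)                              # same units as the data
se = sd / math.sqrt(n)                                # standard error of the mean

print(data_range, mean_deviation, variance, sd, se)
```

The hand-computed `variance` and `sd` agree with `statistics.variance` and `statistics.stdev`, which also use the n-1 divisor.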

(6) The coefficient of variation: $CV = s/\bar{x}$, often expressed as a percent. (7) Quartiles (divide the data in quarters): $q_1$, $q_2$ (the median), $q_3$.

Interquartile range: IQR $= q_3 - q_1$. Percentiles (divide the data into percents).
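
A short sketch of the quartiles, the interquartile range, and the coefficient of variation, using Python's standard `statistics` module. Several conventions exist for interpolating quartiles; `statistics.quantiles` uses its default "exclusive" method here, which may differ slightly from the rule in Zar. The data are hypothetical.

```python
import statistics

# Hypothetical sample of 10 points; not the values from the slides.
data = [2.5, 7.5, 4.0, 1.0, 4.5, 5.0, 3.5, 4.5, 6.5, 5.5]

# Three cut points that divide the sorted data into quarters.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

# Coefficient of variation: standard deviation relative to the mean.
cv = statistics.stdev(data) / statistics.mean(data)

print(q1, q2, q3, iqr, cv)
```

Passing `n=100` instead of `n=4` returns the 99 percentile cut points in the same way.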

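Let $p_i = h_i/n$, where $n$ = total number of data points and $h_i$ = number of data points in the ith bin. The $p_i$'s represent and estimate the "true" probabilities in the bins ($\sum p_i = 100\%$). (8) Indices of diversity: the "Shannon Index" from information theory, $H' = -\sum_i p_i \log p_i$.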
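
The Shannon index can be sketched from bin counts using the standard definition, minus the sum of p times log p. The base of the logarithm is an assumption (natural log here; base 10 or base 2 only rescales the result). The counts reuse the race table from earlier in the deck (10, 5, 12, 7), and the function name `shannon_index` is made up for illustration.

```python
import math


def shannon_index(counts):
    """Shannon diversity index from category counts, using natural logs."""
    n = sum(counts)
    proportions = [h / n for h in counts if h > 0]  # empty bins contribute nothing
    return -sum(p * math.log(p) for p in proportions)


counts = [10, 5, 12, 7]            # counts from the race table in the deck
h_observed = shannon_index(counts)
h_max = math.log(len(counts))      # value reached when all categories are equally common

print(h_observed, h_max)
```

The index is largest when the categories are equally frequent and drops to zero when every observation falls in one category.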

So, how do we use a measure of the center and a measure of dispersion to represent the data? (1) Mean ± SD or SE. In a table.

In a Graph

More common: histogram bars with whiskers. Problem: perception of lower limits; who is similar?

Choices: Standard deviation: shows population variability. Standard error: shows mean comparisons.

Confidence interval: shows the result of the t-test. Box plot: median with quartiles, whiskers for min and max, circles or asterisks for outliers.

Extrapolation to the Universe. Universe: probability density function. Sample space: histogram. The probability density function is estimated by the histogram as the sample size n gets large and the bin width gets small.

A parameter in the universe corresponds to a statistic in the sample space. F(x) is called the distribution function and is also approximated by the frequency polygon.