Some definitions In Statistics. A sample: Is a subset of the population.


Some definitions In Statistics

A sample: Is a subset of the population

In statistics: One draws conclusions about the population based on data collected from a sample

Reasons:
– Cost: It is less costly to collect data from a sample than from the entire population.
– Accuracy: Data from a sample sometimes leads to more accurate conclusions than data from the entire population. Costs saved by using a sample can be directed to obtaining more accurate observations on each case in the sample.

Types of Samples: Different types of samples are determined by how the sample is selected.

Convenience Samples: In a convenience sample the subjects that are most convenient to the researcher are selected as cases in the sample. This is not a very good procedure for inferential statistical analysis but is useful for exploratory preliminary work.

Quota Samples: In quota samples subjects are chosen conveniently until quotas are met for different subgroups of the population. This is also useful for exploratory preliminary work.

Random Samples: Random samples of a given size are selected in such a way that all possible samples of that size have the same probability of being selected.
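The definition of a random sample can be sketched in a few lines of Python. This is only an illustration: the population of case labels 1 to 100 and the helper name `simple_random_sample` are assumptions, not part of the lecture. It relies on the standard library's `random.sample`, which draws without replacement so that every subset of the requested size is equally likely.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct cases; every subset of size n has the same chance of selection."""
    rng = random.Random(seed)          # seed only makes the illustration repeatable
    return rng.sample(list(population), n)

# Hypothetical population: case labels 1..100
population = list(range(1, 101))
sample = simple_random_sample(population, 10, seed=0)
print(sample)
```

A convenience or quota sample has no such selection rule, which is why the accuracy of estimates based on it is hard to assess.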

Convenience samples and quota samples are useful for preliminary studies. It is, however, difficult to assess the accuracy of estimates based on this type of sampling scheme. Sometimes, however, one has to be satisfied with a convenience sample and assume that it is equivalent to a random sampling procedure.

[Figure: a population, a sample drawn from it, and the variables X, Y, Z recorded for each case]

Some other definitions

A population statistic (parameter): Any quantity computed from the values of variables for the entire population.

A sample statistic: Any quantity computed from the values of variables for the cases in the sample.

Since only cases from the sample are observed
– only sample statistics are computed
– these are used to make inferences about population statistics
– it is important to be able to assess the accuracy of these inferences

To download lectures
1. Go to the stats 244 web site
   a) through PAWS, or
   b) by going to the website of the department of Mathematics and Statistics -> people -> faculty -> W.H. Laverty -> Stats Lectures.
2. Then
   a) select the lecture
   b) right click and choose Save as

To print lectures
1. Open the lecture using MS Powerpoint
2. Select the menu item File -> Print

The following dialogue box appears

In the Print what box, select handouts

Set Slides per page to 6 or 3.

6 slides per page will result in the least amount of paper being printed

3 slides per page leaves room for notes

Organizing and describing Data

Techniques for continuous variables

The Grouped frequency table: The Histogram

To construct a grouped frequency table and a histogram

1. Find the maximum and minimum of the observations.
2. Choose non-overlapping intervals of equal width (the class intervals) that cover the range between the maximum and the minimum.
3. The endpoints of the intervals are called the class boundaries.
4. Count the number of observations in each interval (the cell frequency, f).
5. Calculate the relative frequency: relative frequency = f/N.
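The steps above can be sketched as a small Python function. This is a minimal sketch under stated assumptions: the scores are hypothetical (they are not Data Set #3), the function name and the `start` and `width` parameters are illustrative, and each interval is taken to include its upper endpoint but not its lower one.

```python
import math

def grouped_frequency_table(data, start, width):
    """Rows (lower, upper, f, f/N) for the class intervals (lower, upper]."""
    if start >= min(data):
        raise ValueError("start must lie below the smallest observation")
    k = math.ceil((max(data) - start) / width)     # number of class intervals
    freq = [0] * k
    for x in data:
        # upper endpoint included, lower endpoint excluded
        freq[math.ceil((x - start) / width) - 1] += 1
    n = len(data)
    return [(start + i * width, start + (i + 1) * width, f, f / n)
            for i, f in enumerate(freq)]

# Hypothetical scores, for illustration only
scores = [84, 86, 90, 92, 95, 98, 100, 100, 104, 105, 110, 118]
table = grouped_frequency_table(scores, start=80, width=10)
for lower, upper, f, rel in table:
    print(f"({lower}, {upper}]  f = {f}  relative frequency = {rel:.3f}")
```

Plotting a bar of height f (or f/N) over each class interval gives the histogram.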

Data Set #3: The following table gives data on Verbal IQ, Math IQ, Initial Reading Achievement Score, and Final Reading Achievement Score for 23 students who have recently completed a reading improvement program.
Columns: Student, Verbal IQ, Math IQ, Initial Reading Achievement, Final Reading Achievement

In this example the upper endpoint is included in the interval. The lower endpoint is not.

Histogram – Verbal IQ

Histogram – Math IQ

Example: In this example we compare the time to metabolize two drugs, A and B. 120 cases were given drug A and 120 cases were given drug B. Data on the time to metabolize each drug are given on the next two slides.

Drug A

Drug B

Grouped frequency tables

Histogram – drug A (time to metabolize)

Histogram – drug B (time to metabolize)

Some comments about histograms: The width of the class intervals should be chosen so that the number of intervals with a frequency less than 5 is small. This means that the width of the class intervals can decrease as the sample size increases.

If the width of the class intervals is too small, the frequency in each interval will be either 0 or 1. The histogram will look like this:

If the width of the class intervals is too large, one class interval will contain all of the observations. The histogram will look like this:

Ideally one wants the histogram to appear as seen below. This will be achieved by making the width of the class intervals as small as possible and only allowing a few intervals to have a frequency less than 5.
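The trade-off in choosing the class width can be checked numerically. The sketch below is an illustration only: the 200 simulated observations, the candidate widths, and the helper name `interval_frequencies` are assumptions. For each width it reports how many intervals result and how many of them have a frequency less than 5.

```python
import math
import random

def interval_frequencies(data, start, width):
    """Frequencies for equal-width intervals (start, start+width], (start+width, start+2*width], ..."""
    k = math.ceil((max(data) - start) / width)
    freq = [0] * k
    for x in data:
        freq[math.ceil((x - start) / width) - 1] += 1
    return freq

# Simulated observations (hypothetical, roughly bell shaped)
rng = random.Random(1)
data = [rng.gauss(100, 15) for _ in range(200)]
start = math.floor(min(data)) - 1      # guaranteed to lie below the minimum

for width in (0.5, 5, 10, 1000):
    freq = interval_frequencies(data, start, width)
    sparse = sum(1 for f in freq if f < 5)
    print(f"width={width:>6}: {len(freq):>3} intervals, {sparse:>3} with frequency < 5")
```

A very small width leaves almost every interval sparse, while a very large width puts all observations in one interval; a useful width lies between these extremes.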

As the sample size increases the histogram will approach a smooth curve. This is the histogram of the population.

[Figures: histograms for N = 25, N = 100, N = 500, N = 2000, and N = ∞]

Comment: the proportion of area under a histogram between two points estimates the proportion of cases in the sample (and the population) between those two values.

Example: The following histogram displays the birth weight (in kg) of n = 100 births.

Find the proportion of births that have a birthweight less than 0.34 kg.

Proportion = ( )/100 = 0.62
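The same kind of calculation can be sketched in Python. The class boundaries and frequencies below are hypothetical, not the birth-weight data from the slide, and the function name is illustrative. With class intervals of equal width, area is proportional to frequency, so the proportion below a class boundary is the sum of the frequencies to its left divided by the total.

```python
def proportion_below(boundaries, freqs, cutoff):
    """Proportion of cases below `cutoff`, which must be one of the class boundaries."""
    if cutoff not in boundaries:
        raise ValueError("cutoff must coincide with a class boundary")
    j = boundaries.index(cutoff)       # intervals 0..j-1 lie entirely below the cutoff
    return sum(freqs[:j]) / sum(freqs)

# Hypothetical histogram: 5 class intervals, 100 cases in total
boundaries = [0, 10, 20, 30, 40, 50]
freqs = [5, 20, 40, 25, 10]
print(proportion_below(boundaries, freqs, 30))   # (5 + 20 + 40) / 100
```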

The Characteristics of a Histogram
– Central Location (average)
– Spread (Variability, Dispersion)
– Shape

Central Location

Spread, Dispersion, Variability

Shape – Bell Shaped (Normal)

Shape – Positively skewed

Shape – Negatively skewed

Shape – Platykurtic

Shape – Leptokurtic

Shape – Bimodal

The Stem-Leaf Plot An alternative to the histogram

Each number in a data set can be broken into two parts
– a stem
– a leaf

Example: Verbal IQ = 84
– Stem = tens digit = 8
– Leaf = units digit = 4

Example: Verbal IQ = 104
– Stem = tens digits = 10
– Leaf = units digit = 4

To Construct a Stem-Leaf diagram
– Make a vertical list of “all” stems
– Then behind each stem make a horizontal list of each leaf
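The construction can be sketched in Python as follows. This is a minimal sketch assuming non-negative integer data, with the units digit as the leaf and the remaining digits as the stem; the scores are hypothetical and the function name is illustrative.

```python
from collections import defaultdict

def stem_leaf(data):
    """Return the lines of a stem-leaf diagram with the leaves arranged in order."""
    leaves = defaultdict(list)
    for x in sorted(data):
        stem, leaf = divmod(x, 10)     # e.g. 104 -> stem 10, leaf 4
        leaves[stem].append(leaf)
    lines = []
    # list "all" stems between the smallest and largest, even those with no leaves
    for stem in range(min(leaves), max(leaves) + 1):
        leaf_text = " ".join(str(leaf) for leaf in leaves.get(stem, []))
        lines.append(f"{stem:>2} | {leaf_text}".rstrip())
    return lines

# Hypothetical scores, for illustration only
scores = [84, 86, 90, 95, 98, 104, 104, 105, 118]
for line in stem_leaf(scores):
    print(line)
```

Because each leaf occupies the same width, the length of a row is proportional to its frequency, which is why the diagram reads like a histogram turned on its side.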

Example: The data on N = 23 students
Variables
– Verbal IQ
– Math IQ
– Initial Reading Achievement Score
– Final Reading Achievement Score

We now construct: a stem-Leaf diagram of Verbal IQ

A vertical list of the stems. We now list the leaves behind each stem.

The leaves may be arranged in order

The stem-leaf diagram is equivalent to a histogram

Rotating the stem-leaf diagram we have

The two-part stem-leaf diagram: Sometimes you want to break the stems into two parts
– a first part for leaves 0, 1, 2, 3, 4
– a second part, marked *, for leaves 5, 6, 7, 8, 9

Stem-leaf diagram for Initial Reading Achievement: This diagram as it stands does not give an accurate picture of the distribution

We try breaking the stems into two parts [Figure: two-part stem-leaf diagram]

The five-part stem-leaf diagram: If the two-part stem-leaf diagram is not adequate you can break the stems into five parts
– a first part for leaves 0, 1
– t for leaves 2, 3
– f for leaves 4, 5
– s for leaves 6, 7
– * for leaves 8, 9
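Splitting the stems can be sketched by grouping the leaves within each stem. This is an illustration under stated assumptions: the data are hypothetical, the function name is illustrative, and each row is labelled with its leaf range (for example `[0-4]`) rather than with the marker symbols used on the slides.

```python
def split_stem_leaf(data, parts):
    """Stem-leaf lines with every stem broken into `parts` rows (1, 2 or 5)."""
    if parts not in (1, 2, 5):
        raise ValueError("parts must be 1, 2 or 5")
    size = 10 // parts                             # number of leaf values per row
    rows = {}
    for x in sorted(data):
        stem, leaf = divmod(x, 10)
        rows.setdefault((stem, leaf // size), []).append(leaf)
    first_row, last_row = min(rows), max(rows)
    lines = []
    for stem in range(first_row[0], last_row[0] + 1):
        for part in range(parts):
            if not first_row <= (stem, part) <= last_row:
                continue                           # skip rows outside the data range
            low = part * size
            label = f"{stem:>2} [{low}-{low + size - 1}]"   # hypothetical row label
            leaves = " ".join(str(leaf) for leaf in rows.get((stem, part), []))
            lines.append(f"{label} | {leaves}".rstrip())
    return lines

# Hypothetical scores packed into few stems
scores = [10, 11, 12, 13, 15, 17, 18, 20]
for line in split_stem_leaf(scores, 2):
    print(line)
```

Calling `split_stem_leaf(scores, 5)` gives the five-part version of the same diagram.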

We try breaking the stems into five parts [Figure: five-part stem-leaf diagram]

Stem leaf Diagrams Verbal IQ, Math IQ, Initial RA, Final RA

Some Conclusions
– Math IQ and Verbal IQ seem to have approximately the same distribution: “bell shaped”, centered about 100
– Final RA seems to be larger than initial RA and more spread out
– Improvement in RA
– Amount of improvement quite variable

Numerical Measures
– Measures of Central Tendency (Location)
– Measures of Non-Central Location
– Measures of Variability (Dispersion, Spread)
– Measures of Shape

Measures of Central Tendency (Location)
– Mean
– Median
– Mode

Measures of Non-Central Location
– Quartiles, Mid-Hinges
– Percentiles

Measures of Variability (Dispersion, Spread)
– Variance, standard deviation
– Range
– Inter-Quartile Range

Measures of Shape
– Skewness
– Kurtosis

Measures of Central Location (Mean)
Summation Notation: Let $x_1, x_2, x_3, \dots, x_n$ denote a set of n numbers. Then the symbol $\sum_{i=1}^{n} x_i$ denotes the sum of these n numbers: $x_1 + x_2 + x_3 + \dots + x_n$.

Example: Let $x_1, x_2, x_3, x_4, x_5$ denote the set of 5 numbers in the following table, which lists $x_i$ for $i = 1, 2, 3, 4, 5$.

Then the symbol $\sum_{i=1}^{5} x_i$ denotes the sum of these 5 numbers: $x_1 + x_2 + x_3 + x_4 + x_5 = 66$.

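Meaning of the parts of summation notation, for the symbol $\sum_{i=1}^{n} x_i$
– $i$: the quantity changing in each term of the sum
– $i = 1$ (below the $\sum$): the starting value for i
– $n$ (above the $\sum$): the final value for i
– $x_i$: each term of the sum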
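Each part of the notation maps onto a short Python expression. In this sketch the values in `x` are hypothetical (they are not the table from the slides), and the helper `summation` with its `term` argument is an illustrative name, not standard notation.

```python
def summation(x, start, stop, term=lambda v: v):
    """Sum term(x_i) for i = start, ..., stop, using the 1-based index of the notation."""
    return sum(term(x[i - 1]) for i in range(start, stop + 1))

# Hypothetical values x_1, ..., x_5
x = [2, 4, 6, 8, 10]

total = summation(x, 1, 5)                         # x_1 + x_2 + x_3 + x_4 + x_5
cubes = summation(x, 2, 4, term=lambda v: v ** 3)  # x_2^3 + x_3^3 + x_4^3
print(total, cubes)
```

Changing the starting value, the final value, or the term changes which numbers are added and how each one is transformed first.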

Example: Again let $x_1, x_2, x_3, x_4, x_5$ denote the set of 5 numbers in the following table, which lists $x_i$ for $i = 1, 2, 3, 4, 5$.

Then the symbol denotes the sum of these 3 numbers = = = 12979

Mean: Let $x_1, x_2, x_3, \dots, x_n$ denote a set of n numbers. Then the mean of the n numbers is defined as: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{x_1 + x_2 + x_3 + \dots + x_n}{n}$

Example: Again let $x_1, x_2, x_3, x_4, x_5$ denote the set of 5 numbers in the following table, which lists $x_i$ for $i = 1, 2, 3, 4, 5$.

Then the mean of the 5 numbers is: $\bar{x} = \frac{1}{5}\sum_{i=1}^{5} x_i = \frac{66}{5} = 13.2$

Interpretation of the Mean: Let $x_1, x_2, x_3, \dots, x_n$ denote a set of n numbers. Then the mean, $\bar{x}$, is the centre of gravity of those n numbers. That is, if we drew a horizontal line and placed a weight of one at each value of $x_i$, then the balancing point of that system of mass is at the point $\bar{x}$.

[Figure: unit weights placed at $x_1, x_2, x_3, x_4, \dots, x_n$ on a horizontal line, balancing at $\bar{x}$]

In the Example

The mean, $\bar{x}$, is also approximately the center of gravity of a histogram
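The balancing-point interpretation can be verified numerically: the deviations from the mean cancel, so the weights to the left and right of the mean balance. The values below are hypothetical and the function name is illustrative.

```python
def mean(x):
    """Arithmetic mean: the sum of the numbers divided by how many there are."""
    return sum(x) / len(x)

# Hypothetical values, each carrying a weight of one
x = [2, 4, 6, 8, 10]
xbar = mean(x)

# Net moment about the mean: signed distances from the balancing point
net_moment = sum(v - xbar for v in x)
print(xbar, net_moment)
```

About any point other than the mean the net moment is non-zero, so the system would tip to one side.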

The Median: Let $x_1, x_2, x_3, \dots, x_n$ denote a set of n numbers. Then the median of the n numbers is defined as the number that splits the numbers into two equal parts. To evaluate the median we arrange the numbers in increasing order.

If the number of observations is odd there will be one observation in the middle. This number is the median. If the number of observations is even there will be two middle observations. The median is the average of these two observations
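The rule for odd and even numbers of observations can be sketched as a short function. The inputs in the usage lines are hypothetical and the function name is illustrative.

```python
def median(x):
    """Middle observation of the ordered data, or the average of the two middle ones."""
    s = sorted(x)                      # arrange the numbers in increasing order
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]                  # odd n: one observation in the middle
    return (s[mid - 1] + s[mid]) / 2   # even n: average the two middle observations

print(median([5, 1, 9]))        # odd number of observations
print(median([4, 1, 3, 2]))     # even number of observations
```

With 23 ordered observations the function returns the 12th one, since 11 observations lie on either side of it.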

Example: Again let $x_1, x_2, x_3, x_4, x_5$ denote the set of 5 numbers in the following table, which lists $x_i$ for $i = 1, 2, 3, 4, 5$.

The numbers arranged in order are: Unique “Middle” observation – the median

Example 2: Let $x_1, x_2, x_3, x_4, x_5, x_6$ denote the 6 numbers: Arranged in increasing order these observations would be: Two “Middle” observations

Median = average of two “middle” observations =

Example: The data on N = 23 students
Variables
– Verbal IQ
– Math IQ
– Initial Reading Achievement Score
– Final Reading Achievement Score

Computing the Median from the stem-leaf diagrams: With N = 23 observations, the median is the middle observation, which is the 12th observation.

Summary

Some Comments: The mean is the centre of gravity of a set of observations: the balancing point. The median splits the observations into two equal parts of approximately 50% each.

The median splits the area under a histogram into two parts of 50% each. The mean is the balancing point of a histogram.

For symmetric distributions the mean and the median will be approximately the same value.

For positively skewed distributions the mean exceeds the median. For negatively skewed distributions the median exceeds the mean.

An outlier is a “wild” observation in the data. Outliers occur because of
– errors (typographical and computational)
– extreme cases in the population

The mean is altered to a significant degree by the presence of outliers, whereas outliers have little effect on the value of the median. This is a reason for using the median in place of the mean as a measure of central location. On the other hand, the mean is the best measure of central location when the data are Normally distributed (bell-shaped).
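The different sensitivity of the two measures is easy to see with a single mistyped value. The scores below are hypothetical; the example uses `mean` and `median` from Python's standard `statistics` module.

```python
from statistics import mean, median

# Hypothetical scores, then the same scores with 104 mistyped as 1040
clean = [98, 100, 101, 102, 104]
with_outlier = [98, 100, 101, 102, 1040]

print(mean(clean), median(clean))
print(mean(with_outlier), median(with_outlier))
```

The single typographical error moves the mean far from the bulk of the data, while the median stays at the same middle observation.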