Descriptive Statistics

Slides:



Advertisements
Similar presentations
The mean for quantitative data is obtained by dividing the sum of all values by the number of values in the data set.
Advertisements

Descriptive Statistics
Chapter 3 Describing Data Using Numerical Measures
Business Statistics: A Decision-Making Approach, 7e © 2008 Prentice-Hall, Inc. Chap 3-1 Business Statistics: A Decision-Making Approach 7 th Edition Chapter.
Business Statistics: A Decision-Making Approach, 7e © 2008 Prentice-Hall, Inc. Chap 3-1 Business Statistics: A Decision-Making Approach 7 th Edition Chapter.
Calculating & Reporting Healthcare Statistics
Business Statistics: A Decision-Making Approach, 7e © 2008 Prentice-Hall, Inc. Chap 3-1 Business Statistics: A Decision-Making Approach 7 th Edition Chapter.
SOC 3155 SPSS CODING/GRAPHS & CHARTS CENTRAL TENDENCY & DISPERSION.
Introduction to Educational Statistics
Data observation and Descriptive Statistics
Describing Data: Numerical
Study of Measures of Dispersion and Position. DATA QUALITATIVEQUANTITATIVE DISCRETE CONTINUOUS.
Describing distributions with numbers
BIOSTAT - 2 The final averages for the last 200 students who took this course are Are you worried?
Measures of Variability OBJECTIVES To understand the different measures of variability To determine the range, variance, quartile deviation, mean deviation.
Measures of Central Tendency and Dispersion Preferred measures of central location & dispersion DispersionCentral locationType of Distribution SDMeanNormal.
Describing Behavior Chapter 4. Data Analysis Two basic types  Descriptive Summarizes and describes the nature and properties of the data  Inferential.
1 PUAF 610 TA Session 2. 2 Today Class Review- summary statistics STATA Introduction Reminder: HW this week.
Descriptive Statistics
By: Amani Albraikan 1. 2  Synonym for variability  Often called “spread” or “scatter”  Indicator of consistency among a data set  Indicates how close.
Describing distributions with numbers
Lecture 3 Describing Data Using Numerical Measures.
Skewness & Kurtosis: Reference
An Introduction to Statistics. Two Branches of Statistical Methods Descriptive statistics Techniques for describing data in abbreviated, symbolic fashion.
Lecture 5 Dustin Lueker. 2 Mode - Most frequent value. Notation: Subscripted variables n = # of units in the sample N = # of units in the population x.
INVESTIGATION 1.
June 21, Objectives  Enable the Data Analysis Add-In  Quickly calculate descriptive statistics using the Data Analysis Add-In  Create a histogram.
Numerical Measures of Variability
Central Tendency & Dispersion
Summary Statistics: Measures of Location and Dispersion.
© 2008 McGraw-Hill Higher Education The Statistical Imagination Chapter 5. Measuring Dispersion or Spread in a Distribution of Scores.
1 STAT 500 – Statistics for Managers STAT 500 Statistics for Managers.
Descriptive Statistics(Summary and Variability measures)
Chapter 6: Descriptive Statistics. Learning Objectives Describe statistical measures used in descriptive statistics Compute measures of central tendency.
Lecture 8 Data Analysis: Univariate Analysis and Data Description Research Methods and Statistics 1.
A QUANTITATIVE RESEARCH PROJECT -
Descriptive Statistics ( )
SPSS CODING/GRAPHS & CHARTS CENTRAL TENDENCY & DISPERSION
Analysis and Empirical Results
Chapter 3 Describing Data Using Numerical Measures
Descriptive measures Capture the main 4 basic Ch.Ch. of the sample distribution: Central tendency Variability (variance) Skewness kurtosis.
Research Methods in Psychology PSY 311
Descriptive Statistics (Part 2)
Teaching Statistics in Psychology
26134 Business Statistics Week 3 Tutorial
Central Tendency and Variability
CHAPTER 3 Data Description 9/17/2018 Kasturiarachi.
Descriptive Statistics
Description of Data (Summary and Variability measures)
Summary descriptive statistics: means and standard deviations:
Chapter 3 Describing Data Using Numerical Measures
Numerical Descriptive Measures
Chapter 3.
STA 291 Spring 2008 Lecture 5 Dustin Lueker.
STA 291 Spring 2008 Lecture 5 Dustin Lueker.
Numerical Descriptive Measures
Summary descriptive statistics: means and standard deviations:
Basic Practice of Statistics - 3rd Edition
Numerical Descriptive Measures
Essential Statistics Describing Distributions with Numbers
Basic Practice of Statistics - 3rd Edition
MBA 510 Lecture 2 Spring 2013 Dr. Tonya Balan 4/20/2019.
CHAPTER 2: Basic Summary Statistics
Basic Practice of Statistics - 3rd Edition
Advanced Algebra Unit 1 Vocabulary
Measures of Dispersion
Basic Biostatistics Measures of central tendency and dispersion
Numerical Descriptive Measures
Presentation transcript:

Descriptive Statistics ••••••••••••••••••••••••••••••••••

Describing your Data Generally one of the first things to do with new data is to get to know it by asking some general questions like but not limited to the following: What variables are included? What information are we getting? What is the format of the variables: string, numeric, etc.? What type of variables: categorical, continuous, and discrete? Is this sample or population data?

After looking at the data you may want to know: How many males/females? What is the average age? How many undergraduate/graduates students? What is the average SAT score? It is the same for graduates and undergraduates? Who reads the newspaper more frequently: men or women?

Statistics is basically the study of what causes such variability. You can start answering some of these questions by looking directly at the table. For some other questions, you may have to do some calculations by obtaining a set of descriptive statistics. These statistics are a collection of measurements of two things: location and variability. Location tells you the central value (the mean is the most common measure of this) of your variables. Variability refers to the spread of the data from the center value (i.e. variance, standard deviation). Statistics is basically the study of what causes such variability. Location Variability Mean Variance Mode Standard deviation Median Range

Use the Descriptive Statistics data file. For input range, select ALL data. Descriptive Statistics Data Analysis Click on DATA Use the Descriptive Statistics data file. You should see the screen below.

Check “Summary statistics” Press OK Check “Summary statistics” For the output option which is the place where excel will enter the results, select O1 or you can select a new worksheet or even new workbook. Since we include the labels in first row make sure to check that option.

You will get the following:

While the whole descriptive statistics cells are selected go to Format Cells to change all numbers to have one decimal point. When you get the ‘format cells’ window, select the following:

Click OK. All numbers should now have one decimal as follows:

Now we know something about our data. the average student in this sample is 25.2 years has a SAT score of 1848.9 got a grade of 80.4 is 66.4 inches tall reads the newspaper 4.9 times a week We know this by looking at the ‘mean’ value on each variable.

Mean =AVERAGE(range of cells with the values of interest) The sum of the observations divided by the total number of observations. The most common indicator of central tendency of a variable. Mean If you look at the last two rows: “Sum” and “Count” you can estimate the mean dividing “Sum” by ”Count” (sum/count). You can also calculate the mean using the function below (IMPORTANT: All functions start with the equal “=” sign): =AVERAGE(range of cells with the values of interest)

Sum refers to the sum of all the values in a range of values The excel function for sum is: =SUM(range of cells with the values of interest)

means the sum of the ages of all students For “age” =AVERAGE(J2:J31)

Count refers to the count of cell that contain values (numbers) The function is: =COUNT(range of cells with the values of interest)

“Min” the lowest value in an array of values The function is: =MIN(range of cells with the values of interest)

“Max” the largest value in an array of values The function is: =MAX(range of cells with the values of interest)

indicates how close the sample mean is from the ‘true’ population mean The average age of 25.2 years is just an estimate of this sample of students but it can vary had you used a different set of students. The standard error is calculated by dividing the standard deviation of the population (or the sample) by the square root of the total number of observations. Standard Error (SE)

Mean – (SE*Z) for example 25.2 – (1.3 * 2) = 22.7 The Standard Error (SE) can be used to roughly define a range of certainty for the mean. Using “age”: Z % Certainty Lower bound Upper bound 1 (0.99) 68% 23.9 26.5 2 (1.96) 95% 22.7 27.7 3 (2.58) 99% 21.4 29.0 Lower:  Mean – (SE*Z) for example 25.2 – (1.3 * 2) = 22.7 Upper:   Mean + (SE*Z) for example 25.2 + (1.3 * 2) = 27.7

You are 68% certain that the average age is between 23. 9 and 26 You are 68% certain that the average age is between 23.9 and 26.5 years old You are 95% certain that the average age is between 22.7 and 27.7 years old You are 99% certain that the average age is between 21.4 and 29.0 years old Note that the more certainty the wider the gap.

Median =MEDIAN(range of cells with the values of interest) another measure of central tendency which is the number in the middle To get the median you have to order the data from lowest to highest. If the number of cases is odd the median is the single value, for an even number of cases the median is the average of the two numbers in the middle Median The excel function is: =MEDIAN(range of cells with the values of interest)

refers to the most frequent, repeated or common number in the data By age there are more students 19 years old in the sample than any other group. In the SAT scores the mode is “#N/A” which means that all values are unique. Mode The function is: =MODE(range of cells with the values of interest)

Range measure of dispersion It is simply the difference between the largest and smallest value, “max” – “min”. Range

Sample Variance measures the dispersion of the data from the mean It is the simple mean of the squared distance from the mean. It is calculated by: SV = sum of (X-mean of X)2 / number of observation minus 1 Higher variance means more dispersion from the mean.   Sample Variance The excel function is: =VAR(range of cells with the values of interest)

Standard Deviation the squared root of the variance indicates how close the data is to the mean Assuming a normal distribution, 68% of the values are within 1 sd from the mean, 95% within 2 sd and 99% within 3 sd. Standard Deviation  The excel function is: =STDEV(range of cells with the values of interest)

measures the asymmetry of the data, when in an otherwise normal curve one of the tails is longer than the other It is a roughly test for normality in the data (by dividing it by the SE). Skewness

If it is positive there is more data on the left side of the curve (right skewed, the median and the mode are lower than the mean). A negative value indicates that the mass of the data is concentrated on the right of the curve (left tail is longer, left skewed, the median and the mode are higher than the mean). A normal distribution has a skew of 0.  Skewness Can also be estimated using the function: =SKEW(range of cells with the values of interest)

Kurtosis measures the peak of the distribution also an indicator of normality Positive kurtosis indicates too few cases in the tails or a tall distribution (leptokurtic). Negative kurtosis indicates too many cases in the tails or a flat distribution (platykurtic). A normal distribution has a kurtosis of 0 (given a correction of –3, otherwise it will have a kurtosis of 3). Kurtosis The excel function is: =KURT(range of cells with the values of interest)

Quartiles During a period of one month, a random sample of approved insurance policies were selected and listed below: Computer the mean, median, first, and third quartile Compute the range, interquartile range, standard deviation. What would you tell a customer who enters the bank to purchase this type of policy and asks how long the approval process is?