MATH-138 Elementary Statistics


MATH-138 Elementary Statistics, Unit 1 Lecture Slides
Emily C. Francis, Howard Community College

What is Statistics?
Statistics is the science of collecting data, analyzing data, and drawing conclusions and making decisions as a result of the data analysis. The last of these is referred to as “statistical inference”.

What is a “Statistic”?
A statistic is a function of the data: data -> [function] -> statistic.
For example, suppose we have a data set of students' heights. Taking the average, or mean, of the heights is a function of the data, so the mean height is a statistic.

Phases in Statistical Analysis
Data collection: the process of collecting data (samples) via surveys, observational studies, and/or designed experiments.
Data analysis: graphing and summarizing key features of the data to discover its major patterns.
Statistical inference: drawing inferences (conclusions) and making decisions based on the data.

Population vs. Sample
For a given statistical inquiry:
The population consists of all items of interest (people, places, companies, etc.).
A sample is a (hopefully representative and random) subset of the population.
A numerical value/characteristic of a population is called a parameter. Parameters are usually unknown.
A numerical value/characteristic of a sample is called a statistic.

Components of a Data Set
Cases: people, places, companies, colleges, etc.
Variables: characteristics/measurements of each individual case.

Variable Types
Categorical variables: have values that are described by words and represent categories. They can be represented with numbers, but the actual numbers assigned are irrelevant: they have no units, and no mathematical operations can be performed on them.
Quantitative variables: have numerical values and units.

Displaying & Describing Categorical Data
No mathematical operations can be performed on categorical data.
Categorical data can only be counted and then described/displayed using frequency (and relative frequency) tables, bar charts, and pie charts.
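As a minimal sketch of the counting step, the following Python builds a frequency table and a relative frequency table. The `majors` list is made-up illustration data, not from the course.

```python
from collections import Counter

# Hypothetical categorical data: one declared major per student.
majors = ["Business", "Nursing", "Business", "Biology", "Nursing", "Business"]

counts = Counter(majors)  # frequency table: category -> count
n = len(majors)
rel_freq = {cat: c / n for cat, c in counts.items()}  # relative frequencies

# Print one row per category: name, frequency, relative frequency.
for cat, c in counts.most_common():
    print(f"{cat:<10}{c:>3}{rel_freq[cat]:>8.3f}")
```

The relative frequencies always sum to 1, which is a quick check on the table.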

Contingency Tables
A contingency table shows how cases are distributed along each variable, contingent on the value of another variable.
Marginal and conditional distributions can be read from the table.
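The difference between a marginal and a conditional distribution can be sketched in a few lines of Python. The cases below (class year and commute mode) are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
from collections import Counter

# Hypothetical cases: (class_year, commute_mode) for eight students.
cases = [
    ("first-year", "car"), ("first-year", "bus"), ("first-year", "car"),
    ("second-year", "car"), ("second-year", "bus"), ("second-year", "bus"),
    ("first-year", "car"), ("second-year", "bus"),
]

cell_counts = Counter(cases)                       # the cells of the table
row_totals = Counter(year for year, _ in cases)    # margin for class year
col_totals = Counter(mode for _, mode in cases)    # margin for commute mode
n = len(cases)

# Marginal distribution of commute mode: column totals over the grand total.
marginal_commute = {mode: c / n for mode, c in col_totals.items()}

def conditional_commute(year):
    """Distribution of commute mode among cases with the given class year."""
    return {mode: cell_counts[(year, mode)] / row_totals[year] for mode in col_totals}

print(marginal_commute)
print(conditional_commute("first-year"))
```

Here the marginal distribution is split evenly between car and bus, while the conditional distribution for first-year students is not, which is the kind of contrast a contingency table is meant to expose.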

Displaying & Summarizing Quantitative Data
A frequency (and relative frequency) distribution is an excellent initial data analysis tool.
A histogram is a visual representation of a frequency distribution. A relative frequency histogram is a visual representation of a relative frequency distribution.
Another option is a dotplot.
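A rough sketch of how a frequency distribution turns into a histogram: group the values into equal-width bins, count each bin, and draw one bar per bin. The scores and the bin width of 10 are assumptions made for the example.

```python
from collections import Counter

# Hypothetical quantitative data: ten exam scores.
values = [52, 67, 71, 74, 78, 81, 83, 85, 88, 93]
bin_width = 10  # assumed bin width; other widths give other histograms

def bin_start(x):
    """Left edge of the bin [start, start + bin_width) that contains x."""
    return (x // bin_width) * bin_width

freq = Counter(bin_start(x) for x in values)

# One row per bin: interval, frequency, relative frequency, text "bar".
for start in sorted(freq):
    count = freq[start]
    print(f"[{start}, {start + bin_width})  {count}  {count / len(values):.2f}  {'*' * count}")
```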

Describing a Distribution
Shape, center, spread.

Distribution Shape
“Modality”, symmetry, outliers.

Measures of Center
Median, mean.

Median
The median of a variable is the midpoint of the sorted data values.
For odd n, the median equals the middle data value.
For even n, the median equals the average of the middle two values.
The median is useful when the variable of interest has a skewed distribution and/or has outliers (it is not sensitive to these outliers).
The median does not have to be a data value.
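The odd/even rule above can be written out directly. This is a small sketch; `median_of` is a name chosen for the example, and Python's built-in `statistics.median` follows the same rule.

```python
def median_of(data):
    """Median by the odd/even rule: sort, then take the middle value(s)."""
    ordered = sorted(data)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # odd n: the middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even n: average of middle two

print(median_of([3, 1, 7]))        # odd n
print(median_of([1, 3, 7, 100]))   # even n: 5.0 is not itself a data value
```

In the second call the large value 100 has no pull on the result beyond its position in the sorted list, which is why the median is described as insensitive to outliers.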

Mean
The mean is the sum of all the data values divided by the number of data values: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
The mean treats all values equally and can therefore be influenced by outliers.
The mean does not have to be a data value.
The deviations of the data points from the mean always sum to zero.
The mean is useful when the distribution of the variable of interest is symmetric with no outliers.
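Two of these properties are easy to check numerically: the deviations from the mean sum to zero, and a single outlier moves the mean far more than the median. The salary figures are hypothetical.

```python
from statistics import mean, median

# Hypothetical salaries, in thousands of dollars.
salaries = [40, 45, 50, 55, 60]

mean_salary = mean(salaries)
deviations = [x - mean_salary for x in salaries]
print(mean_salary, sum(deviations))  # the deviations cancel out

# Add one very large value and compare the two measures of center.
with_outlier = salaries + [500]
mean_with_outlier = mean(with_outlier)
median_with_outlier = median(with_outlier)
print(mean_with_outlier, median_with_outlier)
```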

Distribution Shape (Contd.)
Symmetric data: the mean is approximately equal to the median, and the tails of the distribution are balanced.
Skewed left data: mean < median. The long tail of the distribution “points” left: there are a few low values, but most of the data are on the right.
Skewed right data: median < mean. The long tail of the distribution “points” right: there are a few high values, but most of the data are on the left.

Five-Number Summary
Max, Q3, Median, Q1, Min.

Measures of Spread
Range, interquartile range (IQR), variance, standard deviation.

Range & IQR
The range is the difference between the maximum and minimum data values.
IQR = Q3 – Q1
The IQR is useful when the variable of interest has a skewed distribution and/or has outliers (it is not sensitive to these outliers).
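The five-number summary, range, and IQR can be sketched together. Quartile conventions differ between textbooks and software; this example assumes the "median of each half" rule, leaving the overall median out of both halves when n is odd, which may not match the rule in the course text. The data are made up.

```python
from statistics import median

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule.

    Assumption: for odd n the overall median is excluded from both halves.
    """
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    upper = ordered[half + 1:] if n % 2 == 1 else ordered[half:]
    return ordered[0], median(lower), median(ordered), median(upper), ordered[-1]

data = [2, 4, 4, 5, 7, 8, 9, 11, 30]  # hypothetical values with one high outlier
minimum, q1, med, q3, maximum = five_number_summary(data)

data_range = maximum - minimum  # driven entirely by the two extreme values
iqr = q3 - q1                   # spread of the middle half of the data
print(minimum, q1, med, q3, maximum, data_range, iqr)
```

With these values the outlier 30 inflates the range, while the IQR depends only on the middle of the data.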

Variance
The variance is basically the average of the squared deviations from the mean: $s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$
The units of this statistic are the squared units of the original data values.

Standard Deviation
The SD is the square root of the variance: $s = \sqrt{s^2}$
It is a single number that helps us understand how spread out the data are.
Its units of measurement are the same as those of the original data.

Standard Deviation (Contd.)
The standard deviation (and variance) statistics are never negative.
If every data value is equal, then there is no variation, and hence SD = Var = 0.
The standard deviation is useful when the distribution of the variable of interest is symmetric with no outliers.
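A short sketch of both statistics, assuming the sample versions with an n − 1 divisor (the convention used by Python's `statistics.variance` and `statistics.stdev`). The data values are hypothetical.

```python
import math
import statistics

def sample_variance(data):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / (n - 1)

def sample_sd(data):
    """Square root of the sample variance; same units as the data."""
    return math.sqrt(sample_variance(data))

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical measurements
constant = [6, 6, 6, 6]          # every value equal: no variation

print(sample_variance(data), sample_sd(data))
print(statistics.variance(data), statistics.stdev(data))  # library cross-check
print(sample_variance(constant), sample_sd(constant))
```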

Boxplots
A boxplot is a graphical display of the five-number summary.
The procedure to construct a boxplot can be found on pp. 90-91 of the text.

Standardized Variables: Z Scores
To “standardize” a variable, calculate each observation’s distance from the mean in units of the standard deviation. That is, define variable Z as: $z = \frac{x - \bar{x}}{s}$
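As an illustration, the following standardizes a small made-up data set using the sample mean and the sample standard deviation (n − 1 divisor, an assumption). After standardizing, the z-scores have mean 0 and standard deviation 1.

```python
from statistics import mean, stdev

data = [10, 12, 14, 16, 18]  # hypothetical observations

center = mean(data)
spread = stdev(data)  # sample standard deviation (n - 1 divisor)

# Each z-score is the observation's distance from the mean in SD units.
z_scores = [(x - center) / spread for x in data]

print([round(z, 3) for z in z_scores])
print(mean(z_scores), stdev(z_scores))
```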

Normal Models
A normal model:
Is symmetric and “bell” shaped.
Is commonly used to model many things in the business and physical worlds.
Is defined by 2 parameters, μ (the mean) and σ (the standard deviation).
Has a distribution that peaks at μ.
A normal distribution with mean = 0 and std. dev. = 1 is called “standard”.

The 68-95-99.7 Rule
For data from a NORMAL model:
~68% will lie within 1 std. dev. of the mean.
~95% will lie within 2 std. devs. of the mean.
~99.7% (virtually all the data) will lie within 3 std. devs. of the mean.
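The three percentages can be checked against the standard normal model with Python's `statistics.NormalDist`. The helper name `share_within` is made up for this sketch.

```python
from statistics import NormalDist

standard = NormalDist(mu=0, sigma=1)  # the standard normal model

def share_within(k):
    """Proportion of a normal model lying within k std. devs. of the mean."""
    return standard.cdf(k) - standard.cdf(-k)

for k in (1, 2, 3):
    print(k, round(share_within(k), 4))
```

The rule's figures are rounded versions of these proportions, which is why the slide marks them with "~".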

Normalcdf & Invnorm
If you are given one or more values and you want a percentage under the normal model, use “normalcdf” on your calculator: normalcdf(left value, right value, mean, std. dev.)
If you are given a percentage under the normal model and you want a value, use “invnorm” on your calculator: invnorm(percentage, mean, std. dev.)
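For readers without the calculator at hand, the two commands can be imitated in Python. The functions below are illustrative stand-ins with the same argument order as the calculator commands, built on `statistics.NormalDist`; they are not part of any calculator or library API. As on the calculator, the percentage passed to `invnorm` is taken as the area to the left of the value.

```python
from statistics import NormalDist

def normalcdf(left, right, mean=0.0, sd=1.0):
    """Area under the normal model between left and right."""
    model = NormalDist(mu=mean, sigma=sd)
    return model.cdf(right) - model.cdf(left)

def invnorm(area_to_left, mean=0.0, sd=1.0):
    """Value with the given area (0 < area < 1) to its left."""
    return NormalDist(mu=mean, sigma=sd).inv_cdf(area_to_left)

# Hypothetical model: scores with mean 100 and std. dev. 15.
print(normalcdf(85, 115, 100, 15))            # share within 1 SD of the mean
print(normalcdf(float("-inf"), 85, 100, 15))  # left tail: no lower bound
print(invnorm(0.975, 0, 1))                   # cutoff with 97.5% below it
```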

Scatter Plots
A scatter plot shows n pairs of bivariate data observations on an X-Y graph.
A scatter plot is usually the starting point for bivariate data analysis.
We create scatter plots to investigate the relationship between two variables: its direction, form, and strength.

Correlation
In our discussion of correlation (and regression), we will be talking about paired sample data.
A correlation exists between 2 variables when one of them is related to the other in some way.
The linear correlation coefficient, r, measures the strength of the LINEAR relationship between two variables.
Before you calculate r, the following should hold: the quantitative variables condition, the “Straight Enough” condition, and the outlier condition.

Correlation Properties
The value of r is always between -1 and 1, inclusive. That is, -1 <= r <= 1.
The value of r is not affected by which variable is called x and which is called y.
r measures the strength of a linear relationship. It is not designed to measure the strength of a relationship that is not linear.
Correlation is sensitive to outliers.
Correlation does not imply causality!
Correlation does not measure slope.
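Several of these properties can be seen numerically. The sketch below computes r from sums of deviation products (one standard form of the formula) on made-up paired data, then swaps the variables and rescales x.

```python
import math

def correlation(x, y):
    """Linear correlation coefficient r for paired data."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical paired sample data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = correlation(x, y)
r_swapped = correlation(y, x)                      # roles of x and y exchanged
r_rescaled = correlation([10 * a for a in x], y)   # x measured in other units

print(r, r_swapped, r_rescaled)
```

Rescaling x by a factor of 10 would change the slope of any fitted line but leaves r unchanged, which illustrates that correlation does not measure slope.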

Regression
If 2 variables have a “significant” linear correlation, it is appropriate to estimate their exact linear relationship; regression does this.
A regression estimates a and b so that the linear relationship between x and y can be expressed as an equation giving $\hat{y}$ as a linear function of x with coefficients a and b.
Note that $\hat{y}$ is the PREDICTED value of y. Thus, you can use this equation to predict values of y for given values of x (though not for all values of x).
The residual for any data point is the observed value minus the predicted value: $e = y - \hat{y}$

Regression (Cont.)
When predicting a value of y based on some given value of x, do the following:
If there is NOT a linear correlation, the best predicted y-value is the sample average of y.
If there IS a linear correlation, the best predicted y-value is found using the regression equation.
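The fitting and prediction steps can be sketched as follows, assuming ordinary least squares (the usual method in an introductory course, though the slides do not name it). The names `slope` and `intercept` are used instead of a and b because the slides do not say which letter plays which role; the data and the `has_linear_correlation` flag are hypothetical.

```python
def fit_line(x, y):
    """Least-squares line: returns (intercept, slope)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical paired sample data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
intercept, slope = fit_line(x, y)

def best_prediction(x_new, has_linear_correlation):
    """Use the regression equation if the correlation is linear, else the mean of y."""
    if has_linear_correlation:
        return intercept + slope * x_new
    return sum(y) / len(y)

# Residual = observed y minus predicted y, one per data point.
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

print(intercept, slope)
print(best_prediction(4, True), best_prediction(4, False))
print(residuals)
```

The prediction is only evaluated at x = 4, inside the range of the observed x values, in line with the caution that the equation should not be used for all values of x.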