Discrete Data Distributions and Summary Statistics
Terms: histogram, mode, mean, range, standard deviation, outlier

Discrete vs. Continuous Data
dis·crete (adj.)
1. Constituting a separate thing.
2. Consisting of unconnected distinct parts.
3. Mathematics: Defined for a finite or countable set of values; not continuous.
con·tin·u·ous (adj.)
1. Uninterrupted in time, sequence, substance, or extent.
2. Attached together in repeated units: a continuous form fed into a printer.
3. Mathematics: Of or relating to a line or curve that extends without a break or irregularity.

Discrete vs. Continuous Data
Discrete: Usually related to counts. Values for different units often tie. Averaging two values does not necessarily yield another possible value (the average of 2 and 3 children is 2.5 children).
Continuous: Any value in some interval. In theory a tie among different units is virtually impossible; in practice ties (due to rounding) are infrequent. The average of any two values is also a possible value.

Distribution
The distribution of a variable tells us what values it takes and how often it takes those values. MAKE A PICTURE! For discrete quantitative data, use a relative frequency chart / histogram* to display the distribution.
* Fundamentally these are the same thing.
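A relative frequency chart only needs each distinct value and the proportion of units taking it. The sketch below computes that table and draws a crude text histogram. The sibling counts are hypothetical data invented for illustration, and the function names are our own.

```python
from collections import Counter


def relative_frequencies(values):
    """Map each distinct value to the proportion of units taking it."""
    counts = Counter(values)
    n = len(values)
    return {v: counts[v] / n for v in sorted(counts)}


def text_histogram(values, width=40):
    """Print one bar per distinct value; bar length is proportional to relative frequency."""
    for v, p in relative_frequencies(values).items():
        print(f"{v:>3} | {'#' * round(p * width)} ({p:.2f})")


siblings = [0, 1, 1, 2, 2, 2, 3, 1, 0, 2]  # hypothetical: number of siblings per student
text_histogram(siblings)
```

Because the bars use proportions rather than raw counts, the chart and the histogram show the same shape, which is the sense in which they are "the same thing."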

Outlier
outlier (noun)
1. something that is situated away from or classed differently from a main or related body
2. a statistical observation that is markedly different in value from the others of the sample

Measures of Center
Median: Half the data are above the median and half below. Not well suited to highly discrete data; more about this later.
(Sample) Mean: Sum all the data values x, then divide by how many there are (n). Denoted x̄ ("x bar").
Both have the same measurement units as the data.
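Both measures of center can be computed in a few lines. This is a minimal sketch on the same kind of hypothetical count data; the standard library's `statistics.mean` and `statistics.median` should give the same answers.

```python
def sample_mean(values):
    """x-bar: sum all the data, then divide by how many values there are (n)."""
    return sum(values) / len(values)


def median(values):
    """Middle of the sorted data; average of the two middle values when n is even."""
    s = sorted(values)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2


siblings = [0, 1, 1, 2, 2, 2, 3, 1, 0, 2]  # hypothetical data
print(sample_mean(siblings), median(siblings))
```

Here the mean is 1.4 and the median is 1.5; neither is a possible number of siblings, which is typical for discrete data.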

Less Important Measures of Center
Midrange: Average the minimum and maximum. For highly skewed data, the midrange is often quite an atypical value.
Mode: The most common value (highest relative frequency). There can be 2 (or more) modes if relative frequencies tie. Generally found by graphical inspection. Sometimes not anywhere near any "center."
Both have the same measurement units as the data.
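The sketch below computes the midrange and returns every mode, so ties show up as more than one value. The data are hypothetical; Python 3.8+ also offers `statistics.multimode`, which does the same job as `modes` here.

```python
from collections import Counter


def midrange(values):
    """Average of the minimum and the maximum."""
    return (min(values) + max(values)) / 2


def modes(values):
    """All most-common values; more than one when relative frequencies tie."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)


skewed = [1, 1, 1, 2, 2, 3, 20]   # hypothetical, right-skewed
print(midrange(skewed))           # pulled far from the bulk of the data by the 20
print(modes(skewed))
print(modes([1, 1, 2, 3, 3]))     # a tie gives two modes
```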

Measures of Spread / Variation (the same thing)
Range = Max – Min. In statistics, the range is a single number.
Interquartile Range: Better suited to continuous data; more about this later.
Variance / Standard Deviation
All but the variance have the same measurement units as the data (the variance is in squared units).
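Since the range depends only on the two extreme values, a single outlier can change it a lot. A minimal sketch with hypothetical data; the function is named `data_range` to avoid shadowing Python's built-in `range`.

```python
def data_range(values):
    """Range = Max - Min: a single number, not an interval."""
    return max(values) - min(values)


scores = [3, 5, 5, 6, 8]            # hypothetical data
print(data_range(scores))           # uses only the two extreme values
print(data_range(scores + [30]))    # one outlier changes the range a lot
```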

Variance S²
Roughly the mean of the squared deviations from the mean; a measure of squared variation from the mean.
1. Obtain the mean.
2. Determine, for each value, its deviation from the mean.
3. Square each of these deviations.
4. Sum these squares.
5. Divide this sum by one fewer than the number of observations (n – 1) to get the variance.

Standard Deviation S
Square root of the variance. A measure of spread / variation (from the mean), in the same measurement units as the data.
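The five variance steps, followed by the square root, translate directly into code. This sketch mirrors the steps one line each; the three data values are hypothetical, chosen so the arithmetic is easy to check by hand.

```python
import math


def sample_variance(values):
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(values) / n                      # 1. obtain the mean
    deviations = [x - mean for x in values]     # 2. deviation of each value from the mean
    squares = [d * d for d in deviations]       # 3. square each deviation
    total = sum(squares)                        # 4. sum the squares
    return total / (n - 1)                      # 5. divide by one fewer than n


def sample_sd(values):
    """Standard deviation S: square root of the variance."""
    return math.sqrt(sample_variance(values))


data = [1, 4, 7]   # hypothetical: mean 4, deviations -3, 0, 3
print(sample_variance(data), sample_sd(data))
```

By hand: the squares are 9, 0 and 9, their sum is 18, and 18 / 2 = 9, so S = 3.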

Comparing Means & Standard Deviations
Small: Mean = 41.60, SD = 2.07
Large: Mean = 44.80, SD = 2.59

Comparing Means & Standard Deviations
Mean = 44.80, SD = 2.59. Add a 40 and a 50… Mean = 44.86, SD = 3.58

Comparing Means & Standard Deviations
Mean = 44.80, SD = 2.59. Add a 42 and a 48… Mean = 44.86, SD = 2.73

Comparing Means & Standard Deviations
Mean = 44.80, SD = 2.59. Add a 45 and a 45… Mean = 44.86, SD = 2.12
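The slides do not list the underlying data, but the new mean and SD can be recomputed from the summaries alone. This sketch assumes the starting data set has n = 5 values; that is an inference, not something the slides state (adding two values that sum to 90 moves the mean from 44.80 to 44.86 only when n = 5, since 314 / 7 ≈ 44.857). It uses the identity that the sum of squared deviations about a new center equals the sum about the old mean plus n times the squared shift.

```python
import math


def add_values(n, mean, sd, new_values):
    """Return (new_mean, new_sd) after appending new_values to a data set that is
    known only through its size n, sample mean and sample SD (n - 1 divisor)."""
    ss = (n - 1) * sd ** 2                       # sum of squared deviations about the old mean
    total_n = n + len(new_values)
    new_mean = (n * mean + sum(new_values)) / total_n
    ss += n * (mean - new_mean) ** 2             # re-center the old sum of squares on the new mean
    ss += sum((x - new_mean) ** 2 for x in new_values)
    return new_mean, math.sqrt(ss / (total_n - 1))


# Assumption: the data set has n = 5 values (inferred, not stated on the slides).
for pair in [(40, 50), (42, 48), (45, 45)]:
    m, s = add_values(5, 44.80, 2.59, pair)
    print(pair, f"mean = {m:.2f}, SD = {s:.2f}")
```

The mean barely moves in all three cases, while values far from the mean raise the SD and values at the mean lower it. Because the input SD 2.59 is itself rounded, the printed SDs match the slides only to within about 0.01.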

Comparing Means & Standard Deviations
Mean = 4.0, SD = 3.0; Mean = 8.0, SD = 3.0

Comparing Means & Standard Deviations
Mean = 8.0, SD = 3.0; Mean = 8.0, SD = 6.0
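The slides' pictures are not preserved, so the data behind these pairs are unknown. One hypothetical way to produce the same summaries is shown below: shifting every value moves the mean but leaves the SD alone, while doubling every deviation from the mean keeps the mean and doubles the SD.

```python
import statistics

base = [1, 4, 7]                           # hypothetical data: mean 4, SD 3
shifted = [x + 4 for x in base]            # every value moved up by 4
spread = [8 + 2 * (x - 4) for x in base]   # centered at 8, deviations doubled

for name, data in [("base", base), ("shifted", shifted), ("spread", spread)]:
    print(name, statistics.mean(data), statistics.stdev(data))
```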

Computing Mean & Standard Deviation (data listed by unit)
1. By hand with calculator support (UGH)
2. Using your calculator’s built-in statistics functionality. 60-second quiz: determine and write down the mean and standard deviation of at most 10 data values in under 1 minute.
3. Using Excel
4. Using Minitab
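Python's standard `statistics` module is one more software option alongside Excel and Minitab. Note that it offers two standard deviations: `stdev` divides by n – 1 (the S used in these slides), while `pstdev` divides by n. The eight values are hypothetical quiz data.

```python
import statistics

values = [2, 4, 4, 4, 5, 5, 7, 9]          # hypothetical quiz data (at most 10 values)
mean = statistics.mean(values)
sd = statistics.stdev(values)              # sample SD S: divides by n - 1
population_sd = statistics.pstdev(values)  # divides by n instead; not the S used here
print(f"mean = {mean}, S = {sd:.2f}, divide-by-n version = {population_sd:.2f}")
```

When checking a calculator's answer, make sure you are comparing against the n – 1 version.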

Z = Number of Standard Deviations from the Mean
"…within Z standard deviations of the mean…"
Determine Z × SD. Find the values Mean – Z × SD and Mean + Z × SD.
This means: "…between __________ and __________."
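The recipe is two multiplications and a subtraction/addition. As an illustration, this sketch borrows the mean 44.80 and SD 2.59 from the comparison slides and uses Z = 2; the function name is our own.

```python
def within_z_interval(mean, sd, z):
    """Return (Mean - Z*SD, Mean + Z*SD), the values 'within Z standard deviations of the mean'."""
    half_width = z * sd
    return mean - half_width, mean + half_width


low, high = within_z_interval(44.80, 2.59, 2)   # numbers borrowed from the comparison slides
print(f"between {low:.2f} and {high:.2f}")
```

Here Z × SD = 5.18, so the sentence reads "…between 39.62 and 49.98."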

Mean & Standard Deviation: Where the Data Are
In general you’ll find that about 68% of the data falls within 1 standard deviation of the mean, 95% falls within 2, and almost all (about 99.7%) falls within 3. There are exceptions. These guidelines hold fairly precisely for data with a bell-shaped (Normal) histogram.
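One way to see the 68% / 95% / 99.7% guideline is to simulate bell-shaped data and count how much of it lands within 1, 2 and 3 SDs of the mean. This sketch draws Normal values with mean 50 and SD 10 (arbitrary choices) from a seeded generator, so the run is repeatable; the printed fractions should come out close to, not exactly at, the guideline values.

```python
import random
import statistics


def fraction_within(values, k):
    """Fraction of the data within k sample standard deviations of the sample mean."""
    m = statistics.mean(values)
    s = statistics.stdev(values)
    return sum(1 for x in values if abs(x - m) <= k * s) / len(values)


rng = random.Random(1)                               # fixed seed so the run is repeatable
bell = [rng.gauss(50, 10) for _ in range(10_000)]    # simulated bell-shaped data
for k in (1, 2, 3):
    print(k, round(fraction_within(bell, k), 3))
```

Running the same function on strongly skewed data is a quick way to see the "exceptions" the slide mentions.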

Range Rule of Thumb (RRoT)
To guess the standard deviation, take the usual range of the data and divide by four.

Most homes for sale in the Oswego City School District are listed at prices between $50,000 and $200,000. What would you guess for the standard deviation of prices?

$50,000 to $200,000: Range about $200,000 – $50,000 = $150,000. Apply the RRoT… $150,000 / 4 = $37,500

Students are asked to complete a survey online. This assignment is made on a Monday at about noon. The survey closes Wednesday at midnight. Since each student’s submission is accompanied by a time stamp, it is simple to figure out how early, relative to the deadline, each student submitted the work. For the data set of amount of time early, guess the standard deviation. Give results in both days and hours.

The assignment is made on a Monday at about noon, and the survey closes Wednesday at midnight. That’s 2.5 days, or 60 hours. People will hand it in anywhere between immediately (2.5 days / 60 hours early) and the last minute (0 early), so the range is about 2.5 days or 60 hours. Apply the RRoT… 2.5 / 4 = 0.625 days and 60 / 4 = 15 hours; these are the same amount of time.

Consider GPAs of graduating seniors. Guess the standard deviation.

GPAs: You can’t graduate under 2.0, and all As gives 4.0. Min about 2.0, Max probably exactly 4.0. Range about 4.0 – 2.0 = 2.0. Apply the RRoT… 2.0 / 4 = 0.5
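The three worked guesses all follow the same one-line rule, so a tiny helper reproduces them. The function name is hypothetical; the usual minimum and maximum are the ones estimated in the examples.

```python
def rrot_sd(usual_min, usual_max):
    """Range Rule of Thumb: guess SD as (usual max - usual min) / 4."""
    return (usual_max - usual_min) / 4


print(rrot_sd(50_000, 200_000))   # home prices, dollars
print(rrot_sd(0, 2.5))            # time early, days
print(rrot_sd(0, 60))             # time early, hours
print(rrot_sd(2.0, 4.0))          # GPA
```

The answer always carries the units of the data, which is why the survey example gives the same guess as 0.625 days or 15 hours.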

Example An instructor asked students in two sections of the same course to guess the instructor’s age. Students in the first class (in a large lecture hall) had no other knowledge of the instructor’s personal life. Students in the second class (in a small classroom) knew that the instructor was the father of a young girl.

Variable: Guess of instructor’s age (quantitative)
Units: The students
Guess of instructor’s age varies from student to student.

Variable: Class, or "Which class?" (categorical)
Units: The students
Which class varies from student to student.

This is a fairly symmetric distribution. Mode = 42, Range = 54 – 32 = 22

This is a symmetric distribution. Mode = 42, Mean = 42.0
Symmetry: Typically Mean ≈ Mode ("nearly equal")

Mean = 42.0; Mean = 39.0

Mean = 42.0, St Dev ≈ 22 / 4 = 5.5; Mean = 39.0, St Dev ≈ 22 / 4 = 5.5

Mean = 40.25, St Dev = 4.33 (guess 4.25); Mean = …, St Dev = 4.14 (guess 3.75)

Properties: Mean & Standard Deviation
They don’t really "depend" (in the usual sense) on how much data there is; they depend on the relative frequency (percent) of occurrence of each value. Adding a new unit: sometimes the mean will go up, sometimes down, but on average it stays the same. The same holds for the standard deviation.
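A quick check of this property: repeat a data set three times, so every relative frequency stays the same while n triples. The data are hypothetical. The mean and the divide-by-n SD come out identical; the sample SD S changes slightly, only because its divisor is n – 1 rather than n, and that effect fades as n grows.

```python
import statistics

data = [3, 5, 5, 6, 8, 9, 9, 9, 10, 12]   # hypothetical data
tripled = data * 3                        # same relative frequencies, three times as many units

print(statistics.mean(data), statistics.mean(tripled))       # identical
print(statistics.pstdev(data), statistics.pstdev(tripled))   # identical (divides by n)
print(statistics.stdev(data), statistics.stdev(tripled))     # close; differ only via n - 1
```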

Standard Deviation Calculation

ALWAYS – for every data set

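Sample Mean: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
Sample Standard Deviation: $S = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2}$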