CHS 221 VISUALIZING DATA Week 3 Dr. Wajed Hatamleh



VISUALIZING DATA: Depict the nature or shape of the data distribution in a graph. Different graphs are used for different types of data.

HISTOGRAM: Another common graphical presentation of quantitative data is a histogram. The variable of interest is placed on the horizontal axis. A rectangle is drawn above each class interval with its height corresponding to the interval’s frequency, relative frequency, or percent frequency.

HISTOGRAMS: Used for quantitative data. Similar to a bar graph, with an X and Y axis, but adjacent values are on a continuum, so the bars touch one another. Data values on the X axis are arranged from lowest to highest. Bars are drawn to a height that shows the frequency or percentage (Y axis).
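The counting step behind a histogram can be sketched in a few lines. This is a minimal illustration, not part of the original slides: the heart rate values, the class width of 10 bpm, and the helper name `class_frequencies` are all assumptions chosen for the example.

```python
# Hypothetical heart rate readings in bpm (illustrative only, not from the slides).
heart_rates = [55, 58, 61, 62, 64, 66, 67, 68, 70, 71,
               72, 72, 74, 75, 77, 79, 83, 88]

def class_frequencies(values, start, width, n_classes):
    """Count how many values fall in each class interval [lower, lower + width)."""
    counts = [0] * n_classes
    for v in values:
        index = (v - start) // width
        if 0 <= index < n_classes:
            counts[index] += 1
    return counts

freqs = class_frequencies(heart_rates, start=50, width=10, n_classes=4)

# Text version of the histogram: classes run from lowest to highest and the
# bar length equals the class frequency (drawn sideways here for simplicity).
for i, f in enumerate(freqs):
    lower = 50 + 10 * i
    print(f"{lower}-{lower + 9} bpm | {'#' * f} ({f})")
```

A plotting library would draw the same frequencies as touching vertical bars; the text output only shows how the bar heights are obtained.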

HISTOGRAMS (CONT’D): Example of a histogram of heart rate data. [Figure: frequency (f) on the vertical axis, heart rate in bpm on the horizontal axis]

Histogram: A bar graph in which the horizontal scale represents the classes of data values and the vertical scale represents the frequencies (Figure 2-1).

Relative Frequency Histogram: Has the same shape and horizontal scale as a histogram, but the vertical scale is marked with relative frequencies (Figure 2-2).

Histogram and Relative Frequency Histogram (Figures 2-1 and 2-2)
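As a small sketch of why the two histograms share a shape: each relative frequency is the class frequency divided by the total count, so every bar is rescaled by the same factor. The class frequencies below are hypothetical.

```python
# Hypothetical class frequencies for four classes (illustrative only).
freqs = [2, 6, 8, 2]
n = sum(freqs)

# Relative frequency = class frequency / total number of values.
rel_freqs = [f / n for f in freqs]

for f, r in zip(freqs, rel_freqs):
    print(f"frequency {f:2d} -> relative frequency {r:.3f}")

print("sum of relative frequencies:", sum(rel_freqs))
```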

Ogive: An ogive is a graph of a cumulative distribution. The data values are shown on the horizontal axis. Shown on the vertical axis are the cumulative frequencies, cumulative relative frequencies, or cumulative percent frequencies. The frequency (one of the above) of each class is plotted as a point. The plotted points are connected by straight lines.

Ogive: A line graph that depicts cumulative frequencies. [Figure]
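The points plotted on an ogive are running totals of the class frequencies. A minimal sketch, assuming hypothetical class frequencies and upper class boundaries:

```python
from itertools import accumulate

# Hypothetical class frequencies and upper class boundaries (illustrative only).
upper_boundaries = [59.5, 69.5, 79.5, 89.5]
freqs = [2, 6, 8, 2]

# Cumulative frequency: running total of the class frequencies.
cum_freqs = list(accumulate(freqs))
n = cum_freqs[-1]

# The same points expressed as cumulative relative and percent frequencies.
cum_rel = [c / n for c in cum_freqs]
cum_pct = [100 * c / n for c in cum_freqs]

# Each (boundary, cumulative value) pair is one point on the ogive;
# consecutive points are joined by straight lines.
for x, c, p in zip(upper_boundaries, cum_freqs, cum_pct):
    print(f"up to {x}: cumulative frequency {c}, {p:.1f}%")
```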

BAR GRAPHS: Used for qualitative data. Bar graphs have a horizontal dimension (X axis) that specifies categories (i.e., data values). The vertical dimension (Y axis) specifies either frequencies or percentages. The bar for each category is drawn to the height that indicates its frequency or percentage.

BAR GRAPHS: Example of a bar graph. Note that the bars do not touch each other.
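For qualitative data, the bar heights are simply category counts. The sketch below uses made-up marital status responses; the category names and values are assumptions for illustration.

```python
from collections import Counter

# Hypothetical marital status responses (illustrative only).
statuses = ["married", "single", "married", "divorced", "single",
            "married", "widowed", "married", "single", "married"]

# Frequency of each category; these are the bar heights.
counts = Counter(statuses)

# One separate bar per category (categories have no inherent order,
# which is why the bars of a bar graph do not touch).
for category, count in counts.items():
    print(f"{category:<9} | {'#' * count} ({count})")
```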

PIE CHART: Also used for qualitative data. The circle is divided into pie-shaped wedges corresponding to the percentages for a given category or data value. All pieces add up to 100%. Place wedges in order, with the biggest wedge starting at “12 o’clock”.

PIE CHART: Example of a pie chart, for the same marital status data.
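The wedge sizes of a pie chart follow directly from the category counts. A minimal sketch with hypothetical counts, ordering the wedges from biggest to smallest as the slide suggests:

```python
# Hypothetical category counts (illustrative only).
counts = {"married": 5, "single": 3, "divorced": 1, "widowed": 1}
total = sum(counts.values())

# Percentage for each category, biggest wedge first.
wedges = sorted(
    ((category, 100 * n / total) for category, n in counts.items()),
    key=lambda item: item[1],
    reverse=True,
)

# Each percentage point corresponds to 3.6 degrees of the circle.
for category, pct in wedges:
    print(f"{category:<9} {pct:5.1f}%  ({pct * 3.6:.0f} degrees)")
```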

Recap: In this section we have discussed graphs that are pictures of distributions. Keep in mind that the object of this section is not just to construct graphs, but to learn something about the data sets, that is, to understand the nature of their distributions.

CHARACTERISTICS OF A DATA DISTRIBUTION: Central tendency; variability. Both central tendency and variability can be expressed by indexes that are descriptive statistics.

CENTRAL TENDENCY: Indexes of central tendency provide a single number to characterize a distribution. Measures of central tendency come from the center of the distribution of data values, indicating what is “typical” and where data values tend to cluster. Popularly called an “average”.

CENTRAL TENDENCY INDEXES: Three alternative indexes: the mode, the median, and the mean.

THE MODE: The mode is the score value with the highest frequency; the most “popular” score. Age example: Mode = 27.

THE MODE: ADVANTAGES. Can be used with data measured on any measurement level (including nominal level). Easy to “compute”. Reflects an actual value in the distribution, so it is easy to understand. Useful when there are 2+ “popular” scores (i.e., in multimodal distributions).

Mode: Denoted by M. A data set may be bimodal, multimodal, or have no mode. The mode is the only measure of central tendency that can be used with qualitative data.

Examples: (a) Mode is 1.10. (b) Bimodal: 27 and 55. (c) No mode.

THE MODE: DISADVANTAGES. Ignores most information in the distribution. Tends to be unstable (i.e., its value varies a lot from one sample to the next). Some distributions may not have a mode (e.g., 10, 10, 11, 11, 12, 12).
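The unimodal, bimodal, and no-mode cases can be sketched with a frequency count. The helper name `modes` is hypothetical, and treating a data set in which every value is tied as having no mode is the convention used on these slides, not a universal rule. Only the data set 10, 10, 11, 11, 12, 12 comes from the slides; the others are made up.

```python
from collections import Counter

def modes(values):
    """Return the most frequent value(s), or [] when all values are tied."""
    counts = Counter(values)
    top = max(counts.values())
    winners = [v for v, c in counts.items() if c == top]
    # Slide convention: if every distinct value occurs equally often
    # (and there is more than one distinct value), there is no mode.
    if len(winners) == len(counts) and len(counts) > 1:
        return []
    return sorted(winners)

print(modes([1, 2, 2, 3]))                            # one mode
print(modes([27, 27, 27, 55, 55, 55, 88, 88, 99]))    # bimodal
print(modes([10, 10, 11, 11, 12, 12]))                # no mode (slide example)
```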

THE MEDIAN: The median is the score that divides the distribution into two equal halves: 50% are below the median, 50% above. Age example: Median (Mdn) = 28.

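Finding the median: First put the values in order. With an odd number of values, the median is the exact middle value. With an even number of values there is no exact middle value; the middle is shared by two numbers, and the median is the mean of those two numbers.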
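The odd and even cases can be written out as a short function. This is a sketch with made-up data; the function name `median` is chosen for the example.

```python
def median(values):
    """Middle value of the ordered data; mean of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd number of values: the exact middle one.
        return ordered[mid]
    # Even number of values: the middle is shared by two numbers.
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([7, 3, 9, 1, 5]))      # odd number of values
print(median([7, 3, 9, 1, 5, 11]))  # even number of values
```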

THE MEDIAN: ADVANTAGES. Not influenced by outliers. A particularly good index of what is “typical” when the distribution is skewed. Easy to “compute”.

THE MEDIAN: DISADVANTAGES. Does not take actual data values into account; it is only an index of position. The value of the median is not necessarily an actual data value, so it is more difficult to understand than the mode.

THE MEAN: The mean is the arithmetic average. Data values are summed and divided by N. Age example: Mean = 28.3.

THE MEAN (CONT’D): Most frequently used measure of central tendency. Equation: M = ΣX ÷ N, where M = sample mean, Σ = the sum of, X = actual data values, N = number of people.
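The equation M = ΣX ÷ N translates directly into code. The ages below are hypothetical and are not the data behind the slide's example.

```python
# Hypothetical ages (illustrative only).
ages = [24, 26, 27, 27, 29, 31, 32]

sum_x = sum(ages)   # ΣX: the sum of the data values
n = len(ages)       # N: the number of people
m = sum_x / n       # M = ΣX ÷ N

print(f"ΣX = {sum_x}, N = {n}, M = {m}")
```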

THE MEAN: ADVANTAGES. The balance point in the distribution: the sum of deviations above the mean always exactly balances those below it. Does not ignore any information. The most stable index of central tendency. Many inferential statistics are based on the mean.
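The balance-point property can be checked numerically. A small sketch with made-up values:

```python
# Hypothetical data values (illustrative only).
values = [2, 4, 4, 5, 10]
mean = sum(values) / len(values)

# Deviation of each value from the mean.
deviations = [x - mean for x in values]

above = sum(d for d in deviations if d > 0)  # total distance above the mean
below = sum(d for d in deviations if d < 0)  # total distance below the mean

print("mean:", mean)
print("deviations:", deviations)
print("above:", above, "below:", below, "sum:", above + below)
```

With data whose mean is not exactly representable in floating point, the sum may differ from zero by a tiny rounding error.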

THE MEAN: DISADVANTAGES. Sensitive to outliers. Gives a distorted view of what is “typical” when data are skewed. The value of the mean is often not an actual data value.
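A quick way to see the sensitivity to outliers is to add one extreme value and compare how far the mean and the median move. The data are hypothetical (for instance, lengths of stay in days).

```python
import statistics

# Hypothetical data without and with one extreme value (illustrative only).
typical = [2, 3, 3, 4, 5]
with_outlier = typical + [40]

for label, data in (("without outlier", typical), ("with outlier", with_outlier)):
    print(f"{label}: mean = {statistics.mean(data)}, "
          f"median = {statistics.median(data)}")

mean_shift = statistics.mean(with_outlier) - statistics.mean(typical)
median_shift = statistics.median(with_outlier) - statistics.median(typical)
```

In this example the mean moves by several units while the median moves by half a unit.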

THE MEAN: SYMBOLS. Sample means: in reports, usually symbolized as M; in statistical formulas, usually symbolized as $\bar{x}$ (pronounced “x bar”). Population means: the Greek letter μ (mu).

Notation: $\bar{x} = \frac{\sum x}{n}$, where $\bar{x}$ is pronounced ‘x-bar’ and denotes the mean of a set of sample values. $\mu = \frac{\sum x}{N}$, where µ is pronounced ‘mu’ and denotes the mean of all values in a population.

Best Measure of Center

Definitions: Symmetric: Data is symmetric if the left half of its histogram is roughly a mirror image of its right half. Skewed: Data is skewed if it is not symmetric and if it extends more to one side than the other.

Skewness Figure

Recap: In this section we have discussed: types of measures of center (mean, median, mode); mean from a frequency distribution; best measures of center; skewness.

MEASURES OF VARIATION: Because this section introduces the concept of variation, this is one of the most important sections in the entire book.

DEFINITION: The range of a set of data is the difference between the highest value and the lowest value. Range = (highest value) − (lowest value).
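As a one-line illustration of the definition, using made-up readings:

```python
# Hypothetical diastolic blood pressure readings (illustrative only).
readings = [70, 82, 76, 90, 68, 84]

# Range = highest value - lowest value.
data_range = max(readings) - min(readings)
print("range:", data_range)
```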

DEFINITION: The standard deviation of a set of sample values is a measure of variation of values about the mean.

SAMPLE STANDARD DEVIATION FORMULA: $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
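The formula can be followed step by step in code. This is a sketch with made-up values; the function name `sample_sd` is chosen for the example.

```python
import math

def sample_sd(values):
    """Sample standard deviation: sqrt( sum((x - mean)^2) / (n - 1) )."""
    n = len(values)
    mean = sum(values) / n
    # Sum of squared deviations from the mean.
    squared_deviations = sum((x - mean) ** 2 for x in values)
    return math.sqrt(squared_deviations / (n - 1))

# Hypothetical data (illustrative only): mean is 5, squared deviations sum to 36.
data = [2, 4, 4, 5, 10]
s = sample_sd(data)
print("s =", s)
```

Python's standard library offers the same calculation as `statistics.stdev`.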

SAMPLE STANDARD DEVIATION (SHORTCUT FORMULA): $s = \sqrt{\frac{n\left(\sum x^2\right) - \left(\sum x\right)^2}{n(n - 1)}}$
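The shortcut formula needs only n, Σx, and Σx², and it gives the same result as the definition based on deviations from the mean. A minimal sketch with made-up data; the function name `sample_sd_shortcut` is chosen for the example.

```python
import math

def sample_sd_shortcut(values):
    """Shortcut form: sqrt( (n * sum(x^2) - (sum x)^2) / (n * (n - 1)) )."""
    n = len(values)
    sum_x = sum(values)
    sum_x_squared = sum(x * x for x in values)
    return math.sqrt((n * sum_x_squared - sum_x ** 2) / (n * (n - 1)))

# Hypothetical data (illustrative only): n = 5, sum x = 25, sum x^2 = 161.
data = [2, 4, 4, 5, 10]
s_shortcut = sample_sd_shortcut(data)
print("s (shortcut) =", s_shortcut)
```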

Standard Deviation - Key Points: The standard deviation is a measure of variation of all values from the mean. The value of the standard deviation s is usually positive. The value of the standard deviation s can increase dramatically with the inclusion of one or more outliers (data values far away from all others). The units of the standard deviation s are the same as the units of the original data values.

Definition: Empirical (68-95-99.7) Rule. For data sets having a distribution that is approximately bell shaped, the following properties apply: About 68% of all values fall within 1 standard deviation of the mean. About 95% of all values fall within 2 standard deviations of the mean. About 99.7% of all values fall within 3 standard deviations of the mean.
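The rule can be checked on simulated bell-shaped data. This is a sketch under stated assumptions: the values are random draws from a normal distribution with an arbitrarily chosen mean of 100 and standard deviation of 15, so the observed shares will be close to, not exactly equal to, 68%, 95%, and 99.7%.

```python
import random
import statistics

random.seed(0)  # fixed seed so the simulation is repeatable

# Simulated, approximately bell-shaped data (illustrative only).
values = [random.gauss(100, 15) for _ in range(10_000)]
mean = statistics.mean(values)
sd = statistics.stdev(values)

def share_within(k):
    """Fraction of values lying within k standard deviations of the mean."""
    inside = sum(1 for v in values if abs(v - mean) <= k * sd)
    return inside / len(values)

shares = {k: share_within(k) for k in (1, 2, 3)}
for k, share in shares.items():
    print(f"within {k} SD of the mean: {100 * share:.1f}%")
```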

The Empirical Rule FIGURE


ARE YOU READY? Post-test time

Which measure of center is the only one that can be used with data at the categorical level of measurement? A. Mean B. Median C. Mode

Which of the following measures of center is not affected by outliers? A. Mean B. Median C. Mode


Find the mode(s) for the given sample data: 79, 25, 79, 13, 25, 29, 56, 79. A. 79 B. 48.1 C. 42.5 D. 25

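The three measures of center for this question's data can be checked with Python's `statistics` module. This is a worked check, not part of the original slides; it also shows where two of the other answer options appear to come from (the mean and the median of the same data).

```python
import statistics

# Sample data from the question above.
data = [79, 25, 79, 13, 25, 29, 56, 79]

modes = statistics.multimode(data)   # most frequent value(s)
mean = statistics.mean(data)         # sum of the values divided by n
median = statistics.median(data)     # middle of the ordered values

print("mode(s):", modes)
print("mean:", mean)
print("median:", median)
```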

Which is not true about the variance? A. It is the square of the standard deviation. B. It is a measure of the spread of data. C. The units of the variance are different from the units of the original data set. D. It is not affected by outliers.



EXERCISE TIME

EXERCISE 1: The following 10 data values are diastolic blood pressure readings. Compute the mean, the range, and the SD for these data.

EXERCISE 2: The following are the fasting blood glucose levels of 10 children. Compute the: a. range, b. standard deviation.

EXERCISE 3: The fifteen patients making initial visits to a rural health department travelled these distances. Find: a. range, b. standard deviation. [Table: Patient, Distance (Miles)]

ANSWER: 1. Range = 60; SD = 2. Range = 16; SD = 3. Range = 12; SD =