Chapter 3 EXPLORATORY DATA ANALYSIS 3.1 GRAPHICAL DISPLAY OF DATA 3.2 MEASURES OF CENTRAL TENDENCY 3.3 MEASURES OF DISPERSION



3.1 Graphical Display of Data  Most of the statistical information in newspapers, magazines, company reports and other publications consists of data that are summarized and presented in a form that is easy for the reader to understand.  This chapter discusses and displays several graphical tools for summarizing and presenting data, including the histogram, frequency polygon, ogive, dot plot, bar chart, pie chart, and the scatter plot for two-variable numerical data.

3.1 Graphical Display of Data: Ungrouped Versus Grouped Data  Ungrouped data  have not been summarized in any way  are also called raw data  Grouped data  logical groupings of data exist  e.g. age ranges (20-29, 30-39, etc.)  have been organized into a frequency distribution

3.1 Graphical Display of Data: Example of Ungrouped Data  Ages of a Sample of Managers from Urban Child Care Centers in the United States [data table not preserved in this transcript]

3.1 Graphical Display of Data Frequency Distribution  Frequency Distribution – summary of data presented in the form of class intervals and frequencies  Vary in shape and design  Constructed according to the individual researcher's preferences

Frequency Distribution  Steps in Constructing a Frequency Distribution  Step 1 – Determine the range of the data  Range is the difference between the highest and the lowest numbers  Step 2 – Determine the number of classes  Don’t use too many or too few classes  Step 3 – Determine the width of the class interval  Approximate class width can be calculated by dividing the range by the number of classes  Each value must fit into only one class
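The three steps can be sketched in Python as follows. This is a minimal illustration, not the presentation's own method: the `ages` list is hypothetical (the slide's data set was not preserved), and the helper name `frequency_distribution` and the choice of 6 classes of width 10 starting at 20 are assumptions made for the example.

```python
# Hypothetical raw data; NOT the data set used on the slides.
ages = [23, 27, 31, 34, 35, 38, 42, 44, 47, 51, 55, 58, 62, 66, 71]


def frequency_distribution(values, start, width, num_classes):
    """Count values into classes of the form 'lower-under upper'.

    Each class contains lower <= value < upper, so every value
    fits into exactly one class. Values outside all classes are ignored.
    """
    counts = [0] * num_classes
    for v in values:
        index = int((v - start) // width)
        if 0 <= index < num_classes:
            counts[index] += 1
    return [
        (start + i * width, start + (i + 1) * width, counts[i])
        for i in range(num_classes)
    ]


# Step 1: range of the data.
data_range = max(ages) - min(ages)

# Step 2: number of classes (an analyst's choice; 6 is assumed here).
num_classes = 6

# Step 3: approximate width is range / classes = 48 / 6 = 8,
# rounded up to the convenient value 10.
width = 10

table = frequency_distribution(ages, start=20, width=width, num_classes=num_classes)
for lower, upper, count in table:
    print(f"{lower}-under {upper}: {count}")
```

Using half-open classes (lower bound included, upper bound excluded) is one way to guarantee that each value lands in only one class.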

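Frequency Distribution of Child Care Managers’ Ages  Class intervals: 20-under 30, 30-under 40, 40-under 50, 50-under 60, 60-under 70, 70-under 80  Frequency of the 70-under 80 class: 1 [frequencies of the other classes not preserved in this transcript]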

3.1 Graphical Display of Data: Relative Frequency  The relative frequency is the proportion of the total frequency that falls in any given class interval of a frequency distribution.  Table columns: Class Interval, Frequency, Relative Frequency, for the classes 20-under 30 through 70-under 80 [values not preserved in this transcript]

3.1 Graphical Display of Data: Cumulative Frequency  The cumulative frequency is a running total of frequencies through the classes of a frequency distribution.  Table columns: Class Interval, Frequency, Cumulative Frequency, for the classes 20-under 30 through 70-under 80; total frequency 50 [other values not preserved in this transcript]
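Relative and cumulative frequencies follow directly from the class frequencies. The sketch below uses assumed frequencies: the transcript preserves only the total (50) and the count of the last class (1), so the other counts are illustrative values chosen to be consistent with those two facts.

```python
from itertools import accumulate

class_labels = [
    "20-under 30", "30-under 40", "40-under 50",
    "50-under 60", "60-under 70", "70-under 80",
]

# Assumed frequencies (illustrative only); they sum to 50 and end in 1.
frequencies = [6, 18, 11, 11, 3, 1]

total = sum(frequencies)

# Relative frequency: share of the total that falls in each class.
relative = [f / total for f in frequencies]

# Cumulative frequency: running total through the classes.
cumulative = list(accumulate(frequencies))

for label, f, rel, cum in zip(class_labels, frequencies, relative, cumulative):
    print(f"{label:12s} {f:3d} {rel:5.2f} {cum:3d}")
```

The relative frequencies always sum to 1, and the last cumulative frequency always equals the total.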

Common Statistical Graphs – Quantitative Data  Histogram -- vertical bar chart of frequencies  Frequency Polygon -- line graph of frequencies  Ogive -- line graph of cumulative frequencies  Stem and Leaf Plot -- like a histogram, but shows individual data values; useful for small data sets  Pareto Chart -- type of chart which contains both bars and a line graph  The bars display the values in descending order, and the line graph shows the cumulative totals of each category, left to right  The purpose is to highlight the most important among a (typically large) set of factors

3.1 Graphical Display of Data Histogram  A histogram is a graphical summary of a frequency distribution  The number and location of bins (bars) should be determined based on the sample size and the range of the data

Data Range  Range = largest value - smallest value [worked values not preserved in this transcript]

Number of Classes and Class Width  The number of classes should be between 5 and 15.  Fewer than 5 classes cause excessive summarization.  More than 15 classes leave too much detail.  Alternatively, use a rule of thumb such as Sturges' rule: number of classes ≈ 1 + 3.3 log10(n), where n is the number of data values  Class Width  Divide the range by the number of classes for an approximate class width  Round up to a convenient number
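As a rough sketch of these guidelines, the code below computes a suggested number of classes and a rounded class width. It assumes Sturges' rule (1 + 3.322 log10 n), which may not be the exact formula the slide intended; the function names, the sample size of 50, the range of 48, and the rounding step of 5 are all illustrative choices.

```python
import math


def sturges_classes(n):
    """Suggested number of classes by Sturges' rule (an assumed choice)."""
    return math.ceil(1 + 3.322 * math.log10(n))


def round_up_to(value, step):
    """Round value up to the next multiple of step (a 'convenient number')."""
    return math.ceil(value / step) * step


n = 50           # hypothetical sample size
data_range = 48  # hypothetical range (largest minus smallest)

num_classes = sturges_classes(n)
approx_width = data_range / num_classes
width = round_up_to(approx_width, 5)

print(num_classes, round(approx_width, 2), width)
```

The suggested count is only a starting point; the 5 to 15 guideline and the readability of the resulting table still decide the final choice.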

Class Midpoint  The midpoint of each class interval is called the class midpoint or the class mark.  For example, the midpoint of the 20-under 30 class is (20 + 30)/2 = 25.

Midpoints for Age Classes  Table columns: Class Interval, Frequency, Midpoint, Relative Frequency, Cumulative Frequency, for the classes 20-under 30 through 70-under 80 [values not preserved in this transcript]

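Histogram [figure and frequency table for the classes 20-under 30 through 70-under 80 not preserved in this transcript; frequency of the 70-under 80 class: 1]

Frequency Polygon [figure and frequency table for the classes 20-under 30 through 70-under 80 not preserved in this transcript; frequency of the 70-under 80 class: 1]

Ogive [figure and cumulative frequency table for the classes 20-under 30 through 70-under 80 not preserved in this transcript; cumulative frequency at 70-under 80: 50]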

Stem and Leaf Plot: Safety Examination Scores for Plant Trainees [raw data and stem/leaf columns not preserved in this transcript]

Construction of Stem and Leaf Plot [raw data and stem/leaf columns not preserved in this transcript]
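Constructing a stem and leaf plot can be sketched as below. The `scores` list is hypothetical (the trainees' scores were not preserved), and the sketch assumes two-digit non-negative integers, with the tens digit as the stem and the units digit as the leaf.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Group sorted integer values by stem (tens) with leaves (units)."""
    plot = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        plot[stem].append(leaf)
    return dict(plot)


# Hypothetical examination scores; NOT the data from the slides.
scores = [86, 77, 91, 60, 55, 76, 92, 47, 88, 67]

plot = stem_and_leaf(scores)
for stem, leaves in plot.items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
```

Because the values are sorted first, stems appear in ascending order and the leaves within each stem are ordered too. Stems with no observations are simply omitted in this sketch.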

Common Statistical Graphs – Qualitative Data  Pie Chart -- proportional representation for categories of a whole  Bar Chart -- frequency or relative frequency of one or more categorical variables

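Complaints by Amtrak Passengers  Table columns: Complaint, Number, Proportion, Degrees  Rows: Stations, etc.; Train Performance; Equipment; Personnel; Schedules, etc.; Total  [numeric values truncated in this transcript; only the leading digits of the Number column survive: 28,…; 14,…; 10,…; 9,…; 7,…; total 70,…]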

Second Quarter Truck Production in the U.S. (Hypothetical values)  Table columns: Company (A, B, C, D, E, Totals), 2d Quarter Truck Production [values garbled in this transcript]

Second Quarter U.S. Truck Production [pie chart not preserved in this transcript]

Pie Chart Calculations for Company A  Table columns: Company, 2d Quarter Truck Production, Proportion, Degrees [values garbled in this transcript]
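The pie chart calculation converts each category's count to a proportion of the total and then to degrees of the circle (proportion times 360). The sketch below uses made-up production figures, since the slide's values did not survive; the `pie_slices` helper is a hypothetical name.

```python
def pie_slices(counts):
    """Return {category: (proportion, degrees)} for a pie chart."""
    total = sum(counts.values())
    return {
        name: (value / total, value / total * 360)
        for name, value in counts.items()
    }


# Hypothetical production counts; NOT the figures from the slides.
production = {"A": 500, "B": 300, "C": 200}

slices = pie_slices(production)
for name, (proportion, degrees) in slices.items():
    print(f"Company {name}: proportion {proportion:.2f}, {degrees:.1f} degrees")
```

The proportions sum to 1 and the degrees sum to 360, which is a quick check on any hand calculation.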

3.2 Measures of Central Tendency: Ungrouped Data  Measures of central tendency yield information about “particular places or locations in a group of numbers.”  Common Measures of Location  Mode  Median  Mean  Percentiles  Quartiles

Mode  Mode - the most frequently occurring value in a data set  Applicable to all levels of data measurement (nominal, ordinal, interval, and ratio)  Can be used to determine what categories occur most frequently  Sometimes, no mode exists (no duplicates)  Bimodal -- in a tie for the most frequently occurring value, two modes are listed  Multimodal -- data sets that contain more than two modes

Median  Median - middle value in an ordered array of numbers.  Half the data are above it, half the data are below it  Mathematically, it’s the (n+1)/2 th ordered observation  For an array with an odd number of terms, the median is the middle number  n=11 => (n+1)/2 th = 12/2 th = 6 th ordered observation  For an array with an even number of terms the median is the average of the middle two numbers  n=10 => (n+1)/2 th = 11/2 th = 5.5 th = average of 5 th and 6 th ordered observation

Arithmetic Mean  Mean is the average of a group of numbers  Applicable for interval and ratio data  Not applicable for nominal or ordinal data  Affected by each value in the data set, including extreme values  Computed by summing all values in the data set and dividing the sum by the number of values in the data set

Demonstration Problem 3.1  The number of U.S. cars in service by top car rental companies in a recent year according to Auto Rental News follows.  Company: Number of Cars in Service  Enterprise 643,000; Hertz 327,000; National/Alamo 233,000; Avis 204,000; Dollar/Thrifty 167,000; Budget 144,000; Advantage 20,000; U-Save 12,000; Payless 10,000; ACE 9,000; Fox 9,000; Rent-A-Wreck 7,000; Triangle 6,000  Compute the mode, the median, and the mean.

Solutions Mode: 9,000 (two companies with 9,000 cars in service) Median: With 13 different companies in this group, N = 13. The median is located at the (13 +1)/2 = 7th position. Because the data are already ordered, median is the 7th term, which is 20,000. Mean: μ = ∑x/N = (1,791,000/13) = 137,769.23
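The solution can be checked with Python's standard `statistics` module. The figures are the thirteen values listed in the demonstration problem; the variable names are illustrative.

```python
import statistics

# Cars in service, in the order listed in Demonstration Problem 3.1.
cars_in_service = [
    643_000, 327_000, 233_000, 204_000, 167_000, 144_000, 20_000,
    12_000, 10_000, 9_000, 9_000, 7_000, 6_000,
]

mode = statistics.mode(cars_in_service)      # most frequent value
median = statistics.median(cars_in_service)  # middle of the ordered values
mean = statistics.mean(cars_in_service)      # sum divided by count

print(f"Mode:   {mode:,}")
print(f"Median: {median:,}")
print(f"Mean:   {mean:,.2f}")
```

The median (20,000) is far below the mean (about 137,769) because a few very large fleets pull the mean upward.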

Which Measure Do I Use?  Which measure of central tendency is most appropriate?  In general, the mean is preferred, since it has nice mathematical properties (in particular, see chapter 7)  The median and quartiles are resistant to outliers  Consider the following three datasets  1, 2, 3 (median=2, mean=2)  1, 2, 6 (median=2, mean=3)  1, 2, 30 (median=2, mean=11)  All have median=2, but the mean is sensitive to the outliers  In general, if there are outliers, the median is preferred to the mean
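The three small datasets above can be run through the `statistics` module to confirm the point; the variable names are illustrative.

```python
import statistics

datasets = [[1, 2, 3], [1, 2, 6], [1, 2, 30]]

# Pair each dataset's median with its mean.
summary = [(statistics.median(d), statistics.mean(d)) for d in datasets]

for data, (median, mean) in zip(datasets, summary):
    print(f"{data}: median={median}, mean={mean}")
```

The median stays at 2 in all three cases while the mean climbs with the largest value, which is why the median is described as resistant to outliers.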

Calculation of Grouped Mean  Sometimes data are already grouped, and you are interested in calculating summary statistics  Table columns: Interval, Frequency (f), Midpoint (M), f*M, for the classes 20-under 30 through 70-under 80 [values not preserved in this transcript]
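The grouped mean is the sum of frequency times midpoint divided by the total frequency. The sketch below uses the class boundaries from the slides, but the frequencies are assumed for illustration (only the total of 50 and the last class count of 1 are preserved elsewhere in the transcript).

```python
lower_bounds = [20, 30, 40, 50, 60, 70]
width = 10

# Assumed frequencies (illustrative only); they sum to 50 and end in 1.
frequencies = [6, 18, 11, 11, 3, 1]

# Class midpoint M = lower bound + half the class width.
midpoints = [lower + width / 2 for lower in lower_bounds]

# f * M for each class, then the grouped mean = sum(f * M) / sum(f).
f_times_m = [f * m for f, m in zip(frequencies, midpoints)]
grouped_mean = sum(f_times_m) / sum(frequencies)

print(midpoints)
print(f_times_m)
print(grouped_mean)
```

The result is an approximation, because every value in a class is treated as if it sat at the class midpoint.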

Median of Grouped Data - Example  Table columns: Class Interval, Frequency, Cumulative Frequency, for the classes 20-under 30 through 70-under 80; N = 50 [other values not preserved in this transcript]
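The slide's worked computation did not survive, so the following is a sketch of one standard textbook formula for the grouped median, which may differ in detail from the one presented: median = L + ((N/2 - cf) / f) * W, where L is the lower bound of the median class, cf is the cumulative frequency before that class, f is its frequency, and W is the class width. The frequencies are assumed for illustration; only N = 50 comes from the transcript.

```python
def grouped_median(lower_bounds, frequencies, width):
    """Interpolated median for grouped data: L + ((N/2 - cf) / f) * W."""
    n = sum(frequencies)
    if n == 0:
        raise ValueError("total frequency must be positive")
    cumulative_before = 0
    for lower, f in zip(lower_bounds, frequencies):
        # The median class is the first class whose cumulative
        # frequency reaches N/2.
        if cumulative_before + f >= n / 2:
            return lower + (n / 2 - cumulative_before) / f * width
        cumulative_before += f
    raise ValueError("lower_bounds and frequencies must cover all classes")


lower_bounds = [20, 30, 40, 50, 60, 70]

# Assumed frequencies (illustrative only); they sum to N = 50.
frequencies = [6, 18, 11, 11, 3, 1]

median = grouped_median(lower_bounds, frequencies, width=10)
print(round(median, 2))
```

With these assumed counts, N/2 = 25 falls in the 40-under 50 class (24 values lie below it), so the median is interpolated one eleventh of the way into that class.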

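Mode of Grouped Data  The modal class is the class with the greatest frequency  The mode of grouped data is the midpoint of the modal class  Table columns: Class Interval, Frequency, for the classes 20-under 30 through 70-under 80; frequency of the 70-under 80 class: 1 [other values not preserved in this transcript]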

3.3 Measures of Dispersion : Range  The difference between the largest and the smallest values in a set of data  Advantage – easy to compute  Disadvantage – is affected by extreme values

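3.3 Measures of Dispersion : Sample Variance  Sample Variance – the sum of the squared deviations from the sample mean, divided by n - 1 (approximately the average squared deviation)  Sample Variance – denoted by s²  [worked table of X values, deviations, and squared deviations garbled in this transcript]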

3.3 Measures of Dispersion : Sample Standard Deviation  Sample standard deviation is the square root of the sample variance  Same units as original data
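The three measures of dispersion in this section can be sketched as follows. The `sample` values are hypothetical, chosen so the arithmetic is easy to follow by hand; the manual calculation divides by n - 1, which is the usual definition of the sample variance, and is compared against the standard `statistics` module.

```python
import math
import statistics

# Hypothetical sample; NOT data from the slides.
sample = [2, 4, 4, 6, 9]

# Range: largest value minus smallest value.
data_range = max(sample) - min(sample)

# Sample variance by hand: squared deviations from the mean, summed,
# then divided by n - 1.
n = len(sample)
mean = sum(sample) / n
squared_deviations = [(x - mean) ** 2 for x in sample]
variance_manual = sum(squared_deviations) / (n - 1)

# Sample standard deviation: square root of the sample variance,
# expressed in the same units as the original data.
std_manual = math.sqrt(variance_manual)

# The statistics module uses the same n - 1 definition.
variance_lib = statistics.variance(sample)
std_lib = statistics.stdev(sample)

print(data_range, variance_manual, round(std_manual, 4))
```

Here the mean is 5, the squared deviations are 9, 1, 1, 1 and 16, and their sum 28 divided by 4 gives a sample variance of 7.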