COMPLETE BUSINESS STATISTICS

Presentation transcript:

COMPLETE BUSINESS STATISTICS by AMIR D. ACZEL & JAYAVEL SOUNDERPANDIAN 6th edition. Prepared by Lloyd Jaisingh, Morehead State University

1 Introduction and Descriptive Statistics: Using Statistics; Percentiles and Quartiles; Measures of Central Tendency; Measures of Variability; Grouped Data and the Histogram; Skewness and Kurtosis; Relations between the Mean and Standard Deviation; Methods of Displaying Data; Exploratory Data Analysis; Using the Computer

LEARNING OBJECTIVES After studying this chapter, you should be able to: Distinguish between qualitative data and quantitative data. Describe nominal, ordinal, interval, and ratio scales of measurements. Describe the difference between population and sample. Calculate and interpret percentiles and quartiles. Explain measures of central tendency and how to compute them. Create different types of charts that describe data sets. Use Excel templates to compute various measures and create charts.

WHAT IS STATISTICS? Statistics is a science that helps us make better decisions in business and economics as well as in other fields. Statistics teaches us how to summarize, analyze, and draw meaningful inferences from data that then lead to improved decisions. These decisions help us improve the running of, for example, a department, a company, or the entire economy.

1-1. Using Statistics (Two Categories) Descriptive Statistics: collect, organize, summarize, display, and analyze data. Inferential Statistics: predict and forecast values of population parameters; test hypotheses about values of population parameters; make decisions.

Types of Data - Two Types (p.28) Qualitative (categorical or nominal), for example: color, gender, nationality. Quantitative (measurable or countable), for example: temperatures, salaries, number of points scored on a 100-point exam.

Scales of Measurement (p.28-29) Analytical or metric type: interval scale, ratio scale. Categorical or nonmetric type: nominal scale, ordinal scale.

Samples and Populations (p.29) A population consists of the set of all measurements in which the investigator is interested. A sample is a subset of the measurements selected from the population. A census is a complete enumeration of every item in a population.

Simple Random Sample Sampling from the population is often done randomly, such that every possible sample of equal size (n) will have an equal chance of being selected. A sample selected in this way is called a simple random sample or just a random sample. A random sample allows chance to determine its elements.
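
As a rough illustration of the definition above, the following sketch draws a simple random sample with Python's standard `random` module. The population of ID numbers 1 to 100, the sample size 10, the seed, and the helper name `simple_random_sample` are all assumptions made for the example; none of them come from the slides.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct elements so that every subset of size n is equally likely."""
    rng = random.Random(seed)   # the seed is optional and only makes the draw repeatable
    return rng.sample(list(population), n)

# Hypothetical population: ID numbers 1..100 (N = 100), sample size n = 10
population = range(1, 101)
sample = simple_random_sample(population, n=10, seed=42)
print(sample)
```

Because `random.Random.sample` draws without replacement, no element can appear twice in the sample.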

Samples and Populations Population (N) Sample (n)

Why Sample? Census of a population may be: Impossible Impractical Too costly

Exercise (p.32, 5min) 1-1 1-4 1-5

1-2 Percentiles and Quartiles Given any set of numerical observations, order them according to magnitude. The Pth percentile in the ordered set is that value below which lie P% (P percent) of the observations in the set. The position of the Pth percentile is given by (n + 1)P/100, where n is the number of observations in the set.

Example 1-2 (p.33) A large department store collects data on sales made by each of its salespeople. The number of sales made on a given day by each of 20 salespeople is shown on the next slide, together with the same data sorted by magnitude.

Example 1-2 (Continued) - Sales and Sorted Sales
Sales: 9, 6, 12, 10, 13, 15, 16, 14, 14, 16, 17, 16, 24, 21, 22, 18, 19, 18, 20, 17
Sorted Sales: 6, 9, 10, 12, 13, 14, 14, 15, 16, 16, 16, 17, 17, 18, 18, 19, 20, 21, 22, 24

Example 1-2 (Continued) Percentiles Find the 50th, 80th, and the 90th percentiles of this data set. To find the 50th percentile, determine the data point in position (n + 1)P/100 = (20 + 1)(50/100) = 10.5. Thus, the percentile is located at the 10.5th position. The 10th observation is 16, and the 11th observation is also 16. The 50th percentile will lie halfway between the 10th and 11th values and is thus 16.

Example 1-2 (Continued) Percentiles To find the 80th percentile, determine the data point in position (n + 1)P/100 = (20 + 1)(80/100) = 16.8. Thus, the percentile is located at the 16.8th position. The 16th observation is 19, and the 17th observation is 20. The 80th percentile is a point lying 0.8 of the way from 19 to 20 and is thus 19.8.

Example 1-2 (Continued) Percentiles To find the 90th percentile, determine the data point in position (n + 1)P/100 = (20 + 1)(90/100) = 18.9. Thus, the percentile is located at the 18.9th position. The 18th observation is 21, and the 19th observation is 22. The 90th percentile is a point lying 0.9 of the way from 21 to 22 and is thus 21.9.
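
The three percentile calculations of Example 1-2 can be sketched in Python as follows. The function name `percentile` and the clamping of positions outside 1..n are assumptions for illustration; the slides only cover positions that fall inside the data.

```python
def percentile(data, p):
    """Pth percentile via the position rule (n + 1)P/100.

    A fractional position is resolved by linear interpolation between the
    two neighbouring sorted values, as in Example 1-2.
    """
    xs = sorted(data)
    n = len(xs)
    pos = (n + 1) * p / 100
    # Assumption: positions outside 1..n are clamped to the extremes.
    pos = min(max(pos, 1), n)
    k = int(pos)            # rank (1-based) of the lower neighbour
    frac = pos - k          # how far to move towards the upper neighbour
    if k == n:
        return float(xs[-1])
    return xs[k - 1] + frac * (xs[k] - xs[k - 1])

# Sales data from Example 1-2
sales = [9, 6, 12, 10, 13, 15, 16, 14, 14, 16,
         17, 16, 24, 21, 22, 18, 19, 18, 20, 17]

for p in (50, 80, 90):
    print(p, round(percentile(sales, p), 2))
```

For P = 50, 80, and 90 this reproduces the slide values 16, 19.8, and 21.9.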

Quartiles – Special Percentiles (p.35) Quartiles are the percentage points that break down the ordered data set into quarters. The first quartile is the 25th percentile. It is the point below which lie 1/4 of the data. The second quartile is the 50th percentile. It is the point below which lie 1/2 of the data. This is also called the median. The third quartile is the 75th percentile. It is the point below which lie 3/4 of the data.

Quartiles and Interquartile Range The first quartile, Q1, (25th percentile) is often called the lower quartile. The second quartile, Q2, (50th percentile) is often called median or the middle quartile. The third quartile, Q3, (75th percentile) is often called the upper quartile. The interquartile range is the difference between the first and the third quartiles.

Example 1-3: Finding Quartiles (sorted sales data from Example 1-2; position = (n+1)P/100)
First Quartile: position (20+1)25/100 = 5.25, value 13 + (.25)(1) = 13.25
Median: position (20+1)50/100 = 10.5, value 16 + (.5)(16-16) = 16 + (.5)(0) = 16
Third Quartile: position (20+1)75/100 = 15.75, value 18 + (.75)(1) = 18.75
Basic Stat.xls

Example 1-3: Using the Template

Example 1-3 (Continued): Using the Template This is the lower part of the same template from the previous slide.

Exercise, p.35-36, 10 min 1-9(Ans:Q1=9, Q2=11.6, Q3=15.5, 55%=12.32, 85%=16.5) 1-12(Ans:median=51, Q1=30.5, Q3=194.25 IQR=163.75, 45%=42.2) Basic Stat.xls P %= (n+1)P / 100

Summary Measures: Population Parameters and Sample Statistics. Measures of Central Tendency: median, mode, mean. Measures of Variability: range, interquartile range, variance, standard deviation. Other summary measures: skewness, kurtosis.

1-3 Measures of Central Tendency or Location (p.36) Median: middle value when sorted in order of magnitude; the 50th percentile. Mode: most frequently occurring value. Mean: average.

Example – Median (Data is used from Example 1-2) Position of the median (50th percentile): (20+1)50/100 = 10.5. The 10th and 11th sorted values are both 16, so the median is 16 + (.5)(0) = 16. The median is the middle value of data sorted in order of magnitude. It is the 50th percentile. See slide # 19 for the template output.

Example - Mode (Data is used from Example 1-2) [Dot plot of the sales data on an axis from 6 to 24] Mode = 16. The mode is the most frequently occurring value. It is the value with the highest frequency. See slide # 19 for the template output.

Arithmetic Mean or Average The mean of a set of observations is their average - the sum of the observed values divided by the number of observations.
Population mean: $\mu = \frac{\sum_{i=1}^{N} x_i}{N}$
Sample mean: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$

Example – Mean (Data is used from Example 1-2) The 20 sales values sum to 317, so $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{317}{20} = 15.85$. See slide # 19 for the template output.

Example - Mode (Data is used from Example 1-2) [Dot plot of the sales data on an axis from 6 to 24; each dot represents one data value] Mean = 15.85, Median and Mode = 16. See slide # 19 for the template output.
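
The three measures of central tendency for the sales data can be checked with Python's standard `statistics` module. This is only a quick cross-check of the slide values; the variable names are chosen here for illustration.

```python
from statistics import mean, median, mode

# Sales data from Example 1-2
sales = [9, 6, 12, 10, 13, 15, 16, 14, 14, 16,
         17, 16, 24, 21, 22, 18, 19, 18, 20, 17]

sales_mean = mean(sales)      # sum of the values divided by n = 317 / 20
sales_median = median(sales)  # average of the 10th and 11th sorted values
sales_mode = mode(sales)      # most frequently occurring value

print(sales_mean, sales_median, sales_mode)
```

The results agree with the slides: mean 15.85, median 16, mode 16.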

Exercise, p.40, 5 min: Example 1-4; 1-13 ~ 1-16 (See Textbook p.698); 1-17 (Ans: mean=592.93, median=566, LQ=546, UQ=618.75, Outlier=940, suspected outlier=399)

1-4 Measures of Variability or Dispersion (p.40) Range: difference between maximum and minimum values. Interquartile Range: difference between third and first quartile (Q3 - Q1). Variance: average* of the squared deviations from the mean. Standard Deviation: square root of the variance. *Definitions of population variance and sample variance differ slightly.

Example - Range and Interquartile Range (Data is used from Example 1-2)
Sorted sales (ranks 1 to 20): 6, 9, 10, 12, 13, 14, 14, 15, 16, 16, 16, 17, 17, 18, 18, 19, 20, 21, 22, 24
Minimum = 6 (rank 1), Maximum = 24 (rank 20)
Range: Maximum - Minimum = 24 - 6 = 18
First Quartile: Q1 = 13 + (.25)(1) = 13.25
Third Quartile: Q3 = 18 + (.75)(1) = 18.75
Interquartile Range: Q3 - Q1 = 18.75 - 13.25 = 5.5
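
The range and interquartile range can be reproduced with the standard library, as a sketch. It relies on `statistics.quantiles`, whose default `method="exclusive"` places the quartiles at positions based on n + 1, which matches the (n + 1)P/100 rule used in these slides; other tools may use different conventions and give slightly different quartiles.

```python
from statistics import quantiles

# Sales data from Example 1-2
sales = [9, 6, 12, 10, 13, 15, 16, 14, 14, 16,
         17, 16, 24, 21, 22, 18, 19, 18, 20, 17]

data_range = max(sales) - min(sales)   # maximum minus minimum
q1, q2, q3 = quantiles(sales, n=4)     # default method is "exclusive"
iqr = q3 - q1                          # interquartile range

print(data_range, q1, q2, q3, iqr)
```

This gives a range of 18, Q1 = 13.25, Q3 = 18.75, and an interquartile range of 5.5, as on the slide.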

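Variance and Standard Deviation
Population Variance: $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N} = \frac{\sum_{i=1}^{N} x_i^2 - \frac{\left(\sum_{i=1}^{N} x_i\right)^2}{N}}{N}$, with $\sigma = \sqrt{\sigma^2}$
Sample Variance: $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n - 1}$, with $s = \sqrt{s^2}$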

Proof of the formula

Calculation of Sample Variance (p.44)
  x    x - x̄   (x - x̄)^2    x^2
  6    -9.85    97.0225     36
  9    -6.85    46.9225     81
 10    -5.85    34.2225    100
 12    -3.85    14.8225    144
 13    -2.85     8.1225    169
 14    -1.85     3.4225    196
 14    -1.85     3.4225    196
 15    -0.85     0.7225    225
 16     0.15     0.0225    256
 16     0.15     0.0225    256
 16     0.15     0.0225    256
 17     1.15     1.3225    289
 17     1.15     1.3225    289
 18     2.15     4.6225    324
 18     2.15     4.6225    324
 19     3.15     9.9225    361
 20     4.15    17.2225    400
 21     5.15    26.5225    441
 22     6.15    37.8225    484
 24     8.15    66.4225    576
317     0      378.5500   5403   (column totals)
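
The column totals in the table above lead to the sample variance in two equivalent ways: the definitional sum of squared deviations and the shortcut formula using the sum of squares. The sketch below computes both and cross-checks them against `statistics.variance`; the variable names are illustrative only.

```python
from math import sqrt
from statistics import variance

# Sales data from Example 1-2
sales = [9, 6, 12, 10, 13, 15, 16, 14, 14, 16,
         17, 16, 24, 21, 22, 18, 19, 18, 20, 17]

n = len(sales)
x_bar = sum(sales) / n                                      # 317 / 20

# Definitional form: sum of squared deviations from the mean
ss_definition = sum((x - x_bar) ** 2 for x in sales)

# Shortcut form: sum of x^2 minus (sum of x)^2 / n
ss_shortcut = sum(x * x for x in sales) - sum(sales) ** 2 / n

s2 = ss_definition / (n - 1)   # sample variance divides by n - 1
s = sqrt(s2)                   # sample standard deviation

print(round(ss_definition, 4), round(ss_shortcut, 4))
print(round(s2, 4), round(s, 4), round(variance(sales), 4))
```

Both sums of squares come to 378.55, which matches the table total, so the sample variance is 378.55/19, about 19.92, and the sample standard deviation is about 4.46.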

Example: Sample Variance Using the Template Note: This is just a replication of slide #19.

Exercise, p.45, 10 min: Computing the standard deviation: Examples 1-5, 1-6 (p.36) or Example 1-2; 1-18 (p.46); 1-19 (Ans. Range=27, 57.7386, 7.5986); 1-20 (Ans. Range=60, 321.3788, 17.9270); 1-21 (Ans. Range=1186, 110287.45, 332.0555). Basic Stat.xls

1-5 Grouped Data and the Histogram Dividing data into groups or classes or intervals. Groups should be: Mutually exclusive (not overlapping: every observation is assigned to only one group). Exhaustive (every observation is assigned to a group). Equal-width (if possible); the first or last group may be open-ended.

Frequency Distribution A table with two columns listing each and every group or class or interval of values, and the associated frequency of each group (the number of observations assigned to each group). The sum of frequencies is the number of observations: N for a population, n for a sample. The class midpoint is the middle value of a group or class or interval. Relative frequency is the proportion of total observations in each class; the sum of relative frequencies = 1.

Example 1-7: Frequency Distribution (p.47)
x: Spending Class ($)    f(x): Frequency (number of customers)    f(x)/n: Relative Frequency
0 to less than 100        30    0.163
100 to less than 200      38    0.207
200 to less than 300      50    0.272
300 to less than 400      31    0.168
400 to less than 500      22    0.120
500 to less than 600      13    0.070
Total                    184    1.000
Example of relative frequency: 30/184 = 0.163. Sum of relative frequencies = 1.

Cumulative Frequency Distribution
x: Spending Class ($)    F(x): Cumulative Frequency    F(x)/n: Cumulative Relative Frequency
0 to less than 100        30    0.163
100 to less than 200      68    0.370
200 to less than 300     118    0.641
300 to less than 400     149    0.810
400 to less than 500     171    0.929
500 to less than 600     184    1.000
The cumulative frequency of each group is the sum of the frequencies of that and all preceding groups.
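
The relative and cumulative columns of Example 1-7 follow mechanically from the class frequencies. A minimal sketch, using only the frequencies given on the slides (the variable names are illustrative):

```python
from itertools import accumulate

# Class labels and frequencies from Example 1-7
classes = ["0 to less than 100", "100 to less than 200", "200 to less than 300",
           "300 to less than 400", "400 to less than 500", "500 to less than 600"]
freq = [30, 38, 50, 31, 22, 13]

n = sum(freq)                          # total number of customers
rel_freq = [f / n for f in freq]       # relative frequency f(x)/n
cum_freq = list(accumulate(freq))      # running total F(x)
cum_rel_freq = [c / n for c in cum_freq]

for label, f, r, c, cr in zip(classes, freq, rel_freq, cum_freq, cum_rel_freq):
    print(f"{label:<22} {f:>4} {r:6.3f} {c:>4} {cr:6.3f}")
```

Note that 13/184 rounds to 0.071, whereas the slide shows 0.070, presumably so that the rounded column adds up to exactly 1.000.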

Frequency distribution exercise, 10 min: Example 1- (p.33), using a class width of 5. Basic Stat.xls

Histogram A histogram is a chart made of bars of different heights. Widths and locations of bars correspond to widths and locations of data groupings. Heights of bars correspond to frequencies or relative frequencies of data groupings.
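
To make the link between grouping and bar heights concrete, here is a small text-histogram sketch. It assumes the sales data of Example 1-2 and a class width of 5 with classes starting at multiples of 5; both choices are made for illustration. Classes with no observations are simply not printed in this simple version.

```python
from collections import Counter

# Sales data from Example 1-2
sales = [9, 6, 12, 10, 13, 15, 16, 14, 14, 16,
         17, 16, 24, 21, 22, 18, 19, 18, 20, 17]

width = 5  # assumed class width
# Map each value to the lower limit of its class: 5, 10, 15, 20, ...
counts = Counter((x // width) * width for x in sales)

for lower in sorted(counts):
    bar = "*" * counts[lower]   # bar length stands for the class frequency
    print(f"{lower:>2} to less than {lower + width:<2} | {bar} ({counts[lower]})")
```

Each observation falls into exactly one class, so the groups are mutually exclusive and exhaustive, and the bar lengths add up to n = 20.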

Histogram Example:1-7 Frequency Histogram

Histogram Example Relative Frequency Histogram

1-6 Skewness and Kurtosis (p.49) Skewness: measure of asymmetry of a frequency distribution. Skewed to left (skewness < 0); symmetric or unskewed; skewed to right (skewness > 0). Kurtosis: measure of flatness or peakedness of a frequency distribution. Platykurtic (relatively flat); mesokurtic (normal); leptokurtic (relatively peaked). *Formulas are given on p.51.

Skewness: Skewed to left (negative skewness value; the more negative, the more left-skewed)

Skewness Symmetric

Skewness: Skewed to right (positive skewness value; the more positive, the more right-skewed)

Kurtosis: Platykurtic - flat distribution (the smaller the kurtosis value, the flatter the distribution)

Kurtosis Mesokurtic - not too flat and not too peaked

Kurtosis: Leptokurtic - peaked distribution (the larger the kurtosis value, the more peaked the distribution)
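
The slides do not reproduce the skewness and kurtosis formulas, so the sketch below uses the common moment-based definitions as an assumption: skewness as the third central moment divided by the 1.5 power of the second, and excess kurtosis as the fourth central moment divided by the square of the second, minus 3. The textbook or the Excel template may apply sample-size corrections and give somewhat different numbers. The two small data sets are hypothetical.

```python
def central_moment(data, r):
    """r-th central moment, dividing by n (population form)."""
    n = len(data)
    m = sum(data) / n
    return sum((x - m) ** r for x in data) / n

def skewness(data):
    """Moment-based skewness: negative = skewed left, positive = skewed right."""
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5

def excess_kurtosis(data):
    """Moment-based kurtosis minus 3, so a normal curve scores about 0."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2 - 3

# Hypothetical data sets, for illustration only
symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 1, 2, 10]

print(skewness(symmetric), skewness(right_skewed))
print(excess_kurtosis(symmetric))
```

Under this convention a negative excess kurtosis indicates a platykurtic (flatter) shape and a positive one a leptokurtic (more peaked) shape.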

1-7 Relations between the Mean and Standard Deviation (p.51, important) Chebyshev's Theorem: applies to any distribution, regardless of shape; places lower limits on the percentages of observations within a given number of standard deviations from the mean. Empirical Rule: applies only to roughly mound-shaped and symmetric distributions; specifies approximate percentages of observations within a given number of standard deviations from the mean.

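Chebyshev's Theorem At least $1 - \frac{1}{k^2}$ of the elements of any distribution lie within k standard deviations of the mean. For k = 2, at least 1 - 1/4 = 3/4 = 75% lie within 2 standard deviations of the mean; for k = 3, at least 1 - 1/9 = 8/9 (about 89%) lie within 3; for k = 4, at least 1 - 1/16 = 15/16 (about 94%) lie within 4.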

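Empirical Rule For roughly mound-shaped and symmetric distributions, approximately: 68% of the observations lie within 1 standard deviation of the mean, 95% lie within 2 standard deviations, and virtually all (about 99.7%) lie within 3 standard deviations.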
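
As an illustration of both rules, the sketch below computes the Chebyshev lower bound 1 - 1/k^2 and compares it with the share of the Example 1-2 sales data that actually falls within k sample standard deviations of the mean. Applying the rules to this particular data set, and the helper names `chebyshev_bound` and `fraction_within`, are choices made here for illustration.

```python
from statistics import mean, stdev

# Sales data from Example 1-2
sales = [9, 6, 12, 10, 13, 15, 16, 14, 14, 16,
         17, 16, 24, 21, 22, 18, 19, 18, 20, 17]

def chebyshev_bound(k):
    """Minimum share of observations within k standard deviations (k > 1)."""
    return 1 - 1 / k ** 2

def fraction_within(data, k):
    """Observed share of data within k sample standard deviations of the mean."""
    m, s = mean(data), stdev(data)
    inside = [x for x in data if m - k * s <= x <= m + k * s]
    return len(inside) / len(data)

for k in (1, 2, 3, 4):
    bound = chebyshev_bound(k) if k > 1 else None  # the bound says nothing useful for k = 1
    print(k, fraction_within(sales, k), bound)
```

For this data set 70% of the values lie within one standard deviation and 95% within two, close to the empirical rule's 68% and 95%, and every observed share is at least as large as the Chebyshev bound.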

Exercise, p.52, 10 min Exercise 1- 22 Basic Stat.xls

1-8 Methods of Displaying Data Pie Charts: categories represented as percentages of total. Bar Graphs: heights of rectangles represent group frequencies. Frequency Polygons: height of line represents frequency. Ogives: height of line represents cumulative frequency. Time Plots: represent values over time.

Pie Chart

Bar Chart Fig. 1-11 Airline Operating Expenses and Revenues [bar chart: average revenues and average expenses by airline (American, Continental, Delta, Northwest, Southwest, United, USAir)]

Frequency Polygon and Ogive [two charts: a relative frequency polygon (relative frequency vs. sales) and an ogive (cumulative relative frequency vs. sales)]

Time Plot [figure: time plot of monthly values]

Chart exercise, 10 min: 1-24, 1-25 (1-24.xls, 1-25.xls)

1-9 Exploratory Data Analysis – EDA Techniques to determine relationships and trends, identify outliers and influential observations, and quickly describe or summarize data sets. Stem-and-Leaf Displays: quick-and-dirty listing of all observations; conveys some of the same information as a histogram. Box Plots: median, lower and upper quartiles, maximum and minimum.

Example 1-8: Stem-and-Leaf Display (p.59)
1 | 122355567          (10 ~)
2 | 0111222346777899   (20 ~)
3 | 012457             (30 ~)
4 | 11257              (40 ~)
5 | 0236               (50 ~)
6 | 02                 (60 ~)
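
A stem-and-leaf display is easy to build by splitting each value into its tens digit (stem) and units digit (leaf). The sketch below does this for the 42 two-digit values read back from the display of Example 1-8; the function name `stem_and_leaf` and the `stem | leaves` layout are assumptions for illustration.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return display lines 'stem | leaves' for non-negative integers."""
    groups = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)   # tens digit(s) and units digit
        groups[stem].append(leaf)
    return [f"{stem} | {''.join(str(leaf) for leaf in leaves)}"
            for stem, leaves in sorted(groups.items())]

# Values read back from the stems and leaves shown in Example 1-8
data = [11, 12, 12, 13, 15, 15, 15, 16, 17,
        20, 21, 21, 21, 22, 22, 22, 23, 24, 26, 27, 27, 27, 28, 29, 29,
        30, 31, 32, 34, 35, 37,
        41, 41, 42, 45, 47,
        50, 52, 53, 56,
        60, 62]

lines = stem_and_leaf(data)
print("\n".join(lines))
```

Read sideways, the rows have the same shape as a histogram with class width 10, while still showing every individual observation.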

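Box Plot (p.62) Elements of a Box Plot: the box runs from Q1 to Q3 with the median marked inside, so its length is the interquartile range (IQR) and half of the data lie inside the box. Inner fences: Q1-1.5(IQR) and Q3+1.5(IQR). Outer fences: Q1-3(IQR) and Q3+3(IQR). The whiskers extend to the smallest data point not below the inner fence and the largest data point not exceeding the inner fence. Suspected outlier: a point between the inner and outer fences. Outlier: a point beyond the outer fences. Suspected outliers and outliers are plotted individually with the symbols * and o.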
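
The box plot elements can be computed directly from the quartiles. The sketch below applies the fence definitions Q1 - 1.5(IQR), Q3 + 1.5(IQR), Q1 - 3(IQR), and Q3 + 3(IQR) to the sales data of Example 1-2; using that data set here, and the function name `box_plot_summary`, are assumptions for illustration. Quartiles come from `statistics.quantiles` with its default "exclusive" method, which matches the (n + 1)P/100 position rule.

```python
from statistics import quantiles

def box_plot_summary(data):
    """Quartiles, fences, whisker ends, and outlier lists for a box plot."""
    q1, med, q3 = quantiles(data, n=4)
    iqr = q3 - q1
    inner = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    outer = (q1 - 3 * iqr, q3 + 3 * iqr)
    # Whiskers stop at the most extreme points still inside the inner fences
    whiskers = (min(x for x in data if x >= inner[0]),
                max(x for x in data if x <= inner[1]))
    # Suspected outliers lie between the inner and outer fences
    suspected = [x for x in data
                 if outer[0] <= x < inner[0] or inner[1] < x <= outer[1]]
    # Outliers lie beyond the outer fences
    outliers = [x for x in data if x < outer[0] or x > outer[1]]
    return {"q1": q1, "median": med, "q3": q3, "iqr": iqr,
            "inner": inner, "outer": outer, "whiskers": whiskers,
            "suspected": suspected, "outliers": outliers}

# Sales data from Example 1-2
sales = [9, 6, 12, 10, 13, 15, 16, 14, 14, 16,
         17, 16, 24, 21, 22, 18, 19, 18, 20, 17]

summary = box_plot_summary(sales)
print(summary)
```

For the sales data the inner fences are 5 and 27 and all 20 values fall between them, so the whiskers run from 6 to 24 and no point is flagged.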

Example: Box Plot

Exercise, p.64, 15 min 1- 27 BoxPlot.xls

1-10 Using the Computer – The Template Output

Using the Computer – Template Output for the Histogram

Using the Computer – Template Output for Histograms for Grouped Data

Using the Computer – Template Output for Frequency Polygons & the Ogive for Grouped Data

Using the Computer – Template Output for Two Frequency Polygons for Grouped Data

Using the Computer – Pie Chart Template Output

Using the Computer – Bar Chart Template Output

Using the Computer – Box Plot Template Output

Using the Computer – Box Plot Template to Compare Two Data Sets

Using the Computer – Time Plot Template

Using the Computer – Time Plot Comparison Template