Chapter 3 Descriptive Statistics: Numerical Methods (Copyright © 2010, HJ Shanghai Normal Uni.)

Presentation transcript:

Slide 1: Chapter 3 Descriptive Statistics: Numerical Methods
- Measures of Location
- Measures of Variability
- Measures of Relative Location and Detecting Outliers
- Exploratory Data Analysis
- Measures of Association Between Two Variables
- The Weighted Mean and Working with Grouped Data

Slide 2: 3.1 Measures of Location
- Mean
- Median
- Mode
- Percentiles
- Quartiles

Slide 3: Example: Apartment Rents
Given below is a sample of monthly rent values ($) for one-bedroom apartments. The data are a sample of 70 apartments in a particular city and are presented in ascending order.

Slide 4: Mean
- The mean of a data set is the average of all the data values.
- If the data are from a sample, the mean is denoted by x̄.
- If the data are from a population, the mean is denoted by μ (mu).

Slide 5: Example: Apartment Rents
- Mean

Slide 6: Median
- The median is the measure of location most often reported for annual income and property value data.
- A few extremely large incomes or property values can inflate the mean.

Slide 7: Median
- The median of a data set is the value in the middle when the data items are arranged in ascending order.
- For an odd number of observations, the median is the middle value.
- For an even number of observations, the median is the average of the two middle values.
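
The odd/even rule for the median can be sketched in a few lines of Python. This is a minimal illustration: the function name `median_of` and the income figures are assumptions made for the example, not data from the slides. It also shows how one extreme value moves the mean much more than the median.

```python
def median_of(values):
    """Median by the odd/even rule: middle value for odd n,
    mean of the two middle values for even n."""
    ordered = sorted(values)          # arrange in ascending order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # odd number of observations
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2   # even number of observations


# Hypothetical incomes (in thousands); the last value is deliberately extreme.
incomes = [30, 35, 40, 45, 1000]
mean_income = sum(incomes) / len(incomes)

print(mean_income)                    # 230.0, pulled up by the single large value
print(median_of(incomes))             # 40
print(median_of([5, 1, 9, 3]))        # 4.0, the average of 3 and 5
```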

Slide 8: Example: Apartment Rents
- Median = 50th percentile
- i = (p/100)n = (50/100)70 = 35
- Because i is an integer, average the 35th and 36th data values: Median = 475

Slide 9: Mode
- The mode of a data set is the value that occurs with greatest frequency.
- The greatest frequency can occur at two or more different values.
- If the data have exactly two modes, the data are bimodal.
- If the data have more than two modes, the data are multimodal.
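
As a small sketch of how one or several modes can be found, Python's standard `statistics.multimode` (available from Python 3.8) returns every value that shares the highest frequency. The two lists below are made-up illustrations, not the apartment-rent data.

```python
from statistics import multimode

# Hypothetical data sets: one with a single mode, one with two modes.
unimodal = [450, 450, 450, 475, 500]
bimodal = [1, 2, 2, 3, 3, 4]

modes_one = multimode(unimodal)
modes_two = multimode(bimodal)

print(modes_one)   # [450]
print(modes_two)   # [2, 3]  -> exactly two modes, so the data are bimodal
```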

Slide 10: Example: Apartment Rents
- Mode: 450 occurred most frequently (7 times), so Mode = 450.

Slide 11: Percentiles
- A percentile provides information about how the data are spread over the interval from the smallest value to the largest value.
- Admission test scores for colleges and universities are frequently reported in terms of percentiles.

Slide 12: Percentiles
- The pth percentile of a data set is a value such that at least p percent of the items take on this value or less and at least (100 - p) percent of the items take on this value or more.
To compute the pth percentile:
1. Arrange the data in ascending order.
2. Compute the index i, the position of the pth percentile: i = (p/100)n.
3. If i is not an integer, round up; the pth percentile is the value in the ith position. If i is an integer, the pth percentile is the average of the values in positions i and i + 1.
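
The index rule i = (p/100)n can be sketched in Python as follows. This is an illustrative implementation under stated assumptions: p is an integer strictly between 0 and 100, the function name `percentile` is hypothetical, and the twelve data values are invented for the example. Integer arithmetic (`divmod`) is used to decide whether i is a whole number, which avoids floating-point rounding surprises.

```python
def percentile(values, p):
    """p-th percentile using the index rule i = (p/100) * n.

    Assumes p is an integer with 0 < p < 100 and values is non-empty.
    """
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    ordered = sorted(values)                 # step 1: ascending order
    n = len(ordered)
    whole, remainder = divmod(p * n, 100)    # step 2: i = whole + remainder/100
    if remainder != 0:
        # i is not an integer: round up to position whole + 1
        # (zero-based index `whole`).
        return ordered[whole]
    # i is an integer: average the values in positions i and i + 1.
    return (ordered[whole - 1] + ordered[whole]) / 2


# Hypothetical sample of n = 12 values.
data = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120]

print(percentile(data, 25))   # i = 3     -> (30 + 40) / 2 = 35.0
print(percentile(data, 50))   # i = 6     -> (60 + 70) / 2 = 65.0
print(percentile(data, 85))   # i = 10.2  -> round up to position 11 -> 110
```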

Slide 13: Example: Apartment Rents
- 90th Percentile
- i = (p/100)n = (90/100)70 = 63
- Because i is an integer, average the 63rd and 64th data values: 90th Percentile = 585

Slide 14: Quartiles
- Quartiles are specific percentiles.
- First Quartile = 25th Percentile
- Second Quartile = 50th Percentile = Median
- Third Quartile = 75th Percentile

Slide 15: Example: Apartment Rents
- Third quartile = 75th percentile
- i = (p/100)n = (75/100)70 = 52.5, which rounds up to 53
- Third quartile = 525 (the value in the 53rd position)

Slide 16: Measures of Variability
- It is often desirable to consider measures of variability (dispersion), as well as measures of location.
- For example, in choosing supplier A or supplier B we might consider not only the average delivery time for each, but also the variability in delivery time for each.

Slide 17: Measures of Variability
- Range
- Interquartile Range
- Variance
- Standard Deviation
- Coefficient of Variation

Slide 18: Range
- The range of a data set is the difference between the largest and smallest data values.
- It is the simplest measure of variability.
- It is very sensitive to the smallest and largest data values.

Slide 19: Example: Apartment Rents
- Range = largest value - smallest value
- Range = 615 - 425 = 190

Slide 20: Interquartile Range
- The interquartile range of a data set is the difference between the third quartile and the first quartile.
- It is the range for the middle 50% of the data.
- It overcomes the sensitivity to extreme data values.

Slide 21: Example: Apartment Rents
- 3rd Quartile (Q3) = 525
- 1st Quartile (Q1) = 445
- Interquartile Range = Q3 - Q1 = 525 - 445 = 80

Slide 22: Variance
- The variance is a measure of variability that utilizes all the data.
- It is based on the difference between the value of each observation (x_i) and the mean (x̄ for a sample, μ for a population).

Slide 23: Variance
- The variance is the average of the squared differences between each data value and the mean.
- If the data set is a sample, the variance is denoted by s².
- If the data set is a population, the variance is denoted by σ² (sigma squared).

Slide 24: Standard Deviation
- The standard deviation of a data set is the positive square root of the variance.
- It is measured in the same units as the data, which makes it easier than the variance to compare with the mean.
- If the data set is a sample, the standard deviation is denoted s.
- If the data set is a population, the standard deviation is denoted σ (sigma).

Slide 25: Coefficient of Variation
- The coefficient of variation indicates how large the standard deviation is in relation to the mean.
- If the data set is a sample, the coefficient of variation is computed as (s / x̄) × 100%.
- If the data set is a population, the coefficient of variation is computed as (σ / μ) × 100%.
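
The three measures can be sketched together in Python. The formula images are missing from this transcript, so the code assumes the usual textbook conventions: the sample variance divides the sum of squared deviations by n - 1, the population variance divides by N, and the coefficient of variation is the standard deviation divided by the mean, times 100. The function names and the five data values are illustrative assumptions.

```python
import math


def sample_variance(values):
    """Sum of squared deviations from the sample mean, divided by n - 1 (assumed convention)."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


def population_variance(values):
    """Sum of squared deviations from the population mean, divided by N (assumed convention)."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n


# Hypothetical sample of five observations.
data = [46, 54, 42, 46, 32]

mean = sum(data) / len(data)       # 44.0
s2 = sample_variance(data)         # 256 / 4 = 64.0
s = math.sqrt(s2)                  # 8.0, in the same units as the data
cv = s / mean * 100                # about 18.2 (percent)

print(mean, s2, s, round(cv, 2))
print(population_variance(data))   # 256 / 5 = 51.2
```

Because the standard deviation is in the units of the data, dividing by the mean gives a unit-free percentage that can be compared across data sets with different scales.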

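Slide 26: Example: Apartment Rents
- Variance
- Standard Deviation
- Coefficient of Variation

Slide 27: Class Exercise
A study of the body weight of university students found that male students had a mean weight of 60 kg with a standard deviation of 5 kg, and female students had a mean weight of 50 kg with a standard deviation of 5 kg. Answer the following questions:
(1) Is the variation in weight greater among the male students or among the female students? Why?
(2) Express the mean and standard deviation of weight in pounds (1 kg ≈ 2.2 lb).
(3) Roughly what percentage of the male students weigh between 55 kg and 65 kg?
(4) Roughly what percentage of the female students weigh between 40 kg and 60 kg?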

Slide 28: Chapter 3 Descriptive Statistics: Numerical Methods
- Measures of Relative Location and Detecting Outliers
- Exploratory Data Analysis
- Measures of Association Between Two Variables
- The Weighted Mean and Working with Grouped Data

Slide 29: Measures of Relative Location and Detecting Outliers
- z-Scores
- Chebyshev's Theorem
- Empirical Rule
- Detecting Outliers

Slide 30: z-Scores
- The z-score is often called the standardized value.
- It denotes the number of standard deviations a data value x_i is from the mean.
- A data value less than the sample mean will have a z-score less than zero.
- A data value greater than the sample mean will have a z-score greater than zero.
- A data value equal to the sample mean will have a z-score of zero.

Slide 31: Example: Apartment Rents
- z-Score of Smallest Value (425)
- Standardized Values for Apartment Rents

Slide 32: Example: z-Scores for the Class-Size Data
- Sample mean: 44; sample standard deviation: 8
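
A z-score computation can be sketched as below, taking z_i = (x_i - x̄)/s as the definition implied by "number of standard deviations from the mean". The class-size table itself is not part of this transcript, so the five values used here are an assumption: they were chosen only because they reproduce the stated sample mean of 44 and sample standard deviation of 8.

```python
import math


def z_scores(values):
    """Standardize each value: (x - sample mean) / sample standard deviation."""
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    return [(x - mean) / s for x in values]


# Assumed class sizes (not from the transcript); mean = 44, s = 8.
class_sizes = [46, 54, 42, 46, 32]
scores = z_scores(class_sizes)

print(scores)   # [0.25, 1.25, -0.25, 0.25, -1.5]
```

Values above the mean (46, 54) get positive z-scores and values below it (42, 32) get negative ones, matching the sign rules listed for z-scores.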

Slide 33: Chebyshev's Theorem
At least (1 - 1/z²) of the items in any data set will be within z standard deviations of the mean, where z is any value greater than 1.
- At least 75% of the items must be within z = 2 standard deviations of the mean.
- At least 89% of the items must be within z = 3 standard deviations of the mean.
- At least 94% of the items must be within z = 4 standard deviations of the mean.

Slide 34: Example: Midterm Test Scores
The midterm test scores for 100 students in a college business statistics course had a mean of 70 and a standard deviation of 5. How many students had test scores between 60 and 80? How many students had test scores between 58 and 82?
- 60 to 80: z_60 = (60 - 70)/5 = -2 and z_80 = (80 - 70)/5 = 2. At least (1 - 1/2²) = 0.75, or 75%, of the students have scores between 60 and 80.
- 58 to 82?
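
The bound 1 - 1/z² is easy to evaluate directly. The sketch below applies it to both intervals of the midterm example (mean 70, standard deviation 5); the function name `chebyshev_lower_bound` is an illustrative choice. For the 58 to 82 interval, z = (82 - 70)/5 = 2.4, which gives a bound of roughly 0.826.

```python
def chebyshev_lower_bound(z):
    """Minimum share of items within z standard deviations of the mean (valid for z > 1)."""
    if z <= 1:
        raise ValueError("z must be greater than 1")
    return 1 - 1 / z ** 2


mean, sd = 70, 5
bounds = {}
for low, high in [(60, 80), (58, 82)]:
    z = (high - mean) / sd                 # both intervals are symmetric about the mean
    bounds[(low, high)] = chebyshev_lower_bound(z)
    print(low, high, z, round(bounds[(low, high)], 4))
```

With 100 students, the second result means that at least about 83 of them scored between 58 and 82, whatever the shape of the score distribution.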

Slide 35: Example: Apartment Rents
- Chebyshev's Theorem: let z = 1.5, with sample mean x̄ and s = 54.74.
- At least (1 - 1/(1.5)²) = 1 - 0.44 = 0.56, or 56%, of the rent values must be between x̄ - z(s) = x̄ - 1.5(54.74) = 409 and x̄ + z(s) = x̄ + 1.5(54.74) = 573.

Slide 36: Example: Apartment Rents
- Chebyshev's Theorem (continued): actually, 86% of the rent values are between 409 and 573.

Slide 37: Empirical Rule
For data having a bell-shaped distribution:
- Approximately 68% of the data values will be within one standard deviation of the mean.

Slide 38: Empirical Rule
For data having a bell-shaped distribution:
- Approximately 95% of the data values will be within two standard deviations of the mean.

Slide 39: Empirical Rule
For data having a bell-shaped distribution:
- Almost all (99.7%) of the items will be within three standard deviations of the mean.
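
The 68%, 95%, and 99.7% figures can be checked numerically if "bell-shaped" is taken to mean exactly normal, which is an assumption of this sketch rather than something the slides state. It uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later).

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)


def share_within(k):
    """Probability that a normal value lies within k standard deviations of its mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


shares = {k: share_within(k) for k in (1, 2, 3)}
for k, share in shares.items():
    print(k, round(share, 4))   # about 0.6827, 0.9545, 0.9973
```

These shares are noticeably higher than the Chebyshev bounds for the same z values, because the empirical rule relies on the extra assumption about the shape of the distribution.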

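Slide 41: Example: Apartment Rents
- Empirical Rule

Interval        % in Interval
Within ±1s      48/70 = 69%
Within ±2s      68/70 = 97%
Within ±3s      70/70 = 100%

Slide 42: Application: Six Sigma
- "σ" is used to measure how far a quality characteristic deviates, overall, from its target value. The number of sigmas is a statistical yardstick for expressing quality, and any work procedure or production process can be described in terms of a number of sigmas.
- Six sigma can be interpreted as 3.4 defect opportunities per million opportunities, that is, a conformance rate of 99.99966%; at three sigma the conformance rate is considerably lower.
- The six sigma management approach treats all work as a process, uses quantitative methods to analyze the factors that affect quality within the process, and identifies the most critical factors for improvement in order to achieve higher customer satisfaction.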

Slide 43: Detecting Outliers
- An outlier is an unusually small or unusually large value in a data set.
- A data value with a z-score less than -3 or greater than +3 might be considered an outlier.
- It might be:
  - an incorrectly recorded data value
  - a data value that was incorrectly included in the data set
  - a correctly recorded data value that belongs in the data set
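
The |z| > 3 screen can be sketched as follows. The function name and the readings are hypothetical, with one value made deliberately extreme. Note that in very small samples no value can reach |z| > 3 when the sample standard deviation is used, so the example uses 21 observations.

```python
import math


def z_score_outliers(values, threshold=3.0):
    """Return the values whose z-score exceeds the threshold in absolute value."""
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    return [x for x in values if abs((x - mean) / s) > threshold]


# Hypothetical readings: twenty values near 50 plus one extreme value.
readings = [48, 49, 50, 51, 52] * 4 + [120]
flagged = z_score_outliers(readings)

print(flagged)   # [120]
```

A flagged value is only a candidate for review: it may be a recording error, a value that does not belong in the data set, or a correct value that should be kept.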

Slide 44: Example: Apartment Rents
- Detecting Outliers: using |z| > 3 as the criterion for an outlier, even the most extreme z-scores fall within ±3, so there are no outliers in this data set.
- Standardized Values for Apartment Rents

Slide 45: Exploratory Data Analysis
- Five-Number Summary
- Box Plot

Slide 46: Five-Number Summary
1. Smallest Value
2. First Quartile
3. Median
4. Third Quartile
5. Largest Value

Slide 47: Example: Apartment Rents
- Five-Number Summary
- Lowest Value = 425
- First Quartile = 450
- Median = 475
- Third Quartile = 525
- Largest Value = 615

Slide 48: Box Plot
- A box is drawn with its ends located at the first and third quartiles.
- A vertical line is drawn in the box at the location of the median.
- Limits are located (not drawn) using the interquartile range (IQR).
  - The lower limit is located 1.5(IQR) below Q1.
  - The upper limit is located 1.5(IQR) above Q3.
  - Data outside these limits are considered outliers.

Slide 49: Box Plot (Continued)
- Whiskers (dashed lines) are drawn from the ends of the box to the smallest and largest data values inside the limits.
- The location of each outlier is shown with the symbol *.
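
The quantities needed to draw a box plot can be sketched as below. This is an illustration under assumptions: quartiles are computed with the index rule i = (p/100)n, the function names are hypothetical, and the twelve data values are invented so that one low and one high value fall outside the limits.

```python
def quartile(ordered, p):
    """Percentile by the index rule i = (p/100) * n; `ordered` must already be sorted."""
    whole, remainder = divmod(p * len(ordered), 100)
    if remainder:
        return ordered[whole]                          # round i up to the next position
    return (ordered[whole - 1] + ordered[whole]) / 2   # average positions i and i + 1


def box_plot_summary(values):
    """Box ends, limits, whisker ends, and outliers for a box plot."""
    ordered = sorted(values)
    q1, q3 = quartile(ordered, 25), quartile(ordered, 75)
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr      # limits are located, not drawn
    inside = [x for x in ordered if lower <= x <= upper]
    return {
        "q1": q1,
        "q3": q3,
        "limits": (lower, upper),
        "whiskers": (inside[0], inside[-1]),           # smallest and largest values inside the limits
        "outliers": [x for x in ordered if x < lower or x > upper],
    }


# Hypothetical data with one unusually small and one unusually large value.
data = [2, 10, 12, 13, 14, 15, 16, 17, 18, 19, 20, 45]
summary = box_plot_summary(data)

print(summary)
```

Here Q1 = 12.5 and Q3 = 18.5, so IQR = 6 and the limits are 3.5 and 27.5; the whiskers end at 10 and 20, and the values 2 and 45 would be marked with *.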

Slide 50: Example: Apartment Rents
- Box Plot
- Lower Limit: Q1 - 1.5(IQR) = Q1 - 1.5(75)
- Upper Limit: Q3 + 1.5(IQR) = Q3 + 1.5(75)
- There are no outliers.

Slide 51: Measures of Association Between Two Variables
- Covariance
- Correlation Coefficient

Slide 52: Covariance
- The covariance is a measure of the linear association between two variables.
- If the data sets are samples, the covariance is denoted by s_xy.
- If the data sets are populations, the covariance is denoted by σ_xy.

Slide 53: Covariance
- Positive values indicate a positive relationship.
- Negative values indicate a negative relationship.

Slide 54: Correlation Coefficient
- The coefficient can take on values between -1 and +1.
- Values near -1 indicate a strong negative linear relationship.
- Values near +1 indicate a strong positive linear relationship.
- If the data sets are samples, the coefficient is r_xy.
- If the data sets are populations, the coefficient is ρ_xy.
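
Covariance and correlation can be sketched together. The formulas are not shown in this transcript, so the code assumes the standard sample definitions: s_xy is the sum of the products of deviations divided by n - 1, and r_xy = s_xy / (s_x s_y). The function names and the paired data are illustrative assumptions.

```python
import math


def sample_covariance(xs, ys):
    """Sum of (x - mean_x)(y - mean_y) over all pairs, divided by n - 1."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def sample_correlation(xs, ys):
    """Pearson correlation: covariance scaled by both sample standard deviations."""
    s_x = math.sqrt(sample_covariance(xs, xs))   # covariance of x with itself is its variance
    s_y = math.sqrt(sample_covariance(ys, ys))
    return sample_covariance(xs, ys) / (s_x * s_y)


# Hypothetical paired observations.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

s_xy = sample_covariance(x, y)
r_xy = sample_correlation(x, y)

print(s_xy)             # 1.5, positive, so the relationship is positive
print(round(r_xy, 4))   # about 0.7746
```

The covariance depends on the units of x and y, while the correlation coefficient is unit-free and always lies between -1 and +1, which is why it is easier to interpret as a measure of strength.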

Slide 55: The Weighted Mean and Working with Grouped Data
- Weighted Mean
- Mean for Grouped Data
- Variance for Grouped Data
- Standard Deviation for Grouped Data

Slide 56: Weighted Mean
- When the mean is computed by giving each data value a weight that reflects its importance, it is referred to as a weighted mean.
- In the computation of a grade point average (GPA), the weights are the number of credit hours earned for each grade.
- When data values vary in importance, the analyst must choose the weight that best reflects the importance of each value.

Slide 57: Weighted Mean
x̄ = Σ w_i x_i / Σ w_i
where:
x_i = value of observation i
w_i = weight for observation i
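
The weighted mean formula, the sum of w_i x_i divided by the sum of w_i, can be sketched with the GPA case mentioned on the slides. The grade points and credit hours below are invented for illustration.

```python
def weighted_mean(values, weights):
    """Sum of w_i * x_i divided by the sum of w_i."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


# Hypothetical transcript: grade points earned and credit hours per course.
grade_points = [4, 3, 2]
credit_hours = [3, 4, 3]

gpa = weighted_mean(grade_points, credit_hours)
print(gpa)   # (12 + 12 + 6) / 10 = 3.0
```

With equal weights the function reduces to the ordinary mean, so the weighted mean generalizes the simple average rather than replacing it.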

Slide 58: Grouped Data
- The weighted mean computation can be used to obtain approximations of the mean, variance, and standard deviation for grouped data.
- To compute the weighted mean, we treat the midpoint of each class as though it were the mean of all items in the class.
- We compute a weighted mean of the class midpoints using the class frequencies as weights.
- Similarly, in computing the variance and standard deviation, the class frequencies are used as weights.

Slide 59: Mean for Grouped Data
- Sample Data: x̄ = Σ f_i M_i / n
- Population Data: μ = Σ f_i M_i / N
where:
f_i = frequency of class i
M_i = midpoint of class i

Slide 60: Example: Apartment Rents
Given below is the previous sample of monthly rents for one-bedroom apartments, presented here as grouped data in the form of a frequency distribution.

Slide 61: Example: Apartment Rents
- Mean for Grouped Data: this approximation differs by $2.41 from the actual sample mean.

Slide 62: Variance for Grouped Data
- Sample Data: s² = Σ f_i (M_i - x̄)² / (n - 1)
- Population Data: σ² = Σ f_i (M_i - μ)² / N
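
The grouped-data approximations can be sketched as below, treating each class midpoint M_i as if it were the value of every item in that class and using the class frequency f_i as its weight. The sample formulas with an n - 1 divisor are the standard textbook convention and are an assumption here, as are the function name and the small frequency distribution.

```python
import math


def grouped_sample_stats(midpoints, frequencies):
    """Approximate sample mean, variance, and standard deviation from grouped data."""
    n = sum(frequencies)                                     # total number of observations
    mean = sum(f * m for f, m in zip(frequencies, midpoints)) / n
    variance = sum(f * (m - mean) ** 2
                   for f, m in zip(frequencies, midpoints)) / (n - 1)
    return mean, variance, math.sqrt(variance)


# Hypothetical frequency distribution with three classes.
midpoints = [10, 20, 30]
frequencies = [2, 5, 3]

g_mean, g_var, g_sd = grouped_sample_stats(midpoints, frequencies)
print(g_mean, round(g_var, 3), round(g_sd, 3))   # 21.0, about 54.444, about 7.379
```

Because the midpoints stand in for the individual values, the results are approximations; they generally differ a little from the statistics computed on the ungrouped data.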

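Slide 63: Example: Apartment Rents
- Variance for Grouped Data
- Standard Deviation for Grouped Data: this approximation differs by only $0.20 from the actual standard deviation of $54.74.

Slide 64: Summary
- Measures of central location: mean, median, mode
- Describing other locations in a data set: percentiles, quartiles
- Variability or dispersion: range, interquartile range, variance, standard deviation, coefficient of variation, z-scores, Chebyshev's theorem
- Constructing the five-number summary and the box plot
- Covariance and correlation coefficient between two variables
- Weighted mean; mean, variance, and standard deviation for grouped data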

Slide 67: End of Chapter 3, Part B