On Average On Spread Frequency Distribution and Statistical Parameters


On Average On Spread Frequency Distribution and Statistical Parameters By Dr D D Basu Advisor, CSE Former Addl. Director, CPCB

Central Tendency and Spread: The Two key parameters of Data Analysis

Concept of Average
The purpose of an average is to represent a group of individual values in a simple and concise manner; the average acts as a representative of the group. The simplest average is called the "mean", meaning "centre". Statisticians refer to all averages as measures of central tendency. Several types of mean are in use: the arithmetic mean, the weighted arithmetic mean and the geometric mean.

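THE ARITHMETIC MEAN
The arithmetic mean, or briefly the mean, of a set of N numbers $X_1, X_2, X_3, \ldots, X_N$ is denoted by $\bar{X}$ (read "X bar") and is defined as

$$\bar{X} = \frac{X_1 + X_2 + X_3 + \cdots + X_N}{N} = \frac{\sum X}{N} \qquad (1)$$

EXAMPLE. The arithmetic mean of the numbers 8, 3, 5, 12 and 10 is

$$\bar{X} = \frac{8 + 3 + 5 + 12 + 10}{5} = \frac{38}{5} = 7.6$$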

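If the numbers $X_1, X_2, \ldots, X_K$ occur $f_1, f_2, \ldots, f_K$ times, respectively (i.e., occur with frequencies $f_1, f_2, \ldots, f_K$), the arithmetic mean is

$$\bar{X} = \frac{f_1 X_1 + f_2 X_2 + \cdots + f_K X_K}{f_1 + f_2 + \cdots + f_K} = \frac{\sum fX}{\sum f} = \frac{\sum fX}{N} \qquad (2)$$

where $N = \sum f$ is the total frequency (i.e., the total number of cases).

EXAMPLE. If 5, 8, 6 and 2 occur with frequencies 3, 2, 4 and 1, respectively, the arithmetic mean is

$$\bar{X} = \frac{(3)(5) + (2)(8) + (4)(6) + (1)(2)}{3 + 2 + 4 + 1} = \frac{15 + 16 + 24 + 2}{10} = \frac{57}{10} = 5.7$$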
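The two forms of the arithmetic mean can be checked with a few lines of Python. This is a minimal sketch; the function names `arithmetic_mean` and `frequency_mean` are chosen here for illustration and do not come from the slides.

```python
def arithmetic_mean(values):
    """Sum of the values divided by the number of values."""
    return sum(values) / len(values)


def frequency_mean(values, freqs):
    """Mean of values occurring with given frequencies: sum(f*X) / sum(f)."""
    weighted_total = sum(f * x for x, f in zip(values, freqs))
    return weighted_total / sum(freqs)


# The two worked examples from the text.
print(arithmetic_mean([8, 3, 5, 12, 10]))          # 38 / 5
print(frequency_mean([5, 8, 6, 2], [3, 2, 4, 1]))  # 57 / 10
```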

THE WEIGHTED ARITHMETIC MEAN
Sometimes we associate with the numbers $X_1, X_2, \ldots, X_K$ certain weighting factors (or weights) $w_1, w_2, \ldots, w_K$, depending on the significance or importance attached to the numbers. In this case,

$$\bar{X} = \frac{w_1 X_1 + w_2 X_2 + \cdots + w_K X_K}{w_1 + w_2 + \cdots + w_K} = \frac{\sum wX}{\sum w}$$

is called the weighted arithmetic mean. Note the similarity to equation (2), which can be considered a weighted arithmetic mean with weights $f_1, f_2, \ldots, f_K$.

EXAMPLE. If a final examination in a course is weighted 3 times as much as a quiz and a student has a final examination grade of 85 and quiz grades of 70 and 90, the mean grade is

$$\bar{X} = \frac{(1)(70) + (1)(90) + (3)(85)}{1 + 1 + 3} = \frac{415}{5} = 83$$
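The weighted mean of the grade example can be reproduced with a short Python sketch. The function name `weighted_mean` is illustrative only.

```python
def weighted_mean(values, weights):
    """Weighted arithmetic mean: sum(w*X) / sum(w)."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


# Final examination (weight 3) and two quizzes (weight 1 each).
grades = [85, 70, 90]
weights = [3, 1, 1]
print(weighted_mean(grades, weights))  # 415 / 5
```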

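THE GEOMETRIC MEAN G
The geometric mean G of a set of N positive numbers $X_1, X_2, X_3, \ldots, X_N$ is the Nth root of the product of the numbers:

$$G = \sqrt[N]{X_1 X_2 X_3 \cdots X_N}$$

EXAMPLE. Find the arithmetic and geometric means of 6 and 54.

$$\text{Arithmetic mean} = \frac{6 + 54}{2} = 30$$

$$\text{Geometric mean} = \sqrt{6 \times 54} = \sqrt{324} = 18$$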

Example
Find the mean of the SOx values 13, 23, 12, 44 and 55 measured in a city on 5 consecutive days.

$$\text{Arithmetic mean} = \frac{13 + 23 + 12 + 44 + 55}{5} = \frac{147}{5} = 29.4$$

$$\text{Geometric mean} = \sqrt[5]{13 \times 23 \times 12 \times 44 \times 55} = 24.4$$
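A minimal Python sketch of the geometric mean, taken as the Nth root of the product, follows. The function name is illustrative, and `math.prod` needs Python 3.8 or later.

```python
import math


def geometric_mean(values):
    """Nth root of the product of N positive numbers."""
    if any(v <= 0 for v in values):
        raise ValueError("the geometric mean needs positive numbers")
    return math.prod(values) ** (1 / len(values))


print(geometric_mean([6, 54]))                         # square root of 324
print(round(geometric_mean([13, 23, 12, 44, 55]), 1))  # SOx example, fifth root
```

For long data sets the product can grow very large; averaging the logarithms and exponentiating the result is a common alternative.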

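MEASURE OF DISPERSION
A measure of dispersion describes how far the data values lie from the mean $\bar{X}$. For a value $X_1$ above the mean, the deviation $X_1 - \bar{X}$ is positive; for a value $X_2$ below the mean, the deviation $X_2 - \bar{X}$ is negative. The absolute deviation $|X - \bar{X}|$ ignores the sign.

$$\text{Mean deviation} = \frac{\sum |X - \bar{X}|}{N}$$

$$\text{Standard deviation} = S = \sqrt{\frac{\sum (X - \bar{X})^2}{N}}$$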

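EXAMPLES OF STANDARD DEVIATION
Find the standard deviation S of the following set of numbers: (a) 12, 6, 7, 3, 15, 10, 18, 5.

$$\bar{X} = \frac{\sum X}{N} = \frac{12 + 6 + 7 + 3 + 15 + 10 + 18 + 5}{8} = \frac{76}{8} = 9.5$$

$$S = \sqrt{\frac{\sum (X - \bar{X})^2}{N}} = \sqrt{\frac{(12-9.5)^2 + (6-9.5)^2 + (7-9.5)^2 + (3-9.5)^2 + (15-9.5)^2 + (10-9.5)^2 + (18-9.5)^2 + (5-9.5)^2}{8}} = \sqrt{23.75} = 4.87$$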
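Both dispersion measures can be computed directly from their definitions. The sketch below uses illustrative function names and divides by N, the number of values, as the slides do.

```python
import math


def mean_deviation(values):
    """Average absolute distance of the values from their mean."""
    mean = sum(values) / len(values)
    return sum(abs(x - mean) for x in values) / len(values)


def standard_deviation(values):
    """Square root of the average squared distance from the mean (divisor N)."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((x - mean) ** 2 for x in values) / len(values))


data = [12, 6, 7, 3, 15, 10, 18, 5]
print(mean_deviation(data))                # 34 / 8
print(round(standard_deviation(data), 2))  # square root of 23.75
```

With the divisor N this matches `statistics.pstdev` from the Python standard library; `statistics.stdev` divides by N - 1 instead.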

Concept of Frequency

Definition
Frequency is the rate at which something occurs over a period of time, i.e. the number of occurrences of an event per unit time. In a data set, the frequency of a value is the number of times that value occurs.

Example
Dissolved oxygen (DO) was measured 20 times at a sampling point in a river. The values are: 5, 6, 3, 3, 2, 4, 7, 5, 2, 3, 5, 6, 5, 4, 4, 3, 5, 2, 5, 3

Frequency table of DO at sampling point X of city Y

Sr. No.   Data Value   Frequency
1         2            3
2         3            5
3         4            3
4         5            6
5         6            2
6         7            1
TOTAL                  20
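The frequency table can be built from the 20 DO readings by counting how often each value occurs. This is a minimal sketch using `collections.Counter` from the Python standard library; the variable names are illustrative.

```python
from collections import Counter

# The 20 dissolved oxygen readings from the example.
do_values = [5, 6, 3, 3, 2, 4, 7, 5, 2, 3, 5, 6, 5, 4, 4, 3, 5, 2, 5, 3]

freq = Counter(do_values)  # maps each data value to its frequency

for value in sorted(freq):
    print(value, freq[value])
print("TOTAL", sum(freq.values()))
```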

Relative and Cumulative Frequency
Relative frequency: the fraction or proportion of times a value occurs, i.e. its frequency divided by the total number of observations.
Cumulative frequency: the sum of the frequency of a value and the frequencies of all values before it.
Relative cumulative frequency: the sum of the relative frequency of a value and the relative frequencies of all values before it.

Table: Frequency, Relative Frequency, Cumulative Frequency, Relative Cumulative Frequency

Sr. No.  Data Value  Frequency  Relative Frequency  Cumulative Frequency  Relative Cumulative Frequency
1        2           3          3/20 = 0.15         3                     0.15
2        3           5          5/20 = 0.25         3+5 = 8               0.15+0.25 = 0.40
3        4           3          3/20 = 0.15         3+5+3 = 11            0.15+0.40 = 0.55
4        5           6          6/20 = 0.30         3+5+3+6 = 17          0.30+0.55 = 0.85
5        6           2          2/20 = 0.10         3+5+3+6+2 = 19        0.10+0.85 = 0.95
6        7           1          1/20 = 0.05         3+5+3+6+2+1 = 20      0.05+0.95 = 1.00
Total                N = 20
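The relative and cumulative columns follow mechanically from the frequencies. The sketch below is one way to compute them, using `itertools.accumulate` for the running totals; the variable names are illustrative.

```python
from itertools import accumulate

values = [2, 3, 4, 5, 6, 7]  # DO data values
freqs = [3, 5, 3, 6, 2, 1]   # their frequencies
n = sum(freqs)               # total number of observations

rel = [f / n for f in freqs]     # relative frequency
cum = list(accumulate(freqs))    # cumulative frequency
rel_cum = [c / n for c in cum]   # relative cumulative frequency

for row in zip(values, freqs, rel, cum, rel_cum):
    print(*row)
```

Dividing each cumulative frequency by N, rather than adding up rounded relative frequencies, avoids accumulating rounding error.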

Graphical – Presentation of Cumulative Frequency

Graphical – Presentation of Relative Frequency

Frequency and Mean

Frequency and Standard Deviation

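Calculation of Standard Deviation

$$\text{Std. deviation } (S) = \sqrt{\frac{\sum (X - \bar{X})^2}{n}}$$

$$\text{Variance} = S^2 = \frac{\sum (X - \bar{X})^2}{n} = \frac{\sum X^2}{n} - 2\bar{X}\,\frac{\sum X}{n} + \frac{\sum \bar{X}^2}{n}$$

But $\frac{\sum X}{n} = \bar{X}$, and $\bar{X}$ is constant, so $\frac{\sum \bar{X}^2}{n} = \bar{X}^2$. Therefore

$$S^2 = \frac{\sum X^2}{n} - 2\bar{X}^2 + \bar{X}^2 = \frac{\sum X^2}{n} - \bar{X}^2$$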

For Group Frequency

$$S^2 = \frac{\sum f (X - \bar{X})^2}{\sum f}$$

$$S = \sqrt{\frac{\sum f (X - \bar{X})^2}{\sum f}}$$

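Example

Data value, x   Frequency, f   f·x²
2               3              12
3               5              45
4               3              48
5               6              150
6               2              72
7               1              49
Total           ∑f = 20        ∑fx² = 376

The mean is $\bar{X} = \sum fx / \sum f = 82/20 = 4.1$, so

$$S = \sqrt{\frac{\sum f x^2}{\sum f} - \bar{X}^2} = \sqrt{\frac{376}{20} - (4.1)^2} = \sqrt{18.8 - 16.81} = \sqrt{1.99} = 1.41$$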
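The shortcut form of the variance, the mean of the squares minus the square of the mean, can be checked numerically against the definition. This is a minimal Python sketch using the DO frequency table; the variable names are illustrative.

```python
import math

values = [2, 3, 4, 5, 6, 7]  # data values x
freqs = [3, 5, 3, 6, 2, 1]   # frequencies f
n = sum(freqs)

mean = sum(f * x for x, f in zip(values, freqs)) / n

# Definition: frequency-weighted average of squared deviations from the mean.
var_definition = sum(f * (x - mean) ** 2 for x, f in zip(values, freqs)) / n

# Shortcut: sum(f*x^2)/sum(f) minus the squared mean.
var_shortcut = sum(f * x * x for x, f in zip(values, freqs)) / n - mean ** 2

print(round(mean, 2), round(math.sqrt(var_shortcut), 2))
```

The two variances agree up to floating-point rounding. The shortcut needs only one pass over the table, but it can lose precision when the mean is large relative to the spread.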

Group Frequency and Statistical Parameter

How to construct a frequency distribution Particulate Matter (PM)

Data for PM at point X in City Y

Class Interval, Tally Mark and Frequency

PM (mg/Nm³)   Frequency
118-126       3
127-135       5
136-144       9
145-153       12
154-162       5
163-171       4
172-180       2
Total         40

Group Frequency and Mean
The class mid mark is the midpoint of the class boundaries, e.g. (117.5 + 126.5)/2 = 122.

Class Interval   Class Boundaries   Class Mid Mark (x)   Frequency (f)   f·x
118-126          117.5-126.5        122                  3               366
127-135          126.5-135.5        131                  5               655
136-144          135.5-144.5        140                  9               1260
145-153          144.5-153.5        149                  12              1788
154-162          153.5-162.5        158                  5               790
163-171          162.5-171.5        167                  4               668
172-180          171.5-180.5        176                  2               352
Total                                                    ∑f = 40         ∑fx = 5879

$$\bar{X} = \frac{\sum f x}{\sum f} = \frac{5879}{40} = 146.975 \approx 147.0$$

Definition of Range and Class Intervals
Range: the difference between the two extreme values, i.e. the maximum and the minimum, is called the range. The range of the observed particulate matter values is 176 - 119 = 57.
Class interval: the overall range can be subdivided into a number of smaller ranges, which are called class intervals. The lengths of the class intervals are usually equal (in the first case the length is kept at 5, in the second case at 9).

How to Choose Class Intervals
Generally, for a large sample, 20 class intervals are chosen, i.e. the class interval length is R/20, where R is the range. For a small sample, R/12 is preferred. The length may also be decided by inspecting the data, so in the second case it is kept at 9. This is called tuning.

Definition of Class Mid Marks
Class mid mark: the middle value of a class interval is called the class mid mark.

Definition of Class Boundaries
Class boundaries: a frequency distribution describes a continuous phenomenon. Thus the value "127" may lie anywhere between 126.5 and 127.5, which is why the class interval 127-135 implies the class boundaries 126.5-135.5. Grouped frequency tables should be developed with class boundaries so that the boundaries cover the whole range of observed values without gap or overlap.

Standard Deviation

Class Mid Mark (x)   Frequency (f)   x²      f·x²
122                  3               14884   44652
131                  5               17161   85805
140                  9               19600   176400
149                  12              22201   266412
158                  5               24964   124820
167                  4               27889   111556
176                  2               30976   61952
Total                ∑f = 40                 ∑fx² = 871597

$$S = \sqrt{\frac{\sum f x^2}{\sum f} - \bar{X}^2} = \sqrt{\frac{871597}{40} - (146.975)^2} = \sqrt{21789.93 - 21601.65} = \sqrt{188.27} = 13.72$$
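The grouped mean and standard deviation for the PM classes can be computed in a few lines of Python. This is a minimal sketch under two assumptions: each class mid mark is taken as the midpoint of its class boundaries (so 117.5-126.5 gives 122.0), and the 154-162 class has frequency 5, as implied by the total of 40. Results depend on the mid marks used.

```python
import math

# Class boundaries and frequencies of the PM table.
# Assumption: frequency 5 for 153.5-162.5 is inferred from the total of 40.
boundaries = [(117.5, 126.5), (126.5, 135.5), (135.5, 144.5), (144.5, 153.5),
              (153.5, 162.5), (162.5, 171.5), (171.5, 180.5)]
freqs = [3, 5, 9, 12, 5, 4, 2]

# Assumption: mid mark = midpoint of the class boundaries.
mid_marks = [(lower + upper) / 2 for lower, upper in boundaries]
n = sum(freqs)

grouped_mean = sum(f * x for x, f in zip(mid_marks, freqs)) / n
grouped_var = sum(f * x * x for x, f in zip(mid_marks, freqs)) / n - grouped_mean ** 2
grouped_sd = math.sqrt(grouped_var)

print(mid_marks)
print(round(grouped_mean, 3), round(grouped_sd, 2))
```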

The Median and Percentiles Using Interpolation
The PM values in the frequency distribution of Table X are assumed to be continuously distributed. In that case the median is the value for which half the total frequency (40/2 = 20) lies above it and half lies below it.

Table X
PM (mg/Nm³)   Frequency
118-126       3
127-135       5
136-144       9
145-153       12
154-162       5
163-171       4
172-180       2
Total         40

The sum of the first three class frequencies is 3 + 5 + 9 = 17. Thus, to reach the desired 20, we require three more of the 12 cases in the fourth class. Since the fourth class interval, 145-153, actually corresponds to values from 144.5 to 153.5, the median must lie 3/12 of the way between 144.5 and 153.5; that is, the median is

$$\text{Median} = 144.5 + \frac{3}{12}(153.5 - 144.5) = 144.5 + 2.25 = 146.75$$

1st quartile: one quarter of the total frequency is 40/4 = 10. The sum of the first two class frequencies is 3 + 5 = 8. Thus, to reach the desired 10, we require 2 more of the 9 cases in the third class. Since the third class interval, 136-144, actually corresponds to values from 135.5 to 144.5, the 1st quartile must lie 2/9 of the way between 135.5 and 144.5; that is, the 1st quartile is

$$Q_1 = 135.5 + \frac{2}{9}(144.5 - 135.5) = 135.5 + 2 = 137.5$$

3rd quartile: three quarters of the total frequency is 3 × 40/4 = 30. The sum of the first four class frequencies is 3 + 5 + 9 + 12 = 29. Thus, to reach the desired 30, we require 1 more of the 5 cases in the fifth class. Since the fifth class interval, 154-162, actually corresponds to values from 153.5 to 162.5, the 3rd quartile must lie 1/5 of the way between 153.5 and 162.5; that is, the 3rd quartile is

$$Q_3 = 153.5 + \frac{1}{5}(162.5 - 153.5) = 153.5 + 1.8 = 155.3$$
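The interpolation used for the median and the quartiles can be written as one general routine. The sketch below is an illustration rather than a library function: `grouped_quantile` is a name chosen here, the class boundaries are those of the grouped PM table (117.5-126.5 through 171.5-180.5), and the 154-162 class is assumed to have frequency 5, as implied by the total of 40.

```python
def grouped_quantile(boundaries, freqs, p):
    """Value below which a fraction p of the total frequency lies,
    found by linear interpolation inside the class that contains it."""
    if not 0 <= p <= 1:
        raise ValueError("p must lie between 0 and 1")
    target = p * sum(freqs)  # cumulative frequency to reach
    cumulative = 0           # frequency of all classes passed so far
    for (lower, upper), f in zip(boundaries, freqs):
        if f > 0 and cumulative + f >= target:
            fraction = (target - cumulative) / f  # share of this class needed
            return lower + fraction * (upper - lower)
        cumulative += f
    raise ValueError("no class with non-zero frequency")


boundaries = [(117.5, 126.5), (126.5, 135.5), (135.5, 144.5), (144.5, 153.5),
              (153.5, 162.5), (162.5, 171.5), (171.5, 180.5)]
freqs = [3, 5, 9, 12, 5, 4, 2]  # assumption: 5 for the 154-162 class

for p in (0.25, 0.50, 0.75):
    print(p, round(grouped_quantile(boundaries, freqs, p), 2))
```

Passing the class boundaries rather than the class limits keeps the routine independent of how the raw values were rounded.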

Mode Median Mean relation

Definition of Statistics: The science of producing unreliable facts from reliable figures. Evan Esar