Published by Vera Budiaman. Modified over 6 years ago.
1
On Average On Spread Frequency Distribution and Statistical Parameters
By Dr D D Basu Advisor, CSE Former Addl. Director, CPCB
2
Central Tendency and Spread: The Two Key Parameters of Data Analysis
3
Concept of Average
The purpose of an average is to represent a group of individual values in a simple and concise manner; the average acts as a representative of the group. The simplest average is called the “mean”, meaning “centre”. All averages are known to statisticians as measures of central tendency. Several types of mean are the arithmetic mean, the weighted arithmetic mean, and the geometric mean.
4
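THE ARITHMETIC MEAN
The arithmetic mean, or briefly the mean, of a set of N numbers X₁, X₂, X₃, …, X_N is denoted by X̄ (read “X bar”) and is defined as
$$\bar{X} = \frac{X_1 + X_2 + \cdots + X_N}{N} = \frac{\sum X}{N} \qquad (1)$$
EXAMPLE. The arithmetic mean of the numbers 8, 3, 5, 12 and 10 is
$$\bar{X} = \frac{8 + 3 + 5 + 12 + 10}{5} = \frac{38}{5} = 7.6$$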
5
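If the numbers X₁, X₂, …, X_K occur f₁, f₂, …, f_K times, respectively (i.e., occur with frequencies f₁, f₂, …, f_K), the arithmetic mean is
$$\bar{X} = \frac{f_1 X_1 + f_2 X_2 + \cdots + f_K X_K}{f_1 + f_2 + \cdots + f_K} = \frac{\sum fX}{\sum f} = \frac{\sum fX}{N} \qquad (2)$$
where N = ∑f is the total frequency (i.e., the total number of cases).
EXAMPLE. If 5, 8, 6, and 2 occur with frequencies 3, 2, 4, and 1, respectively, the arithmetic mean is
$$\bar{X} = \frac{(3)(5) + (2)(8) + (4)(6) + (1)(2)}{3 + 2 + 4 + 1} = \frac{57}{10} = 5.7$$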
6
THE WEIGHTED ARITHMETIC MEAN
Sometimes we associate with the numbers X₁, X₂, …, X_K certain weighting factors (or weights) w₁, w₂, …, w_K, depending on the significance or importance attached to the numbers. In this case,
$$\bar{X} = \frac{w_1 X_1 + w_2 X_2 + \cdots + w_K X_K}{w_1 + w_2 + \cdots + w_K} = \frac{\sum wX}{\sum w} \qquad (3)$$
is called the weighted arithmetic mean. Note the similarity to equation (2), which can be considered a weighted arithmetic mean with weights f₁, f₂, …, f_K.
EXAMPLE. If a final examination in a course is weighted 3 times as much as a quiz and a student has a final examination grade of 85 and quiz grades of 70 and 90, the mean grade is
$$\bar{X} = \frac{(1)(70) + (1)(90) + (3)(85)}{1 + 1 + 3} = \frac{415}{5} = 83$$
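Because a frequency mean is just a weighted mean with the frequencies as weights, one helper can check both worked examples. The following is a minimal Python sketch; the function name `weighted_mean` is illustrative and does not come from the slides.

```python
def weighted_mean(values, weights):
    """Return sum(w * x) / sum(w) for paired values and weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight


# Frequencies used as weights: 5, 8, 6, 2 occurring 3, 2, 4, 1 times.
frequency_mean = weighted_mean([5, 8, 6, 2], [3, 2, 4, 1])

# Grades: two quizzes (weight 1 each) and a final examination (weight 3).
grade_mean = weighted_mean([70, 90, 85], [1, 1, 3])

print(frequency_mean, grade_mean)
```

The two printed values should agree with the hand calculations, 57/10 and 415/5.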
7
THE GEOMETRIC MEAN G
The geometric mean G of a set of N positive numbers X₁, X₂, …, X_N is the Nth root of the product of the numbers:
$$G = \sqrt[N]{X_1 X_2 \cdots X_N}$$
EXAMPLE. Find the mean of 6 and 54.
Arithmetic mean = (6 + 54)/2 = 30
Geometric mean = √(6 × 54) = √324 = 18
8
Example
Find the mean of the SOx values 13, 23, 12, 44, 55 measured in a city on 5 consecutive days.
Arithmetic mean = (13 + 23 + 12 + 44 + 55)/5 = 147/5 = 29.4
Geometric mean = $\sqrt[5]{13 \times 23 \times 12 \times 44 \times 55}$ = 24.4
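The two means in this example can be compared with a short Python sketch. The helper names are illustrative. The geometric mean is computed here by averaging logarithms, which is mathematically the same as taking the Nth root of the product but avoids very large intermediate products.

```python
import math


def arithmetic_mean(values):
    """Sum of the values divided by their count."""
    return sum(values) / len(values)


def geometric_mean(values):
    """Nth root of the product; defined here for positive numbers only."""
    if any(v <= 0 for v in values):
        raise ValueError("geometric mean requires positive values")
    # exp(mean of logs) equals the Nth root of the product.
    return math.exp(sum(math.log(v) for v in values) / len(values))


sox = [13, 23, 12, 44, 55]
sox_am = arithmetic_mean(sox)
sox_gm = geometric_mean(sox)
pair_gm = geometric_mean([6, 54])
print(round(sox_am, 1), round(sox_gm, 1), round(pair_gm, 1))
```

For positive data that are not all equal, the geometric mean is smaller than the arithmetic mean, as both examples show.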
9
MEASURE OF DISPERSION
How far does each data value lie from the mean? For a value X₁ above the mean, X₁ − X̄ is a positive distance; for a value X₂ below the mean, X₂ − X̄ is a negative distance. The absolute value |X − X̄| ignores the sign.
$$\text{Mean deviation} = \frac{\sum |X - \bar{X}|}{N}$$
$$\text{Standard deviation} = S = \sqrt{\frac{\sum (X - \bar{X})^2}{N}}$$
10
EXAMPLES OF STANDARD DEVIATION
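Find the standard deviation S of the set of numbers 12, 6, 7, 3, 15, 10, 18, 5.
$$\bar{X} = \frac{\sum X}{N} = \frac{12 + 6 + 7 + 3 + 15 + 10 + 18 + 5}{8} = \frac{76}{8} = 9.5$$
$$S = \sqrt{\frac{\sum (X - \bar{X})^2}{N}} = \sqrt{\frac{(12-9.5)^2 + (6-9.5)^2 + (7-9.5)^2 + (3-9.5)^2 + (15-9.5)^2 + (10-9.5)^2 + (18-9.5)^2 + (5-9.5)^2}{8}} = \sqrt{\frac{190}{8}} = \sqrt{23.75} = 4.87$$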
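The same eight numbers can be run through both dispersion measures in Python. This is a minimal sketch that uses the divisor N, as the slides do, rather than the N − 1 form found in some textbooks; the variable names are illustrative.

```python
import math

data = [12, 6, 7, 3, 15, 10, 18, 5]
n = len(data)
mean = sum(data) / n

# Mean deviation: average absolute distance from the mean.
mean_deviation = sum(abs(x - mean) for x in data) / n

# Standard deviation with divisor N (the form used on the slide).
std_deviation = math.sqrt(sum((x - mean) ** 2 for x in data) / n)

print(mean, mean_deviation, round(std_deviation, 2))
```

Squaring gives large deviations more influence than taking absolute values, so the standard deviation is never smaller than the mean deviation.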
11
Concept of Frequency
12
Definition
The rate at which something occurs over a period of time is called its frequency; that is, frequency is the number of occurrences of an event per unit time.
13
Example
Dissolved oxygen (DO) was measured 20 times at a sampling point in a river. The values are 5, 6, 3, 3, 2, 4, 7, 5, 2, 3, 5, 6, 5, 4, 4, 3, 5, 2, 5, 3.
14
Frequency table of DO at sampling point X of city Y
Sr. No.   Data Value   Frequency
1         2            3
2         3            5
3         4            3
4         5            6
5         6            2
6         7            1
TOTAL                  20
15
Relative and Cumulative Frequency
Relative frequency: the fraction or proportion of times a value occurs.
Cumulative frequency: the running total of the frequencies up to and including the current value.
Relative cumulative frequency: the running total of the relative frequencies up to and including the current value.
16
Table: Frequency, Relative frequency, Cumulative frequency, Relative Cumulative frequency
Sr. No.   Data Value   Frequency   Relative Frequency   Cumulative Frequency   Relative Cumulative Frequency
1         2            3           3/20 = 0.15          3                      0.15
2         3            5           5/20 = 0.25          3+5 = 8                0.40
3         4            3           3/20 = 0.15          3+5+3 = 11             0.55
4         5            6           6/20 = 0.30          17                     0.85
5         6            2           2/20 = 0.10          19                     0.95
6         7            1           1/20 = 0.05          20                     1.00
                       N = 20
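The four columns of this table can be rebuilt from the 20 dissolved-oxygen readings. The sketch below uses `collections.Counter` and `itertools.accumulate` from the Python standard library; the variable names are illustrative.

```python
from collections import Counter
from itertools import accumulate

do_values = [5, 6, 3, 3, 2, 4, 7, 5, 2, 3, 5, 6, 5, 4, 4, 3, 5, 2, 5, 3]
n = len(do_values)

counts = Counter(do_values)                 # data value -> frequency
values = sorted(counts)
frequency = [counts[v] for v in values]
relative = [f / n for f in frequency]
cumulative = list(accumulate(frequency))    # running total of frequencies
relative_cumulative = [c / n for c in cumulative]

for row in zip(values, frequency, relative, cumulative, relative_cumulative):
    print("{:>5} {:>9} {:>8.2f} {:>10} {:>8.2f}".format(*row))
```

The last cumulative frequency must equal N and the last relative cumulative frequency must equal 1, which is a quick check on any hand-built table.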
17
Graphical – Presentation of Cumulative Frequency
18
Graphical – Presentation of Relative Frequency
19
Frequency and Mean
20
Frequency and Standard Deviation
21
Calculation of Standard Deviation
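$$\text{Std. Deviation } (S) = \sqrt{\frac{\sum (X - \bar{X})^2}{n}}$$
$$\text{Variance} = S^2 = \frac{\sum (X - \bar{X})^2}{n} = \frac{\sum X^2}{n} - \frac{2\bar{X}\sum X}{n} + \frac{\sum \bar{X}^2}{n}$$
But ∑X/n = X̄, and X̄ is a constant, so ∑X̄²/n = X̄². Therefore
$$S^2 = \frac{\sum X^2}{n} - 2\bar{X}^2 + \bar{X}^2 = \frac{\sum X^2}{n} - \bar{X}^2$$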
22
For Group Frequency
$$S^2 = \frac{\sum f (X - \bar{X})^2}{\sum f}, \qquad S = \sqrt{\frac{\sum f (X - \bar{X})^2}{\sum f}}$$
23
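Example
Data value, x   Frequency, f   f x²
2               3              12
3               5              45
4               3              48
5               6              150
6               2              72
7               1              49
∑f = 20, ∑fx² = 376, X̄ = 4.1
$$S = \sqrt{\frac{\sum f x^2}{\sum f} - \bar{X}^2} = \sqrt{\frac{376}{20} - (4.1)^2} = \sqrt{18.8 - 16.81} = \sqrt{1.99} = 1.41$$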
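As a check that the shortcut form agrees with the definition of the variance, both can be evaluated on the x and f columns of this example. This is a small Python sketch with illustrative variable names.

```python
import math

x_values = [2, 3, 4, 5, 6, 7]
freqs = [3, 5, 3, 6, 2, 1]

n = sum(freqs)
mean = sum(f * x for x, f in zip(x_values, freqs)) / n

# Definition: S^2 = sum f (x - mean)^2 / sum f
var_definition = sum(f * (x - mean) ** 2 for x, f in zip(x_values, freqs)) / n

# Shortcut: S^2 = sum f x^2 / sum f - mean^2
sum_fx2 = sum(f * x * x for x, f in zip(x_values, freqs))
var_shortcut = sum_fx2 / n - mean ** 2

print(sum_fx2, round(mean, 2), round(math.sqrt(var_shortcut), 2))
```

The two variances agree up to floating-point rounding. The shortcut needs only the running totals ∑f, ∑fx and ∑fx², which is why it suits hand calculation.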
24
Group Frequency and Statistical Parameter
25
How to construct a frequency distribution
Particulate Matter (PM)
26
Data for PM at point X in City Y
27
Class Interval, Tally Mark and Frequency
29
PM (mg/Nm³)   Frequency
118 – 126     3
127 – 135     5
136 – 144     9
145 – 153     12
154 – 162     5
163 – 171     4
172 – 180     2
Total         40
31
Group Frequency and Mean
Class Interval (Class Boundaries)   Class Mid Mark (x)   Frequency (f)   Frequency × Mid Mark (f x)
117.5 – 126.5                       122.5                3               367.5
126.5 – 135.5                       131.5                5               657.5
135.5 – 144.5                       140.5                9               1264.5
144.5 – 153.5                       149.5                12              1794
153.5 – 162.5                       159.5                5               797.5
162.5 – 171.5                       167.5                4               670
171.5 – 180.5                       176.5                2               353
                                                         ∑f = 40         ∑fx = 5904
Mean = ∑fx / ∑f = 5904 / 40 = 147.6
32
Definition of Range and Class Intervals
Range – The difference between the two extreme values, i.e. the maximum and the minimum, is called the range. The range of the observed particulate matter values is 176 – 119 = 57.
Class Interval – The overall range can be subdivided into a number of smaller ranges, which are called class intervals. The lengths of the class intervals are usually equal (in the first case the length is kept at 5, in the second case at 9).
33
How to choose Class Intervals
Generally, for a large sample, 20 class intervals are chosen, i.e. the class interval is R/20, where R is the range. For a small sample, R/12 is preferred. The class interval may also be decided by inspecting the data, so in the second case it is kept at 9. This is called tuning.
34
Definition of Class Mid marks
Class mid mark – The middle value of a class interval is called the class mid mark.
35
Definition of Class Boundaries
Class boundaries – A frequency distribution is a continuous phenomenon. Thus the value “127” may be located anywhere between 126.5 and 127.5, which is why the class interval 127 – 135 implies the class boundaries 126.5 – 135.5. Grouped frequency tables shall be developed with class boundaries, so that the class boundaries cover the whole range of observed values without gap or overlap.
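The step from class limits to class boundaries can be sketched in Python. The class limits below (seven classes of width 9 starting at 118) are an assumption inferred from the interval 127 – 135 mentioned above and from the boundaries in the grouped table. The function name `class_boundaries` is illustrative.

```python
def class_boundaries(lower_limit, upper_limit, unit=1):
    """Extend class limits by half a measurement unit on each side."""
    half = unit / 2
    return (lower_limit - half, upper_limit + half)


# Assumed class limits: 7 classes of width 9 starting at 118.
width, first_lower, n_classes = 9, 118, 7
limits = [(first_lower + i * width, first_lower + i * width + width - 1)
          for i in range(n_classes)]
boundaries = [class_boundaries(lo, hi) for lo, hi in limits]

# Adjacent classes should share a boundary: no gap and no overlap.
contiguous = all(boundaries[i][1] == boundaries[i + 1][0]
                 for i in range(n_classes - 1))
print(boundaries, contiguous)
```

With readings recorded to the nearest whole unit, each upper boundary coincides with the next lower boundary, so every possible value falls in exactly one class.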
36
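Standard Deviation
Class Mid Mark (x)   Frequency (f)   x²          f x²
122.5                03              15006.25    45018.75
131.5                05              17292.25    86461.25
140.5                09              19740.25    177662.25
149.5                12              22350.25    268203
159.5                05              25440.25    127201.25
167.5                04              28056.25    112225
176.5                02              31152.25    62304.5
                     ∑f = 40                     ∑f x² = 879076
$$S = \sqrt{\frac{\sum f x^2}{\sum f} - \bar{X}^2} = \sqrt{\frac{879076}{40} - (147.6)^2} = \sqrt{21976.9 - 21785.76} = \sqrt{191.14} \approx 13.83$$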
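The grouped mean and standard deviation for the particulate matter data can be recomputed from the mid marks and frequencies. The sketch below takes the mid marks exactly as tabulated on the slides and restores the fifth-class frequency of 5 from the f × mid mark column (797.5 / 159.5); the variable names are illustrative.

```python
import math

# Class mid marks and frequencies as tabulated on the slides.
mid_marks = [122.5, 131.5, 140.5, 149.5, 159.5, 167.5, 176.5]
freqs = [3, 5, 9, 12, 5, 4, 2]

n = sum(freqs)
sum_fx = sum(f * x for x, f in zip(mid_marks, freqs))
sum_fx2 = sum(f * x * x for x, f in zip(mid_marks, freqs))

mean = sum_fx / n
# Shortcut form: S = sqrt(sum f x^2 / sum f - mean^2)
std = math.sqrt(sum_fx2 / n - mean ** 2)

print(n, sum_fx, sum_fx2, round(mean, 1), round(std, 2))
```

The result is a standard deviation of about 13.8 around a mean of 147.6. Because every reading in a class is replaced by its mid mark, these are approximations to the statistics of the raw data.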
37
The median and percentiles Using Interpolation
The PM values in the frequency distribution of Table X are assumed to be continuously distributed. In such a case the median is that value for which half the total frequency (40/2 = 20) lies above it and half lies below it.
Table X
PM (mg/Nm³)   Frequency
118 – 126     3
127 – 135     5
136 – 144     9
145 – 153     12
154 – 162     5
163 – 171     4
172 – 180     2
Total         40
38
Now the sum of the first three class frequencies is 3 + 5 + 9 = 17. Thus, to give the desired 20, we require three more of the 12 cases in the fourth class. Since the fourth class interval, 145 – 153, actually corresponds to values 144.5 to 153.5, the median must lie 3/12 of the way between 144.5 and 153.5; that is, the median is
$$144.5 + \frac{3}{12}(153.5 - 144.5) = 144.5 + 2.25 = 146.75$$
1st quartile: one quarter of the total frequency is 40/4 = 10. The sum of the first two class frequencies is 3 + 5 = 8. Thus, to give the desired 10, we require 2 more of the 9 cases in the 3rd class. Since the 3rd class interval, 136 – 144, actually corresponds to values 135.5 to 144.5, the 1st quartile must lie 2/9 of the way between 135.5 and 144.5; that is, the 1st quartile is
$$135.5 + \frac{2}{9}(144.5 - 135.5) = 135.5 + 2 = 137.5$$
39
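3rd quartile: three quarters of the total frequency is 3 × 40/4 = 30. The sum of the first four class frequencies is 3 + 5 + 9 + 12 = 29. Thus, to give the desired 30, we require 1 more of the 5 cases in the 5th class. Since the 5th class interval, 154 – 162, actually corresponds to values 153.5 to 162.5, the 3rd quartile must lie 1/5 of the way between 153.5 and 162.5; that is, the 3rd quartile is
$$153.5 + \frac{1}{5}(162.5 - 153.5) = 153.5 + 1.8 = 155.3$$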
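The interpolation used for the median and the quartiles follows one pattern: find the class in which the target cumulative count falls, then move the matching fraction of the way through that class. The sketch below is a hedged Python illustration. The function name `grouped_quantile` is hypothetical, and the class boundaries and frequencies are those of the grouped PM table.

```python
def grouped_quantile(boundaries, freqs, fraction):
    """Interpolate the value below which `fraction` of the cases lie."""
    target = fraction * sum(freqs)
    cumulative = 0
    for (lower, upper), f in zip(boundaries, freqs):
        if f > 0 and cumulative + f >= target:
            # Move (target - cumulative) / f of the way through this class.
            return lower + (target - cumulative) / f * (upper - lower)
        cumulative += f
    raise ValueError("fraction must be between 0 and 1")


boundaries = [(117.5, 126.5), (126.5, 135.5), (135.5, 144.5), (144.5, 153.5),
              (153.5, 162.5), (162.5, 171.5), (171.5, 180.5)]
freqs = [3, 5, 9, 12, 5, 4, 2]

q1 = grouped_quantile(boundaries, freqs, 0.25)
median = grouped_quantile(boundaries, freqs, 0.50)
q3 = grouped_quantile(boundaries, freqs, 0.75)
print(q1, median, q3)
```

Any other percentile can be obtained the same way by changing `fraction`, under the same assumption that values are spread evenly within each class.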
40
Mode Median Mean relation
41
Definition of Statistics: The science of producing unreliable facts from reliable figures. Evan Esar