On Average On Spread Frequency Distribution and Statistical Parameters

1 On Average On Spread Frequency Distribution and Statistical Parameters
By Dr D D Basu, Advisor, CSE; Former Addl. Director, CPCB

2 Central Tendency and Spread: The Two key parameters of Data Analysis

3 Concept of Average
The purpose of an average is to represent a group of individual values in a simple and concise manner; the average acts as a representative of the group. The simplest average is called the “mean”, meaning “centre”. All averages are known to statisticians as measures of central tendency. Several types of mean are:
The Arithmetic mean
The Weighted Arithmetic mean
The Geometric mean

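4 THE ARITHMETIC MEAN
The arithmetic mean, or briefly the mean, of a set of N numbers $X_1, X_2, X_3, \ldots, X_N$ is denoted by $\bar{X}$ (read “X bar”) and is defined as
$$\bar{X} = \frac{X_1 + X_2 + X_3 + \cdots + X_N}{N} = \frac{\sum X}{N} \qquad (1)$$
EXAMPLE. The arithmetic mean of the numbers 8, 3, 5, 12 and 10 is
$$\bar{X} = \frac{8 + 3 + 5 + 12 + 10}{5} = \frac{38}{5} = 7.6$$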
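The definition above can be checked with a few lines of Python. This is only a minimal sketch; the function name `arithmetic_mean` is an illustrative choice and does not come from the slides.

```python
def arithmetic_mean(values):
    """Return the sum of the values divided by the number of values."""
    values = list(values)
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


# The example from the slide: 8, 3, 5, 12 and 10.
print(arithmetic_mean([8, 3, 5, 12, 10]))  # 7.6
```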

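5 If the numbers $X_1, X_2, \ldots, X_K$ occur $f_1, f_2, \ldots, f_K$ times, respectively (i.e., occur with frequencies $f_1, f_2, \ldots, f_K$), the arithmetic mean is
$$\bar{X} = \frac{f_1 X_1 + f_2 X_2 + \cdots + f_K X_K}{f_1 + f_2 + \cdots + f_K} = \frac{\sum fX}{\sum f} = \frac{\sum fX}{N} \qquad (2)$$
where $N = \sum f$ is the total frequency (i.e., the total number of cases).
EXAMPLE. If 5, 8, 6, and 2 occur with frequencies 3, 2, 4, and 1, respectively, the arithmetic mean is
$$\bar{X} = \frac{(3)(5) + (2)(8) + (4)(6) + (1)(2)}{3 + 2 + 4 + 1} = \frac{15 + 16 + 24 + 2}{10} = \frac{57}{10} = 5.7$$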

6 THE WEIGHTED ARITHMETIC MEAN
Sometimes we associate with the numbers $X_1, X_2, \ldots, X_K$ certain weighting factors (or weights) $w_1, w_2, \ldots, w_K$, depending on the significance or importance attached to the numbers. In this case,
$$\bar{X} = \frac{w_1 X_1 + w_2 X_2 + \cdots + w_K X_K}{w_1 + w_2 + \cdots + w_K} = \frac{\sum wX}{\sum w} \qquad (3)$$
is called the weighted arithmetic mean. Note the similarity to equation (2), which can be considered a weighted arithmetic mean with weights $f_1, f_2, \ldots, f_K$.
EXAMPLE. If a final examination in a course is weighted 3 times as much as a quiz and a student has a final examination grade of 85 and quiz grades of 70 and 90, the mean grade is
$$\bar{X} = \frac{(1)(70) + (1)(90) + (3)(85)}{1 + 1 + 3} = \frac{415}{5} = 83$$
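Because a frequency is just a particular kind of weight, one small function covers both the frequency example and the examination example. The sketch below assumes the values and weights are supplied as two parallel lists; the name `weighted_mean` is illustrative.

```python
def weighted_mean(values, weights):
    """Return sum(w * x) / sum(w) for paired values and weights."""
    values, weights = list(values), list(weights)
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("the weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight


# Frequencies used as weights: 5, 8, 6, 2 occurring 3, 2, 4, 1 times.
print(weighted_mean([5, 8, 6, 2], [3, 2, 4, 1]))  # 5.7

# Two quizzes (weight 1 each) and a final examination (weight 3).
print(weighted_mean([70, 90, 85], [1, 1, 3]))  # 83.0
```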

7 THE GEOMETRIC MEAN G
The geometric mean G of a set of N positive numbers $X_1, X_2, \ldots, X_N$ is the Nth root of the product of the numbers:
$$G = \sqrt[N]{X_1 X_2 X_3 \cdots X_N}$$
EXAMPLE: Find the mean of 6 and 54.
Arithmetic Mean = $\frac{6 + 54}{2} = 30$
Geometric Mean = $\sqrt{6 \times 54} = \sqrt{324} = 18$

8 Example
Find the mean of the SOx values 13, 23, 12, 44, 55 measured in a city on 5 consecutive days.
Arithmetic Mean = $\frac{13 + 23 + 12 + 44 + 55}{5} = \frac{147}{5} = 29.4$
Geometric Mean = $\sqrt[5]{13 \times 23 \times 12 \times 44 \times 55} = 24.4$
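The two geometric-mean examples can be reproduced as follows. This is a sketch rather than a reference implementation: it averages logarithms instead of multiplying the values directly, which is an implementation choice made here to avoid very large products, and the function name is illustrative.

```python
import math


def geometric_mean(values):
    """Return the Nth root of the product of N positive numbers."""
    values = list(values)
    if not values or any(v <= 0 for v in values):
        raise ValueError("need a non-empty set of positive numbers")
    # exp(mean(log x)) equals the Nth root of the product of the x values.
    return math.exp(math.fsum(math.log(v) for v in values) / len(values))


print(round(geometric_mean([6, 54]), 2))                # 18.0
print(round(geometric_mean([13, 23, 12, 44, 55]), 1))   # 24.4
```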

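9 MEASURE OF DISPERSION
How far is each data point located from the mean? $X_1 - \bar{X}$ is the distance from the mean of a point $X_1$ that lies above it (a positive deviation), and $X_2 - \bar{X}$ is the distance from the mean of a point $X_2$ that lies below it (a negative deviation). $|X - \bar{X}|$ ignores the sign.
Mean deviation = $\frac{\sum |X - \bar{X}|}{N}$
Standard deviation = $S = \sqrt{\frac{\sum (X - \bar{X})^2}{N}}$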

10 EXAMPLES OF STANDARD DEVIATION
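Find the standard deviation S of the set of numbers in problem (a): 12, 6, 7, 3, 15, 10, 18, 5.
$$\bar{X} = \frac{\sum X}{N} = \frac{12 + 6 + 7 + 3 + 15 + 10 + 18 + 5}{8} = \frac{76}{8} = 9.5$$
$$S = \sqrt{\frac{\sum (X - \bar{X})^2}{N}} = \sqrt{\frac{(12-9.5)^2 + (6-9.5)^2 + (7-9.5)^2 + (3-9.5)^2 + (15-9.5)^2 + (10-9.5)^2 + (18-9.5)^2 + (5-9.5)^2}{8}} = \sqrt{\frac{190}{8}} = \sqrt{23.75} = 4.87$$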
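Both dispersion measures can be computed directly from their definitions. The sketch below divides by N (the population form used on the slides, not the N − 1 sample form); the function names are illustrative.

```python
import math


def mean_deviation(values):
    """Average absolute distance of the values from their mean."""
    values = list(values)
    mean = sum(values) / len(values)
    return sum(abs(x - mean) for x in values) / len(values)


def standard_deviation(values):
    """Square root of the average squared distance from the mean (divide by N)."""
    values = list(values)
    mean = sum(values) / len(values)
    return math.sqrt(sum((x - mean) ** 2 for x in values) / len(values))


data = [12, 6, 7, 3, 15, 10, 18, 5]
print(mean_deviation(data))                  # 4.25
print(round(standard_deviation(data), 2))    # 4.87
```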

11 Concept of Frequency

12 Definition
Frequency is the rate at which something occurs over a period of time, i.e. the number of occurrences of an event per unit time. In a data set, the frequency of a value is the number of times that value occurs.

13 Example
Dissolved oxygen (DO) was measured 20 times at a sampling point in a river. The values are 5, 6, 3, 3, 2, 4, 7, 5, 2, 3, 5, 6, 5, 4, 4, 3, 5, 2, 5, 3

14 Frequency table of DO at sampling point X of city Y
Sr. No.   Data Value   Frequency
1         2            3
2         3            5
3         4            3
4         5            6
5         6            2
6         7            1
TOTAL                  20

15 Relative and Cumulative Frequency
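Relative frequency: the fraction or proportion of times a value occurs, i.e. its frequency divided by the total number of observations.
Cumulative frequency: the accumulation of the frequencies of all values up to and including the current one.
Relative cumulative frequency: the accumulation of the relative frequencies of all values up to and including the current one.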

16 Table: Frequency, Relative frequency, Cumulative frequency, Relative Cumulative frequency
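Sr. No.   Data Value   Frequency   Relative Frequency   Cumulative Frequency   Relative Cumulative Frequency
1         2            3           3/20 = 0.15          3                      0.15
2         3            5           5/20 = 0.25          3+5 = 8                0.15+0.25 = 0.40
3         4            3           3/20 = 0.15          3+5+3 = 11             0.40+0.15 = 0.55
4         5            6           6/20 = 0.30          11+6 = 17              0.55+0.30 = 0.85
5         6            2           2/20 = 0.10          17+2 = 19              0.85+0.10 = 0.95
6         7            1           1/20 = 0.05          19+1 = 20              0.95+0.05 = 1.00
N = 20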
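The table can be rebuilt from the 20 dissolved-oxygen readings with a short script. This is a sketch using `collections.Counter` from the Python standard library; the function name `frequency_table` and the row layout are choices made here for illustration.

```python
from collections import Counter


def frequency_table(observations):
    """Return rows of (value, frequency, relative, cumulative, relative cumulative)."""
    counts = Counter(observations)
    n = len(observations)
    rows = []
    cumulative = 0
    for value in sorted(counts):
        f = counts[value]
        cumulative += f
        rows.append((value, f, f / n, cumulative, cumulative / n))
    return rows


do_values = [5, 6, 3, 3, 2, 4, 7, 5, 2, 3, 5, 6, 5, 4, 4, 3, 5, 2, 5, 3]
for value, f, rel, cum, rel_cum in frequency_table(do_values):
    print(f"{value:>5} {f:>4} {rel:>6.2f} {cum:>4} {rel_cum:>6.2f}")
```

Dividing the running count by N for the last column, rather than adding up rounded relative frequencies, keeps the final entry at exactly 1.00.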

17 Graphical – Presentation of Cumulative Frequency

18 Graphical – Presentation of Relative Frequency

19 Frequency and Mean

20 Frequency and Standard Deviation

21 Calculation of Standard Deviation
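Std. Deviation: $S = \sqrt{\frac{\sum (X - \bar{X})^2}{n}}$
Variance:
$$S^2 = \frac{\sum (X - \bar{X})^2}{n} = \frac{\sum X^2}{n} - 2\bar{X}\,\frac{\sum X}{n} + \frac{\sum \bar{X}^2}{n}$$
But $\frac{\sum X}{n} = \bar{X}$, and $\bar{X}$ is constant, so $\frac{\sum \bar{X}^2}{n} = \bar{X}^2$ and
$$S^2 = \frac{\sum X^2}{n} - 2\bar{X}^2 + \bar{X}^2 = \frac{\sum X^2}{n} - \bar{X}^2$$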
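A quick numerical check that the definition of the variance and the shortcut form agree. This is a sketch only; the data set is the eight numbers from the standard deviation example, and both function names are illustrative.

```python
def variance_by_definition(values):
    """Average squared deviation from the mean (divide by n)."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n


def variance_by_shortcut(values):
    """Mean of the squares minus the square of the mean."""
    n = len(values)
    mean = sum(values) / n
    return sum(x * x for x in values) / n - mean ** 2


data = [12, 6, 7, 3, 15, 10, 18, 5]
print(variance_by_definition(data))  # 23.75
print(variance_by_shortcut(data))    # 23.75
```

The shortcut needs only the running totals of X and X², which is convenient for hand calculation, but with floating-point numbers it can lose precision when the mean is large relative to the spread.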

22 For Group Frequency
$$S^2 = \frac{\sum f (X - \bar{X})^2}{\sum f}, \qquad S = \sqrt{\frac{\sum f (X - \bar{X})^2}{\sum f}}$$

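23 Example
Data value, x   Frequency, f   f x²
2               3              12
3               5              45
4               3              48
5               6              150
6               2              72
7               1              49
∑ f = 20, ∑ f x² = 376, and the mean is $\bar{X} = \frac{\sum f x}{\sum f} = \frac{82}{20} = 4.1$
$$S = \sqrt{\frac{\sum f x^2}{\sum f} - \bar{X}^2} = \sqrt{\frac{376}{20} - (4.1)^2} = \sqrt{18.8 - 16.81} = \sqrt{1.99} = 1.41$$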
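The same calculation can be carried out for any frequency table. The sketch below uses the shortcut form of the variance with frequencies as weights; the function name `grouped_mean_and_sd` is an illustrative choice.

```python
import math


def grouped_mean_and_sd(values, frequencies):
    """Return (mean, standard deviation) of a frequency table, dividing by sum(f)."""
    n = sum(frequencies)
    mean = sum(f * x for x, f in zip(values, frequencies)) / n
    mean_square = sum(f * x * x for x, f in zip(values, frequencies)) / n
    # Guard against a tiny negative result caused by floating-point rounding.
    variance = max(0.0, mean_square - mean ** 2)
    return mean, math.sqrt(variance)


mean, sd = grouped_mean_and_sd([2, 3, 4, 5, 6, 7], [3, 5, 3, 6, 2, 1])
print(round(mean, 2), round(sd, 2))  # 4.1 1.41
```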

24 Group Frequency and Statistical Parameter

25 How to construct a frequency distribution
Particulate Matter (PM)

26 Data for PM at point X in City Y

27 Class Interval, Tally Mark and Frequency

28

29
PM (mg/Nm³)   Frequency
118 – 126     3
127 – 135     5
136 – 144     9
145 – 153     12
154 – 162     5
163 – 171     4
172 – 180     2
Total         40

30

31 Group Frequency and Mean
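Class Interval   Class Boundaries   Class Mid Mark (x)   Frequency (f)   Frequency × Mid Mark (f x)
118 – 126        117.5 – 126.5      122.5                3               367.5
127 – 135        126.5 – 135.5      131.5                5               657.5
136 – 144        135.5 – 144.5      140.5                9               1264.5
145 – 153        144.5 – 153.5      149.5                12              1794
154 – 162        153.5 – 162.5      159.5                5               797.5
163 – 171        162.5 – 171.5      167.5                4               670
172 – 180        171.5 – 180.5      176.5                2               353
∑ f = 40, ∑ f x = 5904
Mean = ∑ f x / ∑ f = 5904 / 40 = 147.6
Note that the exact midpoints of these class boundaries are 122, 131, 140, 149, 158, 167 and 176, which give ∑ f x = 5879 and a mean of 5879 / 40 ≈ 147.0; the mid marks tabulated here are slightly higher.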

32 Definition on Range and Class Intervals
Range – The difference between the two extreme values, i.e. the maximum and the minimum, is called the range. The range of the observed particulate matter values is 176 – 119 = 57.
Class Interval – The overall range can be subdivided into a number of smaller ranges, which are called class intervals. The lengths of the class intervals are usually equal (in the first case the length is kept at 5, in the second case at 9).

33 How to choose Class Intervals
Generally, for a large sample about 20 class intervals are chosen, i.e. the class interval length is R/20, where R is the range. For a small sample, R/12 is preferred. The length may also be adjusted after inspecting the data, so in the second case it is kept at 9. This is called tuning.
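The rule of thumb and the tuned length of 9 can be sketched as below. The starting value 118 is an assumption made here (it is not stated on this slide), the data are assumed to be recorded as whole numbers, and both function names are illustrative.

```python
def suggested_length(minimum, maximum, n_classes):
    """Rule-of-thumb class interval length: range divided by number of classes."""
    return (maximum - minimum) / n_classes


def class_intervals(start, length, maximum):
    """List inclusive (lower, upper) class limits for whole-number data."""
    if length < 1:
        raise ValueError("length must be a positive whole number")
    intervals = []
    lower = start
    while lower <= maximum:
        intervals.append((lower, lower + length - 1))
        lower += length
    return intervals


# Observed PM range: minimum 119, maximum 176, so R = 57.
print(suggested_length(119, 176, 12))  # 4.75, rounded up to 5 in the first case

# Second case: length tuned to 9, starting (by assumption) at 118.
print(class_intervals(118, 9, 176))
```

With a length of 9 the observed range is covered by seven classes, from 118 – 126 up to 172 – 180.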

34 Definition of Class Mid marks
Class mid mark – The middle value of a class interval is called the class mid mark.

35 Definition of Class Boundaries
Class boundaries – A frequency distribution describes a continuous phenomenon. Thus a recorded value of “127” may lie anywhere between 126.5 and 127.5, which is why the class interval 127 – 135 implies the class boundaries 126.5 – 135.5. Grouped frequency tables shall be developed with class boundaries so that the boundaries cover the whole range of observed values without gap or overlap.
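Converting class limits to class boundaries is a half-unit shift on each side. The sketch below assumes values are recorded to the nearest whole unit; the `unit` parameter and the function name are illustrative additions.

```python
def class_boundaries(lower_limit, upper_limit, unit=1):
    """Extend the class limits by half a recording unit on each side."""
    half = unit / 2
    return (lower_limit - half, upper_limit + half)


print(class_boundaries(127, 135))  # (126.5, 135.5)

# Adjacent classes share a boundary, so there is neither a gap nor an overlap.
assert class_boundaries(127, 135)[1] == class_boundaries(136, 144)[0]
```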

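36 Standard Deviation
Class Mid mark (x)   Frequency (f)   x²          f x²
122.5                03              15006.25    45018.75
131.5                05              17292.25    86461.25
140.5                09              19740.25    177662.25
149.5                12              22350.25    268203
159.5                05              25440.25    127201.25
167.5                04              28056.25    112225
176.5                02              31152.25    62304.5
∑ f = 40, ∑ f x² = 879076
$$S = \sqrt{\frac{\sum f x^2}{\sum f} - \bar{X}^2} = \sqrt{\frac{879076}{40} - (147.6)^2} = \sqrt{21976.9 - 21785.76} = \sqrt{191.14} \approx 13.82$$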

37 The median and percentiles Using Interpolation
The PM values in the frequency distribution of Table X are assumed to be continuously distributed. In such a case the median is that value for which half the total frequency (40/2 = 20) lies above it and half lies below it.
Table X
PM (mg/Nm³)   Frequency
118 – 126     3
127 – 135     5
136 – 144     9
145 – 153     12
154 – 162     5
163 – 171     4
172 – 180     2
Total         40

38 Now the sum of the first three class frequencies is 3 + 5 + 9 = 17. Thus, to give the desired 20, we require three more of the 12 cases in the fourth class. Since the fourth class interval, 145 – 153, actually corresponds to values from 144.5 to 153.5, the median must lie 3/12 of the way between 144.5 and 153.5; that is, the median is
$$144.5 + \frac{3}{12}(153.5 - 144.5) = 144.5 + 2.25 = 146.75$$
The 1st quartile is the value below which one quarter of the total frequency (40/4 = 10) lies. The sum of the first two class frequencies is 3 + 5 = 8. Thus, to give the desired 10, we require 2 more of the 9 cases in the 3rd class. Since the 3rd class interval, 136 – 144, actually corresponds to values from 135.5 to 144.5, the 1st quartile must lie 2/9 of the way between 135.5 and 144.5; that is, the 1st quartile is
$$135.5 + \frac{2}{9}(144.5 - 135.5) = 135.5 + 2 = 137.5$$

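39 The 3rd quartile is the value below which three quarters of the total frequency (3 × 40/4 = 30) lies. The sum of the first four class frequencies is 3 + 5 + 9 + 12 = 29. Thus, to give the desired 30, we require 1 more of the 5 cases in the 5th class. Since the 5th class interval, 154 – 162, actually corresponds to values from 153.5 to 162.5, the 3rd quartile must lie 1/5 of the way between 153.5 and 162.5; that is, the 3rd quartile is
$$153.5 + \frac{1}{5}(162.5 - 153.5) = 153.5 + 1.8 = 155.3$$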
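The interpolation used for the median and the quartiles follows one pattern, which can be sketched as a single function. It assumes equal-length classes described by their lower boundaries and that values are spread evenly within each class; the function name and argument layout are illustrative.

```python
def grouped_quantile(lower_boundaries, length, frequencies, fraction):
    """Interpolate the value below which `fraction` of the total frequency lies."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    target = fraction * sum(frequencies)
    cumulative = 0
    for lower, f in zip(lower_boundaries, frequencies):
        if f > 0 and cumulative + f >= target:
            # Go (cases still needed / cases in class) of the way through the class.
            return lower + (target - cumulative) / f * length
        cumulative += f
    raise ValueError("frequencies must contain at least one positive count")


lower_boundaries = [117.5, 126.5, 135.5, 144.5, 153.5, 162.5, 171.5]
frequencies = [3, 5, 9, 12, 5, 4, 2]

for name, fraction in [("1st quartile", 0.25), ("median", 0.5), ("3rd quartile", 0.75)]:
    print(name, round(grouped_quantile(lower_boundaries, 9, frequencies, fraction), 2))
```

Any other percentile follows by changing `fraction`, for example 0.9 for the 90th percentile.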

40 Mode Median Mean relation

41 Definition of Statistics: The science of producing unreliable facts from reliable figures. Evan Esar

