Presentation on Statistics for Research Lecture 7.


Contents
What is Statistics? Its scope
Is Statistics Science or Arts? Debatable
Types of Data
Presentation of Data
Measures of Central Tendency
Measures of Variability
Chi-square test
T test for testing the difference between two means

What is Statistics? "Statistics is a body of methods or tools for obtaining knowledge." That is, statistics is a tool for obtaining knowledge. Example: the correlation coefficient between height and weight is +0.85 (a correlation coefficient always lies between -1 and +1).

Functions of statistics:
Presents facts in a definite form
Simplifies huge numbers of figures and facilitates analysis
Helps in formulating and testing hypotheses
Helps in prediction

Scope of Statistics: Vast, unlimited and ever increasing. It is applied in, for example, biostatistics, industrial statistics, informatics, design of experiments in agricultural production, demography, queuing theory, stochastic processes, psychology, sociology and public administration.

Types of Data: There are three main types of data: 1. cross-sectional, 2. time series and 3. panel data.

Cross Sectional Data: Cross-sectional data refer to observations of many individuals (subjects, objects) at a given time. Example: Gross annual income for each of 1000 randomly chosen households in Dhaka City for the year 2009

Example of cross-section data: Income data ('000 Tk) of 11 persons in the year 2000.

Person      Income
Person A    234
Person B    210
Person C    187
Person D    342
Person E    124
Person F    234
Person G    321
Person H    123
Person I    128
Person J    187
Person K    301

Time series data: Time series data, also called longitudinal data, refer to observations of a given unit made over time.

Example of Time Series Data: Income data for 1 person over time (10 years), in '000.

Year    Person X
1991    129
1992    131
1993    150
1994    170
1995    187
1996    293
1997    209
1998    210
1999    213
2000    240

Example of Time series data Average gross annual income of, say, 1000 households randomly chosen from Dhaka City for 10 years 1991-2000.

Panel Data: A panel data set contains observations on a number of units (e.g. subjects, objects) over time. Thus, panel data have characteristics of both time series and cross-sectional data.

Example of Panel data Values of the gross annual income for each of 1000 randomly chosen households in Dhaka City collected for each of 10 years from 1991 to 2000. Such data can be represented as a set of double-indexed values {Vij; i=1,...,10, j=1,...,1000}.

Example of Panel Data: Income data for 3 persons over time (10 years), in '000. V_ij (i = 1-10, j = 1, 2, 3).

Year    Person X Income    Person Y Income    Person Z Income
1991    129                131                87
1992    131                150                93
1993    150                170                70
1994    170                187                34
1995    187                293                87
1996    293                170                93
1997    209                187                70
1998    210                293                87
1999    213                209                16
2000    240                234                54
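The double-indexed layout V_ij can be sketched in Python as a dictionary keyed by (year, person). This is only an illustration: the names `panel`, `cross_section` and `time_series` are hypothetical, and the figures assume the flattened slide table splits into one income value per person per year.

```python
# Income in '000 for three persons, 1991-2000, as read from the slide table.
years = list(range(1991, 2001))
income = {
    "X": [129, 131, 150, 170, 187, 293, 209, 210, 213, 240],
    "Y": [131, 150, 170, 187, 293, 170, 187, 293, 209, 234],
    "Z": [87, 93, 70, 34, 87, 93, 70, 87, 16, 54],
}

# Panel data: one value for every (year, person) pair.
panel = {
    (year, person): value
    for person, series in income.items()
    for year, value in zip(years, series)
}


def cross_section(year):
    """All persons observed in a single year (cross-sectional slice)."""
    return {person: panel[(year, person)] for person in income}


def time_series(person):
    """One person observed across all years (time series slice)."""
    return [panel[(year, person)] for year in years]


print(cross_section(1995))
print(time_series("X"))
```

Fixing the year gives a cross-section and fixing the person gives a time series, which is the sense in which panel data combine both.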

Example of Panel Data: Income, expenditure and loan data for 1 person over time (10 years), in '000. V_ij (i = 1-10, j = Income, Expenditure, Loan).

Year    Person X Income    Person X Expenditure    Person X Loan
1991    129                131                     87
1992    131                150                     93
1993    150                170                     70
1994    170                187                     34
1995    187                293                     87
1996    293                170                     93
1997    209                187                     70
1998    210                293                     87
1999    213                209                     16
2000    240                234                     54


Presentation of data Pie chart, Bar chart and Column chart

Pie chart Example

Bar chart Example

Column chart Example

MEASURES OF CENTRAL TENDENCY
What are measures of central tendency? The measures covered here are the mean, median, mode, quartiles and percentiles.

Measures of Central Tendency
Mean: For a population or a sample, the mean is the arithmetic average of all values. The mean is a measure of central tendency, e.g. the mean age of CSC students is, say, 38.

The mean, symbolized by $\bar{X}$, is the sum of the weights of the students divided by the number of students whose weights have been taken. The following formula both defines the mean and describes the procedure for finding it, here for three values: $\bar{X} = (X_1 + X_2 + X_3) / 3$

For the values 32, 35, 36, 36, 37, 38, 38, 39, 39, 39, 40, 40, 42, 45, the mean is $\bar{X} = 536 / 14 \approx 38.3$
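The arithmetic for these 14 values can be checked with a short Python sketch. The function name `mean` is chosen here for illustration only.

```python
# The 14 values listed on the slide.
values = [32, 35, 36, 36, 37, 38, 38, 39, 39, 39, 40, 40, 42, 45]


def mean(xs):
    """Arithmetic mean: the sum of the values divided by how many there are."""
    return sum(xs) / len(xs)


print(sum(values), len(values), round(mean(values), 2))
```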

Values tend to cluster around the central (mean) value.

Median: The median, symbolized by Md, is the value that lies at the middle point of the distribution, so that half of the values are above the median and half are below it. Computation of the median is relatively straightforward.

The first step is to write the values in rank order, from lowest to highest. The median is then simply the middle number. In the case below, the median is 38 because there are 15 values altogether, with 7 values larger and 7 values smaller than the median. 32 35 36 37 38 39 40 45 46

Median in the case of an even number of values: the median is calculated as the mid-point of the two middle numbers, (38 + 39) / 2 = 38.5. 32 35 36 36 37 38 38 39 39 39 40 40 42 45
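A minimal sketch of the median rule for both odd and even counts is given below. It reuses the 14 values listed for the mean as the even-count case, which is an assumption about the data behind this slide. The function name `median` and the three-value list are illustrative.

```python
def median(xs):
    """Middle value of the ranked data; mid-point of the two middle values if the count is even."""
    ordered = sorted(xs)          # rank order, lowest to highest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                # odd count: a single middle value
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2   # even count


even_case = [32, 35, 36, 36, 37, 38, 38, 39, 39, 39, 40, 40, 42, 45]
print(median(even_case))      # mid-point of the 7th and 8th values
print(median([3, 1, 2]))      # illustrative odd-count case
```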

Mode: The mode is the value that occurs most often in a population or a sample. It can be considered the single value most typical of all the values.

Here the mode is 39: 32 35 36 36 37 38 38 39 39 39 40 40 42 45

Shape of the distribution if the mode is higher than the mean and median.

Example: For the set of numbers 1, 2, 3, 7, 3, 8, 9, 5, 3, 8, 9 the mode is 3, which occurs most often (three times). NB: some populations have more than one mode; one with two modes is called bimodal.
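Counting occurrences makes the mode easy to find, including the case of several modes. The sketch below uses Python's `collections.Counter`. The helper name `modes` and the second data set are illustrative assumptions.

```python
from collections import Counter


def modes(xs):
    """Return every value that occurs with the highest frequency."""
    counts = Counter(xs)
    top = max(counts.values())
    return sorted(value for value, count in counts.items() if count == top)


print(modes([1, 2, 3, 7, 3, 8, 9, 5, 3, 8, 9]))   # the slide's example
print(modes([1, 1, 2, 2, 3]))                     # a bimodal illustration
```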

Percentiles and Quartiles Percentiles are like quartiles except that percentiles divide the set of data into 100 equal parts and quartiles divide the set of data into 4 equal parts.

Example: Research methodology exam.

Marks     Frequency (no. of students)    Cumulative frequency (cum. no. of students)
76-80     9                              9
81-85     21                             30
86-90     18                             48
91-95     12                             60

First quartile = 25th percentile. With 60 students in total, the first quartile is located at (25% of 60) = 15, i.e. 15 values from the bottom, so the first quartile falls in the interval 81-85. Similarly, the 3rd quartile is located at (75% of 60) = 45, so the 3rd quartile falls in the interval 86-90.

Percentile rank of the student who got 90 marks: Percentile rank = (number of students who scored below 90 / total number of students) x 100 = (47 / 60) x 100 = 78.3, i.e. the 78th percentile.
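The quartile and percentile-rank steps for the grouped exam data can be sketched as follows. The function names are hypothetical, and the count of 47 students below a mark of 90 is taken from the slide rather than derived from the table.

```python
from itertools import accumulate

intervals = ["76-80", "81-85", "86-90", "91-95"]
frequencies = [9, 21, 18, 12]
cumulative = list(accumulate(frequencies))   # running totals per interval
total = cumulative[-1]


def interval_for_position(position):
    """First class interval whose cumulative frequency reaches the given position."""
    for label, cum in zip(intervals, cumulative):
        if position <= cum:
            return label
    raise ValueError("position exceeds the number of observations")


def percentile_rank(number_below, n):
    """Percentage of observations that fall below a given score."""
    return number_below / n * 100


q1 = interval_for_position(0.25 * total)   # position 15
q3 = interval_for_position(0.75 * total)   # position 45
print(cumulative, q1, q3)
print(round(percentile_rank(47, total), 1))
```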

Lecture 8
Measures of Variability: Variability refers to the spread or dispersion of values (scores). A distribution of scores is said to be highly variable if the scores differ widely from one another. There are three measures of dispersion: range, variance and standard deviation.

Importance of Variability: The following two data sets have the same mean, but do they convey the same information? No: Data B has more underweight babies.

Data A: weight of newborn baby (pounds)    Data B: weight of newborn baby (pounds)
4                                          3
5                                          3
6                                          9
Average 5                                  Average 5

Range: The range is the difference between the largest value and the smallest value. Range = highest value - lowest value.
Distribution 1: 32 35 36 36 37 38 40 42 42 43 43 45
Distribution 2: 32 32 33 33 33 34 34 34 34 34 35 45
The range is 13 (45 - 32) for both distributions, but it does not give a true picture of their variability: in Distribution 2 all values except one lie between 32 and 35.
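The limitation of the range can be seen by recomputing it with the largest value left out. This is a small sketch, and the helper name `value_range` is illustrative.

```python
d1 = [32, 35, 36, 36, 37, 38, 40, 42, 42, 43, 43, 45]
d2 = [32, 32, 33, 33, 33, 34, 34, 34, 34, 34, 35, 45]


def value_range(xs):
    """Range: highest value minus lowest value."""
    return max(xs) - min(xs)


# Same range for both distributions.
print(value_range(d1), value_range(d2))

# Drop the single largest value (the lists are already sorted):
# the range of Distribution 2 collapses, that of Distribution 1 barely changes.
print(value_range(d1[:-1]), value_range(d2[:-1]))
```

Because the range depends only on the two extreme values, one unusual observation can dominate it.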

Measures of Variability (Variance and Standard Deviation): The variance, symbolized by $s^2$, is a measure of variability. It is the sum of the squared deviations from the mean divided by $n - 1$, which is approximately the average squared deviation: $s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$

Formula of Standard Deviation: The standard deviation is the positive square root of the variance: $s = \sqrt{s^2} = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$

Example of Variance and Standard Deviation
Series 1: 32 36 37 37 38 40 42 42 43 43 45 45
Mean $\bar{X}$ = 480 / 12 = 40

Student No.    Weight in kg (X_i)    X_i - X̄    (X_i - X̄)^2
1              32                    -8          64
2              36                    -4          16
3              37                    -3          9
4              37                    -3          9
5              38                    -2          4
6              40                    0           0
7              42                    2           4
8              42                    2           4
9              43                    3           9
10             43                    3           9
11             45                    5           25
12             45                    5           25
Sum of squares = 178

Therefore variance $s^2$ = 178 / (n - 1) = 178 / 11 = 16.2 and standard deviation = 4.02. A standard deviation of 4.02 means that the values in the series typically differ from the mean by about 4 kg.
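The calculation can be recomputed directly from the twelve listed weights with the sketch below. Variable names are illustrative, and the divisor n - 1 follows the slide.

```python
import math

weights = [32, 36, 37, 37, 38, 40, 42, 42, 43, 43, 45, 45]

n = len(weights)
mean = sum(weights) / n

# Deviation of each weight from the mean, then the sum of squared deviations.
deviations = [x - mean for x in weights]
sum_of_squares = sum(d ** 2 for d in deviations)

variance = sum_of_squares / (n - 1)   # sample variance
std_dev = math.sqrt(variance)         # positive square root of the variance

print(mean, sum_of_squares, round(variance, 2), round(std_dev, 2))
```

Python's standard library gives the same sample statistics through `statistics.variance` and `statistics.stdev`.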

Chi-Square Test: Tests differences in qualitative (categorical) values, for example whether people have a definite preference for colored cars over white cars. Suppose 1000 cars are sold in Bangladesh in a month. If there were no preference for colored cars, we would expect 500 white and 500 colored cars to be sold:

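Chi-square test: whether Bangladeshi people have a preference for colored cars.

Type of color    Observed no. (O)    Expected no. (E)    O - E    (O - E)^2    (O - E)^2 / E
White            400                 500                 -100     10000        20
Colored          600                 500                 100      10000        20
Total                                                                          40

The calculated chi-square value is 40. From the chi-square table, find the tabulated value for n - 1 = 2 - 1 = 1 degree of freedom. Reject the null hypothesis (of no preference) if the calculated value is greater than the tabulated value at the 5% or 1% level of significance (95% or 99% confidence). For 1 degree of freedom the tabulated values are 3.841 (5%) and 6.635 (1%), so with a calculated value of 40 the null hypothesis is rejected.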
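The chi-square calculation can be sketched in a few lines of Python. The critical values 3.841 and 6.635 are the standard table entries for 1 degree of freedom at the 5% and 1% levels. The variable names are illustrative.

```python
observed = {"White": 400, "Colored": 600}
expected = {"White": 500, "Colored": 500}   # no preference: equal split of 1000 cars

# Chi-square statistic: sum of (O - E)^2 / E over all categories.
chi_square = sum(
    (observed[color] - expected[color]) ** 2 / expected[color]
    for color in observed
)
degrees_of_freedom = len(observed) - 1

# Tabulated chi-square values for 1 degree of freedom.
critical_values = {0.05: 3.841, 0.01: 6.635}
reject_null = {
    alpha: chi_square > critical
    for alpha, critical in critical_values.items()
}

print(chi_square, degrees_of_freedom, reject_null)
```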

The End
