Descriptive statistics using Excel Displaying Data Calculating Measures of Central Tendency and Dispersion Classification of data Index Numbers and Correlation Bell Shaped distribution Kari Peisa, Ramk/Teli 2006
Displaying Data Types of Data and Measurement Scales Scale Data Kvalitative Kvantitative Nominal X Ordinal Interval Ratio
Constructing Diagrams in Excel General instructions 1. Construct an array where data values form continues series either in rows or in columns 2. Write titles for each series 3. Paint (=activate) the series which you want to include in diagram (data + titles), select Chart Wizard, and follow the instructions
Examples of Displaying Data Nominal Scale and Pie Chart Observations can be assigned to different categories which can’t be placed in a meaningful order
Examples of Displaying Data Ordinal Scale and Bar Chart Classes can be rank-ordered from highest to lowest. Classes may be represented also numerically when the measures of central tendency have a real value. But, do they actually have a real meaning?
Examples of Displaying Data Interval Scale and Line Chart Addition and subtraction but not multiplication or division can be performed on data to compare observations. We can say that temperature at 15:00 is 1.1 C warmer than at 12:00, but we can’t say for example that -6C is twice as warm as -12 C or vice versa.
Examples of Displaying Data Ratio Scale and Line Chart Ratio data has an absolute beginning point (true 0 point). All mathematical operations can be performed to compare values. We can say that the length of the plant in Septemper is more than twice as tall as in June.
Measures of Central Tendency Average (Mean) Mode Median Quartiles and Percentiles The most frequently occurred The occurrence in the middle of a set of ordered occurrences The occurrence at the first (25%), second (50%) or third (75%) quartile or at the given percentile (p %) of a set of ordered occurrences
Measures of Central Tendency in Excel Average (Mean) Write the formula into a cell … =AVERAGE(B2:E5) = =AVERAGE(A1:A4;B2;C1:C2;E1:E4) A colon (:) stands between the upper left corner and the lower right corner of an array Empty cells don’t effect on the value of mean The reference to an array is made by painting the array The semicolon (;) connects separate arrays
Measures of Central Tendency in Excel Or, use the wizard: Insert function… Select Category: Statistics Function: Average Activate the command line in the box and paint an array in Excel sheet
Measures of Central Tendency in Excel Mode Median Quartiles Percentiles =MODE(B2:B5;D2:E4) 1., 2. or 3. quartile (the 2. = Median) =MEDIAN(B2:B5;D2:E4) =QUARTILE((B2:B5;D2:E4);1) =PERCENTILE((B2:B5;D2:E4);0.35))
Measures of Dispersion Average deviation Variance Standard Deviation Skewness Skewness characterizes the degree of asymmetry of a distribution around its mean. It has been formulated in many ways. Pearson:
Measures of Dispersion in Excel Average deviation Variance Standard Deviation Skewness =AVEDEV(B2:B5;D2:E4) =VAR(B2:B5;D2:E4) =STDEV(B2:B5;D2:E4) =SKEW(B2:B5;D2:E4)
Classification (Grouping) of Data In classification we arrange a large sample of data into classes There are some rules usually followed when arranging classes The classes should be of equal size (if possible) All data values from the original table need to be included in one and only in one class The number of classes should be between 5 and 15.
Classification in Excel The frequencies indicate the number of observations in the data array that are more than the upper limit in the previous row but less than or equal to the upper limit in this row Activate the (whole) frequency column and write the formula =FREQUENCY(data;bins) into the first cell. Remark! This is an array formula, which means that we have to accept the formula by pressing: shift + ctrl + enter
Index Numbers Index numbers are used to display data in proportional form Source: CIA/ World Factbook 2002 https://www.cia.gov/cia/publications/factbook/index.html
Writing own formulas in Excel Fill right the formula Formula: =C2/$B$2*$B$6 Some fundamental rules for writing formulas Starts with = sign A pair ColumnRow refers a cell in a relative position A pair $Column$Row refers a cell in an absolute position In arrays you should be able to fill right or fill down formulas that are correctly formed Press F4 for switching between relative, absolute, and mixed references
Displaying Correlation in Scattered Chart In scattered chart two data series are displayed in correspondence. The correlation coefficient between two measurement variables and different fitted curves are associated with this type of chart.
Bell-Shaped model When the sample size is reasonably large and tha data are not too skewed, we can estimate the real distribution by a mathematical model called ”bell-shaped” model (or Gaussian shape or Normal Distribution). The model needs to know the mean and the standard deviation of the original data
Example of using Bell-Shaped Model How many percentage of the students in the previous frequency distribution got the mark at least 4? From original data: With the bell-shaped model, where the average =2.697 and standard deviation =1.1674 of the original data, we have to calculate the total area under the model curve after the point x=3.5 In Excel the formula is And it returns 0.246 =1-NORMDIST(3.5;2.697;1.1677;1) Total area = 1 Returns the area before x=3.5