
1 10b. Univariate Analysis Part 2 CSCI N207 Data Analysis Using Spreadsheet Lingma Acheson linglu@iupui.edu Department of Computer and Information Science, IUPUI

2 The Range
The difference between the minimum and maximum values in a data set.
A larger range usually (but not always) indicates a larger spread or deviation in the values of the data set.
E.g. (73, 66, 69, 67, 49, 60, 81, 71, 78, 62, 53, 87, 74, 65, 74, 50, 85, 45, 63, 100) Range: 100 – 45 = 55
A single extreme low or high value can throw off the range, e.g. (20, 76, 77, 80, 82, 82, 84, 88, 90, 93, 99, 100) Range: 100 – 20 = 80
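The slides work in Excel; as a rough cross-check, the two ranges above can be reproduced with a short Python sketch. The function name `value_range` and the variable names are illustrative only and do not come from the course material.

```python
def value_range(values):
    """Return the difference between the largest and smallest value."""
    return max(values) - min(values)


# The two data sets quoted on the slide.
scores = [73, 66, 69, 67, 49, 60, 81, 71, 78, 62,
          53, 87, 74, 65, 74, 50, 85, 45, 63, 100]
with_outlier = [20, 76, 77, 80, 82, 82, 84, 88, 90, 93, 99, 100]

print(value_range(scores))            # 55
print(value_range(with_outlier))      # 80
# Dropping the single low value 20 shrinks the second range to 100 - 76.
print(value_range(with_outlier[1:]))  # 24
```

The last line shows how much of the second range is due to one extreme value.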

3 Variance
One measure of dispersion (deviation from the mean) of a data set. How far away is each datum from the mean?
Variance – the average squared distance to the mean.
The larger the variance, the greater the average deviation of each datum from the mean (the numbers lie farther from the mean).
E.g. 73, 67, 70, 67, 49, 60, 81, 71, 78, 62, 53, 87, 72, 65, 74, 50, 84, 45, 62, 100
$$\text{Variance} = \frac{(73-68.5)^2 + (67-68.5)^2 + (70-68.5)^2 + \dots + (100-68.5)^2}{20}$$
where 68.5 is the average value of the data set.
Excel functions:
VARP() – variance for the whole population (data set is complete)
VAR() – variance estimated from a sample of the population (data set is a sample)
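To make the formula concrete, here is a minimal Python sketch that works through the slide's example by hand and then cross-checks it against the standard library's `statistics` module. It assumes that VARP() divides by n and VAR() divides by n − 1; the variable names are illustrative.

```python
import statistics

data = [73, 67, 70, 67, 49, 60, 81, 71, 78, 62,
        53, 87, 72, 65, 74, 50, 84, 45, 62, 100]

mean = sum(data) / len(data)                          # 68.5
squared_deviations = [(x - mean) ** 2 for x in data]

# Whole population: divide by n (the role VARP() plays in Excel).
population_variance = sum(squared_deviations) / len(data)
# Sample: divide by n - 1 (the role VAR() plays in Excel).
sample_variance = sum(squared_deviations) / (len(data) - 1)

print(round(population_variance, 2))  # 179.05
print(round(sample_variance, 2))      # 188.47

# Cross-check against the standard library.
print(round(statistics.pvariance(data), 2))  # 179.05
print(round(statistics.variance(data), 2))   # 188.47
```

The sample variance is slightly larger because dividing by n − 1 compensates for a sample tending to understate the spread of the full population.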

4 Standard Deviation
The square root of the variance; taking the root undoes the squaring of the distances in the variance.
As a result, the magnitude of the number is more in line with the values in the data set.
Can be thought of as the average deviation from the mean of a data set.
$$\text{Standard Deviation} = \sqrt{\text{Variance}}$$
Excel functions:
STDEVP() – use this when the data set is complete
STDEV() – use this when the data set is a sample
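As an illustration, the following Python sketch computes both versions of the standard deviation for the 20-value example from the variance slide. It assumes `statistics.pstdev` and `statistics.stdev` correspond to STDEVP() and STDEV() respectively; the variable names are illustrative.

```python
import math
import statistics

data = [73, 67, 70, 67, 49, 60, 81, 71, 78, 62,
        53, 87, 72, 65, 74, 50, 84, 45, 62, 100]

population_sd = statistics.pstdev(data)  # complete data set, like STDEVP()
sample_sd = statistics.stdev(data)       # data set is a sample, like STDEV()

print(round(population_sd, 2))  # 13.38
print(round(sample_sd, 2))      # 13.73

# The standard deviation is the square root of the matching variance.
print(math.isclose(population_sd, math.sqrt(statistics.pvariance(data))))  # True
```

A value near 13 is on the same scale as the scores themselves, which is why the standard deviation is easier to interpret than the variance.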

5 Frequency Tables
Use a frequency table to observe the distribution.
E.g. consider the following data set: {45, 49, 50, 53, 60, 62, 63, 65, 66, 67, 69, 71, 73, 74, 74, 78, 81, 85, 87, 100}
Need to determine how to group the data into different bins.

Category Labels    Frequency
0-50               3
51-60              2
61-70              6
71-80              5
81-90              3
>90                1
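The binning step can be sketched in Python as follows. This is only one way to do it, assuming whole-number scores and bins whose bounds are both inclusive; the names `bins` and `frequency` are illustrative.

```python
data = [45, 49, 50, 53, 60, 62, 63, 65, 66, 67,
        69, 71, 73, 74, 74, 78, 81, 85, 87, 100]

# (label, lower bound, upper bound); both bounds are inclusive.
bins = [
    ("0-50", 0, 50),
    ("51-60", 51, 60),
    ("61-70", 61, 70),
    ("71-80", 71, 80),
    ("81-90", 81, 90),
    (">90", 91, float("inf")),
]

# Count how many values fall into each bin.
frequency = {
    label: sum(1 for x in data if low <= x <= high)
    for label, low, high in bins
}

for label, count in frequency.items():
    print(f"{label:<6} {count}")
```

The counts come out as 3, 2, 6, 5, 3 and 1, which add up to the 20 values in the data set.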

6 Histogram
A histogram is simply a column chart of the frequency table.

Category Labels    Frequency
0-50               3
51-60              2
61-70              6
71-80              5
81-90              3
>90                1
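The slide's chart is drawn in Excel. As a rough stand-in, this Python sketch prints the same frequencies as a text chart, with the bars running horizontally rather than as columns. The function name `text_histogram` is hypothetical.

```python
frequency_table = [
    ("0-50", 3),
    ("51-60", 2),
    ("61-70", 6),
    ("71-80", 5),
    ("81-90", 3),
    (">90", 1),
]


def text_histogram(table):
    """Return one line per bin, with one '#' per counted value."""
    return [f"{label:>5} | {'#' * count} ({count})" for label, count in table]


for line in text_histogram(frequency_table):
    print(line)
```

The longest bar belongs to the 61-70 bin, which is where the scores cluster.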

7 Data Distribution

Category Labels    Frequency
0-50               1
51-60              3
61-70              5
71-80              5
81-90              3
>90                1

8 Normal Distributions
The bell curve:
– Symmetrical
– Mean ≈ Median

9 Skewed Distributions
Most of the time, distributions are skewed.
Positively skewed distribution: mean > median
Negatively skewed distribution: mean < median
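The mean-versus-median comparison can be tried on the two data sets from the range slide. The Python sketch below applies only this rule of thumb, not a formal skewness statistic, and the function name `skew_direction` is hypothetical.

```python
import statistics


def skew_direction(values):
    """Classify skew by comparing the mean with the median (rule of thumb)."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    if mean > median:
        return "positive"
    if mean < median:
        return "negative"
    return "none"


# Data sets quoted on the range slide.
low_outlier = [20, 76, 77, 80, 82, 82, 84, 88, 90, 93, 99, 100]
scores = [73, 66, 69, 67, 49, 60, 81, 71, 78, 62,
          53, 87, 74, 65, 74, 50, 85, 45, 63, 100]

print(skew_direction(low_outlier))  # negative (mean about 80.9, median 83)
print(skew_direction(scores))       # positive (mean 68.6, median 68)
```

In the first data set the single low value 20 pulls the mean below the median, which is the pattern the slide calls negatively skewed.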

10 Data Distribution
{45, 49, 50, 53, 60, 62, 63, 65, 66, 67, 69, 71, 73, 74, 74, 78, 81, 85, 87, 100}
Average: 68.6; Median: 68; Mode: 74
-1 SD: 55.14; +1 SD: 82.06
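The figures on this slide can be reproduced with a short Python sketch. It assumes the slide used the population standard deviation (the STDEVP() case), which is the choice that matches the quoted 55.14 and 82.06; the variable names are illustrative.

```python
import statistics

data = [45, 49, 50, 53, 60, 62, 63, 65, 66, 67,
        69, 71, 73, 74, 74, 78, 81, 85, 87, 100]

mean = statistics.mean(data)      # 68.6
median = statistics.median(data)  # 68.0
mode = statistics.mode(data)      # 74
sd = statistics.pstdev(data)      # population standard deviation

lower = mean - sd  # one standard deviation below the mean
upper = mean + sd  # one standard deviation above the mean

print(mean, median, mode)
print(round(lower, 2), round(upper, 2))  # 55.14 82.06
```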

11 Standard Deviation
With a normal distribution:
mean ± 1*SD covers about 68% of the data
mean ± 2*SD covers about 95% of the data
mean ± 3*SD covers about 99.7% of the data
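As a sanity check, the Python sketch below measures how much of the 20-score data set from the earlier slides falls within 1, 2 and 3 standard deviations of the mean. The function name `coverage` is hypothetical, and the population standard deviation is assumed.

```python
import statistics


def coverage(values, k):
    """Return the fraction of values within k standard deviations of the mean."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    inside = [x for x in values if mean - k * sd <= x <= mean + k * sd]
    return len(inside) / len(values)


data = [45, 49, 50, 53, 60, 62, 63, 65, 66, 67,
        69, 71, 73, 74, 74, 78, 81, 85, 87, 100]

for k in (1, 2, 3):
    print(k, coverage(data, k))
# 1 0.65
# 2 0.95
# 3 1.0
```

The results (65%, 95%, 100%) are close to the 68 / 95 / 99.7 figures even though this small data set is only roughly bell-shaped; an exact match should not be expected for real data.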

