10a. Univariate Analysis Part 1
CSCI N207 Data Analysis Using Spreadsheet
Lingma Acheson
Department of Computer and Information Science, IUPUI
Univariate Analysis
Deals with a single variable, i.e. one data field.
Applies calculations that describe the data in that field, e.g. central tendency, location, dispersion, etc.
Also called descriptive statistics.
First learn the concepts, then use Excel as a tool.
The Nature of Measurement
Measurement is the process of assigning a number or a value to observations according to some established set of rules.
Numbers with a quantitative basis are amenable to mathematical analysis.
Need to decide on the scale of measurement.
–E.g.
   Age: shall we use categories such as “<10”, “10 – 20”, “21 – 30”, etc., or shall we use the actual value?
   Rankings: shall we use 1, 2, 3, 4, or A, B, C, D?
   Salary: shall we use 1000 as the unit so the values are 21, 48, etc., or the actual values such as 21000, 48000, etc.?
–The choice depends on the purpose for collecting the values, the type of analysis to be performed, etc.
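The scale-of-measurement choice above can be sketched in a few lines of Python. This is only an illustration: the ages are invented, the function name `age_category` is hypothetical, and the ">30" bucket is an assumption, since the slide's category list ends with "etc."

```python
# Hypothetical ages, invented for illustration only.
ages = [4, 10, 15, 20, 21, 30, 47]


def age_category(age):
    """Map an exact age to one of the example categories from the slide."""
    if age < 10:
        return "<10"
    if age <= 20:
        return "10-20"
    if age <= 30:
        return "21-30"
    return ">30"  # assumed catch-all bucket, not given on the slide


# Categorical scale: each exact age is replaced by its category label.
categories = [age_category(a) for a in ages]
print(categories)

# Salary recorded in units of 1000 instead of the actual value.
salaries = [21000, 48000]
salaries_in_thousands = [s // 1000 for s in salaries]
print(salaries_in_thousands)
```

Once ages are stored as categories, the exact values cannot be recovered, so a mean age can no longer be computed. That is one reason the scale should be chosen with the intended analysis in mind.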
The Nature of Measurement
The measurement process is influenced by many factors and environmental conditions, e.g.
–Experimental error
–Instrumental error
–Incompleteness
Need to consider the validity and accuracy of measurement.
Validity of Measurement
What is an appropriate and meaningful way to measure a given property?
E.g. Measure the area of a rectangular table.
–What tool to use: ruler or tape?
–What system to use: the metric system or the British system?
–How to come up with the area: further calculation is needed (indirect measurement).
Measurements in the social and behavioral sciences are mostly indirect.
E.g. What is a good way to measure how rich a family is? Are drivers over 65 involved in more fatal accidents than drivers below 17?
Accuracy of Measurement
The quality of measurement. Inaccuracy may be caused by
–Systematic error, e.g. a weighing scale always reads a certain number of pounds low.
–Incompleteness, e.g. small sample size.
–Lower precision than what is required, e.g. a result is needed in millimeters, but the ruler is marked only in centimeters.
Physical measurement is more straightforward than measurement in the social sciences.
Descriptive Statistics
Methodology to observe, describe, or summarize your data.
–Central tendency: Mean, Median, Mode
–Dispersion: Min/Max, Range, Variance, Standard Deviation, Distribution
Univariate analysis summarizes the data in one data field.
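As a rough sketch of the dispersion measures listed above, the following Python uses the 20 values that appear in the median and mode examples of this deck. Using Python's standard `statistics` module is an assumption made for illustration; the course itself uses Excel. Note that `statistics.variance` and `statistics.stdev` compute the sample versions (dividing by n - 1), which is one of two common conventions.

```python
import statistics

# The 20 values used in the median and mode examples of this deck.
scores = [45, 49, 50, 53, 60, 62, 63, 65, 66, 67,
          69, 71, 73, 74, 74, 78, 81, 85, 87, 100]

lowest = min(scores)
highest = max(scores)
value_range = highest - lowest  # range = max - min

# Sample variance and sample standard deviation (divide by n - 1).
sample_variance = statistics.variance(scores)
sample_std_dev = statistics.stdev(scores)

print("Min:", lowest, "Max:", highest, "Range:", value_range)
print("Sample variance:", round(sample_variance, 2))
print("Sample standard deviation:", round(sample_std_dev, 2))
```

The standard deviation is the square root of the variance, so it is expressed in the same unit as the data, which usually makes it easier to interpret.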
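The Mean (Average)
Sum of all values divided by the number of values in the data set.
One measure of central location in the data set.
Mean = $\bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n}$
E.g. for the 20 values (45, 49, 50, 53, 60, 62, 63, 65, 66, 67, 69, 71, 73, 74, 74, 78, 81, 85, 87, 100): Mean = $\frac{45 + 49 + \dots + 100}{20} = \frac{1372}{20} = 68.6$
Excel function: AVERAGE()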
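The mean calculation can be checked with a short Python sketch. It assumes the data set is the same 20 values used in the median and mode examples of this deck, which is consistent with the stated result of 68.6.

```python
# The 20 values used in the median and mode examples of this deck.
scores = [45, 49, 50, 53, 60, 62, 63, 65, 66, 67,
          69, 71, 73, 74, 74, 78, 81, 85, 87, 100]

total = sum(scores)   # sum of all values
count = len(scores)   # number of values in the data set
mean = total / count  # same idea as Excel's AVERAGE()

print(total, count, mean)  # prints: 1372 20 68.6
```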
The Mean (Average)
The data may or may not be symmetrical around its average value.
The mean by itself does not tell you what your data look like.
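To illustrate why the mean alone does not describe the shape of the data, here is a small sketch with two made-up data sets (both invented for this example) that share the same mean but are distributed very differently.

```python
# Two hypothetical data sets, invented for illustration only.
symmetric = [40, 50, 60, 70, 80]  # evenly spread around the middle
skewed = [52, 53, 54, 55, 86]     # four close values and one large value

mean_symmetric = sum(symmetric) / len(symmetric)
mean_skewed = sum(skewed) / len(skewed)

print(mean_symmetric, mean_skewed)  # prints: 60.0 60.0

# The middle value of each sorted list differs, even though the means match.
middle_symmetric = sorted(symmetric)[len(symmetric) // 2]
middle_skewed = sorted(skewed)[len(skewed) // 2]
print(middle_symmetric, middle_skewed)  # prints: 60 54
```

Both sets have a mean of 60, but in the second set most values sit well below the mean because a single large value pulls it upward.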
The Median
The middle value in a sorted data set. Half the values are greater and half are less than the median.
Another measure of central location in the data set.
Odd number of items: find the middle number.
–E.g. (1, 2, 4, 7, 8, 9, 9) Median: 7
Even number of items: find the middle two and take the average of the two.
–E.g. (45, 49, 50, 53, 60, 62, 63, 65, 66, 67, 69, 71, 73, 74, 74, 78, 81, 85, 87, 100) Median: 68
Excel function: MEDIAN()
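The odd and even cases above can be written out as a minimal Python sketch. The function name `median` is chosen here for illustration; it follows the two rules on the slide rather than any particular library's implementation.

```python
def median(values):
    """Return the middle value of the sorted data (average of the two middle values if even)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd number of items: take the single middle value.
        return ordered[mid]
    # Even number of items: average the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_example = [1, 2, 4, 7, 8, 9, 9]
even_example = [45, 49, 50, 53, 60, 62, 63, 65, 66, 67,
                69, 71, 73, 74, 74, 78, 81, 85, 87, 100]

print(median(odd_example))   # prints: 7
print(median(even_example))  # prints: 68.0
```

In the even case the two middle values are 67 and 69, so the median is their average, 68.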
The Mode
Most frequently occurring value.
Another measure of central location in the data set.
–E.g. (45, 49, 50, 53, 60, 62, 63, 65, 66, 67, 69, 71, 73, 74, 74, 78, 81, 85, 87, 100) Mode: 74
Generally not all that meaningful unless a large percentage of the values are the same number.
Excel function: MODE()
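A short sketch of finding the mode by counting occurrences, using the data set from the example above. Using `collections.Counter` from Python's standard library is an assumption made for illustration; the course itself uses Excel's MODE().

```python
from collections import Counter

scores = [45, 49, 50, 53, 60, 62, 63, 65, 66, 67,
          69, 71, 73, 74, 74, 78, 81, 85, 87, 100]

# Count how often each value occurs, then take the most frequent one.
counts = Counter(scores)
mode_value, mode_count = counts.most_common(1)[0]

# Share of the data set that equals the mode.
mode_share = mode_count / len(scores)

print(mode_value, mode_count, mode_share)  # prints: 74 2 0.1
```

Here the mode 74 occurs only twice, i.e. in 10% of the values, which shows why the mode says little about this data set.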