1 Basic statistics Week 10 Lecture 1
Thursday, May 20, 2004 ISYS3015 Analytic methods for IS professionals School of IT, University of Sydney
2 Meanings of statistics
  Two meanings of statistics
    Statistics as a group of computational procedures that allow us to find meaning in numerical data
    Statistics as the value (number) you get by performing one of those procedures on a sample
  Population parameters and sample statistics
    The symbol employed for designating the factor
      Factor                Population parameter    Sample statistic
      Mean                  μ                       x̄ or M
      Standard deviation    σ                       S or SD
      Number or total       N                       n
3 Functions of statistics
  Descriptive statistics
    Describe what the data look like
  Inferential statistics
    Draw inferences about a large population from a sample
4 Descriptive statistics
  Points of central tendency
    The central point around which the data revolve
  Mode
    The category or observation that appears most frequently in the distribution
    The only appropriate measure of central tendency for nominal variables
  Median
    The midpoint of a distribution
    Frequently used to describe the central tendency of ordinal variables
  Mean
    Arithmetic average of the values within a data set
    $M = \sum_i x_i / n$
    Appropriate for interval and ratio variables
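The three measures of central tendency can be sketched in a few lines of Python using the standard `statistics` module. The grade values below are hypothetical and invented purely for illustration; they are not taken from the lecture.

```python
import statistics

# Hypothetical daily grades, invented for illustration only.
grades = [92, 85, 85, 78, 90, 85, 70, 88, 90, 95]

mode = statistics.mode(grades)      # most frequently occurring value
median = statistics.median(grades)  # midpoint of the sorted values
mean = sum(grades) / len(grades)    # M = (sum of x_i) / n

print("mode:", mode)
print("median:", median)
print("mean:", mean)
```

With an even number of observations, `statistics.median` averages the two middle values, so the median need not be one of the observed grades.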
5 Descriptive statistics (cont)
  Example
    High school student Joe's daily grades in February
    [Table: columns Monday, Tuesday, Wednesday, Thursday, Friday; one row per week for four weeks; grade values not preserved]
6 Descriptive statistics (cont)
  Measures of variation: dispersion or deviation
    Range
    Average deviation
    Standard deviation
      The standard measure of variability in most statistical operations
    Variance
      The standard deviation squared
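As a rough illustration of the four measures of variation, the sketch below computes each one for a small hypothetical data set. It assumes the population formulas (dividing by n); the slides do not say whether the population or sample (n - 1) form is intended, so treat that choice as an assumption.

```python
import statistics

# Hypothetical scores, invented for illustration only.
scores = [4, 8, 6, 5, 3, 7, 9, 6]

n = len(scores)
mean = sum(scores) / n

# Range: distance between the largest and smallest value.
value_range = max(scores) - min(scores)

# Average deviation: mean absolute distance from the mean.
average_deviation = sum(abs(x - mean) for x in scores) / n

# Population variance and standard deviation (divide by n, an assumption here).
variance = statistics.pvariance(scores)
std_dev = statistics.pstdev(scores)

print(value_range, average_deviation, variance, std_dev)
```

Swapping in `statistics.variance` and `statistics.stdev` would give the sample versions, which divide by n - 1 instead.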
7 Shape of the distribution
  The frequency of values from different ranges of the variable
  Use a histogram to visually inspect the shape
8 Shape of the distribution: normal distribution
  Many characteristics of human populations follow a normal distribution
  Symmetrical about the center and bell shaped
  Most of the scores in a normal distribution tend to occur near the center, while more extreme scores on either side of the center become increasingly rare
  The mean, median, and mode of the normal distribution are the same
9 Features of normal distribution
  Predictable percentages of the population lie within any given portion of the curve
    About 68% of the population lies within 1 standard deviation of the mean
    95.45% of the cases lie within 2 standard deviations of the mean
    95% of the cases fall within 1.96 standard deviation units of the mean
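These percentages can be checked numerically. The sketch below uses the identity that, for a normal distribution, the proportion within k standard deviations of the mean equals erf(k / sqrt(2)); the function name `coverage_within` is a hypothetical helper chosen for this example.

```python
import math

def coverage_within(k):
    """Proportion of a normal distribution lying within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 1.96, 2):
    print(f"within {k} SD: {coverage_within(k):.2%}")
```

The three results come out at roughly 68.27%, 95.00%, and 95.45%, in line with the figures on the slide.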
10 The family of normal curves
  Mean determines where the midpoint of the distribution falls
  Standard deviation changes the shape of the distribution without affecting the midpoint
  Standard normal distribution
    Mean: 0
    Standard deviation: 1
11 Measuring relative performance
  Example
    A student, John, obtained 60 out of 100 in a math exam and 50 out of 100 in an English exam
    Mean score? Standard deviation?
12 Standard scores
  Z-score
    Measures the distance, in standard deviation units, of any value in a distribution from the mean
    $Z = (x - \mu) / \sigma$
  John's standard scores
    $Z_{math} = (60 - 55) / 10 = 0.5$
    $Z_{English} = (50 - 45) / 5 = 1$
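John's standard scores can be reproduced with a one-line function. The means (55 and 45) and standard deviations (10 and 5) are the ones used in the slide's calculation; the function name `z_score` is just an illustrative choice.

```python
def z_score(x, mean, sd):
    """Distance of x from the mean, expressed in standard deviation units."""
    return (x - mean) / sd

z_math = z_score(60, mean=55, sd=10)
z_english = z_score(50, mean=45, sd=5)

print("math:", z_math)
print("English:", z_english)
```

Although John's raw math mark is higher, his English score sits further above its class mean once both are put on the same standard-deviation scale.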
13 Create index by Z scores in survey research
  Triangulation of measures
    We have measures of income and years of education, and we want to combine them to form a socio-economic index
    Annual incomes vary from 5,000 to 500,000
    Years of education vary from 0 to 20
14 Computing an index score using Z score
  Given population values
                            Income    Years of education
    Mean                    65,000    11
    Standard deviation      22,000    4
  Suppose 2 individuals
    A                       64,000    16
    B                       86,000    9
                            Case A                              Case B
    Income                  (64,000 - 65,000)/22,000 = -0.05    (86,000 - 65,000)/22,000 = 0.95
    Education               (16 - 11)/4 = 1.25                  (9 - 11)/4 = -0.50
    Socio-economic index    1.20                                0.45
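The index calculation for cases A and B can be sketched as follows. The population means and standard deviations are the ones given on the slide, and the index is taken to be the simple sum of the two Z scores, which matches the slide's totals; the names `POPULATION` and `ses_index` are hypothetical.

```python
# Population mean and standard deviation for each measure, as given on the slide.
POPULATION = {
    "income": (65_000, 22_000),
    "education": (11, 4),
}

def z_score(x, mean, sd):
    """Distance of x from the mean, in standard deviation units."""
    return (x - mean) / sd

def ses_index(income, education):
    """Socio-economic index: sum of the income and education Z scores."""
    return (z_score(income, *POPULATION["income"])
            + z_score(education, *POPULATION["education"]))

index_a = ses_index(income=64_000, education=16)
index_b = ses_index(income=86_000, education=9)

print("A:", round(index_a, 2))
print("B:", round(index_b, 2))
```

Standardizing first is what makes the sum meaningful: without it, income (in the tens of thousands) would swamp years of education (at most 20).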
15 Correlation: measure of relationship
  A measure of the relation between two or more variables
  Statistic used: correlation coefficient
    Between -1 and 1
  Direction of relationship
    The sign of the correlation coefficient
  Strength of relationship
    The absolute value of the correlation coefficient
16 Pearson r correlation
  Simple linear correlation
  The measurement scales used should be at least interval scales
  A scattergram (scatter plot) provides a visual inspection of linear correlation
  Excel can calculate correlation
  Correlation does not indicate causation
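For readers who want to see what a spreadsheet does behind the scenes, here is a minimal sketch of the Pearson r calculation in plain Python. The experience and salary figures are hypothetical values invented for illustration, and `pearson_r` is an illustrative helper, not a library function.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations from the means.
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable.
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cross / math.sqrt(ss_x * ss_y)

# Hypothetical data: years of experience and salary (in thousands).
years = [1, 3, 5, 7, 9]
salary = [42, 50, 55, 61, 70]

r = pearson_r(years, salary)
print(f"r = {r:.3f}")
```

The sign of `r` gives the direction of the relationship and its absolute value gives the strength; a value near 1 here would still say nothing about whether experience causes higher salary.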
17 Example
  We collected data about the salaries (y) and years of experience (x) for a sample of 50 auditors.
  Is there any relationship between salary and years of experience?