© Copyright 2001, Alan Marshall

Statistics
2 Statistics è Branch of Mathematics that deals with the collection and analysis of data è Descriptive Statistics: used to analyze and describe data è Inferential Statistics: used to use the information to make statements regarding the relationships between variables or the expectations about future events.

Measures of Central Tendency
- Arithmetic Mean
- Median
- Mode
- Geometric Mean

Arithmetic Mean
- Other names
  - Average
  - Mean

Arithmetic Mean
- The calculation is identical; only the notation varies slightly

Summation Notation
- Notice that the first form uses less vertical space on the page
- This makes accountants very happy
- The first form can also be easier to fit into a line of text
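
The formulas on these slides were not preserved. A plausible reconstruction is sketched below. It assumes that the two notations contrast the population mean (symbols \mu and N) with the sample mean (symbols \bar{x} and n), and that the two forms of summation notation differ only in where the limits are set; neither assumption is confirmed by the original.

```latex
% Population mean and sample mean: same calculation, different symbols
\[
  \mu = \frac{\sum_{i=1}^{N} x_i}{N}
  \qquad
  \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}
\]

% Compact form (limits beside the sigma) versus stacked form (limits above and below)
\[
  \textstyle \sum_{i=1}^{n} x_i
  \qquad
  \displaystyle \sum_{i=1}^{n} x_i
\]
```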

Example
- Ten second-year BBA students wrote the CSC exam last month
- Their scores were: 71, 72, 88, 69, 77, 63, 91, 81, 83, 75

Calculating the Mean
- Arithmetic mean
  - Sum the observations and divide by the number of observations
- Example: 5%, 7%, -2%, 12%, 8%
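
As a quick check of the rule above, the following Python sketch computes the arithmetic mean of both data sets from these slides. The variable names are illustrative and do not come from the original.

```python
from statistics import mean

# CSC exam scores from the example slide
scores = [71, 72, 88, 69, 77, 63, 91, 81, 83, 75]
# Annual returns from the slide above, in percent
returns_pct = [5, 7, -2, 12, 8]

# Sum the observations and divide by the number of observations
mean_score = sum(scores) / len(scores)
mean_return = sum(returns_pct) / len(returns_pct)

# statistics.mean performs the same calculation
assert mean_score == mean(scores)

print(f"Mean score: {mean_score}")     # 77.0
print(f"Mean return: {mean_return}%")  # 6.0%
```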

Problem with the Arithmetic Mean
- The arithmetic mean is incorrect for variables that are related multiplicatively, like rates of growth, rates of return and rates of change
- $1,000 at 6% for 5 years should be $1,338.23
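
To see the size of the error, the sketch below compounds $1,000 two ways: at the 6% arithmetic mean, and at the returns actually earned. It assumes the slide refers to the five returns from the earlier example (5%, 7%, -2%, 12%, 8%); the variable names are illustrative.

```python
# Annual returns from the earlier example, as decimals (assumed to be the
# series this slide refers to)
returns = [0.05, 0.07, -0.02, 0.12, 0.08]

# Compound $1,000 at the arithmetic mean return for 5 years
arithmetic_mean = sum(returns) / len(returns)
value_at_mean = 1000 * (1 + arithmetic_mean) ** len(returns)

# Compound $1,000 at the returns actually earned, one year at a time
value_actual = 1000.0
for r in returns:
    value_actual *= 1 + r

print(f"At the arithmetic mean: ${value_at_mean:,.2f}")  # $1,338.23
print(f"At the actual returns:  ${value_actual:,.2f}")   # $1,331.81
```

The arithmetic mean overstates the ending value because the returns compound multiplicatively rather than add.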

Geometric Mean
- The Geometric Mean should be used for rates of change, like rates of return
- The product notation in the formula means: the product of these factors from 1 to N
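
The slide's formula was not preserved. The standard definition of the geometric mean return is given below; the symbols R_G (geometric mean return) and R_i (return in period i) are assumed here and may differ from the original notation.

```latex
% Geometric mean return: the N-th root of the product of the N growth
% factors (1 + R_i), less one
\[
  R_G = \left[ \prod_{i=1}^{N} (1 + R_i) \right]^{1/N} - 1
\]
```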

Geometric vs. Arithmetic Mean
- The more variable the underlying data, the greater the error from using the Arithmetic Mean
- The Geometric Mean is often easier to calculate:
  - Stock prices: 1992: $20; 1999: $40; R = 10.41%
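
The stock price figure can be reproduced from the two endpoint prices alone, which is why the geometric mean is often the easier calculation. A minimal sketch, assuming annual compounding over the seven years from 1992 to 1999:

```python
# Prices from the slide: $20 in 1992 and $40 in 1999
start_price, end_price = 20.0, 40.0
years = 1999 - 1992  # 7 annual periods (assumes annual compounding)

# Geometric mean annual return from the endpoint prices alone;
# no intermediate prices or yearly returns are needed
annual_return = (end_price / start_price) ** (1 / years) - 1

print(f"R = {annual_return:.2%}")  # R = 10.41%
```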

Geometric vs. Arithmetic Mean
- For analysis of past performance, use the Geometric Mean
  - The past returns have averaged 5.898%
- To use the past returns to estimate the future expected return, use the Arithmetic Mean
  - The expected return is 6%
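
Both figures can be checked against the five returns from the earlier example (5%, 7%, -2%, 12%, 8%). The sketch below is illustrative; the variable names are not from the original.

```python
from math import prod

# Annual returns from the earlier example, as decimals
returns = [0.05, 0.07, -0.02, 0.12, 0.08]
n = len(returns)

# Arithmetic mean: the estimate of the future expected return
arithmetic = sum(returns) / n

# Geometric mean: the N-th root of the product of the growth factors, less one
growth_factors = [1 + r for r in returns]
geometric = prod(growth_factors) ** (1 / n) - 1

print(f"Arithmetic mean: {arithmetic:.3%}")  # 6.000%
print(f"Geometric mean:  {geometric:.3%}")   # 5.898%
```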

Median and Mode
- Median: midpoint
  - If odd number of observations: middle observation
  - If even number of observations: average of the middle 2 observations
- Mode: most frequent observation

Example
- Our CSC mark data was (sorted): 63, 69, 71, 72, 75, 77, 81, 83, 88, 91
- The median is 76
- There is no mode
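
The median and mode for the CSC marks can be verified with Python's standard `statistics` module. This is a sketch; treating "every value tied for most frequent" as "no mode" is the convention assumed here.

```python
from statistics import median, multimode

scores = [71, 72, 88, 69, 77, 63, 91, 81, 83, 75]

# With an even number of observations, median() averages the middle two:
# (75 + 77) / 2
mid = median(scores)

# multimode() returns every value tied for most frequent. When all ten
# values come back, nothing repeats, so the data has no mode.
most_frequent = multimode(scores)
has_mode = len(most_frequent) < len(scores)

print(f"Median: {mid}")         # 76.0
print(f"Has a mode: {has_mode}")  # False
```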

Example
- The deviation is the difference between each observation and the mean
- The sign indicates whether the observation is above (+) or below (-) the mean

Example
- The average deviation is always zero
- If it isn’t, you must have made a mistake!
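
The zero-sum property is easy to confirm on the CSC marks. In this sketch the deviations are whole numbers, so the sum is exactly zero; with non-integer data, floating-point rounding can leave a tiny non-zero remainder.

```python
scores = [71, 72, 88, 69, 77, 63, 91, 81, 83, 75]
mean_score = sum(scores) / len(scores)  # 77.0

# Deviation = observation minus the mean; the sign shows above (+) or below (-)
deviations = [x - mean_score for x in scores]

# The positive and negative deviations cancel exactly
total = sum(deviations)
average_deviation = total / len(scores)

print(deviations)
print(f"Average deviation: {average_deviation}")  # 0.0
```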

Measures of Dispersion
- So far, we have looked at measures of central tendency
- What about measuring the tendency of the data to vary from the centre?

Measures of Dispersion
- Range
  - Highest - Lowest
- Variance
- Standard Deviation

Example
- The range is 91 - 63 = 28
- The range can be extremely sensitive to outlier observations
- Suppose one of these students had a very bad day and scored 8
  - The range would now be 91 - 8 = 83
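
The sketch below reproduces both ranges. The slide does not say which student had the bad day; the code assumes it was the student who scored 63, although any student other than the top scorer gives the same result.

```python
scores = [71, 72, 88, 69, 77, 63, 91, 81, 83, 75]

# Range = highest observation minus lowest observation
data_range = max(scores) - min(scores)

# Hypothetical: the student who scored 63 scores 8 instead
with_outlier = [8 if x == 63 else x for x in scores]
outlier_range = max(with_outlier) - min(with_outlier)

print(f"Range: {data_range}")                  # 28
print(f"Range with outlier: {outlier_range}")  # 83
```

A single changed observation roughly triples the range, which is why it is a fragile measure of dispersion.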

Mean Absolute Deviation
- The Mean Absolute Deviation is a measure of average dispersion that is not used very much
- It has some undesirable mathematical properties that are beyond the level of this course

Mean Squared Deviation
- The Mean Squared Deviation (MSD) is very commonly used
- The MSD in this example is 694/10 = 69.4
- The more common name for the MSD is the VARIANCE
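
Both measures can be computed from the deviations of the CSC marks. The sketch below confirms the 694/10 figure; the Mean Absolute Deviation it prints is computed here and does not appear in the original slides.

```python
scores = [71, 72, 88, 69, 77, 63, 91, 81, 83, 75]
n = len(scores)
mean_score = sum(scores) / n  # 77.0

# Mean Absolute Deviation: average size of the deviations, ignoring sign
mad = sum(abs(x - mean_score) for x in scores) / n

# Mean Squared Deviation: average of the squared deviations
sum_of_squares = sum((x - mean_score) ** 2 for x in scores)
msd = sum_of_squares / n

print(f"MAD: {mad}")                        # 7.0
print(f"Sum of squares: {sum_of_squares}")  # 694.0
print(f"MSD: {msd}")                        # 69.4
```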

Variance
- Variance measures the amount of dispersion from the mean
- The formula differs slightly for populations and for samples

Standard Deviation
- Standard Deviation measures the amount of dispersion from the mean
- The formula differs slightly for populations and for samples
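
The population and sample formulas on these slides were not preserved. The standard definitions are shown below; the symbols (\sigma and N for populations, s and n for samples) are the conventional ones and are assumed to match the original.

```latex
% Variance: population (divide by N) and sample (divide by n - 1)
\[
  \sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}
  \qquad
  s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}
\]

% Standard deviation: the square root of the corresponding variance
\[
  \sigma = \sqrt{\sigma^2}
  \qquad
  s = \sqrt{s^2}
\]
```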

Standard Deviation Example
- Using the previous example
- The data is sample data
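
The slide's worked result was not preserved. Because the data is sample data, the divisor is n - 1 rather than n. A sketch using Python's `statistics` module, with the numeric results computed here rather than taken from the original:

```python
from statistics import pvariance, stdev, variance

scores = [71, 72, 88, 69, 77, 63, 91, 81, 83, 75]

# Population variance divides the sum of squares (694) by n = 10
population_variance = pvariance(scores)

# Sample variance divides the same sum of squares by n - 1 = 9
sample_variance = variance(scores)

# Sample standard deviation is the square root of the sample variance
sample_sd = stdev(scores)

print(f"Population variance: {population_variance:.2f}")  # 69.40
print(f"Sample variance:     {sample_variance:.2f}")      # 77.11
print(f"Sample std. dev.:    {sample_sd:.2f}")            # 8.78
```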

Interpreting the Std. Dev.
- You have heard of the Bell Shaped or Normal Distribution
- The properties of the Normal Distribution are well known and give us the EMPIRICAL RULE

[Figure: Normal Distribution]

Empirical Rule
For approximately Normally Distributed data:
- Within 1 standard deviation of the mean: approx. 2/3
- Within 2 standard deviations of the mean: approx. 95% (19/20)
- Within 3 standard deviations of the mean: virtually all
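
The approximations in the Empirical Rule can be compared with the exact shares for a Normal distribution. This sketch uses `statistics.NormalDist` from the Python standard library; the helper name `share_within` is illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Share of a Normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

within_1, within_2, within_3 = (share_within(k) for k in (1, 2, 3))

print(f"Within 1 std. dev.: {within_1:.2%}")  # 68.27%, roughly 2/3
print(f"Within 2 std. dev.: {within_2:.2%}")  # 95.45%, roughly 19/20
print(f"Within 3 std. dev.: {within_3:.2%}")  # 99.73%, virtually all
```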

Quartiles, Percentiles, etc.
- The Median splits the data in half
- Quartiles split the data into quarters
- Deciles split the data into tenths
- Percentiles split the data into one-hundredths
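
The same idea can be sketched with `statistics.quantiles`, which returns the cut points that split the data into n equal groups. Several interpolation conventions exist, so the exact cut points from this function may differ slightly from those reported by Excel or a textbook.

```python
from statistics import median, quantiles

scores = [71, 72, 88, 69, 77, 63, 91, 81, 83, 75]

# n groups are separated by n - 1 cut points
quartiles = quantiles(scores, n=4)      # 3 cut points
deciles = quantiles(scores, n=10)       # 9 cut points
percentiles = quantiles(scores, n=100)  # 99 cut points

# The second quartile is the median
print(f"Quartiles: {quartiles}")
print(f"Median:    {median(scores)}")
```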

Rank Measures
- “That was a top-half performance”
- “WTG Special fund has been a top quartile performer for the past 5 years”
- “Our programme accepts only students proven to be top decile performers”
- “I was in the 92nd percentile on the GMAT”

Using Excel
- Full Descriptive Statistics
  - Tools
    - Data Analysis
      - Descriptive Statistics

Measures of Association

Bivariate Statistics
- So far, we have been dealing with statistics of individual variables
- We also have statistics that relate pairs of variables

Interactions
Sometimes two variables appear related:
- smoking and lung cancers
- height and weight
- years of education and income
- engine size and gas mileage
- GMAT scores and MBA GPA
- house size and price

Interactions
- Some of these variables would appear to be positively related and others negatively
- If these were related, we would expect to be able to derive a linear relationship: y = a + bx
  - where b is the slope, and
  - a is the intercept

Linear Relationships
- We will be deriving linear relationships from bivariate (two-variable) data
- Our symbols will be:

Example
- Consider the following example comparing the returns of Consolidated Moose Pasture stock (CMP) and the TSE 300 Index
- The next slide shows 25 monthly returns

[Table: Example Data]

Example
- From the data, it appears that a positive relationship may exist
  - Most of the time when the TSE is up, CMP is up
  - Likewise, when the TSE is down, CMP is down most of the time
  - Sometimes, they move in opposite directions
- Let’s graph this data

[Figure: Graph of Data]

Example Summary Statistics
- The data do appear to be positively related
- Let’s derive some summary statistics about these data:

Observations
- Both have means of zero and standard deviations just under 3
- However, each data point does not have simply one deviation from the mean; it deviates from both means
- Consider Points A, B, C and D on the next graph

[Figure: Graph of Data]

Implications
- When points in the upper right and lower left quadrants dominate, the sum of the products of the deviations will be positive
- When points in the lower right and upper left quadrants dominate, the sum of the products of the deviations will be negative

An Important Observation
- The sum of the products of the deviations will give us the appropriate sign of the slope of our relationship
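
The quadrant argument can be illustrated with a few made-up points. The data below is hypothetical (the 25 CMP and TSE returns are not reproduced in this transcript), and the function name `sum_of_products` is illustrative.

```python
# Hypothetical monthly returns in percent, chosen so both means are zero.
# These are NOT the CMP/TSE observations from the slides.
tse = [2.0, -1.0, 3.0, -2.0, 0.5, -2.5]
cmp_stock = [1.5, -0.5, 2.5, -3.0, -0.5, 0.0]

def sum_of_products(xs, ys):
    """Sum of the products of paired deviations from each variable's mean."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

# Most points sit in the upper right or lower left quadrant: positive sum
positive_case = sum_of_products(tse, cmp_stock)

# Flipping the sign of one series moves each point to the opposite
# quadrant vertically, so the sum changes sign
negative_case = sum_of_products(tse, [-y for y in cmp_stock])

print(positive_case)  # 16.75
print(negative_case)  # -16.75
```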

Covariance
(Showing the formula only to demonstrate a concept)
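
The covariance formula itself was not preserved. The standard definitions, which average the products of the paired deviations, are shown below; the symbols \sigma_{xy} (population) and s_{xy} (sample) are assumed and may differ from the original notation.

```latex
% Covariance: population (divide by N) and sample (divide by n - 1)
\[
  \sigma_{xy} = \frac{\sum_{i=1}^{N} (x_i - \mu_x)(y_i - \mu_y)}{N}
  \qquad
  s_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1}
\]
```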

Covariance
- In the same units as Variance (if both variables are in the same unit), i.e. units squared
- Very important element of measuring portfolio risk in finance

Covariance in Excel
- Tools
  - Data Analysis
    - Covariance

Interpreting the Result
- This gives us the variances (7.25 & 6.25) and the covariance between the variables
- In fact, variance is simply the covariance of a variable with itself!

Using Covariance
- Very useful in Finance for measuring portfolio risk
- Unfortunately, it is hard to interpret for two reasons:
  - What does the magnitude/size imply?
  - The units are confusing

A More Useful Statistic
- We can simultaneously adjust for both of these shortcomings by dividing the covariance by the two relevant standard deviations
- This operation
  - Removes the impact of size & scale
  - Eliminates the units
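
The effect of dividing by the two standard deviations can be seen by expressing the same returns in two different units. The sketch below uses hypothetical data (not the CMP/TSE series) and the sample (n - 1) convention; the function names are illustrative.

```python
from math import sqrt

def covariance(xs, ys):
    """Sample covariance: sum of the products of deviations, over n - 1."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)

def correlation(xs, ys):
    """Covariance divided by the two relevant standard deviations."""
    # The variance of a variable is its covariance with itself
    sd_x = sqrt(covariance(xs, xs))
    sd_y = sqrt(covariance(ys, ys))
    return covariance(xs, ys) / (sd_x * sd_y)

# Hypothetical monthly returns in percent (NOT the data from the slides)
tse = [2.0, -1.0, 3.0, -2.0, 0.5, -2.5]
cmp_stock = [1.5, -0.5, 2.5, -3.0, -0.5, 0.0]

# The same returns expressed as decimals instead of percent
tse_dec = [x / 100 for x in tse]
cmp_dec = [y / 100 for y in cmp_stock]

# Covariance changes by a factor of 10,000 with the change of units
print(covariance(tse, cmp_stock), covariance(tse_dec, cmp_dec))

# Correlation is unchanged: size, scale and units have been removed
r_pct = correlation(tse, cmp_stock)
r_dec = correlation(tse_dec, cmp_dec)
print(r_pct, r_dec)
```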

Correlation
- Correlation measures how closely one variable moves with another, ignoring magnitude
- Range: -1 to +1
- +1: Implies perfect positive co-movement
- -1: Implies perfect negative co-movement
- 0: Implies no linear relationship

Calculating Correlation
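
The formula on this slide was not preserved. The standard definition divides the covariance by the product of the two standard deviations; the symbols below (\rho for populations, r for samples) are the conventional ones and are assumed rather than taken from the original.

```latex
% Correlation: covariance scaled by the two standard deviations
\[
  \rho_{xy} = \frac{\sigma_{xy}}{\sigma_x \, \sigma_y}
  \qquad
  r_{xy} = \frac{s_{xy}}{s_x \, s_y}
\]
```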

Correlation in Excel
- Tools
  - Data Analysis
    - Correlation

Interpreting the Result
- The correlation of a variable with itself is 1
- The correlation between CMP and the TSE Index in this example is positive, and relatively strong

Estimating Linear Relationships
- Often the data imply that a linear relationship exists
- We can estimate this relationship using the Least Squares Method of Regression
- We will just learn to use the Excel output and interpret it
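
Although the course relies on Excel's output, the least-squares line for one explanatory variable can be computed directly. The sketch below uses the textbook formulas (slope = sum of products of deviations divided by the sum of squared x deviations; intercept from the two means). The data is hypothetical, not the CMP/TSE series, and the function name is illustrative.

```python
def least_squares(xs, ys):
    """Return (a, b), the intercept and slope of the least-squares line y = a + b*x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    # Sum of the products of the deviations fixes the sign of the slope
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    # Sum of the squared deviations of x scales it
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx
    # The fitted line passes through the point of means
    a = y_bar - b * x_bar
    return a, b

# Hypothetical monthly returns in percent (NOT the data from the slides)
tse = [2.0, -1.0, 3.0, -2.0, 0.5, -2.5]
cmp_stock = [1.5, -0.5, 2.5, -3.0, -0.5, 0.0]

a, b = least_squares(tse, cmp_stock)
print(f"CMP = {a:.3f} + {b:.3f} * TSE")
```

Because both hypothetical series have a mean of zero, the intercept is zero and the slope alone describes the relationship.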

[Table: TSE-CMP Regression Output (Abridged)]

Interpreting the Output

Where We Are Going
- We will develop the use of the regression technique more fully
  - Multiple explanatory variables
- Some Time-Series Applications