Presentation on theme: "Introductory Statistics". Presentation transcript:

1 Introductory Statistics

2 Learning Objectives
After the session the students should be able to:
- Distinguish between different data types
- Evaluate the central tendency of realistic business data
- Evaluate the dispersion of data
- Evaluate test statistics
- Use a test statistic to formulate business decisions using regression analysis

3 Types of data
Discrete data: a variable that can take only a fixed (countable) set of values.
Continuous data: a variable measured on a continuous scale.
These data may be collected ungrouped and then grouped in a suitable form so that they can be easily inspected. But how would we collect the data?

4 Sampling Techniques
- Simple random sampling
- Stratified sampling
- Cluster sampling
- Quota sampling
- Systematic sampling
- Mechanical sampling
- Convenience sampling
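Three of the techniques listed above are easy to express in code. The following is a minimal Python sketch using only the standard library; the function names and the hypothetical population of 100 employee IDs are illustrative assumptions, not part of the slide. Cluster, quota, mechanical and convenience sampling depend on how the population is organised in practice and are not shown.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Each member has an equal chance of selection (drawn without replacement)."""
    rng = random.Random(seed)
    return rng.sample(population, n)

def systematic_sample(population, n, seed=None):
    """Random start within the first interval, then every k-th member, k = N // n."""
    k = len(population) // n
    rng = random.Random(seed)
    start = rng.randrange(k)
    return population[start::k][:n]

def stratified_sample(strata, fraction, seed=None):
    """Draw the same fraction at random from every stratum (proportional allocation)."""
    rng = random.Random(seed)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, round(len(members) * fraction)))
    return sample

# Hypothetical population: employee IDs 0-99, split into two strata.
employees = list(range(100))
strata = {"junior": employees[:60], "senior": employees[60:]}

print(simple_random_sample(employees, 10, seed=1))
print(systematic_sample(employees, 10, seed=1))
print(stratified_sample(strata, 0.1, seed=1))  # 6 juniors followed by 4 seniors
```

Fixing `seed` makes a draw reproducible, which is useful when a sample has to be audited later.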

5 Frequency distributions
The following data are the ages of a sample of managers. How could we represent these data effectively?

6 Plotting the data: scatter diagrams and bar diagrams

7 The histogram
We could group the data into convenient class intervals and plot these to produce a histogram. What measures of central tendency do we have?
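Grouping raw values into class intervals, as described above, amounts to counting how many values fall in each interval. A minimal Python sketch follows; the function name and the ages are hypothetical, since the managers' data table is not reproduced in this transcript, and the classes 20-29, 30-39, ... are an assumed choice of interval.

```python
from collections import Counter

def frequency_table(values, start, width):
    """Group whole-number values into classes start..start+width-1, and so on.

    Assumes integer data (such as ages in years) with every value >= start.
    Returns a list of (lower limit, upper limit, frequency) tuples.
    """
    counts = Counter((v - start) // width for v in values)
    table = []
    for k in range(max(counts) + 1):
        lower = start + k * width
        table.append((lower, lower + width - 1, counts.get(k, 0)))
    return table

# Hypothetical ages, not the slide's data.
ages = [23, 27, 31, 34, 35, 38, 39, 42, 45, 51, 58, 36]
for lower, upper, freq in frequency_table(ages, start=20, width=10):
    print(f"{lower}-{upper}: {'*' * freq}")  # a sideways text histogram
# 20-29: **
# 30-39: ******
# 40-49: **
# 50-59: **
```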

8 Measures of central tendency
- Mode: the value at the peak of the distribution, i.e. the most frequently occurring value (for grouped data it can be estimated using a standard formula).
- Median: the central value of a set of data or a distribution. It can be estimated by a standard method using the cumulative frequency distribution (CDF).
- Arithmetic mean: the sum of the values divided by their number; for data in an arithmetic progression it equals the central value.
- Geometric mean: the nth root of the product of the n values; for data in a geometric progression it equals the central value.
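The four measures can be checked for ungrouped data with Python's standard `statistics` module (`geometric_mean` needs Python 3.8 or later). This is a sketch on hypothetical ages, not the slide's data; the last two lines illustrate the progression property stated above.

```python
import statistics

def central_tendency(values):
    """The four measures from the slide, for ungrouped data (all values > 0)."""
    return {
        "mode": statistics.mode(values),                      # most frequent value
        "median": statistics.median(values),                  # middle of the ordered data
        "arithmetic_mean": statistics.mean(values),           # sum / n
        "geometric_mean": statistics.geometric_mean(values),  # n-th root of the product
    }

ages = [31, 34, 35, 35, 38, 42, 45, 51]  # hypothetical ages
print(central_tendency(ages))

# Each mean lands on the central term of the matching progression:
print(statistics.mean([10, 20, 30]))         # arithmetic progression: 20
print(statistics.geometric_mean([2, 4, 8]))  # geometric progression: approximately 4
```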

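9 The mode
For our data the mode lies between 30 and 39 (the modal class). The construction shown can be used to home in on the exact value, or we can use the formula

$$\text{Mode} = L + \frac{l}{l+u}\,c$$

where L = lower class boundary of the modal class, l = lower frequency difference (modal-class frequency minus that of the class below), u = upper frequency difference (modal-class frequency minus that of the class above) and c = class width.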

10 The mode
Here, for our data, L = 29.5, l = 5, u = 1 and the class width c = 10, giving Mode = 29.5 + (5/6) × 10 ≈ 37.8.
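The grouped-data mode formula can be wrapped in a one-line function. The sketch below assumes the standard formula Mode = L + l/(l + u) × c with the symbols defined above; the function name is hypothetical.

```python
def grouped_mode(L, l, u, c):
    """Estimate the mode of grouped data as L + l / (l + u) * c.

    L: lower class boundary of the modal class
    l: modal-class frequency minus the frequency of the class below
    u: modal-class frequency minus the frequency of the class above
    c: class width
    """
    return L + l / (l + u) * c

print(round(grouped_mode(29.5, 5, 1, 10), 2))  # values from the slide: 37.83
```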

11 The median
For our data we could evaluate this quantity in two ways:
- approximately, by plotting the cumulative frequency diagram and reading off the value at half the total frequency;
- via logical inference from the ordered data.
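Reading the cumulative frequency diagram at half the total frequency is equivalent, if the CFD is drawn with straight-line segments, to linear interpolation within the median class: Median = L + (n/2 - F)/f × c, where F is the cumulative frequency below the median class and f is its frequency. The Python sketch below implements that interpolation; the function name and the frequencies are hypothetical.

```python
def grouped_median(boundaries, freqs):
    """Estimate the median of grouped data by linear interpolation.

    boundaries: class boundaries, one more than the number of classes,
                e.g. [19.5, 29.5, 39.5, ...]
    freqs:      frequency of each class
    """
    n = sum(freqs)
    if n == 0:
        raise ValueError("frequency table is empty")
    half = n / 2
    below = 0  # cumulative frequency of the classes below the current one
    for i, f in enumerate(freqs):
        if below + f >= half:  # the median class has been reached
            lower, width = boundaries[i], boundaries[i + 1] - boundaries[i]
            return lower + (half - below) / f * width
        below += f

# Hypothetical grouped ages: classes 20-29, 30-39, 40-49, 50-59.
print(round(grouped_median([19.5, 29.5, 39.5, 49.5, 59.5], [2, 6, 2, 2]), 2))  # 36.17
```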

12 Measures of dispersion
- The range: largest value minus smallest value
- Variance: the mean of the squared deviations from the mean
- Standard deviation: the square root of the variance
NOTE:
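The three measures can be computed directly from their definitions. In this sketch (hypothetical data, standard library only) the "divide by n" variance is the mean square deviation described above; the "divide by n - 1" versions from `statistics.variance` and `statistics.stdev` are the usual estimates when the data are a sample, and spreadsheets offer both forms. The function and key names are hypothetical.

```python
import statistics

def dispersion(values):
    """Range, variance and standard deviation of ungrouped data."""
    mean = statistics.mean(values)
    mean_square_dev = sum((v - mean) ** 2 for v in values) / len(values)
    return {
        "range": max(values) - min(values),
        "pop_variance": mean_square_dev,                   # divide by n, same as statistics.pvariance
        "sample_variance": statistics.variance(values),    # divide by n - 1
        "pop_sd": mean_square_dev ** 0.5,                  # divide by n, same as statistics.pstdev
        "sample_sd": statistics.stdev(values),             # divide by n - 1
    }

print(dispersion([31, 34, 35, 35, 38, 42, 45, 51]))  # hypothetical ages
```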

13 Use of computer packages
Example: Given the following data, use a spreadsheet to produce a grouped histogram with 9 bins and also produce a cumulative frequency diagram (CFD). Hence, or otherwise, evaluate:
a) three measures of central tendency, and
b) three measures of dispersion.

14 Decision processes
This is all very well, but how does it allow us to make research and managerial decisions? To answer this we need to consider the pattern of the data, thus:

15 The Normal distribution
Many sets of data approximately follow the normal distribution, the most important distribution of them all. It is largely this property that allows us to reach (research) management decisions. The normal distribution is usually written N(μ, σ²), with μ the population mean and σ² the variance.

16 Properties of N(μ, σ²)
For any normal curve with mean μ and standard deviation σ:
- about 68 percent of the observations fall within one standard deviation of the mean;
- about 95 percent fall within two standard deviations;
- about 99.7 percent fall within three standard deviations of the mean.
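The 68-95-99.7 percentages can be verified numerically with `statistics.NormalDist` (Python 3.8+). The helper name below is hypothetical; note that `NormalDist` is parameterised by the standard deviation σ, not the variance σ² used in the N(μ, σ²) notation.

```python
from statistics import NormalDist

def share_within(k, mu=0.0, sigma=1.0):
    """Proportion of N(mu, sigma^2) lying within k standard deviations of mu."""
    dist = NormalDist(mu, sigma)  # second argument is the standard deviation
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)

for k in (1, 2, 3):
    print(f"within {k} sd: {share_within(k):.4f}")
# within 1 sd: 0.6827
# within 2 sd: 0.9545
# within 3 sd: 0.9973
```

Passing any other μ and σ, for example `share_within(2, 100, 15)`, gives the same proportions, which is the "for any normal curve" part of the slide.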

17 The Z-score
The z-score is a formula that allows us to evaluate the probability of an event if we know that a particular population is normally distributed:

$$z = \frac{X - \mu}{\sigma}$$

Example: If a population is N(48,12), find the probability that some value of X < 20.

18 Solution protocol
1. Establish the hypothesis
2. Evaluate the z-score
3. Sketch the distribution
4. Evaluate the probability
[Sketch of the distribution: z = -2.15, probability p]

19 Spreadsheet solution protocol
1. Establish the hypothesis
2. Use the normal distribution function (e.g. NORM.DIST in Excel)
3. Perform a check, i.e. use the z-function (e.g. NORM.S.DIST)
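Both protocols can be mirrored in Python as a sketch: standardising to z and using the standard normal CDF corresponds to the z-function check, while calling the normal CDF with μ and σ directly corresponds to a spreadsheet function such as Excel's NORM.DIST(x, mean, standard_dev, TRUE). The function names and the example N(50, 10²) are hypothetical.

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """How many standard deviations x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma

def prob_below_via_z(x, mu, sigma):
    """P(X < x): standardise, then use the standard normal N(0, 1) CDF."""
    return NormalDist().cdf(z_score(x, mu, sigma))

def prob_below_direct(x, mu, sigma):
    """P(X < x) from the normal CDF with the population's own mu and sigma."""
    return NormalDist(mu, sigma).cdf(x)

# Hypothetical population N(50, 10^2): sigma is 10, not 100.
print(z_score(35, 50, 10))                      # -1.5
print(round(prob_below_via_z(35, 50, 10), 4))   # 0.0668
print(round(prob_below_direct(35, 50, 10), 4))  # 0.0668, so the check agrees
```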

20 Exercise
Using a z-score: if a population is N(111, 33.8²), find the probability that some value satisfies 100 < X < 150.
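For an interval the probability is the difference between two lower-tail areas, Φ(z_b) - Φ(z_a). A minimal Python sketch applied to the exercise above (the function name is hypothetical); remember that N(111, 33.8²) means σ = 33.8.

```python
from statistics import NormalDist

def prob_between(a, b, mu, sigma):
    """P(a < X < b) for X ~ N(mu, sigma^2), via two z-scores."""
    std_normal = NormalDist()
    z_a = (a - mu) / sigma
    z_b = (b - mu) / sigma
    return std_normal.cdf(z_b) - std_normal.cdf(z_a)

# N(111, 33.8^2): the second parameter is the variance, so sigma = 33.8
p = prob_between(100, 150, 111, 33.8)
print(f"P(100 < X < 150) = {p:.4f}")
```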

21 Exercise
Using a z-score and given that the population is N(37, 4.35²), find the probability that some value of X > 150.

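22 Samples
If we are using a sample of n values, then as a consequence of the central limit theorem the z-score changes: σ is replaced by the standard error σ/√n of the sample mean, thus:

$$z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$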
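The only change from the single-value z-score is that σ is replaced by the standard error σ/√n. A Python sketch with hypothetical figures (μ = 100, σ = 20, n = 25, so the standard error is 4); the function names are also hypothetical:

```python
import math
from statistics import NormalDist

def sample_mean_z(xbar, mu, sigma, n):
    """z-score of a sample mean: divide by the standard error sigma / sqrt(n)."""
    return (xbar - mu) / (sigma / math.sqrt(n))

def prob_mean_between(a, b, mu, sigma, n):
    """P(a < sample mean < b) when the sample mean is approximately N(mu, sigma^2 / n)."""
    std_normal = NormalDist()
    return (std_normal.cdf(sample_mean_z(b, mu, sigma, n))
            - std_normal.cdf(sample_mean_z(a, mu, sigma, n)))

# Hypothetical: mu = 100, sigma = 20, samples of n = 25
print(sample_mean_z(104, 100, 20, 25))                    # 1.0
print(round(prob_mean_between(98, 104, 100, 20, 25), 4))  # 0.5328
```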

23 Example
The mean expenditure per customer at a tire store is £60 and the standard deviation is £6. It is known that there are nominally 40 customers per day. A new product costs £64; what is the probability of selling such a product per customer?
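One reading of this example, which is an assumption since the slide's solution is not in the transcript, is: what is the probability that the mean spend of a day's 40 customers reaches £64? Under that reading the calculation is a sketch like the following:

```python
import math
from statistics import NormalDist

# Figures from the example; the interpretation as P(sample mean >= 64) is assumed.
mu, sigma, n = 60.0, 6.0, 40   # mean spend (£), sd (£), customers per day
price = 64.0

standard_error = sigma / math.sqrt(n)   # about £0.95
z = (price - mu) / standard_error       # about 4.22
p = NormalDist().cdf(-z)                # upper-tail area P(mean >= £64), by symmetry
print(f"z = {z:.2f}, P = {p:.2e}")
```

Using `cdf(-z)` rather than `1 - cdf(z)` avoids losing precision when the tail probability is very small.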

24 Try one
In a store, the average number of shoppers is 448, with a standard deviation of 21. What is the probability that 49 shopping hours have a mean between 441 and 446?

25 Regression & correlation analysis
- A scatter diagram can be used to show the relationship between two variables.
- Correlation analysis is used to measure the strength of the association (linear relationship) between two variables.
- Correlation is only concerned with the strength of the relationship.
- No causal effect is implied with correlation.
Scatter diagrams were presented in the last session, as was correlation.
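The strength of a linear association is usually measured by Pearson's correlation coefficient r. A standard-library Python sketch of the textbook formula follows (the function name is hypothetical; Python 3.10+ also provides `statistics.correlation`). The usage lines show that a perfect but curved relationship can still give r = 0, which is why correlation measures only linear strength.

```python
import math

def pearson_r(xs, ys):
    """Pearson's product-moment correlation coefficient r, between -1 and +1."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least two points")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))             # 1.0, perfect positive line
print(pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4]))     # 0.0, although y = x squared exactly
```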


27 Introduction to Regression Analysis
Regression analysis is used to:
- predict the value of a dependent variable based on the value of at least one independent variable;
- explain the impact of changes in an independent variable on the dependent variable.
Dependent variable: the variable we wish to predict or explain.
Independent variable: the variable used to explain the dependent variable.

28 Simple Linear Regression Model
- Only one independent variable, X
- The relationship between X and Y is described by a linear function
- Changes in Y are assumed to be caused by changes in X

29 Types of Relationships
[Figure: scatter plots of Y against X showing linear relationships and curvilinear relationships]

30 Types of relationships (cont.)
[Figure: scatter plots of Y against X showing strong relationships and weak relationships]

31 Types of Relationships
[Figure: scatter plots of Y against X showing no relationship]

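32 The regression model

$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$$

where Y_i is the dependent variable, X_i the independent variable, β0 the population Y intercept, β1 the population slope coefficient and ε_i the random error term. β0 + β1X_i is the linear component and ε_i the random error component.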

33 The regression model
[Figure: the population regression line with intercept β0 and slope β1; for a given X_i, the observed value of Y, the predicted value of Y, and the random error ε_i between them]

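34 The least squares approach
b0 and b1 are obtained by finding the values of b0 and b1 that minimize the sum of the squared differences between Y and the predicted value Ŷ:

$$\min \sum_{i=1}^{n}\left(Y_i - \hat{Y}_i\right)^2 = \min \sum_{i=1}^{n}\left(Y_i - (b_0 + b_1 X_i)\right)^2$$

The proof of the resulting formulae requires calculus.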
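For readers who want the calculus step, a sketch of the standard derivation: setting both partial derivatives of the sum of squared errors S to zero gives the two normal equations, and solving them yields the usual formulae for b1 and b0.

```latex
S(b_0, b_1) = \sum_{i=1}^{n} \left(Y_i - b_0 - b_1 X_i\right)^2

\frac{\partial S}{\partial b_0} = -2 \sum_{i=1}^{n} \left(Y_i - b_0 - b_1 X_i\right) = 0
  \;\Rightarrow\; \sum Y_i = n\, b_0 + b_1 \sum X_i

\frac{\partial S}{\partial b_1} = -2 \sum_{i=1}^{n} X_i \left(Y_i - b_0 - b_1 X_i\right) = 0
  \;\Rightarrow\; \sum X_i Y_i = b_0 \sum X_i + b_1 \sum X_i^2

\text{First equation: } b_0 = \bar{Y} - b_1 \bar{X}; \text{ substituting into the second:}

b_1 = \frac{n \sum X_i Y_i - \sum X_i \sum Y_i}{n \sum X_i^2 - \left(\sum X_i\right)^2}
```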

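35 Regression formulae
Thus the formulae can be summarized as:

$$b_1 = \frac{n\sum X_iY_i - \sum X_i \sum Y_i}{n\sum X_i^2 - \left(\sum X_i\right)^2}, \qquad b_0 = \bar{Y} - b_1\bar{X}$$

where n is the number of (X, Y) pairs and X̄, Ȳ are the means of X and Y.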
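These formulae translate directly into code. A minimal Python sketch (function names hypothetical, checked here on toy data that lie exactly on Y = 1 + 2X); its results should agree with Excel's SLOPE and INTERCEPT functions for the same data:

```python
def least_squares(xs, ys):
    """Return (b0, b1) for the fitted line Y-hat = b0 + b1 * X."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)  # gradient
    b0 = sum_y / n - b1 * sum_x / n                                # Y-bar - b1 * X-bar
    return b0, b1

def predict(b0, b1, x):
    """Predicted Y for a given X on the fitted line."""
    return b0 + b1 * x

b0, b1 = least_squares([1, 2, 3, 4, 5], [3, 5, 7, 9, 11])
print((b0, b1))             # (1.0, 2.0)
print(predict(b0, b1, 10))  # 21.0
```

With the estate agent's (X, Y) pairs entered instead, `predict(b0, b1, 200)` gives the estimated price in £k of a 200 m² house.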

36 Regression Example
An estate agent wishes to find the relationship between house price and size. It is suspected that a linear relationship exists between the house price (the dependent variable Y) and the house size in square metres (the independent variable X). Using linear regression, find the relationship and predict the price of a house measuring 200 m². The following data have been collected by the estate agent.

37 Regression data
House price, £k (Y) | Area, m² (X)
123 | 156
? | 178
140 | 189
154 | 208
100 | 122
110 | 172
203 | 261
162 | 272
160 | 158
128 | 189

38 Regression solution
It is usual to set up a table of results using an appropriate Excel spreadsheet.

39 Regression solution (cont.)
Now we simply apply the formulae, first for the regression coefficient, i.e. the gradient.

40 Regression solution (cont.)
Then we evaluate the regression constant. There are various computer methods available that do these calculations for you; these are detailed in the handout.

41 Regression computer solution
There are three methods to evaluate the regression coefficient and constant using an Excel spreadsheet: graphical, calculation and functions.

42 Regression computer solution (cont.)
This is an example of the graphical method, which is required for a pass grade in the forthcoming assignment. If you want higher grades, however, you will have to check these answers using the other two methods shown in the handout.

43 Summary
Have we met our learning objectives? Specifically, are you able to:
- Distinguish between different data types
- Evaluate the central tendency of realistic business data
- Evaluate the dispersion of data
- Evaluate test statistics
- Use a test statistic to formulate business decisions using regression analysis

