University of Sunderland CSEM03 R.E.P.L.I. Unit 1 CSEM03 REPLI Research and the use of statistical tools.


1 University of Sunderland CSEM03 R.E.P.L.I. Unit 1 CSEM03 REPLI Research and the use of statistical tools

2 Objectives
At the end of this lecture you will be able to:
1. Plot graphs and know your way around SPSS
2. Describe the shape of normal and non-normal distributions
3. Describe the characteristics of normal and non-normal distributions: mode, median, mean, standard deviation

3 The SPSS statistical tool
Recommended text: Field, A. (2005) Discovering Statistics Using SPSS, 2nd Edition, Sage Publishing, London (ISBN 0-7619-4452-4).
You can get SPSS for your own machine for £5 from the LRC.

4 Simple statistical models: mean, sum of squares, variance and standard deviation
Mean
We can consider the mean of a sample as one of the simplest statistical models because it represents a summary of the data. For example: how many CDs does a group of students own? If we take 5 students, the numbers of CDs they own are 1, 2, 3, 4 and 6. The mean is the sum of these values ($\sum x_i$) divided by the number of students ($N$):
$\bar{x} = \frac{\sum x_i}{N} = \frac{16}{5} = 3.2$
This is a theoretical value: you cannot own 0.2 of a CD. How well does the mean represent the data? What are the differences between the observed values and the mean?

5 Differences between the mean and the observed data
[Figure 1: Differences between the observed number of CDs and the mean. Axes: number of CDs against student.]
Student 1: 1 - 3.2 = -2.2
Student 2: 2 - 3.2 = -1.2
Student 3: 3 - 3.2 = -0.2
Student 4: 4 - 3.2 = 0.8
Student 5: 6 - 3.2 = 2.8

6 Cancelling out the errors in the data (deviation from the mean)
$x_1 - \bar{x} = 1 - 3.2 = -2.2$, where $x_1$ is the first data point, so $x_2 - \bar{x} = 2 - 3.2 = -1.2$, etc.
The deviances are -2.2, -1.2, -0.2, +0.8 and +2.8.
Total error = $\sum (x_i - \bar{x})$ (sum of deviances) = 0
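As a quick check of the cancelling-out argument, the following is a minimal sketch in Python using the CD counts from the lecture. The variable names are illustrative only, and the tolerance in the final comparison is an assumption made to absorb floating-point rounding.

```python
import math

# Number of CDs owned by each of the five students in the example.
cds = [1, 2, 3, 4, 6]

# The mean is the sum of the observations divided by their count.
mean = sum(cds) / len(cds)

# Deviation of each observation from the mean (observed minus mean).
deviations = [x - mean for x in cds]

# The deviations cancel out, so their total is zero
# (up to floating-point rounding, hence the tolerance).
total_error = sum(deviations)

print("mean:", mean)
print("deviations:", [round(d, 1) for d in deviations])
print("total error is zero:", math.isclose(total_error, 0.0, abs_tol=1e-9))
```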

7 Sum of Squared errors (SS)
So there is no total error between our model and the observed data: some errors are negative and some are positive, and they cancel each other out. To avoid the problem of the direction of the error (e.g. in a large dataset) we square each error (a negative number squared becomes positive) and add the squares together. This is called the Sum of Squared errors (SS) and is a good measure of the accuracy of our model. However, SS depends on the amount of data collected: the more data points, the higher the SS. To overcome this we average the error by dividing SS by the number of observations, N.

8 A more useful statistic uses the error in the sample to estimate the error in the population. This is done by dividing SS by the number of observations minus 1 (N - 1). This measure is known as the variance.

9 Variance and Standard Deviation
Variance: $s^2 = \frac{SS}{N-1} = \frac{\sum (x_i - \bar{x})^2}{N-1}$
$\sum (x_i - \bar{x})^2 = 4.84 + 1.44 + 0.04 + 0.64 + 7.84 = 14.8$
$s^2 = \frac{14.8}{4} = 3.7$
From this statistic we can derive a very useful measure called the Standard Deviation (SD):
$SD = \sqrt{s^2} = \sqrt{3.7} \approx 1.92$
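The arithmetic for the sum of squares, variance and standard deviation can be reproduced with a short Python sketch. It uses the same five CD counts; the step-by-step variable names are illustrative, and the cross-check relies on the standard library functions `statistics.variance` and `statistics.stdev`, which both divide by N - 1.

```python
import math
import statistics

# Number of CDs owned by each of the five students in the example.
cds = [1, 2, 3, 4, 6]
n = len(cds)
mean = sum(cds) / n

# Sum of squared errors: square each deviation from the mean, then add.
ss = sum((x - mean) ** 2 for x in cds)

# Sample variance: divide SS by N - 1 to estimate the population error.
variance = ss / (n - 1)

# Standard deviation: square root of the variance.
sd = math.sqrt(variance)

print(f"SS = {ss:.1f}, variance = {variance:.1f}, SD = {sd:.2f}")

# Cross-check against the standard library (also uses N - 1).
print(statistics.variance(cds), statistics.stdev(cds))
```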

10 Levels of data
The type of data you collect will depend on the design of your study. Data can be measured on different scales:
Interval data: These data are measured on a scale along which intervals are equal. For example, if you record ratings of a pop video on a scale of 1 to 5, the change between each number should be equal.
Categorical data: These are any variables that are made up of objects/entities. For example, the UK degree classification system comprises 1, 2:1, 2:2, 3, pass or fail. The interval between each class is not equal.
Nominal data: This is where numbers represent names, e.g. the numbers on the backs of football shirts denote the type of player (1 = goalkeeper).
Measurement data: The objects being studied are measured on a quantitative scale. The data can be discrete or continuous. With discrete measurement data only certain values are possible. Examples of continuous measurement data are age, height and cholesterol level.
Ordinal data: A type of categorical data where the order is important, e.g. degree classification, seriousness of illness.

11 Median
The median is the "middle value" of a list: the smallest number such that at least half the numbers in the list are no greater than it. In practice, if the list has an odd number of entries, the median is the middle entry after sorting the list into increasing order. If the list has an even number of entries, the median is the sum of the two middle numbers (after sorting) divided by two. The median can be estimated from a histogram by finding the smallest number such that the area under the histogram to the left of that number is 50%.
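The odd/even rule for the median can be sketched as a small Python function. The function name `median_of` and the second sample list are hypothetical, chosen only for illustration; the first list is the CD data used earlier in the lecture.

```python
def median_of(values):
    """Return the median using the odd/even rule described above."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty list is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd number of entries: the middle entry of the sorted list.
        return ordered[mid]
    # Even number of entries: average of the two middle entries.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median_of([1, 2, 3, 4, 6]))  # odd-length list (the CD data)
print(median_of([4, 1, 3, 2]))     # even-length list (illustrative data)
```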

12 Mode
For lists, the mode is the most common (frequent) value. A list can have more than one mode. In a histogram, a mode is the most frequently occurring interval (seen as a bump).
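Because a list can have more than one mode, a sketch that returns every most-frequent value may be more useful than one that returns a single number. The function name `modes_of` and the sample lists below are hypothetical, used only to illustrate the idea.

```python
from collections import Counter


def modes_of(values):
    """Return all values that occur most often, in increasing order."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    return sorted(v for v, c in counts.items() if c == highest)


print(modes_of([2, 5, 5, 7, 9]))     # a single mode
print(modes_of([1, 2, 2, 3, 3, 4]))  # two values tie, so two modes
```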

13 Normal distribution example (scores in a test)

14 Positive skew

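15 Standard deviation
[Figure: normal curve with the horizontal axis marked at the mean and at 1, 2 and 3 standard deviations either side of it (-3, -2, -1, mean, +1, +2, +3); areas under the curve labelled 34.1%, 13.6%, 2.1% and 0.1%.]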
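The percentages on the standard deviation slide (34.1%, 13.6%, 2.1%, 0.1%) match the areas under a normal curve on one side of the mean. The sketch below assumes that reading of the figure and computes each band from the standard normal cumulative distribution, built from `math.erf`; the helper names `normal_cdf` and `band` are illustrative.

```python
import math


def normal_cdf(z):
    """Probability that a standard normal variable is at most z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def band(lower_sd, upper_sd):
    """Area under the normal curve between two points given in SD units."""
    return normal_cdf(upper_sd) - normal_cdf(lower_sd)


# Areas on one side of the mean; the other side is symmetric.
bands = {
    "mean to 1 SD": band(0, 1),
    "1 SD to 2 SD": band(1, 2),
    "2 SD to 3 SD": band(2, 3),
    "beyond 3 SD": 1.0 - normal_cdf(3),
}

for label, share in bands.items():
    print(f"{label}: {share * 100:.1f}%")
```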

16 Find the mean, median, mode, and range of these data.
Example: Three dice are rolled 12 times. The sum of the numbers after each roll is recorded below:
Numbers rolled: 12, 11, 3, 7, 4, 4, 17, 13, 12, 5, 8, 12
Step 1: Rearrange the data elements in increasing order.
3, 4, 4, 5, 7, 8, 11, 12, 12, 12, 13, 17
Step 2: Find the mean. Add all the numbers and divide by 12.
mean = 108/12 = 9
Step 3: Find the median. The sample size is even, for there are 12 data elements. The median is the average value of the sixth and the seventh elements.
median = (8 + 11)/2 = 9.5

17 Step 4: Find the mode.
The number 3 occurs once.
The number 4 occurs twice.
The number 5 occurs once.
The number 7 occurs once.
The number 8 occurs once.
The number 11 occurs once.
The number 12 occurs three times.
The number 13 occurs once.
The number 17 occurs once.
mode = 12
Step 5: Find the range. The highest value is 17. The lowest value is 3.
range = 17 - 3 = 14
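The five steps of the dice example can be checked with Python's standard `statistics` module. This is only a sketch: the variable names are illustrative, and the range is computed directly as the highest value minus the lowest because the module has no range function.

```python
import statistics

# Sums recorded from rolling three dice 12 times.
rolls = [12, 11, 3, 7, 4, 4, 17, 13, 12, 5, 8, 12]

# Step 1: rearrange the data in increasing order.
ordered = sorted(rolls)

# Step 2: mean (sum of all values divided by how many there are).
mean = statistics.mean(rolls)

# Step 3: median (average of the 6th and 7th values, since N is even).
median = statistics.median(rolls)

# Step 4: mode (the most frequent value).
mode = statistics.mode(rolls)

# Step 5: range (highest value minus lowest value).
data_range = max(rolls) - min(rolls)

print("ordered:", ordered)
print("sum:", sum(rolls), "mean:", mean)
print("median:", median, "mode:", mode, "range:", data_range)
```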

18 Examples for practice
The weekly salaries of six employees at McDonalds are £70, £100, £90, £80, £70, £100. For these six salaries, find: (a) the mean (b) the median (c) the mode
List the data in order: 70, 70, 80, 90, 100, 100
Mean: (70 + 70 + 80 + 90 + 100 + 100)/6 = 510/6 = 85
Median: 70, 70, 80, 90, 100, 100. The two numbers that fall in the middle need to be averaged: (80 + 90)/2 = 85
Mode: 70 and 100 each appear twice, more often than any other value, so the data have two modes: 70 and 100.

