Lecture 1 Fadzly, N Normality and data transformation + Non-parametric tests
Are our data “normal”?
Many parametric statistical tests assume that the data we gather come from a normally distributed population.
In reality, however, that is often not the case.
Are our data “normal”?
To approach normality, one usually has to sample laboriously and extensively and collect a large quantity of data (remember the Central Limit Theorem).
Most data will be distribution-free (suitable for non-parametric techniques).
Count data, for example, can in theory never be normal.
Certain nominal data, for example age class or sex, are usually multimodal.
How to check for normality?
The hard way: Gaussian equation; goodness-of-fit test; Kolmogorov-Smirnov; Shapiro-Wilk.
The easy way: simply plot the data as a histogram and see whether it looks normal.
General rule of thumb: calculate the mean $\bar{x}$ and the standard deviation $s$; if about 70% of the observations fall within the interval $\bar{x} \pm s$, treat the sample as normal (a normal distribution puts about 68% of its values within one standard deviation of the mean).
Example: $\bar{x} = 8.6$, $s = 1.5$, so $\bar{x} \pm s$ is 7.1 to 10.1, which contains 7 of the 10 observations (70%). So the sample is normal.
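The rule of thumb above can be sketched in a few lines of Python. This is only an illustration: the sample is hypothetical (it is not the data behind the slide's example, although it was chosen to have the same mean), and the function name `fraction_within_one_sd` is made up for this sketch.

```python
import statistics

def fraction_within_one_sd(sample):
    """Return (mean, s, fraction of observations inside mean ± s)."""
    mean = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)
    lower, upper = mean - s, mean + s
    inside = sum(1 for x in sample if lower <= x <= upper)
    return mean, s, inside / len(sample)

# Hypothetical sample of 10 observations (assumption, not the lecture's data).
sample = [6, 7, 8, 8, 9, 9, 9, 9, 10, 11]

mean, s, fraction = fraction_within_one_sd(sample)
looks_normal = fraction >= 0.70  # the 70 % rule of thumb from the slide
print(f"mean={mean:.1f}, s={s:.2f}, within one s: {fraction:.0%}, looks normal: {looks_normal}")
```

A quick check like this does not replace a histogram or a formal test such as Shapiro-Wilk; it only automates the counting step.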
My general rules:
If I don't mention anything, assume it's normal.
If I mention that it's not normal, then it's not normal.
If I mention that it's not normal but I want it normalised, then normalise the data! (Data transformation)
However, I sometimes test or trick the students: watch out for keywords like median, average, ordinal.
Data transformation
Why do we need to transform data?
To satisfy the distributional requirements of certain parametric tests (which require the data to be normally distributed).
Transformed data are also easier to explain visually.
Option 1: Square root
The easiest method.
Every non-negative real number x has a unique non-negative square root, called the principal square root.
Appropriate when the variance of a sample of count data is about equal to the mean, as in the Poisson distribution.
x is transformed into $\sqrt{x}$.
If there are zero counts, replace each observation by the square root of the count plus a small constant (a common choice is $\sqrt{x + 0.5}$).
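A minimal sketch of the square-root option, assuming plain Python lists of counts. The offset of 0.5 for samples containing zeros is an assumption (a commonly used constant, not necessarily the one intended on the slide), and the data and the name `sqrt_transform` are hypothetical.

```python
import math
import statistics

def sqrt_transform(counts, zero_offset=0.5):
    """Square-root transform count data.

    If any count is zero, zero_offset is added to every observation first.
    The default of 0.5 is an assumed, commonly used constant.
    """
    if any(x < 0 for x in counts):
        raise ValueError("square root needs non-negative values")
    offset = zero_offset if 0 in counts else 0.0
    return [math.sqrt(x + offset) for x in counts]

# Hypothetical count data (assumption, for illustration only).
counts = [0, 1, 4, 9, 2, 3, 5, 4]

# Compare variance and mean to judge whether this option is appropriate.
print("mean:", statistics.mean(counts), "variance:", statistics.variance(counts))
print(sqrt_transform(counts))
```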
Option 2: Logarithmic transformation
Appropriate when the variance of a sample of count data is larger than the mean.
x is transformed into log x (log x is undefined at x = 0, so every count must be greater than zero).
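The logarithmic option might look like the sketch below. The slide does not say which base is meant, so base 10 is an assumption here; the data and the name `log_transform` are hypothetical.

```python
import math
import statistics

def log_transform(counts):
    """Return log10(x) for each strictly positive count (base 10 is assumed)."""
    if any(x <= 0 for x in counts):
        raise ValueError("log x is undefined for zero or negative values")
    return [math.log10(x) for x in counts]

# Hypothetical, strongly right-skewed count data (assumption).
counts = [1, 10, 100, 3, 250, 12, 7, 40]

# The slide's criterion: variance larger than the mean.
overdispersed = statistics.variance(counts) > statistics.mean(counts)
transformed = log_transform(counts)
print("variance > mean:", overdispersed)
print([round(v, 3) for v in transformed])
```

Raising an error on zeros is a design choice for this sketch: it makes the limitation visible instead of silently producing invalid values.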
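Option 3: The arcsinh
Appropriate when there are observations of zero in the count data.
Arcsinh is the inverse hyperbolic sine: x is transformed into $\operatorname{arcsinh} x = \ln\left(x + \sqrt{x^2 + 1}\right)$, which is defined at x = 0.
Using a scientific calculator: enter the value, press the inv key, followed by hyp, then the sin key.
On certain Casio models, press hyp, then shift, then sin (the display should show sinh⁻¹). Pressing shift and sin alone gives sin⁻¹, the ordinary arcsine, which is a different function.
Unlike sin⁻¹, whose answer is an angle in degrees (°) or radians, arcsinh returns a plain number, so the angle mode does not change the result.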
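For readers without a calculator at hand, the arcsinh option can be sketched in Python with the standard library function `math.asinh`. The data and the helper names are hypothetical; the second function only cross-checks the result against the logarithmic form of the inverse hyperbolic sine.

```python
import math

def arcsinh_transform(counts):
    """Apply the inverse hyperbolic sine to each count (zeros are allowed)."""
    return [math.asinh(x) for x in counts]

def arcsinh_via_log(x):
    """Same quantity written out as ln(x + sqrt(x^2 + 1))."""
    return math.log(x + math.sqrt(x * x + 1))

# Hypothetical count data containing a zero (assumption).
counts = [0, 1, 2, 5, 20, 100]

transformed = arcsinh_transform(counts)
for x, y in zip(counts, transformed):
    # Both forms should agree up to floating-point rounding.
    print(x, round(y, 4), math.isclose(y, arcsinh_via_log(x)))
```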
Data transformation
Is it compulsory?
It depends on the researcher and on the decision to use parametric or non-parametric tests.
Data transformation simply means changing a non-normal curve into a bell-shaped normal curve.
Caution: information is lost through transformation!
Do not try multi-level transformation.