1
Sunee Raksakietisak Srinakharinwirot University sunee@g.swu.ac.th
Statistical Techniques for Data Analysis: Overview of Statistical Techniques Selection
2
About this presentation
This set of slides was last used in the presentation titled “Reporting Statistical Data Analysis Results” at the seminar and workshop “Writing Scientific Articles for Publication IV” in July 2008, organized by the Thai-Australian Technological Services Center (TATSC) ( seminar-workshop-writing-scientific-articles-for- publication-iv july-2008)
3
Introduction
Article in ScienceAsia Vol.32 No.1 March: “Understanding Data: Important for All Scientists, and Where Any Nation Might Excel”
My work is in response to this article, and for my presentation: back to the basics of statistical methods
4
Research for my talk
Investigate how statistical methods have been used in scientific papers
The articles in ScienceAsia (online) were investigated:
  Current issue (Vol.33 No.2 June 2007) back to Vol.31 No.1 March 2005
  Total of 154 articles
The keyword “statistic” was used for searching
  It covers words like “statistics” and “statistically”
  Some articles needed the keyword “significant” instead
5
Research for my talk (cont’d)
Results: 40 articles out of 154 articles used statistical methods (26%)
6
Statistical Methods
What are the statistical methods that have been used? And how often?
Results (an article may use more than one method, so the percentages sum to more than 100%):
  t-test (28%)
  One way ANOVA (68%)
  Two way ANOVA (10%)
  Correlation (13%)
  Regression (15%)
  Others (3%)
7
Further questions?
How good is the writing/reporting of the statistical part?
Evaluation on a 10-point scale:
  Title: 0 – 1 point
  Abstract: 0 – 2 points
  Method: 0 – 3 points
  Results: 0 – 4 points
How common are misconceptions in writing/reporting statistical results and conclusions?
8
Overview of Statistical Methods
Descriptive statistics
  Qualitative data (nominal, ordinal): frequency and percentages
  Quantitative data (interval, ratio / scale / numeric): mean and SD
  Distinguish between SD and SEM (Standard Error of the Mean)!
Inferential statistics
  Hypothesis testing
9
Statistical Methods
t-test: compare two means
One way ANOVA: compare two or more means
  One factor (the effect of the factor on the measured variable)
Two way ANOVA
  Two factors (the effect of 2 factors on the measured variable)
10
Steps in statistical hypothesis testing
1. Formulate hypotheses: Ho and H1
2. Set level of significance (α = 0.05, 0.01, 0.10)
3. Choose the statistic used to test the hypothesis in (1)
   This statistic is called an “inferential statistic” (test statistic)
   Formulae (don’t need to know)
   Has a distribution (Z, t, F, χ²)
4. Decision rule: reject Ho if P-value < α
5. Calculate the statistic and p-value
   Statistical package gives these values
6. Make decision: reject Ho or do not reject Ho
   Rejecting Ho means that the test is significant
11
[Figure: Normal, t, Chi-square, and F distributions]
13
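What is P-value
P-value is the probability, computed assuming Ho is true, of getting a test statistic at least as extreme as the one observed: the area from the observed value of the statistic out to the tail(s) of its distribution (either one tail or two tails)
Web page to calculate the p-value of various distributions: (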
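The tail-area idea can be sketched for a Z statistic using only Python's standard library. This is a minimal illustration, not a replacement for a statistical package: the function name `z_p_values`, the test statistic 2.5, and α = 0.05 are all assumptions chosen for the example.

```python
from statistics import NormalDist

def z_p_values(z):
    """Return (one_tailed, two_tailed) p-values for a Z statistic."""
    std_normal = NormalDist()  # standard normal: mean 0, SD 1
    # Area from |z| out to the upper tail of the distribution.
    one_tailed = 1.0 - std_normal.cdf(abs(z))
    # Two-tailed: the same area counted in both tails.
    two_tailed = 2.0 * one_tailed
    return one_tailed, two_tailed

alpha = 0.05                               # level of significance (assumed)
one_tailed, two_tailed = z_p_values(2.5)   # hypothetical test statistic
# Decision rule: reject Ho if P-value < alpha.
decision = "Reject Ho" if two_tailed < alpha else "Do not reject Ho"
print(f"one-tailed p = {one_tailed:.4f}, two-tailed p = {two_tailed:.4f}: {decision}")
```

For t, F, and χ² statistics the same idea applies, but the tail area comes from the corresponding distribution, which a statistical package supplies.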
14
P-value (cont’d)
P-value can never be zero!!!
This misconception is common because the statistical package shows values only to a fixed number of decimal places, e.g. with 3 decimal places, a p-value too small to register is shown as .000; hence we have to say P < .001 instead of P = .000
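A small helper can enforce this reporting rule. The sketch below is one possible way to do it; the function name `format_p` and the leading-zero-free style are assumptions, and a journal may require a different format.

```python
def format_p(p, decimals=3):
    """Format a p-value for reporting, never printing it as zero."""
    threshold = 10 ** -decimals          # .001 for 3 decimal places
    if p < threshold:
        value, sign = threshold, "<"     # too small to show: report "P < .001"
    else:
        value, sign = p, "="
    # Drop the leading zero: 0.031 -> .031
    text = f"{value:.{decimals}f}".lstrip("0")
    return f"P {sign} {text}"

print(format_p(0.00004))   # a p-value the package would display as .000
print(format_p(0.0312))
```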
15
Misconception about Alpha and P-value
The frequency of cell division was calculated after 2 weeks of culture and was statistically analyzed by analysis of variance (ANOVA) at p ≤ 0.05 (correct??)
Means within a column followed by the same letter are not significantly different at P ≤ 0.05 according to DMRT (correct??)
16
Misconception (cont’d)
Statistical significance was defined as ρ < 0.05 (correct??)
The repeated measurements of L’ value and rehydration ratio of the dehydrated products from different pre-treatments were subject to analysis of variance (p=0.05) (correct??)
Significant difference at p < .05 (correct??)
17
Correct concept
Collected data were statistically analyzed and mean separation was calculated according to the Least Significant Difference (LSD) method at the 5% level of significance (Correct)
Results were considered to be statistically significant when p<0.05 (Correct)
18
Correct concept
* P < 0.05, ** P < 0.01, *** P < 0.001; ns not significant (Correct)
** = significant at 1% level, ns = non-significant (Correct)
The bars with the same letter are not significantly different (P>0.05) (Correct)
19
About t-test
Two variables
  Dependent (variable whose means are compared): scale
  Independent (group variable): nominal
    Has 2 levels/groups
Common mistake: the independent variable has more than 2 groups, and t-tests are run for many pairs (should do ANOVA)
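One way to see why many pairwise t-tests are a mistake is to count the pairs and estimate the chance of at least one false positive. The sketch below assumes the tests are independent, which is only an approximation for pairwise comparisons on the same data; the function name `familywise_error_rate` is hypothetical.

```python
from math import comb

def familywise_error_rate(n_groups, alpha=0.05):
    """Return (number of pairs, chance of at least one false positive).

    Assumes independent tests, each run at level alpha (an approximation).
    """
    n_pairs = comb(n_groups, 2)                    # all pairs of groups
    return n_pairs, 1.0 - (1.0 - alpha) ** n_pairs

for k in (2, 3, 4, 5):
    pairs, rate = familywise_error_rate(k)
    print(f"{k} groups: {pairs} t-tests, overall error rate about {rate:.3f}")
```

With 2 groups the rate stays at α, but it grows quickly as groups are added, which is the reason to test all means at once with ANOVA.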
20
About t-test (cont’d)
Statistical test: tests whether the two means are significantly different
The test is significant when the null hypothesis (the means are the same) is rejected; that is, the means are different
The t-test has 2 formulas: equal variances assumed and equal variances not assumed
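The two t-test formulas can be sketched as follows. This is an illustration with hypothetical data; the function names `pooled_t` and `welch_t` are assumptions, and the unequal-variance version shown is the Welch form with Welch-Satterthwaite degrees of freedom. P-values are left to a statistical package.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t(a, b):
    """t statistic and df when the two variances are assumed equal."""
    n1, n2 = len(a), len(b)
    # Pooled variance: weighted average of the two sample variances.
    sp2 = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    t = (mean(a) - mean(b)) / sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2

def welch_t(a, b):
    """t statistic and df when the two variances are not assumed equal."""
    n1, n2 = len(a), len(b)
    v1, v2 = variance(a) / n1, variance(b) / n2
    t = (mean(a) - mean(b)) / sqrt(v1 + v2)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

group_a = [1.0, 2.0, 3.0, 4.0, 5.0]    # hypothetical data
group_b = [2.0, 4.0, 6.0, 8.0, 10.0]   # hypothetical data
print(pooled_t(group_a, group_b))
print(welch_t(group_a, group_b))
```

With equal group sizes the two t values coincide and only the degrees of freedom differ; with unequal group sizes the t values differ as well.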
21
Reporting results (Example)
See worksheet
N, Mean, and SD for each group, t, and p-value
In journal articles, there are different ways of reporting
  Report Mean and SD (no N)
  Report Mean and SEM
  Report as Mean ± SD or Mean ± SEM
Note: SEM = SD/√N
SEM gives the picture of the Confidence Interval (C.I.)
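The relation SEM = SD/√N can be checked with a short sketch. The data are hypothetical and the function name `mean_sd_sem` is an assumption for this example.

```python
from math import sqrt
from statistics import mean, stdev

def mean_sd_sem(values):
    """Return (mean, SD, SEM) for one group, with SEM = SD / sqrt(N)."""
    n = len(values)
    sd = stdev(values)          # sample SD (divides by N - 1)
    return mean(values), sd, sd / sqrt(n)

group = [4.0, 5.0, 6.0, 7.0, 8.0]   # hypothetical measurements
m, sd, sem = mean_sd_sem(group)
print(f"N = {len(group)}, Mean ± SD = {m:.2f} ± {sd:.2f}, Mean ± SEM = {m:.2f} ± {sem:.2f}")
```

Because SEM shrinks as N grows while SD does not, a report should state which of the two follows the ± sign, and N should be given so that a reader can convert between them.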
22
About One way ANOVA
Two variables
  Dependent (variable to compute mean): scale
  One independent (factor): nominal
    Has 2 levels/groups or more
t-test is a special case of one way ANOVA: t² = F
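The statement that the t-test is a special case of one way ANOVA can be verified numerically for two groups. The sketch below uses hypothetical data and hypothetical function names (`one_way_f`, `pooled_t`); the t statistic is the equal-variance version, which is the one that matches the F statistic.

```python
from math import sqrt
from statistics import mean, variance

def one_way_f(groups):
    """F statistic of a one way ANOVA: between-group MS / within-group MS."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((len(g) - 1) * variance(g) for g in groups)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

def pooled_t(a, b):
    """Equal-variance t statistic for two independent groups."""
    n1, n2 = len(a), len(b)
    sp2 = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    return (mean(a) - mean(b)) / sqrt(sp2 * (1 / n1 + 1 / n2))

group_a = [1.0, 2.0, 3.0, 4.0, 5.0]    # hypothetical data
group_b = [2.0, 4.0, 6.0, 8.0, 10.0]   # hypothetical data
f_value = one_way_f([group_a, group_b])
t_value = pooled_t(group_a, group_b)
print(f"F = {f_value:.4f}, t squared = {t_value ** 2:.4f}")
```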
23
One way ANOVA (cont’d)
Statistical test: tests whether the factor has any effect on the dependent variable (or: are all the means equal?)
F-test (the test statistic has an F distribution)
A significant test means that the factor has an effect on the dependent variable; at least one pair of means is different
Multiple comparisons of all pairs of means by LSD, Duncan, SNK, Tukey, Bonferroni, Scheffé
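Of the multiple comparison methods listed, Bonferroni is the simplest to sketch: the level of significance is divided by the number of pairs compared. The group names and the function name `bonferroni_plan` below are hypothetical, and the other methods (LSD, Duncan, SNK, Tukey, Scheffé) use different adjustments that are not shown here.

```python
from itertools import combinations

def bonferroni_plan(group_names, alpha=0.05):
    """List all pairs of groups and the Bonferroni per-comparison alpha."""
    pairs = list(combinations(group_names, 2))
    return pairs, alpha / len(pairs)

pairs, adjusted_alpha = bonferroni_plan(["A", "B", "C", "D"])  # hypothetical groups
for first, second in pairs:
    print(f"compare {first} with {second}: significant if p < {adjusted_alpha:.4f}")
```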
24
Reporting Results (Example)
See worksheet
N, Mean, and SD for each group
F and p-value
Symbol indicating the difference in means from multiple comparisons
25
About Two way ANOVA
Three variables
  Dependent (variable to compute mean): scale
  Two independent variables (factors): nominal
26
Two way ANOVA (cont’d)
Statistical test:
  Test for interaction effect first
    Plot graph to visualize interaction effect
  If there is no interaction effect, then test for the main effect of each factor (one way ANOVA)
  If there is an interaction effect, then test for simple effects: the effect of one factor at each level of the other factor
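What an interaction looks like in the cell means can be sketched for a 2 x 2 design. This is a descriptive illustration only, with hypothetical data and factor labels; it computes the simple effects and their difference, and does not perform the F-test for interaction, which a statistical package provides.

```python
from statistics import mean

# Hypothetical 2 x 2 design: factor A (a1, a2) crossed with factor B (b1, b2).
cells = {
    ("a1", "b1"): [4.0, 5.0, 6.0],
    ("a1", "b2"): [6.0, 7.0, 8.0],
    ("a2", "b1"): [5.0, 6.0, 7.0],
    ("a2", "b2"): [11.0, 12.0, 13.0],
}
cell_means = {cell: mean(values) for cell, values in cells.items()}

# Simple effect of B at each level of A (difference between cell means).
effect_b_at_a1 = cell_means[("a1", "b2")] - cell_means[("a1", "b1")]
effect_b_at_a2 = cell_means[("a2", "b2")] - cell_means[("a2", "b1")]

# If the simple effects differ, the lines in the plot are not parallel.
interaction_contrast = effect_b_at_a2 - effect_b_at_a1

print(f"effect of B at a1 = {effect_b_at_a1}, at a2 = {effect_b_at_a2}")
print(f"difference between simple effects = {interaction_contrast}")
```

A contrast near zero corresponds to roughly parallel lines in the interaction plot; whether a nonzero contrast is significant is decided by the interaction test in the ANOVA table.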
27
Reporting Results (Example)
See worksheet
N, Mean, and SD for each group
Table indicating the significance of the main effect of each factor and of the interaction effect
28
Stop here for a minute
t-test and ANOVA are parametric statistical methods for mean comparisons
They are univariate analyses (one dependent variable)
Parametric methods assume that the dependent variable has a normal distribution
29
Test for Normality
Test for normality can be easily done by statistical package
If normal, then a parametric test can be used
If not normal, try a transformation
If still not normal after transformation, use nonparametric statistical methods
If other assumptions of parametric tests, such as equal variances, are not met, use a nonparametric test
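The effect of a transformation can be illustrated with a simple skewness check. Note the assumptions: sample skewness is only a rough descriptive indicator and is not a formal test for normality (a statistical package provides those), the log transformation is just one common choice, and the data and the function name `skewness` are hypothetical.

```python
from math import log
from statistics import mean

def skewness(values):
    """Sample skewness: third central moment / (second central moment) ** 1.5."""
    m = mean(values)
    n = len(values)
    m2 = sum((x - m) ** 2 for x in values) / n
    m3 = sum((x - m) ** 3 for x in values) / n
    return m3 / m2 ** 1.5

raw = [1.0, 2.0, 4.0, 8.0, 16.0, 32.0]    # hypothetical right-skewed data
logged = [log(x) for x in raw]            # log transformation

print(f"skewness before transformation: {skewness(raw):.3f}")
print(f"skewness after log transformation: {skewness(logged):.3f}")
```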
30
Nonparametric test
Rank of data is used instead of raw data
Robust, but gives lower power than the parametric test
For equivalent parametric and nonparametric methods, see summary of commands in SPSS
Most of the time, the conclusions from parametric and nonparametric tests are the same
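Replacing raw data by ranks can be sketched as below. Tied values receive the average of the ranks they occupy, which is the usual convention; the function name `rank_data` and the sample values are assumptions for this example.

```python
def rank_data(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        # Extend the run while the next sorted value is tied with this one.
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = average_rank
        start = end + 1
    return ranks

print(rank_data([10, 20, 20, 30]))
print(rank_data([3.2, 150.0, 4.1]))   # an extreme value only gets the top rank
```

The second call shows why rank-based methods are robust: the extreme value 150.0 counts no more than any other largest value would.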
31
From comparison to Modeling
In most scientific experiments, the manipulated (independent) variables are quantitative variables
But when doing the experiment, only some values are selected
  Temperature (e.g. 3 levels of temperature)
  To see effect of temperature on …. (dependent variable)
  One way ANOVA
  Show graph of mean of dependent variable at each level of temperature
32
Correlation
Correlation of 2 variables (both must be scale variables)
Correlation often means Pearson correlation
  Assumes linear correlation
  Assumes bivariate normal distribution (it is a parametric method)
  A nominal variable with 2 values (levels) is OK (watch out: more than 2 values is not OK)
If not normal, use rank correlation (nonparametric)
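The difference between Pearson correlation and rank correlation can be sketched with a relationship that is monotonic but not linear. The data and function names are hypothetical, the rank correlation shown is Spearman's (Pearson correlation computed on ranks), and the simple ranking helper assumes there are no tied values.

```python
from math import sqrt
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def simple_ranks(values):
    """1-based ranks; this helper assumes there are no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        ranks[index] = rank
    return ranks

def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson_r(simple_ranks(x), simple_ranks(y))

x = [1.0, 2.0, 3.0, 4.0, 5.0]      # hypothetical data
y = [1.0, 4.0, 9.0, 16.0, 25.0]    # increases with x, but not linearly
print(f"Pearson r = {pearson_r(x, y):.4f}, Spearman rho = {spearman_rho(x, y):.4f}")
```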
33
Regression
It is a modeling technique: cause (independent) and effect (dependent)
Model: regression equation (prediction equation)
How good is the model: R², the percentage of variance of the dependent variable explained (accounted for) by the independent variables (predictors)
Only one dependent variable, but there can be many independent variables (predictors)
All must be scale variables
Modeling: Enter, Forward, Backward, Stepwise
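For a single predictor, the regression equation and R² can be computed by hand, as in the sketch below. It covers only simple (one-predictor) least-squares regression with hypothetical data; multiple predictors and the Enter, Forward, Backward, and Stepwise selection methods need a statistical package. The function name `simple_regression` is an assumption.

```python
from statistics import mean

def simple_regression(x, y):
    """Least-squares fit of y = intercept + slope * x; returns (intercept, slope, R²)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    intercept = my - slope * mx
    # R² = 1 - (residual sum of squares / total sum of squares).
    ss_total = sum((b - my) ** 2 for b in y)
    ss_residual = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    return intercept, slope, 1.0 - ss_residual / ss_total

x = [1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical predictor values
y = [2.0, 4.0, 5.0, 4.0, 5.0]   # hypothetical dependent values
intercept, slope, r_squared = simple_regression(x, y)
print(f"y = {intercept:.2f} + {slope:.2f} x, R² = {r_squared:.2f}")
```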
34
Nominal Dependent Variable
Variables of interest (dependent) are often nominal in the medical area
  Has lung cancer or doesn’t have
  Has heart attack or doesn’t have
Use chi-square to test the differences (like t-test or ANOVA)
Use logistic regression for modeling
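The chi-square test for a 2 x 2 table can be sketched with the standard library. The counts are hypothetical, the function name `chi_square_2x2` is an assumption, and no continuity correction is applied. The p-value uses the fact that a chi-square variable with 1 degree of freedom is the square of a standard normal variable, so this shortcut works only for 2 x 2 tables.

```python
from math import sqrt
from statistics import NormalDist

def chi_square_2x2(table):
    """Pearson chi-square statistic and p-value (df = 1) for a 2 x 2 table."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # df = 1: chi-square is the square of a standard normal variable.
    p_value = 2.0 * (1.0 - NormalDist().cdf(sqrt(chi2)))
    return chi2, p_value

# Hypothetical counts: rows = exposed / not exposed, columns = has disease / doesn't have.
table = [[10, 20], [20, 10]]
chi2, p_value = chi_square_2x2(table)
print(f"chi-square = {chi2:.3f}, p = {p_value:.4f}")
```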
35
Reliability Analysis
Cognitive test analysis
  Reliability coefficient: KR-20 / Cronbach’s alpha
  Item statistics: difficulty index, discrimination index (item-to-total correlation / point-biserial correlation)
Affective test analysis (e.g. Likert scale)
  Reliability coefficient: Cronbach’s alpha
  Item statistics: discrimination index (item-to-total correlation)
See details in handout article
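Cronbach’s alpha can be computed from the item variances and the variance of the total score, using the standard formula α = k/(k − 1) × (1 − Σ item variances / total variance) for k items. The sketch below uses hypothetical Likert-type scores and a hypothetical function name `cronbach_alpha`; the handout article remains the reference for details.

```python
from statistics import variance

def cronbach_alpha(scores):
    """Cronbach's alpha; scores holds one row of item scores per respondent."""
    k = len(scores[0])                                   # number of items
    item_variances = [variance([row[i] for row in scores]) for i in range(k)]
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)

# Hypothetical data: 4 respondents answering 3 items on a 5-point scale.
scores = [
    [2.0, 3.0, 3.0],
    [4.0, 4.0, 5.0],
    [1.0, 2.0, 2.0],
    [5.0, 5.0, 4.0],
]
alpha = cronbach_alpha(scores)
print(f"Cronbach's alpha = {alpha:.3f}")
```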
36
My hope and final remark
You have the big picture of how to choose statistical methods for your data analysis
You know how/what to report of statistical data analysis results in research journal articles
See examples of research articles using various statistical methods