Understanding Results NRSG 790
Studies Have Multiple Types of Validity
- External validity: can the study results be generalized to others?
- Internal validity: are the study's results accurate?
- Statistical conclusion validity: were the study sample, methods, and effect size(s) sufficient to draw conclusions about the relationships studied?
- Validity and reliability of measures: were the variables measured accurately and consistently?
- Construct validity: do the measures used in a study measure the right concept? For example, a grief scale measures grief, not depression.
Relationships Between Different Types of Validity
Each level of validity is necessary but not sufficient to ensure the next level:
- Validity and reliability of measures
- Statistical conclusion validity and power
- Internal validity (minimize threats to internal validity)
- External validity
Theoretical & Operational Definitions: to study a concept quantitatively, one must be able to accurately measure it. Theoretical Definition of Sadness Operational Definition of Sadness “a feeling of sorrow and loss associated with a decrease in excitement about living” a summed rating of 5 or higher on two Likert-type scales that rate feelings of sorrow and feelings of loss
Instruments
Instruments are often used to "operationalize" (measure) concepts as study variables.
- Instruments are devices that specify and objectify the data collection process.
- Instruments are usually written and can be given directly to a subject, or can be used by a researcher to collect data through interviewing or observation.
- A questionnaire is an instrument used to collect specific written data.
What Is a Questionnaire?
- A questionnaire may contain scales.
- A scale is a set of written questions or statements that, in combination, are intended to measure a specified variable.
- The individual questions or statements on a scale are called items.
Example
Each numbered statement is an item; together the items form a scale. Respondents rate each item on a 4-point Likert-type response format:

Item                     always   sometimes   occasionally   never
1. I feel sorrowful         1         2            3           4
2. I feel lost inside       1         2            3           4
3. I feel disconnected      1         2            3           4
Decreasing Error in Quantitative Methods: Reliability and Validity of Measures
- Reliability is the ability of a measure to consistently yield the same results when there has truly been no change.
- There are three approaches to measuring or checking the reliability of quantitative measures (listed below).
Tests for Reliability of Measures
- Inter-rater reliability
- Test-retest reliability
- Internal consistency reliability
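One common index of internal consistency reliability is Cronbach's alpha. The sketch below computes it from made-up item ratings (rows are respondents, columns are scale items scored 1-4); the data are illustrative only, not from an actual instrument.

```python
# A sketch of one internal consistency check: Cronbach's alpha.
# alpha = (k / (k - 1)) * (1 - sum of item variances / variance of total score)
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    k = items.shape[1]                          # number of items on the scale
    item_vars = items.var(axis=0, ddof=1)       # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)   # variance of summed scale scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

data = np.array([[4, 4, 3], [2, 3, 2], [3, 3, 3], [1, 2, 1], [4, 3, 4]])
print(f"Cronbach's alpha = {cronbach_alpha(data):.2f}")  # prints 0.90 for this data
```

Values near 1 suggest the items consistently measure the same underlying variable.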
Validity of Measures
- Validity examines how accurate the measure is, or how true the results obtained with the measure are.
- A measure can be reliable but not valid: it can be consistent, but consistently measure the wrong thing.
Types of Validity of Measures
- Content validity
- Face validity
- Criterion-related validity
- Construct validity
Descriptive & Inferential Statistics Descriptive Statistics - used to summarize data Mean Median Mode Measures of variation (e.g, range, standard deviation, variance) Inferential Statistics - used to make inferences (generalizations) to a larger population from which the sample was drawn; to test research hypotheses T-test ANOVA Pearson’s r regression
Why Use Descriptive Statistics?
To summarize useful data. An illustrative example follows below.
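The sketch below computes each of the descriptive statistics named earlier, using made-up scores (imagine pain ratings from 0 to 10); the numbers are not from any actual study.

```python
# A hedged illustration of descriptive statistics on made-up data.
import statistics

scores = [2, 4, 4, 5, 6, 7, 7, 7, 8, 10]

print("mean:  ", statistics.mean(scores))                # arithmetic average: 6
print("median:", statistics.median(scores))              # middle value: 6.5
print("mode:  ", statistics.mode(scores))                # most frequent value: 7
print("range: ", max(scores) - min(scores))              # max minus min: 8
print("stdev: ", round(statistics.stdev(scores), 2))     # sample standard deviation
print("var:   ", round(statistics.variance(scores), 2))  # sample variance
```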
Why Use Inferential Statistics?
To help discover:
- Are there relationships between variables?
- Are differences observed between groups significant, or are they due to random variation?
- After an experimental study's intervention is done, did the intervention change the results for the experimental group, or are the results for the experimental and control groups the same (except for random variation)? A sketch of such a comparison follows below.
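One common way to make that experimental-vs-control comparison is an independent-samples t-test. The sketch below uses made-up outcome scores and assumes scipy is installed; it is illustrative, not a recipe for any particular study.

```python
# A sketch of an experimental-vs-control comparison with an independent t-test.
from scipy import stats

experimental = [14, 16, 15, 18, 17, 16, 19, 15]  # made-up outcome scores
control      = [12, 13, 11, 14, 12, 15, 13, 12]

t_stat, p_value = stats.ttest_ind(experimental, control)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
# A small p suggests the group difference is unlikely to be random variation alone.
```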
Probability & Hypothesis Testing Hypotheses examine relationships among variables, differences among groups, changes in variables over time and more complex relationships among multiple variables
Hypothesis Testing
Hypothesis testing evaluates the null hypothesis against the alternative hypothesis.
- The alternative hypothesis is the research hypothesis: that the process, change, intervention, or treatment has an effect.
- The null hypothesis is the opposite: that the change or treatment did not have an effect.
- Hypotheses can be supported or not supported, but not "proven."
Logic Behind Hypothesis Testing: Ruling Out Chance as an Explanation
- If an independent variable appears to have an effect, it is important to be able to state with confidence that the effect was really due to the variable and not just due to chance.
- The concept of statistical significance testing is based on the sampling distribution of a particular statistic (e.g., the t statistic).
The Normal Curve
If we calculate a statistic based on data from multiple independent samples, we get a "normal" distribution of different values, with most grouped in the middle around the mean (μ).
[Figure: normal curve showing the area under the curve by standard deviation (σ) on each side of the mean: 34.13% between μ and 1σ, 13.59% between 1σ and 2σ, 2.14% between 2σ and 3σ, and 0.13% beyond 3σ.]
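Those percentages can be reproduced from the standard normal cumulative distribution function. The sketch below assumes scipy is installed.

```python
# Reproducing the areas under the normal curve quoted in the figure above.
from scipy.stats import norm

print(f"mean to 1 sigma: {(norm.cdf(1) - norm.cdf(0)) * 100:.2f}%")  # ~34.13%
print(f"1 to 2 sigma:    {(norm.cdf(2) - norm.cdf(1)) * 100:.2f}%")  # ~13.59%
print(f"2 to 3 sigma:    {(norm.cdf(3) - norm.cdf(2)) * 100:.2f}%")  # ~2.14%
print(f"beyond 3 sigma:  {(1 - norm.cdf(3)) * 100:.2f}%")            # ~0.13%
```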
Probability & Hypothesis Testing Results of hypothesis testing have some probability of being incorrect Support for hypotheses is expressed in terms of a test statistic value (e.g., t-test, ANOVA, chi-square) and a p value P value indicates probability that alternative hypothesis and rejection of the null hypothesis were arrived at by chance P=0.03 means a 97% likelihood that the findings supporting the alternate hypothesis did not occur by chance alone Alpha indicates the p value predetermined (by the researcher) as the cut-point for rejecting or supporting the null hypothesis, often set at =< 0.05
Setting alpha (α) at 0.05 means the researcher accepts less than a 5% probability of rejecting the null hypothesis when findings actually occurred due to chance alone. In a "two-tailed" test of statistical significance, used when we do not know the direction of the relationship between the independent and dependent variables, the 5% is the sum of a 2.5% probability at each end (tail) of the normal curve.
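The two-tailed split can be seen by computing the critical values that leave 2.5% in each tail of the standard normal curve. A minimal sketch, assuming scipy is installed:

```python
# The two-tailed split: alpha = 0.05 leaves 2.5% in each tail.
from scipy.stats import norm

alpha = 0.05
lower = norm.ppf(alpha / 2)      # critical value cutting off the lower 2.5%
upper = norm.ppf(1 - alpha / 2)  # critical value cutting off the upper 2.5%
print(f"reject the null if z < {lower:.2f} or z > {upper:.2f}")  # about +/- 1.96
```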
Probability & Hypothesis Testing If p< the set alpha value, then the null hypothesis is rejected P value is influenced by sample size; the larger the sample the greater the likelihood of finding a significant difference, if there is one. It is important not to confuse the confidence with which the null hypothesis can be rejected with the effect size of the intervention. Statistical significance is not same as practical or clinical significance.
Errors in Hypothesis Testing
Type I error (false positive):
- Rejecting the null hypothesis when it is correct
- Accepting the alternative hypothesis when it is incorrect (i.e., there is no difference, but you find a difference)
Type II error (false negative):
- Accepting the null hypothesis when it is incorrect
- Rejecting the alternative hypothesis when it is correct (i.e., there is a difference, but you find no difference)
Type I and Type II Errors: Example
Testing the accuracy of a new screening test for HIV infection:

Test result                Groups actually the same   Groups actually different
Suggests a difference      Type I error               Correct decision
Suggests no difference     Correct decision           Type II error
Tradeoff Between Type I and Type II Errors
- Decreasing the probability of one increases the probability of the other.
- Lowering the significance level lowers the risk of Type I errors, but at the same time increases the chance of Type II errors.
- To reduce the risk of Type II errors, you could raise the significance level (e.g., α < .10) to make it easier to detect differences and reject the null. A simulation sketch follows below.
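The tradeoff can be shown by simulation. Below, every trial contains a real effect, so any failure to reject the null is a Type II error; tightening alpha raises that error rate. The effect size, sample size, and trial count are arbitrary choices for illustration.

```python
# Simulation: a stricter alpha (fewer Type I errors) raises the Type II error rate.
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=7)
trials = 2000
p_values = []
for _ in range(trials):
    control = rng.normal(0.0, 1.0, 20)
    treated = rng.normal(0.5, 1.0, 20)  # a real effect exists in every trial
    p_values.append(stats.ttest_ind(control, treated).pvalue)

p_values = np.array(p_values)
for alpha in (0.10, 0.05, 0.01):
    type_ii_rate = np.mean(p_values >= alpha)  # failing to reject a false null
    print(f"alpha = {alpha:.2f} -> Type II error rate ~ {type_ii_rate:.2f}")
```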
Comparison of Sample Means
[Figure: two overlapping sampling distributions (s1 and s2) with the critical value (CV) marked, showing β (the Type II error probability) and power (1 - β).]
Levels of Measurement
It is important to know the level of measurement of a variable in order to choose an appropriate inferential statistic to test a hypothesis.
- Categorical variables (nominal or ordinal; e.g., blood type, or pain rated mild/moderate/severe)
- Continuous variables (interval or ratio; e.g., blood pressure, weight)
A sketch of a test suited to categorical data follows below.
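For two categorical variables, one appropriate choice is a chi-square test of independence. The sketch below uses a made-up 2x2 contingency table (counts are invented for illustration) and assumes scipy is installed.

```python
# A chi-square test of independence on a made-up 2x2 contingency table.
from scipy.stats import chi2_contingency

#                 improved  not improved
table = [[30, 10],   # treatment group
         [18, 22]]   # control group

chi2, p, dof, expected = chi2_contingency(table)
print(f"chi-square = {chi2:.2f}, df = {dof}, p = {p:.4f}")
```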
How to Choose a Statistic to Test a Hypothesis?
A more detailed chart is inside the cover of your Polit &amp; Beck text.
Q: What if you want to compare means for 3 groups, e.g., 2 experimental groups and a control group?
A: ANOVA, among other options (see the sketch below).
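A minimal sketch of that three-group comparison using one-way ANOVA, with made-up outcome scores and scipy assumed installed:

```python
# One-way ANOVA comparing two experimental groups and a control group.
from scipy import stats

experimental_1 = [15, 17, 16, 18, 17]  # made-up outcome scores
experimental_2 = [14, 15, 16, 15, 14]
control        = [11, 12, 13, 11, 12]

f_stat, p_value = stats.f_oneway(experimental_1, experimental_2, control)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
# A significant F says at least one group mean differs; post hoc tests identify which.
```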