Analyzing & Interpreting Data Assessment Institute Summer 2005.


Categorical vs. Continuous Variables ► Categorical Variables  Examples  Student’s major, enrollment status, gender, ethnicity; also whether or not the student passed the cutoff on a test ► Continuous Variables  Examples  GPA, test scores, number of credit hours. ► Why make this distinction?  Whether a variable is categorical or continuous affects whether a particular statistic can be used ► Doesn’t make sense to calculate the average ethnicity of students!

Averages ► Typical value of a variable ► In assessment we commonly compare averages of:  Different groups ► Each group consists of different people ► Avg. score on a test for students in different classes  Different occasions ► Same people tested on each occasion ► Avg. score on a test for students who took the test as freshmen and then again when they were seniors

Before calculating an average… ► Check to make sure that the variable:  Is continuous  Has values in your data set that are within the possible limits ► Check minimum and maximum values  Does not have a distribution that is overly skewed ► If so, consider using median  Does not have any values that would be considered outliers
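The checks on this slide can be sketched in a few lines of Python. This is only an illustration: the function name `check_before_averaging`, the sample scores, and the 1.5 × IQR outlier rule are assumptions made for the example, not something the slides specify.

```python
import statistics


def check_before_averaging(values, low, high):
    """Screen a continuous variable before reporting its mean.

    low/high are the possible limits of the variable (e.g. 0 and 10
    for a 10-item test). The 1.5 * IQR outlier rule is an assumption.
    """
    ordered = sorted(values)
    mean = statistics.fmean(ordered)
    sd = statistics.pstdev(ordered)
    # Moment-based skewness: 0 for symmetric data, > 0 for a long right tail.
    skew = (
        sum((v - mean) ** 3 for v in ordered) / (len(ordered) * sd ** 3)
        if sd else 0.0
    )
    q1, _, q3 = statistics.quantiles(ordered, n=4)
    iqr = q3 - q1
    return {
        "min": ordered[0],
        "max": ordered[-1],
        "out_of_range": [v for v in ordered if v < low or v > high],
        "skewness": skew,
        "outliers": [v for v in ordered
                     if v < q1 - 1.5 * iqr or v > q3 + 1.5 * iqr],
        "mean": mean,
        "median": statistics.median(ordered),
    }


# Hypothetical scores on a 10-item test; 42 is a data-entry error.
scores = [6, 7, 7, 8, 5, 6, 7, 9, 6, 42]
report = check_before_averaging(scores, low=0, high=10)
print(report["out_of_range"], report["mean"], report["median"])
```

In this made-up data set the single impossible value pulls the mean well above the median, which is the situation the slide warns about.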

Histogram

Correlations (r) ► Captures linear relationship between two continuous variables (X and Y) ► Ranges from -1 to 1 with values closer to |1| indicating a stronger relationship than values closer to 0 (no relationship) ► Positive values:  High X associated with high Y; low X associated with low Y ► Negative values:  High X associated with low Y; low X associated with high Y
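As a minimal sketch of what r measures, the Pearson correlation can be computed directly from its definition. The helper name `pearson_r` and the data are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable has no variability")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: a perfectly positive and a perfectly negative relationship.
x = [1, 2, 3, 4, 5]
r_positive = pearson_r(x, [2, 4, 6, 8, 10])   # high X goes with high Y
r_negative = pearson_r(x, [10, 8, 6, 4, 2])   # high X goes with low Y
print(r_positive, r_negative)
```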

Scatterplot: Does relationship appear linear? Is there a problem with restriction of range? Do there appear to be outliers? In this example, dropping cases that appeared to be outliers did not change the relationship between the two administrations (r = .30), nor their averages.

Standards ► May want to use standard setting procedures to establish “cut-offs” for proficiency on the test ► Could be that students are gaining knowledge/skills over time, but are they gaining enough? ► Another common statistic calculated in assessment is the % of students meeting or exceeding a standard

A. Are the 29 senior music majors in Spring 2005 scoring higher on the Vocal Techniques 10-item test than last year’s 20 senior music majors? ► Compare averages of different groups Yes, this year’s seniors scored higher (M = 6.72) than last year’s (M = 6.65).

B. Are senior kinesiology majors in different concentrations (Sports Management vs. Therapeutic Recreation) scoring differently on a test used to assess their “core” kinesiology knowledge? ► Compare averages of different groups

C. On the Information Seeking Skills Test (ISST), what percent of incoming freshmen in Fall 2004 met or exceeded the score necessary to be considered as having “proficient” information literacy skills? Of the 2862 students attempting the ISST, 2751 (96%) met or exceeded the “proficient” standard. ► Percent of students meeting or exceeding a standard
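The percentage on this slide is simple arithmetic, sketched below with the counts reported above. The function name `percent_meeting_standard` is hypothetical.

```python
def percent_meeting_standard(n_meeting, n_attempting):
    """Percent of test takers who met or exceeded the standard."""
    return 100.0 * n_meeting / n_attempting


# Counts from the ISST example: 2751 of 2862 students met the standard.
pct = percent_meeting_standard(2751, 2862)
print(f"{pct:.0f}%")  # prints "96%"
```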

D. Are the well-being levels (as measured using six subscales - e.g., self-acceptance, autonomy, etc.) of incoming JMU freshmen different than the well-being levels of adults? ► Compare averages of different groups (JMU students vs. adults) ► More than one variable (six different subscales)

Similarities: JMU incoming freshmen seem to be similar to the adult sample (N = 1100) in Positive Relations with Others and Personal Growth. Differences: JMU incoming freshmen have significantly lower Autonomy and Environmental Mastery well-being compared to the adult sample and significantly higher Self-Acceptance and Purpose in Life. While the differences for Self-Acceptance and Purpose in Life are considered small in practical terms (d = .14 and d = .25), the differences for Autonomy (d = .50) and Environmental Mastery (d = .35) are considered medium and small to medium, respectively.

► Comparing Means Across Different Occasions for Different Groups E. Are students scoring higher on the Health and Wellness Questionnaire as sophomores compared to when they were freshmen? Does the difference depend on whether or not they have completed their wellness course requirement?

“Non-Completers” N = 21 “Completers” N = 283

F. Are the writing portfolios collected in the fall semester yielding higher ratings than writing portfolios collected in the spring semester? Are the differences between the semesters the same across three academic years? ► Compare averages of different groups ► Six different groups (fall and spring for each academic year)

In the and academic years, fall portfolios were rated slightly higher than spring portfolios. In the most current academic year, the fall and spring portfolio averages were about the same. There doesn’t seem to be overwhelming evidence that the difference between fall and spring portfolios is of importance.

G. Are students who obtained transfer or AP credit for their general education sociocultural domain course scoring differently on the 27-item Sociocultural Domain Assessment (SDA) than students who completed their courses at JMU? JMU students: N = 369, M = 18.63, SD = 3.83. AP/transfer students: N = 29, M = 18.55, SD = 3.68. The difference was neither statistically significant, t(335) = .11, p = .92, nor practically significant (d = .02). ► Compare averages of different groups
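The effect size and t statistic on this slide can be approximately reproduced from the summary statistics alone. The sketch below assumes the standard pooled-variance formulas; the slides do not say how the original values were computed, and the degrees of freedom reported on the slide are not reproduced here. All function names are hypothetical.

```python
import math


def pooled_sd(n1, sd1, n2, sd2):
    """Pooled standard deviation of two independent groups."""
    return math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2)
                     / (n1 + n2 - 2))


def cohens_d(m1, m2, sp):
    """Standardized mean difference (Cohen's d)."""
    return (m1 - m2) / sp


def independent_t(m1, m2, sp, n1, n2):
    """Independent-samples t statistic, assuming equal variances."""
    return (m1 - m2) / (sp * math.sqrt(1 / n1 + 1 / n2))


# Summary statistics from the slide: JMU students vs. AP/transfer students.
sp = pooled_sd(369, 3.83, 29, 3.68)
d = cohens_d(18.63, 18.55, sp)
t = independent_t(18.63, 18.55, sp, 369, 29)
print(f"d = {d:.2f}, t = {t:.2f}")  # prints "d = 0.02, t = 0.11"
```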

G. What is the relationship between a student’s general education sociocultural domain course grade and their score on the 27-item Sociocultural Domain Assessment (SDA)? ► Relationship between two variables, finally! r =.31 r =.23

Inferential Statistics ► “How likely is it to have found results such as mine in a population where the null hypothesis is true?” ► Comparing Averages of Different Groups  Independent Samples T-test ► Null  Groups do not differ in population means ► Comparing Averages Across Different Occasions  Paired Samples T-test ► Null  Occasions do not differ in population means ► Correlation ► Null  No relationship between variables in the population Typically, want to reject the null: p-value <.05
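For the "different occasions" case, a paired-samples t statistic is based on each person's difference score. The sketch below uses made-up scores and a hypothetical helper `paired_t`; it returns only t and its degrees of freedom, so the p-value would still need to come from a t table or a statistics package such as SPSS.

```python
import math
import statistics


def paired_t(first, second):
    """Paired-samples t statistic and degrees of freedom.

    first/second hold the same people's scores on two occasions,
    in the same order.
    """
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need the same people (at least 2) on both occasions")
    diffs = [b - a for a, b in zip(first, second)]
    n = len(diffs)
    mean_diff = statistics.fmean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD of the difference scores
    return mean_diff / (sd_diff / math.sqrt(n)), n - 1


# Hypothetical scores for five students tested as freshmen and as sophomores.
freshman = [10, 12, 9, 14, 11]
sophomore = [12, 13, 11, 15, 14]
t_stat, df = paired_t(freshman, sophomore)
print(f"t({df}) = {t_stat:.2f}")
```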

Effect Sizes and Confidence Intervals ► Statistical significance is a function of both the magnitude of the effect (e.g., difference between means) and sample size ► Supplement with confidence intervals and effect sizes  SPSS provides you with confidence intervals  Can use Wilson’s Effect Size Calculator to obtain effect sizes

Wellness Domain Example Goals & Objectives Students take one of two courses to fulfill this requirement, either GHTH 100 or GKIN 100.

Knowledge of Health and Wellness (KWH) Test Specification Table

Data Management Plan Wellness_Data.sav (N = 105) Missing data indicated for all variables by "."

Item Analysis Item 1

Item Difficulty ► The proportion of people who answered the item correctly (p) Used with dichotomously scored items  Correct Answer - score=1  Incorrect Answer - score=0 ► Item difficulty a.k.a. p-value ► Dichotomous items  Mean=p  Variance=pq, where q = 1-p
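The relationships Mean = p and Variance = pq can be checked with a short sketch. The item scores are made up and `item_difficulty` is a hypothetical helper.

```python
import statistics


def item_difficulty(item_scores):
    """Return (p, pq) for a dichotomously scored item (1 = correct, 0 = incorrect)."""
    p = sum(item_scores) / len(item_scores)
    return p, p * (1 - p)


# Hypothetical responses of ten students to one item.
item = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
p, variance = item_difficulty(item)
print(p, variance)

# pq equals the population variance of the 0/1 scores.
assert abs(variance - statistics.pvariance(item)) < 1e-12
```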

SPSS output for the 1st 6 items of the 35-item GKIN100 Test3, Spring 2005 [output table: Mean, Std Dev, and Cases for Items 1 to 6]. 58% of the sample obtained the correct response to Item 6. The difficulty or p-value of Item 6 is .58. Mean is item difficulty (p). Std Dev is a measure of the variability in the item scores. Cases is the sample size on which the analysis is based.

Easiest & Hardest Items ► 25. Causes of mortality today are: A. the same as in the early 20th century. B. mostly related to lifestyle factors. C. mostly due to fewer vaccinations. D. a result of contaminated water. ► 34. Which of the following is a healthy lifestyle that influences wellness? A. brushing your teeth B. physical fitness C. access to health care D. obesogenic environment p = .99 EASIEST p = .14 HARDEST

Item Difficulty Guidelines ► High p-values, item is easy; low p-values, item is hard ► If p-value=1.0 (or 0), everyone answering question correctly (or incorrectly) and there will be no variability in item scores ► If p-value too low, item is too difficult, need revision or perhaps test is too long ► Good to have a mixture of difficulty in items on test ► Once know difficulty of items, usually sort them from easiest to hardest on test

Item Discrimination ► Correlation between item score and total score on test ► Since dealing with dichotomous items, this correlation is usually either a biserial or point- biserial correlation ► Can range in value from -1 to 1 ► Positive values closer to 1 are desirable

Item Discrimination Guidelines ► Item discrimination: can the item separate the men from the boys (women from the girls)  Can the item differentiate between low or high scorers? ► Want high item discrimination! ► Consider dropping or revising items with discriminations lower than.30 ► Can be negative, if so – check scoring key and if the key is correct, may want to drop or revise item ► a.k.a. r pbis or Corrected Item-Total Correlation

Scatterplot of relationship between item 2 score (0 or 1) and total score: r pbis = .52. If I know your item score, I have a pretty good idea as to what your ability level or total score is. Scatterplot of relationship between item 17 score (0 or 1) and total score: r pbis = .18. If I know your item score, I DO NOT have a pretty good idea as to what your ability level or total score is.

SPSS output for the 1st 6 items of the 35-item GKIN100 Test3, Spring 2005. Corrected Item-Total Correlation is Item Discrimination (r pbis). Why is it called corrected item-total correlation? The "corrected" implies that the total is NOT the sum of all item scores, but the sum of item scores WITHOUT including the item in question.
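The difference between the uncorrected and corrected item-total correlation can be sketched as follows. The response matrix is made up (rows are students, columns are items scored 0 or 1), and the helper names are hypothetical. With dichotomous item scores, the Pearson correlation computed here is the point-biserial correlation.

```python
import math


def pearson_r(x, y):
    """Pearson correlation; with a 0/1 variable this is the point-biserial."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined without variability")
    return sxy / math.sqrt(sxx * syy)


def item_total_correlation(responses, item, corrected=True):
    """Correlate one item with the total score.

    corrected=True leaves the item itself out of the total.
    """
    item_scores = [row[item] for row in responses]
    totals = [sum(row) - (row[item] if corrected else 0) for row in responses]
    return pearson_r(item_scores, totals)


# Hypothetical data: 6 students, 4 items.
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
r_corrected = item_total_correlation(responses, item=2, corrected=True)
r_uncorrected = item_total_correlation(responses, item=2, corrected=False)
print(f"corrected = {r_corrected:.2f}, uncorrected = {r_uncorrected:.2f}")
```

The uncorrected value is the larger of the two because the item is partly being correlated with itself.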

A = 1 B = 2 C = 3 D = 4 9 = Missing Percentage of sample choosing each alternative. Average total test score for students who chose each alternative. Notice how the highest average total test score (M = 27.65) is associated with the correct alternative (B). All other means are quite a bit lower. This indicates that the item is functioning well and will discriminate. This information is for item 2, where the item difficulty and discrimination were: p =.95, r pbis =.52

A = 1 B = 2 C = 3 D = 4 9 = Missing Percentage of sample choosing each alternative. Average total test score for students who chose each alternative. Notice how the highest average total test score (M = 27.91) is associated with the correct alternative (C). Unlike item 2, with this item all other means are fairly close to that value. This indicates that the item does not discriminate as well as item 2. This information is for item 17, where the item difficulty and discrimination were: p = .697, r pbis = .18
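A distractor analysis like the one described on these two slides can be sketched as a grouping of total scores by chosen alternative. The data and the function name `distractor_analysis` are made up for illustration; they are not the SPSS output from the slides.

```python
from collections import defaultdict


def distractor_analysis(choices, totals):
    """Percent choosing each alternative and their mean total test score."""
    groups = defaultdict(list)
    for choice, total in zip(choices, totals):
        groups[choice].append(total)
    n = len(choices)
    return {
        alt: {"percent": 100.0 * len(scores) / n,
              "mean_total": sum(scores) / len(scores)}
        for alt, scores in sorted(groups.items())
    }


# Hypothetical item with correct alternative "B": eight students' choices
# and their total test scores.
choices = ["B", "B", "A", "B", "C", "B", "D", "B"]
totals = [30, 28, 20, 27, 22, 29, 18, 26]
table = distractor_analysis(choices, totals)
for alt, row in table.items():
    print(alt, f"{row['percent']:.1f}%", f"M = {row['mean_total']:.2f}")
```

An item is behaving as intended when the correct alternative has the highest mean total score and the distractors have clearly lower means.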

► Did this mainly for those items that were difficult (p < .50) or had low discrimination (r pbis < .20) ► Took information from SPSS distractor analysis output and put it in the following graph. 4. The DSHEA of 1994 has: A. labeled certain drugs illegal based on their active ingredient. B. caused health food companies to lose significant business. C. made it easier for fraudulent products to stay on the market. D. caused an increase in the cost of many dietary supplements.

Hard item - but pattern of means indicates it is not problematic. 31. _____ aging relates to lifestyle. A. Time-dependent B. Acquired C. Physical D. Mental

This item may be problematic - students choosing "C" scoring almost as high on the test overall as those choosing "B". 10. Chris MUST get a beer during the commercials each time he watches the NFL. Which stage of addiction does this demonstrate? a) Exposure b) Compulsion c) Loss of control d) This is not an example of addiction.

Other Information from SPSS ► Descriptive statistics for total score [output table: Mean, Variance, Std Dev, and N of Variables for SCALE]. Mean is the average total score. Std Dev is the average # of points by which total scores are varying from the mean. N of Variables is the # of items on the test. ► A measure of the internal consistency reliability for your test called coefficient alpha. Alpha = .7779. Ranges from 0 to 1, with higher values indicating higher reliability. Want it to be > .60

Test Score Reliability ► Reliability defined: extent or degree to which a scale/test consistently measures a person ► Need a test/scale to be reliable in order to trust the test scores! If I administered a test to you today, wiped out your memory, administered it again to you tomorrow – you should receive the same score on both administrations! ► How much would you trust a bathroom scale if you consecutively weighed yourself 4 times and obtained weights of 145, 149, 142, 150?

Internal Consistency Reliability ► Internal consistency reliability: extent to which items on a test are highly intercorrelated ► SPSS reports Cronbach’s coefficient alpha ► Alpha may be low if:  Test is short  Items are measuring very different things (several different content areas or dimensions)  Low variability in your total scores or small range of ability in the sample you are testing  Test only contains either very easy items or very hard items
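Coefficient alpha can be sketched from the usual formula, alpha = k/(k-1) × (1 - sum of item variances / variance of total scores), where k is the number of items. The response matrix below is made up and the function name `cronbach_alpha` is hypothetical; SPSS computes this for you.

```python
import statistics


def cronbach_alpha(responses):
    """Coefficient alpha for a matrix of item scores (rows = people)."""
    k = len(responses[0])  # number of items
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_variances = [
        statistics.variance([row[i] for row in responses]) for i in range(k)
    ]
    total_variance = statistics.variance([sum(row) for row in responses])
    if total_variance == 0:
        raise ValueError("alpha is undefined when total scores do not vary")
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical data: 6 students, 4 dichotomously scored items.
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
alpha = cronbach_alpha(responses)
print(f"alpha = {alpha:.2f}")
```

The formula shows why the slide's conditions lower alpha: a short test (small k) or low variability in total scores shrinks the second factor.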