Testing 101: Quantitative Approaches to Assessment. CTE – November 2, 2005. Noelle Griffin, PhD, LMU Office of Assessment and Data Analysis
Test Design: Generally applies to more quantified approaches to assessment; multiple-choice or short-answer questions; more objective (vs. subjective) approaches to scoring than qualitative/performance-based assessment
Approaches to Assessment Through Quantitative Testing: standardized/externally developed tests; locally designed tests
Standardized Tests. Examples: ETS content area tests, GREs, FE exam. Benefits: statistical properties established; less draw on faculty time; comparison data available. Drawbacks: cost ($$); comparability of content; timing
Locally Designed Tests. Benefits: content linked directly to LOs (learning outcomes); local control over scope/focus; adaptable to curricular changes. Drawbacks: lack of outside comparison data; no established reliability/validity
Steps in Test Design: identify "constructs"; develop items; pilot; scoring; tracking/benchmarking
Identifying Constructs: What are the general areas of knowledge or skill that the test will assess? These will form "scales," or groups of items. Example: learning outcome = students will be able to identify and define the primary theories in psychology; constructs = psychoanalytic theory, behavioral theory, cognitive theory
Role of Constructs: [diagram] a learning outcome maps onto one or more constructs, and each construct is measured by multiple items
Drafting Items: For each construct, what are the specific concepts/information points central to that construct? Draft items that address each of these concepts. All items addressing a specific construct = a scale
Item Drafting "Tips": avoid "dual-pronged" items (asking two questions at once); avoid confounding vocabulary or jargon with the concept you want to assess (e.g., "select the answer that best represents operationalization of the primacy effect"); "multiple response" items; including adequate distractors
Test Quality: pilot testing (trying out the test with a smaller group of students before full-scale implementation); issues of reliability and validity
Reliability: Does the test measure what it purports to measure consistently? Most applicable here is internal consistency, or how well the items "hold together": do the intended "scales" emerge, and are there "outlier" items? The assessment office is a resource for these analyses
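One common statistic for internal consistency is Cronbach's alpha; the slide does not name a specific statistic, so the sketch below is only one illustration. It assumes the item scores for a single scale are arranged as a students-by-items NumPy array with hypothetical data.

```python
import numpy as np

def cronbach_alpha(scores):
    """Internal-consistency estimate for one scale.

    `scores` is a students-by-items array (e.g., 0/1 for incorrect/correct)."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]                          # number of items in the scale
    item_vars = scores.var(axis=0, ddof=1)       # variance of each item across students
    total_var = scores.sum(axis=1).var(ddof=1)   # variance of students' total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Example: four students answering a three-item scale (hypothetical data)
scale_scores = [[1, 1, 1],
                [1, 0, 1],
                [0, 0, 0],
                [1, 1, 0]]
print(round(cronbach_alpha(scale_scores), 2))
```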
Validity: Is the test assessing what you say it is assessing? Conceptually: face and construct validity. Empirically (if possible): criterion validity. Also consider student intent: do students actually approach the test with serious effort? Ideas for making the test meaningful to students
Approaches to Scoring/Analysis: overall vs. scale scores; percentage correct; average score; criterion-based (% meeting a standard)
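As a rough illustration of these scoring options, the sketch below computes overall percentage correct, per-scale (construct-level) scores, and the percentage of students meeting a criterion. The data, the scale groupings, and the 70% standard are hypothetical, not taken from the presentation.

```python
import numpy as np

# Hypothetical data: rows = students, columns = items scored 0/1 (incorrect/correct).
scores = np.array([
    [1, 0, 1, 1, 0, 1],
    [1, 1, 1, 0, 1, 1],
    [0, 0, 1, 1, 0, 0],
    [1, 1, 1, 1, 1, 0],
])
# Illustrative mapping of items to constructs (scales).
scales = {"psychoanalytic": [0, 1], "behavioral": [2, 3], "cognitive": [4, 5]}

overall_pct = scores.mean(axis=1) * 100            # each student's percentage correct
print("Average overall score: %.1f%%" % overall_pct.mean())

for name, items in scales.items():                 # scale (construct-level) scores
    print("  %s: %.1f%% correct" % (name, scores[:, items].mean() * 100))

standard = 70.0                                     # criterion-based: % of students meeting the standard
print("Meeting standard: %.0f%%" % ((overall_pct >= standard).mean() * 100))
```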
Comparison Considerations: cross-sectional (comparing two groups at the same time); cohort/time series (comparing different cohorts across time), with attention to cohort effects; longitudinal (following the same group over time), where matching pre/post data for the same students is important
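To illustrate the point about matching pre/post data in a longitudinal comparison, here is a minimal sketch assuming pandas and hypothetical student IDs and column names: only students with both a pre and a post score contribute to the gain calculation, rather than comparing unmatched group averages.

```python
import pandas as pd

# Hypothetical pre/post records keyed by student ID; names and values are illustrative.
pre = pd.DataFrame({"student_id": [101, 102, 103, 104], "score": [62, 71, 55, 80]})
post = pd.DataFrame({"student_id": [102, 103, 104, 105], "score": [78, 60, 85, 90]})

# Inner merge keeps only students with both a pre and a post score,
# so gains are computed on matched pairs.
matched = pre.merge(post, on="student_id", suffixes=("_pre", "_post"))
matched["gain"] = matched["score_post"] - matched["score_pre"]
print(matched)
print("Mean matched gain:", matched["gain"].mean())
```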
Additional Resources: Miller, P.W., & Erickson, H.E. (2001). Test Development. Miller & Associates. Fishman, J.A., & Galguera, T. (2003). Introduction to Test Construction in the Social and Behavioral Sciences. Rowman & Littlefield Publishers.