Educational Assessment

Educational Assessment: Assessment Issues and Program Evaluation Procedures

Outcomes Assessment, a.k.a. How Do I Know If I'm Doing What I Think I'm Doing?
1st: Identify what you are trying to do. This may include general outcomes and specific outcomes. For example: increase the number of women entering the fields of math and engineering (general); improve high school girls' attitudes about math and engineering (specific).
2nd: Identify ways to accurately assess whether these outcomes are occurring.
3rd: Establish a procedure for program evaluation.

Identify What You Are Trying To Do
Some examples:
Change attitudes about math and engineering
Increase girls' sense of self-efficacy in math and engineering
Improve motivation to engage in math and engineering
Increase skills in math and engineering
Increase the number of girls from your high school who go on to major in math and engineering
Increase the number of women who graduate from college with math and engineering majors
Some of these are assessments of attitude, some are assessments of skills, and some are assessments of behavior. Because long-term outcome assessment is often difficult, we would like to be able to assess attitudes that should theoretically predict those long-term changes in behavior. It is especially good if we have some empirical knowledge about such a relationship: for example, we know that a sense of self-efficacy in reading is related to the development of future reading skills. We don't know how much empirical evidence we have for math and engineering, so long-term follow-up would still be really useful. For now, I'm going to talk in more detail about how to accurately assess attitudes and motivation, which is typically (and most easily) done with questionnaires.

Critical Issues for Assessment Tools
Reliability: the consistency of test scores; the extent to which performance is not affected by measurement error. (For reliability, use the scale example.)
Validity: the extent to which a test actually measures what it is supposed to measure.
Sometimes these are scales that have already been shown to be reliable and valid, and it is great to use those when you can. Sometimes you must make up your own scale, and then it will be important to evaluate whether it is reliable and valid. Also, a scale that is reliable and valid for one purpose may not be for another purpose, so it is good to always check in your own data.
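One way to see what "true score = obtained score plus or minus error" buys us is a tiny simulation (a sketch only; the sample size, means, and standard deviations below are invented for illustration): reliability is the share of observed-score variance that comes from true scores, so larger measurement error means lower reliability.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 500
    true_score = rng.normal(50, 10, n)   # each person's "true" attitude level
    error = rng.normal(0, 5, n)          # random measurement error
    observed = true_score + error        # classical test theory: X = T + E

    # Reliability = var(T) / var(X); it approaches 1 as error shrinks
    reliability = true_score.var() / observed.var()
    print(round(reliability, 2))         # roughly 10**2 / (10**2 + 5**2) = 0.80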

Types of Reliability
Test-Retest: correlation of two tests taken on separate occasions by the same individuals. Limits: practice effects, recall of former responses.
Alternate Form: correlation of scores obtained on two parallel forms. Limits: may have practice effects, and alternate forms are often not available.
The first two you probably won't use, but you should know about them.
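For the test-retest (and alternate-form) case, the reliability coefficient is simply the correlation between two sets of scores from the same people. A minimal sketch with invented scores:

    import numpy as np

    # Hypothetical attitude scores for the same eight girls on two occasions
    time1 = np.array([12, 15, 9, 20, 17, 11, 14, 18])
    time2 = np.array([13, 14, 10, 19, 18, 10, 15, 17])

    # Pearson correlation between the two administrations = test-retest reliability
    r = np.corrcoef(time1, time2)[0, 1]
    print(round(r, 2))   # values near 1.0 indicate stable scores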

Types of Reliability
Split-half: correlation between the two halves of a test. Limits: shortens the test, which affects reliability; difficult with tests that measure different things within the same test (heterogeneous tests).
Kuder-Richardson and Coefficient Alpha: inter-item consistency, based on the average correlation of each item with every other item. Limits: not useful for heterogeneous tests.
These are better because they require only one administration. If you plan to publish the results of your program evaluation, you should be sure to check your measure using one of these techniques: Kuder-Richardson for yes/no responses, coefficient alpha for continuous scale responses.
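Coefficient alpha can be computed directly from a respondents-by-items score matrix. The sketch below uses made-up 1-5 ratings and a hand-rolled helper rather than any particular survey package (KR-20 is the same idea applied to yes/no items scored 0/1):

    import numpy as np

    def cronbach_alpha(items):
        """items: respondents-by-items matrix of scores."""
        items = np.asarray(items, dtype=float)
        k = items.shape[1]
        item_vars = items.var(axis=0, ddof=1).sum()   # sum of the item variances
        total_var = items.sum(axis=1).var(ddof=1)     # variance of the total scores
        return (k / (k - 1)) * (1 - item_vars / total_var)

    # Five respondents answering four 1-5 attitude items (invented data)
    responses = [[4, 5, 4, 4],
                 [2, 2, 3, 2],
                 [5, 4, 5, 5],
                 [3, 3, 2, 3],
                 [4, 4, 4, 5]]
    print(round(cronbach_alpha(responses), 2))   # about 0.94 for this toy matrix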

Types of Validity
Content Validity: checking to make sure that you have picked questions that cover the areas you want to cover, thoroughly and well.
Difficulties: adequate sampling of the item universe; it is important to ensure that all major aspects are covered by the test items, and in the correct proportions.
Specific procedures: content validity is built into the test from the outset through the choice of appropriate items.

Types of Validity
Concurrent and Predictive Validity
Definition: the relationship between a test and some criterion; the practical validity of a test for a specific purpose.
Examples: Do high school girls who score high on this test go on to succeed in college as engineering majors? (predictive) Do successful women engineering majors score high on this test? (concurrent)
Difficulties: criterion contamination; trainers must not know examinees' test scores.
Specific procedures: essentially unlimited, depending on the purpose of the test.
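As a sketch of a criterion-validity check, the snippet below correlates invented test scores with a hypothetical yes/no criterion (whether each girl later majored in engineering) using a point-biserial correlation, which is just a Pearson correlation with a 0/1 variable; the data and variable names are illustrative only.

    import numpy as np
    from scipy import stats

    # Hypothetical high-school test scores and whether each girl later majored in engineering
    test_score = np.array([22, 31, 18, 27, 35, 25, 29, 16, 33, 24])
    majored    = np.array([ 0,  1,  0,  0,  1,  1,  1,  0,  1,  0])

    # Point-biserial correlation: a criterion (predictive) validity coefficient
    r, p = stats.pointbiserialr(majored, test_score)
    print(round(r, 2), round(p, 3))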

Types of Validity
Construct Validity
Definition: the extent to which the test may be said to measure a theoretical construct or trait. Any data throwing light on the nature of the trait, and on the conditions affecting its development and manifestations, represent appropriate evidence for this validation.
Example: I have designed a program to lower girls' math phobia. The girls who complete my program should have lower scores on the Math Phobia Measure compared to their scores before the program, and compared to the scores of girls who have not completed the program.

Optimizing Reliability and Validity
Here are some tips for making sure your test will be reliable and valid for your purpose (these are the circumstances that affect reliability and validity); the sketch after this list shows how the first tip is usually quantified:
The more questions the better (the number of test items)
Ask questions several times in slightly different ways (homogeneity)
Get as many people as you can in your program (N)
Get different kinds of people in your program (sample heterogeneity)
Aim for a linear relationship between the test and the criterion
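The first tip is usually quantified with the Spearman-Brown prophecy formula (not named on the slide), which predicts the reliability of a test lengthened by a factor k; a minimal sketch:

    def spearman_brown(reliability, k):
        """Predicted reliability when the test is lengthened k times."""
        return k * reliability / (1 + (k - 1) * reliability)

    # A 10-item scale with alpha = .70, doubled to 20 comparable items
    print(round(spearman_brown(0.70, 2), 2))   # about 0.82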

Selecting and Creating Measures
1. Clearly define the construct(s) that you want to measure.
2. Identify existing measures, particularly those with established reliability and validity.
3. Determine whether those measures will work for your purpose, and identify any areas where you may need to create a new measure or add new questions.
4. Create the additional questions/measures.
5. Identify criteria that your measure should correlate with or predict, and develop procedures for assessing those criteria.

Measuring Outcomes: Pre- and Post-Tests
This involves giving the measure before the intervention/training and again after the intervention, in order to measure change as a result of the intervention.
It is important to identify what you are trying to change with your intervention (the constructs) in order to use measures that will pick up that change.
Be sure to avoid criterion contamination.
Limitations: if your group is preselected for the program, the variability will be restricted.
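A common way to test for pre-to-post change in the same participants is a paired comparison; the sketch below uses invented attitude scores and a paired t-test, which is one reasonable choice rather than the only one:

    import numpy as np
    from scipy import stats

    # Hypothetical math-attitude scores before and after the program (same eight girls)
    pre  = np.array([14, 11, 16, 9, 13, 15, 10, 12])
    post = np.array([17, 13, 18, 12, 14, 19, 13, 15])

    # Paired t-test: did scores change reliably from pre to post?
    t, p = stats.ttest_rel(post, pre)
    print(round(post.mean() - pre.mean(), 1), round(t, 2), round(p, 3))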

Measuring Outcomes: Follow-Up Procedures
These may involve re-administering your pre/post measure after some interval following the end of the program, or assessing any other criterion that should theoretically be predicted by your intervention, such as:
choosing to take math/engineering courses
choosing to major in math/engineering
choosing a career in math/engineering

Measuring Outcomes: Control Groups
One critical problem faced by anyone who conducts an intervention is whether any observed changes are related to the intervention or to some other factor (e.g., time, preselection, etc.). The only way to be sure that your intervention is causing the desired changes is to use a control group. The control group must be the same as the treatment group in every way (usually ensured by random assignment to groups), except that the control group does not receive the intervention. Any differences between the groups can then be attributed to the intervention. Otherwise, how do you know whether the girls who chose to attend your program would not have gone on to major in math/engineering anyway?
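A minimal sketch of that logic, with invented numbers: randomly assign volunteers to conditions before the program runs, then compare post-test scores between the two groups.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)

    # Randomly assign 40 volunteers to treatment or control before the program runs
    ids = np.arange(40)
    rng.shuffle(ids)
    treatment_ids, control_ids = ids[:20], ids[20:]

    # After the program, compare post-test scores between the two groups
    # (the scores below are made-up stand-ins for real data)
    treatment_post = rng.normal(18, 3, size=20)
    control_post = rng.normal(15, 3, size=20)

    t, p = stats.ttest_ind(treatment_post, control_post)
    print(round(t, 2), round(p, 3))   # a small p suggests the difference is due to the program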

Measuring Outcomes: Alternatives to Randomly Assigned Control Groups
Matched controls
Comparison groups
Comparison across programs
Remember, you will need to use the same assessment and follow-up procedures for both groups.

Comparing Across Programs
In order to compare successfully across programs, you will also need to assess program characteristics and participant characteristics. So you will need to also ask yourselves: What are the important aspects of the programs that I should know about? What are the important characteristics of the girls that I should know about?
Probably the most likely procedure most of you will use is comparison across programs, and this is a big part of why we all came together for this workshop. So I want to talk a bit about how to do this.
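One practical way to organize this is to keep participant-level records and program-level records separately and join them when comparing outcomes. A sketch of that record-keeping; the column names and values are hypothetical:

    import pandas as pd

    # Hypothetical records: one row per girl, plus a table of program characteristics
    girls = pd.DataFrame({
        "program": ["A", "A", "B", "B", "B"],
        "grade":   [10, 11, 10, 12, 11],
        "pre":     [12, 15, 11, 14, 13],
        "post":    [16, 18, 13, 15, 14],
    })
    programs = pd.DataFrame({
        "program":       ["A", "B"],
        "length_weeks":  [6, 2],
        "has_mentoring": [True, False],
    })

    # Merge participant and program characteristics, then compare average pre-to-post change
    merged = girls.merge(programs, on="program")
    merged["change"] = merged["post"] - merged["pre"]
    print(merged.groupby("program")[["change"]].mean())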

An Ideal Outcome Assessment
All participants fill out initial questionnaires → participants are randomly assigned to conditions (the treatment group receives the intervention; the control group receives no intervention) → all participants fill out post-questionnaires → all participants are followed through college and into their first job.

A More Realistic Outcome Assessment?
Girls involved in each program fill out pre-tests and client characteristics → girls participate in the programs → girls fill out post-questionnaires → each program reports its data and program characteristics → programs conducting follow-ups report follow-up data.