Technical Adequacy Session One Part Three.


Reliability. We all have friends; some are reliable and some are not. With your partner, discuss what a reliable friend is. What three qualities would you list?

Reliability. In layman's terms, reliability means being able to depend on a test to give consistent results. If you took the test again, would you get the same score? Many factors affect reliability.

Error in measurement. There are two types of error in measurement: systematic error (bias) and random error.

Bias. Generally, bias refers to raising a person's score because that person was advantaged in some way. The group that was not advantaged is affected negatively by the bias. For example, if boys score better than girls on multiple-choice questions because of the format, the boys were advantaged and the girls were disadvantaged.

Random error. Random error is very different. It is hard to predict whom it will affect and by how much. Reliable tests try to eliminate most types of error.

Reliability Coefficient. We can measure how reliable a test is with the reliability coefficient. A test free from error has a perfect coefficient of 1.0. A test filled with error has a coefficient of 0. Since every test has some error, we look for a reliability of around .85 or above.
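One common way to make the two endpoints concrete is the classical true-score model. The slides do not name a model, so treat the notation below as an assumption added for illustration: X is the observed score, T the true score, and E the random error.

```latex
X = T + E,
\qquad
r_{XX'} \;=\; \frac{\sigma_T^2}{\sigma_X^2} \;=\; 1 - \frac{\sigma_E^2}{\sigma_X^2}
```

With no error variance the coefficient is 1.0, and when all of the observed variance is error it is 0, which matches the two extremes described above.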

Types of reliability. Item reliability; stability; inter-rater reliability (also called interobserver agreement).

Item reliability. Item reliability affects how well a test predicts a student's understanding of a body of knowledge. Imagine a study trying to predict how the population of a country or state will vote in the next election. The prediction is only as good as the sample it selects; if it selects from only one area, the sample will not be representative of the population. The same concept applies to developing a test. Test developers cannot possibly include every item they would need to cover the whole domain, so the more representative the selected items are of the total knowledge, the more reliable the test.

Item reliability. Your goal is for the student's performance on the sample items to be the same as if he or she had taken all of the items (if that were possible). The goal of the test is to generalize from the student's performance to what he or she knows of the entire realm of knowledge in that area. When we overestimate the student's ability, our test is unreliable.

Item reliability. There are two main approaches to determining item reliability: alternate-form reliability and internal consistency.

Item reliability. Alternate-form reliability: two forms of a test are developed, each from the same knowledge base but each with different questions. You then test a large sample: half take one form and half take the other. They should have similar scores. Scores from the two forms are correlated, and the resulting correlation coefficient is the reliability estimate.
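The correlation step can be sketched as follows. This is a minimal illustration, not the presenter's procedure: it assumes every student has a score on both forms (a correlation needs paired scores), and the function name `pearson_r` and the score lists are made up for the example.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of paired scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical data: the same eight students on Form A and Form B.
form_a = [12, 15, 9, 18, 14, 11, 16, 13]
form_b = [13, 14, 10, 17, 15, 10, 17, 12]

alternate_form_r = pearson_r(form_a, form_b)
print(f"Alternate-form reliability estimate: {alternate_form_r:.2f}")
```

A value close to 1.0 suggests the two forms rank students in nearly the same order.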

Item reliability. Internal consistency: there are many ways to test internal consistency. One popular way is to develop a test that can be split into two parts with a similar level of difficulty. Administer the test and see how the students did. Say the test was split into a first half and a second half: grade half of the class on the first half and the other half on the second half, then compare the scores. You can also do this for specific items.
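A split-half check is often computed a little differently from the classroom procedure described here, so the sketch below swaps in the textbook version and says so plainly: every student answers every item, each student gets a score on each half, the two half scores are correlated, and the Spearman-Brown formula adjusts for the halves being shorter than the full test. The odd/even split, the function names, and the response matrix are assumptions for illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of paired scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def split_half_reliability(item_scores):
    """item_scores: one row per student, one 0/1 entry per item."""
    odd_half = [sum(row[0::2]) for row in item_scores]   # items 1, 3, 5, ...
    even_half = [sum(row[1::2]) for row in item_scores]  # items 2, 4, 6, ...
    r_half = pearson_r(odd_half, even_half)
    # Spearman-Brown correction: estimate reliability of the full-length test.
    return 2 * r_half / (1 + r_half)


# Hypothetical responses: six students, six items (1 = correct, 0 = incorrect).
responses = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 0, 1],
    [1, 1, 0, 1, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]

split_half = split_half_reliability(responses)
print(f"Split-half reliability (Spearman-Brown corrected): {split_half:.2f}")
```

Splitting by odd and even items rather than first and second half is a design choice here; it keeps fatigue or rising difficulty late in the test from affecting one half only.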

Stability. In many cases, we expect our tests to produce information that will be the same when the person is tested again later. A child who tests as colorblind should still test as colorblind later in life, since the condition is not curable. If not, the test was unreliable because it is unstable.

Stability. A test should produce similar results over time. If you give a set of students a test, wait a while, and then readminister the test, it should produce similar results. The more similar the results, the more stable and the more reliable the test.

Stability. Stability is not affected by interventions. If you test a child and the test shows he is weak in a certain area, then you provide an intervention and the child does better on the next test, that is not considered a weakness in stability.

Inter-rater reliability. Interobserver, or inter-rater, reliability is a simple concept and easy to understand. It is analogous to a piece of music, a book, or a movie: two people hear, read, or watch the same thing and come away with different opinions. Watch the next clip. What do you think?

Inter-rater reliability. Now watch the next clip and count how many people test the mattress. Do people have similar answers?

Inter-rater reliability. Inter-rater reliability needs to be developed in several places and can be measured in several ways. Different raters or observers need to be trained on what to watch for, and they need clear criteria for what counts as a positive instance of the behavior being observed. If you are looking for out-of-seat behavior, is it standing, squirming, leaning over, or being two feet from the desk?

Inter-rater reliability. Inter-rater reliability can be measured in several ways: by comparing two people's scores from the same observation, or by doing an item-by-item analysis and comparing the differences.
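An item-by-item comparison might look like the sketch below. It shows two common summaries: simple percent agreement, and Cohen's kappa, which discounts the agreement two raters would reach by chance. Kappa is not mentioned in the slides and is added here as one possible measure; the function names and the interval data are hypothetical.

```python
from collections import Counter


def percent_agreement(rater_1, rater_2):
    """Share of observations on which the two raters gave the same code."""
    matches = sum(a == b for a, b in zip(rater_1, rater_2))
    return matches / len(rater_1)


def cohens_kappa(rater_1, rater_2):
    """Agreement corrected for chance; undefined if chance agreement is 1."""
    n = len(rater_1)
    observed = percent_agreement(rater_1, rater_2)
    counts_1 = Counter(rater_1)
    counts_2 = Counter(rater_2)
    # Chance agreement: product of each rater's marginal rate, per category.
    expected = sum(counts_1[c] * counts_2[c] for c in counts_1) / n ** 2
    return (observed - expected) / (1 - expected)


# Hypothetical data: ten observation intervals, 1 = out of seat, 0 = in seat.
rater_a = [1, 1, 0, 0, 1, 0, 1, 0, 0, 1]
rater_b = [1, 1, 0, 0, 0, 0, 1, 0, 1, 1]

agreement = percent_agreement(rater_a, rater_b)
kappa = cohens_kappa(rater_a, rater_b)
print(f"Percent agreement: {agreement:.0%}, Cohen's kappa: {kappa:.2f}")
```

Percent agreement is easier to explain to observers in training, while kappa is the more conservative figure when one category is much more common than the other.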

Standard Error of Measurement

Standard Error of Measurement. Imagine you gave a test to a kindergarten student on his letter-sound recognition. You developed 100 tests of ten items each. After giving the child about ten of these tests, the scores would be about the same. On some of the tests he would know the sounds and on some he would not, but the average would be accurate. The SEM tries to predict what the error between tests would be if you gave him only one test. Remember, it could be a test he scored well on, or it could be a test he scored poorly on. It is a similar concept to the standard deviation, but related specifically to error.
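In practice the SEM is usually estimated from a test's standard deviation and its reliability coefficient rather than from repeated testing. The sketch below uses the standard formula SEM = SD × sqrt(1 − reliability); the formula is not given in the slides, and the function name and the numbers (SD of 15, reliability of .89) are assumptions for illustration.

```python
from math import sqrt


def standard_error_of_measurement(sd, reliability):
    """SEM from the test's standard deviation and reliability coefficient."""
    if not 0 <= reliability <= 1:
        raise ValueError("reliability must be between 0 and 1")
    return sd * sqrt(1 - reliability)


# Hypothetical test: standard deviation of 15 points, reliability of .89.
sem = standard_error_of_measurement(15, 0.89)
print(f"SEM: {sem:.2f} points")
```

A perfectly reliable test would have an SEM of 0, and a test with a reliability of 0 would have an SEM equal to its full standard deviation.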

Estimate of True Scores. This is more of a conceptual idea than a statistical unit. Imagine you take a fifty-question test and you do not know the answers to ten questions. You guess on them and, being a very lucky person, you get 8 right. Those eight answers are not really part of your true score. If you are unlucky, you get a lower score.

Confidence Intervals. Because true scores are difficult to obtain, the concept of confidence intervals was created. Combined with the SEM, a confidence interval gives a range of scores that is likely to contain the true score. The level of confidence tells us how certain we are that the score is within the range.

Confidence Intervals. If a child has a score of 90 ± 5 (a band built from the SEM), then we are saying the child's score is somewhere between 85 and 95. If we say that a child has a score of 90 ± 5 with a 95% confidence level, we are saying that there is only a 5% chance that the child's score falls outside the range of 85 to 95. The lower the confidence level, the smaller the range: at an 80% confidence level, for example, the child's score might be somewhere between 88 and 92.
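How the band widens and narrows with the confidence level can be sketched as follows. This assumes measurement error is roughly normal, so the band is the observed score plus or minus z × SEM. The function name `score_band` and the SEM of 2.5 points are hypothetical; that SEM was picked only because it makes the 95% band land close to the 85-to-95 range in the example.

```python
from statistics import NormalDist


def score_band(observed, sem, confidence):
    """Return (low, high) around an observed score for a confidence level."""
    # Two-sided z value, e.g. about 1.96 for a 95% confidence level.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * sem
    return observed - margin, observed + margin


# Hypothetical values: observed score of 90, SEM of 2.5 points.
for level in (0.80, 0.95):
    low, high = score_band(90, 2.5, level)
    print(f"{level:.0%} confidence: {low:.1f} to {high:.1f}")
```

Under these assumptions the 80% band comes out somewhat wider than 88 to 92, so the figures in the example are best read as rough illustrations of the trend rather than computed values.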

Validity. This refers to the degree to which evidence and theory support the interpretation of test scores for the proposed uses of tests. Often tests are interpreted for uses for which they were not designed. Therefore, validity is a fundamental consideration.

Validity. The fundamental question you need to ask is: does the testing process lead to the correct inferences about a specific person?

Validity. Consider some situations that threaten validity. You give an IQ test in English to a non-English-speaking person. You give a test that measures cultural items that a person was not exposed to. You use a test designed for national standards that does not align with local standards (for example, in social studies).

Validity. Content validity: is the content of the measure representative of the domain of content it is supposed to assess? Experts look at the content and compare it to what they feel it should contain.

Validity. Appropriateness of included items: should the questions be here? Do they represent what the test is trying to measure (this is different from content validity)? Are the questions from too high a grade level, such as middle school material on an elementary test? Is the presentation of the items appropriate? Are the questions worded properly?

Validity. Content not included: is there important content missing that should be there? How are the items measured? Are they multiple choice, or open-ended where you must show your work?

Validity. Criterion-related validity refers to a test's ability to describe a test taker's ability in two ways. Present: concurrent criterion-related validity. Future: predictive criterion-related validity.

Validity. Concurrent criterion-related validity: is the test or assessment a good indicator of what the students currently know, based on the criterion of the knowledge base? If a child takes an achievement test, is it a valid measure of how well he did in fourth grade?

Validity. Predictive criterion-related validity: does the test have the ability to predict what it says it will predict? Take a reading readiness test: if a student scores high, does he learn to read easily? If a child scores poorly, does he struggle to learn to read?

Validity. Construct validity refers to the extent to which a procedure or test measures a theoretical trait or characteristic, that is, whether a scale measures or correlates with the theorized psychological construct (such as intelligence) that it purports to measure.