
Module 5: Basic Concepts of Measurement

Module 5 focuses on concepts and terminology that will be helpful as you administer and interpret tests and other standardized measures. Understanding these concepts and terms will help teachers and childcare workers communicate information gathered from assessment clearly and accurately to parents and other early childhood education stakeholders. By the end of Module 5, you will understand basic test and measurement concepts as a means of interpreting test results. Reading for Module 5: Chapter 4, Using Basic Concepts of Measurement

Importance of Measurement Concepts Teachers administer, collect, organize, interpret, evaluate, and report assessment data. Familiarity with measurement terms and concepts helps teachers interpret and evaluate test results. Understanding measurement concepts also helps teachers make use of data collected through non-test assessment methods (e.g., inventories and authentic assessments).

Measurement Terminology – Raw Scores The raw score is the number of items a child answers correctly on a test. Raw scores can be obtained on both teacher-made and standardized tests. Raw scores provide limited information—they do not allow for comparison between children’s performances. Raw scores must be converted into a form that allows for comparison.

Measurement Terminology - Mean The arithmetic average, or mean, is one way to put raw scores in context so that children’s performances can be compared. The mean is equal to the sum of all scores divided by the number of scores. Example: Set of raw scores: 60, 62, 65, 67, 72 Mean = (60 + 62 + 65 + 67 + 72) / 5 = 326/5 = 65.2 You can now interpret a child’s performance relative to the average performance of the group.
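The mean calculation above can be checked with a short Python sketch. The function name mean is an illustrative choice; Python's standard statistics.mean gives the same result.

```python
def mean(scores):
    """Return the arithmetic mean: the sum of all scores divided by their number."""
    if not scores:
        raise ValueError("need at least one score")
    return sum(scores) / len(scores)

raw_scores = [60, 62, 65, 67, 72]
print(mean(raw_scores))  # 65.2 (the scores sum to 326)
```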

Measurement Terminology - Range It is helpful to know the range of scores on a test. The range gives you an idea of the spread of scores. The range is the difference between the highest and lowest score and is calculated by subtracting the lowest score from the highest score. Example: Set of raw scores: 60, 62, 65, 67, 72 High score – low score = Range: 72 – 60 = 12 The range also helps in interpreting additional information about the scores from tests given to children.
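The subtraction above can be written as a one-line Python function; score_range is a hypothetical name chosen so it does not shadow Python's built-in range.

```python
def score_range(scores):
    """Return the range of a set of scores: highest score minus lowest score."""
    if not scores:
        raise ValueError("need at least one score")
    return max(scores) - min(scores)

print(score_range([60, 62, 65, 67, 72]))  # 12
```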

Measurement Terminology - Range What is the mean of the set of raw scores below? What is the range of the scores? Raw Scores: 80, 72, 84, 95, 63, 62, 88, 74, 78, 64 Mean = ____ Range = ____ Would you prefer to teach in a situation where there was a wide range or a narrow range of scores?

Measurement Terminology - Range The mean for the raw scores on the previous slide is 76. The range is 33! This range suggests there is much diversity in performance among children in this class. Planning for groups with a wide range of performance requires a variety of materials and exercises to address the learning needs of all children. The standard deviation, however, provides more information about how test scores are spread out.

Measurement Terminology - Standard Deviation The standard deviation is a measure of how far scores are spread around the mean (the mean is often written as X̄). The normal curve represents a hypothetical distribution of scores if the test were taken by every child of the same age or grade in the population for which the test was designed.
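The slide does not give a formula, so the sketch below assumes the population formula: the square root of the average squared distance of each score from the mean, dividing by N. This matches Python's statistics.pstdev; statistics.stdev instead divides by N − 1 (the sample formula) and gives a slightly larger value.

```python
import math

def population_sd(scores):
    """Standard deviation of a set of scores, dividing by N (the number of scores).

    Treats the scores as the whole group of interest, e.g. one class.
    """
    n = len(scores)
    if n == 0:
        raise ValueError("need at least one score")
    m = sum(scores) / n
    # Average squared distance from the mean, then take the square root.
    variance = sum((x - m) ** 2 for x in scores) / n
    return math.sqrt(variance)

print(round(population_sd([60, 62, 65, 67, 72]), 2))  # 4.17
```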

Measurement Terminology - Standard Deviation As you can see from the normal curve, most students, approximately 68%, score closest to the mean: 34% score below the mean and 34% score above the mean. These scores fall within ±1 standard deviation of the mean. The fewest students, approximately 4%, score farthest from the mean: 2% above the mean and 2% below the mean. These scores fall between 2 and 3 standard deviations from the mean. The normal curve shows the expected spread of scores for the population the test was designed for, and the standard deviation gives a clearer picture of how an individual child’s score compares.
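The rounded percentages above can be recovered from the normal curve itself. This sketch uses Python's math.erf, relying on the standard fact that the share of a normal distribution within k standard deviations of the mean equals erf(k / √2); the function names are illustrative.

```python
import math

def share_within(k):
    """Proportion of a normal distribution lying within k SDs of the mean."""
    return math.erf(k / math.sqrt(2))

def share_between(a, b):
    """Proportion lying between a and b SDs above the mean (0 <= a < b)."""
    return (share_within(b) - share_within(a)) / 2

print(f"within +/-1 SD: {share_within(1):.1%}")   # about 68.3%
print(f"0 to +1 SD:     {share_between(0, 1):.1%}")  # about 34.1%
print(f"+2 to +3 SD:    {share_between(2, 3):.1%}")  # about 2.1%
```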

Standardized Tests Standardized tests are increasingly being administered in early childhood educational settings. Standardized tests: Are designed to minimize bias in the assessment of individual children; Allow for comparison among groups; Are based on knowledge and skills embedded in state and national standards.

Standardized Tests Standardized tests are administered, scored, and interpreted in a predetermined way. There are two types of standardized tests. Norm-referenced tests compare the performance of individual children to the performance of other children who take the same test. Criterion-referenced tests measure a child’s performance and progress on identified skills or behaviors rather than comparing the child to other children. Criteria are established, and teachers assess or observe the child’s performance of each specific criterion.

Standardized Tests Standardized test developers are guided by the Standards for Educational and Psychological Testing, developed jointly by the American Educational Research Association (AERA), the American Psychological Association (APA), and the National Council on Measurement in Education (NCME). Following these standards, test developers: Determine the rationale and purpose for the test; Explain what the test will measure; Determine who will be tested; Work toward absence of bias (ensure that the test is not offensive or unfair to certain groups of children); Explain how the test results will be used.

Standardized Tests When using standardized tests, you should make sure you: Match the test to the question(s) you want answered; Use the test for the purpose for which it was designed; Choose a test that is valid and reliable; Follow the directions for administering the test exactly as they are outlined in the test manual; Understand the reports and statistics generated by the test.

Normative Samples Understanding the normative sample that developers used to standardize a test helps you determine whether children like yours were represented in the standardization process. Developers use a norming sample in the standardization process. Samples are drawn from populations. A population is an entire group of individuals having at least one characteristic in common. Tests are standardized for a particular group of individuals (e.g., kindergarteners, preschoolers).

Normative Samples Since developers can’t give a test to all members of a population during the development phase, they administer the test to a representative sample, a smaller group drawn from the population.

Normative Samples Characteristics of the population must be represented in the sample in the same proportions as in the population. For example, a sample of kindergarteners might have the following representation: Hispanic American 14%, Asian American 5%, African American 12%, White 69%.
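Proportional representation is simple arithmetic: multiply each group's population share by the sample size. The sketch below uses the slide's percentages with a hypothetical sample of 1,000 kindergarteners; it uses plain rounding, so for sample sizes that do not divide evenly the group counts may not add up exactly to the total.

```python
def proportional_allocation(shares, sample_size):
    """Number of children to include from each group so the sample mirrors
    the population proportions (simple rounding, so totals can be off by a
    child or two for awkward sample sizes)."""
    return {group: round(share * sample_size) for group, share in shares.items()}

# Proportions taken from the slide's kindergarten example.
kindergarten_shares = {
    "Hispanic American": 0.14,
    "Asian American": 0.05,
    "African American": 0.12,
    "White": 0.69,
}

# A hypothetical normative sample of 1,000 children.
print(proportional_allocation(kindergarten_shares, 1000))
```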

Norming All standardized tests undergo norming, the process used to determine how most children from the population for which the test was designed will score. Norming involves three steps: (1) select a normative sample; (2) administer the test to all members of the sample; (3) determine how children from the sample score on the test.

Norms Norms are the scores obtained from testing the normative sample. Norms are influenced by: the representativeness of the normative sample; the number of individuals in the sample.

Test Scores Test scores provide a snapshot, or sample, of a child’s behavior on the day the test was given. Test scores allow teachers to evaluate the difference between the behavior reflected by the test score and the behavior expected for the child’s age and grade level. This information is very important as teachers plan appropriate learning activities for the child.

Derived Test Scores Derived test scores are performance scores obtained when raw test scores of the individual child are compared to scores generated from the norming sample. Derived test scores are located in test manuals for specific tests and may take the form of developmental scores, percentiles or standard scores.

Developmental Scores Age Equivalent Scores: Compare a child’s performance to that expected of the average child of that age. A child who scores 4-3 has the performance level of the average child aged four years and three months. Grade Equivalent Scores: Compare a child’s performance to that expected for his or her grade level. A first grader who scores 1.7 on a reading test performed at the level expected of the average child in the seventh month of first grade.

Percentile Ranks Percentile ranks are derived scores that show the percentage of children in the norming sample who scored at or below a given raw score. A percentile rank refers to a percentage of children, not to the percentage of items answered correctly. A child at the 75th percentile scored as well as or better than 75% of the children in the norming sample.
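A minimal sketch of the "at or below" definition used above, run against a small hypothetical norm group. Real test manuals report percentile ranks from published norm tables, and some publishers count only half of the tied scores, which gives slightly different values.

```python
def percentile_rank(score, norm_scores):
    """Percentage of norm-group scores that are at or below `score`.

    Uses the 'at or below' definition; some publishers count only half of
    the scores tied with `score`, which changes the result slightly.
    """
    if not norm_scores:
        raise ValueError("norm group is empty")
    at_or_below = sum(1 for s in norm_scores if s <= score)
    return 100 * at_or_below / len(norm_scores)

norm_group = [12, 15, 15, 18, 20, 21, 22, 25]  # hypothetical raw scores
print(percentile_rank(21, norm_group))  # 75.0 (6 of the 8 scores are at or below 21)
```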

Standard Scores Standard scores are derived scores that have been transformed so that the mean and standard deviation have predetermined values. The deviation IQ is an example of a standard score: the mean has been set at 100 and the standard deviation at 15.
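One way to see how a standard score is built: express the raw score as a z-score (its distance from the norm-group mean in standard deviation units), then rescale it to the predetermined mean and standard deviation. This linear sketch uses a hypothetical norm group with a mean raw score of 40 and a standard deviation of 8; actual tests use the publisher's norm tables, which are often based on normalized rather than linear conversions.

```python
def z_score(raw, norm_mean, norm_sd):
    """Distance of a raw score from the norm-group mean, in SD units."""
    return (raw - norm_mean) / norm_sd

def deviation_iq(z):
    """Rescale a z-score to the deviation-IQ metric (mean 100, SD 15)."""
    return 100 + 15 * z

# Hypothetical norm group: mean raw score 40, standard deviation 8.
z = z_score(52, norm_mean=40, norm_sd=8)  # 1.5 SDs above the mean
print(z, deviation_iq(z))                 # 1.5 122.5
```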

Standard Scores Normal Curve Equivalents (NCEs) place scores on a scale from 1 to 99 with equal intervals between points and a mean of 50. The standard deviation has been set at about 21 (21.06).
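A sketch of the usual conversion from a percentile rank to an NCE: find the z-score at that percentile of the normal curve, then rescale it to a mean of 50 and a standard deviation of 21.06. It uses statistics.NormalDist from Python's standard library (Python 3.8 or later); the function name is illustrative.

```python
from statistics import NormalDist

NCE_SD = 21.06  # makes NCEs 1, 50 and 99 line up with percentile ranks 1, 50 and 99

def nce_from_percentile(percentile):
    """Convert a percentile rank (strictly between 0 and 100) to a normal curve equivalent."""
    if not 0 < percentile < 100:
        raise ValueError("percentile must be between 0 and 100 (exclusive)")
    z = NormalDist().inv_cdf(percentile / 100)  # z-score at that percentile
    return 50 + NCE_SD * z

for p in (1, 50, 99):
    print(p, round(nce_from_percentile(p), 1))
```

Unlike percentile ranks, NCEs are equally spaced, which is why they can be averaged across children.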

Standard Scores Stanines (standard nines) Stanines divide a distribution into nine parts, with a mean of 5 and a standard deviation of 2. The middle, or fifth, stanine extends 0.25 standard deviations below and above the mean. The second, third, and fourth stanines are each 0.5 standard deviations wide and fall below the mean; the sixth, seventh, and eighth stanines are each 0.5 standard deviations wide and fall above the mean. The first stanine includes scores more than 1.75 standard deviations below the mean, and the ninth stanine includes scores more than 1.75 standard deviations above the mean. The first, second, and third stanines are generally considered below average; the fourth, fifth, and sixth stanines average; and the seventh, eighth, and ninth stanines above average. Stanines are the least precise of the standard scores discussed.
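The stanine boundaries described above can be expressed as z-score cut points at ±0.25, ±0.75, ±1.25, and ±1.75. This sketch looks up the stanine for a z-score with Python's bisect module and attaches the broad labels from the slide; treating a score that lands exactly on a cut point as belonging to the higher stanine is an assumption of the sketch.

```python
from bisect import bisect_right

# z-score cut points between stanines: each middle stanine is 0.5 SD wide,
# and stanine 5 runs from -0.25 to +0.25 SD.
STANINE_CUTS = [-1.75, -1.25, -0.75, -0.25, 0.25, 0.75, 1.25, 1.75]

def stanine(z):
    """Return the stanine (1-9) for a z-score; a score on a cut point goes up."""
    return bisect_right(STANINE_CUTS, z) + 1

def stanine_label(s):
    """Broad description: 1-3 below average, 4-6 average, 7-9 above average."""
    if s <= 3:
        return "below average"
    if s <= 6:
        return "average"
    return "above average"

for z in (-2.0, -0.5, 0.0, 1.0, 1.9):
    s = stanine(z)
    print(z, s, stanine_label(s))
```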

Standard Scores Stanines and deviation quotients are routinely used by early childhood educators. Parents may be able to understand stanines more easily than the other standard scores discussed, because performance is represented by a single digit.

Reliability and Validity Reliability: Reliability refers to the consistency, dependability, and stability of a test. Validity: Validity refers to the extent to which a test measures what it is supposed to measure.

Reliability Determining Reliability Test-Retest Reliability: The same test is given to the same group after a period of time has elapsed. Students should have similar results the second time they take the test. Alternate-Form Reliability: Two separate forms of a test addressing the same material are given to the same group. Students should have similar scores on both forms. Split-Half Reliability: The test is administered once to a group and then split in half; test takers’ scores on the first half of the test are correlated with their scores on the second half. This process serves as a measure of internal consistency. Interrater Reliability: The extent to which two different testers obtain the same results when using the same test. Test results should not be influenced by who administers the test.
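The split-half procedure can be sketched as follows, using hypothetical right/wrong (1/0) item responses for five children. Each child's score on the first half of the items is correlated with their score on the second half using a Pearson correlation. The sketch also applies the Spearman-Brown formula, a commonly used adjustment (not described on the slide) that estimates the reliability of the full-length test from the half-test correlation.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("scores must vary to compute a correlation")
    return sxy / math.sqrt(sxx * syy)

def split_half(responses):
    """Correlate first-half and second-half scores (assumes an even item count).

    Returns (half-test correlation, Spearman-Brown full-test estimate).
    """
    k = len(responses[0]) // 2
    first = [sum(items[:k]) for items in responses]
    second = [sum(items[k:]) for items in responses]
    r_half = pearson_r(first, second)
    r_full = 2 * r_half / (1 + r_half)  # Spearman-Brown adjustment
    return r_half, r_full

# Hypothetical responses: one row per child, 1 = correct, 0 = incorrect.
responses = [
    [1, 1, 1, 1, 1, 0],
    [1, 0, 1, 1, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 1, 0, 0, 1, 0],
]
print(split_half(responses))
```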

Reliability and Correlation Coefficients Reliability is reported in test manuals as correlation coefficients. A correlation describes the relationship between two variables. The correlation coefficient is the number that tells us the degree of correlation between the variables. Two variables may have a strong positive relationship, a strong negative relationship, or no relationship at all. Most correlation coefficients reported in test manuals are positive. The higher the correlation coefficient, the greater the reliability.
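A sketch of how a test-retest reliability coefficient could be computed: the Pearson correlation between children's scores on the first and second administrations. The scores are hypothetical. On Python 3.10 or later, statistics.correlation computes the same Pearson coefficient.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient, from -1.00 to +1.00."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # Sum of cross-products of deviations, scaled by each variable's spread.
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("scores must vary to compute a correlation")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical scores for five children on two administrations of the same test.
first_testing = [10, 12, 15, 18, 20]
second_testing = [11, 12, 14, 19, 21]
print(round(pearson_r(first_testing, second_testing), 2))
```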

Measures of Relationship Correlation Coefficient +1.00 = perfect positive correlation. This means that an increase in one variable is associated with an increase in the other variable (e.g., music participation and memory). −1.00 = perfect negative correlation. This means that an increase in one variable is associated with a decrease in the other variable (e.g., watching TV and creativity).

Measures of Relationship Correlation Coefficient 0.00 = no correlation, meaning that one variable is not associated with the other. Interpretation of Correlation Coefficients: negligible to low correlation; low correlation; moderate correlation; high correlation; .80 – 1.00 very high to perfect correlation.

Factors Affecting Reliability The standard error of measurement (SEM) estimates how much a child’s obtained score is likely to differ from his or her true score because of chance error. The larger the SEM, the less reliable the test. The longer the test, the more reliable it tends to be. Tests that span an age range should have sufficient test items for each age level; if not, the test may be less reliable for some age groups within that range. The shorter the time interval between two administrations of the same test, the higher the reliability coefficient tends to be. The larger the norming sample, the more reliable the test. The wider the range of scores, the better the test distinguishes among children, which makes for a more reliable test.
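The slide does not give a formula for the SEM. A common classical test theory formula is SEM = SD × √(1 − reliability coefficient), which the sketch below assumes. It also builds a band around an obtained score that is likely to contain the true score (about 68% for ±1 SEM, about 95% for ±1.96 SEM). The test with an SD of 15 and a reliability of .91 is hypothetical.

```python
import math

def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - reliability coefficient)."""
    if not 0 <= reliability <= 1:
        raise ValueError("reliability must be between 0 and 1")
    return sd * math.sqrt(1 - reliability)

def score_band(observed, sem, z=1.0):
    """Band likely to contain the true score: observed +/- z * SEM.

    z = 1.0 gives roughly a 68% band; z = 1.96 roughly a 95% band.
    """
    return observed - z * sem, observed + z * sem

# Hypothetical test: standard-score scale with SD 15 and reliability .91.
sem = standard_error_of_measurement(15, 0.91)
low, high = score_band(100, sem)
print(round(sem, 1), round(low, 1), round(high, 1))
```

Because the SEM shrinks as reliability rises, the band is narrower for more reliable tests, which is the sense in which a larger SEM means a less reliable test.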

Validity Types of Validity Face Validity: A test looks as though it tests what it is supposed to test. Content Validity: A test covers the subject matter or content it is supposed to test. Criterion-Related Validity: A relationship exists between scores on a test and scores on another criterion measure that is valid. Concurrent Validity: When two tests are taken at the same time, the validity of one test can be established by the other if the test scores relate to each other and the validity of at least one of the tests has already been established. Predictive Validity: The extent to which a test score can estimate performance on a future test or criterion. Construct Validity: The extent to which a test measures a theoretical construct, such as intelligence. Convergent Validity: Similar tests measuring similar constructs produce comparable results. Discriminant Validity: The extent to which a measure does not correlate with measures of other constructs from which it is supposed to differ. Social Validity: The usefulness of assessment information for teachers in educational settings.

Factors Affecting Validity A valid test must be a reliable test. Tests may not be valid for children who are distractible, who fail to understand test instructions, or who are uncooperative. Issues affecting the child being tested (e.g., anxiety, motivation, degree of bilingualism) may also affect the validity of the test.

Evaluating Tests Tests should be evaluated for technical adequacy (see Boxes 4.4, 4.5, and 4.6) and appropriateness. Sources to check: NAEYC position statement on Early Childhood Curriculum, Assessment, and Program Evaluation; NAEYC Code of Ethical Conduct and Statement of Commitment; ICLD Clinical Practices Guide; Mental Measurements Yearbook; Appendix C for information on specific standardized and diagnostic tests.

What Next?  Review Section V of the Early Childhood Assessment Study Guide. Can you explain each of the concepts and terms listed?  Reflect on the value of standardized tests. What is the primary value of standardized tests in a comprehensive assessment system?  Connect with a parent of an early learner. What has been their experience with standardized tests? Did they find the test information useful?  Connect with an early childhood educator. How have they used standardized test information for the children they teach?