The KOPPITZ-2 A revision of Dr. Elizabeth Koppitz’


The KOPPITZ-2: A revision of Dr. Elizabeth Koppitz’ Bender Developmental Scoring System for Young Children. Now for ages 5 through 89 years.

What is the Koppitz-2? The Koppitz-2 is an extensive revision, redevelopment, and extension (up and down) of the original Koppitz (1963, 1975) Bender-Gestalt Test for Young Children, one of the most popular individually administered tests for children of the last century.

How is Koppitz-2 Different from the Original Koppitz?
• Original, age range 5-10 years --New = age range 5-89 years
• Original, 30 items --New = 34 items for ages 5-7 and 45 items for ages 8-89
• Original, local sample of 1,100, all from NY --New = national sample of 3,535

Key Differences in the Revision vs Original Koppitz, cont.
• Original, only offered perceptual ages --New = standard scores, percentile ranks, and age equivalents, with supplementary conversion to T-scores, z-scores, NCEs, and stanines provided
• Original, 9 cards used at all ages --New = 13 cards for ages 5-7 and 12 cards for ages 8-89 (16 cards overall; Designs 5-13 used at all ages)
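The supplementary conversions listed above all follow from the z-score. Here is a minimal sketch, assuming the standard score is on a mean-100, SD-15 metric (an assumption; the slides do not state the metric) and using the conventional definitions T = 50 + 10z, NCE = 50 + 21.06z, and stanines as half-SD-wide bands centered on 5. The function name is hypothetical, and the manual's own tables remain the authority.

```python
import math


def convert_standard_score(ss, mean=100.0, sd=15.0):
    """Convert a standard score to z, T, NCE, and stanine.

    ASSUMPTION: the mean=100 / sd=15 metric is not stated in the slides.
    """
    z = (ss - mean) / sd                 # distance from the mean in SD units
    t = 50.0 + 10.0 * z                  # T-score: mean 50, SD 10
    nce = 50.0 + 21.06 * z               # normal curve equivalent: mean 50, SD 21.06
    # Stanine: half-SD bands centered on 5, clamped to the 1..9 range.
    stanine = int(min(9, max(1, math.floor(2.0 * z + 5.5))))
    return {"z": z, "T": t, "NCE": nce, "stanine": stanine}


if __name__ == "__main__":
    for score in (70, 85, 100, 115, 130):
        print(score, convert_standard_score(score))
```

These formulas only re-express a score on a different scale; they add no information beyond the standard score itself.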

Key Differences in the Revision vs Original Koppitz, cont.
• Original, scored via error count --New = scored in a positive direction via an absence of errors
• Original, items selected on group mean differences only --New = classical true score theory applied to item selection

And, How is it the same?
• Retains all 9 original Bender cards at all age levels.
• Retains Koppitz’ emphasis on the Bender as assessing visual-motor integration (VMI) from a developmental perspective.
• Retains Koppitz’ conceptual approach to interpretation of Bender-Gestalt (BG) performance.
• Retains the unstructured aspects of Bender administration.

And, How is it the same, cont.?
• Provides data on time to completion.
• Retains Koppitz’ emotional indicators (EI) but adds more scoring guides and provides an EI record form for ease of use.

Original scoring system
Koppitz’ (1963) original developmental scoring system was derived from a list of distortions Koppitz observed in the drawings of young children using the original nine Bender (1938) cards. Koppitz then chose items based on their ability to differentiate among 77 children from grades 1 through 4 on the basis of grade placement. These items were then used to develop normative data from the protocols of 1,100 children ages 5 through 10 years.

How were new items derived? A group of experienced psychologists was gathered to write developmentally appropriate items for the new designs in the Bender-Gestalt-II, which has 16 designs (versus 9 in the original). Additional scoring elements were also written for the original Bender designs, since it was not known how the original Koppitz items would fare under modern techniques of item selection. Once these items were agreed upon (there were more than 100 for the initial analyses), trained, supervised staff scored all protocols according to the newly devised Koppitz Developmental Scoring System.

How were new items derived, cont.? Classical test theory guided the item analyses, which were performed on all agreed-upon scoring elements. Corrected (partial) point-biserial correlations between each item and the total score were calculated at every age interval. Using the item means and discrimination indexes (reviewed separately by age level), 34 items were retained for 5 through 7 year olds and 45 items for the 8 year and older groups.
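The two item statistics named above can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: items are taken to be scored 0/1, "corrected" is taken to mean that each item is correlated with the total of the remaining items (so the item does not inflate its own correlation), and the function names and data layout are hypothetical. No retention thresholds are shown because the slides do not give any.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation; with a 0/1 variable this is the point-biserial."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # undefined when either variable has no variance
    return sxy / sqrt(sxx * syy)


def item_analysis(responses):
    """Return (item mean, corrected item-total correlation) for each item.

    responses: one row per examinee, each row a list of 0/1 item scores
    (hypothetical layout). The item mean is the proportion credited.
    """
    n_items = len(responses[0])
    stats = []
    for j in range(n_items):
        item = [row[j] for row in responses]
        rest = [sum(row) - row[j] for row in responses]  # total minus this item
        stats.append((sum(item) / len(item), pearson(item, rest)))
    return stats


if __name__ == "__main__":
    # Made-up scores for four examinees on three items, for illustration only.
    demo = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
    for index, (mean, r) in enumerate(item_analysis(demo), start=1):
        print(f"item {index}: mean={mean:.2f}, corrected r={r:.3f}")
```

In practice the same computation would be repeated within each age interval, as the slide describes.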

Koppitz-2 Materials
• 16 design cards of the Bender-Gestalt-II
• Two record forms: Ages 5-7 years and Ages 8-89 years
• Supplemental EI Record Form
• Scoring template
• Examiner’s Manual

The Koppitz-2 Standardization Sample 3,535 individuals drawn during 2001 and 2002 to represent the US population at large according to the 2000 US Census statistics, stratified on the basis of age, sex, race, ethnicity, geographic region, and SES level (as estimated by educational attainment—of parents for children and the individual for adults).

How successful was the actual sampling? Very! On nearly all variables, the actual sample was within 1 percentage point of the population values. The largest discrepancy occurred on SES where the sample was off by 2.1%, under-sampling those with more than a HS education. In calculation of the norms tables, sample weights were calculated to perfectly mimic the population data.
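One common way to make a sample match population proportions is post-stratification weighting, where each stratum's weight is its population proportion divided by its sample proportion. The sketch below assumes that approach; the slides do not say which weighting method was used, and the function name, stratum labels, and counts are hypothetical illustration values, not Koppitz-2 data.

```python
def poststratification_weights(sample_counts, population_props):
    """Weight per stratum = population proportion / sample proportion.

    ASSUMPTION: simple one-variable post-stratification; the actual
    norming procedure is not described in the slides.
    """
    n = sum(sample_counts.values())
    weights = {}
    for stratum, count in sample_counts.items():
        sample_prop = count / n
        weights[stratum] = population_props[stratum] / sample_prop
    return weights


if __name__ == "__main__":
    # Made-up numbers: one stratum is under-sampled by 2.1 percentage points.
    counts = {"hs_or_less": 521, "more_than_hs": 479}
    population = {"hs_or_less": 0.5, "more_than_hs": 0.5}
    w = poststratification_weights(counts, population)
    total = sum(counts.values())
    for stratum, count in counts.items():
        print(stratum, round(w[stratum], 4), round(count * w[stratum] / total, 4))
```

After weighting, each stratum's weighted share equals its population share, which is what "perfectly mimic the population data" amounts to for the weighted variable.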

Summary of Reliability Results The overall reliability of the Koppitz-2 DBGT VMI has been demonstrated to be quite good. Relative to Anastasi and Urbina’s (1997) three sources of test error (content, time, and scorer), the coefficients determined demonstrate very acceptable levels of score reliability. The internal consistency reliability of the VMI is consistently high across demographic classifications as well as various diagnostic groups. The magnitude of these reliability coefficients strongly suggests that the Koppitz DBGT scores generally possess relatively small, acceptable amounts of error and that test users can have confidence in the consistency of Koppitz DBGT results when obtained after carefully following the standardized administration and scoring procedures detailed in the manual.
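To make "internal consistency" and "amounts of error" concrete, here is a minimal sketch assuming coefficient alpha as the internal consistency estimate and the classical standard error of measurement, SEM = SD * sqrt(1 - reliability). Both choices are assumptions, since the slides do not name the coefficient or report values. The function names and the demo data are hypothetical.

```python
from math import sqrt


def sample_variance(values):
    """Unbiased sample variance (n - 1 denominator)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)


def coefficient_alpha(responses):
    """Coefficient (Cronbach's) alpha from an examinee-by-item score matrix."""
    k = len(responses[0])
    item_vars = [sample_variance([row[j] for row in responses]) for j in range(k)]
    total_var = sample_variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


def standard_error_of_measurement(sd, reliability):
    """Classical SEM: the expected spread of observed scores around a true score."""
    return sd * sqrt(1 - reliability)


if __name__ == "__main__":
    # Made-up 0/1 scores for four examinees on three items.
    demo = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
    alpha = coefficient_alpha(demo)
    print("alpha:", alpha)
    # sd=15 assumes a mean-100, SD-15 standard score metric (an assumption).
    print("SEM:", standard_error_of_measurement(15.0, alpha))
```

A higher reliability coefficient yields a smaller SEM, which is the sense in which high coefficients imply small amounts of error in the scores.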