EVALUATING EPP-CREATED ASSESSMENTS

EVALUATING EPP-CREATED ASSESSMENTS
Donna Fiebelkorn, Lake Superior State University
Beth Grzelak, Eastern Michigan University

Goals for today’s session:
- Introduce the CAEP Evaluation Framework
- Provide the opportunity for participants to engage with the Evaluation Framework
- Discuss validity and reliability from the CAEP perspective
- Q & A

The Evaluation Framework focuses on:
Relevancy
  1. Administration and Purpose
  2. Content of Assessment
Reliability and Actionability
  3. Scoring
Quality
  4. Data Reliability
  5. Data Validity
  6. Survey Content
  7. Survey Data Quality
Note: For survey instruments, use Sections 1, 2, 6, and 7 only. For all other assessments, use Sections 1, 2, 3, 4, and 5.

Briefly, about the work we have done:
- Between us, we have reviewed assessments from nine programs, public and private, small and medium.
- Not one assessment, from any EPP, met the “Sufficient” level as defined by the CAEP Evaluation Framework.
- In each case, we worked in teams of 2-3: initially independently, and then with team members to reconcile discrepancies.
- The review team’s report was then reviewed and accepted by a CAEP staff member.
Before we discuss the issues we saw, let’s introduce you to the CAEP Evaluation Framework…

Section 1. Administration and Purpose (informs relevancy)

BELOW
- Use or purpose is ambiguous or vague
- There is limited or no basis for reviewers to know what information is given to candidates
- Instructions given to candidates are incomplete or misleading
- The criterion for success is not provided or is not clear

SUFFICIENT
- The point or points when the assessment is administered during the preparation program are explicit
- The purpose of the assessment and its use in candidate monitoring or decisions on progression are specified and appropriate
- Instructions provided to candidates (or respondents to surveys) about what they are expected to do are informative and unambiguous
- The basis for judgment (criterion for success, or what is “good enough”) is made explicit for candidates (or respondents to surveys)
- Evaluation categories or assessment tasks are aligned to CAEP, InTASC, national/professional, and state standards

ABOVE
- The purpose of the assessment and its use in candidate monitoring or decisions are consequential
- Candidate progression is monitored and the information is used for mentoring
- Candidates are informed how the instrument results are used in reaching conclusions about their status and/or progression

Section 2. Content of Assessment (informs relevancy)

BELOW
- Indicator alignment with . . . standards is incomplete, absent, or only vaguely related to the content of the standards being evaluated
- Indicators fail to reflect the degree of difficulty described in the standards
- Indicators are not described, are ambiguous, or include only headings
- Higher level functioning . . . is not apparent in the indicators
- Many indicators require judgment of candidate proficiencies that are of limited importance in . . . standards

SUFFICIENT
- Indicators assess explicitly identified aspects of CAEP, InTASC, national/professional, and state standards
- Indicators reflect the degree of difficulty or level of effort described in the standards
- Indicators unambiguously describe the proficiencies to be evaluated
- When the standards being informed address higher level functioning, the indicators require higher levels of intellectual behavior (e.g., create, evaluate, analyze, and apply). For example . . .
- Most indicators (at least those comprising 80% of the total score) require observers to judge consequential attributes of candidate proficiencies in the standards
[Note: the word “indicators” is used as a generic term for assessment items . . .]

ABOVE
- Almost all evaluation categories or tasks (at least those comprising 95% of the total score) require observers to judge consequential attributes of candidate proficiencies in the standards

Section 3. Scoring (informs reliability and actionability)

BELOW
- Rating scales are used in lieu of rubrics, e.g., “level 1 = significantly below expectation,” “level 4 = significantly above expectation”
- PLDs do not align with indicators
- PLDs do not represent developmental progressions
- PLDs provide limited or no feedback to candidates specific to their performance
- Proficiency level attributes are vague or not defined, and may just repeat from the standard or component

SUFFICIENT
- The basis for judging candidate work is well defined
- Each Proficiency Level Descriptor (PLD) is qualitatively defined by specific criteria aligned with indicators
- PLDs represent a developmental sequence from level to level (to provide raters with explicit guidelines for evaluating candidate performance and candidates with explicit feedback on their performance)
- Feedback provided to candidates is actionable: it is directly related to the preparation program and can be used for program improvement as well as for feedback to the candidate
- Proficiency level attributes are defined in actionable, performance-based, or observable behavior terms [NOTE: If a less actionable term is used … criteria are provided to define the use of the term in the context of the category or indicator]

ABOVE
- Higher level actions from Bloom’s taxonomy are used, such as “analysis” or “evaluation”

Section 4. Data Reliability

BELOW
- Description of or a plan to establish reliability does not inform reviewers about how it was established or is being investigated
- Described steps do not meet accepted research standards for reliability
- No evidence, or limited evidence, is provided that scorers are trained and their inter-rater agreement is documented

SUFFICIENT
- A description or plan is provided that details the type of reliability that is being investigated or has been established (e.g., test-retest, parallel forms, inter-rater, internal consistency, etc.) and the steps the EPP took to ensure the reliability of the data from the assessment
- Training of scorers and checking on inter-rater agreement and reliability are documented
- The described steps meet accepted research standards for establishing reliability

ABOVE
- Raters are initially, formally calibrated to master criteria and are periodically formally checked to maintain calibration at levels meeting accepted research standards
- A reliability coefficient is reported
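To illustrate what documenting inter-rater agreement and reporting a reliability coefficient can look like, here is a minimal Python sketch using hypothetical rubric scores from two raters; the data and the choice of unweighted Cohen’s kappa are assumptions for illustration only, not part of the CAEP framework itself:

```python
# Minimal sketch: percent agreement and Cohen's kappa for two raters
# scoring the same candidates on a 4-level rubric (hypothetical data).
from collections import Counter

rater_a = [3, 2, 4, 3, 1, 2, 3, 4, 2, 3]  # hypothetical rubric levels
rater_b = [3, 2, 3, 3, 1, 2, 4, 4, 2, 3]

n = len(rater_a)

# Observed agreement: proportion of candidates scored identically.
observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

# Chance agreement, estimated from each rater's marginal distribution.
counts_a, counts_b = Counter(rater_a), Counter(rater_b)
expected = sum(counts_a[k] * counts_b[k] for k in set(rater_a) | set(rater_b)) / n ** 2

kappa = (observed - expected) / (1 - expected)
print(f"Percent agreement: {observed:.2f}")
print(f"Cohen's kappa:     {kappa:.2f}")
```

For ordinal rubric levels, a weighted kappa or an intraclass correlation would often be preferred; the point is simply that some documented, quantitative index of agreement accompanies scorer training and calibration.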

Section 5. Data Validity

BELOW
- Description of or plan to establish validity does not inform reviewers about how it was established or is being investigated
- The type of validity established or investigated is misidentified or not described
- The instrument was not piloted …
- Process or plans … are not presented or are superficial
- Described steps do not meet accepted research standards for establishing validity. For example, validity is determined through an internal review by only one or two stakeholders

SUFFICIENT
- A description or plan is provided that details steps the EPP has taken or is taking to ensure the validity of the assessment and its use
- The plan details the types of validity that are under investigation or have been established (e.g., construct, content, concurrent, predictive, etc.) and how they were established
- If the assessment is new or revised, a pilot was conducted
- The EPP details its current process or plans for analyzing and interpreting results from the assessment
- The described steps generally meet accepted research standards for establishing the validity of data from an assessment

ABOVE
- Types of validity investigated go beyond content validity and move toward predictive validity
- A validity coefficient is reported
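As an illustration of how a validity coefficient might be reported once a criterion measure is available, the sketch below correlates hypothetical EPP assessment scores with a hypothetical external criterion (for example, a licensure exam score or a mentor rating); all data and variable names are assumed for illustration:

```python
# Minimal sketch: a concurrent-validity check, correlating EPP assessment
# scores with an external criterion measure (hypothetical data).
import math

assessment = [78, 85, 90, 70, 88, 95, 60, 82]   # EPP-created assessment scores
criterion  = [74, 80, 92, 68, 85, 97, 65, 79]   # e.g., licensure exam or mentor rating

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

print(f"Validity coefficient (Pearson r): {pearson_r(assessment, criterion):.2f}")
```

A concurrent or predictive coefficient of this kind supplements, rather than replaces, content-validity evidence gathered through expert review of the instrument.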

Section 6. Survey Content

BELOW
- Questions or topics are not aligned with the EPP mission or standards
- Individual items are ambiguous or include more than one subject
- There are numerous leading questions
- Items are stated as opinions rather than as behaviors or practices
- Dispositions surveys provide no evidence of a relationship to effective teaching

SUFFICIENT
- Questions or topics are explicitly aligned with aspects of the EPP’s mission and also CAEP, InTASC, national/professional, and state standards
- Individual items have a single subject; language is unambiguous
- Leading questions are avoided
- Items are stated in terms of behaviors or practices instead of opinions, whenever possible
- Surveys of dispositions make clear to candidates how the survey is related to effective teaching

ABOVE
- Scoring is anchored in performance or behavior demonstrably related to teaching practice
- Dispositions surveys make an explicit connection to effective teaching

Section 7. Survey Data Quality

BELOW
- Scaled choices are numbers only, without qualitative description linked with the item under investigation
- Limited or no feedback is provided to the EPP for improvement purposes
- No evidence that questions/items have been piloted

SUFFICIENT
- Scaled choices are qualitatively defined using specific criteria aligned with key attributes
- Feedback provided to the EPP is actionable
- The EPP provides evidence that questions are piloted to determine that candidates interpret them as intended, and modifications are made if called for

ABOVE
- The EPP provides evidence of survey construct validity derived from its own or accessed research studies
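One common piece of pilot evidence for a multi-item survey scale is an internal-consistency estimate such as Cronbach’s alpha. The sketch below computes it from a small set of hypothetical Likert-scale responses; the four-item scale and the response values are assumptions for illustration only:

```python
# Minimal sketch: Cronbach's alpha for a piloted survey scale
# (hypothetical responses; rows = respondents, columns = items).
responses = [
    [4, 5, 4, 4],
    [3, 3, 4, 3],
    [5, 5, 5, 4],
    [2, 3, 2, 3],
    [4, 4, 5, 4],
]

def variance(values):
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)

k = len(responses[0])                                   # number of items
item_vars = [variance([row[i] for row in responses]) for i in range(k)]
total_var = variance([sum(row) for row in responses])   # variance of total scores

alpha = (k / (k - 1)) * (1 - sum(item_vars) / total_var)
print(f"Cronbach's alpha: {alpha:.2f}")
```

An estimate like this, reported alongside evidence that respondents interpreted the piloted items as intended, is one way an EPP can document survey data quality.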

And then . . .

Holistic Evaluation of EPP-Created Assessments
Criteria evaluated during stages of accreditation review and decision-making:
- The EPP provides evidence that data are compiled and tabulated accurately
- Interpretations of assessment results are appropriate for the items and resulting data
- Results from successive administrations are compared

Data Validity Example: Sufficient?
CAEP Training Example: This observation rubric has “face validity” because all of the stakeholders (supervising faculty, program faculty, mentors, and principals) agreed, after modifications, that it was measuring good teaching.

Data Reliability Example: Sufficient?
CAEP Training Example: We have not yet determined inter-rater reliability for this assessment but plan to conduct a series of faculty meetings in which faculty observe videos of candidates in their teaching residency, complete the observation rubric individually, and then discuss differences in ratings. Inter-rater reliability will be determined and differences discussed in order to ensure consistency across raters.

Opportunities for Continued Discussion
- Open Space Topic?
- Team Work?
- Other?