1
SWPBS Forum October 2008
Claudia Vincent and Scott Spaulding
clavin@uoregon.edu
sspauldi@uoregon.edu
University of Oregon
2
Provide information about desirable features of SWPBS evaluation tools
Provide an overview of the extent to which SWPBS evaluation tools meet these desirable features
3
PBS Self-Assessment → implement systems to support practices → implement practices → improved student outcomes
Evaluation data (fidelity measures and student outcome measures) are interpreted and used for decision-making through an action plan.
Evaluation data serve two purposes:
1. Drive implementation decisions
2. Provide evidence for SWPBS impact on student outcomes
4
A measure that drives implementation decisions should be:
◦ socially valid
◦ contextually appropriate
◦ sufficiently reliable (reliable enough to make defensible decisions)
◦ easy to use
A measure that builds the evidence base for SWPBS should:
◦ have known reliability
◦ have known validity
◦ clearly link implementation status to student outcomes
5
Measurement scores have two components:
◦ True score, e.g. a school's true performance on "teaching behavioral expectations" (relevant to the construct)
◦ Error, e.g. features of the measurement process itself (noise)
Our goal is to use tools that
1. maximize true score and minimize measurement error, and therefore
2. yield precise and interpretable data, and therefore
3. lead to sound implementation decisions and defensible evidence.
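As a point of reference (this formalization is implied rather than stated on the slide), classical test theory writes the observed score as the sum of true score and error, and defines reliability as the proportion of observed-score variance due to true scores:

```latex
X = T + E, \qquad \text{reliability} = \frac{\sigma^2_T}{\sigma^2_X} = \frac{\sigma^2_T}{\sigma^2_T + \sigma^2_E}
```

The smaller the error variance relative to the true-score variance, the closer reliability is to 1 and the more interpretable the resulting data.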
6
True score is maximized and error minimized if the evaluation tool is technically adequate, i.e.
◦ can be applied consistently (has good reliability)
◦ measures the construct of interest (has good validity)
Sound implementation decisions are made if the evaluation tool is practical, i.e. data
◦ are cost efficient to collect (low impact)
◦ are easy to aggregate across units of analysis (e.g. students, classrooms, schools, districts, states)
◦ are consistently used to make meaningful decisions (have high utility)
7
Consistency across
◦ Items/subscales/total scales ("internal consistency")
◦ Data collectors ("inter-rater reliability" or "inter-observer agreement")
◦ Time ("test-retest reliability")
8
Definition:
◦ Extent to which the items on an instrument adequately and randomly sample a cohesive construct, e.g. "SWPBS implementation"
Assessment:
◦ If the instrument adequately and randomly samples one construct, and if it were divided into two equal parts, both parts should correlate strongly
Metric:
◦ Coefficient alpha (the average split-half correlation based on all possible divisions of an instrument into two parts)
Interpretation:
◦ α ≥ .70 (adequate for measures under development)
◦ α ≥ .80 (adequate for basic research)
◦ α ≥ .90 (adequate for measures on which consequential decisions are based)
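As an illustrative sketch (not part of the original slides), coefficient alpha can be computed directly from a respondents-by-items score matrix; the data below are hypothetical.

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Coefficient alpha for an (n_respondents x n_items) matrix of item scores."""
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)        # variance of each item
    total_variance = items.sum(axis=1).var(ddof=1)    # variance of summed total scores
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

# Hypothetical data: 20 schools rated on 5 implementation items (0 = not in place,
# 1 = partially in place, 2 = in place); items correlate because each adds noise
# to a common school-level score
rng = np.random.default_rng(0)
school_level = rng.integers(0, 3, size=(20, 1))
noise = rng.integers(-1, 2, size=(20, 5))
scores = np.clip(school_level + noise, 0, 2).astype(float)
print(f"alpha = {cronbach_alpha(scores):.2f}")
```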
9
Definition:
◦ Extent to which the instrument measures the same construct regardless of who collects the data
Assessment:
◦ If the same construct were observed by two data collectors, their ratings should be almost identical
Metric:
◦ Expressed as percentage of agreement between two data collectors
Interpretation:
◦ ≥ 90% good
◦ ≥ 80% acceptable
◦ < 80% problematic
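A minimal sketch of the percentage-of-agreement metric, using hypothetical item-level ratings from two data collectors (not data from the slides):

```python
import numpy as np

def percent_agreement(rater_a, rater_b) -> float:
    """Percentage of items on which two data collectors gave the same rating."""
    a, b = np.asarray(rater_a), np.asarray(rater_b)
    return 100.0 * np.mean(a == b)

# Hypothetical ratings (0/1/2) for ten items at one school
rater_a = [2, 2, 1, 0, 2, 1, 2, 2, 1, 2]
rater_b = [2, 2, 1, 1, 2, 1, 2, 2, 0, 2]
print(f"{percent_agreement(rater_a, rater_b):.0f}% agreement")  # 80% -> acceptable per the criteria above
```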
10
Definition:
◦ Extent to which the instrument yields consistent results at two points in time
Assessment:
◦ The measure is administered at two points in time. The time interval is set so that no improvement is expected to occur between the first and second administration.
Metric:
◦ Expressed as the correlation between pairs of scores from the same schools obtained at the two administrations
Interpretation:
◦ r ≥ .6 acceptable
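A minimal sketch of the test-retest metric: the Pearson correlation between two administrations of the same measure, using hypothetical school-level total scores.

```python
import numpy as np

# Hypothetical total scores for the same eight schools at two administrations
time1 = np.array([72, 85, 60, 90, 78, 55, 88, 67], dtype=float)
time2 = np.array([75, 82, 63, 91, 74, 58, 90, 70], dtype=float)

r = np.corrcoef(time1, time2)[0, 1]  # Pearson correlation between the two administrations
print(f"test-retest r = {r:.2f}")    # r >= .6 is treated as acceptable above
```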
11
How can we interpret this graph?
12
Interpretability of data! Did these schools truly differ in the extent to which they taught behavioral expectations? Or did these schools obtain different scores because
◦ the tool's items captured only some schools' approach to teaching expectations? (tool lacked internal consistency)
◦ they had different data collectors? (tool lacked inter-rater agreement)
◦ some collected data in week 1 and some in week 2 of the same month? (tool lacked test-retest reliability)
13
Content validity
Criterion-related validity
◦ Concurrent validity
◦ Predictive validity
Construct validity
14
Definition:
◦ Extent to which the items on an instrument relate to the construct of interest, e.g. "student behavior"
Assessment:
◦ Expert judgment of whether items measure content theoretically or empirically linked to the construct
Metric:
◦ Expressed as percentage of expert agreement
Interpretation:
◦ ≥ 80% agreement desirable
15
Definition:
◦ Extent to which the instrument correlates with another instrument measuring a similar aspect of the construct of interest and administered concurrently or subsequently
Assessment:
◦ Concurrent validity: compare data from concurrently administered measures for agreement
◦ Predictive validity: compare data from subsequently administered measures for predictive accuracy
Metric:
◦ Expressed as a correlation between two measures
Interpretation:
◦ Moderate to high correlations are desirable
◦ Concurrent validity: very high correlations might indicate redundancy of measures
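Both forms of criterion-related validity reduce to a correlation; the sketch below uses hypothetical school-level scores (one concurrently administered comparison measure, one outcome collected later), not data from any study cited here.

```python
import numpy as np

# Hypothetical scores for ten schools
measure_a     = np.array([55, 60, 62, 68, 70, 75, 78, 82, 88, 95], dtype=float)
measure_b     = np.array([50, 63, 60, 70, 72, 74, 80, 79, 90, 93], dtype=float)  # administered concurrently
later_outcome = np.array([48, 52, 55, 58, 63, 65, 70, 72, 80, 85], dtype=float)  # collected the following year

concurrent_r = np.corrcoef(measure_a, measure_b)[0, 1]       # concurrent validity
predictive_r = np.corrcoef(measure_a, later_outcome)[0, 1]   # predictive validity
print(f"concurrent r = {concurrent_r:.2f}, predictive r = {predictive_r:.2f}")
```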
16
Definition:
◦ Extent to which the instrument measures what it is supposed to measure (e.g. the theorized construct "student behavior")
Assessment:
◦ Factor analyses yielding information about the instrument's dimensions (e.g. aspects of "student behavior")
◦ Correlations between constructs hypothesized to impact each other (e.g. "student behavior" and "student reading achievement")
Metric:
◦ Statistical model fit indices (e.g. chi-square)
Interpretation:
◦ Statistical significance
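As a rough sketch of the factor-analytic part of this assessment (simulated data, and using exploratory factor analysis from scikit-learn rather than the chi-square-based confirmatory models the slide refers to):

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

# Simulated responses: 200 respondents x 6 items, where items 0-2 and items 3-5
# are built to reflect two different underlying dimensions
rng = np.random.default_rng(1)
dim1 = rng.normal(size=(200, 1))
dim2 = rng.normal(size=(200, 1))
items = np.hstack([dim1 + 0.3 * rng.normal(size=(200, 3)),
                   dim2 + 0.3 * rng.normal(size=(200, 3))])

fa = FactorAnalysis(n_components=2).fit(items)
print(np.round(fa.components_, 2))  # loadings: rows are factors, columns are items
```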
17
How can we interpret this graph?
18
Interpretability of data! Can we truly conclude that student behavior is better in school F than in school J?
◦ Does the tool truly measure well-defined behaviors? (content validity)
◦ Do student behaviors measured with this tool have any relevance for the school's overall climate? For the student's long-term success? (concurrent, predictive validity)
◦ Does the tool actually measure "student behavior", or does it measure "teacher behavior", "administrator behavior", or "parent behavior"? (construct validity)
19
Consider sample size
◦ Psychometric data derived from large samples are better than psychometric data derived from small samples.
Consider sample characteristics
◦ Psychometric data derived from specific samples (e.g. elementary schools) do not automatically generalize to all contexts (e.g. middle schools, high schools).
20
Making implementation decisions based on evaluation data
◦ When has a school reached "full" implementation?
◦ "Criterion" scores on implementation measures should be calibrated based on student outcomes.
[Graph: implementation criterion plotted against student outcome goals (academic and social achievement) on a 10-100 scale]
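One simple way to calibrate an implementation criterion against a student-outcome goal is to regress the outcome on fidelity scores and solve for the fidelity score at which the predicted outcome reaches the goal. The sketch below uses entirely hypothetical school-level data and an assumed goal of 80.

```python
import numpy as np

# Hypothetical data for 12 schools: implementation fidelity score (0-100)
# and a student outcome (e.g., % of students meeting a behavioral goal)
fidelity = np.array([40, 45, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100], dtype=float)
outcome  = np.array([52, 55, 58, 61, 66, 70, 72, 78, 80, 84, 86, 90], dtype=float)

slope, intercept = np.polyfit(fidelity, outcome, 1)  # simple linear calibration
outcome_goal = 80.0                                  # assumed student-outcome goal
criterion = (outcome_goal - intercept) / slope       # fidelity score predicted to meet the goal
print(f"implementation criterion ~= {criterion:.0f}")
```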
21
Evaluation data lead to consequential decisions, e.g.
◦ Additional trainings when data indicate insufficient implementation
◦ Emphasis on specific supports where data indicate greatest student needs
To make sure we arrive at defensible decisions, we need to collect evaluation data with tools that
◦ have documented reliability and validity
◦ clearly link implementation to student outcomes
22
1. Collect evaluation data regularly
2. Collect evaluation data with tools that have good reliability and validity
3. Guide implementation decisions with evaluation data clearly linked to student outcomes
23
Provide information about desirable features of SWPBS evaluation tools
Provide an overview of the extent to which SWPBS evaluation tools meet these desirable features
24
How is my school doing?
My school is "80/80". Now what?
My school is just beginning SWPBS. Where do I start?
How do we handle the kids still on support plans?
I've heard about school climate. What is that?
What about the classroom problems we still have?
25
Measurement within SWPBS
Research or evaluation?
What tools do we have?
What evidence exists for use of these tools?
Guidelines for using the measures
26
Focus on the whole school
School-wide PBS began with a focus on multiple systems
Evaluation of a process
Evaluation of an outcome
Growth beyond initial implementation
[Diagram: school-wide, non-classroom, classroom, and individual student systems (Sugai & Horner, 2002)]
27
Continuum of School-wide Positive Behavior Support
◦ Primary Prevention: school-/classroom-wide systems for all students, staff, and settings (~80% of students)
◦ Secondary Prevention: specialized group systems for students with at-risk behavior (~15%)
◦ Tertiary Prevention: specialized individualized systems for students with high-risk behavior (~5%)
28
[Matrix: unit of measurement and analysis (student, classroom, nonclassroom, school) crossed with level of prevention and intervention (primary, secondary, tertiary) and dimension of measurement (academic achievement, social behavior), for both process and outcomes; most cells are marked with question marks]
29
Measurement within SWPBS
Research or evaluation?
What tools do we have?
What evidence exists for use of these tools?
Guidelines for using the measures
30
1. Drive implementation decisions
2. Provide evidence for SWPBS impact on student outcomes
Measures have been developed to support research-quality assessment of SWPBS
Measures have been developed to assist teams in monitoring their progress
31
Measurement within SWPBS
Research or evaluation?
What tools do we have?
What evidence exists for use of these tools?
Guidelines for using the measures
32
Some commonly used measures:
◦ Effective Behavior Supports Survey
◦ Team Implementation Checklist
◦ Benchmarks of Quality
◦ School-wide Evaluation Tool
◦ Implementation Phases Inventory
33
Newer measures:
◦ Individual Student Schoolwide Evaluation Tool
◦ Checklist for Individual Student Systems
◦ Self-assessment and Program Review
34
Measures by tier and setting (Whole-School, Non-classroom, Classroom):
◦ Tertiary: ISSET, CISS
◦ Secondary: ISSET, CISS
◦ Universal: EBS, TIC, SET, BoQ (whole-school); EBS (classroom/non-classroom)
35
Measurement within SWPBS
Research or evaluation?
What tools do we have?
What evidence exists for use of these tools?
Guidelines for using the measures
36
Is it important, acceptable, and meaningful?
Can we use it in our school?
Is it consistent?
Is it easy to use?
Is it "expensive"?
Does it measure what it's supposed to?
Does it link implementation to outcome?
37
Effective Behavior Supports Survey (EBS)
School-wide Evaluation Tool (SET)
Benchmarks of Quality (BoQ)
38
Effective Behavior Supports Survey
◦ Sugai, Horner, & Todd (2003)
◦ Hagan-Burke et al. (2005)
◦ Safran (2006)
Documented psychometric evidence (internal consistency / test-retest / inter-rater / content / criterion / construct): internal consistency only
39
46-item support team self-assessment
Facilitates initial and annual action planning
Current status and priority for improvement across four systems:
◦ School-wide
◦ Specific Setting
◦ Classroom
◦ Individual Student
Summary by domain, action planning activities
20-30 minutes; conducted at initial assessment, quarterly, and annual intervals
40
Internal consistency
◦ Sample of 3 schools
◦ current status: α = .85
◦ improvement priority: α = .94
◦ subscale α from .60 to .75 for "current status" and .81 to .92 for "improvement priority"
Internal consistency for School-wide
◦ Sample of 37 schools
◦ α = .88 for "current status"
◦ α = .94 for "improvement priority"
41
School-wide Evaluation Tool
◦ Sugai, Horner & Todd (2000)
◦ Horner et al. (2004)
Documented psychometric evidence: internal consistency, test-retest, inter-rater, content, and construct (see following slides)
42
28-item research evaluation of universal implementation
Total implementation score and 7 subscale scores:
1. school-wide behavioral expectations
2. school-wide behavioral expectations taught
3. acknowledgement system
4. consequences for problem behavior
5. system for monitoring problem behavior
6. administrative support
7. district-level support
2-3 hours; external evaluation; annual
43
Internal consistency
◦ Sample of 45 middle and elementary schools
◦ α = .96 for total score
◦ subscale α from .71 (district-level support) to .91 (administrative support)
Test-retest analysis
◦ Sample of 17 schools
◦ Total score: IOA = 97.3%
◦ Individual subscales: IOA = 89.8% (acknowledgement of appropriate behaviors) to 100% (district-level support)
44
Content validity
◦ Collaboration with teachers, staff, and administrators at 150 middle and elementary schools over a 3-year period
45
Construct validity
◦ Sample of 31 schools
◦ SET correlated with EBS Survey
◦ Pearson r = .75, p < .01
Sensitivity to differences in implementation across schools
◦ Sample of 13 schools
◦ Comparison of average scores before and after implementation
◦ t = 7.63, df = 12, p < .001
46
Schoolwide Benchmarks of Quality
◦ Kincaid, Childs, & George (2005)
◦ Cohen, Kincaid, & Childs (2007)
Documented psychometric evidence: internal consistency, test-retest, inter-rater, content, and criterion (concurrent) validity (see following slides)
47
Used to identify areas of success and areas for improvement
Self-assessment completed by all team members
53 items rating level of implementation
Team coaches create a summary form, noting discrepancies in ratings
Areas of strength, areas needing development, and areas of discrepancy are noted for discussion and planning
1-1.5 hours (1 team member plus coach)
Completed annually in spring
48
Items grouped into 10 subscales:
1. PBS team
2. faculty commitment
3. effective discipline procedures
4. data entry
5. expectations and rules
6. reward system
7. lesson plans for teaching behavioral expectations
8. implementation plans
9. crisis plans
10. evaluation
49
Internal consistency
◦ Sample of 105 schools (Florida and Maryland: 44 elementary, 35 middle, 10 high, 16 center schools)
◦ overall α of .96
◦ subscale α from .43 ("PBS team") to .87 ("lesson plans for teaching expectations")
50
Test-retest reliability
◦ Sample of 28 schools
◦ Coaches' scores only
◦ Total score: r = .94, p < .01
◦ subscale r from .63 ("implementation plan") to .93 ("evaluation"); acceptable test-retest reliability
Inter-observer agreement (IOA)
◦ Sample of 32 schools
◦ IOA = 89%
51
Content validity
◦ Florida PBS training manual and core SWPBS elements
◦ Feedback from 20 SWPBS research and evaluation professionals
◦ Interviewing to identify response error in the items
◦ Pilot efforts with 10 support teams
Concurrent validity
◦ Sample of 42 schools
◦ Correlation between BoQ and SET
◦ Pearson r = .51, p < .05
52
Measurement within SWPBS
Research or evaluation?
What tools do we have?
What evidence exists for use of these tools?
Guidelines for using the measures
53
What measures do I use?
How do I translate a score into "practice"?
What school variables affect measurement choices?
◦ SWPBS implementation status
Evaluation template
54
Evaluation template by fidelity tool (quarters 1-4 of Years 1-3):
◦ EBS Survey: once, in Year 1
◦ Universal: TIC three times per year; SET / BoQ once per year
◦ Secondary / tertiary: CISS three times per year; ISSET once per year
◦ Classroom setting: internal classroom measure three times per year; external classroom measure once per year
55
Evaluation of School-wide PBS occurs for implementation and outcomes
Evidence of a "good" measure depends on its intended use
The quality of implementation decisions depends on the quality of evaluation tools
Evaluation occurs throughout the implementation process, with different tools for different purposes at different stages