1 Prospectus for the PADI Design Framework in Language Testing. ECOLT 2006, October 13, 2006, Washington, D.C. Robert J. Mislevy, Professor of Measurement & Statistics, University of Maryland; Geneva D. Haertel, Assessment Research Area Director, SRI International. PADI is supported by the National Science Foundation under grant REC-0129331. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.

2 Some Challenges in Language Testing
Sorting out evidence about interacting aspects of knowledge & proficiency in complex performances
Understanding the impact of “complexity factors” and “difficulty factors” on inference
Scaling up efficiently to high-volume tests: task creation, scoring, delivery
Creating valid & cost-effective low-volume tests

3 Evidence-Centered Design
Evidence-centered assessment design (ECD) provides language, concepts, knowledge representations, data structures, and supporting tools to help design and deliver educational assessments, all organized around the evidentiary argument an assessment is meant to embody.

4 The Assessment Argument
What kinds of claims do we want to make about students?
What behaviors or performances can provide us with evidence for those claims?
What tasks or situations should elicit those behaviors?
Generalizing from Messick (1994)

5 Evidence-Centered Design
With Linda Steinberg & Russell Almond at ETS » The Portal project / TOEFL » NetPASS with Cisco (computer network design & troubleshooting)
Principled Assessment Design for Inquiry (PADI) » Supported by NSF (co-PI: Geneva Haertel, SRI) » Focus on science inquiry, e.g., investigations » Models, tools, examples

6 Some allied work
Cognitive design for generating tasks (Embretson)
Model-based assessment (Baker)
Analyses of task characteristics: test and TLU (Bachman & Palmer)
Test specifications (Davidson & Lynch)
Constructing measures (Wilson)
Understanding by design (Wiggins)
Integrated Test Design, Development, and Delivery (Luecht)

7 From Mislevy & Riconscente, in press. Layers in the assessment enterprise:
Domain Analysis: What is important about this domain? What work and situations are central in this domain? What knowledge representations (KRs) are central to this domain?
Domain Modeling: How do we represent key aspects of the domain in terms of an assessment argument?
Conceptual Assessment Framework: Design structures: student, evidence, and task models.
Assessment Implementation: How do we choose and present tasks, and gather and analyze responses?
Assessment Delivery: How do students and tasks actually interact? How do we report examinee performance?
Key ideas: Explicit relationships; explicit structures; generativity; re-usability; re-combinability; interoperability.

8 From Mislevy & Riconscente, in press. [Layers diagram, as on slide 7.] Domain Analysis draws on expertise research, task analysis, curriculum, target use, critical incident analysis, ethnographic studies, etc. In language assessment: the importance of psycholinguistics, sociolinguistics, and target language use.

9 From Mislevy & Riconscente, in press. [Layers diagram, as on slide 7.] The tangible stuff: e.g., what gets made and how it operates in the testing situation.

10 From Mislevy & Riconscente, in press. [Layers diagram, as on slide 7.] How do you get from here to here?

11 From Mislevy & Riconscente, in press. [Layers diagram, as on slide 7.] We will focus today on two “hidden” layers:

12 From Mislevy & Riconscente, in press. [Layers diagram, as on slide 7.] We will focus today on two “hidden” layers: Domain Modeling, which concerns the assessment argument…

13 From Mislevy & Riconscente, in press. [Layers diagram, as on slide 7.] …and the Conceptual Assessment Framework, which concerns generative & re-combinable design schemas.

14 From Mislevy & Riconscente, in press. [Layers diagram, as on slide 7.] More on the Assessment Argument.

15 PADI Design Patterns
Organized around elements of the assessment argument
Narrative structures for assessing pervasive kinds of knowledge / skill / capabilities
Based on research & experience, e.g. » PADI: design under constraint, inquiry cycles, representations » Compliance with Grice’s maxims; cause/effect reasoning; giving spoken directions
Suggest design choices that apply to different contexts, levels, purposes, formats » Capture experience in structured form » Organized in terms of the assessment argument

16 A Design Pattern Motivated by Grice’s Relation Maxim (Attribute: Value(s))
Name: Grice’s Relation Maxim—Responding to a Request
Summary: In this design pattern, an examinee demonstrates following Grice’s Relation Maxim in a given language by producing or selecting a response in a situation that presents a request for information (e.g., a conversation).
Central claims: In contexts/situations with xxx characteristics, can formulate and respond to representations of implicature from referents: semantic implication; pragmatic implication.
Additional knowledge that may be at issue: Substantive knowledge in the domain; familiarity with cultural models; knowledge of the language.

17 Grice’s Relation Maxims (continued)
Characteristic features: The stimulus situation needs to present a request for relevant information to the examinee, either explicitly or implicitly.
Variable task features: Production or choice as response? If production, is oral or written production required? If oral, a single response to a preconfigured situation or part of an evolving conversation? If an evolving conversation, an open or structured interview? Formality of prepackaged products (multiple choice, videotaped conversations, written questions or conversations, one to one or more conversations which are prepared by interviewers). Formality of information and task (concrete or abstract, immediate or remote, information requiring retrieval or transformation, familiar or unfamiliar setting and topic, written or spoken). If a prepackaged speech stimulus: length, content, difficulty of language, explicitness of the request, degree of cultural dependence. Content of the situation (familiar or unfamiliar, degree of difficulty). Time pressure (e.g., time for planning and response). Opportunity to control the conversation.

18 Grice’s Relation Maxims (continued)
Potential performances and work products: Constructed oral response; constructed written or typed-in response; answer to a multiple-choice question where the alternatives vary.
Potential features of performance to evaluate: Whether a student can formulate representations of implicature as they are required in the given situation. Whether a student can make a conversational contribution or express ideas in the accepted direction of the exchange. Whether a student provides the relevant information that is required. Whether the choice among the alternatives offered for a production in the given situation satisfies the Relation Maxim.
Potential rubrics: (later slide)
Examples: (in paper)
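
To make the attribute-value structure of such a design pattern concrete, here is a minimal sketch of how it might be held as a record, written in Python purely for illustration; the class, field names, and abbreviated values are hypothetical and follow the attribute labels on the preceding slides, not the actual PADI object model.

```python
# Hypothetical record structure for a design pattern; field names mirror the
# attribute labels on slides 16-18, values are abbreviated for illustration.
from dataclasses import dataclass, field
from typing import List

@dataclass
class DesignPattern:
    name: str
    summary: str
    central_claims: List[str]
    additional_knowledge: List[str] = field(default_factory=list)
    characteristic_features: List[str] = field(default_factory=list)
    variable_task_features: List[str] = field(default_factory=list)
    potential_work_products: List[str] = field(default_factory=list)
    potential_observations: List[str] = field(default_factory=list)

grice_relation = DesignPattern(
    name="Grice's Relation Maxim - Responding to a Request",
    summary=("Examinee produces or selects a response to a request for "
             "information, demonstrating adherence to the Relation Maxim."),
    central_claims=["Can formulate and respond to implicature in context"],
    additional_knowledge=["Substantive domain knowledge",
                          "Familiarity with cultural models",
                          "Knowledge of the language"],
    characteristic_features=["Situation presents an explicit or implicit "
                             "request for relevant information"],
    variable_task_features=["Production vs. choice", "Oral vs. written",
                            "Time pressure", "Familiarity of topic"],
    potential_work_products=["Constructed oral response",
                             "Constructed written response",
                             "Multiple-choice selection"],
    potential_observations=["Relevance of the information supplied",
                            "Contribution moves the exchange in the accepted direction"],
)
```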

19 Some Relationships between Design Patterns and Other TD Tools
Conceptual models for proficiency & task-characteristic frameworks » Grist for design choices about KSAs & task features » DPs present an integrated design space
Test specifications » DPs for generating the argument and design choices » Test specs for documenting and specifying choices

20 From Mislevy & Riconscente, in press. [Layers diagram, as on slide 7.] More on the Conceptual Assessment Framework.

21 Evidence-centered assessment design: the three basic models. Technical specs that embody the elements suggested in the design pattern.

22 Evidence-centered assessment design: the three basic models. Conceptual representation.

23 User-interface representation. [Screen shot of the PADI user interface.]

24 High-level UML representation of the PADI object model: sharable data structures, “behind the screen.”

25 Evidence-centered assessment design: What complex of knowledge, skills, or other attributes should be assessed?

26 The NetPass Student Model. Can use the same student model with different tasks. A multidimensional measurement model with selected aspects of proficiency.

27 Evidence-centered assessment design: What behaviors or performances should reveal those constructs?

28 Evidence-centered assessment design: What behaviors or performances should reveal those constructs? From a unique student work product to evaluations of observable variables, i.e., task-level “scoring.”

29 Skeletal Rubric for Satisfaction of Quality Maxims
4: Responses and explanations are relevant as required for the current purposes of the exchange, and are neither more elaborated than appropriate nor insufficient for the context. They fulfill the demands of the task with at most minor lapses in completeness. They are appropriate for the task and exhibit coherent discourse.
3: Responses and explanations address the task appropriately and are relevant as required for the current purposes of the exchange, but they may either be more elaborated than required or fall short of being fully developed.
2: The responses and explanations are connected to the task, but are either markedly excessive in the information supplied or not very relevant to the current purpose of the exchange. Some relevant information might be missing or inaccurately cast.
1: The responses and explanations are only grossly relevant or are very limited in content or coherence. In either case they may be only minimally connected to the task.
0: The speaker makes no attempt to respond, or the response is unrelated to the topic. A written response at this level merely copies sentences from the topic, rejects the topic, or is otherwise not connected to the topic. A spoken response is not connected to the direct or implied request for information.

30 Notes re Observable Variables
Re-usable (tailorable) for different tasks & projects.
There can be multiple aspects of performance being rated.
May be a 1-1 relationship with student-model variables, but need not be. That is, multiple aspects of proficiency can be involved in the probability of a high / satisfactory / certain style of response.

31 Evidence-centered assessment design: What behaviors or performances should reveal those constructs? Values of observable variables are used to update probability distributions for student-model variables via the psychometric model, i.e., test-level scoring.
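
As an illustration of test-level scoring as belief updating, here is a minimal sketch assuming a single binary student-model variable and one observable scored 0-2; the probabilities are invented for illustration and are not taken from any operational model (operational work uses richer Bayes nets and IRT-family models).

```python
# Illustrative Bayesian update of a student-model variable (SMV) from one
# observable value; priors and likelihoods are made-up numbers.
prior = {"high": 0.5, "low": 0.5}            # belief about the SMV before the task
likelihood = {                                # P(observable score | SMV state)
    "high": {0: 0.05, 1: 0.25, 2: 0.70},
    "low":  {0: 0.40, 1: 0.45, 2: 0.15},
}

def update(prior, likelihood, observed_score):
    """Return the posterior over SMV states given one observable value."""
    unnormalized = {s: prior[s] * likelihood[s][observed_score] for s in prior}
    total = sum(unnormalized.values())
    return {s: p / total for s, p in unnormalized.items()}

posterior = update(prior, likelihood, observed_score=2)
print(posterior)   # belief shifts toward "high" after a score of 2
```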

32 A NetPass Evidence-Model Fragment for Design. Re-usable conditional-probability fragments and variable names for different tasks with the same evidentiary structure. Measurement models indicate which SMVs, in which combinations, affect which observables. Task features influence which ones and how much, in structured measurement models.
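
Here is a minimal sketch of what a re-usable conditional-probability fragment could look like, assuming two binary student-model variables and one binary observable; the class, variable names, and numbers are hypothetical, intended only to show how the same evidentiary structure can be bound to different tasks.

```python
# Illustrative re-usable evidence-model fragment: a conditional probability
# table over an observable, keyed by the states of its parent SMVs.
from itertools import product

class EvidenceFragment:
    """Maps parent-SMV states to P(observable = correct); the same fragment
    can be reused for any task that taps the same combination of SMVs."""
    def __init__(self, parent_names, cpt):
        self.parent_names = parent_names      # e.g., ("Design", "Implement")
        self.cpt = cpt                        # maps parent-state tuples -> P(correct)

    def prob_correct(self, **smv_states):
        key = tuple(smv_states[name] for name in self.parent_names)
        return self.cpt[key]

# One fragment, reusable across tasks with the same evidentiary structure.
fragment = EvidenceFragment(
    parent_names=("Design", "Implement"),
    cpt={states: p for states, p in zip(
        product(("high", "low"), repeat=2), (0.85, 0.60, 0.45, 0.15))},
)

print(fragment.prob_correct(Design="high", Implement="low"))   # 0.6
```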

33 Evidence-centered assessment design: What tasks or situations should elicit those behaviors?

34 Representations to the student, and sources of variation

35 Task Specification Template: Determining Key Features (Wizards)
Setting: Corporation | Conference Center | University Building
Length: Less than 100m | More than 100m
Ethernet Standard: 10BaseT | 100BaseT
Subgroup Name: Teacher | Student | Customer
Bandwidth for a Subgroup Drop: 10Mbps | 100Mbps
Growth Requirements: Given | NA
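
A feature-value template like this lends itself to mechanical generation of task variants. Here is a minimal sketch of a wizard-style enumeration over the template, assuming each feature takes exactly one value per task; the generation logic is illustrative, not the actual PADI wizard.

```python
# Illustrative enumeration of task variants from a feature-value template;
# feature names and values follow the slide above.
from itertools import product

template = {
    "Setting": ["Corporation", "Conference Center", "University Building"],
    "Length": ["Less than 100m", "More than 100m"],
    "Ethernet Standard": ["10BaseT", "100BaseT"],
    "Subgroup Name": ["Teacher", "Student", "Customer"],
    "Bandwidth for a Subgroup Drop": ["10Mbps", "100Mbps"],
    "Growth Requirements": ["Given", "NA"],
}

def enumerate_variants(template):
    """Yield every combination of feature values as one task specification."""
    names = list(template)
    for values in product(*(template[n] for n in names)):
        yield dict(zip(names, values))

variants = list(enumerate_variants(template))
print(len(variants))       # 3 * 2 * 2 * 3 * 2 * 2 = 144 candidate task variants
print(variants[0])         # one fully specified combination of key features
```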

36 Structured Measurement Models
Examples of models » Multidimensional Random Coefficients Multinomial Logit Model (MRCMLM; Adams, Wilson, & Wang, 1997) » Bayes nets (Mislevy, 1996) » General Diagnostic Model (von Davier & Yamamoto)
By relating task characteristics to difficulty with respect to different aspects of proficiency, create tasks with known properties.
Can create families of tasks around the same evidentiary frameworks; e.g., for “read & write” tasks, can vary characteristics of texts, directives, audience, purpose.

37 Structured Measurement Models
An articulated connection between task characteristics and models of proficiency.
Moves beyond “modeling difficulty” » Traditional test theory is a bottleneck in a multivariate environment.
Dealing with “complexity factors” and “difficulty factors” (Robinson) » Model complexity factors as covariates for difficulty parameters with respect to the aspects of proficiency they impact. » Model difficulty factors either as SMVs, if they are a target of inference, or as noise, if a nuisance.
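
As a sketch of the kind of structure involved, consider an LLTM-style decomposition in the spirit of the models cited on the previous slide, assuming a single aspect of proficiency for simplicity; the notation is illustrative rather than the specific model used in PADI:

$$P(X_{ij}=1 \mid \theta_i) = \frac{\exp(\theta_i - \beta_j)}{1+\exp(\theta_i - \beta_j)}, \qquad \beta_j = \sum_{k} q_{jk}\,\eta_k ,$$

where $\theta_i$ is examinee $i$'s proficiency, $q_{jk}$ codes whether task $j$ exhibits complexity feature $k$, and $\eta_k$ is that feature's contribution to difficulty. In a multivariate version, separate proficiencies and feature effects attach to the different aspects of proficiency a feature impacts.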

38 Advantages: a framework that…
Guides task and test construction (Wizards)
Provides high efficiency and scalability
By relating task characteristics to difficulty, allows creating tasks with targeted properties
Promotes re-use of conceptual structures (DPs, arguments) in different projects
Promotes re-use of machinery in different projects

39 Evidence of effectiveness
Cisco » Certification & training assessment » Simulation-based assessment tasks
IMS/QTI » Conceptual model for standards for data structures for computer-based testing
ETS » TOEFL » NBPTS

40 Conclusion: Isn’t this just a bunch of new words for describing what we already do?

41 An answer (Part 1): No.

42 An answer (Part 2): An explicit, general framework makes similarities and implicit principles explicit:
» To better understand current assessments…
» To design for new kinds of assessment… – Tasks that tap multiple aspects of proficiency – Technology-based tasks (e.g., simulations) – Complex observations, student models, evaluation
» To foster re-use, sharing, & modularity – Concepts & arguments – Pieces of machinery & processes (QTI)

43 For more information… www.education.umd.edu/EDMS/mislevy/ has links to PADI, Cisco, articles, etc. (e.g., the CRESST report on Task-Based Language Assessment).

