Design Review: “Paper Towels Assessment”
Education 334X: Technology-Based Student Achievement Assessment
2001.10.22
‘Alim, Josie, Mike G, Bradley, Nina, Yunn Chyi, Johnnie, Jim
What does this assessment measure?
- Conceptual framework: locating the assessment
- Conceptual analysis: task, response format, scoring system
What does prior research say?
Observational data: summary of notes from live observation and implications for the simulation
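In the conceptual framework
Figure: Paper Towels located in the content-process framework (axes: content knowledge, rich to lean; process knowledge, constrained to open). Paper Towels sits at the content-lean end.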
Conceptual Analysis
Knowledge types: declarative, procedural, schematic, strategic
Performance task
- What is…: weight, volume, and the relationship between them; saturation; area and length; the purpose/use of different tools
- How to…: read/write; saturate; control variables; measure volume, weight, etc.; determine the amount of water soaked up
- Why…: use care in procedures; recognize when a valid/reliable result has been reached; recognize when unexpected factors affect the result
Response format
- Which towel absorbs the most, and which the least?
- Steps in procedure
- How did you know?
- Completely wet? Same size?
Scoring system
- Correct result
- Method for getting towel wet
- Determine result
- Care in procedure
- Saturation
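The following is a rough illustration of how the scoring-system criteria could be tallied in a computer-based version of the task. It is a minimal Python sketch that assumes one pass/fail judgment per criterion and one point for each. The criterion keys and point values are hypothetical and are not taken from the published rubric.

```python
# Hypothetical keys, one per scoring-system criterion listed above.
# The real rubric's point values and wording may differ.
CRITERIA = (
    "correct_result",
    "method_for_getting_towel_wet",
    "determine_result",
    "care_in_procedure",
    "saturation",
)


def score_performance(judgments: dict[str, bool]) -> int:
    """Return the number of criteria met (hypothetically one point each)."""
    missing = [c for c in CRITERIA if c not in judgments]
    if missing:
        # Refuse to score an incomplete record rather than silently count it as 0.
        raise ValueError(f"missing judgments for: {missing}")
    return sum(1 for c in CRITERIA if judgments[c])
```

For example, suppose a student meets every criterion except saturation. That record scores 4 of 5 under this assumed one-point-per-criterion scheme.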
For this type of performance assessment, prior research has found:
- Student notebook scores are exchangeable for direct-observation scores. Notebook scores were slightly more reliable for students with prior experience of hands-on instruction.
- Interrater reliability is very high. Teachers can be trained to judge performance consistently.
- Reliability is somewhat higher for observed hands-on performance than for notebooks.
- This type of assessment correlates only moderately with a multiple-choice test, suggesting that the two measure different aspects of science achievement.
- Intertask reliability is low to moderate. This is a concern because several tasks are needed to generalize a student's performance to the domain.
Sources:
- Baxter & Shavelson (1994), "Evaluation of Procedure-Based Scoring for Hands-On Science Assessment"
- Shavelson, Gao & Baxter (1993), "Sampling Variability of Performance Assessments"
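The intertask concern can be made concrete with the Spearman-Brown prophecy formula from classical test theory. The formula predicts the reliability of a score averaged over k comparable tasks: ρ_k = kρ / (1 + (k - 1)ρ).

This is a swapped-in illustration. It is not necessarily the method used in the studies cited above. The single-task reliability of 0.3 used below is a made-up value, chosen only to show the arithmetic.

```python
def spearman_brown(rho_single: float, k: int) -> float:
    """Predicted reliability of a score averaged over k parallel tasks."""
    return k * rho_single / (1 + (k - 1) * rho_single)


def tasks_needed(rho_single: float, rho_target: float, tol: float = 1e-9) -> int:
    """Smallest whole number of tasks whose predicted reliability reaches rho_target.

    Counts upward instead of rounding the closed form
    k = rho_target * (1 - rho_single) / (rho_single * (1 - rho_target)),
    which avoids ceil() overshooting on floating-point noise.
    """
    if not (0 < rho_single < 1 and 0 < rho_target < 1):
        raise ValueError("reliabilities must lie strictly between 0 and 1")
    k = 1
    # Terminates because spearman_brown approaches 1 as k grows.
    while spearman_brown(rho_single, k) < rho_target - tol:
        k += 1
    return k
```

With a hypothetical single-task reliability of 0.3, `tasks_needed(0.3, 0.8)` returns 10. Nine tasks predict about 0.79 and ten predict about 0.81.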
Observational Data (n = 1)
Considerations:
- Possible bias due to the student's familiarity with the test and scoring criteria
- Only single-variable procedures were used in each method
- Equalization (e.g., equal-sized pieces) was Keri's primary concern
First trial: used a dropper to saturate equal pieces; counted the drops and compared.
Second trial: weighed equal pieces on a scale.
Third trial: used a set amount of water in a beaker to saturate each piece; compared the amount of water left in the beaker.
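The third trial rests on simple logic: water absorbed equals the starting volume minus what is left in the beaker. That logic can be sketched as follows, for example to score a simulated version of the strategy. The sketch assumes every towel starts with the same volume, which is itself a controlled variable. The towel labels and volumes are invented for illustration.

```python
def water_absorbed(start_ml: float, remaining_ml: dict[str, float]) -> dict[str, float]:
    """Water each towel soaked up, assuming every towel started with start_ml."""
    absorbed = {}
    for towel, left in remaining_ml.items():
        if not 0 <= left <= start_ml:
            raise ValueError(f"towel {towel}: remaining volume {left} outside 0..{start_ml}")
        absorbed[towel] = start_ml - left
    return absorbed


def rank_towels(absorbed: dict[str, float]) -> list[str]:
    """Towel labels ordered from most to least water absorbed."""
    return sorted(absorbed, key=absorbed.get, reverse=True)
```

Take 100 ml per towel, with 62, 45, and 71 ml left for hypothetical towels A, B, and C. The absorbed amounts are then 38, 55, and 29 ml, so `rank_towels` returns `['B', 'A', 'C']`: B absorbs the most and C the least.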
Valid for what kinds of instructional goals?
“Paper Towels” is a performance assessment designed to support a hands-on, inquiry-based approach to learning and teaching, which is essential in the primary grades. Its instructional goals:
- To develop scientific reasoning, critical thinking, and inquiry/curiosity (3)
- To design and conduct sound scientific experiments, including identifying and controlling variables (developing procedural understanding) (4)
- To apply previously learned skills to new situations (6)
- To develop skills in, and appreciation for, careful observation and measurement (5)
Sources:
- Science for All Children: A Guide to Improving Elementary Science Education in Your School District (1997), National Science Resources Center of the National Academy of Sciences and the Smithsonian Institution, ISBN 0-309-05297-1. http://www.si.edu/nsrc/ and http://books.nap.edu/nap-cgi/morehits.cgi?file=21-31.htm&term=inquiry&isbn=0309052971&display=
  p. 22: “…it is important to begin building children’s experiential base in the primary grades by providing research-based, inquiry-centered experiences…”
  p. 23: “…inquiry-centered experiences generate one of the most essential ingredients of learning – curiosity”
- http://www.exploratorium.edu/IFI/resources/index.html
- “Inquiry: Thoughts, Views, and Strategies for the K-5 Classroom,” Foundations, Vol. 2, Division of Elementary, Secondary and Informal Education, National Science Foundation. http://www.nsf.gov/pubs/2000/nsf99148/htmstart.htm
  (3) Ch. 1, Peter Dow: “…Dewey believed that the ability to reason scientifically was an essential skill for coping with the complexities of modern life, and he warned that failure to cultivate such skills risked "a return to intellectual and moral authoritarianism" …”
  (4) Ch. 2, Hubert Dyasi: “The National Science Education Standards, developed by the National Research Council (1996), elaborate major components of learning and teaching science through inquiry. "Students at all grade levels and in every domain of science," it states, "should have the opportunity to use scientific inquiry and develop the ability to think and act in ways associated with inquiry, including asking questions, planning and conducting investigations, using appropriate tools and techniques to gather data, thinking critically and logically about relationships between evidence and explanations, constructing and analyzing alternative explanations, and communicating scientific arguments."”
  (5) Ch. 7, Doris Ash: “When learners interact with the world in a scientific way, they find themselves observing, questioning, hypothesizing, predicting, investigating, interpreting, and communicating. These are often called the "process skills" of science. Process skills play a critical role in helping children develop scientific ideas.”
- (6) Baxter et al., “Hands…” (spring 1992): “…tests must go beyond the correct response and focus on students’ conceptual understanding, problem solving skills, and ability to apply knowledge and understanding to novel situations.”
What is the utility of this hands-on performance assessment?
(+)
- Provides richer feedback on the student’s ability to solve problems using scientific reasoning
- The experience is instructional and non-intimidating
- Allows you to characterize students’ procedural strengths and weaknesses (good feedback to the teacher)
- It’s much more collaborative and fun!
(-)
- More costly than a multiple-choice, paper-and-pencil test (new and additional hands-on materials)
- Requires more time (cost) to score and to train scorers; requires trained scorers (harder to score)
- Requires more set-up time
- Requires more class time to administer
- The lab notebook depends on literacy skills
- It is weak in content
Reliability
- Inter-rater: very high, .94 (according to the authors)
- Inter-task: not applicable; only one task is offered
- Sources of error: very few opportunities for discrepancy; limited to judgment calls:
  - care taken?
  - saturated?
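The slide does not say how the authors computed their .94 figure. One common way to check agreement between two scorers is sketched below in Python. It computes a Pearson correlation between two raters' total scores, and simple percent agreement on a yes/no judgment call such as "saturated?". Both indices are illustrative choices and may differ from the statistic behind the reported .94.

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation between two raters' scores for the same students."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a rater's scores are constant")
    return sxy / math.sqrt(sxx * syy)


def percent_agreement(a: list[bool], b: list[bool]) -> float:
    """Share of students on whom two raters made the same yes/no judgment."""
    if len(a) != len(b) or not a:
        raise ValueError("need two equal-length, non-empty lists")
    return sum(1 for u, v in zip(a, b) if u == v) / len(a)
```

For example, `percent_agreement([True, True, False, True], [True, False, False, True])` is 0.75.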
What is the construct validity of the PT assessment?
Instructional goals, and what PT measures for each:
- To develop reasoning, critical thinking, and inquiry skills
  - Effective explanation of the principles underlying the performance task
  - Ability to apply goal-directed, efficient strategies (problem solving)
  - Ability to self-monitor, continually checking their thinking and reasoning to determine the result
- To design and conduct a sound scientific experiment
  - Ability to assess the nature of the problem and construct a plan (mental model) for carrying out the strategy
- To transfer their knowledge
  - Problem-solving skills (since PT is content lean)
- To develop skills in, and appreciation for, careful observation and measurement
  - Care in measuring and/or saturation
What is the construct validity of the PT assessment? (continued)
Content validity: The instructional goals of the lesson parallel what PT measures (as shown in the table).
Cognitive process: PT assesses students’ ability to explain the underlying principles, construct a plan (mental model) to initiate the strategy, devise strategies to solve the problem, and monitor their thinking and reasoning, all while PERFORMING the task (vs. recalling or reciting). These cognitive processes indicate effective learning of the subject matter (Baxter & Elder, 1996).
Performance level: Research by Baxter et al. (1992) indicates that PA measures correlated less with traditional ability than did an MC achievement test; i.e., this PA doesn’t measure declarative knowledge.
Exchangeability with other methods (MC, CR, teacher observation?): Research has shown that notebook scores from a performance assessment such as PT are exchangeable for direct-observation scores. As a content-lean performance assessment, however, PT measures different aspects of science achievement (problem-solving skills) than multiple-choice (MC) and constructed-response (CR) tests, which concentrate more on domain knowledge. Hence they are not exchangeable.
Implications for Virtualizing PT
- Real-world mistakes vs. virtual-world mistakes: how many can/should we accommodate? (drop size, scale validity, scattered moisture, size of towels…)
- How many paths (scenarios) can we account for?
- Cross-method results (combining results from multiple experiments)
- Can human observation really be replaced by computer tracking?
- Now that we’re not using the notebooks, should we bring back tracking of “care” (e.g., alarms or flags set for missing “care steps”)?
- Do we still need to give students some hands-on materials?
- Can we really do this??
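One way to approach the question about flagging missing “care steps” is to scan the simulation’s action log after a student finishes. The sketch below assumes a hypothetical log of (towel, action) pairs. It also assumes two hypothetical required care steps, drawn from the response-format questions “Same size?” and “Completely wet?”. A real simulation would define its own actions and rules, and might also check the order of steps.

```python
from collections import defaultdict

# Hypothetical action names standing in for the care steps "same size?" and
# "completely wet?"; a real simulation would define its own event vocabulary.
REQUIRED_CARE_STEPS = ("cut_same_size", "confirm_saturated")


def flag_missing_care(events: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Map each towel seen in the log to the required care steps never logged for it."""
    done: dict[str, set[str]] = defaultdict(set)
    for towel, action in events:
        done[towel].add(action)
    flags = {}
    for towel, actions in done.items():
        missing = [step for step in REQUIRED_CARE_STEPS if step not in actions]
        if missing:
            flags[towel] = missing
    return flags
```

For example, suppose towel A is logged with both care steps, while towel B is only cut and then wetted. `flag_missing_care` then returns `{"B": ["confirm_saturated"]}`, which the simulation could surface as a flag rather than a notebook entry.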