Evaluation, Testing and Assessment
June 9, 2011
Curriculum Evaluation
Necessary to determine:
– How the program works
– How successfully it works
– Whether it meets learners’ needs
– Whether more teacher training is needed
– Whether learners are in fact learning
Purposes of Evaluations
Program accountability
– Examines the effects of the program
Program development
– Designed to improve the quality of the program
Types of Program Evaluation
Formative: focuses on ongoing development
– Are placement tests placing students accurately?
– Is the methodology used by teachers effective?
– Is the pace of the material adequate?
Illuminative: looks at how different aspects of the program work
– What types of error correction strategies are used?
– How do teachers actually use their lesson plans?
– What are the teacher-student and student-student interaction patterns?
Summative: determines the effectiveness of the program
– Did the program achieve its goals?
– What did students learn?
– Were placement tests adequate?
Who is involved in evaluation?
Insiders
– Teachers: can provide formative evaluation of the course
– Students: can provide summative evaluation regarding individual improvement and effective teaching
Outsiders
– Can provide objective perspectives
– Examples: consultants, administrators
Quantitative vs. Qualitative Evaluations
Quantitative
– Expressed numerically
– Usually requires a large pool of participants
– Data can be analyzed statistically and conclusions drawn
– Formats may include tests, checklists, surveys, and self-ratings
Qualitative
– Subjective
– Uses a small pool
– Collects information from natural settings
– Formats may include observations, interviews, case studies, etc.
Testing
Test: a method of measuring a person’s ability, knowledge, or performance in a given domain
– Tests occur at designated times
– Testing is a subset of assessment
– Method: must be clear and structured, e.g., multiple choice, a writing prompt with a rubric, or an oral interview with a checklist
– Measure: from a measure of an individual’s abilities, an inference about general competence can be made
(Adapted from Brown, 2004)
Assessments
Assessment: an ongoing process that encompasses a wider domain than testing
– Occurs throughout the class period
– Classroom activities are opportunities for assessment, but students should have the opportunity to take chances without being penalized
Informal vs. Formal Assessments
Informal
– Comments and responses
– Smiley faces
– Notes in the margin
Formal
– Tests
– Journals
– Systematic observations
Formative vs. Summative Assessment
Formative
– The majority of assessments are formative
– Informal
– Focuses on the continual development of the learner’s language
Summative
– Occurs at a designated time
– Measures what students have learned over a period of time
– Should still provide feedback
Norm-referenced vs. Criterion-referenced Tests
Norm-referenced tests
– An individual’s score is compared to the mean score of a norm group
– Quantitative results
– Predetermined answers
– Examples: SAT, NJASK, HSPA (most are not norm-referenced for ELLs)
Criterion-referenced tests
– The individual gets feedback
– Material is linked to the curriculum
– Results in grades
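To make the contrast concrete, the short Python sketch below interprets one raw score both ways. It is an illustration only: the norm group, the 100-point maximum, and the letter-grade cut-offs in `CUTOFFS` are hypothetical, and percentile rank is computed here as the percentage of the norm group scoring below the student, which is one of several conventions.

```python
from statistics import mean, pstdev

# Hypothetical grade cut-offs (percent correct -> grade), highest first.
CUTOFFS = [(90, "A"), (80, "B"), (70, "C"), (60, "D")]


def norm_referenced(score, norm_group):
    """Interpret a score relative to other test takers: z-score and percentile rank."""
    mu = mean(norm_group)
    sigma = pstdev(norm_group)  # population standard deviation of the norm group
    z = (score - mu) / sigma
    # Percentile rank here = percentage of the norm group scoring below this score.
    below = sum(1 for s in norm_group if s < score)
    return z, 100 * below / len(norm_group)


def criterion_referenced(score, max_score, cutoffs=CUTOFFS):
    """Interpret a score against fixed, curriculum-based cut-offs, ignoring other students."""
    pct = 100 * score / max_score
    for threshold, grade in cutoffs:
        if pct >= threshold:
            return grade
    return "F"


norm_group = [55, 60, 65, 70, 75]  # hypothetical norm-group scores
z, pr = norm_referenced(75, norm_group)
grade = criterion_referenced(75, 100)
print(f"z = {z:.2f}, percentile rank = {pr:.0f}, grade = {grade}")
```

The same score of 75 is the top score in this small made-up group (z of about 1.41) but earns only a C against the fixed cut-offs. The norm-referenced reading would shift if the group changed; the criterion-referenced grade would not.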
History of Testing
Discrete-point tests: each strand is tested separately
Integrative tests: combine strands in testing but do not provide a measure of overall performance ability
– Cloze: passages with missing words to fill in
– Dictation: writing down what is heard
Communicative tests: include grammatical, textual, illocutionary (speech acts, e.g., requesting, warning), and sociolinguistic components
Performance-based assessment: usually interactive and less formal, e.g., an oral interview
Multiple intelligences: moving beyond simple IQ tests
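The cloze format can be generated mechanically with fixed-ratio deletion (blanking every nth word). The Python sketch below uses a hypothetical helper, `make_cloze`, not a standard tool. The ratio `n` and the number of intact opening words `lead_in` are illustrative defaults. Because it splits on whitespace, any punctuation attached to a deleted word is removed along with it.

```python
def make_cloze(text, n=7, lead_in=5):
    """Fixed-ratio cloze: keep the first `lead_in` words, then blank every nth word.

    Returns the gapped passage and the answer key (exact-word scoring).
    """
    words = text.split()
    passage, key = [], []
    for i, word in enumerate(words):
        # Count positions only after the intact opening stretch.
        if i >= lead_in and (i - lead_in + 1) % n == 0:
            key.append(word)
            passage.append(f"({len(key)}) ______")
        else:
            passage.append(word)
    return " ".join(passage), key


text = ("The library opens at nine in the morning and closes at six. "
        "Students may borrow up to five books at a time.")
passage, key = make_cloze(text, n=5, lead_in=3)
print(passage)
print("Answer key:", key)
```

The returned key supports exact-word scoring, where only the deleted word counts. Acceptable-word scoring, which also credits contextually appropriate alternatives, would need a teacher-built list of alternatives for each blank.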
Computer-Based Testing
– CAT: Computer Adaptive Testing
– Discussion: what are your thoughts on computers grading essays?
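The sketch below gives a rough illustration of the adaptive idea behind CAT using a simple up/down rule. A correct answer moves the test taker to a harder item, and an incorrect answer moves them to an easier one. This is a deliberate simplification. Operational adaptive tests typically select items and estimate ability with item response theory rather than a staircase. The item bank, the difficulty levels 1 to 5, and the simulated student are all hypothetical.

```python
def run_adaptive_test(item_bank, answer_fn, num_items=6, start_level=3):
    """Staircase-style adaptive test.

    item_bank: dict mapping consecutive integer difficulty levels to lists of items.
    answer_fn: callable taking an item and returning True if answered correctly.
    Returns the final difficulty level (a crude ability estimate) and the history.
    """
    lowest, highest = min(item_bank), max(item_bank)
    next_index = {level: 0 for level in item_bank}  # next unused item per level
    level = start_level
    history = []
    for _ in range(num_items):
        pool = item_bank[level]
        if next_index[level] >= len(pool):
            break  # no unused items left at this difficulty
        item = pool[next_index[level]]
        next_index[level] += 1
        correct = answer_fn(item)
        history.append((level, correct))
        # Harder item after a correct answer, easier after an incorrect one.
        level = min(level + 1, highest) if correct else max(level - 1, lowest)
    return level, history


# Hypothetical bank: 10 items at each difficulty level 1-5, items tagged (level, number).
bank = {lvl: [(lvl, i) for i in range(10)] for lvl in range(1, 6)}


def simulated_student(item):
    # Simulated test taker who answers every item at level 3 or below correctly.
    return item[0] <= 3


final_level, history = run_adaptive_test(bank, simulated_student)
print("Final level:", final_level)
print("Items given (level, correct):", history)
```

With this simulated student, the test settles into alternating between levels 3 and 4, which is how a staircase brackets ability. A real CAT often also stops based on how precisely ability has been estimated, not after a fixed number of items.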
5 Principles of Language Assessment
– Practicality
– Reliability
– Validity
– Authenticity
– Washback
Practicality
Tests must be practical, which means they should be:
– Not too expensive
– Not too time-consuming (to administer and to grade)
– Easy to administer
– Scored with a predetermined scoring procedure
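Predetermined scoring is what keeps objective formats quick to grade, because once the key is fixed, scoring is mechanical. The following is a minimal Python sketch that assumes a hypothetical four-item multiple-choice key. Answers are compared case-insensitively, and a blank answer counts as wrong.

```python
# Hypothetical answer key fixed before the test is given: item number -> correct option.
ANSWER_KEY = {1: "B", 2: "D", 3: "A", 4: "C"}


def score_multiple_choice(responses, key=ANSWER_KEY):
    """Score one student's responses against a predetermined key.

    responses: dict item number -> chosen option (missing items count as wrong).
    Returns (raw score, list of missed item numbers).
    """
    missed = [item for item, correct in key.items()
              if responses.get(item, "").strip().upper() != correct]
    return len(key) - len(missed), missed


raw, missed = score_multiple_choice({1: "b", 2: "D", 3: "C"})
print(f"Score: {raw}/{len(ANSWER_KEY)}, missed items: {missed}")
```

Returning the missed items alongside the total lets the same pass through the key support feedback on specific items, not just a score.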
Reliability
Tests must be consistent and dependable. Factors that make a test unreliable include:
– Learner-related issues, e.g., when a student has a bad day
– Inter-rater issues, when two or more scorers are not consistent with each other
– Intra-rater issues, when a scorer is not consistent over time
– Test-administration issues, when the setting, timing, and other conditions are not consistent
– Test reliability issues, when the test is unclear or too long
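Inter-rater consistency can be checked numerically when two raters score the same set of papers. The Python sketch below computes simple percent agreement and Cohen’s kappa, a widely used statistic that corrects agreement for what would be expected by chance. Kappa is not named in the slide and is brought in here as one common option. The rubric scores are hypothetical. The same functions can check intra-rater consistency by comparing one rater’s scores on the same papers at two different times.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Proportion of items on which the two raters gave the same score."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equal-length, non-empty score lists")
    return sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(rater_a)
    p_observed = percent_agreement(rater_a, rater_b)
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    categories = set(counts_a) | set(counts_b)
    # Chance agreement: probability both raters pick the same category independently.
    p_chance = sum((counts_a[c] / n) * (counts_b[c] / n) for c in categories)
    if p_chance == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical scores from two raters on ten essays (rubric bands 1-4).
rater_1 = [3, 2, 4, 3, 1, 2, 3, 4, 2, 3]
rater_2 = [3, 2, 3, 3, 1, 2, 4, 4, 2, 2]
print(f"Agreement: {percent_agreement(rater_1, rater_2):.2f}")
print(f"Cohen's kappa: {cohens_kappa(rater_1, rater_2):.2f}")
```

Here the raters agree on 7 of the 10 essays. Kappa still comes out lower than 0.70 because some of that agreement would be expected by chance alone.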
Validity
Tests must produce results that are appropriate, meaningful, and useful (Gronlund, 1998)
– Content validity: Does the test relate to the material covered in class?
– Criterion-related validity: Does the test measure specific objectives?
– Concurrent validity: Does the result on the test match real-world ability?
– Predictive validity: Does the test place the student at the correct level?
– Construct validity: Does the test support theoretical constructs (theories, hypotheses, etc.)?
– Face validity: Does the test appear, to the students, to measure what it claims to measure?
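Criterion-related evidence such as concurrent validity is often checked by correlating test scores with another measure of the same ability. The Python sketch below computes a Pearson correlation coefficient. The paired data (test scores and oral interview ratings for five students) are hypothetical, and a correlation is only one piece of validity evidence. The same calculation against a later outcome, such as end-of-course performance, would bear on predictive validity.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two pairs")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    # Note: a list of identical scores has zero spread, and r is undefined.
    return cov / (spread_x * spread_y)


# Hypothetical data: new test scores vs. independent oral interview ratings (1-5).
test_scores = [62, 70, 75, 81, 90]
interview_ratings = [2, 3, 3, 4, 5]
print(f"r = {pearson_r(test_scores, interview_ratings):.2f}")
```

A value near +1 means students who score high on the test also tend to be rated high in the interview. It does not by itself show that either measure reflects real-world ability.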
Authenticity
The test must have meaning in the real world
– Tasks represent real-world tasks
– Language is authentic (not textbook language)
– Material is contextualized, e.g., authentic materials such as newspaper articles, signs, and pictures are included rather than random sentences or fabricated materials
Washback
Washback is the effect of testing on teaching and learning (Hughes, 2003)
– Teaching to the test
– Providing feedback on work so that students can accurately achieve the final goal (e.g., comments on rough drafts)
Washback helps develop learners’ self-confidence, interlanguage, language ego, and motivation.