Cross-Lingual Content Scoring

Slides:



Advertisements
Similar presentations
Rationale for a multilingual corpus for machine translation evaluation Debbie Elliott Anthony Hartley Eric Atwell Corpus Linguistics 2003, Lancaster, England.
Advertisements

Pseudo-Relevance Feedback For Multimedia Retrieval By Rong Yan, Alexander G. and Rong Jin Mwangi S. Kariuki
ENGLISH LEARNING FOR NON- NATIVE CHILDREN AROUND THE WORLD: SHOULD IT BE “SINK OR SWIM” APPROACH? By Majida Mehana, Ph.D.
Keeping A Scientific Journal I. Outline of each science project or journal entry (400 points) A. Write a Topic/Title at the top of the page (10) [Page1]
Teaching & Assessing English Learners on California’s Standards © Northern California Comprehensive Assistance Center, WestEd, 2001 John Carr
MT Evaluation: Human Measures and Assessment Methods : Machine Translation Alon Lavie February 23, 2011.
Human Judgements in Parallel Treebank Alignment Martin Volk, Torsten Marek, Yvonne Samuelsson University of Zurich and Stockholm University
® Towards Using Structural Events To Assess Non-Native Speech Lei Chen, Joel Tetreault, Xiaoming Xi Educational Testing Service (ETS) The 5th Workshop.
Learning on Probabilistic Labels Peng Peng, Raymond Chi-wing Wong, Philip S. Yu CSE, HKUST 1.
Another addition to the ACER assessment suite….
The aim of this part of the curriculum design process is to find the situational factors that will strongly affect the course.
MACHINE TRANSLATION A precious key to communicate beyond linguistic barriers 1.
A New Approach for Cross- Language Plagiarism Analysis Rafael Corezola Pereira, Viviane P. Moreira, and Renata Galante Universidade Federal do Rio Grande.
Multilingual Word Sense Disambiguation using Wikipedia Bharath Dandala (University of North Texas) Rada Mihalcea (University of North Texas) Razvan Bunescu.
UDEnglish.wikispaces.com Patricia Pereira Lerma.  UDEnglish is a WIKI space which works with the multiple students skills of A1, this material is ideal.
Intelligent Database Systems Lab 國立雲林科技大學 National Yunlin University of Science and Technology Advisor : Dr. Hsu Student : Sheng-Hsuan Wang Department.
The use of machine translation tools for cross-lingual text-mining Blaz Fortuna Jozef Stefan Institute, Ljubljana John Shawe-Taylor Southampton University.
Chapter 6 ~~~~~ Oral And English Language Learner/Bilingual Assessment.
LEARNING PRIORITY OF TECHNOLOGY PROCESS SKILLS AT ELEMENTARY LEVEL Hung-Jen Yang & Miao-Kuei Ho DEPARTMENT OF INDUSTRIAL TECHNOLOGY EDUCATION THE NATIONAL.
Classifying Tags Using Open Content Resources Simon Overell, Borkur Sigurbjornsson & Roelof van Zwol WSDM ‘09.
2012: Monolingual and Crosslingual SMS-based FAQ Retrieval Johannes Leveling CNGL, School of Computing, Dublin City University, Ireland.
2010 Failures in Czech-English Phrase-Based MT 2010 Failures in Czech-English Phrase-Based MT Full text, acknowledgement and the list of references in.
Retrieval Models for Question and Answer Archives Xiaobing Xue, Jiwoon Jeon, W. Bruce Croft Computer Science Department University of Massachusetts, Google,
Unsupervised Constraint Driven Learning for Transliteration Discovery M. Chang, D. Goldwasser, D. Roth, and Y. Tu.
An ICALL writing support system tunable to varying levels of learner initiative Karin Harbusch 1 & Gerard Kempen 2,3 1 University of Koblenz-Landau, Koblenz,
Combining multiple learners Usman Roshan. Bagging Randomly sample training data Determine classifier C i on sampled data Goto step 1 and repeat m times.
Stefan Mutter, Mark Hall, Eibe Frank University of Freiburg, Germany University of Waikato, New Zealand The 17th Australian Joint Conference on Artificial.
ISQS 6347, Data & Text Mining1 Ensemble Methods. ISQS 6347, Data & Text Mining 2 Ensemble Methods Construct a set of classifiers from the training data.
Assessing Learners with Special Needs: An Applied Approach, 6e © 2009 Pearson Education, Inc. All rights reserved. Chapter 5: Introduction to Norm- Referenced.
A Repetition Based Measure for Verification of Text Collections and for Text Categorization Dmitry V.Khmelev Department of Mathematics, University of Toronto.
How to conduct a valid experiment.
In-Service Teacher Training Assessment in IGCSE English as a Second Language 0510 Session 2: Question papers and mark schemes.
Managing Science Projects Phase 1 – Creating Science Fair ideas Phase 2 – Planning Phase 3 – Collecting Data and Analyzing Results Phase 4 – Creating the.
USE RECIPE INGREDIENTS TO PREDICT THE CATEGORY OF CUISINE Group 7 – MEI, Yan & HUANG, Chenyu.
Information Transfer through Online Summarizing and Translation Technology Sanja Seljan*, Ksenija Klasnić**, Mara Stojanac*, Barbara Pešorda*, Nives Mikelić.
Hello, Who is Calling? Can Words Reveal the Social Nature of Conversations?
Chapter 7: The Distribution of Sample Means. Frequency of Scores Scores Frequency.
Learning Photographic Global Tonal Adjustment with a Database of Input / Output Image Pairs.
Analysis of Experiments on Hybridization of different approaches in mono and cross-language information retrieval DAEDALUS – Data, Decisions and Language,
Describing a Score’s Position within a Distribution Lesson 5.
KS1 SATS Guidance for Parents
Acquisition of Character Translation Rules for Supporting SNOMED CT Localizations Jose Antonio Miñarro-Giménez a Johannes Hellrich b Stefan Schulz a a.
Experimental Design © 2013 Project Lead The Way, Inc.Principles of Biomedical Science.
Arnar Thor Jensson Koji Iwano Sadaoki Furui Tokyo Institute of Technology Development of a Speech Recognition System For Icelandic Using Machine Translated.
Assessing SNOMED CT for Large Scale eHealth Deployments in the EU Workpackage 2- Building new Evidence Daniel Karlsson, Linköping University Stefan Schulz,
AP Biology: Standard Deviation and Standard Error of the Mean
End-To-End Memory Networks
Investigating Pitch Accent Recognition in Non-native Speech
Experimental Design Principles of Biomedical Science
AP Biology: Normal Distribution
Two Sample Tests When do use independent
LACONEC A Large-scale Multilingual Semantics-based Dictionary
Science Fair Board Set Up and Example Experiment
Natural Language Processing of Knee MRI Reports
Sugar Cube Lab.
Anastassia Loukina, Klaus Zechner, James Bruno, Beata Beigman Klebanov
An ICALL writing support system tunable to varying levels
Experimental Design Principles of Biomedical Science
AP Biology: Standard Deviation and Standard Error of the Mean
iSRD Spam Review Detection with Imbalanced Data Distributions
Sadov M. A. , NRU HSE, Moscow, Russia Kutuzov A. B
Chapter 7: The Distribution of Sample Means
Deputy Commissioner Jeff Wulfson Associate Commissioner Michol Stapel
Doing t-tests by hand.
Thinking critically with psychological science
KS1 SATS Guidance for Parents
KS1 SATS 2019 at Olton Primary KS1 SATS Guidance for Parents
Extracting Why Text Segment from Web Based on Grammar-gram
Active AI Projects at WIPO
1-P-30 Speech-to-Speech Translation using Dual Learning and Prosody Conversion Zhaojie Luo, Yoichi Takashima, Tetsuya Takiguchi, and Yasuo Ariki (Kobe.
Presentation transcript:

Cross-Lingual Content Scoring Andrea Horbach, Sebastian Stennmanns, Torsten Zesch University Duisburg-Essen, Germany

Cross-Lingual Content Scoring - Motivation Core Idea Content scoring of students‘ free-text answers with training and test data in different languages Foster educational equality non-native speaker might know the answer, but is unable to express it teachers ignore spelling and grammar for content scoring correctness of content not language-specific Overcome data sparsety re-use existing training data in different language Training Data Test Data

Cross-Lingual Scoring – Core Idea The Standard Monolingual Content Scoring Case Question: After reading the groups procedure, describe what additional information you would need in order to replicate the experiment. Training Data Test Data LA1: Some additional information you will need are the material. You also need to know the size of the container LA2: The additional information you need is one, the amount of vinegar you poured in each container, two, label the containers. LA1000: You would need to know how many ml of vinegar they used, how much distilled water to rinse the samples with and how they obtained the mass of each sample. LA10001: I would need to know the exact amount of vinegar in each container. train & apply model

Cross-Lingual Scoring – Core Idea LA1: Some additional information you will need are the material. You also need to know the size of the container LA2: The additional information you need is one, the amount of vinegar you poured in each container, two, label the containers. LA1000: Es fehlt der Säuregehalt des Essigs. Die Menge Essig die verwendet wurde. Und welche Holzart da Holzsorten unterschiedliche Säureresistenz aufweist. LA1001: Wir müssen wissen, wie viel Wasser wir sammeln müssen, um die Probe zu machen Training Data Test Data ? Cross-Lingual Scoring

Cross-Lingual Scoring – Core Idea Training Data Test Data LA1: Some additional information you will need are the material. You also need to know the size of the container LA2: The additional information you need is one, the amount of vinegar you poured in each container, two, label the containers. MT LA1: Einige zusätzliche Informationen, die Sie benötigen, sind das Material. Sie müssen auch die Größe des Containers kennen LA2: Die zusätzliche Information, die Sie brauchen, ist eine, die Menge an Essig, die Sie in jeden Behälter gießen, zwei, beschriften Sie die Behälter. LA1000: Es fehlt der Säuregehalt des Essigs. Die Menge Essig die verwendet wurde. Und welche Holzart da Holzsorten unterschiedliche Säureresistenz aufweist. LA1001: Wir müssen wissen, wie viel Wasser wir sammeln müssen, um die Probe zu machen train & apply model

Basic Experimental Setup: Using Machine Translation Translating Training Data Training Test Translate Training Test Translating Test Data

Challenges of Cross-lingual Scoring Data Collection Outline Challenges of Cross-lingual Scoring Data Collection Content Scoring Experiments Horbach, Stennmanns, Zesch - Cross-Lingual Content Scoring | BEA 2018

Challenges for Automatic Scoring Quality of machine translation spelling errors: translation errors or normalization? the vinegar → der Essig but separate → getrennt the vineger → der Vineger seperate → getrennt „Translationese“ Nature of bi-lingual datasets different learner populations language and culture dependence of prompts If both the (US)-President and the Vice President can no longer serve, who becomes President? vs. vs.

Pre-Study: Influence of MT Quality on Monolingual Scoring Machine Translation google translate: English to German, English to Russian DeepL: English to German Data: 3 prompts from ASAP-2 Translating Training and Test Data Training Test QWK Content Scoring Setup Weka SVM classifier Token n-grams Character n-grams

Collecting a Cross-Lingual Dataset Option 1: Collecting data in two languages Full control over data collection Time & cost-intensive Option 2: Extend existing dataset in another language Use existing data for English Re-collect data for the same prompts in, e.g., German Requirements: Prompt material available Language- and culture-independent Curriculum-independent Scoring guidelines available/applicable

Suitability of existing datasets ASAP-2 PG Sem Mohler -Eval &Mihalcea Prompt available? ✔ ✔ ✘ ✔ Culture independent? (✔) ✘ ✔ ✔ Curriculum independent? ✔ ✔ ✔ ✘ Scoring guidelines? ✔ ✔ ✔ ✔

Recollecting ASAP in German Existing data ASAP >2000 answers per prompt US high school students 5 x ELA 2 x biology 3 x science New data ASAP-DE 300 answers per prompt crowd-sourced 3 x science

Dataset Comparison – Label Distribution rel. frequency of answers with score larger number of high-ranking answers in English Horbach, Stennmanns, Zesch - Cross-Lingual Content Scoring | BEA 2018

Dataset Comparison – Answer Length Horbach, Stennmanns, Zesch - Cross-Lingual Content Scoring | BEA 2018

Dataset Comparison – Answer Length Difference between learner populations! Horbach, Stennmanns, Zesch - Cross-Lingual Content Scoring | BEA 2018

Dataset Comparison – Linguistic Diversity Horbach, Stennmanns, Zesch - Cross-Lingual Content Scoring | BEA 2018

Dataset Comparison – Linguistic Diversity Difference partially due to language! Horbach, Stennmanns, Zesch - Cross-Lingual Content Scoring | BEA 2018

Content Scoring Results train test QWK ENall EN 0.68 baseline 0.61 DE 0.67 Horbach, Stennmanns, Zesch - Cross-Lingual Content Scoring | BEA 2018

Content Scoring Results train test QWK ENall EN 0.68 baseline 0.61 DE 0.67 translate both ENT 0.58 DET 0.66 Horbach, Stennmanns, Zesch - Cross-Lingual Content Scoring | BEA 2018

Content Scoring Results train test QWK ENall EN 0.68 baseline 0.61 DE 0.67 translate both ENT 0.58 DET 0.66 translate train 0.34 0.40 translate test 0.29 0.32 Horbach, Stennmanns, Zesch - Cross-Lingual Content Scoring | BEA 2018

Differences between Prompts train test 1 2 10 translate train ENT DE 0.49 0.08 0.46 DET EN 0.41 0.39 translate test 0.35 0.43 0.26 0.33 Horbach, Stennmanns, Zesch - Cross-Lingual Content Scoring | BEA 2018

The Influence of Translationese Maybe combining translated and original data is the problem… (A) Plastic type B was the superior in both trial 1 and trial 2. (B) Record the weight that was put on to show how much effected each plastic. Also conducting more trials (...) Type B plastic was the supervisor in both Trial 1 and Trial 2. (B) Write down the weight that was put on to show how much each one has made plastic. Also do more experiments (...) MT MT Train Test Idea: translate test data, double translate train data → but: makes little difference Horbach, Stennmanns, Zesch - Cross-Lingual Content Scoring | BEA 2018

Conclusions and Future Work We collected a German version of the ASAP-2 dataset https://github.com/ltl-ude/crosslingual First experiments on cross-lingual scoring using MT Does not work that well Results depend a lot on individual prompt Understand influence factors better: Language Learner population Machine translation artifacts Alternatives to Machine Translation: cross-lingual embeddings Thank you! → Vielen Dank! Questions? →. Fragen?