Automatic Detection of Plagiarized Spoken Responses Copyright © 2014 by Educational Testing Service. All rights reserved. Keelan Evanini and Xinhao Wang.

Presentation transcript:

Automatic Detection of Plagiarized Spoken Responses
Keelan Evanini and Xinhao Wang
Educational Testing Service
June 26, 2014
Copyright © 2014 by Educational Testing Service. All rights reserved.

Automated Detection of Plagiarized Spoken Responses
– An important application, given the increasing need for automated scoring of spontaneous speech
– Prevents one type of cheating strategy: test takers may prepare canned answers from test preparation materials before the examination
June 18, Copyright © 2014 by Educational Testing Service. All rights reserved.

Plagiarized Spoken Response in TOEFL® iBT
TOEFL® iBT is a large-scale, high-stakes assessment of English for non-native speakers.
– Independent speaking tasks ask test takers to draw upon their own ideas, opinions, and experiences in a 45-second spoken response.
Plagiarized Spoken Responses
– Test takers may attempt to game the assessment by memorizing canned material from an external source and adapting it to a question.
– This type of plagiarism can affect the validity of a test taker’s speaking score.

One Source Material:
Well, the place I enjoy the most is a small town located in France. I like this small town because it has very charming ocean view. I mean the sky there is so blue and the beach is always full of sunshine. You know how romantic it can ever be, just relax yourself on the beach, when the sun is setting down, when the ocean breeze is blowing and the seabirds are singing. Of course I like this small French town also because there are many great French restaurants. They offer the best seafood in the world like lobsters and tuna fishes. The most important, I have been benefited a lot from this trip to France because I made friends with some gorgeous French girls. One of them even gave me a little watch as a souvenir of our friendship.
One Plagiarized Response:
family is a little trip to France when I was in primary school ten years ago I enjoy this activity first because we visited a small French town located by the beach the town has very charming ocean view and in the sky is so blue and the beach is always full of sunshine you know how romantic it can ever be just relax yourself on the beach when the sun is settling down the sea birds are singing of course I enjoy this activity with my family also because there are many great French restaurants they offer the best sea food in the world like lobsters and tuna fishes so I enjoy this activity with my family very much even it has passed several years

Canned Response Collection
Step 1: Human raters flag potentially plagiarized spoken responses.
Step 2: Rater supervisors review the flagged responses by comparing them to external source material.
Step 3: If the presence of plagiarized material makes it impossible to provide a valid assessment of the test taker’s performance, a score of 0 is assigned.

Data Collection
– 719 potentially plagiarized responses; 239 canned responses with score 0
– 49 different source materials
– Approximately 300 control responses from each of the four most-frequent test questions
[Table: Data Set | N | Number of Words (Mean, Standard Deviation); rows: Sources, Plagiarized, Control]
The plagiarized responses are on average a little longer than the control responses.

Methodology (1)
Comparison between a test response and each of the 49 reference sources with 9 text-to-text similarity metrics:
– Word Error Rate (WER)
– TER and TER-Plus (Snover et al., 2006), (Snover et al., 2008)
– Four similarity metrics based on WordNet (Wu and Palmer, 1994), (Leacock and Chodorow, 1998)
– Latent Semantic Analysis
– BLEU (Papineni et al., 2002)
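
WER, the first metric in the list above, is the word-level edit distance between two texts divided by the length of the reference. The following is a minimal sketch, not the authors' implementation: it assumes whitespace tokenization, lowercasing, and the source material as the reference, none of which is stated on the slides. The function name `word_error_rate` is hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    # Assumption: simple lowercased whitespace tokenization.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(ref)
```

A low WER between a response and a source means the two share long runs of identical words, so for plagiarism detection a lower value indicates higher similarity. WER can exceed 1.0 when the hypothesis is much longer than the reference.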

Methodology (2)
4 different features for each similarity metric:
– Document-level similarity
– Single maximum similarity value from a sentence-by-sentence comparison
– Average of the similarity values for all sentence-by-sentence comparisons
– Average of the maximum similarity values for each sentence in the test response, where the maximum similarity of a sentence is obtained by comparing it with each sentence in the source text
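
The four features listed above can be sketched for any similarity function that scores a pair of texts. This is an illustration under stated assumptions: the word-overlap function `jaccard` is a stand-in chosen for brevity and is not one of the nine metrics on the previous slide, the sentences are assumed to be already segmented, and all names are hypothetical.

```python
from typing import Callable, Dict, List


def jaccard(a: str, b: str) -> float:
    """Stand-in similarity (assumption): word-set overlap in [0, 1]."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    if not words_a and not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)


def similarity_features(
    response_sents: List[str],
    source_sents: List[str],
    sim: Callable[[str, str], float] = jaccard,
) -> Dict[str, float]:
    """Compute the four features for one response/source pair."""
    if not response_sents or not source_sents:
        raise ValueError("both texts need at least one sentence")

    # Feature 1: compare the two texts as whole documents.
    document = sim(" ".join(response_sents), " ".join(source_sents))

    # Rows are response sentences, columns are source sentences.
    pairwise = [[sim(r, s) for s in source_sents] for r in response_sents]
    all_scores = [score for row in pairwise for score in row]
    best_per_response_sent = [max(row) for row in pairwise]

    return {
        "document": document,
        # Feature 2: single best sentence pair.
        "max_sentence": max(all_scores),
        # Feature 3: average over every sentence pair.
        "mean_all_pairs": sum(all_scores) / len(all_scores),
        # Feature 4: average of each response sentence's best match.
        "mean_of_max": sum(best_per_response_sent) / len(best_per_response_sent),
    }
```

With 9 metrics and 4 features each, this scheme would yield 36 feature values per response/source pair. For distance-like metrics such as WER, where lower means more similar, the maximum would presumably be replaced by a minimum; the slides do not say how this was handled.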

Experimental Setup
Experiments on both human transcriptions and automatic speech recognition (ASR) outputs
ASR system
– Trained on approximately 800 hours of TOEFL® iBT responses
– WERs were on the plagiarized set and on the control set
Maximum Entropy-based sentence boundary detection system (Chen and Yoon, 2011)

Results
[Table: Text | Features | Accuracy | Kappa; rows: Transcriptions (ALL, Document-level, Sentence-level), ASR Outputs (ALL, Document-level, Sentence-level)]
Mean accuracy and kappa value for classification results using the 239 responses in the plagiarized set and 1000 random subsets of 239 responses from the control set.
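
The evaluation protocol described in the caption (repeatedly pairing the plagiarized set with an equally sized random control subset, then averaging accuracy and kappa) can be sketched as follows. This is an outline under assumptions, not the authors' code: the slides do not name the classifier or the train/test scheme, so that part is left to a caller-supplied `evaluate` function, and all names here are hypothetical.

```python
import random
from statistics import mean
from typing import Callable, Sequence, Tuple


def accuracy_and_kappa(gold: Sequence, pred: Sequence) -> Tuple[float, float]:
    """Accuracy and Cohen's kappa for two equally long label sequences."""
    gold, pred = list(gold), list(pred)
    n = len(gold)
    observed = sum(g == p for g, p in zip(gold, pred)) / n
    # Chance agreement: product of the marginal label frequencies.
    expected = sum(
        (gold.count(label) / n) * (pred.count(label) / n)
        for label in set(gold) | set(pred)
    )
    if expected == 1.0:
        # Both sequences use a single identical label; kappa is
        # undefined, and this sketch reports perfect agreement.
        return observed, 1.0
    return observed, (observed - expected) / (1.0 - expected)


def mean_over_balanced_subsets(
    plagiarized: Sequence,
    control: Sequence,
    evaluate: Callable[[Sequence, Sequence], Tuple[float, float]],
    n_trials: int = 1000,
    seed: int = 0,
) -> Tuple[float, float]:
    """Average (accuracy, kappa) over random balanced control subsets."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_trials):
        # Draw as many control responses as there are plagiarized ones.
        subset = rng.sample(list(control), len(plagiarized))
        results.append(evaluate(plagiarized, subset))
    accuracies, kappas = zip(*results)
    return mean(accuracies), mean(kappas)
```

On a balanced two-class set, chance agreement is close to 0.5, so kappa is roughly twice the margin by which accuracy exceeds 0.5.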

Discussion and Future Work (1)
Precision was higher than the recall: vs on human transcriptions; vs on ASR outputs.
– In an operational system, it may be desirable to tune the classifier to increase the recall.
Experiments used balanced sets of canned and control responses, but the distribution of actual responses is heavily unbalanced.
– Experiments with a much larger control set will be run on ASR outputs.
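
One common way to trade precision for recall is to lower the decision threshold applied to the classifier's scores. The sketch below assumes the classifier outputs a per-response plagiarism score, which the slides do not state; the function names and the example values are hypothetical.

```python
from typing import Sequence, Tuple


def precision_recall(
    gold: Sequence[bool], scores: Sequence[float], threshold: float
) -> Tuple[float, float]:
    """Precision and recall when scores >= threshold are flagged."""
    flagged = [s >= threshold for s in scores]
    tp = sum(f and g for f, g in zip(flagged, gold))
    fp = sum(f and not g for f, g in zip(flagged, gold))
    fn = sum((not f) and g for f, g in zip(flagged, gold))
    # Convention in this sketch: report 0.0 when a ratio is undefined.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def highest_threshold_for_recall(
    gold: Sequence[bool], scores: Sequence[float], target_recall: float
) -> Tuple[float, float, float]:
    """Return (threshold, precision, recall) for the strictest threshold
    whose recall reaches the target."""
    # Candidate thresholds are the observed scores, strictest first.
    for threshold in sorted(set(scores), reverse=True):
        precision, recall = precision_recall(gold, scores, threshold)
        if recall >= target_recall:
            return threshold, precision, recall
    raise ValueError("target recall is not reachable on this data")


# Toy data: three plagiarized responses and two controls.
gold = [True, True, True, False, False]
scores = [0.9, 0.6, 0.3, 0.5, 0.1]
default = precision_recall(gold, scores, 0.5)
tuned = highest_threshold_for_recall(gold, scores, 1.0)
```

In the toy data, lowering the threshold from 0.5 to 0.3 raises recall from 2/3 to 1.0 while precision moves from 2/3 to 0.75. On real, heavily unbalanced data a lower threshold would normally flag more control responses, so any such threshold should be chosen on held-out data with a realistic class distribution.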

Discussion and Future Work (2)
Matching source texts are required: plagiarized responses based on unseen sources cannot be detected.
– Obtain additional source texts; compare a test response with all previously collected spoken responses for a given population of test takers.
The above methods may lead to a high number of false positives, especially when based on ASR outputs.
– Apply N-best lists to compute similarity metrics; introduce additional sources of information, such as stylistic patterns and prosodic features.

Questions? Comments?
Xinhao Wang, Associate Research Scientist, NLP & Speech Group at ETS.
Keelan Evanini, Managing Research Scientist, NLP & Speech Group at ETS.
Thank You!