A subtask of text simplification: replacing words or short phrases with simpler variants in a context-aware fashion
Motivation
To reach a wider range of readers with limited vocabulary
▪ Children
▪ People with low literacy or cognitive disabilities
▪ Second language learners
▪ Identification of complex words or phrases
▪ Substitute lookup
  ▪ Synonyms from a thesaurus
  ▪ Distributional similarity
▪ Context-based ranking
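The three steps above can be sketched end to end on a toy example. This is a minimal illustration, not the method of any cited paper. The frequency tables, thesaurus entries, context profiles, threshold, and the complexity heuristic (frequency in a complex corpus divided by frequency in a simple corpus, times word length) are all hypothetical assumptions. Context-based ranking is done here by cosine similarity between a bag-of-words window around the target word and each candidate's (hypothetical) context profile.

```python
import math
from collections import Counter

# Hypothetical toy counts: frequency in a "complex" and a "simple" corpus.
COMPLEX_FREQ = {"atrocities": 40, "cruelties": 10, "crimes": 60, "committed": 50}
SIMPLE_FREQ = {"atrocities": 2, "cruelties": 5, "crimes": 60, "committed": 45}

# Hypothetical thesaurus used for substitute lookup.
THESAURUS = {"atrocities": ["cruelties", "crimes"]}

# Hypothetical context profiles: words observed near each candidate in a corpus.
CONTEXT_PROFILES = {
    "cruelties": Counter({"committed": 3, "terrible": 2, "war": 2}),
    "crimes": Counter({"committed": 4, "police": 3, "court": 2}),
}


def complexity(word: str) -> float:
    """Assumed heuristic: (complex freq / simple freq) * word length.
    Unseen words get a frequency of 1 in both corpora."""
    c = COMPLEX_FREQ.get(word, 1)
    s = SIMPLE_FREQ.get(word, 1)
    return (c / s) * len(word)


def context_vector(tokens, i, window=3):
    """Bag of words within `window` tokens of position i, excluding i itself."""
    lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
    return Counter(t for j, t in enumerate(tokens[lo:hi], start=lo) if j != i)


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[k] * b[k] for k in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def simplify(sentence: str, threshold: float = 50.0) -> str:
    tokens = sentence.lower().split()
    out = list(tokens)
    for i, word in enumerate(tokens):
        # Step 1: identification of complex words.
        if complexity(word) < threshold:
            continue
        # Step 2: substitute lookup, keeping only candidates simpler than the word.
        candidates = [w for w in THESAURUS.get(word, [])
                      if complexity(w) < complexity(word)]
        if not candidates:
            continue
        # Step 3: context-based ranking by similarity to the sentence context.
        ctx = context_vector(tokens, i)
        out[i] = max(candidates,
                     key=lambda w: cosine(ctx, CONTEXT_PROFILES.get(w, Counter())))
    return " ".join(out)


print(simplify("Hitler committed terrible atrocities during the second World War"))
```

With these toy numbers, "crimes" is the simpler candidate, yet "cruelties" is chosen because its context profile overlaps more with "committed" and "terrible". That is the reason for ranking in context rather than by simplicity alone. Note that the sketch lowercases its output.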
Technical (medical) language
▪ Hypertension risk factors include obesity, ... → High blood pressure risk factors include excessive weight, ...
Legal language
▪ The Products transacted through the Service are ... → The Products managed through the Service are ...
Low-literacy readers
▪ Hitler committed terrible atrocities during the second World War → Hitler committed terrible cruelties during the second World War
Knowledge-based approach: using a thesaurus such as WordNet
▪ Hard to capture all simplification contexts
Lexical simplification as paraphrasing
▪ Paraphrasing does not specifically address complexity reduction
Lexical simplification as machine translation
▪ Requires a complex-simple parallel corpus
▪ Wikipedia-Simple Wikipedia corpora: not comparable
Simple English Wikipedia (SEW)
▪ An edition of the normal, or Complex, English Wikipedia (CEW) written with simpler constructs and a restricted vocabulary
▪ Wikipedia for children, low-literacy readers, second language readers, etc.
▪ 121,095 content pages
▪ Semi-parallel to its complex counterpart
Resource: For the sake of simplicity: Unsupervised extraction of lexical simplifications from Wikipedia, Yatskar et al.
[Figure: SEW revision history, Version 1 → Version 2 → ... → Version n, with edits between successive versions]
Edits between SEW versions are a mix of different types of edits
The task: separate the simple edits from all other edits
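One assumed way to obtain candidate edits from two consecutive versions is a word-level diff. The sketch below uses Python's standard `difflib.SequenceMatcher` and keeps only short `replace` operations, treating insertions and deletions as non-lexical. The function name `lexical_edits` and the `max_len` cutoff are hypothetical, and this is not necessarily how Yatskar et al. align revisions.

```python
import difflib


def lexical_edits(old: str, new: str, max_len: int = 3):
    """Return (old_phrase, new_phrase) pairs for short word-level replacements
    between two versions of a sentence (hypothetical helper)."""
    a, b = old.split(), new.split()
    sm = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    edits = []
    for tag, i1, i2, j1, j2 in sm.get_opcodes():
        # Keep only replacements of short spans; longer rewrites, insertions,
        # and deletions are more likely structural or content edits.
        if tag == "replace" and (i2 - i1) <= max_len and (j2 - j1) <= max_len:
            edits.append((" ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return edits


print(lexical_edits("Hypertension risk factors include obesity",
                    "High blood pressure risk factors include excessive weight"))
```

On the medical example this yields the pairs ("Hypertension", "High blood pressure") and ("obesity", "excessive weight"). Each pair is still only a candidate: it may be a simplification or some other kind of edit.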
Estimating the probability of a fix edit
SEW edits = fix edits + simple edits
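A hedged sketch of how fix and simple edits might be separated probabilistically. It rests on two assumptions: (1) SEW edits come from a mixture of simple and fix edits with a hypothetical mixing weight `lam`, and (2) edits in CEW revisions are fixes only, so CEW counts estimate P(edit | fix). Under these assumptions, P(fix | edit) = (1 - lam) * P_fix(edit) / P_SEW(edit), and P(simple | edit) is its complement, clipped to [0, 1]. This is an illustrative simplification, not the exact estimator of Yatskar et al.

```python
from collections import Counter


def p_simplification(edit, sew_edits, cew_edits, lam=0.5):
    """Hypothetical estimate of P(simple | edit) for an edit observed in SEW.

    Assumes P_sew(e) = lam * P_simp(e) + (1 - lam) * P_fix(e), with
    P_fix estimated from CEW edits, which are assumed to be fixes only.
    """
    sew, cew = Counter(sew_edits), Counter(cew_edits)
    if sew[edit] == 0:
        raise ValueError("edit never observed in SEW")
    p_sew = sew[edit] / sum(sew.values())
    p_fix = cew[edit] / sum(cew.values()) if cew else 0.0
    # Share of this edit's SEW probability mass attributed to fixes.
    p_fix_given_edit = (1 - lam) * p_fix / p_sew
    return min(1.0, max(0.0, 1.0 - p_fix_given_edit))


# Toy counts (hypothetical): an edit never seen in CEW looks like a simplification.
sew = [("atrocities", "cruelties")] * 4 + [("teh", "the")] * 4 + [("colour", "color")] * 2
cew = [("teh", "the")] * 5 + [("colour", "color")] * 5
print(p_simplification(("atrocities", "cruelties"), sew, cew))
```

Edits that occur often in CEW, such as spelling corrections, receive a low score, while edits seen only in SEW receive a high one. The clipping handles cases where the toy counts make the fix share exceed 1.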
Resource: Putting it Simply: a Context-Aware Approach to Lexical Simplification, Biran et al.
Self-study