August 6th, ISAAC 2008. Word Prediction in Hebrew: Preliminary and Surprising Results. Yael Netzer, Meni Adler, Michael Elhadad. Department of Computer Science.

Presentation transcript:

Word Prediction in Hebrew: Preliminary and Surprising Results
Yael Netzer, Meni Adler, Michael Elhadad
Department of Computer Science, Ben Gurion University, Israel
ISAAC 2008, August 6th

Outline
– Objectives and example
– Methods of word prediction
– Hebrew morphology
– Experiments and results
– Conclusions?

Word Prediction: Objectives
Ease word insertion in textual software:
– by guessing the next word
– by giving a list of possible options for the next word
– by completing a word given a prefix
General idea: guess the next word given the previous ones: [input w1 w2] → [guess w3]

Word Prediction Example
– I s_____ → verb, adverb?
– I s_____ → verb: sang? maybe. singularized? hopefully
– I saw a _____ → noun / adjective
– I saw a b____ → brown? big? bear? barometer?
– I saw a bird in the _____ → [semantics will do good]
– I saw a bird in the z____ → obvious (?)

Hebrew example (hidden slide)
הילדה שרצה כל היום התעייפה
– the-girl that-run all the-day got-tired
– the-girl swarmed all the-day

Statistical Methods
Statistical information:
– Unigrams: probability of isolated words. Independent of context; offer the most likely words as candidates.
– More complex language models (Markov models): given $w_1,\dots,w_n$, determine the most likely candidate for $w_{n+1}$.
– The most common method in applications is the unigram (see references in [Garay-Vitoria and Abascal, 2004]).
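The statistical approach can be illustrated with a minimal sketch. It assumes a toy, whitespace-tokenized English corpus and a bigram model that falls back to unigram counts; the function names `train` and `predict` and the corpus are illustrative assumptions, not the system described in the talk.

```python
from collections import Counter, defaultdict


def train(sentences):
    """Count unigrams and bigrams from an iterable of token lists."""
    unigrams = Counter()
    bigrams = defaultdict(Counter)
    for tokens in sentences:
        unigrams.update(tokens)
        for prev, nxt in zip(tokens, tokens[1:]):
            bigrams[prev][nxt] += 1
    return unigrams, bigrams


def predict(unigrams, bigrams, history, prefix="", k=5):
    """Return up to k candidates for the next word.

    Words seen after the last word of `history` come first (bigram counts);
    the remaining slots are filled from unigram counts. Only words that
    start with the already-typed `prefix` are offered.
    """
    ranked = []
    if history:
        ranked = [w for w, _ in bigrams[history[-1]].most_common()]
    ranked += [w for w, _ in unigrams.most_common() if w not in ranked]
    return [w for w in ranked if w.startswith(prefix)][:k]


# Toy corpus (hypothetical), echoing the "I saw a b____" example.
corpus = [
    "i saw a bird in the zoo".split(),
    "i saw a big bear".split(),
    "i saw a bird".split(),
]
uni, bi = train(corpus)
print(predict(uni, bi, ["i", "saw", "a"], prefix="b", k=3))
```

With an empty history the function behaves as a pure unigram predictor, which is the context-independent case listed first above.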

Syntactic Methods
Syntactic knowledge:
– Consider sequences of part-of-speech tags: [Article] [Noun] → predict [Verb]
– Phrase structure: [Noun Phrase] → predict [Verb]
– Syntactic knowledge can be statistical or based on hand-coded rules

Semantic Methods
Semantic knowledge:
– Assign semantic categories to words
– Find a set of rules that constrain the possible candidates for the next word: [eat, verb] → predict [word of category food]
– Not widely used in word prediction, mostly because it requires complex hand coding and is too inefficient for real-time operation

Word Prediction Knowledge Sources
– Corpora: texts and frequencies
– Vocabularies (can be domain specific)
– Lexicons with syntactic and/or semantic knowledge
– User's history
– Morphological analyzers
– Unknown-word models

Supporting Methods
– Recency promotion: prefer words that have been used recently
– Trigger-target method: the occurrence of a specific word raises the rank of another word
– Capitalization of proper nouns (not useful for Hebrew)
– Morphological support: automatically add inflections to words
– Distinguish fringe/core words in prediction
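Recency promotion can be sketched as a re-ranking step on top of any frequency-based candidate list. The multiplicative `boost` factor and the function name below are assumptions made for illustration; the talk does not say how recency was weighted.

```python
from collections import Counter


def rerank_with_recency(candidates, base_counts, recent_words, boost=2.0):
    """Re-rank candidates, multiplying the score of recently used words.

    `boost` is a hypothetical tuning parameter, not a value from the talk.
    """
    recent = set(recent_words)

    def score(word):
        base = float(base_counts[word])
        return base * boost if word in recent else base

    return sorted(candidates, key=score, reverse=True)


counts = Counter({"bird": 3, "bear": 2, "barometer": 1})
print(rerank_with_recency(["bird", "bear", "barometer"], counts, ["barometer", "bear"]))
```

A trigger-target scheme could reuse the same shape, with the set of boosted words derived from trigger words in the history rather than from recent usage.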

Drawbacks of Word Prediction
– Overt action is required to verify a selection
– Cognitive load:
  – "wrong candidates" distract the user from the message being composed
  – the user switches between two modes of operation: typing and selecting

Evaluation of Word Prediction
– Keystroke savings: 1 − (number of actual keystrokes / number of expected keystrokes)
– Time savings
– Overall satisfaction
  – Cognitive overload (length of choice list vs. accuracy)
A predictor is considered adequate if its hit ratio stays high as the required number of selections decreases.
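The keystroke-savings measure is simple enough to state as code. This is a direct transcription of the formula on the slide; the function name and the error handling are assumptions.

```python
def keystroke_savings(actual_keystrokes, expected_keystrokes):
    """Return 1 - actual/expected, the fraction of keystrokes saved.

    `expected_keystrokes` is the count needed without prediction.
    """
    if expected_keystrokes <= 0:
        raise ValueError("expected_keystrokes must be positive")
    return 1.0 - actual_keystrokes / expected_keystrokes


# A message that needs 100 keystrokes unaided and 50 with prediction.
print(keystroke_savings(50, 100))
```

The complementary quantity, actual/expected, is the "keystrokes needed" percentage, for which smaller is better.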

Work in Non-English Languages
Languages with rich morphology:
– n-gram-based methods offer quite reasonable prediction [Trost et al. 2005] but can be improved with more sophisticated syntactic/semantic tools
Suggestions for inflected languages (e.g. Basque):
– Use two lexicons: stems and suffixes
– Add syntactic information to dictionaries and grammatical rules to the system; offer stems and suffixes
– Combine these two approaches: offer inflected nouns

Motivation for Hebrew
– We need word prediction for Hebrew: there is no known previously published research for Hebrew.
– We wanted to test our morphological analyzer in a useful application.

Initial Hypothesis
Word prediction in Hebrew will be complicated; morphological and syntactic knowledge will be needed.

Hebrew Specificity
– Unvocalized writing causes a high level of ambiguity
– Prefixes and suffixes: prepositions, definiteness and possessives are agglutinated
– Rich morphology: inflectional, non-regular

Hebrew Ambiguity (illustrated with English words)
– Unvocalized writing: most vowels are "dropped"
  inherent → inhrnt
– Affixation: prepositions and possessives are attached to nouns
  in her note → inhrnt
  in her net → inhrnt
– Rich morphology:
  – 'inhrnt' could be inflected into different forms according to singular/plural and masculine/feminine properties → inhrnti, inhrntit, inhrntiot
  – Other morphological properties may leave 'inherent' unmodified (construct/absolute forms for noun compounding)

Ambiguity Level
These variations create a high level of ambiguity:
– English lexicon: inherent → inherent.adj
– With Hebrew word formation rules: inhrnt
  → in.prep her.pro.fem.poss note.noun
  → in.prep her.pro.fem net.noun
  → inherent.adj.masc.absolute
  → inherent.adj.masc.construct
Part-of-speech tagset:
– Hebrew: theoretically ~300K; in practice ~3.6K distinct forms
– English: tags
Number of possible morphological analyses per word:
– English: 1.4 (average number of words per sentence: 12)
– Hebrew: 2.7 (average number of words per sentence: 18)

(Real Hebrew) Morphological Ambiguity
בצלם bzlm
– בְּצֶלֶם bzelem (name of an association)
– בְּצַלֵּם b-zalem (while taking a picture)
– בְּצָלָם bzalam (their onion)
– בְּצִלָּם b-zila-m (under their shades)
– בְּצַלָּם b-zalam (in a photographer)
– בַּצַּלָּם ba-zalam (in the photographer)
– בְּצֶלֶם b-zelem (in an idol)
– בַּצֶּלֶם ba-zelem (in the idol)

Morphological Analysis
Given a written form, recover the following information:
– Lexical category (part of speech): noun, verb, adjective, adverb, preposition…
– Inflectional properties: gender, number, person, tense, status…
– Affixes:
  – Prefixes: מ ש ה ו כ ל ב (prepositions, conjunctions, definiteness)
  – Pronoun suffix: accusative, possessive, nominative

Morphological Analysis: Example
Given the form בצלם, propose the following analyses:
– בְּצֶלֶם: בצלם, proper noun
– בְּצַלֵּם: בצלם, verb, infinitive
– בְּצָלָם: בצל-ם, noun, singular, masculine
– בְּצִלָּם: ב-צל-ם, noun, singular, masculine
– בְּצַלָּם, בְּצֶלֶם: ב-צלם, noun, singular, masculine, absolute
– בְּצַלָּם, בְּצֶלֶם: ב-צלם, noun, singular, masculine, construct
– בַּצַּלָּם, בַּצֶּלֶם: ב-צלם, noun, definite, singular, masculine
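The first step of such an analysis, proposing possible prefix segmentations, can be sketched in a few lines. The sketch below only enumerates splits in which every leading letter belongs to the prefix set listed earlier (מ ש ה ו כ ל ב); it is an assumption-laden toy that does not consult a lexicon, does not handle pronoun suffixes, and is not the authors' analyzer.

```python
# Prefix letters as listed in the talk; everything else here is illustrative.
PREFIX_LETTERS = set("משהוכלב")


def candidate_prefix_splits(form):
    """Return (prefix, remainder) pairs for an unvocalized Hebrew form.

    The unsegmented reading comes first. Further splits are proposed while
    the leading letters are all possible prefix letters, and the remainder
    is never empty. A real analyzer would filter these against a lexicon.
    """
    splits = [("", form)]
    for i, letter in enumerate(form[:-1]):
        if letter not in PREFIX_LETTERS:
            break
        splits.append((form[: i + 1], form[i + 1:]))
    return splits


for prefix, rest in candidate_prefix_splits("בצלם"):
    print(repr(prefix), repr(rest))
```

For בצלם this yields the unsegmented form and the split ב + צלם; readings such as בצל-ם would additionally require suffix handling.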

Morphological Disambiguation
A difficult task in Hebrew: given a written form, select in context the correct morphological analysis out of all possible analyses.
We have developed a successful* system to perform morphological disambiguation in Hebrew [Adler et al., ACL06, ACL07, ACL08].
* 93% for POS tagging and 90% for full morphological analysis (the latter was used in this test).

Word Prediction in Hebrew
We looked at word prediction as a sample task to show off the quality of our morphological disambiguator.
But first… we checked a simple baseline.

Baseline: n-gram Methods
– Check n-gram methods (unigram, bigram, trigram)
– Four sizes of selection menus: 1, 5, 7 and 9
– Various training sets of 1M, 10M and 27M words to learn the probabilities of n-grams
– Various genres
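The effect of the selection-menu size on keystroke counts can be simulated with a small sketch. It rests on several assumptions that the talk does not spell out: a unigram-only ranking, a menu that is refreshed after every typed letter (including before the first one), one keystroke to select a menu entry, and a selection that also inserts the trailing space. All names and the toy counts are hypothetical.

```python
from collections import Counter


def keystrokes_with_prediction(words, vocabulary_counts, menu_size):
    """Count keystrokes needed to enter `words` with a completion menu."""
    ranked = [w for w, _ in vocabulary_counts.most_common()]
    total = 0
    for word in words:
        typed = 0
        while typed < len(word):
            # Most frequent vocabulary words matching the typed prefix.
            menu = [w for w in ranked if w.startswith(word[:typed])][:menu_size]
            if word in menu:
                break
            typed += 1
        # Typed letters plus one keystroke: a selection, or the space
        # after a fully typed word.
        total += typed + 1
    return total


def keystrokes_without_prediction(words):
    """Every letter plus one space per word."""
    return sum(len(w) + 1 for w in words)


counts = Counter({"the": 5, "bird": 3, "big": 2, "bear": 1})
message = ["the", "bear"]
for size in (1, 3):
    print(size, keystrokes_with_prediction(message, counts, size),
          keystrokes_without_prediction(message))
```

Dividing the first count by the second gives the "keystrokes needed" percentage for a given menu size; a larger menu can only lower or keep that count under these assumptions.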

Prediction Results Using n-grams Only
[Results table not preserved in this transcript: keystrokes needed to enter a message, in % (smaller is better).]
For the trigram model trained on the 27M-word corpus: very good results!

Adding Syntactic Information
$P(w_n \mid w_1,\dots,w_{n-1}) = \lambda_1 P(w_{n-i},\dots,w_n \mid LM) + \lambda_2 P(w_1,\dots,w_n \mid \mu)$
– $\mu$ is the morpho-syntactic HMM (the morphological disambiguator).
– Combine $P(w_1,\dots,w_n \mid \mu)$ with the probabilistic language model $LM$ in order to rank each word candidate given the previously typed words.
– If the user typed "I saw" and the next-word candidates are {him, hammer}, we use the HMM to calculate $P(\text{I saw him} \mid \mu)$ and $P(\text{I saw hammer} \mid \mu)$ in order to tune the probability given by the n-gram model.
* Trained on a 1M-word corpus.
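The linear combination above can be sketched as a ranking function. The weights and all probability values below are invented for illustration (the talk gives neither the λ values nor the model scores), and the two score dictionaries stand in for the n-gram model and the HMM.

```python
def rank_candidates(candidates, lm_score, hmm_score, lambda1=0.7, lambda2=0.3):
    """Rank candidates by lambda1 * LM score + lambda2 * HMM score.

    `lm_score[w]` stands for the n-gram probability of the sequence ending
    in w; `hmm_score[w]` stands for P(w_1..w_n | mu). The default weights
    are hypothetical.
    """
    combined = {
        w: lambda1 * lm_score[w] + lambda2 * hmm_score[w] for w in candidates
    }
    return sorted(candidates, key=combined.get, reverse=True)


# Made-up scores for the "I saw {him, hammer}" example.
lm = {"him": 0.02, "hammer": 0.03}
hmm = {"him": 0.10, "hammer": 0.01}
print(rank_candidates(["hammer", "him"], lm, hmm))
```

With these invented numbers the HMM term overturns the n-gram preference; setting `lambda2=0` recovers the pure n-gram ranking.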

Results with Morpho-syntactic Knowledge
The model uses sequences of parts of speech with morphological features.
[Results tables not preserved in this transcript: results with morpho-syntactic knowledge, and results without syntactic knowledge.]

Some Notes on Results
– n-grams perform very well (high level of keystroke savings)
– High rate for all genres
– And the expected:
  – Better prediction when trained on more data
  – Better prediction with trigrams
  – Better prediction with a larger window
– Morpho-syntactic information did not improve results (in fact, it hurt!)

Conclusion
– Statistical data on a language with rich morphology yields good results:
  – up to 29% with nine word proposals
  – 34% for seven proposals
  – 54% for a single proposal
– Syntactic information did not improve the prediction.
– Explanation: morphology did not improve results because $P(w_1,\dots,w_n \mid \mu)$ is computed for an unfinished sentence.

תודה
Thank you

Technical Information
– CMU – N-grams
– Storage – Berkeley DB to store knowledge for WP: mapping n-grams
– More questions on technology –