Natural Language Processing Projects
Heshaam Feili
(1) Persian Part-of-speech tagging
– The large can can hold the water
– D A N AUX V D N
Using N-gram probabilities
Hidden Markov Model
Transformation model
[Church 1988], [Charniak97], [Adwait96]
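The N-gram and Hidden Markov Model approach named above can be sketched as a bigram HMM tagger decoded with the Viterbi algorithm. This is a minimal illustration, not the method of any cited tagger: the three-sentence toy corpus, the add-one smoothing, and the names `train`, `log_prob` and `viterbi` are all assumptions made for the example.

```python
import math
from collections import Counter, defaultdict

# Toy tagged corpus, invented for illustration only (not a real data set).
CORPUS = [
    [("the", "D"), ("large", "A"), ("can", "N"), ("can", "AUX"),
     ("hold", "V"), ("the", "D"), ("water", "N")],
    [("the", "D"), ("man", "N"), ("can", "AUX"), ("hold", "V"),
     ("the", "D"), ("can", "N")],
    [("the", "D"), ("large", "A"), ("dog", "N"), ("can", "AUX"),
     ("hold", "V"), ("the", "D"), ("water", "N")],
]

def train(corpus):
    """Count tag-bigram transitions and tag-to-word emissions."""
    trans = defaultdict(Counter)  # trans[previous_tag][tag]
    emit = defaultdict(Counter)   # emit[tag][word]
    for sentence in corpus:
        prev = "<s>"              # sentence-start pseudo tag
        for word, tag in sentence:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            prev = tag
    tags = sorted(emit)
    vocab = {word for sentence in corpus for word, _ in sentence}
    return trans, emit, tags, vocab

def log_prob(counter, key, n_outcomes):
    """Add-one smoothed log probability (smoothing choice is an assumption)."""
    return math.log((counter[key] + 1) / (sum(counter.values()) + n_outcomes))

def viterbi(words, trans, emit, tags, vocab):
    """Return the most probable tag sequence under the bigram HMM."""
    n_tags = len(tags)
    n_words = len(vocab) + 1      # one extra slot for unknown words
    best = [{t: log_prob(trans["<s>"], t, n_tags)
                + log_prob(emit[t], words[0], n_words) for t in tags}]
    back = [{}]
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tags:
            prev, score = max(
                ((p, best[i - 1][p] + log_prob(trans[p], t, n_tags)) for p in tags),
                key=lambda pair: pair[1],
            )
            best[i][t] = score + log_prob(emit[t], words[i], n_words)
            back[i][t] = prev
    # Follow the back-pointers from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t])]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]

model = train(CORPUS)
print(viterbi("the large can can hold the water".split(), *model))
```

On this toy corpus the two occurrences of "can" are separated by context: the transition from a noun favours AUX, while the transition from an adjective favours N. A real Persian tagger would need far more data and a principled treatment of unknown words.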
POS Taggers
Approach                             Tagger
HMM-Based                            Charniak model
Statistical Trigrams                 TnT (trainable)
Decision Tree-Based                  TreeTagger (trainable)
Maximum entropy model                MXPOST (trainable)
Transformation-Based                 Eric Brill tagger (trainable)
HMM-Based                            LT POS (trainable)
HMM-Based                            QTag (trainable)
Fast Transformation-Based Learning   fnTBL (trainable)
Project:
Tagged Persian data set
– 1000 sentences
– May need some hand-crafted actions!
Training method
Evaluation method
Needs some morphological smoothing
(2 persons)
(2) Computational Grammars
Seminar
Unification grammar
Augmented transition network
Link grammar
Tree adjoining grammar
Categorial grammar
Dependency grammar
Head-driven phrase structure grammar
Projects:
Design & implementation of a Persian computational grammar
Parsing algorithm
Making a prototype (2 persons)
Full grammar development (MS project)
(3) Statistical Parsing algorithms
Probabilistic model
– Probabilistic context-free grammar
– N-gram model
Probabilistic computational grammar
Needs bracketed data set
– (S (NP (DET the) (N man)) (VP (V killed) (NP (DET the) (N dog))))
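To make the link between a bracketed data set and a probabilistic context-free grammar concrete, the sketch below reads bracketed trees and estimates rule probabilities by relative frequency. It uses a fully balanced form of the bracketed sentence from the slide plus one invented tree; the function names `parse_brackets`, `rules`, `estimate_pcfg` and `tree_probability` are hypothetical, and relative-frequency estimation is only one possible training method.

```python
from collections import Counter

def parse_brackets(text):
    """Parse '(S (NP ...) ...)' into nested (label, children) tuples."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        assert tokens[pos] == "(", "expected an opening bracket"
        label = tokens[pos + 1]
        pos += 2
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read())
            else:
                children.append(tokens[pos])  # a word (terminal)
                pos += 1
        pos += 1                              # consume ')'
        return (label, children)

    return read()

def rules(tree):
    """Yield every (lhs, rhs) rule used in the tree, top-down."""
    label, children = tree
    rhs = tuple(c[0] if isinstance(c, tuple) else c for c in children)
    yield (label, rhs)
    for child in children:
        if isinstance(child, tuple):
            yield from rules(child)

def estimate_pcfg(trees):
    """P(lhs -> rhs) = count(lhs -> rhs) / count(lhs)."""
    counts = Counter(rule for tree in trees for rule in rules(tree))
    lhs_totals = Counter()
    for (lhs, _), n in counts.items():
        lhs_totals[lhs] += n
    return {rule: n / lhs_totals[rule[0]] for rule, n in counts.items()}

def tree_probability(tree, pcfg):
    """Probability of a tree: the product of its rule probabilities."""
    prob = 1.0
    for rule in rules(tree):
        prob *= pcfg.get(rule, 0.0)
    return prob

# First tree is the slide's example; the second is invented toy data.
TREEBANK = [
    parse_brackets("(S (NP (DET the) (N man)) (VP (V killed) (NP (DET the) (N dog))))"),
    parse_brackets("(S (NP (N dogs)) (VP (V bark)))"),
]
pcfg = estimate_pcfg(TREEBANK)
print(pcfg[("NP", ("DET", "N"))])            # 2 of the 3 NP expansions
print(tree_probability(TREEBANK[0], pcfg))
```

With only two trees the estimates are of course meaningless as a grammar; the point is the data flow from brackets to rule counts to probabilities.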
Projects:
Bracketing Persian data set
– Use at least 1000 tagged sentences
– Bracket the data set
Implement a training model
Evaluation phase
– PARSEVAL metrics
(2 persons)
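The PARSEVAL metrics mentioned for the evaluation phase compare the labeled constituents of a predicted bracketing against a gold bracketing. The following is a minimal sketch under stated assumptions: part-of-speech (preterminal) nodes are excluded from scoring, both trees cover the same words, and the names `constituents` and `parseval` are hypothetical. It is not a replacement for a standard scorer.

```python
from collections import Counter

def constituents(text):
    """Collect (label, start, end) spans of phrase-level nodes.

    Nodes that directly dominate a single word and no sub-brackets
    (POS tags) are skipped, which is an assumption of this sketch.
    """
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    spans = Counter()
    stack = []          # entries: [label, start_word_index, n_sub_brackets]
    n_words = 0
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == "(":
            if stack:
                stack[-1][2] += 1
            stack.append([tokens[i + 1], n_words, 0])
            i += 2      # skip the bracket and its label
        elif tok == ")":
            label, start, n_sub = stack.pop()
            if n_sub > 0:
                spans[(label, start, n_words)] += 1
            i += 1
        else:
            n_words += 1
            i += 1
    return spans

def parseval(gold, predicted):
    """Return labeled precision, recall and F1 for one sentence."""
    g, p = constituents(gold), constituents(predicted)
    matched = sum((g & p).values())   # multiset intersection
    precision = matched / sum(p.values()) if p else 0.0
    recall = matched / sum(g.values()) if g else 0.0
    f1 = 2 * precision * recall / (precision + recall) if matched else 0.0
    return precision, recall, f1

GOLD = "(S (NP (DET the) (N man)) (VP (V killed) (NP (DET the) (N dog))))"
# Invented parser output that wrongly splits the object noun phrase.
PRED = "(S (NP (DET the) (N man)) (VP (V killed) (NP (DET the)) (NP (N dog))))"
print(parseval(GOLD, PRED))
```

Here the gold tree has four phrase-level constituents and the predicted tree has five, of which three match, giving precision 3/5 and recall 3/4. Over a whole test set the matched, gold and predicted counts would normally be summed across sentences before dividing.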
(4) Machine Translation
Architecture
– Direct / Transfer / Interlingua
History
Different strategies
Problems
Current status
(1 person)
(5) Statistical MT
Probabilistic model
Training model
Architecture
Corpus management
EGYPT model …
(2 persons)
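As one possible reading of "probabilistic model" and "training model", the sketch below trains IBM Model 1 word-translation probabilities with expectation-maximization. The choice of IBM Model 1 is an assumption made for illustration, the function name `train_ibm1` is hypothetical, and the three sentence pairs are invented toy data using romanized Persian words with the ezafe omitted.

```python
from collections import defaultdict

def train_ibm1(pairs, iterations=10):
    """Estimate t(f | e) with EM, where e is a source word and f a target word.

    pairs: list of (source_words, target_words) sentence pairs.
    A NULL source word is added so target words may align to nothing.
    """
    target_vocab = {f for _, tgt in pairs for f in tgt}
    uniform = 1.0 / len(target_vocab)
    t = defaultdict(lambda: uniform)        # t[(f, e)], uniform start
    for _ in range(iterations):
        count = defaultdict(float)          # expected count of (f, e) links
        total = defaultdict(float)          # expected count of e
        for src, tgt in pairs:
            src_with_null = ["NULL"] + src
            for f in tgt:
                # E-step: share f's unit of mass over all source words.
                z = sum(t[(f, e)] for e in src_with_null)
                for e in src_with_null:
                    c = t[(f, e)] / z
                    count[(f, e)] += c
                    total[e] += c
        # M-step: renormalize expected counts per source word.
        for (f, e), c in count.items():
            t[(f, e)] = c / total[e]
    return t

# Toy English / romanized Persian pairs, invented for illustration.
PAIRS = [
    (["red", "house"], ["khane", "ghermez"]),
    (["red", "book"], ["ketab", "ghermez"]),
    (["book"], ["ketab"]),
]
t = train_ibm1(PAIRS)
for e, f in [("house", "khane"), ("red", "ghermez"), ("book", "ketab")]:
    print(e, f, round(t[(f, e)], 3))
```

Even on three sentence pairs the co-occurrence pattern is enough for EM to prefer "khane" for "house" and "ghermez" for "red". A usable system would add a language model, a decoder, and a much larger parallel corpus.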
Project:
English-Persian statistical translation system
– Small data set exists …
– Implement a statistical model
– Needs a Persian morphological analyzer and a Persian POS tagger
(6) Persian morphology analyzer
Inflection
– Verb
– Noun
– Auxiliary
– Adjective
– Adverb
– …
Red House خانه ي قرمز
Projects (1 person)