Improving Morphosyntactic Tagging of Slovene by Tagger Combination
Jan Rupnik, Miha Grčar, Tomaž Erjavec
Jožef Stefan Institute

Outline
– Introduction
– Motivation
– Tagger combination
– Experiments

POS tagging
Part-of-speech (POS) tagging: assigning morphosyntactic categories to words.
Example: "Veža je smrdela po kuhanem zelju in starih, cunjastih predpražnikih." ("The hallway smelt of boiled cabbage and old rag mats.")
Tags: N V V S A N C A A N

Slovenian POS
– multilingual MULTEXT-East specification
– almost 2,000 tags (morphosyntactic descriptions, MSDs) for Slovene
– tags: positionally coded attributes
Example: MSD Agufpa
– Category = Adjective
– Type = general
– Degree = undefined
– Gender = feminine
– Number = plural
– Case = accusative
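
Because the attributes are positionally coded, an MSD can be decomposed by a simple table lookup. Below is a minimal Python sketch for adjectives only; the attribute order and the value codes are taken from the Agufpa example and the tags shown on these slides (the full MULTEXT-East specification defines such tables for every category), and names like ADJ_ATTRS are ours, not part of the specification:

    # Decode a positionally coded adjective MSD into attribute/value pairs.
    # Attribute order follows the Agufpa example; '-' marks an attribute
    # that does not apply to the given word.
    ADJ_ATTRS = ["Type", "Degree", "Gender", "Number", "Case"]
    ADJ_VALUES = {
        "Type":   {"g": "general"},
        "Degree": {"u": "undefined"},
        "Gender": {"m": "masculine", "f": "feminine", "n": "neuter"},
        "Number": {"s": "singular", "d": "dual", "p": "plural"},
        "Case":   {"n": "nominative", "g": "genitive", "d": "dative",
                   "a": "accusative", "l": "locative", "i": "instrumental"},
    }

    def decode_adjective_msd(msd):
        if not msd.startswith("A"):
            raise ValueError("this sketch handles adjectives only")
        attrs = {"Category": "Adjective"}
        for name, code in zip(ADJ_ATTRS, msd[1:]):
            if code != "-":
                attrs[name] = ADJ_VALUES[name].get(code, code)
        return attrs

    print(decode_adjective_msd("Agufpa"))
    # {'Category': 'Adjective', 'Type': 'general', 'Degree': 'undefined',
    #  'Gender': 'feminine', 'Number': 'plural', 'Case': 'accusative'}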

State of the art: two taggers
Amebis d.o.o. proprietary tagger
– based on handcrafted rules
TnT tagger
– based on statistical modelling of sentences and their POS tags
– Hidden Markov Model tri-gram tagger
– trained on a large corpus of annotated sentences

Statistics: motivation
Different tagging outcomes of the two taggers on the JOS corpus of 100k words:
– green: proportion of words where both taggers were correct
– yellow: both predicted the same, incorrect tag
– blue: both predicted incorrect but different tags
– cyan: Amebis correct, TnT incorrect
– purple: TnT correct, Amebis incorrect

Example ("Preiskave med sodnim postopkom so pokazale …", "Investigations during the judicial process showed …"):

Word        True       TnT       Amebis
Preiskave   Ncfpn                Ncfsg
med         Si         Ncmsan    Si
sodnim      Agumsi     Agumpd    Agumsi
postopkom   Ncmsi      Ncmpd     Ncmsi
so          Va-r3p-n
pokazale    Vmep-pf

Combining the taggers
Text flow: the input text ("Veža je smrdela po kuhanem zelju in starih, cunjastih predpražnikih." → N V V S A N C A A N) is tagged independently by TnT and by the Amebis tagger; the meta-tagger takes the two proposed tags (Tag TnT, Tag Amb) and outputs the final tag.

Combining the taggers
For "… prepričati italijanske pravosodne oblasti …", TnT predicts Agufpn and Amebis predicts Agufpa. A feature vector built from the two predictions is passed to the meta-tagger: a binary classifier whose two classes are TnT and Amebis.

Feature vector construction
For "… prepričati italijanske pravosodne oblasti …", TnT predicts Agufpn and Amebis predicts Agufpa.
TnT features: POS_T=Adjective, Type_T=general, Gender_T=feminine, Number_T=plural, Case_T=nominative, Animacy_T=n/a, Aspect_T=n/a, Form_T=n/a, Person_T=n/a, Negative_T=n/a, Degree_T=undefined, Definiteness_T=n/a, Participle_T=n/a, Owner_Number_T=n/a, Owner_Gender_T=n/a
Amebis features: POS_A=Adjective, Type_A=general, Gender_A=feminine, Number_A=plural, Case_A=accusative, Animacy_A=n/a, Aspect_A=n/a, Form_A=n/a, Person_A=n/a, Negative_A=n/a, Degree_A=undefined, Definiteness_A=n/a, Participle_A=n/a, Owner_Number_A=n/a, Owner_Gender_A=n/a
Agreement features: POS_A=T=yes, Type_A=T=yes, …, Number_A=T=yes, Case_A=T=no, Animacy_A=T=yes, …, Owner_Gender_A=T=yes
Amebis's Agufpa is the correct tag, so the training label is Amebis.
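
In code, the construction could look like the following sketch. The helper decode is hypothetical: a full-coverage version of the adjective decoder sketched earlier, returning the attribute names used on this slide ("POS" rather than "Category") and omitting attributes that do not apply:

    ATTRS = ["POS", "Type", "Gender", "Number", "Case", "Animacy",
             "Aspect", "Form", "Person", "Negative", "Degree",
             "Definiteness", "Participle", "Owner_Number", "Owner_Gender"]

    def feature_vector(tnt_msd, amebis_msd, decode):
        """Build the meta-tagger features from the two predicted MSDs."""
        t = decode(tnt_msd)      # e.g. {'POS': 'Adjective', 'Case': 'nominative', ...}
        a = decode(amebis_msd)
        vec = {}
        for attr in ATTRS:
            vec[attr + "_T"] = t.get(attr, "n/a")   # TnT features
            vec[attr + "_A"] = a.get(attr, "n/a")   # Amebis features
            # agreement feature: do the two predictions agree on this attribute?
            vec[attr + "_A=T"] = "yes" if vec[attr + "_T"] == vec[attr + "_A"] else "no"
        return vec

    # For Agufpn (TnT) vs. Agufpa (Amebis) the taggers disagree only on
    # Case, so Case_A=T is 'no' and every other agreement feature is 'yes'.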

Context ("… italijanske pravosodne oblasti …", focus word: pravosodne)
(a) No context: the feature vector contains only the TnT, Amebis, and agreement features of the focus word (pravosodne).
(b) Context: the feature vector additionally contains the TnT and Amebis features of the neighbouring words (italijanske, oblasti).

Classifiers
Naive Bayes
– probabilistic classifier
– assumes strong independence of features
– black-box classifier
CN2 rules
– if-then rule induction
– covering algorithm
– interpretable model as well as its decisions
C4.5 decision tree
– based on information entropy
– splitting algorithm
– interpretable model as well as its decisions
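
To make the overall loop concrete, here is a sketch of training and applying the meta-tagger, using scikit-learn's CART decision tree as a stand-in for C4.5 (the slides do not name a toolkit); feature_vector and decode are the sketches from the previous slides:

    from sklearn.feature_extraction import DictVectorizer
    from sklearn.pipeline import make_pipeline
    from sklearn.tree import DecisionTreeClassifier

    def train_meta_tagger(feature_dicts, labels):
        """labels[i] is 'TnT' or 'Amebis': whichever tagger was correct."""
        model = make_pipeline(DictVectorizer(), DecisionTreeClassifier())
        model.fit(feature_dicts, labels)
        return model

    def combined_tag(model, tnt_msd, amebis_msd, decode):
        if tnt_msd == amebis_msd:        # agreement: nothing to arbitrate
            return tnt_msd
        choice = model.predict([feature_vector(tnt_msd, amebis_msd, decode)])[0]
        return tnt_msd if choice == "TnT" else amebis_msd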

Experiments: dataset
– JOS corpus: approximately 250 texts (100k words, 120k tokens including punctuation)
– sampled from the larger FidaPLUS corpus
– for the meta-tagger experiments, TnT output was obtained by 10-fold cross-validation: each time training on 9 folds and tagging the remaining fold
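
This jackknifing step ensures TnT never tags text it was trained on, so the meta-tagger learns from realistic TnT errors. A sketch, with train_tnt and tnt_tag as hypothetical wrappers around the external TnT tool:

    from sklearn.model_selection import KFold

    def jackknife_tnt(sentences, train_tnt, tnt_tag, n_splits=10):
        """Tag each sentence with a TnT model that never saw it in training.

        sentences: list of (words, gold_msds) pairs; returns one predicted
        MSD sequence per sentence, aligned with `sentences`."""
        predictions = [None] * len(sentences)
        for train_idx, test_idx in KFold(n_splits=n_splits).split(sentences):
            model = train_tnt([sentences[i] for i in train_idx])  # 9 folds
            for i in test_idx:                                    # held-out fold
                predictions[i] = tnt_tag(model, sentences[i][0])
        return predictions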

Experiments: baselines
Baseline 1
– majority classifier (always predict what TnT predicts)
– accuracy: 53%
Baseline 2
– Naive Bayes with one feature only: the full Amebis MSD
– accuracy: 71%

Baseline 2
A Naive Bayes classifier with one feature (the full Amebis MSD) reduces to counting the occurrences of two events for every MSD f:
– n_c(f): number of cases where Amebis predicted tag f and was correct
– n_w(f): number of cases where Amebis predicted tag f and was incorrect
Given a pair of predictions MSD_A (Amebis) and MSD_T (TnT), NB yields the rule: if n_c(MSD_A) < n_w(MSD_A), predict MSD_T; otherwise predict MSD_A.
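
The counting rule is easy to make concrete; a minimal sketch of Baseline 2 (function names are ours):

    from collections import Counter

    def fit_baseline2(training_pairs):
        """training_pairs: (amebis_msd, gold_msd) for every training token."""
        n_c, n_w = Counter(), Counter()
        for amebis, gold in training_pairs:
            if amebis == gold:
                n_c[amebis] += 1   # Amebis predicted this tag and was right
            else:
                n_w[amebis] += 1   # Amebis predicted this tag and was wrong

        def predict(msd_a, msd_t):
            # trust Amebis unless its tag was wrong more often than right
            return msd_t if n_c[msd_a] < n_w[msd_a] else msd_a

        return predict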

Experiments: different classifiers and feature sets
Classifiers: NB, CN2, C4.5
Feature sets:
– FULL: the full MSD
– DEC: decomposed MSD plus agreement features
– BASIC: a subset of the decomposed MSD feature set (Category, Type, Number, Gender, Case)
– FULL+DEC: the union of all features considered (full MSD + decompositions)
Scenarios:
– no context
– context, ignoring punctuation
– context, including punctuation

Results
– context helps
– punctuation slightly improves classification
– C4.5 with basic features works best
(Accuracy tables: feature sets FULL TAG, DEC, BASIC, FULL+DEC × classifiers NB, C4.5, CN2, for each of the three scenarios: no context, context without punctuation, context with punctuation.)

Overall error rate

Thank you!