
1 Overview of Machine Learning for NLP Tasks: part I (based partly on slides by Kevin Small and Scott Yih)

Page 2 Goals of Introduction
- Frame specific natural language processing (NLP) tasks as machine learning problems
- Provide an overview of a general machine learning system architecture
- Introduce a common terminology
- Identify typical needs of an ML system
- Describe some specific aspects of our tool suite in relation to the general architecture
- Build some intuition for using the tools
The focus here is on supervised learning.

Page 3 Overview
1. Some Sample NLP Problems
2. Solving Problems with Supervised Learning
3. Framing NLP Problems as Supervised Learning Tasks
4. Preprocessing: cleaning up and enriching text
5. Machine Learning System Architecture
6. Feature Extraction using FEX

Page 4 Context Sensitive Spelling [2]
A word-level tagging task:
  I would like a peace of cake for desert.  ->  I would like a piece of cake for dessert.
  In principal, we can use the solution to the duel problem.  ->  In principle, we can use the solution to the dual problem.

Page 5 Part of Speech (POS) Tagging
Another word-level task:
  Allen Iverson is an inconsistent player. While he can shoot very well, some nights he will score only a few points.
  (NNP Allen) (NNP Iverson) (VBZ is) (DT an) (JJ inconsistent) (NN player) (. .) (IN While) (PRP he) (MD can) (VB shoot) (RB very) (RB well) (, ,) (DT some) (NNS nights) (PRP he) (MD will) (VB score) (RB only) (DT a) (JJ few) (NNS points) (. .)

Page 6 Phrase Tagging
Named Entity Recognition – a phrase-level task:
  After receiving his M.B.A. from Harvard Business School, Richard F. America accepted a faculty position at the McDonough School of Business (Georgetown University) in Washington.
  After receiving his [MISC M.B.A.] from [ORG Harvard Business School], [PER Richard F. America] accepted a faculty position at the [ORG McDonough School of Business] ([ORG Georgetown University]) in [LOC Washington].

Page 7 Some Other Tasks
- Text Categorization
- Word Sense Disambiguation
- Shallow Parsing
- Semantic Role Labeling
- Preposition Identification
- Question Classification
- Spam Filtering
- ...

Page 8 Supervised Learning/SNoW

Page 9 Learning Mapping Functions
Binary Classification, Multi-class Classification, Ranking, Regression
- {Feature, Instance, Input} Space – space used to describe each instance; often R^d, {0,1}^d, N^d
- Output Space – space of possible output labels; very dependent on the problem
- Hypothesis Space – space of functions that can be selected by the machine learning algorithm; algorithm dependent (obviously)

Page 10 Multi-class Classification [3,4]
One Versus All (OvA); Constraint Classification
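To make the OvA reduction concrete, here is a minimal sketch (not SNoW's implementation; the binary learner interface with train/score is assumed for illustration):

# Hypothetical One-Versus-All (OvA) reduction: one binary classifier per label;
# predict the label whose classifier gives the highest score.
class OneVersusAll:
    def __init__(self, make_binary_learner, labels):
        self.labels = labels
        self.learners = {y: make_binary_learner() for y in labels}

    def train(self, examples):
        # examples: list of (feature_vector, label) pairs
        for y, learner in self.learners.items():
            binary = [(x, 1 if label == y else -1) for x, label in examples]
            learner.train(binary)          # relabel as "y vs. rest", then train

    def predict(self, x):
        return max(self.labels, key=lambda y: self.learners[y].score(x))

Constraint classification, by contrast, trains the classifiers jointly so that the correct label's score is constrained to outrank all the others.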

Page 11 Online Learning [5]
- SNoW algorithms include Winnow and Perceptron
- Learning algorithms are mistake driven
- Search for a linear discriminant along the function gradient (unconstrained optimization)
- Provides the best hypothesis using the data presented up to the present example
- Learning rate determines convergence:
  Too small and it will take forever
  Too large and it will not converge
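As a rough illustration (not SNoW's actual code), a mistake-driven Perceptron update over sparse binary features looks like this; the learning rate eta and the epoch count are assumed parameters:

# Mistake-driven Perceptron over sparse binary features.
# Each example is (active, y): active = list of feature ids, y in {+1, -1}.
def perceptron_train(examples, eta=0.1, epochs=5):
    w, b = {}, 0.0                         # sparse weight vector and bias
    for _ in range(epochs):
        for active, y in examples:
            score = b + sum(w.get(f, 0.0) for f in active)
            if y * score <= 0:             # update only on a mistake
                for f in active:
                    w[f] = w.get(f, 0.0) + eta * y
                b += eta * y
    return w, b

Winnow follows the same mistake-driven loop but updates the weights multiplicatively rather than additively.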

Page 12 Framing NLP Problems as Supervised Learning Tasks

Page 13 Defining Learning Problems [6]
ML algorithms are mathematical formalisms and problems must be modeled accordingly.
- Feature Space – space used to describe each instance; often R^d, {0,1}^d, N^d
- Output Space – space of possible output labels, e.g.
    Set of Part-of-Speech tags
    Correctly spelled word (possibly from a confusion set)
- Hypothesis Space – space of functions that can be selected by the machine learning algorithm, e.g.
    Boolean functions (e.g. decision trees)
    Linear separators in R^d

Page 14 Context Sensitive Spelling
Did anybody (else) want too sleep for to more hours this morning?
Output Space:
- Could use the entire vocabulary; Y = {a, aback, ..., zucchini}
- Could also use a confusion set; Y = {to, too, two}
Model as (single label) multi-class classification.
The hypothesis space is provided by SNoW; we need to define the feature space.

Page 15 What are 'feature' and 'feature type', anyway?
A feature type is any characteristic (relation) you can define over the input representation.
Example: feature TYPE = word bigrams
  Sentence: The man in the moon eats green cheese.
  Features: [The_man], [man_in], [in_the], [the_moon], ...
In natural language text, sparseness is often a problem:
- How many times are we likely to see "the_moon"?
- How often will it provide useful information?
- How can we avoid this problem?
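For instance, word-bigram feature types could be generated with a small helper like the one below (a hypothetical illustration, not FEX syntax):

def word_bigrams(sentence):
    # Generate word-bigram features [w1_w2] from a whitespace-tokenized sentence.
    tokens = sentence.split()
    return ["[%s_%s]" % (a, b) for a, b in zip(tokens, tokens[1:])]

print(word_bigrams("The man in the moon eats green cheese ."))
# ['[The_man]', '[man_in]', '[in_the]', '[the_moon]', '[moon_eats]', ...]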

Page 16 Preprocessing: cleaning up and enriching text
Assuming we start with plain text:
  The quick brown fox jumped over the lazy dog. It landed on Mr. Tibbles, the slow blue cat.
Problems: often we want to work at the level of sentences and words.
- Where are the sentence boundaries – 'Mr.' vs. 'Cat.'?
- Where are the word boundaries – 'dog.' vs. 'dog'?
Enriching the text, e.g. POS-tagging:
  (DT The) (JJ quick) (NN brown) (NN fox) (VBD jumped) (IN over) (DT the) (JJ lazy) (NN dog) (. .)
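A toy splitter (a sketch only; the real tools below handle many more cases) shows why a list of honorifics helps avoid a false boundary after 'Mr.':

HONORIFICS = {"Mr.", "Mrs.", "Ms.", "Dr."}     # assumed abbreviation list

def split_sentences(text):
    sentences, current = [], []
    for token in text.split():
        current.append(token)
        # End a sentence at ., ?, or ! unless the token is a known honorific.
        if token.endswith((".", "?", "!")) and token not in HONORIFICS:
            sentences.append(" ".join(current))
            current = []
    if current:
        sentences.append(" ".join(current))
    return sentences

print(split_sentences("The quick brown fox jumped over the lazy dog. "
                      "It landed on Mr. Tibbles, the slow blue cat."))
# ['The quick brown fox jumped over the lazy dog.',
#  'It landed on Mr. Tibbles, the slow blue cat.']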

Page 17 Download Some Tools
Software::tools, Software::packages:
- Sentence segmenter
- Word segmenter
- POS-tagger
- FEX
NB: RIGHT-CLICK on the "download" link and select "save link as...".

Page 18 Preprocessing scripts
sentence-boundary.pl:
  ./sentence-splitter.pl -d HONORIFICS -i nyttext.txt -o nytsentence.txt
word-splitter.pl:
  ./word-splitter.pl nytsentence.txt > nytword.txt
Invoking the tagger:
  ./tagger -i nytword.txt -o nytpos.txt
Check the output.

Page 19 Problems running .pl scripts?
- Check the first line: #!/usr/bin/perl
  Find where perl lives on your own machine; e.g. you might need #!/local/bin/perl
- Check file permissions:
  > ls -l sentence-boundary.pl
  > chmod 744 sentence-boundary.pl

Page 20 Minor Problems with install
Possible (system-dependent) compilation errors:
- doesn't recognize 'optarg'
  POS-tagger: change the Makefile in subdirectory snow/ where indicated
  sentence-boundary.pl: try 'perl sentence-boundary.pl'
- Link error (POS tagger): linker can't find -lxnet
  remove the '-lxnet' entry from the Makefile
Generally, check the README and Makefile for hints.

Page 21 The System View

Page 22 A Machine Learning System
[Diagram: Raw Text -> Preprocessing -> Formatted Text -> Feature Extraction -> Feature Vectors (training and testing examples, with labels) -> Machine Learner -> Function Parameters -> Classifier(s) -> Inference -> Labels]

Page 23 Preprocessing Text
- Sentence splitting, word splitting, etc.
- Put data in a form usable for feature extraction
Raw text:
  They recently recovered a small piece of a live Elvis concert recording. He was singing gospel songs, including "Peace in the Valley."
Preprocessed form:
  They recently recovered a small piece 0 5 piece of : including QUOTE peace 1 8 Peace in the Valley QUOTE

Page 24 A Machine Learning System
[Diagram: Raw Text -> Preprocessing -> Formatted Text -> Feature Extraction -> Feature Vectors]

Page 25 Feature Extraction with FEX

Page 26 Feature Extraction with FEX
- FEX (Feature Extraction tool) generates abstract representations of text input
- Has a number of specialized modes suited to different types of problem
- Can generate very expressive features
- Works best when the text is enriched with other knowledge sources – i.e., you need to preprocess the text
Example: S = I would like a piece of cake too!
FEX converts the input text into a list of active features:
  1: 1003, 1005, 1101, 1330, ...
where each numerical feature corresponds to a specific textual feature:
  1: label[piece]
  1003: word[like] BEFORE word[a]
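The feature-to-index mapping behaves roughly like the hypothetical sketch below; the starting index 1001 for regular features follows the convention shown on the lexicon slides later:

# Hypothetical lexicon: map each textual feature to a stable integer id,
# appending new entries rather than renumbering existing ones.
class Lexicon:
    def __init__(self, first_id=1001):
        self.ids = {}
        self.next_id = first_id

    def feature_id(self, feature):
        if feature not in self.ids:
            self.ids[feature] = self.next_id
            self.next_id += 1
        return self.ids[feature]

lex = Lexicon()
for f in ["word[like]", "word[like] BEFORE word[a]", "word[cake]"]:
    print(lex.feature_id(f), f)      # 1001, 1002, 1003 on first use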

Page 27 Feature Extraction
- Converts formatted text into feature vectors
- The lexicon file contains the feature descriptions
Formatted text:
  They recently recovered a small piece 0 5 piece of : including QUOTE peace 1 8 Peace in the Valley QUOTE
Feature vectors (via the Lexicon File):
  0, 1001, 1013, 1134, 1175, , 1021, 1055, 1085, 1182, 1252

Page 28 Role of FEX
Input sentences:
  Why won't you accept the facts?
  No one saw her except the postman.
FEX extracts textual features:
  lab[accept], w[you], w[the], w[you*], w[*the]
  lab[except], w[her], w[the], w[her*], w[*the]
and encodes them as example vectors:
  1, 1001, 1003, 1004, 1006:
  2, 1002, 1003, 1005, 1006:

Page 29 Four Important Files
- Script: controls FEX's behavior; defines the "types" of features
- Corpus: a new representation of the raw text data
- Lexicon: mapping of feature and feature id
- Example: feature vectors for SNoW

Page 30 Corpus – General Linear Format
The corpus file contains the preprocessed input with a single sentence per line. When generating examples, Fex never crosses line boundaries.
The input can be any combination of:
- 1st form: words separated by white spaces
- 2nd form: tag/word pairs in parentheses
There is a more complicated 3rd form, but it is deprecated in view of an alternative, more general format (later).

Page 31 Corpus – Context Sensitive Spelling
  Why won't you accept the facts?
  (WRB Why) (VBD wo) (NN n't) (PRP you) (VBP accept) (DT the) (NNS facts) (. ?)
  No one saw her except the postman.
  (DT No) (CD one) (VBD saw) (PRP her) (IN except) (DT the) (NN postman) (. .)

Page 32 Script – Means of Feature Engineering
Fex does not decide or find good features. Instead, it provides an easy way to define feature types and extract the corresponding features from the data.
Feature engineering is in fact very important in practical learning tasks.

Page 33 Script – Description of Feature Types
What can be good features? Let's try some combinations of words and tags.
Feature types in mind:
- Words around the target word (accept, except)
- POS tags around the target
- Conjunctions of words and POS tags?
- Bigrams or trigrams?
- Include relative locations?

Page 34 Graphical Representation
  (WRB Why) (VBD won) (NN 't) (PRP you) (VBP accept) (DT the) (NNS facts) (. ?)
  Target: accept; window [-2,2]

Page 35 Script – Syntax
Syntax: targ [inc] [loc]: RGF [[left-offset, right-offset]]
- targ – target index
  If targ is '-1', target file entries are used to identify the targets;
  if no target file is specified, then EVERY word is treated as a target
- inc – use the actual target instead of the generic place-holder ('*')
- loc – include the location of the feature relative to the target
- RGF – defines "types" of features like words, tags, conjunctions, bigrams, trigrams, etc.
- left-offset and right-offset – specify the window range

Page 36 Basic RGFs – Sensors (1/2)
A sensor is the fundamental method of defining "feature types." It is applied to an element and generates active features.

Type   | Mnemonic | Interpretation                         | Example
Word   | w        | the word (spelling)                    | w[you]
Tag    | t        | part-of-speech tag                     | t[NNP]
Vowel  | v        | active if the word starts with a vowel | v[eager]
Length | len      | length of the word                     | len[5]

Page 37 Basic RGFs – Sensors (2/2)

Type       | Mnemonic | Interpretation                             | Example
City List  | isCity   | active if the phrase is the name of a city | isCity[Chicago]
Verb Class | vCls     | returns Levin's verb class                 | vCls[51.2]

More sensors can be found by looking at the FEX source (Sensors.h).
lab is a special RGF that generates labels: lab(w), lab(t), ...
Sensors are also an elegant way to incorporate our background knowledge.

Page 38 Complex RGFs
- Existential usage: len(x=3), v(X)
- Conjunction and disjunction: w&t; w|t
- Collocation and sparse collocation:
    coloc(w,w); coloc(w,t,w); coloc(w|t,w|t)
    scoloc(t,t); scoloc(t,w,t); scoloc(w|t,w|t)

Page 39 (Sparse) Collocation
  (WRB Why) (VBD won) (NN 't) (PRP you) (VBP accept) (DT the) (NNS facts) (. ?)
  Target: accept; window [-2,2]

-1 inc: coloc(w,t)[-2,2]
  w['t]-t[PRP], w[you]-t[VBP], w[accept]-t[DT], w[the]-t[NNS]

-1 inc: scoloc(w,t)[-2,2]
  w['t]-t[PRP], w['t]-t[VBP], w['t]-t[DT], w['t]-t[NNS], w[you]-t[VBP], w[you]-t[DT], w[you]-t[NNS], w[accept]-t[DT], w[accept]-t[NNS], w[the]-t[NNS]
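The two RGFs can be approximated in a few lines (a simplification for intuition, not FEX's implementation): coloc(w,t) pairs each word with the tag of the next element in the window, while scoloc(w,t) pairs each word with the tag of every later element.

def coloc(words, tags):
    # Collocation: word at position i paired with the tag at position i+1.
    return ["w[%s]-t[%s]" % (w, t) for w, t in zip(words, tags[1:])]

def scoloc(words, tags):
    # Sparse collocation: word at position i paired with tags at all later positions.
    return ["w[%s]-t[%s]" % (words[i], tags[j])
            for i in range(len(words)) for j in range(i + 1, len(tags))]

window_words = ["'t", "you", "accept", "the", "facts"]
window_tags = ["NN", "PRP", "VBP", "DT", "NNS"]
print(coloc(window_words, window_tags))    # the 4 coloc features above
print(scoloc(window_words, window_tags))   # the 10 scoloc features above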

Page 40 Examples – 2 Scripts
Download examples from the tutorial page: 'context sensitive spelling materials' link

accept-except-simple.scr:
  -1: lab(w)
  -1: w[-1,1]

accept-except.scr:
  -1: lab(w)
  -1: w|t [-2,2]
  -1 loc: coloc(w|t,w|t) [-3,-3]

Page 41 Lexicon & Example (1/3)
Corpus: ... (NNS prices) (CC or) (VB accept) (JJR slimmer) (NNS profits) ...
Script (ae-simple.scr):
  -1: lab(w)
  -1: w[-1,1]
Lexicon:
  1     label[w[except]]
  2     label[w[accept]]
  1001  w[or]
  1002  w[slimmer]
Example:
  2, 1001, 1002;
The label (2) is generated by lab(w); the features (1001, 1002) are generated by w[-1,1].
Feature indices of lab start from 1; feature indices of regular features start from 1001.

Page 42 Lexicon & Example (2/3)
Target file (fex -t ae.targ):
  accept
  except
  (We treat only these two words as targets.)
Lexicon file:
- If the file does not exist, fex will create it.
- If the file already exists, fex will first read it, and then append the new entries to this file. This is important because we don't want two different feature indices representing the same feature.

Page 43 Lexicon & Example (3/3)
Example file:
- If the file does not exist, fex will create it.
- If the file already exists, fex will append new examples to it.
- Only active features and their corresponding lexicon items are generated.
- If the read-only lexicon option is set, only those features from the lexicon that are present (active) in the current instance are listed.

Page 44 Now practice – change the script, run FEX, and look at the resulting lexicon/examples:
  > ./fex -t ae.targ ae-simple.scr ae-simple.lex short-ae.pos short-ae.ex

Page 45 Citations
1) F. Sebastiani. Machine Learning in Automated Text Categorization. ACM Computing Surveys, 34(1):1-47.
2) A. R. Golding and D. Roth. A Winnow-Based Approach to Spelling Correction. Machine Learning, 34.
3) E. Allwein, R. Schapire, and Y. Singer. Reducing Multiclass to Binary: A Unifying Approach for Margin Classifiers. Journal of Machine Learning Research, 1.
4) S. Har-Peled, D. Roth, and D. Zimak. Constraint Classification: A New Approach to Multiclass Classification. In Proc. 13th Annual Intl. Conf. on Algorithmic Learning Theory.
5) A. Blum. On-Line Algorithms in Machine Learning.

Page 46 Citations
6) T. Mitchell. Machine Learning. McGraw Hill.
7) A. Blum. Learning Boolean Functions in an Infinite Attribute Space. Machine Learning, 9(4).
8) J. Kivinen and M. Warmuth. The Perceptron Algorithm vs. Winnow: Linear vs. Logarithmic Mistake Bounds when Few Input Variables are Relevant. UCSC-CRL-95-44.
9) T. Dietterich. Approximate Statistical Tests for Comparing Supervised Classification Learning Algorithms. Neural Computation, 10(7), 1998.