Learning Character Level Representation for POS Tagging


1 Learning Character Level Representation for POS Tagging
Cícero Nogueira dos Santos and Bianca Zadrozny. Presented by Anirban Majumder.

2 Introduction : Distributed Word Embedding
Distributed word embeddings are a useful technique for capturing syntactic and semantic information about words.
However, for many NLP tasks, such as POS tagging, information about word morphology and shape is also important, and it is not captured by these embeddings.
The paper proposes a deep neural network that learns a character-level representation to capture intra-word information.

3 Char-WNN Architecture
Joins word-level and character-level embeddings for POS tagging.
An extension of Collobert et al.'s (2011) NN architecture.
Uses a convolutional layer to extract a character-level embedding for a word of any size.

4 Char-WNN Architecture
Input: a fixed-size window of words centered on the target word.
Output: for each word in the sentence, the network produces a score for each tag τ ∈ T (the tag set).
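A minimal sketch of the input windowing (not the authors' code; the padding token and the window size k_wrd = 5 are assumptions for the demo):

```python
def word_window(sentence, n, k_wrd=5):
    """Return the k_wrd words centered on position n, padding at sentence borders."""
    half = (k_wrd - 1) // 2
    padded = ["<PAD>"] * half + sentence + ["<PAD>"] * half  # "<PAD>" is an assumed special token
    return padded[n : n + k_wrd]

print(word_window(["the", "cat", "sat", "on", "mats"], 0))
# ['<PAD>', '<PAD>', 'the', 'cat', 'sat']
```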

5 Word and Char-Level Embedding
Every word is drawn from a fixed-size word vocabulary V^wrd, and every character from a fixed-size character vocabulary V^chr.
Two embedding matrices are used: W^wrd ∈ R^{d^wrd × |V^wrd|} and W^chr ∈ R^{d^chr × |V^chr|}.

6 Word and Char-Level Embedding
Given a sentence of N words {w_1, w_2, ..., w_N}, each word w_n is converted into a vector representation u_n as follows: u_n = [r^wrd ; r^wch], where r^wrd ∈ R^{d^wrd} is the word-level embedding and r^wch ∈ R^{cl_u} is the character-level embedding.
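A toy NumPy sketch of the two lookups and the concatenation u_n = [r^wrd ; r^wch]; the sizes, the word_to_id mapping, and the stand-in character-level embedding are illustrative assumptions, not the paper's values:

```python
import numpy as np

d_wrd, cl_u = 5, 3                      # toy embedding sizes
V_wrd = 10                              # toy word-vocabulary size
W_wrd = np.random.randn(d_wrd, V_wrd)   # word embedding matrix W^wrd

word_to_id = {"cat": 4}                 # assumed toy vocabulary index
r_wrd = W_wrd[:, word_to_id["cat"]]     # word-level embedding r^wrd (a column lookup)
r_wch = np.random.randn(cl_u)           # stand-in for the char-level embedding r^wch

u_n = np.concatenate([r_wrd, r_wch])    # u_n = [r^wrd ; r^wch]
print(u_n.shape)                        # (d_wrd + cl_u,) -> (8,)
```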

7 Word and Char-Level Embedding
(architecture diagram)

8 Word and Char-Level Embedding
(architecture diagram)

9 Char-Level Embedding : Details
Produces local features around each character of the word and combines them to obtain a fixed-size character-level embedding.
Given a word w composed of M characters {c_1, c_2, ..., c_M}, each character c_m is transformed into a character embedding r^chr_m.
The input to the convolutional layer is then the sequence of character embeddings of the M characters.

10 Char-Level Embedding : Details
A window of size k^chr (the character context window) slides over the sequence {r^chr_1, r^chr_2, ..., r^chr_M}.
The vector z_m, the concatenation of the character embeddings in the window centered on the m-th character, is defined as:
z_m = (r^chr_{m−(k^chr−1)/2}, ..., r^chr_{m+(k^chr−1)/2})^T
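A sketch of building the window vectors z_m, zero-padding the borders of the word (the padding strategy and all sizes are assumptions for illustration):

```python
import numpy as np

d_chr, k_chr, M = 4, 3, 5                 # toy sizes: char embedding dim, window, word length
r_chr = np.random.randn(M, d_chr)         # one embedding per character, r^chr_1..r^chr_M

half = (k_chr - 1) // 2
padded = np.vstack([np.zeros((half, d_chr)), r_chr, np.zeros((half, d_chr))])
# z_m concatenates the k_chr character embeddings centered on character m
z = np.stack([padded[m : m + k_chr].ravel() for m in range(M)])
print(z.shape)                            # (M, k_chr * d_chr) -> (5, 12)
```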

11 Char-Level Embedding : Details
The convolutional layer computes the j-th element of the character-level embedding r^wch of the word w as follows:
[r^wch]_j = max_{1 ≤ m ≤ M} [W^0 z_m + b^0]_j
The matrix W^0 is used to extract local features around each character window of the given word.
A global fixed-size feature vector is obtained by applying the max operator over the character windows.
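The convolution plus max-over-time pooling can be sketched in a few lines of NumPy; W0, b0, and all dimensions below are toy values, not trained parameters:

```python
import numpy as np

cl_u, k_chr, d_chr, M = 6, 3, 4, 5
W0 = np.random.randn(cl_u, k_chr * d_chr)  # stand-in for the learned W^0
b0 = np.random.randn(cl_u)                 # stand-in for the learned b^0
z = np.random.randn(M, k_chr * d_chr)      # window vectors z_1..z_M (see previous sketch)

scores = z @ W0.T + b0                     # [W^0 z_m + b^0] for every m, shape (M, cl_u)
r_wch = scores.max(axis=0)                 # [r^wch]_j = max_m [W^0 z_m + b^0]_j
print(r_wch.shape)                         # (cl_u,) regardless of the word length M
```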

12 Char-Level Embedding : Details
Parameters to be learned: W^chr, W^0, and b^0.
Hyper-parameters:
d^chr: the size of the character embedding vectors
cl_u: the number of convolutional units (also the size of the character-level embedding)
k^chr: the size of the character context window

13 Scoring
Follows Collobert et al.'s (2011) window approach to score all tags in T for each word in a sentence.
The underlying assumption is that the tag of a word depends mainly on its neighboring words.
To compute tag scores for the n-th word in the sentence, we first create a vector x_n by concatenating a sequence of k^wrd embeddings, centered on the n-th word.

14 Scoring
The vector x_n: x_n = (u_{n−(k^wrd−1)/2}, ..., u_{n+(k^wrd−1)/2})^T
The vector x_n is then processed by two NN layers to compute the tag scores:
s(x_n) = W^2 h(W^1 x_n + b^1) + b^2
where W^1 ∈ R^{hl_u × k^wrd(d^wrd + cl_u)}, W^2 ∈ R^{|T| × hl_u}, and h is a nonlinear transfer function.
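A sketch of the two-layer scorer with toy dimensions; tanh is used here as the transfer function h (an assumption for the demo; Collobert et al. use a hard version of tanh):

```python
import numpy as np

k_wrd, d_wrd, cl_u, hl_u, num_tags = 5, 5, 3, 7, 4   # toy sizes, |T| = num_tags
in_dim = k_wrd * (d_wrd + cl_u)                      # size of x_n

W1 = np.random.randn(hl_u, in_dim); b1 = np.random.randn(hl_u)
W2 = np.random.randn(num_tags, hl_u); b2 = np.random.randn(num_tags)

x_n = np.random.randn(in_dim)              # concatenation of k_wrd embeddings u
s = W2 @ np.tanh(W1 @ x_n + b1) + b2       # s(x_n): one score per tag in T
print(s.shape)                             # (num_tags,)
```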

15 Structured Inference :
The tags of neighbouring words are strongly dependent, so a prediction scheme that takes the sentence structure into account is used (Collobert et al., 2011).

16 Structured Inference :
We compute the score for a tag path [t]_1^N = {t_1, t_2, ..., t_N} as:
S([w]_1^N, [t]_1^N, θ) = Σ_{n=1}^{N} ( A_{t_{n−1},t_n} + s(x_n)_{t_n} )
where:
s(x_n)_{t_n} is the score of tag t_n for the word w_n
A_{t_{n−1},t_n} is a transition score for jumping from tag t_{n−1} to tag t_n
θ is the set of all trainable network parameters (W^wrd, W^chr, W^0, b^0, W^1, b^1, W^2, b^2, A)
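At prediction time, the highest-scoring tag path under S can be found with Viterbi dynamic programming, as in Collobert et al. (2011). A sketch that assumes precomputed per-word tag scores s (shape N × |T|) and transition matrix A, and omits an initial-transition term for simplicity:

```python
import numpy as np

def viterbi(s, A):
    """Return the best tag path given scores s[n, t] and transitions A[prev, next]."""
    N, T = s.shape
    delta = s[0].copy()                    # best score of any path ending in each tag
    back = np.zeros((N, T), dtype=int)     # backpointers
    for n in range(1, N):
        cand = delta[:, None] + A          # cand[i, j]: best path ending in i, then jump to j
        back[n] = cand.argmax(axis=0)
        delta = cand.max(axis=0) + s[n]
    path = [int(delta.argmax())]
    for n in range(N - 1, 0, -1):          # follow backpointers from the end
        path.append(int(back[n][path[-1]]))
    return path[::-1]

print(viterbi(np.random.randn(6, 4), np.random.randn(4, 4)))  # e.g. [2, 0, 3, 3, 1, 0]
```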

17 Network Training
The network is trained by minimizing a negative log-likelihood over the training set D, as in Collobert et al. (2011).
The sentence score is interpreted as a conditional probability over a tag path:
log p([t]_1^N | [w]_1^N, θ) = S([w]_1^N, [t]_1^N, θ) − log( Σ_{[u]_1^N ∈ T^N} e^{S([w]_1^N, [u]_1^N, θ)} )
Stochastic gradient descent is used to minimize the negative log-likelihood with respect to θ.
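The normalizer, a log-sum over all |T|^N tag paths, is tractable with a forward recursion in O(N·|T|^2) rather than by enumeration. A sketch using the same s and A conventions as the Viterbi sketch above:

```python
import numpy as np
from scipy.special import logsumexp

def log_partition(s, A):
    """log sum over all tag paths u of exp(S(path)), via the forward algorithm."""
    N, T = s.shape
    alpha = s[0].copy()                    # log-sum of all paths ending in each tag
    for n in range(1, N):
        alpha = logsumexp(alpha[:, None] + A, axis=0) + s[n]
    return logsumexp(alpha)

s, A = np.random.randn(4, 3), np.random.randn(3, 3)
# negative log-likelihood of a gold path: -(path score - log_partition(s, A))
print(log_partition(s, A))
```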

18 Experimental Setup : POS Tagging Datasets
English Datasets (WSJ Corpus)
SET        SENT.   TOKENS   OOSV   OOUV
TRAINING   38,…    …        …      …
DEVELOP.   5,…     …        …      …
TEST       …       …        …      …

Portuguese Datasets (Mac-Morpho Corpus)
SET        SENT.   TOKENS   OOSV   OOUV
TRAINING   42,…    …        …      …
DEVELOP.   2,…     …        …      …
TEST       …       …        …      …

OOSV: out-of-supervised-vocabulary words; OOUV: out-of-union-vocabulary words.

19 English POS Tagging Results:
SYSTEM    FEATURES   ACC.   ACC. OOSV   ACC. OOUV
CHARWNN   –          …      …           …
WNN       CAPS+SUF   …      …           …
WNN       CAPS       …      …           …
WNN       SUF        …      …           …
WNN       –          …      …           …
Comparison of different NNs for POS tagging of the WSJ Corpus; the paper reports 97.32% overall accuracy for CHARWNN.

20 Portuguese POS Tagging Results:
SYSTEM    FEATURES   ACC.   ACC. OOSV   ACC. OOUV
CHARWNN   –          …      …           …
WNN       CAPS+SUF   …      …           …
WNN       CAPS       …      …           …
WNN       SUF        …      …           …
WNN       –          …      …           …
For POS tagging of the Mac-Morpho Corpus; the paper reports 97.47% overall accuracy for CHARWNN.

21 Results: Most similar words using character-level embeddings learned with the WSJ Corpus
(query words in the first row; each column lists their nearest neighbors)
INCONSIDERABLE      YEAR-OLD      SHEEP-LIKE      DOMESTICALLY   UNSTEADINESS
INCONCEIVABLE       YEAR-OLD      ROCKET-LIKE     FINANCIALLY    UNEASINESS
INDISTINGUISHABLE   63-YEAR-OLD   FERN-LIKE       ESSENTIALLY    UNHAPPINESS
INNUMERABLE         YEAR-OLD      SLIVER-LIKE     GENERALLY      UNPLEASANTNESS
INCOMPATIBLE        YEAR-OLD      BUSINESS-LIKE   IRONICALLY     BUSINESS
INCOMPREHENSIBLE    53-YEAR-OLD   WAR-LIKE        SPECIALLY      UNWILLINGNESS

22 Results: Most similar words using word-level embeddings learned from unlabeled English texts
(query words in the first row; each column lists their nearest neighbors)
INCONSIDERABLE   YEAR-OLD             SHEEP-LIKE        DOMESTICALLY   UNSTEADINESS
INSIGNIFICANT    SEVENTEEN-YEAR-OLD   BURROWER          WORLDWIDE      PARESTHESIA
INORDINATE       SIXTEEN-YEAR-OLD     CRUSTACEAN-LIKE   000,000,       HYPERSALIVATION
ASSUREDLY        FOURTEEN-YEAR-OLD    TROLL-LIKE        ,000,          DROWSINESS
UNDESERVED       NINETEEN-YEAR-OLD    SCORPION-LIKE     SALES          DIPLOPIA
SCRUPLE          FIFTEEN-YEAR-OLD     UROHIDROSIS       RETAILS        BREATHLESSNESS

23 Results: Most similar words using word-level embeddings learned from unlabeled Portuguese texts
(query words in the first row; each column lists their nearest neighbors)
GRADAÇÕES         CLANDESTINAMENTE   REVOGAÇÃO               DESLUMBRAMENTO   DROGASSE
TONALIDADES       ILEGALMENTE        ANULAÇÃO                ASSOMBRO         –
MODULAÇÕES        ALI                PROMULGAÇÃO             EXOTISMO         –
CARACTERIZAÇÕES   ATAMBUA            CADUCIDADE              ENFADO           –
NUANÇAS           BRAZZAVILLE        INCONSTITUCIONALIDADE   ENCANTAMENTO     –
COLORAÇÕES        VOLUNTARIAMENTE    NULIDADE                FASCÍNIO         –

24 Future Work
Analyzing the interrelationship between the two embeddings in more detail.
Applying this approach to other NLP tasks, such as text chunking, NER, etc.

25 Thank You

