
1 Syntax
The study of how words are ordered and grouped together
Key concept: constituent = a sequence of words that acts as a unit
Examples of constituents:
–he / the man / the short man / the short man with the large hat
–went home / to his house / out of the car / with her

2 Phrase Structure
Example parse of "She saw a tall man with a telescope":
[S [NP [PN She]] [VP [VBD saw] [NP a tall man] [PP [PRP with] [NP a telescope]]]]

3 Noun Phrases
Contains a noun plus descriptors, including:
–Determiner: the, a, this, that
–Adjective phrases: green, very tall
–Head: the main noun in the phrase
–Post-modifiers: prepositional phrases or relative clauses
Example: That (det) old green (adj) couch (head) of yours (PP) that I want to throw out (relative clause)

4 Verb Phrases
Contains a verb (the head) with modifiers and other elements that depend on the verb
–want (head) to throw out (PP)
–previously (adv) saw (head) the man (direct object) in the park (PP) with her telescope (PP)
–might (modal) have (aux) showed (head) his boss (indirect object) the code (direct object) yesterday (adverb)

5 Prepositional Phrases
Preposition as head and NP as complement
Example: with (head) her grey poodle (complement)
Adjective Phrases
Adjective as head with modifiers
Example: extremely (adv) sure (head) that he would win (relative clause)

6 Shallow Parsing
Extract phrases from text as 'chunks'
Flat, no tree structures
Usually based on patterns of POS tags
Full parsing can be conceived of as two steps:
–Chunking / shallow parsing
–Attachment of chunks to each other

7 Noun Phrases
Base Noun Phrase: a noun phrase that does not contain other noun phrases as a component
Or: no modification to the right of the head
–a large green cow
–The United States Government
–every poor shop-owner's dream?
–other methods and techniques?

8 Manual Methodology
Build a regular expression over POS tags
E.g.: DT? (ADJ | VBG)* (NN)+
–Very hard to do accurately
–Lots of manual labor
–Cannot be easily tuned to a specific corpus
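
A minimal sketch of this kind of pattern, assuming Python and a toy tagged sentence (the simplified tagset and the chunk_base_nps helper are illustrative, not from the slides): the slide's DT? (ADJ | VBG)* (NN)+ pattern is matched over a string of space-delimited POS tags.

```python
import re

# Toy tagged sentence; real tagsets (e.g. Penn Treebank) differ.
TAGGED = [("the", "DT"), ("tall", "ADJ"), ("man", "NN"),
          ("ran", "VBD"), ("with", "PRP"), ("blinding", "VBG"),
          ("speed", "NN")]

# The slide's pattern, written over space-delimited tags (one token per tag).
# Assumes no tag is a suffix of another tag in the tagset.
NP_PATTERN = re.compile(r"(DT )?((ADJ|VBG) )*(NN )+")

def chunk_base_nps(tagged):
    """Return the word sequences of base NPs matched by the tag pattern."""
    tag_string = " ".join(tag for _, tag in tagged) + " "
    chunks = []
    for m in NP_PATTERN.finditer(tag_string):
        start = tag_string[:m.start()].count(" ")   # token index of match start
        end = start + m.group(0).count(" ")         # one trailing space per tag
        chunks.append([w for w, _ in tagged[start:end]])
    return chunks

print(chunk_base_nps(TAGGED))   # [['the', 'tall', 'man'], ['blinding', 'speed']]
```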

9 Chunk Tags
Represent NPs by tags:
  [the tall man] ran with [blinding speed]
   DT  ADJ  NN1  VBD PRP  VBG      NN0
   I   I    I    O   O    I        I
Need a B tag for adjacent NPs:
  On [Tuesday] [the company] went bankrupt
  O   I         B   I        O    O
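
A small sketch, assuming Python, of how bracketed base-NP spans could be converted to the I/O/B tags above; the to_iob helper and the span representation are hypothetical.

```python
def to_iob(tokens, chunks):
    """tokens: list of words; chunks: list of (start, end) base-NP spans
    (end exclusive).  Returns one tag per token, following the slide's
    scheme: 'B' is used only when an NP starts right after another NP."""
    tags = ["O"] * len(tokens)
    prev_end = None
    for start, end in sorted(chunks):
        for i in range(start, end):
            tags[i] = "I"
        if prev_end == start:        # adjacent NPs: mark the boundary with B
            tags[start] = "B"
        prev_end = end
    return tags

sent = "On Tuesday the company went bankrupt".split()
print(to_iob(sent, [(1, 2), (2, 4)]))
# ['O', 'I', 'B', 'I', 'O', 'O']
```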

10 Transformational Learning
Baseline tagger:
–Most frequent chunk tag for POS or word
Rule templates (100 total), conditioned on:
  word/POS context                   chunk-tag (ctag) context
  current word/POS                   current ctag
  word/POS 1 on left/right           current and left ctag
  current and left/right word/POS    current and right ctag
  word/POS on left and on right      two ctags to left
  two words/POSs on left/right       two ctags to right
  three words/POSs on left/right
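
A rough sketch, assuming Python, of the two ingredients described here: the most-frequent-tag baseline and the application of one context-conditioned rule. The data representation and the apply_rule helper are assumptions, and the learning loop that searches for the rule with the largest error reduction is omitted.

```python
from collections import Counter, defaultdict

def baseline_tagger(train):
    """train: list of (pos, ctag) pairs; returns the most frequent chunk tag
    for each POS, as in the slide's baseline."""
    counts = defaultdict(Counter)
    for pos, ctag in train:
        counts[pos][ctag] += 1
    return {pos: c.most_common(1)[0][0] for pos, c in counts.items()}

def apply_rule(ctags, poss, words, rule):
    """rule = (condition, from_tag, to_tag); condition(i, ctags, poss, words)
    inspects the local context, mirroring the slide's rule templates.
    Applied on a copy, so a rule does not see its own changes."""
    cond, from_tag, to_tag = rule
    out = list(ctags)
    for i in range(len(ctags)):
        if ctags[i] == from_tag and cond(i, ctags, poss, words):
            out[i] = to_tag
    return out

# Hypothetical encoding of rule 4 from the next slide: (T-1=I, P0=WDT), I -> B
rule4 = (lambda i, t, p, w: i > 0 and t[i - 1] == "I" and p[i] == "WDT", "I", "B")
```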

11 Some Rules Learned
1. (T1=O, P0=JJ): I → O
2. (T-2=I, T-1=I, P0=DT): → B
3. (T-2=O, T-1=I, P-1=DT): → I
4. (T-1=I, P0=WDT): I → B
5. (T-1=I, P0=PRP): I → B
6. (T-1=I, W0=who): I → B
7. (T-1=I, P0=CC, P1=NN): O → I

12 Results
Training     Prec.  Recall  Tag Acc.
Baseline     78.2   81.9    94.5
50K          89.8   90.4    96.9
100K         91.3   91.8    97.2
200K         91.8   92.3    97.4
200K nolex   90.5   90.7    97.0
950K         93.1   93.5    97.8
Precision = fraction of NPs predicted that are correct
Recall = fraction of actual NPs that are found
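
For concreteness, a short sketch (Python assumed) of the precision/recall definitions above, treating predicted and gold NPs as sets of spans; the span encoding is illustrative.

```python
def precision_recall(predicted, gold):
    """predicted, gold: sets of (sentence_id, start, end) NP spans."""
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall

pred = {(0, 0, 3), (0, 5, 7), (1, 2, 4)}
gold = {(0, 0, 3), (1, 2, 4), (1, 6, 8)}
print(precision_recall(pred, gold))   # (0.666..., 0.666...)
```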

13 Memory-Based Learning
Match test data to previously seen data and classify based on the most similar previously seen instances
E.g.:
–the saw was
–she saw the
–boy saw three
–boy saw the
–boy ate the

14 k-Nearest Neighbor (kNN) Find k most similar training examples Let them ‘vote’ on the correct class for the test example –Weight neighbors by distance from test Main problem: defining ‘similar’ –Shallow parsing – overlap of words and POS –Use feature weighting...
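
A minimal kNN sketch under these assumptions (Python; features as name-to-value dicts; similarity = weighted feature overlap standing in for distance weighting):

```python
from collections import Counter

def similarity(x, y, weights):
    """x, y: dicts of feature name -> value (e.g. {'w-1': 'the', 'p0': 'NN'}).
    Weighted overlap: sum the weight of every feature on which x and y agree."""
    return sum(weights.get(f, 1.0) for f in x if f in y and x[f] == y[f])

def knn_classify(test, train, k=3, weights=None):
    """train: list of (features, label).  The k most similar examples vote,
    each vote weighted by its similarity to the test instance."""
    weights = weights or {}
    neighbors = sorted(train, key=lambda ex: similarity(test, ex[0], weights),
                       reverse=True)[:k]
    votes = Counter()
    for feats, label in neighbors:
        votes[label] += similarity(test, feats, weights)
    return votes.most_common(1)[0][0]
```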

15 Information Gain
Not all features are created equal (e.g. saw in the previous example is more important)
Weight the features by information gain = how much does feature f distinguish between the different classes?
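
A possible way to compute this weight, sketched in Python: information gain as the entropy of the class labels minus the expected entropy after splitting on the feature's value (the example representation is an assumption).

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Entropy of a non-empty list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def information_gain(examples, feature):
    """examples: list of (features_dict, label).  IG(f) = H(class) minus the
    expected H(class) after partitioning the examples on f's value."""
    labels = [lab for _, lab in examples]
    by_value = defaultdict(list)
    for feats, lab in examples:
        by_value[feats.get(feature)].append(lab)
    expected = sum(len(ls) / len(labels) * entropy(ls)
                   for ls in by_value.values())
    return entropy(labels) - expected
```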

16 [Figure: feature value distributions over classes C1-C4, contrasting a feature with high information gain and one with low information gain]

17 Base Verb Phrase Verb phrase not including NPs or PPs [ NP Pierre Vinken NP ], [ NP 61 years NP ] old, [ VP will soon be joining VP ] [ NP the board NP ] as [ NP a nonexecutive director NP ].

18 Results
Context: 2 words and POS on left and 1 word and POS on right
Task  Context     Prec.  Recall  Acc.
bNP   curr. word  76     80      93
      curr. POS   80     82      95
      2-1         94             98
bVP   curr. word  68     73      96
      curr. POS   75     89      97
      2-1         94     96      99

19 Efficiency of MBL
Finding the neighbors can be costly
Possibility: build a decision tree based on the information gain of the features to index the data = approximate kNN
[Figure: decision tree indexing the training data on features W0, P-2, P-1, W-1, with example branch values saw, the, boy]
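
One way such an index might look, sketched in Python: a nested dictionary keyed by feature values in decreasing information-gain order, with a back-off to the last matching node when the test instance's value was never seen. All names here are illustrative assumptions, not the original system.

```python
def build_index(train, ordered_features):
    """train: list of (features_dict, label).  Nested dict keyed by feature
    values, features ordered by decreasing information gain, so that similar
    instances end up in the same bucket."""
    index = {}
    for feats, label in train:
        node = index
        for f in ordered_features[:-1]:
            node = node.setdefault(feats.get(f), {})
        node.setdefault(feats.get(ordered_features[-1]), []).append((feats, label))
    return index

def flatten(node):
    """Collect all training instances stored under a node."""
    if isinstance(node, list):
        return node
    out = []
    for child in node.values():
        out.extend(flatten(child))
    return out

def approximate_neighbors(index, test, ordered_features):
    """Follow the test instance's feature values down the tree; back off to
    everything under the last matching node when a value is unseen."""
    node = index
    for f in ordered_features:
        if isinstance(node, dict) and test.get(f) in node:
            node = node[test.get(f)]
        else:
            break
    return flatten(node)
```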

20 MBSL
Memory-based technique relying on the sequential nature of the data
–Use "tiles" of phrases in memory to "cover" a new candidate (and its context), and compute a tiling score
Candidate (POS tags, with [[ ]] marking the proposed NP):
  went to the white house for dinner
  VBD PRP [[ DT ADJ NN1 ]] PRP NN1
Example tiles drawn over the candidate:
  PRP [NP DT
  [NP DT ADJ NN1
  NN1 NP] PRP
  PRP [NP DT ADJ
  ADJ NN1 NP]

21 Tile Evidence
Memory:
  [NP DT NN1 NP] VBD [NP DT NN1 NN1 NP]
  [NP NN2 NP] .
  [NP ADJ NN2 NP] AUX VBG PRP [NP DT ADJ NN1 NP] .
Some tiles:
  [NP DT        pos=3  neg=0
  [NP DT NN1    pos=2  neg=0
  DT NN1 NP]    pos=1  neg=1
  NN1 NP]       pos=3  neg=1
  NN1 NP] VBD   pos=1  neg=0
Score tile t by f_t(t) = pos / total
Only keep tiles that pass a threshold: f_t(t) > θ_T
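
A sketch (Python assumed) of tile scoring that reproduces the pos/neg counts above for this memory. Treating negative evidence as "occurrences of the tile's POS tags whose brackets differ" is a simplifying assumption of this sketch, not the exact MBSL definition.

```python
def occurrences(sub, seq):
    """Count contiguous occurrences of sub inside seq (both lists of tokens)."""
    n, k = len(seq), len(sub)
    return sum(seq[i:i + k] == sub for i in range(n - k + 1))

BRACKETS = {"[NP", "NP]"}

def tile_score(tile, memory):
    """memory: list of token sequences mixing POS tags and NP brackets.
    pos = exact occurrences of the tile (brackets must match);
    neg = occurrences of the tile's POS tags with different bracketing."""
    pos_tags = [tok for tok in tile if tok not in BRACKETS]
    pos = sum(occurrences(tile, seq) for seq in memory)
    total = sum(occurrences(pos_tags, [t for t in seq if t not in BRACKETS])
                for seq in memory)
    neg = max(total - pos, 0)
    return pos / (pos + neg) if pos + neg else 0.0

memory = [
    "[NP DT NN1 NP] VBD [NP DT NN1 NN1 NP]".split(),
    "[NP NN2 NP] .".split(),
    "[NP ADJ NN2 NP] AUX VBG PRP [NP DT ADJ NN1 NP] .".split(),
]
print(tile_score("NN1 NP]".split(), memory))   # pos=3, neg=1 -> 0.75
```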

22 Covers
Tile t1 connects to t2 in a candidate if:
–t2 starts after t1
–there is no gap between them (there may be overlap)
–t2 ends after t1
[Figure: two connecting tiles, PRP [NP DT and [NP DT ADJ NN1 NP] PRP, drawn over the candidate went to [[ the white house ]] for dinner]
A sequence of tiles covers a candidate if:
–each tile connects to the next
–the tiles collectively match the entire candidate, including brackets and possibly some context
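
The connection test is easy to state over tile spans; a small sketch, assuming Python and (start, end) token offsets relative to the candidate:

```python
def connects(t1, t2):
    """t1, t2: (start, end) token spans of tiles placed over the candidate
    (end exclusive).  t2 connects to t1 if it starts after t1 does, leaves
    no gap (overlap allowed), and extends past t1's end."""
    (s1, e1), (s2, e2) = t1, t2
    return s2 > s1 and s2 <= e1 and e2 > e1

print(connects((0, 3), (2, 6)))   # True: overlapping, extends further
print(connects((0, 3), (4, 6)))   # False: a one-token gap
```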

23 [Figure: cover graph for the candidate went to [[ the white house ]] for dinner, with START and END nodes, the example tiles as intermediate nodes, and edges between connecting tiles]

24 Measures of 'Goodness'
–Number of different covers
–Size of smallest cover (fewest tiles)
–Maximum context in any cover (left + right)
–Maximum overlap of tiles in any cover
–Grand total positive evidence divided by grand total positive+negative evidence
Combine these measures by linear weighting

25 Scoring a Candidate
CandidateScore(candidate, θ_T):
–G ← CoverGraph(candidate, θ_T)
–Compute statistics by DFS on G
–Compute the candidate score as a linear function of the statistics
Complexity (O(l) tiles in a candidate of length l):
–Creating the cover graph is O(l²)
–DFS is O(V+E) = O(l²)
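
A partial sketch (Python, reusing the connects check from the sketch above) of gathering two of the cover statistics by DFS; tiles are (start, end) spans relative to the candidate, and the remaining statistics and the linear weighting are left out.

```python
def cover_stats(tiles, cand_len):
    """tiles: (start, end) spans of matching tiles, relative to the candidate
    (0 .. cand_len); starts <= 0 or ends >= cand_len may include context.
    Enumerates covers by DFS over the implicit cover graph and returns
    (number of covers, size of the smallest cover)."""
    n_covers, smallest = 0, None

    def dfs(tile, depth):
        nonlocal n_covers, smallest
        if tile[1] >= cand_len:                      # this tile reaches END
            n_covers += 1
            smallest = depth if smallest is None else min(smallest, depth)
            return
        for nxt in tiles:
            if connects(tile, nxt):                  # edge in the cover graph
                dfs(nxt, depth + 1)

    for t in tiles:
        if t[0] <= 0:                                # connects to START
            dfs(t, 1)
    return n_covers, smallest
```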

26 Full Algorithm
MBSL(sent, θ_C, θ_T):
1. For each subsequence of sent, do:
   1. Construct a candidate s by adding brackets [[ and ]] before and after the subsequence
   2. f_C(s) ← CandidateScore(s, θ_T)
   3. If f_C(s) > θ_C, then add s to candidate-set
2. For each c in candidate-set, in decreasing order of f_C(c), do:
   1. Remove all candidates overlapping with c from candidate-set
3. Return candidate-set as target instances
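
A sketch of this outer loop in Python, assuming a candidate_score function that wraps the cover-graph scoring of the previous slides; max_len and the span encoding are illustrative assumptions.

```python
def mbsl(pos_tags, candidate_score, theta_c, max_len=10):
    """pos_tags: the sentence as a POS sequence.  candidate_score((i, j)) is
    assumed to bracket the subsequence and return f_C for it."""
    candidates = []
    for i in range(len(pos_tags)):
        for j in range(i + 1, min(i + max_len, len(pos_tags)) + 1):
            score = candidate_score((i, j))
            if score > theta_c:
                candidates.append(((i, j), score))

    # Greedy selection: keep the best-scoring candidates, dropping overlaps.
    selected = []
    for span, score in sorted(candidates, key=lambda c: -c[1]):
        if all(span[1] <= s or span[0] >= e for (s, e), _ in selected):
            selected.append((span, score))
    return [span for span, _ in selected]
```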

27 Results
Target type  Context size  θ_T  Prec.  Recall
NP           3             0.6  92
SV           3             0.6  89     85
VO           2             0.5  77     90

