1
LING 388 Language and Computers Lecture 23 12/2/03 Sandiway FONG
2
Administrivia
3
Psychological Reality and Computation
The DCG we’ve used in class has no problems handling a variety of embedded relative clauses.
However, the human processor reacts quite differently …
4
Doubly Embedded Relative Clauses
Last Time: You guys ranked sentences containing different kinds of doubly embedded relative clauses with respect to comprehensibility:
1. I hate the man that the cat that Mary saw hissed at
2. I hate the man that saw the cat that hissed at John
3. I hate the man that the cat that hissed at John saw
4. I hate the man that hissed at the cat that John saw
5
Doubly Embedded Relative Clauses
Four types: subject or object for outer and inner relative clauses
Syntax (outer/inner):
1. I hate [NP the man [CP that [NP the cat [CP that Mary saw e]] hissed at e]] (object/object)
2. I hate [NP the man [CP that e saw [NP the cat [CP that e hissed at John]]]] (subject/subject)
3. I hate [NP the man [CP that [NP the cat [CP that e hissed at John]] saw e]] (object/subject)
4. I hate [NP the man [CP that e hissed at [NP the cat [CP that John saw e]]]] (subject/object)
6
Doubly Embedded Relative Clauses
Results (8 of you did the survey):
1. Object/object (hardest): 5 out of 8 agreed it was the hardest
2. Object/subject: 4 out of 8 said it was the hardest, the other 4 said it was the 2nd hardest
3. Subject/object: 6 out of 8 agreed it was the 2nd easiest
4. Subject/subject (easiest): 6 out of 8 agreed it was the easiest
More on statistics later …
7
Doubly Embedded Relative Clauses
Results:
1. Object/object (hardest): the man_1 … the cat_2 … saw e_2 … hissed at e_1
2. Object/subject: the man_1 … the cat_2 … e_2 hissed at … saw e_1
3. Subject/object: the man_1 … e_1 hissed at the cat_2 … saw e_2
4. Subject/subject (easiest): the man_1 … e_1 saw the cat_2 … e_2 hissed at
8
Doubly Embedded Relative Clauses
Computational Power Differences:
Object/object and object/subject cases involve center embedding (the inner relative clause sits in the middle of the outer one) and thus require a stack
Subject/object and subject/subject cases do not involve center embedding => computationally less complex
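One way to see the difference is to count how many filler-gap dependencies are open at the same time in each of the four patterns. The Python sketch below is only an illustration, not a parsing model: the encoding of each pattern as ("filler", i) and ("gap", i) events and the name `max_open_dependencies` are assumptions made for this example.

```python
def max_open_dependencies(events):
    """Return the largest number of fillers waiting for a gap at any point."""
    pending = []   # fillers seen so far whose gap has not yet appeared
    deepest = 0
    for kind, index in events:
        if kind == "filler":
            pending.append(index)
            deepest = max(deepest, len(pending))
        else:  # kind == "gap": the matching filler is resolved
            pending.remove(index)
    return deepest

# Order of fillers (the man = 1, the cat = 2) and gaps (e) in each pattern.
PATTERNS = {
    "object/object":   [("filler", 1), ("filler", 2), ("gap", 2), ("gap", 1)],
    "object/subject":  [("filler", 1), ("filler", 2), ("gap", 2), ("gap", 1)],
    "subject/object":  [("filler", 1), ("gap", 1), ("filler", 2), ("gap", 2)],
    "subject/subject": [("filler", 1), ("gap", 1), ("filler", 2), ("gap", 2)],
}

for name, events in PATTERNS.items():
    print(name, max_open_dependencies(events))
```

In the two harder patterns both fillers are pending at once, so something like a stack is needed to remember them; in the two easier patterns at most one is pending at any time.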
9
Doubly Embedded Relative Clauses
Statistics: Can we characterize the amount of uncertainty in our results?
Information Theory (Shannon)
Uncertainty measure is given by the formula:
$$H = -\sum_{i} p_i \lg p_i$$
r = 8 (number of responses)
lg = log base 2
p_i = proportion with ranking i
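Shannon's uncertainty measure is the entropy, the sum over rankings of p_i times lg(1/p_i). A minimal Python sketch of the calculation follows, assuming each result is treated as a two-way split between respondents who agree with the majority ranking and those who disagree; the function names and the default of 8 respondents are choices made for this example.

```python
import math

def uncertainty(proportions):
    """Shannon entropy in bits; zero proportions contribute nothing."""
    return sum(p * math.log2(1 / p) for p in proportions if p > 0)

def agreement_uncertainty(disagreers, respondents=8):
    """Entropy of an agree/disagree split among the respondents."""
    p = disagreers / respondents
    return uncertainty([p, 1 - p])

for d in range(5):
    print(d, "disagreers:", round(agreement_uncertainty(d), 2))

# Four rankings chosen uniformly at random.
print("random:", uncertainty([0.25] * 4))
```

With 2, 3 and 4 disagreers out of 8 this gives about 0.81, 0.95 and 1, and a uniform choice among four rankings gives 2, in line with the rounded values on the slides.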
10
Doubly Embedded Relative Clauses
Results (with uncertainty values):
1. Object/object (hardest): 0.95
2. Object/subject: 1.00
3. Subject/object: 0.8
4. Subject/subject (easiest): 0.8
Uncertainty scale:
0 (0 disagreers)
0.5 (1 disagreer)
0.8 (2 disagreers)
0.95 (3 disagreers)
1 (4 disagreers)
2 (random)
11
POS Tagging
Review:
Components:
1. Dictionary
2. Mechanism to assign tags:
  Context-free: by frequency
  Fix up tags: by local context
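The context-free step can be sketched as a dictionary lookup that returns each word's most frequent tag. The Python example below is a rough illustration only: the tiny training list is invented for this example (it is not from the lecture corpus), and the function names and the default tag for unknown words are assumptions.

```python
from collections import Counter, defaultdict

# Toy tagged data, invented for illustration.
TRAINING = [
    ("the", "DT"), ("walk", "NN"),
    ("a", "DT"), ("walk", "NN"),
    ("to", "TO"), ("walk", "VB"),
]

def build_dictionary(tagged_words):
    """Map each word to a count of the tags it was seen with."""
    counts = defaultdict(Counter)
    for word, tag in tagged_words:
        counts[word][tag] += 1
    return counts

def most_frequent_tag(dictionary, word, default="NN"):
    """Context-free tagging: pick the word's most frequent tag."""
    if word not in dictionary:
        return default          # assumed fallback for unknown words
    return dictionary[word].most_common(1)[0][0]

dictionary = build_dictionary(TRAINING)
print([(w, most_frequent_tag(dictionary, w)) for w in ["to", "walk"]])
```

Here "walk" comes out as NN even after "to", because the lookup ignores context; that is the kind of error the local-context fix-up step is meant to repair.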
12
Tags
13
Transformation-Based Tagging (TBT)
Basic idea (Brill, 1995):
Tag Transformation Rules: change a tag to another tag by inspection of local context
Train a system to find these rules:
  Search space of possible rules
  Error-driven procedure
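The error-driven search can be sketched as a greedy loop: score every candidate rule by how many tagging errors it removes, keep the best one, and stop when no rule helps. The Python code below is a simplified sketch of that idea rather than Brill's actual implementation; the rule representation (old tag, new tag, context tag, offset), the candidate list and the toy sentence are all assumptions made for this example.

```python
def apply_rule(tags, rule):
    """Change old -> new wherever the context tag sits at the given offset."""
    old, new, context, offset = rule
    result = list(tags)
    for i, tag in enumerate(tags):
        j = i + offset
        if tag == old and 0 <= j < len(tags) and tags[j] == context:
            result[i] = new
    return result

def errors(tags, gold):
    """Number of positions where the current tag differs from the correct one."""
    return sum(t != g for t, g in zip(tags, gold))

def learn(tags, gold, candidates):
    """Greedily collect rules while they reduce the error count."""
    learned = []
    while True:
        best = min(candidates, key=lambda r: errors(apply_rule(tags, r), gold))
        improved = apply_rule(tags, best)
        if errors(improved, gold) >= errors(tags, gold):
            return learned, tags
        learned.append(best)
        tags = improved

# "they want to walk the walk" with hypothetical initial and correct tags.
initial = ["PP", "VBP", "TO", "NN", "DT", "NN"]
gold    = ["PP", "VBP", "TO", "VB", "DT", "NN"]
candidates = [
    ("NN", "VB", "TO", -1),   # helps: fixes "to walk"
    ("NN", "VB", "DT", -1),   # hurts: breaks "the walk"
    ("VBP", "VB", "PP", -1),  # hurts: breaks "they want"
]
rules, final = learn(initial, gold, candidates)
print(rules, final)
```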
14
TBT: Space of Possible Rules
Fixed window around current tag:
t-3 t-2 t-1 t0 t1 t2 t3
Prolog-based µ-TBL notation (Lager, 1999):
current tag > new tag <- tag@[+/-N]
“change current tag to new tag if tag at position +/-N”
15
TBT: Rules Learned
Examples of rules learned (Manning & Schütze, 1999) (µ-TBL-style format):
NN > VB <- TO@[-1]  (… to walk …)
VBP > VB <- MD@[-1,-2,-3]  (… could have put …)
JJR > RBR <- JJ@[1]  (… more valuable player …)
VBP > VB <- n’t@[-1,-2]  (… did n’t cut …) (n’t is a separate word)
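To make the notation concrete, the Python sketch below reads a rule written in this style and applies it to a tag sequence. It is an illustration only and not part of µ-TBL: the parser handles just the simplified form used on these slides, it tests tags at the listed positions (so the n’t rule, which tests a word, is not covered), and the initial tags in the example are hypothetical.

```python
import re

# old > new <- context@[offset, offset, ...]
RULE = re.compile(r"(\S+)\s*>\s*(\S+)\s*<-\s*([^\s@]+)@\[([^\]]+)\]")

def parse_rule(text):
    """Split a rule string into (old, new, context, list of offsets)."""
    old, new, context, offsets = RULE.fullmatch(text.strip()).groups()
    return old, new, context, [int(n) for n in offsets.split(",")]

def apply_rule(tags, rule):
    """Change old -> new if the context tag occurs at any listed offset."""
    old, new, context, offsets = rule
    result = list(tags)
    for i, tag in enumerate(tags):
        if tag != old:
            continue
        if any(0 <= i + n < len(tags) and tags[i + n] == context
               for n in offsets):
            result[i] = new
    return result

rule = parse_rule("VBP > VB <- MD@[-1,-2,-3]")
# "could have put", with "have" initially mis-tagged as VBP (hypothetical).
print(apply_rule(["MD", "VBP", "VBN"], rule))
```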
16
The µ-TBL System
Implements Transformation-Based Learning
  Can be used for POS tagging as well as other applications
Implemented in Prolog (code and data)
Downloadable from http://www.ling.gu.se/~lager/mutbl.html
Full system for Windows (based on SICStus Prolog)
Includes tagged Wall Street Journal corpora
17
The µ-TBL System
Tagged Corpus (for training and evaluation)
Format:
wd(P,W)       P = index of W in corpus, W = word
tag(P,T)      T = tag of word at index P
tag(T1,T2,P)  T1 = tag of word at index P, T2 = correct tag
(For efficient access: Prolog first argument indexing)
18
The µ-TBL System
Example of tagged WSJ corpus:
wd(63,'Longer'). tag(63,'JJR'). tag('JJR','JJR',63).
wd(64,maturities). tag(64,'NNS'). tag('NNS','NNS',64).
wd(65,are). tag(65,'VBP'). tag('VBP','VBP',65).
wd(66,thought). tag(66,'VBN'). tag('VBN','VBN',66).
wd(67,to). tag(67,'TO'). tag('TO','TO',67).
wd(68,indicate). tag(68,'VBP'). tag('VBP','VB',68).
wd(69,declining). tag(69,'VBG'). tag('VBG','VBG',69).
wd(70,interest). tag(70,'NN'). tag('NN','NN',70).
wd(71,rates). tag(71,'NNS'). tag('NNS','NNS',71).
wd(72,because). tag(72,'IN'). tag('IN','IN',72).
wd(73,they). tag(73,'PP'). tag('PP','PP',73).
wd(74,permit). tag(74,'VB'). tag('VB','VBP',74).
wd(75,portfolio). tag(75,'NN'). tag('NN','NN',75).
wd(76,managers). tag(76,'NNS'). tag('NNS','NNS',76).
wd(77,to). tag(77,'TO'). tag('TO','TO',77).
wd(78,retain). tag(78,'VB'). tag('VB','VB',78).
wd(79,relatively). tag(79,'RB'). tag('RB','RB',79).
wd(80,higher). tag(80,'JJR'). tag('JJR','JJR',80).
wd(81,rates). tag(81,'NNS'). tag('NNS','NNS',81).
wd(82,for). tag(82,'IN'). tag('IN','IN',82).
wd(83,a). tag(83,'DT'). tag('DT','DT',83).
wd(84,longer). tag(84,'RB'). tag('RB','JJR',84).
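The three-argument tag facts pair the current tag with the correct one, so the positions where they differ are the errors a learner would try to fix. A small Python sketch follows, assuming the facts are available as plain text in the format shown; the sample copies four entries from the example, and the regular expressions and the name `mistagged` are choices made for this illustration.

```python
import re

FACTS = """
wd(67,to). tag(67,'TO'). tag('TO','TO',67).
wd(68,indicate). tag(68,'VBP'). tag('VBP','VB',68).
wd(74,permit). tag(74,'VB'). tag('VB','VBP',74).
wd(84,longer). tag(84,'RB'). tag('RB','JJR',84).
"""

# wd(P,W): the word may or may not be quoted.
WD = re.compile(r"wd\((\d+),'?([^')]+)'?\)")
# tag(T1,T2,P): current tag, correct tag, position.
TAG3 = re.compile(r"tag\('([^']+)','([^']+)',(\d+)\)")

def mistagged(facts):
    """List (position, word, current tag, correct tag) where the tags differ."""
    words = {int(p): w for p, w in WD.findall(facts)}
    return [
        (int(p), words[int(p)], current, correct)
        for current, correct, p in TAG3.findall(facts)
        if current != correct
    ]

for entry in mistagged(FACTS):
    print(entry)
```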
19
The µ-TBL System
22
Demo …
Off the webpage
Tag transformation rules are:
  Human readable
  More powerful than simple bigrams
  Take less “effort” to train