1 Competitive Grammar Writing. Jason Eisner (Johns Hopkins) and Noah A. Smith (Carnegie Mellon). TeachCL Workshop @ ACL – June 20, 2008.
2 1. Welcome to the lab exercise! Please form teams of ~3 people. Programmers, get a linguist on your team, and vice versa; undergrads, get a grad student on your team, and vice versa. We always run this exercise on the 1st day of the Johns Hopkins Summer School in Human Language Technology (every summer since 2002; thank you JHU, NSF, and NAACL). We've also run variants in our JHU & CMU classes.
3 2. Okay, team, please log in. The 3 of you should use adjacent workstations and log in as individuals. Your secret team directory: cd …/03-turbulent-kiwi. You can all edit files there: it is publicly readable & writable, but no one else knows the secret directory name, which minimizes permissions fuss.
4 3. Now write a grammar of English. You have 2 hours. (Actually, as the deadline approaches, the teams usually vote to stay an extra hour.)
5 3. Now write a grammar of English. What's a grammar? Here's one to start with: a weighted CFG, with each rule shown as its weight, its left-hand side, and its right-hand side.

    1   S1 → NP VP .
    1   VP → VerbT NP
    20  NP → Det N'
    1   NP → Proper
    20  N' → Noun
    1   N' → N' PP
    1   PP → Prep NP

You have 2 hours.
6 3. Now write a grammar of English. Besides the nonterminal rules above, the starting grammar includes initial terminal rules such as:

    1  Noun → castle      1  Noun → king        …
    1  Proper → Arthur    1  Proper → Guinevere …
    1  Det → a            1  Det → every        …
    1  VerbT → covers     1  VerbT → rides      …
    1  Misc → that        1  Misc → bloodier    1  Misc → does …

Any PCFG is okay.
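For concreteness, here is a minimal sketch (not the workshop's actual tooling) of how a weighted grammar in this style could be read into memory, assuming each rule is stored on one line as its weight, its left-hand-side symbol, and its right-hand-side symbols:

    from collections import defaultdict

    def load_grammar(path):
        """Map each left-hand-side symbol to a list of (weight, rhs_symbols) pairs."""
        rules = defaultdict(list)
        with open(path) as f:
            for line in f:
                line = line.split('#')[0].strip()   # tolerate comments and blank lines
                if not line:
                    continue
                fields = line.split()
                weight, lhs, rhs = float(fields[0]), fields[1], tuple(fields[2:])
                rules[lhs].append((weight, rhs))
        return rules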
7 Sample a sentence on the blackboard (any PCFG is okay). Using the initial grammar above, the first expansion is S1 → NP VP .
8 Next, expand the NP. The grammar has NP → Det N' with weight 20 and NP → Proper with weight 1, so Det N' is chosen with probability 20/21 and Proper with probability 1/21.
9 Continuing: Det → every, N' → Noun → castle, and so on, e.g. "every castle drinks [[Arthur [across the [coconut in the castle]]] [above another chalice]]". An arbitrary PCFG is okay.
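The blackboard procedure above can be written as a tiny recursive sampler. This is purely illustrative (not the distributed sampling script); it assumes the load_grammar format sketched earlier, with S1 as the start symbol and any symbol that never appears on a left-hand side treated as a word:

    import random

    def sample(rules, symbol='S1'):
        """Expand `symbol` top-down, picking each rule with probability
        proportional to its weight; unknown symbols are emitted as words."""
        if symbol not in rules:                 # terminal: a word of the sentence
            return [symbol]
        weights, rhss = zip(*[(w, rhs) for w, rhs in rules[symbol]])
        rhs = random.choices(rhss, weights=weights, k=1)[0]
        words = []
        for child in rhs:
            words.extend(sample(rules, child))
        return words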
10 4. Okay – go! How will we be tested on this?
11 5. Evaluation procedure. We'll sample 20 random sentences from your PCFG. Human judges will vote on whether each sentence is grammatical. (By the way, y'all will be the judges, double-blind; this is educational.) You probably want to use the sampling script to keep testing your grammar along the way.
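To run that kind of check yourself with the sketch above, you might sample a small batch and read it over (the file name below is hypothetical):

    rules = load_grammar('grammar.gr')   # hypothetical file in the weight/LHS/RHS format above
    for _ in range(20):
        print(' '.join(sample(rules)))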
12 "Ok, we're done! All our sentences are already grammatical." (True even of the unchanged initial grammar.) You're right: judging sentences sampled from your PCFG only tests precision. How about recall?
13 Development set. We provide a file of 27 sample sentences illustrating a range of grammatical phenomena: questions, movement, (free) relatives, clefts, agreement, subcat frames, conjunctions, auxiliaries, gerunds, sentential subjects, appositives, … You might want your grammar to generate:

    Arthur is the king.
    Arthur rides the horse near the castle.   (already covered by the initial grammar)
    riding to Camelot is hard.
    do coconuts speak ?
    what does Arthur ride ?
    who does Arthur suggest she carry ?
    why does England have a king ?
    are they suggesting Arthur ride to Camelot ?
    five strangers are at the Round Table.
    Guinevere might have known.
    Guinevere should be riding with Patsy.
    it is Sir Lancelot who knows Zoot !
    either Arthur knows or Patsy does.
    neither Sir Lancelot nor Guinevere will speak of it.
14 Development set (continued). More sentences you might want your grammar to generate:

    the Holy Grail was covered by a yellow fruit.
    Zoot might have been carried by a swallow.
    Arthur rode to Camelot and drank from his chalice.
    they migrate precisely because they know they will grow.
    do not speak !
    Arthur will have been riding for eight nights.
    Arthur, sixty inches, is a tiny king.
    Arthur knows Patsy, the trusty servant.
    Arthur and Guinevere migrate frequently.
    he knows what they are covering with that story.
    Arthur suggested that the castle be carried.
    the king drank to the castle that was his home.
    when the king drinks, Patsy drinks.
15 5'. Evaluation of recall (= productivity!!). What we could have done: cross-entropy on a similar, held-out test set, e.g. "every coconut of his that the swallow dropped sounded like a horse." How should we parse sentences with OOV words? We don't have to: the vocabulary is fixed, and no OOVs are allowed in the test set.
16 5'. Evaluation of recall (= productivity!!). What we could have done (good for your class?): cross-entropy on a similar, held-out test set. What we actually did, to heighten competition & creativity: the test set comes from the participants! As in Boggle, where you get points for finding words that your opponents don't find, you should try to generate sentences that your opponents can't parse. Use the fixed vocabulary creatively.
17 Use the fixed vocabulary creatively. The initial terminal rules (listed above) stick to 3rd-person singular transitive present-tense forms; all grammatical. But we also provide 183 Misc words (not accessible from the initial grammar) that you're free to work into your grammar …
18 Those 183 Misc words include pronouns (various cases), plurals, various verb forms, non-transitive verbs, adjectives (various forms), adverbs & negation, conjunctions & punctuation, wh-words, …
20 5'. Evaluation of recall, continued. We'll score your cross-entropy when you try to parse the sentences that the other teams generate (only the ones judged grammatical). You probably want to use the parsing script to keep testing your grammar along the way.
21 What if my grammar can't parse one of the test sentences, i.e. it assigns probability 0? You get the infinite penalty. So don't do that.
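One plausible way to compute such a score, as a hedged sketch: assume some parser function sentence_logprob(words) (a placeholder, not part of the distributed scripts) that returns the log2 of the total probability your grammar assigns to a sentence, or None if the sentence has no parse. Cross-entropy is then the average number of bits per word (one common normalization), and any unparsable sentence makes it infinite:

    import math

    def cross_entropy(test_sentences, sentence_logprob):
        """Average -log2 p(sentence) per word over the judged-grammatical test set."""
        total_bits, total_words = 0.0, 0
        for words in test_sentences:
            lp = sentence_logprob(words)
            if lp is None:                  # probability 0: the infinite penalty
                return math.inf
            total_bits += -lp
            total_words += len(words)
        return total_bits / total_words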
22 Use a backoff grammar. The initial backoff grammar is a bigram POS HMM, encoded as CFG rules:

    S2 → _Noun
    S2 → _Misc
    _Noun → Noun
    _Noun → Noun _Noun
    _Noun → Noun _Misc
    _Misc → Misc
    _Misc → Misc _Noun
    _Misc → Misc _Misc
    (etc.)

Here _Verb means "something that starts with a Verb", _Misc means "something that starts with a Misc", and so on. For example, S2 derives the string "rides 's ! swallow" via S2 → _Verb, _Verb → Verb _Misc, _Misc → Misc _Punc, _Punc → Punc _Noun, _Noun → Noun.
23 Use a backoff grammar: keep the initial linguistic grammar (the S1 rules above) alongside the initial backoff grammar (the bigram POS HMM over S2).
24 The initial master grammar ties them together as a mixture model:

    START → S1
    START → S2

Choose these weights wisely!
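How much weight to give S2 is up to you. As a toy illustration (the 99/1 split below is made up, not a recommendation), the START rule weights turn into mixture probabilities as follows; note that any nonzero S2 weight gives every sentence over the fixed vocabulary some probability (assuming the backoff grammar can tag every word), which avoids the infinite penalty above:

    def mixture_probs(w_linguistic, w_backoff):
        """Probability of starting with S1 vs. S2, given the two START rule weights."""
        total = w_linguistic + w_backoff
        return {'S1 (linguistic)': w_linguistic / total,
                'S2 (backoff)': w_backoff / total}

    print(mixture_probs(99, 1))   # {'S1 (linguistic)': 0.99, 'S2 (backoff)': 0.01}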
25 6. Discussion. What did you do, and how? Was CFG expressive enough (e.g., for features and gapping)? How would you improve the formalism? Would it work for other languages? How should one pick the weights, and how could you build a better backoff grammar? Is grammaticality well-defined? How is it related to probability? What if you had 36 person-months to do it right? What other tools or data do you need? What would the resulting grammar be good for? What evaluation metrics are most important?
26 7. Winners announced
27 7. Winners announced. Of course, no one finishes their ambitious plans (an alternative: allow 2 weeks; see paper). Anyway, it's a lot of work! It helps to favor the backoff grammar.
28 What did they do? (see paper) More fine-grained parts of speech; do-support for questions & negation; movement using gapped categories; X-bar categories (following the initial grammar); singular/plural features; pronoun case; verb forms; verb subcategorization and selectional restrictions ("location"); comparative vs. superlative adjectives; appositives (must avoid double comma); a bit of experimentation with weights; and one successful attempt to game the scoring system (ok with us!).
29 Why do we recommend this lesson? It's a good opening activity: no programming, only very simple probability, and no background needed beyond linguistic intuitions (though with the time constraints, it helps to have a linguist on the team). It works great with diverse teams: social, intense, a good mixer, and it sets the pace. http://www.clsp.jhu.edu/grammar-writing
30 Why do we recommend this lesson? Beyond being a good opening activity, it introduces many topics and serves as a touchstone for later teaching: grammaticality (grammaticality judgments, formal grammars, parsers); specific linguistic phenomena (the desperate need for features, morphology, gap-passing); generative probability models, i.e. PCFGs and HMMs (backoff, inside probability, random sampling, …); recovering latent variables (parse trees and POS taggings); evaluation, sort of (annotation, precision, recall, cross-entropy, …); and manual parameter tuning (why learning would be valuable, alongside expert knowledge). http://www.clsp.jhu.edu/grammar-writing
31 A final thought. The CS curriculum starts with programming: it is accessible and hands-on, and necessary to motivate or understand much of CS. In CL, the equivalent is grammar writing; it was the traditional (pre-statistical) introduction. Our contributions: a competitive game, statistics, finite-state backoff, and reusable instructional materials. Much of CL work still centers on grammar formalisms, which are akin to programming languages: we design expressive formalisms for linguistic data, solve linguistic problems within these formalisms, enrich them with probabilities, process them with algorithms, learn them from data, and connect them to other modules in the pipeline.