Slide 1: Exponential Decay Pruning for Bottom-Up Beam-Search Parsing
Nathan Bodenstab, Brian Roark, Aaron Dunlop, and Keith Hall. April 2010.

Slide 2: Talk Outline
Intro to Syntactic Parsing
– Why Parse?
Parsing Algorithms
– CYK
– Best-First
– Beam-Search
Exponential Decay Pruning
Results

Slide 3: Intro to Syntactic Parsing
Hierarchically cluster and label syntactic word groups (constituents).
Parsing provides structure and meaning.

Slide 4: Intro to Syntactic Parsing
Why Parse?
– Machine Translation: synchronous grammars
– Language Understanding: semantic role labeling, word sense disambiguation, question answering, document summarization
– Language Modeling: long-distance dependencies
– Because it's fun

Slide 5: Intro to Syntactic Parsing
What you (usually) need to parse:
– Supervised data: a treebank of sentences with annotated parse structure (the WSJ treebank has 50k sentences)
– A binarized probabilistic context-free grammar induced from the treebank
– A parsing algorithm
Example grammar rules:
– S → NP VP (p=0.2)
– NP → NP NN (p=0.1)
– NP → JJ NN (p=0.06)
Binarization example: VP → PP VB NN becomes
– VP → PP @VP (p=0.2)
– @VP → VB NN (p=0.5)
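To make the binarization concrete, here is a minimal Python sketch. The tuple-based rule representation and the probability handling for intermediate symbols are illustrative assumptions, not the authors' implementation (in practice the probability mass assigned to @-symbols depends on how the factored rules are shared across the grammar).

```python
# A minimal sketch of right-binarization: an n-ary rule such as
# VP -> PP VB NN is factored into binary rules via intermediate @-symbols.
# Rule format (lhs, rhs, prob) and the p=1.0 placeholder for factored
# rules are assumptions for illustration.

def binarize(lhs, rhs, prob):
    """Right-factor an n-ary PCFG rule into binary rules."""
    rules = []
    while len(rhs) > 2:
        inter = "@" + lhs                     # intermediate symbol, e.g. @VP
        rules.append((lhs, (rhs[0], inter), prob))
        lhs, rhs, prob = inter, rhs[1:], 1.0  # factored-mass convention varies
    rules.append((lhs, tuple(rhs), prob))
    return rules

print(binarize("VP", ["PP", "VB", "NN"], 0.2))
# [('VP', ('PP', '@VP'), 0.2), ('@VP', ('VB', 'NN'), 1.0)]
```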

Slide 6: Parsing Accuracy
Accuracy improvements come from grammar refinement:
– Split the original non-terminal categories (e.g., Subject-NP vs. Object-NP)
– Accuracy comes at the cost of speed: the solution space becomes impractical to search exhaustively

                             Non-terminals  Grammar size  Sec/sent  F-score
Baseline                     2,500          64,000        0.1       74%
Parent annotation (Johnson)  6,000          75,000        1.0       78%
Manual refinement (Klein)    15,000         –             –         86%
Latent variable (Petrov)     1,100          4,000,000     100.0     89%
Lexical (Collins, Charniak)  Lots           Implicit      –         89%

Slide 7: Berkeley Grammar & Parser
Petrov et al. automatically split non-terminals using latent variables.
Example grammar rules:
– S_3 → NP_12 VP_6 (p=0.2)
– NP_12 → NP_9 NN_7 (p=0.1)
– NN_7 → house (p=0.06)
The Berkeley coarse-to-fine parser uses six latent-variable grammars:
– Parse the input sentence once with each grammar
– Posterior probabilities from pass n are used to prune pass n+1
– This requires knowing the mapping between non-terminals of different grammars, e.g., Grammar(2) { NP_1, NP_6 } → Grammar(3) { NP_2, NP_9, NP_14 }
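A hedged sketch of that pass-to-pass pruning step follows. The mapping table, the posterior threshold, and the function name are illustrative assumptions; in particular, the slide gives the Grammar(2) and Grammar(3) label sets jointly, so the exact coarse-to-fine alignment below is invented for illustration.

```python
# Sketch of coarse-to-fine pruning: posteriors from pass n decide which
# fine-grammar labels pass n+1 may build in a given chart cell.
coarse_to_fine = {"NP_1": {"NP_2", "NP_9"}, "NP_6": {"NP_14"}}  # assumed split

def allowed_fine_labels(coarse_posteriors, threshold=1e-4):
    """Keep fine-grammar labels whose coarse parent survived pruning."""
    allowed = set()
    for coarse_label, posterior in coarse_posteriors.items():
        if posterior >= threshold:
            allowed |= coarse_to_fine.get(coarse_label, set())
    return allowed

print(allowed_fine_labels({"NP_1": 0.3, "NP_6": 1e-7}))  # {'NP_2', 'NP_9'}
```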

Slide 8: Research Goals
Our research goals:
– Find good solutions very quickly in this LARGE grammar space (not ML)
– Algorithms should be grammar-agnostic
– Consider practical implications (speed, memory)
This talk: Exponential Decay Pruning
– Beam-search parsing for efficient search
– Searches the final grammar space directly
– Balances the overhead of targeted exploration (best-first) against the memory and cache benefits of local exploration (CYK)

Slide 9: Parsing Algorithms: CYK
(Section divider; the talk outline from slide 2 is repeated here.)

Slide 10: Parsing Algorithms: CYK
Exhaustively populates the chart with all parse trees permitted by the grammar.
The dynamic-programming algorithm gives the maximum-likelihood solution.

Slide 11: Parsing Algorithms: CYK
Fill in cells for SPAN = 1, 2, 3, 4, ...
Example grammar:
– S → NP VP (p=0.7)
– NP → NP NP (p=0.2)
– NP → NP VP (p=0.1)
– NN → court (p=0.4)
– VB → court (p=0.1)
– ...

Slide 12: Parsing Algorithms: CYK
N iterations through the grammar at each chart cell to consider all possible midpoints.
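The midpoint loop can be made concrete with a compact Viterbi-CYK sketch. The grammar and lexicon encodings (log-prob dictionaries) are assumptions for illustration, not the authors' data structures.

```python
# Compact CYK sketch: for each cell (i, j) and midpoint m, combine entries
# from (i, m) and (m, j) with binary grammar rules, keeping the max score.
import math
from collections import defaultdict

def cyk(words, lexicon, binary_rules):
    """lexicon: word -> {POS: log-prob}; binary_rules: (B, C) -> {A: log-prob}."""
    n = len(words)
    chart = defaultdict(dict)  # (i, j) -> {label: best log inside score}
    for i, w in enumerate(words):
        chart[(i, i + 1)] = dict(lexicon.get(w, {}))
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            cell = chart[(i, j)]
            for m in range(i + 1, j):                 # all midpoints
                for B, b_score in chart[(i, m)].items():
                    for C, c_score in chart[(m, j)].items():
                        for A, r_score in binary_rules.get((B, C), {}).items():
                            score = r_score + b_score + c_score
                            if score > cell.get(A, -math.inf):
                                cell[A] = score       # keep max (Viterbi)
    return chart
```

After running on a toy grammar, chart[(0, n)] holds the best score for each root label; exhaustiveness over all cells and midpoints is exactly what the pruning methods in this talk attack.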

Slide 13: Parsing Algorithms: Best-First
(Section divider; the talk outline from slide 2 is repeated here.)

Slide 14: Parsing Algorithms: Best-First
The frontier is a priority queue of all potentially buildable entries.
Repeatedly add the best entry from the frontier to the chart, then expand the frontier with all possible chart + grammar extensions.
Example frontier, ranked by figure-of-merit:
– [try][shooting, defendant] VP → VB NP (fom=28.1)
– [try, shooting][defendant] VP → VB NP (fom=14.7)
– [Juvenile][court] NP → ADJ NN (fom=13)

Slide 16: Parsing Algorithms: Best-First
How do we rank frontier entries?
– Figure-of-merit (FOM) = Inside (grammar) score * Outside (heuristic) estimate
– Caraballo and Charniak, 1997 (C&C)
– Problem: comparing entries that cover different spans
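Putting the last two slides together, a minimal agenda-driven loop might look like the sketch below. The edge representation and the expand/is_goal callbacks are assumptions (edges must be hashable here); heapq is a min-heap, so FOM scores are negated.

```python
# Skeleton of best-first parsing over a single global frontier.
import heapq
import itertools

def best_first(seed_edges, expand, fom, is_goal):
    chart, tie = set(), itertools.count()   # counter breaks FOM ties in the heap
    frontier = [(-fom(e), next(tie), e) for e in seed_edges]
    heapq.heapify(frontier)
    while frontier:
        _, _, edge = heapq.heappop(frontier)  # highest-FOM entry first
        if edge in chart:
            continue
        chart.add(edge)                       # commit the entry to the chart
        if is_goal(edge):
            return edge                       # first full-sentence parse found
        for new in expand(edge, chart):       # chart + grammar extensions
            heapq.heappush(frontier, (-fom(new), next(tie), new))
    return None
```

The global queue is what makes the FOM's cross-span comparability matter: every pop competes entries of all spans against each other.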

Slide 17: Parsing Algorithms: Beam-Search
(Section divider; the talk outline from slide 2 is repeated here.)

Slide 18: Parsing Algorithms: Beam-Search
Beam-search: best of both worlds.
– CYK-style exhaustive bottom-up traversal
– At each chart cell: compute the FOM for all possible cell entries, rank them in a temporary local priority queue, and populate the cell with only the n best entries (the beam width)
– Less memory: stores neither all cell entries (CYK) nor bad frontier entries (best-first)
– Runs faster: the search space is pruned (unlike CYK) and no global priority queue must be maintained (unlike best-first)
– Eliminates the problem of comparing cell entries globally
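The per-cell pruning step reduces to keeping the n best FOM-ranked candidates. A minimal sketch, with heapq.nlargest standing in for the temporary local priority queue described on the slide:

```python
# Keep only the beam_width best entries for one chart cell.
import heapq

def populate_cell(candidates, fom, beam_width):
    """candidates: iterable of (label, inside_score) entries for one cell."""
    return heapq.nlargest(beam_width, candidates, key=fom)

cell = populate_cell([("VP", -3.2), ("NP", -5.0), ("S", -9.1)],
                     fom=lambda e: e[1], beam_width=2)
print(cell)  # [('VP', -3.2), ('NP', -5.0)]
```

Only the surviving entries are visible when larger spans are built, which is what prunes the search space relative to exhaustive CYK.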

Slide 19: Parsing Algorithms: Beam-Search
(Section divider; the talk outline from slide 2 is repeated here.)

Slide 20: Exponential Decay Pruning
What is the optimal beam width per chart cell? Common solutions:
– Relative score difference from the highest-ranking entry
– A global maximum number of candidates
Exponential decay pruning:
– An adaptive beam width conditioned on chart-cell information
– Asks: how reliable is our figure-of-merit for this chart cell?
– We plotted the rank of the gold entry against span and sentence length:
  the FOM is more reliable for larger spans (less dependent on the outside estimate),
  and less reliable for short sentences (atypical grammatical structure, at least in the WSJ)

Slide 21: Exponential Decay Pruning
Confidence in the FOM can be modeled with an exponential decay function (the formula appeared as an image and is not preserved in this transcript). Its parameters:
– N0 = global beam-width maximum
– n = sentence length
– s = span length (number of words covered)
– λ = tuning parameter
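Because the formula itself is lost, the sketch below is an ASSUMED instantiation built only from the listed parameters: a generic exponential decay over span length, capped at N0. The exact functional form, including how sentence length n enters, is a guess, not the authors' equation.

```python
# Assumed-form adaptive beam width; only the parameter roles
# (N0, n, s, lambda) come from the slide.
import math

def beam_width(n0, lam, s, n):
    """Beam width for a cell covering s of the n words (assumed form)."""
    # The slide lists n as a parameter, but how it enters the lost formula
    # is unknown; this sketch decays over span length only.
    _ = n
    return max(1, int(round(n0 * math.exp(-lam * s))))

# e.g. n0=60, lam=0.3: span 1 -> 44 entries, span 10 -> 3, span 20 -> 1,
# matching the qualitative claim that larger spans need narrower beams.
```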

Slide 22: Exponential Decay Pruning
(Plot of the fitted exponential decay beam widths; the image is not preserved in this transcript.)

Slide 23: Results
(Section divider; the talk outline from slide 2 is repeated here.)

Slide 24: Results
Setup:
– Wall Street Journal treebank. Train: sections 2-21 (40k sentences); Dev: section 24 (1.3k sentences); Test: section 23 (2.4k sentences)
– Berkeley SM6 latent-variable grammar
– Figure-of-merit from Caraballo and Charniak, 1997 (C&C)
– Also applied Cell Closing Constraints (Roark and Hollingshead, 2008)
– External comparison against the Berkeley coarse-to-fine parser using the same grammar

Slide 25: Results (Dev)

Algorithm    FOM     Beam width  Cell closing  Sec/sent  Chart entries  F-score
CYK          –       –           –             94.1      163,537        87.2
Best-First   Inside  –           –             138.0     152,472        87.2
Best-First   C&C     –           –             1.43      349            85.2
Beam-Search  Inside  Constant    –             5.68      35,501         87.2
Beam-Search  Inside  Decay       –             3.01      20,002         87.0
Beam-Search  C&C     Constant    –             0.62      7,548          87.0
Beam-Search  C&C     Decay       –             0.37      5,145          87.1
Beam-Search  C&C     Constant    Yes           0.31      5,333          87.4
Beam-Search  C&C     Decay       Yes           0.20      3,839          87.5

The figure-of-merit makes a big difference: Best-First with the C&C FOM is a fast solution, but with significant accuracy degradation.

Slide 26: Results (Dev)
Using the inside probability as the FOM (see the table above):
– 95% speed reduction with beam-search over best-first
– Exponential decay adds a further 47% speed reduction

Slide 27: Results (Dev)
Using the C&C FOM (see the table above):
– Beam-search is 57% faster than best-first, and more accurate
– Exponential decay adds a further 40% speed reduction

Slide 29: Results (Test)

Algorithm     FOM  Beam width  Cell closing  Sec/sent  F-score
CYK           –    –           –             76.63     88.0
Beam-Search   C&C  Constant    –             0.45      87.9
Beam-Search   C&C  Decay       –             0.28      88.0
Beam-Search   C&C  Decay       Yes           0.16      88.3
Berkeley C2F  –    –           –             0.21      88.3

– 38% relative speed-up for the decay beam width over the constant beam width
– Decay pruning and Cell Closing Constraints are complementary
– Same ballpark as coarse-to-fine (perhaps a bit faster), and requires no knowledge of the grammar

Slide 30: Thanks

Slide 31: FOM Details
C&C FOM details:
– FOM(NT) = Outside_left * Inside * Outside_right
– Inside = constituent grammar score for NT
– Outside_left = max over POS { POS forward prob * POS-to-NT transition prob }
– Outside_right = max over POS { NT-to-POS transition prob * POS backward prob }
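In log space, the decomposition above might be sketched as follows. The probability tables (forward/backward POS scores and the POS-to-NT transition estimates) are assumed inputs, and the exact boundary-indexing convention is an assumption of this sketch; training those tables is beyond its scope.

```python
# Hedged sketch of the C&C figure-of-merit; all scores are log probabilities.
def fom(nt, i, j, inside, fwd, bwd, pos_to_nt, nt_to_pos):
    """C&C-style FOM for label nt over span [i, j).

    fwd[i]: log forward probs of POS tags at the span's left boundary;
    bwd[j]: log backward probs of POS tags at the right boundary.
    """
    out_left = max(fwd[i][p] + pos_to_nt[p][nt] for p in fwd[i])
    out_right = max(nt_to_pos[nt][p] + bwd[j][p] for p in bwd[j])
    return out_left + inside + out_right   # product becomes a sum in log space
```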

Slide 32: FOM Details
(C&C FOM equations shown as an image; not preserved in this transcript.)

Slide 33: Research Goals
– Find good solutions very quickly in this LARGE grammar space (not ML)
– Algorithms should be grammar-agnostic
– Consider practical implications (speed, memory)
Current projects toward these goals:
– A better FOM function: inside estimate (grammar refinement) and outside estimate (participation in a complete parse tree)
– An optimal chart-traversal strategy: which areas of the search space are most promising? Cell Closing Constraints (Roark and Hollingshead, 2008)
– Balancing targeted and exhaustive exploration: how much "work" should be spent exploring the search space around promising areas? The overhead of targeted exploration (best-first) vs. the memory and cache benefits of local exploration (CYK)

