Portability, Parallelism and Efficiency in Parsing
Dan Bikel
University of Pennsylvania
March 11th, 2002


Slide 1: Parsing: Where are we now?
Pounding away at the Penn Treebank, §23:
  – Collins (1999): LR 88.0, LP 88.3
  – Charniak (2000): LR 89.6, LP 89.5
  – Collins (2000): LR 89.6, LP 89.9
Henderson & Brill (1999), on §22: LR 90.1, LP 92.4
Room to grow: new domains, better performance
(LR = labeled recall, LP = labeled precision)
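The slide reports labeled recall and labeled precision separately. A common single-number summary is their harmonic mean, the balanced F-measure F = 2·LP·LR / (LP + LR). The short Python sketch below computes it for the §23 figures listed on the slide; the function name `f_measure` is ours, not something from the talk.

```python
# Balanced F-measure (F1): harmonic mean of labeled precision and recall.
def f_measure(lp: float, lr: float) -> float:
    """Return F1 for labeled precision lp and labeled recall lr (both in %)."""
    if lp + lr == 0:
        return 0.0  # avoid division by zero when both scores are zero
    return 2 * lp * lr / (lp + lr)


# Section 23 results as listed on the slide: (parser, LR, LP)
results = [
    ("Collins (1999)", 88.0, 88.3),
    ("Charniak (2000)", 89.6, 89.5),
    ("Collins (2000)", 89.6, 89.9),
]

for name, lr, lp in results:
    print(f"{name}: F1 = {f_measure(lp, lr):.2f}")
```

Because LR and LP are close for each of these parsers, F1 lands very near their average; it diverges more when the two differ, as in the Henderson & Brill §22 numbers.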

Slide 2: The Right Architecture for Parallel Parsing
[Figure: architecture diagram. Components shown: CKY Client 1, CKY Client 2, …, CKY Client N; a Switchboard; an Object server; DecoderServer 1 … DecoderServer N, each holding a ModelCollection; and a Language package.]

Slide 3: Architecture for Parallel Parsing II
Highly parallel, multi-threaded
  – New cluster about to come online; poised to take advantage
Fully fault-tolerant
Significant flexibility: layers of abstraction
Optimized for speed
Highly portable to new domains, including new languages
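The slides do not describe the protocol between the Switchboard and its clients. As a hedged illustration only, the Python sketch below shows one way a switchboard-style dispatcher could farm sentences out to N client threads and stay fault-tolerant by re-queuing any sentence whose client fails. The names `Switchboard`, `parse_fn`, `max_attempts` and `flaky_parser` are hypothetical; a real deployment would spread clients across processes or machines rather than threads.

```python
import queue
import threading


class Switchboard:
    """Hypothetical dispatcher: hands sentences to client threads and
    re-queues any sentence whose client raises an exception."""

    def __init__(self, sentences, max_attempts=3):
        self.todo = queue.Queue()
        for idx, sent in enumerate(sentences):
            self.todo.put((idx, sent, 0))  # (sentence id, text, failures so far)
        self.max_attempts = max_attempts
        self.results = {}  # sentence id -> parse
        self.failed = {}   # sentence id -> last error after max_attempts failures
        self.lock = threading.Lock()

    def _client(self, parse_fn):
        while True:
            try:
                idx, sent, attempts = self.todo.get_nowait()
            except queue.Empty:
                return  # no work left for this client
            try:
                tree = parse_fn(sent)
            except Exception as err:
                if attempts + 1 < self.max_attempts:
                    # Put it back so this or another client retries it.
                    self.todo.put((idx, sent, attempts + 1))
                else:
                    with self.lock:
                        self.failed[idx] = repr(err)
            else:
                with self.lock:
                    self.results[idx] = tree

    def run(self, parse_fn, num_clients=4):
        clients = [threading.Thread(target=self._client, args=(parse_fn,))
                   for _ in range(num_clients)]
        for c in clients:
            c.start()
        for c in clients:
            c.join()
        return self.results, self.failed


# Usage: a toy "parser" that crashes the first time it sees each sentence.
_seen = set()
_seen_lock = threading.Lock()


def flaky_parser(sentence):
    with _seen_lock:
        first_time = sentence not in _seen
        _seen.add(sentence)
    if first_time:
        raise RuntimeError("simulated client crash")
    return "(S " + " ".join(f"(X {w})" for w in sentence.split()) + ")"


sb = Switchboard(["the dog barks", "parsing is fun"])
results, failed = sb.run(flaky_parser, num_clients=2)
```

A client that re-queues a failed sentence keeps looping, so a retried sentence is never stranded even if every other client has already exited.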

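Slide 4: Layer of Abstraction: Probability Structure
[Figure: a lexicalized constituent with parent $P(t_h, w_h)$, head child $H(t_h, w_h)$, and modifier children $M_{i-1}(t_{i-1}, w_{i-1})$ and $M_i(t_i, w_i)$; two variants labeled "Collins" and "BBN".]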
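One hedged, simplified reading of this figure: both models generate the modifiers $M_i$ of a head child $H$ one at a time, and one visible difference between them is what each modifier is conditioned on. In the rough form below, $\Delta_i$ stands for Collins' distance features and $M_{i-1}$ for the label of the previously generated modifier. The actual models split each event into several factors and smooth them by backing off, so this is only a sketch of the contrast the figure draws.

```latex
\begin{align*}
\text{Collins:} &\quad P\bigl(M_i(t_i, w_i) \mid P, H, t_h, w_h, \Delta_i\bigr) \\
\text{BBN:}     &\quad P\bigl(M_i(t_i, w_i) \mid P, H, t_h, w_h, M_{i-1}\bigr)
\end{align*}
```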

Slide 5: Plug-’n’-play Probability Models
New engine capable of implementing a wide variety of models, including Collins and BBN
Have meticulously replicated Collins’ model and performance
  – Cleaned up probabilistic “oddities”
  – Code is thoroughly documented
  – Will release to public
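One way to read "plug-’n’-play" is that a model is largely defined by which parts of an event it conditions on. The Python sketch below (all names hypothetical, not the engine's actual API) estimates P(modifier | history) by relative frequency, with the history extraction passed in as a function. Swapping `collins_like_history` for `bbn_like_history` changes the probability structure without touching the estimator. Both history functions are drastic simplifications of the real models, and no smoothing is done.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, Tuple

# A modifier event: the generated modifier label plus candidate conditioning
# features. Field names are illustrative, not taken from any real parser.
Event = Dict[str, object]
History = Tuple[object, ...]


def collins_like_history(e: Event) -> History:
    # Simplified Collins-style context: parent, head, head tag/word, distance.
    return (e["parent"], e["head"], e["head_tag"], e["head_word"], e["distance"])


def bbn_like_history(e: Event) -> History:
    # Simplified BBN-style context: parent, head, head tag/word, previous modifier.
    return (e["parent"], e["head"], e["head_tag"], e["head_word"], e["prev_mod"])


class ModifierModel:
    """Relative-frequency estimate of P(modifier | history(event)).
    The probability structure is whatever `history` extracts."""

    def __init__(self, history: Callable[[Event], History]):
        self.history = history
        self.joint: Counter = Counter()    # (modifier, history) counts
        self.context: Counter = Counter()  # history counts

    def train(self, events: Iterable[Event]) -> None:
        for e in events:
            h = self.history(e)
            self.joint[(e["modifier"], h)] += 1
            self.context[h] += 1

    def prob(self, e: Event) -> float:
        h = self.history(e)
        if self.context[h] == 0:
            return 0.0  # unseen context; a real model would back off here
        return self.joint[(e["modifier"], h)] / self.context[h]


def ev(prev_mod: str, distance: str, modifier: str, head_word: str = "saw") -> Event:
    # Toy right-side modifier events for a VP headed by a VBD.
    return {"parent": "VP", "head": "VBD", "head_tag": "VBD", "head_word": head_word,
            "prev_mod": prev_mod, "distance": distance, "modifier": modifier}


events = [
    ev("+START+", "adjacent", "NP"),
    ev("NP", "non-adjacent", "+STOP+"),
    ev("PP", "non-adjacent", "NP"),
]
collins = ModifierModel(collins_like_history)
collins.train(events)
bbn = ModifierModel(bbn_like_history)
bbn.train(events)
```

On the same three toy events, the two structures disagree about stopping after an NP: the Collins-like context pools both non-adjacent events, while the BBN-like context sees only the one whose previous modifier was an NP.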

Slide 6: Fast Portability to New Data Sets
Parsers operate over an augmented tree space, T⁺
Generative models define a joint probability P(S, T, T⁺)
Chiang & Bikel (2002, in submission) provide
  – New, portable syntax for augmenting tree nodes
  – A method for reestimating parser models in the augmented space such that P(S, T) is maximized
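The objective on this slide can be written out as a marginal likelihood. Assume (notation ours) that $\mathcal{A}(T)$ is the set of augmented trees $T^+$ that reduce to the observed tree $T$, that $\theta$ denotes the model parameters, and that $\mathcal{D}$ is the set of training pairs $(S, T)$. Because the augmentations are not observed in the Treebank, the quantity to maximize sums them out. This is the kind of objective an EM-style reestimation procedure climbs; it sketches the objective only, not the specific method of Chiang & Bikel.

```latex
P_\theta(S, T) = \sum_{T^+ \in \mathcal{A}(T)} P_\theta(S, T, T^+),
\qquad
\hat{\theta} = \arg\max_{\theta} \prod_{(S, T) \in \mathcal{D}}
  \sum_{T^+ \in \mathcal{A}(T)} P_\theta(S, T, T^+)
```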

Slide 7: Rapid Portability to New Languages with High Accuracy
Bikel & Chiang (2000) described porting two parsing models developed for English to Chinese
  – BBN: LR 69.0, LP 74.8 (sentences ≤ 40 words)
  – Chiang: LR 76.8, LP 77.8 (sentences ≤ 40 words)
New engine designed from the ground up for multilingual processing: language package
  – Original design goal for new parsing engine: develop new language packages in 1–2 weeks
Developed Chinese language package for new engine in one and a half days
Compared to other known Chinese parsers on the Chinese Treebank (CTB), recall is equivalent and precision is significantly superior
  – LR 77.0, LP 81.6 (sentences ≤ 40 words)
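A language package bundles the language-specific pieces a lexicalized parser needs. The slides do not list its contents, so as an assumption-laden sketch, the Python below shows one piece such a package plausibly supplies: a table-driven head finder in the style of Collins' head rules, where porting to a new language means providing a new rule table rather than new code. `ENGLISH_RULES` and `CHINESE_RULES` are tiny hypothetical excerpts for illustration, not the actual rule sets.

```python
from typing import Dict, List, Tuple

# Each rule: (search direction, labels in priority order).
# "left" scans children left-to-right, "right" scans right-to-left.
HeadRules = Dict[str, List[Tuple[str, List[str]]]]

# Hypothetical, heavily abridged rule tables.
ENGLISH_RULES: HeadRules = {
    "VP": [("left", ["VBD", "VBZ", "VBP", "VB", "VP"])],
    "NP": [("right", ["NN", "NNS", "NNP", "NP"])],
}
CHINESE_RULES: HeadRules = {
    "VP": [("left", ["VV", "VA", "VC", "VE", "VP"])],
    "NP": [("right", ["NN", "NR", "NT", "NP"])],
}


def find_head(label: str, children: List[str], rules: HeadRules) -> int:
    """Return the index of the head child of a node labeled `label`."""
    for direction, priorities in rules.get(label, []):
        if direction == "left":
            order = range(len(children))
        else:
            order = range(len(children) - 1, -1, -1)
        # Try each priority label in turn, scanning children in `direction`.
        for target in priorities:
            for i in order:
                if children[i] == target:
                    return i
    # Fallback when no rule matches: pick the leftmost child.
    return 0
```

For example, `find_head("NP", ["NR", "NN"], CHINESE_RULES)` returns 1, and an unlisted label such as `"PP"` falls back to index 0. The same `find_head` serves both languages; only the table changes.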

Slide 8: What’s in store…
Incorporating richer lexical information into parsing/language processing, specifically…
Incorporating word sense information into a parsing model, building on both
  – previous work extending the BBN parsing model to include word sense
  – recent work with David Chiang, viewing word sense as yet another component of “hidden” data in a Treebank

Slide 9: FIN

