Optimizing TFSG Parsing through Non-Statistical Indexing: Paper Summary. Sebastian Nowozin.


TFSGs ● TFSG = “Typed Feature Structure Grammar” ● A TFSG has: – One or more TFSs (Typed Feature Structures) – A fixed set of types, features and constraints ● Together these form a type signature

TFSGs: slow to parse ● Why is parsing slow? – CFG: large grammar size – TFSG: large data structures for grammatical categories ● Optimizing TFSG parsing – Techniques that work for CFGs do not work well – Techniques that target grammar rules do not work (TFSGs have only a few rules) – Optimizing data structure access works

Unification ● Recap: – “Two feature structures unify if there is a feature structure that is an extension of both.” (NLU course slides, set 10, Prof. Yao Tianfang) – If the feature structures do not unify, this is a “unification failure” ● Problem – How can we quickly determine whether unification is possible? – Answer: we have to take a closer look at the variables of a TFS
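The recap above can be illustrated with a minimal Python sketch. It is a simplification and not the paper's implementation: feature structures are modelled as nested dicts with atomic string values, and both types and shared variables are ignored. The name `unify` and the example features are assumptions made for illustration.

```python
from typing import Optional, Union

# Assumption: a feature structure is either an atomic value (str)
# or a dict mapping feature names to sub-structures.
FS = Union[str, dict]

def unify(a: FS, b: FS) -> Optional[FS]:
    """Return the smallest structure extending both a and b, or None on failure."""
    if isinstance(a, dict) and isinstance(b, dict):
        result = dict(a)  # shallow copy, so the inputs are not modified
        for feature, b_val in b.items():
            if feature in result:
                sub = unify(result[feature], b_val)
                if sub is None:
                    return None  # unification failure somewhere below this feature
                result[feature] = sub
            else:
                result[feature] = b_val
        return result
    # Atoms (or an atom against a structure) unify only when identical.
    return a if a == b else None

np_sg = {"cat": "np", "agr": {"num": "sg"}}
print(unify(np_sg, {"agr": {"num": "sg", "per": "3"}}))
print(unify(np_sg, {"agr": {"num": "pl"}}))  # None: a unification failure
```

Even in this toy version a failure is only detected after descending into the structures, which is why the number of unification attempts matters for parse time.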

TFSG variables ● Unification combines trees of structures ● The TFSs contain variables ● There are two kinds of TFS variables – “internal variables” – “external variables”

Internal vs. external variables Internal variables ● Share structure between substructures External variables ● Share structure between grammatical categories ● Active external variables: – instances that are shared between a category's description and the descriptions of one or more categories the parser has already visited while completing the same rule ● Inactive external variables: – all other external variable instances

Example: external variables ● An example phrase rule of a TFSG ● In bottom-up, left-to-right parsing, all of a mother's external variable instances are active because, being external, they also occur in one of the daughter descriptions ● Also: all of the left-most daughter's external variable instances are inactive, because this is the first description used by the parser ● Figure legend: active external, inactive external
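The active/inactive distinction can be sketched in Python as follows. This is an illustration under stated assumptions, not the paper's algorithm: each category description is reduced to the set of variable names it mentions, a variable counts as external when it occurs in more than one category of the rule, and the categories are listed in the order the parser visits them (daughters left to right, then the mother). The function name `classify_external` and the variable names are hypothetical.

```python
from collections import Counter

def classify_external(categories):
    """For each category (a set of variable names, in visiting order),
    return a pair (active, inactive) of its external variables."""
    # A variable is external if it occurs in more than one category of the rule.
    counts = Counter(v for cat in categories for v in set(cat))
    seen = set()      # variables of categories visited so far
    result = []
    for cat in categories:
        external = {v for v in cat if counts[v] > 1}
        active = external & seen          # already met in an earlier category
        result.append((active, external - active))
        seen |= set(cat)
    return result

# Hypothetical rule: two daughters, then the mother (bottom-up order).
daughter1 = {"X", "Y", "A"}   # "A" occurs only here, so it is internal
daughter2 = {"Y", "Z"}
mother = {"X", "Z"}
for active, inactive in classify_external([daughter1, daughter2, mother]):
    print(sorted(active), sorted(inactive))
```

On this toy rule the left-most daughter has only inactive external variables and the mother has only active ones, matching the two observations on the slide.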

Indexing? ● Indexing: each edge in the chart is assigned an index key, which identifies the daughter categories that can potentially match it ● When completing a rule, we have to – search for edges in the chart that unify with a specific daughter – before: visit all edges – now: only visit the edges that the daughter's index key references (→ reduces the number of unification attempts) ● Two kinds of indexing – Positional Indexing ● index key for each daughter is ● can be determined at compile-time – Path Indexing ● same as Positional Indexing, but also has a path index vector with type values extracted from the mother type
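The "before: visit all edges / now: only visit referenced edges" contrast can be sketched as a small Python chart. This is a hedged illustration, not the scheme from the paper: the table `may_match`, the class `IndexedChart`, and the edge labels are all hypothetical, and the table is written by hand here, whereas an indexing scheme would derive such information from the grammar ahead of parsing.

```python
from collections import defaultdict

class IndexedChart:
    def __init__(self, may_match):
        # Assumption: may_match maps an edge label to the set of daughter ids
        # (rule name, daughter position) that the edge could possibly unify with.
        self.may_match = may_match
        self.by_daughter = defaultdict(list)   # daughter id -> candidate edges
        self.all_edges = []                    # what an unindexed parser would scan

    def add_edge(self, label, span):
        edge = (label, span)
        self.all_edges.append(edge)
        # File the edge under every daughter it might match.
        for daughter in self.may_match.get(label, ()):
            self.by_daughter[daughter].append(edge)

    def candidates(self, daughter):
        """Edges worth a unification attempt for this daughter."""
        return list(self.by_daughter.get(daughter, []))

chart = IndexedChart({
    "np_rule": {("s_rule", 0), ("vp_rule", 1)},
    "vp_rule": {("s_rule", 1)},
})
chart.add_edge("np_rule", (0, 2))
chart.add_edge("vp_rule", (2, 5))
chart.add_edge("np_rule", (3, 5))
print(len(chart.all_edges), chart.candidates(("s_rule", 1)))
```

Completing the second daughter of the hypothetical `s_rule` now attempts unification with one edge instead of all three; the saving comes from skipping edges that could never unify.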

Indexing helps! ● Relationship to external variables – Active external variables are important for path indexing, because they represent the points at which the parser must copy structures between TFSs (costly!) ● Indexing timeline Stage 1, offline ● Static analysis of grammar rules ● The type signature, the appropriateness specifications, and the types and features of mother and daughter descriptions are analyzed to build an indexing scheme Stage 2, parsing ● After rule completion, all mother variables have been extended further. Now further information from the mother's content can be used to improve the index keys. ● Disadvantage: this slows down the parsing process Stage 3, parsing ● During rule completion, matching edges for daughters are searched for in the chart. Now the daughter's active external variables have been extended further. We can pre-unify the information from stage 1 to speed up unification.

Experiment environment ● The authors implemented this algorithm ● Two well-known TFSGs were tested: – MERGE grammar (17 rules, 136 lexical items, 1157 types and 144 introduced features) – ALE port of ERG (45 rules, 1314 lexical entries, 4305 types and 155 features) ● Test corpora – MERGE: 350 sentences of 6-15 words, from the Wall Street Journal annotated parse corpus – ALE: 1030 sentences of 6-22 words, from the Wall Street Journal and Brown corpora ● Implementation in Prolog on a Sun server

Experiment results ● Improvement over non-indexing: ● Improvement over Quick-Check/Path

Experiment results 2 ● Setup time: ● Unification failures

Summary ● Indexing-based parse time improvements are possible – for several classes of unification-based grammars – by static a priori analysis of grammar rules – advantage over other methods: no training necessary (unlike statistical methods) – it can still be combined with statistical methods for aggregate improvements

My criticism ● Experimental results are questionable – the authors report improvements on par with the Quick-Check method, yet the unification failure statistics do not reflect this – the authors implemented both their own method and Quick-Check themselves for the comparison of absolute times, so the absolute time values are of little use – the implementation is not available, so reproducing the results would take considerable effort

The end... ● Questions?
