Keeping Chess Alive – Do we need 1-unambiguous content models? Murali Mani, UCLA/CSD Extreme Markup Languages 2001 Montreal, Canada
Outline of the talk Why is 1-unambiguity important? Formalize a few concepts and learn that – There exist regular languages that are inherently not 1-unambiguous. We do not need 1-unambiguity – No additional benefit – Difficulty for document processing (type inference)
Why is 1-unambiguity important? XML 1.0 specification [3.2.1] – “It is an error if an element in the document can match more than one occurrence of an element type in the content model”. [App E] – “The content model (b, c) | (b, d) is in error and may be reported as an error.”
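The Appendix E example is an error because of how the expression is written, not because of the element sequences it admits. The following is a minimal Python sketch that assumes each element type is encoded as a single letter, so that the standard `re` module can stand in for a content model. It compares (b, c) | (b, d) with the rewrite b, (c | d) on every sequence up to length 4. The function name and the rewrite are illustrative assumptions, not taken from the specification.

```python
import itertools
import re

# Assumption: each element type is one letter (b, c, d), so a content model
# can be tested as a Python regular expression over strings of letters.
NONDETERMINISTIC = r"(bc)|(bd)"  # (b, c) | (b, d): the Appendix E example
DETERMINISTIC = r"b(c|d)"        # b, (c | d): an equivalent rewrite

def same_language(p, q, alphabet="bcd", max_len=4):
    """Bounded check: do p and q accept exactly the same strings up to max_len?"""
    for n in range(max_len + 1):
        for letters in itertools.product(alphabet, repeat=n):
            s = "".join(letters)
            if bool(re.fullmatch(p, s)) != bool(re.fullmatch(q, s)):
                return False
    return True

print(same_language(NONDETERMINISTIC, DETERMINISTIC))  # True
```

Both patterns accept exactly the sequences "bc" and "bd", so the bounded check is exhaustive here. Only the factored form lets a validator choose the matching b without looking ahead.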
Why is 1-unambiguity important? (contd…) XML Schema [3.8.6] – Schema Component Constraint: Unique Particle Attribution “A content model must be formed such that during validation of an element information item sequence, the particle with which we attempt to validate each item in the sequence can be uniquely determined without examining the content or attributes of that element, and without any information about items in the remainder of the sequence”
Concepts Regular expression – operators ‘,’, ‘|’, ‘*’ – e.g. (a | b)*, c Model group – additional operators ‘+’, ‘?’, ‘&’ – e.g. a?, (b | c)* = (a, (b | c)*) | (b | c)* Every regular expression is a model group. Every model group can be expressed as a regular expression.
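The claim that every model group can be expressed as a regular expression can be illustrated by desugaring ‘?’, ‘+’ and ‘&’ into ‘,’, ‘|’ and ‘*’. The following is a minimal Python sketch. It assumes the core also includes the empty sequence ε, as standard regular expressions do, and treats ‘&’ as binary, meaning "both parts, in either order". The tuple encoding and function names are hypothetical. The bounded comparison checks the slide's example against its stated regular-expression form.

```python
import itertools
import re

def desugar(e):
    """Rewrite '?', '+', '&' into the core operators ',', '|', '*' plus eps."""
    op = e[0]
    if op in ("sym", "eps"):
        return e
    if op == "?":                      # e? = e | eps
        return ("|", desugar(e[1]), ("eps",))
    if op == "+":                      # e+ = e, e*
        core = desugar(e[1])
        return (",", core, ("*", core))
    if op == "&":                      # a & b = (a, b) | (b, a)
        a, b = desugar(e[1]), desugar(e[2])
        return ("|", (",", a, b), (",", b, a))
    if op == "*":
        return ("*", desugar(e[1]))
    return (op, desugar(e[1]), desugar(e[2]))  # ',' or '|'

def to_re(e):
    """Render a core expression as a Python regex (one letter per element type)."""
    op = e[0]
    if op == "sym":
        return e[1]
    if op == "eps":
        return "(?:)"
    if op == "*":
        return f"(?:{to_re(e[1])})*"
    sep = "" if op == "," else "|"
    return f"(?:{to_re(e[1])}{sep}{to_re(e[2])})"

def same_language(e1, e2, alphabet="abc", max_len=5):
    """Bounded check: compare the desugared forms on all strings up to max_len."""
    p, q = to_re(desugar(e1)), to_re(desugar(e2))
    return all(
        bool(re.fullmatch(p, s)) == bool(re.fullmatch(q, s))
        for n in range(max_len + 1)
        for s in map("".join, itertools.product(alphabet, repeat=n))
    )

a, b, c = ("sym", "a"), ("sym", "b"), ("sym", "c")
b_or_c_star = ("*", ("|", b, c))
model_group = (",", ("?", a), b_or_c_star)         # a?, (b | c)*
regex = ("|", (",", a, b_or_c_star), b_or_c_star)  # (a, (b | c)*) | (b | c)*
print(same_language(model_group, regex))           # True
```

The ‘&’ rule grows factorially with the number of operands, which is one reason model groups keep it as a separate operator.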
1-unambiguous content models Ambiguity in Graphs and Expressions – Book, Even, Greibach, Ott, 1971 – Given a regular expression E, is E ambiguous? For example, (a | (a, b*)) is ambiguous: the string a matches both alternatives. Deterministic Regular Languages – Anne Brüggemann-Klein, 1991 – Studied 1-unambiguity in SGML content models
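Ambiguity in the sense of Book et al. means that some string has more than one parse. The following is a minimal Python sketch that counts parses by dynamic programming over substrings. It assumes a hypothetical tuple encoding of expressions and counts only non-empty iterations of ‘*’ so that the count stays finite.

```python
from functools import lru_cache

def count_parses(expr, s):
    """Number of distinct ways expr can match all of s (> 1 means ambiguous).

    Assumption: each '*' iteration consumes at least one symbol.
    """
    @lru_cache(maxsize=None)
    def count(e, i, j):
        op = e[0]
        if op == "sym":
            return 1 if j == i + 1 and s[i] == e[1] else 0
        if op == "|":
            return count(e[1], i, j) + count(e[2], i, j)
        if op == ",":
            # Try every split point k between the two parts.
            return sum(count(e[1], i, k) * count(e[2], k, j) for k in range(i, j + 1))
        if op == "*":
            if i == j:
                return 1  # zero iterations
            # First iteration covers s[i:k] (non-empty), the rest is again e*.
            return sum(count(e[1], i, k) * count(e, k, j) for k in range(i + 1, j + 1))
        raise ValueError(f"unknown operator {op!r}")

    return count(expr, 0, len(s))

a, b, c = ("sym", "a"), ("sym", "b"), ("sym", "c")
example = ("|", a, (",", a, ("*", b)))  # (a | (a, b*))
print(count_parses(example, "a"))   # 2: ambiguous
print(count_parses(example, "ab"))  # 1
```

The two notions differ. For example, (a, b) | (a, c) gives every string at most one parse, so it is unambiguous in this sense. It is still not 1-unambiguous, because the first a cannot be attributed without lookahead.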
1-unambiguous content models (contd…) Reasoning about XML Schema Languages using Formal Language Theory – Dongwon Lee, Murali Mani, Makoto Murata, 2000 – Content models without the 1-unambiguity constraint open.org/cover/topics.html#ambiguity Example content model -- (whitemove, blackmove)*, whitemove?
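Whether a given expression is 1-unambiguous can be tested on its position (Glushkov) automaton. The expression passes when no two distinct occurrences of the same element name appear together in its first set or in any follow set. This characterization is associated with Brüggemann-Klein's work. The following is a minimal Python sketch of that check. The helper names and tuple encoding are assumptions made for illustration. It tests a single expression and cannot by itself show that no equivalent 1-unambiguous expression exists.

```python
from collections import defaultdict

def sym(x): return ("sym", x)
def seq(a, b): return (",", a, b)
def alt(a, b): return ("|", a, b)
def star(a): return ("*", a)
def opt(a): return ("?", a)

def _linearize(e, symbols):
    """Give every symbol occurrence its own position number."""
    if e[0] == "sym":
        symbols.append(e[1])
        return ("pos", len(symbols) - 1)
    return (e[0],) + tuple(_linearize(child, symbols) for child in e[1:])

def _analyze(e, follow):
    """Return (nullable, first, last); record follow[p] for every position p."""
    op = e[0]
    if op == "pos":
        return False, {e[1]}, {e[1]}
    if op in ("*", "?"):
        _, f, l = _analyze(e[1], follow)
        if op == "*":  # a repetition may restart after any last position
            for p in l:
                follow[p] |= f
        return True, f, l
    n1, f1, l1 = _analyze(e[1], follow)
    n2, f2, l2 = _analyze(e[2], follow)
    if op == "|":
        return n1 or n2, f1 | f2, l1 | l2
    for p in l1:  # op == ",": the second part may follow the first
        follow[p] |= f2
    return n1 and n2, f1 | f2 if n1 else f1, l1 | l2 if n2 else l2

def is_one_unambiguous(e):
    symbols = []
    follow = defaultdict(set)
    _, first, _ = _analyze(_linearize(e, symbols), follow)

    def distinct(positions):
        names = [symbols[p] for p in positions]
        return len(names) == len(set(names))

    return distinct(first) and all(distinct(s) for s in follow.values())

w, bm = sym("whitemove"), sym("blackmove")
chess = seq(star(seq(w, bm)), opt(w))  # (whitemove, blackmove)*, whitemove?
print(is_one_unambiguous(chess))             # False: two whitemoves in first
print(is_one_unambiguous(star(alt(w, bm))))  # True: (whitemove | blackmove)*
```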
Type assignment
Type assignment (contd…) Assumption – If the type of an element can be determined by a SAX parser on seeing its start tag, that is sufficient for type assignment. DTDs and XML Schema have this property even without the 1-unambiguity constraint.
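The following is a minimal Python sketch of type assignment on the start tag, using the standard `xml.sax` module. It assumes a hypothetical schema in which an element's type depends only on its parent's type and its own name, so the same tag (`piece`) can receive different types in different contexts. The table `TYPE_OF` and the type names are invented for illustration. The sketch assigns types only and does not validate the order of children.

```python
import xml.sax

# Hypothetical schema: (parent type, element name) -> element type.
TYPE_OF = {
    (None, "game"): "Game",
    ("Game", "whitemove"): "WhiteMove",
    ("Game", "blackmove"): "BlackMove",
    ("WhiteMove", "piece"): "WhitePiece",
    ("BlackMove", "piece"): "BlackPiece",
}

class TypeAssigner(xml.sax.ContentHandler):
    """Assigns each element a type as soon as its start tag is seen."""

    def __init__(self):
        super().__init__()
        self.stack = []     # types of the currently open elements
        self.assigned = []  # (element name, type) in document order

    def startElement(self, name, attrs):
        parent = self.stack[-1] if self.stack else None
        type_ = TYPE_OF[(parent, name)]  # KeyError: element not allowed here
        self.stack.append(type_)
        self.assigned.append((name, type_))

    def endElement(self, name):
        self.stack.pop()

def assign_types(xml_bytes):
    handler = TypeAssigner()
    xml.sax.parseString(xml_bytes, handler)
    return handler.assigned

doc = b"<game><whitemove><piece/></whitemove><blackmove><piece/></blackmove></game>"
for name, type_ in assign_types(doc):
    print(name, type_)
```

No lookahead past the start tag is used, and no 1-unambiguity is required of the content models for this step.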
Disadvantages of having the 1-unambiguity constraint Significant loss in the ability to describe constraints – the game of chess might have to be described as (whitemove | blackmove)* We lose the following constraints – whitemove and blackmove alternate – we start with a whitemove Shall we stick to the chess rules?
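The lost chess constraints are easy to enforce with a two-state automaton that reads one move at a time and never looks ahead. The following is a minimal Python sketch that assumes moves arrive as a list of element names. The function names are hypothetical.

```python
def follows_chess_rules(moves):
    """Accept exactly (whitemove, blackmove)*, whitemove?: start white, alternate."""
    expected = "whitemove"  # the automaton's single piece of state
    for move in moves:
        if move != expected:
            return False
        expected = "blackmove" if move == "whitemove" else "whitemove"
    return True

def matches_loose_model(moves):
    """(whitemove | blackmove)*: any sequence of the two move types."""
    return all(move in ("whitemove", "blackmove") for move in moves)

illegal = ["whitemove", "whitemove"]
print(matches_loose_model(illegal))  # True: the loose model allows it
print(follows_chess_rules(illegal))  # False: white moved twice
```

The automaton is deterministic even though its expression (whitemove, blackmove)*, whitemove? fails the 1-unambiguity test. Determinism of the validator and 1-unambiguity of the expression are separate properties.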
Disadvantages of having the 1-unambiguity constraint (contd…) Difficulty for document processing and type inference – No characterization of 1-unambiguous model groups – Fewer constraints => less algebraic optimization is possible.
Conclusions One class of schema languages is identified by the property – the type of an element can be determined by a depth-first traversal (SAX parser) on seeing its start tag Such schema languages do not need the 1-unambiguity constraint. The 1-unambiguity constraint is difficult to work with for type inference, and for playing chess.
Acknowledgements XML-DEV mailing list – the discussions in this list largely motivated this talk.
Additional material at this conference Taxonomy of XML Schema languages using Formal Language Theory – Aug 15, 4:00 pm RELAX NG: Unification of RELAX Core and TREX – Aug 17, 9:00 am