ORM Verbalization in Malay & Mandarin
Lim Shin Huei, Terry Halpin
Introduction
ORM data models enable many kinds of business constraints to be visualized graphically. These constraints are best validated with domain experts by verbalizing them in a controlled natural language and populating them with examples. We are currently extending ORM technology to automatically verbalize ORM models in Malay and Mandarin. These languages require special treatment to produce natural verbalizations, especially for noun classifiers. We discuss the basic ideas behind our approach and demonstrate an initial prototype.
Logical Elements & Noun Classifiers
We first transform ORM constraints into an underlying logical form, which is then transformed into linguistic forms suited to the target natural languages. The logical form patterns include slots for various logical elements, such as quantifiers and operators, each of which has a corresponding textual representation in every target language. Here is a list of the correspondences for the modal operators.
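As a minimal sketch of the second stage, assume a toy logical form whose only operator slots are Boolean connectives. The class names `Atom` and `BinOp` and the function `render` are hypothetical; the Malay and Mandarin words are the Boolean operator renderings given elsewhere in this paper. Each operator slot is filled from a per-language lexicon, while the logical form itself stays language-neutral:

```python
from dataclasses import dataclass
from typing import Dict, Union

# Per-language textual representations of two logical elements (Boolean operators).
LEXICON: Dict[str, Dict[str, str]] = {
    "en": {"and": "and", "or": "or"},
    "ms": {"and": "dan", "or": "atau"},
    "zh": {"and": "和", "or": "或"},
}

# Mandarin is written without spaces between words.
SEPARATOR: Dict[str, str] = {"en": " ", "ms": " ", "zh": ""}


@dataclass
class Atom:
    """A leaf of the logical form whose wording is already known per language."""
    text: Dict[str, str]


@dataclass
class BinOp:
    """An operator slot joining two sub-formulas; `op` is a language-neutral key."""
    op: str
    left: "Formula"
    right: "Formula"


Formula = Union[Atom, BinOp]


def render(f: Formula, lang: str) -> str:
    """Map a language-neutral logical form to text in the target language."""
    if isinstance(f, Atom):
        return f.text[lang]
    sep = SEPARATOR[lang]
    return sep.join([render(f.left, lang), LEXICON[lang][f.op], render(f.right, lang)])
```

For example, with `p = Atom({"en": "P", "ms": "P", "zh": "P"})` and a similar `q`, `render(BinOp("or", p, q), "ms")` gives `"P atau Q"`. Keeping the lexicon separate from the formula is what lets one logical form serve all three languages.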
Here is a list of the most common quantifiers and their verbalizations in three languages (English, Malay and Mandarin). We use a placeholder symbol to denote any appropriate noun classifier for the term being quantified.
Here are a few examples of noun classifiers, with Malay classifiers listed before Mandarin classifiers. Note: Mandarin often has many choices of classifier for the same usage category. In Malay, ‘Orang’ can be used as a noun phrase or a classifier; if an entity type is named ‘Orang’ (meaning Person), no noun classifier is used for it.
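The placement of a classifier, and the ‘Orang’ exception, can be sketched as a small phrase builder. This is an illustration only: the function name `count_phrase` is hypothetical, and the assumed word order (numeral, then classifier, then noun) is taken from the Fishmonger example discussed later in this paper (“lebih daripada satu jenis Spesies Ikan”, “多过一种鱼类”):

```python
from typing import Optional


def count_phrase(numeral: str, noun: str, classifier: Optional[str], lang: str) -> str:
    """Build a counted noun phrase as numeral + classifier + noun (assumed order)."""
    if lang == "ms":
        # 'Orang' can itself act as a classifier, so an entity type named
        # 'Orang' takes no separate classifier.
        if noun == "Orang" or classifier is None:
            return f"{numeral} {noun}"
        return f"{numeral} {classifier} {noun}"
    if lang == "zh":
        # Mandarin: no spaces; the classifier sits between numeral and noun.
        return f"{numeral}{classifier or ''}{noun}"
    raise ValueError(f"unsupported language: {lang}")
```

So `count_phrase("lebih daripada satu", "Spesies Ikan", "jenis", "ms")` yields “lebih daripada satu jenis Spesies Ikan”, whereas the same call for the noun “Orang” drops the classifier.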
Quantifiers are used in verbalizing many kinds of ORM constraints, including internal and external uniqueness and frequency constraints. Here we discuss only internal uniqueness constraints on binary fact types. We focus mainly on the n:1 patterns shown below. Patterns for 1:n, 1:1 and m:n cases may be dealt with similarly.
An ORM predicate may have many readings, which may be in mixfix form. For example, the fact type “Person played Sport for Country” uses the predicate reading “… played … for …”. The structure of a predicate reading is irrelevant to the logical form, which denotes a predicate by a single symbol (e.g. R or S). Verbalizations may be displayed in positive form (e.g. Each Person was born on at most one Date.) or negative form (e.g. It is impossible that some Person was born on more than one Date.). Mappings for other logical and linguistic elements are also needed. Boolean operators such as and, or and not render respectively as “dan”, “atau” and “bukan” in Malay, and as “和”, “或” and “不” in Mandarin. Pronouns and related expressions also need mappings; e.g. “that” and “the same” render respectively as “itu” and “yang sama” in Malay, and as “那” and “一样的” in Mandarin.
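Because a mixfix reading is just text with “…” placeholders, filling it with role players is a simple substitution. The sketch below is one possible approach; the helper names `fill_reading` and `verbalize_uniqueness` are hypothetical. It reproduces the English positive and negative verbalizations quoted above for an n:1 uniqueness constraint:

```python
from typing import List

ELLIPSIS = "…"


def fill_reading(reading: str, players: List[str]) -> str:
    """Substitute role players, in order, for the '…' placeholders of a mixfix reading."""
    parts = reading.split(ELLIPSIS)
    if len(parts) - 1 != len(players):
        raise ValueError("number of role players does not match placeholders")
    out = parts[0]
    for player, tail in zip(players, parts[1:]):
        out += player + tail
    return out


def verbalize_uniqueness(reading: str, a: str, b: str, positive: bool = True) -> str:
    """English verbalization of a uniqueness constraint on the first role of a binary."""
    if positive:
        return fill_reading(reading, [f"Each {a}", f"at most one {b}"]) + "."
    return "It is impossible that " + fill_reading(
        reading, [f"some {a}", f"more than one {b}"]
    ) + "."
```

For instance, `fill_reading("… played … for …", ["Person", "Sport", "Country"])` returns “Person played Sport for Country”. The logical form never sees the reading's shape, which is why the same constraint can be verbalized through any reading.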
Here are the logical forms of the verbalizations for the n:1 patterns. The Malay and Mandarin forms include a noun classifier (denoted here by a placeholder symbol) to categorize the kind of thing being counted. The absence of a simple, alethic uniqueness constraint on a role of an n:1 binary is explicitly verbalized (e.g. It is possible that more than one Person was born in the same Country). For example, the English logical form of this verbalization is $\Diamond\, \exists y{:}B\; \exists^{2..} x{:}A\; xRy$. The Malay and Mandarin logical forms have the same quantifier structure, with the noun classifier placeholder attached to the quantified terms.
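To see why this possibility is worth stating, consider a small, hypothetical population of Person was born in Country. The sketch below (function names are illustrative, not from the prototype) checks the n:1 uniqueness constraint on the Person role and lists each Country related to two or more distinct Persons, i.e. the witnesses that make the possibility true:

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Set, Tuple

Fact = Tuple[Hashable, Hashable]  # (x, y) meaning x R y


def unique_on_a(facts: Iterable[Fact]) -> bool:
    """n:1 uniqueness on the A role: each x is related to at most one y."""
    seen: Dict[Hashable, Hashable] = {}
    for x, y in facts:
        if seen.setdefault(x, y) != y:
            return False
    return True


def many_for_same_b(facts: Iterable[Fact]) -> Dict[Hashable, Set[Hashable]]:
    """Each y related to two or more distinct x (some y:B with 2.. x:A such that xRy)."""
    by_y: Dict[Hashable, Set[Hashable]] = defaultdict(set)
    for x, y in facts:
        by_y[y].add(x)
    return {y: xs for y, xs in by_y.items() if len(xs) >= 2}


# Hypothetical sample population.
born_in = [("Ann", "Malaysia"), ("Bob", "Malaysia"), ("Chen", "China")]
```

Here `unique_on_a(born_in)` holds while `many_for_same_b(born_in)` returns `{"Malaysia": {"Ann", "Bob"}}`: the population satisfies the n:1 constraint and still shows more than one Person born in the same Country.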
Implementing ORM Verbalization in Malay & Mandarin
Mandarin differs from Malay in allowing more than one choice of noun classifier for the same usage category (noun type). For example, the combination of a mandatory role constraint and a frequency constraint of ≥ 2 on Fishmonger’s role in the fact type Fishmonger sells FishKind may be verbalized in English as “Each Fishmonger sells more than one FishKind.” In Malay, the classifier for FishKind is “jenis”, so this verbalizes as “Setiap PenjualIkan menjual lebih daripada satu jenis Spesies Ikan.”, where “Setiap” and “lebih daripada satu” are the logical words and “jenis” is the classifier. In Mandarin, however, any of these verbalizations could be used: 每个鱼贩卖多过一种鱼类; 每位鱼贩卖多过一种鱼类; 每名鱼贩卖多过一种鱼类. The user decides which one is best.
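The three verbalizations above can be generated from simple templates. In this sketch the template strings are assumptions reconstructed from the example sentences, and the function names are hypothetical. Malay yields one sentence, whereas Mandarin yields one candidate per combination of classifiers, leaving the final choice to the user:

```python
from itertools import product
from typing import List, Sequence


def english(subject: str, verb: str, obj: str) -> str:
    """Mandatory role plus frequency >= 2, rendered with 'Each' and 'more than one'."""
    return f"Each {subject} {verb} more than one {obj}."


def malay(subject: str, verb: str, obj: str, obj_classifier: str) -> str:
    """Malay: the classifier follows the numeral phrase 'lebih daripada satu'."""
    return f"Setiap {subject} {verb} lebih daripada satu {obj_classifier} {obj}."


def mandarin_candidates(subject: str, verb: str, obj: str,
                        subject_classifiers: Sequence[str],
                        obj_classifiers: Sequence[str]) -> List[str]:
    """Mandarin: one candidate sentence per (subject, object) classifier pair."""
    return [
        f"每{cs}{subject}{verb}多过一{co}{obj}"
        for cs, co in product(subject_classifiers, obj_classifiers)
    ]
```

Calling `mandarin_candidates("鱼贩", "卖", "鱼类", ["个", "位", "名"], ["种"])` produces exactly the three Mandarin sentences listed above.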
In ORM, each object type has a distinct name, which is a noun phrase (such as “Postgraduate Student” or “Lecturer”). Using “NounType” for usage category and “Classifier” for noun classifier, the situation for Mandarin may be modeled as shown. The fact type NounType has Classifier may be prepopulated with known data. However, in general, the fact type NounPhrase is of NounType needs to be populated by the user.
For Malay, where the fact type NounType has Classifier is n:1, the derived classifier is the only possibility. For Mandarin, more than one classifier is often derived, so the user is presented with a list of possible classifiers from which to choose a preferred one (see the prototype demo later). This metamodel fragment offers one way to view the situation if, instead of using a separate model for each language, one wishes to use a single model with multiple display options based on the chosen language.
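The derivation described above amounts to joining the two fact types. In this minimal sketch the noun type names (“Human”, “Kind”) and the sample populations are hypothetical, as are the function names; the classifiers themselves come from the examples in this paper. A single derived classifier is used directly, while several require a user choice:

```python
from typing import Dict, List, Optional

# NounPhrase is of NounType: populated by the user (hypothetical sample data).
NOUN_TYPE_OF: Dict[str, str] = {"Fishmonger": "Human", "FishKind": "Kind"}

# NounType has Classifier: prepopulated per language (hypothetical sample data).
# For Malay this fact type is n:1, so each list holds exactly one classifier.
HAS_CLASSIFIER: Dict[str, Dict[str, List[str]]] = {
    "ms": {"Human": ["orang"], "Kind": ["jenis"]},
    "zh": {"Human": ["个", "位", "名"], "Kind": ["种"]},
}


def derived_classifiers(noun_phrase: str, lang: str) -> List[str]:
    """Join NounPhrase is of NounType with NounType has Classifier."""
    return HAS_CLASSIFIER[lang][NOUN_TYPE_OF[noun_phrase]]


def pick_classifier(noun_phrase: str, lang: str, preferred: Optional[str] = None) -> str:
    """Return the only derived classifier, or the user's choice among several."""
    options = derived_classifiers(noun_phrase, lang)
    if len(options) == 1:
        return options[0]
    if preferred not in options:
        raise ValueError(f"choose one of {options} for {noun_phrase}")
    return preferred
```

With this data, `pick_classifier("FishKind", "ms")` returns “jenis” with no user input, while `pick_classifier("Fishmonger", "zh")` raises an error until the user supplies one of 个, 位 or 名, mirroring the list presented in the prototype.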
Here is a screenshot from our prototype tool for entering and verbalizing binary fact types in ORM using Bahasa Malaysia (shown here as the option BM) or Mandarin.
This screenshot shows the Classifier properties dialog that enables users to pick the suitable classifier to use for the relevant NounType.
The table shows the corresponding positive verbalizations in English and Mandarin. The negative form of the verbalizations may be displayed by selecting the negative (-) button. Verbalizations in Malay are performed in a similar manner.
This shows the final screen for the n:1 fact type Politician was born in Country, verbalized in Malay.
Conclusion
This paper described our initial work on verbalizing ORM models in Malay and Mandarin, with special attention to noun classifiers. Future plans include implementing our approach via language extensions to the NORMA tool, and fully covering the many varieties of ORM graphical constraints in these Asian languages.