Kirrkirr: a Bidirectional Warlpiri-English Dictionary
Kristen Parton
Kirrkirr: Objectives
Kirrkirr aims to present the contents of a dictionary in a way that is flexible, interactive, customizable, and (especially) fun.
Kirrkirr has diverse target users with varying levels of literacy, for example professional linguists, elementary school children, teachers, and native speakers.
Currently, Kirrkirr is used with the Australian Aboriginal language Warlpiri, spoken by about 3,000 people in northern Australia.
Kirrkirr uses a Warlpiri-English dictionary developed by linguists in Australia, with detailed information about each word, including glosses, definitions, dialects, grammatical comments, and cross-references between words for synonyms, antonyms, “see also” and other relationships.
Unlike paper dictionaries, electronic dictionaries can provide an interactive educational tool customizable to various audiences.
Dictionary Usability
The interface has a colorful, clickable panel that links words related in different ways, rather than relying only on the alphabetical list of words; this also makes the dictionary more interactive.
Many words are linked to pictures and sounds, which reinforce the meaning of the words through non-textual means.
The dictionary uses “fuzzy spelling” to catch spelling errors made by the user when searching for a word.
User modes tailor the appearance of the formatted entries to each target audience:
  English meaning only, for novice users with English backgrounds
  In Warlpiri, for native speakers of Warlpiri
  Basic details, for intermediate users such as students
  Full details, for advanced users such as teachers or linguists
Lexicon Structure
The dictionary is maintained by linguists in Australia in an ad-hoc text format, which is converted to a structured XML dictionary by a Perl script.
Rather than loading the large (10 MB) XML file into memory, each headword’s XML entry is loaded individually as needed.
The rich structure of the XML allows XSLT stylesheet manipulation of the dictionary entries to produce output formatted differently for different users.
The XSLT stylesheet outputs HTML pages, which make use of the cross-references in the dictionary by creating hyperlinks between different words.
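One way to load a single headword's entry without parsing the whole file is to keep a small index of byte offsets and parse only the slice that is needed. The sketch below illustrates that idea under assumptions: the tag names ENTRY, HW and GL, the sample data, and the function names are hypothetical, and the source does not say how Kirrkirr itself locates entries.

```python
import io
import re
import xml.etree.ElementTree as ET

# Hypothetical tag names: the real Warlpiri XML schema is not shown in the source.
ENTRY_RE = re.compile(rb"<ENTRY>.*?</ENTRY>", re.DOTALL)
HEADWORD_RE = re.compile(rb"<HW>(.*?)</HW>", re.DOTALL)


def build_offset_index(stream):
    """One-off pass: map each headword to (byte offset, byte length) of its entry."""
    data = stream.read()  # done once; the resulting index could be saved to disk
    index = {}
    for match in ENTRY_RE.finditer(data):
        headword = HEADWORD_RE.search(match.group(0))
        if headword:
            key = headword.group(1).decode("utf-8").strip()
            index[key] = (match.start(), len(match.group(0)))
    return index


def load_entry(stream, index, headword):
    """Seek to one entry and parse only that slice of the file."""
    offset, length = index[headword]
    stream.seek(offset)
    return ET.fromstring(stream.read(length))


# Toy dictionary built from glosses quoted in the text.
SAMPLE = (
    b"<DICTIONARY>"
    b"<ENTRY><HW>nya-nyi</HW><GL>to see</GL></ENTRY>"
    b"<ENTRY><HW>yawarrangi</HW><GL>large male kangaroo</GL></ENTRY>"
    b"</DICTIONARY>"
)

stream = io.BytesIO(SAMPLE)  # stands in for the dictionary file opened in binary mode
index = build_offset_index(stream)
entry = load_entry(stream, index, "yawarrangi")
print(entry.findtext("GL"))
```

Only the offset index stays in memory between lookups; each lookup costs one seek and one small parse.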
Customizing Format with XSLT
At run-time, the XML entries are processed by an XSLT stylesheet, which selects which elements of the entry to show, determines the order to show them in, and formats each field differently depending on the user mode.
For example, “Meaning only” outputs the English glosses of a word in a large font, whereas “Full details” outputs all of the information in the dictionary in a normal-sized font in a specific order.
Since the XML is parsed at run-time, more information can be added to the XML to allow “parameter passing” from the program to the XSLT.
For example, the location of the images folder can only be determined at run-time, but by adding a field to the XML at run-time, the XSLT can create an <img> tag to display an image in the HTML output.
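The two ideas above, mode-dependent field selection and passing a run-time value through an extra field in the entry, can be sketched in a few lines. This sketch uses Python's ElementTree as a stand-in for the XSLT stylesheet, so it is not the stylesheet Kirrkirr uses; the tag names, the mode table, the image file name, and the folder path are all hypothetical.

```python
import copy
import html
import xml.etree.ElementTree as ET

# Hypothetical mode table: which fields each user mode shows, and in what order.
MODE_FIELDS = {
    "meaning_only": ["GL"],
    "full_details": ["HW", "GL", "DEF", "IMAGE"],
}


def add_runtime_field(entry, images_dir):
    """Return a copy of the entry with a field that is only known at run-time."""
    enriched = copy.deepcopy(entry)
    ET.SubElement(enriched, "IMAGES_DIR").text = images_dir
    return enriched


def render(entry, mode):
    """Format one entry as an HTML fragment according to the user mode."""
    images_dir = html.escape(entry.findtext("IMAGES_DIR", default=""))
    parts = []
    for tag in MODE_FIELDS[mode]:
        for field in entry.findall(tag):
            text = html.escape(field.text or "")
            if tag == "IMAGE":
                # The image path combines the run-time folder with the stored file name.
                parts.append(f'<img src="{images_dir}/{text}">')
            elif mode == "meaning_only":
                parts.append(f"<h1>{text}</h1>")  # large font for glosses
            else:
                parts.append(f"<p>{text}</p>")
    return "\n".join(parts)


# Toy entry; the image file name is made up for illustration.
entry = ET.fromstring(
    "<ENTRY><HW>yawarrangi</HW><GL>large male kangaroo</GL>"
    "<IMAGE>kangaroo.jpg</IMAGE></ENTRY>"
)
enriched = add_runtime_field(entry, "/opt/kirrkirr/images")  # hypothetical path
print(render(enriched, "meaning_only"))
print(render(enriched, "full_details"))
```

Copying the entry before adding the run-time field keeps the stored dictionary data unchanged, so the same entry can be rendered again with a different folder location.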
English-Warlpiri Dictionary
The original dictionary is one-way, Warlpiri to English, but a bidirectional bilingual dictionary is more useful for most users.
An English index was built from the glosses in the dictionary, such that each gloss links to the equivalent Warlpiri entries.
Rather than being two separate monolingual dictionaries, the two directions share the same data, which eliminates conflicting entries and maintains consistency.
The XML entries of all the Warlpiri equivalents of an English word are merged and passed to an XSLT stylesheet, which creates an HTML page for the English word.
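The English index described above is essentially an inverted index over the Warlpiri entries. A minimal sketch, assuming each entry already carries its English index terms: the data layout and function names are hypothetical, the index terms are assigned by hand, and "HEADWORD_B" is a placeholder rather than a real Warlpiri word.

```python
from collections import defaultdict

# Toy data: headwords and glosses quoted in the text, with hand-assigned index terms.
# "HEADWORD_B" is a placeholder used only to show several equivalents being merged.
ENTRIES = {
    "yawarrangi": {"gloss": "large male kangaroo", "index_terms": ["kangaroo"]},
    "jawirdiki": {"gloss": "stay put", "index_terms": ["stay"]},
    "kirany-kiranypa": {"gloss": "spinifex lizard", "index_terms": ["spinifex", "lizard"]},
    "HEADWORD_B": {"gloss": "kangaroo", "index_terms": ["kangaroo"]},
}


def build_english_index(entries):
    """Map each English index term to the Warlpiri headwords filed under it."""
    index = defaultdict(list)
    for headword in sorted(entries):
        for term in entries[headword]["index_terms"]:
            index[term].append(headword)
    return dict(index)


def english_entry(term, index, entries):
    """Merge all Warlpiri equivalents of one English term into a single record."""
    return {
        "english": term,
        "equivalents": [
            {"headword": hw, "gloss": entries[hw]["gloss"]}
            for hw in index.get(term, [])
        ],
    }


index = build_english_index(ENTRIES)
print(english_entry("kangaroo", index, ENTRIES))
print(english_entry("lizard", index, ENTRIES))
```

Because the English records are derived from the Warlpiri entries rather than stored separately, a correction to a Warlpiri entry shows up in both directions.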
English-Warlpiri Dictionary
To make the English dictionary symmetric to the Warlpiri one, Kirrkirr now has an English word list, English formatted entries, a much faster English search, and the capability to do “fuzzy spelling” in English.
Problems arise because most Warlpiri words have several English equivalents, and also because phrases in English might be indexed under several different terms.
For example, “yawarrangi” meaning “large male kangaroo” should be indexed under “kangaroo” rather than “large” or “male”.
However, “jawirdiki” and other words that mean “stay put” should be indexed under “stay” and not “put”.
Words like “kirany-kiranypa” meaning “spinifex lizard” should be indexed under both “spinifex” (the type) and “lizard”.
Warlpiri Morphology
Warlpiri is an agglutinating language, meaning that grammatical suffixes are added on to words:
  nyangulparnangku
  nya-  ngu-   lpa-   rna-       ngku
  see-  PAST-  IPFV-  1SG.SUBJ-  2SG.OBJ
  “I was looking at you.”
  Root word: “nya-nyi” meaning “to see”
For lookup in the dictionary, users have to know the root word.
This is difficult for learners of Warlpiri, given that morphemes are not always separated by hyphens and verbs are indexed with non-past tense inflections.
To make Kirrkirr more usable, a morphological analyzer was implemented to accept well-formed Warlpiri words and find the possible root words to look up.
Morphological Analysis
Suffixes from the dictionary are stored in a trie for quick lookup.
Each time an affix is stripped, the remaining string is checked to see whether it is in the dictionary.
Each possible morpheme is added to a lattice structure, which holds all possible morphological decompositions of the word.
Grammar rules are applied to eliminate many impossible parses.
Some properties of Warlpiri make parsing more difficult, and show the need for a different indexing system:
  Verbs are stored with non-past inflections but appear with other inflections. For example, “nya-nyi” may show up as “nya-ngu”. But indexing “nya-nyi” under “nya” creates more ambiguity, since “nya” is another word.
  Some words have optional suffixes, such as “l(pa)”, which may be seen as “l” or “lpa”. These words must be indexed under both entries.
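The suffix-stripping steps above can be sketched as follows. This is a simplified illustration under assumptions, not Kirrkirr's analyzer: the suffixes are stored reversed so the trie can be walked from the end of the word, a plain list of parses stands in for the lattice, the grammar-rule filter is omitted, and the stem table mapping "nya" to the headword "nya-nyi" is a hypothetical way of handling the indexing problem described in the text.

```python
class SuffixTrie:
    """Trie over reversed suffixes, so matching proceeds from the end of a word."""

    END = "$"  # marker key; assumes "$" never occurs in a word

    def __init__(self, suffixes):
        self.root = {}
        for suffix in suffixes:
            node = self.root
            for ch in reversed(suffix):
                node = node.setdefault(ch, {})
            node[self.END] = suffix

    def suffixes_of(self, word):
        """Yield every stored suffix that the word ends with, shortest first."""
        node = self.root
        for ch in reversed(word):
            node = node.get(ch)
            if node is None:
                return
            if self.END in node:
                yield node[self.END]


def analyze(word, trie, stems):
    """Return every decomposition [stem, suffix, ...] whose stem is a known stem."""
    parses = []
    if word in stems:
        parses.append([word])
    for suffix in trie.suffixes_of(word):
        remainder = word[: -len(suffix)]
        if remainder:
            # Strip the suffix, then analyze what is left in the same way.
            for parse in analyze(remainder, trie, stems):
                parses.append(parse + [suffix])
    return parses


# Suffixes taken from the example in the text, plus the short form "l" of "l(pa)".
trie = SuffixTrie(["ngu", "lpa", "l", "rna", "ngku"])
# Hypothetical stem table: bare verb stem -> headword as indexed in the dictionary.
STEMS = {"nya": "nya-nyi"}

for parse in analyze("nyangulparnangku", trie, STEMS):
    print("-".join(parse), "-> look up", STEMS[parse[0]])
```

A fuller version would keep the shared sub-parses in a lattice instead of copying lists, and would filter the results with grammar rules on which suffixes may follow which.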
Conclusions
Making Kirrkirr a bidirectional English-Warlpiri and Warlpiri-English dictionary increases its usability and practicality, by making it easier for users who are more comfortable in English to browse and search in English.
Allowing lookup of Warlpiri words from actual speech using the morphological analysis also increases usability, especially for users who are learning Warlpiri, since they do not have to figure out the root word.
Future work:
  Improving the morphological analysis to provide roughly ranked possible parses of all morphemes of an entire word, using more grammatical information and frequency information
  Extending Kirrkirr to other languages