SIL FieldWorks Language Explorer: The lexicon component
Gary Simons, SIL International
Lexicon Tools and Lexicon Standards, Nijmegen, 4–5 August 2010

SIL FieldWorks
FieldWorks is a suite of integrated software tools to help field workers manage language and cultural data, with support for complex scripts.
The Language Explorer tool is designed to:
- manage a lexical database
- produce dictionaries
- interlinearize texts
- analyze morphology

Quick Tour
- A short screen movie demonstrates the look and feel
- It is the first of 55 narrated screen movies available at: /brief demo menu.html

Integration among areas
- The Lexicon, Texts, and Grammar areas all operate over the same database.
- In the Lexicon area, users enter lexical entries directly.
- In the Texts area, as new morphemes are glossed in text, new lexical entries are created behind the scenes.
- In the Grammar area, users describe the categories and features used in lexical description, plus the inflectional templates that guide automatic parsing in Texts.

Conceptual-modeling approach
- Lexicon, texts, and grammar are all stored in a single, normalized relational database.
- We began by working with domain experts to build a conceptual model of the areas and how they integrate.
- The conceptual model was expressed in UML and then transformed into a SQL relational database schema.
- See the full model with over 100 classes at:

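To make the idea of one normalized database shared by lexicon and texts concrete, here is a minimal sketch using SQLite. The table and column names are hypothetical and heavily simplified; they are not taken from the FieldWorks model, and the word form and glosses are invented placeholders. The point is only that a morpheme in a text refers to a lexical sense by key, so the gloss is stored once.

```python
import sqlite3

# Hypothetical, heavily simplified schema. Table and column names are
# illustrative only and are not taken from the FieldWorks model.
SCHEMA = """
CREATE TABLE lex_entry (
    id INTEGER PRIMARY KEY,
    lexeme_form TEXT NOT NULL
);
CREATE TABLE lex_sense (
    id INTEGER PRIMARY KEY,
    entry_id INTEGER NOT NULL REFERENCES lex_entry(id),
    gloss TEXT NOT NULL
);
CREATE TABLE text_morph (
    id INTEGER PRIMARY KEY,
    surface_form TEXT NOT NULL,
    sense_id INTEGER NOT NULL REFERENCES lex_sense(id)
);
"""


def build_demo():
    """Create an in-memory database with one entry, one sense, one text morpheme."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO lex_entry (id, lexeme_form) VALUES (1, 'kata')")
    conn.execute("INSERT INTO lex_sense (id, entry_id, gloss) VALUES (1, 1, 'word')")
    conn.execute("INSERT INTO text_morph (surface_form, sense_id) VALUES ('kata', 1)")
    conn.commit()
    return conn


def glosses_for_text(conn):
    """Return (surface form, gloss) pairs by joining text morphemes to senses."""
    rows = conn.execute(
        "SELECT m.surface_form, s.gloss "
        "FROM text_morph m JOIN lex_sense s ON s.id = m.sense_id "
        "ORDER BY m.id"
    )
    return rows.fetchall()
```

Because the gloss lives only in the sense table, editing it there changes what every text query returns, which is the practical benefit of normalization in this setting.
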
Some key features
- Use automatic parsing to empirically verify morphological description within lexicon
- Build the word net via lexical relations
- Build richness into the lexicon by eliciting through semantic domains
- Use “bulk edit” for global clean up
- Repurpose content by developing multiple presentation views
- Clean separation between stored data and presentation (see example in next 2 slides)

Root-based dictionary (Cherokee)
- Stem entries just cross-refer to root
- Root entries list stems as subentries
- Subentries give full description

Stem-based dictionary (Cherokee)
- Stem entries give full description
- Root entries cross-refer to stems
- No subentries

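The two Cherokee slides illustrate the separation between stored data and presentation: the same records are laid out in two different ways. A minimal sketch of that idea follows. The data structure, function names, and output format are all hypothetical, and the forms and glosses are invented placeholders rather than Cherokee data. This is not how FLEx implements configured views.

```python
# Hypothetical stored data: one root with two derived stems.
# Forms and glosses are invented placeholders, not Cherokee data.
LEXICON = {
    "roots": {"R1": {"form": "root-a", "gloss": "ROOT"}},
    "stems": [
        {"form": "stem-a1", "gloss": "first stem", "root": "R1"},
        {"form": "stem-a2", "gloss": "second stem", "root": "R1"},
    ],
}


def root_based_view(lexicon):
    """Root entries list stems as subentries; stem entries cross-refer to the root."""
    lines = []
    for root_id, root in lexicon["roots"].items():
        lines.append(f"{root['form']} '{root['gloss']}'")
        for stem in lexicon["stems"]:
            if stem["root"] == root_id:
                # Subentry carries the full description.
                lines.append(f"    {stem['form']} '{stem['gloss']}'")
    for stem in lexicon["stems"]:
        root = lexicon["roots"][stem["root"]]
        lines.append(f"{stem['form']} see {root['form']}")
    return lines


def stem_based_view(lexicon):
    """Stem entries give the full description; root entries cross-refer to stems."""
    lines = []
    for root_id, root in lexicon["roots"].items():
        stems = [s["form"] for s in lexicon["stems"] if s["root"] == root_id]
        lines.append(f"{root['form']} see {', '.join(stems)}")
    for stem in lexicon["stems"]:
        lines.append(f"{stem['form']} '{stem['gloss']}'")
    return lines
```

Both functions read the same `LEXICON` and change nothing in it, so switching between a root-based and a stem-based dictionary is purely a choice of view.
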
Pathways to publishing
- First create a “configured view” to display the lexical entries as desired
- Then use the Pathway plug-in to take this stream of configured content and lay it out onto pages for a publishable dictionary
- Publishing tools supported so far:
  - Prince XML (to PDF)
  - Open Office (to ODF)
  - Adobe InDesign

Lexical interchange
Supports two import formats:
- From Shoebox / Toolbox via SFM
  - “Standard Format Markers” = backslash codes
  - User configures the mapping of markers to conceptual equivalents in FLEx database
  - The default mapping is for MDF SFM
- From WeSay / Lexique Pro via LIFT
  - Lexicon Interchange FormaT: an XML application for interchange of lexicons

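The marker-mapping step for SFM import can be sketched as follows. The markers \lx, \ps, \ge, and \de are standard MDF codes; everything else here is an assumption for illustration: the field names on the right-hand side of the mapping, the function name `parse_sfm`, and the sample records are hypothetical and do not reflect FLEx's importer or database fields. The sketch also ignores continuation lines and repeated fields, which a real importer has to handle.

```python
# Marker-to-field mapping. \lx, \ps, \ge and \de are standard MDF markers;
# the field names on the right are hypothetical, not FLEx's own.
MARKER_MAP = {
    "lx": "lexeme",
    "ps": "part_of_speech",
    "ge": "gloss_en",
    "de": "definition_en",
}


def parse_sfm(text, marker_map=MARKER_MAP, record_marker="lx"):
    """Split SFM text into records and rename mapped markers to field names."""
    entries = []
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("\\"):
            continue  # this sketch ignores blank and continuation lines
        marker, _, value = line[1:].partition(" ")
        if marker == record_marker:
            # The record marker starts a new entry.
            current = {}
            entries.append(current)
        if current is None or marker not in marker_map:
            continue  # skip unmapped markers and anything before the first record
        current[marker_map[marker]] = value.strip()
    return entries


# Invented sample records; \xx stands for a marker the user has not mapped.
SAMPLE = r"""
\lx kata
\ps n
\ge word
\xx ignored

\lx lari
\ps v
\ge run
"""
```

Calling `parse_sfm(SAMPLE)` yields one dictionary per \lx record, with the unmapped \xx line dropped. Passing a different `marker_map` is the analogue of the user reconfiguring the mapping for a non-MDF marker set.
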
Lexicon export
- The entire database for a language project can be dumped to FieldWorks XML
  - XML model.doc
- The complete lexical database (a subset of the whole project) can be exported to:
  - LIFT XML
  - MDF-based SFM (either root- or stem-based)
  - options in Flex.doc

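For readers unfamiliar with LIFT, the sketch below builds a single entry as XML with Python's standard library. The element names (`entry`, `lexical-unit`, `form`, `text`, `sense`, `grammatical-info`, `gloss`) follow my understanding of the LIFT format and should be checked against the LIFT schema before use. The version attribute, the default language codes, and the function name `entry_to_lift` are assumptions, and the output is far smaller than what FLEx actually exports.

```python
import xml.etree.ElementTree as ET


def entry_to_lift(entry_id, form, gloss, pos, vernacular="und", analysis="en"):
    """Build a minimal LIFT-style document holding one entry with one sense.

    The version value and default language codes are assumptions for
    illustration; verify element names against the LIFT schema.
    """
    lift = ET.Element("lift", {"version": "0.13"})
    entry = ET.SubElement(lift, "entry", {"id": entry_id})

    # Headword: a form in the vernacular writing system.
    unit = ET.SubElement(entry, "lexical-unit")
    form_el = ET.SubElement(unit, "form", {"lang": vernacular})
    ET.SubElement(form_el, "text").text = form

    # One sense with a part of speech and a gloss in the analysis language.
    sense = ET.SubElement(entry, "sense")
    ET.SubElement(sense, "grammatical-info", {"value": pos})
    gloss_el = ET.SubElement(sense, "gloss", {"lang": analysis})
    ET.SubElement(gloss_el, "text").text = gloss
    return lift


def to_string(element):
    """Serialize an element tree to a Unicode string."""
    return ET.tostring(element, encoding="unicode")
```

For example, `to_string(entry_to_lift("kata_1", "kata", "word", "Noun"))` returns a small XML string that another LIFT-aware tool could, in principle, read back into its own lexicon.
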
More lexicon export
- Any configured view can be exported to:
  - A streamlined version of FieldWorks XML
  - MDF-based SFM
  - XHTML + CSS for presentation
- Furthermore, one can create a FieldWorks XML Template (FXT) to define a custom export format (XML, SFM, plain text)
  - export options.doc

Interoperation with GOLD
- FLEx is preloaded with a grammatical categories catalog that is based on an early version of GOLD
- Similarly, a Morphosyntactic Gloss Assistant is preloaded with morphosyntactic properties from an early version of GOLD; see p. 10 of: Preprint.pdf
- Thus morphosyntactic information in lexicon and texts is implicitly aligned with GOLD
- The remaining step is for us to map to GOLD ids when they are standardized; then we can easily export GOLD ids in LIFT and other XML

Uptake
- October 2009: FLEx 3.0 released in FieldWorks 6.0. Free download from:
- 323 members of a reasonably active Google Group (~3,000 messages)
- 185 language projects have registered as users
- Over 30 did a 4-day FLEx workshop led by Beth Bryson at InField
- Beth will also do a one-day FLEx workshop at ICLDC, Feb