Developing an Android application for modifying speech recognition grammar models

Mitchell Roberts
Mentored by Raymond Schulze and Daniel Yaeger

Introduction

The Army’s Commander’s Toolkit (CTK) Android app was created to enhance interaction between commanders and their battlefield technology. The most significant human-computer interaction feature in CTK is its automatic speech recognition (ASR) input (see Figure 1). By using speech instead of a keyboard and mouse or even a touch display, the app, developed by the Army’s Communications-Electronics Research, Development, and Engineering Center, can easily be used in difficult environments, such as when the user is strapped into the seat of a tank. In these situations, ASR is the easiest and most effective method of input (Ruiz, 2011).

The ASR engine in CTK uses a small-vocabulary model, so each voice command must be individually programmed in the Backus-Naur Form (BNF) grammar file. Consequently, users could not add custom commands without editing the BNF source code. The first goal was to develop a companion Android application that automatically modifies the BNF file, allowing users to augment the existing grammar model without any knowledge of BNF. The grammar models generated by the app had to detect custom and built-in commands with equal accuracy. The second goal was to create a Windows® batch script that compiles the grammar file into the binary format required by the ASR engine.

Figure 1: Screenshot of CTK receiving a voice command from the ASR service (MITRE Corporation, 2013).

Materials and Methods

The Android app was created using the Eclipse integrated development environment with the Android Development Tools plugin. The user of the app selects a software action in the left column and types a new command in the right column. The app then programmatically writes BNF code based on these selections (see Figure 2).
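To make the code-writing step concrete, the following Python sketch shows one way a new alternative could be formatted and appended to a grammar rule. It is an illustration only, not the companion app's actual implementation. The rule name and the "phrase !id(N)" layout follow the grammar excerpt shown in Figure 3; the helper names format_alternative and add_command are hypothetical, and the sketch assumes the rule already contains at least one alternative and ends with ");".

```python
import re

# Rule name as it appears in the poster's grammar excerpt (Figure 3).
RULE_NAME = "<custom_command>"


def format_alternative(phrase: str, action_id: int) -> str:
    """Format one alternative, e.g. 'show map !id(408)' (hypothetical helper)."""
    # Keep only lowercase words so punctuation typed by the user
    # cannot break the grammar syntax.
    words = re.findall(r"[a-z']+", phrase.lower())
    if not words:
        raise ValueError("command phrase contains no words")
    return f"{' '.join(words)} !id({action_id})"


def add_command(grammar: str, phrase: str, action_id: int) -> str:
    """Append a new alternative to the custom-command rule (hypothetical helper)."""
    start = grammar.find(RULE_NAME + ":")
    if start == -1:
        raise ValueError("custom-command rule not found")
    end = grammar.find(");", start)
    if end == -1:
        raise ValueError("custom-command rule is not terminated")
    # Everything up to the closing ');', without trailing whitespace.
    body = grammar[start:end].rstrip()
    new_rule = f"{body} | {format_alternative(phrase, action_id)} "
    return grammar[:start] + new_rule + grammar[end:]
```

For example, add_command(grammar, "Show me everything", 665) would attach the spoken phrase to the action with ID 665. Pairing that phrase with that ID is an assumption made purely for illustration; in the app, the ID would come from the software action the user selects in the left column.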
The generated BNF code is inserted into an existing BNF file on the device that contains the default grammar, and the file is saved (see Figure 3).

Materials and Methods (cont’d)

Before the grammar model can be used by the ASR engine, it must be compiled into a binary file. The compiler that generates the binaries was provided by Nuance Communications, the creator of the engine. Because it is a Python-based application, it cannot be executed on the Android device running CTK and must be run on a separate Windows personal computer. A batch script was written to transfer the necessary files between the Android device and the computer and to compile the BNF code. Using the Android Debug Bridge, the script pulls the BNF grammar file containing the custom commands from the device and saves it on the computer. The file is then compiled into a .fcf file using the Nuance Software Development Kit. Lastly, the .fcf file is pushed back onto the Android device. The next time the ASR service runs, it imports the new grammar and recognizes the custom commands.

Results

The completed application and script successfully modified the grammar model and allowed the ASR service to recognize new commands (see Figure 4). The performance of custom commands was measured by recording the confidence scores output by the ASR engine for 20 built-in commands and 20 corresponding custom commands (see Graph 1). A two-tailed, two-sample t-test performed on the scores did not show a significant difference in confidence between the two data sets (p-value = 0.3605).

Figures 4a (left) and 4b (right): Screenshots of the ASR service output when “show me everything” was spoken before (left) and after (right) the grammar model was changed.

Figure 2 (left): Screenshot of the grammar modification app in use.

Figures 3a (top) and 3b (bottom): BNF grammar code before (top) and after (bottom) being modified by the companion app using the user input from Figure 2.
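The pull, compile, and push sequence described in Materials and Methods can be sketched as follows. The project used a Windows batch script; this Python version is only an illustration of the same three steps. The adb pull and adb push subcommands are standard Android Debug Bridge commands, but every file path here is hypothetical, and the compile command is a caller-supplied placeholder because the poster does not document the Nuance compiler's command line.

```python
import subprocess
from typing import Callable, List, Sequence


def build_steps(device_bnf: str, local_bnf: str,
                local_fcf: str, device_fcf: str,
                compile_cmd: Sequence[str]) -> List[List[str]]:
    """Return the three commands of the workflow, in execution order."""
    return [
        # 1. Copy the modified grammar from the device to the computer.
        ["adb", "pull", device_bnf, local_bnf],
        # 2. Compile the grammar into a binary .fcf file.
        #    Placeholder: the real compiler invocation is not given in the poster.
        list(compile_cmd),
        # 3. Copy the compiled grammar back onto the device.
        ["adb", "push", local_fcf, device_fcf],
    ]


def run_steps(steps: Sequence[Sequence[str]],
              runner: Callable[..., object] = subprocess.run) -> None:
    """Run each command in order, stopping at the first failure."""
    for argv in steps:
        # check=True makes subprocess.run raise if a command exits non-zero,
        # so a failed pull or compile never pushes a stale file.
        runner(list(argv), check=True)
```

Keeping command construction separate from execution lets the sequence be inspected or logged before anything touches the device, and lets the runner be swapped for a stub during testing.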
BNF grammar code (Figure 3):

    <custom_command>: (
          dummy move map down !id(504)
        | all symbols on !id(665)
        | all symbols off !id(666)
        | go back !id(207)
        | show map !id(408)
    );

Results (cont’d)

Graph 1: The confidence scores output by the ASR engine for built-in commands and for custom commands added and compiled with the companion application and accompanying script. Confidence scores out of 10,000 correspond to the probability that the ASR engine correctly detected and interpreted speech.

Conclusions and Applications

Not only could the Android app and batch script add new custom commands with very little user input, but the custom commands also showed no significant difference in recognition confidence from the built-in commands. Possible confounding variables that may have contributed to the variance in the data include background noise during testing and the length and complexity of the command orthographies. Future work on this project will allow modification of the grammar model for multiple apps, not just CTK. The grammar modification app will also implement on-device grammar compilation, which would eliminate the need for a separate computer and allow the entire process to be completed on an Android device alone.

References

MITRE Corporation. (2013). Mission Command Battle Laboratory Commander’s Toolkit Study (Project No. 0713A92A-TK). Leavenworth, Kansas: Drury.

Ruiz, N. (2011). Cognitive Load Measurement in Multimodal Interfaces (Doctoral dissertation). Retrieved from National Information and Communications Technology Center of Australia: https://www.nicta.com.au/pub?doc=4904