
Nurturing Living Languages © C-DAC Mahesh D. Kulkarni C-DAC GIST Group Electronics and Information Technology Exposition - ELITEX 2005 India.

Presentation transcript:

1 Nurturing Living Languages © C-DAC Mahesh D. Kulkarni C-DAC GIST Group mdk@cdac.in Electronics and Information Technology Exposition - ELITEX 2005 India Habitat Centre, Lodhi Road, New Delhi. 25 - 26 April 2005

2 Multimodal System (Human Computer Interface) for Indian Languages: Issues and Solutions

3 Multimodal System
A multimodal system enables users to communicate with computers through several modes, such as keyboard, OCR, speech, gesture, gaze and visual input. The major challenge for computer system designers lies in simplifying the human-machine interface. Researchers all over the world are inventing different modes of interaction, some of them with little or no success. No single mode is sufficient for effective communication with the machine.
Some of the popular interaction mechanisms are: keyboard, Unistroke, Graffiti, predictive writing, OCR, and speech (limited vocabulary).

4 Multimodal System for Indian Languages
Challenges: Multilingual: 22 scheduled languages. The scripts are complex compared to English, which poses problems especially for OCR. While inputting, there are many-to-one and many-to-many relationships, unlike English. Limited availability of linguistic resources. Layman terminology versus pure linguistic terminology. The various dialects pose a challenge for speech input.
Impact: The lack of an efficient Indian-language multimodal system has restricted content creation.
Possible solution: Development of expert / smart writing systems backed by multimodal inputs and linguistic resources such as spellcheckers, grammar checkers, synonyms, antonyms, thesauri, domain-based dictionaries, phrases and references.

5 Hindi language: base characters 80; half characters 43; vowel characters 12; matra characters 12.
English language: base characters 26 (A B C D ...).

6 Indian language keyboard layout(s)
Because processing power was unavailable, mechanical typewriters were devised, based on the principle "the way you see is the way you write".
INSCRIPT: Popular and widely used; it has become the de facto standard. It is based on the phonetic structure of Indian languages: "the way you speak is the way you write".
Phonetic English: for urban users.
Limitations in the mobile world: The keyboard is very bulky, difficult to carry, and large compared to the target device itself. It needs both hands, so it is not suitable for portable, mobile devices. It cannot be used without training. More than 80 keys are required, with UNSHIFT / SHIFT operations.

7 Virtual / laser keyboard
Applications: PDAs and cellular telephones; tablet PCs and laptops; industrial, sterile and medical environments; test equipment; transport (air, rail, automotive).
Limitations: Needs a proper surface on which to display the image. Typing is cumbersome, since finger positions and movements are restricted. Speed limitations.

8 KITTY (Keyboard Independent Touch Typing), a finger-mounted keyboard for data entry into PDAs, Pocket PCs and wearable computers, has been developed at the University of California, Irvine. Two hand-mounted devices connect to the target computing device using Bluetooth wireless networking technology. The user can type on a hard surface such as a desk or table, or in the air. © University of California and Senseboard respectively.

9 Unistroke inputting
Each character is represented by a single stroke, so there is no segmentation problem: the system does not need to use up resources to figure out where one character ends and another begins. There is no need to write characters within bounding boxes; characters can be recognized even when they are written one on top of the other. It can even be used by a blind person.
Limitations: Requires the user to spend some time learning the characters. Complex to implement for Asian languages. More oriented towards English.

10 Graffiti inputting
Requires minimal time for learning the alphabet. Though Unistrokes is a faster mode for inputting text than Graffiti, hardly anyone uses Unistrokes. This is because Graffiti is easy to learn, while Unistrokes is comparatively harder.

11 Non-Predictive & Predictive Inputting Mechanisms for Handheld / Mobile Devices, by C-DAC GIST Group

12 Multitap text entry mechanism
English: English has only 26 letters. These are spread over 9 keys, i.e. 3 to 4 characters on a single key. To get the desired character, the user needs to press the key up to 4 times.
Hindi / Indian languages: Hindi has around 80 basic characters, 43 half characters, 12 vowels and 12 matras, about 147 characters in all. Spreading these 80 characters plus the half characters, vowels and matras over the 10 keys gives around 9 to 10 characters on one key. Multitap input therefore becomes very cumbersome: since more key presses are needed to reach each character, typing longer text is tedious, and inputting a long message this way in Indian languages is next to impossible.
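The per-key load described on this slide can be worked through with a small sketch. It assumes the characters are dealt out evenly across the keys and that every character is equally likely; both are simplifications, and the function names are illustrative only. With the slide's own counts, an even split of all 147 Hindi characters over 10 keys gives up to 15 characters per key, higher than the 9 to 10 quoted on the slide, which may rest on a grouping not described here.

```python
from math import ceil

def chars_per_key(total_chars: int, keys: int) -> int:
    """Largest number of characters any one key carries under an even split."""
    return ceil(total_chars / keys)

def average_taps(total_chars: int, keys: int) -> float:
    """Average taps per character, assuming all characters are equally likely.

    Characters are dealt out evenly; the i-th character on a key costs i taps.
    """
    base, extra = divmod(total_chars, keys)
    # 'extra' keys carry base + 1 characters, the remaining keys carry 'base'.
    taps = (extra * sum(range(1, base + 2))
            + (keys - extra) * sum(range(1, base + 1)))
    return taps / total_chars

# Character counts quoted on the slide: basic + half + vowels + matras.
hindi_total = 80 + 43 + 12 + 12

print("English:", chars_per_key(26, 9), round(average_taps(26, 9), 2))
print("Hindi:  ", chars_per_key(hindi_total, 10),
      round(average_taps(hindi_total, 10), 2))
```

Under these assumptions the average cost per Hindi character is several times the English figure, which is the point the slide makes qualitatively.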

13 Comparative study of English & Hindi
English: single character 26 combinations; two characters 52 combinations; three characters 4056 combinations.
Hindi: single character 80 combinations; two characters 6889 combinations; three characters 571787 combinations.
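For comparison, a minimal sketch of one way to count such combinations: it assumes ordered sequences with repetition allowed, so an alphabet of n symbols yields n^k sequences of length k. That counting rule is an assumption, not something the slide states. Under it the results differ from several of the slide's figures (the Hindi figures 6889 and 571787, for instance, equal 83^2 and 83^3), so the slide may be counting something else.

```python
def sequences(alphabet_size: int, length: int) -> int:
    """Number of ordered sequences of `length` symbols, repetition allowed."""
    return alphabet_size ** length

# Alphabet sizes taken from the slide's single-character figures.
for label, n in (("English", 26), ("Hindi", 80)):
    counts = [sequences(n, k) for k in (1, 2, 3)]
    print(label, counts)
```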

14 Multitap
12 key presses are required to input the character. If a character is missed, you need to start all over again. Ideally suited to fewer than 3-4 characters per key. Not suitable for Indian-language input, since almost 7-8 characters have to be placed on each key (basic characters 80, half characters 43, vowels 12, matras 12).

15 Two-key non-predictive
4 key presses are required to input the character. Any character can be entered in just two key presses. Key mapping is done on the basis of vargas and is hence easy to remember. Very short learning time (3-5 minutes). No need to remember the keys. Guidance reduces mistakes. All Indian languages can be input with the same keyboard layout, so there is no need to learn it again for another language.

16 Two-key non-predictive
13 key presses are required to input the character. The * key is the mode key used for selecting the halant / half character. Technology given to MNCs.
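The two-key idea from these slides can be illustrated with a minimal sketch. The layout below is hypothetical and is not C-DAC's actual mapping: it assumes the first key selects a varga (consonant group) and the second key selects a position within it. The way the `*` mode key is handled (pressed before a pair to append a halant) is likewise a guess made for illustration.

```python
# Hypothetical layout: first key picks a varga, second key picks a
# position within it. This is NOT the mapping used by C-DAC.
VARGAS = {
    "1": "कखगघङ",  # ka-varga
    "2": "चछजझञ",  # cha-varga
    "3": "टठडढण",  # ṭa-varga
    "4": "तथदधन",  # ta-varga
    "5": "पफबभम",  # pa-varga
}
HALANT = "\u094D"  # Devanagari sign virama

def decode(keys: str) -> str:
    """Turn a string of key presses into Devanagari text."""
    out = []
    i = 0
    while i < len(keys):
        half = keys[i] == "*"  # assumed mode key: next consonant gets a halant
        if half:
            i += 1
        group, position = keys[i], int(keys[i + 1])
        if not 1 <= position <= len(VARGAS[group]):
            raise ValueError(f"no character at position {position}")
        consonant = VARGAS[group][position - 1]
        out.append(consonant + HALANT if half else consonant)
        i += 2
    return "".join(out)

print(decode("5355"))   # two consonants, two key presses each
print(decode("*1111"))  # half consonant followed by a full one
```

Because the grouping follows the traditional varga order, a user who knows the alphabet chart can find a consonant without memorising individual key assignments, which is the benefit the slides claim.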

17 Predictive writing
Should address the need for fast input using limited keys. Should not take more than one key press per character. Should help with auto-completion of words, so that fewer key presses are needed than the length of the word. Fast searching, with a dictionary of the most commonly used words as a backup. Can also manage user-defined words. C-DAC GIST has developed predictive writing for Hindi, and work is in progress for other languages.

18 Approaches for predictive inputting for devices
Because of the nature of the script, this is more complex to implement than for any other language. Increasing accuracy is a continuous development process. A stepwise approach is used to achieve a good level of prediction: pure dictionary based; dictionary plus rule-based approach; addition of domain-specific dictionaries; increase in accuracy by analyzing live data and enhancing the built-in dictionaries accordingly.
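The first step in this list, pure dictionary-based prediction, can be sketched as follows. Everything specific here is an assumption made for illustration: the key map is the standard English phone keypad applied to romanised Hindi words, and the word list and usage counts are invented. It shows only the general technique (one key press per character, with completion from a frequency-ranked dictionary), not C-DAC's implementation.

```python
# Hypothetical key map: the standard English phone keypad, applied to
# romanised Hindi words purely for illustration.
KEYPAD = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}
LETTER_TO_KEY = {ch: key for key, letters in KEYPAD.items() for ch in letters}

# Hypothetical word list with made-up usage counts.
DICTIONARY = {"dhanyavad": 40, "dhan": 25, "dil": 30, "namaste": 50}

def key_sequence(word: str) -> str:
    """Keys pressed to type `word` at one press per character."""
    return "".join(LETTER_TO_KEY[ch] for ch in word)

def predict(keys: str, limit: int = 3) -> list:
    """Words whose key sequence starts with `keys`, most frequent first."""
    matches = [w for w in DICTIONARY if key_sequence(w).startswith(keys)]
    return sorted(matches, key=lambda w: (-DICTIONARY[w], w))[:limit]

print(predict("34"))     # several candidates share this prefix
print(predict("34269"))  # five presses narrow it to a single word
```

A production system would index the dictionary (for example with a trie keyed on key sequences) rather than scanning every word; the linear scan keeps the sketch short.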

19 Predictive writing demo: 5 key presses are required to complete the word Dhanyavad.

20 Nurturing Living Languages © C-DAC

21 Features
Highly efficient algorithm with automatic prediction of the words the user uses frequently. Automatic tracking of the user's frequently used words, which are given priority. Currently 25,000 common "spoken Hindi" words. Words not available in the dictionary can be added by the user through the non-predictive mechanism. Current memory requirements: 180 KB for 25,000 words (uncompressed), 8 KB for code, 3 KB of scratch memory.
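The tracking and user-word features listed on this slide could be modelled along the following lines. This is a minimal sketch under stated assumptions: the class name `UserLexicon`, its methods and the romanised sample words are all hypothetical, and ranking purely by usage count is one simple policy among many.

```python
from collections import Counter

class UserLexicon:
    """Tracks how often the user picks each word and ranks candidates."""

    def __init__(self, base_words):
        self.words = set(base_words)
        self.usage = Counter()

    def add_word(self, word: str) -> None:
        """Add a word the dictionary lacks (typed non-predictively)."""
        self.words.add(word)

    def record_use(self, word: str) -> None:
        """Count one more selection of a known word."""
        if word not in self.words:
            raise KeyError(word)
        self.usage[word] += 1

    def rank(self, candidates):
        """Order known candidates so frequently picked words come first."""
        known = [w for w in candidates if w in self.words]
        return sorted(known, key=lambda w: (-self.usage[w], w))

lexicon = UserLexicon(["dhan", "dhanyavad"])
lexicon.record_use("dhanyavad")
lexicon.add_word("dhanush")  # user-defined word
print(lexicon.rank(["dhan", "dhanyavad", "dhanush"]))
```

Keeping the usage counts separate from the base word list means the built-in dictionary can stay read-only, with only the small per-user table needing writable memory.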

22 Conclusions
Urgent need for development of expert / smart writing systems backed by multimodal inputs and linguistic resources such as spellcheckers, grammar checkers, synonyms, antonyms, thesauri, domain-based dictionaries, phrases and references. Standardization of Indian-language input through limited keys.

23 THANK YOU

