HLT Research and Development for Baltic Languages at Tilde
Andrejs Vasiļjevs, Raivis Skadiņš
Tilde
Riga, October 27, 2004
HLT needs of Baltic users
Baltic users expect the same possibilities for their language as are available for other languages
Growing number of users without English proficiency
Bilingual/multilingual environment, language competition
EU accession and globalization create huge demand for translation
Prerequisites for HLT application development
Cross-disciplinary higher education
Scientific research
HLT industry
Challenges for Baltic Languages
Rich structure, inflection and complexity of the languages
Relatively small number of speakers, small market
Few researchers active in HLT, insufficient scientific base
Lack of specialists resulting from lack of targeted education programs
Tilde - Basic Facts
Tilde is a leading Baltic developer of software and services that help Baltic users get the most from new technologies, taking into account their language, business environment, culture and local information needs
Established in 1991
Offices in Riga, Vilnius and Tallinn
75 employees
Tilde - Areas of Activity
Language and Reference Tools for Baltic Languages
  proofing tools, dictionaries, translation tools, fonts, search tools, speech tools, encyclopedias
Localization Services for ICT products
  translation and adaptation of interface and documentation for software and hardware products
Information management solutions
Streaming media
Localization Services – Where Global Meets Local
User requirement: interface and documentation in Latvian, Lithuanian and Estonian
Full cycle of localization services
Intelligent translation using translation memory, e-dictionaries and other tools
Professional team across the Baltics
Global clients in local markets: Microsoft, IBM, Ericsson, Nokia, Xerox, Hewlett Packard and others
Examples of localized products: MS Windows, MS Office, Oracle, mobile phones
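The translation memory idea mentioned on this slide can be illustrated with a minimal sketch: previously translated segments are stored, and a new segment is matched against them so that an existing translation can be reused or adapted. This is only an illustration under stated assumptions, not a description of Tilde's tooling. The memory contents, the `best_match` function and the 0.7 threshold are all hypothetical, and similarity is computed with the generic `difflib.SequenceMatcher` from the Python standard library.

```python
from difflib import SequenceMatcher

# Hypothetical translation memory: English source segments paired with
# Latvian translations. A real memory would hold many thousands of segments.
TM = [
    ("Open file", "Atvērt failu"),
    ("Save file", "Saglabāt failu"),
    ("Print", "Drukāt"),
]

def best_match(segment, memory, threshold=0.7):
    """Return (source, target, score) for the closest stored segment,
    or None if nothing reaches the similarity threshold (an assumed value)."""
    best = None
    best_score = 0.0
    for source, target in memory:
        # Character-level similarity in the range 0.0 to 1.0.
        score = SequenceMatcher(None, segment.lower(), source.lower()).ratio()
        if score > best_score:
            best, best_score = (source, target), score
    if best is not None and best_score >= threshold:
        return best[0], best[1], best_score
    return None
```

A near match such as "Open files" would return the stored pair for "Open file" as a suggestion for the translator to adapt, while an unrelated segment returns nothing.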
Proofing Tools
More than 10 years of development
Tools for Latvian and Lithuanian languages
  Spellchecker
  Intelligent Autocorrect
  Grammar checker
  Hyphenator
  Thesaurus
Collaboration with Microsoft, developments for Lotus Notes
Demo – Grammar Checker and Spellchecker
Demo - Thesaurus
Terminology Portal
Partnership of NGOs and the private sector to provide a common online database of terminology from all fields
Initiated by LITTA (Latvian Information Technology and Telecommunication Association)
Partners: LITTA, LMT, Tilde, Terminology Commission
Currently covers more than 145 000 terms from 35 fields
Demo – Text-to-Speech in MS Excel
Our Products
Tildes Birojs 2002 - a complete solution for working with a computer in Latvian
Tildės Biuras 2004 - a complete solution for working with a computer in Lithuanian
More than 100 000 licensed users
Tildes Birojs and Tildės Biuras
HLT tools: proofing tools, dictionaries, learning tools, limited speech synthesis, search facilities
Language support for all popular operating systems: Windows 95/98/NT/2000/Me/XP, Windows CE, Linux, MacOS
Text input tools: keyboard drivers and keyboard adjustment tools, fonts, converters
OCR
Document templates
Information resources: smart tag references, eBooks and others
Web service – Letonika.LV
Encyclopedias (General, History, Poetry, Financial, Regional)
Translation dictionaries
Terminology dictionaries
Catalog of Latvian Web
Advanced search system (under development)
Search and information retrieval
Established infrastructure for web crawling and indexing
Search result ranking and filtering methods
Advanced usage of Latvian/Lithuanian morphology
Integrated with Latvian Web Catalog
Wide coverage, constantly updated
Work on style guessing
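Why morphology matters for search in an inflected language can be shown with a toy sketch: if documents and queries are both reduced to lemmas before matching, a query in one case form also finds documents that use other forms of the same word. The code below is an assumption-laden illustration, not Tilde's search engine. The `LEMMAS` table is a tiny hand-written stand-in for a real morphological analyzer, and the names `lemmatize`, `build_index` and `search` are hypothetical.

```python
from collections import defaultdict

# Toy lemma table (assumption): maps a few Latvian word forms to a base form.
# A real system would use a full morphological analyzer instead.
LEMMAS = {
    "grāmata": "grāmata", "grāmatas": "grāmata",
    "grāmatu": "grāmata", "grāmatā": "grāmata",
    "pilsēta": "pilsēta", "pilsētas": "pilsēta", "pilsētā": "pilsēta",
}

def lemmatize(token):
    """Map a word form to its lemma; unknown words are kept as they are."""
    token = token.lower()
    return LEMMAS.get(token, token)

def build_index(docs):
    """Build an inverted index from lemma to the set of document ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.split():
            index[lemmatize(token)].add(doc_id)
    return index

def search(index, query):
    """Return ids of documents that contain every lemma in the query."""
    lemmas = [lemmatize(token) for token in query.split()]
    if not lemmas:
        return set()
    result = set(index.get(lemmas[0], set()))
    for lemma in lemmas[1:]:
        result &= index.get(lemma, set())
    return result

# Hypothetical mini collection of three short documents.
DOCS = {
    1: "grāmatas pilsētā",
    2: "jauna grāmata",
    3: "pilsētas vēsture",
}
INDEX = build_index(DOCS)
```

With this index, a query for "grāmatu" matches documents 1 and 2 even though neither contains that exact form. A plain string match would find nothing.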
Demo – Search in Latvian Web
Further Development
Tilde’s goal: to enable Baltic users to make use of the results of native-language HLT in their everyday work
Development directions:
  Search and information retrieval technologies
  Machine translation
  Speech technologies
Success factors
Demand-driven approach focusing on user needs
Development of base technologies and their integration in end-user products (such as integration of morphology in the dictionary)
Iterative approach: efficient delivery of first results and constant improvements
Challenges
Further development in advanced areas (speech, machine translation, IR, etc.) is very complicated and resource-consuming
The technology gap separating Baltic languages from large languages should be narrowed
Baltic and EU cooperation is critical
Researchers and industry developers should target EU programs with joint projects
THANK YOU!
www.tilde.lv
www.tilde.lt
www.tilde.ee