Published by Scot Dalton
1
NLP PLATFORM FOR EU-LINGUAL DIGITAL SINGLE MARKET
Alexander Rylov
LTi Summit 2013
© Copyright 2013 ABBYY
Confidential
2
Market fragmentation
●By domains
●By languages
3
WHY SHOULD LT VENDORS SHARE THEIR RESOURCES?
●Many LT vendors have their own language technologies (LT)
●Each LT is focused on a particular domain and language(s)
●Resources are critical for enabling such technologies
●By sharing resources, vendors may lose their competitive advantage
4
Technology abilities and restrictions
●Language-specific = language-centric = limited by language
●Difficulties:
  ●Controlled links
  ●Anaphora
  ●Long-distance links
  ●Ellipsis
●Ontologies, dictionaries, statistics = trained on a limited set of data = cover only a limited variety of meaning representations = sometimes even 40% recall counts as a good result (NER, US DoD track)
5
WHAT IS BIG DATA?
●Multilingual
●Covers more than one domain
●85–90% of it is in unstructured text documents
●The same meaning can be expressed in language in countless ways
6
A FUNDAMENTAL NATURAL LANGUAGE TECHNOLOGY, SCALABLE BY DOMAINS AND LANGUAGES, IS REQUIRED
7
ABBYY Compreno as a proposal
●Interlingua approach:
  ●the semantic model is based on a universal, language-independent representation of both lexis and grammar
●Working languages:
  ●Russian, English: at the stage of terminological and collocation expansion
  ●German: full prototype (lexis, syntax) completed; at the stage of main lexis expansion (from core to periphery)
  ●French: full prototype completed (tested on a controlled MT task)
  ●Chinese: lexical system prototype completed (a challenging task never carried out before)
●It has been demonstrated that Compreno is a scalable technology that can be used for any language
Diagram: Universal Semantic Hierarchy; Statistics and machine learning; Syntactic and semantic analysis
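A minimal sketch of the interlingua idea, assuming a toy tree of language-independent concepts in which each node lists the words that express it in each language. The class `Concept`, the function `concepts_for`, and all concept names are hypothetical illustrations, not Compreno's actual Universal Semantic Hierarchy or its API.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Concept:
    """A language-independent node in a toy semantic hierarchy (hypothetical)."""

    name: str
    parent: Optional[Concept] = None
    # Language code -> surface words that express this concept in that language.
    lexemes: dict[str, list[str]] = field(default_factory=dict)

    def is_a(self, other: Concept) -> bool:
        """Return True if this concept is `other` or one of its descendants."""
        node: Optional[Concept] = self
        while node is not None:
            if node is other:
                return True
            node = node.parent
        return False


# Hypothetical hierarchy fragment; names are illustrative, not Compreno's.
entity = Concept("ENTITY")
organization = Concept("ORGANIZATION", parent=entity)
financial_institution = Concept(
    "FINANCIAL_INSTITUTION",
    parent=organization,
    lexemes={"en": ["bank"], "de": ["Bank"], "fr": ["banque"], "ru": ["банк"]},
)
CONCEPTS = [entity, organization, financial_institution]


def concepts_for(word: str, lang: str) -> list[Concept]:
    """Look up every concept that `word` can express in language `lang`."""
    return [c for c in CONCEPTS if word in c.lexemes.get(lang, [])]


# Words from different languages land on the same node, so anything stated
# about ORGANIZATION (or its ancestors) applies to all of them at once.
for lang, word in [("en", "bank"), ("de", "Bank"), ("fr", "banque"), ("ru", "банк")]:
    (concept,) = concepts_for(word, lang)
    print(f"{lang}: {word} -> {concept.name}, organization: {concept.is_a(organization)}")
```

The design choice being illustrated: knowledge is attached to concepts, and each language only contributes a mapping from its words to those concepts.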
8
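Complete syntactic and semantic analysis
Example: "The bank was located at the bank of the river; it was closed."
Complete analysis helps resolve linguistic problems in the text, such as lexical ambiguity ("bank" as a financial institution vs. a riverbank) and anaphora (what "it" refers to).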
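A toy sketch of the two problems in the example sentence: choosing a reading for each "bank" and finding the antecedent of "it". It assumes a hand-written sense inventory, a single context heuristic, and a selectional restriction (only organizations can be "closed"). `SENSES`, `CAN_BE_CLOSED`, `pick_sense`, and `resolve_pronoun` are hypothetical names; this is not how Compreno performs its analysis.

```python
from __future__ import annotations

# Toy sense inventory: each reading of "bank" is a separate concept with a class.
SENSES = {
    "bank": [
        {"concept": "FINANCIAL_INSTITUTION", "class": "organization"},
        {"concept": "RIVERBANK", "class": "landform"},
    ],
}

# Toy selectional restriction: classes that can be the subject of "was closed".
CAN_BE_CLOSED = {"organization"}


def pick_sense(tokens: list[str], i: int) -> dict:
    """Choose a reading for tokens[i] from its right-hand context (toy heuristic)."""
    senses = SENSES[tokens[i]]
    if tokens[i + 1 : i + 4] == ["of", "the", "river"]:
        wanted = "landform"
    else:
        wanted = "organization"
    return next(s for s in senses if s["class"] == wanted)


def resolve_pronoun(mentions, pronoun_index: int, allowed_classes: set[str]):
    """Return the closest earlier mention whose class fits the predicate, or None."""
    for position, sense in reversed(mentions):
        if position < pronoun_index and sense["class"] in allowed_classes:
            return position, sense
    return None


tokens = "the bank was located at the bank of the river ; it was closed .".split()
mentions = [(i, pick_sense(tokens, i)) for i, tok in enumerate(tokens) if tok in SENSES]
for i, sense in mentions:
    print(f"token {i} 'bank' -> {sense['concept']}")

antecedent = resolve_pronoun(mentions, tokens.index("it"), CAN_BE_CLOSED)
print("'it' refers to token", antecedent[0], "->", antecedent[1]["concept"])
```

The nearest "bank" (the riverbank) is skipped because a landform cannot be closed, so "it" attaches to the first "bank".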
9
Compreno current achievements

Russian syntax analysis (2011)
System      Precision   Recall   F
Compreno    0.95        0.98     0.97
System 2    0.93        0.98     0.96
System 3    0.90        0.98     0.94
System 4    0.89        0.95     0.92
System 5    0.86        0.98     0.92
System 6    0.86 (single value shown)
System 7    0.79        0.98     0.87

Fact extraction (2013)
Precision (values as shown on the slide): 0.95, 0.96, 0.98, 0.92
             Compreno   System 1   Compreno   System 2   Compreno   System 3
Recall       0.93       0.70       0.84       0.44       0.92       0.74
F-measure    0.94       0.81       0.90       0.61       0.92       0.82
ABBYY advantage: 14% (vs. System 1), 32% (vs. System 2), 10% (vs. System 3)
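The F values for Russian syntax analysis can be checked against precision and recall, assuming F is the balanced F-measure (F1), the harmonic mean F = 2PR / (P + R); the slide does not say which F variant was used. This sketch recomputes F for each row with all three values and checks agreement within rounding. `f_measure`, `ROWS`, and `TOLERANCE` are illustrative names.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (system, precision, recall, reported F) from the 2011 Russian syntax table.
# System 6 is omitted because only one of its values appears on the slide.
ROWS = [
    ("Compreno", 0.95, 0.98, 0.97),
    ("System 2", 0.93, 0.98, 0.96),
    ("System 3", 0.90, 0.98, 0.94),
    ("System 4", 0.89, 0.95, 0.92),
    ("System 5", 0.86, 0.98, 0.92),
    ("System 7", 0.79, 0.98, 0.87),
]

# Inputs and outputs are rounded to two decimals, so allow about 0.01 of slack.
TOLERANCE = 0.01

for name, p, r, reported in ROWS:
    computed = f_measure(p, r)
    consistent = abs(computed - reported) <= TOLERANCE
    print(f"{name:9} computed F={computed:.3f} reported F={reported:.2f} consistent={consistent}")
```

With these inputs every row agrees with the reported F within 0.01, which fits the reading of the columns as precision, recall, and F1.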
10
Applications
●Big Data analytics: fact analysis, object extraction
●Intelligence, eDiscovery (of any kind)
●Search by meaning rather than by concepts
●Natural-language dialogue systems
●Translation
11
A few facts about Compreno
●18 years of development
●About 350 people involved
●More than 2,000 man-years
12
Barriers to wide implementation
●At least 3 years per language
●At least 30 linguists per language
●At least €12M per language
●Followed by ongoing support and improvement
13
EU project idea
●Describe ALL EU languages
●Describe major domains: healthcare, law, government, major industries
●ABBYY commitment:
  ●methodology, management, tools
14
EU BENEFITS: CREATE A SINGLE DIGITAL LT MARKET
●Operate not on individual languages but on a universal model of language (the interlingual approach)
●Describe a domain once, in one language, and apply it in all other languages
●A platform for LT vendors to create solutions and products that scale easily across languages and domains
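A minimal sketch of "describe one domain in one language, apply in all other languages", assuming toy per-language lexicons that map words to shared concept IDs. The healthcare rule `MEDICATION_FACT` is written once over concepts and reused for English, German, and French. `LEXICON`, `to_concepts`, and `has_fact` are hypothetical; a real system would check semantic roles (who takes what) rather than mere co-occurrence of concepts.

```python
from __future__ import annotations

import re

# Toy lexicons mapping lowercased surface words to concept IDs (hypothetical).
LEXICON = {
    "en": {"patient": "PATIENT", "takes": "TAKE_MEDICATION", "aspirin": "DRUG"},
    "de": {"patient": "PATIENT", "nimmt": "TAKE_MEDICATION", "aspirin": "DRUG"},
    "fr": {"patient": "PATIENT", "prend": "TAKE_MEDICATION", "aspirine": "DRUG"},
}

# A healthcare-domain rule written once, in terms of concepts only.
MEDICATION_FACT = {"PATIENT", "TAKE_MEDICATION", "DRUG"}


def to_concepts(sentence: str, lang: str) -> set[str]:
    """Tokenize naively and keep the concepts for the words the lexicon knows."""
    words = re.findall(r"\w+", sentence.lower())
    return {LEXICON[lang][w] for w in words if w in LEXICON[lang]}


def has_fact(sentence: str, lang: str, fact: set[str]) -> bool:
    """True if every concept the rule needs is expressed in the sentence."""
    return fact <= to_concepts(sentence, lang)


examples = [
    ("en", "The patient takes aspirin."),
    ("de", "Der Patient nimmt Aspirin."),
    ("fr", "Le patient prend de l'aspirine."),
]
for lang, sentence in examples:
    print(lang, has_fact(sentence, lang, MEDICATION_FACT))
```

Adding another language here means adding one more lexicon entry; the rule itself does not change.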