Translingual Information Management
Stephan Busemann
Language Technology Lab
German Research Center for Artificial Intelligence

© 2004 DFKI Language Technology Lab

LANGUAGE TECHNOLOGY LAB DATA
Management:
- Lab Director: Prof. Dr. Hans Uszkoreit
- Associate Lab Director: Dr. Stephan Busemann
Projects: BMBF, EU, Saarland and industry
Turnover: more than €2 million per annum

LT LAB STAFF
The lab employs 20 researchers and software engineers from 8 countries. They are supported by 24 research assistants and guest scientists.

COOPERATIONS
We carry out many of our tasks in joint projects with partners from industry, academia and other contract research centers. We collaborate closely with the Department of Computational Linguistics, the Department of Computer Science and other institutes at Saarland University.

LT LAB OVERVIEW
DFKI's Language Technology Lab has 20 researchers and software engineers from 8 countries.
Language resources for German, English, French, Chinese, Japanese, Spanish, Italian, Portuguese, Dutch, Slavic languages, ...
Three-stage approach:
- Develop and maintain reusable base technologies
- Configure complex systems
- Adapt or extend these systems to build applications

BASE TECHNOLOGIES
- Preprocessing (tokenization, POS tagging, morphology)
- Shallow parsing (statistical chunk parsing, FST grammars)
- Several techniques for categorization (machine learning)
- Deep syntactic and semantic analysis (efficient HPSG parsing)
- Shallow and in-depth generation
- Several techniques for text summarization (e.g. query-dependent)
- Text-to-speech for German and English

THREE LINES OF COMPLEX SYSTEMS
- Natural Communication: response management, speech interpretation and production, emotion in synthesis
- Multilingual Authoring Support: terminology and grammar checking for controlled language, tools for annotation with metadata (and other structuring information), linguistic lookup
- Information and Knowledge Management: multilingual retrieval, information extraction, semantic-web infrastructure, automatic hyperlinking, open-domain question answering, report generation

SOME PROJECTS
Projects include TEMSIS and PARADIME.
- Multilingual extraction of travel warning information
- Cross-lingual navigation
- Cross-lingual tourism information
- Multilingual question answering
- Indexing of commented video material (soccer games)
- Multilingual generation of air quality reports

SAMPLE APPLICATIONS
- Deutsche Telekom: question answering on tariff information
- Dresdner Bank: automatic hyperlinking for structuring program listings and documentation
- SAP AG: controlled language checking
- Interprice Technologies: dialogue system for e-commerce product search

INFORMATION EXTRACTION
Requirements:
- Must adapt to both shallow and deeper analysis tasks
- Must be multilingual
- Must process large text collections efficiently
Sample applications:
- Named entity recognition
- Opinion extraction
- Extraction of travel warning information
- Hyperlinking
SProUT: a Java- and C++-based IE framework
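Named entity recognition is the simplest of the sample applications above. The Python sketch below makes it concrete using gazetteer lookup. It is an illustration only, not SProUT's implementation. The gazetteer entries, the labels `ORGANIZATION` and `LOCATION`, and the helpers `tokenize` and `tag_entities` are all hypothetical. The sketch also assumes a naive regular-expression tokenizer and greedy longest-match lookup.

```python
import re

# Hypothetical gazetteer: lower-cased token sequence -> entity label (illustrative only)
GAZETTEER = {
    ("deutsche", "telekom"): "ORGANIZATION",
    ("dresdner", "bank"): "ORGANIZATION",
    ("saarland", "university"): "ORGANIZATION",
    ("saarbrücken",): "LOCATION",
}
# Longest entry, in tokens; bounds the lookup window
MAX_LEN = max(len(key) for key in GAZETTEER)


def tokenize(text):
    """Naive tokenizer: runs of word characters, or single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)


def tag_entities(text):
    """Greedy longest-match gazetteer lookup; returns (surface, label) pairs in order."""
    tokens = tokenize(text)
    lowered = [t.lower() for t in tokens]
    entities = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate span first so multi-word entries win.
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            label = GAZETTEER.get(tuple(lowered[i:i + n]))
            if label is not None:
                entities.append((" ".join(tokens[i:i + n]), label))
                i += n
                break
        else:
            # No entry starts at this token; move on.
            i += 1
    return entities
```

For example, `tag_entities("Deutsche Telekom and Dresdner Bank met at Saarland University in Saarbrücken.")` returns `[("Deutsche Telekom", "ORGANIZATION"), ("Dresdner Bank", "ORGANIZATION"), ("Saarland University", "ORGANIZATION"), ("Saarbrücken", "LOCATION")]`. Trying the longest span first means a multi-word entry is matched as a whole rather than being split by a shorter entry.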

SPROUT SYSTEM
Configurable linguistic components:
- Tokenizers
- Morphological analysis components (e.g. MMorph)
- Feature-enhanced gazetteers
- Finite-state grammars integrating and extending the above resources
Core tools:
- JTFS: type unification and subsumption
- FSM toolkit for processing finite-state transducers
- Grammar development environment
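JTFS is listed above as providing type unification and subsumption. The Python sketch below shows what these two operations mean for typed feature structures. It assumes a small hypothetical type hierarchy (`sign`, `ne`, `ne-person`, `ne-location`) that forms a tree, and it represents feature structures as nested dicts with a `type` key. This is not the JTFS API. It also omits structure sharing (reentrancy) and multiple inheritance, which real typed feature structure systems support.

```python
# Hypothetical tree-shaped type hierarchy: child -> parent ("*top*" is most general).
PARENT = {"sign": "*top*", "ne": "sign", "ne-person": "ne", "ne-location": "ne"}


def ancestors(t):
    """Return t followed by all of its supertypes, most specific first."""
    chain = [t]
    while t in PARENT:
        t = PARENT[t]
        chain.append(t)
    return chain


def subsumes_type(general, specific):
    """True if `general` equals `specific` or is one of its supertypes."""
    return general in ancestors(specific)


def unify_types(a, b):
    """Greatest lower bound in a tree hierarchy: the more specific type, or None."""
    if subsumes_type(a, b):
        return b
    if subsumes_type(b, a):
        return a
    return None


def unify(fs1, fs2):
    """Unify two feature structures; atomic values must be equal. None on failure."""
    if not isinstance(fs1, dict) or not isinstance(fs2, dict):
        return fs1 if fs1 == fs2 else None
    glb = unify_types(fs1.get("type", "*top*"), fs2.get("type", "*top*"))
    if glb is None:
        return None
    result = {"type": glb}
    for feat in (set(fs1) | set(fs2)) - {"type"}:
        if feat in fs1 and feat in fs2:
            value = unify(fs1[feat], fs2[feat])
            if value is None:
                return None  # clash somewhere below this feature
            result[feat] = value
        else:
            # Feature present on one side only: copy it over unchanged.
            result[feat] = fs1[feat] if feat in fs1 else fs2[feat]
    return result


def subsumes(general, specific):
    """True if `general` carries no information that `specific` lacks."""
    if not isinstance(general, dict) or not isinstance(specific, dict):
        return general == specific
    if not subsumes_type(general.get("type", "*top*"), specific.get("type", "*top*")):
        return False
    return all(
        feat in specific and subsumes(value, specific[feat])
        for feat, value in general.items()
        if feat != "type"
    )
```

Consider unifying `{"type": "ne-person", "surname": "Busemann"}` with `{"type": "ne", "given_name": "Stephan"}`. The result is an `ne-person` structure carrying both features, and both inputs subsume it. Structures whose types or atomic values clash fail to unify and yield `None`. Restricting the hierarchy to a tree keeps the greatest-lower-bound computation trivial. A hierarchy with multiple inheritance would need a proper lattice operation.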