Shallow Language Generation TG/2 and TEMSIS-Gen Stephan Busemann DFKI GmbH Stuhlsatzenhausweg 3 D-66123 Saarbrücken

Slides:



Advertisements
Similar presentations
Natural Language Generation
Advertisements

Inside an XSLT Processor Michael Kay, ICL 19 May 2000.
BAH DAML Tools XML To DAML Query Relevance Assessor DAML XSLT Adapter.
Profiles Construction Eclipse ECESIS Project Construction of Complex UML Profiles UPM ETSI Telecomunicación Ciudad Universitaria s/n Madrid 28040,
Provenance-Aware Storage Systems Margo Seltzer April 29, 2005.
ISO DSDL ISO – Document Schema Definition Languages (DSDL) Martin Bryan Convenor, JTC1/SC18 WG1.
Query Languages. Information Retrieval Concerned with the: Representation of Storage of Organization of, and Access to Information items.
Proceedings of the Conference on Intelligent Text Processing and Computational Linguistics (CICLing-2007) Learning for Semantic Parsing Advisor: Hsin-His.
Delivery Context Workshop CC/PP and UAProf: Issues, improvements and future directions Mark H. Butler, PhD HP Labs Bristol.
Requirements Engineering n Elicit requirements from customer  Information and control needs, product function and behavior, overall product performance,
Chapter 20: Natural Language Generation Presented by: Anastasia Gorbunova LING538: Computational Linguistics, Fall 2006 Speech and Language Processing.
CPSC Compiler Tutorial 9 Review of Compiler.
An Introduction to NLG What is Natural Language Generation? Some Example Systems Types of NLG Applications When are NLG Techniques Appropriate? NLG System.
JSP: JavaServer Pages Juan Cruz Kevin Hessels Ian Moon.
Constraint Logic Programming Ryan Kinworthy. Overview Introduction Logic Programming LP as a constraint programming language Constraint Logic Programming.
NaLIX: A Generic Natural Language Search Environment for XML Data Presented by: Erik Mathisen 02/12/2008.
1 COS 425: Database and Information Management Systems XML and information exchange.
1 Information Retrieval and Extraction 資訊檢索與擷取 Chia-Hui Chang, Assistant Professor Dept. of Computer Science & Information Engineering National Central.
Information Retrieval and Extraction 資訊檢索與擷取 Chia-Hui Chang National Central University
ASP.NET Programming with C# and SQL Server First Edition
Supplement 02CASE Tools1 Supplement 02 - Case Tools And Franchise Colleges By MANSHA NAWAZ.
Introduction to Unix (CA263) Introduction to Shell Script Programming By Tariq Ibn Aziz.
Lecture Nine Database Planning, Design, and Administration
September 15, 2003Houssam Haitof1 XSL Transformation Houssam Haitof.
Using Use Case Scenarios and Operational Variables for Generating Test Objectives Javier J. Gutiérrez María José Escalona Manuel Mejías Arturo H. Torres.
1212 Management and Communication of Distributed Conceptual Design Knowledge in the Building and Construction Industry Dr.ir. Jos van Leeuwen Eindhoven.
Configuration Management
Artificial Intelligence Research Centre Program Systems Institute Russian Academy of Science Pereslavl-Zalessky Russia.
CSC 8310 Programming Languages Meeting 2 September 2/3, 2014.
Process-oriented System Automation Executable Process Modeling & Process Automation.
Knowledge Mediation in the WWW based on Labelled DAGs with Attached Constraints Jutta Eusterbrock WebTechnology GmbH.
ArcGIS Workflow Manager An Introduction
DHTML. What is DHTML?  DHTML is the combination of several built-in browser features in fourth generation browsers that enable a web page to be more.
JSP Standard Tag Library
Chapter 9 Database Planning, Design, and Administration Sungchul Hong.
Database System Development Lifecycle © Pearson Education Limited 1995, 2005.
A Time Based Approach to Musical Pattern Discovery in Polyphonic Music Tamar Berman Graduate School of Library and Information Science University of Illinois.
Introduction to Shell Script Programming
Requirements Analysis
Role-plays for CALL: System Architecture and Resources Sabrina Wilske & Magdalena Wolska Saarland University ICL, Villach, September.
ITEC224 Database Programming
An Introduction to Software Architecture
Chapter 7 Web Content Mining Xxxxxx. Introduction Web-content mining techniques are used to discover useful information from content on the web – textual.
Graph Data Management Lab, School of Computer Science gdm.fudan.edu.cn XMLSnippet: A Coding Assistant for XML Configuration Snippet.
Concepts and Terminology Introduction to Database.
Mapping between SOS standard specifications and INSPIRE legislation. Relationship between SOS and D2.9 Matthes Rieke, Dr. Albert Remke (m.rieke,
ISO Environmental management — Life cycle assessment — Data documentation format.
Querying Structured Text in an XML Database By Xuemei Luo.
RELATIONAL FAULT TOLERANT INTERFACE TO HETEROGENEOUS DISTRIBUTED DATABASES Prof. Osama Abulnaja Afraa Khalifah
Chapter 13 Architectural Design
CSE 219 Computer Science III Program Design Principles.
Current and Future Applications of the Generic Statistical Business Process Model at Statistics Canada Laurie Reedman and Claude Julien May 5, 2010.
An Intelligent Analyzer and Understander of English Yorick Wilks 1975, ACM.
The european ITM Task Force data structure F. Imbeaux.
1 5 Nov 2002 Risto Pohjonen, Juha-Pekka Tolvanen MetaCase Consulting AUTOMATED PRODUCTION OF FAMILY MEMBERS: LESSONS LEARNED.
JSTL The JavaServer Pages Standard Tag Library (JSTL) is a collection of useful JSP tags which encapsulates core functionality common to many JSP applications.
Programming Languages and Design Lecture 3 Semantic Specifications of Programming Languages Instructor: Li Ma Department of Computer Science Texas Southern.
Design and Implementation of a Rationale-Based Analysis Tool (RAT) Diploma thesis from Timo Wolf Design and Realization of a Tool for Linking Source Code.
Translingual Information Management Stephan Busemann Language Technology Lab German Research Center for Artificial Intelligence.
Analysis of station classification and network design INERIS (Laure Malherbe, Anthony Ung), NILU (Philipp Schneider), RIVM (Frank de Leeuw, Benno Jimmink)
Document Databases for Information Management Gregor Erbach FTW, Wien DFKI, Saarbrucken ETL, Tsukuba
JAVA BEANS JSP - Standard Tag Library (JSTL) JAVA Enterprise Edition.
Operational Semantics Mooly Sagiv Reference: Semantics with Applications Chapter 2 H. Nielson and F. Nielson
1 XSL Transformations (XSLT). 2 XSLT XSLT is a language for transforming XML documents into XHTML documents or to other XML documents. XSLT uses XPath.
Chapter 9 Database Planning, Design, and Administration Transparencies © Pearson Education Limited 1995, 2005.
MDD-Kurs / MDA Cortex Brainware Consulting & Training GmbH Copyright © 2007 Cortex Brainware GmbH Bild 1Ver.: 1.0 How does intelligent functionality implemented.
1 Software Requirements Descriptions and specifications of a system.
Distribution and components
WELCOME TO COSRI IBADAN
An Introduction to Software Architecture
Presentation transcript:

Shallow Language Generation TG/2 and TEMSIS-Gen Stephan Busemann DFKI GmbH Stuhlsatzenhausweg 3 D Saarbrücken

Source: Stephan Busemann Language Technology I, WS 2005/2006 DFKI‘s TG/2 System Offers a Flexible Framework for NLG TG/2 is a transparent production system TG/2 interprets a separately defined set of condition-action rules TG/2 maps pieces of input onto surface strings TG/2 keeps grammars largely independent from input representations Test Predicates on properties of the input Access Paths yielding a part of the Input Input Grammar Rules (COOP threshold-passing) DECL -> PPTIME THTYPE EXCEEDS

Source: Stephan Busemann Language Technology I, WS 2005/2006 The TGL Syntax Uses Test and Access Path Descriptions Category: Context-free skeleton of derivations Test: Conjunction of Boolean predicates on the the input structure Template: Sequence of actions, each operating on a substructure of the current input (accessor functions) –Rule (:RULE), failure causes the rule to fail as well –Optional rule (:OPTRULE), ignored unless input available –Access path descriptions, determining input substructures –Functions (:FUN) into strings, failure causes rule to fail as well –ASCII Strings, directly to output Constraints: feature equations expressing agreement relations Side effects, e.g. updating a discourse memory

Source: Stephan Busemann Language Technology I, WS 2005/2006 TG/2 Grammars Integrate Canned Texts, Templates and Context-free Rules My category is DECL. IF the slot COOP is 'threshold-passing AND the slot LAW-NAME is specified THEN apply PPtime from slot TIME apply THTYPE from CURRENT-INPUT utter "(" apply LAW from slot LAW-NAME utter ") " apply EXCEEDS from slot EXCEEDS utter "." WHERE THTYPE AND EXCEEDS agree in GENDER My category is THTYPE. IF there is no slot THRESHOLD-TYPE specified THEN utter "la valeur limite autoris&e2e " WHERE THTYPE has value 'fem for GENDER En été 1999 la valeur limite autorisée ( selon le decret... ) a été dépassée une fois. [ Busemann 1996 ]

Source: Stephan Busemann Language Technology I, WS 2005/2006 The TGL Rules are Written as Condition- Action Pairs (defproduction wertueberschreitung "WU01" (:PRECOND (:CAT DECL :TEST ((pred-eq 'wertueberschreitung) (not (threshold-value-p)))) :ACTIONS (:TEMPLATE (:OPTRULE PPtime (get-param 'time)) (:RULE THTYPE (self)) (:OPTRULE POLL (get-param 'pollutant)) "(" (:RULE LAW (get-param 'law-name)) ") " (:RULE EXCEEDS (get-param 'exceeds)) "." :CONSTRAINTS (:GENDER (THTYPE EXCEEDS) :EQ)))) (defproduction thtype "THT3" (:PRECOND (:CAT THTYPE :TEST ((not (threshold-type-p)))) :ACTIONS (:TEMPLATE "la valeur limite autoris&e2e " :CONSTRAINTS (:GENDER THTYPE :VAL 'fem))))

Source: Stephan Busemann Language Technology I, WS 2005/2006 TGL Constraints are Percolated Across the Derivation Tree Constraints are PATR-II style, percolation through unification (  ) Every local tree is licensed by a grammar rule A feature can be assigned a value ( := ) Two features can be constrained to have identical values ( = ) (X1.GENDER = X2.GENDER) (X0.GENDER = X2.Gender) inflect(dépassé) X0 X1 X2  X1  X0 “la valeur limite “ (X0.GENDER := fem)

Source: Stephan Busemann Language Technology I, WS 2005/2006 The Interpreter is Based on the Context- Free Backbone of TGL-Grammars Matching –Identify all rules with the current category –For each of them perform its tests on the input structure (“IF” part) –Add those passing the tests to the conflict set Conflict resolution –Select an element of the conflict set (possibly by some preference mechanism) Firing –Evaluate the rule‘s constraints (if available, “WHERE” part) –For each element of the “THEN” part, read the new category and determine the new input structure by evaluating the associated access path descriptions THREE-STEP EVALUATION CYCLE

Source: Stephan Busemann Language Technology I, WS 2005/2006 eGram Supports Grammar Development

Source: Stephan Busemann Language Technology I, WS 2005/2006 TG/2 has been Licensed to More than 30 Sites Worldwide Development versions –Allegro Common Lisp under Linux –Server version, demo –Java version under development, joint work by DFKI and XtraMind GmbH Some systems developed with TG/2 and derivatives –Appointment negotiation dialogues (COSMA; D; DFKI) –Air quality reports (TEMSIS; D, F, E, P, CHN, JPN; DFKI and City of Saarbrücken) –Tourism information (MIETTA; D, E, I, SF; DFKI, CELI and U Helsinki) –Activity recommendation in conference scenarios (COMRIS; E; VU Brussels) –Cross-lingual Summarization (MUSI; DS; DFKI, Lexiquest and ILC) –Real estate descriptions (industrial; D, E, PL) –Software system design descriptions (feasibility study; non-NL; DFKI)

Source: Stephan Busemann Language Technology I, WS 2005/2006 TEMSIS is Developing a Transnational Environmental Management Support and Information System Project goals –distributed information and communi- cation platform for transnational co- operation –comfortable bilingual access to regional environmental data –information kiosks accessible via WWW Cooperation between administrations –French / German urban agglomeration Moselle-Est / Stadtverband Saarbrücken –Environmental information standardised and shared TAP Sector C9 (Environment) No Duration 01/96-06/98 Effort 172,5 PM

Source: Stephan Busemann Language Technology I, WS 2005/2006 Air Quality Reports Are Generated From Environmental Data at the TEMSIS-Server Parameter selection by the user –Language (German, French) –Pollutant and measurement station –Relevant period of time Stage 1: text schema construction –Querying the database –Composition of report structure –Elision of contextual redundancies Stage 2: verbalization by TG/2 –Selection of sentence patterns –Wording, phrasing, grammar HTML postprocessing Stage 1 Stage 2 Text Generation Server [Busemann/Horacek 1998, Horacek/Busemann 1998] Parameter Specs HTML/Java Code TEMSIS Database

Source: Stephan Busemann Language Technology I, WS 2005/2006 The System Covers 384 Report Structures Parameters selected from the TEMSIS navigator menus: –French language text about a German situation –Ozone data, exceeding thresholds according to decree –Measurements at Völklingen-City in summer 1997 (confirmation) Vous avez choisi la station de mesure de Völklingen-City afin de consulter la pollution atmosphérique relevée en été A la station de mesure de Völklingen-City, la valeur d'information pour l'ozone pour une exposition de 60 minutes (180 µg/m³ selon le decret allemand (Bundesimmissionsschutzverordnung)) a été dépassée une fois. La valeur d'interdiction du trafic (240 µg/m³) a aussi été dépassée une fois. En été 1996 la valeur d'information (180 µg/m³) n'a pas été dépassée. EXAMPLE

Source: Stephan Busemann Language Technology I, WS 2005/2006 Text Organization Is Schema-based Confirm important selected parameters Number the values exceeding the lowest threshold Number the values exceeding the next threshold Compare with values of preceding year Repeat the core statement („summary“) SAMPLE SCHEMA FOR SUMMER OBSERVATION, THRESHOLD PASSING A schema is instantiated on the basis of the input parameters and the retrieved data

Source: Stephan Busemann Language Technology I, WS 2005/2006 Instantiating a Schema Leads to a Report Structure Achieves text coherence by –removing redundant information –inserting particles („also“) –simple techniques of aggregating information Yields canned texts or intermediate content representations Intermediate representations are independent of particular languages TEXT ORGANISATION Shallow generation can do without explicit knowledge representation and text planning

Source: Stephan Busemann Language Technology I, WS 2005/2006 Text Coherence is Achieved by Simple Means Removing redundancy –If two subsequent slots in a schema contain identical pieces of information, which are not focussed, then delete the second instance Particle insertion –If two subsequent slots in a schema contain identical pieces of information, which are focussed, then mark the second instance as „repeating“ Aggregation –If a time series contains a sequence of identical values, describe a time interval (from... to...) and mention the value only once The means depend on the phenomena encountered in the corpus

Source: Stephan Busemann Language Technology I, WS 2005/2006 Intermediate Representations Specify the Contents Only Partly [(COOP threshold-passing) (TIME [(PRED season) (NAME [(SEASON summer) (YEAR 1997)])]) (POLLUTANT o3) (SITE "Völklingen-City") (DURATION [(MINUTE 60)]) (SOURCE [(LAW-NAME bimsch) (THRESHOLD-TYPE info-value)]) (EXCEEDS [(STATUS yes) (TIMES 1)])] En été 1997, à la station de mesure de Völklingen-City, la valeur d'information pour l'ozone pour une exposition de 60 minutes (180 µg/m³ selon le decret allemand (Bundesimmissionsschutzverordnung)) a été dépassée une fois. Stage 1: Text Organization Stage 2: Verbalization

Source: Stephan Busemann Language Technology I, WS 2005/2006 Shallow Generation is Well Suited for Small Applications What are „small“ applications? –Limited linguistic requirements –Simple text sort, e.g. technical documents –Useful in early stages The TEMSIS generation task represents a class of applications –Text organization by few variable schemata –Sufficient quality of text by simple techniques to establish coherence –Realization of intermediate representation without accessing the context –Ca 100/120 TGL rules, ca 400 linguistically different reports With larger applications, the drawbacks of shallow approaches become obvious

Source: Stephan Busemann Language Technology I, WS 2005/2006 Shallow Generation Has Pros and Cons Possible advantagesPossible drawbacks Low development effort Reusable interpreter and subgrammars Very fast processing Easy introduction of additional languages Easy extension with alternative formulations (through a preference mechanism in TG/2) Knowledge representation depends on application Implicit dependencies Scalability is inherently lower than with in-depth generators Maintaining transparency of grammars can become a cost factor ASSESSMENT

Source: Stephan Busemann Language Technology I, WS 2005/2006 Literature Stephan Busemann, ``Ten Years After - An Update on TG/2 (and friends)", in: Proc. 10th European Workshop on Natural Language Generation, Aberdeen, UK, 2005 Stephan Busemann, ``eGram - a Grammar Development Environment and Its Usage for Language Generation'', in: Proc. Fourth International Conference on Language Resources and Evaluation (LREC), Lisbon, Portugal, 2004 Stephan Busemann, ``Language Generation for Cross-Lingual Document Summarisation'', in Sheng, Huanye (ed.), International Workshop on Innovative Language Technology and Chinese Information Processing (ILT\&CIP-2001), Science Press, Chinese Academy of Sciences, Beijing, China, May Stephan Busemann and Helmut Horacek, ``A Flexible Shallow Approach to Text Generation'', in: Eduard Hovy (ed.): Proceedings of the Nineth International Natural Language Generation Workshop (INLG '98), Niagara-on-the-Lake, Canada, August 1998, Also at the Computation and Language Archive.Computation and Language Archive Stephan Busemann, ``Best-First Surface Realization'', in: Donia Scott (ed.): Proceedings of the Eighth International Natural Language Generation Workshop (INLG '96), Herstmonceux, Sussex, 1996, Also at the Computation and Language Archive.Computation and Language Archive All papers on: