Exploring XML-based Technologies and Procedures for Quality Evaluation from a Real-life Case Perspective

Folkert de Vriend (1) & Giulio Maltese (2)
(1) Speech Processing Expertise Centre (SPEX), Radboud University Nijmegen, The Netherlands
(2) IBM Rome Solutions Lab and Voice Technology Development, Rome, Italy

Future work
XML-based technologies and procedures are a promising, and in principle preferable, alternative to validating XML with text stream processing procedures that are not truly XML-aware.
Drawbacks:
- Some of the technologies discussed are still in development or have an unclear status. XSLT 1.0, for instance, does not provide the “random” function needed for sample-based validation, while implementations that fully support XSLT 2.0 (which does have a “random” function) are still scarce.
- Because validation of SLRs involves a large variety of checks, one will probably have to “tie together” different XML-based technologies oneself. The ISO DSDL project should improve on this by offering a framework for the diversity of schema and transformation languages used in validation. However, DSDL is also still very much a “work in progress”.
If they prove to be more efficient, XML-based technologies and procedures will be used in future projects that SPEX is involved in.

LC-STAR
For applications to be integrated into speech-driven interfaces embedded in mobile appliances and network servers, LC-STAR develops:
- Bilingual corpora for Speech-to-Speech Translation applications.
- Lexica for automatic speech recognition and text-to-speech synthesis.

Example fragment of LC-STAR lexicon
    <NOM class="PER" gender="invariant" number="invariant"/>
    Abatantuono
    a - b a - t a n - " t u O - n o

Current validation in LC-STAR
- The only XML-based technology currently used is the Document Type Definition (DTD). But the DTD has a weak datatyping system: it gives hardly any control over element or attribute content.
- For most of the validation, text stream processing procedures (Perl scripts) were used. They check:
  - Orthography
  - Part of Speech (POS)
  - Lemma
  - Phonetic transcription
- Special software was written (in Perl) to select samples for manual validation of certain aspects of orthography, phonetic transcription and POS.

Alternatives for technologies and procedures currently used in LC-STAR

XML Schema
- Far more control over element and attribute content than a DTD, because of its datatyping functionality.
- Datatypes can also be specified with regular expressions.
- A basic Schema can be generated automatically from the DTD. The generated rules can then be made more stringent for validation purposes.

XSL Transformations (XSLT)
- For taking samples out of the XML-encoded data and directly marking them up with HTML for online manual validation.
- Regular expressions are supported by XSLT 2.0.

New possibilities for validation

Easy character set validation
- Part 7 of the “Document Schema Definition Languages” (DSDL) framework aims at standardising a schema language specifically for checking that element and attribute content belongs to a specific subset of Unicode (Cyrillic or ISO, for instance).
- “Character Repertoire Validation for XML” (CRVX) is a proposal for part 7. CRVX uses XSLT 2.0 for its implementation.

Remote validation
- Supplementing production/validation cycles with an on-line XML Schema that producers can use for continuous self-monitoring, but that is still maintained by the validation centre.

Introduction
The project “Lexica and Corpora for Speech-to-Speech Translation Components” (LC-STAR) is an example of a project that uses the Extensible Markup Language (XML) to encode its Spoken Language Resources (SLRs). The Speech Processing Expertise Centre (SPEX) is responsible for part of the quality evaluation (hereafter “validation”) in LC-STAR.
Text stream processing procedures that are not truly XML-aware are not ideal for efficient validation of XML-encoded resources. SPEX therefore explores XML-based validation technologies and procedures, using the XML-encoded phonetic lexica developed in the LC-STAR project as a test bed.

[Figure: off-line and on-line production/validation cycles with Schema monitoring]
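The checks listed under “Alternatives” and “New possibilities for validation” can be illustrated with a small XML-aware validation pass. This is a minimal sketch in Python's standard library, not XML Schema or CRVX themselves, so it only mimics what those languages would declare. Apart from the NOM element and its class, gender and number attributes, everything here is an assumption for illustration. That includes the element names LEXICON, ENTRY, ORTH and PHON, the id attribute, the allowed attribute values, the transcription syntax, the character repertoire and the function name validate. The second entry carries three deliberate errors, one of which is a Cyrillic “о” (U+043E) that looks identical to a Latin “o”.

```python
import re
import string
import xml.etree.ElementTree as ET

# Hypothetical lexicon layout: only NOM and its attributes come from the
# LC-STAR fragment; LEXICON, ENTRY, ORTH, PHON and @id are assumed names.
SAMPLE = """<LEXICON>
  <ENTRY id="1">
    <ORTH>Abatantuono</ORTH>
    <NOM class="PER" gender="invariant" number="invariant"/>
    <PHON>a - b a - t a n - " t u O - n o</PHON>
  </ENTRY>
  <ENTRY id="2">
    <ORTH>Abatantu\u043eno</ORTH>
    <NOM class="PER" gender="unknown" number="invariant"/>
    <PHON>a - b a - t a n - " t u O - n o 5</PHON>
  </ENTRY>
</LEXICON>"""

# Assumed value sets (hypothetical); a DTD could also enumerate these.
ALLOWED_ATTRS = {
    "class": {"PER", "LOC", "ORG"},
    "gender": {"masculine", "feminine", "invariant"},
    "number": {"singular", "plural", "invariant"},
}

# Assumed transcription syntax: space-separated symbols of ASCII letters,
# "-" for a syllable boundary and '"' for primary stress.
PHON_PATTERN = re.compile(r'(?:[A-Za-z]+|-|")(?: (?:[A-Za-z]+|-|"))*')

# Assumed character repertoire for Italian orthography (CRVX-like check).
ORTH_REPERTOIRE = set(string.ascii_letters + "àèéìíòóùÀÈÉÌÒÙ' -")


def validate(root):
    """Return a list of (entry id, problem) pairs for the whole lexicon."""
    problems = []
    for entry in root.iter("ENTRY"):
        eid = entry.get("id")
        orth = entry.findtext("ORTH", default="")
        phon = entry.findtext("PHON", default="")
        nom = entry.find("NOM")
        # Character repertoire check on the orthography.
        bad_chars = sorted({c for c in orth if c not in ORTH_REPERTOIRE})
        if bad_chars:
            codes = ", ".join(f"U+{ord(c):04X}" for c in bad_chars)
            problems.append((eid, "ORTH outside repertoire: " + codes))
        # Datatype-style checks on attribute values.
        if nom is not None:
            for attr, allowed in ALLOWED_ATTRS.items():
                value = nom.get(attr)
                if value not in allowed:
                    problems.append((eid, f"NOM/@{attr} = {value!r} not allowed"))
        # Pattern check on element content, which a DTD cannot express.
        if not PHON_PATTERN.fullmatch(phon):
            problems.append((eid, f"PHON does not match pattern: {phon!r}"))
    return problems


if __name__ == "__main__":
    for eid, problem in validate(ET.fromstring(SAMPLE)):
        print(eid, problem)
```

Run on the sample, it reports three problems, all for entry 2. The attribute check could just as well be written as an enumerated attribute type in a DTD. The pattern check on PHON and the repertoire check on ORTH constrain element content, which a DTD cannot do. They correspond roughly to an XML Schema pattern facet and a CRVX-style repertoire rule.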
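Sample-based manual validation, mentioned under “Current validation”, “XSLT” and “Future work”, needs a random but reproducible selection of entries that can be marked up for online checking. This is a sketch in Python standing in for an XSLT 2.0 stylesheet, not the procedure SPEX or LC-STAR actually uses. It draws the sample with random.Random and a fixed seed, then renders the sampled entries as an HTML table. The lexicon layout, the made-up entries and the function names sample_entries and to_html are hypothetical.

```python
import html
import random
import xml.etree.ElementTree as ET

# Hypothetical lexicon layout and made-up entries, for illustration only.
LEXICON = """<LEXICON>
  <ENTRY id="1"><ORTH>Abatantuono</ORTH><PHON>a - b a - t a n - " t u O - n o</PHON></ENTRY>
  <ENTRY id="2"><ORTH>Roma</ORTH><PHON>" r o - m a</PHON></ENTRY>
  <ENTRY id="3"><ORTH>Milano</ORTH><PHON>m i - " l a - n o</PHON></ENTRY>
  <ENTRY id="4"><ORTH>Napoli</ORTH><PHON>" n a - p o - l i</PHON></ENTRY>
  <ENTRY id="5"><ORTH>Torino</ORTH><PHON>t o - " r i - n o</PHON></ENTRY>
</LEXICON>"""


def sample_entries(root, size, seed):
    """Draw a reproducible random sample of ENTRY elements.

    A fixed seed lets anyone regenerate exactly the same sample.
    """
    entries = list(root.iter("ENTRY"))
    rng = random.Random(seed)
    return rng.sample(entries, min(size, len(entries)))


def to_html(entries):
    """Render sampled entries as an HTML table for online manual checking."""
    header = ("<tr><th>id</th><th>Orthography</th>"
              "<th>Transcription</th><th>Verdict</th></tr>")
    rows = []
    for entry in entries:
        cells = [entry.get("id", ""),
                 entry.findtext("ORTH", ""),
                 entry.findtext("PHON", "")]
        # Escape data so markup-significant characters are shown literally.
        tds = "".join(f"<td>{html.escape(c)}</td>" for c in cells)
        rows.append("<tr>" + tds + "<td><input type=\"checkbox\"> OK</td></tr>")
    return "<table>\n" + header + "\n" + "\n".join(rows) + "\n</table>"


if __name__ == "__main__":
    root = ET.fromstring(LEXICON)
    print(to_html(sample_entries(root, size=3, seed=2004)))
```

With a fixed seed, the producer and the validation centre can regenerate the same sample, and a new seed yields a fresh one. The html.escape call ensures that any markup-significant characters in the data, such as <, > or &, are displayed literally rather than interpreted as HTML.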