Quick Introduction to DFDL

Presentation transcript:

Quick Introduction to DFDL
Roger L. Costello
DFDL = Data Format Description Language
https://daffodil.apache.org/docs/dfdl/
Approved for Public Release; Distribution Unlimited. Public Release Case Number 19-1536

DFDL = universal parser
[Diagram: any binary file or any text file -> DFDL tool -> XML]
© 2019 The MITRE Corporation. All rights reserved.

DFDL is built on top of XML Schema

XML Schema permits “foreign” attributes
A foreign attribute is an attribute on an XML Schema element that is not part of the XML Schema vocabulary. A foreign attribute must be bound to another namespace (not the XML Schema namespace).

Example of foreign attributes
Below is an XML Schema. The <xs:sequence> element has two foreign attributes: separator and separatorPosition. The foreign attributes are bound to the http://www.ogf.org/dfdl/dfdl-1.0/ namespace.

    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
               xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/">
      <xs:element name="input">
        <xs:complexType>
          <!-- dfdl:separator and dfdl:separatorPosition are the 2 foreign attributes -->
          <xs:sequence dfdl:separator=":" dfdl:separatorPosition="infix">
            <xs:element name="label" type="xs:string" />
            <xs:element name="message" type="xs:string" />
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:schema>
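
The definition of a foreign attribute can be checked mechanically. The following is a minimal sketch, not part of the original slides: it uses Python's standard xml.etree.ElementTree to walk the schema above and report every attribute that is bound to a namespace other than the XML Schema namespace. The function name foreign_attributes is made up for this illustration.

```python
import xml.etree.ElementTree as ET

XS_NS = "http://www.w3.org/2001/XMLSchema"

# The schema from the slide, unchanged apart from whitespace.
SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/">
  <xs:element name="input">
    <xs:complexType>
      <xs:sequence dfdl:separator=":" dfdl:separatorPosition="infix">
        <xs:element name="label" type="xs:string" />
        <xs:element name="message" type="xs:string" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""


def foreign_attributes(schema_text):
    """Return (element, namespace, attribute, value) for each foreign attribute.

    Hypothetical helper for illustration only.
    """
    found = []
    for elem in ET.fromstring(schema_text).iter():
        if not elem.tag.startswith("{" + XS_NS + "}"):
            continue  # only XML Schema elements are of interest here
        for name, value in elem.attrib.items():
            # ElementTree spells a namespaced attribute as "{namespace}local".
            # Unqualified attributes such as name= and type= have no "{".
            if name.startswith("{"):
                namespace, local = name[1:].split("}", 1)
                if namespace != XS_NS:
                    element = elem.tag.split("}", 1)[1]
                    found.append((element, namespace, local, value))
    return found


for row in foreign_attributes(SCHEMA):
    print(row)
```

For the schema above this reports the two dfdl: attributes on xs:sequence and nothing else, because name and type are unqualified attributes that belong to the XML Schema vocabulary.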

Traditionally, XSD is used to validate an XML instance
[Diagram: XML Schema + XML instance -> XML Schema validator -> XML instance is valid/invalid]

XSD + DFDL is used to parse data files
[Diagram: data file (JPEG, CSV, Netflow, etc.) + XML Schema with DFDL -> DFDL tool -> XML instance]
The DFDL tool parses the data file and produces an XML document.

XML Schema + DFDL is called a “DFDL Schema”
[Diagram: data file (JPEG, CSV, Netflow, etc.) + DFDL Schema -> DFDL tool -> XML instance]

Filename suffix
DFDL files have the suffix .dfdl.xsd
Example: label-message.dfdl.xsd


Use DFDL to parse and unparse
[Diagram: data file (JPEG, CSV, Netflow, etc.) -> parse (DFDL tool + DFDL Schema) -> XML -> unparse (DFDL tool + DFDL Schema) -> data file]
Unparsing reconstitutes the original (native) data format. The reconstituted data file is the same as the original data file.

Terminology
[Diagram: input file -> parse (DFDL tool + DFDL Schema) -> XML -> unparse (DFDL tool + DFDL Schema) -> reconstituted input file]

Typical workflow
[Diagram: data file (JPEG, CSV, Netflow, etc.) -> parse -> XML -> XSD validation -> XML -> XSLT transform (e.g., fuzz locations) -> XML -> unparse -> data file]
DFDL parses a data format (text or binary) to generate XML. Once the data is in XML, we have access to the vast suite of XML technologies to process it (i.e., we can leverage the enormous marketplace that has built up in the past 20 years to support XML). We can use XML technologies to add, remove, or fuzz data, and so forth. Then, after processing the XML, we can use DFDL to unparse the processed XML and reconstitute the native data format (now the data is sanitized).
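
The workflow above can be sketched end to end for the label:message format used earlier. This is a minimal illustration under stated assumptions, not how a DFDL tool works internally: parse and unparse are hand-written stand-ins for the DFDL tool, the colon separator is hard-coded, and redact_message is a made-up stand-in for the XSLT transform step.

```python
import xml.etree.ElementTree as ET

SEPARATOR = ":"  # assumed physical format: label and message joined by an infix colon


def parse(data):
    """Stand-in for the DFDL parse step: native data -> XML."""
    label, message = data.split(SEPARATOR, 1)
    root = ET.Element("input")
    ET.SubElement(root, "label").text = label
    ET.SubElement(root, "message").text = message
    return ET.tostring(root, encoding="unicode")


def redact_message(infoset_xml):
    """Stand-in for the XSLT transform step: sanitize the message field."""
    root = ET.fromstring(infoset_xml)
    root.find("message").text = "REDACTED"
    return ET.tostring(root, encoding="unicode")


def unparse(infoset_xml):
    """Stand-in for the DFDL unparse step: XML -> native data."""
    root = ET.fromstring(infoset_xml)
    return SEPARATOR.join(child.text or "" for child in root)


original = "status:door 4 open"
as_xml = parse(original)             # 1. get the data into XML
sanitized = redact_message(as_xml)   # 2. process it with XML tooling
print(unparse(sanitized))            # 3. reconstitute the native format
```

With a real DFDL tool, steps 1 and 3 would be driven by the DFDL schema rather than by hand-written code; only step 2 is ordinary XML processing.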

DFDL facilitates getting your data into XML!
XML mantra:
1. Get data into XML as quickly as possible
2. Keep it in XML until the last possible minute
3. Bring all your XML tools to bear on solving the data processing problem
-- Sean McGrath, slide 12 of “Performing impossible feats of XML processing with pipelining”

“Daffodil” is a DFDL tool (an implementation of the DFDL specification)
https://daffodil.apache.org/docs/dfdl/

Logical structure + physical structure
A DFDL schema has two parts: (1) XML Schema stuff (2) DFDL stuff
Use the XML Schema stuff to specify the logical structure of the input file. Example: the input file contains a label followed by a message.
Use the DFDL stuff to specify the nuts and bolts of the physical structure of the input file. Example: the label and message are delimited by a colon, and the delimiter is infix (between the label and message).
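
To make the split between logical and physical structure concrete, here is a minimal sketch in which a toy parser takes everything it needs from the schema: the element names (logical structure) come from the XML Schema part, and the separator (physical structure) comes from the dfdl:separator attribute. It is an assumption-laden illustration, not Daffodil's algorithm. It handles only a flat sequence of string elements with an infix separator, and the function names read_format, parse, and unparse are made up for this example.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"
DFDL = "{http://www.ogf.org/dfdl/dfdl-1.0/}"

# The label/message schema shown earlier in the slides.
SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/">
  <xs:element name="input">
    <xs:complexType>
      <xs:sequence dfdl:separator=":" dfdl:separatorPosition="infix">
        <xs:element name="label" type="xs:string" />
        <xs:element name="message" type="xs:string" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""


def read_format(schema_text):
    """Return (root name, field names, separator) described by the schema."""
    schema = ET.fromstring(schema_text)
    root_decl = schema.find(XS + "element")
    sequence = root_decl.find(XS + "complexType/" + XS + "sequence")
    fields = [child.get("name") for child in sequence.findall(XS + "element")]
    return root_decl.get("name"), fields, sequence.get(DFDL + "separator")


def parse(data, schema_text):
    """Toy parse: split the data on the separator and label the pieces."""
    root_name, fields, separator = read_format(schema_text)
    values = data.split(separator, len(fields) - 1)
    if len(values) != len(fields):
        raise ValueError("input does not match the described format")
    root = ET.Element(root_name)
    for name, value in zip(fields, values):
        ET.SubElement(root, name).text = value
    return ET.tostring(root, encoding="unicode")


def unparse(infoset_xml, schema_text):
    """Toy unparse: join the field values with the separator again."""
    _, fields, separator = read_format(schema_text)
    root = ET.fromstring(infoset_xml)
    return separator.join(root.findtext(name, default="") for name in fields)


xml_out = parse("greeting:hello world", SCHEMA)
print(xml_out)
print(unparse(xml_out, SCHEMA))
```

Changing dfdl:separator in the schema changes how the same code splits and rejoins the data, which is the point of describing the format declaratively instead of writing format-specific parsing code.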

Advantages of DFDL
1. Declarative description of data formats. You merely describe the format of the data and the DFDL processor figures out how to break the data apart. That is, you describe “what” the format is, and the DFDL processor figures out “how” to dissect the data.
2. Builds on top of existing technologies (XML Schema, XPath).
3. The output of DFDL parsing is XML, which is great because you then have access to the vast suite of XML tools to analyze and process the XML.
4. Can both parse the data to produce XML and then unparse the XML to reconstitute the data in its native data format.

Disadvantages of DFDL
1. Steep learning curve! Being a “universal” parser, DFDL by definition needs functionality to deal with every feature in every data format.
2. Small DFDL community: not a lot of experts available to answer questions.
3. Limited set of helpful resources: no books on DFDL, few tutorials.

Terminology: DFDL, Daffodil, DFDL Schema
DFDL is a technology. It is a specification, a standard for how to describe data files.
Daffodil is a tool. It is a parser that implements the DFDL specification. There are several DFDL parsers: Daffodil is one, and IBM has one. In this course we will use Daffodil.
A DFDL Schema is an XML Schema document that contains DFDL properties (which are defined in the DFDL specification).

Daffodil can output XML or JSON
[Diagram: input file -> Daffodil -> XML] XML is the default output format.
[Diagram: input file -> Daffodil -> JSON] Use the -I json command line option.