A DFDL Proposal based on Commercial Data Processing Requirements

Presentation transcript:

A DFDL Proposal based on Commercial Data Processing Requirements 2003-10-01 Mike Beckerle, Technology Office

Ascential Software, Inc.
  GGF Sponsor
  Enterprise Data Integration
  High-volume parallel processing
  Commercial Record-Oriented Data
  Complex formats: XML, Cobol, C, ad-hoc.
  Clusters and Intra-Enterprise Grids
  Deployments have 100s of computers
  Apps are performance critical!
  “Do what’s right for the customer.”
  Open standards for data format description

DFDL Dream Roadmap
  DFDL is one of the most important things the GGF is working on!
  2004: GGF, initial implementations, draft std.
  2005: ANSI/ISO process begins

Chronology/Thought Process
  Somewhere in MikeB’s brain…
  The DFDL-WG really needs to see the crazy list of attributes for commercial data that we run into all the time…
  Hmmm. We also already integrate metadata from SQL, Cobol, SAS, EDI, and various other sources, and we use a common model for that.
  I’ve gathered a very comprehensive list of the representation attributes.
  XML has XSDL and the information set idea; ASCL has several similar things internally.
  So…

Requirements Came from:
  Ascential DataStage Products
    Cobol/Mainframe, Relational, XML, ad-hoc data sources are commonly handled
  Mercator Products
    EDI data formats, esp. X.12
  OMG CWM (Common Warehouse Metamodel)
  RDBMS SQL data model
  SAS (new GGF sponsor!!!)
  XSDL and XML
  Lots of Internationalization and Unicode experience

How to Read/Interpret this Document
  Doc is NOT a response to any other DFDL-WG proposals
    It was prepared in parallel, not in response
  There are still lots of TBDs
  Attributes list is quite comprehensive.
  Character sets covered comprehensively.

Themes
  Information Set / Abstract Data Model goals distinct from Representation Layer goals
  Read/Write symmetry
  Completeness: describe anything
    Without making common cases too hard
  Handle commercial data formats directly
  [Diagram labels: DFDL Information Set; Representation Stream as Data Blocks; Mapping to Binary Stream]

Value of DFDL Information Set
  [Diagram labels: XML Info. Set, Java, C/C++, Fortran, …; DFDL Information Set; Representation Stream as Data Blocks (FB, VBS, etc.); Mapping to Binary]

Record Format Complexity
  A typical field definition within a record:
    Name: SMF6JNM
    Length: 4 bytes EBCDIC
    Description: When SMF6INDC contains a X'1', this field contains a four-digit EBCDIC job number. When SMF6INDC contains a X'3' or greater, the job number has more than four digits, and this field contains zeroes. The correct job number is then found in SMF6JBID.
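The conditional nature of this field can be made concrete with a small sketch. It is only an illustration of the prose description above, not code from the proposal: the function name and its parameters are hypothetical, the record is assumed to have already been split into fields, SMF6JBID is assumed (without confirmation from the slide) to hold EBCDIC text, and Python's "cp037" codec stands in for whichever EBCDIC code page the data really uses.

```python
def smf6_job_number(indc: int, jnm: bytes, jbid: bytes) -> str:
    """Pick the job number according to the SMF6JNM field description.

    indc -- value of SMF6INDC as an integer
    jnm  -- the 4 raw bytes of SMF6JNM
    jbid -- the raw bytes of SMF6JBID (assumed here to be EBCDIC text)
    """
    if len(jnm) != 4:
        raise ValueError("SMF6JNM is defined as 4 bytes")
    if indc == 0x01:
        # Four-digit EBCDIC job number held directly in SMF6JNM.
        return jnm.decode("cp037")
    if indc >= 0x03:
        # SMF6JNM holds zeroes; the real job number lives in SMF6JBID.
        return jbid.decode("cp037").strip()
    # The description quoted on the slide says nothing about other values.
    raise ValueError(f"field description does not cover SMF6INDC = {indc:#x}")
```

The point of the sketch is that one 4-byte field cannot be described by type and length alone: its meaning depends on another field in the same record, which is the kind of dependency a format description language has to be able to express.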

Favorite(?) Data Attributes
  yyEarliestYear: is "03" 1903 or 2003?
  overpunchedASCIISignStyle: e.g., +120 decimal
    Hex F1.F2.C0 in EBCDIC = "12{"
    Hex 31.32.7B in ASCII = "12{"
  digitGroupingScheme="3,2"
    12,12,34,567.89 (India)
    121.234.567,89 (much of Europe)
    121,234,567.89 (US)
  calendar
    Q: How many days old is someone born on 1923-01-01 CE?
    A: Depends on what country they were born in! Greece and Turkey both converted to the Gregorian calendar in or after 1923.
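A rough sketch of how three of these attributes might drive a parser or formatter follows. The attribute names come from the slide, but their exact semantics are assumptions made for illustration: the sign overpunch uses the common zoned-decimal convention ("{" and A to I for +0 to +9, "}" and J to R for -0 to -9), yyEarliestYear is read as the first year of a 100-year window, and the last group size in digitGroupingScheme is taken to repeat. All function names are hypothetical.

```python
_POSITIVE = "{ABCDEFGHI"   # overpunch characters for +0 .. +9
_NEGATIVE = "}JKLMNOPQR"   # overpunch characters for -0 .. -9


def decode_overpunch(raw: bytes, encoding: str) -> int:
    """Decode a zoned number whose last character carries digit and sign."""
    text = raw.decode(encoding)
    body, last = text[:-1], text[-1]
    if last in _POSITIVE:
        sign, digit = 1, _POSITIVE.index(last)
    elif last in _NEGATIVE:
        sign, digit = -1, _NEGATIVE.index(last)
    elif last in "0123456789":
        sign, digit = 1, int(last)      # no overpunch: treat as unsigned
    else:
        raise ValueError(f"not an overpunch character: {last!r}")
    return sign * int(body + str(digit))


def expand_two_digit_year(yy: int, earliest_year: int) -> int:
    """Map a two-digit year into the 100-year window starting at earliest_year."""
    century = earliest_year - earliest_year % 100
    year = century + yy
    return year if year >= earliest_year else year + 100


def group_digits(integer_digits: str, scheme: str, separator: str = ",") -> str:
    """Group digits from the right; the last size in the scheme repeats."""
    sizes = [int(s) for s in scheme.split(",")]
    if any(size <= 0 for size in sizes):
        raise ValueError("group sizes must be positive")
    groups, rest, i = [], integer_digits, 0
    while rest:
        size = sizes[min(i, len(sizes) - 1)]
        groups.append(rest[-size:])
        rest = rest[:-size]
        i += 1
    return separator.join(reversed(groups))


print(decode_overpunch(bytes([0xF1, 0xF2, 0xC0]), "cp037"))  # 120
print(decode_overpunch(b"12{", "ascii"))                     # 120
print(expand_two_digit_year(3, 1950))                        # 2003
print(group_digits("121234567", "3,2"))                      # 12,12,34,567
```

Decoding to characters first lets the same overpunch table serve both the EBCDIC and the ASCII byte sequences, which mirrors the separation between character encoding and higher-level representation attributes that the proposal argues for.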

Next Steps
  Clean up separation of rep from abstract layer
    Factoring of binary rep attributes from character rep attributes
  Clarify attribute inheritance idiom
    Attributed type trees are central to the proposal, but not clearly explained in this draft.
  Expression language
    Esp. the library it has available
  Find common ground with other DFDL proposals