A DFDL Proposal based on Commercial Data Processing Requirements 2003-10-01 Mike Beckerle, Technology Office
Ascential Software, Inc. GGF Sponsor Enterprise Data Integration High-volume parallel processing Commercial Record-Oriented Data Complex formats: XML, Cobol, C, ad-hoc. Clusters and Intra-Enterprise Grids Deployments have 100s of computers Apps are performance critical! “Do what’s right for the customer.” Open standards for data format description
DFDL Dream Roadmap DFDL is one of the most important things the GGF is working on! 2004 GGF, initial implementations, draft std. 2005 ANSI/ISO process begins
Chronology/Thought Process Somewhere in MikeB’s brain… The DFDL-WG really needs to see the crazy list of attributes for commercial data that we run into all the time… Hmmm. We also already integrate metadata from SQL, Cobol, SAS, EDI, and various other sources; we use a common model for that. I’ve gathered a very comprehensive list of the representation attributes. XML has XSDL and the information set idea; ASCL has several similar things internally. So…
Requirements Came from: Ascential DataStage Products Mercator Products Cobol/Mainframe, Relational, XML, ad-hoc data sources are commonly handled Mercator Products EDI data formats, esp. X.12 OMG CWM (Common Warehouse Metamodel) RDBMS SQL data model SAS (new GGF sponsor!!!) XSDL and XML Lots of Internationalization and Unicode experience
How to Read/Interpret this Document Doc is NOT a response to any other DFDL-WG proposals. It was prepared in parallel, not in response. There are still lots of TBDs. Attributes list is quite comprehensive. Character sets are covered comprehensively.
Themes Information Set / Abstract Data Model Goals distinct from Representation Layer Goals Read/Write Symmetry Completeness: Describe anything Without making common cases too hard Handle commercial data formats directly DFDL Information Set Representation Stream as Data Blocks Mapping to Binary Stream
Value of DFDL Information Set (diagram labels: XML Info. Set, Java, C/C++, Fortran, …; DFDL Information Set; Representation Stream as Data Blocks (FB, VBS, etc.); Mapping to Binary)
Record Format Complexity A typical field definition within a record: Name: SMF6JNM Length: 4 bytes EBCDIC Description: When SMF6INDC contains a X'1', this field contains a four-digit EBCDIC job number. When SMF6INDC contains a X'3' or greater, the job number has more than four digits, and this field contains zeroes. The correct job number is then found in SMF6JBID.
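The conditional rule in this field description can be sketched in a few lines of Python. This is only an illustration of the logic stated on the slide, not a parser for real SMF records. The function name `job_number` is hypothetical. Decoding with the `cp037` EBCDIC code page is an assumption, as is treating SMF6JBID as plain EBCDIC text. The slide does not say what indicator values other than X'1' and X'3' or greater mean, so the sketch rejects them.

```python
def job_number(smf6indc: int, smf6jnm: bytes, smf6jbid: bytes) -> str:
    """Pick the job number according to the SMF6INDC indicator.

    Assumptions (not from the slide): fields are EBCDIC code page 037,
    and SMF6JBID is returned as decoded text with padding stripped.
    """
    if len(smf6jnm) != 4:
        raise ValueError("SMF6JNM must be exactly 4 bytes")
    if smf6indc >= 0x3:
        # Job number has more than four digits; SMF6JNM holds zeroes.
        return smf6jbid.decode("cp037").strip()
    if smf6indc == 0x1:
        # SMF6JNM holds the four-digit EBCDIC job number.
        return smf6jnm.decode("cp037")
    # The field description does not cover this indicator value.
    raise ValueError(f"unhandled SMF6INDC value: {smf6indc:#x}")


# Example with made-up field contents.
short_form = job_number(0x1, "0042".encode("cp037"), b"")
long_form = job_number(0x3, "0000".encode("cp037"), "J0012345".encode("cp037"))
print(short_form, long_form)
```

The point for a format description language is that the meaning of one field depends on the value of another, so the description needs some way to express this kind of conditional.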
Favorite(?) Data Attributes yyEarliestYear: Is “03” 1903 or 2003? overpunchedASCIISignStyle: e.g., +120 decimal is Hex F1.F2.C0 in EBCDIC = “12{” and Hex 31.32.7B in ASCII = “12{”. digitGroupingScheme=“3,2”: 12,12,34,567.89 (India); 121.234.567,89 (much of Europe); 121,234,567.89 (US). calendar: Q: How many days old is someone born on 1923-01-01 CE? A: Depends on what country they were born in! Greece and Turkey both converted to the Gregorian calendar in or after 1923.
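Three of these attributes can be illustrated with small Python helpers. This is a rough sketch under stated assumptions, not the proposal's definition of the attributes. All function names are hypothetical. The year helper assumes yyEarliestYear marks the start of a 100-year window. The overpunch table is the commonly used zoned-decimal convention ("{" and A to I for positive, "}" and J to R for negative). The grouping helper assumes that "3,2" means a first group of three digits from the right, followed by repeating groups of two.

```python
def expand_two_digit_year(yy: int, earliest_year: int) -> int:
    """Place a two-digit year in the 100-year window starting at earliest_year."""
    if not 0 <= yy <= 99:
        raise ValueError("yy must be in 0..99")
    year = earliest_year - earliest_year % 100 + yy
    return year if year >= earliest_year else year + 100


# Assumed overpunch table: the last character carries the final digit and the sign.
_POSITIVE = "{ABCDEFGHI"  # +0 .. +9
_NEGATIVE = "}JKLMNOPQR"  # -0 .. -9


def decode_overpunch(text: str) -> int:
    """Decode a character-form overpunched decimal such as '12{'."""
    if not text:
        raise ValueError("empty field")
    body, last = text[:-1], text[-1]
    if body and not body.isdigit():
        raise ValueError("leading characters must be digits")
    if last in _POSITIVE:
        sign, digit = 1, _POSITIVE.index(last)
    elif last in _NEGATIVE:
        sign, digit = -1, _NEGATIVE.index(last)
    elif last.isdigit():
        sign, digit = 1, int(last)  # no overpunch: treat as unsigned
    else:
        raise ValueError(f"not an overpunch character: {last!r}")
    return sign * int(f"{body}{digit}")


def group_digits(digits: str, scheme: str = "3", separator: str = ",") -> str:
    """Group integer digits from the right; the last size in scheme repeats."""
    sizes = [int(s) for s in scheme.split(",")]
    if any(size <= 0 for size in sizes):
        raise ValueError("group sizes must be positive")
    groups, rest, i = [], digits, 0
    while rest:
        size = sizes[min(i, len(sizes) - 1)]
        groups.append(rest[-size:])
        rest = rest[:-size]
        i += 1
    return separator.join(reversed(groups))


print(expand_two_digit_year(3, 1950))                            # 2003
print(decode_overpunch(bytes.fromhex("F1F2C0").decode("cp037")))  # 120
print(decode_overpunch(bytes.fromhex("31327B").decode("ascii")))  # 120
print(group_digits("121234567", "3,2"))                           # 12,12,34,567
print(group_digits("121234567", "3", "."))                        # 121.234.567
```

The same sign convention yields different bytes in EBCDIC and ASCII, which is one reason the sign style and the character set need to be described as separate attributes.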
Next Steps Clean up separation of rep from abstract layer Factoring of binary rep attributes from character rep attributes Clarify attribute inheritance idiom Attributed type trees are central to the proposal, but not clearly explained in this draft. Expression language Esp. the library it has available Find common ground with other DFDL proposals