tFileInputEBCDIC Bug Report & Design Recommendation


1 tFileInputEBCDIC Bug Report & Design Recommendation
July 2009

2 Introduction Describe the bug in which a bad data record crashes a job that uses the tFileInputEBCDIC component. Describe why the EBCDIC component, as currently released, is very inefficient.

3 Versions Using TALEND Open Studio Version: 3.1.3.r26090 Using Cobol2j

4 BUG - Setup
Took a CHARTIS COBOL copybook.
Translated it to an xc2j file using Cobol2j.
Translated the xc2j file to a TALEND schema XML file using the xc2j2talend.xsl style sheet.
Created a job using the tFileInputEBCDIC component for input.

5 BUG - Setup 2
Configured the tFileInputEBCDIC component:
Point to a data file that is in the copybook format.
Point to the xc2j file.
Add a schema by importing the TALEND schema generated from the xc2j file by the XSL style sheet.
Connect the tFileInputEBCDIC output to a tLogRow component.

6 BUG - Result Bad record crashes the Job.

7 BUG - Error dump from TALEND run console
Starting job OGISRealignmentTest_1 at 14:03 30/07/2009.
Jul 30, :03:47 PM net.sf.cobol2j.RecordSet next
SEVERE: Cannot parse field: FLAT-RESERVE-TYPE-N. Data: ' ', Picture: 9(01), Type: 9, Size: 1
SEVERE: Total bytes processed before error: 102
Exception in component tFileInputEBCDIC_1
net.sf.cobol2j.RecordParseException: Couldn't parse record nr: 1.
    at net.sf.cobol2j.RecordSet.next(RecordSet.java:107)
    at talenddemosjava.ogisrealignmenttest_1_0_1.OGISRealignmentTest_1.tFileInputEBCDIC_1Process(OGISRealignmentTest_1.java:2564)
    at talenddemosjava.ogisrealignmenttest_1_0_1.OGISRealignmentTest_1.runJobInTOS(OGISRealignmentTest_1.java:3831)
    at talenddemosjava.ogisrealignmenttest_1_0_1.OGISRealignmentTest_1.main(OGISRealignmentTest_1.java:3747)
Caused by: net.sf.cobol2j.FieldParseException:
    at net.sf.cobol2j.RecordSet.readZoned(RecordSet.java:466)
    at net.sf.cobol2j.RecordSet.getFieldsValues(RecordSet.java:189)
    at net.sf.cobol2j.RecordSet.getFieldsValues(RecordSet.java:244)
    at net.sf.cobol2j.RecordSet.next(RecordSet.java:89)
    ... 3 more
Caused by: java.lang.NumberFormatException
    at java.math.BigDecimal.<init>(Unknown Source)
    at net.sf.cobol2j.RecordSet.readZoned(RecordSet.java:464)
    ... 7 more
Job OGISRealignmentTest_1 ended at 14:03 30/07/2009. [exit code=1]

8 BUG - Explanation. java.math.BigDecimal throws a NumberFormatException. The exception itself is valid: the data in the field of the first record is garbage. The data is three bytes of binary zero [0x00, 0x00, 0x00], not three bytes of the EBCDIC character that represents zero [0xF0, 0xF0, 0xF0].
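To make the failure concrete, here is a minimal sketch of unsigned zoned-decimal decoding. It is not cobol2j's actual readZoned implementation; the class and method names (ZonedDecimalDemo, decodeZoned) are hypothetical. It relies only on the fact that EBCDIC digits '0' through '9' are encoded as 0xF0 through 0xF9, so a byte of 0x00 cannot be turned into a digit.

```java
import java.math.BigDecimal;

public class ZonedDecimalDemo {

    /**
     * Decodes an unsigned zoned-decimal field (COBOL PIC 9(n)).
     * Hypothetical helper for illustration, not the cobol2j API.
     */
    static BigDecimal decodeZoned(byte[] field) {
        StringBuilder digits = new StringBuilder(field.length);
        for (byte b : field) {
            int value = b & 0xFF; // treat the byte as unsigned
            if (value < 0xF0 || value > 0xF9) {
                // Binary zero (0x00) and any other non-digit byte end up here.
                throw new NumberFormatException(
                        String.format("byte 0x%02X is not an EBCDIC digit", value));
            }
            // The low nibble carries the digit value.
            digits.append((char) ('0' + (value & 0x0F)));
        }
        return new BigDecimal(digits.toString());
    }

    public static void main(String[] args) {
        byte[] good = {(byte) 0xF0, (byte) 0xF0, (byte) 0xF7}; // EBCDIC "007"
        byte[] bad = {0x00, 0x00, 0x00};                       // binary zeros

        System.out.println("good field decodes to " + decodeZoned(good));
        try {
            decodeZoned(bad);
        } catch (NumberFormatException e) {
            System.out.println("bad field rejected: " + e.getMessage());
        }
    }
}
```

The point of the sketch is that the NumberFormatException is expected for this input; the open question is what the caller does with it.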

9 BUG - Why this is a BUG. A bad record should not crash a job.
One reason to use an ETL tool is to scrub bad records. Crashing on a bad record:
Does not allow scrubbing.
Undermines the robustness of a tool that needs to be reliable in the face of bad data.

10 Design – How it works The error log for the BUG highlights a few flaws in the existing design. The cobol2j package builds a translator that takes a buffer of bytes the length of the COBOL copybook record and tries to convert it to a set of Java objects. This particular bug crashes the job when one of these objects can't be created.

11 Design – How it works Removing even one field from the TALEND schema XML file generated by the xc2j XSL translation causes a runtime error in TALEND that prevents the job from running.

12 Design – What it means.
Generates a lot of excess Java objects. Just because data is in a record doesn't mean you need it in the TALEND job. Creating objects has a performance impact because the Java VM performs internal housekeeping (new and gc) on every object.
Not graceful with bad data. The component should allow the ETL job to do something graceful with bad records, not crash the job.

13 Design – Recommendation
Fix the tFileInputEBCDIC component:
Option 1: Rework the schema generator to include only the fields required for the job instead of every field. On a bad field exception, mark the record as an error in the stream of records (TALEND schema), but continue the job.
Option 2: Switch to JRecord. JRecord's library does not suffer the same crash on bad data. It uses copybooks directly, without the intermediate XML format. It makes it easy to translate only the fields required for the job from each record. It has a Java API for accessing copybook metadata in a .jet template during the job generation phase in TALEND.
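The behavior proposed in the first option can be sketched roughly as follows. This is an illustration only, not TALEND, cobol2j, or JRecord code: the class TolerantRecordReader, its FieldSpec and Row types, and the four-byte record layout in main are all hypothetical. It decodes only the fields the job asks for, and a bad field flags the row with an error instead of aborting the run.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TolerantRecordReader {

    /** Hypothetical descriptor: field name plus byte offset and length in the record. */
    static final class FieldSpec {
        final String name;
        final int offset;
        final int length;

        FieldSpec(String name, int offset, int length) {
            this.name = name;
            this.offset = offset;
            this.length = length;
        }
    }

    /** One output row: decoded values, plus an error message if any field was bad. */
    static final class Row {
        final Map<String, Long> values = new LinkedHashMap<>();
        String error; // stays null when every requested field decoded cleanly
    }

    /** Decodes an unsigned zoned-decimal field; digits are bytes 0xF0..0xF9. */
    static long decodeZoned(byte[] record, FieldSpec f) {
        long result = 0;
        for (int i = f.offset; i < f.offset + f.length; i++) {
            int b = record[i] & 0xFF;
            if (b < 0xF0 || b > 0xF9) {
                throw new NumberFormatException(
                        String.format("byte 0x%02X at offset %d is not a digit", b, i));
            }
            result = result * 10 + (b & 0x0F);
        }
        return result;
    }

    /** Decodes only the wanted fields; a bad field marks the row and the loop continues. */
    static List<Row> read(List<byte[]> records, List<FieldSpec> wanted) {
        List<Row> rows = new ArrayList<>();
        for (byte[] record : records) {
            Row row = new Row();
            for (FieldSpec f : wanted) {
                try {
                    row.values.put(f.name, decodeZoned(record, f));
                } catch (NumberFormatException e) {
                    row.values.put(f.name, null);
                    if (row.error == null) { // keep the first error for the row
                        row.error = f.name + ": " + e.getMessage();
                    }
                }
            }
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        // Assumed layout: 3 bytes the job ignores, then a 1-byte numeric type field.
        List<FieldSpec> wanted = Arrays.asList(new FieldSpec("FLAT-RESERVE-TYPE-N", 3, 1));
        List<byte[]> records = Arrays.asList(
                new byte[]{(byte) 0xF1, (byte) 0xF2, (byte) 0xF3, (byte) 0xF7},
                new byte[]{(byte) 0xF1, (byte) 0xF2, (byte) 0xF3, 0x00});
        for (Row row : read(records, wanted)) {
            System.out.println(row.values + " error=" + row.error);
        }
    }
}
```

A downstream step could then route rows with a non-null error to a reject flow for scrubbing. The sketch handles only unsigned zoned-decimal fields and assumes every record is at least as long as the requested offsets.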

