MODIFY your way of thinking when it comes to anomalous data formats Steve Simon State Street Corporation
What we shall examine during this hour Data files of different formats. Examine ways and means of massaging the different formats into one ‘usable’ format. Examine ways of “manufacturing” records to facilitate generating end user reports.
A bit of history While working at a major airline a few years back, I encountered problem where one of our databases contained flight information with a start date of the service and a planned termination date for the service”.
OrigDestOrigin CityDest City Start Date End Date ALBSDFALBANY NY LOUISV ILLE KY ABQLBBLUBBOCK TX ALBUQ UERQU E NM
Our booking database on the other hand contained ‘daily records’ of the seating status of each class, for each flight segment (which could consist of one or more legs). This necessitated the break down of the data shown above into “a record per day” format.
DateOrigDestThe Key ABQLBB ABQLBB ABQLBB ABQLBB ABQLBB ABQLBB ABQLBB ABQLBB
So that we could effect a join to the “Available Seating” database.
KEYFYMNQS ABQLBB ABQLBB ABQLBB ABQLBB Available Seating
The raw data
FILEDEF RAW DISK C:/ibi/apps/steve/AirlineSchedule.txt FILEDEF AIRLINE DISK C:\ibi\apps\steve\AIRLINE.FOC -RUN CREATE FILE AIRLINE -RUN MODIFY FILE AIRLINE FIXFORM DEPARTURE/3 DEPARTURECITY/50 ARRIVAL/3 ARRIVALCITY/50 FIXFORM STARTDATE/A8 ENDDATE/A8 MATCH WITH-UNIQUES DEPARTURE ARRIVAL ON MATCH REJECT ON NOMATCH INCLUDE DATA ON RAW END
Creating that “record per day”
FILEDEF AIRLINE1 DISK C:/ibi/apps/steve/AIRLINE.OUTTT -RUN MODIFY FILE AIRLINE COMPUTE STARTDATE1/YYMD = 0; COMPUTE ENDDATE1/YYMD = 0; COMPUTE STARTCITY/A50=; COMPUTE ENDCITY/A50=; COMPUTE STARTCODE/A3=; COMPUTE ENDCODE/A3=; COMPUTE TEMPDATE/YYMD=0; PERFORM EXTRACT1 Filedef’s and variable initialization
We shall utilize the Scratch Pad Area (SPA)
Get the data from the database record by record CASE EXTRACT1 NEXT WITH-UNIQUES DEPARTURE ARRIVAL ON NEXT ACTIVATE DEPARTURECITY ARRIVALCITY STARTDATE ENDDATE ON NEXT COMPUTE STARTDATE1= D.STARTDATE; ON NEXT COMPUTE ENDDATE1 = D.ENDDATE; ON NEXT COMPUTE STARTCITY = D.DEPARTURECITY; ON NEXT COMPUTE ENDCITY = D.ARRIVALCITY; ON NEXT COMPUTE STARTCODE = D.DEPARTURE; ON NEXT COMPUTE ENDCODE = D.ARRIVAL; ON NEXT COMPUTE TEMPDATE = D.STARTDATE; ON NEXT PERFORM EXTRACT2 ON NONEXT GOTO EXIT ENDCASE
Start date greater than end date? Yes: quit case No: write the record to file CASE EXTRACT2 IF TEMPDATE GT ENDDATE1 THEN PERFORM EXTRACT1; TYPE ON AIRLINE1 " " COMPUTE TEMPDATE = TEMPDATE + 1; GOTO EXTRACT2 ENDCASE DATA END -RUN
The output
Where do we go from here? The available seating table resides in a SQL Server database
Load this data into our SQL Server data repository
Create INSERT statements
FILEDEF ROUTECOUNT DISK C:/ibi/apps/steve/AIRLINE.OUTTT -RUN APP HOLD steve TABLE FILE ROUTECOUNT PRINT * ON TABLE HOLD AS RECCOUNT END -SET &LLINES = &LINES; -START111 -SET &FILENUM = 1; -SET &CURRENTCTR =0; -SET &FIRSTLINE = 'INSERT INTO DailyFlights(Date,Start,Destination,'; -SET &FIRSTLINE1 = 'StartCity,DestinationCity)'; -SET &SECONDLINE =; -SET &THIRDLINE = ; -SET &APOST = HEXBYT(39,'A1'); -SET &DATEE=; -SET &STARTC=; -SET &DESTC=; -SET &SCDEST=; -SET &ECDEST=;
Write the SQL “Use”Statements FILEDEF ROUTECOUNT1 DISK C:/ibi/apps/steve/AIRLINE.OUTTT FILEDEF SCHEDULE DISK C:/ibi/apps/steve/AIRLINE.SQL1 -RUN -WRITE SCHEDULE USE FUSE2007 -WRITE SCHEDULE GO -WRITE SCHEDULE BEGIN TRANSACTION
Read all records & write to file -REPEAT LOOPER FOR &I FROM 1 TO &LLINES STEP 1 -READ ROUTECOUNT1 &A.2 &DATEE.10 &C.1 &STARTC.3 &A.1 &DESTC.3 &B.1 &SCDEST.50, - &CA.1 &ECDEST.50 -SET &SECONDLINE = ' VALUES (' || &APOST || &DATEE || &APOST; -SET &SECONDLINE = &SECONDLINE || ',' || &APOST || &STARTC || &APOST; -SET &SECONDLINE = &SECONDLINE || ',' || &APOST || &DESTC || &APOST; -SET &SECONDLINE = &SECONDLINE || ',' || &APOST || &SCDEST || &APOST; -SET &SECONDLINE = &SECONDLINE || ',' || &APOST || &ECDEST || &APOST; -SET &SECONDLINE = &SECONDLINE || ');'; -WRITE SCHEDULE &FIRSTLINE -WRITE SCHEDULE &FIRSTLINE1 -WRITE SCHEDULE &SECONDLINE -LOOPER -WRITE SCHEDULE COMMIT TRANSACTION
The Insert Statements USE FUSE2007 GO BEGIN TRANSACTION INSERT INTO DailyFlights(Date,Start,Destination, StartCity,DestinationCity) VALUES ('2006/11/30','ABE','MHT','ALLENTOWN, PA','MANCHESTER, NH'); INSERT INTO DailyFlights(Date,Start,Destination, StartCity,DestinationCity) VALUES ('2006/12/01','ABE','MHT','ALLENTOWN, PA','MANCHESTER, NH'); ….. COMMIT TRANSACTION
The 50 million foot view
Raw Data Sequential File SQL Statements File System Watcher & SSIS Load Package File System Watcher & SSIS Load Package
cd C:\Program Files\Microsoft SQL Server\90\DTS\Binn DTExec /f "C:\AirlineScheduleLoad\AirlineSc hedule\AirlineSchedule\bin\LoadS chedule.dtsx"
Query created by join
JOIN DATEM AND START AND DESTINATION IN DAILYFLIGHTS TO DATEE AND START AND DESTINATION IN BOOKINGS AS J1 -RUN DEFINE FILE DAILYFLIGHTS CITYPAIR/A10 = START || '-'|| DESTINATION; END TABLE FILE DAILYFLIGHTS PRINT DATEM AS 'Date‘ CITYPAIR AS 'City Pair‘ STARTCITY AS 'Origin' DESTINATIONCITY AS 'Destination‘ FCLASS AS 'F’ YCLASS AS 'Y‘ MCLASS AS 'M' NCLASS AS 'N‘ QCLASS AS 'Q‘ SCLASS AS 'S' BY START NOPRINT BY DESTINATION NOPRINT BY DATEM NOPRINT ON TABLE SUBHEAD "Orange Free State Airlines" "Flight Schedule“ …..
During this hour we Examined data files of different formats. Examined ways and means of massaging the different formats into one ‘usable’ format. Examined ways of “manufacturing” records to facilitate generating end user reports. Verified that the data was correct.
During this hour we Saw that there were many “different” ways to modify anomalous data into the format of your choice, to produce the reports that you require. Which really goes to show that you can..
MODIFY your way of thinking when it comes to anomalous data formats Steve Simon State Street Corporation PowerPoint presentation & code samples may be found at: 4c765fc825912e4d.skydrive.live.com/browse.aspx/Public or by