Indexes Lookups or “the Good, the Bad and the Ugly” John S. Lemon.

1 Indexes Lookups or “the Good, the Bad and the Ugly” John S. Lemon

2 Indexes What do these have in common? One person knows – the rest of you will have to wait!

3 Indexes What I hope to cover
• Based on the Aberdeen Maternity and Neo-natal Data Bank (AMNDB)
• History of lookups –
  • Dummy case
  • SQL tabfiles
  • Second data base
  • Second data base + secondary indexes + LOOKUP command

4 Indexes What is a “lookup”
• Converting text -> numeric code
• Or numeric code -> text
• Data entry staff enter ‘uterine adhesions’
• Converted to (and stored as) 621.5
• Why?
• Space

5 Indexes Why use a ‘lookup’
• The alternative is storing the full text
• ‘uterine adhesions’ - stored as 19 bytes / characters
• This is one of the shorter ones!
• Problem of different ‘spellings’
  • Uterine Adhesions
  • UTERINE ADHESIONS etc.
• Which to use when searching?
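
A minimal sketch, in Python rather than SiR, of the two-way conversion described on slides 4 and 5. It is not from the talk: the function and table names are hypothetical, and the only code:text pair is the example quoted above. Folding case and whitespace before the lookup is one assumed way of making the different ‘spellings’ find the same code.

```python
# Hypothetical lookup table holding the single example pair from the slides.
TEXT_TO_CODE = {"uterine adhesions": 621.5}

# The reverse direction (code -> text) is built from the same pairs.
CODE_TO_TEXT = {code: text for text, code in TEXT_TO_CODE.items()}


def normalise(text):
    """Collapse runs of whitespace and ignore case so variants share one key."""
    return " ".join(text.split()).casefold()


def text_to_code(text):
    """Return the stored code for the entered text, or None if unknown."""
    return TEXT_TO_CODE.get(normalise(text))


def code_to_text(code):
    """Return the text for a stored code, or None if unknown."""
    return CODE_TO_TEXT.get(code)
```

With this sketch, ‘Uterine Adhesions’ and ‘UTERINE ADHESIONS’ both resolve to 621.5, so only the code needs to be stored and searched on.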

6 Indexes Why use a ‘lookup’
• Store ‘uterine adhesions’ as 621.5 - stored as 8 bytes
• less if using integers
• or fixed length string codes ‘A6215’ - stored as 7 bytes

7 Indexes Why use a ‘lookup’
• Value labels - not practical
  • Size limit on label length
  • Record specific
  • Problems managing an increasing list
  • Ordering / Sorting
• Large numbers of code:text pairs

  Occupations                             13872
  Drugs                                    5357
  Operations                               2754
  Diseases / Illnesses / Complications     4404

8 Indexes Dummy case
• Only option in version 2
• Use a CASEID which was unique
• In AMNDB used -999999
• This case only contained data for the ‘lookup’ record types
• It worked but …

9 Indexes Dummy case - problems
• Specific to that data base
• Not shared – unless
  • SIR FILE DUMP - from ‘active’
  • READ INPUT DATA - to ‘research’
• Have to repeat for all data bases that need lookups

10 Indexes Dummy case - problems
• Can only go ‘one way’ unless duplicate data with different Key Fields

  RECORD SCHEMA 99, ICD2NAME
  SORT IDS ICDCODE

  RECORD SCHEMA 98, NAME2ICD
  SORT IDS ICDNAME

• Huge problems of maintenance

11 Indexes Dummy case - problems
• MAX KEY SIZE
• May have seen in LIST STATS
• What is it?
• Need to have a quick look at the ‘structure’ of a SiR record
• Simplified, not definitive, view
• If you want a complete definition – ask Tony!

12 Indexes SiR record structure
• Essentially two components
• Key - fixed length - same for ALL records
  • May not be ‘filled’ for all records
• Data - variable length
[Diagram: record layout, labelled CASEID, key fields 1 2 3, Max Key Size, Data, Actual Size]

13 Indexes Dummy case - problems
• Without dummy case
• Max Key Size - approx 20 bytes long
• Text field – 50 characters long
• Max Key Size - approx 60 bytes long
• Extra 40 bytes
• For EVERY record
• 1.5 million * 40 bytes = approx 60 Mbytes
• Trivial today but in 1988 …
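
The arithmetic on this slide can be checked with a few lines of Python. This is only a sketch: the key sizes and record count are the approximate figures quoted above, and the variable names are illustrative.

```python
# Approximate figures taken from the slide; names are illustrative only.
RECORDS = 1_500_000        # records in the data base
KEY_WITHOUT_DUMMY = 20     # approx Max Key Size in bytes without the dummy case
KEY_WITH_DUMMY = 60        # approx Max Key Size in bytes with a 50-character text key

# The key is fixed length, so every record carries the increase.
extra_per_record = KEY_WITH_DUMMY - KEY_WITHOUT_DUMMY
extra_total_bytes = RECORDS * extra_per_record
extra_total_mbytes = extra_total_bytes / 1_000_000

print(f"{extra_per_record} extra bytes per record, "
      f"about {extra_total_mbytes:.0f} Mbytes in total")
```

Because the key portion is the same length for all records, the cost scales with the total record count and not with the number of lookup records.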

14 Indexes Dummy Case – solutions (sic)
• How to resolve size problems
• Reduced length of text field – 30 chars
• One way only
• Only in one data base – data entry
• Used ‘marvellous feature’ of sequential data base access
• SIR3 file held on tape(s)
• Only PROCESS CASES ALL
• No CASE IS

15 Indexes Lookups use in AMNDB
• Two main programmes / suites of programmes
• Batch run to convert text -> code
• Interactive programme for each record that requires lookups
• Need both for different reasons

16 Indexes Lookups use in AMNDB
• Batch run to convert text -> code
• Handles all record types in the data base that have data for looking up
• Only converts records where a code for the text is already available
• Marks successful lookups
• Leaves unsuccessful ones for the interactive programme
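
The batch step might be sketched as below. This is Python, not the PQL used in AMNDB, and the record fields (`text`, `code`, `looked_up`) and the function name are hypothetical. It converts what it can, marks the successes, and returns the remainder for the interactive programme.

```python
def batch_convert(records, lookup):
    """Fill in codes for records whose text is already in the lookup table.

    records: list of dicts with hypothetical fields 'text', 'code', 'looked_up'.
    lookup:  dict mapping case-folded text to its code.
    Returns the records that still need interactive attention.
    """
    pending = []
    for rec in records:
        code = lookup.get(rec["text"].strip().casefold())
        if code is None:
            # No match: leave the record untouched for the interactive run.
            pending.append(rec)
        else:
            # Match: store the code and mark the lookup as successful.
            rec["code"] = code
            rec["looked_up"] = True
    return pending


# Usage with the example pair from the earlier slides and one unknown text.
records = [
    {"text": "Uterine Adhesions", "code": None, "looked_up": False},
    {"text": "some unknown text", "code": None, "looked_up": False},
]
pending = batch_convert(records, {"uterine adhesions": 621.5})
```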

17 Indexes Lookups use in AMNDB
• One interactive programme per record with data for looking up
• Functions in same way as batch except
• If no match, prompts for one of the following:
  • Retype
  • Delete
  • Edit
  • Add new lookup code

18 Indexes SQL tabfiles
• Stuck with Dummy case until
  • SQL tabfiles
  • SAVE TABLE
  • SQL indexes
• One external file could hold lookups

19 Indexes SQL tabfiles
• One external file – many advantages
• One multi-user file for all lookups
• Simplified maintenance - one file to back up
• Multiple indexes per table (cf. record)
• Max Key Size problem removed
• BUT …

20 Indexes SQL tabfiles - problems
• Can only see the ‘whole’ file picture
• No equivalent of File Dump / Add Recs
• Need SQL+ - clumsy
• PQL programs - not intuitive
• Could use Forms but not easy
• Journalling was suspect
• Ended up using EXPORT for backup
• Exporting SQL tabfiles was idiosyncratic

21 Indexes SQL tabfiles - problems
• If tabfile is ‘volatile’ – small frequent updates
• Tabfile can get ‘corrupt’
• Verify ‘drops’ table instead of correcting
• Updates appeared to work but records were later found missing / corrupt
• Just had to ‘live with it’ … and hope for better

22 Indexes Second data base
• No worries about Max Key Size
• Multi user
• Reliable DBMS utilities and functionality
  • UNLOAD
  • Journalling
  • File Dump / Add Recs
• Easily look at the data
• But …

23 Indexes Second data base
• Considered but not used
• Why?
• No alternative ‘views’ / indexes
• Back to two copies of data

  RECORD SCHEMA 99, ICD2NAME
  SORT IDS ICDCODE

  RECORD SCHEMA 98, NAME2ICD
  SORT IDS ICDNAME

• No real advantages – ‘devil you know’

24 Indexes What do these have in common? Still puzzled?

25 Indexes Two data bases + Secondary Index
• Then in SiR2002
  • Two, or more, data bases
  • Secondary Indexes
  • LOOKUP command
• Decided to ‘go for broke’
• Use all three ‘features’ to remove previous problems

26 Indexes Two data bases + Secondary Index
• Two things spring to mind
  • Can of worms
  • Snail in a well

27 Indexes Two data bases + Secondary Index
• The can of worms was trying to understand code written many years ago
• How many of you have ‘revisited’ PQL you wrote 5 years ago?
• Can you remember what you were trying to do?

28 Indexes Two data bases + Secondary Index
• Do you add Comments to explain for future benefit?

  C Only get records for Males over 60

• Use | for ‘inline’ comments

  END PROCESS REC | PREGNANCY

• It was ‘grey hair’ time – yet again!
• Not sure which is worse
  • teenage daughters
  • old SiR code

29 Indexes Two data bases + Secondary Index
• By hard work and perseverance managed to ‘decode’ the old code
• This time I added Comments as I worked it out!
• So that was the can of worms sorted

30 Indexes Two data bases + Secondary Index
• Just left the snail in the well
• How does this relate to SiR code?
• Climb 3 feet during the day – slip back 2 feet at night
• At almost every stage I encountered ASFs
• ‘Another SiR Feature’

31 Indexes Two data bases + Secondary Index
• Enormous thanks to Tony for help, aid, assistance and patience
• Six new versions of SiR within one week
• Gradually felt that I was ‘climbing’ higher and higher

32 Indexes Two data bases + Secondary Index
• One day, after yet another new version, the programme ran OK
• Message to Tony: “I’m out of the well!”
• Even so did some more testing

33 Indexes Two data bases + Secondary Index
• Two hours later I sent a message using a phrase from this show: “I’ve fallen in the water”
• Yet another problem

34 Indexes Two data bases + Secondary Index
• Yet again Tony came to the rescue
• Apart from one bit of my code that I still need to sort out, the programmes are working OK
• So how do we use
  • A second data base
  • Secondary Indexes
  • The LOOKUP command

35 Indexes Two data bases + Secondary Index
• All lookups are held in a second, caseless data base
• The key fields are the numeric codes, to keep MAX KEY SIZE to a minimum
• Journalling is turned on

36 Indexes Two data bases + Secondary Index
• The same code might refer to multiple text strings
  • Threatened Abortion
  • Thr Abr
  • TAbr
• All mean the same
• Use AUTOKEY to cope
• Secondary indexes on the text strings
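
The layout on slides 35 and 36 can be illustrated with an analogy in Python and SQLite. This is not SiR and not the AMNDB schema: the table, column and function names are hypothetical, the code value is a placeholder, and the per-code sequence number is only an assumed stand-in for what AUTOKEY provides. The point shown is a short numeric key plus a separate index on the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Key is (code, seq): short and numeric, so the key stays small.
# seq is an assumed stand-in for AUTOKEY, letting one code hold several texts.
conn.execute(
    "CREATE TABLE lookup ("
    " code REAL NOT NULL,"
    " seq INTEGER NOT NULL,"
    " label TEXT NOT NULL COLLATE NOCASE,"
    " PRIMARY KEY (code, seq))"
)
# Secondary index on the text, so text -> code does not need a second copy of the data.
conn.execute("CREATE INDEX lookup_by_label ON lookup (label)")


def add_text(code, label):
    """Add another text string for a code, numbering it after the existing ones."""
    (next_seq,) = conn.execute(
        "SELECT COALESCE(MAX(seq), 0) + 1 FROM lookup WHERE code = ?", (code,)
    ).fetchone()
    conn.execute("INSERT INTO lookup VALUES (?, ?, ?)", (code, next_seq, label))


def code_for(label):
    """Text -> code via the secondary index; None if the text is unknown."""
    row = conn.execute("SELECT code FROM lookup WHERE label = ?", (label,)).fetchone()
    return row[0] if row else None


def texts_for(code):
    """Code -> all of its text strings, in the order they were added."""
    rows = conn.execute("SELECT label FROM lookup WHERE code = ? ORDER BY seq", (code,))
    return [r[0] for r in rows]


PLACEHOLDER_CODE = 999.9  # placeholder value, not a real lookup code
for label in ("Threatened Abortion", "Thr Abr", "TAbr"):
    add_text(PLACEHOLDER_CODE, label)
```

Here all three strings resolve to the same code through the text index, while the primary key holds nothing but numbers.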

37 Indexes Two data bases + Secondary Index
• Experiences so far
• Still testing but looks good
• Faster and more reliable
• Can look at the data easily
• Correcting invalid data is easy
• All the power and features of vPQL and DBMS utilities for maintenance
• Only one system to learn / remember

38 Indexes Lookups – a summary
• Dummy case
  • Rigid
  • Inflexible
  • Cumbersome
  • Can use DBMS etc. for maintenance
  • Now luckily replaced

39 Indexes Lookups – a summary
• SQL tabfiles
  • Very flexible
  • Obtuse
  • No easy maintenance
  • SQL+ is cumbersome
  • Reliability / integrity of tabfiles
  • Better but for long term – flawed
  • Ad-hoc work – re-building every time

40 Indexes Lookups – a summary
• Final solution with Second data base, Indexes & LOOKUP
  • Reliable
  • PQL is familiar
  • Fast
  • Combines good features of dummy case and tabfiles
  • Perhaps time will tell – but looks good

