Indexing Innovations 14.2 Seminar 14.1 Seminar - Filing Procedures
Session Agenda
- Filing and word breaking procedures
- Indexing procedures, new features:
  - Parallel processing
  - Updating a group of indexes
  - New indexing routines
Parallel Processing
Parallel Processing
Problems with indexing batch routines in the early versions of ALEPH:
- Long run time
- Computer resources not fully utilized: a single process per stage
- No recoverability: if indexing failed, the whole building process had to be rerun
Parallel Processing
In 14.2, all index creation jobs (with the exception of p_manage_27) support parallel processing.
Parallel Processing
- Optimal utilization of computer resources (large databases, multiple processors)
- Certain stages of index creation can be split into several cycles, which allows you to divide the workload among different processors
- Indexing is much quicker
Parallel Processing – Tracking
Assignment progress table: good control of indexing stages
0001 + + ? - 000000001 000010000
0002 + ? - - 000010001 000020000
0003 + - - - 000020001 000030000
0004 ? - - - 000030001 000040000
0005 ? - - - 000040001 000050000
0006 ? - - - 000050001 000060000
0007 - - - - 000060001 000070000
0008 - - - - 000070001 000080000
0009 - - - - 000080001 000090000
Legend: + success, ? in process, - not processed
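As an illustration of how the progress table can be read, the following Python sketch parses a few rows and reports how many stages each cycle has completed. The row layout (cycle number, one flag per stage, first and last document number) is inferred from the slide, and the names `Cycle`, `parse_row` and `completed_stages` are hypothetical. It is not a description of how ALEPH itself stores or reads the table.

```python
from typing import NamedTuple

# A few rows copied from the slide.
SAMPLE = """\
0001 + + ? - 000000001 000010000
0002 + ? - - 000010001 000020000
0003 + - - - 000020001 000030000
0004 ? - - - 000030001 000040000
"""


class Cycle(NamedTuple):
    number: str
    stages: tuple  # one flag per stage: '+', '?' or '-'
    first_doc: int
    last_doc: int


def parse_row(line: str) -> Cycle:
    """Split one row; assumed layout: number, stage flags, first doc, last doc."""
    fields = line.split()
    return Cycle(fields[0], tuple(fields[1:-2]), int(fields[-2]), int(fields[-1]))


def completed_stages(cycle: Cycle) -> int:
    """Count the leading '+' flags, i.e. the stages that finished for this cycle."""
    count = 0
    for flag in cycle.stages:
        if flag != "+":
            break
        count += 1
    return count


cycles = [parse_row(line) for line in SAMPLE.splitlines() if line.strip()]
for c in cycles:
    state = "in process" if "?" in c.stages else "idle"
    print(c.number, completed_stages(c), state)
```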
Parallel Processing - Recovery
If:
- database tables need to be enlarged
- there is not enough disk space for intermediate files
- there is not enough disk space for the sort
- there is a general disaster
You do not have to rerun the whole process!
Parallel Processing - Recovery
Recovery stages:
- identify the last successful section
- change "in process" signs (?) to "not processed" signs (-)
- rerun the discrete stage scripts, for example:
  p_manage_01_a
  p_manage_01_c
  p_manage_01_d
  p_manage_01_d1
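The second recovery stage, resetting "in process" signs, can be sketched in Python as follows. The sketch works on a text copy of the progress table in the layout shown on the tracking slide. The helper name `reset_in_process` is hypothetical, and where and how ALEPH actually keeps this table is not covered here.

```python
def reset_in_process(table_text: str) -> str:
    """Return the table with every '?' stage flag changed to '-'.

    Assumed row layout: cycle number, stage flags, first doc, last doc.
    Only the flag columns are touched; the numbers are left as they are.
    """
    out = []
    for line in table_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            # Not a table row (blank line, legend, ...): keep it unchanged.
            out.append(line)
            continue
        flags = ["-" if flag == "?" else flag for flag in fields[1:-2]]
        out.append(" ".join([fields[0], *flags, *fields[-2:]]))
    return "\n".join(out)


before = "0001 + + ? - 000000001 000010000\n0004 ? - - - 000030001 000040000"
print(reset_in_process(before))
```

After the reset, the interrupted stages read as "not processed", so the corresponding stage scripts can be rerun for those cycles only.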
Parallel Processing – Main Features
- Indexing is quicker
- Tracking is easier
- Recoverability is possible
Updating a Group of Indexes
Updating a Group of Indexes
p_manage_01 and p_manage_02 have a new feature that allows you to update a specific group of indexes. Col. 8 defines a group of headings/word indexes for updating:
11 W 008 F07-04 01 A WRD WYR
11 W 008 F35-03 01 A WRD WLN
11 W LOC## -o 03 WRD WCL
11 W 041## abdefg 41 A WRD WLN
Updating a Group of Indexes
This option is available only when the program is run from the command line. It is not available from the Web Services. The following example shows how to run the programs for fields that belong to group B:
csh -f p_manage_02 USM01,1,000000000,999999999,B,1,0,00,
csh -f p_manage_01 USM01,1,000000000,999999999,B,1,0,00,
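If the same run is needed for another group, the parameter string can be derived from the example. The Python sketch below assumes that the group code is the fifth comma-separated parameter, which is inferred only from the position of "B" in the example above. The meaning of the other parameters is not stated here and they are left untouched. The helper name `with_group` is hypothetical.

```python
EXAMPLE_PARAMS = "USM01,1,000000000,999999999,B,1,0,00,"


def with_group(params: str, group: str) -> str:
    """Return the parameter string with the assumed group field replaced."""
    fields = params.split(",")
    fields[4] = group  # assumption: the fifth field carries the group code
    return ",".join(fields)


# Print the two command lines for a hypothetical group "A".
for job in ("p_manage_02", "p_manage_01"):
    print("csh -f", job, with_group(EXAMPLE_PARAMS, "A"))
```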
Z0102 – COUNTERS FOR LOGICAL BASES
z0102
Pre-14.2 problem: Scanning logical bases that are less than 50% of the total database is very inefficient (slow, with irrelevant unlinked headings).
Solution: There is a new index, z0102, which ‘divides’ z01 into sections in accordance with the existing logical bases.
z0102
[Figure: example of a z0102 record]
z0102
When a logical base is being browsed, the system uses the Z0102 table to “decide” whether to display the heading (Z01), without having to retrieve the documents attached to the heading, read them, and then “decide”.
z0102
Structure:
- A record is built for each Z01 and each logical base, giving the filing text and sequence (in order to make the SCAN more efficient) and a counter of the number of relevant docs.
- Records are built for "see" reference headings, as well as for preferred headings.
- The record does not include pointers to the doc records; this is still done by Z02.
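The idea of a per-heading, per-base counter can be illustrated with a toy model in Python. Everything in it is hypothetical: the headings, the base names, the document ids and the function names are invented for the illustration, and sorting by heading text merely stands in for the filing text and sequence. It is a conceptual sketch, not the Z0102 record layout.

```python
# Hypothetical toy data: heading -> ids of linked documents (the role Z02 plays).
heading_docs = {
    "Austen, Jane": [1, 2, 3],
    "Bronte, Emily": [4],
}
# Hypothetical logical bases: base name -> ids of the documents in that base.
bases = {
    "LAW": {1, 4},
    "MED": {2},
}


def build_counters(heading_docs, bases):
    """Build one counter per (heading, base): number of linked docs in the base."""
    counters = {}
    for heading, docs in heading_docs.items():
        for base, members in bases.items():
            counters[(heading, base)] = sum(1 for doc in docs if doc in members)
    return counters


def browse(counters, base):
    """List headings relevant to a base using only the counters, not the docs."""
    return sorted(h for (h, b), n in counters.items() if b == base and n > 0)


counters = build_counters(heading_docs, bases)
print(browse(counters, "MED"))
```

Once the counters exist, `browse` never looks at the documents themselves, which is the saving the z0102 index is meant to provide when a logical base is browsed.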
z0102
Building the table:
Run p_manage_32 to create z0102
Run p_manage_34 to update z0102
- p_manage_32 runs on all Z01 records and builds Z0102. When p_manage_02 is run, p_manage_32 should be run directly afterwards.
- p_manage_34 runs on Z01 records that have been "touched" since the last time p_manage_32 or p_manage_34 was run. It should be run on a regular basis, i.e. nightly, listed in the job_list (UTIL E/15/1).
z0102
Z01 records that have been "touched":
- Z01 has a new field, Z01-UPDATE-Z0102.
- p_manage_02 sets this flag to "Y".
- p_manage_32 and p_manage_34 set this flag to "N".
- An update of a Z01 record sets Z01-UPDATE-Z0102 to "Y".
- p_manage_34 re-indexes Z01 records that have Z01-UPDATE-Z0102 = "Y".
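The life cycle of the Z01-UPDATE-Z0102 flag can be traced with a small Python simulation. The class and function names are hypothetical stand-ins for the jobs named above, and the only behaviour modelled is the setting and reading of the flag as described on this slide.

```python
from dataclasses import dataclass


@dataclass
class Z01:
    heading: str
    update_z0102: str = "N"  # stands in for the Z01-UPDATE-Z0102 field


def manage_02(records):
    """Rebuilding the headings marks every record as needing a z0102 build."""
    for rec in records:
        rec.update_z0102 = "Y"


def manage_32(records):
    """Full build: process all records and clear the flag."""
    for rec in records:
        rec.update_z0102 = "N"
    return [rec.heading for rec in records]


def touch(rec):
    """Any update of a Z01 record sets the flag again."""
    rec.update_z0102 = "Y"


def manage_34(records):
    """Incremental update: process only flagged records, then clear the flag."""
    done = []
    for rec in records:
        if rec.update_z0102 == "Y":
            done.append(rec.heading)
            rec.update_z0102 = "N"
    return done


records = [Z01("Austen, Jane"), Z01("Bronte, Emily")]
manage_02(records)
manage_32(records)
touch(records[1])
print(manage_34(records))  # only the touched heading is re-indexed
print(manage_34(records))  # nothing left to do on the next run
```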
z0102
Restrictions:
- Z0102 is used only for the WEB OPAC browse.
- A new switch in the WEB OPAC defines which tables are involved in BROWSE:
  If TAB10-Z0102-IN-USE = ‘Y’, browse is performed by z0102.
  If TAB10-Z0102-IN-USE = ‘N’, z0102 does not participate in BROWSE.
- Presently, there is no online update of z0102.
New Batch Jobs for AUT Enrichment
Pre-14.2: AUT enrichment and correction of BIB after initial conversion or re-indexing is very time-consuming (it takes up to several days).
Solution: New batch jobs for AUT enrichment and correction of BIB libraries. These batch jobs replace the background running of ue_08 after a re-indexing of the z01 indexes.
- p_manage_102: enriches the BIB z01 index from the entire AUT library
- p_manage_104: resets the Z01 records created from regular indexing to "-CHK-" status
- p_manage_103: sends Z07 records to all potential "corrected" BIB docs