Filing and Word Breaking Procedures

Session Agenda
  Pre-14.x
  tab_word_breaking table
    Structure
    Procedures
    Special remarks
  tab_filing table
    Structure
    Procedures

Pre-14.x
Various filing and word breaking procedures existed. Each procedure included many parts, but was a closed box. Each procedure was assigned a code, such as B1, B5, C1, A3 or AM. Each procedure was a separate program, so creating a new procedure required new program development. For example, there was no combined A3 + AM filing procedure.

From 14.1 onwards
ALEPH provides ready-made components (programs) for creating filing and word breaking procedures.
  ./tab/tab_word_breaking - an ALEPH table that identifies word breaking procedures and defines their component parts
  ./tab/tab_filing - an ALEPH table that identifies filing procedures and defines their component parts

tab_word_breaking
./tab/tab_word_breaking is an ALEPH table that identifies word breaking procedures and defines their component parts. Each word breaking procedure is made up of a group of one or more programs.

!!-!-!!!!!!!!!!!!!-!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
03 L abbreviation
03 L numbers
03 L compress -
03 L to_blank
col.1: procedure identifier
col.2: alpha of the text
col.3: procedure name
col.4: procedure parameters
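
The four-column layout above can be read with a few lines of Python. This is only a sketch: it assumes that lines starting with "!" are ruler or comment lines and that the columns can be separated on whitespace, whereas the real table is positional. The names `ProcedureLine` and `parse_table` are hypothetical and are not part of ALEPH.

```python
from typing import List, NamedTuple


class ProcedureLine(NamedTuple):
    procedure_id: str  # col.1: procedure identifier
    alpha: str         # col.2: alpha of the text
    name: str          # col.3: procedure name
    parameters: str    # col.4: procedure parameters (may be empty)


def parse_table(text: str) -> List[ProcedureLine]:
    """Parse table text into rows, skipping blank and '!' lines (assumption)."""
    rows = []
    for raw in text.splitlines():
        line = raw.rstrip()
        if not line.strip() or line.lstrip().startswith("!"):
            continue
        # Split into at most four parts so col.4 keeps any embedded blanks.
        parts = line.split(None, 3)
        if len(parts) < 3:
            continue  # not a complete procedure line
        parts += [""] * (4 - len(parts))
        rows.append(ProcedureLine(*parts))
    return rows


SAMPLE = """\
!!-!-!!!!!!!!!!!!!-!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
03 L abbreviation
03 L numbers
03 L compress -
03 L to_blank
"""

rows = parse_table(SAMPLE)
for row in rows:
    print(row)
```

Keeping the rows in file order matters, because the procedures of one identifier are applied in the order in which they are listed.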

Procedures (1)
  compress - Strips the characters listed in col. 4
  delete_subfield - Changes the sub-field sign (e.g., $$x) to a blank
  to_blank - Changes the characters listed in col. 4 to blanks

Procedures (2)
  subf_to_sign - Changes the second and subsequent sub-field signs to the single character listed in col. 4
  blank_to_carat - Changes blanks to carets (^)
  marc21_ - For separating languages in MARC21 field 041

Procedures (3)
  abbreviation - Compresses a dot between single characters (e.g., I. B. M. changes to I B M; I.B.M. changes to IBM)
  numbers - Compresses a comma or a dot between digits (e.g., 2,153 changes to 2153)
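
The two procedures on this slide can be approximated with regular expressions. The sketch below is not the ALEPH implementation. It assumes that "single character" means a single ASCII letter standing on its own, and that "between numbers" means between two digits.

```python
import re


def abbreviation(text: str) -> str:
    """Remove the dot that follows a stand-alone single letter."""
    # \b ensures the letter is not the tail of a longer word such as "etc."
    return re.sub(r"\b([A-Za-z])\.", r"\1", text)


def numbers(text: str) -> str:
    """Remove a comma or a dot that sits directly between two digits."""
    return re.sub(r"(?<=\d)[.,](?=\d)", "", text)


print(abbreviation("I. B. M."))  # I B M
print(abbreviation("I.B.M."))    # IBM
print(numbers("2,153"))          # 2153
```

A dot at the end of a longer word, as in "etc.", is left alone, so it remains available to a later compress or to_blank step.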

Procedures (4)
IMPORTANT NOTE
The procedures must be listed in logical order. For example, numbers must be listed before compress or to_blank if the characters handled by those procedures include a comma or a dot. Otherwise, the commas and dots will no longer be present when the numbers procedure runs.
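
The effect of the ordering rule can be seen by running the same two steps in both orders. This is a minimal sketch with stand-in functions, assuming that to_blank has been given the comma and the dot as parameters. The helper `run` is hypothetical.

```python
import re


def numbers(text: str) -> str:
    """Remove a comma or a dot between two digits."""
    return re.sub(r"(?<=\d)[.,](?=\d)", "", text)


def to_blank(text: str, chars: str) -> str:
    """Replace each character listed in chars with a blank."""
    return text.translate({ord(c): " " for c in chars})


def run(text: str, steps) -> str:
    """Apply the steps one after another, in the order listed."""
    for step in steps:
        text = step(text)
    return text


blank_punctuation = lambda t: to_blank(t, ",.")

good = run("pages 2,153", [numbers, blank_punctuation])
bad = run("pages 2,153", [blank_punctuation, numbers])

print(repr(good))  # 'pages 2153'
print(repr(bad))   # 'pages 2 153'
```

In the second order the comma has already become a blank by the time numbers runs, so the number is split into two words.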

Procedures (5)
Reminder
Word breaking procedures are used in tab11, section W. A line can be listed several times in tab11 in order to index the field several times, with different word breaking each time. For example, an apostrophe:
  O’hara
  Ohara
  O hara
  11 W 100## abcdq 01 B WRD WAU
  11 W 100## abcdq 04 B WRD WAU
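
Indexing one field with two word breaking procedures can be pictured as follows. The slide does not say what procedures 01 and 04 contain, so the definitions here are purely hypothetical: 01 is assumed to change the apostrophe to a blank and 04 to compress it. A straight apostrophe is used for simplicity.

```python
def compress(text: str, chars: str) -> str:
    """Strip every character listed in chars."""
    return text.translate({ord(c): None for c in chars})


def to_blank(text: str, chars: str) -> str:
    """Replace every character listed in chars with a blank."""
    return text.translate({ord(c): " " for c in chars})


# Hypothetical procedure definitions, for illustration only.
PROCEDURES = {
    "01": [lambda t: to_blank(t, "'")],
    "04": [lambda t: compress(t, "'")],
}


def index_words(text: str, procedure_ids) -> set:
    """Collect the words produced by each listed procedure."""
    words = set()
    for pid in procedure_ids:
        broken = text
        for step in PROCEDURES[pid]:
            broken = step(broken)
        words.update(broken.split())
    return words


words = index_words("O'hara", ["01", "04"])
print(sorted(words))  # ['O', 'Ohara', 'hara']
```

A search for either "Ohara" or "O hara" then finds the record, because both forms were written to the word index.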

unicode_to_word_gen
Word indexing routines, as well as retrieval routines, use the table defined under instance WORD-FIX in ./alephe/unicode/tab_character_conversion_line. The table is traditionally called unicode_to_word_gen.

unicode_to_word_gen
This table defines equivalencies for characters, for the purpose of creating words in the words file. All characters naturally retain their Unicode value and are stored in the system in UTF encoding. To translate one character into another (e.g. an accented "e" to "e"), you can set an equivalency. The equivalency can be up to 5 characters long:
  00E #LATIN SMALL LETTER AE
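
An equivalency table of this kind behaves like a mapping from one Unicode code point to a replacement of up to 5 characters. The sketch below uses a Python dictionary in place of the real table file. The three mappings are illustrative values chosen for this example and are not taken from the shipped unicode_to_word_gen table.

```python
MAX_EQUIVALENCY_LENGTH = 5

# Illustrative equivalencies only (assumed values, not the shipped table).
EQUIVALENCES = {
    0x00E9: "e",   # LATIN SMALL LETTER E WITH ACUTE
    0x00E6: "ae",  # LATIN SMALL LETTER AE
    0x00FC: "ue",  # LATIN SMALL LETTER U WITH DIAERESIS
}


def apply_equivalences(text: str, table=EQUIVALENCES) -> str:
    """Replace each mapped character; unmapped characters keep their value."""
    for code_point, replacement in table.items():
        if len(replacement) > MAX_EQUIVALENCY_LENGTH:
            raise ValueError(f"equivalency for U+{code_point:04X} is too long")
    return text.translate(table)


print(apply_equivalences("caf\u00e9"))             # cafe
print(apply_equivalences("m\u00fcller"))           # mueller
print(apply_equivalences("encyclop\u00e6dia"))     # encyclopaedia
```

Characters without an entry pass through unchanged, which matches the statement that characters naturally retain their Unicode value.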

unicode_to_word_gen
The library's tab_word_breaking table can define different treatment for the same characters. In separate procedures, specific characters can be set to be compressed or to be changed to blank. Characters dealt with in this manner should be left in their natural value, and not translated in this table. For example, you might want an apostrophe to be treated as a blank, as itself, and as if it were not there at all (e.g. o hara, o'hara, ohara). In order to be able to set the apostrophe in tab_word_breaking both as a blank and as a compressed character, it must retain its natural value, and NOT be translated in this table.

Special Remarks
2. When browsing a word index in the OPAC, special characters are always displayed in their converted state. That is, if the unicode_to_word_gen table converts a u with an umlaut to ue, the word is displayed with ue and not with the umlaut.

tab_filing - Example
01 L del_subfield
01 L to_lower
01 L abbreviation
01 L suppress
01 L compress '
01 L to_blank ={}[]:";<>?,./~`
01 L mc_to_mac
01 L pack_spaces
01 L char_conv FILING-KEY C chi

tab_filing - Structure
!!-!-!!!!!!!!!!!!!!!!!!!!-!!!!!!!!!!!!!!>
01 L compress ’
01 L char_conv FILING-KEY-01
col.1: procedure identifier
col.2: alpha of the text
col.3: procedure name
col.4: procedure parameters

tab_filing Procedures (1)
  compress - Strips the characters listed in col. 4 (e.g., ()[]:,)
  delete_subfield - Changes the subfield sign (e.g., $$x) to a blank
  to_blank - Changes the characters listed in col. 4 to blanks

tab_filing Procedures (2)
  to_lower - Changes all characters to lower case
  to_carat - Changes the subfield sign to two caret (^^) signs in order to achieve hierarchical sorting of headings
  suppress - Suppresses all text contained within << >>, as well as the signs themselves

tab_filing Procedures (3)
  expand_num - For filing numbers numerically, adds leading zeroes to numbers to give a fixed length of 7 (e.g. 17 -> 0000017)
  mc_to_mac - Changes initial “mc” to “mac” (for interfiling McKay and MacKay)
  non_filing - Suppresses initial text according to the non-filing indicator defined in tab11

tab_filing Procedures (4)
  compress_blank - Strips blanks (e.g. ISBN)
  numbers - Compresses a comma or a dot between digits (e.g., 2,153 changes to 2153)
  non_numeric - Deletes all non-numeric characters (for ISBN, ISSN)
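
Several of the tab_filing procedures listed so far are simple string transformations, and a rough Python equivalent may make their effect easier to see. These functions are sketches based only on the one-line descriptions on the slides. In particular, mc_to_mac assumes that "initial" means the start of the text and that to_lower has already run.

```python
import re


def expand_num(text: str, width: int = 7) -> str:
    """Pad every run of digits with leading zeroes to a fixed width."""
    return re.sub(r"\d+", lambda m: m.group().zfill(width), text)


def mc_to_mac(text: str) -> str:
    """Change an initial 'mc' to 'mac' (assumes lower-cased input)."""
    return "mac" + text[2:] if text.startswith("mc") else text


def compress_blank(text: str) -> str:
    """Strip all blanks."""
    return text.replace(" ", "")


def non_numeric(text: str) -> str:
    """Delete every character that is not a digit."""
    return re.sub(r"\D", "", text)


print(expand_num("vol 17"))              # vol 0000017
print(mc_to_mac("mckay"))                # mackay
print(compress_blank("0 306 40615 2"))   # 0306406152
print(non_numeric("0-306-40615-2"))      # 0306406152
```

With zero padding, "vol 0000002" sorts before "vol 0000017", which plain character comparison of "vol 2" and "vol 17" would not achieve. Read literally, non_numeric would also drop an "X" check digit.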

tab_filing Procedures (5)
  abbreviation - Compresses a dot between single characters (e.g., I. B. M. changes to I B M; I.B.M. changes to IBM)
  build_filing_key_lc_call_no - Special procedure for correct sequencing of LC call numbers

tab_filing Procedures (7)
  char_conv - Translates one character into another (up to 5 characters), using the conversion procedure listed in the matching line of tab_character_conversion_line in alephe/unicode
For example:
  01 L char_conv FILING-KEY-01
refers to the line
  FILING-KEY-01 ##### # line_utf2line_sb unicode_to_filing_01

unicode_to_filing_nn_source
This table is used for character conversion for filing. The table must be processed using UTIL P/3 in order to create the unicode_to_filing_nn table. This latter table is the one actually used by the system. It performs an additional translation in order to remove null characters.

unicode_to_filing_01_source
Examples:
  Latin capital letter AE: 00C
  Small letter sharp s: 00DF A

IMPORTANT NOTE
The procedures must be listed in logical order. For example, numbers must be listed before compress or to_blank if the characters handled by those procedures include a comma or a dot. Otherwise, the commas and dots will no longer be present when the numbers procedure runs.

./tab/tab_filing - usage
Filing procedures are used when building the filing key for headings (Z01), index entries (Z11) and sort keys (Z101).

./tab/tab_filing - usage
Note: if no procedure for the creation of sort keys has been defined in tab01.lng, the system uses the default filing procedure 99. Filing procedure 99 MUST be defined in tab_filing, because it provides the default sort order.
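
Since procedure 99 must always be present, a simple check of the table text can catch a missing definition before it causes sorting problems. The sketch below is a hypothetical helper and not an ALEPH utility. It assumes that lines starting with "!" are ruler or comment lines and that the procedure identifier is the first whitespace-separated item on a line.

```python
def defined_procedures(table_text: str) -> set:
    """Return the set of procedure identifiers found in the table text."""
    ids = set()
    for line in table_text.splitlines():
        if not line.strip() or line.lstrip().startswith("!"):
            continue  # skip blank lines and '!' lines (assumption)
        ids.add(line.split()[0])
    return ids


def has_default_procedure(table_text: str, default_id: str = "99") -> bool:
    """Check that the default filing procedure is defined."""
    return default_id in defined_procedures(table_text)


SAMPLE = """\
!!-!-!!!!!!!!!!!!!!!!!!!!-!!!!!!!!!!!!!!>
01 L to_lower
01 L pack_spaces
99 L to_lower
"""

print(sorted(defined_procedures(SAMPLE)))  # ['01', '99']
print(has_default_procedure(SAMPLE))       # True
```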