Published by Elijah Stevens. Modified over 11 years ago.
1
Problems with Non-roman Character (Korean) Searching
Prepared by Young Ki Lee, Senior Cataloging Specialist, Korean/Chinese Team, RCCD, Library of Congress
2
Topics to be covered
1. Non-roman script (Korean) searching under CJK data fields without spacing
2. No unified index (normalization) between Hangul (Korean) and Hancha (Chinese characters)
3. Microsoft Korean IME
4. Display of search results
5. CJK Compatibility Database
3
Title Word Search for the Korean word for "the border"
- The ti: search retrieves 363 hits.
- Only 13% of the hits are relevant (13 out of 99) in the first group (Books, 1970-1993).
- The system retrieves every record that contains the word in any position in the title fields, including where it spans subfields.
- In Voyager (the LC Online Catalog), where Korean text currently includes word spacing, the same search (tkey) retrieves only 9 hits.
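The gap between 363 and 9 hits can be illustrated with a toy model. This sketch assumes (it is not a description of OCLC's or Voyager's actual indexing) that an index over unspaced CJK text behaves like a substring match, while a spaced index matches whole words only. The titles are hypothetical, and 국경 ("border") is used only as an illustrative term.

```python
# Toy model only: not how OCLC or Voyager actually index CJK data.

def unspaced_match(term: str, title: str) -> bool:
    """Match the term anywhere once all spacing is removed."""
    return term.replace(" ", "") in title.replace(" ", "")


def word_match(term: str, title: str) -> bool:
    """Match the term only as a complete space-delimited word."""
    return term in title.split()


# Hypothetical titles: only the first is actually about borders.
titles = [
    "국경 지대",    # "border zone"
    "한국 경제",    # "Korean economy": 국 + 경 straddle a word boundary
    "중국 경제사",  # "Chinese economic history": same accidental sequence
]

term = "국경"  # "border"
print([t for t in titles if unspaced_match(term, t)])  # all three titles
print([t for t in titles if word_match(term, t)])      # only "국경 지대"
```

Once spacing is removed, any title in which the two syllables happen to sit side by side counts as a hit, which is one plausible source of the low relevance ratio.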
13
Title Word Search for the Korean word for "philology"
- In OCLC, the ti: search retrieves 308 hits.
- Only about 38% of the hits are relevant (36 out of 95) in the first group (Books, 1900-1991).
- In Voyager, where Korean text currently includes word spacing, the same search (tkey) retrieves 32 hits.
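The relevance figures on these slides are precision ratios (relevant hits divided by hits examined), computed over the first group of results only, not over the full result set. A minimal check of the arithmetic; the `precision` helper is illustrative:

```python
def precision(relevant: int, examined: int) -> float:
    """Share of examined hits that are relevant, as a percentage."""
    return 100 * relevant / examined


# Figures from the slides: first group of OCLC ti: hits examined.
print(f"the border: {precision(13, 99):.1f}%")  # 13.1%
print(f"philology:  {precision(36, 95):.1f}%")  # 37.9%
```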
14
Title Word Search for the name of an ancient Korean country
- The search retrieves irrelevant records.
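One way irrelevant records can arise is a match that spans subfields. This hypothetical sketch assumes the unspaced title index is built by joining the subfield texts of a title field with no separator. The MARC 245 subfield codes are real, but the indexing rule, the field contents and the helper names are assumptions for illustration.

```python
# Hypothetical model: subfield texts joined without spacing before matching.

def field_index_text(subfields: list[tuple[str, str]]) -> str:
    """Concatenate subfield values (code, value) with no separator or spacing."""
    return "".join(value.replace(" ", "") for _code, value in subfields)


def subfield_hits(term: str, subfields: list[tuple[str, str]]) -> list[str]:
    """Subfield codes whose own text contains the term."""
    return [code for code, value in subfields if term in value.replace(" ", "")]


# Hypothetical 245 field: $a "중국" (China), $b "경제 연구" (economic studies).
field_245 = [("a", "중국"), ("b", "경제 연구")]
term = "국경"  # "border"

print(term in field_index_text(field_245))  # True: match spans $a and $b
print(subfield_hits(term, field_245))       # []: no single subfield matches
```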
22
Title Word Search for "Korean Economy" (ti: search)
- Searches in four different Korean-script forms of the term: 300, 652, 3 and 0 hits
- Search Hanguk kyongje (romanized): 1,490 hits
Title Phrase search for the same term: ti= search
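Without a unified index between Hangul and Hancha (topic 2), different forms of the same title return very different hit counts. This sketch shows one possible normalization, assuming a reading table that maps each Hancha to its Hangul reading and assuming spacing is ignored; it does not describe any existing system. The four-entry `READINGS` table is a hypothetical excerpt. Real tables must also handle characters with several readings and the initial-sound rule (for example 勞 read 노 at the start of a word), which this toy ignores.

```python
# Hypothetical excerpt of a Hancha-to-Hangul reading table.
READINGS = {"韓": "한", "國": "국", "經": "경", "濟": "제"}


def index_key(text: str) -> str:
    """Normalize a title string: Hancha -> Hangul reading, spacing removed."""
    return "".join(READINGS.get(ch, ch) for ch in text if not ch.isspace())


variants = ["한국경제", "한국 경제", "韓國經濟", "韓國 經濟"]
print({index_key(v) for v in variants})  # {'한국경제'}: one key for all four forms
```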
27
Title Phrase Search for "Korean Economy" (ti: search)
- Searches in four different Korean-script forms of the term: 295, 652, 3 and 0 hits
- Search Hanguk kyongje (romanized): 1,490 hits
- Search with #: 461 hits (same as combining the two words as ti: ... AND ti: ...)
Title Phrase search for the same term: ti= search
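The slide pairs the 461-hit search with an AND of two ti: word searches, which matches records containing both words anywhere, while a phrase search requires them to be adjacent and in order. A simplified model of the two behaviors (not OCLC's exact matching rules), using hypothetical romanized titles:

```python
def and_match(words: list[str], title: str) -> bool:
    """Word-AND style: every word occurs somewhere in the title."""
    tokens = title.lower().split()
    return all(w.lower() in tokens for w in words)


def phrase_match(words: list[str], title: str) -> bool:
    """Phrase style: the words occur adjacent to each other and in order."""
    tokens = title.lower().split()
    target = [w.lower() for w in words]
    n = len(target)
    return any(tokens[i:i + n] == target for i in range(len(tokens) - n + 1))


# Hypothetical romanized titles.
titles = ["Hanguk kyongje ui ihae", "Kyongje wa Hanguk sahoe"]
words = ["Hanguk", "kyongje"]

print([t for t in titles if and_match(words, t)])     # both titles
print([t for t in titles if phrase_match(words, t)])  # first title only
```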
28
Search ti: nodongja, combined with OR with three Korean-script forms of the same word
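Combining the romanized form with the Korean-script forms in one OR search is a workaround for the missing unified index. A minimal helper that builds such a string, assuming the "ti: X or Y" pattern shown on the slide; the function name and the variant forms are hypothetical, and the exact OCLC search syntax should be checked before use.

```python
def or_query(index: str, forms: list[str]) -> str:
    """Join variant forms of one term into a single OR search string.

    Duplicates are dropped while keeping the original order.
    """
    unique = list(dict.fromkeys(forms))
    return f"{index} " + " or ".join(unique)


# Hypothetical variant forms of "nodongja" (worker): romanized, Hangul, Hancha.
forms = ["nodongja", "노동자", "勞動者", "노동자"]
print(or_query("ti:", forms))  # ti: nodongja or 노동자 or 勞動者
```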
30
Korean IME Problems
1. Personal name search with an invalid character from the Korean IME
   - Search in pn: with U+F9E1: 0 hits. U+F9E1 is an invalid character produced by the Korean IME.
   - Search in pn: with U+674E: 157 hits. U+674E is a valid MARC 21 character.
2. Title search with an invalid character from the Korean IME
   - Search in ti: with U+F941: 0 hits. U+F941 is an invalid character produced by the Korean IME.
   - Search in ti: with U+8AD6: 21,393 hits. U+8AD6 is a valid MARC 21 character.
3. Korean family name
   - No MARC 21 equivalent
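Both failing searches use characters from the Unicode CJK Compatibility Ideographs block (U+F900 to U+FAFF). These characters have canonical singleton decompositions, so Unicode normalization (NFC or NFD) maps each one to the unified ideograph the slide identifies as valid MARC 21. A minimal sketch using Python's `unicodedata`; whether a given catalog normalizes search input this way is an assumption, and normalization cannot help with the family-name case, which has no MARC 21 equivalent at all.

```python
import unicodedata


def is_cjk_compatibility(ch: str) -> bool:
    """Block membership only; a few code points here (e.g. U+FA0E) are
    unified ideographs with no decomposition."""
    return 0xF900 <= ord(ch) <= 0xFAFF


def normalize_compat(text: str) -> str:
    """Map compatibility ideographs to unified ones via NFC."""
    return unicodedata.normalize("NFC", text)


for cp in (0xF9E1, 0xF941):
    ch = chr(cp)
    fixed = normalize_compat(ch)
    print(f"U+{cp:04X} compat={is_cjk_compatibility(ch)} -> U+{ord(fixed):04X}")
# U+F9E1 compat=True -> U+674E
# U+F941 compat=True -> U+8AD6
```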
31
Display Order
1. Browse search: sorted by Unicode value (number – roman – Japanese – Hancha – Hangul)
2. Keyword search: sorted alphabetically by romanized form (number – romanization)
3. Display order: character by character on the designated value, NOT word by word
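Sorting by Unicode value alone yields the browse order above: digits and roman letters have the lowest code points, followed by Japanese kana (U+3040 to U+30FF), CJK unified ideographs (U+4E00 to U+9FFF) and Hangul syllables (U+AC00 to U+D7A3). Japanese kanji would fall among the Hancha. Python compares strings by code point, so a plain `sorted` shows the effect; `script_of` is a hypothetical helper covering only these ranges, and the headings are illustrative.

```python
def script_of(ch: str) -> str:
    """Rough script label for the first character of a heading."""
    cp = ord(ch)
    if ch.isdigit():
        return "number"
    if cp < 0x80:
        return "roman"
    if 0x3040 <= cp <= 0x30FF:
        return "Japanese (kana)"
    if 0x4E00 <= cp <= 0x9FFF:
        return "Hancha"
    if 0xAC00 <= cp <= 0xD7A3:
        return "Hangul"
    return "other"


# Hypothetical headings: Hangul, Hancha, roman, number, kana, Hancha.
headings = ["\uAC00", "\u9B42", "abc", "1990", "\u3042", "\u9280"]
print([script_of(h[0]) for h in sorted(headings)])
```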
32
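Unicode value, total strokes, and radical (number: strokes):
- U+9280: 14 strokes; radical 167 (gold): 8
- U+9580: 8 strokes; radical 169 (gate): 8
- U+990A: 15 strokes; radical 184 (eat): 6
- U+9B42: 14 strokes; radical 194 (ghost): 10
- U+AC00: Hangul syllable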
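The table shows why Unicode order is also radical order for Hancha: the original CJK Unified Ideographs block is arranged by radical and then by remaining stroke count (later extension blocks restart that sequence). Sorting the four code points therefore gives radicals 167, 169, 184, 194, and the Hangul syllable U+AC00 sorts after all of them. A quick check using only the values from the table:

```python
# Rows from the table: (code point, radical number), deliberately shuffled.
rows = [(0x9B42, 194), (0x9280, 167), (0x990A, 184), (0x9580, 169)]

by_code_point = sorted(rows)                   # Unicode order
by_radical = sorted(rows, key=lambda r: r[1])  # radical order

print(by_code_point == by_radical)          # True
print(all(cp < 0xAC00 for cp, _ in rows))   # True: Hangul U+AC00 sorts last
```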
36
Hangul syllables: U+C9C4, U+CE68, U+C911, U+C778
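One reading of "character by character, NOT word by word" is the filing distinction between letter-by-letter order, in which spaces are ignored, and word-by-word order, in which a space (U+0020) files before any character. The sketch builds two hypothetical headings from the syllables listed above (U+C9C4 진, U+C778 인, U+CE68 침); which convention a given OPAC actually uses is an assumption to verify.

```python
# Two hypothetical headings: "\uC9C4\uC778" (no space), "\uC9C4 \uCE68" (with space).
headings = ["\uC9C4\uC778", "\uC9C4 \uCE68"]

# Word by word: the space (U+0020) compares lower than any Hangul syllable.
word_by_word = sorted(headings)

# Character by character: spaces are ignored, so U+C778 < U+CE68 decides.
char_by_char = sorted(headings, key=lambda s: s.replace(" ", ""))

print([h.replace(" ", "_") for h in word_by_word])
print([h.replace(" ", "_") for h in char_by_char])
```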
38
CJK Compatibility Database
1. The CJK Compatibility Database includes more than 450 non-MARC21 Chinese, Japanese and Korean characters, Hangul syllables and diacritic marks, matched with their MARC21 equivalents.
2. The database is intended to enable catalogers to quickly and conveniently replace a non-MARC21 character with its MARC21 equivalent.
3. The list of characters in the database was initially identified by LC staff, and was supplemented by entries in a similar database at Yale University.
4. The database is a cooperative undertaking and is intended for the use of all CJK catalogers. If you encounter a non-MARC21 character in the course of your work, please report it to us so that it can be added to the database. Notify Young Ki Lee, Senior Cataloging Specialist, Korean/Chinese Team, Library of Congress, at ylee@loc.gov.
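A minimal sketch of how a cataloger's tool might apply such a table: known non-MARC21 characters are replaced, and unlisted characters from the CJK Compatibility Ideographs block are collected so they can be reported. The two-entry `COMPAT_TABLE` reuses the pairs cited on the Korean IME slide; the table excerpt, the function name and the block-range check are assumptions. The real database also covers Hangul syllables and diacritic marks, which this check does not detect.

```python
# Hypothetical excerpt of a non-MARC21 -> MARC21 table (pairs from the IME slide).
COMPAT_TABLE = {"\uF9E1": "\u674E", "\uF941": "\u8AD6"}


def replace_non_marc(text: str) -> tuple[str, list[str]]:
    """Replace known characters; collect unlisted compatibility ideographs to report."""
    out: list[str] = []
    unknown: list[str] = []
    for ch in text:
        if ch in COMPAT_TABLE:
            out.append(COMPAT_TABLE[ch])
        else:
            if 0xF900 <= ord(ch) <= 0xFAFF:
                unknown.append(f"U+{ord(ch):04X}")  # candidate for the database
            out.append(ch)
    return "".join(out), unknown


fixed, to_report = replace_non_marc("\uF9E1\uF92F")
print([f"U+{ord(c):04X}" for c in fixed], to_report)
```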
39
Thank you