Problems with Non-roman Character (Korean) Searching. Prepared by Young Ki Lee, Senior Cataloging Specialist.


1 Problems with Non-roman Character (Korean) Searching. Prepared by Young Ki Lee, Senior Cataloging Specialist, Korean/Chinese Team, RCCD, Library of Congress

2 Topics to be covered
1. Non-roman script (Korean) searching under CJK data fields without spacing
2. No unified index (normalization) between Hangul (Korean) and Hancha (Chinese characters)
3. Microsoft Korean IME
4. Display of search results
5. CJK Compatibility Database

3 Title Word Search
Search ( : the border):
- The number of hits on this ti: search is 363.
- The ratio of relevant hits is only 13% (13 out of 99) in the 1st group (Books 1970-1993).
- The system picks up records that have the word in any position in the title fields (including between subfields), such as : / : / /, : /, etc.
- In Voyager (currently with space), the same search (tkey ) retrieves only 9 hits.

4 Search9

5 Title Word Search
Search ( : the border):
- The number of hits on this ti: search is 360.
- The ratio of relevant hits is only 13% (13 out of 99) in the 1st group (Books 1970-1993).
- The system picks up records that have the word in any position in the title fields (including between subfields), such as : / : / /, : /, etc.
- In Voyager (currently with space), the same search (tkey ) retrieves only 9 hits.

6 Title Word Search
Search ( : the border):
- The number of hits on this ti: search is 360.
- The ratio of relevant hits is only 13% (13 out of 99) in the 1st group (Books 1970-1993).
- Records that have the word in any position in the title fields (including between subfields) are retrieved, such as = / = / /, = /, etc.
- In Voyager (currently with space), the same search (tkey ) retrieves only 9 hits.

7 Title Word Search
Search ( : the border):
- The number of hits on this ti: search is 360.
- The ratio of relevant hits is only 13% (13 out of 99) in the 1st group (Books 1970-1993).
- Records that have the word in any position in the title fields (including between subfields) are retrieved, such as =, =, etc.
- In Voyager (currently with space), the same search (tkey ) retrieves only 9 hits.

8 Title Word Search
Search ( : the border):
- The number of hits on this ti: search is 360.
- The ratio of relevant hits is only 13% (13 out of 99) in the 1st group (Books 1970-1993).
- Records that have the word in any position in the title fields (including between subfields) are retrieved, such as = / = / = / /, = /, etc.
- In Voyager (currently with space), the same search (tkey ) retrieves only 9 hits.

10 Title Word Search
Search ( : the border):
- The number of hits on this ti: search is 360.
- The ratio of relevant hits is only 13% (13 out of 99) in the 1st group (Books 1970-1993).
- Records that have the word in any position in the title fields (including between subfields) are retrieved, such as = / = / / = /, etc.
- In Voyager (currently with space), the same search (tkey ) retrieves only 9 hits.

11 7

12 Title Word Search
Search ( : the border):
- The number of hits on this ti: search is 360.
- The ratio of relevant hits is only 13% (13 out of 99) in the 1st group (Books 1970-1993).
- Records that have the word in any position in the title fields (including between subfields) are retrieved, such as = / = / /, = /, etc.
- In the LC Online Catalog (currently with space), a title word search retrieves only 9 hits.
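
The contrast between the unspaced search and the spaced one (Voyager, LC Online Catalog) can be sketched in a few lines. This is only an illustration of why matching "in any position" pulls in unrelated titles: the titles below are hypothetical and are not the records from the slides, and the two functions are stand-ins, not the actual indexing logic of any of the systems named.

```python
def title_word_hits(term, titles):
    """Match only when the term is a whole space-delimited word of the title."""
    return [t for t in titles if term in t.split()]


def unspaced_hits(term, titles):
    """Match when the term occurs anywhere once all spaces are removed."""
    return [t for t in titles if term in t.replace(" ", "")]


# Hypothetical titles (assumption), chosen to show the effect.
TITLES = [
    "국경 의 밤",    # contains the word 국경 ("border")
    "한국 경제 론",  # 한국 + 경제: unspaced, 한국경제론 contains 국경 by accident
    "중국 경제 사",  # 중국 + 경제: the same accidental match
    "조선 문학",     # no match either way
]

word_hits = title_word_hits("국경", TITLES)
loose_hits = unspaced_hits("국경", TITLES)

print(len(word_hits), "hit(s) with spacing;", len(loose_hits), "hit(s) without")
```

In this toy set, two of the three unspaced hits are irrelevant because the last syllable of one word and the first syllable of the next happen to spell the search term.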

13 Title Word Search
Search ( : philology):
- In OCLC, the number of hits on the ti: search is 308.
- The ratio of relevant hits is only 37% (36 out of 95) in the first group (Books 1900-1991).
- Includes = = = / = / = = = / = /, = /, etc.
- In Voyager (currently with space), the same search (tkey ) retrieves 32 hits.
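
The relevance ratios quoted on these slides are simple proportions of relevant records among those examined in the first result group. A small sketch of the arithmetic, using only the counts given on the slides (the function name is my own):

```python
def precision_percent(relevant, examined):
    """Share of examined records that were judged relevant, in percent."""
    return 100.0 * relevant / examined


border = precision_percent(13, 99)     # "the border", Books 1970-1993
philology = precision_percent(36, 95)  # "philology", Books 1900-1991

print(f"border: {border:.1f}%  philology: {philology:.1f}%")
```

The second figure works out to just under 38%, which the slide reports as 37%.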

14 Title Word Search
Search ( : name of ancient Korean country) retrieves irrelevant records, such as = / / / / /, CD-ROM = CD-ROM/ / / / /, = / /, = / / / / / /, = / /, 5 = / / /5 / / / / / / / /, = / / /, etc.

15 2

16 4

17 7

18 Kochoson8

19 komunso1

20 Komunso2

21 Komunso3

22 Title Word Search for ( : Korean Economy): ti: search
- search : the number of hits 300
- search : the number of hits 652
- search : the number of hits 3
- search : the number of hits 0
- search Hanguk kyongje : the number of hits 1,490
Title Phrase search for : ti= search

23 Title Word Search for ( : Korean Economy): ti: search
- search : the number of hits 295
- search : the number of hits 652
- search : the number of hits 3
- search : the number of hits 0
- search Hanguk kyongje : the number of hits 1,490
Title Phrase search for : ti= search

26 Title Word Search for ( : Korean Economy): ti: search
- search : the number of hits 295
- search : the number of hits 652
- search : the number of hits 3
- search : the number of hits 0
- search Hanguk kyongje : the number of hits 1,499
Title Phrase search for : ti= search

27 Title Phrase Search for ( : Korean Economy): ti: search
- search : the number of hits 295
- search : the number of hits 652
- search : the number of hits 3
- search : the number of hits 0
- search Hanguk kyongje : the number of hits 1,490
- search # : the number of hits 461 (ti: AND ti: )
Title Phrase search for : ti= search
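
Topic 2 of the outline, the missing unified index between Hangul and Hancha, can be illustrated with a minimal sketch of what a normalized index key might look like. Everything here is an assumption for illustration: the Korean search strings on these slides did not survive in the transcript, so the sample strings are my own rendering of "Korean economy", and the four-entry reading table is hypothetical. A real table would need every Hancha in the catalog and a policy for characters with more than one reading.

```python
# Hypothetical Hancha-to-Hangul reading table (assumption, four entries only).
HANCHA_TO_HANGUL = {
    "韓": "한",
    "國": "국",
    "經": "경",
    "濟": "제",
}


def index_key(text):
    """Fold Hancha to Hangul readings so both scripts share one index key."""
    return "".join(HANCHA_TO_HANGUL.get(ch, ch) for ch in text)


hangul_form = index_key("한국 경제")
hancha_form = index_key("韓國 經濟")
mixed_form = index_key("韓國 경제")

print(hangul_form == hancha_form == mixed_form)
```

With a shared key of this kind, a search entered in either script would reach the same set of records instead of returning separate hit counts per script.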

28 Search ti: nodongja or or or

29

30 Korean IME Problems
1. Personal name search with invalid character from Korean IME
- Search in pn: : 0 hits. (F9E1) is an invalid character from the Korean IME.
- Search in pn: : 157 hits. (674E) is a valid MARC21 character.
2. Title search with invalid character from Korean IME
- Search in ti: : 0 hits. (F941) is an invalid character from the Korean IME.
- Search in ti: : 21,393 hits. (8AD6) is a valid MARC21 character.
3. Korean family name
- No MARC 21 equivalent
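
Both invalid code points quoted above, F9E1 and F941, lie in Unicode's CJK Compatibility Ideographs block, and Unicode defines a canonical equivalent for each that matches the valid value on the slide. A minimal sketch using Python's standard `unicodedata` module follows. It shows only the Unicode side: whether Unicode normalization yields the MARC21 equivalent for every character the IME can produce is an assumption that would need checking, and the function names are my own.

```python
import unicodedata


def in_compatibility_block(ch):
    """True if ch lies in the CJK Compatibility Ideographs block (U+F900-U+FAFF)."""
    return 0xF900 <= ord(ch) <= 0xFAFF


def to_unified(text):
    """Apply NFC, which replaces these compatibility ideographs with unified ones."""
    return unicodedata.normalize("NFC", text)


for code in (0xF9E1, 0xF941):
    fixed = to_unified(chr(code))
    print(f"U+{code:04X} -> U+{ord(fixed):04X}")
```

A search form could apply a step like this to the query before lookup, so that a term typed through the IME reaches the same index entry as the valid character.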

31 Display Order
1. Browse search: sorted by Unicode value, in the order number, roman, Japanese, Hancha, Hangul
2. Keyword search: sorted alphabetically by romanized form, in the order number, Romanization
3. Display order: character by character on designated value

32 sort2
Unicode   total strokes   radical (# : stroke)
9280      14              167 (gold) 8
9580      8               169 (gate) 8
990A      15              184 (eat) 6
9B42      14              194 (ghost) 10
AC00

33 sort3

34 Display Order
1. Browse search: sorted by Unicode value, in the order number, roman, Japanese, Hancha, Hangul
2. Keyword search: sorted alphabetically by romanized form, in the order number, Romanization
3. Display order: character by character on designated value, NOT word by word

35

36 sort1 : C9C4 : CE68 : C911 : C778

37 Display Order
1. Browse search: sorted by Unicode value, in the order number, roman, Japanese, Hancha, Hangul
2. Keyword search: sorted alphabetically by romanized form, in the order number, Romanization
3. Display order: character by character on designated value, NOT word by word
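
The browse order in point 1 follows directly from sorting by code point, because digits, roman letters, Japanese kana, Hancha and Hangul occupy successively higher ranges of Unicode. A short sketch, using the code points listed on the sort1 and sort2 slides plus a digit, a roman letter and a kana character that I added for illustration; Python's built-in string comparison is by code point, which stands in here for the catalog's own sort routine:

```python
def browse_order(headings):
    """Sort headings by Unicode code point, compared character by character."""
    return sorted(headings)


# 0xAC00 (Hangul) and 0x9280 (Hancha) come from the sort2 slide;
# "1", "a" and U+3042 (hiragana) are added for illustration.
mixed = [chr(0xAC00), chr(0x9280), "\u3042", "a", "1"]
ordered = browse_order(mixed)

# The four Hangul code points from the sort1 slide, in slide order.
hangul = browse_order([chr(c) for c in (0xC9C4, 0xCE68, 0xC911, 0xC778)])

print([f"U+{ord(c):04X}" for c in ordered])
print([f"U+{ord(c):04X}" for c in hangul])
```

Because the comparison runs character by character, word boundaries play no part in the ordering, which is the point of item 3.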

38 CJK Compatibility Database
1. The CJK Compatibility Database includes more than 450 non-MARC21 Chinese, Japanese and Korean characters, Hangul syllables and diacritic marks, matched with their MARC21 equivalents.
2. The database is intended to enable catalogers to quickly and conveniently replace a non-MARC21 character with its MARC21 equivalent.
3. The list of characters in the database was initially identified by LC staff, and was supplemented by entries in a similar database at Yale University.
4. The database is a cooperative undertaking, and is intended for the use of all CJK catalogers. If you encounter a non-MARC21 character in the course of your work, please report it to us so that it can be added to the database. Notify Young Ki Lee, Senior Cataloging Specialist, Korean/Chinese Team, Library of Congress, at ylee@loc.gov.
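
The replacement workflow described above amounts to a lookup table from non-MARC21 characters to their MARC21 equivalents. The sketch below is not the database itself: the table holds only the two pairs quoted on the Korean IME slide, and the names `NON_MARC21_TO_MARC21`, `replace_non_marc21` and `unmapped_compatibility_chars` are hypothetical. The reporting helper uses the compatibility block range as a rough heuristic of my own for spotting candidates to report.

```python
# Seed entries: the two pairs quoted on the Korean IME slide.
# The actual database is described as holding more than 450 entries.
NON_MARC21_TO_MARC21 = {
    0xF9E1: 0x674E,
    0xF941: 0x8AD6,
}


def replace_non_marc21(text, table=NON_MARC21_TO_MARC21):
    """Replace every character that has a table entry with its equivalent."""
    return text.translate(table)


def unmapped_compatibility_chars(text, table=NON_MARC21_TO_MARC21):
    """List characters in U+F900-U+FAFF that the table does not cover yet."""
    return sorted(
        {ch for ch in text if 0xF900 <= ord(ch) <= 0xFAFF and ord(ch) not in table}
    )


fixed = replace_non_marc21("\uf9e1\uf941")
to_report = unmapped_compatibility_chars("\uf9e1\uf90a")

print([f"U+{ord(c):04X}" for c in fixed])
print([f"U+{ord(c):04X}" for c in to_report])
```

Characters returned by the second function are the kind a cataloger would report so that they can be added to the shared table.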

39 Thank you

