Multilingual, Multi-script Catalog Requirements (An Arcadia Project) ________________________ January 29, 2010.


1 Multilingual, Multi-script Catalog Requirements (An Arcadia Project) ________________________ January 29, 2010

2 Jan 2010 Outline _____________________________________________________ Background about the Arcadia non-Roman script project; Introductions; Orbis vs. YUFind and systems like YUFind; Requirements discussion; Wrap-up

3 Jan 2010 Project Goals _____________________________________________________ Gap analysis of multilingual, multi-script functionality in Lucene-Solr-Solrmarc discovery applications (e.g., YUFind); Identification of desirable functionality; Collaboration opportunities, community interest; Recommendations with level-of-effort analysis

4 Jan 2010 Orbis vs. YUFind _____________________________________________________

5 vs Chinese example: “ 中日韩经济合作的新起点 ” N-gram tokens, where N=2:
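The token list from this slide did not survive in the transcript. As a minimal sketch of what overlapping character bigrams (N=2) look like for the example title, assuming plain character-level tokenization similar in spirit to Lucene's CJK bigram analysis (the function name `char_ngrams` is illustrative, not part of YUFind or Solr):

```python
def char_ngrams(text: str, n: int = 2) -> list[str]:
    """Return overlapping character n-grams, ignoring whitespace."""
    chars = [c for c in text if not c.isspace()]
    # Slide a window of width n across the characters, one step at a time.
    return ["".join(chars[i:i + n]) for i in range(len(chars) - n + 1)]


title = "中日韩经济合作的新起点"
for token in char_ngrams(title):
    print(token)
```

Because every adjacent pair of characters becomes a token, a query such as 经济 matches without the indexer needing a Chinese word segmenter.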

6 Jan 2010 Background: NR Scripts in Catalog Records _____________________________________________________

7 Jan 2010 JACKPHY _____________________________________________________ Japanese, Arabic, Chinese, Korean, Persian, Hebrew, Yiddish

8 Jan 2010 One-to-Many (CJK) _____________________________________________________ Example: “Mao Zedong”: 毛泽东 (Simplified); 毛澤東 (Traditional); 毛沢東 (Kanji, Modern)

9 Jan 2010 One-to-Many (CJK) _____________________________________________________ “Mao Zedong” in simplified Chinese characters retrieves 527 results

10 Jan 2010 One-to-Many (CJK) _____________________________________________________ The same search in traditional Chinese characters yields 154 hits. Also note the paired fields
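One way to make the simplified, traditional, and kanji forms retrieve the same records is to fold character variants to a single form at both index and query time. The sketch below assumes a tiny hand-written table covering only the "Mao Zedong" example; the table `VARIANT_FOLD` and the function `fold_variants` are hypothetical, and a production system would need a full variant mapping:

```python
# Illustrative only: maps the simplified and modern-kanji variants used in
# the "Mao Zedong" example onto their traditional forms.
VARIANT_FOLD = str.maketrans({
    "泽": "澤",  # simplified -> traditional
    "沢": "澤",  # modern Japanese kanji -> traditional
    "东": "東",  # simplified -> traditional
})


def fold_variants(text: str) -> str:
    """Fold known character variants to one canonical form."""
    return text.translate(VARIANT_FOLD)


for form in ("毛泽东", "毛澤東", "毛沢東"):
    print(form, "->", fold_variants(form))
```

Applying the same fold to indexed text and to incoming queries would let any of the three spellings match the others, at the cost of no longer distinguishing them.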

11 Jan 2010 One-to-Many (Digraphs) _____________________________________________________ ווירטשאפט The Yiddish word “Virtshaft” is entered here with two separate vavs (i.e., keystroke ‘u’ in Microsoft’s Hebrew IME): U+05D5 + U+05D5

12 Jan 2010 One-to-Many (Digraphs) _____________________________________________________ N = 49 results

13 Jan 2010 One-to-Many (Digraphs) _____________________________________________________ װירטשאפט The same word is this time entered as a double-vav digraph, U+05F0 (via MS Hebrew IME key combo right-alt+u)

14 Jan 2010 One-to-Many (Digraphs) _____________________________________________________ N = 11 results
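The 49-versus-11 split arises because the digraph U+05F0 and the sequence U+05D5 U+05D5 are different code points that render alike. The Yiddish digraphs carry no Unicode decomposition, so standard normalization alone would not unify them; an explicit fold is one option. A minimal sketch, assuming the fold is applied to both indexed text and queries (the names `YIDDISH_DIGRAPH_FOLD` and `fold_digraphs` are illustrative):

```python
# Expand the three Yiddish digraph code points into their component letters.
YIDDISH_DIGRAPH_FOLD = str.maketrans({
    "\u05F0": "\u05D5\u05D5",  # double vav -> vav + vav
    "\u05F1": "\u05D5\u05D9",  # vav yod    -> vav + yod
    "\u05F2": "\u05D9\u05D9",  # double yod -> yod + yod
})


def fold_digraphs(text: str) -> str:
    """Replace Yiddish digraphs with the equivalent two-letter sequences."""
    return text.translate(YIDDISH_DIGRAPH_FOLD)


with_digraph = "\u05F0ירטשאפט"       # typed with right-alt+u
with_two_vavs = "\u05D5\u05D5ירטשאפט"  # typed with 'u' twice
print(fold_digraphs(with_digraph) == fold_digraphs(with_two_vavs))
```

With such a fold in place, both keyboard inputs would be expected to retrieve the same result set.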

15 Jan 2010 NR Spelling Suggestions _____________________________________________________ Unhelpful suggestion?

16 Jan 2010 Labels and Facets _____________________________________________________ Should script/language of query determine script/language of facets?

17 Jan 2010 Labels and Facets _____________________________________________________ Better would be: 杉本つとむ, 1927- (11) 高橋幹夫, 1935- (11) 野口武彦. (8) 渡辺信一郎, 1934- (7) OR: Sugimoto, Tsutomu, 1927- (11) Takahashi, Mikio, 1935- (11) Noguchi, Takehiko. (8) Watanabe, Shin’ichirō, 1934- (7) But not both mixed together. Let end user decide?

18 Jan 2010 Labels and Facets _____________________________________________________ We would like to choose our preferred display script here. For example:
江戸 By: 野村兼太郎, 1896-1960. Published: 1942 Format: Book, Electronic Resource
江戶 の 翻訳家たち By: 杉本 つとむ, 1927- Published: 1995 Format: Book, Electronic Resource
We would like to ask library users which option is best for displaying parallel field data:
(a) 江戶 / 田中優子編. Contributors: 田中優子, 1952- Format: Book Language: Japanese Published: 東京 : 作品社, 1998. Series: 日本の名随筆. 03 别卷 ; 94
(b) 江戶 / 田中優子編. Edo / Tanaka Yūko hen. Contributors: 田中優子, 1952- Tanaka, Yūko, 1952- Format: Book Language: Japanese Published: 東京 : 作品社, 1998. Tōkyō : Sakuhinsha, 1998. Series: 日本の名随筆. 03 别卷 ; 94 Nihon no meizuihitsu. 03 Bekkan ; 94

19 Jan 2010 Language/Script of Interface _____________________________________________________ OCLC’s brief record display. Interface easily flipped to one of several languages

20 Jan 2010 Language/Script of Interface _____________________________________________________ OCLC’s detailed record display with Japanese language interface

21 Language/Script of Interface OCLC WorldCat.org localizes labels and instructions as well as mapped facet values. Examples here in Chinese.

22 Jan 2010 Language/Script of Interface _____________________________________________________

23 Jan 2010 Language/Script of Interface & Text Directionality _____________________________________________________

24 Jan 2010 Sorting of Results _____________________________________________________ 江戸文学俗信辞典 Edo bungaku zokushin jiten; 江戸文学地名辞典 Edo bungaku chimei jiten; 江戸文学辞典 Edo bungaku jiten; 江戸文様辞典 Edo mon’yo jiten

25 Jan 2010 Sorting of Results _____________________________________________________ Also note bidirectional text

26 Jan 2010 Sorting within Result Sets: Options to Consider _____________________________________________________ For multiple languages sharing a script, e.g. Chinese ideographs, Arabic, Hebrew, or Latin, how would users prefer to see the result sets sorted? We consider here the Chinese and Arabic cases…

27 Jan 2010 Sorting within Result Sets: Options to Consider _____________________________________________________ Sorting of results returned in Chinese script. Three sort strategies: (a) sort by romanized equivalents; (b) sort by pronunciation; or (c) sort by radical-stroke order?
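Strategy (a) can be sketched with the paired original-script and romanized title fields that the catalog records already carry. The example below is an assumption-laden illustration, not YUFind behavior: the `Record` type and `sort_by_romanization` are hypothetical names, and the titles are the Japanese ones from the "Sorting of Results" slide.

```python
from typing import NamedTuple


class Record(NamedTuple):
    original: str   # title in original script
    romanized: str  # paired romanized title


def sort_by_romanization(records: list[Record]) -> list[Record]:
    """Order records by their romanized title, ignoring case."""
    return sorted(records, key=lambda r: r.romanized.casefold())


records = [
    Record("江戸文学俗信辞典", "Edo bungaku zokushin jiten"),
    Record("江戸文学地名辞典", "Edo bungaku chimei jiten"),
    Record("江戸文学辞典", "Edo bungaku jiten"),
    Record("江戸文様辞典", "Edo mon’yo jiten"),
]

for rec in sort_by_romanization(records):
    print(rec.romanized, "/", rec.original)
```

Sorting on the romanized key displays the original-script titles in an order that readers of the romanization can predict, whereas sorting the original strings directly would follow code-point order instead.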

28 Jan 2010 Sorting within Result Sets: Arabic script _____________________________________________________ How to handle additional Arabic-script characters in use for languages such as Persian, Kurdish, and/or Urdu? ڤ (vah, derived from ﻑ, fah); پ (pah); ﭺ (chah, derived from ج, ǧīm); گ (gaf); ژ (zāī, derived from ز, zayin)
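Whatever sort policy is chosen for these letters, Arabic-script text sometimes arrives as positional presentation-form code points (for example U+FB7A rather than the base letter U+0686), which would sort and match separately from the base letters. A minimal sketch of a pre-processing step, assuming Unicode NFKC normalization is acceptable for the index (the function name `normalize_arabic` is illustrative):

```python
import unicodedata


def normalize_arabic(text: str) -> str:
    """Map presentation-form code points to their base letters via NFKC."""
    return unicodedata.normalize("NFKC", text)


samples = {
    "\uFB7A": "tcheh, isolated presentation form",
    "\uFED1": "feh, isolated presentation form",
    "\u067E": "peh, already a base letter",
}

for ch, label in samples.items():
    normalized = normalize_arabic(ch)
    print(f"U+{ord(ch):04X} ({label}) -> U+{ord(normalized):04X}")
```

This step only unifies encodings of the same letter; where پ, چ, گ, and ژ should fall relative to the core Arabic alphabet remains a separate collation decision.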

29 Jan 2010 Discussion User Needs and Expectations

