1
International Domain Name TWNIC Nai-Wen Hsu snw@twnic.net.tw
2
Domain name (RFC 1035)
A label cannot be longer than 63 characters
A domain name cannot be longer than 255 characters
Maximum labels: 127
Only a-z, 0-9, and '-' are accepted in a domain name
Limited ASCII code points: 37 LDH (Letter-Digit-Hyphen) characters
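The limits on this slide can be sketched as a small check. This is a minimal illustration, not a full RFC 1035 validator: the function name `is_valid_ldh_name` is hypothetical, lowercasing the input and tolerating one trailing dot are assumptions, and rules the slide does not mention (such as hyphen placement) are not enforced.

```python
import re

# One label: 1 to 63 characters drawn from the 37 LDH characters (a-z, 0-9, '-').
_LDH_LABEL = re.compile(r"[a-z0-9-]{1,63}")


def is_valid_ldh_name(name: str) -> bool:
    """Check a domain name against the length and character limits on the slide."""
    name = name.lower()              # assumption: compare case-insensitively
    if name.endswith("."):           # assumption: allow one trailing root dot
        name = name[:-1]
    if not name or len(name) > 255:  # whole-name limit
        return False
    labels = name.split(".")
    if len(labels) > 127:            # maximum number of labels
        return False
    # Every label must be non-empty, at most 63 characters, and LDH only.
    return all(_LDH_LABEL.fullmatch(label) is not None for label in labels)
```

A name such as 慎昌鐘錶.tw fails this check, which is the gap that the IDN standards on the following slides address.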
3
International Domain Name
The IETF IDN WG adopted Unicode 3.2
Scripts: Greek, Cyrillic, Armenian, Hebrew, Arabic, Syriac, Thaana, Devanagari, Bengali, Gurmukhi, Gujarati, Oriya, Tamil, Telugu, Kannada, Malayalam, Sinhala, Thai, …
95,156 characters
4
International Domain Name sample
レコード会社.jp
gwmöbler.com
慎昌鐘錶.tw
阿克苏诺贝尔油漆公司.cn
소프트웨어.kr
ישראל.קום
5
IETF IDN Standard
IDNA (RFC 3490): Internationalizing Domain Names in Applications
NAMEPREP (RFC 3491): A Stringprep Profile for Internationalized Domain Names
PUNYCODE (RFC 3492): A Bootstring encoding of Unicode for Internationalized Domain Names in Applications
STRINGPREP (RFC 3454): Preparation of Internationalized Strings
6
IDNA components and interfaces
[Figure: User <-> (input and display: local interface methods such as pen, keyboard, ...) <-> IDNA-aware Application (ToASCII and ToUnicode operations may be called here) <-> (call to resolver: ACE) <-> Resolver <-> (DNS protocol: ACE) <-> DNS Servers. The application also talks to Application Servers over an application-specific protocol, which carries ACE unless the protocol is updated to handle other encodings. User, application, and resolver make up the end system.]
"Application" is where the application splits a host name into labels, sets the appropriate flags, and performs the ToASCII and ToUnicode operations.
ACE example: xn--de-jg4avhby1noc0d
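The ToASCII and ToUnicode steps an IDNA-aware application performs can be tried with Python's built-in "idna" codec, which implements RFC 3490 and RFC 3491. The wrapper names `to_ascii` and `to_unicode` are hypothetical and the example host name is an assumption; this is a sketch of where the conversion sits, not of a complete application.

```python
def to_ascii(domain: str) -> str:
    """Convert each label of a Unicode host name to its ACE form (ToASCII)."""
    # The codec splits the name into labels, applies nameprep, and adds "xn--".
    return domain.encode("idna").decode("ascii")


def to_unicode(domain: str) -> str:
    """Convert ACE labels back to Unicode for display (ToUnicode)."""
    return domain.encode("ascii").decode("idna")


# The application hands the ACE form to the resolver and shows Unicode to the user.
ace_name = to_ascii("bücher.example")
display_name = to_unicode(ace_name)
```

Only the application layer changes here: the resolver and the DNS servers see nothing but ASCII.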
7
IDNA Structure
[Figure: User input (UNICODE) → NAMEPREP (Mapping, Normalization, Prohibit), a profile of STRINGPREP → ACE (PUNYCODE) → to resolver. IDNA operations: ToASCII, ToUnicode.]
Nameprep: A Stringprep Profile for Internationalized Domain Names
8
NAMEPREP
A Stringprep Profile for Internationalized Domain Names
Mapping: Stringprep tables B.1, B.2
Normalization: Form KC
Prohibited Output: Stringprep tables C.1.2, C.2.2, C.3, C.4, C.5, C.6, C.7, C.8, C.9
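The three steps listed here can be sketched with Python's standard `stringprep` module, which exposes the RFC 3454 tables, and the Unicode 3.2 database in `unicodedata.ucd_3_2_0`. The function name `nameprep` below is a local helper written for illustration, and the sketch leaves out parts of RFC 3491 that this slide does not list, such as the bidirectional checks and unassigned code points.

```python
import stringprep
import unicodedata

# Unicode 3.2 data, the version the IDN standards were defined against.
_UCD = unicodedata.ucd_3_2_0

# Prohibited-output tables named on the slide: C.1.2, C.2.2, C.3 to C.9.
_PROHIBITED = (
    stringprep.in_table_c12, stringprep.in_table_c22, stringprep.in_table_c3,
    stringprep.in_table_c4, stringprep.in_table_c5, stringprep.in_table_c6,
    stringprep.in_table_c7, stringprep.in_table_c8, stringprep.in_table_c9,
)


def nameprep(label: str) -> str:
    """Apply mapping, NFKC normalization, and the prohibited-output check."""
    mapped = []
    for ch in label:
        if stringprep.in_table_b1(ch):                # B.1: mapped to nothing
            continue
        mapped.append(stringprep.map_table_b2(ch))    # B.2: case folding for NFKC
    normalized = _UCD.normalize("NFKC", "".join(mapped))
    for ch in normalized:
        if any(in_table(ch) for in_table in _PROHIBITED):
            raise ValueError(f"prohibited code point U+{ord(ch):04X}")
    return normalized
```

For example, `nameprep("TWNIC")` returns `"twnic"`, while a label containing a control character such as U+0090 raises `ValueError`.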
9
NAMEPREP -- Mapping
Commonly mapped to nothing: 27
Ex:
Mapping for case-folding used with NFKC: 1371
Ex: A → a (U+0041 → U+0061)
Ϋ → ϋ (U+03AB → U+03CB)
㍱ → hpa (U+3371 → U+0068 U+0070 U+0061)
10
NAMEPREP -- Normalization
Unicode normalization with Form KC (NFKC)
11
NAMEPREP -- Normalization
‘u’ + ‘¨’ → ‘ü’
‘ａ’ → ‘a’
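Form KC does two things that matter here: it composes a base letter with a following combining mark, and it replaces compatibility characters with their plain equivalents. A minimal sketch with the standard `unicodedata` module follows; the helper name `nfkc` is hypothetical, and the fullwidth letter is an input chosen for illustration.

```python
import unicodedata


def nfkc(text: str) -> str:
    """Normalize text to Unicode Normalization Form KC."""
    return unicodedata.normalize("NFKC", text)


# 'u' followed by COMBINING DIAERESIS (U+0308) composes into one code point.
composed = nfkc("u\u0308")

# FULLWIDTH LATIN SMALL LETTER A (U+FF41) folds to the ordinary ASCII 'a'.
folded = nfkc("\uff41")
```

After normalization, two labels that look the same but were typed differently compare equal, which is what DNS matching needs.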
12
NAMEPREP -- Prohibited output
Non-ASCII space characters: 17
Ex: U+00A0 (NO-BREAK SPACE)
Non-ASCII control characters: 54
Ex: U+0090 (DEVICE CONTROL STRING)
Private use: 133371
Non-character code points: 49
Surrogate codes: 2048
13
NAMEPREP -- Prohibited output
Inappropriate for plain text: 4
Inappropriate for canonical representation: 12
Change display properties or are deprecated: 13
Tagging characters: 97
14
PUNYCODE
A Bootstring encoding of Unicode for IDNA
One of the ACEs (ASCII Compatible Encodings)
Translates non-ASCII characters to ASCII characters
Prefix: xn--
Ex: 慎昌鐘錶.tw → xn--ciun9hb52c2za.tw
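The encoding of a single label can be sketched with Python's built-in "punycode" codec. The helper names `label_to_ace` and `label_from_ace` are hypothetical, and the sketch deliberately skips the nameprep step so that only the Punycode transformation and the prefix are visible.

```python
ACE_PREFIX = "xn--"


def label_to_ace(label: str) -> str:
    """Encode one label with Punycode and add the ACE prefix."""
    if label.isascii():
        return label  # plain LDH labels are passed through unchanged
    return ACE_PREFIX + label.encode("punycode").decode("ascii")


def label_from_ace(label: str) -> str:
    """Decode one ACE label back to Unicode; other labels are returned as-is."""
    if label.lower().startswith(ACE_PREFIX):
        return label[len(ACE_PREFIX):].encode("ascii").decode("punycode")
    return label


# Encode a whole name label by label, as in the slide's example.
ace_domain = ".".join(label_to_ace(part) for part in "慎昌鐘錶.tw".split("."))
```

The ASCII label `tw` is left alone; only the non-ASCII label is encoded and prefixed.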
15
Insufficiency of the IDN standard
The current IDN standards (IDNA, NAMEPREP, PUNYCODE) cannot meet Chinese domain name requirements
Traditional/Simplified Chinese mapping
Ex: 台 ↔ 臺
Writing variant mapping
Ex: 峰 ↔ 峯
17
Insufficiency of the IDN standard
Characters with the same meaning are written differently in different countries
In China: 劝 (529D)
In Japan: 勧 (52E7)
In Taiwan: 勸 (52F8)
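A short check makes the gap concrete: the three characters above pass through NFKC unchanged and yield three different ACE labels, so the protocol treats them as three unrelated names. This is a sketch using Python's `unicodedata` module and its built-in "idna" codec; the dictionary and variable names are hypothetical.

```python
import unicodedata

# The same word as written in three regions (code points from the slide).
VARIANTS = {"China": "劝", "Japan": "勧", "Taiwan": "勸"}

# NFKC has no mapping between these ideographs, so each stays as it is.
normalized = {
    region: unicodedata.normalize("NFKC", ch) for region, ch in VARIANTS.items()
}

# Each character therefore encodes to its own ACE label.
ace_labels = {
    region: ch.encode("idna").decode("ascii") for region, ch in VARIANTS.items()
}
```

Because the standards leave these names distinct, the equivalence has to be handled by registration policy.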
18
IDN administration guideline
A registration policy to solve the problems listed above
Every language has a variant table with 3 fields:
valid code point
recommended variant
character variant
19
Variant Table sample
Columns: Valid code point (VCP) | Recommended variants by .tw (twRV) | Recommended variants by .cn (cnRV) | Character Variant(s) (CV) | Remarks
丁 (4E01) | Singular-relation character (1)
丄 (4E04) 上 (4E0A) 丄 (4E04) 上 (4E0A) | Pair-relation characters (2.1)
上 (4E0A) 丄 (4E04) 上 (4E0A)
万 (4E07) 萬 (842C) | Pair-relation characters (2.2)
萬 (842C) 万 (4E07) 萬 (842C)
20
Variant Table sample
Columns: Valid code point (VCP) | Recommended variants by .tw (twRV) | Recommended variants by .cn (cnRV) | Character Variant(s) (CV) | Remarks
叶 (53F6) 葉 (8449) 叶 (53F6) 葉 (8449) | Pair-relation characters (2.3)
葉 (8449) 叶 (53F6) 葉 (8449)
个 (4E2A) 個 (500B) 个 (4E2A) 个 (4E2A) 個 (500B) 箇 (7B87) | Multiple-relation characters
個 (500B) 个 (4E2A) 个 (4E2A) 個 (500B) 箇 (7B87)
箇 (7B87) 個 (500B) 个 (4E2A) 个 (4E2A) 個 (500B) 箇 (7B87)
21
Variant Table
Singular-relation character (VCP=twRV=cnRV=CV): 13888 (66.4%)
VCP=twRV≠cnRV: 2783 (13.3%)
VCP=cnRV≠twRV: 2453 (11.7%)
VCP≠(twRV=cnRV): 333 (1.6%)
VCP≠twRV≠cnRV: 387 (1.9%)
22
Variant Table
Number of character variant(s) | Number of characters
1 | 13888 (66.4%)
2 | 5156 (24.7%)
3 | 1158 (5.5%)
4 | 424 (2.0%)
5 | 165 (0.79%)
6 | 60 (0.29%)
7 | 35 (0.17%)
8 | 16 (0.08%)
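The slide gives counts and percentages but no total. Summing the counts gives 20,902 characters, and recomputing each share against that total reproduces the percentages shown. The short check below uses only the figures on the slide; the variable names are hypothetical.

```python
# Number of character variants -> number of characters (figures from the slide).
counts = {1: 13888, 2: 5156, 3: 1158, 4: 424, 5: 165, 6: 60, 7: 35, 8: 16}

# Total number of valid code points covered by the distribution.
total = sum(counts.values())

# Share of each bucket as a percentage of the total.
shares = {n: 100 * count / total for n, count in counts.items()}

for n, share in shares.items():
    print(f"{n} variant(s): {counts[n]:>6} characters ({share:.2f}%)")
```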
23
Variant Table
The draft table has been prepared by the CCMT task force, which TWNIC organized in January 2002.
The task force has 9 members: linguists, computer experts, and DNS experts.
The draft table has been submitted to the Bureau of Standards, Ministry of Economic Affairs, for final review.
24
Registration procedure
The registrant selects the language(s)
The registry activates the requested domain name(s) and reserves the equivalent name(s), within the language-based character set
The registrant can request activation of the reserved equivalent domain name(s) at any time
25
Registration sample
A user selects the zh-tw and zh-cn languages with the domain name 丁上萬.com
丁上萬.com (Recommended variant for zh-tw)
丁上万.com (Recommended variant for zh-cn)
丁丄万.com (Character Variant)
丁丄萬.com (Character Variant)
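The sample can be reproduced with a small sketch. The `TABLE` below is a hypothetical three-entry excerpt, with values chosen so that they match the names listed on this slide; it is not the CCMT draft table. The function name `build_bundle` and the rule that the requested name and the recommended variants are activated while the remaining combinations are reserved are assumptions based on the registration procedure described earlier.

```python
from itertools import product

# Hypothetical excerpt: per character, the recommended variant for each
# language ("rv") and the set of character variants, including itself ("cv").
TABLE = {
    "丁": {"rv": {"zh-tw": "丁", "zh-cn": "丁"}, "cv": ["丁"]},
    "上": {"rv": {"zh-tw": "上", "zh-cn": "上"}, "cv": ["上", "丄"]},
    "萬": {"rv": {"zh-tw": "萬", "zh-cn": "万"}, "cv": ["萬", "万"]},
}


def build_bundle(label, languages):
    """Return (activated, reserved) labels for a requested label."""
    activated = [label]
    for lang in languages:
        # Recommended variant: replace each character by its per-language form.
        recommended = "".join(TABLE[ch]["rv"][lang] for ch in label)
        if recommended not in activated:
            activated.append(recommended)
    # Every combination of character variants is an equivalent of the label.
    combos = {"".join(chars) for chars in product(*(TABLE[ch]["cv"] for ch in label))}
    reserved = sorted(combos - set(activated))
    return activated, reserved


activated, reserved = build_bundle("丁上萬", ["zh-tw", "zh-cn"])
```

With this excerpt, 丁上萬 and 丁上万 are activated and 丁丄万 and 丁丄萬 are reserved, matching the four names on the slide.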
26
Q & A