Strategies for Developing Non-English Websites Elizabeth J. Pyatt Instructional Designer Education Technology Services.

1 Strategies for Developing Non-English Websites
Elizabeth J. Pyatt, Instructional Designer (ejp10@psu.edu)
Education Technology Services

2 Supporting Multiple Languages
- Unpopular Language Support (easy): English alphabet only, all the time.
  - "Escribes vous Russki (Russian)? No" (a deliberately mixed-language "Do you write Russian? No")
- Preferred Language Support (harder):
  - Display native scripts and punctuation
  - Display appropriate punctuation/symbols
  - «¿Escribes vous Русский? ¡Sí!» ("Do you write Russian? Yes!")

3 Script versus Language
- Arabic script is used for Arabic, Ottoman Turkish, Persian (Farsi), etc.
- Cyrillic script is used for Russian, Ukrainian, Uzbek, Bulgarian, etc.
- Serbo-Croatian (one language):
  - Cyrillic text = "Serbian"
  - Roman (English alphabet) text = "Croatian"
- Hindi-Urdu (also one language): Hindi is written in Devanagari, Urdu in Arabic script

4 The Language of Scripts
- i18n = internationalization ("i", then 18 letters, then "n")
- Roman/Latin alphabet = the English alphabet
- Cyrillic = the alphabet used for Russian
- RTL = right to left (e.g. Arabic, Hebrew)
- CJK = Chinese-Japanese-Korean
  - Chinese has the largest character count
- South Asian = the scripts of India (there are many)

5 Taxonomy of Scripts (C = consonant; V = vowel)
- Alphabet: 1 letter = 1 vowel or consonant
  - Roman, Cyrillic, Greek, Runes, Georgian, Armenian, etc.
  - Typing: map single letters to single characters
- Syllabary: 1 character = 1 CV syllable
  - Japanese (kana), Cherokee, Ethiopic, Sumerian
  - Typing: map a CV sequence to one character (e.g. Japanese katakana na-wa = ナワ)
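The syllabary typing step can be sketched in Python. An input method reads a romanized consonant-vowel sequence and emits one kana per syllable. The `KATAKANA` table is a tiny hypothetical sample; real input methods cover the whole syllabary and many special cases. The code also assumes, as in the slide's "na-wa", that a hyphen separates syllables.

```python
# Hypothetical mini-table: romanized CV syllable -> Japanese katakana character.
KATAKANA = {
    "ka": "カ", "sa": "サ", "ta": "タ", "na": "ナ",
    "ha": "ハ", "ma": "マ", "ra": "ラ", "wa": "ワ",
}

def to_katakana(romaji: str) -> str:
    """Convert hyphen-separated CV syllables (e.g. 'na-wa') to katakana."""
    return "".join(KATAKANA[syllable] for syllable in romaji.lower().split("-"))

print(to_katakana("na-wa"))  # ナワ
```

A dictionary lookup per syllable is the simplest model. It also shows why syllabary keyboards need an input utility: no single key maps to ナ.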

6 Taxonomy of Scripts (continued)
- Ideographic (Chinese): 1 character = 1 meaning
  - Symbols are combined to make compounds
  - Typing: map a CV sequence to a list of possible characters, then pick one
  - Ideographic scripts can have a syllabary component
- Consonantal syllabary: letters are consonants; vowels are diacritics on the consonants
  - Korean, Thai, languages of India, Cree, etc.
  - Typing uses CV sequences, and fonts must alter characters depending on the surrounding sounds
  - E.g. typed "susi" may be displayed in the visual order "suis"
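The sketch below shows why fonts must rearrange what is typed. It lists the code points Unicode stores for a Devanagari spelling of "susi" (सुसि). The choice of Devanagari is an assumption, made for illustration only. The text is stored in typed, phonetic order (consonant, then vowel sign). A font, however, draws the short-i vowel sign to the left of the consonant it follows in storage.

```python
import unicodedata

# "susi" in Devanagari: SA + vowel sign U, then SA + vowel sign I (storage order).
word = "\u0938\u0941\u0938\u093F"

for ch in word:
    # Print each stored code point with its official Unicode name.
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
```

The stored order never changes. Only the rendering engine and font decide that the final vowel sign appears before its consonant on screen.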

7 Scripts & Encoding
- ASCII assigns a number to each character
  - Excel formula =CHAR(65) returns "A"
- Modern encodings expand the repertoire beyond ASCII, but implementations are inconsistent across platforms and scripts
- Know the encoding for your script/language; you will need it for debugging
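Python's `chr()` and `ord()` play the same role as Excel's `=CHAR()` and `=CODE()`. The short sketch below uses them to show the number behind each character. It also shows that characters beyond ASCII need a multi-byte encoding such as UTF-8.

```python
# chr() turns a number into a character; ord() goes the other way.
print(chr(65))    # A
print(ord("A"))   # 65

# Outside ASCII (0-127) the number must be stored using some encoding.
for ch in ["A", "ñ", "Д"]:
    print(ch, ord(ch), ch.encode("utf-8"))
```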

8 Some Notable Encodings
- Latin 1 (ISO-8859-1): English, most of Western Europe, Africa, Pacific Islands, Native American languages
- Latin 2 (ISO-8859-2) (also Latin 3, Latin 4…): Central Europe (Hungarian, Polish, Czech)
- Big5 (Traditional Chinese only), Shift-JIS (Japanese only), etc.
- "ISO" vs. "Windows" parallel encodings (e.g. Hebrew)
  - ISO-8859-8 (Visual Hebrew)
  - Windows-1255 (Windows Hebrew) (also MacHebrew)
  - Parallel ISO/Windows encodings exist for many scripts (Arabic, Cyrillic, etc.)
- Unicode (a "super encoding" for all scripts)
  - "Exotic" Latin alphabets: Welsh, Hawaiian, Old Irish, etc.
  - Also Chinese, Japanese, Cyrillic, Arabic, Hebrew, Greek…
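The parallel ISO/Windows encodings can be seen directly with Python's standard codecs. In this sketch, `iso8859_5`, `cp1251`, and `koi8_r` are Python's names for the ISO, Windows, and KOI8-R Cyrillic encodings. The same letter becomes a different byte in each one, and Latin 1 cannot hold it at all.

```python
letter = "Д"  # Cyrillic capital De, Unicode U+0414

# Same character, different bytes: b4, c4, e4, d094.
for codec in ["iso8859_5", "cp1251", "koi8_r", "utf-8"]:
    print(codec, letter.encode(codec).hex())

try:
    letter.encode("latin-1")
except UnicodeEncodeError:
    print("latin-1: cannot encode Cyrillic")
```

This is why a page must declare its encoding. The byte 0xC4 is Д in Windows-1251 but Ä in Latin 1.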

9 Now What Do I Do?
- Step 1: Select target languages (don't forget English)
- Step 2: Determine which encoding supports each language
- Step 3: Develop a properly encoded page. Aim for Unicode (even for English).
- Step 4: Declare the encoding & language in the HTML (meta tag and lang attribute)
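Steps 3 and 4 can be sketched as a small page generator. `build_page` is a hypothetical helper, not part of any tool named in this presentation. It writes a UTF-8 meta tag and a `lang` attribute, then returns the page as UTF-8 bytes so that what is saved matches what is declared.

```python
import html

def build_page(title: str, body: str, lang: str = "en") -> bytes:
    """Return a minimal HTML page that declares UTF-8 and a page language."""
    page = (
        f'<html lang="{lang}">\n'
        "<head>\n"
        # Step 4: declare the encoding in a meta tag.
        '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">\n'
        f"<title>{html.escape(title)}</title>\n"
        "</head>\n"
        f"<body><p>{html.escape(body)}</p></body>\n"
        "</html>\n"
    )
    # Step 3: save the bytes in the same encoding the page declares.
    return page.encode("utf-8")

page_bytes = build_page("Saludos", "¡Hola, José!", lang="es")
print(page_bytes.decode("utf-8"))
```

Writing these bytes to disk unchanged keeps the file and its declaration in agreement, which prevents the gibberish problem covered later.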

10 How Do I Get Properly Encoded Text?
- Latin 1 (English, Spanish, French, German)
  - Use entity codes (e.g. &ntilde; for ñ)
  - Declare the encoding
- Major world language
  - Set up keyboards
  - Type in a text editor/HTML editor
  - Declare the encoding & language
- Undersupported language
  - Get the correct fonts/keyboards, or "PDF it"
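Text may already exist in a legacy encoding, for example Russian saved as Windows-1251 by an older editor. In that case it can be re-encoded to Unicode before publishing. This is a minimal sketch that assumes you already know the source encoding. The helper name `to_utf8` is hypothetical.

```python
def to_utf8(data: bytes, source_encoding: str) -> bytes:
    """Decode bytes from a known legacy encoding and re-encode them as UTF-8."""
    # Raises UnicodeDecodeError if the bytes are not valid in source_encoding.
    text = data.decode(source_encoding)
    return text.encode("utf-8")

legacy = "Русский".encode("cp1251")  # bytes as a Windows-1251 editor would save them
utf8 = to_utf8(legacy, "cp1251")
print(utf8.decode("utf-8"))  # Русский
```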

11 Character Codes (Latin 1 Languages)
- Apply to "Western European" languages only
- Always use them for backwards compatibility
Some examples:
- Accent codes: &ntilde; = ñ
- Punctuation: &copy; = ©
- Old math: &deg; = °
- New math (recent browsers only): &Sigma; = Σ, &sigma; = σ, &int; = ∫, &ne; = ≠
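Python's standard `html.entities.codepoint2name` table holds the HTML 4 entity names. The sketch below uses it to turn non-ASCII characters into named entities. When no name exists, it falls back to a numeric code. `html.unescape` then confirms what the "new math" entities display as. The helper `to_entities` is hypothetical.

```python
import html
from html.entities import codepoint2name

def to_entities(text: str) -> str:
    """Replace non-ASCII characters with named entities, or numeric ones if no name exists."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 128:
            out.append(html.escape(ch, quote=False))  # still escape &, <, >
        elif cp in codepoint2name:
            out.append(f"&{codepoint2name[cp]};")
        else:
            out.append(f"&#{cp};")
    return "".join(out)

print(to_entities("señor © 20°"))                   # se&ntilde;or &copy; 20&deg;
print(html.unescape("&Sigma; &sigma; &int; &ne;"))  # Σ σ ∫ ≠
```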

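12 Encoding & Language Tags
- Set the encoding in the header:
  - Latin 1: <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
  - Unicode: <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  - Shift_JIS (Japanese): <meta http-equiv="Content-Type" content="text/html; charset=Shift_JIS">
- Declare the page language (ISO 639 code):
  - English (U.S.) document: <html lang="en-US">
  - Spanish/French/German/Japanese document: <html lang="es">, <html lang="fr">, <html lang="de">, <html lang="ja">
  - es = Spanish, fr = French, de = German, zh = Chinese, ja = Japanese, etc.
  - Spanish paragraph: <p lang="es"> (or any HTML text tag)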
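When debugging a page, it helps to check which encoding its header actually declares. This sketch uses Python's standard `html.parser.HTMLParser` to read the charset from a meta tag. It handles the meta form shown above and also the newer `<meta charset="...">` form. `CharsetFinder` and `declared_charset` are hypothetical names.

```python
from html.parser import HTMLParser
from typing import Optional


class CharsetFinder(HTMLParser):
    """Record the first character encoding declared in a <meta> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.charset: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.charset is not None:
            return
        values = {name: (value or "") for name, value in attrs}
        if values.get("charset"):  # newer form: <meta charset="utf-8">
            self.charset = values["charset"].strip().lower()
        elif values.get("http-equiv", "").lower() == "content-type":
            # older form: content="text/html; charset=..."
            for part in values.get("content", "").split(";"):
                key, _, value = part.strip().partition("=")
                if key.strip().lower() == "charset":
                    self.charset = value.strip().lower()


def declared_charset(page: str) -> Optional[str]:
    finder = CharsetFinder()
    finder.feed(page)
    finder.close()
    return finder.charset


print(declared_charset(
    '<meta http-equiv="Content-Type" content="text/html; charset=Shift_JIS">'
))  # shift_jis
```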

13 Challenge Set 1
- How do you insert the name José Espiño into HTML?
- How do you declare the language as Spanish? (multiple options)
- What encoding is needed? (Assume an English page with a Spanish word.)

14 Stray Unicode Characters
- You can hard-code a four-digit Unicode numeric code to force a character to appear, e.g. Cyrillic "D" Д = &#1044; (decimal) or &#x0414; (hex)
- Best used for short spans of text or "exotic" Latin characters (e.g. letters with diacritics not found in Latin 1)
- If you use the hex version, add the "x" prefix and pad with leading zeros to make 4 digits
- Set the encoding to "utf-8" with a meta tag
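The decimal and hex numeric codes for any character can be computed from its code point. In this sketch, `numeric_refs` is a hypothetical helper. It zero-pads the hex form to four digits, as the slide recommends.

```python
def numeric_refs(ch: str) -> tuple:
    """Return the decimal and 4-digit hex numeric character references for one character."""
    cp = ord(ch)
    return f"&#{cp};", f"&#x{cp:04X};"

print(numeric_refs("Д"))  # ('&#1044;', '&#x0414;')
```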

15 Challenge 2
- How do you insert ¿Escribes vous Русский? ¡Sí! ("Do you write Russian? Yes!") into HTML? (Note: the first letter is capitalized in Cyrillic.)
- How do you declare the page to be Unicode?

16 Setting Up Keyboards for Other Scripts
- Activate the required keyboards from the Control Panel (Windows) or System Preferences (OS X)
- You may need to install language utilities for East Asian and other less common scripts from the system disk
(Quick demo)

17 Typing with Encoded Fonts
- Keyboarding utilities that match each "key" to the right encoded number must be installed
- Keyboards can arrange one encoding in several layouts:
  - QWERTY (AKA "transliterated/phonetic"): preferred by U.S. students
  - Native layout (as on native-script typewriters): preferred by native speakers (e.g. instructors)
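"One encoding, several layouts" can be modeled as two key maps that both produce the same Unicode characters. Both tables below are tiny hypothetical excerpts. The phonetic mapping of Y to ы is an assumption, since phonetic layouts vary. The native excerpt follows the standard Russian ЙЦУКЕН positions: L = д, S = ы, V = м.

```python
# Tiny excerpts of two keyboard layouts that output the same Cyrillic characters.
PHONETIC = {"d": "д", "y": "ы", "m": "м"}  # transliterated/phonetic layout (assumed)
NATIVE = {"l": "д", "s": "ы", "v": "м"}    # standard Russian ЙЦУКЕН layout (excerpt)

def type_keys(keys: str, layout: dict) -> str:
    """Translate a sequence of physical key presses using the given layout."""
    return "".join(layout[k] for k in keys)

print(type_keys("dym", PHONETIC))  # дым
print(type_keys("lsv", NATIVE))    # дым
```

The keys pressed differ, but the stored text is identical. Students and instructors can therefore each use their preferred layout on the same page.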

18 Dreamweaver/FrontPage: Options for Inputting Text
A. Switch keyboards (the editor may add the meta tag)
B. Type
C. Or cut and paste encoded text
D. Or import from international text editors via Save As HTML:
  - Global Writer (Windows)
  - SimpleText (free from Apple)
  - Others for specific scripts
  - Avoid importing from Word
(Mini demo 2)

19 Challenge 3 (Research)
- What encodings can I use for Russian?
  - http://ourworld.compuserve.com/homepages/PaulGor/
  - http://www.brama.com/compute/encode.html
- How about Modern Greek vs. Ancient Greek?
  - http://www.hri.org/fonts/
  - http://www.stoa.org/unicode/quickstart.html

20 Undersupported Scripts: The Ultimate Challenge
- "Undersupported" = minority languages, ancient/medieval languages, small populations
- Third-party utilities may be needed:
  - Unicode font (TrueType, .ttf format)
  - Keyboard utility (if you can get it)
  - Print font for PDFs (the last resort)
- Test, test, test (especially Mac vs. Windows)

21 Print Font vs. Web Font
Print font:
1. Replaces ASCII characters with arbitrary other characters
2. Both parties must have the same font to read the document correctly
3. Ideal for print/PDF documents when no data transmission occurs
4. E.g. Symbol, Webdings
Web font:
1. Complies with some encoding (e.g. ASCII)
2. Alternative fonts with the same encoding can be used (e.g. Times or Arial)
3. Ideal for Web transmission, though still difficult for typing
4. E.g. Arial Unicode, Lucida Sans Unicode, Lucida Grande, TITUS Cyberbit (free), etc.

22 When Websites Show Gibberish
- Problem: no encoding specified (the page shows gibberish)
  - Go to the View menu and manually switch the encoding
- Problem: no HTML entity codes for accents (accented letters show as gibberish)
  - Try switching the View encoding to Latin 1, Windows-1252, MacRoman, or UTF-8 (Unicode)
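A common source of gibberish is UTF-8 bytes being read as Latin 1. This sketch reproduces that failure and shows that switching the View encoding back is equivalent to reversing the decoding mistake. The sample name is taken from Challenge Set 1.

```python
original = "José Espiño"

# Bytes saved as UTF-8 but displayed as Latin 1 -> typical gibberish.
garbled = original.encode("utf-8").decode("latin-1")
print(garbled)  # JosÃ© EspiÃ±o

# Re-reading the same bytes as UTF-8 (the correct View setting) recovers the text.
repaired = garbled.encode("latin-1").decode("utf-8")
print(repaired == original)  # True
```

The telltale "Ã" followed by a symbol usually signals this particular mix-up, so UTF-8 is a good first setting to try.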

23 ANGEL & Other Web Tools
1. Activate keyboards for the needed scripts (see http://tlt.psu.edu/suggestions/international/keyboards)
2. Open Netscape 7/Mozilla
3. Go to ANGEL or another Web tool
4. Switch keyboards
5. Type!
6. Users can view the text in Netscape 7/Mozilla, IE5+ (Windows) or Safari (OS X)

24 ¡Escribez Русский! ("Write Russian!") Where to Find Out More
- Penn State Computing with Accents: http://tlt.psu.edu/suggestions/international
- Titus Cyberbit Unicode Font (free): http://titus.uni-frankfurt.de/indexe.htm (look under "Instrumentalia")

