Globalisation & Computer Systems
Week 4: Writing systems and their implications for globalisation
  Character representation
  ASCII
  Extended ASCII
  Code pages
  Practical: code pages in VB
Week 6: Writing systems and their implications for globalisation
  Directionality (Arabic, Hebrew)
  Code space: Chinese
  Context-sensitive characters: Arabic
  Compositionality (Amharic)
Representation
  Bits and bytes
  Characters
  Code points
  Glyphs
  Fonts
  Standardization
Representation
  What is a bit? ‘A binary digit’, i.e. either 0 or 1.
  What is a byte? ‘The fixed number of bits that can be treated as a unit by the computer hardware’.
  A byte can be used to express a character such as “A”.
Representation
  ASCII: American Standard Code for Information Interchange
  A standard character encoding system
  The bytes were originally 7 bits
  Given this, how many bit patterns?
  Each pattern maps onto a decimal code point, and that maps onto a character
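The question on this slide can be checked with a few lines of code. The following is a minimal sketch in Python (chosen only for illustration; the variable names are hypothetical) that counts the 7-bit patterns and follows one pattern through to its code point and character.

```python
# Number of distinct bit patterns available with 7 bits.
BITS = 7
pattern_count = 2 ** BITS

# Bit pattern -> decimal code point -> character, for the letter "A".
pattern = "1000001"
code_point = int(pattern, 2)   # read the pattern as a binary number
character = chr(code_point)    # look up the character at that code point

print(pattern_count)           # 128
print(code_point, character)   # 65 A
```

The same lookup works in reverse: `ord("A")` returns 65.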
Representation
  Glyphs: the pictures used to represent a given character; many to one:
    The character “A” -> A A A A A A A A A (the same character drawn in different typefaces)
  Fonts: the collection, or ‘picture gallery’, of glyphs
Representation
  ASCII: the problem with 7-bit bytes…
  What about French? la tête
  What about Greek? κεφαλη
  Extend ASCII to 8-bit bytes
  ISO (International Organization for Standardization)
  Now 256 bit patterns
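To see the problem concretely, the sketch below tries to encode the two example words in 7-bit ASCII and then in an 8-bit set. It assumes Python's built-in codecs; the choice of ISO 8859-1 and ISO 8859-7 as the 8-bit sets is an assumption made for illustration, since the slide does not name a specific ISO standard. The helper `fits` is hypothetical.

```python
def fits(text, encoding):
    """Return True if every character of text can be encoded in encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

french = "la tête"
greek = "κεφαλη"

# Neither word fits in 7-bit ASCII.
ascii_ok = [fits(french, "ascii"), fits(greek, "ascii")]

# Each fits in an 8-bit set designed for it (assumed: ISO 8859-1 and ISO 8859-7),
# but the Greek word does not fit in the Western European set.
french_in_latin1 = fits(french, "iso8859_1")
greek_in_greek = fits(greek, "iso8859_7")
greek_in_latin1 = fits(greek, "iso8859_1")

print(ascii_ok, french_in_latin1, greek_in_greek, greek_in_latin1)
```

Note that no single 8-bit set covers both words, which is why a separate interpretation of the upper code points is needed per language group.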
Representation
  Extended ASCII:
  With 8-bit bytes you get 256 bit patterns
  For consistency, the first 128 code points remain the same as in the 7-bit set (ISO-7)
  The next 128 are used for a range of languages
  For each language, you need an interpretation of these 128 code points
  The encoding is handled by a code page
Representation
  Extended ASCII: for code point 154:
    CP_EASTEUROPE (code page 1250): š
    CP_RUSSIAN (code page 1251): љ
  What about code point 65 for these two code pages?
  Now represent your names with your own orthographies in mind, using the code pages
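The code page lookup can be reproduced outside VB. This is a small sketch using Python's built-in `cp1250` and `cp1251` codecs (the variable names are hypothetical); it decodes the same byte under both code pages.

```python
# One byte holding code point 154, and one holding code point 65.
upper = bytes([154])
lower = bytes([65])

# The same byte is interpreted differently by each code page.
east_europe = upper.decode("cp1250")
russian = upper.decode("cp1251")

# Code points below 128 are shared, so 65 gives the same character in both.
a_east_europe = lower.decode("cp1250")
a_russian = lower.decode("cp1251")

print(east_europe, russian)        # š љ
print(a_east_europe, a_russian)    # A A
```

The byte itself carries no information about which code page was intended; the reader of the data has to know it.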
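Representation
  Code pages in VB

Public Enum ValidCharsets
    ANSI_CHARSET = 0
    GREEK_CHARSET = 161
    THAI_CHARSET = 222
End Enum

Private Sub Form_Load()
    ' Build a font whose Charset selects the Greek interpretation
    ' of the upper 128 code points.
    Dim X As New StdFont
    X.Charset = GREEK_CHARSET
    X.Bold = True
    X.Size = 8
    X.Name = "Times New Roman"

    ' Apply the font to the form and to both controls.
    Set frmTest.Font = X
    Set frmTest.Label1.Font = X
    Set frmTest.Text1.Font = X

    ' Code points 181, 225 and 226 are displayed as whatever
    ' characters the selected charset assigns to them.
    frmTest.Label1.Caption = Chr(181) & Chr(225) & Chr(226)
    frmTest.Text1.Text = Chr(181) & Chr(225) & Chr(226)
End Sub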
Representation and UNICODE
  What about Chinese?
  Thousands of characters: 256 bit patterns clearly not enough
  Make the units bigger…
  With 16 bits per character you get 2^16 = 65,536 bit patterns
  UNICODE
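The arithmetic and the effect of 16-bit units can be sketched as follows. The example characters and the use of the `utf-16-be` codec are assumptions made for illustration; the slide does not name a particular encoding form.

```python
# Number of distinct bit patterns available with 16 bits.
pattern_count_16 = 2 ** 16

# Two Chinese characters (an arbitrary example) and their code points.
text = "中文"
code_points = [ord(ch) for ch in text]

# Both code points are far beyond the 256 available in an 8-bit code page...
beyond_8_bits = all(cp > 255 for cp in code_points)

# ...but each fits in a single 16-bit unit, i.e. two 8-bit bytes.
encoded = text.encode("utf-16-be")
bytes_per_character = len(encoded) // len(text)

print(pattern_count_16)                  # 65536
print([hex(cp) for cp in code_points])   # ['0x4e2d', '0x6587']
print(bytes_per_character)               # 2
```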
UNICODE – design principles Reference: The Unicode Standard, Version Online: