

Review1
What is multilingual computing? Bilingual, trilingual, vs. multilingual
What are the fundamental issues in multilingual computing?
– Representation of each language in a computer
– Ways to distinguish different scripts
– How can a system be designed so that it can be used by different languages with minimal changes?
– How can a system be designed so that it can be used for multiple languages?

Review2
Characteristics of different scripts
What is a script?
What are the different types of scripts, and what are examples of each?
– Token-based/alphabet-based scripts
– Phonetic-based scripts
– Ideographs
What is a phonetic transcription system, and what are examples of one?
What is Romanization?

Review3
Characteristics of Chinese
Graphemics
Variant writing (e.g. 教 都)
Phonetics (the sound, 音)
Types of phonemes
Semantics (the meaning, 義)
Independence of meaning

Review4
Computer representation of characters
Selection of a finite set of characters → character set
– Uniqueness → each character/symbol
Design of a coded character set → codeset
– Uniqueness → each codepoint assignment
– Different coding length → different codesets
What do the following terms mean?
– Codepoint, and the length of a codepoint
– Code space, and the size of a code space
– Code range
– Order of characters (in a character set vs. a codeset)

Review5
What are the different numerical notations?
– Decimal notation
– Binary notation
– Hexadecimal notation
– Scalar value
Characteristics of the ASCII codeset
What is the row-cell notation?
What are character subsets, and why are they used?
Character set comparison operations
Codeset comparison operations
– Character set
– Codepoint assignment
Compatibility
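As a study aid for the notation questions above, the following minimal Python sketch prints one ASCII codepoint in each notation. The helper name `notations` is hypothetical and not part of the course material; the `U+` form is used here to stand for the scalar value.

```python
def notations(ch: str) -> dict:
    """Return the codepoint of a single character in several notations."""
    cp = ord(ch)  # the codepoint as a plain integer
    return {
        "decimal": str(cp),
        "binary": format(cp, "08b"),       # zero-padded to 8 bits
        "hexadecimal": format(cp, "02X"),  # zero-padded to 2 hex digits
        "scalar": f"U+{cp:04X}",           # Unicode-style scalar value
    }


print(notations("A"))
```

For the letter A this gives decimal 65, binary 01000001, hexadecimal 41, and U+0041.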

Review6
What is an encoding method and why do we need it?
What is the so-called high-bit-on scheme?
What are the characteristics of GB-2312?
– No. of rows, no. of columns → code space
– Code range?
– Major subsets?
– Full characters vs. half characters
What are the characteristics of Big5 and ETen Big5?
– Rows, columns → code space
– Major subsets?
– What are UDAs and VDAs for?
HKSCS
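One way to see how row-cell notation and the high-bit-on scheme fit together is the sketch below. It assumes the common EUC-CN form of GB 2312, in which each of the two bytes is the row or cell number plus 0xA0. The function name `rowcell_to_bytes` is hypothetical; the `gb2312` codec is Python's built-in one.

```python
def rowcell_to_bytes(row: int, cell: int) -> bytes:
    """Map a GB 2312 row-cell position (each 1..94) to its two-byte high-bit-on form."""
    if not (1 <= row <= 94 and 1 <= cell <= 94):
        raise ValueError("row and cell must each be in 1..94")
    # Adding 0x20 moves the value into the 7-bit graphic range 0x21..0x7E;
    # adding 0x80 then turns the high bit on, so the byte cannot clash with ASCII.
    return bytes([row + 0x20 + 0x80, cell + 0x20 + 0x80])


encoded = rowcell_to_bytes(16, 1)  # row 16, cell 1
print(encoded.hex(), encoded.decode("gb2312"))
```

Row 16, cell 1 maps to the bytes B0 A1, which the codec decodes as 啊. Both bytes have the high bit set, which is how a decoder tells a two-byte character from a one-byte ASCII character.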

Review7
Other codesets using high-bit-on schemes?
Encodings using designation (指定)?
– ISO 2022
– Extended Unix Code (EUC)
What is the charset registry, and why is it needed?
Problems with different codesets?
– Compatibility → wrong interpretation of data
– Solutions: codeset announcement (using designation) and conversion → conversion problems

Review8
ISO 10646 and Unicode
What are the design principles of ISO 10646?
What are the different coding structures in ISO 10646?
What is the structure of UCS-4?
What are the characteristics of the BMP?
What is the structure of the BMP?
What is UCS-2?
What is the compatibility zone for?
What is the difference between ISO 10646 and Unicode?
Big-endian vs. little-endian notation: FEFF vs. FFFE
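The FEFF vs. FFFE question can be checked directly. The sketch below encodes the byte order mark U+FEFF in both byte orders and uses the first two bytes to guess the order of a stream. It assumes the data is in a two-byte form (UCS-2 or UTF-16); the helper `detect_byte_order` is a hypothetical name.

```python
BOM = "\ufeff"  # the byte order mark, U+FEFF

big = BOM.encode("utf-16-be")     # most significant byte first: FE FF
little = BOM.encode("utf-16-le")  # least significant byte first: FF FE


def detect_byte_order(data: bytes) -> str:
    """Guess the byte order of two-byte-unit text from a leading byte order mark."""
    if data[:2] == b"\xfe\xff":
        return "big-endian"
    if data[:2] == b"\xff\xfe":
        return "little-endian"
    return "unknown"  # no mark present; the order must be known some other way


print(big.hex(), little.hex())
print(detect_byte_order(little + "A".encode("utf-16-le")))
```

A reader that sees FF FE knows the bytes are swapped relative to big-endian order, because U+FFFE itself is not a valid character.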

Review9
What are Extension A and Extension B?
– Where were they coded?
What are surrogate pairs, why are they needed, and how do they work?
What is UTF, what is its purpose, and how does UTF-8 work?
What is the difference between a character and a glyph?
What is the difference between a multi-byte character and a wide character?
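For the surrogate pair and UTF-8 questions, a hand-written sketch of both computations may help; it follows the bit layouts defined by the Unicode Standard. The function names are hypothetical, and in practice Python's built-in `str.encode` would be used instead.

```python
def to_surrogates(cp: int) -> tuple:
    """Split a codepoint above the BMP into a (high, low) UTF-16 surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only codepoints U+10000..U+10FFFF need surrogates")
    v = cp - 0x10000                       # 20 bits remain after subtracting the offset
    return 0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)  # top 10 bits, bottom 10 bits


def utf8_encode(cp: int) -> bytes:
    """Encode one codepoint as UTF-8 (1 to 4 bytes)."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("not a Unicode scalar value")
    if cp < 0x80:                          # 0xxxxxxx
        return bytes([cp])
    if cp < 0x800:                         # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:                       # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    return bytes([0xF0 | (cp >> 18),       # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


high, low = to_surrogates(0x20000)  # U+20000, the first codepoint of Extension B
print(f"{high:04X} {low:04X}", utf8_encode(0x20000).hex())
```

U+20000 becomes the surrogate pair D840 DC00 in UTF-16 and the four bytes F0 A0 80 80 in UTF-8.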

Review10
Input Methods
What is an input method, and why do we need it?
What are the different types of input methods?
What is a keyboard-based input method?
How do you design an IM?
– What is the basic requirement?
– What are the limitations?
– What information can be used in IM design?
Who are the main users?
Efficiency considerations?
What are the two types of IM?
– Applicability and limitations
What is keyboard arrangement, and why do we need it?

Review11
Software L10N and I18N
What is L10N and why do we need it?
What is I18N and why do we need it?
What are the principles of I18N?
How do you design I18N programs?
What is POSIX and what is its purpose?
What is the name of the POSIX facility for a specific region?
What are the components of a POSIX NLS package?
What is a locale, and what are the classes in each locale?

Review12
POSIX provides a set of interface functions; how are their behaviors defined, and where?
What are the major files in each locale?
If POSIX had never been developed, could you still develop an I18N program on top of an operating system?
What is a symbolic name, and where are symbolic names used?
How do we know the binary code of a symbolic name?
Programming using the wide character data type vs. multi-byte characters
What is collation and how does it work?
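The wide character vs. multi-byte question can be illustrated with a small sketch. It uses GB 2312 bytes as the multi-byte form and UTF-32 as a stand-in for a fixed-width wide character type; that stand-in is an assumption, since the width of C's `wchar_t` depends on the platform. Both helper names are hypothetical.

```python
def split_gb2312(data: bytes) -> list:
    """Split GB 2312 (EUC-CN) bytes into per-character chunks by scanning from the start."""
    chars, i = [], 0
    while i < len(data):
        width = 2 if data[i] & 0x80 else 1  # high bit on means a two-byte character
        chars.append(data[i:i + width])
        i += width
    return chars


def nth_wide(data: bytes, n: int) -> bytes:
    """Fetch the n-th character of fixed-width (4-byte) data by direct indexing."""
    return data[4 * n:4 * n + 4]


text = "A中"
multibyte = text.encode("gb2312")  # 1 byte for 'A', 2 bytes for '中'
wide = text.encode("utf-32-be")    # 4 bytes for every character

print(len(text), len(multibyte), len(wide))
print(split_gb2312(multibyte))
print(nth_wide(wide, 1).decode("utf-32-be"))
```

The multi-byte form is compact, but finding the n-th character means scanning from the beginning. The wide form costs more space and allows direct indexing.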

Review13
Open systems
What is an open system?
Why do we want open systems?
What are the measurements of an open system?
What is an open specification?
What are the two types of portability issues?
What mechanisms can be used to improve portability, or how can we write portable programs?

Review14

Review15
Output
What are characters, glyphs, and fonts?
What are their relationships and/or differences?
– Internal representation vs. external representation
What is the difference between the character box and the bounding box?
Why should there be space between the character box and the bounding box?
What does rendering mean?
What are the two different glyph/font representations?

Review16
What are the characteristics of bitmap fonts and outline fonts?
– Representations, scaling (distortion), space requirement, compression
How do you deal with distortion in the scaling of bitmap fonts?
– Ad hoc smoothing algorithms
– Smoothing spline and interpolation
Understanding of Bézier's cubic curves
– Control points and the equations
Why is bitmap-to-outline conversion needed?
How does erosion work?
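For the control points and equations of a cubic Bézier curve, the standard form is B(t) = (1−t)³P0 + 3(1−t)²t·P1 + 3(1−t)t²·P2 + t³P3 for t in [0, 1]. The sketch below evaluates it for 2D points; the function name and the sample control points are illustrative only.

```python
def cubic_bezier(p0, p1, p2, p3, t):
    """Evaluate a cubic Bezier curve at parameter t (0 <= t <= 1).

    p0 and p3 are the end points; p1 and p2 are the control points
    that pull the curve toward them without lying on it.
    """
    u = 1.0 - t
    # Bernstein weights; they always sum to 1
    b0, b1, b2, b3 = u ** 3, 3 * u * u * t, 3 * u * t * t, t ** 3
    return tuple(b0 * a + b1 * b + b2 * c + b3 * d
                 for a, b, c, d in zip(p0, p1, p2, p3))


# Illustrative control points forming an arch
P0, P1, P2, P3 = (0, 0), (0, 1), (1, 1), (1, 0)
for t in (0.0, 0.5, 1.0):
    print(t, cubic_bezier(P0, P1, P2, P3, t))
```

The curve starts at P0 and ends at P3. Because the shape is defined by four points rather than pixels, an outline font built from such curves can be scaled without the distortion that affects bitmap fonts.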

Review17
Unicode on different platforms
Unicode is supported on what platforms and in what forms?
– Unix, Windows, Mac, Linux
What is a code page?
Can Unicode be used if the operating system is not coded using Unicode?
Why would the encoding need to be specified when compiling a Java program?
What are the data structures supporting multi-byte and Unicode in Java?

Review18
I18N vs. multilingual applications
What is the difference between an I18N program and a multilingual application?
Can a multilingual application be designed/implemented using I18N?
What needs to be separately considered in the design of multilingual applications?
What is the relationship between multilingual applications and Unicode?

Review19
IDCs and the IDS
What are ideographic description characters (IDCs)?
– Different types of IDCs
Why introduce IDCs?
What is an ideographic description sequence (IDS)?
How is an IDS expressed?
For a given character, is its IDS unique?
For a given IDS, does it uniquely define a character?
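An IDS is written in prefix order: an IDC followed by its operands, each of which may itself be an IDS. The sketch below parses such a sequence into a nested tuple. It covers only the twelve IDCs in U+2FF0..U+2FFB, of which U+2FF2 and U+2FF3 take three operands and the rest take two. The function names and the example sequences are illustrative assumptions, not taken from the course material.

```python
TERNARY_IDCS = {"\u2ff2", "\u2ff3"}  # the two IDCs that take three operands


def is_idc(ch: str) -> bool:
    """True for the ideographic description characters U+2FF0..U+2FFB."""
    return "\u2ff0" <= ch <= "\u2ffb"


def parse_ids(ids: str, pos: int = 0):
    """Parse a prefix-notation IDS starting at pos; return (tree, next position)."""
    ch = ids[pos]
    if not is_idc(ch):
        return ch, pos + 1                 # a plain component is a leaf
    arity = 3 if ch in TERNARY_IDCS else 2
    parts, nxt = [], pos + 1
    for _ in range(arity):                 # each operand may be a nested IDS
        part, nxt = parse_ids(ids, nxt)
        parts.append(part)
    return (ch, *parts), nxt


print(parse_ids("⿰女子")[0])      # one left-right composition
print(parse_ids("⿰氵相")[0])      # a shallow description
print(parse_ids("⿰氵⿰木目")[0])  # a deeper description of the same shape
```

The last two sequences describe the same left-right shape at different depths, which suggests why a character need not have a single IDS.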

Review20
Information retrieval
Differences between an IRS and a database system
Basic components of an IRS
What is the purpose of the VSM?
What data are associated with a VSM?
What are the similarity functions for?
What is term selection for, and what methods are used to do it?
What kinds of information can be used as weights for the VSM?
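To make the VSM questions concrete, the sketch below represents each text as a vector of term weights and ranks documents against a query with cosine similarity, one common similarity function. Raw term frequency is used as the weight and whitespace splitting stands in for term selection; both are simplifying assumptions, and the function names and sample texts are hypothetical.

```python
import math
from collections import Counter


def tf_vector(text: str) -> Counter:
    """Build a term-frequency vector; terms are lowercased whitespace-separated tokens."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse term vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())  # only shared terms contribute
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


docs = ["unicode encoding of chinese text", "bitmap and outline fonts"]
query = tf_vector("chinese text encoding")
for doc in sorted(docs, key=lambda d: cosine(query, tf_vector(d)), reverse=True):
    print(round(cosine(query, tf_vector(doc)), 3), doc)
```

Other weights, such as inverse document frequency, can replace the raw counts without changing the similarity function.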