Published by Wyatt Coffey. Modified over 11 years ago.
1
Multilingual Editing using RichEdit 4+
Hon Wah Chan, Murray Sargent III
Microsoft Corporation
Text Services Group, Word
2
Introduction
- RichEdit is a text engine with a hierarchy of presentation formats
- Features such as automatic choice of fonts, rich text, 2D text objects
- Handling non-Unicode documents in Unicode text engines
- Describe interfaces and component usage
- Ways to input Unicode text using IMEs, speech
- Demo
3
What's RichEdit?
- RichEdit 4.x is a set of plain/rich-text, single/multiline, Unicode/ANSI edit controls and combo/listboxes in a single worldwide binary
- Multilevel undo, message and COM interfaces, Word compatibility, pretty rich text
- Outline view, zoom, font binding, latest in IME support, and rich complex-script support (BiDi, Indic, and Thai)
4
Clients include
- Handheld PC PocketWord
- eBooks
- OE (for mail header)
- Borland's Delphi
- SQL Server dev tools, RAID
- MSN Companion chat
- Via Win2k wrapper: cc:mail, WebEditPro, Eudora, Encarta, Money (US), Sibelius, Borland TRichedit class, apps created with VB, MFC…
- Outlook mail note, post-it
- Most Office dialogs
- All OSes since Win98
- WordPad, Charmap
- Darwin installer
- WebCalc
- Project
- Visual Studio, DaVinci
- Publisher
- FrontPage
5
Some Fancier Features
- Features added for eBooks: pagination, hyphenation, kerning, ClearType support, text wrap around embedded objects
- Multilevel tables
- Autocorrect
- AutoURL detection (improved from 3.0)
6
2D Text Objects
- RichEdit 4.5 (in development) supports WYSIWYG editing of many 2D objects
- Ruby, Tatenakayoko, Warichu, Kumimoji
- Math: fractions, autosizing brackets, boxes, matrices, integrals
- Demo will show some of these features
7
Backward Compatibility
- Unicode text engines need to import/export text in other character sets
- Given non-Unicode plain text, which code page should one use to convert to/from Unicode?
- On localized systems, the system code page is a good bet
- In multilingual text, you can enter text using keyboards in a variety of languages that need either Unicode or multiple code pages
- For searching text, the best choice seems to be the current keyboard code page
- If text begins with a BOM, it's Unicode
- If text begins with a rich-text header, e.g., {\rtf or, use appropriate conversion routine
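The checks on this slide can be sketched in a few lines of Python. This is only an illustration, not RichEdit's actual import code: the function names `sniff_format` and `import_plain_text` are made up for the example, and the `cp1252` fallback stands in for "the system code page".

```python
import codecs

# Byte-order marks, longest first so that the UTF-32 LE mark (FF FE 00 00)
# is not mistaken for the UTF-16 LE mark (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def sniff_format(data: bytes) -> str:
    """Return a Python codec name, "rtf", or "codepage" (hypothetical helper)."""
    for bom, codec in _BOMS:
        if data.startswith(bom):
            return codec          # a BOM means the text is Unicode
    if data.startswith(b"{\\rtf"):
        return "rtf"              # rich-text header: needs an RTF reader
    return "codepage"             # no clue in the data: fall back to a code page


def import_plain_text(data: bytes, fallback_codepage: str = "cp1252") -> str:
    """Decode plain text to Unicode; the fallback code page is an assumption."""
    kind = sniff_format(data)
    if kind == "rtf":
        raise ValueError("rich text: hand off to an RTF conversion routine")
    if kind == "codepage":
        return data.decode(fallback_codepage, errors="replace")
    return data.decode(kind)      # these codecs consume the BOM themselves
```

For example, `import_plain_text(b"caf\xe9")` decodes through the assumed fallback code page, while input starting with `FF FE` is treated as UTF-16 regardless of the fallback.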
8
Backward Compatibility (cont)
- Need a little rich-text functionality to display Unicode plain text unambiguously in some CJK scenarios
- This functionality handles font choices and language-dependent glyph variants
- When a user types text using a keyboard charset, the edit engine knows the charset and can therefore insert accurate Unicode text, including which CJK glyph variant to use
- Client gets text as pure ANSI (or Unicode) text without script clues
- Would be handy to have script tags
9
Complex Scripts
- Unicode covers many complex scripts, e.g., Arabic, Indic, Thai, ancient Korean
- Complex scripts require a layout engine that translates character codes to glyph indices (often referencing ligatures)
- RichEdit uses Uniscribe and the MS line-layout component for complex scripts
10
Font Binding
- Most Unicode characters belong to scripts
- Associate a font bundle with each position in a document
- When inserting characters, assign each one to a script
- For CJK, check surrounding characters for Kana and Hangul as clues to use Japanese or Korean fonts instead of Chinese
- Assign scripts to neutrals and digits
- Keyboard language, especially with IMEs, provides strong binding clues
- Format inserted characters with fonts assigned to scripts. Check the current font to see if it supports the required script
- RichEdit 4.0 has 50 scripts for Unicode 3.1. Client can specify what default font to use for a given script
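A rough sketch of these font-binding steps, in Python. It is a simplification and not RichEdit's algorithm: the script table covers only a handful of code-point ranges, the Kana/Hangul check looks at the whole string rather than nearby characters, neutrals simply inherit the preceding script, and the font names in `FONTS` are illustrative defaults chosen for the example.

```python
from itertools import groupby

# Illustrative per-script default fonts (assumed, client-configurable).
FONTS = {
    "latin": "Times New Roman",
    "japanese": "MS Mincho",
    "korean": "Batang",
    "chinese": "SimSun",
}


def script_of(ch: str) -> str:
    """Classify one character by code-point range (tiny subset of Unicode)."""
    cp = ord(ch)
    if 0x3040 <= cp <= 0x30FF:
        return "kana"
    if 0xAC00 <= cp <= 0xD7A3 or 0x1100 <= cp <= 0x11FF:
        return "hangul"
    if 0x4E00 <= cp <= 0x9FFF:
        return "han"
    if cp < 0x250 and ch.isalpha():
        return "latin"
    return "neutral"  # spaces, digits, punctuation, anything not covered above


def resolve_scripts(text: str) -> list:
    """Assign every character a font-binding script."""
    raw = [script_of(ch) for ch in text]
    # Kana or Hangul anywhere in the text is the clue for binding Han characters.
    if "kana" in raw:
        han_as = "japanese"
    elif "hangul" in raw:
        han_as = "korean"
    else:
        han_as = "chinese"
    mapping = {"kana": "japanese", "hangul": "korean", "han": han_as}
    scripts = [mapping.get(s, s) for s in raw]
    # Neutrals and digits take the script of the preceding strong character;
    # leading neutrals take the first strong script (Latin if there is none).
    last = next((s for s in scripts if s != "neutral"), "latin")
    resolved = []
    for s in scripts:
        if s == "neutral":
            s = last
        else:
            last = s
        resolved.append(s)
    return resolved


def font_runs(text: str, fonts: dict) -> list:
    """Split text into (font, substring) runs."""
    runs, pos = [], 0
    for script, group in groupby(resolve_scripts(text)):
        n = len(list(group))
        runs.append((fonts.get(script, fonts["latin"]), text[pos:pos + n]))
        pos += n
    return runs
```

With these assumptions, `font_runs("Tokyo 東京の", FONTS)` yields a Latin run followed by a Japanese run, because the trailing Kana character marks the Han characters as Japanese.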
11
Language Detection & Font Binding
- Korean and Japanese are often easy to spot because of Hangul and Kana characters, respectively
- For CJK, can convert back to a code page and see if errors occur (Ken Lunde's suggestion)
- For proofing purposes, accurate language identification is needed. For font binding, script identification is usually sufficient
- Typically more than one language corresponds to a script, e.g., Latin script. Essentially only one language uses the Korean script
- Natural language processing techniques allow good language identification if more than a few words are involved, e.g., a sentence
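The code-page round-trip idea can be tried with Python's built-in codecs. This is a minimal sketch under stated assumptions: the choice of the four Windows CJK code pages and the helper name `candidate_codepages` are ours, and Python's codec tables may differ in detail from the Windows conversion tables RichEdit would use.

```python
# Windows CJK code pages, by Python codec name (labels are for readability).
CJK_CODEPAGES = {
    "cp932": "Japanese",
    "cp936": "Simplified Chinese",
    "cp949": "Korean",
    "cp950": "Traditional Chinese",
}


def candidate_codepages(text: str) -> list:
    """Return the code pages that can represent `text` without conversion errors."""
    ok = []
    for codepage in CJK_CODEPAGES:
        try:
            text.encode(codepage)
        except UnicodeEncodeError:
            continue  # at least one character is missing from this code page
        ok.append(codepage)
    return ok
```

Text containing Hangul syllables survives only the Korean code page, so the result narrows to one candidate. Pure ASCII text converts everywhere and gives no clue, which is why the keyboard language and surrounding characters are still needed.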
12
Font Sizing
- In dialogs, 8-pt Latin characters are commonly used
- 8-pt Chinese characters are hard to read, so it is better to use 9-pt Chinese characters in combination with 8-pt Latin characters
- Latin characters have bigger descenders than Chinese characters, since the latter only need room for an underline
- Combining 8-pt Latin characters with 9-pt Chinese characters on the same baseline increases the line height to 9 pts plus extra height for the Latin descender
- Result is more like 10 points: shifts text too high in a dialog box originally designed to handle one language
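The line-height arithmetic can be made concrete with a small calculation. The ascent and descent fractions below are assumed values picked only to illustrate the effect (a deeper Latin descender, a shallower Chinese one); they are not metrics of any real font.

```python
def line_height(runs):
    """Height of a line whose runs share one baseline.

    `runs` is a list of (size_pt, ascent_fraction, descent_fraction) tuples.
    The tallest ascent and the deepest descent each come from whichever run
    needs the most room, so they may come from different fonts.
    """
    ascent = max(size * asc for size, asc, _ in runs)
    descent = max(size * desc for size, _, desc in runs)
    return ascent + descent


# Assumed metrics, as fractions of the font size (illustrative only).
LATIN_8PT = (8, 0.77, 0.23)
CHINESE_9PT = (9, 0.88, 0.12)
```

Under these assumptions `line_height([LATIN_8PT])` is 8 pt, while `line_height([LATIN_8PT, CHINESE_9PT])` comes to roughly 9.8 pt: the Chinese run sets the ascent and the Latin run sets the descent, which is the "more like 10 points" effect described on the slide.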
13
Unicode Surrogate Pairs
- Using two 16-bit surrogates to represent a single character complicates more than measurement and display of characters:
- Arrow-key handlers and other methods that change character position must avoid ending up in between lead and trail surrogates
- Input methods need to map to surrogate pairs
- Case changes, line-breaking rules, sorting, file formats, and backing-store manipulations in general have to recognize and deal with pairs
- Surrogate code ranges make them easy to work with relative to multibyte encoding systems
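The last point, that the dedicated code ranges make surrogates easy to handle, shows up in how little code the mapping takes. The sketch below follows the standard UTF-16 definition; the function names are ours.

```python
def is_lead(unit: int) -> bool:
    """True for a lead (high) surrogate code unit, U+D800..U+DBFF."""
    return 0xD800 <= unit <= 0xDBFF


def is_trail(unit: int) -> bool:
    """True for a trail (low) surrogate code unit, U+DC00..U+DFFF."""
    return 0xDC00 <= unit <= 0xDFFF


def to_surrogates(code_point: int) -> tuple:
    """Split a code point above U+FFFF into its (lead, trail) pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points U+10000..U+10FFFF need a pair")
    offset = code_point - 0x10000           # 20 bits remain
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


def from_surrogates(lead: int, trail: int) -> int:
    """Combine a (lead, trail) pair back into one code point."""
    if not (is_lead(lead) and is_trail(trail)):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((lead - 0xD800) << 10) + (trail - 0xDC00)
```

Because lead and trail units occupy disjoint ranges, a single range test tells which half of a pair a code unit is, with no need to scan back to the start of the text as some multibyte encodings require.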
14
Nonspacing Combining Marks
- Multicode characters (surrogate pairs, CRLFs, combining-mark and variant-tag sequences) require special display/navigation handling
- Render combining-mark sequences with standard system calls and fonts that support combining marks. Better display needs a layout engine that talks to OpenType
- Simple caret movement across combining-mark sequences prevents stopping inside a sequence. Backspace key deletes one mark at a time
- Mouse-cursor hit testing leaves the selection at the beginning/end of a combining-mark sequence (a more elegant model allows selection and editing of individual marks)
- Cool thing: if you can navigate past CRLF combinations, you can modify the corresponding code to handle surrogate pairs and combining-mark sequences quite easily
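The "cool thing" on this slide can be sketched as one caret-advance routine over UTF-16 code units that treats CRLF, surrogate pairs, and combining-mark sequences the same way. This is an illustration in Python, not RichEdit's code; it tests only BMP marks of general category Mn or Me and ignores variant-tag sequences.

```python
import unicodedata


def utf16_units(text: str) -> list:
    """Return the UTF-16 code units of `text` as a list of ints."""
    data = text.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


def is_lead(unit: int) -> bool:
    return 0xD800 <= unit <= 0xDBFF


def is_trail(unit: int) -> bool:
    return 0xDC00 <= unit <= 0xDFFF


def is_mark(unit: int) -> bool:
    """True for a BMP nonspacing or enclosing combining mark."""
    if 0xD800 <= unit <= 0xDFFF:
        return False
    return unicodedata.category(chr(unit)) in ("Mn", "Me")


def next_caret(units: list, pos: int) -> int:
    """Move the caret one stop to the right, never landing inside a sequence."""
    n = len(units)
    if pos >= n:
        return n
    unit = units[pos]
    pos += 1
    if unit == 0x0D and pos < n and units[pos] == 0x0A:
        return pos + 1                      # step over CR LF as one unit
    if is_lead(unit) and pos < n and is_trail(units[pos]):
        pos += 1                            # step over the trail surrogate
    while pos < n and is_mark(units[pos]):
        pos += 1                            # swallow following combining marks
    return pos


def caret_stops(units: list) -> list:
    """All legal caret positions, from the start to the end of the text."""
    stops = [0]
    while stops[-1] < len(units):
        stops.append(next_caret(units, stops[-1]))
    return stops
```

For the text "a", "e" plus U+0301, CR LF, and U+1D11E (seven code units), the legal caret stops are 0, 1, 3, 5, and 7. A backspace handler that deletes one mark at a time would step back by single code units inside a mark sequence instead of using these stops.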
15
Interfaces
- Messages and keyboard
- File read/write (plain text or RTF)
- TOM (Text Object Model)
- ITextServices/ITextHost interfaces
16
RichEdit Message Interface
- System messages
  - Keyboard messages
  - Mouse messages
  - Clipboard messages
  - Edit messages: RichEdit supports all but four of the system edit messages
- RichEdit messages
  - Character/paragraph formatting
  - Text input/query
  - Notification
17
File Formats
- Plain text can be saved/read encoded in any code page, including Unicode and UTF-8
- RTF is the principal rich-text format
- UTF-8 RTF is used preferentially for cut/copy/paste. Can be used in stream operations
- Copying text to/from Word can be a handy way to get desired formatting into a RichEdit instance
- HTML is available via system converters
18
TOM (Text Object Model)
- A set of COM dual interfaces that allow Unicode rich/plain text to be manipulated by VB, C/C++, and Java clients
- Access for spelling/grammar checkers
- Accessibility
- Powerful and efficient text processing primitives. Embedded scripts
19
TOM (cont)
ITextDocument      Top-level editing object
ITextStoryRanges   Enumerator for stories in document
ITextRange         Primary text interface: range of text
ITextFont          Character-attribute interface
ITextPara          Paragraph-attribute interface
ITextTag           HTML tag interface
ITextAttributes    Tag-attribute enumerator
ITextSelection     Screen highlighted text range
The selection inherits all text-range methods
20
ITextServices/ITextHost Interfaces
- Windowless interfaces that go beyond the message interface
- In-place active state: use the window of the container
- Fewer system resources
- Faster activation and deactivation
21
Other Components Used
- Uniscribe
- MS line-layout component
- Windows Text Services Framework
- Callbacks for access to word-break, autocorrect, hyphenation, and ClearType libraries
22
Input Methods
- Support for the latest IMEs
- Speech and handwriting input (Windows Text Services Framework)
- Alt+x Unicode input method
- Standard hot keys
23
IMEs
- Support Level 2 and Level 3 IMEs
- Support Active Input Method Manager (AIMM)
- Reconversion: user can convert the final string back to composition mode, allowing easy selection of a different candidate string
- Document feed: provides the IME with text for the current paragraph to increase conversion accuracy during typing
- Mouse operation: gives the user better control over candidate and UI windows
- Caret position: gets current caret and line info, which IME98 uses to position UI windows (e.g., candidate list)
24
Windows Text Services Framework
- Provide support for Far East input across language Win32 platforms to aware applications
- Provide consistent UI for different input methods
  - Speech, handwriting, IME
- Coordinated input
- Data persistence for dynamic text editing
- RichEdit supports both the native mode and Active Input Method Manager (AIMM) mode
25
Hex to Unicode Input Method
- Type the character's hexadecimal Unicode code
- Make corrections as needed
- Type Alt+x to convert to the character
- Type Alt+x to convert back to hex (especially useful for a missing-glyph character)
- Resolve ambiguities by selection
- Input higher-plane characters using a 5- or 6-digit code
- MS Word 2002 standard
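A minimal Python sketch of the two Alt+x directions follows. It is an assumption-laden illustration rather than the Word or RichEdit rule set: it works on the text before the caret only, takes the longest trailing run of up to six hex digits that forms a valid code point, and leaves selection-based disambiguation to the caller. The function names are ours.

```python
HEX_DIGITS = set("0123456789abcdefABCDEF")


def hex_to_char(text: str) -> str:
    """Alt+x forward: replace the trailing hex code with its character."""
    run = 0
    while run < min(6, len(text)) and text[-1 - run] in HEX_DIGITS:
        run += 1
    # Prefer the longest run, but back off if it is not a valid code point.
    for n in range(run, 0, -1):
        code_point = int(text[-n:], 16)
        is_surrogate = 0xD800 <= code_point <= 0xDFFF
        if 0 < code_point <= 0x10FFFF and not is_surrogate:
            return text[:-n] + chr(code_point)
    return text  # nothing convertible before the caret


def char_to_hex(text: str) -> str:
    """Alt+x backward: replace the last character with its hex code."""
    if not text:
        return text
    return text[:-1] + format(ord(text[-1]), "04X")
```

The ambiguity mentioned on the slide shows up in input such as "abc1D11E": every character is a hex digit, the six-digit reading C1D11E is out of range, and the sketch falls back to the five-digit code 1D11E. Selecting the intended digits before pressing Alt+x is the way to override such a guess.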
26
Unicode Combobox/Listbox
- Emulate the system combobox and listbox
- Unicode support on all Win32 platforms
- Allow mixed languages between items
- Modified EM_SETTEXTEX for inserting items
- Used in Office applications
27
Demo
28
Conclusions
- Have described RichEdit, an engine for text display and editing with a hierarchy of presentation formats
- Automatic choice of fonts for Unicode plain text, including surrogate-pair characters and combining-mark sequences
- Handling non-Unicode documents in Unicode text engines
- Described interfaces and component usage
- Ways to input Unicode text using IMEs, speech
- Clients include many Office and Windows apps
- Able to display 2D text objects such as Ruby and Warichu