1
Anti-Virus Product Development Cliff Penton Head of Software Development Sophos Plc Slides © 1999 Sophos Plc http://www.sophos.com/
2
Who are Sophos? Founded in 1980 as an electronic design partnership Moved into data security in 1985 In 1989, among the first to respond to computer viruses Anti-Virus is the main focus of the business World leading enterprise-wide anti-virus software Cover more platforms than any other anti-virus vendor
3
What do we make?
4
Conventional product development [Timeline figure: Word 95, Office 97, Office 2000; service releases SR1, SR2]
5
Anti-virus product development... [Timeline figure: Nov, Dec, Jan, Feb, Mar, Apr; parallel tracks: Product features, OS developments, Virus research]
6
Anti-virus product development... Presents simultaneous development challenges: Complexity Transparency Quality Regularity
7
Anti-virus product development... User Interface Virus Detection Engine Virus Descriptions Coping with the complexity
8
Anti-virus product development... There are many issues, but I will focus on two today... Multiple operating systems: DOS/Windows 3.x, Windows 95/98, Windows NT, OS/2, NetWare, Macintosh, OpenVMS, Unix... Dealing with multiple languages: English, French, German, Spanish, Japanese...
9
Multiple operating systems The key issues in cross-platform development are: Endianism Packing and alignment Multitasking Memory management File I/O
10
Endianism Different hardware platforms store the bytes of a multi-byte number in memory in a different order Big endian (e.g. SPARC) Little endian (e.g. Intel) When exchanging information between platforms, be aware of endian-related problems
11
Endianism Bytes in memory: 01 02 03 04 Read as big endian: 0x01020304 Read as little endian: 0x04030201
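One way to sidestep the problem when exchanging data is to assemble values byte by byte in a declared order, rather than copying raw memory. The following is a minimal sketch, not taken from the slides; the function names read_u32_be and read_u32_le are hypothetical.

```c
#include <stdint.h>

/* Read a 32-bit value stored big endian (most significant byte first).
   Works the same on any host because it never reinterprets memory. */
uint32_t read_u32_be(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Read a 32-bit value stored little endian (least significant byte first). */
uint32_t read_u32_le(const unsigned char *p)
{
    return ((uint32_t)p[3] << 24) | ((uint32_t)p[2] << 16) |
           ((uint32_t)p[1] << 8)  |  (uint32_t)p[0];
}
```

Given the four bytes 01 02 03 04, the first function yields 0x01020304 and the second 0x04030201, whatever the byte order of the machine running the code.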
12
Packing and alignment Some platforms strictly enforce data alignment when reading and writing memory Careless memory references may lead to disaster (SIGBUS, or GPF) Usually happens when reading structures from a file with packing set to single byte Better to read/write struct elements by assignment
13
Packing and alignment typedef struct { long a; char b; short c; } x; How big is this structure?
14
Packing and alignment typedef struct { long a; char b; short c; } x; How big is this structure? 8 with default packing 7 with 1 byte packing Compiling with Visual C++ 6.0
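Reading elements by assignment can be sketched as follows. This is an illustration under stated assumptions: the file holds the 7-byte, 1-byte-packed layout in little-endian order, and the names record_t, RECORD_FILE_SIZE and record_unpack are hypothetical. Fixed-width types stand in for long and short so the sizes do not depend on the compiler.

```c
#include <stdint.h>

/* In-memory form; the compiler may pad this however it likes. */
typedef struct {
    int32_t a;
    char    b;
    int16_t c;
} record_t;

#define RECORD_FILE_SIZE 7  /* 4 + 1 + 2 bytes, no padding in the file */

/* Decode one record from a byte buffer holding the packed,
   little-endian file layout (an assumption of this sketch).
   Each field is built from individual bytes, so no misaligned
   multi-byte read ever happens. */
void record_unpack(const unsigned char *buf, record_t *out)
{
    uint32_t a = (uint32_t)buf[0] | ((uint32_t)buf[1] << 8) |
                 ((uint32_t)buf[2] << 16) | ((uint32_t)buf[3] << 24);
    uint16_t c = (uint16_t)(buf[5] | (buf[6] << 8));

    out->a = (int32_t)a;
    out->b = (char)buf[4];
    out->c = (int16_t)c;
}
```

The in-memory structure keeps the compiler's default packing; only the file format is fixed at 7 bytes.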
15
Multitasking Different operating systems use different scheduling schemes Cooperative (non-preemptive) multitasking Preemptive multitasking Tight loops and other compute-bound operations need careful tweaking to maintain performance on cooperative multitasking systems, where a task keeps the CPU until it yields
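A common way to tame a tight loop is to call a yield hook every so many iterations. The sketch below is an assumption about how that might look, not the approach the slides describe; yield_fn, YIELD_INTERVAL, demo_yield and checksum_with_yield are hypothetical names, and the hook would wrap whatever yield call the platform provides.

```c
#include <stddef.h>

/* Hypothetical hook type: wraps the platform's "give up the CPU" call. */
typedef void (*yield_fn)(void);

#define YIELD_INTERVAL 4096u  /* bytes processed between yields; tune per platform */

/* Demo hook that only counts how often it was called. */
unsigned long demo_yield_calls = 0;
void demo_yield(void) { demo_yield_calls++; }

/* Sum the bytes of a buffer, yielding periodically so that other
   tasks can run. Pass NULL as the hook on preemptive systems. */
unsigned long checksum_with_yield(const unsigned char *data, size_t len,
                                  yield_fn yield)
{
    unsigned long sum = 0;
    size_t i;

    for (i = 0; i < len; i++) {
        sum += data[i];
        if (yield != NULL && (i + 1) % YIELD_INTERVAL == 0)
            yield();
    }
    return sum;
}
```

The interval is the tuning knob: too small and the loop spends its time yielding, too large and the rest of the system stalls.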
16
Memory management Not all operating systems have virtual memory, so we cannot rely on malloc() and free() Some require explicit virtual memory management, such as DOS and NetWare Need to use an intermediate layer to conditionally choose between implicit and explicit virtual memory management
17
Memory management Explicit virtual memory management involves: Allocating a handle to a memory block Locking the handle to get a pointer to physical memory Using the memory as usual Unlocking the handle, releasing physical memory Deallocating the handle when finished
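The allocate, lock, use, unlock, deallocate sequence can be hidden behind a small intermediate layer. The sketch below shows only the implicit case, backed by malloc(); an explicit-memory platform would substitute its own calls behind the same interface. All names (vm_handle, vm_alloc, vm_lock, vm_unlock, vm_free) are hypothetical.

```c
#include <stdlib.h>

typedef struct vm_block {
    void  *ptr;         /* backing storage (plain heap memory in this sketch) */
    size_t size;
    int    lock_count;  /* number of outstanding locks */
} vm_block;

typedef vm_block *vm_handle;

/* Allocate a handle to a memory block; returns NULL on failure. */
vm_handle vm_alloc(size_t size)
{
    vm_handle h = malloc(sizeof *h);
    if (h == NULL)
        return NULL;
    h->ptr = malloc(size);
    if (h->ptr == NULL) {
        free(h);
        return NULL;
    }
    h->size = size;
    h->lock_count = 0;
    return h;
}

/* Lock the handle to get a usable pointer. */
void *vm_lock(vm_handle h)
{
    h->lock_count++;
    return h->ptr;
}

/* Unlock the handle; the pointer must not be used afterwards. */
void vm_unlock(vm_handle h)
{
    if (h->lock_count > 0)
        h->lock_count--;
}

/* Deallocate the handle; refuses (returns -1) while it is still locked. */
int vm_free(vm_handle h)
{
    if (h->lock_count != 0)
        return -1;
    free(h->ptr);
    free(h);
    return 0;
}
```

Callers written against this interface follow the explicit discipline everywhere, so the same code should run unchanged on platforms where the lock step really does pin physical memory.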
18
File I/O File I/O primitives differ between operating systems File security considerations need to be taken into account Standard library calls may not provide the required functionality
19
Multiple languages Our Windows products ship in five languages: English, French, German, Spanish, and Japanese Introduces issues of character encoding: UNICODE vs. SBCS vs. MBCS Adds the overhead of translation to the development process, which can be significant
20
Internationalisation Character sets, alphabets and character encoding Code pages Dates and times Generic coding techniques Adding resources for multiple languages
21
English language 26 characters plus others < 256 7 bits == ASCII or 8 bits == ANSI 1 character == 1 byte SBCS or Single Byte Character Set Very familiar to anyone who has used strxxx() functions
22
European languages Accented characters are part of many languages: à, ô (French); ñ, ¡, ¿ (Spanish); ö, ß (German) Characters 0-127 are the same (ASCII) Characters 128-255 are called extended characters Still SBCS, but requires code pages...
23
Code pages The extended characters of each language are supported via code pages. The code pages in DOS and Windows are different! DOS - English (British) code page 850 (Latin 1) DOS - English (US) code page 437 (Latin US) Windows Latin 1 (ANSI) code page 1252
24
Example code page problem [Figure: DOS CP 850 compared with Windows CP 1252]
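To make the problem concrete: the pound sign is byte 0x9C in CP 850 but 0xA3 in CP 1252, so text moved between DOS and Windows needs converting. The sketch below maps only a handful of extended characters and is not a complete table; the function name cp850_to_cp1252 is hypothetical.

```c
/* Convert one CP 850 byte to its CP 1252 equivalent.
   Only a few extended characters are mapped in this sketch;
   anything else above 0x7F comes back as '?'. */
unsigned char cp850_to_cp1252(unsigned char c)
{
    if (c < 0x80)
        return c;               /* ASCII is identical in both code pages */

    switch (c) {
    case 0x81: return 0xFC;     /* ü */
    case 0x82: return 0xE9;     /* é */
    case 0x85: return 0xE0;     /* à */
    case 0x93: return 0xF4;     /* ô */
    case 0x94: return 0xF6;     /* ö */
    case 0x9C: return 0xA3;     /* £ */
    case 0xA4: return 0xF1;     /* ñ */
    case 0xE1: return 0xDF;     /* ß */
    default:   return '?';      /* unmapped in this sketch */
    }
}
```

A production converter would carry the full 128-entry table, and would need a policy for CP 850 characters (such as box-drawing symbols) that have no CP 1252 equivalent.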
25
Far East languages Now the fun begins… Chinese has more than 10,000 characters Japanese has several character types: Hiragana phonetic characters Katakana phonetic characters, used to spell words taken from foreign languages Kanji characters of Chinese origin
26
Double byte character encoding Say hello to DBCS, or Double Byte Character Set, where: 0x00 -> 0x7F is ASCII as usual 0x80 -> 0xFF is a combination of Kana (single-byte), and Kanji lead-bytes Used on Win95, WinNT, Mac, NetWare, OS/2
27
Double byte character encoding
28
Programming for DBCS 1 character != 1 byte If the character is double-byte, both bytes of the character must be dealt with together 0x00 is always NUL, so it is safe to scan a string for '\0' Trail byte values can be confused with other characters (e.g. \) if not handled properly Never scan with pointer arithmetic (i.e. ptr++ )
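The trail-byte trap is easy to demonstrate with a path search. In Shift-JIS, the Katakana character "so" is encoded as 0x83 0x5C, and 0x5C is also the backslash. The sketch below assumes Shift-JIS (Windows code page 932) lead-byte ranges, which are hard-coded; on Win32 one would normally use IsDBCSLeadByte() or _mbsinc() instead. The names is_lead_byte and dbcs_last_backslash are hypothetical.

```c
#include <stddef.h>

/* Lead-byte test with the Shift-JIS (CP 932) ranges hard-coded.
   This is an assumption of the sketch; real code should ask the OS. */
int is_lead_byte(unsigned char c)
{
    return (c >= 0x81 && c <= 0x9F) || (c >= 0xE0 && c <= 0xFC);
}

/* Find the last genuine backslash in a DBCS string, stepping one
   whole character at a time so trail bytes are never examined. */
const char *dbcs_last_backslash(const char *s)
{
    const char *last = NULL;

    while (*s != '\0') {
        if (is_lead_byte((unsigned char)*s) && s[1] != '\0') {
            s += 2;             /* skip lead byte and trail byte together */
        } else {
            if (*s == '\\')
                last = s;
            s += 1;
        }
    }
    return last;
}
```

For the bytes of C:\dir\ followed by 0x83 0x5C, a byte-wise strrchr() reports the trail byte at offset 8 as the last backslash, while the character-aware scan correctly reports offset 6.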
29
UNICODE Instead of using 1 byte per character, Unicode uses 2 bytes per character 65536 possible characters in one character set No need for code pages
30
UNICODE
31
Word breaking Sentences in Japanese do not have spaces between words. Sentences can be broken at any Japanese character. Break sentences on spaces and lead bytes.
32
Dates and times Date and time representations are not universal: UK 22/05/98, USA 05/22/98, Japan 10Y 05M 22D Either use an OS call (e.g. GetDateFormat() on Windows), or embed a date format string in a language-dependent resource
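The second option, a format string held per language, might look like the following sketch. It uses the standard strftime() call and covers only the UK and USA layouts; the Japanese era-based form would need more than a format string. The names FMT_UK, FMT_US, date_formats and format_date are hypothetical, and in a real product the strings would live in the language resource rather than in source.

```c
#include <stddef.h>
#include <time.h>

enum { FMT_UK, FMT_US, FMT_COUNT };  /* hypothetical language identifiers */

/* One strftime() pattern per language; stands in for a resource lookup. */
static const char *const date_formats[FMT_COUNT] = {
    "%d/%m/%y",  /* UK:  day/month/year */
    "%m/%d/%y"   /* USA: month/day/year */
};

/* Format a date for the given language. Returns the number of
   characters written, or 0 on an unknown language or short buffer. */
size_t format_date(int lang, const struct tm *when, char *out, size_t out_size)
{
    if (lang < 0 || lang >= FMT_COUNT)
        return 0;
    return strftime(out, out_size, date_formats[lang], when);
}
```

For 22 May 1998 this produces 22/05/98 with FMT_UK and 05/22/98 with FMT_US.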
33
Generic coding techniques Use the libraries available, e.g. for Win32 _tcsinc() maps to _strinc() for SBCS, _mbsinc() for MBCS, _wcsinc() for UNICODE Use TCHAR not char Always enclose literal text with the _T() macro
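The mechanism behind generic-text mappings can be imitated in portable C. The sketch below is not the Win32 tchar.h header; xchar, X() and xstrlen are hypothetical stand-ins for TCHAR, _T() and _tcslen(), and USE_WIDE is a hypothetical build switch.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* One build switch selects the character type, the literal prefix,
   and the matching library routine, all at compile time. */
#ifdef USE_WIDE
typedef wchar_t xchar;
#define X(s)     L##s
#define xstrlen  wcslen
#else
typedef char xchar;
#define X(s)     s
#define xstrlen  strlen
#endif

/* Application code is written once against the generic names. */
size_t name_length(const xchar *name)
{
    return xstrlen(name);
}
```

A call such as name_length(X("SWEEP")) compiles unchanged in both builds and returns 5 either way, which is the point of writing against the generic names.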
34
Formatting messages Never concatenate two strings to form a sentence Take care when using printf(), as language variations may dictate reordering of insertion objects Win32 can use FormatMessage() NetWare can use NWprintf() etc.
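Numbered insertion points let a translator reorder the inserts without touching the code. The following sketch expands %1 to %9 in the style that FormatMessage() templates use. It is a simplified, single-byte illustration rather than a replacement for the OS call, and format_positional and the sample strings are made up for the example.

```c
#include <stddef.h>
#include <string.h>

/* Expand %1..%9 in fmt using args[0..nargs-1].
   Returns 0 on success, -1 if the output buffer is too small.
   A '%' not followed by a valid insert number is copied as is. */
int format_positional(char *out, size_t out_size, const char *fmt,
                      const char *const args[], size_t nargs)
{
    size_t used = 0;

    while (*fmt != '\0') {
        const char *piece = fmt;    /* default: copy one literal byte */
        size_t piece_len = 1;

        if (fmt[0] == '%' && fmt[1] >= '1' && fmt[1] <= '9' &&
            (size_t)(fmt[1] - '1') < nargs) {
            piece = args[fmt[1] - '1'];
            piece_len = strlen(piece);
            fmt += 2;
        } else {
            fmt += 1;
        }

        if (used + piece_len >= out_size)
            return -1;              /* leave room for the terminator */
        memcpy(out + used, piece, piece_len);
        used += piece_len;
    }

    if (out_size == 0)
        return -1;
    out[used] = '\0';
    return 0;
}
```

With the inserts "Example.A" and "C:\TEST.COM", the template "Virus %1 found in %2" and a reordered template "%2: %1" both work from the same argument list, which printf()-style formatting cannot do portably.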
35
External resources Avoid hard-coding text into application source code Win32, Mac and OS/2 use resource files NetWare uses message databases DOS, VMS, etc. have to store strings in separate modules, which are linked individually or loaded at run time
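On a platform without resource files, keeping strings in a separate module could be as simple as a table indexed by language and message number. This is a sketch under that assumption; the identifiers (MSG_*, LNG_*, messages, get_message) and the message texts are hypothetical, and a missing translation falls back to English.

```c
#include <stddef.h>

enum { MSG_SCAN_COMPLETE, MSG_VIRUS_FOUND, MSG_COUNT };  /* hypothetical message IDs */
enum { LNG_ENGLISH, LNG_GERMAN, LNG_COUNT };             /* hypothetical language IDs */

/* All user-visible text lives here, not in the application logic.
   NULL marks a string that has not been translated yet. */
static const char *const messages[LNG_COUNT][MSG_COUNT] = {
    { "Scan complete", "Virus found"    },   /* English */
    { NULL,            "Virus gefunden" }    /* German  */
};

/* Look up a message, falling back to English when the language is
   unknown or the translation is missing. */
const char *get_message(int lang, int id)
{
    const char *text;

    if (id < 0 || id >= MSG_COUNT)
        return "";
    if (lang < 0 || lang >= LNG_COUNT)
        lang = LNG_ENGLISH;

    text = messages[lang][id];
    return text != NULL ? text : messages[LNG_ENGLISH][id];
}
```

Because the application only ever asks for a message by number, the table can be swapped for a resource file, a message database or a separately loaded module without changing the callers.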
36
Delivering multiple languages Multiple language resources linked into executable -- good for small programs with limited text Multiple executables -- e.g. SWEEP for DOS Multiple resource-only DLLs -- extremely flexible solution if OS supports DLLs Multiple text-only message files for text-only operating systems
37
Cliff Penton Sophos Plc Oxford England Tel +44 1235 559933 Email cp@sophos.com http://www.sophos.com/