1 Portability The material for this lecture is drawn, in part, from The Practice of Programming (Kernighan & Pike) Chapter 8 Professor Jennifer Rexford.

Slides:



Advertisements
Similar presentations
1 Lecture 3: Instruction Set Architecture ISA types, register usage, memory addressing, endian and alignment, quantitative evaluation.
Advertisements

INSTRUCTION SET ARCHITECTURES
COMP3221: Microprocessors and Embedded Systems Lecture 2: Instruction Set Architecture (ISA) Lecturer: Hui Wu Session.
1 Lecture 1  Getting ready to program  Hardware Model  Software Model  Programming Languages  The C Language  Software Engineering  Programming.
24/06/2015CSE1303 Part B lecture notes 1 Words, bits and pieces Lecture B05 Lecture notes section B05.
8 November Forms and JavaScript. Types of Inputs Radio Buttons (select one of a list) Checkbox (select as many as wanted) Text inputs (user types text)
Introduction to C Programming Overview of C Hello World program Unix environment C programming basics.
Data types and variables
Portability CPSC 315 – Programming Studio Spring 2008 Material from The Practice of Programming, by Pike and Kernighan.
Primitive Types Java supports two kinds of types of values – objects, and – values of primitive data types variables store – either references to objects.
1 Portable Programming CS Portability We live in a heterogeneous computing environment  Multiple kinds of HW: IA32, IA64, PowerPC, Sparc, MIPS,
Data Types in the Kernel Sarah Diesburg COP 5641.
CHARACTERS Data Representation. Using binary to represent characters Computers can only process binary numbers (1’s and 0’s) so a system was developed.
CENG 311 Machine Representation/Numbers
Summer 2014 Chapter 1: Basic Concepts. Irvine, Kip R. Assembly Language for Intel-Based Computers 6/e, Chapter Overview Welcome to Assembly Language.
Assembly Language for x86 Processors 7th Edition
Instruction Set Architecture
DEPARTMENT OF COMPUTER SCIENCE & TECHNOLOGY FACULTY OF SCIENCE & TECHNOLOGY UNIVERSITY OF UWA WELLASSA 1 CST 221 OBJECT ORIENTED PROGRAMMING(OOP) ( 2 CREDITS.
Variables and Objects, pointers and addresses: Chapter 3, Slide 1 variables and data objects are data containers with names the value of the variable is.
Engineering Computing I Chapter 1 – Part A A Tutorial Introduction.
CS 11 java track: lecture 1 Administrivia need a CS cluster account cgi-bin/sysadmin/account_request.cgi need to know UNIX
IT253: Computer Organization Lecture 3: Memory and Bit Operations Tonga Institute of Higher Education.
CMPSC 16 Problem Solving with Computers I Spring 2014 Instructor: Tevfik Bultan Lecture 12: Pointers continued, C strings.
CS 270: Computer Organization Bits, Bytes, and Integers
CPS120: Introduction to Computer Science
CSCI 136 Lab 1: 135 Review.
Bits, Bytes, and Integers Topics Representing information as bits Bit-level manipulations Boolean algebra Expressing in C Representations of Integers Basic.
These notes were originally developed for CpSc 210 (C version) by Dr. Mike Westall in the Department of Computer Science at Clemson.
1 Writing Portable Programs COS Goals of Today’s Class Writing portable programs in C –Sources of heterogeneity –Data types, evaluation order,
Java Language Basics By Keywords Keywords of Java are given below – abstract continue for new switch assert *** default goto * package.
Info stored in computer (memory) Numbers All in binaray – can be converted to octal, hex Characters ASCII – 1-byte/char Unicode – 2-byte/char Unicode-table.com/en.
School of Computer Science & Information Technology G6DICP - Lecture 4 Variables, data types & decision making.
Chapter 7 C supports two fundamentally different kinds of numeric types: (a) integer types - whole numbers (1) signed (2) unsigned (b) floating types –
CPS120: Introduction to Computer Science Variables and Constants.
Tokens in C  Keywords  These are reserved words of the C language. For example int, float, if, else, for, while etc.  Identifiers  An Identifier is.
0 Chap.2. Types, Operators, and Expressions 2.1Variable Names 2.2Data Types and Sizes 2.3Constants 2.4Declarations 2.5Arithmetic Operators 2.6Relational.
Types Chapter 2. C++ An Introduction to Computing, 3rd ed. 2 Objectives Observe types provided by C++ Literals of these types Explain syntax rules for.
Java Basics. Tokens: 1.Keywords int test12 = 10, i; int TEst12 = 20; Int keyword is used to declare integer variables All Key words are lower case java.
Address alignment When a word (4-bytes) is loaded or stored the memory address must be a multiple of four. This is called an alignment restriction. Addresses.
Announcements Assignment 2 Out Today Quiz today - so I need to shut up at 4:25 1.
Chapter 1 slides1 What is C? A high-level language that is extremely useful for engineering computations. A computer language that has endured for almost.
7-Nov Fall 2001: copyright ©T. Pearce, D. Hutchinson, L. Marshall Oct lecture23-24-hll-interrupts 1 High Level Language vs. Assembly.
Lecture 3: More Java Basics Michael Hsu CSULA. Recall From Lecture Two  Write a basic program in Java  The process of writing, compiling, and running.
Variables Bryce Boe 2012/09/05 CS32, Summer 2012 B.
© 2004 Pearson Addison-Wesley. All rights reserved August 27, 2007 Primitive Data Types ComS 207: Programming I (in Java) Iowa State University, FALL 2007.
Bitwise Operations C includes operators that permit working with the bit-level representation of a value. You can: - shift the bits of a value to the left.
Computer Architecture & Operations I
Computer Architecture & Operations I
Course Contents KIIT UNIVERSITY Sr # Major and Detailed Coverage Area
The Machine Model Memory
Big-Endians Little-Endians and Bi-Endians
Data in Memory variables have multiple attributes symbolic name
Chapter 6: Data Types Lectures # 10.
Primitive Data Types August 28, 2006 ComS 207: Programming I (in Java)
Microprocessor and Assembly Language
CPSC 315 – Programming Studio Spring 2012
Portability CPSC 315 – Programming Studio
Bits and Bytes Topics Representing information as bits
Bits and Bytes Topics Representing information as bits
Bits and Bytes Topics Representing information as bits
Bits and Bytes Topics Representing information as bits
File Input and Output.
Bits and Bytes Topics Representing information as bits
Homework Applied for cs240? (If not, keep at it!) 8/10 Done with HW1?
Introduction to Microprocessor Programming
Comp Org & Assembly Lang
Comp Org & Assembly Lang
Real-World File Structures
Review In last lecture, done with unsigned and signed number representation. Introduced how to represent real numbers in float format.
Presentation transcript:

1 Portability The material for this lecture is drawn, in part, from The Practice of Programming (Kernighan & Pike) Chapter 8 Professor Jennifer Rexford

2 Goals of this Lecture Learn to write code that works with multiple: Hardware platforms Operating systems Compilers Human cultures Why? Moving existing code to a new context is easier/cheaper than writing new code for the new context Code that is portable is (by definition) easier to move; portability reduces software costs Relative to other high-level languages (e.g., Java), C is notoriously non-portable

3 The Real World is Heterogeneous Multiple kinds of hardware 32-bit Intel Architecture 64-bit IA, PowerPC, Sparc, MIPS, Arms, … Multiple operating systems Linux Windows, Mac, Sun, AIX, … Multiple character sets ASCII Latin-1, Unicode, … Multiple human alphabets and languages

4 Portability Goal: Run program on any system No modifications to source code required Program continues to perform correctly  Ideally, the program performs well too

5 C is Notoriously Non-Portable Recall C design goals… Create Unix operating system and associated software Reasonably “high level”, but… Close to the hardware for efficiency So C90 is underspecified Compiler designer has freedom to reflect the design of the underlying hardware But hardware systems differ! So C compilers differ Extra care is required to write portable C code

6 General Heuristics Some general portability heuristics…

7 Intersection (1) Program to the intersection Use only features that are common to all target environments I.e., program to the intersection of features, not the union When that’s not possible…

8 Encapsulation (2) Encapsulate Localize and encapsulate features that are not in the intersection Use parallel source code files -- so non-intersection code can be chosen at link-time Use parallel data files – so non-intersection data (e.g. textual messages) can be chosen at run-time When that’s not possible, as a last resort…

9 Conditional Compilation (3) Use conditional compilation And above all… #ifdef __UNIX__ /* Unix-specific code */ #endif … #ifdef __WINDOWS__ /* MS Windows-specific code */ #endif …

10 Test!!! (4) Test the program with multiple: Hardware (Intel, MIPS, SPARC, …) Operating systems (Linux, Solaris, MS Windows, …) Compilers (GNU, MS Visual Studio, …) Cultures (United States, Europe, Asia, …)

11 Hardware Differences Some hardware differences, and corresponding portability heuristics…

12 Natural Word Size Obstacle: Natural word size In some systems, natural word size is 4 bytes In some (esp. older) systems, natural word size is 2 bytes In some (esp. newer) systems, natural word size is 8 bytes C90 intentionally does not specify sizeof(int); depends upon natural word size of underlying hardware

13 Natural Word Size (cont.) (5) Don’t assume data type sizes Not portable: Portable: int *p; … p = malloc(4); … int *p; … p = malloc(sizeof(int)); …

14 Right Shift Obstacle: Right shift operation In some systems, right shift operation is logical  Right shift of a negative signed int fills with zeroes In some systems, right shift operation is arithmetic  Right shift of a negative signed int fills with ones C90 intentionally does not specify semantics of right shift; depends upon right shift operator of underlying hardware

15 Right Shift (cont.) (6) Don’t right-shift signed ints  Not portable:  Portable: … -3 >> 1 … Logical shift => Arithmetic shift => -2 … /* Don't do that!!! */ …

16 Byte Order Obstacle: Byte order Some systems (e.g. Intel) use little endian byte order  Least significant byte of a multi-byte entity is stored at lowest memory address Some systems (e.g. SPARC) use big endian byte order  Most significant byte of a multi-byte entity is stored at lowest memory address The int 5 at address 1000: The int 5 at address 1000:

17 Byte Order (cont.) (7) Don’t rely on byte order in code Not portable: Portable: int i = 5; char c; … c = *(char*)&i; /* Silly, but legal */ Little endian: c = 5 Big endian: c = 0; int i = 5; char c; … /* Don't do that! Or... */ c = (char)i;

18 Byte Order (cont.) (8) Use text for data exchange Not portable: unsigned short s = 5; FILE *f = fopen("myfile", "w"); fwrite(&s, sizeof(unsigned short), 1, f); Run on a little endian computer Run on a big endian computer: Reads 1280!!! fwrite() writes raw data to a file unsigned short s; FILE *f = fopen("myfile", "r"); fread(&s, sizeof(unsigned short), 1, f); fread() reads raw data from a file myfile

19 Byte Order (cont.) Portable: unsigned short s = 5; FILE *f = fopen("myfile", "w"); fprintf(f, "%hu", s); fprintf() converts raw data to ASCII text Run on a big or little endian computer Run on a big or little endian computer: Reads 5 myfile unsigned short s; FILE *f = fopen("myfile", "r"); fscanf(f, "%hu", &s); fscanf() reads ASCII text and converts to raw data ASCII code for ‘5’

20 Byte Order (cont.) If you must exchange raw data… (9) Write and read one byte at a time unsigned short s = 5; FILE *f = fopen("myfile", "w"); fputc(s >> 8, f); /* high-order byte */ fputc(s & 0xFF, f); /* low-order byte */ Run on a big or little endian computer Run on a big or little endian computer: Reads 5 unsigned short s; FILE *f = fopen("myfile", "r"); s = fgetc(f) << 8; /* high-order byte */ s |= fgetc(f) & 0xFF; /* low-order byte */ myfile Decide on big-endian data exchange format

21 OS Differences Some operating system differences, and corresponding portability heuristics…

22 End-of-Line Characters Obstacle: Representation of “end-of-line” Unix (including Mac OS/X) represents end-of-line as 1 byte: (binary) Mac OS/9 represents end-of-line as 1 byte: (binary) MS Windows represents end-of-line as 2 bytes: (binary)

23 End-of-Line Characters (cont.) (10) Use binary mode for textual data exchange Not portable:  Trouble if read via fgetc() on “wrong” operating system FILE *f = fopen("myfile", "w"); fputc('\n', f); Run on UnixRun on Mac OS/9Run on MS Windows \n\r \r \n Open the file in ordinary text mode

24 End-of-Line Characters (cont.) Portable:  No problem if read via fgetc() in binary mode on “wrong” operating system  I.e., there is no “wrong” operating system! FILE *f = fopen("myfile", "wb"); fputc('\n', f); Run on Unix, Mac OS/9, or MS Windows \n Open the file in binary mode

25 Data Alignment Obstacle: Data alignment Some hardware requires data to be aligned on particular boundaries Some operating systems impose additional constraints: Moreover… If a structure must begin on an x-byte boundary, then it also must end on an x-byte boundary  Implication: Some structures must contain padding OScharshortintdouble Linux1244 MS Windows1248 Start address must be evenly divisible by:

26 Data Alignment (cont.) (11) Don’t rely on data alignment Not portable: struct S { int i; double d; } … struct S *p; … p = (struct S*)malloc(sizeof(int)+sizeof(double)); Windows: i pad d Linux: i d Allocates 12 bytes; too few bytes on MS Windows

27 Data Alignment (cont.) Portable: struct S { int i; double d; } … struct S *p; … p = (struct S*)malloc(sizeof(struct S)); Windows: i pad d Linux: i d Allocates 12 bytes on Linux 16 bytes on MS Windows

28 Character Codes Obstacle: Character codes Some operating systems (e.g. IBM OS/390) use the EBCDIC character code Some systems (e.g. Unix, MS Windows) use the ASCII character code

29 Character Codes (cont.) (12) Don’t assume ASCII Not portable: A little better: Portable: if ((c >= 65) && (c <= 90)) … if ((c >= 'A') && (c <= 'Z')) … #include … if (isupper(c)) … Assumes ASCII Assumes that uppercase char codes are contiguous; not true in EBCDIC For ASCII: (c >= 'A') && (c <= 'Z') For EBCDIC: ((c >= 'A') && (c <= 'I')) || ((c >= 'J') && (c <= 'R')) || ((c >= 'S') && (c <= 'Z'))

30 Compiler Differences Compilers may differ because they: Implement underspecified features of the C90 standard in different ways, or Extend the C90 standard Some compiler differences, and corresponding portability heuristics…

31 Compiler Extensions Obstacle: Non-standard extensions Some compilers offer non-standard extensions

32 Compiler Extensions (13) Stick to the standard language For now, stick to C90 (not C99) Not portable: Portable: … for (int i = 0; i < 10; i++) … int i; … for (i = 0; i < 10; i++) … Many systems allow definition of loop control variable within for statement, but a C90 compiler reports error

33 Evaluation Order Obstacle: Evaluation order C90 specifies that side effects and function calls must be completed at “;” But multiple side effects within the same expression can have unpredictable results

34 Evaluation Order (cont.) (14) Don’t assume order of evaluation Not portable: Portable (either of these, as intended): strings[i] = names[++i]; i++; strings[i] = names[i]; strings[i] = names[i+1]; i++; i is incremented before indexing names ; but has i been incremented before indexing strings ? C90 doesn’t say

35 Evaluation Order (cont.) Not portable: Portable (either of these, as intended): printf("%c %c\n", getchar(), getchar()); i = getchar(); j = getchar(); printf("%c %c\n", i, j); i = getchar(); j = getchar(); printf("%c %c\n", j, i); Which call of getchar() is executed first? C90 doesn’t say

36 Char Signedness Obstacle: Char signedness C90 does not specify signedness of char On some systems, char means signed char On other systems, char means unsigned char

37 Char Signedness (cont.) (15) Don’t assume signedness of char If necessary, specify “signed char” or “unsigned char” Not portable: Portable: int a[256]; char c; c = (char)255; … … a[c] … int a[256]; unsigned char c; c = 255; … … a[c] … If char is unsigned, then a[c] is a[255] => fine If char is signed, then a[c] is a[-1] => out of bounds

38 Char Signedness (cont.) Not portable: Portable: int i; char s[MAX+1]; for (i = 0; i < MAX; i++) if ((s[i] = getchar()) == '\n') || (s[i] == EOF)) break; s[i] = ‘\0’; If char is unsigned, then this always is FALSE int c, i; char s[MAX+1]; for (i = 0; i < MAX; i++) { if ((c = getchar()) == '\n') || (c == EOF)) break; s[i] = c; } s[i] = ‘\0’;

39 Library Differences Some library differences, and corresponding portability heuristics…

40 Library Extensions Obstacle: Non-standard functions “Standard” libraries bundled with some development environments (e.g. GNU, MS Visual Studio) offer non- standard functions

41 Library Extensions (16) Stick to the standard library functions For now, stick to the C90 standard library functions Not portable: Portable: char *s = "hello"; char *copy; … copy = strdup(s); … char *s = "hello"; char *copy; … copy = (char*)malloc(strlen(s) + 1); strcpy(copy, s); … strdup() is available in many “standard” libraries, but is not defined in C90

42 Cultural Differences Some cultural differences, and corresponding portability heuristics…

43 Character Code Size Obstacle: Character code size United States  Alphabet requires 7 bits => 1 byte per character  Popular character code: ASCII Western Europe  Alphabets require 8 bits => 1 byte per character  Popular character code: Latin-1 China, Japan, Korea, etc.  Alphabets require 16 bits => 2 bytes per character  Popular character code: Unicode

44 Character Code Size (17) Don’t assume 1-byte character code size Not portable: Portable:  C90 has no good solution  C99 has “wide character” data type, constants, and associated functions  But then beware of byte-order portability problems!  Future is not promising char c = 'a'; #include … wchar_t c = L'\x3B1'; /* Greek lower case alpha */

45 Human Language Obstacle: Humans speak different natural languages!

46 Human Language (cont.) (18) Don’t assume English Not portable: Can’t avoid natural language! So… /* somefile.c */ … printf("Bad input"); …

47 Human Language (cont.) Encapsulate code  Choose appropriate “message.c” file at link-time /* somefile.c */ #include "messages.h" … printf(getMsg(5)); … /* englishmessages.c */ char *getMsg(int msgNum) { switch(msgNum) { … case 5: return "Bad input"; … } /* spanishmessages.c */ char *getMsg(int msgNum) { switch(msgNum) { … case 5: return "Mala entrada"; … } /* messages.h */ char *getMsg(int msgNum); Messages module, with multiple implementations

48 Human Language (cont.) Maybe even better: encapsulate data  Choose appropriate “message.txt” file at run-time /* messages.c */ enum {MSG_COUNT = 100}; char *getMsg(int msgNum) { static char *msg[MSG_COUNT]; static int firstCall = 1; if (firstCall) { <Read all messages from appropriate messages.txt file into msg> firstCall = 0; } return msg[msgNum]; } /* englishmessages.txt */ … Bad input … /* spanishmessages.txt */ … Mala entrada … /* messages.h */ char *getMsg(int msgNum); Messages module

49 Summary General heuristics (1) Program to the intersection (2) Encapsulate (3) Use conditional compilation (as a last resort) (4) Test!!!

50 Summary (cont.) Heuristics related to hardware differences (5) Don’t assume data type sizes (6) Don’t right-shift signed ints (7) Don’t rely on byte order in code (8) Use text for data exchange (9) Write and read 1 byte at a time Heuristics related to OS differences (10) Use binary mode for textual data exchange (11) Don’t rely on data alignment (12) Don’t assume ASCII

51 Summary (cont.) Heuristics related to compiler differences (13) Stick to the standard language (14) Don’t assume evaluation order (15) Don’t assume signedness of char Heuristic related to library differences (16) Stick to the standard library Heuristics related to cultural differences (17) Don’t assume 1-byte char code size (18) Don’t assume English