Data Representation, Data Structures, and Multi-file compilation
Data Representation: Binary representation, Octal, Hexadecimal, Data types
Memory concepts: Every piece of information stored on a computer is encoded as a combination of ones and zeros. These ones and zeros are called bits. One byte is a sequence of eight consecutive bits. A word is some number (typically 4) of consecutive bytes.
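Binary representation: A single (unsigned) byte of memory, with bits numbered from bit 7 (leftmost) down to bit 0 (rightmost): 1 1 1 0 1 0 0 1. In decimal representation, this number is: $1\cdot2^7 + 1\cdot2^6 + 1\cdot2^5 + 0\cdot2^4 + 1\cdot2^3 + 0\cdot2^2 + 0\cdot2^1 + 1\cdot2^0 = 233$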
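Binary representation: A single (signed) byte of memory. Bit 7 holds the sign (+/-), and bits 6 down to 0 hold 1 1 0 1 0 0 1. In decimal representation, this number is: $\pm(1\cdot2^6 + 1\cdot2^5 + 0\cdot2^4 + 1\cdot2^3 + 0\cdot2^2 + 0\cdot2^1 + 1\cdot2^0) = \pm 105$. One bit must be used to store the sign of the number, leaving only 7 bits for its magnitude. (This is the simple sign-magnitude picture; modern C implementations use two's complement, required as of C23, in which the byte 11101001 represents 233 - 256 = -23.)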
Binary representation, cont. What is the range of numbers that can be stored in a single signed/unsigned byte? How would you write a program to convert an arbitrary base 10 number to binary? How would you write a program to convert an arbitrary binary number to base 10? What is the effect of right/left shifting bits (assuming the vacated bit positions are filled with zeros)?
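One possible answer to the two conversion questions is sketched below. It assumes non-negative input that fits in an `unsigned int`; the names `to_binary`, `from_binary` and `UINT_BITS` are hypothetical. Decimal to binary uses repeated division by 2 (each remainder is the next bit, lowest first); binary to decimal doubles the running total and adds each new digit.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Number of bits in an unsigned int (32 on most current machines). */
#define UINT_BITS (sizeof(unsigned int) * CHAR_BIT)

/* Convert n to a string of binary digits by repeated division by 2.
   buf must hold at least UINT_BITS + 1 chars. */
void to_binary(unsigned int n, char *buf)
{
    char tmp[UINT_BITS];
    size_t len = 0;
    do {
        tmp[len++] = (char)('0' + n % 2);  /* remainder is the next bit, lowest first */
        n /= 2;                            /* same as n >> 1 */
    } while (n > 0);
    for (size_t i = 0; i < len; i++)       /* bits came out backwards: reverse them */
        buf[i] = tmp[len - 1 - i];
    buf[len] = '\0';
}

/* Convert a string of '0'/'1' characters back to its base-10 value. */
unsigned int from_binary(const char *bits)
{
    unsigned int n = 0;
    for (; *bits != '\0'; bits++) {
        assert(*bits == '0' || *bits == '1');      /* only binary digits allowed */
        n = n * 2 + (unsigned int)(*bits - '0');   /* shift left one place, add new bit */
    }
    return n;
}
```

This also answers the shifting question: shifting left by one place multiplies by 2, and shifting right by one place divides by 2 and discards the remainder, so `233u << 1` is 466 and `233u >> 1` is 116.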
Octal representation: base 8. Just a simple extension of binary and decimal, but using only the digits 0-7. Best seen with an example: What is the value of the octal number 711? $7\cdot8^2 + 1\cdot8^1 + 1\cdot8^0 = 457$. What is the octal representation of the number 64? 100 (since $1\cdot8^2 + 0\cdot8^1 + 0\cdot8^0 = 64$). Try this in C using the "%o" format expression with printf: printf("%o\n", 457); which prints 711.
Hexadecimal representation: base 16. Just a simple extension of binary, octal, and decimal, but using 16 "digits": 0-9, a, b, c, d, e, f (where a = 10, ..., f = 15). Example: What is the value of the hexadecimal number 10ef? $1\cdot16^3 + 0\cdot16^2 + 14\cdot16^1 + 15\cdot16^0 = 4335$. Try this in C using the "%x" format expression with printf: printf("%x\n", 4335); which prints 10ef.
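The same conversions can be captured in strings and reversed, as a small sketch. It assumes C99 (`snprintf`); the helper names `format_octal_hex` and `parse_in_base` are hypothetical, while `snprintf` and `strtol` are standard library functions.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Write n in octal into oct and in hexadecimal into hex, using the same
   "%o" and "%x" conversions as printf. Each buffer holds size chars. */
void format_octal_hex(unsigned int n, char *oct, char *hex, size_t size)
{
    snprintf(oct, size, "%o", n);   /* 457u  -> "711"  */
    snprintf(hex, size, "%x", n);   /* 4335u -> "10ef" */
}

/* Read a digit string written in the given base back into a number.
   strtol accepts bases 2 through 36, so this also handles binary. */
long parse_in_base(const char *digits, int base)
{
    assert(base >= 2 && base <= 36);
    return strtol(digits, NULL, base);
}
```

C source code also accepts octal and hexadecimal literals directly: `0711` (leading zero) equals 457 and `0x10ef` equals 4335.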
Understanding datatypes at a more fundamental level: int and char
char revisited: Before doing some example bitwise operations, we first revisit our simple C datatypes to understand them at a deeper level. Recall that we have just a few basic types: char, int, float, double. Recall also that char represents a single byte of storage, while int is typically 4 bytes. Important: Do not be misled by the name "char"; the char datatype is really no different from int (other than its storage capacity). What do I mean by "no different from int"? We explore this with some examples on the next slide.
Char vs. int: Consider the following declarations: int j = 4; char k = 4; In memory (assuming a 4-byte int, most significant byte written first), these appear as: j: 00000000 00000000 00000000 00000100 and k: 00000100. They are both perfectly valid ways to represent the number 4. In one case (int), there is much more "wasted" memory. In the other case (char), there is a much stricter limit on how large the number can be if you choose to change it.
Char, cont. Why would you not always use char to represent a small number, such as 4? Consider what happens in this case: char j = 4; j = j + 300; /* bad! Can't store 304 in a char! */ So, it is safer to use a larger type, such as int, unless you are 100% sure that the char limit will never be exceeded in the program!
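To see what actually ends up in the byte, here is a minimal sketch. It assumes 8-bit bytes and uses `unsigned char`, because storing an out-of-range value into a plain (possibly signed) char gives an implementation-defined result, while `unsigned char` is defined to keep the value modulo 256. The function name `compare_int_and_byte` is hypothetical.

```c
/* Add amount to 4 held in an int and in an unsigned char, and report both.
   Assumes 8-bit bytes, so an unsigned char holds 0..255. */
void compare_int_and_byte(int amount, int *as_int, unsigned char *as_byte)
{
    int j = 4;
    unsigned char k = 4;
    j = j + amount;                    /* int has plenty of room: 4 + 300 = 304 */
    k = (unsigned char)(k + amount);   /* only the low 8 bits survive: 304 - 256 = 48 */
    *as_int = j;
    *as_byte = k;
}
```

With an amount of 300 the int holds 304 but the byte silently holds 48, which is exactly the bug the slide warns about.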
Char as "character" storage: So, if char is just an abbreviated int, what does it have to do with characters? The answer is twofold. First, char can do nothing special with characters that int can't do. Both store the equivalent ASCII integer code when single quotes are placed around a single character in an assignment. Example: char c = 'e'; /* store the integer (ASCII) code for the character e in the byte c */ Alternatively: int c = 'e'; /* same as above, but store the integer in a 4-byte (i.e. int) sequence */
Char example: The best way to understand this is with a simple example.
/* char_int1.c */
#include <stdio.h>

int main(void)
{
    char c;
    int j;
    j = 100;
    c = 100;                  /* any value that fits in a char; 0-127 is safe even if char is signed */
    printf("%d %d\n", j, c);  /* print j and c as decimal ints */
    printf("%c %c\n", j, c);  /* print j and c as characters */
    j = 'h';
    c = 'h';                  /* change assignment */
    printf("%c %c\n", j, c);  /* what is printed here? */
    printf("%d %d\n", j, c);  /* print ASCII code for 'h' */
    return 0;
}
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char* argv[])
{
    int input;
    if (argc != 2) {
        printf("%s\n", "Must enter a single argument");
        exit(1);
    }
    input = atoi(argv[1]);    /* grab input as integer */
    if (input > 255 || input < 0) {
        printf("%s\n", "Must enter a number from 0 to 255");
        exit(1);
    }
    printf("%s: %c\n", "The corresponding character is", input);
    return 0;
}
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char* argv[])
{
    char input;
    if (argc != 2) {
        printf("%s\n", "Must enter a single argument");
        exit(1);
    }
    input = *argv[1];         /* grab the first character of the command-line argument */
    printf("%s %c: %d\n", "The ascii code for", input, input);
    return 0;
}
Note: We will not understand why the * needs to be here until we study pointers. However, you should be able to write equivalent code using scanf.
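One possible answer to the scanf exercise above, as a sketch: it reads from any `FILE *` via `fscanf`, and `fscanf(stdin, ...)` behaves the same as `scanf(...)`. The function name `read_char_code` is hypothetical.

```c
#include <stdio.h>

/* Read one non-blank character from the stream and return its integer
   (ASCII) code, or -1 if nothing could be read.
   read_char_code(stdin) is equivalent to using scanf(" %c", &c). */
int read_char_code(FILE *in)
{
    char c;
    /* the space before %c tells fscanf/scanf to skip leading whitespace */
    if (fscanf(in, " %c", &c) != 1)
        return -1;
    return c;   /* a char is just a small integer, so return it directly */
}
```

A keyboard version of the program would then be `int code = read_char_code(stdin); printf("The ascii code for %c: %d\n", code, code);`.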
Very low-level stuff: Bitwise operations in C
Bitwise operations: C contains six operators for performing bitwise operations on integers. Each one works on every bit position independently (do not confuse & and | with the logical operators && and ||):
& Bitwise AND: if both bits are 1, the result is 1
| Bitwise OR: if either bit is 1, the result is 1
^ Bitwise XOR (exclusive OR): if one and only one bit equals 1, the result is 1
~ Bitwise invert: if the bit is 1, the result is 0; if the bit is 0, the result is 1
<< n Left shift n places
>> n Right shift n places
Bitwise operations: Bitwise operations are considered "low-level" programming by today's standards. For many programs, manipulating individual bits is never necessary. Sometimes, this level of control is needed for memory or performance optimization. In any case, it is very important for a conceptual understanding of programming.
Bitwise examples: AND. Bitwise AND: char j = 11; char k = 14; j: 00001011, k: 00001110, j & k: 00001010 = 10
OR: Bitwise OR: char j = 11; char k = 14; j: 00001011, k: 00001110, j | k: 00001111 = 15
XOR: Bitwise XOR: char j = 11; char k = 14; j: 00001011, k: 00001110, j ^ k: 00000101 = 5
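Shifting: Bitwise invert: char j = 11; j: 00001011, ~j: 11110100 = 244 (read as an unsigned byte; as a signed char the same bits mean -12). Shifting: char j = 11; j << 1: 00010110 = 22; j >> 1: 00000101 = 5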
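The examples on these slides can be reproduced with a short program, sketched below. It uses `unsigned char` (an assumption, so that ~j prints as 244 as on the slide) and casts each result back to `unsigned char`, because C first promotes char operands to int; the cast keeps only the low 8 bits. The names `bits8`, `show` and `bitwise_demo` are hypothetical.

```c
#include <stdio.h>

/* Write the 8 bits of b into buf (at least 9 chars), bit 7 first. */
void bits8(unsigned char b, char buf[9])
{
    for (int i = 7; i >= 0; i--)
        buf[7 - i] = (b >> i) & 1 ? '1' : '0';
    buf[8] = '\0';
}

/* Print one labelled result in binary and in decimal. */
static void show(const char *label, unsigned char value)
{
    char buf[9];
    bits8(value, buf);
    printf("%-7s %s = %3u\n", label, buf, (unsigned)value);
}

/* Reproduce the slide examples for j = 11 and k = 14. */
void bitwise_demo(void)
{
    unsigned char j = 11, k = 14;
    show("j", j);
    show("k", k);
    show("j & k", (unsigned char)(j & k));    /* 10  */
    show("j | k", (unsigned char)(j | k));    /* 15  */
    show("j ^ k", (unsigned char)(j ^ k));    /* 5   */
    show("~j", (unsigned char)~j);            /* 244 */
    show("j << 1", (unsigned char)(j << 1));  /* 22  */
    show("j >> 1", (unsigned char)(j >> 1));  /* 5   */
}
```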
Data Structures and Algorithms
Sorting: Comes up all the time. Demonstrates important techniques. Can be done many ways, with different algorithms.
Bubble Sort: Very simple. Terrible. Go through the list, swapping out-of-order neighbors. Continue until no more swaps.
Bubble Sort: N = number of items. If the number that belongs first is initially at the bottom of the list, it moves up only one position per pass, so we have to go through the list N times. Each pass looks at (and maybe swaps) about N neighboring pairs. Total of about $N^2$ operations: S..L..O..W.. for long lists. But if the list is very nearly sorted, it can be quick. No one would really use this algorithm.
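The bubble sort described above can be sketched in C for an array of ints (the function name `bubble_sort` is illustrative):

```c
#include <stdbool.h>
#include <stddef.h>

/* Bubble sort: sweep through the array swapping out-of-order neighbors,
   and stop after a sweep that makes no swaps. */
void bubble_sort(int a[], size_t n)
{
    bool swapped = true;
    while (swapped) {
        swapped = false;
        for (size_t i = 1; i < n; i++) {
            if (a[i - 1] > a[i]) {       /* neighbors out of order: swap them */
                int tmp = a[i - 1];
                a[i - 1] = a[i];
                a[i] = tmp;
                swapped = true;
            }
        }
    }
}
```

The `swapped` flag is what makes a nearly sorted list quick: an already sorted array costs a single sweep of about N comparisons.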
Insertion sort: About as simple, but better. It is the way most people sort cards: keep inserting each new item into its place among the already-sorted ones. Still ~$N^2$, but faster on average.
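A sketch of insertion sort in the same style (the function name `insertion_sort` is illustrative): the prefix `a[0..i-1]` is always sorted, and each new element slides left past larger ones, like a card being inserted into a sorted hand.

```c
#include <stddef.h>

/* Insertion sort: grow a sorted prefix one element at a time. */
void insertion_sort(int a[], size_t n)
{
    for (size_t i = 1; i < n; i++) {
        int key = a[i];                  /* the new "card" to insert */
        size_t j = i;
        while (j > 0 && a[j - 1] > key) {
            a[j] = a[j - 1];             /* shift the larger element one slot right */
            j--;
        }
        a[j] = key;                      /* drop the new element into its slot */
    }
}
```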
Data Structures: Both these methods are very array-based. They have to look through half/most/all of the list on each iteration, and they definitely need ~N iterations, so they are doomed to be fairly slow. For faster techniques, we need different ways of looking at data.
Binary Trees: A binary tree is either empty, or consists of a node with a left and a right child. The left and right children are themselves binary trees.
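The recursive definition maps directly onto a C struct in which `NULL` plays the role of the empty tree. This is a minimal sketch; `struct tree`, `make_node`, `count_nodes` and `free_tree` are hypothetical names.

```c
#include <stdlib.h>

/* A binary tree is either empty (NULL) or a node with left and right
   subtrees, which are themselves binary trees. */
struct tree {
    int key;
    struct tree *left;
    struct tree *right;
};

/* Allocate a node with the given children (either may be NULL). */
struct tree *make_node(int key, struct tree *left, struct tree *right)
{
    struct tree *t = malloc(sizeof *t);
    if (t == NULL)
        return NULL;          /* out of memory */
    t->key = key;
    t->left = left;
    t->right = right;
    return t;
}

/* Recursive definition, recursive function: an empty tree has 0 nodes;
   otherwise count this node plus the nodes in both subtrees. */
size_t count_nodes(const struct tree *t)
{
    if (t == NULL)
        return 0;
    return 1 + count_nodes(t->left) + count_nodes(t->right);
}

/* Free both subtrees first, then the node itself. */
void free_tree(struct tree *t)
{
    if (t == NULL)
        return;
    free_tree(t->left);
    free_tree(t->right);
    free(t);
}
```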
Complete Binary Trees: In a complete binary tree, every node has either 2 or 0 children, and all nodes with 0 children ("leaf nodes") are on the bottom level. (Many texts call this a perfect binary tree.) A complete binary tree with L levels has $2^L - 1$ nodes; one with N nodes has $\log_2(N+1)$ levels.
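Both counts follow from adding up the nodes level by level. Numbering the root as level 0, level $k$ holds $2^k$ nodes, so

$$N = \sum_{k=0}^{L-1} 2^k = 2^L - 1 \qquad\Longrightarrow\qquad 2^L = N + 1 \qquad\Longrightarrow\qquad L = \log_2(N+1).$$

For example, L = 3 levels gives 1 + 2 + 4 = 7 nodes, and $\log_2(7+1) = 3$.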
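Heaps: A binary tree with values ("keys") stored at each node. It is an almost complete binary tree: every level is full except possibly the bottom one, which is filled from the left. Partial ordering: the root's key is less than (or equal to) the keys of both of its children, and both children are roots of heaps.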
Storing a heap in an array: We can easily store a heap in an array. With 0-based indexing, parent node i has left child (2*i+1) and right child (2*i+2), and any node j > 0 has its parent at (j-1)/2 (integer division).
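The index arithmetic can be written as three tiny functions, plus a check of the heap property that uses them. This sketch assumes a min-heap of ints in a 0-based array, matching the "root is less than its children" ordering; the function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Index arithmetic for a heap stored in a 0-based array. */
size_t left_child(size_t i)  { return 2 * i + 1; }
size_t right_child(size_t i) { return 2 * i + 2; }
size_t parent(size_t i)      { return (i - 1) / 2; }   /* only valid for i > 0 */

/* Check the min-heap property: no child is smaller than its parent. */
bool is_min_heap(const int a[], size_t n)
{
    for (size_t i = 0; i < n; i++) {
        size_t l = left_child(i), r = right_child(i);
        if (l < n && a[l] < a[i])
            return false;
        if (r < n && a[r] < a[i])
            return false;
    }
    return true;
}
```

No pointers are needed: the tree shape is implied entirely by the positions in the array.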
Why bother? Putting things in this partial order is easier than sorting, and once the data is in a heap it is very easy to find the lowest value (it is at the root). This is useful for a priority queue, and for sorting!
Heap Sort teaser: Get the data into a heap. The top value is the lowest value. Delete the top value and re-heap. Repeat until there is no more data. The results come out as a sorted list!
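Heap Operations: insert. To put a number into an existing heap: put the number in the first available leaf node. If the parent's subtree is no longer a heap (the new number is smaller than its parent), swap them. Then repeat this process until the subtree is a heap again or you hit the root.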
Heap Operations: delete root. Take the bottom-most value from the tree and put it where the root used to be. Remove that node. Go down the heap, swapping the node with its smaller child whenever it is larger than either of its children.
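The insert and delete-root operations above can be sketched on the array layout as follows. This assumes a min-heap of ints with a fixed capacity; `struct heap`, `HEAP_CAP`, `heap_insert` and `heap_delete_root` are hypothetical names. Together the two functions form a simple priority queue.

```c
#include <assert.h>
#include <stddef.h>

#define HEAP_CAP 100   /* hypothetical fixed capacity for this sketch */

struct heap {
    int data[HEAP_CAP];
    size_t size;
};

static void swap(int *x, int *y) { int t = *x; *x = *y; *y = t; }

/* Insert: put the value in the first free leaf, then swap it upward
   while it is smaller than its parent (at most one swap per level). */
void heap_insert(struct heap *h, int value)
{
    assert(h->size < HEAP_CAP);
    size_t i = h->size++;
    h->data[i] = value;
    while (i > 0 && h->data[(i - 1) / 2] > h->data[i]) {
        swap(&h->data[(i - 1) / 2], &h->data[i]);
        i = (i - 1) / 2;
    }
}

/* Delete root: move the bottom-most value to the root, remove its old
   node, then swap it downward with its smaller child until neither child
   is smaller. Returns the old root, i.e. the lowest value. */
int heap_delete_root(struct heap *h)
{
    assert(h->size > 0);
    int root = h->data[0];
    h->data[0] = h->data[--h->size];
    size_t i = 0;
    for (;;) {
        size_t l = 2 * i + 1, r = 2 * i + 2, smallest = i;
        if (l < h->size && h->data[l] < h->data[smallest]) smallest = l;
        if (r < h->size && h->data[r] < h->data[smallest]) smallest = r;
        if (smallest == i)
            break;                       /* heap property restored */
        swap(&h->data[i], &h->data[smallest]);
        i = smallest;
    }
    return root;
}
```

Swapping with the smaller child (rather than either child) matters: it guarantees the value moved up is no larger than its new sibling.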
Heap Ops: build heap from data. It's much easier to insert into an existing heap than to build one all at once. Single nodes are always heaps! Start from the bottom and work up, inserting each parent into the heaps formed by its children. Repeat until there is no more data.
Notice: Heap insert/delete operations take ~lg N operations (one per level of the tree). To build the heap, each piece of data needs to be put in: ~N lg N operations. To pull out the sorted list, we need N delete operations, each taking ~lg N steps: another N lg N operations. N lg N is much less than $N^2$ for large N!! For example, with N = 1,000,000, N lg N is about $2\times10^7$, while $N^2 = 10^{12}$.
Heapsort Algorithm: Build the heap from scratch. Then, for each piece of data: get the root value, and delete it from the heap.
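Putting the build and delete steps together gives a heapsort sketch that follows the slides: build a min-heap bottom-up (leaves are already heaps, so start at the last parent), then repeatedly copy out the root and delete it. It assumes the input array may be used as scratch space and that the sorted result goes into a separate output array; `sift_down` and `heap_sort` are hypothetical names.

```c
#include <stddef.h>

/* Restore the min-heap property below index i, assuming both subtrees
   of i are already heaps. */
static void sift_down(int a[], size_t n, size_t i)
{
    for (;;) {
        size_t l = 2 * i + 1, r = 2 * i + 2, smallest = i;
        if (l < n && a[l] < a[smallest]) smallest = l;
        if (r < n && a[r] < a[smallest]) smallest = r;
        if (smallest == i)
            return;
        int t = a[i]; a[i] = a[smallest]; a[smallest] = t;
        i = smallest;
    }
}

/* Heapsort as on the slides: a[] is turned into a min-heap, then the root
   (the lowest value) is copied to out[] and deleted, n times.
   out[] ends up in ascending order. */
void heap_sort(int a[], int out[], size_t n)
{
    /* Build: from the last parent (index n/2 - 1) up to the root. */
    for (size_t i = n / 2; i-- > 0; )
        sift_down(a, n, i);

    /* Extract: n deletions of about lg n swaps each. */
    for (size_t k = 0, size = n; k < n; k++) {
        out[k] = a[0];          /* get the root value */
        a[0] = a[--size];       /* move the bottom-most value to the root */
        sift_down(a, size, 0);  /* re-heap */
    }
}
```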
Multiple-File compilation
Why more than one file? As a program gets bigger, having the whole program in one file quickly gets awkward: the file is hard to read; it takes forever to edit a 1M-line file; it is hard to re-use code; and you have to re-compile the entire program even for a small change in one routine.
Compilation vs. Linking: Compilation: compile source code into machine language. Generates an object file (.o). Linking: combine object files and bring in code from other libraries that we might need, e.g. link in the code for printf() from the standard C library, link in the code for sin() from the math library, etc. Generates an executable.
Compilation vs. Linking: If all of the program is in one file, the distinction isn't important, and gcc will do the compile/link in one step. Otherwise, do them separately. Examples: Running Average Example; Sort Example
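As an illustration (hypothetical code, not the course's own Running Average Example), the block below marks where the file boundaries would go. It is shown concatenated so it compiles as a single file; to split it, save each section as the named file and add `#include "average.h"` at the top of `average.c` and `stats.c`. The file names, `struct running_avg`, `ravg_add` and `mean_of` are all assumptions, and `main.c` stands for your own file containing `main()`.

```c
/* ===== average.h: declarations shared by every file that uses them ===== */
#ifndef AVERAGE_H
#define AVERAGE_H
struct running_avg {
    double mean;   /* average of the values seen so far */
    int count;     /* how many values have been added */
};
void ravg_add(struct running_avg *r, double x);
double mean_of(const double xs[], int n);
#endif

/* ===== average.c: compile alone with   gcc -c average.c   -> average.o ===== */
void ravg_add(struct running_avg *r, double x)
{
    r->count++;
    /* incremental mean: move the old mean 1/count of the way toward x */
    r->mean += (x - r->mean) / r->count;
}

/* ===== stats.c: compile alone with   gcc -c stats.c   -> stats.o =====
   It needs only the declarations in average.h, not average.c itself. */
double mean_of(const double xs[], int n)
{
    struct running_avg r = {0.0, 0};
    for (int i = 0; i < n; i++)
        ravg_add(&r, xs[i]);
    return r.mean;
}

/* ===== link step (with a main.c of your own):
   gcc -c main.c
   gcc main.o stats.o average.o -o prog
   After editing only average.c, just rerun gcc -c average.c and relink. */
```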