CompSci 100e 8.1 Plan for the Course!
• Understand Huffman Coding
  - Data compression
  - Priority Queues
  - Bits and Bytes
  - Greedy Algorithms
• Algorithms + Data Structures = Programs
  - What does this mean and who said it?
• Graphs & the Oracle of Bacon

8.2 Scoreboard
• What else might we want to do with a data structure?

Algorithm            Insertion   Deletion   Search
Unsorted ArrayList
Sorted ArrayList
Linked list
Hash Table/Map
Binary Search Tree

8.3 Text Compression
• Input: String S
• Output: String S′
  - S′ is shorter than S
  - S can be reconstructed from S′

8.4 Text Compression: Examples

Symbol   ASCII      Fixed length   Var. length
a        01100001   000            000
b        01100010   001            11
c        01100011   010            01
d        01100100   011            001
e        01100101   100            10

"abcde" in the different formats:
  ASCII: 01100001011000100110001101100100…
  Fixed: 000001010011100
  Var:   000110100110

[Figure: two code trees with edges labeled 0 and 1. The fixed-length tree has leaves a, b, c, d, e; the variable-length tree has leaves a, d, b, c, e.]

Encodings
  - ASCII: 8 bits/character
  - Unicode: 16 or 32 bits/character
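The two code tables on this slide can be checked with a few lines of Java. This is a minimal sketch, not course code: the class name `FixedVsVariable` and the method `encode` are made up for illustration, and the tables are copied from the slide.

```java
import java.util.Map;

class FixedVsVariable {
    // Code tables copied from the slide's example (symbol -> bit string).
    static final Map<Character, String> FIXED = Map.of(
        'a', "000", 'b', "001", 'c', "010", 'd', "011", 'e', "100");
    static final Map<Character, String> VARIABLE = Map.of(
        'a', "000", 'b', "11", 'c', "01", 'd', "001", 'e', "10");

    // Concatenates the codeword of every character in text.
    // Assumes every character of text has an entry in the table.
    static String encode(String text, Map<Character, String> table) {
        StringBuilder bits = new StringBuilder();
        for (char ch : text.toCharArray()) {
            bits.append(table.get(ch));
        }
        return bits.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("abcde", FIXED));     // 000001010011100 (15 bits)
        System.out.println(encode("abcde", VARIABLE));  // 000110100110 (12 bits)
    }
}
```

The variable-length table is a prefix code (no codeword is the start of another), which is why the 12-bit string can still be split back into symbols without separators.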

8.5 Huffman coding: go go gophers
• Encoding uses tree:
  - 0 left/1 right
  - How many bits? 37!!
  - Savings? Worth it?

Char   ASCII   ASCII bits   3 bits   Huffman
g      103     1100111      000      00
o      111     1101111      001      01
p      112     1110000      010      1100
h      104     1101000      011      1101
e      101     1100101      100      1110
r      114     1110010      101      1111
s      115     1110011      110      100
sp.    32      0100000      111      101

[Figure: the Huffman tree built step by step from the counts g 3, o 3, * 2 (space), p 1, h 1, e 1, r 1, s 1. Subtrees of weight 2 (p, h), 2 (e, r), 3 (s, *), 4, 6 (g, o), and 7 combine into a root of weight 13.]

8.6 Huffman Coding
• D. A. Huffman in the early 1950s
• Before compressing data, analyze the input stream
• Represent data using variable-length codes
• Variable-length codes through prefix codes
  - Each letter is assigned a codeword
  - The codeword for a given letter is produced by traversing the Huffman tree
  - Property: no codeword produced is the prefix of another
  - Letters appearing frequently have short codewords, while those that appear rarely have longer ones
• Huffman coding is an optimal per-character coding method

8.7 Building a Huffman tree
• Begin with a forest of single-node trees (leaves)
  - Each node/tree/leaf is weighted with character count
  - Node stores two values: character and count
  - There are n nodes in forest, n is size of alphabet?
• Repeat until there is only one node left: root of tree
  - Remove two minimally weighted trees from forest
  - Create new tree with minimal trees as children; new tree root's weight: sum of children (character ignored)
• Does this process terminate? How do we get minimal trees?
  - Remove minimal trees, need to order based on what?

8.8 Priority Queue
• Stacks: Last-in First-out
  - java.util.Stack, java.util.Deque
• Queues: First-in First-out
  - java.util.LinkedList, java.util.Deque
• Priority Queues: Highest-priority first-out
  - java.util.PriorityQueue
  - Supports two basic operations
    insert: add an element to the priority queue
    delete: remove the minimal element from the priority queue
  - Code below sorts. Complexity?

    import java.util.ArrayList;
    import java.util.PriorityQueue;

    public static <T extends Comparable<T>> void sort(ArrayList<T> a) {
        PriorityQueue<T> pq = new PriorityQueue<>();
        pq.addAll(a);
        for (int k = 0; k < a.size(); k++) {
            a.set(k, pq.remove());   // smallest remaining element
        }
    }

8.9 Priority Queues
• Basic operations
  - Insert
  - Remove extremal
• What properties must the data have?
• Applications
  - Event-driven simulation: colliding particles
  - AI: A* (best-first search)
  - Operating systems: load balancing & scheduling
  - Statistics: maintain largest m values
  - Graph searching: Dijkstra's algorithm
  - Data compression: Huffman coding
  - Physics: molecular dynamics simulation

8.10
• What about objects inserted into pq?
  - If deletemin is supported, what properties must inserted objects have, e.g., insert non-comparable?
  - Change what minimal means?
  - Implementation uses heap
• If we use a Comparator for comparing entries we can make a min-heap act like a max-heap, see PQDemo
  - Where is class Comparator declaration? How used?
  - What's a static inner class? A non-static inner class?
• In Java 5/6 there is a Queue interface and PriorityQueue class
  - The PriorityQueue class also uses a heap
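As an illustration of the Comparator point, the sketch below drains a `java.util.PriorityQueue` twice, once with natural ordering and once with the reversed ordering. It is not the course's PQDemo; the class name `MaxHeapDemo` and the helper `drain` are hypothetical.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

class MaxHeapDemo {
    // Inserts all values into a PriorityQueue that uses the given ordering,
    // then removes them one by one. The queue always removes the element
    // that the comparator considers smallest.
    static int[] drain(List<Integer> values, Comparator<Integer> order) {
        PriorityQueue<Integer> pq = new PriorityQueue<>(order);
        pq.addAll(values);
        int[] out = new int[values.size()];
        for (int k = 0; k < out.length; k++) {
            out[k] = pq.remove();
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> values = List.of(10, 6, 13, 7);
        // Natural ordering: behaves as a min-heap.
        System.out.println(Arrays.toString(drain(values, Comparator.naturalOrder())));
        // Reversed ordering: the same class now behaves as a max-heap.
        System.out.println(Arrays.toString(drain(values, Comparator.reverseOrder())));
    }
}
```

The heap code itself does not change between the two runs; only the meaning of "minimal" does, because it is supplied from outside.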

8.11 Heap implements PriorityQueue
• Heap is an array-based implementation of a binary tree used for implementing priority queues, supports:
  - insert, findmin, deletemin: complexities?
• Using an array minimizes storage (no explicit pointers), faster too: children are located by index/position in array
• Heap is a binary tree with shape property, heap/value property
  - shape: tree filled at all levels (except perhaps last) and filled left-to-right (complete binary tree)
  - value: each node has value smaller than both children

8.12 Array-based heap
• store "node values" in array beginning at index 1
• for node with index k
  - left child: index 2*k
  - right child: index 2*k+1
• why is this conducive to maintaining heap shape?
• what about heap property?
• is the heap a search tree?
• where is minimal node?
• where are nodes added? deleted?

index:  0   1   2   3   4   5   6   7   8   9   10
value:      6   10  7   17  13  9   21  19  25

[Figure: the same heap drawn as a tree. Root 6; children 10 and 7; next level 17, 13, 9, 21; last level 19, 25 (children of 17).]

8.13 Thinking about heaps
• Where is minimal element?
  - Root, why?
• Where is maximal element?
  - Leaves, why?
• How many leaves are there in an N-node heap (big-Oh)?
  - O(n), but exact?
• What is complexity of find max in a minheap? Why?
  - O(n), but ½ N?
• Where is second smallest element? Why?
  - Near root?

[Figure: the same heap and array as on slide 8.12, level order 6 10 7 17 13 9 21 19 25.]

8.14 Adding values to heap
• to maintain heap shape, must add new value in left-to-right order of last level
  - could violate heap property
  - move value "up" if too small
• change places with parent if heap property violated
  - stop when parent is smaller
  - stop when root is reached
• pull parent down, swapping isn't necessary (optimization)

[Figure, three frames: insert 8 at the end of the last level (as a child of 13), then bubble 8 up past 13 and past 10. Level order before: 6 10 7 17 13 9 21 19 25. Level order after: 6 8 7 17 10 9 21 19 25 13.]

8.15 Adding values, details (pseudocode)

    // myList is a list whose index 0 is unused, so the root is at index 1
    void add(Comparable elt) {
        // add elt to heap in myList
        myList.add(elt);                  // new slot at the end of the last level
        int loc = myList.size() - 1;      // index of that slot
        while (1 < loc && elt.compareTo(myList.get(loc / 2)) < 0) {
            myList.set(loc, myList.get(loc / 2));   // pull parent down
            loc = loc / 2;                // go to parent
        }
        // what's true here?
        myList.set(loc, elt);
    }

[Figure: the heap from slide 8.12 before and after adding 8, with its array myList (indexes 0 to 10).]

8.16 Removing minimal element
• Where is minimal element?
  - If we remove it, what changes, shape/property?
• How can we maintain shape?
  - "last" element moves to root
  - What property is violated?
• After moving last element, subtrees of root are heaps, why?
  - Move root down (pull child up), does it matter where?
• When can we stop "re-heaping"?
  - Less than both children
  - Reach a leaf

[Figure, four frames, level order shown: 6 10 7 17 13 9 21 19 25 (start); 25 10 7 17 13 9 21 19 (last element moved to root); 7 10 25 17 13 9 21 19 (25 swapped with 7); 7 10 9 17 13 25 21 19 (25 swapped with 9).]
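Adding and removing can be put together in one small class. The following is a sketch under assumptions rather than the course implementation: the class name `MinHeap` and the helpers `of` and `levelOrder` are invented here, it stores `int` values only, and it keeps index 0 unused so that the children of index k are at 2k and 2k+1 as on slide 8.12.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NoSuchElementException;

class MinHeap {
    // Index 0 holds an unused placeholder so the root sits at index 1.
    private final ArrayList<Integer> myList = new ArrayList<>();

    MinHeap() { myList.add(null); }

    // Hypothetical convenience factory: adds the values one at a time.
    static MinHeap of(int... values) {
        MinHeap h = new MinHeap();
        for (int v : values) h.add(v);
        return h;
    }

    int size() { return myList.size() - 1; }

    void add(int elt) {
        myList.add(elt);                      // next free slot of the last level
        int loc = myList.size() - 1;
        while (loc > 1 && elt < myList.get(loc / 2)) {
            myList.set(loc, myList.get(loc / 2));   // pull parent down
            loc /= 2;
        }
        myList.set(loc, elt);
    }

    int removeMin() {
        if (size() == 0) throw new NoSuchElementException("heap is empty");
        int min = myList.get(1);
        int last = myList.remove(myList.size() - 1);   // shrink, keeping shape
        int n = size();
        if (n == 0) return min;               // removed the only element
        int loc = 1;
        while (2 * loc <= n) {
            int child = 2 * loc;              // pick the smaller child
            if (child < n && myList.get(child + 1) < myList.get(child)) child++;
            if (last <= myList.get(child)) break;   // heap property restored
            myList.set(loc, myList.get(child));     // pull child up
            loc = child;
        }
        myList.set(loc, last);
        return min;
    }

    // Heap contents in array (level) order, without the placeholder.
    List<Integer> levelOrder() {
        return new ArrayList<>(myList.subList(1, myList.size()));
    }

    public static void main(String[] args) {
        MinHeap a = MinHeap.of(6, 10, 7, 17, 13, 9, 21, 19, 25);
        a.add(8);
        System.out.println(a.levelOrder());   // [6, 8, 7, 17, 10, 9, 21, 19, 25, 13]
        MinHeap b = MinHeap.of(6, 10, 7, 17, 13, 9, 21, 19, 25);
        System.out.println(b.removeMin());    // 6
        System.out.println(b.levelOrder());   // [7, 10, 9, 17, 13, 25, 21, 19]
    }
}
```

Both loops move along a single root-to-leaf path, so each operation touches at most one node per level of the tree.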

8.17 Priority Queue implementations
• Implementing priority queues: average and worst case

                              Insert average   Getmin (peek)   Insert worst   Getmin (delete)
Unsorted vector
Sorted vector
Heap
Balanced binary search tree   ?                ?               ?              ?

• Heap has O(1) find-min (no delete) and O(n) build heap

8.18 How do we create Huffman Tree/Trie?
• Insert weighted values into priority queue
  - What are initial weights? Why?
• Remove minimal nodes, weight by sums, re-insert
  - Total number of nodes?

    // TreeNode must be comparable by weight for the queue to order it
    PriorityQueue<TreeNode> pq = new PriorityQueue<TreeNode>();
    for (int k = 0; k < freq.length; k++) {
        pq.add(new TreeNode(k, freq[k], null, null));   // one leaf per character
    }
    while (pq.size() > 1) {
        TreeNode left = pq.remove();    // two minimally weighted trees
        TreeNode right = pq.remove();
        pq.add(new TreeNode(0, left.weight + right.weight, left, right));
    }
    TreeNode root = pq.remove();
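A self-contained version of this loop can be run on the "go go gophers" example. This is only a sketch: the course's TreeNode class is not shown in the slides, so the `Node` type, the class name `HuffmanSketch`, and the methods `buildTree`, `fillCodes`, and `encodedLength` are stand-ins invented for illustration. Unlike the slide's loop, it creates leaves only for characters that actually occur, and it assumes the text contains at least two distinct characters.

```java
import java.util.Map;
import java.util.PriorityQueue;
import java.util.TreeMap;

class HuffmanSketch {
    // Stand-in for the slides' TreeNode; ordered by weight only.
    static class Node implements Comparable<Node> {
        final char ch;
        final int weight;
        final Node left, right;

        Node(char ch, int weight, Node left, Node right) {
            this.ch = ch; this.weight = weight; this.left = left; this.right = right;
        }
        boolean isLeaf() { return left == null && right == null; }
        @Override public int compareTo(Node other) {
            return Integer.compare(weight, other.weight);
        }
    }

    // Counts characters, then repeatedly joins the two lightest trees.
    static Node buildTree(String text) {
        Map<Character, Integer> freq = new TreeMap<>();
        for (char c : text.toCharArray()) freq.merge(c, 1, Integer::sum);
        PriorityQueue<Node> pq = new PriorityQueue<>();
        for (Map.Entry<Character, Integer> e : freq.entrySet()) {
            pq.add(new Node(e.getKey(), e.getValue(), null, null));
        }
        while (pq.size() > 1) {
            Node left = pq.remove();
            Node right = pq.remove();
            pq.add(new Node('\0', left.weight + right.weight, left, right));
        }
        return pq.remove();
    }

    // Root-to-leaf paths become codewords: 0 for left, 1 for right.
    static void fillCodes(Node node, String path, Map<Character, String> codes) {
        if (node.isLeaf()) { codes.put(node.ch, path); return; }
        fillCodes(node.left, path + "0", codes);
        fillCodes(node.right, path + "1", codes);
    }

    // Total number of bits needed to encode text with its own Huffman code.
    static int encodedLength(String text) {
        Map<Character, String> codes = new TreeMap<>();
        fillCodes(buildTree(text), "", codes);
        int bits = 0;
        for (char c : text.toCharArray()) bits += codes.get(c).length();
        return bits;
    }

    public static void main(String[] args) {
        System.out.println(buildTree("go go gophers").weight);  // 13
        System.out.println(encodedLength("go go gophers"));     // 37
    }
}
```

Ties between equal weights may be broken differently from the slides, so individual codewords can differ from the table on slide 8.5, but the total encoded length is the same for every Huffman tree of a given text.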

8.19 to 8.36 Building a tree
"A SIMPLE STRING TO BE ENCODED USING A MINIMAL NUMBER OF BITS"

Character counts (the leaves of the initial forest):
  space 11, I 6, E 5, N 5, M 4, S 4, A 3, B 3, O 3, T 3, D 2, G 2, L 2, R 2, U 2, C 1, F 1, P 1

[Figure, 18 frames: at each step the two minimally weighted trees are removed and joined under a new root. The new roots have weights, in order, 2, 3, 4, 4, 5, 6, 6, 8, 8, 10, 11, 12, 16, 21, 23, 37, and finally 60, the root of the complete tree.]

8.37 Encoding
1. Count occurrences of each character: O( )
2. Build priority queue: O( )
3. Build Huffman tree: O( )
4. Create table of codes from tree: O( )
5. Write Huffman tree and coded data to file: O( )

8.38 Properties of Huffman coding
• Want to minimize weighted path length L(T) of tree T
• $L(T) = \sum_{i} w_i \cdot \mathrm{depth}(d_i)$
  - w_i is the weight or count of each codeword i
  - d_i is the leaf corresponding to codeword i
• How do we calculate character (codeword) frequencies?
• Huffman coding creates pretty full bushy trees?
  - When would it produce a "bad" tree?
• How do we produce coded compressed data from input efficiently?

8.39 Writing code out to file
• How do we go from characters to encodings?
  - Build Huffman tree
  - Root-to-leaf path generates encoding
• Need way of writing bits out to file
  - Platform dependent?
  - Complicated to write bits and read in same ordering
• See BitInputStream and BitOutputStream classes
  - Depend on each other, bit ordering preserved
• How do we know bits come from compressed file?
  - Store a magic number

8.40 Creating compressed file
• Once we have new encodings, read every character
  - Write encoding, not the character, to compressed file
  - Why does this save bits?
  - What other information needed in compressed file?
• How do we uncompress?
  - How do we know foo.hf represents compressed file?
  - Is suffix sufficient? Alternatives?
• Why is Huffman coding a two-pass method?
  - Alternatives?

8.41 Uncompression with Huffman
• We need the trie to uncompress
  - 000100100010011001101111
• As we read a bit, what do we do?
  - Go left on 0, go right on 1
  - When do we stop? What to do?
• How do we get the trie?
  - How did we get it originally? Store 256 int/counts
    How do we read counts?
  - How do we store a trie? Traversal order?
    Reading a trie? Leaf indicator? Node values?
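The bit-by-bit walk described above can be sketched as follows. Everything named here is hypothetical (`HuffmanDecodeSketch`, `trieFromCodes`, `encode`, `decode`); bits are kept as a String of '0' and '1' characters for readability rather than read from a bit stream; and the code table is the one from the "go go gophers" slide, with s = 100 and space = 101 assumed.

```java
import java.util.Map;

class HuffmanDecodeSketch {
    // Trie node: ch is non-null only at leaves.
    static class Node {
        Character ch;
        Node zero, one;
    }

    // Code table assumed from the "go go gophers" slide.
    static final Map<Character, String> CODES = Map.of(
        'g', "00", 'o', "01", 's', "100", ' ', "101",
        'p', "1100", 'h', "1101", 'e', "1110", 'r', "1111");

    // Rebuilds the trie by inserting each codeword as a root-to-leaf path.
    static Node trieFromCodes(Map<Character, String> codes) {
        Node root = new Node();
        for (Map.Entry<Character, String> e : codes.entrySet()) {
            Node cur = root;
            for (char bit : e.getValue().toCharArray()) {
                if (bit == '0') {
                    if (cur.zero == null) cur.zero = new Node();
                    cur = cur.zero;
                } else {
                    if (cur.one == null) cur.one = new Node();
                    cur = cur.one;
                }
            }
            cur.ch = e.getKey();
        }
        return root;
    }

    static String encode(String text, Map<Character, String> codes) {
        StringBuilder bits = new StringBuilder();
        for (char c : text.toCharArray()) bits.append(codes.get(c));
        return bits.toString();
    }

    // Go left on 0, right on 1; at a leaf, emit its character and restart at the root.
    static String decode(String bits, Node root) {
        StringBuilder out = new StringBuilder();
        Node cur = root;
        for (char bit : bits.toCharArray()) {
            cur = (bit == '0') ? cur.zero : cur.one;
            if (cur == null) throw new IllegalArgumentException("bits match no codeword");
            if (cur.ch != null) {
                out.append(cur.ch.charValue());
                cur = root;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Node root = trieFromCodes(CODES);
        String bits = encode("go go gophers", CODES);
        System.out.println(bits.length());       // 37
        System.out.println(decode(bits, root));  // go go gophers
    }
}
```

Because the table is a prefix code, the decoder never needs to look ahead: the first leaf it reaches is always the right one.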

8.42 to 8.62 Decoding a message

[Figure, 21 frames: the tree built on slides 8.19 to 8.36 (root weight 60) with the bit string 01100000100001001101 below it. Each frame consumes one more bit, following a 0 or 1 edge from the current node; reaching a leaf emits its character and the walk restarts at the root. The output grows G, GO, GOO, and finally GOOD.]

8.63 Decoding
1. Read in tree data: O( )
2. Decode bit string with tree: O( )

8.64 Other Huffman Issues
• What do we need to decode?
  - How did we encode? How will we decode?
  - What information needed for decoding?
• Reading and writing bits: chunks and stopping
  - Can you write 3 bits? Why not? Why?
  - PSEUDO_EOF
  - BitInputStream and BitOutputStream: API
• What should happen when the file won't compress?
  - Silently compress bigger? Warn user? Alternatives?

8.65 Good Compsci 100(e) Assignment?
• Array of character/chunk counts, or is this a map?
  - Map character/chunk to count, why array?
• Priority Queue for generating tree/trie
  - Do we need a heap implementation? Why?
• Tree traversals for code generation, uncompression
  - One recursive, one not, why and which?
• Deal with bits and chunks rather than ints and chars
  - The good, the bad, the ugly
• Create a working compression program
  - How would we deploy it? Make it better?
• Benchmark for analysis
  - What's a corpus?

8.66 Other methods
• Adaptive Huffman coding
• Lempel-Ziv algorithms
  - Build the coding table on the fly while reading document
  - Coding table changes dynamically
  - Protocol between encoder and decoder so that everyone is always using the right coding scheme
  - Works well in practice (compress, gzip, etc.)
• More complicated methods
  - Burrows-Wheeler (bunzip2)
  - PPM statistical methods

8.67 Data Compression

Year   Scheme                                   Bits/char
1967   ASCII                                    7.00
1950   Huffman                                  4.70
1977   Lempel-Ziv (LZ77)                        3.94
1984   Lempel-Ziv-Welch (LZW), Unix compress    3.32
1987   (LZH) used by zip and unzip              3.30
1987   Move-to-front                            3.24
1987   gzip                                     2.71
1995   Burrows-Wheeler                          2.29
1997   BOA (statistical data compression)       1.99

• Why is data compression important?
• How well can you compress files losslessly?
  - Is there a limit?
  - How to compare?
• How do you measure how much information?

8.68 Views of programming
• Writing code from the method/function view is pretty similar across languages
  - Organizing methods is different, organizing code is different, not all languages have classes
  - Loops, arrays, arithmetic, …
• Program using abstractions and high-level concepts
  - Do we need to understand 32-bit two's-complement storage to understand x = x+1?
  - Do we need to understand how arrays map to contiguous memory to use ArrayLists?
  - Top-down vs. bottom-up?

8.69 From bit to byte to char to int to long
• Ultimately everything is stored as either a 0 or 1
  - Bit is binary digit, a byte is a binary term (8 bits)
  - We should be grateful we can deal with Strings rather than sequences of 0's and 1's.
  - We should be grateful we can deal with an int rather than the 32 bits that make up an int
• If we have 256 values (0 to 255) for each of R, G, B, how can we pack this into an int?
  - Why should we care, can't we use one int per color?
  - How do we do the packing and unpacking?

8.70 Signed, unsigned, and why we care
• Some applications require attention to memory use
  - Differences: one million bytes, chars, and ints
    First requires a megabyte, last requires four megabytes
    When do we care about these differences?
• int values are stored as two's complement numbers with 32 bits; for 64 bits use the type long; a char is 16 bits
• Java byte, int, long are signed values, char unsigned
  - Java signed byte: -128..127, # bits?
    What if we only want 0-255? (Huff, pixels, …)
    Convert negative values or use char, trade-offs?
  - Java char unsigned: 0..65,535, # bits?
    Why is char unsigned? Why not as in C++/C?

8.71 More details about bits
• How is 13 represented?
  - bits:          …  0    0    1    1    0    1
    place values:     2^4  2^3  2^2  2^1  2^0
  - Total is 8+4+1 = 13
• What is bit representation of 32? Of 15? Of 1023?
  - What is bit representation of 2^n - 1?
  - What is bit representation of 0? Of -1?
    Study later, but -1 is all 1's, left-most bit determines < 0
• Determining what bits are on? How many on?
  - Understanding, problem-solving
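The questions on this slide can be explored directly in Java. A minimal sketch follows; the class name `BitFacts` and its methods are hypothetical helpers, while `Integer.toBinaryString` is the standard library call.

```java
class BitFacts {
    // Binary digits of value without leading zeros (32 digits for negatives).
    static String bits(int value) {
        return Integer.toBinaryString(value);
    }

    // Is the bit with place value 2^position on?
    static boolean isOn(int value, int position) {
        return ((value >> position) & 1) == 1;
    }

    // How many bits are on? Uses the unsigned shift >>> so the loop
    // also terminates for negative values.
    static int countOn(int value) {
        int count = 0;
        while (value != 0) {
            count += value & 1;
            value >>>= 1;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(bits(13));      // 1101
        System.out.println(bits(32));      // 100000
        System.out.println(bits(15));      // 1111
        System.out.println(bits(1023));    // 1111111111
        System.out.println(bits(-1));      // 32 ones
        System.out.println(countOn(13));   // 3
    }
}
```

The outputs for 15 and 1023 show the pattern behind the 2^n - 1 question: n ones in a row.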

8.72 How are data stored?
• To facilitate Huffman coding we need to read/write one bit
  - Why do we need to read one bit?
  - Why do we need to write one bit?
  - When do we read 8 bits at a time? 32 bits?
• We can't actually write one bit at a time. We can't really write one char at a time either.
  - Output and input are buffered, to minimize memory accesses and disk accesses
  - Why do we care about this when we talk about data structures and algorithms?
    Where does data come from?

8.73 How do we buffer char output?
• Done for us as part of InputStream and Reader classes
  - InputStreams are for reading bytes
  - Readers are for reading char values
  - Why do we have both and how do they interact?
    Reader r = new InputStreamReader(/* an InputStream */);
  - Do we need to flush our buffers?
• In the past Java IO has been notoriously slow
  - Do we care about I? About O?
  - This is changing, and the java.nio classes help
    Map a file to a region in memory in one operation

8.74 Buffer bit output
• To buffer bits we store bits in a buffer (duh)
  - When the buffer is full, we write it.
  - The buffer might overflow, e.g., in process of writing 10 bits to 32-bit capacity buffer that has 29 bits in it
  - How do we access bits, add to buffer, etc.?
• We need to use bit operations
  - Mask bits: access individual bits
  - Shift bits: to the left or to the right
  - Bitwise and/or/negate bits
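One way to see masking and shifting at work is a tiny bit buffer. The sketch below is not the course's BitOutputStream: the class `BitBufferSketch` and its methods `writeBits` and `finish` are invented for illustration, it writes to an in-memory `ByteArrayOutputStream`, and it sidesteps the overflow case by adding one bit at a time to an 8-bit buffer.

```java
import java.io.ByteArrayOutputStream;

class BitBufferSketch {
    private final ByteArrayOutputStream out = new ByteArrayOutputStream();
    private int buffer = 0;   // pending bits, most recent bit in the lowest position
    private int count = 0;    // number of pending bits, always less than 8 between calls

    // Writes the low howMany bits of value, most significant of them first.
    // Assumes 1 <= howMany <= 32.
    void writeBits(int howMany, int value) {
        for (int i = howMany - 1; i >= 0; i--) {
            int bit = (value >> i) & 1;       // shift, then mask one bit
            buffer = (buffer << 1) | bit;     // append it to the buffer
            count++;
            if (count == 8) {                 // buffer full: write one byte
                out.write(buffer);
                buffer = 0;
                count = 0;
            }
        }
    }

    // Pads a partial last byte with zero bits and returns all bytes written.
    byte[] finish() {
        if (count > 0) {
            out.write(buffer << (8 - count));
            buffer = 0;
            count = 0;
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        BitBufferSketch w = new BitBufferSketch();
        w.writeBits(3, 0b101);        // bits 101
        w.writeBits(7, 0b1100110);    // bits 1100110
        for (byte b : w.finish()) {
            System.out.println(Integer.toHexString(b & 0xff));  // b9 then 80
        }
    }
}
```

The zero padding in `finish` is the reason a decoder needs some way to know where the real data stops, which is what a pseudo-EOF value is for.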

8.75 Representing pixels
• Pixel typically stores RGB and alpha/transparency values
  - Each RGB is a value in the range 0 to 255
  - The alpha value is also in range 0 to 255

    Pixel red = new Pixel(255,0,0,0);
    Pixel white = new Pixel(255,255,255,0);

• A picture is simply an array of int values

    void process(int pixel){
        int blue = pixel & 0xff;
        int green = (pixel >> 8) & 0xff;
        int red = (pixel >> 16) & 0xff;
    }

8.76 Bit masks and shifts

    void process(int pixel){
        int blue = pixel & 0xff;
        int green = (pixel >> 8) & 0xff;
        int red = (pixel >> 16) & 0xff;
    }

• Hexadecimal number: 0,1,2,3,4,5,6,7,8,9,a,b,c,d,e,f
  - f is 15, in binary this is 1111, one less than 10000
  - The hex number 0xff is an 8-bit number, all ones
• Bitwise & operator creates an 8-bit value, 0 to 255
  - Must use an int/char, what happens with byte?
  - 1&1 == 1, otherwise we get 0 like logical and
  - Similarly we have |, bitwise or
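Packing is the reverse of the unpacking shown in process. Here is a minimal sketch, assuming the layout implied by that code (red in bits 16 to 23, green in bits 8 to 15, blue in bits 0 to 7) and leaving alpha out; the class `PixelPacking` and its method names are hypothetical.

```java
class PixelPacking {
    // Packs three 0..255 values into one int: shift each into place, then or them together.
    static int pack(int red, int green, int blue) {
        return ((red & 0xff) << 16) | ((green & 0xff) << 8) | (blue & 0xff);
    }

    static int red(int pixel)   { return (pixel >> 16) & 0xff; }
    static int green(int pixel) { return (pixel >> 8) & 0xff; }
    static int blue(int pixel)  { return pixel & 0xff; }

    // A Java byte is signed, so values above 127 come back negative;
    // masking with 0xff recovers the intended 0..255 value as an int.
    static int unsigned(byte b) { return b & 0xff; }

    public static void main(String[] args) {
        int pixel = pack(255, 128, 7);
        System.out.println(Integer.toHexString(pixel));  // ff8007
        System.out.println(red(pixel) + " " + green(pixel) + " " + blue(pixel));  // 255 128 7
        byte b = (byte) 200;
        System.out.println(b);             // -56
        System.out.println(unsigned(b));   // 200
    }
}
```

The last two lines answer the slide's question about byte: the stored bits are the same, but the value only reads as 0 to 255 after it has been widened to an int and masked.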

