Lecture 18: Deep C Garbage Collection
CS201j: Engineering Software
University of Virginia Computer Science
David Evans
http://www.cs.virginia.edu/evans

Menu
    Pointers in C
    Pointer Arithmetic
    Type checking in C
    Why is garbage collection hard in C?

7 November 2002 CS 201J Fall 2002

What are those arrows really?
[Figure: on the Stack, sb holds an arrow to the object "hello" on the Heap]

Pointers
In Java, an object reference is really just an address in memory.
But Java doesn’t let programmers manipulate addresses directly.

    Stack                 Heap
    sb: 0x80496f8         0x80496f0
                          0x80496f4
                          0x80496f8  hell
                          0x80496fc  o\0\0\0
                          0x8049700
                          0x8049704
                          0x8049708

Pointers in C
Addresses in memory
Programs can manipulate addresses directly
    &expr   Evaluates to the address of the location expr evaluates to
    *expr   Evaluates to the value stored in the address expr evaluates to
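
The two operators above undo each other. A minimal sketch, assuming two hypothetical helpers (round_trip and store_through are illustration names, not from the lecture):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns the value read back through the address of x. */
int round_trip (int x)
{
    int *px = &x;   /* &x evaluates to the address of the location x names */
    return *px;     /* *px evaluates to the value stored at that address */
}

/* Hypothetical helper: writes value into the location that target points to. */
void store_through (int *target, int value)
{
    assert (target != NULL);   /* dereferencing a null pointer is undefined */
    *target = value;
}
```

round_trip (42) returns 42, and after store_through (&n, 7) the caller's n holds 7.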

&*%&@#*!
    int f (void) {
        int s = 1;
        int t = 1;
        int *ps = &s;
        int **pps = &ps;
        int *pt = &t;   /* s == 1, t == 1 */
        **pps = 2;      /* s == 2, t == 1 */
        pt = ps;
        *pt = 3;        /* s == 3, t == 1 */
        t = s;          /* s == 3, t == 3 */
    }

Rvalues and Lvalues
What does = really mean?
    int f (void) {
        int s = 1;
        int t = 1;
        t = s;
        t = 2;
    }
The left side of = is an “lvalue”: it evaluates to a location (address)!
The right side of = is an “rvalue”: it evaluates to a value.
There is an implicit * when a variable is used as an rvalue!

BLISS Aside
BLISS [Wulf71] made getting values explicit:
    s = .t;
Puts the value stored in location t into location s.

Parameter Passing in C
Actual parameters are rvalues
    void swap (int a, int b) {
        int tmp = b;
        b = a;
        a = tmp;
    }

    int main (void) {
        int i = 3;
        int j = 4;
        swap (i, j);
        …
The value of i (3) is passed, not its location! swap does nothing.

Parameter Passing in C
Can pass addresses around
    void swap (int *a, int *b) {
        int tmp = *b;
        *b = *a;
        *a = tmp;
    }

    int main (void) {
        int i = 3;
        int j = 4;
        swap (&i, &j);
        …
The value of &i is passed, which is the address of i.
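
The two swap slides can be put side by side in one file. This is a sketch with the functions renamed (swap_copies and swap_locations are hypothetical names) so that both versions can coexist:

```c
#include <assert.h>
#include <stddef.h>

/* Receives copies of the caller's values, so the caller sees no change. */
void swap_copies (int a, int b)
{
    int tmp = b;
    b = a;
    a = tmp;
}

/* Receives the callers' addresses, so the swap happens in the caller's storage. */
void swap_locations (int *a, int *b)
{
    int tmp;
    assert (a != NULL && b != NULL);
    tmp = *b;
    *b = *a;
    *a = tmp;
}
```

With i = 3 and j = 4, swap_copies (i, j) leaves both unchanged, while swap_locations (&i, &j) leaves i == 4 and j == 3.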

Beware!
    int *value (void) {
        int i = 3;
        return &i;
    }

    void callme (void) {
        int x = 35;
    }

    int main (void) {
        int *ip;
        ip = value ();
        printf ("*ip == %d\n", *ip);
        callme ();
        printf ("*ip == %d\n", *ip);
    }

Output:
    *ip == 3
    *ip == 35
But it could really be anything!

    > splint value.c
    Splint 3.0.1.7 --- 08 Aug 2002
    value.c: (in function value)
    value.c:4:10: Stack-allocated storage &i reachable from return value: &i
      A stack reference is pointed to by an external reference when the function
      returns. The stack-allocated storage is destroyed after the call, leaving a
      dangling reference. (Use -stackref to inhibit warning)
    …

Manipulating Addresses
    char s[6];
    s[0] = 'h';
    s[1] = 'e';
    s[2] = 'l';
    s[3] = 'l';
    s[4] = 'o';
    s[5] = '\0';
    printf ("s: %s\n", s);
Output:
    s: hello
expr1[expr2] in C is just syntactic sugar for *(expr1 + expr2)

Obfuscating C
    char s[6];
    *s = 'h';
    *(s + 1) = 'e';
    2[s] = 'l';
    3[s] = 'l';
    *(s + 4) = 'o';
    5[s] = '\0';
    printf ("s: %s\n", s);
Output:
    s: hello
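
Because expr1[expr2] means *(expr1 + expr2) and addition commutes, s[i], *(s + i) and i[s] all name the same element. A small sketch that checks this (same_element and third_char are hypothetical helper names):

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if s[i], *(s + i) and i[s] all refer to the same location. */
int same_element (const char *s, size_t i)
{
    assert (s != NULL);
    return &s[i] == s + i && &i[s] == s + i;
}

/* Reads the third character using the reversed spelling. */
char third_char (const char *s)
{
    assert (s != NULL);
    return 2[s];   /* same as s[2], which is *(s + 2) */
}
```

For the string "hello", same_element returns 1 for any index within the array and third_char returns 'l'.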

Fun with Pointer Arithmetic
    int match (char *s, char *t) {
        int count = 0;
        while (*s == *t) { count++; s++; t++; }
        return count;
    }

    int main (void) {
        char s1[6] = "hello";
        char s2[6] = "hohoh";
        printf ("match: %d\n", match (s1, s2));
        printf ("match: %d\n", match (s2, s2 + 2));
        printf ("match: %d\n", match (&s2[1], &s2[3]));
    }
&s2[1] is &(*(s2 + 1)), which is s2 + 1.
The \0 is invisible!
Output:
    match: 1
    match: 3
    match: 2

Condensing match
    int match (char *s, char *t) {
        int count = 0;
        while (*s == *t) { count++; s++; t++; }
        return count;
    }

    int match (char *s, char *t) {
        char *os = s;
        while (*s++ == *t++);
        return s - os - 1;
    }
s++ evaluates to s_pre (the value of s before the increment), but changes the value of s.
Hence, C++ has the same value as C, but has unpleasant side effects.
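
To check that the condensed version counts the same thing, both can be compiled together under different names (match_loop and match_condensed are hypothetical names for this sketch). Like the originals, both assume the two strings differ somewhere before running off the end of either array:

```c
#include <assert.h>
#include <stddef.h>

/* Explicit version: counts leading positions where s and t agree. */
int match_loop (const char *s, const char *t)
{
    int count = 0;
    assert (s != NULL && t != NULL);
    while (*s == *t) { count++; s++; t++; }
    return count;
}

/* Condensed version: the comparison uses the old values of s and t,
   then both advance, even on the final comparison that fails. */
int match_condensed (const char *s, const char *t)
{
    const char *os = s;
    assert (s != NULL && t != NULL);
    while (*s++ == *t++)
        ;
    return (int) (s - os - 1);   /* - 1 undoes the increment after the mismatch */
}
```

On the strings from the earlier slide, both return 1 for ("hello", "hohoh") and 3 for (s2, s2 + 2) with s2 holding "hohoh".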

Type Checking in C
Java: only allow programs the compiler can prove are type safe.
    Exception: run-time type errors for downcasts and array element stores.
C: trust the programmer. If she really wants to compare apples and oranges, let her.

Type Checking
    int main (void) {
        char *s = (char *) 3;
        printf ("s: %s", s);
    }
Windows 2000 (earlier versions of Windows would just crash the whole machine)

In Praise of Type Checking
    int match (int *s, int *t) {
        int *os = s;
        while (*s++ == *t++);
        return s - os;
    }

    int main (void) {
        char s1[6] = "hello";
        char s2[6] = "hello";
        printf ("match: %d\n", match (s1, s2));
    }
Output:
    match: 2

Different Matching
    int different (int *s, int *t) {
        int *os = s;
        while (*s++ != *t++);
        return s - os;
    }

    int main (void) {
        char s1[6] = "hello";
        printf ("different: %d\n", different ((int *)s1, (int *)s1 + 1));
    }
Output:
    different: 29

So, why is it hard to garbage collect C?

Mark and Sweep (Java version)
    active = all objects on stack
    while (!active.isEmpty ())
        newactive = { }
        foreach (Object a in active)
            mark a as reachable
            foreach (Object o that a points to)
                if o is not marked
                    newactive = newactive U { o }
        active = newactive
    sweep ()   // remove unmarked objects on heap

Mark and Sweep (C version?)
    active = all pointers on stack
    while (!active.isEmpty ())
        newactive = { }
        foreach (pointer a in active)
            mark *a as reachable
            foreach (address p that a points to)
                if *p is not marked
                    newactive = newactive U { *p }
        active = newactive
    sweep ()   // remove unmarked objects on heap
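
The pseudocode is easy to write in C when the collector knows exactly where every pointer lives. The sketch below makes that assumption explicit: a hypothetical toy heap of fixed-size Obj records, each with a known array of references, and a root array handed in by the caller. It marks objects when they are added to the worklist rather than when they are removed, which reaches the same set of objects. Real C programs give a collector none of these guarantees.

```c
#include <assert.h>
#include <stddef.h>

#define HEAP_SIZE 8   /* assumption: a tiny fixed heap for illustration */
#define MAX_REFS  2   /* assumption: every object has exactly two pointer slots */

typedef struct Obj {
    int in_use;                  /* 1 if this slot holds a live allocation */
    int marked;                  /* set during the mark phase */
    struct Obj *refs[MAX_REFS];  /* the only places this object stores pointers */
} Obj;

static Obj heap[HEAP_SIZE];

/* Marks everything reachable from roots, sweeps the rest,
   and returns the number of objects reclaimed. */
int collect (Obj **roots, size_t nroots)
{
    Obj *active[HEAP_SIZE];   /* worklist; each object is pushed at most once */
    size_t nactive = 0;
    size_t i;
    int freed = 0;

    assert (roots != NULL || nroots == 0);
    for (i = 0; i < HEAP_SIZE; i++)
        heap[i].marked = 0;

    for (i = 0; i < nroots; i++) {
        if (roots[i] != NULL && !roots[i]->marked) {
            roots[i]->marked = 1;
            active[nactive++] = roots[i];
        }
    }
    while (nactive > 0) {
        Obj *a = active[--nactive];
        for (i = 0; i < MAX_REFS; i++) {
            Obj *o = a->refs[i];
            if (o != NULL && !o->marked) {
                o->marked = 1;
                active[nactive++] = o;
            }
        }
    }
    for (i = 0; i < HEAP_SIZE; i++) {   /* sweep: reclaim unmarked objects */
        if (heap[i].in_use && !heap[i].marked) {
            heap[i].in_use = 0;
            heap[i].refs[0] = heap[i].refs[1] = NULL;
            freed++;
        }
    }
    return freed;
}
```

Because reachability is traced from the roots, a cycle of two objects that nothing else points to is reclaimed even though each still has a reference to it.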

GC Challenges
    char *f (void) {
        char *s = (char *) malloc (sizeof (char) * 100);
        s = s + 20;
        *s = 'a';
        return s - 20;
    }
There may be objects that only have pointers to their middle!
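
A collector that wants to cope with pointers into the middle of objects has to map an arbitrary address back to the allocation that contains it. A minimal sketch, assuming a hypothetical table of allocations (Block and find_block are illustration names; a real collector would use a faster lookup than a linear scan):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of one allocation: where it starts and how long it is. */
typedef struct {
    char *base;
    size_t size;
} Block;

/* Returns the index of the block whose storage contains p, or -1 if none does.
   Any address from base up to base + size - 1 counts, not just base itself. */
int find_block (const Block *blocks, size_t n, const void *p)
{
    uintptr_t addr = (uintptr_t) p;
    size_t i;

    assert (blocks != NULL || n == 0);
    for (i = 0; i < n; i++) {
        uintptr_t lo = (uintptr_t) blocks[i].base;
        if (addr >= lo && addr - lo < blocks[i].size)
            return (int) i;
    }
    return -1;
}
```

With this lookup, the value s + 20 from the slide would still identify the 100-byte allocation it points into.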

GC Challenges
    char *f (void) {
        char *s = (char *) malloc (sizeof (char) * 100);
        int x = (int) s;
        s = 0;
        return (char *) x;
    }
There may be objects that are reachable through values that have non-pointer apparent types!

GC Challenges
    char *f (void) {
        char *s = (char *) malloc (sizeof (char) * 100);
        int x = (int) s;
        x = x - (int) &f;
        s = 0;
        return (char *) (x + (int) &f);
    }
There may be objects that are reachable through values that have non-pointer apparent types and have values that don’t even look like addresses!
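
The same disguise can be written portably with uintptr_t from <stdint.h>. This sketch offsets the address by a caller-supplied key instead of the address of a function (hide_pointer and reveal_pointer are hypothetical names); while only the hidden value exists, no stored value in the program equals the object's address:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Turns a pointer into an integer that no longer equals its address. */
uintptr_t hide_pointer (void *p, uintptr_t key)
{
    assert (p != NULL);
    return (uintptr_t) p - key;   /* unsigned arithmetic wraps, so this is well defined */
}

/* Recovers the original pointer from the hidden value and the same key. */
void *reveal_pointer (uintptr_t hidden, uintptr_t key)
{
    return (void *) (hidden + key);
}
```

A collector scanning memory for values that look like heap addresses would not recognize the hidden value, yet the program can still get the object back.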

Why not just do reference counting?
Where can you store the reference counts?
Remember: C programs can access memory directly, so we had better not change how objects are stored!

Summary
Garbage collection depends on:
    Knowing which values are addresses
    Knowing that objects without references cannot be reached
Both of these are problems in C.
Nevertheless, there are some garbage collectors for C.
    Change meaning of some programs
    Slow down programs a lot
    Are not able to find all garbage

Charge
PS6 due Tuesday
Exam 2 out Thursday
Send review questions if you want a review class
Remaining classes:
    Java Byte Codes
    Security
    Concurrency without synchronization
    Project Management
[Photo: Garbage Collectors (COAX, Seoul, 18 June 2002)]