Presentation is loading. Please wait.

Presentation is loading. Please wait.

© Kenneth C. Louden, 20031 Chapter 5 - Basic Semantics Programming Languages: Principles and Practice, 2nd Ed. Kenneth C. Louden.

Similar presentations


Presentation on theme: "© Kenneth C. Louden, 20031 Chapter 5 - Basic Semantics Programming Languages: Principles and Practice, 2nd Ed. Kenneth C. Louden."— Presentation transcript:

1 © Kenneth C. Louden, 20031 Chapter 5 - Basic Semantics Programming Languages: Principles and Practice, 2nd Ed. Kenneth C. Louden

2 Chapter 5Louden, Programming Languages2 Attributes Properties of language entities, especially identifiers used in a program. Important examples: –Value of an expression –Data type of an identifier –Maximum number of digits in an integer –Location of a variable –Code body of a function or method Declarations ("definitions") bind attributes to identifiers. Different declarations may bind the same identifier to different sets of attributes.

3 Chapter 5Louden, Programming Languages3 Binding times can vary widely: Value of an expression: during execution or during translation (constant expression). Data type of an identifier: translation time (Java) or execution time (Smalltalk, Lisp). Maximum number of digits in an integer: language definition time or language implementation time. Location of a variable: load or execution time. Code body of a function or method: translation time or link time or execution time.

4 Chapter 5Louden, Programming Languages4 Symbol table and environment A dictionary or table is used to maintain the identifier/attribute bindings. It can be maintained either during translation or execution or both. (Pre-translation entities are entered into the initial or default table.) During translation this table is called the symbol table. During execution this table is called the environment. If both are maintained, the environment can usually dispense with names, keeping track only of locations (names are maintained implicitly).

5 Chapter 5Louden, Programming Languages5 Semantic functions Formally, we can think of the symbol table and environment as functions. If the symbol table and environment are maintained separately (compiler): SymbolTable: Names  Static Attributes Environment: Names  Locations Memory: Locations  Values If the symbol table and environment are maintained together (interpreter): Environment: Names  Attributes (including locations and values)

6 Chapter 5Louden, Programming Languages6 Declarations Declarations bind identifiers to attributes. The collection of bound attributes (including the identifier) can be viewed as equivalent to the declaration itself. Attributes may be explicit or implicit in a declaration. A declaration may fail to fully specify all necessary attributes—a supplemental declaration is then necessary elsewhere. By abuse of language, we sometimes refer to the declaration of x, or just x —using the identifier to stand for the declaration.

7 Chapter 5Louden, Programming Languages7 Examples of declarations (C) int x = 0; Explicitly specifies data type and initial value. Implicitly specifies scope (see next slide) and location in memory. int f(double); Explicitly specifies type ( double  int ). Implicitly specifies nothing else: needs another declaration specifying code. The former is called a definition in C, the latter is simply a declaration.

8 Chapter 5Louden, Programming Languages8 Scope The scope of a declaration is the region of the program to which the bindings established by the declaration apply. (If individual bindings apply over different regions: scope of a binding.) Scope is typically indicated implicitly by the position of the declaration in the code, though keywords can modify it. In a block-structured language, the scope is typically the code from the end of the declaration to the end of the "block" (indicated by braces {…} in C and Java) in which the declaration occurs. Scope can extend backwards to the beginning of the block in certain cases (class declarations in Java and C++, top-level declarations in Scheme).

9 Chapter 5Louden, Programming Languages9 Lexical vs. dynamic scope Scope is maintained by the properties of the lookup operation in the symbol table or environment. If scope is managed statically (prior to execution), the language is said to have static or lexical scope ("lexical" because it follows the layout of the code in the file). If scope is managed directly during execution, then the language is said to have dynamic scope. The next slide has an example showing the difference. It is possible to maintain lexical scope during execution (i.e. by the environment), but it requires extra links and a somewhat unusual lookup operation (see Chapter 8). (Scheme does this.)

10 Chapter 5Louden, Programming Languages10 Java scope example public class Scope { public static int x = 2; public static void f() { System.out.println(x); } public static void main(String[] args) { int x = 3; f(); } } Of course, this prints 2, but under dynamic scope it would print 3 (the most recent declaration of x in the execution path is found).

11 Chapter 5Louden, Programming Languages11 Dynamic scope evaluated Almost all languages use lexical scope: with dynamic scope the meaning of a variable cannot be known until execution time, thus there cannot be any static checking. In particular, no static type checking. Originally used in Lisp. Scheme could still use it, but doesn't. Some languages still use it: VBScript, Javascript, Perl (older versions). Lisp inventor (McCarthy) now calls it a bug. Still useful as a pedagogical tool to understand the workings of scope. In some ways a lot like dynamic binding of methods.

12 Chapter 5Louden, Programming Languages12 Scope holes Under either lexical or dynamic scope, a nested or more recent declaration can mask a prior declaration. Indeed, in slide 10, the local declaration of x in main masks the static declaration of x in the Scope class. How would you access the static x inside main in Java? Use Scope.x in place of x : public static void main(String[] args) { int x = 3; Scope.x = 4;...} Exercise: how to do this for non-statics?

13 Chapter 5Louden, Programming Languages13 Symbol table structure A table of little stacks of declarations under each name. For example the table for the Scope class of slide 10 would look as follows inside main (using lexical scope):

14 Chapter 5Louden, Programming Languages14 Symbol table structure (2) Alternatively, a stack of little tables, one for each scope. For example, the previous example would look as follows (lexical scope): Can be deleted after leaving f Current table inside main

15 Chapter 5Louden, Programming Languages15 Symbol table structure (3) Symbol table is constructed as declarations are encountered (insert operation). Insertions follow static structure of source code with lexical scope. Insertions follow execution path with dynamic scope. Lookups occur as names are encountered in dynamic scope (in symbol table to that point). In lexical scope, lookups occur either as names are encountered in symbol table to that point (declaration before use—C), or all lookups are delayed until after the symbol table is fully constructed and then performed (Java class—scope applies backwards to beginning of class).

16 Chapter 5Louden, Programming Languages16 Symbol table structure (4) Using dynamic scope, the same example would look as follows: Current table inside f Current table inside main

17 Chapter 5Louden, Programming Languages17 Symbol table structure evaluated Which organization is better? Table of little stacks is simpler (C, Pascal). Stack of little tables is more versatile, and helpful when you need to recover outer scopes from within inner ones or from elsewhere in the code (Ada, Java, C++). Normally, no specific table structure is part of a language specification: any structure that provides the appropriate properties will do.

18 Chapter 5Louden, Programming Languages18 Ada example (Fig. 5.17, p.151):

19 Chapter 5Louden, Programming Languages19 Overloading Overloading is a property of symbol tables that allows them to successfully handle declarations that use the same name within the same scope. It is the job of the symbol table to pick the correct choice from among the declarations for the same name in the same scope. This is called overload resolution. It must do so by using extra information, typically the data type of each declaration, which it compares to the probable type at the use site, picking the best match. If it cannot successfully do this, a static semantic error occurs.

20 Chapter 5Louden, Programming Languages20 Overloading (2) Overloading typically applies only to functions or methods. Overloading must be distinguished from dynamic binding in an OO language. Overloading is made difficult by weak typing, particularly automatic conversions. In the presence of partially specified types, such as in ML, overload resolution becomes even more difficult, which is why ML disallows it. Scheme disallows it for a different reason: there are no types on which to base overload resolution, even during execution.

21 Chapter 5Louden, Programming Languages21 Overloading (3) An example in Java: public class Overload { public static int max(int x, int y) { return x > y ? x : y;} public static double max(double x, double y) { return x > y ? x : y;} public static int max(int x, int y, int z) { return max(max(x,y),z);} public static void main(String[] args) { System.out.println(max(1,2)); System.out.println(max(1,2,3)); System.out.println(max(4,1.3)); }} Adding more max functions that mix double and int parameters is ok. But adding ones that mix double and int return values is not!

22 Chapter 5Louden, Programming Languages22 Overloading (4) C++ and Ada are even more challenging for overload resolution: C++, because it allows many more automatic conversions, Ada because the return type is also used to resolve overloading (Ada gets away with this only because it allows no automatic conversions). It is possible for languages to also keep different symbol tables for different kinds of declarations. In Java these are called "name spaces," and they also represent a kind of overloading. Java is particularly ugly in this respect: there are different name spaces for classes, methods, params/vars, labels, and even packages -- see Figure 5.23, page 158.

23 Chapter 5Louden, Programming Languages23 The Environment Can be constructed entirely statically (Fortran): all vars and functions have fixed locations for the duration of execution. Can also be entirely dynamic: functional languages like Scheme and ML. Most language use a mix: C, C++, Java, Ada. Consists of three components: –A fixed area for static allocation –A stack area for lifo allocation (usually the processor stack) –A "heap" area for on-demand dynamic allocation (with or without garbage collection)

24 Chapter 5Louden, Programming Languages24 Typical environment organization (possible C) [Figure 5.25, p. 165)] © 2003 Brooks/Cole - Thomson Learning ™

25 Chapter 5Louden, Programming Languages25 The Runtime Stack Used for: –Procedure/function/method calls –temporaries –local variables Temporaries: intermediate results that cannot be kept in registers; not considered here further. Procedure calls: Chapter 8. Local variables: part of calls, but can be considered independently, showing LIFO behavior for nested scopes (next slide).

26 Chapter 5Louden, Programming Languages26 Example of stack-based allocation in C within a procedure: (1)A: { int x; (2) char y; (4) B: { double x; (5) int a; (7) } /* end B */ (8) C: { char y; (9) int b; (11) D: { int x; (12) double y; (14) } /* end D */ (16) } /* end C */ (18) } /* end A */ Point #1 Point #2

27 Chapter 5Louden, Programming Languages27 Stack at Point #1: © 2003 Brooks/Cole - Thomson Learning ™

28 Chapter 5Louden, Programming Languages28 Stack at Point #2: © 2003 Brooks/Cole - Thomson Learning ™

29 Chapter 5Louden, Programming Languages29 An alternative: a flat local space All local variables allocated at once, regardless of nesting. Wastes some space, but not critically. With this approach, only complete function/method calls get allocated on the stack. Even with the previous approach, the primary structure of the stack is still the call structure: a complete record of a call on the stack is called an activation record or frame, and the stack is referred to as the call stack. (Chapter 8) Java promotes a flat space by forbidding nested redeclarations, but this is not an essential property: a symbol table can easily distinguish nested declarations as A.x, A.y, A.B.x, A.B.a, etc.

30 Chapter 5Louden, Programming Languages30 Heap Allocation In "standard" languages (C, C++, Java) heap allocation requires a special operation: new. Any kind of data can be allocated on the heap in C/C++; in Java all objects and only objects are allocated on the heap. Even with heap allocation available in Java & C/C++, the stack is still used to represent calls. In C/C++, deallocation is typically by hand (destructors), but it is hard to do right. Java uses a garbage collector that periodically sweeps the heap looking for data that cannot be accessed any more by the program and adding it back to free space.

31 Chapter 5Louden, Programming Languages31 Heap Allocation (2) In functional languages (Scheme, ML) heap allocation is performed automatically, and virtually everything, including function calls, are allocated on the heap. Of course, functional languages also use garbage collection, since deallocation is automatic as well. (Indeed, SML/NJ as its default still quaintly announces calls to the garbage collector, including some statistics.) A lot of study and effort has been made by both the functional language and OO language community to make garbage collection efficient in both time and space. Sadly, C and C++ still lack standard garbage collectors.

32 Chapter 5Louden, Programming Languages32 Benefit (of system controlled storage management): ability to delay the binding of a storage segment's size and/or location reuse of a storage segment for different jobs (from system supervisor point of view) reuse of storage for different data structures increased generality, not have to specify maximum data structure size dynamic data structures recursive procedures - garbage collection is automatic Benefits of programmer controlled storage management Disadvantage: burden on programmer & may interfere with necessary system control May lead to subtle errors May interfere with system-controlled storage management Advantage: difficult for system to determine when storage may be most effectively allocated and freed

33 Chapter 5Louden, Programming Languages33 Heap management –Single-size cells vs. variable-size cells –Reference counters (eager approach) vs. garbage collection (lazy approach) 1. Reference counters: maintain a counter in every cell that store the number of pointers currently pointing at the cell –Disadvantages: space required, complications for cells connected circularly Expensive - when making a pointer assignment p=q 1.decrement count for old value of p 2.if 0, return to free storage. Check if contains references to other blocks. Could be recursive 3.do pointer assignment 4.Increment reference count for q

34 Chapter 5Louden, Programming Languages34 One-bit reference counting Another variation on reference counting, called one-bit reference counting, uses a single bit flag to indicate whether each object has either "one" or "many" references. If a reference to an object with "one" reference is removed, then the object can be recycled. If an object has "many" references, then removing references does not change this, and that object will never be recycled. It is possible to store the flag as part of the pointer to the object, so no additional space is required in each object to store the count. One-bit reference counting is effective in practice because most actual objects have a reference count of one.

35 Chapter 5Louden, Programming Languages35 2. Garbage collection: allocate until all available cells allocated; then begin gathering all garbage –Every heap cell has an extra bit used by collection algorithm –All cells initially set to garbage –All pointers traced into heap, and reachable cells marked as not garbage –All garbage cells returned to list of available cells

36 Chapter 5Louden, Programming Languages36 Disadvantages: when you need it most, it works worst (takes most time when program needs most of cells in heap)

37 Chapter 5Louden, Programming Languages37 Mark-Sweep - Java uses In a mark-sweep collection, the collector first examines the program variables; any blocks of memory pointed to are added to a list of blocks to be examined. For each block on that list, it sets a flag (the mark) on the block to show that it is still required, and also that it has been processed. It also adds to the list any blocks pointed to by that block that have not yet been marked. In this way, all blocks that can be reached by the program are marked. In the second phase, the collector sweeps all allocated memory, searching for blocks that have not been marked. If it finds any, it returns them to the allocator for reuse Can find circular references. Easy if regular use of pointers (Like in LISP) All elements must be reachable by a chain of pointers which begins outside the heap Have to be able to know where all pointers are - both inside the heap and outside. How can a chain be followed from a pointer if there is no predefined location for that pointer in the pointed-to cell?

38 Chapter 5Louden, Programming Languages38 Conservative garbage collection Although garbage collection was first invented in 1958, many languages have been designed and implemented without the possibility of garbage collection in mind. It is usually difficult to add normal garbage collection to such a system, but there is a technique, known as conservative garbage collection, that can be used. The usual problem with such a language is that it doesn't provide the collector with information about the data types, and the collector cannot therefore determine what is a pointer and what isn't. A conservative collector assumes that anything might be a pointer. It regards any data value that looks like a pointer to or into a block of allocated memory as preventing the recycling of that block. You might think that conservative garbage collection could easily perform quite poorly, leaving a lot of garbage uncollected. In practice, it does quite well, and there are refinements that improve matters further. Because references are ambiguous, some objects may be retained despite being actually unreachable. In practice, this happens rarely, and refinements such as black-listing can further reduce the odds.

39 Chapter 5Louden, Programming Languages39Lifetime/Extent The lifetime or extent of a program entity is the duration of its allocation in the environment. Allocation is static when the lifetime is the duration of the entire program execution. Lifetime is related to but not identical to scope. With scope holes, lifetime can extend to regions of the program where the program entity is not accessible. It is also possible for scope to exceed lifetime when a language allows locations to be manipulated directly (as for example manual deallocation). This is of course very dangerous!

40 Chapter 5Louden, Programming Languages40 Variables and Constants A variable is an object whose stored value can change during execution. A constant is an object whose value does not change throughout its lifetime. Constants are often confused with literals: constants have names, literals do not. Constants may be: –compile-time static (may not ever be allocated) –load-time static –dynamic

41 Chapter 5Louden, Programming Languages41 Constants (2) Compile-time constant in Java: static final int zero = 0; Load-time constant in Java: static final Date now = new Date(); Dynamic constant in Java: any non-static final assigned in a constructor. Java takes a very general view of constants, since it is not very worried about getting rid of them during compilation. C takes a much stricter view of constants, essentially forcing them to be capable of elimination during compilation.

42 Chapter 5Louden, Programming Languages42 Aliases, Dangling References, and Garbage An alias occurs when the same object is bound to two different names at the same time. This is fairly common with Java objects. A dangling reference is a location that has been deallocated from the environment, but is still accessible within the program. Dangling references are impossible in a garbage-collected environment with no direct access to addresses. Garbage is memory that is still allocated in the environment but has become inaccessible to the program. Garbage can be a problem in a non- garbage collected environment, but is much less serious than dangling references.


Download ppt "© Kenneth C. Louden, 20031 Chapter 5 - Basic Semantics Programming Languages: Principles and Practice, 2nd Ed. Kenneth C. Louden."

Similar presentations


Ads by Google