CS 3304 Comparative Languages

Slides:



Advertisements
Similar presentations
ISBN Chapter 3 Describing Syntax and Semantics.
Advertisements

CSE 3302 Programming Languages Chengkai Li, Weimin He Spring 2008 Syntax Lecture 2 - Syntax, Spring CSE3302 Programming Languages, UT-Arlington ©Chengkai.
176 Formal Languages and Applications: We know that Pascal programming language is defined in terms of a CFG. All the other programming languages are context-free.
CS 330 Programming Languages 09 / 13 / 2007 Instructor: Michael Eckmann.
Chapter 3 Describing Syntax and Semantics Sections 1-3.
1 Foundations of Software Design Lecture 23: Finite Automata and Context-Free Grammars Marti Hearst Fall 2002.
Chapter 3 Describing Syntax and Semantics Sections 1-3.
(2.1) Grammars  Definitions  Grammars  Backus-Naur Form  Derivation – terminology – trees  Grammars and ambiguity  Simple example  Grammar hierarchies.
1 Syntax and Semantics The Purpose of Syntax Problem of Describing Syntax Formal Methods of Describing Syntax Derivations and Parse Trees Sebesta Chapter.
CS 355 – PROGRAMMING LANGUAGES Dr. X. Topics Introduction The General Problem of Describing Syntax Formal Methods of Describing Syntax.
Lexical Analysis - An Introduction Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at.
CS Describing Syntax CS 3360 Spring 2012 Sec Adapted from Addison Wesley’s lecture notes (Copyright © 2004 Pearson Addison Wesley)
Grammars CPSC 5135.
C H A P T E R TWO Syntax and Semantic.
ISBN Chapter 3 Describing Syntax and Semantics.
TextBook Concepts of Programming Languages, Robert W. Sebesta, (10th edition), Addison-Wesley Publishing Company CSCI18 - Concepts of Programming languages.
1 Syntax In Text: Chapter 3. 2 Chapter 3: Syntax and Semantics Outline Syntax: Recognizer vs. generator BNF EBNF.
Review: Compiler Phases: Source program Lexical analyzer Syntax analyzer Semantic analyzer Intermediate code generator Code optimizer Code generator Symbol.
ISBN Chapter 3 Describing Syntax and Semantics.
1 CS Programming Languages Class 04 September 5, 2000.
Chapter 3 – Describing Syntax CSCE 343. Syntax vs. Semantics Syntax: The form or structure of the expressions, statements, and program units. Semantics:
Department of Software & Media Technology
Describing Syntax and Semantics Chapter 3: Describing Syntax and Semantics Lectures # 6.
CS 3304 Comparative Languages
More on scanning: NFAs and Flex
Chapter 3 – Describing Syntax
Describing Syntax and Semantics
Intro to compilers Based on end of Ch. 1 and start of Ch. 2 of textbook, plus a few additional references.
Describing Syntax and Semantics
Concepts of Programming Languages
Lexical and Syntax Analysis
CS 326 Programming Languages, Concepts and Implementation
CS 404 Introduction to Compiler Design
CS510 Compiler Lecture 4.
Lexical and Syntax Analysis
Chapter 2 :: Programming Language Syntax
Lexical Analysis (Sections )
Chapter 2 :: Programming Language Syntax
CSc 453 Lexical Analysis (Scanning)
Chapter 3 – Describing Syntax
Concepts of Programming Languages
What does it mean? Notes from Robert Sebesta Programming Languages
Automata and Languages What do these have in common?
PROGRAMMING LANGUAGES
Chapter 2 :: Programming Language Syntax
CS 363 Comparative Programming Languages
Syntax.
CS416 Compiler Design lec00-outline September 19, 2018
CSE 3302 Programming Languages
Compiler Design 4. Language Grammars
Lexical and Syntax Analysis
Describing Syntax and Semantics
Introduction CI612 Compiler Design CI612 Compiler Design.
Review: Compiler Phases:
Chapter 2 :: Programming Language Syntax
R.Rajkumar Asst.Professor CSE
CS 3304 Comparative Languages
Lecture 4: Lexical Analysis & Chomsky Hierarchy
September 13th Grammars.
CS416 Compiler Design lec00-outline February 23, 2019
Chapter 2 :: Programming Language Syntax
Chapter 3 Describing Syntax and Semantics.
Chapter 2 :: Programming Language Syntax
High-Level Programming Language
Describing Syntax and Semantics
Chapter 10: Compilers and Language Translation
Lec00-outline May 18, 2019 Compiler Design CS416 Compiler Design.
Lecture 5 Scanning.
Intro to compilers Based on end of Ch. 1 and start of Ch. 2 of textbook, plus a few additional references.
Faculty of Computer Science and Information System
Presentation transcript:

CS 3304 Comparative Languages Lecture 3: Scanning 24 January 2012

Introduction Syntax: the form or structure of the expressions, statements, and program units. Semantics: the meaning of the expressions, statements, and program units. Syntax and semantics provide a language’s definition. Users of a language definition: Other language designers. Implementers. Programmers (the users of the language). Basic terminology: A sentence is a string of characters over some alphabet. A language is a set of sentences. A lexeme is the lowest level syntactic unit of a language. A token is a category of lexemes (e.g., identifier).

Defining Languages Recognizers: Generators: A recognition device reads input strings over the alphabet of the language and decides whether the input strings belong to the language. Example: syntax analysis part of a compiler (scanning). Generators: A device that generates sentences of a language. One can determine if the syntax of a particular sentence is syntactically correct by comparing it to the structure of the generator.

Regular Expressions A regular expression is one of the following: A character. The empty string, denoted by ε. Two regular expressions concatenated. Two regular expressions separated by | (i.e., or). A regular expression followed by the Kleene star (concatenation of zero or more strings). Numerical literals in Pascal may be generated by the following:

Context-Free Grammars Developed by Noam Chomsky in the mid-1950s. Language generators, meant to describe the syntax of natural languages. Define a class of languages called context-free languages. Backus-Naur Form (1959): Invented by John Backus to describe Algol 58. BNF is equivalent to context-free grammars (CFGs). A CFG consists of: A set of terminals T. A set of non-terminals N. A start symbol S (a non-terminal). A set of productions.

BNF Fundamentals In BNF, abstractions are used to represent classes of syntactic structures: they act like syntactic variables (also called nonterminal symbols, or just terminals). Terminals are lexemes or tokens. A rule has a left-hand side (LHS), which is a nonterminal, and a right-hand side (RHS), which is a string of terminals and/or nonterminals. Nonterminals are often italic or enclosed in angle brackets. Examples of BNF rules: <ident_list> → identifier | identifier, <ident_list> <if_stmt> → if <logic_expr> then <stmt> Grammar: a finite non-empty set of rules. A start symbol is a special element of the nonterminals of a grammar.

Context-Free Grammar Example Expression grammar with precedence and associativity:

Parse Tree Example 1 Parse tree for expression grammar (with precedence) for 3 + 4 * 5

Parse Tree Example 2 Parse tree for expression grammar (with left associativity) for 10 - 4 - 3

Scanner Recall scanner is responsible for: Tokenizing source. Removing comments. (Often) dealing with pragmas (i.e., significant comments). Saving text of identifiers, numbers, strings. Saving source locations (file, line, column) for error messages.

Scanning Example I Suppose we are building an ad-hoc (hand-written) scanner for Pascal: We read the characters one at a time with look-ahead. If it is one of the one-character tokens: { ( ) [ ] < > , ; = + - etc } we announce that token. If it is a ., we look at the next character: If that is a dot, we announce . Otherwise, we announce . and reuse the look-ahead.

Scanning Example II If it is a <, we look at the next character if that is a = we announce <= otherwise, we announce < and reuse the look-ahead, etc. If it is a letter, we keep reading letters and digits and maybe underscores until we can't anymore: Then we check to see if it is a reserve word. If it is a digit, we keep reading until we find a non-digit: If that is not a . we announce an integer. Otherwise, we keep looking for a real number. If the character after the . is not a digit we announce an integer and reuse the . and the look-ahead.

Deterministic Finite Automaton Pictorial representation of a scanner for calculator tokens, in the form of a finite automaton. This is a deterministic finite automaton (DFA): Lex, scangen, ANTLR, etc. build these things automatically from a set of regular expressions. Specifically, they construct a machine that accepts the language.

The Longest Possible Token Rule We scanover and over to get one token after another. Nearly universal rule: always take the longest possible token from the input, thus: foobar is foobar and never f or foo or foob. The rule means you return only when the next character can't be used to continue the current token: The next character will generally be saved for the next token. In some cases, you may need to peek at more than one character of look-ahead in order to know whether to proceed: In Pascal, for example, when you have a 3 and you a see a dot Do you proceed (in hopes of getting 3.14)? or Do you stop (in fear of getting 3..5)? Regular expressions "generate" a regular language. DFAs "recognize” a regular language.

Building Scanners Scanners tend to be built three ways: Ad-hoc. Semi-mechanical pure DFA (usually as nested case statements). Table-driven DFA. Ad-hoc generally yields the fastest, most compact code by doing lots of special-purpose things, though good automatically-generated scanners come very close. Writing a pure DFA as a set of nested case statements is a surprisingly useful programming technique (Figure 12.1): It is often easier to use perl, awk, sed or similar tools. Table-driven DFA is what lex and scangen produce: lex (flex): C code scangen: numeric tables and a separate driver (Figure 2.12). ANTLR: Java code.

Summary BNF and context-free grammars are equivalent meta- languages that are well-suited for describing the syntax of programming languages. Syntax analysis is a common part of language implementation Scanners (lexical analyzers) use pattern matching to isolate small-scale parts of a program. ANTLR provides supports for scanners (lexers), parsers, and tree-parsers.