CS 540 Spring 2013 (GMU)


CS 540 Spring 2013

The course covers:
– Lexical Analysis
– Syntax Analysis
– Semantic Analysis
– Runtime Environments
– Code Generation
– Code Optimization

Prerequisite courses
– Strong programming background in C, C++ or Java (CS 310)
– Formal languages: NFAs, DFAs, CFGs (CS 330)
– Assembly language programming and machine architecture (CS 367)

Operational Information
– Office: Engineering Building, Rm
– Class web page: Blackboard
– Discussion board: Piazza
– Computer accounts on zeus.vse.gmu.edu (link on ‘Useful Links’)

CS 540 Course Grading
– Programming assignments (45%): 5% + 10% + 10% + 20%
– Exams: midterm (25%) and final (30%)

Resources
– Textbooks:
  – Compilers: Principles, Techniques and Tools, Aho, Lam, Sethi & Ullman, 2007 (required)
  – lex & yacc, Levine et al.
– Slides
– Sample code for Lex/YACC (C, C++, Java)

Distance Education
– The CS 540 Spring ‘13 session is delivered to the Internet section (Section 540-DL) online by NEW.
– Students in the distance section will have access to the online lectures, and can play back the lectures and download the PDF slide files.
– Distance education students will be given the midterm and final exam on campus, on the same day and time as in-class students. Exam locations will be announced closer to the exam dates.

Lecture 1: Introduction to Language Processing & Lexical Analysis CS 540

What is a compiler?
A program that reads a program written in one language (the source language) and translates it into another language (the target language).
Traditionally, compilers go from high-level languages to low-level languages.

Compiler Architecture
Source Language → Front End (language specific) → Intermediate Language → Back End (machine specific) → Target Language
– Separation of concerns
– Retargeting
In more detail:

Compiler Architecture
Source language → Scanner (lexical analysis) → tokens → Parser (syntax analysis) → syntactic structure → Semantic Analysis (IC generator) → intermediate language → Code Optimizer → intermediate language → Code Generator → target language
[Diagram: the Symbol Table is drawn alongside the phases.]

Lexical Analysis - Scanning
[Pipeline diagram repeated, highlighting the Scanner: source language in, tokens out.]
– Tokens described formally
– Breaks input into tokens
– Removes white space

Input: result = a + b * c / d
Tokens: ‘result’, ‘=’, ‘a’, ‘+’, ‘b’, ‘*’, ‘c’, ‘/’, ‘d’
– identifiers: result, a, b, c, d
– operators: =, +, *, /
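The scanning step on this input can be sketched in a few lines of Python. This is only an illustration: the token class names IDENT, OP and SKIP, and the exact character patterns, are assumptions made here, since the slide names only "identifiers" and "operators".

```python
import re

# Token classes and patterns are illustrative assumptions, not taken from the slides.
TOKEN_SPEC = [
    ("IDENT", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("OP", r"[=+\-*/]"),
    ("SKIP", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Return (kind, lexeme) pairs for text, dropping white space."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


print(tokenize("result = a + b * c / d"))
```

Each token class is itself a regular expression, which is why the formal-language review later in the lecture matters for scanners.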

Static Analysis - Parsing
[Pipeline diagram repeated, highlighting the Parser: tokens in, syntactic structure out.]
– Syntax described formally
– Tokens organized into a syntax tree that describes structure
– Error checking (syntax)

Exp ::= Exp ‘+’ Exp | Exp ‘-’ Exp | Exp ‘*’ Exp | Exp ‘/’ Exp | ID
Assign ::= ID ‘=’ Exp
Input: result = a + b * c / d
[Figure: parse tree with Assign at the root and children ID ‘=’ Exp; the Exp subtree expands through ‘+’, ‘*’ and ‘/’ nodes down to ID leaves.]

Semantic Analysis
[Pipeline diagram repeated, highlighting Semantic Analysis: syntactic structure in, syntactic/semantic structure out.]
– “Meaning”
– Type/error checking
– Intermediate code generation – abstract machine

Optimization
[Pipeline diagram repeated, highlighting the Code Optimizer: syntactic/semantic structure in and out.]
– Improving efficiency (machine independent)
– Finding optimal code is NP-hard

Code Generation
[Pipeline diagram repeated, highlighting the Code Generator: syntactic/semantic structure in, target language out.]
– IC to real machine code
– Memory management, register allocation, instruction selection, instruction scheduling, …

Issues Driving Compiler Design
– Correctness
– Speed (runtime and compile time)
  – Degrees of optimization
  – Multiple passes
– Space
– Feedback to user
– Debugging

Related to Compilers
– Interpreters (direct execution)
– Assemblers
– Preprocessors
– Text formatters (non-WYSIWYG)
– Analysis tools

Why study compilers?
– Bring together:
  – Data structures & algorithms
  – Formal languages
  – Computer architecture
– Influence:
  – Language design
  – Architecture (influence is bi-directional)
– Techniques used influence other areas (program analysis, testing, …)

Review of Formal Languages
– Regular expressions, NFA, DFA
– Translating between formalisms
– Using these formalisms

What is a language?
– Alphabet – finite character set (Σ)
– String – finite sequence of characters – can be ε, the empty string (some texts use λ as the empty string)
– Language – possibly infinite set of strings over some alphabet – can be { }, the empty language.

Suppose Σ = {a,b,c}. Some languages over Σ could be:
– {aa,ab,ac,bb,bc,cc}
– {ab,abc,abcc,abccc,...}
– {ε}
– { }
– {a,b,c,ε}
– …

Why do we care about Regular Languages?
– Formally describe tokens in the language
  – Regular expressions
  – NFA
  – DFA
– Regular expressions → finite automata
– Tools assist in the process

Regular Expressions
The regular expressions over a finite alphabet Σ are the strings over the alphabet Σ ∪ { ), (, |, * } such that:
1. { } (empty set) is a regular expression for the empty set
2. ε is a regular expression denoting {ε}
3. a is a regular expression denoting the set {a} for any a in Σ

Regular Expressions
4. If P and Q are regular expressions over Σ, then so are:
– P | Q (union): if P denotes the set {a,…,e} and Q denotes the set {0,…,9}, then P | Q denotes the set {a,…,e,0,…,9}
– PQ (concatenation): if P denotes the set {a,…,e} and Q denotes the set {0,…,9}, then PQ denotes the set {a0,…,e0,a1,…,e9}
– Q* (closure): if Q denotes the set {0,…,9}, then Q* denotes the set {ε,0,…,9,00,…,99,…}
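The three operators can be tried out directly on finite sets of strings. The sketch below is an illustration only: the function names are chosen here, and `closure` takes a hypothetical `max_reps` bound because Q* itself is infinite and can only be approximated by a finite set.

```python
def union(p, q):
    """P | Q: every string that is in P or in Q."""
    return p | q


def concat(p, q):
    """PQ: every string formed by a string of P followed by a string of Q."""
    return {x + y for x in p for y in q}


def closure(q, max_reps):
    """Finite approximation of Q*: strings built from at most max_reps pieces of Q."""
    result = {""}  # zero repetitions gives the empty string
    layer = {""}
    for _ in range(max_reps):
        layer = concat(layer, q)
        result |= layer
    return result


P = set("abcde")
Q = set("0123456789")
print(sorted(union(P, Q)))
print(sorted(concat(P, Q))[:5])
print(len(closure(Q, 2)))
```

With these sets, `closure(Q, 2)` holds the empty string, the ten single digits and the hundred two-digit strings.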

Examples
If Σ = {a,b}:
– (a | b)(a | b)
– (a | b)*b
– a*b*a*
– a*a (also known as a+)
– (ab*)|(a*b)
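One way to get a feel for these expressions is to list the short strings each one denotes. The sketch below does this with Python's `re` module, whose syntax happens to accept the slide's expressions once the spaces are removed; the helper name `language_up_to` is made up for this example.

```python
import re
from itertools import product

EXAMPLES = ["(a|b)(a|b)", "(a|b)*b", "a*b*a*", "a*a", "(ab*)|(a*b)"]


def language_up_to(pattern, max_len, alphabet="ab"):
    """List every string over alphabet of length <= max_len that the pattern denotes."""
    compiled = re.compile(pattern)
    words = []
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            word = "".join(letters)
            if compiled.fullmatch(word):  # the whole string must match, not a prefix
                words.append(word)
    return words


for pattern in EXAMPLES:
    print(pattern, language_up_to(pattern, 2))
```

`fullmatch` is used because a regular expression in the formal sense describes whole strings, whereas `re.search` would accept any string containing a match.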

Nondeterministic Finite Automata
A nondeterministic finite automaton (NFA) is a mathematical model that consists of
1. A set of states S
2. A set of input symbols Σ
3. A transition function that maps state/symbol pairs to a set of states: S × (Σ ∪ {ε}) → subsets of S
4. A special state s0 called the start state
5. A set of states F (subset of S) of final states
INPUT: string
OUTPUT: yes or no

Example NFA
S = {0,1,2,3}, s0 = 0, Σ = {a,b}, F = {3}
[Figure: state diagram and transition table (columns a, b, ε) for the example NFA; the individual entries are not recoverable from the transcript.]

NFA Execution
An NFA says ‘yes’ for an input string if there is some path from the start state to some final state where all input has been processed.

NFA(state s, int pos) {
    if (all input processed && s is a final state) return Yes;
    if (input remains)
        for all states s1 in transition(s, input[pos])
            if (NFA(s1, pos+1) == Yes) return Yes;
    for all states s1 in transition(s, ε)
        if (NFA(s1, pos) == Yes) return Yes;
    return No;
}

Uses backtracking to search all possible paths. The ε-moves must still be tried after the input runs out, so the search cannot answer No merely because s is not final at the end of the input. The recursion also assumes the NFA has no cycle of ε-transitions, which would make it loop forever.
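A runnable version of the backtracking search might look like the sketch below. The transition table is an assumption: it encodes the usual four-state NFA for (a|b)*abb, not necessarily the example NFA drawn on the slides. The `visiting` parameter is an addition made here so that ε-cycles cannot cause infinite recursion.

```python
EPS = ""  # label used in this sketch for ε-transitions

# Assumed example: NFA for (a|b)*abb with states 0..3, start 0, final {3}.
NFA_DELTA = {
    (0, "a"): {0, 1},
    (0, "b"): {0},
    (1, "b"): {2},
    (2, "b"): {3},
}
START, FINALS = 0, {3}


def nfa_accepts(delta, state, finals, text, pos=0, visiting=frozenset()):
    """Backtracking search: True if some path from state consumes text[pos:] and ends in a final state."""
    if pos == len(text) and state in finals:
        return True
    if pos < len(text):
        for nxt in delta.get((state, text[pos]), ()):
            if nfa_accepts(delta, nxt, finals, text, pos + 1):
                return True
    for nxt in delta.get((state, EPS), ()):
        # Skip states already on the current chain of ε-moves (guards against ε-cycles).
        if (nxt, pos) not in visiting:
            if nfa_accepts(delta, nxt, finals, text, pos, visiting | {(state, pos)}):
                return True
    return False


print(nfa_accepts(NFA_DELTA, START, FINALS, "aabb"))
print(nfa_accepts(NFA_DELTA, START, FINALS, "abba"))
```

In the worst case this search explores exponentially many paths, which is one motivation for converting the NFA to a DFA before scanning.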

Deterministic Finite Automata
A deterministic finite automaton (DFA) is a mathematical model that consists of
1. A set of states S
2. A set of input symbols Σ
3. A transition function that maps state/symbol pairs to a state: S × Σ → S
4. A special state s0 called the start state
5. A set of states F (subset of S) of final states
INPUT: string
OUTPUT: yes or no

DFA Execution

DFA(state start_state) {
    state current = start_state;
    input_element = next_token();
    while (input to be processed) {
        current = transition(current, table[input_element]);
        if (current is an error state) return No;
        input_element = next_token();
    }
    if (current is a final state) return Yes;
    else return No;
}
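The table-driven loop translates almost line for line into Python. In this sketch the table is an assumed DFA for (a|b)*abb, and a missing table entry plays the role of the error state; both choices are made for illustration.

```python
# Assumed example: DFA for (a|b)*abb with states 0..3, start 0, final {3}.
DFA_TABLE = {
    0: {"a": 1, "b": 0},
    1: {"a": 1, "b": 2},
    2: {"a": 1, "b": 3},
    3: {"a": 1, "b": 0},
}


def dfa_accepts(table, start, finals, text):
    """Run the DFA over text; a missing table entry acts as the error state."""
    current = start
    for symbol in text:
        current = table.get(current, {}).get(symbol)
        if current is None:
            return False
    return current in finals


print(dfa_accepts(DFA_TABLE, 0, {3}, "aabb"))
print(dfa_accepts(DFA_TABLE, 0, {3}, "abba"))
```

Unlike the NFA search, the loop does one table lookup per input symbol and never backtracks.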

Regular Languages
1. There is an algorithm for converting any RE into an NFA.
2. There is an algorithm for converting any NFA to a DFA.
3. There is an algorithm for converting any DFA to a RE.
These facts tell us that REs, NFAs and DFAs have equivalent expressive power. All three describe the class of regular languages.

Converting Regular Expressions to NFAs
The regular expressions over a finite alphabet Σ are the strings over the alphabet Σ ∪ { ), (, |, * } such that:
– { } (empty set) is a regular expression for the empty set
– Empty string ε is a regular expression denoting {ε}
– a is a regular expression denoting {a} for any a in Σ
[Figures: the NFAs for the ε and a cases, each drawn with a single edge labelled ε or a.]

Converting Regular Expressions to NFAs
If P and Q are regular expressions with NFAs Np, Nq:
– P | Q (union)
– PQ (concatenation)
[Figures: for union, Np and Nq are drawn in parallel and joined with ε-transitions; for concatenation, Np is followed by Nq.]

Converting Regular Expressions to NFAs
If Q is a regular expression with NFA Nq:
– Q* (closure)
[Figure: Nq with added ε-transitions.]
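The construction rules for symbols, union, concatenation and closure can be sketched as combinators that each return a (start, accept) pair of states. This follows the common Thompson-style construction; the slides' figures are not recoverable, so the exact placement of ε-edges here (for instance, linking concatenated fragments with an ε-edge rather than merging states) is an assumption, as are the names `Builder` and `accepts`.

```python
from collections import defaultdict
from itertools import count

EPS = ""  # label used in this sketch for ε-transitions


class Builder:
    """Builds NFA fragments; every fragment is a (start, accept) pair of state ids."""

    def __init__(self):
        self.delta = defaultdict(set)  # (state, label) -> set of next states
        self._ids = count()

    def _edge(self, src, label, dst):
        self.delta[(src, label)].add(dst)

    def symbol(self, a):
        s, f = next(self._ids), next(self._ids)
        self._edge(s, a, f)
        return s, f

    def union(self, p, q):
        s, f = next(self._ids), next(self._ids)
        for frag_start, frag_accept in (p, q):
            self._edge(s, EPS, frag_start)
            self._edge(frag_accept, EPS, f)
        return s, f

    def concat(self, p, q):
        self._edge(p[1], EPS, q[0])
        return p[0], q[1]

    def star(self, q):
        s, f = next(self._ids), next(self._ids)
        self._edge(s, EPS, q[0])
        self._edge(s, EPS, f)        # zero repetitions
        self._edge(q[1], EPS, q[0])  # go round again
        self._edge(q[1], EPS, f)
        return s, f


def accepts(builder, frag, text):
    """Simulate the fragment by tracking the set of states the NFA could be in."""
    def closure(states):
        seen, stack = set(states), list(states)
        while stack:
            for nxt in builder.delta.get((stack.pop(), EPS), ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    current = closure({frag[0]})
    for ch in text:
        moved = set()
        for s in current:
            moved |= builder.delta.get((s, ch), set())
        current = closure(moved)
    return frag[1] in current


b = Builder()
ab_star = b.concat(b.symbol("a"), b.star(b.symbol("b")))
a_star_b = b.concat(b.star(b.symbol("a")), b.symbol("b"))
whole = b.star(b.union(ab_star, a_star_b))  # NFA for (ab* | a*b)*
print(accepts(b, whole, "abba"))
```

Because both a and b are themselves in the language of ab* | a*b, the closure (ab* | a*b)* accepts every string over {a,b}.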

Example (ab* | a*b)*
[Figure sequence: the NFA is built bottom-up. Starting with NFAs for a and b, then ab* and a*b, these are joined with ε-transitions into ab* | a*b, and finally wrapped with further ε-transitions to give (ab* | a*b)*. State numbers and individual edges are not recoverable from the transcript.]

Converting NFAs to DFAs
Idea: Each state in the new DFA will correspond to some set of states from the NFA. The DFA will be in state {s0,s1,…} after an input if the NFA could be in any of these states for the same input.
Input: NFA N with state set S_N, alphabet Σ, start state s_N, final states F_N, transition function T_N: S_N × (Σ ∪ {ε}) → subsets of S_N
Output: DFA D with state set S_D, alphabet Σ, start state s_D = ε-closure(s_N), final states F_D, transition function T_D: S_D × Σ → S_D

ε-closure()
Defn: ε-closure(T) = T ∪ all NFA states reachable from any state in T using only ε-transitions
[Figure: example NFA with states 1–5 and edges labelled a, b and ε.]
– ε-closure({1,2,5}) = {1,2,5}
– ε-closure({4}) = {1,4}
– ε-closure({3}) = {1,3,4}
– ε-closure({3,5}) = {1,3,4,5}
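The definition amounts to a graph reachability search restricted to ε-edges. In the sketch below the edge set `EPS_EDGES` is an assumption: the figure is lost, so the ε-edges 3 → 4 and 4 → 1 were chosen here because they reproduce the four closures listed on the slide.

```python
def epsilon_closure(states, eps_edges):
    """Return states plus everything reachable from them using only ε-edges."""
    closure = set(states)
    stack = list(states)
    while stack:
        s = stack.pop()
        for t in eps_edges.get(s, ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return closure


# Assumed ε-edges, inferred from the closures listed on the slide.
EPS_EDGES = {3: {4}, 4: {1}}

print(sorted(epsilon_closure({3, 5}, EPS_EDGES)))
```

The explicit stack makes the search terminate even when ε-edges form a cycle, since a state is pushed only the first time it is seen.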

Algorithm: Subset Construction

s_D = ε-closure({s_N})    -- create start state for DFA
S_D = {s_D} (unmarked)
while there is some unmarked state R in S_D
    mark state R
    for all a in Σ do
        s = ε-closure(T_N(R,a));    -- T_N applied to every NFA state in R
        if s not already in S_D then add it (unmarked)
        T_D(R,a) = s;
    end for
end while
F_D = any element of S_D that contains a state in F_N
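The worklist loop above can be sketched in Python with frozensets standing in for DFA states. The NFA used here is the assumed four-state automaton for (a|b)*abb rather than one of the slide examples, and, as a simplification chosen for this sketch, an empty target set is left out of the table instead of being added as a dead state.

```python
EPS = ""  # label used in this sketch for ε-transitions


def eps_closure(states, delta):
    closure, stack = set(states), list(states)
    while stack:
        for t in delta.get((stack.pop(), EPS), ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return closure


def move(states, symbol, delta):
    """All NFA states reachable from any state in states on one symbol."""
    result = set()
    for s in states:
        result |= delta.get((s, symbol), set())
    return result


def subset_construction(delta, start, finals, alphabet):
    start_set = frozenset(eps_closure({start}, delta))
    dfa_delta = {}
    seen = {start_set}
    unmarked = [start_set]
    while unmarked:
        current = unmarked.pop()  # "mark" the state by removing it from the worklist
        for a in alphabet:
            target = frozenset(eps_closure(move(current, a, delta), delta))
            if not target:
                continue  # no transition: the "-" cells of the slide tables
            dfa_delta[(current, a)] = target
            if target not in seen:
                seen.add(target)
                unmarked.append(target)
    dfa_finals = {state for state in seen if state & finals}
    return start_set, dfa_delta, dfa_finals, seen


# Assumed example: NFA for (a|b)*abb.
NFA_DELTA = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "b"): {2}, (2, "b"): {3}}
start, table, finals, states = subset_construction(NFA_DELTA, 0, {3}, "ab")
for (src, sym), dst in sorted(table.items(), key=lambda kv: (sorted(kv[0][0]), kv[0][1])):
    print(sorted(src), sym, sorted(dst))
```

Using frozensets lets each set of NFA states serve directly as a dictionary key, which is what makes the "if s not already in S_D" check cheap.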

Example 1: Subset Construction
[Figure sequence: an NFA with states 1–5 and edges labelled ε, a, b and a,b is converted step by step into a DFA with states {1,2}, {3,5}, {4,5}, {4} and {5}.]
Transition table rows recoverable from the transcript:
State    a        b
{1,2}    {3,5}    {4,5}
{3,5}    -        {4}
The rows for {4,5}, {4} and {5} are garbled in the transcript.
DFA states that include the NFA final state are final.

Example 2: Subset Construction
[Figure: an NFA with edges labelled a, b and ε, and the resulting DFA with states {1}, {2}, {1,3,4}, {1,4,5} and {1,3,4,5}; the individual edges are not recoverable from the transcript.]

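Example 3: Subset Construction
[Figure: an NFA with ε-transitions and the resulting DFA, whose states include {1,2,4}, {5} and {3,4}; the remaining state labels and edges are not recoverable from the transcript.]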

Converting DFAs to REs
1. Combine serial links by concatenation
2. Combine parallel links by alternation
3. Remove self-loops by Kleene closure
4. Select a node (other than initial or final) for removal. Replace it with a set of equivalent links whose path expressions correspond to the in and out links
5. Repeat steps 1-4 until the graph consists of a single link between the entry and exit nodes.
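The steps above can be sketched as a state-elimination routine. This is one possible arrangement, not necessarily the order the slides intend: it adds fresh entry and exit nodes (the hypothetical names "init" and "exit") so that every original state, including the start and final ones, can be removed, and it emits Python `re` syntax with non-capturing groups `(?:...)`.

```python
import re
from itertools import product


def dfa_to_regex(states, delta, start, finals):
    """delta maps (state, symbol) -> state. Returns a Python regex string, or None for the empty language."""
    init, exit_ = "init", "exit"  # fresh entry and exit nodes (must not clash with state names)
    edges = {}  # (src, dst) -> regex string labelling that link

    def add(src, dst, expr):
        # Parallel links are combined by alternation.
        edges[(src, dst)] = f"{edges[(src, dst)]}|{expr}" if (src, dst) in edges else expr

    for (src, sym), dst in delta.items():
        add(src, dst, re.escape(sym))
    add(init, start, "")
    for f in finals:
        add(f, exit_, "")

    for victim in states:
        loop = edges.pop((victim, victim), None)
        star = f"(?:{loop})*" if loop is not None else ""  # self-loop becomes Kleene closure
        ins = [(s, e) for (s, d), e in edges.items() if d == victim]
        outs = [(d, e) for (s, d), e in edges.items() if s == victim]
        for key in [k for k in edges if victim in k]:
            del edges[key]
        # Serial links through the removed node are combined by concatenation.
        for (s, e_in), (d, e_out) in product(ins, outs):
            add(s, d, f"(?:{e_in}){star}(?:{e_out})")
    return edges.get((init, exit_))


# Assumed example: DFA for (a|b)*abb.
TABLE = {(0, "a"): 1, (0, "b"): 0, (1, "a"): 1, (1, "b"): 2,
         (2, "a"): 1, (2, "b"): 3, (3, "a"): 1, (3, "b"): 0}
pattern = dfa_to_regex([0, 1, 2, 3], TABLE, 0, {3})
print(bool(re.fullmatch(pattern, "aabb")), bool(re.fullmatch(pattern, "abba")))
```

The resulting expression is correct but usually far longer than a hand-written one such as (a|b)*abb, and its size depends on the order in which states are eliminated.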

Example
[Figure sequence: a graph with edges labelled a, b, c and d is reduced step by step; node labels visible in the transcript include 0, 3, 4 and 5.]
– Parallel edges become alternation: the links a, b, c become a|b|c, and the links b, c become b|c
– Serial edges become concatenation, giving links such as d(a|b|c)d
– Find paths that can be “shortened”: this yields the link b(b|c)da
– Eliminate self-loops: this yields (b(b|c)da)*d
– Serial edges become concatenation again, giving d(a|b|c)d(b(b|c)da)*d

Describing Regular Languages
– Generate all strings in the language
– Generate only strings in the language
Try the following:
– Strings of {a,b} that end with ‘abb’
– Strings of {a,b} that don’t end with ‘abb’
– Strings of {a,b} where every a is followed by at least one b

Strings of (a|b)* that end in abb
re: (a|b)*abb
[Figure: an NFA with states 0–3, a self-loop on a,b at state 0 and a path labelled a, b, b to state 3; and a DFA with states 0–3 for the same language.]

Strings of (a|b)* that don’t end in abb
re: ??
[Figure: the four-state automaton (states 0–3) for this language, labelled DFA/NFA.]

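Strings of (a|b)* that don’t end in abb
[Figure sequence: the four-state automaton (states 0–3) is reduced to a regular expression by eliminating states. Edge labels visible along the way include b*aa*, b*aa*b, a*bba | a*ba and a*bbb; the final expression is not recoverable from the transcript.]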
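For a complete DFA, "doesn't end in abb" is obtained from "ends in abb" by swapping final and non-final states. The sketch below illustrates that idea; the table is the assumed DFA for (a|b)*abb, and whether the slide's figure was built this way cannot be confirmed from the transcript.

```python
# Assumed complete DFA for strings over {a,b} that end in abb.
ENDS_ABB = {
    0: {"a": 1, "b": 0},
    1: {"a": 1, "b": 2},
    2: {"a": 1, "b": 3},
    3: {"a": 1, "b": 0},
}
ENDS_ABB_FINALS = {3}


def run(table, start, finals, text):
    """Run a complete DFA: every state needs an entry for every symbol."""
    state = start
    for ch in text:
        state = table[state][ch]
    return state in finals


def complement_finals(table, finals):
    """Final states of the complement DFA: exactly the states that were not final."""
    return set(table) - set(finals)


NOT_ABB_FINALS = complement_finals(ENDS_ABB, ENDS_ABB_FINALS)
print(sorted(NOT_ABB_FINALS))
print(run(ENDS_ABB, 0, NOT_ABB_FINALS, "abba"))
```

The swap is only valid when every state has a transition on every symbol, and it does not work on an NFA in general, because an NFA accepts when any one path succeeds.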

Suggestions for writing NFA/DFA/RE
– Typically, one of these formalisms is more natural for the problem. Start with that and convert if necessary.
– In NFA/DFAs, each state typically captures some partial solution.
– Be sure that you include all relevant edges (ask: does every state have an outgoing transition for all alphabet symbols?)

Non-Regular Languages
– Not all languages are regular. Example: the language ww where w is any string in (a|b)*
– Non-regular languages cannot be described using REs, NFAs or DFAs.