Lexing & Parsing (the intro version)


What you have
- Input is a string (via JTextField.getText()): “(Lane OR Brown) AND (CS351 OR Java)”
- Useful to transform it to a Reader (StringReader)
  - Gives a uniform interface and isolates the parser from the source of the data: it could read from a file or the network as easily as from a string
- The character stream gives you data one character at a time: int Reader.read(), which returns -1 at end of stream
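A minimal sketch of wrapping the input string in a StringReader and pulling characters one at a time. The class name ReaderDemo and the helper drain are made up for illustration; only StringReader and Reader.read() come from the slide.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

class ReaderDemo {
    // Hypothetical helper: collects every character the Reader yields.
    // It never needs to know whether the data came from a string,
    // a file, or a network connection.
    static String drain(Reader in) {
        StringBuilder sb = new StringBuilder();
        try {
            int c;                              // read() returns an int, not a char
            while ((c = in.read()) != -1) {     // -1 signals end of stream
                sb.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String query = "(Lane OR Brown) AND (CS351 OR Java)";
        System.out.println(drain(new StringReader(query)));
    }
}
```

In a GUI program the string would come from JTextField.getText() instead of a literal.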

What you want
- Parse tree: a data structure that reflects the syntactic structure of the input phrase
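One way such a tree might be represented, as a sketch only: the Node class, its fields, and the prefix-style printout are assumptions for illustration, not the design used in the course.

```java
import java.util.Arrays;
import java.util.List;

class ParseTreeDemo {
    // Hypothetical node type: an operator ("AND"/"OR") with children,
    // or a leaf holding a search word (no children).
    static final class Node {
        final String label;
        final List<Node> children;

        Node(String label, Node... children) {
            this.label = label;
            this.children = Arrays.asList(children);
        }

        @Override
        public String toString() {
            if (children.isEmpty()) return label;
            StringBuilder sb = new StringBuilder("(").append(label);
            for (Node child : children) sb.append(' ').append(child);
            return sb.append(')').toString();
        }
    }

    // Tree for "(Lane OR Brown) AND (CS351 OR Java)", built by hand.
    static Node example() {
        return new Node("AND",
                new Node("OR", new Node("Lane"), new Node("Brown")),
                new Node("OR", new Node("CS351"), new Node("Java")));
    }

    public static void main(String[] args) {
        System.out.println(example());  // prints (AND (OR Lane Brown) (OR CS351 Java))
    }
}
```

The nesting of nodes mirrors the nesting of parentheses in the query, which is what makes the tree easy to evaluate later.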

2 Questions
- How do you get from a character stream (Reader) to a parse tree?
- Why do you want a parse tree anyway?

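The parsing pipeline
Character data (String) -> Reader -> Lexer -> Parser -> Parse tree
- The Reader produces characters; the Lexer consumes them via read()
- The Lexer produces tokens; the Parser consumes them via nextTok()
- The Parser produces the parse tree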

Roles of the modules
- Reader:
  - Provides a uniform interface to character data
  - Generates one character at a time, on request
  - Notifies the caller when the stream is empty (EOF)
- Lexer (lexical analyzer):
  - Converts groups of characters into tokens
  - Generates one token at a time, on request
  - Notifies the caller when no more tokens are available
- Parser:
  - Converts groups of tokens into parse trees
  - Generates a complete parse tree, on request

In call-stack notation
Parser: parseANDExpr() -> Lexer: nextTok() -> Reader: read() ...

Why Lexer and Parser?
- What's the difference between the lexer and the parser?
- View 1 (abstract): Not much. Both are stream transforms
- View 2 (practical): A lot. Tokens and parse trees are conceptually different beasts
  - Can implement the lexer with a DFA (finite state machine)
  - Need recursion to handle full parsing (CFG; stack machine; recursive descent parsing)
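To show why parsing needs recursion, here is a rough recursive descent sketch for the query language. The grammar (AND binding tighter than OR), the class name QueryParser, the regex-based token splitting, and the string form of the tree are all assumptions made for illustration; only the method name parseANDExpr appears in the slides.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QueryParser {
    private final List<String> toks = new ArrayList<>();
    private int pos = 0;

    QueryParser(String input) {
        // Stand-in for a real lexer: parentheses and alphanumeric words.
        Matcher m = Pattern.compile("[()]|[A-Za-z0-9]+").matcher(input);
        while (m.find()) toks.add(m.group());
    }

    private String peek() { return pos < toks.size() ? toks.get(pos) : null; }

    // orExpr := andExpr ("OR" andExpr)*
    String parseORExpr() {
        String left = parseANDExpr();
        while ("OR".equals(peek())) {
            pos++;
            left = "(OR " + left + " " + parseANDExpr() + ")";
        }
        return left;
    }

    // andExpr := atom ("AND" atom)*
    String parseANDExpr() {
        String left = parseAtom();
        while ("AND".equals(peek())) {
            pos++;
            left = "(AND " + left + " " + parseAtom() + ")";
        }
        return left;
    }

    // atom := WORD | "(" orExpr ")"
    String parseAtom() {
        String t = peek();
        if (t == null || t.equals(")") || t.equals("AND") || t.equals("OR")) {
            throw new IllegalArgumentException("unexpected token: " + t);
        }
        pos++;
        if (!t.equals("(")) return t;          // plain word
        String inner = parseORExpr();          // recursion handles nesting
        if (!")".equals(peek())) throw new IllegalArgumentException("expected )");
        pos++;
        return inner;
    }

    static String parse(String input) {
        QueryParser p = new QueryParser(input);
        String tree = p.parseORExpr();
        if (p.peek() != null) throw new IllegalArgumentException("trailing input: " + p.peek());
        return tree;
    }

    public static void main(String[] args) {
        System.out.println(parse("(Lane OR Brown) AND (CS351 OR Java)"));
    }
}
```

The call from parseAtom back into parseORExpr is the part a finite state machine cannot do: arbitrarily deep parentheses need a stack, and here the Java call stack plays that role.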

The joy of lex
- Job of the lexer (a.k.a. tokenizer): convert a stream of characters to a stream of tokens
- Tokens are (roughly) things that can be described with regular expressions (regexps)
  - “haruspex”
  - “[a-zA-Z]+”
  - “[0-9]+(\.[0-9]+)?”
- Key idea: a state machine determines what to do next
- Given [state, char], know:
  - Whether to complete and return a token
  - What the next state should be
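A quick check of these patterns with java.util.regex, as a sketch. The class and method names are invented here; the number pattern escapes the dot so that it matches only a literal decimal point rather than any character.

```java
import java.util.regex.Pattern;

class TokenPatterns {
    static final Pattern WORD = Pattern.compile("[a-zA-Z]+");
    static final Pattern NUMBER = Pattern.compile("[0-9]+(\\.[0-9]+)?");

    // matches() requires the whole string to fit the pattern.
    static boolean isWord(String s) { return WORD.matcher(s).matches(); }
    static boolean isNumber(String s) { return NUMBER.matcher(s).matches(); }

    public static void main(String[] args) {
        System.out.println(isWord("haruspex"));   // true
        System.out.println(isNumber("3.14"));     // true
        System.out.println(isNumber("3x14"));     // false: 'x' is not a decimal point
    }
}
```

A regexp library is convenient for checking a finished string, but a lexer reading from a Reader sees one character at a time, which is why the slides build the state machine by hand.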

A simple token set
- Suppose you want to lex the following token types:
  - TT_FNORD := “fnord”
  - TT_WORD := “[a-zA-Z]+”
  - TT_NUMBER := “[0-9]+”
- Any other character separates tokens, but is otherwise discarded
- Examples:
  - “123kumquat 9” => “123” “kumquat” “9”
  - “fnordling” => “fnordling”
  - “fnord1ing” => “fnord” “1” “ing”

Definitions
- Shift:
  - Save the current character on the accumulator (StringBuffer) and continue
- Return:
  - Create a new token from the accumulator
  - Clear the accumulator
  - Shift the current character onto the accumulator
  - Return the new token
- DRet (drop and return):
  - Create a new token from the accumulator
  - Clear the accumulator
  - Return the new token
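The three actions might look roughly like this in Java. This is a sketch under assumptions: tokens are represented as plain strings of the form TYPE:text to keep it short, the class name TokenAccumulator and the method doDRet are invented, and StringBuilder stands in for StringBuffer.

```java
class TokenAccumulator {
    private final StringBuilder acc = new StringBuilder();

    // Shift: save the current character on the accumulator and continue.
    void doShift(char c) {
        acc.append(c);
    }

    // DRet: create a token from the accumulator, clear it, and drop c.
    String doDRet(String type) {
        String tok = type + ":" + acc;
        acc.setLength(0);
        return tok;
    }

    // Return: like DRet, but the current character starts the next token.
    String doReturn(char c, String type) {
        String tok = doDRet(type);
        acc.append(c);
        return tok;
    }

    public static void main(String[] args) {
        TokenAccumulator a = new TokenAccumulator();
        for (char c : "fnord".toCharArray()) a.doShift(c);
        System.out.println(a.doReturn('1', "TT_FNORD"));  // TT_FNORD:fnord
        System.out.println(a.doDRet("TT_NUMBER"));        // TT_NUMBER:1
    }
}
```

Return is used when the character that ends one token also begins the next (a digit right after a word); DRet is used when the ending character is a separator that should be thrown away.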

The state machine
(State diagram, shown only in part; the full diagram is very complex, with a lot of cases to consider.)

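Turning it into code
    while (true) {
        int c = inputStream.read();   // read() returns an int; -1 means end of stream
        switch (currState) {
        case ST_S1:
            if (c == 'f') {
                currState = ST_S2;
                doShift((char) c);
                continue;             // go read the next character
            }
            if (Character.isDigit(c)) {
                currState = ST_S8;
                doShift((char) c);
                continue;
            }
            // etc.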

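Turning it into code
        case ST_S6:                   // "fnord" has been read so far
            if (Character.isDigit(c)) {
                currState = ST_S8;
                Token t = doReturn((char) c, TT_FNORD);
                return t;
            }
            if (Character.isLetter(c)) {
                currState = ST_S7;
                doShift((char) c);    // keep the letter: "fnordl..." is an ordinary word
                continue;
            }
            // etc.
        case ST_S7:                   // inside an ordinary word
            if (Character.isDigit(c)) {
                currState = ST_S8;
                Token t = doReturn((char) c, TT_WORD);
                return t;
            }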
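For comparison, here is a compact, runnable sketch of a lexer for the same token set. It is not the slides' state machine: instead of one state per letter of "fnord", it uses three coarse states and checks for "fnord" when a word is completed. The class name FnordLexer, the state names, and the TYPE:text string form of tokens are assumptions for illustration.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class FnordLexer {
    private enum State { START, IN_WORD, IN_NUMBER }

    private final Reader in;
    private final StringBuilder acc = new StringBuilder();
    private State state = State.START;
    private boolean eof = false;

    FnordLexer(Reader in) { this.in = in; }

    // Which state a character belongs to; separators and EOF map to START.
    private static State classify(int c) {
        if (c == -1) return State.START;
        if (Character.isLetter(c)) return State.IN_WORD;
        if (Character.isDigit(c)) return State.IN_NUMBER;
        return State.START;
    }

    // Build a token from the accumulator and clear it.
    private String finish() {
        String text = acc.toString();
        acc.setLength(0);
        if (state == State.IN_NUMBER) return "TT_NUMBER:" + text;
        return text.equals("fnord") ? "TT_FNORD:fnord" : "TT_WORD:" + text;
    }

    // Returns the next token, or null when no more tokens are available.
    String nextTok() {
        while (!eof) {
            int c;
            try { c = in.read(); } catch (IOException e) { throw new UncheckedIOException(e); }
            if (c == -1) eof = true;
            State next = classify(c);
            if (next == state && state != State.START) {
                acc.append((char) c);                     // Shift
                continue;
            }
            String tok = (state == State.START) ? null : finish();
            state = next;
            if (next != State.START) acc.append((char) c); // Return keeps c; DRet drops it
            if (tok != null) return tok;
        }
        return null;
    }

    static List<String> lex(String s) {
        FnordLexer lx = new FnordLexer(new StringReader(s));
        List<String> out = new ArrayList<>();
        for (String t = lx.nextTok(); t != null; t = lx.nextTok()) out.add(t);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(lex("123kumquat 9"));
        System.out.println(lex("fnordling"));
        System.out.println(lex("fnord1ing"));
    }
}
```

The trade-off is that this version decides between TT_FNORD and TT_WORD with a string comparison at the end of the word, whereas the per-letter states in the slides make that decision inside the automaton itself. Like the slide code, it relies on Character.isLetter, which also accepts letters outside a-z and A-Z.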