CO4301 – Advanced Games Development Week 2 Introduction to Parsing Gareth Bellaby
Regular expressions
Regular expression A regular expression is a sequence of symbols and characters that defines a search pattern. Their main use is pattern matching on strings, e.g. searching for a string within a text. They are powerful: they support search and replace, matching variants of a pattern, etc.
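Regular expression The characters are literals. The symbols are meta-characters that describe what constitutes a match. For example, in the familiar wildcard pattern for file search, *.txt, the * is a meta-character standing for any sequence of characters. (Strictly, this is wildcard syntax rather than regular expression syntax: the equivalent regular expression is .*\.txt)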
regex C++11 introduced a library for regular expressions: <regex>. By default it uses the ECMAScript grammar, which is the regular expression syntax used by JavaScript (ECMAScript is the standardised specification that JavaScript implements). http://www.ecma-international.org/ecma-262/7.0/index.html http://www.cplusplus.com/reference/regex/ECMAScript/
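regex You can also set flags to follow some other common grammars: basic, extended, awk, grep and egrep. Set the flag when you define the expression; other options, such as icase for case-insensitive matching, are combined with the grammar flag using |. For a broad overview see: https://msdn.microsoft.com/en-us/library/bb982382.aspx std::regex ex("\\bword$", std::regex::ECMAScript | std::regex::icase);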
regex Define a regular expression by constructing a regex object from a pattern string, e.g. regex word("[[:alpha:]]+"); which matches one or more alphabetic characters.
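regex Test a string against a pattern with regex_match, which succeeds only if the pattern matches the whole string (assumes #include <regex>, <string>, <iostream> and using namespace std):
    regex expression("dog");
    string str = "dog";
    if (regex_match(str, expression)) {
        cout << "match" << endl;     // printed: "dog" matches "dog"
    }
    str = "doggy";                   // "dog" now fails: it matches only part of "doggy"
    expression = "(dog)(.*)";        // "dog" followed by any characters: matches "doggy"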
regex Two other functions in the library: regex_search searches for a matching sequence anywhere within a string; regex_replace replaces matched sequences.
Parsing an arithmetic operation
Parsing Pronounced with a ‘z’: “parzing”. You would parse a sentence, or you would parse an arithmetic expression. Parsing is the process of breaking a sentence or expression into its constituent parts while determining the relationships between those parts, so that the sentence or expression can be analysed. The tool that does this is called a parser. The result of parsing is called a parse.
Applications Parsing is an important topic within computing. Natural language parsers break a sentence down into its parts of speech, e.g. noun, verb, noun phrase. Compilers parse program code. eXtensible Markup Language (XML) is a markup language that defines a set of rules for encoding documents, and XML documents are read by parsers.
Arithmetic expressions The purpose of parsing is to convert input into a form that can be processed or analysed. In some cases the analysis can also be automated. Parsing arithmetic expressions is straightforward because they follow a limited set of clear rules. For arithmetic expressions, the goal is to be able to enter an expression and then evaluate it.
Expression types Operator: an operator is a function that operates on operands, e.g. + - / *. Operand: an operand is the number, the thing the operator operates upon. In 3 + 4, + is the operator and 3 and 4 are the operands. There are three types of arithmetic expression: infix, prefix and postfix.
Expression types The prefix of each term (in-, pre-, post-) refers to the position of the operator. Infix is where the operator appears between the operands: 4 + 5. Prefix is where the operator appears before the operands: + 4 5. Prefix is known as "Polish notation". It avoids the need for parentheses and is used in a type of Lisp expression and in certain calculators.
Expression types Postfix is where the operator appears after the operands: 4 5 +. Postfix is known as "Reverse Polish notation" (RPN). RPN was used in a number of calculators for a while, because the notation is straightforward to implement using a stack, and a stack was easy to implement on the electronic architecture of the time.
Parsing infix notation Standard mathematics is written using infix notation. To parse and evaluate it, use a tree structure: each operand becomes a leaf node and each operator becomes a branching node. The lowest-priority operators appear higher in the tree. Evaluate by walking the tree from left to right, evaluating both subtrees of an operator node before applying that operator (a post-order traversal).
Parsing postfix notation I want to start with the parsing and evaluation of postfix notation. Postfix avoids the need for parentheses, has no ambiguities with precedence, and is more straightforward than infix. Prefix is similar to postfix, so it is a bit redundant to look at both.
Examples
Postfix      Infix
a b +        a + b
a b /        a / b
a b -        a - b
4 2 / 3 + evaluates to 5: 4 2 / gives 4 / 2 = 2; put the result back, giving 2 3 +, and 2 + 3 = 5.
Examples
6 3 2 4 + - *
  2 + 4 = 6      →  6 3 6 - *
  3 - 6 = -3     →  6 -3 *
  6 * -3 = -18   →  -18
3 6 + 4 * 2 /
  3 + 6 = 9      →  9 4 * 2 /
  9 * 4 = 36     →  36 2 /
  36 / 2 = 18    →  18
Using a stack The arithmetic expression is read from left to right, and a stack is used to process each separate operation. Operands (numbers) are pushed onto the stack. Each operator pops its operands from the stack and performs its arithmetic operation on them. The result of the operation is pushed back onto the stack.
Algorithm Parse from left to right. While there is input: If an operand is read, push it on to the stack. If an operator is read, pop the top two operands from the stack (the first value popped is the right-hand operand and the second is the left-hand operand, which matters for - and /). Perform the given operation on them. Push the result on to the stack. When there is no more input, pop the result from the stack. Note: no error conditions!
Example 1
Infix: 2 * 4 + 1 + 3 * 5
Postfix: 2 4 * 1 + 3 5 * +
The answer is 24 (8 + 1 + 15).
Examples
Infix: ( a * b ) - c          Postfix: a b * c -
Infix: ( a * b ) / c          Postfix: a b * c /
Infix: a - ( b + c ) * d      Postfix: a b c + d * -
Examples
Expression: 2 3 + 1 4 - *
First operator:  (2 3 +) 1 4 - *   →  5 1 4 - *
Next operator:   5 (1 4 -) *       →  5 -3 *
Next operator:   (5 -3 *)          →  -15
Examples
3 2 + 6 4 - *
  3 + 2 = 5      →  5 6 4 - *
  6 - 4 = 2      →  5 2 *
  5 * 2 = 10     →  10
6 4 3 3 1 - + * +
  3 - 1 = 2      →  6 4 3 2 + * +
  3 + 2 = 5      →  6 4 5 * +
  4 * 5 = 20     →  6 20 +
  6 + 20 = 26    →  26
Examples
6 4 + 3 3 1 - * +
  6 + 4 = 10     →  10 3 3 1 - * +
  3 - 1 = 2      →  10 3 2 * +
  3 * 2 = 6      →  10 6 +
  10 + 6 = 16    →  16