CSE 340 Recitation, Week 3: Sept 1st – 7th
Regular Expressions
  Questions
  Project 2
  Regular Expressions
    Valid Syntax
    Using REs to Define a Language
    Evaluating whether “????” is in a Language
    RE v. Math
    Application of REs -- Tokens
Questions?
  Current Project
  Homework
  Lecture
  Other?
Project 2
Project 2 is due Friday, 9/9/16, before 11:59pm
Part 1 (30 points): debug a buggy program
  Common mistakes
  Extract with: $ tar -xzvf <filename>
  The included test script, test.sh, runs the 6 test cases
  Use the diff command to create the patch: $ diff -uwB buggy_program.c good_program.c > fix_bugs.patch
Project 2 – Part 2: Lexer + Linked List
Use getToken() to read input from STDIN
getToken() returns a token_type enum value: ID, NUM, IF, WHILE, DO, THEN, PRINT
Global variables set by getToken():
  T_type – same value as returned by getToken()
  current_token – contains the token value, or is blank
  token_length – length of the string stored in current_token
  line – the line number of current_token
Project 2 – Part 2: Lexer + Linked List
ID and NUM tokens are stored in a linked list
Each node needs to store the token type, token value, and line number
The output must be printed in reverse (one option is a doubly linked list)
[Figure: three linked nodes, each holding Token Type, Value, Line #, and Next, connected by Next and Previous pointers]
Project 2 – Part 2: Lexer + Linked List
Must use a linked list data structure
The reversed output must be produced from the list; you will not receive full credit if you write the output to a string and print that string in reverse
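One way to meet the two requirements above is a doubly linked list walked from the tail. The following is a minimal sketch, not the required design: the names node_kind, token_node, token_list, list_append, list_print_reverse, and list_free are hypothetical, and the printed format is an assumption (the project document defines the real one).

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the two token types that get stored. */
typedef enum { KIND_ID, KIND_NUM } node_kind;

/* C-style node: a struct with self-referential fields. */
typedef struct token_node {
    node_kind kind;
    char *value;               /* heap copy of the token text */
    int line;
    struct token_node *next;
    struct token_node *prev;
} token_node;

typedef struct {
    token_node *head;
    token_node *tail;
} token_list;

/* Appends a copy of the token; returns 0 on success, -1 if malloc fails. */
int list_append(token_list *list, node_kind kind, const char *value, int line) {
    assert(list != NULL && value != NULL);
    token_node *node = malloc(sizeof *node);
    if (node == NULL) return -1;
    size_t len = strlen(value);
    node->value = malloc(len + 1);
    if (node->value == NULL) { free(node); return -1; }
    memcpy(node->value, value, len + 1);
    node->kind = kind;
    node->line = line;
    node->next = NULL;
    node->prev = list->tail;
    if (list->tail != NULL) list->tail->next = node;
    else list->head = node;
    list->tail = node;
    return 0;
}

/* Walks from the tail back to the head, so the last token prints first.
   The "line kind value" layout is an assumed format. */
void list_print_reverse(const token_list *list, FILE *out) {
    assert(list != NULL && out != NULL);
    for (const token_node *n = list->tail; n != NULL; n = n->prev)
        fprintf(out, "%d %s %s\n", n->line,
                n->kind == KIND_ID ? "ID" : "NUM", n->value);
}

/* Releases every node and resets the list to empty. */
void list_free(token_list *list) {
    assert(list != NULL);
    token_node *n = list->head;
    while (n != NULL) {
        token_node *next = n->next;
        free(n->value);
        free(n);
        n = next;
    }
    list->head = list->tail = NULL;
}
```

A caller would start from `token_list list = { NULL, NULL };`, call list_append for each ID or NUM token, then call `list_print_reverse(&list, stdout)` and `list_free(&list)`. A singly linked list that inserts at the head would also yield reverse order; the doubly linked form is shown only because the slide suggests it.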
Project 2 – Part 2: Lexer + Linked List
[Figure: sample run with panels labeled Standard Input and Standard Output]
Project 2 – Part 2: Lexer + Linked List
Testing
  A test script that runs multiple test cases is provided
  Details on the test scripts are in the project document
Evaluation
  Graded on whether the test cases pass
  Must use a C-style linked list (a struct with a self-referential field)
  CANNOT USE the STL
Regular Expressions
  Valid Syntax for REs
  Definition of Languages using REs
  Is “????” ∈ Language (i.e., L(RE) or {})
  RE v. Math
Valid Syntax for REs
A syntactically valid regular expression has one of these forms:
  ∅
  𝜺
  a, where a is an element of the alphabet
  R1 | R2, where R1 and R2 are regular expressions
  R1 . R2, where R1 and R2 are regular expressions
  (R), where R is a regular expression
  R*, where R is a regular expression
Valid Syntax for REs
Given: 𝛴 = {a, b, c, d, 1, 2, 3, 4}
Are these valid REs?
  Z = a | b | c
  Y = (1 | 2 | 3)*
  X = (Z | 𝜺)
  W = ((1.2 | 2.3 ) 4*)
  V = X.Z.Y.X.Y
Valid Syntax for REs
Given: 𝛴 = {a, b, c, d, 1, 2, 3, 4}
Are these valid REs?
  Z = a | b | c
  Y = (1 | 2 | 3)*
  X = (Z | 𝜺)
  W = ((1.2 | 2.3 ).4*)
  V = X.Z.Y.X.Y
Written with an explicit . before 4*, W uses only the forms allowed for valid REs. Juxtaposition without an operator, as in ( … ) 4*, is not one of those forms.
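The syntax rules can be checked mechanically. Below is a minimal sketch of such a checker for fully expanded expressions over 𝛴 = {a, b, c, d, 1, 2, 3, 4}. Several things here are assumptions made for illustration: the ASCII stand-ins '~' for 𝜺 and '#' for ∅, the choice to ignore spaces, and the function name is_valid_re. Named abbreviations such as Z or X would have to be expanded before calling it.

```c
#include <assert.h>
#include <string.h>

/* Alphabet from the slide; '~' and '#' below are assumed ASCII stand-ins
   for the epsilon and empty-set regular expressions. */
static const char *SIGMA = "abcd1234";

static void skip_spaces(const char **p) {
    while (**p == ' ') (*p)++;
}

static int parse_expr(const char **p);

/* factor := ( symbol | '~' | '#' | '(' expr ')' ) followed by zero or more '*' */
static int parse_factor(const char **p) {
    skip_spaces(p);
    if (**p == '(') {
        (*p)++;
        if (!parse_expr(p)) return 0;
        skip_spaces(p);
        if (**p != ')') return 0;
        (*p)++;
    } else if (**p != '\0' &&
               (strchr(SIGMA, **p) != NULL || **p == '~' || **p == '#')) {
        (*p)++;
    } else {
        return 0;
    }
    skip_spaces(p);
    while (**p == '*') {
        (*p)++;
        skip_spaces(p);
    }
    return 1;
}

/* expr := factor ( ('.' | '|') factor )*
   Only validity is checked, so precedence between '.' and '|' is not needed. */
static int parse_expr(const char **p) {
    if (!parse_factor(p)) return 0;
    while (**p == '.' || **p == '|') {
        (*p)++;
        if (!parse_factor(p)) return 0;
    }
    return 1;
}

/* Returns 1 if text is built only from the listed forms, 0 otherwise. */
int is_valid_re(const char *text) {
    assert(text != NULL);
    const char *p = text;
    return parse_expr(&p) && *p == '\0';
}
```

With this sketch, `is_valid_re("((1.2 | 2.3 ).4*)")` returns 1, while the same text without the dot before `4*` returns 0, because two factors may only be joined by `.` or `|`.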
Definition of Languages using REs
What is 𝛴? The alphabet: a finite set of symbols.
What is 𝛴*? The set of all finite strings over 𝛴, including the empty string.
Given: 𝛴 = {a, b, c, d, 1, 2, 3}
Is “dddddddddddddddcccccccccccccccaaaaaaaaaaaaaaaa11111111aaaaaaaaaaaaaa333333333333bbbbbbbbbbbbaaaaaaaa3333333” ∈ 𝛴*?
YES: every character in the string is an element of 𝛴.
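The membership question for 𝛴* reduces to checking each character against the alphabet. A minimal sketch in C, where the function name in_sigma_star and the encoding of 𝛴 as a string of its symbols are assumptions for illustration:

```c
#include <assert.h>
#include <string.h>

/* Returns 1 if every character of s belongs to sigma (so s is in sigma*),
   and 0 otherwise. The empty string is in sigma* for any sigma. */
int in_sigma_star(const char *s, const char *sigma) {
    assert(s != NULL && sigma != NULL);
    /* strspn returns the length of the longest prefix of s that consists
       only of characters from sigma. */
    return strspn(s, sigma) == strlen(s);
}
```

For the alphabet above, `in_sigma_star("dca1a3ba3", "abcd123")` returns 1, and a string containing `8` returns 0.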
Definition of Languages using REs
What is a Language?
  A language is a subset of 𝛴*, i.e., L ⊆ 𝛴*
How do we describe that subset?
  Using REs
Definition of Languages using REs
𝛴 = {a, b, c, d, 1, 2, 3, 4}
Z = a | b | c
Y = (1 | 2 | 3)*
X = (Z | 𝜺)
W = (2.3.4*) | 𝜺
V = X.Y.Z.W.X.Y
L(V) = {…}
Definition of Languages using REs
L(V) = {…}
This uses the regular expression V to define a subset of 𝛴*
Thus, L(V) ⊆ 𝛴*
Definition of Languages using REs
What are some examples of strings that are in L(V)?
  L(V) = {a1a234a3, 1aa3, …}
Is “a1a234a3” ∈ 𝛴*?
  YES
Is “????” ∈ Language (i.e., L(RE) or {})
Given:
  𝛴 = {a, b, c, d, 1, 2, 3, 4}
  Z = a | b | c
  Y = (1 | 2 | 3)*
  X = (Z | 𝜺)
  W = (2.3.4*) | 𝜺
  V = X.Y.Z.W.X.Y
Does L(V) contain (√ = yes, X = no):
  a  √
  b123  √
  ab123ab123  X
  ab123c8  X
  a12344321  X
  312d333  X
For each string marked X: why is it not in L(V)?
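The membership questions above can also be checked by a small hand-written matcher. This is a sketch specialized to V = X.Y.Z.W.X.Y, not a general regular expression engine, and the function names are hypothetical. Only the leading X needs two attempts, because it competes with the mandatory Z for the same letter; every later part can be matched greedily, since what follows it starts with different symbols or absorbs the same text.

```c
#include <assert.h>
#include <string.h>

/* 1 if c is a real character found in set. The '\0' check matters because
   strchr also matches the string terminator. */
static int in_set(char c, const char *set) {
    return c != '\0' && strchr(set, c) != NULL;
}

/* Matches Y.Z.W.X.Y starting at s[i]; the string must end exactly there. */
static int match_after_first_x(const char *s, size_t i) {
    while (in_set(s[i], "123")) i++;        /* Y = (1|2|3)* */
    if (!in_set(s[i], "abc")) return 0;     /* Z = a|b|c is mandatory */
    i++;
    if (s[i] == '2' && s[i + 1] == '3') {   /* W = (2.3.4*) | epsilon */
        i += 2;
        while (s[i] == '4') i++;
    }
    if (in_set(s[i], "abc")) i++;           /* X = (Z | epsilon) */
    while (in_set(s[i], "123")) i++;        /* Y = (1|2|3)* */
    return s[i] == '\0';
}

/* Returns 1 if s is in L(V), 0 otherwise. */
int in_L_V(const char *s) {
    assert(s != NULL);
    /* The leading X may consume one letter or be epsilon; try both. */
    if (in_set(s[0], "abc") && match_after_first_x(s, 1)) return 1;
    return match_after_first_x(s, 0);
}
```

Running it on the six strings accepts `a` and `b123` and rejects the rest: `ab123ab123` has letters left over after the final Y, `ab123c8` has a third letter after the digits and contains `8`, which is not in 𝛴, `a12344321` has a `4` that is not directly preceded by `23` right after Z, and `312d333` has `d` where Z requires a, b, or c.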
Is “????” ∈ Language (i.e., L(RE) or {})
What must every string in L(V) contain?
  At least one of a, b, or c
Why?
  Because of the RE Z: it appears in V = X.Y.Z.W.X.Y with no * and no 𝜺 alternative, so it always contributes one letter
EPSILON
𝛴 = {a, b, c, d, 𝜺, |, .}
  Is this a valid set?
Z = a.b.c.𝜺.𝜺
  Is this a valid regular expression?
Is 𝜺 the character from the alphabet or the RE representation of the empty string?
EPSILON
Usually we will define the alphabet like this, for clarity:
  𝛴 = {a, b, c, d, \𝜺, \|, \.}
Is this a valid regular expression?
  Z = a.b.c.\𝜺.𝜺
Is “abc𝜺” ∈ L(Z)?
EPSILON
𝛴 = {a, b, c, d}
Is this a valid regular expression?
  Z = a.b.c.(a|𝜺).b.c
What strings are in L(Z)?
Is “abc𝜺” ∈ L(Z)?
EPSILON
𝛴 = {a, b, c, d}
Is this a valid regular expression?
  Z = a.b.c.(a|𝜺).b.c
What is in the language?
  L(Z) = {abcabc, abcbc}
Are these strings in L(Z)?
  “abcbc” ∈ L(Z)?  √
  “abc𝜺bc” ∈ L(Z)?  X
WHY?
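One way to see the answer: in the RE, 𝜺 denotes the empty string, so choosing it contributes no characters at all, whereas a string that literally contains the symbol 𝜺 contains a character outside 𝛴. A minimal sketch of a matcher for this particular Z, with the hypothetical name in_L_Z:

```c
#include <assert.h>
#include <string.h>

/* Returns 1 if s is in L(Z) for Z = a.b.c.(a|epsilon).b.c, 0 otherwise. */
int in_L_Z(const char *s) {
    assert(s != NULL);
    if (strncmp(s, "abc", 3) != 0) return 0;   /* a.b.c */
    s += 3;
    /* (a|epsilon) chose a: exactly "abc" must remain. */
    if (*s == 'a' && strcmp(s + 1, "bc") == 0) return 1;
    /* (a|epsilon) chose epsilon: nothing is consumed, "bc" must remain. */
    return strcmp(s, "bc") == 0;
}
```

Here `in_L_Z("abcabc")` and `in_L_Z("abcbc")` return 1, while a string with any extra character between `abc` and `bc`, including a literal 𝜺, returns 0.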
RE v. Math
A regular expression defines a subset of 𝛴*
  L(∅) = ∅
  L(𝜺) = {𝜺}
  L(a) = {a}
  L(R1 | R2) = L(R1) ∪ L(R2)
  L(R1 . R2) = L(R1) . L(R2)
  L((R)) = L(R)
  L(R*) = ∪_{i≥0} L^i(R)
RE v. Math
Operator precedence, just like math (highest first):
  Math   RE
  ()     ()
  ^      *
  .      .
  +      |    (| is similar to +)
RE v. Math
L(R1 . R2) = L(R1) . L(R2)
For two sets A and B of strings:
  A . B = {xy : x ∈ A and y ∈ B}
RE v. Math
Given:
  𝛴 = {a, b, c, d, 1, 2, 3}
  Z = a | b | c
  X = (Z | 𝜺)
  A . B = {xy : x ∈ A and y ∈ B}
Example:
  L(X.Z) = L(X) . L(Z)
         = L(Z | 𝜺) . L(a | b | c)
         = (L(Z) ∪ L(𝜺)) . (L(a) ∪ L(b) ∪ L(c))
         = (L(a) ∪ L(b) ∪ L(c) ∪ L(𝜺)) . (L(a) ∪ L(b) ∪ L(c))
         = ({a} ∪ {b} ∪ {c} ∪ {𝜺}) . ({a} ∪ {b} ∪ {c})
         = {a, b, c, 𝜺} . {a, b, c}
         = {aa, ab, ac, ba, bb, bc, ca, cb, cc, a, b, c}
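The last step of the example can be reproduced with a direct implementation of A . B. This is a minimal sketch under stated assumptions: sets are plain arrays of C strings, 𝜺 is represented by the empty string "", strings are limited to MAX_STR bytes, and the names concat_sets and contains are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_STR 32  /* longest string, including '\0', this sketch can hold */

/* Returns 1 if s is already among the first n entries of set. */
static int contains(char set[][MAX_STR], size_t n, const char *s) {
    for (size_t i = 0; i < n; i++)
        if (strcmp(set[i], s) == 0) return 1;
    return 0;
}

/* Writes A . B = { xy : x in A and y in B } into out and returns how many
   distinct strings were stored. Results that are too long, or that do not
   fit within cap entries, are skipped. */
size_t concat_sets(const char *const a[], size_t na,
                   const char *const b[], size_t nb,
                   char out[][MAX_STR], size_t cap) {
    assert(a != NULL && b != NULL && out != NULL);
    size_t n = 0;
    for (size_t i = 0; i < na; i++) {
        for (size_t j = 0; j < nb; j++) {
            char xy[MAX_STR];
            int len = snprintf(xy, sizeof xy, "%s%s", a[i], b[j]);
            if (len < 0 || (size_t)len >= sizeof xy) continue;  /* too long */
            if (n < cap && !contains(out, n, xy))
                strcpy(out[n++], xy);
        }
    }
    return n;
}
```

Calling it with A = {"a", "b", "c", ""} and B = {"a", "b", "c"} stores 12 strings, in the same order as the last line of the example: the nine two-letter strings first, then a, b, c from the pairs whose left part is 𝜺.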
RE v. Math
L(R*) = ∪_{i≥0} L^i(R), where L^0(R) = {𝜺}
Definition: L^i(R) = L^{i-1}(R) . L(R) for i ≥ 1
So L(R*) = {𝜺} ∪ ∪_{i≥1} (L^{i-1}(R) . L(R))
RE v. Math
L(R*) = ∪_{i≥0} L^i(R), where L^0(R) = {𝜺}
L(R*) = {𝜺} ∪ L(R) ∪ L(R).L(R) ∪ L(R).L(R).L(R) ∪ …
RE v. Math
Given:
  𝛴 = {a, b, c, d, 1, 2, 3}
  Y = (1 | 2 | 3)*
Example: L(Y)
  L((1 | 2 | 3)*) = {𝜺} ∪ L(1|2|3) ∪ L(1|2|3).L(1|2|3) ∪ L(1|2|3).L(1|2|3).L(1|2|3) ∪ …
  L(1|2|3) = L(1) ∪ L(2) ∪ L(3) = {1} ∪ {2} ∪ {3} = {1, 2, 3}
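Carrying the example a step further, the first few terms of the union can be written out. This is a worked sketch using the L^i notation from the definition of L(R*); the closing remark about sizes is an added observation, not taken from the slides.

```latex
\begin{align*}
L^0(1|2|3) &= \{\varepsilon\} \\
L^1(1|2|3) &= \{1, 2, 3\} \\
L^2(1|2|3) &= \{1, 2, 3\} \cdot \{1, 2, 3\}
            = \{11, 12, 13, 21, 22, 23, 31, 32, 33\} \\
L(Y) &= \bigcup_{i \ge 0} L^i(1|2|3)
      = \{\varepsilon, 1, 2, 3, 11, 12, 13, 21, \dots\}
\end{align*}
```

Each L^i(1|2|3) contains the 3^i strings of length i over {1, 2, 3}, so L(Y) is the set of all finite strings made only of 1, 2, and 3, including the empty string.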