AND FINITE AUTOMATA… Ruby Regular Expressions. Why Learn Regular Expressions? RegEx are part of many programmer’s tools  vi, grep, PHP, Perl They provide.

Slides:



Advertisements
Similar presentations
Regular Expressions using Ruby Assignment: Midterm Class: CPSC5135U – Programming Languages Teacher: Dr. Woolbright Student: James Bowman.
Advertisements

Regular Expression Original Notes by Song Guo. What Regular Expressions Are Exactly - Terminology a regular expression is a pattern describing a certain.
Finite Automata and Regular Expressions i206 Fall 2010 John Chuang Some slides adapted from Marti Hearst.
LING/C SC/PSYC 438/538 Computational Linguistics Sandiway Fong Lecture 2: 8/23.
LING 388: Language and Computers Sandiway Fong Lecture 2: 8/23.
Using regular expressions Search for a single occurrence of a specific string. Search for all occurrences of a string. Approximate string matching.
Regular expressions Mastering Regular Expressions by Jeffrey E. F. Friedl Linux editors and commands (e.g.
1 Foundations of Software Design Lecture 22: Regular Expressions and Finite Automata Marti Hearst Fall 2002.
Regular Expressions and Automata Chapter 2. Regular Expressions Standard notation for characterizing text sequences Used in all kinds of text processing.
Regular expression. Validation need a hard and very complex programming. Sometimes it looks easy but actually it is not. So there is a lot of time and.
Scripting Languages Chapter 8 More About Regular Expressions.
CPSC 388 – Compiler Design and Construction
slides created by Marty Stepp
Regex Wildcards on steroids. Regular Expressions You’ve likely used the wildcard in windows search or coding (*), regular expressions take this to the.
$address =~ m/(\d*.*)\n(.*?, ([A-Z]{2}) (\d{5})-?(\d{0,5})/ $address =~ m/(\d*.*)\n(.*?, ([A-Z]{2}) (\d{5})-?(\d{0,5})/
Lecture 7: Perl pattern handling features. Pattern Matching Recall =~ is the pattern matching operator A first simple match example print “An methionine.
Language Recognizer Connecting Type 3 languages and Finite State Automata Copyright © – Curt Hill.
Regular Expression Darby Tien-Hao Chang (a.k.a. dirty) Department of Electrical Engineering, National Cheng Kung University.
 Text Manipulation and Data Collection. General Programming Practice Find a string within a text Find a string ‘man’ from a ‘A successful man’
CPSC 388 – Compiler Design and Construction Scanners – Finite State Automata.
AND OTHER LANGUAGES… Ruby Regular Expressions. Why Learn Regular Expressions? RegEx are part of many programmer’s tools  vi, grep, PHP, Perl They provide.
Introduction to Computing Using Python Regular expressions Suppose we need to find all addresses in a web page How do we recognize addresses?
어휘분석 (Lexical Analysis). Overview Main task: to read input characters and group them into “ tokens. ” Secondary tasks: –Skip comments and whitespace;
Regular Expressions Regular expressions are a language for string patterns. RegEx is integral to many programming languages:  Perl  Python  Javascript.
Perl and Regular Expressions Regular Expressions are available as part of the programming languages Java, JScript, Visual Basic and VBScript, JavaScript,
Automating Construction of Lexers. Example in javacc TOKEN: { ( | | "_")* > | ( )* > | } SKIP: { " " | "\n" | "\t" } --> get automatically generated code.
BTANT129 w61 Regular expressions step by step Tamás Váradi
4b 4b Lexical analysis Finite Automata. Finite Automata (FA) FA also called Finite State Machine (FSM) –Abstract model of a computing entity. –Decides.
SCRIBE SUBMISSION GROUP 8 Date: 7/8/2013 By – IKHAR SUSHRUT MEGHSHYAM 11CS10017 Lexical Analyser Constructing Tokens State-Transition Diagram S-T Diagrams.
1 CSC 594 Topics in AI – Text Mining and Analytics Fall 2015/16 4. Document Search and Regular Expressions.
Regular Expression in Java 101 COMP204 Source: Sun tutorial, …
COMP3190: Principle of Programming Languages DFA and its equivalent, scanner.
Kirkwood Center for Continuing Education Introduction to PHP and MySQL By Fred McClurg, Copyright © 2015, Fred McClurg, All Rights.
BY Sandeep Kumar Gampa.. What is Regular Expression? Regex in.NET Regex Language Elements Examples Regular Expression API How to Test regex in.NET Conclusion.
Regular Expression - Intro Patterns that define a set of strings (or, pieces of a string) Not wildcards (similar notion, but different thing) Used by utilities.
VBScript Session 13.
Overview A regular expression defines a search pattern for strings. Regular expressions can be used to search, edit and manipulate text. The pattern defined.
Regular Expressions Regular Expressions. Regular Expressions  Regular expressions are a powerful string manipulation tool  All modern languages have.
1 Course Overview PART I: overview material 1Introduction 2Language processors (tombstone diagrams, bootstrapping) 3Architecture of a compiler PART II:
C# Strings 1 C# Regular Expressions CNS 3260 C#.NET Software Development.
Kirkwood Center for Continuing Education Introduction to PHP and MySQL By Fred McClurg, Copyright © 2010 All Rights Reserved. 1.
Regular Expressions. Overview Regular expressions allow you to do complex searches within text documents. Examples: Search 8-K filings for restatements.
Regular Expressions for PHP Adding magic to your programming. Geoffrey Dunn
Regular Expressions in Perl CS/BIO 271 – Introduction to Bioinformatics.
12. Regular Expressions. 2 Motto: I don't play accurately-any one can play accurately- but I play with wonderful expression. As far as the piano is concerned,
Regular Expressions Chapter 6 1. Regular Languages Regular Language Regular Expression Finite State Machine L Accepts 2.
CSC 4630 Meeting 21 April 4, Return to Perl Where are we? What is confusing? What practice do you need?
GREP. Whats Grep? Grep is a popular unix program that supports a special programming language for doing regular expressions The grammar in use for software.
Overview of Previous Lesson(s) Over View  Symbol tables are data structures that are used by compilers to hold information about source-program constructs.
R EGULAR E XPRESSION IN P ERL (P ART 1) Thach Nguyen.
CSC 2720 Building Web Applications PHP PERL-Compatible Regular Expressions.
Copyright © Curt Hill Regular Expressions Providing a Search Pattern.
LING/C SC/PSYC 438/538 Lecture 8 Sandiway Fong. Adminstrivia Homework 4 not yet graded …
CIT 383: Administrative ScriptingSlide #1 CIT 383: Administrative Scripting Regular Expressions.
CMSC 330: Organization of Programming Languages Theory of Regular Expressions Finite Automata.
Standard Types and Regular Expressions CS 480/680 – Comparative Languages.
NOTE: To change the image on this slide, select the picture and delete it. Then click the Pictures icon in the placeholder to insert your own image. ADVANCED.
Regular Expressions /^Hel{2}o\s*World\n$/ SoftUni Team Technical Trainers Software University
An Introduction to Regular Expressions Specifying a Pattern that a String must meet.
Pattern Matching: Simple Patterns. Introduction Programmers often need to scan a file, directory, etc. for a specific substring. –Find all files that.
CMSC330 More Ruby. Last lecture Scripting languages Ruby language –Implicit variable declarations –Many control statements –Classes & objects –Strings.
OOP Tirgul 11. What We’ll Be Seeing Today  Regular Expressions Basics  Doing it in Java  Advanced Regular Expressions  Summary 2.
COMP3190: Principle of Programming Languages DFA and its equivalent, scanner.
Looking for Patterns - Finding them with Regular Expressions
Ruby Regular Expressions
Chapter 2 Scanning – Part 1 June 10, 2018 Prof. Abdelaziz Khamis.
Regular Expressions
CIT 383: Administrative Scripting
Regular Expression: Pattern Matching
Presentation transcript:

AND FINITE AUTOMATA… Ruby Regular Expressions

Why Learn Regular Expressions? RegEx are part of many programmer’s tools  vi, grep, PHP, Perl They provide powerful search (via pattern matching) capabilities Simple regex are easy, but more advanced patterns can be created as needed From:

Finite Automata Formally a finite automata is a five-tuple(S,  ,  , s 0, S F ) where S is the set of states, including error state S e. S must be finite.  is the alphabet or character set used by recognizer. Typically union of edge labels (transitions between states).  (s,c) is a function that encodes transitions (i.e., character c in   changes to state s in S. ) s 0 is the designated start state S F is the set of final states, drawn with double circle in transition diagram

Simple Example Finite automata to recognize fee and fie: S = {s 0, s 1, s 2, s 3, s 4, s 5, s e }  = {f, e, i}  (s,c) set of transitions shown above s 0 = s 0 S F = { s 3, s 5 } Set of words accepted by a finite automata F forms a language L(F). Can also be described by regular expressions. S0S0 S4S4 S1S1 f S3S3 S5S5 S2S2 e i e e What type of program might need to recognize fee/fie/etc.?

Finite Automata & Regular Expressions /fee|fie/ /f[ei]e/ S0S0 S4S4 S1S1 f S3S3 S5S5 S2S2 e i e e

Another Example: Pascal Identifier /[A-Za-z][A-Za-z0-9]*/ S0S0 S1S1 A-Za-z A-Za-z0-9

Quick Exercise Create an FSA to recognize dates

Regular Expressions in Ruby Closely follows syntax of Perl 5 Need to understand  Regexp patterns – how to create a regular expression  Pattern matching – how to use =~  Regexp objects  how to work with regexp in Ruby  Match data and named captures are useful Handy resource: rubular.com

Defining a Regular Expression Constructed as  /pattern/  /pattern/options  %r{pattern}  %r{pattern}options  Regexp.new/Regex.compile/Regex.union Options provide additional info about how pattern match should be done, for example:  i – ignore case  m – multiline, newline is an ordinary character to match  u,e,s,n – specifies encoding, such as UTF-8 (u) From:

Literal characters /ruby/ /ruby/i s = "ruby is cool" if s =~ /ruby/ puts "found ruby" end puts s =~ /ruby/ if s =~ /Ruby/i puts "found ruby - case insensitive" end

Character classes /[0-9]/ match digit /[^0-9]/ match any non-digit /[aeiou]/ match vowel /[Rr]uby/ match Ruby or ruby

Special character classes /./ #match any character except newline /./m # match any character, multiline /\d/ # matches digit, equivalent to [0-9] /\D/ #match non-digit, equivalent to [^0-9] /\s/ #match whitespace /[ \r\t\n\f]/ \f is form feed /\S/ # non-whitespace /\w/ # match single word chars /[A-Za-z0-9_]/ /\W/ # non-word characters NOTE: must escape any special characters used to create patterns, such as. \ + etc.

Repetition + matches one or more occurrences of preceding expression  e.g., /[0-9]+/ matches “1” “11” or “1234” but not empty string ? matches zero or one occurrence of preceding expression  e.g., /-?[0-9]+/ matches signed number with optional leading minus sign * matches zero or more copies of preceding expression  e.g., /yes[!]*/ matches “yes” “yes!” “yes!!” etc.

More Repetition /\d{3}/ # matches 3 digits /\d{3,}/ # matches 3 or more digits /\d{3,5}/ # matches 3, 4 or 5 digits

Non-greedy Repetition Assume s = perl> / / # greedy repetition, matches perl> / / # non-greedy, matches Where might you want to use non-greedy repetition?

Grouping /\D\d+/ # matches a1111 /(\D\d)+/ # matches a1b2a3… /([Rr]uby(, )?)+/ Would this recognize:  “Ruby”  “Ruby ruby”  “Ruby and ruby”  “RUBY”

Alternatives /cow|pig|sheep/ # match cow or pig or sheep

Anchors – location of exp /^Ruby/ # Ruby at start of line /Ruby$/ # Ruby at end of line /\ARuby/ # Ruby at start of line /Ruby\Z/ # Ruby at end of line /\bRuby\b/ # Matches Ruby at word boundary Using \A and \Z are preferred

Pattern Matching =~ is pattern match operator string =~ pattern OR pattern =~ string Returns the index of the first match or nil puts "value: 30" =~ /\d+/ => 7 (location of 30) # nil doesn’t show when printing, but try: found = "value: abc" =~ /\d+/ if (found == nil) puts "not found" end

Regexp class Can create regular expressions using Regexp.new or Regexp.compile (synonymous) ruby_pattern = Regexp.new("ruby", Regexp::IGNORECASE) puts ruby_pattern.match("I love Ruby!") => Ruby puts ruby_pattern =~ "I love Ruby!“ => 7

Regexp Union Creates patterns that match any word in a list lang_pattern = Regexp.union("Ruby", "Perl", /Java(Script)?/) puts lang_pattern.match("I know JavaScript") => JavaScript Automatically escapes as needed pattern = Regexp.union("()","[]","{}")

MatchData After a successful match, a MatchData object is created. Accessed as $~. Example:  "I love petting cats and dogs" =~ /cats/  puts "full string: #{$~.string}"  puts "match: #{$~.to_s}"  puts "pre: #{$~.pre_match}"  puts "post: #{$~.post_match}"

Named Captures str = "Ruby 1.9" if /(? \w+) (? \d+\.(\d+)+)/ =~ str puts lang puts ver end Read more: expressions-with-ruby-1-9-named-captures/ expressions-with-ruby-1-9-named-captures/ (look for Capturing)

Creating a Regular Expression Complex regular expressions can be difficult Finite automata are equivalent to regular expressions (language theory)

Quick Exercise Create regex for the following. Use rubular.com to check it out. Phone numbers  (303)   Date  nn-nn-nn  Try some other options

Some Resources regular-expressions-in-ruby-part-1-of-3/ regular-expressions-in-ruby-part-1-of-3/ ial-guide-to-regular-expressions-tools-tutorials-and- resources/ ial-guide-to-regular-expressions-tools-tutorials-and- resources/ between-a-z-and-in-ruby-regular-expressions (thanks, Austin and Santi) between-a-z-and-in-ruby-regular-expressions

Topic Exploration not-use-regular-expressions not-use-regular-expressions advanced-regular-expressions/ advanced-regular-expressions/ from-strings from-strings A little more motivation to use… expressions expressions Submit on BB (3 points) and report back: 3-5 things you want to remember about regex. Include the URL. Feel free to read others not in the list. This is an individual exercise.