Mechanics Copyright © Software Carpentry 2010 This work is licensed under the Creative Commons Attribution License See

Presentation transcript:

Regular Expressions: Mechanics

Notebook #1
[Table: columns Site, Date, Evil (millivaders); the rows shown are all for site Baker]

Notebook #2
Site/Date/Evil
Davison/May 22, 2010/
Davison/May 23, 2010/
Pertwee/May 24, 2010/
Davison/June 19, 2010/
Davison/July 6, 2010/
Pertwee/Aug 4, 2010/
Pertwee/Sept 3, 2010/
⋮

'(.+)/([A-Z][a-z]+) ([0-9]{1,2}),? ([0-9]{4})/(.+)'
This pattern matches:
- one or more characters
- a slash
- a single upper-case letter
- one or more lower-case letters
- a space
- one or two digits
- a comma if one is there
- a space
- exactly four digits
- a slash
- one or more characters
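To see the pieces of this pattern line up with a record, here is a minimal sketch using Python's standard `re` module. The pattern is copied from the slide; the record is hypothetical, since the readings in Notebook #2 are not preserved in this transcript, and the value `1721.3` is invented for illustration.

```python
import re

# Pattern copied from the slide.
PATTERN = r'(.+)/([A-Z][a-z]+) ([0-9]{1,2}),? ([0-9]{4})/(.+)'

# Hypothetical record: the reading 1721.3 is made up for this example.
record = 'Davison/May 22, 2010/1721.3'

match = re.match(PATTERN, record)

# Each parenthesized group in the pattern captures one field.
site, month, day, year, reading = match.groups()
print(site, month, day, year, reading)
```

Each pair of parentheses in the pattern corresponds to one of the five captured fields.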

How?
- Using finite state machines

Match a single 'a'
[Diagram: two nodes joined by an arc labelled 'a']
- start here (at the first node)
- match this character (the arc labelled 'a')
- must be here at the end (at the second node)
Patterns so far: a

Match one or more 'a'
[Diagram: the previous machine plus a second arc labelled 'a' that loops back]
- match this as before (the first arc)
- match again and again (the looping arc)
- don't have to stop here the first time, just have to be here at the end
Patterns so far: a, a+

Match 'a' or nothing
[Diagram: an arc labelled 'a' alongside an unlabelled arc]
- transition is "free" (the unlabelled arc)
- So this is '(a|)'
- Which is 'a?'
Patterns so far: a, a+, a?

Match zero or more 'a'
[Diagram: a machine with two arcs labelled 'a']
- Combine ideas
- This is 'a*'
Patterns so far: a, a+, a?, a*
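The four machines above can be sketched as small transition tables. This is an illustrative model, not the slides' own code: the dictionary layout and the names `accepts`, `SINGLE_A`, `ONE_OR_MORE`, `OPTIONAL_A` and `ZERO_OR_MORE` are assumptions, and the "free" transition is modelled by letting the start node also count as a valid place to finish.

```python
def accepts(machine, text):
    """Return True if the machine is in an accepting node once all of text is read."""
    state = machine['start']
    for char in text:
        key = (state, char)
        if key not in machine['arcs']:
            return False          # no arc for this character: the match fails
        state = machine['arcs'][key]
    return state in machine['accept']   # "must be here at the end"

# a : one arc from node 0 to node 1
SINGLE_A = {'start': 0, 'arcs': {(0, 'a'): 1}, 'accept': {1}}

# a+ : as before, plus a loop on node 1 to match again and again
ONE_OR_MORE = {'start': 0, 'arcs': {(0, 'a'): 1, (1, 'a'): 1}, 'accept': {1}}

# a? : the free transition means finishing at the start node is also fine
OPTIONAL_A = {'start': 0, 'arcs': {(0, 'a'): 1}, 'accept': {0, 1}}

# a* : combine both ideas, a loop on a node that is already accepting
ZERO_OR_MORE = {'start': 0, 'arcs': {(0, 'a'): 0}, 'accept': {0}}

for name, machine in [('a', SINGLE_A), ('a+', ONE_OR_MORE),
                      ('a?', OPTIONAL_A), ('a*', ZERO_OR_MORE)]:
    print(name, [accepts(machine, text) for text in ['', 'a', 'aaa']])
```

Note that `accepts` only ever looks at the current node and the next character, which is the property the later slides rely on.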

What regular expression is this?
[Diagram: a finite state machine with arcs labelled a, a, b, c and d]
- a+|(b(c|d))
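One way to check the answer is to write the machine down as a table and compare it with the pattern on a few strings. The node names and the `walk` helper below are hypothetical, and the exact node layout of the slide's diagram is not preserved in this transcript, so treat this as one plausible machine for `a+|(b(c|d))`.

```python
import re

# Arcs out of each node: {node: {character: next node}}.
ARCS = {
    'start':  {'a': 'seen_a', 'b': 'seen_b'},
    'seen_a': {'a': 'seen_a'},              # loop: one or more 'a'
    'seen_b': {'c': 'done', 'd': 'done'},   # 'b' must be followed by 'c' or 'd'
    'done':   {},
}
ACCEPTING = {'seen_a', 'done'}

def walk(text):
    """Follow arcs character by character; succeed only if we end in an accepting node."""
    state = 'start'
    for char in text:
        state = ARCS[state].get(char)
        if state is None:
            return False
    return state in ACCEPTING

PATTERN = re.compile(r'a+|(b(c|d))')

for text in ['a', 'aaa', 'bc', 'bd', '', 'b', 'ab', 'bcd']:
    # fullmatch requires the whole string to match, like "must be here at the end".
    print(repr(text), walk(text), PATTERN.fullmatch(text) is not None)
```

The two columns printed for each string should agree if the table really does describe the same language as the pattern.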

Action at a node depends only on:
- arcs out of that node
- characters in target data
Finite state machines have no memory
- This means it is impossible to write a regular expression to check if arbitrarily nested parentheses match
- "(((....)))" requires memory (or at least a counter)
- Similarly, the only way to check if a word contains each vowel once is to write 5! = 120 clauses
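Both limits can be made concrete with a short sketch. The function name `parens_balanced` and the way the vowel clauses are built are assumptions for illustration, and "contains each vowel once" is read here as each of a, e, i, o, u appearing exactly once in a lower-case word.

```python
import re
from itertools import permutations

def parens_balanced(text):
    """Check nesting with a counter, which is the memory a finite state machine lacks."""
    depth = 0
    for char in text:
        if char == '(':
            depth += 1
        elif char == ')':
            depth -= 1
            if depth < 0:        # a ')' arrived before its '('
                return False
    return depth == 0

# With no memory of which vowels have been seen, the pattern needs one
# clause for every possible ordering of the five vowels.
OTHER = '[^aeiou]*'
clauses = [OTHER + OTHER.join(order) + OTHER for order in permutations('aeiou')]
VOWELS_ONCE = re.compile('|'.join(clauses))

print(parens_balanced('(((....)))'))
print(len(clauses))
print(VOWELS_ONCE.fullmatch('facetious') is not None)
```

The counter can grow without bound, which is exactly what a fixed set of nodes cannot represent; the vowel check stays within regular expressions only by spelling out every ordering.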

Why use a tool with limits?
They're fast
- After some pre-calculation, a regular expression only has to look at each character in the input data once
They're readable
- More readable than the procedural equivalent
And regular expressions can do a lot more than what we've seen so far
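In Python, the pre-calculation step roughly corresponds to `re.compile`, which builds a reusable pattern object once so it can be applied to many lines. The sketch below uses hypothetical records with invented readings. One caveat: Python's `re` module uses a backtracking matcher rather than a pure finite state machine, so the look-at-each-character-once property describes machine-based engines, not a guarantee for every pattern in `re`.

```python
import re

# Do the pre-calculation once, outside the loop.
RECORD = re.compile(r'(.+)/([A-Z][a-z]+) ([0-9]{1,2}),? ([0-9]{4})/(.+)')

# Hypothetical input lines; the readings are invented for illustration.
lines = [
    'Davison/May 22, 2010/1721.3',
    'Pertwee/Aug 4 2010/1244.9',
    'not a record',
]

# Reuse the compiled pattern for every line and keep only the lines that match.
sites = [m.group(1) for m in map(RECORD.match, lines) if m]
print(sites)
```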

June 2010 created by Greg Wilson Copyright © Software Carpentry 2010 This work is licensed under the Creative Commons Attribution License See for more information.