1 DIG 3563: Lecture 2a: Regular Expressions Michael Moshell University of Central Florida Information Management.

Presentation transcript:

1 DIG 3563: Lecture 2a: Regular Expressions Michael Moshell University of Central Florida Information Management

2 If you don’t know how to do something, don’t hide under a bush. Tell me or come see me. DO NOT BE A RABBIT! Naturphoto.cz

3 Regular Expressions A "grammar" for validating input; useful for many kinds of pattern recognition. The basic built-in matching function in PHP is called 'preg_match'. It takes two or three arguments: the pattern, like "cat"; the test string, like "catastrophe"; and an (optional) array variable, which we can ignore for now. It returns 1 (which PHP treats as TRUE in an if test) if the pattern matches the test string, and 0 if it does not.
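
As a minimal sketch of the call described on this slide, the snippet below shows the two return values and the optional third argument. The variable names ($found, $missing, $matches) are illustrative and not taken from the slides.

```php
<?php
// preg_match() returns 1 on a match, 0 on no match (and false on error).
$pattern = "/cat/";

// Third argument (optional): an array that receives the matched text.
$found   = preg_match($pattern, "catastrophe", $matches);
$missing = preg_match($pattern, "dogma");

var_dump($found);           // int(1)
var_dump($missing);         // int(0)
print $matches[0] . "\n";   // cat
```

Because 1 is truthy and 0 is falsy, the result can be used directly in an if test, as the following slides do.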

4 Perl-Compatible Regular Expressions (PCRE, the kind preg_match uses) Always begin with "/ and end with /" (for today's lesson) $instring = "catastrophe"; if (preg_match("/cat/",$instring)) { print "I found a cat!"; } else { print "No cat here."; }

5 Regular Expressions $instring = "catastrophe"; if (preg_match("/cat/",$instring)) { print "I found a cat!"; } else { print "No cat here."; } Output: I found a cat!

6 PRACTICE 1: "/cat/" ← that is the regular expression. Make up a Regular Expression to recognize not the word cat, but rather the word dog. Write it on your paper, now.

7 PRACTICE 1: "/cat/" ← that is the regular expression. Make up a Regular Expression to recognize not the word cat, but rather the word dog. Write it on your paper, now. Yes, I mean YOU. Where is your paper and pencil? (You can use your laptop if that’s what you have…)

8 PRACTICE 1: "/cat/" ← that is the regular expression. Make up a Regular Expression to recognize not the word cat, but rather the word dog. Write it on your paper, now. Answer: "/dog/" Yep, it’s that simple. But I gotta get you STARTED.

9 Regular Expressions Wild cards: a period (.) matches any single character $instring = "cotastrophe"; if (preg_match("/c.t/",$instring)) { print "I found a c.t!"; } else { print "No c.t here."; }

10 Regular Expressions Wild cards: a period (.) matches any single character $instring = "cotastrophe"; if (preg_match("/c.t/",$instring)) { print "I found a matching string!"; } else { print "No c.t here."; } Output: I found a matching string!

11 Regular Expressions Wild cards: a* matches any number of a characters (including none at all!) $instring = "caaaatastrophe"; if (preg_match("/ca*t/",$instring)) { print "I found a match!"; } else { print "No ca*t here."; } Output: I found a match!

12 Regular Expressions Wild cards: .* matches any string of characters (including the empty string!) $instring = "cotastrophe"; if (preg_match("/c.*t/",$instring)) { print "I found a c.*t!"; } else { print "No c.*t here."; } Output: I found a c.*t!

13 Regular Expressions Wild cards: .* matches any string of characters (including the empty string!) $instring = "cflippingmonstroustastrophe"; if (preg_match("/c.*t/",$instring)) { print "I found a c.*t!"; } else { print "No c.*t here."; }

14 Regular Expressions Wild cards: .* matches any string of characters (including the empty string!) $instring = "cflippingmonstroustastrophe"; if (preg_match("/c.*t/",$instring)) { print "I found a c.*t!"; } else { print "No c.*t here."; } Output: I found a c.*t!
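
To compare the three wildcard forms side by side, here is a small sketch. The helper found() is hypothetical (it is not part of the slides); it just wraps preg_match so each result prints as true or false.

```php
<?php
// Hypothetical helper (not from the slides):
// true when $pattern occurs anywhere in $subject.
function found(string $pattern, string $subject): bool
{
    return preg_match($pattern, $subject) === 1;
}

var_dump(found("/c.t/", "cotastrophe"));      // true:  . stands for the "o"
var_dump(found("/c.t/", "caaaatastrophe"));   // false: . is exactly one character
var_dump(found("/ca*t/", "caaaatastrophe"));  // true:  a* absorbs all four a's
var_dump(found("/ca*t/", "ctastrophe"));      // true:  a* also accepts zero a's
var_dump(found("/c.*t/", "cflippingmonstroustastrophe")); // true
```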

15 PRACTICE 2: "/c.t/" ← that is a model RE for you. "/c.*t/" ← that is a model RE for you. "/ca*t/" ← that is a model RE for you. Make up a Regular Expression to recognize Rob or Rb or Roob or Rooob, etc. But to REJECT Reb and Rab and Rats and Mike ….

16 PRACTICE 2: "/c.t/" ← that is a model RE for you. "/c.*t/" ← that is a model RE for you. "/ca*t/" ← that is a model RE for you. Answer: "/Ro*b/"

17 Quantification Multiple copies of something: a+ means ONE OR MORE a’s Example: "/fa+ther/" matches father, faather, faaather, etc. a* means ZERO OR MORE a’s Example: "/fa*ther/" matches fther, father, faather, etc. a? means ZERO OR ONE a Example: "/flavou?r/" will match flavor AND flavour. a{33} means 33 instances of a

18 Quantification Example a+ means ONE OR MORE a’s Example: "/fa+ther/" matches father, faather, faaather, etc. a* means ZERO OR MORE a’s Example: "/fa*ther/" matches fther, father, faather, etc. a? means ZERO OR ONE a Example: "/flavou?r/" will match flavor AND flavour. a{33} means 33 instances of a How to recognize “Rob” or “Robb”?

19 Quantification Example a+ means ONE OR MORE a’s Example: "/fa+ther/" matches father, faather, faaather, etc. a* means ZERO OR MORE a’s Example: "/fa*ther/" matches fther, father, faather, etc. a? means ZERO OR ONE a Example: "/flavou?r/" will match flavor AND flavour. a{33} means 33 instances of a How to recognize “Rob” or “Robb”? "/Robb?/"

20 Quantification Example a+ means ONE OR MORE a’s Example: "/fa+ther/" matches father, faather, faaather, etc. a* means ZERO OR MORE a’s Example: "/fa*ther/" matches fther, father, faather, etc. a? means ZERO OR ONE a Example: "/flavou?r/" will match flavor AND flavour. a{33} means 33 instances of a How to recognize “Rob” or “Robb”? Another way: "/Rob{1,2}/"
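
The quantifiers on these slides can be checked with a short table-driven sketch. The $checks and $results arrays are illustrative names; the patterns themselves come from the slides, except a{3}, which stands in for a{33} to keep the test string short.

```php
<?php
// Each row: pattern, test string. Results are collected in $results.
$checks = [
    ["/fa+ther/",  "fther"],    // + needs at least one a
    ["/fa+ther/",  "faather"],
    ["/fa*ther/",  "fther"],    // * accepts zero a's
    ["/flavou?r/", "flavor"],   // ? makes the u optional
    ["/flavou?r/", "flavour"],
    ["/Robb?/",    "Rob"],
    ["/Rob{1,2}/", "Robb"],
    ["/a{3}/",     "aa"],       // {3} needs exactly three in a row
];

$results = [];
foreach ($checks as [$pattern, $subject]) {
    $results[] = preg_match($pattern, $subject);
    printf("%-12s %-8s => %d\n", $pattern, $subject, end($results));
}
```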

21 Escaping Backslash means "don't interpret this:" \. is just a dot \* is just an asterisk.
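
A brief sketch of what the backslash changes. The strings "3x14" and "1.5*2" are made-up test data. preg_quote() is a standard PHP function that adds the backslashes for you; it is not mentioned on the slide and is shown here only as a convenience.

```php
<?php
// Single-quoted PHP strings pass the backslash through to the regex unchanged.
$anyChar    = '/3.14/';   // unescaped dot: any single character
$literalDot = '/3\.14/';  // escaped dot: only a real "."

var_dump(preg_match($anyChar, "3x14"));     // int(1)
var_dump(preg_match($literalDot, "3x14"));  // int(0)
var_dump(preg_match($literalDot, "3.14"));  // int(1)

// preg_quote() escapes special characters; the second argument is the delimiter.
$escaped = preg_quote("1.5*2", "/");
print $escaped . "\n";                                    // 1\.5\*2
var_dump(preg_match("/" . $escaped . "/", "1.5*2 = 3"));  // int(1)
```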

22 The concept: Given $t="/a{3}\.b{1,4}/"; $s= "aaa.bbb"; would this be accepted or not? preg_match($t,$s) – true or false?

23 The concept: Given $t="/a{3}\.b{1,4}/"; $s= "aaa.bbb"; would this be accepted or not? preg_match($t,$s) – true or false? TRUE, because $s matches the pattern string $t: three a's, one dot, and between one and four b characters.

24 The concept: Given $t="/a{3}\.b{1,4}/"; $s= "aaa.bbbbb"; would this be accepted or not? preg_match($t,$s) – true or false?

25 The concept: Given $t="/a{3}\.b{1,4}/"; $s= "aaa.bbbbb"; would this be accepted or not? preg_match($t,$s) – true or false? Perhaps surprisingly, TRUE: because $s contains three a's, a dot, and four b's in a row. The fifth b is simply left over.

26 The concept: Given $t="/a{3}\.b{1,4}/"; $s= "aaa.bbbbb"; would this be accepted or not? preg_match($t,$s) – true or false? Perhaps surprisingly, TRUE: because $s contains three a's, a dot, and four b's in a row. If you have $1.00 and I asked you “do you have 75 cents?” the answer would be YES.

27 The concept: Given $t="/a{3}\.b{1,4}/"; $s= "aaa.bbbbb"; would this be accepted or not? preg_match($t,$s) – true or false? Perhaps surprisingly, TRUE: because $s contains three a's, a dot, and four b's in a row. If you wanted an EXACT match, I'll show you how in a bit.
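
One way to see why the five-b string is accepted is to ask preg_match which part of the string it actually matched, using the optional third argument. This is a sketch; $matches and $result are illustrative names.

```php
<?php
$t = '/a{3}\.b{1,4}/';
$s = "aaa.bbbbb";            // five b's

$result = preg_match($t, $s, $matches);

var_dump($result);           // int(1): the pattern was found inside $s
print $matches[0] . "\n";    // aaa.bbbb  (only four of the five b's were used)
```

The pattern is satisfied by a piece of the string, much like the 75 cents inside the dollar.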

28 Grouping Multiple copies of something: (abc)+ means ONE OR MORE copies of the string abc; (abc)* means ZERO OR MORE copies of the string abc, like abcabcabc. SETS: [0-9] matches any single digit; [A-Z] matches any single uppercase letter; [AZ] matches A or Z; [AZ]? (i.e. 0 or 1 of the previous) matches the empty string, A, or Z.
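
A short sketch of groups and sets in action. The test strings ("xabcabcabcx", "room 42", and so on) are made up for illustration.

```php
<?php
// (abc)+ : one or more copies of the whole group.
preg_match("/(abc)+/", "xabcabcabcx", $group);
print $group[0] . "\n";                     // abcabcabc

// [0-9] : exactly one digit per set.
preg_match("/[0-9]/", "room 42", $digit);
print $digit[0] . "\n";                     // 4

var_dump(preg_match("/[A-Z]/", "bob"));     // int(0): no uppercase letter
var_dump(preg_match("/[AZ]/", "Zebra"));    // int(1): Z is in the set
var_dump(preg_match("/[AZ]/", "Bob"));      // int(0): B is not A or Z

// [AZ]? : the set is optional.
var_dump(preg_match("/x[AZ]?y/", "xy"));    // int(1)
var_dump(preg_match("/x[AZ]?y/", "xAy"));   // int(1)
var_dump(preg_match("/x[AZ]?y/", "xBy"));   // int(0)
```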

29 Starting and Ending preg_match("/cat/","abunchofcats") is TRUE but preg_match("/^cat/","abunchofcats") is FALSE because ^ means the match must begin at the very start of the string. preg_match("/cats$/","abunchofcats") is TRUE but preg_match("/cats$/","mycatsarelazy") is FALSE because $ means the match must end at the very end of the string. So, ^ marks the head and $ marks the tail.

30 Exact Matching with ^ and $ $t="/^a{3}\.b{1,4}$/"; $s= "aaa.bbbbb"; would this be accepted or not? preg_match($t,$s) – true or false? FALSE, because the ending $ in the pattern says "no more input is acceptable" but more stuff comes. This would also reject $s="aaa.bbbbAndMoreText";
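
The effect of wrapping a pattern in ^ and $ can be sketched with a small helper. matchesExactly() is a hypothetical name, not from the slides, and it assumes the pattern body contains no "/" character.

```php
<?php
// Hypothetical helper: wraps a pattern body in /^ ... $/ so that the
// whole string must fit the pattern, not just a piece of it.
// Assumes $body contains no "/" delimiter.
function matchesExactly(string $body, string $subject): bool
{
    return preg_match('/^' . $body . '$/', $subject) === 1;
}

$body = 'a{3}\.b{1,4}';

var_dump(matchesExactly($body, "aaa.bbb"));             // true
var_dump(matchesExactly($body, "aaa.bbbbb"));           // false: five b's
var_dump(matchesExactly($body, "aaa.bbbbAndMoreText")); // false: extra text
var_dump(matchesExactly($body, "xaaa.bbb"));            // false: extra text in front
```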

31 Alternatives - the 'or' mark | $t="/flav(o|ou)r/"; This will match 'flavor' and 'flavour'. And (yes!) there is often more than one way to do things; for instance, our good old ? mark: "/flavou?r/"
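
As a quick check that the two spellings of the pattern behave the same, this sketch runs both against a few words. "flavr" is a made-up string that neither pattern should accept.

```php
<?php
$withOr       = "/flav(o|ou)r/";
$withQuestion = "/flavou?r/";

$agree = true;
foreach (["flavor", "flavour", "flavr"] as $word) {
    $a = preg_match($withOr, $word);
    $b = preg_match($withQuestion, $word);
    printf("%-8s or: %d  ?: %d\n", $word, $a, $b);
    $agree = $agree && ($a === $b);
}
var_dump($agree);   // bool(true): both patterns give the same answers here
```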

32 Sets - Examples [A-E]{3} matches AAA, ABA, ADD,... EEE [PQX]{2,4} matches PP, PQ, PX... up to XXXX [A-Za-z]+ matches any alphabetic string with 1 or more characters [A-Z][a-z]* matches any alpha string with first letter capitalized. [a-z0-9]+ matches any string of lowercase letters and numerals
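
The set examples above describe whole strings, so this sketch anchors each one with ^ and $ before testing it. The helper wholeStringMatches() and the test words are illustrative and not taken from the slides.

```php
<?php
// Hypothetical helper: true only when the ENTIRE string fits the pattern body.
function wholeStringMatches(string $body, string $subject): bool
{
    return preg_match('/^' . $body . '$/', $subject) === 1;
}

var_dump(wholeStringMatches('[A-E]{3}', "ADD"));         // true
var_dump(wholeStringMatches('[A-E]{3}', "ADF"));         // false: F is outside A-E
var_dump(wholeStringMatches('[PQX]{2,4}', "PQX"));       // true
var_dump(wholeStringMatches('[PQX]{2,4}', "PQXPQ"));     // false: five characters
var_dump(wholeStringMatches('[A-Za-z]+', "Moshell"));    // true
var_dump(wholeStringMatches('[A-Z][a-z]*', "moshell"));  // false: first letter not capital
var_dump(wholeStringMatches('[a-z0-9]+', "smith23"));    // true
var_dump(wholeStringMatches('[a-z0-9]+', "Smith23"));    // false: capital S
```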

33 Practice in class Write a RE that recognizes any string that begins with "sale". Here's an example for you to look at, to help you remember: ^cat From now on, the RE is just ^cat. You don't need to write the other stuff (preg_match, "/, etc.)

34 Practice 1) Write a RE that recognizes any string that begins with "sale". Answer: ^sale

35 Practice 1) Write a RE that recognizes any string that begins with "sale". Answer: ^sale 2) Write a RE that recognizes a string that begins with "smith" and a two digit integer, like smith23 or smith99. Here's an example from your recent past: a{3}\.b{1,4}

36 Practice 1) Write a RE that recognizes any string that begins with "sale". Answer: ^sale 2) Write a RE that recognizes a string that begins with "smith" and a two digit integer, like smith23 or smith99. Answer: ^smith[0-9]{2}

37 3) Write a RE that recognizes Social Security numbers in the form like Helpers from the recent past:^smith[0-9]{2} a{3}\.b{1,4}

38 3) Write a RE that recognizes Social Security numbers in the form like Answer: [0-9]{3}\-[0-9]{2}\-[0-9]{4}

39 3) Write a RE that recognizes Social Security numbers in the form like Answer: [0-9]{3}\-[0-9]{2}\-[0-9]{4} NOTE: That's a conservative answer. It turns out that the dash character is not a special symbol outside sets, and so you could also write [0-9]{3}-[0-9]{2}-[0-9]{4} But I don't like to remember stuff, so I use \ a lot.
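
A sketch of how the answer might be used for validation. Two assumptions go beyond the slide: the pattern is anchored with ^ and $ so that extra characters are rejected, and the sample value "123-45-6789" is made up purely for illustration. The function name looksLikeSsn() is hypothetical.

```php
<?php
// Hypothetical validator. The ^ and $ anchors are an added assumption;
// the slide's answer has none.
function looksLikeSsn(string $input): bool
{
    return preg_match('/^[0-9]{3}-[0-9]{2}-[0-9]{4}$/', $input) === 1;
}

var_dump(looksLikeSsn("123-45-6789"));    // true  (made-up sample value)
var_dump(looksLikeSsn("123456789"));      // false: dashes missing
var_dump(looksLikeSsn("123-45-67890"));   // false: one digit too many

// The escaped-dash form from the slide gives the same result.
var_dump(preg_match('/^[0-9]{3}\-[0-9]{2}\-[0-9]{4}$/', "123-45-6789")); // int(1)
```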

40 How to study this stuff? Practice making up REs for problems like these: the UCF NID; French telephone numbers like ( ); dollars and cents, like $; a field that may contain only lowercase strings with exactly ONE vowel. How do you know if they're good? If you know PHP, you can test them. Otherwise, check out each other's work. (OR come see me in office hours!) (Or by appointment!)
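
For testing your own answers in PHP, a tiny harness along these lines may help. This is only a sketch: checkPattern() is a hypothetical helper, and the accept and reject lists are made-up examples built around the earlier smith answer.

```php
<?php
// Hypothetical helper: tries a pattern against strings that should match
// and strings that should not. Returns a list of failure messages
// (an empty list means every case behaved as expected).
function checkPattern(string $pattern, array $accept, array $reject): array
{
    $failures = [];
    foreach ($accept as $subject) {
        if (preg_match($pattern, $subject) !== 1) {
            $failures[] = "should match: $subject";
        }
    }
    foreach ($reject as $subject) {
        if (preg_match($pattern, $subject) === 1) {
            $failures[] = "should NOT match: $subject";
        }
    }
    return $failures;
}

$failures = checkPattern(
    '/^smith[0-9]{2}/',
    ["smith23", "smith99"],               // should be accepted
    ["smith", "smith7", "blacksmith23"]   // should be rejected
);
print $failures ? implode("\n", $failures) . "\n" : "All cases passed.\n";
```

Listing strings that must be rejected is as useful as listing ones that must match; dropping the ^ from the pattern above, for example, would let "blacksmith23" through.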