

Perl Day 4

Fuzzy Matches
We know about eq and ne, but they only match things exactly.
– Sometimes you want to match things more vaguely, like:
   • I don’t care what number it is, but I care that it’s a number.
   • Or I need to make sure the phone number entered has 10 digits.
Regular expressions make it possible.
– They are arguably the most powerful part of Perl.

Match
Let’s first deal with matching things.
– Imagine you ask the user to type in a phone number. You should next check you got a valid phone number (10 digits).
– The test below only checks that the input contains at least one digit; counting the digits needs the {} quantifier covered later.

    print("Enter a phone number\n");
    $PhoneNum=<STDIN>;
    chomp($PhoneNum);
    if($PhoneNum=~/\d+/) {
        print("good job\n");
    } else {
        print("That wasn't a phone number\n");
    }
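The check on this slide accepts any input that contains a digit. A stricter version might look like the sketch below. The helper name is_phone_number and the sample inputs are assumptions for illustration, and the {10}, ^ and $ pieces are explained on later slides.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the slides): true only when the
# whole input is exactly 10 digits.
sub is_phone_number {
    my ($Input) = @_;
    # ^ and $ pin the match to the start and end; {10} counts the digits.
    return $Input =~ /^\d{10}$/ ? 1 : 0;
}

for my $Candidate ('5035551234', '503555', 'call 5035551234 now') {
    if (is_phone_number($Candidate)) {
        print("$Candidate: good job\n");
    } else {
        print("$Candidate: That wasn't a phone number\n");
    }
}
```

Only the first sample passes; the other two would have passed the looser /\d+/ test.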

=~
Up until now we’ve dealt with ==, !=, <, >, <=, >=, eq and ne in tests. =~ means you are doing a regular expression match. Note you are not looking for the literal string ‘/\d+/’; you are looking for what that pattern means.

\d, \w, \s
In a regular expression, you’ll often see \[something]. They each have different meanings:
– \d: A digit (0-9)
– \w: A word character (a-z, A-Z, 0-9, _)
– \s: A whitespace character (space, tab, newline)
– . Matches any single character except a newline
– \. Matches only a dot
– Any plain text matches itself exactly (e.g. /enda/ matches any string that contains enda).
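A small sketch of these escapes in action. The helper describe_char and the sample strings are made up for illustration. Digits are tested first because a digit also counts as a word character.

```perl
use strict;
use warnings;

# Hypothetical helper: name the first class a single character falls into.
sub describe_char {
    my ($Char) = @_;
    return 'digit'      if $Char =~ /\d/;   # checked before \w on purpose
    return 'word'       if $Char =~ /\w/;
    return 'whitespace' if $Char =~ /\s/;
    return 'other';
}

for my $Char ('7', 'a', '_', ' ', '.') {
    print("'$Char' is ", describe_char($Char), "\n");
}

# A bare dot matches any character; an escaped dot matches only a dot.
my $DotAny     = ('axb' =~ /a.b/)  ? 1 : 0;   # matches
my $DotLiteral = ('axb' =~ /a\.b/) ? 1 : 0;   # does not match
# Plain text matches anywhere inside the string.
my $Contains   = ('brendan' =~ /enda/) ? 1 : 0;
print("dot any: $DotAny, dot literal: $DotLiteral, contains: $Contains\n");
```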

+ * {}
Any of the previous tokens can be followed by:
– + Means there must be 1 or more
– * Means there can be any number (including 0)
– {7} Means there must be exactly 7
– {1,4} Means there must be between 1 and 4 (inclusive)
– {0,10} Means there can be at most 10 (the short form {,10} is only a quantifier in Perl 5.34 and later)
e.g.:
    =~/\d{7}/ means there must be 7 digits in a row somewhere in the string
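The counts are easier to see when the pattern is pinned to the whole string with ^ and $ (covered on a later slide). This is a sketch with made-up sample strings; the helper matches and the %Result keys are names invented here.

```perl
use strict;
use warnings;

# Hypothetical helper: 1 if the text matches the pattern, else 0.
sub matches {
    my ($Text, $Pattern) = @_;
    return $Text =~ $Pattern ? 1 : 0;
}

my %Result = (
    plus_needs_one   => matches('',         qr/^\d+$/),      # 0: + needs at least one digit
    star_allows_none => matches('',         qr/^\d*$/),      # 1: * is happy with zero digits
    exactly_seven    => matches('1234567',  qr/^\d{7}$/),    # 1
    eight_not_seven  => matches('12345678', qr/^\d{7}$/),    # 0: one digit too many
    eight_unanchored => matches('12345678', qr/\d{7}/),      # 1: 7 in a row exist inside it
    one_to_four      => matches('abcd',     qr/^\w{1,4}$/),  # 1
    five_is_too_many => matches('abcde',    qr/^\w{1,4}$/),  # 0
);

for my $Name (sort keys %Result) {
    print("$Name: $Result{$Name}\n");
}
```

Note the difference between eight_not_seven and eight_unanchored: without the anchors, {7} only asks for 7 digits in a row somewhere, so longer runs still match.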

Endings
After the last /, you can put additional things:
– i This makes the match case insensitive
– g Allow it to match more than once (globally)
– m Treat the string as multiple lines, so ^ and $ also match at the start and end of each line (see Matching Specific Places)
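A short sketch of the three endings, using made-up strings. In list context a match with g returns every piece it matched, which makes the effect easy to inspect.

```perl
use strict;
use warnings;

# i: upper and lower case are treated the same.
my $CaseInsensitive = ('ENDA' =~ /enda/i) ? 1 : 0;

# g: collect every match instead of stopping at the first one.
my @Digits = ('abc123' =~ /\d/g);          # '1', '2', '3'

# m: ^ matches at the start of every line, not just the start of the string.
my $Lines     = "one\ntwo";
my @FirstOnly = ($Lines =~ /^\w+/g);       # only 'one'
my @EachLine  = ($Lines =~ /^\w+/mg);      # 'one' and 'two'

print("case insensitive: $CaseInsensitive\n");
print("digits: @Digits\n");
print("without m: @FirstOnly\n");
print("with m: @EachLine\n");
```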

Search and Replace
Uses the same language as matching.
– However, after the =~ you put an s.
– When you were doing matching there was secretly an m there; it’s optional.

    $Text='abc123 def456';
    $Text=~s/\d/x/g;

– This will search for a digit and replace it with x. The g indicates it’ll do it everywhere it finds a digit.
   • The result: $Text='abcxxx defxxx';
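One detail worth knowing: s/// also reports how many replacements it made. The sketch below reuses the slide's string; the variable names $Count and $OnlyFirst are invented here.

```perl
use strict;
use warnings;

my $Text  = 'abc123 def456';
# With g, every digit is replaced; the return value is the number of changes.
my $Count = ($Text =~ s/\d/x/g);
print("$Text ($Count replacements)\n");

# Without g, only the first digit is replaced.
my $OnlyFirst = 'abc123 def456';
$OnlyFirst =~ s/\d/x/;
print("$OnlyFirst\n");
```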

More Examples

    $Text='abc123 def456';
    $Text=~s/\d+//g;
    $Text=~s/c/b/g;
    $Text=~s/\d{2}/a/;
    $Text=~s/\s*//g;
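The slide does not say whether these substitutions are meant to run one after another or separately. The sketch below assumes each one is applied to a fresh copy of the original string, so the effect of each pattern can be seen on its own. The variable names are invented here.

```perl
use strict;
use warnings;

my $Original = 'abc123 def456';

# "(my $Copy = $Original) =~ s/.../.../" copies first, then edits the copy.
(my $NoDigits  = $Original) =~ s/\d+//g;     # every run of digits removed
(my $CtoB      = $Original) =~ s/c/b/g;      # every c becomes b
(my $FirstPair = $Original) =~ s/\d{2}/a/;   # first two digits become a (no g)
(my $NoSpaces  = $Original) =~ s/\s*//g;     # all whitespace removed

print("$NoDigits\n");    # abc def
print("$CtoB\n");        # abb123 def456
print("$FirstPair\n");   # abca3 def456
print("$NoSpaces\n");    # abc123def456
```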

Matching Specific Places
2 additional special characters:
– ^ Means match at start of string only
– $ Means match at end of string only
Sometimes you only want to change the first 3 digits:

    $Phone= ;

– This would remove the area code:

    $Phone=~s/^\d{3}//g;

Sometimes you only want to change the last 4; this would remove them:

    $Phone=~s/\d{4}$//g;
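A sketch of what the anchors change, assuming a made-up 10-digit value for $Phone (the slide's own value is not shown). The last two lines compare an anchored and an unanchored pattern on a string that does not start with digits.

```perl
use strict;
use warnings;

my $Phone = '5035551234';   # hypothetical value for illustration

(my $NoAreaCode = $Phone) =~ s/^\d{3}//;    # drop the first 3 digits
(my $NoLastFour = $Phone) =~ s/\d{4}$//;    # drop the last 4 digits
print("$NoAreaCode\n");   # 5551234
print("$NoLastFour\n");   # 503555

my $Labelled = 'tel 5035551234';
(my $Anchored   = $Labelled) =~ s/^\d{3}//;  # no change: string starts with 't'
(my $Unanchored = $Labelled) =~ s/\d{3}//;   # removes the first 3 digits it finds
print("$Anchored\n");     # tel 5035551234
print("$Unanchored\n");   # tel 5551234
```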

Capturing
Anything you wrap in ()’s will be captured:
– The first ()’s are $1, the second are $2 etc.
– Only read $1, $2 etc. after a successful match; otherwise they still hold values from an earlier match.

    $Phone= ;
    if($Phone=~/(\d{3})(\d{3})(\d{4})/) {
        $AreaCode=$1;
        $Exchange=$2;
        $Extension=$3;
    }
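Capturing could be wrapped in a small helper along these lines. The name split_phone, the sample number and the output format are assumptions for illustration; the anchors are added so that only a string of exactly 10 digits is accepted.

```perl
use strict;
use warnings;

# Hypothetical helper: return the three parts of a 10-digit number,
# or an empty list when the input does not match.
sub split_phone {
    my ($Phone) = @_;
    if ($Phone =~ /^(\d{3})(\d{3})(\d{4})$/) {
        return ($1, $2, $3);    # safe to read: the match succeeded
    }
    return;
}

my ($AreaCode, $Exchange, $Extension) = split_phone('5035551234');
print("($AreaCode) $Exchange-$Extension\n");   # (503) 555-1234

my @Nothing = split_phone('12345');
print("no match gives ", scalar(@Nothing), " parts\n");
```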

Translation
Changing strings to upper case is easy.
– The command is translate (tr). It is written like match and search and replace, but it takes lists of characters, not a regular expression.

    $Text="this is lower case";
    $Text=~tr/a-z/A-Z/;

– In tr, a-z means all letters between a and z. No square brackets are needed; tr would treat them as ordinary characters.
In a regular expression, square brackets create a “character class”:
– [a-z] would be all letters between a and z
– [c-k] would be all letters from c to k
– [asdf] would be a, s, d and f
– [ab]+ would be any combination of a and b, like:
   • a
   • ab
   • aaaaa
   • bbbbb
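A sketch comparing tr with Perl's built-in uc function, which is another way to upper-case a string, plus a character class used in an ordinary match. The variable names and the vowel-counting line are additions for illustration, not part of the slides.

```perl
use strict;
use warnings;

my $Text = 'this is lower case';

# tr works on ranges of characters directly, without square brackets.
(my $Upper = $Text) =~ tr/a-z/A-Z/;
my $ViaUc  = uc($Text);               # built-in alternative
print("$Upper\n");

# With an empty replacement list, tr changes nothing and just counts.
my $Vowels = ($Text =~ tr/aeiou//);
print("vowels: $Vowels\n");

# Character classes belong to regular expressions.
my $OnlyAB = ('abba' =~ /^[ab]+$/) ? 1 : 0;   # only a and b: matches
my $HasC   = ('abc'  =~ /^[ab]+$/) ? 1 : 0;   # c is not in the class: no match
print("abba: $OnlyAB, abc: $HasC\n");
```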

Is Tomato a Fruit or Veg?
grep can help. It looks in an array and returns the elements that match a pattern, so it can tell you if a pattern is in the array.
– The pattern can be any regular expression like what you just learned.

    print("It's a fruit\n");
    print("It's a veg\n");
    }
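The slide's own code did not survive intact, so the following is only a guess at the idea, not a reconstruction. The @Fruits list, the helper fruit_or_veg and the patterns are all assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: look for $Item in the list of fruits.
sub fruit_or_veg {
    my ($Item, @Fruits) = @_;
    # Inside an if test, grep gives the number of matching elements,
    # so zero matches counts as false.
    if (grep(/^$Item$/i, @Fruits)) {
        return "It's a fruit";
    } else {
        return "It's a veg";
    }
}

my @Fruits = ('apple', 'banana', 'Tomato');    # made-up list
print(fruit_or_veg('tomato', @Fruits), "\n");  # It's a fruit
print(fruit_or_veg('carrot', @Fruits), "\n");  # It's a veg

# Assigned to an array, grep returns the matching elements themselves.
my @WithAn = grep(/an/, @Fruits);
print("contains 'an': @WithAn\n");             # banana
```

Putting the item straight into the pattern is fine for simple words like these; text containing special characters such as . or + would need escaping first.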

Split
If you have a string, and you want to make it into an array, split can help.

    $Text="This is a