Regular Expressions 'RegEx'.



Learning Objectives
By the end of this lecture, you should be able to:
- Define what is meant by a regular expression
- Create regular expression statements
- Compare regular expressions with strings and decide whether they match

Finding patterns in strings
We have already discussed how to find a specific substring inside a string. For example, you could search a string containing a list of names for the substring 'Lisa'. But suppose that, instead of searching for a specific substring, we wanted to find a pattern of characters. For example, suppose we wanted to check a zip code to make sure that it contained exactly 5 characters and that all of them were digits (i.e. no letters). This is where an invaluable programming tool called 'regular expressions' can be applied. Other examples of things you can check for using regular expressions include:
- Checking a phone number to make sure it is in the form of 3 digits, a dash, 3 more digits, another dash, and then 4 digits
- Checking that an email address has at least 1 character followed by an '@' sign, followed by at least one dot (period) character
- Checking a date to make sure it is entered in the form of two digits, followed by a '/', followed by two more digits, followed by another '/', followed by 4 digits
Regular expressions offer much more power and flexibility than these examples suggest.
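As a preview, the phone-number and date checks described above might be sketched as follows. This is only an illustrative sketch: the variable names and sample inputs are made up for this example, and the symbols used (^, $ and \d) are explained later in the lecture.

```javascript
// Hypothetical patterns for two of the checks described above.
// 3 digits, a dash, 3 digits, a dash, 4 digits:
var phonePattern = /^\d\d\d-\d\d\d-\d\d\d\d$/;
// 2 digits, a slash, 2 digits, a slash, 4 digits
// (the slash is written \/ so it does not end the literal):
var datePattern = /^\d\d\/\d\d\/\d\d\d\d$/;

// Made-up sample inputs; search() returns -1 when there is no match.
console.log("312-555-0123".search(phonePattern)); // 0
console.log("3125550123".search(phonePattern));   // -1
console.log("04/15/2000".search(datePattern));    // 0
console.log("4/15/2000".search(datePattern));     // -1
```

Note that these patterns only check the shape of the text. They do not check that the date or phone number actually exists.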

Creating a regular expression
Begin by creating a "regular expression literal". This is a pattern of regex characters placed inside forward slashes. The forward slashes indicate a regular expression, in the same way that quotation marks indicate a string. In JavaScript, we use a method called search() to attempt to match our "regex literal" against a string. We invoke it on the string and pass our regex literal as the argument:
    some_string.search(regex_literal);
Example:
    "have a nice day".search(/ice/);
The search() function returns the index at which the expression was found. If the expression is not found, search() returns -1. In the above example, search() returns 8, since 'ice' was found at index 8 in the string.

Pop-Quiz
What will be stored in found_position in each of the following?
    var found_position;
    var quote = "To be or not to be, that is the question.";
    found_position = quote.search(/To be/);   // Answer: 0
    found_position = quote.search(/to be/);   // Answer: 13
    found_position = quote.search(/To Be/);   // Answer: -1
    found_position = quote.search(/be/);      // Answer: 3 (the function returns the index of the first occurrence of a match)

search() versus indexOf()
In the previous example, we used the search() function to find a match. We observed that search() returns an integer corresponding to the location of the match. If no match is found, search() returns -1. Doesn't this sound exactly like the indexOf() function that we use with strings? The two are indeed similar. However, whereas indexOf() only allows us to search for specific text, search() is far more powerful because it also allows us to search for regular expression patterns.
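The difference can be seen in a short sketch. The sample strings here are made up for illustration, and \d (which matches any digit) is one of the pattern symbols covered later in the lecture.

```javascript
var names = "Bart, Lisa, Maggie";   // hypothetical sample string
var room = "Room 214";              // hypothetical sample string

// For fixed text, indexOf() and search() report the same position.
console.log(names.indexOf("Lisa"));  // 6
console.log(names.search(/Lisa/));   // 6

// Only search() can look for a pattern such as "any digit".
console.log(room.search(/\d/));      // 5

// indexOf() treats its argument as literal text: a backslash followed by 'd'.
console.log(room.indexOf("\\d"));    // -1
```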

Common Pattern-Matching Symbols
In an earlier regular expression, our literal was:
    /to be/
However, we could have simply used indexOf() to do this and not gone to the trouble of using a regular expression. The true power of regex lies in our ability to match patterns. To do so, we need a sort of 'code', in the form of a combination of letters and symbols, that allows us to define a pattern. Here is a table showing some of the most common pattern-matching characters:

Common Pattern-Matching Symbols
Regular expressions will match one character at a time. Example: Suppose you wanted to make sure that the first character in a certain string was a 'Q'. To do so, you would need to put the character 'Q' in your regex literal. However, that wouldn't be enough. Since our goal is to make sure that the 'Q' is the first character in our string, we would also need the '^' (caret) character. So in this case, our literal would be:
    /^Q/
As always, the literal goes inside the forward slash characters. The caret sign says that whatever comes next must be the first item in the string. Suppose we now wanted to make sure that the first character in our string was a digit (i.e. 0-9). In this case, our literal would be:
    /^\d/
The \d will match any character that is a digit.
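The two literals above could be wrapped in small helper functions, as in the following sketch. The function names and sample strings are hypothetical and chosen only for illustration. Because of the caret, a match can only ever be reported at index 0.

```javascript
// Hypothetical helper: true only if the first character is 'Q'.
function startsWithQ(text) {
  return text.search(/^Q/) === 0;
}

// Hypothetical helper: true only if the first character is a digit 0-9.
function startsWithDigit(text) {
  return text.search(/^\d/) === 0;
}

console.log(startsWithQ("Quincy"));       // true
console.log(startsWithQ("IQ"));           // false ('Q' is present, but not first)
console.log(startsWithDigit("60614"));    // true
console.log(startsWithDigit("Suite 4"));  // false (the digit is not first)
```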

What value would be stored in found_position in the following examples?
    var found_position;
    var url = "www.depaul.edu";
    found_position = url.search(/^www/);   // Answer: 0
    found_position = url.search(/www$/);   // Answer: -1 (this would only match if 'www' was at the end of the string)
    found_position = url.search(/edu$/);   // Answer: 11
    found_position = url.search(/$edu/);   // Answer: -1 (the $ must appear after the 'edu')
    found_position = url.search(/\d/);     // Answer: -1 (no digit is present anywhere inside the string)
    found_position = url.search(/\W/);     // Answer: 3 (\W matches anything that is NOT a letter, digit, or underscore)

Example: 5-digit zip code
Now let's check a string to see if it matches a 5-digit U.S. zip code. We might begin by checking whether the string contains at least one digit. From our chart we can see that \d matches any digit 0 through 9. Of course, we want to match five digits. This is easily solved by simply repeating the \d five times: \d\d\d\d\d
So:
    var zipCode = "60614";
    var regExZIP = /\d\d\d\d\d/;
    var result = zipCode.search(regExZIP);   // result will hold the value 0

Example: 5-digit zip code contd
Still, there is a problem. Can you see it? While this expression would indeed match 60614, it would also match "12341234126061423452345" or "abcdefg60614hijklmn" and so on. What we need is a way to indicate that the 5 digits must be both the beginning and the end of our string. In other words, the string must contain exactly 5 characters. A period in a regex literal matches any single character. By placing 5 periods in our literal, we will only match strings that have a minimum of 5 characters in them:
    /...../
We then stipulate that there must be exactly 5 characters. A clever way to do this is to stipulate that the string must begin with those 5 characters and also end with them:
    /^.....$/
Finally, we stipulate that all five of those characters must be digits:
    /^\d\d\d\d\d$/
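The effect of the two anchors can be checked with a short sketch. The helper name isFiveDigitZip is hypothetical and is not taken from the lecture's example file. The test strings are the ones mentioned above.

```javascript
var looseZip = /\d\d\d\d\d/;      // five digits anywhere in the string
var strictZip = /^\d\d\d\d\d$/;   // five digits and nothing else

// Without anchors, longer strings still produce a match.
console.log("abcdefg60614hijklmn".search(looseZip));      // 7
console.log("12341234126061423452345".search(looseZip));  // 0

// Hypothetical helper: true only for strings of exactly five digits.
function isFiveDigitZip(text) {
  return text.search(strictZip) === 0;
}

console.log(isFiveDigitZip("60614"));                // true
console.log(isFiveDigitZip("abcdefg60614hijklmn"));  // false
console.log(isFiveDigitZip("606140"));               // false (six digits)
console.log(isFiveDigitZip("6061"));                 // false (four digits)
```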

We've only scratched the surface
Because working with strings is such a major part of today's computing and data-science world, regular expressions have evolved into a somewhat detailed topic. We have only scratched the surface. There are countless additional techniques, shortcuts, and levels of complexity that can still be explored, and entire books have been written on the topic. For example, in the zip code example, rather than writing out \d five times, you could write \d{5}, which accomplishes the same thing:
    /^\d{5}$/
As a further exercise, you might want to expand on the previous example with a regular expression that accepts either the typical 5-digit zip code or a zip code in the widely used #####-#### format. Because regular expressions are so widely used, there are many pre-written ones that you can easily find online and use or modify for your needs.
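One possible way to accept both zip code formats is sketched below. It relies on two regex features that this lecture has not covered, so treat it as an assumption about how the exercise might be solved rather than the lecture's own answer: parentheses group part of a pattern, and a question mark makes the preceding item optional. The function name is hypothetical.

```javascript
// Five digits, optionally followed by a dash and four more digits.
// The group (-\d{4}) is made optional by the trailing question mark.
var zipOrZipPlus4 = /^\d{5}(-\d{4})?$/;

// Hypothetical helper built on the pattern above.
function isZipCode(text) {
  return text.search(zipOrZipPlus4) === 0;
}

console.log(isZipCode("60614"));       // true
console.log(isZipCode("60614-1234"));  // true
console.log(isZipCode("60614-12"));    // false (too few digits after the dash)
console.log(isZipCode("606141234"));   // false (the dash is missing)
```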

Example zip_code_checker.htm