Regular Expressions Upsorn Praphamontripong CS 1110

Slides:



Advertisements
Similar presentations
Regular Expressions BKF03 Brian Ciccolo. Agenda Definition Uses – within Aspen and beyond Matching Replacing.
Advertisements

Regular Expressions (in Python). Python or Egrep We will use Python. In some scripting languages you can call the command “grep” or “egrep” egrep pattern.
Python: Regular Expressions
Regular Expression Original Notes by Song Guo. What Regular Expressions Are Exactly - Terminology a regular expression is a pattern describing a certain.
JavaScript, Third Edition
Regular Expressions & Automata Fawzi Emad Chau-Wen Tseng Department of Computer Science University of Maryland, College Park.
1 Overview Regular expressions Notation Patterns Java support.
Regex Wildcards on steroids. Regular Expressions You’ve likely used the wildcard in windows search or coding (*), regular expressions take this to the.
Regular Expressions. String Matching The problem of finding a string that “looks kind of like …” is common  e.g. finding useful delimiters in a file,
More on Regular Expressions Regular Expressions More character classes \s matches any whitespace character (space, tab, newline etc) \w matches.
Binary Search Trees continued Trees Draw the BST Insert the elements in this order 50, 70, 30, 37, 43, 81, 12, 72, 99 2.
Regular Language & Expressions. Regular Language A regular language is one that a finite state machine (fsm) will accept. ‘Alphabet’: {a, b} ‘Rules’:
Lesson 3 – Regular Expressions Sandeepa Harshanganie Kannangara MBCS | B.Sc. (special) in MIT.
Last Updated March 2006 Slide 1 Regular Expressions.
Regular Expression Darby Tien-Hao Chang (a.k.a. dirty) Department of Electrical Engineering, National Cheng Kung University.
Pattern matching with regular expressions A common file processing requirement is to match strings within the file to a standard form, e.g. address.
Methods in Computational Linguistics II with reference to Matt Huenerfauth’s Language Technology material Lecture 4: Matching Things. Regular Expressions.
Introduction to Computing Using Python Regular expressions Suppose we need to find all addresses in a web page How do we recognize addresses?
RegExp. Regular Expression A regular expression is a certain way to describe a pattern of characters. Pattern-matching or keyword search. Regular expressions.
Regular Expressions Regular expressions are a language for string patterns. RegEx is integral to many programming languages:  Perl  Python  Javascript.
Python Regular Expressions Easy text processing. Regular Expression  A way of identifying certain String patterns  Formally, a RE is:  a letter or.
Regular Expressions.
REGEX. Problems Have big text file, want to extract data – Phone numbers (503)
Overview A regular expression defines a search pattern for strings. Regular expressions can be used to search, edit and manipulate text. The pattern defined.
When you read a sentence, your mind breaks it into tokens—individual words and punctuation marks that convey meaning. Compilers also perform tokenization.
Module 6 – Generics Module 7 – Regular Expressions.
Regular Expressions What is this line all about? while (!($search =~ /^\s*$/)) { It’s a string search just like before, but with a huge twist – regular.
GREP. Whats Grep? Grep is a popular unix program that supports a special programming language for doing regular expressions The grammar in use for software.
May 2008CLINT-LIN Regular Expressions1 Introduction to Computational Linguistics Regular Expressions (Tutorial derived from NLTK)
Regular Expressions CS 2204 Class meeting 6 Created by Doug Bowman, 2001 Modified by Mir Farooq Ali, 2002.
CompSci 101 Introduction to Computer Science November 18, 2014 Prof. Rodger.
Unit 11 –Reglar Expressions Instructor: Brent Presley.
CGS – 4854 Summer 2012 Web Site Construction and Management Instructor: Francisco R. Ortega Chapter 5 Regular Expressions.
COMPE 111 Introduction to Computer Engineering Programming in Python Atılım University
1 CSE 2337 Chapter 7 Organizing Data. 2 Overview Import unstructured data Concatenation Parse Create Excel Lists.
An Introduction to Regular Expressions Specifying a Pattern that a String must meet.
Regular expressions Day 11 LING Computational Linguistics Harry Howard Tulane University.
OOP Tirgul 11. What We’ll Be Seeing Today  Regular Expressions Basics  Doing it in Java  Advanced Regular Expressions  Summary 2.
May 2006CLINT-LIN Regular Expressions1 Introduction to Computational Linguistics Regular Expressions (Tutorial derived from NLTK)
CompSci 101 Introduction to Computer Science April 7, 2015 Prof. Rodger.
Lesson 4 String Manipulation. Lesson 4 In many applications you will need to do some kind of manipulation or parsing of strings, whether you are Attempting.
Regular Expressions In Javascript cosc What Do They Do? Does pattern matching on text We use the term “string” to indicate the text that the regular.
Regular Expressions Copyright Doug Maxwell (
RE Tutorial.
Perl Regular Expression in SAS
Strings and Serialization
CSC1018F: Regular Expressions
Looking for Patterns - Finding them with Regular Expressions
Lecture 19 Strings and Regular Expressions
CSC 594 Topics in AI – Natural Language Processing
Perl-Compatible Regular Expressions Part 1
String Processing Upsorn Praphamontripong CS 1110
CSC 594 Topics in AI – Natural Language Processing
Advanced Find and Replace with Regular Expressions
i206: Lecture 19: Regular Expressions, cont.
Selenium WebDriver Web Test Tool Training
CS 1111 Introduction to Programming Fall 2018
Nate Brunelle Today: Regular Expressions
Regular Expressions and Grep
Lecture 25: Regular Expressions
Nate Brunelle Today: Regular Expressions
Regular Expression in Java 101
Nate Brunelle Today: Regular Expressions
Nate Brunelle Today: Regular Expressions
Nate Brunelle Today: Regular Expressions
REGEX.
ADVANCE FIND & REPLACE WITH REGULAR EXPRESSIONS
Lecture 23: Regular Expressions
Perl Regular Expressions – Part 1
LING 388: Computers and Language
Presentation transcript:

Regular Expressions Upsorn Praphamontripong CS 1110 Introduction to Programming Spring 2017

Overview: Regular Expressions What are regular expressions? Why and when do we use regular expressions? How do we define regular expressions? How are regular expressions used in Python? CS 1110

What is Regular Expression? Special string for describing a pattern of characters May be viewed as a form of pattern matching Regular expression Description [abc] One of those three characters [a-z] A lowercase [a-z0-9] A lowercase or a number . Any one character \. An actual period * 0 to many ? 0 or 1 + 1 to many CS 1110

Why and When ? Why ? When ? To find all of one particular kind of data To verify that some piece of text follows a very particular format When ? Used when data are unstructured or string operations are inadequate to process the data Example of unstructured data https://cs1110.cs.virginia.edu/s16/code/2012debate.txt Example of structured data where we know how each piece is separated https://www.wunderground.com/history/airport/KCHO/2015/10/15/DailyHistory.html?format=1 CS 1110

How to Define Regular Expressions Mark regular expressions as raw strings r" Use square brackets "[" and "]" for “any character” r"[bce]" matches either “b”, “c”, or “e” Use ranges or classes of characters r"[A-Z]" matches any uppercase letter r"[a-z]" matches any lowercase letter r"[0-9]" matches any number Note: use "-" right after [ or before ] for an actual "-" r"[-a-z]" matches "-" followed by any lowercase letter CS 1110

How to Define Regular Expressions (2) Combine sets of characters r"[bce]at" starts with either “b”, “c”, or “e”, followed by “at” This regex matches text with “bat”, “cat”, and “eat”. How about “concatenation”? Use "." for “any character” r".at" matches three letter words, ending in “at” Use "\." for an actual period r"at\." matches “at.” CS 1110

How to Define Regular Expressions (3) Use "*" for 0 to many r"[a-z]*" matches text with any number of lowercase letter Use "?" for 0 or 1 r"[a-z]?" matches text with 0 or 1 lowercase letter Use "+" for 1 to many r"[a-z]+" matches text with at least 1 lowercase letter CS 1110

How to Define Regular Expressions (4) Use "^" for negate r"[^a-z]" matches anything except lowercase letters r"[^0-9]" matches anything except decimal digits Use "^" for “start” of string r"^[a-zA-Z]" must start with a letter Use "$" for “end” of string r".*[a-zA-Z]$" must end with a letter Use "{" and "}" to specify the number of characters r"[a-zA-Z]{2,3}" must contain 2-3 long letters CS 1110

Predefined Character Classes \d matches any decimal digit -- i.e., [0-9] \D matches any non-digit character -- i.e., [^0-9] \s matches any whitespace character -- i.e., [\t\n] (tab, new line) \S matches any non-whitespace -- i.e., [^\t\n] \\ matches a literal backslash CS 1110

Exercise: Defining Regular Expressions Names Phone numbers UVA Computing ID r"[A-Z][a-z]+" r"[0-9][0-9][0-9]-[0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]" r"[a-z][a-z][a-z]?[0-0][a-z][a-z]?" CS 1110

How to Use Regular Expressions in Python Import re module Define a regular expression (use a tool, http://regexr.com/) Create a regular expression object that match the pattern Search / find the pattern in the given text or import re regex = re.compile(r"[A-Z][a-z]*") results = regex.search(text) results = regex.findall(text) CS 1110

re.compile(pattern) Compile a regular expression pattern into a regular expression object regex = re.compile(r"[A-Z][a-z]*") CS 1110

re.search(pattern, string) Scan through string looking for the first location where the pattern matches and return a match object. Otherwise, return None if a match is not found. A match object contains group()-return the match object, start()-return first index of the match, and end()-return last index of the match regex = re.compile(r"[A-Z][a-z]*") results = regex.search(text) = results = re.search(r"[A-Z][a-z]*"), text) CS 1110

re.findall(pattern, string) Return a list of strings of all non-overlapping matches of pattern in string The string is scanned left-to-right The matches are returned in the order found regex = re.compile(r"[A-Z][a-z]*") results = regex.findall(text) CS 1110