Published by Lindsay Riley. Modified over 9 years ago.
1
12. Regular Expressions
2
Motto: I don't play accurately - any one can play accurately - but I play with wonderful expression. As far as the piano is concerned, sentiment is my forte. I keep science for life. - Oscar Wilde
3
Concepts
Regular Expressions
– allow searching for a pattern within a text string
– the patterns can be rather complex
– same idea as "wildcard" characters (compare SQL), but much more expressive
– often abbreviated, e.g. as RegExp
– RegExps match as much as possible: their quantifiers are greedy by default
Theoretical underpinnings
– nondeterministic finite automata (NFA)
– regular grammars
– but some constructs extend the functionality further, even beyond CFG (context-free grammars)
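Greedy matching is easiest to see on a string with more than one possible end point. The sketch below is illustrative only: the sample text is made up, and the lazy form (a ? after the quantifier) is an addition not covered on the slide.

```javascript
// Greedy matching: .* takes as much text as it can while still allowing a match.
const text = "a1z a2z";

// The greedy pattern runs from the first "a" to the LAST "z".
const greedy = text.match(/a.*z/)[0];

// A ? after the quantifier makes it lazy: it stops at the FIRST "z".
const lazy = text.match(/a.*?z/)[0];

console.log(greedy); // "a1z a2z"
console.log(lazy);   // "a1z"
```

If the shorter match is what you want, the lazy form or a narrower character class is usually the fix.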
4
Support
Popular, widely supported
Directly in scripting languages
– JavaScript: special syntax
– PHP: functions
– Ruby
– Perl
As libraries
– Java's java.util.regex package
5
JavaScript RegExp
Directly as argument of methods of the String object
– string.match(regexp) returns an array of the substrings that matched the regexp pattern (all matches with the g modifier, otherwise the first match and its captured groups), null if there is no match
– string.replace(regexp,by) returns a new string where the first match (or, with the g modifier, every match) is replaced with the by string
– string.search(regexp) returns the index of the first substring that matched the regexp pattern, -1 if there is no match
– string.split(regexp) returns an array of the substrings of string separated by matches of regexp
regexp argument
– enclosed in /, e.g., /ex/ matches the first occurrence of "ex"
– optional modifiers placed as suffix
g (global); used in match() and replace()
– e.g., /ex/g matches all occurrences of "ex"
i (ignore case)
– e.g., /ex/i matches the first occurrence of "ex", "EX", "Ex" or "eX"; combine with g (/ex/gi) to match all of them
m (multiline)
– ^ and $ also match at the beginning and end of each line
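A minimal sketch of the four String methods and the g and i modifiers described above. The sample strings and variable names are invented for illustration.

```javascript
const s = "ex, EX and eX";

// match(): without g only the first match; with g an array of all matches.
const firstOnly = s.match(/ex/i)[0];      // "ex"
const allLower = s.match(/ex/g);          // ["ex"]
const allAnyCase = s.match(/ex/gi);       // ["ex", "EX", "eX"]

// replace(): first match only, or every match with g.
const replacedFirst = s.replace(/ex/i, "#");   // "#, EX and eX"
const replacedAll = s.replace(/ex/gi, "#");    // "#, # and #"

// search(): index of the first match, or -1 if there is none.
const found = s.search(/EX/);     // 4
const missing = s.search(/zz/);   // -1

// split(): the regexp describes the separators.
const parts = "a, b;c".split(/[,;]\s*/);  // ["a", "b", "c"]

console.log(allAnyCase, replacedAll, found, missing, parts);
```

Note that match() returns null rather than an empty array when nothing matches, so real code should check the result before indexing into it.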
6
PHP RegExp
Functions with $regexp and $string arguments
– ereg($regexp,$string [,&$matches]) returns the length of the matched string, false if there is no match
array reference &$matches: if given, it will be filled with the matched string in $matches[0] and the matched subpatterns in subsequent elements
– ereg_replace($regexp,$by,$string) returns a string where the matched patterns were replaced with the $by string
– split($regexp,$string [,$limit]) returns an array of the substrings of $string that were separated by patterns matching $regexp
optional $limit determines how many substrings to return (the last one contains the remainder)
– eregi(), eregi_replace(), spliti() same as ereg(), ereg_replace() and split(), but ignore case
– preg_match($regexp,$string) similar to ereg(), but uses Perl-compatible patterns written with delimiters; see PHP documentation
if a global search for all matches is to be performed, ereg() must be called in a loop
Note: the ereg family (ereg, eregi, ereg_replace, eregi_replace, split, spliti) was deprecated in PHP 5.3 and removed in PHP 7.0; current code uses the preg_ functions instead
7
Syntax in JavaScript
By "element" we mean a character or a group
. any character (except a line terminator)
? one occurrence of the preceding element or nothing
* any number of occurrences of the preceding element, incl. none
– e.g., a.*z matches the largest substring that starts with a and ends with z, incl. "az"
+ any number of occurrences of the preceding element, but at least one
– e.g., a.+z matches the largest substring that starts with a and ends with z, not including "az"
– note that "azz" and "aaz" are matched
{n} exactly n occurrences of the preceding element
{m,n} between m and n occurrences of the preceding element
^ beginning of the string
$ end of the string
a sequence of elements means that such a sequence must be matched
– e.g., a.z matches "axz", "a5z", "aQz", etc.
[] alternative elements
– e.g., [ab] means a or b
[^ ] none of the alternative elements
– e.g., [^ab] means any character that is neither a nor b
- range
– e.g., [a-zA-Z] means a through z or A through Z, i.e. all lower-case and upper-case letters
| or
– e.g., ab|yz matches "ab" or "yz"
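The syntax elements above can be tried out one at a time. This is a small sketch; firstMatch is a hypothetical helper written for this example, and the inputs are made up.

```javascript
// Return the first matched substring, or null when nothing matches.
const firstMatch = (re, str) => {
  const m = str.match(re);
  return m ? m[0] : null;
};

const results = {
  star: firstMatch(/a.*z/, "xaz!"),            // "az": .* may match nothing
  plusShort: firstMatch(/a.+z/, "xaz!"),       // null: .+ needs at least one character
  plusLong: firstMatch(/a.+z/, "azz"),         // "azz"
  optional: firstMatch(/colou?r/, "color"),    // "color": the u is optional
  counted: firstMatch(/^[0-9]{2,3}$/, "123"),  // "123": two to three digits, whole string
  tooLong: firstMatch(/^[0-9]{2,3}$/, "1234"), // null: four digits
  negated: firstMatch(/[^ab]/, "abc"),         // "c": first character that is not a or b
  either: firstMatch(/ab|yz/, "xyz"),          // "yz"
};

console.log(results);
```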
8
Special Characters
Denoted by \
– \/ : /
– \b : word boundary (inside a character class, [\b] is a backspace)
– \t : tab character
– \n : line feed
– \r : carriage return
– \f : form feed
– \s : whitespace character, e.g. [ \t\r\n\f]
– \d : digit, i.e. [0-9]
– \w : word character, i.e. [a-zA-Z0-9_]
– \S : not a whitespace character, i.e. [^\s]
– \D : not a digit, i.e. [^\d]
– \W : not a word character, i.e. [^\w]
– any other character preceded by \ means the character itself
– the "meta-characters" need to be escaped: \\, \/, \[, \], \., \?, \|, \+, \*, \(, \), \^, \$, \-, \{, \}
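A short sketch of the character classes and escapes in action. It assumes standard JavaScript semantics, in which \b outside a character class is a word boundary; the sample strings are invented.

```javascript
// \d: digits
const digits = "Room 101".match(/\d+/)[0];               // "101"

// \s: any run of whitespace (tab and spaces here) acts as a separator
const words = "tab\tand  space".split(/\s+/);            // ["tab", "and", "space"]

// \b: word boundary, so only the standalone word "cat" matches
const wholeWord = "cat concat cat_1".match(/\bcat\b/g);  // ["cat"]

// \W: everything that is not a letter, digit or underscore
const stripped = "a-b_c!".replace(/\W/g, "");            // "ab_c"

// An unescaped . matches any character; \. matches only a literal dot
const anyChar = /3.14/.test("3x14");     // true
const literalDot = /3\.14/.test("3x14"); // false

console.log(digits, words, wholeWord, stripped, anyChar, literalDot);
```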
9
RegExp Capturing
If you enclose subpattern(s) in ( and ) within a RegExp, it is the text matched by those subpattern(s) that will be captured, i.e. returned or used
– e.g., \b(.*)@ will capture the first part of an email address
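A minimal sketch of how captured groups are returned by match() and used by replace(). The addresses are made up, and the tighter pattern [\w.]+ is an illustrative alternative, not part of the slide.

```javascript
// The slide's pattern applied to a bare address.
const bare = "jane.doe@example.com".match(/\b(.*)@/);
const whole = bare[0];   // "jane.doe@"  (the entire match)
const local = bare[1];   // "jane.doe"   (the first captured group)

// Inside longer text the greedy .* also captures what precedes the address...
const greedyCapture = "Mail jane@example.com".match(/\b(.*)@/)[1];    // "Mail jane"
// ...so a narrower character class is often a better fit.
const tightCapture = "Mail jane@example.com".match(/\b([\w.]+)@/)[1]; // "jane"

// In replace(), $1, $2, ... refer to the captured groups.
const swapped = "2024-05".replace(/(\d+)-(\d+)/, "$2/$1");            // "05/2024"

console.log(whole, local, greedyCapture, tightCapture, swapped);
```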
10
Sample RegExp
hex digit:
– [0-9a-fA-F]
identifier:
– [a-zA-Z_][a-zA-Z_0-9]*
email address:
– \b[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4}\b
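The sample patterns can be checked with RegExp test(). In this sketch the ^ and $ anchors on the first two patterns are an addition, so that the whole input must match rather than just a part of it; the test inputs are invented.

```javascript
// Anchored so the entire string must be a single hex digit / an identifier.
const hexDigit = /^[0-9a-fA-F]$/;
const identifier = /^[a-zA-Z_][a-zA-Z_0-9]*$/;
// The email pattern is used as given, so it can find an address inside text.
const email = /\b[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4}\b/;

const checks = {
  hexOk: hexDigit.test("F"),                // true
  hexBad: hexDigit.test("g"),               // false
  idOk: identifier.test("_count1"),         // true
  idBad: identifier.test("1count"),         // false: may not start with a digit
  mailBad: email.test("jane.doe@example"),  // false: no dot-separated ending
};
const foundAddress = "write to jane.doe@example.com today".match(email)[0];

console.log(checks, foundAddress); // foundAddress is "jane.doe@example.com"
```

The {2,4} at the end limits the final domain label to two to four letters, so this sample pattern rejects addresses with longer top-level domains; treat it as a teaching example rather than a complete validator.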