Regular Expressions: Providing a Search Pattern. Copyright © 2008-2015 Curt Hill.


1 Regular Expressions: Providing a Search Pattern

2 What are they?
A regular expression is a special text pattern for describing a search.
Certain character sequences in the pattern have special meaning.
Any other character simply stands for itself in the searched string.

3 Specials
The special characters include:
– [ ] \ ^ * $ . ? + ( ) { }
– The braces may be literal or special depending on their usage
Any other character just matches itself.
Thus Hello as a pattern just matches the string Hello.
Since many of these characters are also useful in ordinary strings, the escape is used to match them literally.

4 Escape
The backslash character is the escape.
Thus to look for an asterisk (a special) in a string, it must be escaped: \*
– This allows a search to find the asterisk
Regular expressions use some of the same escape sequences as the C family of languages:
– \n newline or linefeed
– \t tab
– \r carriage return

5 Coded escapes
An x and two hexadecimal digits may also follow the backslash.
Thus \x4E gives the character with hexadecimal value 4E (an N in ASCII).
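The two escape slides can be tried out with a short sketch. The sample strings and variable names below are assumptions made for illustration; only the patterns \* and \x4E come from the slides.

```javascript
// Illustrative sketch: sample strings are assumptions, not from the slides.
const star = /\*/;     // escaped asterisk: matches a literal '*'
const hexN = /\x4E/;   // \x4E: the character with hex code 4E, an 'N'
const tab = /\t/;      // \t: a tab character

console.log(star.test("2 * 3"));   // true
console.log(star.test("2 + 3"));   // false
console.log(hexN.test("North"));   // true
console.log(hexN.test("north"));   // false, the match is case sensitive
console.log(tab.test("a\tb"));     // true
```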

6 Positioning
There are two specials that force a position:
– ^ matches the beginning of the line
– $ matches the end of the line
Both of these match a position rather than a character.
Without these a pattern could match anywhere within a string.

7 Positioning examples
The pattern ^Hi will match any line that starts with the two characters H and i.
The pattern ,$ will match any line that ends with a comma.
The pattern ^Hello$ will match only a line that has Hello as its only content.
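A minimal sketch of the three positioning patterns above, using JavaScript's test method. The test strings are assumptions. Note that without the m modifier JavaScript treats ^ and $ as the start and end of the whole string, which is the same thing for the single-line strings used here.

```javascript
// Illustrative sketch: the input strings are assumptions.
const startsWithHi = /^Hi/;
const endsWithComma = /,$/;
const onlyHello = /^Hello$/;

console.log(startsWithHi.test("Hi there"));    // true
console.log(startsWithHi.test("Oh, Hi"));      // false, Hi is not at the start
console.log(endsWithComma.test("one, two,"));  // true
console.log(endsWithComma.test("one, two"));   // false
console.log(onlyHello.test("Hello"));          // true
console.log(onlyHello.test("Hello!"));         // false, extra character at the end
```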

8 Wildcards
The dot will match any one character.
– Except end of line control characters
Thus A.B could match ABB, ACB, A.B or any other three-character sequence starting with A and ending with B.
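The wildcard can be checked with a small sketch. The inputs are assumptions chosen to show both the matches listed above and the end-of-line exception.

```javascript
// Illustrative sketch: the input strings are assumptions.
const aDotB = /A.B/;

console.log(aDotB.test("ABB"));    // true
console.log(aDotB.test("ACB"));    // true
console.log(aDotB.test("A.B"));    // true, the dot also matches a literal dot
console.log(aDotB.test("AB"));     // false, the dot needs exactly one character
console.log(aDotB.test("A\nB"));   // false, the dot does not match a newline
```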

9 Repetition
It is often desirable to repeat a pattern a fixed number of times.
This is done by following the pattern with a set of braces with an integer inside.
Thus abbbc is the same as ab{3}c.

10 Repetition
There are three repetition characters which are more general.
Closure is the *
– It represents zero or more repetitions of the previous item
The + represents one or more repetitions of the previous item.
The ? represents zero or one occurrence of the previous item.

11 Examples
~* matches any number (including zero) of successive tildes.
\-* matches zero or more dashes.
.+ matches one or more of any character.
hats? matches either hat or hats.
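The repetition examples can be sketched as follows. The patterns are the ones from the slides, but the ^ and $ anchors are an addition made here so that each test applies to the whole string rather than to any part of it; the inputs are also assumptions.

```javascript
// Illustrative sketch: anchors and input strings are assumptions added for testing.
const threeBs = /^ab{3}c$/;   // exactly three b characters
const tildes = /^~*$/;        // zero or more tildes
const anything = /^.+$/;      // one or more of any character
const hats = /^hats?$/;       // hat, optionally followed by s

console.log(threeBs.test("abbbc"));  // true
console.log(threeBs.test("abbc"));   // false, only two b characters
console.log(tildes.test(""));        // true, zero repetitions are allowed
console.log(tildes.test("~~~"));     // true
console.log(anything.test(""));      // false, + needs at least one character
console.log(hats.test("hat"));       // true
console.log(hats.test("hats"));      // true
console.log(hats.test("hatss"));     // false
```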

12 Grouping
So far the repetitions could only be applied to a single character.
What is needed next is some type of grouping.
This is provided by parentheses.
Enclosing a pattern in parentheses makes it a group.
This group can then be followed by a repetition character.

13 Examples
(\*-)* will match:
– *-
– *-*-
– *-*-*- etc.
The asterisk inside the group is escaped so that it matches a literal asterisk.
The * is greedy: it will try to match as many of these as is possible.
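A short sketch of grouping and greediness. It assumes the intended pattern is a repeated literal asterisk followed by a dash, so the asterisk inside the group is escaped; the inputs and variable names are assumptions.

```javascript
// Illustrative sketch: assumes the group is a literal '*' followed by '-'.
const whole = /^(\*-)*$/;   // the entire string is zero or more "*-" groups
const starDash = /(\*-)*/;  // unanchored, used to show greedy matching

console.log(whole.test("*-"));       // true
console.log(whole.test("*-*-*-"));   // true
console.log(whole.test(""));         // true, zero repetitions
console.log(whole.test("*-*"));      // false, the last group is incomplete

// Greedy: the match takes all three groups, not just the first one.
const greedy = "*-*-*-x".match(starDash)[0];
console.log(greedy);                 // "*-*-*-"
```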

14 More interesting patterns
A number is pretty easy to understand from our perspective but not so easy to describe.
– Except in regular expressions
An integer is a string of digits.
– Possibly preceded by a plus or minus
So how is this done? With sets and repetition.

15 A set
A pair of brackets may be filled with characters.
This will match any one of them.
Thus the digits could be done with: [0123456789]
An integer could then be: [-+]?[0123456789]+
Any single vowel is: [aeiouAEIOU]

16 Ranges in sets
The digits and letters are somewhat more than we want to type.
A range is written with a dash: [0-9] is the same as [0123456789]
The letters are then: [a-zA-Z]
If you want a literal dash in a set, place it first.
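Putting sets, ranges and repetition together gives the integer pattern described above. This is a sketch: the ^ and $ anchors and the helper name isInteger are assumptions added so the whole string is checked.

```javascript
// Illustrative sketch: isInteger and the anchors are assumptions for testing.
const integer = /^[-+]?[0-9]+$/;   // optional sign, then one or more digits

function isInteger(text) {
  return integer.test(text);
}

console.log(isInteger("42"));    // true
console.log(isInteger("-42"));   // true
console.log(isInteger("+7"));    // true
console.log(isInteger("4-2"));   // false, the sign may only come first
console.log(isInteger("-"));     // false, at least one digit is required
console.log(isInteger(""));      // false
```

Because the dash is the first character in [-+], it is taken literally rather than as a range.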

17 Complement or Negation
You may place a caret ^ at the beginning of a set to ask for any character but those present.
Thus [^0-9] is any character but a digit.

18 Shortcut sets
Several classes are so commonly used that a shortcut exists. Each is an escaped character:
– \d is a digit: [0-9]
– \D is not a digit: [^0-9]
– \w is an alphanumeric or underscore: [a-zA-Z0-9_]
– \W is not an alphanumeric or underscore: [^a-zA-Z0-9_]
– \s is whitespace: [ \r\n\t\f\v] (\f is formfeed, \v is vertical tab)
– \S is not whitespace: [^ \r\n\t\f\v]
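The shortcut sets and the negated set can be compared side by side in a brief sketch. All input strings and variable names are assumptions chosen for illustration.

```javascript
// Illustrative sketch: input strings are assumptions.
const hasDigit = /\d/;
const noDigits = /^[^0-9]+$/;   // same idea as ^\D+$
const wordOnly = /^\w+$/;
const hasSpace = /\s/;

console.log(hasDigit.test("room 12"));       // true
console.log(hasDigit.test("room"));          // false
console.log(noDigits.test("abc"));           // true
console.log(noDigits.test("abc1"));          // false
console.log(wordOnly.test("snake_case1"));   // true, underscore counts for \w
console.log(wordOnly.test("two words"));     // false, space is not in \w
console.log(hasSpace.test("two words"));     // true
console.log(hasSpace.test("a\tb"));          // true, tab is whitespace
```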

19 Specials
In some sense the right parenthesis, right bracket and dash are ambiguous as specials.
In certain contexts they are regular characters and in others they are special.
The right parenthesis and right bracket are only special if there is a matching left one before them.
The dash is only special inside a set and following another character.

20 Alternation
A set provides intuitive alternation.
The match process may choose any character within the set to use.
This alternation only applies among single characters.
There is also an alternation character.
– The vertical bar |
This allows either simple or complicated patterns to alternate.

21 Alternation
Thus A|E|I|O|U is equivalent to [AEIOU].
However, more interesting alternations are possible and useful.
– (abc)|(123) will match either of the two strings
– ([-+]?\d+)|(\w+) will match any string of characters that looks like a number or a word
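A sketch of alternation in JavaScript. The outer parentheses and the ^ and $ anchors are assumptions added here: the vertical bar has the lowest precedence, so without the group an anchored pattern such as ^abc|123$ would mean "starts with abc" or "ends with 123". The number-or-word pattern is written with \d+ so that the sign may appear only once, at the front.

```javascript
// Illustrative sketch: anchors, grouping and inputs are assumptions.
const vowelBar = /^(A|E|I|O|U)$/;
const vowelSet = /^[AEIOU]$/;
const abcOr123 = /^(abc|123)$/;
const numberOrWord = /^([-+]?\d+|\w+)$/;

console.log(vowelBar.test("E") === vowelSet.test("E"));   // true, both match
console.log(vowelBar.test("Y") === vowelSet.test("Y"));   // true, both fail
console.log(abcOr123.test("abc"));       // true
console.log(abcOr123.test("123"));       // true
console.log(abcOr123.test("abc123"));    // false, only one alternative at a time
console.log(numberOrWord.test("-12"));   // true, looks like a number
console.log(numberOrWord.test("hello")); // true, looks like a word
console.log(numberOrWord.test("+-"));    // false, neither a number nor a word
```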

22 How to use in JavaScript?
There are two ways that deserve some attention.
Strings have search and replace methods.
– Easiest
– Will deal with this one first
The RegExp object
– Most versatile and most complicated

23 String search
The search method takes a RegExp pattern and returns an integer position.
The result is the index of the match if found and -1 if the pattern has not been found.
If the pattern is a string it is converted into a RegExp.
– You cannot always use the other features of the RegExp object
– It is a powerful feature anyhow

24 One little glitch
Since the escape is the \ for both strings and regular expressions, we have a little problem.
To code the pattern \. for a literal dot, we would have to code: "\\."
Since this is awkward, we do something else.

25 Regular Expression Pattern
JavaScript has an alternative form for regular expression patterns.
Instead of enclosing the pattern in quotes, where the string escape must also be dealt with, it uses the forward slash as the delimiter.
Thus /\./ is a valid regular expression pattern equivalent to "\\."

26 Slash Notation
This notation looks funny but avoids the doubling of the escape character.
It may be assigned to variables: var s = /\$\d*/
Doing so makes s a RegExp object.
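The difference between the quoted form and the slash form can be seen in a small sketch. The file name used as input is an assumption; the patterns are the literal-dot examples from the slides.

```javascript
// Illustrative sketch: the input "file.txt" is an assumption.
const fromString = new RegExp("\\.");  // string form: the backslash is doubled
const fromLiteral = /\./;              // slash form: no doubling needed

console.log(fromString.source === fromLiteral.source);  // true, same pattern
console.log(fromLiteral instanceof RegExp);              // true

const dotAt = "file.txt".search(fromLiteral);
const dotAtFromString = "file.txt".search("\\.");
const anyCharAt = "file.txt".search(".");   // unescaped dot is the wildcard

console.log(dotAt);            // 4
console.log(dotAtFromString);  // 4
console.log(anyCharAt);        // 0, the wildcard matches the first character
```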

27 Pattern Modifiers
There are several pattern modifiers.
– Lower case letters that follow the slash pattern notation
An i means ignore case on the whole pattern.
– /[A-Z]*/i will match any run of letters, upper or lower case
Others are possible as well.
– m (multiline) and g (global)
These are also known as flags.
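A hedged sketch of the i and m modifiers. The patterns are anchored and the inputs invented here for illustration; the g modifier is left to the global searching slides.

```javascript
// Illustrative sketch: anchors and input strings are assumptions.
const upperOnly = /^[A-Z]+$/;
const anyCase = /^[A-Z]+$/i;   // i: ignore case on the whole pattern

console.log(upperOnly.test("HELLO"));  // true
console.log(upperOnly.test("hello"));  // false
console.log(anyCase.test("hello"));    // true

const stringStart = /^b/;    // ^ means the start of the whole string
const lineStart = /^b/m;     // m: ^ also matches right after a newline

console.log(stringStart.test("a\nb"));  // false
console.log(lineStart.test("a\nb"));    // true
```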

28 Search example
Consider:
    var s = "2314 Misc $23.85 in stock";
    // A pattern for money
    var numpat = /\$\d*\.\d*/;
    var pos = s.search(numpat);
    document.write(" position is ", pos);
The result displayed is 10, the index of the dollar sign (positions count from zero).

29 String Replace
A search is not the only thing available.
– There is also a replace
It takes two parameters:
– The search pattern
– The replacement string
It returns the new string; the original string is not changed.
Only the first match is replaced unless the pattern carries the g modifier.

30 Example
This code:
    s = "Welcome to VCSU. VCSU is cool.";
    t = s.replace(/VCSU/, "Valley City State");
    document.write(" ", t);
will provide the following output:
    Welcome to Valley City State. VCSU is cool.
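The same replacement can be compared with and without the g modifier. This is a sketch that uses console.log in place of document.write so it can run outside a browser; the variable names are assumptions.

```javascript
// Illustrative sketch: console.log is used instead of document.write.
const s = "Welcome to VCSU. VCSU is cool.";

const first = s.replace(/VCSU/, "Valley City State");   // first match only
const all = s.replace(/VCSU/g, "Valley City State");    // g: every match

console.log(first);  // Welcome to Valley City State. VCSU is cool.
console.log(all);    // Welcome to Valley City State. Valley City State is cool.
console.log(s);      // the original string is unchanged
```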

31 Match
The match method is somewhat more complicated and will not be considered seriously here.
It is similar to search, but it returns the matched text rather than a position.
Without the g modifier it returns an array describing the first match; with g it returns an array of all the matched strings.
It returns null if nothing matches.
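A brief sketch of what match returns in each case. The sample text is borrowed from the later exec example; the variable names are assumptions.

```javascript
// Illustrative sketch: variable names are assumptions.
const text = "the answers are 239 and 512";

const firstOnly = text.match(/[0-9]+/);    // no g: details of the first match
const every = text.match(/[0-9]+/g);       // g: all matched strings
const none = "no digits".match(/[0-9]+/);  // no match at all

console.log(firstOnly[0]);     // "239"
console.log(firstOnly.index);  // 16, where the first match starts
console.log(every);            // [ '239', '512' ]
console.log(none);             // null
```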

32 RegExp object
Clearly there is more that could be learned from the pattern match.
We would like to know:
– What actual string was matched
– What was the last position of the matched string
– Among many others
This will also help us to modify how things are done.

33 Constructor
Just assigning a pattern to a variable does construction: re = /\d*/i;
You may also use a regular constructor.
– The first parameter is the pattern
– The second is the modifiers
– re = new RegExp("Hello", "i")
When the pattern is given as a string, any backslashes in it must be doubled.
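The two construction styles can be compared in a short sketch. The test strings and variable names are assumptions; source and flags are standard properties of RegExp objects (flags requires an ES2015 or later engine).

```javascript
// Illustrative sketch: input strings and names are assumptions.
const literal = /Hello/i;
const built = new RegExp("Hello", "i");   // pattern string, then modifiers

console.log(built.source);   // "Hello"
console.log(built.flags);    // "i"
console.log(literal.test("say HELLO"));  // true
console.log(built.test("say HELLO"));    // true

// The constructor is handy when the pattern is only known at run time.
const word = "cool";
const dynamic = new RegExp(word, "i");
console.log(dynamic.test("VCSU is COOL."));  // true
```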

34 exec Method
The exec method returns the characters that matched.
The parameter is the string to search.
Example:
    re = /[0-9]+/;
    s = re.exec("answers 239 and 512");
The result is an array whose first element, s[0], is the matched string "239"; s.index gives the position of the match.
It does the search thing but produces the matched text instead of a number.
– Returns null for failure
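A minimal sketch of exec without the g modifier, showing the matched text, its position and the failure case. The variable names and the second input string are assumptions.

```javascript
// Illustrative sketch: names and the failing input are assumptions.
const re = /[0-9]+/;

const result = re.exec("answers 239 and 512");
console.log(result[0]);     // "239", the characters that matched
console.log(result.index);  // 8, where the match starts

const failed = re.exec("no digits here");
console.log(failed);        // null

console.log(re.lastIndex);  // 0, unchanged because g is not set
```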

35 Global searching
You may set the global searching modifier with the g suffix.
Each search will set the lastIndex property to where the matched text ended.
– That is, the first location not matched
A subsequent search will start at this location.
If the object does not have global set, lastIndex will not be changed.

36 Example
Consider:
    re = /[0-9]+/g;
    str = "the answers are 239 and 512";
    s = re.exec(str);
    t = re.exec(str);
The first element of s, s[0], will hold the 239 and t[0] the 512.
More serious manipulations could use lastIndex to do more complicated things.
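The pair of exec calls above generalizes to a loop. The following is a sketch, assuming we want every match together with its position; the found array and its field names are hypothetical and not part of the slides.

```javascript
// Illustrative sketch: the found array and its field names are assumptions.
const re = /[0-9]+/g;
const str = "the answers are 239 and 512";

const found = [];
let m;
// Each exec call resumes at re.lastIndex; exec returns null when no match is left.
while ((m = re.exec(str)) !== null) {
  found.push({ text: m[0], start: m.index, next: re.lastIndex });
}

console.log(found);
// [ { text: '239', start: 16, next: 19 }, { text: '512', start: 24, next: 27 } ]
console.log(re.lastIndex);  // 0, reset after the failed final search
```

Because lastIndex is stored on the RegExp object itself, reusing one global pattern on a different string without resetting lastIndex to 0 can skip matches.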

