BTANT129 w61 Regular expressions step by step Tamás Váradi


1 Regular expressions step by step
Tamás Váradi, varadi@nytud.hu

2 What are they?
Regular expressions (regexps) define a pattern that can match a whole set of strings.
- Powerful, compact, fast
- Useful for all sorts of text processing tasks

3 Where can I use them?
- In text editors and word processors (even in MS Word, to some extent!), for example:
  - TextPad, EditPad Pro (to name but two)
- In special programs that search a set of files:
  - grep, egrep, sed (free)
  - PowerGREP
  - Visual REGEXP
- In programming languages:
  - Perl, Python and other so-called scripting languages

4 What about INTEX?
Yes, INTEX has a built-in regexp facility, but it is somewhat limited and idiosyncratic (INTEX offers graphs as an alternative).
In this lecture we cover regular expressions as they are used in the text processing tools mentioned above.

5 Is there a standard variety?
More or less. There are variants that differ in:
- notation
- features (expressive power, elegance, etc.)
Here we concentrate on what you can generally expect regular expressions to do, whichever variant you use.

6 First things first
Any character matches itself, except the characters that have a special meaning (metacharacters): \ | ( ) [ { ^ $ * + ? .
The pattern is tried against the text from top to bottom and from left to right, like a window sliding over the text.
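
To make the sliding-window idea concrete, here is a minimal sketch using Python's standard re module (Python is one of the languages listed earlier; the sample sentence is made up). It shows ordinary characters matching themselves, the search moving left to right, and one metacharacter behaving differently:

```python
import re

text = "the cat sat on the mat"

# Ordinary characters match themselves. re.search() tries the pattern at
# position 0, then 1, then 2, ... and stops at the first place it fits.
first = re.search("at", text)
print(first.start(), first.group())  # 5 at  (inside "cat")

# re.finditer() keeps sliding past each match, so every non-overlapping
# occurrence is reported in left-to-right order.
positions = [m.start() for m in re.finditer("at", text)]
print(positions)  # [5, 9, 20]

# A metacharacter is different: "." is not a literal full stop here.
print(bool(re.search("c.t", "cut")))  # True
```

Other tools (grep, an editor's search box) follow the same left-to-right scan, though how they report matches differs.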

7 Special characters
- . matches any single character (in most tools, any character except a newline)
- ? matches the preceding character zero times or once (at most once)
- + matches the preceding character one or more times (at least once)
- * matches the preceding character zero or more times
- {n,m} matches the preceding character at least n and at most m times
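
A small Python sketch of each special character, assuming the standard re module. The helper name matches is hypothetical; it uses re.fullmatch so that a pattern counts as matching only if it covers the whole word, which keeps partial matches out of the picture:

```python
import re

def matches(pattern: str, word: str) -> bool:
    """Hypothetical helper: True if the pattern matches the WHOLE word."""
    return re.fullmatch(pattern, word) is not None

# .      any single character
print(matches("c.t", "cut"), matches("c.t", "ct"))                 # True False
# ?      the preceding character zero times or once
print(matches("colou?r", "color"), matches("colou?r", "colour"))   # True True
# +      the preceding character at least once
print(matches("go+al", "goooal"), matches("go+al", "gal"))         # True False
# *      the preceding character zero or more times
print(matches("go*al", "gal"), matches("go*al", "goooal"))         # True True
# {n,m}  the preceding character between n and m times
print([matches("a{2,3}", s) for s in ["a", "aa", "aaa", "aaaa"]])  # [False, True, True, False]
```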

8 Examples
- .at matches bat, cat, fat, pat, rat (any single character followed by "at")
- c*at matches at, cat, ccat, cccat, etc.
- Guess what c* will match, and why.
- c+at matches cat, ccat, cccat, etc., but not at
- c?at matches at and cat
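
The examples can be checked mechanically. This Python sketch (standard re module; the word list is made up) tests each pattern against whole words with re.fullmatch, and then shows the answer to the c* question:

```python
import re

words = ["at", "bat", "cat", "ccat", "cccat", "fat", "dog"]
patterns = [".at", "c*at", "c+at", "c?at"]

# For each pattern, keep the words it matches in full.
results = {p: [w for w in words if re.fullmatch(p, w)] for p in patterns}
for p, hits in results.items():
    print(f"{p:5} {hits}")

# The c* puzzle: * allows zero repetitions, so c* succeeds on ANY string,
# if necessary by matching the empty string at the very first position.
m = re.search("c*", "dog")
print(m.start(), repr(m.group()))  # 0 ''
```

With re.search() instead of re.fullmatch(), c?at would also be found inside ccat (its last three characters), which is how grep or an editor would report it.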

9 Anchor points
A regexp is tried against the text at every position, like a sliding window: a match can start wherever the first character of the regexp matches a character in the target text.
In grep and most editors, matching is done line by line by default, so the anchors refer to lines:
- ^ matches at the beginning of a line
- $ matches at the end of a line
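
A sketch of the anchors in Python's re module, with one caveat worth knowing: unlike grep, Python (like Perl) treats ^ and $ as the start and end of the whole string unless the re.MULTILINE flag is given. The three-line text and the helper name starts are made up for illustration:

```python
import re

text = "Dear Sir,\nthank you, Sir\nSir John"

def starts(pattern, flags=0):
    """Hypothetical helper: start positions of all matches of pattern in text."""
    return [m.start() for m in re.finditer(pattern, text, flags)]

print(starts("Sir"))                 # [5, 21, 25]  anywhere in the text
# Without re.MULTILINE, ^ means "start of the whole string" in Python.
print(starts("^Sir"))                # []  the string starts with "Dear"
# With re.MULTILINE, ^ and $ work line by line, as in grep or an editor.
print(starts("^Sir", re.MULTILINE))  # [25]  "Sir John"
print(starts("Sir$", re.MULTILINE))  # [21]  "thank you, Sir"
```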

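10 Groups and alternations
- (bla)* : parentheses group several characters into a unit, so the quantifier applies to the whole group; this matches the empty string, bla, blabla, blablabla, etc.
- Sir|Madam : the vertical bar separates alternatives; this matches either Sir or Madam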
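
The two patterns from this slide, tried out in Python's re module (the sample strings are made up). The last two lines show a point that often trips people up: | has the lowest precedence of all, so a group is needed to keep an alternation local:

```python
import re

# Grouping: the * applies to the whole group "bla", not just to "a".
print(bool(re.fullmatch(r"(bla)*", "blablabla")))  # True
print(bool(re.fullmatch(r"bla*", "blablabla")))    # False: bla* means bl, bla, blaa, ...
print(bool(re.fullmatch(r"(bla)*", "")))           # True: zero repetitions

# Alternation: either alternative may match.
print(re.findall(r"Sir|Madam", "Dear Sir or Madam"))  # ['Sir', 'Madam']

# "Dear Sir|Madam" means "Dear Sir" OR "Madam"; the group limits the choice.
print(bool(re.fullmatch(r"Dear (Sir|Madam)", "Dear Madam")))  # True
print(bool(re.fullmatch(r"Dear Sir|Madam", "Dear Madam")))    # False
```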

11 Character classes
- [aeiou] matches any one character from the set
- [^aeiou] matches any character except those in the set
- [a-zA-Z0-9]: a run of consecutive characters can be written as a range
Note: however many characters the set contains, it always stands for a single character in the pattern, so it is an alternation between single characters (an 'or' relation between characters).
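
A short Python sketch (standard re module, made-up sample strings) of the three kinds of class, plus the slide's note that a class always stands for exactly one character:

```python
import re

print(re.findall(r"[aeiou]", "banana split"))               # ['a', 'a', 'a', 'i']
print(re.findall(r"[^aeiou]", "quiet"))                     # ['q', 't']
print(re.findall(r"[a-zA-Z0-9]+", "Room 101, floor B!"))    # ['Room', '101', 'floor', 'B']

# A class stands for ONE character, however long the set is:
print(bool(re.fullmatch(r"b[aeiou]t", "bat")))    # True
print(bool(re.fullmatch(r"b[aeiou]t", "beat")))   # False: two vowels
print(bool(re.fullmatch(r"b[aeiou]+t", "beat")))  # True: + repeats the class
```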

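12 Extended features
- \d a digit
- \D a non-digit
- \s a whitespace character (space, tab, line break, etc.)
- \S a non-whitespace character
- \w a word character (letter, digit or underscore)
- \W a non-word character
- \b a word boundary (the position between a word character and a non-word character; it matches no character itself)
- \n a newline
- \t a tab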
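
The shorthand classes in Python's re module, on a made-up line of text. Two Python-specific assumptions: patterns are written as raw strings (r"...") so that the backslashes reach the regex engine unchanged, and in Python 3 \w and \d are Unicode-aware by default, so accented letters count as word characters:

```python
import re

line = "Order 66\tshipped on 2024-05-01 to user_42"

print(re.findall(r"\d+", line))  # ['66', '2024', '05', '01', '42']
print(re.findall(r"\w+", line))  # runs of letters, digits and underscores
print(re.split(r"\s+", line))    # ['Order', '66', 'shipped', 'on', '2024-05-01', 'to', 'user_42']

# \b matches no character, only the position at the edge of a word,
# so \bon\b finds "on" as a word but not inside "won" or "onion".
print([m.start() for m in re.finditer(r"\bon\b", "on won onion on")])  # [0, 13]

# Unicode-aware \w: accented letters are word characters.
print(re.findall(r"\w+", "Tamás Váradi"))  # ['Tamás', 'Váradi']
```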

13 Longest vs. shortest match
When using quantifiers with non-literal characters (., \w, \S, etc.), it is easy to get unintended matches, because by default a quantifier matches as much text as it can.
- .+ longest match (greedy, the default)
- .+? shortest match (lazy)
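
The classic illustration is matching tags in a line of HTML, sketched here with Python's re module (the HTML snippet is made up). The lazy +? is Perl-style syntax and not every tool supports it; POSIX-style grep and egrep, for instance, have no lazy quantifiers, so the last line shows a negated character class that gives the same result without one:

```python
import re

html = "<b>bold</b> and <i>italic</i>"

# Greedy: .+ grabs as much as it can, so the single match runs from the
# first "<" to the LAST ">" in the line.
print(re.findall(r"<.+>", html))    # ['<b>bold</b> and <i>italic</i>']

# Lazy: .+? stops at the first ">" that lets the match succeed.
print(re.findall(r"<.+?>", html))   # ['<b>', '</b>', '<i>', '</i>']

# Portable alternative: "one or more characters that are not >".
print(re.findall(r"<[^>]+>", html)) # ['<b>', '</b>', '<i>', '</i>']
```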

14 The escape character
Problem: what if we want to find characters that are regexp metacharacters (\ | ( ) [ { ^ $ * + ? .)?
Solution: precede them with \ to strip them of their special meaning, e.g. \( \$ \[ \? \. \\
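
Escaping in practice, as a Python sketch (standard re module, made-up text). It also uses re.escape(), a standard-library function that inserts the backslashes automatically when you need to search for a fixed piece of text:

```python
import re

text = "Prices: $5.00, $12.50 (approx.)"

# \$ and \. match a literal dollar sign and a literal full stop.
print(re.findall(r"\$\d+\.\d\d", text))   # ['$5.00', '$12.50']

# \( and \) match literal parentheses; the unescaped group captures the inside.
print(re.findall(r"\(([^)]*)\)", text))   # ['approx.']

# Unescaped, "." matches any character, so "3.5" also finds "345".
print(bool(re.search("3.5", "version 345")))             # True
# re.escape() adds the needed backslashes for fixed text.
print(bool(re.search(re.escape("3.5"), "version 345")))  # False
```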

15 Things to do
- Work through the tutorial at http://www.zvon.org/other/PerlTutorial/Output/contents.html
- Download one of the tools (Visual REGEXP, PowerGREP, EditPad Pro) and experiment with texts
- Follow the EditPad Pro tutorial, which you can find in its Help

