Published by Alexandrina Cain, modified over 9 years ago
1
BTANT129 w6
Regular expressions step by step
Tamás Váradi
varadi@nytud.hu
2
What are they?
Regular expressions (regexps) define a pattern that can match a whole series of strings
Powerful, compact, fast
Useful for all sorts of text-processing tasks
3
Where can I use them?
In text editors/word processors (even in MS Word, to some extent!), for example:
– TextPad, EditPad Pro (to name but two)
Special programs to search a set of files:
– grep, egrep, sed (free)
– PowerGREP
– Visual REGEXP
In programming languages:
– Perl, Python and other so-called scripting languages
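Because Perl and Python are on the list, here is a minimal sketch of the three most common operations (find the first match, find all matches, replace matches) using Python's standard `re` module. The choice of Python and the sample sentence are ours; the slides do not prescribe a language. The `r"..."` prefix makes the string "raw", so any backslashes reach the regex engine unchanged.

```python
import re

text = "Dear Sir, dear Madam, dear colleague"

# re.search scans from left to right and returns the first match (or None)
first = re.search(r"dear", text)
print(first.group(), first.start())  # dear 10

# re.findall returns every non-overlapping match as a list of strings;
# re.IGNORECASE makes the pattern case-insensitive
all_dears = re.findall(r"dear", text, flags=re.IGNORECASE)
print(all_dears)  # ['Dear', 'dear', 'dear']

# re.sub replaces every (case-sensitive) match with the replacement string
polite = re.sub(r"dear", "my dear", text)
print(polite)  # Dear Sir, my dear Madam, my dear colleague
```

The editors and grep-like tools offer these operations as "Find", "Find all" and "Replace". The pattern syntax is largely the same.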
4
What about INTEX?
Yes, INTEX has a built-in regexp facility
But it is somewhat limited and idiosyncratic (INTEX offers graphs as an alternative)
In this lecture, we cover regular expressions as used in the text-processing tools mentioned above
5
Is there a standard variety?
More or less
There are variants that differ in:
– notation
– features (expressive power, elegance, etc.)
Here we'll concentrate on what you can generally expect regular expressions to do
6
First things first
Any character matches itself
Except characters with a special meaning (metacharacters): \ | ( ) [ { ^ $ * + ? .
The pattern is applied to the text from top to bottom and from left to right, like a window sliding over the text
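A minimal Python sketch of the two points on this slide follows. The sample strings are made up for illustration. The first part shows that ordinary characters match themselves and that matches are reported left to right. The second part shows that an unescaped "." is not a literal dot.

```python
import re

text = "cat, category, scatter"

# Ordinary characters match themselves; finditer reports every match,
# scanning the text from left to right like a sliding window.
positions = [m.start() for m in re.finditer(r"cat", text)]
print(positions)  # [0, 5, 16]

# "." is a metacharacter: in the pattern 3.14 it matches ANY character,
# so the pattern also matches "3x14", not just a literal "3.14".
loose = re.search(r"3.14", "pi is not 3x14") is not None
print(loose)  # True
```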
7
Special characters
. matches any single character (except a newline, by default)
? matches the preceding character zero times or once (at most once)
+ matches the preceding character one or more times (at least once)
* matches the preceding character zero or more times
{n,m} matches the preceding character at least n and at most m times
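To see each quantifier in isolation, the sketch below tests a few made-up words against one pattern per quantifier. It uses Python's `re.fullmatch`, which succeeds only when the pattern covers the whole word, so partial matches do not blur the picture.

```python
import re

# Each quantifier applies to the single character right before it.
# re.fullmatch succeeds only if the pattern covers the WHOLE string.
cases = {
    r"b.t":       ["bat", "bit", "bt"],                    # . : exactly one character
    r"colou?r":   ["color", "colour", "colouur"],          # ? : zero or one u
    r"go+al":     ["gal", "goal", "gooooal"],              # + : one or more o
    r"go*al":     ["gal", "goal", "gooooal"],              # * : zero or more o
    r"go{2,3}al": ["goal", "gooal", "goooal", "gooooal"],  # {2,3}: two or three o
}

results = {
    pattern: [w for w in words if re.fullmatch(pattern, w)]
    for pattern, words in cases.items()
}
for pattern, matched in results.items():
    print(pattern, matched)
```

Note that "go{2,3}al" rejects both "goal" (too few o's) and "gooooal" (too many).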
8
Examples
.at matches bat, cat, fat, pat, rat
c*at matches at, cat, ccat, cccat, etc.
Guess what c* will match, and why?
c+at matches cat, ccat, cccat, etc., but not at
c?at matches at and cat
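The examples can be checked in Python (our choice of tool; the test string is made up). The last two lines give away the answer to the puzzle, so try it yourself first.

```python
import re

text = "at cat ccat"
star = re.findall(r"c*at", text)   # ['at', 'cat', 'ccat']
plus = re.findall(r"c+at", text)   # ['cat', 'ccat']
opt = re.findall(r"c?at", text)    # ['at', 'cat', 'cat']  (the last is inside "ccat")
print(star, plus, opt)

# The puzzle: c* may match ZERO c's, i.e. the empty string, so it already
# succeeds at the first position of any text, before reaching a real "c".
m = re.search(r"c*", "scat")
print(repr(m.group()), m.start())  # '' 0
```

Because c?at allows at most one c, inside "ccat" it can only match the final "cat".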
9
Anchor points
A regexp is tried against the text at every position where its first character matches a character in the target text (a sliding window)
Matching is done line by line by default (as in grep and most text editors)
^ : matches at the beginning of a line
$ : matches at the end of a line
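Programming languages may differ from grep here. In Python's `re` module, `^` and `$` refer to the whole string unless the `re.MULTILINE` flag is given. The sketch below, with a made-up three-line text, shows both behaviours.

```python
import re

text = "Sir John\nDear Sir\nSir Paul"

# Without flags, ^ matches only at the start of the whole string
whole = re.findall(r"^Sir", text)                    # ['Sir']
# With re.MULTILINE, ^ and $ match at the start/end of every line,
# which mimics the line-by-line behaviour of grep and text editors
starts = re.findall(r"^Sir", text, re.MULTILINE)     # ['Sir', 'Sir']
ends = re.findall(r"Sir$", text, re.MULTILINE)       # ['Sir']
print(whole, starts, ends)
```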
10
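Groups and alternations
(bla)* : parentheses group characters, so the quantifier applies to the whole group: "", bla, blabla, ...
Sir|Madam : the vertical bar separates alternatives: matches either Sir or Madam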
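A short Python sketch of both constructs follows, with made-up strings. It also shows a common pitfall: "|" has the lowest precedence. Without parentheses, "Dear Sir|Madam" means "Dear Sir" or "Madam", not "Dear" followed by either word.

```python
import re

# A quantifier after a group applies to the whole group
bla = {s: re.fullmatch(r"(bla)*", s) is not None for s in ["", "bla", "blabla", "blab"]}
print(bla)  # {'': True, 'bla': True, 'blabla': True, 'blab': False}

text = "Dear Sir or Dear Madam, Dear John"
# | has the lowest precedence: this means "Dear Sir" OR "Madam"
loose = [m.group() for m in re.finditer(r"Dear Sir|Madam", text)]
# Grouping limits the alternation to the word after "Dear "
grouped = [m.group() for m in re.finditer(r"Dear (Sir|Madam)", text)]
print(loose)    # ['Dear Sir', 'Madam']
print(grouped)  # ['Dear Sir', 'Dear Madam']
```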
11
Character classes
[aeiou] matches any one character in the set
[^aeiou] matches any one character not in the set
[a-zA-Z0-9] a run of consecutive characters can be written as a range
Note: however many characters the set lists, it always matches exactly one character in the text, so a class is a single-character alternation (an 'or' relation between characters)
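The following Python sketch illustrates each line of the slide on made-up words, including the key point in the note: a class stands for a single character.

```python
import re

vowels = re.findall(r"[aeiou]", "regular")      # ['e', 'u', 'a']
others = re.findall(r"[^aeiou]", "regular")     # ['r', 'g', 'l', 'r']
# Ranges: one or more capitals followed by one or more digits
codes = re.findall(r"[A-Z]+[0-9]+", "Course BTANT129, room B12")  # ['BTANT129', 'B12']

# However long the set, a class stands for exactly ONE character:
one = re.fullmatch(r"b[aeiou]t", "bat") is not None    # True
two = re.fullmatch(r"b[aeiou]t", "bait") is not None   # False: "ai" is two characters
print(vowels, others, codes, one, two)
```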
12
Extended features
\d : a digit
\D : a non-digit
\s : a whitespace character (space, tab, carriage return, newline)
\S : a non-whitespace character
\w : a word character (letter, digit or underscore)
\W : a non-word character
\b : a word boundary
\n : a newline
\t : a tab
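A few of these shorthands are shown below in Python, on made-up strings. In Python 3, `\d`, `\w` and `\s` are Unicode-aware for ordinary string patterns, so they also accept non-ASCII letters and digits. Other tools may be stricter.

```python
import re

digits = re.findall(r"\d+", "Room 101\tfloor 3, call 555-0123")   # ['101', '3', '555', '0123']
words = re.findall(r"\w+", "can't stop")                          # ['can', 't', 'stop']
# \s+ treats runs of spaces, tabs and newlines alike
tokens = re.split(r"\s+", "one  two\tthree\nfour")               # ['one', 'two', 'three', 'four']
# \b keeps "cat" from matching inside "concatenate"
cats = re.findall(r"\bcat\b", "cat concatenate cat's")           # ['cat', 'cat']
print(digits, words, tokens, cats)
```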
13
Longest vs. shortest match
When quantifiers are applied to non-literal characters (".", "\w", "\S", etc.), it is easy to get unintended matches
.+ : longest match (the default, "greedy")
.+? : shortest match ("lazy")
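A classic illustration uses tags in a made-up HTML snippet, shown here in Python. The greedy `.+` runs to the last ">" in the text. The lazy `.+?` stops at the first one. A negated class is often an equally good fix.

```python
import re

html = "<b>bold</b> and <i>italic</i>"
greedy = re.findall(r"<.+>", html)       # ['<b>bold</b> and <i>italic</i>']
lazy = re.findall(r"<.+?>", html)        # ['<b>', '</b>', '<i>', '</i>']
# Alternative: "any run of characters other than '>'"
negated = re.findall(r"<[^>]+>", html)   # ['<b>', '</b>', '<i>', '</i>']
print(greedy, lazy, negated)
```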
14
The escape character
Problem: what if we want to find characters that are metacharacters in regexps (\ | ( ) [ { ^ $ * + ? .)?
Solution: precede them with "\" to strip them of their special meaning, e.g. \( \$ \[ \? etc.
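The Python sketch below, using a made-up price string, contrasts an unescaped and an escaped pattern. It also shows `re.escape`, a standard-library helper that adds the backslashes for you when searching for a fixed string.

```python
import re

text = "Total: $4.50 (approx.)"

# Unescaped, "$" is the end-of-text anchor, so this pattern can never match
unescaped = re.findall(r"$4.50", text)        # []
# Escaped with "\", the metacharacters are matched literally
price = re.findall(r"\$4\.50", text)          # ['$4.50']
note = re.findall(r"\(approx\.\)", text)      # ['(approx.)']

# re.escape inserts the backslashes for a literal string automatically
pattern = re.escape("(approx.)")
print(unescaped, price, note, pattern)        # pattern is \(approx\.\)
```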
15
Things to do
Look up the tutorial at http://www.zvon.org/other/PerlTutorial/Output/contents.html
Download one of the tools (Visual REGEXP, PowerGREP, EditPad Pro) and experiment with texts
Follow the EditPad Pro tutorial, which you can find in its Help