Regular Expressions: basic and extended regular expressions, grep and egrep, and typical examples involving different regular expressions. By: Abhilash C B.


Regular Expression
A pattern of special characters used to match strings in a search.
Typically made up of special characters called metacharacters.
Regular expressions are used throughout UNIX:
  Editors: ed, ex, vi
  Utilities: grep, egrep, sed, and awk
CSCI 330 - The UNIX System

Metacharacters
Any non-metacharacter matches itself.
RE Metacharacter    Matches…
.                   Any one character, except newline
[a-z]               Any one of the enclosed characters (e.g. a-z)
*                   Zero or more of the preceding character
? or \?             Zero or one of the preceding character
+ or \+             One or more of the preceding character

The grep Utility
The "grep" command searches for text in file(s).
Examples:
% grep root mail.log
% grep r..t mail.log
% grep ro*t mail.log
% grep 'ro*t' mail.log
% grep 'r[a-z]*t' mail.log
Quote the pattern: unquoted, ro*t is first subject to filename expansion by the shell.
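As a rough sketch of how the three patterns above differ, the following runs them against a small made-up word list rather than mail.log (the sample words are an assumption, not taken from the slides). The patterns are single-quoted so the shell passes them to grep unchanged.

```shell
# Fix the locale so that [a-z] means exactly the ASCII lowercase letters.
LC_ALL=C; export LC_ALL

# Hypothetical sample input standing in for mail.log.
sample='root
rt
rat
roost
r2d2t
reboot'

# r..t      : r, any two characters, then t
dots=$(printf '%s\n' "$sample" | grep 'r..t')
# ro*t      : r, zero or more o, then t
star=$(printf '%s\n' "$sample" | grep 'ro*t')
# r[a-z]*t  : r, zero or more lowercase letters, then t
class=$(printf '%s\n' "$sample" | grep 'r[a-z]*t')

printf 'r..t matches:\n%s\n' "$dots"
printf 'ro*t matches:\n%s\n' "$star"
printf 'r[a-z]*t matches:\n%s\n' "$class"
```

Only root has exactly two characters between r and t, while ro*t also accepts rt, and the bracket form accepts any run of lowercase letters in between.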

More Metacharacters
RE Metacharacter    Matches…
^                   Beginning of line
$                   End of line
\char               Escapes the special meaning of the character following it
[^]                 One character not in the set
\<                  Beginning-of-word anchor
\>                  End-of-word anchor
( ) or \( \)        Tags matched characters to be used later (max = 9)
| or \|             Or grouping
x\{m\}              Repetition of character x, m times (m = integer)
x\{m,\}             Repetition of character x, at least m times
x\{m,n\}            Repetition of character x, between m and n times

An atom specifies what text is to be matched and where it is to be found. An operator combines regular expression atoms.

Atoms
An atom specifies what text is to be matched and where it is to be found.

Single-Character Atom
A single character matches itself.

Dot Atom
Matches any single character except for a newline character (\n).

Class Atom
Matches only a single character, which can be any of the characters defined in a set.
Example: [ABC] matches either A, B, or C.
Notes:
1) A range of characters is indicated by a dash, e.g. [A-Q].
2) Characters can be excluded from the set, e.g. [^0-9] matches any character other than a digit.
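The class atom can be tried out with a few one-character lines. This is a minimal sketch: the input characters are made up, and the ^ and $ anchors are added only so that each line must consist of exactly one character from the class.

```shell
# Fix the locale so ranges such as [A-Q] follow ASCII order.
LC_ALL=C; export LC_ALL

# Hypothetical input: one character per line.
chars='A
B
D
7
x'

# [ABC]  : exactly one of A, B, or C
in_set=$(printf '%s\n' "$chars" | grep '^[ABC]$')
# [A-Q]  : any one character in the range A through Q
in_range=$(printf '%s\n' "$chars" | grep '^[A-Q]$')
# [^0-9] : any one character that is not a digit
non_digit=$(printf '%s\n' "$chars" | grep '^[^0-9]$')

echo "set:       $(echo $in_set)"
echo "range:     $(echo $in_range)"
echo "non-digit: $(echo $non_digit)"
```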

Example: Classes

Short-hand classes
[:alnum:] [:alpha:] [:upper:] [:lower:] [:digit:] [:space:]
These names are used inside a bracket expression, e.g. [[:digit:]].

Anchors
Anchors tell where the next character in the pattern must be located in the text data.

Back References: \n
Used to retrieve saved text in one of nine buffers.
A back reference refers to the text in a saved buffer, e.g. \1 \2 \3 ... \9
More details on this later.

Operators

Sequence Operator
In a sequence, a series of atoms is written one after another in a regular expression, with no operator between them.

Alternation Operator: | or \|
The alternation operator (| or \|) is used to define one or more alternatives.
Note: which form is accepted depends on the version of "grep".

Repetition Operator: \{…\}
The repetition operator specifies that the atom or expression immediately before the repetition may be repeated.

Basic Repetition Forms

Short Form Repetition Operators: * + ?

Group Operator
When a group of characters is enclosed in parentheses, the next operator applies to the whole group, not only to the previous character.
Note: depends on the version of "grep"; with basic regular expressions use \( and \) instead.
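The difference between the backslashed and plain forms of the group and repetition operators can be sketched as follows. The input lines are invented for illustration, and grep -E is used here as the POSIX spelling of egrep.

```shell
# Hypothetical input lines.
input='abcbc
abc
abcc'

# Basic regular expression: group with \( \) and repeat with \{2\}.
bre=$(printf '%s\n' "$input" | grep 'a\(bc\)\{2\}')
# Extended regular expression: the same pattern without backslashes.
ere=$(printf '%s\n' "$input" | grep -E 'a(bc){2}')

echo "BRE match: $bre"
echo "ERE match: $ere"
```

In both cases the repetition applies to the whole group bc, so only abcbc matches; abcc does not, because the second c is not preceded by b.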

Grep detail and examples
grep is a family of commands:
grep     common version
egrep    understands extended REs (| + ? ( ) don't need a backslash)
fgrep    understands only fixed strings, and is therefore faster
rgrep    traverses subdirectories recursively

Commonly used "grep" options:
-c    Print only a count of matched lines.
-i    Ignore uppercase and lowercase distinctions.
-l    List all files that contain the specified pattern.
-n    Print matched lines and line numbers.
-s    Work silently; display nothing except error messages. Useful for checking the exit status. (In POSIX and GNU grep, -s instead suppresses error messages about missing or unreadable files, and -q is the quiet option.)
-v    Print lines that do not match the pattern.
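A small sketch of these options in action, using two throwaway files created with mktemp (the file names and contents are assumptions made up for this example). The quiet check uses -q, which is the POSIX option for testing the exit status without output.

```shell
# Scratch directory with two hypothetical files.
tmpdir=$(mktemp -d)
printf 'From alice\nfrom bob\nSubject: hi\n' > "$tmpdir/a.txt"
printf 'nothing here\n' > "$tmpdir/b.txt"

count=$(grep -c 'From' "$tmpdir/a.txt")               # lines containing From
count_i=$(grep -c -i 'from' "$tmpdir/a.txt")          # same, ignoring case
numbered=$(grep -n 'Subject' "$tmpdir/a.txt")         # line number:line
inverted=$(grep -v -i 'from' "$tmpdir/a.txt")         # lines NOT matching
files=$(cd "$tmpdir" && grep -l 'From' a.txt b.txt)   # names of matching files

# Exit status only: 0 means at least one line matched.
if grep -q 'alice' "$tmpdir/a.txt"; then found=yes; else found=no; fi

echo "-c: $count  -ci: $count_i  -n: $numbered"
echo "-v: $inverted  -l: $files  -q: $found"

rm -rf "$tmpdir"
```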

Example: grep with pipe
Pipe the output of the "ls -l" command to grep and select only directory entries.
% ls -l | grep '^d'
drwxr-xr-x 2 krush csci 512 Feb 8 22:12 assignments
drwxr-xr-x 2 krush csci 512 Feb 5 07:43 feb3
drwxr-xr-x 2 krush csci 512 Feb 5 14:48 feb5
drwxr-xr-x 2 krush csci 512 Dec 18 14:29 grades
drwxr-xr-x 2 krush csci 512 Jan 18 13:41 jan13
drwxr-xr-x 2 krush csci 512 Jan 18 13:17 jan15
drwxr-xr-x 2 krush csci 512 Jan 18 13:43 jan20
drwxr-xr-x 2 krush csci 512 Jan 24 19:37 jan22
drwxr-xr-x 4 krush csci 512 Jan 30 17:00 jan27
drwxr-xr-x 2 krush csci 512 Jan 29 15:03 jan29
Display the number of lines where the pattern was found. This is not the number of occurrences of the pattern.
% ls -l | grep -c '^d'
10

Example: grep with \< \>
% cat grep-datafile
northwest NW Charles Main 300000.00
western WE Sharon Gray 53000.89
southwest SW Lewis Dalsass 290000.73
southern SO Suan Chin 54500.10
southeast SE Patricia Hemenway 400000.00
eastern EA TB Savage 440500.45
northeast NE AM Main Jr. 57800.10
north NO Ann Stephens 455000.50
central CT KRush 575500.70
Extra [A-Z]****[0-9]..$5.00
Print the line if it contains the word "north".
% grep '\<north\>' grep-datafile
north NO Ann Stephens 455000.50
The remaining examples all use this same grep-datafile.

Example: grep with a\|b
Print the lines that contain either the expression "NW" or the expression "EA".
% grep 'NW\|EA' grep-datafile
northwest NW Charles Main 300000.00
eastern EA TB Savage 440500.45
Note: egrep works with |

Example: egrep with +
Print all lines containing one or more 3's.
% egrep '3+' grep-datafile
northwest NW Charles Main 300000.00
western WE Sharon Gray 53000.89
southwest SW Lewis Dalsass 290000.73
Note: grep works with \+

Example: egrep with RE: ?
Print all lines containing a 2, followed by zero or one period, followed by a digit.
% egrep '2\.?[0-9]' grep-datafile
southwest SW Lewis Dalsass 290000.73
Note: grep works with \?

Example: egrep with ( )
Print all lines containing one or more consecutive occurrences of the pattern "no".
% egrep '(no)+' grep-datafile
northwest NW Charles Main 300000.00
northeast NE AM Main Jr. 57800.10
north NO Ann Stephens 455000.50
Note: grep works with \( \) \+

Example: egrep with (a|b)
Print all lines containing the uppercase letter "S", followed by either "h" or "u".
% egrep 'S(h|u)' grep-datafile
western WE Sharon Gray 53000.89
southern SO Suan Chin 54500.10
Note: grep works with \( \) \|

Example: fgrep
Find all lines in the file containing the literal string "[A-Z]****[0-9]..$5.00". All characters are treated as themselves. There are no special characters.
% fgrep '[A-Z]****[0-9]..$5.00' grep-datafile
Extra [A-Z]****[0-9]..$5.00

Example: grep with ^
Print all lines beginning with the letter n.
% grep '^n' grep-datafile
northwest NW Charles Main 300000.00
northeast NE AM Main Jr. 57800.10
north NO Ann Stephens 455000.50

Example: grep with $
Print all lines ending with a period followed by exactly two zeros.
% grep '\.00$' grep-datafile
northwest NW Charles Main 300000.00
southeast SE Patricia Hemenway 400000.00
Extra [A-Z]****[0-9]..$5.00

Example: grep with \char
Print all lines containing the number 5, followed by a literal period and any single character.
% grep '5\..' grep-datafile
Extra [A-Z]****[0-9]..$5.00

Example: grep with [ ]
Print all lines beginning with either a "w" or an "e".
% grep '^[we]' grep-datafile
western WE Sharon Gray 53000.89
eastern EA TB Savage 440500.45

Example: grep with [^]
Print all lines ending with a period followed by exactly two characters other than 0.
% grep '\.[^0][^0]$' grep-datafile
western WE Sharon Gray 53000.89
southwest SW Lewis Dalsass 290000.73
eastern EA TB Savage 440500.45

Example: grep with x\{m\}
Print all lines where there are at least six consecutive digits followed by a period.
% grep '[0-9]\{6\}\.' grep-datafile
northwest NW Charles Main 300000.00
southwest SW Lewis Dalsass 290000.73
southeast SE Patricia Hemenway 400000.00
eastern EA TB Savage 440500.45
north NO Ann Stephens 455000.50
central CT KRush 575500.70

Example: grep with \<
Print all lines containing a word starting with "north".
% grep '\<north' grep-datafile
northwest NW Charles Main 300000.00
northeast NE AM Main Jr. 57800.10
north NO Ann Stephens 455000.50

Summary: regular expressions for the grep family of commands
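To replay some of the grep-datafile examples, the file can be recreated in a temporary location. This is a sketch under a few assumptions: fields are separated by single spaces (the original file may have used tabs), mktemp is available, and grep -E and grep -F are used as the POSIX spellings of egrep and fgrep. Only features that POSIX defines are exercised here; \<, \>, \| and \+ in plain grep depend on the grep version.

```shell
LC_ALL=C; export LC_ALL

# Recreate grep-datafile; the quoted 'EOF' keeps $ and * literal.
datafile=$(mktemp)
cat > "$datafile" <<'EOF'
northwest NW Charles Main 300000.00
western WE Sharon Gray 53000.89
southwest SW Lewis Dalsass 290000.73
southern SO Suan Chin 54500.10
southeast SE Patricia Hemenway 400000.00
eastern EA TB Savage 440500.45
northeast NE AM Main Jr. 57800.10
north NO Ann Stephens 455000.50
central CT KRush 575500.70
Extra [A-Z]****[0-9]..$5.00
EOF

begins_n=$(grep -c '^n' "$datafile")                   # lines starting with n
ends_00=$(grep -c '\.00$' "$datafile")                 # lines ending in .00
six_digits=$(grep -c '[0-9]\{6\}\.' "$datafile")       # six digits, then a period
starts_we=$(grep '^[we]' "$datafile" | cut -d' ' -f1)  # first field of w/e lines
sh_or_su=$(grep -E 'S(h|u)' "$datafile" | cut -d' ' -f1)
threes=$(grep -E -c '3+' "$datafile")                  # one or more 3's
literal=$(grep -F '[A-Z]****[0-9]..$5.00' "$datafile") # fixed-string search

echo "^n: $begins_n  .00\$: $ends_00  six digits: $six_digits  3+: $threes"
echo "^[we]: $(echo $starts_we)  S(h|u): $(echo $sh_or_su)"
echo "fixed string: $literal"

rm -f "$datafile"
```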

Regular Expressions: Exact Matches
regular expression: cks
UNIX Tools rocks.      match
UNIX Tools sucks.      match
UNIX Tools is okay.    no match
Prof. Andrzej (AJ) Bieszczad Email: andrzej@csun.edu Phone: 818-677-4954

Regular Expressions: Multiple Matches
A regular expression can match a string in more than one place.
regular expression: apple
Scrapple from the apple.    match 1 (inside "Scrapple"), match 2 ("apple")

Regular Expressions: Matching Any Character
The . regular expression can be used to match any character.
regular expression: o.
For me to poop on.    match 1, match 2

Regular Expressions: Alternate Character Classes
Character classes [] can be used to match any specific set of characters.
regular expression: b[eor]at
beat a brat on a boat    match 1 (beat), match 2 (brat), match 3 (boat)

Regular Expressions: Negated Character Classes
Character classes can be negated with the [^] syntax.
regular expression: b[^eo]at
beat a brat on a boat    match (brat); no match (beat, boat)

Regular Expressions: Other Character Classes
Other examples of character classes:
[aeiou] will match any of the characters a, e, i, o, or u
[kK]orn will match korn or Korn
Ranges can also be specified in character classes:
[1-9] is the same as [123456789]
[abcde] is equivalent to [a-e]
You can also combine multiple ranges:
[abcde123456789] is equivalent to [a-e1-9]
Note that the - character has a special meaning in a character class, but only if it is used within a range:
[-123] would match the characters -, 1, 2, or 3

Regular Expressions: Named Character Classes
Commonly used character classes can be referred to by name: alpha, lower, upper, alnum, digit, punct, cntrl
Syntax: [:name:]
[a-zA-Z]       [[:alpha:]]
[a-zA-Z0-9]    [[:alnum:]]
[45a-z]        [45[:lower:]]
Important for portability across languages

Regular Expressions: Anchors
Anchors are used to match at the beginning or end of a line (or both).
^ means beginning of the line
$ means end of the line
regular expression: ^b[eor]at    beat a brat on a boat    match (beat)
regular expression: b[eor]at$    beat a brat on a boat    match (boat)
^word$    a line consisting only of "word"
^$        an empty line
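The character-class and anchor examples on "beat a brat on a boat" can be checked with grep -o, which prints only the matched text. Note that -o is a common extension (GNU and BSD grep) rather than part of POSIX, so this sketch assumes one of those implementations.

```shell
line='beat a brat on a boat'

# Every match of the class pattern, one per output line.
all=$(printf '%s\n' "$line" | grep -o 'b[eor]at')
# Negated class: the letter after b must not be e or o.
negated=$(printf '%s\n' "$line" | grep -o 'b[^eo]at')
# Anchored to the start and to the end of the line.
at_start=$(printf '%s\n' "$line" | grep -o '^b[eor]at')
at_end=$(printf '%s\n' "$line" | grep -o 'b[eor]at$')

# ^word$ matches only a line that is exactly "word"; ^$ matches empty lines.
only_word=$(printf 'word\nwords\na word\n' | grep '^word$')
blank=$(printf 'one\n\nword\n\n' | grep -c '^$')

echo "all: $(echo $all)  negated: $negated"
echo "start: $at_start  end: $at_end"
echo "exact line: $only_word  empty lines: $blank"
```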

Regular Expressions: Repetitions
The * is used to define zero or more occurrences of the single regular expression preceding it.
regular expression: ya*y    I got mail, yaaaaaaaaaay!    match (yaaaaaaaaaay)
regular expression: oa*o    For me to poop on.           match (oo)
.*    zero or more of any character

Regular Expressions: Repetition Ranges, Subexpressions
Ranges can also be specified:
{n,m} notation can specify a range of repetitions for the immediately preceding regex
{n} means exactly n occurrences
{n,} means at least n occurrences
{n,m} means at least n occurrences but no more than m occurrences
Example:
.{0,} same as .*
a{2,} same as aaa*
If you want to group part of an expression so that * applies to more than just the previous character, use ( ) notation.
Subexpressions are treated like a single character:
a* matches 0 or more occurrences of a
abc* matches ab, abc, abcc, abccc, …
(abc)* matches the empty string, abc, abcabc, abcabcabc, …
(abc){2,3} matches abcabc or abcabcabc

Single Quoting Regex
Since many of the special characters used in regexes also have special meaning to the shell, it's a good idea to get in the habit of single quoting your regexes.
This protects any special characters from being operated on by the shell.
If you do it habitually, you won't have to worry about when it is necessary.
Even though we single quote our regexes so the shell won't interpret the special characters, sometimes we still want to use an operator as itself.
To do this, we escape the character with a \ (backslash).
Suppose we want to search for the character sequence 'a*b*'.
Unless we do something special, this will match zero or more 'a's followed by zero or more 'b's, which is not what we want.
'a\*b\*' will fix this: now the asterisks are treated as regular characters.
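The effect of escaping can be seen on a three-line sample (the sample text is made up for illustration). Unescaped, a*b* can match the empty string and therefore matches every line; escaped, it matches only the literal characters.

```shell
# Hypothetical input; the first line contains the literal text a*b*.
text='a*b*
aabb
xyz'

# Unescaped: zero or more a's then zero or more b's matches on every line.
unescaped=$(printf '%s\n' "$text" | grep -c 'a*b*')
# Escaped: the asterisks are ordinary characters.
escaped=$(printf '%s\n' "$text" | grep 'a\*b\*')
# Alternative: -F treats the whole pattern as a fixed string.
fixed=$(printf '%s\n' "$text" | grep -F 'a*b*')

echo "unescaped pattern matched $unescaped lines"
echo "escaped pattern matched: $escaped"
echo "fixed-string search matched: $fixed"
```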

Extended Regular Expressions
Regex also provides an alternation character | for matching one or another subexpression:
(T|Fl)an will match Tan or Flan
^(From|Subject): will match the From and Subject lines of a typical email message
It matches a beginning of line followed by either the characters From or Subject followed by a ':'
Subexpressions are used to limit the scope of the alternation:
At(ten|nine)tion then matches Attention or Atninetion, not Atten or ninetion as would happen without the parentheses (Atten|ninetion)

Extended Regular Expressions: Repetition Shorthands
The * (star) has already been seen to specify zero or more occurrences of the immediately preceding character.
The + (plus) means one or more:
abc+d will match abcd, abccd, or abccccccd but will not match 'abd'
Equivalent to {1,}
The ? (question mark) specifies an optional character, the single character that immediately precedes it:
July? will match Jul or July
abc?d will match abd and abcd but not 'abccd'
Equivalent to {0,1}
July? is also equivalent to (Jul|July)
The *, ?, and + are known as quantifiers because they specify the quantity of a match.
Quantifiers can also be used with subexpressions:
(a*c)+ will match c, ac, aac or aacaacac but will not match 'a' or a blank line
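The quantifier examples can be verified with grep -E. In this sketch the -x option (match the whole line) is added so that each candidate string must match in full; the candidate list itself is made up from the strings mentioned above.

```shell
# Candidate strings, one per line.
candidates='abd
abcd
abccd
Jul
July
Julyy'

plus=$(printf '%s\n' "$candidates" | grep -E -x 'abc+d')       # one or more c
optional=$(printf '%s\n' "$candidates" | grep -E -x 'abc?d')   # zero or one c
july=$(printf '%s\n' "$candidates" | grep -E -x 'July?')       # optional y

# Quantifier applied to a subexpression: c and aacaacac match, a and '' do not.
group_count=$(printf '%s\n' 'c' 'aacaacac' 'a' '' | grep -E -x -c '(a*c)+')

echo "abc+d: $(echo $plus)"
echo "abc?d: $(echo $optional)"
echo "July?: $(echo $july)"
echo "(a*c)+ matched $group_count of 4 lines"
```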

Regular Expressions: Backreferences
Sometimes it is handy to be able to refer to a match that was made earlier in a regex.
This is done using backreferences: \n is the backreference specifier, where n is a number.
For example, to find if the first word of a line is the same as the last:
^\([[:alpha:]]\{1,\}\).*\1$
The \([[:alpha:]]\{1,\}\) matches 1 or more letters.
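A quick way to try the backreference pattern is to feed it a few invented lines. One caveat this sketch makes visible: the pattern has no word boundaries, so the tagged text may be only the leading letter or letters of the first word, which is why a single word such as "level" also matches.

```shell
# Hypothetical input lines.
lines='hello big world hello
one two three
echo
level'

# \1 must repeat, at the end of the line, whatever the tagged group matched.
same=$(printf '%s\n' "$lines" | grep '^\([[:alpha:]]\{1,\}\).*\1$')

printf 'matching lines:\n%s\n' "$same"
```

Here "hello big world hello" matches with the group holding hello, and "level" matches with the group holding just l; "one two three" and "echo" do not match.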

Regular Expressions: Some Practical Examples
Variable names in C:                [a-zA-Z_][a-zA-Z_0-9]*
Dollar amount with optional cents:  \$[0-9]+(\.[0-9][0-9])?
Time of day:                        (1[012]|[1-9]):[0-5][0-9] (am|pm)
HTML headers <h1> <H1> <h2> …:      <[hH][1-4]>
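These patterns can be turned into simple validators with grep -E. The sketch below wraps each pattern in ^ and $ so the whole input must match; the helper name is_match and the test strings are assumptions introduced for illustration.

```shell
LC_ALL=C; export LC_ALL

# The slide patterns, anchored so that the entire string must match.
c_ident='^[a-zA-Z_][a-zA-Z_0-9]*$'
dollars='^\$[0-9]+(\.[0-9][0-9])?$'
time_of_day='^(1[012]|[1-9]):[0-5][0-9] (am|pm)$'
html_header='^<[hH][1-4]>$'

# is_match PATTERN STRING: succeeds (exit status 0) if STRING matches PATTERN.
is_match() {
    printf '%s\n' "$2" | grep -E -q -e "$1"
}

# Report a few hypothetical inputs.
for s in 'foo_1' '1foo'; do
    if is_match "$c_ident" "$s"; then echo "$s: valid C name"; else echo "$s: not a C name"; fi
done
for s in '$5.00' '$12' '$5.0'; do
    if is_match "$dollars" "$s"; then echo "$s: dollar amount"; else echo "$s: rejected"; fi
done
for s in '12:30 pm' '13:00 pm'; do
    if is_match "$time_of_day" "$s"; then echo "$s: time of day"; else echo "$s: rejected"; fi
done
```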

Regular Expressions: Quick Reference
fgrep, grep, egrep:
xyz          Ordinary characters match themselves (NEWLINES and metacharacters excluded); ordinary strings match themselves
grep, egrep:
\m           Matches literal character m
^            Start of line
$            End of line
.            Any single character
[xy^$z]      Any of x, y, ^, $, or z
[^xy^$z]     Any one character other than x, y, ^, $, or z
[a-z]        Any single character in given range
r*           Zero or more occurrences of regex r
r1r2         Matches r1 followed by r2
grep:
\(r\)        Tagged regular expression, matches r
\n           Set to what matched the nth tagged expression (n = 1-9)
\{n,m\}      Repetition
egrep:
r+           One or more occurrences of r
r?           Zero or one occurrences of r
r1|r2        Either r1 or r2
(r1|r2)r3    Either r1r3 or r2r3
(r1|r2)*     Zero or more occurrences of r1|r2, e.g., r1, r1r1, r2r1, r1r1r2r1, …
{n,m}        Repetition

grep/fgrep/egrep utilities
grep [option] pattern {filename}*
grep searches the named files or standard input and prints each line that contains an instance of the pattern.
grep interprets the same regular expressions as ed(1).
2000 Copyrights Danielle S. Lahmani

grep/egrep/fgrep utilities
g/re/p: global regular expression print
Regular expressions are specified by giving special meanings to certain characters.

grep/egrep/fgrep utilities
^ and $ "anchor" the pattern to the beginning (^) or end ($) of the line.
Regular expression metacharacters overlap with shell metacharacters, so it's always a good idea to enclose grep patterns in single quotes, to suppress interpretation by the shell.
$ grep From $MAIL       locates lines containing From in your mailbox
$ grep '^From' $MAIL    prints lines that begin with From in your mailbox

grep utility
grep supports character classes much like those in the shell.
[a-z] matches any lower case letter
[^…] matches any character except those in the class
[^0-9] matches any non-digit
A period '.' is equivalent to the shell's ?: it matches any single character.

grep examples
print lines that begin with mary
$ who | grep '^mary'
print lines that end with mary
$ who | grep 'mary$'
list files others can read and write
$ ls -l | grep '^.......rw'

grep examples
The closure operator * applies to the previous character or metacharacter (including a character class) in the expression.
grep 'abc*' matches ab followed by zero or more c's
grep 'ab[a-z]*' matches ab followed by any number of lower case letters

grep utility
grep '^[^:]*::' matches beginning of line, zero or more non-colons followed by a double colon. Inside the brackets, the ^ means not.
No grep regular expression matches a newline; the expressions are applied to each line individually.

grep/fgrep/egrep utilities
Example: grep '^[^aeiou]*a[^aeiou]*e$' foo
Looks for zero or more non-vowels followed by a, followed by zero or more non-vowels followed by e at the end of the line.
Would match dsdakjkjkje …
\ turns off the meaning of the special character that follows:
$ grep \' foo    finds all ' apostrophes in file foo
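The vowel pattern and the escaped apostrophe can be tried on a handful of words piped in on standard input instead of the file foo. The word list is an assumption made up for this sketch.

```shell
# Hypothetical word list, one word per line.
words='dsdakjkjkje
late
space
eagle
create'

# Non-vowels, then a, then non-vowels, then e at the end of the line.
vowel_hits=$(printf '%s\n' "$words" | grep '^[^aeiou]*a[^aeiou]*e$')

# \' passes a single literal apostrophe to grep as the pattern.
apostrophes=$(printf '%s\n' "don't" 'dont' | grep \')

printf 'vowel pattern matched:\n%s\n' "$vowel_hits"
echo "apostrophe found in: $apostrophes"
```

eagle is rejected because it starts with a vowel other than a, and create because its first vowel is e rather than a.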

Common grep options
-n prints line numbers
$ grep -n variable *.[ch]    locate variable in C source
-v inverts the sense of the test
$ grep -v From foo    print all lines that do not contain From in file foo
-i makes lower case in pattern match either case in file
$ grep -i mary $HOME/bin/phone-book

grep common options (continued)
-c prints only a count of matched lines
$ grep -c /bin/csh /etc/passwd
-l lists filenames but not matched lines
$ grep -l '^#include' /usr/include/*

egrep is an extended grep that works on extended regular expressions. It accepts:
+            one or more occurrences
?            zero or one occurrence
pat1|pat2    "or" operator: matches on either pat1 or pat2
(r)          regular expression r; can be nested
(xy)* matches any of the empty string, xy, xyxy, xyxyxy and so on

fgrep/egrep utilities
fgrep searches for many literal strings simultaneously. It does not treat * (or any other metacharacter) as special.
Both fgrep and egrep have a -f option to read patterns stored in a file.
In the file, newlines separate patterns to be searched for simultaneously.
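Searching for several literal strings from a pattern file might look like the following sketch. It uses grep -F (the POSIX spelling of fgrep), and the file names and contents are made up for the example.

```shell
# Scratch directory holding a hypothetical pattern file and input file.
tmpdir=$(mktemp -d)

# One literal pattern per line; * and $ have no special meaning with -F.
printf '%s\n' 'a*b' '$5.00' > "$tmpdir/patterns"
printf '%s\n' 'price: $5.00' 'aab' 'a*b literal' 'none' > "$tmpdir/input"

# A line is printed if it contains any one of the fixed strings.
hits=$(grep -F -f "$tmpdir/patterns" "$tmpdir/input")

printf 'lines containing a listed string:\n%s\n' "$hits"

rm -rf "$tmpdir"
```

The line aab is not printed: with fixed strings, a*b means the three characters a, *, b and nothing else.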

TABLE of regular expressions for grep and egrep
c        any non-special character c matches itself
\c       turns off any special meaning of character c
^        beginning of line
$        end of line
.        matches any single character
[…]      any one of characters in …
[^…]     any character not in …

TABLE of regular expressions for grep and egrep (continued)
r*       zero or more occurrences of r
r+       one or more occurrences of r (egrep only)
r?       zero or one occurrences of r (egrep only)
r1r2     r1 followed by r2
r1|r2    r1 or r2 (egrep only)
(r)      nested regular expression r (egrep only)