
CSCI 330 - The UNIX System
Regular Expressions
NIU - Department of Computer Science

Regular Expression
A pattern of special characters used to match strings in a search.
Typically made up of literal characters and special characters called metacharacters.
Regular expressions are used throughout UNIX:
  Editors: ed, ex, vi
  Utilities: grep, egrep, sed, and awk

Metacharacters
Any non-metacharacter matches itself.

  RE          Matches...
  .           Any one character, except newline
  [a-z]       Any one of the enclosed characters (e.g. a-z)
  *           Zero or more of the preceding character
  ? or \?     Zero or one of the preceding character
  + or \+     One or more of the preceding character

The grep Utility
The “grep” command searches for text in file(s).
Examples:
  % grep root mail.log
  % grep r..t mail.log
  % grep ro*t mail.log
  % grep 'ro*t' mail.log
  % grep 'r[a-z]*t' mail.log
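The patterns above can be tried on a small file. This is a minimal sketch, assuming a POSIX shell and grep; the sample words stand in for mail.log and are made up for illustration. Quoting the pattern, as in the last two examples, keeps the shell from expanding * before grep sees it.

```shell
# Hypothetical sample data standing in for mail.log.
tmp=$(mktemp)
printf '%s\n' root rt rot r00t robot > "$tmp"

# '.' matches any single character: r, any two characters, t.
dot_matches=$(grep 'r..t' "$tmp")

# '*' applies to the preceding character: r, zero or more o, t.
star_matches=$(grep 'ro*t' "$tmp")

# '[a-z]*' allows any run of lowercase letters between r and t.
class_matches=$(grep 'r[a-z]*t' "$tmp")

printf 'r..t matches:\n%s\n' "$dot_matches"
printf 'ro*t matches:\n%s\n' "$star_matches"
printf 'r[a-z]*t matches:\n%s\n' "$class_matches"

rm -f "$tmp"
```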

More Metacharacters

  RE              Matches...
  ^               Beginning of line
  $               End of line
  \char           Escapes the special meaning of the character char that follows
  [^]             Any one character not in the set
  \<              Beginning-of-word anchor
  \>              End-of-word anchor
  ( ) or \( \)    Tags matched characters to be used later (max = 9)
  | or \|         Alternation (or)
  x\{m\}          Repetition of character x, exactly m times (m, n = integers)
  x\{m,\}         Repetition of character x, at least m times
  x\{m,n\}        Repetition of character x, between m and n times
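To see what escaping and a negated set change, the following sketch compares three patterns on the same input. The three sample lines are assumptions chosen for illustration, not data from the course.

```shell
# Hypothetical sample lines.
tmp=$(mktemp)
printf '%s\n' '1.5' '125' '1x5' > "$tmp"

# Unescaped '.' is a metacharacter, so any character may sit between 1 and 5.
any_char=$(grep '1.5' "$tmp")

# '\.' removes the special meaning, so only a literal period matches.
literal_dot=$(grep '1\.5' "$tmp")

# '[^0-9]' matches one character that is not a digit.
non_digit=$(grep '[^0-9]' "$tmp")

printf '1.5 matches:\n%s\n' "$any_char"
printf '1\\.5 matches:\n%s\n' "$literal_dot"
printf '[^0-9] matches:\n%s\n' "$non_digit"

rm -f "$tmp"
```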

A regular expression consists of atoms and operators.
An atom specifies what text is to be matched and where it is to be found.
An operator combines regular expression atoms.

An atom specifies what text is to be matched and where it is to be found. The atoms covered here are the single character, dot, class, anchor, and back reference.

Single-Character Atom
A single character matches itself.

Dot Atom
Matches any single character except the newline character (\n).

Class Atom
Matches a single character that can be any of the characters defined in a set.
Example: [ABC] matches either A, B, or C.
Notes:
1) A range of characters is indicated by a dash, e.g. [A-Q].
2) Characters can be excluded from the set, e.g. [^0-9] matches any character other than a digit.

Example: Classes

Short-hand classes
[:alnum:] [:alpha:] [:upper:] [:lower:] [:digit:] [:space:]
These names are written inside a bracket expression, e.g. [[:digit:]].
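A short sketch of the short-hand classes in use, assuming a POSIX grep. The sample lines are hypothetical. Note the doubled brackets: the outer pair is the class atom and the inner [:name:] selects the named set.

```shell
# Hypothetical sample lines.
tmp=$(mktemp)
printf '%s\n' abc ABC 123 a1 > "$tmp"

# Lines containing at least one digit.
has_digit=$(grep '[[:digit:]]' "$tmp")

# Lines made up only of uppercase letters (one or more).
all_upper=$(grep '^[[:upper:]][[:upper:]]*$' "$tmp")

# Lines made up only of letters, in either case.
all_alpha=$(grep '^[[:alpha:]][[:alpha:]]*$' "$tmp")

printf 'digit: %s\n' $has_digit
printf 'upper: %s\n' $all_upper
printf 'alpha: %s\n' $all_alpha

rm -f "$tmp"
```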

Anchors
Anchors tell where the next character in the pattern must be located in the text data.
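As an illustration of the line anchors ^ and $, the sketch below matches the same two letters at different positions. The three sample lines are made up for the purpose.

```shell
# Hypothetical sample lines.
tmp=$(mktemp)
printf '%s\n' abc cab ab > "$tmp"

# '^ab' requires ab at the beginning of the line.
starts_with=$(grep '^ab' "$tmp")

# 'ab$' requires ab at the end of the line.
ends_with=$(grep 'ab$' "$tmp")

# Both anchors together require the whole line to be exactly ab.
whole_line=$(grep '^ab$' "$tmp")

printf 'starts with ab: %s\n' $starts_with
printf 'ends with ab: %s\n' $ends_with
printf 'exactly ab: %s\n' $whole_line

rm -f "$tmp"
```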

Back References: \n (n = 1 to 9)
Used to retrieve saved text in one of nine buffers.
The text in a saved buffer can be referred to with a back reference, e.g. \1 \2 \3 ... \9
More details on this later.
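Until those details arrive, here is one hedged illustration of a back reference in a basic regular expression: \( \) saves the matched text and \1 requires the same text to appear again. The sample words are assumptions.

```shell
# Hypothetical sample words.
tmp=$(mktemp)
printf '%s\n' hello world book abcabc > "$tmp"

# \(.\) saves any one character; \1 demands that same character again,
# so this finds lines with a doubled character (ll, oo).
doubled=$(grep '\(.\)\1' "$tmp")

# \(abc\) saves the text abc; \1 demands abc a second time.
repeated=$(grep '^\(abc\)\1$' "$tmp")

printf 'doubled character: %s\n' $doubled
printf 'repeated group: %s\n' $repeated

rm -f "$tmp"
```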

Operators

Sequence Operator
The sequence operator is implied: when a series of atoms appears in a regular expression, no operator is written between them.

Alternation Operator: | or \|
The alternation operator (| or \|) is used to define one or more alternatives.
Note: which form is accepted depends on the version of “grep”.
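A small sketch of alternation. It uses grep -E so that the unescaped | is the operator, which is the extended form POSIX defines; the \| spelling in plain grep is a GNU extension. The sample words are hypothetical.

```shell
# Hypothetical sample words.
tmp=$(mktemp)
printf '%s\n' cat dog bird catalog > "$tmp"

# Lines containing either alternative anywhere in the line.
either=$(grep -E 'cat|dog' "$tmp")

# -x restricts matches to whole lines, so catalog no longer qualifies.
exact=$(grep -E -x 'cat|dog' "$tmp")

printf 'contains cat or dog: %s\n' $either
printf 'is exactly cat or dog: %s\n' $exact

rm -f "$tmp"
```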

Repetition Operator: \{…\}
The repetition operator specifies that the atom or expression immediately before the repetition may be repeated.

Basic Repetition Forms

Short Form Repetition Operators: * + ?
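The short forms can be read as abbreviations of the interval form: * is {0,}, + is {1,}, and ? is {0,1}. The sketch below checks that reading on made-up sample lines, using grep -E for + and ? and plain grep for the backslashed interval.

```shell
# Hypothetical sample lines: a, then zero to three b, then c.
tmp=$(mktemp)
printf '%s\n' ac abc abbc abbbc > "$tmp"

# One or more b: short form and interval form.
plus_short=$(grep -E '^ab+c$' "$tmp")
plus_interval=$(grep -E '^ab{1,}c$' "$tmp")

# Zero or one b: short form and interval form.
optional_short=$(grep -E '^ab?c$' "$tmp")
optional_interval=$(grep -E '^ab{0,1}c$' "$tmp")

# Between two and three b, written as a basic RE with \{ \}.
bounded=$(grep '^ab\{2,3\}c$' "$tmp")

printf 'one or more b: %s\n' $plus_short
printf 'zero or one b: %s\n' $optional_short
printf 'two to three b: %s\n' $bounded

rm -f "$tmp"
```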

Group Operator
When a group of characters is enclosed in parentheses, the next operator applies to the whole group, not only to the previous character.
Note: depends on the version of “grep”; some versions need \( and \) instead.
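The difference grouping makes can be sketched as follows, with hypothetical sample lines. The same + is applied once to a group and once to a single character; the last command shows the backslashed grouping form in a basic RE.

```shell
# Hypothetical sample lines.
tmp=$(mktemp)
printf '%s\n' ab abab abb ababab > "$tmp"

# With parentheses, + repeats the whole group ab.
grouped=$(grep -E '^(ab)+$' "$tmp")

# Without parentheses, + repeats only the b.
ungrouped=$(grep -E '^ab+$' "$tmp")

# Basic RE spelling of the grouped pattern: \( \) with an interval.
grouped_basic=$(grep '^\(ab\)\{1,\}$' "$tmp")

printf '(ab)+ matches: %s\n' $grouped
printf 'ab+ matches: %s\n' $ungrouped

rm -f "$tmp"
```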

grep detail and examples
grep is a family of commands:
  grep    common version
  egrep   understands extended REs (| + ? ( ) don’t need a backslash)
  fgrep   understands only fixed strings, i.e. is faster
  rgrep   traverses subdirectories recursively
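POSIX specifies the options -E and -F of grep as the equivalents of egrep and fgrep, and some current systems treat the older names as deprecated. A minimal comparison of the three modes, on made-up sample lines:

```shell
# Hypothetical sample lines.
tmp=$(mktemp)
printf '%s\n' 'a.c' 'abc' > "$tmp"

# Basic RE: '.' is a metacharacter, so both lines match.
basic=$(grep 'a.c' "$tmp")

# Fixed string (fgrep behaviour): '.' is an ordinary period.
fixed=$(grep -F 'a.c' "$tmp")

# Extended RE (egrep behaviour): | needs no backslash.
extended=$(grep -E 'abc|xyz' "$tmp")

printf 'grep:    %s\n' $basic
printf 'grep -F: %s\n' $fixed
printf 'grep -E: %s\n' $extended

rm -f "$tmp"
```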

Commonly used “grep” options:
  -c   Print only a count of matched lines.
  -i   Ignore uppercase and lowercase distinctions.
  -l   List the names of all files that contain the specified pattern.
  -n   Print matched lines and line numbers.
  -s   Work silently; display nothing except error messages. Useful for checking the exit status. (In POSIX and GNU grep, -s instead suppresses error messages about nonexistent or unreadable files, and -q is the option that suppresses normal output.)
  -v   Print lines that do not match the pattern.
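A sketch of these options on a made-up three-line log; the file contents are assumptions. The exit-status check uses -q, which is the quiet option in POSIX grep.

```shell
# Hypothetical log lines.
tmp=$(mktemp)
printf '%s\n' 'Root login' 'root logout' 'guest login' > "$tmp"

# -c counts matching lines; the match is case-sensitive by default.
count=$(grep -c 'root' "$tmp")

# -i ignores case, so Root and root both count.
count_nocase=$(grep -c -i 'root' "$tmp")

# -n prefixes each matching line with its line number.
numbered=$(grep -n 'login' "$tmp")

# -v selects the lines that do not match.
inverted=$(grep -v 'login' "$tmp")

# -l prints the name of each file that contains a match.
listed=$(grep -l 'guest' "$tmp")

# -q prints nothing; only the exit status reports whether a match exists.
if grep -q 'guest' "$tmp"; then found=yes; else found=no; fi

printf 'count=%s count_nocase=%s found=%s\n' "$count" "$count_nocase" "$found"
printf '%s\n' "$numbered"

rm -f "$tmp"
```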

Example: grep with pipe
Pipe the output of the “ls -l” command to grep and list/select only directory entries.
  % ls -l | grep '^d'
  drwxr-xr-x 2 krush csci 512 Feb 8 22:12 assignments
  drwxr-xr-x 2 krush csci 512 Feb 5 07:43 feb3
  drwxr-xr-x 2 krush csci 512 Feb 5 14:48 feb5
  drwxr-xr-x 2 krush csci 512 Dec 18 14:29 grades
  drwxr-xr-x 2 krush csci 512 Jan 18 13:41 jan13
  drwxr-xr-x 2 krush csci 512 Jan 18 13:17 jan15
  drwxr-xr-x 2 krush csci 512 Jan 18 13:43 jan20
  drwxr-xr-x 2 krush csci 512 Jan 24 19:37 jan22
  drwxr-xr-x 4 krush csci 512 Jan 30 17:00 jan27
  drwxr-xr-x 2 krush csci 512 Jan 29 15:03 jan29
Display the number of lines where the pattern was found. This does not mean the number of occurrences of the pattern.
  % ls -l | grep -c '^d'
  10
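The distinction between matching lines and occurrences can be made concrete. The sketch below assumes a grep that supports -o (print each match on its own line), which GNU and BSD grep provide but POSIX does not require; the sample lines are made up.

```shell
# Hypothetical sample lines: the first contains the letter a three times.
tmp=$(mktemp)
printf '%s\n' 'aa a' 'b' 'a' > "$tmp"

# -c counts lines that contain at least one match.
matching_lines=$(grep -c 'a' "$tmp")

# -o prints every match separately, so counting its output lines
# gives the number of occurrences. tr strips padding some wc versions add.
occurrences=$(grep -o 'a' "$tmp" | wc -l | tr -d ' ')

printf 'matching lines: %s\n' "$matching_lines"
printf 'occurrences: %s\n' "$occurrences"

rm -f "$tmp"
```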

Example: grep with \< \>
  % cat grep-datafile
  northwest   NW   Charles Main        300000.00
  western     WE   Sharon Gray         53000.89
  southwest   SW   Lewis Dalsass       290000.73
  southern    SO   Suan Chin           54500.10
  southeast   SE   Patricia Hemenway   400000.00
  eastern     EA   TB Savage           440500.45
  northeast   NE   AM Main Jr.         57800.10
  north       NO   Ann Stephens        455000.50
  central     CT   KRush               575500.70
  Extra [A-Z]****[0-9]..$5.00
Print the line if it contains the word “north”.
  % grep '\<north\>' grep-datafile
  north       NO   Ann Stephens        455000.50

Example: grep with a\|b
Print the lines that contain either the expression “NW” or the expression “EA”.
  % grep 'NW\|EA' grep-datafile
  northwest   NW   Charles Main        300000.00
  eastern     EA   TB Savage           440500.45
Note: egrep works with |

Example: egrep with +
Print all lines containing one or more 3's.
  % egrep '3+' grep-datafile
  northwest   NW   Charles Main        300000.00
  western     WE   Sharon Gray         53000.89
  southwest   SW   Lewis Dalsass       290000.73
Note: grep works with \+

Example: egrep with RE: ?
Print all lines containing a 2, followed by zero or one period, followed by a digit.
  % egrep '2\.?[0-9]' grep-datafile
  southwest   SW   Lewis Dalsass       290000.73
Note: grep works with \?

Example: egrep with ( )
Print all lines containing one or more consecutive occurrences of the pattern “no”.
  % egrep '(no)+' grep-datafile
  northwest   NW   Charles Main        300000.00
  northeast   NE   AM Main Jr.         57800.10
  north       NO   Ann Stephens        455000.50
Note: grep works with \( \) \+

Example: egrep with (a|b)
Print all lines containing the uppercase letter “S”, followed by either “h” or “u”.
  % egrep 'S(h|u)' grep-datafile
  western     WE   Sharon Gray         53000.89
  southern    SO   Suan Chin           54500.10
Note: grep works with \( \) \|

Example: fgrep
Find all lines in the file containing the literal string “[A-Z]****[0-9]..$5.00”. All characters are treated as themselves. There are no special characters.
  % fgrep '[A-Z]****[0-9]..$5.00' grep-datafile
  Extra [A-Z]****[0-9]..$5.00

Example: grep with ^
Print all lines beginning with the letter n.
  % grep '^n' grep-datafile
  northwest   NW   Charles Main        300000.00
  northeast   NE   AM Main Jr.         57800.10
  north       NO   Ann Stephens        455000.50

Example: grep with $
Print all lines ending with a period followed by exactly two zeros.
  % grep '\.00$' grep-datafile
  northwest   NW   Charles Main        300000.00
  southeast   SE   Patricia Hemenway   400000.00
  Extra [A-Z]****[0-9]..$5.00

Example: grep with \char
Print all lines containing the number 5, followed by a literal period and any single character.
  % grep '5\..' grep-datafile
  Extra [A-Z]****[0-9]..$5.00

Example: grep with [ ]
Print all lines beginning with either a “w” or an “e”.
  % grep '^[we]' grep-datafile
  western     WE   Sharon Gray         53000.89
  eastern     EA   TB Savage           440500.45

Example: grep with [^]
Print all lines ending with a period followed by exactly two characters, neither of which is a zero.
  % grep '\.[^0][^0]$' grep-datafile
  western     WE   Sharon Gray         53000.89
  southwest   SW   Lewis Dalsass       290000.73
  eastern     EA   TB Savage           440500.45

Example: grep with x\{m\}
Print all lines where there are at least six consecutive digits followed by a period.
  % grep '[0-9]\{6\}\.' grep-datafile
  northwest   NW   Charles Main        300000.00
  southwest   SW   Lewis Dalsass       290000.73
  southeast   SE   Patricia Hemenway   400000.00
  eastern     EA   TB Savage           440500.45
  north       NO   Ann Stephens        455000.50
  central     CT   KRush               575500.70

Example: grep with \<
Print all lines containing a word starting with “north”.
  % grep '\<north' grep-datafile
  northwest   NW   Charles Main        300000.00
  northeast   NE   AM Main Jr.         57800.10
  north       NO   Ann Stephens        455000.50

Summary
Regular expressions for the grep family of commands