1.5 Regular Expressions (REs)

Slides:



Advertisements
Similar presentations
CST8177 awk. The awk program is not named after the sea-bird (that's auk), nor is it a cry from a parrot (awwwk!). It's the initials of the authors, Aho,
Advertisements

UNIX Chapter 10 Advanced File Processing Mr. Mohammad Smirat.
Lex -- a Lexical Analyzer Generator (by M.E. Lesk and Eric. Schmidt) –Given tokens specified as regular expressions, Lex automatically generates a routine.
CSCI 330 T HE UNIX S YSTEM Regular Expressions. R EGULAR E XPRESSION A pattern of special characters used to match strings in a search Typically made.
Regular Expressions grep and egrep. Previously Basic UNIX Commands –Files: rm, cp, mv, ls, ln –Processes: ps, kill Unix Filters –cat, head, tail, tee,
7 Searching and Regular Expressions (Regex) Mauro Jaskelioff.
Lecture 4 Regular Expressions grep and sed intro.
COMMONWEALTH OF AUSTRALIA Copyright Regulations 1969 WARNING This material has been reproduced and communicated to you by or on behalf of Monash University.
Regular Expressions Lecturer: Prof. Andrzej (AJ) Bieszczad Phone: “UNIX for Programmers and Users” Third Edition,
Regular Expressions. u A regular expression is a pattern which matches some regular (predictable) text. u Regular expressions are used in many Unix utilities.
Filters using Regular Expressions grep: Searching a Pattern.
Overview of the grep Command Alex Dukhovny CS 265 Spring 2011.
System Programming Regular Expressions Regular Expressions
Unix Talk #2 (sed). 2 You have learned…  Regular expressions, grep, & egrep  grep & egrep are tools used to search for text in a file  AWK -- powerful.
1 Regular Expressions CIS*2450 Advanced Programming Techniques Material for this lectures has been taken from the excellent book, Mastering Regular Expressions,
Compilers: lex/3 1 Compiler Structures Objectives – –describe lex – –give many examples of lex's use , Semester 1, Lex.
Introduction to Programming David Goldschmidt, Ph.D. Computer Science The College of Saint Rose Java Fundamentals (Comments, Variables, etc.)
CS 403: Programming Languages Fall 2004 Department of Computer Science University of Alabama Joel Jones.
Regular Expressions Regular expressions are a language for string patterns. RegEx is integral to many programming languages:  Perl  Python  Javascript.
Perl and Regular Expressions Regular Expressions are available as part of the programming languages Java, JScript, Visual Basic and VBScript, JavaScript,
Lexical Analysis I Specifying Tokens Lecture 2 CS 4318/5531 Spring 2010 Apan Qasem Texas State University *some slides adopted from Cooper and Torczon.
LING 388: Language and Computers Sandiway Fong Lecture 6: 9/15.
Agenda Regular Expressions (Appendix A in Text) –Definition / Purpose –Commands that Use Regular Expressions –Using Regular Expressions –Using the Replacement.
BIF713 Additional Utilities. Linux Utilities  You have learned many Linux commands. Here are some more that you can use:  Data Manipulation (Reg Exps)
I/O Redirection and Regular Expressions February 9 th, 2004 Class Meeting 4.
Regular Expression - Intro Patterns that define a set of strings (or, pieces of a string) Not wildcards (similar notion, but different thing) Used by utilities.
Compilers: lex analysis/2 1 Compiler Structures Objective – –what is lexical analysis? – –look at a lexical analyzer for a simple 'expressions'
I/O Redirection & Regular Expressions CS 2204 Class meeting 4 *Notes by Doug Bowman and other members of the CS faculty at Virginia Tech. Copyright
Regular Expressions CS 2204 Class meeting 6 Created by Doug Bowman, 2001 Modified by Mir Farooq Ali, 2002.
By Corey Stokes 9/14/10. What is grep? Global Regular Expression Print grep is a command line search utility in Unix Try: Search for a word in a.cpp file.
BASH – Text Processing Utilities Erick, Joan © Sekolah Tinggi Teknik Surabaya 1.
UNIX Commands RTFM: grep(1), egrep(1) & fgrep(1) Gilbert Detillieux April 13, 2010 MUUG Meeting.
CSCI 330 UNIX and Network Programming Unit IV Shell, Part 2.
CSE 374 Programming Concepts & Tools Hal Perkins Fall 2015 Lecture 5 – Regular Expressions, grep, Other Utilities.
Regular Expressions Todd Kelley CST8207 – Todd Kelley1.
ORAFACT Text Processing. ORAFACT Searching Inside Files grep - searches for patterns within files grep [options] [[-e] pattern] filename [...] -n shows.
CS:414 INTRODUCTION TO UNIX AND LINUX Part 3: Regular Expressions and vi editor By Dr. Noman Hasany.
CIRC Summer School 2016 Baowei Liu
PROGRAMMING THE BASH SHELL PART III by İlker Korkmaz and Kaya Oğuz
Regular Expressions Copyright Doug Maxwell (
Lesson 5-Exploring Utilities
CS314 – Section 5 Recitation 2
Regular Expressions ICCM 2017
Regular Expression Basic and Extended regular expressions. The grep, egrep. Typical examples involving different regular expressions. By: Abhilash C B.
LINUX LANGUAGE MULTIPLE CHOICE QUESTION SET-5
Looking for Patterns - Finding them with Regular Expressions
CIRC Summer School 2017 Baowei Liu
Regular Expression - Intro
Computer Engineering 1nd Semester
Grep Allows you to filter text based upon several different regular expression variants Basic Extended Perl.
CS 403: Programming Languages
Folks Carelli, Instructor Kutztown University
12. Automata and Regular Expressions
Unix Talk #2 grep/egrep/fgrep (maybe add more to this one….)
Unix Talk #2 (sed).
An Overview of Grep and Regular Expression
Chin-Chih Chang CS 497C – Introduction to UNIX Lecture 28: - Filters Using Regular Expressions – grep and sed Chin-Chih Chang
Compiler Structures 3. Lex Objectives , Semester 2,
Computer Eng. Software Lab
CSE 303 Concepts and Tools for Software Development
Regular Expressions and Grep
Regular Expressions grep and sed intro
CSCI The UNIX System Regular Expressions
Lab 8: Regular Expressions
Nate Brunelle Today: Regular Expressions
Nate Brunelle Today: Regular Expressions
REGEX.
Review.
LPI Linux Certification
Regular Expressions.
Presentation transcript:

1.5 Regular Expressions (REs) Compiler Structures 242-437, Semester 2, 2018-2019 1.5 Regular Expressions (REs) Objectives what is a regular expression? give examples of REs used in the grep command

1. Regular Expressions A regular expression (RE or regex) is a pattern used to match against text when searching inside a file. Regexs are used everywhere in Linux: Editors: ed, ex, vi Utilities: grep, egrep, sed, and awk

String Regex c k s UNIX Tools rocks. UNIX Tools sucks. regex pattern UNIX Tools rocks. text: match UNIX Tools sucks. text: match text: UNIX Tools is okay. no match

Multiple Matches a p p l e Scrapple from the apple. A regex pattern can match text in more than one place. a p p l e regex pattern Scrapple from the apple. text: match 1 match 2

The . (dot) Regex o . For me to poop on. The . regex pattern can be used to match any character in the text. o . regex pattern For me to poop on. text: match 1 match 2

The Character Class Regex A character class [] can match any set of characters in the text. b [eor] a t regex pattern beat a brat on a boat text: match 1 match 2 match 3

Character Class Examples

Repetition Regex: * (star) The * defines zero or more copies of the letter before it. y a * y regex pattern I got mail, yaaaaaaaaaay! text: match

o a * o I like the zoo. h . * o Say hello Andrew. regex pattern text: match h . * o regex pattern Say hello Andrew. text: match

h . * o regex pattern Say hello to Andrew. text: match Regex are greedy – they match as much of the text as they can.

Anchors: ^ $ ^ b [eor] a t beat a brat on a boat b [eor] a t $ regex pattern ^ matches the beginning of the text line beat a brat on a boat text: match b [eor] a t $ text: regex pattern $ matches the end of the text line beat a brat on a boat match

More Anchors

The | (or) Regex

More Repetition Regexs: * + ?

More Regex Operations See the regular expressions "cheat-sheet" at the course website over 80 operators!!

2. grep “grep” uses a regex pattern to search a text file Examples: all the lines containing a match (or matches) are printed Examples: % grep "root" test1 % grep "r..t" test1 % grep "ro*t" test1 % grep "r[a-z]*t" test1 regex pattern in "..." text filename

The Grep Family grep usual version egrep extended REs | + ? don’t need backslash) fgrep only strings, i.e. is faster

Common “grep” Options -c Print a count of matched lines. -i Ignore uppercase/ lowercase -l List filenames that contain matches -n Print matched lines and line numbers -s Work silently; only display error messages. -v Print lines that do not match the pattern.

Some Simple Examples grep searches input lines, a line at a time. If the line contains a string that matches grep's RE (pattern), then the line is output. input lines (e.g. from a file) output matching lines (e.g. to a file) grep "RE" hello andy my name is andy my bye byhe continued

Examples "|" means "or" continued grep "and" grep -E "an|my" hello andy my name is andy my bye byhe hello andy my name is andy grep -E "an|my" hello andy my name is andy my bye byhe hello andy my name is andy my bye byhe "|" means "or" continued

"*" means "0 or more" grep "hel*" hello andy my name is andy my bye byhe hello andy my bye byhe "*" means "0 or more"

grep with \< \> begin and end of word Look for the word "north"

grep with a\|b a or b egrep doesn't need backslash

grep with \+ one or more egrep doesn't need backslash

grep with . any character egrep doesn't need backslash

grep with ^ and $ begin and end of line

grep with [ ] set of chars

Fun with a Linux Dictionary Find the location of the words file List all the words containing "hh"

Look for "niether" or "neither" Look for words with three "u"s Count the words with three "a"s

Complex Regex Examples Variable names in C [a-zA-Z_][a-zA-Z_0-9]* Dollar amount with optional cents \$[0-9]+(\.[0-9][0-9])? Time of day (1[012]|[1-9]):[0-5][0-9] (am|pm) HTML headers <h1> <H1> <h2> … <[hH][1-4]>

3. The RE Language A RE can be defined as a pattern language (operands and operators) which matches on text strings.

Some Possible RE Operands text characters (e.g. ‘a’, ‘1’, ‘(‘) the symbol e (means an empty string ‘’) in code just use "" variables, which can be assigned a RE variable = RE

The Basic RE Operators There are three basic operators: union ‘|’ concatenation closure *

Union S | T use S or T to match strings Example REs: a | b a | b | c

Concatenation S T Example REs: use S followed by the T to match against strings Example REs: a b matches the string "ab" w | (a b) matches the strings "w" or "ab"

Closure S* Example RE: use S 0 or more times to match against strings a* matches the strings: e, a, aa, aaa, aaaa, aaaaa, ... empty string

3.1. REs for C Identifiers We define two RE variables, letter and digit: letter = A | B | C | D ... Z | a | b | c | d .... z digit = 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 id is defined using letter and digit: id = letter ( letter | digit )* continued

Strings matched by id include: ab345 w h5g Strings not matched: 2 $abc ****

3.2. REs for Integers and Floats We redefine digit: digit = 0|1|2|3|4|5|6|7|8|9 or digit = [1 – 9] int and float: int = {digit}+ float = {digit}+ "." {digit}+

Integers and floats with exponents: number = {digit}+ ('.' {digit}+ )? ( 'E'('+'|'-')? {digit}+ )?

4. More on REs See RE summary on the course website: regular_expressions_cheat_sheet.pdf I have the standard RE book: Mastering Regular Expressions Jeffrey E. F. Freidl O'Reilly & Associates continued

There are many websites that explain REs: http://etext.lib.virginia.edu/services/ helpsheets/unix/regex.html http://www.zytrax.com/tech/web/regex.htm http://www.regular-expressions.info