System Programming: Regular Expressions

Presentation transcript:

System Programming: Regular Expressions (Chapter Five)

UNIX programs that use REs
  grep (search within files)
  egrep (grep with extended REs)
  vi/emacs (text editors)
  ex (line editor)
  sed (stream editor)
  awk (pattern scanning language)
  perl (scripting language)

Basic vs. Extended REs
  In basic regular expressions (grep), the metacharacters ?, +, {, }, (, ), and | have no special meaning
  To give them special meaning, use the escaped versions: \?, \+, \{, \}, \(, \), and \|
    (\?, \+, and \| are GNU extensions; POSIX basic REs define only \{ \} and \( \))
  When using extended regular expressions, these metacharacters have special meaning
  grep -E = egrep
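
The difference between the two dialects can be seen on a single pattern. This is a minimal sketch, assuming a POSIX-style grep that accepts -E (used here in place of egrep); the three input lines and the variable names are made up for illustration.

```shell
# Hypothetical sample input: three short lines.
sample=$(printf '%s\n' 'ac' 'abbc' 'ab+c')

# Basic RE: "+" is an ordinary character, so only the literal text "ab+c" matches.
basic=$(printf '%s\n' "$sample" | grep 'ab+c')

# Extended RE: "+" means "one or more of the preceding RE", so "abbc" matches
# and the literal text "ab+c" does not.
extended=$(printf '%s\n' "$sample" | grep -E 'ab+c')

echo "basic:    $basic"
echo "extended: $extended"
```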

Using egrep
  egrep pattern filename(s)
  To be safe, put quotation marks around your pattern
  Examples:
    egrep "abc" textfile      (print lines containing "abc")
    egrep -i "abc" textfile   (same, but ignore case)
    egrep -v "abc" textfile   (print lines not containing "abc")
    egrep -n "abc" textfile   (include line numbers)
    egrep -c "abc" textfile   (print a count of lines containing "abc")
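
The options above can be tried on a small input. This is a sketch only: the three lines stand in for textfile and are fed through a pipe rather than read from a file, and grep -E is assumed as the equivalent of egrep.

```shell
# Hypothetical three-line input standing in for "textfile".
text=$(printf '%s\n' 'abc first' 'ABC second' 'xyz third')

plain=$(printf '%s\n' "$text" | grep -E 'abc')        # lines containing "abc"
nocase=$(printf '%s\n' "$text" | grep -E -i 'abc')    # same, ignoring case
inverted=$(printf '%s\n' "$text" | grep -E -v 'abc')  # lines NOT containing "abc"
numbered=$(printf '%s\n' "$text" | grep -E -n 'ABC')  # match prefixed with its line number
count=$(printf '%s\n' "$text" | grep -E -c 'abc')     # number of matching lines

echo "plain:    $plain"
echo "nocase:   $nocase"
echo "inverted: $inverted"
echo "numbered: $numbered"
echo "count:    $count"
```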

Metacharacters
  Period ( . ): matches any single character
    "a.c" matches abc, adc, a&c, a;c, …
    "u..x" matches unix, uvax, u3(x, …
  Asterisk ( * ): matches zero or more occurrences of the previous RE
    not the same as wildcards in the shell!
    "ab*c" matches ac, abc, abbc, abbbc, …
    ".*" matches any string
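
A short sketch of the period and asterisk, and of how the asterisk differs from a shell wildcard. The word list is invented, and -x (match the whole line) is an addition not used in the slides; it is there only so that partial matches do not blur the result.

```shell
# Hypothetical word list.
words=$(printf '%s\n' 'unix' 'uvax' 'ux' 'ac' 'abbc' 'adc')

# "." stands for exactly one character: "ux" has nothing between u and x.
dot=$(printf '%s\n' "$words" | grep -E -x 'u..x')

# "b*" is zero or more b's: "adc" fails because d is not a b.
star=$(printf '%s\n' "$words" | grep -E -x 'ab*c')

# As a shell wildcard, "*" alone means "any run of characters".
case 'abxyzc' in
  ab*c) as_glob='match' ;;
  *)    as_glob='no match' ;;
esac

# As an RE, "b*" only repeats the b, so the same text does not match.
if printf '%s\n' 'abxyzc' | grep -E -x -q 'ab*c'; then
  as_regex='match'
else
  as_regex='no match'
fi

echo "dot:      $dot"
echo "star:     $star"
echo "as glob:  $as_glob"
echo "as regex: $as_regex"
```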

Metacharacters (cont.)
  Plus ( + ): matches one or more occurrences of the preceding RE
    "ab+c" matches abc, abbc, but not ac
  Question mark ( ? ): matches zero or one occurrence of the preceding RE
    "ab?c" matches ac, abc, but not abbc
  Logical or ( | ): matches RE before or RE after bar
    "abc|def" matches abc or def
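
The three operators can be checked side by side. This is a sketch under the same assumptions as before: an invented word list, grep -E standing in for egrep, and -x added to force whole-line matches.

```shell
# Hypothetical word list.
words=$(printf '%s\n' 'ac' 'abc' 'abbc' 'def')

plus=$(printf '%s\n' "$words" | grep -E -x 'ab+c')       # at least one b
optional=$(printf '%s\n' "$words" | grep -E -x 'ab?c')   # at most one b
either=$(printf '%s\n' "$words" | grep -E -x 'abc|def')  # one alternative or the other

echo "plus:     $plus"
echo "optional: $optional"
echo "either:   $either"
```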

Metacharacters (cont.)
  Caret ( ^ ): means beginning of line
    "^D.*" matches a line beginning with D
  Dollar sign ( $ ): means end of line
    ".*d$" matches a line ending with d
  Backslash ( \ ): escapes other metacharacters
    "file\.txt" matches file.txt but not file_txt
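
Anchors and escaping, sketched on four made-up lines. Single quotes are used around each pattern so the shell leaves $ and \ alone; grep -E is assumed in place of egrep.

```shell
# Hypothetical input lines.
lines=$(printf '%s\n' 'Dog' 'bird' 'file.txt' 'file_txt')

starts=$(printf '%s\n' "$lines" | grep -E '^D.*')          # line begins with D
ends=$(printf '%s\n' "$lines" | grep -E '.*d$')            # line ends with d
escaped=$(printf '%s\n' "$lines" | grep -E 'file\.txt')    # "\." is a literal period
unescaped=$(printf '%s\n' "$lines" | grep -E 'file.txt')   # "." is any character

echo "starts:    $starts"
echo "ends:      $ends"
echo "escaped:   $escaped"
echo "unescaped: $unescaped"
```

Without the backslash the period also accepts the underscore, so both file names are selected.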

Metacharacters (cont.)
  Square brackets ( [ ] ): specifies a set of characters as a list
    any character in the set will match
    ^ before the set negates the set
    - specifies a character range
  Examples:
    "[fF]un" matches fun, Fun
    "b[aeiou]g" matches bag, beg, big, bog, bug
    "[A-Z].*" matches a string starting with a capital letter
    "[^abc].*" matches any string not starting with a, b, or c
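
A sketch of bracket expressions on an invented word list. Two things here are assumptions beyond the slides: LC_ALL=C is set because the meaning of a range such as A-Z can depend on the locale, and the last two patterns are anchored with ^ so that they test the first character of the line rather than any position in it.

```shell
# Hypothetical word list.
words=$(printf '%s\n' 'fun' 'Fun' 'bag' 'bzg' 'Zebra' 'cab')

# Either f or F, then "un".
set_match=$(printf '%s\n' "$words" | LC_ALL=C grep -E -x '[fF]un')

# b, one vowel, g: "bzg" fails because z is not in the set.
vowel=$(printf '%s\n' "$words" | LC_ALL=C grep -E -x 'b[aeiou]g')

# Lines whose first character is in the range A through Z.
capital=$(printf '%s\n' "$words" | LC_ALL=C grep -E '^[A-Z]')

# Lines whose first character is anything except a, b, or c.
negated=$(printf '%s\n' "$words" | LC_ALL=C grep -E '^[^abc]')

echo "set:     $set_match"
echo "vowel:   $vowel"
echo "capital: $capital"
echo "negated: $negated"
```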

Metacharacters (cont.)
  Parentheses ( ( ) ): used for grouping
    "a(bc)*" matches a, abc, abcbc, abcbcbc, …
    "(foot|base)ball" matches football or baseball
  Braces ( { } ): specify the number of repetitions of an RE
    "[a-z]{3}" matches three lowercase letters
    "m.{2,4}" matches strings with m followed by between 2 and 4 characters
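
Grouping and repetition counts, sketched on a made-up word list. As in the other sketches, -x (whole-line match) and LC_ALL=C are additions for predictability, not part of the slides.

```shell
# Hypothetical word list.
words=$(printf '%s\n' 'a' 'abcbc' 'abcb' 'football' 'baseball' 'basketball' \
                      'cat' 'mat' 'me' 'mouse' 'monitor')

# The group (bc) repeats as a unit: "abcb" leaves a stray b.
group=$(printf '%s\n' "$words" | LC_ALL=C grep -E -x 'a(bc)*')

# Alternation limited to the prefix by the parentheses.
alt=$(printf '%s\n' "$words" | LC_ALL=C grep -E -x '(foot|base)ball')

# Exactly three lowercase letters.
three=$(printf '%s\n' "$words" | LC_ALL=C grep -E -x '[a-z]{3}')

# m followed by two to four further characters.
m_range=$(printf '%s\n' "$words" | LC_ALL=C grep -E -x 'm.{2,4}')

echo "group:   $group"
echo "alt:     $alt"
echo "three:   $three"
echo "m_range: $m_range"
```

Without -x, "m.{2,4}" would also select monitor, because grep reports any line that contains a match.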

Examples
  What do these mean?
    egrep "^B.*s$" file
    egrep "[0-9]{3}" file
    egrep "num(ber)? [0-9]+" file
    egrep "word" file | wc -l
    egrep "[A-Z].*\?" file
    ls -l | egrep "^....r.-r.-"
  What if grep was used instead?
  Search for users with user IDs containing at least two 0s:
    grep "^[^:]*:[^:]*:[^:]*0[^:]*0[^:]*:.*" /etc/passwd
  /etc/passwd file format:
    <username>:x:<userid>:<groupid>:<useridinfo>:<homedir>:<loginshell>
  An x character indicates that the encrypted password is stored in the /etc/shadow file
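
The user ID search can be tried without touching the real /etc/passwd. The three entries below are invented and only follow the field layout given above; the pattern is the one from the slide.

```shell
# Hypothetical passwd-style entries (not real accounts).
passwd_sample=$(printf '%s\n' \
  'alice:x:1001:1001:Alice:/home/alice:/bin/sh' \
  'bob:x:1234:1000:Bob:/home/bob:/bin/sh' \
  'carol:x:2005:2005:Carol:/home/carol:/bin/sh')

# "[^:]*:" skips one whole field, so the two 0s must both sit in field 3.
two_zeros=$(printf '%s\n' "$passwd_sample" |
  grep '^[^:]*:[^:]*:[^:]*0[^:]*0[^:]*:.*')

echo "$two_zeros"
```

bob is not selected: the 0s in that entry are in the group ID, and [^:]* cannot step across a colon into the next field.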

Word searching with egrep
  The system may have a small dictionary for checking spelling: /usr/dict/words
  Find words that contain all five vowels in alphabetical order:
    cat alphvowels
    ^[^aeiou]*a[^aeiou]*e[^aeiou]*i[^aeiou]*o[^aeiou]*u[^aeiou]*$
    egrep -f alphvowels /usr/dict/words
    affectious
    facetious
    ...
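
Since the dictionary file may not exist on a given system, the vowel pattern can be tried on a handful of words instead. This is a sketch: the word list is made up, and the pattern is held in a shell variable rather than read from a file with -f.

```shell
# Same pattern as in alphvowels: consonant runs around a, e, i, o, u in that order.
pattern='^[^aeiou]*a[^aeiou]*e[^aeiou]*i[^aeiou]*o[^aeiou]*u[^aeiou]*$'

# Hypothetical stand-in for the dictionary.
words=$(printf '%s\n' 'facetious' 'abstemious' 'education' 'sequoia' 'rhythm')

matches=$(printf '%s\n' "$words" | grep -E "$pattern")

echo "$matches"
```

education and sequoia contain all five vowels but not in alphabetical order, so they are rejected. Because every gap is [^aeiou]*, the pattern also allows each vowel only once.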

Word searching with egrep
  Find all words of six or more letters that have the letters in alphabetical order:
    cat monotonic
    ^a?b?c?d?e?f?g?h?i?j?k?l?m?n?o?p?q?r?s?t?u?v?w?x?y?z?$
    egrep -f monotonic /usr/dict/words | grep "......"
    abdest
    almost
    biopsy
    ...
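
The same two-stage pipeline, sketched on an invented word list with every letter from a through z optional and in order. The pattern is held in a variable here instead of a file read with -f.

```shell
# Each letter may appear at most once, and only in alphabetical position.
pattern='^a?b?c?d?e?f?g?h?i?j?k?l?m?n?o?p?q?r?s?t?u?v?w?x?y?z?$'

# Hypothetical stand-in for the dictionary.
words=$(printf '%s\n' 'almost' 'biopsy' 'chimps' 'best' 'accent' 'banana' 'zebra')

# First grep keeps alphabetically ordered words; second keeps those with
# at least six characters ("......" is six arbitrary characters).
matches=$(printf '%s\n' "$words" | grep -E "$pattern" | grep '......')

echo "$matches"
```

best passes the first stage but is dropped by the length filter. accent fails the first stage because c? can match only one c, so repeated letters are excluded as well.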

Practice
  Construct egrep commands that find in file:
    Lines beginning with a word of at least 10 characters
    Lines containing a student ID number in standard 3-part form
    Number of lines with 2 consecutive capitalized words
    Number of lines not ending in an alphabetic character
    Lines containing a word beginning with a vowel at the end of a sentence