Regular expressions
Used by several different UNIX commands, including ed, sed, awk, and grep.
A period (.) matches any single character.
.X. matches any X that has a single character on either side of it.
A caret (^) matches the beginning of the line.
^Bridgeport matches the characters Bridgeport only if they occur at the beginning of the line.

Regular expressions (continued)
A dollar sign ($) matches the end of the line.
Bridgeport$ matches the characters Bridgeport only if they are the very last characters on the line.
.$ matches any single character at the end of the line.
To match a literal period, or any other special character, precede it with a backslash (\) to remove its special meaning.
\.$ matches any line that ends with a period.
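
The anchors described above can be tried with grep. This is a minimal sketch; the sample lines are made up for illustration and are not from the slides.

```shell
# Hypothetical sample input (illustrative only).
sample='Bridgeport is in Connecticut
I live in Bridgeport
The end.
No period here'

# ^ anchors the pattern to the beginning of the line.
starts=$(printf '%s\n' "$sample" | grep '^Bridgeport')
# $ anchors the pattern to the end of the line.
ends=$(printf '%s\n' "$sample" | grep 'Bridgeport$')
# \. is a literal period, so \.$ selects lines that end with a period.
period=$(printf '%s\n' "$sample" | grep '\.$')

printf '%s\n' "$starts" "$ends" "$period"
```

Each variable should hold exactly one of the sample lines, which makes it easy to see which anchor selected which line.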

Regular expressions (continued)
^$ matches any line that contains no characters.
[...] matches any single character enclosed in the brackets.
[tT] matches a lowercase or uppercase t, so [tT]he matches both the and The.
[A-Z] matches any uppercase letter.
[A-Za-z] matches any uppercase or lowercase letter.
[^A-Z] matches any character except an uppercase letter.
[^A-Za-z] matches any nonalphabetic character.
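
A short sketch of the bracket expressions above, using grep on made-up input. The C locale is set as an assumption so that ranges such as a-z cover exactly the ASCII letters.

```shell
# Assumption: run in the C locale so that [a-z] and [A-Z] mean ASCII ranges.
LC_ALL=C; export LC_ALL

# Hypothetical sample words (illustrative only).
words='the
The
she
7up'

# [tT] accepts either case of t, so both "the" and "The" are counted.
the_count=$(printf '%s\n' "$words" | grep -c '^[tT]he$')
# [^A-Za-z] matches a nonalphabetic character; only "7up" starts with one.
non_alpha=$(printf '%s\n' "$words" | grep '^[^A-Za-z]')
# ^$ matches lines that contain no characters at all.
empty=$(printf 'a\n\nb\n' | grep -c '^$')

printf '%s %s %s\n' "$the_count" "$non_alpha" "$empty"
```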

Regular expressions (continued)
An asterisk (*) matches zero or more occurrences of the preceding character.
X* matches zero, one, two, three, ... capital X's.
XX* matches one or more capital X's.
.* matches zero or more occurrences of any character.
e.*e matches all the characters from the first e on the line to the last one.
[A-Za-z][A-Za-z]* matches any alphabetic character followed by zero or more alphabetic characters.
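
One way to see what the asterisk matches is to let sed replace the matched text with a marker. The input strings below are assumptions chosen for illustration.

```shell
# e.*e is greedy: it runs from the first e on the line to the last one.
greedy=$(printf 'the extreme test\n' | sed 's/e.*e/#/')        # th#st

# XX* needs at least one X; the whole run of X's is replaced.
one_or_more=$(printf 'aXXXb\n' | sed 's/XX*/-/')               # a-b

# X* also matches zero X's, so it matches the empty string at the start.
zero=$(printf 'abc\n' | sed 's/X*/-/')                         # -abc

# One letter followed by zero or more letters: the first whole word.
word=$(printf '42 apples\n' | sed 's/[A-Za-z][A-Za-z]*/WORD/') # 42 WORD

printf '%s\n' "$greedy" "$one_or_more" "$zero" "$word"
```

The third case is the usual surprise: because X* can match nothing, it succeeds at the very first position even though the line contains no X.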

Regular expressions (continued)
[-0-9] matches a single dash or digit character. Order is important: the dash must come first or last to be taken literally.
[0-9-] is the same as [-0-9].
[^-0-9] matches any character except a digit or a dash.
[]a-z] matches a right bracket or a lowercase letter. Order is important: the right bracket must come first.
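
The ordering rules above can be checked by counting matches with grep -c. This is a sketch on made-up one-character lines, with the C locale assumed so that a-z means the ASCII lowercase letters.

```shell
# Assumption: C locale, so ranges are plain ASCII ranges.
LC_ALL=C; export LC_ALL

# Hypothetical input: one character per line.
chars='5
-
]
q
+'

# Dash first or last is literal: both forms match "5" and "-".
dash_first=$(printf '%s\n' "$chars" | grep -c '^[-0-9]$')
dash_last=$(printf '%s\n' "$chars" | grep -c '^[0-9-]$')
# Negated set: everything except digits and the dash ("]", "q", "+").
negated=$(printf '%s\n' "$chars" | grep -c '^[^-0-9]$')
# A right bracket placed first is literal: matches "]" and "q".
bracket=$(printf '%s\n' "$chars" | grep -c '^[]a-z]$')

printf '%s %s %s %s\n' "$dash_first" "$dash_last" "$negated" "$bracket"
```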

Regular expressions (continued)
\{min,max\} matches a precise number of occurrences of the preceding regular expression: min specifies the minimum number of occurrences to be matched, and max specifies the maximum.
w\{1,10\} matches from 1 to 10 consecutive w's.
[a-zA-Z]\{7\} matches exactly seven alphabetic characters.
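
A small sketch of the interval notation, assuming a sed and grep that support basic regular expression intervals (as POSIX versions do). The inputs are illustrative.

```shell
# Twelve w's: w\{1,10\} takes at most ten of them, leaving two behind.
w_run=$(printf 'wwwwwwwwwwww\n' | sed 's/w\{1,10\}/-/')    # -ww

# Anchored on both sides, \{7\} accepts only lines of exactly seven letters.
seven=$(printf 'abcdefg\nabcdef\nabcdefgh\n' | grep -c '^[a-zA-Z]\{7\}$')

printf '%s %s\n' "$w_run" "$seven"
```

Without the ^ and $ anchors, the eight-letter line would also be selected, because its first seven letters satisfy the pattern.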

Regular expressions (continued)
X\{5,\} matches at least five consecutive X's.
\(...\) saves the characters matched by the enclosed expression.
^\(.\) matches the first character on the line and stores it in register 1.
There are nine registers, numbered 1 through 9.
To retrieve what is stored in a register, use \n, where n is the register number.
Example: ^\(.\)\1 matches the first two characters on a line if they are the same character.

Regular expressions (continued)
^\(.\).*\1$ matches all lines in which the first character on the line is the same as the last. The .* matches all the characters in between.
^\(...\)\(...\) stores the first three characters on the line in register 1 and the next three characters in register 2.
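
The saved registers can be exercised with grep and sed. This is a sketch on made-up words; the swap in the last command is an added illustration of reusing registers 1 and 2 in a sed replacement.

```shell
# Hypothetical sample words (illustrative only).
lines='level
hello
aardvark
x'

# First character equals last character; needs at least two characters.
same_ends=$(printf '%s\n' "$lines" | grep '^\(.\).*\1$')   # level
# First two characters are the same.
doubled=$(printf '%s\n' "$lines" | grep '^\(.\)\1')        # aardvark
# Registers 1 and 2 hold three characters each; print them swapped.
swapped=$(printf 'abcdef\n' | sed 's/^\(...\)\(...\)/\2\1/')  # defabc

printf '%s\n' "$same_ends" "$doubled" "$swapped"
```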

cut
$ who
bgeorge pts/16 Oct 5 15:01 ( )
abakshi pts/13 Oct 6 19:48 ( )
tphilip pts/11 Oct 2 14:10 (AC8C6085.ipt.aol.com)
$ who | cut -c1-8,18-
bgeorge Oct 5 15:01 ( )
abakshi Oct 6 19:48 ( )
tphilip Oct 2 14:10 (AC8C6085.ipt.aol.com)
$
Used to extract various fields of data from a data file or from the output of a command.
Format: cut -cchars file
chars specifies which characters to extract from each line of file.

cut (continued)
Examples: -c5 extracts character 5; -c1,3,4 extracts characters 1, 3, and 4; -c5- extracts character 5 through the end of the line.
The -d and -f options are used with cut when the data is delimited by a particular character.
Format: cut -ddchar -ffields file
dchar: the character that delimits the fields (default: tab character)
fields: the fields to be extracted from file

cut (continued)
$ cat /etc/passwd
root:x:0:1:Super-User:/:/sbin/sh
daemon:x:1:1::/:
bin:x:2:2::/usr/bin:
sys:x:3:3::/:
adm:x:4:4:Admin:/var/adm:
lp:x:71:8:Line Printer Admin:/usr/spool/lp:
uucp:x:5:5:uucp Admin:/usr/lib/uucp:
listen:x:37:4:Network Admin:/usr/net/nls:
nobody:x:60001:60001:Nobody:/:
noaccess:x:60002:60002:No Access User:/:
oracle:*:101:67:DBA Account:/export/home/oracle:/bin/csh
webuser:*:102:102:Web User:/export/home/webuser:/bin/csh
abuzneid:x:103:100:Abdelshakour Abuzneid:/home/abuzneid:/sbin/csh
$

cut (continued)
$ cut -d: -f1 /etc/passwd
root
daemon
bin
sys
adm
lp
uucp
nuucp
listen
nobody
oracle
webuser
abuzneid
$
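
The character and field forms of cut can be compared on a few lines without touching the real /etc/passwd. This sketch reuses three entries from the listing above as inline data; the ten-letter string is an illustrative assumption.

```shell
# Three entries copied from the passwd listing, held in a variable.
passwd_sample='root:x:0:1:Super-User:/:/sbin/sh
daemon:x:1:1::/:
oracle:*:101:67:DBA Account:/export/home/oracle:/bin/csh'

# Field 1 with : as the delimiter gives the login names.
names=$(printf '%s\n' "$passwd_sample" | cut -d: -f1)
# Several fields can be listed; the delimiter is kept between them.
name_shell=$(printf '%s\n' "$passwd_sample" | cut -d: -f1,7)
# Character positions: 1 through 3, then 8 through the end of the line.
chars=$(printf 'abcdefghij\n' | cut -c1-3,8-)    # abchij

printf '%s\n' "$names" "$name_shell" "$chars"
```

The daemon entry has an empty seventh field, so its line in name_shell ends with a bare colon.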

cut (continued)
$ cat phonebook
Edward
Alice
Sony
Robert
$ cut -f1 phonebook
Edward
Alice
Sony
Robert
$

paste
Joins the corresponding lines of the given files into single lines.
Format: paste files
The tab character is the default delimiter.

paste (continued)
Example:
$ cat students
Sue
Vara
Elvis
Luis
Eliza
$ cat sid
$ paste students sid
Sue
Vara
Elvis
Luis
Eliza
$

paste (continued)
The -s option tells paste to join lines from the same file, one after another, instead of taking lines from alternate files.
To change the delimiter, use the -d option.

paste (continued)
Examples:
$ paste -d '+' students sid
Sue
Vara
Elvis
Luis
Eliza
$ paste -s students
Sue Vara Elvis Luis Eliza
$ ls | paste -d ' ' -s -
addr args list mail memo name nsmail phonebook programs roster sid students test tp twice user
$
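
A self-contained sketch of the paste options, using temporary files. The ID numbers in sid are hypothetical, since the slide's values are not available.

```shell
# Work in a scratch directory so no existing files are touched.
tmp=$(mktemp -d)
printf 'Sue\nVara\nElvis\n' > "$tmp/students"
printf '101\n102\n103\n' > "$tmp/sid"      # hypothetical student IDs

# Default: corresponding lines joined with a tab.
side_by_side=$(paste "$tmp/students" "$tmp/sid")
# -d changes the delimiter.
plus=$(paste -d '+' "$tmp/students" "$tmp/sid")
# -s joins the lines of one file into a single line.
serial=$(paste -s -d ' ' "$tmp/students")

printf '%s\n' "$side_by_side" "$plus" "$serial"
rm -r "$tmp"
```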

sed
sed (stream editor) is a program used for editing data.
Unlike ed, sed cannot be used interactively.
Format: sed command file
command: applied to each line of the specified file
file: if no file is specified, standard input is assumed
sed writes its output to standard output.
The command s/Unix/UNIX/ is applied to every line in the file; it replaces the first Unix on each line with UNIX.

sed (continued)
sed makes no changes to the original input file.
The command 's/Unix/UNIX/g' is applied to every line in the file; it replaces every Unix with UNIX. The g means global.
With the -n option, only selected lines are printed.
Example: sed -n '1,2p' file prints the first two lines.
Example: sed -n '/UNIX/p' file prints any line containing UNIX.

sed (continued)
Example: sed '1,2d' file deletes lines 1 and 2.
Example: sed -n 'l' text prints all lines from text, showing nonprinting characters as \nn and tab characters as ">" (many current versions of sed show a tab as \t instead).
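
The sed commands above can be compared side by side on a small input. This is a sketch; the three-line text is an assumption made up for the example.

```shell
# Hypothetical three-line input (illustrative only).
text='Unix and Unix
plain line
more Unix'

# Without g, only the first Unix on each line is replaced.
first=$(printf '%s\n' "$text" | sed 's/Unix/UNIX/')
# With g, every Unix on each line is replaced.
all=$(printf '%s\n' "$text" | sed 's/Unix/UNIX/g')
# -n suppresses default output; p prints only the selected lines.
top2=$(printf '%s\n' "$text" | sed -n '1,2p')
matching=$(printf '%s\n' "$text" | sed -n '/Unix/p')
# d deletes the addressed lines; everything else is printed as usual.
rest=$(printf '%s\n' "$text" | sed '1,2d')

printf '%s\n' "$first" "$all" "$top2" "$matching" "$rest"
```

Note that the delete example runs without -n: with -n and no p command, sed would print nothing at all.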

tr
The tr filter translates characters from standard input.
Format: tr from-chars to-chars
The result is written to standard output.
Example: tr e x < file translates every e in file to x and writes the result to standard output.
The octal representation of a character can be given to tr in the format \nnn.
Example: tr : '\11' translates every colon to a tab.

tr (continued)
Character        Octal value
Bell             7
Backspace        10
Tab              11
Newline          12
Linefeed         12
Form feed        14
Carriage return  15
Escape           33

tr (continued)
Example: tr '[a-z]' '[A-Z]' < file translates all lowercase letters in file to their uppercase equivalents.
The character ranges [a-z] and [A-Z] are enclosed in quotes to keep the shell from replacing them with the names of files named a through z and A through Z.
To squeeze out multiple occurrences of characters, use the -s option.

tr (continued)
Example: tr -s ' ' ' ' < file squeezes multiple spaces into one space.
The -d option deletes characters from the input stream.
Format: tr -d from-chars
Example: tr -d ' ' < file deletes all spaces from the input stream.
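
The tr forms covered above, gathered into one sketch. The input strings are illustrative, and the C locale is assumed so the letter ranges are the ASCII ones.

```shell
# Assumption: C locale, so [a-z] and [A-Z] are plain ASCII ranges.
LC_ALL=C; export LC_ALL

# Translate lowercase to uppercase; quotes protect the brackets from the shell.
upper=$(printf 'hello world\n' | tr '[a-z]' '[A-Z]')    # HELLO WORLD
# Octal 11 is the tab character.
tabs=$(printf 'a:b:c\n' | tr : '\11')
# -s squeezes runs of the character into a single occurrence.
squeezed=$(printf 'a   b    c\n' | tr -s ' ' ' ')       # a b c
# -d deletes every occurrence of the listed characters.
deleted=$(printf 'a b c\n' | tr -d ' ')                 # abc

printf '%s\n' "$upper" "$tabs" "$squeezed" "$deleted"
```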

grep
Searches one or more files for a particular character pattern.
Format: grep pattern files
Example: grep path .cshrc prints every line in the .cshrc file that contains the pattern path.
Example: grep bin .cshrc .login .profile prints every line from any of the three files .cshrc, .login, and .profile that contains the pattern bin.

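grep (continued)
Example: grep * smarts does not work as intended, because the shell replaces * with the names of all the files in the current directory.
Example: grep '*' smarts works: the quotes keep the shell from expanding *, so grep receives the two arguments * and smarts.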
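
To see what the shell actually hands to grep, the unquoted command can be echoed instead of run. This sketch uses a scratch directory; the file afile and the contents of smarts are hypothetical.

```shell
# Scratch directory with two files, so * has something to expand to.
tmp=$(mktemp -d)
printf 'one * star\nno star\n' > "$tmp/smarts"   # hypothetical contents
printf 'other\n' > "$tmp/afile"                  # hypothetical second file

# Unquoted: the shell expands * to the file names before grep ever runs.
expanded=$(cd "$tmp" && echo grep * smarts)      # grep afile smarts smarts
# Quoted: grep receives a literal * as its pattern.
quoted=$(cd "$tmp" && grep '*' smarts)           # one * star

printf '%s\n' "$expanded" "$quoted"
rm -r "$tmp"
```

In the unquoted case grep would treat afile as the pattern and search smarts twice, which is why the result looks wrong rather than failing outright.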

sort
By default, sort takes each line of the specified input file and sorts the lines into ascending order.
$ cat students
Sue
Vara
Elvis
Luis
Eliza
$ sort students
Eliza
Elvis
Luis
Sue
Vara
$

sort (continued)
The -u option tells sort to eliminate duplicate lines from the output.

sort (continued)
$ echo Ash >> students
$ cat students
Sue
Vara
Elvis
Luis
Eliza
Ash
$ sort students
Ash
Eliza
Elvis
Luis
Sue
Vara
$

sort (continued)
The -r option reverses the order of the sort.
The -o option writes the output to a file instead of to standard output.
sort students > sorted_students works the same as sort students -o sorted_students.
The -o option also allows a file to be sorted and the output saved to the same file.
Example: sort students -o students (correct)
Example: sort students > students (incorrect: the shell empties students before sort reads it)
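
A sketch of the sort options discussed so far, run on a temporary copy of a students file with one duplicated name added for illustration. The -o option is written before the file name here, a form that sort implementations generally accept; the C locale is an assumption to keep the ordering predictable.

```shell
LC_ALL=C; export LC_ALL
tmp=$(mktemp -d)
# Hypothetical file: the slide's names plus a repeated Sue.
printf 'Sue\nVara\nElvis\nLuis\nEliza\nSue\n' > "$tmp/students"

# -u drops duplicate lines from the sorted output.
unique=$(sort -u "$tmp/students")
# -r reverses the order; the first line is then the last name alphabetically.
reversed_first=$(sort -r "$tmp/students" | head -n 1)    # Vara
# -o may name the input file itself, so the file is sorted in place.
sort -o "$tmp/students" "$tmp/students"
in_place=$(cat "$tmp/students")

printf '%s\n' "$unique" "$reversed_first" "$in_place"
rm -r "$tmp"
```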

sort (continued)
The -n option tells sort to treat the first field on each line as a number and to sort the data arithmetically.

sort (continued)
$ cat data
$ sort data
$

sort (continued)
$ sort -n data
$ sort +1n data
$

sort (continued)
To sort by the second field, use +1n instead of -n. +1 says to skip the first field.
+5n would mean to skip the first five fields on each line and then sort the data numerically.
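
The +1n notation is the older way of naming a sort field; current versions of sort spell the same request as -k2n (the key starts at field 2, compared numerically). The sketch below uses that -k form, with hypothetical two-column data, since the contents of the data file are not shown.

```shell
# Hypothetical two-column numeric data (illustrative only).
data='5 27
2 12
3 33
23 2'

# Plain sort compares characters, so 23 lands before 3.
text_order=$(printf '%s\n' "$data" | LC_ALL=C sort)
# -n compares the leading number arithmetically.
numeric=$(printf '%s\n' "$data" | sort -n)
# -k2n: skip the first field and sort numerically on the second.
by_second=$(printf '%s\n' "$data" | sort -k2n)
# With -t: the third colon-separated field (the user ID) becomes the key.
by_uid=$(printf 'lp:x:71:8\nroot:x:0:1\nadm:x:4:4\n' | sort -t: -k3n | cut -d: -f1)

printf '%s\n' "$text_order" "$numeric" "$by_second" "$by_uid"
```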

sort (continued)
Example:
$ sort -t: +2n /etc/passwd
root:x:0:1:Super-User:/:/sbin/sh
daemon:x:1:1::/:
bin:x:2:2::/usr/bin:
sys:x:3:3::/:
adm:x:4:4:Admin:/var/adm:
uucp:x:5:5:uucp Admin:/usr/lib/uucp:
nuucp:x:9:9:uucp Admin:/var/spool/uucppublic:/usr/lib/uucp/uucico
listen:x:37:4:Network Admin:/usr/net/nls:
lp:x:71:8:Line Printer Admin:/usr/spool/lp:
oracle:*:101:67:DBA Account:/export/home/oracle:/bin/csh
webuser:*:102:102:Web User:/export/home/webuser:/bin/csh
nobody:x:60001:60001:Nobody:/:
$

uniq
Used to find duplicate lines in a file.
Format: uniq in_file out_file
uniq copies in_file to out_file, removing any duplicate lines in the process.
uniq defines duplicate lines as consecutive lines that match exactly.

uniq (continued)
$ cat students
Sue
Vara
Elvis
Luis
Eliza
Ash
$ uniq students
Sue
Vara
Elvis
Luis
Eliza
Ash
$
The -d option lists the duplicate lines.
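
Because uniq only compares neighboring lines, its result depends on the order of the input. This sketch shows that on a made-up list of names, along with the -d option.

```shell
# Hypothetical input: Sue appears twice in a row and once more later.
names='Sue
Sue
Vara
Sue'

# Only the consecutive pair collapses; the later Sue survives.
deduped=$(printf '%s\n' "$names" | uniq)
# -d prints one copy of each line that was repeated consecutively.
dups=$(printf '%s\n' "$names" | uniq -d)
# Sorting first makes all copies adjacent, so uniq removes every repeat.
sorted_dedup=$(printf '%s\n' "$names" | sort | uniq)

printf '%s\n' "$deduped" "$dups" "$sorted_dedup"
```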

References
UNIX Shells by Example, by Ellie Quigley
UNIX for Programmers and Users, by G. Glass and K. Ables
UNIX Shell Programming, by S. Kochan and P. Wood