TEXT PROCESSING UTILITIES

THE cat COMMAND

$ cat emp1.lst
2233 | shukla | g.m | sales | 12/12/52 | 20000
9876 | sharma |d.g.m |product | 12/03 60 | 15000
     | akash |dir. |mark. | 11/06/70 |9000
     | tiwary |g.m |product | 05/02/89 |23000
1234 | kumar | mgr |accnts | 18/03/79 |15000
     | anil |chman |sales | 30/02/69 |
     | lalith |mrg | mark. | 17/01/80 |
5678 | a | d | m | 12/12/80 |12000
This is the emp
database which stores
the information about various
employees.
that is employeenumber.
emp name
designation
department
date of birth
and their salary.

This is the emp database, which stores information about various employees. Each record holds the employee number, name, designation, department, date of birth and salary, with the fields separated by |.

DISPLAYING THE BEGINNING OF A FILE – THE head COMMAND
As its name implies, the head command displays the top lines of a file. When used without an option, it displays the first ten lines of the argument file.

$ head emp.lst
2233 | shukla | g.m | sales | 12/12/52 | 20000
9876 | sharma |d.g.m |product| 12/03 60 | 15000
     | akash |dir. |mark. | 11/06/70 |9000
     | tiwary |g.m |product| 05/02/89 |23000
1234 | kumar | mgr |accnts | 18/03/79 |15000
     | anil |chman |sales | 30/02/69 |
     | lalith |mrg | mark. | 17/01/80 |
5678 | a | d | m | 12/12/80 |12000
This is the emp
database which stores

You can specify the line count and display, say, the first three lines of the file. Use the - symbol followed by a numeric argument (the POSIX spelling of the same option is -n 3):

$ head -3 emp.lst
2233 | shukla | g.m | sales | 12/12/52 | 20000
9876 | sharma |d.g.m |product| 12/03 60 | 15000
     | akash |dir. |mark. | 11/06/70 |9000

If the line count exceeds the number of lines actually present in the file, head displays the entire file. You can also find the "record length" by counting the characters in the first line of the file:

$ head -1 emp.lst | wc -c
47

The count includes the newline character at the end of the line.
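The head options can be tried on a throwaway copy. This minimal sketch assumes a POSIX shell with mktemp available; the emp.lst it creates holds only three records taken from the listing above, so it is a sample rather than the full database. It uses the -n form of the option.

```shell
# Work in a throwaway directory so no real files are touched.
tmp=$(mktemp -d); trap 'rm -rf "$tmp"' EXIT; cd "$tmp" || exit 1

# Three records in the emp.lst layout (a sample, not the whole file).
printf '%s\n' \
  '2233 | shukla | g.m | sales | 12/12/52 | 20000' \
  '1234 | kumar | mgr |accnts | 18/03/79 |15000' \
  '5678 | a | d | m | 12/12/80 |12000' > emp.lst

head -n 2 emp.lst            # first two lines (older form: head -2 emp.lst)
head -n 1 emp.lst | wc -c    # record length of line 1, newline included
head -n 50 emp.lst | wc -l   # count larger than the file: all 3 lines print
```

On this sample the first line again has a record length of 47.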

head also works with multiple files. For each file, it prints a header line with the filename, followed by the lines extracted:

$ head -2 emp.lst f1.lst
==> emp.lst <==
2233 | shukla | g.m | sales | 12/12/52 | 20000
9876 | sharma |d.g.m|product| 12/03 60 | 15000

==> f1.lst <==
root tty :56 (:0)
root pts/ :56 (:0)

DISPLAYING THE END OF A FILE – THE tail COMMAND
The tail command displays the end of a file. It provides an additional method of addressing lines, and can also extract information in units of blocks and characters. Like head, it displays the last ten lines when used without arguments:

$ tail -3 emp.lst
department
date of birth
and their salary.

$ tail emp.lst
This is the emp
database which stores
the information about various
employees.
that is employeenumber.
emp name
designation
department
date of birth
and their salary.

The c suffix selects characters instead of lines. Here -40c displays the last 40 characters, so the output starts partway through the word "department":

$ tail -40c emp.lst
artment
date of birth
and their salary.

The -v option always prints a header giving the file name, even for a single file:

$ tail -v emp.lst
==> emp.lst <==
This is the emp
database which stores
the information about various
employees.
that is employeenumber.
emp name
designation
department
date of birth
and their salary.

The disadvantage of head and tail is that, on its own, neither command can display a range of lines. Moreover, what is displayed is final: once you have displayed the first 50 lines of a file, you cannot move back and view, say, the 10th line again.
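Chained in a pipeline, though, the two commands can pick out a range: head stops after the last wanted line and tail keeps the end of that output. A minimal sketch, where numbered.txt is a hypothetical generated file:

```shell
tmp=$(mktemp -d); trap 'rm -rf "$tmp"' EXIT; cd "$tmp" || exit 1

# printf reuses its format for each argument: "line 1" ... "line 8".
printf 'line %s\n' 1 2 3 4 5 6 7 8 > numbered.txt

# head keeps lines 1-5; tail then keeps the last 3 of those (lines 3-5).
head -n 5 numbered.txt | tail -n 3
```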

tail can also address lines from the beginning of the file instead of the end. The +count option does this, where count is the line number from which the selection begins:

$ tail -n +8 emp.lst
5678 | a | d | m | 12/12/80 |12000
This is the emp
database which stores
the information about various
employees.
that is employeenumber.
emp name
designation
department
date of birth
and their salary.
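A short sketch contrasting the ways tail counts, assuming a POSIX shell; five.txt is a hypothetical five-line file:

```shell
tmp=$(mktemp -d); trap 'rm -rf "$tmp"' EXIT; cd "$tmp" || exit 1

printf 'line %s\n' 1 2 3 4 5 > five.txt

tail -n +4 five.txt   # counts from the top: line 4 to the end (line 4, line 5)
tail -n 2 five.txt    # counts from the bottom: the last 2 lines (same here)
tail -c 7 five.txt    # last 7 bytes: "line 5" plus its newline
```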

SLICING A FILE VERTICALLY – THE cut COMMAND
While head and tail slice a file horizontally, you can slice a file vertically with the cut command. cut identifies both columns and fields.

Syntax: cut options filename

For example, store the first 5 lines of the file emp.lst in a file shortlist:

$ head -5 emp.lst > shortlist

$ cat shortlist
2233 | shukla | g.m | sales | 12/12/52 | 20000
9876 | sharma |d.g.m|product| 12/03 60 | 15000
     | akash |dir. |mark. | 11/06/70 |9000
     | tiwary |g.m |product| 05/02/89 |23000
1234 | kumar | mgr |accnts | 18/03/79 |15000

cut can extract specific columns from this file. Use the -c (columns) option for cutting columns:

$ cut -c5-20 shortlist
| shukla | g.m
| sharma |d.g.m
| akash |dir.
| tiwary |g.m
| kumar | mgr

Column numbers must immediately follow the option. Ranges are permitted, and commas separate the column chunks.

$ cut -c2-5,10-15,40- shortlist
233 ukla ||
arma ||
ash ||
wary ||
mar ||15000

The expression 40- indicates column 40 to the end of the line. Tracking fields by column position is tedious, and it fails when the file doesn't contain fixed-length records. Instead, you can extract specific fields using two options: -d (delimiter) to specify the field delimiter and -f (field) to specify the field list. When you use the -f option, don't forget to use the -d option too, unless the file uses the default delimiter (the tab).

$ cut -d"|" -f2,3 shortlist | tee clist1
 shukla | g.m
 sharma |d.g.m
 akash |dir.
 tiwary |g.m
 kumar | mgr

The | is quoted so that the shell passes it to cut instead of treating it as a pipe. The tee command saves the output in the file clist1 and also displays it on the terminal:

$ cat clist1
 shukla | g.m
 sharma |d.g.m
 akash |dir.
 tiwary |g.m
 kumar | mgr
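Both cutting methods can be checked on a small sample. In this sketch short.lst is a hypothetical two-record file in the shortlist layout, and sal is a hypothetical output file name:

```shell
tmp=$(mktemp -d); trap 'rm -rf "$tmp"' EXIT; cd "$tmp" || exit 1

printf '%s\n' \
  '2233 | shukla | g.m | sales | 12/12/52 | 20000' \
  '1234 | kumar | mgr |accnts | 18/03/79 |15000' > short.lst

# Quote the | so the shell passes it to cut instead of starting a pipe.
cut -d'|' -f2,3 short.lst           # name and designation, rejoined by |
cut -c1-4 short.lst                 # characters 1-4: the employee number
cut -d'|' -f6 short.lst | tee sal   # salary field, shown and saved in sal
```

The fields keep the spaces that surround them in the file; cut does not trim them.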

PASTING FILES – THE paste COMMAND
What you cut with the previous command can be pasted back with the paste command. In this respect it resembles the cat command, but while cat joins files one after another, paste joins them side by side, line by line. First, cut out the sixth field (the salary) into a second file:

$ cut -d"|" -f6 shortlist | tee clist2

cut has now been used to create two files, clist1 and clist2, containing two cut-out portions of the same file.

$ paste clist1 clist2
 shukla | g.m     20000
 sharma |d.g.m    15000
 akash |dir.      9000
 tiwary |g.m      23000
 kumar | mgr      15000

By default, paste uses the tab character to join files. You can specify a delimiter of your choice with the -d option:

$ paste -d"#" clist1 clist2
 shukla | g.m # 20000
 sharma |d.g.m # 15000
 akash |dir. # 9000
 tiwary |g.m # 23000
 kumar | mgr # 15000

When you use the -d option with several files on the command line, you can specify more than one delimiter. For example:

$ paste -d" |#~" file1 file2 file3 file4 file5

This uses the space character to paste file1 and file2, the | character to paste file2 and file3, and so forth. If there are fewer delimiters than joins, the list is reused from the start.
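A sketch of the delimiter list, using hypothetical one-word files so the joins are easy to see:

```shell
tmp=$(mktemp -d); trap 'rm -rf "$tmp"' EXIT; cd "$tmp" || exit 1

printf '%s\n' a1 a2 > file1
printf '%s\n' b1 b2 > file2
printf '%s\n' c1 c2 > file3
printf '%s\n' d1 d2 > file4

paste file1 file2                        # default tab between a1 and b1
paste -d' |#' file1 file2 file3 file4    # a1 b1|c1#d1, then a2 b2|c2#d2
paste -d'|' file1 file2 file3            # one delimiter, reused: a1|b1|c1
```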

ORDERING A FILE – THE sort COMMAND
sort sorts the contents of a file. It can also merge multiple sorted files and store the result in a specified output file. When invoked without options, it sorts on the entire line:

$ sort shortlist
1234 | kumar | mgr |accnts | 18/03/79 |15000
2233 | shukla | g.m | sales | 12/12/52 | 20000
     | tiwary |g.m |product| 05/02/89 |23000
     | akash |dir. |mark. | 11/06/70 |9000
9876 | sharma |d.g.m|product| 12/03 60 | 15000

Sorting starts with the first character of each line. If the first characters of two lines are the same, the second characters are compared, and so on. The comparison follows the ASCII collating sequence (on modern systems, strictly so only in the C locale; other locales may order characters differently): tabs and spaces sort first, then most punctuation marks, followed by numbers, uppercase letters and lowercase letters, in that order.

Like cut and paste, sort also works on fields; by default, fields are separated by blanks. The -t option, followed immediately by the delimiter, overrides the default. This lets you sort the file on any field, for instance the second field (name):

$ sort -t"|" -k2 shortlist

A key given as -k2 starts at the second field and runs to the end of the line.

The sort order can be reversed with the -r (reverse) option:

$ sort -r shortlist
9876 | sharma |d.g.m|product| 12/03 60 | 15000
     | akash |dir. |mark. | 11/06/70 |9000
     | tiwary |g.m |product| 05/02/89 |23000
2233 | shukla | g.m | sales | 12/12/52 | 20000
1234 | kumar | mgr |accnts | 18/03/79 |15000

You can sort the contents of several files in one go:

$ sort file1 file2 file3

Instead of displaying the sorted output on the screen, you can store it in a file with the -o option:

$ sort -o result clist1
$ cat result
 akash |dir.
 kumar | mgr
 sharma |d.g.m
 shukla | g.m
 tiwary |g.m

To check whether a file has actually been sorted, use the -c option. It produces no output if the file is sorted; otherwise it reports the first line that is out of order:

$ sort -c shortlist

Sorting on a secondary key: You can sort on more than one key, i.e., you can provide a secondary key. For example, if the primary key is the 3rd field and the secondary key is the 2nd field, you need to specify, for every -k option, where the sort key ends:

$ sort -t"|" -k3,3 -k2,2 shortlist
9876 | sharma |d.g.m|product| 12/03 60 | 15000
     | akash |dir. |mark. | 11/06/70 |9000
2233 | shukla | g.m | sales | 12/12/52 | 20000
     | tiwary |g.m |product| 05/02/89 |23000
1234 | kumar | mgr |accnts | 18/03/79 |15000

This sorts the file by designation and then by name. The -k3,3 option indicates that the sort key starts on the 3rd field and ends on the same field.
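The secondary key matters only when primary keys tie. This sketch uses three hypothetical records with the spaces around the fields removed (the name anand and its data are invented for the example); LC_ALL=C is set so the comparison uses plain ASCII order:

```shell
tmp=$(mktemp -d); trap 'rm -rf "$tmp"' EXIT; cd "$tmp" || exit 1
export LC_ALL=C   # byte (ASCII) ordering, independent of the user's locale

# Hypothetical records: two share the designation g.m.
printf '%s\n' \
  '2233|shukla|g.m|sales|12/12/52|20000' \
  '1234|kumar|mgr|accnts|18/03/79|15000' \
  '4321|anand|g.m|product|01/01/70|9000' > staff.lst

# Primary key: field 3 only; ties are broken by field 2 only.
sort -t'|' -k3,3 -k2,2 staff.lst   # anand, shukla (both g.m), then kumar
```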

Sorting on columns: You can also specify a character position within a field as the start of the sort key. For example, to sort the file by year of birth, sort on the 7th and 8th character positions of the 5th field:

$ sort -t"|" -k5.7,5.8 shortlist
2233 | shukla | g.m | sales | 12/12/52 | 20000
9876 | sharma |d.g.m|product| 12/03 60 | 15000
1234 | kumar | mgr |accnts | 18/03/79 |15000
     | akash |dir. |mark. | 11/06/70 |9000
     | tiwary |g.m |product| 05/02/89 |23000

With -t, character positions are counted from the first character after the delimiter, so a space at the start of the field counts as position 1.

Numeric sort (-n): When sort acts on numerals, strange things can happen:

$ cat > nfile
$ sort nfile
10
2
27
4

This is probably not what you expected, but the ASCII collating sequence places 1 above 2, and 2 above 4. That's why 10 precedes 2 and 27 precedes 4. You can override this with the -n (numeric) option:

$ sort -n nfile
2
4
10
27
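On most current systems sort compares according to the user's locale rather than strictly by ASCII; setting LC_ALL=C restores the byte-by-byte order described above. A minimal sketch with a hypothetical input order for nfile:

```shell
tmp=$(mktemp -d); trap 'rm -rf "$tmp"' EXIT; cd "$tmp" || exit 1
export LC_ALL=C   # compare characters by their ASCII codes

printf '%s\n' 27 2 10 4 > nfile   # hypothetical input order

sort nfile       # character by character: 10, 2, 27, 4
sort -n nfile    # by numeric value: 2, 4, 10, 27
sort -rn nfile   # numeric and reversed: 27, 10, 4, 2
```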

Removing repeated lines (-u): The -u (unique) option removes repeated lines from the output. To find the unique designations that occur in the file, cut out the designation field and pipe it to sort:

$ cut -d"|" -f3 e.lst | sort -u | tee desg.lst
 dir.
 g.m
 mgr

Merge sort (-m): When sort is given multiple filenames as arguments, it concatenates them and sorts them collectively. When large files are sorted this way, performance often suffers. The -m (merge) option merges two or more files that are already sorted individually:

$ sort -m f1 f2 f3
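A sketch of both options, assuming LC_ALL=C for predictable ordering; desig, f1 and f2 are hypothetical files, and f1 and f2 are each already sorted before the merge:

```shell
tmp=$(mktemp -d); trap 'rm -rf "$tmp"' EXIT; cd "$tmp" || exit 1
export LC_ALL=C

printf '%s\n' mgr g.m dir. g.m mgr > desig
sort -u desig        # each designation once: dir., g.m, mgr

printf '%s\n' akash sharma > f1   # sorted input 1
printf '%s\n' kumar tiwary > f2   # sorted input 2
sort -m f1 f2        # merged in order: akash, kumar, sharma, tiwary
```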

sort options

Option      Description
-t char     Uses delimiter char to identify fields
-k n        Sorts on a key starting at the nth field (to the end of the line)
-k m,n      Starts sort on mth field and ends sort on nth field
-k m.n      Starts sort on nth column of mth field
-u          Removes repeated lines
-n          Sorts numerically
-r          Reverses sort order
-f          Folds lowercase to equivalent uppercase (case-insensitive sort)
-m list     Merges sorted files in list
-c          Checks if the file is sorted
-o flname   Places output in file flname

THE uniq COMMAND
Duplicate entries often creep in because of faulty data entry. UNIX offers a special tool to handle such records: the uniq command. The command is most useful when placed in pipelines, and can be used like an SQL query with DISTINCT.

$ cat dept.lst
01 | accounts |
   | admin |
   | marketing | 6521
$ uniq dept.lst
01 | accounts |
   | admin |
   | marketing | 6521

uniq simply fetches one copy of each group of repeated lines and writes it to standard output. Because uniq compares only adjacent lines, it requires sorted input, so the general procedure is to sort the file and pipe the output to uniq. The following pipeline produces the same output, except that it is saved in the file ulist (the - stands for standard input):

$ sort dept.lst | uniq - ulist
$ cat ulist
01 | accounts |
   | admin |
   | marketing | 6521

Like sort, uniq also accepts a filename as an argument. Since the output file is also named without an option (unlike -o in sort), make sure you don't specify multiple input filenames to this command; uniq uses only one input file at a time.

If you give two filenames, uniq processes the first file and overwrites the second with its output, so the data in the second file is lost. If uniq is merely to select unique lines, it is preferable to use sort -u. But uniq has a couple of options that can be used to make simple database queries. For example, to determine the designation that occurs only once in the file e.lst, cut out the 3rd field, sort it, and then pipe it to uniq:

$ cat e.lst
2233 | shukla | g.m | sales | 12/12/52 |
     | sharma | mgr |product| 12/03 60 | 15000
     | akash | dir. |mark. | 11/06/70 |
     | tiwary | g.m |product| 05/02/89 |
1234 | kumar | mgr |accnts | 18/03/79 |1500

The -u (unique) option selects only the non-repeated lines:

$ cut -d"|" -f3 e.lst | sort | uniq -u
 dir.

The -d (duplicate) option selects only one copy of the repeated lines:

$ cut -d"|" -f3 e.lst | sort | uniq -d
 g.m
 mgr

And the -c (count) option displays the frequency of occurrence of all lines, along with the lines:

$ cut -d"|" -f3 e.lst | sort | uniq -c
      1 dir.
      2 g.m
      2 mgr
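The same queries can be reproduced without the database file. This sketch writes the designations of e.lst, in file order and with the surrounding spaces dropped, into a hypothetical file desig. The first command also shows why uniq needs sorted input: it only collapses duplicates that sit next to each other.

```shell
tmp=$(mktemp -d); trap 'rm -rf "$tmp"' EXIT; cd "$tmp" || exit 1
export LC_ALL=C

# Designations in the order they occur in e.lst
printf '%s\n' g.m mgr dir. g.m mgr > desig

uniq desig              # unsorted: no duplicates are adjacent, all 5 lines print
sort desig | uniq       # dir., g.m, mgr
sort desig | uniq -u    # only lines occurring once: dir.
sort desig | uniq -d    # one copy of each repeated line: g.m, mgr
sort desig | uniq -c    # counts in front: 1 dir., 2 g.m, 2 mgr
```

The padding before each count in the uniq -c output varies between implementations.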

LINE NUMBERING – THE nl COMMAND
UNIX has a separate command with elaborate schemes for numbering lines: the nl command. nl numbers only logical lines, i.e., lines that contain something apart from the newline character. By default, nl simply adds line numbers to its input, printing them in a space six characters wide:

$ nl clist1
     1   shukla | g.m
     2   sharma |d.g.m
     3   akash |dir.
     4   tiwary |g.m
     5   kumar | mgr

nl uses the tab character to separate the numbers from the text. Use the -w (width) option to specify the width of the number format, and -s (separator) to specify the separator:

$ nl -w2 -s":" clist1
 1: shukla | g.m
 2: sharma |d.g.m
 3: akash |dir.
 4: tiwary |g.m
 5: kumar | mgr

To have leading zeroes in the number field, use the -n option:

$ nl -w2 -s":" -nrz clist1
01: shukla | g.m
02: sharma |d.g.m
03: akash |dir.
04: tiwary |g.m
05: kumar | mgr

The -n option, followed immediately by the parameter rz, right-justifies the number, with leading zeroes filling the gap. The other formats are ln, which left-justifies the number without leading zeroes, and rn (the default), which right-justifies it without leading zeroes.

In many applications, code tables start from a number other than 1 (or 01 or 001). The -v option, followed by a number, sets the initial value used to number the lines. For example, to start at 40:

$ nl -w2 -s":" -nrz -v40 clist1
40: shukla | g.m
41: sharma |d.g.m
42: akash |dir.
43: tiwary |g.m
44: kumar | mgr

You can set the increment too, with the -i (increment) option:
Ex:
$ nl -w2 -s":" -nrz -v40 -i5 clist1
40: shukla | g.m
45: sharma |d.g.m
50: akash |dir.
55: tiwary |g.m
60: kumar | mgr
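The nl options above can be tried together in a short POSIX sh script. This is a sketch: the contents of clist1 are a hypothetical stand-in modelled on the listings above, and mktemp -d is assumed to be available (it is on common Linux and BSD systems).

```shell
#!/bin/sh
# Work in a scratch directory that is removed when the script exits.
tmp=$(mktemp -d) || exit 1
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1

# Hypothetical clist1: name and designation columns.
printf '%s\n' ' shukla | g.m' ' sharma |d.g.m' ' akash |dir.' \
  ' tiwary |g.m' ' kumar | mgr' > clist1

# Default rn format: right-justified in a 2-character field, space-padded.
plain=$(nl -w2 -s':' clist1)

# rz format (leading zeroes), starting at 40 and stepping by 5.
coded=$(nl -w2 -s':' -nrz -v40 -i5 clist1)

printf '%s\n' "$plain" "$coded"
```

Replacing -nrz with -nln would left-justify the numbers instead.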

TRANSLATING CHARACTERS - THE tr COMMAND
The tr (translate) filter manipulates individual characters in a line. It translates characters using one or two compact expressions:
Syntax: tr [options] expression1 [expression2]   (input is read from the standard input)
tr takes input only from the standard input; it doesn't take a filename as argument. By default, it translates each character in expression1 to its mapped counterpart in expression2: the 1st character in the 1st expression is replaced with the 1st character in the 2nd expression, and similarly for the other characters.

Ex: To replace the "|" with a ~ (tilde) and the "/" with a -:
$ tr '|/' '~-' < shortlist | head -n 2
2233 ~ shukla ~ g.m ~ sales ~ 12-12-52 ~ 20000
… ~ sharma ~d.g.m~product~ …
Changing case of text:
To change the case of the first two lines from lower to upper:
$ head -n 2 e.lst | tr '[a-z]' '[A-Z]'
2233 | SHUKLA | G.M | SALES | 12/12/52 | 20000
9876 | SHARMA | MGR |PRODUCT| 12/03/60 | 15000
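Both translations can be checked with a small POSIX sh script. The one-record shortlist below is a hypothetical sample built from the first record shown in this chapter, and mktemp -d is assumed to exist.

```shell
#!/bin/sh
tmp=$(mktemp -d) || exit 1
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1

printf '%s\n' '2233 | shukla | g.m | sales | 12/12/52 | 20000' > shortlist

# Position-by-position mapping: "|" -> "~" and "/" -> "-".
# tr reads only standard input, hence the < redirection.
tilde=$(tr '|/' '~-' < shortlist)

# Lowercase to uppercase; the brackets in both sets just map onto themselves.
upper=$(tr '[a-z]' '[A-Z]' < shortlist)

printf '%s\n' "$tilde" "$upper"
```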

Using ASCII octal values and escape sequences:
tr also uses octal values and escape sequences to represent characters. To put each field on a separate line, replace the "|" with the LF character (octal value 012; the escape sequence '\n' names the same character):
$ tr '|' '\012' < emp.lst | head -n 6
2233
shukla
g.m
sales
12/12/52
…
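A sketch of the octal form in POSIX sh. The record is hypothetical and written without spaces around the delimiters, so each field comes out clean on its own line.

```shell
#!/bin/sh
tmp=$(mktemp -d) || exit 1
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1

printf '%s\n' '2233|shukla|g.m|sales|12/12/52|20000' > emp.lst

# \012 is the octal code of the newline (LF) character.
octal=$(tr '|' '\012' < emp.lst | head -n 6)

# The escape sequence \n names the same character.
escape=$(tr '|' '\n' < emp.lst | head -n 6)

printf '%s\n' "$octal"
```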

Deleting characters (-d):
To delete the characters "|" and "/" from the file:
$ tr -d '|/' < shortlist | head -n 2
2233  shukla  g.m  sales  121252  20000
…  sharma d.g.m product  …
Note that the date 12/12/52 becomes 121252, since every "/" is deleted.
Compressing multiple consecutive characters (-s):
We can eliminate all redundant spaces in files with delimited fields with the -s (squeeze) option. The -s option squeezes multiple consecutive occurrences of each character in its argument to a single character:
$ tr -s ' ' < shortlist | head -n 3
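The two options combine naturally: deleting the delimiters leaves pairs of spaces behind, which -s then squeezes. A minimal POSIX sh sketch, using a hypothetical one-record shortlist:

```shell
#!/bin/sh
tmp=$(mktemp -d) || exit 1
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1

printf '%s\n' '2233 | shukla | g.m | sales | 12/12/52 | 20000' > shortlist

# -d takes a single expression: every "|" and "/" is removed.
deleted=$(tr -d '|/' < shortlist)

# Each run of consecutive spaces left behind is squeezed to one space.
squeezed=$(tr -d '|/' < shortlist | tr -s ' ')

printf '%s\n' "$deleted" "$squeezed"
```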

File Utilities: cut, paste, head, tail, cmp, comm, diff

Filters
Filters are a group of commands, each of which accepts some data as input, performs some manipulation on it, and produces some output. Since they perform a filtering action on the data, they are appropriately called filters. Examples: grep, egrep, fgrep, sed, awk, sort, uniq, nl.

SEARCHING FOR A PATTERN – THE grep COMMAND
grep scans a file for the occurrence of a pattern. Its name comes from the ed editor command g/re/p: globally search for a regular expression and print. Depending on the options used, it outputs the lines containing the pattern, the filenames, or the line numbers.
Syntax: grep options pattern filename(s)
Most of grep's options are also shared by the other members of its family, egrep and fgrep.

In addition to options, grep requires an expression representing the pattern to be searched for. The first argument (other than the options) is always treated as the expression, and the remaining ones as filenames. grep looks for all occurrences of the expression in its input and, by default, outputs the lines containing the expression.

Ex:
$ grep "sales" e.lst
2233 | shukla | g.m | sales | 12/12/52 | 20000
When grep is used with multiple filenames, it prefixes each output line with the name of the file it came from:
$ grep "sales" e.lst shortlist
e.lst:2233 | shukla | g.m | sales | 12/12/52 | 20000
shortlist:2233 | shukla | g.m | sales | 12/12/52 | 20000

Because grep is also a filter, it can search its standard input for the pattern and store the output in a file:
$ who | grep itlaxmi > fff
Quoting in grep:
Quoting is essential if the search string consists of more than one word, or uses any of the shell's special characters like *, $ etc.
grep simply returns the prompt when the pattern can't be located (its exit status is then 1):
$ grep president shortlist
$
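Quoting and the silent "no match" case can both be seen in a short POSIX sh sketch. The two-line shortlist is hypothetical sample data; the exit-status values (0 for a match, 1 for no match, greater than 1 for an error) are the ones POSIX specifies for grep.

```shell
#!/bin/sh
tmp=$(mktemp -d) || exit 1
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1

printf '%s\n' \
  '2233 | shukla | g.m | sales | 12/12/52 | 20000' \
  '9876 | sharma | mgr |product| 12/03/60 | 15000' > shortlist

# Quoted, "g.m | sales" reaches grep as ONE argument. Unquoted, the
# shell would treat | as a pipe and try to run a command named sales.
hit=$(grep "g.m | sales" shortlist)

# No match: grep prints nothing and exits with status 1.
status=0
grep president shortlist || status=$?

printf '%s\n' "$hit" "exit status: $status"
```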

grep options
Option      Significance
-c          Displays a count of the number of lines containing the pattern
-l          Displays only the list of filenames
-n          Displays line numbers along with the lines
-v          Doesn't display lines matching the expression
-i          Ignores case when matching
-h          Omits filenames when handling multiple files
-f flname   Takes expressions from file flname (egrep and fgrep only on older systems; POSIX grep also accepts it)
-x          Displays only lines matched in their entirety (fgrep only on older systems; POSIX grep also accepts it)

Examples
1. $ grep -h mgr emp.lst shortlist
1234 | kumar | mgr |accnts | 18/03/79 |15000
2. $ grep -c 'mgr' e.lst emp.lst
e.lst:2
emp.lst:1
3. $ grep -n 'mgr' e.lst emp.lst
e.lst:2:9876 | sharma | mgr |product| 12/03/60 | 15000
e.lst:5:1234 | kumar | mgr |accnts | 18/03/79 |1500
emp.lst:5:1234 | kumar | mgr |accnts | 18/03/79 |15000

Examples
4. $ grep -v 'mgr' e.lst
2233 | shukla | g.m | sales | 12/12/52 | 20000
7898 | akash | dir.|mark. | 11/06/70 |9000
… | tiwary | g.m |product| 05/02/89 | …
The -v option can thus be used to delete lines with grep: the matching lines are dropped from the output (the file itself is not changed).
5. $ grep -l 'mgr' *.lst
desg.lst
desig.lst
e1.lst
e.lst
emp1.lst
emp.lst

Examples
6. $ grep -i 'SHUKLA' e.lst
2233 | shukla | g.m | sales | 12/12/52 | 20000
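The options in the table can be exercised together in one POSIX sh sketch. The contents of e.lst and emp.lst here are hypothetical, cut-down samples built from the records shown above (so line numbers differ from the slides).

```shell
#!/bin/sh
tmp=$(mktemp -d) || exit 1
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1

printf '%s\n' \
  '2233 | shukla | g.m | sales | 12/12/52 | 20000' \
  '9876 | sharma | mgr |product| 12/03/60 | 15000' \
  '7898 | akash | dir.|mark. | 11/06/70 |9000' \
  '1234 | kumar | mgr |accnts | 18/03/79 |1500' > e.lst
printf '%s\n' \
  '2233 | shukla | g.m | sales | 12/12/52 | 20000' \
  '1234 | kumar | mgr |accnts | 18/03/79 |15000' > emp.lst

counts=$(grep -c 'mgr' e.lst emp.lst)   # per-file count of matching lines
numbered=$(grep -n 'mgr' e.lst)          # one file, so no filename prefix
others=$(grep -v 'mgr' e.lst)            # lines that do NOT match
files=$(grep -l 'mgr' *.lst)             # names of files with a match
nocase=$(grep -i 'SHUKLA' e.lst)         # case-insensitive match
bare=$(grep -h 'kumar' e.lst emp.lst)    # filenames suppressed

printf '%s\n' "$counts" "$numbered" "$files"
```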

Basic Regular Expressions (BRE)
You don't always search a file with simple strings. You may be looking for a name but not know exactly how it is spelt, or you may be interested in the occurrences of a pattern only at a certain location, e.g. the beginning of a record. The importance of grep lies not merely in its simple pattern-matching capability but in its acceptance of a regular expression as a pattern. A regular expression is a string of ordinary characters and metacharacters which can be used to match more than one pattern.

The BRE Character Set Used by grep, sed and awk
Pattern     Matches
*           Zero or more occurrences of the previous character
.           A single character
[pqr]       A single character p, q or r
[c1-c2]     A single character within the ASCII range represented by c1 and c2
[^pqr]      A single character which is not a p, q or r
^pat        Pattern pat at beginning of line
pat$        Pattern pat at end of line

Examples
Pattern     Matches
g*          Nothing or g, gg, ggg, etc.
gg*         g, gg, ggg, etc.
.*          Nothing or any number of characters
[1-3]       A digit between 1 and 3
[^a-zA-Z]   A nonalphabetic character
bash$       bash at end of line
^bash$      bash as the only word in line
^$          Lines containing nothing (empty lines)

Examples
$ grep "k.*" e.lst
2233 | shukla | g.m | sales | 12/12/52 | 20000
7898 | akash | dir.|mark. | 11/06/70 |9000
1234 | kumar | mgr |accnts | 18/03/79 |1500
$ grep "9000$" e.lst
7898 | akash | dir.|mark. | 11/06/70 |9000
$ grep '[Ss]h*arma' e.lst
9876 | sharma | mgr |product| 12/03/60 | 15000
… | Sarma | dir.| sales | 05/09/60 |25000
$ grep '[1-2]...$' e.lst
1234 | kumar | mgr |accnts | 18/03/79 |1500
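The BRE examples can be reproduced with a POSIX sh sketch. The records are modelled on e.lst, but the empid 5678 for Sarma and the trailing blank line are made up for the demonstration.

```shell
#!/bin/sh
tmp=$(mktemp -d) || exit 1
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1

# Hypothetical sample: empid 5678 and the final empty line are invented.
printf '%s\n' \
  '2233 | shukla | g.m | sales | 12/12/52 | 20000' \
  '9876 | sharma | mgr |product| 12/03/60 | 15000' \
  '7898 | akash | dir.|mark. | 11/06/70 |9000' \
  '1234 | kumar | mgr |accnts | 18/03/79 |1500' \
  '5678 | Sarma | dir.| sales | 05/09/60 |25000' \
  '' > e.lst

k_lines=$(grep -c 'k.*' e.lst)       # same lines as plain 'k' would select
ends=$(grep '9000$' e.lst)           # "9000" anchored at end of line
spelt=$(grep -c '[Ss]h*arma' e.lst)  # h* allows zero h's: sharma and Sarma
tail4=$(grep '[1-2]...$' e.lst)      # a 1 or 2, then exactly 3 chars, then end
blank=$(grep -c '^$' e.lst)          # empty lines

printf '%s\n' "$k_lines" "$ends" "$spelt" "$tail4" "$blank"
```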

EXTENDING grep – THE egrep COMMAND
The egrep command extends grep's pattern-matching capabilities. It offers all the options of grep, but its most useful feature is the facility to specify more than one pattern for search, each separated from the next by a | (pipe). On current systems, egrep is equivalent to grep -E.

The extended regular expression set used by egrep and awk
Expression    Significance
ch+           Matches one or more occurrences of the character ch
ch?           Matches zero or one occurrence of the character ch
exp1|exp2     Matches the expression exp1 or exp2
(x1|x2)x3     Matches the expression x1x3 or x2x3

Examples
g+               At least one g
g?               Nothing or one g
GIF|JPEG         Matches GIF or JPEG
(lock|ver)wood   Matches lockwood or verwood
$ egrep 'sales |mark.' e.lst
2233 | shukla | g.m | sales | 12/12/52 | 20000
7898 | akash | dir.|mark. | 11/06/70 |9000
… | Sarma | dir.| sales | 05/09/60 |25000
$ egrep -i '(sh|s)arma' e.lst
9876 | sharma | mgr |product| 12/03/60 | 15000
… | Sarma | dir.| sales | 05/09/60 |25000

$ egrep -f pat.lst emp.lst
The command takes the expressions from the file pat.lst. This file must contain the patterns, delimited in the same way as they are specified on the command line (for example, sales|personnel|admin).
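The same searches can be sketched with grep -E, the POSIX spelling of egrep (newer GNU grep releases warn that the egrep name is obsolescent). The records are hypothetical samples, with a made-up empid for Sarma; the dot in mark\. is escaped here so that it matches only a literal dot.

```shell
#!/bin/sh
tmp=$(mktemp -d) || exit 1
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1

# Hypothetical sample records; empid 5678 is invented.
printf '%s\n' \
  '2233 | shukla | g.m | sales | 12/12/52 | 20000' \
  '9876 | sharma | mgr |product| 12/03/60 | 15000' \
  '7898 | akash | dir.|mark. | 11/06/70 |9000' \
  '1234 | kumar | mgr |accnts | 18/03/79 |1500' \
  '5678 | Sarma | dir.| sales | 05/09/60 |25000' > e.lst

either=$(grep -cE 'sales|mark\.' e.lst)   # 3: two sales lines, one mark.
names=$(grep -ciE '(sh|s)arma' e.lst)     # 2: sharma and Sarma
# No spaces inside the group, or "lock " would be required literally.
woods=$(printf '%s\n' lockwood verwood wood | grep -cE '(lock|ver)wood')

# -f: the pattern file holds an alternation written as on the command line.
printf '%s\n' 'sales|accnts' > pat.lst
fromfile=$(grep -cE -f pat.lst e.lst)     # 3: shukla, kumar, Sarma

printf '%s\n' "$either" "$names" "$woods" "$fromfile"
```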

MULTIPLE STRING SEARCHING – THE fgrep COMMAND
Like egrep, fgrep accepts alternative patterns, both from the command line and from a file, but unlike grep and egrep, it doesn't accept regular expressions. If the pattern to search for is a simple string, or a group of them, fgrep is recommended: it is traditionally faster than its two fellow members when searching for fixed strings. Alternative patterns in fgrep are separated from one another by the newline character, unlike egrep, which uses | to delimit two expressions. On current systems, fgrep is equivalent to grep -F.

Ex:
If you search for three specific departments (without regular expressions), fgrep used in the following manner produces a list of the lines containing any of the three patterns, sorted in reverse order on the department field. The > at the start of the continuation lines is the shell's secondary prompt, not something you type:
$ fgrep 'sales
> personnel
> admin' emp.lst | sort -t "|" -k 4r | tee newlist
(Older systems wrote the sort key as +3r.)
Like egrep, fgrep also takes patterns from a file, except that each string has to be stored on a separate line.
Ex:
$ cat pat1.lst
sales
personnel
admin
$ fgrep -f pat1.lst emp.lst

Examples
$ fgrep 'sales
> mark.
> product' e.lst
2233 | shukla | g.m | sales | 12/12/52 | 20000
9876 | sharma | mgr |product| 12/03/60 | 15000
7898 | akash | dir.|mark. | 11/06/70 |9000
… | tiwary | g.m |product| 05/02/89 | …
… | Sarma | dir.| sales | 05/09/60 |25000
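A POSIX sh sketch of fixed-string searching with grep -F, the standard spelling of fgrep. The records are hypothetical samples (Sarma's empid is invented). It also shows that with -F the dot in mark. is a literal character, while as a BRE it matches anything.

```shell
#!/bin/sh
tmp=$(mktemp -d) || exit 1
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1

# Hypothetical sample records; empid 5678 is invented.
printf '%s\n' \
  '2233 | shukla | g.m | sales | 12/12/52 | 20000' \
  '9876 | sharma | mgr |product| 12/03/60 | 15000' \
  '7898 | akash | dir.|mark. | 11/06/70 |9000' \
  '1234 | kumar | mgr |accnts | 18/03/79 |1500' \
  '5678 | Sarma | dir.| sales | 05/09/60 |25000' > e.lst

# Newline-separated fixed strings inside one quoted argument.
fixed=$(grep -cF 'sales
mark.
product' e.lst)                                         # 4: all but kumar

# Fixed string versus regular expression for "mark.".
literal=$(printf '%s\n' mark. marks | grep -cF 'mark.')  # 1
regex=$(printf '%s\n' mark. marks | grep -c 'mark.')     # 2

# One string per line in the pattern file.
printf '%s\n' sales personnel admin > pat1.lst
fromfile=$(grep -cF -f pat1.lst e.lst)                  # 2: the sales lines

printf '%s\n' "$fixed" "$literal" "$regex" "$fromfile"
```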

RELATIONAL JOIN – THE join COMMAND
join establishes a logical relationship between two tables. It uses a common column in each table to establish this relationship and, by default, creates for each matching pair a single row containing all the columns of the two tables (the common column appears only once, at the start of the row). The prerequisite is that both tables be sorted on the joined columns.
Syntax: join [options] file1 file2
When no field delimiter is specified, join assumes that the fields are separated by blanks (spaces or tabs).

join uses numbers to identify fields, but it also uses numbers to identify files. Since you can join only two files with a single command, the file number can take the values 1 or 2, depending on the position of the file argument on the command line. For example, -1 3 (written -j1 3 on older systems) names field 3 of the first file as the join field.

Examples
$ cat > emp_table
empid designation deptno
111 director 10
… manager 10
… dgm 20
$ cat > dept_table
deptno deptname
10 sales
20 production
$ join -j1 3 -j2 1 emp_table dept_table
deptno empid designation deptname
10 111 director sales
10 … manager sales
20 … dgm production
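A POSIX sh sketch of the same join, using the standard -1/-2 options. The empids 112 and 113 are hypothetical, and the header lines are left out so that both files are plainly sorted on the join field.

```shell
#!/bin/sh
tmp=$(mktemp -d) || exit 1
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1

# Hypothetical tables, each sorted on its deptno column.
printf '%s\n' '111 director 10' '112 manager 10' '113 dgm 20' > emp_table
printf '%s\n' '10 sales' '20 production' > dept_table

# Field 3 of file 1 is joined with field 1 of file 2. Each output row is:
# join field, remaining fields of file 1, remaining fields of file 2.
joined=$(join -1 3 -2 1 emp_table dept_table)

printf '%s\n' "$joined"
```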

CREATING A TEE – THE tee COMMAND
tee is an external command, not a feature of the shell. It splits its input stream into two components: it saves one component in a file and writes the other to the standard output. Being also a filter, tee can be placed anywhere in a pipeline. tee doesn't perform any filtering action on its input; it gives out exactly what it takes.
The following command sequence uses tee to display the output of who and save this output in a file as well:

Examples
$ who | tee user.lst
root     tty…    …:51 (:0)
root     pts/…   …:51 (:0)
itlaxmi  pts/…   …:52 (…)
$ cat user.lst
root     tty…    …:51 (:0)
root     pts/…   …:51 (:0)
itlaxmi  pts/…   …:52 (…)

Since tee uses standard output, you can pipe its output to another command, say wc:
$ who | tee user.lst | wc -l
3
The -a (append) option appends the output to the file specified as argument instead of overwriting it:
$ cal 2009 | tee -a calfile > calfile2
This sequence appends one stream to the file calfile, while overwriting the file calfile2 with the other stream.
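A POSIX sh sketch of both uses of tee. Since the output of who depends on the machine, a fixed three-line stand-in is piped in instead; the file names log.txt and copy.txt are hypothetical.

```shell
#!/bin/sh
tmp=$(mktemp -d) || exit 1
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1

# Stand-in for who: one copy goes to user.lst, the other on to wc.
# tr strips the padding some wc implementations print.
lines=$(printf '%s\n' root root itlaxmi | tee user.lst | wc -l | tr -d ' ')
saved=$(cat user.lst)

# -a appends to log.txt; the redirection overwrites copy.txt each time.
printf '%s\n' first | tee -a log.txt > copy.txt
printf '%s\n' second | tee -a log.txt > copy.txt
log=$(cat log.txt)
copy=$(cat copy.txt)

printf '%s\n' "$lines" "$log" "$copy"
```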

THE pg COMMAND
The disadvantage of head and tail is that, on their own, they cannot display a range of lines. Moreover, what is displayed is final: if we have displayed the first 50 lines of a file, we cannot move back and view, say, the 10th line.
Unix provides two commands which offer more flexibility in viewing files: pg and more. They work in more or less the same manner, except for a few minor differences.

Each of them helps you view a file page by page, with many useful options, such as:
(a) Setting the number of lines to be displayed per page.
(b) Moving either forwards or backwards in a file at the touch of a key.
(c) Skipping pages while viewing the file.
(d) Searching the file for a pattern in the forward or backward direction.
On executing either of these commands, one pageful of the file is displayed on the screen, followed by a prompt at which the user can give various commands understood by pg or more.

Example
$ pg -15 -p "Page no. %d" +10 myfile
This command displays the contents of myfile 15 lines at a time, starting from the 10th line. At the end of each page, a prompt shows the number of the page on view (%d is replaced by the page number); this prompt overrides the default ':' prompt of pg.