Introduction to Textutils sort, uniq, wc, cut, grep, sed, awk ● Steve Walsh ● Linux Users of Victoria ● November, 2007.


1 Introduction to Textutils sort, uniq, wc, cut, grep, sed, awk ● Steve Walsh ● Linux Users of Victoria ● November, 2007

2 Introduction

3 Single rule If you interrupt me during the talk, you buy me dinner. You interrupt me lots during the talk, you buy me lots of dinners.

4 So what are Textutils, anyway?

5 Sort
● Writes the sorted concatenation of all FILE(s) to standard output.
● Sort order can be varied by options:
● -n sorts numerically
● -r reverses the sort order
● -f "folds" lower case to upper case, so the comparison ignores case
● -i considers only printable characters
● -M sorts by month name, where 'JAN' < ... < 'DEC'
● -o specifies an output file

6 Sort - Examples
$ du /bin/* | sort -n
4 /bin/domainname
24 /bin/ls
102 /bin/sh
304 /bin/csh
In old versions of sort, the +1 option made the program sort on the second column of data (+2 for the third, etc.). This syntax is obsolete; the -k option does the same thing (note: "-k 2" for the second column):
$ sort -nk 2 ~/team_zipcode
donna 3051
peter 3051
grant 3058
steve 3132
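The -k example above can be tried without the original file. This is a minimal sketch that assumes a throwaway file with the same four names and postcodes; the variable names are illustrative only. It uses "-k 2,2" to stop the key at the end of the second field, and LC_ALL=C so that lines with equal keys fall back to a predictable byte-wise comparison.

```shell
# Hypothetical stand-in for ~/team_zipcode, written to a temporary file.
team_zipcode=$(mktemp)
printf '%s\n' 'steve 3132' 'grant 3058' 'peter 3051' 'donna 3051' > "$team_zipcode"

# -k 2,2 restricts the sort key to the second field; -n compares it numerically.
by_zip=$(LC_ALL=C sort -n -k 2,2 "$team_zipcode")
printf '%s\n' "$by_zip"

# Adding -r reverses the order, so the highest postcode comes first.
highest=$(LC_ALL=C sort -rn -k 2,2 "$team_zipcode" | head -n 1)
printf '%s\n' "$highest"

rm -f "$team_zipcode"
```

The two 3051 entries have equal keys, so sort falls back to comparing the whole line, which is why donna is listed before peter.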

7 uniq
The uniq utility reads an input file, comparing adjacent lines, and writes one copy of each input line to the output. The second and succeeding copies of repeated adjacent input lines are not written. Repeated lines in the input are not detected if they are not adjacent, hence uniq is often paired with sort.

8 uniq
● uniq has very few options; the most common are:
● -c Precede each output line with a count of the number of times the line occurred in the input.
● -d Suppress the writing of lines that are not repeated in the input.
● -f N Skip the first N fields on each input line when doing comparisons.
● -s chars Ignore the first chars characters when doing comparisons, where chars is a positive decimal integer.
● -u Suppress the writing of lines that are repeated in the input.

9 Uniq - Examples
Say we have a file "sample" containing:
b
a
a
b
c
The output of "uniq sample" would be:
b
a
b
c
The output of uniq consists of the lines in the file with adjacent duplicate lines removed. As shown above, uniq does not consider the "b" lines to be duplicates because they aren't adjacent.

10 Uniq - Examples
The output of "uniq -u sample" is
b
b
c
because these are the lines that had no adjacent duplicates. The output of "uniq -d sample" is just
a
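Because uniq only looks at adjacent lines, it is usually fed sorted input. The sketch below assumes a five-line sample file (b, a, a, b, c) created on the spot; the variable names are illustrative. The awk step is only there to strip the padding that uniq -c puts in front of each count.

```shell
# Hypothetical sample file; the two "b" lines are deliberately not adjacent.
sample=$(mktemp)
printf '%s\n' b a a b c > "$sample"

# Sorting first makes every duplicate adjacent, so uniq -c counts them all.
counts=$(LC_ALL=C sort "$sample" | uniq -c | awk '{ print $1, $2 }')
printf '%s\n' "$counts"

# Without the sort, only adjacent repeats are noticed.
dups_only=$(uniq -d "$sample")   # lines that were repeated back to back
uniq_only=$(uniq -u "$sample")   # lines that had no adjacent repeat
printf 'repeated: %s\n' "$dups_only"

rm -f "$sample"
```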

11 wc
Prints newline, word, and byte counts for each FILE, and a total line if more than one FILE is specified. If no file is specified, wc reads from stdin. Its options are:
-c prints the byte count
-m prints the character count
-l prints the line count
-w prints the word count
-L prints the length of the longest line

12 wc - examples
$ wc -m /usr/share/doc/screen-4.0.3/FAQ
14081 /usr/share/doc/screen-4.0.3/FAQ
$ wc -l /etc/passwd
32 /etc/passwd
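When wc reads from stdin it prints only the number, which makes it convenient in scripts. A small sketch, assuming a two-line piece of text invented for the purpose; tr removes the leading spaces that some wc implementations add.

```shell
# Hypothetical two-line input, used only for this illustration.
sample_text='one two three
four five'

# printf adds the final newline, so the input is two complete lines.
lines=$(printf '%s\n' "$sample_text" | wc -l | tr -d ' ')
words=$(printf '%s\n' "$sample_text" | wc -w | tr -d ' ')
bytes=$(printf '%s\n' "$sample_text" | wc -c | tr -d ' ')

printf 'lines=%s words=%s bytes=%s\n' "$lines" "$words" "$bytes"
```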

13 cut
Prints selected parts of lines from each FILE to standard output. If no FILE is specified, cut reads from stdin. The parts to keep are selected by bytes (-b), characters (-c), or fields (-f); -d sets the field delimiter used with -f (TAB by default). Each list can be a single range, or multiple ranges separated by commas. A range can be specified as:
N the N'th byte, character or field, counted from 1
N- from the N'th byte, character or field to the end of the line
N-M from the N'th to the M'th (inclusive) byte, character or field
-M from the first to the M'th (inclusive) byte, character or field

14 Cut - Examples
Show the names and login times of the currently logged in users:
$ who | cut -c 1-16,26-38
Extract users' login names and shells from the /etc/passwd file as "name:shell" pairs:
$ cut -d : -f 1,7 /etc/passwd
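The range forms can be checked on small inputs. This sketch assumes two passwd-style lines written to a temporary file rather than the real /etc/passwd; the names are illustrative.

```shell
# Hypothetical passwd-style data, so the result does not depend on the host.
passwd_sample=$(mktemp)
printf '%s\n' 'root:x:0:0:root:/root:/bin/bash' \
              'news:x:9:13:news:/var/spool/news:' > "$passwd_sample"

# Fields 1 and 7, with ":" as the delimiter; an empty shell leaves "news:".
pairs=$(cut -d : -f 1,7 "$passwd_sample")
printf '%s\n' "$pairs"

# Character ranges: "1-3" is the first three characters, "4-" is the rest.
first_three=$(printf 'abcdef\n' | cut -c 1-3)
from_four=$(printf 'abcdef\n' | cut -c 4-)
printf '%s %s\n' "$first_three" "$from_four"

rm -f "$passwd_sample"
```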

15 grep
grep searches the named input file (or standard input if no files are given) for lines containing a match to the given PATTERN. By default, grep prints the matching lines. The program's name derives from the command used to perform a similar operation in the Unix text editor ed:
g/re/p
This command searches a file globally for lines matching a given regular expression, and prints them.

16 grep
● There are many derivatives of grep, for example:
● agrep, or approximate grep, which facilitates fuzzy string searching.
● fgrep, for fixed pattern searches.
● egrep, for searches involving more sophisticated regular expression syntax. fgrep and egrep are typically the same program as grep, which behaves differently depending on the name by which it is invoked.
● tcgrep, a rewrite of grep which uses Perl regular expression syntax.
● pgrep, which displays the processes whose names match a regular expression.

17 grep
grep is normally invoked with the search phrase, then the file. If no file is provided, stdin is read.
$ grep "not my fault" '#lca08.log'
(The file name is quoted because an unquoted # at the start of a word begins a shell comment.)
grep would return, in this case, all of the lines in the file #lca08.log with an instance of 'not my fault' in them. Keep in mind that grep would not return lines with 'Not My Fault' (capitalised), because by default grep is case sensitive. The -i (ignore case) option gets around this. For example:
$ grep -i "not my fault" '#lca08.log'
This would return all lines with the words 'Not my fault', 'Not My Fault', 'not MY Fault', or any other mix of upper and lower case.
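The effect of -i is easy to see by counting matches with -c. A minimal sketch, assuming a three-line log invented for the purpose and placed in a temporary directory; the file name mirrors the one on the slide, but the contents are hypothetical.

```shell
# Hypothetical log file; the path is quoted because it contains a "#".
workdir=$(mktemp -d)
log="$workdir/#lca08.log"
printf '%s\n' 'it is not my fault' 'Not My Fault, honest' 'all good here' > "$log"

# -c prints the number of matching lines instead of the lines themselves.
exact=$(grep -c 'not my fault' "$log")      # case-sensitive match
any_case=$(grep -ci 'not my fault' "$log")  # -i ignores case
printf 'exact=%s any_case=%s\n' "$exact" "$any_case"

rm -rf "$workdir"
```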

18 grep - examples
To display only lines starting with the string "root", use the ^ (caret):
$ grep '^root' /etc/passwd
root:x:0:0:root:/root:/bin/bash
To see which accounts have no shell assigned whatsoever, search for lines ending in ":":
$ grep ':$' /etc/passwd
news:x:9:13:news:/var/spool/news:

19 grep - examples
Use the "." for a single character match. To get a list of all five-character English dictionary words starting with "c" and ending in "h" (handy for solving crosswords):
$ grep '\<c...h\>' /usr/share/dict/words
catch
clash
cloth
coach
couch
cough
crash
crush

20 grep - examples
For matching multiple characters, use the asterisk. This example selects all words starting with "c" and ending in "h" from the system's dictionary:
$ grep '\<c.*h\>' /usr/share/dict/words
caliph
cash
catch
cheesecloth
To find the literal asterisk character in a file or output, use grep -F. In the first command below the unquoted * is expanded by the shell into file names before grep runs, so it does not search for an asterisk:
$ grep * /etc/profile
$ grep -F '*' /etc/profile
for i in /etc/profile.d/*.sh ; do
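The same patterns can be tried on a small word list. This sketch assumes a six-word file invented for the purpose instead of /usr/share/dict/words, and it uses -x (match the whole line) as a portable stand-in for the \< and \> word anchors, which works here because each line holds exactly one word.

```shell
# Hypothetical word list, one word per line.
words=$(mktemp)
printf '%s\n' cash catch cat couch Coach 'c*h' > "$words"

# "." matches exactly one character: five-character lines from c to h.
five=$(grep -x 'c...h' "$words")

# ".*" matches any run of characters: any length from c to h.
any_len=$(grep -x 'c.*h' "$words")

# -F treats the pattern as a fixed string, so "*" is a literal asterisk.
literal=$(grep -F '*' "$words")

printf '%s\n' "$five"
rm -f "$words"
```

"Coach" is never matched because the patterns are case-sensitive; adding -i would include it.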

21 Sed - Stream Editor
A stream editor is used to perform basic text transformations on an input stream (a file or input from a pipeline). While in some ways similar to an editor which permits scripted edits (such as ed), sed works by making only one pass over the input(s), and is consequently more efficient. But it is sed's ability to filter text arriving on stdin in a pipeline which particularly distinguishes it from other types of editors.

22 Sed - Stream Editor
This example shows a typical usage of sed, where the -e option indicates that the sed expression follows:
sed -e 's/oldstuff/newstuff/g' inputFileName > outputFileName
The s stands for substitute; the g stands for global, which means that all matching occurrences in the line are replaced. After the first slash is the regular expression to search for, and after the second slash is the expression to replace it with. The substitute command (s///) is by far the most powerful and most commonly used sed command.

23 Sed - examples
Replacing a single word in a line:
$ echo "foo means bar" | sed 's/foo/bar/g'
bar means bar
Replacing a phrase in a sentence or line:
$ cat rule | sed 's/interrupt me/interrupt me lots/g' | sed 's/buy me dinner/buy me lots of dinner/g'
interrupt me lots and buy me lots of dinner
Deleting empty lines, or lines that contain only spaces, from a file:
$ sed -e '/^ *$/d' file
* The caret (^) matches the beginning of the line.
* The dollar sign ($) matches the end of the line.
* A period (.) matches any single character.
* The asterisk (*) matches zero or more occurrences of the previous character.
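Both operations can be combined in one run. A minimal sketch, assuming a four-line "rule" file made up for the purpose (one sentence, an empty line, a line of spaces, and a closing line); two -e expressions replace the chain of piped sed commands.

```shell
# Hypothetical "rule" file with an empty line and a spaces-only line in it.
rule=$(mktemp)
printf '%s\n' 'interrupt me and buy me dinner' '' '   ' 'the end' > "$rule"

# Several -e expressions are applied in order to every input line.
edited=$(sed -e 's/interrupt me/interrupt me lots/' \
             -e 's/buy me dinner/buy me lots of dinner/' "$rule" | head -n 1)
printf '%s\n' "$edited"

# /^ *$/ matches lines holding nothing but spaces; d deletes them.
squeezed=$(sed -e '/^ *$/d' "$rule")
printf '%s\n' "$squeezed"

rm -f "$rule"
```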

24 Sed - examples
Replacing a phrase in a file, editing it in place:
$ sed -i 's/interrupt me and buy me dinner/interrupt me lots and buy me lots of dinner/g' rule
$ cat rule
interrupt me lots and buy me lots of dinner
The search string is a regular expression, so regexps determine what gets matched. The search and replace strings do not always need to be separated by a slash: as the slash (/) is often a valid part of the text being matched, sed allows you to use almost any other single character as the separator instead. The expression below uses # for that purpose (as written it also relies on Perl-style syntax, namely the $2 back-reference and the s modifier; sed itself writes back-references as \2):
s#/\*[^*]*\*+([^/*][^*]*\*+)*/|//[^\n]*|("(\\.|[^"\\])*"|'(\\.|[^'\\])*'|.[^>/"'\\]*)#$2 #gs
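A much smaller case shows why an alternative separator helps. This sketch assumes a made-up path and a made-up two-word phrase; it uses # as the separator so the slashes in the path need no escaping, and sed's own \1 and \2 back-references to swap two words.

```shell
# With "#" as the separator, the slashes in the path are ordinary characters.
moved=$(printf '%s\n' '/usr/local/bin/tool' | sed 's#/usr/local#/opt#')
printf '%s\n' "$moved"

# \( \) capture groups are referred to as \1 and \2 in the replacement.
swapped=$(printf '%s\n' 'dinner lots' | sed 's/\(.*\) \(.*\)/\2 \1/')
printf '%s\n' "$swapped"
```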

25 Awk
AWK is a general purpose programming language that is designed for processing text-based data, either in files or data streams. The name AWK is derived from the surnames of its authors: Aho, Weinberger, and Kernighan. AWK is an example of a programming language that extensively uses the string datatype, associative arrays (that is, arrays indexed by key strings), and regular expressions. The power, terseness, and limitations of AWK programs and sed scripts inspired Larry Wall to write Perl. Because of their dense notation, all these languages are often used for writing one-liner programs.

26 AWK
An AWK program consists of a sequence of pattern-action statements and optional function definitions:
pattern { action statements }
awk -F: '{ print $1 }'
Awk programs can be multi-line programs in the true sense, or terse, single-line statements. Awk reads its input from files or stdin and splits each record into fields, which are available as variables: $0 is the whole line, and $1, $2, etc. are the individual fields. It also provides variables that are set at runtime, such as NR, the total number of records seen so far, and NF, the number of fields in the current record.
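A pattern-action pair and the NR and NF variables can be seen together in one line. A minimal sketch, assuming three lines of made-up input: the pattern "NF > 1" selects records with more than one field, and the action prints the record number, the field count and the first field.

```shell
# Hypothetical input: the second line has a single field and is skipped.
report=$(printf '%s\n' 'alpha beta' 'gamma' 'delta epsilon zeta' |
  awk 'NF > 1 { print NR ": " NF " fields, first is " $1 }')
printf '%s\n' "$report"

# -F sets the field separator; here ":" as in /etc/passwd.
login=$(printf '%s\n' 'root:x:0:0:root:/root:/bin/bash' | awk -F: '{ print $1 }')
printf '%s\n' "$login"
```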

27 Awk - Examples
# Print first two fields in opposite order:
awk '{ print $2, $1 }' file
# Print lines longer than 72 characters:
awk 'length > 72' file
# Print length of string in 2nd column:
awk '{ print length($2) }' file
# Move several thousand files (junk01, etc.) into a new directory (../lca08/) and rename them by appending .dat to the filenames:
$ ls junk* | awk '{ print "mv " $0 " ../lca08/" $0 ".dat" }' | csh
# Print a running count of the lines where col 3 > col 1:
awk '$3 > $1 { print i + "1"; i++ }' file
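Before piping generated commands into a shell it is worth printing them first. This sketch assumes two made-up file names and three made-up data lines; it only builds the mv commands as text without running them, and it counts matching lines with an END block so that a single total is printed instead of a running count.

```shell
# Build the mv commands as text only; nothing is moved or renamed here.
cmds=$(printf '%s\n' junk01 junk02 |
  awk '{ print "mv " $0 " ../lca08/" $0 ".dat" }')
printf '%s\n' "$cmds"

# Count lines where column 3 is greater than column 1.
# "n + 0" makes awk print 0 rather than an empty string when nothing matches.
matches=$(printf '%s\n' '1 x 5' '9 x 2' '3 x 4' |
  awk '$3 > $1 { n++ } END { print n + 0 }')
printf '%s\n' "$matches"
```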

28 Awk - examples
# Print sequence number and then column 1 of file:
awk '{ print NR, $1 }' file
# Print every line after erasing the 2nd field:
awk '{ $2 = ""; print }' file
# Print the total number of kilobytes used by files:
ls -l files | awk '{ x += $5 } END { print "total K-bytes: " (x + 1023)/1024 }'
# Count the lines in a data file:
awk 'END { print NR }' data
# Print the even-numbered lines in the data file:
awk 'NR % 2 == 0' data
# Print a list of all login names in the system, sorted by name:
awk -F: '{ print $1 }' /etc/passwd | sort
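The accumulate-then-report pattern with END can be tried on canned input. A minimal sketch, assuming three made-up "ls -l" style lines held in a variable; wrapping the result in int() is an addition here, so that the rounded-up figure prints as a whole number.

```shell
# Hypothetical "ls -l" output; field 5 is the size in bytes.
listing='-rw-r--r-- 1 steve users 2048 Nov  1 2007 a.txt
-rw-r--r-- 1 steve users 1024 Nov  1 2007 b.txt
-rw-r--r-- 1 steve users  512 Nov  1 2007 c.txt'

# Sum field 5 on every line, then report once after the last line.
total_kb=$(printf '%s\n' "$listing" |
  awk '{ x += $5 } END { print int((x + 1023) / 1024) }')

# NR still holds the number of records read when the END block runs.
line_count=$(printf '%s\n' "$listing" | awk 'END { print NR }')

# A bare pattern selects lines; $NF is the last field of the record.
even_names=$(printf '%s\n' "$listing" | awk 'NR % 2 == 0 { print $NF }')

printf 'total_kb=%s lines=%s even=%s\n' "$total_kb" "$line_count" "$even_names"
```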

