Filters and Utilities

Notes:
This is a simple overview of the filtering capability
Some of these commands are very powerful
▫ Only showing some of the basics of a few of the commands

Reminder: Grave accent
▫ AKA backtick or backquote
▫ Used for command substitution in bash and other Linux utilities and languages
▫ Typical use:
   Put a command between a pair of `
   The std out of the command is substituted
▫ Example:
  #echo The date is:`date`!
  The date is:Sun Mar 17 15:51:28 EDT 2013!
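
A minimal sketch of command substitution, assuming a POSIX-style shell such as bash. The variable names are illustrative only. The `$(...)` form shown next to the backticks is the equivalent spelling and is easier to nest.

```shell
# Backticks and $(...) both substitute a command's standard output.
this_year=`date +%Y`        # classic backtick form
same_year=$(date +%Y)       # equivalent $(...) form
echo "The year is: $this_year"

# Substitution also works on the output of a whole pipeline.
line_count=$(printf 'a\nb\nc\n' | wc -l)
echo "Lines counted: $line_count"
```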

What are Filters? ▫Use std in and std out  Monitor the input  Modify data as appropriate  Change  Delete  Move  "as appropriate"  Send data to standard out

Filter examples Simple ▫pr ▫cmp ▫diff ▫comm ▫head ▫tail ▫cut ▫paste ▫sort ▫uniq ▫tr Complex ▫grep ▫sed Filter/script ▫awk

pr: Paginate Files
Prepare files for printing
Adds:
▫ Headers
▫ Footers
▫ Formatted text
Default adds a 5-line header before and a 5-line trailer after the text on each page
Options:
▫ Make columns (-2, -3, ...)
▫ Set page length (-l)
▫ Set page width (-w)
▫ Number lines in output (-n)
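
A short sketch of two common pr invocations, assuming a pr that accepts the usual -t, -n, -l and -h options (GNU coreutils does). The sample text and the header string are made up for illustration.

```shell
# -t drops the default header and trailer; -n numbers the output lines.
numbered=$(printf 'alpha\nbeta\n' | pr -t -n)
echo "$numbered"

# -l sets the page length; -h replaces the file name in the page header.
header=$(printf 'alpha\nbeta\n' | pr -l 20 -h "Sample report" | head -n 5)
echo "$header"
```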

cmp: Byte by Byte Compare
Compares two files
Terminates on the first difference
▫ Echoes the location of the first mismatch
   Reports the byte offset and line number
▫ Returns:
   True (exit status 0) if identical
   False (exit status 1) if the files differ
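
A minimal sketch of using cmp's exit status in a script. The file names a, b and c are hypothetical and live in a scratch directory created with mktemp.

```shell
tmp=$(mktemp -d)
printf 'same\nline two\n' > "$tmp/a"
printf 'same\nline two\n' > "$tmp/b"
printf 'same\nline 2\n'   > "$tmp/c"

# -s suppresses the message so only the exit status is used.
if cmp -s "$tmp/a" "$tmp/b"; then ab="identical"; else ab="different"; fi
if cmp -s "$tmp/a" "$tmp/c"; then ac="identical"; else ac="different"; fi
echo "a vs b: $ab"
echo "a vs c: $ac"

# Without -s, cmp reports where the first mismatch is.
cmp "$tmp/a" "$tmp/c" || true
rm -r "$tmp"
```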

comm: What Is Common between files
Compares files line by line
▫ Requires sorted files to work properly
Returns 3 types of differently indented lines
▫ Lines unique to first file
▫ Lines unique to second file
▫ Lines common to both
Output is "weird" in columns
▫ 1st col is lines unique to 1st file
▫ 2nd col is lines unique to 2nd file
▫ 3rd col is common lines
comm.sh in ~/ITIS3110/bashscripts
commbad.sh (with error)
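
A small sketch of comm on two already-sorted files. The file names and contents are invented for illustration; the -1, -2 and -3 options suppress the corresponding column.

```shell
tmp=$(mktemp -d)
printf 'apple\nbanana\ncherry\n' > "$tmp/first"    # already sorted
printf 'banana\ncherry\ndate\n'  > "$tmp/second"   # already sorted

# Three columns: only in first, only in second, in both.
comm "$tmp/first" "$tmp/second"

# Suppress columns 1 and 2 to keep only the common lines.
common=$(comm -12 "$tmp/first" "$tmp/second")
# Suppress columns 2 and 3 to keep lines unique to the first file.
only_first=$(comm -23 "$tmp/first" "$tmp/second")
echo "common: $common"
echo "only in first: $only_first"
rm -r "$tmp"
```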

diff: "How to make files the same"
Details how to change one file to make it the same as the other
▫ For each difference, gives instructions for the change (add, change, or delete lines)
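
A minimal sketch of diff's default output on two hypothetical three-line files. diff exits with status 1 when the files differ, so the call is guarded with `|| true`.

```shell
tmp=$(mktemp -d)
printf 'one\ntwo\nthree\n' > "$tmp/old"
printf 'one\n2\nthree\n'   > "$tmp/new"

# "2c2" means: change line 2 of the first file to match line 2 of the second.
changes=$(diff "$tmp/old" "$tmp/new" || true)
echo "$changes"
rm -r "$tmp"
```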

head: Display beginning of file
Show the first n lines of a file
▫ Default is 10
▫ Can change with -n x
Example use:
▫ Want to re-edit the last file you edited:
▫ nano `ls -t | head -n 1`
   ls -t: list by time, newest first
   head -n 1: list first entry
   Feed as a parameter to nano with the backticks
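
A sketch of the "newest file" idiom in a scratch directory, so it can be tried without opening an editor. The file names are hypothetical, and `touch -t` is used only to give one file an older timestamp.

```shell
tmp=$(mktemp -d)
touch -t 202001010000 "$tmp/old.txt"   # force an old modification time
touch "$tmp/new.txt"                   # modified just now

# ls -t lists newest first, so the first line is the latest file.
newest=$(ls -t "$tmp" | head -n 1)
echo "Most recently modified: $newest"

# head works on any stream, not only on file listings.
first_three=$(seq 1 20 | head -n 3)
echo "$first_three"
rm -r "$tmp"
```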

tail: Display end of file
Show the last n lines of a file
▫ Default is 10
▫ Can change with -n x
Options
▫ -f
   Monitor the file as it grows
   Must terminate with Ctrl-C (^C)
▫ -c
   Do the last n bytes (chars) instead of lines
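
A minimal sketch of tail by lines and by bytes, using generated input rather than a real log file. The -f option is left out because it runs until interrupted.

```shell
# Last two lines of a 20-line stream.
last_two=$(seq 1 20 | tail -n 2)
# Last three bytes of a short string.
last_bytes=$(printf 'abcdef' | tail -c 3)
echo "$last_two"
echo "$last_bytes"
```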

cut: Splitting a file vertically
Cuts a range out by:
▫ Columns
   Good for fixed length entries
   -c range
   -c1-4 selects characters 1 through 4
▫ Fields
   Good for delimited entries
   Tab is the default delimiter
   -d specifies delimiter
   -d/ sets / as the delimiter
   -f specifies the fields to use
   -f1,4 specifies the first and fourth fields
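
A short sketch of cut by field and by character position. The record below is a made-up line in the colon-delimited style of /etc/passwd.

```shell
record='root:x:0:0:root:/root:/bin/bash'

# -d sets the delimiter; -f1,7 keeps the first and seventh fields.
fields=$(printf '%s\n' "$record" | cut -d: -f1,7)
# -c1-4 keeps character positions 1 through 4.
chars=$(printf '%s\n' "$record" | cut -c1-4)
echo "$fields"
echo "$chars"
```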

paste: Paste files vertically
Paste two files together line by line
Can be used on a single file to join multiple sequential lines together
▫ -s
   Do serial on a single file
▫ -d
   Separate joined elements with the characters in the delimiter list
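
A minimal sketch of paste in both modes. The files names and ages are hypothetical.

```shell
tmp=$(mktemp -d)
printf 'alice\nbob\n' > "$tmp/names"
printf '30\n25\n'     > "$tmp/ages"

# Join the two files line by line, using : between the pieces.
side_by_side=$(paste -d: "$tmp/names" "$tmp/ages")
# -s joins the lines of a single file into one line.
serial=$(paste -s -d, "$tmp/names")
echo "$side_by_side"
echo "$serial"
rm -r "$tmp"
```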

sort: Order files
Put files in order
▫ Default is ascending order, comparing whole lines starting at column 1
   ASCII order in the C locale; other locales may collate differently
Options:
▫ -t
   Define a delimiter
▫ -k
   Used with -t, which field to use
   Can have multiple keys
   Use commas to separate ranges
   Use -k again to denote a new field
   Can sort on columns in a field
   Use a dot to separate
▫ -n
   Treat a field as a number, not as ASCII characters
   Remember the number 1 is different than the character "1"
▫ -u
   Remove repeated lines
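
A sketch contrasting the default sort with a keyed text sort and a keyed numeric sort. The data is invented, and LC_ALL=C is set as an assumption so the ordering does not depend on the local collation rules.

```shell
data='carol:10
alice:9
bob:2'

# Default: compare whole lines.
by_line=$(printf '%s\n' "$data" | LC_ALL=C sort)
# Field 2 only, compared as text: "10" sorts before "2".
by_text=$(printf '%s\n' "$data" | LC_ALL=C sort -t: -k2,2)
# Field 2 only, compared as numbers.
by_number=$(printf '%s\n' "$data" | LC_ALL=C sort -t: -k2,2n)
echo "$by_line"
echo "$by_text"
echo "$by_number"
```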

uniq: Locating identical lines
Returns only unique lines
▫ Compares adjacent lines only, so sort the input first
▫ Options:
   -u
     Return only the non-repeated lines
   -d
     Return only the repeated lines
     But only one copy of each
   -c
     Return the count of how many times each line is repeated
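
A minimal sketch of why uniq is normally paired with sort. The word list is made up; awk is used only to normalize the spacing of the counts.

```shell
words='b
a
b
c
a
b'

# Unsorted input has no adjacent duplicates, so uniq removes nothing.
unsorted=$(printf '%s\n' "$words" | uniq | wc -l)
# Sorted input: count each distinct line.
counts=$(printf '%s\n' "$words" | sort | uniq -c | awk '{print $1 ":" $2}')
repeated=$(printf '%s\n' "$words" | sort | uniq -d)
single=$(printf '%s\n' "$words" | sort | uniq -u)
echo "$counts"
echo "repeated: $repeated"
echo "appears once: $single"
```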

tr: Translate characters
Changes one set of characters to another; default input is the standard input
Example:
▫ #tr 'ab' 'cd'
  This is abnormal    (std in)
  This is cdnormcl    (std out)
  absolute            (std in)
  cdsolute            (std out)
  ab a b c            (std in)
  cd c d c            (std out)
  ^C
▫ Note: a → c and b → d, not ab → cd
▫ Note: ^D can be used to denote end of file to tr instead of the shown ^C, which stops the tr process

tr: Translate characters
More examples:
▫ Can be used to translate case for a file
   tr a-z A-Z <file1 or tr '[a-z]' '[A-Z]' <file1
   Takes the input from file1 with the < redirection
   Turns all lower case letters to upper case
   Output goes to std out
▫ Get rid of characters
   tr -d 'a-z' <file1
   Gets rid of all lower case chars from file1
   Quote the set so the shell does not expand it as a filename pattern
   Again output is std out
▫ Compressing repeated chars
   tr -s ' ' <file1
   Changes repeated spaces to a single space
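
A short sketch of the three tr uses above, fed from printf instead of file1 so it runs anywhere. The sample strings are illustrative.

```shell
# Translate lower case to upper case.
upper=$(printf 'Hello world\n' | tr 'a-z' 'A-Z')
# Delete every lower case letter.
no_lower=$(printf 'Hello World\n' | tr -d 'a-z')
# Squeeze runs of spaces down to one space.
squeezed=$(printf 'a   b    c\n' | tr -s ' ')
echo "$upper"
echo "$no_lower"
echo "$squeezed"
```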


Regular Expression A pattern to match strings of text which is: ▫Concise ▫Flexible Used by many programming languages and operating systems

Regular Expressions BRE ▫Basic Regular Expression ERE ▫Extended Regular Expression IRE ▫Interval Regular Expression TRE ▫Tagged Regular Expression

Character class
Set of characters enclosed within square brackets [ ]
▫ Can be a list of single characters
   [aD1]
   a, D, and the character 1 only
▫ Can be a range of characters
   [a-zA-Z]
   All the upper and lower case chars
▫ Negate a class
   [^0-9]
   Not the numeric chars 0-9
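
A minimal sketch of character classes with grep. The sample lines are made up, and ^ outside the brackets anchors the class to the first character of each line.

```shell
sample='a1
D2
x3
42'

# Lines whose first character is a, D or 1.
listed=$(printf '%s\n' "$sample" | grep '^[aD1]')
# Lines whose first character is not a digit.
not_digit=$(printf '%s\n' "$sample" | grep '^[^0-9]')
echo "$listed"
echo "$not_digit"
```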

Regular Expressions
*
▫ Refers to the immediately preceding character
▫ Any number of repeated character(s)
   0 or more
   Used with other patterns
   A*
▫ Anything that matches 0 or more 'A's in a row
▫ s*print will match sprint, ssprint, sssprint and print
! Note: this is not related to the familiar shell wildcard *

Regular Expressions
.
▫ Any character
   Exactly one
▫ S... will match Sort, Sxxx, S123, …
   Any four char string starting with S
   As a whole-line match (e.g. grep -x), does not match Sabcd (5 characters)
▫ Note: .* means 0 or more of any character
Pattern starting locations
▫ ^
   Pattern starts at the beginning of a line
▫ $
   Pattern ends at the end of a line
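
A sketch of *, . and the ^ and $ anchors using grep. The word list is invented; grep -x is used to require that the whole line matches.

```shell
words='print
sprint
ssprint
Sort
Sabcd'

# s* allows zero or more s characters before print.
star_count=$(printf '%s\n' "$words" | grep -c '^s*print$')
# Whole line must be S followed by exactly three characters.
four_chars=$(printf '%s\n' "$words" | grep -x 'S...')
# Unanchored, S... also matches inside the longer Sabcd.
unanchored=$(printf '%s\n' "$words" | grep -c 'S...')
echo "$star_count"
echo "$four_chars"
echo "$unanchored"
```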

Extended Regular Expressions
|
▫ Either one of a set
▫ a|b
   Matches if an a or a b
   Must be one of them
   Inside brackets, as in [a|b], the | is just another character in the class
( and )
▫ Group the chars between the parentheses so that | applies only inside the group
▫ 'animaltype:(dog|cat)'
   Look for animaltype:dog or animaltype:cat
   ( ) is used to group patterns
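
A minimal sketch of alternation and grouping with grep -E. The sample lines are made up. The last command assumes a grep that supports -o (GNU and BSD grep do) and shows that | inside brackets is an ordinary character.

```shell
pets='animaltype:dog
animaltype:cat
animaltype:bird'

# The group limits the alternation to dog or cat.
matches=$(printf '%s\n' "$pets" | grep -E 'animaltype:(dog|cat)')
echo "$matches"

# [a|b] is a character class: it matches a, | and b individually.
class_hits=$(printf 'a|b\n' | grep -o '[a|b]' | wc -l)
echo "$class_hits"
```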

grep – Search a pattern
Searches for a pattern in a file
▫ grep options pattern filename(s)
   std in is used if there is no filename
   Can also pipe data to grep
▫ Notes:
   Pattern does not need to be quoted if it has no delimiters or special chars in it
   Can always use quotes to be safe

grep - Options
-i
▫ Ignore case
-v
▫ Don't display lines matching expression
▫ Typically want to check the return code
-l
▫ Display filenames
   Useful when grepping multiple files
-e
▫ Useful when the pattern starts with -
-x
▫ Match entire line
-f file
▫ Takes expressions from a file
▫ Great if you have a messy or complex regular expression
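
A sketch exercising each option above on two small hypothetical files, log1 and log2, plus a pattern file named pats. All names and contents are invented for illustration.

```shell
tmp=$(mktemp -d)
printf 'Error: disk\nwarning: cpu\n-v is a flag\n' > "$tmp/log1"
printf 'all good\n' > "$tmp/log2"
printf 'disk\ncpu\n' > "$tmp/pats"

ignore_case=$(grep -ci 'error' "$tmp/log1")          # -i: Error matches error
inverted=$(grep -v 'disk' "$tmp/log1" | wc -l)       # -v: lines without disk
files=$(cd "$tmp" && grep -l 'good' log1 log2)       # -l: names of matching files
dash=$(grep -e '-v' "$tmp/log1")                     # -e: pattern starting with -
whole=$(grep -cx 'all good' "$tmp/log2")             # -x: entire line must match
from_file=$(grep -c -f "$tmp/pats" "$tmp/log1")      # -f: patterns read from a file
echo "$ignore_case $inverted $files $whole $from_file"
echo "$dash"
rm -r "$tmp"
```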

grep - examples
Examples:
▫ #grep 3 bigfile3
  file 3 text
▫ #grep file bigfile3
  file 1 text
  file 1 text
  file 3 text
  file 1 text
  file 1 test
  file 1 test
Try grep for text and test also…
#cat bigfile3
file 1 text
file 3 text
file 1 text
file 1 test

sed – Stream Editor
Edit file(s) with a specified action
▫ sed options 'address action' file(s)
Basics:
▫ Takes input from the file(s)
▫ Performs the action on each line of input
▫ Sends output to std out
Uses:
▫ Select part(s) of a file
   By line
   By content
▫ Edit a file
   e.g. create a template, then use sed to customize for a run
Oddities
▫ Usually need -n to get rid of unwanted duplicated or original lines

sed – Line addressing
Select specific lines
▫ #sed '3q' tenline.file
  Line 1
  Line 2
  Line 3
   Selects the first 3 lines then quits
▫ #sed -n '$p' tenline.file
  Last Line
   Prints last line
   $ - last line
   p - print
   Show with and without the -n option; without -n every input line is also echoed
▫ #sed -n '5,7p' tenline.file
  Line 5
  Line 6
  Line 7
   Prints lines 5 through 7
#cat tenline.file
Line 1
Line 2
Line 3
Line 4
Line 5
Line 6
Line 7
Line 8
Line 9
Last Line

sed – Line addressing
Select specific lines with ;
▫ #sed -n '1p;3p;$p' tenline.file
  Line 1
  Line 3
  Last Line
   Prints line 1, 3 and the last line ($)
! will negate operations
▫ #sed -n '3,$!p' tenline.file
  Line 1
  Line 2
   Does not print line 3 through the end
Notes:
▫ By default sed will echo the input lines as well as the selected lines
   → get duplicated lines
   Use -n to not echo the input lines
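
A sketch that rebuilds a ten-line sample like tenline.file in a scratch directory and compares line addressing with and without -n. The way the file is generated here is an assumption made for the example.

```shell
tmp=$(mktemp -d)
# Nine numbered lines plus a final "Last Line".
seq 1 9 | sed 's/^/Line /' > "$tmp/tenline.file"
echo 'Last Line' >> "$tmp/tenline.file"

# With -n only the lines selected by p are printed.
selected=$(sed -n '1p;3p;$p' "$tmp/tenline.file")
# ! negates the address: print everything except line 3 through the end.
negated=$(sed -n '3,$!p' "$tmp/tenline.file")
# Without -n, lines 5 to 7 appear twice: 10 echoed lines plus 3 printed ones.
without_n=$(sed '5,7p' "$tmp/tenline.file" | wc -l)

echo "$selected"
echo "$negated"
echo "lines printed without -n: $without_n"
rm -r "$tmp"
```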

sed – Context addressing
Use a pattern to identify lines to work with
▫ Use / to delimit the pattern
Examples
▫ #sed -n '/2/p' tenline.file
  Line 2
   Find all lines with 2 in them and print
▫ #sed -n '/^2/p' tenline.file
   Finds all lines that start with 2 and prints them (none in tenline.file)
   ^ - start of the line
#cat tenline.file
Line 1
Line 2
Line 3
Line 4
Line 5
Line 6
Line 7
Line 8
Line 9
Last Line

sed – Writing selected lines to a file
Can use w to write the selected lines to a file
Example
▫ sed -n '/2/w twos.file' tenline.file
   w instead of p puts the output to a file
   -n keeps the input lines from also being echoed to std out
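
A minimal sketch of the w command, run inside a scratch directory so twos.file is created there. The sample file is regenerated for the example and is an assumption about what tenline.file contains.

```shell
tmp=$(mktemp -d)
seq 1 9 | sed 's/^/Line /' > "$tmp/tenline.file"
echo 'Last Line' >> "$tmp/tenline.file"

# Matching lines go to twos.file; with -n nothing reaches std out.
stdout_lines=$(cd "$tmp" && sed -n '/2/w twos.file' tenline.file | wc -l)
twos=$(cat "$tmp/twos.file")
echo "written to twos.file: $twos"
echo "lines on std out: $stdout_lines"
rm -r "$tmp"
```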

sed – Text editing Can edit the stream ▫i  Insert ▫a  Append ▫c  Change ▫d  Delete ▫s  Substitute

sed - editing
Example: inserting
▫ #sed '1i\
  >#!/bin/bash\
  ># using the bash shell
  >' test.sh > $$
   Notes:
   1i inserts text before line 1
   \ is a continuation character within the quotes
   > at the start of a line is the shell's continuation prompt, not typed input
   Input is the code or text in test.sh
   Redirecting the output to $$ (temporary file named after the shell's process ID)
   Ends up with the 2 new lines at the beginning in $$
   Can further modify $$
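
A sketch of the same insertion as a script, without the interactive continuation prompts. The one-line test.sh and the output name test.new are hypothetical, and mktemp is used here in place of $$ for the scratch location.

```shell
tmp=$(mktemp -d)
printf 'echo hello\n' > "$tmp/test.sh"

# 1i\ inserts the following text before line 1; a trailing \ continues the text.
sed '1i\
#!/bin/bash\
# using the bash shell
' "$tmp/test.sh" > "$tmp/test.new"

first_line=$(head -n 1 "$tmp/test.new")
total_lines=$(wc -l < "$tmp/test.new")
cat "$tmp/test.new"
rm -r "$tmp"
```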

sed - editing
Use s to indicate substitution
Example: substituting
▫ sed 's/a/b/' file
   Replaces a with b for the first instance on each line
▫ sed 's/a/b/g' file
   g (global) replaces a with b for all instances on each line
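
A minimal sketch of the difference the g flag makes, using a single made-up word as input instead of a file.

```shell
# Without g only the first a on the line is replaced.
first_only=$(printf 'banana\n' | sed 's/a/b/')
# With g every a on the line is replaced.
every=$(printf 'banana\n' | sed 's/a/b/g')
echo "$first_only"
echo "$every"
```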