Published by Kimberly Malone. Modified over 8 years ago.
1
ORAFACT Text Processing
2
Searching Inside Files

grep - searches for patterns within files

    grep [options] [[-e] pattern] filename [...]

    -n         show line numbers
    -A NUM     print each match and NUM lines after it
    -B NUM     print each match and NUM lines before it
    -C NUM     print each match and NUM lines before and after it
    -i         perform a case-insensitive match
    -v         invert the match; print the lines that do not match
    --color    highlight the matched string in color

For -C, NUM defaulted to 2 in older GNU grep releases; current releases require NUM to be given.

The grep command in Linux searches one or more files for a pattern and by default prints the lines containing matches. This default behavior is shown in the following example, where grep returns the entire line that contains the pattern nobody:

    $ grep nobody /etc/passwd
    nobody:x:99:99:Nobody:/:
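The context options are easiest to see on a small file. The following is a minimal sketch, assuming a grep that supports -A and -C (GNU and BSD grep both do); the file name animals.txt and its contents are made up for illustration.

```shell
# Hypothetical sample file, created in a temporary directory.
workdir=$(mktemp -d)
printf 'mouse\ncat\ndog\nbear\n' > "$workdir/animals.txt"

# -A 1: the matching line plus one line after it.
after=$(grep -A 1 cat "$workdir/animals.txt")

# -C 1: the matching line plus one line before and one after it.
around=$(grep -C 1 dog "$workdir/animals.txt")

# -i and -n combined: case-insensitive match, prefixed with the line number.
numbered=$(grep -in MOUSE "$workdir/animals.txt")

printf '%s\n---\n%s\n---\n%s\n' "$after" "$around" "$numbered"
rm -rf "$workdir"
```

Single-letter options can be combined, so -in here is the same as -i -n.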
3
grep Examples

1. Consider the following extremely simple examples of using the grep command:

    $ cat file
    mouse
    cat
    dog
    bear

2. Print all lines that contain the letter e, including their line numbers:

    $ grep -n e file
    1:mouse
    4:bear

3. Print all lines that do not contain the letter e, including their line numbers:

    $ grep -nv e file
    2:cat
    3:dog

4. Print lines containing the pattern cat plus one line preceding each match:

    $ grep -B 1 cat file
    mouse
    cat

5. Print lines containing the case-insensitive pattern BEAR:

    $ grep -i BEAR file
    bear
4
The Streaming Editor

sed - a [s]tream [ed]itor

    sed [options] 'script' [filename ...]

performs edits on a stream of text (usually the output of another program)
often used to automate edits on many files quickly
small and very efficient
-i switch for in-place edits with modern versions

Example:

    $ cat letter
    I love Windows. Windows is my favorite operating system.

sed then fixes the statement with a simple search-and-replace command:

    $ sed s/Windows/Linux/g letter
    I love Linux. Linux is my favorite operating system.
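The points about streams and in-place edits can be sketched as follows. This is only an illustration: the piped sample text is made up, and the -i.bak form (in-place edit that keeps a backup copy) is assumed to be available, as it is in GNU and BSD sed.

```shell
workdir=$(mktemp -d)
printf 'I love Windows. Windows is my favorite operating system.\n' > "$workdir/letter"

# sed editing a stream: its input is the output of another program.
piped=$(printf 'Windows rocks\n' | sed 's/Windows/Linux/')

# Without the trailing g flag, only the first match on each line is replaced.
first_only=$(sed 's/Windows/Linux/' "$workdir/letter")

# In-place edit; the original is kept as letter.bak.
sed -i.bak 's/Windows/Linux/g' "$workdir/letter"
edited=$(cat "$workdir/letter")
backup=$(cat "$workdir/letter.bak")

printf '%s\n%s\n%s\n' "$piped" "$first_only" "$edited"
rm -rf "$workdir"
```

Quoting the script in single quotes is a defensive habit: it keeps the shell from interpreting characters such as spaces, $ or * before sed sees them.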
5
Text Processing with awk

awk - pattern scanning and processing language

    $ awk -f awk_script_name /path/to/file

Turing-complete programming language
splits lines into fields (like cut)
regex pattern matching (like grep)
math operations, control statements, variables, I/O...

awk Command Examples

Print the lines that end with the string bash:

    $ awk '/bash$/' /etc/passwd
    ... output omitted ...

Print the names of the users (field one) for each line that ends with the string bash:

    $ awk -F: '/bash$/ {print $1}' /etc/passwd
    ... output omitted ...
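The field splitting, pattern matching and math features can be tried without touching the real /etc/passwd. The sketch below uses a made-up file, passwd.sample, with three invented entries in passwd format.

```shell
workdir=$(mktemp -d)
# Hypothetical passwd-style data: name:password:UID:GID:comment:home:shell
printf '%s\n' \
  'root:x:0:0:root:/root:/bin/bash' \
  'nobody:x:99:99:Nobody:/:/sbin/nologin' \
  'foo:x:501:501::/home/foo:/bin/bash' > "$workdir/passwd.sample"

# Regex pattern plus field extraction: users whose line ends in bash.
bash_users=$(awk -F: '/bash$/ {print $1}' "$workdir/passwd.sample")

# Variables and math: add up the UID field, print the total at the end.
uid_sum=$(awk -F: '{ total += $3 } END { print total }' "$workdir/passwd.sample")

# A numeric comparison as the pattern: entries with UID 100 or higher.
regular=$(awk -F: '$3 >= 100 { print $1 ":" $3 }' "$workdir/passwd.sample")

printf '%s\n%s\n%s\n' "$bash_users" "$uid_sum" "$regular"
rm -rf "$workdir"
```

Each awk program here has the same shape: an optional pattern selecting lines, followed by an action in braces that runs on the selected lines.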
6
Replacing Text Characters

tr - translates, squeezes and deletes characters

    tr [options] set1 [set2]

translates one set of characters into another
    commonly used to convert lower case into upper case

    $ tr a-z A-Z

squeeze (-s) collapses runs of a repeated character into one
    commonly used to collapse runs of newlines, which removes blank lines

    $ tr -s '\n'

delete (-d) removes a set of characters
    commonly used to delete special characters

    $ tr -d '\000'

To display the contents of the lower.txt file and convert all lower-case characters to upper case:

    $ cat lower.txt | tr a-z A-Z
    THESE ARE CHARACTERS THAT WERE TYPED INTO THIS FILE IN LOWER CASE

To display the contents of the lower.txt file and delete all occurrences of the letter e:

    $ cat lower.txt | tr -d e
    ths ar charactrs that wr typd into this fil in lowr cas
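Because tr reads only standard input, it is normally fed through a pipe or a redirect. A short sketch of the three modes, using made-up input strings:

```shell
# Translate: map each lower-case letter to its upper-case counterpart.
upper=$(printf 'hello world\n' | tr 'a-z' 'A-Z')

# Squeeze: a run of newlines becomes a single newline, so blank lines vanish.
squeezed=$(printf 'first\n\n\n\nsecond\n' | tr -s '\n')

# Delete: strip NUL bytes (written as octal \000) from the input.
no_nul=$(printf 'ab\000cd\n' | tr -d '\000')

printf '%s\n%s\n%s\n' "$upper" "$squeezed" "$no_nul"
```

The sets are quoted so the shell cannot expand a-z or the backslash escapes before tr receives them.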
7
Text Sorting

sort - sorts text

    sort [options] filename [...]

can sort on different columns
by default sorts in lexicographical order
    1, 2, 234, 265, 29, 3, 4, 5
can be told to sort numerically
    1, 2, 3, 4, 5, 29, 234, 265
can merge and sort multiple files simultaneously
can sort in reverse order
often used to prepare input for the uniq command

    -n              sort numerically
    -r              sort in reverse order
    -m              do not sort, only merge; this is faster but only works if the input is already sorted
    -t separator    use this as the column separator
    -k number       sort by this column number, counting the first column as 1
    -o filename     write output to the specified file instead of STDOUT
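The difference between lexicographical and numeric order, and sorting on a column, can be sketched like this. The file names and their contents are invented for the example, and LC_ALL=C is set only to make the default ordering predictable across systems.

```shell
workdir=$(mktemp -d)
printf '29\n3\n234\n1\n' > "$workdir/numbers"
printf 'foo:501\nroot:0\nnobody:99\n' > "$workdir/users"

# Default order compares character by character, so 234 sorts before 29.
lexical=$(LC_ALL=C sort "$workdir/numbers")

# -n compares the lines as numbers; adding -r reverses the result.
numeric=$(sort -n "$workdir/numbers")
reversed=$(sort -nr "$workdir/numbers")

# -t sets the column separator, -k picks the column to sort on.
by_uid=$(sort -t : -k 2 -n "$workdir/users")

printf '%s\n---\n%s\n---\n%s\n---\n%s\n' "$lexical" "$numeric" "$reversed" "$by_uid"
rm -rf "$workdir"
```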
8
Duplicate Removal Utility

uniq - removes adjacent duplicate lines, so its input is normally sorted first

    uniq [options] [input [output]]

cleanly combines lists of overlapping but not identical information
-c prefixes each line of output with the number of times it occurred
    taking this output and performing a reverse sort produces a list ordered by number of occurrences

    -i    ignore case, i.e. b is equivalent to B
    -D    print every copy of each duplicated line
    -d    print one copy of each duplicated line only
    -u    print only lines that are not repeated
    -c    prefix lines with the number of occurrences
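The counting idiom described above, sort, then uniq -c, then a reverse numeric sort, might look like the following sketch. The word list is made up, and the final awk step is only there to normalize the count padding, which differs between uniq implementations.

```shell
workdir=$(mktemp -d)
printf 'cat\nmouse\ndog\nmouse\ncat\nmouse\n' > "$workdir/animals"

# uniq only compares neighbouring lines, so on unsorted input nothing is removed.
unsorted_lines=$(uniq "$workdir/animals" | wc -l | tr -d ' ')

# Sort to make duplicates adjacent, count them, then rank by count.
ranked=$(sort "$workdir/animals" | uniq -c | sort -rn | awk '{print $1, $2}')

printf '%s\n---\n%s\n' "$unsorted_lines" "$ranked"
rm -rf "$workdir"
```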
9
uniq Command Examples

Consider the following example, which shows several of the features of the uniq command:

    $ cat /tmp/file
    mouse
    mouse
    Mouse
    Cat
    dog
    cat

    $ uniq -c /tmp/file
          2 mouse
          1 Mouse
          1 Cat
          1 dog
          1 cat

    $ uniq /tmp/file
    mouse
    Mouse
    Cat
    dog
    cat

    $ uniq -d /tmp/file
    mouse

    $ uniq -i /tmp/file
    mouse
    Cat
    dog
    cat

With -i, Cat and cat are not merged because they are not on adjacent lines.
10
Extracting Columns of Text

cut - extracts selected fields from each line of text

    cut [options] [filename] [...]

can specify which fields you want to extract
uses tabs as default delimiter
    -d option to specify a different delimiter
most useful on structured input (text with columns)

    -b range        select only the bytes in this range
    -c range        select only the characters in this range
    -f range        select only the fields in this range
    -d delimiter    use this delimiter instead of the default

Ranges:

    6-39    from 6 to 39
    -12     from 1 (the beginning of the line) to 12
    30-     from 30 to the end of the line

    $ cat /etc/passwd | cut -d : -f 1,3
    foo:501
    bar:502
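The three range forms behave the same way for bytes, characters and fields. A small sketch using the alphabet as made-up input, so the positions are easy to check by eye:

```shell
line='abcdefghijklmnopqrstuvwxyz'

# -3: from the beginning of the line to character 3.
head_part=$(printf '%s\n' "$line" | cut -c -3)

# 6-9: characters 6 through 9 inclusive.
mid_part=$(printf '%s\n' "$line" | cut -c 6-9)

# 24-: from character 24 to the end of the line.
tail_part=$(printf '%s\n' "$line" | cut -c 24-)

# Fields work the same way; here fields 1 and 3 of a colon-separated line.
fields=$(printf 'foo:x:501:501\n' | cut -d : -f 1,3)

printf '%s\n%s\n%s\n%s\n' "$head_part" "$mid_part" "$tail_part" "$fields"
```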
11
Merging Multiple Files

paste - merges text from multiple files to STDOUT

    paste [options] [filename] [...]

-s option to merge files serially
uses tabs as default delimiter

    $ cat file1
    A
    B
    C
    D

    $ cat file2
    one
    two
    three
    four

    $ cat file3
    1
    2
    3
    4

    $ paste file1 file2 file3
    A       one     1
    B       two     2
    C       three   3
    D       four    4

    $ paste -s file1 file2 file3
    A       B       C       D
    one     two     three   four
    1       2       3       4
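The tab delimiter is hard to see in printed output, so the sketch below checks it explicitly on two short made-up files. It also uses paste's -d option, which the slide does not mention, to choose a different delimiter.

```shell
workdir=$(mktemp -d)
printf 'A\nB\n' > "$workdir/letters"
printf 'one\ntwo\n' > "$workdir/words"

# Default: line N of each file joined side by side, separated by a tab.
parallel=$(paste "$workdir/letters" "$workdir/words")

# -s: each file becomes one output line, its own lines joined by tabs.
serial=$(paste -s "$workdir/letters" "$workdir/words")

# -d (not covered in the slide): use a comma instead of a tab.
csv=$(paste -d , "$workdir/letters" "$workdir/words")

printf '%s\n---\n%s\n---\n%s\n' "$parallel" "$serial" "$csv"
rm -rf "$workdir"
```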