Published by Letitia Flowers. Modified over 9 years ago.
1
LIN 6932 Unix Lecture 6 Hana Filip
2
LIN 6932 HW6 - Part II: solutions are posted on my website (see syllabus)
3
LIN 6932 Text Processing Command Line Utility Programs
sed, wc, awk, comm, cut, ex, iconv, join, paste, sort, tr, uniq, xargs
4
LIN 6932 TextPro Lexicon File
Lexicon file: “core.text”
Background: TextPro is an information extraction system used at SRI International, Menlo Park, CA. Developed by Doug Appelt.
5
LIN 6932 Copy “machen.txt” into your account
    > cd ..
    > cd c6932aab
    > ls
    … machen.txt …
    > cp machen.txt ~c6932aad
    > cd
    > ls
    … machen.txt …
6
LIN 6932 Text Processing Command Line Utility Programs
tr: translate or delete characters
Example 1: delete (-d) all the newline characters from “machen.txt” and redirect the output to a file named “machen-cont.txt”.
    % cat machen.txt | tr -d "\n" > machen-cont.txt
Example 2: delete (-d) all characters from “machen.txt” except (-c) alphabetic characters, newlines, and spaces, and redirect the output to a file named “machen-alpha.txt”.
    % cat machen.txt | tr -c -d "[:alpha:]\n " > machen-alpha.txt
Try also (the space is no longer in the set, so spaces are deleted as well):
    % cat machen.txt | tr -c -d "[:alpha:]\n" > machen-alpha.txt
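The three tr commands above can be tried on a small input without touching any file. The following is a minimal sketch, assuming a POSIX shell and a standard tr; the two-line sample text is hypothetical and only stands in for machen.txt.

```shell
# Hypothetical sample text standing in for machen.txt.
sample='Was machen Sie?
Ich lerne Unix, 2 Stunden.'

# Example 1: delete every newline, so the two lines are joined.
joined=$(printf '%s\n' "$sample" | tr -d '\n')

# Example 2: -c complements the set, so everything EXCEPT letters,
# newlines, and spaces is deleted.
alpha=$(printf '%s\n' "$sample" | tr -c -d '[:alpha:]\n ')

# "Try also" variant: the space is not in the kept set, so words run together.
nospace=$(printf '%s\n' "$sample" | tr -c -d '[:alpha:]\n')

printf '%s\n' "$joined" "$alpha" "$nospace"
```

Note that in Example 2 the deleted "," and "2" leave two consecutive spaces behind, because the spaces around them are kept.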
7
LIN 6932 Text Processing Command Line Utility Programs
tr can be used to make a wordlist from a text by replacing every space with a newline ("\012" is the octal code for the newline character):
    % cat machen.txt | tr " " "\n" | less
    % cat machen.txt | tr " " "\012" | less
We can combine the command above with the delete functionality of tr to make a wordlist without unwanted characters:
    % cat machen.txt | tr " " "\n" | tr -c -d "[:alpha:]\n" > lex
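A small sketch of the wordlist pipeline, assuming a POSIX shell; the sample sentence is hypothetical. The last command uses the standard -s (squeeze) option of tr, which is not on the slide, to show one way of avoiding empty lines when the text contains runs of several spaces.

```shell
# Hypothetical one-line sample standing in for machen.txt.
sample='Was machen Sie? Ich lerne Unix.'

# One word per line, then strip everything that is not a letter or newline.
words=$(printf '%s\n' "$sample" | tr ' ' '\n' | tr -c -d '[:alpha:]\n')
printf '%s\n' "$words"

# Two spaces in a row would yield an empty line; -s squeezes the
# repeated newlines produced by the translation into one.
squeezed=$(printf 'a  b\n' | tr -s ' ' '\n')
printf '%s\n' "$squeezed"
```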
8
LIN 6932 Text Processing Command Line Utility Programs
sort prints the lines of its input, or of the concatenation of all files listed in its argument list, in sorted order. (The -r flag reverses the sort order.)
    % sort -r movie_characters
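The behaviour described above, including the concatenation of several input files, can be sketched as follows. The two temporary files and their contents are hypothetical stand-ins for a file such as movie_characters.

```shell
# Two hypothetical input files in a scratch directory.
dir=$(mktemp -d)
printf 'Ripley\nDeckard\n' > "$dir/part1"
printf 'Leia\nNeo\n' > "$dir/part2"

# sort concatenates all named files, then sorts the combined lines.
ascending=$(sort "$dir/part1" "$dir/part2")

# -r reverses the order.
descending=$(sort -r "$dir/part1" "$dir/part2")

printf '%s\n' "$ascending" "$descending"
rm -rf "$dir"
```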
9
LIN 6932 Text Processing Command Line Utility Programs
uniq takes a text file and outputs it with adjacent identical lines collapsed into one. It is a kind of filter program. Because only adjacent duplicates are collapsed, it is typically used after sort.
    % cat machen.txt | tr " " "\n" | tr -c -d "[:alpha:]\n" | sort | uniq > lex
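Why uniq is normally placed after sort can be seen on a short list. This is a minimal sketch with a hypothetical five-word input: without sorting, the non-adjacent repeat of "the" survives.

```shell
# Hypothetical word list, one word per line.
words='the
cat
the
the
dog'

# uniq alone collapses only the adjacent pair of "the" lines.
unsorted=$(printf '%s\n' "$words" | uniq)

# After sort, all identical lines are adjacent, so each word appears once.
sorted=$(printf '%s\n' "$words" | sort | uniq)

printf '%s\n' "$unsorted"
printf '%s\n' "$sorted"
```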
10
LIN 6932 Text Processing Command Line Utility Programs
sed = stream editor
A special editor for automatically modifying files.
A find-and-replace program: it reads text from standard input (or from files named on the command line) and writes the result to standard output (normally the screen).
The search pattern is a regular expression, essentially the same as a grep regular expression (see references).
Often used in a program to make changes in a file.
11
LIN 6932 Text Processing Command Line Utility Programs
sed: simple example 1
    % sed 's/United States/USA/' new-usa-gaz.text
    s                substitute command
    /../../          delimiter
    United States    regular expression pattern string
    USA              replacement string
    new_file
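A minimal sketch of the substitute command on a hypothetical input line, assuming a standard sed. It also shows a detail the example does not spell out: without the g flag, only the first match on each line is replaced.

```shell
# Hypothetical input line containing the pattern twice.
line='United States and United States again'

# Without g: only the first occurrence on the line is replaced.
first=$(printf '%s\n' "$line" | sed 's/United States/USA/')

# With g: every occurrence on the line is replaced.
all=$(printf '%s\n' "$line" | sed 's/United States/USA/g')

printf '%s\n' "$first" "$all"
```

In both cases sed only writes the result to standard output; the input itself is left unchanged unless the output is redirected to a new file.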
12
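LIN 6932 Text Processing Command Line Utility Programs
sed: simple example 2
    % sed 's/\(United\) \(States\)/\2 \1/' usa-switch-gaz.text
switch two words around (the space between the words must be matched and written back explicitly)
    \(         start of a captured group (here, a word)
    \)         end of a captured group
    /../../    delimiter
    \1         register 1 (text matched by the first group)
    \2         register 2 (text matched by the second group)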
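A small sketch of swapping two words with capture groups, assuming the words are separated by a single space, which the pattern then has to match explicitly. The second command is a hypothetical generalization that swaps the first two alphabetic words of a line, whatever they are.

```shell
# Swap two fixed words; the space sits between the groups in both
# the pattern and the replacement.
swapped=$(printf 'United States\n' | sed 's/\(United\) \(States\)/\2 \1/')

# Hypothetical generalization: swap the first two runs of letters.
general=$(printf 'Menlo Park\n' | sed 's/\([A-Za-z]*\) \([A-Za-z]*\)/\2 \1/')

printf '%s\n' "$swapped" "$general"
```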
13
LIN 6932 Text Processing Command Line Utility Programs
Multiple sed commands may also be stored in a script file. The "-f" option is used on the command line to access the commands in the script:
    % sed -f sedscript.sed [file]
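What such a script file might look like is sketched below. The file name sedscript.sed follows the command above; its contents and the two input lines are hypothetical. Each line of the script holds one sed command, and lines starting with # are comments.

```shell
# Write a hypothetical script file into a scratch directory.
dir=$(mktemp -d)
cat > "$dir/sedscript.sed" <<'EOF'
# one sed command per line
s/United States/USA/g
s/United Kingdom/UK/g
EOF

# Apply every command in the script to each input line.
result=$(printf 'United States\nUnited Kingdom\n' | sed -f "$dir/sedscript.sed")

printf '%s\n' "$result"
rm -rf "$dir"
```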
14
LIN 6932 Text Processing Command Line Utility Programs
    % sed 's/^/LexEntry: /g;s/$/ ;./' lex > newlex
    ^    matches the beginning of the line
    $    matches the end of the line
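The effect of the two anchored substitutions can be checked on a short wordlist. This is a minimal sketch in which the two words are hypothetical stand-ins for the contents of lex.

```shell
# Prefix each line with "LexEntry: " and append " ;." to it.
entries=$(printf 'machen\nlernen\n' | sed 's/^/LexEntry: /g;s/$/ ;./')

printf '%s\n' "$entries"
```

Since ^ can match only once per line, the g flag on the first substitution is harmless but has no effect.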
15
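LIN 6932 Text Processing Command Line Utility Programs & shell script
#!/usr/local/bin/tcsh
# usage: make_lex filename1
#        make_lex filename1 filename2 ...
# first, make sure the user typed in at least one argument
if ( $#argv < 1 ) then
    echo "This script needs at least 1 argument."
    echo "Exiting...(annoyed)"
    # exit statuses are limited to 0-255, so use a plain nonzero value
    exit 1
endif
# build one lexicon per input file, so that a later file does not
# overwrite the output of an earlier one
foreach name ($*)
    # one word per line, letters only, sorted, duplicates removed
    cat $name | tr " " "\n" | tr -c -d "[:alpha:]\n" | sort | uniq > ${name}.mylex
    # wrap each word as a lexicon entry
    sed 's/^/LexEntry: /g;s/$/ ;./' ${name}.mylex > ${name}.newlex
    echo "Your new lexical file is called '${name}.newlex'."
end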