Published by Darleen Newman
1
CIS52 - File Manipulation
  File Manipulation Utilities
  Regular Expressions
  sed, awk
© 2001 John Urrutia. All rights reserved.
2
Overview
  comm: comparison of sorted files
  cut: output sections of lines in a file
  find: find files that match a pattern
  paste: merges records in files
  pr: paginate files into pages
  tr: translate or delete characters
3
Overview
  regular expressions
  sed: Stream Editor (batch file editor)
  awk: Aho, Weinberger, Kernighan (pattern matching)
4
The comm before the storm
  Compares two sorted files
  Results are reported in three columns
    1st: records found only in file 1
    2nd: records found only in file 2
    3rd: records found in both files
  Options -1, -2, and -3 suppress the corresponding column
5
comm - cont.
  Either file name can be replaced by standard input (given as -)
  Example:
    File1   File2
    aa      bb
    dd      cc
    ee      dd
    gg      ee
    hh      ff
6
comm results
  No options:
    File1   File2   Both
    aa
            bb
            cc
                    dd
                    ee
            ff
    gg
    hh
  Option -1 (File2 and Both remain):  bb cc dd ee ff
  Option -2 (File1 and Both remain):  aa dd ee gg hh
  Option -12 (only Both remains):     dd ee
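The column options above can be tried with a short script. This is a minimal sketch, assuming a POSIX shell with mktemp available; the scratch directory and the variable names (tmp, both, only1) are made up for illustration.

```shell
#!/bin/sh
# Recreate the two sorted sample files from the slides in a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' aa dd ee gg hh > "$tmp/file1"
printf '%s\n' bb cc dd ee ff > "$tmp/file2"

# -12 suppresses columns 1 and 2, leaving only records found in both files.
both=$(comm -12 "$tmp/file1" "$tmp/file2")

# -23 suppresses columns 2 and 3, leaving records found only in file1.
only1=$(comm -23 "$tmp/file1" "$tmp/file2")

echo "$both"
echo "$only1"
rm -r "$tmp"
```

Both inputs must already be sorted; comm does not sort them itself.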
7
cut to the chase
  Extracts portions of each record in a file
  Treats the data in each record as delimited fields or columns
  Default delimiter is the tab character
    Can be changed with the -d option
8
cut cont.
  cut {-b list | -c list | -f list [-d char] [-s]} [--output-delimiter=string]
  -b: bytes
  -c: characters (same as bytes)
  -f: fields
  -d: delimiter character
  -s: display only records that contain the delimiter
9
cut ! print
  char: the single byte used to delimit fields in a record
  list: one or more ranges of bytes, characters, or fields to display
    Ranges are comma separated
    1-7: the first 7 characters in the record
    1,7: the first and seventh characters
10
cut ! print again
  string: the characters written in place of the delimiter between output fields
11
cut - Example
  [/@linux2 uid]$ cat file1
  The quick brown fox eyed the jactitating dog
  [/@linux2 uid]$ cut -f1,3,5,8 -d' ' file1
  The brown eyed dog
  [/@linux2 uid]$ cut -f1,4-6,8 -d' ' file1
  The fox eyed the dog
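The same selections can be run without a file by piping the sample sentence into cut. A small sketch; the variable names (line, picked, ranged, chars) are illustrative only, and the -c example is an addition that is not on the slide.

```shell
#!/bin/sh
line='The quick brown fox eyed the jactitating dog'

# Fields 1, 3, 5 and 8, using a single space as the delimiter.
picked=$(printf '%s\n' "$line" | cut -d' ' -f1,3,5,8)

# A range (4-6) mixed with single fields.
ranged=$(printf '%s\n' "$line" | cut -d' ' -f1,4-6,8)

# Character positions 1 through 9 instead of fields.
chars=$(printf '%s\n' "$line" | cut -c1-9)

echo "$picked"   # The brown eyed dog
echo "$ranged"   # The fox eyed the dog
echo "$chars"    # The quick
```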
12
find that pot of gold
  find selects all files that meet the selection criteria in the expression
  No action is taken unless one is specified
  Sub-directories are scanned automatically
  The expression can be simple or complex
13
find me something
  The criteria expression:
    ANDs operands that are separated by a space
    ORs operands that are separated by -o
    Is processed left to right, sequentially
14
find criteria continued
  Actions
    -print: prints the path of every file that meets the selection criteria
    -exec cmd \;: executes the command that precedes the \;
    -ok cmd \;: same as -exec, but runs the command only after a y response from stdin
15
find criteria continued again
  Evaluations
    -type c: file type (e.g. d for directory)
    -atime ±n: accessed more than (+n) or less than (-n) n days ago
    -mtime ±n: modified more than (+n) or less than (-n) n days ago
    -user uid: owner of the file
    -nouser: the owner's uid is not known to the system (takes no argument)
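A short sketch of the criteria and actions above, run against a throwaway directory tree. It assumes a POSIX shell with mktemp; the file names a.txt and b.txt and all variable names are hypothetical.

```shell
#!/bin/sh
# Build a tiny tree: one sub-directory and two freshly created files.
tmp=$(mktemp -d)
mkdir "$tmp/sub"
touch "$tmp/a.txt" "$tmp/sub/b.txt"

# -type d selects directories (the top directory and sub).
dirs=$(find "$tmp" -type d -print | wc -l | tr -d ' ')

# Two criteria separated by a space are ANDed: regular files modified
# less than one day ago.
recent=$(find "$tmp" -type f -mtime -1 -print | wc -l | tr -d ' ')

# -o ORs two criteria; the escaped parentheses group them.
either=$(find "$tmp" \( -type d -o -type f \) -print | wc -l | tr -d ' ')

# -exec runs the command before \; once per selected file; {} is the path.
names=$(find "$tmp" -type f -exec basename {} \; | sort)

echo "$dirs $recent $either"
echo "$names"
rm -r "$tmp"
```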
16
paste tastes good
  paste [options] [filelist]
  Corresponding records from each file are merged into one output record
  -s: process the file list sequentially; all records of one file are joined before going on to the next file
  -d list: each character in the delimiter list is used in turn to separate the merged records
17
paste continued
  [/@linux2 uid]$ cat file1
  A
  B
  C
  [/@linux2 uid]$ cat file2
  1
  2
  3
  [/@linux2 uid]$ cat file3
  x
  y
  z
18
paste continued
  [/@linux2 uid]$ paste file1 file2 file3
  Output (fields are separated by tabs):
  A    1    x
  B    2    y
  C    3    z
  [/@linux2 uid]$ paste -s file1 file2 file3
  Output (fields are separated by tabs):
  A    B    C
  1    2    3
  x    y    z
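The two modes can be compared side by side in a script. This sketch uses -d, with a comma so the separators are visible; the scratch directory and variable names (tmp, side, serial) are illustrative.

```shell
#!/bin/sh
# Recreate the three sample files from the slides.
tmp=$(mktemp -d)
printf '%s\n' A B C > "$tmp/file1"
printf '%s\n' 1 2 3 > "$tmp/file2"
printf '%s\n' x y z > "$tmp/file3"

# Default mode: line N of every file is merged into output line N.
side=$(paste -d, "$tmp/file1" "$tmp/file2" "$tmp/file3")

# -s: every file is collapsed into a single output line, one file at a time.
serial=$(paste -s -d, "$tmp/file1" "$tmp/file2" "$tmp/file3")

echo "$side"
echo "$serial"
rm -r "$tmp"
```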
19
pr - public relations--NOT
  pr paginates file(s) for printing
  Page attributes can be specified
    Lines per page are changed with the -l option
  With multiple files, each file starts on a new page
20
pr - continued
  pr paginates a file for printing
  Creates a header and trailer
    Header is changed with the -h option
    Header and trailer are suppressed with the -t option
  Can create columns of data
    -nbr: number of columns per line
    -S x: character used to separate columns
21
pr - continued
  Can number each line
    -n ck
      c: character that separates the number from the data (default is the tab character)
      k: number of digits
22
Regular Expressions
  A set of characters that defines the criteria used to identify a string within a record
  Used by vi, grep, sed, awk, and others
23
tr - Translate this
  tr [-c] [-d] [-s] [-t] set1 [set2]
  Translates characters from set1 to set2
  -c: complement of set1
  -d: delete characters found in set1
  -s: squeeze runs of a repeated character down to one
  -t: truncate set1 to the length of set2
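A few one-liners illustrating the options above. A minimal sketch; the sample strings and variable names are made up.

```shell
#!/bin/sh
# Translate: each character in set1 maps to the same position in set2.
upper=$(printf 'hello world\n' | tr 'a-z' 'A-Z')

# -d deletes every character found in set1.
nodigits=$(printf 'a1b22c333\n' | tr -d '0-9')

# -s squeezes each run of a repeated set1 character down to one.
squeezed=$(printf 'too    many   spaces\n' | tr -s ' ')

# -c complements set1: delete everything EXCEPT lowercase letters and newline.
letters=$(printf 'a1-b2\n' | tr -cd 'a-z\n')

echo "$upper"      # HELLO WORLD
echo "$nodigits"   # abc
echo "$squeezed"   # too many spaces
echo "$letters"    # ab
```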
24
Regular Expressions
  Simple strings
    Bounded by / ... /
    Interpreted literally
    e.g. /e D/ matches exactly "e D"
      Taste Dee: OK
      Taste don't: not OK
25
Regular Expressions
  The . (period): the special single-character substitute
    Matches any single character
    e.g. /.eny/ matches Aeny, Beny, Ceny
  [char-range] defines a character class
  [^char-range] defines the not-in character class
26
Regular Expressions
  The * (asterisk)
    Matches 0 or more occurrences of the preceding character
    What's this?
      /.*/
      /[a-zA-Z]*/
      /([^)]*)/
27
Regular Expressions
  The /^ (for the rabbit) character
    In the beginning ...: matches at the start of a line
  The $/ (for the teacher) character
    At the end ...: matches at the end of a line
28
Regular Expressions
  Quote the raven: the backslash \
    \. yields .
    \\ yields \
    \* yields *
    \[ yields [
    \] yields ]
    \/ yields /
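The regular expression pieces from the last few slides can be tried with grep, which takes the pattern without the surrounding slashes. A sketch only: the three sample lines are invented, and the -o option (print only the matched text) is an extension found in GNU and BSD grep rather than something the slides cover.

```shell
#!/bin/sh
text=$(printf '%s\n' 'Taste Dee' "Taste don't" 'f(x) = 1.5')

# Simple string: matched literally, so the lowercase d in "don't" fails.
literal=$(printf '%s\n' "$text" | grep 'e D')

# ^ anchors the match to the beginning of the line.
anchored=$(printf '%s\n' "$text" | grep '^f')

# \. quotes the period so it matches only a real period; -c counts lines.
escaped=$(printf '%s\n' "$text" | grep -c '1\.5')

# ([^)]*) : an open parenthesis, any run of non-")" characters, a close.
parens=$(printf '%s\n' "$text" | grep -o '([^)]*)')

echo "$literal"    # Taste Dee
echo "$anchored"   # f(x) = 1.5
echo "$escaped"    # 1
echo "$parens"     # (x)
```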
29
sed - the old Stream EDitor
  sed [-n] [-f script] [file-list]
  Copies its input to standard output, applying edits along the way
  Edits file(s) in non-interactive mode
  Gets its instructions from a script
    -f filename: filename contains the sed instructions
    Without -f: the first command-line argument is used as the script
  -n: suppresses stdout unless output is explicitly requested
30
sed - the old mill stream
  Record processing
    1. Read a record from the file list
    2. Read an instruction from the script (or command line)
    3. Apply the selection criteria
    4. If selected, perform the instruction; repeat steps 2 through 4 until the script is exhausted
    5. Repeat from step 1 until the file list is exhausted
31
He sed what!!??
  Instruction format
    [addr1[,addr2]] inst [arg-list]
  Address
    A line number
    A regular expression
    addr1: start
    addr2: stop
32
Address line numbers
  $: designates the last line of the last file
  1st address line number
    Starts selecting records based on their position in the input file list, counting from 1
  2nd address line number
    Stops selecting records when the position in the input file list is greater than the line number
33
He sed some more
  Instructions
    !: Not; negates the address selection (written after the address)
      sed '/line/!p' file.list
    {...}: groups the instructions for the address selection
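Addresses are easiest to see with -n and the p instruction, so that only selected records are printed. A minimal sketch on five invented input lines; the variable names are illustrative.

```shell
#!/bin/sh
input=$(printf '%s\n' one two three four five)

# Line-number range: start at line 2, stop after line 4.
range=$(printf '%s\n' "$input" | sed -n '2,4p')

# $ addresses the last line of the input.
last=$(printf '%s\n' "$input" | sed -n '$p')

# ! after the address negates it: print the lines that do NOT contain "o".
negated=$(printf '%s\n' "$input" | sed -n '/o/!p')

# {} groups several instructions under one address.
grouped=$(printf '%s\n' "$input" | sed -n '/three/{p;p;}')

echo "$range"
echo "$last"
echo "$negated"
echo "$grouped"
```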
34
sed Instructions
  p: Print now and continue
  d: Delete and get the next record
  q: Quit processing; Stop; Go Away
35
sed Instructions
  c: Change
    [addr1[,addr2]] c\
    yada yada yada
    All selected records are replaced as a group by the change text
  a: Append
    [addr1] a\
    ...
    Adds the text after the selected records
36
sed Instructions
  i: Insert
    [addr1] i\
    ...
    Adds the text before the selected records
  n: Next
    [addr1] n
    Writes the current record, gets the next record, and continues the script
37
sed Instructions
  w: Write
    [addr1[,addr2]] w filename
    Writes the selected records to a file
  r: Read
    [addr1] r filename
    Reads records from filename and appends them after the selected record
38
sed Instructions
  s: Substitute
    [addr1[,addr2]] s/ptrn/repl/[g][p][w f]
    For each selected record, matches the pattern and replaces it
    g: replace all non-overlapping occurrences
    p: print the record
    w: write the record to the filename f
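A sketch of the d, q, and s instructions on three invented records. The a, i, and c instructions are left out because their one-line spelling differs between sed implementations; variable names are illustrative.

```shell
#!/bin/sh
input=$(printf '%s\n' 'apple pie' 'banana split' 'apple tart')

# d: delete the selected records; everything else is copied to stdout.
deleted=$(printf '%s\n' "$input" | sed '/banana/d')

# q: quit after line 1 (the line is still printed on the way out).
first=$(printf '%s\n' "$input" | sed '1q')

# s with g: replace every non-overlapping "a" in every record.
subst=$(printf '%s\n' "$input" | sed 's/a/A/g')

# Address + s + p flag under -n: only records containing "apple" are
# edited (first "p" only, no g flag) and only edited records are printed.
once=$(printf '%s\n' "$input" | sed -n '/apple/s/p/P/p')

echo "$deleted"
echo "$first"
echo "$subst"
echo "$once"
```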
39
Hawk - Squawk - awk
  The programmable utility that does everything
  Aho - Weinberger - Kernighan
  Provides:
    Conditional execution
    Looping
  Handles:
    Numeric and string variables
    Regular expressions
    C-style print facilities
40
awk
  awk [-F c] {-f program-file | 'program'} [file-list]
  -F: field delimiter character
  -f: name of the awk program file
  program: in-stream instructions, given on the command line in place of a program file
  file-list: list of files to process
41
awk - program lines
  pattern { action }
  As in sed, the pattern selects records
  Record processing is the same as in sed
42
awk - pattern
  Patterns follow regular expression format
  ~: tests for a match to a regular expression
  !~: tests for NO match to a regular expression
  , : establishes a pattern range; all records within the range are processed, inclusive
  BEGIN: executes before the first record is processed
  END: executes after the last record is processed
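Each kind of pattern above, applied to three invented "name age" records. A minimal sketch; the data and variable names are hypothetical.

```shell
#!/bin/sh
data=$(printf '%s\n' 'alice 30' 'bob 25' 'carol 41')

# ~ : field 1 matches the regular expression (default action is print).
matched=$(printf '%s\n' "$data" | awk '$1 ~ /^b/')

# !~ : field 1 does NOT match.
unmatched=$(printf '%s\n' "$data" | awk '$1 !~ /^b/ { print $1 }')

# Range: from the record matching /alice/ through the one matching /bob/.
ranged=$(printf '%s\n' "$data" | awk '/alice/,/bob/ { print NR }')

# BEGIN runs before the first record, END after the last.
total=$(printf '%s\n' "$data" |
  awk 'BEGIN { sum = 0 } { sum += $2 } END { print sum }')

echo "$matched"
echo "$unmatched"
echo "$ranged"
echo "$total"
```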
43
awk - relational operators
  <   less than
  <=  less than or equal to
  ==  equal to
  !=  not equal to
  >=  greater than or equal to
  >   greater than
44
awk - operators
  Arithmetic
    +   addition
    -   subtraction
    *   multiplication
    /   division
  Assignment
    =   assigns the value to the variable on the left
    +=  adds the value to the variable on the left
45
awk - boolean operators
  &&  and
  ||  or
  !   not
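The relational, arithmetic, assignment, and boolean operators can be combined in one pattern or action. A sketch on the same kind of invented "name age" records; all names are illustrative.

```shell
#!/bin/sh
data=$(printf '%s\n' 'alice 30' 'bob 25' 'carol 41')

# Relational tests joined with && : 25 <= age < 41.
picked=$(printf '%s\n' "$data" | awk '$2 >= 25 && $2 < 41 { print $1 }')

# ! negates a test and || accepts either side.
others=$(printf '%s\n' "$data" |
  awk '!($2 > 28) || $1 == "carol" { print $1 }')

# += accumulates; * and / are ordinary arithmetic.
scaled=$(printf '%s\n' "$data" |
  awk '{ total += $2 * 2 } END { print total / 4 }')

echo "$picked"
echo "$others"
echo "$scaled"
```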
46
awk - actions
  #: comment; everything to its right on the line is ignored
  Default action is print to stdout
  Multiple actions can be taken
    Use {...} to enclose multiple actions
    Separate actions with ;
47
awk - actions
  print variable ...
    Var, Var2, Var3: prints the variables separated by the output delimiter
    Var Var2 Var3: prints the variables with NO separators
    "literal value": prints exactly everything between the " "
48
awk - actions
  printf "control string", variable ...
    Control string
      \n: new line
      \t: tab
      %[-][n][.d]conv-char
        -: left justification
        n: number of characters
        .d: decimal positions
49
awk - actions
  %[-][n][.d]conv-char (continued)
    conv-char: conversion character
      d: decimal
      e: exponent
      f: floating-point
      o: octal
      x: hexadecimal
      s: string
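The difference between the two print forms, and a printf control string that uses the flag, width, and decimal parts, in a short sketch. The sample record and variable names are invented.

```shell
#!/bin/sh
line='widget 3 2.5'

# Comma-separated arguments are printed with the output separator between.
comma=$(printf '%s\n' "$line" | awk '{ print $1, $2 }')

# Arguments separated only by spaces are concatenated.
concat=$(printf '%s\n' "$line" | awk '{ print $1 $2 }')

# %-8s  : string, left-justified in 8 characters
# %3d   : decimal, right-justified in 3 characters
# %6.2f : floating-point, 6 characters wide with 2 decimal positions
fmt=$(printf '%s\n' "$line" |
  awk '{ printf "%-8s|%3d|%6.2f\n", $1, $2, $2 * $3 }')

echo "$comma"    # widget 3
echo "$concat"   # widget3
echo "$fmt"      # widget  |  3|  7.50
```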
50
awk - variables
  awk provided variables
    NF: total number of fields in the current record
    $1 ... $n: each field in the current record
    FS: input field separator (default space or tab)
    OFS: output field separator (default space)
51
awk - variables
  awk provided variables
    NR: current record number
    $0: entire current record
    RS: record separator (default newline)
    ORS: output record separator (default newline)
    FILENAME: name of the current input file
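A sketch showing several of the provided variables at once, on two invented colon-delimited records. -F sets FS from the command line; the variable names out and whole are illustrative.

```shell
#!/bin/sh
# NR is the record number, NF the field count, $NF the last field.
# OFS is placed between the comma-separated print arguments.
out=$(printf '%s\n' 'a:b:c' 'd:e' |
  awk -F: 'BEGIN { OFS = "-" } { print NR, NF, $NF }')

# $0 is the entire record, unchanged.
whole=$(printf '%s\n' 'a:b:c' | awk -F: '{ print $0 }')

echo "$out"
echo "$whole"
```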
52
awk - variables
  Associative Arrays
    array_name[string]
    The array name should be meaningful
    The index of the array is a string
    Elements are created automatically
    for (element in array) actions
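A common use of associative arrays is counting occurrences. A minimal sketch on invented input; the array name seen and the variable counts are hypothetical. The order in which for (element in array) visits the indexes is not specified, so the output is piped through sort.

```shell
#!/bin/sh
# seen[$1]++ creates the element on first use and increments it after.
counts=$(printf '%s\n' red blue red green red blue |
  awk '{ seen[$1]++ }
       END { for (color in seen) print color, seen[color] }' |
  sort)

echo "$counts"
```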
53
awk - functions
  length(string): returns the number of characters in string
  int(num): returns the integer portion of num
  index(str1,str2): returns the position of str2 within str1, or 0 if not present
  split(str,arr,del): populates arr[ ] from the fields in str delimited by del; returns the count of elements
54
awk - functions
  sprintf(fmt, args): formats args using fmt and returns the formatted string
  substr(str, pos, len): returns the substring of str that starts at position pos and is len characters long
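All of the functions from the last two slides in one BEGIN block, so no input file is needed. A sketch; the sample strings and the names s, n, parts, and result are invented for illustration.

```shell
#!/bin/sh
result=$(awk 'BEGIN {
  s = "file manipulation"
  n = split("a,b,c", parts, ",")   # n is the element count
  print length(s), int(7.9), index(s, "man"), substr(s, 6, 3), \
        n, parts[2], sprintf("%03d", 7)
}')

echo "$result"   # 17 7 6 man 3 b 007
```

Positions used by index and substr start at 1, which is why "man" is found at 6.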