CIS52 – File Manipulation
File Manipulation Utilities, Regular Expressions, sed, awk
© 2001 John Urrutia. All rights reserved.


2. Overview
  • comm: compare two sorted files
  • cut: output selected sections of each line in a file
  • find: find files that match selection criteria
  • paste: merge corresponding records of files
  • pr: paginate files for printing
  • tr: translate or delete characters

3. Overview
  • Regular expressions
  • sed: Stream Editor (batch file editor)
  • awk: Aho, Weinberger, Kernighan (pattern matching)

4. The comm before the storm
Compares 2 sorted files.
  • Results are reported in 3 columns:
    • 1st: records found only in file 1
    • 2nd: records found only in file 2
    • 3rd: records found in both files
  • Options -1, -2 and -3 suppress the corresponding columns.

5. comm – cont.
Either file name can be replaced by - to read standard input.
Example:
  File1   File2
  aa      bb
  dd      cc
  ee      dd
  gg      ee
  hh      ff

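6. comm results
No option (all three columns):
  File1   File2   Both
  aa
          bb
          cc
                  dd
                  ee
          ff
  gg
  hh
Option -1 (records only in file 2, plus records in both): bb cc dd ee ff
Option -2 (records only in file 1, plus records in both): aa dd ee gg hh
Option -12 (records in both files only): dd ee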
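
The comm results above can be reproduced with a short script. This is a minimal sketch, assuming a POSIX shell with mktemp available; the scratch directory and the variable names (workdir, both, only1, only2) are illustrative and not part of the slides.

```shell
# Recreate the two sorted sample files from the slides in a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' aa dd ee gg hh > "$workdir/file1"
printf '%s\n' bb cc dd ee ff > "$workdir/file2"

# -12 suppresses columns 1 and 2, leaving only records found in both files.
both=$(comm -12 "$workdir/file1" "$workdir/file2")
# -23 suppresses columns 2 and 3, leaving records found only in file1.
only1=$(comm -23 "$workdir/file1" "$workdir/file2")
# A "-" in place of a file name makes comm read that input from stdin.
only2=$(comm -13 - "$workdir/file2" < "$workdir/file1")

printf 'both:\n%s\nonly in file1:\n%s\nonly in file2:\n%s\n' "$both" "$only1" "$only2"
rm -rf "$workdir"
```

Both inputs must already be sorted; comm does not sort them itself.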

7. cut to the chase
Extracts portions of each record in a file.
Treats each record as fields or columns separated by a delimiter.
  • The default delimiter is the tab character.
  • It can be changed with the -d option.

8. cut cont.
cut {-b list | -c list | -f list [-d char] [-s]} [--output-delimiter=string] [file ...]
  • -b: bytes
  • -c: characters (same as bytes for single-byte characters)
  • -f: fields
  • -d: delimiter character
  • -s: display only records that contain the delimiter

9. cut ! print
char: the single byte used to delimit fields in a record
list: the range(s) of characters to display
  • Ranges are comma separated.
  • 1-7: the first 7 characters in the record
  • 1,7: the first and seventh characters

10. cut ! print again
string: the characters that --output-delimiter substitutes for the delimiters in the output.

11. cut - Example
[/@linux2 uid]$ cat file1
The quick brown fox eyed the jactitating dog
[/@linux2 uid]$ cut -f1,3,5,8 -d' ' file1
The brown eyed dog
[/@linux2 uid]$ cut -f1,4-6,8 -d' ' file1
The fox eyed the dog
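
The field, character and -s options can be tried without creating a file. The following is a small sketch that feeds the slide's sample record through a pipe; the variable names and the colon-delimited sample are assumptions made for illustration.

```shell
line='The quick brown fox eyed the jactitating dog'

# Fields 1, 3, 5 and 8, splitting on a single space.
picked=$(printf '%s\n' "$line" | cut -d' ' -f1,3,5,8)
# Characters 1 through 9 of the record.
chars=$(printf '%s\n' "$line" | cut -c1-9)
# -s drops records that contain no delimiter at all.
only_delimited=$(printf 'a:b\nno delimiter here\nc:d\n' | cut -d: -f2 -s)

printf '%s\n' "$picked" "$chars" "$only_delimited"
```

Without -s, cut passes a record that has no delimiter through unchanged, which is easy to miss in mixed input.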

12. find that pot of gold
find selects all files that meet the selection criteria in the expression.
  • No action is taken unless one is specified (POSIX and GNU find apply -print when the expression names no action).
  • Sub-directories are scanned automatically.
  • The expression can be simple or complex.

13. find me something
The criteria expression:
  • ANDs operands that are separated by a space
  • ORs operands that are separated by -o
  • Is processed left to right, sequentially

14. find criteria continued
Actions
  • -print: prints the path of every file that meets the selection criteria
  • -exec cmd \;: executes the command that precedes the \;
  • -ok cmd \;: same as -exec, but requires a y from stdin before each command runs

15. find criteria continued again
Evaluations
  • -type c: matches files of type c (e.g. d for directory)
  • -atime ±n: accessed more than (+n) or less than (-n) n days ago
  • -mtime ±n: modified more than (+n) or less than (-n) n days ago
  • -user uid: the file is owned by uid
  • -nouser: the file's owning uid is not known to the system
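
The AND, OR and -exec rules can be seen on a throwaway directory tree. This is a hedged sketch: the directory layout is invented for the example, and the -name test is a standard find test that the slides do not list.

```shell
# Build a small scratch tree to search.
root=$(mktemp -d)
mkdir "$root/sub"
touch "$root/a.txt" "$root/sub/b.txt" "$root/sub/c.log"

# Two tests separated by a space are ANDed: regular files AND names ending in .txt.
txt_files=$(find "$root" -type f -name '*.txt' -print | sort)
# -o ORs two tests; the escaped parentheses group them so -print applies to both.
either=$(find "$root" \( -name '*.log' -o -type d \) -print | sort)
# -exec runs the command once per match; {} is the path and \; ends the command.
count=$(find "$root" -type f -exec echo found {} \; | wc -l | tr -d ' ')

printf '%s\n' "$txt_files" "$either" "$count"
rm -rf "$root"
```

The parentheses matter because the implicit AND binds more tightly than -o; without them, -print would attach only to the second test.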

16. paste tastes good
paste [options] [filelist]
Corresponding records from each file are merged into 1 record.
  • -s: process the filelist sequentially; all records of one file are merged before going to the next file
  • -d delimiter-list: each character in the list is used in turn to delimit the merged records

17. paste continued
[/@linux2 uid]$ cat file1
A
B
C
[/@linux2 uid]$ cat file2
1
2
3
[/@linux2 uid]$ cat file3
x
y
z

18. paste continued
[/@linux2 uid]$ paste file1 file2 file3
Output (fields are tab separated):
A       1       x
B       2       y
C       3       z
[/@linux2 uid]$ paste -s file1 file2 file3
Output (fields are tab separated):
A       B       C
1       2       3
x       y       z
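
The parallel, serial and -d behaviours can be compared side by side. This sketch recreates the three sample files in a scratch directory; the directory and variable names are assumptions, and the ':,' delimiter list is chosen only to make the rotation visible.

```shell
dir=$(mktemp -d)
printf '%s\n' A B C > "$dir/file1"
printf '%s\n' 1 2 3 > "$dir/file2"
printf '%s\n' x y z > "$dir/file3"

# Default: record N of every file is joined with tabs into output record N.
parallel=$(paste "$dir/file1" "$dir/file2" "$dir/file3")
# -s: each file is collapsed into a single output record, one file after another.
serial=$(paste -s "$dir/file1" "$dir/file2" "$dir/file3")
# -d: the delimiter characters are used in turn (":" first, then ",").
custom=$(paste -d ':,' "$dir/file1" "$dir/file2" "$dir/file3")

printf '%s\n' "$parallel" "$serial" "$custom"
rm -rf "$dir"
```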

19. pr – public relations: NOT
pr paginates file(s) for printing.
  • Page attributes can be specified.
  • Lines per page are changed through the -l option.
  • With multiple files, each file starts on a new page.

20. pr – continued
pr paginates a file for printing.
  • Creates a header and trailer
    • Changed through the -h option
    • Suppressed through the -t option
  • Can create columns of data
    • -nbr: number of columns per line
    • -S x: character used to separate columns

21. pr – continued
  • Can number each line
    • -n[c[k]]
    • c: separator character placed after the number (default is the tab character)
    • k: number of digits
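
Line numbering and column layout are easiest to inspect with the header and trailer switched off. This is a rough sketch using -t, -n and a column count on piped input; the sample words are made up, and exact column spacing may differ between pr implementations.

```shell
# -t suppresses the header and trailer; -n:3 numbers each line,
# using ":" as the separator and 3 digits for the number.
numbered=$(printf '%s\n' alpha beta gamma | pr -t -n:3)
printf '%s\n' "$numbered"

# -2 lays the input out in two columns; -t again drops the header and trailer.
columns=$(printf '%s\n' a b c d e f | pr -t -2)
printf '%s\n' "$columns"
```

Note that the separator and digit count are attached directly to -n, with no space in between.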

22. Regular Expressions
A set of characters that defines the criteria used to identify a string within a record.
Used by vi, grep, sed, awk, and others.

23. tr – Translate this
tr [-c] [-d] [-s] [-t] set1 [set2]
Translates characters from set1 to set2.
  • -c: use the complement of set1
  • -d: delete characters found in set1
  • -s: squeeze runs of a repeated character down to one
  • -t: truncate set1 to the length of set2
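
Each tr option can be tried on a one-line input. A minimal sketch follows; the sample strings and variable names are invented for illustration.

```shell
# Translate: every character in set1 is replaced by its counterpart in set2.
upper=$(printf 'hello world\n' | tr 'a-z' 'A-Z')
# -d: delete every character found in set1.
no_vowels=$(printf 'hello world\n' | tr -d 'aeiou')
# -s: squeeze runs of the same character in set1 down to a single one.
squeezed=$(printf 'too    many   spaces\n' | tr -s ' ')
# -c with -d: delete everything that is NOT in set1 (including the newline).
digits_only=$(printf 'tel: 555-0100\n' | tr -cd '0-9')

printf '%s\n' "$upper" "$no_vowels" "$squeezed" "$digits_only"
```

tr reads only standard input, so files are supplied with a pipe or a redirection rather than as arguments.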

24. Regular Expressions
Simple strings
  • Bounded by / … /
  • Interpreted literally
  • e.g. /e D/ matches exactly "e D"
    • Taste Dee: OK
    • Taste don't: not OK

25. Regular Expressions
The . (period), the special single-character substitute
  • Matches any single character
  • e.g. /.eny/ matches Aeny, Beny, Ceny
[char-range] defines a character class.
[^char-range] defines the not-in character class.

26. Regular Expressions
The * (asterisk)
  • Matches 0 or more occurrences of the preceding character.
What's this?
  • /.*/
  • /[a-zA-Z]*/
  • /([^)]*)/

27. Regular Expressions
The /^ (caret, for the rabbit) character
  • In the beginning …: anchors the match to the start of the line
The $/ (dollar, for the teacher) character
  • At the end …: anchors the match to the end of the line

28. Regular Expressions
Quote the raven – backslash
  • \. yields .
  • \\ yields \
  • \* yields *
  • \[ yields [
  • \] yields ]
  • \/ yields /
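
The regular expression rules from the preceding slides can be checked with grep, one of the tools the slides name as using them. This is an illustrative sketch; the sample lines are assumptions built from the slide examples, and grep takes the pattern without the surrounding / … /.

```shell
sample=$(printf '%s\n' 'Taste Dee' "Taste don't" 'Aeny' 'eny' 'f(x) = 3.5' 'end.')

# Simple string: matched literally, so case and spacing must agree.
literal=$(printf '%s\n' "$sample" | grep 'e D')
# "." needs exactly one character before "eny", so the bare "eny" line fails.
dot=$(printf '%s\n' "$sample" | grep '.eny')
# "(" then zero or more characters that are not ")" then ")".
parens=$(printf '%s\n' "$sample" | grep -c '([^)]*)')
# ^ and $ anchor both ends; \. quotes the period so it matches only a real ".".
anchored=$(printf '%s\n' "$sample" | grep '^end\.$')

printf '%s\n' "$literal" "$dot" "$parens" "$anchored"
```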

29. sed – the old Stream EDitor
sed [-n] [-f script-file] [file-list]
Copies the input to standard output, editing it on the way.
Edits file(s) in a non-interactive mode.
Gets its instructions from a script:
  • -f filename: the file contains the sed instructions
  • Without -f, the 1st command argument is used as the script
  • -n: suppress automatic output; records are written only when an instruction (such as p) asks for it

30. sed – the old mill stream
Record processing
  1. Read a record from the file list
  2. Read an instruction from the script (or command line)
  3. Apply the selection criteria
  4. If selected, perform the instruction; repeat steps 2 to 4 until no more script
  5. Repeat steps 1 to 4 until no more file list

31. He sed what!!??
Instruction format
  [addr1[,addr2]] inst [arg-list]
Address
  • A line number
  • A regular expression
  • addr1: start
  • addr2: stop

32. Address line numbers
$ designates the last line of the last file.
1st address line number
  • Starts selecting records based on their position in the input file list, counting from 1.
2nd address line number
  • Stops selecting records when the position in the input file list is greater than the line number.

33. He sed some more
Instructions
  • ! (Not): negates the address selection; it is written after the address
    • sed '/line/!p' file.list
  • {…}: groups the instructions for the address selection

34. sed Instructions
p: Print now and continue
d: Delete and get the next record
q: Quit processing; Stop; Go Away
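
Addresses and the p, d and q instructions can be combined in one-line scripts. The sketch below pipes five sample records into sed; the records and variable names are assumptions for the example.

```shell
input=$(printf '%s\n' one two three four five)

# Line-number range: with -n, only the explicit p produces output.
range=$(printf '%s\n' "$input" | sed -n '2,3p')
# $ addresses the last record of the input.
last=$(printf '%s\n' "$input" | sed -n '$p')
# ! negates the address: delete every record that does NOT contain "e".
negated=$(printf '%s\n' "$input" | sed '/e/!d')
# q quits after record 2, so later records are never read.
first_two=$(printf '%s\n' "$input" | sed '2q')
# {…} groups instructions: print the first record starting with "t", then quit.
grouped=$(printf '%s\n' "$input" | sed -n '/^t/{p;q;}')

printf '%s\n' "$range" "$last" "$negated" "$first_two" "$grouped"
```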

35. sed Instructions
c: Change
  • [addr1[,addr2]] c\
    yada yada yada
  • All selected records are replaced as a group by the change value.
a: Append
  • [addr1] a\
    …
  • Adds the text after each selected record.

36. sed Instructions
i: Insert
  • [addr1] i\
    …
  • Adds the text before each selected record.
n: Next
  • [addr1] n
  • Writes the current record, gets the next one and continues the script.

37. sed Instructions
w: Write
  • [addr1[,addr2]] w filename
  • Writes the selected records to a file.
r: Read
  • [addr1] r filename
  • Reads records from filename and appends them after the selected record.

38. sed Instructions
s: Substitute
  • [addr1[,addr2]] s/ptrn/repl/[g][p][w file]
  • For each selected record, match the pattern and replace it.
    • g: replace all non-overlapping occurrences
    • p: print the record
    • w: write the record to the named file
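
The substitute flags and the text-adding instructions behave as follows on a small input. This is a sketch only; the sample text and variable names are made up, and the a\ and c\ forms use the portable layout with the text on the following line.

```shell
text=$(printf '%s\n' 'the cat sat on the mat' 'no match here' 'the end')

# Without g, only the first occurrence in each record is replaced.
first=$(printf '%s\n' "$text" | sed 's/the/THE/')
# With g, every non-overlapping occurrence is replaced.
all=$(printf '%s\n' "$text" | sed 's/the/THE/g')
# -n plus the p flag prints only the records where a substitution happened.
changed_only=$(printf '%s\n' "$text" | sed -n 's/the/THE/gp')
# a\ appends the following text after each selected record.
appended=$(printf '%s\n' "$text" | sed '/match/a\
appended after the match')
# c\ replaces the selected record with the following text.
replaced=$(printf '%s\n' "$text" | sed '2c\
line two was changed')

printf '%s\n' "$first" "$all" "$changed_only" "$appended" "$replaced"
```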

39. Hawk – Squawk – awk
The programmable utility that does everything.
Aho – Weinberger – Kernighan
Provides:
  • Conditional execution
  • Looping
Handles:
  • Numeric & string variables
  • Regular expressions
  • C-style print facilities (printf)

40. awk
awk [-F c] {-f program-file | 'program'} [file-list]
  • -F c: field delimiter character
  • -f program-file: name of the awk program file
  • 'program': in-stream instructions given on the command line instead of -f
  • file-list: list of files to process

41. awk – program lines
pattern { action }
  • Like sed, the pattern selects records.
  • Record processing is the same as in sed.

42. awk – pattern
Patterns follow regular expression format.
  • ~: tests for a match to a regular expression
  • !~: tests for NO match to a regular expression
  • , (comma): establishes a pattern range; all records within the range are processed, inclusively
  • BEGIN: executes before the first record is processed
  • END: executes after the last record is processed

43. awk – relational operators
  • <: less than
  • <=: less than or equal to
  • ==: equal to
  • !=: not equal to
  • >=: greater than or equal to
  • >: greater than

44. awk – operators
Arithmetic
  • +: addition
  • -: subtraction
  • *: multiplication
  • /: division
Assignment
  • =: assigns the value to the variable on the left
  • +=: adds the value to the variable on the left

45. awk – boolean operators
  • &&: and
  • ||: or
  • !: not

46. awk – actions
#: comment to the right on any line
The default action is to print to stdout.
Multiple actions can be taken.
  • Use {…} to enclose multiple actions
  • Separate actions with ;
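
Patterns, operators and actions come together in short one-line programs. The following sketch runs a few of them over an invented four-record data set; the field layout (name, age, department) is an assumption made for the example.

```shell
data=$(printf '%s\n' 'alice 30 sales' 'bob 45 eng' 'carol 27 eng' 'dave 52 sales')

# Pattern only: the default action prints every record whose 3rd field matches.
eng=$(printf '%s\n' "$data" | awk '$3 ~ /eng/')
# Relational and boolean operators combined, with an explicit action.
older=$(printf '%s\n' "$data" | awk '$2 >= 45 && $3 != "eng" { print $1 }')
# BEGIN runs before the first record, END after the last; += accumulates.
total=$(printf '%s\n' "$data" | awk 'BEGIN { sum = 0 } { sum += $2 } END { print sum }')
# A pattern range selects every record from the first match to the second, inclusive.
range=$(printf '%s\n' "$data" | awk '/^bob/,/^carol/ { print $1 }')

printf '%s\n' "$eng" "$older" "$total" "$range"
```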

47. awk – actions
print variable …
  • Var, Var2, Var3: prints the variables separated by the output delimiter (OFS)
  • Var Var2 Var3: prints the variables with NO separators
  • "literal value": prints exactly everything between the " "

48. awk – actions
printf "cntl string", variable …
  • Control string
    • \n: new line
    • \t: tab
    • %[-][n][.d]conv-char
      • -: left justification
      • n: number of characters
      • .d: decimal positions

49. awk – actions
  • %[-][n][.d]conv-char
    • -: left justification
    • n: number of characters
    • .d: decimal positions
    • conv-char: conversion character
      • d: decimal, e: exponent, f: floating-point
      • o: octal, x: hexadecimal
      • s: string
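
The control-string pieces are easiest to read in a concrete printf call. This is a small sketch run entirely from a BEGIN block, so it needs no input file; the values being formatted are arbitrary.

```shell
# %-8s  : string, left justified in 8 characters
# %5d   : decimal integer, right justified in 5 characters
# %7.2f : floating point, 7 characters wide with 2 decimal positions
formatted=$(awk 'BEGIN { printf "%-8s|%5d|%7.2f\n", "widget", 42, 3.14159 }')
# %x, %o and %e convert to hexadecimal, octal and exponent notation.
bases=$(awk 'BEGIN { printf "%x %o %e\n", 255, 8, 1234.5 }')

printf '%s\n' "$formatted" "$bases"
```

Unlike print, printf adds no newline of its own, which is why each control string ends in \n.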

50. awk – variables
awk-provided variables
  • NF: total number of fields in the current record
  • $1…$n: each field in the current record
  • FS: input field separator (default: space or tab)
  • OFS: output field separator (default: space)

51. awk – variables
awk-provided variables
  • NR: current record number
  • $0: entire current record
  • RS: record separator (default: newline)
  • ORS: output record separator (default: newline)
  • FILENAME: name of the current input file
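
The built-in variables can be watched changing from record to record. The sketch below uses colon-delimited sample records invented for the example, sets the input separator with -F and overrides OFS in a BEGIN block.

```shell
records=$(printf '%s\n' 'root:x:0:0' 'daemon:x:1:1' 'uid:x:1000:1000')

# -F: sets FS; OFS is placed between the comma-separated print arguments.
report=$(printf '%s\n' "$records" | awk -F: 'BEGIN { OFS = " -> " } { print NR, $1, NF }')
# $NF is the last field, because NF holds the number of fields in the record.
last_fields=$(printf '%s\n' "$records" | awk -F: '{ print $NF }')
# In END, NR holds the total number of records read.
total=$(printf '%s\n' "$records" | awk 'END { print NR }')

printf '%s\n' "$report" "$last_fields" "$total"
```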

52. awk - variables
Associative Arrays
  • array_name[string]
  • The array name should be meaningful.
  • The index of the array is a string.
  • Elements are created automatically.
  • for (element in array) actions
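
A common use of associative arrays is counting how often each value occurs. This is a minimal sketch; the department names and the array name seen are illustrative only.

```shell
# seen[$1]++ creates the element on first use and increments it afterwards.
# The for-in loop visits the indexes in no particular order, so the result is sorted.
counts=$(printf '%s\n' eng sales eng ops eng sales |
  awk '{ seen[$1]++ } END { for (dept in seen) print dept, seen[dept] }' | sort)

printf '%s\n' "$counts"
```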

53. awk - functions
length(string): returns the number of characters in string
int(num): returns the integer portion of num
index(str1,str2): returns the position of str2 within str1, or 0 if not present
split(str,arr,del): populates arr[ ] from the fields in str delimited by del; returns the count of elements

54. awk - functions
sprintf(fmt, args): formats args using fmt and returns the formatted string
substr(str, pos, len): returns the substring of str that starts at position pos and is len characters long
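
The string functions from the last two slides can all be exercised from a single BEGIN block. This is a sketch with arbitrary sample values; the variable names s, n, parts and label are assumptions.

```shell
result=$(awk 'BEGIN {
  s = "file manipulation"
  n = split("a:b:c", parts, ":")            # fills parts[1..3], returns 3
  label = sprintf("%s-%03d", parts[2], 7)   # formatted string is returned, not printed
  print length(s)          # number of characters in s
  print int(7.9)           # integer portion
  print index(s, "man")    # position of "man" within s (counting from 1)
  print n, parts[3]
  print label
  print substr(s, 6, 3)    # 3 characters starting at position 6
}')

printf '%s\n' "$result"
```

String positions in awk start at 1, which is why index can use 0 to mean "not present".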

