CIT 140: Introduction to IT

Slide #1
CSC 140: Introduction to IT
Advanced File Processing
Slide #2: Topics

1. Compressing files
   1. compress
   2. gzip
   3. bzip2
2. Archiving files: tar
3. Sorting files: sort
Slide #3: Data Compression

Problem: How can we store X bytes using only Y < X bytes?
Solution: Find redundancies in the data.

1. Run-length encoding
   Encode repetitions as the repeated value and a count.
   Ex: thethethe -> the3
2. Dictionary encoding
   Build a dictionary of words. Encode each word with a number.
   Common words: the, an, is, this
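The run-length idea above can be tried out in a few lines of shell. This is a minimal sketch and not part of the original slides: rle_encode is a hypothetical helper, and it works on runs of single characters rather than on repeated words like the thethethe example.

```shell
#!/bin/sh
# rle_encode (hypothetical helper): character-level run-length encoding.
# Each run of a repeated character becomes the character followed by its count.
rle_encode() {
    printf '%s\n' "$1" | awk '{
        out = ""
        n = length($0)
        i = 1
        while (i <= n) {
            c = substr($0, i, 1)
            count = 1
            # Extend the run while the next character matches.
            while (i + count <= n && substr($0, i + count, 1) == c)
                count++
            out = out c count
            i += count
        }
        print out
    }'
}

rle_encode "aaaabbbcca"   # prints a4b3c2a1
```

Note that text with few repeats gets longer, not shorter (abc becomes a1b1c1), which is why run-length encoding only pays off on redundant data.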
Slide #4: Data Compression

"Ask not what your country can do for you -- ask what you can do for your country."

Dictionary:
1 ask
2 not
3 what
4 your
5 country
6 can
7 do
8 for
9 you

Encoded version:
"1 2 3 4 5 6 7 8 9 -- 1 3 9 6 7 8 4 5"
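The dictionary scheme on this slide can be sketched with standard tools. The helper name dict_encode is an assumption made for illustration. It lowercases the text, drops punctuation (so the "--" in the quote is lost), and numbers each word in order of first appearance.

```shell
#!/bin/sh
# dict_encode (hypothetical helper): replace each word with its dictionary number.
# Words are numbered in the order they first appear in the input.
dict_encode() {
    printf '%s\n' "$1" |
        tr 'A-Z' 'a-z' |
        tr -cs 'a-z\n' ' ' |
        awk '{
            out = ""
            sep = ""
            for (i = 1; i <= NF; i++) {
                # Assign the next free number to a word seen for the first time.
                if (!($i in dict))
                    dict[$i] = ++next_id
                out = out sep dict[$i]
                sep = " "
            }
            print out
        }'
}

dict_encode "Ask not what your country can do for you -- ask what you can do for your country."
# prints 1 2 3 4 5 6 7 8 9 1 3 9 6 7 8 4 5
```

A real compressor would also have to store the dictionary itself alongside the numbers so that the text can be decoded later.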
Slide #5: Compressing Files: compress

compress [-c] [-d] [-v] file1 [file2 …]

-c  Send output to stdout.
-d  Decompress instead of compressing.
-v  Provide verbose output.
Slide #6: Compressing Files Old School

The compress command

compress [options] [file-list]
Slide #7: Uncompressing Files Old School

The uncompress command
Slide #8: Compressing Files: gzip

gzip [-#] [-c] [-d] [-l] [-v] file1 [file2 …]

-#  Specify compression level (1 to 9). Default=6.
-c  Send output to stdout.
-d  Decompress instead of compressing.
-l  List compression stats.
-v  Provide verbose output.
Slide #9: Compressing Files: gzip

    > man bash > bash.man
    > man tcsh > tcsh.man
    > ls -l *man
    -rw-r--r-- 1 waldenj Oct 4 19:48 bash.man
    -rw-r--r-- 1 waldenj Oct 4 19:48 tcsh.man
    > gzip *.man
    > ls -l *gz
    -rw-r--r-- 1 waldenj Oct 4 19:45 bash.man.gz
    -rw-r--r-- 1 waldenj Oct 4 19:45 tcsh.man.gz
    > gzip -l *gz
    compressed  uncompressed  ratio  uncompressed_name
    % bash.man
    % tcsh.man
    % (totals)
    >
Slide #10: Uncompressing Files: gunzip

    > gunzip bash.man.gz
    > ls -l *man *gz
    -rw-r--r-- 1 waldenj Oct 4 19:45 bash.man
    -rw-r--r-- 1 waldenj Oct 4 19:45 tcsh.man.gz
    > gzip -v bash.man
    bash.man: 73.3% -- replaced with bash.man.gz
    > gzip -dc bash.man.gz | less
    User Commands BASH(1)
    NAME
    bash - GNU Bourne-Again Shell
    …
    > ls -l *man *gz
    -rw-r--r-- 1 waldenj Oct 4 19:45 bash.man.gz
    -rw-r--r-- 1 waldenj Oct 4 19:45 tcsh.man.gz
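The sessions above replace each file with its compressed version. When the original should be kept, the -c and -dc options can be combined with redirection instead. The following is a small sketch under assumptions: sample.txt and its contents are invented for illustration, and mktemp is assumed to be available.

```shell
#!/bin/sh
# Work in a throwaway directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Build a highly redundant sample file (hypothetical name and contents).
i=0
while [ "$i" -lt 1000 ]; do
    echo "the quick brown fox jumps over the lazy dog"
    i=$((i + 1))
done > sample.txt

# -c writes the compressed data to stdout, so sample.txt is left in place.
gzip -c sample.txt > sample.txt.gz

# -dc decompresses to stdout; cmp confirms the round trip is lossless.
round_trip=failed
if gzip -dc sample.txt.gz | cmp -s - sample.txt; then
    round_trip=ok
fi
echo "round trip: $round_trip"

orig_size=$(wc -c < sample.txt)
gz_size=$(wc -c < sample.txt.gz)
echo "original: $orig_size bytes, compressed: $gz_size bytes"

# Clean up the throwaway directory.
cd / && rm -rf "$workdir"
```

Because the sample repeats one sentence a thousand times, the compressed file should be far smaller than the original. Ordinary text such as a manual page compresses less dramatically.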
Slide #11: Modern Compression: bzip2

bzip2 [-#] [-c] [-d] [-v] file1 [file2 …]

-#  Specify compression level (1 to 9). Default=9.
-c  Send output to stdout.
-d  Decompress instead of compressing.
-v  Provide verbose output.
Slide #12: Modern Compression: bzip2

    > bzip2 -v bash.man tcsh.man
    bash.man: 4.821:1, bits/byte, 79.26% saved, in, out.
    tcsh.man: 4.259:1, bits/byte, 76.52% saved, in, out.
    > ls -l *bz2
    -rw-r--r-- 1 waldenj Oct 4 19:45 bash.man.bz2
    -rw-r--r-- 1 waldenj Oct 4 19:48 tcsh.man.bz2
    > bzip2 -d bash.man.bz2
    > bunzip2 tcsh.man.bz2
    > ls -l *.man
    -rw-r--r-- 1 waldenj Oct 4 19:45 bash.man
    -rw-r--r-- 1 waldenj Oct 4 19:48 tcsh.man
    > bzip2 -dc bash.man.bz2 | less
    User Commands BASH(1)
    NAME
    bash - GNU Bourne-Again Shell
Slide #13: Displaying Compressed Files

zcat   Identical to compress -dc
gzcat  Identical to gzip -dc
bzcat  Identical to bzip2 -dc
Slide #14: Compression Benchmarks

    > ls -l patch*
    -rw-r--r-- 1 waldenj Oct 4 19:37 patch
    -rw-r--r-- 1 waldenj Oct 4 19:37 patch Z
    -rw-r--r-- 1 waldenj Oct 4 19:37 patch bz2
    -rw-r--r-- 1 waldenj Oct 4 19:37 patch gz

Compression Tool   Compression Ratio
compress           64.6%
gzip               78.5%
bzip2              82.7%
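The ratios in the table read as the percentage of space saved, that is (1 - compressed size / original size) * 100. The sketch below shows one way to compute that figure and to repeat the comparison on a generated file. The helper name percent_saved and the sample data are assumptions, mktemp is assumed to be available, and any tool that is not installed is skipped.

```shell
#!/bin/sh
# percent_saved ORIGINAL_BYTES COMPRESSED_BYTES (hypothetical helper)
# Prints the percentage of space saved, to one decimal place.
percent_saved() {
    awk -v o="$1" -v c="$2" 'BEGIN { printf "%.1f\n", (1 - c / o) * 100 }'
}

# Generate a moderately repetitive sample file (illustrative data only).
sample=$(mktemp)
awk 'BEGIN {
    for (i = 0; i < 2000; i++)
        print "line", i % 50, "of some fairly repetitive text"
}' > "$sample"
orig=$(wc -c < "$sample")

# Try each compressor from the slides; -c keeps the sample file intact.
for tool in compress gzip bzip2; do
    if command -v "$tool" >/dev/null 2>&1; then
        size=$("$tool" -c "$sample" | wc -c)
        echo "$tool: $(percent_saved "$orig" "$size")% saved"
    else
        echo "$tool: not installed, skipped"
    fi
done

rm -f "$sample"
```

The percentages printed depend entirely on the input, so they will not match the figures in the table, which were measured on a different file.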
Slide #15: Archiving Files: tar

tar [-c] [-t] [-x] [-v] [-f file.tar] file1 [file2 …]

-c  Create a new tape archive.
-f  Use the specified archive file instead of the tape device.
-t  List the archive's table of contents (view without extracting).
-v  Provide verbose output.
-x  eXtract archive contents.
Slide #16: Archiving Files: tar

    > tar -cvf manpages.tar *.man
    bash.man
    tcsh.man
    > ls -l manpages.tar
    -rw-r--r-- 1 waldenj Oct 4 21:01 manpages.tar
    > tar -tf manpages.tar
    bash.man
    tcsh.man
    > tar -tvf manpages.tar
    -rw-r--r-- waldenj/students :45 bash.man
    -rw-r--r-- waldenj/students :48 tcsh.man
    > mkdir tmp
    > cd tmp
    > tar -xvf ../manpages.tar
    bash.man
    tcsh.man
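tar only bundles files and does not compress them, so it is commonly paired with one of the compressors covered earlier. Here is a hedged sketch using only options from these slides. The file contents are placeholders, and giving "-" as the archive name (meaning stdout or stdin) is a widely supported tar convention rather than something the slides state.

```shell
#!/bin/sh
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Two small stand-in files (placeholder contents, not real manual pages).
echo "bash manual placeholder" > bash.man
echo "tcsh manual placeholder" > tcsh.man

# "-f -" makes tar write the archive to stdout, which gzip then compresses.
tar -cf - bash.man tcsh.man | gzip -c > manpages.tar.gz

# Reverse the pipeline to list the contents without unpacking anything.
listing=$(gzip -dc manpages.tar.gz | tar -tf -)
echo "$listing"

# Extract into a separate directory, as the slide does with tmp.
mkdir tmp
(cd tmp && gzip -dc ../manpages.tar.gz | tar -xf -)
extracted=$(ls tmp)
echo "$extracted"

# Clean up the throwaway directory.
cd / && rm -rf "$workdir"
```

Both echo commands should print bash.man and tcsh.man, one per line.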
Slide #17: Other File Compression Tools

PKzip/WinZip: zip, unzip
ARJ: arj, unarj
RAR: rar, unrar
Slide #18: Sorting

Sorting means ordering a set of items by some criterion.

Examples of sorted data include:
- Words in a dictionary.
- Names of people in a telephone directory.
- Numbers.
Slide #19: Sorting: sort

sort [-d] [-f] [-i] [-n] [-r] [-u] file1 [file2 …]

-d  Sort in dictionary order (compare only letters, digits, and blanks).
-f  Ignore case of letters.
-i  Ignore non-printable characters.
-n  Sort in numerical order.
-r  Reverse order of sort.
-u  Do not list duplicate lines in output.
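A quick way to see what the options do is to run them against the same small input. This sketch makes some assumptions: the word list is invented, one_line is a hypothetical helper that joins output lines for display, and LC_ALL=C is set so the results do not depend on the local language settings.

```shell
#!/bin/sh
# one_line (hypothetical helper): join lines of input with spaces.
one_line() {
    paste -s -d ' ' -
}

# Sample input: one capitalized word and one duplicate.
words=$(printf '%s\n' cherry apple Banana apple)

# LC_ALL=C forces plain byte ordering, where capitals sort before lowercase.
plain=$(printf '%s\n' "$words" | LC_ALL=C sort | one_line)
folded=$(printf '%s\n' "$words" | LC_ALL=C sort -f | one_line)
unique=$(printf '%s\n' "$words" | LC_ALL=C sort -u | one_line)
reversed=$(printf '%s\n' "$words" | LC_ALL=C sort -r | one_line)

echo "sort:    $plain"      # Banana apple apple cherry
echo "sort -f: $folded"     # apple apple Banana cherry
echo "sort -u: $unique"     # Banana apple cherry
echo "sort -r: $reversed"   # cherry apple apple Banana
```

Options can be combined, for example sort -fu to ignore case and drop duplicates in one pass.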
Slide #20: sort Example

    > cat days.txt
    Sunday
    Monday
    Tuesday
    Wednesday
    Thursday
    Friday
    Saturday
    > sort days.txt
    Friday
    Monday
    Saturday
    Sunday
    Thursday
    Tuesday
    Wednesday
Slide #21: sort Example

    > cat days.txt
    Sunday
    Monday
    Tuesday
    Wednesday
    Thursday
    Friday
    Saturday
    > sort -r days.txt
    Wednesday
    Tuesday
    Thursday
    Sunday
    Saturday
    Monday
    Friday
Slide #22: sort Example

    > cat numbers.txt
    > sort numbers.txt
    > sort -n numbers.txt
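The difference between the last two commands is that plain sort compares lines character by character, while sort -n compares them as numbers. The sketch below illustrates this with made-up values. The contents of numbers.txt here are an assumption, not the data from the slide.

```shell
#!/bin/sh
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical contents for numbers.txt.
printf '%s\n' 10 9 100 1 25 > numbers.txt

# Without -n, "10" and "100" sort before "25" and "9" because "1" < "2" < "9".
lexical=$(LC_ALL=C sort numbers.txt | paste -s -d ' ' -)

# With -n, lines are compared by numeric value.
numeric=$(LC_ALL=C sort -n numbers.txt | paste -s -d ' ' -)

echo "sort:    $lexical"   # 1 10 100 25 9
echo "sort -n: $numeric"   # 1 9 10 25 100

# Clean up the throwaway directory.
cd / && rm -rf "$workdir"
```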