Lane Medical Library & Knowledge Management Center
Essential UNIX Skills for Biologists
Yannick Pouliot, PhD
Bioresearch Informationist
Lane Medical Library & Knowledge Management Center
8/14/2008
The Bioresearch Informationist: At Your Service
Yannick Pouliot, PhD, Bioresearch Informationist, Lane Medical Library & Knowledge Management Center
Bioresearch Informationist ≈ computational biologist in residence
A Lane Library service, closely coordinated with CMGM
Role: support laboratory researchers, especially postdocs, with biocomputational resources and their use
Contact:
Goals
Deliver a basic understanding of core UNIX commands
Tips on running UNIX on Mac and Windows
But First: LaneConnex -- Your Key to Finding Resources Quickly
Why UNIX?
UNIX is good for:
1. performing operations with very few keystrokes
2. operating on large numbers of objects, e.g.:
   searching file contents very specifically
   renaming files
   moving/copying files
UNIX is fast
Linux (≈ UNIX) is free and runs on almost any computer
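A minimal sketch of "operating on large numbers of objects": renaming every .txt file in a folder to .tsv with a single loop. It assumes a POSIX shell (bash or zsh, as on Mac and Linux) and works in a throwaway directory created with mktemp -d; the sample file names are hypothetical.

```shell
# Work in a throwaway directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample files standing in for real data files.
touch sample1.txt sample2.txt sample3.txt

# For each .txt file, ${f%.txt} strips the .txt suffix; then add .tsv.
for f in *.txt; do
  mv -- "$f" "${f%.txt}.tsv"
done

ls    # sample1.tsv, sample2.tsv, sample3.tsv
```

The same loop renames three files or three thousand; only the pattern after `in` changes.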
UNIX Trip-Ups
UNIX is case-sensitive: ls ≠ Ls
What you type is what you get: no mistyping!
Mind those commands, e.g., rm -fr * deletes everything in and below the current directory, with no undo!
→ DON'T DO THIS AT HOME!
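One habit that makes "what you type is what you get" less dangerous: preview a destructive command with echo before running it. The shell expands the metacharacters first, so echo shows exactly which files rm would receive. A sketch, assuming a POSIX shell and hypothetical file names in a throwaway directory:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch keep.txt scratch1.tmp scratch2.tmp

# Dry run: echo prints the rm command after *.tmp has been expanded.
# Nothing is deleted yet.
preview=$(echo rm *.tmp)
echo "$preview"     # rm scratch1.tmp scratch2.tmp

# Only after checking the preview, run the real command.
rm *.tmp
ls                  # keep.txt
```

The same trick works for `rm -fr *`: `echo rm -fr *` lists everything that would be wiped out.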
So How Does One Access UNIX?
Mac: UNIX underlies the Mac's graphical interface
   access: Applications → Utilities → Terminal
Windows: must install additional software (more later)
Exploring UNIX
Key Concepts
UNIX is command-line based (no cute icons).
There are flavors of UNIX; Linux ≈ UNIX
"Shell" = command-line interface
   different shells exist (e.g., bash), all with similar basic functionality
Anything you can imagine, UNIX can do… but you may have to think about it…
In UNIX, anything can be done in at least three different ways…
UNIX has:
   commands (core, always present) → most of today's workshop
   utilities ≈ "super-commands", e.g., grep for parsing text; not part of the core set, but usually installed
Concept: Redirection ***
Redirection operators:
   “>” = send output to a file (overwrites the file)
   “>>” = append output to a file (doesn't overwrite)
   “<” = read input from a file
Applies to both input and output, e.g.:
   prog.exe < file.txt
   prog.exe > file.txt
   prog.exe < file.txt > file1.txt
   prog.exe >> file.txt
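A small sketch of the three redirection operators, using echo, wc and sort in place of prog.exe. It assumes a POSIX shell; genes.txt and sorted.txt are hypothetical names created in a throwaway directory.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# ">" writes a command's output to a file, replacing any previous contents.
echo "geneA" > genes.txt
echo "geneB" > genes.txt      # genes.txt now contains only geneB

# ">>" appends to the end of the file instead of overwriting it.
echo "geneC" >> genes.txt     # genes.txt now contains geneB, then geneC

# "<" feeds a file to a command's standard input; wc -l counts the lines.
wc -l < genes.txt             # 2 (some systems pad the number with spaces)

# Input and output redirection together: sort -r reads genes.txt,
# sorts it in reverse order, and the result goes to sorted.txt.
sort -r < genes.txt > sorted.txt
cat sorted.txt                # geneC, then geneB
```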
Concept: Metacharacters ***
“*” = 0 or more characters of any kind
“?” = exactly one character of any kind
Metacharacters can be used with nearly any other command, e.g.:
   ls file?.txt
   ls file*.txt
   ls *.*
   more *.txt
   grep omics *.txt
NB: There are lots of other kinds of metacharacters…
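To see what a metacharacter pattern matches before using it with another command, let the shell expand it and count the results. A sketch, assuming a POSIX shell and hypothetical file names; `set --` simply loads the expanded names into $1, $2, … so that $# gives their number.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch file1.txt file2.txt file10.txt notes.csv

# "?" matches exactly one character: file1.txt and file2.txt, not file10.txt.
set -- file?.txt
echo "file?.txt -> $# matches: $*"
one_char=$#

# "*" matches zero or more characters: all three file*.txt names.
set -- file*.txt
echo "file*.txt -> $# matches: $*"
any_chars=$#

# Quoting switches expansion off: the pattern is passed through literally.
echo 'file*.txt'
```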
Concept: Stringing Commands Together Using Pipes
“|” = pipe: sends the output of one command to the input of the next, e.g.:
   ls -1 | more
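A sketch of a three-stage pipe: list the files, keep only the .txt names, and count them. It assumes a POSIX shell and hypothetical file names in a throwaway directory.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch a.txt b.txt c.csv

# ls -1 prints one name per line; the pipe passes that list to grep,
# which keeps names ending in .txt; wc -l counts the surviving lines.
# tr -d ' ' strips the padding some versions of wc add.
txt_count=$(ls -1 | grep '\.txt$' | wc -l | tr -d ' ')
echo "$txt_count"     # 2

# Interactive use from the slide: page through a long listing.
# ls -1 | more
```

Each command does one small job; the pipe is what turns them into a single tool.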
Overview of Selected UNIX Commands
ls [options] [names] ****
Lists the contents of directories, including the directories themselves
Basically, lists files…
When names are provided, lists the files contained in a named directory or matching a file name. names can include filename metacharacters.
The options display information in different formats. The most useful options include -F, -R, -l, and -s.
Examples
1. list all details of all files in the current directory:
   ls -l
2. list just the filenames:
   ls
3. create a file that contains a list of the filenames, one per line:
   ls -1 > mylist.txt
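A sketch of the ls options above, assuming a POSIX shell and a hypothetical data/ folder of sequencing files. One subtlety of example 3: the shell creates mylist.txt before ls runs, so `ls -1 > mylist.txt` in the current directory lists mylist.txt itself. Listing a different directory avoids that.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir data
touch data/run1.fastq data/run2.fastq

ls -l data            # long format: permissions, owner, size, date, name
ls -F                 # -F marks directories with a trailing /, so this prints data/
ls -R                 # -R recurses into subdirectories

# One name per line, saved to a file; mylist.txt is not in data/,
# so it does not show up in its own list.
ls -1 data > mylist.txt
cat mylist.txt        # run1.fastq, then run2.fastq
```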
cat/more/head/tail → commands to look at the content of files
cat: returns everything
more: same, but one page at a time ****
head: returns the top x lines (default 10)
tail: returns the bottom x lines (default 10)
all can operate on multiple files
Examples
1. show contents of all .txt files:
   cat *.txt
2. show the first 100 lines of a file:
   head -n 100 file.txt
3. show the first 1000 lines of a file and paginate:
   head -n 1000 file.txt | more
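A sketch of head and tail with an explicit line count (-n), run on a hypothetical 20-line file built in a throwaway directory. It assumes a POSIX shell; the more step is left commented out because it waits for keyboard input.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Build a hypothetical 20-line file: line1, line2, ..., line20.
i=1
while [ "$i" -le 20 ]; do
  echo "line$i"
  i=$((i + 1))
done > file.txt

cat file.txt                  # all 20 lines
top=$(head -n 3 file.txt)     # line1, line2, line3
bottom=$(tail -n 2 file.txt)  # line19, line20
echo "$top"
echo "$bottom"

# Paginate the first 15 lines (interactive; press space to advance).
# head -n 15 file.txt | more
```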
grep: Searching File Contents Using Regular Expressions ****
grep [options] pattern [files]
Searches files for the presence of a string, e.g.:
   grep protein *.txt
about a million options…
Also searches using regular expressions
Definition: a pattern that describes the characteristics of one or more strings, e.g.:
   te?xt (matches "text" or "txt"; needs grep -E)
   .*omics (any characters followed by "omics")
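A sketch of grep with a plain string, with the two regular expressions from the slide, and with two common options. It assumes a POSIX shell and a hypothetical notes.txt; patterns are quoted so the shell does not treat ? or * as filename metacharacters.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' 'genomics core' 'proteomics lab' 'text mining' 'txt files' 'protein folding' > notes.txt

# Plain string search.
grep 'protein' notes.txt            # protein folding

# -E turns on extended regular expressions, where ? makes the preceding
# character optional: te?xt matches both "text" and "txt".
grep -E 'te?xt' notes.txt           # text mining, txt files

# .* means "any characters", so this matches any line containing "omics";
# -c prints the number of matching lines instead of the lines themselves.
grep -c '.*omics' notes.txt         # 2

# -i ignores case; -l lists only the names of files that match.
grep -il 'PROTEIN' notes.txt        # notes.txt
```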
find [pathnames] [conditions] ***
Very powerful: can specify anything, including exclusions and negations
Descends the directory tree beginning at each pathname and locates files that meet the specified conditions. The default pathname is the current directory (GNU find on Linux; on Mac, give . explicitly).
Most useful conditions are -name and -type (for general use)
Examples
1. List all files named chapter1 in the /work directory and its subdirectories:
   find /work -name chapter1
2. Look for filenames in the current directory and below that don't begin with a capital letter:
   find . ! -name '[A-Z]*'
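A sketch of the two find examples, run on a hypothetical work/ tree in a throwaway directory and assuming a POSIX shell. The second command uses the POSIX class [[:upper:]] instead of [A-Z]: both mean "a capital letter", but [[:upper:]] does not depend on how the current locale orders letters. find does not sort its output, so the order of results may vary.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir -p work/sub
touch work/chapter1 work/sub/chapter1 work/Notes.txt work/data.txt

# Every file named chapter1 anywhere below work/ (find descends into sub/):
# work/chapter1 and work/sub/chapter1.
find work -name chapter1

# Regular files (-type f) whose names do NOT (!) start with a capital letter:
# everything except work/Notes.txt.
find work -type f ! -name '[[:upper:]]*'
```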
UNIX on Windows
Easy: UnxUtils = UNIX "light"
   Excellent for most tasks
   Not a complete emulation of UNIX
Hard: Cygwin
   difficult to make it behave perfectly
   can run in parallel with Windows
Easier: dual boot
   ability to boot either Windows or Linux
   requires a reboot…
Resources
UNIX commands: ommands
Another list of UNIX utilities:
Everything You Need to Know About UNIX in Short Form: eBooks from Lane
The ultimate quick reference for Linux
More than you typically need, but you can zoom in on what you need