Published by Osborn Miller. Modified over 9 years ago.
1
Shell Scripting Basics Arun Sethuraman
2
What’s a shell?
Command line interpreter for Unix
Bourne (sh), Bourne-again (bash), C shell (csh, tcsh), etc.
Handful of commands
Text mining made easy!
3
Before we get started
Unix/Mac Users: Open a terminal
Windows Users: Should have installed VMware Player, and downloaded the virtual machine with Unix pre-loaded on it (else do it now!)
4
VMware Player Basics
Allows creating/playing virtual machines
We will use a standalone version of GNU/Linux called SliTaz, which is very minimalist (< 40 MB), but should work for all our exercises.
Download all example files from my website, www.sites.google.com/site/arunsethuraman1/teaching, instead of from Blackboard.
Save state of virtual machine, suspend, restart, etc.
Switch environments using CTRL+ALT
File sharing is a little complicated, so before you submit your assignment for next week, VMware users please email me and stop by my office with your laptop to submit it (unless you can get Gmail to work without any glitches inside Midori).
7
Working at the prompt
The ‘prompt’ refers to Unix’s native command line interface.
Your prompt should look something like: username@prompt:~$
Prompt commands are similar to Python scripts: you can specify variables, run one-liner commands, specify entire program flows, etc.
8
Unix 101
Try: man, ls, pwd, clear, Ctrl+C, echo, ps, cat, tail, head, cd, mkdir, rm, cp, mv, cal, kill, vi/vim, find, set, who
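A few of these commands can be tried safely in a scratch directory. The sketch below is only an illustration: the directory and file names (notes, a.txt, c.txt) are made up, and everything lives under a temporary directory created with mktemp.

```shell
# Work in a throwaway directory so nothing important is touched.
workdir=$(mktemp -d)

mkdir "$workdir/notes"                          # make a new directory
echo "hello shell" > "$workdir/notes/a.txt"     # create a small file
cp "$workdir/notes/a.txt" "$workdir/notes/b.txt"  # copy it
mv "$workdir/notes/b.txt" "$workdir/notes/c.txt"  # rename (move) the copy
first_line=$(head -n 1 "$workdir/notes/c.txt")  # read the first line of the copy
rm "$workdir/notes/a.txt"                       # delete the original
remaining=$(ls "$workdir/notes")                # list what is left

echo "$first_line"
echo "$remaining"
```

Use man followed by a command name (for example, man cp) to read about the flags of any command in the list.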
9
Piping
Piping (|) chains commands so that the output of one command becomes the input of the next. For example, to read a file and print only its last line, try:
cat example1.txt | tail -n 1
ls | grep "exam"
cat example4.txt | head
Important: each piped command works only on the output of the previous command!
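A minimal sketch of a pipeline, assuming a small made-up file in place of example1.txt (the file contents and the variable names are hypothetical):

```shell
# Hypothetical sample file standing in for example1.txt.
sample=$(mktemp)
printf 'first line\nexam notes\nlast line\n' > "$sample"

# cat writes the file to standard output; tail reads that stream
# and keeps only the last line.
last=$(cat "$sample" | tail -n 1)

# Pipes can be chained: grep filters the stream, wc -l counts what is left.
matches=$(cat "$sample" | grep "exam" | wc -l)

echo "$last"
echo "$matches"
rm -f "$sample"
```

Each stage sees only what the stage before it printed, which is why the order of the commands matters.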
10
Regular Expressions
Describe a pattern (sequence of characters)
[A-Z]*, [a-z]*
[0-9]*, [0-9]\{n\}
Escape (special) characters: start with \
^ : start of a line
$ : end of a line
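The anchors and the \{n\} repeat count can be tried with grep on a few made-up lines. This is a sketch under that assumption; the sample text is hypothetical.

```shell
# Four hypothetical input lines.
lines=$(printf 'Poe 1845\n1845 Poe\nraven\n12\n')

# ^ anchors the pattern to the start of a line.
starts_with_digit=$(printf '%s\n' "$lines" | grep '^[0-9]')

# $ anchors the pattern to the end of a line.
ends_with_digit=$(printf '%s\n' "$lines" | grep '[0-9]$')

# \{4\} asks for exactly four repetitions of the preceding item.
four_digits=$(printf '%s\n' "$lines" | grep '[0-9]\{4\}')

echo "$starts_with_digit"
echo "$ends_with_digit"
echo "$four_digits"
```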
11
Examples
E.g. {bicycle, bidirectional, biology, binary, bigotry, bill, big, bin, bionic, …}
E.g. {Sunday, Monday, …, Saturday}
E.g. {121, 123, 124, …, 129}
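The patterns that went with these sets are not preserved on the slide, so the ones below are guesses at patterns that would describe them: ^bi for words starting with "bi", day$ for the day names, and ^12[13-9]$ for the listed numbers (which skip 122). The input words are a hypothetical sample.

```shell
words=$(printf 'bicycle\nbiology\nrabbit\nbin\n')
days=$(printf 'Sunday\nMonday\ndaylight\n')
nums=$(printf '121\n122\n123\n129\n130\n')

# grep -c prints the number of matching lines.
bi_words=$(printf '%s\n' "$words" | grep -c '^bi')        # "rabbit" has bi, but not at the start
day_names=$(printf '%s\n' "$days" | grep -c 'day$')       # "daylight" has day, but not at the end
in_range=$(printf '%s\n' "$nums" | grep -c '^12[13-9]$')  # 12 followed by 1 or by 3 to 9

echo "$bi_words $day_names $in_range"
```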
12
Examples
TATAAA: TATA box, 25 bases upstream of transcription start site
Telomeric repeat: (TTAGGG)n
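Both motifs can be written as grep patterns. The sketch below assumes two short made-up sequences and uses grep -o, which prints only the matched text; -o is available in GNU and BSD grep but is not part of POSIX.

```shell
# Hypothetical sequences, not real genomic data.
seq1="GGCTATAAAGGC"
seq2="ACGTTAGGGTTAGGGTTAGGGACG"

# A fixed motif is just a literal pattern.
has_tata=$(printf '%s\n' "$seq1" | grep -c 'TATAAA')

# \( \) groups the unit; \{2,\} asks for two or more copies in a row.
repeat_run=$(printf '%s\n' "$seq2" | grep -o '\(TTAGGG\)\{2,\}')

echo "$has_tata"
echo "$repeat_run"
```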
13
Example 1 – grep
Syntax: grep 'pattern'
Create a new directory.
Copy file "example1.txt" from /usr/home/shellbasics to your folder
Explore contents of the file using cat/head/tail/vi
Explore grep: copy the first line of the file into another file (use the -n flag)
Copy the 14th line/last line/last 4 lines into another file
Look for the word "Poe" in example1.txt, paste all instances into another file (name it )
Look for all numbers in the file. What's wrong?
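One way to approach these steps, sketched on a five-line made-up file rather than the real example1.txt (its contents are not shown on the slide). The output file names are hypothetical. The last two commands show one likely pitfall with numbers: * also allows zero repetitions, so [0-9]* matches every line.

```shell
dir=$(mktemp -d)
sample="$dir/sample.txt"   # hypothetical stand-in for example1.txt
printf '%s\n' 'The Raven' 'by Edgar Allan Poe' 'published 1845' \
  'Poe wrote it' 'in 18 stanzas' > "$sample"

head -n 1 "$sample" > "$dir/first.txt"     # first line
sed -n '3p' "$sample" > "$dir/third.txt"   # one line by number (here the 3rd)
tail -n 2 "$sample" > "$dir/last2.txt"     # last two lines
grep 'Poe' "$sample" > "$dir/poe.txt"      # every line containing "Poe"

# grep -n prefixes each matching line with its line number.
poe_numbered=$(grep -n 'Poe' "$sample")

# [0-9]* matches the empty string too, so all five lines count as matches.
any_line=$(grep -c '[0-9]*' "$sample")
# Requiring at least one digit gives the intended result.
digit_lines=$(grep -c '[0-9][0-9]*' "$sample")

echo "$poe_numbered"
echo "$any_line $digit_lines"
```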
14
Example 2 – sed
Syntax: sed 's/ / /g'
Stream Editor: substituting text
Substitute all words that are "old" with "new" in example1.txt.
Substitute all "a" with "A", and all "b" with "B" in one line.
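A minimal sketch of both substitutions on made-up input strings. Note that s/old/new/g also rewrites "old" inside longer words such as "bold"; the sketch does not try to handle that.

```shell
text="old habits, old friends"

# s/old/new/g replaces every occurrence on each line (g = global).
replaced=$(printf '%s\n' "$text" | sed 's/old/new/g')

# Several substitutions can be given in one sed call with repeated -e.
two_subs=$(printf 'a cab\n' | sed -e 's/a/A/g' -e 's/b/B/g')

echo "$replaced"
echo "$two_subs"
```

To work on a file instead of a string, put the file name after the sed expression and redirect the output to a new file with >.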
15
Example 3 – Your first shell script!
Copy example3.sh to your folder. Explore its contents:
#!/bin/sh
sed '
s/a/A/g
s/b/B/g
' example2.txt > example3.txt
Execute this script using ./example3.sh
Oops, what happened here?
16
Permissions in Unix
Unix has three permission/file access modes for all files: read (r), write (w), and execute (x).
Need to specify permissions explicitly for executables.
Try chmod +x example3.sh, then try ./example3.sh
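The effect of chmod +x can be seen in the permission string printed by ls -l. The sketch below uses a made-up script (hello.sh, not one of the course files) in a temporary directory, and assumes that directory allows running programs.

```shell
dir=$(mktemp -d)
script="$dir/hello.sh"            # hypothetical script, not a course file
printf '#!/bin/sh\necho "script ran"\n' > "$script"

# The first 10 characters of ls -l show the file type and the rwx bits.
mode_before=$(ls -l "$script" | cut -c1-10)   # new files normally have no x
chmod +x "$script"                            # add execute permission
mode_after=$(ls -l "$script" | cut -c1-10)    # x now appears

echo "$mode_before -> $mode_after"
"$script"                                     # the script can now be run
```

Without the execute bit, the shell refuses to run the file, which is the usual cause of a "Permission denied" message.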
17
Example 3 – contd.
Add commands to the script to change all lowercase letters to capital letters in example2.txt and save the result as a new file, example3.txt
Execute it in the command line.
Write a script to find all numbers and replace them with "[ref]".
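A sketch of both tasks on a made-up line of text. The uppercase step uses tr, which is a standard utility but not one of the commands introduced on these slides, so treat it as one possible approach rather than the intended one.

```shell
text="see page 12 and note 345"

# tr translates characters one for one: every lowercase letter to uppercase.
upper=$(printf '%s\n' "$text" | tr '[:lower:]' '[:upper:]')

# [0-9][0-9]* matches a run of one or more digits; each run becomes [ref].
refs=$(printf '%s\n' "$text" | sed 's/[0-9][0-9]*/[ref]/g')

echo "$upper"
echo "$refs"
```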
18
Example 4 – awk
Syntax: awk '{ }'
Used to mine column formatted data. Columns denoted by $
Copy example4.txt to your folder
Use awk to print only the third column of the file and save it to
Use awk to print the 4th and 5th columns, separated by a tab character, to a new file
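A minimal sketch of column selection, assuming whitespace-separated columns. The two data rows are made up and stand in for example4.txt, whose layout is not shown on the slide.

```shell
# Hypothetical five-column data.
data=$(printf 'g1 chr1 100 200 +\ng2 chr2 300 400 -\n')

# $3 is the third column of each line.
third=$(printf '%s\n' "$data" | awk '{ print $3 }')

# Strings placed next to each other are concatenated; "\t" is a tab.
fourth_fifth=$(printf '%s\n' "$data" | awk '{ print $4 "\t" $5 }')

echo "$third"
echo "$fourth_fifth"
```

With a real file, give the file name after the awk program and redirect the output to a new file with >.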
19
Example 5 – a FASTA file
Copy example5.fasta from /usr/home to your folder
Explore its contents. What is the FASTA file format? What does it contain? Do you see a pattern?
Now use any of the commands we just learned to extract only the gene-ID from the FASTA file. Print it.
Count the number of "AC" repeats, save to a file
Save only the first 5 lines in example5.fasta to
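In FASTA, each record starts with a header line beginning with ">", followed by sequence lines. The sketch below works on a tiny made-up file, assumes the gene ID is the first word of the header, and counts every occurrence of "AC" within sequence lines (occurrences split across a line break are not counted).

```shell
fasta=$(mktemp)   # hypothetical stand-in for example5.fasta
printf '>gene001 hypothetical protein\nACACGTAC\nGGACTTAC\n>gene002 another record\nTTACACAC\n' > "$fasta"

# Header lines start with ">"; strip it and keep the first word.
ids=$(grep '^>' "$fasta" | sed 's/^>//' | awk '{ print $1 }')

# Drop headers, print each AC match on its own line (-o), then count lines.
ac_count=$(grep -v '^>' "$fasta" | grep -o 'AC' | wc -l)

# head -n N keeps the first N lines.
first_two=$(head -n 2 "$fasta")

echo "$ids"
echo "$ac_count"
rm -f "$fasta"
```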
20
Example 6 – Executing commands in Shell
What is BLAST?
Write a shell script to:
BLAST against all nucleotide BLAST databases.
Save output of BLAST to a separate file – call it
What hits do you get?
Explore the BLAST output, pull out only gene IDs for all your hits with 'e' value = 0.0 and with GenBank accessions (gb), and save them to a new file
HINT: You'll notice that there are multiple IDs, separated by "|". To tell awk to use this as a delimiter, use awk 'BEGIN { FS="|" }; …'
HINT: To sort a list, use the "sort" command
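The two hints can be combined as below. This sketch does not run BLAST; it parses four made-up hit lines in a simplified layout (IDs separated by "|", e-value as the last field), which is an assumption and may differ from the real BLAST output you get.

```shell
# Hypothetical, simplified hit lines; real BLAST output has more columns.
hits=$(printf '%s\n' \
  'gi|111|gb|AB000002.1| hypothetical hit B 0.0' \
  'gi|222|emb|X00001.1| hypothetical hit C 0.0' \
  'gi|333|gb|AB000001.1| hypothetical hit A 0.0' \
  'gi|444|gb|AB000003.1| hypothetical hit D 2e-10')

# 1. keep lines whose last field is 0.0
# 2. split on "|" and print field 4 when field 3 is "gb"
# 3. sort the resulting accessions
ids=$(printf '%s\n' "$hits" \
  | grep ' 0\.0$' \
  | awk 'BEGIN { FS = "|" } $3 == "gb" { print $4 }' \
  | sort)

echo "$ids"
```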
22
Example 7 – Advanced scripts (Assignment)
Write a Python script to pull all gene IDs from, look for these gene IDs against NCBI and obtain all hits, save it to a file.
Execute this Python script, then use a shell script to parse out only the protein IDs (gene/protein= values) from its output into a separate file.
Copy all these protein IDs (they should be GenBank accession IDs), paste them into the query at www.pantherdb.org, select all species on the list, and add PANTHER-GO-Slim Biological Process to your columns.
24
Assignment (contd.)
Save the output of PANTHER as a file.
Now parse this file using grep/sed/awk to print only the GO terms, which should be separated by ;
Make a unique list of these GO terms by using the 'uniq' command, and save it to a final assignment submission file.
HINT: Prior to pulling unique values, try replacing the ";" values with something else, say a newline character "\n".
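A sketch of the hint, using a made-up string of GO terms rather than real PANTHER output. It uses tr to turn each ";" into a newline, which is one portable way to do the replacement. uniq only removes duplicates that are next to each other, so the list is sorted first.

```shell
# Hypothetical GO terms separated by ";".
go_terms="metabolic process;transport;metabolic process;cell cycle;transport"

# One term per line, sorted so duplicates are adjacent, then de-duplicated.
unique_terms=$(printf '%s\n' "$go_terms" | tr ';' '\n' | sort | uniq)

echo "$unique_terms"
```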