Bioinformatics: Theory and Practice, 唐继军 (Tang Jijun), 13928761660

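Homework
GTTGCAGCAATGGTAGACTCAACGGTAGCAAT AACTGCAGGACCTAGAGGAAAAACAGTAGGG ATTAATAAGCCCTATGGAGCACCAGAAATTAC AAAAGATGGTTATAAGGTGATGAAGGGTATC AAGCCTGAA
Why does a BLAST search with the default settings return no results for this sequence? Which settings should you choose instead? What are the most recent PubMed articles on the related species?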

Working with Directories
Directories are a means of organizing your files on a Linux computer. They are equivalent to folders on Windows and Macintosh computers. Directories contain files, executable programs, and sub-directories. Understanding how to use directories is crucial to manipulating your files on a Linux system.

File & Directory Commands
This is a minimal list of Linux commands that you must know for file management: ls (list), mkdir (make directory), cd (change directory), pwd (print working directory), cp (copy), rm (remove), mv (move), more (view by page), cat (view entire file), man (help). All of these commands can be modified with many options. Learn to use the Linux man pages for more information.

Navigation
pwd (print working directory) shows the name and location of the directory where you are currently working:
> pwd
/home/jtang
This is a "pathname": the slashes separate the levels of sub-directories, and the initial slash is the "root" of the whole filesystem.
ls (list) gives you a list of the files in the current directory:
> ls
assembin4.fasta  Misc  test2.txt  bin  temp  testfile
Use the ls -l (long) option to get more information about each file:
> ls -l
total 1768
drwxr-x---  2 browns02 users 8192 Aug 28 18:26 Opioid
...

Sub-directories
cd (change directory) moves you to another directory:
> cd Misc
> pwd
/u/browns02/Misc
mkdir (make directory) creates a new sub-directory inside the current directory:
> ls
assembler  phrap  space
> mkdir subdir
> ls
assembler  phrap  space  subdir
rmdir (remove directory) deletes a sub-directory, but the sub-directory must be empty:
> rmdir subdir
> ls
assembler  phrap  space
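The same steps can also be done from a Perl script. This is a minimal sketch and is not part of the original slides. It uses the core Cwd and File::Temp modules and Perl's built-in chdir, mkdir, opendir and rmdir. All the work happens inside a throwaway temporary directory, so the example name `subdir` cannot clash with your own files.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Cwd qw(getcwd);            # getcwd() reports the current directory, like pwd
use File::Temp qw(tempdir);    # scratch directory so the demo does not touch your files

# Hypothetical helper: return the visible entries of a directory, sorted, roughly like ls
sub list_dir {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "Cannot open $dir: $!";
    my @names = sort grep { !/^\./ } readdir($dh);   # skip ., .. and hidden files
    closedir($dh);
    return @names;
}

my $start = getcwd();                                 # pwd
my $work  = tempdir(CLEANUP => 1);                    # removed automatically at exit
chdir $work or die "Cannot cd to $work: $!";          # cd
mkdir 'subdir' or die "Cannot create subdir: $!";     # mkdir subdir
print join(' ', list_dir('.')), "\n";                 # ls, prints: subdir
rmdir 'subdir' or die "Cannot remove subdir: $!";     # rmdir subdir (must be empty)
chdir $start or die "Cannot cd back to $start: $!";   # return to where we started
```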

Create new files
Use a text editor such as nano, vi/vim or emacs.

Programming
Common languages: Perl, Python, C/C++, R, Java.

more
Use the command more to view the contents of a file one screen at a time:
> more t27054_cel.pep
!!AA_SEQUENCE 1.0
P1;T hypothetical protein Y49E Caenorhabditis elegans
Length: 534  May 30, ... Type: P  Check: ...
1   MLKKAPCLFG SAIILGLLLA AAGVLLLIGI PIDRIVNRQV IDQDFLGYTR
51  DENGTEVPNA MTKSWLKPLY AMQLNIWMFN VTNVDGILKR HEKPNLHEIG
101 PFVFDEVQEK VYHRFADNDT RVFYKNQKLY HFNKNASCPT CHLDMKVTIP
t27054_cel.pep (87%)
Hit the spacebar to page down through the file; press b to move back up a page. At the bottom of the screen, more shows how much of the file has been displayed. A similar command is less.

Copy & Move
cp lets you copy a file from any directory to any other directory, or create a copy of a file with a new name in the same directory:
cp filename.ext newfilename.ext
cp filename.ext subdir/newname.ext
cp /u/jdoe01/filename.ext ./subdir/newfilename.ext
mv moves files to other directories, and it is also used to rename files. The filename and directory syntax for mv is exactly the same as for cp:
mv filename.ext subdir/newfilename.ext
NOTE: When you use mv to move a file into another directory, the file no longer exists in its original location. Unlike cp, mv does not leave a copy behind.
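Here is a minimal Perl sketch of the same two operations. It assumes the core File::Copy module, whose copy and move functions behave like cp and mv. The file names mirror the examples above. They are created in a temporary directory purely for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Copy qw(copy move);   # core module: copy() is like cp, move() is like mv
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);          # scratch directory for the demo

# Create a small example file (the name is only an illustration)
open(my $out, '>', "$dir/filename.ext") or die "Cannot write: $!";
print $out "ACGT\n";
close $out;

mkdir "$dir/subdir" or die "Cannot create subdir: $!";

# cp filename.ext newfilename.ext   (the original stays where it is)
copy("$dir/filename.ext", "$dir/newfilename.ext") or die "Copy failed: $!";

# mv filename.ext subdir/newfilename.ext   (the original name disappears)
move("$dir/filename.ext", "$dir/subdir/newfilename.ext") or die "Move failed: $!";

my $status = -e "$dir/filename.ext" ? "original still there" : "original moved away";
print "$status\n";
```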

Delete
Use the command rm (remove) to delete files. There is no way to undo this command! We have set the server to ask whether you really want to remove each file before it is deleted. You must answer "Y" or the file is not deleted. The -f (force) option skips this question. rm -rf deletes a directory and everything inside it without asking, so use it with great care.

View File Permissions
Use the ls -l command to see the permissions for all files in a directory:
$ ls -l
total 2
-rw-r--r-- 1 jtang None 56 Feb 29 11:21 data.txt
-rwxr-xr-x 1 jtang None 33 Feb 29 11:21 test.pl
The username of the owner is shown in the third column (the owner of the files listed above is jtang), and the owner's group is shown in the fourth column (here the group is "None"). The access rights for these files are shown in the first column. This column consists of 10 characters known as the attributes of the file. The first character is the file type: - for an ordinary file, d for a directory. The remaining nine characters form three groups of r, w, x or -, one group each for the owner, the group, and everyone else:
r indicates read permission
w indicates write permission (for a directory, permission to create and delete files in it)
x indicates execute (run) permission
- indicates no permission for that operation
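Permissions can also be read from inside a Perl script. The sketch below is an illustration and does not come from the slides. The helper name `permissions` is hypothetical. It relies on the built-in stat function, whose element at index 2 holds the file mode.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: return the permission bits of a file as an octal string like "0644".
# (stat $file)[2] is the full mode; "& 07777" keeps only the permission bits.
sub permissions {
    my ($file) = @_;
    my @info = stat($file) or die "Cannot stat $file: $!";
    return sprintf("%04o", $info[2] & 07777);
}

my (undef, $file) = tempfile(UNLINK => 1);   # example file for the demo
chmod 0644, $file;                           # same effect as the shell command: chmod 644 file
print "$file has permissions ", permissions($file), "\n";
```

Perl's own chmod takes a number, so the shell's 644 is written 0644 here. The leading zero marks an octal literal.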

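Change Protections
Only the owner of a file (or the system administrator) can change its protections. To change the protections on a file, use the chmod (change mode) command. [Beware, this is a confusing command.] For example:
> chmod 644 data.txt
Each digit is the sum of read (4), write (2) and execute (1). The digits are given in the order owner, group, everyone else. So 644 gives the owner read and write permission, and gives the group and everyone else read permission only. Other common settings are 600 (owner read/write only), 755 (owner read/write/execute, everyone else read/execute, typical for scripts) and 700 (owner read/write/execute only).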
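This small sketch shows how the three digits map onto the rwx letters. `rwx_string` is a hypothetical helper written for this illustration. It takes the digits as you would type them for the shell (e.g. "644"), not a Perl octal literal.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: turn a chmod number such as "644" into the letters shown by ls -l.
# Each digit is read (4) + write (2) + execute (1), for owner, group and others in turn.
sub rwx_string {
    my ($digits) = @_;
    my $result = '';
    for my $digit (map { int } split //, $digits) {
        $result .= ($digit & 4) ? 'r' : '-';
        $result .= ($digit & 2) ? 'w' : '-';
        $result .= ($digit & 1) ? 'x' : '-';
    }
    return $result;
}

for my $mode (qw(644 600 755 700)) {
    printf "chmod %s -> %s\n", $mode, rwx_string($mode);
}
```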

Commands for Files
Files are used to store information, for example data or the results of some analysis. You will mostly deal with text files. Files on the RCR Alpha are automatically backed up to tape every night. cat dumps the entire contents of a file onto the screen. For a long file this can be annoying, but it can also be helpful if you want to copy and paste (use the copy buffer of your telnet/terminal program).

FTP/SCP
FTP (File Transfer Protocol) and SCP (secure copy) are standard ways to move files between computers on a network. They are the best way to move lots of data to and from remote machines: put raw data onto the server for analysis, then get results back to the desktop for use in papers and grants. Graphical file transfer applications for desktop PCs: on a Mac, use Fetch or Cyberduck; on a Windows PC, use WS_FTP, FileZilla or WinSCP.

Some More Advanced Linux Commands
grep: searches a file for a specific text pattern
cut: extracts one or more columns from a tab-delimited text file
wc: word count (counts lines, words and characters)
| : the pipe, sends the output of one command as input to the next
> : redirects output to a file
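Perl can do similar jobs inside a program. This sketch is only an illustration. `word_count` is a hypothetical helper that approximates what wc reports for a string. It counts characters, whereas wc -c counts bytes. Perl's built-in grep function filters a list in much the same way the grep command filters lines.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: count lines, words and characters in a string, roughly like wc.
sub word_count {
    my ($text) = @_;
    my $lines = () = $text =~ /\n/g;    # number of newline characters
    my @words = split ' ', $text;       # split on runs of whitespace
    return ($lines, scalar(@words), length $text);
}

my $text = ">seq1 demo\nACGT\nTTGA\n";
my ($l, $w, $c) = word_count($text);
print "$l $w $c\n";                                # prints "3 4 21"

# Like: grep '>' file | wc -l  (keep only the lines that start with ">")
my @headers = grep { /^>/ } split /\n/, $text;
print scalar(@headers), " header line(s)\n";      # prints "1 header line(s)"
```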

Perl

Why Write Programs?
Automate computer work that you do by hand: save time and reduce errors.
Run the same analysis on lots of similar data files (scale-up).
Analyze data and make decisions, e.g. sort BLAST results by e-value and/or species of best match.
Build a pipeline.
Create new analysis methods.

Why Perl?
Fairly easy to learn the basics.
Many powerful functions for working with text: search and extract, modify, combine.
Can control other programs.
Free and available for all operating systems.
Long one of the most popular languages in bioinformatics.
Many pre-built "modules" are available that do useful things.

Get Perl
You can install Perl on any type of computer. Download and install Perl on your own computer:

Programming Concepts
Program = a text file that contains instructions for the computer to follow
Programming language = a set of commands that the computer understands (via a "command interpreter")
Input = data that is given to the program
Output = something that is produced by the program

Programming
Write the program (with a text editor), run the program, look at the output, correct the errors (debugging), and repeat. (Computers are VERY dumb: they do exactly what you tell them to do, so be careful what you ask for.)

Basic Concepts
Variables and assignment
Conditions
Loops
Input/Output (I/O)
Procedures/functions

Strings
Text is handled in Perl as a string. This basically means that you have to put quotes around any piece of text that is not an actual Perl instruction. Perl has two kinds of quotes, single ' and double ", and they behave differently. Text in single quotes is printed exactly as written. In double quotes, variables (such as $DNA) and escapes (such as \n) are replaced by their values.

Print
Perl uses the print function to create output. Without a print statement, you won't know what your program has done. Perl does not end a printed line automatically, so you need to tell it to add a line break. Use "\n" (newline) inside double quotes. The "\" character is called an escape, and Perl uses it a lot.

Your First Perl Program
Open a new text file:
> nano prog1.pl
Type:
#!/usr/bin/perl
# my first perl program
print "Hello world\n";

Program details
Perl programs usually start with the line:
#!/usr/bin/perl
This line (the "shebang") tells the computer that this is a Perl program and where to find the Perl interpreter. All other lines that start with # are comments, and Perl ignores them. Perl statements end with a semicolon (;).

Run your Perl program
> perl prog1.pl      # use the perl interpreter to run your script
> chmod 755 *.pl     # make the file executable
> ./prog1.pl         # run it directly

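#!/usr/bin/perl
$DNA = 'ACGT';
# Next, we print the DNA onto the screen
print $DNA, "\n";    # prints ACGT
print '$DNA\n';      # single quotes: prints the characters $DNA\n literally, with no newline
print "$DNA\n";      # double quotes: $DNA and \n are interpolated, so this prints ACGT
exit;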

Numbers and Functions
Perl handles numbers in most common formats, including scientific notation (...E-26). Arithmetic works pretty much as you would expect: 4+7, 6*..., .../12, 2/(3-5)
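Here is a short sketch of Perl arithmetic. The values are arbitrary examples. 6.3E-26 simply stands in for the kind of tiny number, such as a BLAST e-value, that scientific notation is used for.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $count    = 25;        # integer
my $fraction = 0.5;       # decimal
my $evalue   = 6.3E-26;   # scientific notation: 6.3 x 10^-26 (example value)

print 4 + 7, "\n";              # 11
print 2 / (3 - 5), "\n";        # -1  (the parentheses are evaluated first)
print 7 / 2, "\n";              # 3.5 (division does not truncate to an integer)
print $count * $fraction, "\n"; # 12.5
print $evalue, "\n";            # 6.3e-26
```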

Do the Math (your 2nd Perl program)
#!/usr/bin/perl
print "4+5\n";          # prints the text 4+5
print 4+5, "\n";        # prints 9
print "4+5=", 4+5, "\n";  # prints 4+5=9
[Note: use commas to separate multiple items in a print statement; whitespace outside quotes is ignored]

Variables
To be useful at all, a program needs to be able to store information from one line to the next. Perl stores information in variables. A scalar variable name starts with the "$" symbol, and it can store a string or a number. Variable names are case sensitive, so give them sensible names. Use the "=" sign to assign values to variables:
$one_hundred = 100;
$my_sequence = "ttattagcc";

You can do Math with Variables
#!/usr/bin/perl
# put some values in variables
$sequences_analyzed = 200;
$new_sequences = 21;
# now we will do the work
$percent_new_sequences = ($new_sequences / $sequences_analyzed) * 100;
print "% of new sequences = ", $percent_new_sequences, "\n";
Output:
% of new sequences = 10.5

String Operations
Strings (text) in variables can be used for some math-like operations. To concatenate (join) strings, use the dot (.) operator:
$seq1 = "ACTG";
$seq2 = "GGCTA";
$seq3 = $seq1 . $seq2;
print $seq3;
Output: ACTGGGCTA

#!/usr/bin/perl
# Storing DNA in a variable, and printing it out
# First we store the DNA in a variable called $DNA
$DNA = 'ACGGGAGGACGGGAAAATTACTACGGCATTAGC';
# Next, we print the DNA onto the screen
print $DNA;
# Finally, we'll specifically tell the program to exit.
exit;

#!/usr/bin/perl -w
$DNA1 = 'ACGGGAGGACGGGAAAATTACTACGGCATTAGC';
$DNA2 = 'ATAGTGCCGTGAGAGTGATGTAGTA';
print "Here are the original two DNA fragments:\n\n";
print $DNA1, "\n";
print $DNA2, "\n\n";
# Using "string interpolation"
$DNA3 = "$DNA1$DNA2";
print "Here is the concatenation of the first two fragments (version 1):\n\n";
print "$DNA3\n\n";
# An alternative way using the "dot operator":
$DNA3 = $DNA1 . $DNA2;
print "Here is the concatenation of the first two fragments (version 2):\n\n";
print "$DNA3\n\n";
exit;

#!/usr/bin/perl -w
$DNA = 'ACGGGAGGACGGGAAAATTACTACGGCATTAGC';
print "Here is the starting DNA:\n\n";
print "$DNA\n\n";
# Transcribe the DNA to RNA by substituting all T's with U's.
$RNA = $DNA;
$RNA =~ s/T/U/g;
# Print the RNA onto the screen
print "Here is the result of transcribing the DNA to RNA:\n\n";
print "$RNA\n";
# Exit the program.
exit;

Exercises
Create a directory named Exercises in your home directory.
Create a directory Class1 inside your Exercises directory.
Create three Perl programs, including:
Prog2: concatenate three DNA sequences
Prog3: convert a DNA sequence to lowercase (A->a, C->c, G->g, T->t)
Use chmod, then test and debug.

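#!/usr/bin/perl -w
$DNA = 'ACGGGAGGACGGGAAAATTACTACGGCATTAGC';
print "$DNA\n\n";
$revcom = reverse $DNA;
# WARNING: this first attempt does not work. The substitutions run one after
# another, so every A changed to T is changed back to A by the next line, and
# every G changed to C becomes G again. The result contains only A and G.
# Compare with the tr/// version on the next slide.
$revcom =~ s/A/T/g;
$revcom =~ s/T/A/g;
$revcom =~ s/G/C/g;
$revcom =~ s/C/G/g;
# Print the reverse complement DNA onto the screen
print "Here is the reverse complement DNA:\n\n";
print "$revcom\n";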

#!/usr/bin/perl -w
$DNA = 'ACGGGAGGACGGGAAAATTACTACGGCATTAGC';
print "$DNA\n\n";
$revcom = reverse $DNA;
# See the text for a discussion of tr///
# tr/// swaps all the bases in a single pass, so no change is undone by a later one
$revcom =~ tr/ACGTacgt/TGCAtgca/;
# Print the reverse complement DNA onto the screen
print "Here is the reverse complement DNA:\n\n";
print "$revcom\n";
exit;
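The tr/// version can be packaged as a reusable subroutine. Subroutines are only listed under "Basic Concepts" so far, so treat this as a preview sketch. The name `revcom` is our own choice.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Reverse complement of a DNA string (upper- or lowercase A, C, G, T)
sub revcom {
    my ($dna) = @_;
    my $rc = reverse $dna;            # scalar context: reverses the characters
    $rc =~ tr/ACGTacgt/TGCAtgca/;     # complement all bases in one pass
    return $rc;
}

print revcom('ACGGGAGGACGGGAAAATTACTACGGCATTAGC'), "\n";
# prints GCTAATGCCGTAGTAATTTTCCCGTCCTCCCGT
```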

Exercise
Change your previous program so that it converts to lowercase more easily.
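Here is one possible solution sketch. You can use either the built-in lc function or a tr/// translation like the one used for the reverse complement. They differ in one way: lc lowercases every letter, including N or other codes, while tr/ACGT/acgt/ changes only the four listed bases.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $DNA = 'ACGGGAGGACGGGAAAATTACTACGGCATTAGC';

# Option 1: the built-in lc function lowercases every letter at once
my $lower1 = lc $DNA;

# Option 2: tr/// maps each listed character to its partner, in one pass
(my $lower2 = $DNA) =~ tr/ACGT/acgt/;

print "$lower1\n$lower2\n";
```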

More
In your Exercises directory, create a directory named Class2. Using nano, create a file named NM_021964fragment.pep, put some amino acid sequence into it, then save and quit.

#!/usr/bin/perl -w
# The filename of the file containing the protein sequence data
$proteinfilename = 'NM_021964fragment.pep';
# First we have to "open" the file
open(PROTEINFILE, $proteinfilename);
# Read the first line of the file into $protein
$protein = <PROTEINFILE>;
# Now that we've got our data, we can close the file.
close PROTEINFILE;
# Print the protein onto the screen
print "Here is the protein:\n\n";
print $protein;
exit;

More
Using nano, add two more lines to NM_021964fragment.pep, then save and quit.

#!/usr/bin/perl -w
$proteinfilename = 'NM_021964fragment.pep';
open(PROTEINFILE, $proteinfilename);
# First line
$protein = <PROTEINFILE>;
print "\nHere is the first line of the protein file:\n\n";
print $protein;
# Second line
$protein = <PROTEINFILE>;
print "\nHere is the second line of the protein file:\n\n";
print $protein;
# Third line
$protein = <PROTEINFILE>;
print "\nHere is the third line of the protein file:\n\n";
print $protein;
close PROTEINFILE;
exit;

Exercise
Create a file named dna.fasta and add these two lines to it:
>DNA1
ATGCGGGATGGAGCGCGC
Write a program that opens it and prints the DNA name and the sequence. How can you avoid printing the ">"?
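Here is a possible solution sketch. It assumes the file has exactly one header line followed by one sequence line. The helper name `read_simple_fasta` is hypothetical. A temporary file created with the core File::Temp module stands in for dna.fasta, so the example runs anywhere. The ">" is removed with s/^>//; substr($header, 1) would work too.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: read a two-line FASTA file (header, then sequence)
# and return the name without the leading ">" plus the sequence.
sub read_simple_fasta {
    my ($filename) = @_;
    open(my $fh, '<', $filename) or die "Could not open $filename: $!";
    my $header   = <$fh>;
    my $sequence = <$fh>;
    close $fh;
    chomp($header, $sequence);   # remove the trailing newlines
    $header =~ s/^>//;           # drop the ">" at the start of the name line
    return ($header, $sequence);
}

# Demo: write the example lines to a temporary file instead of dna.fasta
my ($tmp, $filename) = tempfile(UNLINK => 1);
print $tmp ">DNA1\nATGCGGGATGGAGCGCGC\n";
close $tmp;

my ($name, $seq) = read_simple_fasta($filename);
print "Name: $name\nSequence: $seq\n";
```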

#!/usr/bin/perl -w
# The filename of the file containing the protein sequence data
$proteinfilename = 'NM_021964fragment.pep';
# First we have to "open" the file
open(PROTEINFILE, $proteinfilename);
# Read the protein sequence data from the file, and store it
# into the array variable @protein (one line per element)
@protein = <PROTEINFILE>;
# Print the protein onto the screen
print @protein;
# Close the file.
close PROTEINFILE;
exit;

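#!/usr/bin/perl -w
# Demonstration of "scalar context" and "list context"
@bases = ('A', 'C', 'G', 'T');
print "@bases\n";      # prints A C G T
$a = @bases;
print $a, "\n";        # scalar context: the number of elements, 4
($a) = @bases;
print $a, "\n";        # list context: the first element, A
exit;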

#!/usr/bin/perl -w
# An array of bases, accessed by index (counting starts at 0)
@bases = ('A', 'C', 'G', 'T');
print "@bases\n";
print $bases[0], "\n";
print $bases[1], "\n";
print $bases[2], "\n";
print $bases[3], "\n";
exit;

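#!/usr/bin/perl -w
# An array of coin names
@coins = ("Quarter","Dime","Nickel");
# $coins is a different variable from @coins; it was never assigned,
# so this prints nothing (and -w warns about an uninitialized value)
print $coins;
print $coins[0], "\n";    # first element of @coins: Quarter
exit;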

#!/usr/bin/perl -w
# qw() builds a list from whitespace-separated words, without quotes or commas
@coins = qw(Quarter Dime Nickel);
print $coins[0], "\n";
exit;

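#!/usr/bin/perl -w
@coins = qw(Quarter Dime Nickel);
$x = @coins;                       # scalar context: number of elements (3)
print $x, "\n";
print join(', ', @coins), "\n";    # join the elements with a separator
exit;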

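#!/usr/bin/perl -w
# split a string into an array
$coins = "Quarter Dime Nickel";
@y = split(' ', $coins);     # split on whitespace
print $y[0], "\n";
$coins = "Quarter,Dime,Nickel";
@y = split(',', $coins);     # split on commas
print $y[0], "\n";
exit;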
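split and join undo each other. This small sketch uses the same coin names, and the variable names are illustrative. It also shows an array in scalar context and a negative index, which counts from the end of the array.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @coins = qw(Quarter Dime Nickel);

my $count  = scalar(@coins);        # array in scalar context: number of elements (3)
my $last   = $coins[-1];            # negative index counts from the end: "Nickel"
my $joined = join(',', @coins);     # "Quarter,Dime,Nickel"
my @back   = split(',', $joined);   # back to ("Quarter", "Dime", "Nickel")

print "$count $last $joined @back\n";
```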

String functions
chomp, length of a string (length), substring (substr)

#!/usr/bin/perl -w
$proteinfilename = 'NM_021964fragment.pep';
open(PROTEINFILE, $proteinfilename);
$protein = <PROTEINFILE>;
close PROTEINFILE;
# The count includes the newline at the end of the line that was read
$len = length $protein;
print $len, "\n";
exit;

#!/usr/bin/perl -w
$proteinfilename = 'NM_021964fragment.pep';
open(PROTEINFILE, $proteinfilename);
$protein = <PROTEINFILE>;
close PROTEINFILE;
# chomp removes the trailing newline, so it is no longer counted
chomp $protein;
$len = length $protein;
print $len, "\n";
exit;

#!/usr/bin/perl -w
$proteinfilename = 'NM_021964fragment.pep';
open(PROTEINFILE, $proteinfilename);
$protein = <PROTEINFILE>;
close PROTEINFILE;
chomp $protein;
$st1 = substr($protein, 0, 2);   # 2 characters starting at offset 0 (the first character)
print $st1, "\n";
exit;
# or: substr $protein, 0, 2;

#!/usr/bin/perl -w
$proteinfilename = 'NM_021964fragment.pep';
open(PROTEINFILE, $proteinfilename);
$protein = <PROTEINFILE>;
close PROTEINFILE;
chomp $protein;
$st1 = substr($protein, 3);      # everything from offset 3 to the end of the string
print $st1, "\n";
exit;
# or: substr $protein, 3;
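One way to experiment further with substr is to cut a sequence into consecutive three-letter pieces (codons) in a loop, as in the sketch below. The subroutine name `codons` is hypothetical. The example uses the sequence from the dna.fasta exercise.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split a DNA string into consecutive 3-letter pieces.
# substr($string, $offset, $length) returns $length characters starting at $offset (0-based).
sub codons {
    my ($dna) = @_;
    my @pieces;
    for (my $i = 0; $i + 3 <= length $dna; $i += 3) {
        push @pieces, substr($dna, $i, 3);
    }
    return @pieces;   # 1 or 2 leftover characters at the end are ignored
}

print join(' ', codons('ATGCGGGATGGAGCGCGC')), "\n";   # ATG CGG GAT GGA GCG CGC
```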

Exercise
Create a DNA FASTA file with one ">" header line and three lines of sequence data.
Show those lines on the screen.
Show the number of characters in the sequence.
How can we show the sequence on one line?
Play with the substr function.
Can we tell how many A's are in the sequence?
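Here is one possible approach to the last questions. It uses the DNA sequence from earlier in these slides, split across three lines as if read from a file. Chomp each line, join the lines into one string, then count the A's with tr///. tr/// returns the number of characters it matched, and with an empty replacement list it changes nothing.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sequence lines as they might be read from the file (the slides' example DNA)
my @lines = ("ACGGGAGGAC\n", "GGGAAAATTA\n", "CTACGGCATTAGC\n");

chomp(@lines);                     # remove the newline from every line
my $sequence = join('', @lines);   # glue the lines into one string

my $length  = length $sequence;
my $count_a = ($sequence =~ tr/A//);   # counts the A's without changing $sequence

print "$sequence\n";
print "Length: $length, number of A: $count_a\n";   # Length: 33, number of A: 11
```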

#!/usr/bin/perl -w
$proteinfilename = 'NM_021964fragment.pep';
unless ( open(PROTEINFILE, $proteinfilename) ) {
    print "Could not open file $proteinfilename!\n";
    exit;
}
# Read the file one line at a time until the end of the file
while ( $protein = <PROTEINFILE> ) {
    print " ###### Here is the next line of the file:\n";
    print $protein;
}
# Close the file.
close PROTEINFILE;
exit;

Bigger Exercise
Create a DNA FASTA file with one ">" header line and several lines of sequence data.
Show those lines on the screen.
Show the number of characters in the sequence.
How can we show the sequence on one line?
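Here is a sketch for the bigger exercise. It combines the while loop above with chomp and the dot operator. Assumptions: the file holds a single FASTA record, and `read_fasta_record` is a hypothetical helper name. A temporary file replaces your own FASTA file so the example is self-contained.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: read one FASTA record. The ">" line gives the name;
# every other line is sequence and is appended to a single string.
sub read_fasta_record {
    my ($filename) = @_;
    open(my $fh, '<', $filename) or die "Could not open $filename: $!";
    my ($name, $sequence) = ('', '');
    while (my $line = <$fh>) {
        chomp $line;
        if ($line =~ /^>/) {
            $name = substr($line, 1);     # header without the ">"
        } else {
            $sequence .= $line;           # append this line of sequence
        }
    }
    close $fh;
    return ($name, $sequence);
}

# Demo: a temporary file stands in for your own FASTA file
my ($tmp, $filename) = tempfile(UNLINK => 1);
print $tmp ">DNA1\nACGGGAGGAC\nGGGAAAATTA\nCTACGGCATTAGC\n";
close $tmp;

my ($name, $sequence) = read_fasta_record($filename);
print "$name: $sequence (", length($sequence), " characters)\n";
```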

Comparison
String comparison operators (are two strings the same, or does one sort before or after the other?):
eq (equal)
ne (not equal)
gt (greater than)
ge (greater than or equal)
lt (less than)
le (less than or equal)
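This quick sketch shows what "string comparison" means in practice. The helper `truth` is hypothetical and only turns a true/false result into a word. The numeric operators (==, !=, <, >, <=, >=) are included for contrast: they compare values rather than characters.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: turn a true/false result into a readable word
sub truth { return $_[0] ? 'true' : 'false' }

printf "%-20s %s\n", "'ACGT' eq 'ACGT'", truth('ACGT' eq 'ACGT');  # true
printf "%-20s %s\n", "'ACGT' eq 'acgt'", truth('ACGT' eq 'acgt');  # false: case matters
printf "%-20s %s\n", "'ACGA' lt 'ACGT'", truth('ACGA' lt 'ACGT');  # true: A sorts before T
printf "%-20s %s\n", "'10' lt '9'",      truth('10' lt '9');       # true: '1' sorts before '9'
printf "%-20s %s\n", "10 < 9",           truth(10 < 9);            # false: numeric comparison
```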

Conditions
if (condition) { ... }
elsif (condition) { ... }
else { ... }

#!/usr/bin/perl -w
$word = 'MNIDDKL';
if ($word eq 'QSTVSGE') {
    print "QSTVSGE\n";
} elsif ($word eq 'MRQQDMISHDEL') {
    print "MRQQDMISHDEL\n";
} elsif ($word eq 'MNIDDKL') {
    print "MNIDDKL-the magic word!\n";
} else {
    print "Is \"$word\" a peptide?\n";
}
exit;