Bioinformatics: Theory and Practice. Jijun Tang (唐继军)


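Homework
GTTGCAGCAATGGTAGACTCAACGGTAGCAAT
AACTGCAGGACCTAGAGGAAAAACAGTAGGG
ATTAATAAGCCCTATGGAGCACCAGAAATTAC
AAAAGATGGTTATAAGGTGATGAAGGGTATC
AAGCCTGAA
Why does BLAST return no result for this sequence with the default settings? Which options need to be chosen instead?
What are the most recent PubMed articles on the related species?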

DNA sequencing capability has grown exponentially
Doubling time = 18 months
DNA sequences in GenBank
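To get a feel for what an 18-month doubling time implies, the arithmetic can be sketched in the shell with awk. The ten-year horizon below is an arbitrary choice for illustration, not a figure from the slides.

```shell
# Growth factor after a given number of months, assuming a fixed doubling time.
# The 120-month (10-year) horizon is an illustrative assumption.
doubling_months=18
horizon_months=120

# factor = 2 ^ (elapsed time / doubling time)
growth=$(awk -v d="$doubling_months" -v t="$horizon_months" \
    'BEGIN { printf "%.1f", 2 ^ (t / d) }')

echo "Growth over $horizon_months months: about ${growth}-fold"
```

With these numbers the factor comes out at roughly 100-fold per decade.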

BLAST Algorithm

Sample Multiple Alignment

Bioinformatics Paradigm
Find the data
Download the data
Reformat the data
Collect the samples
Run molecular analysis
Filter the data
Run analysis software
Collect and sort results
Publish / Data sharing

Multi-Sequence FASTA file
>FBpp type=protein; loc=X:complement( , ); ID=FBpp ; name=CG12507-PA; parent=FBgn ,FBtr ; dbxref=FlyBase:FBpp ,FlyBase_Annotation_IDs:CG12507 PA,GB_protein:AAF ,GB_protein:AAF48569; MD5=123b97d79d04a06c66e12fa665e6d801; release=r5.1; species=Dmel; length=294;
MRCLMPLLLANCIAANPSFEDPDRSLDMEAKDSSVVDTMGMGMGVLDPTQ
PKQMNYQKPPLGYKDYDYYLGSRRMADPYGADNDLSASSAIKIHGEGNLA
SLNRPVSGVAHKPLPWYGDYSGKLLASAPPMYPSRSYDPYIRRYDRYDEQ
YHRNYPQYFEDMYMHRQRFDPYDSYSPRIPQYPEPYVMYPDRYPDAPPLR
DYPKLRRGYIGEPMAPIDSYSSSKYVSSKQSDLSFPVRNERIVYYAHLPE
IVRTPYDSGSPEDRNSAPYKLNKKKIKNIQRPLANNSTTYKMTL
>FBpp type=protein; loc=3R:complement( , ); ID=FBpp ; name=mRpS21-PA; parent=FBgn ,FBtr ; dbxref=FlyBase:FBpp ,FlyBase_Annotation_IDs:CG32854-PA,GB_protein:AAN ,GB_protein:AAN13563; MD5=dcf91821f75ffab320491d124a0d816c; release=r5.1; species=Dmel; length=87;
MRHVQFLARTVLVQNNNVEEACRLLNRVLGKEELLDQFRRTRFYEKPYQV
RRRINFEKCKAIYNEDMNRKIQFVLRKNRAEPFPGCS
>FBpp type=protein; loc=2R:complement( , , , ); ID=FBpp ; name=CG33919-PA; parent=FBgn ,FBtr ; dbxref=FlyBase:FBpp ,FlyBase_Annotation_IDs:CG PA,GB_protein:AAZ ,GB_protein:AAZ52801; MD5=c91d880b654cd612d f95038c5; release=r5.1; species=Dmel; length=191;
MKLVLVVLLGCCFIGQLTNTQLVYKLKKIECLVNRTRVSNVSCHVKAINW
NLAVVNMDCFMIVPLHNPIIRMQVFTKDYSNQYKPFLVDVKIRICEVIER
RNFIPYGVIMWKLFKRYTNVNHSCPFSGHLIARDGFLDTSLLPPFPQGFY
QVSLVVTDTNSTSTDYVGTMKFFLQAMEHIKSKKTHNLVHN
>FBpp type=protein; loc=X:join( , , , ); ID=FBpp ; name=cv-PA; parent=FBgn ,FBtr ; dbxref=FlyBase:FBpp ,FlyBase_Annotation_IDs:CG12410-PA,GB_protein:AAF ,GB_protein:AAF46063; MD5=0626ee34a518f248bbdda11a211f9b14; release=r5.1; species=Dmel; length=257;
MEIWRSLTVGTIVLLAIVCFYGTVESCNEVVCASIVSKCMLTQSCKCELK
NCSCCKECLKCLGKNYEECCSCVELCPKPNDTRNSLSKKSHVEDFDGVPE
LFNAVATPDEGDSFGYNWNVFTFQVDFDKYLKGPKLEKDGHYFLRTNDKN
LDEAIQERDNIVTVNCTVIYLDQCVSWNKCRTSCQTTGASSTRWFHDGCC
ECVGSTCINYGVNESRCRKCPESKGELGDELDDPMEEEMQDFGESMGPFD
GPVNNNY
…
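Because every record in a multi-sequence FASTA file starts with a ">" header line, the file is easy to inspect from the command line. The sketch below is only an illustration: the scratch file and the shortened headers are made up for the example, and the sequences are copied or truncated from the records above.

```shell
# Build a small scratch FASTA file (hypothetical; headers shortened,
# sequences copied or truncated from the slide).
fasta=$(mktemp)
cat > "$fasta" <<'EOF'
>CG12507-PA species=Dmel
MRCLMPLLLANCIAANPSFEDPDRSLDMEAKDSSVVDTMGMGMGVLDPTQ
>mRpS21-PA species=Dmel
MRHVQFLARTVLVQNNNVEEACRLLNRVLGKEELLDQFRRTRFYEKPYQV
RRRINFEKCKAIYNEDMNRKIQFVLRKNRAEPFPGCS
>cv-PA species=Dmel
MEIWRSLTVGTIVLLAIVCFYGTVESCNEVVCASIVSKCMLTQSCKCELK
EOF

# Count records: each one begins with a ">" header line.
n_records=$(grep -c '^>' "$fasta")
echo "records: $n_records"

# List just the record names: first word of each header, without the ">".
names=$(grep '^>' "$fasta" | cut -d ' ' -f 1 | sed 's/^>//')
echo "$names"

rm -f "$fasta"
```

The same two grep commands work unchanged on a full download; only the file name changes.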

Fields

ENTREZ is the GenBank web query tool

Advanced query interface:

Expasy.org

Other Important Databases
Genomes
Proteins
Biochemical & Regulatory Pathways
Gene Expression
Genetic Variation (mutants, SNPs)
Protein-Protein Interactions
Gene Ontology (Biological Function)

UCSC Genome Browser
Search by gene name or by sequence

Lots of additional data can be added as optional "tracks" - anything that can be mapped to locations on the genome

ensembl.org

KEGG: Kyoto Encyclopedia of Genes and Genomes
Enzymatic and regulatory pathways
Mapped out by EC number and cross-referenced to genes in all known organisms (wherever sequence information exists)
Parallel maps of regulatory pathways

Gene Ontology
Genetics is a messy science.
Scientists have been working in isolation on individual species for many years, naming genes, mutants, and odd phenotypes (for example, “sonic hedgehog”).
Now that we have complete genome sequences, how do we reconcile the names across all species?
Gene Ontology uses a single three-part system:
  Molecular function (specific tasks)
  Biological process (broad biological goals, e.g. cell division)
  Cellular component (location)

Unix/Linux

Filename Extensions
Most Linux filenames start with a lower case letter and end with a dot followed by one, two, or three letters: myfile.txt
However, this is just a common convention and is not required. It is also possible to have additional dots in the filename.
The part of the name following the dot is called the “extension.” The extension is often used to designate the type of file.

Some Common Extensions
By convention:
  files that end in .txt are text files
  files that end in .c are source code in the “C” language
  files that end in .html are HTML files for the Web
  compressed files have the .zip or .gz extension
Linux does not require these extensions (unlike Windows), but using them is a sensible habit and one that you should follow.
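When a name contains more than one dot, the extension is the part after the last dot. A minimal sketch using the shell's own parameter expansion (the file name is hypothetical):

```shell
# Hypothetical file name with more than one dot.
name="reads.fastq.gz"

ext=${name##*.}     # remove everything up to and including the last dot
base=${name%.*}     # remove the last dot and everything after it

echo "extension: $ext"
echo "base name: $base"
```

Here the extension is gz and the rest of the name, reads.fastq, still carries its own dot.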

Working with Directories
Directories are a means of organizing your files on a Linux computer. They are equivalent to folders on Windows and Macintosh computers.
Directories contain files, executable programs, and sub-directories.
Understanding how to use directories is crucial to manipulating your files on a Linux system.

Your Home Directory
When you log in to the server, you always start in your Home directory.
Create sub-directories to store specific projects or groups of information, just as you would place folders in a filing cabinet.
Do not accumulate thousands of files with cryptic names in your Home directory.

File & Directory Commands
This is a minimal list of Linux commands that you must know for file management:
ls (list)                 mkdir (make directory)
cd (change directory)     pwd (print working directory)
cp (copy)                 rm (remove)
mv (move)                 more (view by page)
cat (view entire file)    man (help)
All of these commands can be modified with many options. Learn to use the Linux ‘man’ pages for more information.

Navigation
pwd (print working directory) shows the name and location of the directory where you are currently working:
> pwd
/home/jtang
This is a “pathname”; the slashes separate sub-directories. The initial slash is the “root” of the whole filesystem.
ls (list) gives you a list of the files in the current directory:
> ls
assembin4.fasta Misc test2.txt bin temp testfile
Use the ls -l (long) option to get more information about each file:
> ls -l
total 1768
drwxr-x--- 2 browns02 users 8192 Aug 28 18:26 Opioid
-rw-r browns02 users 6205 May af gb_in2
-rw-r browns02 users May af fasta

Sub-directories
cd (change directory) moves you to another directory:
> cd Misc
> pwd
/u/browns02/Misc
mkdir (make directory) creates a new sub-directory inside of the current directory:
> ls
assembler phrap space
> mkdir subdir
> ls
assembler phrap space subdir
rmdir (remove directory) deletes a sub-directory, but the sub-directory must be empty:
> rmdir subdir
> ls
assembler phrap space
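The same sequence can be tried safely in a throwaway directory. This is a sketch, not part of the original slides; the scratch directory comes from mktemp so nothing in your real files is touched.

```shell
# Work in a throwaway directory.
scratch=$(mktemp -d)
cd "$scratch" || exit 1

mkdir subdir            # create a sub-directory
ls                      # lists: subdir
cd subdir || exit 1     # move into it
where=$(pwd)            # full pathname of the current directory
echo "$where"

cd ..                   # back up one level
rmdir subdir            # succeeds only because subdir is empty
remaining=$(ls)         # nothing is left in the scratch directory
echo "entries left: ${remaining:-none}"

# Clean up: leave the scratch directory, then remove it.
cd / && rmdir "$scratch"
```

If subdir still held a file, rmdir would refuse and print an error instead of deleting it.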

Shortcuts
There are some important shortcuts in Linux for specifying directories:
. (dot) means "the current directory"
.. (dot dot) means "the parent directory", the directory one level above the current directory, so cd .. will move you up one level
~ (tilde) means your Home directory, so cd ~ will move you back to your Home
Just typing a plain cd will also bring you back to your home directory.
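A short sketch of these shortcuts in use. The directory names project and data are made up for the example and live in a scratch directory.

```shell
# Hypothetical layout: <scratch>/project/data
scratch=$(mktemp -d)
mkdir -p "$scratch/project/data"
cd "$scratch/project/data" || exit 1

cd ..                          # up one level, into project
parent=$(basename "$(pwd)")

cd ./data                      # "." is the current directory, so this enters project/data
child=$(basename "$(pwd)")

echo "after 'cd ..' we are in: $parent"
echo "after 'cd ./data' we are in: $child"

cd ~                           # back to your Home directory
cd "$scratch" && cd            # a plain cd does the same

rm -r "$scratch"
```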

Create new files
Text editors: pico, nano, vi/vim, emacs

Programming
Languages: perl, python, c/c++, R, Java

Linux File Protections
File protection (also known as permissions) enables the user to set up a file so that only specific people can read (r), write/delete (w), and execute (x) it.
Write permission is effectively delete permission as well, since anyone who can write to a file can overwrite its contents with something else.

File Owners and Groups Linux file permissions are defined according to ownership. The person who creates a file is its owner. You are the owner of files in your Home directory and all its sub-directories In addition, there is a concept known as a Group. Members of a group have privileges to see each other's files. We create groups as the members of a single lab - the students, technicians, postdocs, visitors, etc. who work for a given PI.

View File Permissions
Use the ls -l command to see the permissions for all files in a directory:
$ ls -l
total 2
-rw-r--r-- 1 jtang None 56 Feb 29 11:21 data.txt
-rwxr-xr-x 1 jtang None 33 Feb 29 11:21 test.pl
The username of the owner is shown in the third column. (The owner of the files listed above is jtang.)
The owner belongs to the group “None”, shown in the fourth column.
The access rights for these files are shown in the first column. This column consists of 10 characters known as the attributes of the file: r, w, x, and -
  r indicates read permission
  w indicates write (and delete) permission
  x indicates execute (run) permission
  - indicates no permission for that operation

The first character in the attribute string indicates whether the entry is a directory (d) or a regular file (-).
The next 3 characters (rwx) give the file permissions for the owner of the file.
The middle 3 characters give the permissions for other members of the owner's group.
The last 3 characters give the permissions for everyone else (others).
The default protections assigned to new files on our system are: -rw-r----- (owner = read and write, group = read, others = nothing)
$ ls -l
total 2
-rw-r--r-- 1 jtang None 56 Feb 29 11:21 data.txt
-rwxr-xr-x 1 jtang None 33 Feb 29 11:21 test.pl
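The 10-character attribute string can be taken apart by position. The sketch below uses the string shown for test.pl; the variable names are only illustrative.

```shell
perms="-rwxr-xr-x"      # attribute string shown for test.pl

# cut -c selects characters by position within each line.
type=$(printf '%s\n' "$perms"  | cut -c1)      # "-" regular file, "d" directory
owner=$(printf '%s\n' "$perms" | cut -c2-4)    # permissions for the owner
group=$(printf '%s\n' "$perms" | cut -c5-7)    # permissions for the group
other=$(printf '%s\n' "$perms" | cut -c8-10)   # permissions for everyone else

echo "type=$type owner=$owner group=$group other=$other"
```

For test.pl this shows that the owner may read, write, and execute, while the group and everyone else may read and execute but not write.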

Change Protections
Only the owner of a file (or the system administrator) can change its protections.
To change the protections on a file use the chmod (change mode) command. [Beware, this is a confusing command.]
The permissions are written as three digits, one each for the owner, the group, and everyone else:
> chmod 644 data.txt
This gives the owner read and write permission, and gives the group and the world read permission only.
Other common settings: 600, 755, 700
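Each digit in a chmod mode is a sum: read = 4, write = 2, execute = 1. So 6 means read and write, 7 means read, write, and execute, and 5 means read and execute. A minimal sketch that applies the four modes mentioned above to a scratch file (standing in for data.txt) and reads the result back with ls -l:

```shell
f=$(mktemp)     # scratch file standing in for data.txt

# How the digits are built up.
echo "rw- = $((4 + 2)), r-- = 4, rwx = $((4 + 2 + 1)), r-x = $((4 + 1))"

# Apply each mode, then keep the first 10 characters of the ls -l line.
chmod 644 "$f"; mode_644=$(ls -l "$f" | cut -c1-10)
chmod 600 "$f"; mode_600=$(ls -l "$f" | cut -c1-10)
chmod 755 "$f"; mode_755=$(ls -l "$f" | cut -c1-10)
chmod 700 "$f"; mode_700=$(ls -l "$f" | cut -c1-10)

printf '%s\n' "644 $mode_644" "600 $mode_600" "755 $mode_755" "700 $mode_700"
rm -f "$f"
```

Modes ending in 00 keep a file private to its owner, which is usually what you want for unpublished data.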

Commands for Files
Files are used to store information, for example, data or the results of some analysis. You will mostly deal with text files.
Files on the RCR Alpha are automatically backed up to tape every night.
cat dumps the entire contents of a file onto the screen. For a long file this can be annoying, but it can also be helpful if you want to copy and paste (use the buffer of your telnet program).

more
Use the command more to view the contents of a file one screen at a time:
> more t27054_cel.pep
!!AA_SEQUENCE 1.0
P1;T hypothetical protein Y49E Caenorhabditis elegans
Length: 534 May 30, :49 Type: P Check:
MLKKAPCLFG SAIILGLLLA AAGVLLLIGI PIDRIVNRQV IDQDFLGYTR
51 DENGTEVPNA MTKSWLKPLY AMQLNIWMFN VTNVDGILKR HEKPNLHEIG
101 PFVFDEVQEK VYHRFADNDT RVFYKNQKLY HFNKNASCPT CHLDMKVTIP
t27054_cel.pep (87%)
Hit the spacebar to page down through the file.
Ctrl-U moves back up a page.
At the bottom of the screen, more shows how much of the file has been displayed.
Similar command: less

Copy & Move
cp lets you copy a file from any directory to any other directory, or create a copy of a file with a new name in one directory:
cp filename.ext newfilename.ext
cp filename.ext subdir/newname.ext
cp /u/jdoe01/filename.ext ./subdir/newfilename.ext
mv allows you to move files to other directories, but it is also used to rename files. Filename and directory syntax for mv is exactly the same as for the cp command:
mv filename.ext subdir/newfilename.ext
NOTE: When you use mv to move a file into another directory, the file no longer exists in its original location.
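A quick way to see the difference between cp and mv is to run both in a scratch directory and list what is left. The file names follow the examples above; moved.ext and the one-line file content are made up for the sketch.

```shell
scratch=$(mktemp -d)
cd "$scratch" || exit 1
mkdir subdir
echo "ACGT" > filename.ext

cp filename.ext newfilename.ext        # copy under a new name, same directory
cp filename.ext subdir/newname.ext     # copy into a sub-directory
mv newfilename.ext subdir/moved.ext    # move and rename in one step

# Record what each directory now contains, as space-separated lists.
top=$(ls | tr '\n' ' ')
inside=$(ls subdir | tr '\n' ' ')
echo "top level: $top"
echo "subdir:    $inside"

cd / && rm -r "$scratch"
```

The original filename.ext is still at the top level because it was only copied, while newfilename.ext is gone from there because it was moved.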

Delete
Use the command rm (remove) to delete files.
There is no way to undo this command!!!
We have set the server to ask if you really want to remove each file before it is deleted. You must answer “Y” or else the file is not deleted.
To skip the prompt, use the -f (force) option. rm -rf deletes a directory and everything inside it without asking, so use it with great care.
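The -f and -r options can be tried without risk on a scratch directory. This is a sketch only; the names old_run, logs, and keep.txt are invented for the example.

```shell
scratch=$(mktemp -d)
mkdir -p "$scratch/old_run/logs"
touch "$scratch/old_run/logs/a.log" "$scratch/keep.txt"

rm -f "$scratch/no_such_file"    # -f: no prompt, and no error if the file is missing
rm -r "$scratch/old_run"         # -r: remove a directory and everything below it

left=$(ls "$scratch")
echo "left: $left"

rm -r "$scratch"                 # clean up the scratch directory itself
```

Always check the path twice before pressing Enter on a command that combines -r and -f.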

Moving Files between Computers
You will often need to move files between computers: desktop to server and back.
There are several options:
  Sneaker net (floppy, zip, writeable CD) (not an option for the mendel machine)
  Network filesharing
  FTP
  scp

FTP/SCP is Simple
File Transfer Protocol (FTP) is standard for all computers on any network.
It is the best way to move lots of data to and from remote machines:
  put raw data onto the server for analysis
  get results back to the desktop for use in papers and grants
Graphical FTP applications for desktop PCs:
  On a Mac, use Fetch or Cyberduck (!)
  On a Windows PC, use WS_FTP, FileZilla, or WinSCP

FTP Login
When you open an FTP program, you connect to sanger just as you would with a terminal.
We now use sFTP (secure FTP).
Your username and password are the same as for your terminal login.

You will automatically end up in your home directory.
Put files from your PC to the server; get files from the server to your desktop machine.

Some More Advanced Linux Commands
grep: searches a file for a specific text pattern
cut: copies one or more columns from a tab-delimited text file
wc: word count (counts lines, words, and characters)
| : the pipe, sends the output of one command as input to the next
> : redirects output to a file
sed: stream editor, changes text inside a file
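These commands are most useful in combination. The sketch below builds a small tab-delimited table in a scratch file (gene name, chromosome arm, protein length, taken from the FASTA headers earlier in the slides) and runs each command on it. The table layout and variable names are assumptions made for the example.

```shell
table=$(mktemp)
# Columns: gene name, chromosome arm, protein length (tab-delimited).
printf 'CG12507\tX\t294\nmRpS21\t3R\t87\nCG33919\t2R\t191\ncv\tX\t257\n' > "$table"

# cut pulls out column 2; the pipe feeds it to grep, which counts lines that are exactly "X".
n_on_x=$(cut -f 2 "$table" | grep -c '^X$')

# wc -l counts the lines (records) in the file.
n_rows=$(wc -l < "$table" | tr -d ' ')

# sed rewrites text as it streams past; > saves the result to a new file.
sed 's/^mRpS21/mRpS21-PA/' "$table" > "$table.renamed"
renamed=$(grep '^mRpS21' "$table.renamed" | cut -f 1)

echo "genes on X: $n_on_x of $n_rows"
echo "renamed entry: $renamed"

rm -f "$table" "$table.renamed"
```

Note that sed leaves the input file untouched; the edited text exists only in the file named after the > redirect.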