EMBOSS "The European Molecular Biology Open Software Suite"


1 EMBOSS "The European Molecular Biology Open Software Suite"

2 EMBOSS
Open Source software
Over 150 individual programs
–Sequence alignment
–Rapid database searching
–Protein motif identification
–Nucleotide sequence pattern analysis
–Codon usage analysis
–Identification of sequence patterns
–And much more…

3 EMBOSS was initiated as a European project when GCG (an American sequence analysis package) became commercial. The two packages provide roughly the same services: http://helix.nih.gov/apps/bioinfo/emboss-gcg.html

4 Advantages
It is free
It runs on practically every UNIX-based system, including Linux and Mac OS X (at the CSC website you can also use a Windows version)
Free of arbitrary size limits
Can be used from most programming environments
Programs in the EMBOSS package can be combined and piped together in countless ways
Extremely stable
Most useful at the UNIX command prompt, but GUIs are also available

5 Programs are grouped
Alignment
Display
Edit
Enzyme kinetics
Feature tables
Information
Nucleic
Phylogeny
Protein
Utils
The EMBOSS website has a comprehensive list of programs
Another list of EMBOSS programs can be found at http://www.csc.fi/english/research/sciences/bioscience/programs/emboss/index_html
http://emboss.sourceforge.net/docs/emboss_tutorial/emboss_tutorial.html

6 EMBOSS command syntax
Follows normal UNIX syntax
Uniform Sequence Addresses
–(=> USA syntax… nothing to do with the USA ;)
Sequence format
–Multiple formats supported
Alignment formats
Feature formats
Report formats

7 USA syntax
"format::file"
"format::file:entry"
"dbname:entry"
"@listfile" (a file of file names)
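
The four USA forms listed above can be told apart mechanically. The following is only an illustrative sketch in Python, not code from EMBOSS: the `USA` tuple and `parse_usa` function are hypothetical names, and the file and entry names in the usage note are made up. It also assumes that a bare name with no colon is a plain file, and it treats every "name:entry" string as "dbname:entry", which is a simplification.

```python
from typing import NamedTuple, Optional


class USA(NamedTuple):
    """Hypothetical container for the parts of a Uniform Sequence Address."""
    kind: str                      # "listfile", "file" or "database"
    format: Optional[str] = None   # sequence format, if given before "::"
    source: Optional[str] = None   # file name, list-file name or database name
    entry: Optional[str] = None    # entry name/accession, if given


def parse_usa(usa: str) -> USA:
    """Split a USA string into its parts (simplified, illustrative rules)."""
    # "@listfile": a file that itself contains a list of file names
    if usa.startswith("@"):
        return USA(kind="listfile", source=usa[1:])
    # "format::file" or "format::file:entry"
    if "::" in usa:
        fmt, rest = usa.split("::", 1)
        source, _, entry = rest.partition(":")
        return USA("file", fmt, source, entry or None)
    # "dbname:entry" (assumption: any single-colon form is a database lookup)
    if ":" in usa:
        db, _, entry = usa.partition(":")
        return USA("database", None, db, entry or None)
    # Assumption: anything else is a plain file name
    return USA("file", None, usa, None)
```

For example, `parse_usa("embl::myseqs.dat:AB000001")` yields the format `embl`, the file `myseqs.dat` and the entry `AB000001`, while `parse_usa("@mylist.txt")` is recognised as a list file.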

8 Sequence Formats I
There are at least a couple of dozen different formats
"Nearly every collection of sequences that call itself a database has stored its data in its own format"
IDs and accessions
–Most databases have both
–The ID was originally intended to be human-readable; this no longer works, since there are far too many sequences to be named by humans
–Accession numbers are unique identifiers meant more for computer (automated) use

9 Sequence Formats II
Annotation and features
–Every format has some line or field for holding annotation about the sequence in question
The sequence
–Sequences are usually held in the IUPAC standard one-letter codes
Sequence database formats
–EMBL
–GenBank
–SwissProt
–PIR
Formats supported by EMBOSS are listed at http://emboss.sourceforge.net/docs/themes/SequenceFormats.html
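
To make the ID/accession distinction concrete, here is a minimal Python sketch that pulls both out of an EMBL-style flat-file record, where each line starts with a two-letter code (ID for the entry name, AC for accession numbers, // for the end of the entry). The function name `read_embl_ids` is hypothetical and the sample record is made up for illustration; real entries carry many more line types, and in practice EMBOSS or a library would do this parsing.

```python
def read_embl_ids(record: str):
    """Return (entry_id, accessions) from one EMBL-style flat-file record."""
    entry_id = None
    accessions = []
    for line in record.splitlines():
        # Two-letter line code, then the content starting at column 6
        code, body = line[:2], line[5:].strip()
        if code == "ID" and entry_id is None and body:
            # The entry name is the first token on the ID line
            entry_id = body.split(";")[0].split()[0]
        elif code == "AC":
            # Accession numbers are separated by semicolons
            accessions.extend(a.strip() for a in body.split(";") if a.strip())
        elif code == "//":
            break  # end of this entry
    return entry_id, accessions


# Made-up record, for illustration only
sample = """\
ID   AB000001; SV 1; linear; mRNA; STD; PLN; 120 BP.
XX
AC   AB000001; AB000002;
XX
DE   Hypothetical example entry
//
"""

entry_id, accessions = read_embl_ids(sample)
print(entry_id, accessions)
```

The first accession on the AC line is conventionally the primary one; any others are secondary accessions kept so that older references still resolve.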

