1 GCG vs EMBOSS Gary Williams

2 Which is better, GCG or EMBOSS?
* You must decide for yourselves
* You may find other packages that do what you want
* Use the tools that do the job
* This is a comparison of GCG and EMBOSS to help you decide

3 Interfaces
* Web
  - W2H available for both
  - EMBOSS W2H still has rough edges
  - PISE
  - Others under development
* X-Windows
  - GCG: Seqlab
  - EMBOSS: SPIN (+ others coming)
* Telnet/xterm/Character-based
  - emnu

4 Command line is very similar
* The UNIX command-line interfaces of GCG and EMBOSS are very similar.
* You type the name of the program
* You can add any options you want to the command line
* Press the RETURN key
* Any mandatory information that was not on the command line will be prompted for.

5 GCG command-line
    % name -other=thing
    This is the name program that reads a sequence and writes out something.
    NAME what sequence ? embl:hsfau1
    Begin (* 1 *) ?
    End (* 2016 *) ?
    Reverse (* No *) ?
    What should I call the output (* hsfau.name *) ?

6 EMBOSS command-line
    % name -other thing
    Reads in sequences and writes a thing
    Input sequence(s): embl:hsfau1
    Output data [hsfau1.name]:
* Use ‘-ask’ to make EMBOSS programs prompt for the start and end of sequences

7 Some common options
* Running in scripts: don’t prompt, just fail if the command line is insufficient
  - GCG: -default
  - EMBOSS: -auto
* Help on options
  - GCG: -check
  - EMBOSS: -help or -help -verbose
* Boolean options (Yes/No, True/False)
  - GCG: -thing, -nothing
  - EMBOSS: -thing, -nothing, -thing=T, -thing=F, -thing=1, -thing=0, -thing=Y, -thing=N
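The boolean spellings listed for EMBOSS can be summarised in a few lines of Python. This is only an illustrative sketch of the accepted forms on the slide, not EMBOSS's own parser; the function name `parse_boolean_option` and the `known` set of option names are assumptions made for the example.

```python
# Value spellings the slide lists for EMBOSS boolean options.
TRUE_VALUES = {"T", "1", "Y"}
FALSE_VALUES = {"F", "0", "N"}


def parse_boolean_option(arg: str, known: set[str]) -> tuple[str, bool]:
    """Map one command-line word to (option name, value).

    `known` holds the boolean option names a program defines. It is
    needed to tell "-nothing" (negated "thing") apart from an option
    whose own name happens to start with "no".
    """
    if not arg.startswith("-"):
        raise ValueError(f"not an option: {arg!r}")
    name, sep, value = arg[1:].partition("=")
    if sep:  # forms such as -thing=T or -thing=0
        if name not in known:
            raise ValueError(f"unknown option: {name!r}")
        if value in TRUE_VALUES:
            return name, True
        if value in FALSE_VALUES:
            return name, False
        raise ValueError(f"not a boolean value: {value!r}")
    if name in known:  # bare -thing switches the option on
        return name, True
    if name.startswith("no") and name[2:] in known:  # -nothing switches it off
        return name[2:], False
    raise ValueError(f"unknown option: {name!r}")


print(parse_boolean_option("-nothing", {"thing"}))
print(parse_boolean_option("-thing=Y", {"thing"}))
```

Only the six value spellings shown on the slide are accepted here; whether EMBOSS itself also accepts lower-case or longer spellings is not covered by the slide.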

8 Sequence options in EMBOSS
"-sequence" related qualifiers
    -sbegin    integer  first base used
    -send      integer  last base used, def=seq length
    -sreverse  bool     reverse (if DNA)
    -sask      bool     ask for begin/end/reverse
    -slower    bool     make lower case
    -supper    bool     make upper case
    -sformat   string   input sequence format
    -ufo       string   UFO features

9 Sequence options in EMBOSS
"-outseq" related qualifiers
    -osformat  string   output sequence format
    -ossingle  bool     separate file for each entry

10 EMBOSS general options
    -debug     bool     write debug output to program.dbg
    -auto      bool     turn off prompts
    -stdout    bool     write standard output
    -filter    bool     read standard input, write standard output
    -options   bool     prompt for required and optional values
    -verbose   bool     report some/full command line options
    -help      bool     report command line options
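When an EMBOSS program is driven from a script, the sequence qualifiers and ‘-auto’ are typically combined on one command line. The sketch below only builds the argument list. It assumes the EMBOSS program `seqret` (not named in these slides) and a hypothetical helper called `seqret_argv`; the qualifiers themselves are the ones listed on slides 8 to 10.

```python
from typing import Optional


def seqret_argv(
    sequence: str,
    outseq: str,
    sbegin: Optional[int] = None,
    send: Optional[int] = None,
    sreverse: bool = False,
) -> list[str]:
    """Build an argument list for a non-interactive seqret run."""
    argv = ["seqret", "-sequence", sequence, "-outseq", outseq]
    if sbegin is not None:
        argv += ["-sbegin", str(sbegin)]  # first base used
    if send is not None:
        argv += ["-send", str(send)]  # last base used
    if sreverse:
        argv.append("-sreverse")  # reverse (if DNA)
    argv.append("-auto")  # turn off prompts so the script never blocks
    return argv


print(" ".join(seqret_argv("embl:hsfau1", "hsfau1.seq", sbegin=1, send=100)))
# prints: seqret -sequence embl:hsfau1 -outseq hsfau1.seq -sbegin 1 -send 100 -auto
```

On a machine with EMBOSS installed, the list could be passed to `subprocess.run(argv, check=True)`; that step is left out here because it depends on the local installation and databases.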

11 Data files
* GCG uses ‘..’ to divide comments from data
* EMBOSS does not use ‘..’
* In general, EMBOSS uses ‘#’ to mark a comment line
* Use ‘embossdata’ to extract and check on data files.
* As in GCG, data files copied into the current or home directory are used in preference to the originals.

12 List files (files of file names)
* Similar to GCG list files, but no ‘..’
* Comment lines start with ‘#’
* Can contain the names of other list files:
    # This is my list file
    embl:hsfau
    embl:ggg*
    myfile.seq:clone10
    file.seq
    @list2
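The list-file rules above (skip ‘#’ comments, follow ‘@’ references to other list files) can be sketched as a small recursive expander. This is an illustration of the rules on the slide, not EMBOSS code: the names `expand_list` and `read_file` are hypothetical, and the contents given for `list2` in the usage example are invented for the demonstration.

```python
from pathlib import Path
from typing import Callable


def read_file(name: str) -> str:
    """Default loader: read a nested list file from disk."""
    return Path(name).read_text()


def expand_list(
    text: str,
    load: Callable[[str], str] = read_file,
    _active: tuple[str, ...] = (),
) -> list[str]:
    """Return the sequence addresses named in a list file's text."""
    entries: list[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        if line.startswith("@"):
            nested = line[1:]
            if nested in _active:  # guard against a list including itself
                raise ValueError(f"list file {nested!r} includes itself")
            entries.extend(expand_list(load(nested), load, _active + (nested,)))
        else:
            entries.append(line)
    return entries


# Usage with in-memory files; the contents of list2 are hypothetical.
files = {"list2": "# second list\nembl:hsfau1\n"}
top = "# This is my list file\nembl:hsfau\nembl:ggg*\nmyfile.seq:clone10\nfile.seq\n@list2\n"
print(expand_list(top, files.__getitem__))
```

Passing the loader as a parameter keeps the sketch testable without touching the file system; how EMBOSS itself resolves the path of a nested list file is not stated on the slide.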

13 File formats
* GCG
  - only GCG format, MSF and RSF
* EMBOSS
  - many formats
  - automatically recognised
  - can specify using ‘::’ or ‘-osf’
  - e.g. clustal::globin.aln -osf gcg
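The ‘::’ prefix in the example above separates an explicit format name from the rest of the sequence address. A minimal sketch of that split, assuming a hypothetical helper `split_format_prefix` (EMBOSS's own address parsing handles more cases than this):

```python
from typing import Optional


def split_format_prefix(spec: str) -> tuple[Optional[str], str]:
    """Split 'format::address' into (format, address).

    Returns (None, spec) when no explicit format is given, in which
    case the format would be recognised automatically.
    """
    fmt, sep, rest = spec.partition("::")
    if sep and fmt:
        return fmt, rest
    return None, spec


print(split_format_prefix("clustal::globin.aln"))
print(split_format_prefix("embl:hsfau1"))
```

A single ‘:’ (as in embl:hsfau1) is left alone, since on these slides it separates a database or file name from an entry name rather than naming a format.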

14 One file, many sequences
* GCG
  - Only one sequence per GCG file
* EMBOSS
  - One or more sequences per file
  - Default is to write all sequences to one file
  - -ossingle will change to writing many files
  - GCG, Staden and plain format files can only hold one sequence per file.

15 Features
* GCG
  - No concept of feature tables
* EMBOSS
  - Many programs now write out results as GFF
  - Soon, all programs that find things will write the results as GFF
  - GFF will become another sequence format
  - Programs to manipulate and display sets of features are planned
  - cf. showfeat, coderet, maskfeat, diffseq

16 Databases
* EMBOSS is poor at grouping many databases under one name
* E.g. we need a way of referring to ‘embl’ and ‘emblnew’ as one database.
* This will be done, but currently a list file containing the following seems best:
    embl:*
    emblnew:*

17 Command line wildcards
* GCG:
  - embl:* - no problem
* EMBOSS:
  - embl:* - UNIX complains it can’t find the files
  - solution is to quote it:
  - "embl:*"
  - or:
  - embl:\*
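When a script assembles the command line, the quoting can be left to the standard library instead of being typed by hand. The sketch below uses Python's `shlex.join`; the helper name `shell_line`, the program `seqret` and the output name `all.seq` are assumptions for the example and do not come from the slides.

```python
import shlex


def shell_line(argv: list[str]) -> str:
    """Join arguments into one shell-safe command line.

    Any argument containing shell metacharacters such as '*' is
    wrapped in single quotes, so the shell passes it through
    unchanged instead of trying to match it against file names.
    """
    return shlex.join(argv)


print(shell_line(["seqret", "embl:*", "-outseq", "all.seq", "-auto"]))
# prints: seqret 'embl:*' -outseq all.seq -auto
```

Single quotes have the same effect here as the double quotes or backslash shown on the slide. Running the argument list directly with `subprocess.run(argv)` avoids the problem altogether, because no shell is involved and nothing expands the wildcard.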

18 HELP
* GCG:
  - genman, genhelp
* EMBOSS:
  - tfm

19 What program does what?
* See David Martin’s list of equivalences: http://www.no.embnet.org/Programs/SAL/EMBOSS/fromGCG.php3
* NB this doesn’t list EMBOSS programs with no equivalent in GCG!

20 What EMBOSS does NOT do
* The major deficiencies in the EMBOSS package are:
* BLAST, FASTA, ASSEMBLY
* You should use the publicly available software:
  - Blast - NCBI, HGMP, many other sites
  - Fasta - HGMP
  - Assembly - Staden package

21 What EMBOSS does do
* Giving ‘stdout’ as the output file name makes output go to the screen.
* Much effort is put into removing arbitrary limits.
  - E.g. Max. sequence length: 2Gb
  - Many programs limited only by available memory
* Source code available for inspection, change and writing your own programs
* EMBOSS is FREE!
  - GNU General Public Licence
  - Open Source Software

22 THE END

