EMBOSS: an application suite for Bioinformatics. Shahid Manzoor, Adnan Niazi. SLU Global Bioinformatics Centre.


1 EMBOSS: an application suite for Bioinformatics. Shahid Manzoor, Adnan Niazi. SLU Global Bioinformatics Centre

2 E: European, M: Molecular, B: Biology, O: Open, S: Software, S: Suite

3 All Information
- EMBOSS info at http://emboss.sourceforge.net/
- wEMBOSS info at http://wemboss.sourceforge.net/
- E-mail martin.norling@slu.se to get a username and password for wEMBOSS at http://ebiokit.hgen.slu.se/

4 What is EMBOSS
- Open-source molecular biology analysis package.
- Handles a variety of common file formats.
- Provides libraries for easy development.
- Free software, licensed under the GPL and LGPL.
- EMBOSS was developed by Peter Rice, Ian Longden and Alan Bleasby; the wEMBOSS web interface was developed by Martin Sarachu and Marc Colet.
- Available at http://emboss.sourceforge.net

5 Features of EMBOSS
- A comprehensive set of sequence analysis programs.
- Handles all common sequence formats and many alignment and structure formats.
- Runs on practically every UNIX you can think of (and likely some that you can't), plus Windows and OS X.
- Every application has the same style of interface, so once you have mastered one you have mastered them all.

6 Uses for EMBOSS
- Sequence alignment.
- Protein motif identification (including domain analysis).
- Nucleotide sequence pattern analysis (for example, to identify CpG islands or repeats).
- Presentation tools for publications.

7 Programs in EMBOSS
- Many small and large programs in the package (>140).
- All programs share a common look and feel.
- Easy to run from the command line.
- Retrieval of sequence data from the web.

8 The One Argument
- help: the -help argument displays a short help text for any EMBOSS program.

9 The One Command
- wossname: searches the short descriptions of the other programs for keywords.

10 Large collection of gene and protein analysis tools
- Sequence retrieval
- Alignments
- Primer design
- Restriction mapping
- Protein domain searching
- Translation

11 [Workflow diagram] DNA sequence 1, DNA sequence 2: dotplot, translation. Protein sequence 1, protein sequence 2: protein local/global alignment, multiple sequence alignment, motif and domain searching, physico-chemical properties.

12 Dotplots
>SEQ1.fasta
ATGGGTCGTGAAGAGAATGCTCCTCCTTTGGAATCTTAA
>SEQ2.fasta
ATGGCTCCTCCCTTAGAATCTTAG
For an exact match:
Unix% dottup SEQ1.fasta SEQ2.fasta -wordsize 10 &
For a similarity match:
Unix% dotmatcher SEQ1.fasta SEQ2.fasta -windowsize 10 -threshold 17 &

13 Dotplots ...
Identity matrix:
     A   T   G   C
A    5  -4  -4  -4
T   -4   5  -4  -4
G   -4  -4   5  -4
C   -4  -4  -4   5
Window size is the number of bases in a sliding window that is moved along each sequence and compared to generate a single data point on the plot. Window size must be an odd number.
Mismatch limit determines how similar the two sequences in a window must be to "match". For example, if the window size is 9 and the mismatch limit is 2, then up to 2 mismatches in a 9-base window will still be classified as a match.

14 Dotplots ...
Identity matrix (A, T, G, C): 5 for a match, -4 for a mismatch.
CCTCCTTTGG
CCTCCTTTGG
5 5 5 5 5 5 5 5 5 5    Score = 50
CCTCCTTTGG
CCTCCCTTAG
5 5 5 5 5 -4 5 5 -4 5    Score = 32
Both mismatches are silent at the protein level: CCT and CCC encode Pro, TTG and TTA encode Leu.
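
The window scores on this slide can be reproduced with a few lines of Python. This is a minimal sketch of identity-matrix scoring, not dotmatcher's actual implementation; the function names are hypothetical, and treating "score at or above the threshold" as a dot is an assumption.

```python
# Identity scoring as on the slide: 5 for a match, -4 for a mismatch.
MATCH, MISMATCH = 5, -4

def window_score(a: str, b: str) -> int:
    """Sum the identity-matrix scores over two equal-length windows."""
    if len(a) != len(b):
        raise ValueError("windows must have the same length")
    return sum(MATCH if x == y else MISMATCH for x, y in zip(a, b))

def is_dot(a: str, b: str, threshold: int = 17) -> bool:
    """Assumed rule: plot a dot when the window score reaches the threshold."""
    return window_score(a, b) >= threshold

print(window_score("CCTCCTTTGG", "CCTCCTTTGG"))  # 50
print(window_score("CCTCCTTTGG", "CCTCCCTTAG"))  # 32
print(is_dot("CCTCCTTTGG", "CCTCCCTTAG"))        # True
```

With a threshold of 17, a 10-base window tolerates two mismatches (score 32) or three (score 23), but not four (score 14).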

15 Dotplots
- A dot plot is a simple graphical representation of identical residues between two sequences.
- The X axis represents the first sequence (PHO5).
- The Y axis represents the second sequence (PHO3).
- A dot is plotted for each match between two residues of the sequences.
- Diagonal lines reveal regions of identity between the two sequences.

16 Dotplots ...
- The dot plot can be adapted to display only word matches, which correspond to a diagonal of dots in the letter-based dot plot.
- Example: alignment of PHO5 and PHO3 coding sequences, with different word sizes.
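
The effect of the word size can be sketched in Python. The PHO5 and PHO3 sequences are not part of this transcript, so the example uses the two short sequences from the Plotorf slide instead; `word_matches` is a hypothetical helper, not part of EMBOSS.

```python
def word_matches(seq1: str, seq2: str, wordsize: int) -> list:
    """Return 0-based (i, j) start positions where seq1 and seq2 share an identical word."""
    # Index every word of seq2 by its start positions.
    index = {}
    for j in range(len(seq2) - wordsize + 1):
        index.setdefault(seq2[j:j + wordsize], []).append(j)
    # Look up every word of seq1 in that index.
    hits = []
    for i in range(len(seq1) - wordsize + 1):
        for j in index.get(seq1[i:i + wordsize], []):
            hits.append((i, j))
    return hits

seq1 = "ATGGGTCGTGAAGAGAATGCTCCTCCTTTGGAATCTTAA"
seq2 = "ATGGCTCCTCCCTTAGAATCTTAG"
for size in (1, 3, 6):
    print(size, len(word_matches(seq1, seq2, size)))
```

Each hit is one dot. A larger word size removes the scattered single-residue matches and leaves only the dots that lie on longer diagonals.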

17 Detecting repeats with a dot plot
- Sequence repeats are easily detected in a dot plot when a sequence is compared to itself.
- The main diagonal is completely marked (by definition, since the sequence is identical to itself).
- Repeats appear as segments of lines parallel to the diagonal.
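
A self-comparison can be sketched as follows: every repeated word lies on a diagonal at a fixed offset from the main diagonal, so counting word matches per offset reveals the repeats. The toy sequence and the helper name are made up for illustration.

```python
def repeat_offsets(seq: str, wordsize: int) -> dict:
    """Self-comparison: map each diagonal offset (j - i > 0) to its number of word matches."""
    seen = {}      # word -> earlier start positions
    offsets = {}   # diagonal offset -> number of matching words
    for j in range(len(seq) - wordsize + 1):
        word = seq[j:j + wordsize]
        for i in seen.get(word, []):
            offsets[j - i] = offsets.get(j - i, 0) + 1
        seen.setdefault(word, []).append(j)
    return offsets

# Hypothetical sequence: an 8-base unit, a 2-base spacer, then the same unit again.
print(repeat_offsets("ACGTTGCA" + "GG" + "ACGTTGCA", 4))  # {10: 5}
```

The main diagonal (offset 0) is left out because it is always fully marked. The single populated offset, 10, is the distance between the two copies of the repeat.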

18 Plotorf
>SEQ1.fasta
ATGGGTCGTGAAGAGAATGCTCCTCCTTTGGAATCTTAA
>SEQ2.fasta
ATGGCTCCTCCCTTAGAATCTTAG
Unix% plotorf SEQ1.fasta -stop TAA,TAG -out GA.plot &
Unix% getorf SEQ1.fasta -minsize 5 -table 0 -find 1 -out GA.getorf &

19 ATGGGTCGTGAAGAGAATGCTCCTCCTTTGGAATCTTAA
TACCCAGCACTTCTCTTACGAGGAGGAAACCTTAGAATT
Frames shown: -3, -2, -1, 1, 2, 3.
Start and stop codons are located according to the instructions given to the program, and the area between a start and a stop codon is marked as an open reading frame.

20 Indication of full coding sequence? Alternative splice form?

21 Using getorf:
>_1 [17 - 37]
MLLLWNL
>_2 [1 - 36]
MGREENAPPLES*
Annotations: M is the start methionine, * is the stop codon.
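
The two ORFs reported here can be recovered with a small forward-strand scan. This is a sketch under stated assumptions, not getorf's algorithm: it uses the standard genetic code, searches only the three forward frames, treats ATG as the only start codon, and reports an ORF that runs off the end of the sequence. `find_orfs` and its `min_nt` parameter are hypothetical names.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def find_orfs(seq: str, min_nt: int = 5) -> list:
    """Return (start, end, protein) for forward-strand ORFs; coordinates are 1-based, inclusive."""
    orfs = []
    for frame in range(3):
        start, protein = None, []
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None:
                if codon == "ATG":               # open a new ORF at a start codon
                    start, protein = i, ["M"]
            elif CODON_TABLE[codon] == "*":      # close it at the first in-frame stop
                if i - start >= min_nt:
                    orfs.append((start + 1, i, "".join(protein)))
                start = None
            else:
                protein.append(CODON_TABLE[codon])
        if start is not None and 3 * len(protein) >= min_nt:
            # ORF still open at the end of the sequence
            orfs.append((start + 1, start + 3 * len(protein), "".join(protein)))
    return orfs

for orf in find_orfs("ATGGGTCGTGAAGAGAATGCTCCTCCTTTGGAATCTTAA"):
    print(orf)
# (1, 36, 'MGREENAPPLES')
# (17, 37, 'MLLLWNL')
```

The coordinates match the getorf output above. The sketch leaves out the stop codon, so the first protein is printed without the trailing asterisk.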

22 Unix% transeq SEQ1.fasta -frame 1 -table 0 -sbegin 4 -send 36 -out GA.fasta &
>GA.fasta
GREENAPPLES
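
The coordinate arithmetic behind this translation can be checked with a short sketch. It assumes the standard genetic code and 1-based, inclusive positions; `translate` is a hypothetical helper, not the transeq implementation.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def translate(seq: str, begin: int = 1, end=None) -> str:
    """Translate seq[begin..end] (1-based, inclusive), reading codons from the first base of the region."""
    region = seq[begin - 1:end]
    usable = len(region) - len(region) % 3   # ignore a trailing partial codon
    return "".join(CODON_TABLE[region[i:i + 3]] for i in range(0, usable, 3))

seq1 = "ATGGGTCGTGAAGAGAATGCTCCTCCTTTGGAATCTTAA"
print(translate(seq1))          # MGREENAPPLES*
print(translate(seq1, 4, 36))   # GREENAPPLES
```

The region has to span whole codons: positions 4 to 36 cover 33 nucleotides, which is 11 codons. Stopping at position 33 would cover only 10 codons and drop the final S.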

23 Alignments
>GA.fasta
GREENAPPLES
>A.fasta
APPLES
For a global alignment:
Unix% needle GA.fasta A.fasta -gapopen 10 -gapextend 0.5 -datafile EPAM250 &
For a local alignment:
Unix% water GA.fasta A.fasta -gapopen 10 -gapextend 0.5 -datafile EPAM250 &

24 Alignments ...
To align two or more sequences in a biologically significant way.
Gap penalty = 10; extension penalty = 0.5
Global (needle): GREENAPPLES is aligned with APPLES over its full length.
Local (water): only the shared APPLES region is aligned.
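
The difference between global and local alignment can be sketched with a small dynamic-programming routine. This is a simplification and not what needle and water compute: it scores with identity values (5 and -4) and a linear gap penalty of 10 per gap position, whereas the EMBOSS programs use a substitution matrix such as EPAM250 and affine gaps (open 10, extend 0.5). The function name and its parameters are hypothetical.

```python
def align(a: str, b: str, match=5, mismatch=-4, gap=-10, local=False):
    """Needleman-Wunsch (global) or Smith-Waterman (local) with a linear gap penalty."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    if not local:                      # global: leading gaps are penalised
        for i in range(1, n + 1):
            score[i][0] = i * gap
        for j in range(1, m + 1):
            score[0][j] = j * gap
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            cell = max(score[i - 1][j - 1] + s, score[i - 1][j] + gap, score[i][j - 1] + gap)
            if local:                  # local: never drop below zero, remember the best cell
                cell = max(cell, 0)
                if cell > best:
                    best, best_pos = cell, (i, j)
            score[i][j] = cell
    i, j = best_pos if local else (n, m)
    final, out_a, out_b = score[i][j], [], []
    while i > 0 or j > 0:              # trace back from the end cell
        if local and score[i][j] == 0:
            break
        s = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + s:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return final, "".join(reversed(out_a)), "".join(reversed(out_b))

print(align("GREENAPPLES", "APPLES"))              # (-20, 'GREENAPPLES', '-----APPLES')
print(align("GREENAPPLES", "APPLES", local=True))  # (30, 'APPLES', 'APPLES')
```

The global alignment must account for every residue, so it pays for five gap positions. The local alignment reports only the best-scoring shared region.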

25 GREENAPPLES / APPLES: it looks like the "apples" motif may be part of a larger domain. Next steps: physicochemical properties, pattern searching.

26 Physico-chemical properties
Isoelectric point:
Unix% iep GA.fasta -plot -step 0.5 -out GA.IEP &
General properties:
Unix% pepinfo GA.fasta -hwindow 8 -generalplot -hydropathyplot &
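
A hydropathy plot is a sliding-window average over a per-residue scale. The sketch below shows the idea for a window of 8, as in the pepinfo command. It uses the Kyte-Doolittle scale, and it is an assumption that this matches what pepinfo plots; pepinfo's own scales and averaging may differ. The function name is hypothetical.

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydropathy_profile(protein: str, window: int = 8) -> list:
    """Mean hydropathy of every full window along the protein, rounded to 3 decimals."""
    return [
        round(sum(KYTE_DOOLITTLE[aa] for aa in protein[i:i + window]) / window, 3)
        for i in range(len(protein) - window + 1)
    ]

# One value per window position; an 11-residue protein gives 4 windows of length 8.
print(hydropathy_profile("GREENAPPLES", window=8))
```

A wider window smooths the curve and highlights longer hydrophobic or polar stretches. A narrower one follows individual residues more closely.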

27 Physico-chemical properties
[Diagram of amino-acid property classes: Aliphatic, Aromatic, Hydrophobic, Tiny, Small, Charged, Positive, Polar. Residues shown: D Y F W H K R E Q N M A G C S P I V L T]
The pepinfo graph of properties is based on this diagram.

28 Physico-chemical properties
[Plot annotations: non-polar region with small residues; polar region to one side of non-charged region]

29 Pattern searching
>GA.fasta
GREENAPPLES
>GL.fasta
GREENLEAVES
>RA.fasta
REDAPPLES
>RL.fasta
REDLEAVES
Alignment:
GREENAPPL---ES
-RE-DAPPL---ES
GREEN---LEAVES
-RE-D---LEAVES
Pattern: [G](0,1)-R-[E](1,2)-[ND]-x(3)-L-x(3)-E-S

30 Pattern searching
Search a protein database:
Unix% fuzzpro sptr:* pattern.fruit -mismatch 0 -out GA.fuzzpro &
pattern.fruit:
[G](0,1)-[R]-[E](1,2)-[ND]-x(3)-[L]-x(3)-[E]-[S]
Nothing resembling this pattern is found in the database, but we could try scanning PRINTS (pscan) and PROSITE (patmatmotifs) with one of our sequences.
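
To see what such a pattern accepts, it can be translated into a regular expression and tried on the four example sequences. This is a sketch covering only the small subset of PROSITE-style syntax used on these slides, not the fuzzpro matcher; `prosite_to_regex` is a hypothetical helper, and the "relaxed" pattern is an illustrative variant that does not appear in the original slides.

```python
import re

def prosite_to_regex(pattern: str) -> str:
    """Translate literals, [..] classes, x wildcards and (n) / (n,m) repeats into a regex."""
    parts = []
    for element in pattern.replace(" ", "").split("-"):
        m = re.fullmatch(r"(\[[A-Z]+\]|[A-Za-z])(?:\((\d+)(?:,(\d+))?\))?", element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        residue, low, high = m.groups()
        if residue in ("x", "X"):          # x matches any residue
            residue = "."
        if low is not None:                # (n) or (n,m) repeat count
            residue += "{" + low + ("," + high if high else "") + "}"
        parts.append(residue)
    return "".join(parts)

sequences = {"GA": "GREENAPPLES", "GL": "GREENLEAVES", "RA": "REDAPPLES", "RL": "REDLEAVES"}
as_written = "[G](0,1)-R-[E](1,2)-[ND]-x(3)-L-x(3)-E-S"
relaxed = "[G](0,1)-R-[E](1,2)-[ND]-x(0,3)-L-x(0,3)-E-S"  # hypothetical variant

for label, pattern in (("as written", as_written), ("relaxed", relaxed)):
    regex = re.compile(prosite_to_regex(pattern))
    hits = [name for name, seq in sequences.items() if regex.search(seq)]
    print(label, regex.pattern, hits)
```

Under this reading, x(3) demands exactly three residues at both positions, so the pattern as written matches none of the four sequences it was derived from. The gapped alignment columns correspond to x(0,3), and that variant matches all four.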

31 Some Programs

32 Some Programs ...

33 More Information

