Matlab Bioinformatics Toolkit Evaluation Kanishka Bhutani.


1 Matlab Bioinformatics Toolkit Evaluation Kanishka Bhutani

2 What I expected
- Local and global sequence alignments
- Multiple sequence alignments
- A choice of scoring matrices (BLOSUM, PAM) for evaluation
- Building hidden Markov models (HMMs)
- Easy import of sequences from databases (Pfam, PDB, Swiss-Prot)

3 What I found
- Most of the expected features
- "Bonus" features:
  - Microarray normalization tools
  - Microarray visualization tools, including box plots and heat maps

4 Any surprises?
- No multiple sequence alignment
- Average/standard deviation of hydrophobicity and solvent accessibility: is there a command?
- proteinplot: a GUI for protein structure analysis
  - Import your file to view it, select parameters, and display statistics

5 What I tried
- Local alignment and global alignment
- For short sequences:
    swalign('seq1', 'seq2')   (local alignment, Smith-Waterman)
    nwalign('seq1', 'seq2')   (global alignment, Needleman-Wunsch)
  where seq1 and seq2 are amino acid (AA) or nucleotide (NT) sequences
- For imported long sequences: convert each sequence into a vector of integer values with nt2int (nucleotides) or aa2int (amino acids)
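The only difference between local (swalign) and global (nwalign) alignment is a small change in the dynamic-programming recurrence: local scores are floored at zero and the best cell anywhere wins, while global alignment must span both sequences end to end. The Python sketch below illustrates that contrast only; it assumes a simple match/mismatch score and a linear gap penalty instead of the substitution matrices and gap settings the toolbox functions use, and nw_score and sw_score are hypothetical names, not toolbox functions.

```python
def nw_score(a, b, match=1, mismatch=-1, gap=-2):
    """Global (Needleman-Wunsch) score with a linear gap penalty."""
    rows, cols = len(a) + 1, len(b) + 1
    f = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        f[i][0] = i * gap  # a[:i] aligned entirely against gaps
    for j in range(1, cols):
        f[0][j] = j * gap  # b[:j] aligned entirely against gaps
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            f[i][j] = max(f[i - 1][j - 1] + s, f[i - 1][j] + gap, f[i][j - 1] + gap)
    return f[-1][-1]  # score must cover both full sequences


def sw_score(a, b, match=1, mismatch=-1, gap=-2):
    """Local (Smith-Waterman) score: same recurrence, floored at zero."""
    rows, cols = len(a) + 1, len(b) + 1
    h = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            h[i][j] = max(0, h[i - 1][j - 1] + s, h[i - 1][j] + gap, h[i][j - 1] + gap)
            best = max(best, h[i][j])  # best-scoring cell anywhere in the matrix
    return best


print(nw_score("ACGT", "GGACGTGG"))  # -4
print(sw_score("ACGT", "GGACGTGG"))  # 4
```

Here the global score pays for the four unmatched flanking bases, while the local score counts only the shared ACGT core.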

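6 Pairwise sequence alignment
- Retrieve two records from GenBank:
    S = getgenbank('NM_00001');
    M = getgenbank('NM_00002');
- Output: a structure with the header information and the sequence
- Convert the sequences to integers and align them globally:
    K = nt2int(S.Sequence);
    B = nt2int(M.Sequence);
    [sc, align] = nwalign(K, B, 'Alphabet', 'NT');
- sc is the alignment score; align holds the aligned sequences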
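The encode-then-align workflow above can be mimicked in a short Python sketch. Assumptions: only the four unambiguous bases are encoded, as 1 to 4 for A, C, G, T/U (the values nt2int uses for those letters); scoring is a simple match/mismatch with a linear gap penalty rather than the toolbox defaults; and to_ints, global_align and render are hypothetical helper names.

```python
NT_CODES = {"A": 1, "C": 2, "G": 3, "T": 4, "U": 4}  # unambiguous bases only
INT_TO_NT = {1: "A", 2: "C", 3: "G", 4: "T"}


def to_ints(seq):
    """Encode a nucleotide string as integers (ambiguity codes not handled)."""
    return [NT_CODES[ch] for ch in seq.upper()]


def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch on integer vectors; returns (score, row_a, row_b).

    Gaps in the returned rows are represented by None.
    """
    n, m = len(a), len(b)
    f = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        f[i][0] = i * gap
    for j in range(1, m + 1):
        f[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            f[i][j] = max(f[i - 1][j - 1] + s, f[i - 1][j] + gap, f[i][j - 1] + gap)

    # Traceback from the bottom-right cell to recover one optimal alignment.
    row_a, row_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        diag = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and f[i][j] == f[i - 1][j - 1] + diag:
            row_a.append(a[i - 1]); row_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and f[i][j] == f[i - 1][j] + gap:
            row_a.append(a[i - 1]); row_b.append(None); i -= 1
        else:
            row_a.append(None); row_b.append(b[j - 1]); j -= 1
    return f[n][m], row_a[::-1], row_b[::-1]


def render(row):
    """Turn an integer row back into letters, with '-' for gaps."""
    return "".join("-" if x is None else INT_TO_NT[x] for x in row)


score, ra, rb = global_align(to_ints("GATTACA"), to_ints("GATACA"))
print(score)
print(render(ra))
print(render(rb))
```

With one base missing from the second sequence, the optimal global alignment has six matches and a single gap.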

7 Getting sequences: very easy!
- getgenbank: retrieve sequence information from the GenBank database
- getembl: retrieve sequence information from the EMBL database
- getgenpept: retrieve sequence information from the GenPept database
- gethmmprof: retrieve an HMM profile from the Pfam database
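The retrieval functions above need network access. Offline, the same "header plus sequence" pairs can be read from FASTA text, a common export format for sequence databases. A minimal Python sketch, assuming plain FASTA input; parse_fasta is a hypothetical helper, not a toolbox function.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples.

    Sequence lines before the first '>' header are ignored; sequences
    split over several lines are joined.
    """
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


example = ">seq1 demo\nACGT\nTTGA\n\n>seq2\nMKV\n"
for name, seq in parse_fasta(example):
    print(name, seq)
```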

8 Experiment
    hmmodel = gethmmprof('PF00001');
  (PF00001 is the Pfam 7tm_1 family: rhodopsin-like GPCRs)

9 Visualization of the model
    showhmmprof(hmmodel, 'Scale', 'logodds')
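On a log-odds scale, each emission probability is shown relative to a background distribution. Assuming the usual definition, score = log2(p_emit / p_background), the Python sketch below computes it for every position of a profile. The base-2 logarithm and the uniform background in the example are assumptions; the slides do not state which background showhmmprof uses. log_odds_profile is a hypothetical name, and all probabilities must be nonzero.

```python
import math


def log_odds_profile(emissions, background, base=2.0):
    """Convert per-position emission probabilities to log-odds scores.

    emissions: list of {symbol: probability}, one dict per match position.
    background: {symbol: probability} for the null model.
    Positive scores mean a symbol is more likely than background there.
    """
    return [
        {sym: math.log(p / background[sym], base) for sym, p in column.items()}
        for column in emissions
    ]


uniform = {s: 0.25 for s in "ACGT"}
profile = [{"A": 0.5, "C": 0.125, "G": 0.125, "T": 0.25}]
print(log_odds_profile(profile, uniform))
```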

10 Get GPCR sequences
    S = getgenbank('NM_024531');
    disp(S.Sequence)

11 Alignment of the sequences
    aln = gethmmalignment('PF00001', 'Type', 'seed');
    disp([char(aln.Header) char(aln.Sequence)])
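The display step works because char pads each header with trailing spaces to a common width, so the aligned sequences start in the same column. A Python sketch of that layout; side_by_side is a hypothetical helper, and the two-space separator is a design choice (the MATLAB expression places the sequence directly after the padded header).

```python
def side_by_side(headers, sequences, sep="  "):
    """Left-justify headers to a common width and append each sequence."""
    width = max(len(h) for h in headers)
    return "\n".join(h.ljust(width) + sep + s for h, s in zip(headers, sequences))


print(side_by_side(["a", "bcd"], ["AC-G", "A-CG"]))
```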

12 For GPCR family C
- The same steps work for the other families
- Multiple aligned sequences are retrieved

13 proteinplot GUI
- User-friendly
- Average/standard deviation values for:
  - Hydrophobicity
  - Secondary structure propensity (alpha helices or beta strands)
  - Accessibility (accessible and buried residues)
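The averages and standard deviations proteinplot reports summarize a per-residue property profile. A Python sketch of the idea, assuming the Kyte-Doolittle hydrophobicity scale and a 9-residue sliding window; the GUI's own scales, window size and normalization may differ, and window_profile and profile_stats are hypothetical names.

```python
import statistics

# Kyte-Doolittle hydropathy values (assumed scale for this sketch).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_profile(seq, window=9):
    """Mean hydropathy of each length-`window` stretch of the sequence."""
    values = [KYTE_DOOLITTLE[aa] for aa in seq.upper()]
    if len(values) < window:
        raise ValueError("sequence is shorter than the window")
    return [sum(values[i:i + window]) / window for i in range(len(values) - window + 1)]


def profile_stats(seq, window=9):
    """Average and (population) standard deviation of the windowed profile."""
    prof = window_profile(seq, window)
    return statistics.mean(prof), statistics.pstdev(prof)


print(profile_stats("MKTLLILAVVAAALARSSA"))
```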

14 mGluR1 plot (proteinplot)

15 mGluR1 results

Parameter             | Average (%) | Std. dev. (%)
Accessible residues   | 5.04        | 1.25
Buried residues       | 8.22        | 1.816
Alpha helix           | 0.89        | 0.1565
Beta sheet            | 0.97        | 0.1038
Hydrophobicity        | 3.01        | 0.9608

16 Test a sequence with the HMM
- Retrieve mGluR1 from GenBank:
    mgr = getgenbank('NM_012407');
    glusequence = mgr.Sequence;
- Test it against the class A HMM model:
    [a, sglu] = hmmprofalign(modelA, glusequence, 'ShowScore', true)
- Score = -203.53
- Seq =
- Note: Pfam HMMs are amino acid profiles, while an NM_ GenBank record holds the mRNA (nucleotide) sequence; the protein sequence (for example, retrieved with getgenpept) is the appropriate query.

17 Log-odds score plot for the best path
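A profile-HMM alignment such as hmmprofalign scores a sequence along a best path through the model's states. The core "best path" computation is the Viterbi algorithm, sketched below in Python in log space on a toy two-state HMM (a background state and a GC-rich state). This is plain Viterbi, not a profile HMM with match, insert and delete states; the states and probabilities are hypothetical, observations must be non-empty, and all probabilities must be nonzero.

```python
import math

STATES = ("bg", "gc")
START = {"bg": 0.5, "gc": 0.5}
TRANS = {"bg": {"bg": 0.9, "gc": 0.1}, "gc": {"bg": 0.1, "gc": 0.9}}
EMIT = {
    "bg": {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
    "gc": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
}


def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return (log-probability of the best path, best state path)."""
    # v[s]: best log-probability of any path ending in state s at this step.
    v = {s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in states}
    backpointers = []
    for sym in obs[1:]:
        ptr, nxt = {}, {}
        for s in states:
            # Best predecessor r for entering state s.
            best_prev = max(states, key=lambda r: v[r] + math.log(trans_p[r][s]))
            ptr[s] = best_prev
            nxt[s] = v[best_prev] + math.log(trans_p[best_prev][s]) + math.log(emit_p[s][sym])
        backpointers.append(ptr)
        v = nxt
    # Trace back from the best final state.
    last = max(states, key=lambda s: v[s])
    path = [last]
    for ptr in reversed(backpointers):
        path.append(ptr[path[-1]])
    path.reverse()
    return v[last], path


score, path = viterbi("AAGCGCGCAA", STATES, START, TRANS, EMIT)
print(score, path)
```

Plotting the per-step scores along the returned path would give a curve comparable in spirit to the slide's log-odds plot.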

18 Difficulties and questions
- No multiple sequence alignment
- Demos: not very helpful
- Difficult to view the sequences, since no "disp" command was found
- Bugs:
  - Storing huge sequences (GPCR class A) in a file gives a parsing error
  - The hmmprofdemo command stops abruptly with errors
- The proteinplot GUI often hangs the machine
- How can the sequences be verified using the HMM models?
- How can regular-expression matches be found and those positions highlighted?
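For the last question, MATLAB's regexp can return match start indices (for example with the 'start' option, 1-based). The same idea in Python, with 0-based positions, is sketched below; the PROSITE N-glycosylation pattern N-{P}-[ST]-{P}, written as a regular expression, serves purely as an example motif. find_motif and highlight are hypothetical names, and re.finditer reports non-overlapping matches.

```python
import re

NGLYC = r"N[^P][ST][^P]"  # PROSITE N-{P}-[ST]-{P} as a regular expression


def find_motif(seq, pattern):
    """Return (start, end) spans of non-overlapping matches, 0-based."""
    return [(m.start(), m.end()) for m in re.finditer(pattern, seq)]


def highlight(seq, pattern, mark="^"):
    """Return the sequence with a marker line under every matched residue."""
    marks = [" "] * len(seq)
    for start, end in find_motif(seq, pattern):
        for k in range(start, end):
            marks[k] = mark
    return seq + "\n" + "".join(marks)


print(find_motif("MKNASLNPTQ", NGLYC))  # [(2, 6)]
print(highlight("MKNASLNPTQ", NGLYC))
```

"NPTQ" is not reported because a proline follows the asparagine.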

19 Suggested experiment
- Given: an unknown sample dataset of proteins and a known dataset of proteins with known structural information
- Use BLMT to extract "over-expressed" 4-grams from a protein sequence, or a group of protein sequences, in the known set
- Use the regular-expression search function in MATLAB to look for those 4-grams in the unknown proteins, and hence predict their structure
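The proposed pipeline (count 4-grams in the known set, keep the prominent ones, search for them in unknown proteins) can be sketched in Python. This is not BLMT: "over-expressed" is approximated here simply as "most frequent", a hypothetical criterion (a real analysis would compare against expected frequencies), and ngram_counts, top_ngrams and scan are hypothetical names. A lookahead regular expression is used so that overlapping occurrences are all reported.

```python
import re
from collections import Counter


def ngram_counts(seqs, n=4):
    """Count every length-n substring across a list of sequences."""
    counts = Counter()
    for s in seqs:
        counts.update(s[i:i + n] for i in range(len(s) - n + 1))
    return counts


def top_ngrams(seqs, n=4, k=3):
    """Most frequent n-grams (stand-in for an over-representation test)."""
    return [g for g, _ in ngram_counts(seqs, n).most_common(k)]


def scan(unknown, grams):
    """Map each unknown protein to {n-gram: [0-based start positions]}."""
    hits = {}
    for name, seq in unknown.items():
        found = {
            g: [m.start() for m in re.finditer("(?=" + re.escape(g) + ")", seq)]
            for g in grams
        }
        hits[name] = {g: pos for g, pos in found.items() if pos}
    return hits


known = ["AKLVAKLV", "AKLVQQ"]
grams = top_ngrams(known, k=1)
print(grams)
print(scan({"x": "GGAKLVGG", "y": "PPPP"}, grams))
```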

