Tools and Datasets
Exploring the tools of the trade
Sequence Databases
● Understanding EMBL Entries
● Understanding SWISS-PROT Entries
Understanding EMBL Entries
Understanding SWISS-PROT Entries
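EMBL and SWISS-PROT entries share the same line-oriented layout. Each line begins with a two-letter line code such as ID, AC, DE or SQ, the data start at column 6, the residues follow the SQ line, and each entry ends with a line containing //. The Python sketch below groups one entry's lines by line code and collects the sequence. It assumes a single well-formed entry; the function name parse_flat_entry and the sample entry are hypothetical, for illustration only.

```python
from collections import defaultdict


def parse_flat_entry(text):
    """Parse one EMBL- or SWISS-PROT-style entry (hypothetical helper).

    Returns (fields, sequence): fields maps each two-letter line code to the
    list of data strings on lines with that code; sequence holds the residues
    from the lines after SQ, with blanks and position counts removed and
    upper-cased so EMBL's lower-case nucleotides match SWISS-PROT's style.
    """
    fields = defaultdict(list)
    seq_parts = []
    in_sequence = False
    for line in text.splitlines():
        if line.startswith("//"):          # end-of-entry marker
            break
        if in_sequence:
            # Sequence lines start with blanks; EMBL also appends a position count.
            seq_parts.append("".join(ch for ch in line if ch.isalpha()))
            continue
        code, data = line[:2], line[5:].strip()
        if code == "SQ":                   # sequence header; residues follow
            in_sequence = True
        if code.strip() and code != "XX":  # XX lines are EMBL spacers
            fields[code].append(data)
    return dict(fields), "".join(seq_parts).upper()


# A made-up SWISS-PROT-style entry, for illustration only.
sample = """ID   TEST_ENTRY   Reviewed;   10 AA.
AC   EXAMPLE1;
DE   Hypothetical example entry.
SQ   SEQUENCE   10 AA;
     MTHLKITGMT
//
"""
fields, sequence = parse_flat_entry(sample)
print(fields["AC"], sequence)  # ['EXAMPLE1;'] MTHLKITGMT
```

Real entries carry many more line types (OS, OC, FT, and so on); the same grouping handles them without change.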
General Concepts and Methods
● Predictions and Validation
Maxim 17.1 Recognise the difference between the validation of a model and the testing of it for self-consistency
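Testing a model on the data used to build it (self-consistency) can look perfect even when the model has learned nothing general; validation means testing it on data that played no part in building it. The Python sketch below makes the point with a deliberately naive "model" that merely memorises its training examples. The sequences, labels and function names are hypothetical.

```python
def train_lookup_model(examples):
    """'Train' by memorising each training sequence's label (deliberately naive)."""
    return dict(examples)


def predict(model, seq, default=False):
    """Return the memorised label, or a default guess for unseen sequences."""
    return model.get(seq, default)


def accuracy(model, examples):
    """Fraction of (sequence, label) pairs the model labels correctly."""
    correct = sum(1 for seq, label in examples if predict(model, seq) == label)
    return correct / len(examples)


# Hypothetical labelled sequences: True = member of the family of interest.
training = [("MKTAYIAKQR", True), ("GGSGGSGGSG", False),
            ("MKTAYLAKQR", True), ("PPPPAPPPPA", False)]
held_out = [("MKSAYIAKQR", True), ("GGSGGAGGSG", False),
            ("MRTAYIAKQK", True), ("PPAPPPPAPP", False)]

model = train_lookup_model(training)
print("self-consistency:", accuracy(model, training))  # 1.0: it has seen every answer
print("validation:", accuracy(model, held_out))        # 0.5: only the default guess is right
```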
True/False Positives and Negatives
Maxim 17.2 Generally, False Negative predictions are considered more acceptable than False Positives
Figure: Assessment/Validation Procedure and Possible Outcomes
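The four possible outcomes can be tallied by comparing each prediction with the known answer. The Python sketch below counts True/False Positives/Negatives and derives two commonly used summaries: sensitivity, TP/(TP+FN), the share of real positives that were found, and specificity, TN/(TN+FP), the share of real negatives correctly rejected. The function names and example labels are hypothetical.

```python
def confusion_counts(predicted, actual):
    """Count (TP, FP, TN, FN) for paired boolean predictions and true labels."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must be the same length")
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)          # predicted yes, really yes
    fp = sum(1 for p, a in pairs if p and not a)      # predicted yes, really no
    tn = sum(1 for p, a in pairs if not p and not a)  # predicted no, really no
    fn = sum(1 for p, a in pairs if not p and a)      # predicted no, really yes
    return tp, fp, tn, fn


def sensitivity(tp, fn):
    """Fraction of real positives that were found: TP / (TP + FN)."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Fraction of real negatives correctly rejected: TN / (TN + FP)."""
    return tn / (tn + fp)


predicted = [True, True, False, False, True, False]
actual = [True, False, False, True, True, False]
tp, fp, tn, fn = confusion_counts(predicted, actual)
print(tp, fp, tn, fn)  # 2 1 2 1
```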
Balancing the errors
Maxim 17.3 With False Negatives we can come back next year and find the ones we missed. These are preferred to False Positives, where we may spend time studying them this year only to find that the time was wasted. It all depends on the circumstances.
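When a method reports a score and everything at or above a cut-off is called positive, moving the cut-off trades one kind of error for the other: raising it removes False Positives at the cost of more False Negatives. The Python sketch below counts both errors at a few thresholds; the scores, labels and threshold values are made up for illustration.

```python
def errors_at_threshold(scores, actual, threshold):
    """Return (false_positives, false_negatives) when score >= threshold means 'positive'."""
    fp = sum(1 for s, a in zip(scores, actual) if s >= threshold and not a)
    fn = sum(1 for s, a in zip(scores, actual) if s < threshold and a)
    return fp, fn


# Hypothetical prediction scores and true labels, for illustration only.
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.30, 0.10]
actual = [True, True, False, True, False, True, False, False]

for threshold in (0.2, 0.5, 0.85):
    fp, fn = errors_at_threshold(scores, actual, threshold)
    print(f"threshold={threshold}: FP={fp}, FN={fn}")
```

On these data the counts move from 3 False Positives and no False Negatives at the lowest cut-off to none and 2 at the highest; which end is preferable is exactly the judgement the maxim describes.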
Maxim 17.4 Sometimes all those false positives are maybe, just maybe, trying to tell you something. So, if you aspire to a Nobel prize...
Using multiple algorithms to improve performance
Maxim 17.5 Use a fast if inaccurate algorithm to protect your slow, accurate second-stage algorithm
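One way to apply this maxim is a filter cascade: a cheap test discards most candidates and only the survivors reach the expensive method. In the Python sketch below, which illustrates the idea and is not any particular published tool, the fast stage keeps database sequences that share at least one length-k word with the query, and the slow stage computes a Needleman-Wunsch global alignment score for the survivors only. The scoring values (match +1, mismatch -1, gap -2), k = 3 and the toy database are hypothetical. Note that the filter can itself create False Negatives: a true relative sharing no k-word with the query never reaches the second stage.

```python
def kmers(seq, k):
    """Set of all length-k words in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def fast_filter(query, database, k=3):
    """Stage 1 (cheap): names of sequences sharing at least one k-word with the query."""
    query_words = kmers(query, k)
    return [name for name, seq in database.items() if query_words & kmers(seq, k)]


def global_score(a, b, match=1, mismatch=-1, gap=-2):
    """Stage 2 (expensive, O(len(a) * len(b))): Needleman-Wunsch global alignment score."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return prev[-1]


def two_stage_search(query, database, k=3):
    """Run the slow aligner only on sequences that survive the fast filter."""
    survivors = fast_filter(query, database, k)
    return {name: global_score(query, database[name]) for name in survivors}


# Hypothetical toy database, for illustration only.
database = {"seq1": "MKTAYIAKQR", "seq2": "GGGGGGGG", "seq3": "MKTAYLAKQR"}
print(two_stage_search("MKTAYIAKQR", database))  # {'seq1': 10, 'seq3': 8}
```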
Figure: An overview of tRNA: 2D, 3D and Gene Structure
Introducing Bioinformatics Tools
ftp://ftp.ebi.ac.uk/pub/software
ClustalW
Figure: ClustalX operating under Windows XP
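ClustalW writes its alignments in the .aln format: a header line beginning with CLUSTAL, then blocks in which each line holds a sequence name and a stretch of gapped residues, each block followed by a conservation line (*, :, .) that starts with blanks. The Python sketch below rebuilds each full aligned sequence from those blocks. It assumes this usual layout; the function name and the small sample alignment are hypothetical.

```python
def parse_clustal_aln(text):
    """Parse ClustalW .aln text into {sequence name: gapped aligned sequence}.

    Assumes a header line starting with 'CLUSTAL', then blocks of
    'name  residues [optional running count]' lines separated by blank lines,
    each block followed by a conservation line that starts with whitespace.
    """
    alignment = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("CLUSTAL"):
            continue                      # blank separator or header line
        if line[0].isspace():
            continue                      # conservation line ('*', ':', '.')
        parts = line.split()
        name, residues = parts[0], parts[1]  # ignore any trailing residue count
        alignment[name] = alignment.get(name, "") + residues
    return alignment


# A made-up two-sequence alignment, for illustration only.
sample = """CLUSTAL W (1.83) multiple sequence alignment


seqA            MKT-AYIAKQ
seqB            MKTLAYLAKQ
                *** **:***

seqA            RQ
seqB            RQ
                **
"""
print(parse_clustal_aln(sample))
```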
$ gzip -d clustalw1.83.UNIX.tar.gz
$ tar -xvf clustalw1.83.UNIX.tar
$ cd clustalw1.83
$ make
$ ./clustalw
$ ./clustalw -h
$ ./clustalw -INFILE=../MerAHMAs_MerP.swp -OUTFILE=../Mer.aln
Algorithms and Methods
Substitution/scoring matrices
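A substitution matrix assigns a score to every pair of residues, and an ungapped alignment scores the sum of its aligned pairs. Matrices such as BLOSUM62 are log-odds matrices: each entry compares how often two residues are seen aligned in related sequences (q_ab) with how often they would be paired by chance (p_a p_b). BLOSUM62 is expressed in half-bit units, so s(a,b) = round(2 log2(q_ab / (p_a p_b))). The Python sketch below includes only a handful of BLOSUM62 entries, enough for the example; the frequencies passed to half_bit_score are made up.

```python
import math

# A small subset of BLOSUM62 (symmetric); a real program would load the full 20x20 matrix.
BLOSUM62_SUBSET = {
    ("A", "A"): 4, ("I", "I"): 4, ("L", "L"): 4, ("V", "V"): 4,
    ("I", "L"): 2, ("I", "V"): 3, ("L", "V"): 1,
    ("A", "I"): -1, ("A", "L"): -1, ("A", "V"): 0,
}


def pair_score(a, b, matrix=BLOSUM62_SUBSET):
    """Symmetric lookup; raises KeyError for pairs missing from the subset."""
    return matrix[(a, b)] if (a, b) in matrix else matrix[(b, a)]


def ungapped_score(seq1, seq2, matrix=BLOSUM62_SUBSET):
    """Sum pair scores over two equal-length, gap-free aligned sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("ungapped alignment needs equal-length sequences")
    return sum(pair_score(a, b, matrix) for a, b in zip(seq1, seq2))


def half_bit_score(q_ab, p_a, p_b):
    """Log-odds score in half-bit units: round(2 * log2(q_ab / (p_a * p_b)))."""
    return round(2 * math.log2(q_ab / (p_a * p_b)))


print(ungapped_score("AILV", "AVLI"))    # 4 + 3 + 4 + 3 = 14
print(half_bit_score(0.02, 0.05, 0.05))  # made-up frequencies: 2 * log2(8) = 6
```

A positive entry means the pair is seen aligned more often than chance predicts; I and V, for example, score +3 because they frequently replace each other.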
BLAST
Maxim 17.6 Exactly which BLAST is best depends on the circumstances
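Whichever BLAST program is chosen, its speed comes from a seed-and-extend heuristic: it finds short word hits between query and subject, then extends them, abandoning an extension once the running score drops too far below the best seen (an X-drop rule). The Python sketch below is a much-simplified illustration of that idea for DNA, using exact word matches and ungapped extension only. It is not NCBI's implementation, and the word size and scores (match +1, mismatch -3, X-drop 5) are chosen for illustration.

```python
def word_hits(query, subject, w):
    """Yield (i, j) wherever query[i:i+w] == subject[j:j+w] (exact seed words)."""
    index = {}
    for j in range(len(subject) - w + 1):
        index.setdefault(subject[j:j + w], []).append(j)
    for i in range(len(query) - w + 1):
        for j in index.get(query[i:i + w], []):
            yield i, j


def extend_hit(query, subject, i, j, w, match=1, mismatch=-3, x_drop=5):
    """Extend an exact w-letter seed at query[i], subject[j] without gaps.

    Each direction is extended until the running score falls more than x_drop
    below the best seen, then trimmed back to that best point.
    Returns (score, query_start, query_end).
    """
    def best_extension(steps):
        best, score, best_len = 0, 0, 0
        for length, (a, b) in enumerate(steps, start=1):
            score += match if a == b else mismatch
            if score > best:
                best, best_len = score, length
            elif best - score > x_drop:
                break
        return best, best_len

    right_score, right_len = best_extension(zip(query[i + w:], subject[j + w:]))
    left_score, left_len = best_extension(zip(reversed(query[:i]), reversed(subject[:j])))
    score = w * match + left_score + right_score  # the seed itself is all matches
    return score, i - left_len, i + w + right_len


def best_ungapped_hit(query, subject, w=4):
    """Extend every seed and return the best (score, start, end), or None."""
    return max((extend_hit(query, subject, i, j, w)
                for i, j in word_hits(query, subject, w)), default=None)


print(best_ungapped_hit("GGGGACGTACGTGGGG", "TTTTACGTACGTTTTT"))  # (8, 4, 12)
```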
$ cd
$ mkdir blast
$ cp blast ia32-linux.tar.gz blast
$ cd blast
$ gzip -d blast ia32-linux.tar.gz
$ tar -xvf blast ia32-linux.tar
[NCBI]
Data="/home/michael/blast/data"
Installing NCBI-BLAST
$ mkdir databases
$ cd databases
$ mv ../All_Mer_Proteins.fsa .
$ ../formatdb -i All_Mer_Proteins.fsa -p T -o T -n Merproteins
$ blastall -p blastp -d databases/Merproteins -i test_seq.fsa
$ sed 's/sw|/sp|/' All_Mer_Proteins.fsa > Mer_db.prot
$ ../formatdb -i Mer_db.prot -p T -o T -n Merproteins
Preparation of database files for faster searching
$ fastacmd -d databases/Merproteins -I
$ fastacmd -d databases/Merproteins -s MERA_SHIFL
$ blastclust -d databases/Merproteins | head
The different types of BLAST search
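The classic NCBI-BLAST programs differ in the kinds of query and database they compare and in whether nucleotide sequences are first translated into protein. The program names below are the standard ones accepted by blastall -p; the lookup table and the function name choose_blast_program are a hypothetical summary, sketched in Python.

```python
# (query type, database type, translate nucleotides?) -> blastall -p program name
BLAST_PROGRAMS = {
    ("nucleotide", "nucleotide", False): "blastn",
    ("protein", "protein", False): "blastp",
    ("nucleotide", "protein", True): "blastx",      # query translated in all six frames
    ("protein", "nucleotide", True): "tblastn",     # database translated in all six frames
    ("nucleotide", "nucleotide", True): "tblastx",  # both sides translated
}


def choose_blast_program(query_type, db_type, translate=False):
    """Return the BLAST program for a query/database combination (hypothetical helper)."""
    # A protein sequence can only be compared with nucleotides via translation.
    if query_type != db_type:
        translate = True
    try:
        return BLAST_PROGRAMS[(query_type, db_type, translate)]
    except KeyError:
        raise ValueError(
            f"no BLAST program for {query_type}/{db_type}, translate={translate}"
        ) from None


print(choose_blast_program("protein", "protein"))        # blastp
print(choose_blast_program("nucleotide", "protein"))     # blastx
```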
Where To From Here