Published by Peregrine White. Modified over 8 years ago.
1
PROTEIN IDENTIFIER
IAN ROBERTS, JOSEPH INFANTI, NICOLE FERRARO
2
BACKGROUND
Our program uses BLAST to identify protein sequences similar to a query sequence. BLAST (Basic Local Alignment Search Tool) is a family of algorithms that takes a query sequence and searches it against a sequence database, such as those hosted by NCBI, to find sequences with a certain level of similarity. Here, we report the top sequence hits and their associated E-values. The E-value represents the number of hits with at least that score expected by chance, given the database size, and it decreases exponentially as a hit’s score increases.
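To make the relationship between score and E-value concrete, here is a minimal sketch using the standard bit-score form E = m * n * 2^(-S'), where S' is the bit score, m the query length, and n the database length. The function name and the example lengths are illustrative assumptions, and BLAST itself uses edge-corrected "effective" lengths, so reported values will differ somewhat.

```python
def expected_hits(bit_score: float, query_len: int, db_len: int) -> float:
    """Approximate E-value from a bit score: E = m * n * 2**(-S').

    Assumption: raw lengths are used instead of BLAST's effective lengths.
    """
    return query_len * db_len * 2.0 ** (-bit_score)


# Hypothetical search: a 150-residue query against 1,000,000 database residues.
m, n = 150, 1_000_000
for score in (30, 40, 50):
    print(score, expected_hits(score, m, n))
```

Each extra 10 bits of score divides the E-value by 2^10 = 1024, which is the exponential decrease described above.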
3
PROBLEM DESCRIPTION
Our program lets the user input a FASTA-formatted file containing a protein sequence. It runs a BLAST search against a local database, outputs the top hits, and provides a number of tools for analyzing those hits to determine which may be best for future research.
4
DEMO
The GUI lets the user select a file from the computer and then press the BLAST button to run the search. Other buttons then become available for more information on the hits returned by the BLAST search.
5
FLOWCHART
1. Amino acid FASTA sequence input, selected by user
2. Bash script executes to create local database
3. Bash script runs query sequence against the database and stores output in an XML file
4. Python script runs to parse the XML output and create a FASTA-formatted output file with top hits
5. Matlab BLAST button callback parses the FASTA hits output and displays output in table
Analysis tools available from the table: 3D molecular view, sequence alignment function, enzyme cleave function
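The XML-parsing step in the flowchart could look roughly like the sketch below. It is not the authors' script: it assumes the standard BLAST XML layout (Hit, Hit_id, Hit_def, Hsp_evalue, Hsp_hseq elements), uses a tiny made-up sample document, and writes the aligned part of each hit (Hsp_hseq) rather than the full subject sequence.

```python
import xml.etree.ElementTree as ET

# Made-up sample in the standard BLAST XML layout (identifiers are hypothetical).
SAMPLE_XML = """
<BlastOutput>
  <BlastOutput_iterations>
    <Iteration>
      <Iteration_hits>
        <Hit>
          <Hit_id>hyp_0002</Hit_id>
          <Hit_def>hypothetical hemoglobin subunit B</Hit_def>
          <Hit_hsps><Hsp>
            <Hsp_evalue>2e-10</Hsp_evalue>
            <Hsp_hseq>MVHLTPEEKS</Hsp_hseq>
          </Hsp></Hit_hsps>
        </Hit>
        <Hit>
          <Hit_id>hyp_0001</Hit_id>
          <Hit_def>hypothetical hemoglobin subunit A</Hit_def>
          <Hit_hsps><Hsp>
            <Hsp_evalue>1e-50</Hsp_evalue>
            <Hsp_hseq>MVLSPADK-TN</Hsp_hseq>
          </Hsp></Hit_hsps>
        </Hit>
      </Iteration_hits>
    </Iteration>
  </BlastOutput_iterations>
</BlastOutput>
""".strip()


def top_hits(xml_text, limit=5):
    """Return up to `limit` hits, best (lowest) E-value first."""
    root = ET.fromstring(xml_text)
    hits = []
    for hit in root.iter("Hit"):
        hsp = hit.find("Hit_hsps/Hsp")  # first HSP of this hit
        if hsp is None:
            continue
        hits.append({
            "id": hit.findtext("Hit_id"),
            "description": hit.findtext("Hit_def"),
            "evalue": float(hsp.findtext("Hsp_evalue")),
            # Drop alignment gap characters so the output is a plain sequence.
            "sequence": hsp.findtext("Hsp_hseq").replace("-", ""),
        })
    hits.sort(key=lambda h: h["evalue"])
    return hits[:limit]


def to_fasta(hits):
    """Format hits as FASTA text, one header line and one sequence line each."""
    return "".join(
        f">{h['id']} {h['description']} E={h['evalue']:g}\n{h['sequence']}\n"
        for h in hits
    )


print(to_fasta(top_hits(SAMPLE_XML)))
```

Sorting by E-value is a defensive choice here; BLAST normally lists hits best-first already.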
6
INPUT DATA
- The user selects a FASTA-formatted amino acid sequence file as input to the BLAST search
- The BLAST function creates a local database from previously collected hemoglobin sequences, also in FASTA format
- The output of the BLAST search is an XML file, which is passed to a parsing function to create a .txt file of the top hits, which is used to display the results to the user
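The slides do not show the bash script, so the following is only a guess at what the two commands might be, assuming the NCBI BLAST+ command-line tools (`makeblastdb` and `blastp`) are used. The file and database names are hypothetical. The sketch only builds and prints the commands; it does not run them.

```python
import shlex


def makeblastdb_command(fasta_path, db_name):
    """Command to build a local protein database from a FASTA file."""
    return ["makeblastdb", "-in", fasta_path, "-dbtype", "prot", "-out", db_name]


def blastp_command(query_path, db_name, xml_path):
    """Command to search the database and write XML output (-outfmt 5)."""
    return ["blastp", "-query", query_path, "-db", db_name,
            "-outfmt", "5", "-out", xml_path]


# Hypothetical file names, for illustration only.
commands = [
    makeblastdb_command("hemoglobin.fasta", "hemoglobin_db"),
    blastp_command("query.fasta", "hemoglobin_db", "results.xml"),
]
for cmd in commands:
    print(shlex.join(cmd))
```

With BLAST+ installed, each list could be passed to `subprocess.run(cmd, check=True)` in place of a separate bash script.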
7
DATA STRUCTURES
- The function that parses the .txt output file from the BLAST function returns the top hits in a cell array
- The cell array is stored in a uitable so the data can be accessed by other functions
- Other data, such as molecular weight, sequence length, and alignments, are stored in single variables that can be displayed to the user
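As a rough Python analogue of the cell array described above (the project itself does this step in Matlab), the sketch below reads FASTA text into a list of rows, one per hit. The column layout and the sample records are assumptions for illustration.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def to_table_rows(records):
    """Rows of [header, sequence length, sequence], like a table of hits."""
    return [[header, len(seq), seq] for header, seq in records]


# Hypothetical hits file content.
SAMPLE_HITS = """>hit_1 hypothetical hemoglobin alpha
MVLSPADKTN
VKAAWGKV
>hit_2 hypothetical hemoglobin beta
MVHLTPEEKS
"""

for row in to_table_rows(read_fasta(SAMPLE_HITS)):
    print(row)
```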
8
RESULTS
- The user can view the molecular weight and sequence length of each BLAST hit
- The user can align any two sequences from the hits to see their similarity
- The user can cleave the query sequence at a selected amino acid to see what fragments are produced
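Two of these results, molecular weight and cleavage fragments, are simple enough to sketch. The version below is an illustration, not the project's Matlab code: it assumes average amino acid masses in daltons, subtracts one water per peptide bond, and cuts on the C-terminal side of every occurrence of the chosen residue.

```python
# Average masses of the free amino acids, in daltons.
AVERAGE_MASS = {
    "G": 75.07, "A": 89.09, "S": 105.09, "P": 115.13, "V": 117.15,
    "T": 119.12, "C": 121.16, "L": 131.17, "I": 131.17, "N": 132.12,
    "D": 133.10, "Q": 146.15, "K": 146.19, "E": 147.13, "M": 149.21,
    "H": 155.16, "F": 165.19, "R": 174.20, "Y": 181.19, "W": 204.23,
}
WATER = 18.015  # mass lost for each peptide bond formed


def molecular_weight(sequence):
    """Approximate average molecular weight of a peptide, in daltons."""
    sequence = sequence.upper()
    if not sequence:
        return 0.0
    total = sum(AVERAGE_MASS[aa] for aa in sequence)
    return total - (len(sequence) - 1) * WATER


def cleave_after(sequence, residue):
    """Split a sequence after every occurrence of `residue`."""
    fragments, start = [], 0
    for i, aa in enumerate(sequence):
        if aa == residue:
            fragments.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        fragments.append(sequence[start:])  # trailing piece with no cut site
    return fragments


# Hypothetical query fragment.
query = "MVLSPADKTNVKAAWGKV"
for fragment in cleave_after(query, "K"):
    print(fragment, len(fragment), round(molecular_weight(fragment), 2))
```

Real enzymes have more specific rules than a single residue (for example, exceptions depending on the neighboring residue), which this sketch ignores.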
9
CHALLENGES
- The current version of Matlab cannot create local BLAST databases or run BLAST queries
- The path to Python must be set explicitly, using setenv, for the bash script to run through Matlab
- The local database only has hemoglobin sequences
- Unable to write cell array data to an exported file (still unsolved)
- We were not able to obtain PDB IDs for each BLAST hit (still unsolved)
10
DIVISION OF WORK
- Ian: GUI creation, code to get each hit’s molecular weight and sequence length, 3D Molecular Viewer function
- Nicole: creating the local BLAST database, running the BLAST query, parsing the XML output and then the FASTA output, and the alignment function
- Joseph: cleave function and biological research