Chem 291C Draft Sample Preliminary Seminar


1 Identifying Protein Related Sequences using Computational DNA Hybridization Methods in Whole Genome
Chem 291C Draft Sample Preliminary Seminar
Originally presented: October 23, 2008
Chewei Hsu
Research advisor: Dr. Brooke Lustig
Committee members: Dr. Elaine Collins, Dr. Roger Terrill

2 What is the principle used in this research?
Introduction
Bioinformatics is a field that combines computer science, statistics, chemistry, and biochemistry to provide a variety of approaches for using biological information. A query sequence is aligned to a subject sequence in order to identify similarity.
Query   A T C - G
          |     |
Subject T T G T G
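As a minimal sketch of what "identifying similarity" can mean for the toy alignment on this slide, the snippet below counts identical columns in two already-aligned strings. The function name `percent_identity` and the use of `-` as the gap character are assumptions made for illustration; they are not taken from the seminar.

```python
def percent_identity(query: str, subject: str) -> float:
    """Percentage of aligned columns in which both sequences carry the same base."""
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have equal length")
    # A gap character never counts as a match.
    matches = sum(1 for q, s in zip(query, subject) if q == s and q != "-")
    return 100.0 * matches / len(query)


# The five-column example from the slide: T/T and G/G are the matching columns.
print(percent_identity("ATC-G", "TTGTG"))
```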

3 What programs are used for sequence alignment?
There are various programs used for sequence alignment. The most widely used is the Basic Local Alignment Search Tool (BLAST), a free online program. Its calculation is based on the Smith-Waterman algorithm using pattern-derived scoring. In BLASTN, the version for nucleic acids, the scoring is relatively simplistic. It may be useful to employ hybridization energy rules as an alternative approach.

4 What is the Smith-Waterman model?
It is a recursive dynamic programming method that performs local alignment for sequence analysis.
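The recursion can be sketched as follows. This is a generic character-scored Smith-Waterman that returns only the best local score; the match, mismatch, and gap values are illustrative assumptions, not the scoring used by BLAST or by the Lustig group programs.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between a and b with a linear gap penalty."""
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] holds the best score of any local alignment ending at a[i-1], b[j-1].
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = H[i - 1][j] + gap      # gap in b
            left = H[i][j - 1] + gap    # gap in a
            # The floor at zero is what makes the alignment local.
            H[i][j] = max(0, diag, up, left)
            best = max(best, H[i][j])
    return best


print(smith_waterman("TTACGTTT", "GGACGGG"))  # shared "ACG" gives 3 matches
```

An energy-based variant would, in principle, replace the match and mismatch constants with pairing free energies while keeping the same recursion.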

5 Background Key Definitions
Sequence alignment: the process of aligning two or more sequences to maximize the apparent homology between them. Mismatch: a position in the alignment where the bases of the two sequences do not match. Gap: a space introduced into an alignment to compensate for insertions and deletions in one sequence relative to another. To prevent the accumulation of too many gaps in an alignment, introducing a gap deducts a fixed amount (the gap score) from the alignment score.
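To make the definitions concrete, here is a small sketch that scores an existing gapped alignment column by column. The scoring values (+1 match, -1 mismatch, -2 per gap) and the function name `alignment_score` are assumptions chosen for illustration only.

```python
def alignment_score(query: str, subject: str,
                    match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Sum per-column scores of two aligned sequences of equal length ('-' marks a gap)."""
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have equal length")
    score = 0
    for q, s in zip(query, subject):
        if q == "-" or s == "-":
            score += gap        # fixed deduction for each gap position
        elif q == s:
            score += match
        else:
            score += mismatch
    return score


# Four matches, one mismatch, one gap: 4 - 1 - 2 = 1
print(alignment_score("ACGT-A", "ACCTGA"))
```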

6 Key Definitions (cont.)
Query sequence: the sequence of interest, which is compared with subject sequences for homology. Hybridization: the annealing of two strands of nucleic acids. Hybridization energy: the free energy of such annealing. Genome: the set of DNA sequences and genes that contains the whole hereditary information of an organism. Introns: DNA regions in a gene that are not translated into proteins. These non-coding sections are removed from the transcribed RNA by a process called splicing.

7 Key Definitions (cont.)
Exon: a nucleic acid sequence that is represented in the mature form of an RNA molecule. The mature RNA molecule can be messenger RNA or a non-coding RNA such as rRNA or tRNA. ncRNA: any RNA molecule that functions without being translated into protein.

8 Literature Review Zuker and coworkers (2003, 2004)
Developed a tool for predicting the secondary structure of RNA and DNA using thermodynamic methods; it can be adapted to model hybridization. It provides a general statistical mechanical approach that describes self-folding together with the hybridization between a pair of DNA or RNA molecules. The folding and hybridization models deal with matched pairs, mismatches, and symmetric and asymmetric interior loops.

9 Lustig group (2003, 2006)
BLAST and its more computationally intensive predecessor, the Smith-Waterman algorithm, use a scoring scheme that is character-based and not intuitive, but that works well for sequence alignment (including for nucleic acids). The Lustig group’s interest is to see whether energy rules can properly be used to score and align sequences.

10 Lustig group (cont.) First, this involves converting the query DNA sequence such that A->T, G->C, etc., and then letting the subject DNA sequence hybridize to the query sequence (note earlier energy set from RNAs).
Query: (10)  g a g a a g g g c a a g a a
Energ:
Sbjct: (158) c t c t t t c c g t t c t t
Query:       a a t t t t t g t c c a a (36)
Energ:
Sbjct:       c t c a a a a a c a a g t t (185)
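The A->T, G->C conversion can be sketched as below, using the first query/subject block from this slide as input. The helper names `complement` and `pairing_mask` are hypothetical, and the mask only flags Watson-Crick pairing per position; it does not reproduce the energy values of the missing "Energ:" row.

```python
# Watson-Crick partner of each DNA base (lowercase, as on the slide).
COMPLEMENT = {"a": "t", "t": "a", "g": "c", "c": "g"}


def complement(seq: str) -> str:
    """Base-by-base complement (not reversed), i.e. A->T, G->C, and so on."""
    return "".join(COMPLEMENT[base] for base in seq.lower())


def pairing_mask(query: str, subject: str) -> str:
    """'|' where the subject base is the Watson-Crick partner of the query base, '.' otherwise."""
    return "".join(
        "|" if COMPLEMENT.get(q) == s else "."
        for q, s in zip(query.lower(), subject.lower())
    )


query = "gagaagggcaagaa"    # Query positions starting at 10
subject = "ctctttccgttctt"  # Sbjct positions starting at 158
print(complement(query))
print(pairing_mask(query, subject))
```

In this block the mask shows a single non-pairing position (query g opposite subject t), which is where an energy-based score would apply a mismatch term.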

11 Lustig group (cont.) Second, there are strong correlations between the energy-based scores and the corresponding BLAST scores for a representative set of thirty thousand alignments representing DNA sequences that express proteins (and their introns) and ncRNAs.

12 Lustig group (cont.)

13 Lustig group (cont.) Third, and more importantly, the energies of hybridization per nucleotide do show some specificity with respect to the class of DNA region (e.g., protein-expressing vs. ncRNA).

14 Lustig group (cont.) Frequency

15 Lustig group (cont.) Fourth, slip pairing may offer an opportunity to easily randomize hybridization energy data.
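The transcript does not define slip pairing, so the following is only a sketch under a stated assumption: that slipping means sliding one strand of an aligned pair by a fixed offset and rescoring the overlap. The fraction of Watson-Crick paired positions is used here as a simple stand-in for a hybridization energy; the names `paired_fraction` and `slip_scores` are hypothetical.

```python
PAIRS = {("a", "t"), ("t", "a"), ("g", "c"), ("c", "g")}


def paired_fraction(query: str, subject: str) -> float:
    """Fraction of overlapping positions that form a Watson-Crick pair."""
    n = min(len(query), len(subject))
    if n == 0:
        return 0.0
    hits = sum((q, s) in PAIRS for q, s in zip(query, subject))
    return hits / n


def slip_scores(query: str, subject: str, max_slip: int) -> dict:
    """Score the pair at every register offset from -max_slip to +max_slip.

    Offset 0 is the original alignment; the other offsets are the slipped
    (assumed to be effectively randomized) registers.
    """
    scores = {}
    for k in range(-max_slip, max_slip + 1):
        if k >= 0:
            q, s = query[k:], subject      # slide the query left by k
        else:
            q, s = query, subject[-k:]     # slide the subject left by |k|
        scores[k] = paired_fraction(q, s)
    return scores


for offset, score in slip_scores("gagaagggcaagaa", "ctctttccgttctt", 3).items():
    print(offset, round(score, 3))
```

Collecting the non-zero offsets over many alignments would give one possible background distribution against which the offset-0 score could be compared.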

16 Lustig group (cont.)

17 Tjaden and coworkers (2006, 2008)
Utilize the Smith-Waterman dynamic programming approach, with energies used in the scoring (TargetRNA web server program). Two models are available: the individual basepairing model and the stacked basepairing model. The individual basepairing model is analogous to the Smith-Waterman program. - They first characterized non-coding RNA targets (>50 nt) in bacterial genomic expression libraries.

18 Tjaden and coworkers (cont.)
- More recently, they have identified smaller RNAs from bacterial genomes using only individual base-pairing (excluding stacking is computationally less intensive). - A problem is that there may be issues with the statistics for the alignment/hybridization of such small targets.

19 SantaLucia & Hicks (2004)
Nearest-neighbor thermodynamic parameters for DNA Watson-Crick pairs in 1 M NaCl
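The table from this slide did not survive in the transcript. As a hedged sketch of how nearest-neighbor parameters are applied to a fully Watson-Crick paired duplex, the code below sums stacking increments along one strand and adds initiation, terminal AT, and symmetry terms. The numeric values are the commonly quoted unified nearest-neighbor set and are an assumption here; they should be checked against the published table before any real use. The function name `duplex_dg37` is hypothetical.

```python
# dG37 increments in kcal/mol, keyed by the 5'->3' dinucleotide on one strand.
# ASSUMPTION: commonly quoted unified nearest-neighbor values, not read from
# this transcript; verify against the published table.
NN_DG37 = {
    "AA": -1.00, "TT": -1.00,
    "AT": -0.88,
    "TA": -0.58,
    "CA": -1.45, "TG": -1.45,
    "GT": -1.44, "AC": -1.44,
    "CT": -1.28, "AG": -1.28,
    "GA": -1.30, "TC": -1.30,
    "CG": -2.17,
    "GC": -2.24,
    "GG": -1.84, "CC": -1.84,
}
INIT_DG37 = 1.96         # duplex initiation
TERMINAL_AT_DG37 = 0.05  # penalty per terminal A-T pair
SYMMETRY_DG37 = 0.43     # correction for self-complementary duplexes
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def duplex_dg37(seq: str) -> float:
    """Estimated dG37 for seq hybridized to its perfect reverse complement."""
    seq = seq.upper()
    if len(seq) < 2:
        raise ValueError("need at least two bases")
    total = INIT_DG37
    total += sum(NN_DG37[seq[i:i + 2]] for i in range(len(seq) - 1))
    total += TERMINAL_AT_DG37 * sum(end in "AT" for end in (seq[0], seq[-1]))
    if seq == seq.translate(COMPLEMENT)[::-1]:
        total += SYMMETRY_DG37
    return total


print(round(duplex_dg37("CGTTGA"), 2))  # -5.35
```

Mismatches and loops would need the additional increment tables referenced on the following slides, which this sketch does not cover.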

20 SantaLucia & Hicks (cont.)
Nearest-neighbor ∆G37 increments (kcal mol⁻¹) for internal single mismatches next to Watson-Crick pairs in 1 M NaCl

21 SantaLucia & Hicks (cont.)
∆G37 increments (kcal mol⁻¹) for length dependence of loop motifs in 1 M NaCl

22 SantaLucia & Hicks (cont.)

23 Computational Methods
Research Objectives
Potnis (2007) modified Perl software from Nair (2005) that evaluates existing BLAST alignments based on the energy rules of SantaLucia & Hicks (2004). Busani (2007) modified the earlier Lustig group energy-based Smith-Waterman program to use the newer energy rules.

24 Materials/Resources
BLAST; Perl suite of programs (Lustig Group); Energy-Based SW (Lustig Group); SJSU Chemistry Cluster

25 Research Objectives & Goals
Modify the Potnis (2007) program to handle more generalized slip pairing of sequences. Determine whether slip pairing can create random frequency distributions that may allow for a statistical equivalent of the E values used in BLAST. Reload the Busani (2007) program on the new Chemistry Cluster and finish the whole-genome analysis by energy-based SW.

26 Research Objectives & Goals (cont.)
Explore the relationship between HE (hybridization energy) values and BLAST score. Characterize HE as a function of sequence length. Validate the energy-based Smith-Waterman algorithm in whole-genome search. Explore HE statistics with respect to alignment “validity”. At a minimum, determine whether one can create a reasonable random probability distribution.

27 References
Busani, S., “Master Thesis-Project: Energy Based Evaluation of Protein and Other Sequence Information”, San Jose State University, 2007.
Nair, J., “Master Thesis: Alignment-shift analysis using hybridization energy rules”, San Jose State University, 2005.
Potnis, S., “Master Thesis-Project: Characterization signatures of protein sequences and related species”, San Jose State University, 2007.
SantaLucia, J. Jr.; Hicks, D., The thermodynamics of DNA structure motifs. 2004, Annu. Rev. Biophys. Biomol. Struct., 33,
Tjaden, B., TargetRNA: a tool for predicting targets of small RNA action in bacteria. 2008, Nucl. Acids Res., 36, Web Server issue, W109-W113.
Tjaden, B.; Goodwin, S. S.; Opdyke, J. A.; Guillier, M.; Fu, D. X.; Gottesman, S.; Storz, G., Target prediction for small, noncoding RNAs in bacteria. 2006, Nucl. Acids Res., 34,
Wander, D.; Yang, F.; McNeil, M.; Lustig, B., Scoring DNA sequence alignment using energetics of hybridization. 2003, T2/555b(1-8).
Zuker, M.; Dimitrov, R., Prediction of hybridization and melting for double-stranded nucleic acids. 2004, Biophys. J., 87,
Zuker, M., Mfold web server for nucleic acid folding and hybridization prediction. 2003, Nucl. Acids Res., 31,

