Chem 291C Draft Sample Preliminary Seminar

Identifying Protein Related Sequences using Computational DNA Hybridization Methods in Whole Genome
Chem 291C Draft Sample Preliminary Seminar
Originally Presented: October 23, 2008
Chewei Hsu
Research advisor: Dr. Brooke Lustig
Committee members: Dr. Elaine Collins, Dr. Roger Terrill

Introduction
What is the principle used in this research?
Bioinformatics is a field that combines computer science, statistics, chemistry, and biochemistry to provide a variety of approaches for using biological information.
A query sequence is aligned to a subject sequence in order to identify similarity.
Query    A T C - G
           |     |
Subject  T T G T G

What programs are used for sequence alignment? Various programs are used for sequence alignment. The most widely used is the Basic Local Alignment Search Tool (BLAST), a free online program. Its calculation is based on the Smith-Waterman algorithm and uses pattern-derived scoring. In BLASTN, the version for nucleic acids, that scoring is relatively simplistic. It may be useful to employ hybridization energy rules as an alternative approach.

What is the Smith-Waterman model? It is a recursive dynamic programming method that finds local alignments for sequence analysis.
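The recursion behind this model can be sketched in a few lines of Python. This is a minimal illustration with a linear gap penalty; the function name and the match, mismatch, and gap values are assumptions chosen for the example, not the parameters used by BLAST or by the programs discussed in this seminar.

```python
def smith_waterman(query, subject, match=1, mismatch=-1, gap=-2):
    """Return (best_score, aligned_query, aligned_subject) for one local alignment.

    Scoring values are illustrative assumptions, not BLAST defaults.
    """
    rows, cols = len(query) + 1, len(subject) + 1
    # H[i][j] holds the best score of a local alignment ending at query[i-1], subject[j-1].
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if query[i - 1] == subject[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + pair, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero cell is reached.
    i, j = best_pos
    aligned_q, aligned_s = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        pair = match if query[i - 1] == subject[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + pair:
            aligned_q.append(query[i - 1])
            aligned_s.append(subject[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            aligned_q.append(query[i - 1])
            aligned_s.append("-")
            i -= 1
        else:
            aligned_q.append("-")
            aligned_s.append(subject[j - 1])
            j -= 1
    return best, "".join(reversed(aligned_q)), "".join(reversed(aligned_s))
```

Clamping every cell at zero is what makes the alignment local: a poorly matching region resets the score instead of dragging down a good match elsewhere.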

Background
Key Definitions
Sequence alignment: the process of arranging two or more sequences so that their similarity, and hence possible homology, is maximized.
Mismatch: a position in an alignment where the bases of the two sequences do not match.
Gap: a space introduced into an alignment to compensate for insertions and deletions in one sequence relative to another. To prevent the accumulation of too many gaps in an alignment, each gap introduced deducts a fixed amount (the gap score) from the alignment score.
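To make the roles of matches, mismatches, and gaps concrete, here is a small sketch that scores an alignment that has already been written out with "-" for gaps. The function name and the three score values are assumptions for illustration only.

```python
def score_alignment(aligned_query, aligned_subject, match=1, mismatch=-1, gap=-2):
    """Sum per-column scores of a gapped alignment (illustrative scoring values)."""
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have the same length")
    score = 0
    for q, s in zip(aligned_query, aligned_subject):
        if q == "-" or s == "-":
            score += gap        # fixed deduction for each gap position
        elif q == s:
            score += match
        else:
            score += mismatch
    return score
```

Applied to the introductory example (query ATC-G against subject TTGTG), the two matches, two mismatches, and one gap give 1 + 1 - 1 - 1 - 2 = -2 under these assumed values.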

Key Definitions (cont.)
Query sequence: the sequence of interest, which is compared with subject sequences for homology.
Hybridization: the annealing of two strands of nucleic acid.
Hybridization energy: the free energy of such annealing.
Genome: the complete set of DNA sequences, including genes, of an organism. It contains the organism's entire hereditary information.
Introns: DNA regions within a gene that are not translated into protein. These non-coding sections are removed from the RNA transcript by a process called splicing.

Key Definitions (cont.)
Exon: a nucleic acid sequence that is represented in the mature form of an RNA molecule. The mature RNA can be messenger RNA or a non-coding RNA such as rRNA or tRNA.
ncRNA (non-coding RNA): any RNA molecule that functions without being translated into protein.

Literature Review
Zuker and coworkers (2003, 2004)
- Developed a tool for predicting the secondary structure of RNA and DNA using thermodynamic methods; it can be adapted to model hybridization.
- Provide a general statistical mechanical approach that describes self-folding together with hybridization between a pair of DNA or RNA molecules.
- The folding and hybridization models deal with matched pairs, mismatches, and symmetric and asymmetric interior loops.

Lustig group (2003, 2006)
BLAST and its more computationally intensive predecessor, the Smith-Waterman algorithm, use a scoring scheme that is character-based and not intuitive, but that works well specifically for sequence alignment (including for nucleic acids). The Lustig group's interest is to see whether energy rules can properly be used to score and align sequences.

Lustig group (cont.)
First, this involves converting the query DNA sequence to its complement (A->T, G->C, etc.) and then letting the subject DNA sequence hybridize to the converted query sequence (note: earlier energy set from RNAs).

Query: (10)  g a g a a g g g c a a g a a
Energ:       -2.3 -1.7 -2.3 -0.9 0.8 -2.9 -3.4 -1.8 -0.9 -1.7 -2.3 -0.9
Sbjct: (158) c t c t t t c c g t t c t t

Query:       a a - t t t t t g t c c a a (36)
Energ:       0.8 3.3 -0.9 -0.9 -0.9 -0.9 -1.8 -2.1 0.8 -1.8 -0.9
Sbjct:       c t c a a a a a c a a g t t (185)
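The complement conversion in this first step can be sketched as follows. The helper names are hypothetical, and the sketch only marks which aligned columns form Watson-Crick pairs; it does not reproduce the group's energy values.

```python
# Watson-Crick complement for DNA bases (lower case, as in the slide).
COMPLEMENT = {"a": "t", "t": "a", "g": "c", "c": "g"}

def complement(seq):
    """Convert each base to its complement; other characters (e.g. '-') pass through."""
    return "".join(COMPLEMENT.get(base, base) for base in seq.lower())

def paired_columns(query, subject):
    """For each aligned column, report whether the subject base pairs with the query base."""
    return [COMPLEMENT.get(q) == s for q, s in zip(query.lower(), subject.lower())]
```

For the first block of the slide's example, `complement("gagaagggcaagaa")` gives `ctcttcccgttctt`, which differs from the displayed subject strand at a single position, so `paired_columns` reports 13 pairing columns out of 14.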

Lustig group (cont.) Second, there are strong correlations between the energy-based scores and the corresponding BLAST scores for a representative set of thirty thousand alignments of DNA sequences that express proteins (and their introns) and ncRNAs.
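One common way to quantify such a correlation is the Pearson coefficient. The sketch below is a generic implementation and an assumption about the kind of statistic involved; the slides do not state which correlation measure was used.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least two values")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return sxy / sqrt(sxx * syy)
```

Because more favorable hybridization energies are more negative while better BLAST scores are larger, a strong relationship between the two would be expected to show up as a coefficient close to -1 rather than +1.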

Lustig group (cont.) Third, and more importantly, the hybridization energies per nucleotide show some specificity with respect to the class of DNA region (e.g., protein-expressing vs. ncRNA).

Lustig group (cont.) [Figure; axis label: Frequency]

Lustig group (cont.) Fourth, slip pairing may offer an opportunity to easily randomize hybridization energy data.
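A minimal sketch of how slip pairing might generate randomized scores, assuming that "slip pairing" means sliding one strand relative to the other by a fixed offset and rescoring the shifted pairs. That reading, the function names, and the `score_pair` callback are all assumptions, not the group's implementation.

```python
def slip_pairs(query, subject, offset):
    """Pair query[i] with subject[i + offset], keeping only the overlapping positions."""
    pairs = []
    for i, q in enumerate(query):
        j = i + offset
        if 0 <= j < len(subject):
            pairs.append((q, subject[j]))
    return pairs

def slip_scores(query, subject, score_pair, max_offset):
    """Score every non-zero slip from -max_offset to +max_offset.

    score_pair is a caller-supplied function (q, s) -> number, e.g. an energy rule.
    """
    return {
        offset: sum(score_pair(q, s) for q, s in slip_pairs(query, subject, offset))
        for offset in range(-max_offset, max_offset + 1)
        if offset != 0
    }
```

The scores collected over many offsets could then serve as a background distribution against which the score of the unshifted alignment is compared.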

Tjaden and coworkers (2006, 2008)
- Utilize Smith-Waterman dynamic programming in which energies are used for the scoring (TargetRNA web server program).
- Two models are available: the individual basepairing model and the stacked basepairing model. The individual basepairing model is analogous to the Smith-Waterman program.
- They first characterized non-coding RNA targets (>50 nt) in bacterial genomic expression libraries.

Tjaden and coworkers (cont.)
- More recently, they have identified smaller RNAs from bacterial genomes using only individual base-pairing (excluding stacking is computationally less intensive).
- A problem is that there may be issues with the statistics for the alignment/hybridization of such small targets.

SantaLucia & Hicks (2004) Nearest-neighbor thermodynamic parameters for DNA Watson-Crick pairs in 1 M NaCl

SantaLucia & Hicks (cont.) Nearest-neighbor ∆G37 increments (kcal/mol) for internal single mismatches next to Watson-Crick pairs in 1 M NaCl

SantaLucia & Hicks (cont.) ∆G37 increments (kcal/mol) for length dependence of loop motifs in 1 M NaCl
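In a nearest-neighbor model, the free energy of a duplex is estimated by adding an initiation term to the increments of each pair of adjacent base pairs. The sketch below shows only that summation. The function name and table layout are assumptions, the caller supplies the table, and the numbers in the usage note are placeholders, not the published parameters.

```python
def duplex_energy(top, bottom, stack_energy, initiation=0.0):
    """Sum nearest-neighbor increments over adjacent base pairs of an aligned duplex.

    top and bottom are the two strands written so that top[i] pairs with bottom[i].
    stack_energy maps ((top_i, bottom_i), (top_i+1, bottom_i+1)) to an increment.
    """
    if len(top) != len(bottom):
        raise ValueError("strands must be written with equal length")
    total = initiation
    for i in range(len(top) - 1):
        step = ((top[i], bottom[i]), (top[i + 1], bottom[i + 1]))
        total += stack_energy[step]  # raises KeyError if the table lacks this step
    return total
```

For instance, with a placeholder table `{(("A", "T"), ("C", "G")): -1.0, (("C", "G"), ("G", "C")): -2.0}` and `initiation=0.5`, calling `duplex_energy("ACG", "TGC", table, 0.5)` sums to -2.5. Mismatch and loop increments could be handled by adding the corresponding keys to the table.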

Computational Methods
Potnis (2007) modified Perl software from Nair (2005) that evaluates existing BLAST alignments using the energy rules of SantaLucia & Hicks (2004). Busani (2007) modified the earlier Lustig group energy-based Smith-Waterman program to use the newer energy rules.

Materials/Resources
- BLAST
- PERL suite of programs (Lustig Group)
- Energy-based SW (Lustig Group)
- SJSU Chemistry Cluster

Research Objectives & Goals
- Modify the Potnis (2007) program to handle more generalized slip pairing of sequences.
- Determine whether slip pairing can create random frequency distributions that may allow for a statistical equivalent of the E values used in BLAST.
- Reload the Busani (2007) program on the new Chemistry Cluster and finish the whole-genome analysis by energy-based SW.

Research Objectives & Goals (cont.)
- Explore the relationship between HE (hybridization energy) values and BLAST score.
- Characterize HE as a function of sequence length.
- Validate the energy-based Smith-Waterman algorithm in a whole-genome search.
- Explore HE statistics with respect to alignment “validity”. At a minimum, determine whether one can create a reasonable random probability distribution.
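If a random distribution of hybridization energies can be generated, one simple way to attach a significance estimate to an observed energy is an empirical p-value. The sketch below is a generic illustration with a hypothetical function name. It is not the E value calculation used by BLAST, and it assumes that lower (more negative) energies are more favorable.

```python
def empirical_p_value(observed, null_scores):
    """Fraction of random scores at least as favorable (as low) as the observed one.

    Adding 1 to numerator and denominator keeps the estimate away from exactly zero.
    """
    if not null_scores:
        raise ValueError("null_scores must not be empty")
    as_favorable = sum(1 for score in null_scores if score <= observed)
    return (as_favorable + 1) / (len(null_scores) + 1)
```

The resolution of such an estimate is limited by the number of random scores available, so small null sets can only support coarse significance statements.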

References
Busani, S., “Master Thesis-Project: Energy Based Evaluation of Protein and Other Sequence Information”, San Jose State University, 2007.
Nair, J., “Master Thesis: Alignment-shift analysis using hybridization energy rules”, San Jose State University, 2005.
Potnis, S., “Master Thesis-Project: Characterization signatures of protein sequences and related species”, San Jose State University, 2007.
SantaLucia, J. Jr.; Hicks, D., The thermodynamics of DNA structure motifs. 2004, Annu. Rev. Biophys. Biomol. Struct., 33, 415-440.
Tjaden, B., TargetRNA: a tool for predicting targets of small RNA action in bacteria. 2008, Nucl. Acids Res., 36, Web Server issue, W109-W113.
Tjaden, B.; Goodwin, S. S.; Opdyke, J. A.; Guillier, M.; Fu, D. X.; Gottesman, S.; Storz, G., Target prediction for small, noncoding RNAs in bacteria. 2006, Nucl. Acids Res., 34, 2791-2802.
Wander, D.; Yang, F.; McNeil, M.; Lustig, B., Scoring DNA sequence alignment using energetics of hybridization. 2003, T2/555b(1-8).
Zuker, M.; Dimitrov, R., Prediction of hybridization and melting for double-stranded nucleic acids. 2004, Biophys. J., 87, 215-226.
Zuker, M., Mfold web server for nucleic acid folding and hybridization prediction. 2003, Nucl. Acids Res., 31, 3406-3415.