1 Convolution and Its Applications to Sequence Analysis Student: Bo-Hung Wu Advisor: Professor Herng-Yow Chen & R. C. T. Lee Department of Computer Science.

Presentation transcript:

1 Convolution and Its Applications to Sequence Analysis Student: Bo-Hung Wu Advisor: Professor Herng-Yow Chen & R. C. T. Lee Department of Computer Science & Information Engineering National Chi Nan University

2 The Definition of Convolution in the Continuous Case Reference: Lecture notes, “Introduction to communication”, R. C. T. Lee et al. Example
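For reference, the standard textbook definition of the convolution of two functions in the continuous case is given below. The symbols f, g, t and τ are chosen here for illustration and are not taken from the slide.

```latex
(f * g)(t) = \int_{-\infty}^{\infty} f(\tau)\, g(t - \tau)\, d\tau
```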

3

4 Exact String-Matching Problem Input: Text string $T = T_1 T_2 \ldots T_n$ and pattern string $P = P_1 P_2 \ldots P_m$, where $T_i, P_i \in \Sigma$ (the alphabet) and $m \le n$. Output: All locations $i$ in $T$ where $T_i T_{i+1} T_{i+2} \ldots T_{i+m-1} = P_1 P_2 \ldots P_m$. String matching is closely related to convolution: both slide one sequence along the other and combine the aligned elements.
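The problem statement above can be made concrete with a minimal sliding-window sketch. The function name `find_occurrences` is hypothetical, and the example reuses the DNA string that appears later in the slides; locations are reported 1-based to follow the slide's indexing of T.

```python
def find_occurrences(text: str, pattern: str) -> list[int]:
    """Return every 1-based location i where pattern occurs in text.

    Assumes 1 <= len(pattern) <= len(text), as in the problem statement.
    """
    n, m = len(text), len(pattern)
    hits = []
    for i in range(n - m + 1):
        # Compare the window T_i ... T_{i+m-1} with the whole pattern.
        if text[i:i + m] == pattern:
            hits.append(i + 1)  # convert the 0-based index to a 1-based location
    return hits


print(find_occurrences("acttgacgtgaac", "ac"))  # [1, 6, 12]
```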

5 Convolution in the Discrete Case Definition: Let X and Y be two given vectors with elements $x_i, y_i \in D$, and let two functions be given. The convolution of X and Y with respect to these two functions is then defined for $k = 0, \ldots, m+n$.
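A sketch of a generalized discrete convolution of this kind, under stated assumptions: the two functions are written here as ⊗ (combining one element of X with one element of Y) and ⊕ (accumulating the combined values), the vectors are indexed from 0 so that X has n+1 elements and Y has m+1, and the result is taken to be $z_k = \bigoplus_{i+j=k} x_i \otimes y_j$ for $k = 0, \ldots, m+n$. The notation and the name `convolve` are assumptions, not recovered from the slide.

```python
from functools import reduce


def convolve(x, y, otimes, oplus):
    """Generalized convolution: z[k] = oplus over i + j == k of otimes(x[i], y[j]).

    x and y are non-empty, 0-indexed sequences; the result has
    len(x) + len(y) - 1 entries.
    """
    z = []
    for k in range(len(x) + len(y) - 1):
        lo = max(0, k - len(y) + 1)   # smallest i with a valid j = k - i
        hi = min(k, len(x) - 1)       # largest i with a valid j = k - i
        terms = [otimes(x[i], y[k - i]) for i in range(lo, hi + 1)]
        z.append(reduce(oplus, terms))
    return z


# With ordinary multiplication and addition this is the familiar
# polynomial product: (1 + 2t + 3t^2)(4 + 5t).
print(convolve([1, 2, 3], [4, 5], lambda a, b: a * b, lambda u, v: u + v))
```

Passing the two functions as parameters is the point of the design: the same loop serves different problems once suitable operations are plugged in.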

6 Consider the exact string-matching problem: how can we use convolution to solve it? [FP74] First, we reverse Y. Second, we define the two functions of the convolution to suit the problem. Note that this convolution performs the same process as the sliding-window approach. [FP74]
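The two steps can be sketched as follows. The slide's own function definitions are not preserved in this transcript, so the choice made here is an assumption: the element function is an equality indicator (1 if the characters are equal, 0 otherwise) and the accumulating function is addition, so each output entry counts the matching positions of one alignment. The names `convolve` and `match_by_convolution` are hypothetical.

```python
from functools import reduce


def convolve(x, y, otimes, oplus):
    """z[k] = oplus over i + j == k of otimes(x[i], y[j]) for 0-indexed x, y."""
    z = []
    for k in range(len(x) + len(y) - 1):
        lo, hi = max(0, k - len(y) + 1), min(k, len(x) - 1)
        z.append(reduce(oplus, [otimes(x[i], y[k - i]) for i in range(lo, hi + 1)]))
    return z


def match_by_convolution(text: str, pattern: str) -> list[int]:
    """Return 1-based locations of pattern in text (assumes 1 <= m <= n)."""
    m = len(pattern)
    # Step 1: reverse the pattern so that i + j == k pairs up aligned characters.
    reversed_pattern = pattern[::-1]
    # Step 2 (assumed choice): equality indicator, accumulated by addition.
    z = convolve(text, reversed_pattern,
                 lambda a, b: int(a == b),
                 lambda u, v: u + v)
    # z[k] covers the window starting at 0-based index k - m + 1; the window
    # is complete for k = m - 1 .. n - 1 and matches when all m positions agree.
    return [k - m + 2 for k in range(m - 1, len(text)) if z[k] == m]


print(match_by_convolution("acttgacgtgaac", "ac"))  # [1, 6, 12]
```

Each entry z[k] is exactly what a sliding window would compute at one position, which is the equivalence the slide points out. This direct evaluation costs O(nm) time.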

7 Applying Convolution to Sequence Analysis
(1) The common substring with k mismatches allowed problem
(2) The common substrings with k mismatches allowed among multiple sequences problem
(3) Determining the similarity of two DNA sequences
(4) Searching in a DNA sequence database
(5) Finding repeating groups in a DNA sequence
(6) An aid for detecting transposition
(7) An aid for detecting insertion/deletion
(8) An aid for detecting the overlap of segments resulting from shotgun operations
(9) The corresponding pair-wise nucleotides in a DNA sequence
(10) An aid for looking for similar regions in a DNA sequence with a distance constraint
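As an illustration of how match counts might support item (1), here is a hedged sketch of a filter. It is one possible reading, not the authors' procedure: for every relative shift between two sequences it counts the equal positions (the quantity a convolution with an equality indicator yields), and it discards shifts that cannot contain a common substring of a given length with at most k mismatches. The names `matches_per_shift` and `candidate_shifts`, and the parameter `length`, are hypothetical.

```python
def matches_per_shift(x: str, y: str) -> dict[int, int]:
    """Count equal positions for each shift d, where x[i] is aligned with y[i - d].

    Only shifts with at least one equal position appear in the result.
    """
    counts: dict[int, int] = {}
    for i, a in enumerate(x):
        for j, b in enumerate(y):
            if a == b:
                counts[i - j] = counts.get(i - j, 0) + 1
    return counts


def candidate_shifts(x: str, y: str, length: int, k: int) -> list[int]:
    """Shifts that the filter cannot rule out (assumes length > k >= 0).

    A common substring of `length` with at most k mismatches needs at least
    length - k equal positions on its shift, so shifts with fewer are dropped.
    Surviving shifts still have to be verified by a direct comparison.
    """
    need = length - k
    return sorted(d for d, c in matches_per_shift(x, y).items() if c >= need)


print(candidate_shifts("acgtacgt", "ttacgaaa", length=4, k=1))  # [-2, 2]
```

The filter only gives negative answers: a rejected shift is certainly free of such a substring, while an accepted shift may still fail verification because its equal positions need not be contiguous.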

8 The Corresponding Pair-wise Nucleotides in a DNA Sequence Substitution rule: A → T, T → A, C → G, G → C. Example: S = "acttgacgtgaac"
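The substitution rule can be applied mechanically, as in this minimal sketch. The function name `complement` is hypothetical, and handling both upper and lower case is a convenience assumed here because the rule is stated in capitals while the example string is lower case.

```python
# Translation table for the rule A -> T, T -> A, C -> G, G -> C.
COMPLEMENT_TABLE = str.maketrans("ACGTacgt", "TGCAtgca")


def complement(sequence: str) -> str:
    """Replace every nucleotide by its corresponding partner."""
    return sequence.translate(COMPLEMENT_TABLE)


s = "acttgacgtgaac"
print(complement(s))  # tgaactgcacttg
```

Applying the rule twice returns the original sequence, which is a quick sanity check on the table.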

9 Experiments We applied convolution to DNA sequences and to English compositions to measure their similarity. The following experiments used these DNA sequences as input data. (The clustering was known in advance, for evaluation.) C1 (0-25): Hepatitis B virus; C2 (26-162): Human mitochondrion; C3 ( ): Other viruses

10

11 Experiment: The Comparison of English Compositions. We applied convolution to two English compositions to detect whether they are similar.

12

13 Conclusion and Future Work We have shown that several sequence-analysis applications that we identified can be solved by means of convolution. Convolution can be used as a negative-answer filter. On the practical side, we ran some experiments, and the results confirm that the approach is feasible. By choosing appropriate operations as the functions in the convolution, we can solve more problems in sequence analysis. For example, we hope to apply convolution to protein structure comparison.

14 Thank you.