Biomathematics seminar Application of Fourier to Bioinformatics Girolamo Giudice.

1 Biomathematics seminar Application of Fourier to Bioinformatics Girolamo Giudice

2

3 Background −1 + 3i and −1 − 3i,

4 Background Z = −1 + 3i Complex plane

5 Background Euler’s Formula Polar Cartesian Exponential
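As a small illustration of the three forms named on this slide, the sketch below converts the number z = −1 + 3i from the previous slide between Cartesian, polar, and exponential form using Python's standard `cmath` module. The variable names are chosen here for illustration and are not from the slides.

```python
import cmath
import math

z = complex(-1, 3)                 # Cartesian form: -1 + 3i

# Polar form: modulus r and argument theta (in radians).
r, theta = cmath.polar(z)

# Exponential form r * e^{i*theta}, and the same value via Euler's formula
# e^{i*theta} = cos(theta) + i*sin(theta).
z_exp = r * cmath.exp(1j * theta)
z_trig = r * (math.cos(theta) + 1j * math.sin(theta))
```

All three expressions describe the same point in the complex plane; only the coordinates used to name it change.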

6 Periodic and aperiodic functions Periodic function; aperiodic function

7 Sine (or cosine) wave A, the amplitude, is the peak deviation of the function from zero. f, the frequency, is the number of oscillations (cycles) that occur each second. ω = 2πf, the angular frequency, is the rate of change of the function argument in radians per second. φ, the phase, specifies (in radians) where in its cycle the oscillation is at t = 0.
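The four parameters can be tied together in a short sketch. It assumes the usual form y(t) = A sin(2πft + φ); the slide's own formula did not survive in the transcript, and the function and argument names are illustrative.

```python
import math

def sine_wave(t, amplitude, freq_hz, phase_rad):
    """Return A * sin(2*pi*f*t + phi) at time t (in seconds)."""
    omega = 2 * math.pi * freq_hz          # angular frequency in rad/s
    return amplitude * math.sin(omega * t + phase_rad)

# At t = 0 the value depends only on the amplitude and the phase.
y0 = sine_wave(0.0, amplitude=5.0, freq_hz=2.0, phase_rad=math.pi / 2)
```

With f = 2 Hz the wave repeats every 1/f = 0.5 s, so evaluating it at t and at t + 0.5 gives the same value.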

8 Harmonic analysis and Fourier series A periodic function can be expressed as the sum of a (possibly infinite) set of sines and cosines (or, equivalently, complex exponentials). Complex Fourier series Quadrature Fourier series Euler formula

9 Harmonic analysis and Fourier series A periodic function can be expressed as the sum of a (possibly infinite) set of sines and cosines. Fourier series
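A standard textbook example, not taken from the slides, shows a Fourier series converging: a unit square wave with period 2π equals (4/π) Σ sin(kt)/k over odd k. The sketch below sums the first few hundred harmonics; the function name is illustrative.

```python
import math

def square_wave_partial_sum(t, n_terms):
    """Sum the first n_terms odd harmonics of a unit square wave (period 2*pi)."""
    total = 0.0
    for j in range(n_terms):
        k = 2 * j + 1                      # odd harmonics only: 1, 3, 5, ...
        total += math.sin(k * t) / k
    return 4.0 / math.pi * total

# In the middle of the positive half-period the square wave equals 1.
approx = square_wave_partial_sum(math.pi / 2, 200)
```

Adding more terms brings the partial sum closer to the square wave everywhere except at the jumps, where a finite sum of smooth sinusoids always overshoots.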

10 Spectrum Fourier Series

11 Spectrum of a sinusoid Quadrature Fourier Series Complex Fourier Series Delta Function

12 Example A=5 0

13 Take home message

14 Spectrum of a sinusoid

15 Spectrum of a switched sinusoid

16 Fourier transform Discrete domain Inverse Fourier Transform Continuous Domain Complex Fourier series
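For sampled data, the transform pair in the discrete domain is the discrete Fourier transform (DFT) and its inverse. The sketch below writes both directly from their definitions; it is a naive O(N²) version for illustration, whereas practical code would call an FFT routine.

```python
import cmath

def dft(x):
    """Forward DFT: X[k] = sum_n x[n] * exp(-2j*pi*k*n/N)."""
    n_pts = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / n_pts)
                for n in range(n_pts))
            for k in range(n_pts)]

def idft(X):
    """Inverse DFT: x[n] = (1/N) * sum_k X[k] * exp(+2j*pi*k*n/N)."""
    n_pts = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * n / n_pts)
                for k in range(n_pts)) / n_pts
            for n in range(n_pts)]

signal = [1.0, 0.0, -1.0, 0.0]     # one cycle of a cosine, sampled 4 times
spectrum = dft(signal)             # energy appears only at k = 1 and k = 3
recovered = idft(spectrum)         # matches signal up to rounding error
```

The single cosine shows up as two spectral lines (k = 1 and its mirror k = 3), the discrete counterpart of the pair of delta functions in the spectrum of a sinusoid.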

17 Fourier’s Transform

18 Relationship between Fourier series and transform

19 Application of Fourier to Bioinformatics

20 Conversion of DNA Character Strings to Numbers Two approaches: (1) Create four binary sequences, one for each character (base), which specify whether a character is present (1) or absent (0) at a specific location; these are known as indicator sequences. (2) Assign meaningful real or complex numbers to the four characters A, T, G, and C. In this way, a single numerical sequence representing the entire character string is obtained.

21 Binary Indicator Sequences
DNA sequence  ATTGCACCGTGA
A             100001000001
T             011000000100
G             000100001010
C             000010110000
Sum           111111111111
Any three of the four indicator sequences completely characterize the full DNA character string. Indicator sequences can be analyzed to identify patterns in the structure of a DNA string.
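The indicator sequences for the example string can be reproduced with a few lines of Python. This is a minimal sketch; the function name is illustrative.

```python
def indicator_sequences(dna):
    """Map a DNA string to four 0/1 lists, one per base."""
    return {base: [1 if ch == base else 0 for ch in dna] for base in "ATGC"}

seqs = indicator_sequences("ATTGCACCGTGA")

# The four indicators sum to 1 at every position, so any three of them
# determine the fourth. Here C is rebuilt from A, T, and G.
c_from_others = [1 - a - t - g
                 for a, t, g in zip(seqs["A"], seqs["T"], seqs["G"])]
```

Because exactly one base occupies each position, the redundancy noted on the slide is just the statement that the four rows always add up to a row of ones.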

22 Period-3 property Power spectrum (PS) of a protein-coding region; PS of a non-coding region
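One common way to look for the period-3 property is to sum the squared DFT magnitudes of the four indicator sequences and inspect the value at index k = N/3. The sketch below assumes that definition of the power spectrum and uses a deliberately artificial, perfectly periodic toy sequence; real coding regions show a much weaker peak on top of background noise.

```python
import cmath

def power_spectrum(dna):
    """Sum of squared DFT magnitudes of the four base indicator sequences."""
    n_pts = len(dna)
    spectrum = [0.0] * n_pts
    for base in "ATGC":
        u = [1.0 if ch == base else 0.0 for ch in dna]
        for k in range(n_pts):
            coeff = sum(u[n] * cmath.exp(-2j * cmath.pi * k * n / n_pts)
                        for n in range(n_pts))
            spectrum[k] += abs(coeff) ** 2
    return spectrum

toy = "ATG" * 20                   # artificial sequence with exact period 3
ps = power_spectrum(toy)

# Skip k = 0 (the mean) and search the first half of the spectrum.
peak_k = max(range(1, len(toy) // 2 + 1), key=lambda k: ps[k])
```

For this toy input the peak falls at k = N/3 = 20, which corresponds to a period of N/k = 3 bases.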

23 Identify Protein-Coding Regions

24

25 Resonant Recognition Model The energy of delocalized electrons in amino acids has the strongest impact on the electronic distribution of the whole protein, because these electrons produce electromagnetic irradiation or absorption with spectral characteristics corresponding to the energy distribution along the protein.

26 Resonant Recognition Model

27 This frequency is related to the biological function provided the following criteria are met: 1) Only one peak exists for a group of protein sequences. 2) No significant peak exists for biologically unrelated protein sequences. 3) Peak frequencies are different for different biological functions.

28

29 Each frequency in the RRM characterizes one biological function. To grasp the meaning of a characteristic frequency, it is important first to understand what is meant by the biological function of proteins. Each biological process involves a number of interactions between proteins and their targets (another protein, a DNA regulatory segment, or a small molecule). Each of these processes involves energy transfer between the interacting molecules. These interactions are highly selective, and this selectivity is defined within the protein structure. Proteins and their protein or DNA targets have been analyzed to find out whether RRM characteristic frequencies denote a parameter that describes this selectivity between interacting molecules. It has been shown that proteins and their DNA or protein targets share the same characteristic frequency [13], [15], [17], [22], but with opposite phase [17], [22] for each member of a pair of interacting macromolecules. Thus, it can be postulated that RRM characteristic frequencies not only characterize general functions but also provide recognition between a particular protein and its target (receptor, ligand, etc.). As this recognition arises from the matching of periodicities within the distribution of energies of free electrons along the interacting proteins, it can be regarded as resonant recognition.

30 Repetita Finds periodicities hidden along the sequence.

31 Repetita

32 Alignment-MAFFT MAFFT uses the FFT to detect homologous segments. Substitutions between physico-chemically similar amino acids tend to preserve the structure of proteins, and such neutral substitutions have accumulated in molecules during evolution. An amino acid a is assigned a vector whose components are the volume value v(a) and the polarity value p(a). Calculation of the correlation between two amino acid sequences: we define the correlation c(k) between two sequences of such vectors as $c(k) = c_v(k) + c_p(k)$ (1), where $c_v(k)$ and $c_p(k)$ are, as defined below, the correlations of the volume component and the polarity component, respectively, between the two amino acid sequences to be aligned. The correlation c(k) represents the degree of similarity of the two sequences with a positional lag of k sites. A high value of c(k) indicates that the sequences may have homologous regions. The correlation $c_v(k)$ of the volume component between sequence 1 and sequence 2 with a positional lag of k sites is defined as $c_v(k) = \sum_{1 \le n \le N,\ 1 \le n+k \le M} \hat{v}_1(n)\,\hat{v}_2(n+k)$, where $\hat{v}_1(n)$ and $\hat{v}_2(n)$ are the volume components of the nth site of sequence 1, of length N, and of sequence 2, of length M, respectively. If the two sequences compared have homologous regions, the correlation c(k) has peaks corresponding to these regions. The FFT analysis, however, reveals only the positional lag k of a homologous region in the two sequences, not the position of the region. As shown in Figure 1B, to determine the positions of the homologous region in each sequence, a sliding window analysis with a window size of 30 sites is carried out, in which the degree of local homology is calculated for each of the 20 highest peaks in the correlation c(k).
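The lag-k correlation described here, the sum over n of v̂1(n)·v̂2(n+k), can be computed for all lags at once through the correlation theorem: transform both zero-padded sequences, multiply one spectrum by the complex conjugate of the other, and transform back. The sketch below checks this against the direct sum. It uses a naive DFT in place of a real FFT, made-up numeric values in place of actual volume scores, and illustrative function names; it is not MAFFT's implementation.

```python
import cmath

def dft(x, inverse=False):
    """Naive O(L^2) DFT; a practical implementation would use an FFT."""
    size = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[n] * cmath.exp(sign * 2j * cmath.pi * k * n / size)
               for n in range(size))
           for k in range(size)]
    return [v / size for v in out] if inverse else out

def correlation_direct(v1, v2, k):
    """c(k) = sum of v1[n] * v2[n + k] over indices valid in both sequences."""
    return sum(v1[n] * v2[n + k]
               for n in range(len(v1)) if 0 <= n + k < len(v2))

def correlation_via_dft(v1, v2):
    """Return {lag k: c(k)} for every lag, using zero-padded transforms."""
    size = len(v1) + len(v2) - 1           # padding prevents wrap-around
    a = dft(list(v1) + [0.0] * (size - len(v1)))
    b = dft(list(v2) + [0.0] * (size - len(v2)))
    c = dft([x.conjugate() * y for x, y in zip(a, b)], inverse=True)
    lags = range(-(len(v1) - 1), len(v2))
    return {k: c[k % size].real for k in lags}

# Hypothetical component values: v1 reappears inside v2, shifted by 2 sites.
v1 = [0.9, -0.4, 1.2, 0.1, -0.7]
v2 = [0.0, 0.3, 0.9, -0.4, 1.2, 0.1, -0.7, 0.2]
corr = correlation_via_dft(v1, v2)
best_lag = max(corr, key=corr.get)         # lag with the highest correlation
```

The peak identifies only the lag (here 2), not where along the sequences the matching segment lies, which is why the sliding window step is still needed afterwards.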

33 MSA with Fourier - MAFFT

34 Homologous regions are quickly identified by converting amino acid residues to vectors of volume and polarity. Here $\hat{v}_1(n)$ and $\hat{v}_2(n)$ are the volume components of the nth site of sequence 1 and sequence 2.

35

36 Recursive Splicing

37 RNA Binding protein motifs Exon Intron Exon Pre-mRNA

38

39

40

41 Take home message The Fourier transform and the Fourier series have the same purpose: to decompose a signal into a sum of waves. With them it was possible to: detect a hidden signal (period-3 property); filter noise (identify protein-coding regions); detect periodicity (Repetita); detect common structure and sequence similarities (Resonant Recognition Model, MAFFT); denoise and reconstruct signals (recursive splicing); reduce computational time (MAFFT).

42 Thank you

