A mathematical model of the genetic code: structure and applications. Antonino Sciarrino.


1 A mathematical model of the genetic code: structure and applications. Antonino Sciarrino, Università di Napoli “Federico II”, INFN, Sezione di Napoli. TAG 2006, Annecy-le-Vieux, 9 November 2006

2 Mathematical Model of the Genetic Code. Work in collaboration with Luc FRAPPAT, Paul SORBA, Diego COCURULLO

3 SUMMARY: Introduction; Description of the model; Applications: codon usage frequencies, DNA dimers free energy; Work in progress

4 It is amazing that the complex biochemical relations between DNA and proteins were very quickly reduced to a mathematical model. Just a few months after the WATSON-CRICK discovery, G. GAMOW proposed the “diamond code”.

5 Gamow “diamond code”. Gamow, Nature (1954). Nucleotides are denoted by the numbers 1, 2, 3, 4. Amino acids FIT the rhomb-shaped “holes” formed by the 4 nucleotides → 20 a.a.!

6 Since 1954 many mathematical models of the genetic code have been proposed (based on information, thermodynamic, symmetry, topology… arguments). Weak point of the models: often poor explanatory and/or predictive power.

7 The genetic code

8 Crystal basis model of the genetic code. The 4 bases C, U/T (pyrimidines) and G, A (purines) are identified by a pair of “spin” labels (+ ≡ 1/2, − ≡ −1/2). L. Frappat, A. Sciarrino, P. Sorba: Phys. Lett. A (1998). Mathematically: C, U/T, G, A transform as the 4 basis vectors of the irrep. (1/2, 1/2) of $U_{q \to 0}(sl(2)_H \oplus sl(2)_V)$.

9 Crystal basis model of the genetic code. Dinucleotides are composite states (→ 16 basis vectors of $(1/2, 1/2)^{\otimes 2}$) belonging to “sets” identified by two integer numbers $J_H$, $J_V$. In each “set” the dinucleotide is identified by two labels: $-J_H \le J_{H,3} \le J_H$ and $-J_V \le J_{V,3} \le J_V$. Ex. CU = (+,+) ⊗ (−,+) ($J_H = 0$, $J_{H,3} = 0$; $J_V = 1$, $J_{V,3} = 1$). ⇒ Follows from a property of $U_{q \to 0}(sl(2))$.

10 DINUCLEOTIDE Representation Content

11 Crystal basis model of the genetic code. Codons are composite states (→ 64 basis vectors of $(1/2, 1/2)^{\otimes 3}$) belonging to “sets” identified by half-integer $J_H$, $J_V$ (“set” ≡ irreducible representation = irrep.). Ex. CUA = (+,+) ⊗ (−,+) ⊗ (−,−) ($J_H = 1/2$, $J_{H,3} = -1/2$; $J_V = 1/2$, $J_{V,3} = 1/2$). ⇒ Follows from a property of $U_{q \to 0}(sl(2))$.
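As a hedged illustration of how such labels can be obtained, the sketch below gives each nucleotide its pair of “spin” signs (C = (+,+), U/T = (−,+), G = (+,−), A = (−,−)) and applies the bracket-cancellation rule for crystal bases of spin-1/2 tensor products: reading left to right, every + that is followed by a − (after earlier cancellations) drops out as a singlet pair, and the surviving signs fix J and J_3. The function names and the choice that “+ −” (rather than “− +”) cancels are assumptions, chosen so that CUA lands in $J_H = J_V = 1/2$ as on the slide; this is not taken from the authors' code.

```python
from fractions import Fraction

# Assumed sign assignment (H sign, V sign): C=(+,+), U/T=(-,+), G=(+,-), A=(-,-)
SPIN = {"C": (1, 1), "U": (-1, 1), "T": (-1, 1), "G": (1, -1), "A": (-1, -1)}


def _j_and_j3(signs):
    """Return (J, J3) for a word of +1/-1 signs using the bracket rule."""
    open_plus = 0    # '+' signs still waiting for a '-' to cancel with
    lone_minus = 0   # '-' signs that found no '+' on their left
    for s in signs:
        if s == 1:
            open_plus += 1
        elif open_plus > 0:
            open_plus -= 1   # a '+ -' pair forms a singlet and drops out
        else:
            lone_minus += 1
    # Surviving signs all contribute to J; J3 is their signed sum.
    return Fraction(open_plus + lone_minus, 2), Fraction(open_plus - lone_minus, 2)


def crystal_labels(seq):
    """Return (J_H, J_H3, J_V, J_V3) for a nucleotide string (hypothetical helper)."""
    pairs = [SPIN[n] for n in seq.upper()]
    j_h, j_h3 = _j_and_j3([h for h, _ in pairs])
    j_v, j_v3 = _j_and_j3([v for _, v in pairs])
    return j_h, j_h3, j_v, j_v3


if __name__ == "__main__":
    for s in ("C", "CA", "CG", "CCC"):
        print(s, crystal_labels(s))
```

Counting the unmatched signs is enough here because each sl(2) factor is treated independently; the same function works for dinucleotides, codons, and longer ordered sequences.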

12 Codons in the crystal basis

13 Codon usage frequency. Synonymous codons are not used uniformly (codon bias). Codon bias (not fully understood) is ascribed to evolutionary-selective effects. Codon bias depends on: • biological species (b.sp.) • sequence analysed • amino acid (a.a.) encoded • structure of the considered multiplet • nature of codon XYZ • …

14 Codon usage in Homo sap.

15 Our analysis deals with global codon usage, i.e. computed over all the coding sequences (exonic region) for the b.sp. of the considered specimen ⇒ to bring out possible general features of the standard eukaryotic genetic code ascribable to its organisation and its evolution.

16 Let us define the codon usage probability for the codon XZN (X, Z, N ∈ {A, C, G, U}; U → T in DNA): $P(XZN) = \lim_{n \to \infty} n_{XZN} / N_{tot}$, where $n_{XZN}$ is the number of times codon XZN is used in the processes and $N_{tot}$ is the total number of codons in the same processes. For fixed XZ, normalization: $\sum_N P(XZN) = 1$. Note: sextets are considered as quartets + doublets → 8 quartets.
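A minimal sketch of how such probabilities could be estimated from a finite sample is given below. It assumes that, in line with the stated normalization, the counts are normalized over the four codons sharing the same first two nucleotides XZ; the function name `codon_usage` and the toy sequence are illustrative only, and a real analysis would use counts over all coding sequences of a species.

```python
from collections import Counter


def codon_usage(cds):
    """Estimate P(XZN) from an in-frame coding sequence.

    Each codon count is divided by the total count of codons with the
    same first two bases XZ, so that sum over N of P(XZN) is 1.
    """
    cds = cds.upper().replace("T", "U")
    usable = len(cds) - len(cds) % 3          # ignore a trailing partial codon
    counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))
    xz_totals = Counter()
    for codon, n in counts.items():
        xz_totals[codon[:2]] += n
    return {codon: n / xz_totals[codon[:2]] for codon, n in counts.items()}


if __name__ == "__main__":
    # Toy example: four Ala codons (GCN quartet)
    print(codon_usage("GCTGCTGCCGCA"))
```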

17 Def.: correlation coefficient $r_{XY}$ for two variables $X \equiv P_{..X}$, $Y \equiv P_{..Y}$
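The formula on this slide did not survive extraction. Assuming it is the usual Pearson correlation coefficient, a sketch of the computation over a set of biological species could look as follows; `pearson_r` is a hypothetical helper, and the inputs would be the values of two codon probabilities across the species of the sample.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)
```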

18 Specimen (GenBank Release 149.0, 09/2005; N codons > 100,000): 26 VERTEBRATES, 28 INVERTEBRATES, 38 PLANTS. TOTAL: 92 biological species

19 Correlation coefficient VERTEBRATES

20 Correlation coefficient PLANTS

21 Correlation coefficient INVERTEBRATES

22 Averaged value of P(..N)

23

24 Averaged value of the sum of two correlated P(N)

25 Ratios of $\sigma^2_{obs}(X+Y)$ and $\sigma^2_{th}(X+Y) = \sigma^2_{obs}(X) + \sigma^2_{obs}(Y)$, averaged over the 8 a.a., for the sum of two codon probabilities
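Why this ratio probes correlation can be made explicit with the standard identity for the variance of a sum. The block below is a worked reminder under the assumption that $r_{XY}$ denotes the Pearson coefficient of the two probabilities; it is not taken from the slides.

```latex
\begin{align}
\sigma^2(X+Y) &= \sigma^2(X) + \sigma^2(Y) + 2\,\mathrm{cov}(X,Y), \\
\frac{\sigma^2_{obs}(X+Y)}{\sigma^2_{th}(X+Y)}
  &= 1 + \frac{2\, r_{XY}\, \sigma_{obs}(X)\, \sigma_{obs}(Y)}
              {\sigma^2_{obs}(X) + \sigma^2_{obs}(Y)} .
\end{align}
```

The ratio equals 1 for uncorrelated variables, falls below 1 when $r_{XY} < 0$, and exceeds 1 when $r_{XY} > 0$.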

26 ⇒ Indication of a correlation between the codon usage probabilities P(A) and P(C) (⇒ P(U) and P(G)) for quartets.

27 Correlation between codon probabilities for different a.a. Correlation coefficients for the 28 pairs $P_{XZN}$, $P_{X'Z'N}$, where XZ (X′Z′) specify the 8 quartets. The following pattern emerges for the whole eukaryote sample (n = 92)

28 The set of 8 quartets splits into 3 subsets: 4 a.a. with correlated codon usage (Ser, Pro, Ala, Thr); 2 a.a. with correlated codon usage (Leu, Val); 2 a.a. with generally uncorrelated codon usage (Arg, Gly)

29 Statistical analysis ⇒ • Correlation for P(XZA)-P(XZC), XZ ∈ quartets • Correlation for P(N) between {Ser, Pro, Thr, Ala} and {Leu, Val}. The observed correlations fit well in the mathematical scheme of the crystal basis model of the genetic code.

30 In the crystal basis model P(XYZ) can be written as function of

31 ASSUMPTION

32 ⇒ SUM RULES: K independent of the b.sp.; XZ ∈ quartets

33 SUM RULES ⇒ “Theoretical” correlation matrix, XZ = NC, CG, GG, CU, GU

34 Observed averaged value of the correlation matrix, in red the theoretical value

35

36 Shannon Entropy. Let us define the Shannon entropy for the amino acid specified by the first two nucleotides XZ (8 quartets)
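The defining formula is not in the transcript. Assuming the standard form $S_{XZ} = -\sum_N P(XZN)\,\log_2 P(XZN)$ over the four codons of a quartet, a sketch of the computation is shown below; the base-2 logarithm and the name `shannon_entropy` are assumptions.

```python
import math


def shannon_entropy(probs):
    """Shannon entropy (in bits) of a probability vector; zero terms are skipped."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return -sum(p * math.log2(p) for p in probs if p > 0)


if __name__ == "__main__":
    # Uniform usage of the four codons of a quartet gives the maximum, 2 bits.
    print(shannon_entropy([0.25, 0.25, 0.25, 0.25]))
```

With four codons the entropy ranges from 0 (a single codon used) to 2 bits (uniform usage), so any codon bias lowers it below 2.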

37 Shannon Entropy. Using the previous expression for P(XZN), with the shorthand $H_{bs,N} \equiv H_{bs}(XZN)$ and $P_N \equiv P(XZN)$ (a third shorthand symbol is illegible in the transcript), we get ⇒ $S_{XZ}$ largely independent of the b.sp.

38 Shannon Entropy

39 DNA dinucleotide free energy. Free energy for a pair of nucleotides, e.g. GC, lying on one strand of DNA, coupled with the complementary pair, CG, on the other strand. CG from 5′ → 3′ is correlated with GC from 3′ → 5′.

40 DINUCLEOTIDE Representation Content

41

42 SUM RULES for FREE ENERGY

43 Comparison with exp. data: ΔG in kcal/mol

44 DINUCLEOTIDE Distribution

45

46 Comparison with experimental data

47 Work in progress and future perspectives. From the correspondence {C, U/T, G, A} ↔ I.R. (1/2, 1/2) of $U_{q \to 0}(sl(2)_H \oplus sl(2)_V)$ ⇒ any ordered sequence of N nucleotides ↔ vector of an I.R. contained in $(1/2, 1/2)^{\otimes N}$ of $U_{q \to 0}(sl(2)_H \oplus sl(2)_V)$ ⇒ new parametrization of nucleotide sequences

48 “Spin” parametrisation

49 Algorithm for the “spin” parametrisation of ordered n-nucleotide sequence

50 From this parametrisation: • Alternative construction of a mutation model, where the mutation intensity does not depend on the Hamming distance between the sequences, but on the change of “labels” of the “sets”. C. Minichini, A.S., Biosystems (2006) • Characterization of particular sequences (exons, introns, promoter, 5′ or 3′ UTR sequences, …). L. Frappat, P. Sorba, A.S., L. Vuillon, in progress

51 For each gene of Homo sap. (total ~28,000 genes): consider the N-nucleotide coding sequence (CDS) ⇒ compute the “labels” $J_H$, $J_{3H}$; $J_V$, $J_{3V}$ for any n-nucleotide subsequence (1 ≤ n ≤ N) ⇒ plot the “labels” versus n
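The procedure on this slide can be sketched as a single left-to-right pass, under two assumptions that are not stated in the transcript: that “n-nucleotide subsequence” means the first n nucleotides of the CDS, and that the labels follow the bracket-cancellation rule for crystal bases of spin-1/2 tensor products (a + followed by a − drops out as a singlet pair). The generator name `label_trajectory` and the sign table are illustrative.

```python
from fractions import Fraction

# Assumed sign assignment (H sign, V sign): C=(+,+), U/T=(-,+), G=(+,-), A=(-,-)
SPIN = {"C": (1, 1), "U": (-1, 1), "T": (-1, 1), "G": (1, -1), "A": (-1, -1)}


def label_trajectory(cds):
    """Yield (n, J_H, J_3H, J_V, J_3V) for every prefix of length n = 1..N."""
    # Per axis (H, V): [uncancelled '+' count, uncancelled '-' count]
    state = [[0, 0], [0, 0]]
    for n, base in enumerate(cds.upper(), start=1):
        for axis, sign in enumerate(SPIN[base]):
            open_plus, lone_minus = state[axis]
            if sign == 1:
                open_plus += 1
            elif open_plus > 0:
                open_plus -= 1       # '+ -' pair cancels as a singlet
            else:
                lone_minus += 1
            state[axis] = [open_plus, lone_minus]
        (hp, hm), (vp, vm) = state
        yield (n,
               Fraction(hp + hm, 2), Fraction(hp - hm, 2),
               Fraction(vp + vm, 2), Fraction(vp - vm, 2))


if __name__ == "__main__":
    for row in label_trajectory("ATGGCC"):
        print(*row)
```

Because the cancellation only ever looks leftwards, the labels of all N prefixes come out of one O(N) pass, and the yielded rows can be passed directly to any plotting tool to draw the four labels against n.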

52 Red: $J_H$; green: $J_{3H}$; blue: $J_V$; black: $J_{3V}$

53

54

55

56 Numerical estimator. Define, for any sequence of length N: Diff, Sum. Plot the number of CDS with the same value of Diff (Sum) versus Diff (Sum). Compute Diff (Sum) for 28,000 random sequences (300 < N < 4300) with uniform probability for each nucleotide. Comparison: number of CDS versus random sequences.

57

58

59 Conclusions. Correlations in codon usage frequencies computed over the whole exonic region fit well in the mathematical scheme of the crystal basis model of the genetic code. An explanation for the correlations is still missing. The formalism of the crystal basis model is useful to parametrize the free energy of DNA dimers. More generally, use of the $U_{q \to 0}(sl(2)_H \oplus sl(2)_V)$ mathematical structure may be useful to describe sequences of nucleotides.

