Bioinformatics: Protein and RNA Structures. LU, 2008, Juris Vīksna.

1 Bioinformatics: Protein and RNA Structures. LU, 2008, Juris Vīksna

2 Today: Proteins: what we mean by protein structure; structure representation. Problems related to protein structures. RNA: what we mean by RNA structure. Problems related to RNA structures. Methods for comparing protein structures. Protein structure databases. Tools for comparing and visualizing protein structures. Protein structure classifications. RNA structure prediction.

3 Proteins [Adapted from R.Shamir] Protein sequence: ...VPEKNRPLKGRINLVLSRELKEPPQGAHFLSRSLDDALKLTEQPELANK...

4 Protein structure [Adapted from R.Shamir]

5 Protein structure [Adapted from R.Shamir] We will be interested mostly in secondary and tertiary structure.

6 Protein structure determination - crystallography [Adapted from G.Lee] The basics: purify the protein and grow a crystal; shoot an X-ray beam through the rotating crystal; collect data in one of many ways; interpret the data.

7 Protein structure determination - crystallography [Adapted from G.Lee] Problems: crystal setup takes... forever (almost); interpreting the data is no easy task (but all methods create this mass of data); expensive ($$$).

8 Protein structure determination - crystallography [Adapted from G.Lee] In the end, biologists want the best results possible, and X-ray crystallography provides this right now. It gets the job done. No other method does the job better.

9 Protein structure determination - crystallography

10

11

12 Protein structure determination - NMR [Adapted from V.Arcus] NMR - nuclear magnetic resonance. Figure labels: magnet, radio frequency amplifiers, samples.

13 Protein structure determination - NMR [Adapted from V.Arcus]

14 Protein structure determination - NMR [Adapted from V.Arcus] Protein NMR requires large amounts of very pure protein. Extraction from the natural source: a major disadvantage here is the very low level of protein in tissues (for example, one might start with 10 l of blood and get 1 mg of protein!); this also requires a large number of purification steps; the main advantage is that post-translational modifications are maintained.

15 NMR or crystallography? [Adapted from V.Arcus] Both are techniques to determine protein structures. NMR uses protein in solution. X-ray crystallography uses protein crystals. Both techniques require large amounts of pure protein. Both techniques require expensive equipment!

16 Advantages of NMR [Adapted from V.Arcus] Protein in solution! Can look at the dynamic properties of the protein structure. Can look at the interactions between the protein and ligands, substrates or other proteins. Can look at protein folding. Sample is not damaged in any way. No "phase problem". Can "characterise" your protein using NMR.

17 Disadvantages of NMR [Adapted from V.Arcus] Size limit! The maximum size of a protein for NMR structure determination is ~30 kDa; this eliminates ~50% of all proteins. High solubility is a requirement. Comparatively low resolution.

18 Advantages of crystallography [Adapted from V.Arcus] No size limit, as long as you can crystallise the protein. Solubility requirement is less stringent. Simple definition of resolution. Direct calculation from data to electron density and back again.

19 Disadvantages of crystallography [Adapted from V.Arcus] Crystallisation! This is a process bottleneck, and it is binary (all or nothing). Phase problem: if the cell contains two electrons (each with the same scattering power) and their positional relationship is such that the distance between them is exactly one-half the distance between reflecting planes, then they will cancel out each other's contribution to diffraction.

20 Protein structure file (PDB file format)
HEADER HYDROLASE 03-NOV-00 1G65
TITLE CRYSTAL STRUCTURE OF EPOXOMICIN:20S PROTEASOME REVEALS A
TITLE 2 MOLECULAR BASIS FOR SELECTIVITY OF ALPHA,BETA-EPOXYKETONE
TITLE 3 PROTEASOME INHIBITORS
COMPND MOL_ID: 1;
...
ATOM 115 CD  PRO A 17  44.162 -73.549 30.303 1.00 34.52 C
ATOM 116 N   SER A 18  47.730 -73.191 28.777 1.00 37.54 N
ATOM 117 CA  SER A 18  49.119 -72.807 28.499 1.00 40.24 C
ATOM 118 C   SER A 18  50.025 -74.009 28.289 1.00 42.29 C
ATOM 119 O   SER A 18  51.252 -73.870 28.152 1.00 42.34 O
ATOM 120 CB  SER A 18  49.661 -71.974 29.653 1.00 42.60 C
ATOM 121 OG  SER A 18  49.219 -72.500 30.895 1.00 45.61 O
ATOM 122 N   GLY A 19  49.411 -75.189 28.300 1.00 42.88 N
ATOM 123 CA  GLY A 19  50.145 -76.427 28.117 1.00 44.73 C
ATOM 124 C   GLY A 19  50.743 -76.999 29.391 1.00 43.86 C
ATOM 125 O   GLY A 19  51.585 -77.900 29.352 1.00 45.98 O
ATOM 126 N   LYS A 20  50.315 -76.498 30.532 1.00 42.31 N

21 Protein structure - atom coordinates [Adapted from M.Gerstein and I.Eidhammer, I.Jonassen] Structure is described by 3D coordinates (X,Y,Z) of all Cα atoms.

22 Protein structure - folds. "Fold" representation of 7timA0

23 Protein structure - α helices [Adapted from S.Rafferty] Hydrogen bonding patterns for four helices: 2_7 ribbon, 3_10 helix, 3.6_13 helix (α helix), π helix. Structures are represented in a diagrammatic way to simplify counting the atoms in each H-bonded loop.

24 Protein structure - β sheets [Adapted from S.Rafferty] Composed of β strands. Adjacent strands may be parallel or antiparallel. Strands are flat: think of a β strand as a helix with two residues per turn. Figure labels: parallel, antiparallel.

25 Protein folds - sandwich (β)

26 Protein folds - barrels

27 Protein folds - horseshoe (α-β)

28 Protein folds - helix "bundles" (α)

29 Protein folds - interactions. Transcription factors - homeodomain proteins

30 Protein folds - some numbers

31 Protein structures - other representations. Different representations of the myoglobin molecule. Contact map (graph-based) representation of protein structure.
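
As a rough illustration of the contact map representation named on this slide, the sketch below marks two residues as being in contact when their Cα atoms lie within a cutoff distance. The 8 Å cutoff, the minimum sequence separation, the function names and the toy coordinates are all assumptions made for the example; the slides do not specify them.

```python
import math

def contact_map(ca_coords, cutoff=8.0):
    """Return an n x n boolean matrix: True where two C-alpha atoms
    are at most `cutoff` angstroms apart (cutoff value is an assumption)."""
    n = len(ca_coords)
    return [[math.dist(ca_coords[i], ca_coords[j]) <= cutoff for j in range(n)]
            for i in range(n)]

def contact_edges(ca_coords, cutoff=8.0, min_separation=2):
    """Graph view of the same map: residue index pairs (i, j) with
    j - i >= min_separation whose C-alpha atoms are in contact."""
    cmap = contact_map(ca_coords, cutoff)
    n = len(ca_coords)
    return [(i, j)
            for i in range(n)
            for j in range(i + min_separation, n)
            if cmap[i][j]]

# Hypothetical toy chain: five residues spaced 3.8 angstroms apart on a line.
toy_chain = [(3.8 * k, 0.0, 0.0) for k in range(5)]
edges = contact_edges(toy_chain)
```

For the straight toy chain only residues two positions apart fall within the cutoff, so the edge list contains just the (i, i+2) pairs.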

32 Problems related to protein structures: Determination (not quite a bioinformatics problem). Prediction (the protein folding problem; one of the Holy Grails of bioinformatics...). Comparison (not entirely trivial, but there are methods that work well enough in practice). Representations. Surface modelling. Modelling/prediction of protein interactions. Visualization.

33 Protein folding. The folded state is a low energy state under physiological conditions: H2O, pH ~ 7.0, NaCl. Figure: Gibbs free energy G of the states U, I and F, with the free energy difference ΔG(U–F).

34 Protein folding. What influences protein folding: hydrophobic forces ("squeezing out" of water), hydrogen bonds, electrostatic forces, disulfide bonds, chaperones.

35 Chaperones. Chaperone proteins were first identified as "heat-shock proteins" (hsp60 and hsp70). Hsp70 recognizes exposed, unfolded regions of new protein chains, especially hydrophobic regions. It binds to these regions, apparently protecting them until productive folding reactions can occur. This occurs while the chain is still being translated.

36 CASP http://predictioncenter.gc.ucdavis.edu/casp7/Casp7.html

37 CAFASP http://www.cs.bgu.ac.il/~dfischer/CAFASP4/

38 Prions. Prion - proteinaceous infectious particle. PrPc - the normal version. Hypothetical structure of PrPsc.

39 Prions. Spontaneously (rare): the normal fold is overwhelmingly the favored conformation. Inherited: a mutation in the PRNP gene destabilizes the normal conformation. Transmitted: ingestion of PrPsc from diet, surgical instruments, blood, or blood-derived products.

40 Molecular surfaces. Lock-and-key principle:

41 RNA structure. RNA sequence: ...AGGCUAUGGCCA... Single-stranded, but A tends to pair with U and G tends to pair with C.

42 RNA secondary structure [Adapted from C.Staben] Figure: a stem-loop drawn between the 5' and 3' ends, with base pairs G--C, C--G, U--A, G--C and unpaired A residues.

43 RNA secondary structure [Adapted from K.Selesniemi] Pseudoknot

44 RNA tertiary structure [Adapted from K.Selesniemi]

45 RNA tertiary structure [Adapted from K.Selesniemi]

46 RNA structure determination - physical methods [Adapted from P. De Rijk] The experimental method giving the highest resolution is single crystal X-ray diffraction. X-ray diffraction reveals secondary, tertiary and three-dimensional structures. Unfortunately, it is very difficult to obtain crystals of RNA molecules suitable for X-ray diffraction. The structures of tRNAs have been solved using this technique.

47 RNA structure determination - physical methods [Adapted from P. De Rijk] NMR can provide details about local conformation, and can be used to determine secondary, tertiary and, in theory, three-dimensional structures. The size of RNA molecules that can be studied using NMR is currently rather limited. Oligonucleotides used in NMR studies are designed to adopt structures found in larger RNA molecules.

48 RNA structure determination - physical methods [Adapted from P. De Rijk] Direct observation of partially denatured RNA molecules is possible using electron microscopy. However, the choice of denaturing conditions is crucial, and the resolution of electron microscopy is usually too limited to see fine details.

49 RNA structure determination - chemical methods [Adapted from P. De Rijk] RNA structure has been probed by testing the accessibility of nucleotides to chemical and enzymatic modification. The RNA molecules are exposed to chemical reagents or enzymes with a specific affinity for either single-stranded or double-stranded RNA. This method is only applicable to short RNAs because of the limited resolution of gel electrophoresis. For larger RNAs, reverse transcriptase is used to synthesize DNA complementary to the RNA, starting from a radioactively labeled primer. Modified residues cause the reverse transcriptase to stop, and separation of the synthesised DNAs by gel electrophoresis can then be used to determine the positions of modification.

50 RNA structure determination - mutation analysis [Adapted from P. De Rijk] RNA structure or protein-RNA interactions can also be studied by introducing specific mutations into the RNA sequence. The effect of the mutations can be assayed by measuring the ability of the mutated sequence to bind a protein that specifically recognizes the normal RNA, or by testing the change in some function. Caution is required when interpreting mutation analysis results: loss of protein binding or other functions is not necessarily caused by a change in RNA secondary structure.

51 Problems related to RNA structures: Determination (not quite a bioinformatics problem). Prediction (comparatively easy, unlike for proteins, but for secondary rather than tertiary structure). Comparison (the goals are somewhat different than for proteins). Interaction (we have probably not got that far yet).

52 Structure comparison [Adapted from T.Hanekamp] Translation; rotation; translation and rotation. Example of a translation by d along x: (x1, y1, z1), (x2, y2, z2), (x3, y3, z3) become (x1 + d, y1, z1), (x2 + d, y2, z2), (x3 + d, y3, z3).

53 Structure comparison - RMSD [Adapted from T.Hanekamp] How to estimate comparison "quality"? Root Mean Square Deviation: $\mathrm{RMSD} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} d_i^2}$, where n = number of atoms and d_i = distance between the corresponding atoms in the two structures.
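
A minimal sketch of the RMSD definition on this slide, assuming the two structures are already superposed and that atoms are paired by list index (the function name `rmsd` and the toy coordinates are illustrative, not from the slides):

```python
import math

def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equally long lists of
    (x, y, z) coordinates; atom i of A is compared with atom i of B."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))

# Toy example: structure B is structure A shifted by 1 angstrom along z,
# so every d_i equals 1 and the RMSD is 1.
struct_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
struct_b = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
value = rmsd(struct_a, struct_b)
```

Note that this measure only makes sense after the best superposition has been found; without it, a pure translation like the one above already yields a non-zero RMSD.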

54 RMSD units: e.g. ångströms (Å). Identical structures: RMSD = 0. Similar structures: RMSD is small (1 – 3 Å). Distant structures: RMSD > 3 Å. [Adapted from T.Hanekamp]

55 Coordinate RMSD [Adapted from I.Eidhammer, I.Jonassen]

56 Distance RMSD [Adapted from I.Eidhammer, I.Jonassen] Experimentally it has been shown that these two measures are linearly related: $\mathrm{RMSD}_D \approx 0.75\,\mathrm{RMSD}_C + 0.2$
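
The formula for the distance RMSD is not preserved in this transcript. The sketch below assumes one common form, which compares the intra-structure distance matrices of the two structures and normalises by n; other texts normalise by the number of atom pairs instead. Unlike the coordinate RMSD, this measure needs no superposition, which the example illustrates by rotating and shifting a toy structure.

```python
import math

def distance_rmsd(coords_a, coords_b):
    """Distance RMSD under the assumed normalisation:
    (1/n) * sqrt(sum_ij (dA_ij - dB_ij)^2), where dA_ij and dB_ij are
    the distances between atoms i and j inside structures A and B."""
    n = len(coords_a)
    if n == 0 or n != len(coords_b):
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for i in range(n):
        for j in range(n):
            d_a = math.dist(coords_a[i], coords_a[j])
            d_b = math.dist(coords_b[i], coords_b[j])
            total += (d_a - d_b) ** 2
    return math.sqrt(total) / n

# Toy structure and a copy rotated 90 degrees about z, then shifted.
original = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (2.0, 1.0, 0.0), (0.0, 3.0, 1.0)]
moved = [(-y + 5.0, x - 2.0, z + 1.0) for x, y, z in original]
same_shape = distance_rmsd(original, moved)   # close to zero: shape unchanged
```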

57 RMSD methods [Adapted from I.Eidhammer, I.Jonassen]

58 RMSD - finding the optimal transformation. Given two sets of 3D points P = {p_i}, Q = {q_i}, i = 1,…,n, find a 3D rotation R_0 and translation T_0 such that $\min_{R,T} \sum_i |R p_i + T - q_i|^2 = \sum_i |R_0 p_i + T_0 - q_i|^2$. It can be done in time O(n).
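
In 3D the optimal rotation is usually obtained from a singular value decomposition (the Kabsch algorithm) or from a quaternion eigenproblem. To stay dependency-free, the sketch below solves the 2D analogue of the same least-squares problem, where the optimal angle has a closed form; it makes a constant number of passes over the points, in line with the O(n) bound on the slide. Function names and test data are illustrative assumptions.

```python
import math

def superpose_2d(P, Q):
    """Least-squares rigid fit of 2D points P onto Q (paired by index).
    Returns (theta, (tx, ty)) minimising sum |R(theta) p_i + T - q_i|^2."""
    n = len(P)
    pcx = sum(x for x, _ in P) / n
    pcy = sum(y for _, y in P) / n
    qcx = sum(x for x, _ in Q) / n
    qcy = sum(y for _, y in Q) / n
    dot = cross = 0.0
    for (px, py), (qx, qy) in zip(P, Q):
        px, py, qx, qy = px - pcx, py - pcy, qx - qcx, qy - qcy
        dot += px * qx + py * qy      # multiplies cos(theta) in the objective
        cross += px * qy - py * qx    # multiplies sin(theta) in the objective
    theta = math.atan2(cross, dot)
    c, s = math.cos(theta), math.sin(theta)
    # The optimal translation maps the rotated centroid of P onto that of Q.
    return theta, (qcx - (c * pcx - s * pcy), qcy - (s * pcx + c * pcy))

def transform_2d(points, theta, t):
    """Rotate points by theta about the origin, then translate by t."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + t[0], s * x + c * y + t[1]) for x, y in points]

def rmsd_2d(A, B):
    return math.sqrt(sum(math.dist(a, b) ** 2 for a, b in zip(A, B)) / len(A))

# Toy data: Q is P rotated by 30 degrees and shifted, so the fit should recover it.
P = [(0.0, 0.0), (2.0, 0.0), (2.0, 1.0), (0.0, 3.0)]
Q = transform_2d(P, math.pi / 6, (5.0, -2.0))
theta, shift = superpose_2d(P, Q)
residual = rmsd_2d(transform_2d(P, theta, shift), Q)
```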

59 RMSD - finding structure similarity. So: given k atom pairs, it is not hard to find the transformation that minimizes RMSD. But: the number of possible sets of atom pairs is exponential (in the protein "length" n and/or the number of pairs k). Finding the optimal set of pairs is believed (?) to be an NP-complete problem... In practice the so-called double dynamic programming heuristic is often used.

60 RMSD - some further aspects. Sequence order dependent alignment: the order of the atom pairs included in the RMSD is the same in both structures as their order in the amino acid sequences. Sequence order independent alignment: the order of the atom pairs included in the RMSD is unrelated to the order of the atoms in the amino acid sequences. It is not unambiguously clear which of the approaches is "better". The most popular structure comparison programs probably take the order of atoms in the amino acid sequences into account.

61 RMSD - some further aspects. Until now we assumed that protein structures are rigid. In principle structures can also be flexible: they can change slightly depending on "external conditions". There are modifications of the algorithms that take structural flexibility into account; e.g., we can first look for small rigid similar structure fragments, and then check whether we can fit them into both structures in the same order.

62 RMSD - some further aspects. For sequences we started with pairwise comparison and then claimed that it is often more interesting to compare more than two sequences simultaneously. What about structures? In principle there are programs that compare more than two structures simultaneously (using, e.g., something similar to the progressive heuristic), but the multiple alignment problem is less pressing for structures: structural similarity among homologs is preserved much better than sequence similarity; there is a different "evolutionary model", and for distant structures a multiple alignment usually will not reveal well-conserved structure fragments.

63 RMSD - DDP basic procedure [Adapted from I.Eidhammer, I.Jonassen]

64 RMSD - we start with a similarity matrix [Adapted from M.Gerstein] The initial matrix can be constructed based on amino acid similarity, although other criteria are also often used.

65 RMSD - similarity matrices [Adapted from M.Gerstein] Structural alignment: the similarity S(i,j) depends on the 3D coordinates of residues i and j. With d the distance between i and j, M(i,j) = 100 / (5 + d^2). The similarity for each atom pair is then recomputed: the smaller the distance after the RMSD-minimizing transformation, the "more similar" the pair.
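
A sketch of one round of this idea, under stated assumptions: the two structures are taken as already superposed, the similarity matrix uses the slide's formula M(i,j) = 100 / (5 + d^2), and a plain global dynamic programming pass then picks an order-preserving set of residue pairs. The gap penalty (zero here) and the function names are hypothetical, and this is a single pass, not the full iterative or double dynamic programming procedure.

```python
import math

def similarity_matrix(coords_a, coords_b):
    """M[i][j] = 100 / (5 + d^2), d = distance between residue i of A
    and residue j of B in the current superposition."""
    return [[100.0 / (5.0 + math.dist(a, b) ** 2) for b in coords_b]
            for a in coords_a]

def align(M, gap=0.0):
    """Global DP over similarity matrix M with a linear gap penalty.
    Returns (score, matched index pairs in sequence order)."""
    n, m = len(M), len(M[0])
    S = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        S[i][0] = S[i - 1][0] - gap
    for j in range(1, m + 1):
        S[0][j] = S[0][j - 1] - gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            S[i][j] = max(S[i - 1][j - 1] + M[i - 1][j - 1],
                          S[i - 1][j] - gap,
                          S[i][j - 1] - gap)
    # Trace back from the bottom-right corner to recover the matched pairs.
    pairs, i, j = [], n, m
    while i > 0 and j > 0:
        if S[i][j] == S[i - 1][j - 1] + M[i - 1][j - 1]:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif S[i][j] == S[i - 1][j] - gap:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return S[n][m], pairs

# Toy chains: B is A shifted by one residue, so A[1] and A[2] coincide
# with B[0] and B[1].
chain_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
chain_b = [(3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
score, matched = align(similarity_matrix(chain_a, chain_b))
```

In an iterative scheme, the matched pairs would next be used to recompute the RMSD-minimizing transformation, after which the matrix is rebuilt and the alignment repeated until it stops changing.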

66 RMSD - similarity matrices [Adapted from R.B.Altman]

67 RMSD - similarity matrices [Adapted from I.Eidhammer, I.Jonassen]

68 Disadvantages of RMSD: all atoms are treated as equal (but residues on the surface usually have greater freedom of movement than residues inside the structure); the best alignment does not necessarily mean the best RMSD; RMSD performance depends on the size of the molecules. [Adapted from T.Hanekamp]

69 RMSD alternatives: aRMSD = best root-mean-square deviation calculated over all aligned alpha-carbon atoms; bRMSD = the RMSD over the highest scoring residue pairs; wRMSD = weighted RMSD. [Adapted from T.Hanekamp]

70 Example - comparison of 3znf and 4znf. Lys30. 30 CA atoms: RMS = 0.70 Å; 248 atoms: RMS = 1.42 Å. [Adapted from T.Hanekamp]

71 How easy is it to notice structural similarity? Easy: globins, 125 res., ~1.5 Å. Tricky: Ig C & V, 85 res., ~3 Å. Very subtle: G3P-dehydrogenase, C-term. domain, >5 Å. [Adapted from M.Gerstein]

72 Structure similarity and Computer Vision [Adapted from M.Shatsky]

73 A simple heuristic algorithm [Adapted from M.Shatsky] For each pair of point triples (one from each molecule) that form "almost equal" triangles, find an affine transformation that maps one of them onto the other. Count the point pairs that are "almost superimposed" by this transformation and rank the hypotheses by this count. For the best hypotheses, improve the transformation by using RMSD. Complexity (assuming there are n points in each molecule): O(n^7). If n = 100, then n^7 = 10^14 :(

74 Reference point triples (figure points: p1, p2, p3) [Adapted from M.Shatsky] Reference frame - a triple of orthogonal unit vectors originating from one point. Every (non-degenerate) triple of 3D points can be uniquely assigned such a reference frame.

75 Geometric hashing - the idea [Adapted from M.Shatsky] Choose a reference frame. Find the point coordinates in this reference frame. Use these coordinates as "hash" addresses and place the points in a hash table. Repeat this step for each reference frame.

76 Geometric hashing - the idea [Adapted from M.Shatsky] We choose a universal reference frame and, for each triple, hash the transformation to the local reference frame (time O(n^4)).

77 Geometric hashing - recognition [Adapted from M.Shatsky] For the target protein: choose a reference frame; find the coordinates of the other points in this reference frame; use the coordinates to select points from the hash table; find RMSD transformations for the best hypotheses; repeat for each reference frame; select the best alignments. O(n^4 + n^4 * BinSize) ~ O(n^5). If n = 100, then n^5 = 10^10.
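
The preprocessing and recognition steps can be sketched in 2D, where a reference frame is defined by an ordered pair of points rather than a triple. This is a simplified illustration under several assumptions: rigid motions only, a hypothetical bin size of 0.5, voting without the final RMSD refinement, and function names invented for the example.

```python
import math
from collections import defaultdict

def frame_coords(points, i, j):
    """Coordinates of all other points in the frame whose origin is
    points[i] and whose x-axis points towards points[j]."""
    ox, oy = points[i]
    ex, ey = points[j][0] - ox, points[j][1] - oy
    norm = math.hypot(ex, ey)
    ex, ey = ex / norm, ey / norm
    coords = []
    for k, (x, y) in enumerate(points):
        if k != i and k != j:
            dx, dy = x - ox, y - oy
            coords.append((dx * ex + dy * ey, -dx * ey + dy * ex))
    return coords

def hash_key(u, v, bin_size):
    """Quantise frame coordinates into a hash table address."""
    return (round(u / bin_size), round(v / bin_size))

def build_table(model, bin_size=0.5):
    """Preprocessing: for every ordered basis pair, store the basis under
    the hash address of every other model point."""
    table = defaultdict(list)
    for i in range(len(model)):
        for j in range(len(model)):
            if i != j:
                for u, v in frame_coords(model, i, j):
                    table[hash_key(u, v, bin_size)].append((i, j))
    return table

def recognise(table, target, bin_size=0.5):
    """Recognition: for every target basis, let the other target points
    vote for model bases; return (votes, model_basis, target_basis)."""
    best = (0, None, None)
    for i in range(len(target)):
        for j in range(len(target)):
            if i == j:
                continue
            votes = defaultdict(int)
            for u, v in frame_coords(target, i, j):
                for basis in table.get(hash_key(u, v, bin_size), ()):
                    votes[basis] += 1
            for basis, count in votes.items():
                if count > best[0]:
                    best = (count, basis, (i, j))
    return best

# Model: point set (a) from the 2D example slide; target: the same set
# rotated by 40 degrees and shifted (the motion is an arbitrary choice).
model = [(0, 0), (6, 2), (8, 0), (9, 4), (6, 10), (3, 8), (-1, 6)]
angle = math.radians(40)
target = [(x * math.cos(angle) - y * math.sin(angle) + 3.0,
           x * math.sin(angle) + y * math.cos(angle) - 7.0) for x, y in model]
votes, model_basis, target_basis = recognise(build_table(model), target)
```

With a matching basis pair, every one of the remaining n - 2 points falls into an occupied bin, so the best hypothesis collects n - 2 votes. The 2D table costs O(n^3) to build; the extra factor of n in the 3D bounds on the slides comes from using point triples as bases.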

78 Geometric hashing - 2D example [Adapted from I.Eidhammer, I.Jonassen]

79 Geometric hashing - 2D example [Adapted from I.Eidhammer, I.Jonassen] (a) (0,0)(6,2)(8,0)(9,4)(6,10)(3,8)(-1,6) (b) (1,8)(2,2)(0,0)(4,-2)(10,0)(8,3)(8,7) (c) (0,0)(3,-2)(8,0)(6,2)(10,4)(3,8)(0,6)

80 Geometric hashing - 2D example [Adapted from I.Eidhammer, I.Jonassen]

81 Reference secondary structure elements. A base fingerprint is a 5D vector composed of: SSE types (helix, strand), line distance, midpoint distance, angle. Figure labels: midpoint distance, line distance.

82 Geometric hashing - advantages [Adapted from M.Shatsky] Independence from sequences. Can be used for partially disconnected structures. Makes it possible to find interesting "patterns". Comparatively fast. Can also be applied to the docking problem. Can be easily parallelized.

83 Protein structure databases - PDB http://www.pdb.org

84 PDB file fragment
ATOM 1575 C   ASP E 211 -4.659 29.609  1.843 1.00 0.03 1ENT1729
ATOM 1576 O   ASP E 211 -5.333 29.668  2.876 1.00 0.06 1ENT1730
ATOM 1577 CB  ASP E 211 -6.058 31.009  0.311 1.00 0.12 1ENT1731
ATOM 1578 CG  ASP E 211 -5.117 32.197  0.534 1.00 0.08 1ENT1732
ATOM 1579 OD1 ASP E 211 -4.841 32.534  1.691 1.00 0.30 1ENT1733
ATOM 1580 OD2 ASP E 211 -4.650 32.810 -0.429 1.00 0.62 1ENT1734
ATOM 1581 N   GLY E 212 -3.346 29.481  1.866 1.00 0.02 1ENT1735
ATOM 1582 CA  GLY E 212 -2.634 29.404  3.141 1.00 0.03 1ENT1736
ATOM 1583 C   GLY E 212 -1.251 29.989  3.025 1.00 0.08 1ENT1737
ATOM 1584 O   GLY E 212 -0.818 30.413  1.957 1.00 0.04 1ENT1738
ATOM 1585 N   ILE E 213 -0.533 30.029  4.146 1.00 0.00 1ENT1739
ATOM 1586 CA  ILE E 213  0.817 30.575  4.112 1.00 0.03 1ENT1740
ATOM 1587 C   ILE E 213  1.843 29.530  4.545 1.00 0.04 1ENT1741
Format: 80 characters per line, with fixed positions for each attribute. Fields shown: atom number, atom, amino acid (AS), chain, amino acid number (AS Nr), X, Y, Z, occupancy, temperature factor. Only 5 digits are available for the atom serial number, but some structures have already been received with more than 99,999 atoms...
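
Because every attribute sits at a fixed position in the line, an ATOM record can be read by slicing rather than by splitting on whitespace. The sketch below uses the column ranges of the published PDB format for ATOM records (serial in columns 7-11, atom name 13-16, residue name 18-20, chain 22, residue number 23-26, X/Y/Z 31-54, occupancy 55-60, temperature factor 61-66). The helper names are illustrative, and the example line is regenerated in fixed-column form because the whitespace in the listings above was collapsed during extraction.

```python
def format_atom_line(serial, name, res_name, chain, res_seq, x, y, z,
                     occupancy, temp_factor):
    """Build a fixed-column ATOM record (alternate location and insertion
    code left blank) so that the parser has a well-formed line to read."""
    return (f"ATOM  {serial:5d} {name:<4s} {res_name:>3s} {chain:1s}"
            f"{res_seq:4d}    {x:8.3f}{y:8.3f}{z:8.3f}"
            f"{occupancy:6.2f}{temp_factor:6.2f}")

def parse_atom_line(line):
    """Parse one ATOM/HETATM record by column position (0-based slices)."""
    if not line.startswith(("ATOM", "HETATM")):
        raise ValueError("not an ATOM/HETATM record")
    return {
        "serial": int(line[6:11]),
        "name": line[12:16].strip(),
        "res_name": line[17:20].strip(),
        "chain": line[21],
        "res_seq": int(line[22:26]),
        "x": float(line[30:38]),
        "y": float(line[38:46]),
        "z": float(line[46:54]),
        "occupancy": float(line[54:60]),
        "temp_factor": float(line[60:66]),
    }

# Values taken from atom 117 of the 1G65 excerpt shown earlier in the deck.
record = format_atom_line(117, " CA", "SER", "A", 18,
                          49.119, -72.807, 28.499, 1.00, 40.24)
atom = parse_atom_line(record)
```

Slicing also copes with the case noted on the slide's last line: once numbers fill their columns completely, neighbouring fields touch and whitespace splitting fails, while fixed positions still work.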

85 Protein structure databases - MMDB http://www.ncbi.nlm.nih.gov/Structure/MMDB/mmdb.shtml

86 Structure visualization 1) Rasmol and Protein Explorer http://www.umass.edu/microbio/rasmol/ 2) Cn3D http://www.ncbi.nlm.nih.gov/Structure/CN3D/cn3d.shtml 3) DeepView Swiss-PDB Viewer (quite a powerful modeling program; also calculates various RMSDs) http://www.expasy.org/spdbv/

87 Structure comparison - DaliLite http://www.ebi.ac.uk/DaliLite/

88 Structure comparison - SSAP http://www.cathdb.info/cgi-bin/cath/SsapServer.pl

89 Structure comparison - VAST http://www.ncbi.nlm.nih.gov/Structure/VAST/vastsearch.html

90 Structure comparison - CE and CL http://www.ncbi.nlm.nih.gov/Structure/VAST/vastsearch.html

91 Protein structure classifications - SCOP http://cl.sdsc.edu/cl.html

92 Protein structure classifications - CATH http://www.cathdb.info

93 Protein structure classifications - CATH. CATH - hierarchical classification of protein domain structures [C.Orengo, J.Thornton et al; UCL]. CATH number, e.g. 3.30.70.330: Class (C) = 3, Architecture (A) = 30, Topology (T) = 70, Homologous superfamily (H) = 330.

94 Protein structure classifications - CATH. CATH number 3.30.70.330 (C.A.T.H). Class: 1 - mainly alpha; 2 - mainly beta; 3 - alpha-beta; 4 - low secondary structure content. Assigned automatically.

95 Protein structure classifications - CATH. CATH number 3.30.70.330 (C.A.T.H). Architecture: overall shape of the domain structure according to the orientations of secondary structures. Assigned manually.

96 Protein structure classifications - CATH. CATH number 3.30.70.330 (C.A.T.H). Topology: shape and connectivity of secondary structures. Assigned automatically by the SSAP algorithm.

97 Protein structure classifications - CATH. CATH number 3.30.70.330 (C.A.T.H). Homologous superfamily: proteins that share a common ancestor. Assigned automatically by sequence comparisons and SSAP.
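
Since a CATH number is just four dot-separated levels, it is easy to handle programmatically. The sketch below splits a number such as the slides' 3.30.70.330 into its Class, Architecture, Topology and Homologous superfamily parts and reports the deepest level two domains share. The class names come from slide 94; the type and function names, and the second CATH number used for comparison, are made up for illustration.

```python
from typing import NamedTuple, Optional

# Class names as listed on the slide describing the C level.
CLASS_NAMES = {
    1: "mainly alpha",
    2: "mainly beta",
    3: "alpha-beta",
    4: "low secondary structure content",
}

class CathNumber(NamedTuple):
    class_: int
    architecture: int
    topology: int
    homologous_superfamily: int

def parse_cath(code: str) -> CathNumber:
    """Parse 'C.A.T.H' (spaces after the dots are tolerated)."""
    parts = [int(p) for p in code.replace(" ", "").split(".")]
    if len(parts) != 4:
        raise ValueError("a CATH number needs exactly four levels")
    return CathNumber(*parts)

def deepest_shared_level(a: CathNumber, b: CathNumber) -> Optional[str]:
    """Return 'C', 'A', 'T' or 'H' for the deepest level at which the two
    numbers still agree, or None if even the class differs."""
    shared = None
    for level, x, y in zip("CATH", a, b):
        if x != y:
            break
        shared = level
    return shared

example = parse_cath("3. 30. 70. 330")
class_name = CLASS_NAMES[example.class_]
# Hypothetical second domain with the same topology but another superfamily.
other = parse_cath("3.30.70.100")
level = deepest_shared_level(example, other)
```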

98 Protein structure classifications - DALI http://www.ebi.ac.uk/dali/

99 Protein structure classifications - DALI http://ekhidna.biocenter.helsinki.fi/dali/start

100 RNA structure prediction? RNA sequence: ...AGGCUAUGGCCA... Fortunately here we can do better...

101 RNA structure [Adapted from R.B.Altman]

102 RNA structure - pseudoknots [Adapted from R.B.Altman]

103 Energy minimization [Adapted from R.B.Altman]

104 RNA structure prediction - DP algorithm [Adapted from R.B.Altman]

105 RNA structure prediction - DP algorithm [Adapted from R.B.Altman]
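
The DP algorithm itself was on slide images that this transcript does not preserve. As an assumption about the kind of algorithm meant, the sketch below implements the textbook Nussinov recurrence, which maximises the number of A-U and G-C base pairs; energy-minimising methods have a similar DP shape but score loops and stacks with energies instead of counting pairs. The minimum hairpin loop length of 3 is an assumed parameter, and pseudoknots are not handled.

```python
def nussinov(seq, min_loop=3):
    """Maximise the number of nested A-U / G-C pairs in `seq`.
    Returns (pair_count, dot_bracket_structure)."""
    pairs = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
    n = len(seq)
    if n == 0:
        return 0, ""
    N = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = max(N[i + 1][j], N[i][j - 1])       # i or j left unpaired
            if (seq[i], seq[j]) in pairs:
                best = max(best, N[i + 1][j - 1] + 1)  # i pairs with j
            for k in range(i + 1, j):                  # split into two parts
                best = max(best, N[i][k] + N[k + 1][j])
            N[i][j] = best
    # Trace back one optimal structure.
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue
        if N[i][j] == N[i + 1][j]:
            stack.append((i + 1, j))
        elif N[i][j] == N[i][j - 1]:
            stack.append((i, j - 1))
        elif (seq[i], seq[j]) in pairs and N[i][j] == N[i + 1][j - 1] + 1:
            structure[i], structure[j] = "(", ")"
            stack.append((i + 1, j - 1))
        else:
            for k in range(i + 1, j):
                if N[i][j] == N[i][k] + N[k + 1][j]:
                    stack.append((i, k))
                    stack.append((k + 1, j))
                    break
    return N[0][n - 1], "".join(structure)

# A small hairpin: three G-C pairs closing an AAA loop.
count, structure = nussinov("GGGAAACCC")
# The fragment shown on the slides can be folded the same way.
slide_count, slide_structure = nussinov("AGGCUAUGGCCA")
```

The table has O(n^2) entries and each takes O(n) time to fill, so the sketch runs in O(n^3) time.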

