Published by Logan Johnston. Modified over 9 years ago.
1
Scott Hollingsworth (Department of Biochemistry & Biophysics, Oregon State University)
Mentor: Dr. P. Andrew Karplus (Department of Biochemistry & Biophysics, OSU)
In collaboration with:
Dr. Weng-Keen Wong (Department of Computer Science, OSU)
Dr. Donald Berkholz (Department of Biochemistry and Molecular Biology, Mayo Clinic)
Dr. Dale Tronrud (Department of Biochemistry & Biophysics, OSU)
2
Each protein has an individual structure
Function flows from structure: understand the structure, understand the function
[Figure: Ptr ToxA structure]
3
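Phi & Psi (φ, ψ)
φ and ψ are the two backbone dihedral angles of each amino acid residue: φ is the rotation about the N-Cα bond and ψ the rotation about the Cα-C bond. Together they describe how the residue's two flanking planar peptide groups are oriented relative to each other.
One amino acid: two angles
Ramachandran plot (axes φ and ψ; figure from Voet, Voet & Pratt, Biochemistry, upcoming 4th edition)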
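To make the definition concrete, here is a minimal Python sketch of how φ and ψ can be computed from backbone atom coordinates. It assumes atoms are given as plain (x, y, z) tuples and uses the IUPAC sign convention (angles in degrees, range -180 to 180); the function names `dihedral` and `phi_psi` are illustrative and not part of any tool mentioned in this presentation.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) of p0-p1-p2-p3 about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))

def phi_psi(c_prev, n, ca, c, n_next):
    """phi = C(i-1)-N-CA-C and psi = N-CA-C-N(i+1) for residue i."""
    return dihedral(c_prev, n, ca, c), dihedral(n, ca, c, n_next)
```

Each residue therefore contributes exactly one (φ, ψ) point to the Ramachandran plot; the first and last residues of a chain lack one neighbour, so φ or ψ is undefined there.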
4
Use of the Protein Geometry Database (PGD) to identify which linear groups occur in proteins (e.g. α-helix, β-sheet, π-helix…)
Linear groups: simple repeating structures in which every residue adopts approximately the same (φ, ψ)
Method: manual searches
Hollingsworth et al. 2009. “On the occurrence of linear groups in proteins.” Protein Sci. 18:1321-25
[Figure: α-helix and 3₁₀-helix]
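A manual linear-group search can be approximated in code. The sketch below assumes a linear group shows up as a run of consecutive residues whose (φ, ψ) stays within a fixed tolerance of one centre; the tolerance, minimum run length, and the names `angle_diff` and `find_runs` are hypothetical choices, not the PGD's actual search interface.

```python
def angle_diff(a, b):
    """Smallest signed difference a - b between two angles, in degrees."""
    return (a - b + 180.0) % 360.0 - 180.0

def find_runs(phi_psi, center, tol, min_len):
    """Return (start, end) index pairs (end exclusive) for runs of at least
    min_len consecutive residues whose phi and psi both lie within tol
    degrees of center. Angle differences wrap around at +/-180."""
    runs = []
    start = None
    for i, (phi, psi) in enumerate(phi_psi):
        inside = (abs(angle_diff(phi, center[0])) <= tol and
                  abs(angle_diff(psi, center[1])) <= tol)
        if inside and start is None:
            start = i                      # a new candidate run begins
        elif not inside and start is not None:
            if i - start >= min_len:       # keep only long enough runs
                runs.append((start, i))
            start = None
    # A run may continue to the end of the chain.
    if start is not None and len(phi_psi) - start >= min_len:
        runs.append((start, len(phi_psi)))
    return runs
```

For instance, `find_runs(chain, center=(-63.4, -42.0), tol=20, min_len=4)` would flag α-helical stretches, using the α-helix peak position from the motif list later in this talk (the tolerance and length are illustrative).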
5
Linear groups are only part of the picture
Not all common protein motifs are repeating structures; many have changing conformations
Goal of this research: identify all common motifs in proteins
Too complex for manual searches: enter machine learning
6
Machine learning: a form of artificial intelligence
Can identify clusters within a dataset
Cluster: a significant grouping of data points
Visual example…
7
Topographical map of Oregon
Data value: elevation
Highest points (individual peaks):
Mt. Hood (11,239 feet)
Mt. Jefferson (10,497 feet)
Three Sisters (10,047-10,358 feet)
8
Topographical map of Oregon Data value: Elevation Highest points (Individual peaks)
9
Topographical map of Oregon
Data value: elevation
Mountain ranges (broad patterns)
[Map labels: Cascades, Coast Ranges, Siskiyous (Klamath), Blue Mts, Wallowas, Steens, Strawberries, Ochoco, Mahogany Mts, Jackass Mts, Hart Mtn, Tualatin Hills, Trout Creek Mts, Paulina Mts]
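The map analogy separates single peaks from broad ranges. One simple way to express "ranges" on gridded data is to group every cell above a threshold into connected regions. This flood-fill sketch illustrates that idea only (the threshold and the name `find_ranges` are hypothetical); the `wrap` option treats the grid as periodic, which is what a (φ, ψ) grid needs because -180° and 180° are the same angle.

```python
from collections import deque

def find_ranges(grid, threshold, wrap=False):
    """Group grid cells with value >= threshold into connected regions
    (4-neighbour flood fill). Returns a list of sets of (row, col).
    With wrap=True, the top/bottom and left/right edges are joined."""
    rows, cols = len(grid), len(grid[0])
    seen = set()
    regions = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] < threshold or (r, c) in seen:
                continue
            # Breadth-first search outward from an unvisited high cell.
            region = set()
            queue = deque([(r, c)])
            seen.add((r, c))
            while queue:
                cr, cc = queue.popleft()
                region.add((cr, cc))
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = cr + dr, cc + dc
                    if wrap:
                        nr, nc = nr % rows, nc % cols
                    elif not (0 <= nr < rows and 0 <= nc < cols):
                        continue
                    if (nr, nc) not in seen and grid[nr][nc] >= threshold:
                        seen.add((nr, nc))
                        queue.append((nr, nc))
            regions.append(region)
    return regions
```

Lowering the threshold merges neighbouring peaks into larger regions, which mirrors how individual summits join into a mountain range.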
10
Similar approach with our data
2-dimensional example [Figure: axes φ and ψ]
11
Similar approach with our data
2-dimensional example [Figure: (φ, ψ) abundance landscape with labeled peaks α-helix, β, PII and αL; axes φ and ψ, height = abundance]
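The abundance axis in the 2D example is essentially a count of residues per (φ, ψ) bin. A minimal sketch, assuming 10° square bins (a hypothetical choice; `bin_width` must divide 360) and angles in degrees, wraps both angles so that -180° and 180° fall into the same bin:

```python
def phi_psi_histogram(angles, bin_width=10):
    """Count (phi, psi) pairs in square bins covering [-180, 180) x [-180, 180).
    counts[i][j] holds the number of points in phi bin i and psi bin j."""
    n = 360 // bin_width
    counts = [[0] * n for _ in range(n)]
    for phi, psi in angles:
        # Shift to [0, 360) and wrap; the final % n guards the 360 edge.
        i = int(((phi + 180.0) % 360.0) // bin_width) % n
        j = int(((psi + 180.0) % 360.0) // bin_width) % n
        counts[i][j] += 1
    return counts
```

The highest cells of this grid correspond to the peaks in the landscape (α-helix, β, PII, αL); finer bins resolve shapes better but need more data per bin.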
12
Complications…
Our data: a 4-dimensional dataset (φi, ψi, φi+1, ψi+1)
Converting distances between 4D and 2D
What has and hasn't been observed? No definitive source
Abundance / peak heights
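One reason 4D and 2D distances are not directly comparable: the same per-angle offset yields a larger Euclidean distance in more dimensions. The helper below is a sketch assuming plain Euclidean distance with each angle difference taken the short way around the circle; that metric is an assumption, not necessarily the one cuevas uses.

```python
import math

def angular_distance(a, b):
    """Euclidean distance between two equal-length tuples of angles (degrees),
    taking the shorter way around the circle in every dimension."""
    if len(a) != len(b):
        raise ValueError("points must have the same dimensionality")
    total = 0.0
    for x, y in zip(a, b):
        d = abs(x - y) % 360.0
        d = min(d, 360.0 - d)   # 170 and -170 are 20 degrees apart, not 340
        total += d * d
    return math.sqrt(total)
```

A 10° offset in every angle gives a distance of about 14.1° in 2D but 20° in 4D, so a radius chosen for 2D data does not carry over unchanged.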
13
Machine learning programs can identify both previously documented and unknown common motifs and their abundances
14
1) Create and prepare datasets at two resolution cutoffs: 1.2 Å or better, and 1.75 Å or better
2) Run cuevas
3) Analyze identified clusters (an automated process using Python, to remove bias)
4) Analyze the context of the motifs
[Figure: 2D visual example of cuevas clustering]
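Step 1 amounts to filtering structures by crystallographic resolution, where a smaller number in ångströms means higher resolution. A minimal sketch, assuming each chain is a hypothetical `Chain` record with a `resolution` field (this is not the actual PGD data format):

```python
from dataclasses import dataclass

@dataclass
class Chain:
    # Hypothetical record for one protein chain; field names are illustrative.
    chain_id: str
    resolution: float  # in angstroms; smaller means higher resolution

def filter_by_resolution(chains, cutoff):
    """Keep chains at `cutoff` angstroms resolution or better (<= cutoff)."""
    return [c for c in chains if c.resolution <= cutoff]
```

Calling `filter_by_resolution(chains, 1.2)` and `filter_by_resolution(chains, 1.75)` yields the two datasets; the 1.2 Å set is always a subset of the 1.75 Å set.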
15
Goal: a definitive list of the most common protein motifs, in order of abundance
“Everest” method
▪ Locate the “highest” peak first (bad pun: “Mt. Alpha-rest”)
▪ Locate the second highest peak
▪ Locate the third…
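The “Everest” idea can be sketched as a greedy loop: find the densest point, record it as a peak, remove its neighbourhood, and repeat. This is an illustrative interpretation only, not a description of how cuevas works. It assumes a neighbourhood is a 10° radius (matching the r = 10° circle used in the motif list) under periodic Euclidean distance; `everest_peaks` and its parameters are hypothetical.

```python
import math

def _dist(a, b):
    # Periodic Euclidean distance between tuples of angles in degrees.
    total = 0.0
    for x, y in zip(a, b):
        d = abs(x - y) % 360.0
        d = min(d, 360.0 - d)
        total += d * d
    return math.sqrt(total)

def everest_peaks(points, radius=10.0, max_peaks=10, min_count=1):
    """Greedy 'highest peak first' sketch. Returns (center, count) pairs,
    where count is the number of points within `radius` of center."""
    remaining = list(points)
    peaks = []
    while remaining and len(peaks) < max_peaks:
        best, best_members = None, []
        for p in remaining:
            members = [q for q in remaining if _dist(p, q) <= radius]
            if len(members) > len(best_members):
                best, best_members = p, members
        if len(best_members) < min_count:
            break                      # remaining peaks are too small
        peaks.append((best, len(best_members)))
        # Remove this peak's neighbourhood before looking for the next one.
        remaining = [q for q in remaining if _dist(best, q) > radius]
    return peaks
```

Each pass compares every remaining point with every other, so this is only practical for small illustrative datasets; it works unchanged in 2D or 4D because the distance handles any number of angles.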
16
Identifying motifs
Search for peaks while looking for ranges
Results: a definitive list of common protein motifs in order of abundance
The list…
17
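Points counted within a circle of radius r = 10°; all angles in degrees.

| Points | Per residue | φi | ψi | φi+1 | ψi+1 | Region i | Region i+1 | Cluster size | Motif name | New motif |
|---|---|---|---|---|---|---|---|---|---|---|
| 5644 | 18.07 | -63.4 | -42 | -64 | -40.6 | α | α | 1 | α-helix / 3₁₀-helix | |
| 247 | 0.7909 | -125.5 | 132.4 | -118 | 130.2 | β | β | 1 | β-strand | |
| 173 | 0.5540 | -69.9 | 157.4 | -61 | -36.3 | PII | α | 1 | PII-Helix N-Cap / Capping Box | |
| 147 | 0.4707 | -65.5 | -21.4 | -90.3 | 1.5 | α | δ | 1 | Type I Turn | # |
| 125 | 0.4003 | -70.4 | 153.6 | -60.4 | 143 | PII | PII | 1 | | |
| 117 | 0.3747 | -57.2 | 131 | 82.4 | -0.6 | PII | δL | 1 | Type II Turn | |
| 88 | 0.2818 | -88.3 | -2 | -64.7 | 136.9 | δ | PII | 1 | Type I Turn Cap | |
| 55 | 0.1761 | -88.1 | 1.3 | 87.9 | 5.7 | δ | δL | 1 | Schellman Motif | |
| 51 | 0.1633 | -91.8 | -1.9 | -58.4 | -42.5 | δ | α | 1 | Reverse Type I Turn | X |
| 43 | 0.1377 | 93.5 | -0.1 | -71.7 | 146 | δL | PII | 1 | Reverse Type II Turn | X |
| 40 | 0.1281 | -133.9 | 164.3 | -62.2 | -34.1 | β | α | 1 | βα Turn | |
| 36 | 0.1153 | -82.4 | -26.8 | -146.3 | 152.1 | δ | β | 2 | Classic Beta Bulge | ‡ |
| 35 | 0.1121 | 54.9 | 38.3 | 84.5 | 0.8 | αL | δL | 1 | Type I′ Turn | |
| 34 | 0.1089 | -122.3 | 119.6 | 52.7 | 41 | β | αL | 1 | β → αL | X |
| 31 | 0.0993 | -136.1 | 70.4 | -65 | -19 | ζ | α | 1 | ζ → α | P † |
| 31 | 0.0993 | 65.3 | 28.3 | -67.2 | 140.8 | αL | PII | 1 | G1 Beta Bulge | |
| 30 | 0.0961 | 82.6 | 5.6 | -103.1 | 137.5 | δL | β | 1 | δL → β | X |
| 29 | 0.0929 | 56.7 | -133.5 | -73.7 | -10.7 | PII′ | δ | 3 | Type II′ Turn | |
| 24 | 0.0769 | 78 | 0.5 | -67.5 | -43.1 | δL | α | 1 | δL → α | X |
| 20 | 0.0640 | -78.3 | 116 | -89.1 | -31.1 | PII | δ | 1 | Type VIa1 Turn (S) | |
| 20 | 0.0640 | -96.6 | 0.9 | -133.8 | 156.3 | δ | β | 1 | Classic Beta Bulge (S) | |
| 20 | 0.0640 | 50.5 | 49.9 | -61.2 | 148.3 | αL | PII | 1 | Wide Beta Bulge (S) | |
| 19 | 0.0608 | -69.9 | -32.3 | -129.8 | 73.1 | α | ζ | 2 | α → ζ | † |
| 17 | 0.0544 | -129.1 | 80.8 | -70.3 | 141.9 | ζ | PII | 1 | ζ → PII | X |
| 15 | 0.0480 | 53.7 | 48 | -118.9 | 126.6 | αL | β | 1 | αL → β (S) | X |
| 14 | 0.0448 | -87.6 | 61 | -140.3 | 149.5 | γ′ | β | 1 | γ′ Turn | |
| 11 | 0.0352 | 76.3 | -169.3 | -61.4 | 138.3 | PII′ | PII | 2 | PII′ → PII | X |
| 10 | 0.0320 | 78.8 | 171.1 | -69.3 | -29.6 | PII′ | α | 1 | PII′ → α (S) | X |
| 9 | 0.0288 | -138.5 | 165.7 | 57.7 | -137.8 | β | PII′ | 2 | β → PII′ | X |
| 9 | 0.0288 | 92.8 | 165.9 | -62.5 | -35.7 | ε | α | 1 | ε → α | X |
| 8 | 0.0256 | -107.6 | 16.8 | 80 | -177 | δ | PII′ | 1 | Reverse Type II′ Turn | X |
| 8 | 0.0256 | 84.6 | 8.1 | -143 | 169.3 | δL | β | 1 | δL → β | X |
| 7 | 0.0224 | -85.8 | 71.8 | -83.1 | 163.5 | γ′ | PII | 3 | γ′ → PII | X |
| 6 | 0.0192 | -102.4 | -9 | 92.6 | 163.3 | δ | ε | 4 | δ → ε | X |
| 6 | 0.0192 | -77.9 | -8.6 | 86.7 | 174.2 | δ | ε | 1 | δ → ε (S) | X |
| 6 | 0.0192 | 83.8 | -166.3 | -121.9 | 132.1 | PII′ | β | 1 | PII′ → β | X |
| 6 | 0.0192 | 57.1 | 44.5 | -152.5 | 158.8 | αL | β | 1 | αL → β | X |
| 6 | 0.0192 | -128.3 | 98.7 | 56.7 | -133.3 | ζ | PII′ | 1 | ζ → PII′ | X |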
18
Motif “shapes”
Each motif was analyzed by plotting its range, to understand the shape of the cluster/motif
Results: new insight into each motif's structure
Context
Comparisons
19
Example cluster shape: Type II vs. Type II′
Hairpin turns: a 180° turn spanning two residues
Defined as mirror images of each other
Distributions show differences between the two structures
Nearly four years in the making…
[Figure: Type II and Type II′ cluster distributions; axes φ and ψ]
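The mirror-image relationship can be checked numerically: reflecting a backbone conformation changes the sign of every φ and ψ. The sketch below mirrors the Type II turn peak centre from the motif list and measures its distance to the Type II′ centre. The periodic Euclidean metric is an assumption made for illustration, and `mirror` and `angular_distance` are hypothetical helper names.

```python
import math

def mirror(point):
    """Mirror image of a backbone conformation: every phi and psi changes sign."""
    return tuple(-x for x in point)

def angular_distance(a, b):
    # Euclidean distance over angles (degrees), each difference wrapped to <= 180.
    total = 0.0
    for x, y in zip(a, b):
        d = abs(x - y) % 360.0
        d = min(d, 360.0 - d)
        total += d * d
    return math.sqrt(total)

# Peak centres (phi_i, psi_i, phi_i+1, psi_i+1) copied from the motif list.
TYPE_II = (-57.2, 131.0, 82.4, -0.6)
TYPE_II_PRIME = (56.7, -133.5, -73.7, -10.7)

offset = angular_distance(mirror(TYPE_II), TYPE_II_PRIME)
```

With these values the offset comes out at roughly 14.5°, so the mirrored peak centres are close but not identical.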
20
The results go on…
Motif analysis
▪ Viral forming of “Pangea”
Range and peak method sections
▪ Adapting cuevas for our data
▪ Python automation
▪ Identification of 3₁₀-helix & Type I turn
6D, 8D, 10D and 12D clustering
▪ Full helix caps, loops, half-turns…
For the full story, a manuscript is being prepared for publication: Hollingsworth et al. “The protein parts list: motif identification through the application of machine learning.” (Unpublished)
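Clustering in 6D, 8D, 10D and 12D presumably extends the 4D (φi, ψi, φi+1, ψi+1) representation to windows of 3, 4, 5 and 6 consecutive residues. Assuming that is the construction, a sketch of building those points from one chain's (φ, ψ) list (the function name `windows` is hypothetical):

```python
def windows(phi_psi, k):
    """Turn a chain's per-residue (phi, psi) list into 2k-dimensional points,
    one per run of k consecutive residues:
    (phi_i, psi_i, phi_i+1, psi_i+1, ..., phi_i+k-1, psi_i+k-1)."""
    if k < 1:
        raise ValueError("window length k must be at least 1")
    points = []
    for i in range(len(phi_psi) - k + 1):
        point = []
        for phi, psi in phi_psi[i:i + k]:
            point.extend((phi, psi))
        points.append(tuple(point))
    return points
```

This assumes an unbroken chain; real data would need to be split at chain breaks and at the termini, where φ or ψ is undefined.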
21
Cuevas was successful in identifying both documented and undocumented motifs
Previously described: linear groups, helix caps, β-turns (& reverses), β-bulges, α-turns, loops, helix bends, π-structures…
Numerous new motifs
Successful from 4D through 20D
Results form the “Protein Parts” List: a comprehensive list of all common motifs found in proteins
22
Dr. P. Andrew Karplus Dr. Weng-Keen Wong Dr. Donald Berkholz Dr. Dale Tronrud Dr. Kevin Ahern Howard Hughes Medical Institute