Scott Hollingsworth (Department of Biochemistry & Biophysics, Oregon State University) Mentor: Dr. P. Andrew Karplus (Department Of Biochemistry & Biophysics,


1 Scott Hollingsworth (Department of Biochemistry & Biophysics, Oregon State University) Mentor: Dr. P. Andrew Karplus (Department of Biochemistry & Biophysics, OSU) In collaboration with: Dr. Weng-Keen Wong (Department of Computer Science, OSU), Dr. Donald Berkholz (Department of Biochemistry and Molecular Biology, Mayo Clinic), Dr. Dale Tronrud (Department of Biochemistry & Biophysics, OSU)

2 Each protein has an individual structure. Function flows from structure. Understand structure, understand function. (Figure: Ptr Tox A)

3  Phi & Psi (φ, ψ)  Phi and psi describe the conformation of the planar peptide (amino acid) relative to its neighboring peptides  One amino acid – two angles  Ramachandran Plot (Voet, Voet & Pratt, Biochemistry, upcoming 4th Edition)  Axes: φ, ψ
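As a concrete illustration of what φ and ψ measure, here is a minimal sketch of a signed torsion (dihedral) angle computed from four atom positions. It assumes the standard backbone definitions (φ from C(i-1), N(i), Cα(i), C(i); ψ from N(i), Cα(i), C(i), N(i+1)); the function name `dihedral` and the toy coordinates are illustrative and are not taken from the slides.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Signed torsion angle in degrees for four points (cis = 0, trans = 180).

    For phi of residue i pass C(i-1), N(i), CA(i), C(i);
    for psi pass N(i), CA(i), C(i), N(i+1).
    """
    b0 = _sub(p1, p0)           # first bond vector
    b1 = _sub(p2, p1)           # central bond (rotation axis)
    b2 = _sub(p3, p2)           # last bond vector
    n1 = _cross(b0, b1)         # normal of the plane p0-p1-p2
    n2 = _cross(b1, b2)         # normal of the plane p1-p2-p3
    x = _dot(n1, n2)
    y = math.sqrt(_dot(b1, b1)) * _dot(b0, n2)
    return math.degrees(math.atan2(y, x))

# Toy coordinates (not real atoms): a quarter turn about the z axis.
example_angle = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
```

Each residue contributes one (φ, ψ) pair, which is the point plotted on a Ramachandran plot.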

4  Use of Protein Geometry Database (PGD) to identify linear group existence (i.e. α-helix, β-sheet, π-helix…)  Simple repeating structures  Methods: manual searches  Hollingsworth et al. 2009. “On the occurrence of linear groups in proteins.” Protein Sci. 18:1321-25  Figures: α-helix, 3₁₀ helix

5  Linear groups are only part of the picture  Not all common protein motifs are repeating structures  Many have changing conformations  Goal of this research:  Identify all common motifs in proteins  Too complex for manual searches  Enter machine learning

6  Form of artificial intelligence  Can identify clusters within a dataset  Cluster – significant grouping of data points  Visual example…

7 Topographical map of Oregon Data value: Elevation Highest points (Individual peaks) Mt. Hood (11,239 Feet) Mt. Jefferson (10,497 Feet) Three Sisters (10,358-10,047 Feet)

8 Topographical map of Oregon Data value: Elevation Highest points (Individual peaks)

9 Topographical map of Oregon Data value: Elevation Mountain ranges (Broad patterns): Cascades, Coast Range, Siskiyous (Klamath), Blue Mts, Wallowas, Steens, Strawberries, Ochoco, Mahogany Mts, Jackass Mts, Hart Mtn, Tualatin Hills, Trout Creek Mts, Paulina Mts

10 Similar approach with our data 2-Dimensional Example φ ψ

11 Similar approach with our data 2-Dimensional Example Labeled peaks: α-helix, β, PII, αL Height: Abundance Axes: φ, ψ

12  Complications…  Our Data: 4-dimensional dataset  4D to 2D distance conversions  What has and hasn’t been observed?  No definitive source  Abundance / Peak Heights
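One part of the distance complication is that torsion angles are periodic: 179° and -179° are only 2° apart. The slides do not say which metric was used, so the following is only a sketch of one common choice, a Euclidean distance over wrapped angle differences. The names `angle_diff` and `torsion_distance` are hypothetical; the two example points are the α-helix and β-strand rows (φi, ψi, φi+1, ψi+1) from the motif table on slide 17.

```python
import math

def angle_diff(a, b):
    """Smallest signed difference a - b in degrees, wrapped into [-180, 180)."""
    return (a - b + 180.0) % 360.0 - 180.0

def torsion_distance(p, q):
    """Euclidean distance between two torsion-angle tuples, respecting wraparound.

    Works for any dimensionality (4D here: phi_i, psi_i, phi_i+1, psi_i+1).
    """
    return math.sqrt(sum(angle_diff(a, b) ** 2 for a, b in zip(p, q)))

# Cluster centers from the slide 17 table.
alpha_helix = (-63.4, -42.0, -64.0, -40.6)
beta_strand = (-125.5, 132.4, -118.0, 130.2)
separation = torsion_distance(alpha_helix, beta_strand)
```

Without the wraparound step, points that straddle the ±180° edge of the plot would be treated as far apart and a single cluster could be split in two.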

13  Machine learning programs can identify both previously documented and unknown common motifs and their abundances

14  1) Create and prep datasets at two resolution cutoffs: 1.2 Å or better, and 1.75 Å or better  2) Run cuevas  3) Analyze identified clusters  Automated process using Python to remove bias  4) Analyze context of motifs  Figure: 2D visual example of cuevas clustering

15  Goal: Definitive list of the most common protein motifs  In order of abundance  “Everest” Method  Locate “highest” peak first ▪ Bad pun: “Mt. Alpha-rest”  Locate second highest peak  Locate third, and so on
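The "highest peak first" idea can be sketched as a greedy loop: find the point with the most neighbors, record it as a peak, remove its neighbors, and repeat. This is an illustrative stand-in, not the cuevas algorithm, whose internals the slides do not describe. The function `everest_peaks`, the `min_count` cutoff, and the toy data are assumptions; the default radius of 10 echoes the "Circle r=10" label in the slide 17 table.

```python
import math

def angle_diff(a, b):
    """Smallest signed difference a - b in degrees, wrapped into [-180, 180)."""
    return (a - b + 180.0) % 360.0 - 180.0

def torsion_distance(p, q):
    """Euclidean distance between torsion-angle tuples with wraparound."""
    return math.sqrt(sum(angle_diff(a, b) ** 2 for a, b in zip(p, q)))

def everest_peaks(points, radius=10.0, min_count=2):
    """Greedy peak picking: return [(center, abundance), ...], highest first."""
    remaining = set(range(len(points)))
    peaks = []
    while remaining:
        best_idx, best_members = None, []
        # Find the remaining point with the most neighbors within `radius`.
        for i in sorted(remaining):
            members = [j for j in remaining
                       if torsion_distance(points[i], points[j]) <= radius]
            if len(members) > len(best_members):
                best_idx, best_members = i, members
        if len(best_members) < min_count:
            break  # nothing left that is abundant enough to call a peak
        peaks.append((points[best_idx], len(best_members)))
        remaining -= set(best_members)  # claim this peak's points, then repeat
    return peaks

# Toy data (made up): five helix-like points, three strand-like, one outlier.
toy_points = [
    (-63, -42, -64, -41), (-64, -41, -63, -40), (-62, -43, -65, -42),
    (-63, -40, -64, -41), (-65, -42, -62, -41),
    (-125, 132, -118, 130), (-126, 133, -117, 131), (-124, 131, -119, 129),
    (60, 40, 80, 0),
]
toy_peaks = everest_peaks(toy_points)
```

The quadratic neighbor search is fine for a toy example; a dataset the size of the one in the slides would need a spatial index or a binned density instead.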

16  Identifying motifs  Search for peaks while looking for ranges  Results:  Definitive list of common protein motifs in order of abundance  The list…

17 Circle r=10 Degree²
Points | Per Residue | φi | ψi | φi+1 | ψi+1 | i | i+1 | Cluster Size | Motif Name | New Motif
5644 | 18.07 | -63.4 | -42 | -64 | -40.6 | α | α | 1 | α-helix / 3₁₀-helix |
247 | 0.7909 | -125.5 | 132.4 | -118 | 130.2 | β | β | 1 | β-strand |
173 | 0.5540 | -69.9 | 157.4 | -61 | -36.3 | PII | α | 1 | PII-Helix N-Cap / Capping Box |
147 | 0.4707 | -65.5 | -21.4 | -90.3 | 1.5 | α | δ | 1 | Type I Turn # |
125 | 0.4003 | -70.4 | 153.6 | -60.4 | 143 | PII | PII | 1 | |
117 | 0.3747 | -57.2 | 131 | 82.4 | -0.6 | PII | δL | 1 | Type II Turn |
88 | 0.2818 | -88.3 | -2 | -64.7 | 136.9 | δ | PII | 1 | Type I Turn Cap |
55 | 0.1761 | -88.1 | 1.3 | 87.9 | 5.7 | δ | δL | 1 | Schellman Motif |
51 | 0.1633 | -91.8 | -1.9 | -58.4 | -42.5 | δ | α | 1 | Reverse Type I Turn | X
43 | 0.1377 | 93.5 | -0.1 | -71.7 | 146 | δL | PII | 1 | Reverse Type II Turn | X
40 | 0.1281 | -133.9 | 164.3 | -62.2 | -34.1 | β | α | 1 | βα Turn |
36 | 0.1153 | -82.4 | -26.8 | -146.3 | 152.1 | δ | β | 2 | Classic Beta Bulge ‡ |
35 | 0.1121 | 54.9 | 38.3 | 84.5 | 0.8 | αL | δL | 1 | Type I` Turn |
34 | 0.1089 | -122.3 | 119.6 | 52.7 | 41 | β | αL | 1 | β → αL | X
31 | 0.0993 | -136.1 | 70.4 | -65 | -19 | ζ | α | 1 | ζ → α P † |
31 | 0.0993 | 65.3 | 28.3 | -67.2 | 140.8 | αL | PII | 1 | G1 Beta Bulge |
30 | 0.0961 | 82.6 | 5.6 | -103.1 | 137.5 | δL | β | 1 | δL → β | X
29 | 0.0929 | 56.7 | -133.5 | -73.7 | -10.7 | PII` | δ | 3 | Type II` Turn |
24 | 0.0769 | 78 | 0.5 | -67.5 | -43.1 | δL | α | 1 | δL → α | X
20 | 0.0640 | -78.3 | 116 | -89.1 | -31.1 | PII | δ | 1 | Type VIa1 Turn (S) |
20 | 0.0640 | -96.6 | 0.9 | -133.8 | 156.3 | δ | β | 1 | Classic Beta Bulge (S) |
20 | 0.0640 | 50.5 | 49.9 | -61.2 | 148.3 | αL | PII | 1 | Wide Beta Bulge (S) |
19 | 0.0608 | -69.9 | -32.3 | -129.8 | 73.1 | α | ζ | 2 | α → ζ † |
17 | 0.0544 | -129.1 | 80.8 | -70.3 | 141.9 | ζ | PII | 1 | ζ → PII | X
15 | 0.0480 | 53.7 | 48 | -118.9 | 126.6 | αL | β | 1 | αL → β (S) | X
14 | 0.0448 | -87.6 | 61 | -140.3 | 149.5 | γ` | β | 1 | γ` Turn |
11 | 0.0352 | 76.3 | -169.3 | -61.4 | 138.3 | PII` | PII | 2 | PII` → PII | X
10 | 0.0320 | 78.8 | 171.1 | -69.3 | -29.6 | PII` | α | 1 | PII` → α (S) | X
9 | 0.0288 | -138.5 | 165.7 | 57.7 | -137.8 | β | PII` | 2 | β → PII` | X
9 | 0.0288 | 92.8 | 165.9 | -62.5 | -35.7 | ε | α | 1 | ε → α | X
8 | 0.0256 | -107.6 | 16.8 | 80 | -177 | δ | PII` | 1 | Reverse Type II` Turn | X
8 | 0.0256 | 84.6 | 8.1 | -143 | 169.3 | δL | β | 1 | δL → β | X
7 | 0.0224 | -85.8 | 71.8 | -83.1 | 163.5 | γ` | PII | 3 | γ` → PII | X
6 | 0.0192 | -102.4 | -9 | 92.6 | 163.3 | δ | ε | 4 | δ → ε | X
6 | 0.0192 | -77.9 | -8.6 | 86.7 | 174.2 | δ | ε | 1 | δ → ε (S) | X
6 | 0.0192 | 83.8 | -166.3 | -121.9 | 132.1 | PII` | β | 1 | PII` → β | X
6 | 0.0192 | 57.1 | 44.5 | -152.5 | 158.8 | αL | β | 1 | αL → β | X
6 | 0.0192 | -128.3 | 98.7 | 56.7 | -133.3 | ζ | PII` | 1 | ζ → PII` | X

18  Motif “shapes”  Each motif analyzed by plotting its range  Understand the shape of the cluster/motif  Results:  New insight into each motif’s structure  Context  Comparisons

19 Example Cluster Shape: Type II vs. Type II`  Hairpin turns  180° turn  Two residues  Defined as mirror images of each other  Distributions show differences between the two structures  Nearly four years in the making…  Axes: φ, ψ

20  The results go on…  Motif analysis ▪ Viral forming of “Pangea”  Range and peak method sections ▪ Adapting cuevas for our data ▪ Python automation ▪ Identification of 3₁₀ helix & Type I turn  6D, 8D, 10D and 12D clustering ▪ Full helix caps, loops, half-turns…  For the full story, a manuscript is being prepared for publication:  Hollingsworth et al. “The protein parts list: motif identification through the application of machine learning.” (Unpublished)

21  Cuevas was successful in identifying both documented and undocumented motifs  Previously described: Linear groups, helix caps, β-turns (& reverses), β-bulges, α-turns, loops, helix bends, π-structures…  Numerous new motifs  Successful from 4D through 20D  Results form the “Protein Parts List”  Comprehensive list of the common motifs found in proteins

22 Dr. P. Andrew Karplus, Dr. Weng-Keen Wong, Dr. Donald Berkholz, Dr. Dale Tronrud, Dr. Kevin Ahern, Howard Hughes Medical Institute

