
Associating Biomedical Terms: Case Study for Acetylation Aaron Buechlein Indiana University School of Informatics Advisor: Dr. Predrag Radivojac.


1 Associating Biomedical Terms: Case Study for Acetylation Aaron Buechlein Indiana University School of Informatics Advisor: Dr. Predrag Radivojac

2 Overview: Background, Previous Work, Methods, Results

3 Central Dogma
Figure: http://www.accessexcellence.org/RC/VL/GG/images/central.gif

4 Post-Translational Modifications (PTMs)

5 Acetylation
Acetylation involves the substitution of an acetyl group (-COCH3) for hydrogen.
It typically occurs on N-terminal tails and lysine residues (Lys or K).

6 Previous Predictors
Several PTM predictors were created prior to this work, including predictors for acetylation.
NetAcet predicts N-terminal acetylation sites only.
AutoMotif Server predicts various PTMs and includes an acetylation component.
PAIL predicts lysine acetylation.

7 Methods
Create Dataset:
Download articles relevant to acetylation and extract sites.
Rank articles in order to elucidate sites quickly.
Add sites from Swiss-Prot and the Human Protein Reference Database (HPRD).
Create Predictors:
Leave-one-protein-out validation.
Matlab.

8 Article Retrieval
Individual journal sites were searched for articles relevant to acetylation.
The resulting HTML pages were saved for each journal.
These pages were then used as input to a web crawler that downloaded the articles.
Because journal sites are constructed differently, each journal required its own regular expression to extract article links.
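The link-extraction step on this slide could look roughly like the sketch below. It is a minimal illustration, not the crawler used in the study: the journal names, URL patterns, and the helper `extract_article_links` are all hypothetical, and a real journal site would need its own expression.

```python
import re
from urllib.parse import urljoin

# Hypothetical per-journal patterns (assumptions for illustration only);
# each real journal site would need its own regular expression.
LINK_PATTERNS = {
    "journal_a": re.compile(r'href="(/content/[^"]+\.full\.pdf)"'),
    "journal_b": re.compile(r'href="(/articles/[^"]+)"'),
}

def extract_article_links(journal, html, base_url):
    """Return absolute article URLs from one saved results page, in page order, without repeats."""
    pattern = LINK_PATTERNS[journal]
    links = []
    for match in pattern.finditer(html):
        url = urljoin(base_url, match.group(1))
        if url not in links:
            links.append(url)
    return links

sample_html = (
    '<a href="/content/1/2/3.full.pdf">PDF</a> '
    '<a href="/about">About</a> '
    '<a href="/content/1/2/3.full.pdf">PDF</a>'
)
links = extract_article_links("journal_a", sample_html, "https://journal-a.example.org")
print(links)
```

Keeping one compiled pattern per journal in a dictionary mirrors the slide's point that no single expression fits every site.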

9 Rank Articles
First locate occurrences of the first phrase, "phrase 1": $A = \{a_1, a_2, \ldots, a_{|A|}\}$
Next locate occurrences of the second phrase, "phrase 2": $R = \{r_1, r_2, \ldots, r_{|R|}\}$
$c$ and $d$ are constants.
$x$ is the distance in characters between $r$ and the nearest word $a$.

10 An example: acetylation
1. Word "acetylat": $A = \{a_1, a_2, \ldots, a_m\}$
2. Regular expression (k | lys | lysine)(space)*(digit)+: $R = \{r_1, r_2, \ldots, r_n\}$
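The two location steps and the distance x can be sketched as follows. This only finds the positions in A and R and the character distance from each r to its nearest a; the scoring function built on c and d is not reproduced here. The word boundary `\b`, the case-insensitive matching, and the function name `nearest_distances` are assumptions added for the sketch.

```python
import re
from bisect import bisect_left

# Step 1: the word stem "acetylat".
KEYWORD = re.compile(r"acetylat", re.IGNORECASE)
# Step 2: (k | lys | lysine)(space)*(digit)+ ; the leading \b is an added assumption
# so that a "k" inside another word is not counted.
SITE = re.compile(r"\b(?:lysine|lys|k)\s*\d+", re.IGNORECASE)

def nearest_distances(text):
    """Return (A positions, R positions, distance from each r to the nearest a)."""
    a_positions = [m.start() for m in KEYWORD.finditer(text)]
    r_positions = [m.start() for m in SITE.finditer(text)]
    distances = []
    if a_positions:
        for r in r_positions:
            i = bisect_left(a_positions, r)
            # Only the neighbours on either side of r can be the nearest a.
            candidates = a_positions[max(i - 1, 0): i + 1]
            distances.append(min(abs(r - a) for a in candidates))
    return a_positions, r_positions, distances

sample = "Acetylation of Lys 16 and K20 was observed."
a_pos, r_pos, dist = nearest_distances(sample)
print(a_pos, r_pos, dist)
```

Because `finditer` yields matches in order, the A positions are already sorted, which is what makes the binary search valid.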

11 An example: acetylation
Score for article S: [formula not preserved in transcript]

12 An example: acetylation
Score for article S: [formula not preserved in transcript]
Papers with S > 100 are rich in sites; papers with S < 30 fall in the "twilight" zone.

13 Elucidate Sites
Sites were manually extracted from articles, beginning with the highest-ranked.
The original experimental paper for each site was checked for traceable evidence.
Sites were also extracted from Swiss-Prot and from HPRD.

14 Predictors
Support Vector Machine
Artificial Neural Network
Decision Tree

15 Predictor Input
Positives: all lysines found to be acetylated.
Negatives: all lysines not found to be acetylated.
Features were created from characteristics of the sequence surrounding each lysine: amino acid content, hydrophobicity, charge, disorder, etc.
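A window-based feature extractor of the kind described here might be sketched as below. The window half-width of 3, the feature ordering, and the function names are assumptions; only amino acid content and net charge are shown, and hydrophobicity and disorder are omitted to keep the sketch short.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def lysine_windows(sequence, half_width=3):
    """Yield (1-based position, window) for every lysine; windows are truncated at the termini."""
    for i, residue in enumerate(sequence):
        if residue == "K":
            start = max(i - half_width, 0)
            yield i + 1, sequence[start:i + half_width + 1]

def window_features(window):
    """Return amino acid fractions (20 values) followed by the net charge of the window."""
    counts = Counter(window)
    composition = [counts[aa] / len(window) for aa in AMINO_ACIDS]
    # Simple charge model (assumption): K and R count +1, D and E count -1.
    net_charge = counts["K"] + counts["R"] - counts["D"] - counts["E"]
    return composition + [net_charge]

for position, window in lysine_windows("MDEKRAK"):
    print(position, window, window_features(window)[-1])
```

Each lysine then becomes one row of the predictor input, labelled positive or negative according to whether it was found to be acetylated.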

16 Predictor Input
Protein   Features                                          Acetylated
1         8    1   0.48609   0.001767   0.48979   0.51508   1
1         7    1   0.92146   0.03019    0.96423   0.79416   1
1         0    0   0.50622   0.015251   0.52335   0.51855   0
2         10   2   0.2008    0.038708   0.25441   0.36071   1
2         1    0   0.62016   0.009772   0.62846   0.67525   0
2         0    0   0.27783   0.028957   0.32162   0.34207   0
3         11   1   0.89239   0.018354   0.91884   0.88125   1
3         12   2   0.87354   0.022307   0.90349   0.87446   1
3         8    1   0.81549   0.025339   0.85289   0.85702   1
3         2    0   0.84588   0.024766   0.88219   0.86599   0
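The leave-one-protein-out validation named in the methods can be sketched with the Protein column from slide 16: every row of one protein is held out together, so that no protein contributes to both training and testing. The function name and the index-based interface are assumptions for illustration, and the study itself used Matlab rather than Python.

```python
def leave_one_protein_out(protein_ids):
    """Yield (held-out protein, training row indices, test row indices) for each protein."""
    for protein in sorted(set(protein_ids)):
        test = [i for i, p in enumerate(protein_ids) if p == protein]
        train = [i for i, p in enumerate(protein_ids) if p != protein]
        yield protein, train, test

# Protein column of the ten example rows on slide 16.
proteins = [1, 1, 1, 2, 2, 2, 3, 3, 3, 3]

for protein, train, test in leave_one_protein_out(proteins):
    print(protein, train, test)
```

Splitting by protein rather than by row keeps neighbouring lysines from the same sequence, whose windows overlap, out of the training set when that protein is being tested.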

17 Article and Ranking Results
4888 articles from 10 sites were searched.
Nature provided 2147 articles.
Science Direct provided 1519 articles.
The highest-ranking article came from the Journal of Biological Chemistry: score of 151.87, containing 10 acetylation sites.
When histones are excluded, the highest-ranking article came from Nature: previously ranked #5, score of 116.36, containing 9 unique acetylation sites.

18 Top 25
Rank   Score      Sites   Article Source
1)     151.8667   10      Journal of Biological Chemistry
2)     123.2314   12      Cell / Science Direct
3)     121.9031   6       Nature
4)     117.7988   9       Journal of Proteome Research
5)     116.3582   9       Nature
6)     111.1745   14      Biochemistry
7)     104.4652   6       Cell / Science Direct
8)     104.0166   7       Nature
9)     102.0683   13      Molecular Cell / Science Direct
10)    98.80812   6       Journal of Biological Chemistry
11)    97.64634   6       Biochemistry
12)    96.76536   6       Journal of Biological Chemistry
13)    96.0845    9       International Journal of Mass Spectrometry / Science Direct
14)    88.12967   9       Biochemistry
15)    86.17157   6       Journal of Biological Chemistry
16)    81.78705   5       Nucleic Acids Research
17)    81.30967   6       Biochemistry
18)    81.06128   6       Molecular Cell / Science Direct
19)    80.74899   9       Journal of Biological Chemistry
20)    80.16261   9       Nature
21)    79.65658   6       Molecular Cell / Science Direct
22)    77.9022    4       Cell / Science Direct
23)    77.88304   5       Nucleic Acids Research
24)    77.60087   8       Gene / Science Direct
25)    77.44198   6       Journal of the American Society for Mass Spectrometry

19 Ranking Results
Articles with scores greater than 30 had potential for providing at least one site.
As scores approached 30, articles became less fruitful.

20 Dataset Results
The dataset included 1442 total sites and 1085 non-redundant sites.
HPRD contributed 90 sites.
Swiss-Prot contributed 825 sites.
Our study contributed 527 sites.
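The difference between total and non-redundant counts arises when the same site is reported by more than one source. A minimal sketch of such a merge follows, assuming a site is identified by a (protein, position) pair; that key, the function name, and the records are placeholders, not data from the study.

```python
def merge_sites(*sources):
    """Combine site collections; return (total count, set of non-redundant sites)."""
    total = sum(len(source) for source in sources)
    unique = set()
    for source in sources:
        unique.update(source)
    return total, unique

# Placeholder records for illustration only: (protein identifier, lysine position).
hprd = [("PROT_A", 9), ("PROT_B", 120)]
swissprot = [("PROT_A", 9), ("PROT_C", 14)]
literature = [("PROT_B", 120), ("PROT_D", 310)]

total, unique = merge_sites(hprd, swissprot, literature)
print(total, len(unique))

# The per-source counts on the slide add up to the reported total.
print(90 + 825 + 527)
```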

21 Dataset Results

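22 Sensitivity, Specificity, and Precision
Sensitivity (sn): $\mathrm{sn} = \frac{TP}{TP + FN}$
Specificity (sp): $\mathrm{sp} = \frac{TN}{TN + FP}$
Precision (pr): $\mathrm{pr} = \frac{TP}{TP + FP}$
Here TP, TN, FP, and FN are the counts of true positives, true negatives, false positives, and false negatives.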
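The three measures named on this slide have standard definitions in terms of confusion-matrix counts, and the sketch below computes them that way. The labels reuse the Acetylated column from slide 16; the predictions are made up for illustration and are not results from the study.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels coded as 1 (acetylated) and 0 (not)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn

def sensitivity(tp, fn):
    return tp / (tp + fn)   # fraction of true sites that were predicted

def specificity(tn, fp):
    return tn / (tn + fp)   # fraction of non-sites that were rejected

def precision(tp, fp):
    return tp / (tp + fp)   # fraction of predicted sites that are true

y_true = [1, 1, 0, 1, 0, 0, 1, 1, 1, 0]   # Acetylated column, slide 16
y_pred = [1, 0, 0, 1, 1, 0, 1, 1, 0, 0]   # hypothetical predictions
tp, tn, fp, fn = confusion_counts(y_true, y_pred)
print(sensitivity(tp, fn), specificity(tn, fp), precision(tp, fp))
```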

23 Accuracy and AUC
Accuracy (acc): [formula not preserved in transcript]
Area under the curve (AUC): the area under the receiver operating characteristic (ROC) curve.
The ROC curve plots sensitivity against 1 - specificity.
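The accuracy formula is not legible in this transcript, but the acc values reported on the results slides equal the mean of sn and sp to within rounding (for example, (54.1 + 72.1) / 2 = 63.1 for the best SVM). The sketch below therefore assumes balanced accuracy; that reading is an inference from the numbers, not a statement from the slides. The AUC function uses the pairwise-ranking equivalent of the area under the ROC curve, and its inputs are made-up scores.

```python
def balanced_accuracy(sn, sp):
    """Mean of sensitivity and specificity (assumed definition of acc)."""
    return (sn + sp) / 2

def auc(scores, labels):
    """Area under the ROC curve as the fraction of (positive, negative) pairs
    in which the positive example gets the higher score; ties count one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

print(balanced_accuracy(54.1, 72.1))
print(auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))
```

The pairwise form is quadratic in the number of examples, which is fine for a sketch; a sort-based rank-sum computation gives the same value for larger datasets.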

24 SVM Predictor
Polynomial kernel
Degree       sn     sp     pr     acc    AUC
p = 1        52.3   71.0   24.6   61.6   65.2
p = 2        46.1   69.8   20.3   57.9   62.8
p = 3        31.6   80.8   23.5   56.2   60.3
Gaussian kernel
Width        sn     sp     pr     acc    AUC
σ = 10^-2    43.8   75.8   24.9   59.8   64.3
σ = 10^-3    54.1   72.1   25.9   63.1   68.1
σ = 10^-6    52.8   70.7   24.6   61.8   65.3

25 Artificial Neural Network
Hidden Neurons   sn     sp     pr     acc    AUC
1                68.0   47.7   20.7   57.8   61.9
3                65.2   47.7   19.4   56.4   58.9
5                65.0   47.2   19.1   56.1   57.5

26 Decision Tree
Algorithm       sn     sp     pr     acc    AUC
Decision Tree   61.7   45.9   18.3   53.8   42.1

27 Algorithm Comparison
Algorithm        sn     sp     pr     acc    AUC
SVM              54.1   72.1   25.9   63.1   68.1
Neural Network   68.0   47.7   20.7   57.8   61.9
Decision Tree    61.7   45.9   18.3   53.8   42.1

28 I would like to acknowledge those who helped me throughout this project: Dr. Predrag Radivojac, Dr. Haixu Tang, and Wyatt Clark.

29 I welcome your questions and comments.

30 An example: acetylation
1. Word "acetylat": $A = \{a_1, a_2, \ldots, a_m\}$
2. Regular expression (k | lys | lysine)(space)*(digit)+: $R = \{r_1, r_2, \ldots, r_n\}$

31 An example: acetylation
Score for article S: [formula not preserved in transcript]

