Published by Αποστόλης Βούλγαρης. Modified over 5 years ago.
1
Assignment 5
Example of multivariate regression
Prediction of peptide charge in electrospray ionization
Construct input data from the amino-acid sequence of the peptide
2
First 4 of ~23,000 data pairs in Excel file ChargeData
Sequence                            Charge
AAAAAAPDDVAAQLVVADLDLVGGHVEDAFAR    2.8
AAAAADLANR                          2
AAAAAQASASAAAK
AAAAAVAQGGPIEDAER

Possible attributes based on properties of amino acids (see next slide)
Length of peptide
Average mass of peptide (total/length)
Fractions of amino acids of each type
Fractions of hydrophobic, polar, and charged residues
Net formal charge
Average isoelectric point (pI)
Average dissociation constants (pK1 and pK2)
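Several of the attributes listed above follow directly from the sequence string. The sketch below is in Python, not the MATLAB the assignment asks for, and is only an illustration. The function name `peptide_attributes`, the attribute names, and the `FORMAL_CHARGE` table (+1 for R and K, -1 for D and E) are assumptions, not taken from the slides. Masses must be supplied by the caller, because no mass values are given here.

```python
from collections import Counter

# One-letter codes of the 20 standard amino acids.
AMINO_ACIDS = "ARNDCEQGHILKMFPSTWYV"

# Assumption: formal charge per residue (+1 for R, K; -1 for D, E; 0 otherwise).
FORMAL_CHARGE = {"R": 1, "K": 1, "D": -1, "E": -1}


def peptide_attributes(sequence, masses=None):
    """Return length, per-residue fractions, net formal charge and,
    if a mass table is supplied, average mass (total / length)."""
    length = len(sequence)
    if length == 0:
        raise ValueError("empty sequence")
    counts = Counter(sequence)
    attrs = {"length": length}
    # Fraction of each amino-acid type; residues that do not occur get 0.0.
    for code in AMINO_ACIDS:
        attrs["frac_" + code] = counts[code] / length
    # Net formal charge: sum of per-residue charges.
    attrs["net_charge"] = sum(FORMAL_CHARGE.get(res, 0) for res in sequence)
    # Average mass only when the caller provides a {code: mass} table.
    if masses is not None:
        attrs["avg_mass"] = sum(masses[res] for res in sequence) / length
    return attrs


attrs = peptide_attributes("AAAAADLANR")
print(attrs["length"], attrs["frac_A"], attrs["net_charge"])  # 10 0.6 0
```

The second sequence in the table has 6 alanines out of 10 residues, and its single R and single D cancel under the assumed charge table.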
3
Properties of amino acids

code  mass  pI     pK1   pK2    charge  Hydrophobic?  Polar?
A           6.01   2.35  9.87           T             F
R           10.76  1.82  8.99   +
N           5.41   2.14  8.72
D           2.85   1.99  9.9    -
C           5.05   1.92  10.7
E           3.15   2.1   9.47
Q           5.65   2.17  9.13
G           6.06         9.78
H           7.6    1.8   9.33
I           6.05   2.32  9.76
L                  2.33  9.74
K           9.6    2.16  9.06
M           5.74   2.13  9.28
F           5.49   2.2   9.31
P           6.3    1.95  10.64
S           5.68   2.19  9.21
T           5.6    2.09  9.1
W           5.89   2.46  9.41
Y           5.64
V           6.0    2.39
4
Objectives of Assignment 5
Objective 1: Write MATLAB code to calculate the following attributes from the peptide sequences in the Excel file “ChargeData”: (1) fraction of each type of amino acid, (2) length of sequence, (3) average mass, where “average” means the total divided by the length of the sequence. Use your code to confirm the values in the first 5 rows of columns A-V of the Excel file “peptides all-attributes”.

Objective 2: Create a new data file by deleting columns W-AC from the Excel file “peptides all-attributes” and moving column AD to column W. Use WEKA’s filter options to divide the data into training (80%) and testing (20%) sets.
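The 80%/20% division in Objective 2 is done with WEKA's filters. Outside WEKA, the same idea can be sketched in a few lines of Python. This is only an illustration: the function name `split_train_test`, the shuffle, and the fixed seed are assumptions, and the result will not match the rows that WEKA selects.

```python
import random


def split_train_test(rows, train_fraction=0.8, seed=1):
    """Shuffle a copy of the rows and cut it into training and test parts."""
    rows = list(rows)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(rows)
    cut = round(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


# Stand-in for the data rows: 100 row indices.
train, test = split_train_test(range(100))
print(len(train), len(test))  # 80 20
```

Every row ends up in exactly one of the two sets, so the test set stays unseen during training.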
5
Objectives of Assignment 5 continued
Objective 3: Apply WEKA’s linear regression function using the training and test sets from Objective 2. Report performance on the test set.

Objective 4: Apply WEKA’s simple linear regression. Report performance on the test set. Which model gives the best performance?

Objective 5: Apply WEKA’s multilayer perceptron. Report performance on the test set.

Objective 6: Apply WEKA’s RBFNetwork to predict peptide charge with data from Objective 1 with 2, 4, and 8 clusters. Report performance on the test set. How does its performance compare with the performance of the models in Objectives 3, 4, and 5?
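To make "simple linear regression" and "performance on the test set" concrete, here is a minimal Python sketch. It assumes that simple linear regression means fitting a one-attribute line for each attribute and keeping the attribute with the lowest squared error on the training data. The function names and the toy numbers are made up for illustration and are not peptide data. Only mean absolute error and root mean squared error are computed here.

```python
import math


def fit_line(xs, ys):
    """Least-squares slope and intercept for one attribute."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:  # constant attribute: predict the mean
        return 0.0, mean_y
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def simple_linear_regression(train_X, train_y):
    """Return (attribute index, slope, intercept) with the lowest training SSE."""
    best = None
    for j in range(len(train_X[0])):
        col = [row[j] for row in train_X]
        slope, intercept = fit_line(col, train_y)
        sse = sum((slope * x + intercept - y) ** 2 for x, y in zip(col, train_y))
        if best is None or sse < best[0]:
            best = (sse, j, slope, intercept)
    return best[1:]


def evaluate(model, test_X, test_y):
    """Mean absolute error and root mean squared error on held-out rows."""
    j, slope, intercept = model
    preds = [slope * row[j] + intercept for row in test_X]
    n = len(test_y)
    mae = sum(abs(p - y) for p, y in zip(preds, test_y)) / n
    rmse = math.sqrt(sum((p - y) ** 2 for p, y in zip(preds, test_y)) / n)
    return mae, rmse


# Toy data (made up): the target is 2 * attribute0 + 1; attribute1 is noise.
train_X = [[1, 5], [2, 3], [3, 8], [4, 1]]
train_y = [3, 5, 7, 9]
model = simple_linear_regression(train_X, train_y)
mae, rmse = evaluate(model, [[5, 0], [6, 2]], [11, 13.5])
print(model, mae)  # (0, 2.0, 1.0) 0.25
```

Comparing models then amounts to computing the same error measures for each of them on the same test set.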
8
RBF-ANN is under Classify and is called RBFNetwork.