Assignment 5: Example of multivariate regression

Prediction of peptide charge in electrospray ionization
Construct input data from the amino-acid sequence of each peptide

First 4 of ~23,000 data pairs in the Excel file ChargeData:

Sequence                            Charge
AAAAAAPDDVAAQLVVADLDLVGGHVEDAFAR    2.8
AAAAADLANR                          2
AAAAAQASASAAAK                      1.714286
AAAAAVAQGGPIEDAER

Possible attributes based on properties of amino acids (see next slide):
- Length of peptide
- Average mass of peptide (total/length)
- Fractions of amino acids of each type
- Fractions of hydrophobic, polar, and charged residues
- Net formal charge
- Average isoelectric point (pI)
- Average dissociation constants (pK1 and pK2)
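Objective 1 asks for MATLAB; as a language-neutral illustration, here is a minimal Python sketch of the two attributes listed above that need only the sequence itself: the peptide length and the fraction of each of the 20 standard amino acids. It assumes sequences use only the standard one-letter codes, and the function names are hypothetical.

```python
# Sketch: sequence-only attributes (length, fraction of each amino-acid type).
# ASSUMPTION: sequences contain only the 20 standard one-letter codes.
AMINO_ACIDS = "ARNDCEQGHILKMFPSTWYV"


def sequence_length(seq: str) -> int:
    """Length of the peptide = number of residues."""
    return len(seq)


def aa_fractions(seq: str) -> dict[str, float]:
    """Fraction of each of the 20 amino-acid types in the sequence."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        raise ValueError("empty sequence")
    return {aa: seq.count(aa) / n for aa in AMINO_ACIDS}


if __name__ == "__main__":
    seq = "AAAAADLANR"  # second example row from ChargeData
    fr = aa_fractions(seq)
    print(sequence_length(seq))       # 10
    print(fr["A"], fr["D"], fr["R"])  # 0.6 0.1 0.1
```

Applied to every row, this yields 20 fraction columns plus a length column, one row per peptide.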

Properties of amino acids

Code  Mass (Da)   pI     pK1   pK2    Charge  Hydrophobic?  Polar?
A     89.09404    6.01   2.35  9.87           T             F
R     174.20274   10.76  1.82  8.99   +
N     132.1190    5.41   2.14  8.72
D     133.10384   2.85   1.99  9.9    -
C     121.15404   5.05   1.92  10.7
E     146.14594   3.15   2.1   9.47
Q                 5.65   2.17  9.13
G     75.06714    6.06         9.78
H     155.15634   7.6    1.8   9.33
I     131.17464   6.05   2.32  9.76
L                        2.33  9.74
K     146.18934   9.6    2.16  9.06
M     149.20784   5.74   2.13  9.28
F     165.1918    5.49   2.2   9.31
P     115.13194   6.3    1.95  10.64
S     105.09344   5.68   2.19  9.21
T     119.12034   5.6    2.09  9.1
W     204.22844   5.89   2.46  9.41
Y     181.19124   5.64
V     117.14784   6.0    2.39
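The table-dependent attributes follow the same pattern: look up each residue, then sum or average. This sketch computes the average mass (total/length, as defined for this assignment) and a net formal charge. The masses are copied from the table above. The codes F (165.1918) and T (119.12034) are inferred from the standard residue order, and Q and L are left out because the transcript gives no mass for them. The table marks only R as + and D as -; treating K as +1 and E as -1 is an assumption (the usual side-chain formal charges), and H is counted as neutral.

```python
# Residue masses copied from the properties table. Q and L have no mass in the
# transcript, so they are omitted; the F and T codes are inferred from residue order.
MASS = {
    "A": 89.09404, "R": 174.20274, "N": 132.1190, "D": 133.10384,
    "C": 121.15404, "E": 146.14594, "G": 75.06714, "H": 155.15634,
    "I": 131.17464, "K": 146.18934, "M": 149.20784, "F": 165.1918,
    "P": 115.13194, "S": 105.09344, "T": 119.12034, "W": 204.22844,
    "Y": 181.19124, "V": 117.14784,
}

# R (+1) and D (-1) come from the table. K (+1) and E (-1) are an ASSUMPTION
# (usual side-chain formal charges); every other residue, including H, counts as 0.
FORMAL_CHARGE = {"R": 1, "K": 1, "D": -1, "E": -1}


def average_mass(seq: str) -> float:
    """Total of the residue masses divided by the sequence length."""
    seq = seq.upper()
    missing = sorted(set(seq) - set(MASS))
    if missing:
        raise KeyError(f"no mass available for: {missing}")
    return sum(MASS[aa] for aa in seq) / len(seq)


def net_formal_charge(seq: str) -> int:
    """Sum of the assumed side-chain formal charges over the sequence."""
    return sum(FORMAL_CHARGE.get(aa, 0) for aa in seq.upper())


if __name__ == "__main__":
    print(average_mass("AD"))               # mean of the A and D masses
    print(net_formal_charge("AAAAADLANR"))  # R (+1) and D (-1) cancel: 0
```

Average pI, pK1, and pK2 can be computed the same way as `average_mass` once the missing table cells are filled in.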

Objectives of Assignment 5

Objective 1: Write MATLAB code to calculate the following attributes from the peptide sequences in the Excel file “ChargeData”: (1) the fraction of each type of amino acid, (2) the length of the sequence, and (3) the average mass, where “average” means the total divided by the length of the sequence. Use your code to confirm the values in the first 5 rows of columns A-V of the Excel file “peptides all-attributes”.

Objective 2: Create a new data file by deleting columns W-AC from the Excel file “peptides all-attributes” and moving column AD to column W. Use WEKA’s filter options to divide the data into training (80%) and testing (20%) sets.
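Objective 2 is meant to be done in Excel and WEKA; for example, WEKA's unsupervised instance filters Randomize and RemovePercentage (run once as is and once with the selection inverted) are one common way to get an 80/20 split. As a hedged illustration of the same two steps outside WEKA, the sketch below maps Excel column letters to indices, drops W-AC while moving AD into W, and makes a seeded 80/20 shuffle split. `col_index`, `restructure`, and `train_test_split` are hypothetical helper names, and the split will not reproduce WEKA's exact partition.

```python
import random


def col_index(letters: str) -> int:
    """Convert an Excel column label ("A", "Z", "AD", ...) to a 0-based index."""
    idx = 0
    for ch in letters.upper():
        idx = idx * 26 + (ord(ch) - ord("A") + 1)
    return idx - 1


def restructure(row: list) -> list:
    """Drop columns W-AC and place column AD directly after V (i.e. in column W)."""
    w, ac, ad = col_index("W"), col_index("AC"), col_index("AD")
    kept = [v for i, v in enumerate(row) if not (w <= i <= ac) and i != ad]
    kept.insert(w, row[ad])
    return kept


def train_test_split(rows: list, train_fraction: float = 0.8, seed: int = 1):
    """Shuffle a copy of the rows with a fixed seed, then cut at train_fraction."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


if __name__ == "__main__":
    row = [f"col{i}" for i in range(30)]  # hypothetical row spanning columns A..AD
    print(restructure(row)[col_index("W")])  # col29 (the old AD value, now in W)
    train, test = train_test_split(list(range(100)))
    print(len(train), len(test))  # 80 20
```

If AD is the last column, deleting W-AC alone already shifts it into W; `restructure` also handles files with further columns after AD.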

Objectives of Assignment 5 (continued)

Objective 3: Apply WEKA’s linear regression function using the training and test sets from Objective 2. Report performance on the test set.

Objective 4: Apply WEKA’s simple linear regression. Report performance on the test set. Which model gives the best performance?

Objective 5: Apply WEKA’s multilayer perceptron. Report performance on the test set.

Objective 6: Apply WEKA’s RBFNetwork to predict peptide charge with data from Objective 1, using 2, 4, and 8 clusters. Report performance on the test set. How does its performance compare with that of the models in Objectives 3, 4, and 5?
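For Objectives 3 and 4, WEKA documents SimpleLinearRegression as fitting a one-variable linear model on whichever single attribute gives the lowest squared error, whereas LinearRegression can use several attributes at once. The plain-Python sketch below imitates that single-attribute selection and computes three of the figures WEKA prints for a numeric test set: correlation coefficient, mean absolute error, and root mean squared error. It is not WEKA's implementation (WEKA also reports relative errors, omitted here), and the numbers in the usage block are toy values, not ChargeData.

```python
import math


def fit_line(x: list[float], y: list[float]) -> tuple[float, float]:
    """Least-squares intercept a and slope b for y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:  # constant attribute: best line is the mean of y
        return my, 0.0
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def simple_linear_regression(X: list[list[float]], y: list[float]) -> tuple[int, float, float]:
    """Try each attribute alone; keep the one whose line has the lowest squared error."""
    best = None
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        a, b = fit_line(col, y)
        sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(col, y))
        if best is None or sse < best[0]:
            best = (sse, j, a, b)
    _, j, a, b = best
    return j, a, b


def evaluate(actual: list[float], predicted: list[float]) -> dict[str, float]:
    """Correlation coefficient, mean absolute error and root mean squared error."""
    n = len(actual)
    ma, mp = sum(actual) / n, sum(predicted) / n
    cov = sum((a - ma) * (p - mp) for a, p in zip(actual, predicted))
    va = sum((a - ma) ** 2 for a in actual)
    vp = sum((p - mp) ** 2 for p in predicted)
    cc = cov / math.sqrt(va * vp) if va > 0 and vp > 0 else 0.0
    mae = sum(abs(a - p) for a, p in zip(actual, predicted)) / n
    rmse = math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)
    return {"cc": cc, "mae": mae, "rmse": rmse}


if __name__ == "__main__":
    # Toy numbers only: attribute 0 explains the training target exactly.
    X_train, y_train = [[1, 5], [2, 3], [3, 8], [4, 1]], [2, 4, 6, 8]
    j, a, b = simple_linear_regression(X_train, y_train)
    X_test, y_test = [[5, 2], [6, 7]], [10, 12.5]
    preds = [a + b * row[j] for row in X_test]
    print(j, a, b, evaluate(y_test, preds))
```

Reporting the same metrics on the same test set for every model keeps the comparison in Objectives 4 and 6 like for like.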

In WEKA, the RBF-ANN is found under Classify and is called RBFNetwork.
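WEKA's documentation describes RBFNetwork as placing Gaussian basis functions with k-means clustering and, for a numeric target such as peptide charge, fitting a linear regression on their outputs. To show what the number of clusters (2, 4, or 8 in Objective 6) controls, here is a minimal plain-Python sketch of that pipeline. Assumptions not taken from WEKA: each basis function's width is the RMS distance of its cluster members to the centre, floored at `min_width`; a tiny ridge term stabilises the least-squares solve; k-means starts from k seeded random data points. All function names are hypothetical.

```python
import math
import random

Vector = list[float]


def sq_dist(u: Vector, v: Vector) -> float:
    return sum((a - b) ** 2 for a, b in zip(u, v))


def nearest(p: Vector, centres: list[Vector]) -> int:
    return min(range(len(centres)), key=lambda c: sq_dist(p, centres[c]))


def kmeans(X: list[Vector], k: int, iters: int = 50, seed: int = 1) -> list[Vector]:
    """Lloyd's k-means starting from k distinct data points."""
    centres = [list(p) for p in random.Random(seed).sample(X, k)]
    for _ in range(iters):
        groups: list[list[Vector]] = [[] for _ in range(k)]
        for p in X:
            groups[nearest(p, centres)].append(p)
        for j, g in enumerate(groups):
            if g:  # an empty cluster keeps its previous centre
                centres[j] = [sum(col) / len(g) for col in zip(*g)]
    return centres


def widths(X: list[Vector], centres: list[Vector], min_width: float = 0.1) -> list[float]:
    """ASSUMPTION: width = RMS distance of members to their centre, floored."""
    total, count = [0.0] * len(centres), [0] * len(centres)
    for p in X:
        j = nearest(p, centres)
        total[j] += sq_dist(p, centres[j])
        count[j] += 1
    return [max(math.sqrt(t / n), min_width) if n else min_width
            for t, n in zip(total, count)]


def features(p: Vector, centres: list[Vector], sig: list[float]) -> Vector:
    """Bias term followed by one Gaussian activation per basis function."""
    return [1.0] + [math.exp(-sq_dist(p, c) / (2 * s * s)) for c, s in zip(centres, sig)]


def solve(A: list[Vector], b: Vector) -> Vector:
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def fit_rbf(X: list[Vector], y: Vector, k: int, ridge: float = 1e-8):
    """k-means centres -> Gaussian features -> ridge-regularised least squares."""
    centres = kmeans(X, k)
    sig = widths(X, centres)
    Phi = [features(p, centres, sig) for p in X]
    m = len(Phi[0])
    A = [[sum(row[i] * row[j] for row in Phi) + (ridge if i == j else 0.0)
          for j in range(m)] for i in range(m)]
    b = [sum(row[i] * t for row, t in zip(Phi, y)) for i in range(m)]
    return centres, sig, solve(A, b)


def predict(model, p: Vector) -> float:
    centres, sig, w = model
    return sum(wi * fi for wi, fi in zip(w, features(p, centres, sig)))


if __name__ == "__main__":
    X = [[0.0], [0.1], [0.2], [10.0], [10.1], [10.2]]  # toy 1-D data
    y = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]
    model = fit_rbf(X, y, k=2)
    print(round(predict(model, [0.1]), 2), round(predict(model, [10.1]), 2))
```

Refitting with k=2, 4, and 8 and scoring each model on the test set gives the comparison Objective 6 asks for; more clusters add flexibility but also more output weights to fit.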