Abstract

Accurate determination of the molecular weight (MW) of a protein is necessary for its isolation, purification and identification. Sodium Dodecyl Sulfate Polyacrylamide Gel Electrophoresis (SDS-PAGE) in one dimension with single-percentage gels is traditionally used for that purpose. Gradient gels, which incorporate a range of percentages, have been considered less accurate, in part because of a lack of reliable mathematical models. The purpose of this project was to develop statistical models that accurately predict protein MWs on gradient gels. Six mathematical models were applied to protein standards of previously identified MWs to determine the best fitting model. The relative mobilities (R_m) of the protein standards were calculated, and the resulting predictions were compared to the actual MWs to make this determination. The "Cubic Model" was determined to be the best fitting and will be tested on unknown proteins suspected to play a role in amphibian fertilization.

Question

Which model provides the best fit for determining the MWs of the known protein standards?

Residuals for Cubic Model
log(MW) = a + b * R_m + c * R_m^2 + d * R_m^3

Final Predicted Weights of Unknown Proteins Using Cubic Model

Conclusions

The Cubic model was the best fitting of the 6 models that we tested for predicting unknown molecular weights. This was determined by looking at the predicted weights, residuals, and R-square values of each of the models. Future work will continue on gradient gels and on other possible models that could be used.

Determinations

1.) The R-square is good for most of the models, except for the SLIC model, for which it is a little low. R-square is the ratio of the predicted variation, Σ(û_i - ū)^2, to the total variation, Σ(u_i - ū)^2, where û_i is the predicted value of u_i for a particular model and ū is the mean. Of the 6 models, the Cubic model produces the average R-square that indicates the closest fit. Ideally, R-square is equal to 1, meaning that the predicted variation and the total variation are equal.
2.) The predictions of the MW are good for most of the models, but the Cubic model shows a smaller amount of variation.
3.) The residuals of the models show the difference between the actual data points and the predicted points. Looking over the residuals (see example below), the Cubic model produces smaller residual values than the other 5 models.

Actual Molecular   Predicted MW
Weights            Cubic      -LN^2      Log-Log    Quad.      Log Linear   SLIC
200,[...]          [...],028  197,751    197,683    183,306    164,210      144,[...]
[...]              [...],949  118,022    117,919    120,246    117,416      107,140
97,400             94,775     96,991     97,280     101,049    102,609      97,197
66,200             67,689     70,111     69,995     72,876     78,955       81,441
45,000             45,519     44,271     44,248     44,679     50,028       57,331
31,000             31,647     28,822     28,736     28,090     29,506       36,594
21,500             20,933     20,514     20,518     21,197     17,963       17,906
14,400             12,277     13,618     13,438     12,789     12,820       11,494
6,500              8,007      10,267     10,635     9,374      10,472       9,211
R-Square Ave

Procedure

1.) Measure standards in gels.
2.) Test models on measured known protein standards.
3.) Decide on best fitting model.
4.) Receive and measure unknown proteins.
5.) Begin analyzing unknowns and applying our model.

Comparison of Mathematical Models to Determine Molecular Weight of Proteins: A Statistical Analysis

Jennifer Wright (1), Edward J. Carroll, Jr. (2), Lawrence Clevenson (1)
Departments of (1) Mathematics and (2) Biology, California State University Northridge
NASA/PAIR Program

Models Tested

Cubic        Log(MW) = a + b * R_m + c * R_m^2 + d * R_m^3
-LN^2        Log(MW) = a + b * (-Ln(R_m)) + c * (-Ln(R_m))^2
Log-Log      Log(MW) = a + b * Log(R_m) + c * (Log(R_m))^2
Quad         Log(MW) = a + b * R_m + c * R_m^2
Log Linear   Log(MW) = a + b * R_m
SLIC         Log(Ln(MW)) = a + b * Ln(-Ln(R_m))

Fig. 1 Raw Data

Graphs of raw data used in deciding best models

Fig. 2: Graph of relative mobility of raw data vs. log molecular weights, starting with two 7.5% gels, two 10%, two 12% and two gradient gels.

Fig. 3 Raw Standards

Actual Molecular Weights vs. Predicted Molecular Weights

Table 1: Comparison of the 6 models and the R-square values produced by each model.
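The fitting and scoring steps described above can be sketched in a few lines of Python. This is only an illustration, not the authors' analysis: the relative mobilities below are hypothetical placeholders (the poster does not list its R_m measurements), Log is assumed to be base 10, and the helper names `fit_polynomial`, `evaluate` and `r_square` are introduced here. The weights are the seven standards that are fully legible in the table. The sketch fits the Cubic model by ordinary least squares, then reports residuals and R-square as the ratio of predicted variation to total variation.

```python
import math

def fit_polynomial(x, y, degree):
    """Least-squares polynomial fit by solving the normal equations."""
    n = degree + 1
    # Normal equations: A[i][j] = sum(x^(i+j)), b[i] = sum(x^i * y)
    A = [[sum(xi ** (i + j) for xi in x) for j in range(n)] for i in range(n)]
    b = [sum((xi ** i) * yi for xi, yi in zip(x, y)) for i in range(n)]
    # Gaussian elimination with partial pivoting
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = A[row][col] / A[col][col]
            for k in range(col, n):
                A[row][k] -= factor * A[col][k]
            b[row] -= factor * b[col]
    # Back substitution; coeffs[k] multiplies x^k
    coeffs = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(A[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (b[i] - tail) / A[i][i]
    return coeffs

def evaluate(coeffs, x):
    """Evaluate a + b*x + c*x^2 + ... for coefficients in ascending order."""
    return sum(c * x ** k for k, c in enumerate(coeffs))

def r_square(y, y_hat):
    """Ratio of predicted variation to total variation, as defined in the text."""
    mean = sum(y) / len(y)
    predicted_variation = sum((yh - mean) ** 2 for yh in y_hat)
    total_variation = sum((yi - mean) ** 2 for yi in y)
    return predicted_variation / total_variation

# Seven fully legible standard weights from the table (daltons).
mw_standards = [97400, 66200, 45000, 31000, 21500, 14400, 6500]
# HYPOTHETICAL relative mobilities, for illustration only.
rm_standards = [0.12, 0.25, 0.40, 0.55, 0.68, 0.80, 0.93]

log_mw = [math.log10(m) for m in mw_standards]   # assumes Log = log10
cubic = fit_polynomial(rm_standards, log_mw, 3)  # [a, b, c, d]
fitted = [evaluate(cubic, r) for r in rm_standards]
residuals = [obs - fit for obs, fit in zip(log_mw, fitted)]
predicted_mw = [10 ** f for f in fitted]
r2 = r_square(log_mw, fitted)

for actual, pred in zip(mw_standards, predicted_mw):
    print(f"actual {actual:>7,d}   predicted {pred:>10,.0f}")
print(f"R-square (Cubic, hypothetical data): {r2:.4f}")
```

Passing `degree=2` or `degree=1` to `fit_polynomial` gives the Quad and Log Linear models, so the same helpers can compare several of the candidate models on one set of standards.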
Table 2: Residuals and R-square values for the Cubic model.

Comparison of 3 models with a Standard

Fig. 4 One set of raw data (Gel #2 VE) is set against 3 of the models tested (Log Linear, Quad., Cubic).

Fig. 5 Raw data

Table 3: The Cubic model was applied to unknown proteins to predict their molecular weights.

This work was supported by NASA CSUN/JPL PAIR. Many thanks go to: Carol Shubin, Virginia Latham, Larry Clevenson, Edward Carroll, Gregory Frye, Jennifer Rosales and John Handy.
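The prediction step behind Table 3 amounts to evaluating the fitted Cubic model at the measured relative mobility of an unknown band and undoing the logarithm. A minimal sketch follows, assuming Log is base 10. The coefficient values and the unknown R_m values are hypothetical, chosen only to make the example run, and `predict_mw_cubic` is a name introduced here.

```python
def predict_mw_cubic(rm, a, b, c, d):
    """Invert log10(MW) = a + b*rm + c*rm^2 + d*rm^3 to get MW in daltons."""
    log_mw = a + b * rm + c * rm ** 2 + d * rm ** 3
    return 10 ** log_mw

# HYPOTHETICAL fitted coefficients, not values from the poster.
coeffs = {"a": 5.2, "b": -1.1, "c": -0.6, "d": 0.3}
# HYPOTHETICAL relative mobilities measured for unknown protein bands.
unknown_rm = [0.30, 0.50, 0.75]

predictions = [predict_mw_cubic(rm, **coeffs) for rm in unknown_rm]
for rm, mw in zip(unknown_rm, predictions):
    print(f"R_m = {rm:.2f}   predicted MW = {mw:,.0f}")
```

In practice the coefficients would come from a fit to the standards run on the same gel as the unknowns.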