Presented by Namir Shammas

Machine Learning for Best Linearized Regression Model Using the HP Prime Presented by Namir Shammas

Dedication: To the late Jon Johnston, curator of the HP Computer Museum (hpmuseum.net), who lost his life in April 2016 while on a mountain-climbing expedition in Tibet. His contribution in posting documentation for HP desktop and handheld computers is valuable.

History of Best Linearized Regression Models
The HP-65 and HP-67 Stat Pacs offered programs for various linearized regression models. The PPC ROM has a best linearized regression program. The HP-41 Advantage ROM also offers best linearized regression. These programs chose from a set of four regression models: linear, exponential, logarithmic, and power.

History of Best Linearized Regression Models (cont.)
Several PPC members, William Kolb for example, explored obtaining the best linearized regression model using a wider set of models. I wrote programs to find the best linearized regression models, available in: The Corvallis library. HHC presentations. HP Solver (HP 39GII programs that also work on the HP Prime).

History of Best Linearized Regression Models (cont.)
Covering a large number of linearized and multiple linearized regression models may use: A set of indices, each specifying all of the transformations for the observations. A set of enumerated transformations for each variable entering the regression models. A range of powers applied to each variable (a power of 0 denotes the log transformation): Integer values, such as -4 to 4 in steps of 1. Floating-point values, such as -4, -3.75, -3.5, …, 3.5, 3.75, 4.
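The range-of-powers approach can be illustrated with a short Python sketch. This is not the HP Prime code from the presentation; the names power_transform and power_range are hypothetical, and the use of the natural logarithm for the power 0 case is an assumption (the slides only say "log").

```python
import math

def power_transform(value, power):
    """Apply a power transformation; a power of 0 stands for the log (assumed natural log)."""
    if power == 0:
        return math.log(value)
    return value ** power

def power_range(start, stop, step):
    """Return the powers from start to stop inclusive, e.g. -4, -3.75, ..., 4."""
    count = int(round((stop - start) / step)) + 1
    return [start + i * step for i in range(count)]

integer_powers = power_range(-4, 4, 1)        # -4, -3, ..., 4
fractional_powers = power_range(-4, 4, 0.25)  # -4, -3.75, ..., 4

# Every (Y power, X power) pair defines one candidate linearized model.
models = [(py, px) for py in integer_powers for px in integer_powers]
```

With integer powers from -4 to 4 this gives 9 choices per variable. The data must be positive for the log and fractional powers to be defined.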

Machine Learning Basics for Finding the Best Fits
Use two data sets: the first for training and the second for testing. Each data set should have different noise, so avoid, if possible, splitting one big data set into two subsets.

ML Basics (cont.)
For each model, do the following: Calculate the regression slope and intercept. Calculate MSSE1 from the training set. Calculate MSSE2 from the test set using the regression slope and intercept obtained from the training data set. Calculate a weighted MSSE from MSSE1 and MSSE2. Finally, rank the models using MSSE to select the best ones.
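The per-model steps can be sketched in Python as follows. The slides do not define MSSE or the weighting formula, so two assumptions are made here: MSSE is taken as the mean of the squared errors, and the weighted value is weight * MSSE1 + (1 - weight) * MSSE2. The function names are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def msse(xs, ys, slope, intercept):
    """Mean of the squared errors (assumed definition of MSSE)."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)

def weighted_msse(train, test, weight):
    """Fit on the training set only, then blend training and test errors."""
    train_x, train_y = train
    test_x, test_y = test
    slope, intercept = fit_line(train_x, train_y)
    msse1 = msse(train_x, train_y, slope, intercept)
    msse2 = msse(test_x, test_y, slope, intercept)
    # Assumed weighting: a convex combination of the two errors.
    return weight * msse1 + (1 - weight) * msse2, slope, intercept

train = ([1.0, 2.0, 3.0, 4.0], [2.1, 3.9, 6.2, 7.8])
test = ([1.5, 2.5, 3.5], [3.0, 5.1, 6.9])
score, slope, intercept = weighted_msse(train, test, 0.5)
```

The slope and intercept never see the test data, so a model that only fits the training noise is penalized through MSSE2.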

Options
Options for model selection: Use an enumerated set of models, each with specific transformations for the X and Y values. Use a range of powers to apply to the X and Y values; a power of 0 translates into the logarithmic transformation. Options for handling data: Use X and Y values without transformations. Normalize the X and Y data using their minimum and maximum values to map the data into the recommended range of [1, 2]. Subtract the mean value and then divide by the standard deviation.
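The two data-scaling options might look like this in Python. The slides give no formulas, so this sketch assumes the usual linear min-max mapping for the [1, 2] range and the sample standard deviation (n - 1 divisor) for standardization; both function names are hypothetical.

```python
def normalize_min_max(values, low=1.0, high=2.0):
    """Map values linearly so the minimum lands on low and the maximum on high."""
    v_min, v_max = min(values), max(values)
    span = v_max - v_min
    return [low + (high - low) * (v - v_min) / span for v in values]

def standardize(values):
    """Subtract the mean, then divide by the standard deviation."""
    n = len(values)
    mean = sum(values) / n
    # Sample standard deviation (n - 1 divisor) is an assumption.
    sd = (sum((v - mean) ** 2 for v in values) / (n - 1)) ** 0.5
    return [(v - mean) / sd for v in values]

xs = [10.0, 20.0, 30.0, 50.0]
scaled = normalize_min_max(xs)  # values now lie in [1, 2]
zs = standardize(xs)            # values now have zero mean
```

Mapping into [1, 2] keeps every value positive, so logarithms and fractional powers stay defined. Standardized values can be negative or zero and would not work with those transformations.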

Bootstrap Method Alternative
Use a single data set. Repeat N times: select a different subset (using the same number of observations each time) and calculate the regression model statistics. Calculate the average slope, intercept, and coefficient of determination. Rank the models by their average coefficient of determination.
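A minimal Python sketch of this procedure for a single model follows. It assumes each subset is drawn at random without replacement, which matches "selecting a different subset" but differs from the classical bootstrap, which samples with replacement. The function names and the fixed seed are illustrative choices, not part of the presentation.

```python
import random

def fit_with_r2(xs, ys):
    """Least-squares slope, intercept, and coefficient of determination."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept, sxy * sxy / (sxx * syy)

def bootstrap_fit(points, fraction, num_simulations, seed=0):
    """Average slope, intercept, and R-squared over random subsets of the data."""
    rng = random.Random(seed)
    subset_size = max(3, int(round(fraction * len(points))))
    totals = [0.0, 0.0, 0.0]
    for _ in range(num_simulations):
        subset = rng.sample(points, subset_size)  # without replacement (assumed)
        xs = [p[0] for p in subset]
        ys = [p[1] for p in subset]
        for i, value in enumerate(fit_with_r2(xs, ys)):
            totals[i] += value
    return tuple(t / num_simulations for t in totals)

points = [(float(x), 3.0 + 2.0 * x) for x in range(1, 21)]
avg_slope, avg_intercept, avg_r2 = bootstrap_fit(points, 0.6, 50)
```

To rank several models, the same routine would be run once per transformation pair and the results sorted by the averaged coefficient of determination.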

Creating Test/Training Data
Use function PopData in HP Prime file PopData.txt. Parameters of function PopData:
n - number of points
x - starting value for x
a, b, pwr - coefficients for y = a*b*x^pwr
PEF1 - % error factor for training data
PEF2 - % error factor for test data
Matrix M1 stores the training data set. Matrix M2 stores the test data set.
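For readers without an HP Prime, a rough Python analogue of such a data generator is sketched below. It is not a port of PopData: the x increment (x_step), the uniform multiplicative noise model, and the seed parameter are all assumptions, since the slides only list the parameters.

```python
import random

def pop_data(n, x_start, a, b, pwr, pef1, pef2, x_step=1.0, seed=0):
    """Return (training, testing) lists of (x, y) points with separate noise."""
    rng = random.Random(seed)

    def noisy(y, pef):
        # Assumed noise model: perturb y by up to +/- pef percent, uniformly.
        return y * (1.0 + pef / 100.0 * rng.uniform(-1.0, 1.0))

    training, testing = [], []
    for i in range(n):
        x = x_start + i * x_step  # x_step is a hypothetical parameter
        y = a * b * x ** pwr      # model form as given on the slide
        training.append((x, noisy(y, pef1)))
        testing.append((x, noisy(y, pef2)))
    return training, testing

# m1 plays the role of matrix M1 (training), m2 the role of M2 (test).
m1, m2 = pop_data(100, 1.0, 2.0, 3.0, 0.5, 5.0, 10.0)
```

Drawing the two sets with independent noise, rather than splitting one set, follows the earlier advice that each data set should have different noise.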

Sample M1 matrix of 100 (X, Y) points.

Sample M1 matrix (X, Y) points.

Sample M2 matrix (X, Y) points.

Getting Min Max Range
For normalized data, use the GetMinMax function to get the minimum and maximum values for the X and Y data.

ML Program Version 1
Use function ML1 in HP Prime file Best_YX_LR_Machine_Learning_1.txt. Parameters of function ML1:
pDataMat - matrix containing training data
pTestDataMat - matrix containing test data
MLwt - weight used to calculate stats
Returns a matrix of Rsqr, MSSE, Y transformation, X transformation, slope, and intercept values.
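The overall shape of such a routine can be sketched in Python. This is an illustration, not the ML1 source: the set of five transformations, the definition of MSSE as a mean squared error in the transformed Y scale, and the weighting formula are all assumptions, and every name below is hypothetical.

```python
import math

# Hypothetical enumerated transformations, used for both X and Y.
TRANSFORMS = {
    "v": lambda v: v,
    "ln(v)": math.log,
    "1/v": lambda v: 1.0 / v,
    "sqrt(v)": math.sqrt,
    "v^2": lambda v: v * v,
}

def fit_line(xs, ys):
    """Least-squares slope, intercept, and coefficient of determination."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x, sxy * sxy / (sxx * syy)

def mean_sq_error(xs, ys, slope, intercept):
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)

def rank_models(train, test, weight):
    """Return rows of (Rsqr, MSSE, Y transf, X transf, slope, intercept), best first."""
    rows = []
    for y_name, fy in TRANSFORMS.items():
        for x_name, fx in TRANSFORMS.items():
            tx = [fx(x) for x, _ in train]
            ty = [fy(y) for _, y in train]
            slope, intercept, r2 = fit_line(tx, ty)
            msse1 = mean_sq_error(tx, ty, slope, intercept)
            ux = [fx(x) for x, _ in test]
            uy = [fy(y) for _, y in test]
            msse2 = mean_sq_error(ux, uy, slope, intercept)
            score = weight * msse1 + (1 - weight) * msse2  # assumed weighting
            rows.append((r2, score, y_name, x_name, slope, intercept))
    rows.sort(key=lambda row: row[1])  # smallest weighted MSSE first
    return rows

train = [(float(x), 2.0 * x * x) for x in range(1, 11)]
test = [(x + 0.5, 2.0 * (x + 0.5) ** 2) for x in range(1, 10)]
ranking = rank_models(train, test, 0.5)
best = ranking[0]
```

One caveat of this sketch: because the errors are measured after transforming Y, scores from different Y transformations are on different scales. Whether the original programs handle this differently is not stated in the slides.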

Example of ML Program Version 1

ML Program Version 2
Use function ML2 in HP Prime file Best_YX_LR_Machine_Learning_2.txt. Parameters of function ML2:
pDataMat - matrix containing training data
pTestDataMat - matrix containing test data
MLwt - weight used to calculate stats
Uses normalized values for variables X and Y. Returns a matrix of Rsqr, MSSE, Y transformation, X transformation, slope, and intercept values.

Example of ML Program Version 2

Bonus Regression Programs!
The proceedings include source code for multiple linearized regression versions of the programs presented earlier. File Best_ZYX_MLR_Machine_Learning_1.txt has machine learning regression for Z=f(X,Y). File Best_ZYX_MLR_Machine_Learning_2.txt has machine learning regression for Z=f(X,Y) using normalized data.

Bootstrap Regression
Use function BSR in HP Prime file Best_YX_LR_Bootstrap.txt. Parameters of function BSR:
pDataMat - matrix containing data
FractionDataUsed - fraction of data used in each simulation
NumSimulations - number of simulations
Uses normalized values for variables X and Y. Returns a matrix of Rsqr, Y transformation, X transformation, slope, and intercept values.

Example of Bootstrap Program

Thank You!
