Scikit-Learn Intro to Data Science Presented by: Vishnu Karnam A01962610.

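Linear Regression in Mathematical Form

Given a data set $\{y_i, x_{1i}, x_{2i}, \dots, x_{pi}\}$ for observations $i = 1, \dots, n$, linear regression (LR) assumes that the dependent variable $Y_i$ depends linearly on the vector $x_i = (x_{1i}, \dots, x_{pi})^T$. The deviation from an exact linear relationship is modeled by a random error variable $\varepsilon_i$, so the equation for each observation is

$Y_i = \beta_1 x_{1i} + \beta_2 x_{2i} + \dots + \beta_p x_{pi} + \varepsilon_i = x_i^T \beta + \varepsilon_i$

Stacking these equations gives the vector form

$Y = X\beta + \varepsilon$

where $Y = (Y_1, \dots, Y_n)^T$, $X$ is the $n \times p$ matrix whose $i$-th row is $x_i^T$, $\beta = (\beta_1, \dots, \beta_p)^T$, and $\varepsilon = (\varepsilon_1, \dots, \varepsilon_n)^T$.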
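As a minimal sketch of the model Y = Xβ + ε in scikit-learn, the example below generates synthetic data and recovers β. The data, the seed, and the values in `beta_true` are illustrative assumptions, not part of the presentation. `fit_intercept=False` is used because the equations on this slide have no intercept term.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data (assumed for illustration): n observations, p features.
rng = np.random.default_rng(0)
n, p = 200, 3
X = rng.normal(size=(n, p))
beta_true = np.array([2.0, -1.0, 0.5])   # hypothetical "true" parameters
eps = rng.normal(scale=0.1, size=n)      # random error term
y = X @ beta_true + eps                  # Y = X beta + eps

# The slide's model has no intercept, so it is switched off here.
model = LinearRegression(fit_intercept=False)
model.fit(X, y)

beta_hat = model.coef_                   # estimated parameters
print(beta_hat)
```

With low noise and enough observations, `beta_hat` should land close to `beta_true`.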

Cont…

Once the model is written down mathematically, an estimation method is used to determine its parameters. An estimation method consists of an objective function, and the parameters are chosen to either maximize or minimize that function. In machine learning, the parameters are computed from the given data points.
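One common choice of objective function for linear regression is the residual sum of squares, which least squares minimizes. The sketch below is an illustration under that assumption (the slide does not name a specific objective). The helper `rss` and the synthetic data are hypothetical.

```python
import numpy as np

def rss(beta, X, y):
    """Objective function: residual sum of squares ||y - X beta||^2."""
    residual = y - X @ beta
    return float(residual @ residual)

# Synthetic data, assumed for illustration only.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))
y = X @ np.array([1.5, -0.5]) + rng.normal(scale=0.2, size=100)

# Parameters that minimize the objective (least-squares solution).
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# The same minimizer from the normal equations (X^T X) beta = X^T y.
beta_normal = np.linalg.solve(X.T @ X, X.T @ y)

print(beta_hat, rss(beta_hat, X, y))
```

Moving `beta_hat` in any direction cannot lower `rss`, which is what "setting the parameters to minimize the objective function" means here.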

Dimensionality Reduction

Some well-known dimensionality reduction algorithms are:
1) Principal Component Analysis (PCA)
2) Incremental PCA
3) Approximate PCA
4) Kernel PCA
5) Linear Discriminant Analysis

Basic Idea

The general idea behind all these algorithms:
Project the data onto a set of orthogonal components.
Most of the algorithms use eigenvectors to find these components.
Find the minimum number of components that still represents the whole data set.
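The three steps above can be sketched directly with NumPy for the PCA case. This is an illustrative sketch, not the library's implementation: the synthetic data, the mixing matrix, and the 95% variance threshold are all assumptions made for the example.

```python
import numpy as np

# Synthetic correlated data, assumed for illustration.
rng = np.random.default_rng(2)
mixing = np.array([[3.0, 0.0, 0.0],
                   [1.0, 1.0, 0.0],
                   [0.0, 0.0, 0.1]])
X = rng.normal(size=(300, 3)) @ mixing

# Center the data and compute its covariance matrix.
Xc = X - X.mean(axis=0)
cov = np.cov(Xc, rowvar=False)

# Eigenvectors of the covariance matrix are the orthogonal components.
eigvals, eigvecs = np.linalg.eigh(cov)      # eigh returns ascending order
order = np.argsort(eigvals)[::-1]           # sort by variance, descending
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Smallest number of components explaining at least 95% of the variance
# (the 95% threshold is an arbitrary choice for this sketch).
ratio = np.cumsum(eigvals) / eigvals.sum()
k = int(np.searchsorted(ratio, 0.95) + 1)

# Project the centered data onto the first k components.
Z = Xc @ eigvecs[:, :k]
print(k, Z.shape)
```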

PCA on IRIS data
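A minimal sketch of PCA on the Iris data with scikit-learn is shown below. Reducing to two components is an assumption made so that the result could be drawn as a 2-D scatter plot; the slide itself does not state how many components were kept.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

iris = load_iris()
X = iris.data                      # 150 samples, 4 features

# Keep two components (an assumed choice, convenient for plotting).
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)        # PCA centers the data internally

# Fraction of the total variance captured by each kept component.
explained = pca.explained_variance_ratio_
print(X_2d.shape, explained)
```

The columns of `X_2d` can then be plotted against each other, colored by `iris.target`, to see how well the species separate in the reduced space.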

Cross Validation and Metrics

Once the models are trained, they need to be evaluated on test data. The cross-validation techniques available in scikit-learn include:
K-fold
Stratified k-fold
Leave-one-out
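The three techniques listed above can be compared with a short sketch. It assumes a recent scikit-learn release, where the splitters live in `sklearn.model_selection`. The classifier, the data set, and the number of folds are illustrative choices, not taken from the presentation.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import (
    KFold,
    LeaveOneOut,
    StratifiedKFold,
    cross_val_score,
)

X, y = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=1000)   # example model, chosen arbitrarily

splitters = {
    "k-fold": KFold(n_splits=5, shuffle=True, random_state=0),
    "stratified k-fold": StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
    "leave-one-out": LeaveOneOut(),
}

# Mean accuracy (the classifier's default score) for each splitting scheme.
mean_scores = {}
for name, cv in splitters.items():
    scores = cross_val_score(clf, X, y, cv=cv)
    mean_scores[name] = scores.mean()
    print(name, len(scores), round(mean_scores[name], 3))
```

Stratified k-fold keeps the class proportions roughly equal in every fold, while leave-one-out fits the model once per sample and so becomes expensive on large data sets.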

Preprocessing

Some algorithms require the data to be preprocessed so that their estimation methods perform better. Example: PCA works best on mean-centered, normalized data. Some useful functions are:
scale
MinMaxScaler
normalize
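A small sketch of the three helpers from `sklearn.preprocessing` follows. The toy matrix is an assumption for illustration, with two features on very different scales.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, normalize, scale

# Toy data (assumed): two features with very different ranges.
X = np.array([[1.0, 200.0],
              [2.0, 300.0],
              [3.0, 400.0]])

# scale: each column gets zero mean and unit variance.
X_std = scale(X)

# MinMaxScaler: each column is rescaled to the range [0, 1].
X_minmax = MinMaxScaler().fit_transform(X)

# normalize: each row (sample) is rescaled to unit L2 norm.
X_unit = normalize(X)

print(X_std, X_minmax, X_unit, sep="\n")
```

Note that `scale` and `MinMaxScaler` work column by column (per feature), whereas `normalize` works row by row (per sample).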

Binarization, Encoding and Imputation of Missing Values

Binarization converts a quantitative value to a binary value.
Encoding converts categorical features to integers.
Imputation fills in missing data with the mean, the median, or the most frequent value.
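The three operations can be sketched as follows. The example assumes a recent scikit-learn release, where imputation is provided by `sklearn.impute.SimpleImputer` and feature encoding by `OrdinalEncoder`; the presentation may have used older equivalents. All input arrays and the threshold are made up for illustration.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import Binarizer, OrdinalEncoder

# Binarization: values above the threshold become 1, the rest 0.
X_num = np.array([[0.2, 3.0],
                  [1.5, 0.0],
                  [0.9, 7.0]])
X_bin = Binarizer(threshold=1.0).fit_transform(X_num)

# Encoding: each category is mapped to an integer code
# (categories are sorted, so blue=0, green=1, red=2).
X_cat = np.array([["red"], ["green"], ["blue"], ["green"]])
X_codes = OrdinalEncoder().fit_transform(X_cat)

# Imputation: missing entries are replaced by the column mean.
# Other strategies include "median" and "most_frequent".
X_missing = np.array([[1.0, np.nan],
                      [3.0, 4.0],
                      [np.nan, 8.0]])
X_filled = SimpleImputer(strategy="mean").fit_transform(X_missing)

print(X_bin, X_codes, X_filled, sep="\n")
```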
