Outlier Detection


1 Outlier Detection
Identifying anomalous values in a real-world database is important both for improving the quality of the original data and for reducing the impact of anomalous values on knowledge discovery in databases. Such anomalous values can also give the data analyst useful information for discovering patterns. Through isolation, these data can be separated and analyzed. The analysis of outliers and influential points is an important step in regression diagnostics. In this presentation, our aim is to detect points that are very different from the other points: they do not seem to belong to a particular population and behave differently.

2 Multiple linear regression (MLR)
Model the relationship between two or more explanatory variables and a response variable by fitting a linear equation to observed data. Formally, the model for multiple linear regression, given n observations, is y_i = β_0 + β_1 x_{i1} + β_2 x_{i2} + ... + β_p x_{ip} + ε_i for i = 1, 2, ..., n, where p is the number of explanatory variables and ε_i is the error term. Some of its uses: risk scores; predictive modeling; a basis for more sophisticated methods (logistic and Poisson regression). In this example, we find outliers using multiple linear regression and residual analysis.
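Fitting a linear equation to observed data can be sketched in a few lines. The example below is only an illustration, not the method used in the presentation: it assumes ordinary least squares solved through the normal equations, and the helper names (solve, fit_mlr) and the data are made up for the example.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        # Move the row with the largest entry in this column to the pivot position.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def fit_mlr(X, y):
    """Least-squares coefficients [b0, b1, ..., bp] for y = b0 + b1*x1 + ... + bp*xp."""
    Z = [[1.0] + list(row) for row in X]  # prepend a column of ones for the intercept
    k = len(Z[0])
    ZtZ = [[sum(z[i] * z[j] for z in Z) for j in range(k)] for i in range(k)]
    Zty = [sum(z[i] * yi for z, yi in zip(Z, y)) for i in range(k)]
    return solve(ZtZ, Zty)


# Made-up, noise-free data generated from y = 1 + 2*x1 + 3*x2.
X = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in X]
beta = fit_mlr(X, y)  # approximately [1.0, 2.0, 3.0]
```

In practice a numerical library would normally do this step; the hand-written solver is here only to keep the sketch self-contained.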

3 Terminology
What's an outlier? An observation that is numerically distant from the rest of the data: it appears to deviate markedly from the other points of the sample in which it occurs.
What's a residual? The difference between the observed value of the dependent variable (y) and the predicted value (ŷ).
We find outliers using residual analysis.
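The residual definition above can be turned into a simple outlier check. This is a minimal sketch under stated assumptions: the predictions y_hat are hypothetical values standing in for the output of an already fitted model, each residual is scaled by the residual standard error, and the cutoff of 2 is a common rule of thumb rather than something the slides specify.

```python
import math


def standardized_residuals(y, y_hat, n_coef):
    """Return residuals e_i = y_i - y_hat_i and each residual divided by
    the residual standard error sqrt(SSE / (n - n_coef))."""
    residuals = [obs - pred for obs, pred in zip(y, y_hat)]
    sse = sum(e * e for e in residuals)
    s = math.sqrt(sse / (len(y) - n_coef))
    return residuals, [e / s for e in residuals]


# Made-up observations and hypothetical model predictions.
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0, 20.0, 16.1]
y_hat = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0]

CUTOFF = 2.0  # assumed rule-of-thumb threshold, not taken from the slides
residuals, z = standardized_residuals(y, y_hat, n_coef=2)
outliers = [i for i, zi in enumerate(z) if abs(zi) > CUTOFF]
print(outliers)  # indices of observations flagged as outliers
```

Here the seventh observation (index 6) has a residual of about 6 while every other residual is at most 0.2 in absolute value, so it is the only point flagged.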

4 Residual analysis
Some of its uses: validating model accuracy; looking for patterns in the errors; points with high leverage; multicollinearity (through VIF analysis); identifying outliers.
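As an illustration of the high-leverage item, the sketch below computes leverage for the simplest case only: a regression with an intercept and a single explanatory variable, where h_i = 1/n + (x_i - mean(x))^2 / sum_j (x_j - mean(x))^2. The multiple-regression case needs the full hat matrix and is not shown. The data are made up, and the cutoff of 2 × (number of coefficients) / n is a commonly quoted rule of thumb that is assumed here, not taken from the slides.

```python
def leverages(x):
    """Leverage of each point in a one-predictor regression with intercept."""
    n = len(x)
    mean = sum(x) / n
    sxx = sum((xi - mean) ** 2 for xi in x)
    return [1.0 / n + (xi - mean) ** 2 / sxx for xi in x]


# Made-up predictor values; the last one lies far from the rest.
x = [1, 2, 3, 4, 5, 20]
h = leverages(x)

n_coef = 2                    # intercept + one slope
cutoff = 2 * n_coef / len(x)  # assumed rule-of-thumb threshold
high_leverage = [i for i, hi in enumerate(h) if hi > cutoff]
print(high_leverage)          # indices of high-leverage points
```

The leverages always sum to the number of fitted coefficients (2 here), which is a quick sanity check on the computation.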

