1 Business Data Solution Using Clustering, Linear Programming, and Neural Net
A presentation to El Paso del Norte Software Association
Somnath (Shom) Mukhopadhyay
Information and Decision Sciences Department
The University of Texas at El Paso
August 27, 2003

2 Outline of Presentation
Data Mining Definition
Introduction to Neural Nets
- Physiological flavor
- General framework
- Classes of PDP models
- Sigma-Pi units
- Conclusion

3 Outline of Presentation (Continued)
Examples of real-world application problems
Organization of theoretical concepts
- Three methods used for classification
- A new LP-based method for the classification problem
- Application to a fictitious problem with four classes
- Comparing LP method results with the results from a neural network method
- Q&A

4 Data Mining - definition
- Exploring relationships in large amounts of data
- Should generalize
- Should be empirically validated
Examples
- Customer Relationship Management (CRM)
- Credit Scoring
- Clinical decision support

5 PDP Models and Brain Physiological Flavor
Representation and Learning in PDP models
Origins of PDP
- Jackson (1869) and Luria (1966)
- Hebb (1950)
- Rosenblatt (1959)
- Grossberg (1970)
- Rumelhart (1977)

6 General Framework for PDP
- A set of processing units
- A state of activation
- An output function for each unit
- A pattern of connectivity among units
- A propagation rule
- An activation rule
- A learning rule
- An operating environment
(A minimal unit-update sketch follows below.)
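To make the framework concrete, here is a minimal sketch of a single PDP unit update. The weighted-sum propagation rule, logistic activation/output function, and Hebbian-style learning rule are illustrative assumptions chosen for brevity; the slides do not commit to any particular choice.

```python
import numpy as np

def propagate(weights, inputs):
    # Propagation rule: net input is the weighted sum of incoming activations.
    return np.dot(weights, inputs)

def activate(net_input):
    # Activation rule / output function: squash the net input into (0, 1).
    return 1.0 / (1.0 + np.exp(-net_input))

def hebbian_update(weights, inputs, activation, learning_rate=0.1):
    # Learning rule (Hebbian flavor, an assumption): strengthen connections
    # from active senders to an active unit.
    return weights + learning_rate * activation * inputs

# One update cycle in a tiny "operating environment" of three sending units.
inputs = np.array([1.0, 0.0, 1.0])    # states of activation of the senders
weights = np.array([0.2, -0.5, 0.7])  # pattern of connectivity
activation = activate(propagate(weights, inputs))
weights = hebbian_update(weights, inputs, activation)
print(activation, weights)
```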

7 The Basic Components of a PDP system

8 Classes of PDP models
- Simple Linear Models
- Linear Threshold Units
- Brain State in a Box (BSB) by J. A. Anderson
- Thermodynamic models
- Grossberg
- Connectionist modeling

9 Sigma-Pi Units

10 A few real-world applications of interest to organizations and individuals
- Breast cancer detection
- Heart disease diagnosis
- Enemy submarine detection
- Mortgage delinquency prediction
- Stock market prediction
- Japanese character recognition and conversion

11

12 What is classification?
- Identify a set of mutually exclusive classes
- Identify a set of meaningful attributes that discriminate among the classes
Illustrations
- Using a meaningful set of attributes, can we differentiate between frequent and infrequent occurrences?

13 Decision Boundaries of a typical classification problem

14 Three Methods for Classification
Identifying decision boundaries for each class region
- Linear discriminant (Glover et al., 1988)
- Linear programming (Roy and Mukhopadhyay, 1991)
- Neural Networks (Rumelhart, 1986)

15 A new LP-based method for the classification problem
Step 1. Identify and discard outliers using clustering (a sketch follows below)
Step 2. Form decision boundaries for each class region by using LP
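As a rough illustration of Step 1, the sketch below clusters the data and drops the points farthest from their cluster centroid. The slide does not name a clustering algorithm or a cutoff rule, so k-means and a 95th-percentile distance threshold are assumptions made purely for this example.

```python
import numpy as np
from sklearn.cluster import KMeans

def discard_outliers(X, n_clusters=3, keep_fraction=0.95):
    """Step 1 sketch: cluster the data and drop points far from their centroid.

    k-means and the percentile-based cutoff are illustrative assumptions; the
    slide only states that outliers are identified via clustering.
    """
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(X)
    # Distance of each point to the centroid of its own cluster.
    distances = np.linalg.norm(X - km.cluster_centers_[km.labels_], axis=1)
    cutoff = np.quantile(distances, keep_fraction)
    return X[distances <= cutoff]

# Example: 200 random 2-D points; the farthest ~5% are treated as outliers.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
X_clean = discard_outliers(X)
print(X.shape, "->", X_clean.shape)
```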

16 Step 2: Form Decision Boundaries
Development of Boundary Functions
Use convex functions to calibrate the boundary. One example function:
f(x) = Σi ai Xi + Σi bi Xi² + Σi cij Xi Xj + d,  where j = i + 1
(An illustrative evaluation of this form follows below.)
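The snippet below simply evaluates the boundary form above for a given coefficient set, reading the cross term "where j = i + 1" as adjacent-index pairs. That reading, and the coefficient values used, are assumptions for illustration, not something stated on the slide.

```python
import numpy as np

def boundary_f(x, a, b, c, d):
    """Evaluate f(x) = sum_i a_i*x_i + sum_i b_i*x_i^2 + sum_i c_i,i+1*x_i*x_{i+1} + d.

    `c` holds the adjacent-pair cross-term coefficients; all numbers passed in
    below are made up for illustration.
    """
    x = np.asarray(x, dtype=float)
    linear = np.dot(a, x)
    quadratic = np.dot(b, x ** 2)
    cross = sum(c[i] * x[i] * x[i + 1] for i in range(len(x) - 1))
    return linear + quadratic + cross + d

# Two-variable example with made-up coefficients.
print(boundary_f([0.5, -1.0], a=[1.0, 2.0], b=[-1.0, -1.0], c=[0.3], d=1.0))
```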

17 Step 2: Form Decision Boundaries (Contd.)
One instance of the general function:
fA(x) = a1 X1 + a2 X2 + b1 X1² + b2 X2² + d

18 Step 2: Form Decision Boundaries (Contd.)
LP formulation of the previous problem instance:

Minimize e
s.t.  fA(x1)  >=  e
      ...
      fA(x8)  >=  e
      fA(x9)  <= -e
      ...
      fA(x18) <= -e
      e >= a small positive constant.

Written out for each pattern, this becomes:

Minimize e
s.t.   a2 + b2 + d >= e             (pattern x1)
       a1 + b1 + d >= e             (pattern x2)
      -a2 + b2 + d >= e             (pattern x3)
      -a1 + b1 + d >= e             (pattern x4)
      ...
       a1 + a2 + b1 + b2 + d <= -e  (pattern x15)
       a1 - a2 + b1 + b2 + d <= -e  (pattern x16)
      -a1 - a2 + b1 + b2 + d <= -e  (pattern x17)
      -a1 + a2 + b1 + b2 + d <= -e  (pattern x18)
       e >= a small positive constant.

(A solver sketch for this formulation follows below.)
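One way to solve such a formulation is with an off-the-shelf LP solver. The sketch below uses scipy.optimize.linprog and includes only the eight patterns written out explicitly on the slide (x1–x4 and x15–x18); the elided patterns are omitted, so the coefficients it recovers need not coincide exactly with the solution quoted on the next slide.

```python
from scipy.optimize import linprog

# Decision variables: [a1, a2, b1, b2, d, e].
# Patterns x1-x4 belong to class A, x15-x18 do not; the remaining patterns
# are elided on the slide, so this is an illustrative reconstruction only.
class_A = [(0, 1), (1, 0), (0, -1), (-1, 0)]     # patterns x1-x4
not_A   = [(1, 1), (1, -1), (-1, -1), (-1, 1)]   # patterns x15-x18

def row(x1, x2):
    # Coefficients of fA(x) = a1*x1 + a2*x2 + b1*x1^2 + b2*x2^2 + d.
    return [x1, x2, x1 ** 2, x2 ** 2, 1.0]

A_ub, b_ub = [], []
for x1, x2 in class_A:          # fA(x) >= e   ->  -fA(x) + e <= 0
    A_ub.append([-v for v in row(x1, x2)] + [1.0])
    b_ub.append(0.0)
for x1, x2 in not_A:            # fA(x) <= -e  ->   fA(x) + e <= 0
    A_ub.append(row(x1, x2) + [1.0])
    b_ub.append(0.0)

eps = 1e-3                       # "a small positive constant"
c = [0, 0, 0, 0, 0, 1]           # minimize e
bounds = [(None, None)] * 5 + [(eps, None)]
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
a1, a2, b1, b2, d, e = res.x
print(f"a1={a1:.3f} a2={a2:.3f} b1={b1:.3f} b2={b2:.3f} d={d:.3f} e={e:.3f}")
```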

19 Step 2: Form Decision Boundaries (Contd.)
Solving this LP formulation gives the decision boundary. Specifically, we get
a1 = 0, a2 = 0, b1 = -1, b2 = -1, d = 1 + e
Therefore, the boundary function fA(x) = a1 X1 + a2 X2 + b1 X1² + b2 X2² + d translates into
fA(x) = 1 - X1² - X2² + e

20 Step 2: Form Decision Boundaries (Contd.)
Putting this result into a picture, we have the following decision boundary:

21 Step 2: Form Multiple Decision Boundaries
A class does not have to be neatly packed within one boundary. For problems requiring multiple decision boundaries, the algorithm can find multiple disjoint regions for the same class. For example, a class called “corner seats” in a soccer stadium is scattered across four disjoint regions.

22 An example of a decision space of a fictitious problem (It has four classes: A, B, C, D)

23 Decision Boundary Identification Process for Class D only

24 Six Decision Boundaries found for Class B

25 Constructing MLP from masks
Masking functions are placed on a network to exploit parallelism.

26 Neural Networks Method for Classification
- Develops non-linear functions to associate inputs with outputs
- No assumptions about the distribution of the data
- Handles missing data well (graceful degradation)
Supervised neural networks
Estimating and testing the model
- Construct a training sample and a holdout sample
- Estimate model parameters using the training sample
- Test the estimated model's classification ability using the holdout sample
(A minimal sketch of this procedure follows below.)
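A minimal sketch of the estimate-and-test procedure above, using scikit-learn's MLPClassifier on synthetic data; the data set, network size, and 70/30 split are illustrative assumptions, not the configuration used for the results reported on the next slide.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for a real data set (illustrative only).
X, y = make_classification(n_samples=500, n_features=10, n_classes=2,
                           random_state=0)

# Construct a training sample and a holdout sample.
X_train, X_hold, y_train, y_hold = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Estimate model parameters using the training sample.
net = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=0)
net.fit(X_train, y_train)

# Test the estimated model's classification ability on the holdout sample.
print("holdout accuracy:", net.score(X_hold, y_hold))
```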

27 Comparison between LP and NN performance for three real-world problems
Problem                   Test Error Rate (%)    Total Number of Parameters Trained
                          LP        NN           LP        NN
1. Breast Cancer          1.7       2.96         19        990
2. Heart Disease          18.38     36.36        27        900
3. Submarine Detection    9.62      N/A          61

28 Future Research - Autonomous Learning:
- Learns without outside intervention
- Does class-dependent feature selection
- Derives simple if-then type classification rules that humans can understand
- Develops non-linear functions to associate inputs with outputs

29 Q & A Thank you.

