Soft Computing, Machine Intelligence and Data Mining
Sankar K. Pal, Machine Intelligence Unit, Indian Statistical Institute, Calcutta
http://www.isical.ac.in/~sankar
[Organization chart: Indian Statistical Institute, founded 1931 by P. C. Mahalanobis; units include ACMU, ECSU, MIU, CVPRU, and the MS, AS, CS, BS, SS, PES and SQC divisions.]
Organization: Director, Professor-in-charge, Heads; faculty strength 250; designations from Distinguished Scientist and Professor to Associate Professor and Lecturer.
Courses: B.Stat, M.Stat, M.Tech (CS), M.Tech (SQC & OR), Ph.D.
Locations: Calcutta (HQ), Delhi, Bangalore, Hyderabad, Madras, Giridih, Bombay.
MIU Activities (unit formed in March 1993):
- Pattern Recognition and Image Processing: color image processing
- Data Mining: data condensation, feature selection, support vector machines, case generation
- Soft Computing: fuzzy logic, neural networks, genetic algorithms, rough sets, hybridization, case-based reasoning
- Fractals/Wavelets: image compression, digital watermarking, wavelet + ANN
- Bioinformatics
Externally Funded Projects: INTEL, CSIR, Silicogene; Center for Excellence in Soft Computing Research.
Foreign collaborations: Japan, France, Poland, Hong Kong, Australia.
Editorial activities: journals, special issues, books. Achievements/recognitions.
Staff: 10 faculty, 8 research scholars/associates.
Contents
- What is soft computing? The Computational Theory of Perception.
- Pattern recognition and machine intelligence: relevance of soft computing tools; different integrations.
- Emergence of data mining: need, the KDD process, relevance of soft computing tools, rule generation/evaluation.
- Modular Evolutionary Rough-Fuzzy MLP: modular network; rough sets, granules and rule generation; variable mutation operators; knowledge flow; example and merits.
- Rough-fuzzy case generation: granular computing, fuzzy granulation, mapping dependency rules to cases, case retrieval, examples and merits.
- Conclusions.
SOFT COMPUTING (L. A. Zadeh)
Aim: to exploit the tolerance for imprecision, uncertainty, approximate reasoning and partial truth to achieve tractability, robustness, low solution cost, and close resemblance with human-like decision making; to find an approximate solution to an imprecisely/precisely formulated problem.
Parking a Car: generally, a car can be parked rather easily because the final position of the car is not specified exactly. If it were specified to within, say, a fraction of a millimeter and a few seconds of arc, it would take hours or days of maneuvering and precise measurements of distance and angular position to solve the problem. High precision carries a high cost.
The challenge is to exploit the tolerance for imprecision by devising methods of computation which lead to an acceptable solution at low cost. This, in essence, is the guiding principle of soft computing.
Soft computing is a collection of methodologies (working synergistically, not competitively) which, in one form or another, reflect its guiding principle: exploit the tolerance for imprecision, uncertainty, approximate reasoning and partial truth to achieve tractability, robustness, and close resemblance with human-like decision making. It is the foundation for the conception and design of high-MIQ (Machine IQ) systems.
Soft computing provides a flexible information-processing capability for the representation and evaluation of various real-life ambiguous and uncertain situations (real-world computing). It may be argued that it is soft computing, rather than hard computing, that should be viewed as the foundation for Artificial Intelligence.
At this juncture, the principal constituents of soft computing are Fuzzy Logic (FL), Neurocomputing (NC), Genetic Algorithms (GA) and Rough Sets (RS). Within soft computing, FL, NC, GA and RS are complementary rather than competitive.
Roles:
- FL: the algorithms for dealing with imprecision and uncertainty
- NC: the machinery for learning and curve fitting
- GA: the algorithms for search and optimization
- RS: handling uncertainty arising from the granularity in the domain of discourse
Referring back to the "parking a car" example: do we use any measurement and computation while performing such tasks? In soft computing we use the Computational Theory of Perceptions (CTP).
Computational Theory of Perceptions (CTP) (AI Magazine, 22(1), 73-84, 2001): humans have a remarkable capability to perform a wide variety of physical and mental tasks without any measurements and computations. Examples: parking a car, driving in a city, cooking a meal, summarizing a story. CTP provides the capability to compute and reason with perception-based information.
They use perceptions of time, direction, speed, shape, possibility, likelihood, truth, and other attributes of physical and mental objects. Reflecting the finite ability of the sensory organs (and finally the brain) to resolve details, perceptions are inherently imprecise.
Perceptions are f-granular (both fuzzy and granular):
- Boundaries of perceived classes are unsharp.
- Values of attributes are granulated (a granule is a clump of indistinguishable points/objects).
Examples: granules of age: very young, young, not so old, ...; granules of direction: slightly left, sharp right, ...
F-granularity of perceptions puts them well beyond the reach of traditional methods of analysis (based on predicate logic and probability theory). The main distinguishing feature of CTP is the assumption that perceptions are described by propositions drawn from a natural language.
Machine Intelligence: a core concept for grouping various advanced technologies with pattern recognition and learning.
- Knowledge-based systems: probabilistic reasoning, approximate reasoning, case-based reasoning, fuzzy logic, rough sets
- Data-driven systems: neural network systems, evolutionary computing
- Hybrid systems: neuro-fuzzy, genetic-neural, fuzzy-genetic, fuzzy-neuro-genetic
- Non-linear dynamics: chaos theory, rescaled range analysis (wavelets), fractal analysis
Pattern Recognition System (PRS): measurement space → feature space → decision space.
- Uncertainties arise from deficiencies in the information available from a situation.
- Deficiencies may result from incomplete, imprecise, ill-defined, not fully reliable, vague or contradictory information at various stages of a PRS.
Relevance of Fuzzy Sets in PR:
- Representing linguistically phrased input features for processing
- Representing multi-class membership of ambiguous patterns
- Generating rules and inferences in linguistic form
- Extracting ill-defined image regions, primitives and properties, and describing relations among them as fuzzy subsets
ANNs provide natural classifiers offering:
- Resistance to noise
- Tolerance to distorted patterns/images (ability to generalize)
- Superior ability to recognize overlapping pattern classes, classes with highly nonlinear boundaries, or partially occluded/degraded images
- Potential for parallel processing
- Non-parametric operation
Why GAs in PR? Methods developed for pattern recognition and image processing are usually problem-dependent. Many tasks involved in analyzing/identifying a pattern need appropriate parameter selection and efficient search in complex spaces to obtain optimal solutions.
This makes the processes computationally intensive, with the possibility of losing the exact solution. GAs are efficient, adaptive and robust search processes that produce near-optimal solutions and have a large amount of implicit parallelism. GAs are therefore an appropriate and natural choice for problems that need optimized computation and robust, fast, close approximate solutions.
The relevance of FL, ANN and GAs individually to PR problems is established.
In the late eighties, scientists asked: why not integrations?
- Fuzzy Logic + ANN
- ANN + GA
- Fuzzy Logic + ANN + GA
- Fuzzy Logic + ANN + GA + Rough Sets
Neuro-fuzzy hybridization is the most visible integration realized so far.
Why fusion? Fuzzy set-theoretic models try to mimic human reasoning and the capability of handling uncertainty (the software, SW, side), while neural network models attempt to emulate the architecture and information representation scheme of the human brain (the hardware, HW, side). Together they give neuro-fuzzy computing, for more intelligent systems.
Two directions of hybridization: a fuzzy system in which an ANN is used for learning and adaptation gives a neuro-fuzzy system (NFS); an ANN in which fuzzy sets are used to augment its application domain gives a fuzzy neural network (FNN).
Merits: generic (NFS) versus application-specific (FNN).
Rough-Fuzzy Hybridization: fuzzy set theory assigns to each object a degree of belongingness (membership) to represent an imprecise/vague concept, while the focus of rough set theory is the ambiguity caused by the limited discernibility of objects (lower and upper approximations of a concept). Rough sets and fuzzy sets can therefore be integrated to develop a model of uncertainty stronger than either alone.
Rough Fuzzy Hybridization: A New Trend in Decision Making, S. K. Pal and A. Skowron (eds), Springer-Verlag, Singapore, 1999
Neuro-Rough Hybridization: rough set models are used to generate network parameters (weights); roughness is incorporated in the inputs and outputs of networks for uncertainty handling, performance enhancement and an extended domain of application; networks consisting of rough neurons are used. (Neurocomputing, Special Issue on Rough-Neuro Computing, S. K. Pal, W. Pedrycz, A. Skowron and R. Swiniarski (eds), vol. 36(1-4), 2001.)
Neuro-Rough-Fuzzy-Genetic Hybridization: rough sets are used to extract domain knowledge in the form of linguistic rules, which generate fuzzy knowledge-based networks that are then evolved using genetic algorithms. The integration offers several advantages, such as fast training, compact networks and performance enhancement.
Integration of ANN, FL, GAs and Rough Sets (IEEE Trans. Neural Networks, 9, 1203-1216, 1998): domain knowledge is incorporated using rough sets. [Figure: a knowledge-based network with low (L), medium (M) and high (H) fuzzy inputs F_jL, F_jM, F_jH for each feature F_j, and GA tuning of the encoded link weights.]
Before describing the Modular Evolutionary Rough-Fuzzy MLP and the Rough-Fuzzy Case Generation system, we explain data mining and the significance of pattern recognition, image processing and machine intelligence.
One of the applications of information technology that has drawn the attention of researchers is DATA MINING, to which pattern recognition, image processing and machine intelligence are directly related.
Why Data Mining?
- The digital revolution has made digitized information easy to capture and fairly inexpensive to store.
- With the development of computer hardware and software and the rapid computerization of business, huge amounts of data have been collected and stored in centralized or distributed databases.
- The data is heterogeneous (a mixture of text, symbolic, numeric, texture and image data), huge (both in dimension and size) and scattered, and the rate at which it is stored is growing phenomenally.
As a result, traditional ad hoc mixtures of statistical techniques and data management tools are no longer adequate for analyzing this vast collection of data.
Data Mining: pattern recognition and machine learning principles applied to a very large (both in size and dimension) heterogeneous database.
Data Mining + Knowledge Interpretation = Knowledge Discovery: the process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data.
[Figure: the KDD process. Huge raw data → (cleaning, data condensation, dimensionality reduction, data wrapping/description) → preprocessed data → machine learning (classification, clustering, rule generation) → mathematical model of data (patterns) → (knowledge interpretation, extraction, evaluation) → useful knowledge. The machine learning stage constitutes Data Mining (DM); the whole chain is Knowledge Discovery in Databases (KDD). (Pattern Recognition, World Scientific, 2001)]
Data Mining Algorithm Components (illustrated in the sketch below):
- Model: the function of the model (e.g., classification, clustering, rule generation) and its representational form (e.g., linear discriminants, neural networks, fuzzy logic, GAs, rough sets).
- Preference criterion: the basis for preferring one model or set of parameters over another.
- Search algorithm: the specification of an algorithm for finding particular patterns of interest (or models and parameters), given the data, the family of models, and the preference criterion.
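As a minimal sketch (not from the talk; the clustering model family, the penalty weight and all names are illustrative assumptions), the three components might look like this: the model is the number of clusters k, the preference criterion trades within-cluster scatter against model complexity, and the search is exhaustive over k.

```python
# Three-component view of a mining algorithm: model family, preference
# criterion, search algorithm (all concrete choices here are assumptions).
import numpy as np

def fit_kmeans(X, k, iters=20, seed=0):
    # Model family: k-means-style clustering, parameterized by k.
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers, labels

def preference(X, centers, labels, penalty=0.1):
    # Preference criterion: within-cluster scatter plus a complexity penalty.
    scatter = ((X - centers[labels]) ** 2).sum() / len(X)
    return scatter + penalty * len(centers)      # smaller is better

X = np.vstack([np.random.randn(50, 2) + m for m in ([0, 0], [5, 5], [0, 5])])
# Search algorithm: exhaustive search over the model family.
best = min(range(1, 7), key=lambda k: preference(X, *fit_kmeans(X, k)))
print("preferred number of clusters:", best)
```

Swapping any one component (e.g., replacing the exhaustive search with a GA) leaves the other two untouched, which is the point of the decomposition.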
Why the growth of interest?
- Falling cost of large storage devices and increasing ease of collecting data over networks.
- Availability of robust and efficient machine learning algorithms to process data.
- Falling cost of computational power, enabling the use of computationally intensive methods for data analysis.
Examples:
- Financial investment: stock indices and prices, interest rates, credit card data, fraud detection.
- Health care: various diagnostic information stored by hospital management systems.
The data is heterogeneous (a mixture of text, symbolic, numeric, texture and image data) and huge (both in dimension and size).
Role of Fuzzy Sets:
- Modeling of imprecise/qualitative knowledge
- Transmission and handling of uncertainties at various stages
- Supporting, to an extent, human-type reasoning in natural form
- Classification/clustering
- Discovering association rules (describing interesting association relationships among different attributes)
- Inferencing
- Data summarization/condensation (abstracting the essence from a large amount of information)
Role of ANN:
- Adaptivity, robustness, parallelism, optimality
- Machinery for learning and curve fitting (learns from examples)
Initially ANNs were thought to be unsuitable for data mining because of their black-box nature: no information is available from them in symbolic form suitable for human interpretation. Recently, the embedded knowledge has been extracted in the form of symbolic rules, making them suitable for rule generation.
Role of GAs: robust, parallel, adaptive search methods, suitable when the search space is large. Used more in prediction (P) than in description (D). D: finding human-interpretable patterns describing the data. P: using some variables or attributes in the database to predict unknown or future values of other variables of interest.
Example: Medical Data
- Numeric and textual information may be interspersed.
- Different symbols can be used with the same meaning.
- Redundancy often exists.
- Erroneous/misspelled medical terms are common.
- The data is often sparsely distributed.
A robust preprocessing system is required to extract any kind of knowledge from even medium-sized medical data sets. The data must not only be cleaned of errors and redundancy, but also organized in a fashion that makes sense for the problem.
So we need efficient, robust and flexible machine learning algorithms: hence the need for the soft computing paradigm.
Without “Soft Computing” Machine Intelligence Research Remains Incomplete.
Modular Neural Networks. Task: split a learning task into several subtasks, train a subnetwork for each subtask, and integrate the subnetworks to generate the final solution. Strategy: "divide and conquer".
The approach involves effective decomposition of the problem such that the subproblems can be solved with compact networks, and effective combination and training of the subnetworks such that there is a gain in total training time, network size and accuracy of the solution.
Advantages:
- Accelerated training.
- The final solution network has more structured components.
- The representation of individual clusters (irrespective of size/importance) is better preserved in the final solution network.
- The catastrophic interference problem of neural network learning (in the case of overlapped regions) is reduced.
Classification Problem:
- Split a k-class problem into k two-class problems.
- Train one (or multiple) subnetwork modules for each two-class problem.
- Concatenate the subnetworks such that intra-module links that have already evolved are unchanged, while inter-module links are initialized to low values.
- Train the concatenated network such that the (already evolved) intra-module links are perturbed less, while the inter-module links are perturbed more.
A sketch of this decomposition appears below.
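A hedged illustration of the decomposition follows; the talk's subnetworks are fuzzy MLPs, while tiny logistic modules are used here only to keep the sketch short, and the one-vs-rest split and the epsilon value are assumptions.

```python
# Split a k-class problem into k two-class modules, train each separately,
# then concatenate; new links start at small values (all details assumed).
import numpy as np

def train_module(X, y_binary, lr=0.1, epochs=200):
    w = np.zeros(X.shape[1]); b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # sigmoid output of the module
        g = p - y_binary                          # gradient of the logistic loss
        w -= lr * X.T @ g / len(X)
        b -= lr * g.mean()
    return w, b

def concatenate_modules(modules, eps=1e-3, seed=0):
    # Trained (intra-module) weights are kept; the small noise mimics
    # initializing the new inter-module links at low values.
    rng = np.random.default_rng(seed)
    W = np.stack([w for w, _ in modules])
    W += eps * rng.standard_normal(W.shape)
    b = np.array([b for _, b in modules])
    return W, b

means = np.array([[0, 0, 0, 0], [3, 0, 0, 0], [0, 3, 0, 0]], dtype=float)
y = np.random.randint(0, 3, 300)                  # k = 3 classes
X = np.random.randn(300, 4) + means[y]
modules = [train_module(X, (y == c).astype(float)) for c in range(3)]
W, b = concatenate_modules(modules)
pred = np.argmax(X @ W.T + b, axis=1)             # integrated network
print("training accuracy:", (pred == y).mean())
```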
[Figure: integrating subnetwork modules. A 3-class problem becomes three 2-class problems; the Class 1-3 subnetworks are integrated so that links with evolved values are preserved while new inter-module links are to be grown; a final training phase grows the inter-module links to produce the final network.]
Modular Rough-Fuzzy MLP: a modular network designed using four different soft computing tools. Basic network model: the fuzzy MLP. Rough set theory is used to generate crude decision rules representing each of the classes from the discernibility matrix. (There may be multiple rules for each class, hence multiple subnetworks per class.)
The knowledge-based subnetworks are concatenated to form a population of initial solution networks. The final solution network is evolved using a GA with a variable mutation operator: bits corresponding to the (already evolved) intra-module links have a low mutation probability, while bits for inter-module links have a high mutation probability, as sketched below.
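A small sketch of the variable mutation operator (the bit layout, population size and the two probabilities are illustrative assumptions):

```python
# Per-bit mutation probabilities: low for bits encoding already-evolved
# intra-module links, high for the new inter-module links.
import numpy as np

rng = np.random.default_rng(1)
n_bits = 40
intra = np.zeros(n_bits, dtype=bool)
intra[:30] = True                                  # first 30 bits: intra-module
p_mut = np.where(intra, 0.01, 0.20)                # low vs. high mutation prob.

def mutate(chromosome, p_mut, rng):
    flip = rng.random(chromosome.shape) < p_mut    # per-bit coin toss
    return np.logical_xor(chromosome, flip)

population = rng.integers(0, 2, size=(20, n_bits)).astype(bool)
offspring = np.array([mutate(c, p_mut, rng) for c in population])
print("avg flips, intra bits:", np.logical_xor(population, offspring)[:, intra].mean())
print("avg flips, inter bits:", np.logical_xor(population, offspring)[:, ~intra].mean())
```

Evolution therefore concentrates its exploration on the newly added inter-module links while preserving the knowledge already encoded in the modules.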
Rough Sets (Z. Pawlak, 1982, Int. J. Comp. Inf. Sci.):
- Offer mathematical tools to discover hidden patterns in data.
- The fundamental principle of a rough set-based learning system is to discover redundancies and dependencies between the given features of the data to be classified.
- A given concept is approximated both from below and from above, using lower and upper approximations.
- Rough set learning algorithms can be used to obtain rules in IF-THEN form from a decision table.
- Knowledge extraction from a database: form a decision table w.r.t. objects and attributes, remove undesirable attributes (knowledge discovery), analyze data dependency, and find minimal subsets of attributes (reducts).
[Figure: a set X with its B-lower approximation B̲X and B-upper approximation B̄X drawn over the granules [x]_B.] Here [x]_B is the set of all points belonging to the same granule as the point x in feature space B, i.e., the set of all points indiscernible from x in terms of feature subset B.
Approximations of the set X:
B-lower: B̲X = { x : [x]_B ⊆ X }, the union of granules definitely belonging to X w.r.t. feature subset B.
B-upper: B̄X = { x : [x]_B ∩ X ≠ ∅ }, the union of granules definitely and possibly belonging to X.
If B̲X = B̄X, X is B-exact (B-definable); otherwise it is roughly definable.
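The definitions translate directly into code; a minimal sketch, assuming a dictionary-based decision table (object id mapped to feature values):

```python
# Lower/upper approximations of a concept X from indiscernibility granules.
from collections import defaultdict

def granules(objects, B):
    """Partition object ids by their values on feature subset B."""
    part = defaultdict(set)
    for obj_id, features in objects.items():
        part[tuple(features[b] for b in B)].add(obj_id)
    return part.values()

def approximations(objects, B, X):
    lower, upper = set(), set()
    for g in granules(objects, B):
        if g <= X:      # granule entirely inside X: definite members
            lower |= g
        if g & X:       # granule overlaps X: possible members
            upper |= g
    return lower, upper

objects = {1: {"F1": 1, "F2": 0}, 2: {"F1": 1, "F2": 0},
           3: {"F1": 0, "F2": 1}, 4: {"F1": 0, "F2": 1}}
X = {1, 2, 3}                          # the concept to approximate
low, up = approximations(objects, ["F1"], X)
print(low, up)                         # {1, 2} and {1, 2, 3, 4}: X is roughly definable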
Rough sets provide both uncertainty handling (using lower and upper approximations) and granular computing (using information granules).
Granular Computing: computation is performed using information granules rather than the data points (objects), yielding information compression and computational gain.
[Figure: information granules and rough set-theoretic rules. The F1-F2 feature plane is partitioned into low/medium/high granules along each axis; a rule provides a crude description of a class using these granules.]
Rough Set Rule Generation
Decision Table:
Object  F1  F2  F3  F4  F5  Decision
x1      1   0   1   0   1   Class 1
x2      0   0   0   0   1   Class 1
x3      1   1   1   1   1   Class 1
x4      0   1   0   1   0   Class 2
x5      1   1   1   0   0   Class 2
Discernibility matrix c for Class 1 (each entry lists the attributes on which the two objects differ):
      x2          x3
x1    {F1, F3}    {F2, F4}
x2                {F1, F2, F3, F4}
Discernibility function: the discernibility function for object x1 belonging to Class 1 is the conjunction of the discernibility of x1 w.r.t. x2 and of x1 w.r.t. x3, i.e., (F1 ∨ F3) ∧ (F2 ∨ F4). Similarly, the functions for the other objects are formed from their rows of the matrix, and the dependency rules for the class are then expressed in AND-OR form from these functions.
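The computation can be sketched directly from the Class 1 rows of the table above (the hard-coded table encoding is for illustration):

```python
# Discernibility matrix and per-object discernibility functions for Class 1.
from itertools import combinations

table = {"x1": [1, 0, 1, 0, 1], "x2": [0, 0, 0, 0, 1], "x3": [1, 1, 1, 1, 1]}
attrs = ["F1", "F2", "F3", "F4", "F5"]

def discern(a, b):
    """Attributes on which objects a and b take different values."""
    return {f for f, va, vb in zip(attrs, table[a], table[b]) if va != vb}

matrix = {(a, b): discern(a, b) for a, b in combinations(table, 2)}
print(matrix)   # ('x1','x2'): {F1,F3}; ('x1','x3'): {F2,F4}; ('x2','x3'): {F1..F4}

def discernibility_function(obj):
    # One disjunctive clause per other object of the class, conjoined.
    return [sorted(discern(obj, other)) for other in table if other != obj]

for obj in table:
    clauses = discernibility_function(obj)
    print(obj, "->", " AND ".join("(" + " OR ".join(c) + ")" for c in clauses))
```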
[Figure: for a problem with 2 classes and 2 features, each rule over the linguistic inputs (L1 ... H2) seeds a crude network; Populations 1-3 are partially trained with GA1-GA3 (Phase I), giving partially trained crude networks.]
[Figure: the modules are concatenated, with normal links retained and new links given small random values; GA Phase II with restricted mutation probability (low for intra-module links, high for inter-module links) evolves the final population into the final trained network.]
Knowledge Flow in the Modular Rough-Fuzzy MLP (IEEE Trans. Knowledge Data Engg., 15(1), 14-25, 2003): [Figure: rough set rules R1, R2, R3, describing regions C1(R1), C2(R2), C2(R3) of the F1-F2 feature space, are mapped to subnetworks SN1, SN2, SN3, which are partially refined by training with an ordinary GA.]
[Figure: concatenation of the subnetworks and evolution of the population of concatenated networks with a GA having a variable mutation operator (low mutation probability for intra-module links, high for inter-module links), yielding the final solution network for the class regions C1 and C2 in the F1-F2 feature space.]
[Table: classification accuracy on speech data (3 features, 6 classes).]
[Table: network size (number of links).]
[Table: training time in hours, on a DEC Alpha workstation at 400 MHz.]
Legend for the comparisons: 1. MLP; 2. Fuzzy MLP; 3. Modular Fuzzy MLP; 4. Rough-Fuzzy MLP; 5. Modular Rough-Fuzzy MLP.
Network Structure (IEEE Trans. Knowledge Data Engg., 15(1), 14-25, 2003): [Figure: histograms of weight values; the Modular Rough-Fuzzy MLP is structured, with few links, whereas the Fuzzy MLP is unstructured, with more links.]
[Figure: connectivity of the network obtained using the Modular Rough-Fuzzy MLP.]
[Figure: sample rules extracted from the Modular Rough-Fuzzy MLP.]
Rule Evaluation:
- Accuracy
- Fidelity (the number of times the network and rule-base outputs agree)
- Confusion (should be restricted within a minimum number of classes)
- Coverage (a rule base with a smaller uncovered region, i.e., percentage of the test set for which no rule fires, is better)
- Rule base size (the fewer the rules, the more compact the rule base)
- Certainty (confidence of the rules)
A small sketch of fidelity and coverage follows.
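Fidelity and coverage, the two less standard measures, reduce to simple counts; a minimal sketch, assuming uncovered samples are marked None:

```python
# Fidelity: fraction of samples where rule base and network agree.
# Coverage: fraction of samples for which at least one rule fired.
def fidelity(net_preds, rule_preds):
    agree = sum(1 for n, r in zip(net_preds, rule_preds) if n == r)
    return agree / len(net_preds)

def coverage(rule_preds):
    fired = sum(1 for r in rule_preds if r is not None)
    return fired / len(rule_preds)

net_preds  = [0, 1, 1, 2, 0, 2]
rule_preds = [0, 1, 2, 2, None, 2]       # None marks an uncovered sample
print("fidelity:", fidelity(net_preds, rule_preds))   # 4/6
print("coverage:", coverage(rule_preds))              # 5/6
```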
Existing Rule Extraction Algorithms:
- Subset: searches over all possible combinations of input weights to a node of the trained network; rules are generated from those subsets of links whose weight sum exceeds the bias of the node.
- MofN: instead of AND-OR rules, extracts rules of the form IF M out of N inputs are high THEN Class i (sketched below).
- X2R: unlike the previous two methods, which consider the links of a network, X2R generates rules from the input-output mapping implemented by the network.
- C4.5: a rule generation algorithm based on decision trees.
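A sketch of the MofN idea for a single trained unit (restricting to positive weights and using a worst-case threshold test are simplifying assumptions):

```python
# Find the smallest M such that any M of the N relevant inputs being "high"
# (= 1) drives the weighted sum past the bias, then emit an M-of-N rule.
def m_of_n_rule(weights, bias, names):
    relevant = sorted((w, n) for w, n in zip(weights, names) if w > 0)
    for m in range(1, len(relevant) + 1):
        worst = sum(w for w, _ in relevant[:m])  # m smallest positive weights
        if worst > bias:                         # even the weakest m inputs fire the unit
            inputs = [n for _, n in relevant]
            return f"IF at least {m} of {inputs} are high THEN Class i"
    return "no M-of-N rule"

print(m_of_n_rule([0.9, 1.0, 1.1], bias=1.5, names=["F1", "F2", "F3"]))
# -> IF at least 2 of ['F1', 'F2', 'F3'] are high THEN Class i
```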
[Table: comparison of the rules obtained for the speech data, in terms of accuracy, uncovered region, number of rules, CPU time and confusion (IEEE Trans. Knowledge Data Engg., 15(1), 14-25, 2003).]
Case-Based Reasoning (CBR) involves adapting old solutions to meet new demands:
- using old cases to explain new situations or to justify new solutions
- reasoning from precedents to interpret new situations
Cases are typical situations already experienced by the system: conceptualized pieces of knowledge representing an experience that teaches a lesson for achieving the goals of the system.
A CBR system learns and becomes more efficient as a byproduct of its reasoning activity. Examples: medical diagnosis and law interpretation, where the available knowledge is incomplete and/or evidence is sparse.
Unlike a traditional knowledge-based system, a case-based system operates through a process of remembering one or a small set of concrete instances (cases) and basing decisions on comparisons between the new situation and the old ones.
Case selection: cases belong to the set of examples encountered. Case generation: constructed cases need not be any of the examples.
Recall: rough sets provide both uncertainty handling (using lower and upper approximations) and granular computing (using information granules).
Granular Computing and Case Generation (IEEE Trans. Knowledge Data Engg., to appear):
- Information granules: groups of similar objects clubbed together by an indiscernibility relation.
- Granular computing: computation is performed using information granules rather than the data points (objects), giving information compression and computational gain.
Cases are informative patterns (prototypes) characterizing the problems. In the rough set-theoretic framework, cases correspond to information granules; in the rough-fuzzy framework, cases correspond to fuzzy information granules.
Characteristics and Merits:
- Cases are cluster granules, not sample points.
- Cases involve only a reduced number of relevant features, of variable size.
- Lower storage requirements.
- Fast retrieval.
- Suitable for mining data with large dimension and size.
How is this achieved?
- Fuzzy sets help in the linguistic representation of patterns, providing a fuzzy granulation of the feature space.
- Rough sets help in generating dependency rules to model informative/representative regions in the granulated feature space.
- The fuzzy membership functions corresponding to the representative regions are stored as cases.
Fuzzy (F)-granulation: [Figure: π-type membership functions for the linguistic sets low, medium and high along feature j, with centers c_L, c_M, c_H and radii λ; membership values range over [0, 1] and cross 0.5 at the half-radius points.]
c_low(F_j) = m_jl,  c_medium(F_j) = m_j,  c_high(F_j) = m_jh
λ_low(F_j) = c_medium(F_j) − c_low(F_j)
λ_high(F_j) = c_high(F_j) − c_medium(F_j)
λ_medium(F_j) = 0.5 (c_high(F_j) − c_low(F_j))
where m_j is the mean of the pattern points along the j-th axis, m_jl is the mean of the points in the range [F_j_min, m_j), m_jh is the mean of the points in the range (m_j, F_j_max], and F_j_max, F_j_min are the maximum and minimum values of feature F_j.
An n-dimensional pattern F_i = [F_i1, F_i2, ..., F_in] is represented as a 3n-dimensional fuzzy linguistic pattern [Pal & Mitra 1992]:
F_i = [μ_low(F_i1)(F_i), ..., μ_high(F_in)(F_i)]
Each component is set to 1 or 0 according to whether it is higher or lower than 0.5, yielding binary 3n-dimensional patterns.
- Compute the frequency n_ki of occurrence of the binary patterns and select those patterns having frequency above a threshold Tr (for noise removal).
- Generate a decision table consisting of the binary patterns.
- Extract dependency rules corresponding to informative regions (blocks), e.g., class ← L1 ∧ M2.
The whole granulation step is sketched below.
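A sketch of the F-granulation step following the formulas above; the π function is the standard one used in the authors' fuzzy MLP work (an assumption here), and the toy data and helper names are illustrative:

```python
# Per-feature centers from class-independent means, radii from the center
# spacings, pi-membership values, then thresholding at 0.5 to get one
# binary 3n-dimensional row of the decision table.
import numpy as np

def pi_membership(x, c, lam):
    d = abs(x - c)
    if d <= lam / 2:
        return 1 - 2 * (d / lam) ** 2
    if d <= lam:
        return 2 * (1 - d / lam) ** 2
    return 0.0

def granulate_feature(col):
    m = col.mean()
    m_l = col[col < m].mean()            # mean of points below the overall mean
    m_h = col[col > m].mean()            # mean of points above it
    centers = {"low": m_l, "medium": m, "high": m_h}
    lam = {"low": m - m_l, "high": m_h - m, "medium": 0.5 * (m_h - m_l)}
    return centers, lam

X = np.random.rand(100, 2)               # toy data, n = 2 features
params = [granulate_feature(X[:, j]) for j in range(X.shape[1])]

def to_binary(x):                        # 3n-dimensional binary pattern
    bits = []
    for j, (centers, lam) in enumerate(params):
        for ling in ("low", "medium", "high"):
            bits.append(int(pi_membership(x[j], centers[ling], lam[ling]) > 0.5))
    return tuple(bits)

print(to_binary(X[0]))
```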
Rough Set Rule Generation (recap of the earlier example): from the decision table of binary patterns, the class-wise discernibility matrix and the discernibility functions are computed, yielding the dependency rules in AND-OR form; e.g., for Class 1 above, object x1 gives (F1 ∨ F3) ∧ (F2 ∨ F4).
Mapping Dependency Rules to Cases:
1. Each conjunction, e.g., L1 ∧ M2, represents a region (block).
2. For each conjunction, store as a case: the parameters of the fuzzy membership functions corresponding to the linguistic variables that occur in the conjunction (thus multiple cases may be generated from one rule), and the class information.
Note: all features may not occur in a rule, so cases may be represented by different, reduced numbers of features. Structure of a case: parameters of the membership functions (center, radius) and class information.
Example (IEEE Trans. Knowledge Data Engg., to appear): [Figure: two class regions in the F1-F2 plane with CASE 1 and CASE 2 marked; the axis annotations give the parameters of the fuzzy linguistic sets low, medium and high.]
Dependency rules and cases obtained:
Case 1: Feature 1, fuzzset (L): c = 0.1, λ = 0.5; Feature 2, fuzzset (H): c = 0.9, λ = 0.5; Class = 1.
Case 2: Feature 1, fuzzset (H): c = 0.7, λ = 0.4; Feature 2, fuzzset (L): c = 0.2, λ = 0.5; Class = 2.
Case Retrieval: the similarity sim(x, c) between a pattern x and a case c is defined over the memberships of x to the fuzzy linguistic sets stored in c,
sim(x, c) = (1/n) Σ_j μ_fuzzset(j)(x),
where n is the number of features present in case c and μ_fuzzset(j)(x) is the degree of belongingness of pattern x to the fuzzy linguistic set fuzzset for feature j. For classifying an unknown pattern, the case closest to the pattern in terms of sim(x, c) is retrieved and its class is assigned to the pattern.
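Putting the retrieval together as a sketch, using the two cases listed above (the mean aggregation in sim and the π membership form are assumptions):

```python
# Each case stores, per retained feature, (center, radius) of one fuzzy
# linguistic set plus the class; retrieval picks the case with highest sim.
def pi_membership(x, c, lam):
    d = abs(x - c)
    if d <= lam / 2:
        return 1 - 2 * (d / lam) ** 2
    if d <= lam:
        return 2 * (1 - d / lam) ** 2
    return 0.0

cases = [  # (feature index -> (center, radius), class), from the example above
    ({0: (0.1, 0.5), 1: (0.9, 0.5)}, 1),   # Case 1: F1 low, F2 high
    ({0: (0.7, 0.4), 1: (0.2, 0.5)}, 2),   # Case 2: F1 high, F2 low
]

def sim(x, feats):
    return sum(pi_membership(x[j], c, lam) for j, (c, lam) in feats.items()) / len(feats)

def classify(x):
    feats, label = max(cases, key=lambda fc: sim(x, fc[0]))
    return label

print(classify([0.15, 0.85]))   # near Case 1 -> class 1
print(classify([0.75, 0.25]))   # near Case 2 -> class 2
```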
Experimental Results and Comparisons
1. Forest Covertype: 10 dimensions, 7 classes and 586,012 samples. A Geographical Information System dataset representing forest cover types (pine, fir, etc.) of the USA; the variables are cartographic and remote sensing measurements, all numeric.
2. Multiple Features: features of handwritten numerals ('0'-'9') extracted from a collection of Dutch utility maps; 2000 patterns in total, 649 features (all numeric) and 10 classes.
3. Iris: 150 instances, 4 features and 3 classes of iris flowers; the features are numeric.
Some Existing Case Selection Methods:
- k-NN based: condensed nearest neighbor (CNN); instance-based learning (e.g., IB3); instance-based learning with feature weighting (e.g., IB4)
- Fuzzy logic based
- Neuro-fuzzy based
Algorithms Compared:
i. Instance-based learning algorithm IB3 [Aha 1991].
ii. Instance-based learning algorithm IB4 [Aha 1992] (reduced features): the feature weighting is learned by random hill climbing, and a specified number of features having high weights is selected.
iii. Random case selection.
Evaluation in terms of:
a) 1-NN classification accuracy using the cases (training set: 10% for case generation; test set: the remaining 90%)
b) Number of cases stored in the case base
c) Average number of features required to store a case (n_avg)
d) CPU time required for case generation (t_gen)
e) Average CPU time required to retrieve a case (t_ret)
(on a Sun UltraSparc workstation at 350 MHz)
Iris flowers: 4 features, 3 classes, 150 samples; number of cases = 3 for all methods. [Results table.]
Forest cover types: 10 features, 7 classes, 586,012 samples; number of cases = 545 for all methods. [Results table.]
Handwritten numerals: 649 features, 10 classes, 2000 samples; number of cases = 50 for all methods. [Results table.]
For the same number of cases:
- Accuracy: the proposed method is much superior to random selection and IB4, and close to IB3.
- Average number of features stored: the proposed method stores far fewer features than the original data dimension.
- Case generation time: much less than for IB3 and IB4.
- Case retrieval time: several orders of magnitude less than for IB3 and random selection, and also less than for IB4.
Conclusions
- The relation between soft computing, machine intelligence and pattern recognition is explained.
- The emergence of data mining and knowledge discovery from the PR point of view is explained.
- The significance of hybridization in the soft computing paradigm is illustrated.
- The modular concept enhances performance, accelerates training and makes the network structured, with fewer links.
- The rules generated are superior to those of other related methods in terms of accuracy, coverage, fidelity, confusion, size and certainty.
- Rough sets are used for generating information granules, and fuzzy sets provide efficient granulation of the feature space (F-granulation).
- Reduced and variable feature-subset representation of cases is a unique feature of the scheme.
- The rough-fuzzy case generation method is suitable for CBR systems involving datasets large in both dimension and size.
- Unsupervised case generation: Rough-SOM (Applied Intelligence, to appear)
- Application to multispectral image segmentation (IEEE Trans. Geoscience and Remote Sensing, 40(11), 2495-2501, 2002)
- Significance in the Computational Theory of Perceptions (CTP)
Thank You!!