From Rough Set Theory to Evidence Theory
Roman Słowiński
Laboratory of Intelligent Decision Support Systems, Institute of Computing Science, Poznań University of Technology
© Roman Słowiński

2 Introduction
We show that rough set theory (RST), proposed by Pawlak (1982), can be treated as a basis for evidence theory, proposed by Dempster and Shafer (1976) and called Dempster-Shafer Theory (DST).
DST annuls the axiom of additivity of exclusive events, which is one of the traditional axioms of probabilistic reasoning: $P(A) + P(\neg A) = 1$.
This permits taking into account both a degree of belief in a fact and a degree of ignorance (belief in the negation of the fact), which do not need to sum up to 1 (in the extreme case, the degrees of belief and ignorance can both be 0).
In DST, increased support for a hypothesis does not affect the support for the complement of this hypothesis.
RST operates on a decision table $S = \langle U, C \cup \{d\} \rangle$ providing information about objects in terms of values of attributes; RST detects inconsistencies due to indiscernibility of objects belonging to different decision classes (hypotheses) $\mathbf{Cl} = \{Cl_1, Cl_2, \ldots, Cl_n\}$; in consequence, the classes are characterized by lower and upper approximations.

3 Introduction
DST operates on a frame of discernment, a finite set of names of decision classes (hypotheses) $\mathbf{Cl} = \{Cl_1, Cl_2, \ldots, Cl_n\}$; all possible clusters of classes from $\mathbf{Cl}$ are considered – they create the power set $2^{\mathbf{Cl}}$; information about the set of objects from $U$ creating a given cluster is provided directly by some numerical functions, called the basic probability assignment (probability masses), the belief function and the plausibility function (lower and upper bounds of the degree of belief).
Clusters from $2^{\mathbf{Cl}}$ are all possible homes of objects from $U$ (hypotheses), given some pieces of evidence (premises): e.g. for $\mathbf{Cl}$ = {ravens, cats, whales}, the home (hypothesis) for objects with legs is the cluster {ravens, cats}, while the home for objects being mammals is the cluster {cats, whales}.
While in DST the membership of objects in clusters is described by directly provided belief and plausibility, both being functions of the bpa, we will show that the values of these functions can be calculated from the decision table using the RST concepts of lower and upper approximations of decision classes.

4 Evidence Theory or Dempster-Shafer Theory
Frame of discernment: $\mathbf{Cl} = \{Cl_1, Cl_2, \ldots, Cl_n\}$
Basic probability assignment (bpa) $m$ on $\mathbf{Cl}$: $m: 2^{\mathbf{Cl}} \to \mathbb{R}^+$ satisfying $m(\emptyset) = 0$ and $\sum_{\theta \subseteq \mathbf{Cl}} m(\theta) = 1$, where $\theta$ is a cluster of decision classes (an element of $2^{\mathbf{Cl}}$).
For a given bpa $m$, two functions are defined:
belief function over $\mathbf{Cl}$: $Bel: 2^{\mathbf{Cl}} \to \mathbb{R}^+$ with $Bel(\theta) = \sum_{\psi \subseteq \theta} m(\psi)$ for any $\theta \subseteq \mathbf{Cl}$
plausibility function over $\mathbf{Cl}$: $Pl: 2^{\mathbf{Cl}} \to \mathbb{R}^+$ with $Pl(\theta) = \sum_{\psi \cap \theta \neq \emptyset} m(\psi)$ for any $\theta \subseteq \mathbf{Cl}$
Some properties:
$Bel(\emptyset) = Pl(\emptyset) = 0$, $Bel(\mathbf{Cl}) = Pl(\mathbf{Cl}) = 1$
if $\theta \subseteq \psi$, then $Bel(\theta) \le Bel(\psi)$ and $Pl(\theta) \le Pl(\psi)$
$Bel(\theta) \le Pl(\theta)$
$Bel(\theta) + Bel(\mathbf{Cl} - \theta) \le 1$ and $Pl(\theta) + Pl(\mathbf{Cl} - \theta) \ge 1$
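To make the definitions concrete, here is a minimal Python sketch computing $Bel$ and $Pl$ from a bpa stored as a dictionary over frozensets; the frame and the mass values are illustrative, not taken from the slides.

```python
# A bpa over a three-class frame; masses sum to 1 and m(empty) = 0
# is enforced by simply not listing the empty set (hypothetical values).
frame = frozenset({"Cl1", "Cl2", "Cl3"})
m = {
    frozenset({"Cl2"}): 1/7,
    frozenset({"Cl3"}): 1/7,
    frozenset({"Cl2", "Cl3"}): 1/7,
    frame: 4/7,
}

def bel(theta, m):
    """Bel(theta) = sum of m(psi) over all psi included in theta."""
    return sum(mass for psi, mass in m.items() if psi <= theta)

def pl(theta, m):
    """Pl(theta) = sum of m(psi) over all psi intersecting theta."""
    return sum(mass for psi, mass in m.items() if psi & theta)

theta = frozenset({"Cl2", "Cl3"})
print(bel(theta, m))   # 3/7: the three focal sets inside theta
print(pl(theta, m))    # 1.0: every focal set meets theta
assert bel(theta, m) <= pl(theta, m)
```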

5 Evidence Theory or Dempster-Shafer Theory
From a given belief function, the basic probability assignment (bpa) can be reconstructed: $m(\theta) = \sum_{\psi \subseteq \theta} (-1)^{|\theta - \psi|} Bel(\psi)$, for $\theta \subseteq \mathbf{Cl}$, e.g.
for $\theta = \{Cl_1, Cl_2\}$: $m(\theta) = Bel(\{Cl_1,Cl_2\}) - Bel(\{Cl_1\}) - Bel(\{Cl_2\}) = Bel(\{Cl_1,Cl_2\}) - m(\{Cl_1\}) - m(\{Cl_2\})$
for $\theta = \{Cl_1, Cl_2, Cl_3\}$: $m(\theta) = Bel(\{Cl_1,Cl_2,Cl_3\}) - Bel(\{Cl_1,Cl_2\}) - Bel(\{Cl_2,Cl_3\}) - Bel(\{Cl_1,Cl_3\}) + Bel(\{Cl_1\}) + Bel(\{Cl_2\}) + Bel(\{Cl_3\}) = Bel(\{Cl_1,Cl_2,Cl_3\}) - m(\{Cl_1,Cl_2\}) - m(\{Cl_2,Cl_3\}) - m(\{Cl_1,Cl_3\}) - m(\{Cl_1\}) - m(\{Cl_2\}) - m(\{Cl_3\})$
The union of all subsets $\theta \subseteq \mathbf{Cl}$ that are focal (i.e. have the property $m(\theta) > 0$) is called the core of $\mathbf{Cl}$.
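A short sketch of this Möbius-style inversion, assuming $Bel$ is given on every subset of the frame; the numeric values below are hypothetical.

```python
# Recover the bpa from a belief function:
# m(theta) = sum over psi <= theta of (-1)^|theta - psi| * Bel(psi).
from itertools import chain, combinations

def subsets(s):
    """All subsets of s as frozensets, including the empty set and s."""
    s = list(s)
    return (frozenset(c) for c in chain.from_iterable(
        combinations(s, r) for r in range(len(s) + 1)))

def bpa_from_bel(bel_table, frame):
    """bel_table maps every subset of the frame to its Bel value."""
    return {theta: sum((-1) ** len(theta - psi) * bel_table[psi]
                       for psi in subsets(theta))
            for theta in subsets(frame)}

frame = frozenset({"Cl1", "Cl2"})
bel_table = {frozenset(): 0.0, frozenset({"Cl1"}): 0.2,
             frozenset({"Cl2"}): 0.3, frame: 1.0}
m = bpa_from_bel(bel_table, frame)
print(m[frame])   # Bel({Cl1,Cl2}) - Bel({Cl1}) - Bel({Cl2}) = 0.5
```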

6 Evidence Theory or Dempster-Shafer Theory
Dempster's rule of combination
Suppose that two different belief functions $Bel_1$ and $Bel_2$ over the same frame of discernment $\mathbf{Cl}$ represent different pieces of evidence (premises), e.g. "with legs" or "mammals". The assumption is that both pieces of evidence are independent. As a result of Dempster's rule of combination, a new belief function $Bel_3$ is computed as their orthogonal sum $Bel_1 \oplus Bel_2$, with the resulting bpa:
$m_3(\theta) = \dfrac{\sum_{\psi_1 \cap \psi_2 = \theta} m_1(\psi_1)\, m_2(\psi_2)}{1 - \sum_{\psi_1 \cap \psi_2 = \emptyset} m_1(\psi_1)\, m_2(\psi_2)}$, for $\theta \neq \emptyset$,
where $\psi_1, \psi_2$ are focal elements of $m_1$ and $m_2$, respectively.
If $\sum_{\psi_1 \cap \psi_2 = \emptyset} m_1(\psi_1)\, m_2(\psi_2) = 1$, then $m_3(\theta)$ cannot be defined, and $m_1$, $m_2$ are said to be contradictory bpa.
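A sketch of the combination rule, reusing the ravens/cats/whales illustration with hypothetical masses for the two pieces of evidence.

```python
# Dempster's rule: multiply masses of intersecting focal elements and
# renormalise by 1 - K, where K is the mass falling on the empty set.
from collections import defaultdict

def combine(m1, m2):
    """Orthogonal sum m1 (+) m2 of two bpas over the same frame."""
    raw = defaultdict(float)
    conflict = 0.0
    for psi1, v1 in m1.items():
        for psi2, v2 in m2.items():
            inter = psi1 & psi2
            if inter:
                raw[inter] += v1 * v2
            else:
                conflict += v1 * v2   # mass committed to the empty set
    if conflict >= 1.0:
        raise ValueError("contradictory bpas: combination undefined")
    return {theta: v / (1.0 - conflict) for theta, v in raw.items()}

# hypothetical evidence "with legs" vs "mammals"
m_legs = {frozenset({"ravens", "cats"}): 0.8,
          frozenset({"ravens", "cats", "whales"}): 0.2}
m_mammals = {frozenset({"cats", "whales"}): 0.9,
             frozenset({"ravens", "cats", "whales"}): 0.1}
m3 = combine(m_legs, m_mammals)
print(m3[frozenset({"cats"})])   # 0.72: evidence concentrates on cats
```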

7 Relationship between RST and DST
Let $\mathbf{Cl}$ be the frame of discernment compatible with the decision table $S = \langle U, C \cup \{d\} \rangle$; let also $T(\theta) = \{t : Cl_t \in \theta\}$, where $\theta \subseteq \mathbf{Cl}$.
For any $\theta \subseteq \mathbf{Cl}$ the belief function can be calculated as:
$Bel_S(\theta) = \dfrac{|\{x \in U : \partial_C(x) \subseteq T(\theta)\}|}{|U|}$
where $\partial_C(x) = \{t : I_C(x) \cap Cl_t \neq \emptyset\}$ is called the generalized decision for object $x$ (a cluster of classes, with no possibility of discernment using knowledge about $S = \langle U, C \cup \{d\} \rangle$).

8 Relationship between RST and DST
For any $\theta \subseteq \mathbf{Cl}$ the plausibility function can be calculated as:
$Pl_S(\theta) = \dfrac{|\{x \in U : \partial_C(x) \cap T(\theta) \neq \emptyset\}|}{|U|}$
These counts coincide with the cardinalities of the lower and upper approximations of the union of the classes in $\theta$: $Bel_S(\theta)$ counts the objects in $\underline{C}(\bigcup_{t \in T(\theta)} Cl_t)$, and $Pl_S(\theta)$ the objects in $\overline{C}(\bigcup_{t \in T(\theta)} Cl_t)$, divided by $|U|$ (a Python sketch of this computation follows below).
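A minimal sketch of this computation, assuming the decision table is given as attribute vectors and decision labels per object; the names and the tiny table are illustrative, not the slides' data.

```python
# Generalized decisions from a decision table: objects with identical
# attribute vectors form one C-indiscernibility class, and delta(x)
# collects the decision labels occurring in that class.
from collections import defaultdict

def generalized_decisions(objects, attrs, decision):
    groups = defaultdict(list)
    for x in objects:
        groups[tuple(attrs[x])].append(x)
    delta = {}
    for block in groups.values():
        labels = frozenset(decision[x] for x in block)
        for x in block:
            delta[x] = labels
    return delta

def bel_s(t_theta, delta):
    """Fraction of objects whose generalized decision lies inside T(theta)."""
    return sum(1 for d in delta.values() if d <= t_theta) / len(delta)

def pl_s(t_theta, delta):
    """Fraction of objects whose generalized decision overlaps T(theta)."""
    return sum(1 for d in delta.values() if d & t_theta) / len(delta)

# tiny hypothetical table: two condition attributes, three objects
attrs = {1: (0, 1), 2: (0, 1), 3: (1, 0)}
decision = {1: 1, 2: 2, 3: 2}
delta = generalized_decisions(attrs.keys(), attrs, decision)
print(bel_s(frozenset({1, 2}), delta))   # 1.0: every delta lies in {1,2}
print(bel_s(frozenset({1}), delta))      # 0.0: objects 1, 2 indiscernible
```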

9 Example
Let us consider the decision table $S = \langle U, C \cup \{d\} \rangle$, where $U = \{1, \ldots, 28\}$, $C = \{a, b, c, e, f\}$ and the decision $d$ makes a partition of $U$ into decision classes $\mathbf{Cl} = \{Cl_1, Cl_2, Cl_3\}$, $T(\mathbf{Cl}) = \{1, 2, 3\}$:
$Cl_1 = \{1,2,4,8,10,15,22,25\}$
$Cl_2 = \{3,5,11,12,16,18,19,21,23,24,27\}$
$Cl_3 = \{6,7,9,13,14,17,20,26,28\}$
$I_C(1) = \{1,10\}$, $I_C(2) = \{2,11,20,23\}$, $I_C(3) = \{3,5\}$, $I_C(4) = \{4,6\}$, $I_C(7) = \{7,9,17\}$, $I_C(8) = \{8,12\}$, $I_C(13) = \{13,16\}$, $I_C(14) = \{14,21\}$, $I_C(15) = \{15,24\}$, $I_C(18) = \{18,19\}$, $I_C(22) = \{22\}$, $I_C(25) = \{25,26,27\}$, $I_C(28) = \{28\}$
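The quantities on the following slides can be recomputed from this data; here is a sketch that encodes the decision classes and the indiscernibility classes listed above directly (the variable names are ours).

```python
# Recompute the generalized decisions and the bpa m_S for the example:
# each indiscernibility block contributes its size to m_S(delta(block)).
from collections import Counter

classes = {1: {1, 2, 4, 8, 10, 15, 22, 25},
           2: {3, 5, 11, 12, 16, 18, 19, 21, 23, 24, 27},
           3: {6, 7, 9, 13, 14, 17, 20, 26, 28}}
blocks = [{1, 10}, {2, 11, 20, 23}, {3, 5}, {4, 6}, {7, 9, 17}, {8, 12},
          {13, 16}, {14, 21}, {15, 24}, {18, 19}, {22}, {25, 26, 27}, {28}]

n = sum(len(b) for b in blocks)          # |U| = 28
m_s = Counter()
for block in blocks:
    # generalized decision of the block: classes it intersects
    delta = frozenset(t for t, cl in classes.items() if cl & block)
    m_s[delta] += len(block)

for t_theta, count in sorted(m_s.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(t_theta), f"{count}/{n}")
# e.g. [2, 3] -> 4/28 and [1, 2, 3] -> 7/28, as on the slides below
```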

10 Example
Lower and upper approximations of decision classes:
$\underline{C}(Cl_1) = \{1, 10, 22\}$, $\overline{C}(Cl_1) = \{1, 2, 4, 6, 8, 10, 11, 12, 15, 20, 22, 23, 24, 25, 26, 27\}$
$\underline{C}(Cl_2) = \{3, 5, 18, 19\}$, $\overline{C}(Cl_2) = \{2, 3, 5, 8, 11, 12, 13, 14, 15, 16, 18, 19, 20, 21, 23, 24, 25, 26, 27\}$
$\underline{C}(Cl_3) = \{7, 9, 17, 28\}$, $\overline{C}(Cl_3) = \{2, 4, 6, 7, 9, 11, 13, 14, 16, 17, 20, 21, 23, 25, 26, 27, 28\}$

11 Example
Generalized decisions for objects from the boundaries of decision classes: for each object $x$, $\partial_C(x)$ collects the indices of all decision classes met in $I_C(x)$, e.g. $\partial_C(2) = \{1, 2, 3\}$ since $I_C(2) = \{2, 11, 20, 23\}$ contains objects from $Cl_1$, $Cl_2$ and $Cl_3$.

12 Example
Generalized decisions for objects from the boundaries of decision classes:
$\partial_C(x) = \{1\}$ for $x \in \{1, 10, 22\}$
$\partial_C(x) = \{2\}$ for $x \in \{3, 5, 18, 19\}$
$\partial_C(x) = \{3\}$ for $x \in \{7, 9, 17, 28\}$
$\partial_C(x) = \{1, 2\}$ for $x \in \{8, 12, 15, 24\}$
$\partial_C(x) = \{1, 3\}$ for $x \in \{4, 6\}$
$\partial_C(x) = \{2, 3\}$ for $x \in \{13, 14, 16, 21\}$
$\partial_C(x) = \{1, 2, 3\}$ for $x \in \{2, 11, 20, 23, 25, 26, 27\}$

13 Example
Values of the basic probability assignment for the frame of discernment $\mathbf{Cl} = \{Cl_1, Cl_2, Cl_3\}$:

T(θ)       {Cl1}   {Cl2}   {Cl3}   {Cl1,Cl2}   {Cl1,Cl3}   {Cl2,Cl3}   {Cl1,Cl2,Cl3}
m_S(θ)     3/28    4/28    4/28    4/28        2/28        4/28        7/28

The values of $m_S$ sum up to 1.

14 Example
Values of the belief function for the frame of discernment $\mathbf{Cl} = \{Cl_1, Cl_2, Cl_3\}$:

T(θ)       {Cl1}   {Cl2}   {Cl3}   {Cl1,Cl2}   {Cl1,Cl3}   {Cl2,Cl3}   {Cl1,Cl2,Cl3}
Bel_S(θ)   3/28    4/28    4/28    11/28       9/28        12/28       1

e.g. $Bel_S(\{2,3\}) = m_S(\{2\}) + m_S(\{3\}) + m_S(\{2,3\}) = 1/7 + 1/7 + 1/7 = 12/28 = 3/7$, i.e. 12 out of 28 objects are in $Cl_2 \cup Cl_3$; among them 4 are in $Cl_2$, 4 are in $Cl_3$, and 4 are in $Cl_2$ or $Cl_3$ (with no possibility of discernment to which one).
$Pl_S(\theta) = 16/28, 19/28, 17/28, 6/7, 6/7, 25/28, 1$ (in the same order), since $Pl_S(\theta) = 1 - Bel_S(\mathbf{Cl} - \theta)$, for $\theta \subseteq \mathbf{Cl}$.

15 Final remarks about RST and Knowledge Discovery

16 Knowledge discovery principle
Knowledge discovery (KD) is a type of machine learning which enables the extraction of useful information from a set of raw data in a high-level format which a user can understand. The process of KD generally involves developing models (patterns) that describe or classify a set of measurements. Sometimes the objective of KD is to study the model itself, while at other times it is to construct a good classifier. The process consists of various steps which are executed in a "waterfall" fashion. One important step is the data mining step, whereby patterns and relationships are found in the data.

17 Knowledge discovery "waterfall"

18 Knowledge discovery steps
Selection: This task involves setting the data into a format that is suitable for the discovery task. This may involve joining together several existing sets of data in order to arrive at the final data set.
Preprocessing: This step involves cleaning the data, removing information that is deemed unnecessary, and filling in any missing values in the data.
Transformation: The data set may be quite large and contain irrelevant features or duplicate examples which must be merged. Transformation reduces the amount of data and also the time taken to execute mining queries.
Mining: This step involves extracting patterns from the data.
Evaluation: Patterns identified by the system are interpreted into knowledge which can be used to support human decision making or other tasks.

19 Rough set theory as a framework for knowledge discovery
Selection: The data representation for rough sets is flat, two-dimensional data tables.
Preprocessing: If the data contain missing values, the table may be processed either classically, after completing the data table in some way, or using a new indiscernibility relation able to compare incomplete descriptions.
Transformation: Data discretization is often performed on numerical attributes. This involves converting the exact observations into intervals or ranges of values, which amounts to defining a coarser view of the world and also results in a reduction of the value set size for the observations (a small illustrative sketch follows below).
Mining: In the rough set framework, if-then rules are mined in one of three perspectives: minimal cover (minimal number of rules), exhaustive description (all rules), satisfactory description (interesting rules).
Evaluation: Individual patterns or rules can be measured or manually inspected. Rule sets can be used to classify new objects and their classificatory performance may be assessed.
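A minimal sketch of equal-width discretization of one numerical attribute, one common form of the transformation step; the values and the number of cuts are illustrative.

```python
# Replace exact observations by equal-width interval labels, giving a
# coarser view of the attribute and a smaller value set.
def discretize(values, n_bins=3):
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0   # guard against a constant column
    labels = []
    for v in values:
        k = min(int((v - lo) / width), n_bins - 1)
        labels.append(f"[{lo + k * width:.1f}, {lo + (k + 1) * width:.1f})")
    return labels

print(discretize([2.1, 3.7, 9.0, 5.5, 7.2]))
# five exact observations collapse into three interval values
```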