
1 Intelligent Database Systems Lab
Pattern Discovery: A Data Driven Approach to Decision Support
Authors: Andrew K. C. Wong, Yang Wang
IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, Vol. 33, No. 1, February 2003
Advisor: Dr. Hsu
Graduate: Keng-Wei Chang
國立雲林科技大學 National Yunlin University of Science and Technology

2 Outline
- Motivation
- Objective
- Introduction
- Brief Description of Pattern Discovery
- Data, Events, and Patterns
- Pattern Discovery
- Inferencing with Patterns for Decision Support
- Summary and Discussion
N.Y.U.S.T. I.M.

3 Motivation
Decision support is increasingly targeted at large-scale, complicated systems and domains.

4 Objective
- Process large amounts of data and efficiently extract useful knowledge from them.
- Develop a fundamental framework for intelligent decision support by analyzing large amounts of mixed-mode data.

5 1. Introduction
For machine intelligence to be used comfortably by decision makers, it should:
1) discover multiple patterns from a database without relying on prior knowledge;
2) cope with multiple, flexible decisions and objectives;
3) provide explicit patterns and associated rules for interpretation;
4) support a high-speed interactive mode for information and knowledge extraction.

6 1. Introduction
Three related issues also concern decision makers:
1) flexibility and versatility of the pattern discovery process;
2) transparency of the supporting evidence;
3) processing speed.

7 2. Brief Description of Pattern Discovery
In the seventies:
- a quantitative basis of information measures and statistical patterns, which formed the early basis of the authors' pattern discovery approach;
- pattern recognition methods for discrete-valued and continuous data.

8 2. Brief Description of Pattern Discovery
In the late seventies and early eighties:
- the dimensionality was too large for pattern discovery;
- database partitioning was proposed.
In the nineties:
- the pattern recognition paradigm shifted from the variable level to the event level.

9 3. Data, Events, and Patterns
Each attribute $X_i$ takes its values from its domain $\Omega_i$, and the sample space is $\Omega = \Omega_1 \times \cdots \times \Omega_N$.
- Generalized Event
- Pattern

10 3.1 Generalized Event
The Borel σ-field of Ω is the σ-field generated by all rectangles in Ω. This gives two advantages:
- a geometric perspective on events;
- a probability measure on events.

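11 3.1 Generalized Event
Definition 1: Consider the sample space Ω. A subset H of Ω is called a hypercell if it has the rectangular form $H = I_1 \times I_2 \times \cdots \times I_N$, where each $I_i$ is a subset of the domain of the i-th attribute (an interval for a continuous attribute).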

12 3.1 Generalized Event
Definition 2: An event in Ω is a hypercell.
Definition 3: The volume of an event is the hypervolume of the N-dimensional subspace occupied by the event.
Example: a data set $D = \{\omega\}$ of points drawn from the sample space Ω.
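
Definitions 2 and 3 treat an event as a box in the sample space, and the size of an event is its hypervolume. The minimal Python sketch below assumes continuous attributes. It also assumes a hypothetical representation of a hypercell as one half-open interval [low, high) per attribute, since the slides do not fix a representation.

```python
from math import prod


def volume(cell: list[tuple[float, float]]) -> float:
    """Hypervolume of a hypercell: the product of its edge lengths.

    Assumption: `cell` holds one half-open interval (low, high) per
    continuous attribute; this encoding is illustrative only.
    """
    return prod(high - low for low, high in cell)


# A 2-D event covering [0, 2) x [1, 4) occupies an area of 2 * 3.
print(volume([(0.0, 2.0), (1.0, 4.0)]))  # 6.0
```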

13 3.1 Generalized Event
Definition 4: The observed frequency $o_E$ of an event E in the sample space Ω is the number of data points that fall within the volume of E. Example: if X is the finite set of data points inside the volume of E, then $o_E = |X|$.
Definition 5: The probability of an event E is intuitively estimated by the proportion of data points contained in the event, $P(E) \approx o_E / M$, where M is the total number of observations.
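
Definitions 4 and 5 amount to counting the points inside an event and dividing by M. The sketch below reuses the same hypothetical half-open interval representation. The names `contains`, `observed_frequency`, and `probability` are illustrative and do not come from the paper.

```python
def contains(cell, point):
    """True if `point` lies inside the hypercell (half-open bounds assumed)."""
    return all(low <= x < high for (low, high), x in zip(cell, point))


def observed_frequency(cell, data):
    """o_E: number of data points that fall within the event (Definition 4)."""
    return sum(1 for point in data if contains(cell, point))


def probability(cell, data):
    """P(E) estimated as o_E / M, the proportion of points in E (Definition 5)."""
    return observed_frequency(cell, data) / len(data)


data = [(0.5, 1.5), (1.0, 3.9), (2.5, 2.0), (0.1, 0.2)]
event = [(0.0, 2.0), (1.0, 4.0)]
print(observed_frequency(event, data), probability(event, data))  # 2 0.5
```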

14 3.2 Pattern
Definition 6: Let Ω be the sample space and g(·) be a test statistic corresponding to a specified discovery criterion c. A pattern is an event E that satisfies the condition $g(E) > \tau_\alpha$, where $\tau_\alpha$ is the critical value of the statistical test at significance level α.
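
Definition 6 leaves the statistic g(·) open. As an illustration only, the sketch assumes that g(E) is approximately standard normal under the null model, as the adjusted residual of Section 4.1 is usually treated. Under that assumption, $\tau_\alpha$ is the one-sided normal quantile. The function names are hypothetical.

```python
from statistics import NormalDist


def critical_value(alpha: float) -> float:
    """One-sided critical value tau_alpha of a standard normal statistic."""
    return NormalDist().inv_cdf(1.0 - alpha)


def is_pattern(statistic: float, alpha: float = 0.05) -> bool:
    """Definition 6, assuming g(E) ~ N(0, 1) when E is not a pattern."""
    return statistic > critical_value(alpha)


print(round(critical_value(0.05), 3))  # 1.645
print(is_pattern(2.3), is_pattern(1.2))  # True False
```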

15 3.2 Pattern
Definition 7: An event association is a significant joint occurrence of low-dimensional events.
- An N-dimensional event (N > 1) can be considered an event association composed of N one-dimensional events.

16 4. Pattern Discovery
Definition 8: Suppose we have a data set D with sample space Ω. Pattern discovery is the search for significant events (hypercells) in a compact subspace of Ω demarcated by the available data set D.
- Pattern Discovery as Residual Analysis
- Pattern Discovery as Optimization

17 4.1 Pattern Discovery as Residual Analysis
Definition 9: The residual of an event E against a pre-assumed model is the difference between the actual occurrence of the event and its expected occurrence: $r_E = o_E - e_E$, where $e_E$ is the expected occurrence.

18 4.1 Pattern Discovery as Residual Analysis
Definition 10: The standardized residual of event E is the ratio of its residual to the square root of its expectation: $z_E = (o_E - e_E) / \sqrt{e_E}$.
Definition 11: The adjusted residual of event E is defined as $d_E = z_E / \sqrt{v_E}$, where $v_E$ is the variance of $z_E$.
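
Definitions 9 to 11 translate directly into arithmetic. The sketch takes $o_E$, $e_E$, and $v_E$ as inputs because $v_E$ depends on the pre-assumed model. The counts and the value $v_E = 0.8$ in the example are made up for illustration.

```python
from math import sqrt


def residual(observed: float, expected: float) -> float:
    """Definition 9: r_E = o_E - e_E."""
    return observed - expected


def standardized_residual(observed: float, expected: float) -> float:
    """Definition 10: z_E = (o_E - e_E) / sqrt(e_E)."""
    return (observed - expected) / sqrt(expected)


def adjusted_residual(observed: float, expected: float, variance: float) -> float:
    """Definition 11: d_E = z_E / sqrt(v_E), where v_E is the variance of z_E."""
    return standardized_residual(observed, expected) / sqrt(variance)


# Hypothetical event: 30 points observed where the model expects 20,
# with an assumed variance v_E = 0.8 for z_E.
print(residual(30, 20),
      round(standardized_residual(30, 20), 3),
      round(adjusted_residual(30, 20, 0.8), 3))  # 10 2.236 2.5
```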

19 4.1 Pattern Discovery as Residual Analysis
Two pre-assumed models:
- uniform distribution (concentration-driven discovery): $e_E = M \cdot \mathrm{vol}(E) / V$, where V is the volume of S and M is the total number of observations;
- mutual independence (dependency-driven discovery): $e_E = M \prod_{i=1}^{N} (o_{E_i} / M)$, where $o_{E_i}$ is the number of data points that fall into the i-th one-dimensional event $E_i$.
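
The two pre-assumed models differ only in how $e_E$ is computed. Below is a minimal sketch of both expected counts, assuming the formulas above. The helper names and example numbers are hypothetical.

```python
from math import prod


def expected_uniform(event_volume: float, space_volume: float, m: int) -> float:
    """Concentration-driven model: e_E = M * vol(E) / V (uniform over S)."""
    return m * event_volume / space_volume


def expected_independent(marginal_counts: list[int], m: int) -> float:
    """Dependency-driven model: e_E = M * prod_i (o_{E_i} / M).

    Rewritten as prod_i o_{E_i} / M**(N - 1), so integer counts stay exact
    until the single final division.
    """
    return prod(marginal_counts) / m ** (len(marginal_counts) - 1)


print(expected_uniform(6.0, 100.0, 500))      # 30.0
print(expected_independent([200, 100], 500))  # 40.0
```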

20 4.2 Pattern Discovery as Optimization
An event E is described by (C, L). C represents one of the corners of E, and L represents the lengths of its edges, so the event can be written E(C, L).

21 4.2 Pattern Discovery as Optimization
The pattern discovery problem is to find the event E(C, L) that maximizes an objective function O(E). Here the objective function O(E) is the adjusted residual $d_E$.
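
The slides do not say how the maximization is carried out. As a stand-in, the sketch performs a brute-force grid search over one-dimensional events E(C, L) = [C, C + L) and scores each one by its adjusted residual under the uniform model. It assumes a binomial count model, which gives $v_E = 1 - p$ with $p = \mathrm{vol}(E)/V$. The variance formula, the grid, and all names are assumptions, not the authors' method.

```python
from math import sqrt


def adjusted_residual_uniform(o: int, m: int, p: float) -> float:
    """Adjusted residual, assuming o ~ Binomial(M, p) under the uniform model:
    e = M p, and the variance of z = (o - e) / sqrt(e) is taken as 1 - p."""
    e = m * p
    return (o - e) / sqrt(e * (1.0 - p))


def best_interval(data, low, high, steps):
    """Exhaustive search over E(C, L) = [C, C + L) on a grid of `steps` cells
    spanning S = [low, high); returns (score, C, L) with the largest O(E)."""
    m = len(data)
    width = (high - low) / steps
    best = None
    for i in range(steps):                 # corner C = low + i * width
        for k in range(1, steps - i + 1):  # edge length L = k * width
            if k == steps:
                continue                   # E = S gives p = 1 and zero variance
            c, length = low + i * width, k * width
            o = sum(1 for x in data if c <= x < c + length)
            score = adjusted_residual_uniform(o, m, k / steps)  # p = vol(E) / V
            if best is None or score > best[0]:
                best = (score, c, length)
    return best


data = [0.1, 0.2, 0.25, 0.3, 0.9, 5.5, 7.2, 9.1]
score, corner, length = best_interval(data, 0.0, 10.0, 10)
print(round(score, 2), corner, length)  # 4.95 0.0 1.0
```

The exhaustive search costs O(steps² · M). It only shows what is being maximized, not an efficient optimizer.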

22 5. Inferencing with Patterns for Decision Support
- Building Classifiers
- Multivariate Probabilistic Density Estimation
- Interpretation of Patterns
- Discovered Patterns as Queries for Class Data Retrieval

23 5.1 Building Classifiers
Based on mutual information from information theory (dependency-driven discovery):
- information gain;
- weight of evidence, $W(Y = y / Y \neq y \mid x) = I(Y = y : x) - I(Y \neq y : x)$, where I(·) is the mutual information. The result is positive (evidence for Y = y), negative (evidence against Y = y), or zero (no evidence either way).
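
Below is a sketch of the weight of evidence estimated from contingency counts. It assumes I(·:·) is the pointwise mutual information $\log P(y, x) / (P(y) P(x))$. Under that assumption, W reduces to $\log P(x \mid y) / P(x \mid \neg y)$. The count argument names are hypothetical, and the code requires $0 < n_{xy} < n_x$ so that both logarithms are defined.

```python
from math import log


def mutual_information(p_joint: float, p_x: float, p_y: float) -> float:
    """Pointwise mutual information I(y : x) = log P(y, x) / (P(y) P(x))."""
    return log(p_joint / (p_x * p_y))


def weight_of_evidence(n_xy: int, n_x: int, n_y: int, m: int) -> float:
    """W(Y=y / Y!=y | x) = I(Y=y : x) - I(Y!=y : x), estimated from counts:
    n_xy records match x and Y=y, n_x match x, n_y have Y=y, m in total."""
    p_x, p_y = n_x / m, n_y / m
    i_y = mutual_information(n_xy / m, p_x, p_y)
    i_not_y = mutual_information((n_x - n_xy) / m, p_x, 1.0 - p_y)
    return i_y - i_not_y


# P(x | y) = 30/50 and P(x | not y) = 10/50, so W = log 3 > 0 supports Y = y.
print(round(weight_of_evidence(30, 40, 50, 100), 4))  # 1.0986
```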

24 5.1 Building Classifiers
However, this requires estimating the conditional probabilities or knowing the distribution.
- The weight of evidence can be decomposed into contributions from the significant event associations related to Y = y and x, if those associations are known.

25 5.1 Building Classifiers
Only the significant event associations discovered from the data set are used in the inference process.
- Thus, the inference maximizes the weight of evidence, subject to certain conditions.
- Using the patterns as a model, a missing value of any discrete variable can be inferred (classified).
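
The inference step can be sketched as follows, assuming each discovered pattern carries a precomputed weight of evidence for one class value. The greedy choice of non-overlapping associations is an assumption that stands in for the paper's selection conditions. The attribute names are invented for the example.

```python
def classify(record, patterns, classes):
    """Return the class value with the largest total weight of evidence.

    `patterns` holds hypothetical (conditions, class_value, weight) triples,
    where `conditions` maps attribute names to required values. For each class,
    matching associations are taken greedily by decreasing weight, skipping
    any that reuse an attribute already accounted for."""
    best_class, best_total = None, float("-inf")
    for y in classes:
        used, total = set(), 0.0
        candidates = sorted((p for p in patterns if p[1] == y),
                            key=lambda p: p[2], reverse=True)
        for conditions, _, weight in candidates:
            matched = all(record.get(a) == v for a, v in conditions.items())
            if matched and used.isdisjoint(conditions):
                used.update(conditions)
                total += weight
        if total > best_total:
            best_class, best_total = y, total
    return best_class


patterns = [
    ({"outlook": "sunny", "humidity": "high"}, "no", 1.2),
    ({"outlook": "sunny"}, "yes", 0.4),
    ({"wind": "weak"}, "yes", 0.5),
]
# The class attribute is missing from the record and is inferred.
record = {"outlook": "sunny", "humidity": "high", "wind": "weak"}
print(classify(record, patterns, ["yes", "no"]))  # no
```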

26 5.2 Multivariate Probabilistic Density Estimation
Estimating the probability density function (pdf) is a central problem in multivariate data analysis (concentration-driven discovery).
- discrete pdf estimation
- continuous pdf estimation

27 Discrete pdf Estimation
Definition 12: The indicator function for an event $E_i$ is defined as $I_{E_i}(x) = 1$ if $x \in E_i$, and $I_{E_i}(x) = 0$ otherwise.
- The probability density estimate
- The normalization condition
- The discrete probability density function
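
The slide names the indicator function, a density estimate, and a normalization condition but does not give their formulas. One natural reading, assumed here and not taken from the paper, is a piecewise-constant (histogram-style) estimate over disjoint hypercells:

$\hat f(x) = \sum_i \frac{o_{E_i}}{M \, \mathrm{vol}(E_i)} I_{E_i}(x)$

This estimate integrates to $\sum_i o_{E_i} / M$, which equals 1 when the events cover all M points.

```python
from math import prod


def indicator(cell, x):
    """I_{E_i}(x): 1 if x lies in the hypercell (half-open bounds), else 0."""
    return 1 if all(lo <= xi < hi for (lo, hi), xi in zip(cell, x)) else 0


def volume(cell):
    """Hypervolume of the hypercell."""
    return prod(hi - lo for lo, hi in cell)


def piecewise_density(events, counts, m):
    """Assumed estimate f(x) = sum_i o_i / (M * vol(E_i)) * I_{E_i}(x)
    for disjoint events E_i with observed frequencies o_i."""
    def f(x):
        return sum(o / (m * volume(e)) * indicator(e, x)
                   for e, o in zip(events, counts))
    return f


events = [[(0.0, 1.0)], [(1.0, 3.0)]]
f = piecewise_density(events, counts=[6, 4], m=10)
print(f([0.5]), f([2.0]), f([5.0]))  # 0.6 0.2 0.0
```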

28 Continuous pdf Estimation
The basic idea is to estimate a kernel for each event.
- A Gaussian kernel ψ(x) is used: it is continuous and satisfies the conditions required of a kernel.

29 Continuous pdf Estimation
To fit a kernel ψ(x) to the event E:
- compute the mean and covariance matrix of the data points in E;
- estimate the combined pdf from the kernels fitted to all events.
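
Below is a one-dimensional sketch of the kernel step. It assumes each event's Gaussian kernel uses the mean and maximum-likelihood variance of the points inside it, and that the combined pdf weights each kernel by $o_E / M$. The weighting, the 1-D restriction (the slides use a full covariance matrix), and the helper names are all assumptions.

```python
from statistics import NormalDist, fmean, pvariance


def fit_kernels(events, data):
    """Fit one Gaussian kernel per event (hypothetical 1-D intervals [low, high))
    and pair it with the weight o_E / M."""
    m = len(data)
    components = []
    for low, high in events:
        inside = [x for x in data if low <= x < high]
        if len(inside) < 2 or pvariance(inside) == 0:
            continue  # a Gaussian kernel needs a positive variance estimate
        # pvariance divides by n: the maximum-likelihood variance estimate.
        kernel = NormalDist(mu=fmean(inside), sigma=pvariance(inside) ** 0.5)
        components.append((len(inside) / m, kernel))
    return components


def combined_pdf(components, x):
    """Combined estimate f(x) = sum_E (o_E / M) * psi_E(x)."""
    return sum(weight * kernel.pdf(x) for weight, kernel in components)


data = [0.8, 1.0, 1.2, 4.5, 5.0, 5.5]
components = fit_kernels([(0.0, 3.0), (3.0, 7.0)], data)
print([(w, round(k.mean, 3)) for w, k in components])  # [(0.5, 1.0), (0.5, 5.0)]
```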

30 5.2 Multivariate Probabilistic Density Estimation
[Figure: an example of continuous density estimation]

31 5.3 Interpretation of Patterns
Since the events are discovered explicitly, rules can be extracted easily.
- Association rules of the form X ⇒ Y
- Support and confidence are measured by P(X, Y) and P(Y | X), respectively.
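
Support P(X, Y) and confidence P(Y | X) can be estimated from relative frequencies, in the spirit of Definition 5. The record layout and attribute names below are hypothetical.

```python
def matches(record, items):
    """True if the record has every attribute value listed in `items`."""
    return all(record.get(attr) == value for attr, value in items.items())


def support_confidence(records, antecedent, consequent):
    """Estimate support P(X, Y) and confidence P(Y | X) of the rule X => Y."""
    m = len(records)
    n_x = sum(1 for r in records if matches(r, antecedent))
    n_xy = sum(1 for r in records
               if matches(r, antecedent) and matches(r, consequent))
    confidence = n_xy / n_x if n_x else 0.0
    return n_xy / m, confidence


records = [
    {"outlook": "sunny", "play": "no"},
    {"outlook": "sunny", "play": "no"},
    {"outlook": "sunny", "play": "yes"},
    {"outlook": "rain", "play": "yes"},
]
# Rule outlook=sunny => play=no: support = 2/4, confidence = 2/3.
print(support_confidence(records, {"outlook": "sunny"}, {"play": "no"}))
```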

32 5.4 Discovered Patterns as Queries for Class Data Retrieval
[Figure: one discovered pattern and the query derived from it]
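
One way to turn a hypercell pattern into a retrieval query is to emit one range predicate per attribute. The sketch below assumes this SQL form, because the query syntax used in the paper is not given here. It uses Python's built-in `sqlite3` module with a made-up `patients` table. Column and table names are interpolated into the SQL text, so they must come from a trusted schema; only the bounds are passed as parameters.

```python
import sqlite3


def pattern_to_query(table, pattern):
    """Translate a hypercell pattern {column: (low, high)} into a parameterized
    SELECT that retrieves the records inside it (half-open bounds assumed)."""
    clauses, params = [], []
    for column, (low, high) in pattern.items():
        clauses.append(f"{column} >= ? AND {column} < ?")
        params.extend([low, high])
    return f"SELECT * FROM {table} WHERE " + " AND ".join(clauses), params


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patients (age REAL, bp REAL, label TEXT)")
conn.executemany("INSERT INTO patients VALUES (?, ?, ?)",
                 [(35, 120, "a"), (62, 150, "b"), (58, 145, "b"), (41, 160, "a")])
sql, params = pattern_to_query("patients", {"age": (50, 70), "bp": (140, 155)})
rows = conn.execute(sql, params).fetchall()
print(rows)  # the two records labeled 'b'
```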

33 6. Summary and Discussion
The paper develops a framework for data-driven decision support based on pattern discovery, covering:
1) the motivation, historical background, and rationale of the approach;
2) a novel unified framework to define and represent mixed-mode data, the most general and common type of data in today's databases;
3) the theoretical basis of pattern discovery, built on statistical residual and optimization principles;

34 6. Summary and Discussion
4) a novel unified framework to represent probability models for mixed-mode data in the form of pdfs;
5) an inference system that uses the discovered patterns and the weight of evidence for classification and prediction;
6) a new way of data retrieval that uses class queries to retrieve data from a distributed database of unlimited size;
7) validation evidence supporting the efficacy of the proposed framework and its new developments in solving large-scale problems with online interactive capability.

