1
GUI implementation for Supervised and Unsupervised SUBDUE System
2
Introduction
• Data mining is a vast and rapidly developing field. It uses artificial intelligence to find common, interesting, or previously unknown patterns in large databases.
• One method for discovering knowledge in structural data is the identification of common substructures (concepts represented as graphs) within the data.
3
Introduction (Contd.)
• The SUBDUE system, developed by Cook and Holder [Cook and Holder, 1999], performs data mining on databases represented as graphs; that is, it discovers interesting substructures in structural data.
• This project converts a textual representation of the data into a graphical visualization. The input to the program is the output produced by the existing SUBDUE system.
4
The Subdue System
• The SUBDUE system is a data-mining tool that discovers interesting substructures in structural data.
• By compressing previously discovered substructures in the data, multiple passes of SUBDUE produce a hierarchical description of the structural regularities in the data.
5
Unsupervised Subdue
Figure: Example of the input graph to the unsupervised version of SUBDUE and the discovered substructure.
6
Unsupervised (Contd.)
• The substructure discovery algorithm used by SUBDUE is a beam search.
• A substructure consists of a definition and a set of instances. The definition is a connected set of vertices and edges that defines a subgraph within the input graph G. An instance of a substructure is a subgraph of G that matches the definition graph-theoretically.
• The discovery algorithm is given below.
7
Unsupervised (Contd.)
SUBDUE (Graph, BeamWidth, MaxBest, MaxSubSize, Limit)
    ParentList = {}
    ChildList = {}
    BestList = {}
    ProcessedSubs = 0
    Create a substructure from each unique vertex label and its single-vertex instances; insert the resulting substructures in ParentList
    While ProcessedSubs <= Limit and ParentList is not empty do
        While ParentList is not empty do
            Parent = RemoveHead (ParentList)
            Extend each instance of Parent in all possible ways
            Group the extended instances into Child substructures
            For each Child do
                If SizeOf (Child) <= MaxSubSize then
                    Evaluate the Child
8
Unsupervised (Contd.)
                    Insert Child in ChildList in order by value
                    If Length (ChildList) > BeamWidth then
                        Destroy the substructure at the end of ChildList
            ProcessedSubs = ProcessedSubs + 1
            Insert Parent in BestList in order by value
            If Length (BestList) > MaxBest then
                Destroy the substructure at the end of BestList
        Switch ParentList and ChildList
    return BestList
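The control flow of the listing above can be sketched in Python. This is a minimal sketch, not the SUBDUE implementation: the graph-specific steps (extending instances, grouping them into children, evaluating a substructure) are passed in as hypothetical callables `extend`, `evaluate`, and `size_of`, and the duplicate check on children stands in for the grouping step. The toy example at the bottom is also an assumption: it treats substrings of a string as "substructures" purely to exercise the loop.

```python
def subdue_beam_search(initial, extend, evaluate, size_of,
                       beam_width, max_best, max_sub_size, limit):
    """Beam-search control loop; a higher value means a better substructure."""
    # ParentList starts with the single-vertex substructures, best first.
    parent_list = sorted(initial, key=evaluate, reverse=True)
    best_list = []
    processed = 0
    while processed <= limit and parent_list:
        child_list = []
        while parent_list:
            parent = parent_list.pop(0)                 # RemoveHead(ParentList)
            for child in extend(parent):
                if size_of(child) <= max_sub_size and child not in child_list:
                    child_list.append(child)
                    child_list.sort(key=evaluate, reverse=True)
                    del child_list[beam_width:]         # keep only the beam
            processed += 1
            best_list.append(parent)
            best_list.sort(key=evaluate, reverse=True)
            del best_list[max_best:]                    # keep only MaxBest entries
        parent_list = child_list                        # switch ParentList and ChildList
    return best_list


# Toy stand-in for a graph (assumption): substrings of TEXT play the role
# of substructures, and extending means adding the next character.
TEXT = "abcabcabd"

def extend(sub):
    n = len(sub) + 1
    return sorted({TEXT[i:i + n] for i in range(len(TEXT) - n + 1)
                   if TEXT.startswith(sub, i)})

def evaluate(sub):
    # Crude compression proxy: each occurrence saves len(sub) - 1 symbols.
    return TEXT.count(sub) * (len(sub) - 1)

best = subdue_beam_search(sorted(set(TEXT)), extend, evaluate, len,
                          beam_width=2, max_best=3, max_sub_size=2, limit=10)
```

In the toy run, the pair "ab" occurs three times and therefore ranks first in `best`. In SUBDUE itself the value would come from the MDL-based evaluation rather than this proxy.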
9
Supervised
SUBDUE performs supervised, graph-based relational concept learning. The SUBDUE concept learner accepts both a positive graph and a negative graph, and evaluates substructures by how well they compress the positive graph and how little they compress the negative graph.
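The idea of rewarding compression of the positive graph while penalizing compression of the negative graph can be illustrated with a small sketch. The cost below is an assumption made for illustration, not the formula SUBDUE uses: graph size (vertices plus edges) stands in for description length, and the function names are hypothetical.

```python
def compressed_size(graph_size, sub_size, n_instances):
    """Size after each instance (sub_size units) is replaced by one vertex."""
    return graph_size - n_instances * (sub_size - 1)

def supervised_cost(pos_size, neg_size, sub_size, pos_instances, neg_instances):
    """Lower is better (illustrative only).

    Cost = size of the substructure definition
         + size of the positive graph after compression
         + amount by which the negative graph shrinks when compressed.
    """
    pos_after = compressed_size(pos_size, sub_size, pos_instances)
    neg_after = compressed_size(neg_size, sub_size, neg_instances)
    return sub_size + pos_after + (neg_size - neg_after)

# Candidate A occurs only in the positive graph; candidate B occurs in both.
cost_a = supervised_cost(pos_size=20, neg_size=20, sub_size=3,
                         pos_instances=4, neg_instances=0)
cost_b = supervised_cost(pos_size=20, neg_size=20, sub_size=3,
                         pos_instances=4, neg_instances=4)
```

Under this assumed cost, candidate A scores better than candidate B because it compresses the positive graph equally well without also compressing the negative graph.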
10
Supervised (Contd.)
[Figure: positive and negative example graphs with the discovered substructure]
11
Supervised (Contd.)
• In the figure above, each object has a shape and is related to other objects through the binary relations "on" and "shape". The discovered substructure is shown in the figure. The substructure discovered by SUBDUE occurs in the positive example and not in the negative example. For this example the best substructure, the one that gives the maximum compression, is a triangle on a square.
12
Subdue - Chemical Compounds
• A DNA sequence can be represented as a very simple linear graph, and higher-level relationships between different parts of a sequence can be mapped to additional edges in the graph.
• The SUBDUE system discovers patterns in the input graph in polynomial time.
• The SUBDUE system is capable of discovering known patterns in the DNA sequence of yeast, as well as patterns in yeast DNA that are known to be important in other organisms but have not yet been shown to play a role in yeast.
14
Subdue - Chemical Compounds
• The figure shown above is the backbone representation, which gives more meaningful graphs than the linear representation.
• This representation separates the base names (A, C, T, G) from the vertices that represent the bases.
• The backbone representation mimics the actual chemical structure of the DNA molecule, in which the DNA bases are connected by deoxyribose sugars to a linear phosphate backbone.
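The two representations can be contrasted with a short sketch. It is a simplification under stated assumptions: the edge labels "next" and "base" and the vertex label "backbone" are hypothetical, and the sugar and phosphate vertices of the real backbone representation are collapsed into a single backbone vertex per position.

```python
def linear_graph(seq):
    """Linear representation: one vertex per base, chained by 'next' edges."""
    vertices = {i: base for i, base in enumerate(seq)}
    edges = [(i, i + 1, "next") for i in range(len(seq) - 1)]
    return vertices, edges

def backbone_graph(seq):
    """Backbone-style representation (simplified): a chain of backbone
    vertices, each linked to a separate vertex holding the base name."""
    n = len(seq)
    vertices, edges = {}, []
    for i, base in enumerate(seq):
        vertices[i] = "backbone"          # position on the backbone
        vertices[n + i] = base            # separate vertex for the base name
        edges.append((i, n + i, "base"))
        if i > 0:
            edges.append((i - 1, i, "next"))
    return vertices, edges

lin_v, lin_e = linear_graph("ACGT")
bb_v, bb_e = backbone_graph("ACGT")
```

For the four-base sequence, the linear form has 4 vertices and 3 edges, while the backbone form has 8 vertices and 7 edges. The extra structure is what lets a pattern refer to backbone positions independently of the base names.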
15
GUI Design
Requirements
The requirements of the GUI are as follows.
• File dialog boxes should be added to give the user easier access to the input files.
• The entire visual representation of the graphs must be shown on the screen. Because these representations sometimes exceed the size of the screen, scrollbars must be incorporated into the design to accommodate large graphs.
16
GUI Design
• Controls must be provided so that the user can interact with the GUI to display the results of each iteration. A “Next Iteration” button, which activates the display of substructures on the screen, must therefore be included.
• A “Compress” button should be provided in the GUI. This button lets the user see the compressed graph.
17
GUI Design
• For the supervised version of SUBDUE, both the negative and the positive graphs must be displayed.
• Each vertex of the graph should display its label inside the vertex.
• Since directed edges are used, arrows showing the appropriate direction should be displayed.
• The implementation language should be portable, able to run from a browser, and equipped with good GUI components. JDK 1.2 was therefore used to implement the program.
18
Implementation
Unsupervised SUBDUE
• The unsupervised SUBDUE GUI requires two input files: a position file, which determines the position of the graph, and the output file from SUBDUE, which the conversion program parses into the form the GUI requires.
• The driver class of the applet is the unsupervised class, which initializes the applet. When the user clicks the Next Iteration button, the canvas2 class is invoked and the best substructures found in that iteration are displayed. These substructures are arranged by their MDL value.
19
Implementation
Unsupervised SUBDUE
• When the user clicks the Compress button, the canvas1 class is invoked. This class compresses the input graph by replacing the instances of the iteration's best substructure with single vertices. Each later click of the Compress button compresses the already compressed graph further, using the results of that iteration. The flow diagrams are shown below.
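The compression step described above, replacing each instance of the best substructure with a single vertex, can be sketched as follows. This is a minimal Python sketch rather than the applet's canvas1 code: the data layout (a dictionary of vertex labels, a list of labeled directed edges, instances given as sets of vertex ids) and the `SUB` label are assumptions, and instances are assumed not to overlap.

```python
def compress(vertices, edges, instances, sub_label="SUB"):
    """Replace each instance (a set of vertex ids) with one new vertex.

    vertices:  {vertex_id: label}
    edges:     [(source_id, target_id, label)], directed
    instances: list of non-overlapping sets of vertex ids
    """
    owner = {}                      # original vertex id -> replacing vertex id
    new_vertices = {}
    for k, instance in enumerate(instances):
        new_id = f"{sub_label}_{k}"
        new_vertices[new_id] = sub_label
        for v in instance:
            owner[v] = new_id
    for v, label in vertices.items():
        if v not in owner:
            new_vertices[v] = label  # vertices outside any instance survive
    new_edges = []
    for src, dst, label in edges:
        s, d = owner.get(src, src), owner.get(dst, dst)
        if s == d and src in owner:
            continue                 # edge internal to an instance disappears
        new_edges.append((s, d, label))
    return new_vertices, new_edges

# Two "triangle on square" instances, both resting on a circle.
V = {1: "triangle", 2: "square", 3: "triangle", 4: "square", 5: "circle"}
E = [(1, 2, "on"), (3, 4, "on"), (2, 5, "on"), (4, 5, "on")]
cv, ce = compress(V, E, [{1, 2}, {3, 4}])
```

Applying `compress` again to `cv` and `ce` with the instances found in the next iteration gives the repeated compression the slide describes.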
20
Implementation
Supervised GUI Implementation
• The supervised SUBDUE GUI requires three input files: two position files, which determine the positions of the positive graph and the negative graph, and the output file from SUBDUE, which the conversion program parses into the form the GUI requires.
• The driver class of the applet is the supervised class, which initializes the applet. When the user clicks the Next Iteration button, the canvas2 class is invoked and the best substructures found in that iteration are displayed. These substructures are arranged by their MDL value.
21
Implementation
Supervised GUI Implementation
• When the user clicks the Compress button, the canvas1 class is invoked. This class compresses the input graph by replacing the instances of the iteration's best substructure with single vertices. Each later click of the Compress button compresses the already compressed graph further, using the results of that iteration. The flow diagrams are shown below.
22
Implementation
Parser for the GUI input
• The conversion program parses the output from SUBDUE to make it compatible with the GUI program. It can handle the output of both the supervised and the unsupervised versions. The conversion program replaces longer phrases with shorter ones, eliminates blank lines, and arranges the output in the format required for the GUI input.
• Each line starts with a short tag, for example It (iteration number), C (compression), v (vertices), e (edges), Val (value), It (instances).
23
Implementation
Parser for the GUI input
• The phrase “Best substructures” marks the start of an iteration, and the phrase “Graph is compressed using best substructure.” marks the information for a compressed graph.
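The conversion step can be sketched in a few lines. This is a hedged Python sketch, not the project's conversion program: only the two marker phrases and the tags `It` and `C` come from the slides. The sample input lines, the running iteration counter, and the exact output layout are assumptions.

```python
def convert(lines):
    """Shorten marker phrases to tags and drop blank lines."""
    out = []
    iteration = 0
    for raw in lines:
        line = raw.strip()
        if not line:
            continue                              # eliminate blank lines
        if line.startswith("Best substructures"):
            iteration += 1                        # assumed: numbered in order
            out.append(f"It {iteration}")
        elif line.startswith("Graph is compressed using best substructure"):
            out.append("C")
        else:
            out.append(line)                      # e.g. v/e lines pass through
    return out

# Hypothetical excerpt of SUBDUE output, for illustration only.
sample = [
    "Best substructures:",
    "",
    "v 1 triangle",
    "v 2 square",
    "e 1 2 on",
    "",
    "Graph is compressed using best substructure.",
]
converted = convert(sample)
```

Keeping every record on one line with a short leading tag means the GUI can dispatch on the first token of each line without further parsing.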
24
Demo of the Project