1
Structural Knowledge Discovery Used to Analyze Earthquake Activity
Jesus A. Gonzalez, Lawrence B. Holder, Diane J. Cook
2
MOTIVATION AND GOAL
• Need to analyze large amounts of information in real-world databases.
• Information that standard tools cannot detect.
• Earthquake database.
• Previous knowledge: spatio-temporal relations.
3
SUBDUE KNOWLEDGE DISCOVERY SYSTEM
• SUBDUE discovers patterns (substructures) in structural data sets.
• SUBDUE represents data as a labeled graph.
• Inputs: vertices and edges.
• Outputs: discovered patterns and their instances.
4
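EXAMPLE
[Figure: example graph with vertex labels "object", "triangle", "object", "square" and edge labels "on", "shape"]
• Vertices: objects or attributes.
• Edges: relationships.
• 4 instances of [substructure shown in the figure]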
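The labeled-graph input described in the slides can be sketched in a few lines of Python. This is only an illustration: the `LabeledGraph` class and its method names are hypothetical and are not SUBDUE's own input format, and the graph assumes the usual reading of the example labels (a triangle-shaped object on a square-shaped object).

```python
from dataclasses import dataclass, field


@dataclass
class LabeledGraph:
    """Hypothetical container for a labeled directed graph (not SUBDUE's format)."""
    vertices: dict = field(default_factory=dict)  # vertex id -> label
    edges: list = field(default_factory=list)     # (source id, target id, label)

    def add_vertex(self, vid, label):
        self.vertices[vid] = label

    def add_edge(self, src, dst, label):
        # Both endpoints must already exist as vertices.
        if src not in self.vertices or dst not in self.vertices:
            raise KeyError("edge endpoint is not a known vertex")
        self.edges.append((src, dst, label))


# Assumed reading of the example: a triangle object sits on a square object.
g = LabeledGraph()
g.add_vertex(1, "object")
g.add_vertex(2, "triangle")
g.add_vertex(3, "object")
g.add_vertex(4, "square")
g.add_edge(1, 2, "shape")  # object 1 has shape triangle
g.add_edge(3, 4, "shape")  # object 3 has shape square
g.add_edge(1, 3, "on")     # object 1 is on object 3
```

Objects and attribute values are both vertices here, and every relationship, including "has shape", is an edge label, which matches the "Vertices: objects or attributes / Edges: relationships" description.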
5
EVALUATION CRITERION
• Minimum encoding.
• Graph compression.
• Substructure size (tried, but did not work).
6
EVALUATION CRITERION: MINIMUM DESCRIPTION LENGTH
• Minimum Description Length (MDL) principle: the best theory to describe a set of data is the one that minimizes the description length (DL) of the entire data set.
• DL of the graph: the number of bits necessary to completely describe the graph.
• Search for the substructure that results in the maximum compression.
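A rough sketch of how a compression score of this kind could be computed is given below. Everything in it is an assumption made for illustration: the bit count in `description_length` is a deliberately simplified encoding rather than SUBDUE's actual one, the function names are hypothetical, instances are assumed not to overlap, and the score is taken to be the ratio of the original description length to the description length of the substructure plus the compressed graph.

```python
from math import log2


def description_length(n_vertices, n_edges, n_labels):
    """Simplified bit count (an assumption, not SUBDUE's exact encoding)."""
    label_bits = log2(n_labels) if n_labels > 1 else 1.0
    vertex_bits = log2(n_vertices) if n_vertices > 1 else 1.0
    # Each vertex stores a label; each edge stores two endpoints and a label.
    return n_vertices * label_bits + n_edges * (2 * vertex_bits + label_bits)


def compression_value(graph, sub, n_instances, n_labels):
    """graph and sub are (n_vertices, n_edges) pairs; instances must not overlap."""
    gv, ge = graph
    sv, se = sub
    dl_graph = description_length(gv, ge, n_labels)
    dl_sub = description_length(sv, se, n_labels)
    # Each instance collapses into a single vertex carrying one new label.
    cv = gv - n_instances * (sv - 1)
    ce = ge - n_instances * se
    dl_compressed = description_length(cv, ce, n_labels + 1)
    return dl_graph / (dl_sub + dl_compressed)


# Illustrative numbers only: 4 instances of a 4-vertex, 3-edge substructure
# inside a graph with 16 vertices, 15 edges, and 5 distinct labels.
value = compression_value((16, 15), (4, 3), n_instances=4, n_labels=5)
no_gain = compression_value((16, 15), (4, 3), n_instances=0, n_labels=5)
```

Under this scoring, a value above 1 means that describing the substructure once and then the compressed graph takes fewer bits than describing the original graph; a substructure with no instances scores below 1 because its definition is pure overhead.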
7
THE EARTHQUAKE DATABASE
• Several catalogs.
• Sources such as the National Geophysical Data Center.
• Each record has 35 fields describing the earthquake's characteristics.
8
THE EARTHQUAKE DATABASE: KNOWLEDGE REPRESENTATION
9
THE EARTHQUAKE DATABASE: PRIOR KNOWLEDGE
• Connections between events whose epicenters were close to each other in distance (<= 75 kilometers).
• Connections between events that happened close to each other in time (<= 36 hours).
• Spatio-temporal relations represented with “near_in_distance” and “near_in_time” edges.
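The two thresholds can be turned into graph edges with a pairwise pass over the events, as in the sketch below. The function names and the sample events are made up for illustration, and the great-circle (haversine) distance between epicenters on a spherical Earth is an assumption; the slides do not say how distance was measured.

```python
from datetime import datetime
from math import asin, cos, radians, sin, sqrt

MAX_KM = 75.0     # distance threshold from the slide
MAX_HOURS = 36.0  # time threshold from the slide


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km, assuming a spherical Earth of radius 6371 km."""
    p1, p2 = radians(lat1), radians(lat2)
    dphi = p2 - p1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(p1) * cos(p2) * sin(dlmb / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))


def relation_edges(events):
    """events: list of (id, lat, lon, datetime). Returns (id_a, id_b, label) triples."""
    edges = []
    for i, (ida, lat_a, lon_a, t_a) in enumerate(events):
        for idb, lat_b, lon_b, t_b in events[i + 1:]:
            if haversine_km(lat_a, lon_a, lat_b, lon_b) <= MAX_KM:
                edges.append((ida, idb, "near_in_distance"))
            if abs((t_b - t_a).total_seconds()) / 3600.0 <= MAX_HOURS:
                edges.append((ida, idb, "near_in_time"))
    return edges


# Made-up events for illustration only (not records from the catalogs).
events = [
    ("e1", 17.5, -97.0, datetime(1990, 1, 1, 0, 0)),
    ("e2", 17.6, -97.1, datetime(1990, 1, 2, 6, 0)),  # near e1 in space and time
    ("e3", 18.9, -94.2, datetime(1990, 1, 2, 0, 0)),  # far away, within 36 hours
]
edges = relation_edges(events)
```

The pairwise loop is quadratic in the number of events, which is acceptable for a sketch; a spatial index or a time-sorted sweep would be the natural next step for a large catalog.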
10
DETERMINING EARTHQUAKE ACTIVITY
• Geologist: Dr. Burke Burkart.
• Study of the seismicity caused by the Orizaba Fault.
• Fault: a fracture in a surface along which rock has also been displaced.
• Selection of the area of study, two rectangles:
• First: longitude 94.0W through 101.0W and latitude 17.0N through 18.0N.
• Second: longitude 94.0W through 98.0W and latitude 18.0N through 19.0N.
11
[Figure: area of study]
12
DETERMINING EARTHQUAKE ACTIVITY
• Divide the area into 44 rectangles, each one half of a degree in both longitude and latitude.
• Sample the earthquake activity in each sub-area.
• Run Subdue in each sub-area.
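The count of 44 sub-areas follows from the two study rectangles: 14 x 2 half-degree cells in the first and 8 x 2 in the second. The sketch below builds such a grid and looks up the cell containing an epicenter. The function names are hypothetical, west longitudes are written as negative numbers, and the cell ordering is arbitrary, so an index here does not necessarily correspond to the sub-area numbering used in the presentation.

```python
CELL = 0.5  # cell size in degrees

# (west, east, south, north) in degrees; west longitudes written as negatives.
REGIONS = [
    (-101.0, -94.0, 17.0, 18.0),  # first rectangle
    (-98.0, -94.0, 18.0, 19.0),   # second rectangle
]


def make_cells(regions, cell=CELL):
    """Return the south-west corner (lon, lat) of every cell in every region."""
    cells = []
    for west, east, south, north in regions:
        n_lon = round((east - west) / cell)
        n_lat = round((north - south) / cell)
        for j in range(n_lat):
            for i in range(n_lon):
                cells.append((west + i * cell, south + j * cell))
    return cells


def cell_index(cells, lat, lon, cell=CELL):
    """Index of the cell containing (lat, lon), or None if outside the study area."""
    for k, (w, s) in enumerate(cells):
        if w <= lon < w + cell and s <= lat < s + cell:
            return k
    return None


cells = make_cells(REGIONS)
```

Grouping events by `cell_index` would give one event set per sub-area, each of which could then be converted to a graph and mined separately.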
13
DETERMINING EARTHQUAKE ACTIVITY
14
• Substructure 1 (with 19 instances) and substructure 2 (with 8 instances) found in sub-area 26.
15
DETERMINING EARTHQUAKE ACTIVITY
• This pattern might give us information about the cause of the earthquakes.
• Subduction also affects this area, but at a specific depth that depends on the distance from the Pacific Ocean.
16
SUBDUE’S POTENTIAL
• Subdue finds not only shared characteristics of events, but also spatial relations between them.
• Dr. Burke Burkart is studying the patterns to give direction to this research.
• Expect to find patterns representing parts of the paths of the fault involved.
• Time relations not considered by Subdue.
• Earthquake characteristics.
• Important for other areas.
17
CONCLUSION
• Subdue was successful on real-world databases.
• Subdue used prior knowledge, in the form of temporal and spatial relations, to guide the search.
• Subdue discovered interesting patterns using these temporal and spatial relations.
• Subdue is being used as the data mining tool to study the “Orizaba Fault” in Mexico.
18
FUTURE WORK
• Concept-learning Subdue.
• Theoretical analysis.
• Bounds on complexity (e.g. PAC learning).
• Graphical user interface to visualize substructures and their instances.