1 Data Mining, Decision Trees and Earthquake Prediction
CS157B Lecture 4, Professor Sin-Min Lee

2 What is Data Mining? The process of automatically finding relationships and patterns, and extracting meaning, from enormous amounts of data. Also called "knowledge discovery".

3 Objective Extracting hidden, or not easily recognizable, knowledge out of large data sets … know the past. Predicting what is likely to happen if a particular type of event occurs … predict the future.

4 Application Marketing example
A company sends direct mail to randomly chosen people. A database of the recipients' attribute data (e.g. gender, marital status, number of children, etc.) is available. How can this company increase the response rate of its direct mail?

5 Application (Cont'd) Figure out the patterns and relationships of attributes that those who responded have in common. This helps the company decide what kind of group of people to target.

6 Data mining helps analyze large amounts of data and make decisions…but how exactly does it work?
One commonly used method is the decision tree.

7 Decision Tree One of many methods used to perform data mining, particularly classification. It divides the dataset into multiple groups by evaluating attributes. A decision tree can be explained as a series of nested if-then-else statements. The decision tree is one of the most popular classification algorithms in current use in data mining.
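To make the nested if-then-else reading concrete, here is a minimal Python sketch; the attributes and split choices are invented purely for illustration and are not taken from these slides:

# A decision tree read as nested if-then-else statements.
# The attribute names and splits are hypothetical, for illustration only.
def classify_recipient(record):
    # Classify a direct-mail recipient as a likely responder or non-responder.
    if record["has_children"]:
        if record["marital_status"] == "married":
            return "responder"
        return "non-responder"
    if record["gender"] == "female":
        return "responder"
    return "non-responder"

print(classify_recipient({"has_children": True, "marital_status": "married", "gender": "female"}))  # responder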

8 Decision Tree (Cont’d)
Each non-leaf node has an associated predicate that tests an attribute of the data. A leaf node represents a class, or category. To classify a record, start from the root node and traverse down the tree by testing predicates and taking branches.

9 Example of Decision Tree

11 What is a Decision Tree? 20-Questions Example
Progressive yes-no decisions until an answer is obtained, as in the 20-Questions machine at Linens & Things. Key to the phylum as a classification tool: Carl Linnaeus, Swedish botanist, 1730s, classified known species into Kingdoms, Phyla, Classes, Orders, Families, Genera, and Species.

12 What is a Decision Tree?

13 What are Decision Trees Used For?
Predicting the credit risk of someone applying for a bank loan

14 How to Use a Decision Tree
Start from the root of the tree. [Diagram: deduction, applying the model to test data. The root node tests Refund (Yes / No); the No branch tests MarSt (Married / Single, Divorced); the Single, Divorced branch tests TaxInc (< 80K / >= 80K); leaf nodes are labeled YES / NO.]
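One way such a tree might be represented and traversed in code is sketched below; the node layout follows the slide's Refund / MarSt / TaxInc tree, but the Node class, record field names, and predicate functions are assumptions made for this sketch:

# Decision tree as linked nodes: an internal node holds a predicate that maps a
# record to a branch key; a leaf holds a class label. The layout mirrors the
# slide's Refund -> MarSt -> TaxInc tree; the field names are assumed.
class Node:
    def __init__(self, label=None, predicate=None, branches=None):
        self.label = label                 # class label, if this is a leaf
        self.predicate = predicate         # function: record -> branch key
        self.branches = branches or {}

    def classify(self, record):
        if self.label is not None:
            return self.label
        return self.branches[self.predicate(record)].classify(record)

tree = Node(
    predicate=lambda r: r["Refund"],
    branches={
        "Yes": Node(label="NO"),
        "No": Node(
            predicate=lambda r: "Married" if r["MarSt"] == "Married" else "Single, Divorced",
            branches={
                "Married": Node(label="NO"),
                "Single, Divorced": Node(
                    predicate=lambda r: ">= 80K" if r["TaxInc"] >= 80_000 else "< 80K",
                    branches={"< 80K": Node(label="NO"), ">= 80K": Node(label="YES")},
                ),
            },
        ),
    },
)

print(tree.classify({"Refund": "No", "MarSt": "Single", "TaxInc": 95_000}))  # -> YES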

15 How to Make a Decision Tree
[Diagram: induction of a model (decision tree) from training data containing categorical, continuous, and class attributes. Splitting attributes: Refund (Yes / No), MarSt (Single, Divorced / Married), TaxInc (< 80K / >= 80K); leaf labels NO / YES.]

24 Hunt's Algorithm Let Dt be the set of training records that reach a node t. General procedure: If Dt contains records that all belong to the same class yt, then t is a leaf node labeled as yt. If Dt is an empty set, then t is a leaf node labeled by the default class, yd. If Dt contains records that belong to more than one class, use an attribute test to split the data into smaller subsets, and recursively apply the procedure to each subset.
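A minimal Python sketch of this general procedure follows; the splitting rule used here (take the first attribute that still varies within Dt) is only a placeholder, since the slides introduce the real test-selection measures (Gini, information gain) later.

# A sketch of Hunt's general procedure. The attribute test used to split Dt
# (here: the first attribute that still varies within Dt) is only a placeholder.
def hunts(records, attributes, default_class):
    if not records:                                   # Dt empty -> default class yd
        return {"label": default_class}
    classes = {r["class"] for r in records}
    if len(classes) == 1:                             # all records share one class yt
        return {"label": classes.pop()}
    majority = max(classes, key=lambda c: sum(r["class"] == c for r in records))
    for attr in attributes:
        values = {r[attr] for r in records}
        if len(values) > 1:                           # attribute test that splits Dt
            branches = {}
            rest = [a for a in attributes if a != attr]
            for v in values:
                subset = [r for r in records if r[attr] == v]
                branches[v] = hunts(subset, rest, majority)   # recurse on each subset
            return {"attribute": attr, "branches": branches}
    return {"label": majority}                        # no attribute splits the data

records = [
    {"Refund": "Yes", "MarSt": "Single", "class": "No"},
    {"Refund": "No", "MarSt": "Married", "class": "No"},
    {"Refund": "No", "MarSt": "Single", "class": "Yes"},
]
print(hunts(records, ["Refund", "MarSt"], default_class="No"))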

25 Hunt's Algorithm
[Diagram: Hunt's algorithm growing the tree on the tax-cheat training data step by step. Start with a single leaf (Don't Cheat); split on Refund (Yes: Don't Cheat, No: subtree); split the No branch on Marital Status (Married: Don't Cheat; Single, Divorced: subtree); finally split on Taxable Income (< 80K: Don't Cheat, >= 80K: Cheat).]

26 Measure of Purity: Gini
Gini index for a given node t: GINI(t) = 1 − Σj [p(j|t)]², where p(j|t) is the relative frequency of class j at node t. Maximum (1 − 1/nc, where nc is the number of classes) when records are equally distributed among all classes, implying the least interesting information. Minimum (0.0) when all records belong to one class, implying the most interesting information.
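As a small illustrative sketch, the Gini index of a node can be computed from the class labels of the records that reach it; the helper name and the example labels below are assumptions for illustration:

from collections import Counter

# Gini index for a node, following the slide's definition:
# GINI(t) = 1 - sum_j p(j|t)^2, where p(j|t) is the relative class frequency.
def gini(class_labels):
    counts = Counter(class_labels)
    total = len(class_labels)
    return 1.0 - sum((n / total) ** 2 for n in counts.values())

print(gini(["C0"] * 6))               # 0.0  -> pure node (most interesting)
print(gini(["C0"] * 3 + ["C1"] * 3))  # 0.5  -> equals 1 - 1/nc with nc = 2 classes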

27 Advantages of Decision Trees
Simple to understand and interpret. Require little data preparation. Able to handle both numerical and categorical data. Perform well with large data sets in a short time. The explanation for a classification is easily expressed in Boolean logic.

28 Advantages of Decision Tree
Easy to visualize the process of classification. It is easy to tell why a record is classified into a particular category: just trace the path to the leaf, and it explains the reason. Simple, fast processing: once the tree is built, just traverse down the tree to classify the data.

29 Decision Tree is for… Classifying datasets in which
the predicates return discrete values, and no attribute has the same value for all records.

30 CMT catalog: Shallow earthquakes, 1976-2005

31 INDIAN PLATE MOVES NORTH COLLIDING WITH EURASIA
Gordon & Stein, 1992

32 COMPLEX PLATE BOUNDARY ZONE IN SOUTHEAST ASIA
Northward motion of India deforms the entire region. Many small plates (microplates) and blocks. Molnar & Tapponnier, 1977

33 India subducts beneath the Burma microplate at about 50 mm/yr. Earthquakes occur at the plate interface along the Sumatra arc (Sunda trench). These are spectacular and destructive results of many years of accumulated motion.

34 NOAA

35 IN DEEP OCEAN a tsunami has a long wavelength, travels fast, and has a small amplitude, so it doesn't affect ships.
AS IT APPROACHES SHORE, it slows. Since energy is conserved, the amplitude builds up, which is very damaging.

36 TSUNAMI WARNING Because seismic waves travel much faster (km/s) than tsunamis, rapid analysis of seismograms can identify earthquakes likely to cause major tsunamis and predict when waves will arrive Deep ocean buoys can measure wave heights, verify tsunami and reduce false alarms

37 HOWEVER, IT IS HARD TO PREDICT EARTHQUAKES: recurrence is highly variable
Extend the earthquake history with geologic records (paleoseismology). For M>7 events: mean recurrence 132 yr, standard deviation 105 yr; estimated probability in the next 30 yrs 7-51%. Sieh et al., 1989

38 EARTHQUAKE RECURRENCE AT SUBDUCTION ZONES IS COMPLICATED
In many subduction zones, thrust earthquakes have patterns in space and time. Large earthquakes have occurred in the Nankai trough area of Japan approximately every 125 years since 1498, with similar fault areas. In some cases the entire region seems to have slipped at once; in others the slip was divided into several events over a few years. Repeatability suggests that a segment that has not slipped for some time is a gap due for an earthquake, but it's hard to use this concept well because of the variability. Ando, 1975

40 1985 MEXICO EARTHQUAKE SEPTEMBER 19, 1985 M8.1 A SUBDUCTION ZONE QUAKE
ALTHOUGH LARGER THAN USUAL, THE EARTHQUAKE WAS NOT A “SURPRISE” A GOOD, MODERN BUILDING CODE HAD BEEN ADOPTED AND IMPLEMENTED

41 1985 MEXICO EARTHQUAKE EPICENTER LOCATED 240 KM FROM MEXICO CITY
400 BUILDINGS COLLAPSED IN THE OLD LAKE BED ZONE OF MEXICO CITY. SOIL-STRUCTURE RESONANCE IN THE OLD LAKE BED ZONE WAS A MAJOR FACTOR.

42 1985 MEXICO EARTHQUAKE: ESSENTIAL STRUCTURES--SCHOOLS

43 1985 MEXICO EARTHQUAKE: STEEL FRAME BUILDING

44 1985 MEXICO EARTHQUAKE: POUNDING

45 1985 MEXICO EARTHQUAKE: NUEVA LEON APARTMENT BUILDINGS

46 1985 MEXICO EARTHQUAKE: SEARCH AND RESCUE

47 Definition and Characteristics (Project: California Earthquake Prediction)

48 Characteristics (cont.)

49 Characteristics (cont.)
2. Locality: the information transferred by a neuron is limited to its nearby neurons. CAEP: short-term earthquake prediction is highly influenced by the local geologic features.

50 Characteristics (cont.)
3. Weighted sum and activation function with nonlinearity: the input signal is weighted at the synaptic connection by a connection weight. CAEP: nearby locations will be weighted within each activation function.
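A minimal sketch of this characteristic in code is shown below; the sigmoid activation, the weights, and the input values are illustrative assumptions, not the actual CAEP model:

import math

# Characteristic 3 as code: inputs arriving at a neuron are multiplied by
# connection weights, summed, and passed through a nonlinear activation.
# The sigmoid choice and the example numbers are assumptions for illustration.
def neuron_output(inputs, weights, bias=0.0):
    weighted_sum = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-weighted_sum))   # sigmoid nonlinearity

# e.g. signals from three nearby locations, each with its own connection weight
print(neuron_output([0.9, 0.2, 0.5], [0.7, -0.3, 0.4]))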

51 Characteristics (cont.)
4. Plasticity: connection weights change according to the information fed to the neuron and the internal state. This plasticity of the connection weights leads to learning and self-organization, and realizes adaptability to a continuously varying environment. CAEP: calculate the stress at the focus point according to the seismic wave history in the surrounding area.

52 Characteristics (cont.)
5. Generalization: a neural network constructs its own view of the world by inferring an optimal action on the basis of previously learned events, by interpolation and extrapolation. CAEP: obtain a view of an area from past experience by pattern representation and prediction.

53 Basic Function of CSEP Neuron: a list of locations along the San Andreas Fault and two of the associated faults, the Hayward and the Calaveras.

54 Basic Function of CSEP (cont.)
Neuron’s parameters: magnitude, date, latitude, longitude, depth, location, ground water, observation, etc.

57 Learning Learning is essential for unknown environments,
i.e., when the designer lacks omniscience. Learning is useful as a system construction method, i.e., expose the agent to reality rather than trying to write it down. Learning modifies the agent's decision mechanisms to improve performance.

58 Learning agents

59 Learning element Design of a learning element is affected by
which components of the performance element are to be learned, what feedback is available to learn these components, and what representation is used for the components. Type of feedback: supervised learning (correct answers given for each example), unsupervised learning (correct answers not given), reinforcement learning (occasional rewards).

60 Inductive learning Simplest form: learn a function from examples
f is the target function. An example is a pair (x, f(x)). Problem: find a hypothesis h such that h ≈ f, given a training set of examples. (This is a highly simplified model of real learning: it ignores prior knowledge and assumes the examples are given.)

61 Learning decision trees
Problem: decide whether to wait for a table at a restaurant, based on the following attributes: Alternate: is there an alternative restaurant nearby? Bar: is there a comfortable bar area to wait in? Fri/Sat: is today Friday or Saturday? Hungry: are we hungry? Patrons: number of people in the restaurant (None, Some, Full) Price: price range ($, $$, $$$) Raining: is it raining outside? Reservation: have we made a reservation? Type: kind of restaurant (French, Italian, Thai, Burger) WaitEstimate: estimated waiting time (0-10, 10-30, 30-60, >60)

62 Attribute-based representations
Examples described by attribute values (Boolean, discrete, continuous) E.g., situations where I will/won't wait for a table: Classification of examples is positive (T) or negative (F)

63 Decision trees One possible representation for hypotheses
E.g., here is the “true” tree for deciding whether to wait:

64 Expressiveness Decision trees can express any function of the input attributes. E.g., for Boolean functions, each truth table row maps to a path to a leaf. Trivially, there is a consistent decision tree for any training set, with one path to a leaf for each example (unless f is nondeterministic in x), but it probably won't generalize to new examples. Prefer to find more compact decision trees.

65 Hypothesis spaces How many distinct decision trees are there with n Boolean attributes? = number of Boolean functions = number of distinct truth tables with 2^n rows = 2^(2^n). E.g., with 6 Boolean attributes, there are 2^64 = 18,446,744,073,709,551,616 trees.

66 Hypothesis spaces How many distinct decision trees are there with n Boolean attributes? = number of Boolean functions = number of distinct truth tables with 2^n rows = 2^(2^n). E.g., with 6 Boolean attributes, there are 18,446,744,073,709,551,616 trees. How many purely conjunctive hypotheses are there (e.g., Hungry ∧ ¬Rain)? Each attribute can be in (positive), in (negative), or out, so there are 3^n distinct conjunctive hypotheses. A more expressive hypothesis space increases the chance that the target function can be expressed, but also increases the number of hypotheses consistent with the training set, so we may get worse predictions.

67 Decision tree learning
Aim: find a small tree consistent with the training examples. Idea: (recursively) choose the "most significant" attribute as the root of the (sub)tree.
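A compact sketch of this recursive idea, assuming a simple example format and leaving the "most significant attribute" chooser pluggable (the information-gain code after slide 70 could fill that role), might look like this:

# A sketch of recursive decision-tree learning: pick the "most significant"
# attribute, make it the root, and recurse on each subset of examples.
# The example format (dicts with a "class" key) and choose_first are assumptions.
def plurality_class(examples):
    labels = [e["class"] for e in examples]
    return max(set(labels), key=labels.count)

def choose_first(attributes, examples):
    return attributes[0]   # placeholder; replace with an information-gain chooser

def dtl(examples, attributes, parent_examples, choose_attribute=choose_first):
    if not examples:
        return plurality_class(parent_examples)
    if len({e["class"] for e in examples}) == 1:
        return examples[0]["class"]
    if not attributes:
        return plurality_class(examples)
    best = choose_attribute(attributes, examples)     # most significant attribute
    tree = {best: {}}
    for value in {e[best] for e in examples}:
        subset = [e for e in examples if e[best] == value]
        rest = [a for a in attributes if a != best]
        tree[best][value] = dtl(subset, rest, examples, choose_attribute)
    return tree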

68 Choosing an attribute Idea: a good attribute splits the examples into subsets that are (ideally) "all positive" or "all negative". Here, Patrons? is a better choice than Type?

69 Using information theory
To implement Choose-Attribute in the DTL algorithm. Information content (entropy): I(P(v1), …, P(vn)) = Σ_{i=1..n} −P(vi) log2 P(vi). For a training set containing p positive examples and n negative examples: I(p/(p+n), n/(p+n)) = −(p/(p+n)) log2(p/(p+n)) − (n/(p+n)) log2(n/(p+n)).
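A small sketch of these two formulas in Python (the helper names are mine, not from the slides):

import math

# Information content (entropy) of a distribution, as on the slide:
# I(P(v1), ..., P(vn)) = -sum_i P(vi) * log2 P(vi)
def information_content(probabilities):
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

# Binary case for a training set with p positive and n negative examples.
def entropy_pn(p, n):
    total = p + n
    return information_content([p / total, n / total])

print(entropy_pn(6, 6))   # 1.0 bit for an evenly split training set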

70 Information gain A chosen attribute A divides the training set E into subsets E1, …, Ev according to their values for A, where A has v distinct values. The remainder after the attribute test is remainder(A) = Σ_{i=1..v} (pi + ni)/(p + n) · I(pi/(pi+ni), ni/(pi+ni)), and the information gain (IG), or reduction in entropy, is IG(A) = I(p/(p+n), n/(p+n)) − remainder(A). Choose the attribute with the largest IG.
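A sketch of the gain computation, applied to the restaurant training set from the next slide; the per-value subset counts for Patrons and Type follow the standard 12-example restaurant data set these slides use, so treat them as an assumption if your copy of the data differs:

import math

# Information gain for an attribute A that splits (p, n) into subsets (p_i, n_i):
#   remainder(A) = sum_i (p_i + n_i)/(p + n) * I(p_i/(p_i+n_i), n_i/(p_i+n_i))
#   gain(A)      = I(p/(p+n), n/(p+n)) - remainder(A)
def entropy(q):
    return 0.0 if q in (0.0, 1.0) else -(q * math.log2(q) + (1 - q) * math.log2(1 - q))

def information_gain(p, n, subsets):
    # subsets is a list of (p_i, n_i) pairs, one per value of attribute A
    remainder = sum((pi + ni) / (p + n) * entropy(pi / (pi + ni)) for pi, ni in subsets)
    return entropy(p / (p + n)) - remainder

# Restaurant example: 12 examples, p = n = 6.
# Patrons splits them into None (0,2), Some (4,0), Full (2,4);
# Type splits them into French (1,1), Italian (1,1), Thai (2,2), Burger (2,2).
print(information_gain(6, 6, [(0, 2), (4, 0), (2, 4)]))          # ~0.541 bits for Patrons
print(information_gain(6, 6, [(1, 1), (1, 1), (2, 2), (2, 2)]))  # 0.0 bits for Type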

71 Information gain For the training set, p = n = 6, so I(6/12, 6/12) = 1 bit. Consider the attributes Patrons and Type (and the others too): Patrons has the highest IG of all the attributes and so is chosen by the DTL algorithm as the root.

72 Example contd. Decision tree learned from the 12 examples:
Substantially simpler than the "true" tree: a more complex hypothesis isn't justified by a small amount of data.

