Data Mining with Artificial Evolution. Helen Johnson Anna Kwiatkowska David Sweeney Panagiotis Tzionas Problem leader: Michele Sebag Team leader: Michael.

1 Data Mining with Artificial Evolution
Helen Johnson, Anna Kwiatkowska, David Sweeney, Panagiotis Tzionas
Problem leader: Michele Sebag
Team leader: Michael Herdy

2 Data mining
A multi-objective optimisation problem
Aims to extract valid, novel and interesting rules (laws) from data.
Validity
Support
Confidence
Law generality
Law accuracy

3 The 'real' data problem
Data provided by V. Athias and C. Jeandel: "The flows of particles of various sizes in the austral seas"
Details of the data set:
Particles in four size groups, measured at two depths: 2000 and 3000 m
A total of 51 measurements over a period of a few hundred days
[Plot: Concentration against Example]

4 OBSERVATIONS: Dissolved phase, Suspended particles, Sinking particles
TRANSFORMATIONS: Adsorption, Agglomeration, Sinking
AIM: Model interactions
Interactions between particles
Parameters

5 Methodology
Target = LAW
Phenotype: a linear combination of terms: 1.2 x_2 + x_3 sin(x_1) + 3.6 x_1 x_2
Genotype: coding of the phenotype: (1.2, 0, 2, 3), (1, 2, 3, 1), (3.6, 1, 1, 2), where 0 = x_i; 1 = x_i * x_j; 2 = x_i sin(x_j)
Mixed integer–real valued representation: hybrid ES/GA
Selection: the problem is to find a set of laws (Michigan, Pittsburgh, Universal Suffrage)
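The genotype-to-phenotype mapping on this slide can be sketched in a few lines of Python. This is an illustrative sketch only, not the team's implementation: the gene layout `(coefficient, term_type, i, j)` is inferred from the slide's example (where the last index is ignored for term type 0), and the names `eval_term` and `eval_law` and the sample values of x are hypothetical.

```python
import math

# Term types as listed on the slide:
#   0 = x_i,  1 = x_i * x_j,  2 = x_i * sin(x_j)
def eval_term(gene, x):
    """Evaluate one gene (coef, term_type, i, j) on x, a dict index -> value.

    The (coef, term_type, i, j) layout is an assumption inferred from the
    slide's example; j is ignored for term type 0.
    """
    coef, term_type, i, j = gene
    if term_type == 0:
        return coef * x[i]
    if term_type == 1:
        return coef * x[i] * x[j]
    if term_type == 2:
        return coef * x[i] * math.sin(x[j])
    raise ValueError(f"unknown term type: {term_type}")

def eval_law(genotype, x):
    """A law (phenotype) is the sum of its decoded terms."""
    return sum(eval_term(gene, x) for gene in genotype)

# Genotype from the slide: 1.2 x_2 + x_3 sin(x_1) + 3.6 x_1 x_2
genotype = [(1.2, 0, 2, 3), (1, 2, 3, 1), (3.6, 1, 1, 2)]

# Hypothetical values for x_1, x_2, x_3
x = {1: 0.5, 2: 2.0, 3: 1.0}
value = eval_law(genotype, x)
```

Keeping the real-valued coefficient and the integer term type and indices in one tuple mirrors the mixed integer–real representation named on the slide.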

6 Assessing the fitness of one law
The law is calculated for each example
The results are sorted
Plateaux are identified
Fitness function is calculated
[Plot: Result against Example, with plateaux P1 and P2]
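The four steps above can be sketched as follows. The slides do not define a plateau precisely, so this sketch assumes a plateau is a run of sorted results in which neighbouring values differ by at most `threshold`, with at least `min_length` members; both parameters, the function names and the sample results are hypothetical. The score used is the sum of plateau lengths, which is the simple fitness function the presentation goes on to test.

```python
def find_plateaux(results, threshold=0.01, min_length=2):
    """Sort the law's results and group them into plateaux.

    Assumption: a plateau is a maximal run of sorted values whose
    neighbours differ by at most `threshold`, with >= `min_length` members.
    """
    plateaux, current = [], []
    for v in sorted(results):
        if current and v - current[-1] > threshold:
            # Gap too large: close the current run and start a new one.
            if len(current) >= min_length:
                plateaux.append(current)
            current = []
        current.append(v)
    if len(current) >= min_length:
        plateaux.append(current)
    return plateaux

def simple_fitness(results, threshold=0.01, min_length=2):
    """Sum of plateau lengths, i.e. sum over i of length(P_i)."""
    return sum(len(p) for p in find_plateaux(results, threshold, min_length))

# Hypothetical results of one law evaluated on eight examples
results = [0.156, 0.157, 0.155, 0.156, 0.90, 0.40, 0.401, 0.156]
plateaux = find_plateaux(results)
fitness = simple_fitness(results)
```

With these sample values, five results fall on one plateau and two on another; the isolated value 0.90 belongs to no plateau and contributes nothing.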

7 Testing a simple fitness function
Fitness function = Σ length(P_i)
The known law: A_0 * A_1 = cst
Found laws:
1) -0.37 A_0*A_1 - 0.36 A_2/A_2 + 0.07 A_0/A_0
2) -0.04 A_0*A_0 - 0.008 A_1*A_2 - 0.77 A_1*A_0
[Two tables of Example, Result and v_j; Fitness = 8]
Problem with the fitness function:

8 The new fitness function
Identifying the maximum length plateau for each example.
Correct law: A_0*A_1 = 0.156
One of our best results: A_0*A_1 = 0.12138
[Two tables of Example, Result and v_j; Fitness = 8 and Fitness = 64]
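The slide does not give a formula for the new fitness function. One possible reading, offered here only as an assumption, is that every example is credited with the length of the plateau it sits on, and these lengths are summed. Under that reading, eight examples lying on a single plateau score 8 × 8 = 64, which matches the Fitness = 64 figure, whereas the sum of plateau lengths would give 8. The names `plateau_lengths` and `per_example_fitness` and the `threshold` parameter are hypothetical.

```python
def plateau_lengths(results, threshold=0.01):
    """For each example, the size of the plateau containing its result.

    Assumption: a plateau is a run of sorted results whose neighbours
    differ by at most `threshold`; an isolated result has length 1.
    """
    order = sorted(range(len(results)), key=lambda k: results[k])
    lengths = [0] * len(results)
    run = []  # indices of the examples in the current plateau
    for k in order:
        if run and results[k] - results[run[-1]] > threshold:
            for m in run:
                lengths[m] = len(run)
            run = []
        run.append(k)
    for m in run:
        lengths[m] = len(run)
    return lengths

def per_example_fitness(results, threshold=0.01):
    """Sum, over examples, of the length of each example's plateau."""
    return sum(plateau_lengths(results, threshold))

one_plateau = [0.156] * 8                # all eight examples agree
two_plateaux = [0.156] * 4 + [0.30] * 4  # two groups of four

score_one = per_example_fitness(one_plateau)
score_two = per_example_fitness(two_plateaux)
```

In this sketch a single plateau of eight scores 64 while two plateaux of four score 32, so long plateaux are rewarded over several short ones; a plain sum of plateau lengths would score both cases as 8.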

9 The tautology problem
A tautology: A_0 - A_0 = 0, A_1/A_1 = 1
A tautology provides no knowledge.
The derived laws must be checked for tautologies.
Apply laws to a random data set.
If the law fits all the data then it is a tautology.
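The random-data check described above might look like the following sketch. It assumes a law "fits" the random data when its value is the same, within a tolerance `tol`, on every random example; the function name `is_tautology`, the sampling range, the sample count and the tolerance are all hypothetical choices rather than values from the presentation.

```python
import random

def is_tautology(law, n_vars, n_samples=200, tol=1e-9, seed=0):
    """Return True if `law` gives a constant value on random data.

    `law` takes a list of variable values [A_0, A_1, ...]. Values are
    drawn from [0.1, 10] (an assumed range) to avoid division by zero.
    """
    rng = random.Random(seed)
    values = []
    for _ in range(n_samples):
        a = [rng.uniform(0.1, 10.0) for _ in range(n_vars)]
        values.append(law(a))
    # A constant result on unrelated random data carries no knowledge.
    return max(values) - min(values) <= tol

# The two tautologies from the slide, and the known law's left-hand side
taut_difference = is_tautology(lambda a: a[0] - a[0], n_vars=2)
taut_ratio = is_tautology(lambda a: a[1] / a[1], n_vars=2)
real_law = is_tautology(lambda a: a[0] * a[1], n_vars=2)
```

Note that this checks a law as a whole: a law that mixes a genuine term with tautological terms such as A_2/A_2 is not constant on random data, so it would not be flagged by this sketch.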

10 Lessons from preliminary experiments
1. Population size: no influence on the laws
2. Probability of crossover: decrease from 0.6 to 0.4: many tautologies. So decrease "tautology threshold": elimination of some tautologies.
3. Probability of mutation: decrease from 0.1 to 0.05: improvement in laws
4. Plateau threshold: decreasing the threshold in steps: improved laws

11 Plot generated after optimisation
[Plot: Result against Example]

12 Conclusions
Powerful technique for finding knowledge in data
The fitness function is crucial
Tuning of the algorithm is data dependent
No single optimum algorithm for a specific dataset

13 Questions arising
Pre-processing of data?
Criteria for defining a plateau?
Number of constructs and type of constructs?
How important is law interpretation?
