Interactive Evolution in Automated Knowledge Discovery Tomáš Řehořek March 2011.

1 Interactive Evolution in Automated Knowledge Discovery Tomáš Řehořek March 2011

2 Knowledge Discovery Automation
Our goal:
–Given an input dataset, automatically construct a KF (Knowledge Flow) and offer output knowledge the user is satisfied with
–Creating such a system is a big deal!
[Figure: Automated Knowledge Discovery]

3 Knowledge Discovery Automation
What is Knowledge Discovery?
–Transformation of input data into human-interpretable knowledge
–An oriented graph of actions (Knowledge Flow, KF) is a suitable way to represent it
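
Because a Knowledge Flow is an oriented graph of actions, running one amounts to executing the actions in a topological order, each action consuming the outputs of its predecessors. The Python sketch below illustrates this with the standard-library `graphlib`; the `run_knowledge_flow` helper and the action names are hypothetical and not part of the described system.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List


def run_knowledge_flow(
    actions: Dict[str, Callable[..., object]],
    depends_on: Dict[str, List[str]],
) -> Dict[str, object]:
    """Run every action after the actions it depends on.

    Each action receives the outputs of its predecessors, in the order
    they are listed in `depends_on`.
    """
    results: Dict[str, object] = {}
    # static_order() yields predecessors before successors and raises
    # graphlib.CycleError if the flow is not acyclic.
    for name in TopologicalSorter(depends_on).static_order():
        inputs = [results[p] for p in depends_on.get(name, [])]
        results[name] = actions[name](*inputs)
    return results


# Hypothetical three-step flow: load data -> compute column means -> report.
flow = {
    "load": lambda: [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
    "mean": lambda rows: [sum(col) / len(rows) for col in zip(*rows)],
    "report": lambda means: f"column means: {means}",
}
deps = {"load": [], "mean": ["load"], "report": ["mean"]}
results = run_knowledge_flow(flow, deps)
print(results["report"])  # column means: [3.0, 4.0]
```

A real KF would branch and merge (e.g. one dataset feeding both a model and a scatter plot); the same topological execution handles that as long as the graph stays acyclic.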

4 Knowledge Discovery Ontology
Ontology (definition):
–Formal representation of a domain
–Specification of entities, their properties and relations
–Provides a vocabulary that can be used to model a domain
E.g.: dataset, model, testing sample, scatter plot, confusion matrix, association rule…

5 Knowledge Discovery Ontology
Ontology design problems in KD:
–Which KFs are reasonable?
–What should the output report look like?
–Can the metadata help?
–Are there categories of users with similar interests?
Two ideas concerning the Ontology:
–Deductive approach
–Inductive approach

6 Knowledge Discovery Ontology
Deductive approach:
–The Ontology is given
–Based on the Ontology and the given dataset, try to construct an appropriate KF

7 Knowledge Discovery Ontology
Deductive approach:
[Figure taken from: M. Žáková, P. Křemen, F. Železný, N. Lavrač: Automating Knowledge Discovery Workflow Composition Through Ontology-Based Planning (2010)]
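
The deductive idea can be illustrated with a toy search. The cited work uses ontology-based planning; the sketch below is a much simpler stand-in, assuming a hypothetical ontology in which every action consumes one concept and produces another, and using breadth-first search to find the shortest chain of actions from a dataset to the requested kind of output.

```python
from collections import deque
from typing import Deque, Dict, List, Optional, Tuple

# Hypothetical toy ontology: action -> (concept it consumes, concept it produces).
ACTIONS: Dict[str, Tuple[str, str]] = {
    "split": ("dataset", "training_sample"),
    "train_tree": ("training_sample", "model"),
    "evaluate": ("model", "confusion_matrix"),
    "mine_rules": ("dataset", "association_rules"),
}


def plan(start: str, goal: str) -> Optional[List[str]]:
    """Shortest chain of actions turning `start` into `goal`, or None if there is none."""
    queue: Deque[Tuple[str, List[str]]] = deque([(start, [])])
    seen = {start}
    while queue:
        concept, path = queue.popleft()
        if concept == goal:
            return path
        for action, (needs, produces) in ACTIONS.items():
            if needs == concept and produces not in seen:
                seen.add(produces)
                queue.append((produces, path + [action]))
    return None


print(plan("dataset", "confusion_matrix"))  # ['split', 'train_tree', 'evaluate']
```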

8 Knowledge Discovery Ontology
Inductive approach:
–No prior assumptions about the Ontology
–Learn the Ontology from a database of KFs designed by experts
[Figure: Meta-Knowledge Discovery, Discovered KD Ontology]
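
One crude way to picture the inductive approach is to mine, from expert-designed KFs, which actions tend to follow which. The sketch below only counts direct successions in hypothetical flows simplified to linear chains; a learned ontology would be far richer (entities, properties, relations), so treat it as an illustration of learning from a KF database, not as the method proposed here.

```python
from collections import Counter, defaultdict
from typing import Dict, List


def learn_successors(flows: List[List[str]]) -> Dict[str, Counter]:
    """Count how often each action is directly followed by another in the flows."""
    successors: Dict[str, Counter] = defaultdict(Counter)
    for flow in flows:
        for current, following in zip(flow, flow[1:]):
            successors[current][following] += 1
    return dict(successors)


def suggest_next(model: Dict[str, Counter], action: str, n: int = 2) -> List[str]:
    """Most frequent successors of `action` seen in the expert flows."""
    return [name for name, _ in model.get(action, Counter()).most_common(n)]


# Hypothetical expert-designed KFs, each simplified to a linear chain of actions.
expert_flows = [
    ["load", "normalize", "knn", "evaluate"],
    ["load", "normalize", "pca", "scatter_plot"],
    ["load", "discretize", "apriori", "rules_report"],
]
model = learn_successors(expert_flows)
print(suggest_next(model, "load"))  # ['normalize', 'discretize']
```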

9 Our Approach: Revolutionary Reporting
There may be thousands of useful KFs:
–Different datasets may require different actions
–Different users may require different knowledge
Maybe users form clusters:
–"DM Scientist": may experiment with different algorithms on a given dataset
–"Business Manager": may appreciate the beer-and-diapers rule

10 Our Approach: Revolutionary Reporting
Let's design a system capable of learning what users like!
–Adopt Interactive Evolutionary Computation
–Collect feedback to evaluate the fitness of a given KF, for a given user, on a given dataset
–Store the feedback, along with the metadata, in a database
–As the DB grows, offer intelligent KF mutation based on the experience

11 Our Approach: Revolutionary Reporting
Interactive Evolutionary Computation (IEC)
–Also known as "Aesthetic Selection"
–Evolutionary Computation using human evaluation as the fitness function
Inspiration: http://picbreeder.org
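
The IEC loop differs from an ordinary genetic algorithm only in where fitness comes from. In the sketch below the human is a callback, `choose`, that returns the indices of the individuals the user liked; for testing it is replaced by a simulated user with a hidden target. All names, the Gaussian mutation and the parameter values are assumptions.

```python
import random
from typing import Callable, List, Sequence

Genome = List[float]


def mutate(genome: Genome, rng: random.Random, sigma: float = 0.1) -> Genome:
    """Gaussian mutation of every gene."""
    return [g + rng.gauss(0.0, sigma) for g in genome]


def iec_generation(
    population: List[Genome],
    choose: Callable[[Sequence[Genome]], List[int]],
    rng: random.Random,
) -> List[Genome]:
    """One generation of interactive evolution.

    `choose` stands in for the human: in a real IEC system it would show the
    population in a GUI and return the indices the user clicked. The chosen
    individuals survive unchanged; the rest of the population is refilled with
    mutated copies of them.
    """
    picked = choose(population)
    if not picked:  # user selected nothing: keep the population as it is
        return list(population)
    parents = [population[i] for i in picked]
    children = [mutate(rng.choice(parents), rng)
                for _ in range(len(population) - len(parents))]
    return parents + children


# Simulated "user" who prefers genomes close to a hidden target.
TARGET = [0.5, -0.2, 0.8]


def distance(g: Genome) -> float:
    return sum((a - b) ** 2 for a, b in zip(g, TARGET))


def simulated_user(population: Sequence[Genome]) -> List[int]:
    ranked = sorted(range(len(population)), key=lambda i: distance(population[i]))
    return ranked[:2]  # "clicks" the two most appealing individuals


rng = random.Random(0)
population = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(8)]
start_best = min(distance(g) for g in population)
for _ in range(30):
    population = iec_generation(population, simulated_user, rng)
end_best = min(distance(g) for g in population)
print(f"{start_best:.3f} -> {end_best:.3f}")
```

Keeping the chosen individuals unchanged guarantees the best one found so far is never lost, which matters when every evaluation costs the user's attention.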

12 PicBreeder: Interactive Evolution by Jimmy Secretan and Kenneth Stanley

13 Next generation … and so on …

14 And after 75 generations…
…you eventually get something interesting

15 The technology hidden behind it: a neural net draws the image
[Figure: neural net mapping coordinates x, z to a grayscale value]

16 Neuroevolution
–By clicking, you increase the fitness of nets
–Next generations inherit fit building patterns
[Figure: nets mapping x, z to grayscale]
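
To make "a neural net draws the image" concrete: the net is queried once per pixel with that pixel's coordinates and returns a gray level. Picbreeder evolves Compositional Pattern Producing Networks (CPPNs) with NEAT, which also grows the network topology; the sketch below instead assumes a fixed, hypothetical one-hidden-layer net with sine activations, so only its weights would be subject to evolution.

```python
import math
import random
from typing import List

Net = List[List[float]]  # one row per hidden unit: [w_x, w_z, bias, w_out]


def random_net(rng: random.Random, hidden: int = 4) -> Net:
    """Random weights for a fixed, hypothetical one-hidden-layer net."""
    return [[rng.gauss(0.0, 1.5) for _ in range(4)] for _ in range(hidden)]


def gray(net: Net, x: float, z: float) -> float:
    """Map pixel coordinates (x, z) in [-1, 1] to a grayscale value in [0, 1]."""
    s = sum(w_out * math.sin(w_x * x + w_z * z + b) for w_x, w_z, b, w_out in net)
    return 0.5 * (1.0 + math.tanh(0.5 * s))  # logistic sigmoid via tanh (no overflow)


def render(net: Net, size: int = 16) -> List[List[float]]:
    """Query the net once per pixel of a size x size image."""
    coords = [-1.0 + 2.0 * i / (size - 1) for i in range(size)]
    return [[gray(net, x, z) for x in coords] for z in coords]


image = render(random_net(random.Random(1)))
shades = " .:-=+*#%@"
for row in image[::2]:  # print every other row so the ASCII art is not too tall
    print("".join(shades[min(9, int(v * 10))] for v in row))
```

Clicking an image would raise the fitness of the net behind it; mutating that net's weights then yields related images in the next generation.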

17 Gallery of discovered images

18 Collaborative evolution
You start your evolution where others finished…
…and when you discover something interesting…
…you store it in the database.

19 Our Approach: Revolutionary Reporting
System core
[Figure: User, Feedback, Experience Database]

20 First Experiments: Data Projection
–Transform the input dataset to 2D
–Similar to PCA, Sammon projection, etc.
[Figure: examples in n-dimensional space projected to 2D]

21 Experiment Setup
[Figure: architecture diagram with User; Web Client (AJAX, Google API); Tomcat Server; Feedback Collection GUI; RapidMiner 5; jabsorb JSON-RPC (via HTTP); MySQL; Genetic Algorithm; Current Population; Feedback]

22 Data Projection Experiments
Linear transformation:
–Evolve the coefficient matrix
–Do the transformation using the formula: … resulting in a point in 2D space
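
The slide's formula is not preserved, but a linear projection from n dimensions to 2D can be written as y = A·x with a 2 × n coefficient matrix A; assuming that form, the GA evolves the entries of A. The sketch below shows the projection and a Gaussian mutation of the matrix (both helper names are hypothetical).

```python
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]  # 2 rows, n columns


def project(A: Matrix, x: Sequence[float]) -> Tuple[float, ...]:
    """Linear projection of an n-dimensional example x to a 2D point (A times x)."""
    return tuple(sum(a * xi for a, xi in zip(row, x)) for row in A)


def mutate_matrix(A: Matrix, rng: random.Random, sigma: float = 0.2) -> Matrix:
    """Gaussian perturbation of every coefficient; this is what the GA evolves."""
    return [[a + rng.gauss(0.0, sigma) for a in row] for row in A]


A = [[1.0, 0.0, 0.5],
     [0.0, 1.0, -0.5]]
print(project(A, [2.0, 4.0, 2.0]))  # (3.0, 3.0)
child = mutate_matrix(A, random.Random(0))  # a new candidate projection
```

Every individual in the population is one matrix; the user judges the scatter plot it produces, not the coefficients.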

23 [ Demonstration ]

24 Data Projection Experiments
Sigmoidal transformation:
–Evolve the coefficient matrix
–Do the transformation using the formula: [formula with parameters a, b, c]
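
The exact sigmoidal formula with parameters a, b and c is not preserved in the transcript. As a purely hypothetical illustration of the idea, the sketch below passes each linear combination through a logistic function and scales it, y_k = a_k · σ(b_k · x + c_k); the formula on the slide may differ.

```python
import math
from typing import Sequence, Tuple


def sigmoid(s: float) -> float:
    """Logistic function written via tanh so it cannot overflow."""
    return 0.5 * (1.0 + math.tanh(0.5 * s))


def sigmoid_project(
    a: Sequence[float],
    B: Sequence[Sequence[float]],
    c: Sequence[float],
    x: Sequence[float],
) -> Tuple[float, ...]:
    """Hypothetical sigmoidal projection to 2D: y_k = a_k * sigmoid(B_k . x + c_k).

    a, B and c are the evolved coefficients: a and c have one entry per
    output axis, B has one row per output axis.
    """
    return tuple(
        a_k * sigmoid(sum(b * xi for b, xi in zip(b_k, x)) + c_k)
        for a_k, b_k, c_k in zip(a, B, c)
    )


y = sigmoid_project(a=[1.0, 2.0], B=[[0.0, 0.0], [1.0, -1.0]], c=[0.0, 0.0], x=[3.0, 3.0])
print(y)  # (0.5, 1.0)
```

Unlike the linear case, the squashing keeps outliers from dominating the plot, since each coordinate stays within (0, a_k) for positive a_k.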

25 Interactive Evolution: Issues
The fitness function is too costly:
–A GA requires a lot of evaluations
–The user may get annoyed, bored, tired…
A heuristic approach is needed to speed up the evolution!
–"Hard-wired" estimation of projection quality
E.g. clustering homogeneity, separability, intra-cluster variability…
This limits what "quality" can mean!
–Modeling the user's preferences…?
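
As an example of a "hard-wired" quality estimate, the sketch below scores a labelled 2D projection by dividing the mean distance between class centroids by the mean distance of points to their own centroid. This particular ratio is an assumption chosen for illustration; it also shows the limitation noted above, because it fixes in advance what a good projection is.

```python
import math
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def separability(points: Sequence[Point], labels: Sequence[str]) -> float:
    """Hypothetical quality score of a 2D projection (higher = better separated).

    Mean distance between class centroids divided by the mean distance of
    points to their own class centroid.
    """
    groups: Dict[str, List[Point]] = defaultdict(list)
    for p, label in zip(points, labels):
        groups[label].append(p)
    if len(groups) < 2:
        raise ValueError("need at least two classes")
    centroids = {
        label: (sum(p[0] for p in ps) / len(ps), sum(p[1] for p in ps) / len(ps))
        for label, ps in groups.items()
    }
    intra = sum(math.dist(p, centroids[l]) for p, l in zip(points, labels)) / len(points)
    pairs = list(combinations(centroids.values(), 2))
    between = sum(math.dist(a, b) for a, b in pairs) / len(pairs)
    return between / intra if intra > 0 else math.inf


labels = ["a", "a", "b", "b"]
far = [(0, 0), (0, 2), (4, 0), (4, 2)]
near = [(0, 0), (0, 2), (1, 0), (1, 2)]
print(separability(far, labels), separability(near, labels))  # 4.0 1.0
```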

26 Surrogate Model
–Optimization approach for areas where evaluation is too expensive
–Builds an approximation model of the fitness function
–Given a training dataset of the candidate solutions evaluated so far and their fitness…
…it predicts the fitness of newly generated candidates

27 Surrogate Model
1. Collect the fitness of an initial sample
2. Construct the Surrogate Model
3. Search the Surrogate Model
–The Surrogate Model is cheap to evaluate
–A Genetic Algorithm may be employed
4. Collect the fitness at the new locations found in step 3
5. If the solution is not good enough, go to step 2
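
The five steps can be sketched as follows, under several assumptions: the surrogate is a k-nearest-neighbour average over the candidates evaluated so far (in the spirit of the 3NN learner on slide 33), step 3 is a random search over mutants of the current best instead of a full GA, and `user_proxy` is a hypothetical stand-in for the costly human rating.

```python
import math
import random
from typing import Callable, List, Tuple

Sample = Tuple[List[float], float]  # (candidate, known fitness)


def surrogate(archive: List[Sample], x: List[float], k: int = 3) -> float:
    """Cheap fitness estimate: mean fitness of the k nearest evaluated candidates."""
    nearest = sorted(archive, key=lambda s: math.dist(s[0], x))[:k]
    return sum(f for _, f in nearest) / len(nearest)


def surrogate_optimize(
    expensive: Callable[[List[float]], float],
    dim: int,
    rng: random.Random,
    init: int = 5,
    rounds: int = 20,
    pool: int = 200,
    good_enough: float = 0.95,
) -> Sample:
    # 1. Collect the fitness of an initial random sample.
    archive: List[Sample] = []
    for _ in range(init):
        x = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
        archive.append((x, expensive(x)))
    for _ in range(rounds):
        best = max(archive, key=lambda s: s[1])
        # 5. Stop as soon as the solution is good enough.
        if best[1] >= good_enough:
            break
        # 2. The surrogate is the archive itself (k-NN), so it is rebuilt for free.
        # 3. Search it cheaply: score many mutants of the best candidate.
        candidates = [[v + rng.gauss(0.0, 0.3) for v in best[0]] for _ in range(pool)]
        chosen = max(candidates, key=lambda c: surrogate(archive, c))
        # 4. Spend exactly one expensive (e.g. human) evaluation per round.
        archive.append((chosen, expensive(chosen)))
    return max(archive, key=lambda s: s[1])


calls = [0]  # counts expensive evaluations


def user_proxy(x: List[float]) -> float:
    """Stand-in for the costly human rating; best value 1.0 at the origin."""
    calls[0] += 1
    return math.exp(-sum(v * v for v in x))


best_x, best_f = surrogate_optimize(user_proxy, dim=2, rng=random.Random(42))
print(calls[0], round(best_f, 3))
```

The point of the design is the budget: 200 surrogate queries per round cost nothing, while the user is asked only once per round.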

28 Evaluating Fitness
In order to construct fitness-prediction models, a training dataset must be provided
Information about fitness provided by the user is indirect:
–Within a single population, a good projection is surely better than a bad one
–However, "better" is a relative term
–Is a good projection in generation #2 better than a bad projection in generation #10…?

29 Interconnecting generations
In each generation, the population may be divided into up to 3 categories:
–bad, neutral, good
Let's copy the best projection to the next-epoch population
–So-called elitism in Evolutionary Computation
–Within the new population, the elite will again fall into one of these 3 categories
–This gives us information about cross-generation progress!

30 Absolutizing Fitness
Generation #1

31 Absolutizing Fitness
Generation #2
[Figure: equivalence relation, partial order relation, equivalence classes]

32 Generation #3
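
One possible reading of the absolutizing scheme on slides 29 to 32, sketched in Python: every (generation, category) pair is a node; the carried-over elite makes its category in one generation equivalent to its category in the next (merged with union-find); categories inside a generation are ordered bad < neutral < good; and the longest chain below each equivalence class gives it an absolute level. The data layout and the level assignment are assumptions, not the author's implementation.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set, Tuple

BAD, NEUTRAL, GOOD = 0, 1, 2
Node = Tuple[int, int]  # (generation, category)


def absolutize(links: List[Tuple[int, int]]) -> Dict[Node, int]:
    """Assign an absolute level to every (generation, category) node.

    links[t] = (category of the elite in generation t,
                category the same elite received in generation t + 1).
    Higher levels mean better projections.
    """
    generations = len(links) + 1
    parent: Dict[Node, Node] = {
        (t, c): (t, c) for t in range(generations) for c in (BAD, NEUTRAL, GOOD)
    }

    def find(n: Node) -> Node:
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    # Equivalence relation: the elite's two categories describe the same projection.
    for t, (before, after) in enumerate(links):
        parent[find((t, before))] = find((t + 1, after))

    # Partial order between equivalence classes: bad < neutral < good per generation.
    below: Dict[Node, Set[Node]] = {find(n): set() for n in parent}
    for t in range(generations):
        below[find((t, NEUTRAL))].add(find((t, BAD)))
        below[find((t, GOOD))].add(find((t, NEUTRAL)))

    # static_order() lists lower classes first; the longest chain below a class is its level.
    level: Dict[Node, int] = {}
    for cls in TopologicalSorter(below).static_order():
        level[cls] = max((level[p] + 1 for p in below[cls]), default=0)
    return {n: level[find(n)] for n in parent}


# Elite: good in gen 0 -> neutral in gen 1; then good in gen 1 -> bad in gen 2.
levels = absolutize([(GOOD, NEUTRAL), (GOOD, BAD)])
print(levels[(2, GOOD)])  # 5
```

With this example, "good" in generation 0 and "neutral" in generation 1 share level 2, so a neutral projection from generation 1 is placed on the same absolute scale as the best of generation 0.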

33 Fitness Prediction
[Figure: KF in RapidMiner (RM): training dataset, current population, normalization, learning (3NN), fitness prediction]
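
The fitness-prediction KF above normalizes the data and predicts fitness with a 3-nearest-neighbour learner. A plain-Python equivalent might look like the sketch below, assuming min-max normalization fitted on the training data and an unweighted mean of the neighbours' fitness (RapidMiner's exact settings are not given); the training targets could be absolute fitness levels such as those derived on the previous slides.

```python
import math
from typing import List, Sequence, Tuple


def min_max_fit(rows: Sequence[Sequence[float]]) -> List[Tuple[float, float]]:
    """Per-attribute (min, max) taken from the training data."""
    return [(min(col), max(col)) for col in zip(*rows)]


def normalize(row: Sequence[float], bounds: List[Tuple[float, float]]) -> List[float]:
    """Scale each attribute to [0, 1] using the training bounds (constant attributes -> 0)."""
    return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, (lo, hi) in zip(row, bounds)]


def predict_fitness(
    train_x: Sequence[Sequence[float]],
    train_y: Sequence[float],
    population: Sequence[Sequence[float]],
    k: int = 3,
) -> List[float]:
    """k-NN regression: each candidate gets the mean fitness of its k nearest
    already-rated candidates, measured after min-max normalization."""
    bounds = min_max_fit(train_x)
    train_n = [normalize(r, bounds) for r in train_x]
    predictions = []
    for cand in population:
        c = normalize(cand, bounds)
        nearest = sorted(range(len(train_n)), key=lambda i: math.dist(train_n[i], c))[:k]
        predictions.append(sum(train_y[i] for i in nearest) / len(nearest))
    return predictions


train_x = [[0.0, 0.0], [0.0, 10.0], [10.0, 0.0], [10.0, 10.0], [5.0, 5.0]]
train_y = [0.0, 3.0, 6.0, 9.0, 3.0]  # hypothetical fitness levels of rated candidates
print(predict_fitness(train_x, train_y, [[0.0, 1.0], [10.0, 9.0]]))  # [2.0, 6.0]
```

Normalizing first keeps an attribute with a large numeric range (for example, one coefficient of a projection matrix) from dominating the distance.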

34 Thank you for your attention! Tomáš Řehořek rehorto2@fel.cvut.cz

