Interactive Evolution in Automated Knowledge Discovery Tomáš Řehořek March 2011.

Knowledge Discovery Automation
Our goal:
– Given an input dataset, automatically construct a Knowledge Flow (KF) and offer output knowledge that the user is satisfied with
– Creating such a system is a big deal!
[ Diagram: Automated Knowledge Discovery ]

Knowledge Discovery Automation
What is Knowledge Discovery?
– Transformation of input data into human-interpretable knowledge
– A directed graph of actions (a Knowledge Flow) is a suitable way to represent it
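
A Knowledge Flow can be sketched as a directed acyclic graph in which each action runs once all of its inputs are available. The minimal sketch below relies on Python's standard `graphlib` module; the action names and the `run_flow` helper are hypothetical and only illustrate the idea.

```python
from graphlib import TopologicalSorter
from typing import Any, Callable, Dict, List

Action = Callable[..., Any]

def run_flow(actions: Dict[str, Action], deps: Dict[str, List[str]]) -> Dict[str, Any]:
    """Run every action after its predecessors; deps maps node -> input nodes."""
    results: Dict[str, Any] = {}
    for node in TopologicalSorter(deps).static_order():
        inputs = [results[d] for d in deps.get(node, [])]  # outputs of predecessors
        results[node] = actions[node](*inputs)
    return results

# Hypothetical toy flow: load data -> compute mean -> report
actions = {
    "load": lambda: [2.0, 4.0, 6.0],
    "mean": lambda xs: sum(xs) / len(xs),
    "report": lambda m: f"mean = {m}",
}
deps = {"load": [], "mean": ["load"], "report": ["mean"]}
print(run_flow(actions, deps)["report"])  # mean = 4.0
```

`TopologicalSorter` raises `graphlib.CycleError` when the graph contains a cycle, so a flow that is not acyclic is rejected before any action runs.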

Knowledge Discovery Ontology
Ontology (definition):
– A formal representation of a domain
– Specifies entities, their properties and relations
– Provides a vocabulary that can be used to model the domain, e.g. dataset, model, testing sample, scatter plot, confusion matrix, association rule…

Knowledge Discovery Ontology
Ontology design problems in KD:
– Which KFs are reasonable?
– What should the output report look like?
– Can the metadata help?
– Are there categories of users with similar interests?
Two ideas concerning the Ontology:
– Deductive approach
– Inductive approach

Knowledge Discovery Ontology
Deductive approach:
– The Ontology is given
– Based on the Ontology and the given dataset, try to construct an appropriate KF

Knowledge Discovery Ontology
Deductive approach:
[ Diagram taken from: M. Žáková, P. Křemen, F. Železný, N. Lavrač: Automating Knowledge Discovery Workflow Composition Through Ontology-Based Planning (2010) ]
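
The deductive idea can be illustrated with a toy planner: a hypothetical mini-ontology states which concept types each action consumes and produces, and a search finds a chain of actions leading from the dataset to the requested output. This is a plain breadth-first search over made-up action names, not the planner used in the cited work.

```python
from collections import deque
from typing import FrozenSet, List, Optional, Set

# Hypothetical mini-ontology: action -> (required concept types, produced type)
ONTOLOGY = {
    "train_decision_tree": ({"Dataset"}, "Model"),
    "split_dataset": ({"Dataset"}, "TestingSample"),
    "apply_model": ({"Model", "TestingSample"}, "Predictions"),
    "evaluate": ({"Predictions"}, "ConfusionMatrix"),
    "mine_rules": ({"Dataset"}, "AssociationRules"),
}

def plan_flow(available: Set[str], goal: str) -> Optional[List[str]]:
    """Shortest action sequence that makes `goal` available, or None."""
    start: FrozenSet[str] = frozenset(available)
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, plan = queue.popleft()
        if goal in state:
            return plan
        for action, (needs, produces) in ONTOLOGY.items():
            # An action is applicable once all of its inputs exist.
            if needs <= state and produces not in state:
                nxt = state | {produces}
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, plan + [action]))
    return None  # goal unreachable with this ontology

print(plan_flow({"Dataset"}, "ConfusionMatrix"))
# ['train_decision_tree', 'split_dataset', 'apply_model', 'evaluate']
```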

Knowledge Discovery Ontology
Inductive approach:
– No prior assumptions about the Ontology
– Learn the Ontology from a database of KFs designed by experts
[ Diagram: Meta-Knowledge Discovery, Discovered KD Ontology ]

Our Approach: Revolutionary Reporting
There may be thousands of useful KFs:
– Different datasets may require different actions
– Different users may require different knowledge
Maybe users form clusters:
– “DM Scientist”: may experiment with different algorithms on a given dataset
– “Business Manager”: may appreciate the beer-and-diapers rule

Our Approach: Revolutionary Reporting
Let’s design a system capable of learning what users like!
– Adopt Interactive Evolutionary Computation
– Collect feedback to evaluate the fitness of a given KF, for a given user, on a given dataset
– Store the feedback, along with the metadata, in a database
– As the database grows, offer intelligent KF mutation based on the experience

Our Approach: Revolutionary Reporting
Inspiration: Interactive Evolutionary Computation (IEC)
– Also known as “Aesthetic Selection”
– Evolutionary Computation using human evaluation as the fitness function
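
In IEC the fitness function is a person, so the evolutionary loop only needs a callback that returns ratings for the candidates shown. In the minimal sketch below, the `rate` callback, real-valued genomes, truncation selection and Gaussian mutation are all assumptions made for illustration; a scripted rater stands in for the human.

```python
import random
from typing import Callable, List

Genome = List[float]

def interactive_evolution(rate: Callable[[List[Genome]], List[float]],
                          genome_len: int = 4, pop_size: int = 8,
                          generations: int = 20, seed: int = 0) -> Genome:
    """Evolve genomes whose only fitness signal is the `rate` callback.

    In a real IEC system `rate` would display the candidates to a person
    and return their scores; here it is any function with that signature.
    """
    rng = random.Random(seed)
    pop = [[rng.uniform(-1, 1) for _ in range(genome_len)] for _ in range(pop_size)]
    for _ in range(generations):
        scores = rate(pop)  # one judgement per candidate
        ranked = [g for _, g in sorted(zip(scores, pop), key=lambda p: p[0], reverse=True)]
        parents = ranked[: pop_size // 2]  # truncation selection
        children = [[x + rng.gauss(0, 0.1) for x in rng.choice(parents)]
                    for _ in range(pop_size - 1)]
        pop = [ranked[0]] + children  # elitism: the best candidate survives
    scores = rate(pop)
    return pop[max(range(pop_size), key=scores.__getitem__)]

# Scripted "user" who prefers genomes close to (0.5, 0.5, 0.5, 0.5)
def sim_user(pop: List[Genome]) -> List[float]:
    return [-sum((x - 0.5) ** 2 for x in g) for g in pop]

best = interactive_evolution(sim_user)
```

Because the best candidate is always carried over, the top rating never decreases from one generation to the next, which matters when every rating costs the user's attention.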

PicBreeder: Interactive Evolution, by Jimmy Secretan and Kenneth Stanley

Next generation … and so on …

And after 75 generations you eventually get something interesting

The technology hidden behind it: a neural net draws the image
[ Diagram: inputs x, z; output grayscale ]

Neuroevolution
– By clicking, you increase the fitness of nets
– Next generations inherit fit building patterns
[ Diagram: inputs x, z; output grayscale ]
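
The idea that a network "draws" an image can be sketched by querying a small network at every pixel coordinate (x, z) and reading its output as a gray level. The fixed topology, sine hidden units, random weights and Gaussian weight mutation below are assumptions for illustration; PicBreeder itself evolves the networks, including their structure, with NEAT.

```python
import math
import random
from typing import List, Tuple

Net = Tuple[List[List[float]], List[float]]

def random_net(rng: random.Random, hidden: int = 6) -> Net:
    """Hypothetical fixed topology: inputs (x, z, bias), `hidden` sine units, one output."""
    w_in = [[rng.gauss(0, 2) for _ in range(3)] for _ in range(hidden)]
    w_out = [rng.gauss(0, 1) for _ in range(hidden)]
    return w_in, w_out

def gray(net: Net, x: float, z: float) -> float:
    """Gray level in [0, 1] that the net assigns to the pixel at (x, z)."""
    w_in, w_out = net
    h = [math.sin(a * x + b * z + c) for a, b, c in w_in]  # periodic units give repeating motifs
    s = sum(w * v for w, v in zip(w_out, h))
    return 1.0 / (1.0 + math.exp(-s))  # squash to a gray level

def render(net: Net, size: int = 16) -> List[List[float]]:
    """Query the net on a size x size grid covering [-1, 1] x [-1, 1]."""
    step = 2 / max(size - 1, 1)
    coords = [-1 + i * step for i in range(size)]
    return [[gray(net, x, z) for x in coords] for z in coords]

def mutate(net: Net, rng: random.Random, sigma: float = 0.3) -> Net:
    """Child net: a perturbed copy, so it inherits the parent's patterns."""
    w_in, w_out = net
    return ([[w + rng.gauss(0, sigma) for w in row] for row in w_in],
            [w + rng.gauss(0, sigma) for w in w_out])

rng = random.Random(42)
parent = random_net(rng)                              # the net whose image the user clicked
children = [mutate(parent, rng) for _ in range(8)]    # next generation shown to the user
image = render(children[0], size=8)
```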

Gallery of discovered images

Collaborative evolution
You start your evolution where others finished…
…and when you discover something interesting…
…you store it in the database.

Our Approach: Revolutionary Reporting
[ Diagram: User, Feedback, Experience Database, System core ]

First Experiments: Data Projection
Transform the input dataset to 2D, similar to PCA, Sammon projection, etc.
[ Diagram: examples in n-dimensional space projected to 2D ]

Experiment Setup
[ Diagram components: User, Web Client, AJAX, Google API, Feedback Collection GUI, jabsorb JSON-RPC (via HTTP), Tomcat Server, RapidMiner 5, MySQL, Genetic Algorithm, Current Population, Feedback ]

Data Projection Experiments
Linear transformation:
– Evolve a coefficient matrix
– Do the transformation using the formula: …
…resulting in a point in 2D space
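
Because the slide's formula is not preserved, the sketch below assumes the simplest reading: the evolved 2 × n coefficient matrix A maps an example x to the point y = A x. Gaussian mutation of the coefficients is likewise an assumed GA operator, and the helper names are hypothetical.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]  # 2 rows x n columns

def project_linear(x: List[float], A: Matrix) -> Tuple[float, float]:
    """Assumed form y = A x: each output coordinate is a weighted sum of x."""
    y0, y1 = (sum(a * v for a, v in zip(row, x)) for row in A)
    return y0, y1

def random_matrix(n: int, rng: random.Random) -> Matrix:
    """Initial genome: a 2 x n matrix of coefficients drawn from [-1, 1]."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(2)]

def mutate(A: Matrix, rng: random.Random, sigma: float = 0.1) -> Matrix:
    """Child genome: every coefficient perturbed by Gaussian noise."""
    return [[a + rng.gauss(0.0, sigma) for a in row] for row in A]

# A matrix that keeps the first two attributes gives an ordinary scatter plot.
print(project_linear([3.0, 4.0, 5.0], [[1, 0, 0], [0, 1, 0]]))  # (3.0, 4.0)
```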

[ Demonstration ]

Data Projection Experiments
Sigmoidal transformation:
– Evolve a coefficient matrix
– Do the transformation using the formula: …
[ Figure: a, b, c ]
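
The sigmoidal formula is not preserved either. Assuming it passes each row of the linear combination through the logistic function, every projected point lands inside the unit square (0, 1) × (0, 1), so extreme values are squashed toward the edges instead of stretching the plot. The function names are hypothetical.

```python
import math
from typing import List, Tuple

def logistic(t: float) -> float:
    """Numerically stable 1 / (1 + e^-t)."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)  # avoids overflow of exp(-t) for large negative t
    return e / (1.0 + e)

def project_sigmoid(x: List[float], A: List[List[float]]) -> Tuple[float, float]:
    """Assumed form: y_j = logistic(sum_i A[j][i] * x[i]) for j = 0, 1."""
    y0, y1 = (logistic(sum(a * v for a, v in zip(row, x))) for row in A)
    return y0, y1

print(project_sigmoid([0.0, 0.0], [[1.0, 2.0], [3.0, 4.0]]))  # (0.5, 0.5)
```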

Interactive Evolution: Issues
The fitness function is too costly:
– A GA requires a lot of evaluations
– The user may get annoyed, bored, tired…
A heuristic approach is needed to speed up the evolution!
– “Hard-wired” estimation of projection quality (e.g. cluster homogeneity, separability, intra-cluster variability…), which puts a limitation on what “quality” means!
– Modeling the user’s preferences…?
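
A "hard-wired" quality estimate might, for example, reward projections in which classes lie far apart relative to their spread. The score below (mean distance between class centroids divided by the mean distance of points to their own centroid) is one hypothetical choice that assumes class labels are known. It also shows the limitation named above: it can only ever prefer class separation.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]

def separability(points: Sequence[Point], labels: Sequence[str]) -> float:
    """Between-centroid distance over within-class spread (larger is better)."""
    groups: Dict[str, List[Point]] = defaultdict(list)
    for p, y in zip(points, labels):
        groups[y].append(p)
    if len(groups) < 2:
        raise ValueError("need at least two classes")
    cent = {y: (sum(p[0] for p in ps) / len(ps), sum(p[1] for p in ps) / len(ps))
            for y, ps in groups.items()}
    # Mean distance of each projected point to its own class centroid
    within = sum(math.dist(p, cent[y]) for p, y in zip(points, labels)) / len(points)
    # Mean distance over all pairs of class centroids
    keys = list(cent)
    pairs = [(a, b) for i, a in enumerate(keys) for b in keys[i + 1:]]
    between = sum(math.dist(cent[a], cent[b]) for a, b in pairs) / len(pairs)
    return between / within if within > 0 else math.inf

well_separated = [(0, 0), (0, 1), (10, 0), (10, 1)]
print(separability(well_separated, ["a", "a", "b", "b"]))  # 20.0
print(separability(well_separated, ["a", "b", "a", "b"]))  # 0.2
```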

Surrogate Model
An optimization approach for areas where evaluation is too expensive:
– Builds an approximation model of the fitness function
– Given a training dataset of so-far-known candidate solutions and their fitness…
– …predicts the fitness of newly generated candidates

Surrogate Model
1. Collect the fitness of an initial sample.
2. Construct the Surrogate Model.
3. Search the Surrogate Model: it is cheap to evaluate, so a Genetic Algorithm may be employed.
4. Collect the fitness at the new locations found in step 3.
5. If the solution is not good enough, go to step 2.
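
The five steps can be sketched as a loop. As assumptions for illustration, the surrogate is a k-nearest-neighbour average over the archive of evaluated candidates, and step 3 scores Gaussian perturbations of the best known candidates instead of running a full Genetic Algorithm. `expensive` stands for the costly fitness evaluation (for example, asking the user), and `init` must be at least 1.

```python
import math
import random
from typing import Callable, List, Tuple

Vec = List[float]
Archive = List[Tuple[Vec, float]]

def knn_surrogate(archive: Archive, x: Vec, k: int = 3) -> float:
    """Cheap fitness estimate: mean true fitness of the k nearest evaluated candidates."""
    nearest = sorted(archive, key=lambda item: math.dist(item[0], x))[:k]
    return sum(f for _, f in nearest) / len(nearest)

def surrogate_search(expensive: Callable[[Vec], float], dim: int, budget: int = 30,
                     init: int = 5, pool_size: int = 200,
                     good_enough: float = math.inf, seed: int = 0) -> Tuple[Vec, float]:
    rng = random.Random(seed)
    # 1. Collect the true fitness of an initial random sample.
    archive: Archive = []
    for _ in range(init):
        x = [rng.uniform(-1, 1) for _ in range(dim)]
        archive.append((x, expensive(x)))
    while len(archive) < budget:
        # 2. The surrogate is rebuilt implicitly: it is the k-NN average over `archive`.
        # 3. Search it cheaply: score many perturbed copies of the best candidates.
        top = [x for x, _ in sorted(archive, key=lambda item: item[1], reverse=True)[:3]]
        pool = [[v + rng.gauss(0, 0.2) for v in rng.choice(top)] for _ in range(pool_size)]
        candidate = max(pool, key=lambda x: knn_surrogate(archive, x))
        # 4. Pay for one true evaluation at the most promising location.
        archive.append((candidate, expensive(candidate)))
        # 5. Stop once the best solution is good enough; otherwise go back to step 2.
        if max(f for _, f in archive) >= good_enough:
            break
    return max(archive, key=lambda item: item[1])
```

Only `budget` true evaluations are ever spent; the hundreds of candidates in each pool are judged by the surrogate alone.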

Evaluating Fitness
In order to construct fitness-prediction models, a training dataset must be delivered.
The information about fitness provided by the user is indirect:
– Within a single population, a good projection is surely better than a bad one
– However, “better” is a relative term
– Is a good projection in generation #2 better than a bad projection in generation #10…?

Interconnecting generations
In each generation, the population may be divided into up to 3 categories:
– bad, neutral, good
Let’s copy the best projection to the next-epoch population:
– So-called elitism in Evolutionary Computation
– Within the new population, the elite will again fall into one of these 3 categories
– This gives us information about cross-generation progress!

Absolutizing Fitness: Generation #1

Absolutizing Fitness: Generation #2
[ Diagram: equivalence relation, partial order relation, equivalence classes ]

Generation #3
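
One way to turn these per-generation judgements into an absolute score is sketched below, under assumptions the slides do not spell out. Categories within a generation form a chain (bad < neutral < good); the elite keeps the same identifier when copied forward, so its two categories are merged into one equivalence class (union-find); each class then gets the length of the longest chain of worse classes below it. Candidates that the partial order leaves incomparable may receive the same level. The function and identifiers are hypothetical, and only the elite may cross a generation boundary: otherwise the merged graph can contain a cycle, which `graphlib` reports as `CycleError`.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set, Tuple

BAD, NEUTRAL, GOOD = 0, 1, 2
Cls = Tuple[int, int]  # (generation index, category)

def absolutize(generations: List[Dict[str, int]]) -> Dict[str, int]:
    """Map every candidate id to an absolute fitness level."""
    parent: Dict[Cls, Cls] = {}

    def find(c: Cls) -> Cls:
        parent.setdefault(c, c)
        while parent[c] != c:
            parent[c] = parent[parent[c]]  # path halving
            c = parent[c]
        return c

    appearances: Dict[str, List[Cls]] = {}
    for g, gen in enumerate(generations):
        for cid, cat in gen.items():
            appearances.setdefault(cid, []).append((g, cat))

    # Equivalence: the elite's category in one generation equals its category in the next.
    for classes in appearances.values():
        for other in classes[1:]:
            parent[find(other)] = find(classes[0])

    # Partial order: within a generation, each category is above the next worse one.
    below: Dict[Cls, Set[Cls]] = {}
    for g, gen in enumerate(generations):
        cats = sorted(set(gen.values()))
        for c in cats:
            below.setdefault(find((g, c)), set())
        for lo, hi in zip(cats, cats[1:]):
            below[find((g, hi))].add(find((g, lo)))

    # Level = length of the longest chain of worse classes underneath.
    level: Dict[Cls, int] = {}
    for node in TopologicalSorter(below).static_order():
        level[node] = max((level[p] + 1 for p in below[node]), default=0)
    return {cid: level[find(classes[0])] for cid, classes in appearances.items()}

gens = [{"A": BAD, "B": NEUTRAL, "C": GOOD},
        {"C": NEUTRAL, "D": BAD, "E": GOOD}]  # C is the elite copied forward
print(absolutize(gens))  # {'A': 0, 'B': 1, 'C': 2, 'D': 0, 'E': 3}
```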

Fitness Prediction
[ Diagram: KF in RM (RapidMiner): training dataset, current population, normalization, learning (3NN), fitness prediction ]
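
Outside RapidMiner, the same flow (normalize, learn a 3-nearest-neighbour model on the rated candidates, predict fitness for the current population) can be sketched in plain Python. Z-score normalization, unweighted averaging of the neighbours' fitness, and treating each row of `train_x` as a candidate's genome (e.g. its flattened coefficient matrix) are assumptions; the slide only names the steps. Normalizing first matters because k-NN distances are otherwise dominated by whichever features have the largest scale.

```python
import math
from statistics import mean, pstdev
from typing import List, Sequence, Tuple

Rows = Sequence[Sequence[float]]

def fit_zscore(rows: Rows) -> Tuple[List[float], List[float]]:
    """Per-feature mean and standard deviation of the training data."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant features
    return mus, sds

def apply_zscore(rows: Rows, mus: List[float], sds: List[float]) -> List[List[float]]:
    return [[(v - m) / s for v, m, s in zip(r, mus, sds)] for r in rows]

def predict_knn(train_x: Rows, train_y: Sequence[float], population: Rows,
                k: int = 3) -> List[float]:
    """Predicted fitness = mean fitness of the k nearest rated candidates."""
    mus, sds = fit_zscore(train_x)  # statistics come from training data only
    tx = apply_zscore(train_x, mus, sds)
    px = apply_zscore(population, mus, sds)
    preds = []
    for p in px:
        nearest = sorted(range(len(tx)), key=lambda i: math.dist(tx[i], p))[:k]
        preds.append(sum(train_y[i] for i in nearest) / len(nearest))
    return preds

train_x = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
train_y = [0, 0, 0, 2, 2, 2]  # e.g. absolutized fitness levels
print(predict_knn(train_x, train_y, [[0.5, 0.5], [10.5, 10.5]]))  # [0.0, 2.0]
```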

Thank you for your attention! Tomáš Řehořek