Three methods developed for these objectives Based on machine learning and supervised learning Under the evolutionary paradigm –specifically Genetic Programming.

Slides:



Advertisements
Similar presentations
RIPPER Fast Effective Rule Induction
Advertisements

Decision Tree Approach in Data Mining
Data Mining Tri Nguyen. Agenda Data Mining As Part of KDD Decision Tree Association Rules Clustering Amazon Data Mining Examples.
Advancements in Genetic Programming for Data Classification Dr. Hajira Jabeen Iqra University Islamabad, Pakistan.
Non-Linear Problems General approach. Non-linear Optimization Many objective functions, tend to be non-linear. Design problems for which the objective.
Ang Sun Ralph Grishman Wei Xu Bonan Min November 15, 2011 TAC 2011 Workshop Gaithersburg, Maryland USA.
Chapter 16 Parallel Data Mining 16.1From DB to DW to DM 16.2Data Mining: A Brief Overview 16.3Parallel Association Rules 16.4Parallel Sequential Patterns.
By Russell Armstrong Supervisor Mrs Wei Ji Diagnosis Analysis of Lung Cancer by Genome Expression Profiles.
COM (Co-Occurrence Miner): Graph Classification Based on Pattern Co-occurrence Ning Jin, Calvin Young, Wei Wang University of North Carolina at Chapel.
Feature Selection for Regression Problems
About ISoft … What is Decision Tree? Alice Process … Conclusions Outline.
Basic Data Mining Techniques Chapter Decision Trees.
BlueJam Genetic Programming and Evolutionary Algorithms Capturing Creativity with Algorithmic Music Evolutionary Music Composition Machine Learning and.
Learning to Advertise. Introduction Advertising on the Internet = $$$ –Especially search advertising and web page advertising Problem: –Selecting ads.
Basic Data Mining Techniques
Evolutionary Algorithms Simon M. Lucas. The basic idea Initialise a random population of individuals repeat { evaluate select vary (e.g. mutate or crossover)
Genetic Algorithms Learning Machines for knowledge discovery.
Basic concepts of Data Mining, Clustering and Genetic Algorithms Tsai-Yang Jea Department of Computer Science and Engineering SUNY at Buffalo.
ML ALGORITHMS. Algorithm Types Classification (supervised) Given -> A set of classified examples “instances” Produce -> A way of classifying new examples.
Evolving Agents in a Hostile Environment Alex J. Berry.
12 -1 Lecture 12 User Modeling Topics –Basics –Example User Model –Construction of User Models –Updating of User Models –Applications.
Oracle Data Mining Ying Zhang. Agenda Data Mining Data Mining Algorithms Oracle DM Demo.
Chapter 5 Data mining : A Closer Look.
Enterprise systems infrastructure and architecture DT211 4
Classifiers, Part 3 Week 1, Video 5 Classification  There is something you want to predict (“the label”)  The thing you want to predict is categorical.
Attention Deficit Hyperactivity Disorder (ADHD) Student Classification Using Genetic Algorithm and Artificial Neural Network S. Yenaeng 1, S. Saelee 2.
Repository Method to suit different investment strategies Alma Lilia Garcia & Edward Tsang.
A Genetic Algorithms Approach to Feature Subset Selection Problem by Hasan Doğu TAŞKIRAN CS 550 – Machine Learning Workshop Department of Computer Engineering.
Comparing the Parallel Automatic Composition of Inductive Applications with Stacking Methods Hidenao Abe & Takahira Yamaguchi Shizuoka University, JAPAN.
Breeding Decision Trees Using Evolutionary Techniques Papagelis Athanasios - Kalles Dimitrios Computer Technology Institute & AHEAD RM.
Cost-Sensitive Bayesian Network algorithm Introduction: Machine learning algorithms are becoming an increasingly important area for research and application.
1 Research Groups : KEEL: A Software Tool to Assess Evolutionary Algorithms for Data Mining Problems SCI 2 SMetrology and Models Intelligent.
Improved Gene Expression Programming to Solve the Inverse Problem for Ordinary Differential Equations Kangshun Li Professor, Ph.D Professor, Ph.D College.
Study on Genetic Network Programming (GNP) with Learning and Evolution Hirasawa laboratory, Artificial Intelligence section Information architecture field.
Lecture 8: 24/5/1435 Genetic Algorithms Lecturer/ Kawther Abas 363CS – Artificial Intelligence.
GATree: Genetically Evolved Decision Trees 전자전기컴퓨터공학과 데이터베이스 연구실 G 김태종.
Basic Data Mining Technique
16 October 2015All Rights Reserved, Edward Tsang Computing Intelligence Meets Forecasting Forecasting System Predictions, in the form of: prices opportunities.
1 Machine Learning: Lecture 12 Genetic Algorithms (Based on Chapter 9 of Mitchell, T., Machine Learning, 1997)
Kansas State University Department of Computing and Information Sciences CIS 732: Machine Learning and Pattern Recognition Friday, 16 February 2007 William.
1 Improving quality of graduate students by data mining Asst. Prof. Kitsana Waiyamai, Ph.D. Dept. of Computer Engineering Faculty of Engineering, Kasetsart.
Initial Population Generation Methods for population generation: Grow Full Ramped Half-and-Half Variety – Genetic Diversity.
Mining Binary Constraints in Feature Models: A Classification-based Approach Yi Li.
A Scalable Machine Learning Approach for Semi-Structured Named Entity Recognition Utku Irmak(Yahoo! Labs) Reiner Kraft(Yahoo! Inc.) WWW 2010(Information.
1 Appendix D: Application of Genetic Algorithm in Classification Duong Tuan Anh 5/2014.
Data Mining: Knowledge Discovery in Databases Peter van der Putten ALP Group, LIACS Pre-University College LAPP-Top Computer Science February 2005.
Machine Learning A Quick look Sources: Artificial Intelligence – Russell & Norvig Artifical Intelligence - Luger By: Héctor Muñoz-Avila.
1 Masters Thesis Presentation By Debotosh Dey AUTOMATIC CONSTRUCTION OF HASHTAGS HIERARCHIES UNIVERSITAT ROVIRA I VIRGILI Tarragona, June 2015 Supervised.
Creating Subjective and Objective Sentence Classifier from Unannotated Texts Janyce Wiebe and Ellen Riloff Department of Computer Science University of.
Improved Video Categorization from Text Metadata and User Comments ACM SIGIR 2011:Research and development in Information Retrieval - Katja Filippova -
Virtual Examples for Text Classification with Support Vector Machines Manabu Sassano Proceedings of the 2003 Conference on Emprical Methods in Natural.
Identifying “Best Bet” Web Search Results by Mining Past User Behavior Author: Eugene Agichtein, Zijian Zheng (Microsoft Research) Source: KDD2006 Reporter:
Data Mining By Farzana Forhad CS 157B. Agenda Decision Tree and ID3 Rough Set Theory Clustering.
Cluster Analysis, an Overview Laurie Heyer. Why Cluster? Data reduction – Analyze representative data points, not the whole dataset Hypothesis generation.
Principles in the Evolutionary Design of Digital Circuits J. F. Miller, D. Job, and V. K. Vassilev Genetic Programming and Evolvable Machines.
Dr. Chen, Data Mining  A/W & Dr. Chen, Data Mining Chapter 3 Basic Data Mining Techniques Jason C. H. Chen, Ph.D. Professor of MIS School of Business.
5. Methodology Compare the performance of XCS with an implementation of C4.5, a decision tree algorithm called J48, on the reminder generation task. Exemplar.
Data Mining CH6 Implementation: Real machine learning schemes(2) Reporter: H.C. Tsai.
An Evolutionary Algorithm for Neural Network Learning using Direct Encoding Paul Batchis Department of Computer Science Rutgers University.
George Yauneridge.  Machine learning basics  Types of learning algorithms  Genetic algorithm basics  Applications and the future of genetic algorithms.
Genetic Algorithm. Outline Motivation Genetic algorithms An illustrative example Hypothesis space search.
University of Essex Alma Lilia García Almanza Edward P.K. Tsang Simplifying Decision Trees Learned by Genetic Programming.
Great Workshop La Palma -June 2011 Handling Imbalanced Datasets in Multistage Classification Mauro López Centro de Astrobiología.
GP FOR ADAPTIVE MARKETS Jake Pacheco 6/11/2010. The Goal  Produce a system that can create novel quantitative trading strategies for the stock market.
Evolving Decision Rules (EDR)
Rule Induction for Classification Using
Presented by: Dr Beatriz de la Iglesia
Heuristic-based Recommendation for Metamodel–OCL Coevolution
iSRD Spam Review Detection with Imbalanced Data Distributions
Lecture 4. Niching and Speciation (1)
Presentation transcript:

Three methods developed for these objectives Based on machine learning and supervised learning Under the evolutionary paradigm –specifically Genetic Programming (GP) Objectives: 1.Improve precision and recall 2.Classify minority class in extreme imbalanced datasets 2. Produce a range of rules to suit user's preferences 3. Generate comprehensible solutions for user interaction. Post-processing of Decision Trees Alma Carcia Almanza Edward Tsang

New Classification Methods For Gathering Patterns in the context of Genetic Programming Alma Lilia Garcia & Edward Tsang University of Essex

New Classification Methods Scenario Method (Pruning)Pruning Repository Method R α … Rµ R’ k (Rules Collection)Rules Collection Evolving Decision Rules Repository Method + Evolution =

Methods overview Repository Method (RM) 1.Classify minority class in imbalanced data sets 2.Produce a range of classifications to suit the user's preferences 3.Provide understandable rules Our aim is to extract and collect different patterns that classify the positive cases (rare instances) in different ways. Evolving decision rules Evolving decision rules (EDR) EDR evolves a population of decision trees to form a repository of rules. The resulting rules are used to create a range of classifications Scenario Method (SM) 1. Analyze decision tree to detect and remove the rules that do not contribute to the classification task. Scenario Method (SM) is a pruning procedure for decision trees created by GP. This pruning is based on the analysis of patterns in the decision tree.

Contributions overview ContributionMethod Minority Class detection Repository Method Evolving Decision rules Simplification Repository Method Evolving Decision rules Scenario Method Range of classifications Repository Method Evolving Decision rules Scenario Method Pruning of trees Scenario Method Analysis in GP Repository Method

In order to mine the knowledge acquired by the evolutionary process Repository Method performs the following steps: Evolve a GP to create a population of decision trees R1R2…RnR1R2…Rn The rule R k is selected by precision; R k is simplified to R’ k 1- Rule extraction 2- Rule simplification R’ k is compared to the rules in the repository by similarity (genotype) R α … Rµ 3- New rule detection 4- Add rule to the repository If R’ k is a new rule R’ k is added to the rule repository R’ k RM Presentation

Evolving Decision Trees EDR Presentation Initial Population generated at random R1R2…R3R1R2…R3... Every Rule R k whose precision achieve a predefined precision threshold is simplified to remove redundant and vacuous conditions Every decision tree is divided in rules (patterns) R’ k Is compared to the rules in a rule collection RaRb…RnRaRb…Rn if R’k is a new rule it is added to the rule set R’ k generate a new population of decision tree by mutating and hill- climbing on the rules in the repository and generating trees at random... The new population is processed until the maximum number of iterations is reached

Scenario Method R1= { 2, 7, 10} R2= { 2, 14, 18} R3= { 2, 14, 21} IF 0 AND 1 > 2 Var2 3 AND 6 Var3 4 OR 5 AND 13 > 7 Var > 10 Var > 14 Var2 15 Var5 16 OR 17 > 18 Var > 21 Var9 22 > 24 OR 25 The tree is composed by the following rules Where the numbers represent the node of the condition  Evaluate every decision rule  Consider that the R2 is not contributing to the classification task. Thus, we analyze every condition in R2 to determine which conditions are involved in other rules  The only condition that is not involved in the other rules is 18  Remove the condition Procedure R1= { 2, 7, 10} R2= { 2, 14, 21} Var3 23 SM Presentation