Chapter 8 The k-Means Algorithm and Genetic Algorithm.


Data Warehouse and Data Mining Chapter 8
Contents
- k-Means algorithm
- Genetic algorithm
- Rough set approach
- Fuzzy set approaches

The K-Means Algorithm
The K-Means algorithm is a simple yet effective statistical clustering technique. Here is the algorithm:
1. Choose a value for K, the total number of clusters to be determined.
2. Choose K instances (data points) within the dataset at random. These are the initial cluster centers.
3. Use simple Euclidean distance to assign the remaining instances to their closest cluster center.
4. Use the instances in each cluster to calculate a new mean for each cluster.
5. If the new mean values are identical to the mean values of the previous iteration, the process terminates. Otherwise, use the new means as cluster centers and repeat steps 3-5.
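
The five steps above can be sketched in Python roughly as follows. This is a minimal illustration, not code from the slides: the function name `k_means`, the `seed` and `max_iter` parameters, and the choice to leave an empty cluster's center unchanged are all assumptions made for the example.

```python
import random
from math import dist  # Euclidean distance, available in Python 3.8+


def k_means(points, k, seed=0, max_iter=100):
    """Cluster `points` (a list of numeric tuples) into k groups."""
    rng = random.Random(seed)
    # Steps 1-2: K is given; pick K instances at random as initial centers.
    centers = rng.sample(points, k)
    clusters = []
    for _ in range(max_iter):
        # Step 3: assign each instance to its closest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centers[i]))
            clusters[nearest].append(p)
        # Step 4: recompute each center as the mean of its cluster.
        new_centers = []
        for i, members in enumerate(clusters):
            if members:
                mean = tuple(sum(c) / len(members) for c in zip(*members))
                new_centers.append(mean)
            else:
                # Assumption: an empty cluster keeps its previous center.
                new_centers.append(centers[i])
        # Step 5: stop when the means are identical to the previous ones.
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters
```

Calling `k_means(points, 2)` on a small list of 2-D tuples returns the final centers together with the instances assigned to each one. Because the starting centers are random, different seeds can lead to different final clusterings on less well-separated data.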

The K-Means Algorithm: An Example Using K-Means [two slides; content not preserved in this transcript]

The K-Means Algorithm: General Considerations [two slides; content not preserved in this transcript]

The k-Nearest Neighbor Algorithm
- All instances correspond to points in n-dimensional space.
- The nearest neighbors are defined in terms of Euclidean distance.
- The target function can be discrete-valued or real-valued.
[Figure: a query point x_q surrounded by positive (+) and negative (-) training examples]

The k-Nearest Neighbor Algorithm
- For a discrete-valued target function, k-NN returns the most common value among the k training examples nearest to x_q.
- Voronoi diagram: the decision surface induced by 1-NN for a typical set of training examples.
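
For the discrete-valued case, the majority vote among the k nearest training examples might look like the sketch below. The function name `knn_classify` and the `(point, label)` data layout are assumptions chosen for illustration.

```python
from collections import Counter
from math import dist  # Euclidean distance, Python 3.8+


def knn_classify(training, query, k):
    """training: list of (point, label) pairs; returns the majority label
    among the k training examples nearest to `query`."""
    neighbors = sorted(training, key=lambda pair: dist(pair[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]
```

With k = 1 this reduces to returning the label of the single closest training example, which is the case whose decision surface the Voronoi diagram describes.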

Discussion on the k-NN Algorithm
- The k-NN algorithm for continuous-valued target functions
  - Calculate the mean value of the k nearest neighbors
- Distance-weighted nearest neighbor algorithm
  - Weight the contribution of each of the k neighbors according to its distance to the query point x_q, giving greater weight to closer neighbors
  - Similarly, for real-valued target functions
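
Both variants for real-valued targets can be sketched in one small function. The slides do not specify a weighting formula, so the inverse-square weight `1 / d**2` used here is an assumption (one common choice), as are the name `knn_regress` and its parameters.

```python
from math import dist  # Euclidean distance, Python 3.8+


def knn_regress(training, query, k, weighted=False):
    """training: list of (point, value) pairs with real-valued targets."""
    neighbors = sorted(training, key=lambda pair: dist(pair[0], query))[:k]
    if not weighted:
        # Plain k-NN: mean value of the k nearest neighbors.
        return sum(value for _, value in neighbors) / len(neighbors)
    total, weight_sum = 0.0, 0.0
    for point, value in neighbors:
        d = dist(point, query)
        if d == 0:
            return value  # query coincides with a training point
        w = 1.0 / d ** 2  # assumed weighting: closer neighbors count more
        total += w * value
        weight_sum += w
    return total / weight_sum
```

For a query at 2.5 with neighbors at 3.0 (value 30) and 1.0 (value 10), the unweighted estimate is 20, while the weighted estimate moves to 28 because the neighbor at 3.0 is much closer.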

Genetic Learning
Here we present a basic genetic learning algorithm.
1. Initialize a population P of n elements, often referred to as chromosomes, as a potential solution.
2. Until a specified termination condition is satisfied:
   a. Use a fitness function to evaluate each element of the current solution. If an element passes the fitness criteria, it remains in P.
   b. The population now contains m elements (m <= n). Use genetic operators to create (n - m) new elements. Add the new elements to the population.
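
The loop above can be illustrated with a short Python sketch. Several details are assumptions, since the slide leaves them open: chromosomes are lists of bits, the termination condition is a fixed number of generations, the fitness criterion keeps the fitter half of the population, and new elements come from single-point crossover plus an occasional one-bit mutation.

```python
import random


def genetic_learning(fitness, n=20, length=12, generations=60, seed=0):
    """Return the fittest chromosome found; `fitness` maps a bit list to a number."""
    rng = random.Random(seed)
    # Step 1: initialize a population P of n chromosomes (random bit lists).
    population = [[rng.randint(0, 1) for _ in range(length)] for _ in range(n)]
    # Step 2: assumed termination condition = fixed number of generations.
    for _ in range(generations):
        # Step 2a: evaluate each element; assumed criterion = fitter half stays.
        ranked = sorted(population, key=fitness, reverse=True)
        survivors = ranked[: n // 2]  # m elements remain in P (m <= n)
        # Step 2b: use genetic operators to create the n - m new elements.
        children = []
        while len(survivors) + len(children) < n:
            mom, dad = rng.sample(survivors, 2)
            cut = rng.randrange(1, length)  # single-point crossover
            child = mom[:cut] + dad[cut:]
            if rng.random() < 0.2:  # assumed mutation probability
                i = rng.randrange(length)
                child[i] = 1 - child[i]
            children.append(child)
        population = survivors + children
    return max(population, key=fitness)
```

As a toy usage example, `genetic_learning(sum)` searches for the bit list with the most ones. Because survivors are carried over unchanged, the best fitness in the population never decreases from one generation to the next.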

Genetic Learning: Genetic Algorithms and Supervised Learning [five slides; content not preserved in this transcript]

Genetic Learning: Genetic Algorithms and Unsupervised Clustering [two slides; content not preserved in this transcript]

Genetic Learning: General Considerations
Here is a list of considerations when using a problem-solving approach based on genetic learning:
- Genetic algorithms are designed to find globally optimized solutions. However, there is no guarantee that any given solution is not the result of a local rather than a global optimization.
- The fitness function determines the computational complexity of a genetic algorithm. A fitness function involving several calculations can be computationally expensive.
- Genetic algorithms explain their results to the extent that the fitness function is understandable.
- Transforming the data to a form suitable for a genetic algorithm can be a challenge.

Genetic Algorithms
- GA: based on an analogy to biological evolution
- Each rule is represented by a string of bits
- An initial population is created consisting of randomly generated rules
- Based on the notion of survival of the fittest, a new population is formed that consists of the fittest rules and their offspring
- The fitness of a rule is represented by its classification accuracy on a set of training examples
- Offspring are generated by crossover and mutation
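
To make "fitness as classification accuracy" concrete, here is a small sketch. The encoding is an assumption for illustration only: two Boolean attributes A1 and A2 followed by one class bit, so that "100" reads "IF A1 = 1 AND A2 = 0 THEN class = 0". Measuring accuracy only over the examples the rule covers is also an assumed choice, as is the name `rule_fitness`.

```python
def rule_fitness(rule, examples):
    """rule: bit string, e.g. "100" = IF A1=1 AND A2=0 THEN class=0.
    examples: list of ((a1, a2), cls) tuples with 0/1 values.
    Returns the fraction of covered examples that the rule classifies correctly."""
    *antecedent, consequent = (int(b) for b in rule)
    # Classes of the training examples whose attributes match the rule.
    covered = [cls for attrs, cls in examples if list(attrs) == antecedent]
    if not covered:
        return 0.0  # assumption: a rule that covers nothing gets zero fitness
    return sum(cls == consequent for cls in covered) / len(covered)
```

A genetic algorithm would call a function like this on every rule in the population and favor the higher-scoring bit strings when forming the next generation.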

Genetic Algorithms
- Population-based technique for discovery of knowledge structures
- Based on the idea that evolution represents a search for an optimum solution set
- Massively parallel

The Vocabulary of GAs
- Population: set of individuals, each represented by one or more strings of characters
- Chromosome: the string representing an individual
- Gene: the basic informational unit on a chromosome
- Locus: the ordinal place on a chromosome where a specific gene is found
- Allele: the value of a specific gene

Genetic operators
- Reproduction: increase representations of strong individuals
- Crossover: explore the search space
- Mutation: recapture “lost” genes due to crossover
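
The three operators can be sketched on bit-string chromosomes as follows. The slides do not fix a particular variant, so fitness-proportionate sampling for reproduction, single-point crossover, and per-bit flipping for mutation are assumptions, as are the function names.

```python
import random


def reproduce(population, fitness, rng):
    """Reproduction: resample the population so that fitter individuals
    appear more often. Assumes at least one individual has positive fitness."""
    weights = [fitness(ind) for ind in population]
    return rng.choices(population, weights=weights, k=len(population))


def crossover(a, b, point):
    """Single-point crossover: exchange the tails of two chromosomes."""
    return a[:point] + b[point:], b[:point] + a[point:]


def mutate(chromosome, rate, rng):
    """Mutation: flip each bit independently with probability `rate`."""
    return "".join(
        ("1" if bit == "0" else "0") if rng.random() < rate else bit
        for bit in chromosome
    )
```

For example, `crossover("11111", "00000", 2)` yields `("11000", "00111")`. If every chromosome in a population has a 0 at some locus, crossover alone can never produce a 1 there; mutation is the operator that can reintroduce it.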

Genetic operators illustrated [figure not preserved in this transcript]

GAs rely on the concept of “fitness”
- Ability of an individual to survive into the next generation
- “Survival of the fittest”
- Usually calculated in terms of an objective fitness function
  - Maximization
  - Minimization
  - Other functions

Genetic Programming
- Based on adaptation and evolution
- Structures undergoing adaptation are computer programs of varying size and shape
- Computer programs are genetically “bred” over time

The Learning Classifier System
- Rule-based knowledge discovery and concept learning tool
- Operates by means of evaluation, credit assignment, and discovery applied to a population of “chromosomes” (rules), each with a corresponding “phenotype” (outcome)

Components of a Learning Classifier System
- Performance
  - Provides interaction between environment and rule base
  - Performs matching function
- Reinforcement
  - Rewards accurate classifiers
  - Punishes inaccurate classifiers
- Discovery
  - Uses the genetic algorithm to search for plausible rules

Rough Set Approach
- Rough sets are used to approximately or “roughly” define equivalence classes
- A rough set for a given class C is approximated by two sets:
  - a lower approximation (certain to be in C)
  - an upper approximation (cannot be described as not belonging to C)
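
The two approximations can be computed directly once samples are grouped by their attribute values. The sketch below assumes that samples with identical attribute tuples are indistinguishable and therefore fall into the same equivalence class; the function name and data layout are hypothetical.

```python
from collections import defaultdict


def rough_approximations(samples, target):
    """samples: dict mapping a sample name to its tuple of attribute values.
    target: set of sample names known to belong to class C.
    Returns (lower, upper) approximations of C as sets of names."""
    # Group samples that cannot be told apart by the available attributes.
    classes = defaultdict(set)
    for name, attrs in samples.items():
        classes[attrs].add(name)
    lower, upper = set(), set()
    for members in classes.values():
        if members <= target:  # every member is in C: certain to be in C
            lower |= members
        if members & target:   # some member is in C: cannot be ruled out
            upper |= members
    return lower, upper
```

Samples in the upper approximation but not in the lower one form the boundary region: the available attributes are not enough to decide whether they belong to C.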

Fuzzy Set Approaches
- Fuzzy logic uses truth values between 0.0 and 1.0 to represent the degree of membership (for example, using a fuzzy membership graph)
- Attribute values are converted to fuzzy values
  - e.g., income is mapped into the discrete categories {low, medium, high} with fuzzy values calculated

Fuzzy Set Approaches
- For a given new sample, more than one fuzzy value may apply
- Each applicable rule contributes a vote for membership in the categories
- Typically, the truth values for each predicted category are summed
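
A rough sketch of the income example follows. Everything specific in it is assumed for illustration: income is in thousands, the membership breakpoints are 20, 40, and 60, the membership shapes are piecewise linear, and the rules map income categories to hypothetical "approve" and "deny" outcomes.

```python
def income_memberships(income):
    """Degree of membership (0.0 to 1.0) of `income` in each fuzzy category.
    Breakpoints 20/40/60 are illustrative assumptions."""
    def clamp(x):
        return max(0.0, min(1.0, x))

    return {
        "low": clamp((40 - income) / 20),             # 1 at <= 20, 0 at >= 40
        "medium": clamp(1 - abs(income - 40) / 20),   # peaks at 40
        "high": clamp((income - 40) / 20),            # 0 at <= 40, 1 at >= 60
    }


def fuzzy_vote(memberships, rules):
    """rules: list of (category, predicted_class) pairs.
    Each rule votes with the truth value of its category; votes are summed
    per predicted class."""
    totals = {}
    for category, predicted in rules:
        totals[predicted] = totals.get(predicted, 0.0) + memberships[category]
    return totals
```

An income of 45 is partly "medium" (0.75) and partly "high" (0.25), so more than one fuzzy value applies. With the hypothetical rules low -> deny, medium -> approve, high -> approve, the summed truth values are 1.0 for "approve" and 0.0 for "deny".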

Chapter Summary
The K-Means algorithm is a statistical unsupervised clustering technique. All input attributes to the algorithm must be numeric, and the user is required to decide how many clusters are to be discovered. The algorithm begins by randomly choosing one data point to represent each cluster. Each data instance is then placed in the cluster to which it is most similar. New cluster centers are computed and the process continues until the cluster centers do not change.
The K-Means algorithm is easy to implement and understand. However, the algorithm is not guaranteed to converge to a globally optimal solution, lacks the ability to explain what has been found, and is unable to tell which attributes are significant in determining the formed clusters. Despite these limitations, the K-Means algorithm is among the most widely used clustering techniques.
Genetic algorithms apply the theory of evolution to inductive learning. Genetic learning can be supervised or unsupervised and is typically used for problems that cannot be solved with traditional techniques. A standard genetic approach to learning applies a fitness function to a set of data elements to determine which elements survive from one generation to the next. Those elements not surviving are used to create new instances to replace deleted elements. In addition to being used for supervised learning and unsupervised clustering, genetic techniques can be employed in conjunction with other learning techniques.

Key Terms
- Affinity analysis. The process of determining which things are typically grouped together.
- Confidence. Given a rule of the form “If A then B,” confidence is defined as the conditional probability that B is true when A is known to be true.
- Crossover. A genetic learning operation that creates new population elements by combining parts of two or more elements from the current population.
- Genetic algorithm. A data mining technique based on the theory of evolution.
- Mutation. A genetic learning operation that creates a new population element by randomly modifying a portion of an existing element.
- Selection. A genetic learning operation that adds copies of current population elements with high fitness scores to the next generation of the population.

Reference
Data Mining: Concepts and Techniques (Chapter 7 slides for textbook), Jiawei Han and Micheline Kamber, Intelligent Database Systems Research Lab, School of Computing Science, Simon Fraser University, Canada