Data Mining
Chapter 4_2: Classification Methods (Examples)
Prepared by: Mahmoud Rafeek Al-Farra, 2013
College of Science & Technology, Dep. of Computer Science & IT, BSc of Information Technology
www.cst.ps/staff/mfarra
Course Outline
- Introduction
- Data Preparation and Preprocessing
- Data Representation
- Classification Methods
- Evaluation
- Clustering Methods
- Mid Exam
- Association Rules
- Knowledge Representation
- Special Case Study: Document Clustering
- Discussion of Case Studies by Students
Outline
- Comparing Classification Methods
- Machine Learning Techniques
- Decision Trees
- k-Nearest Neighbors
- Naïve Bayesian Classifiers
- Neural Networks
Comparing Classification Methods
- Predictive accuracy: the ability to correctly predict the class label.
- Speed: the computation cost of generating and using the model.
- Robustness: the ability to make correct predictions given noisy and/or missing values.
- Scalability: the ability to construct the model efficiently given large amounts of data.
- Interpretability: the level of understanding and insight that the model provides.
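Of these criteria, predictive accuracy is the easiest to put a number on. A minimal sketch, assuming accuracy is measured as the fraction of cases whose predicted label matches the true label; the function name `accuracy` and the label lists are made up for illustration and do not come from the slides.

```python
def accuracy(true_labels, predicted_labels):
    """Fraction of cases whose predicted class label equals the true label."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("need at least one labelled case")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


# Hypothetical labels, invented for this example only.
true_labels = ["Yes", "No", "Yes", "Yes", "No"]
predicted_labels = ["Yes", "No", "No", "Yes", "No"]

# Four of the five predictions match the true label.
print(accuracy(true_labels, predicted_labels))
```

The other criteria (speed, robustness, scalability, interpretability) need their own measurements or judgment and are not captured by this single number.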
Machine Learning Techniques
Things learn when they change their behavior in a way that makes them perform better in the future.
Machine learning is the subfield of artificial intelligence concerned with the design and development of algorithms that allow computers (machines) to improve their performance over time (to learn) based on data, for example sensor data or databases.
Examples of machine learning techniques:
- Decision Trees
- k-Nearest Neighbors
- Naïve Bayesian Classifiers
Decision Trees
Decision tree learning is a common method in data mining and an efficient way to produce classifiers from data.
A decision tree is a tree-structured plan of a set of attributes to test in order to predict the output.
A decision tree consists of:
- Internal nodes: each is a test on an attribute, e.g. Body temperature.
- Branches: each represents an outcome of the test, e.g. Warm.
- Leaf nodes: each represents a class label, e.g. Mammals.
At each node, one attribute is chosen to split the training examples into classes that are as distinct as possible.
A new case is classified by following the matching path to a leaf node.
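Weather Data: Play or not Play?
[Table: weather data set; the individual rows were not preserved. Columns and the values that appear in them:]
- Outlook: sunny, overcast, rain
- Temperature: hot, mild, cool
- Humidity: high, normal
- Windy: true, false
- Play? (class label): Yes, No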
Weather Data: Play or not Play?
Case Study: How to build a tree?
Outlook
  sunny -> Humidity
    high -> No
    normal -> Yes
  overcast -> Yes
  rain -> Windy
    true -> No
    false -> Yes
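The weather tree on this slide (Outlook at the root, Humidity tested under sunny, Windy tested under rain, overcast leading directly to Yes) can be used to classify a new case by following the matching path to a leaf. A minimal sketch, assuming the tree is stored as nested tuples and dictionaries; the names `WEATHER_TREE`, `classify`, and `new_case` are illustrative and not part of the slides.

```python
# Internal nodes are (attribute, {branch_value: subtree}); leaves are class labels.
WEATHER_TREE = ("Outlook", {
    "sunny": ("Humidity", {"high": "No", "normal": "Yes"}),
    "overcast": "Yes",
    "rain": ("Windy", {"true": "No", "false": "Yes"}),
})


def classify(tree, case):
    """Follow the branch that matches each tested attribute until a leaf is reached."""
    while not isinstance(tree, str):
        attribute, branches = tree
        tree = branches[case[attribute]]
    return tree


# Hypothetical new case, invented for this example.
new_case = {"Outlook": "sunny", "Temperature": "cool",
            "Humidity": "normal", "Windy": "false"}

# Path taken: Outlook = sunny, then Humidity = normal, ending at the leaf "Yes".
print(classify(WEATHER_TREE, new_case))
```

Temperature is never tested in this tree, so its value does not affect the result.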
How to build a tree?
- Top-down tree construction
- Which is the best attribute?
….
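The slide leaves the question of the best attribute open. One common answer, assumed here and not stated in the slides, is to pick the attribute with the highest information gain (the entropy-based criterion used by ID3-style algorithms). The sketch below uses a small hypothetical data set, not the course's weather table, and the names `entropy`, `information_gain`, and `best_attribute` are illustrative.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target="Play"):
    """Entropy of the target minus the weighted entropy after splitting on attribute."""
    base = entropy([row[target] for row in rows])
    remainder = 0.0
    for value in {row[attribute] for row in rows}:
        subset = [row[target] for row in rows if row[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder


def best_attribute(rows, attributes, target="Play"):
    """Attribute whose split yields the highest information gain."""
    return max(attributes, key=lambda a: information_gain(rows, a, target))


# Hypothetical rows, invented for illustration only.
rows = [
    {"Outlook": "sunny", "Windy": "false", "Play": "No"},
    {"Outlook": "sunny", "Windy": "true", "Play": "No"},
    {"Outlook": "overcast", "Windy": "false", "Play": "Yes"},
    {"Outlook": "overcast", "Windy": "true", "Play": "Yes"},
    {"Outlook": "rain", "Windy": "false", "Play": "Yes"},
    {"Outlook": "rain", "Windy": "true", "Play": "No"},
]

# Outlook separates the classes far better than Windy on these rows.
print(best_attribute(rows, ["Outlook", "Windy"]))
```

In top-down construction, the chosen attribute becomes the test at the current node, the rows are partitioned by its values, and the same selection is repeated on each partition until the rows in a partition share one class label.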
Next … k-Nearest Neighbors Naïve Bayesian Classifiers
Thanks