DATA MINING AND MACHINE LEARNING
Addison Euhus and Dan Weinberg

Introduction
- Based on Data Mining: Practical Machine Learning Tools and Techniques (Witten, Frank, Hall)
- What is data mining/machine learning? A "relationship finder": it sorts through data, and is computer-based for speed and efficiency
- Weka freeware and its applications

Components of Machine Learning
- Concept/relation: what is being learned from the data set
- Instances: the "input"
- Attributes: the "values", of four types: nominal, ordinal, interval, ratio ("name, order, distance, number")

The Output
- Takes a variety of forms depending on the concept to be learned: a table, regression, chart/diagram, grouping/clustering, decision tree, or set of rules
- Different outputs describe different things about the same set of data
- Cross-validation is used to estimate how well the learned model generalizes
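The cross-validation idea mentioned above can be sketched in a few lines of Python (a toy illustration with names of our choosing; the deck itself uses Weka's built-in cross-validation):

```python
def k_fold(data, k):
    """Yield (train, test) pairs for k-fold cross-validation.

    Every instance lands in exactly one test fold, so each model
    is evaluated only on data it was not trained on.
    """
    for i in range(k):
        test = data[i::k]                                    # every k-th instance, offset i
        train = [x for j, x in enumerate(data) if j % k != i]
        yield train, test

# Example: a 3-fold split of six instances
for train, test in k_fold(list(range(6)), 3):
    assert not set(train) & set(test)  # train and test never overlap
```

Averaging the accuracy over the k test folds gives a less optimistic estimate than testing on the training data itself.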

Some Algorithms in Machine Learning
Decision Trees (ID3 method)
- Split on an attribute to create tree nodes, categorize the data, and continue splitting into further nodes
- Choose the attribute that is most "pure": quantitatively, the one that yields the most "information"
- entropy(x, y, z) = -x log2(x) - y log2(y) - z log2(z), the amount of uncertainty in the data set
- info([a, b, c]) = entropy(a/t, b/t, c/t), where t = a + b + c

Decision Trees Continued
- gain(attribute) = info(unsplit) - info(split), where info(split) is the size-weighted average of the info values of the branches
- The attribute with the most gain is split first in the decision tree algorithm
- An example: gain = info([9,5]) - info([2,3],[4,0],[3,2]) = 0.247
[Figure: a one-attribute split, with the yes/no class labels shown at each branch]
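The entropy, info, and gain formulas above can be checked with a short Python sketch (the function names are ours, not Weka's):

```python
from math import log2

def entropy(probs):
    """-sum(p * log2(p)): uncertainty in bits, skipping zero terms."""
    return -sum(p * log2(p) for p in probs if p > 0)

def info(counts):
    """info([a, b, ...]) = entropy(a/t, b/t, ...) where t is the total count."""
    t = sum(counts)
    return entropy(c / t for c in counts)

def gain(unsplit, splits):
    """Information gain: info before the split minus the
    size-weighted average info of the branches after it."""
    t = sum(unsplit)
    return info(unsplit) - sum(sum(s) / t * info(s) for s in splits)

# The worked example from the slide:
print(round(gain([9, 5], [[2, 3], [4, 0], [3, 2]]), 3))  # 0.247
```

ID3 evaluates this gain for every candidate attribute and splits on the largest.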

Other Algorithms
- Covering approach: creates a set of rules rather than a decision tree, but with a similar top-down strategy
- Begin with the class values and then choose the attribute that covers the most "positive instances"
- For ratio attributes, models such as a line of best fit and other regressions are used, e.g. least-squares regression
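For one attribute, the least-squares regression mentioned above has a simple closed form; a minimal sketch in pure Python (names ours):

```python
def least_squares(xs, ys):
    """Fit y = a + b*x by minimizing the sum of squared errors."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # Closed-form slope: covariance of x and y over variance of x
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    a = my - b * mx  # the fitted line passes through the means
    return a, b

a, b = least_squares([1, 2, 3], [2, 4, 6])
print(a, b)  # 0.0 2.0
```

With several ratio attributes the same idea generalizes to multiple linear regression, which Weka solves in matrix form.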

Instance-Based Learning
- At run time the program uses a variety of distance calculations to find the stored instances most similar to the test instance
- Finds the "nearest neighbor" based on these "distances": the closest instances have their value assigned to the test instance
- Distances are typically standard Euclidean, though alternative forms exist
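A nearest-neighbor classifier with standard Euclidean distance can be sketched as follows (a toy illustration, not the Weka implementation):

```python
from math import sqrt

def euclidean(a, b):
    """Standard Euclidean distance between two feature vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_neighbor(train, query):
    """Return the label of the stored instance closest to the query.

    train is a list of (features, label) pairs; there is no training
    step beyond remembering the data.
    """
    _, label = min(train, key=lambda inst: euclidean(inst[0], query))
    return label

train = [((0.0, 0.0), "low"), ((5.0, 5.0), "high")]
print(nearest_neighbor(train, (1.0, 1.0)))  # low
```

k-NN generalizes this by taking a majority vote among the k closest instances instead of copying a single neighbor.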

Applications
- Wine Quality data set
- Nearest neighbor: 65.39% accuracy
- Decision tree: 58.25% accuracy

Wine Quality Output
[Screenshots: nearest-neighbor and decision-tree output for the Wine Quality data set]

A Challenging Classification Problem: Shortfalls and Limitations
- Poker hands (tested with over 5,000 different instances)
- Decision tree: 55.90% accuracy; nearest neighbor: 46.86%; some faster algorithms: under 35%
- The issue: suit is encoded numerically (0, 1, 2, 3), and the order in which the cards are presented also affects the representation, even though neither is meaningful numerically