AI Week 14 Machine Learning: Introduction to Data Mining Lee McCluskey, room 3/10


Artform Research Group
Data Mining: from Machine Learning and Databases
DM involves discovering patterns in large databases or data warehouses for different purposes. It is the science of extracting meaningful information from (large) databases.
Two Types of Learning: Data Mining can be
- “Learning from Example” (Classification), where we want to learn the features that are characteristic of a class, e.g. the environmental conditions that lead to an earthquake. Classes can be binary, e.g. spam or not spam. Classes can be many, e.g. classification of documents.
- “Learning from Observation” (Knowledge Discovery), where we have lots of observations and we want the DM to discover interesting patterns. We might want to analyse “raw data” (e.g. points in space) to see if there are any patterns, or analyse records and discover patterns in a relational DB (e.g. a data warehouse).

Data Mining
Predominantly the techniques used in DM are SYNTACTIC and STATISTICAL.
Applications: data mining and knowledge discovery techniques have been applied to many areas, including:
- Market analysis and retail
- Decision support
- Financial analysis
- Discovering environmental trends
- Disease analysis
- Traffic trend analysis
We will focus on learning RULES.

Data Mining of Rules: Example Inputs
Input to Data Mining Algorithms: sets of records, e.g. like database records. For example:
- a shopping list might be considered a record where the data fields are “nominal”
- an environmental observation (temp, wind speed, pressure, wind direction, time), where the data fields are more complex, e.g. real numbers
Classification Rule Mining: the input also includes a class we are interested in characterising (depending on the type of learning).
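
As a rough illustration of the two kinds of record described above, the sketch below holds a shopping list as a set of nominal items and an environmental observation as a record with numeric fields. The class name `Observation`, the field names and units, the class label and all values are assumptions made up for this example.

```python
from dataclasses import dataclass

# A shopping list as a record of nominal (categorical) fields:
# each item is simply present or absent. Items are hypothetical.
shopping_record = {"bread", "milk", "eggs"}


@dataclass
class Observation:
    """One environmental observation; names and units are illustrative."""
    temp_c: float
    wind_speed_kmh: float
    pressure_hpa: float
    wind_direction: str  # nominal field, e.g. "NW"
    time: str            # kept as text for simplicity


obs = Observation(12.5, 30.0, 1012.0, "NW", "09:00")

# For classification rule mining each record also carries a class label.
labelled_record = (obs, "storm")  # hypothetical class
```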

Data Mining of Rules: Example Outputs
Classification Rule Mining: each record is input with a label for the class C(i) it is an example of, and the OUTPUT is a (set of) classification rules
    Features => C(1)
    ….
    Features => C(n)
that can be used in the future to put a record into a class.
Association Rule Mining: a set of the most common association rules between features within records is output, e.g. if a record with a certain set of features {x, y, z, …} is found, then it is likely that the following are also present: {a, b, c, …}.
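
The slide does not say how "most common" or "likely" is measured. A usual choice, assumed here, is support (how often the features occur together) and confidence (how often the right-hand side is present when the left-hand side is). The sketch below only considers rules between single items; the records, thresholds and function names are hypothetical.

```python
from itertools import combinations

# Hypothetical shopping-list records (nominal items only).
records = [
    {"bread", "milk"},
    {"bread", "milk", "eggs"},
    {"bread", "butter"},
    {"milk", "eggs"},
]


def support(itemset, records):
    """Fraction of records that contain every item in itemset."""
    return sum(itemset <= r for r in records) / len(records)


def confidence(lhs, rhs, records):
    """Of the records containing lhs, the fraction that also contain rhs."""
    lhs_support = support(lhs, records)
    if lhs_support == 0:
        return 0.0  # lhs never occurs, so the rule says nothing
    return support(lhs | rhs, records) / lhs_support


def pair_rules(records, min_support=0.5, min_confidence=0.6):
    """Return rules {x} => {y} between single items that pass both thresholds."""
    items = sorted(set().union(*records))
    rules = []
    for x, y in combinations(items, 2):
        for lhs, rhs in (({x}, {y}), ({y}, {x})):
            if (support(lhs | rhs, records) >= min_support
                    and confidence(lhs, rhs, records) >= min_confidence):
                rules.append((lhs, rhs))
    return rules


for lhs, rhs in pair_rules(records):
    print(sorted(lhs), "=>", sorted(rhs))
```

Real association rule miners also search larger feature sets on each side of the rule; this sketch stops at pairs to keep the idea visible.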

Data Mining and Data Cleansing
Data Mining is often part of a larger process aimed at getting more out of data warehouses, and this process involves data cleansing.
Data cleansing is the process of identifying corrupted or incomplete records in a database and removing or correcting them. This makes the data consistent with other similar data sets in the database. E.g. the process may remove invalid post codes and spurious extreme values (e.g. ).
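
A minimal sketch of the removal side of data cleansing, assuming a small list of customer records. The records, the age limit and the simplified post code pattern are all hypothetical; the pattern is not a full post code validator, and correcting records (rather than dropping them) is not shown.

```python
import re

# Hypothetical customer records; None marks a missing value.
records = [
    {"name": "A", "postcode": "HD1 3DH", "age": 34},
    {"name": "B", "postcode": "XXXX", "age": 41},      # invalid post code
    {"name": "C", "postcode": "LS2 9JT", "age": 560},  # spurious extreme value
    {"name": "D", "postcode": None, "age": 28},        # missing field
]

# Simplified UK-style post code pattern, assumed for this sketch only.
POSTCODE = re.compile(r"^[A-Z]{1,2}\d[A-Z\d]? \d[A-Z]{2}$")


def is_clean(record, max_age=120):
    """True if the record has no missing, invalid or extreme field."""
    postcode = record.get("postcode")
    age = record.get("age")
    if postcode is None or age is None:
        return False
    if not POSTCODE.match(postcode):
        return False
    return 0 <= age <= max_age


cleaned = [r for r in records if is_clean(r)]
print([r["name"] for r in cleaned])
```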

Classification Rule Mining: Rule Induction and Use

Classification Rule Mining: jargon
A classification rule LHS => C is built up from examples (and counter-examples) of a class C. A rule
-- covers an example if the features of its LHS are present in the example
-- is characteristic if it covers all members of the class
-- is maximally characteristic if it contains the largest LHS that covers all members of the class
-- is discriminating if it covers NO counter-examples (= examples of other classes, if classes are disjoint)
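
The four definitions above can be sketched as small predicates. This assumes that a rule's LHS is a conjunction of attribute = value tests, held as a dictionary, and that examples are dictionaries too; the function names and the toy data are hypothetical.

```python
def covers(lhs, example):
    """A rule covers an example if every feature of its LHS is present in it."""
    return all(example.get(attr) == value for attr, value in lhs.items())


def is_characteristic(lhs, positives):
    """Characteristic: the rule covers all members of the class."""
    return all(covers(lhs, e) for e in positives)


def is_discriminating(lhs, negatives):
    """Discriminating: the rule covers no counter-examples."""
    return not any(covers(lhs, e) for e in negatives)


def maximally_characteristic(positives):
    """Largest LHS covering all members: the features they all share."""
    first, *rest = positives
    return {a: v for a, v in first.items()
            if all(e.get(a) == v for e in rest)}


# Hypothetical toy data: two examples of a class and one counter-example.
positives = [
    {"size": "small", "colour": "red", "shape": "circle"},
    {"size": "large", "colour": "red", "shape": "circle"},
]
negatives = [
    {"size": "small", "colour": "blue", "shape": "circle"},
]

print(is_characteristic({"colour": "red"}, positives))    # True
print(is_discriminating({"shape": "circle"}, negatives))  # False
print(maximally_characteristic(positives))
```

Here `{"colour": "red"}` is both characteristic and discriminating, while `{"shape": "circle"}` is characteristic but also covers the counter-example.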

Classification Rule Mining: jargon
[Figure: an example space, with points marked E and X, alongside a hypothesis space showing a characteristic hypothesis and a discriminating hypothesis. VERSION SPACE: the set of all characteristic and discriminating hypotheses.]

Classification Rule Mining: example
Size = medium, colour = green, shape = square => c1
Size = small, colour = red, shape = square => c1
Size = small, colour = blue, shape = circle => c1
Size = small, colour = green, shape = triangle => c2
Size = large, colour = white, shape = circle => c2
We aim to find “hypotheses” that are characteristic and discriminating.
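
As a sketch only, the five examples above can be searched for single-feature hypotheses about class c1. The restriction to one attribute = value test per rule is an assumption made to keep the search small, and the function and variable names are hypothetical.

```python
# The five labelled examples from the slide.
examples = [
    ({"size": "medium", "colour": "green", "shape": "square"}, "c1"),
    ({"size": "small", "colour": "red", "shape": "square"}, "c1"),
    ({"size": "small", "colour": "blue", "shape": "circle"}, "c1"),
    ({"size": "small", "colour": "green", "shape": "triangle"}, "c2"),
    ({"size": "large", "colour": "white", "shape": "circle"}, "c2"),
]


def covers(lhs, example):
    """True if every feature of the LHS is present in the example."""
    return all(example.get(a) == v for a, v in lhs.items())


def single_feature_hypotheses(examples, target):
    """Split one-feature rules for target into characteristic and discriminating lists."""
    positives = [e for e, c in examples if c == target]
    negatives = [e for e, c in examples if c != target]
    features = sorted({(a, v) for e, _ in examples for a, v in e.items()})
    characteristic, discriminating = [], []
    for attr, value in features:
        lhs = {attr: value}
        if all(covers(lhs, e) for e in positives):
            characteristic.append(lhs)
        # Keep only discriminating rules that cover at least one member.
        if (any(covers(lhs, e) for e in positives)
                and not any(covers(lhs, e) for e in negatives)):
            discriminating.append(lhs)
    return characteristic, discriminating


characteristic_c1, discriminating_c1 = single_feature_hypotheses(examples, "c1")
print("characteristic:", characteristic_c1)
print("discriminating:", discriminating_c1)
```

Under this restriction no single rule is characteristic for c1, because the three c1 examples share no feature value. Several rules are discriminating, though, and two of them together (shape = square, colour = blue) cover every c1 example and no c2 example, which is one reason a classifier is taken to be a set of rules.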

Classification Rule Mining: Use
Typically two sets of data are used in data mining:
1. Training Set
2. Validation Set
These sets are randomly selected. A classifier is a set of classification rules. The rules are formed on set (1) and tested on set (2) to find out their accuracy. In (two-fold) cross-validation the sets are then swapped round: the training set becomes the validation set and vice versa.
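
The train, validate and swap procedure can be sketched as follows. The learner `learn` is a toy stand-in and not a method from these slides: it keeps every single-feature rule that is seen with one class only in the training set. All function names, the 50/50 split and the seed are assumptions.

```python
import random


def split(records, train_fraction=0.5, seed=0):
    """Randomly split records into a training set and a validation set."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def learn(training):
    """Toy learner: single-feature rules seen with exactly one class."""
    classes_seen = {}
    for example, cls in training:
        for feature in example.items():
            classes_seen.setdefault(feature, set()).add(cls)
    return [({a: v}, next(iter(classes_seen[(a, v)])))
            for a, v in sorted(classes_seen)
            if len(classes_seen[(a, v)]) == 1]


def classify(rules, example, default=None):
    """Return the class of the first rule whose LHS covers the example."""
    for lhs, cls in rules:
        if all(example.get(a) == v for a, v in lhs.items()):
            return cls
    return default


def accuracy(rules, labelled):
    """Fraction of labelled examples that the rule set classifies correctly."""
    return sum(classify(rules, e) == c for e, c in labelled) / len(labelled)


def two_fold(records, seed=0):
    """Form rules on one set, test on the other, then swap the sets round."""
    a, b = split(records, 0.5, seed)
    return accuracy(learn(a), b), accuracy(learn(b), a)


# Hypothetical labelled records in the attribute = value style used earlier.
records = [
    ({"size": "medium", "shape": "square"}, "c1"),
    ({"size": "small", "shape": "square"}, "c1"),
    ({"size": "small", "shape": "circle"}, "c1"),
    ({"size": "small", "shape": "triangle"}, "c2"),
    ({"size": "large", "shape": "circle"}, "c2"),
    ({"size": "large", "shape": "triangle"}, "c2"),
]
print(two_fold(records))
```

With so few records the two accuracy figures vary a great deal with the random split, which is why realistic experiments use much larger sets.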

Conclusions
- Data Mining is a powerful set of techniques to help analyse data and discover hidden knowledge.
- There is a growing amount of data available.
- DM has many applications.
- DM can be supervised or unsupervised.