Published by Colin Wells. Modified over 9 years ago.
1
AI Week 14
Machine Learning: Introduction to Data Mining
Lee McCluskey, room 3/10
Email: lee@hud.ac.uk
http://scom.hud.ac.uk/scomtlm/cha2555/
2
Artform Research Group
Data Mining: from Machine Learning and Databases
DM involves discovering patterns in large databases or data warehouses for different purposes. It is the science of extracting meaningful information from (large) databases.
Two types of learning. Data Mining can be:
- “Learning from Example” (Classification), where we want to learn the features that characterise a class, e.g. the environmental conditions that lead to an earthquake. Classes can be binary (e.g. spam or not-spam) or many (e.g. classification of documents).
- “Learning from Observation” (Knowledge Discovery), where we have lots of observations and want DM to discover interesting patterns. We might want to analyse “raw data” (e.g. points in space) to see if there are any patterns, or analyse records and discover patterns in a relational DB (e.g. a data warehouse).
3
Data Mining
The techniques used in DM are predominantly SYNTACTIC and STATISTICAL.
Applications: data mining and knowledge discovery techniques have been applied to many areas, including:
- market analysis and retail
- decision support
- financial analysis
- discovering environmental trends
- disease analysis
- traffic trend analysis
We will focus on learning RULES.
4
Data Mining of Rules: Example Inputs
Input to data mining algorithms: sets of records, similar to database records. For example:
- a shopping list might be considered a record whose data fields are “nominal”
- an environmental observation (temp, wind speed, pressure, wind direction, time) is a record whose data fields are more complex, e.g. real numbers
For Classification Rule Mining, the input also includes a class we are interested in characterising (depending on the type of learning).
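As a minimal sketch, the two kinds of input record might be represented as below. The item names, the `Observation` class, its units, and the class label are illustrative assumptions, not taken from the slides.

```python
from dataclasses import dataclass

# A shopping list as a record of nominal fields: each item is simply present or absent.
shopping_list = frozenset({"bread", "milk", "eggs"})

@dataclass(frozen=True)
class Observation:
    """Hypothetical environmental-observation record with real-valued fields."""
    temp: float            # assumed degrees Celsius
    wind_speed: float      # assumed m/s
    pressure: float        # assumed hPa
    wind_direction: float  # assumed degrees clockwise from north
    time: str              # assumed ISO 8601 timestamp

# For classification rule mining, each record is paired with a class label
# ("calm" is a made-up label for illustration).
labelled = [(Observation(12.5, 4.0, 1013.2, 270.0, "2015-03-01T09:00"), "calm")]
```

Nominal fields only support equality tests (is "milk" present?), whereas real-valued fields usually need thresholds or ranges before they can appear in a rule.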
5
Data Mining of Rules: Example Outputs
Classification Rule Mining: each record is input together with the label of the class C(i) it is an example of, and the OUTPUT is a set of classification rules
Features => C(1)
…
Features => C(n)
that can be used in the future to put a record into a class.
Association Rule Mining: the OUTPUT is a set of the most common association rules between features within records, e.g. if a record with a certain set of features {x, y, z, …} is found, then it is likely that the features {a, b, c, …} are also present.
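A brute-force sketch of association rule mining over shopping-list records follows. The slide does not say how "most common" is measured; this assumes the standard support and confidence measures, and it only looks at single-item rules {x} => {a}. Practical miners such as Apriori search larger itemsets far more efficiently. The thresholds and basket contents are made up for illustration.

```python
from itertools import combinations

def support(itemset, records):
    """Fraction of records that contain every item in itemset."""
    return sum(itemset <= r for r in records) / len(records)

def association_rules(records, min_support=0.5, min_confidence=0.7):
    """For every pair of items {x, a} present in enough records, emit the rule
    lhs => rhs when confidence = support({x, a}) / support({lhs}) is high enough."""
    items = sorted(set().union(*records))
    rules = []
    for x, a in combinations(items, 2):
        s = support(frozenset({x, a}), records)
        if s < min_support:
            continue  # the pair is too rare to be worth reporting
        for lhs, rhs in ((x, a), (a, x)):
            conf = s / support(frozenset({lhs}), records)
            if conf >= min_confidence:
                rules.append((lhs, rhs, s, conf))
    return rules

baskets = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "milk", "eggs"}),
    frozenset({"bread", "butter"}),
    frozenset({"milk", "eggs"}),
]
print(association_rules(baskets))  # [('eggs', 'milk', 0.5, 1.0)]
```

Here bread and milk also co-occur in half the baskets, but only two of the three bread baskets contain milk (confidence about 0.67), so that rule falls below the assumed 0.7 threshold.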
6
Data Mining and Data Cleansing
Data mining is often part of a larger process aimed at getting more out of data warehouses, and this process involves data cleansing.
Data cleansing is the process of identifying and removing or correcting corrupted or missing records in a database, so that the data is consistent with other similar data sets in the database. For example, the process may remove invalid post codes or spurious extreme values (e.g. -999999.999).
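A minimal cleansing pass might look like the sketch below. The record layout (`postcode` and `reading` fields) is hypothetical, and the postcode pattern is a simplified UK-style approximation, not the full official format rules. The sentinel value is the one quoted on the slide.

```python
import re

SENTINEL = -999999.999  # spurious extreme value used as a "missing" marker

# Assumption: simplified UK-style postcode pattern, e.g. "HD1 3DH".
POSTCODE = re.compile(r"^[A-Z]{1,2}[0-9][A-Z0-9]? ?[0-9][A-Z]{2}$")

def clean(records):
    """Drop records with an invalid/missing postcode or a missing/sentinel reading;
    correct the postcodes that are kept by normalising them to upper case."""
    cleaned = []
    for rec in records:
        postcode = (rec.get("postcode") or "").strip().upper()
        reading = rec.get("reading")
        if not POSTCODE.match(postcode):
            continue  # invalid or missing postcode
        if reading is None or reading == SENTINEL:
            continue  # missing or spurious extreme value
        cleaned.append({**rec, "postcode": postcode})
    return cleaned

raw = [
    {"postcode": "hd1 3dh", "reading": 12.1},
    {"postcode": "NOT A CODE", "reading": 9.0},
    {"postcode": "LS2 9JT", "reading": -999999.999},
    {"postcode": None, "reading": 3.3},
]
print(clean(raw))  # [{'postcode': 'HD1 3DH', 'reading': 12.1}]
```

This sketch mostly removes records; a real cleansing step might instead correct them, for example by imputing a missing reading from similar records.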
7
Classification Rule Mining: Rule Induction and Use
8
Classification Rule Mining - jargon
A classification rule LHS => C is built up from examples (and counter-examples) of a class C. A rule …
-- covers an example if the features of the LHS are present in the example
-- is characteristic if it covers all members of the class
-- is maximally characteristic if it has the largest LHS that still covers all members of the class
-- is discriminating if it covers NO counter-examples (= examples of other classes, if classes are disjoint)
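The definitions above can be sketched in Python, assuming a rule's LHS is a conjunction of (attribute, value) features and an example is the set of its features. Under that assumption, the maximally characteristic LHS is simply the set of features shared by every member of the class. The examples below are made up for illustration.

```python
def covers(lhs, example):
    """A rule covers an example if every LHS feature is present in it."""
    return set(lhs) <= example

def is_characteristic(lhs, positives):
    """Characteristic: the rule covers all members of the class."""
    return all(covers(lhs, e) for e in positives)

def is_discriminating(lhs, negatives):
    """Discriminating: the rule covers no counter-examples."""
    return not any(covers(lhs, e) for e in negatives)

def maximally_characteristic_lhs(positives):
    """Largest conjunctive LHS covering every class member: their shared features.
    Assumes at least one positive example."""
    return frozenset.intersection(*positives)

def ex(**features):
    """Build an example as a frozenset of (attribute, value) features."""
    return frozenset(features.items())

positives = [ex(size="small", colour="red", shape="square"),
             ex(size="small", colour="blue", shape="square")]
negatives = [ex(size="small", colour="red", shape="circle"),
             ex(size="large", colour="blue", shape="square")]

best = maximally_characteristic_lhs(positives)
print(sorted(best))                                  # [('shape', 'square'), ('size', 'small')]
print(is_discriminating(best, negatives))            # True
print(is_discriminating({("size", "small")}, negatives))  # False
```

The shorter rule size = small => C is still characteristic, but it also covers the small red circle, so it is not discriminating.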
9
Classification Rule Mining - jargon
[Figure: an example space containing examples (E) and counter-examples (X), and a hypothesis space with regions of characteristic hypotheses and discriminating hypotheses.]
VERSION SPACE: the set of all hypotheses that are both characteristic and discriminating.
10
Classification Rule Mining – example
Size = medium, colour = green, shape = square => c1
Size = small, colour = red, shape = square => c1
Size = small, colour = blue, shape = circle => c1
Size = small, colour = green, shape = triangle => c2
Size = large, colour = white, shape = circle => c2
We aim to find “hypotheses” that are characteristic and discriminating.
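No single attribute value is shared by all three c1 examples, so a plain conjunction of attribute = value tests cannot be characteristic here. The sketch below therefore assumes a slightly richer hypothesis language (not from the slides): each constrained attribute may take any value from a set. It only explores a small slice of the version space, namely one candidate per subset of attributes, with each constrained attribute limited to the values seen in class c1, so every candidate is characteristic by construction.

```python
from itertools import combinations

ATTRS = ("size", "colour", "shape")

# Training examples from the slide: (attribute values, class label).
DATA = [
    ({"size": "medium", "colour": "green", "shape": "square"}, "c1"),
    ({"size": "small", "colour": "red", "shape": "square"}, "c1"),
    ({"size": "small", "colour": "blue", "shape": "circle"}, "c1"),
    ({"size": "small", "colour": "green", "shape": "triangle"}, "c2"),
    ({"size": "large", "colour": "white", "shape": "circle"}, "c2"),
]

def covers(hyp, record):
    """hyp maps an attribute to its set of allowed values; unlisted attributes are unconstrained."""
    return all(record[a] in allowed for a, allowed in hyp.items())

def candidate_hypotheses(target):
    """One hypothesis per subset of attributes, each constrained attribute
    restricted to exactly the values seen in the target class."""
    positives = [r for r, c in DATA if c == target]
    for k in range(len(ATTRS) + 1):
        for attrs in combinations(ATTRS, k):
            yield {a: {r[a] for r in positives} for a in attrs}

def discriminating(hyp, target):
    """True if the hypothesis covers no example of any other class."""
    return not any(covers(hyp, r) for r, c in DATA if c != target)

for hyp in candidate_hypotheses("c1"):
    if discriminating(hyp, "c1"):
        print({a: sorted(v) for a, v in hyp.items()})
```

For c1 this reports three hypotheses: size in {medium, small} with shape in {circle, square}; colour in {blue, green, red} with shape in {circle, square}; and all three constraints together (the most specific one). Any single constraint, or size with colour, also covers a c2 example.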
11
Classification Rule Mining: Use
Typically two sets of data are used in data mining:
1. Training Set
2. Validation Set
These sets are selected at random from the available data. A classifier is a set of classification rules. The rules are formed on set (1.) and tested on set (2.) to measure their accuracy.
In cross-validation the sets are swapped round: the training set becomes the validation set and vice versa. More generally, the data is split into k parts and each part takes a turn as the validation set.
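A sketch of k-fold cross-validation follows; with k = 2 it is exactly the swap described above. The learner `train_one_attribute_rules` is a hypothetical toy (one rule per value of a single attribute, predicting the majority class for that value), used only so the example runs; any function that builds a classifier from training records could be plugged in.

```python
import random
from collections import Counter

def k_fold_indices(n, k, seed=0):
    """Randomly shuffle record indices and deal them into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(records, labels, train, k=2, seed=0):
    """Each fold takes one turn as the validation set while the other folds form
    the training set; returns the accuracy on each validation fold.
    Assumes 2 <= k <= len(records), so no fold is empty."""
    folds = k_fold_indices(len(records), k, seed)
    accuracies = []
    for i, val in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        classify = train([records[j] for j in train_idx],
                         [labels[j] for j in train_idx])
        correct = sum(classify(records[j]) == labels[j] for j in val)
        accuracies.append(correct / len(val))
    return accuracies

def train_one_attribute_rules(records, labels, attr="shape"):
    """Hypothetical toy learner: a rule 'attr = v => C' for each observed value v,
    where C is the majority class seen with v; unseen values get the overall majority."""
    by_value = {}
    for rec, cls in zip(records, labels):
        by_value.setdefault(rec[attr], Counter())[cls] += 1
    rules = {v: counts.most_common(1)[0][0] for v, counts in by_value.items()}
    default = Counter(labels).most_common(1)[0][0]
    return lambda rec: rules.get(rec[attr], default)

records = [{"shape": "square"}] * 4 + [{"shape": "circle"}] * 4
labels = ["c1"] * 4 + ["c2"] * 4
print(cross_validate(records, labels, train_one_attribute_rules, k=8))
# [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
```

With k equal to the number of records this is leave-one-out validation. With k = 2 on this small data set, a shuffle that puts all squares in one fold would leave the learner no square examples to train on, which illustrates why folds are usually kept reasonably large and mixed.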
12
Conclusions
- Data mining is a powerful set of techniques that help analyse data and discover hidden knowledge.
- There is a growing amount of data available.
- DM has many applications.
- DM can be supervised (learning from example) or unsupervised (learning from observation).