Data Mining: A First View (Roiger & Geatz)
Definition: Data mining is the process of employing one or more computer learning techniques to automatically analyze and extract knowledge contained within a database. Knowledge Discovery in Databases (KDD) is the same as data mining. The knowledge gained from a data mining session is a model or generalization of the data. Induction-based learning: generalizing by observing specific examples.
What Can Computers Learn? Facts, concepts, procedures, and principles. Computers are good at learning concepts; concepts are the output of a data mining session.
Three Concept Views Classical view: all concepts have definite defining properties. Probabilistic view: concepts are represented by properties that are probable of concept members. Exemplar view: a given instance is determined to be an example of a particular concept if the instance is similar enough to a set of one or more known examples of that concept.
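The exemplar view can be illustrated with a small similarity test. The sketch below is only one possible reading: it assumes numeric attributes, Euclidean distance as the similarity measure, and a hand-picked threshold for "similar enough". The stored examples, the labels "A" and "B", and the function names are all hypothetical.

```python
# Hypothetical stored examples: (attribute values, concept label).
# The values are invented purely for illustration.
known_examples = [
    ((1.0, 1.0), "A"),
    ((1.2, 0.8), "A"),
    ((5.0, 5.0), "B"),
    ((4.8, 5.3), "B"),
]


def distance(p, q):
    """Euclidean distance between two equal-length numeric tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5


def classify_by_exemplar(instance, examples, threshold):
    """Return the label of the closest stored example, or None when
    no stored example lies within `threshold` (not similar enough)."""
    nearest_point, nearest_label = min(
        examples, key=lambda ex: distance(instance, ex[0])
    )
    if distance(instance, nearest_point) <= threshold:
        return nearest_label
    return None


print(classify_by_exemplar((1.1, 0.9), known_examples, threshold=1.0))
print(classify_by_exemplar((3.0, 3.0), known_examples, threshold=1.0))
```

The first instance sits close to the stored "A" examples, so it is accepted as an "A". The second is far from every stored example, so under this threshold it is not assigned to any concept.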
Supervised Learning Also known as induction-based supervised concept learning. Attribute-value matrix: Table 1.1. Decision tree.
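As a rough illustration of inducing a decision tree from an attribute-value matrix, the sketch below builds a one-level tree (a decision stump) by picking the input attribute that classifies the most training rows correctly. This is a deliberately simplified stand-in, not the algorithm from the text, and the data are invented: they do not reproduce Table 1.1. The attribute names and the functions `build_stump` and `classify` are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical attribute-value matrix: one dict per instance.
# "play" is the output attribute; the rows are invented for illustration.
rows = [
    {"outlook": "sunny", "windy": "no", "play": "yes"},
    {"outlook": "sunny", "windy": "yes", "play": "yes"},
    {"outlook": "rainy", "windy": "no", "play": "no"},
    {"outlook": "rainy", "windy": "yes", "play": "no"},
    {"outlook": "sunny", "windy": "no", "play": "yes"},
]


def build_stump(rows, inputs, output):
    """Pick the single input attribute whose values best predict `output`.

    Returns (attribute, branches), where branches maps each attribute
    value to the majority class seen for that value in the training rows.
    """
    best = None
    for attr in inputs:
        by_value = defaultdict(Counter)
        for row in rows:
            by_value[row[attr]][row[output]] += 1
        branches = {v: c.most_common(1)[0][0] for v, c in by_value.items()}
        correct = sum(c.most_common(1)[0][1] for c in by_value.values())
        if best is None or correct > best[0]:
            best = (correct, attr, branches)
    _, attr, branches = best
    return attr, branches


def classify(stump, instance):
    """Follow the single test in the stump; None for an unseen value."""
    attr, branches = stump
    return branches.get(instance[attr])


stump = build_stump(rows, ["outlook", "windy"], "play")
print(stump)
print(classify(stump, {"outlook": "rainy", "windy": "no"}))
```

A full decision tree would repeat the same attribute selection on each branch until the rows in a branch agree on the class; the stump stops after the first split to keep the idea visible.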
Unsupervised Clustering Builds models without predefined classes. Table 1.3. Example questions.
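To make "no predefined classes" concrete, here is a minimal sketch of one common clustering technique, k-means, on one-dimensional data. It is an illustrative choice and not necessarily the method the text uses. The values (imagined purchase amounts), the starting centers, and the function name `k_means_1d` are assumptions.

```python
def k_means_1d(values, centers, iterations=10):
    """Group numbers around `centers` (initial guesses) with plain k-means.

    No class labels are supplied; the groups emerge from the data.
    Returns the final centers and the clusters from the last assignment.
    """
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        # Assignment step: put each value with its nearest center.
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: move each center to the mean of its cluster,
        # leaving it in place if the cluster is empty.
        centers = [
            sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)
        ]
    return centers, clusters


# Hypothetical purchase amounts with two visible groups.
amounts = [10, 12, 11, 50, 52, 49]
centers, clusters = k_means_1d(amounts, centers=[10, 50])
print(centers)
print(clusters)
```

The analyst then has to interpret the resulting groups, since the algorithm only reports which instances fall together, not what the groups mean.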
Data Mining? Can we clearly define the problem? Does potentially meaningful data exist? Does the data contain hidden knowledge? Or is the data factual and useful for reporting purposes only?
Data Mining or Data Query Shallow knowledge: factual, easily stored and manipulated; SQL is a good tool. Multidimensional knowledge: also factual, but multidimensional; OLAP tools are used. Hidden knowledge: patterns and regularities in data; SQL is not suited, so data mining algorithms are used. Deep knowledge: knowledge in a database that can be found only with some direction; current data mining tools are ineffective.
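The first two categories can be sketched with ordinary queries. The example below uses Python's built-in `sqlite3` module with an in-memory database; the `customers` table, its columns, and its rows are hypothetical. The `GROUP BY` query is only a rough stand-in for the kind of summary an OLAP tool would provide along one dimension.

```python
import sqlite3

# Hypothetical customer table, invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, region TEXT, purchases INTEGER)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [("Ann", "east", 7), ("Bob", "west", 2), ("Cy", "east", 9), ("Dee", "west", 6)],
)

# Shallow knowledge: a factual question answered directly by a query.
frequent_buyers = conn.execute(
    "SELECT name FROM customers WHERE purchases > 5 ORDER BY name"
).fetchall()

# A one-dimensional summary (purchases by region), loosely analogous
# to a single OLAP roll-up.
by_region = conn.execute(
    "SELECT region, SUM(purchases) FROM customers GROUP BY region ORDER BY region"
).fetchall()

print(frequent_buyers)
print(by_region)
conn.close()
```

Both queries require the analyst to know in advance what to ask for. Hidden knowledge is different: the patterns are not named in the query, which is why a data mining algorithm is used instead.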
Expert Systems or Data Mining Data mining: data -> data mining tool -> knowledge. Expert systems: human expert -> knowledge engineer -> ES building tool -> knowledge.
Data Mining Applications Fraud detection, health care, business and finance, scientific applications, sports and gaming.