Data Mining.

1 Data Mining

2 Data Mining Taxonomy
Predictive Method - …predict the value of a particular attribute…
Descriptive Method - …foundation of human-interpretable patterns that describe the data…

3 Overview
- Introduction
- Data Mining Taxonomy
- Data Mining Models and Algorithms
- Quick Wins with Data Mining
- Privacy-Preserving Data Mining

4 Definition of Data Mining
“…The non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data…” Fayyad, Piatetsky-Shapiro, Smyth [1996]

6 Data Mining Taxonomy
Descriptive Models: Clustering, Association
- creation of different customer segments;
- unrelated products that are bought together (market basket analysis).
Predictive Models: Classification, Regression
- a customer’s likelihood of switching to a competitor;
- an insurance claim’s likelihood of being fraudulent;
- the likelihood someone will place a catalog order;
- the revenue a customer will generate during the next year.

7 Classification & Regression
Classification: …aim to identify the characteristics that indicate the group to which each case belongs… Two Crows Corporation
Regression: …uses existing values to forecast what other values will be…

8 Clustering & Association
Clustering: …divides a database into different groups… …find groups that are very different from each other, with similar members… Two Crows Corporation
Association: …involve determinations of affinity - how frequently two or more things occur together…
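The idea of affinity, "how frequently two or more things occur together", can be illustrated with a small co-occurrence count. This is only a minimal sketch: the function name `pair_support` and the toy baskets are invented for illustration and are not taken from the presentation.

```python
from collections import Counter
from itertools import combinations


def pair_support(baskets):
    """Return, for each pair of items, the fraction of baskets containing both."""
    counts = Counter()
    for basket in baskets:
        # De-duplicate and sort so each unordered pair is counted once per basket.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    total = len(baskets)
    return {pair: count / total for pair, count in counts.items()}


# Hypothetical toy transactions, invented for illustration.
baskets = [
    ["bread", "milk"],
    ["bread", "diapers", "beer"],
    ["milk", "diapers", "beer"],
    ["bread", "milk", "diapers", "beer"],
]
support = pair_support(baskets)
print(support[("beer", "diapers")])  # share of baskets with both items
```

Pairs with a high share are candidates for the "bought together" patterns mentioned under market basket analysis; real association-rule miners add further measures that this sketch leaves out.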

9 Deviation Detection & Pattern Discovery
Deviation Detection: …discovering most significant changes in data from previously measured or normative values… V. Kumar, M. Joshi, Tutorial on High Performance Data Mining.
Sequential Pattern Discovery: …process of looking for patterns and rules that predict strong sequential dependencies among different events…

11 Data Mining Models & Algorithms
- Neural Networks
- Decision Trees
- Rule Induction
- K-nearest Neighbor
- Logistic Regression
- Discriminant Analysis

12 Neural Networks
- efficiently model large and complex problems;
- may be used in classification problems or for regressions;
- start with an input layer => hidden layer => output layer.
[Figure: network diagram with nodes numbered 1 to 6, labelled Inputs, Hidden Layer and Output]

13 Neural Networks (cont.)
- can be easily implemented to run on massively parallel computers;
- cannot be easily interpreted;
- require an extensive amount of training time;
- require a lot of data preparation (very careful data cleansing, selection, preparation, and pre-processing);
- require a sufficiently large data set and a high signal-to-noise ratio.

14 Decision Trees (cont.)
- handle non-numeric data very well;
- work best when the predictor variables are categorical.

15 Decision Trees
- a way of representing a series of rules that lead to a class or value;
- basic components of a decision tree: decision node, branches and leaves.
[Figure: example decision tree. Decision nodes: Income>40,000, Job>, High Debt; branches: Yes / No; leaves: Low Risk, High Risk]
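A decision tree of this kind can be read as nested rules: each decision node is a test, each branch an outcome, each leaf a class. The sketch below is an assumption-laden illustration, not the tree from the slide: only the `Income>40,000` test is taken from the transcript, while the arrangement of the other nodes and leaves is guessed, and the boolean `long_job_tenure` is a hypothetical stand-in for the truncated "Job>" test.

```python
def classify_risk(income, long_job_tenure, high_debt):
    """Walk a small, hypothetical credit-risk tree and return the leaf label."""
    # Decision node (root): the income test named on the slide.
    if income > 40_000:
        # Assumed second-level node on the "Yes" branch: debt level.
        return "High Risk" if high_debt else "Low Risk"
    # Assumed second-level node on the "No" branch: the job test,
    # reduced here to a boolean because its threshold is not given.
    return "Low Risk" if long_job_tenure else "High Risk"


print(classify_risk(income=55_000, long_job_tenure=False, high_debt=False))
```

Each path from the root to a leaf corresponds to one rule, for example "income above 40,000 and no high debt implies Low Risk" under the assumed layout.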

16 Rule Induction
- a method of deriving a set of rules to classify cases;
- generates a set of independent rules which do not necessarily form a tree;
- the rules may not cover all possible situations;
- the rules may sometimes conflict in their predictions.
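Because the rules are independent rather than arranged in a tree, a case can match several rules or none. The following sketch shows both situations; the two rules, the field names and the helper `apply_rules` are hypothetical and chosen only to make the point.

```python
def apply_rules(rules, case):
    """Return the set of classes predicted by every rule whose condition matches."""
    return {label for condition, label in rules if condition(case)}


# Hypothetical, independent rules: (condition, predicted class).
rules = [
    (lambda c: c["income"] > 40_000, "Low Risk"),
    (lambda c: c["high_debt"], "High Risk"),
]

conflict = apply_rules(rules, {"income": 50_000, "high_debt": True})
uncovered = apply_rules(rules, {"income": 30_000, "high_debt": False})
print(conflict)   # two different classes: the rules conflict
print(uncovered)  # empty set: no rule covers this case
```

A practical rule-induction system needs some policy for these two outcomes, such as a default class or a rule-confidence ranking; the sketch deliberately leaves that open.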

17 K-nearest Neighbor
- decides in which class to place a new case by examining some number of the most similar cases or neighbors;
- assigns the new case to the same class to which most of its neighbors belong.
[Figure: scatter of cases labelled X and Y around a new case N]
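The two steps above, find the most similar cases and then take a majority vote, can be sketched in a few lines. This is a minimal illustration that assumes numeric features and Euclidean distance as the similarity measure; the function name `knn_classify` and the sample points are invented for the example.

```python
import math
from collections import Counter


def knn_classify(train, new_case, k=3):
    """Assign new_case to the majority class among its k nearest training cases.

    train is a list of (features, label) pairs; features are numeric tuples.
    """
    # Step 1: rank training cases by Euclidean distance to the new case.
    nearest = sorted(train, key=lambda item: math.dist(item[0], new_case))[:k]
    # Step 2: majority vote among the k nearest neighbors.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical labelled cases of classes "X" and "Y".
train = [
    ((1.0, 1.0), "X"), ((1.5, 2.0), "X"), ((2.0, 1.0), "X"),
    ((5.0, 5.0), "Y"), ((6.0, 5.5), "Y"), ((5.5, 6.5), "Y"),
]
print(knn_classify(train, (1.8, 1.4), k=3))
```

Choosing an odd `k` for two classes avoids tied votes; how ties and feature scaling are handled is a design decision this sketch does not address.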

18 Artificial Neural Networks

19 Introduction
What is neural computing/neural networks?
The brain is a remarkable computer. It interprets imprecise information from the senses at an incredibly high speed.

20 Introduction
A good example is the processing of visual information: a one-year-old baby is much better and faster at recognising objects, faces, and other visual features than even the most advanced AI system running on the fastest supercomputer.
Most impressive of all, the brain learns (without any explicit instructions) to create the internal representations that make these skills possible.

21 Biological Neural Systems
The brain is composed of approximately 100 billion (10^11) neurons.
A typical neuron collects signals from other neurons through a host of fine structures called dendrites. The neuron sends out spikes of electrical activity through a long, thin strand known as an axon, which splits into thousands of branches. At the end of each branch, a structure called a synapse converts the activity from the axon into electrical effects that inhibit or excite activity in the connected neurons. When a neuron receives excitatory input that is sufficiently large compared with its inhibitory input, it sends a spike of electrical activity down its axon.
Learning occurs by changing the effectiveness of the synapses so that the influence of one neuron on the other changes.
[Figure: Schematic drawing of two biological neurons connected by synapses]

22 What is a Neural Net?
A neural net simulates some of the learning functions of the human brain. It can recognize patterns and "learn." You can use it to forecast and make smarter business decisions. It can also serve as an "expert system" that simulates the thinking of an expert and can offer advice.
Unlike conventional rule-based artificial-intelligence software, a neural net extracts expertise from data automatically - no rules are required. In other words, through the use of a trial-and-error method the system “learns” to become an “expert” in the field the user gives it to study.

23 Components Needed
In order for a neural network to learn, it needs two basic components:
- Inputs: any information the expert uses to determine his/her final decision or outcome.
- Outputs: the decisions or outcomes arrived at by the expert that correspond to the inputs entered.

24 How does a neural network learn?
A neural network learns by determining the relation between the inputs and outputs. By calculating the relative importance of the inputs and outputs, the system can determine such relationships. Through trial and error, the system compares its results with the expert-provided results in the data until it has reached an accuracy level defined by the user. With each trial, the weights assigned to the inputs are changed until the desired results are reached.
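The trial-and-error loop described here can be sketched with the classic perceptron learning rule. The slide does not name a specific algorithm, so this choice is an assumption made for illustration; the names `train_perceptron`, `predict`, `learning_rate`, `target_accuracy` and `max_epochs`, and the logical-AND data set, are likewise hypothetical.

```python
def predict(weights, bias, inputs):
    """Threshold unit: output 1 if the weighted sum plus bias is positive."""
    activation = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if activation > 0 else 0


def train_perceptron(samples, learning_rate=1.0, target_accuracy=1.0, max_epochs=100):
    """Adjust weights after each wrong trial until the target accuracy is reached."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(max_epochs):
        correct = 0
        for inputs, expected in samples:
            error = expected - predict(weights, bias, inputs)
            if error == 0:
                correct += 1
            else:
                # Nudge each weight in the direction that reduces the error.
                weights = [w + learning_rate * error * x
                           for w, x in zip(weights, inputs)]
                bias += learning_rate * error
        # Stop once the user-defined accuracy level is met for a full pass.
        if correct / len(samples) >= target_accuracy:
            break
    return weights, bias


# Hypothetical "expert-provided" data: the logical AND of two inputs.
samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(samples)
print([predict(weights, bias, x) for x, _ in samples])
```

This rule only converges when the classes are linearly separable; multi-layer networks are usually trained with gradient-based methods instead, which are outside the scope of this sketch.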

25 Artificial Neural Networks
Artificial neurons are analogous to their biological inspirers. Here the neuron is a processing unit: it calculates the weighted sum of the input signals to the neuron to generate the activation signal a, given by

$$a = \sum_{i} w_i x_i$$

where w_i is the strength of the synapse connected to the neuron and x_i is an input feature to the neuron.
[Figure: An artificial neuron]
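As a small worked example of the weighted sum, the sketch below computes the activation signal a from a list of synapse strengths and a list of input features. The function name `activation` and the numbers are made up for illustration.

```python
def activation(weights, inputs):
    """Weighted sum of the inputs: a = sum of w_i * x_i over all inputs."""
    if len(weights) != len(inputs):
        raise ValueError("each input needs exactly one weight")
    return sum(w * x for w, x in zip(weights, inputs))


# Hypothetical synapse strengths w_i and input features x_i.
w = [0.5, -1.0, 2.0]
x = [1.0, 2.0, 3.0]
a = activation(w, x)  # 0.5*1.0 + (-1.0)*2.0 + 2.0*3.0
print(a)
```

A negative weight plays the role of an inhibitory synapse and a positive weight that of an excitatory one.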

26 Artificial Neural Networks
The activation signal is passed through a transform function f to produce the output of the neuron, given by

$$y = f(a)$$

The transform function can be linear, or non-linear, such as a threshold or sigmoid function [more later …].
- For a linear function, the output y is proportional to the activation signal a.
- For a threshold function, the output y is set at one of two levels, depending on whether the activation signal a is greater than or less than some threshold value.
- For a sigmoid function, the output y varies continuously as the activation signal a changes.
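The three kinds of transform function can be written out directly. This is a sketch under stated assumptions: the slide does not give formulas, so the logistic function is assumed for the sigmoid, and the parameter names `slope`, `theta`, `low` and `high` are hypothetical.

```python
import math


def linear(a, slope=1.0):
    """Linear transform: the output is proportional to the activation."""
    return slope * a


def threshold(a, theta=0.0, low=0.0, high=1.0):
    """Threshold transform: one of two levels, depending on a versus theta."""
    return high if a > theta else low


def sigmoid(a):
    """Logistic sigmoid (assumed form): varies smoothly between 0 and 1."""
    # Note: math.exp overflows for very large negative a; fine for a sketch.
    return 1.0 / (1.0 + math.exp(-a))


for a in (-2.0, 0.0, 2.0):
    print(a, linear(a), threshold(a), round(sigmoid(a), 3))
```

The loop prints the same activation under all three transforms, which makes the difference between a proportional, a two-level and a continuously varying output easy to see.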

27 Artificial Neural Networks
Artificial neural network models (or simply neural networks) are typically composed of interconnected units or artificial neurons. How the neurons are connected depends on the specific task that the neural network performs.
Two key features of neural networks distinguish them from any other sort of computing developed to date:
- Neural networks are adaptive, or trainable.
- Neural networks are naturally massively parallel.
These features suggest the potential for neural network systems capable of learning, autonomously improving their own performance, adapting automatically to changing environments, making decisions at high speed, and being fault tolerant.

28 Neural Network Architectures
- Feed-forward single-layered networks
- Feed-forward multi-layer networks
- Recurrent networks
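A feed-forward multi-layer network simply chains the weighted-sum and transform steps, layer by layer, from inputs to outputs. The sketch below assumes a logistic sigmoid in every unit and a bias term per unit, neither of which is specified in the presentation; the 2-3-1 layout and all weight values are hypothetical.

```python
import math


def sigmoid(a):
    """Logistic sigmoid, assumed here as the transform function."""
    return 1.0 / (1.0 + math.exp(-a))


def layer_forward(inputs, weight_rows, biases):
    """Compute one layer: each unit takes a weighted sum, then the sigmoid."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weight_rows, biases)
    ]


def forward(inputs, hidden_w, hidden_b, output_w, output_b):
    """Feed inputs through the hidden layer, then through the output layer."""
    hidden = layer_forward(inputs, hidden_w, hidden_b)
    return layer_forward(hidden, output_w, output_b)


# Hypothetical 2-input, 3-hidden-unit, 1-output network.
hidden_w = [[0.2, -0.4], [0.7, 0.1], [-0.5, 0.3]]
hidden_b = [0.0, 0.1, -0.1]
output_w = [[0.6, -0.3, 0.8]]
output_b = [0.05]
print(forward([1.0, 2.0], hidden_w, hidden_b, output_w, output_b))
```

A single-layered feed-forward network is the special case with no hidden layer; a recurrent network additionally feeds outputs back as inputs, which this sketch does not cover.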

29 Neural Network Applications
- Speech/Voice recognition
- Optical character recognition
- Face detection/Recognition
- Pronunciation (NETtalk)
- Stock-market prediction
- Navigation of a car
- Signal processing/Communication
- Imaging/Vision
- ….
