Published by Kelly Campbell. Modified over 9 years ago.
1
Data Mining
Data mining is the process of extracting previously unknown, valid, and actionable information from large data sets and then using that information to make crucial business and strategic decisions. Its goal is to discover meaningful patterns and rules.
2
Data Warehouse to Data Mining
A data warehouse (DWH) is a subject-oriented, integrated, time-variant, and non-volatile collection of data that aims at supporting business decisions. Data mining (DM) is a natural and logical continuation of the data warehouse.
DM needs accurate, consistent, good-quality data, because the models it mines must themselves be consistent and accurate.
The DWH is a collection of all aspects of the data. DM can yield more relations, uncovered patterns, and rules, mined from more and different data sources.
Query directions are made easy in the DWH.
3
Steps of Data Mining
1. Identifying the data – The data could be anywhere in the world, not just inside the enterprise, and it could be distributed across many forms (paper, people's heads, etc.).
2. Getting the data ready – Put the data in the right format; the databases that will hold it have to be built into the system.
3. Mining the data – Once the right data is known, it is cleaned and scrubbed: unnecessary items are removed so that only the data essential for mining remains.
4. Getting useful results – What outcomes do we want from mining? Do we want the tools to find interesting patterns? Are these tools available, or do we build them?
5. Identifying actions – After determining the data and the tools, we run the tools on the data. This can produce a lot of output, so we need to know what to do with the patterns found.
4
6. Implementing the actions – After getting useful results, examine them and identify actions that can be taken, e.g. analyzing which items go together and placing them together.
7. Evaluating the benefits – After the actions are implemented, we wait to see the results. They may appear immediately or may take a long time. Once we are in a position to determine the benefits and costs, we re-evaluate the procedure. By then the data may have changed and new tools may be recommended, so we plan the next mining cycle and decide how to go about it.
8. Determining what to do next
9. Carrying out the next cycle
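The cleaning and scrubbing described in the steps above can be sketched as follows. This is only a minimal illustration under stated assumptions: the function name `clean_records`, the field names, and the sample records are hypothetical and do not come from the presentation.

```python
def clean_records(raw_records, required_fields):
    """Keep only complete, de-duplicated records restricted to the required fields."""
    cleaned = []
    seen = set()
    for record in raw_records:
        # Drop fields that are not needed for mining and strip stray whitespace.
        row = {}
        for field in required_fields:
            value = record.get(field)
            row[field] = "" if value is None else str(value).strip()
        # Skip records with any missing or empty required field.
        if any(value == "" for value in row.values()):
            continue
        # Treat records that differ only in letter case as duplicates.
        key = tuple(row[field].lower() for field in required_fields)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(row)
    return cleaned


# Hypothetical raw data gathered from several sources.
RAW = [
    {"customer": " Ann ", "item": "bread", "note": "paid cash"},
    {"customer": "ann", "item": "Bread"},   # duplicate of the first record
    {"customer": "Bob", "item": ""},        # empty required field
    {"customer": "Cid"},                    # missing required field
]
CLEANED = clean_records(RAW, ["customer", "item"])
print(CLEANED)
```

Only the first record survives here; real cleaning rules depend on the data and on what the mining step needs.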
5
Outcomes of Data Mining
There are six activities in data mining, known as data mining tasks or types:
1. Classification
2. Estimation
3. Prediction
4. Affinity grouping or association rules
5. Clustering
6. Description and visualization
6
Classification
Classification consists of examining the features of a newly presented object and assigning it to a predefined class; the common features of each class are extracted.
Classification is carried out by developing training sets of pre-classified examples and then building a model that fits the description of the classes.
In classification, a group of entities is partitioned based on predefined values of some attributes.
Classification deals with discrete outcomes: yes or no, debit card or car loan.
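As a rough illustration of building a model from pre-classified examples, here is a one-nearest-neighbour sketch. The technique is chosen only for brevity and is not named in the presentation; the features (age, monthly income) and the training records are hypothetical.

```python
import math

# Hypothetical pre-classified training set: (age, monthly income in thousands) -> class.
TRAINING_SET = [
    ((22, 1.5), "debit card"),
    ((25, 2.0), "debit card"),
    ((41, 6.0), "car loan"),
    ((38, 5.5), "car loan"),
]


def classify(features, training_set=TRAINING_SET):
    """Assign the class of the closest pre-classified example (1-nearest neighbour)."""
    nearest = min(training_set, key=lambda example: math.dist(features, example[0]))
    return nearest[1]


print(classify((24, 1.8)))
print(classify((40, 5.8)))
```

The outcome is always one of the predefined classes, which is what distinguishes classification from estimation.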
7
Estimation
For example, based on a person's spending patterns and age, one can estimate their salary or the number of children they have.
Estimation deals with continuously valued outcomes.
Classification and estimation are often used together.
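A continuously valued outcome such as salary can be estimated with a least-squares line, for instance. The sketch below assumes a single input (monthly spending) and made-up figures; the presentation does not prescribe a particular estimation method.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical observations: monthly spending and known salary (same currency units).
spending = [10.0, 20.0, 30.0, 40.0]
salary = [30.0, 50.0, 70.0, 90.0]

slope, intercept = fit_line(spending, salary)
estimate = slope * 25.0 + intercept  # estimated salary for a person spending 25.0
print(slope, intercept, estimate)
```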
8
Prediction
This task predicts the future behaviour of some values. For example, based on a person's education, current job, and trends in the industry, one can predict that their salary will be a certain amount by the year 2008.
Predictive tasks feel different because the records are classified according to some predicted future behaviour or estimated future value.
With prediction, the only way to check the accuracy of the classification is to wait and see.
Historical data is used to build a model that explains current observed behaviour; when the model is applied to current inputs, it predicts future behaviour.
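One simple way to turn historical data into a forward-looking model is to extrapolate the average yearly growth. This is only a sketch under that assumption; the function name and the salary history are hypothetical, and real predictive models use far more inputs.

```python
def predict_future_value(history, years_ahead):
    """Project the last observed value forward using the mean yearly growth ratio."""
    ratios = [later / earlier for earlier, later in zip(history, history[1:])]
    mean_ratio = sum(ratios) / len(ratios)
    return history[-1] * mean_ratio ** years_ahead


# Hypothetical salary history, one value per year, oldest first.
salary_history = [100.0, 110.0, 121.0]
predicted = predict_future_value(salary_history, years_ahead=2)
print(round(predicted, 2))
```

As the slide notes, the projection can only be verified once the future value is actually observed.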
9
Affinity Grouping or Association Rules
The task of affinity grouping is to determine which things go together, e.g. which people travel together, or which items are purchased together.
Affinity grouping can also be used to identify cross-selling opportunities and to design attractive packages or groupings of products and services.
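Which items "go together" is commonly measured with support (how often a pair occurs across all baskets) and confidence (how often the second item appears when the first one does). These measures are standard in association-rule mining but are not spelled out in the presentation; the baskets below are invented for illustration.

```python
from collections import Counter
from itertools import combinations


def pair_support(transactions):
    """Fraction of transactions that contain each pair of items."""
    counts = Counter()
    for basket in transactions:
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return {pair: count / len(transactions) for pair, count in counts.items()}


def confidence(transactions, antecedent, consequent):
    """Of the transactions containing `antecedent`, the fraction that also contain `consequent`."""
    with_antecedent = [basket for basket in transactions if antecedent in basket]
    if not with_antecedent:
        return 0.0
    return sum(consequent in basket for basket in with_antecedent) / len(with_antecedent)


# Hypothetical market baskets.
BASKETS = [
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["bread", "jam"],
    ["milk", "jam"],
]

support = pair_support(BASKETS)
print(support[("bread", "butter")])
print(confidence(BASKETS, "bread", "butter"))
```

Pairs with high support and confidence are candidates for cross-selling or for being packaged together.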
10
Clustering
Clustering is a DM task that is often confused with classification. Clusters are formed by analyzing the data rather than from predefined classes. For example, group X prefers the Zen, group Y prefers the Ford Icon, and group Z prefers the Nano.
Once clusters are obtained, each cluster can be examined and mined further for other outcomes such as estimation and classification.
Clustering is often done as a prelude to some other form of data mining or modeling. Clustering might be the first step in a market segmentation effort.
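A minimal k-means sketch on one-dimensional data shows how groups emerge from the data without predefined labels. K-means is used here only as a familiar example, not because the presentation names it; the values and starting centres are hypothetical.

```python
def k_means_1d(values, centers, iterations=10):
    """Group 1-D values around centres; returns the final centres and the last assignment."""
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: put each value in the cluster of its nearest centre.
        clusters = [[] for _ in centers]
        for value in values:
            nearest = min(range(len(centers)), key=lambda i: abs(value - centers[i]))
            clusters[nearest].append(value)
        # Update step: move each centre to the mean of its cluster (keep it if empty).
        centers = [
            sum(cluster) / len(cluster) if cluster else centers[i]
            for i, cluster in enumerate(clusters)
        ]
    return centers, clusters


# Hypothetical customer attribute (e.g. yearly spend) and initial centre guesses.
VALUES = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8, 9.0, 9.5]
CENTERS, CLUSTERS = k_means_1d(VALUES, centers=[0.0, 5.0, 10.0])
print(CENTERS)
print(CLUSTERS)
```

Each resulting cluster could then be examined separately, for example as a candidate market segment.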
11
Description and Visualization
For example, John usually goes shopping after he goes to the bank, but last week he went to church after shopping, a deviation from his usual pattern.
Anomaly detection is a form of deviation detection and is used for applications such as fraud detection and medical illness detection.
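Deviation detection can be sketched with a simple z-score rule: flag any value that lies far from the mean relative to the standard deviation. The threshold of 2.0 and the transaction amounts are assumptions made for illustration; production fraud detection uses considerably richer models.

```python
import statistics


def find_anomalies(values, threshold=2.0):
    """Return the values whose z-score (distance from the mean in standard deviations) exceeds the threshold."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return []  # all values identical, nothing deviates
    return [value for value in values if abs(value - mean) / stdev > threshold]


# Hypothetical card transaction amounts; one is far outside the usual range.
AMOUNTS = [20.0, 22.0, 19.0, 21.0, 20.0, 23.0, 18.0, 500.0]
ANOMALIES = find_anomalies(AMOUNTS)
print(ANOMALIES)
```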