The KDD process & Data mining
What is KDD?
Knowledge Discovery in Databases (KDD) is the extraction of interesting (non-trivial, implicit, previously unknown and potentially useful) information or patterns from data in large databases.
What is Data Mining?
Data Mining is the most important step in the KDD process. It consists of applying data analysis and discovery algorithms that, under acceptable computational efficiency limitations, produce a particular enumeration of patterns over the data.

Steps of a KDD Process
Knowledge discovery as a process consists of an iterative sequence of the following steps:
1. Data cleaning: fill in missing values, smooth noisy data, identify or remove outliers, and resolve inconsistencies.
2. Data integration: combine multiple databases, data cubes, or files.
3. Data selection: retrieve the data relevant to the analysis task from the database.
4. Data transformation: transform the data into forms appropriate for mining.
5. Data mining: apply intelligent methods to extract data patterns (an essential step).
6. Pattern evaluation: identify the truly interesting patterns.
7. Knowledge presentation: present the mined knowledge to the user using representation techniques.
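The seven steps above can be sketched as a chain of small functions. This is a minimal illustration only: the two store tables, the field names, the price threshold and the mean-fill rule are all assumptions made for the example, and counting (item, price band) pairs stands in for a real mining algorithm.

```python
from collections import Counter
from statistics import mean

# Two hypothetical data sources (assumed for illustration).
store_a = [{"customer": "c1", "item": "bread", "price": 2.0},
           {"customer": "c2", "item": "milk", "price": None}]
store_b = [{"customer": "c3", "item": "bread", "price": 2.2},
           {"customer": "c4", "item": "milk", "price": 1.0}]

def clean(rows):
    """Step 1: fill missing prices with the mean of the known prices (a crude rule)."""
    fill = mean(r["price"] for r in rows if r["price"] is not None)
    return [{**r, "price": fill if r["price"] is None else r["price"]} for r in rows]

def integrate(sources):
    """Step 2: combine several sources into one collection."""
    return [row for source in sources for row in source]

def select(rows, fields):
    """Step 3: keep only the fields relevant to the analysis task."""
    return [{f: r[f] for f in fields} for r in rows]

def transform(rows, threshold=1.5):
    """Step 4: discretize the price into a band suitable for counting."""
    return [(r["item"], "high" if r["price"] >= threshold else "low") for r in rows]

def mine(pairs):
    """Step 5: count how often each (item, band) pattern occurs."""
    return Counter(pairs)

def evaluate(patterns, min_count=2):
    """Step 6: keep only patterns seen at least min_count times."""
    return {p: n for p, n in patterns.items() if n >= min_count}

def present(patterns):
    """Step 7: render the surviving patterns as readable lines."""
    return [f"{item}: price band {band} ({n} records)"
            for (item, band), n in sorted(patterns.items())]

def kdd(sources):
    rows = integrate([clean(s) for s in sources])
    pairs = transform(select(rows, ["item", "price"]))
    return present(evaluate(mine(pairs)))

report = kdd([store_a, store_b])
print(report)
```

In practice the sequence is iterative: a poor result at the evaluation step would send the analyst back to cleaning, selection or transformation with different choices.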

Why Data Preprocessing?
Data in the real world is dirty:
- incomplete: lacking attribute values, lacking certain attributes of interest, or containing only aggregate data.
- noisy: containing errors or outliers.
- inconsistent: containing discrepancies in codes or names.
No quality data, no quality mining results! Quality decisions must be based on quality data, and a data warehouse needs consistent integration of quality data. Data preprocessing techniques improve the quality of the data, which in turn helps to improve the accuracy and efficiency of the subsequent mining process.
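The three kinds of dirty data can be made concrete with a small audit routine. The records, the plausible age range and the set of valid country codes below are hypothetical values chosen for the example, not part of the original slides.

```python
# Hypothetical customer records exhibiting each kind of problem.
records = [
    {"id": 1, "age": 34, "country": "US"},
    {"id": 2, "age": None, "country": "USA"},  # missing value, non-standard code
    {"id": 3, "age": 430, "country": "US"},    # implausible value
]

VALID_COUNTRIES = {"US", "DE"}  # assumed code list

def audit(rows):
    """Return (record id, problem) pairs for every quality issue found."""
    issues = []
    for r in rows:
        if r["age"] is None:
            issues.append((r["id"], "incomplete"))
        elif not 0 <= r["age"] <= 120:
            issues.append((r["id"], "noisy"))
        if r["country"] not in VALID_COUNTRIES:
            issues.append((r["id"], "inconsistent"))
    return issues

issues = audit(records)
print(issues)
```

A report like this is only the detection half of preprocessing; deciding whether to fill, correct or drop each flagged record is a separate choice that depends on the mining task.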

Why Data Mining? Potential Applications
- Market analysis and management: target marketing, customer relation management, market basket analysis, cross selling, market segmentation
- Risk analysis and management: forecasting, customer retention, improved underwriting, quality control, competitive analysis
- Fraud detection and management

Types of knowledge to be mined (Data mining functions to be performed)
Characterization: Summarize the features of the class under study (the target class) in general terms. E.g. summarize the characteristics of customers who spent more than $1000 during 2003.
Discrimination: Compare the features of the target class with those of one or more comparative classes (contrasting classes). E.g. create a comparative profile of customers who shop often in our store versus customers who shop rarely.
Association: The discovery of association rules showing attribute-value conditions that frequently occur together. Widely used for market-basket analysis and transaction data analysis.
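For the association function, a rule such as "bread implies milk" is usually scored by its support and confidence. The slide does not name these measures, so treat their use here as an assumption about the intended technique; the transactions are invented for the example.

```python
# Hypothetical market-basket transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in baskets) / len(baskets)

def confidence(antecedent, consequent, baskets):
    """Of the baskets containing the antecedent, the fraction that also contain the consequent."""
    both = set(antecedent) | set(consequent)
    return support(both, baskets) / support(antecedent, baskets)

rule_support = support({"bread", "milk"}, transactions)
rule_confidence = confidence({"bread"}, {"milk"}, transactions)
print(rule_support, rule_confidence)
```

Here two of the four baskets contain both items, and two of the three baskets with bread also contain milk.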

Classification/prediction: The process of finding a set of models (or functions) that describe and distinguish data classes or concepts. The derived model is based on the analysis of training data, whose class labels are known. Examples: classify countries based on climate; classify cars based on gas mileage.
Clustering: The clustering principle is that objects belonging to the same cluster must be similar to each other, while objects belonging to different clusters must be dissimilar to each other. Example: identify homogeneous subpopulations of customers.
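The idea of deriving a model from labelled training data can be shown with a nearest-centroid classifier for the gas-mileage example. The mileage figures and the two class labels are made up, and nearest-centroid is just one simple technique chosen for illustration.

```python
from statistics import mean

# Hypothetical training data: (miles per gallon, known class label).
training = [(15.0, "low"), (18.0, "low"), (32.0, "high"), (36.0, "high")]

def train(samples):
    """Build the model: one centroid (mean mileage) per class label."""
    by_label = {}
    for value, label in samples:
        by_label.setdefault(label, []).append(value)
    return {label: mean(values) for label, values in by_label.items()}

def classify(model, value):
    """Assign the label whose centroid is closest to the new value."""
    return min(model, key=lambda label: abs(model[label] - value))

model = train(training)
print(model, classify(model, 20.0), classify(model, 30.0))
```

Clustering differs in that no labels are supplied: the groups themselves have to be discovered from the similarity between objects.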

Outlier analysis: Identify outliers in a set of data. An outlier is a data object that does not comply with the general behavior of the data. It can be considered noise or an exception, but it is quite useful in fraud detection and rare events analysis. Example: discovery of fraudulent usage of credit cards.
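One simple way to flag such objects is to measure how far each value lies from the median, in units of the median absolute deviation (MAD). The card amounts and the threshold of 5 are assumptions for this sketch; real fraud detection uses far richer features.

```python
from statistics import median

def find_outliers(values, threshold=5.0):
    """Return values whose distance from the median exceeds threshold * MAD."""
    center = median(values)
    mad = median([abs(v - center) for v in values])
    if mad == 0:
        return []  # all values (nearly) identical: nothing to compare against
    return [v for v in values if abs(v - center) / mad > threshold]

# Hypothetical credit card transaction amounts.
amounts = [20, 25, 22, 19, 24, 950]
suspicious = find_outliers(amounts)
print(suspicious)
```

The median and MAD are used instead of the mean and standard deviation because a single extreme value barely moves them, so the outlier cannot hide itself by inflating the scale.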

Agents
Agents are defined as computer systems that are situated in some environment and are capable of autonomous action in this environment in order to meet their design objectives. Intelligent agents are defined as agents that can react to changes in their environment in order to reach their goals.
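A thermostat is a common textbook illustration of this definition: it is situated in an environment (a room), has a design objective (a target temperature) and chooses its own actions in response to changes. The class name, tolerance and readings below are hypothetical.

```python
class ThermostatAgent:
    """A reactive agent whose design objective is to hold a target temperature."""

    def __init__(self, target, tolerance=1.0):
        self.target = target
        self.tolerance = tolerance

    def act(self, temperature):
        """Pick an action from the current percept alone, without outside control."""
        if temperature < self.target - self.tolerance:
            return "heat_on"
        if temperature > self.target + self.tolerance:
            return "heat_off"
        return "idle"

agent = ThermostatAgent(target=21.0)
readings = [18.0, 21.5, 24.0]  # the environment changing over time
actions = [agent.act(t) for t in readings]
print(actions)
```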

Multi-Agent Systems (MAS)
Multi-Agent Systems (MAS) are communities of software entities, operating under decentralized control, designed to address (often complex) applications in a distributed problem solving manner. MAS offer a number of general advantages with respect to computer supported cooperative working, distributed computation and resource sharing. Advantages include:
1. Autonomy.
2. Decentralized control.
3. Robustness.
4. Sharing of expertise.
5. Sharing of resources.
6. Simple extensibility.

Why use MAS in KDD?
- Location of data.
- Sensitivity of data.
- Protection of property rights.
- Diversity of DM algorithms.
- Distributed and parallel processing.

Multi-Agent Data Mining (MADM)
There are two themes in the interaction and integration of agents and DM: DM for agents, referred to as mining-driven agents; and agents for DM, referred to as agent-driven DM and commonly known as Multi-Agent Data Mining. Here we are concerned with the use of agents and MAS to perform DM activities.

Structural Analysis
The user interface component receives data mining requests. Once a request is received, it is processed (parsed) to determine the data mining algorithms and data sources required to respond to it; this is the function of the planning and management component. The identified data sources are then mined by the processing component, through access to the data interface component, and the results are returned to the user via the user interface component.

System General Architecture

User Agent
1. Receive and interpret user data mining requests.
2. Communicate the user request to the processing agents.
3. Return the generated results, in a suitable and easily understandable format, back to the user.
4. Expect and respond to asynchronous events, such as "stop mining" or "end operations" instructions that may be issued by the user.

Registration Agent: The generally expected function of the registration agent is to inform all appropriate and interested agents already in the system of a new agent's arrival.
Facilitator Agent (or Broker Agent): The facilitator agent serves as an advisor agent that facilitates the distribution of requests to agents that have expressed an ability to handle them.
Task Agent (or Management Agent): The task agent asks the facilitator agent for DM agents that can fulfil the desired task.

Data Mining (DM) Agent: A DM agent implements a specific DM technique or algorithm. Its process module contains methods for initiating and carrying out the DM activity, capturing the results of DM, and communicating them to a data agent or a task agent.
Data Agent (or Resource Agent): A data agent is responsible for a data source and maintains meta-data information about the data source.
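The way these agent roles cooperate can be sketched with plain Python classes. All class and method names below are hypothetical; the registration agent's role is folded into `Facilitator.register` to keep the example short, and counting the most frequent value stands in for a real DM technique.

```python
from collections import Counter

class Facilitator:
    """Broker: remembers which agents have advertised which capability."""

    def __init__(self):
        self.registry = {}

    def register(self, capability, agent):
        self.registry.setdefault(capability, []).append(agent)

    def find(self, capability):
        return list(self.registry.get(capability, []))

class DataAgent:
    """Responsible for one data source; keeps meta-data about it."""

    def __init__(self, name, rows):
        self.meta = {"name": name, "size": len(rows)}
        self._rows = rows

    def fetch(self):
        return list(self._rows)

class FrequencyMiningAgent:
    """DM agent implementing one technique: find the most frequent value."""

    def mine(self, data_agent):
        value, count = Counter(data_agent.fetch()).most_common(1)[0]
        return {"source": data_agent.meta["name"], "value": value, "count": count}

class TaskAgent:
    """Asks the facilitator for a DM agent able to perform the task."""

    def __init__(self, facilitator):
        self.facilitator = facilitator

    def run(self, capability, data_agent):
        candidates = self.facilitator.find(capability)
        if not candidates:
            raise LookupError(f"no agent registered for {capability!r}")
        return candidates[0].mine(data_agent)

facilitator = Facilitator()
facilitator.register("most_frequent", FrequencyMiningAgent())
sales = DataAgent("sales", ["a", "b", "a"])
result = TaskAgent(facilitator).run("most_frequent", sales)
print(result)
```

Because the task agent only knows capabilities, not concrete agents, a new DM technique can be added by registering another agent without changing any existing code.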

Agent-based data mining model
The agent-based data mining model contains many different types of Agent, each used to achieve a different function. The commonly used types are as follows: 1) an Agent responsible for human-computer interaction; 2) an Agent used to select the data pre-processing method; 3) an Agent used for data pre-processing; 4) an Agent used to select data mining algorithms; 5) an Agent used to carry out the specific data mining process; 6) an Agent used to express the results and for post-processing. A single Agent in the model may implement several of these functions. The structure of the mining model M and the information interaction [3] between the different types of Agent are shown in Figure 1.

Fig. 1 Intelligent Agent-based data mining model and the interactions between the various members

The specific steps are: (1) The Agent responsible for human-computer interaction generates the mining task according to the user's requirements. This step can use a graphical interface that lets the user enter the information needed to carry out the mining, for example the location of the database, the type of mining results, and the form in which they should be expressed. (2) After receiving the mining task, the interaction Agent passes the information to the Agent that selects the data pre-processing method. That Agent decides, from the information received, whether pre-processing is needed. If it is, the Agent selects an appropriate pre-processing method based on its knowledge and passes the pre-processing task to the corresponding data pre-processing Agent; if it is not, the task goes directly to the selection of mining algorithms.

(3) The data pre-processing Agent carries out the data pre-processing and sends the results to the mining algorithm selection Agent. (4) The Agent that selects mining algorithms receives the data, combines the user's information needs with its own expertise to select an appropriate mining algorithm, and passes the task to the corresponding data mining algorithm Agent. (5) The data mining algorithm Agent receives the information, calls the corresponding algorithm, and starts the mining task. The mining results are eventually output to the Agent used to express the results and for post-processing.

(6) The Agent that expresses the results and performs post-processing receives the results, chooses a form that meets the user's preferences, and sends it to the interaction Agent for display to the user. The model proposed above is only a simplified model. Obtaining really effective knowledge requires further analysis of, for example, how Agents express knowledge, the collaboration rules between different Agents, and the design of mining algorithms for the Agents' autonomic computing.
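The six steps above amount to a chain of hand-offs with one branch: pre-processing is skipped when it is not needed. The sketch below models each Agent as a plain function. The function names, the request format, and the use of "average" and "maximum" as stand-ins for mining algorithms are all assumptions made for illustration.

```python
from statistics import mean

def interaction_agent(request):
    """Step 1: turn the user's request into a mining task."""
    return {"values": list(request["values"]), "goal": request["goal"]}

def preprocessing_selector(task):
    """Step 2: decide whether pre-processing is needed, and which method."""
    return "drop_missing" if None in task["values"] else None

def preprocessing_agent(task, method):
    """Step 3: apply the chosen pre-processing method."""
    if method == "drop_missing":
        return {**task, "values": [v for v in task["values"] if v is not None]}
    return task

def algorithm_selector(task):
    """Step 4: choose an algorithm that matches the user's goal."""
    return {"average": mean, "maximum": max}[task["goal"]]

def mining_agent(algorithm, task):
    """Step 5: run the selected algorithm on the data."""
    return algorithm(task["values"])

def presentation_agent(task, result):
    """Step 6: express the result in a form suited to the user."""
    return f"{task['goal']} = {result}"

def run_model(request):
    task = interaction_agent(request)
    method = preprocessing_selector(task)
    if method is not None:          # otherwise go straight to algorithm selection
        task = preprocessing_agent(task, method)
    result = mining_agent(algorithm_selector(task), task)
    return presentation_agent(task, result)

answer = run_model({"values": [4.0, None, 6.0], "goal": "average"})
print(answer)
```

Real Agents would exchange messages and hold their own knowledge rather than being called in sequence, which is exactly the further design work the simplified model leaves open.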

Data Mining in the Railway Information Integration Program using the Agent-based data mining model
Data mining in the railway information integration program can be divided into the following four steps: (1) Embed the functions of the interaction Agent and of the representation and post-processing Agent into the User Agent groups of the Ministry of Railways, Railways and Stations, so that the integration tasks of every User Agent group reflect the user's mining needs in a specific, appropriate manner. (2) While specific solutions are being developed for the program, the task analysis Agent uses its knowledge of the task domain to determine whether the integration involves data mining-related tasks. If it does, the Agent records them in the program in an effective, specific form.

(3) Once the integration program has been determined, the evaluation Agent evaluates it and converts it into executable tasks. For the data mining tasks, the production management Agent establishes the basic function-performing bodies that carry out the corresponding data mining tasks, each body performing a different part of the task. These function-performing bodies are composed to create the global and local data model Agents. (4) When the data mining tasks are performed, the function-performing bodies complete the data mining tasks they are responsible for within the appropriate time, based on timing logic. The information obtained forms the components of an effectively integrated data model. This information supports decision-making for the Ministry of Railways, Railways and Stations, integrated with their internal information.

Finally, around 1993, another effort was started on agent-based data mining [11,12,13], which uses agent technology to enhance data mining. The enhancement may take various forms, for instance in infrastructure, distributed processing, or human involvement. The following lists some of the research topics:
- Agent-based enterprise data mining
- Agent-based data mining infrastructure
- Agent-based data warehouse
- Agent-based mining process and project management
- Agent-based distributed data mining
- Agent-based multi-data source mining
- Agent-based interactive data mining
- Agent-based parallel data mining
- Agent-based web mining
- Agent-based text mining
- Agent-based human mining cooperation
- Agent-based link mining
- Agent-enriched ontology mining
- Agent-based ubiquitous data mining
- Agent knowledge management in distributed data mining
- Agent for data mining data preparation
- Agent-human-cooperated data mining
- Agent networks in distributed knowledge discovery and servicing
- Agent service-based KDD infrastructure
- Agent-supported domain knowledge involvement in KDD
- Agent system providing data mining services
- Automated data mining learning
- Distributed agent-based data preprocessing
- Distributed learning
- Domain intelligence in agent-based data mining
- Mobile agent-based knowledge discovery
- Multi-agent reinforcement learning
- Multi-agent knowledge discovery
- Protocols for agent-based data mining
- Self-organizing data mining learning, etc.

References
[1] L. M. Jia, G. Liu and Y. Qin: Intelligent Agent Based Dynamical Collaboration Mechanism for Complex Task Solving [M]. Science Press, 2007: 8-14.
[2] J. W. Han, M. Kamber: Data Mining: Concepts and Techniques, Second Edition [M]. China Machine Press, 2008: 146-155.
[3] N. Saiwaki, J. Kawabata, H. Tsujimoto: Cooperative performance system based on virtual agents [C]. IEEE Conf., USA, 1996: 3230-3234.
[4] S. Bergamaschi, G. Cabri, F. Guerra, L. Leonardi, M. Vincini and F. Zambonelli: Exploiting agents to support information integration [J]. International Journal of Cooperative Information Systems, 2002, 11(3).
[5] M. R. Genesereth, A. M. Keller and O. M. Duschka: Infomaster: an information integration system [C]. Proceedings of SIGMOD 97, New York, May 1997: 539-542.
[6] W. Davies: ANIMALS: A Distributed, Heterogeneous Multi-Agent Learning System. MSc Thesis, University of Aberdeen (1993).
[7] W. Davies: Agent-Based Data-Mining (1994).
[8] P. Edwards, W. Davies: A Heterogeneous Multi-Agent Learning System. In S. M. Deen (ed.): Proceedings of the Special Interest Group on Cooperating Knowledge Based Systems. University of Keele (1993): 163-184.
[9] K. A. Albashiri: An Investigation into the Issues of Multi-Agent Data Mining. PhD thesis, University of Liverpool.
