Data Mining: generally (sometimes called data or knowledge discovery), the process of analyzing data from different perspectives and summarizing it.

1 Data Mining Generally, data mining (sometimes called data or knowledge discovery) is the process of analyzing data from different perspectives and summarizing it into useful information: information that can be used to increase revenue, cut costs, or both. Technically, data mining is the process of finding correlations or patterns among dozens of fields in large relational databases.

2 Data, Information, and Knowledge
Data are any facts, numbers, or text that can be processed by a computer. They include operational or transactional data, such as sales, cost, inventory, payroll, and accounting; nonoperational data, such as industry sales, forecast data, and macroeconomic data; and metadata (data about the data itself), such as logical database design or data dictionary definitions.

3 Continued: Information can be converted into knowledge about historical patterns and future trends. For example, summary information on retail supermarket sales can be analyzed in light of promotional efforts to provide knowledge of consumer buying behavior. Thus, a manufacturer or retailer could determine which items are most susceptible to promotional efforts.

4 Data Warehouses Data warehousing is defined as a process of centralized data management and retrieval. It represents an ideal vision of maintaining a central repository of all organizational data. Centralization of data is needed to maximize user access and analysis. Data analysis software allows users to access this data freely.

5 How Data Mining Works Data mining provides the link between transaction systems and analytical systems. Data mining software analyzes relationships and patterns in stored transaction data based on open-ended user queries. Several types of analytical software are available: statistical, machine learning, and neural networks.

6 Types of Relationships
Classes: Stored data is used to locate data in predetermined groups.
Clusters: Data items are grouped according to logical relationships or consumer preferences.
Associations: Data can be mined to identify associations.
Sequential patterns: Data is mined to anticipate behavior patterns and trends.
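
As a rough illustration of the "associations" relationship, the sketch below computes two common association measures, support and confidence, over a handful of shopping baskets. The basket contents and the function names `pair_support` and `confidence` are hypothetical and chosen only for illustration; the slides do not name a specific algorithm.

```python
from collections import Counter
from itertools import combinations

# Hypothetical market-basket transactions (illustrative data only).
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
]

def pair_support(baskets):
    """Return the fraction of baskets that contain each item pair."""
    counts = Counter()
    for basket in baskets:
        # Sort so each pair has one canonical (alphabetical) key.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return {pair: n / len(baskets) for pair, n in counts.items()}

def confidence(baskets, antecedent, consequent):
    """Estimate how often `consequent` appears in baskets with `antecedent`."""
    with_antecedent = [b for b in baskets if antecedent in b]
    if not with_antecedent:
        return 0.0
    return sum(consequent in b for b in with_antecedent) / len(with_antecedent)

support = pair_support(transactions)
print(support[("bread", "milk")])                   # share of baskets with both items
print(confidence(transactions, "butter", "bread"))  # how often butter implies bread
```

In this toy data, every basket containing butter also contains bread, so the rule "butter implies bread" has the highest possible confidence, while only half of the baskets contain both bread and milk.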

7 Data Mining Elements Data mining consists of five major elements. Extract, transform, and load transaction data onto the data warehouse system. Store and manage the data in a multidimensional database system. Provide data access to business analysts and information technology professionals. Analyze the data with application software. Present the data in a useful format, such as a graph or table.
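
The elements above can be sketched end to end in a few lines. This is a minimal sketch under stated assumptions: the records are made up, the function names `transform`, `load`, and `analyze` are hypothetical, and an in-memory SQLite table stands in for the multidimensional database system the slide mentions (SQLite is relational, not multidimensional).

```python
import sqlite3

# Hypothetical raw transaction records: (day, item, quantity, unit price).
raw_rows = [
    ("2024-01-05", "widget", "3", "2.50"),
    ("2024-01-05", "gadget", "1", "10.00"),
    ("2024-01-06", "widget", "2", "2.50"),
]

def transform(row):
    """Convert text fields to typed values and derive the sale amount."""
    day, item, qty, price = row
    return day, item, int(qty), int(qty) * float(price)

def load(rows):
    """Load transformed rows into an in-memory SQLite table (warehouse stand-in)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales (day TEXT, item TEXT, qty INTEGER, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", rows)
    return conn

def analyze(conn):
    """Summarize total revenue per item."""
    query = "SELECT item, SUM(amount) FROM sales GROUP BY item ORDER BY item"
    return conn.execute(query).fetchall()

conn = load(transform(r) for r in raw_rows)
summary = analyze(conn)

# Present the result as a simple text table.
for item, total in summary:
    print(f"{item:<10}{total:>8.2f}")
```

Keeping each stage as its own function mirrors the separation on the slide: the analysis query can change without touching how the data is extracted or loaded.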

8 Data Mining for Software Engineering
Data mining for software engineering consists of collecting software engineering data, extracting knowledge from it and, if possible, using this knowledge to improve the software engineering process. It can be decomposed along three axes: the goal, the input data used, and the mining technique used.

9 Continued The goal. Software engineering at large consists of many tasks, from specification, design, and development to monitoring at runtime. Each task is itself decomposed into many smaller-scale tasks. A developer constantly switches between tasks, such as navigating code, reading documentation, writing code, and debugging.

10 Continued The input data. The software engineering process in its entirety manipulates all kinds of data. Of course, one thinks of code, but there are also many written documents (specifications, documentation), design documents (diagrams, formulas), runtime documents (logs), etc. Depending on the targeted goal, some artifacts are more or less appropriate.

11 Continued The techniques. There is a wealth of data mining and machine learning techniques. Mature implementations are available, and powerful hardware enables techniques to scale to large datasets. There is no one-size-fits-all solution for manipulating software engineering data. From supervised to unsupervised approaches, numerical or categorical, batch or online, many techniques have been used.
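
To make the batch versus online distinction concrete, the sketch below computes the same statistic (a mean) both ways. The `RunningMean` class and the build-duration values are hypothetical and used only for illustration; real mining techniques are more involved, but the contrast in how the data is consumed is the same.

```python
from statistics import fmean

class RunningMean:
    """Online estimate of a mean, updated one observation at a time."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0

    def update(self, value):
        self.count += 1
        # Incremental update: shift the mean toward the new value.
        self.mean += (value - self.mean) / self.count
        return self.mean

# Hypothetical build durations in seconds (illustrative data only).
durations = [12.0, 15.0, 11.0, 14.0]

# Batch: the whole dataset must be available at once.
batch_mean = fmean(durations)

# Online: each value is seen once and then discarded.
online = RunningMean()
for d in durations:
    online.update(d)

print(batch_mean, online.mean)
```

The online version keeps only a count and the current estimate, which is why online techniques suit data that arrives continuously, such as runtime logs.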

