CHAPTER 8 DATA MINING BASICS.

Slides:



Advertisements
Similar presentations
1 Chapter 34 Data Mining Transparencies © Pearson Education Limited 1995, 2005.
Advertisements

Data Mining Glen Shih CS157B Section 1 Dr. Sin-Min Lee April 4, 2006.
Data Mining Sangeeta Devadiga CS 157B, Spring 2007.
1. Abstract 2 Introduction Related Work Conclusion References.
Chapter 9 Business Intelligence Systems
Managing Data Resources
Week 9 Data Mining System (Knowledge Data Discovery)
© Prentice Hall1 DATA MINING TECHNIQUES Introductory and Advanced Topics Eamonn Keogh (some slides adapted from) Margaret Dunham Dr. M.H.Dunham, Data Mining,
Data Mining Knowledge Discovery in Databases Data 31.
SLIDE 1IS 257 – Fall 2008 Data Mining and the Weka Toolkit University of California, Berkeley School of Information IS 257: Database Management.
Data Mining By Archana Ketkar.
Data Mining Adrian Tuhtan CS157A Section1.
Data Mining – Intro.
Data Mining: A Closer Look
Chapter 5 Data mining : A Closer Look.
Business Intelligence
Chapter 35 Data Mining Transparencies. 2 Chapter Objectives u The concepts associated with data mining. u The main features of data mining operations,
Data Mining & Data Warehousing PresentedBy: Group 4 Kirk Bishop Joe Draskovich Amber Hottenroth Brandon Lee Stephen Pesavento.
TURKISH STATISTICAL INSTITUTE INFORMATION TECHNOLOGIES DEPARTMENT (Muscat, Oman) DATA MINING.
Enterprise systems infrastructure and architecture DT211 4
3-1 Chapter Three. 3-2 Secondary Data vs. Primary Data Secondary Data: Data that have been gathered previously. Primary Data: New data gathered to help.
Data Mining By Andrie Suherman. Agenda Introduction Major Elements Steps/ Processes Tools used for data mining Advantages and Disadvantages.
Data Mining: Concepts & Techniques. Motivation: Necessity is the Mother of Invention Data explosion problem –Automated data collection tools and mature.
By: Dr. Mohammed Alojail College of Computer Sciences & Information Technology 1.
Dr. Awad Khalil Computer Science Department AUC
Shilpa Seth.  What is Data Mining What is Data Mining  Applications of Data Mining Applications of Data Mining  KDD Process KDD Process  Architecture.
Data Mining. 2 Models Created by Data Mining Linear Equations Rules Clusters Graphs Tree Structures Recurrent Patterns.
Data Mining Dr. Chang Liu. What is Data Mining Data mining has been known by many different terms Data mining has been known by many different terms Knowledge.
1 Data Mining DT211 4 Refer to Connolly and Begg 4ed.
Data Mining Techniques As Tools for Analysis of Customer Behavior
Understanding Data Analytics and Data Mining Introduction.
Computers Are Your Future Tenth Edition Chapter 12: Databases & Information Systems Copyright © 2009 Pearson Education, Inc. Publishing as Prentice Hall1.
3 Objects (Views Synonyms Sequences) 4 PL/SQL blocks 5 Procedures Triggers 6 Enhanced SQL programming 7 SQL &.NET applications 8 OEM DB structure 9 DB.
Chapter 1 Introduction to Data Mining
Introduction to Data Mining Group Members: Karim C. El-Khazen Pascal Suria Lin Gui Philsou Lee Xiaoting Niu.
INTRODUCTION TO DATA MINING MIS2502 Data Analytics.
1 1 Slide Introduction to Data Mining and Business Intelligence.
Knowledge Discovery and Data Mining Evgueni Smirnov.
Data Mining Chapter 1 Introduction -- Basic Data Mining Tasks -- Related Concepts -- Data Mining Techniques.
Database Design Part of the design process is deciding how data will be stored in the system –Conventional files (sequential, indexed,..) –Databases (database.
Lecturer: Gareth Jones. How does a relational database organise data? What are the principles of a database management system? What are the principal.
 Fundamentally, data mining is about processing data and identifying patterns and trends in that information so that you can decide or judge.  Data.
Advanced Database Course (ESED5204) Eng. Hanan Alyazji University of Palestine Software Engineering Department.
3-1 Data Mining Kelby Lee. 3-2 Overview ¨ Transaction Database ¨ What is Data Mining ¨ Data Mining Primitives ¨ Data Mining Objectives ¨ Predictive Modeling.
Artificial Intelligence is the field of computer science that studies how machines can be made to act intelligently. The benefit of using The benefit of.
CRM - Data mining Perspective. Predicting Who will Buy Here are five primary issues that organizations need to address to satisfy demanding consumers:
Chapter 5: Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization DECISION SUPPORT SYSTEMS AND BUSINESS.
Data Mining BY JEMINI ISLAM. Data Mining Outline: What is data mining? Why use data mining? How does data mining work The process of data mining Tools.
Chapter 14 Data Mining Transparencies. 2 Chapter Objectives u The concepts associated with data mining. u The main features of data mining operations,
1 Introduction to Data Mining C hapter 1. 2 Chapter 1 Outline Chapter 1 Outline – Background –Information is Power –Knowledge is Power –Data Mining.
MIS2502: Data Analytics Advanced Analytics - Introduction.
An Introduction Student Name: Riaz Ahmad Program: MSIT( ) Subject: Data warehouse & Data Mining.
DATA MINING PREPARED BY RAJNIKANT MODI REFERENCE:DOUG ALEXANDER.
Academic Year 2014 Spring Academic Year 2014 Spring.
Data Mining Copyright KEYSOFT Solutions.
Customer Relationship Management (CRM) Chapter 4 Customer Portfolio Analysis Learning Objectives Why customer portfolio analysis is necessary for CRM implementation.
WHAT IS DATA MINING?  The process of automatically extracting useful information from large amounts of data.  Uses traditional data analysis techniques.
CRISP-DM Tommy Wei Cory Hutchinson ISDS Overview What is CRISP-DM (CRoss Industry Standard Process for Data Mining) Blueprint Phases and Tasks Summary.
DATA MINING It is a process of extracting interesting(non trivial, implicit, previously, unknown and useful ) information from any data repository. The.
Copyright © 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 28 Data Mining Concepts.
Copyright  2007 McGraw-Hill Pty Ltd PPTs t/a Marketing Research 2e by Lukas, Hair, Bush and Ortinau Slides prepared by Judy Rex 19-1 Chapter Nineteen.
Chapter 3 Building Business Intelligence Chapter 3 DATABASES AND DATA WAREHOUSES Building Business Intelligence 6/22/2016 1Management Information Systems.
CS570: Data Mining Spring 2010, TT 1 – 2:15pm Li Xiong.
Supplemental Chapter: Business Intelligence Information Systems Development.
Data Mining.
Introduction BIM Data Mining.
MIS2502: Data Analytics Advanced Analytics - Introduction
DATA MINING © Prentice Hall.
Adrian Tuhtan CS157A Section1
Kenneth C. Laudon & Jane P. Laudon
Presentation transcript:

CHAPTER 8 DATA MINING BASICS

CHAPTER OBJECTIVE: What is Data Mining – Data Mining Definitions Data Mining Architecture Data Mining Applications Advantages and Disadvantages of Data Mining Data Mining Processes Data Mining Techniques

Knowledge Discovery Knowledge discovery is a process that extracts implicit, potentially useful or previously unknown information from the data. The knowledge discovery process is described as follows: Knowledge Discovery Process

Let’s examine the knowledge discovery process in the diagram above in details: Data comes from variety of sources is integrated into a single data store called target data Data then is pre-processed and transformed into standard format. The data mining algorithms process the data to the output in form of patterns or rules. Then those patterns and rules are interpreted to new or useful knowledge or information. The ultimate goal of knowledge discovery and data mining process is to find the patterns that are hidden among the huge sets of data and interpret them to useful knowledge and information. As described in process diagram above, data mining is a central part of knowledge discovery process. Let answer the question “what is data mining?” by examining several data mining definitions.

What is Data Mining – Data Mining Definitions Data mining is a process used by companies to turn raw data into useful information. By using software to look for patterns in large batches of data, businesses can learn more about their customers and develop more effective marketing strategies as well as increase sales and decrease costs. Data mining is a process that analyzes the large amount of data to find the new and hidden information that improves business efficiency. Various industries have been adopt data mining to their mission-critical business processes to gain competitive advantages and help business grows. Data mining is the practice of automatically searching large stores of data to discover patterns and trends that go beyond simple analysis.

The key properties of data mining are: Data mining depends on effective data collection and warehousing as well as computer processing. Data mining uses sophisticated mathematical algorithms to segment the data and evaluate the probability of future events. Data mining is also known as Knowledge Discovery in Data (KDD). The key properties of data mining are: Automatic discovery of patterns Prediction of likely outcomes Creation of actionable information Focus on large data sets and databases Data mining can answer questions that cannot be addressed through simple query and reporting techniques.

Data Mining Architecture There are four possible architectures of a data mining system as follows: No-coupling :in this architecture, A data mining system does not utilize any functionality of a database or data warehouse system. A no-coupling data mining system retrieves data from a particular data sources such as file system, processes data using major data mining algorithms and stores results into file system. The no-coupling data mining architecture does not take any advantages of database or data warehouse that is already very efficient in organizing, storing, accessing and retrieving data. The no-coupling architecture is considered a poor architecture for data mining system however it is used for simple data mining processes.

2) Loose Coupling In this architecture, A data mining system uses database or data warehouse for data retrieval. In loose coupling data mining architecture, data mining system retrieves data from database or data warehouse, processes data using data mining algorithms and stores the result in those systems. This architecture is mainly for memory-based data mining system that does not require high scalability and high performance.

3) Semi-tight Coupling In this architecture, Linking to database or data warehouse system. Uses several features of database or data warehouse systems to perform some data mining tasks including sorting, indexing, aggregation…etc. Some intermediate result can be stored in database or data warehouse system for better performance.

4) Tight Coupling In this architecture, Database or data warehouse is treated as an information retrieval component of data mining system using integration. All the features of database or data warehouse are used to perform data mining tasks. Provides system scalability, high performance and integrated information.

Data Mining Applications Data Mining Applications in Sales/Marketing Data mining enables the businesses to understand the patterns hidden inside past purchase transactions, thus helping in plan and launch new marketing campaigns in prompt and cost effective way. Data mining is used for market basket analysis to provides insight information on what product combinations were purchased, when they were bought and in what sequence by customers. This information helps businesses to promote their most profitable products to maximize the profit. In addition, it encourages customers to purchase related products that they may have been missed or overlooked. Retails companies uses data mining to identify customer’s behavior buying patterns.

Data Mining Applications in Banking / Finance Several data mining techniques such as distributed data mining has been researched, modeled and developed to help credit card fraud detection. Data mining is used to identify customers loyalty by analyzing the data of customer’s purchasing activities such as the data of frequency of purchase in a period of time, total monetary value of all purchases and when was the last purchase. After analyzing those dimensions, the relative measure is generated for each customer. To help bank to retain credit card customers, data mining is used. By analyzing the past data, data mining can help banks to predict customers that likely to change their credit card affiliation so they can plan and launch different special offers to retain those customers. Credit card spending by customer groups can be identified by using data mining. The hidden correlation’s between different financial indicators can be discovered by using data mining. From historical market data, data mining enable to identify stock trading rules.

Data Mining Applications in Health Care and Insurance The growth of the insurance industry is entirely depends on the ability of converting data into the knowledge about customers, competitors and its markets. The data mining applications in insurance industry are listed below: Data mining is applied in claims analysis such as identifying which medical procedures are claimed together. Data mining enables to forecasts which customers will potentially purchase new policies. Data mining allows insurance companies to detect risky customers’ behavior patterns.

Data Mining Applications in Medicine Data mining enables to characterize patient activities to see coming office visits. Data mining help identify the patterns of successful medical therapies for different illnesses.

Advantages and Disadvantages of Data Mining Data mining is an important part of knowledge discovery process that analyzes large enormous set of data and gives us unknown, hidden and useful information and knowledge. Data mining has not only applied effectively in business environment but also in other fields such as weather forecast, medicine, transportation, healthcare, insurance, government and etc. Data mining brings a lot of advantages when using in a specific industry. Besides those advantages, data mining also has its own disadvantages as well such as privacy, security and misuse of information. We will examine the advantage and disadvantages of data mining in different industries in a greater details.

Disadvantages of data mining Privacy Issues The concerns about the personal privacy have been increasing enormously recently especially when internet is booming with social networks, e-commerce, forums, blogs…. Because of privacy issues, people are afraid of their personal information is collected and used in unethical way that potentially causing them a lot of trouble. Security issues Security is a big issue. Businesses owns information about their employee and customers including social security number, birthday, payroll and etc. However how properly this information is taken is still in questions. There have been a lot of cases that hackers were accesses and stole big data of customers from big corporation such as Ford Motor Credit Company, Sony… with so much personal and financial information available, the credit card stolen and identity theft become a big problem. Misuse of information/inaccurate information Information collected through data mining intended for marketing or ethical purposes can be misused. This information is exploited by unethical people or business to take benefit of vulnerable people or discriminate against a group of people.

Data Mining Processes The data mining process must be reliable and repeatable by business people with little knowledge or no data mining background. In 1990, a cross-industry standard process for data mining (CRISP-DM) first published after going through a lot of workshops, and contributions from over 300 organizations. Let’s examine the cross-industry standard process for data mining in greater details.

Cross-Industry Standard Process for Data Mining (CRISP-DM) The Cross-Industry Standard Process for Data Mining (CRISP-DM) Cross-Industry Standard Process for Data Mining (CRISP-DM) consists of six phases intended as a cyclical process as the following figure: Cross-Industry Standard Process for Data Mining (CRISP-DM)

Business understanding phase Understand business objectives clearly and make sure to find out what the client really want to achieve. Assess the current situation by finding about the resources, assumptions, constraints. Create data mining goals to achieve the business objective and within the current situation. A good data mining plan has to be established to achieve both business and data mining goals. Data understanding Starts with initial data collection that collects data from available sources to get familiar with data. The “gross” or “surface” properties of acquired data need to be examined carefully and reported. The data need to be explored by tackling the data mining questions, which can be addressed using querying, reporting and visualization. The data quality must be examined by answering some important questions such as “Is the acquired data complete?”, “Is there any missing values in the acquired data?”

Data preparation Once data sources available are identified, they need to be selected, cleaned, constructed and formatted into the desired form. Modeling Modeling techniques have to be selected to be used for the prepared dataset. The test scenario must be generated to validate the models’ quality and validity. One or more models are created by running the modeling tool on the prepared dataset. Models need to be assessed carefully involving stakeholders to make sure that created models are meet business initiatives.

Evaluation The model results must be evaluated in the context of business objectives in the first phase. New business requirements may be raised due to new patterns has been discovered in the model results or from other factors. Deployment The information that gain through data mining process needs to be presented in such a way that stakeholders can use it when they want it. Based on the business requirements, the deployment phase could be as simple as creating a report or as complex as a repeatable data mining process across the organization. The CRISP-DM: Offers a uniform framework for experience documentation and guidelines. Can apply in different industry with different type of data.

Data Mining Techniques There are several major data mining techniques have been developed and used in data mining projects recently including: Association Classification Clustering Prediction

Association Association is one of the best known data mining technique. In association, a pattern is discovered based on a relationship of a particular item on other items in the same transaction. For example, the association technique is used in market basket analysis to identify what products that customers frequently purchase together. Based on this data businesses can have corresponding marketing campaign to sell more products to make more profit.

2) Classification Basically classification is used to classify each item in a set of data into one of predefined set of classes or groups. Classification method makes use of mathematical techniques such as decision trees, linear programming, neural network and statistics. The software can learn how to classify the data items into groups. For example, we can apply classification in application that “given all past records of employees who left the company, predict which current employees are probably to leave in the future.” In this case, we divide the employee’s records into two groups that are “leave” and “stay”. And then we can ask our data mining software to classify the employees into each group.

3) Clustering Clustering is a data mining technique that makes meaningful or useful cluster of objects that have similar characteristic using automatic technique. Clustering technique also defines the classes and put objects in them, while in classification objects are assigned into predefined classes. For example. In a library, books have a wide range of topics available. The challenge is how to keep those books in a way that readers can take several books in a specific topic without hassle. By using clustering technique, we can keep books that have some kind of similarities in one cluster or one shelf and label it with a meaningful name. If readers want to grab books in a topic, he or she would only go to that shelf instead of looking the whole in the library.

4) Prediction The prediction as it name implied is one of a data mining techniques that discovers relationship between dependent and independent variables. For instance, prediction analysis technique can be used in sale to predict profit for the future if we consider sale is an independent variable, profit could be a dependent variable.