Data Mining Glen Shih CS157B Section 1 Dr. Sin-Min Lee April 4, 2006.

Overview Explanation of data mining. Benefits of data mining. Data mining background. Data mining models. Data warehousing. Problems and issues of data mining. Potential applications of data mining.

What Is Data Mining? Data mining is the automated extraction of hidden predictive information from databases. It is an extension of statistics with a few artificial intelligence and machine learning twists.

What Is Data Mining? (cont.) Now the term data mining is stretched beyond its limits and applied to any form of data analysis. It encompasses a number of different technical approaches, such as clustering, data summarization, learning classification rules, finding dependency networks, analyzing changes, and detecting anomalies.

Why Data Mining? Data mining software allows users to analyze large databases to solve business decision problems. For example, data mining software can use the history of previous interactions between a business and its customers to build a model of customer behavior that predicts customer responses to new products.

Data Mining Background Data mining research has drawn on a number of other fields: machine learning, statistics, and inductive learning.

Inductive Learning Strategies Inductive learning, in which the system infers knowledge by itself from observing its environment, has two main strategies: supervised learning and unsupervised learning.
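To make the two strategies concrete, here is a minimal Python sketch. It is an illustration only, not something taken from the slides: the spending figures, the labels, and the function names `train_centroids`, `classify`, and `two_means` are all assumptions. The supervised part learns from examples that already carry a label; the unsupervised part receives bare values and has to find the groups itself.

```python
from statistics import mean

# Hypothetical toy data: monthly spend per customer, with a known label.
labeled = [(20.0, "low"), (25.0, "low"), (90.0, "high"), (110.0, "high")]


def train_centroids(examples):
    """Supervised: labels are given, so learn one mean value per label."""
    by_label = {}
    for value, label in examples:
        by_label.setdefault(label, []).append(value)
    return {label: mean(values) for label, values in by_label.items()}


def classify(centroids, value):
    """Assign the label whose learned mean is closest to the value."""
    return min(centroids, key=lambda label: abs(centroids[label] - value))


def two_means(values, iterations=10):
    """Unsupervised: no labels; split values into two groups (1-D k-means, k=2)."""
    low, high = min(values), max(values)
    if low == high:
        # All values are identical, so there is nothing to separate.
        return sorted(values), []
    groups = ([], [])
    for _ in range(iterations):
        groups = ([], [])
        for v in values:
            nearest = 0 if abs(v - low) <= abs(v - high) else 1
            groups[nearest].append(v)
        low, high = mean(groups[0]), mean(groups[1])
    return sorted(groups[0]), sorted(groups[1])


centroids = train_centroids(labeled)
prediction = classify(centroids, 30.0)
clusters = two_means([20.0, 25.0, 90.0, 110.0, 30.0])
```

The same five values end up in the same two groups either way; the difference is that the supervised version was told what the groups mean, while the unsupervised version only knows that two groups exist.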

Data Mining Models IBM has identified two types of models, or modes of operation, which may be used to reveal information of interest to users: the verification model and the discovery model.
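One way to read the difference between the two models: in verification the user brings a hypothesis and the system checks it against the data, while in discovery the system searches the data for patterns on its own. The sketch below illustrates that reading under stated assumptions; the customer records, the field names, and the functions `verify` and `discover` are hypothetical and are not IBM's definitions or tooling.

```python
# Hypothetical customer records; field names are illustrative only.
customers = [
    {"age_group": "18-30", "region": "west", "responded": True},
    {"age_group": "18-30", "region": "east", "responded": True},
    {"age_group": "31-50", "region": "west", "responded": False},
    {"age_group": "31-50", "region": "east", "responded": False},
    {"age_group": "18-30", "region": "west", "responded": False},
    {"age_group": "51+", "region": "east", "responded": False},
]


def response_rate(records, field, value):
    """Fraction of records with field == value that responded."""
    matching = [r for r in records if r[field] == value]
    if not matching:
        return 0.0
    return sum(r["responded"] for r in matching) / len(matching)


def verify(records, field, value, baseline):
    """Verification: the user supplies the hypothesis; the system checks it."""
    return response_rate(records, field, value) > baseline


def discover(records, fields):
    """Discovery: the system scans every field/value pair and returns the best one."""
    candidates = {(f, r[f]) for r in records for f in fields}
    ranked = sorted(
        candidates,
        key=lambda fv: (-response_rate(records, fv[0], fv[1]), fv),
    )
    return ranked[0]


overall = sum(r["responded"] for r in customers) / len(customers)
hypothesis_holds = verify(customers, "age_group", "18-30", overall)
best_segment = discover(customers, ["age_group", "region"])
```

Here the user's hypothesis ("18-30 year olds respond more than average") is confirmed, and the discovery scan arrives at the same segment without being told where to look.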

Data Warehousing Data mining potential can be enhanced if the appropriate data has been collected and stored in a data warehouse. The data warehousing market consists of tools, technologies, and methodologies that allow for the construction, usage, management, and maintenance of the hardware and software used for a data warehouse, as well as the actual data itself.

Data Warehouse The term data warehouse was coined by Bill Inmon in 1990. He defined it in the following way: "A warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management's decision making process".

Data Warehouse (cont.) Subject Oriented: Data that gives information about a particular subject instead of about a company's ongoing operations. Integrated: Data that is gathered into the data warehouse from a variety of sources and merged into a coherent whole.

Data Warehouse (cont.) Time-Variant: All data in the data warehouse is identified with a particular time period. Non-Volatile: Data is stable in a data warehouse. More data is added but data is never removed. This enables management to gain a consistent picture of the business.
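The four properties in Inmon's definition can be sketched as a tiny in-memory store. This is a toy illustration under stated assumptions, not how a real warehouse is built: the `Fact` and `Warehouse` classes, their fields, and the sample figures are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: a stored row cannot be modified afterwards
class Fact:
    subject: str   # subject-oriented: rows are organized around a subject
    period: str    # time-variant: every row is tied to a time period
    source: str    # integrated: rows from different source systems sit together
    amount: float


class Warehouse:
    """Non-volatile: rows can be loaded and read, but there is no delete or update."""

    def __init__(self):
        self._facts = []

    def load(self, fact):
        self._facts.append(fact)

    def total(self, subject, period):
        return sum(
            f.amount
            for f in self._facts
            if f.subject == subject and f.period == period
        )


warehouse = Warehouse()
warehouse.load(Fact("sales", "2006-Q1", "store_system", 1200.0))
warehouse.load(Fact("sales", "2006-Q1", "web_shop", 800.0))
warehouse.load(Fact("sales", "2005-Q4", "store_system", 950.0))
q1_sales = warehouse.total("sales", "2006-Q1")
```

Because old periods stay in place when new ones are loaded, a question about 2005-Q4 gets the same answer however much data arrives later.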

Problems and Issues of Data Mining Data mining systems rely on databases to supply the raw data for input. Problems arise because databases tend to be dynamic, incomplete, noisy, and large. Other problems relate to the adequacy and relevance of the information stored.

Problems and Issues Limited information. Uncertainty. Size, updates, and irrelevant fields. Noise and missing values.

Ways to Treat Missing Data by Discovery Systems Simply disregard missing values. Omit the corresponding records. Infer missing values from known values. Treat missing data as a special value to be included additionally in the attribute domain. Average over the missing values using Bayesian techniques.
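Three of these treatments are simple enough to sketch in a few lines of Python. The records, the field names, and the helper names below are hypothetical, and using the mean of the known values is only one of many possible ways to "infer" a missing value. The Bayesian averaging approach is not shown.

```python
from statistics import mean

# Hypothetical records; None marks a missing value.
records = [
    {"age": 34, "income": 52000},
    {"age": None, "income": 61000},
    {"age": 45, "income": None},
    {"age": 29, "income": 48000},
]


def omit_records(rows, field):
    """Omit the corresponding records: drop any row where the field is missing."""
    return [r for r in rows if r[field] is not None]


def infer_from_known(rows, field):
    """Infer missing values from known values (here: the mean of the known ones)."""
    known = [r[field] for r in rows if r[field] is not None]
    fill = mean(known)
    return [{**r, field: fill if r[field] is None else r[field]} for r in rows]


def as_special_value(rows, field, marker="UNKNOWN"):
    """Treat missing data as an extra value added to the attribute domain."""
    return [{**r, field: marker if r[field] is None else r[field]} for r in rows]


complete_ages = omit_records(records, "age")
imputed_ages = infer_from_known(records, "age")
marked_ages = as_special_value(records, "age")
```

Omitting records shrinks the data set, inferring keeps every row at the cost of an invented value, and the special marker keeps every row while letting later steps see that the value was missing.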

Potential Applications of Data Mining Retail and Marketing: identify buying patterns from customers; find associations among customer demographic characteristics; predict response to mailing campaigns; analyze market baskets.
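Market basket analysis is commonly described in terms of support (how often items appear together) and confidence (how often one item appears given another). The sketch below computes both for a handful of made-up baskets; the item names and the functions `pair_support` and `confidence` are assumptions for illustration, not a method described in the slides.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def pair_support(all_baskets):
    """Support of each item pair: the fraction of baskets containing both items."""
    counts = Counter()
    for basket in all_baskets:
        # Sort so that each pair is always counted under the same key.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return {pair: n / len(all_baskets) for pair, n in counts.items()}


def confidence(all_baskets, antecedent, consequent):
    """Confidence of 'antecedent -> consequent': P(consequent | antecedent)."""
    with_antecedent = [b for b in all_baskets if antecedent in b]
    if not with_antecedent:
        return 0.0
    hits = sum(consequent in b for b in with_antecedent)
    return hits / len(with_antecedent)


support = pair_support(baskets)
bread_then_milk = confidence(baskets, "bread", "milk")
```

In this toy data, bread and milk appear together in half of the baskets, and two out of every three baskets with bread also contain milk.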

Potential Applications of Data Mining Banking: detect patterns of fraudulent credit card use; identify "loyal" customers; predict customers likely to change their credit card affiliation; determine credit card spending by customer groups; find hidden correlations between different financial indicators; identify stock trading rules from historical market data.
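As a very rough illustration of the first banking item, the sketch below flags a card transaction whose amount lies far outside a customer's usual spending, using a z-score. This is a deliberately simple stand-in chosen for the example: real fraud detection is far more involved, and the amounts, the threshold of 3 standard deviations, and the function name `is_unusual` are all assumptions.

```python
from statistics import mean, stdev

# Hypothetical past transaction amounts for one cardholder.
history = [40.0, 42.0, 38.0, 41.0, 39.0]


def is_unusual(past_amounts, new_amount, threshold=3.0):
    """Flag new_amount if it is more than `threshold` standard deviations
    away from the mean of the past amounts (a simple z-score test)."""
    spread = stdev(past_amounts)
    if spread == 0:
        # No variation in the history: anything different counts as unusual.
        return new_amount != past_amounts[0]
    z_score = abs(new_amount - mean(past_amounts)) / spread
    return z_score > threshold


large_purchase_flagged = is_unusual(history, 400.0)
normal_purchase_flagged = is_unusual(history, 43.0)
```

A check like this only looks at one customer and one attribute at a time; it is meant to show the idea of an anomaly, not a usable fraud rule.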

Potential Applications of Data Mining Insurance and Health Care: claim analysis, i.e., which medical procedures are claimed together; predict which customers will buy new policies; identify behavior patterns of risky customers; identify fraudulent behavior.

Potential Applications of Data Mining Transportation: determine the distribution schedules among outlets; analyze loading patterns.

Potential Applications of Data Mining Medicine: characterize patient behavior to predict office visits; identify successful medical therapies for different illnesses.
