Published by Melanie Tyler. Modified over 9 years ago.
1
Data Mining
Glen Shih
CS157B Section 1
Dr. Sin-Min Lee
April 4, 2006
2
Overview
- Explanation of Data Mining
- Benefits of Data Mining
- Data Mining Background
- Data Mining Models
- Data Warehousing
- Problems and Issues of Data Mining
- Potential Applications of Data Mining
3
What Is Data Mining?
Data mining is the automated extraction of hidden predictive information from databases. It is an extension of statistics with a few artificial intelligence and machine learning twists.
4
What Is Data Mining? (cont.)
The term data mining is now often stretched beyond its original meaning and applied to any form of data analysis. It encompasses a number of different technical approaches, such as clustering, data summarization, learning classification rules, finding dependency networks, analyzing changes, and detecting anomalies.
5
Why Data Mining?
Data mining software allows users to analyze large databases to solve business decision problems. For example, it can use the history of previous interactions between a business and its customers to build a model of customer behavior, and then use that model to predict how customers will respond to new products.
9
Data Mining Background
Data mining research has drawn on a number of other fields:
- Machine learning
- Statistics
- Inductive learning
12
Inductive Learning Strategies
Inductive learning, where the system infers knowledge itself from observing its environment, has two main strategies:
- Supervised learning
- Unsupervised learning
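The two strategies can be contrasted with a minimal sketch. The slides do not name any particular algorithm, so the nearest-centroid classifier and the two-cluster k-means below are assumptions chosen for brevity, and the spend figures are made up. The supervised function learns from labeled examples; the unsupervised one receives no labels and has to find the groups itself.

```python
from statistics import mean

def fit_centroids(examples):
    """Supervised: learn one centroid per class from (value, label) pairs."""
    by_label = {}
    for value, label in examples:
        by_label.setdefault(label, []).append(value)
    return {label: mean(values) for label, values in by_label.items()}

def classify(centroids, value):
    """Assign the label whose centroid is closest to the value."""
    return min(centroids, key=lambda label: abs(centroids[label] - value))

def two_means(values, iterations=10):
    """Unsupervised: split unlabeled values into two clusters (1-D k-means, k=2)."""
    low, high = min(values), max(values)  # initial cluster centers
    for _ in range(iterations):
        near_low = [v for v in values if abs(v - low) <= abs(v - high)]
        near_high = [v for v in values if abs(v - low) > abs(v - high)]
        if not near_high:
            break  # all values are identical, so there is only one cluster
        low, high = mean(near_low), mean(near_high)
    return sorted(near_low), sorted(near_high)

# Hypothetical monthly spend figures (illustrative data only).
labeled = [(10, "low"), (12, "low"), (95, "high"), (105, "high")]
centroids = fit_centroids(labeled)
prediction = classify(centroids, 80)

clusters = two_means([10, 12, 11, 95, 105, 100])
```

Here `prediction` comes from the labels supplied by a teacher, while `clusters` is produced from the raw values alone.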
15
Data Mining Models
IBM has identified two types of models, or modes of operation, which may be used to reveal information of interest to users:
- Verification model
- Discovery model
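The difference between the two modes can be illustrated with a small sketch. It assumes the common reading that a verification model tests a hypothesis supplied by the user, while a discovery model searches the data for patterns on its own; the record layout, field names, and scoring rule are hypothetical and not taken from IBM's description.

```python
# Hypothetical customer records: (age_group, region, responded_to_offer).
records = [
    ("under_40", "north", False),
    ("under_40", "south", False),
    ("under_40", "north", False),
    ("over_40", "north", True),
    ("over_40", "south", True),
    ("over_40", "south", True),
    ("over_40", "north", False),
]
FIELDS = {"age_group": 0, "region": 1}  # field name -> tuple index

def response_rate(rows):
    """Fraction of rows in which the customer responded."""
    return sum(1 for r in rows if r[2]) / len(rows) if rows else 0.0

def verify(field, value):
    """Verification: check a user-supplied hypothesis about one segment.

    Returns (segment response rate, overall response rate).
    """
    segment = [r for r in records if r[FIELDS[field]] == value]
    return response_rate(segment), response_rate(records)

def discover():
    """Discovery: scan every field/value pair and return the segment
    whose response rate deviates most from the overall rate."""
    overall = response_rate(records)
    candidates = {(f, r[i]) for f, i in FIELDS.items() for r in records}
    return max(candidates, key=lambda c: abs(verify(*c)[0] - overall))

segment_rate, overall_rate = verify("age_group", "over_40")
most_unusual = discover()
```

In `verify` the analyst decides what to look at; in `discover` the system proposes the segment, which the analyst would still need to validate.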
16
Data Warehousing
The potential of data mining can be enhanced if the appropriate data has been collected and stored in a data warehouse. The data warehousing market consists of tools, technologies, and methodologies that allow for the construction, usage, management, and maintenance of the hardware and software used for a data warehouse, as well as the actual data itself.
18
Data Warehouse
Bill Inmon coined the term data warehouse in 1990 and defined it as follows:
"A warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management's decision making process".
20
Data Warehouse (cont.)
- Subject-oriented: Data that gives information about a particular subject instead of about a company's ongoing operations.
- Integrated: Data that is gathered into the data warehouse from a variety of sources and merged into a coherent whole.
22
Data Warehouse (cont.)
- Time-variant: All data in the data warehouse is identified with a particular time period.
- Non-volatile: Data is stable in a data warehouse. More data is added, but data is never removed. This enables management to gain a consistent picture of the business.
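The four properties in Inmon's definition can be made concrete with a toy in-memory store. This is only a sketch under stated assumptions: the `Fact` and `Warehouse` classes, their fields, and the source names are hypothetical, and a real warehouse would sit on a database rather than a Python list.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class Fact:
    subject: str   # subject-oriented: e.g. "customer", not an operational process
    key: str
    value: float
    period: date   # time-variant: every fact carries its time period
    source: str    # integrated: the originating system is recorded

class Warehouse:
    def __init__(self):
        self._facts = []  # non-volatile: facts are appended, never removed

    def load(self, source, subject, rows):
        """Merge rows of (key, value, period) from one source system."""
        for key, value, period in rows:
            self._facts.append(Fact(subject, key, value, period, source))

    def history(self, subject, key):
        """All recorded values for one subject key, oldest first."""
        hits = [f for f in self._facts if f.subject == subject and f.key == key]
        return sorted(hits, key=lambda f: f.period)

wh = Warehouse()
wh.load("billing", "customer", [("c1", 120.0, date(2006, 1, 31))])
wh.load("web_shop", "customer", [("c1", 80.0, date(2006, 2, 28))])
timeline = [(f.period.month, f.source, f.value) for f in wh.history("customer", "c1")]
```

The class deliberately offers no update or delete method, which is how the sketch expresses non-volatility.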
23
Problems and Issues of Data Mining
Data mining systems rely on databases to supply the raw data for input. Problems arise because databases tend to be dynamic, incomplete, noisy, and large. Other problems relate to the adequacy of the information stored.
28
Problems and Issues
- Limited information
- Uncertainty
- Size, updates, and irrelevant fields
- Noise and missing values
34
Ways to Treat Missing Data by Discovery Systems
- Simply disregard missing values.
- Omit the corresponding records.
- Infer missing values from known values.
- Treat missing data as a special value to be included additionally in the attribute domain.
- Average over the missing values using Bayesian techniques.
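The strategies listed above can be sketched on a tiny table in which `None` marks a missing value. The records, function names, and the choice of mean imputation are illustrative assumptions. The last function only estimates weights from observed frequencies; a full Bayesian treatment would also combine a prior with the likelihood, which this sketch does not attempt.

```python
from collections import Counter

# Hypothetical records; None marks a missing value.
rows = [
    {"age": 30, "region": "north"},
    {"age": None, "region": "south"},
    {"age": 50, "region": None},
    {"age": 40, "region": "north"},
]

def mean_ignoring_missing(rows, field):
    """Disregard missing values: aggregate over the known ones only."""
    known = [r[field] for r in rows if r[field] is not None]
    return sum(known) / len(known)

def omit_records(rows):
    """Omit every record that has at least one missing field."""
    return [r for r in rows if all(v is not None for v in r.values())]

def infer_mean(rows, field):
    """Infer missing values from known values (here: the field mean)."""
    fill = mean_ignoring_missing(rows, field)
    return [{**r, field: fill if r[field] is None else r[field]} for r in rows]

def special_value(rows, field, marker="UNKNOWN"):
    """Treat 'missing' as one more value in the attribute's domain."""
    return [{**r, field: marker if r[field] is None else r[field]} for r in rows]

def value_weights(rows, field):
    """Empirical distribution of the known values, usable as weights when
    averaging over what a missing entry might have been."""
    counts = Counter(r[field] for r in rows if r[field] is not None)
    total = sum(counts.values())
    return {value: n / total for value, n in counts.items()}

complete = omit_records(rows)
imputed = infer_mean(rows, "age")
marked = special_value(rows, "region")
weights = value_weights(rows, "region")
```

Omitting records is the simplest option but discards half of this table, which is why the other strategies exist.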
37
Potential Applications of Data Mining
Retail and Marketing
- Identify buying patterns from customers
- Find associations among customer demographic characteristics
- Predict response to mailing campaigns
- Analyze market baskets
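Market basket analysis is usually described in terms of support (how often items are bought together) and confidence (how often one item is bought given another). The sketch below computes both for item pairs; the baskets and function names are made up for illustration, and production tools typically use algorithms such as Apriori rather than this brute-force count.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions, one set of items per basket.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk", "eggs"},
]

def pair_support(baskets):
    """Fraction of baskets that contain each pair of items."""
    counts = Counter()
    for basket in baskets:
        # Sort so each pair is always counted under the same key.
        counts.update(combinations(sorted(basket), 2))
    return {pair: n / len(baskets) for pair, n in counts.items()}

def confidence(baskets, antecedent, consequent):
    """Estimated share of baskets with `antecedent` that also hold `consequent`."""
    with_antecedent = [b for b in baskets if antecedent in b]
    if not with_antecedent:
        return 0.0
    return sum(1 for b in with_antecedent if consequent in b) / len(with_antecedent)

support = pair_support(baskets)
butter_implies_bread = confidence(baskets, "butter", "bread")
```

In this toy data every basket with butter also contains bread, so the rule "butter implies bread" has full confidence even though its support is only half of the baskets.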
39
Potential Applications of Data Mining
Banking
- Detect patterns of fraudulent credit card use
- Identify “loyal” customers
- Predict customers likely to change their credit card affiliation
- Determine credit card spending by customer groups
- Find hidden correlations between different financial indicators
- Identify stock trading rules from historical market data
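As one illustration of the first item, unusual card transactions can be flagged by comparing new amounts with a customer's own history. The z-score rule, the threshold of 3, and the amounts below are assumptions made for this sketch; the slides do not say how fraud patterns are detected, and real systems use far richer features.

```python
from statistics import mean, stdev

def unusual_amounts(history, new_amounts, threshold=3.0):
    """Return the new amounts that lie more than `threshold` sample standard
    deviations from the mean of the customer's past amounts."""
    center = mean(history)
    spread = stdev(history)  # needs at least two past amounts
    if spread == 0:
        # No variation in the history: anything different is unusual.
        return [a for a in new_amounts if a != center]
    return [a for a in new_amounts if abs(a - center) / spread > threshold]

# Hypothetical past card payments for one customer.
history = [20.0, 25.0, 22.0, 30.0, 27.0, 24.0]
flagged = unusual_amounts(history, [26.0, 480.0, 19.0])
```

Only the 480.0 payment is far enough from this customer's usual spending to be flagged; the other two fall within normal variation.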
41
Potential Applications of Data Mining
Insurance and Health Care
- Claim analysis, i.e., which medical procedures are claimed together
- Predict which customers will buy new policies
- Identify behavior patterns of risky customers
- Identify fraudulent behavior
43
Potential Applications of Data Mining
Transportation
- Determine the distribution schedules among outlets
- Analyze loading patterns
45
Potential Applications of Data Mining
Medicine
- Characterize patient behavior to predict office visits
- Identify successful medical therapies for different illnesses