Published by Martha Cannon. Modified over 8 years ago.
1
INTRODUCTION
Elsayed Hemayed
Data Mining Course
2
Outline
- The Motivation
- Knowledge Discovery in Databases (KDD)
- Knowledge Discovery Process
- Data mining application types
  - Association
  - Clustering
  - Classification
  - Prediction
- Commercial Data Mining Tool
Acknowledgement: some of the material in these slides is from [Max Bramer, “Principles of Data Mining”, Springer-Verlag London Limited 2007]
Introduction to Data Mining
3
The Data Explosion
- The current NASA Earth observation satellites generate a terabyte (i.e. 10^12 bytes) of data every day.
- The Human Genome project is storing thousands of bytes for each of several billion genetic bases.
- As long ago as 1990, the US Census collected over a million million bytes of data.
- Many companies maintain large Data Warehouses of customer transactions. A fairly small data warehouse might contain more than a hundred million transactions.
- There are vast amounts of data recorded every day on automatic recording devices, such as credit card transaction files and web logs, as well as non-symbolic data such as CCTV recordings.
4
Knowledge buried in the data
- knowledge that can be critical to a company’s growth or decline
- knowledge that could lead to important discoveries in science
- knowledge that could enable us accurately to predict the weather and natural disasters
- knowledge that could enable us to identify the causes of and possible cures for lethal illnesses
- knowledge that could literally mean the difference between life and death
5
Data Rich but Knowledge Poor
We are data rich but knowledge poor.
6
What is Data Mining?
Data mining is the search for knowledge (interesting patterns) in your data.
7
Knowledge Discovery
Knowledge discovery is the ‘non-trivial extraction of implicit, previously unknown and potentially useful information from data’. It is a process of which data mining forms just one part, although a central one.
8
Data mining as a step in the process of Knowledge Discovery (figure)
9
Knowledge Discovery Process
1. Data cleaning (to remove noise and inconsistent data)
2. Data integration (where multiple data sources may be combined)
3. Data selection (where data relevant to the analysis task are retrieved from the database)
4. Data transformation (where data are transformed or consolidated into forms appropriate for mining by performing summary or aggregation operations, for instance)
5. Data mining (an essential process where intelligent methods are applied in order to extract data patterns)
6. Pattern evaluation (to identify the truly interesting patterns representing knowledge based on some interestingness measures)
7. Knowledge presentation (where visualization and knowledge representation techniques are used to present the mined knowledge to the user)
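The seven steps above can be sketched as a chain of small functions. This is only an illustration under stated assumptions: the two data sources, the field names (`id`, `items`), the pair-counting "mining" step and the 50% threshold are all hypothetical and do not come from the slides.

```python
from collections import Counter
from itertools import combinations

# Two hypothetical data sources; None marks a noisy or missing record.
store_a = [{"id": 1, "items": "milk, bread"}, {"id": 2, "items": None}]
store_b = [{"id": 3, "items": "Milk, Bread, eggs"}, {"id": 4, "items": "milk, eggs"}]


def clean(records):
    # 1. Data cleaning: drop records whose item list is missing.
    return [r for r in records if r["items"]]


def integrate(*sources):
    # 2. Data integration: combine several sources into one collection.
    return [r for source in sources for r in source]


def select(records):
    # 3. Data selection: keep only the field relevant to the analysis.
    return [r["items"] for r in records]


def transform(item_strings):
    # 4. Data transformation: normalise case and turn each string into a set.
    return [{i.strip().lower() for i in s.split(",")} for s in item_strings]


def mine(baskets):
    # 5. Data mining: count how often each pair of items occurs together.
    counts = Counter()
    for basket in baskets:
        counts.update(combinations(sorted(basket), 2))
    return counts


def evaluate(counts, n_baskets, min_support=0.5):
    # 6. Pattern evaluation: keep pairs whose support reaches the threshold.
    return {p: c / n_baskets for p, c in counts.items() if c / n_baskets >= min_support}


def present(patterns):
    # 7. Knowledge presentation: print the surviving patterns.
    for pair, support in sorted(patterns.items()):
        print(f"{pair[0]} + {pair[1]}: support {support:.0%}")


baskets = transform(select(integrate(clean(store_a), clean(store_b))))
patterns = evaluate(mine(baskets), len(baskets))
present(patterns)
```

Each function corresponds to one numbered step, so a step can be swapped for a more realistic implementation without touching the others.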
10
Applications of Data Mining
- Analysis of organic compounds
- Automatic abstracting
- Credit card fraud detection
- Electric load prediction
- Financial forecasting
- Medical diagnosis
- Predicting share of television audiences
- Real estate valuation
- Targeted marketing
- Thermal power plant optimisation
- Toxic hazard analysis
- Weather forecasting
11
More Applications to come
- A supermarket chain: optimise targeting of high value customers
- A major hotel chain: identify attributes of a ‘high-value’ prospect
- Improving the ability to predict bad loans
- Reducing fabrication flaws in VLSI chips
- Arrange show schedules to maximise market share and increase advertising revenues
- Predicting the probability that a cancer patient will respond to chemotherapy
12
What is (not) Data Mining?
What is not Data Mining?
- Look up a phone number in a phone directory
- Query a Web search engine for information about “Amazon”
What is Data Mining?
- Certain names are more prevalent in certain US locations (O’Brien, O’Rourke, O’Reilly… in the Boston area)
- Group together similar documents returned by a search engine according to their context (e.g. Amazon rainforest, Amazon.com)
13
Main Applications
Applications can be divided into four main types: Association, Classification, Prediction, Clustering.
14
Labelled and Unlabelled Data
Labelled data: there is a specially designated attribute, and the aim is to use the data given to predict the value of that attribute for instances that have not yet been seen. [Supervised Learning – Classification and Prediction]
Unlabelled data: data that does not have any specially designated attribute. [Unsupervised Learning – Association and Clustering]
15
Attributes and Data Example (figure: data table with attributes marked as categorical, continuous and class)
16
Association Rules
An association rule expresses a relationship amongst the values of variables.
Association rules are frequently generated from market-basket data. A market basket corresponds to the set of items a consumer purchases during one visit to a supermarket.
Example: IF variable 1 > 85 and switch 6 = open THEN variable 23 < 47.5 and switch 8 = closed
17
Market Basket Analysis
IF cheese AND milk THEN bread (Confidence = 0.7) indicates that 70% of the customers who buy cheese and milk also buy bread.
A store could therefore move the bread closer to the cheese and milk counter for customer convenience, or separate them to encourage impulse buying of other products.
18
Confidence and Support
Support: the percentage of instances in the database that contain all items listed in a given association rule.
Confidence: given a rule of the form A=>B, rule confidence is the conditional probability that B is true when A is known to be true.
Confidence can be computed as support(A U B) / support(A), where A U B is the itemset containing every item of A and of B.
19
Market Basket Example

Transaction_Id  Time  Items_bought
101             6:35  Milk, bread, cookies, juice
792             7:38  Milk, juice
1130            8:05  Milk, eggs
1735            8:40  Bread, cookies, coffee

Consider the two rules: Milk => juice and Bread => juice

Rule        Milk => juice  Bread => juice
Support     50%            25%
Confidence  66.7%          50%
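The support and confidence figures in this example can be checked with a few lines of Python. This is a minimal sketch: the four transactions come from the slide, while the function names `support` and `confidence` are just illustrative choices.

```python
# Transactions from the market basket example (items lower-cased).
transactions = [
    {"milk", "bread", "cookies", "juice"},
    {"milk", "juice"},
    {"milk", "eggs"},
    {"bread", "cookies", "coffee"},
]


def support(itemset, baskets):
    # Fraction of baskets that contain every item in `itemset`.
    return sum(itemset <= basket for basket in baskets) / len(baskets)


def confidence(antecedent, consequent, baskets):
    # support(A U B) / support(A), i.e. an estimate of P(B | A).
    return support(antecedent | consequent, baskets) / support(antecedent, baskets)


for lhs, rhs in [({"milk"}, {"juice"}), ({"bread"}, {"juice"})]:
    s = support(lhs | rhs, transactions)
    c = confidence(lhs, rhs, transactions)
    print(f"{sorted(lhs)} => {sorted(rhs)}: support {s:.1%}, confidence {c:.1%}")
```

For Milk => juice, two of the four baskets contain both items (support 50%), and two of the three baskets containing milk also contain juice (confidence 66.7%).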
20
Clustering
The goal is to place records into groups where the records in a group are highly similar to each other and dissimilar to records in other groups.
- For example, an insurance company might group customers according to income, age, types of policy purchased or prior claims experience.
- The adult population in Egypt can be categorized into five groups from most likely to buy to least likely to buy a new product.
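The slides do not name a clustering algorithm. As one possible illustration, the sketch below uses k-means, a widely used method that repeatedly assigns each record to its nearest group centre and then recomputes the centres. The customer data (age, income in thousands) and the choice of starting centres are hypothetical.

```python
import math

# Hypothetical customers as (age, income in thousands); not from the slides.
customers = [(25, 30), (27, 32), (24, 28), (52, 80), (55, 85), (50, 78)]


def k_means(points, centroids, iterations=10):
    # Alternate between assigning points to the nearest centroid and
    # moving each centroid to the mean of its assigned points.
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
    return clusters, centroids


# Start from two of the data points; real tools usually pick starts at random.
clusters, centroids = k_means(customers, centroids=[customers[0], customers[3]])
for centroid, members in zip(centroids, clusters):
    print(centroid, members)
```

In practice the attributes would normally be rescaled first, since otherwise the one with the largest numeric range dominates the distance.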
21
Clustering Example (figure)
22
Classification
Classification is one of the most common applications for data mining.
- Classify medical patients into those who are at high, medium or low risk of acquiring a certain illness.
- Classify people interviewed into those who are likely to vote for each of a number of political parties or are undecided.
- Classify a student project as distinction, good, pass or fail.
- Classify customers in a supermarket into discount-seeking shoppers, loyal regular shoppers, shoppers attached to name brands and infrequent shoppers.
23
Degree Classification Example (figure)
Goal: find some way of predicting the classification for other students given only their grade ‘profiles’.
24
Classification Methods
Nearest Neighbour Matching: identifying (say) the five examples that are ‘closest’ in some sense to an unclassified one.
Classification Rules:
IF SoftEng = A AND Project = A THEN Class = First
IF SoftEng = A AND Project = B AND ARIN = B THEN Class = Second
IF SoftEng = B THEN Class = Second
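Both methods can be sketched in a few lines. The three rules are taken from the slide; everything else is an assumption made for illustration: the labelled grade profiles are invented, "closeness" is measured as the number of subjects whose grades differ, and the neighbours vote by simple majority.

```python
from collections import Counter


def classify_by_rules(profile):
    # The three rules from the slide, tried in order; None if no rule fires.
    if profile["SoftEng"] == "A" and profile["Project"] == "A":
        return "First"
    if profile["SoftEng"] == "A" and profile["Project"] == "B" and profile["ARIN"] == "B":
        return "Second"
    if profile["SoftEng"] == "B":
        return "Second"
    return None


def classify_by_neighbours(profile, labelled, k=5):
    # Distance = number of subjects whose grades differ (a hypothetical choice).
    def distance(other):
        return sum(profile[subject] != other[subject] for subject in profile)

    nearest = sorted(labelled, key=lambda pair: distance(pair[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical labelled examples; not the data shown on the slides.
labelled = [
    ({"SoftEng": "A", "Project": "A", "ARIN": "A"}, "First"),
    ({"SoftEng": "A", "Project": "A", "ARIN": "B"}, "First"),
    ({"SoftEng": "A", "Project": "B", "ARIN": "B"}, "Second"),
    ({"SoftEng": "B", "Project": "B", "ARIN": "B"}, "Second"),
    ({"SoftEng": "B", "Project": "A", "ARIN": "B"}, "Second"),
    ({"SoftEng": "B", "Project": "B", "ARIN": "A"}, "Second"),
]

student = {"SoftEng": "A", "Project": "A", "ARIN": "B"}
print(classify_by_rules(student))
print(classify_by_neighbours(student, labelled, k=3))
```

The rules give no answer for profiles they do not cover (for example SoftEng = C), whereas nearest neighbour matching always returns some class, and its answer can change with the value of k.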
25
Decision Tree (figure)
26
Prediction
Classification is one form of prediction, where the value to be predicted is a label. Numerical prediction (often called regression) is another.
Goal: determine how certain attributes will behave in the future. In this case we wish to predict a numerical value.
Example: How much sales volume a store will generate in a given period.
A very popular way of doing this is to use a Neural Network.
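The slide mentions neural networks; as a much simpler stand-in, the sketch below fits a straight line by ordinary least squares to show what numerical prediction looks like in code. The sales history is hypothetical and chosen only to make the arithmetic easy to follow.

```python
# Hypothetical history: (period number, sales volume). Not from the slides.
history = [(1, 100.0), (2, 120.0), (3, 140.0), (4, 160.0)]


def fit_line(points):
    # Ordinary least squares for y = slope * x + intercept.
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in points) / sum(
        (x - mean_x) ** 2 for x, _ in points
    )
    return slope, mean_y - slope * mean_x


slope, intercept = fit_line(history)
forecast = slope * 5 + intercept  # predicted sales volume for period 5
print(f"Forecast for period 5: {forecast:.1f}")
```

Unlike classification, the output here is a number rather than a label, which is the distinction the slide draws.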
27
Commercial Data Mining Tool (figure)
28
Summary
- The Motivation
- Knowledge Discovery in Databases (KDD)
- Knowledge Discovery Process
- Data mining application types
  - Association
  - Clustering
  - Classification
  - Prediction
- Commercial Data Mining Tool