Download presentation
Presentation is loading. Please wait.
Published byAnnabelle Bradford Modified over 6 years ago
1
Knowledge Discovery Systems: Systems That Create Knowledge
Lecture 9
2
Objectives To explain how knowledge is discovered
To describe knowledge discovery systems, including design considerations, and how they rely on mechanisms and technologies To explain data mining (DM) technologies To discuss the role of DM in customer relationship management
3
Knowledge Synthesis through Socialization
To discover tacit knowledge Socialization enables the discovery of tacit knowledge through joint activities between masters and apprentices between researchers at an academic conference back-of-the-napkin diagrams E.g. Honda encourages socialization through “brainstorming camps” to resolve problems faced in R&D projects
4
Knowledge Discovery through Socialization
Brainstorming camps Sharing a meal Cocktail napkins Creative brainstorming Facilitator — person controlling the process Innovator — brainstorm solutions to the customer’s problem or need Lateral thinking — Identifying wild, crazy, or silly ideas may trigger ideas in other innovators which may actually solve the problem
5
Knowledge Discovery from Data: Data Mining
Knowledge Discovery in Databases is the process of finding and interpreting patterns from data, involving the application of algorithms to interpret the patterns generated by these algorithms Another name for KDD is data mining (DM) Data mining systems have made a significant contribution in scientific fields for years The recent proliferation of e-commerce applications, providing reams of hard data ready for analysis, presents us with an excellent opportunity to make profitable use of data mining
6
Data Mining Techniques Applications
Marketing — Predictive DM techniques, like artificial neural networks (ANN), have been used for target marketing including market segmentation Direct marketing — customers are likely to respond to new products based on their previous consumer behaviour Retail — DM methods have likewise been used for sales forecasting Market basket analysis — uncover which products are likely to be purchased together
7
Banking — Trading and financial forecasting are used to determine derivative securities pricing, futures price forecasting, and stock performance Insurance — DM techniques have been used for segmenting customer groups to determine premium pricing and predict claim frequencies Telecommunications — Predictive DM techniques have been used to attempt to reduce churn, and to predict when customers will attrition to a competitor Operations management — Neural network techniques have been used for planning and scheduling, project management, and quality control
8
Examples of DM Use Diagnosis of faults in rotating machinery
Load prediction for electric utilities DM for credit applications Inventory optimisation mining to level sales in flower online retailer Web mining to customise income-appropriate customer offers from online bags retailer Retailer DM to asses country investment risk
9
Designing the Knowledge Discovery System: CRISP DM
Business Understanding — To obtain the highest benefit from data mining, there must be a clear statement of the business objectives Data Understanding — Knowing the data well can permit the designer to tailor the algorithm or tools used for data mining to his/her specific problem Data Preparation — Data selection, variable construction and transformation, integration, and formatting
10
Model Building and Validation — Building an accurate model is a trial and error process. The process often requires the data mining specialist to iteratively try several options, until the best model emerges Evaluation and Interpretation — Once the model is determined, the validation dataset is fed through the model (because the outcome of this dataset is known, the predicted results are compared with the actual results in the validation data set) Deployment — Involves implementing the ‘live’ model within an organisation to aid the decision making process
11
1. Business Understanding Process
Determine Business Objectives — To obtain the highest benefit from data mining, there must be a clear statement of the business objectives Situation Assessment — The majority of the people in a marketing campaign who receive a target mail, do not purchase the product Determine Data Mining Goal — Identifying the most likely prospective buyers from the sample, and targeting the direct mail to those customers could save the organization significant costs Produce Project Plan — This step also includes the specification of a project plan for the DM study
12
2. Data Understanding Process
Data Collection — Defines the data sources for the study, including the use of external public data, and proprietary databases (data sources for the study) Data Description — Describes the contents of each file or table. Some of the important items in this report are: number of fields (columns) and percent of records missing Data Quality and Verification — Define if any data can be eliminated because of irrelevance or lack of quality (data must be correct and consistent) Exploratory Analysis of the Data — Use to develop a hypothesis of the problem to be studied, and to identify the fields that are likely to be the best predictors
13
3. Data Preparation Process
Selection — Requires the selection of the predictor variables and the sample set Construction and Transformation of Variables — Often, new variables must be constructed to build effective models Data Integration — The dataset for the data mining study may reside on multiple databases, which would need to be consolidated into one database Formatting — Involves the reordering and reformatting of the data fields, as required by the DM model
14
4. Model Building and Validation Process
Generate Test Design — Building an accurate model is a trial and error process The data mining specialist iteratively try several options, until the best model emerges Build Model — Different algorithms could be tried with the same dataset Results are compared to see which model yields the best results Model Evaluation — In constructing a model, a subset of the data is usually set aside for validation purposes The validation data set is used to calculate the accuracy of predictive qualities of the model
15
5. Evaluation and Interpretation Process
Evaluate Results — Once the model is determined, the predicted results are compared with the actual results in the validation dataset Review Process — Verify the accuracy of the process Determine Next Steps — List of possible actions decision
16
6. Deployment Process Plan Deployment — This step involves implementing the “live” model within an organisation to aid the decision-making process Produce Final Report — Write a final report Plan Monitoring and Maintenance — Monitor how well the model predicts the outcomes, and the benefits that this brings to the organisation Review Project — Experience and documentation
17
CRISP-DM Data Mining Process Methodology
18
The Iterative Nature of the KDD Process
19
Alternate DM Organisations
Customer Profile Exchange (CPEX): offers a vendor-neutral, XML-based open standard for facilitating the privacy-enabled interchange of customer information across disparate enterprise applications and systems Data Mining Group (DMG): is an independent, vendor led group, which develops data mining standards, such as the Predictive Model Markup Language (PMML)
20
Data Mining Techniques
Predictive Techniques Classification: Data mining techniques in this category serve to classify the discrete outcome variable Prediction or Estimation: DM techniques in this category predict a continuous outcome (as opposed to classification techniques that predict discrete outcomes) Descriptive Techniques Affinity or association: Data mining techniques in this category serve to find items closely associated in the data set Clustering: DM techniques in this category aim to create clusters of input objects, rather than an outcome variable
21
Summary of Applicability of Inferential Statistical Techniques
22
Summary of Applicability of Noninferential Predictive Technique
23
Summary of Applicability of Decision Tree Techniques
24
Summary of Applicability of Clustering and Association Techniques
25
Web Content Mining Text mining—“reading” large documents of text written in natural language and being able to derive knowledge from the process Web mining—Web crawling with online text mining (to discover patterns form web) Disadvantages data on the web is unstructured requires natural language processing Web pages are indexed by the words they contain IR (information retrieval) indexing techniques consists of calculating the term frequency inverse document frequency (TFIDF)
26
Web Mining Techniques Linguistic Analysis
identify key concept descriptors Uses stemming, stoplists, and semantic analysis Statistical and Co-occurrence Analysis Link analysis Similarity functions Statistical and Neural Networks Clustering and Categorisation Visualization and Human Computer Interfaces
27
Uses for Web Data Mining
Web Structure Mining — inter and intra page structure Categorise web pages Generate relationships and similarities among web sites Based on analysis of HTML/XML tags Web Usage Mining — clickstream analysis, is the identification of patterns in user navigation through Web pages in a domain Preprocessing — cleanses and converts usage, content, and structure from different data sources into datasets ready for pattern discovery Pattern analysis — uses visualisation/OLAP (online analytical processing) to understand relationships Pattern discovery — using variations on techniques Web Content Mining — based on text mining and IR
28
Data Mining and Customer Relationship Management
CRM is the mechanisms and technologies used to manage the interactions between a company and its customers The data mining prediction model is used to calculate a score: a numeric value assigned to each record in the database to indicate the probability that the customer represented by that record will behave in a specific manner
29
Uses for CRM Integrate the customer viewpoint across all touchpoints—
The CRM promise is to build an integrated view of the customer Understand the customer touchpoints Enable organisations to better recognise and service the needs of the customer Respond to customer demands in “Web time”—competitive environments require organisations to react to increasingly complex customer requests at faster speeds Also the analysis and interpretation of Web data can be used to enhance and personalise customer offerings Analysis of Web data can uncover new knowledge about customer behaviour and preferences Derive more value from CRM investments Market segmentation studies Narrowcast (send out target s) customers Market basket analysis
30
Case Study: Harrah’s One of the most recognised brand names in the casino entertainment industry Harrah’s comprehensive data warehouse and CRM implementation enabled them to keep track of millions of customers’ activities allowing them to market more effectively, thus increasing attraction and retention of targeted customers Nationally recognised CRM implementation Increased market share from 1999–2002 from 36% to 42% and earnings per share by 110% Targeted customers based on preferences Hotel voucher awards to out-of-town guests Show tickets to day-visiting customers
31
Barriers to the Use of DM
Two of the most significant historical barriers that prevented the earlier deployment of knowledge discovery in the business relate to: Lack of data to support the analysis Limited computing power to perform the mathematical calculations required by the DM algorithms Neither is applicable today because processing power and storage have risen to meet demand Today’s challenge is bring data in data stores (data tombs) to life
32
Requires skilled PM to orchestrate effort
High knowledge barrier to usability Statistical knowledge Knowledge of the software package Knowledge of the business Requires skilled PM to orchestrate effort DM client — understands the business problem, but inadequate technical skills DM analyst — translates the business needs into technical requirements DM engineer — develops the DM model with the DM client and analyst IT analyst — provides access to hardware, software, and data
33
Challenges to be Addressed in DM Implementation
Scaling analysis to large databases: Current DM techniques require that data sets be loaded into the computer’s memory and this is a significant barrier when very large databases and data warehouses must be scanned to identify patterns Scaling to high-dimensional data and models: Typical statistical analysis studies require humans to formulate a model and then use techniques to validate the model via understanding how well the data fit the model But it may be increasingly difficult for humans to formulate models a priori based on a very large number of variables, which increasingly add dimension to the problem
34
Automating the search: DM studies typically require the researcher to enumerate the hypothesis under study a priori In the future, DM algorithms may be able to perform this work automatically Finding patterns and models understandable and interesting to users: In the past, DM projects focused on measures of accuracy (how well the model predicts the data) and utility (the benefit derived from the pattern, typically money saved) New benefit measures like understandability of the model and novelty of the results must also be developed
35
Case Study: An Application of Rule Induction to Real Estate Appraisal Systems
In this case, we seek specific knowledge that we know can be found in the data in databases, but which can be difficult to extract Procedure to create the decision tree: Data preparation and preprocessing Tree construction House pruning Paired leaf analysis
36
Case Study: An Application of Rule Induction to Real Estate Appraisal Systems
Attribute Induction Results Expert Estimate Difference Living Area $15 - $31 $15 - $25 % Bedrooms $ $5212 $ $3500 % Bathrooms $ $5718 $ $2000 % Garage $ $4522 $ $3500 % Pool $ $11697 $ $12000 % Fireplace $ $4180 $ $2000 % Year Built % % %
37
Case Study: An Application of Rule Induction to Real Estate Appraisal Systems
Partial Decision Tree Results for Real Estate Appraisal
38
Case Study: An Application of Web Content Mining to Expertise Locator Systems
NASA Expert Seeker Web Miner demo A KM system that locates experts based on published documents requires: Automatic method for identifying employee names A method to associate employee names with skill keywords embedded in those documents
39
Novel Knowledge Discovery on the Web
Web searches easily uncover deep/broad knowledge Novel knowledge is trickier Like looking for a needle in a haystack Overload of knowledge from indiscernible search terms What is interesting/relevant varies across users Novel Knowledge uses discovery of new strategic opportunities development of learning capabilities support the creative thinking leading to innovation solving “wicked” problems
40
Conclusions Described knowledge discovery systems, including design considerations, and how they rely on mechanisms and technologies Learned how knowledge is discovered: Through socialization with other knowledgeable persons Through DM by finding interesting patterns in observations, typically embodied in explicit data Explained data mining (DM) technologies Discussed the role of DM in customer relationship management
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.