Published by Harriet Lucas. Modified over 8 years ago.
1
The KDD Process for Extracting Useful Knowledge from Volumes of Data
Fayyad, Piatetsky-Shapiro, and Smyth
Ian Kim
SWHIG Seminar
2
Overview
What can we gain from data?
  Business and marketing applications
  Public policy decision-making
  Scientific research
Why do we need the KDD process?
  Increasing use of data analytics
  Size of databases involved
  Being able to access raw data isn’t enough
3
The KDD Process
4
Part 1: Selection
Formulating the target dataset
  What kinds of records to consider?
  Desired fields?
Incorporates domain knowledge
  Background knowledge in relevant field
  Goals of the dataset
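The selection step above can be sketched in a few lines of Python. This is only an illustration, not something taken from the slides: the records, the field names (`region`, `age`, `spend`, `notes`), and the helper `select_target` are all hypothetical.

```python
# Hypothetical raw records; the field names are illustrative assumptions.
raw_records = [
    {"id": 1, "region": "north", "age": 34, "spend": 120.0, "notes": "call back"},
    {"id": 2, "region": "south", "age": 51, "spend": 80.5, "notes": ""},
    {"id": 3, "region": "north", "age": 27, "spend": 42.0, "notes": "new"},
]


def select_target(records, keep_record, fields):
    """Keep the records that pass keep_record, restricted to the desired fields."""
    return [{f: r[f] for f in fields} for r in records if keep_record(r)]


# Domain knowledge decides both the record filter and the field list.
target = select_target(
    raw_records,
    keep_record=lambda r: r["region"] == "north",
    fields=["id", "age", "spend"],
)
```

Here the filter and the field list stand in for the domain knowledge the slide mentions; in practice they would come from the goals of the analysis.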
5
Part 2: Pre-processing
Preparing raw data for transformation
  Removal of noise, outliers
  Strategy for handling missing records
  Missing/unknown value mappings
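A minimal sketch of these pre-processing steps, under stated assumptions: the sentinel codes in `MISSING_CODES`, the plausible-range check used as an outlier rule, and median imputation are illustrative choices, not the strategy prescribed by the slides.

```python
from statistics import median

# Assumed sentinel values that encode "missing/unknown" in the raw data.
MISSING_CODES = {"", "NA", "?", -999}


def normalize_missing(value):
    """Map every missing/unknown sentinel to None."""
    return None if value in MISSING_CODES else value


def preprocess(values, low, high):
    """Map sentinels to None, drop out-of-range values, then impute the median."""
    cleaned = [normalize_missing(v) for v in values]
    # Treat values outside the plausible range [low, high] as noise.
    cleaned = [v if v is None or low <= v <= high else None for v in cleaned]
    observed = [v for v in cleaned if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in cleaned]


ages = [34, "NA", 29, -999, 41, 430]
clean_ages = preprocess(ages, low=0, high=120)
```

Imputing the median is only one possible strategy for missing records; dropping incomplete rows or keeping an explicit "unknown" category are equally valid depending on the goals set in the selection step.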
6
Part 3: Transformation
Data reduction
  Grouping to reduce number of variables considered
  Aggregation to higher row unit
Useful representations of data
  Summary statistics
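As a rough illustration of aggregating to a higher row unit, the sketch below rolls transaction-level rows up to one summary row per customer. The `customer` and `amount` fields and the `aggregate` helper are hypothetical names chosen for this example.

```python
from collections import defaultdict
from statistics import mean


def aggregate(transactions):
    """Roll transaction rows up to one summary row per customer."""
    by_customer = defaultdict(list)
    for t in transactions:
        by_customer[t["customer"]].append(t["amount"])
    # Summary statistics replace the individual rows.
    return {
        customer: {"count": len(amounts), "total": sum(amounts), "mean": mean(amounts)}
        for customer, amounts in by_customer.items()
    }


transactions = [
    {"customer": "a", "amount": 10.0},
    {"customer": "a", "amount": 30.0},
    {"customer": "b", "amount": 5.0},
]
summary = aggregate(transactions)
```

Three transaction rows become two customer rows, each described by a handful of summary statistics instead of the raw values.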
7
Part 4: Data Mining
Selection of data model
  Summarization, classification, clustering, regression analysis
Searching for patterns in data
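Regression analysis is one of the model families listed above. As a small sketch of what "searching for a pattern" can mean in that case, the following fits a straight line by ordinary least squares for a single predictor. The function name `fit_line` and the sample data are assumptions made for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up data that lies exactly on y = 2x + 1.
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
```

Other choices from the slide (summarization, classification, clustering) would replace `fit_line` with a different model while leaving the surrounding steps unchanged.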
8
Part 5: Interpretation
Interpreting the model used in the previous step
  Check whether the results make sense
  Consider different models, returning to prior steps
Utilize the obtained results
9
Challenges of KDD
Massive datasets
  Algorithmic efficiency, approximation, parallel processing
Making interaction possible for analysts
  Develop better tools that allow for human-computer interaction
Overfitting, measures of significance
  Testing on randomly chosen sections
Missing or invalid data
  Strategies to identify hidden variables and dependencies
Making data understandable by humans
  Improved data visualization methods
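"Testing on randomly chosen sections" is commonly done with a holdout split: a random part of the data is set aside and the model is evaluated only on that part. A minimal sketch follows, assuming a single split with a fixed seed; the helper name `holdout_split` and its defaults are hypothetical.

```python
import random


def holdout_split(records, test_fraction=0.25, seed=0):
    """Shuffle a copy of the records and set aside a random test section."""
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


train, test = holdout_split(range(20), test_fraction=0.25, seed=1)
```

A model that scores well on `train` but much worse on `test` is likely overfitting; repeating the split with different seeds gives a rough sense of how stable that gap is.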
10
Challenges of KDD
Rapidly changing data
  Incrementally updating discovered patterns
Integration
  Coordinating database tools (OLAP) and data mining tools
Nonstandard data (e.g. multimedia)
  “Beyond the scope of current KDD technology”
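To illustrate what incremental updating can look like in the simplest case, the sketch below maintains a running mean and sample variance with Welford's online algorithm, so each new value updates the summary without rescanning earlier data. This technique is swapped in purely as an example and is not named in the slides; the class `RunningStats` is hypothetical.

```python
class RunningStats:
    """Running mean and sample variance via Welford's online algorithm."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        """Fold one new observation into the summary."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self):
        """Sample variance; 0.0 until at least two values have been seen."""
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0


stats = RunningStats()
for value in [2, 4, 4, 4, 5, 5, 7, 9]:
    stats.update(value)
```

Discovered patterns such as rules or clusters need more elaborate update schemes, but the idea is the same: keep a compact state that each new record can adjust.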
11
Conclusion
Emerging nature of KDD & data mining fields
Human interaction still necessary
Incorporating machines to cope with scale of data
Improve tools to make better decisions using data