Published by Darleen Neal. Modified over 8 years ago.
1
Why Intelligent Data Analysis?
Joost N. Kok
Leiden Institute of Advanced Computer Science, Universiteit Leiden
2
Overview
Data Analysis
Data Mining
Applications
Outlook
3
Data Analysis
4
Data Mining
"Data Mining is one of the five key note technologies that will have a major impact across a wide range of industries within the next three to five years" (Gartner)
"Data Mining is one of the top ten new technologies in which companies will invest during the next five years" (Gartner)
"Data Mining is an overhyped concept" (OTR)
5
Data Analysis
Data analysis = Processing data
Exploratory vs. Confirmatory
–are there interesting structures?
–can we predict the value?
Descriptive vs. Inferential
–statement about data set
–draw more general conclusions
Data analysis = process of computing various summaries and derived values from the given collection of data
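The last definition on this slide (computing summaries and derived values from a collection of data) can be illustrated with a minimal sketch of the descriptive side only. The function name `summarize` and the sample numbers are made up for illustration; they are not from the slides.

```python
import statistics

def summarize(values):
    """Return a few descriptive summaries of a numeric sample."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "pstdev": statistics.pstdev(values),  # population standard deviation
        "min": min(values),
        "max": max(values),
    }

sample = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up numbers, for illustration only
summary = summarize(sample)
print(summary)
```

These numbers are statements about this data set only (descriptive). Drawing more general conclusions (inferential) would need assumptions about how the sample was obtained.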
6
Tools
Cookbook fallacy: Data analysis = picking and applying the right tool.
–Tools are not independent.
–Matching is an iterative process (which needs intelligence).
7
Stat vs. ML
Statistics
–Mathematics
Machine Learning
–Experimental Computer Science
"Statistics is difficult"
"Algorithms are not exact"
8
Models
Models vs. Algorithms
Empirical vs. Mechanistic Models
Understanding vs. Prediction
Models vs. Patterns
Overfitting
Constraints
9
Algorithms
Enabling data analysis
Too many: often no foundations, no applications
In practice only a restricted set of algorithms is used
10
The nature of data
Different kinds of data
–Numerical Data
–Text
–Images
–Sound
Raw data has
–missing values
–distortions
–misrecording
–inadequate sampling
–etc.
11
The nature of data
Data sets can be large
–horizontal
–vertical
Curse of dimensionality
Experiments
Sampling
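One common way to make the curse of dimensionality concrete is to ask how large a neighbourhood must be to contain a fixed share of the data. The sketch below assumes points spread uniformly over a unit hypercube, which is an assumption of this example and not something the slides state. Under it, a sub-cube holding a fraction f of the volume in d dimensions has edge length f^(1/d). The function name `edge_length` is illustrative.

```python
def edge_length(fraction, dims):
    """Edge of a sub-cube holding `fraction` of a unit hypercube's volume."""
    return fraction ** (1.0 / dims)

# How wide must a "local" neighbourhood be to capture 10% of the data?
for dims in (1, 2, 10, 100):
    print(dims, round(edge_length(0.10, dims), 3))
```

In one dimension the neighbourhood covers a tenth of the range. In ten dimensions it already spans most of each axis, so "nearby" points are no longer near.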
12
The nature of data
Too little
–Example: storm situations
Too much
–Example: image segmentation
Static vs. dynamic
Off-line vs. On-line
Infoglut
What is collected?
13
Overview
Statistical methods and concepts
Bayesian methods
Time series
Rule induction
Neural networks
Fuzzy logic
Stochastic search methods
Applications
14
Overview
Why Intelligent Data Analysis
Fundamental Concepts of Statistics
Intelligent Data Analysis: Issues and Challenges
Artificial Neural Networks
Fuzzy Logic
Industrial Applications of Neuro-Fuzzy Networks
Statistical Methods for Data Analysis
Time Series Analysis
15
Overview
Chaos and Reality
Bayesian Networks
ANN Visualization Tools
Rule Induction
Evolutionary Systems
Data Analysis in Real-World Applications
16
Enrichment
Data Fusion
–combine data sets
Example:
–customer database
–survey information
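A minimal sketch of the example on this slide: enriching a customer table with survey answers. It assumes both data sets share a customer identifier. All field names, records and the helper `fuse` are hypothetical.

```python
# Hypothetical records; the slides do not give a schema.
customers = [
    {"customer_id": 1, "name": "A. Jansen", "region": "North"},
    {"customer_id": 2, "name": "B. de Vries", "region": "South"},
    {"customer_id": 3, "name": "C. Bakker", "region": "North"},
]
survey = [
    {"customer_id": 1, "satisfaction": 4},
    {"customer_id": 3, "satisfaction": 2},
]

def fuse(left, right, key):
    """Left join: keep every row of `left`, add matching fields from `right`."""
    lookup = {row[key]: row for row in right}
    fused = []
    for row in left:
        merged = dict(row)                        # copy, leave input untouched
        merged.update(lookup.get(row[key], {}))   # no match: nothing added
        fused.append(merged)
    return fused

enriched = fuse(customers, survey, "customer_id")
for row in enriched:
    print(row)
```

Customers who did not answer the survey keep their original fields only, so enrichment itself introduces missing values that later steps have to handle.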
17
Data Mining
Database technology
Data visualization
Data warehouse vs Operational database
–time-dependent
–non-volatile
–subject-oriented
–integrated
Target: decision making
18
Data Mining
19
Selection Cleaning Enrichment Coding Data Mining Reporting
20
Cleaning
Remove duplicates
Check domain consistency
Remove data
Project data
Combine data in one table
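The first two cleaning steps can be sketched as follows. This is only an illustration, assuming records are flat dictionaries with hashable values. The field `age`, its allowed range, and both helper names are hypothetical.

```python
def remove_duplicates(records):
    """Drop exact duplicate records, keeping the first occurrence."""
    seen = set()
    unique = []
    for rec in records:
        key = tuple(sorted(rec.items()))  # order-independent fingerprint
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique

def domain_violations(records, field, lo, hi):
    """Return records whose `field` is missing or outside [lo, hi]."""
    return [r for r in records
            if r.get(field) is None or not lo <= r[field] <= hi]

records = [
    {"id": 1, "age": 34},
    {"id": 1, "age": 34},    # exact duplicate
    {"id": 2, "age": -5},    # outside the assumed domain 0..120
    {"id": 3, "age": None},  # missing value
]
unique = remove_duplicates(records)
suspect = domain_violations(unique, "age", 0, 120)
print(len(unique), [r["id"] for r in suspect])
```

Whether a suspect record is corrected or removed is a judgment call that depends on the application.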
21
Coding
Address - Region
Date of birth - Age
Scaling of numerical data
Date - Number of months
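Three of these codings can be sketched with the standard library. The function names are illustrative, and min-max scaling is one possible choice of scaling, since the slide does not say which one is meant. Mapping an address to a region would need a lookup table that is not shown here.

```python
from datetime import date

def age_on(birth, today):
    """Date of birth -> age in whole years on `today`."""
    before_birthday = (today.month, today.day) < (birth.month, birth.day)
    return today.year - birth.year - before_birthday

def months_between(start, end):
    """Date -> number of calendar months from `start` to `end`."""
    return (end.year - start.year) * 12 + (end.month - start.month)

def min_max_scale(values):
    """Scale numbers linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:                      # constant column: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

print(age_on(date(1970, 6, 15), date(2000, 6, 14)))
print(months_between(date(1999, 11, 1), date(2000, 2, 1)))
print(min_max_scale([10, 20, 30]))
```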
22
Data Mining SQL queries Clustering Pattern Recognition ES ML Statistics Visual DB KDD
23
Nearest Neighbor Search k nearest points
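A brute-force sketch of the search named on this slide: return the k points closest to a query. Euclidean distance is assumed, since the slide does not name a metric. The function `k_nearest` and the sample points are illustrative.

```python
import heapq
import math

def k_nearest(points, query, k):
    """Return the k points closest to `query`, nearest first (Euclidean)."""
    return heapq.nsmallest(k, points, key=lambda p: math.dist(p, query))

points = [(0, 0), (1, 1), (2, 2), (5, 5), (1, 0)]  # made-up data
neighbors = k_nearest(points, (0.9, 0.2), 2)
print(neighbors)
```

This scans every point for each query. For large data sets an index structure is typically used instead.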
24
Oil Search Shell research South-East Asia measurements kinds of stone coring
25
Applications
26
Outlook
27
Positive
–Moore's Law
–New kinds of computers
–Data collection
–More data is more easily reachable
Negative
–Collective memory gets lost
–Infoglut
Data battle
28
Outlook
Merge of Machine Learning and Statistics
Algorithms
–Adaptive parameters
–Black Box data mining
From suites to tailored tools
29
Intelligent Data Analysis
–User Interaction
–also uses tools from Machine Learning
30
NetTalk Sound generator Speech-synthesis expert system INTELLI Sound Generator Speech-synthesis expert system NetTalk Neural Network