Why Intelligent Data Analysis? Joost N. Kok Leiden Institute of Advanced Computer Science Universiteit Leiden.


1 Why Intelligent Data Analysis? Joost N. Kok Leiden Institute of Advanced Computer Science Universiteit Leiden

2 Overview Data Analysis Data Mining Applications Outlook

3 Data Analysis

4 Data Mining "Data Mining is one of the five key note technologies that will have a major impact across a wide range of industries within the next three to five years" (Gartner) "Data Mining is one of the top ten new technologies in which companies will invest during the next five years" (Gartner) "Data Mining is an overhyped concept" (OTR)

5 Data Analysis Data analysis = Processing data Exploratory vs. Confirmatory –are there interesting structures? –can we predict the value? Descriptive vs. Inferential –statement about data set –draw more general conclusions Data analysis = process of computing various summaries and derived values from the given collection of data
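
As a small illustration of "computing various summaries and derived values from the given collection of data", the descriptive side of the slide above might look like the sketch below. The function name `summarize` and the sample values are made up for this example and do not come from the presentation.

```python
import statistics


def summarize(values):
    """Return a few descriptive summaries of a list of numbers."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "min": min(values),
        "max": max(values),
    }


# Made-up sample, for illustration only.
sample = [2, 4, 4, 5, 10]
summary = summarize(sample)
print(summary)
```

These numbers are statements about this data set only (descriptive); drawing more general conclusions from them would be the inferential step.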

6 Tools Cookbook fallacy: Data analysis = picking and applying the right tool. –Tools are not independent. –Matching is an iterative process (which needs intelligence).

7 Stat vs. ML Statistics –Mathematics Machine Learning –Experimental Computer Science "Statistics is difficult" "Algorithms are not exact"

8 Models Models vs. Algorithms Empirical vs. Mechanistic Models Understanding vs. Prediction Models vs. Patterns Overfitting Constraints

9 Algorithms Enabling data analysis Too many: often no foundations, no applications In practice only a restricted set of algorithms is used

10 The nature of Data Different kinds of data –Numerical Data –Text –Images –Sound Raw data has –missing values –distortions –misrecording –inadequate sampling –etc.

11 The nature of data Data sets can be large –horizontal –vertical Curse of dimensionality Experiments Sampling
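
One common way to illustrate the curse of dimensionality (the slide does not say which illustration was used) is to look at how much of a unit hypercube lies in its central part. The sketch below assumes a central sub-cube with side 0.9; the name `inner_fraction` is hypothetical.

```python
def inner_fraction(d, side=0.9):
    """Fraction of a d-dimensional unit hypercube's volume that lies
    inside the central sub-cube with the given side length."""
    return side ** d


# As d grows, almost all volume ends up in the thin outer shell.
for d in (1, 2, 10, 100):
    print(d, inner_fraction(d))
```

In high dimensions nearly every point sits close to the boundary, which is one reason why sampling a large "vertical" (many-attribute) data set adequately becomes hard.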

12 The nature of data Too little –Example: storm situations Too much –Example: image segmentation Static vs. dynamic Off-line vs. On-line Infoglut What is collected?

13 Overview Statistical methods and concepts Bayesian methods Time series Rule induction Neural networks Fuzzy logic Stochastic search methods Applications

14 Overview –Why Intelligent Data Analysis –Fundamental Concepts of Statistics –Intelligent Data Analysis: Issues and Challenges –Artificial Neural Networks –Fuzzy Logic –Industrial Applications of Neuro-Fuzzy Networks –Statistical Methods for Data Analysis –Time Series Analysis

15 Overview –Chaos and Reality –Bayesian Networks –ANN Visualization Tools –Rule Induction –Evolutionary Systems –Data Analysis in Real-World Applications

16 Enrichment Data Fusion –combine data sets Example: –customer database –survey information
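
A minimal sketch of the data fusion example above, assuming both data sets share a key. The field names (`customer_id`, `satisfaction`) and the records are invented for illustration; the slide only names "customer database" and "survey information".

```python
def fuse(customers, surveys, key="customer_id"):
    """Enrich each customer record with matching survey fields.

    Customers without a survey answer keep only their own fields.
    """
    survey_by_key = {row[key]: row for row in surveys}
    fused = []
    for customer in customers:
        extra = survey_by_key.get(customer[key], {})
        fused.append({**customer, **extra})
    return fused


# Hypothetical records, not taken from the presentation.
customers = [
    {"customer_id": 1, "city": "Leiden"},
    {"customer_id": 2, "city": "Utrecht"},
]
surveys = [{"customer_id": 1, "satisfaction": 4}]

fused = fuse(customers, surveys)
print(fused)
```

Keeping customers that have no survey answer is a design choice here; dropping them instead would bias the enriched table towards respondents.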

17 Data Mining Database technology Data visualization Data warehouse vs Operational database –time-dependent –non-volatile –subject-oriented –integrated Target: decision making

18 Data Mining

19 Selection Cleaning Enrichment Coding Data Mining Reporting

20 Cleaning Remove duplicates Check domain consistency Remove data Project data Combine data in one table
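
Two of the cleaning steps above, removing duplicates and checking domain consistency, can be sketched as follows. The helper names and the plausible-age rule (0 to 120) are assumptions made for this example.

```python
def remove_duplicates(rows):
    """Keep the first occurrence of each identical record."""
    seen = set()
    unique = []
    for row in rows:
        fingerprint = tuple(sorted(row.items()))
        if fingerprint not in seen:
            seen.add(fingerprint)
            unique.append(row)
    return unique


def domain_violations(rows, field, is_valid):
    """Return the records whose value for `field` fails the domain check."""
    return [row for row in rows if not is_valid(row.get(field))]


# Made-up records for illustration only.
records = [
    {"id": 1, "age": 34},
    {"id": 1, "age": 34},  # exact duplicate
    {"id": 2, "age": -3},  # outside the plausible age domain
]

unique = remove_duplicates(records)
bad = domain_violations(unique, "age", lambda a: a is not None and 0 <= a <= 120)
print(len(unique), bad)
```

Whether a violating record is then removed or corrected is a separate decision that depends on the application.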

21 Coding Address - Region Date of birth - Age Scaling of numerical data Date - Number of months
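
The four coding steps on this slide can be sketched as small functions. Everything named here is an assumption for illustration: the city-to-region table is invented, min-max scaling is only one possible choice of scaling, and "number of months" is read as months elapsed up to a reference date.

```python
from datetime import date

# Hypothetical lookup table; a real one would come from the application.
REGION_BY_CITY = {"Leiden": "West", "Groningen": "North"}


def region_of(city):
    """Address -> Region: map a city name to a coarser region label."""
    return REGION_BY_CITY.get(city, "Unknown")


def age_on(birth, today):
    """Date of birth -> Age in whole years on the given day."""
    had_birthday = (today.month, today.day) >= (birth.month, birth.day)
    return today.year - birth.year - (0 if had_birthday else 1)


def months_since(start, reference):
    """Date -> Number of months between two dates (day of month ignored)."""
    return (reference.year - start.year) * 12 + (reference.month - start.month)


def min_max_scale(values):
    """Scale numerical data linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


print(region_of("Leiden"))
print(age_on(date(1980, 6, 15), date(2000, 6, 14)))
print(months_since(date(1999, 11, 1), date(2000, 2, 1)))
print(min_max_scale([10, 20, 30]))
```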

22 Data Mining SQL queries Clustering Pattern Recognition ES ML Statistics Visual DB KDD

23 Nearest Neighbor Search k nearest points
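
A brute-force sketch of finding the k nearest points, assuming Euclidean distance (the slide does not name a distance measure). The function name `k_nearest` and the sample points are made up for this example.

```python
import math


def k_nearest(points, query, k):
    """Return the k points closest to `query` by Euclidean distance."""
    return sorted(points, key=lambda p: math.dist(p, query))[:k]


# Made-up 2-D points for illustration.
points = [(0, 0), (1, 1), (5, 5), (2, 2)]
nearest = k_nearest(points, (0, 0), 2)
print(nearest)
```

Sorting all points costs O(n log n) per query, which is fine for a sketch; large data sets would typically call for an index structure instead.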

24 Oil Search Shell research South-East Asia measurements kinds of stone coring

25 Applications

26 Outlook

27 Positive –Moore’s Law –New kinds of computers –Data collection –More data is more easily reachable Negative –Collective memory gets lost –Infoglut Data battle

28 Outlook Merge of Machine Learning and Statistics Algorithms –Adaptive parameters –Black Box data mining From suites to tailored tools

29 Intelligent Data Analysis –User Interaction –also uses tools from Machine Learning

30 NetTalk Sound generator Speech-synthesis expert system INTELLI Sound Generator Speech-synthesis expert system NetTalk Neural Network

