Why Intelligent Data Analysis? Joost N. Kok Leiden Institute of Advanced Computer Science Universiteit Leiden.


Overview
Data Analysis
Data Mining
Applications
Outlook

Data Analysis

Data Mining
"Data Mining is one of the five key note technologies that will have a major impact across a wide range of industries within the next three to five years" (Gartner)
"Data Mining is one of the top ten new technologies in which companies will invest during the next five years" (Gartner)
"Data Mining is an overhyped concept" (OTR)

Data Analysis
Data analysis = processing data
Exploratory vs. Confirmatory
  – are there interesting structures?
  – can we predict the value?
Descriptive vs. Inferential
  – statement about data set
  – draw more general conclusions
Data analysis = process of computing various summaries and derived values from the given collection of data
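The descriptive vs. inferential distinction on this slide can be sketched in a few lines. This is a minimal illustration, not taken from the slides: the function names `describe` and `mean_interval`, the sample values, and the normal approximation (z = 1.96) are all assumptions made for the example.

```python
import math
import statistics


def describe(sample):
    """Descriptive: summaries that are statements about this data set only."""
    return {
        "n": len(sample),
        "mean": statistics.mean(sample),
        "median": statistics.median(sample),
        "stdev": statistics.stdev(sample),  # sample standard deviation
    }


def mean_interval(sample, z=1.96):
    """Inferential: a rough 95% interval for the mean of the wider population.

    Assumes a normal approximation (z = 1.96), which suits large samples
    better than small ones.
    """
    summary = describe(sample)
    half_width = z * summary["stdev"] / math.sqrt(summary["n"])
    return summary["mean"] - half_width, summary["mean"] + half_width


measurements = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up values for illustration
print(describe(measurements))
print(mean_interval(measurements))
```

The first function only summarizes the data at hand; the second draws a more general conclusion, and its answer is only as good as the stated assumption.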

Tools
Cookbook fallacy: Data analysis = picking and applying the right tool.
  – Tools are not independent.
  – Matching is an iterative process (which needs intelligence).

Stat vs. ML
Statistics
  – Mathematics
Machine Learning
  – Experimental Computer Science
"Statistics is difficult"
"Algorithms are not exact"

Models
Models vs. Algorithms
Empirical vs. Mechanistic Models
Understanding vs. Prediction
Models vs. Patterns
Overfitting
Constraints

Algorithms
Enabling data analysis
Too many: often no foundations, no applications
In practice only a restricted set of algorithms is used

The nature of data
Different kinds of data
  – Numerical data
  – Text
  – Images
  – Sound
Raw data has
  – missing values
  – distortions
  – misrecording
  – inadequate sampling
  – etc.

The nature of data
Data sets can be large
  – horizontal
  – vertical
Curse of dimensionality
Experiments
Sampling
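The curse of dimensionality named on this slide can be made concrete with a small calculation. A minimal sketch, assuming points spread uniformly over a unit hypercube (an assumption for illustration, not something the slides state): it computes the edge length of a sub-cube needed to capture a given fraction of the data. The function name `edge_length_for_fraction` is hypothetical.

```python
def edge_length_for_fraction(fraction: float, dims: int) -> float:
    """Edge length of a sub-cube that holds `fraction` of the points,
    assuming points are uniform in a unit hypercube with `dims` dimensions."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    if dims < 1:
        raise ValueError("dims must be at least 1")
    # A sub-cube with edge e has volume e**dims, so e = fraction**(1/dims).
    return fraction ** (1.0 / dims)


for dims in (1, 2, 10, 100):
    print(dims, round(edge_length_for_fraction(0.1, dims), 3))
```

Under this assumption, capturing 10% of the data in 10 dimensions already needs a sub-cube spanning roughly 79% of each axis, so "local" neighborhoods stop being local as columns are added.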

The nature of data
Too little
  – Example: storm situations
Too much
  – Example: image segmentation
Static vs. dynamic
Off-line vs. On-line
Infoglut
What is collected?

Overview
Statistical methods and concepts
Bayesian methods
Time series
Rule induction
Neural networks
Fuzzy logic
Stochastic search methods
Applications

Overview
Why Intelligent Data Analysis
Fundamental Concepts of Statistics
Intelligent Data Analysis: Issues and Challenges
Artificial Neural Networks
Fuzzy Logic
Industrial Applications of Neuro-Fuzzy Networks
Statistical Methods for Data Analysis
Time Series Analysis

Overview
Chaos and Reality
Bayesian Networks
ANN Visualization Tools
Rule Induction
Evolutionary Systems
Data Analysis in Real-World Applications

Enrichment
Data Fusion
  – combine data sets
Example:
  – customer database
  – survey information
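The customer-database-plus-survey example can be sketched as a simple join. This is a minimal illustration, not the method used in the lecture: the function `fuse`, the field names, and the shared `customer_id` key are all assumptions.

```python
def fuse(customers, surveys, key="customer_id"):
    """Left-join survey answers onto customer records by a shared key.

    Customers without a survey record are kept unchanged.
    """
    survey_by_key = {row[key]: row for row in surveys}
    fused = []
    for customer in customers:
        merged = dict(customer)  # copy so the input is not modified
        extra = survey_by_key.get(customer[key], {})
        for field, value in extra.items():
            if field != key:
                merged[field] = value
        fused.append(merged)
    return fused


# Hypothetical records for illustration only.
customers = [
    {"customer_id": 1, "name": "A"},
    {"customer_id": 2, "name": "B"},
]
surveys = [{"customer_id": 2, "satisfaction": 4}]

print(fuse(customers, surveys))
```

In practice the two sources rarely share a clean key, which is part of why enrichment is listed as its own step.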

Data Mining
Database technology
Data visualization
Data warehouse vs. Operational database
  – time-dependent
  – non-volatile
  – subject-oriented
  – integrated
Target: decision making

Data Mining

Selection Cleaning Enrichment Coding Data Mining Reporting

Cleaning
Remove duplicates
Check domain consistency
Remove data
Project data
Combine data in one table
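The first two cleaning steps can be sketched as follows. This is a minimal illustration under stated assumptions: records are dictionaries with string keys and hashable values, and the helper names `remove_duplicates` and `domain_violations` as well as the sample rules are hypothetical.

```python
def remove_duplicates(rows):
    """Drop exact duplicate records, keeping the first occurrence.

    Assumes string field names and hashable field values.
    """
    seen = set()
    unique = []
    for row in rows:
        fingerprint = tuple(sorted(row.items()))
        if fingerprint not in seen:
            seen.add(fingerprint)
            unique.append(row)
    return unique


def domain_violations(rows, domains):
    """Report (row index, field, value) for values outside their domain.

    `domains` maps a field name to a predicate that returns True
    for acceptable values.
    """
    violations = []
    for index, row in enumerate(rows):
        for field, is_valid in domains.items():
            if field in row and not is_valid(row[field]):
                violations.append((index, field, row[field]))
    return violations


# Hypothetical records and rules for illustration only.
records = [
    {"id": 1, "age": 34},
    {"id": 1, "age": 34},   # exact duplicate
    {"id": 2, "age": -5},   # outside the age domain
]
rules = {"age": lambda value: 0 <= value <= 120}

cleaned = remove_duplicates(records)
print(cleaned)
print(domain_violations(cleaned, rules))
```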

Coding
Address -> Region
Date of birth -> Age
Scaling of numerical data
Date -> Number of months
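The coding transformations on this slide can be sketched with the standard library. This is a minimal illustration, not the lecture's implementation: the function names, the city-to-region lookup table, and the choice of min-max scaling are assumptions.

```python
from datetime import date

# Hypothetical lookup table; a real one would come from reference data.
REGION_BY_CITY = {"Leiden": "West", "Maastricht": "South"}


def region_of(city: str) -> str:
    """Code an address (here reduced to a city name) as a region."""
    return REGION_BY_CITY.get(city, "unknown")


def age_in_years(born: date, today: date) -> int:
    """Code a date of birth as an age in whole years."""
    had_birthday = (today.month, today.day) >= (born.month, born.day)
    return today.year - born.year - (0 if had_birthday else 1)


def months_since(start: date, end: date) -> int:
    """Code a date as the number of whole months since `start`."""
    months = (end.year - start.year) * 12 + (end.month - start.month)
    if end.day < start.day:
        months -= 1  # the last month is not yet complete
    return months


def min_max_scale(values):
    """Scale numerical data linearly to the range [0, 1]."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]
    return [(value - low) / (high - low) for value in values]


print(region_of("Leiden"))
print(age_in_years(date(1980, 6, 15), date(2000, 6, 14)))
print(months_since(date(2000, 1, 31), date(2000, 3, 1)))
print(min_max_scale([10, 20, 30]))
```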

Data Mining SQL queries Clustering Pattern Recognition ES ML Statistics Visual DB KDD

Nearest Neighbor Search k nearest points
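A brute-force version of k-nearest-neighbor search can be sketched as below. This is a minimal illustration assuming Euclidean distance and points given as equal-length tuples of numbers; the slides do not specify a distance measure or a data structure, and the function name `k_nearest` is hypothetical.

```python
import heapq
import math


def k_nearest(points, query, k):
    """Return the k points closest to `query`, nearest first.

    Uses Euclidean distance and scans every point, so the cost grows
    linearly with the size of the data set.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    return heapq.nsmallest(k, points, key=lambda point: math.dist(point, query))


# Made-up points for illustration only.
points = [(0, 0), (1, 1), (5, 5), (2, 2)]
print(k_nearest(points, (0.9, 0.9), 2))
```

For large data sets an index structure such as a k-d tree is a common alternative to the full scan, although its benefit shrinks as the number of dimensions grows.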

Oil Search
Shell research
South-East Asia
measurements
kinds of stone
coring

Applications

Outlook

Positive
  – Moore’s Law
  – New kinds of computers
  – Data collection
  – More data is more easily reachable
Negative
  – Collective memory gets lost
  – Infoglut
Data battle

Outlook
Merger of Machine Learning and Statistics
Algorithms
  – Adaptive parameters
  – Black Box data mining
From suites to tailored tools

Intelligent Data Analysis
  – User Interaction
  – also uses tools from Machine Learning

NetTalk
[Figure: two pipelines, each feeding a sound generator. Panel labels: "Speech-synthesis expert system" (INTELLI) and "NetTalk Neural Network".]