Introduction to Data-Mining
Marko Grobelnik, Institut Jozef Stefan

Outline
- Motivation & Definition
- What are typical applications?
- How do we build solutions?
- Methods & algorithms
- Tools & standards
- …conclusion

Motivation: “Necessity is the Mother of Invention”
- Data explosion problem: automated data collection tools and mature database technology lead to tremendous amounts of data stored in databases, data warehouses and other information repositories
- We are drowning in data, but starving for knowledge!

Data pyramid (from base to top)
- Data
- Information = data + context
- Knowledge = information + rules
- Wisdom = knowledge + experience

What Is Data Mining?
- Data mining (knowledge discovery in databases - KDD, business intelligence): extraction of interesting (non-trivial, implicit, previously unknown and potentially useful) information from data in large databases
- “Tell me something interesting about the data.”
- “Describe the data.”

Potential Applications
- Database analysis and decision support
- Market analysis and management
- Risk analysis and management
- Fraud detection and management
- Text analysis: Text Mining
- Web analysis: Web Mining
- Intelligent query answering

Market Analysis and Management
- Where are the data sources for analysis? Credit card transactions, loyalty cards, discount coupons, customer complaint calls, plus (public) lifestyle studies.
- Target marketing: find clusters of “model” customers who share the same characteristics: interest, income level, spending habits, etc.
- Determine customer purchasing patterns over time, e.g., the conversion of a single bank account to a joint one after marriage.

Analysis and Risk Management
- Finance planning and asset evaluation:
  - cash flow analysis and prediction
  - time series analysis (trend analysis, etc.)
- Resource planning:
  - summarize and compare the resources and spending
- Competition:
  - monitor competitors and market directions
  - set pricing strategy in a highly competitive market

Fraud Detection and Management
- Use historical data to build models of fraudulent behavior, and use data mining to help identify similar instances
- Example applications:
  - Auto insurance: detect a group of people who stage accidents to collect on insurance
  - Money laundering: detect suspicious money transactions
  - Telephone fraud: detect suspicious patterns (build a call model: destination, time, duration)

Other Areas of Application
- Sports: analysis of NBA games (e.g., detecting the opponent’s strategy)
- Astronomy: discovery and classification of new objects
- Internet: analysis of Web access logs, discovery of user behavior patterns, analyzing the effectiveness of Web marketing, improving Web site organization
- Text: news analysis, medical record analysis, automatic sorting and filtering, automatic document categorization

Data mining: intersection of multiple disciplines
- Database systems, data warehouse and OLAP
- Statistics
- Machine learning
- Visualization
- Information science
- High performance computing
- Other disciplines: neural networks, mathematical modeling, information retrieval, pattern recognition, ...

From data to knowledge
Data mining is the core of the knowledge discovery process:
Databases -> Data Cleaning / Data Integration -> Data Warehouse -> Selection -> Task-relevant Data -> Data Mining -> Pattern Evaluation

Main steps of KDD
- Learning the application domain: relevant prior knowledge and goals of the application
- Data cleaning and preprocessing (may take 60% of the effort!):
  - creating a target data set: data selection
  - finding useful features, generating new features, mapping feature values, discretization of values
- Choosing data mining tools/algorithms: summarization, classification, regression, association, clustering
- Data mining: search for patterns of interest
- Interpretation: analysis of results (visualization, transformation, removing redundant patterns, etc.)
- Use of discovered knowledge
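
As a minimal sketch of two of the preprocessing steps listed above (cleaning and discretization of values), the following Python assumes that missing readings are encoded as None and that equal-width binning is an acceptable discretization. Both choices, the function names, and the sample ages are hypothetical illustrations, not part of the original slides.

```python
from typing import List, Optional


def clean(values: List[Optional[float]]) -> List[float]:
    """Data cleaning (assumed rule): drop missing readings encoded as None."""
    return [v for v in values if v is not None]


def equal_width_bins(values: List[float], k: int) -> List[int]:
    """Discretization: map each value to one of k equal-width intervals 0..k-1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / k
    # min(..., k - 1) keeps the maximum value in the last bin instead of bin k
    return [min(int((v - lo) / width), k - 1) for v in values]


ages = [23, None, 35, 47, 52, None, 61, 70]  # hypothetical raw attribute
cleaned = clean(ages)
print(cleaned)                       # -> [23, 35, 47, 52, 61, 70]
print(equal_width_bins(cleaned, 3))  # -> [0, 0, 1, 1, 2, 2]
```

Equal-width binning is only one way to discretize; equal-frequency bins or domain-defined thresholds are common alternatives when values are skewed.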

Data Mining and Business Intelligence
Layers, from the top (increasing potential to support business decisions) down to the base:
- Making decisions
- Data presentation: visualization techniques
- Data mining: information discovery
- Data exploration: OLAP, MDA; statistical analysis, querying and reporting
- Data warehouses / data marts
- Data sources: paper, files, information providers, database systems, OLTP
Typical users, from top to bottom: end user, business analyst, data analyst, DBA

Mining the data: what kind of data?
- Relational databases
- Data warehouses
- Transactional databases
- Advanced DB systems and information repositories:
  - object-oriented and object-relational databases
  - spatial databases
  - time-series data and temporal data
  - text databases and multimedia databases
  - heterogeneous and legacy databases
  - WWW

Data mining algorithms (I)
- Association: finding rules like “if the customer bought item A, then in X% of those transactions she/he also bought item B”; such a rule holds for Y% of all transactions
- Classification and prediction:
  - classify data based on the values of a classifying attribute, e.g., classify countries based on climate, or classify cars based on gas mileage
  - predict some unknown or missing attribute values based on other information

Data mining algorithms (II)
- Clustering: group data to form new classes, e.g., find groups of customers with similar behavior
- Time-series analysis:
  - trend and deviation analysis: find and characterize evolution trends, sequential patterns, similar sequences, and deviation data, e.g., stock analysis
  - similarity-based pattern-directed analysis: find and characterize user-specified patterns in large databases
  - cyclicity/periodicity analysis: find segment-wise or total cycles or periodic behaviors in time-related data
- Other pattern-directed or statistical analysis
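
Trend and deviation analysis can be illustrated with a simple moving average. The sketch below is one assumed approach (a trailing-window mean with a fixed absolute threshold), not a method prescribed by the slides; the window size, the threshold, and the price series are hypothetical values chosen for illustration.

```python
from typing import List


def moving_average(series: List[float], window: int) -> List[float]:
    """Trend: mean of every full run of `window` consecutive values."""
    return [sum(series[i:i + window]) / window
            for i in range(len(series) - window + 1)]


def deviations(series: List[float], window: int, threshold: float) -> List[int]:
    """Deviation: indices of points that differ from the mean of the
    preceding `window` points by more than `threshold`."""
    trend = moving_average(series, window)
    # trend[i] is the mean of series[i:i + window]; compare it with series[i + window]
    return [i + window for i, mean in enumerate(trend[:-1])
            if abs(series[i + window] - mean) > threshold]


prices = [10, 11, 12, 13, 30, 15, 16]  # hypothetical daily prices
print(moving_average(prices, 3))
print(deviations(prices, 3, threshold=5))  # -> [4]
```

A fixed threshold keeps the example short; a real deviation detector would more likely scale it by the local variability of the series.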

Association rules
- Finding associations or correlations among a set of items
- Applications: basket data analysis, cross-marketing, …
- Example: buying beer and chips -> ketchup [0.5%, 60%]
- Rule form: LHS -> RHS [support, confidence]
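
The [support, confidence] pair in the rule form above can be computed directly from a list of baskets. The sketch below uses five made-up transactions, so its numbers differ from the slide's 0.5% and 60%: support is the fraction of all transactions containing LHS and RHS together, and confidence is the fraction of transactions containing LHS that also contain RHS. The function names are hypothetical.

```python
from typing import List, Set


def count(itemset: Set[str], transactions: List[Set[str]]) -> int:
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def support(itemset: Set[str], transactions: List[Set[str]]) -> float:
    """Fraction of all transactions that contain `itemset`."""
    return count(itemset, transactions) / len(transactions)


def confidence(lhs: Set[str], rhs: Set[str], transactions: List[Set[str]]) -> float:
    """Among transactions containing `lhs`, the fraction that also contain `rhs`."""
    n_lhs = count(lhs, transactions)
    return count(lhs | rhs, transactions) / n_lhs if n_lhs else 0.0


baskets = [  # hypothetical market-basket data
    {"beer", "chips", "ketchup"},
    {"beer", "chips"},
    {"beer", "chips", "ketchup", "bread"},
    {"bread", "milk"},
    {"beer", "milk"},
]
lhs, rhs = {"beer", "chips"}, {"ketchup"}
print(support(lhs | rhs, baskets))    # -> 0.4
print(confidence(lhs, rhs, baskets))  # 2 of the 3 beer-and-chips baskets
```

Algorithms such as Apriori avoid scoring every possible rule by first keeping only itemsets whose support exceeds a minimum threshold; this sketch only scores one given rule.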

Classification
- Finding rules that describe given groups of objects
- Applications: credit approval, target marketing, medical diagnosis, treatment effectiveness analysis, …
- Example: based on the past symptoms and diagnoses of patients, generate a model describing the influence of symptoms on a disease, to be used for classifying future test data and for a better understanding of each class
- Methods: decision trees (e.g., ID3, C5), statistics, neural networks, …

Classification using decision trees
- A decision tree: [Figure: example decision tree; root node outlook with branches sunny, overcast and rain; further tests on humidity and windy; leaves labeled P and NP]
- Top-down decision tree generation algorithm; at each step:
  - select the attribute favoring the partitioning which makes the majority of examples belong to a single class
  - partition the examples based on the selected attribute’s values
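
The attribute-selection criterion above can be made concrete with information gain, the measure used by ID3: the attribute whose split most reduces class entropy is chosen. This is a sketch of that one criterion only (no recursion or pruning), and the six weather rows are a hypothetical training set, not the data behind the slide's figure.

```python
import math
from collections import Counter
from typing import Dict, List

Row = Dict[str, str]


def entropy(labels: List[str]) -> float:
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows: List[Row], attr: str, target: str = "class") -> float:
    """Entropy before the split minus the weighted entropy of the partitions."""
    before = entropy([r[target] for r in rows])
    after = 0.0
    for value in {r[attr] for r in rows}:
        part = [r[target] for r in rows if r[attr] == value]
        after += len(part) / len(rows) * entropy(part)
    return before - after


def best_attribute(rows: List[Row], attrs: List[str], target: str = "class") -> str:
    """ID3-style choice: the attribute with the highest information gain."""
    return max(attrs, key=lambda a: information_gain(rows, a, target))


data = [  # hypothetical training examples
    {"outlook": "sunny", "windy": "false", "class": "NP"},
    {"outlook": "sunny", "windy": "true", "class": "NP"},
    {"outlook": "overcast", "windy": "false", "class": "P"},
    {"outlook": "rain", "windy": "false", "class": "P"},
    {"outlook": "rain", "windy": "true", "class": "NP"},
    {"outlook": "overcast", "windy": "true", "class": "P"},
]
print(best_attribute(data, ["outlook", "windy"]))  # -> outlook
```

Here outlook wins because two of its three branches (sunny, overcast) are already pure, which is exactly the “majority of examples belong to a single class” preference described above. A full tree builder would recurse on each partition until the nodes are pure or no attributes remain.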

Classification methods
- Decision trees and decision rules: given a training set of labeled data; tree pruning is used for handling noise and avoiding overfitting
- Bayesian classification: naïve Bayesian classification, Bayesian belief networks
- Neural network approach: multi-layer networks and back-propagation
- Genetic algorithms: genetic operators (mutation, cross-over, …) and fitness function selection
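
Naïve Bayesian classification assumes attributes are conditionally independent given the class, so a class score is the prior times one likelihood per attribute. The sketch below handles categorical attributes only, adds Laplace (add-one) smoothing as an assumed design choice so an unseen value does not zero out a score, and works in log space to avoid underflow. The function names and training rows are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List

Row = Dict[str, str]


def train(rows: List[Row], target: str = "class"):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(r[target] for r in rows)
    # value_counts[cls][attr][value]: rows of class `cls` with attr == value
    value_counts = defaultdict(lambda: defaultdict(Counter))
    domains = defaultdict(set)  # distinct values seen for each attribute
    for r in rows:
        for attr, value in r.items():
            if attr != target:
                value_counts[r[target]][attr][value] += 1
                domains[attr].add(value)
    return class_counts, value_counts, domains


def predict(model, x: Row) -> str:
    """Return the class with the highest log posterior (up to a constant)."""
    class_counts, value_counts, domains = model
    n = sum(class_counts.values())
    best_cls, best_score = None, -math.inf
    for cls, n_cls in class_counts.items():
        score = math.log(n_cls / n)  # log prior P(cls)
        for attr, value in x.items():
            # add-one smoothed estimate of P(attr == value | cls)
            score += math.log((value_counts[cls][attr][value] + 1)
                              / (n_cls + len(domains[attr])))
        if score > best_score:
            best_cls, best_score = cls, score
    return best_cls


data = [  # hypothetical training examples
    {"outlook": "sunny", "windy": "false", "class": "NP"},
    {"outlook": "sunny", "windy": "true", "class": "NP"},
    {"outlook": "overcast", "windy": "false", "class": "P"},
    {"outlook": "rain", "windy": "false", "class": "P"},
    {"outlook": "rain", "windy": "true", "class": "NP"},
    {"outlook": "overcast", "windy": "true", "class": "P"},
]
model = train(data)
print(predict(model, {"outlook": "overcast", "windy": "false"}))  # -> P
print(predict(model, {"outlook": "sunny", "windy": "true"}))      # -> NP
```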

Clustering methods
- Partitioning a set of data into a set of classes, called clusters, such that the members of each class share some interesting common properties
- Clusters are of high quality if the intra-class similarity is high and the inter-class similarity is low
- The choice of distance measure is important
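
The slide does not name a specific algorithm, so the sketch below swaps in k-means, a common partitioning method, with Euclidean distance as the assumed distance measure. Initialization simply takes the first k points (an illustrative simplification; practical tools use smarter seeding), and the customer points and function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Distance measure: straight-line distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points: List[Point], k: int, iterations: int = 100):
    """Alternate assignment and centroid update until centroids stop moving."""
    centroids = list(points[:k])  # naive seeding: the first k points
    clusters: List[List[Point]] = []
    for _ in range(iterations):
        # assignment step: each point joins the cluster of its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: euclidean(p, centroids[i]))
            clusters[nearest].append(p)
        # update step: centroid = mean of its cluster (keep the old one if empty)
        new_centroids = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters


customers = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, clusters = kmeans(customers, k=2)
print(centroids)
print(clusters)
```

Swapping `euclidean` for another function (for example Manhattan distance) changes which points count as “similar”, which is why the distance measure matters so much for cluster quality.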

Data-Mining tools
- Main producers of Data-Mining software:
  - IBM – Intelligent Miner, extender for DB2
  - SAS – Enterprise Miner
  - SPSS – Clementine
  - Microsoft – Analysis Server (…part of SQL Server 2000)
- …many more smaller producers

Data Mining standards
- PMML (Predictive Model Markup Language): XML-based language for saving and sharing models (the most widely accepted standard)
- CRISP-DM: standardized methodology for building Data Mining applications
- OLE DB for Data Mining: Microsoft’s standard for developing OLE DB/COM components that extend Analysis Server with new Data Mining functionality (uses a customized SQL language)
- IBM and Oracle prepared standard extensions to the SQL language to support Data Mining functionality

…conclusion
- Data Mining is a rapidly developing area
- Who needs Data Mining, and why?
  - …(almost) everybody who has data
  - …to get something more out of the data
More information: