Data mining By Aung Oo.


What is Data Mining?
- Different perspectives: CS, business, IT
- As a field of research in CS: the science of extracting useful information from large data sets or databases
- Also known as:
  - Knowledge Discovery and Data Mining (KDD)
  - Knowledge Discovery in Databases (KDD)

Knowledge Discovery and Data Mining (KDD)
KDD can be said to lie at the intersection of statistics, machine learning, databases, pattern recognition, information retrieval, and artificial intelligence.

Data Mining Definitions
- Analysis of datasets to find unsuspected relationships
- Summarizing data in novel ways that are understandable and useful to the data owner
- Extraction of knowledge from data: the non-trivial extraction of implicit, previously unknown, and potentially useful knowledge from data
- The process of discovering patterns, automatically or semi-automatically, in large quantities of data
- Patterns discovered must be useful: meaningful in that they lead to some advantage, usually economic

Why Data Mining?
- Large datasets are common, due to advances in digital data acquisition and storage technology
- Automatic data production leads to a need for automatic data consumption
- Large databases mean vast amounts of information; the difficulty lies in accessing it
- Business
  - Supermarket transactions
  - Credit card usage records
  - Telephone call details
  - Government statistics
- Scientific
  - Images of astronomical bodies
  - Molecular databases
  - Medical records

Why Data Mining?
Data mining is ready for application in the business community because it is supported by three technologies that are now sufficiently mature:
- Massive data collection
- Powerful multiprocessor computers
- Data mining algorithms

Example of Data Mining
If a store tracks a customer's purchases and notices that the customer buys a lot of silk shirts, the data mining system will associate that customer with silk shirts. The store may then begin direct-mail marketing of silk shirts to that customer, or it may instead try to get the customer to buy a wider range of products.
Another example: analysts found that beer and diapers were often bought together, so a store might place the high-profit diapers next to the high-profit beer. This technique is often referred to as "Market Basket Analysis".
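A market basket analysis can be sketched in a few lines. The example below is only an illustration: the transactions are invented, and `pair_stats` is a hypothetical helper, not part of any library. It counts how often each pair of items appears in the same basket (support) and how often one item appears given the other (confidence).

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions; each set is one customer's basket.
transactions = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"milk", "bread"},
    {"beer", "diapers", "milk"},
    {"bread", "diapers"},
]

def pair_stats(baskets):
    """Map each co-occurring pair (a, b) to (support, confidence a->b, confidence b->a)."""
    n = len(baskets)
    # How many baskets contain each single item.
    item_counts = Counter(item for basket in baskets for item in basket)
    # How many baskets contain each pair; sorting fixes the order of names in a pair.
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(basket), 2)
    )
    return {
        (a, b): (count / n, count / item_counts[a], count / item_counts[b])
        for (a, b), count in pair_counts.items()
    }

stats = pair_stats(transactions)
support, beer_to_diapers, diapers_to_beer = stats[("beer", "diapers")]
print(f"support={support:.2f}  beer->diapers={beer_to_diapers:.2f}  "
      f"diapers->beer={diapers_to_beer:.2f}")
```

In this made-up data, every basket with beer also contains diapers, which is the kind of pattern a store could act on when deciding shelf placement.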

Steps in the Evolution of Data Mining

| Evolutionary Step | Business Question | Enabling Technologies |
|---|---|---|
| Data Collection (1960s) | "What was my total revenue in the last five years?" | Computers, tapes, disks |
| Data Access (1980s) | "What were unit sales in New England last March?" | Relational databases (RDBMS), Structured Query Language (SQL), ODBC |
| Data Warehousing & Decision Support (1990s) | "What were unit sales in New England last March? Drill down to Boston." | On-line analytic processing (OLAP), multidimensional databases, data warehouses |
| Data Mining (Emerging Today) | "What’s likely to happen to Boston unit sales next month? Why?" | Advanced algorithms, multiprocessor computers, massive databases |

The Scope of Data Mining
- Automated prediction of trends and behaviors. Data mining uses data on past promotional mailings to identify the targets most likely to maximize return on investment in future mailings.
- Automated discovery of previously unknown patterns. An example of pattern discovery is the analysis of retail sales data to identify seemingly unrelated products that are often purchased together.
- More columns. High-performance data mining allows users to explore the full depth of a database, without pre-selecting a subset of variables.
- More rows. Larger samples yield lower estimation errors and variance, and allow users to make inferences about small but important segments of a population.
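The "more rows" point can be illustrated with the textbook standard error of a sample proportion, sqrt(p(1 - p) / n). This is a minimal sketch assuming simple random sampling; the 2% segment share and the sample sizes are hypothetical values chosen for illustration.

```python
import math

def standard_error_of_proportion(p, n):
    """Standard error of a sample proportion under simple random sampling."""
    return math.sqrt(p * (1 - p) / n)

segment_share = 0.02  # hypothetical: a small segment making up 2% of the population
sample_sizes = (1_000, 100_000, 10_000_000)

# The error shrinks with the square root of the number of rows.
errors = {n: standard_error_of_proportion(segment_share, n) for n in sample_sizes}
for n, se in errors.items():
    print(f"n={n:>10,}  standard error={se:.6f}")
```

Each 100-fold increase in rows cuts the standard error by a factor of 10, which is why estimates for small segments only become reliable on large samples.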

Data Mining vs. Statistics
- The objective of the data mining exercise plays no role in the data collection strategy
  - In this way it differs from much of statistics
  - For this reason, data mining is referred to as secondary data analysis
- KDD is more complicated than initially thought
  - 80% preparing data
  - 20% mining data

Query: Database vs. Data Mining
- Database: when you know exactly what you are looking for
- Data mining: when you only vaguely know what you are looking for
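The contrast can be sketched with a toy sales table. Everything here is an assumption made for illustration: the table, its columns, and the data are invented, and ranking attribute values by deviation from the overall average is just one simple stand-in for a mining step.

```python
import sqlite3
from collections import defaultdict

# Hypothetical rows: (region, month, product, units)
rows = [
    ("New England", "March", "shirts", 120),
    ("New England", "April", "shirts", 80),
    ("Midwest", "March", "shirts", 60),
    ("Midwest", "April", "ties", 30),
    ("New England", "March", "ties", 40),
    ("Midwest", "March", "ties", 20),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, month TEXT, product TEXT, units INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", rows)

# Database query: the question is known exactly in advance.
(march_units,) = conn.execute(
    "SELECT SUM(units) FROM sales WHERE region = ? AND month = ?",
    ("New England", "March"),
).fetchone()
print("New England unit sales in March:", march_units)

# Mining-style exploration: no fixed question. Rank every attribute value
# by how far its average units deviate from the overall average.
overall = sum(r[3] for r in rows) / len(rows)
groups = defaultdict(list)
for region, month, product, units in rows:
    for key in (("region", region), ("month", month), ("product", product)):
        groups[key].append(units)

ranked = sorted(
    ((sum(v) / len(v) - overall, key) for key, v in groups.items()),
    key=lambda item: abs(item[0]),
    reverse=True,
)
for deviation, (attribute, value) in ranked:
    print(f"{attribute}={value}: {deviation:+.1f} units vs. overall average")
```

The SQL query answers one precise question, while the second half scans all attributes and surfaces whichever ones stand out, without being told where to look.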

Data Mining Tasks and Techniques
- Not so much a single technique as the idea that there is more knowledge hidden in the data than shows itself on the surface
- Any technique that helps to extract more out of data is useful
- Five major task types:
  1. Exploratory Data Analysis (Visualization)
  2. Descriptive Modeling (Density estimation, Clustering)
  3. Predictive Modeling (Classification and Regression)
  4. Discovering Patterns and Rules (Association rules)
  5. Retrieval by Content (Retrieve items similar to pattern of interest)
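As one illustration of the descriptive modeling task, the sketch below runs a tiny one-dimensional k-means clustering. It is a simplified version written for this example, not a production algorithm: `kmeans_1d` is a hypothetical helper, the spending figures are invented, and the starting centers are picked deterministically from the sorted data rather than at random.

```python
def kmeans_1d(values, k=2, iterations=20):
    """Cluster numbers into k groups by repeatedly assigning each value to its nearest center."""
    ordered = sorted(values)
    # Deterministic start: take the middle element of each of k equal slices.
    centers = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Move each center to the mean of its cluster; keep it if the cluster is empty.
        new_centers = [
            sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)
        ]
        if new_centers == centers:  # assignments have stabilized
            break
        centers = new_centers
    return centers, clusters

# Hypothetical monthly spend per customer.
monthly_spend = [12, 15, 14, 90, 95, 88]
centers, clusters = kmeans_1d(monthly_spend, k=2)
print("centers:", centers)
print("clusters:", clusters)
```

No labels are supplied; the grouping into low and high spenders comes from the data itself, which is what separates descriptive modeling from the predictive tasks in the list.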

Privacy concerns
For example, if an employer has access to medical records, they may screen out people who have diabetes or have had a heart attack. Screening out such employees will cut costs for insurance, but it creates ethical and legal problems.
Essentially, data mining gives information that would not be available otherwise. It must be properly interpreted to be useful. When the data collected involves individual people, there are many questions concerning privacy, legality, and ethics.

Notable Uses of Data Mining
Data mining has been cited as the method by which the U.S. Army intelligence unit, Able Danger, supposedly had identified the 9/11 attack leader, Mohamed Atta, and three other 9/11 hijackers as possible members of an al Qaeda cell operating in the U.S. more than a year before the attack.

References
- http://www.cedar.buffalo.edu/~srihari/CSE626
- http://en.wikipedia.org/wiki/Data_Mining
- http://www.thearling.com/text/dmwhite/dmwhite.htm