1 Data Mining Data Mining “Application of Information and Communication Technology to Production and Dissemination of Official statistics” 10 May – 11.

Presentation transcript:

1 Data Mining “Application of Information and Communication Technology to Production and Dissemination of Official Statistics” 10 May – 11 July 2006 M Q Hasan, Lecturer/Statistician, UN Statistical Institute for Asia and the Pacific, Chiba, Japan

2 Objectives Understanding data mining Basis for future planning and development

3 Contents What is data mining Evolution of data mining Technology and techniques involved Software packages References Exercises

4 What is “data mining”: “The nontrivial extraction of implicit, previously unknown, and potentially useful information from data.” “The science of extracting useful information from large data sets or databases.” (Wikipedia, the free encyclopaedia)

5 What is “data mining”: Also termed “data discovery”. Process of analyzing data to identify patterns or relationships. Extraction of patterns or information from stored information.

6 What is “data mining” …. Prediction of future events, behaviors, estimating value etc. –Accuracy. Confidence level.

7 What is “data mining” …. Process of data mining –the initial exploration of available data –model building or pattern identification with validation –the application of the model to new data in order to generate predictions
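
The three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records, the variable names and the choice of a straight-line model are all invented here and do not come from the slides.

```python
from statistics import mean

# Hypothetical toy data: (household_size, monthly_spend) pairs, invented for illustration.
records = [(1, 210.0), (2, 390.0), (3, 610.0), (4, 805.0), (5, 990.0), (6, 1210.0)]

# Step 1: initial exploration of the available data.
sizes = [x for x, _ in records]
spend = [y for _, y in records]
summary = {"n": len(records), "mean_size": mean(sizes), "mean_spend": mean(spend)}

# Step 2: model building with validation (the last two records are held out).
train, holdout = records[:4], records[4:]

def fit_line(points):
    """Ordinary least squares for y = intercept + slope * x."""
    mx = mean(x for x, _ in points)
    my = mean(y for _, y in points)
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    slope = sxy / sxx
    return my - slope * mx, slope

intercept, slope = fit_line(train)

def predict(x):
    """Apply the fitted line to one input value."""
    return intercept + slope * x

# Validation: mean absolute error on records the model has not seen.
holdout_mae = mean(abs(predict(x) - y) for x, y in holdout)

# Step 3: apply the validated model to new data to generate a prediction.
new_household_size = 7
forecast = predict(new_household_size)
print(summary, round(holdout_mae, 2), round(forecast, 1))
```

Holding out known cases before trusting the model is the point of step 2: the error on the held-out records is a rough guide to how far the step 3 prediction can be trusted.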

8 What is “data mining” …. Requirements –Data –Concepts –Instances –Parameters

9 What is NOT data mining : Data warehousing SQL / ad hoc queries / reporting Software agents Online analytical processing (OLAP) Data visualization

10 Why DM now ? … Development and refinement of three technologies over the years. –Massive data collection and storage facility. Databases of terabyte order. Includes publicly available data –Powerful multiprocessor computers. Parallel processing technology, distributed technology, speed. –Data mining algorithms. Statistical, Data Modeling etc.

11 Evolutionary Step | Business Question | Enabling Technologies | Characteristics
Data Collection (1960s) | “What was my total revenue in the last five years?” | Computers, tapes, disks | Retrospective, static data delivery
Data Access (1980s) | “What were unit sales in New England last March?” | RDBMS, SQL, ODBC | Retrospective, dynamic data delivery at record level
Data Warehousing & Decision Support (1990s) | “What were unit sales in New England last March? Drill down to Boston.” | On-line analytic processing (OLAP), multidimensional databases, data warehouses | Retrospective, dynamic data delivery at multiple levels
Data Mining (Emerging) | “What’s likely to happen to Boston unit sales next month? Why?” | Advanced algorithms, multiprocessor computers, massive databases | Prospective, proactive information delivery

12 Tools Case based reasoning. Case-based reasoning tools provide a means to find records similar to a specified record or records. These tools let the user specify the "similarity" of retrieved records. Data visualization. Data visualization tools let the user easily and quickly view graphical displays of information from different perspectives.

13 1 + 1 = 1 Is it possible?

14 Let $a = b$. Then $a^2 = ab$. Then $2a^2 = a^2 + ab$. Then $2a^2 - 2ab = a^2 - ab$. Then $2(a^2 - ab) = 1(a^2 - ab)$. Then $(1 + 1)(a^2 - ab) = 1(a^2 - ab)$. Canceling $(a^2 - ab)$ from both sides: $1 + 1 = 1$. Where is the FALLACY?
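
The slide leaves the question open. One way to locate the flaw is to evaluate the factor that gets canceled, using only the slide's own assumption that a = b:

```latex
\begin{align*}
a = b \;&\Longrightarrow\; a^2 - ab = a(a - b) = 0, \\
(1 + 1)(a^2 - ab) = 1\,(a^2 - ab) \;&\Longleftrightarrow\; 2 \cdot 0 = 1 \cdot 0 .
\end{align*}
```

Every line up to the cancellation is therefore a true statement of the form 0 = 0. The cancellation itself divides both sides by zero, which is not a valid step, so 1 + 1 = 1 does not follow.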

15 In data mining, think from all sides. Avoid the FALLACIES.

16 Thinking Hat techniques White hat: With this thinking hat you focus on the data available. Look at the information you have, and see what you can learn from it. Look for gaps in your knowledge, and either try to fill them or take account of them. This is where you analyse past trends, and try to extrapolate from historical data.

17 Thinking Hat techniques Red hat: 'Wearing' the red hat, you look at problems using intuition, gut reaction, and emotion. Also try to think how other people will react emotionally. Try to understand the responses of people who do not fully know your reasoning.

18 Thinking Hat techniques Black hat: Using black hat thinking, look at all the bad points of the decision. Look at it cautiously and defensively. Try to see why it might not work. This helps to make plans 'tougher' and more resilient, and helps you to spot fatal flaws and risks. It matters because successful people sometimes get so used to thinking positively that they cannot see problems in advance.

19 Thinking Hat techniques Yellow hat: Yellow hat thinking helps you to think positively. It helps you to see all the benefits of the decision and the value in it, and to keep going when everything looks gloomy and difficult.

20 Thinking Hat techniques Green hat: The green hat stands for creativity. This is the time to develop creative solutions to a problem, with little criticism of ideas. A whole range of creativity tools can help.

21 Thinking Hat techniques Blue hat: The blue hat stands for process control. This is the hat worn by people chairing meetings. When running into difficulties because ideas are running dry, they may direct activity into green hat thinking. When contingency plans are needed, they will ask for black hat thinking, etc.

22 Some DM terms: Instances, Attributes, Objects, Class, Relationships, Rule induction

23 Machine learning

24 Some DM techniques: Decision Trees, Neural Networks, Genetic Algorithms, Nearest neighbor methods, Rule induction

25 Some DM techniques Decision trees –Tree shaped structure with branches –2 main types: Classification trees label records and assign them to the proper class Regression trees estimate the value of a target variable –Various algorithms Chi square automatic interaction detection (CHAID) Classification & regression trees (CART) Etc
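
As a rough illustration of a classification tree, the sketch below grows a tree with a single branch on one numeric attribute. The data and names are invented. The split is chosen by counting misclassified records, which is a simplification and not the chi-square test used by CHAID or the impurity measures used by CART.

```python
from collections import Counter

# Hypothetical toy data: (age, label) pairs, invented for illustration.
rows = [(22, "no"), (25, "no"), (31, "no"), (38, "yes"), (45, "yes"), (52, "yes")]

def majority(labels):
    """Most common label in a list (the class a leaf would predict)."""
    return Counter(labels).most_common(1)[0][0]

def best_split(data):
    """Try each midpoint between sorted values; keep the one with fewest errors."""
    values = sorted({x for x, _ in data})
    best = None
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        left = [y for x, y in data if x <= threshold]
        right = [y for x, y in data if x > threshold]
        left_label, right_label = majority(left), majority(right)
        errors = (sum(y != left_label for y in left)
                  + sum(y != right_label for y in right))
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    return best

errors, threshold, left_label, right_label = best_split(rows)

def classify(age):
    """A one-level classification tree: a single branch on age."""
    return left_label if age <= threshold else right_label

print(f"split at age <= {threshold}: {left_label} / {right_label}, errors = {errors}")
```

A regression tree would follow the same pattern, with each leaf predicting the mean of a numeric target instead of the majority class.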

26 Some DM techniques Neural Networks –Learn through training –Resemble biological neural networks in structure –Can produce very good predictions –Not easy to use or to understand –Cannot deal with missing data

27 Some DM techniques Genetic Algorithms –Optimization techniques Genetic combinations Natural selections Concepts of evolution Etc
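
The ideas of genetic combination and natural selection can be sketched with a toy optimization problem. Everything here is an assumption made for illustration: the objective (count the 1-bits in a string), the population size, the mutation rate and the fixed random seed.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable
GENES, POP, GENERATIONS = 20, 30, 40

def fitness(bits):
    """Toy objective (assumed for illustration): number of 1-bits."""
    return sum(bits)

def crossover(a, b):
    """Genetic combination: splice two parents at a random cut point."""
    cut = rng.randrange(1, GENES)
    return a[:cut] + b[cut:]

def mutate(bits, rate=0.02):
    """Flip each bit with a small probability."""
    return [1 - g if rng.random() < rate else g for g in bits]

population = [[rng.randint(0, 1) for _ in range(GENES)] for _ in range(POP)]
history = []
for _ in range(GENERATIONS):
    # Natural selection: keep the fitter half as parents.
    population.sort(key=fitness, reverse=True)
    parents = population[: POP // 2]
    history.append(fitness(parents[0]))
    children = [mutate(crossover(rng.choice(parents), rng.choice(parents)))
                for _ in range(POP - len(parents))]
    # Parents survive unchanged, so the best score never decreases.
    population = parents + children

best = max(population, key=fitness)
print("best fitness per generation:", history)
```

In a data mining setting the bit string would typically encode something like a subset of attributes or a set of model parameters, and the fitness function would score how well that candidate predicts known cases.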

28 Some DM techniques Nearest neighbor methods –K-nearest neighbor technique –Classifies each record based on a combination of the classes of the k most similar records
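
A minimal sketch of the k-nearest neighbor idea, assuming Euclidean distance and a simple majority vote. The records and attribute names are invented for illustration.

```python
from collections import Counter
from math import dist

# Hypothetical historical records: ((income, age), class), invented for illustration.
known = [((20, 25), "low"), ((22, 30), "low"), ((25, 28), "low"),
         ((60, 45), "high"), ((65, 50), "high"), ((70, 48), "high")]

def knn_classify(point, records, k=3):
    """Label a point by majority vote among its k closest known records."""
    nearest = sorted(records, key=lambda r: dist(point, r[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

print(knn_classify((23, 27), known))
print(knn_classify((66, 47), known))
```

With real data the attributes would normally be rescaled first, since an attribute measured in large units would otherwise dominate the distance.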

29 Some DM techniques Rule induction –Extraction of if-then-else rules from data based on statistical significance
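
The sketch below extracts simple if-then rules from a single attribute. It is a simplification: rules are kept when they pass support and confidence thresholds, which stand in for a proper test of statistical significance. The records, attribute names and thresholds are all invented for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical records: (employment_status, owns_home), invented for illustration.
records = [("employed", "yes"), ("employed", "yes"), ("employed", "yes"),
           ("employed", "no"), ("unemployed", "no"), ("unemployed", "no"),
           ("unemployed", "yes"), ("retired", "yes"), ("retired", "yes")]

def induce_rules(data, min_support=2, min_confidence=0.7):
    """Return (value, outcome, support, confidence) for each rule that qualifies."""
    by_value = defaultdict(Counter)
    for value, outcome in data:
        by_value[value][outcome] += 1
    rules = []
    for value, counts in by_value.items():
        outcome, hits = counts.most_common(1)[0]
        total = sum(counts.values())
        confidence = hits / total
        if total >= min_support and confidence >= min_confidence:
            rules.append((value, outcome, total, confidence))
    return rules

rules = induce_rules(records)
for value, outcome, support, confidence in rules:
    print(f"IF status = {value} THEN owns_home = {outcome} "
          f"(support {support}, confidence {confidence:.2f})")
```

Values that do not yield a confident rule, such as "unemployed" in this toy data, would fall through to an "else" default in a complete rule set.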

30 How DM works? Modeling –Predicting the FUTURE –Build once, apply/use many times

31 How DM works? Test the validity of the model –Known cases with known data

32 Data Mining Software Numap7, freeware for fast development, validation, and application of regression-type networks including the multilayer perceptron, functional link net, piecewise linear network, self-organizing map and k-means. –

33 Data Mining Software Tiberius, MLP Neural Network for classification and regression problems. –

34 Data Mining Software Eurostat-funded research projects –SODAS – Symbolic Official Data Analysis System => ASSO –KESO – Knowledge Extraction for Statistical Offices –SPIN! – Spatial mining for data of public interest

35 Data Mining Software SAS data mining tools –Enterprise Miner and Text Miner –Applications relevant to national statistical offices –Build a model of the real world based on various data –Use the model to produce patterns –Reveal trends –Explain known outcomes –Predict future outcomes –Forecast resource demands –Identify factors to secure a desired effect –Produce new knowledge to better inform decision makers before they act –Predict new opportunities

36 Data Mining Software SAS data mining process : A framework for data mining: sample, explore, modify, model, assess Integrated models and algorithms: –Decision trees –Neural networks –Regression –Memory based reasoning –Bagging and boosting ensembles –Two-stage models –Clustering –Time series –Associations

37 Data Mining Software SPSS Clementine –Data mining workbench –Applications relevant to national statistical offices: find useful relationships in large data sets; develop predictive models; improve decision making –Modeling: Prediction and classification: neural networks, decision trees and rule induction, linear regression, logistic regression, multinomial logistic regression. Clustering and segmentation: Kohonen network, K-means, and two-step. Association detection: GRI, Apriori, and sequence. Data reduction: factor analysis and principal components analysis. Meta-modeling: combination of models

38 Data Mining Software Open source data mining – Weka (Waikato Environment for Knowledge Analysis) –Data mining software in Java –Collection of machine learning algorithms for data mining tasks: data pre-processing, classification, regression, clustering, association rules, visualization –Platforms: Linux, Windows and Macintosh –Apply directly to a dataset or call from Java code –Online documentation: tutorial, user guide, API documentation

39 References: Statistical Data Mining Tutorials; Data Mining Glossary; Mind Tools - Decision Tree Analysis; Welcome to TheDataMine; An Introduction to Data Mining - Discovering hidden value in your data warehouse; An Introduction to Data Mining; Data Mining for Official Statistics, Phan Tuan Pham (UNSD), SIAP ICT, Chiba, 7 – 9 June 2004; Wikipedia, the free encyclopaedia