David Corne, and Nick Taylor, Heriot-Watt University - These slides and related resources:

Slides:



Advertisements
Similar presentations
Data Mining in Computer Games By Adib Adam Hussain & Mohammed Sarfraz.
Advertisements

David Corne, and Nick Taylor, Heriot-Watt University - These slides and related resources:
Data Mining Glen Shih CS157B Section 1 Dr. Sin-Min Lee April 4, 2006.
Data Mining By Archana Ketkar.
Chapter 15 Data Warehousing, OLAP, and Data Mining
Data Mining and Data Warehousing – a connected view.
Chapter 1 Why & What is Data Mining? Note: Included in this Slide Set is both Chapter 1 material and additional material from the instructor.
CS157A Spring 05 Data Mining Professor Sin-Min Lee.
Data mining By Aung Oo.
Data Mining: A Closer Look
Chapter 5 Data mining : A Closer Look.
Business Intelligence
Data Mining & Data Warehousing PresentedBy: Group 4 Kirk Bishop Joe Draskovich Amber Hottenroth Brandon Lee Stephen Pesavento.
Computer Science Universiteit Maastricht Institute for Knowledge and Agent Technology Data mining and the knowledge discovery process Summer Course 2005.
TURKISH STATISTICAL INSTITUTE INFORMATION TECHNOLOGIES DEPARTMENT (Muscat, Oman) DATA MINING.
Microsoft Enterprise Consortium Data Mining Concepts Introduction: The essential background Prepared by David Douglas, University of ArkansasHosted by.
Enterprise systems infrastructure and architecture DT211 4
Data Mining By Andrie Suherman. Agenda Introduction Major Elements Steps/ Processes Tools used for data mining Advantages and Disadvantages.
Data Mining Techniques
Business Intelligence, Data Mining and Data Analytics/Predictive Analytics By: Asela Thomason IS 495 Summer 2015.
Data Mining Chun-Hung Chou
Introduction: The essential background
Spatial Statistics and Spatial Knowledge Discovery First law of geography [Tobler]: Everything is related to everything, but nearby things are more related.
Introduction to Data Mining Group Members: Karim C. El-Khazen Pascal Suria Lin Gui Philsou Lee Xiaoting Niu.
INTRODUCTION TO DATA MINING MIS2502 Data Analytics.
1 1 Slide Introduction to Data Mining and Business Intelligence.
Data Warehouse Overview September 28, 2012 presented by Terry Bilskie.
Data Mining Chapter 1 Introduction -- Basic Data Mining Tasks -- Related Concepts -- Data Mining Techniques.
I Information Systems Technology Ross Malaga 4 "Part I Understanding Information Systems Technology" Copyright © 2005 Prentice Hall, Inc. 4-1 DATABASE.
4. Secondary Data.
Database Design Part of the design process is deciding how data will be stored in the system –Conventional files (sequential, indexed,..) –Databases (database.
Data MINING Data mining is the process of extracting previously unknown, valid and actionable information from large data and then using the information.
1 Reviewing Data Warehouse Basics. Lessons 1.Reviewing Data Warehouse Basics 2.Defining the Business and Logical Models 3.Creating the Dimensional Model.
1 Categories of data Operational and very short-term decision making data Current, short-term decision making, related to financial transactions, detailed.
5 - 1 Copyright © 2006, The McGraw-Hill Companies, Inc. All rights reserved.
1 Topics about Data Warehouses What is a data warehouse? How does a data warehouse differ from a transaction processing database? What are the characteristics.
Advanced Database Course (ESED5204) Eng. Hanan Alyazji University of Palestine Software Engineering Department.
1 Categories of data Operational and very short-term decision making data Current, short-term decision making, related to financial transactions, detailed.
CRM - Data mining Perspective. Predicting Who will Buy Here are five primary issues that organizations need to address to satisfy demanding consumers:
Chapter 5: Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization DECISION SUPPORT SYSTEMS AND BUSINESS.
What is Data Mining? process of finding correlations or patterns among dozens of fields in large relational databases process of finding correlations or.
Data Mining BY JEMINI ISLAM. Data Mining Outline: What is data mining? Why use data mining? How does data mining work The process of data mining Tools.
David Corne, and Nick Taylor, Heriot-Watt University - These slides and related resources:
Business Intelligence - 2 BUS 782. Topics Data warehousing Data Mining.
Unit 2: Information Systems OCR Level 3 Cambridge Technicals in IT.
MIS2502: Data Analytics Advanced Analytics - Introduction.
An Introduction Student Name: Riaz Ahmad Program: MSIT( ) Subject: Data warehouse & Data Mining.
DATA MINING PREPARED BY RAJNIKANT MODI REFERENCE:DOUG ALEXANDER.
Data Mining and Decision Support
CHAPTER 8 DATA MINING BASICS.
1 Categories of data Operational and very short-term decision making data Current, short-term decision making, related to financial transactions, detailed.
Data Resource Management Agenda What types of data are stored by organizations? How are different types of data stored? What are the potential problems.
Data Mining Copyright KEYSOFT Solutions.
Waqas Haider Bangyal. 2 Source Materials “ Data Mining: Concepts and Techniques” by Jiawei Han & Micheline Kamber, Second Edition, Morgan Kaufmann, 2006.
Knowledge Discovery and Data Mining 19 th Meeting Course Name: Business Intelligence Year: 2009.
1 Data Warehousing Data Warehousing. 2 Objectives Definition of terms Definition of terms Reasons for information gap between information needs and availability.
Chapter 3 Building Business Intelligence Chapter 3 DATABASES AND DATA WAREHOUSES Building Business Intelligence 6/22/2016 1Management Information Systems.
David Corne, and Nick Taylor, Heriot-Watt University - These slides and related resources:
David Corne, and Nick Taylor, Heriot-Watt University - These slides and related resources:
Supplemental Chapter: Business Intelligence Information Systems Development.
Data Mining.
MIS2502: Data Analytics Advanced Analytics - Introduction
DATA MINING © Prentice Hall.
Data Mining 101 with Scikit-Learn
Supporting End-User Access
Course Introduction CSC 576: Data Mining.
Data Warehousing Data Mining Privacy
Kenneth C. Laudon & Jane P. Laudon
Welcome! Knowledge Discovery and Data Mining
Presentation transcript:

David Corne, and Nick Taylor, Heriot-Watt University - These slides and related resources: Data Mining (and machine learning) DM Lecture 1: Overview of DM, and overview of the DM part of the DM&ML module Many of these slides are highly derivative of Nick Taylor’s slides used for this module in previous years

David Corne, and Nick Taylor, Heriot-Watt University - These slides and related resources: Overview of My Lectures All at: 25/9 Overview of DM (and of these 8 lectures) 02/10: Data Cleaning - usually a necessary first step for large amounts of data 09/10 Basic Statistics for Data Miners - essential knowledge, and very useful 16/10 Basket Data/Association Rules (A Priori algorithm) - a classic algorithm, used much in industry NO THURSDAY LECTURE OCTOBER 23rd 30/10 Cluster Analysis and Clustering - simple algs that tell you much about the data NO THURSDAY LECTURE November 6th 13/11: Similarity and Correlation Measures - making sure you do clustering appropriately for the given data 20/11: Regression - the simplest algorithm for predicting data/class values 27/11: A Tour of Other Methods and their Essential Details - every important method you may learn about in future

David Corne, and Nick Taylor, Heriot-Watt University - These slides and related resources: Data Mining - Definition & Goal Definition – Data Mining is the exploration and analysis of large quantities of data in order to discover meaningful patterns and rules Goal – To permit some other goal to be achieved or performance to be improved through a better understanding of the data

David Corne, and Nick Taylor, Heriot-Watt University - These slides and related resources: Some examples of huge databases Retail basket data: much commercial DM is done with this. In one store, 18,000 baskets per month Tesco has >500 stores. Per year, 100,000,000 baskets ? The Internet ~ >15,000,000,000 pages Lots of datasets: UCI Machine Learning repository How can we begin to understand and exploit such datasets? Especially the big ones?

David Corne, and Nick Taylor, Heriot-Watt University - These slides and related resources: Like this …

David Corne, and Nick Taylor, Heriot-Watt University - These slides and related resources: and this …

David Corne, and Nick Taylor, Heriot-Watt University - These slides and related resources: and this …

David Corne, and Nick Taylor, Heriot-Watt University - These slides and related resources: or this … (see

David Corne, and Nick Taylor, Heriot-Watt University - These slides and related resources: or this … see ndemo/html/root.html

David Corne, and Nick Taylor, Heriot-Watt University - These slides and related resources: Data Mining - Basics Data Mining is the process of discovering patterns and inferring associations in raw data Data Mining is a collection of powerful techniques intended to analyse large amounts of data There is no single Data Mining approach Data Mining can employ a range of techniques, either individually or in combination with each other

David Corne, and Nick Taylor, Heriot-Watt University - These slides and related resources: Data Mining – Why is it important? Data are being generated in enormous quantities Data are being collected over long periods of time Data are being kept for long periods of time Computing power is formidable and cheap A variety of Data Mining software is available

David Corne, and Nick Taylor, Heriot-Watt University - These slides and related resources: Data Mining – History The approach has its roots over 40 years ago In the early 1960s Data Mining was called statistical analysis, and the pioneers were statistical software companies such as SPSS By the late 1980s these traditional techniques had been augmented by new methods such as machine induction, artificial neural networks, evolutionary computing, etc.

David Corne, and Nick Taylor, Heriot-Watt University - These slides and related resources:

David Corne, and Nick Taylor, Heriot-Watt University - These slides and related resources: Data Mining – Two Major Types Directed (Farming) – Attempts to explain or categorise some particular target field such as income, medical disorder, genetic characteristic, etc. Undirected (Exploring) – Attempts to find patterns or similarities among groups of records without the use of a particular target field or collection of predefined classes Compare with Supervised and Unsupervised systems in machine learning

David Corne, and Nick Taylor, Heriot-Watt University - These slides and related resources: Data Mining – Tasks Classification - Example: high risk for cancer or not Estimation - Example: household income Prediction - Example: credit card balance transfer average amount Affinity Grouping - Example: people who buy X, often also buy Y with a probability of Z Clustering - similar to classification but no predefined classes Description and Profiling – Identifying characteristics which explain behaviour - Example: “More men watch football on TV than women”

David Corne, and Nick Taylor, Heriot-Watt University - These slides and related resources: Data Warehousing Note that Data Mining is very generic and can be used for detecting patterns in almost any data – Retail data – Genomes – Climate data – Etc. Data Warehousing, on the other hand, is almost exclusively used to describe the storage of data in the commercial sector

David Corne, and Nick Taylor, Heriot-Watt University - These slides and related resources:

David Corne, and Nick Taylor, Heriot-Watt University - These slides and related resources: Data Warehousing - Definitions “ A subject-oriented, integrated, time-variant and nonvolatile collection of data in support of management's decision making process ” W. H. Inmon, "What is a Data Warehouse?" Prism Tech Topic, Vol. 1, No. 1, a very influential definition. “ A copy of transaction data, specifically structured for query and analysis ” Ralph Kimball, from his 2000 book, “The Data Warehouse Toolkit”

David Corne, and Nick Taylor, Heriot-Watt University - These slides and related resources: Data Warehouse – why? For organisational learning to take place data from many sources must be gathered together over time and organised in a consistent and useful way Data Warehousing allows an organisation to remember its data and what it has learned about its data Data Mining techniques make use of the data in a Data Warehouse and subsequently add their results to it

David Corne, and Nick Taylor, Heriot-Watt University - These slides and related resources:

David Corne, and Nick Taylor, Heriot-Watt University - These slides and related resources: Data Warehouse - Contents A Data Warehouse is a copy of transaction data specifically structured for querying, analysis and reporting The data will normally have been transformed when it was copied into the Data Warehouse The contents of a Data Warehouse, once acquired, are fixed and cannot be updated or changed later by the transaction system - but they can be added to of course

David Corne, and Nick Taylor, Heriot-Watt University - These slides and related resources: Data Marts A Data Mart is a smaller, more focused Data Warehouse – a mini-warehouse A Data Mart will normally reflect the business rules of a specific business unit within an enterprise – identifying data relevant to that unit’s acitivities

David Corne, and Nick Taylor, Heriot-Watt University - These slides and related resources: From Data Warhousing to Machine Learning, via Data Marts

David Corne, and Nick Taylor, Heriot-Watt University - These slides and related resources: The Big Challenge for Data Mining The largest challenge that a Data Miner may face is the sheer volume of data in the Data Warehouse It is very important, then, that summary data also be available to get the analysis started The sheer volume of data may mask the important relationships in which the Data Miner is interested Being able to overcome the volume and interpret the data is essential to successful Data Mining

David Corne, and Nick Taylor, Heriot-Watt University - These slides and related resources: What happens in practice … Data Miners, both “farmers” and “explorers”, are expected to utilise Data Warehouses to give guidance and answer a limitless variety of questions The value of a Data Warehouse and Data Mining lies in a new and changed appreciation of the meaning of the data There are limitations though - A Data Warehouse cannot correct problems with its data, although it may help to more clearly identify them

David Corne, and Nick Taylor, Heriot-Watt University - These slides and related resources: Which brings us to “data cleaning”, next week …