Dr. Russell Anderson Dr. Musa Jafar West Texas A&M University.

- Data mining: the process of discovering useful information in large data repositories (Tan, P.-N., Steinbach, M., and Kumar, V., Introduction to Data Mining, Addison-Wesley, 2006).
- Discovered information should be:
  - Valid
  - Previously unknown
  - Actionable

- Seven objectives of Lenox and Cuff (2002), based on the ACM 2001 Ironman report:
  - Prepare and warehouse data
  - Process data using a set of data mining (DM) algorithms
  - Analyze results
  - Make predictions
  - Select the proper algorithm
  - Apply the results
  - Be motivated to continue graduate studies in DM
- We have added:
  - Get to know the data using statistical analysis tools
  - Use visualization tools for analysis and review

1. Get to know the data.
2. Select an appropriate data mining algorithm based on the data and the mining objective.
3. Construct a model using the selected algorithm.
4. Analyze the results.
5. Apply the results.
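The five steps can be sketched end to end in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' workflow: the toy records are made up, and the OneR rule learner stands in for "an appropriate algorithm" only because it is short. The talk itself uses Microsoft BI tools.

```python
from collections import Counter, defaultdict

# Made-up toy records: (attributes, class label). Not real mushroom data.
TRAIN = [
    ({"odor": "none", "cap": "flat"}, "edible"),
    ({"odor": "none", "cap": "bell"}, "edible"),
    ({"odor": "foul", "cap": "flat"}, "poisonous"),
    ({"odor": "foul", "cap": "bell"}, "poisonous"),
    ({"odor": "almond", "cap": "flat"}, "edible"),
    ({"odor": "none", "cap": "flat"}, "poisonous"),
]
TEST = [
    ({"odor": "foul", "cap": "flat"}, "poisonous"),
    ({"odor": "none", "cap": "bell"}, "edible"),
]


def profile(rows):
    """Step 1: get to know the data (size, attributes, class balance)."""
    return {
        "observations": len(rows),
        "attributes": sorted(rows[0][0]),
        "class_counts": Counter(label for _, label in rows),
    }


def one_r(rows):
    """Steps 2-3: build a OneR model (a stand-in choice of algorithm).

    For each attribute, map every value to its majority class, then keep the
    attribute whose rules make the fewest errors on the training rows.
    """
    best = None
    for attr in rows[0][0]:
        by_value = defaultdict(Counter)
        for features, label in rows:
            by_value[features[attr]][label] += 1
        rules = {v: c.most_common(1)[0][0] for v, c in by_value.items()}
        errors = sum(sum(c.values()) - c[rules[v]] for v, c in by_value.items())
        if best is None or errors < best[2]:
            best = (attr, rules, errors)
    return best[0], best[1]


def accuracy(model, rows):
    """Step 4: analyze results on rows that were not used for training."""
    attr, rules = model
    return sum(rules.get(f[attr]) == label for f, label in rows) / len(rows)


def predict(model, features):
    """Step 5: apply the model to a new record."""
    attr, rules = model
    return rules.get(features[attr], "unknown")


model = one_r(TRAIN)
print(profile(TRAIN))
print("rule attribute:", model[0], "test accuracy:", accuracy(model, TEST))
print(predict(model, {"odor": "almond", "cap": "bell"}))
```

In a real project each step would be far larger; the point is only that the model is built on one set of records and judged on another before it is applied.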

- How is the data structured?
  - Single table / flat file
  - Multiple tables with relationships
- Number of observations
- Number of dimensions (attributes)
- Compute summary statistics using a tool such as MS Excel
- Visually evaluate characteristics of the data
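A minimal sketch of this first look at a single-table (flat-file) data set, assuming the data arrives as CSV text. The file contents and column names below are made up. It reports the number of observations and dimensions, then basic summary statistics for numeric columns and the distinct values of the others, roughly what one would compute in MS Excel.

```python
import csv
import io
import statistics

# Hypothetical flat file; column names and values are made up for illustration.
FLAT_FILE = """age,income,region
34,52000,north
45,61000,south
29,43000,north
52,75000,east
41,58000,south
"""


def summarize(text):
    """Count observations and dimensions, then describe each attribute."""
    rows = list(csv.DictReader(io.StringIO(text)))
    columns = list(rows[0].keys())
    summary = {"observations": len(rows), "dimensions": len(columns), "attributes": {}}
    for col in columns:
        values = [r[col] for r in rows]
        try:
            nums = [float(v) for v in values]
        except ValueError:
            # Non-numeric column: list its distinct values instead.
            summary["attributes"][col] = {"distinct": sorted(set(values))}
            continue
        summary["attributes"][col] = {
            "min": min(nums),
            "max": max(nums),
            "mean": statistics.mean(nums),
            "median": statistics.median(nums),
            "stdev": statistics.stdev(nums),  # sample standard deviation
        }
    return summary


summary = summarize(FLAT_FILE)
print(summary["observations"], "observations,", summary["dimensions"], "dimensions")
for name, stats in summary["attributes"].items():
    print(name, stats)
```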

- Tools developed:
  - Correlation matrix
  - Scatter plot
  - Parallel coordinate plot
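The numbers behind a correlation matrix tool can be computed as below (the scatter plot and parallel coordinate plot are visual and are not reproduced here). This sketch assumes Pearson correlation and made-up columns; the slides do not say which coefficient the authors' tool reports.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)  # raises ZeroDivisionError for a constant column


def correlation_matrix(columns):
    """Pairwise correlations for a dict of {attribute name: values}."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Made-up numeric attributes for illustration.
COLUMNS = {
    "x": [1.0, 2.0, 3.0, 4.0, 5.0],
    "y": [2.1, 3.9, 6.2, 8.0, 9.8],
    "z": [5.0, 4.0, 3.0, 2.0, 1.0],
}

matrix = correlation_matrix(COLUMNS)
print("     " + " ".join(f"{b:>6}" for b in COLUMNS))
for a in COLUMNS:
    print(f"{a:>4} " + " ".join(f"{matrix[a][b]:+6.2f}" for b in COLUMNS))
```

Values near +1 or -1 point to pairs worth inspecting in a scatter plot; a value near 0 rules out only a linear relationship, not every relationship.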

- Distributions of the data:
  - Data ranges of numeric attributes
  - Cardinality of discrete attributes
  - Shape of the distribution:
    - Skewed
    - Multimodal
  - Location of outliers
- Identification of possible relationships between attributes
- Identification of subpopulations within the data
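Several of these checks reduce to short computations. The sketch below is an illustration under its own assumptions, not the authors' tooling: it measures the range of a numeric attribute, the cardinality of a discrete one, and the moment-based skewness coefficient, and it flags outliers with Tukey's 1.5 × IQR fences (one common rule among several). Multimodality is usually judged from a histogram and is left out. All values are made up.

```python
import statistics


def value_range(xs):
    """Smallest and largest value of a numeric attribute."""
    return min(xs), max(xs)


def cardinality(values):
    """Number of distinct values of a discrete attribute."""
    return len(set(values))


def skewness(xs):
    """Fisher-Pearson moment coefficient g1 (> 0 means a long right tail)."""
    n = len(xs)
    m = sum(xs) / n
    m2 = sum((x - m) ** 2 for x in xs) / n
    m3 = sum((x - m) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5


def iqr_outliers(xs, k=1.5):
    """Values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = statistics.quantiles(xs, n=4)
    spread = q3 - q1
    return [x for x in xs if x < q1 - k * spread or x > q3 + k * spread]


# Made-up attribute values for illustration.
amounts = [10, 12, 11, 13, 12, 11, 14, 95]
colors = ["red", "green", "red", "blue", "green"]

print("range:", value_range(amounts))
print("cardinality of colors:", cardinality(colors))
print("skewness:", round(skewness(amounts), 2))
print("outliers:", iqr_outliers(amounts))
```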

- Microsoft Business Intelligence tools:
  - Association analysis (a.k.a. market basket analysis)
  - Classification:
    - Decision trees
    - Artificial neural network
    - Bayesian analysis
  - Regression
  - Cluster analysis
- Custom tools with embedded visual presentation:
  - Artificial neural network for both classification and regression
  - Self-Organizing Map (SOM) for cluster analysis
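For association analysis, the two basic measures are support and confidence. The sketch below counts them by brute force over item pairs on a few toy baskets. It is not Microsoft's Association Rules algorithm; a real implementation would use Apriori-style candidate pruning and larger itemsets. The `min_support` and `min_confidence` thresholds are arbitrary choices for the example.

```python
from collections import Counter
from itertools import combinations

# Toy transactions for illustration (not data from the talk).
BASKETS = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]


def pair_rules(baskets, min_support=0.4, min_confidence=0.6):
    """Rules X -> Y between single items, filtered by support and confidence.

    support(X, Y)    = fraction of baskets containing both X and Y
    confidence(X->Y) = count(X and Y) / count(X)
    """
    n = len(baskets)
    item_counts = Counter(item for b in baskets for item in b)
    pair_counts = Counter(
        frozenset(pair) for b in baskets for pair in combinations(sorted(b), 2)
    )
    rules = []
    for pair, count in pair_counts.items():
        if count / n < min_support:
            continue
        for x in pair:
            (y,) = pair - {x}
            confidence = count / item_counts[x]
            if confidence >= min_confidence:
                rules.append((x, y, count / n, confidence))
    return sorted(rules)


for x, y, support, confidence in pair_rules(BASKETS):
    print(f"{x} -> {y}: support {support:.2f}, confidence {confidence:.2f}")
```

Note that confidence is not symmetric: in these baskets every basket with beer also has diapers, but not the other way round.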

- Purpose of each methodology
- Steps of the underlying algorithm
- Data types supported
- Issues in construction and application
- Parameter settings
- Interpretation of results
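As one example of "steps of the underlying algorithm", a decision tree is typically grown top-down by repeatedly choosing the attribute that best separates the classes. The sketch below uses the entropy/information-gain criterion from ID3-style trees on made-up rows. Microsoft's decision tree implementation may score splits differently, so treat this as an illustration of the idea rather than that tool's method.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows, attr):
    """Entropy reduction from splitting rows on one discrete attribute."""
    groups = {}
    for features, label in rows:
        groups.setdefault(features[attr], []).append(label)
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([label for _, label in rows]) - remainder


def best_split(rows):
    """Pick the attribute with the highest information gain for the next node."""
    return max(rows[0][0], key=lambda attr: information_gain(rows, attr))


# Made-up training rows: odor separates the classes, cap shape does not.
ROWS = [
    ({"odor": "none", "cap": "flat"}, "edible"),
    ({"odor": "none", "cap": "bell"}, "edible"),
    ({"odor": "foul", "cap": "flat"}, "poisonous"),
    ({"odor": "foul", "cap": "bell"}, "poisonous"),
]

for attr in ("odor", "cap"):
    print(attr, information_gain(ROWS, attr))
print("split on:", best_split(ROWS))
```

A full tree builder would apply `best_split` recursively to each resulting subset until the nodes are pure or a stopping parameter is reached.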

- Does the model fit the training data too well (overfitting)?
- The available data must be separated into training and validation subsets.
- A visual view of training progress is valuable.
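A minimal sketch of the training/validation separation and of why it matters, assuming the records are independent and a random 70/30 split is acceptable (both the fraction and the seed are arbitrary). The "model" is a deliberately overfit lookup table: it scores perfectly on the data it memorized, while on the held-out subset it can only guess the majority class. That gap is what a validation set exposes.

```python
import random


def train_validation_split(records, validation_fraction=0.3, seed=42):
    """Shuffle a copy of the records and hold out a validation subset."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_val = round(len(shuffled) * validation_fraction)
    return shuffled[n_val:], shuffled[:n_val]


def memorizer(train):
    """A deliberately overfit 'model' that memorizes every training record."""
    table = dict(train)
    labels = [y for _, y in train]
    majority = max(set(labels), key=labels.count)
    return lambda x: table.get(x, majority)


def accuracy(model, rows):
    """Fraction of rows whose label the model reproduces."""
    return sum(model(x) == y for x, y in rows) / len(rows)


# Made-up records: (record id, random class label); the labels carry no signal.
rng = random.Random(0)
records = [(i, rng.choice(["edible", "poisonous"])) for i in range(100)]
train, validation = train_validation_split(records)
model = memorizer(train)
print("training accuracy:  ", accuracy(model, train))
print("validation accuracy:", accuracy(model, validation))
```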

Mushroom edibility classifiers

Classifier A

| Predicted \ Actual | Edible | Poisonous |
|---|---|---|
| Edible | 38% | 0% |
| Poisonous | 8% | 54% |

Classifier B

| Predicted \ Actual | Edible | Poisonous |
|---|---|---|
| Edible | 44% | 1% |
| Poisonous | 2% | 53% |
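The two confusion matrices can be compared with a few lines of arithmetic. Using the slide's percentages, Classifier B is more accurate overall (97% vs. 92%), but Classifier A never predicts that a poisonous mushroom is edible, while B does so for about 1.9% of poisonous cases (1% out of 54%). Which matters more depends on the cost of each error; the sketch only computes the numbers.

```python
# Confusion matrices transcribed from the slide, as percentages.
# Layout: MATRICES[classifier][predicted][actual]
MATRICES = {
    "A": {"edible": {"edible": 38, "poisonous": 0},
          "poisonous": {"edible": 8, "poisonous": 54}},
    "B": {"edible": {"edible": 44, "poisonous": 1},
          "poisonous": {"edible": 2, "poisonous": 53}},
}


def accuracy(m):
    """Share of all cases on the diagonal (prediction matches actual class)."""
    total = sum(v for row in m.values() for v in row.values())
    return (m["edible"]["edible"] + m["poisonous"]["poisonous"]) / total


def poisonous_called_edible(m):
    """Share of actually poisonous mushrooms that the classifier calls edible."""
    actual_poisonous = m["edible"]["poisonous"] + m["poisonous"]["poisonous"]
    return m["edible"]["poisonous"] / actual_poisonous


for name, m in MATRICES.items():
    print(f"{name}: accuracy {accuracy(m):.0%}, "
          f"poisonous called edible {poisonous_called_edible(m):.1%}")
```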

- Black box: models built with sophisticated methodologies (ANNs, for example) can perform very well, but understanding the model itself is difficult, in particular:
  - Contribution of individual input attributes
  - Nature of each contribution (shape of the curve)
  - Interactions between input attributes
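One common way to look inside a black box is one-at-a-time sensitivity analysis: sweep one input over a grid while holding the others fixed and record the output. The sketch below assumes a made-up two-input function standing in for a trained ANN; the slides do not say how the authors' custom tools do this. The output range over the sweep gives a rough size of an input's contribution, the list of outputs traces the shape of the curve, and curves that change shape as `x2` changes indicate an interaction.

```python
import math


def black_box(x1, x2):
    """Hypothetical stand-in for an opaque trained model (made-up weights)."""
    hidden = math.tanh(1.5 * x1 - 0.5 * x2) + math.tanh(0.8 * x1 * x2)
    return 1.0 / (1.0 + math.exp(-hidden))


def response_curve(model, grid, x2_fixed):
    """Sweep input x1 over a grid with x2 held fixed; returns model outputs."""
    return [model(x1, x2_fixed) for x1 in grid]


grid = [i / 2 for i in range(-4, 5)]  # x1 = -2.0, -1.5, ..., 2.0
for x2 in (-1.0, 0.0, 1.0):
    curve = response_curve(black_box, grid, x2)
    # Output range over the sweep: a rough size of x1's contribution.
    print(f"x2={x2:+.0f}  range={max(curve) - min(curve):.3f}  "
          + " ".join(f"{y:.2f}" for y in curve))
```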

- For a detailed presentation of the mechanics of the software deployed, attend our workshop tomorrow morning:
  - Saturday, 8-10 AM
  - Kachina A
  - Microsoft SQL Server Business Intelligence Studio
  - Visualization tools