C.-C. Chan Department of Computer Science University of Akron Akron, OH 44325-4003 USA 1 UA Faculty Forum 2008 by C.-C. Chan.

Slides:



Advertisements
Similar presentations
An Introduction to Data Mining
Advertisements

Prof. Carolina Ruiz Computer Science Department Bioinformatics and Computational Biology Program WPI WELCOME TO BCB4003/CS4803 BCB503/CS583 BIOLOGICAL.
Data Mining Sangeeta Devadiga CS 157B, Spring 2007.
DATA MINING CS157A Swathi Rangan. A Brief History of Data Mining The term “Data Mining” was only introduced in the 1990s. Data Mining roots are traced.
1 ACCTG 6910 Building Enterprise & Business Intelligence Systems (e.bis) Introduction to Data Mining Olivia R. Liu Sheng, Ph.D. Emma Eccles Jones Presidential.
© Prentice Hall1 DATA MINING TECHNIQUES Introductory and Advanced Topics Eamonn Keogh (some slides adapted from) Margaret Dunham Dr. M.H.Dunham, Data Mining,
Department of Computer Science, University of Waikato, New Zealand Eibe Frank WEKA: A Machine Learning Toolkit The Explorer Classification and Regression.
SLIDE 1IS 257 – Fall 2008 Data Mining and the Weka Toolkit University of California, Berkeley School of Information IS 257: Database Management.
Introduction to WEKA Aaron 2/13/2009. Contents Introduction to weka Download and install weka Basic use of weka Weka API Survey.
Data Mining Concepts 1.1 COT5230 Data Mining Week 1 Data Mining Concepts M O N A S H A U S T R A L I A ’ S I N T E R N A T I O N A L U N I V E R S I T.
Data Mining – Intro.
CS157A Spring 05 Data Mining Professor Sin-Min Lee.
Data mining By Aung Oo.
Pattern Recognition Lecture 20: Data Mining 3 Dr. Richard Spillman Pacific Lutheran University.
Data Mining CS 157B Section 2 Keng Teng Lao. Overview Definition of Data Mining Application of Data Mining.
GUHA method in Data Mining Esko Turunen Tampere University of Technology Tampere, Finland.
Data Mining By Andrie Suherman. Agenda Introduction Major Elements Steps/ Processes Tools used for data mining Advantages and Disadvantages.
Data Mining: Concepts & Techniques. Motivation: Necessity is the Mother of Invention Data explosion problem –Automated data collection tools and mature.
1 © Goharian & Grossman 2003 Introduction to Data Mining (CS 422) Fall 2010.
OLAM and Data Mining: Concepts and Techniques. Introduction Data explosion problem: –Automated data collection tools and mature database technology lead.
Data Mining Techniques
Shilpa Seth.  What is Data Mining What is Data Mining  Applications of Data Mining Applications of Data Mining  KDD Process KDD Process  Architecture.
Data Mining. 2 Models Created by Data Mining Linear Equations Rules Clusters Graphs Tree Structures Recurrent Patterns.
Data Mining Dr. Chang Liu. What is Data Mining Data mining has been known by many different terms Data mining has been known by many different terms Knowledge.
Kansas State University Department of Computing and Information Sciences CIS 830: Advanced Topics in Artificial Intelligence From Data Mining To Knowledge.
Data Mining Chun-Hung Chou
Tang: Introduction to Data Mining (with modification by Ch. Eick) I: Introduction to Data Mining A.Short Preview 1.Initial Definition of Data Mining 2.Motivation.
Copyright R. Weber Machine Learning, Data Mining ISYS370 Dr. R. Weber.
Chapter 1 Introduction to Data Mining
INTRODUCTION TO DATA MINING MIS2502 Data Analytics.
Knowledge Discovery and Data Mining Evgueni Smirnov.
Data Mining Chapter 1 Introduction -- Basic Data Mining Tasks -- Related Concepts -- Data Mining Techniques.
Knowledge Discovery and Data Mining Evgueni Smirnov.
 Fundamentally, data mining is about processing data and identifying patterns and trends in that information so that you can decide or judge.  Data.
Data Mining By Dave Maung.
Data Mining – Intro. Course Overview Spatial Databases Temporal and Spatio-Temporal Databases Multimedia Databases Data Mining.
1 Improving quality of graduate students by data mining Asst. Prof. Kitsana Waiyamai, Ph.D. Dept. of Computer Engineering Faculty of Engineering, Kasetsart.
CS157B Fall 04 Introduction to Data Mining Chapter 22.3 Professor Lee Yu, Jianji (Joseph)
6.1 © 2010 by Prentice Hall 6 Chapter Foundations of Business Intelligence: Databases and Information Management.
Advanced Database Course (ESED5204) Eng. Hanan Alyazji University of Palestine Software Engineering Department.
Chapter 5: Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization DECISION SUPPORT SYSTEMS AND BUSINESS.
Outline Knowledge discovery in databases. Data warehousing. Data mining. Different types of data mining. The Apriori algorithm for generating association.
Data Mining BY JEMINI ISLAM. Data Mining Outline: What is data mining? Why use data mining? How does data mining work The process of data mining Tools.
1 Introduction to Data Mining C hapter 1. 2 Chapter 1 Outline Chapter 1 Outline – Background –Information is Power –Knowledge is Power –Data Mining.
Introduction to Data Mining by Yen-Hsien Lee Department of Information Management College of Management National Sun Yat-Sen University March 4, 2003.
MIS2502: Data Analytics Advanced Analytics - Introduction.
Data Mining and Decision Support
Web Analytics Xuejiao Liu INF 385F: WIRED Fall 2004.
Chapter 26: Data Mining Prepared by Assoc. Professor Bela Stantic.
Copyright © 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 28 Data Mining Concepts.
Ahmed K. Ezzat, SQL Server 2008 and Data Mining Overview 1 Data Mining and Big Data.
Data Resource Management – MGMT An overview of where we are right now SQL Developer OLAP CUBE 1 Sales Cube Data Warehouse Denormalized Historical.
Data Mining: Confluence of Multiple Disciplines Data Mining Database Systems Statistics Other Disciplines Algorithm Machine Learning Visualization.
Data Mining – Intro.
By Arijit Chatterjee Dr
MIS2502: Data Analytics Advanced Analytics - Introduction
DATA MINING © Prentice Hall.
MIS 451 Building Business Intelligence Systems
Data Mining 101 with Scikit-Learn
Datamining : Refers to extracting or mining knowledge from large amounts of data Applications : Market Analysis Fraud Detection Customer Retention Production.
Machine Learning Ali Ghodsi Department of Statistics
Waikato Environment for Knowledge Analysis
Data Mining: Concepts and Techniques Course Outline
MIS5101: Data Analytics Advanced Analytics - Introduction
Sangeeta Devadiga CS 157B, Spring 2007
כריית מידע -- מבוא ד"ר אבי רוזנפלד.
Data Warehousing and Data Mining
I don’t need a title slide for a lecture
Machine Learning with Weka
Data Warehousing Data Mining Privacy
Presentation transcript:

C.-C. Chan Department of Computer Science University of Akron Akron, OH USA 1 UA Faculty Forum 2008 by C.-C. Chan

Outline Overview of Data Mining Software Tools A Rule-Based System for Data Mining Concluding Remarks 2 UA Faculty Forum 2008 by C.-C. Chan

Data Mining (KDD) From Data to Knowledge Process of KDD (Knowledge Discovery in Databases) Related Technologies Comparisons 3 UA Faculty Forum 2008 by C.-C. Chan

Why KDD? We are drowning in information, but starving for knowledge  John Naisbett Growing Gap between Data Generation and Data Understanding: Automation of business activities: Telephone calls, credit card charges, medical tests, etc. Earth observation satellites: Estimated will generate one terabyte (10 15 bytes) of data per day. At a rate of one picture per second. Biology: Human Genome database project has collected over gigabytes of data on the human genetic code [Fasman, Cuticchia, Kingsbury, 1994.] US Census data: NASA databases: … World Wide Web: 4 UA Faculty Forum 2008 by C.-C. Chan

Process of KDD 5 [1] Fayyad, U., Editorial, Int. J. of Data Mining and Knowledge Discovery, Vol.1, Issue 1, [2] Fayyad, U., G. Piatetsky-Shapiro, and P. Smyth, "From data mining to knowledge discovery: an overview," in Advances in Knowledge Discovery and Data Mining, Fayyad et al (Eds.), MIT Press, UA Faculty Forum 2008 by C.-C. Chan

Process of KDD 1. Selection  Learning the application domain  Creating a target dataset 2. Pre-Processing  Data cleaning and preprocessing 3. Transformation  Data reduction and projection 4. Data Mining  Choosing the functions and algorithms of data mining  Association rules, classification rules, clustering rules 5. Interpretation and Evaluation  Validate and verify discovered patterns 6. Using discovered knowledge 6 UA Faculty Forum 2008 by C.-C. Chan

Typical Data Mining Tasks Finding Association Rules [Rakesh Agrawal et al, 1993] Each transaction is a set of items. Given a set of transactions, an association rule is of the form X  Y where X and Y are sets of items. e.g.: 30% of transactions that contain beer also contain diapers; 2% of all transactions contain both of these items. Applications: Market basket analysis and cross-marketing Catalog design Store layout Buying patterns 7 UA Faculty Forum 2008 by C.-C. Chan

Finding Sequential Patterns Each data sequence is a list of transactions. Find all sequential patterns with a user-specified minimum support. e.g.: Consider a book-club database A sequential pattern might be 5% of customers bought “Harry Potter I”, then “Harry Potter II”, and then “Harry Potter III”. Applications: Add-on sales Customer satisfaction Identify symptoms/diseases that precede certain diseases 8 UA Faculty Forum 2008 by C.-C. Chan

Finding Classification Rules Finding discriminant rules for objects of different classes. Approaches: Finding Decision Trees Finding Production Rules Applications: Process loans and credit cards applications Model identification 9 UA Faculty Forum 2008 by C.-C. Chan

Text Mining Web Usage Mining Etc. 10 UA Faculty Forum 2008 by C.-C. Chan

Related Technologies Database Systems MS SQL server Transaction databases OLAP (Data Cubes) Data Mining Decision Trees Clustering Tools Machine Learning/Data Mining Systems CART (Classification And Regression Trees) C 5.x (Decision Trees) WEKA (Waikato Environment for Knowledge Analysis) LERS ROSE 2 Rule-Based Expert System Development Environments CLIPS, JESS EXSYS Web-based Platforms Java MS.Net 11 UA Faculty Forum 2008 by C.-C. Chan

12 Pre- Processing Learning Data Mining Inference Engine End-User Interface Web-Based Access Reasoning with Uncertainties MS SQL Server N/ADecision Trees Clustering N/A CART C 5.x N/ADecision TreesBuilt-inEmbeddedN/A WEKAYesTrees, Rules, Clustering, Association N/AEmbeddedNeed Programming N/A CLIPS JESS N/A Built-inEmbeddedNeed Programming 3 rd parties Extensions Comparisons UA Faculty Forum 2008 by C.-C. Chan

Rule-Based Data Mining System Objectives Develop an integrated rule-based data mining system provides Synergy of database systems, machine learning, and expert systems Dealing with uncertain rules Delivery of web-based user interface 13 UA Faculty Forum 2008 by C.-C. Chan

Structure of Rule-Based Systems 14 UA Faculty Forum 2008 by C.-C. Chan

15 System Workflow Input Data Set Data Pre- processing Rule Generator User Interface Generator UA Faculty Forum 2008 by C.-C. Chan

16 Input Data Set: Text file with comma separated values (CSV) It is assumed that there are N columns of values corresponding to N variables or parameters, which may be real or symbolic values. The first N – 1 variables are considered as inputs and the last one is the output variable. Data Preprocessing: Discretize domains of real variables into a finite number of intervals Discretized data file is then used to generate an attribute information file and a training data file. Rule Generator: A symbolic learning program called BLEM2 is used to generate rules with uncertainty User Interface Generator: Generate a web-based rule-based system from a rule file and corresponding attribute file UA Faculty Forum 2008 by C.-C. Chan

Architecture of RBC generator 17 Requests Middle Tier Client Responses SQL DB server Workflow of RBC generator Rule set File Metadata File SQL Rule Table Rule Table Definition RBC Generator UA Faculty Forum 2008 by C.-C. Chan

Concluding Remarks A system for generating rule-based classifier from data with the following benefits: No need of end user programming Automatic rule-based system creation Delivery system is web-based provides easy access 18 UA Faculty Forum 2008 by C.-C. Chan

Project Status The current version 1.4 of our system provides fundamental features for data mining from data including: Data Preprocessing Management of preprocessed data files Machine Learning tool to generate rules from data Rule-Based Classifier system supporting uncertain rules Web-Based access 19 UA Faculty Forum 2008 by C.-C. Chan

Future Work More advanced features in Data Preprocessing such as data cleansing, data transformation, and data statistics Learning from multi-criteria inputs with preferential rankings to support Multiple Criteria Decision Making processes Concept-Oriented information retrieval and search 20 UA Faculty Forum 2008 by C.-C. Chan

Thank You! 21 UA Faculty Forum 2008 by C.-C. Chan