The Mining Mart Approach 1.The process of knowledge discovery and its common practice 2.Supporting the re-use of successful knowledge discovery cases Supporting.

Slides:



Advertisements
Similar presentations
Dr. Leo Obrst MITRE Information Semantics Information Discovery & Understanding Command & Control Center February 6, 2014February 6, 2014February 6, 2014.
Advertisements

Database Management3-1 L3 Database Management Santa R. Susarapu Ph.D. Student Virginia Commonwealth University.
KDDML: A Middleware Language and System for Knowledge Discovery in Databases Dipartimento di Informatica, Università di Pisa A. Romei, S. Ruggieri, F.
Chapter 2 Data Models Database Systems: Design, Implementation, and Management, Seventh Edition, Rob and Coronel.
Oct 31, 2000Database Management -- Fall R. Larson Database Management: Introduction to Terms and Concepts University of California, Berkeley School.
Chapter 3 Database Management
Fundamentals of Information Systems, Second Edition 1 Organizing Data and Information Chapter 3.
“DOK 322 DBMS” Y.T. Database Design Hacettepe University Department of Information Management DOK 322: Database Management Systems.
Chapter 9 Database Design
Data Mining – Intro.
Major Tasks in Data Preprocessing(Ref Chap 3) By Prof. Muhammad Amir Alam.
Oracle Developer Tools for Visual Studio.NET Curtis Rempe.
MDC Open Information Model West Virginia University CS486 Presentation Feb 18, 2000 Lijian Liu (OIM:
Chapter 5 Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization.
2 1 Chapter 2 Data Model Database Systems: Design, Implementation, and Management, Sixth Edition, Rob and Coronel.
Understanding Data Analytics and Data Mining Introduction.
© D. Wong 2002 © D. Wong CS610 / CS710 Database Systems I Daisy Wong.
Chapter 5 Lecture 2. Principles of Information Systems2 Objectives Understand Data definition language (DDL) and data dictionary Learn about popular DBMSs.
Data Management Turban, Aronson, and Liang Decision Support Systems and Intelligent Systems, Seventh Edition.
Ihr Logo Chapter 5 Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization Turban, Aronson, and Liang.
Copyright R. Weber Machine Learning, Data Mining ISYS370 Dr. R. Weber.
©Ian Sommerville 2000 Software Engineering, 6th edition. Chapter 6 Slide 1 Requirements Engineering Processes l Processes used to discover, analyse and.
CSCI 3140 Module 2 – Conceptual Database Design Theodore Chiasson Dalhousie University.
Data Mining Process A manifestation of best practices A systematic way to conduct DM projects Different groups has different versions Most common standard.
Maintenance and Support Week 15 CMIS570. User Training Need to consider the same 2 groups: End users Use the system to achieve the business purpose Creating,
11 CORE Architecture Mauro Bruno, Monica Scannapieco, Carlo Vaccari, Giulia Vaste Antonino Virgillito, Diego Zardetto (Istat)
Chapter 3 DECISION SUPPORT SYSTEMS CONCEPTS, METHODOLOGIES, AND TECHNOLOGIES: AN OVERVIEW Study sub-sections: , 3.12(p )
What is a schema ? Schema is a collection of Database Objects. Schema Objects are logical structures created by users to contain, or reference, their data.
Database Design and Management CPTG /23/2015Chapter 12 of 38 Functions of a Database Store data Store data School: student records, class schedules,
Data Validation OPEN Development Conference September 19, 2008 Sushmita De Systems Analyst.
Ihr Logo Chapter 5 Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization Turban, Aronson, and Liang.
Preprocessing for Data Mining Vikram Pudi IIIT Hyderabad.
Unit-1 Introduction Prepared by: Prof. Harish I Rathod
Overview of MOT Knowledge representation system : Basic Modeling Editor LexiconGrammarSemantics Pragmatics MOT Editor.
Advanced Database Course (ESED5204) Eng. Hanan Alyazji University of Palestine Software Engineering Department.
1 Intelligent Choices Preceeding Data Analysis Katharina Morik Univ. Dortmund, www-ai.cs.uni-dortmund.de Knowledge Discovery in Databases (KDD) The Mining.
Copyright © 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 2 Database System Concepts and Architecture.
BOĞAZİÇİ UNIVERSITY DEPARTMENT OF MANAGEMENT INFORMATION SYSTEMS MATLAB AS A DATA MINING ENVIRONMENT.
Foundations of Business Intelligence: Databases and Information Management.
Telecommunication Case Modelling – Call Center C.Chudzian, J.Granat, W.Traczyk Decision Support Systems Laboratory National Institute of Telecommunications.
Workforce Scheduling Release 5.0 for Windows Implementation Overview OWS Development Team.
© 2003 Prentice Hall, Inc.3-1 Chapter 3 Database Management Information Systems Today Leonard Jessup and Joseph Valacich.
Algorithms Emerging IT Fall 2015 Team 1 Avenbaum, Hamilton, Mirilla, Pisano.
Requirements engineering The process of establishing the services that the customer requires from a system and the constraints under which it operates.
Entity Relationship Diagram (ERD). Objectives Define terms related to entity relationship modeling, including entity, entity instance, attribute, relationship.
Data Mining Copyright KEYSOFT Solutions.
Welcome: To the fifth learning sequence “ Data Models “ Recap : In the previous learning sequence, we discussed The Database concepts. Present learning:
ENTITY RELATIONSHIP DIAGRAM. Objectives Define terms related to entity relationship modeling, including entity, entity instances, attribute, relationship.
Building the Corporate Data Warehouse Pindaro Demertzoglou Lally School of Management Data Resource Management.
 The processes used for RE vary widely depending on the application domain, the people involved and the organisation developing the requirements.  However,
1 Middle East Users Group 2008 Self-Service Engine & Process Rules Engine Presented by: Ryan Flemming Friday 11th at 9am - 9:45 am.
THE LEONS COLLEGE OF LAW1 Organizing Data and Information Chapter 4.
Tools Of Structured Analysis
Model Discovery through Metalearning
Definition CASE tools are software systems that are intended to provide automated support for routine activities in the software process such as editing.
Data Virtualization Tutorial: Introduction to SQL Script
Fundamentals & Ethics of Information Systems IS 201
DATA MODELS.
课程名 编译原理 Compiling Techniques
Waikato Environment for Knowledge Analysis
Copyright © 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 2 Database System Concepts and Architecture.
COSC 6340 Projects & Homeworks Spring 2002
Data, Databases, and DBMSs
Machine Learning with Weka
2. An overview of SDMX (What is SDMX? Part I)
Course: Module: Lesson # & Name Instructional Material 1 of 32 Lesson Delivery Mode: Lesson Duration: Document Name: 1. Professional Diploma in ERP Systems.
Database Design Hacettepe University
DATA MODELS.
Chapter 3 Database Management
Semi-Automatic Data-Driven Ontology Construction System
Presentation transcript:

The Mining Mart Approach 1.The process of knowledge discovery and its common practice 2.Supporting the re-use of successful knowledge discovery cases Supporting pre-processing Meta-data for concepts, data, and cases Documenting and adapting a case Compiling meta-data into SQL – executing a case 3.System demonstration 4.Summary

CRISP-DM Process Model Business understanding Data understanding Data preparation Modeling EvaluationDeployment

Common Practice database manual pre-processing: tedious time consuming not re-useable no documentation low level operations drawbacks ?

Without Mining Mart Pre-processing is not supported by the tools. –80 % of the efforts in a knowledge discovery application are invested during pre-processing. –Pre-processing enhances data – better data deliver better data mining results. Documentation of pre-processing is missing. –Similar procedures are performed over and over again. –Experience is not passed over to new employees. Operators do not access the database directly, but can only handle an excerpt.

Using Mining Mart M4-Concept Editor M4-Case Editor M4-Conceptual Model M4-Case Model Conceptual Model (Shops, items, sales...) Abstract case (Selection of shops, items, Running the support vector machine...) Linking business data and conceptual model, Compile the case and see the results! M4-Relational Model

Mining Mart Users The database administrator delivers the relational data model. The data analyst –acquires the conceptual model from the end-user (decision maker), –develops (adapts) the case, –links relational and conceptual model, –runs the case and delivers the results to the end-user.

The Meta Model for Meta Data The Relational Model describes the database The Conceptual Model describes the individuals and classes of the domain with their relations The Case Model describes chains of preprocessing operators The Execution Model generates SQL statements or calls to external tools

The Conceptual Model Concept –Attributes: name, subConceptRestriction –Associations: isA, correspondsToColumnSet, FromConcept, ToConcept, Constraints Relationship FeatureAttribute Value RoleRestriction DomainDataType

The Case Model Case –Attributes: name, Case mode {test, final} caseInput – list of entities from the conceptual model caseOutput – concept, typically the input to data mining step Documentation – free text –Associations: listOfSteps Population – the concept of interest in this case targetAttributes – FeatureAttribute to which the data analysis is applied

Documentation The case model documents the sequence of steps that have led to a good data mining result. For each step, the input, output, and parameter settings are stored. Since steps refer to concepts, the case model can be understood even by non-experts.

Steps and operators Step –Attributes:name –Associations: belongsToCase, embedsOperator, predecessor, successor Operator –Attributes (binary): manual, Loopable – apply operator several times with changed parameters Multi-step – operator delivers several results which will be processed in parallel –Associations: all input to a step (parameters) Conditions – to be checked given the data Constraints – to be checked without access to data Assertions – will be true after operator execution Validity of operator chains are checked, unnecessary database scans are avoided!

Manual Operators Operator FeatureSelectionMultirelational Feature Construction RowSelection GroupingScaling MissingValues Discretization Linear Scaling Log Scaling SelectCases DeleteRecords With MissingValues... Chaining Propositionalisation...

Time Operators TimeOperator Signal2 Symbol movingFunctionWindowing EMA SMA WMA

Learning Operators Classification Regression Clustering Associations Subgroup discovery SVM_light decisionTree MySVM k-means Apriori Midos Learning operators are not only good for the data mining step! Example: C4.5 for discretisation or prediction of missing values. NEU

Supporting Pre-processing The operators are implemented – users just select them. Most operators directly access the database. Intermediate results can be inspected. The system is open for the integration of further operators: –Store the SQL implementation –Store the meta-data within the M4 tables.

Meta-data Meta-model and meta-data are stored in the database. Used –in order to verify applicability conditions –in order to avoid unnecessary steps –by the compiler –by the GUI

The Internet Case Base M4 Schema Instance

Demo

The Concept Editor Define and edit concepts and relations Mapping from concepts to relations of the database.

The Case Editor Tree View Chain Editor

Setting up an SVM Step

Summary Mining Mart eases pre-processing: –Many operators are implemented in the database. –Validity and necessity of operator execution is checked. Mining Mart documents cases of successful data mining. These can be used as blueprints and easily be adapted to similar data. Meta-data are made operational by the compiler.

Mining Mart Partners Univ. Dortmund, Univ. Piemonte del Avogadro (DISTA), Univ. Economics Prague, Perot Systems Netherland, Fraunhofer Gesellschaft (AIS), SwissLife, Telecom Italia Laboratory, National Institute of Telecommunication Warsaw

You may use the Mining Mart system. You may contribute to the public case base. Only conceptual and case model, please. www-ai.cs.uni-dortmund.de/MMWEB/index.html