
KDDML: A Middleware Language and System for Knowledge Discovery in Databases
A. Romei, S. Ruggieri, F. Turini
Dipartimento di Informatica, Università di Pisa
Thirteenth Italian Symposium on Sistemi Evoluti per Basi di Dati (SEBD-2005), Brixen, Italy, June 2005

SEBD Brixen, June 2005 Application Area: KDD Knowledge Discovery in Databases (KDD) is the non-trivial process of identifying valid, novel, potentially useful, understandable patterns in data.

The CRISP-DM process
Main focus on the automatic phases: data pre-processing, modeling, post-processing, model evaluation.

In this work
KDDML: an XML-based middleware language and system in support of the KDD process.
KDDML as a language.
KDDML as a system.

Requirements
R1: a data/model repository should be available for storing the input, output and intermediate objects of the KDD process. Several representations of data can be available, with automatic format conversions and automatic meta-data mapping (e.g., ARFF, SQL).
R2: logical meta-data (meta-model) should be specified in addition to the physical data (model).
R3: compositionality of mining operations in the design of the language (closure principle).
R4: high extensibility of the system architecture.

KDDML as an XML-based System
XML as data/model representation (R1, R2): a machine-processable language.
XML as language definition: ensures compositionality of operators (R3).
Extensibility and modularity (R4).

Data/Model Representation

Data Format
Separating the logical data from the physical instances.
The data schema is expressed in a proprietary XML format.
The actual data is stored in CSV (Comma Separated Values).
CSV was chosen as a trade-off between readability (poor in a binary file) and space occupation (high in XML).
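The split between logical metadata and physical data can be illustrated with a minimal Python sketch. The schema layout used here (`schema` and `attribute` elements with `name` and `type`) is a hypothetical stand-in, not the proprietary KDDML schema, and the sample rows are made up for illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical schema layout for illustration only; NOT the actual KDDML format.
SCHEMA_XML = """<schema name="weather">
  <attribute name="outlook" type="nominal"/>
  <attribute name="temperature" type="numeric"/>
  <attribute name="play" type="nominal"/>
</schema>"""

# Physical data: plain CSV with no header; the schema supplies names and types.
CSV_DATA = "sunny,85,no\novercast,83,yes\nrainy,70,yes\n"

# Map each logical type to a Python converter.
CONVERTERS = {"numeric": float, "nominal": str}


def load_table(schema_xml, csv_text):
    """Combine the logical schema with the physical CSV rows."""
    root = ET.fromstring(schema_xml)
    columns = [
        (attr.get("name"), CONVERTERS[attr.get("type")])
        for attr in root.findall("attribute")
    ]
    rows = []
    for record in csv.reader(io.StringIO(csv_text)):
        if len(record) != len(columns):
            raise ValueError(f"expected {len(columns)} fields, got {len(record)}")
        rows.append(
            {name: convert(value) for (name, convert), value in zip(columns, record)}
        )
    return rows


rows = load_table(SCHEMA_XML, CSV_DATA)
print(rows[0])
```

Keeping the types in the schema means the CSV file stays compact, while the same file could be reinterpreted under a different schema without being rewritten.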

Data Format: Example
[XML listing not recoverable; its two parts were labelled "Logical Metadata" and "Physical Data".]

Model Format
PMML (Predictive Model Markup Language): an industry standard for representing actual models as XML documents.
It consists of DTDs for a wide spectrum of models, including association rules (RdA), decision trees, clustering, regression and neural networks.
It does not cover the process of extracting models, only the exchange of extracted knowledge.

Model Format: Example
[XML listing not recoverable; its two parts were labelled "Logical Metadata" and "Physical Model".]

Language

Closure Principle (1)
The arguments of an operator must be of an appropriate type and in the appropriate sequence.
We express the signature of an operator $op : t_1 \times \dots \times t_n \to t$ by defining a DTD for KDDML queries that constrains its sub-elements to be of types $t_1, \dots, t_n$.

Closure Principle (2)
$f_{\mathrm{TREE\_CLASSIFY}} : tree \times table \to table$
<!ELEMENT TREE_CLASSIFY ((%kdd_query_trees;), (%kdd_query_table;))>
Where:
kdd_query_trees: all operators returning a classification tree;
kdd_query_table: all operators returning a table;
TREE_CLASSIFY belongs to the kdd_query_table entity.
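The effect of the closure principle can be sketched as a small type checker over a query tree. This is only an illustration in Python, not how KDDML enforces it (the slides state that a DTD does that job). Only the TREE_CLASSIFY signature comes from the slides; the `TREE_MINER` and `TABLE_LOADER` names and signatures are assumptions.

```python
import xml.etree.ElementTree as ET

# operator -> (argument types, result type).
# Only TREE_CLASSIFY is taken from the slides; the others are hypothetical.
SIGNATURES = {
    "TREE_CLASSIFY": (("tree", "table"), "table"),
    "TREE_MINER": (("table",), "tree"),
    "TABLE_LOADER": ((), "table"),
}


def result_type(element):
    """Return the type produced by a query, or raise TypeError if ill-typed."""
    if element.tag not in SIGNATURES:
        raise TypeError(f"unknown operator {element.tag}")
    expected, result = SIGNATURES[element.tag]
    # Any operator may appear as an argument as long as it returns the right type.
    actual = tuple(result_type(child) for child in element)
    if actual != expected:
        raise TypeError(f"{element.tag} expects {expected}, got {actual}")
    return result


well_typed = ET.fromstring(
    "<TREE_CLASSIFY>"
    "<TREE_MINER><TABLE_LOADER/></TREE_MINER>"
    "<TABLE_LOADER/>"
    "</TREE_CLASSIFY>"
)
ill_typed = ET.fromstring(
    "<TREE_CLASSIFY><TABLE_LOADER/><TABLE_LOADER/></TREE_CLASSIFY>"
)

print(result_type(well_typed))
```

Because the check compares result types rather than operator names, any operator returning a table can be nested wherever a table is expected, which is the compositionality that requirement R3 asks for.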

KDDML Types
The set of types of KDDML operators consists of:
table, PPtable
tree, clusters, rda, sequence, hierarchy
algs, condition, expression

KDDML Query structure
The structure of a KDDML query has a precise format:
XML tag elements correspond to operations on data and models;
XML attributes correspond to the parameters of those operations;
XML sub-elements define the arguments passed to the operators (KDDML types).
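The mapping from tags, attributes and sub-elements to operators, parameters and arguments can be sketched in Python as follows. The tag and attribute names in `QUERY` are illustrative guesses rather than actual KDDML syntax; only the values (`play`, `c4.5`, `weather.arff`, `weather_test.xml`) appear in the slides.

```python
import xml.etree.ElementTree as ET

# Illustrative query; tag and attribute names are assumptions, not KDDML syntax.
QUERY = """<TREE_CLASSIFY>
  <TREE_MINER target_attribute="play" algorithm="c4.5">
    <ARFF_LOADER source="weather.arff"/>
  </TREE_MINER>
  <TABLE_LOADER source="weather_test.xml"/>
</TREE_CLASSIFY>"""


def describe(element, depth=0):
    """List each operator (tag) with its parameters (attributes), one per line.

    Indentation reflects nesting: sub-elements are the operator's arguments.
    """
    lines = [f"{'  ' * depth}{element.tag} {dict(element.attrib)}"]
    for child in element:
        lines.extend(describe(child, depth + 1))
    return lines


outline = describe(ET.fromstring(QUERY))
print("\n".join(outline))
```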

Example (1)
Construction and application of a decision tree:
Loading of an ARFF source as the training set.
Simple sampling on the training set.
Construction of a decision tree on the sampled training set. Target attribute: play. Algorithm: C4.5.
Loading of a test set from the system repository.
Application of the decision tree to the test set.

Example (2)
Operator tree of the query (diagram labels):
Tree Classify
  Tree Miner (Alg: c4.5, Pruning confidence: 40%, Num instances: 6)
    Sampling (Alg: simple sampling, Percentage: 66%)
      Arff Loader (Source: weather.arff), reading from the ARFF repository
  Table Loader (Source: weather_test.xml), reading from the data repository

Language Operators
Data/model access.
Preprocessing: data cleaning, sampling, normalization, discretization.
Model extraction.
Model application and evaluation.
Model meta-reasoning and filtering.

Example one: Discretization
....
<PP_NUMERIC_DISCRETIZATION xml_dest="census_discrete.xml" attribute_name="age" label_type="enumeration" enumerated_label_list="young, middle, old">
....
Discretization of the numeric attribute "age" into three intervals using the natural binning method.
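A rough Python sketch of this kind of discretization follows. It assumes that "natural binning" means splitting the observed range into equal-width intervals, which the slides do not spell out; the function name and the sample ages are made up for illustration.

```python
def equal_width_bins(values, labels):
    """Assign each value the label of its equal-width interval.

    Assumption: 'natural binning' is taken to mean equal-width binning.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / len(labels)
    result = []
    for value in values:
        if width == 0:
            index = 0  # all values identical: everything falls in the first bin
        else:
            # The maximum value would land one past the last bin, so clamp it.
            index = min(int((value - lo) / width), len(labels) - 1)
        result.append(labels[index])
    return result


ages = [18, 25, 33, 47, 52, 60, 78]  # illustrative data
labels = ["young", "middle", "old"]
binned = equal_width_bins(ages, labels)
print(list(zip(ages, binned)))
```

With these ages the range 18 to 78 is split into three intervals of width 20, so 18, 25 and 33 are labelled "young", 47 and 52 "middle", and 60 and 78 "old".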

Example two: RdA filtering
Selects the association rules that have the item "bread" in the body, do not have the item "milk" in the head, have exactly two items in the head, and have support greater than 30%.
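The same filter condition can be written as a plain predicate, sketched here in Python. The `Rule` class and the sample rules are hypothetical; in KDDML the filter is an XML query evaluated over a rule model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical in-memory association rule: body => head."""
    body: frozenset
    head: frozenset
    support: float  # fraction in [0, 1]


def keep(rule):
    """The four conditions from the slide, combined with 'and'."""
    return (
        "bread" in rule.body
        and "milk" not in rule.head
        and len(rule.head) == 2
        and rule.support > 0.30
    )


# Illustrative rules, not taken from any real dataset.
rules = [
    Rule(frozenset({"bread"}), frozenset({"butter", "jam"}), 0.35),
    Rule(frozenset({"bread"}), frozenset({"milk", "jam"}), 0.40),
    Rule(frozenset({"bread", "eggs"}), frozenset({"butter"}), 0.50),
    Rule(frozenset({"eggs"}), frozenset({"butter", "jam"}), 0.45),
    Rule(frozenset({"bread"}), frozenset({"butter", "jam"}), 0.30),
]
selected = [rule for rule in rules if keep(rule)]
print(selected)
```

Only the first rule passes: the others fail on "milk" in the head, on head size, on the missing "bread", or on support not being strictly greater than 30%.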

System Architecture

Design targets
Extensibility: data sources, algorithms, models.
Portability.
Modularity: the architecture is structured in 3 layers.

Architecture Layers
Repository Layer: manages read/write access to the data and models repository, and to data and models from external sources. Provides programmatic functionality to the higher layers.
Operators Layer: implementation of the language operators. Each operator is implemented as a Java class satisfying an interface. The interface is task-dependent.
Interpreter Layer: accepts a validated KDDML query and returns the result as an XML document. Recursively traverses the DOM tree representation of the query. The interpreter is not affected by data/algorithm/model extensibility.
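How a recursive interpreter can stay untouched when operators are added is sketched below. This is a Python illustration of the idea only: the actual system uses Java classes, and the `Operator` interface, the registry and the two toy operators (`CONSTANT`, `SCALE`) are all hypothetical.

```python
import xml.etree.ElementTree as ET
from abc import ABC, abstractmethod


class Operator(ABC):
    """Hypothetical operator interface (the real one is a Java interface)."""

    @abstractmethod
    def execute(self, params, args):
        """params: XML attributes; args: already-evaluated sub-queries."""


class Constant(Operator):
    # Toy leaf operator: yields a list of numbers from its 'values' attribute.
    def execute(self, params, args):
        return [float(x) for x in params["values"].split(",")]


class Scale(Operator):
    # Toy unary operator: multiplies its single argument by 'factor'.
    def execute(self, params, args):
        factor = float(params["factor"])
        return [factor * x for x in args[0]]


# Operators layer: adding an operator means adding one entry here.
REGISTRY = {"CONSTANT": Constant(), "SCALE": Scale()}


def interpret(element):
    """Interpreter layer: evaluate sub-elements first, then dispatch by tag."""
    args = [interpret(child) for child in element]
    return REGISTRY[element.tag].execute(dict(element.attrib), args)


query = ET.fromstring('<SCALE factor="2"><CONSTANT values="1,2,3"/></SCALE>')
result = interpret(query)
print(result)
```

`interpret` never mentions a specific operator, so extending the registry with new data sources, algorithms or models leaves it unchanged.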

KDDML as Middleware System
[Diagram labels: MQL query, Compiler, KDDML query; High Level GUI, KDDML query; Interpreter Layer, Operators Layer, Repository Layer; Data, Models; Results.]

Experiences with KDDML

ClickWorld
Extract DM models from visits to a city-news portal, with the intent of characterizing the topics of interest of new visitors.
M. Baglioni, U. Ferrara, A. Romei, S. Ruggieri, F. Turini. Preprocessing and mining web log data for web personalization. 8th Italian Conf. on Artificial Intelligence: Vol of LNCS, September

KDDML-G
A system for KDD on the GRID.
Exploits the parallelism offered by the GRID.
Handles data immovability by moving the code to the place where the data resides.
[Diagram labels: OP, OP 1, OP 2, OP 3.]

Download KDDML
Licence: GNU General Public Licence (GPL).