8th European AEC/APC Conference - Dresden 2007
Extracting correlated sets using the chi-squared measure within n-ary relations: an implementation

A. Casali (1), C. Ernst (2), F. Gasnier (3), J. Stephan (2)
1: Université de la Méditerranée / LIF
2: École des Mines de St Étienne / CMP-GC
3: STMicroelectronics Rousset

Motivations

The field of APC (Advanced Process Control) aims at highlighting correlations between production parameters. This study focuses on the device analysis of the principal trajectories impacting the yield. The goal is to detect correlations between data measurements structured as n-ary relations and involving (at least) one target attribute. The method uses a levelwise data mining algorithm based on both the chi-squared and the support measures.

Methodology: a KDD approach

A complete data transformation, mining and interpretation model for correlation detection within data measurements.

SELECTION: Raw (Excel) data measurement files -> Selected file
  Files with a vast number of numerical attributes (and often incomplete data).

PREPROCESSING: Selected file -> Preprocessed file
  Attribute removal. Criteria: attributes
  - with too few distinct values
  - having too many null values
  - that are duplicates of another attribute (one is kept)
  - with too small a standard deviation

TRANSFORMATION: Preprocessed file -> Transformed file
  - Normalization
  - Interval discretization / item encoding
  - Elimination of attributes with no item having the support

DATA MINING: Transformed file -> Retrieved patterns

  IN:  ItemSet I, Fraction p%, Threshold mc (chi2), Threshold s (support), Target Attribute ta, Relation r
  OUT: Set of minimal correlated patterns

    C_2 := APrioriGen(I)                 // (2-pattern) candidates generation
    i := 2
    while C_i ≠ ∅ do
        L_i := ∅
        for each X ∈ C_i do
            Build the contingency table of X
            if p% of the table's cells have a support ≥ s then
                if chi2(X) ≥ mc then L_i := L_i ∪ {X}
            endif
        end for
        C_(i+1) := APrioriGen(C_i - L_i)
        i := i + 1
    end while
    return ∪_i L_i                       // limited to the patterns including one item of ta

INTERPRETATION: Retrieved patterns -> Report (knowledge generation)
  - Item decoding
  - Presentation (processing) of correlations

  Layout of the retrieved patterns report:

    Item1   Item2   Item3   Item4   Chi2
    ...     ...     ...     ...     ...
    ...     ...     ...     ...     ...

Results

    Attribute1               Attribute2               ...   Target Attribute
    ...                      ...                      ...   ...
    _9592_TRAN-              -                              PCTH-
      [-47.8, -32.7] 0.41    -                                [0.3, 11.8] 0.82
    _2565_EPPO-              _4692_IMPT-                    PCTH-
      [2060.6, ] 0.39          [328.5, 373.5] 0.62            [0.3, 11.8] 0.82
    _3700_ALIX-              _4692_IMPT-                    PCTH-
      [17.5, 23.0] 0.37        [328.5, 373.5] 0.62            [0.3, 11.8] 0.82
    _4572_EOXR-              _4692_IMPT-                    PCTH-
      [127.1, 136.5] 0.38      [328.5, 373.5] 0.62            [0.3, 11.8] 0.82
    _4690_ALIY-              _4692_IMPT-                    PCTH-
      [52.3, 75.5] 0.37        [328.5, 373.5] 0.62            [0.3, 11.8] 0.82
    _4692_IMPT-              _4748_EPTE-                    PCTH-
      [328.5, 373.5] 0.62      [79.6, 81.1] 0.34              [0.3, 11.8] 0.82
    ...                      ...                      ...   ...

Conclusions

This approach makes it possible for STMicroelectronics Rousset to highlight previously unknown correlations between various parameters, validated by electrical and/or physical analysis. While the proposed mining method confirmed that levelwise algorithms do not provide results beyond four search levels, it proved its value for n-ary relations with a very large number of numerical attributes. The study aims at supporting the development of effective R2R (run-to-run) control loops.

Future Work

Current developments are focused on:
- the optimization of the procedure,
- the implementation of other search methods.
We plan to initiate a background procedure integrating different sets of methods, measurements and results.
-> Automatic generation of the most suitable result for each new analysis.

Acknowledgments

This work was initiated while the fourth author was at École des Mines de Saint-Étienne / CMP-GC, and was supported by Research Project “Rousset ”, financed by the Communauté du Pays d'Aix, Conseil Général des Bouches du Rhône and Conseil Régional Provence Alpes Côte d'Azur.
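The attribute-removal criteria and the interval discretization listed in the methodology can be illustrated with a small Python sketch. The poster gives no thresholds and does not say which discretization is used, so everything here is an assumption made for illustration: the function names, the default thresholds (`min_distinct`, `max_null_ratio`, `min_std`), the use of `None` for null values, and the choice of equal-width intervals.

```python
from statistics import pstdev


def keep_attribute(values, min_distinct=3, max_null_ratio=0.5, min_std=1e-6):
    """Apply the removal criteria to one column (None marks a null value).

    All thresholds are hypothetical defaults, not values from the poster.
    """
    present = [v for v in values if v is not None]
    if len(values) - len(present) > max_null_ratio * len(values):
        return False                      # too many null values
    if len(set(present)) < min_distinct:
        return False                      # too few distinct values
    return pstdev(present) >= min_std     # standard deviation large enough


def preprocess(table):
    """table: dict mapping attribute name -> list of numeric values or None."""
    kept = {}
    for name, values in table.items():
        if not keep_attribute(values):
            continue
        if any(values == other for other in kept.values()):
            continue                      # duplicate column: the first one is kept
        kept[name] = values
    return kept


def discretize(values, bins=3):
    """Equal-width discretization: map each value to an interval index.

    Assumes the column has at least two distinct non-null values,
    which the removal criteria above guarantee for kept attributes.
    """
    present = [v for v in values if v is not None]
    lo, hi = min(present), max(present)
    width = (hi - lo) / bins
    return [None if v is None else min(int((v - lo) / width), bins - 1)
            for v in values]


table = {
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [1.0, 2.0, 3.0, 4.0],      # duplicate of "a"
    "c": [5.0, 5.0, 5.0, 5.0],      # a single distinct value
    "d": [None, None, None, 1.0],   # mostly null
    "e": [0.0, 10.0, 20.0, 30.0],
}
kept = preprocess(table)
encoded = {name: discretize(values) for name, values in kept.items()}
```

On this toy table only `a` and `e` survive preprocessing, and each kept value is replaced by the index of the interval it falls into, which can then serve as an item for the mining step.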
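The levelwise search of the data mining step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: items are modelled as (attribute, value) pairs, the contingency table of a pattern counts presence/absence of each item, candidates mix at most one item per attribute, and the default thresholds (`p`, `s`, `mc`) are illustrative only. The value 3.84 is the usual 95% chi-squared cutoff for one degree of freedom.

```python
from itertools import combinations, product


def contingency(rows, pattern):
    """Observed count for every presence/absence combination of the items."""
    items = sorted(pattern)
    counts = {cell: 0 for cell in product((0, 1), repeat=len(items))}
    for row in rows:
        counts[tuple(int(row.get(a) == v) for a, v in items)] += 1
    return items, counts


def chi2(rows, pattern):
    """Chi-squared statistic against the hypothesis that items are independent."""
    items, counts = contingency(rows, pattern)
    n = len(rows)
    freq = [sum(c for cell, c in counts.items() if cell[j]) / n
            for j in range(len(items))]
    total = 0.0
    for cell, observed in counts.items():
        expected = n
        for bit, f in zip(cell, freq):
            expected *= f if bit else 1 - f
        if expected > 0:
            total += (observed - expected) ** 2 / expected
    return total


def apriori_gen(patterns, size):
    """Join patterns into size-patterns whose (size-1)-subsets are all present."""
    patterns = set(patterns)
    out = set()
    for a, b in combinations(patterns, 2):
        cand = a | b
        if len(cand) != size or len({attr for attr, _ in cand}) != size:
            continue  # wrong size, or two items of the same attribute
        if all(frozenset(sub) in patterns for sub in combinations(cand, size - 1)):
            out.add(cand)
    return out


def minimal_correlated(rows, target, p=0.25, s=0.05, mc=3.84):
    n = len(rows)
    items = {(a, v) for row in rows for a, v in row.items()}
    candidates = apriori_gen({frozenset([it]) for it in items}, 2)
    found, size = [], 2
    while candidates:
        level = set()
        for x in candidates:
            _, counts = contingency(rows, x)
            supported = sum(c / n >= s for c in counts.values())
            if supported >= p * len(counts) and chi2(rows, x) >= mc:
                level.add(x)
        found.extend(level)
        size += 1
        # only non-correlated candidates are extended, so results stay minimal
        candidates = apriori_gen(candidates - level, size)
    return [x for x in found if any(a == target for a, _ in x)]


# Toy relation: "yield" is fully determined by "temp"; "dose" is independent.
rows = [{"temp": t, "dose": d, "yield": "good" if t == "hi" else "bad"}
        for t in ("lo", "hi") for d in ("lo", "hi") for _ in range(5)]
result = minimal_correlated(rows, target="yield")
```

On this toy relation the search reports the four temp/yield item pairs at level 2 and stops, because no 3-pattern has all of its 2-subsets among the non-correlated candidates.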