Linguistic summaries on relational databases Miroslav Hudec University of Economics in Bratislava, Department of Applied Informatics FSTA, 2014.


Relational knowledge from a data set. Do most municipalities with high altitude have small pollution? What is the validity of such a rule? If-then rules: if population density is high, then is waste production high?

Linguistic summary - introduction. Q is a linguistic quantifier, X = {x} is a universe of discourse and P(x) is a predicate depicting the summarizer S. The summary Qx(P(x)) reads: Q entities in database are (have) S. The truth value of a summary is called its validity and takes values from the [0, 1] interval.

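Linguistic summary - elementary. Q entities in database are (have) S. The validity is $v(Qx(P(x))) = \mu_Q\left(\frac{1}{n}\sum_{i=1}^{n}\mu_P(x_i)\right)$, where n is the cardinality of the database (number of entities), $\frac{1}{n}\sum_{i=1}^{n}\mu_P(x_i)$ is the proportion of objects in the database that satisfy P(x), and $\mu_Q$ is the membership function of the quantifier.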
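A minimal sketch of how the validity of an elementary summary could be computed. The membership functions `mu_high` and `mu_most`, their parameters, and the sample values are illustrative assumptions, not taken from the slides.

```python
def mu_high(x, a=50.0, b=100.0):
    """Hypothetical summarizer 'high': 0 up to a, 1 from b, linear in between."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)


def mu_most(y):
    """Assumed piecewise-linear quantifier 'most' on the [0, 1] interval."""
    if y >= 0.8:
        return 1.0
    if y <= 0.3:
        return 0.0
    return 2.0 * y - 0.6


def elementary_validity(values, mu_s, mu_q):
    """Validity of 'Q entities are S': quantifier applied to the mean membership."""
    n = len(values)
    if n == 0:
        raise ValueError("the database must contain at least one entity")
    proportion = sum(mu_s(x) for x in values) / n
    return mu_q(proportion)


# Made-up attribute values for five entities.
values = [120.0, 95.0, 75.0, 40.0, 110.0]
validity = elementary_validity(values, mu_high, mu_most)
print(round(validity, 3))
```

Passing the summarizer and the quantifier as functions keeps the computation independent of any particular shape of the fuzzy sets.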

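Linguistic summary - extended. Q R objects in database are (have) S. The validity is $v = \mu_Q\left(\frac{\sum_{i=1}^{n} t(\mu_S(x_i), \mu_R(x_i))}{\sum_{i=1}^{n}\mu_R(x_i)}\right)$, where the fraction is the proportion of R objects in the database that satisfy S, t is a t-norm, and $\mu_Q$ is the membership function of the quantifier.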
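The extended summary could be evaluated along the following lines. This is a sketch under stated assumptions: the minimum is used as the t-norm, `mu_most` is an assumed form of the quantifier, an empty R subpopulation is given validity 0 by convention, and the membership degrees are made up.

```python
def mu_most(y):
    """Assumed piecewise-linear quantifier 'most' on the [0, 1] interval."""
    if y >= 0.8:
        return 1.0
    if y <= 0.3:
        return 0.0
    return 2.0 * y - 0.6


def extended_validity(degrees, mu_q, t_norm=min):
    """Validity of 'Q R objects are S'.

    degrees: list of (mu_R, mu_S) pairs, one pair per entity.
    """
    denominator = sum(mu_r for mu_r, _ in degrees)
    if denominator == 0:
        # Assumption: with no entity belonging to R, the summary is not supported.
        return 0.0
    numerator = sum(t_norm(mu_s, mu_r) for mu_r, mu_s in degrees)
    return mu_q(numerator / denominator)


# Made-up membership degrees (mu_R, mu_S) for four entities.
degrees = [(1.0, 0.4), (0.5, 0.5), (0.0, 1.0), (1.0, 1.0)]
validity = extended_validity(degrees, mu_most)
print(round(validity, 3))
```

The third entity does not belong to R at all, so it contributes to neither the numerator nor the denominator, however well it satisfies S.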

Linguistic summary - graph Q R objects in database are (have) S

Issues

Summarizer. Let $D_{\min}$ and $D_{\max}$ be the lowest and the highest domain values of attribute A, i.e. $Dom(A) = [D_{\min}, D_{\max}]$, and let L and H be the lowest and the highest values in the current content of the database, respectively. In practice, $[L, H] \subseteq [D_{\min}, D_{\max}]$. This fact should be considered in linguistic summaries.

Family of summarizers. The uniform domain covering method (Tudorie, 2008)
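One way to read the uniform domain covering idea is sketched below: three trapezoidal fuzzy sets (small, medium, high) are spread evenly over the interval [L, H] found in the data. The transcript does not give the parameters, so the choice of a flat part of (H - L)/4 and a slope of (H - L)/8 is an assumption, as are all function and variable names.

```python
def uniform_domain_covering(low, high):
    """Build 'small', 'medium' and 'high' fuzzy sets evenly covering [low, high]."""
    span = high - low
    if span <= 0:
        raise ValueError("high must be greater than low")
    theta = span / 4.0  # assumed length of each flat part
    eps = span / 8.0    # assumed length of each slope
    a = low + theta     # end of the flat part of 'small'
    b = a + eps         # start of the flat part of 'medium'
    c = b + theta       # end of the flat part of 'medium'
    d = c + eps         # start of the flat part of 'high'

    def small(x):
        if x <= a:
            return 1.0
        if x >= b:
            return 0.0
        return (b - x) / eps

    def medium(x):
        if x <= a or x >= d:
            return 0.0
        if b <= x <= c:
            return 1.0
        if x < b:
            return (x - a) / eps
        return (d - x) / eps

    def high_term(x):
        if x <= c:
            return 0.0
        if x >= d:
            return 1.0
        return (x - c) / eps

    return {"small": small, "medium": medium, "high": high_term}


# Made-up data range: attribute values between 0 and 800.
terms = uniform_domain_covering(0.0, 800.0)
print({name: mu(250.0) for name, mu in terms.items()})
```

With this parametrization the three flat parts and two slopes add up to exactly H - L, and the membership degrees of neighbouring terms sum to 1 at every point.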

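Quantifier. For a regular non-decreasing quantifier (e.g. most), the membership function should meet the following property: $x_1 \le x_2 \Rightarrow \mu_Q(x_1) \le \mu_Q(x_2)$, with $\mu_Q(0) = 0$ and $\mu_Q(1) = 1$. The quantifier most might be given as (Kacprzyk and Zadrożny 2009): $\mu_Q(y) = 1$ for $y \ge 0.8$; $\mu_Q(y) = 2y - 0.6$ for $0.3 < y < 0.8$; $\mu_Q(y) = 0$ for $y \le 0.3$.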
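The sketch below encodes a quantifier most and checks numerically that it behaves as a regular non-decreasing quantifier, i.e. it is 0 at 0, 1 at 1 and never decreases in between. It assumes the piecewise-linear form with thresholds 0.3 and 0.8 that is commonly cited from Kacprzyk and Zadrożny; the helper name `is_regular_nondecreasing` is hypothetical.

```python
def mu_most(y):
    """Assumed quantifier 'most': 0 up to 0.3, 1 from 0.8, linear in between."""
    if y >= 0.8:
        return 1.0
    if y <= 0.3:
        return 0.0
    return 2.0 * y - 0.6


def is_regular_nondecreasing(mu_q, steps=100):
    """Check mu_q(0) = 0, mu_q(1) = 1 and monotonicity on an evenly spaced grid."""
    grid = [i / steps for i in range(steps + 1)]
    values = [mu_q(y) for y in grid]
    monotone = all(left <= right for left, right in zip(values, values[1:]))
    return values[0] == 0.0 and values[-1] == 1.0 and monotone


print(is_regular_nondecreasing(mu_most))
print(round(mu_most(0.5), 3))
```

A grid check like this is only a sanity test, not a proof, but it catches typical mistakes such as swapped thresholds.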

Example
Linguistic summary (rule) | Validity
Most municipalities having high population density have high production of waste | 0.662
Most municipalities having medium population density have medium production of waste | 0
Most municipalities having small population density have small production of waste | 1
Rules: if population density is small then production of waste is small with cf = 1; if population density is high then production of waste is high with cf = …

Family of quantifiers. Uniform domain covering method on the [0, 1] interval.

Comparison of quantifiers

Optimization of summaries
1. The decision maker creates a particular linguistic summary or sentence of interest and evaluates its validity.
2. Automatic generation of relevant linguistic summaries (Liu, 2011). The search runs over a set of relevant quantifiers, a set of relevant linguistic expressions and a set defining the subpopulation of interest, with β a threshold value from the (0, 1] interval. Each solution produces a linguistic summary Q* R* are S*.
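The automatic generation step could be approximated by an exhaustive search over all combinations of quantifier, subpopulation and summarizer, keeping those whose validity reaches the threshold β. This is a brute-force sketch, not the method of Liu (2011); the fuzzy sets, the quantifier few, the normalized data and all names are illustrative assumptions.

```python
from itertools import product


def mu_most(y):
    """Assumed quantifier 'most'."""
    if y >= 0.8:
        return 1.0
    if y <= 0.3:
        return 0.0
    return 2.0 * y - 0.6


def mu_few(y):
    """Assumed quantifier 'few', taken here as the mirror image of 'most'."""
    return mu_most(1.0 - y)


def generate_summaries(rows, quantifiers, restrictions, summarizers, beta):
    """Return (validity, sentence) pairs whose validity is at least beta."""
    found = []
    combos = product(quantifiers.items(), restrictions.items(), summarizers.items())
    for (q_name, mu_q), (r_name, mu_r), (s_name, mu_s) in combos:
        denominator = sum(mu_r(row) for row in rows)
        if denominator == 0:
            continue  # empty subpopulation, nothing to summarize
        numerator = sum(min(mu_s(row), mu_r(row)) for row in rows)
        validity = mu_q(numerator / denominator)
        if validity >= beta:
            found.append((validity, f"{q_name} {r_name} have {s_name}"))
    return sorted(found, reverse=True)


# Made-up municipalities with attributes already normalized to [0, 1].
rows = [
    {"density": 0.9, "waste": 0.8},
    {"density": 0.8, "waste": 0.9},
    {"density": 0.1, "waste": 0.2},
    {"density": 0.2, "waste": 0.1},
]
quantifiers = {"most": mu_most, "few": mu_few}
restrictions = {
    "municipalities with small population density": lambda r: 1.0 - r["density"],
    "municipalities with high population density": lambda r: r["density"],
}
summarizers = {
    "small production of waste": lambda r: 1.0 - r["waste"],
    "high production of waste": lambda r: r["waste"],
}

summaries = generate_summaries(rows, quantifiers, restrictions, summarizers, beta=0.9)
for validity, sentence in summaries:
    print(f"{validity:.3f}  {sentence}")
```

Exhaustive enumeration is feasible only for small families of terms; larger search spaces are where an optimization formulation becomes relevant.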

Optimization of summaries {(small, small), (small, medium), (medium, medium), (high, high)}

Fuzzy functional dependencies and linguistic summaries

Queries by summaries. Data on the lower hierarchical level are the basis for summaries, but only data on the higher level are revealed, ranked downward from the best to the worst. Example: select regions where most municipalities have small altitude above sea level. The validity for cluster i is $v_i = \mu_Q\left(\frac{1}{N_i}\sum_{j=1}^{N_i}\mu_P(x_{ji})\right)$, $i = 1, \dots, R$, with $\sum_{i=1}^{R} N_i = n$, where n is the number of entities in the whole database, $N_i$ is the number of entities in cluster i (municipalities in region i), R is the number of clusters in the database (regions), and $\mu_P(x_{ji})$ is the matching degree of the j-th entity in the i-th cluster. Advantages: 1. Sensitive data, or data that are not free of charge, remain hidden. 2. A policy maker… is interested in a general overview, not in the data.
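A query by summary could be sketched as follows: the validity is computed separately for each region from its municipalities, and only the region names with their validities are returned, ranked from best to worst. The region names, altitude values and membership function parameters are made up for illustration, and `mu_most` is an assumed form of the quantifier.

```python
def mu_most(y):
    """Assumed quantifier 'most'."""
    if y >= 0.8:
        return 1.0
    if y <= 0.3:
        return 0.0
    return 2.0 * y - 0.6


def mu_small_altitude(x, a=200.0, b=400.0):
    """Hypothetical summarizer 'small altitude': 1 up to a, 0 from b (metres)."""
    if x <= a:
        return 1.0
    if x >= b:
        return 0.0
    return (b - x) / (b - a)


def rank_regions(regions, mu_p, mu_q):
    """Return (region, validity) pairs sorted from the best to the worst.

    regions maps a region name to the attribute values of its municipalities;
    the individual values never leave this function.
    """
    ranked = []
    for name, values in regions.items():
        if not values:
            continue  # a region without municipalities cannot be evaluated
        proportion = sum(mu_p(x) for x in values) / len(values)
        ranked.append((name, mu_q(proportion)))
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked


# Made-up altitudes of municipalities in three hypothetical regions.
regions = {
    "Region C": [600.0, 700.0, 350.0],
    "Region A": [120.0, 150.0, 180.0],
    "Region B": [150.0, 300.0, 500.0, 250.0],
}
ranking = rank_regions(regions, mu_small_altitude, mu_most)
for name, validity in ranking:
    print(f"{name}: {validity:.3f}")
```

Because only aggregated validities are returned, the municipality-level values stay hidden from the user of the query.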

Example: select regions where most municipalities have small altitude above sea level
Region | Validity of the summary
Bratislava | 1
Trnava | 1
Nitra | 1
Trenčín |
Košice |
Banská Bystrica |
Žilina | 0
Prešov | 0

Conclusion. The work demonstrates how we can start with a simple linguistic summary and build more complex summaries by merging knowledge from several fields: mining parameters for the functions of summarizers from data, extending this to defining parameters of quantifiers, optimization of summaries, and fuzzy queries. Although fuzzy set theory has already been established as an adequate framework for dealing with linguistic summaries, there is still room for improvement.

Some topics for further research: linguistic summaries on fuzzy databases; an operations research task for optimising the process of rule generation; full applications for practitioners; fuzzy functional dependencies and linguistic summaries in data mining.

Thank you for your attention