Disseminating statistical data by short quantified sentences of natural language Miroslav Hudec Faculty of Economic Informatics, University of Economics.

1 Disseminating statistical data by short quantified sentences of natural language
Miroslav Hudec Faculty of Economic Informatics, University of Economics in Bratislava NTTS 2017

2 Motivation
Summarizing data by statistical figures is convenient, but it is understandable only to a rather small group of specialists.
Graphical interpretation is also a valuable way of summarizing, but it is not always effective.
Semantic uncertainty in data and concepts is common; our world is not always based on two-valued logic.
A linguistically summarized sentence can be read out by a text-to-speech synthesis system. This is especially welcome when visual attention should not be disturbed, or in communication with people with disabilities.

3 Linguistic Summaries (LSs)
Yager et al. (1990): “…method of summarization would be especially practicable if it could provide us with summaries that are not as terse as the mean.”
Compare the terse statement "The arithmetic mean is 2896.32 with a standard deviation of 324.18" with linguistic summaries such as:
Most records are around (near) the mean value.
Few records are around (near) the mean value.
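The contrast between the mean and a linguistic summary can be sketched with the basic form of Yager's approach under Zadeh's calculus of quantified propositions. Here the validity of "Q records are S" is the quantifier Q applied to the average membership of the records in S. Several pieces are assumptions for illustration, not the author's definitions: the triangular shape of "around the mean" (with a spread of one population standard deviation), the breakpoints of "most" and "few", and the two small samples.

```python
from statistics import mean, pstdev
from typing import Callable, Dict, Sequence

def around(x: float, centre: float, spread: float) -> float:
    """Triangular degree to which x is 'around centre'; 0 beyond +/- spread.
    The triangular shape and the choice of spread are assumptions."""
    if spread <= 0:
        return 1.0 if x == centre else 0.0
    return max(0.0, 1.0 - abs(x - centre) / spread)

def most(p: float) -> float:
    # Hypothetical relative quantifier "most of": 0 up to 0.3, 1 from 0.8, linear between.
    return min(1.0, max(0.0, (p - 0.3) / 0.5))

def few(p: float) -> float:
    # Hypothetical relative quantifier "few": 1 up to 0.1, 0 from 0.4, linear between.
    return min(1.0, max(0.0, (0.4 - p) / 0.3))

def validity(values: Sequence[float],
             summarizer: Callable[[float], float],
             quantifier: Callable[[float], float]) -> float:
    """Truth of 'Q records are S' = Q(average membership of the records in S)."""
    proportion = sum(summarizer(v) for v in values) / len(values)
    return quantifier(proportion)

def around_mean_summaries(values: Sequence[float]) -> Dict[str, float]:
    m, s = mean(values), pstdev(values)
    # 'around (near) the mean' with a spread of one population SD (assumption).
    near_mean = lambda v: around(v, m, s)
    return {
        "Most records are around the mean": validity(values, near_mean, most),
        "Few records are around the mean": validity(values, near_mean, few),
    }

print(around_mean_summaries([9, 10, 10, 10, 11]))
print(around_mean_summaries([2, 3, 10, 17, 18]))
```

Both hypothetical samples have the same mean of 10. The first supports "most records are around the mean" to degree 0.6. The second supports "few records are around the mean" to a degree of about 0.67. This is exactly the information that the mean alone does not convey.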

4 Benefits
Explanation of relational knowledge in the data:
About half of young respondents have a rather positive opinion about the population census.
Regularly on Sundays, visitors’ satisfaction is low.
Quantified sentences in query conditions:
Find districts where most municipalities have a high ratio of arable land.
Find areas where most pollutants exceed their limits.
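A quantified query condition can be evaluated group by group. For each district, compute the share of municipalities that have a high ratio of arable land, then apply the quantifier "most of" to that share. The sketch below assumes hypothetical membership functions for "high ratio" and "most of", and uses invented district names and ratios purely for illustration.

```python
from typing import Dict, List, Tuple

def high_ratio(r: float) -> float:
    # Hypothetical 'high ratio of arable land': 0 up to 0.4, 1 from 0.7, linear between.
    return min(1.0, max(0.0, (r - 0.4) / 0.3))

def most(p: float) -> float:
    # Hypothetical 'most of' quantifier: 0 up to 0.3, 1 from 0.8, linear between.
    return min(1.0, max(0.0, (p - 0.3) / 0.5))

def quantified_query(districts: Dict[str, List[float]]) -> List[Tuple[str, float]]:
    """Districts where 'most municipalities have a high ratio of arable land',
    as (district, degree) pairs with degree > 0, best matches first."""
    answers = []
    for name, ratios in districts.items():
        # Average membership of the district's municipalities in 'high ratio'.
        share_high = sum(high_ratio(r) for r in ratios) / len(ratios)
        degree = most(share_high)
        if degree > 0:
            answers.append((name, degree))
    return sorted(answers, key=lambda pair: pair[1], reverse=True)

# Invented data: arable-land ratio of each municipality, grouped by district.
districts = {
    "District A": [0.75, 0.80, 0.72, 0.50],
    "District B": [0.20, 0.35, 0.65, 0.90],
    "District C": [0.10, 0.15, 0.30, 0.25],
}
print(quantified_query(districts))
```

Districts with degree 0 are dropped. The rest are ranked by how well they satisfy the condition, so a district that almost qualifies is still returned with a lower degree rather than cut off by a crisp threshold.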

5 Mathematical background
The calculations of validity, quality, and matching degrees to flexible concepts such as "young", "high pollution" and "most of" are explained in the paper. The references give deeper explanations of various aspects of this topic.
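For orientation, the validity of the two common forms of linguistic summaries is often computed as follows, using Yager's basic and restricted forms with Zadeh's calculus of quantified propositions. This is the standard formulation in the literature rather than a transcription of the paper, whose exact measures (including the quality indicators) may differ. The notation is as follows. The records are $y_1,\dots,y_n$. The membership functions of the summarizer $S$, the restriction $R$ and the relative quantifier $Q$ are $\mu_S$, $\mu_R$ and $\mu_Q$. The symbol $t$ denotes a t-norm such as the minimum.

```latex
\begin{align}
v(\text{$Q$ $y$'s are $S$}) &= \mu_Q\!\left(\frac{1}{n}\sum_{i=1}^{n}\mu_S(y_i)\right)\\
v(\text{$Q$ $R$ $y$'s are $S$}) &= \mu_Q\!\left(\frac{\sum_{i=1}^{n} t\big(\mu_R(y_i),\,\mu_S(y_i)\big)}{\sum_{i=1}^{n}\mu_R(y_i)}\right)
\end{align}
```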

6 Illustrative example 1: comparison of customers and their respective ages
{C1:26, C2:28, C3:32, C4:40, C5:54, C6:56, C7:57}
The average age is , the standard deviation is and the median is 40.
LS: validity
Most customers are young: 0.0000
Most customers are middle-aged
Most customers are old
About half of the customers are young: 0.8570
About half of the customers are middle-aged: 0.1425
About half of the customers are old: 1.0000
Few customers are young: 0.1430
Few customers are middle-aged: 0.8575
Few customers are old
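The descriptive statistics and the nine summaries can be recomputed from the listed ages. The sketch below reports the mean, the median, and both the population and the sample standard deviation, because the slide does not say which one is used. It evaluates each summary as the quantifier applied to the average membership in the age concept. The trapezoidal membership functions for young, middle-aged and old, and for the quantifiers, are hypothetical, so the validities are not expected to reproduce the slide's values.

```python
from statistics import mean, median, pstdev, stdev

# Customer ages listed on the slide.
ages = {"C1": 26, "C2": 28, "C3": 32, "C4": 40, "C5": 54, "C6": 56, "C7": 57}

def rising(x: float, a: float, b: float) -> float:
    """0 up to a, 1 from b, linear in between (requires a < b)."""
    return min(1.0, max(0.0, (x - a) / (b - a)))

def falling(x: float, a: float, b: float) -> float:
    """1 up to a, 0 from b, linear in between (requires a < b)."""
    return 1.0 - rising(x, a, b)

def trapezoid(x: float, a: float, b: float, c: float, d: float) -> float:
    """0 outside (a, d), 1 on [b, c], linear on both flanks."""
    return min(rising(x, a, b), falling(x, c, d))

# Hypothetical age concepts; the breakpoints (in years) are assumptions.
summarizers = {
    "young": lambda age: falling(age, 30, 40),
    "middle-aged": lambda age: trapezoid(age, 30, 38, 48, 55),
    "old": lambda age: rising(age, 45, 55),
}

# Hypothetical relative quantifiers over a proportion p in [0, 1].
quantifiers = {
    "Most": lambda p: rising(p, 0.3, 0.8),
    "About half of the": lambda p: trapezoid(p, 0.2, 0.4, 0.6, 0.8),
    "Few": lambda p: falling(p, 0.1, 0.4),
}

def validities(values):
    """Truth of 'Q customers are S' = Q(average membership in S), for every Q and S."""
    table = {}
    for s_name, s in summarizers.items():
        p = sum(s(v) for v in values) / len(values)
        for q_name, q in quantifiers.items():
            table[f"{q_name} customers are {s_name}"] = q(p)
    return table

values = list(ages.values())
print(f"mean={mean(values):.2f}, median={median(values)}, "
      f"population sd={pstdev(values):.2f}, sample sd={stdev(values):.2f}")
for summary, truth in validities(values).items():
    print(f"{summary:<44} {truth:.4f}")
```

Changing the breakpoints of "young" or of "about half" changes the validities. This is why the definitions of such concepts matter as much as the data.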

7 Concepts

8 Illustrative example 2
Summaries from tourism statistics might look like this:
LS: validity
About half of the visits from remote countries are of long stay: 1.0000
Few visits from remote countries are of medium stay: 0.8575
About half of the visits from remote countries are of short stay: 0.8528
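These are summaries with a restriction, of the form "Q R objects are S". The quantifier is applied only to the visits that satisfy the restriction "from remote countries". A minimal sketch with the minimum as t-norm follows. The slide does not define the concepts, so the following are all invented for illustration: the distance thresholds for "remote", the stay lengths in nights, the "about half" quantifier, and the visit records.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical visit record: (distance of the origin country in km, length of stay in nights).
Visit = Tuple[float, float]

def rising(x: float, a: float, b: float) -> float:
    """0 up to a, 1 from b, linear in between (requires a < b)."""
    return min(1.0, max(0.0, (x - a) / (b - a)))

def falling(x: float, a: float, b: float) -> float:
    """1 up to a, 0 from b, linear in between (requires a < b)."""
    return 1.0 - rising(x, a, b)

def trapezoid(x: float, a: float, b: float, c: float, d: float) -> float:
    """0 outside (a, d), 1 on [b, c], linear on both flanks."""
    return min(rising(x, a, b), falling(x, c, d))

# Hypothetical concepts; all breakpoints are assumptions.
remote = lambda km: rising(km, 2000, 5000)
long_stay = lambda nights: rising(nights, 5, 10)
short_stay = lambda nights: falling(nights, 2, 4)
about_half = lambda p: trapezoid(p, 0.2, 0.4, 0.6, 0.8)

def restricted_validity(visits: Sequence[Visit],
                        restriction: Callable[[float], float],
                        summarizer: Callable[[float], float],
                        quantifier: Callable[[float], float]) -> float:
    """Truth of 'Q R visits are S' = Q(sum of min(R, S) / sum of R)."""
    covered = sum(min(restriction(km), summarizer(n)) for km, n in visits)
    restricted = sum(restriction(km) for km, _ in visits)
    # If no visit satisfies the restriction, treat the summary as not valid (assumption).
    return quantifier(covered / restricted) if restricted > 0 else 0.0

visits = [(8000, 12), (6000, 3), (300, 1), (7000, 14), (5500, 2), (1000, 7)]
print(restricted_validity(visits, remote, long_stay, about_half))
print(restricted_validity(visits, remote, short_stay, about_half))
```

With these records, four visits are fully remote and two of them are long stays. The long-stay summary therefore gets validity 1.0, and the short-stay summary gets 0.875. The non-remote visits do not affect either result.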

9 Conclusion
Less sensitive to imprecision in data and to inliers.
Works in the same way with data expressed by classical numbers, fuzzy numbers, short sentences, and categorical data.
Easily understandable for the majority of data users.
Quality indicators ensure that conclusions are not drawn from outliers.
Data disclosure is not a problem, but…
Calculations might take more time, but…
Intended for data dissemination, but might influence data collection…
Our work is still in progress… We are open to participating in projects and other activities: research, cooperation, applications.
