Published by Isabel Norris. Modified over 9 years ago.
1
Energy Issues in Data Analytics
Domenico Talia, Carmela Comito
Università della Calabria & CNR-ICAR, Italy
talia@deis.unical.it
2
Motivations for Taking Care of Data
Data is everywhere (big, complex, real-time, unstructured).
Putting data at the center of research on energy issues may bring benefits (today the focus is on algorithms).
Cost metrics for data management techniques (communication, storage, access, query, analysis) will help professionals and users save energy in data-intensive applications.
Energy-scalable data management is important for sustainable data science.
3
Data Availability or Data Deluge?
Every life process today is data intensive.
The information stored in digital data archives is enormous, and its size is still growing very rapidly.
4
Data Availability or Data Deluge?
Some decades ago the main problem was the shortage of information. Now the challenge is the very large volume of information to deal with, the complexity of processing it, and the need to extract its significant and useful parts or summaries.
5
Complex Big Problems …
Bigger and more complex problems must be solved by using large-scale distributed computing systems.
Data sources are ever larger and ubiquitous (Web, sensor networks, mobile devices, telescopes, …).
6
… and Big Data
Even where accessible, much data in many fields cannot be read by humans, so:
The huge amount of data available today requires smart data analysis techniques to help people deal with it.
Scalable algorithms, techniques, and systems are needed (time and energy scalability).
7
Data: From Storing to Analysis
Storing data is not the only main problem.
A key issue is to analyse, mine, and process data to make it useful.
Source: The Economist
8
Towards Models for Energy-aware Data Management
The main focus today is on energy-aware algorithms, tasks, and applications.
The other side of the coin is data and the cost of operating on it.
Abstract energy-cost models for exchanging, accessing, and transforming data are primary elements of energy-aware data management at large scale.
They are useful for sustainable data science.
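One way to picture an abstract energy-cost model for the three operations named above (exchanging, accessing, transforming data) is a linear model with one cost coefficient per operation. The sketch below is only an illustration: the class name, the linear form, and every coefficient value are assumptions made for this example and do not come from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EnergyCostModel:
    """Hypothetical linear model: joules spent per byte for each data operation."""
    transfer_j_per_byte: float   # exchanging data (assumed coefficient)
    access_j_per_byte: float     # reading or writing stored data (assumed coefficient)
    transform_j_per_byte: float  # processing data (assumed coefficient)

    def estimate(self, transferred: int, accessed: int, transformed: int) -> float:
        """Estimated energy in joules for the given byte counts."""
        return (self.transfer_j_per_byte * transferred
                + self.access_j_per_byte * accessed
                + self.transform_j_per_byte * transformed)


# Placeholder coefficients, chosen only to make the example run.
model = EnergyCostModel(transfer_j_per_byte=2e-6,
                        access_j_per_byte=1e-7,
                        transform_j_per_byte=5e-7)
one_mb = 1_000_000
print(model.estimate(transferred=one_mb, accessed=one_mb, transformed=one_mb))
```

In practice the coefficients would have to be fitted from measurements on each device class, and a linear form may be too simple for some operations.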
9
An Example: Energy-aware Mining of Data
We evaluated the energy cost of analyzing data with some well-known data mining techniques on mobile devices.
Our main interest was in how the energy consumed by the same technique changes as the dimensions of the data change.
Tests were run with different data set dimensions, numbers of attributes, and numbers of classes.
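An experiment of this kind can be sketched as a sweep over data set sizes, where the energy of each run is the difference between two readings of a cumulative energy meter. The harness below is a minimal sketch under that assumption. The function names, the meter interface, and the FakeMeter stand-in (which charges a made-up fixed cost per row) are all hypothetical and are not the setup used by the authors.

```python
from typing import Callable, Dict, Sequence


def energy_of(task: Callable[[], None], read_joules: Callable[[], float]) -> float:
    """Energy consumed while task runs: difference of two cumulative meter readings."""
    before = read_joules()
    task()
    return read_joules() - before


def sweep_sizes(data: Sequence, sizes: Sequence[int],
                analyze: Callable[[Sequence], None],
                read_joules: Callable[[], float]) -> Dict[int, float]:
    """Run the same analysis on growing prefixes of data and record energy per size."""
    results = {}
    for n in sizes:
        subset = data[:n]
        results[n] = energy_of(lambda: analyze(subset), read_joules)
    return results


class FakeMeter:
    """Stand-in for a real device meter; the per-row cost is invented for the demo."""

    def __init__(self) -> None:
        self.joules = 0.0

    def read(self) -> float:
        return self.joules

    def analyze(self, rows: Sequence) -> None:
        self.joules += 0.001 * len(rows)  # pretend each row costs 1 mJ


meter = FakeMeter()
results = sweep_sizes(list(range(1000)), [100, 500, 1000], meter.analyze, meter.read)
for size, joules in results.items():
    print(size, joules)
```

On a real device, read_joules would wrap whatever battery or power interface the platform exposes, and analyze would run the mining algorithm under test.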
10
Data Mining Techniques
Energy characterization of data mining techniques running on mobile devices:
  k-means (data clustering)
  J48 (data classification)
  Apriori (association rules)
Common performance parameters:
  Number of instances (data set size)
  Number of attributes
Algorithm-specific performance parameters:
  k-means: number of clusters
  J48: decision tree size
  Apriori: number of rules, minimum support, and minimum confidence
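To see why the number of instances, attributes, and clusters are natural parameters for k-means, it helps to count the work in one assignment step: each instance is compared with each centroid across every attribute. The sketch below counts those coordinate operations on random data. The operation count is used here only as a rough proxy for work, not as an energy measurement, and the helper names and parameter values are chosen for illustration.

```python
import random


def assign_step(points, centroids):
    """One k-means assignment step.

    Returns (labels, ops), where ops counts squared coordinate differences computed.
    """
    ops = 0
    labels = []
    for p in points:
        best, best_d = 0, float("inf")
        for j, c in enumerate(centroids):
            d = 0.0
            for x, y in zip(p, c):
                d += (x - y) ** 2
                ops += 1
            if d < best_d:
                best, best_d = j, d
        labels.append(best)
    return labels, ops


def random_points(n, d, rng):
    """n random points with d attributes each, values in [0, 1)."""
    return [[rng.random() for _ in range(d)] for _ in range(n)]


rng = random.Random(0)
# (instances, attributes, clusters): each row changes one parameter from the first.
for n, d, k in [(1000, 11, 2), (2000, 11, 2), (1000, 22, 2), (1000, 11, 4)]:
    pts = random_points(n, d, rng)
    _, ops = assign_step(pts, pts[:k])  # first k points serve as initial centroids
    print(n, d, k, ops)                 # ops equals n * d * k
```

Doubling any one of the three parameters doubles the per-iteration work, which is one reason to study energy as a function of each of them separately.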
11
k-means (1)
Increasing the number of instances, with different numbers of produced clusters
12
k-means (2)
Increasing the number of attributes, with different numbers of produced clusters
13
Apriori (1)
Increasing the number of instances, with different numbers of attributes
14
Apriori (2)
Increasing the data set size, with different numbers of rules
15
Apriori (3)
Increasing the data set size, with different minimum confidence values
16
J48
Increasing the number of instances, with different numbers of attributes
17
Results on different devices
Results obtained with different smartphones:
  Sony Xperia P: 1 GHz dual-core ARM processor and 1 GB RAM
  HTC Hero: 528 MHz Qualcomm processor and 288 MB RAM
18
Results on different devices
Results obtained with different smartphones:
  Sony Xperia P: 1 GHz dual-core ARM processor and 1 GB RAM
  HTC Hero: 528 MHz Qualcomm processor and 288 MB RAM
19
Results on different devices
Results obtained with different smartphones:
  Sony Xperia P: 1 GHz dual-core ARM processor and 1 GB RAM
  HTC Hero: 528 MHz Qualcomm processor and 288 MB RAM
  Samsung Galaxy ACE: 800 MHz Qualcomm processor and 512 MB RAM
20
Concluding Remarks
Data-intensive applications call for energy cost models based on data characteristics.
This should be done for sensors, smartphones, HPC servers, and clouds and, in general, for large-scale computing systems.
Sustainable data center services and applications may benefit from these models.
Preliminary experiments show useful data.
21
Data Sets
Census (http://archive.ics.uci.edu/ml/datasets/Census+Income)
  Used with k-means
  Data set size: 14 MB
  Number of instances: 244348
  Number of attributes: 11
Census_disc (http://archive.ics.uci.edu/ml/datasets/Census+Income)
  Used with Apriori
  Data set size: 19 MB
  Number of instances: 333011
  Number of attributes: 11
Covertype (http://archive.ics.uci.edu/ml/datasets/Covertype)
  Used with J48
  Data set size: 14.5 MB
  Number of instances: 114556
  Number of attributes: 55