Energy Issues in Data Analytics
Domenico Talia, Carmela Comito
Università della Calabria & CNR-ICAR, Italy

Motivations for Taking Care of Data
- Data is everywhere (big, complex, real-time, unstructured).
- Putting data at the center of research on energy issues may bring benefits; today the focus is on algorithms.
- Cost metrics for data management techniques (communication, storage, access, query, analysis) will help professionals and users save energy in data-intensive applications.
- Energy-scalable data management is important for sustainable data science.

Data Availability or Data Deluge?
- Every life process today is data intensive.
- The information stored in digital data archives is enormous, and its size is still growing very rapidly.

Data Availability or Data Deluge?
- Some decades ago the main problem was the shortage of information.
- Now the challenge is the very large volume of information to deal with, and the associated complexity of processing it and extracting significant and useful parts or summaries.

Complex Big Problems …
- Bigger and more complex problems must be solved by using large-scale distributed computing systems.
- Data sources are ever larger and ubiquitous (Web, sensor networks, mobile devices, telescopes, …).

… and Big Data
- Even where accessible, much data in many fields cannot be read by humans.
- The huge amount of data available today therefore requires smart data analysis techniques to help people deal with it.
- Scalable algorithms, techniques, and systems are needed (time and energy scalability).

Data: From Storing to Analysis
- Storing data is not the only main problem.
- A key issue is to analyze, mine, and process data to make it useful.
Source: The Economist

Towards Models for Energy-aware Data Management
- The main focus today is on energy-aware algorithms, tasks, and applications.
- The other side of the coin is data and the cost of operating on it.
- Abstract energy-cost models for exchanging, accessing, and transforming data are primary elements of energy-aware data management at large scale.
- They are useful for sustainable data science.
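One way to picture an abstract energy-cost model for exchanging, accessing, and transforming data is a weighted sum over data volumes. The sketch below is only an illustration under that assumption: the class `EnergyCoefficients`, the function `estimate_energy`, and the coefficient values are hypothetical and do not come from the slides or from any measurement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EnergyCoefficients:
    """Hypothetical per-byte energy costs in joules (placeholders, not measurements)."""
    exchange_j_per_byte: float
    access_j_per_byte: float
    transform_j_per_byte: float


def estimate_energy(bytes_exchanged: float,
                    bytes_accessed: float,
                    bytes_transformed: float,
                    coeff: EnergyCoefficients) -> float:
    """Assumed linear model: energy grows in proportion to the data volume of each operation."""
    return (bytes_exchanged * coeff.exchange_j_per_byte
            + bytes_accessed * coeff.access_j_per_byte
            + bytes_transformed * coeff.transform_j_per_byte)


# Placeholder coefficients chosen only to make the arithmetic easy to follow.
example_coeff = EnergyCoefficients(exchange_j_per_byte=2.0,
                                   access_j_per_byte=1.0,
                                   transform_j_per_byte=0.5)
example_energy = estimate_energy(10, 20, 40, example_coeff)
```

A real model would replace the placeholder coefficients with values calibrated per device and per operation, and might not be linear at all.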

An Example: Energy-aware Mining of Data
- We evaluated the energy cost of analyzing data with some well-known data mining techniques on mobile devices.
- Our main interest was in how the energy consumed by the same technique changes as the dimensions of the data change.
- Tests varied the data set size, the number of attributes, and the number of classes.

Data Mining Techniques
- Energy characterization of data mining techniques running on mobile devices
  - k-means (data clustering)
  - J48 (data classification)
  - Apriori (association rules)
- Common performance parameters
  - Number of instances (data set size)
  - Number of attributes
- Algorithm-specific performance parameters
  - k-means: number of clusters
  - J48: decision tree size
  - Apriori: number of rules, minimum support, and minimum confidence
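The parameter sweep described above can be sketched for k-means as follows. This is a minimal illustration, not the experimental setup used by the authors: it runs a plain Lloyd's k-means on random points and counts distance evaluations as a crude stand-in for work done. The count is an assumed proxy, not an energy measurement, and the helper names (`make_points`, `kmeans_distance_count`, `sweep`) are hypothetical.

```python
import random


def make_points(n_instances, n_attributes, seed=0):
    """Generate random points; a stand-in for a real data set."""
    rng = random.Random(seed)
    return [tuple(rng.random() for _ in range(n_attributes))
            for _ in range(n_instances)]


def kmeans_distance_count(points, k, iterations=10, seed=0):
    """Run a fixed number of Lloyd iterations; return (centroids, distance evaluations)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    evaluations = 0
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Squared Euclidean distance from p to every centroid.
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            evaluations += k
            clusters[dists.index(min(dists))].append(p)
        # Recompute each centroid as the mean of its cluster; keep it if the cluster is empty.
        centroids = [
            tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
    return centroids, evaluations


def sweep(instance_counts, cluster_counts, n_attributes=11):
    """Vary the number of instances and clusters, recording the work proxy for each pair."""
    results = {}
    for n in instance_counts:
        points = make_points(n, n_attributes)
        for k in cluster_counts:
            _, evals = kmeans_distance_count(points, k)
            results[(n, k)] = evals
    return results


results = sweep([100, 200], [2, 4])
```

With a fixed iteration count the proxy grows as iterations × instances × clusters. On a real device the measured energy would also depend on memory, I/O, and hardware effects that this count ignores.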

k-means (1)
- Increasing the number of instances, with different numbers of produced clusters

k-means (2)
- Increasing the number of attributes, with different numbers of produced clusters

Apriori (1)
- Increasing the number of instances, with different numbers of attributes

Apriori (2)
- Increasing the data set size, with different numbers of rules

Apriori (3)
- Increasing the data set size, with different minimum confidence values
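For readers unfamiliar with the Apriori parameters varied in these tests, the standard definitions of support and confidence can be written in a few lines. The transactions below are made-up toy data for illustration only, and the function names are hypothetical.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Support of (antecedent and consequent) divided by support of the antecedent."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0
    joint = support(transactions, set(antecedent) | set(consequent))
    return joint / base


# Toy transactions, invented for this example.
toy_transactions = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}, {"b"}]
support_ab = support(toy_transactions, {"a", "b"})
confidence_a_to_b = confidence(toy_transactions, {"a"}, {"b"})
```

Raising the minimum support or minimum confidence prunes more candidate item sets and rules, which is why these thresholds are natural parameters to vary when studying the cost of Apriori.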

J48
- Increasing the number of instances, with different numbers of attributes

Results on different devices
- Results obtained with different smartphones
  - Sony Xperia P: 1 GHz dual-core ARM processor and 1 GB RAM
  - HTC Hero: 528 MHz Qualcomm processor and 288 MB RAM

Results on different devices
- Results obtained with different smartphones
  - Sony Xperia P: 1 GHz dual-core ARM processor and 1 GB RAM
  - HTC Hero: 528 MHz Qualcomm processor and 288 MB RAM
  - Samsung Galaxy ACE: 800 MHz Qualcomm processor and 512 MB RAM

Concluding Remarks
- Data-intensive applications demand energy cost models based on data characteristics.
- Such models should be developed for sensors, smartphones, HPC servers, and clouds, and in general for large-scale computing systems.
- Sustainable data center services and applications may benefit from these models.
- Preliminary experiments show useful data.
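As one possible starting point for an energy cost model based on data characteristics, measured energy could be fitted against a single characteristic such as the number of instances. The sketch below uses ordinary least squares on one variable; the linear form is an assumption made here for illustration, not a result from the slides, and the sample points are synthetic placeholders rather than measurements.

```python
def fit_linear(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(intercept, slope, x):
    """Estimate the modeled quantity for a new value of the data characteristic."""
    return intercept + slope * x


# Synthetic placeholder points lying exactly on y = 1 + 2x (not measured data).
sample_sizes = [1, 2, 3, 4]
sample_energy = [3, 5, 7, 9]
fitted_intercept, fitted_slope = fit_linear(sample_sizes, sample_energy)
```

In practice the fit would use energy readings collected per device and per algorithm, and the residuals would indicate whether a linear model is adequate or a richer model is needed.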

Data Sets
- Census (
  - Used with k-means
  - Data set size: 14 MB
  - Number of instances:
  - Number of attributes: 11
- Census_disc (
  - Used with Apriori
  - Data set size: 19 MB
  - Number of instances:
  - Number of attributes: 11
- Covertype (
  - Used with J48
  - Data set size: 14.5 MB
  - Number of instances:
  - Number of attributes: 55
