Zibin Zheng, Jieming Zhu, *Rung-Tsong Michael Lyu. Service-generated Big Data and Big Data-as-a-Service: An Overview. IEEE 2nd International Congress on Big Data.


Service-generated Big Data and Big Data-as-a-Service: An Overview
Zibin Zheng, Jieming Zhu, *Rung-Tsong Michael Lyu
IEEE 2nd International Congress on Big Data, June 27-July 2, 2013, Santa Clara, USA

Outline
- Introduction
- Overview
- Service-generated Big Data
  - Service Trace Logs
  - Service QoS Information
  - Service Relationship
- Big Data-as-a-Service
  - Big Data Infrastructure-as-a-Service
  - Big Data Platform-as-a-Service
  - Big Data Analytics-as-a-Service
  - Business Aspects of Big Data-as-a-Service
- Conclusion & Future Work

Introduction

Service and Big Data
[Figure: Service Economy, Service Computing, and Big Data]
- Service economy, service computing, and big data
  - Services account for more than 60% of world output (World Bank).
  - In developed countries, the share exceeds 70%.
- Modern services
  - Large numbers of services and service users
  - Service-generated data are too large and complex: volume, velocity, variety, veracity

Introduction
- In March 2012, the Obama administration announced the Big Data Research and Development Initiative.
- Leading IT companies, such as SAG, Oracle, IBM, Microsoft, SAP, and HP, have spent more than $15 billion on buying data management and analytics software.
- This industry on its own is worth more than $100 billion.


- The industry is growing at almost 10% a year, roughly twice as fast as the software business as a whole.

Introduction
Big data initiatives span four distinct dimensions:
- Volume: today's large-scale systems are awash with ever-growing data, easily amassing terabytes or even petabytes of information.
- Velocity: time-sensitive processes, such as bottleneck detection and service QoS prediction, can be carried out as data stream into the system.
- Variety: structured and unstructured data are generated in various data types, making it possible to gain new insights by analyzing these data together.
- Veracity: detecting and correcting noisy and inconsistent data is important for trustworthy analysis. Establishing trust in big data is a major challenge as the variety and number of sources grow.
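As a rough illustration of the velocity dimension, the sketch below flags a possible bottleneck while QoS records stream in, using a per-service sliding window of response times. The class name, window size, and threshold are hypothetical choices for illustration, not the method described in the talk.

```python
from collections import defaultdict, deque


class StreamingBottleneckDetector:
    """Flags services whose recent mean response time exceeds a threshold.

    Hypothetical sketch: window_size and threshold_ms are illustrative assumptions.
    """

    def __init__(self, window_size=3, threshold_ms=500.0):
        self.window_size = window_size
        self.threshold_ms = threshold_ms
        # One bounded window of recent response times per service.
        self.windows = defaultdict(lambda: deque(maxlen=self.window_size))
        # Running sum per service so each update costs O(1).
        self.sums = defaultdict(float)

    def observe(self, service, response_ms):
        """Ingest one QoS record; return True if the service looks like a bottleneck."""
        window = self.windows[service]
        if len(window) == window.maxlen:
            # Remove the oldest value from the sum before the deque evicts it.
            self.sums[service] -= window[0]
        window.append(response_ms)
        self.sums[service] += response_ms
        mean = self.sums[service] / len(window)
        # Only judge once the window is full, to avoid reacting to a single sample.
        return len(window) == window.maxlen and mean > self.threshold_ms


detector = StreamingBottleneckDetector(window_size=3, threshold_ms=500)
stream = [("A", 120), ("B", 600), ("A", 130), ("B", 700),
          ("A", 110), ("B", 650), ("B", 100)]
flags = [detector.observe(service, ms) for service, ms in stream]
# flags == [False, False, False, False, False, True, False]
```

Keeping a running sum instead of re-averaging the window keeps the per-record cost constant, which is the property that matters when records arrive continuously.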

Challenges
- Service-generated Big Data
  - The fast increase in system size and the associated massive volume of service-generated data
  - Creating value from service-generated big data
- Big Data-as-a-Service
  - Effective processing of big data within acceptable processing time
  - Easy access to the big data and the big data analysis results

Overview


Service-generated Big Data

Service-generated Big Data
Big data is generated whenever users:
- send an email
- post a microblog
- shop on e-commerce websites
- ……

Service-generated Big Data
How can service-generated data be processed and analyzed to enhance system performance?
- Service trace logs
  - Huge volume of trace logs (billions of logs daily, gigabytes of trace logs per hour)
  - Performance problems are difficult to diagnose manually
- Service QoS information
  - Large volumes of QoS data are recorded on both the server side and the user side; the volume of user-side QoS data is much larger than that of server-side QoS data.
  - QoS values of service components change dynamically over time, making user-side service QoS information grow explosively.
- Service relationship
  - Systems involve a large number of service components.
  - Service components have complex invocation relationships.

Service-generated Big Data: Service Trace Logs
How can trace logs be investigated to extract value?
- Trace log visualization
  - Log visualization provides tools for the abstract visualization of log files.
  - Many previous research investigations exist.
  - More research is needed to enable real-time processing and visualization.
- Performance problem diagnosis
  - Identify which module is the root cause.
  - Challenge: how to exploit the tremendous volume of trace logs effectively and efficiently.
  - Most previous solutions suffer from low efficiency when handling large volumes of data.
  - More efficient storage, management, and analysis approaches are required for service-generated trace logs.
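A minimal sketch of trace-based root-cause diagnosis, assuming a hypothetical plain-text trace format with one `request_id module latency_ms` record per line: for every request whose end-to-end latency exceeds a threshold, blame the module that consumed the most time, then report the module blamed most often. Real tracing systems use richer span formats and causal analysis; this only illustrates the aggregation idea.

```python
from collections import Counter, defaultdict


def parse_trace_line(line):
    """Parse a hypothetical 'request_id module latency_ms' trace record."""
    request_id, module, latency = line.split()
    return request_id, module, float(latency)


def diagnose_root_cause(lines, slow_threshold_ms):
    """Return (module, count) for the module that most often dominates slow requests."""
    spans = defaultdict(dict)  # request_id -> {module: total latency in ms}
    for line in lines:
        if not line.strip():
            continue
        request_id, module, latency = parse_trace_line(line)
        spans[request_id][module] = spans[request_id].get(module, 0.0) + latency

    culprits = Counter()
    for modules in spans.values():
        if sum(modules.values()) > slow_threshold_ms:
            # Blame the module that consumed the largest share of this slow request.
            culprits[max(modules, key=modules.get)] += 1
    return culprits.most_common(1)[0] if culprits else None


sample_logs = [
    "r1 frontend 20", "r1 auth 30", "r1 db 400",
    "r2 frontend 25", "r2 auth 35", "r2 db 40",
    "r3 frontend 15", "r3 auth 500", "r3 db 60",
    "r4 frontend 20", "r4 db 450",
]
print(diagnose_root_cause(sample_logs, slow_threshold_ms=300))  # ('db', 2)
```

Because each request is reduced to a small per-module dictionary, the same aggregation could be split across machines by partitioning on `request_id`, which is one way to address the efficiency concern raised above.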

Service-generated Big Data: Service QoS Information
Valuable information can be obtained by investigating user-side service QoS information in order to enhance system performance.
- Adaptive fault tolerance
  - Functionally equivalent Web services can be employed to build fault-tolerant service-oriented systems.
  - Server-side fault tolerance is not enough in the dynamic Internet environment; personalized user-side fault tolerance needs to be considered.
  - Online learning algorithms are needed to speed up the analysis and computation of the large volume of service QoS information.
- QoS prediction
  - Aims to provide personalized QoS value prediction for service users by employing the historical QoS values of different users.
  - A very challenging research problem: how to efficiently process the large volume of available service QoS data and accurately predict the missing QoS values in the huge user-service-time matrix.
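To make the missing-value problem concrete, here is a small sketch of one widely used technique, low-rank matrix factorization trained with stochastic gradient descent. It is an assumption for illustration, not necessarily the approach of this work, and it drops the time dimension, treating the data as a user-service matrix of response times. The toy matrix, rank, learning rate, and regularization are hypothetical.

```python
import math
import random


def factorize_qos(observed, n_users, n_services, rank=1, lr=0.01, reg=0.01,
                  epochs=2000, seed=0):
    """Learn user and service latent factors from (user, service, value) triples."""
    rng = random.Random(seed)
    users = [[rng.uniform(0.1, 0.5) for _ in range(rank)] for _ in range(n_users)]
    services = [[rng.uniform(0.1, 0.5) for _ in range(rank)] for _ in range(n_services)]
    for _ in range(epochs):
        for u, s, value in observed:
            err = value - predict(users, services, u, s)
            for k in range(rank):
                uk, sk = users[u][k], services[s][k]
                # SGD step on squared error plus L2 regularization.
                users[u][k] += lr * (err * sk - reg * uk)
                services[s][k] += lr * (err * uk - reg * sk)
    return users, services


def predict(users, services, u, s):
    """Predicted QoS value: dot product of the user and service latent vectors."""
    return sum(a * b for a, b in zip(users[u], services[s]))


# Toy response times (seconds): 3 users x 3 services; entry (0, 2) is unobserved.
observed = [
    (0, 0, 0.5), (0, 1, 1.0),
    (1, 0, 1.0), (1, 1, 2.0), (1, 2, 3.0),
    (2, 0, 1.5), (2, 1, 3.0), (2, 2, 4.5),
]
users, services = factorize_qos(observed, n_users=3, n_services=3)
missing = predict(users, services, 0, 2)  # close to 1.5 for this rank-1 toy data
train_rmse = math.sqrt(sum((v - predict(users, services, u, s)) ** 2
                           for u, s, v in observed) / len(observed))
```

The design choice here is that each user and service is summarized by a short latent vector, so the model size grows with the number of users plus services rather than with their product; at real scale the per-observation updates would also need to run online or in parallel.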

Service-generated Big Data: Service Relationship
By exploiting the service invocation graph, valuable information can be obtained through significant service component identification and service migration.
- Significant service identification
  - Helps us understand how to improve the structure and the reliability of a system.
  - Because service components are composed dynamically, the service invocation graph is continuously updated at runtime.
  - Stochastic ranking techniques can be employed to identify the significant service components in the graph of a distributed system.
- Service migration
  - Dynamic service migration moves a service from one physical machine to another at runtime.
  - By modeling and exploiting service invocation relationships and past service usage experience, a proper migration of services can improve the experience of existing users.
  - To cope with the growing size of the service migration problem, more efficient approaches are needed.
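One stochastic ranking technique that fits this description is a PageRank-style random walk over the invocation graph; the sketch below uses it as an assumed stand-in, with edges pointing from caller to callee so that components many others depend on accumulate rank. The dictionary graph format, damping factor, and component names are hypothetical.

```python
def rank_components(invocations, damping=0.85, iterations=100):
    """PageRank-style ranking over a service invocation graph (power iteration).

    invocations: hypothetical dict mapping caller -> list of invoked callees.
    Returns (component, score) pairs sorted by descending score.
    """
    nodes = set(invocations)
    for callees in invocations.values():
        nodes.update(callees)
    nodes = sorted(nodes)
    n = len(nodes)
    score = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Random jump: with probability (1 - damping) the walk restarts anywhere.
        new = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            callees = invocations.get(node, [])
            if callees:
                share = damping * score[node] / len(callees)
                for callee in callees:
                    new[callee] += share
            else:
                # Dangling component (invokes nothing): spread its score uniformly.
                for other in nodes:
                    new[other] += damping * score[node] / n
        score = new
    return sorted(score.items(), key=lambda item: item[1], reverse=True)


graph = {
    "web": ["auth", "catalog"],
    "mobile": ["auth", "catalog"],
    "auth": ["db"],
    "catalog": ["db"],
    "db": [],
}
ranking = rank_components(graph)
# ranking[0][0] == "db": every request path eventually depends on it.
```

Since the invocation graph changes at runtime, the previous scores can serve as the starting vector for the next power iteration, which usually converges faster than restarting from a uniform vector.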

Big Data-as-a-Service

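Big Data-as-a-Service includes three layers: Big Data Infrastructure-as-a-Service, Big Data Platform-as-a-Service, and Big Data Analytics-as-a-Service. The bottom layer provides the most basic services, and the higher layers provide more advanced services.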

Big Data-as-a-Service: Big Data Infrastructure-as-a-Service
- Includes
  - Storage-as-a-Service
  - Computing-as-a-Service
- Specialty
  - Storing and processing massive data
- Challenges
  - Must support many different data types
  - Must support reuse and sharing of big data
  - Technologies for processing big data have to be combined with data storage technology

Big Data-as-a-Service: Big Data Platform-as-a-Service
- Includes
  - Cloud storage
  - DaaS (Data-as-a-Service)
  - DBaaS (Database-as-a-Service)
- Feature
  - Allows users to access, analyze, and build analytic applications on top of large data sets (e.g., Google's BigQuery)

Big Data-as-a-Service: Big Data Analytics-as-a-Service
- Meaning
  - Big data analytics is the process of examining large amounts of data of various types to uncover hidden patterns, unknown correlations, and other valuable information.

Big Data-as-a-Service: Big Data Analytics-as-a-Service
- Advantages
  - Faster deployment
  - Powerful computing and storage capacity
  - Less management
  - Lower cost

Big Data-as-a-Service: Business Aspects of Big Data-as-a-Service
Business models can be divided into two types:
- The owner of the big data conducts data storage, management, and analysis, and provides Web APIs for users to access the service-generated big data or the analyzed results.
- The owner of the big data outsources the big data processing (or part of it) to a third party: it consumes the Big Data-as-a-Service provided by the third party and allows that service provider to work on the data to extract value.
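For the first business model, the owner exposes analysis results through Web APIs. The sketch below uses Python's standard `http.server` module and assumes a hypothetical `/qos?service=<name>` endpoint backed by a hypothetical pre-computed results table; routing lives in a plain function so it can be tested without starting a server.

```python
import json
from http.server import BaseHTTPRequestHandler
from urllib.parse import parse_qs, urlparse

# Hypothetical pre-computed analytics results held by the data owner.
ANALYZED_QOS = {
    "svc-a": {"avg_response_ms": 120.5, "availability": 0.998},
    "svc-b": {"avg_response_ms": 640.0, "availability": 0.951},
}


def handle_request(path, results=ANALYZED_QOS):
    """Map a GET path such as '/qos?service=svc-a' to (status, JSON body)."""
    url = urlparse(path)
    if url.path != "/qos":
        return 404, json.dumps({"error": "unknown endpoint"})
    service = parse_qs(url.query).get("service", [None])[0]
    if service is None:
        # No filter given: return all analyzed results.
        return 200, json.dumps(results, sort_keys=True)
    if service not in results:
        return 404, json.dumps({"error": f"no results for {service}"})
    return 200, json.dumps({service: results[service]}, sort_keys=True)


class AnalyticsHandler(BaseHTTPRequestHandler):
    """Thin HTTP adapter that delegates routing to handle_request."""

    def do_GET(self):
        status, body = handle_request(self.path)
        payload = body.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)
```

To serve it locally, one could pass `AnalyticsHandler` to `http.server.HTTPServer(("127.0.0.1", 8000), AnalyticsHandler)` and call `serve_forever()`; a production service would add authentication, pagination, and a real data store behind the API.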

Conclusion & Future Work

Conclusion
- In this paper
  - Three types of service-generated big data are exploited.
  - Big Data-as-a-Service is investigated to provide APIs for accessing service-generated big data and big data analytics results.
- Future work
  - More types of service-generated big data will be investigated.
  - More comprehensive studies of various service-generated big data analytics approaches will be conducted.

Thank you!