Presentation is loading. Please wait.

Presentation is loading. Please wait.

Copyright © 2016 Pearson Education, Inc. Modern Database Management 12 th Edition Jeff Hoffer, Ramesh Venkataraman, Heikki Topi CHAPTER 11: BIG DATA AND.

Similar presentations


Presentation on theme: "Copyright © 2016 Pearson Education, Inc. Modern Database Management 12 th Edition Jeff Hoffer, Ramesh Venkataraman, Heikki Topi CHAPTER 11: BIG DATA AND."— Presentation transcript:

1 Copyright © 2016 Pearson Education, Inc. Modern Database Management 12 th Edition Jeff Hoffer, Ramesh Venkataraman, Heikki Topi CHAPTER 11: BIG DATA AND ANALYTICS

2 Chapter 11 Copyright © 2016 Pearson Education, Inc. 11-2 INTRODUCTION  Big Data  Data that exist in very large volumes and many different varieties (data types) and that need to be processed at a very high velocity (speed).  Analytics  Analytics is the systematic process of transforming raw data into knowledge and insight for making better decisions.

3 Chapter 11 Copyright © 2016 Pearson Education, Inc. 11-3 CHARACTERISTICS OF BIG DATA  The Three (Five) Vs of Big Data  Volume – much larger quantity of data than typical for relational databases  Variety – lots of different data types and formats  Velocity – data comes at very fast rate (e.g. mobile sensors, web click stream)  Veracity – traditional data quality methods don’t apply; how to judge the data’s accuracy and relevance?  Value – big data is valuable to the bottom line, and for fostering good organizational actions and decisions

4 Chapter 11 Copyright © 2016 Pearson Education, Inc. 11-4 Figure 11-2 Schema on write vs. schema on read Traditional database design The big data approach

5 Chapter 11 Copyright © 2016 Pearson Education, Inc. 11-5 Figure 11-1 Examples of JSON and XML JavaScript Object Notation eXtensible Markup Language

6 Chapter 11 Copyright © 2016 Pearson Education, Inc. 11-6 NOSQL  NoSQL = Not Only SQL  A category of recently introduced data storage and retrieval technologies not based on the relational model  Scaling out (more servers) rather than scaling up (a larger server)  Natural for a cloud environment  Supports schema on read  Largely open source

7 Chapter 11 Copyright © 2016 Pearson Education, Inc. 11-7 Figure 11-3 Four types of NoSQL databases

8 Chapter 11 Copyright © 2016 Pearson Education, Inc. 11-8 MAPREDUCE  Computation model for massive parallel processing of various types of computing tasks in an environment consisting of a large number of commodity servers  Core idea – divide a computing task so that multiple nodes can work on it at the same time  Each node works on local data doing local processing.  Two stages:  Map stage – divide for local processing  Reduce stage – integrate the results of the individual map processes

9 Chapter 11 Copyright © 2016 Pearson Education, Inc. 11-9 Figure 11-6 Schematic representation of MapReduce

10 Chapter 11 Copyright © 2016 Pearson Education, Inc. 11-10 HADOOP  Hadoop  Hadoop is an open source implementation framework of MapReduce on HDFS.  Hadoop Distributed File System (HDFS)  Hadoop Distributed File System (HDFS) is a file system designed for managing a large number of potentially very large files in a highly distributed environment  HDFS is a file system, not a DBMS, not relational  Hadoop is the most talked about Big-Data data management product today

11 Chapter 11 Copyright © 2016 Pearson Education, Inc. 11-11 HADOOP DISTRIBUTED FILE SYSTEM (HDFS)  Breaks data into blocks and distributes them on various computers (nodes) throughout a Hadoop cluster  Each cluster consists of a NameNode (master server) and some DataNodes (slaves)  Overall control through YARN (“yet another resource allocator”)  No updates to existing data in files, just appending to files  Move computation to the data

12 Chapter 11 Copyright © 2016 Pearson Education, Inc. 11-12 INTEGRATED DATA ARCHITECTURE Figure 11-8 Teradata Unified Data Architecture – logical view

13 Chapter 11 Copyright © 2016 Pearson Education, Inc. 11-13  Descriptive analytics – describes the past status of the domain of interest using a variety of tools through techniques such as reporting and data visualization  Predictive analytics – applies statistical and computational methods and models to data regarding past and current events to predict what might happen in the future  Prescriptive analytics –uses results of predictive analytics along with optimization and simulation tools to recommend actions that will lead to a desired outcome TYPES OF ANALYTICS

14 Chapter 11 Copyright © 2016 Pearson Education, Inc. 11-14 ONLINE ANALYTICAL PROCESSING (OLAP)  Online Analytical Processing (OLAP)  Online Analytical Processing (OLAP) -- the use of a set of graphical tools that provides users with multidimensional views of their data and allows them to analyze the data using simple windowing techniques  Mainly for descriptive analytics  OLAP operations  Slicing a (hyper)cube  Pivoting  Drill down/Roll-up

15 Chapter 11 Copyright © 2016 Pearson Education, Inc. 11-15 Figure 11-12 Slicing a data cube Slicing, dicing, pivoting, and drill-down are useful cube operations

16 Chapter 11 Copyright © 2016 Pearson Education, Inc. 11-16 Figure 11-13 Example of drill-down Summary report Drill-down with color added Starting with summary data, users can obtain details for particular cells.

17 Chapter 11 Copyright © 2016 Pearson Education, Inc. 11-17 PREDICTIVE ANALYTICS  Predictive analytics uses statistical and computational methods for prediction based on past and current data  Data mining is the systematic process of discovering hidden patterns and knowledge in data  Data mining techniques and applications  Classification – credit evaluation, fraud detection  Clustering - market segmentation  Association - market basket analysis, Web usage analysis  (Numeric) Prediction - stock price forecasting

18 Chapter 11 Copyright © 2016 Pearson Education, Inc. 11-18


Download ppt "Copyright © 2016 Pearson Education, Inc. Modern Database Management 12 th Edition Jeff Hoffer, Ramesh Venkataraman, Heikki Topi CHAPTER 11: BIG DATA AND."

Similar presentations


Ads by Google