Download presentation
Presentation is loading. Please wait.
Published byCalvin Holt Modified over 8 years ago
1
Copyright © 2016 Pearson Education, Inc. Modern Database Management 12 th Edition Jeff Hoffer, Ramesh Venkataraman, Heikki Topi CHAPTER 11: BIG DATA AND ANALYTICS
2
Chapter 11 Copyright © 2016 Pearson Education, Inc. 11-2 INTRODUCTION Big Data Data that exist in very large volumes and many different varieties (data types) and that need to be processed at a very high velocity (speed). Analytics Analytics is the systematic process of transforming raw data into knowledge and insight for making better decisions.
3
Chapter 11 Copyright © 2016 Pearson Education, Inc. 11-3 CHARACTERISTICS OF BIG DATA The Three (Five) Vs of Big Data Volume – much larger quantity of data than typical for relational databases Variety – lots of different data types and formats Velocity – data comes at very fast rate (e.g. mobile sensors, web click stream) Veracity – traditional data quality methods don’t apply; how to judge the data’s accuracy and relevance? Value – big data is valuable to the bottom line, and for fostering good organizational actions and decisions
4
Chapter 11 Copyright © 2016 Pearson Education, Inc. 11-4 Figure 11-2 Schema on write vs. schema on read Traditional database design The big data approach
5
Chapter 11 Copyright © 2016 Pearson Education, Inc. 11-5 Figure 11-1 Examples of JSON and XML JavaScript Object Notation eXtensible Markup Language
6
Chapter 11 Copyright © 2016 Pearson Education, Inc. 11-6 NOSQL NoSQL = Not Only SQL A category of recently introduced data storage and retrieval technologies not based on the relational model Scaling out (more servers) rather than scaling up (a larger server) Natural for a cloud environment Supports schema on read Largely open source
7
Chapter 11 Copyright © 2016 Pearson Education, Inc. 11-7 Figure 11-3 Four types of NoSQL databases
8
Chapter 11 Copyright © 2016 Pearson Education, Inc. 11-8 MAPREDUCE Computation model for massive parallel processing of various types of computing tasks in an environment consisting of a large number of commodity servers Core idea – divide a computing task so that multiple nodes can work on it at the same time Each node works on local data doing local processing. Two stages: Map stage – divide for local processing Reduce stage – integrate the results of the individual map processes
9
Chapter 11 Copyright © 2016 Pearson Education, Inc. 11-9 Figure 11-6 Schematic representation of MapReduce
10
Chapter 11 Copyright © 2016 Pearson Education, Inc. 11-10 HADOOP Hadoop Hadoop is an open source implementation framework of MapReduce on HDFS. Hadoop Distributed File System (HDFS) Hadoop Distributed File System (HDFS) is a file system designed for managing a large number of potentially very large files in a highly distributed environment HDFS is a file system, not a DBMS, not relational Hadoop is the most talked about Big-Data data management product today
11
Chapter 11 Copyright © 2016 Pearson Education, Inc. 11-11 HADOOP DISTRIBUTED FILE SYSTEM (HDFS) Breaks data into blocks and distributes them on various computers (nodes) throughout a Hadoop cluster Each cluster consists of a NameNode (master server) and some DataNodes (slaves) Overall control through YARN (“yet another resource allocator”) No updates to existing data in files, just appending to files Move computation to the data
12
Chapter 11 Copyright © 2016 Pearson Education, Inc. 11-12 INTEGRATED DATA ARCHITECTURE Figure 11-8 Teradata Unified Data Architecture – logical view
13
Chapter 11 Copyright © 2016 Pearson Education, Inc. 11-13 Descriptive analytics – describes the past status of the domain of interest using a variety of tools through techniques such as reporting and data visualization Predictive analytics – applies statistical and computational methods and models to data regarding past and current events to predict what might happen in the future Prescriptive analytics –uses results of predictive analytics along with optimization and simulation tools to recommend actions that will lead to a desired outcome TYPES OF ANALYTICS
14
Chapter 11 Copyright © 2016 Pearson Education, Inc. 11-14 ONLINE ANALYTICAL PROCESSING (OLAP) Online Analytical Processing (OLAP) Online Analytical Processing (OLAP) -- the use of a set of graphical tools that provides users with multidimensional views of their data and allows them to analyze the data using simple windowing techniques Mainly for descriptive analytics OLAP operations Slicing a (hyper)cube Pivoting Drill down/Roll-up
15
Chapter 11 Copyright © 2016 Pearson Education, Inc. 11-15 Figure 11-12 Slicing a data cube Slicing, dicing, pivoting, and drill-down are useful cube operations
16
Chapter 11 Copyright © 2016 Pearson Education, Inc. 11-16 Figure 11-13 Example of drill-down Summary report Drill-down with color added Starting with summary data, users can obtain details for particular cells.
17
Chapter 11 Copyright © 2016 Pearson Education, Inc. 11-17 PREDICTIVE ANALYTICS Predictive analytics uses statistical and computational methods for prediction based on past and current data Data mining is the systematic process of discovering hidden patterns and knowledge in data Data mining techniques and applications Classification – credit evaluation, fraud detection Clustering - market segmentation Association - market basket analysis, Web usage analysis (Numeric) Prediction - stock price forecasting
18
Chapter 11 Copyright © 2016 Pearson Education, Inc. 11-18
Similar presentations
© 2024 SlidePlayer.com. Inc.
All rights reserved.