1
Big Data Management – Fall 2016
5/27/2018 Introduction Big Data Management – Fall 2016 © Philippe Bonnet, 2014
2
Outline
- Big Data Defined
  - Relational vs. Big Data Management
  - Lambda Architecture
  - Course Outline
  - Examples
- Trends Underlying Big Data
  - Application Pull
  - Business Push
  - Technology Push: Modern IT Infrastructure
© Philippe Bonnet, 2015
3
Relational Database Management
- Narrow scope
  - A database is created to serve a well-defined purpose
- Structured data
  - Conceptual/Logical/Physical schema
  - The relational model has dominated since the 1980s
  - The Entity-Relationship model defines the conceptual schema
- Closed world assumption
  - Data as an instance of the schema
  - Data that is not part of an instance does not exist
  - Any query on the database returns a value based on the current instance
- Data at rest
  - Data is loaded and stored in the database, on disk
- Fully interactive architecture
  - 3-tier architecture: Web server; App server; Database server
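A minimal sketch of the closed world assumption, using Python's built-in sqlite3 module. The `enrolled` table, its rows, and the helper names are hypothetical and only serve as illustration: a fact that is absent from the current instance is simply reported as false, and the answer changes as soon as the instance changes.

```python
import sqlite3


def build_instance():
    """Create a tiny in-memory database instance (hypothetical schema and rows)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE enrolled (student TEXT, course TEXT)")
    conn.executemany(
        "INSERT INTO enrolled VALUES (?, ?)",
        [("alice", "BDM"), ("bob", "BDM")],
    )
    conn.commit()
    return conn


def is_enrolled(conn, student, course):
    """Closed-world lookup: a fact missing from the current instance counts as false."""
    row = conn.execute(
        "SELECT COUNT(*) FROM enrolled WHERE student = ? AND course = ?",
        (student, course),
    ).fetchone()
    return row[0] > 0


conn = build_instance()
print(is_enrolled(conn, "alice", "BDM"))  # stored in the instance
print(is_enrolled(conn, "carol", "BDM"))  # not stored, so treated as false
```

Under an open world assumption, the second lookup would have to be read as "unknown" rather than "false", because a data source that knows about the missing fact might be added later.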
4
Big Data
Not a product, but a collection of processes.
[Figure labels: DBs; Docs; Feeds; Analog; Big Data Resource; Data Preparation; Data Collection; Data Cleaning; Data Integration; Extraction, Transformation, Load; Federation; Data analysis; Data mining; Data maintenance; Long-term Archival]
© Philippe Bonnet, 2014
5
What is a Big Data Resource?
A big data resource is a collection of data that is made available for analysis.
- What is data analysis?
  - Analysis of data is a process of inspecting, cleaning, transforming, and modeling data with the goal of discovering useful information, suggesting conclusions, and supporting decision making.
- For example:
  - The databases that underlie the learnit blog or FM’s Building Management System are not big data resources.
  - The data made available by Eurostat on unemployment in Europe can be considered a big data resource.
  - More examples of this kind at Google Public Data Explorer
© Philippe Bonnet, 2014
6
What is Big Data Analysis?
Need for insight based on data not currently available for analysis
- How can we set goals for energy management at the IT University?
  - We do not know how electricity is consumed.
  - We do not know where electricity is consumed.
  - We do not know how to link electricity consumption and people’s practices.
- How long does it take you to answer your mails?
© Philippe Bonnet, 2014
7
Who is involved?
[Figure labels: Data Analyst; Data Manager; Data Curator; Management; DBs; Docs; Feeds; Analog; Big Data Resource; Long-term Archival]
© Philippe Bonnet, 2014
8
Big Data Management
- Wide scope
  - Data is made available for yet-to-be-defined analysis
- Data variety
  - Time series are highly structured; text is not
- Open world assumption
  - Data sources might be added or removed
  - So any analysis is only valid with respect to the current state of the big data resource
- Data in movement, and at rest
  - Data streams complement stored data
  - Some data streams are stored, others are not
- Lambda Architecture (or a variant thereof)
  - Batch layer; serving layer; speed layer; analytics
© Philippe Bonnet, 2014
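The batch, serving, and speed layers named above can be sketched roughly as follows. This is a toy, in-memory illustration under simplifying assumptions, not how any particular system implements the Lambda Architecture: the class `LambdaSketch`, its method names, and the event keys are all hypothetical, and the view being maintained is just an event count per key.

```python
from collections import Counter


class LambdaSketch:
    """Toy Lambda Architecture that counts events per key (illustrative only)."""

    def __init__(self):
        self.master = []                # append-only master dataset (input to the batch layer)
        self.batch_view = Counter()     # serving layer: view precomputed by the batch layer
        self.realtime_view = Counter()  # speed layer: events seen since the last batch run

    def ingest(self, key):
        """New data goes both to the master dataset and to the speed layer."""
        self.master.append(key)
        self.realtime_view[key] += 1

    def run_batch(self):
        """Batch layer: recompute the view from scratch over all stored data."""
        self.batch_view = Counter(self.master)
        # Simplification: assumes no events arrive while the batch job runs.
        self.realtime_view = Counter()

    def query(self, key):
        """Analytics: merge the batch view with the real-time view."""
        return self.batch_view[key] + self.realtime_view[key]


sketch = LambdaSketch()
for event in ["bus-1", "bus-2", "bus-1"]:
    sketch.ingest(event)
print(sketch.query("bus-1"))  # answered from the speed layer only

sketch.run_batch()
sketch.ingest("bus-1")
print(sketch.query("bus-1"))  # batch view plus one recent event
```

The design point the sketch tries to show is that the batch view is always recomputable from the stored data, while the speed layer only has to cover the short window the batch layer has not processed yet.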
9
The 3 Vs
Volume, Velocity, Variety, Veracity, Validity, Viability, Value, ...
At best, the Vs are dimensions to structure:
- Non-functional requirements
- Capacity sizing
- Performance evaluation
© Philippe Bonnet, 2014
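As a rough illustration of how velocity and volume feed into capacity sizing, the arithmetic can be sketched as below. All numbers (event rate, event size, retention period, replication factor) are made-up assumptions chosen for the example, and the function names are hypothetical.

```python
SECONDS_PER_DAY = 86_400


def daily_volume_bytes(events_per_second, bytes_per_event):
    """Velocity (events/s) times event size gives the raw volume produced per day."""
    return events_per_second * bytes_per_event * SECONDS_PER_DAY


def storage_needed_bytes(events_per_second, bytes_per_event,
                         retention_days, replication_factor):
    """Raw daily volume, kept for the retention period and stored in several copies."""
    daily = daily_volume_bytes(events_per_second, bytes_per_event)
    return daily * retention_days * replication_factor


# Assumed workload: 1,000 events/s of 200 bytes each, kept one year, 3 replicas.
daily = daily_volume_bytes(1_000, 200)
total = storage_needed_bytes(1_000, 200, retention_days=365, replication_factor=3)
print(f"per day: {daily / 1e9:.2f} GB")
print(f"total:   {total / 1e12:.2f} TB")
```

A real sizing exercise would also account for compression, indexes, and derived data, which this sketch leaves out.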
10
Lambda Architecture
© Philippe Bonnet, 2015
11
Course Outline
- How is data represented?
  - Beyond Relations (lecture 3)
  - Dealing with High-Volume (lecture 4)
  - Data Integration (lecture 10)
  - Primary vs. Derived Data
- Batch processes
  - Data derivation processes; Map-reduce (lecture 5)
- Systems underlying Lambda Architecture
  - Modern IT infrastructure (lecture 1)
  - Hadoop ecosystem and Spark (exercises and lecture 6)
  - NoSQL and NewSQL (lecture 7)
  - Data Streaming Platforms (lecture 8)
  - Data Pipeline Management (lecture 9)
- Analytics 101
  - OLAP (lecture 11)
  - Big Data Mining (lecture 12)
© Philippe Bonnet, 2015
12
Big Data at City Scale
[Figure: Hitachi City Data Market]
- Actors: End-users; App developers; Technology providers, Solution providers; Data Providers; Public Administration
- Cyber-Physical System: Building instrumentation, wireless infrastructure, …
- Data Sets: City map, bus routes, meter data, …
- Data Streams: Live position of all buses, el-distribution network status, …
- Data Marketplace
- Batch Transformations: Data cleaning, machine learning inference, aggregate views, …
- Real-Time Analytics: Motion inference, pollution alerts, …
- Apps: Mobility app, Energy tracker app, …
- Digital Infrastructure (Cloud-based, IoT-based): Storage (raw and derived data), computation, security, …
© Philippe Bonnet, 2015
13
Big Data at Building Scale
[Figure labels: Light on/off events not logged; Wireless Router Monitoring; IT Dept; Building Management System; Facility Management; Learnit Log; RL; Teacher; Students?]
© Philippe Bonnet, 2014
14
Big Data at Building Scale
ITU Big Data Resources available for Analytics within and outside the IT University
[Figure labels: Light on/off events; EF; FM; Mgt; Wireless Router Monitoring; Building Management System; Learnit Log]
© Philippe Bonnet, 2014
15
Is Pokemon Go Big Data? © Philippe Bonnet, 2015
16
Search and Rescue Example
Map by David Strip based on Google Maps and OurAirports.com. See James Fallows’ article at The Atlantic. © Philippe Bonnet, 2014
17
Outline
- Big Data Defined
  - Relational vs. Big Data Management
  - Lambda Architecture
  - Course Outline
  - Examples
- Trends Underlying Big Data
  - Application Pull
  - Business Push
  - Technology Push: Modern IT Infrastructure
© Philippe Bonnet, 2015
18
Application Pull: Sense making
OLD SCHOOL
[Figure: steps along a time axis t] Questions to answer; Build conceptual model; Build a logical model; Build a physical model; Collect the data; Load the data; (Tune); Answer the questions
Time to Insight: Weeks to Months
@ Dennis Shasha and Philippe Bonnet, 2013 Source -
19
[Figure labels: Available Data; Scope of Analysis; Model; Traditional System; New System]
@ Dennis Shasha and Philippe Bonnet, 2013 Source –
20
Monitor, Mine, Manage
[Figure labels: Signal; Data; Information; Knowledge; BIG DATA; Digital Shoebox; Transform & Load; Information Production; Model Generation; Knowledge Application. Axes: Structure / Value; Effort / Latency; Time to Insight (t)]
@ Dennis Shasha and Philippe Bonnet, 2013 Source:
21
Business Push: Data Growth
@ Dennis Shasha and Philippe Bonnet, 2013 Source:
22
Source: @ Dennis Shasha and Philippe Bonnet, 2013
23
Source: http://blogs. technet
© Philippe Bonnet, 2015
24
Technology Push: Warehouse-Scale Computer
LOOK UP: Werner Vogels on virtualization. @ Dennis Shasha and Philippe Bonnet, 2013 Source:
25
Technology Push: Storage
[Figure: SSD Architecture]
Sources: Virtual Geek’s take on storage tree of life; M. Wei et al. I/O Speculation for the Microsecond Era. USENIX ATC ’14.
© Philippe Bonnet, 2014
26
Storage Architectures
[Figure labels: Storage; RAM; Interconnect]
source: Virtual Geek’s take on storage tree of life – A MUST READ!!
© Philippe Bonnet, 2014
27
Data-Intensive Applications: Server-side Architectures
Look up Fabric Computing on Wikipedia. source: Virtual Geek’s take on storage tree of life – A MUST READ!! © Philippe Bonnet, 2014
28
© Philippe Bonnet, 2016
29
Take Away Points
- Big data is not a product but a collection of processes centered around big data resources
  - Collections of data made available for analysis
- Primary focus on the data manager (less on the data analyst)
  - Not a data mining class
- Lambda Architecture is a good way to organise Big Data management
- Evolution of storage helps structure application architectures and the database landscape
© Philippe Bonnet, 2014