Big Data Management – Fall 2016


1 Big Data Management – Fall 2016
Introduction. 5/27/2018. © Philippe Bonnet, 2014

2 Outline
Big Data Defined: Relational vs. Big Data Management; Lambda Architecture; Course Outline; Examples.
Trends Underlying Big Data: Application Pull; Business Push; Technology Push (Modern IT Infrastructure).
© Philippe Bonnet, 2015

3 Relational Database Management
Narrow scope: a database is created to serve a well-defined purpose.
Structured data: conceptual, logical, and physical schemas. The relational model has dominated since the 1980s; an entity-relationship model defines the conceptual schema.
Closed-world assumption: data is an instance of the schema. Data that is not part of the instance does not exist, so any query on the database returns a value based on the current instance.
Data at rest: data is loaded and stored in the database, on disk.
Fully interactive architecture: three tiers (web server, app server, database server).
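The closed-world assumption above can be illustrated with a small sketch. It assumes a hypothetical `enrollment` table and made-up names; neither comes from the slides. The point is only that a missing row is read as "false", not as "unknown".

```python
import sqlite3


def make_db():
    """Build a tiny in-memory database (hypothetical schema and rows)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE enrollment (student TEXT, course TEXT)")
    conn.executemany(
        "INSERT INTO enrollment VALUES (?, ?)",
        [("alice", "BDM"), ("bob", "BDM")],
    )
    return conn


def enrolled(conn, student, course):
    """Closed-world lookup: the absence of a row means 'not enrolled'."""
    row = conn.execute(
        "SELECT 1 FROM enrollment WHERE student = ? AND course = ?",
        (student, course),
    ).fetchone()
    return row is not None


if __name__ == "__main__":
    conn = make_db()
    print(enrolled(conn, "alice", "BDM"))
    print(enrolled(conn, "carol", "BDM"))
```

The answer for a student who has no row is derived purely from the current instance, which is exactly what the relational setting takes for granted.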

4 Big Data
Not a product, but a collection of processes.
[Figure labels: Big Data Resource; Data Collection; Data Cleaning; Extraction, Transfer, Load; Federation; DBs; Docs; Feeds; Analog; Data analysis; Data mining; Long-term Archival; Data maintenance; Data Preparation; Data Integration]

5 What is a Big Data Resource?
A big data resource is a collection of data that is made available for analysis.
What is data analysis? Analysis of data is a process of inspecting, cleaning, transforming, and modeling data with the goal of discovering useful information, suggesting conclusions, and supporting decision making.
For example:
The databases that underlie the learnit blog or FM’s Building Management System are not big data resources.
The data made available by Eurostat on unemployment in Europe can be considered a big data resource. More examples of this kind are available at Google Public Data Explorer.
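The four steps in the definition of data analysis (inspecting, cleaning, transforming, modeling) can be sketched as a tiny pipeline. This is a minimal illustration, not a method from the course: the record layout (`room`, `kwh`), the readings, and the choice of a per-room mean as the "model" are all made-up assumptions.

```python
from statistics import mean

# Hypothetical meter readings; the values are invented for illustration only.
READINGS = [
    {"room": "A", "kwh": 2.0},
    {"room": "A", "kwh": 4.0},
    {"room": "B", "kwh": None},   # missing value
    {"room": "B", "kwh": 1.0},
    {"room": "B", "kwh": -5.0},   # physically implausible value
]


def inspect(rows):
    """Inspect: count the rows that have a missing reading."""
    return sum(1 for r in rows if r["kwh"] is None)


def clean(rows):
    """Clean: drop rows with missing or negative readings."""
    return [r for r in rows if r["kwh"] is not None and r["kwh"] >= 0]


def transform(rows):
    """Transform: group the readings by room."""
    by_room = {}
    for r in rows:
        by_room.setdefault(r["room"], []).append(r["kwh"])
    return by_room


def model(by_room):
    """Model: summarise each room by its mean reading (a deliberately simple model)."""
    return {room: mean(values) for room, values in by_room.items()}


if __name__ == "__main__":
    print("missing readings:", inspect(READINGS))
    print("mean kWh per room:", model(transform(clean(READINGS))))
```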

6 What is Big Data Analysis?
Need for insight based on data not currently available for analysis.
How can we set goals for energy management at the IT University? We do not know how electricity is consumed. We do not know where electricity is consumed. We do not know how to link electricity consumption and people’s practices.
How long does it take you to answer your mails?

7 Who is involved?
[Figure labels: Data Analyst; Data Manager; Data Curator; Management; Big Data Resource; DBs; Docs; Feeds; Analog; Long-term Archival]

8 Big Data Management
Wide scope: data is made available for yet-to-be-defined analysis.
Data variety: time series are highly structured; text is not.
Open-world assumption: data sources might be added or removed, so any analysis is only valid with respect to the current state of the big data resource.
Data in movement and at rest: data streams complement stored data. Some data streams are stored, others are not.
Lambda Architecture (or a variant thereof): batch layer; serving layer; speed layer; analytics.
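The three layers named for the Lambda Architecture can be sketched in a few lines. This is a toy, single-process illustration under strong assumptions: the class name `LambdaSketch`, the event keys, and the counting workload are hypothetical, and a real deployment would run each layer on separate systems with a batch job that takes time to complete.

```python
from collections import Counter


class LambdaSketch:
    """Toy event counter organised along batch, serving, and speed layers."""

    def __init__(self):
        self.master = []                # append-only master dataset (batch layer input)
        self.batch_view = Counter()     # precomputed view exposed by the serving layer
        self.realtime_view = Counter()  # speed layer: events not yet covered by a batch run

    def ingest(self, key):
        """New data goes to both the master dataset and the speed layer."""
        self.master.append(key)
        self.realtime_view[key] += 1

    def run_batch(self):
        """Batch layer: recompute the view from all raw data, then reset the speed layer.

        Simplification: the recomputation is instantaneous here, so the whole
        real-time view can be discarded once the batch view is rebuilt.
        """
        self.batch_view = Counter(self.master)
        self.realtime_view.clear()

    def query(self, key):
        """Answer a query by merging the batch view with the real-time view."""
        return self.batch_view[key] + self.realtime_view[key]


if __name__ == "__main__":
    system = LambdaSketch()
    system.ingest("bus42")
    system.ingest("bus42")
    system.run_batch()
    system.ingest("bus42")
    print(system.query("bus42"))
```

Recomputing the batch view from the raw master dataset, instead of updating it in place, is the design choice that lets stored data and data streams coexist: errors in the speed layer are overwritten by the next batch run.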

9 The 3 Vs
Volume, Velocity, Variety, Veracity, Validity, Viability, Value, ...
At best, the Vs are dimensions to structure: non-functional requirements; capacity sizing; performance evaluation.
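As a rough illustration of how velocity and volume feed capacity sizing, the arithmetic below turns an event rate into a storage estimate. The function names, the parameters, and the example figures are assumptions chosen for the sketch; they are not taken from the slides.

```python
SECONDS_PER_DAY = 86_400


def daily_volume_bytes(events_per_second, bytes_per_event):
    """Velocity to volume: bytes produced per day at a steady event rate."""
    return events_per_second * bytes_per_event * SECONDS_PER_DAY


def storage_needed_bytes(events_per_second, bytes_per_event,
                         retention_days, replication=1):
    """Volume to capacity: raw storage for a retention window and replication factor."""
    per_day = daily_volume_bytes(events_per_second, bytes_per_event)
    return per_day * retention_days * replication


if __name__ == "__main__":
    # Hypothetical workload: 1,000 events/s of 100 bytes, kept 30 days, 3 replicas.
    total = storage_needed_bytes(1_000, 100, retention_days=30, replication=3)
    print(f"{total / 1e9:.1f} GB")
```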

10 Lambda Architecture

11 Course Outline
How is data represented? Beyond Relations (lecture 3); Dealing with High-Volume (lecture 4); Data Integration (lecture 10); Primary vs. Derived Data.
Batch processes: Data derivation processes; Map-reduce (lecture 5).
Systems underlying Lambda Architecture: Modern IT infrastructure (lecture 1); Hadoop ecosystem and Spark (exercises and lecture 6); NoSQL and NewSQL (lecture 7); Data Streaming Platforms (lecture 8); Data Pipeline Management (lecture 9).
Analytics 101: OLAP (lecture 11); Big Data Mining (lecture 12).

12 Big Data at City Scale
[Figure: Hitachi City Data Market]
Actors: End-users; App developers; Technology providers; Solution providers; Data Providers; Public Administration.
Cyber-Physical System: building instrumentation, wireless infrastructure, …
Data Sets: city map, bus routes, meter data, …
Data Streams: live position of all buses, electricity distribution network status, …
Data Marketplace.
Batch Transformations: data cleaning, machine learning inference, aggregate views, …
Real-Time Analytics: motion inference, pollution alerts, …
Apps: mobility app, energy tracker app, …
Digital Infrastructure (cloud-based, IoT-based): storage (raw and derived data), computation, security, …

13 Big Data at Building Scale
[Figure labels: Light on/off events not logged; Wireless Router Monitoring; IT Dept; Building Management System; Facility Management; Learnit Log; RL; Teacher; Students?]

14 Big Data at Building Scale
ITU Big Data Resources available for analytics within and outside the IT University.
[Figure labels: Light on/off events; EF; FM; Mgt; Wireless Router Monitoring; Building Management System; Learnit Log]

15 Is Pokemon Go Big Data?

16 Search and Rescue Example
Map by David Strip, based on Google Maps and OurAirports.com. See James Fallows’ article at The Atlantic.

17 Outline
Big Data Defined: Relational vs. Big Data Management; Lambda Architecture; Course Outline; Examples.
Trends Underlying Big Data: Application Pull; Business Push; Technology Push (Modern IT Infrastructure).

18 Application Pull: Sense making
OLD SCHOOL. Time to insight: weeks to months.
[Figure: steps along a time axis (t →): Questions to answer; Build conceptual model; Build a logical model; Build a physical model; Collect the data; Load the data; (Tune); Answer the questions]
@ Dennis Shasha and Philippe Bonnet, 2013 Source -

19 @ Dennis Shasha and Philippe Bonnet, 2013
[Figure labels: Available Data; Scope of Analysis; Model; Traditional System; New System]
Source –

20 @ Dennis Shasha and Philippe Bonnet, 2013
Monitor, Mine, Manage
[Figure labels: Knowledge; Application; Model Generation; Structure / Value; Information; BIG DATA; Data; Transform & Load; Information Production; Signal; Digital Shoebox. Axes: t → Time to Insight; Effort / Latency]
Source:

21 Business Push: Data Growth
@ Dennis Shasha and Philippe Bonnet, 2013 Source:

22 @ Dennis Shasha and Philippe Bonnet, 2013
Source:

23 Source: http://blogs. technet
© Philippe Bonnet, 2015

24 Technology Push: Warehouse-Scale Computer
Look up: Werner Vogels on virtualization. @ Dennis Shasha and Philippe Bonnet, 2013 Source:

25 Technology Push: Storage
[Figure: SSD Architecture]
Sources: Virtual Geek’s take on the storage tree of life; M. Wei et al., "I/O Speculation for the Microsecond Era", USENIX ATC ’14.

26 Storage Architectures
[Figure labels: Storage; RAM; Interconnect]
Source: Virtual Geek’s take on the storage tree of life (a must read!).

27 Data-Intensive Applications: Server-side Architectures
Look up Fabric Computing on Wikipedia.
Source: Virtual Geek’s take on the storage tree of life (a must read!).

28 © Philippe Bonnet, 2016

29 Take Away Points
Big data is not a product but a collection of processes centered around big data resources: collections of data made available for analysis.
Primary focus on the data manager (less on the data analyst). This is not a data mining class.
Lambda Architecture is a good way to organise Big Data management.
The evolution of storage helps structure application architectures and the database landscape.

