Big Data Management – Fall 2016

Slides:



Advertisements
Similar presentations
Distributed Data Processing
Advertisements

Big Data Management and Analytics Introduction Spring 2015 Dr. Latifur Khan 1.
Observation Pattern Theory Hypothesis What will happen? How can we make it happen? Predictive Analytics Prescriptive Analytics What happened? Why.
An Information Architecture for Hadoop Mark Samson – Systems Engineer, Cloudera.
MS DB Proposal Scott Canaan B. Thomas Golisano College of Computing & Information Sciences.
NIST BIG DATA WG Reference Architecture Subgroup Meeting Agenda Co-chairs: Orit Levin (Microsoft) James Ketner (AT&T) Don Krapohl (Augmented Intelligence)
Data Warehousing: Defined and Its Applications Pete Johnson April 2002.
David Besemer, CTO On Demand Data Integration with Data Virtualization.
Chapter 1 Overview of Databases and Transaction Processing.
This presentation was scheduled to be delivered by Brian Mitchell, Lead Architect, Microsoft Big Data COE Follow him Contact him.
Analytics Map Reduce Query Insight Hive Pig Hadoop SQL Map Reduce Business Intelligence Predictive Operational Interactive Visualization Exploratory.
` tuplejump The data engineering platform. A startup with a vision to simplify data engineering and empower the next generation of data powered miracles!
Introduction. Outline What is database tuning What is changing The trends that impact database systems and their applications What is NOT changing The.
Ch 5. The Evolution of Analytic Processes
© 2012 IBM Corporation IBM Security Systems 1 © 2013 IBM Corporation 1 Ecommerce Antoine Harfouche.
Right In Time Presented By: Maria Baron Written By: Rajesh Gadodia
Decision Support and Date Warehouse Jingyi Lu. Outline Decision Support System OLAP vs. OLTP What is Date Warehouse? Dimensional Modeling Extract, Transform,
1 Computing Challenges for the Square Kilometre Array Mathai Joseph & Harrick Vin Tata Research Development & Design Centre Pune, India CHEP Mumbai 16.
NIST BIG DATA WG Reference Architecture Subgroup Intermediate Report Co-chairs: Orit Levin (Microsoft) James Ketner (AT&T) Don Krapohl (Augmented Intelligence)
1 CS851 Data Services in Advanced System Applications Sang H. Son
Joe Caserta President Elliott Cordo Chief Architect September 30, 2015, Javits Center, New York City Building a Data Lake for Digital Music Dominance.
NIST BIG DATA WG Reference Architecture Subgroup Draft Co-chairs: Orit Levin (Microsoft) James Ketner (AT&T) Don Krapohl (Augmented Intelligence) August.
IoT Meets Big Data Standardization Considerations
Big Data RA Topics 1 Industries Data Characteristics “V”s Curation Processing Changes E, T, L Scalable Infrastructure Management Security Data Sources.
Smart Grid Big Data: Automating Analysis of Distribution Systems Steve Pascoe Manager Business Development E&O - NISC.
Copyright © 2016 Pearson Education, Inc. Modern Database Management 12 th Edition Jeff Hoffer, Ramesh Venkataraman, Heikki Topi CHAPTER 11: BIG DATA AND.
Microsoft Ignite /28/2017 6:07 PM
Big Data-An Analysis. Big Data: A definition Big data is a collection of data sets so large and complex that it becomes difficult.
Data Analytics (CS40003) Introduction to Data Lecture #1
DATA Storage and analytics with AZURE DATA LAKE
Building Automation System Integration into a Cloud-based BIM-FM Model
Connected Infrastructure
  Choice Hotels’ journey to better understand its customers through self-service analytics Narasimhan Sampath & Avinash Ramineni Strata Hadoop World |
Big Data Enterprise Patterns
5/9/2018 7:28 AM © Microsoft Corporation. All rights reserved. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS.
Connected Living Connected Living What to look for Architecture
Smart Building Solution
ANOMALY DETECTION FRAMEWORK FOR BIG DATA
BIG DATA IN ENGINEERING APPLICATIONS
Enable the Hybrid Data Platform
Smart Building Solution
Connected Living Connected Living What to look for Architecture
Chapter 13 The Data Warehouse
Connected Infrastructure
Datamining : Refers to extracting or mining knowledge from large amounts of data Applications : Market Analysis Fraud Detection Customer Retention Production.
Data Warehouse.
Software Architecture in Practice
Shubha Vijayasarathy Program Manager, Azure Event Hubs - Microsoft
9/21/2018 3:41 AM BRK3180 Architect your big data solutions with SQL Data Warehouse & Azure Analysis Services Josh Caplan & Matt Usher Program Managers.
Challenges and Opportunities in a Data-Driven World
Federico Perrero – Plant Manager
Celtic-Plus Proposers Day 20th June 2017, Helsinki
Big Data - in Performance Engineering
Data Warehouse and OLAP
Data Warehousing and Data Mining
Data Warehouse Overview September 28, 2012 presented by Terry Bilskie
Near Real Time ETLs with Azure Serverless Architecture
Ch 4. The Evolution of Analytic Scalability
Syllabus and Introduction Keke Chen
Database Systems Instructor Name: Lecture-3.
Technical Capabilities
Introduction of Week 9 Return assignment 5-2
Artem A. Nazarenko, Joao Sarraipa, Paulo Figueiras,
Big Data Analysis in Digital Marketing
Data Warehousing Concepts
Agenda Need of Cloud Computing What is Cloud Computing
Big DATA.
CRM DMP – a marriage of two acronyms
Data Warehouse and OLAP
Big Data.
Presentation transcript:

Big Data Management – Fall 2016 5/27/2018 Introduction Big Data Management – Fall 2016 © Philippe Bonnet, 2014

Outline Big Data Defined Trends Underlying Big Data Relational vs. Big Data Management Lambda Architecture Course Outline Examples Trends Underlying Big Data Application Pull Business Push Technology Push: Modern IT Infrastructure © Philippe Bonnet, 2015

Relational Database Management Narrow scope A database is created to serve a well defined purpose Structured data Conceptual/Logical/Physical schema Relational model dominates since 80s Entity Relationship defines conceptual schema Close world assumption Data as an instance of the schema The data which is not part of an instance does not exist Any query on the database returns a value based on the current instance Data at rest Data is loaded and stored in the database, on disk. Fully interactive architecture 3 tier architecture: Web server; App Server; Database Server.

Not a product, but a collection of processes. Big Data Not a product, but a collection of processes. Big Data Resource Data Collection Data Cleaning Extraction, Transfer, Load Federation DBs Docs Feeds Analog Data analysis Data mining Long-term Archival Data maintenance Data Preparation Data Integration © Philippe Bonnet, 2014

What is a Big Data Resource? A big data resource is a collection of data which is made available for analysis What is data analysis? Analysis of data is a process of inspecting, cleaning, transforming, and modeling data with the goal of discovering useful information, suggesting conclusions, and supporting decision making. For example: The databases that underly the learnit blog or FM’s Building Management System are not big data resources The data made available by Eurostat on unemployment in Europe can be considered a big data resource. More examples of this kind at Google Public Data Explorer © Philippe Bonnet, 2014

What is Big Data Analysis? Need for insight based on data not currently available for analysis How can we set goals for energy management at the IT University? We do not know how electricity is consumed. We do not know where electricity is consumed. We do not know how to link electricity consumption and people’s practices. How long does it take you to answer your mails? © Philippe Bonnet, 2014

Who is involved? Data Analyst Data Manager DBs Big Data Resource Docs Feeds Analog Long-term Archival Data Curator Management © Philippe Bonnet, 2014

Big Data Management Wide scope Data Variety Open world assumption Data is made available for yet-to-be-defined analysis Data Variety Time series are highly structured; Text is not Open world assumption Data sources might be added or removed So any analysis is only valid based on the current state of the big data resource Data in movement, and at rest Data streams complements stored data Some data streams are stored, others are not Lambda Architecture (or variant thereof) Batch layer; serving layer; speed layer; Analytics © Philippe Bonnet, 2014

The 3 Vs Volume, Velocity, Variety, Veracity, Validity, Viability, Value, ... At best, the Vs are dimensions to structure: Non-functional requirements Capacity sizing Performance evaluation © Philippe Bonnet, 2014

Lambda Architecture © Philippe Bonnet, 2015 http://www.drdobbs.com/database/applying-the-big-data-lambda-architectur/240162604 © Philippe Bonnet, 2015

Course Outline How is data represented? Batch processes Beyond Relations (lecture 3) Dealing with High-Volume (lecture 4) Data Integration (lecture 10) Primary vs. Derived Data Batch processes Data derivation processes; Map-reduce (lecture 5) Systems underlying Lambda Architecture Modern IT infrastructure (lecture 1) Hadoop ecosystem and Spark (exercises and lecture 6) NoSQL and NewSQL (lecture 7) Data Streaming Platforms (lecture 8) Data Pipeline Management (lecture 9) Analytics 101 OLAP (lecture11) Big Data Mining (lecture 12) © Philippe Bonnet, 2015

Big Data at City Scale End-users App developers Technology providers, Solution providers Cyber-Physical System Building instrumentation, wireless infrastructure, … Data Sets City map, bus routes, meter data, … Data Streams Live position of all buses, el-distribution network status,, … Data Marketplace Batch Transformations Real-Time Analytics Apps Data cleaning, machine learning inference, aggregate views, … Motion inference, pollution alerts, … Mobility app, Energy tracker app, … Storage (raw and derived data), computation, security, … Digital Infrastructure Cloud-based IoT-based Digital Infrastructure Data Providers Data Providers Public Administration © Philippe Bonnet, 2015 Hitachi City Data Market

Big Data at Building Scale Light on/off events not logged Big Data at Building Scale Wireless Router Monitoring IT Dept Building Management System Facility Management Learnit Log RL Teacher Students? © Philippe Bonnet, 2014

Big Data at Building Scale Light on/off events Big Data at Building Scale EF FM Mgt Wireless Router Monitoring Building Management System ITU Big Data Resources available for Analytics within and outside the IT University Learnit Log © Philippe Bonnet, 2014

Is Pokemon Go Big Data? © Philippe Bonnet, 2015

Search and Rescue Example Map by David Strip based on Google maps and OurAirports.com See Jame Fallows’article at the Atlantic. © Philippe Bonnet, 2014

Outline Big Data Defined Trends Underlying Big Data Relational vs. Big Data Management Lambda Architecture Course Outline Examples Trends Underlying Big Data Application Pull Business Push Technology Push: Modern IT Infrastructure © Philippe Bonnet, 2015

Application Pull: Sense making Build conceptual model Build a physical model Answer the questions Questions to answer Build a logical model Collect the data Load the data (Tune) t  Time to Insight: Weeks to Months OLD SCHOOL @ Dennis Shasha and Philippe Bonnet, 2013 Source - http://www.youtube.com/watch?v=2YWvmBtEymE

@ Dennis Shasha and Philippe Bonnet, 2013 Available Data Scope of Analysis Model Available Data Model Traditional System Model Traditional System Model Model New System Model @ Dennis Shasha and Philippe Bonnet, 2013 Source – http://www.vldb.org/2011/files/slides/keynotes/campbell_keynote.pptx

@ Dennis Shasha and Philippe Bonnet, 2013 Monitor, Mine, Manage Knowledge Application Knowledge Application Knowledge Knowledge Model Generation Structure / Value Information Information BIG DATA Data Data Transform & Load Information Production Signal Digital Shoebox t  Time to Insight Effort / Latency @ Dennis Shasha and Philippe Bonnet, 2013 Source: http://www.vldb.org/pvldb/vol4/p694-campbell.pdf

Business Push: Data Growth @ Dennis Shasha and Philippe Bonnet, 2013 Source: http://permabit.com/media-center/blogs/

@ Dennis Shasha and Philippe Bonnet, 2013 Source: https://practicalanalytics.wordpress.com/2011/11/06/explaining-hadoop-to-management-whats-the-big-data-deal/ @ Dennis Shasha and Philippe Bonnet, 2013

Source: http://blogs. technet © Philippe Bonnet, 2015

Technology Push: Warehouse-Scale Computer LOOK UP: Werner Voegels on virtualization. @ Dennis Shasha and Philippe Bonnet, 2013 Source: http://www.morganclaypool.com/doi/abs/10.2200/S00193ED1V01Y200905CAC006

Technology Push: Storage source: Virtual Geek’s take on storage tree of life M.Wei et al. I/O speculation in the microsecond era. Usenix ATC’14. SSD Architecture © Philippe Bonnet, 2014

Storage Architectures source: Virtual Geek’s take on storage tree of life – A MUST READ!! Storage RAM Interconnect © Philippe Bonnet, 2014

Data-Intensive Applications: Server-side Architectures Look up Fabric Computing on Wikipedia. source: Virtual Geek’s take on storage tree of life – A MUST READ!! © Philippe Bonnet, 2014

© Philippe Bonnet, 2016

Take Away Points Big data is not a product but a collection of processes centered around big data resources Collections of data made available for analysis Primary focus on data manager (less on data analyst) Not a data mining class Lambda Architecture is a good way to organise Big Data management Evolution of storage helps structure application architectures and database landscape © Philippe Bonnet, 2014