Processing Temporal Telemetry Data -aka- Storing BigData in a Small Space =tg= Thomas H. Grohser, SQL Server MVP, Senior Director - Technical Solutions.

Presentation transcript:

Processing Temporal Telemetry Data -aka- Storing BigData in a Small Space
=tg= Thomas H. Grohser, SQL Server MVP, Senior Director - Technical Solutions Architecture, NTT Data, Inc
for SQL Saturday #437 BI Edition, Boston, MA
Microsoft Conference Center, Kendall Square, 1 Cambridge Center, Cambridge, MA 02142

select * from =tg= where topic =
=tg= Thomas Grohser, NTT DATA
• Focus on SQL Server Security, Performance Engineering, Infrastructure and Architecture
• New Papers coming late 2015
• Close Relationship with
  • SQLCAT (SQL Server Customer Advisory Team)
  • SCAN (SQL Server Customer Advisory Network)
  • TAP (Technology Adoption Program)
  • Product Teams in Redmond
• Active PASS member and PASS Summit Speaker
• 21 Years with SQL Server

And one more thing … All I know about BI is how to not install it

NTT DATA in North America
• 20,000 professionals – Optimizing balanced global delivery
• $1.6B – Annual revenues with history of above-market growth
• Long-term relationships – >1,000 clients; mid-market to large enterprise
• Delivery excellence – Enabled by process maturity, tools and accelerators
• Flexible engagement – Spans consulting, staffing, managed services, outsourcing, and cloud
• Industry expertise – Driving depth in select industry verticals
NTT DATA North America Headquarters, Plano Texas
© 2015 NTT DATA, Inc.

Agenda
• What is Temporal Telemetry data
• Collecting it in staging
• Cleaning it up
• Optimizing and saving it in a fact table
• Questions

Drawing at the end of the session
• Drop your business card, or fill out the provided blank card, and drop it in the box
• Must be present at the time of the drawing at the end of the session to win: ½ day of free consulting

The good and the bad
• Good: Everything I show today can be done in BI and Standard edition
• Bad: You need to understand your data to do it. There is no “one shoe fits all the problems” solution

What is temporal telemetry data Lots of

What is temporal telemetry data
• Data points collected over time
  • CPU utilization
  • Database size
  • Page expressions
  • Sales
  • Heartrate
  • Weight
  • Speed
  • Temperature
  • Stock prices
  • …

How to store it efficiently
• ETL with a lot of emphasis on the T
• Clean up the data
• Put the logic in the data, not into a complex query
• Store the data in the form it is needed, not in the form that is most convenient

Temporal data simplified
• Key
• Time
• Value
Legend: Primary key, Payload data

Collecting data in a staging area
• Source
• Type/SubType
• Event Time
• Value
Legend: Primary key, Payload data
Data is in a messy state (missing data, duplicates, …) and most likely not in the shape we need for reporting.

Cleaning up the data
• Replace the time with the ID from a Time Dimension
• Replace the source/type with an ID from a Key Dimension
• Handle duplicates
• Handle gaps

Time Dimension
• Avoid storing the datetime in the table.
• Otherwise you need functions to extract the month or year in every query, and that kills query performance.
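A minimal sketch of what such a time dimension could look like, written in Python rather than T-SQL so it can be run anywhere. The idea is that the date parts are computed once, when the dimension is built, so reports can filter on plain columns instead of calling date functions per row. `TimeID` and `DateTime` follow the column names used in the reporting query of this deck; the other column names, the one-minute granularity and the function name are assumptions made for illustration.

```python
from datetime import datetime, timedelta

def build_time_dimension(start, end, step):
    """Return one row per sample slot in [start, end) with date parts precomputed."""
    rows = []
    time_id = 1
    current = start
    while current < end:
        rows.append({
            "TimeID": time_id,        # surrogate key stored in the fact table
            "DateTime": current,
            "Year": current.year,     # precomputed, so no function call at query time
            "Month": current.month,
            "Day": current.day,
            "Hour": current.hour,
            "Minute": current.minute,
        })
        time_id += 1
        current += step
    return rows

# One day at one-minute granularity (assumed granularity)
time_dim = build_time_dimension(
    datetime(2015, 1, 1), datetime(2015, 1, 2), timedelta(minutes=1)
)
```

A fact row then stores only the small integer `TimeID`, and a filter such as "all samples from January" becomes a comparison on the `Month` column of the dimension.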

Key Dimension
• If the source is simple, like a sensor ID, you can leave it as it is.
• If it's more complex, like (Computer Name, SQL Instance Name, NUMA Node, CPU, Utilization), or just a very long key, then creating a Key dimension and storing a reference to it is a great idea.
• Remember we are storing temporal data (we expect many values over time for the same key).
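The key dimension idea can be sketched as a lookup that hands out one small integer per distinct composite key. This is only an illustration in Python: the class name, the method name and the sample server names are hypothetical, and in a database this would be a dimension table with an identity column.

```python
class KeyDimension:
    """Maps a composite source key to a small surrogate ID (hypothetical helper)."""

    def __init__(self):
        self._ids = {}

    def get_id(self, *parts):
        key = tuple(parts)
        if key not in self._ids:
            # First time this key is seen: assign the next ID
            self._ids[key] = len(self._ids) + 1
        return self._ids[key]

key_dim = KeyDimension()
# (Computer Name, SQL Instance Name, NUMA Node, CPU, Utilization); made-up values
a = key_dim.get_id("SRV01", "INST1", 0, 3, "Utilization")
b = key_dim.get_id("SRV01", "INST1", 0, 4, "Utilization")
c = key_dim.get_id("SRV01", "INST1", 0, 3, "Utilization")  # same key as a
```

Because the same key shows up in many samples over time, every fact row saves the width of the full composite key and stores only the integer.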

Dealing with duplicates
• Depending on the data:
  • Ignore all but last
  • Ignore all but first
  • Average
  • Min
  • Max
  • …
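The strategies listed above can be sketched as one function with a pluggable rule. This is an assumed shape, not the speaker's implementation: rows are taken to be `(key_id, time_id, value)` tuples in arrival order, and the names `STRATEGIES` and `resolve_duplicates` are made up for the example.

```python
from collections import defaultdict
from statistics import mean

# One rule per strategy; each receives the values of one (key, time) slot in arrival order
STRATEGIES = {
    "first": lambda values: values[0],
    "last": lambda values: values[-1],
    "average": mean,
    "min": min,
    "max": max,
}

def resolve_duplicates(rows, strategy):
    """Collapse rows sharing the same (key_id, time_id) into a single value."""
    grouped = defaultdict(list)
    for key_id, time_id, value in rows:
        grouped[(key_id, time_id)].append(value)
    pick = STRATEGIES[strategy]
    return {slot: pick(values) for slot, values in grouped.items()}

# Made-up staging rows: the slot (1, 10) was delivered twice
staged = [(1, 10, 40.0), (1, 10, 50.0), (1, 11, 42.0)]
kept_last = resolve_duplicates(staged, "last")
averaged = resolve_duplicates(staged, "average")
```

Which rule is right depends on the data: for a counter that is resent after a correction, "last" is usually what you want, while for a noisy sensor an average may be more honest.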

Dealing with missing data
• Depending on the data:
  • Make it 0 (or, more generally, a fixed value x)
  • Carry forward the last collected value
  • Carry forward the average/min/max of the last n values
  • Average between the last n and the next m values
  • Use the next value
  • Use the average of the next n values
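Two of these gap rules, sketched in Python under the assumption that the samples of one key are held as a dict from `TimeID` to value and that time IDs are consecutive integers. The function names and the `default` parameter are hypothetical.

```python
def fill_constant(samples, first_id, last_id, x=0):
    """Give every missing slot the fixed value x."""
    return {t: samples.get(t, x) for t in range(first_id, last_id + 1)}

def carry_forward(samples, first_id, last_id, default=None):
    """Give every missing slot the last collected value before it.

    Slots before the first collected sample get `default`.
    """
    filled = {}
    last_value = default
    for time_id in range(first_id, last_id + 1):
        if time_id in samples:
            last_value = samples[time_id]
        filled[time_id] = last_value
    return filled

# Made-up samples: slots 2, 3 and 5 were never collected
collected = {1: 5, 4: 7}
zeros = fill_constant(collected, 1, 5)
carried = carry_forward(collected, 1, 5)
```

The rules that look at the next values (use the next value, average of the next n) work the same way but need a second pass or a look-ahead, so they cannot be applied until the later samples have arrived.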

“Optimizing” data storage
• Two types of data, with some gray area in between:
  • Fast changing (a different value almost every time we sample)
  • Slow changing (the same value for many sample periods)
• Fast changing examples:
  • CPU utilization
  • Stock prices
• Slow changing examples:
  • Database size
  • Outside temperature

Optimizing: Fast changing data
• The question is whether the level of detail at which we sample is relevant and, if so, for how long.
• Example: Instead of storing 3600 values capturing the CPU utilization every second of an hour, it might actually be more informative to have 60 records, each holding the min/max/average for one minute.
• After a week or so this level of detail might no longer be relevant, and aggregating it to one record per ten minutes or per hour might be good enough.
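The per-minute example can be sketched as a simple bucketing step. The input shape (pairs of second offset and value), the function name and the generated readings are assumptions for illustration only.

```python
from collections import defaultdict

def aggregate_per_minute(samples):
    """Turn per-second (second, value) samples into minute -> (min, max, average)."""
    buckets = defaultdict(list)
    for second, value in samples:
        buckets[second // 60].append(value)  # integer division picks the minute
    return {
        minute: (min(values), max(values), sum(values) / len(values))
        for minute, values in sorted(buckets.items())
    }

# One hour of made-up per-second CPU readings: 3600 values
samples = [(s, float(s % 100)) for s in range(3600)]
per_minute = aggregate_per_minute(samples)  # 60 records
```

Running the same function again over the minute records with a larger divisor (600 for ten minutes, 3600 for an hour) gives the coarser levels, with one caveat: the average of averages is only correct when every bucket holds the same number of samples.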

Optimizing: Slow changing data
• Instead of storing the same value over and over again, we could just create a single record saying "value x from this time to that time".
• Key
• TimeFrom
• TimeTo
• Value
Legend: Primary key, Payload data
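A sketch of how repeated values could be collapsed into such from/to records, assuming the samples of one key arrive sorted by time ID, with gaps and duplicates already handled. The function name and the tuple layout `(TimeFromID, TimeToID, Value)` are assumptions.

```python
def compress_runs(samples):
    """Collapse consecutive equal values into (time_from, time_to, value) records.

    samples: (time_id, value) pairs for ONE key, sorted by time_id.
    """
    intervals = []
    for time_id, value in samples:
        last = intervals[-1] if intervals else None
        # Extend the open interval only if the value is unchanged
        # and the slot directly follows the previous one
        if last and last[2] == value and last[1] == time_id - 1:
            last[1] = time_id
        else:
            intervals.append([time_id, time_id, value])
    return [tuple(interval) for interval in intervals]

# Made-up database sizes: six samples become three records
sizes = [(1, 100), (2, 100), (3, 100), (4, 120), (5, 120), (6, 100)]
records = compress_runs(sizes)
```

A value that returns later (100 in the example) starts a new record rather than reopening the old one, so each record always describes one unbroken period.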

Reporting on “optimized” data
-- The range join expands each from/to record back into one row per time slot
SELECT f.ID, t.DateTime, f.Value
FROM FactTable f
INNER JOIN TimeDim t
    ON (t.TimeID BETWEEN f.TimeFromID AND f.TimeToID)
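To see the range join at work, here is a small self-contained run using Python's built-in sqlite3 module instead of SQL Server. The table layouts, the sample rows and the `ORDER BY` are assumptions, and the end-of-interval column is called `TimeToID` here to match `TimeFromID`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TimeDim (TimeID INTEGER PRIMARY KEY, DateTime TEXT);
    CREATE TABLE FactTable (ID INTEGER, TimeFromID INTEGER, TimeToID INTEGER, Value REAL);

    -- Four one-minute slots (made-up data)
    INSERT INTO TimeDim VALUES
        (1, '2015-01-01 00:00'), (2, '2015-01-01 00:01'),
        (3, '2015-01-01 00:02'), (4, '2015-01-01 00:03');

    -- Two interval records for key 7: value 100 for slots 1-3, then 120 for slot 4
    INSERT INTO FactTable VALUES (7, 1, 3, 100.0), (7, 4, 4, 120.0);
""")

# The BETWEEN join produces one output row per time slot covered by an interval
rows = conn.execute("""
    SELECT f.ID, t.DateTime, f.Value
    FROM FactTable f
    INNER JOIN TimeDim t ON t.TimeID BETWEEN f.TimeFromID AND f.TimeToID
    ORDER BY t.TimeID
""").fetchall()
conn.close()
```

Two stored records come back as four report rows, which is the point of the design: the storage holds intervals, and the time dimension supplies the per-slot detail only when a report asks for it.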

Coming soon: TDaaS
• Temporal Data as a Service
• Fast, secure, fully managed transactional temporal data store that can scale close to infinity
• Available in:
  • NTT Data Cloud
  • Amazon AWS Cloud
  • Azure Cloud
  • Your datacenter or private cloud

Questions?

Thank you! DRAWING

NTT DATA Portfolio
Infrastructure and Security Consulting Data Center Modernization DR and Business Continuity Infrastructure Management Managed Hosting Managed Security Development and Management Mobility Enterprise Applications Modernization QA and Testing BI, Analytics, Performance Mgmt. Interactive Services IT Strategy Digital Business Process Optimization Business Intelligence Strategy Organizational Change Management Program Management Office Consulting
Integrated solutions across infrastructure, applications, and business processes
Industry Solutions, Strategic Staffing, and BPO Secure Infrastructure Application Innovation Advisory Services Advisory Modernization Management Cloud Services Operations