Processing Temporal Telemetry Data, aka Storing Big Data in a Small Space
=tg= Thomas H. Grohser, SQL Server MVP, Senior Director, Technical Solutions Architecture, NTT Data, Inc. (tg@grohser.com or Thomas.Grohser@nttdata.com)
SQL Saturday #437 BI Edition, Boston, MA. Microsoft Conference Center, Kendall Square, 1 Cambridge Center, Cambridge, MA 02142
select * from =tg= where topic = =tg=
Thomas Grohser, NTT DATA, email: tg@grohser.com
Focus on SQL Server security, performance engineering, infrastructure, and architecture. New papers coming late 2015. 21 years with SQL Server. Close relationship with SQLCAT (SQL Server Customer Advisory Team), SCAN (SQL Server Customer Advisory Network), TAP (Technology Adoption Program), and the product teams in Redmond. Active PASS member and PASS Summit speaker.
And one more thing … all I know about BI is how not to install it.
© 2015 NTT DATA, Inc.
NTT DATA in North America (North America headquarters: Plano, Texas)
20,000 professionals, optimizing balanced global delivery. $1.6B annual revenues with a history of above-market growth. Long-term relationships with more than 1,000 clients, from mid-market to large enterprise. Delivery excellence enabled by process maturity, tools, and accelerators. Flexible engagement spanning consulting, staffing, managed services, outsourcing, and cloud. Industry expertise driving depth in select industry verticals.
Agenda
- What is temporal telemetry data?
- Collecting it in staging
- Cleaning it up
- Optimizing and saving it in a fact table
- Questions
Drawing at the end of the session: drop your business card, or fill out a provided blank card, into the box. You must be present at the time of the drawing at the end of the session to win: a half day of free consulting.
The good and the bad
Good: everything I show today can be done in the BI and Standard editions.
Bad: you need to understand your data to do it. There is no "one size fits all" solution.
What is temporal telemetry data Lots of 10110101
What is temporal telemetry data? Data points collected over time:
CPU utilization, database size, page impressions, sales, heart rate, weight, speed, temperature, stock prices, …
How to store it efficiently: ETL with a lot of emphasis on the T.
- Clean up the data
- Put the logic in the data, not into a complex query
- Store the data not as it is convenient but as it is needed
Temporal data simplified
Key | Time | Value
(Key, Time) is the primary key; Value is the payload data.
Collecting data in a staging area
Source | Type/SubType | Event Time | Value
(Source, Type/SubType, Event Time) is the primary key; Value is the payload data.
The data arrives in a messy state (missing data, duplicates, …) and is most likely not in the shape we need for reporting.
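The staging layout above can be sketched as a table. This is a minimal illustration, not the speaker's actual schema; all names and types are assumptions:

```sql
-- Hypothetical staging table; columns mirror the slide's layout.
CREATE TABLE dbo.TelemetryStaging (
    SourceName varchar(200)  NOT NULL,  -- e.g. server or sensor name
    SubType    varchar(100)  NOT NULL,  -- e.g. counter name
    EventTime  datetime2(0)  NOT NULL,  -- raw collection timestamp
    Value      decimal(18,4) NOT NULL   -- payload
    -- Deliberately no enforced primary key yet: raw feeds may
    -- contain the duplicates we clean up in the next step.
);
```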
Cleaning up the data
- Replace the time with the ID from a time dimension
- Replace the source/type with an ID from a key dimension
- Handle duplicates
- Handle gaps
Time dimension
Avoid storing the raw datetime in the fact table. You would need functions to extract the month or year, and that kills query performance.
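A time dimension with pre-computed date parts avoids those functions. A minimal sketch, assuming a one-minute grain; table and column names are illustrative, not from the deck:

```sql
-- Hypothetical time dimension; one row per minute.
CREATE TABLE dbo.TimeDim (
    TimeID     int          NOT NULL PRIMARY KEY,  -- surrogate key
    [DateTime] datetime2(0) NOT NULL,
    [Year]     smallint     NOT NULL,
    [Month]    tinyint      NOT NULL,
    [Day]      tinyint      NOT NULL,
    [Hour]     tinyint      NOT NULL,
    [Minute]   tinyint      NOT NULL
);

-- A filter on pre-computed columns can use an index:
--   WHERE t.[Year] = 2015 AND t.[Month] = 9
-- whereas WHERE YEAR(f.EventTime) = 2015 forces a scan, because
-- wrapping the column in a function makes the predicate non-sargable.
```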
Key dimension
If the source is simple, like a sensor ID, you can leave it as it is. But if it is more complex, like (Computer Name, SQL Instance Name, NUMA Node, CPU, Utilization), or just a very long key, then creating a key dimension and storing a reference is a great idea. Remember we are storing temporal data: we expect many values over time for the same key.
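The compound key example above could be folded into a dimension like this. A sketch only; names and types are assumptions:

```sql
-- Hypothetical key dimension collapsing a long compound source key.
CREATE TABLE dbo.KeyDim (
    KeyID        int IDENTITY(1,1) PRIMARY KEY,  -- surrogate key
    ComputerName varchar(128) NOT NULL,
    InstanceName varchar(128) NOT NULL,
    NumaNode     tinyint      NOT NULL,
    Cpu          tinyint      NOT NULL,
    CONSTRAINT UQ_KeyDim UNIQUE (ComputerName, InstanceName, NumaNode, Cpu)
);
-- The fact table then stores only the 4-byte KeyID, instead of
-- repeating the full compound key on every one of the many samples
-- collected over time for the same source.
```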
Dealing with duplicates
Depending on the data:
- Ignore all but the last
- Ignore all but the first
- Average
- Min
- Max
- …
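The "ignore all but the last" strategy can be sketched with ROW_NUMBER. This is an illustration, not the speaker's code; it assumes a staging table with KeyID and TimeID already assigned and a LoadSequence column (hypothetical) to break ties:

```sql
-- Keep only the last-loaded value per (KeyID, TimeID); delete the rest.
;WITH Ranked AS (
    SELECT KeyID, TimeID, Value,
           ROW_NUMBER() OVER (PARTITION BY KeyID, TimeID
                              ORDER BY LoadSequence DESC) AS rn
    FROM dbo.TelemetryStaging
)
DELETE FROM Ranked
WHERE rn > 1;   -- rn = 1 is the last arrival; everything else is a duplicate
-- Flipping the ORDER BY gives "ignore all but the first"; replacing the
-- whole construct with AVG/MIN/MAX in a GROUP BY covers the other options.
```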
Dealing with missing data
Depending on the data:
- Make it 0 (or, more generally, some value x)
- Carry forward the last collected value
- Carry forward the average/min/max of the last n values
- Average between the last n and next m values
- Use the next value
- Use the average of the next n values
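The "carry forward the last collected value" variant can be sketched with OUTER APPLY. Table and column names are assumptions carried over from the earlier dimension sketches, not the speaker's schema; note that CROSS JOIN of the full time and key dimensions is only practical for bounded ranges:

```sql
-- For every (time slot, key) pair, take the last value collected
-- at or before that slot; NULL means nothing has been collected yet.
SELECT t.TimeID, k.KeyID, lastv.Value
FROM dbo.TimeDim t
CROSS JOIN dbo.KeyDim k
OUTER APPLY (
    SELECT TOP (1) f.Value
    FROM dbo.FactTable f
    WHERE f.KeyID = k.KeyID
      AND f.TimeID <= t.TimeID
    ORDER BY f.TimeID DESC
) AS lastv;
-- Replacing TOP (1) ... ORDER BY with an aggregate over the last n rows
-- gives the "carry forward the average/min/max of the last n values" variant.
```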
“Optimizing” data storage
Two types of data, with some gray area in between:
- Fast changing: a different value almost every time we sample. Examples: CPU utilization, stock prices.
- Slow changing: the same value for many sample periods. Examples: database size, outside temperature.
Optimizing: fast changing data
The question is: is the detail at which we sample relevant, and if yes, for how long?
Example: instead of storing 3,600 values capturing the CPU utilization every second, it might actually be more informative to have 60 records holding min/max/average for a minute at a time. After a week or so this level of detail might not be relevant any more, and aggregating it to one record per ten minutes or per hour might be good enough.
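The seconds-to-minutes rollup in the example can be sketched as a GROUP BY using the classic DATEADD/DATEDIFF truncation idiom. Names are assumptions; this is not the speaker's code:

```sql
-- Collapse per-second samples into one min/max/avg record per minute.
SELECT KeyID,
       DATEADD(minute, DATEDIFF(minute, 0, EventTime), 0) AS MinuteStart,
       MIN(Value) AS MinValue,
       MAX(Value) AS MaxValue,
       AVG(Value) AS AvgValue,
       COUNT(*)   AS SampleCount   -- how many raw samples fed this row
FROM dbo.TelemetryStaging
GROUP BY KeyID,
         DATEADD(minute, DATEDIFF(minute, 0, EventTime), 0);
-- The later "one record per ten minutes or per hour" pass is the same
-- query run over these minute rows at a coarser truncation.
```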
Optimizing: slow changing data
Instead of storing the same value over and over again, we can create a single record saying value x was valid from this time to that time.
Key | TimeFrom | TimeTo | Value
(Key, TimeFrom, TimeTo) is the primary key; Value is the payload data.
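Collapsing consecutive identical values into From/To records is the classic gaps-and-islands pattern. A sketch under assumed names; it relies on TimeID being dense (one row per time slot per key), which the gap-handling step above is meant to guarantee:

```sql
-- Rows with the same Value in an unbroken run of TimeIDs share the
-- same (TimeID - row_number) difference, which identifies the island.
;WITH Grp AS (
    SELECT KeyID, TimeID, Value,
           TimeID - ROW_NUMBER() OVER (PARTITION BY KeyID, Value
                                       ORDER BY TimeID) AS grp
    FROM dbo.RawFact            -- hypothetical cleaned, dense fact data
)
SELECT KeyID,
       MIN(TimeID) AS TimeFromID,
       MAX(TimeID) AS TimeToID,
       Value
FROM Grp
GROUP BY KeyID, Value, grp;     -- one row per run of identical values
```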
Reporting on “optimized” data
The From/To records can be expanded back to one row per time slot by joining to the time dimension:

SELECT f.ID, t.DateTime, f.Value
FROM FactTable f
INNER JOIN TimeDim t
    ON (t.TimeID BETWEEN f.TimeFromID AND f.TimeToID)
Coming soon: TDaaS, Temporal Data as a Service
A fast, secure, fully managed transactional temporal data store that can scale close to infinity. Available in:
- NTT Data Cloud
- Amazon AWS Cloud
- Azure Cloud
- Your datacenter or private cloud
Questions? tg@grohser.com
Thank you! DRAWING tg@grohser.com
NTT DATA Portfolio
Integrated solutions across infrastructure, applications, and business processes.
- Secure Infrastructure: Infrastructure and Security Consulting, Data Center Modernization, DR and Business Continuity, Infrastructure Management, Managed Hosting, Managed Security
- Application Innovation: Development and Management, Mobility, Enterprise Applications, Modernization, QA and Testing, BI, Analytics, Performance Mgmt., Interactive Services
- Advisory Services: IT Strategy, Digital Business Process Optimization, Business Intelligence Strategy, Organizational Change Management, Program Management Office Consulting
- Industry Solutions, Strategic Staffing, and BPO
- Cloud Services and Operations