Processing Temporal Telemetry Data -aka- Storing BigData in a Small Space =tg= Thomas H. Grohser, SQL Server MVP, Senior Director - Technical Solutions.


1 Processing Temporal Telemetry Data -aka- Storing BigData in a Small Space =tg= Thomas H. Grohser, SQL Server MVP, Senior Director - Technical Solutions Architecture, NTT Data, Inc tg@grohser.com or Thomas.Grohser@nttdata.com SQL Saturday #437 BI Edition Boston, MA Microsoft Conference Center, Kendall Square, 1 Cambridge Center, Cambridge, MA 02142

2 select * from =tg= where topic = =tg=
• Thomas Grohser, NTT DATA
• Email: tg@grohser.com
• Focus on SQL Server Security, Performance Engineering, Infrastructure and Architecture
• New papers coming late 2015
• Close relationship with:
  • SQLCAT (SQL Server Customer Advisory Team)
  • SCAN (SQL Server Customer Advisory Network)
  • TAP (Technology Adoption Program)
  • Product teams in Redmond
• Active PASS member and PASS Summit speaker
• 21 years with SQL Server

3 And one more thing … All I know about BI is how to not install it

4 NTT DATA in North America (© 2015 NTT DATA, Inc.)
• 20,000 professionals – Optimizing balanced global delivery
• $1.6B – Annual revenues with history of above-market growth
• Long-term relationships – >1,000 clients; mid-market to large enterprise
• Delivery excellence – Enabled by process maturity, tools and accelerators
• Flexible engagement – Spans consulting, staffing, managed services, outsourcing, and cloud
• Industry expertise – Driving depth in select industry verticals
NTT DATA North America Headquarters, Plano, Texas

5 Agenda
• What is temporal telemetry data?
• Collecting it in staging
• Cleaning it up
• Optimizing and saving it in a fact table
• Questions

6 Drawing at the end of the session
• Drop your business card, or fill out the provided blank card, and drop it in the box
• You must be present at the time of the drawing at the end of the session to win: ½ day of free consulting

7 The good and the bad
• Good:
  • Everything I show today can be done in BI and Standard edition
• Bad:
  • You need to understand your data to do it. There is no "one size fits all" solution

8 What is temporal telemetry data Lots of 10110101

9 What is temporal telemetry data
• Data points collected over time:
  • CPU utilization
  • Database size
  • Page expressions
  • Sales
  • Heart rate
  • Weight
  • Speed
  • Temperature
  • Stock prices
  • …

10 How to store it efficiently
• ETL with a lot of emphasis on the T
• Clean up the data
• Put the logic in the data, not into a complex query
• Store the data not in the way that is convenient, but in the way it is needed

11 Temporal data simplified
• Key (primary key)
• Time (primary key)
• Value (payload data)

12 Collecting data in a staging area
• Source (primary key)
• Type/SubType (primary key)
• Event Time (primary key)
• Value (payload data)
The data is in a messy state (missing data, duplicates, …) and most likely not in the shape we need for reporting.

13 Cleaning up the data
• Replace the time with the ID from a Time Dimension
• Replace the source/type with an ID from a Key Dimension
• Handle duplicates
• Handle gaps

14 Time Dimension
• Avoid storing the datetime in the fact table.
• With a raw datetime you need functions to extract the month or year, and that kills query performance.
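A minimal sketch of the idea, not the speaker's actual schema: Python's built-in sqlite3 stands in for SQL Server here, and the table and column names (TimeDim, FactTable, Year, Month, and so on) are assumptions made for illustration. The time dimension precomputes the date parts once, so a report can filter on plain integer columns instead of calling a date function per fact row.

```python
import sqlite3
from datetime import datetime, timedelta

def build_time_dim(conn, start, minutes):
    """Create a TimeDim table with one row per minute and precomputed date parts."""
    conn.execute(
        "CREATE TABLE TimeDim ("
        " TimeID INTEGER PRIMARY KEY, DateTime TEXT,"
        " Year INTEGER, Month INTEGER, Day INTEGER, Hour INTEGER)"
    )
    rows = []
    for i in range(minutes):
        ts = start + timedelta(minutes=i)
        rows.append((i, ts.isoformat(sep=" "), ts.year, ts.month, ts.day, ts.hour))
    conn.executemany("INSERT INTO TimeDim VALUES (?, ?, ?, ?, ?, ?)", rows)

conn = sqlite3.connect(":memory:")
# Four one-minute slots that straddle the September/October boundary.
build_time_dim(conn, datetime(2015, 9, 30, 23, 58), 4)

# The fact table stores only the TimeID, never the datetime itself.
conn.execute("CREATE TABLE FactTable (KeyID INTEGER, TimeID INTEGER, Value REAL)")
conn.executemany(
    "INSERT INTO FactTable VALUES (?, ?, ?)",
    [(1, 0, 10.0), (1, 1, 20.0), (1, 2, 30.0), (1, 3, 40.0)],
)

# Filter on the precomputed columns; no date function appears in the WHERE clause.
october = conn.execute(
    "SELECT f.TimeID, f.Value FROM FactTable f"
    " JOIN TimeDim t ON t.TimeID = f.TimeID"
    " WHERE t.Year = 2015 AND t.Month = 10 ORDER BY f.TimeID"
).fetchall()
```

The one-minute granularity is an arbitrary choice for the sketch; the right grain depends on how often the data is sampled.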

15 Key Dimension
• If the source is simple, like a sensor ID, you can leave it as it is. If it is more complex, like (Computer Name, SQL Instance Name, NUMA Node, CPU, Utilization), or just a very long key, then creating a Key Dimension and storing a reference to it is a great idea.
• Remember, we are storing temporal data: we expect many values over time for the same key.
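As a rough illustration of the Key Dimension, the sketch below maps each distinct composite key to a small surrogate integer. The class name KeyDimension, the method get_id, and the sample server names are hypothetical; in a database this would be a lookup table rather than an in-memory dictionary.

```python
class KeyDimension:
    """Maps a composite source key to a compact surrogate ID (illustrative only)."""

    def __init__(self):
        self._ids = {}

    def get_id(self, *parts):
        # The tuple of key parts is the natural key; the ID is assigned on first sight.
        key = tuple(parts)
        if key not in self._ids:
            self._ids[key] = len(self._ids) + 1
        return self._ids[key]

dim = KeyDimension()
a = dim.get_id("SRV01", "INST1", 0, 3, "Utilization")
b = dim.get_id("SRV01", "INST1", 0, 4, "Utilization")
c = dim.get_id("SRV01", "INST1", 0, 3, "Utilization")  # same key as a, same ID
```

Because the same key repeats for every sample, each fact row then carries one small integer instead of the full composite key.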

16 Dealing with duplicates
• Depending on the data:
  • Ignore all but the last
  • Ignore all but the first
  • Average
  • Min
  • Max
  • …
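The strategies listed above could be sketched as follows. This assumes staging rows arrive as (key, time, value) tuples in arrival order; the function name dedupe and the strategy labels are made up for the example.

```python
from statistics import mean

# One reducer per strategy from the slide; each receives the duplicate values
# for a single (key, time) pair in arrival order.
STRATEGIES = {
    "first": lambda values: values[0],
    "last": lambda values: values[-1],
    "average": mean,
    "min": min,
    "max": max,
}

def dedupe(rows, strategy="last"):
    """Collapse rows that share the same (key, time) into a single row."""
    grouped = {}
    for key, time, value in rows:
        grouped.setdefault((key, time), []).append(value)
    pick = STRATEGIES[strategy]
    return [(k, t, pick(vs)) for (k, t), vs in sorted(grouped.items())]

staging = [("cpu0", 1, 10.0), ("cpu0", 1, 20.0), ("cpu0", 2, 5.0)]
kept_last = dedupe(staging, "last")
kept_avg = dedupe(staging, "average")
```

Which reducer is right depends on the data, as the slide says: "last" suits resent corrections, while "average" or "max" may suit noisy sensors.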

17 Dealing with missing data
• Depending on the data:
  • Make it 0 (or, more generally, a value x)
  • Carry forward the last collected value
  • Carry forward the average/min/max of the last n values
  • Average between the n last and m next values
  • Use the next value
  • Use the average of the next n values
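Two of the options above, a constant value and carrying the last value forward, might look like the sketch below. It assumes samples are expected at evenly spaced integer time slots; fill_gaps and its parameters are illustrative names, and the other strategies would follow the same pattern.

```python
def fill_gaps(samples, start, stop, step, mode="carry_forward", default=0.0):
    """Return a (time, value) pair for every expected slot in range(start, stop, step).

    samples maps time -> value for the slots that were actually collected.
    """
    filled = []
    last = None
    for t in range(start, stop, step):
        if t in samples:
            last = samples[t]
            filled.append((t, last))
        elif mode == "carry_forward" and last is not None:
            # Gap: repeat the most recently collected value.
            filled.append((t, last))
        else:
            # Constant mode, or a gap before any value has been seen.
            filled.append((t, default))
    return filled

collected = {0: 1.0, 2: 3.0}
carried = fill_gaps(collected, 0, 4, 1)
zeroed = fill_gaps(collected, 0, 4, 1, mode="constant")
```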

18 “Optimizing” data storage
• Two types of data, with some gray area in between:
  • Fast changing (a different value almost every time we sample)
  • Slow changing (the same value for many sample periods)
• Fast changing examples:
  • CPU utilization
  • Stock prices
• Slow changing examples:
  • Database size
  • Outside temperature

19 Optimizing: Fast changing data
• The question: is the level of detail at which we sample relevant, and if yes, for how long?
• Example: instead of storing 3600 values capturing the CPU utilization every second, it might actually be more informative to have 60 records, each holding the min/max/average for one minute.
• After a week or so, this level of detail might no longer be relevant, and aggregating it to one record per ten minutes or per hour might be good enough.
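The per-minute roll-up in the example can be sketched like this, assuming samples are (epoch_second, value) pairs; the function name aggregate and the bucket_seconds parameter are hypothetical.

```python
from collections import defaultdict

def aggregate(samples, bucket_seconds=60):
    """Roll per-second samples up into {bucket_start: (min, max, avg)}."""
    buckets = defaultdict(list)
    for ts, value in samples:
        # Align each timestamp to the start of its bucket.
        buckets[ts - ts % bucket_seconds].append(value)
    return {
        start: (min(vs), max(vs), sum(vs) / len(vs))
        for start, vs in sorted(buckets.items())
    }

per_minute = aggregate([(0, 10.0), (30, 30.0), (59, 20.0), (60, 50.0)])

# One hour of per-second samples collapses to 60 one-minute records.
one_hour = aggregate([(s, float(s % 7)) for s in range(3600)])
```

Calling the same function with bucket_seconds=600 or 3600 gives the coarser ten-minute or hourly records. Re-aggregating the already aggregated rows would instead need the sample count stored alongside the average.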

20 Optimizing: Slow changing data
• Instead of storing the same value over and over again, we could just create a single record saying "value x from this time to that time".
• Key
• TimeFrom
• TimeTo
• Value
(Diagram labels: primary key, payload data)
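A minimal sketch of collapsing repeated values into ranges, assuming the input is already cleaned (no duplicates, gaps filled) and sorted by key and then by time ID. The function name to_ranges and the sample keys are illustrative.

```python
def to_ranges(samples):
    """Collapse consecutive equal values per key into (key, time_from, time_to, value).

    samples must be (key, time_id, value) tuples sorted by key, then time_id.
    """
    ranges = []
    for key, time_id, value in samples:
        if ranges and ranges[-1][0] == key and ranges[-1][3] == value:
            # Same key and same value as the open range: extend its end time.
            prev = ranges[-1]
            ranges[-1] = (key, prev[1], time_id, value)
        else:
            # New key or changed value: start a new single-slot range.
            ranges.append((key, time_id, time_id, value))
    return ranges

db_sizes = [
    ("db1", 1, 100), ("db1", 2, 100), ("db1", 3, 100),
    ("db1", 4, 120), ("db2", 4, 120),
]
compact = to_ranges(db_sizes)
```

Here five samples shrink to three rows; for a value that changes once a day but is sampled every minute, the saving is far larger.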

21 Reporting on “optimized” data
SELECT f.ID, t.DateTime, f.Value
FROM FactTable f
INNER JOIN TimeDim t
    ON (t.TimeID BETWEEN f.TimeFromID AND f.TimeToID)
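To see what this range join returns, here is a small runnable sketch using Python's built-in sqlite3 as a stand-in for SQL Server. The sample rows are made up, and TimeToID is an assumed name for the column that holds the end of each range.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TimeDim (TimeID INTEGER PRIMARY KEY, DateTime TEXT);
CREATE TABLE FactTable (ID INTEGER, TimeFromID INTEGER, TimeToID INTEGER, Value REAL);
INSERT INTO TimeDim VALUES (1, '2015-10-03 09:00'), (2, '2015-10-03 09:01'),
                           (3, '2015-10-03 09:02'), (4, '2015-10-03 09:03');
-- Two stored ranges: value 100 for slots 1..3, then value 120 for slot 4.
INSERT INTO FactTable VALUES (7, 1, 3, 100.0), (7, 4, 4, 120.0);
""")

# The BETWEEN join expands each stored range back into one row per time slot.
expanded = conn.execute("""
    SELECT f.ID, t.DateTime, f.Value
    FROM FactTable f
    INNER JOIN TimeDim t ON t.TimeID BETWEEN f.TimeFromID AND f.TimeToID
    ORDER BY f.ID, t.TimeID
""").fetchall()
```

Two stored rows come back as four report rows, so the report looks as if every sample had been kept.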

22 Coming soon: TDaaS
• Temporal Data as a Service
• Fast, secure, fully managed transactional temporal data store that can scale close to infinity
• Available in:
  • NTT Data Cloud
  • Amazon AWS Cloud
  • Azure Cloud
  • Your datacenter or private cloud

23 Questions? tg@grohser.com

24 Thank you! DRAWING tg@grohser.com

25 NTT DATA Portfolio (© 2015 NTT DATA, Inc.) Infrastructure and Security Consulting Data Center Modernization DR and Business Continuity Infrastructure Management Managed Hosting Managed Security Development and Management Mobility Enterprise Applications Modernization QA and Testing BI, Analytics, Performance Mgmt. Interactive Services IT Strategy Digital Business Process Optimization Business Intelligence Strategy Organizational Change Management Program Management Office Consulting Integrated solutions across infrastructure, applications, and business processes Industry Solutions, Strategic Staffing, and BPO Secure Infrastructure Application Innovation Advisory Services Advisory Modernization Management Cloud Services Operations

