Finding Islands, Gaps, and Clusters in Complex Data

Finding Islands, Gaps, and Clusters in Complex Data
Diving Into Analytics With TSQL
Edward Pollack, Sr. Database Administrator, Datto

Thank you to our SQL Saturday #892 Sponsors

Edward Pollack, Sr. DBA, Datto
Lives in Albany, NY with wife Theresa and sons Nolan (3.5 yo) and Oliver (0.9 yo), and Legos permanently affixed to the bottoms of his feet.
Has spoken at over 100 events, including SQL Saturdays and PASS Summit.
Regularly publishes articles for SQL Shack on fun data-related topics.
Published author of Dynamic SQL: Applications, Performance, and Security, which is now in its 2nd edition.
/ed-pollack  @EdwardPollack  edrick42

Agenda
- Finding Significant Patterns in Complex Data
- Review: Structured/Inorganic Groupings
- Review: Gaps & Islands in Simple Data
- Data Clusters
- Answering Complex Questions
- Performance
- Conclusion

Structured/Inorganic Groupings
The Pros:
- Data can be partitioned into segments based on static rules.
- Data can easily be segmented by dates or date parts.
- The result set has a predictable size and format.
- Predictable results.
The Cons:
- Does not provide mechanisms for learning or feedback.
- Boundaries can divide data into misleading groupings.
- Predictable results.
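
To make the boundary trade-off concrete, here is a minimal Python sketch (Python stands in for T-SQL; the sample dates and the helper name group_by_month are hypothetical) that buckets dates by calendar month, the way a GROUP BY on YEAR() and MONTH() would:

```python
from collections import Counter
from datetime import date

# Hypothetical sample data (illustrative only): dates on which an event occurred.
events = [
    date(2019, 1, 30), date(2019, 1, 31),  # a four-day run that crosses
    date(2019, 2, 1), date(2019, 2, 2),    # a month boundary
    date(2019, 2, 20),                     # an isolated event later in February
]

def group_by_month(dates):
    """Structured grouping: bucket each date by a static rule, (year, month)."""
    return Counter((d.year, d.month) for d in dates)

buckets = group_by_month(events)
for (year, month), count in sorted(buckets.items()):
    print(f"{year}-{month:02d}: {count}")
```

This prints 2019-01: 2 and 2019-02: 3. The result is predictable, but the four-day run is split across two buckets while the isolated February 20 event is counted alongside February 1 and 2.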

Structured Groupings Demo

Gaps/Islands Analysis
- A query that joins each row to the previous/next rows of data to test whether those rows exist.
- Can locate and report on missing data.
- Great for analysis of outliers or exceptions.
- Can be used to pinpoint streaks, both positive and negative.
- Allows for many types of analytics against numeric data.
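
A minimal sketch of the existence-check approach, run through Python's built-in sqlite3 module as a stand-in for SQL Server. The table seq, its column n, and the sample values are hypothetical. A gap begins after any row whose successor (n + 1) is missing; an island begins at any row whose predecessor (n - 1) is missing:

```python
import sqlite3

# Hypothetical table of integer IDs with some values missing (illustrative data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE seq (n INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO seq (n) VALUES (?)",
                 [(n,) for n in (1, 2, 3, 6, 7, 10)])

# Gaps: a row with no successor starts a gap (unless it is the maximum value);
# the gap ends just before the next existing row.
GAPS_SQL = """
SELECT s.n + 1 AS gap_start,
       (SELECT MIN(t.n) FROM seq AS t WHERE t.n > s.n) - 1 AS gap_end
FROM seq AS s
WHERE NOT EXISTS (SELECT 1 FROM seq AS t WHERE t.n = s.n + 1)
  AND s.n < (SELECT MAX(n) FROM seq)
ORDER BY gap_start
"""

# Islands: a row with no predecessor starts an island; the island ends at the
# smallest row at or after it that has no successor.
ISLANDS_SQL = """
SELECT s.n AS island_start,
       (SELECT MIN(e.n) FROM seq AS e
         WHERE e.n >= s.n
           AND NOT EXISTS (SELECT 1 FROM seq AS t WHERE t.n = e.n + 1)) AS island_end
FROM seq AS s
WHERE NOT EXISTS (SELECT 1 FROM seq AS t WHERE t.n = s.n - 1)
ORDER BY island_start
"""

gaps = conn.execute(GAPS_SQL).fetchall()
islands = conn.execute(ISLANDS_SQL).fetchall()
print("gaps:", gaps)
print("islands:", islands)
```

For the values 1, 2, 3, 6, 7, 10 this yields gaps (4, 5) and (8, 9) and islands (1, 3), (6, 7), and (10, 10). Each NOT EXISTS probe is a per-row lookup, so an index on n matters as the table grows.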

Gaps/Islands Analysis Demo

Data Clusters
- Created by applying gaps/islands analysis to any type of data.
- Organizes sequential islands of data into meaningful groupings.
- Allows related events to be easily identified.
- Introduces data proximity into analytics.
- Data groups itself into clusters naturally, based on its contents.
- Grouping rules must be developed and experimented with prior to analysis.
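
One way to sketch cluster formation, assuming the grouping rule is simply "events no more than max_gap apart belong together". The function cluster_events, the sample log times, and the five-minute threshold are all hypothetical choices for illustration:

```python
from datetime import datetime, timedelta

def cluster_events(timestamps, max_gap):
    """Group timestamps into clusters: a new cluster starts whenever the
    distance to the previous event exceeds max_gap (the grouping rule)."""
    clusters = []
    for ts in sorted(timestamps):
        if clusters and ts - clusters[-1][-1] <= max_gap:
            clusters[-1].append(ts)   # close enough: extend the current cluster
        else:
            clusters.append([ts])     # too far away: start a new cluster
    return clusters

# Hypothetical sample: times at which errors were logged (illustrative only).
logs = [
    datetime(2020, 1, 6, 9, 0),
    datetime(2020, 1, 6, 9, 3),
    datetime(2020, 1, 6, 9, 7),
    datetime(2020, 1, 6, 13, 30),
    datetime(2020, 1, 6, 13, 32),
]

clusters = cluster_events(logs, max_gap=timedelta(minutes=5))
for c in clusters:
    print(c[0].time(), "to", c[-1].time(), f"({len(c)} events)")
```

With a five-minute rule this yields two clusters (three events starting at 09:00, two at 13:30). Widening max_gap to five hours merges them into one, which is why the grouping rule needs experimentation before analysis.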

Data Clusters Demo

Answering Tough Questions
- Filters control what data to analyze.
- Existence checks control cluster parameters.
- Join predicates determine what to group together.
- Metrics include: streaks, droughts, performance, unusual patterns, maxima, minima, etc.
- Dynamic SQL: loop through dimensions to gather automated insights.
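
Streaks are islands of identical consecutive outcomes. This hypothetical Python sketch uses itertools.groupby, which groups consecutive equal elements, to find the longest run of each outcome, then loops over a hypothetical teams dimension in the spirit of the dynamic SQL bullet (a plain loop here, not dynamic SQL):

```python
from itertools import groupby

def longest_streaks(results):
    """Return the longest run length for each distinct outcome.
    Each run of identical consecutive values is one island."""
    best = {}
    for outcome, run in groupby(results):
        length = sum(1 for _ in run)
        best[outcome] = max(best.get(outcome, 0), length)
    return best

# Hypothetical results in date order per team: W = win, L = loss.
teams = {
    "Team A": "WWLWWWWLLLWLW",
    "Team B": "LLWLLLLW",
}

for team, results in teams.items():
    print(team, longest_streaks(results))
```

Team A's longest winning streak is 4 and its longest losing streak (its worst drought) is 3; Team B's are 1 and 4.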

Answering Tough Questions Demo

Performance
- Analytics such as these rely on reading large volumes of data, i.e., index/table scans.
- Not intended for OLTP databases/workloads.
- Run on data made available for analytics via replication, AlwaysOn, ETL, OLAP, a data copy, etc.
- Helpful tools:
  - Covering indexes
  - Columnstore indexes
  - In-Memory OLTP
  - Automated analytics
  - Incremental data loads
  - LEAD/LAG for some data aggregation challenges
- Performance can be optimized to scale linearly with the size of the data read.
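
The LEAD/LAG bullet can be sketched with window functions, again using Python's sqlite3 module as a stand-in for SQL Server and assuming the bundled SQLite is 3.25 or later (the first version with window functions). The LAG query finds gaps in one ordered pass. The ROW_NUMBER query is the well-known "value minus row number" islands technique; the slide does not name it, so treat it as one common approach rather than the talk's method. The table and values are hypothetical:

```python
import sqlite3

# Hypothetical table and values (illustrative only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE seq (n INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO seq (n) VALUES (?)",
                 [(n,) for n in (1, 2, 3, 6, 7, 10)])

# Gaps with LAG: each row sees the previous value in a single ordered pass,
# instead of probing the table once per row.
LAG_GAPS_SQL = """
SELECT prev_n + 1 AS gap_start, n - 1 AS gap_end
FROM (SELECT n, LAG(n) OVER (ORDER BY n) AS prev_n FROM seq) AS with_prev
WHERE n - prev_n > 1          -- first row has prev_n NULL, so it is filtered out
ORDER BY gap_start
"""

# Islands with ROW_NUMBER: within a run of consecutive values, n minus its
# row number is constant, so the difference works as a group key.
RN_ISLANDS_SQL = """
SELECT MIN(n) AS island_start, MAX(n) AS island_end
FROM (SELECT n, n - ROW_NUMBER() OVER (ORDER BY n) AS grp FROM seq) AS numbered
GROUP BY grp
ORDER BY island_start
"""

lag_gaps = conn.execute(LAG_GAPS_SQL).fetchall()
rn_islands = conn.execute(RN_ISLANDS_SQL).fetchall()
print("gaps:", lag_gaps)        # gaps: [(4, 5), (8, 9)]
print("islands:", rn_islands)   # islands: [(1, 3), (6, 7), (10, 10)]
```

Both queries read the ordered rows once, which is what lets this style of analysis scale roughly linearly with the data read when an index supplies the ordering.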

Important Considerations
- Data quality! How to manage:
  - NULLs
  - Missing data
  - Unexpected inputs/data values
  - Duplicate data
- The borders of a data cluster within a multi-partitioned data set may require special treatment.
- QA: Thoroughly test all use cases!
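
A minimal Python sketch of one possible cleanup policy before an islands pass: drop NULLs (None) and duplicates, then sort. Both the policy and the helper names clean_sequence and islands are hypothetical; treating a NULL as a gap instead of discarding it is an equally valid choice depending on the data:

```python
def clean_sequence(values):
    """Prepare raw values for gaps/islands analysis:
    drop NULLs (None), remove duplicates, and sort."""
    return sorted({v for v in values if v is not None})

def islands(values):
    """Return (start, end) pairs of consecutive integer runs.
    Assumes the input is already sorted."""
    runs = []
    for v in values:
        if runs and v == runs[-1][1] + 1:
            runs[-1][1] = v        # extends the current island
        else:
            runs.append([v, v])    # starts a new island
    return [tuple(r) for r in runs]

# Hypothetical raw input containing NULLs and a duplicate.
raw = [3, 1, None, 2, 2, 7, 6, None, 10]
print(islands(clean_sequence(raw)))
```

The cleanup step matters: on sorted input with a duplicate, such as [1, 2, 2, 3], islands() reports two overlapping runs, (1, 2) and (2, 3), instead of one.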

Can This Be Done With Other Tools?
- Probably!
- TSQL is a great tool, as it can filter and manage data alongside analysis.
- If the data is already well-structured for reporting, then R/Python may be able to provide similar value and performance.
- Decide on a tool based on:
  - Performance
  - Filtering/data manipulation required
  - Complexity of analysis
  - Expertise in tools
  - What will happen with this data next?

Conclusion
- Data can be organically grouped, regardless of complexity.
- Results can be used to determine many useful metrics:
  - Winning/losing streaks
  - Data clusters
  - Related events
  - Patterns or abnormalities within a data set
- Be creative and find innovative solutions to challenging problems!

Learn more from Ed Pollack @EdwardPollack epollack@datto.com