Finding Islands, Gaps, and Clusters in Complex Data

Presentation transcript:

Finding Islands, Gaps, and Clusters in Complex Data: Diving Into Analytics With TSQL
Edward Pollack, Sr. Database Administrator, Datto

Thank you to our SQL Saturday #892 Sponsors

Edward Pollack, Sr. DBA, Datto
Lives in Albany, NY with wife Theresa and sons Nolan (3.5yo) & Oliver (0.9yo), and Legos permanently affixed to the bottoms of his feet.
Has spoken at over 100 events, including SQL Saturdays and PASS Summit.
Regularly publishes articles for SQL Shack on fun data-related topics.
Published author of Dynamic SQL: Applications, Performance, and Security, which is now in its 2nd edition.
/ed-pollack @EdwardPollack edrick42

Agenda
Finding Significant Patterns in Complex Data
Review: Structured/Inorganic Groupings
Review: Gaps & Islands in Simple Data
Data Clusters
Answering Complex Questions
Performance
Conclusion

Structured/Inorganic Groupings
The Pros:
  Data can be partitioned into segments based on static rules.
  Can segment data by dates or date parts easily.
  Result set has a predictable size and format.
  Predictable results.
The Cons:
  Does not provide mechanisms for learning or feedback.
  Boundaries can divide data into misleading groupings.
  Predictable results.
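As a rough illustration of the last con, here is a minimal Python sketch (not the TSQL from the demo, which is not part of this transcript). The event dates and the `count_by_month` helper are hypothetical. It buckets dates by calendar month, a static rule, and shows how a burst of activity that straddles a month boundary gets split across two buckets.

```python
from collections import Counter
from datetime import date

# Hypothetical event dates: three of them (Jan 30, Jan 31, Feb 1) form one
# burst of activity that happens to straddle a month boundary.
events = [
    date(2019, 1, 5),
    date(2019, 1, 30),
    date(2019, 1, 31),
    date(2019, 2, 1),
    date(2019, 2, 20),
]

def count_by_month(dates):
    """Group dates into fixed (year, month) buckets and count each bucket."""
    return Counter((d.year, d.month) for d in dates)

monthly_counts = count_by_month(events)
for (year, month), total in sorted(monthly_counts.items()):
    print(year, month, total)
# 2019 1 3
# 2019 2 2
```

The output is predictable in size and format, but nothing in it reveals that the three-day burst was a single related group of events.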

Structured Groupings Demo

Gaps/Islands Analysis
A query that joins to the previous/next rows of data to test for the existence of those rows.
Can locate and report on missing data.
Great for analysis of outliers or exceptions.
Can be used to pinpoint streaks, both positive and negative.
Allows for many types of analytics against numeric data.
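The idea of testing whether the previous row exists can be sketched in a few lines of Python. This is an illustration of the logic only, not the query used in the demo; the function names and the sample `ids` list are assumptions made for the example.

```python
def find_islands(values):
    """Return (start, end) pairs for each run of consecutive integers."""
    islands = []
    for v in sorted(set(values)):
        if islands and v == islands[-1][1] + 1:
            # The previous value exists, so extend the current island.
            islands[-1] = (islands[-1][0], v)
        else:
            # The previous value is missing, so a new island starts here.
            islands.append((v, v))
    return islands

def find_gaps(values):
    """Return (start, end) pairs for each run of missing integers between islands."""
    islands = find_islands(values)
    return [
        (prev_end + 1, next_start - 1)
        for (_, prev_end), (next_start, _) in zip(islands, islands[1:])
    ]

ids = [1, 2, 3, 6, 7, 10]
print(find_islands(ids))  # [(1, 3), (6, 7), (10, 10)]
print(find_gaps(ids))     # [(4, 5), (8, 9)]
```

Islands report where data is present without interruption, and gaps report exactly which values are missing between them.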

Gaps/Islands Analysis Demo

Data Clusters
Created by using gaps/islands analysis over any type of data.
Organizes sequential islands of data into meaningful groupings.
Allows for related events to be easily identified.
Introduces data proximity into analytics.
Data groups itself into clusters naturally based on its contents.
Grouping rules must be developed and experimented with prior to analysis.
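A minimal Python sketch of proximity-based clustering follows, assuming the grouping rule is "events no more than `max_gap` apart belong together". The rule, the `cluster_by_proximity` name, and the sample timestamps are all hypothetical; the demo's actual rules are not in this transcript.

```python
from datetime import datetime, timedelta

def cluster_by_proximity(timestamps, max_gap):
    """Group timestamps into clusters. A new cluster starts whenever the
    gap from the previous event exceeds max_gap (the grouping rule)."""
    clusters = []
    for ts in sorted(timestamps):
        if clusters and ts - clusters[-1][-1] <= max_gap:
            clusters[-1].append(ts)   # close enough: same cluster
        else:
            clusters.append([ts])     # too far from the last event: new cluster
    return clusters

# Hypothetical event times: a morning burst and an afternoon burst.
events = [
    datetime(2019, 7, 1, 9, 0),
    datetime(2019, 7, 1, 9, 4),
    datetime(2019, 7, 1, 9, 9),
    datetime(2019, 7, 1, 14, 30),
    datetime(2019, 7, 1, 14, 33),
]

clusters = cluster_by_proximity(events, timedelta(minutes=10))
summary = [(c[0], c[-1], len(c)) for c in clusters]  # (first, last, event count)
for first, last, count in summary:
    print(first, last, count)
```

Changing `max_gap` changes the clusters, which is why the grouping rule needs experimentation before the results are trusted.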

Data Clusters Demo

Answering Tough Questions
Filters control what data to analyze.
Existence checks control cluster parameters.
Join predicates determine what to group together.
Metrics include streaks, droughts, performance, unusual patterns, maxima, minima, etc.
Dynamic SQL: loop through dimensions to gather automated insights.
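As one possible illustration, the Python sketch below combines a filter (one team), a grouping rule (consecutive games with the same result), and a metric (longest streak), then loops over every team. The loop stands in for the dynamic SQL approach on the slide; the `games` rows and the `longest_streak` helper are hypothetical.

```python
from itertools import groupby

# Hypothetical rows: (team, game_number, result)
games = [
    ("A", 1, "W"), ("A", 2, "W"), ("A", 3, "L"),
    ("A", 4, "W"), ("A", 5, "W"), ("A", 6, "W"),
    ("B", 1, "L"), ("B", 2, "L"), ("B", 3, "W"), ("B", 4, "L"),
]

def longest_streak(rows, team, result):
    """Longest run of consecutive games with the given result for one team."""
    # Filter to the team, then order by game number.
    ordered = sorted(r for r in rows if r[0] == team)
    # groupby yields one group per run of identical consecutive results.
    runs = [
        len(list(group))
        for key, group in groupby(ordered, key=lambda r: r[2])
        if key == result
    ]
    return max(runs, default=0)

# Loop through the dimension (team) to gather the same metrics for each value.
teams = sorted({team for team, _, _ in games})
report = {
    t: {
        "winning_streak": longest_streak(games, t, "W"),
        "losing_streak": longest_streak(games, t, "L"),
    }
    for t in teams
}
print(report)
# {'A': {'winning_streak': 3, 'losing_streak': 1}, 'B': {'winning_streak': 1, 'losing_streak': 2}}
```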

Answering Tough Questions Demo

Performance
Analytics such as these rely on reading large volumes of data, i.e. index/table scans.
Not intended for OLTP databases/workloads.
Run on data that is: replicated, AlwaysOn, ETL, OLAP, a data copy, etc.
Helpful tools:
  Covering indexes.
  Columnstore indexes.
  In-Memory OLTP.
  Automated analytics.
  Incremental data loads.
  LEAD/LAG for some data aggregation challenges.
Performance can be optimized to scale linearly with the size of the data read.

Important Considerations
Data quality! How to manage:
  NULLs
  Missing data
  Unexpected inputs/data values
  Duplicate data
The borders of a data cluster within a multi-partitioned data set may require special treatment.
QA: Thoroughly test all use cases!

Can This Be Done With Other Tools?
Probably!
TSQL is a great tool as it can filter and manage data alongside analysis.
If the data is already well-structured for reporting, then R/Python may be able to provide similar value and performance.
Decide on a tool based on:
  Performance
  Filtering/data manipulation required
  Complexity of analysis
  Expertise in tools
  What will happen with this data next?

Conclusion
Data can be organically grouped, regardless of complexity.
Results can be used to determine many useful metrics:
  Winning/losing streaks.
  Data clusters.
  Related events.
  Patterns or abnormalities within a data set.
Be creative and find innovative solutions to challenging problems!

Learn more from Ed Pollack @EdwardPollack epollack@datto.com