Finding Islands, Gaps, and Clusters in Complex Data


Finding Islands, Gaps, and Clusters in Complex Data
Ed Pollack
Database Administrator, CommerceHub

Agenda
Finding Significant Patterns in Complex Data
Quick Review: Structured/Inorganic Groupings
Quick Review: Gaps & Islands in Simple Data
Finding Data Clusters
Answering Crazy Questions
TSQL Madness
More Demos
Performance
Conclusion

Structured/Inorganic Groupings
We can partition data into segments based on static groupings.
Often dates or date parts, but can be other metrics.
Easy to visualize and understand.
Does not provide recursive/self-referencing feedback.
Boundaries can divide data into ill-conceived groupings.

Structured Groupings Demo
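
The demo code itself is not part of this transcript. As a stand-in, here is a minimal sketch of a static grouping by date part, run through Python's built-in sqlite3 module so it is self-contained. The game_log table, its columns, and the sample rows are all hypothetical, and the SQL is SQLite dialect rather than T-SQL (strftime would need a T-SQL equivalent on SQL Server).

```python
import sqlite3

# Hypothetical table and sample rows, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE game_log (game_date TEXT, result TEXT)")
conn.executemany(
    "INSERT INTO game_log VALUES (?, ?)",
    [
        ("2016-04-28", "W"), ("2016-04-29", "W"), ("2016-04-30", "W"),
        ("2016-05-01", "W"), ("2016-05-02", "W"), ("2016-05-03", "L"),
    ],
)

# Static grouping: bucket every row by calendar month.
monthly_wins = conn.execute("""
    SELECT strftime('%Y-%m', game_date)                  AS game_month,
           SUM(CASE WHEN result = 'W' THEN 1 ELSE 0 END) AS wins,
           COUNT(*)                                      AS games
    FROM game_log
    GROUP BY game_month
    ORDER BY game_month
""").fetchall()

print(monthly_wins)
```

In this sample the five consecutive wins straddle the April/May boundary, so the monthly totals (3 wins, then 2) never reveal the streak. That is the kind of ill-conceived split a fixed boundary can produce.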

Basic Gaps/Islands Analysis
A self-joining query (of some sort) can locate missing data and build analysis based on it.
Useful for analyzing consistent sequences of data.
Can determine streaks, whether positive or negative.
Many ways to perform analysis on numeric data.
Carefully consider data quality prior to analysis!

Basic Gaps/Islands Analysis Demo
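
The original demo is not included in the transcript. The sketch below shows one common way to find islands and gaps in a simple integer sequence. It uses window functions (ROW_NUMBER and LEAD) instead of a literal self-join, so it is a swapped-in technique and not necessarily what was presented. The seq table and its values are hypothetical, and the SQL runs on SQLite (3.25 or later) through Python's sqlite3 module.

```python
import sqlite3

# Hypothetical sequence with two gaps: 4-5 and 8-9 are missing.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE seq (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO seq VALUES (?)", [(n,) for n in (1, 2, 3, 6, 7, 10)])

# Islands: within a consecutive run, id minus its row number is constant,
# so that difference can serve as a grouping key.
islands = conn.execute("""
    WITH numbered AS (
        SELECT id, id - ROW_NUMBER() OVER (ORDER BY id) AS grp
        FROM seq
    )
    SELECT MIN(id) AS island_start, MAX(id) AS island_end
    FROM numbered
    GROUP BY grp
    ORDER BY island_start
""").fetchall()

# Gaps: compare each id with the next one; a jump larger than 1 is a gap.
gaps = conn.execute("""
    WITH paired AS (
        SELECT id, LEAD(id) OVER (ORDER BY id) AS next_id
        FROM seq
    )
    SELECT id + 1 AS gap_start, next_id - 1 AS gap_end
    FROM paired
    WHERE next_id - id > 1
    ORDER BY gap_start
""").fetchall()

print(islands)
print(gaps)
```

The last row has no successor, so its next_id is NULL and the comparison filters it out without any special handling.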

Finding Data Clusters
Data can be organically grouped based on self-referential criteria.
Allows for related events to be identified.
Introduces internal proximity into analytics.
Data groups itself into clusters, regardless of external metrics.
Must determine grouping rules prior to analysis.

Finding Data Clusters Demo
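
The demo code is not in the transcript, so the following is only a sketch of the idea. It assumes one possible grouping rule: an event belongs to the same cluster as the previous event if it occurs within max_gap minutes of it. The event_log table, the event_minute column, the sample values, and the 10-minute threshold are all hypothetical. The SQL is SQLite dialect, run through Python's sqlite3 module.

```python
import sqlite3

# Hypothetical events, recorded as minutes since some starting point.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE event_log (event_minute INTEGER)")
conn.executemany(
    "INSERT INTO event_log VALUES (?)",
    [(m,) for m in (0, 3, 7, 30, 32, 90)],
)

clusters = conn.execute("""
    WITH flagged AS (
        -- Flag a row as the start of a cluster unless it falls within
        -- :max_gap minutes of the previous event. The first row has no
        -- predecessor (LAG is NULL), so it falls through to ELSE 1.
        SELECT event_minute,
               CASE WHEN event_minute
                         - LAG(event_minute) OVER (ORDER BY event_minute)
                         <= :max_gap
                    THEN 0 ELSE 1 END AS starts_cluster
        FROM event_log
    ),
    numbered AS (
        -- A running total of the start flags gives each cluster an id.
        SELECT event_minute,
               SUM(starts_cluster) OVER (
                   ORDER BY event_minute ROWS UNBOUNDED PRECEDING
               ) AS cluster_id
        FROM flagged
    )
    SELECT cluster_id,
           MIN(event_minute) AS first_event,
           MAX(event_minute) AS last_event,
           COUNT(*)          AS event_count
    FROM numbered
    GROUP BY cluster_id
    ORDER BY cluster_id
""", {"max_gap": 10}).fetchall()

print(clusters)
```

The cluster boundaries here come from the data's own spacing, not from an external calendar or bucket size. Changing max_gap changes the clusters, which is why the grouping rule has to be settled before the analysis.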

Answering Crazy Questions
Filters can control what data we include.
Existence checks control cluster parameters.
Join predicates determine what to group together.
Examples of metrics: streaks, droughts, performance, unusual patterns, etc.
Dynamic SQL: loop through dimensions to gather semi-automated insight.

Answering Crazy Questions: Lots and Lots of Demos
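
None of the demos survive in the transcript. As one illustration of a streak-style question, the sketch below finds the longest run of a given result for each team. The game_log table, its columns, and the data are hypothetical. A filter picks which result to measure, and PARTITION BY plays the role the slide assigns to join predicates, which is an assumption about how the ideas map, not the presenter's code. The Python loop with a bound parameter only approximates the "loop through dimensions" idea; it is not dynamic SQL.

```python
import sqlite3

# Hypothetical results: team A plays W W L W W W, team B plays L W L.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE game_log (team TEXT, game_no INTEGER, result TEXT)")
rows = [("A", n, r) for n, r in enumerate("WWLWWW", start=1)]
rows += [("B", n, r) for n, r in enumerate("LWL", start=1)]
conn.executemany("INSERT INTO game_log VALUES (?, ?, ?)", rows)

LONGEST_RUN_SQL = """
    WITH marked AS (
        -- Row number within the team minus row number within the
        -- (team, result) pair stays constant across an unbroken run.
        SELECT team, result,
               ROW_NUMBER() OVER (PARTITION BY team ORDER BY game_no)
             - ROW_NUMBER() OVER (PARTITION BY team, result ORDER BY game_no)
               AS run_grp
        FROM game_log
    ),
    runs AS (
        SELECT team, COUNT(*) AS run_len
        FROM marked
        WHERE result = :result        -- the filter decides what we measure
        GROUP BY team, run_grp
    )
    SELECT team, MAX(run_len) AS longest_run
    FROM runs
    GROUP BY team
    ORDER BY team
"""

def longest_run(result):
    """Longest unbroken run of `result` per team (hypothetical helper)."""
    return conn.execute(LONGEST_RUN_SQL, {"result": result}).fetchall()

# Ask the same question for each value of one dimension.
longest_by_result = {r: longest_run(r) for r in ("W", "L")}
print(longest_by_result)
```

The filter is applied after the row numbers are computed. Filtering first would renumber the remaining rows and merge runs that were actually interrupted.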

Performance
Generally, these analytics rely on index/table scans.
Not intended for OLTP.
Run on data that is replicated, on an Availability Group (AG) secondary, loaded by ETL, in OLAP, restored from backup, etc.
Helpful tools:
  Covering indexes.
  Columnstore indexes.
  In-Memory OLTP.
  Automated analytics.

Gotchas
Fully understand data quality:
  NULLs
  Missing data
  Unexpected inputs/data values
  Duplicate data
The borders of a cluster within a multi-partitioned data set may require special treatment.
QA: thoroughly test all use cases!
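
To make the duplicate-data and NULL gotchas concrete, here is a small sketch built on a hypothetical readings table. It assumes the islands are found with the "value minus row number" technique, which may not be the method used in the talk. With a duplicated value, ROW_NUMBER keeps counting while the value stands still, so the grouping key drifts; DENSE_RANK gives duplicates the same rank and keeps the key stable. SQLite dialect, run through Python's sqlite3 module.

```python
import sqlite3

# Hypothetical data with one duplicate (2) and one NULL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (id INTEGER)")
conn.executemany(
    "INSERT INTO readings VALUES (?)",
    [(1,), (2,), (2,), (3,), (None,), (5,)],
)

ISLANDS_SQL = """
    WITH numbered AS (
        SELECT id, id - {ranking}() OVER (ORDER BY id) AS grp
        FROM readings
        WHERE id IS NOT NULL          -- keep NULLs out of the sequence
    )
    SELECT MIN(id) AS island_start, MAX(id) AS island_end
    FROM numbered
    GROUP BY grp
    ORDER BY island_start
"""

# The ranking function name is substituted from a fixed list, never user input.
naive_islands = conn.execute(ISLANDS_SQL.format(ranking="ROW_NUMBER")).fetchall()
safe_islands = conn.execute(ISLANDS_SQL.format(ranking="DENSE_RANK")).fetchall()

print("ROW_NUMBER:", naive_islands)
print("DENSE_RANK:", safe_islands)
```

The DENSE_RANK version reports the islands 1-3 and 5-5, while the ROW_NUMBER version reports overlapping, incorrect ranges for the same rows. A test case with deliberately dirty data like this one is a cheap way to catch the problem during QA.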

Conclusion
Data can be organically grouped, regardless of complexity.
Results can be used to determine many useful metrics:
  Winning/losing streaks.
  Data clusters.
  Related events.
  Patterns or abnormalities within data.
Be creative and find innovative solutions to seemingly impossible problems.

Questions???

Contact Info & Links for Ed Pollack
ed7@alum.rpi.edu
@EdwardPollack
SQL Shack
SQL Server Central
Dynamic SQL: Applications, Performance, and Security
SQL Saturday Albany (2016)
Thank you!