Columnstore Index: is it the DW "Faster" switch you are looking for?
  Johan Ludvig Brattås
  Oslo, 22 February 2017
About me
  Johan Ludvig Brattås
  Johan-ludvig.brattas@capgemini.com
  @intoleranse
  Managing consultant, Capgemini Norway
  SQLSaturday Oslo
  Excel BI virtual chapter
Agenda
  Columnstore basics
  What's new in 2016
  What is Operational Analytics and In-Memory Analytics
  When to use Clustered Columnstore
  When not to use columnstore
  Sofie
Columnstore basics
  First introduced in SQL Server 2012 ("version 0.4"…)
  Current version, SQL Server 2016 = "0.9"
  Hybrid in-memory technology, based on the xVelocity engine
  Columnstore as opposed to rowstore
  Data is grouped into rowgroups of up to 1,048,576 rows
  Minimum size 102,400 rows (smaller bulk loads land in the deltastore)
  Column segments within each rowgroup
  Deltastore
  Batch mode execution
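The rowgroup and deltastore layout described above can be inspected on a live system. A minimal sketch, assuming SQL Server 2016 or later and a columnstore index on the dbo.FactPunktlighetRegularitet table that appears in the later examples; the choice of columns is illustrative only.

```sql
-- One row per rowgroup. OPEN and CLOSED rowgroups live in the deltastore;
-- COMPRESSED rowgroups are stored as column segments.
SELECT  OBJECT_NAME(rg.object_id) AS table_name,
        rg.partition_number,
        rg.row_group_id,
        rg.state_desc,          -- e.g. OPEN, CLOSED, COMPRESSED, TOMBSTONE
        rg.total_rows,          -- at most 1,048,576 per rowgroup
        rg.deleted_rows,
        rg.trim_reason_desc     -- why a rowgroup holds fewer rows than the maximum
FROM    sys.dm_db_column_store_row_group_physical_stats AS rg
WHERE   rg.object_id = OBJECT_ID(N'dbo.FactPunktlighetRegularitet')
ORDER BY rg.partition_number, rg.row_group_id;
```

Many small COMPRESSED rowgroups or a large number of deleted rows usually suggest that the index would benefit from maintenance.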
What's new in 2016?
  Updateable nonclustered columnstore (a.k.a. Operational Analytics)
  In-Memory Analytics
  Nonclustered rowstore indexes on tables with a clustered columnstore index
  Improved batch mode functionality
  Primary and foreign key support on tables with clustered columnstore indexes
  String predicate pushdown
  Simple aggregate predicate pushdown
  Improved DMVs
  Better index reorganization
  Batch mode improvements:
    Batch mode in single-threaded (one-core) execution plans (for under-specced systems…)
    Batch mode support for the SORT operator
    Batch mode for multiple DISTINCT COUNT
    Support for left anti semi joins (NOT EXISTS…)
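Two of the 2016 features listed above can be sketched in a few lines. This assumes the clustered columnstore index CCI_FactPunktlighetRegularitet from the later examples already exists on the table; the rowstore index name IX_FactPR_FlybevegelseId is hypothetical.

```sql
-- New in 2016: a nonclustered rowstore (B-tree) index on a table that
-- already has a clustered columnstore index, useful for point lookups.
CREATE NONCLUSTERED INDEX IX_FactPR_FlybevegelseId   -- hypothetical name
ON dbo.FactPunktlighetRegularitet (FLYBEVEGELSE_ID);
GO

-- Improved reorganization: force open and closed deltastore rowgroups
-- into compressed rowgroups without rebuilding the whole index.
ALTER INDEX CCI_FactPunktlighetRegularitet
ON dbo.FactPunktlighetRegularitet
REORGANIZE WITH (COMPRESS_ALL_ROW_GROUPS = ON);
GO
```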
What is Operational Analytics
  Real-time analytics…
  Only designed for nonclustered columnstore
  The only real case for nonclustered columnstore usage
In-Memory Analytics
  Limitations:
    Adds 20% more memory usage
  DEMO: In-Memory Analytics
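As an illustration of the operational analytics pattern, here is a minimal sketch of an updateable nonclustered columnstore index on an OLTP table. The table dbo.Orders, its columns, the status value 5 and the 60-minute delay are all assumptions made up for this example, not part of the original demo.

```sql
-- Hypothetical OLTP table, for illustration only.
CREATE TABLE dbo.Orders (
    OrderId     int IDENTITY(1,1) NOT NULL PRIMARY KEY,
    CustomerId  int            NOT NULL,
    OrderStatus tinyint        NOT NULL,   -- assumption: 5 = shipped (no longer changing)
    OrderDate   datetime2(0)   NOT NULL,
    Amount      decimal(18, 2) NOT NULL
);
GO

-- Updateable nonclustered columnstore index (SQL Server 2016).
-- The filter keeps frequently changing rows out of the columnstore;
-- COMPRESSION_DELAY keeps recent rows in the deltastore for a while.
CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_Orders
ON dbo.Orders (CustomerId, OrderStatus, OrderDate, Amount)
WHERE OrderStatus = 5
WITH (COMPRESSION_DELAY = 60 MINUTES);
GO

-- An analytical query that can be answered from the columnstore index
-- while the OLTP workload keeps using the rowstore.
SELECT  CustomerId,
        COUNT(*)    AS OrderCount,
        SUM(Amount) AS TotalAmount
FROM    dbo.Orders
WHERE   OrderStatus = 5
GROUP BY CustomerId;
```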
When to use Clustered Columnstore
  Huge tables, like really gigantic. Ok?
  Lots of columns
  Star-schema/normalized fact tables
  Tables you want to scan, not seek
  Demo
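The "scan, not seek" criterion refers to queries like the sketch below: an aggregation over most of a fact table that touches only a few of its many columns. The column names come from the index definitions later in this deck; that FORSINKELSE_MIN is numeric is an assumption, and this is not the query used in the demo.

```sql
-- Reads every row but only two columns, so a columnstore index only
-- has to scan the segments for PeriodeSk and FORSINKELSE_MIN.
SELECT  PeriodeSk,
        COUNT(*)             AS RowCnt,
        SUM(FORSINKELSE_MIN) AS TotalDelayMin   -- assumes a numeric column
FROM    dbo.FactPunktlighetRegularitet
GROUP BY PeriodeSk
ORDER BY PeriodeSk;
```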
Some other examples on larger tables…
  Existing index:
    CREATE NONCLUSTERED INDEX [FactPR_Flybevegelse_ix]
    ON [dbo].[FactPunktlighetRegularitet] ([FLYBEVEGELSE_ID] ASC)
    WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF,
          DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
    ON [PRIMARY]
    GO
  Result: (154586 row(s) affected) 00:00:27
  Recommended index:
    CREATE NONCLUSTERED INDEX [Test_Full]
    ON [dbo].[FactPunktlighetRegularitet] ([PeriodeSk],[ISDELETED])
    INCLUDE ([FactKey],[FlygningArtSk],[FlyplassSk],[FlyselskapSk],[FlytypeSk],[TimeSk],[ForsinkelseSk],
      [OrganisasjonSk],[AvgAnkSk],[VektKlasseSk],[AvgAnkFlyplassSk],[KansellertSk],[GateSk],[HandlerSk],
      [FlyoppstillingsplassSk],[AvgAnkFlyoppstillingsplassSk],[FLYBEVEGELSE_ID],[ACT_FLIGHT_KEY],[ACT_FLIGHT_LEG_KEY],
      [FORSINKELSE_MIN],[FLYBEVEGELSE_DATO],[KALLESIGNAL],[STD],[STA],[ETD],[ETA],[OFF_BLOCK],[ON_BLOCK],[ATD],[ATA],
      [REGMERKE],[ANTALL_SETER],[MTOW],[NOX_UTSLIPPSVERDI],[FRAKT],[POST],[FRAKT_POST],[FORSINKELSE_MIN_OVER_15_MIN],
      [FORSINKELSE_MIN_UNDER_3_MIN],[Antall_forsinkelse_kjent_Tid],[Antall_forsinkelse_kjent_Tid_3],
      [Antall_forsinkelse_totalt],[Totalt_Antall_kjent_tid],[Antall_kanselleringer],[FLIGHT_ID],
      [FLYBEVEGELSE_ID_KANSELLERT],[FLYBEVEGELSE_ID_FORSINKELSE],[TOUCH_GO_COUNT],[FellesOrgSK],[motsattOrganisasjonSK],
      [PeriodeSTDSk],[TimeSTDSk],[PeriodeSTASk],[TimeSTASk],[PeriodeATASk],[TimeATASk],[PeriodeATDSk],[TimeATDSk],
      [PeriodeOnBlockSk],[TimeOnBlockSk],[PeriodeOffBlockSk],[TimeOffBlockSk],[KorrigeringPost],[KorrigeringFrakt],
      [FirstBag],[Lastbag],[BagsKvalitetSk],[BusGateSk],[OmradeOSLSk])
    GO
  Result: (154586 row(s) affected) 00:00:05
  With Clustered Columnstore:
    CREATE CLUSTERED COLUMNSTORE INDEX CCI_FactPunktlighetRegularitet
    ON [dbo].[FactPunktlighetRegularitet];
  Result: (154586 row(s) affected) 00:00:09
  With Nonclustered Columnstore:
    CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_FactPunktlighetRegularitet
    ON [dbo].[FactPunktlighetRegularitet]
    (FactKey, FlygningArtSk, FlyplassSk, FlyselskapSk, FlytypeSk, PeriodeSk, TimeSk, ForsinkelseSk,
     OrganisasjonSk, AvgAnkSk, VektKlasseSk, AvgAnkFlyplassSk, KansellertSk, GateSk, HandlerSk,
     FlyoppstillingsplassSk, AvgAnkFlyoppstillingsplassSk, FLYBEVEGELSE_ID, ACT_FLIGHT_KEY, ACT_FLIGHT_LEG_KEY,
     FORSINKELSE_MIN, FLYBEVEGELSE_DATO, KALLESIGNAL, STD, STA, ETD, ETA, OFF_BLOCK, ON_BLOCK, ATD, ATA,
     REGMERKE, ANTALL_SETER, MTOW, NOX_UTSLIPPSVERDI, FRAKT, POST, FRAKT_POST, BatchId,
     FORSINKELSE_MIN_OVER_15_MIN, FORSINKELSE_MIN_UNDER_3_MIN, Antall_forsinkelse_kjent_Tid,
     Antall_forsinkelse_kjent_Tid_3, Antall_forsinkelse_totalt, Totalt_Antall_kjent_tid, Antall_kanselleringer,
     FLIGHT_ID, FLYBEVEGELSE_ID_KANSELLERT, FLYBEVEGELSE_ID_FORSINKELSE, TOUCH_GO_COUNT, FellesOrgSK,
     motsattOrganisasjonSK, PeriodeSTDSk, TimeSTDSk, PeriodeSTASk, TimeSTASk, PeriodeATASk, TimeATASk,
     PeriodeATDSk, TimeATDSk, PeriodeOnBlockSk, TimeOnBlockSk, PeriodeOffBlockSk, TimeOffBlockSk,
     PeriodeStopBeltSK, TimeStopBeltSK, PeriodeStartBeltSK, TimeStartBeltSK, PARK_STA, Belt_Start_Time, Belt,
     KorrigeringPost, KorrigeringFrakt, ISDELETED, FirstBag, Lastbag, BagsKvalitetSk, BusGateSk, OmradeOSLSk,
     Flygningsart)
  Result: (154586 row(s) affected) 00:00:10
  With both CCI and full row index:
  Result: (154586 row(s) affected) 00:00:05
exec sp_spaceused 'FactPunktlighetRegularitet'
  name: FactPunktlighetRegularitet, rows: 19627654
  Scenario                          reserved      data         index_size   unused
  Current usage                     9095832 KB    8662720 KB   428416 KB    4696 KB
  With recommended rowstore index   16164000 KB                7496512 KB   4768 KB
  With clustered columnstore        2485568 KB    2485328 KB   0 KB         240 KB
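To make the size difference easier to read, the reserved figures can be put side by side. Measured this way, the table with the clustered columnstore index takes roughly 3.7 times less space than the current layout and roughly 6.5 times less than the layout with the recommended rowstore index. The query below is only a convenience sketch over the numbers reported above; the column aliases are made up.

```sql
-- Reserved space per scenario (KB values copied from sp_spaceused),
-- converted to GB and compared with the clustered columnstore size.
SELECT  s.scenario,
        s.reserved_kb,
        CAST(s.reserved_kb / 1048576.0 AS decimal(10, 2))             AS reserved_gb,
        CAST(s.reserved_kb * 1.0 / cci.reserved_kb AS decimal(10, 2)) AS size_vs_cci
FROM   (VALUES
            (N'Current usage',                9095832),
            (N'Recommended rowstore index',  16164000),
            (N'Clustered columnstore',        2485568)
       ) AS s (scenario, reserved_kb)
CROSS JOIN (VALUES (2485568)) AS cci (reserved_kb);
```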
When not to use Clustered Columnstore
  Small and narrow tables
  Tables used to search for single rows or small ranges
  Tin can servers…
  It really depends…
Fraud database
  40 columns, 15 billion rows, 7 TB disk usage
  Creating the clustered columnstore index took 9 hours
  Database   Query              Comment               Feature                              Response time
  MS SQL     Query 1 (MS SQL)   disk usage: 1.7 TB    partitioned + column store           44 sec
                                disk usage: 7.7 TB    partitioned + column store + index   4 sec
  Oracle     Query 1 (Oracle)                         index                                21:70 sec
                                                      InMemory                             6:13 sec
Links
  http://www.nikoport.com/columnstore
  https://github.com/NikoNeugebauer/CISL
  https://github.com/NikoNeugebauer/MOSL