DataOps: DevOps for Analytics
Hi, I’m Steph! An MVP & data science consultant.
DevOps: Dev, Test, Ops
What is DevOps? DevOps bridges the gap between different teams, culturally and technologically, to reduce the risk and cost of software development in an organisation.
Culturally, it means things like ownership of changes, “left-shifting” responsibility for testing, and lower bureaucracy.
Technologically, it means things like unit testing, immutable architecture, continuous integration, continuous deployment, and real-time application monitoring and alerting.
DBs, Test, Ops
What’s DLM? Database Lifecycle Management is the work Microsoft and Redgate are doing to facilitate unit testing, CI, CD, and pipeline management for databases.
Is it DevOps? It depends! Yes, people need to do some of this database work for applications, so DLM needs to be usable in a DevOps shop, but they should also be doing it for data warehouses and other data repositories. It’s vital, but given that it’s not just about development, it should be considered separately.
BI, Test, Ops
What do people (sort of) use to make decisions? Facts, data, information. These are retrieved from source systems, integrated, transformed, and finally reported on. There are many steps involved and many points of failure.
What to do about it? Test, test, test!
But how? Unit test frameworks, metadata-driven approaches, and scripted builds. These enable you to be more RIGHT. Incorporate them in an automated fashion and you’re bringing Ops and Testing closer together. This is DataOps.
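As one illustration of "unit test frameworks" combined with a "metadata-driven" approach, here is a minimal sketch in Python. Everything in it is an assumption made for the example: the `sales` table, the `CHECKS` metadata list, the rule names, and the use of an in-memory SQLite database in place of a real warehouse. The idea is only that the checks live in data, so adding a new test means adding a row of metadata, not writing new code.

```python
import sqlite3

# Hypothetical metadata: each entry describes one check to run against a table.
CHECKS = [
    {"table": "sales", "column": "order_id", "rule": "not_null"},
    {"table": "sales", "column": "order_id", "rule": "unique"},
    {"table": "sales", "column": "amount", "rule": "non_negative"},
]

# SQL templates (illustrative) that count the violations of each rule.
RULE_SQL = {
    "not_null": "SELECT COUNT(*) FROM {table} WHERE {column} IS NULL",
    "unique": (
        "SELECT COUNT(*) FROM (SELECT {column} FROM {table} "
        "WHERE {column} IS NOT NULL GROUP BY {column} HAVING COUNT(*) > 1)"
    ),
    "non_negative": "SELECT COUNT(*) FROM {table} WHERE {column} < 0",
}


def run_checks(conn, checks):
    """Return a list of (check, violation_count) for every failing check."""
    failures = []
    for check in checks:
        sql = RULE_SQL[check["rule"]].format(**check)
        (violations,) = conn.execute(sql).fetchone()
        if violations:
            failures.append((check, violations))
    return failures


def build_demo_db(rows):
    """Create an in-memory database holding a small demo sales table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (order_id INTEGER, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    return conn


if __name__ == "__main__":
    # Demo data with a duplicated order_id and a negative amount.
    conn = build_demo_db([(1, 10.0), (2, 5.5), (2, -3.0)])
    for check, violations in run_checks(conn, CHECKS):
        print(f"FAILED {check['rule']} on {check['table']}.{check['column']}: {violations}")
```

Run as part of a scripted build, a non-empty failure list could be used to stop a load before bad data reaches a report. In practice the metadata would more likely live in a table or config file than in the script itself.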
Analytics, Test, Ops
Analytics, AKA data science, statistics, data mining, machine learning.
The problem of time: we saw in BI that we need to be RIGHT, but there we’re only ever aiming for RIGHT at the point of presentation. Analytics is aiming to be RIGHT over people and time. It’s about making predictions that are as accurate tomorrow as they are today. That’s really hard: once you start changing things, you can’t tell what would have happened if you hadn’t, unless you experiment. You need to be able to account for changing behaviour over time, changes to data capture, and so on. This means your modelling has to be updateable and refinements easy to deploy, while remaining robust.
How to solve? Coded analysis, regular retraining, and a validation and deployment pipeline are all needed to facilitate ongoing analytics. Analytics has to be automated to be able to cope.
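To make "regular retraining" plus "a validation and deployment pipeline" a little more concrete, here is a minimal sketch of a validation gate in Python. It is an illustration under stated assumptions, not a recommended implementation: the "model" is a toy that always predicts the mean of its training data, and the function names (`fit_mean_model`, `choose_model`) and the `tolerance` parameter are hypothetical. The point is only the shape of the step: retrain on recent data, score both models on held-out data, and promote the retrained model only if it is no worse.

```python
from statistics import mean


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))


def fit_mean_model(history):
    """Toy 'model': always predicts the mean of the data it was trained on."""
    level = mean(history)
    return lambda n: [level] * n


def choose_model(current, candidate, holdout, tolerance=0.0):
    """Promote the candidate only if its holdout error is not worse than
    the current model's error plus a tolerance."""
    current_err = mean_absolute_error(holdout, current(len(holdout)))
    candidate_err = mean_absolute_error(holdout, candidate(len(holdout)))
    if candidate_err <= current_err + tolerance:
        return "candidate", candidate_err
    return "current", current_err


if __name__ == "__main__":
    old_data = [10, 11, 9, 10]      # what the deployed model was trained on
    recent_data = [14, 15, 16, 15]  # behaviour has shifted since then
    holdout = [15, 16, 14]          # recent observations kept out of training

    current = fit_mean_model(old_data)
    candidate = fit_mean_model(recent_data)
    print(choose_model(current, candidate, holdout))
```

In an automated pipeline this comparison would run on a schedule, and the returned decision would drive the deployment step. A real gate would use a proper model and a metric suited to the problem.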
Data, Test, Ops
DataOps
What do DBs, BI, and Analytics have in common? They’re all about the data.
DataOps is all about bringing rigour and speed of delivery to how we store, process, use, and present data. It’s a vital movement that will enable BI departments to keep pace with the “modern world” and help them scale to meet the increasing demand for data.
So, yeah, #DataOps is totally a thing!
DataOps is moving data roles and ops closer
Current (diagram of People, Tools, Process): quantified value; planning; continuous improvement; code review; coping with change; productivity tools; satisfaction / fitness metrics; continuous integration; teach simplicity; continuous learning; automation; quick to build; coherent communication; face to face / virtual.
Ideal (diagram of People, Tools, Process): quantified value; planning; continuous improvement; code review; coping with change; continuous learning; productivity tools; satisfaction / fitness metrics; face to face / virtual; continuous integration; teach simplicity; automation; quick to build; coherent communication.
To improve is to change; to be perfect is to change often. Winston Churchill
Current Get & Tidy Transform Viz Model Transform @hadleywickham
Ideal Viz Model Transform Get & Tidy Transform @hadleywickham
Process Value / idea Prototype Dev Test Deliver Prioritise Release Ticket Prioritise Dev Test Release
It’s never about how you start – it’s always about how you finish The Rock
Good reading The Phoenix Project The Art of Agile Agile Data Warehouse Design DataOps.info DataOps Manifesto
Collaboration Azure Visual Studio Online Trello GitHub White board & post-its Slack http://itsalocke.com/index.php/database-bi-related-unit-testing-options/
Data Cost Tooling Learning SSDT Free Medium-High Medium Redgate DLM Anchor Modeling Low High MSBuild Medium-high CosmosDB http://itsalocke.com/index.php/database-bi-related-unit-testing-options/ Github.com/stephlocke/MeDriAnchor http://sqlbits.com/Sessions/Event14/Metadata_Driven_Automation_A_Primer
ETL Cost Tooling Learning BIML Free or High High SQL Free Medium Powershell Azure Functions Low SSIS Free or Medium http://itsalocke.com/index.php/database-bi-related-unit-testing-options/ Github.com/stephlocke/MeDriAnchor http://sqlbits.com/Sessions/Event14/Metadata_Driven_Automation_A_Primer
Reporting Cost Tooling Learning Excel Free/Low Low PowerBI Medium Medium SSRS Free-Medium Fairly low R Free High Medium-high Other ? http://itsalocke.com/index.php/database-bi-related-unit-testing-options/ http://sqlbits.com/Sessions/Event14/Delivering_Agile_Analytics_with_Azure_Machine_Learning
Cubes Cost Tooling Learning Tabular Free or Medium Medium Original Free or High Low High http://itsalocke.com/index.php/database-bi-related-unit-testing-options/ Github.com/stephlocke/MeDriAnchor http://sqlbits.com/Sessions/Event14/Metadata_Driven_Automation_A_Primer
Data Science Cost Tooling Learning R Free High Python Docker Free/Medium Medium Microsoft ML Free / High Azure ML Free / Low Low H2O Spark
Follow up! @stefflocke @lockedata steph@itsalocke.com