Denis Reznik Data Architect, Intapp, Inc. Microsoft Data Platform MVP

Presentation transcript:

Data Driven Future. Denis Reznik, Data Architect, Intapp, Inc., Microsoft Data Platform MVP

About Me: Denis Reznik, Kyiv, Ukraine. Data Architect at Intapp, Inc.; Microsoft Data Platform MVP; PASS Regional Mentor, CEE; Co-Founder of the Ukrainian Data Community Kyiv; co-author of “SQL Server MVP Deep Dives vol. 2”; organizer of the SQLSaturday Kyiv conference

Agenda: Data is a New Oil (c); Data and Science; Data in Big Companies; Data and Application Development; Data-Driven Future

Data is a New Oil: “Data is the new oil. It’s valuable, but if unrefined it cannot really be used. It has to be changed into gas, plastic, chemicals, etc. to create a valuable entity that drives profitable activity; so must data be broken down, analyzed for it to have value.” (c) Clive Humby, UK Mathematician

Data and Science. Thousands of years: Empirical. Few hundreds of years: Theoretical. Last fifty years: Computational, “Query the world”. Last twenty years: eScience (Data Science), “Download the world”.

Data Science is a new term, but in the same sense as Columbus discovered a NEW continent 1000 years ago. (c) Hector Garcia-Molina, Professor in the Departments of Computer Science and Electrical Engineering at Stanford University

Machine Learning: Supervised Learning (Classification, Regression); Unsupervised Learning

Linear Regression: Training Data -> Learning Algorithm -> h (h - Hypothesis). Inputs: Ocean Temperature, Oil Derricks in Area, Distance from the Continent. Output: Whales Population.
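The hypothesis h from this slide can be illustrated with a minimal sketch. It assumes a single input feature (distance from the continent) and ordinary least squares as the learning algorithm, which may differ from what the demo used. The data values and the names fit_line and predict are hypothetical.

```python
def fit_line(xs, ys):
    """Fit the hypothesis h(x) = intercept + slope * x by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of co-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(model, x):
    """Apply the learned hypothesis h to a new input."""
    intercept, slope = model
    return intercept + slope * x


# Hypothetical training data, invented for illustration only.
distance_km = [10.0, 20.0, 30.0, 40.0]
whales = [12.0, 22.0, 32.0, 42.0]

model = fit_line(distance_km, whales)
estimate = predict(model, 50.0)
```

With more than one input (ocean temperature, oil derricks in area), the same idea extends to multiple linear regression, where h has one coefficient per feature.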

DEMO Linear Regression

Data in Big Companies

source: http://www. visualcapitalist

Parallel Processing. Q: How many times was the temperature above the norm during the last week? Temperature Sensor Datasets (n items). A: 5. Time: 2 hours. Algorithmic Complexity: O(n)

Parallel Processing. Q: How many times was the temperature above the norm during the last week? Temperature Sensor Datasets (k datasets, n/k items in each one). Partial answers: A: 1, A: 0, A: 3, A: 4. Time: 0.5 hour. Algorithmic Complexity: O(n/k)

Map-Reduce. Map -> COUNT(*) WHERE Value > 40, per dataset: A: 1, A: 0, A: 3, A: 4. Reduce -> COUNT(*): A: 5
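The map and reduce steps can be sketched as follows. This is a minimal illustration, not the code from the demo: the sensor readings and function names are hypothetical, and a thread pool stands in for the separate machines a real map-reduce framework would use. The map step counts values above 40 in each dataset independently, and the reduce step sums the partial counts.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

THRESHOLD = 40  # mirrors "Value > 40" on the slide


def map_count(dataset):
    """Map step: count the readings above the threshold in one dataset."""
    return sum(1 for value in dataset if value > THRESHOLD)


def map_reduce(datasets):
    # Map: each dataset is processed independently, so the calls can run concurrently.
    with ThreadPoolExecutor() as pool:
        partial_counts = list(pool.map(map_count, datasets))
    # Reduce: combine the partial counts into a single total.
    total = reduce(lambda a, b: a + b, partial_counts, 0)
    return partial_counts, total


# Hypothetical sensor readings, split into four datasets.
datasets = [
    [38, 41, 39],
    [36, 37, 40],
    [42, 45, 39, 41],
    [35, 44],
]

partial_counts, total = map_reduce(datasets)
```

Because no map call depends on another, adding workers shortens the map phase, while the reduce step only has to touch one number per dataset.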

DEMO Map-Reduce

Database History. 1960s: IMS, CODASYL. 1970s: E.F. Codd’s Paper, System R, Ingres, SQL. 1980s: RDBMS Commercial Success. 1990s: Object Databases. 2000s: Google BigTable Paper, Amazon Dynamo Paper, NoSQL (Johan Oskarsson). Nowadays: NewSQL (?)

NoSQL SQL

Databases: Relational, Key-Value, Column-Family, Graph, Document

Index (B-Tree) - Seek: SELECT * FROM Users WHERE Id = 523. Root level: 1 .. 1M. Intermediate level: 1 .. 2K, 2K+1 .. 4K, …, 1M-2K .. 1M. Leaf level: 1 .. 300, 301 .. 800, 801 .. 1.5K, 1.5K+1 .. 2K, …

Index (B-Tree) - Scan: SELECT * FROM Users. Root level: 1 .. 1M. Intermediate level: 1 .. 2K, 2K+1 .. 4K, …, 1M-2K .. 1M. Leaf level: 1 .. 300, 301 .. 800, 801 .. 1.5K, 1.5K+1 .. 2K, …
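The difference between a seek and a scan can be sketched with a simplified two-level index. This is an assumption-laden illustration and not a real B-tree: keys are split into fixed-size sorted leaf pages, and a list of each page's first key plays the role of the upper levels. The page size and the names build_index, seek, and scan are hypothetical.

```python
from bisect import bisect_left, bisect_right

PAGE_SIZE = 100  # hypothetical number of keys per leaf page


def build_index(sorted_ids, page_size=PAGE_SIZE):
    """Split sorted keys into leaf pages and record the first key of each page."""
    pages = [sorted_ids[i:i + page_size] for i in range(0, len(sorted_ids), page_size)]
    first_keys = [page[0] for page in pages]
    return first_keys, pages


def seek(index, key):
    """Seek: use the upper level to jump straight to the one leaf page that may hold the key."""
    first_keys, pages = index
    page_no = bisect_right(first_keys, key) - 1
    if page_no < 0:
        return None, 0  # key is smaller than every indexed key; no leaf page is read
    page = pages[page_no]
    pos = bisect_left(page, key)
    found = pos < len(page) and page[pos] == key
    return (key if found else None), 1  # exactly one leaf page was read


def scan(index):
    """Scan: read every leaf page in order."""
    _, pages = index
    rows = [key for page in pages for key in page]
    return rows, len(pages)


index = build_index(list(range(1, 10001)))
row, pages_read_by_seek = seek(index, 523)   # like WHERE Id = 523
rows, pages_read_by_scan = scan(index)       # like SELECT * with no filter
```

The seek reads a single leaf page regardless of table size, while the scan reads all of them, which is why a selective predicate on an indexed column is so much cheaper.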

Hashtable (diagram): the keys John Snow, Jim Beam, and Peter Parker pass through a Hash Function, which maps each key to one of the buckets 1-4.
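A hash table of this kind can be sketched in a few lines. The hash function below (sum of character codes modulo the bucket count) is a hypothetical stand-in chosen for readability, not the one from the slide, and collisions are handled by keeping a small list per bucket.

```python
class HashTable:
    """A minimal hash table with separate chaining."""

    def __init__(self, bucket_count=4):
        self.buckets = [[] for _ in range(bucket_count)]

    def _bucket_index(self, key):
        # Hypothetical hash function: sum of character codes modulo the bucket count.
        return sum(ord(ch) for ch in key) % len(self.buckets)

    def put(self, key, value):
        bucket = self.buckets[self._bucket_index(key)]
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # overwrite the value of an existing key
                return
        bucket.append((key, value))

    def get(self, key, default=None):
        # Only one bucket is searched, however many keys the table holds.
        bucket = self.buckets[self._bucket_index(key)]
        for existing_key, value in bucket:
            if existing_key == key:
                return value
        return default


users = HashTable()
for user_id, name in enumerate(["John Snow", "Jim Beam", "Peter Parker"], start=1):
    users.put(name, user_id)
```

A lookup by exact key is fast because it inspects a single bucket, but unlike a B-tree the structure keeps no key order, so it does not help with range queries.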

Q&A Web Site (StackOverflow)

Domain Model: Questions, Answers, Users, Comments, Votes
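One way to picture how this domain model could be stored is sketched below. It is an illustration under stated assumptions, not the StackOverflow schema or the demo code: all field names and sample rows are hypothetical, and Comments are omitted for brevity. The relational shape keeps one flat table per entity linked by ids, while the document shape nests the answers inside the question.

```python
# Relational shape: one flat table per entity, linked by ids (foreign keys).
users = [{"id": 1, "name": "alice"}, {"id": 2, "name": "bob"}]
questions = [{"id": 10, "user_id": 1, "title": "Seek or scan?"}]
answers = [{"id": 100, "question_id": 10, "user_id": 2, "body": "Seek, if an index fits."}]
votes = [{"answer_id": 100, "user_id": 1, "value": 1}]


def question_as_document(question_id):
    """Assemble the nested, document-style view of one question from the flat tables."""
    names = {u["id"]: u["name"] for u in users}
    question = next(q for q in questions if q["id"] == question_id)
    return {
        "id": question["id"],
        "title": question["title"],
        "author": names[question["user_id"]],
        "answers": [
            {
                "author": names[a["user_id"]],
                "body": a["body"],
                # The score is the sum of all votes cast on this answer.
                "score": sum(v["value"] for v in votes if v["answer_id"] == a["id"]),
            }
            for a in answers
            if a["question_id"] == question_id
        ],
    }


document = question_as_document(10)
```

The relational form needs joins to render a question page but stores each fact once; the document form can be read in one lookup but duplicates data such as author names.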

StackOverflow Architecture source: https://www.youtube.com/watch?v=t6kM2EM6so4

DEMO Relational vs. NoSQL

Data-Driven Future. The amount of data is growing, and this is cool. More and more decisions are based on data. More and more applications are being developed. It is exciting to be a Software Engineer now!

Thank you! Denis Reznik. Twitter: @denisreznik; Email: denisreznik@live.ru; Blog: http://reznik.uneta.com.ua (in Russian); Facebook: https://www.facebook.com/denis.reznik.5; LinkedIn: http://ua.linkedin.com/pub/denis-reznik/3/502/234