Published by Dominic Walton. Modified over 7 years ago.
2
Data Driven Future
Denis Reznik, Data Architect, Intapp, Inc., Microsoft Data Platform MVP
3
About Me
Denis Reznik
Kyiv, Ukraine
Data Architect at Intapp, Inc.
Microsoft Data Platform MVP
PASS Regional Mentor, CEE
Ukrainian Data Community Kyiv Co-Founder
Co-author of “SQL Server MVP Deep Dives vol. 2”
Organizer of SQLSaturday Kyiv Conference
4
Agenda
Data is a New Oil (c)
Data and Science
Data in Big Companies
Data and Application Development
Data-Driven Future
5
Data is a New Oil
“Data is the new oil. It’s valuable, but if unrefined it cannot really be used. It has to be changed into gas, plastic, chemicals, etc. to create a valuable entity that drives profitable activity; so must data be broken down, analyzed for it to have value.” (c) Clive Humby, UK Mathematician
6
Data and Science
Thousands of years: Empirical
Few hundreds of years: Theoretical
Last fifty years: Computational, “Query the world”
Last twenty years: eScience (Data Science), “Download the world”
7
Data Science is a new term
“Data Science is a new term. But only in the same sense that Columbus discovered a NEW continent 1000 years ago.” (c) Hector Garcia-Molina, Professor in the Departments of Computer Science and Electrical Engineering at Stanford University
9
Machine Learning
Supervised Learning: Classification, Regression
Unsupervised Learning
10
Linear Regression
Training Data -> Learning Algorithm -> h (h - Hypothesis)
Inputs: Ocean Temperature, Oil Derricks in Area, Distance from the Continent
Output: Whales Population
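The pipeline on this slide (training data goes into a learning algorithm, which produces a hypothesis h) can be sketched in a few lines. This is a minimal illustration, not the demo from the talk: it uses only one of the slide's inputs (ocean temperature), fits the line with ordinary least squares, and the training numbers are made up for the example.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept

def make_hypothesis(slope, intercept):
    """Return h, the learned function mapping a feature value to a prediction."""
    def h(x):
        return slope * x + intercept
    return h

# Made-up training data (an assumption for illustration only):
# ocean temperature in degrees C -> whale population.
temperatures = [2.0, 4.0, 6.0, 8.0]
whales = [90.0, 80.0, 70.0, 60.0]

slope, intercept = fit_line(temperatures, whales)
h = make_hypothesis(slope, intercept)
print(h(5.0))  # prediction for an unseen temperature
```

With several inputs, as on the slide, the same idea extends to multiple linear regression, where h becomes a weighted sum of all features.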
11
DEMO Linear Regression
12
Data in Big Companies
13
source: http://www. visualcapitalist
18
Parallel Processing
Q: How many times was the temperature above the norm during the last week?
Temperature Sensor Datasets (n Items)
A: 5
Time: 2 hours
Algorithmic Complexity: O(n)
19
Parallel Processing
Q: How many times was the temperature above the norm during the last week?
Temperature Sensor Datasets (k Items in each one)
A: 1, A: 0, A: 3, A: 4
Time: 0.5 hours
Algorithmic Complexity: O(n/k)
20
Map-Reduce
Map -> COUNT(*) WHERE Value > 40
A: 1, A: 0, A: 3, A: 4
Reduce -> COUNT(*)
A: 5
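The map and reduce steps above can be sketched in plain Python. This is an illustration under stated assumptions, not the demo from the talk: the sensor readings are made up, the function names are hypothetical, and the map calls run one after another here, although each one is independent and could run on a separate worker. The reduce step combines the partial counts by adding them.

```python
from functools import reduce

THRESHOLD = 40  # the "Value > 40" condition from the Map step on the slide

def map_count_above(readings, threshold=THRESHOLD):
    """Map step: count readings above the threshold within one dataset."""
    return sum(1 for value in readings if value > threshold)

def reduce_counts(partial_counts):
    """Reduce step: add the per-dataset counts into one total."""
    return reduce(lambda a, b: a + b, partial_counts, 0)

# Made-up sensor readings (an assumption), already split into four datasets.
datasets = [
    [38, 41, 39],
    [36, 37, 40],
    [42, 39, 45],
    [35, 43, 38],
]

partial = [map_count_above(d) for d in datasets]  # one result per dataset
total = reduce_counts(partial)
print(partial, total)
```

Because each map call touches only its own dataset, the wall-clock time shrinks roughly in proportion to the number of workers, which is the point of the two Parallel Processing slides.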
21
DEMO Map-Reduce
22
Database History
Timeline: 1960s, 1970s, 1980s, 1990s, 2000s, Nowadays
Milestones: IMS, CODASYL, E.F. Codd’s Paper, System R, Ingres, SQL, RDBMS, RDBMS Commercial Success, Object Databases, Google BigTable Paper, Amazon Dynamo Paper, NoSQL (Johan Oskarsson), NewSQL (?)
23
NoSQL SQL
24
Databases: Key-Value, Relational, Column-Family, Graph, Document
25
Index (B-Tree) - Seek
SELECT * FROM Users WHERE Id = 523
[Diagram: B-tree node key ranges, top to bottom: 1 .. 1M; 1 .. 2K, …, 1M-2K .. 1M; …, 801 .. 1.5K, 1.5K+1 .. 2K, …]
26
Index (B-Tree) - Scan
SELECT * FROM Users
[Diagram: B-tree node key ranges, top to bottom: 1 .. 1M; …, 1M-2K .. 1M; …, 801 .. 1.5K, 1.5K+1 .. 2K, …]
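The difference between a seek and a scan can be approximated as follows. This is a rough sketch, not a B-tree: a sorted Python list stands in for the index keys, binary search stands in for descending the tree to a single leaf, and a full pass over the list stands in for reading every leaf page. The table contents and function names are assumptions made for illustration.

```python
from bisect import bisect_left

# Hypothetical Users table: sorted Ids 1..1,000,000 stand in for the index keys.
ids = list(range(1, 1_000_001))

def index_seek(sorted_ids, target):
    """Binary search, O(log n) steps: comparable to walking a B-tree to one leaf.

    Returns the position of the key, or None when the key is absent.
    """
    pos = bisect_left(sorted_ids, target)
    if pos < len(sorted_ids) and sorted_ids[pos] == target:
        return pos
    return None

def index_scan(sorted_ids, predicate):
    """Visit every key in order, O(n): comparable to reading all leaf pages."""
    return [key for key in sorted_ids if predicate(key)]

# Like: SELECT * FROM Users WHERE Id = 523
position = index_seek(ids, 523)

# Like: SELECT * FROM Users
all_rows = index_scan(ids, lambda key: True)
print(position, len(all_rows))
```

For one million keys the seek needs about 20 comparisons, while the scan touches all one million entries.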
27
Hashtable
[Diagram: a hash function maps the keys John Snow, Jim Beam, and Peter Parker to numbered buckets (1 to 4).]
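A hash table of the kind pictured on this slide can be sketched as below. This is a toy under stated assumptions: the class name, the four-bucket size, and the hash function (sum of character codes modulo the bucket count) are chosen for illustration and are not taken from the talk. Collisions are handled by keeping a small list in each bucket.

```python
class HashTable:
    """Toy hash table with separate chaining (one list per bucket)."""

    def __init__(self, bucket_count=4):
        self.buckets = [[] for _ in range(bucket_count)]

    def _index(self, key):
        # Toy hash function (an assumption): character codes summed, then
        # reduced modulo the number of buckets to pick a bucket.
        return sum(ord(ch) for ch in key) % len(self.buckets)

    def put(self, key, value):
        bucket = self.buckets[self._index(key)]
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # overwrite an existing entry
                return
        bucket.append((key, value))

    def get(self, key, default=None):
        # Only one bucket is searched, regardless of the table's total size.
        for existing_key, value in self.buckets[self._index(key)]:
            if existing_key == key:
                return value
        return default

table = HashTable()
for user_id, name in enumerate(["John Snow", "Jim Beam", "Peter Parker"], start=1):
    table.put(name, user_id)

print(table.get("Jim Beam"), table.get("Tony Stark"))
```

A lookup hashes the key once and inspects a single bucket, which is why key-value stores answer point queries quickly but offer no ordered scans.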
28
Q&A Web Site (StackOverflow)
29
Domain Model: Questions, Answers, Users, Comments, Votes
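To make the domain model concrete, here is a small sketch of how these entities might look as relational-style rows, and how one question could be assembled into a single nested, document-style record. Every field name and the `question_document` helper are hypothetical; the slides only name the entities. Comments are left out to keep the sketch short, and votes are attached to questions only.

```python
from dataclasses import dataclass, asdict

@dataclass
class User:
    id: int
    name: str

@dataclass
class Question:
    id: int
    author_id: int  # refers to User.id
    title: str

@dataclass
class Answer:
    id: int
    question_id: int  # refers to Question.id
    author_id: int    # refers to User.id
    body: str

@dataclass
class Vote:
    question_id: int  # refers to Question.id (simplification for this sketch)
    user_id: int      # refers to User.id
    value: int        # +1 or -1

def question_document(question, answers, votes):
    """Assemble one nested, document-style record from relational-style rows."""
    doc = asdict(question)
    doc["score"] = sum(v.value for v in votes if v.question_id == question.id)
    doc["answers"] = [asdict(a) for a in answers if a.question_id == question.id]
    return doc

# Made-up sample rows.
users = [User(1, "alice"), User(2, "bob")]
question = Question(10, author_id=1, title="Seek or scan?")
answers = [Answer(20, question_id=10, author_id=2, body="It depends on the predicate.")]
votes = [Vote(10, 2, +1), Vote(10, 1, +1), Vote(10, 2, -1)]

doc = question_document(question, answers, votes)
print(doc)
```

In the relational form each entity lives in its own table and is joined by id; in the document form everything needed to render one question page is stored together.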
30
StackOverflow Architecture
source:
31
DEMO Relational vs. NoSQL
32
Data-Driven Future
The amount of data is growing, and this is cool
More and more decisions are based on data
More and more applications are being developed
It is exciting to be a Software Engineer now!
33
Thank you! Denis Reznik Blog: (rus) Facebook: LinkedIn: