Contact Information Ted Dunning Chief Applications Architect at MapR Technologies Committer & PMC for Apache’s Drill, Zookeeper & others VP of Incubator.



2 Contact Information
Ted Dunning
Chief Applications Architect at MapR Technologies
Committer & PMC member for Apache Drill, ZooKeeper & others
VP of Incubator at the Apache Software Foundation

3 Why Now?
But Moore’s law has applied for a long time.
Why is data exploding now? Why not 10 years ago? Why not 20? But we have seen constant growth for a long time. And simple growth would only explain some kinds of companies starting with big data (probably big ones) and then slow adoption. Databases started with big companies and took 20 years or more to reach everywhere because the need exceeded cost at different times for different companies. The internet, on the other hand, largely happened to everybody at the same time so it changed things in nearly all industries at all scales nearly simultaneously. Why is big data exploding right now and why is it exploding at all?

5 Size Matters, but …
If it were just availability of data, then existing big companies would have adopted big data technology first.
They didn’t.

7 Or Maybe Cost
If it were just a matter of net positive value, then finance companies should have adopted first, because they have higher opportunity value per byte.
They didn’t.

9 Backwards Adoption
Under almost any threshold argument, startups would not adopt big data technology first.
They did.

11 Everywhere at Once?
Something very strange is happening.
Big data is being applied at many different scales, at many value scales, by large companies and small.
Why?

12 Analytics Scaling Laws
Analytics scaling is all about the rule:
Big gains for little initial effort
Rapidly diminishing returns
The key to net value is how costs scale:
Old school: exponential scaling
Big data: linear scaling, low constant
Cost/performance has changed radically IF you can use many commodity boxes.
The different kinds of scaling laws have different shapes, and I think that shape is the key.
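The shapes described on these slides can be pinned down with a toy model. The functional forms and the parameters a, b, c and s below are assumptions chosen only to reproduce the qualitative behavior (diminishing returns in value, exponential versus linear cost); they are not taken from the talk.

```latex
\begin{aligned}
V(n) &= a \ln(1+n) && \text{value of analytics on } n \text{ units of data}\\
C_{\mathrm{old}}(n) &= c\,\bigl(e^{n/s}-1\bigr) && \text{classical cost, exponential in } n\\
C_{\mathrm{new}}(n) &= b\,n && \text{scale-out cost, linear with slope } b\\
N(n) &= V(n) - C(n) && \text{net value for either cost model}\\
N_{\mathrm{new}}'(n) &= \frac{a}{1+n} - b = 0 \;\Rightarrow\; n^{*} = \frac{a}{b} - 1 && (b < a)\\
N_{\mathrm{new}}(n^{*}) &= a \ln\frac{a}{b} - a + b
\end{aligned}
```

Under these assumptions the best data size n* grows without bound as the slope b of the linear cost line falls toward zero, while the exponential cost always caps the useful data size near s.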

13 Most data isn’t worth much in isolation
[Chart labels: First data is valuable; Later data is dregs]
The value of analytics always increases with more data, but the rate of increase drops dramatically after an initial quick increase.

14 Suddenly worth processing
[Chart labels: First data is valuable; Later data is dregs; But has high aggregate value]

15 If we can handle the scale
[Chart label: It’s really big]

16 So what makes that possible?

17 In classical analytics, the cost of doing analytics increases sharply.

18 Net value optimum has a sharp peak well before maximum effort
The result is a net value that has a sharp optimum in the area where value is increasing rapidly and cost is not yet increasing so rapidly.
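To see the sharp peak numerically, here is a sketch that grid-searches the net value under an assumed diminishing-returns value curve, value(n) = A·ln(1 + n), and an assumed exponential classical cost, C·(e^(n/S) - 1). The parameters A, C and S are made up for illustration and are not from the talk.

```python
import math

# Hypothetical toy model, not taken from the talk: value has diminishing
# returns in the amount of data n, classical cost grows exponentially.
A = 10.0  # value scale (assumed)
C = 0.5   # cost scale (assumed)
S = 5.0   # data size over which cost grows by a factor of e (assumed)


def value(n: float) -> float:
    return A * math.log1p(n)


def exp_cost(n: float) -> float:
    return C * (math.exp(n / S) - 1.0)


def best_net_value(cost, max_n: float = 100.0, steps: int = 100_000):
    """Grid-search n in [0, max_n] for the largest net value value(n) - cost(n)."""
    best_n, best_net = 0.0, value(0.0) - cost(0.0)
    for i in range(1, steps + 1):
        n = max_n * i / steps
        net = value(n) - cost(n)
        if net > best_net:
            best_n, best_net = n, net
    return best_n, best_net


n_opt, net_opt = best_net_value(exp_cost)
print(f"best n = {n_opt:.1f}, best net value = {net_opt:.2f}")
```

With these made-up parameters the optimum lands near n ≈ 10.7, while pushing on to n = 100 gives an enormously negative net value because the exponential cost term dominates.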

19 But scaling laws are changing both slope and shape
New techniques such as Hadoop result in linear scaling of cost. This is a change in shape and it causes a qualitative change in the way that costs trade off against value to give net value. As technology improves, the slope of this cost line is also changing rapidly over time.

20 More than just a little

21 They are changing a LOT!

22 This next sequence shows how the net value changes with linear cost models of different slopes.


24 Notice how the best net value has jumped up significantly

25 And as the line approaches horizontal, the highest net value occurs at dramatically larger data scale.
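Under an assumed linear cost, cost(n) = b·n, and an assumed value curve, value(n) = A·ln(1 + n), the optimum has a closed form, so the effect of a flattening cost line is easy to tabulate. The value scale A = 10 and the list of slopes are illustrative choices, not numbers from the talk.

```python
import math

A = 10.0  # value scale for the hypothetical model value(n) = A * ln(1 + n)


def linear_optimum(b: float) -> tuple[float, float]:
    """Optimum data size and net value when cost is linear, cost(n) = b * n.

    Setting d/dn [A ln(1+n) - b n] = A/(1+n) - b = 0 gives n* = A/b - 1,
    clamped at 0 when b >= A (not even the first unit of data pays off).
    """
    n_star = max(0.0, A / b - 1.0)
    return n_star, A * math.log1p(n_star) - b * n_star


for slope in (2.0, 1.0, 0.5, 0.1, 0.01):
    n_star, net = linear_optimum(slope)
    print(f"slope={slope:<5} best n={n_star:8.1f} best net value={net:6.2f}")
```

As the slope falls from 2 to 0.01, the best data size climbs from 4 to 999 units and the best net value keeps rising, which matches the qualitative behavior the slides describe.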

26 Then a tipping point is reached and things change radically …
Initially, linear cost scaling actually makes things worse
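The "initially worse" effect can be reproduced with the same kind of toy model: when the linear cost line is steep, its best net value falls below the classical (exponential-cost) optimum, and only a flatter slope tips the balance. All parameters here are hypothetical, and the search simply lowers the slope geometrically until scale-out wins.

```python
import math

# Hypothetical toy model (assumed parameters, not from the talk):
# value(n) = A ln(1 + n), classical cost C (e^(n/S) - 1), scale-out cost b n.
A, C, S = 10.0, 0.5, 5.0


def best_exponential_net(max_n: float = 50.0, steps: int = 100_000) -> float:
    """Best net value under exponential cost, by grid search over n in [0, max_n]."""
    return max(
        A * math.log1p(n) - C * (math.exp(n / S) - 1.0)
        for n in (max_n * i / steps for i in range(steps + 1))
    )


def best_linear_net(b: float) -> float:
    """Best net value under linear cost b * n, using the optimum n* = A/b - 1."""
    n_star = max(0.0, A / b - 1.0)
    return A * math.log1p(n_star) - b * n_star


def tipping_slope(start: float = 5.0, factor: float = 0.99) -> float:
    """Flatten the linear cost line until scale-out beats the classical optimum."""
    target = best_exponential_net()
    b = start
    while best_linear_net(b) <= target:
        b *= factor
    return b


print(f"classical optimum: {best_exponential_net():.2f}")
print(f"linear, slope 1.0: {best_linear_net(1.0):.2f}")
print(f"scale-out wins once the slope drops below about {tipping_slope():.2f}")
```

With these made-up numbers a slope of 1 yields a best net value of about 14, below the classical optimum of about 20.8; the linear model only pulls ahead once its slope drops to roughly 0.48.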

27 Prerequisites for Tipping
To reach the tipping point:
Algorithms must scale out horizontally, on commodity hardware that can and will fail.
Data practice must change: denormalized is the new black, flexible data dictionaries are the rule, and structured data becomes rare.
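A small illustration of what denormalized records with a flexible data dictionary can look like, compared with a normalized layout that needs a join. The customers, orders and the `coupon` field are hypothetical; JSON is used only as a familiar self-describing format.

```python
import json

# Normalized (hypothetical): customer details live in a separate table and
# every read of an order needs a join on customer_id.
customers = {17: {"name": "Acme"}, 42: {"name": "Initech"}}
orders = [
    {"order_id": 1, "customer_id": 17, "total": 250.0},
    {"order_id": 2, "customer_id": 42, "total": 99.5},
]


def join_order(order: dict) -> dict:
    return {**order, "customer_name": customers[order["customer_id"]]["name"]}


# Denormalized: each record carries the data its readers need, nested in place.
# Records may add fields (here "coupon") without a schema migration.
events = [
    {"order_id": 1, "customer": {"id": 17, "name": "Acme"}, "total": 250.0},
    {"order_id": 2, "customer": {"id": 42, "name": "Initech"}, "total": 99.5,
     "coupon": "SPRING"},
]

# One self-describing JSON document per line, e.g. for a file or a stream.
lines = [json.dumps(e) for e in events]

# Readers tolerate optional fields instead of relying on a fixed schema.
coupons = [json.loads(line).get("coupon") for line in lines]
```

The cost of denormalization is duplicated data (the customer name is repeated in every record); the benefit is that each record can be read, partitioned and processed on its own, without a join.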

29 Inferentially Forbidden Practices
Old data should not be changed, and schemas must be flexible (note Apache Drill).
Global state isn’t, and should now be discarded in favor of before and after (see various large-scale databases: Spanner, MapR DB).
All processes become streams (more about this coming up).
Scale is nearly inevitable.
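One way to read "old data should not be changed" together with "before and after" is an append-only log from which any state is derived by replaying events up to a position. The sketch below assumes a single ordered log and uses hypothetical account-balance events; it is not how Spanner or MapR DB are implemented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    seq: int      # position in the log: defines what is "before" and "after"
    account: str
    delta: int


class EventLog:
    """Append-only log: existing events are never modified, only new ones added."""

    def __init__(self) -> None:
        self._events: list[Event] = []

    def append(self, account: str, delta: int) -> Event:
        event = Event(len(self._events), account, delta)
        self._events.append(event)
        return event

    def state_before(self, seq: int) -> dict[str, int]:
        """Balances derived from every event positioned strictly before `seq`."""
        balances: dict[str, int] = {}
        for event in self._events[:seq]:
            balances[event.account] = balances.get(event.account, 0) + event.delta
        return balances

    def state_after(self, seq: int) -> dict[str, int]:
        """Balances including the event at position `seq`."""
        return self.state_before(seq + 1)
```

For example, after appending a deposit of 100 and a withdrawal of 30 for a hypothetical account, `state_before` at the withdrawal's position yields a balance of 100 and `state_after` yields 70; neither event is ever rewritten.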

30 What System Architecture?
Development Speed ≈ V – S – C
V: total developer volume
S: internal communication
C: coupling

31 Communication Cost
[Chart: S + C plotted against team size]

33 Which System Will Be Done Soonest?
[Diagram: three candidate systems, each scored as V – S – C]

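The V – S – C rule of thumb can be turned into a toy calculation. Everything quantitative here is an assumption, not from the talk: each developer contributes one unit of V, S grows with the number of pairwise communication channels inside each team, and C grows with the number of coupled interfaces between teams; ALPHA and BETA are made-up weights.

```python
from math import comb

# Made-up weights (assumptions, not from the talk).
ALPHA = 0.05  # cost per pairwise communication channel inside a team
BETA = 0.5    # cost per coupled interface between teams


def development_speed(team_sizes: list[int], couplings: int) -> float:
    """Toy version of Development Speed ≈ V - S - C.

    V: total developer volume, one unit of output per developer
    S: internal communication, proportional to pairwise channels within each team
    C: coupling, proportional to the number of interfaces between teams
    """
    v = sum(team_sizes)
    s = ALPHA * sum(comb(k, 2) for k in team_sizes)
    c = BETA * couplings
    return v - s - c


monolith = development_speed([30], couplings=0)
services = development_speed([5] * 6, couplings=5)
print(f"one team of 30: {monolith:.2f}, six teams of 5: {services:.2f}")
```

With these weights one 30-person team scores 30 - 21.75 = 8.25, while six loosely coupled 5-person teams with five interfaces between them score 30 - 3 - 2.5 = 24.5. The exact numbers mean nothing; the point is that S grows quadratically within a team, so splitting pays off as long as the coupling term stays small.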

35 Micro-services Win, ESB Does Not
As systems grow, there is almost no choice but to adopt something like micro-services.
Messaging systems without global transactions (Kafka-esque) have already won for streaming systems.

36 How Should Arrows Work?
Consider streaming micro-services.
Implementations should be hidden to decrease C.
Either process could be a batch process.
Either process (or both) could be running, or not.
=> Messaging must be persistent.
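Persistence is what lets either side of an arrow be a batch job or be temporarily down. The class below is a hypothetical, in-memory stand-in for a persistent topic; it shows the key idea that each consumer group tracks its own read offset, so messages written while a consumer is not running are simply read later.

```python
class PersistentTopic:
    """Hypothetical in-memory stand-in for a persistent, Kafka-esque topic.

    A real system would keep the log on disk and replicate it; here a list
    plays that role so the offset-tracking idea stays visible.
    """

    def __init__(self) -> None:
        self._log = []          # messages, never removed on read
        self._offsets = {}      # consumer group -> next offset to read

    def send(self, message) -> int:
        self._log.append(message)
        return len(self._log) - 1          # offset of the new message

    def poll(self, group: str, max_messages: int = 100) -> list:
        start = self._offsets.get(group, 0)
        batch = self._log[start:start + max_messages]
        # Committing on read keeps the sketch short; real consumers usually
        # commit only after processing so a crash does not lose messages.
        self._offsets[group] = start + len(batch)
        return batch
```

A producer can send three messages while no consumer is running; a service consumer polling later receives all three, and a batch consumer that starts later still with its own group name reads the same messages from offset 0, because reading never removes anything from the log.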

37 Will Arrows Be Used?
Programmers adopt REST interfaces easily: enough scale and universal access.
Adoption of streaming lags because of perceived performance and scale issues.
=> Streaming must be pervasive and performant.

38 We Already Have Some Winners
Only very few streaming systems can meet the requirements of scaling, persistence, performance and pervasiveness.
Kafka-esque designs are essentially required.
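Part of what makes Kafka-esque designs scale is that a topic is split into independent append-only partitions, with ordering promised only within a partition, so no global transaction is needed. In this hypothetical sketch, messages are routed by a stable hash of their key (crc32 here; Kafka's default partitioner uses murmur2), which keeps all messages for one key in order.

```python
import zlib


class PartitionedLog:
    """Hypothetical Kafka-esque topic split into independent append-only partitions.

    Ordering is guaranteed only within a partition. Because every message with
    a given key goes to the same partition, per-key order is preserved without
    any global transaction across partitions.
    """

    def __init__(self, num_partitions: int) -> None:
        self.partitions = [[] for _ in range(num_partitions)]

    def partition_for(self, key: str) -> int:
        # crc32 is stable across processes, unlike Python's salted hash() for str.
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def append(self, key: str, value) -> tuple[int, int]:
        p = self.partition_for(key)
        self.partitions[p].append((key, value))
        return p, len(self.partitions[p]) - 1   # (partition, offset)
```

Each partition can live on a different machine and be written independently, which is where the horizontal scaling comes from; the price is that there is no single total order across keys.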

39 Use Case: Straight Streaming

40 Financial Services Use Case
The customer handles bids and asks for stocks for off-exchange trading.
Need: route information to recipients in a way that supports the core required queries.
Core queries: each recipient and sender would like to know what transactions they have received or sent during any period of time, most commonly from a few minutes ago to the present.
They also want to be able to show a history of bids and offers for each stock.

41 Financial Services Use Case
For reference, assume:
,000 unique senders and receivers
Each bid or offer includes 10 recipients on average
Bids and offers arrive at a rate of 300k messages/second
The customer tried to get this to work with HBase / HW (and utterly failed).
What would you do?

42 Financial Services: Stream First Solution

43 Key Discussion Points
The system handles nearly 4 million inserts running on 3 nodes.
This design doesn’t use a database. Is that good or bad?
Real-time queries are easily implemented directly against streams.
Archiving to compressed column files allows long-term analytics.
Aggregates written to a DB allow live dashboards.
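The transcript does not include the diagram for the stream-first solution, so the sketch below is a hypothetical reconstruction of the idea rather than the actual design: each bid or offer is appended once to a topic per recipient, once to a topic for its sender and once to a topic for its stock, and a "what did I receive between t1 and t2" query becomes a binary search over one topic. Under that assumption, 300,000 messages/second with 10 recipients each implies 300,000 × 12 = 3.6 million inserts/second, consistent with the "nearly 4 million inserts" on this slide.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict

MESSAGES_PER_SECOND = 300_000   # from the slide
AVG_RECIPIENTS = 10             # from the slide
# Assumed design: one write per recipient + one for the sender + one for the stock.
INSERTS_PER_SECOND = MESSAGES_PER_SECOND * (AVG_RECIPIENTS + 2)


class FanOutRouter:
    """Hypothetical stream-first router with one append-only topic per recipient,
    per sender and per stock. Messages must be published in timestamp order."""

    def __init__(self) -> None:
        self.times = defaultdict(list)      # topic -> sorted timestamps
        self.messages = defaultdict(list)   # topic -> messages, same order

    def publish(self, ts: float, sender: str, recipients: list[str],
                stock: str, body: str) -> int:
        msg = {"ts": ts, "sender": sender, "stock": stock, "body": body}
        topics = [f"received/{r}" for r in recipients]
        topics += [f"sent/{sender}", f"stock/{stock}"]
        for topic in topics:
            self.times[topic].append(ts)
            self.messages[topic].append(msg)
        return len(topics)   # inserts caused by this one message

    def between(self, topic: str, start: float, end: float) -> list[dict]:
        """Messages on `topic` with start <= ts <= end, found by binary search."""
        times = self.times.get(topic, [])
        lo, hi = bisect_left(times, start), bisect_right(times, end)
        return self.messages.get(topic, [])[lo:hi]
```

Writing every message several times trades storage for query simplicity: each core query touches exactly one topic and never needs a join or a secondary index.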

44 Extreme streaming pays off big

45 Use Case: Platform Replication

46 Basic Situation
Multiple locations
Each location has many pumps

47 What Does a Pump Look Like?
[Diagram of measured quantities: voltage, current, temperature, pressure, flow, winding temperature; temperature, pressure and flow each appear twice]

49 Basic Architecture Reflects Business Structure

50 One Stream Has Many Topics
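A minimal sketch of the idea, assuming one stream per location with one topic per pump, replicated one way to a central stream at headquarters. The names `site-a`, `pump-1` and `winding_temp` are hypothetical, and in practice replication would typically be a feature of the messaging platform rather than application code.

```python
from collections import defaultdict


class Stream:
    """Hypothetical stream grouping many topics, e.g. one topic per pump."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.topics = defaultdict(list)   # topic name -> list of readings

    def send(self, topic: str, reading: dict) -> None:
        self.topics[topic].append(reading)


def replicate(source: Stream, target: Stream, progress: dict) -> int:
    """Copy readings from `source` that have not been copied yet into `target`.

    `progress` records, per topic, how many readings were already copied, so a
    run can resume after an outage between a site and headquarters. Target
    topics are prefixed with the site name to keep the business structure.
    """
    copied = 0
    for topic, readings in list(source.topics.items()):
        already = progress.get(topic, 0)
        for reading in readings[already:]:
            target.send(f"{source.name}/{topic}", reading)
            copied += 1
        progress[topic] = len(readings)
    return copied
```

Because each site owns its own stream, the architecture mirrors the business structure: sites keep working when disconnected, and headquarters catches up when the link returns.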

51 Use Case: Massive IoT

52 Massive IoT Requirements
100 million cars
2 kB / second
Cars roam between data centers
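A quick back-of-the-envelope check of what these requirements mean, assuming the 2 kB/second figure is per car and taking 1 kB = 1,000 bytes:

```python
# Back-of-the-envelope ingest for the massive IoT requirements.
CARS = 100_000_000           # from the slide
BYTES_PER_CAR_PER_S = 2_000  # 2 kB/second, assumed per car, 1 kB = 1,000 bytes
SECONDS_PER_DAY = 86_400

ingest = CARS * BYTES_PER_CAR_PER_S      # bytes per second across the fleet
per_day = ingest * SECONDS_PER_DAY       # bytes per day

print(f"{ingest / 1e9:.0f} GB/s, {per_day / 1e15:.2f} PB/day")
```

That is roughly 200 GB/s of ingest, or about 17 PB per day, before any replication between data centers.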


54 Conclusions
Scale and speed change core architectural trade-offs.
Streaming is the ideal abstraction for much of the micro-services load.
All major persistence abstractions must be first-class.

55 What Is Convergence?
Files, tables, streams

56 Call to action: Require convergence


58 Short Books by Ted Dunning & Ellen Friedman
Published by O’Reilly in
For sale from Amazon or O’Reilly.
Free e-books currently available courtesy of MapR.

59 Streaming Architecture
by Ted Dunning and Ellen Friedman, © 2016 (published by O’Reilly)
Free copies on MapR.com

60 Thank You!

61 Q & A
Engage with us! @mapr, maprtech, mapr-technologies, MapR, maprtech

