The Emergence of Data Science: Why Now? Ike Nassi (With contributions from Andrew McAfee, MIT Sloan) 17-Oct 2013 BSOE Research Day.

1 The Emergence of Data Science: Why Now? Ike Nassi (With contributions from Andrew McAfee, MIT Sloan) 17-Oct 2013 BSOE Research Day

2 What this talk is all about
• Convince you that
• There is a need
• We have some tools
• We need new approaches
• We can’t do it all ourselves
• Evidence-based decision making is important
• And it needs more attention
• It will happen anyway

3 Outline
• Societal
• Economic
• Technological

4 A Short Story – Point of View 1984 Configuration = 0 Configuration ≠ 0

5 The Future: Hard to Predict Accurately iWatch? Skynet?

6 Changes happen faster than we think!

7 How well can experts predict?

8 2012 Political Campaign “Bottom line: Romney 315, Obama 223. That sounds high for Romney. But he could drop Pennsylvania and Wisconsin and still win the election. Fundamentals.” Michael Barone, “Going out on a limb: Romney beats Obama, handily (315 to 223),” The Washington Examiner, 11/2/12. slide by Andrew McAfee (MIT)

9 What about the experts? slide by Andrew McAfee (MIT)

10 A Meta-Study Scorecard: 136 studies of expert vs. algorithmic prediction
Tossup: 65 (48%)
Algorithm Clearly Better: 63 (46%)
Experts Clearly Better: 8 (6%)
slide by Andrew McAfee (MIT)

11 The Digital Frontier Keeps Expanding (slide contributed by Andy McAfee, MIT) Source: “Building Watson: It’s not so elementary, my dear” – W. Shih. HBS case #9-612-017

12 (slide contributed by Andrew McAfee, MIT) Ken Jennings

13 Why is Data Science happening now?

14 We can collect “Big Data” slide by Andrew McAfee (MIT)

15 Big Data slide by Andrew McAfee (MIT)

16 What can Economics tell us?
• We are collecting a lot more data, but…
• We are facing a rapidly changing economic landscape
• And we are not very good at controlling the economy
• Who is going to analyze it?

20 Capital vs. Labor Source: Federal Reserve Bank of St. Louis, Economic Research slide by Andrew McAfee (MIT)

21 Recent Trends Shaded areas indicate recessions slide by Andrew McAfee (MIT)

22 Recent Trends Shaded areas indicate recessions slide by Andrew McAfee (MIT)

23 Skill Disparities Source: http://econ-www.mit.edu/~dautor/hole-vol4/figs/fig-04.zip slide by Andrew McAfee (MIT)

24 Superstars Source: http://emlab.berkeley.edu/users/saez/piketty-saezOUP04US.pdf

25 How to effect change Make the experts more effective

26 Proactive and Reactive Approaches
• Collect data, predict, act (proactive)
• E.g. Evidence-based medicine
• Build systems that collect data, create feedback loops (reactive)
• E.g. Human body
• Both are needed
(Diagram labels: Proactive, Analysis, Reactive)

27 Technology Requirements
• Data sizes for data under management are monotonically increasing
• Who wants less data?
• Our appetite for analysis is monotonically increasing
• Do you think, or do you know?
• Trend toward evidence-based management
• Our appetite for speed is monotonically increasing
• Who wants questions answered more slowly?
• Hence the industry interest in in-memory data management systems
• Our overall ability to manage complexity is not increasing

28 Technology To Support Data Science
• Processor speeds are limited
• Processor core density has been increasing at a healthy rate
• Memory density is increasing (but at a lower rate than core density)!
• Therefore, the memory/core ratio is going in the wrong direction!
• We haven’t significantly changed the memory/storage hierarchies for decades
• Interconnects are getting faster – as fast as memory access?
• Memory access is slow
• Caches are fast!
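The memory/core argument on this slide is simple compound-growth arithmetic, and a small sketch can make it concrete. The growth rates below (cores +40% per year, memory +25% per year) are hypothetical placeholders chosen only for illustration; they are not figures from the talk. The function name `memory_per_core` is likewise an assumption.

```python
def memory_per_core(years, core_growth, mem_growth):
    """Return memory-per-core relative to year 0, for years 0..years.

    core_growth and mem_growth are annual growth rates (0.40 means +40%).
    Both densities start at 1.0, so the result is a pure ratio.
    """
    ratios = []
    for year in range(years + 1):
        cores = (1 + core_growth) ** year
        memory = (1 + mem_growth) ** year
        ratios.append(memory / cores)
    return ratios


# Hypothetical rates, NOT taken from the slides.
ratios = memory_per_core(years=5, core_growth=0.40, mem_growth=0.25)
for year, ratio in enumerate(ratios):
    print(f"year {year}: memory per core = {ratio:.2f}x the starting value")
```

Whenever memory density grows more slowly than core density, the ratio shrinks every year, no matter how healthy each growth rate looks on its own.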

29 Memory-Density/Core-Density Declining…

30 Technological Solutions
• It’s in our nature to tackle more ambitious problems
• Need faster answers
• SAP, Oracle, Neo4j, Objectivity, etc.
• More in-memory solutions (e.g. NYSE/Euronext – Steve Rubinow)
• Cannot get faster processors, but we can get more of them
• But: parallelism is difficult
• Legacy software is a huge problem
• Need more machine learning, therefore, feedback

31 What about memory?

32 Scaling out
• When all you have is a hammer, every problem looks like a nail
• Or, in my case, a thumb!
• Today we rely almost exclusively on “scale-out” systems
• Because that’s the main way we add processors and memory
• Shard the data, intelligently target the queries – time consuming
• It’s not easy to query partitioned databases
• What is the best way to do it?
• Moving data is time-consuming
• And you might have to change it
• What if you could build systems that “scale-up”?
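The costs of sharding listed on this slide can be sketched in a few lines. This is a toy in-memory model, not any real database's API: the class `ShardedStore`, its methods, and the CRC32-modulo placement rule are all assumptions made for illustration. It shows three things: a point lookup can be targeted at one shard, an aggregate has to visit every shard, and changing the shard count forces data to move.

```python
import zlib


class ShardedStore:
    """Toy hash-sharded key/value store (hypothetical, for illustration)."""

    def __init__(self, num_shards):
        self.num_shards = num_shards
        self.shards = [dict() for _ in range(num_shards)]

    def _shard_index(self, key):
        # Placement rule (an assumption): CRC32 of the key, modulo shard count.
        return zlib.crc32(key.encode("utf-8")) % self.num_shards

    def put(self, key, value):
        self.shards[self._shard_index(key)][key] = value

    def get(self, key):
        # Targeted query: only one shard is consulted.
        return self.shards[self._shard_index(key)][key]

    def total(self):
        # Scatter-gather query: every shard must be visited and combined.
        return sum(sum(shard.values()) for shard in self.shards)

    def reshard(self, new_num_shards):
        """Change the shard count; return how many keys had to move."""
        old_shards = self.shards
        self.num_shards = new_num_shards
        self.shards = [dict() for _ in range(new_num_shards)]
        moved = 0
        for old_index, shard in enumerate(old_shards):
            for key, value in shard.items():
                new_index = self._shard_index(key)
                if new_index != old_index:
                    moved += 1
                self.shards[new_index][key] = value
        return moved


store = ShardedStore(num_shards=4)
for i in range(100):
    store.put(f"user{i}", i)

print(store.get("user42"))   # one shard touched
print(store.total())         # all shards touched
print(store.reshard(5), "of 100 keys moved when going from 4 to 5 shards")
```

With modulo placement, adding a single shard typically relocates most keys, which is one reason moving data dominates the cost of growing a scale-out system.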

33 What I’m doing about this
• Enabling systems that scale-up (TidalScale Inc. mission)
• Software that sits below an operating system but above the hardware, aggregates a set of servers, and runs that collection as a single virtual server running a single conventional operating system
• Dynamic scaling at linear cost
• Supporting unmodified legacy software and legacy operating systems
• Automatically, dynamically, and hierarchically optimizing processors, memory, networks, and storage systems through machine learning
• Automatically evolving as hardware evolves
• The computer begins to learn what it needs to do to manage itself!

34 Why Data Science Now?
• NEED: the future is increasingly complex and difficult to predict
• NEED: we don’t have enough qualified experts, and experts often get it wrong
• RAW MATERIALS: we are collecting huge amounts of data at an increasing rate
• ENABLER: new hardware and software tools are emerging
• THEREFORE: Data science is inevitable! We don’t have a choice

35 What are the implications?
• Danny Hillis, inventor of the Connection Machine:
• “I want to build a computer that will be proud of me”
• What about Skynet?
• Let’s leave that discussion for another day…

36 The Second Machine Age Andrew McAfee, MIT amcafee@mit.edu @amcafee

37 Thank you Ike Nassi UCSC Computer Science inassi@ucsc.edu and TidalScale, Inc. ike.nassi@tidalscale.com

38 Complexity
• Is the world getting more complex?
• Understanding complex systems:
• Because we want to?
• Because we need to?
• Does it matter?
• Who manages our understanding of complexity?
• Who analyzes it?
• Is the world getting more complex, or are we making it more complex for ourselves based on a false sense of what we truly know?

