Published by Bernadette Hudson. Modified over 8 years ago.
1
Intel “Big Data” Science and Technology Center
Michael Stonebraker
2
Context
- Intel held a national “beauty contest” to choose the site of its next Science and Technology (S&T) center
- MIT won with a “Big Data” proposal
  - Out of 160 proposals
- $2.5M per year for 3-5 years, plus 5 Intel scientists
- 20 PIs, half at MIT
3
Big Data Means What?
- Volume too large
  - Stupid analytics (i.e., SQL): solved by commercial data warehouse products
  - Smart analytics (predictive modelling, machine learning, …)
- Velocity too large
  - Drinking from a firehose
- Variety too large
  - Data integration problem
- And what does this mean for computer architecture?
4
Big Data Means What?
- Volume too large: smart analytics
  - Array databases
  - Parallel algorithms
  - Integration of linear algebra
  - Scalable visualization
- Velocity too large
  - Main memory DBMSs
- And what does this mean for computer architecture?
  - Many core
  - Son of flash
  - Xeon Phi
5
Array Databases
- Elasticity in SciDB
- Query optimizer for SciDB
- Genomics benchmark
  - Run on SciDB, SciDB + Xeon Phi, column stores, row stores, MADlib, Hadoop
- Graphs as sparse arrays
- EarthDB
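Treating a graph as a sparse array means storing only the nonzero cells of its adjacency matrix, so graph algorithms become array operations. A minimal pure-Python sketch, assuming a dictionary-of-rows layout (this is not SciDB's storage format or query language): breadth-first search expressed as repeated sparse matrix-vector products over the Boolean (OR, AND) semiring. All function names are hypothetical.

```python
from collections import defaultdict

def to_sparse_adjacency(edges):
    """Build a sparse adjacency array {row: {col: 1}} from (src, dst) pairs.
    Only nonzero cells are stored (hypothetical layout for illustration)."""
    adj = defaultdict(dict)
    for src, dst in edges:
        adj[src][dst] = 1
    return adj

def spmv_or_and(adj, frontier):
    """Sparse matrix-vector product over the Boolean (OR, AND) semiring:
    the set of columns reachable in one step from any frontier row."""
    nxt = set()
    for row in frontier:
        nxt.update(adj.get(row, {}).keys())
    return nxt

def bfs_levels(adj, source):
    """BFS as repeated sparse matrix-vector products; returns {vertex: hop count}."""
    level = {source: 0}
    frontier = {source}
    depth = 0
    while frontier:
        depth += 1
        # Expand one hop, then mask out vertices that already have a level.
        frontier = {v for v in spmv_or_and(adj, frontier) if v not in level}
        for v in frontier:
            level[v] = depth
    return level
```

For example, with edges `[(0, 1), (0, 2), (1, 3), (2, 3), (3, 4)]`, `bfs_levels(to_sparse_adjacency(edges), 0)` assigns vertex 4 to level 3. The point of the array formulation is that the inner step is a bulk operation an array engine can parallelize, rather than a pointer chase.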
6
Scalable Algorithms
- Parallelizing locality-sensitive hashing
- Other algorithms researchers are going to work in other areas
  - Pick your favorite algorithm, parallelize it, and make it scale
- Scalable Julia
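The slide does not say which LSH family or framework the project used. As one illustration, here is a minimal sketch assuming MinHash with banding (for Jaccard similarity) and a partition/merge structure: each partition of documents is hashed into band buckets independently (the parallel "map" step), then buckets are unioned (the "reduce" step). Threads stand in for cluster workers here and give no real speedup under CPython's GIL; all names are hypothetical.

```python
import random
import zlib
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations

PRIME = (1 << 61) - 1  # Mersenne prime used as the modulus of the hash family

def make_hash_params(num_hashes, seed=42):
    """Draw (a, b) pairs for the universal hash family h(x) = (a*x + b) mod PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME)) for _ in range(num_hashes)]

def minhash_signature(tokens, params):
    """MinHash signature of a non-empty set of string tokens.
    crc32 gives a base hash that is stable across Python processes."""
    base = [zlib.crc32(t.encode("utf-8")) for t in tokens]
    return tuple(min((a * x + b) % PRIME for x in base) for a, b in params)

def local_buckets(chunk, params, bands, rows):
    """Map step: bucket one partition of documents by each band of its signature."""
    buckets = defaultdict(set)
    for doc_id, tokens in chunk:
        sig = minhash_signature(tokens, params)
        for band in range(bands):
            key = (band, sig[band * rows:(band + 1) * rows])
            buckets[key].add(doc_id)
    return buckets

def parallel_lsh_candidates(docs, bands=8, rows=4, workers=4):
    """Partition docs, bucket partitions in parallel, merge, and return
    candidate pairs that collide in at least one band."""
    params = make_hash_params(bands * rows)  # shared, read-only
    items = list(docs.items())
    chunks = [items[i::workers] for i in range(workers)]  # round-robin partitioning
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(lambda c: local_buckets(c, params, bands, rows), chunks))
    merged = defaultdict(set)  # reduce step: union buckets that share a key
    for part in partials:
        for key, ids in part.items():
            merged[key] |= ids
    pairs = set()
    for ids in merged.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs
```

The banding parameters trade recall for precision: more bands make near-duplicates more likely to collide somewhere, while more rows per band make accidental collisions rarer.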
7
Integration of Linear Algebra
- Hardly anybody can beat BLAS/LAPACK/ScaLAPACK
  - A 10^5 difference between Python and Intel-optimized C++
  - If you write operation X yourself, chances are you will lose to Jack Dongarra by an order of magnitude
  - Don’t fight the wizard
8
Integration of Linear Algebra
- DBMS + ScaLAPACK
  - Federation required
  - Resource manager required
  - Recoverable ScaLAPACK required
- Someday
  - A common storage format
  - Would make ACID much easier, …
9
Visualization
- Resolution reduction
  - Using “explain”
- Choose the rendering automatically
  - Decision tree
- Smart prefetch
- Integrate with a SciDB back end and a Stanford visualizer front end
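The slide gives only "resolution reduction using explain". A minimal sketch, assuming the query planner's explain step can report an estimated result shape (the `est_rows` and `est_cols` arguments stand in for that estimate; this is not SciDB's actual explain output or API): pick the smallest block size that makes the result fit a pixel budget, then aggregate the array block by block before sending it to the front end. Function names are hypothetical.

```python
def choose_reduction_factor(est_rows, est_cols, pixel_budget):
    """Smallest integer k such that aggregating k x k tiles of an
    est_rows x est_cols array leaves at most pixel_budget cells."""
    if pixel_budget < 1:
        raise ValueError("pixel_budget must be positive")
    # sqrt(rows * cols / budget) is a lower bound on k, so start the search there.
    k = 1
    while k * k * pixel_budget < est_rows * est_cols:
        k += 1
    # Tiles at the edges round up, so keep growing k until the exact count fits.
    while ((est_rows + k - 1) // k) * ((est_cols + k - 1) // k) > pixel_budget:
        k += 1
    return k

def block_average(grid, k):
    """Downsample a 2-D list by averaging non-overlapping k x k tiles
    (tiles on the right and bottom edges may be smaller)."""
    rows, cols = len(grid), len(grid[0])
    out = []
    for r0 in range(0, rows, k):
        out_row = []
        for c0 in range(0, cols, k):
            tile = [grid[r][c]
                    for r in range(r0, min(r0 + k, rows))
                    for c in range(c0, min(c0 + k, cols))]
            out_row.append(sum(tile) / len(tile))
        out.append(out_row)
    return out
```

Averaging is only one choice of aggregate; a heat map might prefer max or count, which is the kind of decision the slide's automatic rendering choice would also have to make.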
10
High Velocity
- Big pattern, little state
  - Find me a “banana” followed within 10 msec by a “strawberry”
  - Historically the domain of CEP (complex event processing)
- Big state, little pattern
  - Assemble my global real-time risk
  - Main memory DBMS
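The "banana followed within 10 msec by a strawberry" query shows why CEP needs so little state: only bananas still inside the time window can contribute to a future match. A minimal sketch, assuming a hypothetical event format of `(timestamp_ms, symbol)` pairs arriving in timestamp order; it is not the API of any CEP engine or of H-Store.

```python
from collections import deque

def banana_then_strawberry(events, window_ms=10):
    """Yield (banana_ts, strawberry_ts) for every banana followed within
    window_ms by a strawberry. Events are (timestamp_ms, symbol) pairs in
    timestamp order; only bananas still inside the window are retained."""
    recent_bananas = deque()  # the "little state": timestamps of live bananas
    for ts, symbol in events:
        # Expire bananas that can no longer start a match.
        while recent_bananas and ts - recent_bananas[0] > window_ms:
            recent_bananas.popleft()
        if symbol == "banana":
            recent_bananas.append(ts)
        elif symbol == "strawberry":
            for b in recent_bananas:
                yield (b, ts)
```

Because it is a generator, matches are emitted as the stream is consumed, e.g. `for match in banana_then_strawberry(feed): ...`. The "big state, little pattern" case is the opposite shape: the query is simple, but it must read a large, constantly updated database, which is why the slide points to a main memory DBMS.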
11
High Velocity
- Lots of commonality between CEP and main memory (MM) DBMSs
- We are adding queues/windows to H-Store
- It’s clear we will do ACID CEP as fast as CEP engines do
- I predict the death of CEP
12
High Velocity: Other Predictions
- Death of ARIES
  - Command logging is much faster than data logging
- Death of disk-oriented OLTP databases
  - H-Store with anti-caching is wildly faster than MySQL, with or without Memcached
- Trying an emulator for “son of flash”
  - Will make MM DBMSs even more attractive
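The difference between the two logging styles is what gets written per transaction. A minimal sketch, assuming a dict-based store and a hypothetical stored procedure `add_bonus`: data logging here is a heavily simplified value log (one before/after image per changed row, with none of ARIES's LSNs, undo passes, or compensation records), while command logging writes only the procedure name and its arguments and recovers by re-executing commands from a checkpoint.

```python
import json

def add_bonus(db, branch, amount):
    """Hypothetical stored procedure: add a bonus to every account in a branch."""
    for key in sorted(db):
        if key.startswith(branch + ":"):
            db[key] += amount

PROCEDURES = {"add_bonus": add_bonus}

def run_with_data_log(db, log, proc, *args):
    """Data (value) logging: one before/after record per changed row.
    The whole store is copied only to keep the sketch short."""
    before = dict(db)
    PROCEDURES[proc](db, *args)
    for key in sorted(db):
        if db[key] != before.get(key):
            log.append(json.dumps({"key": key, "before": before.get(key), "after": db[key]}))

def run_with_command_log(db, log, proc, *args):
    """Command logging: a single record naming the procedure and its parameters."""
    log.append(json.dumps({"proc": proc, "args": list(args)}))
    PROCEDURES[proc](db, *args)

def recover_from_command_log(snapshot, log):
    """Restore the checkpoint, then re-execute commands in log order.
    Correct only if procedures are deterministic and logged in commit order."""
    db = dict(snapshot)
    for line in log:
        rec = json.loads(line)
        PROCEDURES[rec["proc"]](db, *rec["args"])
    return db
```

For a transaction that touches 50 accounts, the data log gains 50 records and the command log gains one. The price is recovery time: replay must re-run every transaction since the checkpoint, which is acceptable when, as in H-Store, transactions are short stored procedures.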
13
Many Core
- 1,000 cores will give major heartburn to all system software
  - Traditional DBMSs will collapse
- DBMSs cannot have shared data structures
  - H-Store approach
- Move the computation
  - Hardware-supported “move”
  - New concurrency control algorithms (revival of DORA?)
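The slide names the H-Store approach without detail. A minimal sketch of its core idea, assuming single-key transactions and hash partitioning: each partition's data is private to one single-threaded executor, so no locks or latches guard it, and each transaction is routed to the executor that owns its key ("move the computation" to the data). Multi-partition transactions, which a real system must also coordinate, are omitted; Python threads stand in for cores, and all names are hypothetical.

```python
import queue
import threading
import zlib

class Partition(threading.Thread):
    """A single-threaded executor that exclusively owns its slice of the data,
    so transactions on it run serially without locks or latches."""
    def __init__(self):
        super().__init__(daemon=True)
        self.data = {}              # private: touched only by this thread
        self.inbox = queue.Queue()  # the only shared object: a message queue

    def run(self):
        while True:
            item = self.inbox.get()
            if item is None:        # shutdown signal
                break
            txn, key, reply = item
            try:
                reply.put((True, txn(self.data, key)))
            except Exception as exc:  # report failures instead of killing the executor
                reply.put((False, exc))

class PartitionedStore:
    """Routes each single-key transaction to the partition that owns the key."""
    def __init__(self, num_partitions=4):
        self.parts = [Partition() for _ in range(num_partitions)]
        for p in self.parts:
            p.start()

    def submit(self, key, txn):
        """Enqueue txn(data, key) at the owning partition; returns a reply queue."""
        owner = self.parts[zlib.crc32(key.encode("utf-8")) % len(self.parts)]
        reply = queue.Queue(maxsize=1)
        owner.inbox.put((txn, key, reply))
        return reply

    def close(self):
        for p in self.parts:
            p.inbox.put(None)
            p.join()

def wait_result(reply):
    """Block until the partition answers; re-raise the transaction's error, if any."""
    ok, value = reply.get()
    if not ok:
        raise value
    return value

def increment(data, key):
    """Example transaction: bump a counter stored under key."""
    data[key] = data.get(key, 0) + 1
    return data[key]
```

Because every transaction on a key lands in the same FIFO queue and runs to completion on one thread, concurrent submissions never interleave on shared state; the only synchronization left is the message queue itself.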