ML & DB – Options for Integration


1 ML & DB – Options for Integration
- Integrated: as UDF
- Separated:
  - As ETL (extract, transform, load): data wrangling, cleansing, preparation
  - For postprocessing (feeding data out of the database into an ML analysis tool). Example: HANA -> SAS
- Extravagant: self-tuning databases based on ML
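The difference between the "integrated as UDF" and "separated" options can be sketched with Python's built-in sqlite3 module. This is only an illustration under stated assumptions: the slides name HANA and SAS, not SQLite, and the table `samples`, the function name `ml_score`, and the model weights are all hypothetical.

```python
import math
import sqlite3

# Hypothetical, hand-picked model parameters (not from the slides):
# a tiny logistic-regression scorer over two numeric features.
WEIGHTS = (0.8, -0.5)
BIAS = 0.1


def score(x1, x2):
    """Return the model's probability for one row of features."""
    z = WEIGHTS[0] * x1 + WEIGHTS[1] * x2 + BIAS  # dot product plus bias
    return 1.0 / (1.0 + math.exp(-z))


def build_demo_db():
    """Create an in-memory table with a few illustrative rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE samples (id INTEGER PRIMARY KEY, x1 REAL, x2 REAL)")
    conn.executemany(
        "INSERT INTO samples (id, x1, x2) VALUES (?, ?, ?)",
        [(1, 1.0, 2.0), (2, 3.0, 0.5), (3, -1.0, 1.0)],
    )
    return conn


def scores_integrated(conn):
    """Integrated: register the model as a UDF and score inside the query."""
    conn.create_function("ml_score", 2, score)
    query = "SELECT id, ml_score(x1, x2) FROM samples ORDER BY id"
    return conn.execute(query).fetchall()


def scores_separated(conn):
    """Separated: extract the rows first, then score them outside the database."""
    rows = conn.execute("SELECT id, x1, x2 FROM samples ORDER BY id").fetchall()
    return [(row_id, score(x1, x2)) for row_id, x1, x2 in rows]


conn = build_demo_db()
integrated = scores_integrated(conn)
separated = scores_separated(conn)
```

Both paths produce the same scores. What changes is where the computation runs: the integrated path keeps the rows inside the database engine, while the separated path moves them into the analysis tool first.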

2 Integration – Yes or No
Drifting consensus vs con

Pros
- Interdisciplinary opportunity: bringing 2 communities together
- ML adds new insights; intelligence can be gained out of existing data
- Debugging is easier
- It makes databases more useful
- We can bring insights to the ML community
- DB allows tracking / version control of data for ML
- Opportunity to add provenance correctly to DB
- Potential to accelerate when combining
- Opportunity to leverage insights from 50 years of DB research for new types of problems
- Commonalities (next slide)

Cons
- Security
- Open source vs customers' own
- Integration of software stacks (Python, R), dev support, profiling, version control
- DBA administration
- Huge spectrum of algorithms with very different compute and storage requirements
- Immature algorithms
- Separate communities
- Compute time of ML is so huge that it outweighs the advantage of removing data transfer overhead

3 A closer look at the machine learning algorithms
Huge range of algorithms:
- CNNs
- Clustering
- K-means
- Bayesian networks
- Page rank
- Decision trees
- Random forests
- Kernel methods
However, they share a lot of common arithmetic. For example: dot products
Also observed: a common selection of primitives from the database world: joins, map, reduce, flatmap
There is an opportunity…
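As a rough illustration of the shared arithmetic and primitives, one k-means iteration can be written using only dot products, a map, and a reduce. This is a minimal sketch, not a method taken from the slides: the helper names (`dot`, `nearest`, `kmeans_step`) and the sample points are assumptions made for the example.

```python
from functools import reduce


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def nearest(point, centroids):
    """Index of the closest centroid, using only dot products:
    ||p - c||^2 = p.p - 2 p.c + c.c
    """
    def sq_dist(c):
        return dot(point, point) - 2.0 * dot(point, c) + dot(c, c)

    return min(range(len(centroids)), key=lambda i: sq_dist(centroids[i]))


def kmeans_step(points, centroids):
    """One k-means iteration expressed as a map followed by a reduce."""
    # map: tag every point with the key of its nearest centroid
    assigned = map(lambda p: (nearest(p, centroids), p), points)

    # reduce: accumulate per-cluster coordinate sums and counts
    def combine(acc, item):
        key, p = item
        total, count = acc.get(key, ([0.0] * len(p), 0))
        acc[key] = ([t + x for t, x in zip(total, p)], count + 1)
        return acc

    sums = reduce(combine, assigned, {})

    # new centroid = mean of assigned points; empty clusters keep the old one
    return [
        [t / sums[i][1] for t in sums[i][0]] if i in sums else list(centroids[i])
        for i in range(len(centroids))
    ]


points = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (10.0, 12.0)]
centroids = [(1.0, 1.0), (9.0, 9.0)]
new_centroids = kmeans_step(points, centroids)
# new_centroids == [[0.0, 1.0], [10.0, 11.0]]
```

The map stage resembles a per-row projection and the reduce stage a grouped aggregation, which is one way to read the overlap with database primitives that the slide points to.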

4 Perhaps it is just a matter of time?
- Phase 1: Adding and broadening of functionality. Systems become larger, wider
- Phase 2: Systems integrate more and become more optimized
- If we are convinced this will happen, we can start looking ahead within the research community
- Much faster hardware; mature and fewer ML algorithms

5 Summary
- Consensus is currently drifting towards keeping ML and DB separate
- Opportunity for next year's Dagstuhl, or a new workshop series
- Opportunity to speculatively start investigating
- Opportunity to apply our own box of tricks to ML algorithms
- Opportunities for selected algorithms, for example decision trees and random forests

