ML & DB – Options for Integration


- Integrated: as a UDF
- Separated:
  - As ETL (extract, transform, load): data wrangling, cleansing, preparation
  - For postprocessing (feeding data out of the database into an ML analysis tool), for example HANA -> SAS
- Extravagant: self-tuning databases based on ML
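To make the contrast between the integrated and separated options concrete, here is a minimal sketch using Python's built-in sqlite3 module. The model (a fixed logistic scorer), its weights, and the `samples` table are assumptions invented for illustration; the slides name no particular model or database for the UDF case.

```python
import math
import sqlite3

# Hypothetical model: a fixed logistic scorer. The weights are made up.
WEIGHTS = (0.8, -0.5)
BIAS = 0.1

def predict(x1, x2):
    """Return the logistic score for one row of features."""
    z = WEIGHTS[0] * x1 + WEIGHTS[1] * x2 + BIAS
    return 1.0 / (1.0 + math.exp(-z))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE samples (id INTEGER PRIMARY KEY, x1 REAL, x2 REAL)")
conn.executemany(
    "INSERT INTO samples (x1, x2) VALUES (?, ?)",
    [(1.0, 2.0), (3.0, 0.5), (-1.0, 1.0)],
)

# Integrated: register the model as a UDF and score inside the query.
conn.create_function("predict", 2, predict)
integrated = [
    row[0]
    for row in conn.execute("SELECT predict(x1, x2) FROM samples ORDER BY id")
]

# Separated: extract the rows first, then score them outside the database.
rows = conn.execute("SELECT x1, x2 FROM samples ORDER BY id").fetchall()
separated = [predict(x1, x2) for x1, x2 in rows]
```

Both paths produce the same scores. What differs is where the computation runs and whether the rows have to leave the database first.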

Integration – Yes or No

Pros / Cons
- Security
- Interdisciplinary opportunity: bringing 2 communities together
- Drifting consensus vs con

Pros
- ML adds new insights, and intelligence can be gained out of existing data
- Debugging is easier
- It makes databases more useful
- We can bring insights to the ML community
- DB allows tracking / version control of data for ML
- Opportunity to add provenance correctly to DB
- Potential to accelerate when combining
- Opportunity to leverage insights from 50 years of DB research for new types of problems
- Commonalities (next slide)

Cons
- Security
- Open source vs customers' own
- Integration of software stacks (Python, R), dev support, profiling, version control
- DBA administration
- Huge spectrum of algorithms with very different compute and storage requirements
- Immature algorithms
- Separate communities
- Compute time of ML is so huge that it outweighs the advantage of removing the data transfer overhead

A closer look at the machine learning algorithms

- Huge range of algorithms; however, they share a lot of common arithmetic
  - For example: dot products
- Also observed: a common selection of primitives from the database world
  - Joins, map, reduce, flatmap
- There is an opportunity…

Algorithms:
- CNNs
- Clustering
- K-means
- Bayesian networks
- PageRank
- Decision trees
- Random forests
- Kernel methods
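As a rough illustration of the overlap between the two lists of primitives, the sketch below writes a dot product using only flatmap, join, map, and reduce. The helper names (`flat_map`, `to_relation`, `hash_join`, `dot`) and the sample vectors are assumptions made for this example and are not taken from the slides.

```python
from functools import reduce
from itertools import chain

def flat_map(fn, items):
    """Apply fn to each item and concatenate the resulting lists."""
    return list(chain.from_iterable(map(fn, items)))

def to_relation(dense):
    """Unnest a dense vector into (index, value) rows, dropping zeros."""
    return flat_map(lambda iv: [iv] if iv[1] != 0 else [], enumerate(dense))

def hash_join(left, right):
    """Equi-join two (index, value) relations on index."""
    table = dict(right)
    return [(lv, table[i]) for i, lv in left if i in table]

def dot(left, right):
    """Dot product as join, then map (multiply), then reduce (sum)."""
    joined = hash_join(left, right)
    products = map(lambda pair: pair[0] * pair[1], joined)
    return reduce(lambda acc, p: acc + p, products, 0.0)

# Hypothetical sample vectors.
w = to_relation([0.5, -1.0, 0.0, 2.0])
x = to_relation([4.0, 1.0, 7.0, 0.5])
result = dot(w, x)  # 0.5*4.0 + (-1.0)*1.0 + 2.0*0.5 = 2.0
```

Because zero entries are dropped before the join, the same pipeline handles sparse vectors without any change.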

Perhaps it is just a matter of time?

- Phase 1: Adding and broadening functionality
  - Systems become larger, wider
- Phase 2: Systems integrate more and become more optimized
- If we are convinced this will happen, we can start looking ahead within the research community
  - Much faster hardware; fewer, more mature ML algorithms

Summary

- Consensus is currently drifting towards keeping ML and DB separate
- Opportunity for next year's Dagstuhl, or a new workshop series
- Opportunity to speculatively start investigating
- Opportunity to apply our own box of tricks to ML algorithms
- Opportunities for selective algorithms
  - Decision trees and random forests, for example
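One possible reading of why decision trees are a candidate: a trained tree is a set of nested comparisons, which maps directly onto a SQL CASE expression that the database can evaluate itself. The sketch below assumes a hand-written tree, a `customers` table, and feature names that are all hypothetical. It also assumes the tree comes from a trusted source, since feature names and labels are spliced into the SQL text.

```python
import sqlite3

# Hypothetical tree: inner nodes test feature <= threshold, leaves hold a label.
TREE = {
    "feature": "age", "threshold": 30,
    "left": {"label": "low"},
    "right": {
        "feature": "income", "threshold": 50000,
        "left": {"label": "medium"},
        "right": {"label": "high"},
    },
}

def tree_to_sql(node):
    """Compile a tree into a nested SQL CASE expression."""
    if "label" in node:
        return "'{}'".format(node["label"])
    return "CASE WHEN {f} <= {t} THEN {l} ELSE {r} END".format(
        f=node["feature"], t=node["threshold"],
        l=tree_to_sql(node["left"]), r=tree_to_sql(node["right"]),
    )

def tree_predict(node, row):
    """Reference evaluation of the same tree in plain Python."""
    while "label" not in node:
        go_left = row[node["feature"]] <= node["threshold"]
        node = node["left"] if go_left else node["right"]
    return node["label"]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (age REAL, income REAL)")
data = [(25, 40000), (40, 30000), (40, 80000)]
conn.executemany("INSERT INTO customers VALUES (?, ?)", data)

# Evaluate the tree inside the database and, for comparison, outside it.
sql = "SELECT {} FROM customers ORDER BY rowid".format(tree_to_sql(TREE))
in_db = [r[0] for r in conn.execute(sql)]
in_python = [tree_predict(TREE, {"age": a, "income": i}) for a, i in data]
```

A random forest could be handled the same way, with one CASE expression per tree and a vote over the results, though the expressions grow quickly with tree depth.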