Four Problem Areas
Phil Bernstein
Microsoft Research
List of Problems
- Ad hoc querying of non-integrated databases
- Database as a service
- Hardware/software co-design
- Dealing with the tidal wave of data
Ad hoc querying of non-integrated data
- Enterprise data sources: data warehouses, SQL databases, key-value stores, spreadsheets, SharePoint sites, event logs, internet sites, etc.
- To answer a given business question, many of these sources may be relevant
  - Top 10 product improvements that would most please my top 25 customers
  - Requires looking at call center logs, blogs, product marketing input, search logs on the company's support website, etc.
- Today, this is much too time-consuming for a data analyst to figure out
- We need better tools for data discovery, data profiling, deriving relationships and other semantics, dealing with dirty data, and formulating queries
- We have the component technologies. Why is it still so hard?
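To make "data profiling" and "deriving relationships" concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not a method from the talk: the table and column names are hypothetical, the data is held in memory, and a relationship is guessed purely from value containment (how many distinct values of one column also appear in another). Real tools would need sampling, type inference, and handling of dirty data.

```python
from itertools import permutations

def _clean(values):
    """Drop nulls and empty strings (a crude stand-in for dirty-data handling)."""
    return [v for v in values if v is not None and v != ""]

def profile_column(values):
    """Basic profile of one column: row count, null count, distinct count."""
    non_null = _clean(values)
    return {
        "rows": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(set(non_null)),
    }

def containment(child, parent):
    """Fraction of distinct child values that also appear in parent."""
    child_set, parent_set = set(_clean(child)), set(_clean(parent))
    if not child_set:
        return 0.0
    return len(child_set & parent_set) / len(child_set)

def candidate_relationships(tables, threshold=0.9):
    """Guess foreign-key-like links between columns of different tables.

    tables: {table_name: {column_name: [values]}} (hypothetical layout).
    Returns a list of (child_column, parent_column, containment_score).
    """
    columns = [(t, c, vals) for t, cols in tables.items() for c, vals in cols.items()]
    found = []
    for (t1, c1, v1), (t2, c2, v2) in permutations(columns, 2):
        if t1 == t2:
            continue  # only look across sources
        score = containment(v1, v2)
        if score >= threshold:
            found.append((f"{t1}.{c1}", f"{t2}.{c2}", score))
    return found

# Hypothetical sources: a call center log and a customer list.
tables = {
    "call_logs": {
        "customer_id": ["c1", "c2", "c2", None],
        "topic": ["billing", "login", "login", "crash"],
    },
    "customers": {
        "id": ["c1", "c2", "c3"],
        "name": ["Ann", "Bob", "Cy"],
    },
}

links = candidate_relationships(tables)
```

With this toy data, the only candidate link is from call_logs.customer_id to customers.id. The hard part the slide points at is everything this sketch skips: finding the sources in the first place, coping with inconsistent keys, and doing it at enterprise scale.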
Database as a Service
- For a large fraction of users, a database will be a service
- What are the best forms for this to take?
- How best to support multi-tenancy?
  - Resource governance within a VM or across VMs
  - Multiple tenants within a database (e.g., tenant-specific query processing, schema diversity and evolution, and security)
- Minimize labor in keeping the service up
  - What to measure? How to use the data?
  - Fast root-cause analysis of problems
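One simple form of resource governance across tenants is a per-tenant token bucket that caps each tenant's request rate. The sketch below is an assumption for illustration only; the talk does not prescribe a mechanism, and the class name, parameters, and cost model here are hypothetical. The clock is injectable so the behavior can be checked deterministically.

```python
import time

class TenantGovernor:
    """Per-tenant token bucket: each tenant gets `rate_per_sec` tokens per
    second, up to `burst` tokens saved. Names are illustrative only."""

    def __init__(self, rate_per_sec, burst, clock=time.monotonic):
        self.rate = rate_per_sec
        self.burst = burst
        self.clock = clock
        self.buckets = {}  # tenant -> (tokens, time of last update)

    def try_admit(self, tenant, cost=1.0):
        """Return True and charge `cost` tokens if the tenant has budget."""
        now = self.clock()
        tokens, last = self.buckets.get(tenant, (self.burst, now))
        # Refill for the elapsed time, capped at the burst size.
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        admitted = tokens >= cost
        if admitted:
            tokens -= cost
        self.buckets[tenant] = (tokens, now)
        return admitted

# Usage with a fake clock so the example is deterministic.
fake_now = [0.0]
governor = TenantGovernor(rate_per_sec=1.0, burst=2, clock=lambda: fake_now[0])

first = [governor.try_admit("tenant_a") for _ in range(3)]  # burst of 2, then rejected
other = governor.try_admit("tenant_b")                      # unaffected by tenant_a
fake_now[0] = 1.0                                           # one second later
later = governor.try_admit("tenant_a")                      # one token has refilled
```

A noisy tenant exhausts only its own bucket, so other tenants keep their budget. A real service would also have to govern CPU, memory, and I/O rather than just request counts, which is where the open questions on the slide begin.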
Software/Hardware Co-Design for Databases
- Trends that are driving hardware innovation for databases
  - Limited improvements in single-threaded performance
  - Custom hardware development by cloud providers
  - Popularity of database appliances
- Areas that could benefit from custom hardware
  - Transaction processing, encryption, query execution
- Hardware technology that's worth considering
  - Transactional memory, fast datacenter networks, GPUs, FPGAs
Dealing with the Tidal Wave of Data
- Internet of things, mobile phones, massive on-line games, telemetry from industrial systems
- The scenario:
  - Process some as a virtual world of objects
  - Process some as streams
  - Save data for later, off-line analysis
- Problems: programming model, scalability, fault tolerance, security, geo-distribution, …
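As a small illustration of "process some as streams", the sketch below averages telemetry readings per device over fixed (tumbling) time windows. It is a single-process toy under assumed names and an assumed event shape of (timestamp in seconds, device id, value); it deliberately ignores the problems listed above, such as scalability, fault tolerance, and geo-distribution.

```python
from collections import defaultdict

def tumbling_window_avg(events, window_sec):
    """Average value per (window start, device) over fixed-size windows.

    events: iterable of (timestamp_sec, device_id, value) tuples
            (a hypothetical telemetry format).
    """
    acc = defaultdict(lambda: [0.0, 0])  # key -> [running sum, count]
    for ts, device, value in events:
        window_start = int(ts // window_sec) * window_sec
        entry = acc[(window_start, device)]
        entry[0] += value
        entry[1] += 1
    return {key: total / count for key, (total, count) in acc.items()}

# Hypothetical readings from two devices.
events = [
    (0.5, "d1", 10.0),
    (3.0, "d1", 20.0),
    (4.0, "d2", 5.0),
    (12.0, "d1", 30.0),
]

averages = tumbling_window_avg(events, window_sec=10)
```

Here the first two d1 readings fall in the window starting at 0 and the last in the window starting at 10. Because the function consumes any iterable, the same logic can run over a live feed or over data saved for later, off-line analysis.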