ISSUES: THE CLOUD AND DATABASES
WHAT KIND OF DATA MANAGEMENT IS A GOOD FIT WITH THE CLOUD?
Analytical data management: data attributes
- Far more reads than writes, so security and privacy are less of an issue
- Tends to have far greater data needs, so more servers are needed
- The size of the data set grows over time and does not stabilize, so it is a better fit with expanding cloud server availability
- Analytical applications often want data from multiple sources, and availability is much better in a cloud environment
MORE ON ANALYTICAL PROCESSING
Analytical data management: system attributes
- Shared nothing works better when access is mostly reads
- ACID transactions do not need to be enforced, as there is no need for a single, global state for all users
- Generally, statistical results are okay even if some very secure data is not discovered
WHAT IS NEEDED FOR A NEW GENERATION OF CLOUD DBS?
- Focus on making use of broad parallelism and on a shifting/expanding set of servers
- Looser notion of fault tolerance, as there is often no need to restart a query that is interrupted or that has one of its branches killed
- Need to be able to operate on data in multiple formats, encryptions, attribute domains, namespaces, schemas, database products: heterogeneity!
- Must be able to sit underneath business intelligence systems
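The looser notion of fault tolerance can be illustrated with a small scatter-gather sketch in Python. It is an assumption-laden illustration, not a description of any particular product: `scan_partition` and `tolerant_mean` are hypothetical names, a partition of `None` stands in for a killed or unreachable branch, and threads stand in for separate servers. A lost branch is dropped and the query still returns a statistical answer, along with the fraction of branches that contributed.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def scan_partition(partition):
    """Hypothetical per-server branch: return (sum, count) for one partition."""
    if partition is None:  # stand-in for a killed or unreachable branch
        raise ConnectionError("branch lost")
    return sum(partition), len(partition)


def tolerant_mean(partitions):
    """Average over whichever branches finish, without restarting the query."""
    total, count, answered = 0.0, 0, 0
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = [pool.submit(scan_partition, p) for p in partitions]
        for future in as_completed(futures):
            try:
                branch_sum, branch_count = future.result()
            except ConnectionError:
                continue  # drop the failed branch instead of aborting everything
            total += branch_sum
            count += branch_count
            answered += 1
    mean = total / count if count else None
    coverage = answered / len(partitions) if partitions else 0.0
    return mean, coverage


if __name__ == "__main__":
    # One of four branches is lost; the query still answers.
    print(tolerant_mean([[1, 2, 3], [4, 5], None, [6]]))
```

Returning the coverage alongside the result lets the caller judge whether the partial answer is good enough, which is the trade-off the slide points at.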
HYBRID DATABASES: IS THIS THE ANSWER?
- Folks don’t want to learn/buy/program new data management products
- But folks do want commercial grade systems with professional support
- Would make the transition from transaction apps to analytical apps easier, like with relational data warehousing
- But would we end up with an inelegant mess?
WHAT ABOUT OBJECT DATABASES? A RETURN?
- Blending a host language with a query language makes sense when queries involve complex calculations
- It is easy to extend an o-o language with statistical procedures
- The encapsulation of o-o languages is a good match with the wide and independent distribution of data in a cloud environment
- O-O procedures could be built and deployed by distributed volunteers
MORE ON O-O DBS
- Partial results could be maintained and kept up to date, with batch updating of raw data only infrequently
- We know how to build multiple language interfaces to accommodate multiple o-o languages
- O-O databases are a good match with service-based interfaces (see diagram on page 29)
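The idea of keeping partial results up to date through infrequent batch updates can be sketched as an object that encapsulates its own running state. This is a minimal Python illustration; the class name `PartialResult` and its fields are assumptions made for the example and do not come from any specific object database.

```python
from dataclasses import dataclass


@dataclass
class PartialResult:
    """Encapsulated running aggregate; callers never touch the raw rows."""
    count: int = 0
    total: float = 0.0

    def apply_batch(self, values):
        """Fold one infrequent batch of new raw values into the stored result."""
        for value in values:
            self.count += 1
            self.total += value

    @property
    def mean(self):
        """Current answer, computed from the partial result alone."""
        return self.total / self.count if self.count else None


if __name__ == "__main__":
    result = PartialResult()
    result.apply_batch([10, 20])  # first batch load
    result.apply_batch([30])      # a later batch load
    print(result.count, result.mean)
```

Between batches, readers query `mean` without rescanning the raw data, which suits a workload that is mostly reads.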
OBJECT-ORIENTED DBS: RELEVANT RESEARCH & DEV.
- Adaptive query processing and optimization in real time
- Parallel and distributed database technology
- Massively parallel systems
- Shared nothing systems
- Data management stream technology
PROBLEM: MOST BUSINESS DATA RIGHT NOW IS IN A RELATIONAL FORMAT
- We don’t have truly massively parallel and distributed query models for relational data
- We don’t have truly massively parallel and distributed data partitioning for relational data
- To perform efficient and fluid analytical processing of data in the cloud, we would need to create new links quickly, but we won’t have a focused, fixed schema as we do in standard relational systems
- Object extensions to relational systems don’t include method encapsulation, only expanded domains
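To make "data partitioning for relational data" concrete, here is a small Python sketch of hash partitioning rows across shared-nothing nodes. It shows only the basic technique and is not a claim about how any existing system does it; `node_for`, `partition_rows`, and the `customer_id` column are hypothetical names chosen for the example.

```python
import hashlib


def node_for(key, num_nodes):
    """Pick a node from a stable hash of the partitioning key."""
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


def partition_rows(rows, key_column, num_nodes):
    """Distribute rows (dicts) so that equal keys always land on the same node."""
    nodes = [[] for _ in range(num_nodes)]
    for row in rows:
        nodes[node_for(row[key_column], num_nodes)].append(row)
    return nodes


if __name__ == "__main__":
    orders = [
        {"customer_id": 7, "amount": 12.5},
        {"customer_id": 42, "amount": 3.0},
        {"customer_id": 7, "amount": 8.0},
    ]
    for index, node_rows in enumerate(partition_rows(orders, "customer_id", 3)):
        print(index, node_rows)
```

With simple modulo placement like this, changing `num_nodes` moves most keys to a different node, which is one reason a shifting or expanding set of servers is hard to support.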
MORE CLOUD ISSUES: CENTRALIZED CONTROL?
- Is the cloud trusted or anonymous?
- Trusted, provider-specific commercial cloud solutions are much safer, centrally managed, and optimized as a single network, not as a mesh of networks
- In many environments, even trusted, centralized environments, many machines are not properly managed and are controlled by immediate users
- People don’t like their machines being co-opted, and so trust is not enough to guarantee dependability
MORE ON THE CLOUD: OTHER APPLICATIONS?
- Is analytical processing the only likely application?
- There are many data sharing applications
- There are many applications for selling access to bulk data
- Data mining is a more focused form of analytical processing, but demands a very precise level of heterogeneity resolution and integration in the case of most medical and financial applications (and others)
DATA MINING
Kinds of data (from Data Mining by Han and Kamber)
- Relational dbs
- Data warehouses
- Transaction processing systems
- Object-relational dbs
- Time sequence and temporal dbs
- Spatial dbs
- Text dbs
- Multimedia dbs
- Legacy dbs
- Data streams
- The Web…
HETEROGENEITY IN DATABASES: DATA MINING IMPLICATIONS
- Note how broad the “Web” is on the previous slide
  - Includes countless hand-rolled dbs
  - Includes databases hidden by web development frameworks like Ruby on Rails
  - Includes data accessible only via specific APIs
  - Includes data accessible via XML and XPath, XQuery technology
  - Includes data stored in proprietary databases for applications like CAD, finance, animation, geography
- The heterogeneity problem will only be solved by widespread collaboration on unifying standards
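What heterogeneity resolution involves at the smallest scale can be sketched in Python as per-source adapters that map differently shaped records into one agreed schema. Everything here is hypothetical: the two sources, their field names, and the target schema (`full_name`, `birth_year`) are invented for illustration.

```python
def from_source_a(record):
    """Hypothetical source A: one name string and an ISO date of birth."""
    return {
        "full_name": record["name"],
        "birth_year": int(record["dob"][:4]),
    }


def from_source_b(record):
    """Hypothetical source B: split name fields and a numeric birth year."""
    return {
        "full_name": f'{record["first"]} {record["last"]}',
        "birth_year": record["birth_year"],
    }


# One adapter per source; the shared schema is the agreed "standard".
ADAPTERS = {"a": from_source_a, "b": from_source_b}


def integrate(tagged_records):
    """Map (source, record) pairs into the shared schema."""
    return [ADAPTERS[source](record) for source, record in tagged_records]


if __name__ == "__main__":
    print(integrate([
        ("a", {"name": "Ada Lovelace", "dob": "1815-12-10"}),
        ("b", {"first": "Ada", "last": "Lovelace", "birth_year": 1815}),
    ]))
```

Each new source needs its own hand-written adapter, so the effort grows with the number of sources unless they converge on a common standard.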
MORE ON THE CLOUD: THE FUTURE OF TRANSACTION PROCESSING?
- Will the rigidly centralized notion of OLTP survive?
- Corporations are adapting to the cloud incrementally and using middleware to leverage their own clouds
- With global business comes global data processing, across time zones, which is often managed in a widely distributed fashion
- There are large corporations that handle financial and retail transactions for other companies
- Are people warming to the idea of managing their personal and small business data in the cloud, including document and other services?
BUT THE CLOUD IS PROCESS-CENTRIC AND NOT DATA-CENTRIC
- Is the process vs. data centric issue about to reawaken?
- The process folks kind of lost…
- Data is seen more and more as a valuable resource, even if it is only “sold” indirectly
- More of us are buying multimedia data
- There are actually 3 models: process-centric, data-centric, and encapsulated
- Some argue that the cloud is actually an encapsulated model and that, in fact, data movement is difficult to optimize due to the dynamic nature of the network
- Object-oriented databases…?