Published by Abraham Stevenson. Modified over 8 years ago.
1
Database Systems Carlos Ordonez
2
What is “Database systems” research?
– Input? Large data sets, large files, relational tables
– How? Fast external algorithms; RAM-efficient data structures at two storage levels (RAM and disk)
– Efficiency? Ideally O(n) I/O, i.e., a small constant number of passes over the data
– Hardware? Small computer, single server, parallel DBMS server, parallel cluster; one disk or RAID
– Infrastructure? DBMS, parallel system
– Boring? No: it combines theory and programming
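To make "O(n) I/O" and "RAM-efficient data structures" concrete, here is a minimal Python sketch (an illustration under stated assumptions, not code from this research group) of a one-pass external algorithm. It reads a comma-separated data set sequentially in fixed-size blocks and keeps only the count n, the per-dimension sums L and the cross-product sums Q in memory, so RAM use is bounded by one block plus O(d²) summaries, independent of n. The function name `summarize` and the `block_rows` parameter are hypothetical; a real DBMS would read disk pages rather than text lines.

```python
import io
from itertools import islice


def summarize(stream, block_rows=1000):
    """One sequential pass over the data: O(n) I/O, O(block + d^2) memory."""
    n = 0
    L = None  # per-dimension sums
    Q = None  # cross-product sums, d x d
    while True:
        # Read the next block; earlier blocks are no longer referenced.
        block = list(islice(stream, block_rows))
        if not block:
            break
        for line in block:
            x = [float(v) for v in line.split(",")]
            if L is None:
                d = len(x)
                L = [0.0] * d
                Q = [[0.0] * d for _ in range(d)]
            n += 1
            for a in range(d):
                L[a] += x[a]
                for b in range(d):
                    Q[a][b] += x[a] * x[b]
    return n, L, Q


n, L, Q = summarize(io.StringIO("1,2\n3,4\n5,6\n"), block_rows=2)
print(n, L)  # 3 [9.0, 12.0]
```

Means and variances follow directly from these summaries (mean = L/n, variance of dimension a = Q[a][a]/n - mean²). This one-pass variance formula can lose precision when values have a large mean, one reason low-level numerical care matters in this area.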
3
Database systems research today
– Transaction processing? Done
– Efficient querying? Done
– Fast external algorithms? Solved for simple tasks
– Parallel computation? Shared-nothing parallel DBMSs are well proven, but many challenges remain (big data)
– Exploiting new hardware? Difficult, low-level work
– Analyzing data? Most difficult: data mining, statistics
– Future? Big data
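Shared-nothing parallelism can be illustrated with a distributive aggregate. The minimal Python sketch below simulates the nodes sequentially in one process, and its function names are hypothetical. It partitions rows round-robin across nodes (so a grouping key may land on several nodes), lets each node compute SUM and COUNT on its own partition only, and merges the partial results at a coordinator to obtain AVG. AVG itself cannot be merged from partial averages, which is why the nodes ship SUM and COUNT instead.

```python
from collections import defaultdict


def round_robin_partition(rows, num_nodes):
    """Spread rows over nodes in turn, independent of the grouping key."""
    parts = [[] for _ in range(num_nodes)]
    for i, row in enumerate(rows):
        parts[i % num_nodes].append(row)
    return parts


def local_aggregate(part):
    """Per-node work: SUM and COUNT per key, touching only local data."""
    acc = defaultdict(lambda: [0, 0])
    for key, value in part:
        acc[key][0] += value
        acc[key][1] += 1
    return acc


def merge(partials):
    """Coordinator: combine partial SUM/COUNT pairs, then compute AVG."""
    total = defaultdict(lambda: [0, 0])
    for acc in partials:
        for key, (s, c) in acc.items():
            total[key][0] += s
            total[key][1] += c
    return {key: s / c for key, (s, c) in total.items()}


rows = [("a", 1), ("b", 2), ("a", 3), ("b", 6), ("a", 5)]
partials = [local_aggregate(p) for p in round_robin_partition(rows, 3)]
print(merge(partials))  # {'a': 3.0, 'b': 4.0}
```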
4
DB systems research involves core CS research: Theory+Programming
Theory we use:
– Time complexity, I/O cost models
– Large data structures, especially external ones
– The relational model is here to stay
– Multivariate statistics, machine learning, discrete math
– Numerical methods: linear algebra, optimization
– Compilers: parsing/compiling/optimizing code; recursion
Programming (even some hacking):
– Systems in a broad sense
– Languages: C, C++ for efficiency, pointers, legacy systems code; Java, C# mainly for portability
– Numerical libraries like LAPACK; OS thread libraries
– DBMS SQL UDF APIs in C, C++, C#
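As an example of an I/O cost model, the standard textbook analysis of external merge sort with N data pages and B buffer pages gives 2N · (1 + ⌈log_{B-1} ⌈N/B⌉⌉) page I/Os: pass 0 produces ⌈N/B⌉ sorted runs, each later pass performs a (B-1)-way merge, and every pass reads and writes all N pages once. The Python sketch below (function name hypothetical) counts the passes with integer arithmetic, which avoids floating-point rounding in the logarithm.

```python
import math


def external_sort_io(num_pages, buffer_pages):
    """Return (passes, page I/Os) for a textbook external merge sort."""
    if buffer_pages < 3:
        raise ValueError("need at least 3 buffer pages for a 2-way merge")
    runs = math.ceil(num_pages / buffer_pages)  # pass 0: sorted runs of B pages
    passes = 1
    while runs > 1:
        runs = math.ceil(runs / (buffer_pages - 1))  # one (B-1)-way merge pass
        passes += 1
    return passes, 2 * num_pages * passes  # each pass reads and writes N pages


print(external_sort_io(108, 5))  # (4, 864)
```

With 108 pages and 5 buffers, pass 0 yields 22 runs, which 4-way merging reduces to 6, then 2, then 1, so the sort needs 4 passes.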
5
Research topics
GOAL: Integrating statistical and machine learning algorithms with a DBMS (external algorithms, queries, UDFs)
Difference from machine learning algorithms: size, external algorithms (small RAM), queries, low-level optimization, generally simpler models
Main topics by students:
– Zhibo Chen: OLAP cubes, parametric statistical tests, cube operations on flash memory
– Mario Navas, Naveen Mohanam: Singular Value Decomposition for PCA and ML Factor Analysis; data summarization on multicore CPUs
– Carlos Garcia-Alvarado: keyword search across documents and databases, ranking, query recommendation
– Sasi Pitchaimalai: Bayesian classification, multithreaded summarization
– Wellington Cabrera: stochastic search variable selection on high-dimensional data, SVD on high-dimensional data
– David Matusevich: hybrid EM and MCMC mixture models on large data sets, database transformations for data mining
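Integrating a statistical computation with a DBMS through a UDF can be sketched with SQLite's Python binding. It is used here only as a convenient stand-in; the C, C++ and C# UDF APIs mentioned earlier are not shown. `sqlite3.Connection.create_aggregate` registers a class whose `step` method is called once per row and whose `finalize` method returns the result, so the aggregate accumulates n, L (sum) and Q (sum of squares) in a single scan inside the query. The UDF name `var_pop_udf` and the table `t` are hypothetical.

```python
import sqlite3


class VarianceUDF:
    """Aggregate UDF: accumulates n, L (sum) and Q (sum of squares) per group."""

    def __init__(self):
        self.n = 0
        self.L = 0.0
        self.Q = 0.0

    def step(self, x):
        if x is not None:  # skip NULLs, as built-in SQL aggregates do
            self.n += 1
            self.L += x
            self.Q += x * x

    def finalize(self):
        if self.n == 0:
            return None
        mean = self.L / self.n
        return self.Q / self.n - mean * mean  # population variance


conn = sqlite3.connect(":memory:")
conn.create_aggregate("var_pop_udf", 1, VarianceUDF)
conn.execute("CREATE TABLE t (g TEXT, x REAL)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [("a", 1), ("a", 3), ("a", 5), ("b", 2), ("b", 2)],
)
rows = conn.execute(
    "SELECT g, var_pop_udf(x) FROM t GROUP BY g ORDER BY g"
).fetchall()
print(rows)
```

Because the summaries are computed inside the DBMS, only one value per group leaves the database instead of the raw rows.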
6
Representative problems
– OLAP cubes
– Finding predictive association rules
– Bayesian classification
– Clustering, PCA and regression
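An OLAP cube computes an aggregate for every subset of the dimensions, i.e., 2^d GROUP BYs for d dimensions. The naive Python sketch below (function and column names hypothetical) makes that definition explicit by scanning the rows once per subset and marking rolled-up dimensions with the placeholder "ALL", in the spirit of the SQL CUBE operator. Efficient methods instead derive coarser group-bys from finer ones rather than rescanning the base data.

```python
from collections import defaultdict
from itertools import combinations


def cube(rows, dims, measure):
    """Naive CUBE: SUM(measure) for every subset of dims, one scan per subset."""
    result = {}
    for k in range(len(dims) + 1):
        for subset in combinations(dims, k):
            agg = defaultdict(float)
            for r in rows:
                # Dimensions outside the current subset are rolled up to "ALL".
                key = tuple(r[d] if d in subset else "ALL" for d in dims)
                agg[key] += r[measure]
            result.update(agg)
    return result


sales = [
    {"region": "N", "year": 2020, "amount": 10},
    {"region": "N", "year": 2021, "amount": 5},
    {"region": "S", "year": 2020, "amount": 7},
]
c = cube(sales, ["region", "year"], "amount")
print(c[("ALL", "ALL")], c[("N", "ALL")], c[("ALL", 2020)])  # 22.0 15.0 17.0
```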
7
Why is our database systems research “cool”?
– Theory+Programming
– Optimization, O(f(n)), systems (external data structures, discrete math, compilers, OS)
– Spans hardware-level work (multicore CPUs, cache memory) up to high-level query optimization in SQL
– Database systems techniques are used in search engines like Google and Yahoo (and vice versa)
– DBMS technology is used everywhere
8
Why join the DBMS group?
– Balance between theory (math) and programming
– We target “DB systems” conferences (ACM SIGMOD) and “IR/DM” conferences (ACM CIKM: IR+DB+DM)
– Mature and stable CS research area
– Jobs/internships: many opportunities in DBMS and search engine companies; job security at any large company
– Visit my web page and DBLP, or Google “Ordonez SQL”