DB storage architectures: Rows and Columns COS 518: Advanced Computer Systems Lecture 8 Michael Freedman
Basic row-based storage 8 32 4 2 = 50 Id BIGINT Name CHAR(32) Age INT Gender SMALLINT Birthday DATE
Basic row-based storage 8 32 4 2 = 50 Id BIGINT Name CHAR(32) Age INT Gender SMALLINT Birthday DATE 50 Id BIGINT Name CHAR(32) Age INT Gender SMALLINT Birthday DATE 100 Id BIGINT Name CHAR(32) Age INT Gender SMALLINT Birthday DATE 150 Id BIGINT Name CHAR(32) Age INT Gender SMALLINT Birthday DATE
Basic row-based storage 8 32 4 2 = 50 Id BIGINT Name CHAR(32) Age INT Gender SMALLINT Birthday DATE 50 Id BIGINT Name CHAR(32) Age INT Gender SMALLINT Birthday DATE 100 Id BIGINT Name CHAR(32) Age INT Gender SMALLINT Birthday DATE 150 Id BIGINT Name CHAR(32) Age INT Gender SMALLINT Birthday DATE READ 32 bytes at positions (0 + 8), (50 + 8), (100 + 8), (150 + 8)
Row-based storage: variable lengths 8 0 - 255 4 2 = 18 - 273 Id BIGINT URL VARCHAR(255) Size INT Code SMALLINT Fetched DATE How do you walk through all the URLs? No longer at fixed offsets
Row-based storage: variable lengths 8 0 - 255 4 2 = 18 - 273 Id BIGINT URL VARCHAR(255) Size INT Code SMALLINT Fetched DATE Data stored in fixed-sized pages on disk E.g., typically 8K in PostgreSQL Page includes metadata and actual data items Items = indexes, data rows
Row-based storage: variable lengths https://www.postgresql.org/docs/9.5/static/storage-page-layout.html ItemIdData: [58, 101), (102, 290), (291, 59)] 58 Id BIGINT URL VARCHAR(255) Size INT Code SMALLINT Fetched DATE 102 Id BIGINT URL VARCHAR(255) Size INT Code SMALLINT Fetched DATE 291 Id BIGINT URL VARCHAR(255) Size INT Code SMALLINT Fetched DATE
Row-based storage: variable lengths
Types of database workloads OLTP = OnLine Transaction Processing Write-heavy Transactions OLAP = OnLine Analytical Processing Read-heavy Analytical scans or “rollups” along column “SELECT AVG(latency) FROM system WHERE time > now() – interval(“1h”)
Comparison of disk layouts Row-oriented layout Id BIGINT Name CHAR(32) Age INT Gender SMALLINT Birthday DATE Id BIGINT Name CHAR(32) Age INT Gender SMALLINT Birthday DATE Id BIGINT Name CHAR(32) Age INT Gender SMALLINT Birthday DATE Column-oriented layout Id BIGINT Name CHAR(32) Age INT Particularly good for compression, especially for long runs of identical numbers or small deltas
Good discussion of benefits of columns... http://db.csail.mit.edu/projects/cstore/vldb.pdf http://db.csail.mit.edu/projects/cstore/abadi-sigmod08.pdf