Dr. N. Mamoulis, Advanced Database Technologies

Topics 9-10: Database Optimization for Modern Machines

Computer architecture has been changing over the years, so we have to change our programs, too! Most database operators, indexes, buffering techniques and storage schemes were developed and optimized over 20 years ago, so we need to re-tune them for the modern reality.
Moore's Law – Dictionary Term
http://info.astrian.net/jargon/terms/m/moore_s_law.html

Moore's Law /morz law/ prov. The observation that the logic density of silicon integrated circuits has closely followed the curve (bits per square inch) = 2^(t - 1962), where t is time in years; that is, the amount of information storable on a given amount of silicon has roughly doubled every year since the technology was invented. This relation, first uttered in 1964 by semiconductor engineer Gordon Moore (who co-founded Intel four years later), held until the late 1970s, at which point the doubling period slowed to 18 months. The doubling period remained at that value through the time of writing (late 1999). Moore's Law is apparently self-fulfilling. The implication is that somebody, somewhere is going to be able to build a better chip than you if you rest on your laurels, so you'd better start pushing hard on the problem.
Features of Modern Machines and Future Trends

CPU speed, memory bandwidth, and disk bandwidth follow Moore's Law. On the other hand, memory latency improves by only 1% per year. Memories are becoming larger, but slower relative to memory bandwidth and CPU speed. Most of the time the processor of your PC is idle, waiting for data to be fetched from memory. The role of main-memory caches is therefore becoming very important for overall performance.
Features of Modern Machines and Future Trends (cont'd)

Disk bandwidth also follows Moore's Law. On the other hand, disk seek time improves by only 10% per year. Thus a random disk access today may cost as much as 20-50 sequential accesses.
Features of Modern Machines and Future Trends (cont'd)

Another characteristic of modern machines is that their processors (AMD Athlon, Intel P4) have parallel processing units, able to process multiple (e.g., 5-9) independent instructions at the same time.
The New Bottleneck: Memory Access

Memory is now hierarchical, with two levels of caching:
CPU → L1 cache → L2 cache → main memory
(the corresponding transfer units are the L1 cache-line, the L2 cache-line, and the memory page; the L1 cache sits on the CPU die)

Memory latency: the time needed to transfer 1 byte from main memory to the L2 cache.
Cache (L2) miss: the requested data is not in the cache and needs to be fetched from main memory.
Cache-line: the transfer unit from main memory to the cache (e.g., an L2 cache-line is 128 bytes).
Example of Memory Latency Effects in Databases

Employee(id: 4 bytes, name: 20 bytes, age: 4 bytes, gender: 1 byte)

Disk/memory representation (29 bytes per tuple):
12000 Mike Chan 30 M | 12305 George Best 27 M | ...
Example of Memory Latency Effects in Databases (cont'd)

Example query: how many women work for the company?
SELECT COUNT(*) FROM EMPLOYEE WHERE gender = 'F';
What is the cost of this query, assuming that the relation is in main memory? We have to access the gender values, compare them to 'F', and add 1 (or 0) to the count.
Example of Memory Latency Effects in Databases (cont'd)

Accessing a specific address in memory also loads the information around it into the cache-line (transferred in parallel over a wide bus). Example: accessing the gender of the first tuple brings the surrounding 128-byte cache-line, i.e., the first tuple together with its neighbours (12000 Mike Chan 30 M | 12305 George Best 27 M | ...), from main memory into the cache.
Example of Memory Latency Effects in Databases (cont'd)

Thus a cache-line at a time is transferred from main memory. If the requested information is already in the cache (e.g., the gender of the second tuple), main memory is not accessed. Otherwise, a cache miss occurs.
Effect of cache misses

The stride.c program demonstrates how cache misses may dominate the query processing cost.
http://www.csis.hku.hk/~nikos/courses/CSIS7101/stride.c

```c
/* initialize a random memory buffer */
int buffer[200000];
int sum = 0;
int ilimit = 200000; /* tentative */
for (int i = 0; i < ilimit; i += stride)
    sum += buffer[i];
printf("sum=%d\n", sum);
```

With stride = 5, for example, only every fifth element is summed, yet each access may still fetch a whole cache-line from main memory.
Effect of cache misses (cont'd)

[figure omitted]
Re-designing the DBMS

We need to redesign the DBMS in order to face the new bottleneck: memory access.
New storage schemes are proposed to minimize memory and disk accesses.
Query processing techniques are redesigned to take the memory-latency effects into consideration.
Algorithms are changed to take advantage of the intra-parallelism of instruction execution.
New instruction types are used (e.g., SIMD instructions).
Problems in optimizing the DBMS

Programming languages have no control over the replacement policy of memory caches; it is determined by the hardware. As a result, we cannot apply buffer-management techniques to memory-cache management. Also, the simple instructions generated by programming languages cannot control the parallel instruction-execution capabilities of the machine. In many cases, we re-write the programs, trying to "fool" the cache replacement policy and the instruction execution in order to make them more efficient.
The N-ary Storage Model

The N-ary storage model (NSM) stores the information for each tuple as a contiguous sequence of bytes.
Disk/memory representation (29 bytes per tuple):
12000 Mike Chan 30 M | 12305 George Best 27 M | ...
A Decomposition Storage Model (DSM)

The table is vertically decomposed, and the information for each attribute is stored sequentially in its own binary table:
a1: (12000, Mike Chan), (12305, George Best), ...
a2: (12000, 30), (12305, 27), ...
a3: (12000, M), (12305, M), ...
Properties of DSM

The relation is decomposed into many binary tables, one per attribute. The attributes of a binary table are a surrogate id and the attribute from the original relation. The surrogate id is necessary to bring the information for a tuple back together, by joining the binary tables.

Example: print the information for the tuple with id=12305.
NSM: SELECT * FROM EMPLOYEE WHERE id=12305
DSM: SELECT a1.id, name, age, gender FROM a1, a2, a3 WHERE a1.id=12305 AND a2.id = a1.id AND a3.id = a1.id
Properties of DSM (cont'd)

Advantages of DSM. If the relation has many attributes, but queries involve only a few of them, then:
much less information is read from disk than in NSM;
cache misses are fewer than in NSM, because the stride is smaller;
projection queries are very fast.

Disadvantages of DSM. If a tuple needs to be reconstructed, this requires many joins. The size of the decomposed relation is larger than the original, due to the replication of the surrogate key.
A Decomposition Model with Unary Tables (Monet)

The surrogate column is not materialized: each unary table stores only the attribute values (e.g., M, F, ...), together with the value of the first surrogate and the size of the attribute type in bytes. Given the surrogate sid of a tuple, we can compute its attribute value v by:
v = *(table_address + (sid - first_surrogate) * size)
Partition Attributes Across (PAX)

PAX attempts to combine the advantages of both NSM and DSM, while avoiding their disadvantages. The data are split into pages, as in NSM, but the organization within each page is like DSM. Thus:
cache misses are minimized, because the information for a specific attribute is stored compactly within the page;
record reconstruction cost is low, since the tuples are stored on disk as in NSM, only vertically partitioned within each page, where reconstruction is cheap.
How do pages look in each scheme?

NSM: each page stores whole tuples, one after the other.
PAGE HEADER | 12000 Mike Chan 30 M | 12305 George Best 27 M | ...
PAGE HEADER | 54647 John Kit 33 M | 94765 Flora Ho 27 F | ...

DSM: many tables; each page stores (surrogate, value) pairs for a single attribute.
PAGE HEADER | 12000 Mike Chan | 12305 George Best | ...
PAGE HEADER | 12000 30 | 12305 27 | 54647 33 | 94765 27 | ...

PAX: each page stores the same tuples as the corresponding NSM page, but grouped by attribute (one "minipage" per attribute).
PAGE HEADER | 12000 12305 ... | Mike Chan George Best ... | 30 27 ... | M M ...
PAGE HEADER | 54647 94765 ... | John Kit Flora Ho ... | 33 27 ... | M F ...
Comparison between PAX and NSM

SELECT AVG(age) FROM EMPLOYEE WHERE id < 20000

NSM: fetching an age value loads a whole cache-line of consecutive tuple bytes (ids, names, genders), so irrelevant data are fetched into the cache.
PAX: the scan reads only the id and age minipages, which are stored compactly, so only relevant data are fetched into the cache.
Comparison between PAX and DSM

SELECT AVG(age) FROM EMPLOYEE WHERE name LIKE 'M%'

DSM: the qualifying ids found in the name table have to be joined with the age table, which lies on other pages.
PAX: the name and age values of a tuple lie on the same page, so the join is avoided.
Is PAX always better than DSM?

No. If the relation has many attributes (e.g., 10) and the query involves only a few of them (e.g., 2), then the join may be cheaper than reading whole tuples into memory.

Example: R(a1, a2, a3, ..., a10), where each attribute is 4 bytes long.
DSM: D1(id, a1), D2(id, a2), ..., D10(id, a10).
Query: SELECT avg(a2) FROM R WHERE a1 = x;
The query is very selective: only 1% of the tuples qualify, so the qualifying ids fit in memory.
PAX has to read the whole table: 40 bytes × |R|.
DSM has to read D1, apply the selection, and create an intermediate table X of qualifying ids (in memory). Then it reads D2 to get the qualifying a2 values that join with X. So in total DSM reads 16 bytes × |R| from disk.
Summary

In modern computer architectures the bottleneck is memory access: in many cases the processor is waiting for data to be fetched from memory. Memory access is no longer "random": when we access a location in memory, the data around it are loaded into a fast memory chip (the cache), so access locality is very important. Database operators, storage and buffering schemes, and indexes are being optimized to reduce cache misses.
References

George P. Copeland, Setrag Khoshafian. A Decomposition Storage Model. SIGMOD, 1985.
Peter A. Boncz, Stefan Manegold, Martin L. Kersten. Database Architecture Optimized for the New Bottleneck: Memory Access. VLDB, 1999.
Anastassia Ailamaki, David J. DeWitt, Mark D. Hill, Marios Skounakis. Weaving Relations for Cache Performance. VLDB, 2001.
Peter A. Boncz. Monet: A Next-Generation DBMS Kernel for Query-Intensive Applications. PhD dissertation, Universiteit van Amsterdam, May 2002.