Prepared by: Mr. ชนากานต์ สันติคุณาภรณ์, Mr. ธฤษพงศ์ ศิริบูรณ์, Miss ศุภาภรณ์ ถ่านคำ


1 Prepared by: Mr. ชนากานต์ สันติคุณาภรณ์, Mr. ธฤษพงศ์ ศิริบูรณ์, Miss ศุภาภรณ์ ถ่านคำ

2 A bitmap index is a special kind of database index that uses bit arrays (bitmaps). Credit: Wikipedia.org

3 To illustrate how bitmap indices work, we use the Titanic database as an example. Titanic is a table containing 2,201 tuples described by four attributes: Class, Age, Gender and Survivor (Table 1). The bitmap indices built on these attributes, including the one on Survivor, are presented in Table 2.

4 Table 1. Titanic Database

RowID  Class  Age    Gender  Survivor
1      1st    Adult  Female  Yes
2      3rd    Adult  Male    Yes
3      2nd    Child  Male    Yes
4      3rd    Adult  Male    Yes
5      1st    Adult  Female  Yes
6      2nd    Adult  Male    No
7      1st    Adult  Male    Yes
8      Crew   Adult  Female  No
9      Crew   Adult  Female  Yes
10     2nd    Adult  Male    No
11     3rd    Adult  Male    No
12     Crew   Adult  Male    No
…

Table 2. Bitmap Indices for Titanic's Database (one bit per RowID, RowIDs 1 to 12 shown from left to right)

Attribute  Value   Bitmap
Class      Crew    000000011001..
Class      1st     100010100000..
Class      2nd     001001000100..
Class      3rd     010100000010..
Age        Child   001000000000..
Age        Adult   110111111111..
Gender     Female  100010011000..
Gender     Male    011101100111..
Survivor   No      000001010111..
Survivor   Yes     111110101000..
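As a minimal sketch of how the bitmaps in Table 2 follow from the rows in Table 1, the Python below builds one bit string per attribute value from the 12 sample rows. The function name `build_bitmap_index` and the use of plain strings of "0" and "1" are illustrative assumptions, not how a database stores its indices.

```python
from collections import defaultdict

# The 12 sample rows of Table 1: (Class, Age, Gender, Survivor).
ROWS = [
    ("1st", "Adult", "Female", "Yes"),
    ("3rd", "Adult", "Male", "Yes"),
    ("2nd", "Child", "Male", "Yes"),
    ("3rd", "Adult", "Male", "Yes"),
    ("1st", "Adult", "Female", "Yes"),
    ("2nd", "Adult", "Male", "No"),
    ("1st", "Adult", "Male", "Yes"),
    ("Crew", "Adult", "Female", "No"),
    ("Crew", "Adult", "Female", "Yes"),
    ("2nd", "Adult", "Male", "No"),
    ("3rd", "Adult", "Male", "No"),
    ("Crew", "Adult", "Male", "No"),
]
ATTRIBUTES = ("Class", "Age", "Gender", "Survivor")


def build_bitmap_index(rows, attributes):
    """Return {attribute: {value: bit string}} with one character per row.

    Hypothetical helper: position i of a bit string is "1" when row i+1
    holds that value for that attribute, and "0" otherwise.
    """
    index = {name: defaultdict(lambda: ["0"] * len(rows)) for name in attributes}
    for row_pos, row in enumerate(rows):
        for name, value in zip(attributes, row):
            index[name][value][row_pos] = "1"
    return {
        name: {value: "".join(bits) for value, bits in bitmaps.items()}
        for name, bitmaps in index.items()
    }


index = build_bitmap_index(ROWS, ATTRIBUTES)
for name in ATTRIBUTES:
    for value, bits in sorted(index[name].items()):
        print(f"{name:9}{value:7}{bits}")
```

Each attribute gets as many bitmaps as it has distinct values, and every row sets exactly one bit per attribute.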

5 • Queries are answered using bit-wise operations such as intersection (AND) and union (OR).
• Example for select queries of the form "SELECT COUNT() ... WHERE ... AND ...":

RowID               123456789101112..
Survivor = "Yes"    111110101000..
Gender = "Male"     011101100111..
AND                 011100100000..

The last row is Bitmap(Survivor = "Yes") AND Bitmap(Gender = "Male").
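The AND example on this slide can be reproduced with ordinary integer operations. This is a minimal sketch that assumes each bitmap is held as a Python integer parsed from its bit string; the helper names `to_bits`, `to_string` and `count_ones` are hypothetical.

```python
def to_bits(bitmap: str) -> int:
    """Parse a bit string such as "0110" into an integer."""
    return int(bitmap, 2)


def to_string(bits: int, width: int) -> str:
    """Format an integer back into a zero-padded bit string."""
    return format(bits, f"0{width}b")


def count_ones(bits: int) -> int:
    """Number of set bits, i.e. the number of matching rows."""
    return bin(bits).count("1")


survivor_yes = "111110101000"  # Bitmap(Survivor = "Yes"), RowIDs 1..12
gender_male = "011101100111"   # Bitmap(Gender = "Male"), RowIDs 1..12
width = len(survivor_yes)

# Intersection: rows that satisfy both predicates.
both = to_bits(survivor_yes) & to_bits(gender_male)
# Union: rows that satisfy at least one predicate.
either = to_bits(survivor_yes) | to_bits(gender_male)

print(to_string(both, width), count_ones(both))      # 011100100000 4
print(to_string(either, width), count_ones(either))  # 111111101111 11
```

Counting the set bits of the AND result answers the COUNT query for the 12 sample rows without touching the table itself.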

6 • The aim is to predict which classes of Titanic passengers were more likely to survive the wreck. The passengers are described by the following attributes:
• Class = {1st, 2nd, 3rd, Crew};
• Age = {Adult, Child};
• Gender = {Female, Male};
• Survivor = {No, Yes}.

7 Bitmap Indices for Titanic's Database: the same bitmaps as in Table 2.

8 Root node (Survivor):
  Yes  711  = COUNT1(Bitmap(Survivor = "Yes"))
  No   1490 = COUNT1(Bitmap(Survivor = "No"))

Bitmaps for the Gender = "Male" branch:

Survivor = "Yes"    111110101000..
Gender = "Male"     011101100111..
AND                 011100100000..

Survivor = "No"     000001010111..
Gender = "Male"     011101100111..
AND                 000001000111..

Child nodes after splitting on Gender:
  Male:    Yes 367 = COUNT1(Bitmap(Gender = "Male") AND Bitmap(Survivor = "Yes")), No 1364 = COUNT1(Bitmap(Gender = "Male") AND Bitmap(Survivor = "No"))
  Female:  Yes 344, No 126

9 • The stored procedure allows us to create the necessary bitmap indices for a given training set and then build the decision tree.
• The nodes of the decision tree are built by using an SQL query that is based on the AND operation applied to the node's own bitmaps and its parent's bitmaps.
• Then, the resulting And_bitmaps are used to count the population frequency of each class in the node with simple COUNT queries.
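The node-building step described above can be sketched in Python on the 12 sample rows. This is not the stored procedure from the slides: the function `child_nodes`, the dictionary layout and the choice of split attributes are assumptions made for illustration, and the counts apply to the sample only, not to the full 2,201-tuple table.

```python
# Bitmaps for the 12 sample rows (RowIDs 1..12, left to right).
BITMAPS = {
    "Class": {
        "Crew": "000000011001", "1st": "100010100000",
        "2nd": "001001000100", "3rd": "010100000010",
    },
    "Gender": {"Female": "100010011000", "Male": "011101100111"},
    "Survivor": {"No": "000001010111", "Yes": "111110101000"},
}
N_ROWS = 12


def bits(attribute, value):
    """Bitmap of one attribute value as an integer."""
    return int(BITMAPS[attribute][value], 2)


def count_ones(bitmap):
    """Number of rows selected by a bitmap."""
    return bin(bitmap).count("1")


def class_counts(node_bitmap, class_attribute="Survivor"):
    """COUNT of each class value among the rows in a node."""
    return {
        value: count_ones(node_bitmap & bits(class_attribute, value))
        for value in BITMAPS[class_attribute]
    }


def child_nodes(parent_bitmap, split_attribute):
    """One child per value: parent bitmap AND the value's own bitmap."""
    children = {}
    for value in BITMAPS[split_attribute]:
        and_bitmap = parent_bitmap & bits(split_attribute, value)
        children[value] = {"bitmap": and_bitmap, "counts": class_counts(and_bitmap)}
    return children


root = (1 << N_ROWS) - 1                  # the root node contains every row
by_gender = child_nodes(root, "Gender")   # first level of the tree
male_by_class = child_nodes(by_gender["Male"]["bitmap"], "Class")  # second level

print(class_counts(root))
print({value: node["counts"] for value, node in by_gender.items()})
print({value: node["counts"] for value, node in male_by_class.items()})
```

Each level reuses the parent's And_bitmap, so a node never has to re-read the rows of the training set; only bitmaps are combined and counted.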

10 • Let:
• N: the total number of tuples in the training set
• K: the number of attributes
• L: the average length, in bits, of each attribute
• A: the average number of values of each attribute
• Thus K bitmap indices are created, with an average of A bitmaps per index. Each bitmap has a size of N bits.
• The size of the initial training set is N ∗ L ∗ K bits.
• The size of the training set represented as bitmaps is N ∗ A ∗ K bits.
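To make the two size formulas concrete, here is a small worked calculation for the Titanic table. N = 2,201 and K = 4 come from the slides, and A is derived from the attribute domains listed earlier (4, 2, 2 and 2 values). The value L = 8 bits is purely an assumed figure for illustration, since the slides do not state how the raw attributes are encoded.

```python
def raw_size_bits(n, l, k):
    """Size of the initial training set: N * L * K bits."""
    return n * l * k


def bitmap_size_bits(n, a, k):
    """Size of all bitmaps together: N * A * K bits."""
    return n * a * k


N = 2201                            # tuples in the Titanic table
K = 4                               # attributes: Class, Age, Gender, Survivor
VALUES_PER_ATTRIBUTE = [4, 2, 2, 2]
A = sum(VALUES_PER_ATTRIBUTE) / K   # average number of values: 2.5
L = 8                               # ASSUMPTION: 8 bits per stored attribute value

raw = raw_size_bits(N, L, K)
bitmaps = bitmap_size_bits(N, A, K)

print(f"raw training set : {raw} bits")
print(f"bitmap indices   : {bitmaps:.0f} bits")
print(f"ratio raw/bitmap : {raw / bitmaps:.1f} (equals L / A)")
```

The ratio of the two sizes reduces to L / A, so the bitmaps are the smaller representation whenever attributes have fewer distinct values than bits per stored value.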

11 • In terms of time spent reading data, we assume that one bit is read in one time unit.
• The total number of nodes at the i-th depth level can be approximated by A^(i−1). Then, to build the whole decision tree with the classical approach, the reading time is: [equation not preserved in the transcript]

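12 To evaluate the gain in time, we build the following ratio: [equation not preserved in the transcript]. R^(−1) is of complexity [expression not preserved]. G: the polynomials of higher degree. [symbol not preserved]: the insignificant [remainder not preserved]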

13 • Fast operations
  - The most common operations are the bit-wise logical operations
  - They are well supported by hardware
• Easy to compress, potentially small index size
• Each individual bitmap is small, and frequently used ones can be cached in memory
• Available in most major commercial DBMSs (database management systems)
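One way to see why bitmaps compress well is run-length encoding, sketched below on the Age = "Adult" bitmap from Table 2. Run-length encoding is swapped in here as the simplest illustration; the slides do not say which compression scheme is meant, and the function names are hypothetical.

```python
from itertools import groupby


def run_length_encode(bitmap: str):
    """Collapse a bit string into (bit, run length) pairs."""
    return [(bit, len(list(run))) for bit, run in groupby(bitmap)]


def run_length_decode(runs):
    """Expand (bit, run length) pairs back into the bit string."""
    return "".join(bit * length for bit, length in runs)


def count_ones_encoded(runs):
    """Count set bits directly on the encoded form, without decoding."""
    return sum(length for bit, length in runs if bit == "1")


age_adult = "110111111111"  # Bitmap(Age = "Adult"), RowIDs 1..12
runs = run_length_encode(age_adult)

print(runs)                      # [('1', 2), ('0', 1), ('1', 9)]
print(count_ones_encoded(runs))  # 11
```

Bitmaps for skewed attributes contain long runs of identical bits, so a few pairs can stand in for many rows, and counts can still be taken on the compressed form.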

