Universiteit Utrecht MONET CD Session 9 | Monday 6 June 2005 Lee Provoost.


1 Universiteit Utrecht MONET CD Session 9 | Monday 6 June 2005 Lee Provoost

2 Universiteit Utrecht Question Marjolijn “I quite get the picture of the whole structure of Monet and all of his functions and refinements, but what I don’t really see is how this program can be useful for DNA matching. I think of DNA as a lot of data in one database table and what I understand from Monet is that it divides information into several tables with head and tails, but how can this be done with DNA? And what kind of queries would be useful to ask for the case of DNA?”

3 Universiteit Utrecht Good question!
Remarks:
- MonetDB is not specifically aimed at a genome context
- Genome “stuff” not (?) implemented yet, but a master thesis project on that topic is available
Two questions:
- How is it done in conventional databases?
- How does genome processing benefit from MonetDB?

4 Universiteit Utrecht MySQL implementation (BioPerl project)
Table: fdna
Column 1: fref --> reference sequence name (string)
Column 2: foffset --> offset of this sequence
Column 3: fdna --> DNA sequence (LONGBLOB)
LONGBLOB
- Binary Large Object
- Max 4 GB size
- DNA sequence split up in segments
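
The segment idea can be sketched in a few lines. This is a minimal Python illustration, not the BioPerl or MySQL code: the segment size, the function names and the sample sequence are all hypothetical. A long sequence is cut into rows shaped like the fdna table, and a range lookup reassembles only the segments that overlap the requested range.

```python
SEGMENT_SIZE = 4  # hypothetical; a real schema would use far larger segments

def split_into_segments(fref, sequence, size=SEGMENT_SIZE):
    """Return rows shaped like the fdna table: (fref, foffset, fdna)."""
    return [(fref, off, sequence[off:off + size])
            for off in range(0, len(sequence), size)]

def fetch_range(rows, fref, start, end):
    """Rebuild sequence[start:end] from the segments that overlap it."""
    parts = []
    for ref, off, seg in sorted(rows, key=lambda r: r[1]):
        # Skip other references and segments entirely outside the range.
        if ref != fref or off + len(seg) <= start or off >= end:
            continue
        parts.append(seg[max(start - off, 0):end - off])
    return "".join(parts)

rows = split_into_segments("chr1", "ACGTACGTTTGACA")
subsequence = fetch_range(rows, "chr1", 3, 10)
```

The point of the offset column is that a query for a small region never has to load the whole sequence, only the few rows around it.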

5 Universiteit Utrecht BLASTN implementation
7 files to store all data + metadata; 2 files are interesting:
*.nsq file
- Contains the actual DNA sequence data
- Sequences in binary format, separated by \0
- A = 00, C = 01, G = 10, U|T = 11
- ACGT = 00011011\0
*.nin file
- Offsets to the beginning of the sequences in the *.nsq file
Detailed information: http://blast.wustl.edu/blast/dbfmts.html
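
The two-bit encoding on this slide can be tried out directly. The sketch below packs four bases per byte using the codes from the slide; the function names are hypothetical, and it deliberately ignores everything else in the real *.nsq format (headers, separators, sequences whose length is not a multiple of four).

```python
CODES = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11, "U": 0b11}
BASES = "ACGT"

def pack(seq):
    """Pack bases four per byte, first base in the highest-order bits."""
    if len(seq) % 4:
        raise ValueError("this sketch only handles lengths divisible by 4")
    out = bytearray()
    for i in range(0, len(seq), 4):
        byte = 0
        for base in seq[i:i + 4]:
            byte = (byte << 2) | CODES[base]
        out.append(byte)
    return bytes(out)

def unpack(data):
    """Reverse of pack(); U and T share a code, so U comes back as T."""
    return "".join(BASES[(byte >> shift) & 0b11]
                   for byte in data for shift in (6, 4, 2, 0))

packed = pack("ACGT")  # one byte with the bit pattern 00011011
```

Compared with one character per base, this stores the same sequence in a quarter of the space.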

6 Universiteit Utrecht MonetDB & DNA?
Monet targets query intensive operations like
- OLAP (online analytical processing) = analysis of data (for example trend analysis views)
- data mining = trying to find previously unknown relationships between data, often used for marketing or sales
Genome data is also query intensive, so it could benefit from Monet

7 Universiteit Utrecht Benefit of vertical fragmentation?
Conventional databases --> OLTP --> single row centric (rows clustered on disk)
Query intensive applications (like OLAP) use a subset of the data (one, two, three columns) --> in a row-clustered table, scanning those columns still means retrieving the whole table
Vertical fragmentation helps us....
BUT DNA tables have only three columns...
So, benefit?
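
To make the row-versus-column contrast concrete, here is a small sketch of vertical fragmentation: a three-column table is split into one binary table per column, each a list of (object id, value) pairs in the spirit of Monet's head and tail. The sample rows and the decompose helper are hypothetical and say nothing about how MonetDB stores these tables internally.

```python
# Row-wise storage: one tuple per record.
rows = [
    ("chr1", 0, "ACGT"),
    ("chr1", 4, "TTGA"),
    ("chr2", 0, "GGCC"),
]
COLUMNS = ("fref", "foffset", "fdna")

def decompose(rows, columns):
    """Split a table into one binary table per column: lists of (oid, value)."""
    return {name: [(oid, row[i]) for oid, row in enumerate(rows)]
            for i, name in enumerate(columns)}

bats = decompose(rows, COLUMNS)

# A selection on one column scans only that column's binary table.
chr1_oids = [oid for oid, ref in bats["fref"] if ref == "chr1"]

# The object id (head) links back to the values (tails) of other columns.
offset_by_oid = dict(bats["foffset"])
offsets = [offset_by_oid[oid] for oid in chr1_oids]
```

With only three columns the saving is modest, but the large fdna column is never touched by a query that only needs fref and foffset.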

8 Universiteit Utrecht Benefit of main memory aspect? MonetDB tries to keep everything in (virtual) memory to avoid I/O performance penalties. However, a conventional database can in some cases also keep everything in main memory. But MonetDB is so highly tuned for main memory usage that it could still give us better performance. We can’t say how big the benefit of using Monet would be for genome data; benchmark statistics?

9 Universiteit Utrecht Question Adriano “MIL is not an OO or even a relational language. MIL just provides the minimally complete set of primitives, such that each front-end can adequately map operations on its logical model to the Monet primitives. What does it mean? And how does MIL work between front-end and back-end?”

10 Universiteit Utrecht MonetDB architecture: extensibility

11 Universiteit Utrecht Example: SQL front end

12 Universiteit Utrecht Data mapping

13 Universiteit Utrecht Question Ingmar “Do you find it logical that they pass information to the operating system to help it with virtual memory management? Isn’t this too OS dependent? Wouldn’t it be better to write their own dedicated Monet OS that handles the virtual memory management? Since performance is such a big issue I assume that you don’t want to run any other processes anyway.”

14 Universiteit Utrecht Virtual Memory: Introduction
From the 80386 onward --> 32 bits --> 4 GB of address space per process (64 bit --> 256 TB)
Active parts of program & data --> physical RAM
Rest --> page file | swap partition
When a program accesses data that is not in physical RAM --> interrupt (page fault) --> system retrieves it from the swap partition or page file
When physical RAM runs short --> paging out of inactive data / code
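
The address space figures on this slide follow from simple powers of two. The sketch below checks them; it assumes that the 256 TB figure refers to 48 usable address bits and that pages are 4 KiB, neither of which is stated on the slide.

```python
def addressable_bytes(address_bits):
    """Number of distinct byte addresses for a given address width."""
    return 2 ** address_bits

GIB = 2 ** 30
TIB = 2 ** 40
PAGE_SIZE = 4096  # assumed 4 KiB pages

gib_32 = addressable_bytes(32) // GIB          # 32-bit address space in GiB
tib_48 = addressable_bytes(48) // TIB          # 48-bit address space in TiB
pages_32 = addressable_bytes(32) // PAGE_SIZE  # pages in a 32-bit space
```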

15 Universiteit Utrecht Do not re-invent the wheel
- There are already good and mature operating systems out there
- An OS is very complex, much more than just VM management
- Acceptance of MonetOS + MonetDB?

16 Universiteit Utrecht Big DBMS do re-invent parts of OS
- Implementation of own buffer pools (separate buffer pools allow schema objects, like tables and indexes, to be assigned to the appropriate buffer pool to control the way their data blocks age out of cache)
- Implementing raw disk I/O that bypasses the OS file system
- Built-in thread package, which is fine tuned for database scheduling
Reasons?
- Initial design of DBMS is very old
- Conventional DBMS rely heavily on I/O

17 Universiteit Utrecht Virtual Memory management
A conventional DBMS relying on the OS VM is not a good idea: the OS has no knowledge of the access pattern of each application.
OS VM --> LRU --> bad performance
Solution: influence *nix OS VM behavior with mmap & madvise.
But what about Windows? POSIX?
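
As an illustration of the mmap & madvise idea, the sketch below maps a file into memory and tells the OS that it will be read sequentially. It uses Python's standard mmap module rather than MonetDB's own C code, and the function name, chunk size and temporary file are hypothetical. The madvise call is guarded because Python only exposes it (since 3.8) on platforms that provide it, which mirrors the portability question on the slide.

```python
import mmap
import os
import tempfile

CHUNK = 4096  # hypothetical scan granularity

def sum_bytes_sequentially(path):
    """Map a file into memory, hint a sequential scan, and sum its bytes."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # Advise the OS about the access pattern where that is supported.
            if hasattr(mm, "madvise") and hasattr(mmap, "MADV_SEQUENTIAL"):
                mm.madvise(mmap.MADV_SEQUENTIAL)
            total = 0
            for start in range(0, len(mm), CHUNK):
                total += sum(mm[start:start + CHUNK])
            return total

# Demo on a small temporary file.
tmp = tempfile.NamedTemporaryFile(delete=False)
try:
    tmp.write(bytes(range(256)) * 16)
    tmp.close()
    result = sum_bytes_sequentially(tmp.name)
finally:
    os.remove(tmp.name)
```

The hint does not change the result, only how the OS is asked to read ahead and evict pages, so the same code still works where no hint is available.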

18 Universiteit Utrecht File I/O
Disadvantages of the OS file system:
- Not atomic with respect to their files
- Fixed block size --> but usually only a problem in OLTP
Query intensive applications mainly use bulk I/O, where the main DBMS demand is high throughput --> already well implemented in the OS
Other advantages of relying on the OS:
- More portable
- Smaller source code base

19 Universiteit Utrecht Conclusion question Ingmar Dedicated MonetOS COULD give advantages, but is it worth it? Acceptance of MonetOS? Last remark: such applications usually run on dedicated servers, so no worry about other processes.

20 Universiteit Utrecht OLAP in MS SQL Server
OLAP queries are run on data warehouses
Data warehouses contain data in a dimensional way (categories of information)
Fact table = a table that contains the measure of interest, for example sales. Let’s say we are interested in the sales amount by store by day, then the fact table would have three columns (date, store, amount)
Lookup table = detailed information about the attribute
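
The fact table and lookup table from this slide can be mimicked with plain Python data to show what a typical OLAP roll-up does. This is only a sketch with made-up rows and hypothetical names, not MS SQL Server syntax: the fact table holds (date, store, amount), the lookup table adds a city per store, and the same helper totals the sales at either level.

```python
from collections import defaultdict

# Fact table: (date, store, amount)
fact_sales = [
    ("2005-06-06", "S1", 100),
    ("2005-06-06", "S2", 250),
    ("2005-06-07", "S1", 50),
]

# Lookup table: detailed information about the store attribute.
store_lookup = {"S1": {"city": "Utrecht"}, "S2": {"city": "Amsterdam"}}

def total_by(fact_rows, key):
    """Sum the amount column, grouped by whatever key(row) returns."""
    totals = defaultdict(int)
    for row in fact_rows:
        totals[key(row)] += row[2]
    return dict(totals)

sales_by_store = total_by(fact_sales, lambda r: r[1])
sales_by_city = total_by(fact_sales, lambda r: store_lookup[r[1]]["city"])
```

Grouping by city goes through the lookup table, which is exactly the join a star schema makes cheap.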

21 Universiteit Utrecht Schema
Star schema = all lookup tables join directly to the fact table
Snowflake schema = some lookup tables do not join directly to the fact table, but through other lookup tables

22 Universiteit Utrecht OLAP Cubes

