Universiteit Utrecht MONET CD Session 9 | Monday 6 June 2005 Lee Provoost.


Question (Marjolijn): “I quite get the picture of the whole structure of Monet and all of his functions and refinements, but what I don’t really see is how this program can be useful for DNA matching. I think of DNA as a lot of data in one database table and what I understand from Monet is that it divides information into several tables with head and tails, but how can this be done with DNA? And what kind of queries would be useful to ask for the case of DNA?”

Good question!
Remarks:
- MonetDB is not specifically aimed at a genome context
- Genome “stuff” not (?) implemented yet, but a master's thesis project on that topic is available
Two questions:
- How is it done in conventional databases?
- How does genome processing benefit from MonetDB?

MySQL implementation (BioPerl project)
Table: fdna
- Column 1: fref --> reference sequence name (string)
- Column 2: foffset --> offset of this sequence
- Column 3: fdna --> DNA sequence (longblob)
LONGBLOB
--> Binary large object
--> Max 4 GB size
--> DNA sequence split up into segments
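The segment layout can be tried out with a minimal sketch. It uses Python's built-in sqlite3 as a stand-in for MySQL, and the segment size, the 0-based offsets and the helper names `store_sequence` and `fetch_subsequence` are assumptions for illustration, not the BioPerl code.

```python
import sqlite3

SEGMENT_SIZE = 8  # hypothetical; real segments would be far larger


def store_sequence(conn, ref, dna, segment_size=SEGMENT_SIZE):
    """Split one sequence into fixed-size segments, one row per segment."""
    rows = [
        (ref, offset, dna[offset:offset + segment_size].encode("ascii"))
        for offset in range(0, len(dna), segment_size)
    ]
    conn.executemany(
        "INSERT INTO fdna (fref, foffset, fdna) VALUES (?, ?, ?)", rows
    )


def fetch_subsequence(conn, ref, start, stop, segment_size=SEGMENT_SIZE):
    """Return dna[start:stop], reading only the segments that overlap it."""
    first = (start // segment_size) * segment_size  # offset of first segment
    cursor = conn.execute(
        "SELECT foffset, fdna FROM fdna "
        "WHERE fref = ? AND foffset >= ? AND foffset < ? ORDER BY foffset",
        (ref, first, stop),
    )
    joined = b"".join(bytes(chunk) for _, chunk in cursor)
    return joined[start - first:stop - first].decode("ascii")


# In-memory database with the three columns named on the slide.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fdna (fref TEXT, foffset INTEGER, fdna BLOB, "
    "PRIMARY KEY (fref, foffset))"
)
store_sequence(conn, "chr1", "ACGTACGTTTGACCA")
```

A range query such as `fetch_subsequence(conn, "chr1", 6, 11)` then touches only the rows whose offsets overlap the requested range, which is the point of splitting the sequence into segments.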

BLASTN implementation
7 files to store all data + metadata; 2 files are interesting:
*.nsq file
- Contains the actual DNA sequence data
- Sequences in binary format, separated by \0
- A = 00, C = 01, G = 10, U|T = 11
- ACGT = \0
*.nin file
- Offsets to the beginning of the sequences in the *.nsq file
Detailed information:
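The two-bit code is easy to sketch. The bit values come from the slide; packing four bases per byte with the first base in the highest bits, the zero padding of the last byte and the function names are assumptions for illustration, not the actual BLAST file format code.

```python
# Two-bit codes as listed on the slide; U and T share one code.
CODES = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11, "U": 0b11}
BASES = "ACGT"


def pack(seq):
    """Pack bases four to a byte, first base in the highest two bits."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | CODES[base]
        byte <<= 2 * (4 - len(chunk))  # zero-pad a short final chunk
        out.append(byte)
    return bytes(out)


def unpack(data, length):
    """Recover `length` bases; the length has to be stored separately."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(BASES[(byte >> shift) & 0b11])
    return "".join(bases[:length])
```

Under these assumptions `pack("ACGT")` yields the single byte 00011011, a fourfold saving over one character per base. Note that `pack("AAAA")` yields a zero byte, so a packed sequence can itself contain \0; an index of start offsets, as in the *.nin file, avoids having to rely on the separator alone.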

MonetDB & DNA?
Monet targets query intensive operations like
- OLAP (online analytical processing) = analysis of data (for example trend analysis views)
- data mining = trying to find previously unknown relationships between data, often used for marketing or sales
Genome data is also query intensive, so it could benefit from Monet

Benefit of vertical fragmentation?
Conventional databases --> OLTP --> single-row centric (rows clustered on disk)
Query intensive applications (like OLAP) use a subset of the data (one, two, three columns) --> scanning a whole table means retrieving the whole table
Vertical fragmentation helps us...
BUT DNA tables have only three columns... So, benefit?
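To make the head/tail idea concrete, here is a minimal sketch of vertical fragmentation in plain Python. The table contents, the use of the row position as object id (the head) and the helper names are assumptions for illustration; this is not MonetDB's storage code.

```python
COLUMNS = ("fref", "foffset", "fdna")

# A small row-oriented table with the three columns of the fdna example.
rows = [
    ("chr1", 0, "ACGTACGT"),
    ("chr1", 8, "TTGACCAA"),
    ("chr2", 0, "GGGTTTAA"),
]


def decompose(rows, columns):
    """Split an n-column table into n two-column (head, tail) tables."""
    bats = {name: [] for name in columns}
    for oid, row in enumerate(rows):
        for name, value in zip(columns, row):
            bats[name].append((oid, value))  # head = oid, tail = value
    return bats


def offsets_for(bats, ref):
    """Answer a query from two of the fragments; fdna is never read."""
    oids = {oid for oid, value in bats["fref"] if value == ref}
    return [value for oid, value in bats["foffset"] if oid in oids]


def reconstruct(bats, columns, oid):
    """Rebuild one full row by looking up the same oid in every fragment."""
    return tuple(bats[name][oid][1] for name in columns)


bats = decompose(rows, COLUMNS)
```

In this sketch `offsets_for(bats, "chr1")` scans only the small fref and foffset fragments and skips the bulky sequence column. Whether that pays off for a three-column DNA table in practice is the open question the slide raises.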

Benefit of main memory aspect?
MonetDB tries to put everything in (virtual) memory to avoid I/O performance penalties. However, a conventional database can in some cases also put everything in main memory. But MonetDB is so finely tuned for main-memory usage that it could still give us better performance. We cannot say how large the benefit of using Monet would be for genome data; benchmark statistics would be needed.

Question (Adriano): “MIL is not an OO or even a relational language. MIL just provides the minimally complete set of primitives, such that each front-end can adequately map operations on its logical model to the Monet primitives. What does it mean? And how does MIL work between front-end and back-end?”

MonetDB architecture: extensibility

Example: SQL front end

Data mapping

Question (Ingmar): “Do you find it logical that they pass information to the operating system to help it with virtual memory management? Isn’t this too OS dependent? Wouldn’t it be better to write their own dedicated Monet OS that handles the virtual memory management? Since performance is such a big issue I assume that you don’t want to run any other processes anyway.”

Virtual Memory: Introduction
- 32-bit addresses --> 4 GB of address space per process (64-bit --> 256 TB, which corresponds to 48-bit virtual addresses)
- Active parts of program & data --> physical RAM
- Rest --> page file | swap partition
- When a program accesses data that is not in physical RAM --> interrupt (page fault) --> the system retrieves it from the swap partition or page file
- When physical RAM runs short --> inactive data / code is paged out
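The numbers and the page-fault mechanism can be checked with a small sketch. The 4 KiB page size, the reading of 256 TB as 48 address bits and the use of LRU as eviction policy are assumptions for illustration; real operating systems use approximations of LRU and considerably more machinery.

```python
from collections import OrderedDict

PAGE_SIZE = 4096  # assumed page size in bytes
GIB = 2 ** 30
TIB = 2 ** 40


def addressable_bytes(address_bits):
    """Size of the address space reachable with the given number of bits."""
    return 2 ** address_bits


def count_page_faults(addresses, frames):
    """Simulate demand paging with `frames` physical frames and LRU eviction."""
    resident = OrderedDict()  # page number -> True, least recently used first
    faults = 0
    for address in addresses:
        page = address // PAGE_SIZE
        if page in resident:
            resident.move_to_end(page)  # mark as most recently used
        else:
            faults += 1  # page fault: fetch the page from swap
            if len(resident) >= frames:
                resident.popitem(last=False)  # page out the LRU page
            resident[page] = True
    return faults


# Two sequential scans over four pages of data.
scan_twice = [page * PAGE_SIZE for page in [0, 1, 2, 3] * 2]
```

`addressable_bytes(32)` is 4 GiB and `addressable_bytes(48)` is 256 TiB. With four frames the double scan faults only on the first pass; with three frames every access faults, because each page is evicted just before the scan returns to it.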

Do not re-invent the wheel
- There are already good and mature OSes out there
- An OS is very complex, much more than just VM management
- Acceptance of MonetOS + MonetDB?

Big DBMS do re-invent parts of the OS
- Implementation of their own buffer pools (separate buffer pools allow schema objects, like tables and indexes, to be assigned to the appropriate buffer pool to control the way their data blocks age out of cache)
- Raw disk I/O that bypasses the OS file system
- Built-in thread package, which is fine-tuned for database scheduling
Reasons?
--> The initial design of these DBMS is very old
--> Conventional DBMS rely heavily on I/O

Virtual Memory management
A conventional DBMS relying on the OS VM is not a good idea: the OS has no knowledge of the access pattern of each application.
OS VM --> LRU --> bad performance
Solution: influence the *nix OS VM behavior with mmap & madvise.
But what about Windows? POSIX?
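What such a hint looks like can be sketched with Python's mmap module, which wraps the same system calls. The scratch file and the function name `scan_sum` are assumptions for illustration and this is not MonetDB code; `madvise` and the `MADV_*` constants are only exposed on platforms that provide the system call, hence the guard.

```python
import mmap
import os
import tempfile


def scan_sum(path):
    """Sum all bytes of a file through a read-only memory mapping."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            # Tell the OS that the mapping will be read front to back, so it
            # can read ahead and drop pages behind us. This is only advice;
            # it is skipped where madvise is not available.
            if hasattr(mapped, "madvise") and hasattr(mmap, "MADV_SEQUENTIAL"):
                mapped.madvise(mmap.MADV_SEQUENTIAL)
            total = 0
            for offset in range(0, len(mapped), mmap.PAGESIZE):
                total += sum(mapped[offset:offset + mmap.PAGESIZE])
            return total


# Usage: write a small scratch file, then scan it through the mapping.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(bytes(range(256)) * 64)
total = scan_sum(tmp.name)
os.unlink(tmp.name)
```

The application still reads the data as ordinary memory; the only database-specific knowledge passed to the OS is the expected access pattern, which is the portability trade-off the slide asks about.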

File I/O
Disadvantages of the OS file system:
- Not atomic with respect to their files
- Fixed block size --> but usually only a problem in OLTP
Query intensive applications mainly use bulk I/O, where the main DBMS demand is high throughput --> already well implemented in the OS
Other advantages of relying on the OS:
- More portable
- Smaller source code base

Conclusion (question Ingmar)
A dedicated MonetOS COULD give advantages, but is it worth it? Acceptance of MonetOS?
Last remark: such applications usually run on dedicated servers, so there is no need to worry about other processes.

OLAP in MS SQL Server
- OLAP queries are run on data warehouses
- Data warehouses contain data in a dimensional way (categories of information)
- Fact table = a table that contains the measure of interest, for example sales. Let’s say we are interested in the sales amount by store by day; then the fact table would have three columns (date, store, amount)
- Lookup table = a table with detailed information about the attributes

Schema
- Star schema = all lookup tables join directly to the fact table
- Snowflake schema = some lookup tables do not join directly to the fact table, but through other lookup tables
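The difference between the two schemas shows up in the joins. The sketch below reuses the (date, store, amount) sales fact table and runs on Python's built-in sqlite3 rather than MS SQL Server; the table names, column names and data are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Lookup tables: region is only reachable through store (snowflake).
    CREATE TABLE region (region_id INTEGER PRIMARY KEY, region_name TEXT);
    CREATE TABLE store (
        store_id INTEGER PRIMARY KEY,
        store_name TEXT,
        region_id INTEGER REFERENCES region (region_id)
    );
    -- Fact table: the measure of interest (amount) by date and store.
    CREATE TABLE sales (
        sale_date TEXT,
        store_id INTEGER REFERENCES store (store_id),
        amount REAL
    );
    INSERT INTO region VALUES (1, 'North'), (2, 'South');
    INSERT INTO store VALUES
        (1, 'Utrecht', 1), (2, 'Amsterdam', 1), (3, 'Maastricht', 2);
    INSERT INTO sales VALUES
        ('2005-06-01', 1, 100.0), ('2005-06-01', 2, 50.0),
        ('2005-06-02', 1, 25.0), ('2005-06-02', 3, 75.0);
    """
)


def sales_by_store(conn):
    """Star-style query: the lookup table joins directly to the fact table."""
    return conn.execute(
        "SELECT s.store_name, SUM(f.amount) FROM sales f "
        "JOIN store s ON s.store_id = f.store_id "
        "GROUP BY s.store_name ORDER BY s.store_name"
    ).fetchall()


def sales_by_region(conn):
    """Snowflake-style query: region is reached through the store table."""
    return conn.execute(
        "SELECT r.region_name, SUM(f.amount) FROM sales f "
        "JOIN store s ON s.store_id = f.store_id "
        "JOIN region r ON r.region_id = s.region_id "
        "GROUP BY r.region_name ORDER BY r.region_name"
    ).fetchall()
```

`sales_by_store` needs one join per dimension, while `sales_by_region` needs an extra hop through the store table; that extra hop is the cost of normalizing lookup tables in a snowflake schema.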

OLAP Cubes