23 Larger-than-Memory Databases
Background
- DRAM is expensive.
- About 40% of electric energy is spent on refreshing DRAM
- It would be nice if in-memory DBMS can use larger and cheaper storage and save more power at the same time
- Here comes larger-than-memory database.
- It allows an in-memory DBMS to store/access data on disk without bringing back all the slow parts of a disk oriented DBMS
- DBMS should be aware of its accessing tuples in memory and pages on disk
- nothing much to de done for OLAP queries since it implies sequential scan. zone map may be an improvement
- OLTP queries have hot and cold portions of accessing the database. DBMS needs a mechanism to move cold data to disk
and quickly retrieve it if needed
Implementation Issues
- runtime operation
- how to identify cold tuples
- do it online. DBMS monitors txns access patterns and tracks how often tuples are used. embed metadata directly in tuples.
- do it offline. DBMS maintains a tuples access log for running txns and process it in a background thread to compute the frequencies
- eviction policy
- timing
- threshold. DBMS monitors memory usage and begins moving when it reaches a threshold. DBMS has to manually move data.
- let the OS virtual memory decides when to move data out to disk
- evict tuple metadata(do not let this happen: tuples being swapped out are considered not in database)
- tombstone. leave a new tuple with less size points to the disk and make index points to this tombstone tuple
- bloom filter. filter says if a tuple is on disk. first check index, if tuple is not there check the filter.
filter can be false positive but no false negative. this may introduce more disk ios. but filter can use less memory than tombstones for a given set of swapped out tuples.
if the filter says yes, go and check on disk index(this is no the same one that in main memory) and find the on disk tuple if it exists
- virtual memory. just let the OS take charge of swap in and out
- data retrieval policy
- granularity
- only tuples needed
- all tuples in block
- retrieval mechanism
- abort-and-restart. when accessing a tuple on disk, abort and retrieve & merge data in a background thread, and restart txn when data has been retrieved.
- synchronized. stall, retrieve, continue
- merging back to memory
- always merge. merge into table heap upon retrieval
- merge only on update. not updated tuples are stored in temporary buffer
- selective merge. merge only if a certain block’s access frequency is above a threshold
Real-world example
- H-Store
- online identification
- administrator defines threshold
- tombstone
- abort and restart
- block level retrieval granularity
- always merge
- Hekaton - project siberia, not ported into MS SQL Server
- offline identification
- administrator defines threshold
- bloom filter(no sequential scan)
- synchronized retrieval
- tuple-level granularity
- always merge
- VoltDB(EPFL prototype)
- offline identification
- OS virtual memory (pinned hot memory address space, unpinned cold memory address space)
- synchronized retrieval
- page level granularity
- always merge
- Apache Geode
- online identification
- administrator defines threshold
- tombstone
- synchronized retrieval
- tuple -level granularity
- merge on update
- MemSQL
Evaluation
things happened in the second last week of april
- foundation DB became open source
- MySQL 8.0 released