13 Database Checkpoints
March 1, 2018
14:31
In-Memory Checkpoints
Shared Memory Restarts
Logging allows the DBMS to recover the database
after a crash/restart, but the system would have to
replay the entire log each time.
Checkpoints allow the system to ignore large
segments of the log, reducing recovery time.
IN-MEMORY CHECKPOINTS
There are different approaches for how the DBMS
can create a new checkpoint for an in-memory
database.
The choice of approach in a DBMS is tightly
coupled with its concurrency control scheme.
The checkpoint thread(s) scans each table and
writes out data asynchronously to disk.
IDEAL CHECKPOINT PROPERTIES
Do not slow down regular txn processing.
Do not introduce unacceptable latency spikes.
Do not require excessive memory overhead.
CONSISTENT VS. FUZZY CHECKPOINTS
Approach #1: Consistent Checkpoints
→ Represents a consistent snapshot of the database at some
point in time. No uncommitted changes.
→ No additional processing during recovery.
Approach #2: Fuzzy Checkpoints
→ The snapshot could contain records updated from
transactions that have not finished yet.
→ Must do additional processing to remove those changes.
CHECKPOINT CONTENTS
Approach #1: Complete Checkpoint
→ Write out every tuple in every table regardless of
whether they were modified since the last checkpoint.
Approach #2: Delta Checkpoint
→ Write out only the tuples that were modified since the
last checkpoint.
→ Can merge checkpoints together in the background
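A minimal sketch of the delta approach, assuming a per-tuple dirty flag (the tuple_t layout and write_tuple_to_disk() below are illustrative placeholders, not taken from any specific system):

#include <stdbool.h>
#include <stddef.h>

typedef struct {
    char data[64];   /* tuple payload (illustrative fixed size) */
    bool dirty;      /* set by writers, cleared by the checkpointer */
} tuple_t;

/* Assumed I/O routine that appends one tuple image to the checkpoint file. */
void write_tuple_to_disk(const tuple_t *t);

/* Delta checkpoint: write only the tuples modified since the last checkpoint. */
void delta_checkpoint(tuple_t *table, size_t ntuples) {
    for (size_t i = 0; i < ntuples; i++) {
        if (table[i].dirty) {
            write_tuple_to_disk(&table[i]);
            table[i].dirty = false;   /* reset for the next checkpoint interval */
        }
    }
}

Merging delta checkpoints in the background then amounts to overlaying newer tuple images on top of older ones.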
FREQUENCY
Taking checkpoints too often causes the runtime
performance to degrade.
But waiting too long between checkpoints is just as
bad, because recovery then has to replay more of the log.
Approach #1: Time-based
Approach #2: Log File Size Threshold
Approach #3: On Shutdown (always!)
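A minimal sketch of how the time-based and log-size policies above could be combined into one trigger check (the thresholds are assumptions, not recommendations):

#include <stdbool.h>
#include <stdint.h>
#include <time.h>

#define CKPT_INTERVAL_SEC   300              /* assumed: at least every 5 minutes */
#define CKPT_LOG_THRESHOLD  (256UL << 20)    /* assumed: or after 256 MB of new log */

/* Called periodically by a background thread. */
bool should_checkpoint(time_t last_ckpt, uint64_t log_bytes_since_ckpt) {
    return (time(NULL) - last_ckpt) >= CKPT_INTERVAL_SEC ||
           log_bytes_since_ckpt >= CKPT_LOG_THRESHOLD;
}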
IN-MEMORY CHECKPOINTS
Approach #1: Naïve Snapshots
Approach #2: Copy-on-Update Snapshots
Approach #3: Wait-Free ZigZag (not used)
Approach #4: Wait-Free PingPong (not used)
NAÏVE SNAPSHOT
Create a consistent copy of the entire database in a
new location in memory and then write the
contents to disk.
Two approaches to copying the database:
→ Do it yourself (tuple data only).
→ Let the OS do it for you (everything).
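A minimal sketch of the "do it yourself" option: pause txns, copy the tuple data into a private buffer, and hand the buffer to a background thread that writes it to disk (tuple_t is an illustrative layout):

#include <stdlib.h>
#include <string.h>

typedef struct { char data[64]; } tuple_t;   /* illustrative tuple layout */

/* Copy an entire table while txns are paused; the caller writes the copy to
 * disk asynchronously and then frees it, while txns resume on the original. */
tuple_t *snapshot_table(const tuple_t *table, size_t ntuples) {
    tuple_t *copy = malloc(ntuples * sizeof *copy);
    if (copy != NULL) {
        memcpy(copy, table, ntuples * sizeof *copy);
    }
    return copy;
}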
HYPER FORK SNAPSHOTS
Create a snapshot of the database by forking the
DBMS process.
→ Child process contains a consistent checkpoint if there
are no active txns.
→ Otherwise, use the in-memory undo log to roll back txns
in the child process.
Continue processing txns in the parent process.
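A minimal sketch of the fork()-based approach in the spirit of HyPer (write_snapshot_to_disk() and rollback_active_txns() are assumed placeholders):

#include <sys/types.h>
#include <unistd.h>

/* Assumed placeholders: serialize the database image, and use the in-memory
 * undo log to remove the effects of txns that were still in flight. */
void write_snapshot_to_disk(void);
void rollback_active_txns(void);

void take_fork_snapshot(void) {
    pid_t pid = fork();              /* OS marks the parent's pages copy-on-write */
    if (pid == 0) {
        /* Child: sees a frozen image of memory as of the fork(). */
        rollback_active_txns();      /* only needed if txns were active */
        write_snapshot_to_disk();
        _exit(0);
    }
    /* Parent: keeps processing txns; its writes trigger page copies, so the
     * child's view stays consistent. The child is reaped later (e.g. SIGCHLD). */
}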
COPY-ON-UPDATE SNAPSHOT
During the checkpoint, txns create new copies of
data instead of overwriting it.
→ Copies can be at different granularities (block, tuple)
The checkpoint thread then skips anything that
was created after it started.
→ Old data is pruned after it has been written to disk
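A minimal sketch of copy-on-update at tuple granularity (the layout and the checkpoint_active flag are assumptions; a real system might track copies per block instead):

#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

typedef struct tuple {
    char  data[64];    /* live version, updated in place when no checkpoint runs */
    char *ckpt_copy;   /* pre-checkpoint image, NULL if untouched */
} tuple_t;

extern bool checkpoint_active;   /* set by whatever starts the checkpoint */

void update_tuple(tuple_t *t, const char new_data[64]) {
    if (checkpoint_active && t->ckpt_copy == NULL) {
        /* First update since the checkpoint started: keep the old image so
         * the checkpoint thread can still write it out. */
        t->ckpt_copy = malloc(sizeof t->data);
        if (t->ckpt_copy != NULL) {
            memcpy(t->ckpt_copy, t->data, sizeof t->data);
        }
    }
    memcpy(t->data, new_data, sizeof t->data);   /* apply the update in place */
}

/* The checkpoint thread writes ckpt_copy when present (otherwise data), then
 * frees the copy and resets it to NULL ("prune old data once on disk"). */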
VOLTDB CONSISTENT CHECKPOINTS
A special txn starts a checkpoint and switches the
DBMS into copy-on-write mode.
→ Changes are no longer made in-place to tables.
→ The DBMS tracks whether a tuple has been inserted,
deleted, or modified since the checkpoint started.
A separate thread scans the tables and writes tuples
out to the snapshot on disk.
→ Ignore anything changed after checkpoint.
→ Clean up old versions as it goes along.
CHECKPOINT IMPLEMENTATIONS
Bulk State Copying
→ Pause txn execution to take a snapshot.
Locking / Latching
→ Use latches to isolate the checkpoint thread from the
worker threads if they operate on shared regions.
Bulk Bit-Map Reset:
→ If the DBMS uses a bitmap to track dirty regions, it must
perform a bulk reset at the start of a new checkpoint.
Memory Usage:
→ To avoid synchronous writes, the method may need to
allocate additional memory for data copies.
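A minimal sketch of the bit-map bookkeeping mentioned above (the region count and granularity are assumptions):

#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NUM_REGIONS 4096                      /* assumed number of tracked regions */
static uint8_t dirty_bitmap[NUM_REGIONS / 8]; /* 1 bit per region */

/* Writers mark the region that contains their update. */
static inline void mark_dirty(size_t region) {
    dirty_bitmap[region / 8] |= (uint8_t)(1u << (region % 8));
}

/* Bulk reset at the start of a new checkpoint. */
static inline void reset_dirty_bitmap(void) {
    memset(dirty_bitmap, 0, sizeof dirty_bitmap);
}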
How do we handle a regular restart when the database is large?
OBSERVATION
Not all DBMS restarts are due to crashes.
→ Updating OS libraries
→ Hardware upgrades/fixes
→ Updating DBMS software
Need a way to quickly restart the DBMS without
having to re-read the entire database from disk.
FACEBOOK SCUBA FAST RESTARTS
Decouple the in-memory database lifetime from
the process lifetime.
By storing the database in shared memory, the
DBMS process can restart and the memory contents
will survive.
FACEBOOK SCUBA
Distributed, in-memory DBMS for time-series
event analysis and anomaly detection.
Heterogeneous architecture
→ Leaf Nodes: Execute scans/filters on in-memory data
→ Aggregator Nodes: Combine results from leaf nodes
SHARED MEMORY RESTARTS
Approach #1: Shared Memory Heaps
→ All data is allocated in SM during normal operations.
→ Have to use a custom allocator to subdivide memory
segments for thread safety and scalability.
→ Cannot use lazy allocation of backing pages with SM.
Approach #2: Copy on Shutdown
→ All data is allocated in local memory during normal
operations.
→ On shutdown, copy data from heap to SM.
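A minimal sketch of the shared-memory heap idea using POSIX shm_open/mmap (the segment name and size are assumptions; a real DBMS would layer its own allocator on top of this region):

#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

#define SEG_NAME "/dbms_heap"    /* assumed segment name */
#define SEG_SIZE (1UL << 30)     /* assumed 1 GB segment */

/* Map a named shared-memory segment; because the segment outlives the
 * process, its contents are still there after the DBMS restarts. */
void *attach_shared_segment(void) {
    int fd = shm_open(SEG_NAME, O_CREAT | O_RDWR, 0600);
    if (fd < 0) return NULL;
    if (ftruncate(fd, SEG_SIZE) != 0) { close(fd); return NULL; }
    void *base = mmap(NULL, SEG_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);
    return (base == MAP_FAILED) ? NULL : base;
}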
FACEBOOK SCUBA FAST RESTARTS
When the admin initiates a restart command, the
node halts ingestion of updates.
DBMS starts copying data from heap memory to
shared memory.
→ Delete blocks in heap once they are in SM.
Once snapshot finishes, the DBMS restarts.
→ On start up, check whether there is a valid
database in SM to copy into its heap.
→ Otherwise, the DBMS restarts from disk.
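A minimal sketch of that start-up decision (all helper names here are hypothetical):

#include <stdbool.h>

/* Hypothetical helpers for the two recovery paths. */
bool shared_memory_snapshot_is_valid(void);
void copy_snapshot_from_shared_memory_to_heap(void);
void release_shared_memory_snapshot(void);
void rebuild_database_from_disk(void);

void startup(void) {
    if (shared_memory_snapshot_is_valid()) {
        /* Fast restart: the previous process left a usable image in SM. */
        copy_snapshot_from_shared_memory_to_heap();
        release_shared_memory_snapshot();
    } else {
        /* Slow path: re-read the database from disk. */
        rebuild_database_from_disk();
    }
}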