Notes on : 15-721 (2018) MVCC
why?
As many lectures missing corresponding notes in the schedule page, I’ll place some notes here for those. I guess nobody wants to go through the slides or watch the video again if he feels he forgot something.
what?
These notes mainly originate from slides and are possibly mixed with materials mentioned in video lecture. May combine official notes if provided in the future.
Notes:
MVCC - Part I - Basic - 05
- Compare and swap(nothing new)
- Isolation levels:
-
Def.:Controls the extent that a txn is exposed to the actions of other concurrent txns. That is what a transaction wants itself to think itself to be. Lay no restrictions on other concurrent transactions.
- As typical isolation levels is made based on 2PL system, there’re two other isolation levels out there.
- Cursor stability
The DBMS’s internal cursor maintains a lock on a item in the database until it moves on to the next item. CS is a stronger isolation level in between REPEATABLE READS and READ COMMITTED that can (sometimes) prevent the Lost Update Anomaly. Possibly this runs into lost update anomaly. - SNAPSHOT ISOLATION (SI)
Guarantees that all reads made in a txn see a consistent snapshot of the database that existed at the time the txn started.- A txn will commit under SI only if its writes do not conflict with any concurrent updates made since that snapshot.
- SI is susceptible to the Write Skew Anomaly. Say txn A mod 12 and read 34, txn B mod 34 and read 12. There’s no conflict in writes but this is not a serializable schedule.
-
- MVCC(Teh best choice for mixed workloads, OLAP & OLTP)
-
Def.:The DBMS maintains multiple physical versions of a single logical object in the database:
→ When a txn writes to an object, the DBMS creates a new version of that object.(create on write)
→ When a txn reads an object, it reads the newest version that existed when the txn started.
- Benefit:
- Writes don’t block read
- Read-only txns can read a snapshot without acquiring locks
- Easily support time-travel queries
- CONCURRENCY CONTROL with MVCC
- Approach #1: Timestamp Ordering
→ Assign txns timestamps that determine serial order.
→ Considered to be original MVCC protocol.
→ Example:
• Use a read-ts field in the header to keep track of the timestamp of the last transaction that read it.
• Transaction is allowed to read version if the lock is unset and its Tid is between begin-ts and end-ts.
• For writes, transaction creates a new version if no other transaction holds lock and Tid is greater than read-ts. - Approach #2: Optimistic Concurrency Control
→ Three-phase protocol from last class.
→ Use private workspace for new versions. - Approach #3: Two-Phase Locking
→ Txns acquire appropriate lock on physical version before they can read/write a logical tuple.
- Approach #1: Timestamp Ordering
- VERSION STORAGE
- The DBMS uses the tuples’ pointer field to create a latch-free v rsion chain per logical tuple.
→ This allows the DBMS to find the version that is visible to a particular txn at runtime.
→ Indexes always point to the “head” of e chain. Threads store versions in “local” memory regions to avoid content on centralized data structures. Different storage schemes determine where/what to store for each version.- Approach #1: Append-Only Storage:
New versions are appended to the same table space. - Approach #2: Time-Travel Storage:
Old versions are copied to separate table space. - Approach #3: Delta Storage :
The original values of the modified attributes are copied into a separate delta record space.
- Approach #1: Append-Only Storage:
- The DBMS uses the tuples’ pointer field to create a latch-free v rsion chain per logical tuple.
- GARBAGE COLLECTION
- The DBMS needs to remove reclaimable physical versions from the database over time. A physical version is reclaimable if :
-
No active txn in the DBMS can “see” t version (SI).
-
The version was created by an aborted txn.
-
-
Two additional design decisions:
-
How to look for expired versions?
-
How to decide when it is safe to reclaim memory?
-
- Detection
- Approach #1: Tuple-level
→ Find old versions by examining tuples directly
→ Background Vacuuming vs. Cooperative Cleaning - Approach #2: Transaction-level
→ Txns keep track of their old versions so the DBMS does not have to scan tuples to determine visibility.
- Approach #1: Tuple-level
- The DBMS needs to remove reclaimable physical versions from the database over time. A physical version is reclaimable if :
- INDEX MANAGEMENT(from pdf note)
- Primary key indexes always point to the version chain head. If a transaction updates a primary key attribute(s), then this is treated as a DELETE followed by an INSERT.
- Secondary Indexes
- Approach #1: Logical Pointer
- Use a fixed identifier per tuple that does not change
- Requires an extra indirection layer
- Secondary indexes can store Primary Key or Tuple ID
- Approach #2: Physical Address
- Pointer physical points to tuple
- a use case on uber switch from postgres to mysql due to secondary index implementation.
- Approach #1: Logical Pointer
- Transaction Id Wraparound(from pdf note)
- If the DBMS reaches the max value for its timestamps, it will have to wrap around and start at zero. This will make all previous versions be in the “future” from new transactions.
- Postgres Txn-ID Wraparound
- Stop accepting new commands when the system gets close to the max transaction id.
- Set a flag in each tuple header that says that it is “frozen” in the past. Any new transaction id will always be newer than a frozen version.
- Runs the vacuum before the system gets close to this upper limit.
-
MVCC Part II - MVCC case study(Hekaton, HyPer, CMU Cicada) - 06
Microsoft Hekaton
A new in-memory OLTP engine for Microsoft SQL Server started in 2008.
CONCURRENT CONTROL
Txn has a being ts and an end ts.
Each tuple has a begin & end ts as well.
Tuple’s begin ts can be the end ts of a committed txn that created it or a begin ts of an active txn that creates it.
Tuple’s end ts can be inf which means the txns creates it hasn’t committed, begin ts of an active txn that creates the next version or the end ts of the committed txn that created it.
There’s a bit in txnid that stands for committed or uncommitted txns. This saves look ups into txn states.
Every txn gets a ts on ‘begin’ and receive another on ‘commit’ or ‘abort’.
On reading tuples, tranverse the version chain and compare begin/ end ts with the txn’s begin ts.
On updating, abort if the newest tuple version is not committed by a previous txns. Or mark the end ts as the txn’s begin ts, create a new version and mark the begin ts with the txn’s.
In short, it allows speculative reads, aborts on speculative writes.
Hekaton maintains a transaction state map globally.
Txns has 4 states:
Active
Validating
Commited: txn hasn’t update all of its version ts yet
Terminated: all ts has been updated
Txn meta-data:
Read set : ptrs to version version read
Write set : ptrs to update(old and new), insert(ne w), delete(old)
Scan set: info to perform scan to make sure input doesn’t change. Occ does change on its own txn copy, but mvcc doesn’t. this could be used to insure tuples the txn sees not updated/deleted/created
Commit dependency:
Txns wait for this to finish
Txn validation:
Read stability:
Make sure that each version read is still visible at the end of transaction.
Phantom avoidance:
Repeat each scan to check whether new versions have become visible since txn began.
On serializable isolation level, both are needed.
On repeatable reads, read stability is needed.
On snapshot isolation and read committed, none is needed.
These two checks especially phantom avoidance which need to repeat the scan need a while to finish so Hekaton is not very well fit for olap.
HyPer MVCC:for HTAP
Shortcoming of previous db:
Read/scan set validations are expensive if transactions access a lot of data. Appending new versions hurts
the performance of OLAP scans due to pointer chasing and branching. Record level conflict checks may be
too coarse-grained and incur false positives.
HyPer uses column-store with delta record versioning.
→ In Place updates for non-indexed attributes
→ Delete/Insert updates for indexed attributes.
→ Newest-to-Oldest Version Chains
→ No Predicate Locks / No Scan Checks
Avoids write-write conflicts by aborting txns that try to update an uncommitted object. Designed for HTAP workloads.
Transaction has its own version chain rather than shares a global version chain during execution.
On validation, first writer wins below is quote from pdf notes.
Transaction Validation
• First-Writer Wins: The version vector always points to the last committed version. Do not need to
check whether write-sets overlap.
• Check the undo buffers (i.e., delta records) of transactions that committed after the validation transaction
started.
• Compare the committed transaction’s write set for phantoms using Precision Locks [1].
• Only need to store the transaction’s read predicates and not its entire read set.
Version Synopsis
• Store a separate column that tracks the position of the first and last versioned tuple in a block of tuples.
• When scanning tuples, the DBMS can check for strides of tuples without older versions and execute
more efficiently.
Cmu Cicada
Oltp / optimistic mvcc/n2o append-only designed for both low and high contention. Not very well understood. below is quote from pdf notes.
In-memory OLTP engine based on optimistic MVCC with append-only storage (N2O) [3]. Designed to be
scalable for both low and high contention workloads.
• Best-effort in-lining
• Loosely Synchronized Clocks
• Contention Aware validation
• Index Nodes Stored in Tables
Best-Effort In-lining
• Record meta-data is stored in a fixed location.
• Threads will attempt to inline read-mostly version within this meta-data to reduce version chain traversals.
Fast Validation
• Contention-aware Validation: Validate access to recently modified (high contention) records first.
• Early Consistency Check: Pre-validated access set before making global writes.
• If the DBMS knows that most of the recently executed transactions committed successfully, then it
can skip Contention-Aware Validation and Early Consistency check
• Incremental Version Search: Resume from last search location in version list