11 System Catalogs & Database Compression
February 27, 2018
21:49
System Catalogs
Almost every DBMS stores a database's catalog
inside the database itself.
→ Wrap object abstraction around tuples.
→ Specialized code for "bootstrapping" catalog tables.
The entire DBMS should be aware of transactions
in order to automatically provide ACID guarantees
for DDL commands and concurrent txns.
Schema Changes
ADD COLUMN:
→ NSM: Copy tuples into new region in memory.
→ DSM: Just create the new column segment
DROP COLUMN:
→ NSM #1: Copy tuples into new region of memory.
→ NSM #2: Mark column as "deprecated", clean up later.
→ DSM: Just drop the column and free memory.
CHANGE COLUMN:
→ Check whether the conversion is allowed to happen.
Depends on default values.
CREATE INDEX:
→ Scan the entire table and populate the index.
→ Have to record changes made by txns that modified the
table while another txn was building the index.
→ When the scan completes, lock the table and resolve
changes that were missed after the scan started.
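A minimal sketch of this build-then-catch-up idea, with made-up names (build_index, side_log) and no real locking or concurrency control:

# Toy sketch of a concurrent index build: scan a snapshot, record
# concurrent changes in a side log, then apply them after the scan
# while holding the table lock (locking not shown).
def build_index(table_snapshot, side_log):
    index = {}
    # Phase 1: scan the table as it looked when the scan started.
    for row_id, key in table_snapshot:
        index[key] = row_id
    # Phase 2: replay changes that concurrent txns appended to the
    # side log while the scan was running.
    for op, row_id, key in side_log:
        if op == "insert":
            index[key] = row_id
        elif op == "delete":
            index.pop(key, None)
    return index

# Two rows existed when the scan started; a concurrent txn inserted
# row 3 and deleted row 1 while the index was being built.
snapshot = [(1, "alice"), (2, "bob")]
log = [("insert", 3, "carol"), ("delete", 1, "alice")]
print(build_index(snapshot, log))   # {'bob': 2, 'carol': 3}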
DROP INDEX:
→ Just drop the index logically from the catalog.
→ It only becomes "invisible" when the txn that dropped it
commits. All existing txns will still have to update it.
SEQUENCES
Typically stored in the catalog. Used for
maintaining a global counter
→ Also called "auto-increment" or "serial" keys
Sequences are not maintained with the same
isolation protection as regular catalog entries.
→ Rolling back a txn that incremented a sequence does not
rollback the change to that sequence.
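A toy illustration of that last point, assuming a minimal in-memory model (Sequence, Transaction, and the undo-log shape are all hypothetical):

class Sequence:
    """Global counter stored in the catalog but maintained outside txn undo."""
    def __init__(self):
        self.value = 0
    def nextval(self):
        self.value += 1          # note: not recorded in any undo log
        return self.value

class Transaction:
    """Only changes made through the txn are undone on rollback."""
    def __init__(self):
        self.undo = []
    def put(self, table, key, new_value):
        self.undo.append((table, key, table.get(key)))
        table[key] = new_value
    def rollback(self):
        for table, key, old in reversed(self.undo):
            table[key] = old

catalog = {"num_tables": 5}
seq = Sequence()
txn = Transaction()
txn.put(catalog, "num_tables", 6)
seq.nextval()                             # sequence bumped inside the txn
txn.rollback()
print(catalog["num_tables"], seq.value)   # 5 1 -> catalog undone, sequence kept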
Compression Background
I/O is the main bottleneck if the DBMS has to
fetch data from disk.
For in-memory DBMSs the trade-off is more complicated,
since there is no slow disk I/O to offset the compression cost.
→ Compressing the database reduces DRAM requirements
and processing.
Key trade-off is speed vs. compression ratio
→ In-memory DBMSs (always?) choose speed.
REAL-WORLD DATA CHARACTERISTICS
Data sets tend to have highly skewed
distributions for attribute values.
→ Example: Zipfian distribution of the Brown Corpus
Data sets tend to have high correlation between
attributes of the same tuple.
→ Example: Zip Code to City, Order Date to Ship Date
DATABASE COMPRESSION
Goal #1: Must produce fixed-length values.
Goal #2: Allow the DBMS to postpone
decompression as long as possible during query
execution.
Goal #3: Must be a lossless scheme
LOSSLESS VS. LOSSY COMPRESSION
When a DBMS uses compression, it is always
lossless because people don’t like losing data.
Any kind of lossy compression has to be
performed at the application level.
Some new DBMSs support approximate queries
→ Example: BlinkDB, SnappyData, XDB, Oracle (2017)
ZONE MAPS (not compression, but can speed up some queries)
Pre-computed aggregates for blocks of data.
DBMS can check the zone map first to decide
whether it wants to access the block.
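A minimal sketch of zone maps, assuming simple min/max aggregates per block (block layout and names are illustrative only):

# Toy zone maps: precompute min/max per block, then skip blocks whose
# range cannot contain matches for a predicate like "val > 600".
blocks = [[100, 200, 300], [400, 500, 600], [700, 800, 900]]
zone_maps = [(min(b), max(b)) for b in blocks]

def scan_gt(blocks, zone_maps, threshold):
    results = []
    for block, (lo, hi) in zip(blocks, zone_maps):
        if hi <= threshold:          # zone map says no value can qualify
            continue                 # skip the block without reading it
        results.extend(v for v in block if v > threshold)
    return results

print(scan_gt(blocks, zone_maps, 600))   # [700, 800, 900]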
COMPRESSION GRANULARITY
Choice #1: Block-level
→ Compress a block of tuples for the same table.
Choice #2: Tuple-level
→ Compress the contents of the entire tuple (NSM-only).
Choice #3: Attribute-level
→ Compress a single attribute value within one tuple.
→ Can target multiple attributes for the same tuple.
Choice #4: Column-level
→ Compress multiple values for one or more attributes
stored for multiple tuples (DSM-only).
Naïve Compression
Compress data using a general purpose algorithm.
Scope of compression is only based on the data
provided as input.
→ LZO (1996), LZ4 (2011), Snappy (2011), Zstd (2015)
Considerations
→ Computational overhead
→ Compress vs. decompress speed.
Choice #1: Entropy Encoding
→ More common sequences use less bits to encode, less
common sequences use more bits to encode.
Choice #2: Dictionary Encoding
→ Build a data structure that maps data segments to an
identifier. Replace those segments in the original data
with a reference to the segments position in the
dictionary data structure.
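A minimal sketch of naïve block-level compression using Python's standard-library zlib (DEFLATE combines LZ77-style dictionary matching with Huffman entropy coding); the tuple layout is made up for illustration:

import zlib

# Naive block-level compression: the algorithm only sees a byte stream,
# not the tuple/attribute structure of the data.
tuples = b"\n".join(f"{i},Andy,Pittsburgh".encode() for i in range(1000))
compressed = zlib.compress(tuples)
print(len(tuples), "->", len(compressed))   # repetitive data shrinks a lot

# The block must be fully decompressed before any tuple can be read.
restored = zlib.decompress(compressed)
assert restored == tuples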
MySQL InnoDB Compression
Pages on disk are compressed to a fixed size (1, 2, 4, or 8 KB) and store a mod log alongside the compressed data.
On update, the DBMS loads the page into the buffer pool and appends the modification to the mod log instead of decompressing, updating, and recompressing the page. If the page needs to be read, the DBMS decompresses it into a 16 KB in-memory page and keeps both the compressed and uncompressed copies in the buffer pool.
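A toy sketch of the compressed-page-plus-mod-log idea, not InnoDB's actual implementation (CompressedPage and its methods are hypothetical):

import zlib

class CompressedPage:
    # Hypothetical illustration of a compressed page with a mod log.
    def __init__(self, rows):
        self.data = zlib.compress("\n".join(rows).encode())
        self.mod_log = []                 # recent changes, kept uncompressed

    def update(self, new_row):
        # Cheap path: append to the mod log instead of
        # decompress -> update -> recompress.
        self.mod_log.append(new_row)

    def read(self):
        # Reads need the full image: decompress and apply the mod log.
        rows = zlib.decompress(self.data).decode().split("\n")
        return rows + self.mod_log

page = CompressedPage(["1,Andy", "2,Matt"])
page.update("3,Dana")
print(page.read())    # ['1,Andy', '2,Matt', '3,Dana']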
The data has to be decompressed first before it can
be read and (potentially) modified.
→ This limits the “scope” of the compression scheme.
These schemes also do not consider the high-level
meaning or semantics of the data.
We can perform exact-match comparisons and
natural joins on compressed data if predicates and
data are compressed the same way.
→ Range predicates are more tricky…
OLAP Columnar Compression
Run-length Encoding
Compress runs of the same value in a single
column into triplets:
→ The value of the attribute.
→ The start position in the column segment.
→ The # of elements in the run.
Requires the columns to be sorted intelligently to
maximize compression opportunities.
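A minimal sketch of RLE producing (value, start offset, run length) triplets; function names are illustrative:

def rle_encode(column):
    """Compress runs of identical values into (value, start, length) triplets."""
    runs, i = [], 0
    while i < len(column):
        j = i
        while j < len(column) and column[j] == column[i]:
            j += 1
        runs.append((column[i], i, j - i))
        i = j
    return runs

def rle_decode(runs):
    out = []
    for value, _, length in runs:
        out.extend([value] * length)
    return out

col = ["M", "M", "M", "F", "F", "M"]    # sorting first would merge the runs
print(rle_encode(col))                  # [('M', 0, 3), ('F', 3, 2), ('M', 5, 1)]
assert rle_decode(rle_encode(col)) == col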
Bitmap Encoding
Store a separate Bitmap for each unique value for a
particular attribute where an offset in the vector
corresponds to a tuple.
→ The i-th position in the Bitmap corresponds to the i-th tuple
in the table.
→ Typically segmented into chunks to avoid allocating large
blocks of contiguous memory.
Only practical if the value cardinality is low.
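A minimal sketch of bitmap encoding, using one Python integer as the bit vector for each distinct value (names are illustrative):

def bitmap_encode(column):
    """One bitmap per distinct value; bit i is set if tuple i has that value."""
    bitmaps = {}
    for i, value in enumerate(column):
        bitmaps[value] = bitmaps.get(value, 0) | (1 << i)
    return bitmaps

col = ["M", "F", "M", "M", "F"]
bitmaps = bitmap_encode(col)
print(bin(bitmaps["M"]))    # 0b1101 -> tuples 0, 2, 3
# Predicates become bitwise ops, e.g. "sex = 'M' AND ..." is an AND of bitmaps.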
Delta Encoding
Record the difference between consecutive values
in the same column.
→ The base value can be stored in-line or in a separate lookup
table.
→ Can be combined with RLE to get even better
compression ratios.
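A minimal sketch of delta encoding with the base value stored in-line; the deltas could then be fed into RLE:

def delta_encode(column):
    """Store the first value, then differences between consecutive values."""
    base = column[0]
    deltas = [column[i] - column[i - 1] for i in range(1, len(column))]
    return base, deltas

def delta_decode(base, deltas):
    out = [base]
    for d in deltas:
        out.append(out[-1] + d)
    return out

temps = [99, 100, 101, 101, 102]   # e.g. a slowly-changing sensor column
base, deltas = delta_encode(temps)
print(base, deltas)                # 99 [1, 1, 0, 1] -> small, repetitive deltas
assert delta_decode(base, deltas) == temps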
Incremental Encoding
Type of delta encoding whereby common prefixes
or suffixes and their lengths are recorded so that
they need not be duplicated.
This works best with sorted data.
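A minimal sketch of incremental (front) encoding over sorted strings, storing the shared-prefix length plus the remaining suffix:

def incremental_encode(sorted_values):
    """Front coding: (shared-prefix length, remaining suffix) per value."""
    encoded, prev = [], ""
    for v in sorted_values:
        shared = 0
        while shared < min(len(prev), len(v)) and prev[shared] == v[shared]:
            shared += 1
        encoded.append((shared, v[shared:]))
        prev = v
    return encoded

words = ["rob", "robbed", "robbing", "robot"]   # sorted input works best
print(incremental_encode(words))
# [(0, 'rob'), (3, 'bed'), (4, 'ing'), (3, 'ot')]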
Mostly Encoding
When the values for an attribute are “mostly” less
than the largest size, you can store them as a
smaller data type.
→ The remaining values that cannot be compressed are
stored in their raw form.
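A minimal sketch of mostly encoding: values that fit in the smaller type go into a packed array, outliers go into a separate exception list (the sentinel/exception layout here is just one possible choice):

def mostly_encode(column, limit=127):
    """Values <= limit fit the small type; outliers go to an exception map."""
    small, exceptions = [], {}
    for i, v in enumerate(column):
        if v <= limit:
            small.append(v)        # would be stored as e.g. int8
        else:
            small.append(None)     # placeholder / sentinel
            exceptions[i] = v      # stored at full width
    return small, exceptions

col = [2, 4, 99999, 7]
print(mostly_encode(col))          # ([2, 4, None, 7], {2: 99999})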
Dictionary Encoding
Replace frequent patterns with smaller codes.
Most pervasive compression scheme in DBMSs.
Need to support fast encoding and decoding.
Need to also support range queries.
Design decision
When to construct the dictionary?
Choice #1: All At Once
→ Compute the dictionary for all the tuples at a given point
of time.
→ New tuples must use a separate dictionary, or all
tuples must be recomputed.
Choice #2: Incremental
→ Merge new tuples in with an existing dictionary.
→ Likely requires re-encoding of existing tuples.
What is the scope of the dictionary?
Choice #1: Block-level
→ Only include a subset of tuples within a single table.
→ Potentially lower compression ratio, but can add new
tuples more easily.
Choice #2: Table-level
→ Construct a dictionary for the entire table.
→ Better compression ratio, but expensive to update.
Choice #3: Multi-Table
→ Can be either subset or entire tables.
→ Sometimes helps with joins and set operations.
How do we allow for range queries?
The encoded values need to support sorting in the
same order as original values.
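A minimal sketch of an order-preserving dictionary: codes are assigned in sorted order of the distinct values, so range and prefix predicates can be evaluated directly on the codes (all names are illustrative):

def build_dictionary(values):
    """Assign codes in sorted order so code order matches value order."""
    decode = sorted(set(values))                    # code -> original value
    encode = {v: c for c, v in enumerate(decode)}   # original value -> code
    return encode, decode

names = ["andrea", "andy", "matt", "andrew"]
encode, decode = build_dictionary(names)
codes = [encode[n] for n in names]
print(codes)                                        # [0, 2, 3, 1]

# Prefix predicate "name LIKE 'and%'" becomes a code-range check: find the
# contiguous range of dictionary entries with that prefix, then filter codes.
prefix_codes = [c for c, v in enumerate(decode) if v.startswith("and")]
lo, hi = prefix_codes[0], prefix_codes[-1]
print([decode[c] for c in codes if lo <= c <= hi])  # ['andrea', 'andy', 'andrew']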
How do we enable fast encoding/decoding?
IMPLEMENTATIONS
Hash Table:
→ Fast and compact.
→ Unable to support range and prefix queries.
B+Tree:
→ Slower than a hash table and takes more memory.
→ Can support range and prefix queries.
SHARED-LEAVES TREES:
→ The encode and decode trees share the same leaf nodes, which store the original value <-> code mappings.