18 Query Execution & Scheduling
Process model
Query execution
- A query plan is compirsed of operators.
- An operator instance is an invocation of an operator on some segment of data.
- A task is the execution of a sequnce of one or more operator instances.
Process model
- A DBMS’s process model defines how the system is architected to support concurrent reequset from a multi-user applicatoin
- A worker is the DBMS component that is responsible for execution task on the behalf of client and returning the result
- Approach #1 Process per worker (prior to existence of pthread)
- replies on OS scheduler
- use shared-memory for global data structures
- A process crash doesn’t take down the entire system.
- example : IBM DB2, Postgres, Oracle
- Approache #2 Process Pool
- A worker uses any process that is free in a pool
- bad for cache locality
- Used by IBM DB2, Postgres
- Approach #4 Thread per worker
- single process with mltiple worker threads
- DBMS has to manage its own scheduling
- May or may not use a dispatcher thread
- thread crash may kill the entire system
- IBM DB2, MSSQL, MySQL, Oracle
- no shared memory / less contex swithc overhead
- most recent DBMS use this unless ones built on postgres
query parallelism
- scheduling
- DBMS decides where, when, how to execute it. DBMS can do this better than OS.
- how many task should it use?
- how many CPU cores should it use?
- What CPU core should the tasks execute on?
- Where should a task store its output?
- inter query parallelism
- run multiple queries concurrently
- won’t save you from 2PL/MVCC/OCC
- intra query parallelism
- improve performance of single query
- Approach #1 intra-operator(horizonal)
- different instance on different data
- Approach #2 inter operator(vertical)
- woker allocation
- Approach #1: one worker per core
- the thread is pinned to that core by OS
- sched_setaffinity
- Approach #2: Multiple worker per core (SAP HANA)
- a pool of workers per core
- improve cpu utilization
- task assignment
- Approach #1: push
- centralized dispather assigns tasks to wokers and monitors their process
- woker notifies dispatcher upon it finish and get a new task
- Approach #2: pull
- no dispatcher, work pulls task, runs it and the next task
- scheduler has to be awared of underlying hardware
data placement
- schedule operators to work on closest CPU cores to data
- memory allocation
- question is where does the OS place the physical memory
- approach #1 interleaving : distribute uniformly across CPUs
- approach #2 first-touch: that the CPU execute the access memory thread
convert logical plan to physical plan
- one OLTP query to one task cause OLTP doesn’t have much parallelism
- static scheduling
- number of threads don’t change
- morsel driven scheduling
- morsel is horizontal data partitin(like a chunk of memory)
- scheduls dynamically
- pull worker
- on worker per core
- hyper
- no dispatcher thread
- threads cooperate
- one lock free task queue
- worker tries to pull task local to its data
- multipe tasks per query
- numa-aware scheduler (SAP HANA)
- pull base scheduling
- each CPU has a working group
- one pool has multiple workers
- group has priority task queue
- seperate watchdog thread reassign task if saturated thread exists
- workers are allowed to steal tasks from other group’s soft task queue
- detect query CPU bound / memory bound and adjust thread pinning
- when socket number grows large (really large), work stealing may not be benifical
- thread groups allow cores to do not just query but other things like networking etc
DBMS is ‘strong-willed independent’
- database machines once beaten by quickly increasing commdity hardware performance
- trend now is speeding up DBMS component by specialized hardware like GPU
- specialized OS may not be worth the software engineer effort
- OS helps but sometimes may not be that helpful