18 Query Execution & Scheduling

Process model

Query execution
  • A query plan is compirsed of operators.
  • An operator instance is an invocation of an operator on some segment of data.
    • Parallelism
  • A task is the execution of a sequnce of one or more operator instances.
Process model
  • A DBMS’s process model defines how the system is architected to support concurrent reequset from a multi-user applicatoin
  • A worker is the DBMS component that is responsible for execution task on the behalf of client and returning the result
  • Approach #1 Process per worker (prior to existence of pthread)
    • replies on OS scheduler
    • use shared-memory for global data structures
    • A process crash doesn’t take down the entire system.
    • example : IBM DB2, Postgres, Oracle
  • Approache #2 Process Pool
    • A worker uses any process that is free in a pool
    • bad for cache locality
    • Used by IBM DB2, Postgres
  • Approach #4 Thread per worker
    • single process with mltiple worker threads
    • DBMS has to manage its own scheduling
    • May or may not use a dispatcher thread
    • thread crash may kill the entire system
    • IBM DB2, MSSQL, MySQL, Oracle
    • no shared memory / less contex swithc overhead
    • most recent DBMS use this unless ones built on postgres
      query parallelism
  • scheduling
    • DBMS decides where, when, how to execute it. DBMS can do this better than OS.
      • how many task should it use?
      • how many CPU cores should it use?
      • What CPU core should the tasks execute on?
      • Where should a task store its output?
    • inter query parallelism
      • run multiple queries concurrently
      • won’t save you from 2PL/MVCC/OCC
    • intra query parallelism
      • improve performance of single query
      • Approach #1 intra-operator(horizonal)
        • different instance on different data
      • Approach #2 inter operator(vertical)
        • executation overlap
  • woker allocation
    • Approach #1: one worker per core
      • the thread is pinned to that core by OS
      • sched_setaffinity
    • Approach #2: Multiple worker per core (SAP HANA)
      • a pool of workers per core
      • improve cpu utilization
  • task assignment
    • Approach #1: push
      • centralized dispather assigns tasks to wokers and monitors their process
      • woker notifies dispatcher upon it finish and get a new task
    • Approach #2: pull
      • no dispatcher, work pulls task, runs it and the next task
  • scheduler has to be awared of underlying hardware
data placement
  • schedule operators to work on closest CPU cores to data
  • memory allocation
    • question is where does the OS place the physical memory
    • approach #1 interleaving : distribute uniformly across CPUs
    • approach #2 first-touch: that the CPU execute the access memory thread
      convert logical plan to physical plan
  • one OLTP query to one task cause OLTP doesn’t have much parallelism
  • static scheduling
    • number of threads don’t change
    • morsel driven scheduling
      • morsel is horizontal data partitin(like a chunk of memory)
      • scheduls dynamically
      • pull worker
      • on worker per core
    • hyper
      • no dispatcher thread
      • threads cooperate
      • one lock free task queue
      • worker tries to pull task local to its data
      • multipe tasks per query
  • numa-aware scheduler (SAP HANA)
    • pull base scheduling
    • each CPU has a working group
    • one pool has multiple workers
    • group has priority task queue
    • seperate watchdog thread reassign task if saturated thread exists
    • workers are allowed to steal tasks from other group’s soft task queue
    • detect query CPU bound / memory bound and adjust thread pinning
    • when socket number grows large (really large), work stealing may not be benifical
    • thread groups allow cores to do not just query but other things like networking etc
      DBMS is ‘strong-willed independent’
  • database machines once beaten by quickly increasing commdity hardware performance
  • trend now is speeding up DBMS component by specialized hardware like GPU
  • specialized OS may not be worth the software engineer effort
  • OS helps but sometimes may not be that helpful