21 Vectorized Query Execution Part I
21 Vectorized Query Execution Part I
vectorization
the process of converting an algorithm’s scalar implementation that processes a single pair of operands at a time , to a vector implementation that process one operation on multiple pairs of operands at once. this is about parallelism of a single core.
some concepts
- multi-core cpu with superscalar and out-of-order execution and high power consumption, i.e. Xeon
- many integrate cores (mic) use a large number of low-power cores with low power consumption i.e. Xeon Phi
- SIMD : a class of instructions that allow cpu to perform the same operation on multiple data points simultaneously
- intel:MMX, SSE, SSE2, SSE3, SSE4, AVX, AVX2, AVX512
- powerpc:altivec
- arm:neon
streaming simd extension(SSE)
- instruction category
- data movement(move in/out)
- arithmetic operation
- logical instruction
- comparision instruction
- shuffle instruction(move between simd regs)
- miscellaneous
- transform data between x86 and simd regs
- cache control:move data from simd reg to memory, bypassing cpu cache
what about GPU ?
- moving data back and forth between DRAM and GPU over PCI-E is slow
- major cermercial database systems don’t support this
- co-processer that cache-coherent with cpu may change current trend
how to achieve vectorization?
- automatic vectorization
- compiler identifies possible rewrite of loops into vectorized operations
- this only works for simple loops
- compiler hints
- provide compiler with additional information to let it know that the code is safe to be vectorized.
- approach#1 explicit info about memory locations
- use ‘restrict’ key work in C++ to tell the compiler that the arrays(say char* restrict a1, char* restrict a2) have distinct locations in memory
- approach#2 tell compiler to ignore vector dependencies
- use ‘#pragma ivdep’ before the for loop
- these two approaches are generally supported by major compilers
- explicit vectorization
- use cpu instructions explicitly
vectorization direction
- use cpu instructions explicitly
- approach#1:horizontal
- perform operation on elements of a single vector
-
‘1 2 3 4’ =>add=>10
- approach#2:vertical
- perform operation on elements of each vector
-
‘1 2 3 4’ add ‘5 6 7 8’ => ‘6 8 10 12’
- selective load/store/gather/scatter can be building blocks for algorithms [jekyll-docs]: http://jekyllrb.com/docs/home [jekyll-gh]: https://github.com/jekyll/jekyll [jekyll-talk]: https://talk.jekyllrb.com/