Notes on : 15-721 (2018) 14 Networking
14 Networking
database access methods
- command line
- direct dbms access
- open database connection(ODBC)
- design to be independent of OS and DBMS
- it’s based on device driver model.
- converts commands to db specific calls.
- java database connection(JDBC)
- can be considered as a java version of ODBC
- methods:
- method 1: JDBC-ODBC bridge : convert JDBC method calls to ODBC calls
- method 2: Native-API Driver : convert JDBC calls to native DB API calls (like calling SQLlite or using IPC with DB on the same machine)
- method 3: Network-Protocol Driver : drivers connect to a middleware which do the conversion
- method 4: Database-Protocol Driver : ‘Pure Java implementation that converts JDBC calls directly into a vendor-specific DBMS protocol.’
network protocol
All major DBMSs implement their own proprietary wire protocol over TCP/IP.
A typical client/server interaction:
- Client connects to DBMS and begins authentication process. There may be an SSL handshake.
- Client then sends a query.
- DBMS executes the query, then serializes the results and sends it back to the client.
By using an existing protocol, one db can save the effort of writing its own connection driver.
protocol design issue
- Row vs Col layout
ODBC/JDBC are row-oriented APIs.One potential solution is to send data in vectors for col store. - compression
- naive compression(gzip etc)
- data serialization
- Approach #1: Binary Encoding
- Have to handle endian conversion on client.
- The closer the serialized format is to the DBMS’s binary format, then the lower the overhead to serialize.
- DBMS can implement its own format or rely on existing libraries (ProtoBuf, Thrift).
- Approach #2: Text Encoding
- Convert all binary values into strings.
- Do not have to worry about endianness.
- Approach #1: Binary Encoding
- string handling
- null termination
- length prefix
- fixed width Pad every string to be the max size of that attribute.
kernel bypass methods
transferring between NIC memory and database process memory without OS TCP/IP stack and other data copying.
- data plane development kit : program access NIC memory without OS or TCP/IP stack (a set of libraries originate from intel, valid on linux, seems no applicable on virtual machines)
- remote direct memory access : NIC access process memory while process has no knowledge of the access. client must provide target address so the server’s NIC knows where to look up.