Getting Started With RocksDB — Part 1 (Conceptual Understanding)
Through this series of articles, I will cover:
1. Conceptual understanding of RocksDB
2. Project Setup and basic implementation of RocksDB
3. Basic Operations using RocksDB
4. Performance Optimization in RocksDB
Conceptual Understanding of RocksDB
RocksDB is a library written in C++ that also offers API bindings in various other languages.
RocksDB is a key-value datastore in which both keys and values are arbitrarily sized byte streams.
What do we mean by a key-value store?
A key-value datastore is a non-relational database that stores a collection of values, each addressed by a unique identifier called a key. Unlike a relational database, it has no pre-defined schema, so we can store an arbitrarily sized byte stream as the value for each key.
Architecture and Workflow of RocksDB
I will briefly explain the write path and the components involved. For a detailed understanding, reading the official documentation is recommended.
- The writes are first written to a memtable.
Memtable: An in-memory data structure that holds data before it is flushed to SST files (we'll come to these shortly). It always holds the newest data and serves both reads and writes: any record being read is searched in the memtable before the SST files. Once a memtable is full, it becomes immutable and is replaced by a new one, while a background thread flushes the old memtable into an SST file, after which the old memtable is destroyed.
There are various options for configuring a memtable: its size, the number of memtables that can exist in memory before flushing, and so on. We can also choose the internal implementation of the memtable to suit our needs (see the configuration sketch after this list).
- When a memtable gets full, its data is flushed into an SST file.
SST File: SST (Sorted String Table) is a file format in which key-value pairs are stored sorted by key. SST files are immutable and hold non-overlapping sets of records. As SST files accumulate, various compaction techniques merge them and reduce their size; during compaction, the older value of a key may be replaced by a newer one (in the case of updates).
- While writing data to the memtable, we can optionally write it to a Write-Ahead Log (WAL).
WAL: While the memtable lives in memory, the WAL (or log file) is written to disk and can be used to recover the data that was in the memtable, which is necessary to restore the database to its original state after a failure.
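Below is a minimal C++ sketch of how these pieces can be configured when opening a database. The path and the option values are only illustrative; RocksDB's defaults are usually a reasonable starting point.

```cpp
#include <cassert>

#include "rocksdb/db.h"
#include "rocksdb/options.h"

int main() {
  rocksdb::Options options;
  options.create_if_missing = true;

  // Memtable sizing: a memtable is flushed to an SST file once it reaches
  // write_buffer_size bytes; up to max_write_buffer_number memtables can
  // exist in memory before writes are stalled.
  options.write_buffer_size = 64 * 1024 * 1024;  // 64 MB per memtable (illustrative)
  options.max_write_buffer_number = 3;
  // The memtable implementation (a skip list by default) can be swapped
  // via options.memtable_factory if a different structure suits the workload.

  rocksdb::DB* db = nullptr;
  rocksdb::Status status = rocksdb::DB::Open(options, "/tmp/rocksdb_demo", &db);
  assert(status.ok());

  // Writes go to the WAL by default; the WAL can be skipped per write when
  // durability across crashes is not required for that write.
  rocksdb::WriteOptions write_options;
  write_options.disableWAL = false;  // keep the write-ahead log (the default)

  status = db->Put(write_options, "key1", "value1");
  assert(status.ok());

  delete db;
  return 0;
}
```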
Features of RocksDB
- Column Families: RocksDB allows a database to be partitioned into column families. Each key-value pair belongs to exactly one column family, and each family keeps its own memtables and SST files while sharing the write-ahead log, so logically related data can be stored and configured separately (sketched below).
- Updates: The Put API saves a single record into the DB; if the key already exists, the older value is overwritten. The Write API applies a batch of key-value pairs atomically: either all of them are inserted into the DB, or none of them is. The DeleteRange API can be used to delete a range of keys from the DB (sketched below).
- Gets, Iterators & Snapshots: The Get API fetches a single key-value pair, and the MultiGet API fetches a set of keys in one call. Since data in the DB is stored in sorted order, an iterator can be used for scans. The Snapshot API provides a point-in-time view of the DB (sketched below).
- Prefix Scans: Many databases have to examine a large portion of the data for range scans. RocksDB can restrict an iterator to keys sharing a common prefix, which reduces the amount of data to be scanned. It also supports Bloom filters to determine which data files need to be scanned at all (sketched below).
- Persistence: We have already discussed the memtable and the WAL for persistence. An important thing to note is that RocksDB uses group commits, batching multiple writes into a single fsync call.
- Checksumming: Checksums are maintained for the data in each SST file to detect corruption.
- Compactions: Compaction is an important background process that RocksDB uses to merge SST files and drop overwritten, deleted, and expired records, which keeps the store performant (a configuration sketch follows below).
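The following sketch shows column families with the C++ API. The family name "users" and the keys are made up for illustration.

```cpp
#include <cassert>
#include <string>

#include "rocksdb/db.h"

int main() {
  rocksdb::Options options;
  options.create_if_missing = true;

  rocksdb::DB* db = nullptr;
  rocksdb::Status s = rocksdb::DB::Open(options, "/tmp/rocksdb_cf_demo", &db);
  assert(s.ok());

  // Create a column family named "users"; its keys live in their own
  // memtables and SST files, separate from the default column family.
  rocksdb::ColumnFamilyHandle* users_cf = nullptr;
  s = db->CreateColumnFamily(rocksdb::ColumnFamilyOptions(), "users", &users_cf);
  assert(s.ok());

  // Reads and writes take the column-family handle explicitly.
  s = db->Put(rocksdb::WriteOptions(), users_cf, "user:1", "alice");
  assert(s.ok());

  std::string value;
  s = db->Get(rocksdb::ReadOptions(), users_cf, "user:1", &value);
  assert(s.ok() && value == "alice");

  db->DestroyColumnFamilyHandle(users_cf);
  delete db;
  return 0;
}
```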
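Next, a minimal sketch of the update APIs (Put, Write with a WriteBatch, and DeleteRange); the keys, values, and database path are placeholders.

```cpp
#include <cassert>

#include "rocksdb/db.h"
#include "rocksdb/write_batch.h"

int main() {
  rocksdb::Options options;
  options.create_if_missing = true;
  rocksdb::DB* db = nullptr;
  rocksdb::Status s = rocksdb::DB::Open(options, "/tmp/rocksdb_write_demo", &db);
  assert(s.ok());

  // Put stores a single key-value pair; writing an existing key
  // overwrites the older value.
  s = db->Put(rocksdb::WriteOptions(), "k1", "v1");
  assert(s.ok());

  // Write applies a WriteBatch atomically: either every operation in the
  // batch is persisted, or none of them is.
  rocksdb::WriteBatch batch;
  batch.Put("k2", "v2");
  batch.Put("k3", "v3");
  batch.Delete("k1");
  s = db->Write(rocksdb::WriteOptions(), &batch);
  assert(s.ok());

  // DeleteRange removes every key in [begin, end) of a column family.
  s = db->DeleteRange(rocksdb::WriteOptions(), db->DefaultColumnFamily(), "k2", "k4");
  assert(s.ok());

  delete db;
  return 0;
}
```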
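A sketch of the read-side APIs follows: Get, MultiGet, an iterator, and a snapshot. Again, the keys and path are illustrative.

```cpp
#include <cassert>
#include <string>
#include <vector>

#include "rocksdb/db.h"

int main() {
  rocksdb::Options options;
  options.create_if_missing = true;
  rocksdb::DB* db = nullptr;
  rocksdb::Status s = rocksdb::DB::Open(options, "/tmp/rocksdb_read_demo", &db);
  assert(s.ok());

  db->Put(rocksdb::WriteOptions(), "a", "1");
  db->Put(rocksdb::WriteOptions(), "b", "2");
  db->Put(rocksdb::WriteOptions(), "c", "3");

  // Get: look up a single key.
  std::string value;
  s = db->Get(rocksdb::ReadOptions(), "a", &value);
  assert(s.ok() && value == "1");

  // MultiGet: look up several keys in one call.
  std::vector<rocksdb::Slice> keys = {"a", "b"};
  std::vector<std::string> values;
  std::vector<rocksdb::Status> statuses =
      db->MultiGet(rocksdb::ReadOptions(), keys, &values);
  assert(statuses[0].ok() && statuses[1].ok());

  // Snapshot: a consistent point-in-time view of the database.
  const rocksdb::Snapshot* snapshot = db->GetSnapshot();
  rocksdb::ReadOptions snapshot_read;
  snapshot_read.snapshot = snapshot;

  // Iterator: scans keys in sorted order (here, as of the snapshot).
  rocksdb::Iterator* it = db->NewIterator(snapshot_read);
  for (it->SeekToFirst(); it->Valid(); it->Next()) {
    // it->key() and it->value() expose the current pair.
  }
  assert(it->status().ok());

  delete it;
  db->ReleaseSnapshot(snapshot);
  delete db;
  return 0;
}
```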
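Here is a sketch of a prefix scan with a Bloom filter configured. The fixed 5-byte prefix and the "user:" key scheme are assumptions made for the example.

```cpp
#include <cassert>

#include "rocksdb/db.h"
#include "rocksdb/filter_policy.h"
#include "rocksdb/slice_transform.h"
#include "rocksdb/table.h"

int main() {
  rocksdb::Options options;
  options.create_if_missing = true;

  // Treat the first 5 bytes of every key as its prefix (e.g. "user:").
  options.prefix_extractor.reset(rocksdb::NewFixedPrefixTransform(5));

  // Attach a Bloom filter so lookups can skip SST files that cannot
  // contain the requested key or prefix.
  rocksdb::BlockBasedTableOptions table_options;
  table_options.filter_policy.reset(rocksdb::NewBloomFilterPolicy(10));
  options.table_factory.reset(rocksdb::NewBlockBasedTableFactory(table_options));

  rocksdb::DB* db = nullptr;
  rocksdb::Status s = rocksdb::DB::Open(options, "/tmp/rocksdb_prefix_demo", &db);
  assert(s.ok());

  db->Put(rocksdb::WriteOptions(), "user:1", "alice");
  db->Put(rocksdb::WriteOptions(), "user:2", "bob");
  db->Put(rocksdb::WriteOptions(), "item:1", "book");

  // Prefix scan: iterate only over keys that share the seek key's prefix.
  rocksdb::ReadOptions read_options;
  read_options.prefix_same_as_start = true;
  rocksdb::Iterator* it = db->NewIterator(read_options);
  for (it->Seek("user:"); it->Valid(); it->Next()) {
    // Visits "user:1" and "user:2" but not "item:1".
  }
  assert(it->status().ok());

  delete it;
  delete db;
  return 0;
}
```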
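Finally, a sketch of compaction- and checksum-related settings. The trigger value is illustrative, and the level-style compaction shown here is already RocksDB's default.

```cpp
#include <cassert>
#include <string>

#include "rocksdb/db.h"

int main() {
  rocksdb::Options options;
  options.create_if_missing = true;

  // Level-style compaction (the default): background threads merge SST
  // files level by level, dropping overwritten and deleted entries.
  options.compaction_style = rocksdb::kCompactionStyleLevel;
  options.level0_file_num_compaction_trigger = 4;  // illustrative trigger

  rocksdb::DB* db = nullptr;
  rocksdb::Status s = rocksdb::DB::Open(options, "/tmp/rocksdb_compact_demo", &db);
  assert(s.ok());

  db->Put(rocksdb::WriteOptions(), "k", "v");

  // Block checksums are verified on reads by default; this just makes it explicit.
  rocksdb::ReadOptions read_options;
  read_options.verify_checksums = true;
  std::string value;
  s = db->Get(read_options, "k", &value);
  assert(s.ok());

  // A manual compaction over the whole key range can also be requested.
  s = db->CompactRange(rocksdb::CompactRangeOptions(), nullptr, nullptr);
  assert(s.ok());

  delete db;
  return 0;
}
```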
There are many more features, like multi-threading, replication, and backups, but the ones above constitute the key features of RocksDB.
This article should be enough to get you started with understanding RocksDB. In my next article, I will explain how to set up a project and get started with basic operations in RocksDB.