Database Basics

nand
2 min readMay 2, 2020

Most of what I will talk here are my learning from awesomly written book “Data Intensive Applications” by martin kleppmann.

Since ancient civilization, All stored data is written onto files. Different ways of storing this data makes a database different.

So when something need to be stored, create a new file and append that data into a file and when you read some data, traverse the file from top to bottom and return the object you are looking for. CONGRATULATIONS!! YOU JUST BUILD A DATABASE.

Now lets optimise it so that we don’t have to wait hours for a query to return.

What’s missing in above awesome database we created :) STRUCTURE.

A common structure is an object with multiple key-value pairs. Here the object can be called as a row and keys can be called as columns. And type of object can be called a Table.

Now with this structure you can make WHERE queries. It will traverse through file and will check if the value of key == query param. If yes, then that row is returned. Benefits of this approach is whole row is at same place on disk, so select * From type queries are very efficient, it has to find the row and return complete row. This is called Row based database.

Query time O(n), write O(1).

But why traverse whole while for each query. Lets maintain a index of where data is. Its a sorted list of column’s value and row’s location on file. So for query, it will first check the index, get the row’s location and then read the file. This way query time is O(log(n)).

Problem with this approach is table does have many columns and queries are on a few columns. Like select name from user where email=’x’. Now why fetch whole row and process it when we need only one column. What we can do is store each column separately and query on that particular column only. This approach is called Columnar Database.

As you can see columnar has efficient queries, select * queries are expensive and transactions are difficult, because then you will have to go to each columns and fetch values and if transaction fails then it will have to go to each column and revert back to old values.

Most transactional databases use row based architecture and Nosql, data lakes use Columnar databases.

--

--