Relational Databases
These notes cover relational databases (RDBMS): how they store data on disk, how they are structured, and the ACID properties that make them a reliable foundation for most applications. Relational databases are one of the most important tools in system design — understanding them deeply is worth the investment.
Contents
- Storage and the B+ Tree
- Tables, Schemas, and Rows
- Foreign Keys and Joins
- ACID Properties
- Transactions
- Atomicity
- Isolation
- Consistency
- Summary
1. Storage and the B+ Tree
Relational databases store data on disk, which means data is persistent — it survives restarts, crashes, and power loss. Databases are expected to hold very large amounts of data and to read and write it efficiently.
To make reads and writes fast, relational databases organise data internally using a B+ tree. Unlike a binary tree where each node has at most two children, a B+ tree is an M-way tree where each node can have up to M children and holds up to M-1 keys. This makes the tree extremely wide and shallow.
The key properties of this structure for databases:
- Wide, not deep. Because each node holds many keys, the tree stays very shallow. Finding any record requires traversing very few levels, which means very few disk reads.
- Data only in leaves. Internal nodes hold keys purely to guide the search. All actual data records live in the leaf nodes.
- Leaves form a linked list. Leaf nodes are connected to each other, so range queries (give me all notes created after a certain date) can be answered by scanning along the leaf level without traversing the tree again.
The goal is to minimise the number of disk reads and writes per operation. Reading from disk is orders of magnitude slower than reading from memory, so keeping the tree shallow and wide is what makes database queries fast even over millions of rows.
2. Tables, Schemas, and Rows
The surface-level interface of a relational database is the table. Each table has a name and a schema: a fixed list of columns, each with a name and a data type. Every row of data in the table must conform exactly to that schema.
CREATE TABLE notes (
id INT PRIMARY KEY,
user_id INT NOT NULL,
title VARCHAR(255) NOT NULL,
content TEXT NOT NULL
);The instances of this table — its rows, also called records — are real data that satisfy the schema:
| id | user_id | title | content |
|---|---|---|---|
| 1 | 42 | Meeting notes | Discussed Q3 roadmap... |
| 2 | 42 | Shopping list | Milk, eggs, bread... |
| 3 | 99 | Ideas | Side project concepts... |
Every table needs a way to uniquely identify each row. The id column above is a primary key: a column (or combination of columns) whose value is unique across every row in the table. In practice, most tables use a single auto-incrementing integer as the primary key rather than trying to compose one from existing data.
Other common constraints that databases enforce at the schema level include uniqueness (no two rows may have the same value in a column) and nullness (a column may or may not be allowed to hold a null value).
3. Foreign Keys and Joins
A database typically has more than one table. Consider a users table alongside the notes table:
CREATE TABLE users (
id INT PRIMARY KEY,
name VARCHAR(255) NOT NULL
);| id | name |
|---|---|
| 42 | Alice |
| 99 | Bob |
The user_id column in the notes table can be declared as a foreign key that references the id column in users:
ALTER TABLE notes
ADD CONSTRAINT fk_notes_user
FOREIGN KEY (user_id) REFERENCES users(id);A foreign key constraint enforces referential integrity: the database will reject any insert or update that puts a user_id in notes that does not exist in users. You cannot create a note for a user that does not exist, and you cannot delete a user while they still have notes. The tables are linked and the database enforces the relationship automatically.
To retrieve data that spans both tables, you use a JOIN:
SELECT users.name, notes.title, notes.content
FROM notes
JOIN users ON notes.user_id = users.id
WHERE users.id = 42;This query finds all notes belonging to user 42 by combining rows from both tables where the user_id matches. The result:
| name | title | content |
|---|---|---|
| Alice | Meeting notes | Discussed Q3 roadmap... |
| Alice | Shopping list | Milk, eggs, bread... |
💡Foreign Keys Are a Design Tool
Foreign keys do more than enforce rules — they communicate intent. Declaring a foreign key is how you tell both the database and other developers that these two pieces of data are meaningfully related. Databases that skip foreign keys in favour of handling relationships in application code tend to develop integrity problems over time.
4. ACID Properties
Most relational database systems are ACID compliant. ACID is an acronym for four properties that together guarantee reliable database behaviour even in the presence of failures, concurrent access, and errors.
| Property | What it means |
|---|---|
| Atomicity | A transaction is all or nothing. If any part fails, the entire transaction is rolled back. |
| Consistency | A transaction brings the database from one valid state to another, respecting all defined constraints. |
| Isolation | Concurrent transactions do not interfere with each other. Each appears to execute in isolation. |
| Durability | Once a transaction is committed, it persists — even through crashes, power loss, or errors. |
Durability is directly tied to storing data on disk. Redis, for example, is an in-memory database — it is extremely fast, but because data lives in RAM rather than on disk, it is not durable by default. Redis is sometimes described as partially ACID compliant: it supports atomicity and some isolation, but durability requires additional configuration, and even then it is weaker than a full RDBMS.
5. Transactions
A transaction is a group of database operations that are executed together as a single unit. The structure is always the same:
BEGIN;
UPDATE accounts SET balance = balance - 500 WHERE id = 1;
UPDATE accounts SET balance = balance + 500 WHERE id = 2;
COMMIT;BEGIN opens the transaction. The operations inside are staged but not yet permanent. COMMIT makes them permanent — all changes are written to disk and become visible to other connections. If anything goes wrong before COMMIT, the transaction can be rolled back:
BEGIN;
UPDATE accounts SET balance = balance - 500 WHERE id = 1;
-- something fails here
ROLLBACK;
-- neither update is persisted; the database is unchangedTransactions are the mechanism through which atomicity and durability are delivered. Every operation inside a transaction either succeeds completely or leaves no trace.
6. Atomicity
Atomicity means a transaction is indivisible. You cannot split a transaction partway through — it either completes entirely or it does not happen at all.
The classic example is a bank transfer: moving money from account A to account B involves two writes. If the system crashes after debiting A but before crediting B, money has been lost.
With a transaction wrapping both operations, the database guarantees that either both writes succeed or neither does. There is no in-between state.
7. Isolation
Isolation addresses what happens when multiple transactions run at the same time. Without isolation, transactions can interfere with each other in subtle and damaging ways.
The most common problem is a dirty read: a transaction reads data that has been written by another transaction that has not yet committed. If that other transaction is then rolled back, the first transaction was working with data that never actually existed.
Isolation solves this by serialising transactions: even if multiple transactions arrive concurrently, the database executes them in a defined order such that each one sees a consistent snapshot of the database. Transaction 1 fully completes before Transaction 2 begins, or vice versa — they never overlap in a way that allows a dirty read.
8. Consistency
Consistency means the database always moves from one valid state to another. Every constraint defined in the schema — nullness, uniqueness, foreign key relationships — must hold true before and after every transaction.
If a transaction would violate any constraint, the database rejects it entirely:
-- This will fail if user_id 999 does not exist in the users table
INSERT INTO notes (id, user_id, title, content)
VALUES (10, 999, 'Orphan note', 'This user does not exist');
-- ERROR: insert or update on table "notes" violates foreign key constraintConsistency is not free. Every foreign key write requires the database to verify that the referenced value exists in the other table. Every uniqueness constraint requires checking the index. These checks add overhead, particularly at scale. That cost is usually worth paying — applications that bypass these constraints in favour of performance often accumulate integrity problems that are expensive to debug and correct later.
📝ACID Is a Guarantee, Not a Given
Not every database is ACID compliant. Many NoSQL databases relax one or more of these properties in exchange for performance, availability, or horizontal scalability. Understanding which properties a given database provides — and which it does not — is a core system design skill. The right trade-off depends on what your application actually needs.
Summary
| Concept | Key Takeaway |
|---|---|
| B+ tree | The internal data structure of most RDBMS. Wide and shallow to minimise disk reads; data only in leaf nodes; leaves linked for range scans. |
| Schema | The fixed structure of a table: column names, types, and constraints. Every row must conform to it. |
| Primary key | A column or set of columns that uniquely identifies each row. Typically a single auto-generated integer. |
| Foreign key | A column whose value must match an existing value in another table. Enforces referential integrity between related tables. |
| JOIN | A query that combines rows from multiple tables based on a shared column value. |
| ACID | Atomicity, Consistency, Isolation, Durability — the four properties that guarantee reliable database behaviour. |
| Transaction | A group of operations wrapped in BEGIN and COMMIT. Succeeds entirely or rolls back entirely. |
| Atomicity | A transaction is all or nothing. A crash mid-transaction triggers a rollback, leaving the database unchanged. |
| Isolation | Concurrent transactions are serialised so they cannot interfere with each other or produce dirty reads. |
| Consistency | All schema constraints hold before and after every transaction. The database rejects any transaction that would violate them. |
| Durability | Committed transactions survive crashes and power loss. Achieved by writing to disk. |