
Understanding Write-Ahead Logs: Durability Beyond the Flush

Discover how modern databases use Write-Ahead Logs to protect your data during failures.


Databases are a fundamental part of modern software architecture. Depending on the use case, we rely on different types — from relational databases like PostgreSQL to NoSQL systems like Cassandra, or even distributed log systems like Kafka.

But have you ever wondered what happens to your data when the database crashes? How does the system ensure that your committed data isn't lost?

This is where the Write-Ahead Log (WAL) comes into play. In this blog, we’ll dive into the internals of WAL, explore how it works behind the scenes, and understand its critical role in ensuring data durability.


🔧 What is WAL?

Every database uses an internal representation of data in memory — whether it’s based on B+ Trees or LSM Trees. When users issue commands to write or update records, these actions are first performed in memory and then periodically flushed to disk. This process is known as a checkpoint.

Since writes are batched before flushing, there’s always a risk of losing committed transactions if the system crashes before flushing to disk.

💡 One might think: “Why not flush every transaction directly to disk?”
Because it’s inefficient — writing every transaction involves random disk seeks, index updates, and structural changes, which decreases the throughput.

WAL solves this problem by introducing an immutable append-only log file. Each write is first recorded in the WAL, then applied to in-memory data structures.

📝 Think of WAL like a diary — jotting down everything before making the actual changes. If the system crashes mid-way, the diary can help restore what was lost.


🧠 Internals: How WAL Works

Writing to a sequential log (WAL) is significantly faster than writing to structured files.

Typical Write Path in a WAL-enabled Database

Steps:

  1. Write is appended to the WAL — durability guaranteed.

  2. Change is applied to an in-memory structure (like a memtable or buffer pool).

  3. Once memory crosses a threshold, data is flushed to disk (checkpoint).

  4. Old WAL logs can be purged after checkpoint to reduce log size.
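The steps above can be sketched in a few lines of Java. This is a minimal, hypothetical write path — `WritePathSketch`, the memtable, and the tiny flush threshold are illustrative stand-ins, not any real database's API:

```java
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import java.util.TreeMap;

// Minimal sketch of the write path: WAL first, then memtable, then checkpoint.
public class WritePathSketch {
    private static final int FLUSH_THRESHOLD = 3; // tiny threshold for illustration

    private final BufferedWriter wal;
    private final TreeMap<String, String> memtable = new TreeMap<>();

    public WritePathSketch(String walFile) throws IOException {
        this.wal = new BufferedWriter(new FileWriter(walFile, true));
    }

    public void put(String key, String value) throws IOException {
        // Step 1: append to the WAL before touching in-memory state
        wal.write("PUT " + key + " " + value);
        wal.newLine();
        wal.flush();

        // Step 2: apply the change to the in-memory structure
        memtable.put(key, value);

        // Step 3: checkpoint once memory crosses a threshold
        if (memtable.size() >= FLUSH_THRESHOLD) {
            checkpoint();
        }
    }

    private void checkpoint() {
        // In a real system this would write the memtable contents to an
        // on-disk structure (SSTable / data pages); here we just clear it.
        memtable.clear();
        // Step 4: older WAL segments could now be purged.
    }

    public int memtableSize() {
        return memtable.size();
    }
}
```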


🔍 Advantages of WAL

  • Crash Recovery: Replays committed transactions from the WAL.

  • Durability: Guarantees no data loss post-commit.

  • Performance: Append-only writes are fast and sequential.

  • Lazy Flushing: Flushes to disk in the background.

  • Garbage Collection: Older WAL entries can be discarded post-checkpoint.

  • Replication: WAL can be shipped to replicas for faster sync.
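Crash recovery, the first advantage above, is simply a replay of the log. Here is a hedged sketch assuming the same hypothetical text format (`PUT key value` per line) as before — real systems replay binary records and track which entries were already checkpointed:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

// Rebuilds in-memory state after a crash by replaying the WAL in order.
public class WalRecovery {
    public static Map<String, String> replay(Path walFile) throws IOException {
        Map<String, String> state = new HashMap<>();
        for (String line : Files.readAllLines(walFile)) {
            String[] parts = line.split(" ", 3);
            if (parts.length == 3 && parts[0].equals("PUT")) {
                // Later entries for the same key overwrite earlier ones
                state.put(parts[1], parts[2]);
            }
            // Other operation types (DELETE, etc.) would be handled here.
        }
        return state;
    }
}
```

Because the log is replayed in append order, the last write to each key wins — exactly the state the memtable held before the crash.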


📦 Conceptual WAL Entry Format

A WAL entry typically stores:

  • LSN (Log Sequence Number, a byte offset for every record)

  • Transaction ID

  • Operation Type

  • Table + Row ID

  • Before/After values

  • Timestamp

  • CRC32 (for integrity)


LSN: 00001234
TransactionID: 99768
Operation: UPDATE
Table: users
RowID: 26
Before: { age: 20 }
After:  { age: 21 }
Timestamp: 2025-04-10 15:12:10
CRC32: 0x5d41402a

The above entry is shown in human-readable form for illustration only; in a real system, WAL records are stored in a binary format. The CRC32 checksum ensures data integrity and is usually computed over the entire record.
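To make the binary encoding concrete, here is a rough sketch of serializing an entry and appending a CRC32 over the payload, using `java.util.zip.CRC32`. The field layout (`lsn`, `txId`, length-prefixed strings) is invented for illustration and does not match any particular database:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Sketch: serialize a WAL entry to bytes and append a CRC32 of the payload.
public class WalEntryCodec {
    public static byte[] encode(long lsn, long txId, String op, String payload) {
        byte[] opBytes = op.getBytes(StandardCharsets.UTF_8);
        byte[] payloadBytes = payload.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(
                8 + 8 + 4 + opBytes.length + 4 + payloadBytes.length + 8);
        buf.putLong(lsn);
        buf.putLong(txId);
        buf.putInt(opBytes.length);
        buf.put(opBytes);
        buf.putInt(payloadBytes.length);
        buf.put(payloadBytes);

        // Checksum covers everything written so far
        CRC32 crc = new CRC32();
        crc.update(buf.array(), 0, buf.position());
        buf.putLong(crc.getValue());
        return buf.array();
    }

    public static boolean verify(byte[] record) {
        long stored = ByteBuffer.wrap(record).getLong(record.length - 8);
        CRC32 crc = new CRC32();
        crc.update(record, 0, record.length - 8);
        return crc.getValue() == stored;
    }
}
```

On replay, a record whose checksum doesn't verify is treated as a torn (partial) write at the tail of the log and discarded.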


💾 Durability and fsync()

It’s important to note that a DB operation is not truly durable just because it’s written to the WAL in memory or even buffered by the OS. There are multiple layers between the application and the actual disk: the application’s own buffers, the OS page cache, and often the disk’s write cache.

To ensure durability, systems call fsync() (or similar system calls) to force the WAL to be flushed from the OS cache all the way to disk.

Every layer in the write path uses write buffering to improve performance, so calling fsync() tells the OS: “Please flush this data now.” However, frequent fsync() calls come at the cost of throughput. Many systems (like Kafka, PostgreSQL) batch writes and fsync periodically to strike a balance between durability and throughput.
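In Java, the closest equivalent to fsync() is `FileChannel.force()`. A minimal sketch (the class name and record format are illustrative):

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Appends a WAL record and forces it through the OS cache to the device.
public class DurableAppender {
    private final FileChannel channel;

    public DurableAppender(Path walFile) throws IOException {
        this.channel = FileChannel.open(walFile,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.APPEND);
    }

    public void append(String record) throws IOException {
        channel.write(ByteBuffer.wrap((record + "\n").getBytes(StandardCharsets.UTF_8)));
        // force(false) is roughly fdatasync(): flush file data but not all metadata;
        // force(true) is roughly fsync(): flush data and metadata.
        channel.force(false);
    }

    public void close() throws IOException {
        channel.close();
    }
}
```

Calling `force()` on every append gives the strongest durability but the lowest throughput; batching several records per `force()` call is the usual trade-off mentioned above.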


⚙️ WAL Prototype

To solidify the concepts, I’ve built a simple WAL prototype in Java, showing:

  • Append-only log writes

  • Basic recovery logic

The code snippet below shows a simple WAL that logs operations and flushes each entry to the log file.

import java.io.BufferedWriter;
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;

public class WriteAheadLog {
    private final File logFile;
    private final BufferedWriter writer;

    /**
     * Initializes the Write-Ahead Log with a given file name.
     * Creates or appends to the file if it already exists.
     *
     * @param fileName The name of the log file to use.
     * @throws IOException If the file cannot be created or opened.
     */
    public WriteAheadLog(String fileName) throws IOException {
        this.logFile = new File(fileName);
        // Open the file in append mode to preserve previous entries
        this.writer = new BufferedWriter(new FileWriter(logFile, true));
    }

    /**
     * Writes a single operation to the log file.
     * Each operation is flushed immediately after writing.
     *
     * @param operation The string representing the operation (e.g., PUT, GET).
     * @throws IOException If writing to the file fails.
     */
    public void log(String operation) throws IOException {
        writer.write(operation);
        writer.newLine();    // Add newline to separate log entries
        writer.flush();      // Push the entry from Java's buffer to the OS;
                             // a real system would also fsync() for full durability
    }
}

👉 You can explore the full working prototype with recovery logic on GitHub


🔚 Final Thoughts

The Write-Ahead Log is one of the most fundamental techniques used in reliable storage systems. From PostgreSQL to Kafka, WAL ensures durability without sacrificing write performance.

💬 Let’s Discuss

  • Did you ever face a data loss incident?

  • Interested in WAL in distributed systems like Kafka?

Let me know in the comments!
