What Durability Actually Means (and how it translates to analytical systems)

By the end, you should understand what durability guarantees, what it costs to provide, and how the concern translates to analytical systems where the mechanics are mostly invisible.

May 25, 2026

This is the fourth and final article in a series on ACID. The first covered atomicity. The second covered consistency. The third covered isolation. This one covers durability, the simplest of the four properties to state, though there is more to it than the one-line definition suggests.

The article starts with the OLTP mechanics and then follows the same concern into analytical systems. That translation is what makes the ACID framework useful beyond the OLTP databases where it originated.

The plain description

Durability is the guarantee that once a transaction has been committed, its effects survive any failure.

If the database tells you “your transaction is committed,” then a crash, power outage, restart, or hardware failure should not undo that transaction. The change is persistent.

This is the simplest of the four ACID properties to state. The complexity is in what “survive failure” actually means in practice, because there are many kinds of failure, and different durability mechanisms protect against different ones.

Why this is harder than it sounds

To understand why durability is non-trivial, consider what actually happens when an application “commits” a transaction.

The application sends a commit command to the database. Some time later, the database responds with “success.” Between the application’s send and the database’s response, a lot has happened, and at each step, a failure could lose the transaction.

Here’s a simplified sequence:

1. The application sends commit to the database
2. The database processes the commit in memory
3. The database writes the commit record to its in-memory log buffer
4. The database flushes the log buffer to the operating system
5. The operating system buffers the data in its page cache
6. The operating system writes the data to the disk controller
7. The disk controller writes the data to its own cache
8. The disk controller eventually flushes the data to the physical disk medium
9. The database returns “success” to the application

A crash at any point before the data reaches the physical disk medium could lose the transaction, depending on what exactly fails. A power outage between steps 7 and 8 — even after the disk controller has acknowledged the write — can lose data if the disk doesn’t have battery-backed cache.

So durability requires the database to ensure that the data has actually reached durable storage before reporting success to the application. The mechanism most databases use is fsync() — a system call that tells the operating system “actually write this to disk, don’t just say you did.”
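As a minimal sketch of what “actually write this to disk” looks like from application code, the Python below appends a record and returns only after os.fsync() has returned. The function name durable_append and the file name commit.log are hypothetical, chosen for illustration. Note that on some platforms fsync alone may not flush the drive’s own cache (macOS, for example, offers a separate F_FULLFSYNC control for that), so treat this as the general shape rather than a complete guarantee.

```python
import os
import tempfile


def durable_append(path: str, data: bytes) -> None:
    """Append data and return only after the OS reports it flushed."""
    with open(path, "ab") as f:
        f.write(data)          # lands in Python's userspace buffer
        f.flush()              # moves it into the OS page cache
        os.fsync(f.fileno())   # asks the OS to push it to the storage device


# Hypothetical log file in a scratch directory.
log_path = os.path.join(tempfile.mkdtemp(), "commit.log")
durable_append(log_path, b"COMMIT txn=1\n")
durable_append(log_path, b"COMMIT txn=2\n")
```

Without the flush() and fsync() calls the function would still appear to work, which is exactly the problem: the difference only shows up after a crash.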

fsync() is among the slowest operations on a database’s commit path. An SSD that sustains hundreds of thousands of buffered writes per second may complete only a few thousand fsyncs per second, though the exact figures vary widely by hardware and filesystem. The cost of durability is real, and it’s paid on every committed transaction.
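One way to see the cost on your own machine is to time the same small writes with and without an fsync after each one. This is a rough sketch, not a rigorous benchmark: the helper time_writes is hypothetical, the record count is tiny, and results depend heavily on the device and filesystem (on an in-memory filesystem the gap can all but disappear).

```python
import os
import tempfile
import time


def time_writes(n: int, sync_each: bool) -> float:
    """Write n small records, optionally fsyncing after each. Returns seconds."""
    fd, path = tempfile.mkstemp()
    try:
        start = time.perf_counter()
        for i in range(n):
            os.write(fd, b"record %d\n" % i)
            if sync_each:
                os.fsync(fd)   # wait for the device on every record
        return time.perf_counter() - start
    finally:
        os.close(fd)
        os.remove(path)


n = 200
buffered = time_writes(n, sync_each=False)
synced = time_writes(n, sync_each=True)
print(f"buffered: {buffered:.4f}s   fsync per write: {synced:.4f}s")
```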

How durability is actually implemented

The standard mechanism is write-ahead logging combined with fsync. WAL was covered in the atomicity article; the connection to durability is direct.

The sequence:

1. Before applying any change to the database, write the change to the log
2. Before reporting commit, fsync the log to disk
3. Only after the fsync returns does the database tell the application “committed”

The key property: the log contains enough information to redo or undo every transaction. Even if a crash leaves the database’s data files in an inconsistent state, with some changes half-applied and others missing, the database can replay the log to reconstruct the correct final state. The log is the source of truth for durability; the data files are derived from it.

This is why fsync is on the log, not on the data files. The data files can be written lazily — out of order, in batches, whenever the database decides — because if the system crashes, the log will be replayed to recover. Only the log needs immediate durable persistence.
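The log-first idea can be sketched as a toy key-value store. Everything here is illustrative: the class TinyKV, its JSON-lines log format, and the file name wal.log are assumptions made for the example, and a real implementation would also need checksums, torn-write handling, checkpoints, and undo records. The point is only the ordering: log, fsync, then apply, with recovery done by replaying the log.

```python
import json
import os
import tempfile


class TinyKV:
    """Toy store: the log is the source of truth, the dict is derived from it."""

    def __init__(self, log_path: str):
        self.log_path = log_path
        self.data = {}
        self._replay()
        self.log = open(log_path, "a", encoding="utf-8")

    def _replay(self) -> None:
        # Recovery: rebuild in-memory state by redoing every logged change.
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path, encoding="utf-8") as f:
            for line in f:
                record = json.loads(line)
                self.data[record["key"]] = record["value"]

    def put(self, key: str, value: str) -> None:
        record = json.dumps({"key": key, "value": value})
        self.log.write(record + "\n")   # 1. write the change to the log
        self.log.flush()
        os.fsync(self.log.fileno())     # 2. fsync the log before acknowledging
        self.data[key] = value          # 3. apply the change (could be lazy)
        # Returning from put() plays the role of the "committed" response.


path = os.path.join(tempfile.mkdtemp(), "wal.log")
db = TinyKV(path)
db.put("balance:alice", "90")
db.put("balance:bob", "110")
db.log.close()
del db                      # simulate a crash: in-memory state is gone

recovered = TinyKV(path)    # state comes back purely from the log
print(recovered.data)
```

The in-memory dict stands in for the data files: it can be thrown away at any moment because the log alone is enough to rebuild it.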

In practice, modern databases batch fsyncs across multiple concurrent transactions (“group commit”). Multiple transactions commit in a single fsync, and all of them are durable when that fsync returns. This improves throughput on high-concurrency workloads, but each individual transaction still sees the latency of waiting for the next fsync to complete.
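Here is a single-threaded sketch of the group commit idea, under the assumption that commit records are queued and a separate step makes the whole batch durable. The class GroupCommitLog and its methods are hypothetical; real databases do this with concurrent sessions and a dedicated log writer, which this example deliberately leaves out.

```python
import os
import tempfile


class GroupCommitLog:
    """Toy log that makes a batch of pending commits durable with one fsync."""

    def __init__(self, path: str):
        self.f = open(path, "ab")
        self.pending = []       # transaction ids waiting for the next fsync
        self.fsync_calls = 0

    def submit(self, txn_id: int, payload: bytes) -> None:
        # The record is written, but the transaction is not yet acknowledged.
        self.f.write(b"%d:%s\n" % (txn_id, payload))
        self.pending.append(txn_id)

    def flush_group(self) -> list:
        # One fsync covers every record written since the previous one.
        self.f.flush()
        os.fsync(self.f.fileno())
        self.fsync_calls += 1
        acknowledged, self.pending = self.pending, []
        return acknowledged     # only now may these report "committed"


log = GroupCommitLog(os.path.join(tempfile.mkdtemp(), "group.log"))
for txn_id in range(1, 6):
    log.submit(txn_id, b"update")
committed = log.flush_group()
print(committed, "made durable with", log.fsync_calls, "fsync")
```

Five transactions share one fsync, which is the throughput win. None of them is acknowledged before flush_group() returns, which is the latency that remains.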

Where durability lives in analytical systems

This is the section that connects durability to analytical work, where most of this article’s readers operate.

Analytical systems handle durability differently from OLTP databases, primarily because their workloads are different. The mechanics most OLTP engineers worry about — fsync, WAL, replication configuration — are mostly invisible in modern analytical platforms.

BigQuery and Snowflake treat durability as effectively automatic. Data written to these systems lands in highly redundant cloud storage, typically replicated across multiple availability zones within a region, with cross-region replication available as an option. The user doesn’t configure durability levels; the platform always runs at the strongest practical level. The tradeoff between durability and performance, which is so important in OLTP databases, has been moved out of the user’s hands and into the cloud provider’s infrastructure.

This is reasonable for analytical workloads because the write patterns are different. OLTP systems handle thousands of small commits per second, each requiring an fsync. Analytical systems handle batch loads — large amounts of data written in bulk, infrequently. The per-commit fsync cost matters far less when you’re committing once every few minutes instead of thousands of times per second.

Object storage layers like S3 and GCS are the foundation. S3 is designed for 11 nines (99.999999999%) of annual object durability, achieved by storing data redundantly across multiple availability zones and continuously checking its integrity. Data written to S3 is extremely unlikely to be lost through hardware failure; the realistic risks are accidental deletion, misconfiguration, or loss of the bucket or account itself. Modern analytical systems build on this foundation: BigQuery’s Capacitor files, Snowflake’s micro-partitions, and Iceberg and Delta tables all sit on top of cloud storage with durability guarantees of this kind.
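To make 11 nines concrete, the arithmetic below converts it into an expected loss rate. It assumes the figure is an annual, per-object design target and that losses are independent, which is a simplification; the bucket size of ten million objects is a hypothetical example.

```python
# 11 nines of annual durability means an annual loss probability of 1e-11 per object.
annual_loss_probability = 1e-11
objects_stored = 10_000_000        # hypothetical bucket size

expected_losses_per_year = objects_stored * annual_loss_probability
years_per_expected_loss = 1 / expected_losses_per_year

print(f"expected objects lost per year: {expected_losses_per_year:.4f}")
print(f"about one object lost every {years_per_expected_loss:,.0f} years")
```

Under those assumptions, ten million objects work out to roughly one lost object per ten thousand years, which is why hardware failure stops being the risk worth planning around.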

Analytical pipelines shift the reliability concerns up the stack. The cloud provider solves the storage layer’s durability. What remains uncertain is what happens between writes — across pipeline stages, across retries, across the messy reality of orchestration failures and partial completions. If a pipeline fails mid-run, what state is the data in? If a downstream consumer reads while an upstream producer is mid-write, what do they see? If a backfill job reprocesses data, do the results converge correctly, or do they corrupt the warehouse?
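One common answer to the “reader arrives mid-write” question is to never write the final output in place. The sketch below, which uses a local filesystem as a stand-in for a warehouse or object store, writes to a temporary file and then swaps it into position with os.replace, so a reader sees either the old version or the new one. The function publish_atomically and the file name daily_totals.jsonl are hypothetical.

```python
import json
import os
import tempfile


def publish_atomically(final_path: str, rows: list) -> None:
    """Write rows to a temp file, then swap it into place in one step."""
    directory = os.path.dirname(final_path)
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            for row in rows:
                f.write(json.dumps(row) + "\n")
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, final_path)   # readers see old file or new file
    except BaseException:
        # A failed run leaves the previous output untouched.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


out = os.path.join(tempfile.mkdtemp(), "daily_totals.jsonl")
publish_atomically(out, [{"day": "2026-05-24", "total": 10}])
publish_atomically(out, [{"day": "2026-05-24", "total": 12}])   # a rerun
```

If the job dies halfway through writing, the half-written temp file is discarded and consumers keep reading the last complete output.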

These are not durability questions in the strict ACID sense. They’re the operational questions that arise once durability is assumed. And they map cleanly to a different property: idempotency.

The connection to idempotency

ACID durability says: once a transaction is committed, its effects survive any subsequent failure. The mechanism is write-ahead logging and fsync. The boundary of the guarantee is the single database.

Analytical pipelines have a different problem. The boundary isn’t a single transaction in a single database. It’s a multi-stage pipeline involving multiple sources, intermediate computations, and downstream destinations. The OLTP question “did this commit make it to disk?” has been answered by the cloud storage layer. The remaining question is “if this pipeline fails partway through and gets retried, does the final state still converge on the correct result?”

That question is what idempotency answers. An idempotent pipeline produces the same final state regardless of how many times it runs. Same parameters, same inputs, same output. The pipeline doesn’t accumulate state across runs. It doesn’t depend on the world outside its parameters. It converges on the correct output regardless of history.
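The difference is easiest to see side by side. In this sketch a plain dict stands in for a partitioned warehouse table, and the two loader functions are hypothetical: one appends to the day’s partition, the other replaces it wholesale.

```python
def load_append(warehouse: dict, day: str, rows: list) -> None:
    """Not idempotent: every retry adds the same rows again."""
    warehouse.setdefault(day, []).extend(rows)


def load_overwrite(warehouse: dict, day: str, rows: list) -> None:
    """Idempotent: the partition for `day` is replaced wholesale."""
    warehouse[day] = list(rows)


rows = [{"order": "A1", "amount": 30}, {"order": "A2", "amount": 45}]

appended, overwritten = {}, {}
for attempt in range(2):            # the pipeline runs, fails late, and is retried
    load_append(appended, "2026-05-24", rows)
    load_overwrite(overwritten, "2026-05-24", rows)

print(len(appended["2026-05-24"]), "rows after append,",
      len(overwritten["2026-05-24"]), "rows after overwrite")
```

After two runs the append version holds duplicated rows, while the overwrite version holds exactly what a single clean run would have produced.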

Idempotency is the analytical analogue of durability. It’s the property that ensures the system’s promises survive the messy reality of failures and retries. The mechanisms are different — overwrite partitions instead of fsync, deterministic IDs instead of WAL — but the role is the same: making the system’s behaviour reliable in the face of inevitable failures.
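Deterministic IDs can be sketched the same way. The assumption here is that each event has a natural key (source, order ID, and event time in this made-up example) and that the ID is derived by hashing that key rather than generated randomly or from the clock. The helpers event_id and upsert are hypothetical, and a dict stands in for the destination table.

```python
import hashlib


def event_id(source: str, order_id: str, event_time: str) -> str:
    """Derive the ID from the event's natural key, never from randomness."""
    key = "|".join([source, order_id, event_time])
    return hashlib.sha256(key.encode("utf-8")).hexdigest()[:16]


def upsert(table: dict, event: dict) -> None:
    eid = event_id(event["source"], event["order_id"], event["event_time"])
    table[eid] = event   # same event, same key: overwritten, not duplicated


table = {}
event = {"source": "shop", "order_id": "A1",
         "event_time": "2026-05-24T10:00:00Z", "amount": 30}

for attempt in range(3):     # the same event is delivered three times
    upsert(table, event)

upsert(table, {**event, "order_id": "A2"})   # a genuinely different event
print(len(table), "rows")
```

Redelivery leaves a single row per distinct event, so a retry cannot inflate the table.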

The shift worth naming explicitly: in modern analytical systems, durability has been mostly solved by the storage layer, and the operational reliability concerns have moved up the stack to pipeline-level idempotency. The senior analytical engineer’s mental model includes asking “what happens if this pipeline runs twice?” — that question is the analytical version of asking “did this commit make it to disk?”

This is why ACID matters even for engineers working primarily in analytical systems. The mechanics are invisible, but the properties they guarantee are foundational. Understanding what those properties are — and how they translate when you move from OLTP to OLAP — is what makes the framework useful beyond the database systems where it originated.

What’s worth taking away

Durability is the simplest ACID property to state and among the best understood. Its mechanism, write-ahead logging plus fsync, is essentially the same across all major databases. Its cost is real but predictable: every commit waits for an fsync, even when group commit lets many transactions share one.

What durability doesn’t cover is at least as important as what it does. It doesn’t catch application bugs. It doesn’t protect against malicious deletion. It doesn’t extend across systems. It isn’t backup.

In modern analytical systems, durability has largely been moved out of the user’s hands and into the cloud provider’s infrastructure. S3, GCS, and similar storage layers provide durability guarantees so strong that the operational concerns have shifted elsewhere. The remaining reliability question — “does the pipeline produce correct results under retries and failures?” — is answered by idempotency rather than durability. The two properties play parallel roles at different layers of the stack.

This concludes the ACID series. Across four articles, the recurring theme has been that the properties are precise and meaningful, but their relevance depends on understanding both what they guarantee and what they don’t. ACID was designed for OLTP databases in the 1980s. The properties survive into the analytical era, but the mechanisms and operational concerns have shifted. Understanding the framework deeply means understanding the translation — which is the work that separates engineers who can reason about system reliability from engineers who can only hope for it.

Sources

Foundational:

Härder, T. & Reuter, A. (1983), Principles of Transaction-Oriented Database Recovery — the paper that popularized the ACID acronym in its current form
Mohan, C. et al. (1992), ARIES: A Transaction Recovery Method Supporting Fine-Granularity Locking and Partial Rollbacks Using Write-Ahead Logging — the foundational paper on modern WAL implementation

On durability mechanisms:

PostgreSQL Documentation, Reliability and the Write-Ahead Log
Designing Data-Intensive Applications by Martin Kleppmann, Chapter 7 — accessible treatment of transactions and durability

