
Storage, Consistency, and Resiliency

Storage model

  • Every table is sharded across the cluster.
  • Optional partitioning splits table data further by partition keys (for lifecycle and pruning).
  • Each shard persists as a Lucene-backed index composed of immutable segments.
  • Writes are first recorded in the translog (write-ahead journal) and then materialized into new segments during refresh cycles.
  • Background merge policies compact smaller segments into larger ones to maintain query efficiency.
  • Replicas store independent shard copies for fault tolerance and read scalability.
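
The storage-model knobs above can be sketched in DDL. This is a minimal illustration, not a verbatim MonkDB reference: the table and column names are invented, and the clause syntax assumes CrateDB-style DDL.

```sql
-- Illustrative table: sharded, partitioned by day, with one replica.
CREATE TABLE doc.events (
    id TEXT,
    day TIMESTAMP,
    payload OBJECT
) CLUSTERED INTO 6 SHARDS        -- shard count per partition
  PARTITIONED BY (day)           -- enables cheap lifecycle pruning by day
  WITH (number_of_replicas = 1); -- one independent copy of each shard
```

Each partition gets its own set of shards, so dropping a partition removes whole shards rather than individual rows.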

Write path

Write-path notes:

  • Primary shard validates and applies writes first.
  • Replication can operate in synchronous or quorum acknowledgement modes.
  • The write is considered successful once the configured number of shard copies (primary + replicas) confirm persistence.
  • Under backpressure/failures, retries and recovery paths preserve consistency semantics at shard level.
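
The acknowledgement threshold in the notes above is typically a table setting. A sketch, assuming CrateDB-style setting names (verify the exact key in the MonkDB reference):

```sql
-- Require all shard copies (primary + replicas) to acknowledge a write
-- before it is reported successful. Setting name is an assumption
-- based on CrateDB-style table settings.
ALTER TABLE doc.orders SET ("write.wait_for_active_shards" = 'all');
```

Lowering the threshold (e.g. to 1) trades durability guarantees for write availability when replicas are temporarily unassigned.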

Read path types

  • Primary-key direct lookup path: immediate visibility semantics for key-based fetch/update loops.
  • Search-style path (secondary attributes/full scans): near-real-time visibility with refresh cycles.

Use REFRESH TABLE when deterministic immediate search visibility is required.
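
The two visibility paths can be contrasted in a short sketch (table name and values are illustrative):

```sql
INSERT INTO doc.orders (id, amount) VALUES ('o-2', 75.0);

-- Primary-key lookup: immediately visible, no refresh required.
SELECT amount FROM doc.orders WHERE id = 'o-2';

-- Search-style filter on a non-key column: only guaranteed visible
-- after the next refresh cycle (or an explicit REFRESH TABLE).
SELECT id FROM doc.orders WHERE amount > 50.0;
```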

Read/visibility decision table

Requirement → recommended pattern (tradeoff):

  • Immediate key-based verification after a write → primary-key lookup path (works for key access, not broad search scans).
  • Immediate search visibility for a user-facing query → REFRESH TABLE after the write batch (extra refresh overhead if overused).
  • Maximum ingest throughput → rely on the natural refresh cycle (search visibility is near-real-time, not immediate).

Consistency model in practice

MonkDB does not provide multi-statement ACID transactions across arbitrary rows/tables.

Operationally important guarantees:

  • Atomicity at the level of a single row/document write.
  • Durability via the translog plus shard replication.
  • Near-real-time consistency for search readers.
  • Stronger, immediate visibility for point lookups by key.

MVCC and optimistic concurrency

MonkDB uses MVCC-style versioned document semantics suitable for optimistic concurrency patterns.

Typical pattern:

  1. Read current row version/state.
  2. Write with expected version constraints (application/API path).
  3. Retry if conflict occurs.
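
One way to express this pattern in SQL, assuming CrateDB-style `_seq_no` and `_primary_term` system columns (verify the exact column names in the MonkDB reference; the version values shown are illustrative):

```sql
-- 1. Read the current state together with its version metadata.
SELECT amount, _seq_no, _primary_term
FROM doc.orders WHERE id = 'o-1';

-- 2. Conditional write: succeeds only if the row is unchanged since
--    the read (illustrative version values from step 1).
UPDATE doc.orders
SET amount = 130.0
WHERE id = 'o-1' AND _seq_no = 4 AND _primary_term = 1;

-- 3. If the UPDATE reports zero affected rows, a concurrent writer
--    won the race: re-read and retry.
```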

Refresh and visibility model

A refresh makes newly indexed segments visible to search readers without requiring a full disk flush. For deterministic user flows:

  • Trigger REFRESH TABLE <table> after critical writes when immediate search visibility is required.
  • Keep refresh usage scoped to avoid excessive refresh overhead.

Example:

INSERT INTO doc.orders (id, amount) VALUES ('o-1', 120.0);
REFRESH TABLE doc.orders;
SELECT * FROM doc.orders WHERE id = 'o-1';

Resiliency mechanics

  • Replica promotion on primary loss.
  • Automatic shard recovery and relocation after node events.
  • Cluster coordination node election using quorum-based consensus.

Failure recovery flow

Recovery and allocation diagnostics

Use system tables to track recovery health:

SELECT table_name, id, routing_state, state,
       recovery['stage'], recovery['size']['percent']
FROM sys.shards
WHERE routing_state IN ('INITIALIZING', 'RELOCATING')
ORDER BY table_name, id;

SELECT table_name, shard_id, node_id, explanation
FROM sys.allocations
WHERE explanation IS NOT NULL
ORDER BY table_name, shard_id;

Data lifecycle controls

  • Partitioned retention: drop old partitions cheaply.
  • Tiering strategy: hot/warm/cold storage placement via operational shard moves.
  • Snapshot/restore for disaster recovery and rollback workflows.
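
Partitioned retention in practice: deleting by the full partition key drops the partition's underlying shards outright instead of marking individual rows as deleted, which is why it is cheap. A sketch, assuming CrateDB-style partitioned tables and the illustrative doc.events table:

```sql
-- Removes the whole 2024-01-01 partition (its shards are dropped),
-- rather than row-by-row deletion.
DELETE FROM doc.events WHERE day = '2024-01-01';
```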

Shard distribution and replica placement

Backup and disaster recovery posture

Recommended baseline:

  1. Create repository early (CREATE REPOSITORY).
  2. Schedule periodic snapshots (CREATE SNAPSHOT).
  3. Test restore workflows (RESTORE SNAPSHOT) in staging.
  4. Monitor snapshot state via sys.snapshots and repository metadata via sys.repositories.
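
The baseline above can be sketched end to end. Repository name, location, snapshot naming, and the fs repository type are illustrative assumptions based on CrateDB-style syntax; check the MonkDB reference for supported repository types and parameters.

```sql
-- 1. Register a filesystem-backed repository (path is illustrative).
CREATE REPOSITORY backups TYPE fs WITH (location = '/mnt/backups');

-- 2. Take a snapshot of selected tables and wait for it to finish.
CREATE SNAPSHOT backups.nightly_orders TABLE doc.orders
    WITH (wait_for_completion = true);

-- 3. In staging, restore the snapshot and verify the data.
RESTORE SNAPSHOT backups.nightly_orders TABLE doc.orders
    WITH (wait_for_completion = true);

-- 4. Monitor snapshot and repository state.
SELECT name, state FROM sys.snapshots;
SELECT name, type FROM sys.repositories;
```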