MonkDB Documentation

Production Operations Runbook

This runbook is a practical checklist for day-2 MonkDB operations.

Daily checks

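-- Health checks that are currently failing, most severe first
SELECT description, severity FROM sys.checks WHERE NOT passed ORDER BY severity DESC;
-- Per-node 1-minute load average, memory usage (%) and heap usage (bytes)
SELECT name, load['1'], mem['used_percent'], heap['used'] FROM sys.nodes ORDER BY name;
-- Shard routing and state per table; look for shards that are not STARTED
SELECT table_name, id, routing_state, state FROM sys.shards ORDER BY table_name, id;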
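The three daily queries can be folded into a single pass/fail summary. What follows is a minimal Python sketch, assuming the rows have already been fetched with your client of choice and arrive as tuples in each query's column order. The `daily_report` helper and its 85% memory threshold are hypothetical, and the sketch assumes a healthy shard reports `STARTED` in both `routing_state` and `state` (so it also flags shards that are merely relocating during rebalancing).

```python
from typing import Iterable, NamedTuple


class DailyReport(NamedTuple):
    failed_checks: list[tuple[str, int]]
    hot_nodes: list[str]
    unhealthy_shards: list[tuple[str, int, str, str]]


def daily_report(
    checks: Iterable[tuple[str, int]],
    nodes: Iterable[tuple[str, float, float, int]],
    shards: Iterable[tuple[str, int, str, str]],
    mem_limit_pct: float = 85.0,  # hypothetical alert threshold, tune per cluster
) -> DailyReport:
    """Summarise the rows returned by the three daily-check queries.

    checks: (description, severity) rows from sys.checks WHERE NOT passed
    nodes:  (name, load_1m, mem_used_percent, heap_used_bytes) rows from sys.nodes
    shards: (table_name, id, routing_state, state) rows from sys.shards
    """
    # The sys.checks query already filters to failing checks.
    failed = list(checks)
    # Nodes whose memory usage is at or above the threshold.
    hot = [
        name
        for name, _load, mem_pct, _heap in nodes
        if mem_pct is not None and mem_pct >= mem_limit_pct
    ]
    # Assumption: anything other than STARTED/STARTED deserves a look.
    unhealthy = [
        (table, shard_id, routing, state)
        for table, shard_id, routing, state in shards
        if routing != "STARTED" or state != "STARTED"
    ]
    return DailyReport(failed, hot, unhealthy)
```

An empty report (no failed checks, no hot nodes, no unhealthy shards) is the expected daily result.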

During a latency incident

Identify the top active jobs and operations by memory use and runtime.
Check for load or memory imbalance across nodes (sys.nodes).
Check for relocating or initializing shards (sys.shards).
Inspect circuit-breaker exceptions and reduce the concurrency of heavy queries.
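For the first step, one approach is to join running operations to their jobs and sort by memory and then runtime. This sketch assumes MonkDB exposes CrateDB-style `sys.jobs` (`id`, `stmt`) and `sys.operations` (`job_id`, `name`, `started`, `used_bytes`) tables and that timestamps arrive as epoch milliseconds; confirm both on your version. `rank_operations` is a hypothetical helper.

```python
# Assumed schema: CrateDB-style sys.jobs / sys.operations. Verify on your MonkDB version.
TOP_OPERATIONS_SQL = """
SELECT j.id, j.stmt, o.name, o.used_bytes, o.started
FROM sys.operations o
JOIN sys.jobs j ON o.job_id = j.id
"""


def rank_operations(rows, now_ms, limit=10):
    """Return the heaviest operations first.

    rows: (job_id, stmt, op_name, used_bytes, started_ms) tuples.
    Sorted by memory, then by runtime in seconds, both descending.
    Each result is (used_bytes, runtime_s, job_id, op_name, stmt).
    """
    ranked = []
    for job_id, stmt, op_name, used_bytes, started_ms in rows:
        # Clamp at zero in case of small clock skew between nodes.
        runtime_s = max(0.0, (now_ms - started_ms) / 1000.0)
        ranked.append((used_bytes or 0, runtime_s, job_id, op_name, stmt))
    ranked.sort(key=lambda r: (r[0], r[1]), reverse=True)
    return ranked[:limit]
```

Pass the rows from `TOP_OPERATIONS_SQL` together with `now_ms = int(time.time() * 1000)`; the jobs at the top of the list are the first candidates to investigate.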

During a node failure or restart

Confirm master election and cluster availability.
Track shard recovery progress and review allocation explanations for shards that stay unassigned.
Avoid concurrent disruptive maintenance until recovery stabilizes.
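Recovery tracking can be reduced to a single number while polling. The sketch below is a minimal Python example that summarises `routing_state` values from `sys.shards`; the `recovery_progress` helper is hypothetical, and the allocation-explanation query assumes a CrateDB-style `sys.allocations` table that you should verify exists in your MonkDB version.

```python
from collections import Counter

# Assumed CrateDB-style sys.allocations table; verify it exists in your MonkDB version.
PENDING_ALLOCATIONS_SQL = """
SELECT table_name, shard_id, "primary", current_state, explanation
FROM sys.allocations
WHERE current_state <> 'STARTED'
"""


def recovery_progress(routing_states):
    """Summarise routing_state values collected from sys.shards.

    routing_states: iterable of strings such as "STARTED" or "INITIALIZING".
    Returns (Counter of states, percentage of shards that are STARTED).
    An empty input is treated as fully recovered.
    """
    counts = Counter(routing_states)
    total = sum(counts.values())
    pct_started = 100.0 * counts["STARTED"] / total if total else 100.0
    return counts, pct_started
```

Poll `SELECT routing_state FROM sys.shards` every few seconds, pass the column values in, and hold further maintenance until the percentage reaches 100. For shards that remain pending, the `explanation` column of the assumed allocations query describes why they have not been placed.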

Release/maintenance window

Use a rolling strategy: change one node at a time.
Validate shard health after each node change, before moving on to the next node.
Make sure a recent snapshot exists before any major version or configuration change.
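The "validate between each node change" step works best as an explicit gate. Here is a minimal Python sketch of one, assuming you record the cluster size before the window starts, count live nodes with `SELECT count(*) FROM sys.nodes`, and fetch `routing_state, state` pairs from `sys.shards`. The function name and the rule that every shard must be `STARTED`/`STARTED` are assumptions, not a documented MonkDB procedure.

```python
def safe_to_restart_next_node(shard_states, live_nodes, expected_nodes):
    """Gate for a rolling change: proceed only once the cluster has fully recovered.

    shard_states: (routing_state, state) pairs from sys.shards.
    live_nodes: number of rows currently returned by sys.nodes.
    expected_nodes: cluster size recorded before the maintenance window started.
    Returns (ok, reason).
    """
    if live_nodes < expected_nodes:
        return False, f"only {live_nodes}/{expected_nodes} nodes have rejoined"
    # Unpack explicitly so list rows from a client work as well as tuples.
    pending = [
        (routing, state)
        for routing, state in shard_states
        if routing != "STARTED" or state != "STARTED"
    ]
    if pending:
        return False, f"{len(pending)} shard(s) not yet STARTED"
    return True, "all nodes present and all shards STARTED"
```

Checking node count first keeps the reason message useful: a missing node usually explains the unstarted shards, so it is reported on its own.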

Snapshot hygiene

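-- Configured snapshot repositories
SELECT * FROM sys.repositories;
-- The 20 most recent snapshots; check their state and timestamps
SELECT * FROM sys.snapshots ORDER BY started DESC LIMIT 20;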
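To turn "keep snapshots recent" into a check, one option is to measure the age of the newest successful snapshot. This Python sketch assumes `sys.snapshots` exposes `repository`, `name`, `state` and `finished` columns, that a completed snapshot has state `SUCCESS`, and that timestamps arrive as epoch milliseconds; the 24-hour limit and the helper name are hypothetical policy choices.

```python
# Assumed column names; verify them against SELECT * FROM sys.snapshots on your version.
SNAPSHOTS_SQL = """
SELECT repository, name, state, finished
FROM sys.snapshots
ORDER BY started DESC LIMIT 20
"""


def newest_good_snapshot(rows, now_ms, max_age_hours=24.0):
    """Check that a recent, successful snapshot exists.

    rows: (repository, name, state, finished_ms) tuples.
    Returns (ok, age_hours), with age_hours None when no successful snapshot exists.
    """
    finished = [
        finished_ms
        for _repo, _name, state, finished_ms in rows
        if state == "SUCCESS" and finished_ms is not None
    ]
    if not finished:
        return False, None
    age_hours = (now_ms - max(finished)) / 3_600_000  # ms per hour
    return age_hours <= max_age_hours, age_hours
```

Failed or partial snapshots are ignored on purpose: only a successful one is useful for a rollback.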

Governance/audit/lineage checks

SELECT * FROM sys.policy_audit_sink_metrics LIMIT 1;
SELECT * FROM sys.governance_contract_metrics LIMIT 1;
SELECT * FROM sys.lineage_sink_metrics LIMIT 1;
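The columns of these metrics tables are not described here, so a safe automated check only confirms that each table is queryable and returns a row. The Python sketch below does that, assuming an `execute` callable (hypothetical, for example a thin wrapper around your MonkDB client) that takes a SQL string and returns a list of rows; treating an empty result as a problem is also an assumption.

```python
GOVERNANCE_TABLES = (
    "sys.policy_audit_sink_metrics",
    "sys.governance_contract_metrics",
    "sys.lineage_sink_metrics",
)


def probe_metrics_tables(execute, tables=GOVERNANCE_TABLES):
    """Run SELECT * ... LIMIT 1 against each table and collect problems.

    execute: callable taking a SQL string and returning a list of rows (hypothetical).
    Returns {table: problem} for tables that raised an error or returned no rows.
    """
    problems = {}
    for table in tables:
        try:
            # Table names come from the fixed tuple above, not from user input.
            rows = execute(f"SELECT * FROM {table} LIMIT 1")
        except Exception as exc:  # report any client or SQL error per table
            problems[table] = f"error: {exc}"
            continue
        if not rows:
            problems[table] = "no rows returned"
    return problems
```

An empty dictionary means all three sinks answered; inspect the returned rows by hand for the actual metric values.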

License state checks

SELECT "license"['status'], "license"['valid'], "license"['allowed_nodes'], "license"['current_nodes'], "license"['error']
FROM sys.cluster;
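The license query returns one row whose columns can be checked directly. This Python sketch takes the five values in the query's column order; the helper name is hypothetical, and because the possible `status` strings are not listed here, it relies on the `valid` flag rather than matching specific status values.

```python
def license_problems(status, valid, allowed_nodes, current_nodes, error):
    """Interpret one row of the license query against sys.cluster.

    Arguments follow the SELECT column order. Returns a list of
    human-readable problems; an empty list means the license looks healthy.
    """
    problems = []
    if not valid:
        problems.append(f"license not valid (status={status!r})")
    if allowed_nodes is not None and current_nodes is not None and current_nodes > allowed_nodes:
        problems.append(f"{current_nodes} nodes exceed the licensed {allowed_nodes}")
    if error:
        problems.append(f"license error: {error}")
    return problems
```

Usage: `license_problems(*row)` on the single row returned; the node-count check is worth running before adding nodes as well as after.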