
monkdb.yaml / monkdb.yml (Complete Reference)

This page contains the complete, annotated MonkDB configuration file.

Complete configuration template

######################## MonkDB Configuration File ##########################

# The default configuration lets you use MonkDB right away. The purpose of
# this file is to give operators an overview of the various configuration
# settings that can be applied to MonkDB.

# Use this file to fine-tune your MonkDB cluster for resiliency, speed, and security.

################################ Quick Settings ##############################

# Recommended memory settings:
# - Set the `MONKDB_HEAP_SIZE` environment variable to 25% of your total memory
# (e.g., 16G, but not exceeding ~30G for CompressedOops benefits);
# update `/etc/default/monkdb` or `/etc/sysconfig/monkdb` based on your OS.
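#
# For example, a hypothetical entry for a machine with 64G of RAM, placed in
# `/etc/default/monkdb` (Debian/Ubuntu) or `/etc/sysconfig/monkdb` (RHEL):
# MONKDB_HEAP_SIZE=16G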
#
#
# - disable swapping
#bootstrap.memory_lock: true

# MonkDB supports parallel storage across multiple volumes;
# ensure the owner is set to `monkdb:monkdb`.
#path.data: /path/to/data1,/path/to/data2

# Clustering settings allow delaying recovery until a specified number
# of data nodes are available, preventing unnecessary replica creation;
# the expected node count also triggers health check warnings if the actual
# number differs.
#gateway.expected_data_nodes: 5
#gateway.recover_after_data_nodes: 3

# Bind the node to an IP address or network interface other than localhost,
# but ensure it is not exposed to the internet; options include a specific IP (e.g., 192.168.1.1),
# _local_ (Loopback addresses), _site_ (Private, site-local addresses),
# _global_ (Public, globally routable addresses), or a _[networkInterface]_ like eth0.
#network.host: _site_

# Specify the hosts that will form the MonkDB cluster, for cluster-level
# discovery purposes.
#discovery.seed_hosts:
#    - host1
#    - host2

# To initialize the cluster, specify the master-eligible nodes that will participate in the election process.
# Without this configuration, the cluster will be unable to elect an initial master node.
#
#cluster.initial_master_nodes:
#    - host1
#    - host2


################################# Full Settings ##############################

# The quick settings above cover most use cases, but the full list
# of configuration options is provided below. Any setting can be replaced
# with environment variables using `${...}` notation, for example:
#
#node.attr.rack: ${RACK_ENV_VAR}
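#
# Another hypothetical sketch: supplying the bind address at deploy time via
# an environment variable (`MONKDB_BIND_HOST` is an assumed, not predefined, name):
#network.host: ${MONKDB_BIND_HOST}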


#/////////////////////////// JMX Monitoring Plugin ///////////////////////////

# Enabling this switches on the `sys.jobs`, `sys.operations`, `sys.jobs_log` and
# `sys.operations_log` tables in MonkDB.
#stats.enabled: true

#//////////////////////// Database Administration ////////////////////////////

# Enable host-based authentication to allow authenticated access
# to MonkDB from specific hosts.
# The default value is `false`.
auth.host_based.enabled: true

# Client access and authentication are managed through the host-based configuration,
# which defines remote client access rules.
# The following example is a sane configuration that covers a common use case:
# * The predefined superuser monkdb has trusted access from localhost.
# * All other users must authenticate with a username and password from any location.
# Note: Authentication is only available via the PostgreSQL protocol, so
# non-local hosts cannot connect via HTTP with this setup.

auth:
  host_based:
    jwt:
    #     iss: http://example.com
    #     aud: example_aud
    config:
      0:
        user: monkdb
        address: _local_
        method: trust
      99:
        method: password
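
# As an illustrative extension of the example above (hypothetical user and
# subnet; assuming CIDR notation is accepted for `address`), an entry keyed
# below 99 would take precedence over the catch-all password rule:
#auth:
#  host_based:
#    config:
#      50:
#        user: app_user
#        address: 192.168.10.0/24
#        method: password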

# With trust-based authentication, the server accepts the username provided
# by the client without validation. For HTTP connections, the username is
# extracted from the `Authorization: Basic ...` header. If this header is
# missing, a default username can be specified as follows (with `docker run`,
# pass it as `-Cauth.trust.http_default_user=johndoe`):
#auth:
#  trust:
#    http_default_user: johndoe

#///////////////////////// User Defined Functions ////////////////////////////

# JavaScript for user-defined functions is enabled by default; to disable
# it, set the following option:
#lang.js.enabled: false

#/////////////////////////////////  SSL //////////////////////////////////////

# Enable encryption for HTTP endpoints to secure communication.
#ssl.http.enabled: true

# Enable encryption for the PostgreSQL wire protocol to secure data transmission:
#ssl.psql.enabled: true

# Specify the full path to the node keystore file.
#ssl.keystore_filepath: /path/to/keystore_file.jks

# Specify the password required to decrypt `keystore_file.jks`.
#ssl.keystore_password: myKeyStorePasswd

# Specify the password entered at the end of the `keytool -genkey`
# command if it differs from the keystore password.
#ssl.keystore_key_password: myKeyStorePasswd

# Optional configuration for truststore

# Specify the full path to the node truststore file.
#ssl.truststore_filepath: /path/to/truststore_file.jks

# Specify the password required to decrypt `truststore_file.jks`.
#ssl.truststore_password: myTrustStorePasswd

# Specify how frequently SSL files are monitored for changes.
#ssl.resource_poll_interval: 5s
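
# A minimal end-to-end TLS sketch combining the settings above (illustrative
# paths and passwords only):
#ssl.http.enabled: true
#ssl.psql.enabled: true
#ssl.keystore_filepath: /etc/monkdb/keystore.jks
#ssl.keystore_password: changeit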

################################### Cluster ##################################

# The cluster name is used for auto-discovery; ensure it is unique if
# running multiple clusters on the same network.
#cluster.name: monkdb

# The `graceful_stop` namespace configures the controlled shutdown of cluster nodes.
# It defines the minimum data availability required when a node stops. By default,
# only primary shards remain available, but options include `"full"` (ensuring replicas)
# or `"none"` (no guarantee).
#cluster.graceful_stop.min_availability: primaries
#
# Specify the duration to wait for the reallocation process to complete.
#cluster.graceful_stop.timeout: 2h
#
# The `force` setting enables a forced shutdown of a node if the graceful shutdown
# process exceeds `cluster.graceful_stop.timeout`.
#cluster.graceful_stop.force: false
#
# In most scenarios, allowing all types of shard allocations is recommended.
#cluster.routing.allocation.enable: all
#
# However, shard allocation can be restricted to specific types,
# such as during a rolling cluster upgrade.
#cluster.routing.allocation.enable: new_primaries

#################################### Node ####################################

# Node names are dynamically generated at startup, eliminating the need
# for manual configuration. However, you can assign a specific name if desired.
#node.name: "Piz Buin"

# Each node can be configured to allow or deny master eligibility and data storage.
# By default, nodes are eligible to be master nodes.
#node.master: true
#
# Allow this node to store data; this setting is enabled by default.
#node.data: true

# These settings allow you to design advanced cluster topologies.
#
# 1. To prevent this node from becoming a master and only store data,
#    configure it as a "workhorse" node.
#node.master: false
#node.data: true
#
# 2. To configure this node as a dedicated master, prevent it from
#    storing data, allowing it to focus solely on cluster coordination
#    and resource management.
#node.master: true
#node.data: false
#
# 3. To configure this node as a "search load balancer," disable both
#    master and data roles, allowing it to fetch data from nodes, aggregate
#    results, and distribute query loads efficiently.
#node.master: false
#node.data: false

# A node can have custom attributes assigned as key-value pairs, which can
# be used for shard allocation filtering or allocation awareness.
# Example: `node.attr.key: value`.
#node.attr.rack: rack314

# This setting determines whether memory-mapping is allowed; the default value is `true`.
#node.store.allow_mmap: true

#################################### Paths ###################################

# Relative paths are resolved based on `MONKDB_HOME`, while absolute paths take precedence.


# Specify the path to the directory containing configuration files,
# including this file and `log4j2.properties`.
#path.conf: config

# Specify the path to the directory where table data for this node will be stored.
#path.data: data
#
# Multiple locations can be specified for data storage, enabling file-level
# striping similar to RAID 0. The system prioritizes locations with the most
# available free space during data creation. Example:
#path.data: /path/to/data1,/path/to/data2

# Complete path to log files:
#path.logs: logs

# An alternative syntax can be used for configuring path settings,
# allowing a structured format for defining the log and data directories.
#path:
#  logs: /var/log/monkdb
#  data: /var/lib/monkdb

# Specify the path to the directory where blob data for this node will be stored.
#blobs.path: blobs

# See also: path.repo (further down)

################################### Memory ###################################

# MonkDB performance degrades significantly if the JVM starts swapping;
# to prevent this, ensure it **never** swaps by setting this property
# to `true` to lock memory.
#bootstrap.memory_lock: true

# Ensure the machine has sufficient memory allocated for MonkDB while
# reserving enough for the operating system to function properly.
# You can allocate memory for MonkDB as follows:
#  - Set the `MONKDB_MIN_MEM` and `MONKDB_MAX_MEM` environment variables
#    (recommended to be equal). Alternatively, use `MONKDB_HEAP_SIZE`
#    to automatically set both to the same value.
#
# Ensure the MonkDB process can lock memory by setting `ulimit -l unlimited`.


############################## Network And HTTP ###############################

# By default, MonkDB binds to loopback addresses and listens on ports **4200-4300**
# for HTTP traffic and **4300-4400** for node-to-node communication. If a port is
# occupied, it automatically selects the next available one.

# In addition to IPv4 and IPv6 addresses, special values can be used:
# _local_ Any loopback addresses on the system, for example 127.0.0.1.
# _site_ Any site-local addresses on the system, for example 192.168.0.1.
# _global_ Any globally-scoped addresses on the system, for example 8.8.8.8.
# _[networkInterface]_  Addresses of a network interface, for example _en0_.

# Specify the bind address explicitly, using an IPv4, IPv6, or special value.
#network.bind_host: 192.168.0.1

# Specify the address that other nodes will use to communicate with this node.
# If not set, it is automatically determined, but it must be a valid IP address.
#network.publish_host: 192.168.0.1

# Specify both `bind_host` and `publish_host` to control where the node
# binds for incoming connections and how it advertises itself to other nodes.
#network.host: 192.168.0.1

# Specify a custom port for node-to-node communication; the default is **4300**.
#transport.tcp.port: 4300

# Enable compression for node-to-node communication; it is disabled by default.
#transport.tcp.compress: true

# Specify a custom port for HTTP traffic.
#http.port: 4200

# Specify a custom maximum allowed content length for HTTP requests:
#http.max_content_length: 100mb

################################### Gateway ##################################

# The gateway saves cluster metadata to disk whenever changes occur,
# ensuring persistence across full cluster restarts and recovery
# when nodes restart.

# Specify the minimum number of data nodes that must start before
# initiating cluster state recovery.
#gateway.recover_after_data_nodes: 2

# Specify the wait time before starting recovery after the required number of nodes
# (`gateway.recover_after_data_nodes`) have started.
#gateway.recover_after_time: 5m

# Specify the number of data nodes required for immediate cluster state
# recovery; this value should match the total number of data nodes in the cluster.
#gateway.expected_data_nodes: 3
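
# For example, in a cluster with three data nodes (illustrative values),
# recovery starts as soon as all three have joined, or five minutes after
# at least two have joined:
#gateway.expected_data_nodes: 3
#gateway.recover_after_data_nodes: 2
#gateway.recover_after_time: 5m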


############################ Recovery Throttling #############################

# These settings control shard allocation during initial recovery, replica assignment,
# rebalancing, and when adding or removing nodes.

# Specify the number of concurrent recoveries allowed per node:
#
# 1. Specify the number of concurrent recoveries allowed per node during the initial
#    recovery phase.
#cluster.routing.allocation.node_initial_primaries_recoveries: 4
#
# 2. Specify the number of concurrent recoveries allowed per node during node addition,
#    removal, or rebalancing.
#cluster.routing.allocation.node_concurrent_recoveries: 2

# Define the maximum data transfer rate for shard recovery per second; the default is **40MB**.
#indices.recovery.max_bytes_per_sec: 40mb

# Specify the wait time before retrying recovery after a cluster state sync issue occurs.
#indices.recovery.retry_delay_state_sync: 500ms

# Specify the wait time before retrying recovery after a network-related issue occurs.
#indices.recovery.retry_delay_network: 5s

# Define the time interval after which idle recoveries will be considered failed.
#indices.recovery.recovery_activity_timeout: 15m

# Define the timeout duration for internal requests during the recovery process.
#indices.recovery.internal_action_timeout: 15m

# Define the timeout for internal recovery requests that are expected to
# take a long duration.
#indices.recovery.internal_long_action_timeout: 30m

# Specify the number of file chunk requests that can be sent in parallel during
# recovery.
#indices.recovery.max_concurrent_file_chunks: 2


################################# Discovery ##################################

# The discovery mechanism enables nodes to locate each other within a cluster and
# elect a master node. By default, **unicast discovery** is used, allowing explicit
# control over which nodes participate in cluster discovery through pinging.
#discovery.seed_hosts:
#  - host1:port
#  - host2:port
#
# To debug the discovery process, configure a logger in **`config/log4j2.properties`**
# for detailed logging.

# To initialize the cluster, specify the master-eligible nodes. Otherwise, the cluster cannot
# elect an initial master node.
#
#cluster.initial_master_nodes: ["host1", "host2"]

#/////////////////////////// Discovery via DNS ///////////////////////////////

# Service discovery enables MonkDB to retrieve host information for unicast discovery
# using **SRV DNS records**.

# To enable **SRV discovery**, set the discovery type to `'srv'`.
#discovery.seed_providers: srv

# Service discovery requires a query to retrieve **SRV records**, typically
# formatted as `_service._protocol.fqdn`.
#discovery.srv.query: _monkdb._srv.example.com
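
# An illustrative SRV record (hypothetical DNS zone) that such a query might
# return, pointing at a node's transport port:
# _monkdb._srv.example.com. 3600 IN SRV 10 20 4300 node1.example.com.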

#////////////////////////////// EC2 Discovery ////////////////////////////////

# EC2 discovery enables MonkDB to find hosts for unicast discovery using the
# **AWS EC2 API**.

# To enable **EC2 discovery**, set the discovery type to `'ec2'`.
#discovery.seed_providers: ec2

# There are several methods to filter EC2 instances.
#
# Filter EC2 instances by security groups using their ID or name, ensuring
# that only instances associated with the specified group are utilized for
# unicast host discovery.
#discovery.ec2.groups: sg-example-1, sg-example-2
#
# Control whether all security groups (false) or just any security group (true)
# must be present for the instance to qualify for discovery.
#discovery.ec2.any_group: true
#
# Filter EC2 instances by availability zones, ensuring that only instances
# located within the specified zone are used for unicast host discovery.
#discovery.ec2.availability_zones:
#  - us-east-1
#  - us-west-1
#  - us-west-2
#  - ap-southeast-1
#  - ap-southeast-2
#  - ap-northeast-1
#  - eu-west-1
#  - eu-central-1
#  - sa-east-1
#  - cn-north-1
#
# EC2 instances for discovery can be filtered by tags using the discovery.ec2.tag. prefix
# followed by the tag name. For example, to filter instances with the environment tag
# set to dev, use the filter discovery.ec2.tag.environment=dev
#discovery.ec2.tag.environment: dev
#discovery.ec2.tag.<name>: <value>
#
# If you have your own compatible implementation of the EC2 API service, you can specify the
# endpoint to be used by providing a custom URI.
#discovery.ec2.endpoint: http://example.com/endpoint

#/////////////////////////////// Azure Discovery /////////////////////////////

# Azure discovery enables MonkDB to look up hosts for unicast discovery using the Azure API.

# To enable Azure discovery, set the discovery type to `azure`.
#discovery.seed_providers: azure

# You need to provide the resource group name of your Azure instances, which
# acts as a logical container for grouping related resources like virtual machines,
# storage accounts, and databases, enabling better management and governance.
#cloud.azure.management.resourcegroup.name: myrg

# The following configuration values must be provided for Active Directory authentication:
# 1. Azure Tenant ID: The unique identifier of your Azure Active Directory tenant.
# 2. Client ID: The application (or service principal) ID registered in Azure AD.
# 3. Client Secret: The secret key associated with the application for secure authentication.
# 4. Azure Subscription ID: The subscription ID for your Azure environment.
#cloud.azure.management.subscription.id: xxxxx.xxxx.xxx.xxx
#cloud.azure.management.tenant.id: xxxxxxxxxxx
#cloud.azure.management.app.id: xxxxxxxxxx
#cloud.azure.management.app.secret: my_password

# There are two methods of discovery in Azure:
# 1. The vnet method discovers all virtual machines within the same virtual network (VNet).
# 2. The subnet method discovers all virtual machines within the same subnet of a VNet.
#discovery.azure.method: vnet


############################# Routing Allocation #############################

# This setting controls shard allocation in MonkDB, with two options:
# 1. all: Allows all shard allocations. The cluster can allocate all types of shards,
#    including primary and replica shards.
# 2. new_primaries: Restricts allocations to new primary shards only. This means newly
#    added nodes will not receive replica shard allocations, while new primary shards
#    can still be allocated for new indices.

# Useful for zero-downtime cluster upgrades:
# Set to new_primaries before stopping the first node.
# Reset to all after starting the last updated node.

# This setting is part of the cluster-level shard allocation controls, which manage
# how MonkDB distributes shards across nodes for optimal performance and
# resource utilization.
#cluster.routing.allocation.enable: all

# Shard rebalancing in MonkDB can be controlled using the
# `cluster.routing.allocation.allow_rebalance` setting, with the following options:
# 1. always: Rebalancing is enabled at all times.
# 2. indices_primary_active: Rebalancing occurs only when all primary shards in
#    the cluster are active.
# 3. indices_all_active (default): Rebalancing happens only when all shards
# (primary and replica) are active, reducing unnecessary activity during initial recovery.
#cluster.routing.allocation.allow_rebalance: indices_all_active

# The number of concurrent rebalancing tasks allowed cluster-wide is controlled
# by the setting cluster.routing.allocation.cluster_concurrent_rebalance,
# which defaults to 2. This limit ensures that only two shard rebalancing tasks
# occur simultaneously to prevent resource overload and maintain cluster stability.
#cluster.routing.allocation.cluster_concurrent_rebalance: 2

# The number of initial recoveries of primary shards allowed per node is controlled
# by the setting `cluster.routing.allocation.node_initial_primaries_recoveries`.
# This setting defaults to 4, allowing up to 4 primary shard recoveries to occur in
# parallel on a single node. Since local gateway recoveries are typically fast,
# this value can be increased to handle more recoveries per node without
# overloading the system.
#cluster.routing.allocation.node_initial_primaries_recoveries: 4

# The number of concurrent recoveries allowed on a node is controlled by
# the `cluster.routing.allocation.node_concurrent_recoveries` setting,
# which defaults to 2. This includes both incoming and outgoing shard recoveries.


################################## Awareness #################################

# Cluster allocation awareness in MonkDB allows you to configure shard
# and replica allocation across generic attributes associated with nodes, such
# as racks or availability zones. By specifying awareness attributes (e.g., rack_id or zone),
# MonkDB ensures that primary and replica shards are distributed across different
# nodes with distinct attribute values, enhancing fault tolerance and minimizing
# the risk of data loss during failures.

# To define node attributes for shard allocation awareness, you can use the
# `cluster.routing.allocation.awareness.attributes` setting. For example,
# to ensure that a shard and its replicas are not allocated to nodes with
# the same rack_id value:
# 1. Set Node Attributes: Assign a custom attribute (e.g., rack_id) to each node
#    in the monkdb.yml file or via startup parameters.
# 2. Enable Awareness: Configure the cluster to consider the attribute by setting:
# `cluster.routing.allocation.awareness.attributes: rack_id`
# This ensures shards and their replicas are distributed across nodes with different
# `rack_id` values, enhancing fault tolerance.
#
# The awareness attributes can hold several values.
#cluster.routing.allocation.awareness.attributes:

# To force shard allocation based on node attributes, use the
# `cluster.routing.allocation.awareness.force.*` settings. This
# ensures that shards and replicas are allocated only to nodes with
# specific attribute values, preventing over-allocation in a single
# group of nodes.
#cluster.routing.allocation.awareness.force.<attribute>.values:
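
# A concrete illustrative sketch (hypothetical zone values): tag each node
# with a zone, make allocation zone-aware, and prevent over-allocation when
# one of the two listed zones is unavailable:
#node.attr.zone: us-east-1a
#cluster.routing.allocation.awareness.attributes: zone
#cluster.routing.allocation.awareness.force.zone.values: us-east-1a,us-east-1b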


############################### Balanced Shards ##############################

# The weight factor for shards allocated on a node is defined by the setting
# `cluster.routing.allocation.balance.shard`, which is a float value that influences
# shard distribution to ensure balanced workloads across nodes.
#cluster.routing.allocation.balance.shard: 0.45f

# The factor controlling the number of shards per index allocated on a specific
# node is defined by the setting `cluster.routing.allocation.balance.index`,
# which is a float value. This setting helps balance shard distribution across
# nodes for individual indices.
#cluster.routing.allocation.balance.index: 0.5f

# In MonkDB, the settings cluster.routing.allocation.balance.shard and
# cluster.routing.allocation.balance.index cannot both be set to 0.0f, as this would
# disable the balancing logic for shard allocation and indexing, potentially
# leading to an unbalanced cluster state where shards are not distributed effectively
# across nodes.


# The weight factor for the number of primary shards of a specific index allocated on a
# node is defined by the setting `cluster.routing.allocation.balance.primary`, which is a
# float value. This setting influences how MonkDB balances the allocation of
# primary shards across nodes, ensuring that no single node becomes overloaded with
# primary shards from a particular index.

# The setting `cluster.routing.allocation.balance.threshold` defines the minimal
# improvement in balance weight that a rebalance operation must achieve before it
# is performed; it is a non-negative float, defaulting to 1.0f. Raising this value
# makes the cluster less aggressive about rebalancing, reducing shard movement
# overhead at the cost of a potentially less even distribution.
#cluster.routing.allocation.balance.threshold: 1.0f


####################### Cluster-Wide Allocation Filtering ####################

# To place new shards only on nodes where one of the specified values matches
# an attribute, use the cluster.routing.allocation.include.<attribute> setting.
#cluster.routing.allocation.include.<attribute>:

# To place new shards only on nodes where none of the specified values matches an attribute,
# use the `cluster.routing.allocation.exclude.<attribute>` setting
#cluster.routing.allocation.exclude.<attribute>:

# The setting cluster.routing.allocation.require.<attribute> specifies rules for
# shard allocation where all rules must match for a node to be eligible to
# host a shard. This contrasts with the include setting, which allocates shards
# if any rule matches.
#cluster.routing.allocation.require.<attribute>:
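
# Concrete illustrative examples (hypothetical attribute names and values):
#cluster.routing.allocation.include.zone: us-east-1a,us-east-1b
#cluster.routing.allocation.exclude.rack: rack314
#cluster.routing.allocation.require.storage: ssd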


########################## Disk-based Shard Allocation #######################

# To prevent shard allocation on nodes based on disk usage, MonkDB provides
# the setting `cluster.routing.allocation.disk.threshold_enabled`, which is enabled
# by default (true). This setting ensures that disk-based shard allocation decisions
# are made to avoid overloading nodes with insufficient disk space.
#cluster.routing.allocation.disk.threshold_enabled: true

# The setting cluster.routing.allocation.disk.watermark.low defines the
# lower disk threshold limit for shard allocation in MonkDB. By default,
# it is set to 85%, meaning that new shards will not be allocated to nodes with
# more than 85% disk usage. Alternatively, it can also be set to an absolute value,
# such as 500mb, to prevent shard allocation on nodes with less than the specified
# free disk space.
#cluster.routing.allocation.disk.watermark.low: 85%

# The setting cluster.routing.allocation.disk.watermark.high defines the higher disk
# threshold limit for shard allocation in MonkDB. By default, it is set to 90%,
# meaning:
# 1. **Relocation Trigger**: If a node's disk usage exceeds 90%, MonkDB will attempt
#    to relocate shards from that node to other nodes with sufficient disk space.
# 2. **New Shard Allocation Block**: New shards will not be allocated to nodes exceeding
#    this threshold.
# This value can also be set to an absolute amount of free disk space (e.g., 500mb)
# instead of a percentage. Adjusting this setting helps prevent nodes from running
# out of disk space and ensures cluster stability by redistributing shards as needed.
#cluster.routing.allocation.disk.watermark.high: 90%

# The setting cluster.routing.allocation.disk.watermark.flood_stage in MonkDB
# defines the threshold at which a read-only block is enforced on every index that
# has at least one shard (primary or replica) allocated on a node where disk usage
# exceeds this value. By default, it is set to 95%.
#cluster.routing.allocation.disk.watermark.flood_stage: 95%
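
# An illustrative configuration using absolute free-space values instead of
# percentages (hypothetical sizes; each value is the minimum free space
# required, so thresholds decrease from low to flood_stage):
#cluster.routing.allocation.disk.watermark.low: 100gb
#cluster.routing.allocation.disk.watermark.high: 50gb
#cluster.routing.allocation.disk.watermark.flood_stage: 10gb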


########################## Field Data Circuit Breaker #########################

# The field data circuit breaker in MonkDB estimates the memory required for
# loading field data into memory, helping prevent out-of-memory errors.

# The setting indices.fielddata.breaker.limit specifies the maximum amount of
# memory that can be allocated for field data in MonkDB. By default,
# this limit is set to 40% of the JVM heap, but it can be adjusted based on
# specific use cases and resource availability.
#indices.fielddata.breaker.limit: 60%

# The setting indices.fielddata.breaker.overhead is a constant used by MonkDB to
# multiply the theoretical memory estimation for field data to calculate the final
# memory requirement. By default, this value is set to 1.03, which means a 3%
# overhead is added to the estimated memory usage.
#indices.fielddata.breaker.overhead: 1.03
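#
# For example, with an overhead of 1.03, field data estimated at 1gb is
# charged as 1gb * 1.03 ≈ 1.03gb against the breaker limit.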


################################# Threadpools ################################

# MonkDB nodes use several thread pools to manage tasks efficiently and optimize
# resource usage.
#thread_pool.index.type: fixed
#thread_pool.index.queue_size: 200


################################## Metadata ##################################

# The setting cluster.info.update.interval in MonkDB defines how often the
# cluster collects metadata information, such as disk usage, if no specific event
# triggers an update. By default, it is set to 30s, meaning the cluster will refresh
# its metadata every 30 seconds.
#cluster.info.update.interval: 30s


################################## GC Logging ################################

#monitor.jvm.gc.collector.young.warn: 1000ms
#monitor.jvm.gc.collector.young.info: 700ms
#monitor.jvm.gc.collector.young.debug: 400ms

#monitor.jvm.gc.collector.old.warn: 10s
#monitor.jvm.gc.collector.old.info: 5s
#monitor.jvm.gc.collector.old.debug: 2s


###################################### SQL ####################################

# The setting node.sql.read_only in MonkDB determines whether SQL statements that
# result in modification operations (e.g., INSERT, UPDATE, DELETE) are allowed
# on the node.
#node.sql.read_only: false

# To execute SQL DML operations over a large number of rows in MonkDB, such
# as INSERT FROM SUBQUERY, UPDATE, or COPY FROM, you can increase the timeout to
# ensure the operation completes successfully, even on slower hardware or under
# heavy cluster load.
#bulk.request_timeout: 1m


######################### SQL Query Circuit Breaker ##########################

# The query circuit breaker in MonkDB estimates the memory required for executing
# queries and prevents excessive memory usage that could lead to OutOfMemoryError.
# It is part of the request circuit breaker, which specifically tracks memory usage
# for queries and aggregations.

# The setting indices.breaker.query.limit specifies the memory limit for the query
# circuit breaker in MonkDB. This circuit breaker prevents queries from consuming
# excessive memory, which could lead to performance issues or OutOfMemoryError.
#indices.breaker.query.limit: 60%

# The setting indices.breaker.query.overhead in MonkDB defines a constant
# multiplier applied to the estimated memory usage of a query to determine the final
# memory requirement. This overhead accounts for inaccuracies in memory estimation and
# ensures the circuit breaker trips before the actual memory usage exceeds the
# configured limit.
#indices.breaker.query.overhead: 1.09


##################################### UDC ####################################

# Usage Data Collection
#
# If enabled, MonkDB sends usage data to the URL configured in the setting
# `udc.url`. The usage data sent does not contain any confidential information.

# Enable or disable usage data collection entirely:
#udc.enabled: true

# The delay before the first ping is sent after start-up; configure it based
# on your requirements.
#udc.initial_delay: 10m

# The setting udc.interval specifies the interval at which usage data pings
# are sent.
#udc.interval: 24h

# The setting udc.url specifies the URL to which usage data pings are sent.
#udc.url: https://udc.monkdb.com/


############################# BACKUP / RESTORE ###############################

# To configure the paths where repositories of type fs (file system) may be created in MonkDB,
# you can use the path.repo setting.
#path.repo: /path/to/shared/fs,/other/shared/fs

# The configuration for URL repositories in MonkDB allows specifying a
# list of URLs that can be used with the URL repository type. This setting
# is crucial for defining where snapshots can be stored and retrieved from
# when using URL-based repositories.
#
# Supported protocols are: "http", "https", "ftp", "file", and "jar",
# although only "http", "https", and "ftp" need to be listed here for use
# in URL repositories.
# "file" URLs must be prefixed with an entry configured in `path.repo`.
#repositories.url.allowed_urls: ["http://example.org/root/*", "https://*.mydomain.com/*?*#*"]

###################### POSTGRES WIRE PROTOCOL SUPPORT ########################

# MonkDB supports the PostgreSQL wire protocol v3 and emulates a PostgreSQL
# server version 10.5. This compatibility allows users to connect to MonkDB
# using tools and libraries designed for PostgreSQL, enabling seamless integration
# with existing PostgreSQL-based workflows and ecosystems.
#psql.enabled: true
#psql.port: 5432
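
# For example, connecting with the stock `psql` client (illustrative host,
# using the predefined `monkdb` superuser):
# psql -h monkdb-node1 -p 5432 -U monkdb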

# -------------------------
# Gremlin Policy (Hard Caps)
# -------------------------
# Maximum traversal depth allowed (for bounded repeat). Deeper requests are rejected.
# gremlin.http.max_depth: 6
# Hard upper cap on returned rows per Gremlin request.
# gremlin.http.max_rows: 500
# Maximum execution timeout, in milliseconds, allowed for Gremlin requests (here, 5 seconds).
# gremlin.http.timeout_ms: 5000
# Blocks repeat(...) without .times(n).
# gremlin.http.deny_unbounded_repeat: true

# Power-user override example (trusted internal service account)
# Per-user override: graph_service can use depth up to 8.
# gremlin.http.user.graph_service.max_depth: 8
# Per-user override: graph_service can return up to 1500 rows.
# gremlin.http.user.graph_service.max_rows: 1500
# Per-user override: graph_service can run up to 8 seconds.
# gremlin.http.user.graph_service.timeout_ms: 8000
# Keeps unbounded repeat blocked for that user too.
# gremlin.http.user.graph_service.deny_unbounded_repeat: true

# ---------------------------------
# Admission Control (Fanout Defense)
# ---------------------------------
# Planner assumption for neighbor expansion factor used in cost estimation.
# gremlin.http.admission.assumed_fanout: 12
# Rejects queries if estimated produced rows exceed this.
# gremlin.http.admission.max_estimated_rows: 200000
# Rejects queries if estimated total work exceeds this.
# gremlin.http.admission.max_estimated_work: 800000
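
# Worked example under the illustrative values above, assuming the planner
# multiplies the assumed fanout once per hop: a depth-4 traversal is estimated
# at 12^4 = 20,736 produced rows and passes max_estimated_rows (200,000),
# while a depth-5 traversal estimates 12^5 = 248,832 rows and is rejected.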

# ----------------------------------------
# Alert Thresholds (Operational SLO Signals)
# ----------------------------------------
# Alert if policy-rejected requests ratio goes above 10%.
# gremlin.http.alerts.policy_rejected_ratio_threshold: 0.10
# Don’t evaluate that ratio alert until at least 50 requests observed.
# gremlin.http.alerts.policy_rejected_min_requests: 50
# Alert if guardrail-rejected requests ratio goes above 3%
# gremlin.http.alerts.guardrail_rejected_ratio_threshold: 0.03
# Don’t evaluate guardrail ratio until at least 50 requests.
# gremlin.http.alerts.guardrail_rejected_min_requests: 50
# Alert if parse-cache miss ratio exceeds 60%
# gremlin.http.alerts.parse_cache_miss_ratio_threshold: 0.60
# Don’t evaluate cache miss ratio until at least 100 cache samples (hits+misses).
# gremlin.http.alerts.parse_cache_miss_min_samples: 100
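
# Worked example under the illustrative thresholds above: 8 policy-rejected
# requests out of 60 observed gives a ratio of ~0.13 > 0.10, and 60 >= 50,
# so the alert fires; with only 40 observed requests, the ratio is not
# evaluated at all.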

# ----------------------------------------
# Audit (policy decisions)
# ----------------------------------------
# audit.enabled: true
# audit.sink.mode: async
# audit.sink.queue_size: 200000
# audit.sink.batch_size: 1024
# audit.sink.flush_interval_ms: 1000
# audit.sink.spool.enabled: true
# audit.sink.drop_on_full: true
# audit.sink.sample_rate: 1.0

# Durable audit index (replicated)
# audit.sink.index.enabled: true
# audit.sink.index.name: policy_audit_events
# audit.sink.index.shards: 1
# audit.sink.index.replicas: "0-1"
# audit.sink.index.refresh_interval: 30s
# audit.sink.index.partition_by_day: true

# Archive (hot/cold)
# audit.archive.enabled: true
# audit.archive.repository: audit_repo   # CREATE REPOSITORY ... name
# audit.archive.interval: 1h
# audit.archive.max_age: 7d
# audit.archive.max_partitions_per_run: 50
# audit.archive.snapshot_prefix: policy_audit_archive
# audit.archive.delete_after_snapshot: true
# audit.archive.wait_for_completion: true
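
# The repository named above must already exist; an illustrative creation
# statement (assuming an fs-type repository at a location listed in `path.repo`):
# CREATE REPOSITORY audit_repo TYPE fs WITH (location = '/path/to/shared/fs');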

# ----------------------------------------
# Lineage
# ----------------------------------------
# lineage.enabled: true
# lineage.sink.mode: async
# lineage.sink.queue_size: 16384
# lineage.durability.mode: durable
# lineage.retention: 24h

# Optional: lineage projection tables
# lineage.sink.index.enabled: true
# lineage.sink.index.jobs_table: lineage_jobs_store
# lineage.sink.index.edges_table: lineage_edges_store
# lineage.sink.index.shards: 1
# lineage.sink.index.replicas: "0-1"
# lineage.sink.index.partition_by: day

Usage notes

  • The file name in deployments is typically 'monkdb.yml' (the '.yaml' extension is equivalent).
  • Prefer keeping the baseline configuration in this file and applying environment-specific overrides via CLI flags or orchestration values.
  • For Docker startup flow, see Provisioning with Docker Image.
  • For argument mapping and runtime flags, see Cluster Arguments Reference.