CREATE ANALYSER

Enterprise command reference.

Command Snapshot

Field	Value
Category	DDL and Administration
Mutates Data	Yes/Depends
Scope	Cluster / Object
Privilege Model	Requires DDL/administrative privilege according to target object scope.

Purpose

Defines, changes, or removes schema and metadata objects.

Syntax

CREATE ANALYZER analyzer_name EXTENDS parent_analyzer_name
    WITH ( override_parameter [= value] [, ... ] )

Operational Notes

Use schema-qualified identifiers in automation and automation pipelines.
Validate behavior in staging for cluster-impacting or governance-impacting changes.
Confirm runtime effects through system tables and metrics before and after execution.

When to Use

Use during planned schema and runtime administration changes.
Use in automation pipelines with environment-specific validation and rollback strategy.

When Not to Use

Avoid during incident windows unless the command is part of approved mitigation.
Avoid schema changes in peak traffic windows without staged rollout.

Common Errors and Troubleshooting

Symptom	Likely Cause	Action
Permission denied / unauthorized	Missing privilege on object or cluster scope	Re-run with required grants or elevated admin role.
Analysis/parse error	Syntax variant or object shape mismatch	Compare with canonical syntax and object definition.
Runtime failure under load	Resource limits, breaker pressure, or node state transitions	Check `sys.jobs`, `sys.operations`, `sys.checks`, and retry after mitigation.

Cross-References

Detailed Reference

Defines a new fulltext analyzer for use in text search operations. Analyzers process text data into tokens for efficient indexing and querying.

SQL Statement

Extending an Existing Analyzer

CREATE ANALYZER analyzer_name EXTENDS parent_analyzer_name
    WITH ( override_parameter [= value] [, ... ] )

Creating a New Analyzer

CREATE ANALYZER analyzer_name (
    [ TOKENIZER
      {
          tokenizer_name
        | custom_name WITH ( type = tokenizer_name, tokenizer_parameter [= value] [, ... ] )
      }
    ]
    [ TOKEN_FILTERS (
        {
            token_filter_name
          | custom_name WITH ( type = token_filter_name, token_filter_parameter [= value] [, ... ] )
        }
        [, ... ]
      )
    ]
    [ CHAR_FILTERS (
        {
            char_filter_name
          | custom_name WITH ( type = char_filter_name, char_filter_parameter [= value] [, ... ] )
        }
        [, ... ]
      )
    ]
)

Parameters

Parameter	Description
analyzer_name	Unique name for the analyzer.
parent_analyzer_name	Base analyzer to inherit from (for `EXTENDS`).
override_parameter	Parameter of the parent analyzer to override.
tokenizer_name	Built-in tokenizer name (e.g., `standard`, `ngram`).
tokenizer_parameter	Tokenizer-specific configuration (e.g., `min_gram`, `max_gram`).
token_filter_name	Built-in token filter (e.g., `lowercase`, `kstem`).
token_filter_parameter	Token filter configuration.
char_filter_name	Built-in character filter (e.g., `html_strip`, `mapping`).
char_filter_parameter	Character filter configuration.
custom_name	Unique identifier for a tokenizer/token-filter/char-filter with parameters.

Description

Key Features:

Tokenizers: Split text into tokens (e.g., by whitespace, n-grams).
Token Filters: Modify tokens (e.g., lowercase, stemming).
Char Filters: Pre-process text (e.g., strip HTML tags, replace characters).

Examples

Basic Analyzer with Built-in Components

Creates an analyzer that: 1. Tokenizes using whitespace. 2. Converts tokens to lowercase. 3. Applies the kstem filter for English stemming. 4. Strips HTML tags.

CREATE ANALYZER testanalyzer (
    TOKENIZER whitespace,
    TOKEN_FILTERS (lowercase, kstem),
    CHAR_FILTERS (html_strip)
);

Custom Tokenizer with Parameters

Creates an analyzer using a custom ngram tokenizer with parameters:

CREATE ANALYZER my_ngram_analyzer (
    TOKENIZER my_ngram WITH (
        type = 'ngram',
        min_gram = 2,
        max_gram = 10
    )
);

Extending a Built-in Analyzer

Extends the snowball analyzer to use German language rules:

CREATE ANALYZER "german_snowball" EXTENDS snowball WITH (
    language = 'german'
);

Custom Char Filter with Mappings

Replaces ph → f, qu → q, etc.:

CREATE ANALYZER mymapping_analyzer (
    CHAR_FILTERS (
        mymapping WITH (
            type = 'mapping',
            mappings = ['ph=>f', 'qu=>q', 'foo=>bar']
        )
    )
);

Full Example with All Components

CREATE ANALYZER advanced_analyzer (
    TOKENIZER whitespace,
    TOKEN_FILTERS (
        lowercase,
        custom_stemmer WITH (type = 'kstem')
    ),
    CHAR_FILTERS (
        html_strip,
        my_mapping WITH (
            type = 'mapping',
            mappings = ['I=>1', 'II=>2']
        )
    )
);

Extending a Custom Analyzer

Overrides the tokenizer of myanalyzer while retaining its other components:

CREATE ANALYZER e2 EXTENDS myanalyzer (
    TOKENIZER mypattern WITH (
        type = 'pattern',
        pattern = '.*'
    )
);

Important Notes

Reserved Names: Avoid using built-in tokenizer/filter names directly. Use custom_name WITH (type = 'builtin_name').
Existing Tables: Existing tables continue using the old analyzer definition. Reindexing is required to apply changes.
Validation: Use EXPLAIN ANALYZE to test analyzer behavior before production use.

Built-in Components

Commonly used components for reference:

Tokenizers

standard: Splits text into words.
ngram: Splits text into n-grams (configure with min_gram/max_gram).
whitespace: Splits text on whitespace.

Token Filters

lowercase: Converts tokens to lowercase.
kstem: English-language stemming.
stop: Removes stopwords.

Char Filters

html_strip: Removes HTML tags.
mapping: Replaces characters via a map (e.g., ph=>f).

Notes

Analyzer Uniqueness: Analyzer names must be unique within the cluster. Attempting to create an analyzer with an existing name will result in an error.
Reindexing Requirement: Changes to analyzer definitions do not automatically affect existing data. Reindexing is required for changes to take effect on previously indexed documents.
Component Naming Best Practices:
Avoid reusing names of built-in components directly. Use a custom name and assign a type.
Example: sql TOKENIZER my_tokenizer WITH (type = 'standard')
Dynamic Use in Table Schema:
Analyzers can be assigned to fulltext columns in table definitions using INDEX USING fulltext WITH (analyzer = 'my_analyzer').
Validation & Testing:
Use EXPLAIN ANALYZE or sample indexing workflows to test custom analyzer behavior before applying in production.

Permissions

Execution Rights: The user must have the DDL (Data Definition Language) privilege to execute CREATE ANALYZER.
Cluster Scope: Analyzer definitions are global to the cluster and available to all users and schemas once created.

Summary

Feature	Supported
Custom Analyzers	Yes Yes
Extend Built-in Analyzers	Yes Yes (via `EXTENDS`)
Override Built-in Components	Yes Yes (tokenizer, token filters, etc.)
Use in Fulltext Columns	Yes Yes
Reindex Required for Updates	Yes Yes
Requires DDL Privilege	Yes Yes
Global Scope	Yes Yes (cluster-wide availability)