CREATE ANALYZER
Enterprise command reference.
Command Snapshot
| Field | Value |
|---|---|
| Category | DDL and Administration |
| Mutates Data | No (DDL only; creates cluster metadata) |
| Scope | Cluster |
| Privilege Model | Requires the DDL (administrative) privilege; analyzer definitions are cluster-wide. |
Purpose
Defines a new fulltext analyzer that converts text into tokens for fulltext indexing and search.
Syntax
CREATE ANALYZER analyzer_name EXTENDS parent_analyzer_name
WITH ( override_parameter [= value] [, ... ] )
Operational Notes
- Use explicit, unique analyzer names in scripts and automation pipelines; analyzer names are cluster-wide.
- Validate behavior in staging for cluster-impacting or governance-impacting changes.
- Confirm runtime effects through system tables and metrics before and after execution.
When to Use
- Use during planned schema and runtime administration changes.
- Use in automation pipelines with environment-specific validation and rollback strategy.
When Not to Use
- Avoid during incident windows unless the command is part of approved mitigation.
- Avoid schema changes in peak traffic windows without staged rollout.
Common Errors and Troubleshooting
| Symptom | Likely Cause | Action |
|---|---|---|
| Permission denied / unauthorized | Missing privilege on object or cluster scope | Re-run with required grants or elevated admin role. |
| Analysis/parse error | Syntax variant or object shape mismatch | Compare with canonical syntax and object definition. |
| Runtime failure under load | Resource limits, breaker pressure, or node state transitions | Check sys.jobs, sys.operations, sys.checks, and retry after mitigation. |
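The system tables referenced above can be queried directly during troubleshooting. A minimal sketch, assuming CrateDB's sys schema (column selections are illustrative):

```sql
-- Inspect currently running statements before and after the DDL change
SELECT id, stmt, started
FROM sys.jobs;

-- List failed cluster checks that may explain runtime failures
SELECT id, severity, description
FROM sys.checks
WHERE NOT passed;
```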
Detailed Reference
Defines a new fulltext analyzer for use in text search operations. Analyzers process text data into tokens for efficient indexing and querying.
SQL Statement
Extending an Existing Analyzer
CREATE ANALYZER analyzer_name EXTENDS parent_analyzer_name
WITH ( override_parameter [= value] [, ... ] )
Creating a New Analyzer
CREATE ANALYZER analyzer_name (
[ TOKENIZER
{
tokenizer_name
| custom_name WITH ( type = tokenizer_name, tokenizer_parameter [= value] [, ... ] )
}
]
[ TOKEN_FILTERS (
{
token_filter_name
| custom_name WITH ( type = token_filter_name, token_filter_parameter [= value] [, ... ] )
}
[, ... ]
)
]
[ CHAR_FILTERS (
{
char_filter_name
| custom_name WITH ( type = char_filter_name, char_filter_parameter [= value] [, ... ] )
}
[, ... ]
)
]
)
Parameters
| Parameter | Description |
|---|---|
| analyzer_name | Unique name for the analyzer. |
| parent_analyzer_name | Base analyzer to inherit from (for EXTENDS). |
| override_parameter | Parameter of the parent analyzer to override. |
| tokenizer_name | Built-in tokenizer name (e.g., standard, ngram). |
| tokenizer_parameter | Tokenizer-specific configuration (e.g., min_gram, max_gram). |
| token_filter_name | Built-in token filter (e.g., lowercase, kstem). |
| token_filter_parameter | Token filter configuration. |
| char_filter_name | Built-in character filter (e.g., html_strip, mapping). |
| char_filter_parameter | Character filter configuration. |
| custom_name | Unique identifier for a tokenizer/token-filter/char-filter with parameters. |
Description
Key Features:
- Tokenizers: Split text into tokens (e.g., by whitespace, n-grams).
- Token Filters: Modify tokens (e.g., lowercase, stemming).
- Char Filters: Pre-process text (e.g., strip HTML tags, replace characters).
Examples
Basic Analyzer with Built-in Components
Creates an analyzer that:
1. Tokenizes using whitespace.
2. Converts tokens to lowercase.
3. Applies the kstem filter for English stemming.
4. Strips HTML tags.
CREATE ANALYZER testanalyzer (
TOKENIZER whitespace,
TOKEN_FILTERS (lowercase, kstem),
CHAR_FILTERS (html_strip)
);
Custom Tokenizer with Parameters
Creates an analyzer using a custom ngram tokenizer with parameters:
CREATE ANALYZER my_ngram_analyzer (
TOKENIZER my_ngram WITH (
type = 'ngram',
min_gram = 2,
max_gram = 10
)
);
Extending a Built-in Analyzer
Extends the snowball analyzer to use German language rules:
CREATE ANALYZER "german_snowball" EXTENDS snowball WITH (
language = 'german'
);
Custom Char Filter with Mappings
Replaces ph → f, qu → q, etc.:
CREATE ANALYZER mymapping_analyzer (
CHAR_FILTERS (
mymapping WITH (
type = 'mapping',
mappings = ['ph=>f', 'qu=>q', 'foo=>bar']
)
)
);
Full Example with All Components
CREATE ANALYZER advanced_analyzer (
TOKENIZER whitespace,
TOKEN_FILTERS (
lowercase,
custom_stemmer WITH (type = 'kstem')
),
CHAR_FILTERS (
html_strip,
my_mapping WITH (
type = 'mapping',
mappings = ['I=>1', 'II=>2']
)
)
);
Extending a Custom Analyzer
Overrides the tokenizer of myanalyzer while retaining its other components:
CREATE ANALYZER e2 EXTENDS myanalyzer (
TOKENIZER mypattern WITH (
type = 'pattern',
pattern = '.*'
)
);
Important Notes
- Reserved Names: Avoid using built-in tokenizer/filter names directly. Use custom_name WITH (type = 'builtin_name') instead.
- Existing Tables: Existing tables continue using the old analyzer definition. Reindexing is required to apply changes.
- Validation: Use EXPLAIN ANALYZE to test analyzer behavior before production use.
Built-in Components
Commonly used components for reference:
Tokenizers
- standard: Splits text into words.
- ngram: Splits text into n-grams (configure with min_gram/max_gram).
- whitespace: Splits text on whitespace.
Token Filters
- lowercase: Converts tokens to lowercase.
- kstem: English-language stemming.
- stop: Removes stopwords.
Char Filters
- html_strip: Removes HTML tags.
- mapping: Replaces characters via a map (e.g., ph=>f).
Notes
- Analyzer Uniqueness: Analyzer names must be unique within the cluster. Attempting to create an analyzer with an existing name will result in an error.
- Reindexing Requirement: Changes to analyzer definitions do not automatically affect existing data. Reindexing is required for changes to take effect on previously indexed documents.
- Component Naming Best Practices:
  - Avoid reusing names of built-in components directly. Use a custom name and assign a type.
  - Example: TOKENIZER my_tokenizer WITH (type = 'standard')
- Dynamic Use in Table Schema:
  - Analyzers can be assigned to fulltext columns in table definitions using INDEX USING fulltext WITH (analyzer = 'my_analyzer').
- Validation & Testing:
  - Use EXPLAIN ANALYZE or sample indexing workflows to test custom analyzer behavior before applying in production.
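As a usage sketch, an analyzer such as the testanalyzer defined earlier can be attached to a fulltext-indexed column and queried with MATCH. The table and column names here are hypothetical:

```sql
-- Hypothetical table whose fulltext index uses the custom analyzer
CREATE TABLE articles (
  id INTEGER PRIMARY KEY,
  body TEXT,
  INDEX body_ft USING FULLTEXT (body) WITH (analyzer = 'testanalyzer')
);

-- Fulltext query against the analyzed index
SELECT id, body
FROM articles
WHERE MATCH(body_ft, 'search terms');
```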
Permissions
- Execution Rights: The user must have the DDL (Data Definition Language) privilege to execute CREATE ANALYZER.
- Cluster Scope: Analyzer definitions are global to the cluster and available to all users and schemas once created.
Summary
| Feature | Supported |
|---|---|
| Custom Analyzers | Yes |
| Extend Built-in Analyzers | Yes (via EXTENDS) |
| Override Built-in Components | Yes (tokenizer, token filters, etc.) |
| Use in Fulltext Columns | Yes |
| Reindex Required for Updates | Yes |
| Requires DDL Privilege | Yes |
| Global Scope | Yes (cluster-wide availability) |