Writing Custom Comparators¶
This section explains how to extend genschema with your own comparators. Comparators are small, focused components that add or adjust fields in the generated JSON Schema.
Overview¶
A comparator is a class that implements two methods:
can_process(ctx, env, prev_result) -> bool
process(ctx, env, prev_result) -> (updates, alternatives)
Comparators are executed for every node of the schema tree. The pipeline
passes the current context and the partial schema node being built. The
core TypeComparator always runs first to establish the node type,
then all registered comparators run in registration order. The
comparator can:
Add or update fields in the current node.
Provide alternative nodes (for anyOf/oneOf).
Mark fields for deletion using ToDelete.
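The flow described above can be pictured with a small stand-alone model. This is only a sketch, not genschema's actual pipeline: `run_node`, `TypeStub`, and `TitleStub` are hypothetical names, and the merge step is an assumption based on the description in this section.

```python
from typing import Any, Optional

Node = dict[str, Any]
Result = tuple[Optional[Node], Optional[list[Node]]]


class TypeStub:
    """Hypothetical stand-in for the core type comparator."""

    def can_process(self, ctx: Any, env: str, prev_result: Node) -> bool:
        return "type" not in prev_result

    def process(self, ctx: Any, env: str, prev_result: Node) -> Result:
        return {"type": "object"}, None


class TitleStub:
    """Hypothetical comparator that only acts on the root node."""

    def can_process(self, ctx: Any, env: str, prev_result: Node) -> bool:
        return env == "/" and "title" not in prev_result

    def process(self, ctx: Any, env: str, prev_result: Node) -> Result:
        return {"title": "Generated schema"}, None


def run_node(core: Any, registered: list[Any], ctx: Any, env: str) -> tuple[Node, list[Node]]:
    """Run the core comparator first, then the others in registration order."""
    node: Node = {}
    alternatives: list[Node] = []
    for comparator in [core, *registered]:
        # Each comparator sees the partial node built so far.
        if not comparator.can_process(ctx, env, node):
            continue
        updates, alts = comparator.process(ctx, env, node)
        if updates:
            node = {**node, **updates}  # merge without mutating the old dict
        if alts:
            alternatives.extend(alts)
    return node, alternatives


root_node, root_alts = run_node(TypeStub(), [TitleStub()], ctx=None, env="/")
child_node, _ = run_node(TypeStub(), [TitleStub()], ctx=None, env="/properties/name")
```

In this model the title is added only at the root, because `TitleStub.can_process` rejects every other `env`.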
Where Comparators Run¶
The pipeline descends into the schema based on the inferred type:
object -> each property is processed in its own sub-context.
array -> the items sub-context is processed.
pseudo-arrays -> handled by the pseudo-array handler.
The env argument is a path-like string showing where you are in the tree:
/ (root)
/properties/name
/items
/patternProperties/<pattern>
Use this to scope your comparator to a specific level.
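A few small helpers can make env checks easier to read inside can_process. The sketch below uses plain string operations only. The helper names are hypothetical, and it assumes that nested paths are formed by concatenating segments of the kinds listed above, which this section does not state explicitly.

```python
def is_root(env: str) -> bool:
    """True only for the root node of the schema tree."""
    return env == "/"


def is_property(env: str, name: str) -> bool:
    """True when env points at the property called `name` of its parent node."""
    return env.endswith(f"/properties/{name}")


def depth(env: str) -> int:
    """Number of non-empty path segments in env; the root has depth 0."""
    return len([part for part in env.split("/") if part])
```

For example, a comparator that should only touch top-level properties could require `depth(env) == 2` in addition to its other conditions.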
Core Types and Interfaces¶
The base classes live in genschema/comparators/template.py:
Comparator: base class.
ComparatorResult: return type alias.
ProcessingContext: current inputs.
Resource: wrapper for each input schema or JSON instance.
ToDelete: marker for fields that should be removed.
Important fields in ProcessingContext:
schemas: list of input JSON Schemas (if any).
jsons: list of input JSON instances (if any).
sealed: when True, comparators should avoid introducing anyOf.
Comparator Result Contract¶
process() returns a pair (updates, alternatives):
updates: a dict merged into the current node.
alternatives: a list of schema nodes that become anyOf or oneOf.
Return (None, None) when nothing should be changed.
Notes:
If ctx.sealed is True, comparators should not emit alternatives.
The pipeline merges updates into the current node using dict.update.
If you return a ToDelete value for a key, the pipeline removes it later.
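The merge and deletion rules in these notes can be illustrated with a stand-alone sketch. The `ToDelete` class below is a local stand-in, and `apply_updates` and `drop_deleted` are hypothetical helpers. Whether the real marker is used as an instance or as the class itself, and when exactly the pipeline removes marked keys, is not specified here, so treat both as assumptions.

```python
from typing import Any, Optional


class ToDelete:
    """Local stand-in for the deletion marker (assumed to be used as an instance)."""


def apply_updates(node: dict[str, Any], updates: Optional[dict[str, Any]]) -> dict[str, Any]:
    """Merge updates into a copy of node with dict.update; later keys win."""
    merged = dict(node)
    if updates:
        merged.update(updates)
    return merged


def drop_deleted(node: dict[str, Any]) -> dict[str, Any]:
    """Remove every key whose value is a ToDelete marker."""
    return {key: value for key, value in node.items() if not isinstance(value, ToDelete)}


node = {"type": "string", "format": "email"}
updates = {"format": ToDelete(), "title": "Contact"}
final_node = drop_deleted(apply_updates(node, updates))
```

Here `final_node` keeps `type`, gains `title`, and loses `format`, while the original `node` is left untouched.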
Minimal Comparator Example¶
The simplest comparator just adds a field:
from genschema.comparators.template import Comparator, ComparatorResult, ProcessingContext

class TitleComparator(Comparator):
    name = "title"

    def can_process(self, ctx: ProcessingContext, env: str, prev_result: dict) -> bool:
        # Only at the root node, and only if not already set.
        return env == "/" and "title" not in prev_result

    def process(self, ctx: ProcessingContext, env: str, prev_result: dict) -> ComparatorResult:
        return {"title": "Generated schema"}, None
Adding JSON Schema Version¶
To set the JSON Schema version at the top level (root only):
from genschema.comparators.template import Comparator, ComparatorResult, ProcessingContext

class SchemaVersionComparator(Comparator):
    name = "schema_version"

    def __init__(self, version: str = "https://json-schema.org/draft/2020-12/schema"):
        self._version = version

    def can_process(self, ctx: ProcessingContext, env: str, prev_result: dict) -> bool:
        return env == "/" and "$schema" not in prev_result

    def process(self, ctx: ProcessingContext, env: str, prev_result: dict) -> ComparatorResult:
        return {"$schema": self._version}, None
Working With Alternatives (anyOf / oneOf)¶
If a comparator needs to emit alternatives, return them in the second tuple item. Each alternative is a schema node fragment.
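# Method of a Comparator subclass (class body omitted for brevity).
def process(self, ctx: ProcessingContext, env: str, prev_result: dict) -> ComparatorResult:
    variants = [
        {"type": "string"},
        {"type": "integer"},
    ]
    if ctx.sealed:
        # In sealed context, choose a deterministic single option.
        return variants[0], None
    # Otherwise leave the node unchanged and hand both variants to the pipeline.
    return None, variants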
When alternatives are returned, the pipeline will wrap them into the configured
base combinator (anyOf or oneOf).
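As a rough picture of that wrapping step, the sketch below attaches a list of alternatives to a node under the chosen combinator. `wrap_alternatives` is a hypothetical helper written for illustration. The real pipeline may combine the node and its alternatives differently.

```python
from typing import Any, Optional


def wrap_alternatives(
    node: dict[str, Any],
    alternatives: Optional[list[dict[str, Any]]],
    combinator: str = "anyOf",
) -> dict[str, Any]:
    """Return a copy of node with alternatives stored under anyOf or oneOf."""
    if combinator not in ("anyOf", "oneOf"):
        raise ValueError(f"unsupported combinator: {combinator!r}")
    wrapped = dict(node)
    if alternatives:
        # Copy the list so later changes to the input do not leak into the node.
        wrapped[combinator] = list(alternatives)
    return wrapped


variants = [{"type": "string"}, {"type": "integer"}]
as_any_of = wrap_alternatives({"title": "Identifier"}, variants)
as_one_of = wrap_alternatives({}, variants, combinator="oneOf")
unchanged = wrap_alternatives({"type": "string"}, None)
```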
Using Input Data¶
The context provides both schemas and JSON instances. Each entry is a Resource
with id, type, and content:
for s in ctx.schemas:
    if isinstance(s.content, dict):
        # Use s.content to inspect schema fields.
        pass

for j in ctx.jsons:
    # j.content is a raw JSON value.
    pass
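To show the loops above doing real work, here is a stand-alone sketch of a comparator that derives minimum and maximum from numeric JSON instances. `Resource` and `ProcessingContext` are local stand-ins containing only the fields named in this section, `NumericRangeComparator` is a hypothetical example, and the `"json"` value used for `Resource.type` is a placeholder. It also assumes that ctx.jsons holds the instance values for the node currently being processed. In real use the class would derive from the library's Comparator base class.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Resource:
    """Stand-in with the three fields named in the text."""
    id: str
    type: str
    content: Any


@dataclass
class ProcessingContext:
    """Stand-in holding only the fields this example reads."""
    schemas: list[Resource] = field(default_factory=list)
    jsons: list[Resource] = field(default_factory=list)
    sealed: bool = False


def numeric_values(ctx: ProcessingContext) -> list[float]:
    """Collect int/float instance values; bool is excluded although it subclasses int."""
    return [
        j.content
        for j in ctx.jsons
        if isinstance(j.content, (int, float)) and not isinstance(j.content, bool)
    ]


class NumericRangeComparator:
    name = "numeric_range"

    def can_process(self, ctx: ProcessingContext, env: str, prev_result: dict) -> bool:
        is_numeric_node = prev_result.get("type") in ("integer", "number")
        return is_numeric_node and bool(numeric_values(ctx))

    def process(self, ctx: ProcessingContext, env: str, prev_result: dict):
        values = numeric_values(ctx)
        return {"minimum": min(values), "maximum": max(values)}, None


ctx = ProcessingContext(
    jsons=[Resource("a", "json", 3), Resource("b", "json", 10), Resource("c", "json", True)]
)
comparator = NumericRangeComparator()
```

The comparator reads ctx without modifying it and returns a fresh dict.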
Best Practices¶
Keep can_process fast and side-effect free.
Avoid mutating prev_result or ctx in-place.
Scope behavior using env to avoid unintended changes in nested nodes.
Respect ctx.sealed to avoid creating new unions.
Return (None, None) when no changes are required.
Registering a Comparator (Python)¶
TypeComparator is the core comparator and must be passed via core_comparator
when creating a Converter. All other comparators are registered via register:
from genschema import Converter
from genschema.comparators import RequiredComparator, SchemaVersionComparator
conv = Converter()
conv.register(RequiredComparator())
conv.register(SchemaVersionComparator())
Registering a Comparator (CLI)¶
The CLI registers a default set of comparators. To disable one, use the flags
shown by genschema --help. If you add your own comparator, you can:
Register it in your own Python script (recommended), or
Extend genschema/cli.py to include it in the CLI pipeline.
Troubleshooting¶
If your comparator never runs, check can_process and env.
If fields disappear, ensure you are not returning ToDelete accidentally.
If you see unexpected unions, verify when you are returning alternatives.
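One way to investigate these cases is to wrap a comparator so that every call is recorded. The sketch below is a stand-alone illustration: `TracingComparator` and `RootTitle` are hypothetical classes, not part of genschema, and a real wrapper would probably need to derive from the library's Comparator base class before it could be registered.

```python
from typing import Any


class TracingComparator:
    """Hypothetical wrapper that records every call made to another comparator."""

    def __init__(self, inner: Any) -> None:
        self.inner = inner
        self.name = f"trace:{getattr(inner, 'name', type(inner).__name__)}"
        self.calls: list[tuple[str, str, Any]] = []

    def can_process(self, ctx: Any, env: str, prev_result: dict) -> bool:
        accepted = self.inner.can_process(ctx, env, prev_result)
        self.calls.append(("can_process", env, accepted))
        return accepted

    def process(self, ctx: Any, env: str, prev_result: dict):
        result = self.inner.process(ctx, env, prev_result)
        self.calls.append(("process", env, result))
        return result


class RootTitle:
    """Toy comparator used only to exercise the wrapper."""
    name = "title"

    def can_process(self, ctx: Any, env: str, prev_result: dict) -> bool:
        return env == "/" and "title" not in prev_result

    def process(self, ctx: Any, env: str, prev_result: dict):
        return {"title": "Generated schema"}, None


traced = TracingComparator(RootTitle())
for env in ("/", "/properties/name"):
    # Mimic the pipeline: call process only when can_process accepts the node.
    if traced.can_process(None, env, {}):
        traced.process(None, env, {})
```

Inspecting `traced.calls` afterwards shows which env values were offered, which were accepted, and what each `process` call returned.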