genschema CLI Tool

genschema is a command-line utility that generates JSON Schema from one or more JSON documents. It supports multiple input files, stdin input, smart type merging with anyOf/oneOf, pseudo-array detection, and several optional schema refinement comparators.
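To illustrate what generating a schema from a JSON instance involves, the sketch below walks one parsed document and records the JSON Schema type of each value. It is not genschema's code. `json_type` and `infer_schema` are hypothetical names. Array items are described from the first element only, which is a simplification that the real tool does not necessarily make.

```python
import json


def json_type(value):
    """Map a value produced by json.loads to a JSON Schema type name."""
    if value is None:
        return "null"
    # bool must be tested before int: in Python, bool is a subclass of int.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    return "object"


def infer_schema(value):
    """Hypothetical sketch: build a minimal schema for a single JSON value."""
    kind = json_type(value)
    schema = {"type": kind}
    if kind == "object":
        schema["properties"] = {key: infer_schema(val) for key, val in value.items()}
    elif kind == "array" and value:
        # Simplification: describe the items from the first element only.
        schema["items"] = infer_schema(value[0])
    return schema


doc = json.loads('{"id": 1, "name": "Ann", "tags": ["a", "b"], "score": null}')
print(json.dumps(infer_schema(doc), indent=2))
```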

Note

The CLI can now run shared-reference extraction directly with --extract-refs. The Python API still exposes more advanced tuning options when you need custom merge or naming strategies.

Features

Generate JSON Schema from single or multiple JSON instances
Merge schemas using anyOf or oneOf combinators
Automatic detection of pseudo-arrays (inhomogeneous arrays treated as object-like structures)
Optional comparators:
  - Format detection (format keyword)
  - Required properties inference
  - Empty value handling (null vs. absence)
  - Element deletion in special cases (e.g. pseudo-array markers)
Extensible comparator pipeline for custom refinements such as EnumComparator
Output to file or stdout
Rich console output with error reporting and timing
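The format detection comparator listed above can be approximated by testing string values against a handful of patterns. This minimal sketch assumes only three formats (`email`, `date`, `uri`) and uses a hypothetical `detect_format` helper. genschema's own detectors, and the formats it recognizes, may differ.

```python
import re
from datetime import date
from urllib.parse import urlparse

# Deliberately loose patterns; real format validation is stricter.
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")
DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}")


def detect_format(value):
    """Return a JSON Schema format name for a string, or None (hypothetical sketch)."""
    if EMAIL_RE.fullmatch(value):
        return "email"
    if DATE_RE.fullmatch(value):
        try:
            date.fromisoformat(value)  # rejects impossible dates such as 2024-13-01
            return "date"
        except ValueError:
            pass
    parsed = urlparse(value)
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return "uri"
    return None


def string_schema(value, use_format=True):
    """Schema for one string value; use_format=False plays the role of --no-format."""
    schema = {"type": "string"}
    fmt = detect_format(value) if use_format else None
    if fmt:
        schema["format"] = fmt
    return schema
```

This sketch only treats a value as a `uri` when it has an `http`/`https` scheme and a host. That keeps ordinary words from being tagged, at the cost of missing other URI schemes.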

Usage

genschema [OPTIONS] [INPUTS]...

Arguments

INPUTS: Paths to JSON files, or - to read from stdin. Multiple files are allowed. If no inputs are provided, the help text is shown and the program exits.
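The input rules above can be reproduced with the standard `argparse` module: file paths, `-` for stdin, and help when nothing is given. The sketch below is a hypothetical reconstruction for illustration only. The function names are assumptions, and it covers just the positional inputs and `-o`.

```python
import argparse
import json
import sys


def build_parser():
    """Hypothetical parser covering only the INPUTS and -o/--output arguments."""
    parser = argparse.ArgumentParser(prog="genschema")
    # nargs="*" allows zero or more inputs; argparse treats a lone "-" as positional.
    parser.add_argument("inputs", nargs="*", help="JSON files, or - for stdin")
    parser.add_argument("-o", "--output", help="output file (default: stdout)")
    return parser


def load_instances(inputs, stdin=None):
    """Parse every input: "-" reads one JSON document from stdin, others are paths."""
    stdin = stdin if stdin is not None else sys.stdin
    instances = []
    for item in inputs:
        if item == "-":
            instances.append(json.load(stdin))
        else:
            with open(item, encoding="utf-8") as fh:
                instances.append(json.load(fh))
    return instances


def parse(argv=None):
    """Return parsed args, or None after printing help when no inputs were given."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if not args.inputs:
        parser.print_help()
        return None
    return args
```

Stdin can be consumed only once. In this sketch, passing `-` more than once would therefore fail on the second read.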

Options

-o, --output OUTPUT: Path to the output JSON Schema file. If omitted, the schema is printed to stdout.
--base-of {anyOf,oneOf}: Schema combination strategy when types differ across instances. Default: anyOf
--no-pseudo-array: Disable pseudo-array detection and handling.
--no-format: Disable inference of format keywords (email, date, uri, etc.).
--no-enum: Disable inference of enum for compact string fields.
--no-required: Disable automatic population of the required array.
--no-empty: Disable special handling of empty values / missing properties.
--no-delete-element: Disable all DeleteElement comparators (including pseudo-array cleanup).
--extract-refs: Run reference-extraction postprocessing and emit shared $defs / $ref blocks.
--refs-similarity-threshold FLOAT: Similarity threshold for grouping reference candidates. Default: 0.85
--refs-min-total-keys INT: Minimum total number of structural keys before extraction is applied. Default: 3
--refs-min-occurrences INT: Minimum number of similar occurrences required for extraction. Default: 2
--refs-defs-key TEXT: Definition container key for extracted refs. Default: $defs
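The `--base-of` choice matters when instances disagree on a type. The sketch below shows the general idea under simplifying assumptions: per-instance schemas are deduplicated, and any remaining alternatives are wrapped in `anyOf` or `oneOf`. The `combine` helper is hypothetical. It does not attempt to merge object properties the way a full implementation would.

```python
import json


def combine(schemas, base_of="anyOf"):
    """Join alternative schemas under anyOf/oneOf (hypothetical sketch)."""
    if base_of not in ("anyOf", "oneOf"):
        raise ValueError("base_of must be 'anyOf' or 'oneOf'")
    unique, seen = [], set()
    for schema in schemas:
        # Canonical JSON text gives a hashable key for structural deduplication.
        key = json.dumps(schema, sort_keys=True)
        if key not in seen:
            seen.add(key)
            unique.append(schema)
    if not unique:
        raise ValueError("need at least one schema")
    if len(unique) == 1:
        return unique[0]  # no disagreement, so no combinator is needed
    return {base_of: unique}


per_instance = [{"type": "integer"}, {"type": "string"}, {"type": "integer"}]
print(combine(per_instance))           # anyOf with two alternatives
print(combine(per_instance, "oneOf"))  # the same alternatives under oneOf
```

The two keywords validate differently. A value must match at least one `anyOf` branch but exactly one `oneOf` branch. Overlapping alternatives such as `{"type": "integer"}` and `{"type": "number"}` therefore make `oneOf` reject integers, which is a practical argument for `anyOf` as the default.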

Examples

Read a single file and write the schema to disk

genschema data.json -o schema.json

Multiple files → anyOf combination

genschema user1.json user2.json user3.json --base-of anyOf -o schema.json

Read from stdin

cat record.json | genschema -

# or with redirection
genschema - < record.json

# piping from another command
curl https://api.example.com/data | genschema - -o api-schema.json

Use oneOf instead of anyOf

genschema event-log-*.json --base-of oneOf -o events.schema.json

Extract shared refs

genschema input.json --extract-refs -o schema.json
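Conceptually, reference extraction works in three steps:

1. Find object sub-schemas that occur more than once.
2. Store one copy under a definitions container.
3. Replace each occurrence with a `$ref`.

This minimal sketch handles only exact duplicates. It does not attempt the similarity-based grouping that genschema's threshold options imply. `extract_refs`, its `min_occurrences` parameter (loosely mirroring `--refs-min-occurrences`) and the generated `Def1`-style names are hypothetical.

```python
import json
from collections import Counter


def _canon(schema):
    """Canonical JSON text, so structurally equal schemas compare equal."""
    return json.dumps(schema, sort_keys=True)


def _object_subschemas(schema):
    """Yield every nested object schema reachable through properties/items."""
    children = list(schema.get("properties", {}).values())
    if isinstance(schema.get("items"), dict):
        children.append(schema["items"])
    for child in children:
        if child.get("type") == "object":
            yield child
        yield from _object_subschemas(child)


def extract_refs(schema, min_occurrences=2, defs_key="$defs"):
    """Hypothetical sketch: hoist repeated object sub-schemas into defs_key."""
    counts = Counter(_canon(s) for s in _object_subschemas(schema))
    names = {}
    for text, count in counts.items():
        if count >= min_occurrences:
            names[text] = f"Def{len(names) + 1}"

    def rewrite(node):
        # Replace a repeated sub-schema with a $ref, otherwise recurse into it.
        text = _canon(node)
        if text in names:
            return {"$ref": f"#/{defs_key}/{names[text]}"}
        return rewrite_children(node)

    def rewrite_children(node):
        out = dict(node)
        if "properties" in node:
            out["properties"] = {k: rewrite(v) for k, v in node["properties"].items()}
        if isinstance(node.get("items"), dict):
            out["items"] = rewrite(node["items"])
        return out

    result = rewrite_children(schema)  # the root itself is never replaced
    if names:
        result[defs_key] = {
            name: rewrite_children(json.loads(text)) for text, name in names.items()
        }
    return result
```

Suppose a schema's `home` and `work` properties share the same address object. Both then become `{"$ref": "#/$defs/Def1"}`, and the object itself is stored once under `$defs`.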

Tune ref extraction

genschema input.json --extract-refs --refs-similarity-threshold 0.9 --refs-min-total-keys 4 -o schema.json
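The tuning flags imply a similarity test between candidate structures, but the metric genschema uses is not documented here. This sketch therefore assumes Jaccard similarity over property-name sets, purely to illustrate how a similarity threshold and a minimum key count could interact. The helper names and the choice to count keys in the union are assumptions.

```python
def jaccard(a, b):
    """Jaccard similarity of two key sets: |A & B| / |A | B|."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def are_similar(props_a, props_b, threshold=0.85, min_total_keys=3):
    """Assumed rule: enough keys overall, and key sets overlap at least `threshold`."""
    a, b = set(props_a), set(props_b)
    if len(a | b) < min_total_keys:
        return False  # too small to be worth extracting
    return jaccard(a, b) >= threshold


user = {"id": 1, "name": "Ann", "email": "a@x.io", "age": 30}
member = {"id": 2, "name": "Bo", "email": "b@x.io"}
print(jaccard(set(user), set(member)))           # 0.75
print(are_similar(user, member))                 # False at the default 0.85
print(are_similar(user, member, threshold=0.7))  # True
```

Under this model, raising the threshold (for example to 0.9) makes grouping stricter. Raising the minimum key count skips small structures entirely.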

Exit Codes

0 — success
1 — invalid JSON, file not found, schema generation error, etc.
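A wrapper that maps failures to these exit codes could look like the sketch below. The `run` function and the exception types it catches are assumptions for illustration, and the real CLI may classify errors differently. A script entry point would typically call `sys.exit(run(sys.argv[1:]))`.

```python
import json
import sys


def run(paths):
    """Hypothetical entry point: returns 0 on success, 1 on any input error."""
    try:
        instances = []
        for path in paths:
            with open(path, encoding="utf-8") as fh:
                instances.append(json.load(fh))
    except FileNotFoundError as exc:
        print(f"error: file not found: {exc.filename}", file=sys.stderr)
        return 1
    except json.JSONDecodeError as exc:
        print(f"error: invalid JSON: {exc}", file=sys.stderr)
        return 1
    # Schema generation would happen here; its errors would also map to 1.
    print(f"loaded {len(instances)} instance(s)", file=sys.stderr)
    return 0
```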

Output

When writing to stdout, the schema is printed as formatted JSON (indent=2). When writing to a file, the same formatted JSON is saved and a success message is shown.

The console also reports:

number of processed JSON instances
elapsed generation time
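The output behavior described above can be sketched as follows: indented JSON written to a file or stdout, with the instance count and elapsed time reported separately. The document does not say which stream the real tool uses for status lines. Sending them to stderr is an assumption made here so that redirecting stdout captures only the schema. `emit` is a hypothetical name.

```python
import json
import sys
import time


def emit(schema, output=None, n_instances=0, started=None):
    """Write the schema as indent=2 JSON; status lines go to stderr (assumed)."""
    text = json.dumps(schema, indent=2)
    if output:
        with open(output, "w", encoding="utf-8") as fh:
            fh.write(text + "\n")
        print(f"Schema written to {output}", file=sys.stderr)
    else:
        print(text)
    if started is not None:
        elapsed = time.perf_counter() - started
        print(f"Processed {n_instances} instance(s) in {elapsed:.3f}s", file=sys.stderr)
    return text
```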

Implementation Notes

The tool is built around a modular Converter class that:

Accepts multiple JSON documents
Applies a chain of comparators/transformers
Supports optional pseudo-array flattening
Uses a configurable base combinator (anyOf / oneOf)

Comparators can be selectively disabled via CLI flags.
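A comparator chain with CLI switches can be illustrated as a toy pipeline: each refinement is a function applied in order, and disabled ones are skipped. Several details are assumptions and not the `Converter` class's actual interface:

- the names `add_required`, `add_enum`, `PIPELINE` and `refine`;
- the `max_values=3` cutoff for "compact" string fields;
- the mapping of `--no-required` / `--no-enum` to pipeline keys.

```python
def add_required(schema, instances):
    """Mark as required every property present in all object instances."""
    objects = [doc for doc in instances if isinstance(doc, dict)]
    if schema.get("type") == "object" and objects:
        common = set(objects[0]).intersection(*objects[1:])
        present = [k for k in schema.get("properties", {}) if k in common]
        if present:
            schema = {**schema, "required": sorted(present)}
    return schema


def add_enum(schema, instances, max_values=3):
    """Give string properties an enum when they take only a few distinct values."""
    props = dict(schema.get("properties", {}))
    for name, sub in props.items():
        values = {doc[name] for doc in instances
                  if isinstance(doc, dict) and isinstance(doc.get(name), str)}
        if sub.get("type") == "string" and 0 < len(values) <= max_values:
            props[name] = {**sub, "enum": sorted(values)}
    return {**schema, "properties": props} if props else schema


# Order matters: each comparator receives the previous one's output.
PIPELINE = {"required": add_required, "enum": add_enum}


def refine(schema, instances, disabled=()):
    """Apply enabled comparators; e.g. --no-required would add "required" to disabled."""
    for name, comparator in PIPELINE.items():
        if name not in disabled:
            schema = comparator(schema, instances)
    return schema
```

With only a few sample documents, a value-count cutoff like this tags almost every string field as an enum. A practical heuristic would also need to consider how many instances were seen.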
