genschema CLI Tool
genschema is a command-line utility that generates JSON Schema from one or more JSON documents.
It supports multiple input files, stdin input, smart type merging with anyOf/oneOf, pseudo-array detection, and several optional schema refinement comparators.
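To make the anyOf/oneOf merging idea concrete, here is a minimal Python sketch of the general technique. It is not genschema's implementation or its Python API. `json_type`, `infer`, and `merge` are hypothetical helpers, and the merge rule is an assumption chosen for illustration: objects are merged property by property, and differing non-object schemas become anyOf/oneOf branches.

```python
import json
from typing import Any


def json_type(value: Any) -> str:
    """Map a value produced by json.loads to a JSON Schema type name."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # must precede int: bool subclasses int
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    return "object"


def infer(value: Any) -> dict:
    """Hypothetical: infer a schema for a single JSON instance."""
    t = json_type(value)
    if t == "object":
        return {"type": "object",
                "properties": {k: infer(v) for k, v in value.items()}}
    if t == "array":
        return {"type": "array",
                "items": merge([infer(v) for v in value]) if value else {}}
    return {"type": t}


def merge(schemas: list, base_of: str = "anyOf") -> dict:
    """Hypothetical: combine per-instance schemas (base_of is anyOf or oneOf)."""
    unique = []
    for s in schemas:  # dicts are unhashable, so deduplicate with ==
        if s not in unique:
            unique.append(s)
    if len(unique) == 1:
        return unique[0]
    if all(s.get("type") == "object" for s in unique):
        # Assumed rule: merge objects property by property instead of branching.
        names = []
        for s in unique:
            names += [k for k in s["properties"] if k not in names]
        return {"type": "object", "properties": {
            k: merge([s["properties"][k] for s in unique if k in s["properties"]], base_of)
            for k in names}}
    # Differing non-object schemas become combinator branches.
    return {base_of: unique}


docs = [json.loads('{"id": 1, "name": "a"}'), json.loads('{"id": "x"}')]
print(json.dumps(merge([infer(d) for d in docs]), indent=2))
```

Here `id` ends up as an anyOf of integer and string, while `name` stays a plain string. Passing `base_of="oneOf"` produces a oneOf branch instead. Keep in mind that oneOf requires exactly one branch to match, so overlapping branches (for example integer and number) would reject values that satisfy both; anyOf has no such restriction.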
Note
The CLI can now run shared-reference extraction directly with
--extract-refs. The Python API still exposes more advanced tuning
options when you need custom merge or naming strategies.
Features
Generate JSON Schema from single or multiple JSON instances
Merge schemas using anyOf or oneOf combinators
Automatic detection of pseudo-arrays (inhomogeneous arrays treated as object-like structures)
Optional comparators:
- Format detection (format keyword)
- Required properties inference
- Empty value handling (null vs. absence)
- Element deletion in special cases (e.g. pseudo-array markers)
Extensible comparator pipeline for custom refinements such as EnumComparator
Output to file or stdout
Rich console output with error reporting and timing
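Format detection can be sketched as "emit a format only when every observed string matches it". The Python sketch below assumes that rule and uses deliberately simple checks for email, date, and uri. The helper names and patterns are hypothetical and are not genschema's actual detectors.

```python
import re
from datetime import date
from typing import Optional
from urllib.parse import urlparse

# Deliberately loose pattern (assumption): real email validation is stricter.
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def is_email(s: str) -> bool:
    return EMAIL_RE.fullmatch(s) is not None


def is_date(s: str) -> bool:
    # JSON Schema "date" follows RFC 3339 full-date: YYYY-MM-DD.
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", s):
        return False
    try:
        date.fromisoformat(s)  # rejects impossible dates such as 2024-02-30
    except ValueError:
        return False
    return True


def is_uri(s: str) -> bool:
    # Heuristic (assumption): require both a scheme and a network location.
    try:
        parsed = urlparse(s)
    except ValueError:  # e.g. malformed IPv6 hosts
        return False
    return bool(parsed.scheme and parsed.netloc)


CHECKS = {"email": is_email, "date": is_date, "uri": is_uri}


def detect_format(samples: list) -> Optional[str]:
    """Hypothetical: return a format only if every observed string matches it."""
    if not samples:
        return None
    for name, check in CHECKS.items():
        if all(check(s) for s in samples):
            return name
    return None
```

Requiring every sample to match keeps the inferred format conservative: a single free-text value among dates means no format keyword is emitted, so the schema does not reject data that was actually observed.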
Usage
genschema [OPTIONS] [INPUTS]...
Arguments
INPUTS: Paths to JSON files, or - to read from stdin. Multiple files are allowed. If no inputs are provided, the help text is shown and the program exits.
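The "-" convention for INPUTS can be sketched as follows. `load_inputs` is a hypothetical helper, not part of genschema; it only shows the idea of treating "-" as stdin and every other argument as a file path. The "no inputs shows help" behaviour is not modelled here.

```python
import json
import sys
from typing import Optional, TextIO


def load_inputs(paths: list, stdin: Optional[TextIO] = None) -> list:
    """Hypothetical: load one JSON document per INPUTS entry; "-" means stdin."""
    stream = stdin if stdin is not None else sys.stdin
    docs = []
    for path in paths:
        if path == "-":
            docs.append(json.load(stream))  # stdin can only be consumed once
        else:
            # Raises FileNotFoundError or json.JSONDecodeError on bad input.
            with open(path, encoding="utf-8") as fh:
                docs.append(json.load(fh))
    return docs
```

Because stdin can only be read once, passing "-" twice in one invocation would make the second read fail in this sketch.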
Options
-o, --output OUTPUT: Path to the output JSON Schema file. If omitted, the schema is printed to stdout.
--base-of {anyOf,oneOf}: Schema combination strategy when types differ across instances. Default: anyOf.
--no-pseudo-array: Disable pseudo-array detection and handling.
--no-format: Disable inference of format keywords (email, date, uri, etc.).
--no-enum: Disable inference of enum for compact string fields.
--no-required: Disable automatic population of the required array.
--no-empty: Disable special handling of empty values / missing properties.
--no-delete-element: Disable all DeleteElement comparators (including pseudo-array cleanup).
--extract-refs: Run reference-extraction postprocessing and emit shared $defs/$ref blocks.
--refs-similarity-threshold FLOAT: Similarity threshold for grouping reference candidates. Default: 0.85.
--refs-min-total-keys INT: Minimum total number of structural keys before extraction is applied. Default: 3.
--refs-min-occurrences INT: Minimum number of similar occurrences required for extraction. Default: 2.
--refs-defs-key TEXT: Definition container key for extracted refs. Default: $defs.
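The --refs-* options suggest a similarity-based grouping step before extraction. The Python sketch below shows one plausible version of that candidate-grouping step only; it does not rewrite anything into $defs/$ref. It assumes that similarity is the Jaccard index of property-name sets and that "structural keys" means property names. Both are assumptions, and `ref_candidates` is a hypothetical function, not genschema's API. Its defaults mirror the documented values 0.85, 3, and 2.

```python
from typing import Any, Iterator


def object_nodes(schema: Any, path: str = "#") -> Iterator[tuple[str, dict]]:
    """Yield (JSON Pointer, subschema) for each subschema that has "properties"."""
    if isinstance(schema, dict):
        if isinstance(schema.get("properties"), dict):
            yield path, schema
        for key, child in schema.items():
            yield from object_nodes(child, f"{path}/{key}")
    elif isinstance(schema, list):
        for i, child in enumerate(schema):
            yield from object_nodes(child, f"{path}/{i}")


def jaccard(a: set, b: set) -> float:
    """Share of property names two objects have in common (assumed metric)."""
    return len(a & b) / len(a | b) if a | b else 1.0


def ref_candidates(schema: dict, similarity_threshold: float = 0.85,
                   min_total_keys: int = 3, min_occurrences: int = 2) -> list[list[str]]:
    """Hypothetical: group similar object subschemas as $ref candidates."""
    groups: list[tuple[set, list[str]]] = []  # (first member's keys, pointers)
    for path, node in object_nodes(schema):
        keys = set(node["properties"])
        if len(keys) < min_total_keys:
            continue  # too small to be worth a shared definition
        for first_keys, paths in groups:
            if jaccard(keys, first_keys) >= similarity_threshold:
                paths.append(path)
                break
        else:
            groups.append((keys, [path]))
    return [paths for _, paths in groups if len(paths) >= min_occurrences]
```

For example, an address object with street/city/zip and another with street/city/zip/country have a Jaccard similarity of 0.75, so they are grouped only when the threshold is lowered to 0.75 or below. A full implementation would then move each group's shared shape under the definitions key and replace the grouped locations with $ref pointers; this sketch also omits JSON Pointer escaping of "~" and "/" in property names, and its greedy grouping depends on traversal order.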
Examples
Read a single file and write the schema to disk
genschema data.json -o schema.json
Multiple files → anyOf combination
genschema user1.json user2.json user3.json --base-of anyOf -o schema.json
Read from stdin
cat record.json | genschema -
# or with redirection
genschema - < record.json
# piping from another command
curl https://api.example.com/data | genschema - -o api-schema.json
Use oneOf instead of anyOf
genschema event-log-*.json --base-of oneOf -o events.schema.json
Tune ref extraction
genschema input.json --extract-refs --refs-similarity-threshold 0.9 --refs-min-total-keys 4 -o schema.json
Disable most refinements (minimal schema)
genschema messy-data.json --no-format --no-enum --no-required --no-empty --no-pseudo-array -o minimal.json
Exit Codes
0: success
1: invalid JSON, file not found, schema generation error, etc.
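The exit-code contract above can be sketched as a wrapper that maps success to 0 and any failure to 1. `run` is a hypothetical helper, not genschema's entry point, and the exact exceptions the real CLI catches are an assumption.

```python
import json
import sys
from typing import Callable


def run(generate: Callable[[], object]) -> int:
    """Hypothetical: return 0 on success, 1 on any failure, as documented."""
    try:
        generate()
    except FileNotFoundError as exc:      # missing input file
        print(f"error: file not found: {exc.filename}", file=sys.stderr)
        return 1
    except json.JSONDecodeError as exc:   # invalid JSON input
        print(f"error: invalid JSON: {exc}", file=sys.stderr)
        return 1
    except Exception as exc:              # any other schema generation failure
        print(f"error: {exc}", file=sys.stderr)
        return 1
    return 0
```

A console entry point would typically call `sys.exit(run(...))` so that the shell sees the returned status.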
Output
When writing to stdout, the schema is printed as formatted JSON (indent=2). When writing to a file, the same formatted JSON is saved and a success message is shown.
The console also reports:
number of processed JSON instances
elapsed generation time
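A minimal Python sketch of this output behaviour: formatted JSON with indent=2 goes to stdout or a file, followed by an instance count and the elapsed time. The `emit` helper and its messages are hypothetical. One design choice worth noting is that the sketch sends the status lines to stderr, so redirecting stdout still produces valid JSON; whether genschema routes its console report the same way is not stated here.

```python
import json
import sys
import time
from pathlib import Path
from typing import Callable, Optional


def emit(instances: list, build: Callable[[list], dict],
         output: Optional[str] = None) -> str:
    """Hypothetical: build a schema, write it with indent=2, report stats."""
    start = time.perf_counter()
    schema = build(instances)
    elapsed = time.perf_counter() - start
    text = json.dumps(schema, indent=2)
    if output is None:
        print(text)  # only the schema goes to stdout
    else:
        Path(output).write_text(text + "\n", encoding="utf-8")
        print(f"Schema written to {output}", file=sys.stderr)
    # Status lines go to stderr so redirected stdout stays valid JSON.
    print(f"Processed {len(instances)} JSON instance(s) in {elapsed:.3f}s",
          file=sys.stderr)
    return text
```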
Implementation Notes
The tool is built around a modular Converter class that:
Accepts multiple JSON documents
Applies a chain of comparators/transformers
Supports optional pseudo-array flattening
Uses a configurable base combinator (anyOf/oneOf)
Comparators can be selectively disabled via CLI flags.
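The comparator chain described above can be illustrated with a small Python sketch. `ToyConverter`, `required_comparator`, and `enum_comparator` are hypothetical names, not genschema's classes. The rules are also assumptions: a property is required if it appears in every object instance, and a string field gets an enum when it has at most three distinct values. Each boolean switch plays the role of a --no-* flag.

```python
from typing import Callable

# A comparator refines one schema node given the instance values seen there.
Comparator = Callable[[dict, list], dict]


def required_comparator(schema: dict, values: list) -> dict:
    """Mark properties present in every object instance as required."""
    objs = [v for v in values if isinstance(v, dict)]
    if schema.get("type") != "object" or not objs:
        return schema
    required = [k for k in schema.get("properties", {}) if all(k in o for o in objs)]
    return {**schema, "required": required} if required else schema


def enum_comparator(schema: dict, values: list, max_distinct: int = 3) -> dict:
    """Assumed rule: a string field with few distinct values becomes an enum."""
    if schema.get("type") != "string" or not values:
        return schema
    distinct = sorted(set(values))
    return {**schema, "enum": distinct} if len(distinct) <= max_distinct else schema


class ToyConverter:
    """Hypothetical stand-in for a converter with a switchable comparator chain."""

    def __init__(self, required: bool = True, enum: bool = True) -> None:
        # Each switch mimics a --no-* flag by leaving a comparator out.
        self.comparators: list[Comparator] = []
        if required:
            self.comparators.append(required_comparator)
        if enum:
            self.comparators.append(enum_comparator)

    def refine(self, schema: dict, values: list) -> dict:
        """Apply every enabled comparator, then recurse into object properties."""
        for comparator in self.comparators:
            schema = comparator(schema, values)
        if schema.get("type") == "object":
            objs = [v for v in values if isinstance(v, dict)]
            props = {k: self.refine(sub, [o[k] for o in objs if k in o])
                     for k, sub in schema.get("properties", {}).items()}
            schema = {**schema, "properties": props}
        return schema
```

Because every comparator returns a new dict instead of mutating its input, disabling one simply drops it from the list without affecting the others.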
See genschema.cli for more details.