CLI Usage

This page explains how to use the YomiToku command-line interface (CLI).

The first time you run the command, the model weight files are downloaded automatically from Hugging Face Hub. After that, you can analyze document images with the following command:

yomitoku ${path_data} -v -o results

Option Name	Description
`${path_data}`	Specifies the path to the directory containing images or the path to an image file.
`-o`, `--outdir`	Specifies the output directory (will be created if it doesn't exist).
`-v`, `--vis`	Outputs a visualization image of the analysis results.

Supplement: About ${path_data}

An image file or a directory can be specified.
If a directory is specified, it will be processed recursively, including subdirectories.
The supported file formats are pdf, jpeg, png, bmp, and tiff.
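
The recursive input handling described above can be approximated with a short Python sketch. It is not YomiToku's own implementation: `collect_inputs` and `SUPPORTED_SUFFIXES` are hypothetical names, and treating `.jpg` and `.tif` as valid spellings of jpeg and tiff is an assumption.

```python
from pathlib import Path

# Suffixes derived from the formats listed above. ".jpg" and ".tif" are
# assumed alternative spellings, not confirmed by the documentation.
SUPPORTED_SUFFIXES = {".pdf", ".jpeg", ".jpg", ".png", ".bmp", ".tiff", ".tif"}


def collect_inputs(path_data):
    """Return the files under path_data that have a supported suffix."""
    root = Path(path_data)
    if root.is_file():
        return [root] if root.suffix.lower() in SUPPORTED_SUFFIXES else []
    # rglob walks subdirectories, mirroring the recursive behavior described above.
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in SUPPORTED_SUFFIXES
    )
```

A helper like this can be useful for previewing which files a directory run would pick up before starting a long batch.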

Note

OCR is generally divided into Document OCR and Scene OCR (e.g., text on signs or surfaces other than paper). YomiToku is optimized for Document OCR.
The accuracy of AI-OCR depends heavily on the resolution of the input image. For best results, use images whose short edge is at least 1000px.
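
As a rough pre-check for the resolution recommendation, the sketch below computes how far an image falls short of a 1000px short edge. The function name `upscale_factor` is hypothetical and not part of YomiToku. Note that enlarging an existing image does not restore lost detail, so rescanning at a higher resolution is usually preferable.

```python
def upscale_factor(width, height, min_short_edge=1000):
    """Return the factor by which the image must grow to reach min_short_edge.

    A result of 1.0 means the image already meets the recommendation.
    """
    short_edge = min(width, height)
    if short_edge <= 0:
        raise ValueError("image dimensions must be positive")
    return max(1.0, min_short_edge / short_edge)
```

For example, a 500x800 image would need a factor of 2.0, while a 1240x1754 scan already satisfies the recommendation.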

Displaying Help

To display the list of available options:

yomitoku --help
# or
yomitoku -h

License Key Authentication

You can also specify the license key and secret key directly when running the command:

yomitoku ${path_data} -k ${your_license_key} -s ${your_secret_key}

-k, --license_key: Specify your license key.
-s, --secret_key: Specify your secret key.

Lightweight Mode (Faster Processing)

Use the --lite option to run inference with a lightweight model. This allows faster analysis compared to normal mode, though text recognition accuracy may decrease.

yomitoku ${path_data} --lite -v

Specifying Visualization Output Directory

To specify the folder for saving visualized images:

yomitoku ${path_data} -f md -v --vis_dir ${folder_name}

Specifying Output Format

Use -f or --format to specify the output format of the analysis results. Supported formats are json, csv, html, md, and pdf (searchable PDF).

yomitoku ${path_data} -f md

If pdf is specified, the text recognized by OCR is embedded in the image as an invisible layer, producing a searchable PDF.

You can specify multiple formats at once, separated by commas:

yomitoku ${path_data} -f md,html,json,csv
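
When the CLI is driven from a script, the comma-separated value for -f can be assembled and validated up front. The following is a minimal sketch: `format_argument` and `build_command` are hypothetical helpers, and only flags shown on this page are used. Whether pdf can be combined with the other formats in one run is not stated above, so that remains an assumption here.

```python
SUPPORTED_FORMATS = ("json", "csv", "html", "md", "pdf")


def format_argument(formats):
    """Validate format names and join them into the value passed to -f."""
    unknown = [f for f in formats if f not in SUPPORTED_FORMATS]
    if unknown:
        raise ValueError(f"unsupported format(s): {', '.join(unknown)}")
    # dict.fromkeys drops duplicates while keeping the caller's order.
    return ",".join(dict.fromkeys(formats))


def build_command(path_data, formats, outdir="results"):
    """Build the argument list for one yomitoku invocation."""
    return ["yomitoku", str(path_data), "-f", format_argument(formats), "-o", outdir]
```

The resulting list can be passed to `subprocess.run(..., check=True)`, assuming the `yomitoku` executable is on the PATH.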

Specifying Inference Device

Use the -d or --device option to specify the device for model execution. Supported values are cuda, cpu, and mps. The default is cuda; if no GPU is available, inference falls back to cpu.

yomitoku ${path_data} -d cpu

Ignoring Line Breaks

By default, line breaks follow the layout in the image. With the --ignore_line_break option, line breaks are ignored and sentences in the same paragraph are merged.

yomitoku ${path_data} --ignore_line_break
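
The effect of merging lines can be pictured with the small sketch below. It is an illustration only, not YomiToku's implementation: `merge_paragraph_lines` is a hypothetical name, and joining without a separator assumes Japanese text, where no space is needed between lines.

```python
def merge_paragraph_lines(paragraph):
    """Join the lines of one paragraph, dropping the layout line breaks."""
    # Assumption: Japanese text, so lines are joined without a space.
    return "".join(line.strip() for line in paragraph.splitlines())
```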

Extracting and Saving Figures/Graphs

Normally, figures and images in documents are not extracted. With the --figure option, they will be cropped and saved as separate image files, and links to them will be included in the output file.

yomitoku ${path_data} --figure

To specify the folder for saving images:

yomitoku ${path_data} --figure_dir ${folder_name}

If you set --figure_dir to an empty string, the images will be saved directly under the output folder:

yomitoku ${path_data} --figure_dir ""

Extracting Text in Figures or Images

By default, text contained within figures or images is not extracted. With the --figure_letter option, text in figures/images will also be included in the output.

yomitoku ${path_data} --figure --figure_letter

Specifying Output File Encoding

You can set the character encoding of the output file with --encoding. Supported encodings: utf-8, utf-8-sig, shift-jis, euc-jp, cp932. Characters that the chosen encoding cannot represent are ignored.

yomitoku ${path_data} --encoding utf-8-sig
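
To see what ignoring unsupported characters means in practice, the sketch below uses Python's standard `errors="ignore"` handler. This illustrates the general behavior and is not necessarily how YomiToku writes its files; `encode_for_output` is a hypothetical helper.

```python
def encode_for_output(text, encoding="utf-8"):
    """Encode text, silently dropping characters the codec cannot represent."""
    return text.encode(encoding, errors="ignore")


# The emoji has no Shift JIS representation, so it disappears on a round trip.
round_trip = encode_for_output("請求書 😀", "shift-jis").decode("shift-jis")

# utf-8-sig prepends a byte order mark (BOM) to the encoded output.
with_bom = encode_for_output("abc", "utf-8-sig")
```

Here `round_trip` keeps the kanji and the space but loses the emoji, and `with_bom` starts with the three BOM bytes `EF BB BF`, which some spreadsheet applications rely on to detect UTF-8 in CSV files.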

Specifying Config File Paths

You can specify the YAML config file paths for each module:

Option Name	Target Model
`--td_cfg`	Text Detector (TD)
`--tr_cfg`	Text Recognizer (TR)
`--lp_cfg`	Layout Parser (LP)
`--tsr_cfg`	Table Structure Recognizer (TSR)

Example:

yomitoku ${path_data} --td_cfg ${path_yaml}

Excluding Metadata

Exclude metadata such as headers or footers from the output file:

yomitoku ${path_data} --ignore_meta

Combining Multiple PDF Pages into One File

If the input is a multi-page PDF, you can export all pages into a single output file:

yomitoku ${path_data} -f md --combine

Automatic Document Orientation Correction

If images are rotated (e.g., sideways), YomiToku can detect and automatically correct their orientation:

yomitoku ${path_data} --rotate_detection

Enabling Recognition Orientation Fallback

By default, orientation fallback is disabled. When enabled with --enable-rec-orientation-fallback, if the confidence score of text recognition is low, the system retries recognition with the ROI image rotated 180 degrees and adopts the result with the higher confidence.

yomitoku ${path_data} --enable-rec-orientation-fallback

Use --rec-orientation-fallback-thresh to set the confidence threshold that triggers the fallback (default: 0.75).

yomitoku ${path_data} --enable-rec-orientation-fallback --rec-orientation-fallback-thresh 0.6
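
The fallback logic can be sketched as follows. This is an illustration under stated assumptions rather than YomiToku's code: `recognize` is a hypothetical callable returning a `(text, score)` pair, the ROI is modeled as a list of pixel rows, and treating a score strictly below the threshold as "low" is an assumption.

```python
def rotate180(rows):
    """Rotate an image, given as a list of pixel rows, by 180 degrees."""
    return [row[::-1] for row in rows[::-1]]


def recognize_with_fallback(roi, recognize, thresh=0.75):
    """Recognize roi and retry on the rotated ROI when confidence is low.

    recognize is a hypothetical callable: recognize(roi) -> (text, score).
    """
    text, score = recognize(roi)
    if score >= thresh:
        return text, score  # confident enough, no retry
    alt_text, alt_score = recognize(rotate180(roi))
    # Adopt whichever attempt has the higher confidence.
    if alt_score > score:
        return alt_text, alt_score
    return text, score
```

Lowering the threshold, as in the 0.6 example above, means fewer regions trigger the second recognition pass, which trades some robustness for speed.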

Checking Request Count

You can check the usage count linked to your YomiToku license key:

query_count --license_key ${YOMITOKU_LICENSE_KEY} --secret_key ${YOMITOKU_SECRET_KEY}

Both options are optional. If they are omitted, the values are read from environment variables.

Specifying Reading Order

By default, the reading order option is set to auto.

When auto is specified, the system identifies the document's orientation (horizontal or vertical) and automatically estimates the reading order. Specifically, the order is estimated as top2bottom for horizontal documents and right2left for vertical documents.

Setting Name	Preferred Reading Order	Valid Document Types
`top2bottom`	Top to Bottom	Column-formatted Word documents, etc.
`left2right`	Left to Right	Layouts where keys and values are in columns (e.g., receipts, insurance cards)
`right2left`	Right to Left	Vertically written documents

You can also explicitly set it:

yomitoku ${path_data} --reading_order left2right

PDF Output Image Quality

You can specify the image quality preset for searchable PDF output using --pdf_quality. The default is high.

Preset	Max Long Side	JPEG Quality	Description
`high`	No limit	85	High quality (default). Preserves the original image resolution.
`middle`	2000px	80	Medium quality. Balances file size and image quality.
`low`	1500px	60	Low quality. Minimizes file size.

yomitoku ${path_data} -f pdf --pdf_quality middle
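
The presets in the table can be read as a size limit plus a JPEG quality setting. The sketch below estimates the page image size each preset would produce. It assumes the image is scaled down with its aspect ratio preserved and rounded to whole pixels, which the table does not state explicitly; `PDF_QUALITY_PRESETS` and `target_size` are hypothetical names.

```python
# Values copied from the preset table; None means "no limit".
PDF_QUALITY_PRESETS = {
    "high": {"max_long_side": None, "jpeg_quality": 85},
    "middle": {"max_long_side": 2000, "jpeg_quality": 80},
    "low": {"max_long_side": 1500, "jpeg_quality": 60},
}


def target_size(width, height, preset="high"):
    """Return the (width, height) after applying the preset's long-side limit."""
    limit = PDF_QUALITY_PRESETS[preset]["max_long_side"]
    long_side = max(width, height)
    if limit is None or long_side <= limit:
        return width, height  # already within the limit, keep the original size
    scale = limit / long_side
    # Assumption: aspect ratio is preserved and sizes are rounded to pixels.
    return max(1, round(width * scale)), max(1, round(height * scale))
```

Under these assumptions a 3000x4000 page image becomes 1500x2000 with middle and 1125x1500 with low, while high leaves it unchanged.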

Setting the PDF Reading Resolution

Use --dpi to specify the resolution (DPI) at which PDF pages are read (default: 200). Increasing the DPI may improve recognition accuracy for fine text or small details in the PDF.

yomitoku ${path_data} --dpi 250
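
To judge what a given DPI means in pixels, the page size in PDF points (1 point = 1/72 inch) can be converted as in the sketch below. `rendered_size` is a hypothetical helper, and rounding to whole pixels is an assumption about how a renderer would behave.

```python
POINTS_PER_INCH = 72  # default PDF user space unit: 1 point = 1/72 inch


def rendered_size(width_pt, height_pt, dpi=200):
    """Return the approximate pixel size of a PDF page rendered at dpi."""
    return (
        round(width_pt * dpi / POINTS_PER_INCH),
        round(height_pt * dpi / POINTS_PER_INCH),
    )
```

A US Letter page (612 x 792 points) comes out at 1700 x 2200 pixels at the default 200 DPI and 2125 x 2750 pixels at 250 DPI.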

Excluding Ruby (Furigana) Text

You can exclude ruby (furigana) text from the output. When the --ignore_ruby option is set, text whose line height is below a certain threshold relative to the median line height within each paragraph or cell, and consists solely of hiragana or katakana characters, is identified as ruby and excluded.

yomitoku ${path_data} --ignore_ruby

You can adjust the ruby detection threshold using the --ruby_threshold option (default: 2.0). Increasing the value widens the range of text identified as ruby.

yomitoku ${path_data} --ignore_ruby --ruby_threshold 3.0

Specifying Pages to Process

You can choose to process only specific pages. Pages can be specified either as a comma-separated list or as a range using a hyphen.

yomitoku ${path_data} --pages 1,3-5,10
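
A page specification of this form can be expanded with a few lines of Python. The sketch below is not YomiToku's parser: `parse_pages` is a hypothetical name, and 1-based page numbering is an assumption suggested by the example above.

```python
def parse_pages(spec):
    """Expand a spec such as "1,3-5,10" into a sorted list of page numbers."""
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start > end:
                raise ValueError(f"invalid range: {part}")
            pages.update(range(start, end + 1))  # ranges are inclusive
        else:
            pages.add(int(part))
    # Assumption: pages are numbered from 1.
    if any(page < 1 for page in pages):
        raise ValueError("page numbers must be 1 or greater")
    return sorted(pages)
```

With the specification from the example, `parse_pages("1,3-5,10")` yields pages 1, 3, 4, 5, and 10.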

Specifying the Models to Run

You can run AI-OCR with specific models. Use --tr_name to select the text recognition model and --td_name to select the text detection model.

yomitoku ${path_data} --tr_name parseqv4-short --td_name dbnet

Model List and Key Features

Category	Model Name	Version	Max Sequence Length	Supported Text Types	Description
Text Recognition	`parseqv3`	v1.3.0	100 characters	Printed / Handwritten	Accuracy-optimized model providing high OCR performance for general documents.
Text Recognition	`parseqv4`	v1.4.0	100 characters	Printed / Handwritten / Old-style / Variant Characters	High-accuracy model supporting a wide range of Japanese characters, including historical and variant forms. (★ Default)
Text Recognition	`parseqv4-short`	v1.4.0	75 characters	Printed / Handwritten / Old-style / Variant Characters	Balanced model optimized for both processing speed and accuracy.
Text Recognition	`parseqv4-tiny`	v1.4.0	50 characters	Printed / Handwritten / Old-style / Variant Characters	High-speed lightweight model optimized for CPU inference with broad versatility.
Text Recognition	`parseqv4-large`	v1.6	100 characters	Printed / Handwritten / Old-style / Variant Characters	Large-scale model with stronger language model correction. Improved recognition of fine characters, vertical text, and symbols.
Text Detection	`dbnet`	v1.0.0	—	Printed	Detection model optimized for printed text.
Text Detection	`dbnetv2`	v1.2.0	—	Printed / Handwritten	Detection model optimized for both printed and handwritten text.
Text Detection	`dbnetv2_1`	v1.6	—	Printed / Handwritten	Improved version of dbnetv2 with enhanced detection of fine characters, vertical text, and symbols. (★ Default)

Recommended Combinations

For maximum accuracy: parseqv4-large + dbnetv2_1
For balanced speed and accuracy on printed documents: parseqv4-short + dbnetv2_1
For efficient CPU-based recognition of mixed printed and handwritten documents: parseqv4-tiny + dbnetv2_1
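
In a wrapper script, these combinations could be kept in a small lookup table. This is a sketch only: the goal labels (`accuracy`, `balanced`, `cpu`) and the names `RECOMMENDED_MODELS` and `model_arguments` are hypothetical, while the model names and flags are the ones listed on this page.

```python
# Goal labels are hypothetical; model names come from the list above.
RECOMMENDED_MODELS = {
    "accuracy": ("parseqv4-large", "dbnetv2_1"),
    "balanced": ("parseqv4-short", "dbnetv2_1"),
    "cpu": ("parseqv4-tiny", "dbnetv2_1"),
}


def model_arguments(goal):
    """Return the --tr_name/--td_name arguments for a recommended combination."""
    tr_name, td_name = RECOMMENDED_MODELS[goal]
    return ["--tr_name", tr_name, "--td_name", td_name]
```

The returned list can be appended to a `yomitoku ${path_data}` argument list before the command is run.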