Server¶

YomiToku-Pro can be launched as a REST API server for document analysis over HTTP.

Setup¶

Installation¶

uv pip install -e ".[server]"

Environment Variables¶

Variable	Required	Description
`YOMITOKU_LICENSE_KEY`	Yes	License key
`YOMITOKU_SECRET_KEY`	Yes	Secret key
`YOMITOKU_DEVICE_TOKEN`	No	Path to device token file for offline authentication
`YOMITOKU_ENV`	No	Environment name

Starting the Server¶

Document Analyzer Server¶

yomitoku_server document_analyzer [--host 0.0.0.0] [--port 8000] [--device cuda]

Table Semantic Parser Server¶

yomitoku_server table_semantic_parser [--host 0.0.0.0] [--port 8000] [--device cuda]

Options¶

Option	Default	Description
`--host`	`0.0.0.0`	Host to bind to
`--port`	`8000`	Port to bind to
`--device`	Auto-detect	Device to use (`cuda`, `cpu`)
`-l`, `--lite`	-	Use lightweight models for faster inference (automatically enabled on CPU)
`--request-timeout`	`600`	Per-request processing timeout in seconds. Returns HTTP 504 when exceeded
`--max-pages`	Unlimited	Maximum number of PDF pages allowed per request. Returns HTTP 400 when exceeded
`--max-body-size-mb`	`100`	Maximum request body size in MB. Returns HTTP 413 when exceeded
`--max-long-side`	Unlimited	Maximum length of the long side (in pixels) per page/image. Returns HTTP 400 when exceeded
`--max-in-flight`	`8`	Maximum number of concurrent in-flight requests admitted. Returns HTTP 503 when exceeded

Concurrency and worker count

To avoid GPU resource contention, the server serializes inference with an in-process `asyncio.Lock`. Because this lock does not span processes, the server is always launched with a single worker (`workers=1`). To scale throughput, run multiple server processes horizontally (for example, across multiple GPUs or containers).
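The effect of an in-process lock can be pictured with a minimal sketch. This is not the server's actual code: `run_inference`, `serve_concurrently`, and the `stats` dictionary are hypothetical names, and the model call is simulated with a short sleep.

```python
import asyncio


async def run_inference(lock: asyncio.Lock, request_id: str, stats: dict) -> str:
    # Hypothetical handler: only one coroutine at a time may hold the lock.
    async with lock:
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
        await asyncio.sleep(0.01)  # stand-in for a GPU inference call
        stats["active"] -= 1
    return f"{request_id}: done"


async def serve_concurrently(n_requests: int) -> tuple[list[str], int]:
    # The lock lives inside one event loop, so it cannot coordinate
    # several worker processes; hence the single-worker launch.
    lock = asyncio.Lock()
    stats = {"active": 0, "peak": 0}
    results = await asyncio.gather(
        *(run_inference(lock, f"req-{i}", stats) for i in range(n_requests))
    )
    return list(results), stats["peak"]


results, peak = asyncio.run(serve_concurrently(4))
print(results, peak)
```

Even with four requests submitted at once, `peak` never exceeds 1 in this sketch, which is the property the single-worker launch is meant to preserve.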

Request fairness

The GPU lock is released and reacquired after each page, so pages from smaller requests can interleave with the pages of a large PDF; waiting requests acquire the lock in FIFO order. The `--max-in-flight` option caps the number of concurrently admitted requests, which bounds both the memory held by queued requests and the worst-case waiting time.
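A rough sketch of per-page locking combined with an admission cap, under stated assumptions: `AdmissionGate`, `handle_request`, and `demo` are hypothetical names, page processing is simulated with a sleep, and the real server may implement admission differently.

```python
import asyncio


class AdmissionGate:
    """Counts admitted requests; a hypothetical analogue of --max-in-flight."""

    def __init__(self, max_in_flight: int) -> None:
        self.max_in_flight = max_in_flight
        self.in_flight = 0

    def try_admit(self) -> bool:
        if self.in_flight >= self.max_in_flight:
            return False
        self.in_flight += 1
        return True

    def leave(self) -> None:
        self.in_flight -= 1


async def handle_request(name, n_pages, gate, gpu_lock, log) -> int:
    if not gate.try_admit():
        return 503  # over the in-flight cap: reject instead of queueing
    try:
        for page in range(1, n_pages + 1):
            # Take the lock per page, so other requests get a turn
            # between the pages of a long document.
            async with gpu_lock:
                log.append(f"{name}:p{page}")
                await asyncio.sleep(0.001)  # stand-in for page inference
        return 200
    finally:
        gate.leave()


async def demo():
    gate = AdmissionGate(max_in_flight=2)
    gpu_lock = asyncio.Lock()  # asyncio.Lock wakes waiters in FIFO order
    log = []
    statuses = await asyncio.gather(
        handle_request("big", 3, gate, gpu_lock, log),
        handle_request("small", 1, gate, gpu_lock, log),
        handle_request("extra", 1, gate, gpu_lock, log),
    )
    return list(statuses), log


statuses, page_order = asyncio.run(demo())
print(statuses, page_order)
```

In this sketch the one-page request is served right after the first page of the three-page request rather than after all of it, and the third request is rejected with 503 because two requests are already admitted.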

HTTP Status Codes¶

Status codes returned by the `/invocations` endpoint:

Code	Meaning	Trigger
200	OK	Successful processing
400	Bad Request	Invalid file format, or input exceeds `--max-pages` or `--max-long-side`
413	Payload Too Large	Request body exceeds `--max-body-size-mb`
500	Internal Server Error	Unexpected server error
503	Service Unavailable	Exceeds `--max-in-flight` concurrent admitted requests
504	Gateway Timeout	Processing time exceeds `--request-timeout`
507	Insufficient Storage	GPU out of memory (error code `5007 GPU_OUT_OF_MEMORY`)
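A caller might turn the table above into a small lookup when deciding how to react to a response. The sketch below is illustrative only: `StatusAdvice`, `STATUS_ADVICE`, and `advise` are hypothetical names, the meanings are copied from the table, and the retry flags and hints are assumptions rather than documented server behavior.

```python
from typing import NamedTuple


class StatusAdvice(NamedTuple):
    meaning: str
    retry: bool  # assumption: whether resending the same request may help
    hint: str


# Meanings follow the status table; retry flags and hints are assumptions.
STATUS_ADVICE = {
    200: StatusAdvice("OK", False, "use the response body"),
    400: StatusAdvice("Bad Request", False, "check file format, page count, and image size"),
    413: StatusAdvice("Payload Too Large", False, "send a smaller body or raise --max-body-size-mb"),
    500: StatusAdvice("Internal Server Error", False, "check the server logs"),
    503: StatusAdvice("Service Unavailable", True, "back off; too many in-flight requests"),
    504: StatusAdvice("Gateway Timeout", False, "split the input or raise --request-timeout"),
    507: StatusAdvice("Insufficient Storage", False, "GPU out of memory; reduce the input size"),
}


def advise(status_code: int) -> StatusAdvice:
    """Return advice for a status code, falling back to a generic entry."""
    return STATUS_ADVICE.get(
        status_code, StatusAdvice("Unknown", False, "check the server logs")
    )


print(advise(503))
```

Only 503 is marked as retryable here, on the assumption that capacity frees up once in-flight requests finish; adjust the flags to match your deployment.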

API Endpoints¶

`GET /ping`¶

Health check endpoint.

Response:

{"status": "healthy", "message": "Service is running"}
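A startup script could poll this endpoint until the server reports healthy. The following is a minimal sketch using only the standard library; `fetch_ping` and `wait_until_healthy` are hypothetical helper names, and the base URL assumes the default host and port.

```python
import json
import time
import urllib.request
from typing import Callable


def fetch_ping(base_url: str = "http://localhost:8000", timeout: float = 5.0) -> dict:
    """GET /ping and decode the JSON body."""
    with urllib.request.urlopen(f"{base_url.rstrip('/')}/ping", timeout=timeout) as response:
        return json.load(response)


def wait_until_healthy(
    fetch: Callable[[], dict] = fetch_ping,
    attempts: int = 30,
    delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until the status field is "healthy" or attempts run out."""
    for attempt in range(attempts):
        try:
            if fetch().get("status") == "healthy":
                return True
        except (OSError, ValueError):
            pass  # connection refused, timeout, or a non-JSON body
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

Calling `wait_until_healthy()` with the defaults polls `http://localhost:8000/ping` up to 30 times, one second apart; the `fetch` and `sleep` parameters exist so the loop can be exercised without a running server.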

`POST /invocations`¶

Analyze document images or PDFs.

Request:

Set the Content-Type header to the file format
Send binary data in the request body

Supported Content Types:

application/pdf
image/jpeg
image/png
image/tiff

Example Requests:

curl -X POST http://localhost:8000/invocations \
     -H "Content-Type: image/jpeg" \
     --data-binary @sample.jpg

curl -X POST http://localhost:8000/invocations \
     -H "Content-Type: application/pdf" \
     --data-binary @document.pdf
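The same request can be sent from Python. This is a sketch using only the standard library, not an official client: `content_type_for`, `build_invocation_request`, and `analyze` are hypothetical helpers, the extension mapping is an assumption based on the supported content types listed above, and the response is assumed to be JSON.

```python
import json
import urllib.request
from pathlib import Path

# Assumed mapping from file extension to the supported content types.
CONTENT_TYPES = {
    ".pdf": "application/pdf",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".tif": "image/tiff",
    ".tiff": "image/tiff",
}


def content_type_for(filename: str) -> str:
    suffix = Path(filename).suffix.lower()
    try:
        return CONTENT_TYPES[suffix]
    except KeyError:
        raise ValueError(f"unsupported file extension: {suffix!r}") from None


def build_invocation_request(
    filename: str, body: bytes, base_url: str = "http://localhost:8000"
) -> urllib.request.Request:
    """Build (but do not send) a POST /invocations request."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/invocations",
        data=body,
        headers={"Content-Type": content_type_for(filename)},
        method="POST",
    )


def analyze(path: str, base_url: str = "http://localhost:8000", timeout: float = 600.0):
    """Send the file at `path` and return the decoded JSON response."""
    request = build_invocation_request(path, Path(path).read_bytes(), base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

For example, `analyze("sample.jpg")` would post the image to a server on the default port. A non-2xx response raises `urllib.error.HTTPError`, whose `code` attribute carries the status codes listed earlier.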

Client CLI¶

The `yomitoku_client` command sends files to the server and exports results in the specified format.

Usage¶

yomitoku_client document_analyzer <input_file> [options]
yomitoku_client table_semantic_parser <input_file> [options]

Options¶

Option	Default	Description
`--url`	`http://localhost:8000`	Server URL
`-f`, `--format`	`json`	Output format (`json`, `csv`, `html`, `md`, `dict`). Separate multiple formats with commas
`-o`, `--outdir`	`results`	Output directory
`--encoding`	`utf-8`	Output file encoding
`--combine`	-	Merge all PDF pages into a single output file
`--figure`	-	Export figures in the output
`--figure_letter`	-	Export letters within figures in the output
`--figure_width`	`200`	Width of exported figure images in pixels
`--figure_dir`	`figures`	Directory to save figure images
`-v`, `--vis`	-	Save visualization images (layout & OCR) of the results
`--vis_dir`	`""`	Subdirectory under outdir for visualization images
`--font_path`	Bundled font	Path to font file for OCR visualization
`--ignore_line_break`	-	Remove line breaks from output
`--dpi`	`200`	DPI for loading PDF files
`--pages`	All pages	Pages to process (e.g., `1,2,5-10`)
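To make the `--pages` syntax concrete, here is one way a specification such as `1,2,5-10` could be expanded. This is an illustrative sketch, not the client's parser: `parse_pages` is a hypothetical function, and 1-based page numbers with inclusive ranges are assumptions.

```python
def parse_pages(spec: str) -> list[int]:
    """Expand a spec like "1,2,5-10" into a sorted list of page numbers.

    Assumes 1-based page numbers and inclusive ranges.
    """
    pages: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start < 1 or end < start:
                raise ValueError(f"invalid page range: {part!r}")
            pages.update(range(start, end + 1))
        else:
            page = int(part)
            if page < 1:
                raise ValueError(f"invalid page number: {part!r}")
            pages.add(page)
    return sorted(pages)


print(parse_pages("1,2,5-10"))  # [1, 2, 5, 6, 7, 8, 9, 10]
```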

Examples¶

# Document Analyzer: Export as JSON
yomitoku_client document_analyzer sample.jpg -f json -o results

# Document Analyzer: Export as both Markdown and HTML
yomitoku_client document_analyzer document.pdf -f md,html -o results

# Document Analyzer: Merge all PDF pages into a single Markdown
yomitoku_client document_analyzer document.pdf -f md --combine -o results

# Document Analyzer: Export Markdown with figures
yomitoku_client document_analyzer sample.jpg -f md --figure -o results

# Document Analyzer: Save visualization images (layout & OCR)
yomitoku_client document_analyzer sample.jpg -f json -v -o results

# Table Semantic Parser: Export as JSON and dict
yomitoku_client table_semantic_parser form.jpg -f json,dict -o results

# Connect to a different host/port
yomitoku_client document_analyzer sample.jpg --url http://192.168.1.100:8080

Document Analyzer Output Formats¶

Format	Description
`json`	Full analysis result as JSON
`csv`	Table structures converted to CSV
`html`	HTML document with paragraphs and tables
`md`	Markdown format

Table Semantic Parser Output Formats¶

Format	Description
`json`	Full analysis result as JSON
`dict`	Table cells converted to key-value dictionary format as JSON

Client CLI Error Codes¶

Code	Error Name	Description	Resolution
7001	`CLIENT_FILE_NOT_FOUND`	Input file not found	Check the file path
7002	`CLIENT_UNSUPPORTED_INPUT_FORMAT`	Unsupported input file format	Use JPG, PNG, BMP, TIFF, or PDF
7003	`CLIENT_UNSUPPORTED_FILE_FORMAT`	Extension cannot be mapped to a Content-Type	Use a file with a supported extension
7004	`CLIENT_CONNECTION_ERROR`	Cannot connect to the server	Verify the server is running and the URL is correct
7005	`CLIENT_TIMEOUT`	Server response timed out (60s)	Check the server load
7006	`CLIENT_REQUEST_FAILED`	Network error during request	Check network connectivity
7007	`CLIENT_SERVER_ERROR`	Server returned an error status	Check server logs (400: bad request, 500: internal error)
7008	`CLIENT_INVALID_JSON_RESPONSE`	Server returned invalid JSON	Check server version and status
7009	`CLIENT_EMPTY_RESULT`	No analysis results in response	Check the input file content and server logs

API Documentation¶

While the server is running, API documentation is available at:

Swagger UI: http://localhost:8000/docs
ReDoc: http://localhost:8000/redoc

For a static API reference without starting the server, see the OpenAPI JSON files under `docs/`.

Regenerating OpenAPI Schemas¶

When API definitions change, regenerate the OpenAPI JSON files under `docs/`:

python scripts/generate_openapi.py --output-dir docs

Docker Usage¶

To use Docker, you first need to clone the repository.

1. Clone the Repository¶

git clone https://github.com/MLism-Inc/yomitoku-pro.git
cd yomitoku-pro

2. Set Environment Variables¶

Create a `.env` file in the `docker/` directory or set environment variables directly.

export YOMITOKU_LICENSE_KEY="your-license-key"
export YOMITOKU_SECRET_KEY="your-secret-key"

3. Build & Run¶

cd docker

Each platform has one service for Document Analyzer (no suffix) and one for Table Semantic Parser (`_tsp` suffix).

Service	Dockerfile	Platform	Analyzer	Port
`arm64_cpu`	`Dockerfile.cpu`	linux/arm64	Document Analyzer	8000
`arm64_cpu_tsp`	`Dockerfile.cpu`	linux/arm64	Table Semantic Parser	8001
`amd64_cpu`	`Dockerfile.cpu`	linux/amd64	Document Analyzer	8000
`amd64_cpu_tsp`	`Dockerfile.cpu`	linux/amd64	Table Semantic Parser	8001
`amd64_gpu`	`Dockerfile.gpu`	linux/amd64	Document Analyzer	8000
`amd64_gpu_tsp`	`Dockerfile.gpu`	linux/amd64	Table Semantic Parser	8001

ARM64 CPU (Apple Silicon):

# Document Analyzer only
docker compose up arm64_cpu --build

# Table Semantic Parser only
docker compose up arm64_cpu_tsp --build

# Both at the same time
docker compose up arm64_cpu arm64_cpu_tsp --build

AMD64 CPU:

docker compose up amd64_cpu --build
docker compose up amd64_cpu_tsp --build

AMD64 GPU (NVIDIA):

docker compose up amd64_gpu --build
docker compose up amd64_gpu_tsp --build

Note

The GPU services (`amd64_gpu`, `amd64_gpu_tsp`) use an NVIDIA CUDA base image and only work on AMD64 environments. For ARM64 or environments without a GPU, use one of the CPU services.

Device Token Setup (Offline Authentication)¶

To use a device token for offline authentication, you need to edit the Dockerfile directly.

Steps:

1. Place your device token file (`device_token.txt`) in the repository root.
2. Open the Dockerfile you are using (`docker/Dockerfile.gpu` or `docker/Dockerfile.cpu`) and uncomment the lines near the end:

# Before (commented out)
#COPY device_token.txt ${server_dir}/device_token.txt
#ENV YOMITOKU_DEVICE_TOKEN=${server_dir}/device_token.txt
#ENV YOMITOKU_ENV=${ENVIRONMENT}

# After (uncommented)
COPY device_token.txt ${server_dir}/device_token.txt
ENV YOMITOKU_DEVICE_TOKEN=${server_dir}/device_token.txt
ENV YOMITOKU_ENV=production

Set `YOMITOKU_ENV` to the appropriate environment name for your deployment.

Custom Commands¶

You can also use the `ANALYZER_TYPE` build argument to switch the analyzer type at build time:

# Build directly as Table Semantic Parser
docker build --build-arg ANALYZER_TYPE=table_semantic_parser -f docker/Dockerfile.gpu -t yomitoku-server:tsp .
docker run yomitoku-server:tsp
