Server

YomiToku-Pro can be launched as a REST API server for document analysis over HTTP.

Setup

Installation

uv pip install -e ".[server]"

Environment Variables

| Variable | Required | Description |
| --- | --- | --- |
| YOMITOKU_LICENSE_KEY | Yes | License key |
| YOMITOKU_SECRET_KEY | Yes | Secret key |
| YOMITOKU_DEVICE_TOKEN | No | Path to device token file for offline authentication |
| YOMITOKU_ENV | No | Environment name |
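
As a preflight step, a launcher script could confirm that the required variables are set before starting the server. The sketch below is illustrative only: the helper name missing_required is hypothetical and not part of YomiToku-Pro, and the key value is a placeholder.

```python
import os

# Variables marked "Yes" in the Required column above.
REQUIRED_VARS = ("YOMITOKU_LICENSE_KEY", "YOMITOKU_SECRET_KEY")


def missing_required(env=None):
    """Return the required variable names that are unset or empty (hypothetical helper)."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


# Example with an explicit mapping instead of the real process environment.
missing = missing_required({"YOMITOKU_LICENSE_KEY": "your-license-key"})
print(missing)  # ['YOMITOKU_SECRET_KEY']
```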

Starting the Server

Document Analyzer Server

yomitoku_server document_analyzer [--host 0.0.0.0] [--port 8000] [--device cuda]

Table Semantic Parser Server

yomitoku_server table_semantic_parser [--host 0.0.0.0] [--port 8000] [--device cuda]

Options

| Option | Default | Description |
| --- | --- | --- |
| --host | 0.0.0.0 | Host to bind to |
| --port | 8000 | Port to bind to |
| --device | Auto-detect | Device to use (cuda, cpu) |
| -l, --lite | - | Use lightweight models for faster inference (automatically enabled on CPU) |
| --request-timeout | 600 | Per-request processing timeout in seconds. Returns HTTP 504 when exceeded |
| --max-pages | Unlimited | Maximum number of PDF pages allowed per request. Returns HTTP 400 when exceeded |
| --max-body-size-mb | 100 | Maximum request body size in MB. Returns HTTP 413 when exceeded |
| --max-long-side | Unlimited | Maximum length of the long side (in pixels) per page/image. Returns HTTP 400 when exceeded |
| --max-in-flight | 8 | Maximum number of concurrent in-flight requests admitted. Returns HTTP 503 when exceeded |

Concurrency and worker count

To avoid GPU resource contention, the server serializes inference using an in-process asyncio.Lock. Because this lock does not span processes, the server is always launched with a single worker (workers=1). To scale throughput, run multiple processes horizontally (e.g. multiple GPUs or containers).
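
The serialization described above can be pictured with a small standalone sketch. It is not the server's code: run_inference and demo are hypothetical names, and asyncio.sleep stands in for model inference. It only shows that coroutines sharing one asyncio.Lock never overlap inside the locked region.

```python
import asyncio


async def run_inference(lock: asyncio.Lock, name: str, log: list) -> None:
    # Only one coroutine at a time may hold the lock, so the "GPU work" never overlaps.
    async with lock:
        log.append(f"{name} start")
        await asyncio.sleep(0.01)  # stand-in for model inference
        log.append(f"{name} end")


async def demo() -> list:
    # The lock lives inside this process only; a second worker process
    # would create its own lock and could use the GPU at the same time.
    lock = asyncio.Lock()
    log: list = []
    await asyncio.gather(*(run_inference(lock, f"req{i}", log) for i in range(3)))
    return log


events = asyncio.run(demo())
print(events)
```

Every "start" entry is followed directly by the matching "end" entry, which is the property that a lock shared across several worker processes could not provide.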

Request fairness

The GPU lock is released and reacquired after every page, so smaller requests can interleave between the pages of a large PDF; waiting requests are served in FIFO order. The --max-in-flight option caps the number of concurrently admitted requests, which bounds both the memory held by queued requests and the worst-case waiting time.
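
A minimal sketch of both ideas, under stated assumptions: the Admission class and the process coroutine are hypothetical illustrations rather than the server's implementation, and asyncio.sleep again stands in for inference. The lock is taken once per page, and a simple counter rejects requests beyond the cap.

```python
import asyncio


class Admission:
    """Hypothetical admission counter standing in for --max-in-flight."""

    def __init__(self, max_in_flight: int) -> None:
        self.max_in_flight = max_in_flight
        self.in_flight = 0

    def try_admit(self) -> bool:
        if self.in_flight >= self.max_in_flight:
            return False  # the server would answer HTTP 503 at this point
        self.in_flight += 1
        return True

    def release(self) -> None:
        self.in_flight -= 1


async def process(lock: asyncio.Lock, name: str, pages: int, order: list) -> None:
    for page in range(1, pages + 1):
        # Acquire per page, not per request, so waiters get a turn between pages.
        async with lock:
            order.append(f"{name} p{page}")
            await asyncio.sleep(0.01)  # stand-in for inference on one page


async def demo() -> list:
    lock = asyncio.Lock()
    order: list = []
    await asyncio.gather(
        process(lock, "big", 3, order),
        process(lock, "small", 1, order),
    )
    return order


order = asyncio.run(demo())
print(order)

gate = Admission(max_in_flight=2)
print([gate.try_admit() for _ in range(3)])  # [True, True, False]
```

Because asyncio.Lock wakes waiters in arrival order, the single page of the small request is handled before the large request finishes, instead of waiting for all three of its pages.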

HTTP Status Codes

Status codes returned by the /invocations endpoint:

| Code | Meaning | Trigger |
| --- | --- | --- |
| 200 | OK | Successful processing |
| 400 | Bad Request | Invalid file format, exceeds --max-pages or --max-long-side |
| 413 | Payload Too Large | Request body exceeds --max-body-size-mb |
| 500 | Internal Server Error | Unexpected server error |
| 503 | Service Unavailable | Exceeds --max-in-flight concurrent admitted requests |
| 504 | Gateway Timeout | Processing time exceeds --request-timeout |
| 507 | Insufficient Storage | GPU out of memory (error code 5007 GPU_OUT_OF_MEMORY) |
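
A caller might translate these codes into a follow-up action. The mapping below is only one possible policy, offered as an assumption; the function name next_action and the action labels are hypothetical and not defined by YomiToku-Pro.

```python
def next_action(status: int) -> str:
    """Suggest a client-side reaction to an /invocations status code (illustrative policy)."""
    if status == 200:
        return "done"
    if status == 503:
        return "retry_later"  # server is at its --max-in-flight capacity
    if status in (400, 413):
        return "fix_request"  # bad format, too many pages, or body too large
    if status == 504:
        return "split_input_or_raise_timeout"  # processing exceeded --request-timeout
    if status == 507:
        return "reduce_input_size"  # GPU ran out of memory
    return "check_server_logs"  # 500 and anything unexpected


for code in (200, 400, 503, 507):
    print(code, next_action(code))
```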

API Endpoints

GET /ping

Health check endpoint.

Response:

{"status": "healthy", "message": "Service is running"}
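
For scripted readiness checks, the same endpoint can be polled from Python with the standard library alone. This is a sketch: is_healthy is a hypothetical helper, and the base URL and timeout defaults are assumptions matching the examples in this document.

```python
import json
import urllib.error
import urllib.request


def is_healthy(base_url: str = "http://localhost:8000", timeout: float = 5.0) -> bool:
    """Return True if GET /ping answers with {"status": "healthy", ...}."""
    try:
        with urllib.request.urlopen(f"{base_url}/ping", timeout=timeout) as resp:
            body = json.loads(resp.read().decode("utf-8"))
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, HTTP error status, or a non-JSON body.
        return False
    return isinstance(body, dict) and body.get("status") == "healthy"


print(is_healthy())
```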

POST /invocations

Analyze document images or PDFs.

Request:

  • Set the Content-Type header to the file format
  • Send binary data in the request body

Supported Content Types:

  • application/pdf
  • image/jpeg
  • image/png
  • image/tiff

Example Requests:

curl -X POST http://localhost:8000/invocations \
     -H "Content-Type: image/jpeg" \
     --data-binary @sample.jpg
curl -X POST http://localhost:8000/invocations \
     -H "Content-Type: application/pdf" \
     --data-binary @document.pdf
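
The curl calls above translate to Python roughly as follows. This is a sketch using only the standard library: build_request and analyze are hypothetical helper names, the extension-to-Content-Type table is an assumption derived from the supported types listed above, and the response is assumed to be JSON.

```python
import json
import urllib.request
from pathlib import Path

# Assumed mapping from file extension to the supported Content-Type values.
CONTENT_TYPES = {
    ".pdf": "application/pdf",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".tif": "image/tiff",
    ".tiff": "image/tiff",
}


def build_request(path: str, data: bytes,
                  base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Create the POST /invocations request with the raw file bytes as the body."""
    suffix = Path(path).suffix.lower()
    if suffix not in CONTENT_TYPES:
        raise ValueError(f"unsupported extension: {suffix!r}")
    return urllib.request.Request(
        f"{base_url}/invocations",
        data=data,
        headers={"Content-Type": CONTENT_TYPES[suffix]},
        method="POST",
    )


def analyze(path: str, base_url: str = "http://localhost:8000", timeout: float = 600.0):
    """Send a file to the server and return the decoded JSON response."""
    req = build_request(path, Path(path).read_bytes(), base_url)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read())
```

With a server running, analyze("sample.jpg") would send the same request as the first curl example. A non-2xx status raises urllib.error.HTTPError, whose code attribute can be compared against the status table above.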

Client CLI

The yomitoku_client command sends files to the server and exports results in the specified format.

Usage

yomitoku_client document_analyzer <input_file> [options]
yomitoku_client table_semantic_parser <input_file> [options]

Options

| Option | Default | Description |
| --- | --- | --- |
| --url | http://localhost:8000 | Server URL |
| -f, --format | json | Output format (json, csv, html, md, dict). Comma-separated for multiple |
| -o, --outdir | results | Output directory |
| --encoding | utf-8 | Output file encoding |
| --combine | - | Merge all PDF pages into a single output file |
| --figure | - | Export figures in the output |
| --figure_letter | - | Export letters within figures in the output |
| --figure_width | 200 | Width of exported figure images in pixels |
| --figure_dir | figures | Directory to save figure images |
| -v, --vis | - | Save visualization images (layout & OCR) of the results |
| --vis_dir | "" | Subdirectory under outdir for visualization images |
| --font_path | Bundled font | Path to font file for OCR visualization |
| --ignore_line_break | - | Remove line breaks from output |
| --dpi | 200 | DPI for loading PDF files |
| --pages | All pages | Pages to process (e.g., 1,2,5-10) |
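
To make the --pages syntax concrete, here is one way a specification such as 1,2,5-10 could be expanded. The parser is a hypothetical illustration, not the client's actual code; 1-based numbering, inclusive ranges, and de-duplication are assumptions.

```python
def parse_pages(spec: str) -> list:
    """Expand a page specification like "1,2,5-10" into a sorted list of page numbers."""
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            start_s, end_s = part.split("-", 1)
            start, end = int(start_s), int(end_s)
            if start < 1 or end < start:
                raise ValueError(f"invalid page range: {part!r}")
            pages.update(range(start, end + 1))  # ranges are inclusive
        else:
            page = int(part)
            if page < 1:
                raise ValueError(f"invalid page number: {part!r}")
            pages.add(page)
    return sorted(pages)


print(parse_pages("1,2,5-10"))  # [1, 2, 5, 6, 7, 8, 9, 10]
```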

Examples

# Document Analyzer: Export as JSON
yomitoku_client document_analyzer sample.jpg -f json -o results

# Document Analyzer: Export as both Markdown and HTML
yomitoku_client document_analyzer document.pdf -f md,html -o results

# Document Analyzer: Merge all PDF pages into a single Markdown
yomitoku_client document_analyzer document.pdf -f md --combine -o results

# Document Analyzer: Export Markdown with figures
yomitoku_client document_analyzer sample.jpg -f md --figure -o results

# Document Analyzer: Save visualization images (layout & OCR)
yomitoku_client document_analyzer sample.jpg -f json -v -o results

# Table Semantic Parser: Export as JSON and dict
yomitoku_client table_semantic_parser form.jpg -f json,dict -o results

# Connect to a different host/port
yomitoku_client document_analyzer sample.jpg --url http://192.168.1.100:8080

Document Analyzer Output Formats

| Format | Description |
| --- | --- |
| json | Full analysis result as JSON |
| csv | Table structures converted to CSV |
| html | HTML document with paragraphs and tables |
| md | Markdown format |

Table Semantic Parser Output Formats

| Format | Description |
| --- | --- |
| json | Full analysis result as JSON |
| dict | Table cells converted to key-value dictionary format as JSON |

Client CLI Error Codes

| Code | Error Name | Description | Resolution |
| --- | --- | --- | --- |
| 7001 | CLIENT_FILE_NOT_FOUND | Input file not found | Check the file path |
| 7002 | CLIENT_UNSUPPORTED_INPUT_FORMAT | Unsupported input file format | Use JPG, PNG, BMP, TIFF, or PDF |
| 7003 | CLIENT_UNSUPPORTED_FILE_FORMAT | Extension cannot be mapped to a Content-Type | Use a file with a supported extension |
| 7004 | CLIENT_CONNECTION_ERROR | Cannot connect to the server | Verify the server is running and the URL is correct |
| 7005 | CLIENT_TIMEOUT | Server response timed out (60s) | Check the server load |
| 7006 | CLIENT_REQUEST_FAILED | Network error during request | Check network connectivity |
| 7007 | CLIENT_SERVER_ERROR | Server returned an error status | Check server logs (400: bad request, 500: internal error) |
| 7008 | CLIENT_INVALID_JSON_RESPONSE | Server returned invalid JSON | Check server version and status |
| 7009 | CLIENT_EMPTY_RESULT | No analysis results in response | Check the input file content and server logs |

API Documentation

While the server is running, API documentation is available at:

  • Swagger UI: http://localhost:8000/docs
  • ReDoc: http://localhost:8000/redoc

For a static API reference without starting the server, see:

Regenerating OpenAPI Schemas

When API definitions change, regenerate the OpenAPI JSON files under docs/:

python scripts/generate_openapi.py --output-dir docs

Docker Usage

To use Docker, you first need to clone the repository.

1. Clone the Repository

git clone https://github.com/MLism-Inc/yomitoku-pro.git
cd yomitoku-pro

2. Set Environment Variables

Create a .env file in the docker/ directory or set environment variables directly.

export YOMITOKU_LICENSE_KEY="your-license-key"
export YOMITOKU_SECRET_KEY="your-secret-key"

3. Build & Run

cd docker

Each platform has services for both Document Analyzer (no suffix) and Table Semantic Parser (_tsp).

| Service | Dockerfile | Platform | Analyzer | Port |
| --- | --- | --- | --- | --- |
| arm64_cpu | Dockerfile.cpu | linux/arm64 | Document Analyzer | 8000 |
| arm64_cpu_tsp | Dockerfile.cpu | linux/arm64 | Table Semantic Parser | 8001 |
| amd64_cpu | Dockerfile.cpu | linux/amd64 | Document Analyzer | 8000 |
| amd64_cpu_tsp | Dockerfile.cpu | linux/amd64 | Table Semantic Parser | 8001 |
| amd64_gpu | Dockerfile.gpu | linux/amd64 | Document Analyzer | 8000 |
| amd64_gpu_tsp | Dockerfile.gpu | linux/amd64 | Table Semantic Parser | 8001 |

ARM64 CPU (Apple Silicon):

# Document Analyzer only
docker compose up arm64_cpu --build

# Table Semantic Parser only
docker compose up arm64_cpu_tsp --build

# Both at the same time
docker compose up arm64_cpu arm64_cpu_tsp --build

AMD64 CPU:

docker compose up amd64_cpu --build
docker compose up amd64_cpu_tsp --build

AMD64 GPU (NVIDIA):

docker compose up amd64_gpu --build
docker compose up amd64_gpu_tsp --build

Note

The GPU services (amd64_gpu, amd64_gpu_tsp) use an NVIDIA CUDA base image and only work on AMD64 environments. For ARM64 or environments without a GPU, use one of the CPU services.

Device Token Setup (Offline Authentication)

To use a device token for offline authentication, you need to edit the Dockerfile directly.

Steps:

  1. Place your device token file (device_token.txt) in the repository root
  2. Open the Dockerfile you are using (docker/Dockerfile.gpu or docker/Dockerfile.cpu) and uncomment the lines near the end:

# Before (commented out)
#COPY device_token.txt ${server_dir}/device_token.txt
#ENV YOMITOKU_DEVICE_TOKEN=${server_dir}/device_token.txt
#ENV YOMITOKU_ENV=${ENVIRONMENT}

# After (uncommented)
COPY device_token.txt ${server_dir}/device_token.txt
ENV YOMITOKU_DEVICE_TOKEN=${server_dir}/device_token.txt
ENV YOMITOKU_ENV=production

Set YOMITOKU_ENV to the appropriate environment name for your deployment.

Custom Commands

You can also use the ANALYZER_TYPE build argument to switch the analyzer type at build time:

# Build directly as Table Semantic Parser
docker build --build-arg ANALYZER_TYPE=table_semantic_parser -f docker/Dockerfile.gpu -t yomitoku-server:tsp .
docker run yomitoku-server:tsp