Server¶
YomiToku-Pro can be launched as a REST API server for document analysis over HTTP.
Setup¶
Installation¶
Environment Variables¶
| Variable | Required | Description |
|---|---|---|
| YOMITOKU_LICENSE_KEY | Yes | License key |
| YOMITOKU_SECRET_KEY | Yes | Secret key |
| YOMITOKU_DEVICE_TOKEN | No | Path to device token file for offline authentication |
| YOMITOKU_ENV | No | Environment name |
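A preflight check can catch missing credentials before the server is launched. The sketch below is not part of YomiToku-Pro: `missing_required_vars` is a hypothetical helper, and it only looks at the two variables marked as required in the table above.

```python
import os

# Variables the table above marks as required.
REQUIRED_VARS = ("YOMITOKU_LICENSE_KEY", "YOMITOKU_SECRET_KEY")


def missing_required_vars(env=None):
    """Return the required variable names that are unset or empty.

    Hypothetical helper for illustration; pass a dict to check something
    other than the current process environment.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

Calling `missing_required_vars()` with no argument inspects the current process environment; an empty list means both required variables are set.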
Starting the Server¶
Document Analyzer Server¶
Table Semantic Parser Server¶
Options¶
| Option | Default | Description |
|---|---|---|
| --host | 0.0.0.0 | Host to bind to |
| --port | 8000 | Port to bind to |
| --device | Auto-detect | Device to use (cuda, cpu) |
| -l, --lite | - | Use lightweight models for faster inference (automatically enabled on CPU) |
| --request-timeout | 600 | Per-request processing timeout in seconds. Returns HTTP 504 when exceeded |
| --max-pages | Unlimited | Maximum number of PDF pages allowed per request. Returns HTTP 400 when exceeded |
| --max-body-size-mb | 100 | Maximum request body size in MB. Returns HTTP 413 when exceeded |
| --max-long-side | Unlimited | Maximum length of the long side (in pixels) per page/image. Returns HTTP 400 when exceeded |
| --max-in-flight | 8 | Maximum number of concurrent in-flight requests admitted. Returns HTTP 503 when exceeded |
Concurrency and worker count
To avoid GPU resource contention, the server serializes inference using an
in-process asyncio.Lock. Because this lock does not span processes, the
server is always launched with a single worker (workers=1). To scale
throughput, run multiple processes horizontally (e.g. multiple GPUs or
containers).
Request fairness
The GPU lock is released and reacquired after each page, so smaller requests
can interleave between the pages of a large PDF; waiting requests are served
in FIFO order. The --max-in-flight option caps the number of concurrently
admitted requests, which bounds both the memory held by queued requests and
the worst-case waiting time.
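The two notes above can be illustrated with a toy model. This is a sketch of the described behavior, not the server's actual code: `Scheduler`, `Busy`, and `demo` are hypothetical names, and `asyncio.sleep(0)` stands in for inference on one page.

```python
import asyncio

MAX_IN_FLIGHT = 8  # mirrors the --max-in-flight default


class Busy(Exception):
    """Admission refused; the real server answers HTTP 503 in this case."""


class Scheduler:
    def __init__(self, max_in_flight=MAX_IN_FLIGHT):
        self.max_in_flight = max_in_flight
        self.in_flight = 0
        self.gpu_lock = asyncio.Lock()  # in-process only, hence one worker
        self.trace = []  # (request name, page index) in execution order

    async def handle(self, name, pages):
        # Admission control: refuse instead of queueing without bound.
        if self.in_flight >= self.max_in_flight:
            raise Busy(name)
        self.in_flight += 1
        try:
            for page in range(pages):
                # Hold the lock for one page only, so waiters get a turn.
                async with self.gpu_lock:
                    self.trace.append((name, page))
                    await asyncio.sleep(0)  # stand-in for inference
        finally:
            self.in_flight -= 1


async def demo():
    sched = Scheduler()
    # A three-page request and a one-page request arrive together.
    await asyncio.gather(sched.handle("big", 3), sched.handle("small", 1))
    return sched.trace
```

Running `asyncio.run(demo())` returns a trace in which the single page of `small` is processed before `big` reaches its last page, because `asyncio.Lock` wakes waiters in the order they started waiting.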
HTTP Status Codes¶
Status codes returned by the /invocations endpoint:
| Code | Meaning | Trigger |
|---|---|---|
| 200 | OK | Successful processing |
| 400 | Bad Request | Invalid file format, exceeds --max-pages or --max-long-side |
| 413 | Payload Too Large | Request body exceeds --max-body-size-mb |
| 500 | Internal Server Error | Unexpected server error |
| 503 | Service Unavailable | Exceeds --max-in-flight concurrent admitted requests |
| 504 | Gateway Timeout | Processing time exceeds --request-timeout |
| 507 | Insufficient Storage | GPU out of memory (error code 5007 GPU_OUT_OF_MEMORY) |
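A client can turn these codes into a coarse next step. The mapping below is one possible reading of the table, not behavior defined by YomiToku-Pro; the action labels and the `next_action` helper are hypothetical.

```python
# Suggested client reaction per status code (an assumption, not server policy).
ACTIONS = {
    200: "done",
    400: "fix_request",       # invalid format, too many pages, or page too large
    413: "fix_request",       # body larger than --max-body-size-mb
    500: "report",            # unexpected server error
    503: "retry_later",       # over --max-in-flight; back off and resend
    504: "reduce_or_extend",  # shrink the input or raise --request-timeout
    507: "reduce_input",      # GPU out of memory
}


def next_action(status):
    """Map an /invocations status code to a coarse client action."""
    # Codes not listed in the table are treated like unexpected errors.
    return ACTIONS.get(status, "report")
```

Only 503 is treated as plainly retryable here, since resending the same input after a 504 or 507 is likely to fail the same way.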
API Endpoints¶
GET /ping¶
Health check endpoint.
Response:
POST /invocations¶
Analyze document images or PDFs.
Request:
- Set the Content-Type header to the file format
- Send binary data in the request body
Supported Content Types:
- application/pdf
- image/jpeg
- image/png
- image/tiff
Example Requests:
curl -X POST http://localhost:8000/invocations \
  -H "Content-Type: image/jpeg" \
  --data-binary @sample.jpg

curl -X POST http://localhost:8000/invocations \
  -H "Content-Type: application/pdf" \
  --data-binary @document.pdf
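The same request can be sent from Python with only the standard library. This is a minimal sketch: `build_request` and `analyze` are hypothetical helpers, the extension-to-type mapping (including `.jpeg` and `.tif`) is an assumption, and the 600-second timeout simply mirrors the server's default `--request-timeout`.

```python
from pathlib import Path
from urllib.request import Request, urlopen

# Assumed mapping from file extension to the supported Content-Type values.
CONTENT_TYPES = {
    ".pdf": "application/pdf",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".tif": "image/tiff",
    ".tiff": "image/tiff",
}


def build_request(path, data, base_url="http://localhost:8000"):
    """Build a POST /invocations request carrying raw file bytes."""
    suffix = Path(path).suffix.lower()
    if suffix not in CONTENT_TYPES:
        raise ValueError(f"unsupported file extension: {suffix!r}")
    return Request(
        base_url.rstrip("/") + "/invocations",
        data=data,
        headers={"Content-Type": CONTENT_TYPES[suffix]},
        method="POST",
    )


def analyze(path, base_url="http://localhost:8000", timeout=600):
    """Send the file and return the raw response body as bytes."""
    data = Path(path).read_bytes()
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    with urlopen(build_request(path, data, base_url), timeout=timeout) as resp:
        return resp.read()
```

With a server running locally, `analyze("sample.jpg")` corresponds to the first curl command above.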
Client CLI¶
The yomitoku_client command sends files to the server and exports results in the specified format.
Usage¶
yomitoku_client document_analyzer <input_file> [options]
yomitoku_client table_semantic_parser <input_file> [options]
Options¶
| Option | Default | Description |
|---|---|---|
| --url | http://localhost:8000 | Server URL |
| -f, --format | json | Output format (json, csv, html, md, dict). Comma-separated for multiple |
| -o, --outdir | results | Output directory |
| --encoding | utf-8 | Output file encoding |
| --combine | - | Merge all PDF pages into a single output file |
| --figure | - | Export figures in the output |
| --figure_letter | - | Export letters within figures in the output |
| --figure_width | 200 | Width of exported figure images in pixels |
| --figure_dir | figures | Directory to save figure images |
| -v, --vis | - | Save visualization images (layout & OCR) of the results |
| --vis_dir | "" | Subdirectory under outdir for visualization images |
| --font_path | Bundled font | Path to font file for OCR visualization |
| --ignore_line_break | - | Remove line breaks from output |
| --dpi | 200 | DPI for loading PDF files |
| --pages | All pages | Pages to process (e.g., 1,2,5-10) |
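The --pages value mixes single page numbers and ranges. How the client parses it is not shown here; the sketch below is one plausible interpretation, assuming 1-based page numbers and inclusive ranges. `parse_pages` is a hypothetical helper, not part of yomitoku_client.

```python
def parse_pages(spec):
    """Expand a spec such as "1,2,5-10" into a sorted list of page numbers.

    Assumes 1-based numbering and inclusive ranges (illustrative only).
    """
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            start, end = (int(x) for x in part.split("-", 1))
            if start > end:
                raise ValueError(f"invalid range: {part!r}")
            pages.update(range(start, end + 1))
        else:
            pages.add(int(part))
    return sorted(pages)
```

Under these assumptions, `parse_pages("1,2,5-10")` yields pages 1, 2, and 5 through 10.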
Examples¶
# Document Analyzer: Export as JSON
yomitoku_client document_analyzer sample.jpg -f json -o results
# Document Analyzer: Export as both Markdown and HTML
yomitoku_client document_analyzer document.pdf -f md,html -o results
# Document Analyzer: Merge all PDF pages into a single Markdown
yomitoku_client document_analyzer document.pdf -f md --combine -o results
# Document Analyzer: Export Markdown with figures
yomitoku_client document_analyzer sample.jpg -f md --figure -o results
# Document Analyzer: Save visualization images (layout & OCR)
yomitoku_client document_analyzer sample.jpg -f json -v -o results
# Table Semantic Parser: Export as JSON and dict
yomitoku_client table_semantic_parser form.jpg -f json,dict -o results
# Connect to a different host/port
yomitoku_client document_analyzer sample.jpg --url http://192.168.1.100:8080
Document Analyzer Output Formats¶
| Format | Description |
|---|---|
| json | Full analysis result as JSON |
| csv | Table structures converted to CSV |
| html | HTML document with paragraphs and tables |
| md | Markdown format |
Table Semantic Parser Output Formats¶
| Format | Description |
|---|---|
| json | Full analysis result as JSON |
| dict | Table cells converted to key-value dictionary format as JSON |
Client CLI Error Codes¶
| Code | Error Name | Description | Resolution |
|---|---|---|---|
| 7001 | CLIENT_FILE_NOT_FOUND | Input file not found | Check the file path |
| 7002 | CLIENT_UNSUPPORTED_INPUT_FORMAT | Unsupported input file format | Use JPG, PNG, BMP, TIFF, or PDF |
| 7003 | CLIENT_UNSUPPORTED_FILE_FORMAT | Extension cannot be mapped to a Content-Type | Use a file with a supported extension |
| 7004 | CLIENT_CONNECTION_ERROR | Cannot connect to the server | Verify the server is running and the URL is correct |
| 7005 | CLIENT_TIMEOUT | Server response timed out (60s) | Check the server load |
| 7006 | CLIENT_REQUEST_FAILED | Network error during request | Check network connectivity |
| 7007 | CLIENT_SERVER_ERROR | Server returned an error status | Check server logs (400: bad request, 500: internal error) |
| 7008 | CLIENT_INVALID_JSON_RESPONSE | Server returned invalid JSON | Check server version and status |
| 7009 | CLIENT_EMPTY_RESULT | No analysis results in response | Check the input file content and server logs |
API Documentation¶
While the server is running, API documentation is available at:
- Swagger UI: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc
For a static API reference without starting the server, see:
Regenerating OpenAPI Schemas¶
When API definitions change, regenerate the OpenAPI JSON files under docs/:
Docker Usage¶
To use Docker, you first need to clone the repository.
1. Clone the Repository¶
2. Set Environment Variables¶
Create a .env file in the docker/ directory or set environment variables directly.
3. Build & Run¶
Each platform has services for both Document Analyzer (no suffix) and Table Semantic Parser (_tsp).
| Service | Dockerfile | Platform | Analyzer | Port |
|---|---|---|---|---|
| arm64_cpu | Dockerfile.cpu | linux/arm64 | Document Analyzer | 8000 |
| arm64_cpu_tsp | Dockerfile.cpu | linux/arm64 | Table Semantic Parser | 8001 |
| amd64_cpu | Dockerfile.cpu | linux/amd64 | Document Analyzer | 8000 |
| amd64_cpu_tsp | Dockerfile.cpu | linux/amd64 | Table Semantic Parser | 8001 |
| amd64_gpu | Dockerfile.gpu | linux/amd64 | Document Analyzer | 8000 |
| amd64_gpu_tsp | Dockerfile.gpu | linux/amd64 | Table Semantic Parser | 8001 |
ARM64 CPU (Apple Silicon):
# Document Analyzer only
docker compose up arm64_cpu --build
# Table Semantic Parser only
docker compose up arm64_cpu_tsp --build
# Both at the same time
docker compose up arm64_cpu arm64_cpu_tsp --build
AMD64 CPU:
AMD64 GPU (NVIDIA):
Note
The GPU services (amd64_gpu, amd64_gpu_tsp) use an NVIDIA CUDA base image and only work on AMD64 environments. For ARM64 or environments without a GPU, use one of the CPU services.
Device Token Setup (Offline Authentication)¶
To use a device token for offline authentication, you need to edit the Dockerfile directly.
Steps:
- Place your device token file (device_token.txt) in the repository root
- Open the Dockerfile you are using (docker/Dockerfile.gpu or docker/Dockerfile.cpu) and uncomment the lines near the end
# Before (commented out)
#COPY device_token.txt ${server_dir}/device_token.txt
#ENV YOMITOKU_DEVICE_TOKEN=${server_dir}/device_token.txt
#ENV YOMITOKU_ENV=${ENVIRONMENT}
# After (uncommented)
COPY device_token.txt ${server_dir}/device_token.txt
ENV YOMITOKU_DEVICE_TOKEN=${server_dir}/device_token.txt
ENV YOMITOKU_ENV=production
Set YOMITOKU_ENV to the appropriate environment name for your deployment.
Custom Commands¶
You can also use the ANALYZER_TYPE build argument to switch the analyzer type at build time: