🌟 Introduction

YomiToku is a Document AI engine specialized in Japanese document image analysis. It provides full OCR (optical character recognition) and layout analysis capabilities, enabling the recognition, extraction, and conversion of text and diagrams from images.

🤖 Equipped with four AI models trained on Japanese datasets: text detection, text recognition, layout analysis, and table structure recognition. All models are independently trained and optimized for Japanese documents, delivering high-precision inference.
🇯🇵 Each model is specifically trained for Japanese document images, supporting the recognition of over 7,000 Japanese characters, including vertical text and other layout structures unique to Japanese documents. (It also supports English documents.)
📈 By leveraging layout analysis, table structure parsing, and reading order estimation, it extracts information while preserving the semantic structure of the document layout.
📄 Supports several output formats: HTML, Markdown, JSON, and CSV. It can also extract the diagrams and images contained in a document.
⚡ Operates efficiently in GPU environments, enabling fast document transcription and analysis. It requires less than 8 GB of VRAM, eliminating the need for high-end GPUs.
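
As a rough illustration of what exporting a recognized table to CSV or Markdown involves, the sketch below uses only the Python standard library. The `(row, col, text)` cell records and the helper names `cells_to_grid`, `grid_to_csv`, and `grid_to_markdown` are hypothetical and chosen for this example; they are not YomiToku's output schema or API, which are described in the Module Output and Code Reference pages.

```python
import csv
import io

# Hypothetical cell records: (row index, column index, recognized text).
# This layout is an assumption for illustration, not YomiToku's schema.
cells = [
    (0, 0, "品目"), (0, 1, "数量"),
    (1, 0, "りんご"), (1, 1, "3"),
]


def cells_to_grid(cells):
    """Arrange (row, col, text) records into a dense row-major grid."""
    n_rows = max(r for r, _, _ in cells) + 1
    n_cols = max(c for _, c, _ in cells) + 1
    grid = [["" for _ in range(n_cols)] for _ in range(n_rows)]
    for r, c, text in cells:
        grid[r][c] = text
    return grid


def grid_to_csv(grid):
    """Serialize the grid as CSV text with '\n' line endings."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(grid)
    return buf.getvalue()


def grid_to_markdown(grid):
    """Serialize the grid as a Markdown table, treating row 0 as the header."""
    header, *body = grid
    lines = [
        "| " + " | ".join(header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    lines += ["| " + " | ".join(row) + " |" for row in body]
    return "\n".join(lines)


grid = cells_to_grid(cells)
print(grid_to_csv(grid))
print(grid_to_markdown(grid))
```

The same intermediate grid feeds both serializers, which is one simple way to keep several output formats consistent with each other. Merged cells and multi-line cell text are deliberately ignored in this sketch.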

🙋 Contact

If you have any questions, please contact us at support@mlism.com.

Index

Basic Usage

Installation: Installation instructions
FAQ: Frequently asked questions

CLI Usage

Document Analyzer: How to use the CLI
Extractor: How to use the Extractor
Schema Generation Prompt: Schema generation prompt

Python API

Document Analyzer Python API: How to use the DocumentAnalyzer API
Table Semantic Parser Python API: How to use the TableSemanticParser
Module Output: Output schema definitions for each module
Model Config: Model configuration settings

Code Reference

Inputs

load_image: How to load image files
load_pdf: How to load PDF files

Modules

Outputs

Utilities

create_searchable_pdf: Create searchable PDF files
table_to_csv: Convert table data to CSV

Error Codes

Error Codes: List of error codes
Error Codes List: Detailed error codes

Sample Code

Use Rotate Detection: How to use the rotation detection module
Table Extraction: Extract table data (TableSemanticParser)
Searchable PDF: Create searchable PDF files
Get Query Count Information: Retrieve processed page count information

Server

Overview: How to use the REST API server

Operations

Monitoring: Logging and monitoring
Release Note: Release notes