OCR
Bases: BaseJob
A class for performing Optical Character Recognition (OCR) on images.
This class combines a text detection component and a text recognition component to extract text from images. It can be customized through configuration overrides and supports asynchronous processing of images.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `configs` | `dict` | A dictionary of configurations to override the default settings. | `{}` |
| `device` | `str` | The device to use for computation, e.g., `"cuda"` or `"cpu"`. Defaults to `"cuda"`. | `'cuda'` |
| `visualize` | `bool` | Whether to enable visualization during OCR processing. Defaults to `False`. | `False` |
| `license_key` | `str` | The license key for using specific features or services. Defaults to `None`. | `None` |
| `secret_key` | `str` | The secret key for authentication with external services. Defaults to `None`. | `None` |
| `device_token` | `str` | The device token for authentication with external services. Defaults to `None`. | `None` |
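The `configs` parameter is documented only as overriding the default settings. As a rough illustration, assuming overrides are merged key by key into nested defaults, the semantics might look like the sketch below. The `merge_configs` helper and the `DEFAULTS` keys are hypothetical and are not yomitoku's actual configuration schema or merge logic.

```python
from copy import deepcopy
from typing import Any

# Hypothetical defaults: these keys are illustrative only and are NOT
# yomitoku's real configuration schema.
DEFAULTS: dict[str, Any] = {
    "text_detector": {"device": "cuda", "visualize": False},
    "text_recognizer": {"device": "cuda", "visualize": False},
}


def merge_configs(defaults: dict[str, Any], overrides: dict[str, Any]) -> dict[str, Any]:
    """Return a new dict where `overrides` recursively replaces values in `defaults`."""
    merged = deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Recurse so a partial nested override keeps the untouched sibling keys.
            merged[key] = merge_configs(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Override only the detector's device; everything else keeps its default.
cfg = merge_configs(DEFAULTS, {"text_detector": {"device": "cpu"}})
print(cfg["text_detector"])    # {'device': 'cpu', 'visualize': False}
print(cfg["text_recognizer"])  # {'device': 'cuda', 'visualize': False}
```

A recursive merge lets a caller change one nested value without restating the rest of the section; a shallow `dict.update` would instead replace the whole `text_detector` entry.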
Attributes:
| Name | Type | Description |
|---|---|---|
| `detector` | `TextDetector` | An instance of the text detection module used to detect text regions in images. |
| `recognizer` | `TextRecognizer` | An instance of the text recognition module used to recognize text content from detected regions. |
Source code in src/yomitoku/ocr.py
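The two attributes reflect a two-stage design: the detector finds text regions, then the recognizer reads the text inside each region. The toy sketch below shows that flow with hypothetical stand-ins (`ToyDetector`, `ToyRecognizer`, `ToyOCR`, a box format of `(x1, y1, x2, y2)`, and an "image" made of strings); none of these are yomitoku classes or data formats.

```python
from dataclasses import dataclass

# Hypothetical box format: (x1, y1, x2, y2), end-exclusive.
Box = tuple[int, int, int, int]
# Toy "image": a list of equal-length strings, one character per pixel.
Image = list[str]


@dataclass
class Word:
    box: Box
    text: str


def crop(img: Image, box: Box) -> Image:
    """Cut the rectangular region described by `box` out of the toy image."""
    x1, y1, x2, y2 = box
    return [row[x1:x2] for row in img[y1:y2]]


class ToyDetector:
    """Hypothetical stand-in for TextDetector: one box per non-blank row."""

    def __call__(self, img: Image) -> list[Box]:
        return [(0, y, len(row), y + 1) for y, row in enumerate(img) if row.strip()]


class ToyRecognizer:
    """Hypothetical stand-in for TextRecognizer: reads the characters in a crop."""

    def __call__(self, region: Image) -> str:
        return "".join(region).strip()


class ToyOCR:
    """Detect regions first, then recognize the text inside each one."""

    def __init__(self) -> None:
        self.detector = ToyDetector()
        self.recognizer = ToyRecognizer()

    def __call__(self, img: Image) -> list[Word]:
        boxes = self.detector(img)
        return [Word(box, self.recognizer(crop(img, box))) for box in boxes]


page = ["  hello ", "        ", " world  "]
for word in ToyOCR()(page):
    print(word.box, word.text)
# (0, 0, 8, 1) hello
# (0, 2, 8, 3) world
```

Keeping detection and recognition as separate objects means either stage can be swapped or configured independently, which matches how the class exposes `detector` and `recognizer` as distinct attributes.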
__call__(img)
Perform OCR on the given image.
This method is a synchronous wrapper for the `run` method, allowing the OCR process to be invoked directly.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `img` | `ndarray` | The input image in BGR format (as loaded by OpenCV). | required |
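Because `img` must be in BGR channel order (what OpenCV's `cv2.imread` returns), an image loaded some other way, for example with Pillow and converted to a NumPy array, is usually RGB and needs its channels reversed first. A minimal sketch, where `rgb_to_bgr` is a hypothetical helper name:

```python
import numpy as np


def rgb_to_bgr(img: np.ndarray) -> np.ndarray:
    """Swap RGB <-> BGR by reversing the last (channel) axis of an H x W x 3 array."""
    if img.ndim != 3 or img.shape[2] != 3:
        raise ValueError("expected an H x W x 3 image")
    # ascontiguousarray copies the reversed view into a contiguous buffer,
    # which some downstream libraries (e.g. OpenCV) require.
    return np.ascontiguousarray(img[..., ::-1])


rgb = np.zeros((2, 2, 3), dtype=np.uint8)
rgb[..., 0] = 255  # pure red in RGB order
bgr = rgb_to_bgr(rgb)
print(bgr[0, 0])  # [  0   0 255]
```

With OpenCV installed, `cv2.cvtColor(img, cv2.COLOR_RGB2BGR)` performs the same conversion.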
Returns:
| Name | Type | Description |
|---|---|---|
| `tuple` | `tuple[OCRSchema, ndarray \| None]` | A tuple containing the OCR results (`OCRSchema`) and the visualization image (`ndarray`), which may be `None`. |
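The docstring states that `__call__` is a synchronous wrapper for `run` but does not show how the wrapper is built. One common pattern, assumed here rather than taken from yomitoku's source, is `asyncio.run`; the `SyncOverAsync` class below is a hypothetical illustration.

```python
import asyncio


class SyncOverAsync:
    """Hypothetical class showing one common way to wrap an async `run`."""

    async def run(self, img):
        # Placeholder for awaited detection/recognition work.
        await asyncio.sleep(0)
        return {"text": f"result for {img}"}, None

    def __call__(self, img):
        # asyncio.run creates a fresh event loop, runs the coroutine to
        # completion, and closes the loop. It raises RuntimeError if called
        # while another event loop is already running in this thread.
        return asyncio.run(self.run(img))


result, vis = SyncOverAsync()("page-1")
print(result, vis)  # {'text': 'result for page-1'} None
```

If the wrapper is implemented this way, code that already runs inside an event loop (a Jupyter notebook or an async web handler, for instance) should `await` the `run` method directly instead of calling the object.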
run(img) (async)
Perform OCR on the given image asynchronously.
This method detects text regions in the image and recognizes the text content from those regions. It also supports visualization of the OCR process.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `img` | `ndarray` | The input image in BGR format. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| `tuple` | `tuple[dict[str, Any], ndarray]` | A tuple containing the OCR results as a dictionary and the visualization image (`ndarray`). |
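Because `run` is a coroutine, several images can be submitted together from async code. The sketch below uses a hypothetical `ToyAsyncOCR` stand-in with the same `(result, visualization)` return shape; `ocr_many` is likewise a hypothetical helper. Whether real OCR calls overlap depends on whether the awaited work actually yields to the event loop, since CPU- or GPU-bound inference that never yields will still run one image at a time.

```python
import asyncio


class ToyAsyncOCR:
    """Hypothetical stand-in whose async run() mimics the documented return shape."""

    async def run(self, img):
        await asyncio.sleep(0.01)  # stands in for awaited detection/recognition
        # None is a placeholder for the visualization image.
        return {"words": [f"text from {img}"]}, None


async def ocr_many(ocr, images):
    # gather schedules every run() coroutine and returns results in input order.
    return await asyncio.gather(*(ocr.run(img) for img in images))


results = asyncio.run(ocr_many(ToyAsyncOCR(), ["a.png", "b.png", "c.png"]))
for result, _vis in results:
    print(result["words"])
```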