OCR

Bases: BaseJob

A class for performing Optical Character Recognition (OCR) on images.

This class combines a text detector and a text recognizer to extract text from images. Each component can be configured separately through `configs`. Processing is available both through the synchronous `__call__` and through the asynchronous `run` method.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `configs` | `dict` | Configuration overrides. The keys `"text_detector"` and `"text_recognizer"` hold keyword arguments for the corresponding component. They are applied on top of the shared arguments below, so they take precedence. A value that is not a dict raises `ValueError`. | `{}` |
| `device` | `str` | The device to use for computation, e.g. `"cuda"` or `"cpu"`. Passed to both components. | `'cuda'` |
| `visualize` | `bool` | If `True`, calling the object also returns an image visualizing the recognized words. | `False` |
| `license_key` | `str` | The license key for using specific features or services. Passed to both components. | `None` |
| `secret_key` | `str` | The secret key for authentication with external services. Passed to both components. | `None` |
| `device_token` | `str` | The device token for authentication with external services. Passed to both components. | `None` |
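You can reproduce how `configs` is combined with the shared arguments using plain dictionaries. The sketch below mirrors the merging logic in `__init__` without loading any models. `build_component_kwargs` is a hypothetical helper and is not part of yomitoku. The per-component dictionary is applied with `dict.update` after the shared values are set. As a result, a key such as `"device"` inside `configs["text_detector"]` is passed to that component instead of the top-level value, and the other component is unaffected.

```python
from typing import Any, Dict, Optional, Tuple


def build_component_kwargs(
    configs: Dict[str, Any],
    device: str = "cuda",
    license_key: Optional[str] = None,
    secret_key: Optional[str] = None,
    device_token: Optional[str] = None,
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Hypothetical helper mirroring the kwargs merging in OCR.__init__."""
    if not isinstance(configs, dict):
        raise ValueError("configs must be a dict")

    # Arguments shared by both components, as in OCR.__init__.
    shared = {
        "device": device,
        "license_key": license_key,
        "secret_key": secret_key,
        "device_token": device_token,
    }
    # Separate copies, so an override for one component never leaks into the other.
    detector_kwargs = dict(shared)
    recognizer_kwargs = dict(shared)

    # update() runs after the shared values are set, so component-specific keys win.
    if "text_detector" in configs:
        detector_kwargs.update(configs["text_detector"])
    if "text_recognizer" in configs:
        recognizer_kwargs.update(configs["text_recognizer"])
    return detector_kwargs, recognizer_kwargs


# Run only the detector on the CPU; the recognizer keeps the top-level device.
det_kwargs, rec_kwargs = build_component_kwargs(
    {"text_detector": {"device": "cpu"}}, device="cuda"
)
print(det_kwargs["device"], rec_kwargs["device"])  # cpu cuda
```

In the real class, the two dictionaries are passed as `TextDetector(**text_detector_kwargs)` and `TextRecognizer(**text_recognizer_kwargs)`. This section does not document which additional keys those constructors accept.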

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| `detector` | `TextDetector` | An instance of the text detection module used to detect text regions in images. |
| `recognizer` | `TextRecognizer` | An instance of the text recognition module used to recognize text content from detected regions. |

Source code in src/yomitoku/ocr.py

class OCR(BaseJob):
    """
    A class for performing Optical Character Recognition (OCR) on images.

    This class integrates text detection and text recognition components to extract text
    from images. It supports customization through configurations and allows asynchronous
    processing of images.

    Args:
        configs (dict, optional): A dictionary of configurations to override the default settings.
            The `configs` dictionary can include keys such as "text_detector" and "text_recognizer"
            to customize specific components. Defaults to an empty dictionary.
        device (str, optional): The device to use for computation, e.g., "cuda" or "cpu". Defaults to "cuda".
        visualize (bool, optional): Whether to enable visualization during OCR processing. Defaults to False.
        license_key (str, optional): The license key for using specific features or services. Defaults to None.
        secret_key (str, optional): The secret key for authentication with external services. Defaults to None.
        device_token (str, optional): The device token for authentication with external services. Defaults to None.

    Attributes:
        detector (TextDetector): An instance of the text detection module used to detect text regions in images.
        recognizer (TextRecognizer): An instance of the text recognition module used to recognize text content
            from detected regions.
    """

    def __init__(
        self,
        configs={},
        device="cuda",
        visualize=False,
        license_key=None,
        secret_key=None,
        device_token=None,
    ):
        super().__init__()
        text_detector_kwargs = {
            "device": device,
            "license_key": license_key,
            "secret_key": secret_key,
            "device_token": device_token,
        }
        text_recognizer_kwargs = {
            "device": device,
            "license_key": license_key,
            "secret_key": secret_key,
            "device_token": device_token,
        }

        if isinstance(configs, dict):
            if "text_detector" in configs:
                text_detector_kwargs.update(configs["text_detector"])
            if "text_recognizer" in configs:
                text_recognizer_kwargs.update(configs["text_recognizer"])
        else:
            raise ValueError(
                "configs must be a dict. See the https://kotaro-kinoshita.github.io/yomitoku-dev/usage/"
            )

        self.detector = TextDetector(**text_detector_kwargs)
        self.recognizer = TextRecognizer(**text_recognizer_kwargs)
        self.visualize = visualize

    async def run(self, img) -> tuple[dict[str, Any], np.ndarray]:
        """
        Perform OCR on the given image asynchronously.

        This method detects text regions in the image, recognizes the text content
        of those regions, and aggregates the results. Visualization is handled by
        `__call__`, not here.

        Args:
            img (np.ndarray): The input image in BGR format.

        Returns:
            tuple: A tuple containing:

                - outputs (dict): A dictionary with the recognized words and their positions.

                - det_score (np.ndarray): The score output of the text detector, passed to
                  the visualizer by `__call__`.
        """
        det_outputs, det_score = self.detector(img)
        rec_outputs = self.recognizer(
            img, det_outputs.points, det_scores=det_outputs.scores
        )
        outputs = {"words": ocr_aggregate(rec_outputs)}
        return outputs, det_score

    def __call__(self, img) -> tuple[OCRSchema, np.ndarray | None]:
        """
        Perform OCR on the given image.

        This method is a synchronous wrapper for the `run` method, allowing direct
        invocation of the OCR process.

        Args:
            img (np.ndarray): The input image in BGR format (as loaded by OpenCV).

        Returns:
            tuple: A tuple containing:

                - outputs (OCRSchema): An OCR output with the recognized words and their positions.

                - vis (np.ndarray or None): The visualization image (if visualization is enabled).
        """
        outputs, det_score = asyncio.run(self.run(img))
        results = OCRSchema(**outputs)

        ocr_vis = None
        if self.visualize:
            ocr_vis = ocr_visualizer(
                results.words,
                img,
                font_path=self.recognizer._cfg.visualize.font,
                det_score=det_score,
                vis_heatmap=self.detector._cfg.visualize.heatmap,
            )

        return results, ocr_vis

`__call__(img)`

Perform OCR on the given image.

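This method is a synchronous wrapper around the `run` coroutine. It executes `run` with `asyncio.run`, converts the returned dictionary into an `OCRSchema`, and, when `visualize` is enabled, renders the visualization image. `asyncio.run` cannot be called while an event loop is already running in the same thread. Asynchronous code should therefore `await` the `run` method instead, which returns a plain dictionary rather than an `OCRSchema`.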
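The sketch below shows how a synchronous `__call__` can drive an `async def run` with `asyncio.run`, and why the wrapper fails inside a running event loop. `SyncWrapperSketch` and `call_from_async_code` are hypothetical stand-ins that do not use yomitoku. The placeholder `run` returns a fixed dictionary instead of performing detection and recognition.

```python
import asyncio
from typing import Any, Dict, Tuple


class SyncWrapperSketch:
    """Hypothetical stand-in reproducing how OCR.__call__ drives OCR.run."""

    async def run(self, img: Any) -> Tuple[Dict[str, Any], Any]:
        # Placeholder for detection + recognition; returns (outputs, det_score).
        return {"words": [f"text from {img}"]}, None

    def __call__(self, img: Any) -> Tuple[Dict[str, Any], Any]:
        # asyncio.run creates a new event loop, runs the coroutine to completion,
        # and closes the loop again.
        return asyncio.run(self.run(img))


async def call_from_async_code(ocr: SyncWrapperSketch, img: Any):
    # Inside a running loop, await the coroutine directly.
    outputs, _ = await ocr.run(img)
    try:
        # asyncio.run() refuses to start while a loop is already running.
        # Python may also warn that the coroutine created here was never awaited.
        ocr(img)
        nested_call_worked = True
    except RuntimeError:
        nested_call_worked = False
    return outputs, nested_call_worked


print(SyncWrapperSketch()("page-0")[0])  # {'words': ['text from page-0']}
print(asyncio.run(call_from_async_code(SyncWrapperSketch(), "page-1")))
```

From plain scripts the synchronous call is the convenient entry point. From code that already runs inside an event loop, awaiting `run` avoids the nested-loop error.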

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `img` | `ndarray` | The input image in BGR format (as loaded by OpenCV). | required |

Returns:

| Name | Type | Description |
| --- | --- | --- |
| `tuple` | `tuple[OCRSchema, ndarray \| None]` | A tuple `(outputs, vis)`: `outputs` (`OCRSchema`) holds the recognized words and their positions; `vis` (`np.ndarray` or `None`) is the visualization image, or `None` when `visualize` is `False`. |

`run(img)` `async`

Perform OCR on the given image asynchronously.

This method detects text regions in the image, recognizes the text in each region, and aggregates the results under a `"words"` key. It does not produce a visualization; that step is performed by `__call__`.
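The three steps of `run` (detect, recognize, aggregate) can be sketched with stand-in functions. In the sketch below, `fake_detector`, `fake_recognizer`, and `aggregate` are hypothetical replacements for `TextDetector`, `TextRecognizer`, and `ocr_aggregate`. Their field names (`contents`, `points`, `det_scores`, `content`, `det_score`) are assumptions made for illustration, not yomitoku's schema. Only the data flow mirrors the source: the detector's points and scores go to the recognizer, and the aggregated result is returned with the detector's second output.

```python
from typing import Any, Dict, List, Tuple


def fake_detector(img: Any) -> Tuple[Dict[str, list], Any]:
    """Hypothetical detector: region quads, their confidences, and a score output."""
    quads = [
        [[0, 0], [40, 0], [40, 12], [0, 12]],
        [[0, 20], [60, 20], [60, 32], [0, 32]],
    ]
    return {"points": quads, "scores": [0.98, 0.91]}, None


def fake_recognizer(img: Any, points: list, det_scores: List[float]) -> Dict[str, list]:
    """Hypothetical recognizer: one string per region, returned as parallel lists."""
    contents = [f"region-{i}" for i in range(len(points))]
    return {"contents": contents, "points": points, "det_scores": det_scores}


def aggregate(rec_outputs: Dict[str, list]) -> List[Dict[str, Any]]:
    """Hypothetical aggregator: one dict per word instead of parallel lists."""
    return [
        {"content": c, "points": p, "det_score": s}
        for c, p, s in zip(
            rec_outputs["contents"], rec_outputs["points"], rec_outputs["det_scores"]
        )
    ]


def run_pipeline(img: Any) -> Tuple[Dict[str, Any], Any]:
    # 1. Locate text regions.
    det_outputs, det_score = fake_detector(img)
    # 2. Read the text in each region, carrying the detection confidences along.
    rec_outputs = fake_recognizer(
        img, det_outputs["points"], det_scores=det_outputs["scores"]
    )
    # 3. Collect the per-word results under a single "words" key.
    return {"words": aggregate(rec_outputs)}, det_score


outputs, _ = run_pipeline(img=None)
print([w["content"] for w in outputs["words"]])  # ['region-0', 'region-1']
```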

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `img` | `ndarray` | The input image in BGR format. | required |

Returns:

| Name | Type | Description |
| --- | --- | --- |
| `tuple` | `tuple[dict[str, Any], ndarray]` | A tuple `(outputs, det_score)`: `outputs` (`dict`) maps `"words"` to the recognized words and their positions; `det_score` (`np.ndarray`) is the second value returned by the text detector, which `__call__` passes to the visualizer. |
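The body of `run`, as listed in the class source, contains no `await`: the detector and recognizer are invoked synchronously. Scheduling several `run` coroutines together, for example with `asyncio.gather`, therefore does not make them overlap, and each one blocks the event loop while it runs. The sketch below illustrates this with a hypothetical `BlockingRunSketch` class in which `time.sleep` stands in for the model calls. It does not import yomitoku.

```python
import asyncio
import time


class BlockingRunSketch:
    """Hypothetical stand-in: like OCR.run, the coroutine body contains no await."""

    async def run(self, img):
        time.sleep(0.1)  # stands in for the synchronous detector/recognizer calls
        return {"words": [img]}, None


async def process_all(ocr, images):
    # gather() schedules all coroutines, but none of them ever yields to the
    # event loop, so each runs to completion before the next one starts.
    return await asyncio.gather(*(ocr.run(img) for img in images))


start = time.perf_counter()
results = asyncio.run(process_all(BlockingRunSketch(), ["a", "b", "c"]))
elapsed = time.perf_counter() - start
print(f"{elapsed:.2f} s")  # roughly 0.3 s or more: the three calls did not overlap
```

If overlap is required, one option is to move the synchronous work off the event-loop thread, for example with `asyncio.to_thread`. This documentation does not state whether the underlying models tolerate concurrent use from several threads.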
