OCR

Bases: BaseJob

A class for performing Optical Character Recognition (OCR) on images.

This class integrates text detection and text recognition components to extract text from images. It supports customization through configurations and allows asynchronous processing of images.

Parameters:

Name	Type	Description	Default
`configs`	`dict`	A dictionary of configurations to override the default settings. The `configs` dictionary can include keys such as "text_detector" and "text_recognizer" to customize specific components. Defaults to an empty dictionary.	`{}`
`device`	`str`	The device to use for computation, e.g., "cuda" or "cpu". Defaults to "cuda".	`'cuda'`
`visualize`	`bool`	Whether to enable visualization during OCR processing. Defaults to False.	`False`
`license_key`	`str`	The license key for using specific features or services. Defaults to None.	`None`
`secret_key`	`str`	The secret key for authentication with external services. Defaults to None.	`None`
`device_token`	`str`	The device token for authentication with external services. Defaults to None.	`None`
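
The override behaviour described for `configs` can be sketched without the library itself. The helper below, `build_component_kwargs`, is hypothetical and not part of yomitoku; it only mirrors the merge the constructor appears to perform, and for brevity it assumes `device` is the only shared default.

```python
def build_component_kwargs(configs=None, device="cuda"):
    """Layer per-component overrides on top of shared defaults (illustrative only)."""
    if configs is None:
        configs = {}
    if not isinstance(configs, dict):
        # Same failure mode as the constructor: anything but a dict is rejected.
        raise ValueError("configs must be a dict")

    # Both components start from the same shared settings.
    detector_kwargs = {"device": device}
    recognizer_kwargs = {"device": device}

    # Keys under "text_detector" / "text_recognizer" win over the shared values.
    detector_kwargs.update(configs.get("text_detector", {}))
    recognizer_kwargs.update(configs.get("text_recognizer", {}))
    return detector_kwargs, recognizer_kwargs


# Run only the detector on CPU while the recognizer keeps the shared default.
det, rec = build_component_kwargs(
    configs={"text_detector": {"device": "cpu"}},
    device="cuda",
)
```

Under these assumptions, a component-level entry takes precedence over the top-level argument of the same name, and a component that has no entry in `configs` keeps the shared value.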

Attributes:

Name	Type	Description
`detector`	`TextDetector`	An instance of the text detection module used to detect text regions in images.
`recognizer`	`TextRecognizer`	An instance of the text recognition module used to recognize text content from detected regions.

Source code in src/yomitoku/ocr.py

class OCR(BaseJob):
    """
    A class for performing Optical Character Recognition (OCR) on images.

    This class integrates text detection and text recognition components to extract text
    from images. It supports customization through configurations and allows asynchronous
    processing of images.

    Args:
        configs (dict, optional): A dictionary of configurations to override the default settings.
            The `configs` dictionary can include keys such as "text_detector" and "text_recognizer"
            to customize specific components. Defaults to an empty dictionary.
        device (str, optional): The device to use for computation, e.g., "cuda" or "cpu". Defaults to "cuda".
        visualize (bool, optional): Whether to enable visualization during OCR processing. Defaults to False.
        license_key (str, optional): The license key for using specific features or services. Defaults to None.
        secret_key (str, optional): The secret key for authentication with external services. Defaults to None.
        device_token (str, optional): The device token for authentication with external services. Defaults to None.

    Attributes:
        detector (TextDetector): An instance of the text detection module used to detect text regions in images.
        recognizer (TextRecognizer): An instance of the text recognition module used to recognize text content
            from detected regions.
    """

    def __init__(
        self,
        configs={},
        device="cuda",
        visualize=False,
        license_key=None,
        secret_key=None,
        device_token=None,
    ):
        super().__init__()
        text_detector_kwargs = {
            "device": device,
            "license_key": license_key,
            "secret_key": secret_key,
            "device_token": device_token,
        }
        text_recognizer_kwargs = {
            "device": device,
            "license_key": license_key,
            "secret_key": secret_key,
            "device_token": device_token,
        }

        if isinstance(configs, dict):
            if "text_detector" in configs:
                text_detector_kwargs.update(configs["text_detector"])
            if "text_recognizer" in configs:
                text_recognizer_kwargs.update(configs["text_recognizer"])
        else:
            raise ValueError(
                "configs must be a dict. See the https://kotaro-kinoshita.github.io/yomitoku-dev/usage/"
            )

        self.detector = TextDetector(**text_detector_kwargs)
        self.recognizer = TextRecognizer(**text_recognizer_kwargs)
        self.visualize = visualize

    async def run(self, img) -> tuple[dict[str, Any], np.ndarray]:
        """
        Perform OCR on the given image asynchronously.

        This method detects text regions in the image and recognizes the text content
        from those regions. Visualization is not rendered here; `__call__` handles it.

        Args:
            img (np.ndarray): The input image in BGR format.

        Returns:
            tuple: A tuple containing:

                - outputs (dict): A dictionary with the recognized words and their positions.

                - det_score (np.ndarray): The detector's score output, used by `__call__` for visualization.
        """
        det_outputs, det_score = self.detector(img)
        rec_outputs = self.recognizer(
            img, det_outputs.points, det_scores=det_outputs.scores
        )
        outputs = {"words": ocr_aggregate(rec_outputs)}
        return outputs, det_score

    def __call__(self, img) -> tuple[OCRSchema, np.ndarray | None]:
        """
        Perform OCR on the given image.

        This method is a synchronous wrapper for the `run` method, allowing direct
        invocation of the OCR process.

        Args:
            img (np.ndarray): The input image in BGR format (as loaded by OpenCV).

        Returns:
            tuple: A tuple containing:

                - outputs (OCRSchema): An OCR output with the recognized words and their positions.

                - vis (np.ndarray or None): The visualization image (if visualization is enabled).
        """
        outputs, det_score = asyncio.run(self.run(img))
        results = OCRSchema(**outputs)

        ocr_vis = None
        if self.visualize:
            ocr_vis = ocr_visualizer(
                results.words,
                img,
                font_path=self.recognizer._cfg.visualize.font,
                det_score=det_score,
                vis_heatmap=self.detector._cfg.visualize.heatmap,
            )

        return results, ocr_vis

`__call__(img)`

Perform OCR on the given image.

This method is a synchronous wrapper for the run method, allowing direct invocation of the OCR process.

Parameters:

Name	Type	Description	Default
`img`	`ndarray`	The input image in BGR format (as loaded by OpenCV).	required

Returns:

Name	Type	Description
`tuple`	`tuple[OCRSchema, ndarray \| None]`	A tuple containing: outputs (OCRSchema): An OCR output with the recognized words and their positions. vis (np.ndarray or None): The visualization image (if visualization is enabled).
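
The shape of this return value can be illustrated with a stand-in class. `StubOCR` below is hypothetical and performs no recognition; it only imitates the pattern shown in the source, where `__call__` drives the async `run` with `asyncio.run` and builds a visualization only when `visualize` is set. The placeholder strings stand in for real arrays.

```python
import asyncio


class StubOCR:
    """Hypothetical stand-in that mimics the call pattern of OCR, not its results."""

    def __init__(self, visualize=False):
        self.visualize = visualize

    async def run(self, img):
        # Placeholder outputs: an empty word list and a dummy detection score.
        return {"words": []}, "det-score-placeholder"

    def __call__(self, img):
        # Synchronous entry point: run the coroutine to completion.
        outputs, det_score = asyncio.run(self.run(img))
        # The second tuple element stays None unless visualization is enabled.
        vis = ("rendered-from", det_score) if self.visualize else None
        return outputs, vis


results, vis = StubOCR(visualize=False)(img=None)
results_v, vis_v = StubOCR(visualize=True)(img=None)
```

Callers should therefore check the second element for `None` before using it, for example before writing it to disk.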

Source code in src/yomitoku/ocr.py

def __call__(self, img) -> tuple[OCRSchema, np.ndarray | None]:
    """
    Perform OCR on the given image.

    This method is a synchronous wrapper for the `run` method, allowing direct
    invocation of the OCR process.

    Args:
        img (np.ndarray): The input image in BGR format (as loaded by OpenCV).

    Returns:
        tuple: A tuple containing:

            - outputs (OCRSchema): An OCR output with the recognized words and their positions.

            - vis (np.ndarray or None): The visualization image (if visualization is enabled).
    """
    outputs, det_score = asyncio.run(self.run(img))
    results = OCRSchema(**outputs)

    ocr_vis = None
    if self.visualize:
        ocr_vis = ocr_visualizer(
            results.words,
            img,
            font_path=self.recognizer._cfg.visualize.font,
            det_score=det_score,
            vis_heatmap=self.detector._cfg.visualize.heatmap,
        )

    return results, ocr_vis

`run(img)` `async`

Perform OCR on the given image asynchronously.

This method detects text regions in the image and recognizes the text content from those regions. It does not render a visualization itself; `__call__` does that from the returned detection score when `visualize` is enabled.

Parameters:

Name	Type	Description	Default
`img`	`ndarray`	The input image in BGR format.	required

Returns:

Name	Type	Description
`tuple`	`tuple[dict[str, Any], ndarray]`	A tuple containing: outputs (dict): A dictionary whose "words" entry holds the recognized words and their positions. det_score (np.ndarray): The detector's score output, which `__call__` passes to the visualizer.
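
Because `__call__` wraps this coroutine in `asyncio.run`, which raises `RuntimeError` when an event loop is already running in the same thread, code that is itself asynchronous would await `run` directly. The sketch below shows that path with a hypothetical `StubOCR` stand-in; the `content` field of the placeholder word is an assumption made for illustration, not a statement about the real output schema.

```python
import asyncio


class StubOCR:
    """Hypothetical stand-in with the same run() signature shape as OCR."""

    async def run(self, img):
        # Placeholder result; the real method returns recognized words and a score array.
        outputs = {"words": [{"content": "stub"}]}
        det_score = None
        return outputs, det_score


async def process(ocr, img):
    # Inside a coroutine, await run() instead of calling the object synchronously.
    outputs, det_score = await ocr.run(img)
    return outputs["words"], det_score


words, det_score = asyncio.run(process(StubOCR(), img=None))
```

Note that the `run` shown in the source contains no `await` points, so awaiting it this way avoids the nested-loop error but does not by itself make detection or recognition run concurrently.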

Source code in src/yomitoku/ocr.py

async def run(self, img) -> tuple[dict[str, Any], np.ndarray]:
    """
    Perform OCR on the given image asynchronously.

    This method detects text regions in the image and recognizes the text content
    from those regions. Visualization is not rendered here; `__call__` handles it.

    Args:
        img (np.ndarray): The input image in BGR format.

    Returns:
        tuple: A tuple containing:

            - outputs (dict): A dictionary with the recognized words and their positions.

            - det_score (np.ndarray): The detector's score output, used by `__call__` for visualization.
    """
    det_outputs, det_score = self.detector(img)
    rec_outputs = self.recognizer(
        img, det_outputs.points, det_scores=det_outputs.scores
    )
    outputs = {"words": ocr_aggregate(rec_outputs)}
    return outputs, det_score
