
Layout Analyzer

Bases: BaseJob

A class for analyzing the layout of documents, including table structure recognition.

This class provides functionality to process and analyze the layout of documents, such as detecting and recognizing table structures. It initializes components like layout_parser for general layout analysis and table_structure_recognizer for identifying table structures within documents.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| configs | dict | A dictionary of configurations to override the default settings. The configs dictionary can include keys such as "layout_parser" and "table_structure_recognizer" to customize specific components. Defaults to an empty dictionary. | {} |
| device | str | The device to use for computation, e.g., "cuda" or "cpu". Defaults to "cuda". | 'cuda' |
| visualize | bool | Whether to enable visualization during layout analysis. Defaults to False. | False |
| license_key | str | The license key for using specific features or services. Defaults to None. | None |
| secret_key | str | The secret key for authentication with external services. Defaults to None. | None |
| device_token | str | The device token for authentication with external services. Defaults to None. | None |
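
The override behaviour described for `configs` can be sketched without the library installed. The helper below is hypothetical (`build_component_kwargs` is not part of yomitoku) and omits the key-related arguments for brevity. It mirrors how the constructor appears to layer per-component overrides on top of the shared arguments, assuming each override value is itself a dict.

```python
def build_component_kwargs(configs=None, device="cuda", visualize=False):
    """Hypothetical helper: merge per-component overrides over shared arguments."""
    if configs is None:
        configs = {}
    if not isinstance(configs, dict):
        raise ValueError("configs must be a dict")

    shared = {"device": device, "visualize": visualize}
    merged = {}
    for name in ("layout_parser", "table_structure_recognizer"):
        kwargs = dict(shared)                  # start from a copy of the shared arguments
        kwargs.update(configs.get(name, {}))   # component-specific keys take precedence
        merged[name] = kwargs
    return merged


overrides = {"layout_parser": {"device": "cpu"}}
merged = build_component_kwargs(overrides, device="cuda", visualize=True)
print(merged["layout_parser"])               # {'device': 'cpu', 'visualize': True}
print(merged["table_structure_recognizer"])  # {'device': 'cuda', 'visualize': True}
```

In this sketch an override for one component leaves the other untouched, so `{"layout_parser": {"device": "cpu"}}` moves only the layout parser to the CPU.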

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| layout_parser | LayoutParser | An instance of the layout parser used for general layout analysis. |
| table_structure_recognizer | TableStructureRecognizer | An instance of the table structure recognizer used for detecting and analyzing table structures in documents. |

Source code in src/yomitoku/layout_analyzer.py
class LayoutAnalyzer(BaseJob):
    """
    A class for analyzing the layout of documents, including table structure recognition.

    This class provides functionality to process and analyze the layout of documents, such as detecting
    and recognizing table structures. It initializes components like `layout_parser` for general layout
    analysis and `table_structure_recognizer` for identifying table structures within documents.

    Args:
        configs (dict, optional): A dictionary of configurations to override the default settings.
            The `configs` dictionary can include keys such as "layout_parser" and "table_structure_recognizer"
            to customize specific components. Defaults to an empty dictionary.
        device (str, optional): The device to use for computation, e.g., "cuda" or "cpu". Defaults to "cuda".
        visualize (bool, optional): Whether to enable visualization during layout analysis. Defaults to False.
        license_key (str, optional): The license key for using specific features or services. Defaults to None.
        secret_key (str, optional): The secret key for authentication with external services. Defaults to None.
        device_token (str, optional): The device token for authentication with external services. Defaults to None.

    Attributes:
        layout_parser (LayoutParser): An instance of the layout parser used for general layout analysis.
        table_structure_recognizer (TableStructureRecognizer): An instance of the table structure recognizer
            used for detecting and analyzing table structures in documents.
    """

    def __init__(
        self,
        configs={},
        device="cuda",
        visualize=False,
        license_key=None,
        secret_key=None,
        device_token=None,
    ):
        super().__init__()
        layout_parser_kwargs = {
            "device": device,
            "visualize": visualize,
            "license_key": license_key,
            "secret_key": secret_key,
            "device_token": device_token,
        }
        table_structure_recognizer_kwargs = {
            "device": device,
            "visualize": visualize,
            "license_key": license_key,
            "secret_key": secret_key,
            "device_token": device_token,
        }

        if isinstance(configs, dict):
            if "layout_parser" in configs:
                layout_parser_kwargs.update(configs["layout_parser"])

            if "table_structure_recognizer" in configs:
                table_structure_recognizer_kwargs.update(
                    configs["table_structure_recognizer"]
                )
        else:
            raise ValueError(
                "configs must be a dict. See the https://kotaro-kinoshita.github.io/yomitoku-dev/usage/"
            )

        self.layout_parser = LayoutParser(
            **layout_parser_kwargs,
        )
        self.table_structure_recognizer = TableStructureRecognizer(
            **table_structure_recognizer_kwargs,
        )

    async def run(self, img):
        layout_results, vis = self.layout_parser(img)
        table_boxes = [table.box for table in layout_results.tables]
        table_results, vis = self.table_structure_recognizer(img, table_boxes, vis=vis)
        return layout_results, table_results, vis

    def __call__(self, img) -> tuple[LayoutAnalyzerSchema, np.ndarray | None]:
        """
        Perform layout analysis on the given image.

        This method processes the input image to detect and analyze the layout structure,
        including paragraphs, tables, and figures. It combines the results from the layout
        parser and the table structure recognizer to produce a comprehensive analysis.

        Args:
            img (np.ndarray): The input image in BGR format.

        Returns:
            tuple: A tuple containing:

                - results (LayoutAnalyzerSchema): The aggregated results of the layout analysis,
                  including detected paragraphs, tables, and figures.

                - vis (np.ndarray or None): The visualization of the layout analysis if
                  visualization is enabled, otherwise `None`.
        """
        layout_results, table_results, vis = asyncio.run(self.run(img))

        results = LayoutAnalyzerSchema(
            paragraphs=layout_results.paragraphs,
            tables=table_results,
            figures=layout_results.figures,
        )

        return results, vis
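
The control flow of `run` and `__call__` can be illustrated with stand-in components. Everything below is hypothetical scaffolding: `FakeLayoutParser`, `FakeTableRecognizer`, `PipelineSketch`, and the `SimpleNamespace` results are not yomitoku classes, and the returned values are made up. The sketch only shows the order of operations: parse the layout, collect the table boxes, hand them to the table recognizer, then aggregate.

```python
import asyncio
from types import SimpleNamespace


class FakeLayoutParser:
    """Stand-in for the layout parser: returns fixed regions and no visualization."""

    def __call__(self, img):
        tables = [SimpleNamespace(box=[10, 20, 110, 220])]
        results = SimpleNamespace(paragraphs=["p0"], tables=tables, figures=[])
        return results, None


class FakeTableRecognizer:
    """Stand-in for the table structure recognizer: one result per input box."""

    def __call__(self, img, table_boxes, vis=None):
        return [{"box": box, "cells": []} for box in table_boxes], vis


class PipelineSketch:
    def __init__(self):
        self.layout_parser = FakeLayoutParser()
        self.table_structure_recognizer = FakeTableRecognizer()

    async def run(self, img):
        layout_results, vis = self.layout_parser(img)
        # Only the boxes of detected tables are passed to the second stage.
        table_boxes = [table.box for table in layout_results.tables]
        table_results, vis = self.table_structure_recognizer(img, table_boxes, vis=vis)
        return layout_results, table_results, vis

    def __call__(self, img):
        # asyncio.run starts a new event loop, so this call must not be made
        # from code that is already running inside one.
        layout_results, table_results, vis = asyncio.run(self.run(img))
        results = SimpleNamespace(
            paragraphs=layout_results.paragraphs,
            tables=table_results,
            figures=layout_results.figures,
        )
        return results, vis


results, vis = PipelineSketch()(img=None)
print(results.tables)
```

Because `__call__` wraps `run` in `asyncio.run`, calling it from inside an already running event loop (a Jupyter cell or an async web handler, for example) raises `RuntimeError`. In that situation, awaiting `run` directly may be an option, at the cost of assembling the results yourself.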

__call__(img)

Perform layout analysis on the given image.

This method processes the input image to detect and analyze the layout structure, including paragraphs, tables, and figures. It combines the results from the layout parser and the table structure recognizer to produce a comprehensive analysis.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| img | ndarray | The input image in BGR format. | required |

Returns:

| Name | Type | Description |
| --- | --- | --- |
| tuple | tuple[LayoutAnalyzerSchema, ndarray \| None] | A tuple containing the two items listed below. |

  • results (LayoutAnalyzerSchema): The aggregated results of the layout analysis, including detected paragraphs, tables, and figures.

  • vis (np.ndarray or None): The visualization of the layout analysis if visualization is enabled, otherwise None.
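
Since `img` is expected in BGR channel order, arrays from libraries that produce RGB (Pillow, for instance) would need their channels reversed first. A minimal NumPy-only sketch, assuming an `H x W x 3` array; `rgb_to_bgr` is a hypothetical helper, not part of yomitoku:

```python
import numpy as np


def rgb_to_bgr(img_rgb: np.ndarray) -> np.ndarray:
    """Reverse the channel axis of an H x W x 3 image (RGB -> BGR)."""
    if img_rgb.ndim != 3 or img_rgb.shape[2] != 3:
        raise ValueError("expected an array of shape (H, W, 3)")
    # Slicing with ::-1 yields a view with a negative stride on the channel
    # axis; copy it into a C-contiguous array, which image code often expects.
    return np.ascontiguousarray(img_rgb[:, :, ::-1])


rgb = np.zeros((2, 2, 3), dtype=np.uint8)
rgb[..., 0] = 255          # pure red in RGB order
bgr = rgb_to_bgr(rgb)
print(bgr[0, 0].tolist())  # [0, 0, 255]
```

Images read with OpenCV's `cv2.imread` are already in BGR order and would not need this step.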
