Layout Analyzer

Bases: BaseJob

A class for analyzing the layout of documents, including table structure recognition.

This class analyzes document layout, including detecting tables and recognizing their structure. On construction it creates two components: `layout_parser` for general layout analysis and `table_structure_recognizer` for identifying table structures within documents.

Parameters:

Name	Type	Description	Default
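`configs`	`dict`	A dictionary of configurations to override the default settings. The keys "layout_parser" and "table_structure_recognizer" each map to a dictionary of keyword arguments that is merged over the shared settings (`device`, `visualize`, `license_key`, `secret_key`, `device_token`) for that component. Passing a value that is not a dict raises `ValueError`. Defaults to an empty dictionary.	`{}`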
`device`	`str`	The device to use for computation, e.g., "cuda" or "cpu". Defaults to "cuda".	`'cuda'`
`visualize`	`bool`	Whether to enable visualization during layout analysis. Defaults to False.	`False`
`license_key`	`str`	The license key for using specific features or services. Defaults to None.	`None`
`secret_key`	`str`	The secret key for authentication with external services. Defaults to None.	`None`
`device_token`	`str`	The device token for authentication with external services. Defaults to None.	`None`
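
The way `configs` overrides the shared settings can be sketched in isolation. This is a minimal illustration, not the library's code: `build_component_kwargs` is a hypothetical helper that mirrors the merge performed in the constructor, and it leaves out the license and token arguments for brevity.

```python
def build_component_kwargs(configs=None, device="cuda", visualize=False):
    """Hypothetical helper: mirrors how the constructor merges settings."""
    if configs is None:
        configs = {}
    if not isinstance(configs, dict):
        raise ValueError("configs must be a dict")

    # Settings shared by both components.
    shared = {"device": device, "visualize": visualize}

    merged = {}
    for name in ("layout_parser", "table_structure_recognizer"):
        kwargs = dict(shared)                  # start from the shared settings
        kwargs.update(configs.get(name, {}))   # per-component values win
        merged[name] = kwargs
    return merged


merged = build_component_kwargs(
    configs={"layout_parser": {"device": "cpu"}},
    device="cuda",
)
print(merged["layout_parser"]["device"])
print(merged["table_structure_recognizer"]["device"])
```

Here only the layout parser is moved to the CPU; the table structure recognizer keeps the shared `device` value because its key is absent from `configs`.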

Attributes:

Name	Type	Description
`layout_parser`	`LayoutParser`	An instance of the layout parser used for general layout analysis.
`table_structure_recognizer`	`TableStructureRecognizer`	An instance of the table structure recognizer used for detecting and analyzing table structures in documents.

Source code in src/yomitoku/layout_analyzer.py

class LayoutAnalyzer(BaseJob):
    """
    A class for analyzing the layout of documents, including table structure recognition.

    This class provides functionality to process and analyze the layout of documents, such as detecting
    and recognizing table structures. It initializes components like `layout_parser` for general layout
    analysis and `table_structure_recognizer` for identifying table structures within documents.

    Args:
        configs (dict, optional): A dictionary of configurations to override the default settings.
            The `configs` dictionary can include keys such as "layout_parser" and "table_structure_recognizer"
            to customize specific components. Defaults to an empty dictionary.
        device (str, optional): The device to use for computation, e.g., "cuda" or "cpu". Defaults to "cuda".
        visualize (bool, optional): Whether to enable visualization during layout analysis. Defaults to False.
        license_key (str, optional): The license key for using specific features or services. Defaults to None.
        secret_key (str, optional): The secret key for authentication with external services. Defaults to None.
        device_token (str, optional): The device token for authentication with external services. Defaults to None.

    Attributes:
        layout_parser (LayoutParser): An instance of the layout parser used for general layout analysis.
        table_structure_recognizer (TableStructureRecognizer): An instance of the table structure recognizer
            used for detecting and analyzing table structures in documents.
    """

    def __init__(
        self,
        configs={},
        device="cuda",
        visualize=False,
        license_key=None,
        secret_key=None,
        device_token=None,
    ):
        super().__init__()
        layout_parser_kwargs = {
            "device": device,
            "visualize": visualize,
            "license_key": license_key,
            "secret_key": secret_key,
            "device_token": device_token,
        }
        table_structure_recognizer_kwargs = {
            "device": device,
            "visualize": visualize,
            "license_key": license_key,
            "secret_key": secret_key,
            "device_token": device_token,
        }

        if isinstance(configs, dict):
            if "layout_parser" in configs:
                layout_parser_kwargs.update(configs["layout_parser"])

            if "table_structure_recognizer" in configs:
                table_structure_recognizer_kwargs.update(
                    configs["table_structure_recognizer"]
                )
        else:
            raise ValueError(
                "configs must be a dict. See the https://kotaro-kinoshita.github.io/yomitoku-dev/usage/"
            )

        self.layout_parser = LayoutParser(
            **layout_parser_kwargs,
        )
        self.table_structure_recognizer = TableStructureRecognizer(
            **table_structure_recognizer_kwargs,
        )

    async def run(self, img):
        layout_results, vis = self.layout_parser(img)
        table_boxes = [table.box for table in layout_results.tables]
        table_results, vis = self.table_structure_recognizer(img, table_boxes, vis=vis)
        return layout_results, table_results, vis

    def __call__(self, img) -> tuple[LayoutAnalyzerSchema, np.ndarray | None]:
        """
        Perform layout analysis on the given image.

        This method processes the input image to detect and analyze the layout structure,
        including paragraphs, tables, and figures. It combines the results from the layout
        parser and the table structure recognizer to produce a comprehensive analysis.

        Args:
            img (np.ndarray): The input image in BGR format.

        Returns:
            tuple: A tuple containing:

                - results (LayoutAnalyzerSchema): The aggregated results of the layout analysis,
                  including detected paragraphs, tables, and figures.

                - vis (np.ndarray or None): The visualization of the layout analysis if
                  visualization is enabled, otherwise `None`.
        """
        layout_results, table_results, vis = asyncio.run(self.run(img))

        results = LayoutAnalyzerSchema(
            paragraphs=layout_results.paragraphs,
            tables=table_results,
            figures=layout_results.figures,
        )

        return results, vis

`__call__(img)`

Perform layout analysis on the given image.

This method processes the input image to detect and analyze the layout structure, including paragraphs, tables, and figures. It combines the results from the layout parser and the table structure recognizer to produce a comprehensive analysis.

Parameters:

Name	Type	Description	Default
`img`	`ndarray`	The input image in BGR format.	required

Returns:

Name	Type	Description
`tuple`	`tuple[LayoutAnalyzerSchema, ndarray \| None]`	A tuple `(results, vis)`. `results` (LayoutAnalyzerSchema): the aggregated results of the layout analysis, including detected paragraphs, tables, and figures. `vis` (np.ndarray or None): the visualization of the layout analysis if visualization is enabled, otherwise `None`.
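
The data flow behind this return value can be sketched with stand-ins. Everything below is hypothetical: `StubLayout`, `StubTable`, `stub_layout_parser`, and `stub_table_recognizer` are placeholders for the real components, and a plain dict stands in for `LayoutAnalyzerSchema`. Only the order of the steps follows the source listing.

```python
from dataclasses import dataclass, field


@dataclass
class StubTable:
    box: list  # bounding box, assumed here to be [x1, y1, x2, y2]


@dataclass
class StubLayout:
    paragraphs: list = field(default_factory=list)
    tables: list = field(default_factory=list)
    figures: list = field(default_factory=list)


def stub_layout_parser(img):
    # Stand-in for the layout parser: returns layout results and a visualization.
    layout = StubLayout(
        paragraphs=["p0"],
        tables=[StubTable(box=[0, 0, 10, 10])],
        figures=["f0"],
    )
    return layout, None  # vis is None when visualization is disabled


def stub_table_recognizer(img, table_boxes, vis=None):
    # Stand-in for the table structure recognizer: one result per table box.
    return [{"box": box, "cells": []} for box in table_boxes], vis


def analyze(img):
    layout, vis = stub_layout_parser(img)
    table_boxes = [table.box for table in layout.tables]
    tables, vis = stub_table_recognizer(img, table_boxes, vis=vis)
    # Paragraphs and figures come from the layout parser;
    # tables come from the table structure recognizer.
    results = {
        "paragraphs": layout.paragraphs,
        "tables": tables,
        "figures": layout.figures,
    }
    return results, vis


results, vis = analyze(img=None)
print(results["tables"][0]["box"], vis)
```

The point of the sketch is that the table entries in the final result are the recognizer's output, not the raw table detections from the layout parser, which only supply the boxes.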

Source code in src/yomitoku/layout_analyzer.py

def __call__(self, img) -> tuple[LayoutAnalyzerSchema, np.ndarray | None]:
    """
    Perform layout analysis on the given image.

    This method processes the input image to detect and analyze the layout structure,
    including paragraphs, tables, and figures. It combines the results from the layout
    parser and the table structure recognizer to produce a comprehensive analysis.

    Args:
        img (np.ndarray): The input image in BGR format.

    Returns:
        tuple: A tuple containing:

            - results (LayoutAnalyzerSchema): The aggregated results of the layout analysis,
              including detected paragraphs, tables, and figures.

            - vis (np.ndarray or None): The visualization of the layout analysis if
              visualization is enabled, otherwise `None`.
    """
    layout_results, table_results, vis = asyncio.run(self.run(img))

    results = LayoutAnalyzerSchema(
        paragraphs=layout_results.paragraphs,
        tables=table_results,
        figures=layout_results.figures,
    )

    return results, vis
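
Because `__call__` wraps the coroutine in `asyncio.run`, it follows the standard-library rule that `asyncio.run` cannot be called from a thread whose event loop is already running (for example inside another coroutine). The sketch below reproduces that behavior with a hypothetical `call_sync` function in place of the analyzer, and shows one possible workaround using `asyncio.to_thread`; it is an illustration of the pattern, not a recommendation from the library.

```python
import asyncio


async def run():
    # Stand-in for LayoutAnalyzer.run.
    return "done"


def call_sync():
    # Same pattern as __call__: start a fresh event loop for one coroutine.
    coro = run()
    try:
        return asyncio.run(coro)
    except RuntimeError:
        coro.close()  # avoid a "never awaited" warning for the unused coroutine
        raise


async def caller_inside_loop():
    # Calling the synchronous wrapper while a loop is running fails.
    try:
        call_sync()
    except RuntimeError as exc:
        return type(exc).__name__
    return "no error"


async def caller_with_thread():
    # A worker thread has no running loop, so asyncio.run works there.
    return await asyncio.to_thread(call_sync)


outside = call_sync()
inside = asyncio.run(caller_inside_loop())
via_thread = asyncio.run(caller_with_thread())
print(outside, inside, via_thread)
```

In a plain script the direct call is fine; the thread-based variant only matters when the analyzer is invoked from code that already runs inside an event loop.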
