load_pdf
Load a PDF file and return an iterator that yields page images in BGR format.
Pages are rendered lazily, one at a time, so the whole document is never held in memory at once. This prevents out-of-memory (OOM) errors on large PDFs with hundreds of pages.
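The lazy pattern described above can be sketched in plain Python with a generator. This is a minimal illustration, not the real `PdfPageIterator`. `LazyPageIterator` and its `render_page` callable are hypothetical stand-ins for whatever PDF backend `load_pdf` actually uses. The point is that a page is rendered only when the consumer asks for it.

```python
from typing import Any, Callable, Iterator


class LazyPageIterator:
    """Illustrative stand-in for PdfPageIterator; not the real class.

    `render_page` is a hypothetical callable that maps a 0-based page
    index to a rendered image. It runs only when that page is requested.
    """

    def __init__(self, num_pages: int, render_page: Callable[[int], Any]) -> None:
        self.num_pages = num_pages
        self._render_page = render_page

    def __iter__(self) -> Iterator[Any]:
        # Generator body: nothing is rendered until the consumer calls next(),
        # and the iterator keeps no reference to pages it has already yielded.
        for index in range(self.num_pages):
            yield self._render_page(index)
```

Each page is yielded and then forgotten. Peak memory is therefore bounded by the pages the caller chooses to keep, not by the length of the document.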
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `pdf_path` | `str` | The path to the PDF file to be loaded. | *required* |
| `dpi` | `int` | The resolution (dots per inch) at which PDF pages are rendered as images. Higher values produce higher-resolution images. Defaults to 200. | `200` |
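PDF page sizes are measured in points, at 72 points per inch. Rendering at a given `dpi` therefore scales each page by `dpi / 72`. The helpers below are an illustrative calculation, not part of this API. They show how `dpi` determines the pixel size of each yielded image and, assuming a 3-channel 8-bit BGR array, its memory footprint. The rounding convention is an assumption, since backends may round differently.

```python
from typing import Tuple

POINTS_PER_INCH = 72  # PDF user-space units per inch (default UserUnit)


def rendered_size(width_pt: float, height_pt: float, dpi: int = 200) -> Tuple[int, int]:
    """Approximate pixel (width, height) of a page rendered at `dpi`."""
    # Multiply before dividing so exact inputs stay exact in floating point.
    # Rounding to the nearest pixel is an assumption; renderers differ.
    return (round(width_pt * dpi / POINTS_PER_INCH),
            round(height_pt * dpi / POINTS_PER_INCH))


def bgr_bytes(width_px: int, height_px: int) -> int:
    """Memory for one uint8 BGR image: 3 bytes per pixel."""
    return width_px * height_px * 3
```

Consider a US Letter page (612 × 792 pt) at the default 200 dpi. It renders to 1700 × 2200 pixels, which is 11,220,000 bytes (about 11.2 MB) per page. A 500-page document rendered eagerly would need over 5.6 GB, and this is the cost that lazy rendering avoids.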
Returns:
| Name | Type | Description |
|---|---|---|
| `PdfPageIterator` | `PdfPageIterator` | An iterator yielding NumPy arrays (BGR format) for each page. Has a |
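Many PDF rasterizers produce pixels in RGB order, while OpenCV-style code expects BGR. A loader like this therefore presumably reverses the channel order at some point. The following is a minimal sketch of that step on a packed 8-bit RGB buffer with 3 bytes per pixel. The buffer layout and the `rgb_to_bgr` helper are assumptions for illustration. For a NumPy array of shape `(height, width, 3)`, the equivalent conversion is `img[..., ::-1]`.

```python
def rgb_to_bgr(samples: bytes) -> bytes:
    """Swap the R and B channels in a packed 8-bit RGB buffer (hypothetical helper)."""
    if len(samples) % 3:
        raise ValueError("buffer length must be a multiple of 3")
    out = bytearray(samples)
    # Extended-slice assignment: every 3rd byte starting at 0 (R) takes the
    # value of every 3rd byte starting at 2 (B), and vice versa. The right-hand
    # side is read from the immutable input, so the swap is safe.
    out[0::3], out[2::3] = samples[2::3], samples[0::3]
    return bytes(out)
```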
Raises:
| Type | Description |
|---|---|
| `FileNotFoundError` | If the specified PDF file does not exist. |
| `ValueError` | |
| `RuntimeError` | If an error occurs while processing the PDF file. |
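The following is a minimal sketch of how a loader could produce the `FileNotFoundError` and `RuntimeError` cases listed above. It checks the path first, then wraps any backend failure while chaining the original exception. `open_pdf_checked` and the injected `open_document` callable are hypothetical and are not this library's code. The sketch omits `ValueError` because the condition that raises it is not documented here.

```python
import os
from typing import Any, Callable


def open_pdf_checked(pdf_path: str, open_document: Callable[[str], Any]) -> Any:
    """Illustrative only: validate the path, then wrap backend failures.

    `open_document` is a hypothetical backend call (for example a PDF
    library's open function), injected so the sketch stays self-contained.
    """
    if not os.path.isfile(pdf_path):
        raise FileNotFoundError(f"PDF file not found: {pdf_path}")
    try:
        return open_document(pdf_path)
    except Exception as exc:
        # `from exc` keeps the backend's original exception as __cause__.
        raise RuntimeError(f"Failed to process PDF {pdf_path!r}") from exc
```

Chaining with `from exc` lets callers catch a single `RuntimeError` while keeping the backend's traceback available for debugging.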