stouputils.data_science.data_processing.image_preprocess module

class ImageDatasetPreprocess(
techniques: list[ProcessingTechnique] | None = None,
)

Bases: object

Image dataset preprocessing class. Check the class constructor for more information.

get_files_recursively(
source: str,
destination: str,
extensions: tuple[str, ...] = ('.jpg', '.jpeg', '.png', '.bmp', '.tiff', '.tif'),
) → dict[str, str]

Recursively get all files in a directory and their destinations.

Parameters:
  • source (str) – Path to the source directory

  • destination (str) – Path to the destination directory

  • extensions (tuple[str, ...]) – Tuple of extensions to consider (e.g. (".jpg", ".png"))

Returns:

Dictionary mapping source paths to destination paths

Return type:

dict[str, str]
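As an illustration of the mapping described above, here is a minimal standalone sketch. It is not the library's implementation: the function name map_files_recursively is hypothetical, and both the case-insensitive extension match and the mirroring of sub-directories under the destination are assumptions.

```python
import os

# Default extensions copied from the documented signature.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".bmp", ".tiff", ".tif")


def map_files_recursively(source, destination, extensions=IMAGE_EXTENSIONS):
    """Hypothetical stand-in: map each matching file under `source`
    to the same relative location under `destination`."""
    mapping = {}
    for root, _dirs, files in os.walk(source):
        # Path of the current directory relative to the source root
        # ("." for the root itself, removed by normpath below).
        rel_root = os.path.relpath(root, source)
        for name in files:
            # Assumption: extensions are matched case-insensitively.
            if name.lower().endswith(extensions):
                src = os.path.join(root, name)
                dst = os.path.normpath(os.path.join(destination, rel_root, name))
                mapping[src] = dst
    return mapping
```

Calling map_files_recursively("dataset", "processed") would then return a dictionary whose keys are source image paths and whose values are the corresponding paths under "processed".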

get_queue(
dataset_path: str,
destination_path: str,
) → list[tuple[str, str, list[ProcessingTechnique]]]

Get the queue of images to process with their techniques.

This method converts the processing techniques ranges to fixed values and builds a queue of files to process by recursively finding all images in the dataset path.

Parameters:
  • dataset_path (str) – Path to the dataset directory

  • destination_path (str) – Path to the destination directory where processed images will be saved

Returns:

Queue of (source_path, dest_path, techniques) tuples

Return type:

list[tuple[str, str, list[ProcessingTechnique]]]
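The two steps described above (fixing ranges, then pairing every file with the fixed techniques) can be sketched as follows. Everything here is hypothetical: RangedTechnique is a stand-in for ProcessingTechnique, whose real fields are not shown on this page, and taking the midpoint of the range is only an assumed way of turning a range into a fixed value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RangedTechnique:
    """Hypothetical stand-in for ProcessingTechnique: a name plus a value range."""
    name: str
    low: float
    high: float

    def fixed(self) -> "RangedTechnique":
        # Assumption: collapse the range to its midpoint. The real library
        # may choose the fixed value differently.
        mid = (self.low + self.high) / 2
        return RangedTechnique(self.name, mid, mid)


def build_queue(files, techniques):
    """Pair every (source, destination) entry with the fixed techniques."""
    fixed = [t.fixed() for t in techniques]
    return [(src, dst, fixed) for src, dst in files.items()]
```

Here files is a source-to-destination dictionary of the kind returned by get_files_recursively, and each queue entry has the documented (source_path, dest_path, techniques) shape.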

process_dataset(
dataset_path: str,
destination_path: str,
max_workers: int = 4,
ignore_confirmation: bool = False,
) → None

Preprocess the dataset by applying the given processing techniques to the images.

Parameters:
  • dataset_path (str) – Path to the dataset

  • destination_path (str) – Path to the destination dataset

  • max_workers (int) – Number of workers to use (documented default: CPU_COUNT; the signature above shows 4)

  • ignore_confirmation (bool) – If True, don’t ask for confirmation
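To show how a max_workers setting typically bounds parallel processing of such a queue, here is a minimal sketch built on the standard library's concurrent.futures. Whether the library uses threads, processes, or another mechanism is not stated on this page, so the thread pool is an assumption, and process_queue and worker are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def process_queue(queue, worker, max_workers=4):
    """Hypothetical sketch: run `worker(src, dst, techniques)` for every
    queue entry, using at most `max_workers` threads at once."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(worker, src, dst, techniques)
            for src, dst, techniques in queue
        ]
        # Calling result() re-raises any exception raised inside a worker.
        for future in futures:
            future.result()
    # Number of queue entries that were processed.
    return len(futures)
```

In this sketch the worker plays the role that apply_techniques plays in the class: it receives one source path, one destination path, and the list of techniques to apply.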

static apply_techniques(
path: str,
dest: str,
techniques: list[ProcessingTechnique],
use_padding: bool = True,
) → None

Apply the processing techniques to the image.

Parameters:
  • path (str) – Path to the image

  • dest (str) – Path to the destination image

  • techniques (list[ProcessingTechnique]) – List of processing techniques to apply

  • use_padding (bool) – If True, add padding to the image before applying techniques