stouputils.data_science.data_processing.image_preprocess module

class ImageDatasetPreprocess(techniques: list[ProcessingTechnique] | None = None)

Bases: object

Image dataset preprocessing class. Check the class constructor for more information.

get_files_recursively(source: str, destination: str, extensions: tuple[str, ...] = ('.jpg', '.jpeg', '.png', '.bmp', '.tiff', '.tif')) → dict[str, str]

Recursively get all files in a directory and their destinations.

Parameters:
  • source (str) – Path to the source directory

  • destination (str) – Path to the destination directory

  • extensions (tuple[str, ...]) – Tuple of file extensions to consider (e.g. (".jpg", ".png"))

Returns:

Dictionary mapping source paths to destination paths

Return type:

dict[str, str]
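
The mapping described above can be sketched with the standard library alone. This is a minimal illustration, not the library's implementation: the function name map_files_recursively is hypothetical, and both the case-insensitive extension match and the mirroring of the source sub-directory layout under the destination are assumptions.

```python
import os


def map_files_recursively(
    source: str,
    destination: str,
    extensions: tuple[str, ...] = (".jpg", ".jpeg", ".png", ".bmp", ".tiff", ".tif"),
) -> dict[str, str]:
    """Hypothetical stand-in: map every matching file under source to a path under destination."""
    mapping: dict[str, str] = {}
    for root, _dirs, files in os.walk(source):
        for name in files:
            # Assumption: extensions are compared case-insensitively.
            if name.lower().endswith(extensions):
                src = os.path.join(root, name)
                # Assumption: the relative layout of source is mirrored under destination.
                relative = os.path.relpath(src, source)
                mapping[src] = os.path.join(destination, relative)
    return mapping
```

Calling map_files_recursively("dataset", "processed") on a tree containing dataset/cats/b.jpg would, under these assumptions, map that file to processed/cats/b.jpg and skip files whose extension is not listed.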

get_queue(dataset_path: str, destination_path: str) → list[tuple[str, str, list[ProcessingTechnique]]]

Get the queue of images to process with their techniques.

This method converts the processing techniques ranges to fixed values and builds a queue of files to process by recursively finding all images in the dataset path.

Parameters:
  • dataset_path (str) – Path to the dataset directory

  • destination_path (str) – Path to the destination directory where processed images will be saved

Returns:

Queue of (source_path, dest_path, techniques) tuples

Return type:

list[tuple[str, str, list[ProcessingTechnique]]]
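
The shape of such a queue can be illustrated as follows. This is only a sketch under stated assumptions: RangedTechnique, FixedTechnique and build_queue are hypothetical stand-ins (the real ProcessingTechnique fields are not documented in this section), and fixing a range at its midpoint is an arbitrary choice made here so the example is deterministic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RangedTechnique:
    """Hypothetical technique whose parameter is given as a (low, high) range."""
    name: str
    low: float
    high: float


@dataclass(frozen=True)
class FixedTechnique:
    """Hypothetical technique whose parameter has been fixed to one value."""
    name: str
    value: float


def build_queue(
    files: dict[str, str],
    techniques: list[RangedTechnique],
) -> list[tuple[str, str, list[FixedTechnique]]]:
    """Fix each range once, then pair every (source, destination) with the fixed techniques."""
    # Assumption: a range is fixed at its midpoint; the library may choose differently.
    fixed = [FixedTechnique(t.name, (t.low + t.high) / 2) for t in techniques]
    return [(src, dst, fixed) for src, dst in files.items()]
```

With files = {"a.png": "out/a.png"} and a single RangedTechnique("rotate", -10.0, 30.0), the sketch yields one queue entry whose technique list holds FixedTechnique("rotate", 10.0).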

process_dataset(dataset_path: str, destination_path: str, max_workers: int = 4, ignore_confirmation: bool = False) → None

Preprocess the dataset by applying the given processing techniques to the images.

Parameters:
  • dataset_path (str) – Path to the dataset

  • destination_path (str) – Path to the destination dataset

  • max_workers (int) – Number of workers to use (documented default is CPU_COUNT; the rendered signature shows 4)

  • ignore_confirmation (bool) – If True, don’t ask for confirmation
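
How a queue of (source_path, dest_path, techniques) tuples might be spread over max_workers workers can be sketched with concurrent.futures. This is not the library's code: run_queue and the worker callable are hypothetical names, and the use of threads rather than processes is an assumption made for the example.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable


def run_queue(
    queue: list[tuple[str, str, list[Any]]],
    worker: Callable[[str, str, list[Any]], None],
    max_workers: int = 4,
) -> None:
    """Hypothetical helper: apply worker to every queue entry using a pool of workers."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(worker, src, dst, techniques)
            for src, dst, techniques in queue
        ]
        for future in futures:
            # result() re-raises any exception from the worker, so failures are not silent.
            future.result()
```

In this sketch the worker plays the role that apply_techniques plays in the class: it receives one source path, one destination path and the list of techniques for that image.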

static apply_techniques(path: str, dest: str, techniques: list[ProcessingTechnique], use_padding: bool = True) → None

Apply the processing techniques to the image.

Parameters:
  • path (str) – Path to the image

  • dest (str) – Path to the destination image

  • techniques (list[ProcessingTechnique]) – List of processing techniques to apply

  • use_padding (bool) – If True, add padding to the image before applying techniques