stouputils.data_science.dataset.grouping_strategy module

This module contains the GroupingStrategy class, which provides a strategy for grouping images when loading a dataset.

There are 3 strategies: NONE, SIMPLE and CONCATENATE. Refer to the docstrings of the GroupingStrategy class for more information.

class GroupingStrategy(value)

Bases: Enum

Grouping strategy for the dataset

NONE = 0

A subfolder such as “subject1” is a group of images: all of its images are grouped together (as a list of features), and the label is the class of the parent folder (class1).

Example file tree:

  • dataset/class1/subject1/image1.png

  • dataset/class1/subject1/image2.png

  • dataset/class1/subject1/image3.png

Example data (if binary classification):

  • features = [features_image1, features_image2, features_image3] where

    features_image1, features_image2, features_image3 are NDArray[Any] of shape (224, 224, 3)

  • labels = [1.0, 0.0]

If subjects do not have the same number of images, the missing images are padded with zeros so that all features have the same shape.
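
The zero-padding described above can be sketched with NumPy as follows. This is an illustration under stated assumptions, not the library's implementation: `pad_subjects` is a hypothetical helper, and each subject is assumed to be a stacked array of shape (num_images, height, width, channels).

```python
import numpy as np


def pad_subjects(subjects: list[np.ndarray]) -> list[np.ndarray]:
    """Hypothetical helper: append all-zero images so every subject has the same image count."""
    max_images = max(s.shape[0] for s in subjects)
    padded = []
    for s in subjects:
        missing = max_images - s.shape[0]
        # Pad only axis 0 (the image count); height, width and channels stay untouched
        pad_width = [(0, missing)] + [(0, 0)] * (s.ndim - 1)
        padded.append(np.pad(s, pad_width, mode="constant", constant_values=0))
    return padded


subject_a = np.ones((3, 224, 224, 3))  # a subject with three images
subject_b = np.ones((1, 224, 224, 3))  # a subject with a single image
padded_a, padded_b = pad_subjects([subject_a, subject_b])
print(padded_a.shape, padded_b.shape)
```

After padding, both arrays share the same shape, and the two images added to `subject_b` contain only zeros.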

This strategy preserves the relationship between images of the same subject when splitting the dataset, ensuring that all images from the same subject stay together in either train or test sets.

This is the default behavior.
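
To illustrate what keeping all images of a subject in the same split means, here is a minimal sketch in plain Python. `split_by_subject`, its parameters and the sample mapping are hypothetical and are not part of stouputils; the point is only that whole subjects are assigned to a split, never individual images.

```python
import random


def split_by_subject(
    subject_to_images: dict[str, list[str]], test_ratio: float, seed: int
) -> tuple[dict[str, list[str]], dict[str, list[str]]]:
    """Hypothetical helper: assign whole subjects to train or test, never single images."""
    subjects = sorted(subject_to_images)
    random.Random(seed).shuffle(subjects)  # reproducible shuffle of subject names
    n_test = max(1, round(len(subjects) * test_ratio))
    test_subjects = set(subjects[:n_test])
    train = {s: imgs for s, imgs in subject_to_images.items() if s not in test_subjects}
    test = {s: imgs for s, imgs in subject_to_images.items() if s in test_subjects}
    return train, test


subject_images = {
    "subject1": ["image1.png", "image2.png", "image3.png"],
    "subject2": ["image1.png", "image2.png"],
    "subject3": ["image1.png"],
    "subject4": ["image1.png", "image2.png"],
}
train_groups, test_groups = split_by_subject(subject_images, test_ratio=0.25, seed=42)
print(sorted(train_groups), sorted(test_groups))
```

Because the split operates on subject names, no subject can contribute images to both sets.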

CONCATENATE = 1

A subfolder such as “subject1” is a group of images: all of its images are concatenated into a single feature (NDArray[Any]), and the label is the class of the parent folder (class1).

Example file tree:

  • dataset/class1/subject1/image1.png

  • dataset/class1/subject1/image2.png

  • dataset/class1/subject1/image3.png

Example data (if binary classification):

  • features will have a shape of (224, 224, 3*num_images) (if RGB images).

    Notice that the concatenation is done along the last axis.

  • labels = [1.0, 0.0]

If subjects do not have the same number of images, the missing images are padded with zeros so that all features have the same shape.
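
Concatenation along the last axis can be sketched with NumPy as follows. This is a standalone illustration that does not call stouputils; the three constant-valued arrays stand in for the RGB images of one subject.

```python
import numpy as np

# Three stand-in RGB images filled with 0, 1 and 2 so they can be told apart
images = [np.full((224, 224, 3), fill_value=i, dtype=np.float32) for i in range(3)]

# Join the images along the last (channel) axis
features = np.concatenate(images, axis=-1)
print(features.shape)  # (224, 224, 9)
```

Channels 0 to 2 of `features` come from the first image, channels 3 to 5 from the second, and channels 6 to 8 from the third.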

static _load_folder(folder_path: str, class_idx: int, num_classes: int, kwargs: dict[str, Any]) → tuple[list[ndarray[Any, dtype[Any]]], ndarray[Any, dtype[Any]], tuple[str, ...]]

Load images from a single folder.

Parameters:
  • folder_path (str) – Path to the folder

  • class_idx (int) – Index of the class

  • num_classes (int) – Total number of classes

  • kwargs (dict[str, Any]) – Additional arguments for image_dataset_from_directory

Returns:

List of tuples containing (images, one-hot label, filepaths)

Return type:

list[tuple[NDArray[Any], NDArray[Any], str]]

Examples

> data = GroupingStrategy._load_folder(
        folder_path="data/pizza/pizza1",
        class_idx=0,
        num_classes=2,
        kwargs={"color_mode": "grayscale"}
)
> features, label, filepaths = zip(*data, strict=True)
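
The one-hot label built from class_idx and num_classes can be sketched as follows. `one_hot` is a hypothetical helper written for illustration; how the library builds the label internally is not shown in this documentation.

```python
import numpy as np


def one_hot(class_idx: int, num_classes: int) -> np.ndarray:
    """Hypothetical helper: vector of zeros with a single 1.0 at position class_idx."""
    label = np.zeros(num_classes, dtype=np.float32)
    label[class_idx] = 1.0
    return label


label = one_hot(class_idx=0, num_classes=2)
print(label.tolist())  # [1.0, 0.0]
```

With class_idx=0 and num_classes=2, the result matches the `labels = [1.0, 0.0]` value used in the binary classification examples.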

static image_dataset_from_directory(grouping_strategy: GroupingStrategy, path: str, seed: int, **kwargs: Any) → tuple[XyTuple, tuple[str, ...], GroupingStrategy]

Load images from a directory while keeping groups of images together.

Parameters:
  • grouping_strategy (GroupingStrategy) – Grouping strategy to use

  • path (str) – Path to the dataset directory

  • seed (int) – Random seed for shuffling

  • **kwargs (Any) – Additional arguments passed to image_dataset_from_directory

Returns:

  • XyTuple: XyTuple with the data

  • tuple[str, …]: List of class labels (strings)

  • GroupingStrategy: Grouping strategy used (because it can be updated)

Return type:

tuple[XyTuple, tuple[str, …], GroupingStrategy]

Examples

> data = GroupingStrategy.image_dataset_from_directory(
        grouping_strategy=GroupingStrategy.NONE,
        path="data/pizza",
        seed=42,
        color_mode="grayscale"
)
> all_data: XyTuple = data[0]
> all_labels: tuple[str, ...] = data[1]
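
As a rough picture of the class/subject/image layout this function expects, the sketch below walks such a tree with pathlib and groups file names by (class, subject). `list_groups` is a hypothetical helper and the temporary tree is built only for the demonstration; neither reflects how stouputils actually discovers files.

```python
import tempfile
from pathlib import Path


def list_groups(root: Path) -> dict[tuple[str, str], list[str]]:
    """Hypothetical helper: map (class, subject) to the sorted image names it contains."""
    groups: dict[tuple[str, str], list[str]] = {}
    for class_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        for subject_dir in sorted(p for p in class_dir.iterdir() if p.is_dir()):
            images = sorted(f.name for f in subject_dir.iterdir() if f.is_file())
            groups[(class_dir.name, subject_dir.name)] = images
    return groups


# Build a throwaway tree mirroring dataset/class1/subject1/image*.png
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "dataset"
    for name in ("image1.png", "image2.png", "image3.png"):
        target = root / "class1" / "subject1" / name
        target.parent.mkdir(parents=True, exist_ok=True)
        target.touch()  # empty placeholder file, not a real image
    found_groups = list_groups(root)

print(found_groups)
```

Each key of the resulting mapping identifies one group, which is the unit that has to stay together when the dataset is split.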

static fix_different_sizes(data: list[list[ndarray[Any, dtype[Any]]]], grouping_strategy: GroupingStrategy) → list[list[ndarray[Any, dtype[Any]]]]

Fix different sizes of images in a list of lists of numpy arrays.

  • The simple strategy adds empty images along shape[0].

  • The concatenate strategy adds empty channels along shape[-1].

Parameters:
  • data (list[list[NDArray[Any]]]) – List of lists of numpy arrays

  • grouping_strategy (GroupingStrategy) – Grouping strategy used

Returns:

List of lists of numpy arrays with consistent shapes

Return type:

list[list[NDArray[Any]]]

Examples

>>> # Concatenate grouping strategy
>>> data = [[np.zeros((7, 224, 224, 3))], [np.zeros((1, 224, 224, 1))]]
>>> data = GroupingStrategy.fix_different_sizes(data, GroupingStrategy.CONCATENATE)
>>> data[0][0].shape
(7, 224, 224, 3)
>>> data[1][0].shape
(1, 224, 224, 3)
>>> data[1][0].shape[0] == data[0][0].shape[0]
False
>>> data[1][0].shape[-1] == data[0][0].shape[-1]
True
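
The channel padding shown in the doctest above can be approximated with np.pad. The sketch below is an assumption about the general idea, not the library's code: `pad_channels` is a hypothetical helper that only pads the last axis, and small 8x8 arrays are used to keep it light.

```python
import numpy as np


def pad_channels(arrays: list[np.ndarray]) -> list[np.ndarray]:
    """Hypothetical helper: append all-zero channels so every array has the same last axis."""
    max_channels = max(a.shape[-1] for a in arrays)
    padded = []
    for a in arrays:
        missing = max_channels - a.shape[-1]
        # Pad only the last axis (channels); all leading axes stay untouched
        pad_width = [(0, 0)] * (a.ndim - 1) + [(0, missing)]
        padded.append(np.pad(a, pad_width, mode="constant", constant_values=0))
    return padded


wide = np.ones((7, 8, 8, 3))    # three channels
narrow = np.ones((1, 8, 8, 1))  # one channel
wide_out, narrow_out = pad_channels([wide, narrow])
print(wide_out.shape, narrow_out.shape)
```

As in the doctest, only shape[-1] is aligned; shape[0] still differs between the two arrays.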