stouputils.data_science.dataset.grouping_strategy module
This module contains the GroupingStrategy class, which provides a strategy for grouping images when loading a dataset.
The GroupingStrategy enum documents two strategies, NONE and CONCATENATE. Refer to the docstrings of the GroupingStrategy class for more information.
- class GroupingStrategy(value)
Bases: Enum
Grouping strategy for the dataset.
- NONE = 0
Each subfolder (e.g. “subject1”) is a group of images: all of its images are grouped together as a list of features, and the label is the class of the parent folder (e.g. “class1”).
Example file tree:
dataset/class1/subject1/image1.png
dataset/class1/subject1/image2.png
dataset/class1/subject1/image3.png
Example data (if binary classification):
- features = [features_image1, features_image2, features_image3], where features_image1, features_image2 and features_image3 are NDArray[Any] of shape (224, 224, 3)
- labels = [1.0, 0.0] (one-hot encoding of class1)
If subjects do not have the same number of images, the missing images are padded with zeros so that all features have the same shape.
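A minimal NumPy sketch of the NONE layout, assuming each image is already decoded into an array and that a subject's images are stacked along a new leading axis before zero padding. The helper `group_subject` is hypothetical (not part of stouputils), and small 4x4 RGB arrays stand in for 224x224 images:

```python
import numpy as np


def group_subject(images: list[np.ndarray], max_images: int) -> np.ndarray:
    """Hypothetical helper: stack a subject's images, then zero-pad to max_images."""
    stacked = np.stack(images, axis=0)  # shape (num_images, H, W, C)
    missing = max_images - stacked.shape[0]
    # Append all-zero images along axis 0 so every subject ends up with the same shape
    pad_width = [(0, missing)] + [(0, 0)] * (stacked.ndim - 1)
    return np.pad(stacked, pad_width, mode="constant", constant_values=0)


# Two subjects of class1 with 3 and 2 images (4x4 RGB placeholders)
subject1 = [np.ones((4, 4, 3), dtype=np.float32) for _ in range(3)]
subject2 = [np.ones((4, 4, 3), dtype=np.float32) for _ in range(2)]
max_images = max(len(subject1), len(subject2))

features = [group_subject(s, max_images) for s in (subject1, subject2)]
labels = [np.array([1.0, 0.0]), np.array([1.0, 0.0])]  # one-hot for class1 (binary task)

print(features[0].shape, features[1].shape)  # (3, 4, 4, 3) (3, 4, 4, 3)
print(features[1][2].sum())  # 0.0, the padded image is all zeros
```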
This strategy preserves the relationship between images of the same subject when splitting the dataset, ensuring that all images from the same subject stay together in either train or test sets.
This is the default behavior.
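The subject-level splitting described for NONE can be sketched as follows: shuffle subject folders rather than individual images, then assign whole subjects to either the train or the test set. The function `split_by_subject` and its `test_ratio` parameter are illustrative assumptions, not the library's implementation:

```python
import random


def split_by_subject(
    image_paths: list[str], test_ratio: float, seed: int
) -> tuple[list[str], list[str]]:
    """Hypothetical helper: split paths so that each subject lands entirely in one set."""
    # Group paths by their subject folder, e.g. "dataset/class1/subject1"
    groups: dict[str, list[str]] = {}
    for p in image_paths:
        subject = p.rsplit("/", 1)[0]
        groups.setdefault(subject, []).append(p)

    subjects = sorted(groups)
    random.Random(seed).shuffle(subjects)  # seeded, so the split is reproducible
    n_test = max(1, round(len(subjects) * test_ratio))
    test_subjects = set(subjects[:n_test])

    train = [p for s in subjects if s not in test_subjects for p in groups[s]]
    test = [p for s in subjects if s in test_subjects for p in groups[s]]
    return train, test


paths = [
    f"dataset/class{c}/subject{s}/image{i}.png"
    for c in (1, 2) for s in (1, 2) for i in (1, 2, 3)
]
train, test = split_by_subject(paths, test_ratio=0.25, seed=42)
print(len(train), len(test))  # 9 3
```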
- CONCATENATE = 1
Each subfolder (e.g. “subject1”) is a group of images: all of its images are concatenated into a single feature (NDArray[Any]), and the label is the class of the parent folder (e.g. “class1”).
Example file tree:
dataset/class1/subject1/image1.png
dataset/class1/subject1/image2.png
dataset/class1/subject1/image3.png
Example data (if binary classification):
- each feature has shape (224, 224, 3 * num_images) for RGB images, since the images are concatenated along the last (channel) axis
- labels = [1.0, 0.0] (one-hot encoding of class1)
If subjects do not have the same number of images, the missing images are padded with zeros so that all features have the same shape.
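A minimal NumPy sketch of the CONCATENATE layout, assuming three decoded RGB images per subject (small 4x4 arrays stand in for 224x224) and zero padding on the channel axis when another subject has more images. It illustrates the resulting shapes only, not the library's code:

```python
import numpy as np

# Three RGB images of one subject, filled with 0, 1 and 2 so they are distinguishable
images = [np.full((4, 4, 3), i, dtype=np.float32) for i in range(3)]

# Concatenate along the last (channel) axis: (H, W, 3 * num_images)
feature = np.concatenate(images, axis=-1)
print(feature.shape)  # (4, 4, 9)

# Zero-pad the channel axis up to a larger subject's size, e.g. 4 images
target_channels = 3 * 4
pad = target_channels - feature.shape[-1]
padded = np.pad(feature, [(0, 0), (0, 0), (0, pad)])
print(padded.shape)  # (4, 4, 12)
```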
- static _load_folder(folder_path: str, class_idx: int, num_classes: int, kwargs: dict[str, Any]) → tuple[list[ndarray[Any, dtype[Any]]], ndarray[Any, dtype[Any]], tuple[str, ...]]
Load images from a single folder.
- Parameters:
folder_path (str) – Path to the folder
class_idx (int) – Index of the class
num_classes (int) – Total number of classes
kwargs (dict[str, Any]) – Additional arguments for image_dataset_from_directory
- Returns:
List of tuples containing (images, one-hot label, filepaths)
- Return type:
list[tuple[NDArray[Any], NDArray[Any], str]]
Examples
> data = GroupingStrategy._load_folder(
    folder_path="data/pizza/pizza1",
    class_idx=0,
    num_classes=2,
    kwargs={"color_mode": "grayscale"},
)
> features, label, filepaths = zip(*data, strict=True)
- static image_dataset_from_directory(grouping_strategy: GroupingStrategy, path: str, seed: int, **kwargs: Any) → tuple[XyTuple, tuple[str, ...], GroupingStrategy]
Load images from a directory while keeping groups of images together.
- Parameters:
grouping_strategy (GroupingStrategy) – Grouping strategy to use
path (str) – Path to the dataset directory
seed (int) – Random seed for shuffling
**kwargs (Any) – Additional arguments passed to image_dataset_from_directory
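- Returns:
XyTuple: the loaded data
tuple[str, ...]: list of class labels (strings)
GroupingStrategy: the grouping strategy actually used, which may differ from the requested one because it can be updated
- Return type:
tuple[XyTuple, tuple[str, ...], GroupingStrategy]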
Examples
> data = GroupingStrategy.image_dataset_from_directory(
    grouping_strategy=GroupingStrategy.NONE,
    path="data/pizza",
    seed=42,
    color_mode="grayscale",
)
> all_data: XyTuple = data[0]
> all_labels: tuple[str, ...] = data[1]
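A minimal sketch of the "keep groups together" idea for a path/class/subject/image tree, using only the standard library. It assumes class names come from the first-level folders in alphabetical order and that whole subjects (not images) are shuffled with the seed; `discover_subjects` is a hypothetical name, and the library's actual loading logic may differ:

```python
import random
import tempfile
from pathlib import Path


def discover_subjects(root: str, seed: int) -> tuple[list[tuple[list[str], int]], tuple[str, ...]]:
    """Hypothetical helper: collect (image paths, class index) for each subject folder."""
    root_path = Path(root)
    # Class names come from the first-level folders (assumed sorted alphabetically)
    class_names = tuple(sorted(p.name for p in root_path.iterdir() if p.is_dir()))
    groups: list[tuple[list[str], int]] = []
    for class_idx, class_name in enumerate(class_names):
        for subject in sorted((root_path / class_name).iterdir()):
            if subject.is_dir():
                files = sorted(str(f) for f in subject.iterdir() if f.is_file())
                groups.append((files, class_idx))
    # Shuffle whole subjects, never individual images, with a reproducible seed
    random.Random(seed).shuffle(groups)
    return groups, class_names


# Build a tiny throwaway tree: <root>/<class>/<subject>/<image>
with tempfile.TemporaryDirectory() as tmp:
    for cls in ("pasta", "pizza"):
        for subj in ("s1", "s2"):
            d = Path(tmp, cls, subj)
            d.mkdir(parents=True)
            for i in range(3):
                (d / f"image{i}.png").touch()
    groups, class_names = discover_subjects(tmp, seed=42)
    print(class_names)  # ('pasta', 'pizza')
    print(len(groups), [len(files) for files, _ in groups])  # 4 [3, 3, 3, 3]
```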
- static fix_different_sizes(data: list[list[ndarray[Any, dtype[Any]]]], grouping_strategy: GroupingStrategy) → list[list[ndarray[Any, dtype[Any]]]]
Pad images of different sizes in a list of lists of NumPy arrays so that all arrays share a consistent shape.
The list-based NONE strategy adds empty images along shape[0] (the image axis); the CONCATENATE strategy adds empty channels along shape[-1].
- Parameters:
data (list[list[NDArray[Any]]]) – List of lists of numpy arrays
grouping_strategy (GroupingStrategy) – Grouping strategy used
- Returns:
List of lists of numpy arrays with consistent shapes
- Return type:
list[list[NDArray[Any]]]
Examples
>>> import numpy as np
>>> # Concatenate grouping strategy
>>> data = [[np.zeros((7, 224, 224, 3))], [np.zeros((1, 224, 224, 1))]]
>>> data = GroupingStrategy.fix_different_sizes(data, GroupingStrategy.CONCATENATE)
>>> data[0][0].shape
(7, 224, 224, 3)
>>> data[1][0].shape
(1, 224, 224, 3)
>>> data[1][0].shape[0] == data[0][0].shape[0]
False
>>> data[1][0].shape[-1] == data[0][0].shape[-1]
True
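A minimal sketch of the padding rule for fix_different_sizes, assuming zeros are appended at the end of a single axis (axis 0 for the list-based strategy, axis -1 for CONCATENATE). `pad_to_common_size` is a hypothetical name; it reproduces the shapes in the doctest above without claiming to match the library's implementation:

```python
import numpy as np


def pad_to_common_size(data: list[list[np.ndarray]], axis: int) -> list[list[np.ndarray]]:
    """Hypothetical sketch: zero-pad every array along `axis` to the largest size found."""
    target = max(arr.shape[axis] for group in data for arr in group)
    padded: list[list[np.ndarray]] = []
    for group in data:
        new_group = []
        for arr in group:
            pad_width = [(0, 0)] * arr.ndim
            pad_width[axis] = (0, target - arr.shape[axis])  # append zeros at the end
            new_group.append(np.pad(arr, pad_width))
        padded.append(new_group)
    return padded


data = [[np.zeros((7, 224, 224, 3))], [np.zeros((1, 224, 224, 1))]]
# CONCATENATE-like behavior: only the channel axis (shape[-1]) is padded
out = pad_to_common_size(data, axis=-1)
print(out[0][0].shape, out[1][0].shape)  # (7, 224, 224, 3) (1, 224, 224, 3)
```

Padding only one axis is why shape[0] still differs between the two groups in the doctest while shape[-1] matches.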