stouputils.data_science.dataset.dataset_loader module

This module contains the DatasetLoader class which handles dataset loading operations.

The DatasetLoader class provides the following key features:

  • Loading image datasets from directories using keras.utils.image_dataset_from_directory

  • Handling different grouping strategies (for datasets with multiple images per subject)

  • Preventing data leakage between train/test sets when using data augmentation

  • Ensuring test data consistency when loading an augmented dataset
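
The leakage concern in the last two points can be illustrated with a minimal sketch. This is not the DatasetLoader implementation: the function name split_without_leakage and the "<original stem>_aug<k>" file naming are assumptions made for this example only. The idea shown is to split the original files first, then let an augmented file into the training set only if the image it was derived from is also in the training set, while the test set keeps original images only.

```python
import random
from pathlib import Path


def split_without_leakage(original_files, augmented_files, test_size=0.2, seed=42):
    """Split originals, then add augmented files to train only (hypothetical helper)."""
    rng = random.Random(seed)
    originals = sorted(original_files)
    rng.shuffle(originals)

    # Decide the train/test membership on the ORIGINAL images only.
    n_test = round(len(originals) * test_size)
    test_originals = originals[:n_test]
    train_originals = originals[n_test:]
    train_stems = {Path(name).stem for name in train_originals}

    def source_stem(name):
        # Assumed naming convention: "<original stem>_aug<k>.<ext>"
        return Path(name).stem.split("_aug")[0]

    # An augmented file joins the training set only if its source image is there too.
    train_augmented = [f for f in augmented_files if source_stem(f) in train_stems]

    train = sorted(train_originals) + sorted(train_augmented)
    test = sorted(test_originals)  # test set: original images only
    return train, test


originals = [f"img{i}.png" for i in range(10)]
augmented = [f"img{i}_aug{k}.png" for i in range(10) for k in range(2)]
train, test = split_without_leakage(originals, augmented)
```

With this approach, no augmented variant of a test image can appear in the training set, and the test set is the same whether or not the augmented files are present.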

class DatasetLoader

Bases: object

Handles dataset loading operations.

static from_path(path: str, loading_type: Literal['image'] = 'image', seed: int = 42, test_size: float = 0.2, val_size: float = 0.2, grouping_strategy: GroupingStrategy = GroupingStrategy.NONE, based_of: str = '', **kwargs: Any) → Dataset

Create a balanced dataset from a path.

Parameters:
  • path (str) – Path to the dataset

  • loading_type (Literal["image"]) – Type of the dataset

  • seed (int) – Seed for the random generator

  • test_size (float) – Size of the test dataset (0 means no test set)

  • val_size (float) – Size of the validation dataset (0 means no validation set)

  • grouping_strategy (GroupingStrategy) – Grouping strategy for the dataset (e.g. GroupingStrategy.CONCATENATE)

  • based_of (str) – Path to the original dataset, for use when path is an augmented version of it. The original dataset is loaded so that augmented versions of test images do not end up in the training set

  • **kwargs (Any) – Keyword arguments passed to the loading function (e.g. for images: keras.src.utils.image_dataset_from_directory(…, **kwargs))

Returns:

Dataset object

Return type:

Dataset

Examples

> dataset = DatasetLoader.from_path(
    path="data/pizza_augmented",
    loading_type="image",
    seed=42,
    test_size=0.2,
    val_size=0.2,
    grouping_strategy=GroupingStrategy.NONE,
    based_of="data/pizza",

    # Image loading kwargs
    color_mode="grayscale",
    image_size=(224, 224),
)
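
The parameter descriptions do not say whether val_size is a fraction of the whole dataset or of what remains after the test split. The sketch below is only an arithmetic illustration of the two readings, not a description of what from_path does; the helper name split_counts is hypothetical.

```python
def split_counts(n_samples, test_size=0.2, val_size=0.2):
    """Return (train, val, test) counts under two possible readings of val_size."""
    n_test = round(n_samples * test_size)
    remaining = n_samples - n_test

    # Reading 1 (assumption): val_size is a fraction of the full dataset.
    val_of_total = round(n_samples * val_size)
    # Reading 2 (assumption): val_size is a fraction of the data left after the test split.
    val_of_remaining = round(remaining * val_size)

    return {
        "of_total": (remaining - val_of_total, val_of_total, n_test),
        "of_remaining": (remaining - val_of_remaining, val_of_remaining, n_test),
    }


counts = split_counts(100, test_size=0.2, val_size=0.2)
```

For 100 samples with the default sizes, the first reading gives 60/20/20 and the second gives 64/16/20 (train/val/test), so it is worth checking the resulting Dataset to see which one applies.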