stouputils.data_science.dataset.dataset_loader module
This module contains the DatasetLoader class, which handles dataset loading operations.
The DatasetLoader class provides the following key features:
- Loading image datasets from directories using keras.utils.image_dataset_from_directory
- Handling different grouping strategies (for datasets with multiple images per subject)
- Preventing data leakage between train and test sets when using data augmentation
- Ensuring test data consistency when loading an augmented dataset
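The leakage-prevention idea listed above can be illustrated with a minimal sketch. This is not the DatasetLoader implementation: the function name `split_without_leakage` and the file naming scheme (an augmented file is named after its original, followed by `_aug`) are assumptions made for the example. The originals are split first, and each augmented file then follows its original, so the test set contains only untouched originals.

```python
import random


def split_without_leakage(original_ids, augmented_files, test_size=0.2, seed=42):
    """Split originals first, then route augmented files to their original's split.

    Assumption (hypothetical naming scheme): an augmented file is named
    "<original_id>_aug<k>", so its source can be recovered from its name.
    """
    rng = random.Random(seed)
    ids = sorted(original_ids)  # sort first so the shuffle is reproducible
    rng.shuffle(ids)

    n_test = int(round(len(ids) * test_size))
    test_ids = set(ids[:n_test])
    train_ids = set(ids[n_test:])

    # The test set keeps only original images, never augmented ones.
    test = sorted(test_ids)

    # The training set gets the remaining originals plus their augmented variants.
    train = sorted(train_ids)
    for name in augmented_files:
        source_id = name.split("_aug")[0]
        if source_id in train_ids:
            train.append(name)
    return train, test
```

Augmented variants of test images are simply dropped in this sketch, which is one way to keep the test set identical whether or not the augmented dataset is loaded.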
- class DatasetLoader
Bases: object
Handles dataset loading operations.
- static from_path(path: str, loading_type: Literal['image'] = 'image', seed: int = 42, test_size: float = 0.2, val_size: float = 0.2, grouping_strategy: GroupingStrategy = GroupingStrategy.NONE, based_of: str = '', **kwargs: Any) -> Dataset
Create a balanced dataset from a path.
- Parameters:
path (str) – Path to the dataset
loading_type (Literal["image"]) – Type of the dataset
seed (int) – Seed for the random generator
test_size (float) – Size of the test dataset (0 means no test set)
val_size (float) – Size of the validation dataset (0 means no validation set)
grouping_strategy (GroupingStrategy) – Grouping strategy for the dataset (e.g. GroupingStrategy.CONCATENATE)
based_of (str) – Path to the original dataset when path is an augmented version of it. The original dataset is loaded from based_of so that no image in the test set has augmented variants in the training set
**kwargs (Any) – Keyword arguments forwarded to the loading function (e.g. for images: keras.src.utils.image_dataset_from_directory(…, **kwargs))
- Returns:
Dataset object
- Return type:
Dataset
Examples
>>> dataset = DatasetLoader.from_path(
...     path="data/pizza_augmented",
...     loading_type="image",
...     seed=42,
...     test_size=0.2,
...     val_size=0.2,
...     grouping_strategy=GroupingStrategy.NONE,
...     based_of="data/pizza",
...     # Image loading kwargs
...     color_mode="grayscale",
...     image_size=(224, 224),
... )
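To make the roles of seed, test_size and val_size concrete, here is a minimal sketch of a seeded three-way split. It is an illustration only, not the code behind from_path: the helper name `three_way_split` is hypothetical, and it assumes both sizes are fractions of the full dataset, which the documentation above does not state.

```python
import random


def three_way_split(items, test_size=0.2, val_size=0.2, seed=42):
    """Deterministically split items into train, validation and test lists.

    Assumption: test_size and val_size are both fractions of the full
    dataset. A size of 0 yields an empty split.
    """
    rng = random.Random(seed)  # local generator, so global state is untouched
    shuffled = list(items)
    rng.shuffle(shuffled)

    n_test = int(round(len(shuffled) * test_size))
    n_val = int(round(len(shuffled) * val_size))

    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test
```

Calling the helper twice with the same seed returns the same three lists, which is the property that makes a fixed seed useful for comparing runs.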