stouputils.data_science.dataset package
Package for advanced dataset handling.
Provides comprehensive tools for loading, processing and managing image datasets with special handling for augmented data and group-aware operations.
Main Components:
- Dataset : Core class for storing and managing dataset splits with metadata
- DatasetLoader : Handles dataset loading from directories with various strategies
- DatasetSplitter : Manages stratified splitting while maintaining group integrity
- GroupingStrategy : Enum defining image grouping approaches (NONE/SIMPLE/CONCATENATE)
- XyTuple : Specialized container for features/labels with file tracking
Key Features:
- Augmented data handling with original file mapping
- Prevention of data leakage between train/test sets
- Support for multiple grouping strategies at subject/image level
- Class-aware dataset splitting with stratification
- Comprehensive metadata tracking (class distributions, file paths)
- Compatibility with keras.utils.image_dataset_from_directory
- Group-aware k-fold cross validation support
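The leakage-prevention and stratification features above can be illustrated with a minimal, library-independent sketch. It does not use the stouputils API: the helper names `original_of` and `group_aware_split`, and the `_aug_` file-name marker used to map an augmented image back to its original, are assumptions made for this example only. The idea is that every augmented copy follows its original into the same split, and the split is done class by class so both sides keep every class where possible.

```python
import os
import random
from collections import defaultdict


def original_of(path: str, marker: str = "_aug_") -> str:
    """Map an augmented file name back to its original.

    Assumes a hypothetical naming scheme such as 'cat_1_aug_0.png' -> 'cat_1.png'.
    """
    stem, ext = os.path.splitext(path)
    return stem.split(marker)[0] + ext


def group_aware_split(filepaths, labels, test_ratio=0.25, seed=0):
    """Split sample indices so that an original and its augmentations stay together.

    The split is performed per class (stratification), on originals rather
    than on individual files, which keeps augmented copies out of the
    opposite split.
    """
    # label -> original file -> indices of the original and its augmented copies
    groups_by_class = defaultdict(lambda: defaultdict(list))
    for i, (path, label) in enumerate(zip(filepaths, labels)):
        groups_by_class[label][original_of(path)].append(i)

    rng = random.Random(seed)
    train, test = [], []
    for label in sorted(groups_by_class):
        originals = sorted(groups_by_class[label])
        rng.shuffle(originals)
        # A class with a single original cannot be split; it stays in train.
        n_test = max(1, round(len(originals) * test_ratio)) if len(originals) > 1 else 0
        for k, orig in enumerate(originals):
            target = test if k < n_test else train
            target.extend(groups_by_class[label][orig])
    return sorted(train), sorted(test)


if __name__ == "__main__":
    files = [
        "cat_1.png", "cat_1_aug_0.png", "cat_2.png", "cat_2_aug_0.png",
        "dog_1.png", "dog_1_aug_0.png", "dog_2.png", "dog_2_aug_0.png",
    ]
    classes = [0, 0, 0, 0, 1, 1, 1, 1]
    train_idx, test_idx = group_aware_split(files, classes, test_ratio=0.5)
    print("train:", [files[i] for i in train_idx])
    print("test: ", [files[i] for i in test_idx])
```

Splitting on originals instead of files is what prevents leakage: a model never gets evaluated on an augmented view of an image it was trained on.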
Submodules
- stouputils.data_science.dataset.dataset module
  - DEFAULT_IMAGE_KWARGS
  - Dataset
    - Dataset._training_data
    - Dataset._val_data
    - Dataset._test_data
    - Dataset.num_classes
    - Dataset.name
    - Dataset.loading_type
    - Dataset.grouping_strategy
    - Dataset.labels
    - Dataset.class_distribution
    - Dataset.original_dataset
    - Dataset._get_num_classes()
    - Dataset._update_class_distribution()
    - Dataset.exclude_augmented_images_from_val_test()
    - Dataset.get_experiment_name()
- stouputils.data_science.dataset.dataset_loader module
- stouputils.data_science.dataset.grouping_strategy module
- stouputils.data_science.dataset.image_loader module
- stouputils.data_science.dataset.xy_tuple module
  - XyTuple
    - XyTuple._X
    - XyTuple._y
    - XyTuple.filepaths
    - XyTuple.augmented_files
    - XyTuple.n_samples
    - XyTuple.is_empty()
    - XyTuple.update_augmented_files()
    - XyTuple.group_by_original()
    - XyTuple.get_indices_from_originals()
    - XyTuple.create_subset()
    - XyTuple.remove_augmented_files()
    - XyTuple.split()
    - XyTuple.kfold_split()
    - XyTuple.ungrouped_array()
    - XyTuple.empty()
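
The member names XyTuple.group_by_original() and XyTuple.kfold_split() suggest that cross-validation folds are built from groups of files rather than from individual samples. Their actual signatures and behavior are not shown on this page, so the following is only a generic sketch of group-aware k-fold splitting in plain Python. The function `group_kfold` and its parameters are hypothetical and are not part of stouputils.

```python
from collections import defaultdict


def group_kfold(group_ids, n_splits=3):
    """Yield (train_indices, val_indices) pairs for group-aware k-fold.

    Every group (for example, an original image plus its augmentations)
    appears in exactly one validation fold and never on both sides of a fold.
    """
    # group id -> indices of the samples that belong to it
    members = defaultdict(list)
    for i, g in enumerate(group_ids):
        members[g].append(i)

    if n_splits < 2 or n_splits > len(members):
        raise ValueError("n_splits must be between 2 and the number of groups")

    # Greedy balancing: place the largest groups first, each into the
    # fold that currently holds the fewest samples.
    folds = [[] for _ in range(n_splits)]
    for g in sorted(members, key=lambda g: (-len(members[g]), str(g))):
        smallest = min(range(n_splits), key=lambda k: len(folds[k]))
        folds[smallest].extend(members[g])

    for k in range(n_splits):
        val = sorted(folds[k])
        train = sorted(i for j, fold in enumerate(folds) if j != k for i in fold)
        yield train, val


if __name__ == "__main__":
    # One group id per sample; samples sharing an id must stay together.
    ids = ["a", "a", "b", "b", "b", "c", "d", "d", "e"]
    for fold_number, (train_idx, val_idx) in enumerate(group_kfold(ids, n_splits=3)):
        print(fold_number, "val groups:", sorted({ids[i] for i in val_idx}))
```

This sketch balances fold sizes but does not stratify by class; a class-aware variant would apply the same assignment separately within each class.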