torch_concepts.data.splitters.random.RandomSplitter

class RandomSplitter(val_size: int | float = 0.1, test_size: int | float = 0.2)

Random splitting strategy for datasets.

Randomly divides a dataset into train, validation, and test splits. The constructor takes no seed argument, so the splits are reproducible only if numpy’s random seed is set externally before calling fit().

The splitting is done in the following order:
  1. Test (if test_size > 0)
  2. Validation (if val_size > 0)
  3. Training (remaining samples)
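The ordering and the seeding requirement can be illustrated with a minimal NumPy sketch. It is not the library's implementation: the helper name random_split_indices and the use of np.random.permutation are assumptions made for illustration only.

```python
import numpy as np


def random_split_indices(n_samples, n_val, n_test):
    """Hypothetical helper: carve test, then validation, then train indices."""
    # Draws from NumPy's global RNG, so the result depends on np.random.seed().
    perm = np.random.permutation(n_samples)
    test_idxs = perm[:n_test]                # 1. test
    val_idxs = perm[n_test:n_test + n_val]   # 2. validation
    train_idxs = perm[n_test + n_val:]       # 3. training (remaining samples)
    return train_idxs, val_idxs, test_idxs


# Seeding before each call makes the two splits identical.
np.random.seed(0)
first = random_split_indices(1000, n_val=100, n_test=200)
np.random.seed(0)
second = random_split_indices(1000, n_val=100, n_test=200)

print([len(part) for part in first])
print(all(np.array_equal(a, b) for a, b in zip(first, second)))
```

Because the three slices come from a single permutation, they are disjoint and together cover every sample exactly once.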

Parameters:
  • val_size (Union[int, float], optional) – Size of validation set. If float, represents fraction of dataset. If int, represents absolute number of samples. Defaults to 0.1.

  • test_size (Union[int, float], optional) – Size of test set. If float, represents fraction of dataset. If int, represents absolute number of samples. Defaults to 0.2.
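As a rough sketch of how a float or int size might be turned into a sample count: the helper resolve_size below is hypothetical, and truncating fractional counts toward zero is an assumption, since the rounding rule is not stated here.

```python
def resolve_size(size, n_samples):
    """Hypothetical helper: convert a fraction or a count to a sample count."""
    if isinstance(size, float):
        if not 0.0 <= size <= 1.0:
            raise ValueError("a float size must lie in [0, 1]")
        # Assumption: fractional counts are truncated toward zero.
        return int(size * n_samples)
    if not 0 <= size <= n_samples:
        raise ValueError("an int size must lie in [0, n_samples]")
    return size


print(resolve_size(0.1, 1000))  # fraction of the dataset -> 100
print(resolve_size(50, 1000))   # absolute number of samples -> 50
```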

Example

>>> # 70% train, 10% val, 20% test (the output below assumes a dataset of 1000 samples)
>>> splitter = RandomSplitter(val_size=0.1, test_size=0.2)
>>> splitter.fit(dataset)
>>> print(f"Train: {splitter.train_len}, Val: {splitter.val_len}, Test: {splitter.test_len}")
Train: 700, Val: 100, Test: 200

__init__(val_size: int | float = 0.1, test_size: int | float = 0.2)

Initialize the RandomSplitter.

Parameters:
  • val_size – Size of validation set. If float, represents fraction of dataset. If int, represents absolute number of samples. Defaults to 0.1.

  • test_size – Size of test set. If float, represents fraction of dataset. If int, represents absolute number of samples. Defaults to 0.2.

Methods

__init__([val_size, test_size])

Initialize the RandomSplitter.

fit(dataset)

Randomly split the dataset into train/val/test sets.

reset()

set_indices([train, val, test])

split(dataset)

Attributes

fitted

indices

test_idxs

test_len

train_idxs

train_len

val_idxs

val_len