torch_concepts.data.splitters.random.RandomSplitter

class RandomSplitter(val_size: int | float = 0.1, test_size: int | float = 0.2)

Random splitting strategy for datasets.

Randomly divides a dataset into train, validation, and test splits. The constructor takes no seed argument, so a split is reproducible only if numpy’s random seed is set externally before calling fit().

The splits are assigned in the following order:

1. Test (if test_size > 0)
2. Validation (if val_size > 0)
3. Training (remaining samples)
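The ordering can be sketched with plain NumPy. This is a minimal illustration, not the library's implementation: the helper name `random_split_indices`, its integer-count arguments, and the use of `np.random.permutation` are assumptions made for the example. Only the test, validation, train ordering and the dependence on an externally set NumPy seed come from the description above.

```python
import numpy as np


def random_split_indices(n_samples, n_val, n_test):
    """Hypothetical helper (not part of torch_concepts).

    Shuffles the indices 0..n_samples-1 and carves out the splits in the
    documented order: test first, then validation, then training.
    """
    if n_val + n_test > n_samples:
        raise ValueError("val and test sizes exceed the dataset size")

    # Uses NumPy's global random state, so seeding beforehand
    # makes the result repeatable.
    perm = np.random.permutation(n_samples)

    test_idxs = perm[:n_test]                  # 1. test
    val_idxs = perm[n_test:n_test + n_val]     # 2. validation
    train_idxs = perm[n_test + n_val:]         # 3. training (the remainder)
    return train_idxs, val_idxs, test_idxs


# Seeding before each call yields identical splits.
np.random.seed(0)
train_a, val_a, test_a = random_split_indices(1000, n_val=100, n_test=200)
np.random.seed(0)
train_b, val_b, test_b = random_split_indices(1000, n_val=100, n_test=200)

print(len(train_a), len(val_a), len(test_a))
print(np.array_equal(train_a, train_b))
```

Because the three slices come from a single permutation, the splits are disjoint and together cover every index exactly once.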

Parameters:

val_size (Union[int, float], optional) – Size of validation set. If float, represents fraction of dataset. If int, represents absolute number of samples. Defaults to 0.1.
test_size (Union[int, float], optional) – Size of test set. If float, represents fraction of dataset. If int, represents absolute number of samples. Defaults to 0.2.
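The int-versus-float convention can be illustrated with a small helper. This is a sketch under stated assumptions: `resolve_size` is a hypothetical name, and both the truncation of fractional results and the range checks are guesses for illustration, since the rounding rule the library applies is not stated here.

```python
def resolve_size(size, n_samples):
    """Hypothetical helper (not part of torch_concepts).

    Turns a size given as a fraction (float) or an absolute count (int)
    into an absolute number of samples.
    """
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(size, bool):
        raise TypeError("size must be an int or a float")

    if isinstance(size, float):
        if not 0.0 <= size <= 1.0:
            raise ValueError("a float size must lie in [0, 1]")
        # Assumption: fractional counts are truncated toward zero.
        return int(size * n_samples)

    if isinstance(size, int):
        if not 0 <= size <= n_samples:
            raise ValueError("an int size must lie in [0, n_samples]")
        return size

    raise TypeError("size must be an int or a float")


print(resolve_size(0.1, 1000))   # fraction of the dataset
print(resolve_size(50, 1000))    # absolute number of samples
```

Under this convention the type of the argument matters: `1` requests a single sample, whereas `1.0` requests the whole dataset.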

Example

>>> # For a dataset of 1000 samples: 70% train, 10% val, 20% test
>>> splitter = RandomSplitter(val_size=0.1, test_size=0.2)
>>> splitter.fit(dataset)
>>> print(f"Train: {splitter.train_len}, Val: {splitter.val_len}, Test: {splitter.test_len}")
Train: 700, Val: 100, Test: 200

__init__(val_size: int | float = 0.1, test_size: int | float = 0.2)

Initialize the RandomSplitter.

Parameters:

val_size – Size of validation set. If float, represents fraction of dataset. If int, represents absolute number of samples. Defaults to 0.1.
test_size – Size of test set. If float, represents fraction of dataset. If int, represents absolute number of samples. Defaults to 0.2.

Methods

`__init__`([val_size, test_size])	Initialize the RandomSplitter.
`fit`(dataset)	Randomly split the dataset into train/val/test sets.
`reset`()
`set_indices`([train, val, test])
`split`(dataset)

Attributes

`fitted`
`indices`
`test_idxs`
`test_len`
`train_idxs`
`train_len`
`val_idxs`
`val_len`