torch_concepts.data.splitters.random.RandomSplitter
- class RandomSplitter(val_size: int | float = 0.1, test_size: int | float = 0.2)
Random splitting strategy for datasets.
Randomly divides a dataset into train, validation, and test splits. The split is reproducible when NumPy's random seed is set externally before calling fit().
The splitting is done in the following order:
1. Test (if test_size > 0)
2. Validation (if val_size > 0)
3. Training (remaining samples)
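The ordering above, together with the seeding behaviour, can be illustrated with a minimal NumPy sketch. This is not the library's implementation: `sketch_random_split` is a hypothetical helper, and the use of a single permutation drawn from NumPy's global generator is an assumption made for illustration.

```python
import numpy as np


def sketch_random_split(n_samples, n_val, n_test):
    """Hypothetical helper (not part of torch_concepts).

    Assigns shuffled indices to test first, then validation,
    and leaves the remainder for training.
    """
    # Assumption: one permutation from NumPy's *global* RNG, so that
    # np.random.seed() called beforehand controls the outcome.
    perm = np.random.permutation(n_samples)
    test_idxs = perm[:n_test]                 # 1. test
    val_idxs = perm[n_test:n_test + n_val]    # 2. validation
    train_idxs = perm[n_test + n_val:]        # 3. training (remaining samples)
    return train_idxs, val_idxs, test_idxs


# Seeding before each call yields identical splits.
np.random.seed(42)
train_a, val_a, test_a = sketch_random_split(1000, n_val=100, n_test=200)
np.random.seed(42)
train_b, val_b, test_b = sketch_random_split(1000, n_val=100, n_test=200)

print(len(train_a), len(val_a), len(test_a))
print(np.array_equal(test_a, test_b))
```

Because the three slices come from one permutation, the splits in this sketch are disjoint and together cover every index exactly once.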
- Parameters:
val_size (Union[int, float], optional) – Size of validation set. If float, represents fraction of dataset. If int, represents absolute number of samples. Defaults to 0.1.
test_size (Union[int, float], optional) – Size of test set. If float, represents fraction of dataset. If int, represents absolute number of samples. Defaults to 0.2.
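As a rough illustration of the float-versus-int convention described for both parameters, the sketch below converts a size argument into a sample count. `resolve_size` is a hypothetical name, and the truncation of fractional counts and the range checks are assumptions; the library may round or validate differently.

```python
def resolve_size(size, n_samples):
    """Hypothetical helper: turn a size argument into a sample count.

    float -> fraction of the dataset; int -> absolute number of samples.
    """
    if isinstance(size, float):
        if not 0.0 <= size <= 1.0:
            raise ValueError("a float size must lie in [0, 1]")
        # Assumption: fractional counts are truncated toward zero.
        return int(size * n_samples)
    if size < 0 or size > n_samples:
        raise ValueError("an int size must lie in [0, n_samples]")
    return size


n_samples = 1000
n_val = resolve_size(0.1, n_samples)     # fraction of the dataset
n_test = resolve_size(250, n_samples)    # absolute number of samples
n_train = n_samples - n_val - n_test     # whatever remains
print(n_train, n_val, n_test)
```

Under this convention the two parameters can be mixed freely, for example a fractional validation size alongside an absolute test size.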
Example
>>> # 70% train, 10% val, 20% test
>>> splitter = RandomSplitter(val_size=0.1, test_size=0.2)
>>> splitter.fit(dataset)
>>> print(f"Train: {splitter.train_len}, Val: {splitter.val_len}, Test: {splitter.test_len}")
Train: 700, Val: 100, Test: 200
- __init__(val_size: int | float = 0.1, test_size: int | float = 0.2)
Initialize the RandomSplitter.
- Parameters:
val_size – Size of validation set. If float, represents fraction of dataset. If int, represents absolute number of samples. Defaults to 0.1.
test_size – Size of test set. If float, represents fraction of dataset. If int, represents absolute number of samples. Defaults to 0.2.
Methods
__init__([val_size, test_size]) – Initialize the RandomSplitter.
fit(dataset) – Randomly split the dataset into train/val/test sets.
reset()
set_indices([train, val, test])
split(dataset)
Attributes
fitted
indices
test_idxs
test_len
train_idxs
train_len
val_idxs
val_len