Data Utilities

This module provides utility functions for data manipulation and processing.

Summary

Utility Functions

`ensure_list`	Ensure a value is converted to a list.
`files_exist`	Check if all files in a sequence exist.
`parse_tensor`	Convert input data to torch tensor with appropriate format.
`convert_precision`	Convert tensor to specified precision.
`colorize`	Colorize grayscale images based on specified colors.
`affine_transform`	Apply affine transformations to a batch of images.
`transform_images`	Apply a sequence of transformations to a batch of images.
`assign_random_values`	Create a vector of random values for each sample in concepts.
`assign_values_based_on_intervals`	Create a vector of values (e.g. 0 or 1) for each sample in concepts based on the given intervals.
`colorize_and_transform`	Colorize and transform MNIST images based on a specified coloring scheme.

Function Documentation

ensure_list(value: Any) → List

Ensure a value is converted to a list. If the value is iterable (but not a string or dict), it is converted to a list. A string or a non-iterable value is wrapped in a single-element list. A dict raises a TypeError.

Parameters:

value – Any value to convert to a list.

Returns:

The value as a list.

Return type:

List

Raises:

TypeError – If value is a dict.

Examples

>>> ensure_list([1, 2, 3])
[1, 2, 3]
>>> ensure_list((1, 2, 3))
[1, 2, 3]
>>> ensure_list(5)
[5]
>>> ensure_list("hello")
['hello']
>>> ensure_list({'a': 1, 'b': 2})
Traceback (most recent call last):
    ...
TypeError: Cannot convert dict to list. Use list(dict.values()) or list(dict.keys()) explicitly.
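A minimal sketch of the behaviour described above, reconstructed from the description and examples rather than from the module's source. `ensure_list_sketch` is a hypothetical name; the dict error message is copied from the example.

```python
from collections.abc import Iterable
from typing import Any, List


def ensure_list_sketch(value: Any) -> List:
    """Hypothetical re-implementation of the documented ensure_list behaviour."""
    if isinstance(value, dict):
        # Dicts are ambiguous (keys or values?), so refuse rather than guess.
        raise TypeError(
            "Cannot convert dict to list. Use list(dict.values()) "
            "or list(dict.keys()) explicitly."
        )
    if isinstance(value, str):
        # Strings are iterable, but splitting them into characters is rarely wanted.
        return [value]
    if isinstance(value, Iterable):
        return list(value)
    # Scalars and other non-iterables are wrapped.
    return [value]
```

Checking `dict` and `str` before the generic `Iterable` test matters, because both types are themselves iterable.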

files_exist(files: Sequence[str]) → bool

Check if all files in a sequence exist.

Parameters:

files – Sequence of file paths to check.

Returns:

True if all files exist, False otherwise. Returns True for an empty sequence (vacuous truth).

Return type:

bool
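One way to get the documented result, including the empty-sequence case, is Python's `all()`, which returns True for an empty iterable. This is a sketch: `files_exist_sketch` is a hypothetical name, and whether the real function uses `os.path.exists` or a stricter check such as `os.path.isfile` is an assumption.

```python
import os
from typing import Sequence


def files_exist_sketch(files: Sequence[str]) -> bool:
    # all() short-circuits on the first missing path and returns True for an
    # empty sequence, matching the "vacuous truth" behaviour described above.
    # Assumption: os.path.exists (directories also count as existing).
    return all(os.path.exists(path) for path in files)
```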

parse_tensor(data: ndarray | DataFrame | Tensor | list, name: str, precision: int | str) → Tensor

Convert input data to torch tensor with appropriate format.

Supports conversion from NumPy arrays, pandas DataFrames, Python lists, and existing tensors.

Parameters:

data – Input data as a NumPy array, DataFrame, Tensor, or list.
name – Name of the data (for error messages).
precision – Desired numerical precision (16, 32, or 64).

Returns:

Converted tensor with specified precision.

Return type:

Tensor

Raises:

TypeError – If data is not in a supported format.
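The type dispatch described above can be sketched as follows. To keep the example runnable without PyTorch, NumPy stands in for torch: `parse_array_sketch` is hypothetical, returns a NumPy array instead of a `Tensor`, accepts only integer precisions, and has no tensor branch. With PyTorch available, one would add a `torch.Tensor` branch and finish with `torch.as_tensor(...)`.

```python
import numpy as np

# Hypothetical precision table; the real function also accepts strings.
_FLOAT_DTYPES = {16: np.float16, 32: np.float32, 64: np.float64}


def parse_array_sketch(data, name: str, precision: int) -> np.ndarray:
    """NumPy stand-in for parse_tensor: dispatch on input type, then cast."""
    if hasattr(data, "to_numpy"):
        # pandas DataFrame, duck-typed so pandas need not be imported here.
        array = data.to_numpy()
    elif isinstance(data, (np.ndarray, list)):
        array = np.asarray(data)
    else:
        # `name` only serves to make the error message informative.
        raise TypeError(
            f"{name} must be a numpy array, DataFrame, tensor, or list, "
            f"got {type(data).__name__}"
        )
    return array.astype(_FLOAT_DTYPES[precision])
```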

convert_precision(tensor: Tensor, precision: int | str) → Tensor

Convert tensor to specified precision.

Parameters:

tensor – Input tensor.
precision – Target precision: “float16”, “float32”, or “float64”, or the integers 16, 32, or 64.

Returns:

Tensor converted to specified precision.

Return type:

Tensor
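Because `precision` may be given either as an integer or as a dtype name, a natural first step is to normalize both forms to one canonical name. This sketch is an assumption about the approach, not the module's code; `normalize_precision` is hypothetical. In PyTorch the name could then be turned into a dtype with `getattr(torch, name)` and applied with `tensor.to(dtype)`.

```python
from typing import Union

# Canonical names; torch exposes dtypes under these attribute names,
# e.g. getattr(torch, "float32") is torch.float32.
_CANONICAL = {16: "float16", 32: "float32", 64: "float64"}


def normalize_precision(precision: Union[int, str]) -> str:
    """Map 16/32/64 or "float16"/"float32"/"float64" to one canonical name."""
    if isinstance(precision, str):
        if precision in _CANONICAL.values():
            return precision
    elif precision in _CANONICAL:
        return _CANONICAL[precision]
    raise ValueError(f"Unsupported precision: {precision!r}")
```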

colorize(images, colors)

Colorize grayscale images based on specified colors.

Converts grayscale images to RGB by assigning the intensity to one of three color channels (red, green, or blue).

Parameters:

images – Tensor of shape (N, H, W) containing grayscale images.
colors – Tensor of shape (N) containing color labels (0=red, 1=green, 2=blue).

Returns:

Colored images of shape (N, 3, H, W).

Return type:

Tensor

Raises:

AssertionError – If colors contain values other than 0, 1, or 2.
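The channel assignment described above can be written with a single advanced-indexing assignment. The sketch uses NumPy in place of torch so it runs without PyTorch (the same indexing works on tensors). `colorize_sketch` is a hypothetical name.

```python
import numpy as np


def colorize_sketch(images: np.ndarray, colors: np.ndarray) -> np.ndarray:
    """NumPy stand-in for colorize: (N, H, W) grayscale -> (N, 3, H, W) RGB."""
    images = np.asarray(images)
    colors = np.asarray(colors, dtype=np.int64)
    assert np.isin(colors, [0, 1, 2]).all(), "colors must be 0 (red), 1 (green) or 2 (blue)"
    n, h, w = images.shape
    out = np.zeros((n, 3, h, w), dtype=images.dtype)
    # For each sample i, write images[i] into channel colors[i];
    # the other two channels stay zero.
    out[np.arange(n), colors] = images
    return out
```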

affine_transform(images, degrees, scales, batch_size=512)

Apply affine transformations to a batch of images.

Applies rotation and scaling transformations to each image.

Parameters:

images – Tensor of shape (N, H, W) or (N, 3, H, W).
degrees – Tensor of shape (N) containing rotation degrees.
scales – Tensor of shape (N) containing scaling factors.
batch_size – Number of images to process at once (default: 512).

Returns:

Transformed images with same shape as input.

Return type:

Tensor
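The following is a minimal NumPy sketch of per-sample rotation and scaling about the image centre, processed in chunks of `batch_size`. It uses inverse mapping with nearest-neighbour sampling. `affine_sketch` is hypothetical, and several details are assumptions that the real function may handle differently: positive degrees turn the image clockwise as displayed in (row, column) order, and pixels that map outside the source are set to zero.

```python
import numpy as np


def affine_sketch(images, degrees, scales, batch_size=512):
    """Rotate/scale each image about its centre; accepts (N, H, W) or (N, C, H, W)."""
    x = np.asarray(images, dtype=np.float64)
    squeeze = x.ndim == 3
    if squeeze:
        x = x[:, None]                          # (N, 1, H, W)
    n, c, h, w = x.shape
    out = np.zeros_like(x)
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    ys, xs = np.mgrid[0:h, 0:w]
    dx, dy = xs - cx, ys - cy                   # (H, W) offsets from the centre
    degrees = np.asarray(degrees, dtype=np.float64)
    scales = np.asarray(scales, dtype=np.float64)
    for start in range(0, n, batch_size):
        stop = min(start + batch_size, n)
        theta = np.deg2rad(degrees[start:stop])[:, None, None]  # (b, 1, 1)
        s = scales[start:stop][:, None, None]
        cos, sin = np.cos(theta), np.sin(theta)
        # Inverse map each output pixel to its source pixel (undo scale, then rotation).
        src_x = (cos * dx + sin * dy) / s + cx  # (b, H, W)
        src_y = (-sin * dx + cos * dy) / s + cy
        xi = np.rint(src_x).astype(np.int64)
        yi = np.rint(src_y).astype(np.int64)
        valid = (xi >= 0) & (xi < w) & (yi >= 0) & (yi < h)
        xi, yi = np.clip(xi, 0, w - 1), np.clip(yi, 0, h - 1)
        b = np.arange(stop - start)[:, None, None]
        chunk = x[start:stop].transpose(0, 2, 3, 1)  # (b, H, W, C)
        sampled = chunk[b, yi, xi] * valid[..., None]  # zero outside the source
        out[start:stop] = sampled.transpose(0, 3, 1, 2)
    return out[:, 0] if squeeze else out
```

Chunking only bounds memory, since the intermediate coordinate grids have shape (b, H, W). The result is the same for any `batch_size`.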

transform_images(images, transformations, colors=None, degrees=None, scales=None)

Apply a sequence of transformations to a batch of images.

Parameters:

images – Tensor of shape (N, H, W) or (N, 3, H, W).
transformations – List of transformation names (e.g., [‘colorize’, ‘affine’]).
colors – Optional color labels for colorization.
degrees – Optional rotation degrees for affine transform.
scales – Optional scaling factors for affine transform.

Returns:

Transformed images.

Return type:

Tensor
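A name-based pipeline like this is typically a small dispatch table applied in list order. In this sketch, `transform_images_sketch` and its `colorize_fn`/`affine_fn` hooks are hypothetical, injected so the example stays self-contained. Raising a ValueError for unknown names or missing arguments is also an assumption.

```python
from typing import Any, Callable, Optional, Sequence


def transform_images_sketch(
    images: Any,
    transformations: Sequence[str],
    colors: Optional[Any] = None,
    degrees: Optional[Any] = None,
    scales: Optional[Any] = None,
    *,
    colorize_fn: Callable[[Any, Any], Any],
    affine_fn: Callable[[Any, Any, Any], Any],
) -> Any:
    """Apply named steps in order; colorize_fn/affine_fn are hypothetical hooks."""
    # Each entry: (arguments the step needs, how to apply it).
    steps = {
        "colorize": ({"colors": colors}, lambda x: colorize_fn(x, colors)),
        "affine": ({"degrees": degrees, "scales": scales},
                   lambda x: affine_fn(x, degrees, scales)),
    }
    for name in transformations:
        if name not in steps:
            raise ValueError(f"Unknown transformation: {name!r}")
        needed, apply = steps[name]
        missing = [arg for arg, val in needed.items() if val is None]
        if missing:
            raise ValueError(f"{name!r} requires {', '.join(missing)}")
        images = apply(images)
    return images
```

Order matters: running 'colorize' first turns (N, H, W) images into (N, 3, H, W), which the affine step also accepts.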

assign_random_values(concept, random_prob=[0.5, 0.5], values=[0, 1])

Create a vector of random values, one for each sample in concept.

Parameters:

concept – Tensor of shape (N) containing concept values (e.g. digit labels 0-9).
random_prob – List of probabilities for each value.
values – List of output values corresponding to each probability.

Returns:

Tensor of shape (N) containing the final values.

Return type:

Tensor
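A sketch of the sampling, assuming each sample's value is drawn independently of its concept value, with `concept` only fixing N. NumPy's `Generator.choice` with `p=` stands in for whatever the real function uses. `assign_random_values_sketch` and its `seed` parameter are hypothetical.

```python
import numpy as np


def assign_random_values_sketch(concept, random_prob=(0.5, 0.5), values=(0, 1), seed=None):
    """Draw one value per sample; values[k] has probability random_prob[k]."""
    n = len(concept)  # assumption: only the length of concept is used
    rng = np.random.default_rng(seed)
    # random_prob must sum to 1, as Generator.choice requires.
    return rng.choice(np.asarray(values), size=n, p=np.asarray(random_prob, dtype=float))
```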

assign_values_based_on_intervals(concept, intervals, values)

Create a vector of values (e.g. 0 or 1), one for each sample in concept, based on the given intervals. If a concept value belongs to intervals[i], its output value is chosen at random from values[i].

Parameters:

concept – Tensor of shape (N) containing concept values (e.g. digit labels 0-9).
intervals – List of lists; each inner list contains the values defining an interval.
values – List of lists of output values, one list per interval.

Returns:

Tensor of shape (N) containing the final values.

Return type:

Tensor
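A NumPy sketch of the interval lookup. It assumes that an "interval" is the list of concept values it contains (tested by membership), that the first matching interval wins, that output values are integers, and that a concept value outside every interval is an error. None of these points is confirmed by the description above, and `assign_values_based_on_intervals_sketch` and `seed` are hypothetical.

```python
import numpy as np


def assign_values_based_on_intervals_sketch(concept, intervals, values, seed=None):
    concept = np.asarray(concept)
    rng = np.random.default_rng(seed)
    out = np.zeros(concept.shape[0], dtype=np.int64)  # assumption: integer outputs
    assigned = np.zeros(concept.shape[0], dtype=bool)
    for members, choices in zip(intervals, values):
        # Assumption: membership test; earlier intervals take precedence.
        mask = np.isin(concept, members) & ~assigned
        out[mask] = rng.choice(np.asarray(choices), size=int(mask.sum()))
        assigned |= mask
    if not assigned.all():
        raise ValueError("some concept values fall outside every interval")
    return out
```

For example, splitting MNIST digits into [[0, 1, 2, 3, 4], [5, 6, 7, 8, 9]] with values [[0], [1]] yields a deterministic binary label, while a list such as [0, 1] for an interval makes the output for that interval random.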

colorize_and_transform(data, targets, training_percentage=0.8, test_percentage=0.2, training_mode=['random'], test_mode=['random'], training_kwargs=[{}], test_kwargs=[{}])

Colorize and transform MNIST images based on a specified coloring scheme. The coloring scheme is defined separately for training and test data, and can include parameters for coloring, scaling, and rotating images.

Parameters:

data – Tensor of shape (N, 28, 28) containing grayscale MNIST images.
targets – Tensor of shape (N) containing target values (0-9).
training_percentage – Fraction of the data (between 0 and 1) to color for training.
test_percentage – Fraction of the data (between 0 and 1) to color for testing.
training_mode – List of coloring modes for training data. Options are ‘random’ and ‘
test_mode – List of coloring modes for test data. Options are ‘random’ and ‘digits’.
training_kwargs – List of dictionaries containing additional arguments for each training mode.
test_kwargs – List of dictionaries containing additional arguments for each test mode.

Returns:

input – Tensor of shape (N, 3, 28, 28) containing the colorized and/or transformed images.
concepts – Dictionary containing the values of the parameters used for coloring and transformations (e.g., colors, scales, degrees).
targets – Tensor of shape (N) containing target values (0-9).
coloring_mode – List of strings indicating the coloring mode used for each sample (‘training’ or ‘test’).

Return type:

tuple

Note: data and targets are shuffled before applying the coloring scheme.
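The shuffle-then-split step mentioned in the note can be sketched as below. The important detail is that one permutation is applied to both arrays, so every image stays paired with its label. The sketch assumes that the first `training_percentage` of the shuffled samples get the 'training' scheme and the rest get 'test', and it ignores `test_percentage`. `shuffle_and_split_sketch` and `seed` are hypothetical, and NumPy stands in for torch.

```python
import numpy as np


def shuffle_and_split_sketch(data, targets, training_percentage=0.8, seed=None):
    data = np.asarray(data)
    targets = np.asarray(targets)
    rng = np.random.default_rng(seed)
    # One permutation for both arrays keeps every image aligned with its label.
    perm = rng.permutation(len(data))
    data, targets = data[perm], targets[perm]
    # Assumption: the leading fraction of shuffled samples forms the training split.
    n_train = int(round(training_percentage * len(data)))
    coloring_mode = ["training"] * n_train + ["test"] * (len(data) - n_train)
    return data, targets, coloring_mode
```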