Data Utilities

This module provides utility functions for data manipulation and processing.

Summary

Utility Functions

ensure_list

Ensure a value is converted to a list.

files_exist

Check if all files in a sequence exist.

parse_tensor

Convert input data to torch tensor with appropriate format.

convert_precision

Convert tensor to specified precision.

colorize

Colorize grayscale images based on specified colors.

affine_transform

Apply affine transformations to a batch of images.

transform_images

Apply a sequence of transformations to a batch of images.

assign_random_values

Create a vector of random values for each sample in concepts.

assign_values_based_on_intervals

Create a vector of values (0 or 1) for each sample in concepts based on intervals given.

colorize_and_transform

Colorize and transform MNIST images based on specified coloring scheme.

Function Documentation

ensure_list(value: Any) -> List

Ensure a value is converted to a list. If the value is iterable (but not a string or dict), it is converted to a list. A non-iterable value or a string is wrapped in a single-element list. A dict raises TypeError, as shown in the examples.

Parameters:

value – Any value to convert to list.

Returns:

The value as a list.

Return type:

List

Examples

>>> ensure_list([1, 2, 3])
[1, 2, 3]
>>> ensure_list((1, 2, 3))
[1, 2, 3]
>>> ensure_list(5)
[5]
>>> ensure_list("hello")
['hello']
>>> ensure_list({'a': 1, 'b': 2})
Traceback (most recent call last):
    ...
TypeError: Cannot convert dict to list. Use list(dict.values()) or list(dict.keys()) explicitly.

files_exist(files: Sequence[str]) -> bool

Check if all files in a sequence exist.

Parameters:

files – Sequence of file paths to check.

Returns:

True if all files exist, False otherwise.

Returns True for empty sequences (vacuous truth).

Return type:

bool
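As a rough illustration of the documented behavior, the check can be sketched with the standard library alone. The name files_exist_sketch is hypothetical, and the use of os.path.exists (rather than, for example, os.path.isfile) is an assumption; the actual implementation may differ.

```python
import os
from typing import Sequence


def files_exist_sketch(files: Sequence[str]) -> bool:
    """Hypothetical stand-in for files_exist: True only if every path exists."""
    # all() over an empty iterable is True, which matches the documented
    # "vacuous truth" behavior for empty sequences.
    return all(os.path.exists(path) for path in files)


print(files_exist_sketch([]))  # True
```

Because all() short-circuits, the sketch stops at the first missing path.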

parse_tensor(data: ndarray | DataFrame | Tensor | list, name: str, precision: int | str) -> Tensor

Convert input data to torch tensor with appropriate format.

Supports conversion from numpy arrays, pandas DataFrames, lists, or existing tensors.

Parameters:
  • data – Input data as a numpy array, DataFrame, Tensor, or list.

  • name – Name of the data (for error messages).

  • precision – Desired numerical precision (16, 32, or 64).

Returns:

Converted tensor with specified precision.

Return type:

Tensor

Raises:

TypeError – If data is not in a supported format.

convert_precision(tensor: Tensor, precision: int | str) -> Tensor

Convert tensor to specified precision.

Parameters:
  • tensor – Input tensor.

  • precision – Target precision, given either as a string ("float16", "float32", or "float64") or as an integer (16, 32, or 64).

Returns:

Tensor converted to specified precision.

Return type:

Tensor
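One way to picture the precision handling is a lookup table from the documented values to torch dtypes. This is only a sketch: the name convert_precision_sketch, the table _DTYPES, and the choice of ValueError for unsupported values are assumptions, not the module's actual code.

```python
import torch

# Assumed mapping from the documented precision values to torch dtypes.
_DTYPES = {
    16: torch.float16, "float16": torch.float16,
    32: torch.float32, "float32": torch.float32,
    64: torch.float64, "float64": torch.float64,
}


def convert_precision_sketch(tensor: torch.Tensor, precision) -> torch.Tensor:
    """Hypothetical stand-in for convert_precision."""
    if precision not in _DTYPES:
        # The real function's error handling is not documented; ValueError is a guess.
        raise ValueError(f"Unsupported precision: {precision!r}")
    return tensor.to(_DTYPES[precision])


x = torch.tensor([1.0, 2.0])
print(convert_precision_sketch(x, 16).dtype)         # torch.float16
print(convert_precision_sketch(x, "float64").dtype)  # torch.float64
```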

colorize(images, colors)

Colorize grayscale images based on specified colors.

Converts grayscale images to RGB by assigning the intensity to one of three color channels (red, green, or blue).

Parameters:
  • images – Tensor of shape (N, H, W) containing grayscale images.

  • colors – Tensor of shape (N) containing color labels (0=red, 1=green, 2=blue).

Returns:

Colored images of shape (N, 3, H, W).

Return type:

Tensor

Raises:

AssertionError – If colors contain values other than 0, 1, or 2.
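The channel assignment described above can be sketched as follows. The name colorize_sketch is hypothetical and the indexing approach is an assumption; only the input and output shapes and the 0/1/2 label convention come from the documentation.

```python
import torch


def colorize_sketch(images: torch.Tensor, colors: torch.Tensor) -> torch.Tensor:
    """Hypothetical stand-in for colorize: (N, H, W) -> (N, 3, H, W)."""
    colors = colors.long()
    assert torch.all((colors >= 0) & (colors <= 2)), "colors must be 0, 1, or 2"
    n, h, w = images.shape
    colored = torch.zeros((n, 3, h, w), dtype=images.dtype, device=images.device)
    # For sample i, write the grayscale intensities into channel colors[i];
    # the other two channels stay zero.
    sample_idx = torch.arange(n, device=images.device)
    colored[sample_idx, colors] = images
    return colored


images = torch.ones(3, 2, 2)
colored = colorize_sketch(images, torch.tensor([0, 1, 2]))
print(colored.shape)  # torch.Size([3, 3, 2, 2])
```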

affine_transform(images, degrees, scales, batch_size=512)

Apply affine transformations to a batch of images.

Applies rotation and scaling transformations to each image.

Parameters:
  • images – Tensor of shape (N, H, W) or (N, 3, H, W).

  • degrees – Tensor of shape (N) containing rotation degrees.

  • scales – Tensor of shape (N) containing scaling factors.

  • batch_size – Number of images to process at once (default: 512).

Returns:

Transformed images with same shape as input.

Return type:

Tensor
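A per-sample rotation and scaling, processed in chunks of batch_size, might look like the sketch below. It uses torch.nn.functional.affine_grid and grid_sample, which is an assumption about the technique; the documented function may rely on a different backend. The name affine_transform_sketch, the rotation sign convention, bilinear interpolation, zero padding, and the cast to float32 are likewise assumptions.

```python
import torch
import torch.nn.functional as F


def affine_transform_sketch(images, degrees, scales, batch_size=512):
    """Hypothetical stand-in for affine_transform (rotation + scaling about the center)."""
    grayscale = images.dim() == 3
    x = images.unsqueeze(1) if grayscale else images  # (N, C, H, W)
    x = x.float()
    outputs = []
    for start in range(0, x.shape[0], batch_size):
        chunk = x[start:start + batch_size]
        rad = torch.deg2rad(degrees[start:start + batch_size].float())
        # affine_grid maps output coordinates to input coordinates, so
        # enlarging by a factor s means sampling with a factor 1 / s.
        inv = 1.0 / scales[start:start + batch_size].float()
        cos, sin = torch.cos(rad) * inv, torch.sin(rad) * inv
        zeros = torch.zeros_like(cos)
        theta = torch.stack(
            [torch.stack([cos, -sin, zeros], dim=1),
             torch.stack([sin, cos, zeros], dim=1)],
            dim=1,
        )  # (B, 2, 3)
        grid = F.affine_grid(theta, chunk.shape, align_corners=False)
        outputs.append(F.grid_sample(chunk, grid, align_corners=False))
    result = torch.cat(outputs)
    return result.squeeze(1) if grayscale else result


images = torch.rand(4, 28, 28)
rotated = affine_transform_sketch(images, torch.tensor([0., 15., 30., 45.]), torch.ones(4))
print(rotated.shape)  # torch.Size([4, 28, 28])
```

Chunking keeps peak memory bounded, since the sampling grid holds two coordinates per output pixel for every image in the chunk.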

transform_images(images, transformations, colors=None, degrees=None, scales=None)

Apply a sequence of transformations to a batch of images.

Parameters:
  • images – Tensor of shape [N, H, W] or [N, 3, H, W].

  • transformations – List of transformation names (e.g., ['colorize', 'affine']).

  • colors – Optional color labels for colorization.

  • degrees – Optional rotation degrees for affine transform.

  • scales – Optional scaling factors for affine transform.

Returns:

Transformed images.

Return type:

Tensor

assign_random_values(concept, random_prob=[0.5, 0.5], values=[0, 1])

Create a vector of random values for each sample in concept.

Parameters:
  • concept – Tensor of shape (N) containing concept values (e.g. digit labels 0-9).

  • random_prob – List of probabilities for each value.

  • values – List of output values corresponding to each probability.

Returns:

Tensor of shape (N) containing final values.

Return type:

Tensor
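The sampling step can be sketched with torch.multinomial, which draws indices according to the given probabilities. The name assign_random_values_sketch is hypothetical, and whether the real function samples this way is an assumption.

```python
import torch


def assign_random_values_sketch(concept, random_prob=(0.5, 0.5), values=(0, 1)):
    """Hypothetical stand-in for assign_random_values."""
    probs = torch.tensor(random_prob, dtype=torch.float)
    # Draw one index into `values` per sample; only the number of samples
    # in `concept` is used, not the concept values themselves.
    idx = torch.multinomial(probs, num_samples=concept.shape[0], replacement=True)
    return torch.tensor(values)[idx]


digits = torch.tensor([3, 1, 4, 1, 5, 9])
print(assign_random_values_sketch(digits).shape)  # torch.Size([6])
```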

assign_values_based_on_intervals(concept, intervals, values)

Create a vector of values (0 or 1) for each sample in concept based on the given intervals. If a concept value belongs to intervals[i], it gets an output value chosen at random from values[i].

Parameters:
  • concept – Tensor of shape (N) containing concept values (e.g. digit labels 0-9).

  • intervals – List of lists; each inner list contains the values defining an interval.

  • values – List of lists of output values corresponding to each interval.

Returns:

Tensor of shape (N) containing final values.

Return type:

Tensor
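The interval lookup can be sketched as a loop over (interval, choices) pairs. The name assign_values_based_on_intervals_sketch is hypothetical, and the uniform choice among values[i] as well as the default of 0 for concept values that fall in no interval are assumptions.

```python
import torch


def assign_values_based_on_intervals_sketch(concept, intervals, values):
    """Hypothetical stand-in for assign_values_based_on_intervals."""
    outputs = torch.zeros_like(concept)
    for interval, choices in zip(intervals, values):
        # Mark the samples whose concept value is listed in this interval.
        members = torch.tensor(interval, dtype=concept.dtype)
        mask = (concept.unsqueeze(1) == members).any(dim=1)
        # Pick one of the allowed output values uniformly for each marked sample.
        choices_t = torch.tensor(choices, dtype=concept.dtype)
        picks = torch.randint(len(choices), (int(mask.sum()),))
        outputs[mask] = choices_t[picks]
    return outputs


digits = torch.arange(10)
out = assign_values_based_on_intervals_sketch(
    digits, intervals=[[0, 1, 2, 3, 4], [5, 6, 7, 8, 9]], values=[[0], [1]]
)
print(out.tolist())  # [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
```

With single-element lists in values the mapping is deterministic; longer inner lists introduce the random choice described above.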

colorize_and_transform(data, targets, training_percentage=0.8, test_percentage=0.2, training_mode=['random'], test_mode=['random'], training_kwargs=[{}], test_kwargs=[{}])

Colorize and transform MNIST images based on specified coloring scheme.

The coloring scheme is defined separately for training and test data. It can contain parameters for coloring, scaling, and rotating images.

Parameters:
  • data – Tensor of shape (N, 28, 28) containing grayscale MNIST images.

  • targets – Tensor of shape (N) containing target values (0-9).

  • training_percentage – Fraction of the data to color for training (default: 0.8).

  • test_percentage – Fraction of the data to color for testing (default: 0.2).

  • training_mode – List of coloring modes for training data. Options are ‘random’ and ‘

  • test_mode – List of coloring modes for test data. Options are ‘random’ and ‘digits’.

  • training_kwargs – List of dictionaries containing additional arguments for each training mode.

  • test_kwargs – List of dictionaries containing additional arguments for each test mode.

Returns:

Four values, in order:
  • input – Tensor of shape (N, 3, 28, 28) containing colorized and/or transformed images.

  • concepts – Dictionary containing values of the parameters used for coloring and transformations (e.g., colors, scales, degrees).

  • targets – Tensor of shape (N) containing target values (0-9).

  • coloring_mode – List of strings indicating the coloring mode used for each sample (‘training’ or ‘test’).

Note: data and targets are shuffled before applying the coloring scheme.
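The shuffle-then-split step mentioned in the note could be sketched as below. This covers only the partitioning into training and test portions, not the coloring or transformation itself. The name shuffle_and_split_sketch and the use of int() truncation when computing the portion sizes are assumptions.

```python
import torch


def shuffle_and_split_sketch(data, targets, training_percentage=0.8, test_percentage=0.2):
    """Hypothetical sketch of the shuffle and train/test labeling step."""
    n = data.shape[0]
    # Apply the same random permutation to data and targets so pairs stay aligned.
    perm = torch.randperm(n)
    data, targets = data[perm], targets[perm]
    n_train = int(n * training_percentage)
    n_test = int(n * test_percentage)
    # One label per kept sample, mirroring the documented coloring_mode output.
    coloring_mode = ["training"] * n_train + ["test"] * n_test
    kept = n_train + n_test
    return data[:kept], targets[:kept], coloring_mode


data = torch.zeros(10, 28, 28)
targets = torch.arange(10)
_, shuffled_targets, modes = shuffle_and_split_sketch(data, targets)
print(modes.count("training"), modes.count("test"))  # 8 2
```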