torch_concepts.data.PendulumDataset

class PendulumDataset(root: str | None = None, n_theta: int = 100, n_phi: int = 1000, seed: int = 42, concept_subset: list | None = None, label_descriptions: dict | None = None)[source]

Procedurally generated pendulum scene dataset for regression.

Each sample is a rendered image of a pendulum with a light source casting a shadow. The concepts are the pendulum angle (theta) and the light angle (phi), both continuous. The regression task is to predict the x-coordinate of the pendulum ball (pendulum_x), which is stored alongside the two angles in the concepts tensor.
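To make the relationship between the concepts and the regression target concrete, here is a toy geometric sketch. It is not the library's rendering code: the pendulum length, pivot position, angle conventions (radians, measured from the vertical), and both function names are assumptions made for illustration only.

```python
import math


def pendulum_ball_x(theta, length=1.0, pivot_x=0.0):
    """x-coordinate of the ball for a pendulum at `theta` radians from vertical.

    `length` and `pivot_x` are hypothetical values, not taken from the library.
    """
    return pivot_x + length * math.sin(theta)


def shadow_x(theta, phi, length=1.0, pivot_y=1.5):
    """Ground position (y = 0) of the ball's shadow in this toy model.

    The light ray passes through the ball at `phi` radians from vertical,
    so the shadow is shifted horizontally by ball_height * tan(phi).
    """
    ball_x = pendulum_ball_x(theta, length)
    ball_y = pivot_y - length * math.cos(theta)
    return ball_x + ball_y * math.tan(phi)
```

In this simplified model the target depends only on theta, while phi changes the shadow but not the ball position, which is what makes the light angle a useful distractor concept.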

Parameters:
  • root (str, optional) – Root directory to store/load the dataset. If None, defaults to './data/pendulum'.

  • n_theta (int, optional) – Number of theta angle steps for generation. Default: 100

  • n_phi (int, optional) – Number of phi angle steps for generation. Default: 1000

  • seed (int, optional) – Random seed for reproducibility. Default: 42

  • concept_subset (list of str, optional) – Subset of concept names to use. Default: None (all concepts).
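The page does not say how n_theta and n_phi are combined. One plausible reading is a full grid of angle pairs, in which case the defaults would yield 100 × 1000 = 100,000 samples. The sketch below illustrates that reading only; the `angle_grid` helper and the angle ranges are hypothetical and do not come from the library.

```python
import itertools
import math


def angle_grid(n_theta, n_phi,
               theta_range=(-math.pi / 4, math.pi / 4),
               phi_range=(-math.pi / 4, math.pi / 4)):
    """Return every (theta, phi) pair on an evenly spaced grid.

    The ranges are assumptions chosen for illustration.
    """
    def linspace(lo, hi, n):
        if n == 1:
            return [lo]
        step = (hi - lo) / (n - 1)
        return [lo + i * step for i in range(n)]

    thetas = linspace(*theta_range, n_theta)
    phis = linspace(*phi_range, n_phi)
    return list(itertools.product(thetas, phis))


pairs = angle_grid(n_theta=10, n_phi=10)
print(len(pairs))  # 100
```

Small values such as n_theta=10 and n_phi=10 keep generation quick when experimenting.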

input_data

List of image filenames (images are loaded on the fly).

Type: list

concepts

Tensor of shape (n_samples, 3) containing [theta, phi, pendulum_x].

Type: torch.Tensor

Examples

>>> from torch_concepts.data import PendulumDataset
>>> dataset = PendulumDataset(root='./data/pendulum', n_theta=10, n_phi=10)
>>> sample = dataset[0]
>>> x = sample['inputs']['x']  # image tensor (C, H, W)
>>> c = sample['concepts']['c']  # [theta, phi, pendulum_x]
__init__(root: str | None = None, n_theta: int = 100, n_phi: int = 1000, seed: int = 42, concept_subset: list | None = None, label_descriptions: dict | None = None)[source]

Methods

__init__([root, n_theta, n_phi, seed, ...])

add_exogenous(name, value[, convert_precision])

add_scaler(key, scaler)

Add a scaler for preprocessing a specific tensor.

build()

Generate pendulum images and save metadata to disk.

collate(samples)

Collate samples into a batch, re-annotating the ground-truth concepts.

download()

This dataset is procedurally generated, so there is nothing to download.

load()

Loads the raw dataset and preprocesses the data.

load_raw()

Loads the raw dataset without any preprocessing.

maybe_build()

maybe_download()

remove_exogenous(name)

set_concepts(concepts)

Set concept annotations for the dataset.

set_graph(graph)

Set the adjacency matrix of the causal graph between concepts as a pandas DataFrame.
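As an illustration of the kind of object set_graph expects, the sketch below builds an adjacency matrix as a pandas DataFrame. The concept names are taken from the concepts tensor description and may not match the dataset's actual concept_names; the row-is-parent, column-is-child convention and the single theta → pendulum_x edge are assumptions, not something this page specifies.

```python
import pandas as pd

# Assumed concept names; check dataset.concept_names for the real ones.
names = ["theta", "phi", "pendulum_x"]

# Start from an empty graph, then add one directed edge.
graph = pd.DataFrame(0, index=names, columns=names)
graph.loc["theta", "pendulum_x"] = 1  # theta influences the ball's x-coordinate

print(graph)
```

A DataFrame built this way could then be passed as `dataset.set_graph(graph)`, provided its labels agree with the dataset's concept names.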

Attributes

annotations

Annotations for the concepts in the dataset.

concept_names

List of concept names in the dataset.

exogenous

Mapping of dataset's exogenous variables.

graph

Adjacency matrix of the causal graph between concepts.

has_concepts

Whether the dataset has concept annotations.

has_exogenous

Whether the dataset has exogenous information.

n_concepts

Number of concepts in the dataset.

n_exogenous

Number of exogenous variables in the dataset.

n_features

Shape of features in dataset's input (excluding number of samples).

n_samples

Number of samples in the dataset.

processed_filenames

The list of processed filenames in the self.root_dir folder that must be present in order to skip build().

processed_paths

The absolute paths of the processed files that must be present in order to skip building.

raw_filenames

The list of raw filenames in the self.root_dir folder that must be present in order to skip download().

raw_paths

The absolute paths of the raw files that must be present in order to skip downloading.

root_dir

shape

Shape of the input tensor.
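The processed_paths and raw_paths descriptions above imply a simple caching rule: the build (or download) step is skipped only when every required file already exists. A minimal sketch of that check follows. The `needs_build` helper and the `metadata.csv` filename are hypothetical, and the library's actual maybe_build() logic may differ.

```python
import os
import tempfile


def needs_build(processed_paths):
    """Return True unless every required processed file is already on disk."""
    return not all(os.path.exists(path) for path in processed_paths)


with tempfile.TemporaryDirectory() as root:
    # Hypothetical processed file; the real filenames are not listed on this page.
    paths = [os.path.join(root, "metadata.csv")]

    before = needs_build(paths)   # file missing, so a build is required
    open(paths[0], "w").close()   # simulate the output of build()
    after = needs_build(paths)    # file present, so the build can be skipped
```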