torch_concepts.data.AWA2Dataset

class AWA2Dataset(root: str | None = None, image_size: int = 224, concept_subset: list | None = None, label_descriptions: Mapping | None = None)[source]

Dataset class for Animals with Attributes 2 (AwA2).

AwA2 pairs 37,322 animal images across 50 classes with 85 binary semantic attributes (colour, shape, behaviour, habitat, …).

The concept vector per sample contains:

  • columns 0-84: 85 binary semantic attributes (cardinality 1 each)

  • column 85: animal class index 0-49 (cardinality 50)

Parameters:
  • root (str, optional) – Root directory where the dataset is stored. Defaults to ./data/AWA2.

  • image_size (int, optional) – Side length (px) to resize images to. Default: 224.

  • concept_subset (list of str, optional) – Subset of concept names to retain. None keeps all 86.

  • label_descriptions (dict, optional) – Mapping from concept name to human-readable description.

__init__(root: str | None = None, image_size: int = 224, concept_subset: list | None = None, label_descriptions: Mapping | None = None)[source]

Methods

__init__([root, image_size, concept_subset, ...])

add_exogenous(name, value[, convert_precision])

add_scaler(key, scaler)

Add a scaler for preprocessing a specific tensor.

build()

Process raw AwA2 files and save cached dataset artefacts.

collate(samples)

Collate samples into a batch, re-annotating the ground-truth concepts.

download()

Download raw AwA2 data from official sources.

load()

Loads raw dataset and preprocess data.

load_raw()

Load processed artefacts from disk.

maybe_build()

maybe_download()

remove_exogenous(name)

set_concepts(concepts)

Set concept annotations for the dataset.

set_graph(graph)

Set the adjacency matrix of the causal graph between concepts as a pandas DataFrame.

Attributes

annotations

Annotations for the concepts in the dataset.

concept_names

List of concept names in the dataset.

exogenous

Mapping of dataset's exogenous variables.

graph

Adjacency matrix of the causal graph between concepts.

has_concepts

Whether the dataset has concept annotations.

has_exogenous

Whether the dataset has exogenous information.

n_concepts

Number of concepts in the dataset.

n_exogenous

Number of exogenous variables in the dataset.

n_features

Shape of features in dataset's input (excluding number of samples).

n_samples

Number of samples in the dataset.

processed_filenames

The list of processed filenames in the self.root_dir folder that must be present in order to skip build().

processed_paths

The absolute paths of the processed files that must be present in order to skip building.

raw_filenames

The list of raw filenames in the self.root_dir folder that must be present in order to skip download().

raw_paths

The absolute paths of the raw files that must be present in order to skip downloading.

root_dir

shape

Shape of the input tensor.