torch_concepts.data.datasets.bnlearn.BnLearnDataset

class BnLearnDataset(name: str, root: str | None = None, seed: int = 42, n_gen: int = 10000, concept_subset: list | None = None, label_descriptions: dict | None = None, autoencoder_kwargs: dict | None = None)

Dataset class for networks from bnlearn, such as the Asia network.

The Asia network represents a small expert system that models the relationships between traveling to Asia, smoking habits, and various lung diseases.
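For reference, the Asia network has eight binary variables. The sketch below encodes its directed edges as an adjacency matrix, in the same spirit as the graph attribute. The variable names follow bnlearn's asia network; the row = parent, column = child convention and the use of plain Python lists (the library documents the graph as a pandas DataFrame) are assumptions of this sketch.

```python
# Variable names as used in bnlearn's "asia" network.
NODES = ["asia", "tub", "smoke", "lung", "bronc", "either", "xray", "dysp"]

# Directed edges (parent, child) of the Asia network.
EDGES = [
    ("asia", "tub"),
    ("smoke", "lung"),
    ("smoke", "bronc"),
    ("tub", "either"),
    ("lung", "either"),
    ("either", "xray"),
    ("either", "dysp"),
    ("bronc", "dysp"),
]


def adjacency_matrix(nodes, edges):
    """Return a 0/1 matrix where adj[i][j] == 1 iff nodes[i] -> nodes[j] (assumed convention)."""
    index = {name: i for i, name in enumerate(nodes)}
    adj = [[0] * len(nodes) for _ in nodes]
    for parent, child in edges:
        adj[index[parent]][index[child]] = 1
    return adj


def parents(nodes, adj, name):
    """List the parents of `name` by scanning its column of the matrix."""
    j = nodes.index(name)
    return [nodes[i] for i in range(len(nodes)) if adj[i][j]]


adj = adjacency_matrix(NODES, EDGES)
for row_name, row in zip(NODES, adj):
    print(f"{row_name:>7}", row)
print(parents(NODES, adj, "dysp"))  # ['bronc', 'either']
```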

__init__(name: str, root: str | None = None, seed: int = 42, n_gen: int = 10000, concept_subset: list | None = None, label_descriptions: dict | None = None, autoencoder_kwargs: dict | None = None)
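The n_gen and seed parameters suggest that samples are drawn from the network rather than read from a fixed file. Assuming that n_gen is the number of samples and seed seeds the random generator, the sketch below shows forward (ancestral) sampling from the Asia network: nodes are visited in topological order and each is drawn given its already-sampled parents. The conditional probabilities are the values we believe bnlearn publishes for asia (1 = "yes"); treat them as illustrative and check them against the repository. The library's actual sampler is not documented here.

```python
import random

# (node, parents, {parent values: P(node = 1)}), listed in topological order.
# Probabilities believed to match bnlearn's asia network; verify before relying on them.
NETWORK = [
    ("asia", (), {(): 0.01}),
    ("tub", ("asia",), {(1,): 0.05, (0,): 0.01}),
    ("smoke", (), {(): 0.5}),
    ("lung", ("smoke",), {(1,): 0.1, (0,): 0.01}),
    ("bronc", ("smoke",), {(1,): 0.6, (0,): 0.3}),
    # "either" is a deterministic OR of tub and lung.
    ("either", ("tub", "lung"), {(1, 1): 1.0, (1, 0): 1.0, (0, 1): 1.0, (0, 0): 0.0}),
    ("xray", ("either",), {(1,): 0.98, (0,): 0.05}),
    ("dysp", ("bronc", "either"), {(1, 1): 0.9, (1, 0): 0.8, (0, 1): 0.7, (0, 0): 0.1}),
]


def sample(n_gen: int, seed: int) -> list[dict[str, int]]:
    """Draw n_gen joint samples by visiting nodes in topological order."""
    rng = random.Random(seed)  # local generator, so `seed` fully determines the output
    rows = []
    for _ in range(n_gen):
        row: dict[str, int] = {}
        for node, pa, cpt in NETWORK:
            # Look up P(node = 1) given the already-sampled parent values.
            p_one = cpt[tuple(row[p] for p in pa)]
            row[node] = int(rng.random() < p_one)
        rows.append(row)
    return rows


data = sample(n_gen=10_000, seed=42)
same = sample(n_gen=10_000, seed=42)
print(data == same)  # True: same seed, same samples
smokers = sum(r["smoke"] for r in data) / len(data)
print(f"fraction of smokers: {smokers:.3f}")  # close to the prior 0.5
```

Using a dedicated random.Random instance rather than the global generator keeps the samples reproducible even if other code draws random numbers in between.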

Methods

__init__(name[, root, seed, n_gen, ...])

add_exogenous(name, value[, convert_precision])

add_scaler(key, scaler)

Add a scaler for preprocessing a specific tensor.

build()

Build the dataset from the raw data into the self.root_dir folder.

download()

Download the dataset's files to the self.root_dir folder.

load()

Load the raw dataset and preprocess the data.

load_raw()

Load the raw dataset without any data preprocessing.

maybe_build()

maybe_download()

maybe_reduce_annotations(annotations[, ...])

Set concepts and labels for the dataset.

Parameters:
annotations: Annotations object for all concepts.
concept_names_subset: List of strings naming the subset of concepts to use. If None, all concepts are used.

remove_exogenous(name)

set_concepts(concepts)

Set concept annotations for the dataset.

set_graph(graph)

Set the adjacency matrix of the causal graph between concepts as a pandas DataFrame.
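maybe_reduce_annotations and the concept_subset constructor argument restrict the dataset to a subset of concepts. The sketch below shows one way such a subset could be applied consistently to both the concept samples and the adjacency matrix. The function reduce_to_subset and its list-based inputs are hypothetical and not part of torch_concepts.

```python
from typing import Optional, Sequence


def reduce_to_subset(
    concept_names: Sequence[str],
    concepts: Sequence[Sequence[int]],
    adj: Sequence[Sequence[int]],
    subset: Optional[Sequence[str]] = None,
):
    """Hypothetical helper: keep only the concepts in `subset` (all of them if None)."""
    if subset is None:
        return list(concept_names), [list(r) for r in concepts], [list(r) for r in adj]
    missing = set(subset) - set(concept_names)
    if missing:
        raise ValueError(f"unknown concepts: {sorted(missing)}")
    keep = [concept_names.index(name) for name in subset]
    # Select the kept columns of every sample.
    new_concepts = [[row[i] for i in keep] for row in concepts]
    # Induced subgraph: only edges between kept concepts survive.
    new_adj = [[adj[i][j] for j in keep] for i in keep]
    return list(subset), new_concepts, new_adj


names = ["smoke", "lung", "bronc"]
samples = [[1, 0, 1], [0, 0, 0], [1, 1, 1]]
graph = [[0, 1, 1], [0, 0, 0], [0, 0, 0]]  # smoke -> lung, smoke -> bronc
print(reduce_to_subset(names, samples, graph, ["smoke", "bronc"]))
# (['smoke', 'bronc'], [[1, 1], [0, 0], [1, 1]], [[0, 1], [0, 0]])
```

Taking the induced subgraph drops dependencies that pass through removed concepts (for example, keeping tub and xray but dropping either leaves no edge between them); whether the library handles the graph the same way is not stated in this reference.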

Attributes

annotations

Annotations for the concepts in the dataset.

concept_names

List of concept names in the dataset.

exogenous

Mapping of the dataset's exogenous variables.

graph

Adjacency matrix of the causal graph between concepts.

has_concepts

Whether the dataset has concept annotations.

has_exogenous

Whether the dataset has exogenous information.

n_concepts

Number of concepts in the dataset.

n_exogenous

Number of exogenous variables in the dataset.

n_features

Shape of the dataset's input features (excluding the number of samples).

n_samples

Number of samples in the dataset.

processed_filenames

List of processed filenames that will be created during the build step.

processed_paths

The absolute paths of the processed files that must be present in order to skip building.

raw_filenames

List of raw filenames that need to be present in the raw directory for the dataset to be considered present.

raw_paths

The absolute paths of the raw files that must be present in order to skip downloading.

root_dir

shape

Shape of the input tensor.
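The raw_paths and processed_paths attributes, together with maybe_download and maybe_build, describe a caching pattern: download only when raw files are missing, and build only when processed files are missing. The sketch below illustrates that pattern with a hypothetical CachedDataset class; it is not the torch_concepts implementation, and the file names and the bodies of download and build are placeholders.

```python
import tempfile
from pathlib import Path


class CachedDataset:
    """Hypothetical illustration of a download/build cache; not torch_concepts code."""

    raw_filenames = ["raw.csv"]
    processed_filenames = ["processed.csv"]

    def __init__(self, root: str):
        self.root_dir = Path(root)
        self.calls = []  # records which expensive steps actually ran

    @property
    def raw_paths(self):
        return [self.root_dir / "raw" / f for f in self.raw_filenames]

    @property
    def processed_paths(self):
        return [self.root_dir / "processed" / f for f in self.processed_filenames]

    def download(self):
        # Placeholder: write a tiny raw file instead of fetching anything.
        self.calls.append("download")
        for path in self.raw_paths:
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_text("a,b\n0,1\n")

    def build(self):
        # Placeholder: "process" each raw file by copying it.
        self.calls.append("build")
        for raw, out in zip(self.raw_paths, self.processed_paths):
            out.parent.mkdir(parents=True, exist_ok=True)
            out.write_text(raw.read_text())

    def maybe_download(self):
        # Skip the download when every raw file is already present.
        if not all(p.exists() for p in self.raw_paths):
            self.download()

    def maybe_build(self):
        # Skip building when every processed file is already present.
        if not all(p.exists() for p in self.processed_paths):
            self.maybe_download()
            self.build()


with tempfile.TemporaryDirectory() as tmp:
    first = CachedDataset(tmp)
    first.maybe_build()
    second = CachedDataset(tmp)  # same root, files already on disk
    second.maybe_build()
    print(first.calls, second.calls)  # ['download', 'build'] []
```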