Interpretable Layers and Interventions

The Low-Level API provides building blocks for creating concept-based models from interpretable layers and for performing interventions, all through a PyTorch-like interface.

Design Principles

Overview of Data Representations

In PyC, we distinguish between three types of data representations:

  • Input: High-dimensional representations where exogenous and endogenous information is entangled

  • Exogenous: Representations that are direct causes of endogenous variables

  • Endogenous: Representations of observable quantities of interest

Layer Types

In PyC you will find three types of layers whose interfaces reflect the distinction between data representations:

  • Encoder layers: Never take endogenous variables as input

  • Predictor layers: Must take a set of endogenous variables as input

  • Special layers: Perform operations like memory selection or graph learning

Layer Naming Standard

To make layer types easy to identify, PyC names layers according to a consistent standard. Each layer name follows the format:

<LayerType><InputType><OutputType>

where:

  • LayerType: describes the type of layer (e.g., Linear, HyperLinear, Selector, Transformer)

  • InputType and OutputType: describe the types of data representations the layer takes as input and produces as output. PyC uses the following abbreviations:

    • Z: Input

    • U: Exogenous

    • C: Endogenous

For instance, a layer named LinearZC is a linear layer that takes an Input representation and produces an Endogenous representation. Since it takes no endogenous variables as input, it is an encoder layer.

pyc.nn.LinearZC(in_features=10, out_features=3)

As another example, a layer named HyperLinearCUC is a hyper-network layer that takes both Endogenous and Exogenous representations as input and produces an Endogenous representation. Since it takes endogenous variables as input, it is a predictor layer.

pyc.nn.HyperLinearCUC(
   in_features_endogenous=10,
   in_features_exogenous=7,
   embedding_size=24,
   out_features=3
)

As a final example, graph learners are special layers that learn relationships between concepts. They do not follow the encoder/predictor naming convention, but their names still make their purpose clear.

wanda = pyc.nn.WANDAGraphLearner(
   ['c1', 'c2', 'c3'],
   ['task A', 'task B', 'task C']
)
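Because layer names follow <LayerType><InputType><OutputType>, the role of a layer can be read off mechanically. The helper below, parse_layer_name, is hypothetical (it is not part of PyC) and assumes the output type is the final letter of the name and the input types are the run of Z/U/C letters just before it. It applies only the encoder/predictor rule from the layer-type list, so it cannot tell special layers apart from encoders (a selector such as SelectorZU comes out as "encoder"), and it rejects names such as WANDAGraphLearner that do not follow the convention.

```python
import re
from typing import NamedTuple, Tuple

# Abbreviations used by the PyC naming standard.
REPRESENTATIONS = {"Z": "Input", "U": "Exogenous", "C": "Endogenous"}


class LayerName(NamedTuple):
    layer_type: str
    inputs: Tuple[str, ...]
    output: str
    role: str


def parse_layer_name(name: str) -> LayerName:
    """Split '<LayerType><InputType><OutputType>' (hypothetical helper, not part of PyC).

    Assumption: the output type is the last letter, and the input types are
    the run of Z/U/C letters immediately before it.
    """
    match = re.fullmatch(r"(.+?)([ZUC]+)([ZUC])", name)
    if match is None:
        raise ValueError(f"{name!r} does not follow <LayerType><InputType><OutputType>")
    layer_type, inputs, output = match.groups()
    # Predictor layers must take endogenous (C) inputs; encoder layers never do.
    role = "predictor" if "C" in inputs else "encoder"
    return LayerName(layer_type, tuple(inputs), output, role)


for layer in ["LinearZC", "HyperLinearCUC", "LinearCC"]:
    parsed = parse_layer_name(layer)
    readable_inputs = ", ".join(REPRESENTATIONS[t] for t in parsed.inputs)
    print(f"{layer}: {parsed.layer_type}({readable_inputs}) -> "
          f"{REPRESENTATIONS[parsed.output]} [{parsed.role}]")
```

The lazy first group makes the layer type as short as possible, so HyperLinearCUC splits into HyperLinear, inputs C and U, and output C.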

Detailed Guides

Concept Bottleneck Model

Import Libraries

To get started, import PyC and PyTorch:

import torch
import torch_concepts as pyc

Create Sample Data

Generate random inputs and targets for demonstration:

batch_size = 32
input_dim = 64
n_concepts = 5
n_tasks = 3

# Random input
x = torch.randn(batch_size, input_dim)

# Random concept labels (binary)
concept_labels = torch.randint(0, 2, (batch_size, n_concepts)).float()

# Random task labels
task_labels = torch.randint(0, n_tasks, (batch_size,))

Build a Concept Bottleneck Model

Use a torch.nn.ModuleDict to combine the encoder and the predictor:

# Create model using ModuleDict
model = torch.nn.ModuleDict({
    'encoder': pyc.nn.LinearZC(
        in_features=input_dim,
        out_features=n_concepts
    ),
    'predictor': pyc.nn.LinearCC(
        in_features_endogenous=n_concepts,
        out_features=n_tasks
    ),
})

Inference and Training

Inference

Once a concept bottleneck model is built, we can perform inference by first obtaining concept activations from the encoder, and then task predictions from the predictor:

# Get concept endogenous from input
concept_endogenous = model['encoder'](input=x)

# Get task predictions from concept endogenous
task_endogenous = model['predictor'](endogenous=concept_endogenous)

print(f"Concept endogenous shape: {concept_endogenous.shape}")  # [32, 5]
print(f"Task endogenous shape: {task_endogenous.shape}")        # [32, 3]
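The training code in the next step treats both outputs as logits (a sigmoid-based binary cross-entropy for concepts, F.cross_entropy for tasks). Assuming that reading, turning one row of outputs into discrete predictions can be sketched in plain Python as below; the logit values are made up and the 0.5 concept threshold is an assumption. In PyTorch the equivalent would be (torch.sigmoid(concept_endogenous) > 0.5).int() and task_endogenous.argmax(dim=-1).

```python
import math
from typing import List


def sigmoid(z: float) -> float:
    """Map a logit to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def decode_concepts(concept_logits: List[float], threshold: float = 0.5) -> List[int]:
    """Binary concept predictions: 1 where the sigmoid probability exceeds the threshold."""
    return [int(sigmoid(z) > threshold) for z in concept_logits]


def decode_task(task_logits: List[float]) -> int:
    """Predicted task class: the index of the largest logit."""
    return max(range(len(task_logits)), key=lambda t: task_logits[t])


# One hypothetical row: 5 concept logits and 3 task logits.
concepts = decode_concepts([2.1, -0.4, 0.0, 3.3, -1.7])
task = decode_task([0.2, 1.4, -0.9])
print(concepts, task)
```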

Compute Loss and Train

Train with both concept and task supervision:

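import torch.nn.functional as F

# Optimizer over the parameters of both the encoder and the predictor
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Compute losses; both outputs are logits, so use the logit-based loss functions
concept_loss = F.binary_cross_entropy_with_logits(concept_endogenous, concept_labels)
task_loss = F.cross_entropy(task_endogenous, task_labels)
total_loss = task_loss + 0.5 * concept_loss

# Backpropagation and one optimization step
optimizer.zero_grad()
total_loss.backward()
optimizer.step()

print(f"Concept loss: {concept_loss.item():.4f}")
print(f"Task loss: {task_loss.item():.4f}")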
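Written out, the objective computed above combines the two terms with weight λ = 0.5. With batch size B (batch_size = 32), k concepts (n_concepts = 5), T tasks (n_tasks = 3), concept logits ĉ, concept labels c, task logits ŷ and task labels y, and PyTorch's default reduction='mean' for both loss functions:

```latex
\mathcal{L}
= \underbrace{\frac{1}{B}\sum_{i=1}^{B} -\log \frac{\exp(\hat{y}_{i,y_i})}{\sum_{t=1}^{T}\exp(\hat{y}_{i,t})}}_{\text{task\_loss}}
\;+\; \lambda\,\underbrace{\frac{1}{Bk}\sum_{i=1}^{B}\sum_{j=1}^{k}
-\Big[c_{ij}\log\sigma(\hat{c}_{ij}) + (1-c_{ij})\log\big(1-\sigma(\hat{c}_{ij})\big)\Big]}_{\text{concept\_loss}},
\qquad \lambda = 0.5
```

The concept term averages over all B·k entries, while the task term averages over the B samples.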

Interventions

Intervene using the intervention context manager, which temporarily replaces the encoder layer with an intervened version. The context manager takes two main arguments: strategies and policies.

  • Intervention strategies define how the layer behaves during the intervention, e.g., setting concept endogenous to ground truth values.

  • Intervention policies define the priority/order of concepts to intervene on.

from torch_concepts.nn import GroundTruthIntervention, UniformPolicy
from torch_concepts.nn import intervention

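# Illustrative ground-truth values with the same shape as the concept endogenous
ground_truth = 10 * torch.rand_like(concept_endogenous)
# Strategy: set the intervened concept endogenous to the ground-truth values
strategy = GroundTruthIntervention(model=model['encoder'], ground_truth=ground_truth)
# Policy: defines the priority/order over the n_concepts concepts
policy = UniformPolicy(out_features=n_concepts)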

# Apply intervention to encoder
with intervention(
    policies=policy,
    strategies=strategy,
    target_concepts=[0, 2]
) as new_encoder_layer:
    intervened_concepts = new_encoder_layer(input=x)
    intervened_tasks = model['predictor'](endogenous=intervened_concepts)

print(f"Original concept endogenous: {concept_endogenous[0]}")
print(f"Original task predictions: {task_endogenous[0]}")
print(f"Intervened concept endogenous: {intervened_concepts[0]}")
print(f"Intervened task predictions: {intervened_tasks[0]}")
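The division of labour between policies and strategies can be illustrated without PyC: a policy decides which concepts to intervene on, and a strategy decides what values they take. The plain-Python sketch below uses two hypothetical functions, select_targets (a priority-based policy with a fixed budget) and ground_truth_intervention (a strategy that overwrites the targeted values); it is not how PyC's UniformPolicy or GroundTruthIntervention are implemented, and the priority scores are made up.

```python
from typing import List, Sequence


def select_targets(priority: Sequence[float], budget: int) -> List[int]:
    """Hypothetical policy: pick the `budget` concepts with the highest priority."""
    ranked = sorted(range(len(priority)), key=lambda j: priority[j], reverse=True)
    return sorted(ranked[:budget])


def ground_truth_intervention(
    predicted: Sequence[float], true_values: Sequence[float], targets: Sequence[int]
) -> List[float]:
    """Hypothetical strategy: overwrite the targeted concepts with their ground-truth values."""
    intervened = list(predicted)
    for j in targets:
        intervened[j] = true_values[j]
    return intervened


predicted = [0.2, 0.7, 0.4, 0.9, 0.5]    # model's concept probabilities for one sample
true_values = [1.0, 0.0, 0.0, 1.0, 1.0]  # true concept labels for the same sample
priority = [0.9, 0.1, 0.8, 0.2, 0.3]     # made-up priority score per concept

targets = select_targets(priority, budget=2)
print(targets, ground_truth_intervention(predicted, true_values, targets))
```

Here the policy selects concepts 0 and 2, the same indices passed as target_concepts above, and only those two values change; the downstream predictor then sees the corrected concepts.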

(Advanced) Graph Learning

Add a graph learner to discover concept relationships:

# Define concept names
concept_names = ['round', 'smooth', 'bright', 'large', 'centered']

# Create WANDA graph learner
graph_learner = pyc.nn.WANDAGraphLearner(
    row_labels=concept_names,
    col_labels=concept_names
)

print(f"Learned graph shape: {graph_learner.weighted_adj.shape}")

The graph_learner.weighted_adj tensor contains a learnable adjacency matrix representing relationships between concepts.
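To read relationships off such a matrix, one option is to keep the entries above a threshold. The sketch below uses plain nested lists with made-up weights in place of the learned tensor (graph_learner.weighted_adj.detach().tolist() would give a list of this form). The function adjacency_to_edges, the 0.5 threshold, and reading entry (i, j) as a relationship from row label i to column label j are all assumptions for illustration.

```python
from typing import List, Sequence, Tuple


def adjacency_to_edges(
    weighted_adj: Sequence[Sequence[float]],
    row_labels: Sequence[str],
    col_labels: Sequence[str],
    threshold: float = 0.5,
) -> List[Tuple[str, str, float]]:
    """Hypothetical helper: list (row label, column label, weight) for weights above the threshold."""
    edges = []
    for i, row in enumerate(weighted_adj):
        for j, weight in enumerate(row):
            if weight > threshold:
                edges.append((row_labels[i], col_labels[j], weight))
    return edges


names = ["round", "smooth", "bright"]
weighted_adj = [
    [0.0, 0.9, 0.1],
    [0.2, 0.0, 0.7],
    [0.0, 0.0, 0.0],
]
print(adjacency_to_edges(weighted_adj, names, names))
```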

(Advanced) Verifiable Concept-Based Models

To design more complex concept-based models, you can combine multiple interpretable layers. For example, a verifiable concept-based model can use an encoder to predict concept activations, a selector to select relevant exogenous information, and a hyper-network predictor to make the final predictions from both the concept activations and the exogenous information.

from torch_concepts.nn import LinearZC, SelectorZU, HyperLinearCUC

memory_size = 7
exogenous_size = 16
embedding_size = 5

# Create model using ModuleDict
model = torch.nn.ModuleDict({
    'encoder': LinearZC(
        in_features=input_dim,
        out_features=n_concepts
    ),
    'selector': SelectorZU(
        in_features=input_dim,
        memory_size=memory_size,
        exogenous_size=exogenous_size,
        out_features=n_tasks
    ),
    'predictor': HyperLinearCUC(
        in_features_endogenous=n_concepts,
        in_features_exogenous=exogenous_size,
        embedding_size=embedding_size,
    )
})
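In a hyper-network predictor, the weights that combine the concepts are generated rather than fixed, so the weights used for each task can be inspected directly. The plain-Python sketch below is a simplified, hypothetical version and not PyC's HyperLinearCUC: it assumes one exogenous vector per task (mirroring out_features=n_tasks in the selector above), assumes the exogenous input generates the weights that are applied to the endogenous concepts, and uses a single linear hypernetwork without the embedding_size projection.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(matrix: Matrix, vector: Vector) -> Vector:
    """Matrix-vector product with plain lists."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two vectors."""
    return sum(x * y for x, y in zip(a, b))


def hyper_linear(
    concepts: Vector,
    exogenous_per_task: List[Vector],
    weight_generator: Matrix,
    bias_generator: Vector,
) -> Tuple[List[Vector], Vector]:
    """Hypothetical hyper-linear predictor.

    For each task t, a linear hypernetwork maps the exogenous vector u_t to
    concept weights w_t = weight_generator @ u_t and a bias b_t = bias_generator . u_t;
    the task logit is w_t . concepts + b_t.
    """
    weights, logits = [], []
    for u in exogenous_per_task:
        w = matvec(weight_generator, u)  # one generated weight per concept
        b = dot(bias_generator, u)
        weights.append(w)
        logits.append(dot(w, concepts) + b)
    return weights, logits


concepts = [1.0, 0.0]                         # two concept activations
exogenous = [[1.0, 0.0], [0.0, 1.0]]          # one exogenous vector per task (two tasks)
weight_generator = [[2.0, -1.0], [0.5, 3.0]]  # n_concepts x exogenous size
bias_generator = [0.0, 1.0]

weights, logits = hyper_linear(concepts, exogenous, weight_generator, bias_generator)
print(weights, logits)
```

Each row of weights is the linear rule applied to the concepts for one task, which is what makes the prediction inspectable.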

Next Steps