Mid-level API

The mid-level API lets you build custom, interpretable, and causally transparent probabilistic models.

Warning

This API is still under development and interfaces might change in future releases.

Documentation

Design principles

Probabilistic Models

At this API level, a model is represented as a probabilistic model built from three kinds of objects:

  • Variable objects represent random variables in the probabilistic model. Variables are defined by their name, parents, and distribution type. For instance, three concepts can be defined as:

    concepts = pyc.EndogenousVariable(
        concepts=["c1", "c2", "c3"],
        parents=[],
        distribution=torch.distributions.RelaxedBernoulli
    )
    
  • ParametricCPD objects represent conditional probability distributions (CPDs) between variables in the probabilistic model and are parameterized by PyC layers. For instance, parametric CPDs for the three concepts above can be defined as:

    concept_cpd = pyc.nn.ParametricCPD(
        concepts=["c1", "c2", "c3"],
        parametrization=pyc.nn.LinearZC(in_features=10, out_features=3)
    )
    
  • A ProbabilisticModel object is a collection of variables and CPDs. For instance, a model can be defined as:

    probabilistic_model = pyc.nn.ProbabilisticModel(
        variables=concepts,
        parametric_cpds=concept_cpd
    )
    

Inference

Inference is performed using efficient tensor-based probabilistic inference algorithms. For instance, ancestral sampling can be performed as:

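# `wanda` is assumed to be a graph learner defined earlier (not shown here);
# `x` in the query below is assumed to be the input tensor.
inference_engine = pyc.nn.AncestralSamplingInference(
    probabilistic_model=probabilistic_model,
    graph_learner=wanda,
    temperature=1.0
)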
predictions = inference_engine.query(["c1"], evidence={'input': x})
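
The idea behind ancestral sampling can be sketched in plain Python, independently of PyC. The toy network, its probabilities, and the helper names below (NETWORK, topological_order, ancestral_sample) are hypothetical and are not part of the PyC API. The sketch also draws hard Bernoulli samples rather than the relaxed, tensorized samples an engine with a temperature parameter would presumably use. It only shows the core rule: sample each variable after all of its parents.

```python
import random

# Hypothetical toy network (not part of PyC): each variable maps to
# (parents, function returning P(variable = 1) given the parent values).
NETWORK = {
    "genotype": ([], lambda pa: 0.3),
    "smoking": (["genotype"], lambda pa: 0.8 if pa["genotype"] else 0.2),
    "cancer": (
        ["genotype", "smoking"],
        lambda pa: 0.1 + 0.4 * pa["genotype"] + 0.4 * pa["smoking"],
    ),
}


def topological_order(network):
    """Return variable names so that every parent precedes its children."""
    order, visited = [], set()

    def visit(name):
        if name in visited:
            return
        visited.add(name)
        for parent in network[name][0]:
            visit(parent)  # parents are appended before the child
        order.append(name)

    for name in network:
        visit(name)
    return order


def ancestral_sample(network, rng):
    """Draw one joint sample by sampling parents before children."""
    sample = {}
    for name in topological_order(network):
        parents, prob_fn = network[name]
        p = prob_fn({parent: sample[parent] for parent in parents})
        sample[name] = 1 if rng.random() < p else 0
    return sample


rng = random.Random(0)
draws = [ancestral_sample(NETWORK, rng) for _ in range(1000)]
```

Each element of draws is a dictionary holding one joint sample of all three variables. Querying a subset of variables, as in the query call above, then amounts to reading off the corresponding entries.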

Structural Equation Models

PyC can be used to design Structural Equation Models (SEMs), where:

  • ExogenousVariable and EndogenousVariable objects represent random variables in the SEM. Variables are defined by their name, parents, and distribution type. For example, in this guide we define variables as:

    from torch.distributions import RelaxedBernoulli

    exogenous_var = ExogenousVariable(
        "exogenous",
        parents=[],
        distribution=RelaxedBernoulli
    )
    genotype_var = EndogenousVariable(
        "genotype",
        parents=["exogenous"],
        distribution=RelaxedBernoulli
    )
    
  • ParametricCPD objects represent the structural equations (causal mechanisms) between variables in the SEM and are parameterized by PyC or PyTorch modules. For example:

    genotype_cpd = ParametricCPD(
        "genotype",
        parametrization=torch.nn.Sequential(
            torch.nn.Linear(1, 1),
            torch.nn.Sigmoid()
        )
    )
    
  • ProbabilisticModel objects collect all variables and CPDs to define the full SEM. For example:

    # exogenous_cpd is assumed to be defined like genotype_cpd above (not shown)
    sem_model = ProbabilisticModel(
        variables=[exogenous_var, genotype_var],
        parametric_cpds=[exogenous_cpd, genotype_cpd]
    )
    

Interventions

Interventions allow us to estimate causal effects. For instance, do-interventions set specific variables to fixed values so that we can observe the effect on downstream variables, which simulates a randomized controlled trial.

To perform a do-intervention, use the DoIntervention strategy and the intervention context manager. For example, to set smoking to 0 (prevent smoking) and query the effect on downstream variables:

# Intervention: Force smoking to 0 (prevent smoking)
smoking_strategy_0 = DoIntervention(
    model=sem_model.parametric_cpds,
    constants=0.0
)

with intervention(
    policies=UniformPolicy(out_features=1),
    strategies=smoking_strategy_0,
    target_concepts=["smoking"]
):
    intervened_results_0 = inference_engine.query(
        query_concepts=["genotype", "smoking", "tar", "cancer"],
        evidence=initial_input
    )
    # Results reflect the effect of setting smoking=0

You can use these interventional results to estimate causal effects, such as the Average Causal Effect (ACE), as shown in later steps of this guide.
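
For a binary treatment X and outcome Y, the ACE is the difference E[Y | do(X=1)] - E[Y | do(X=0)]. The sketch below illustrates this on a hand-written toy SEM in plain Python. The simulate function and all probabilities in it are hypothetical and do not come from PyC or from this guide's model. The do-intervention is implemented by replacing the smoking mechanism with a constant, and the ACE is estimated as a difference of sample means.

```python
import random


def simulate(n, rng, do_smoking=None):
    """Sample the cancer outcome n times from a hypothetical toy SEM.

    If do_smoking is None, smoking follows its own mechanism (it depends
    on genotype). Otherwise smoking is fixed to the given constant, which
    is what a do-intervention does.
    """
    outcomes = []
    for _ in range(n):
        genotype = 1 if rng.random() < 0.3 else 0
        if do_smoking is None:
            smoking = 1 if rng.random() < (0.8 if genotype else 0.2) else 0
        else:
            smoking = do_smoking  # mechanism replaced by a constant
        p_cancer = 0.1 + 0.2 * genotype + 0.5 * smoking
        outcomes.append(1 if rng.random() < p_cancer else 0)
    return outcomes


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


rng = random.Random(0)
n = 20000
outcomes_do_1 = simulate(n, rng, do_smoking=1)  # everyone forced to smoke
outcomes_do_0 = simulate(n, rng, do_smoking=0)  # nobody allowed to smoke

# Monte Carlo estimate of E[Y | do(X=1)] - E[Y | do(X=0)]
ace = mean(outcomes_do_1) - mean(outcomes_do_0)
```

In this toy SEM the smoking coefficient is 0.5, so the estimate should land close to 0.5 up to sampling noise. With PyC, the two sets of outcomes would instead come from two queries run under interventions that fix smoking to 1 and to 0.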