Mid-level API¶
Mid-level APIs allow you to build custom interpretable and causally transparent probabilistic models.
Warning
This API is still under development and interfaces might change in future releases.
Documentation¶
Design principles¶
Probabilistic Models¶
At this API level, models are represented as probabilistic models where:
Variable objects represent random variables in the probabilistic model. Variables are defined by their name, parents, and distribution type. For instance, we can define a list of three concepts as:
concepts = pyc.EndogenousVariable(
    concepts=["c1", "c2", "c3"],
    parents=[],
    distribution=torch.distributions.RelaxedBernoulli
)
ParametricCPD objects represent conditional probability distributions (CPDs) between variables in the probabilistic model and are parameterized by PyC layers. For instance, we can define a list of three parametric CPDs for the above concepts as:
concept_cpd = pyc.nn.ParametricCPD(
    concepts=["c1", "c2", "c3"],
    parametrization=pyc.nn.LinearZC(in_features=10, out_features=3)
)
ProbabilisticModel objects are a collection of variables and CPDs. For instance, we can define a model as:
probabilistic_model = pyc.nn.ProbabilisticModel(
    variables=concepts,
    parametric_cpds=concept_cpd
)
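The variables above use a relaxed Bernoulli distribution, which yields values between 0 and 1 instead of hard binary outcomes. The following is a minimal pure-Python sketch of how such a sample can be drawn with the standard logistic-noise construction; it is not PyC's implementation, and the helper names `sigmoid`, `sample_relaxed_bernoulli`, and `near_binary_fraction` are hypothetical.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def sample_relaxed_bernoulli(logit, temperature, rng):
    """Draw one relaxed Bernoulli sample, a value between 0 and 1."""
    # Clamp the uniform draw so that both logarithms stay finite.
    u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
    logistic_noise = math.log(u) - math.log(1.0 - u)
    # Lower temperatures push samples towards 0 or 1;
    # higher temperatures pull them towards 0.5.
    return sigmoid((logit + logistic_noise) / temperature)


def near_binary_fraction(temperature, n_samples=2000, seed=0):
    """Fraction of samples (logit = 0) that fall outside (0.1, 0.9)."""
    rng = random.Random(seed)
    samples = [
        sample_relaxed_bernoulli(0.0, temperature, rng)
        for _ in range(n_samples)
    ]
    return sum(s <= 0.1 or s >= 0.9 for s in samples) / n_samples
```

Comparing `near_binary_fraction(0.1)` with `near_binary_fraction(1.0)` shows that a lower temperature produces samples that are closer to binary.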
Inference¶
Inference is performed using efficient tensorial probabilistic inference algorithms. For instance, we can perform ancestral sampling as:
inference_engine = pyc.nn.AncestralSamplingInference(
    probabilistic_model=probabilistic_model,
    graph_learner=wanda,
    temperature=1.0
)
predictions = inference_engine.query(["c1"], evidence={'input': x})
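To illustrate the idea behind ancestral sampling, here is a minimal pure-Python sketch that samples each variable after its parents and averages the draws to answer a query. It does not use or reproduce PyC's inference engine: the toy model, its probabilities, and the functions `topological_order`, `ancestral_sample`, and `query` are all hypothetical. Evidence is only clamped on a root variable here, mirroring the `evidence={'input': x}` call above.

```python
import random
from graphlib import TopologicalSorter

# Hypothetical toy model: variable -> (parents, mechanism).
# Each mechanism returns P(variable = 1) given the parents' sampled values.
TOY_MODEL = {
    "input": ([], lambda: 0.5),
    "c1": (["input"], lambda x: 0.9 if x else 0.2),
    "c2": (["c1"], lambda c1: 0.7 if c1 else 0.1),
}


def topological_order(model):
    """Order variables so that every parent precedes its children."""
    graph = {name: parents for name, (parents, _) in model.items()}
    return list(TopologicalSorter(graph).static_order())


def ancestral_sample(model, rng, evidence=None):
    """Draw one joint sample, clamping any variables given as evidence."""
    values = dict(evidence or {})
    for name in topological_order(model):
        if name in values:
            continue  # evidence is kept fixed instead of being sampled
        parents, mechanism = model[name]
        p_one = mechanism(*(values[parent] for parent in parents))
        values[name] = 1 if rng.random() < p_one else 0
    return values


def query(model, query_concepts, evidence, n_samples=5000, seed=0):
    """Monte Carlo estimate of P(concept = 1 | evidence) per queried concept."""
    rng = random.Random(seed)
    totals = {name: 0 for name in query_concepts}
    for _ in range(n_samples):
        sample = ancestral_sample(model, rng, evidence)
        for name in query_concepts:
            totals[name] += sample[name]
    return {name: total / n_samples for name, total in totals.items()}
```

For example, `query(TOY_MODEL, ["c1"], evidence={"input": 1})` returns an estimate close to the 0.9 encoded in the toy mechanism for `c1`.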
Structural Equation Models¶
PyC can be used to design Structural Equation Models (SEMs), where:
ExogenousVariable and EndogenousVariable objects represent random variables in the SEM. Variables are defined by their name, parents, and distribution type. For example, in this guide we define variables as:
exogenous_var = ExogenousVariable(
    "exogenous",
    parents=[],
    distribution=RelaxedBernoulli
)
genotype_var = EndogenousVariable(
    "genotype",
    parents=["exogenous"],
    distribution=RelaxedBernoulli
)
ParametricCPD objects represent the structural equations (causal mechanisms) between variables in the SEM and are parameterized by PyC or PyTorch modules. For example:
genotype_cpd = ParametricCPD(
    "genotype",
    parametrization=torch.nn.Sequential(
        torch.nn.Linear(1, 1),
        torch.nn.Sigmoid()
    )
)
ProbabilisticModel objects collect all variables and CPDs to define the full SEM. For example:
sem_model = ProbabilisticModel(
    variables=[exogenous_var, genotype_var],
    parametric_cpds=[exogenous_cpd, genotype_cpd]
)
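As a rough illustration of what the genotype structural equation computes, the sketch below writes the linear-plus-sigmoid mechanism as a plain Python function. The weight and bias values are hypothetical placeholders for learned parameters, the function names are not part of PyC, and the sketch treats the equation's output as the variable's value, ignoring the sampling step implied by `distribution=RelaxedBernoulli`.

```python
import math
import random


def genotype_equation(exogenous, weight=2.0, bias=-1.0):
    """Structural equation: an affine map followed by a sigmoid.

    The weight and bias are hypothetical; in a trained model they are learned.
    """
    return 1.0 / (1.0 + math.exp(-(weight * exogenous + bias)))


def sample_sem(rng):
    """Sample the exogenous variable, then propagate it through the equation."""
    exogenous = rng.random()                 # root variable, no parents
    genotype = genotype_equation(exogenous)  # fully determined by its parent
    return {"exogenous": exogenous, "genotype": genotype}
```

In this simplified view all randomness enters through the exogenous variable, so the same exogenous value always yields the same genotype value.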
Interventions¶
Interventions allow us to estimate causal effects. For instance, do-interventions set specific variables to fixed values so that we can observe the effect on downstream variables, simulating a randomized controlled trial.
To perform a do-intervention, use the DoIntervention strategy and the intervention context manager.
For example, to set smoking to 0 (prevent smoking) and query the effect on downstream variables:
# Intervention: Force smoking to 0 (prevent smoking)
smoking_strategy_0 = DoIntervention(
    model=sem_model.parametric_cpds,
    constants=0.0
)
with intervention(
    policies=UniformPolicy(out_features=1),
    strategies=smoking_strategy_0,
    target_concepts=["smoking"]
):
    intervened_results_0 = inference_engine.query(
        query_concepts=["genotype", "smoking", "tar", "cancer"],
        evidence=initial_input
    )
# Results reflect the effect of setting smoking=0
You can use these interventional results to estimate causal effects, such as the Average Causal Effect (ACE), as shown in later steps of this guide.
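For a binary treatment, the ACE is commonly defined as E[cancer | do(smoking=1)] - E[cancer | do(smoking=0)]. The sketch below estimates it by Monte Carlo on a toy SEM written in pure Python. It does not use PyC: the graph structure (genotype influences smoking and cancer, smoking influences tar, tar influences cancer), every probability, and the functions `simulate` and `average_causal_effect` are assumptions made for illustration only.

```python
import random


def simulate(rng, do_smoking=None):
    """Sample one unit from a toy SEM; optionally force `smoking` to a constant."""
    genotype = 1 if rng.random() < 0.3 else 0
    if do_smoking is None:
        # Observational mechanism: smoking depends on genotype.
        smoking = 1 if rng.random() < (0.8 if genotype else 0.3) else 0
    else:
        # do-intervention: the mechanism is replaced by a constant,
        # which removes the influence of genotype on smoking.
        smoking = do_smoking
    tar = 1 if rng.random() < (0.9 if smoking else 0.05) else 0
    p_cancer = 0.05 + 0.5 * tar + 0.2 * genotype
    cancer = 1 if rng.random() < p_cancer else 0
    return {"genotype": genotype, "smoking": smoking, "tar": tar, "cancer": cancer}


def average_causal_effect(n_samples=20000, seed=0):
    """Estimate E[cancer | do(smoking=1)] - E[cancer | do(smoking=0)]."""
    rng = random.Random(seed)
    treated = sum(simulate(rng, 1)["cancer"] for _ in range(n_samples)) / n_samples
    control = sum(simulate(rng, 0)["cancer"] for _ in range(n_samples)) / n_samples
    return treated - control
```

With these hypothetical probabilities the exact effect is 0.56 - 0.135 = 0.425, so `average_causal_effect()` should return a value close to that.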