Mid-level API
=============

The mid-level API lets you build custom probabilistic models that are interpretable and causally transparent.

.. warning::

   This API is still under development and interfaces might change in future releases.

.. |pyc_logo| image:: https://raw.githubusercontent.com/pyc-team/pytorch_concepts/refs/heads/master/doc/_static/img/logos/pyc.svg
   :width: 20px
   :align: middle

.. |pytorch_logo| image:: https://raw.githubusercontent.com/pyc-team/pytorch_concepts/refs/heads/master/doc/_static/img/logos/pytorch.svg
   :width: 20px
   :align: middle

Documentation
-------------

.. toctree::
   :maxdepth: 1

   nn.base.mid
   nn.variable
   nn.models
   nn.inference.mid
   nn.constructors


Design principles
-----------------

Probabilistic Models
^^^^^^^^^^^^^^^^^^^^

At this API level, models are represented as probabilistic models where:

- ``Variable`` objects represent random variables in the probabilistic model. Variables are defined by their name, parents, and distribution type. For instance, we can define a list of three concepts as:

  .. code-block:: python

     import torch
     import torch_concepts as pyc

     concepts = pyc.EndogenousVariable(
         concepts=["c1", "c2", "c3"],
         parents=[],
         distribution=torch.distributions.RelaxedBernoulli
     )

- ``ParametricCPD`` objects represent conditional probability distributions (CPDs) between variables in the probabilistic model and are parameterized by |pyc_logo| PyC layers. For instance, we can define a list of three parametric CPDs for the above concepts as:

  .. code-block:: python

     concept_cpd = pyc.nn.ParametricCPD(
         concepts=["c1", "c2", "c3"],
         parametrization=pyc.nn.LinearZC(in_features=10, out_features=3)
     )

- ``ProbabilisticModel`` objects are a collection of variables and CPDs. For instance, we can define a model as:

  .. code-block:: python

     probabilistic_model = pyc.nn.ProbabilisticModel(
         variables=concepts,
         parametric_cpds=concept_cpd
     )

Inference
^^^^^^^^^

Inference is performed using efficient tensorial probabilistic inference algorithms. For instance, we can perform ancestral sampling as:

.. code-block:: python

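   # `wanda` is assumed to be a graph learner created earlier and `x` the
   # input tensor; neither is defined in this snippet.
   inference_engine = pyc.nn.AncestralSamplingInference(
       probabilistic_model=probabilistic_model,
       graph_learner=wanda,
       temperature=1.0
   )
   predictions = inference_engine.query(["c1"], evidence={"input": x})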
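
To make the sampling order concrete, the following is a minimal plain-Python sketch of ancestral sampling on a toy two-variable Bernoulli network. It does not use the |pyc_logo| PyC API and is not tensorial: the ``parents`` and ``cpds`` dictionaries, the probabilities, and the ``ancestral_sample`` helper are hypothetical names and values introduced only for illustration.

.. code-block:: python

   import random
   from graphlib import TopologicalSorter

   # Hypothetical toy model: c1 has no parents, c2 depends on c1.
   parents = {"c1": [], "c2": ["c1"]}

   # Each CPD maps the parent values to P(variable = 1 | parents).
   cpds = {
       "c1": lambda pa: 0.3,
       "c2": lambda pa: 0.9 if pa["c1"] == 1 else 0.2,
   }

   def ancestral_sample(query, evidence=None, rng=random):
       """Sample every variable after its parents, then return the queried ones."""
       values = dict(evidence or {})
       # static_order() yields each node only after all of its predecessors.
       for name in TopologicalSorter(parents).static_order():
           if name in values:
               # Observed variables keep their evidence value. Clamping like
               # this is only exact when the evidence sits on root variables.
               continue
           pa = {p: values[p] for p in parents[name]}
           values[name] = 1 if rng.random() < cpds[name](pa) else 0
       return {name: values[name] for name in query}

   rng = random.Random(0)
   samples = [ancestral_sample(["c2"], rng=rng)["c2"] for _ in range(20_000)]
   # Monte Carlo estimate of P(c2 = 1); the exact value is 0.3*0.9 + 0.7*0.2 = 0.41.
   estimate = sum(samples) / len(samples)

The key design point is the topological order: a variable is only sampled once all of its parents have values, so each CPD can be evaluated with a single forward pass.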


Structural Equation Models
^^^^^^^^^^^^^^^^^^^^^^^^^^

|pyc_logo| PyC can be used to design Structural Equation Models (SEMs), where:

- ``ExogenousVariable`` and ``EndogenousVariable`` objects represent random variables in the SEM. Variables are defined by their name, parents, and distribution type. For example, in this guide we define variables as:

  .. code-block:: python

     exogenous_var = ExogenousVariable(
         "exogenous",
         parents=[],
         distribution=RelaxedBernoulli
     )
     genotype_var = EndogenousVariable(
         "genotype",
         parents=["exogenous"],
         distribution=RelaxedBernoulli
     )

- ``ParametricCPD`` objects represent the structural equations (causal mechanisms) between variables in the SEM and are parameterized by |pyc_logo| PyC or |pytorch_logo| PyTorch modules. For example:

  .. code-block:: python

     genotype_cpd = ParametricCPD(
         "genotype",
         parametrization=torch.nn.Sequential(
             torch.nn.Linear(1, 1),
             torch.nn.Sigmoid()
         )
     )

- ``ProbabilisticModel`` objects collect all variables and CPDs to define the full SEM. For example:

  .. code-block:: python

     # `exogenous_cpd` is assumed to be defined analogously to `genotype_cpd`.
     sem_model = ProbabilisticModel(
         variables=[exogenous_var, genotype_var],
         parametric_cpds=[exogenous_cpd, genotype_cpd]
     )
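
As a plain-Python illustration of the same structure (again not the |pyc_logo| PyC API), an SEM can be read as a set of deterministic functions of each variable's parents, with all randomness pushed into the exogenous variables. The function names and the ``0.2`` threshold below are hypothetical and chosen only to keep the sketch small.

.. code-block:: python

   import random

   def sample_exogenous(rng):
       """Draw the exogenous noise; this is the only source of randomness."""
       return rng.random()

   def genotype_equation(exogenous):
       """Hypothetical structural equation: genotype is 1 when the noise is below 0.2."""
       return 1 if exogenous < 0.2 else 0

   def sample_sem(rng):
       """Sample the exogenous variable, then evaluate the structural equation."""
       exogenous = sample_exogenous(rng)
       return {"exogenous": exogenous, "genotype": genotype_equation(exogenous)}

   # Re-using the same seed reproduces the same unit, because the
   # structural equation itself is deterministic given its parents.
   first = sample_sem(random.Random(42))
   second = sample_sem(random.Random(42))

Separating the noise from the mechanism is what makes interventions well defined: a mechanism can be replaced while the exogenous draws are left untouched.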

Interventions
^^^^^^^^^^^^^

Interventions allow us to estimate causal effects. For instance, do-interventions set specific variables
to fixed values and let us observe the effect on downstream variables, simulating a randomized controlled trial.

To perform a do-intervention, use the ``DoIntervention`` strategy and the ``intervention`` context manager.
For example, to set ``smoking`` to 0 (prevent smoking) and query the effect on downstream variables:

.. code-block:: python

   # Intervention: Force smoking to 0 (prevent smoking)
   smoking_strategy_0 = DoIntervention(
       model=sem_model.parametric_cpds,
       constants=0.0
   )

   with intervention(
       policies=UniformPolicy(out_features=1),
       strategies=smoking_strategy_0,
       target_concepts=["smoking"]
   ):
       intervened_results_0 = inference_engine.query(
           query_concepts=["genotype", "smoking", "tar", "cancer"],
           evidence=initial_input
       )
       # Results reflect the effect of setting smoking=0

You can use these interventional results to estimate causal effects, such as the Average Causal Effect (ACE),
as shown in later steps of this guide.
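
The idea behind the ACE can be sketched in plain Python, without ``DoIntervention`` or the ``intervention`` context manager. The sketch assumes a simple causal chain ``genotype -> smoking -> tar -> cancer``; this chain, the probabilities, and the ``simulate`` and ``mean_cancer`` helpers are hypothetical and are not taken from the guide's model.

.. code-block:: python

   import random

   def simulate(rng, do=None):
       """Sample one unit from a toy SEM; `do` maps variable names to fixed values."""
       do = do or {}

       def bernoulli(p):
           return 1 if rng.random() < p else 0

       # Hypothetical mechanisms, listed in causal order.
       equations = {
           "genotype": lambda v: bernoulli(0.2),
           "smoking": lambda v: bernoulli(0.8 if v["genotype"] else 0.3),
           "tar": lambda v: bernoulli(0.9 if v["smoking"] else 0.1),
           "cancer": lambda v: bernoulli(0.6 if v["tar"] else 0.05),
       }
       values = {}
       for name, equation in equations.items():
           # A do-intervention replaces the equation with a constant,
           # cutting the variable off from its parents.
           values[name] = do[name] if name in do else equation(values)
       return values

   def mean_cancer(do, n=50_000, seed=0):
       """Monte Carlo estimate of E[cancer | do(...)]."""
       rng = random.Random(seed)
       return sum(simulate(rng, do)["cancer"] for _ in range(n)) / n

   # ACE = E[cancer | do(smoking=1)] - E[cancer | do(smoking=0)]
   ace = mean_cancer({"smoking": 1}) - mean_cancer({"smoking": 0})

With these made-up probabilities the exact ACE is ``0.545 - 0.105 = 0.44``, so the Monte Carlo estimate should land close to that value.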