A proof for the back-door criterion

A very common way to quantify a causal effect is inverse-probability weighting (IPW) with back-door adjustment. The method is described in many places on the internet, but I have several times forgotten the proof of why it works, so here it follows. The presentation is due to Judea Pearl's book “Causality”.

I will be quite loose with notation to make it more digestible for myself.

The interventional distribution

You have a set of variables \(x_1, \ldots, x_n\) that are Markovian with respect to some DAG. You want the interventional distribution that results from atomically setting \(x_i\) to the value \(x'\). What, then, is the distribution \(p(x_1, \ldots, x_n \vert{} \hat{x}'_i)\)?

By the Markov factorization in the mutilated graph, we have that

\[p(x_1, \ldots, x_n \vert \hat{x}'_i) = 1\{x_i=x'\}\prod_{j\neq i} p(x_j \vert \text{pa}(x_j))\]
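As a concrete sanity check (a toy example of mine, not from the book), take the three-variable DAG \(t \to x\), \(t \to y\), \(x \to y\). Intervening on \(x\) deletes its factor \(p(x \vert t)\) and replaces it with the indicator:

\[p(t, x, y \vert \hat{x}') = 1\{x=x'\}\, p(t)\, p(y \vert x, t)\]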

Multiply and divide by the observational distribution \(p(x_1, \ldots, x_n)\), factored over the original graph, and cancel: all factors are shared except the one for \(x_i\). Writing \(\text{pa}_i\) for \(\text{pa}(x_i)\), this gives

\[p(x_1, \ldots, x_n \vert \hat{x}'_i) = \frac{1\{x_i=x'\}}{p(x_i \vert \text{pa}_i)} p(x_1, \ldots, x_n)\]

The formulation above is the most common one in IPW derivations. But by factoring the joint as \(p(x_1, \ldots, x_n) = p(x_1, \ldots, x_n \vert \text{pa}_i)\,p(\text{pa}_i)\) and then absorbing the factor \(1/p(x_i \vert \text{pa}_i)\) into the conditional, we get

\[p(x_1, \ldots, x_n \vert \hat{x}'_i) = \frac{1\{x_i=x'\}}{p(x_i \vert \text{pa}_i)} p(x_1, \ldots, x_n \vert \text{pa}_i)p(\text{pa}_i)\] \[p(x_1, \ldots, x_n \vert \hat{x}'_i) = 1\{x_i=x'\} p(x_1, \ldots, x_n \vert x_i, \text{pa}_i)p(\text{pa}_i)\]

This is the interventional distribution in the nice compact format for parental adjustment.
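Continuing the toy DAG above, where \(\text{pa}(x) = \{t\}\), the compact form reads

\[p(t, x, y \vert \hat{x}') = 1\{x=x'\}\, p(t, x, y \vert x, t)\, p(t) = 1\{x=x'\}\, p(y \vert x, t)\, p(t)\]

which matches the truncated factorization from before.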

The interventional distribution of an outcome using parental adjustment

Designate one of the variables \(x_1, \ldots, x_n\) as the outcome \(y\). Marginalize the above expression over all other variables to obtain

\[p(y \vert \hat{x}'_i) = \sum_{\text{not }y} 1\{x_i=x'\} p(x_1, \ldots, x_n \vert x_i, \text{pa}_i)p(\text{pa}_i)\]

Define the parental set \(t=\text{pa}_i\) and the remainder set \(s = \{x_1,\dots,x_n\} \setminus \{y, t, x_i\}\). The marginalization then splits into three separate summations.

\[p(y \vert \hat{x}'_i) = \sum_{t}\sum_{x_i}\sum_{s} 1\{x_i=x'\}\, p(y,s \vert x_i, t)\,p(t)\]

Summing over \(s\) marginalizes it away, and the indicator collapses the sum over \(x_i\) to the single value \(x'\):

\[p(y \vert \hat{x}'_i) = \sum_{t} p(y \vert x', t)\,p(t)\]

This is the parental adjustment formula for a single outcome.
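To convince myself, here is a minimal numerical check in Python: a sketch assuming the toy DAG \(t \to x\), \(t \to y\), \(x \to y\) with made-up binary CPD tables. It compares the ground truth from the truncated factorization against the parental adjustment formula computed purely from the observational joint.

```python
import numpy as np

# Toy binary DAG: t -> x, t -> y, x -> y. All CPD numbers are made up.
p_t = np.array([0.6, 0.4])                    # p(t)
p_x_t = np.array([[0.7, 0.3],                 # p(x | t), rows indexed by t
                  [0.2, 0.8]])
p_y_xt = np.array([[[0.9, 0.1], [0.4, 0.6]],  # p(y | x, t), indexed [t][x][y]
                   [[0.5, 0.5], [0.1, 0.9]]])

# Observational joint p(t, x, y) via the Markov factorization.
joint = p_t[:, None, None] * p_x_t[:, :, None] * p_y_xt

x_prime = 1

# Ground truth from the truncated factorization:
# p(y | do(x')) = sum_t p(t) p(y | x', t)
p_y_do = (p_t[:, None] * p_y_xt[:, x_prime, :]).sum(axis=0)

# Parental adjustment using only the observational joint:
# p(y | do(x')) = sum_t p(y | x', t) p(t)
p_txy = joint[:, x_prime, :]                          # p(t, x', y)
p_y_given = p_txy / p_txy.sum(axis=1, keepdims=True)  # p(y | x', t)
p_y_adj = (p_y_given * p_t[:, None]).sum(axis=0)

print(p_y_do, p_y_adj)  # the two vectors agree exactly
```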

The back-door criterion

We have a set of variables \(z\) that fulfills the back-door criterion: (1) its members are non-descendants of \(x_i\), and (2) it blocks all back-door paths from \(x_i\) to \(y\). A back-door path is a path from \(x_i\) to \(y\) that begins with an edge pointing into \(x_i\).

We want an expression for the interventional distribution \(p(y \vert \hat{x}'_i)\) in terms of the observational distributions \(p(y \vert x_i, z)\) and \(p(z)\).

Take the parental adjustment formula above and, by the law of total probability, introduce a summation over \(z\) inside the sum over \(t\).

\[p(y \vert \hat{x}'_i) = \sum_{t,z} p(y \vert x', z, t)\,p(z\vert{}x',t)\,p(t)\]

Since \(z\) is a back-door adjustment set, it consists of non-descendants of \(x_i\), so the local Markov property (a node is independent of its non-descendants given its parents) gives \(x_i \perp\!\!\perp z \,\vert\, t\). Therefore, \(p(z\vert{}x',t)=p(z\vert{}t)\).

\[p(y \vert \hat{x}'_i) = \sum_{t,z} p(y \vert x', z, t)\,p(z\vert{}t)\,p(t)\]

Since \(z\) is a back-door adjustment set, it blocks all back-door paths from \(x_i\) to \(y\), and \(y \perp\!\!\perp t \,\vert\, x_i, z\): roughly, any path from \(t\) to \(y\) through the edge \(t \to x_i\) is blocked by conditioning on \(x_i\), and any other path, prepended with that edge, would be a back-door path from \(x_i\) to \(y\), which \(z\) blocks. Therefore, \(p(y \vert x', z, t) = p(y \vert x', z)\).

\[p(y \vert \hat{x}'_i) = \sum_{t,z} p(y \vert x', z)\,p(z\vert{}t)\,p(t)\]

Finally, perform the summation over \(t\).

\[p(y \vert \hat{x}'_i) = \sum_{z} p(y \vert x', z)\,p(z)\]
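The same kind of check works when the adjustment set is not the parent set. Below is a sketch assuming the hypothetical DAG \(u \to x\), \(u \to z\), \(z \to y\), \(x \to y\): here \(\text{pa}(x) = \{u\}\), but \(z\) also satisfies the back-door criterion since it blocks the path \(x \leftarrow u \to z \to y\).

```python
import numpy as np

# Toy binary DAG: u -> x, u -> z, z -> y, x -> y. CPD numbers are made up.
# z is not a parent of x, but it blocks the back-door path x <- u -> z -> y.
p_u = np.array([0.5, 0.5])                    # p(u)
p_z_u = np.array([[0.8, 0.2], [0.3, 0.7]])    # p(z | u), rows indexed by u
p_x_u = np.array([[0.6, 0.4], [0.1, 0.9]])    # p(x | u), rows indexed by u
p_y_xz = np.array([[[0.9, 0.1], [0.6, 0.4]],  # p(y | x, z), indexed [x][z][y]
                   [[0.5, 0.5], [0.2, 0.8]]])

# Observational joint p(u, x, z, y), indexed [u, x, z, y].
joint = (p_u[:, None, None, None] * p_x_u[:, :, None, None]
         * p_z_u[:, None, :, None] * p_y_xz[None, :, :, :])

x_prime = 1

# Ground truth from the truncated factorization:
# p(y | do(x')) = sum_{u,z} p(u) p(z | u) p(y | x', z)
p_y_do = np.einsum('u,uz,zy->y', p_u, p_z_u, p_y_xz[x_prime])

# Back-door adjustment using only observational quantities:
# p(y | do(x')) = sum_z p(y | x', z) p(z)
p_z = joint.sum(axis=(0, 1, 3))               # p(z)
p_xzy = joint.sum(axis=0)                     # p(x, z, y)
p_y_given = p_xzy / p_xzy.sum(axis=2, keepdims=True)  # p(y | x, z)
p_y_adj = (p_y_given[x_prime] * p_z[:, None]).sum(axis=0)

print(p_y_do, p_y_adj)  # agree exactly
```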

This is the interventional distribution of \(y\) when adjusting for the back-door set \(z\). We can also massage the expression to recover an IPW form, which is more convenient in some cases.

\[p(y \vert \hat{x}'_i) = \sum_{z} p(y \vert x', z)\,p(z) = \sum_{x_i,z} \frac{1\{x_i=x'\}}{p(x_i \vert z)}\, p(y, x_i, z)\]
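The right-hand side is the observational expectation of the outcome weighted by \(1\{x_i=x'\}/p(x_i \vert z)\), which is exactly what an IPW estimator averages over samples. Here is a minimal Monte Carlo sketch, again assuming a made-up toy DAG \(z \to x\), \(z \to y\), \(x \to y\), and using the true propensity rather than an estimated one.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy binary DAG: z -> x, z -> y, x -> y. CPD numbers are made up.
p_z = np.array([0.5, 0.5])                    # p(z)
p_x_z = np.array([[0.8, 0.2], [0.3, 0.7]])    # p(x | z), rows indexed by z
p_y_xz = np.array([[[0.9, 0.1], [0.6, 0.4]],  # p(y | x, z), indexed [x][z][y]
                   [[0.5, 0.5], [0.2, 0.8]]])

# Draw samples from the observational distribution.
n = 200_000
z = rng.choice(2, size=n, p=p_z)
x = (rng.random(n) < p_x_z[z, 1]).astype(int)
y = (rng.random(n) < p_y_xz[x, z, 1]).astype(int)

x_prime = 1

# IPW estimate of p(y = 1 | do(x')): average the indicator-weighted
# outcome, with weights 1 / p(x | z) from the (known) propensity.
weights = (x == x_prime) / p_x_z[z, x]
ipw_estimate = np.mean(weights * y)

# Exact value from the adjustment formula: sum_z p(y = 1 | x', z) p(z).
exact = (p_y_xz[x_prime, :, 1] * p_z).sum()

print(ipw_estimate, exact)  # agree up to Monte Carlo error
```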
