A proof of the back-door criterion
A very common way to quantify a causal effect is inverse probability weighting (IPW) with back-door adjustment. You can find the method described in many places on the internet, but I have several times forgotten the proof of why it works. So here it follows. The presentation is due to Judea Pearl's book “Causality”.
I will be quite loose with notation to make it more digestible for myself.
The interventional distribution
You have a set of variables \(x_1, \ldots, x_n\) that are Markovian with respect to some DAG. You want the interventional distribution for atomically setting \(x_i\) to the value \(x'\). What, then, is the distribution \(p(x_1, \ldots, x_n \vert{} \hat{x}'_i)\)?
By the Markov factorization in the mutilated graph, we have that
\[p(x_1, \ldots, x_n \vert \hat{x}'_i) = 1\{x_i=x'\}\prod_{j\neq i} p(x_j \vert \text{pa}_j)\]Multiply and divide by the observational distribution \(p(x_1, \ldots, x_n)\), factored over the observational graph, and cancel. All the factors are the same except the one for \(x_i\).
\[p(x_1, \ldots, x_n \vert \hat{x}'_i) = \frac{1\{x_i=x'\}}{p(x_i \vert \text{pa}_i)} p(x_1, \ldots, x_n)\]This formulation is the one most common in IPW derivations. But by first extracting the marginal of the parents from the joint, and then folding the remaining \(1/p(x_i \vert \text{pa}_i)\) factor into the conditional, we get
\[p(x_1, \ldots, x_n \vert \hat{x}'_i) = \frac{1\{x_i=x'\}}{p(x_i \vert \text{pa}_i)} p(x_1, \ldots, x_n \vert \text{pa}_i)p(\text{pa}_i)\] \[p(x_1, \ldots, x_n \vert \hat{x}'_i) = 1\{x_i=x'\} p(x_1, \ldots, x_n \vert x_i, \text{pa}_i)p(\text{pa}_i)\]This is the interventional distribution in the nice compact format for parental adjustment.
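To make the truncated factorization concrete, here is a minimal numerical sketch (Python with numpy; the three-node DAG \(z \to x\), \(z \to y\), \(x \to y\) and all its probability tables are invented for illustration) that builds the interventional joint directly from the mutilated-graph product:

```python
import numpy as np

# Hypothetical 3-node binary DAG: z -> x, z -> y, x -> y.
p_z = np.array([0.6, 0.4])                  # p(z)
p_x_given_z = np.array([[0.7, 0.3],         # p(x | z), rows indexed by z
                        [0.2, 0.8]])
p_y_given_zx = np.array([[[0.9, 0.1],       # p(y | z, x), axes (z, x, y)
                          [0.5, 0.5]],
                         [[0.6, 0.4],
                          [0.1, 0.9]]])

x_prime = 1  # the atomic intervention do(x = x') with x' = 1

# Truncated factorization: keep p(z) and p(y | z, x), drop p(x | z),
# and clamp x to x' with the indicator.
p_do = np.zeros((2, 2, 2))                  # joint over (z, x, y) under do(x')
p_do[:, x_prime, :] = p_z[:, None] * p_y_given_zx[:, x_prime, :]

print(p_do.sum())                           # 1.0: still a proper distribution
```

The only change from the observational factorization is that the factor \(p(x \vert z)\) is deleted and \(x\) is clamped to \(x'\).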
The interventional distribution of some outcome using parental adjustment
Define an outcome \(y\) among the variables \(x_1, \ldots, x_n\). Marginalize the above expression over all other variables to obtain
\[p(y \vert \hat{x}'_i) = \sum_{\text{not }y} 1\{x_i=x'\} p(x_1, \ldots, x_n \vert x_i, \text{pa}_i)p(\text{pa}_i)\]Define the parental set \(t=\text{pa}_i\) and the set \(s = \{x_1,\dots,x_n\} \setminus \{y, t, x_i\}\). The above marginalization then splits into three separate summations.
\[p(y \vert \hat{x}'_i) = \sum_{t}\sum_{x_i}\sum_{s} 1\{x_i=x'\} p(y,s \vert x_i, t)p(t)\]The sum over \(s\) marginalizes it out of the conditional, and the sum over \(x_i\) collapses the indicator, leaving
\[p(y \vert \hat{x}'_i) = \sum_{t} p(y \vert x_i', t)p(t)\]This is the parental adjustment formula for a single outcome.
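As a sanity check, the following sketch reuses the same hypothetical three-node DAG (where \(\text{pa}(x) = \{z\}\)) and verifies that the parental adjustment formula, computed from purely observational quantities, matches the marginal of the truncated factorization:

```python
import numpy as np

# Same hypothetical DAG: z -> x, z -> y, x -> y, so pa(x) = {z}.
p_z = np.array([0.6, 0.4])
p_x_given_z = np.array([[0.7, 0.3],
                        [0.2, 0.8]])
p_y_given_zx = np.array([[[0.9, 0.1],
                          [0.5, 0.5]],
                         [[0.6, 0.4],
                          [0.1, 0.9]]])   # axes (z, x, y)

x_prime = 1

# Ground truth: truncated factorization, marginalized down to y.
p_y_do = np.einsum('z,zy->y', p_z, p_y_given_zx[:, x_prime, :])

# Observational joint p(z, x, y) via the Markov factorization.
joint = np.einsum('z,zx,zxy->zxy', p_z, p_x_given_z, p_y_given_zx)

# Parental adjustment: sum_t p(y | x', t) p(t), with t = pa(x) = z here,
# where both factors are read off the observational joint.
p_t = joint.sum(axis=(1, 2))                                        # p(t)
p_y_given_xt = joint[:, x_prime, :] / joint[:, x_prime, :].sum(axis=1, keepdims=True)
p_y_adjust = np.einsum('t,ty->y', p_t, p_y_given_xt)

assert np.allclose(p_y_do, p_y_adjust)   # the two expressions agree
```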
The back-door criterion
We have a set of variables \(z\) that fulfills the back-door criterion: (1) they are non-descendants of \(x_i\), and (2) they block all back-door paths from \(x_i\) to \(y\). A back-door path is a path from \(x_i\) to \(y\) that starts with an edge pointing into \(x_i\).
We want an expression for the interventional distribution \(p(y \vert \hat{x}'_i)\) in terms of the observational distributions \(p(y \vert x_i, z)\) and \(p(z)\).
Take the parental adjustment formula above and expand each term over \(z\), conditioning on it.
\[p(y \vert \hat{x}'_i) = \sum_{t,z} p(y \vert x_i', z, t)p(z\vert{}x_i,t)p(t)\]Since \(z\) is a back-door adjustment set, its members are non-descendants of \(x_i\), and by the local Markov property \(x_i\) is independent of its non-descendants given its parents: \(x_i \perp\!\!\perp z \,\vert\, t\). Therefore, \(p(z\vert{}x_i,t)=p(z\vert{}t)\).
\[p(y \vert \hat{x}'_i) = \sum_{t,z} p(y \vert x_i', z, t)p(z\vert{}t)p(t)\]Since \(z\) is a back-door adjustment set, it blocks all back-door paths from \(x_i\) to \(y\), which gives \(y \perp\!\!\perp t \,\vert\, x_i, z\). Therefore, \(p(y \vert x_i', z, t) = p(y \vert x_i', z)\).
\[p(y \vert \hat{x}'_i) = \sum_{t,z} p(y \vert x_i', z)p(z\vert{}t)p(t)\]Finally, perform the summation over \(t\), using \(\sum_t p(z\vert{}t)p(t) = p(z)\).
\[p(y \vert \hat{x}'_i) = \sum_{z} p(y \vert x_i', z)p(z)\]This is the interventional distribution of \(y\) when adjusting for the back-door set \(z\).
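Below is a small numerical check of this formula (again a sketch with invented tables), this time on a four-node DAG \(u \to t \to x \to y\) with \(u \to y\), where the parents are \(\text{pa}(x) = \{t\}\) but we adjust for the back-door set \(z = \{u\}\), which blocks the single back-door path \(x \leftarrow t \leftarrow u \to y\):

```python
import numpy as np
rng = np.random.default_rng(0)

def random_cpt(*shape):
    """Random conditional probability table, normalized over the last axis."""
    cpt = rng.random(shape)
    return cpt / cpt.sum(axis=-1, keepdims=True)

# Hypothetical binary DAG: u -> t, t -> x, x -> y, u -> y.
p_u = random_cpt(2)                 # p(u)
p_t_given_u = random_cpt(2, 2)      # p(t | u), axes (u, t)
p_x_given_t = random_cpt(2, 2)      # p(x | t), axes (t, x)
p_y_given_ux = random_cpt(2, 2, 2)  # p(y | u, x), axes (u, x, y)

# Observational joint p(u, t, x, y) via the Markov factorization.
joint = np.einsum('u,ut,tx,uxy->utxy', p_u, p_t_given_u, p_x_given_t, p_y_given_ux)

x_prime = 1

# Ground truth: truncated factorization (drop p(x | t)), marginalized to y.
p_y_do = np.einsum('u,ut,uy->y', p_u, p_t_given_u, p_y_given_ux[:, x_prime, :])

# Back-door adjustment with z = u, using only observational quantities.
p_z = joint.sum(axis=(1, 2, 3))                           # p(u)
p_uxy = joint.sum(axis=1)                                 # p(u, x, y)
p_y_given_xz = p_uxy / p_uxy.sum(axis=-1, keepdims=True)  # p(y | u, x)
p_y_backdoor = np.einsum('z,zy->y', p_z, p_y_given_xz[:, x_prime, :])

assert np.allclose(p_y_do, p_y_backdoor)  # back-door formula matches do()
```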
We can also massage the above expression again to get an IPW expression back, which is suitable in some cases: the indicator collapses a sum over \(x_i\) to the single term \(x_i = x'\), where \(p(y, x_i, z)/p(x_i \vert z) = p(y \vert x_i', z)p(z)\), so
\[p(y \vert x_i', z)p(z) = \sum_{x_i}\frac{1\{x_i=x'\}}{p(x_i\vert{}z)}p(y,x_i,z)\]and therefore
\[p(y \vert \hat{x}'_i) = \sum_{x_i,z}\frac{1\{x_i=x'\}}{p(x_i\vert{}z)}p(y,x_i,z)\]
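Finally, a Monte Carlo sketch of the estimator this identity suggests (same hypothetical three-node DAG as before; sample size and seed are arbitrary): draw observational samples and average the indicator of \(y\) weighted by \(1\{x_i = x'\}/p(x_i \vert z)\):

```python
import numpy as np
rng = np.random.default_rng(0)

# Same hypothetical 3-node binary DAG: z -> x, z -> y, x -> y.
p_z = np.array([0.6, 0.4])
p_x_given_z = np.array([[0.7, 0.3],
                        [0.2, 0.8]])
p_y_given_zx = np.array([[[0.9, 0.1],
                          [0.5, 0.5]],
                         [[0.6, 0.4],
                          [0.1, 0.9]]])   # axes (z, x, y)

n, x_prime = 200_000, 1

# Draw observational samples ancestrally: z, then x | z, then y | z, x.
z = rng.choice(2, size=n, p=p_z)
x = (rng.random(n) < p_x_given_z[z, 1]).astype(int)
y = (rng.random(n) < p_y_given_zx[z, x, 1]).astype(int)

# IPW: keep samples with x = x', weighted by the inverse propensity p(x' | z).
w = (x == x_prime) / p_x_given_z[z, x]
p_y_do_hat = np.array([(w * (y == k)).mean() for k in range(2)])

# Exact answer from the back-door adjustment formula, for comparison.
p_y_do = np.einsum('z,zy->y', p_z, p_y_given_zx[:, x_prime, :])
print(p_y_do_hat, p_y_do)   # agree up to Monte Carlo error
```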