The independent complement of a random variable

This post came to be while I was reading arxiv:2008.01883. The article uses some definitions and terms from probability theory that I needed to revise, and in the process I wrote some notes for my own reference. The text below is those notes.

At first, I wanted to title the post “Who has really read books like Rick Durrett. Probability: Theory and Examples. Thomson, 2019.”, since that is the book the article refers to. I have instead read sections from doi:10.1007/978-1-4614-4708-5, Probability: A Graduate Course, by Allan Gut.

I will refer to results in this book in the form Defn 1.2.3, meaning Definition 2.3 in Chapter 1. I quote very freely, so check the reference if you want to be sure to get it right!

What is a \(\sigma\)-algebra, a measurable function and a random variable?

Defn 1.2.2: Given a set \(\Omega\), we say that \(\mathcal F \subset \mathfrak P(\Omega)\)¹ is a \(\sigma\)-algebra if it fulfils a set of rules. It has to be closed under complements (i.e. \(A \in \mathcal F \Rightarrow A^c \in \mathcal F\)) and countable unions (i.e. \(A_k \in \mathcal F \Rightarrow \bigcup_{k=1}^{\infty} A_k \in \mathcal F\)). It also has to contain the full set, i.e. \(\Omega \in \mathcal F\). Often the universal set \(\Omega\) is implied by the context and dropped from the notation. If a set \(A \in \mathcal F\), we say that \(A\) is measurable.

Defn 1.2.3: Given \(\mathcal A \subset \mathfrak P(\Omega)\), there is a smallest \(\sigma\)-algebra containing \(\mathcal A\). We call this the \(\sigma\)-algebra generated by \(\mathcal A\), and denote it \(\sigma(\mathcal A)\). We call \(\mathcal A\) a generating set.
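
Since everything below happens on finite sample spaces, Defn 1.2.3 can be made concrete in code: start from the generating set and close it under complements and unions until nothing new appears. Here is a minimal Python sketch (the function name is my own; on a finite \(\Omega\), countable unions reduce to finite ones, so closing under pairwise unions suffices):

```python
from itertools import combinations

def generate_sigma_algebra(omega, generators):
    """Smallest sigma-algebra on the finite set `omega` containing `generators`."""
    omega = frozenset(omega)
    family = {frozenset(), omega} | {frozenset(g) for g in generators}
    changed = True
    while changed:
        changed = False
        for a in list(family):
            if omega - a not in family:          # close under complements
                family.add(omega - a)
                changed = True
        for a, b in combinations(list(family), 2):
            if a | b not in family:              # close under (finite) unions
                family.add(a | b)
                changed = True
    return family

sigma = generate_sigma_algebra("ABC", [{"A"}])
for s in sorted(sigma, key=lambda a: (len(a), sorted(a))):
    print(set(s))
# set(), {'A'}, {'B', 'C'}, {'A', 'B', 'C'}
```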

Defn 1.3.1: Take some set \(\Omega\) and call it a sample space. Introduce a \(\sigma\)-algebra \(\mathcal F\) on \(\Omega\) and call the elements \(A_k \in \mathcal F\) events. Define a probability measure \(P: \mathcal F \to \mathbb R\) so that \(P(A)\geq 0\), \(P(\Omega)=1\) and \(P(\bigcup_{n=1}^{\infty}A_n) = \sum_{n=1}^{\infty}P(A_n)\) for pairwise disjoint events \(A_n\). Call \((\Omega,\mathcal F,P)\) a probability space or probability triple. Notice how the last condition needs \(\mathcal F\) to be closed under countable unions, so the definition of the \(\sigma\)-algebra is natural.

Defn 2.1.1: A random variable \(X\) is a real-valued measurable function. That means that the inverse image of any Borel set is measurable, i.e. \(\forall A \in \mathcal B : X^{-1}(A) := \{\omega \in \Omega \mid X(\omega) \in A \} \in \mathcal F\).

What is a generated \(\sigma\)-algebra \(\sigma(M)\) and what is this membership \(Z \in \sigma(M)\)?

Defn 1.1.4: For a random variable \(M: \Omega \to \mathbb R\), we can generate a \(\sigma\)-algebra \(\sigma(M)\). It is the \(\sigma\)-algebra generated by the inverse images of the Borel sets.

Defn 1.1.5: For a set of random variables \(M = \{M_k\}_k\), we define \(\sigma(M) = \sigma\left(\cup_k \sigma(M_k)\right)\). It is thus the \(\sigma\)-algebra generated by the union of the \(\sigma\)-algebras of the individual random variables.

Example 1: \(\Omega = \{A,B,C\}\). \(\mathcal F = \mathfrak P(\Omega)\). \(P\) is a uniform measure. \((\Omega,\mathcal F,P)\) is a probability triple.

Let \(M_0\) be the function \(M_0(A)=M_0(B)=M_0(C)=1\), let \(M_1\) be given by \(M_1(A)=M_1(B)=1,\ M_1(C)=2\), and let \(M=\{M_0,M_1\}\).

The generated \(\sigma\)-algebra of \(M_0\) is \(\{\Omega,\emptyset\}\). The generated \(\sigma\)-algebra of \(M_1\) is \(\{\Omega,\emptyset,\{A,B\},\{C\}\}\). The \(\sigma\)-algebra generated by their union is \(\sigma(M)=\{\Omega,\emptyset,\{A,B\},\{C\}\} \neq \mathcal F\).
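
On a finite sample space there is a convenient shortcut for computing these: \(\sigma(M)\) consists of exactly the unions of the “atoms” of \(M\), the blocks of points on which every variable in \(M\) is constant. A sketch that reproduces Example 1 (the helper names are my own):

```python
from itertools import chain, combinations

def atoms(omega, rvs):
    """Partition of finite `omega`: two points fall in the same block
    iff every random variable in `rvs` agrees on them."""
    key = lambda w: tuple(rv[w] for rv in rvs)
    parts = {}
    for w in omega:
        parts.setdefault(key(w), set()).add(w)
    return list(parts.values())

def sigma(omega, rvs):
    """sigma(M) on a finite space: all unions of the atoms of M."""
    ats = atoms(omega, rvs)
    subsets = chain.from_iterable(combinations(ats, r) for r in range(len(ats) + 1))
    return {frozenset().union(*s) for s in subsets}

omega = {"A", "B", "C"}
M0 = {"A": 1, "B": 1, "C": 1}
M1 = {"A": 1, "B": 1, "C": 2}

print(sigma(omega, [M0]))    # just the empty set and Omega
print(sigma(omega, [M1]))    # adds frozenset({'A', 'B'}) and frozenset({'C'})
print(sigma(omega, [M0, M1]) == sigma(omega, [M1]))   # True: sigma(M) is not the power set
```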

Next we turn to the definition of \(Z \in \sigma(M)\), where \(Z\) is a random variable. This definition is not from Gut’s book, but from the article in question. If a random variable \(Z\) is measurable with respect to a generated \(\sigma\)-algebra \(\sigma(M)\), where \(M\) is a set of random variables, then we use the notation \(Z \in \sigma(M)\).

Example 1 (cont’d). Let \(Z = M_0 + 4 = 5\). To make it a bit more rigorous, we define it as \(Z: (\Omega,\,\mathcal F) \to (\mathbb R,\, \mathcal B)\)² with \(Z: \omega \mapsto M_0(\omega)+4\). It is necessarily measurable with respect to the \(\sigma\)-algebra \(\sigma(M_0)\). Thus, \(Z \in \sigma(M_0) \subset \sigma(M)\).

We can easily generalize from Example 1: if \(M_0:\Omega \to A\) is a random variable (i.e. a measurable function) and \(\tilde Z: A \to B\) is a measurable function, then \(Z = \tilde Z \circ M_0\) is a random variable, and \(Z \in \sigma(M_0)\). This argument suggests one direction of the implication in the footnote of arxiv:2008.01883: “For those not familiar with this notation, please identify \(\sigma (M)\) as \(\{Z;Z = g(M)\}\) for some function \(g\) in the main part of this manuscript”.
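
On a finite space, the membership \(Z \in \sigma(M)\) can be tested mechanically: since \(\sigma(M)\) is the unions of the atoms of \(M\), a variable \(Z\) is \(\sigma(M)\)-measurable exactly when it is constant on each atom. A small sketch continuing Example 1 (the helper name is my own):

```python
omega = {"A", "B", "C"}
M0 = {"A": 1, "B": 1, "C": 1}
M1 = {"A": 1, "B": 1, "C": 2}

def measurable_wrt(z, ms):
    """Finite-space test for z in sigma(ms): sigma(ms) is the unions of the
    atoms of ms, so z is measurable iff z is constant on every atom, i.e.
    z agrees on any two points where all variables in ms agree."""
    key = lambda w: tuple(m[w] for m in ms)
    return all(z[w1] == z[w2]
               for w1 in omega for w2 in omega if key(w1) == key(w2))

Z = {w: M0[w] + 4 for w in omega}   # Z = M0 + 4 = 5 on all of Omega
print(measurable_wrt(Z, [M0]))      # True:  Z in sigma(M0)
print(measurable_wrt(M1, [M0]))     # False: M1 is not in sigma(M0)
```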

My intuition is: “Knowing an outcome \(Z(\omega)\) does not say more about which element \(\omega \in \Omega\) was picked than knowing all the outcomes of the variables in \(M\).” Another way I think about it: “Fix some \(\omega\). Pick some measurable set \(\mathcal Z\) so that \(Z(\omega) \in \mathcal Z\). It is then always possible to pick measurable sets \(\mathcal{M}_k\) so that \(M_k(\omega) \in \mathcal{M}_k\) for all \(k\), and so that \(\bigcap_k M_k^{-1}(\mathcal M_k) \subseteq Z^{-1}(\mathcal Z)\).” We can think of the collection of random variables \(M\) as having at least the same “resolution” as \(Z\) for identifying elements of \(\Omega\).

The definition implies that \(\sigma(Z) \subset \sigma(M)\), or in English: every event in \(\sigma(Z)\) is an event in \(\sigma(M)\), but there may be events in \(\sigma(M)\) that are not events in \(\sigma(Z)\).

As a side note, we can interpret this subset relation as a natural filtration \(\sigma(Z) \subset \sigma(M) \subset \mathcal F\). See Defn 10.2.1.

What is independence of random variables?

Defn 1.4.1: A finite set of events \(\{A_k\}_{k=1}^{n}\) is independent if \(P(\bigcap_{k \in J} A_k) = \prod_{k \in J}P(A_k)\) for every index subset \(J \subseteq \{1,\dots,n\}\).
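
The requirement over every index subset matters: pairwise independence does not imply independence of the whole collection. A quick exact-arithmetic check of the classic counterexample on \(\Omega = \{1,2,3,4\}\) with the uniform measure (the choice of events is mine):

```python
from fractions import Fraction
from itertools import combinations

omega = {1, 2, 3, 4}
P = lambda s: Fraction(len(s), len(omega))   # uniform measure

events = {"A": {1, 2}, "B": {1, 3}, "C": {1, 4}}

for r in (2, 3):
    for names in combinations(sorted(events), r):
        inter = set.intersection(*(events[n] for n in names))
        prod = Fraction(1)
        for n in names:
            prod *= P(events[n])
        print(names, P(inter) == prod)
# every pair: True; ('A', 'B', 'C'): False, so pairwise but not mutually independent
```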

Defn 1.4.4: Take a finite or infinite collection of collections of events \(\{\mathcal A_k \}_{k\in K}\), with \(K\) being some abstract index set. Each \(\mathcal A_k\) contains events \(A_{i}\). Now pick some natural number \(0<j \in \mathbb N\) and pick \(j\) collections \(\mathcal A_{k_1},\dots,\mathcal A_{k_j}\). Form a new collection of events \(\mathcal A = \{A_{k_1},\dots, A_{k_j}\}\) so that \(A_{k_i} \in \mathcal A_{k_i}\) for each \(i=1,\dots,j\). If all events in \(\mathcal A\) are independent in the sense of Defn 1.4.1, for all such choices of events and for all such \(j\), we say that all the collections in \(\{\mathcal A_k \}_{k \in K}\) are independent of each other. Notice that a \(\sigma\)-algebra is a collection of events, so this definition applies to independence between \(\sigma\)-algebras.

Thm 2.10.5: Two random variables \(X\) and \(Y\) are independent if and only if their generated \(\sigma\)-algebras \(\sigma(X)\) and \(\sigma(Y)\) are independent in the sense of Defn 1.4.4.

Example 2. Here comes an example of independent random variables! Consider the setting where \(\Omega = \{A,B,C,D\}\), \(\mathcal F\) is the power set of \(\Omega\), and \(P\) is uniform.

\[X(A)=X(B)=1,\quad X(C)=X(D)=2\] \[Y(A)=Y(C)=1,\quad Y(B)=Y(D)=2\] \[\sigma(X) = \{\Omega,\emptyset,\{A,B\},\{C,D\}\}\] \[\sigma(Y) = \{\Omega,\emptyset,\{A,C\},\{B,D\}\}\]

\(X\) and \(Y\) are independent random variables, since \(P(A_x\cap A_y)=P(A_x)P(A_y)\) for all pairs \(A_x \in \sigma(X)\) and \(A_y \in \sigma(Y)\).

(Figure: an illustration of Example 2. Two independent random variables \(X\) and \(Y\) defined on the sample space \(\{A,B,C,D\}\); the \(\sigma\)-algebras generated by the respective random variables are independent.)
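
Thm 2.10.5 can be verified mechanically for this example: enumerate \(\sigma(X)\) and \(\sigma(Y)\) and test the product rule for every pair of events. A sketch with exact fractions (the helper names are my own):

```python
from fractions import Fraction
from itertools import chain, combinations

omega = {"A", "B", "C", "D"}
P = lambda s: Fraction(len(s), len(omega))   # uniform measure

X = {"A": 1, "B": 1, "C": 2, "D": 2}
Y = {"A": 1, "C": 1, "B": 2, "D": 2}

def sigma(rv):
    """sigma(rv) on a finite space: all unions of the preimage atoms of rv."""
    atoms = {}
    for w, v in rv.items():
        atoms.setdefault(v, set()).add(w)
    ats = list(atoms.values())
    subsets = chain.from_iterable(combinations(ats, r) for r in range(len(ats) + 1))
    return {frozenset().union(*s) for s in subsets}

independent = all(P(ax & ay) == P(ax) * P(ay)
                  for ax in sigma(X) for ay in sigma(Y))
print(independent)   # True: sigma(X) and sigma(Y) are independent
```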

If two random variables are independent, we write \(X \perp Y\).

What is an independent complement?

I have also taken this definition from the article, as I understood it. If there is a misunderstanding here, please let me know! Let \(M\) be some set of random variables, and let \(Z\) be measurable with respect to \(M\), i.e. \(Z\in\sigma(M)\). We call \(Z^c\) an independent complement of \(Z\) in \(M\) if it is a random variable such that \(Z^c \perp Z\) and also \(M \in \sigma(Z,Z^c)\).

Equivalently, the two measurability conditions read \(\sigma(Z)\subset \sigma(M)\) and \(\sigma(M) \subset \sigma(Z,Z^c)\).

Example 2 (cont’d). We directly see that if \(M=\{X,Y\}\), then \(Y\) is an independent complement \(X^c\) of \(X\) in \(M\).
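
Both requirements can be tested concretely here, and it is instructive to include a failing candidate: a constant random variable \(W\) is independent of \(X\), yet \(Y \notin \sigma(X,W)\), so \(W\) is not an independent complement of \(X\) in \(M\). A sketch (\(W\) and the helper are my own additions):

```python
omega = {"A", "B", "C", "D"}
X = {"A": 1, "B": 1, "C": 2, "D": 2}
Y = {"A": 1, "C": 1, "B": 2, "D": 2}
W = {w: 0 for w in omega}   # constant, hence trivially independent of X

def measurable_wrt(z, ms):
    """Finite-space test for z in sigma(ms): z must agree on any two
    points where every variable in ms agrees."""
    key = lambda w: tuple(m[w] for m in ms)
    return all(z[w1] == z[w2]
               for w1 in omega for w2 in omega if key(w1) == key(w2))

# Y is an independent complement of X in M = {X, Y}:
print(all(measurable_wrt(m, [X, Y]) for m in (X, Y)))   # True (trivially): M in sigma(X, Y)

# W is independent of X, but fails the second condition:
print(measurable_wrt(Y, [X, W]))                        # False: Y not in sigma(X, W)
```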

Example 3. Take \(\Omega = \mathbb R^n\), let \(\mathcal F\) be the Borel algebra \(\mathcal B^n\), and let \(P\) be a multivariate standard normal distribution. Define the \(n\) random variables that project onto the axes: \(X_k: (x_j)_{j=1}^{n} \mapsto x_k\).

Let \(M=\{X_1,X_2\}\) be the collection of random variables that project onto the first coordinate plane. In this case, \(X_2\) is a possible \(X_1^c\) in \(M\). Verify by applying the definitions!

We also find \(X_3 \perp X_1\), but since \(M \not \in \sigma(X_1,X_3)\), it is not a candidate for an independent complement in \(M\).

Introduce instead \(Y: (x_j)_{j=1}^{n} \mapsto (x_2,x_3)\). I claim that \(Y\) is an independent complement of \(X_1\) in \(M\). We have indeed that \(X_1 \perp (X_2,X_3)\), and also that \(\sigma(M) = \sigma(X_1,X_2) \subset \sigma(X_1,X_2,X_3) = \sigma(X_1,Y) = \sigma(X_1,X_1^c)\).
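
A Monte Carlo sanity check of the independence claim (a sketch, not a proof; the test box is an arbitrary choice of mine): under a standard normal on \(\mathbb R^3\), the probability of \(\{X_1 \in I_1\} \cap \{(X_2,X_3) \in I_2 \times I_3\}\) should factor into the two marginal probabilities.

```python
import numpy as np

rng = np.random.default_rng(0)
pts = rng.standard_normal((1_000_000, 3))   # samples of (X1, X2, X3)

# Events: X1 in (0, 1) and Y = (X2, X3) in (0, 1) x (-1, 0).
a = (pts[:, 0] > 0) & (pts[:, 0] < 1)
b = (pts[:, 1] > 0) & (pts[:, 1] < 1) & (pts[:, 2] > -1) & (pts[:, 2] < 0)

print(np.mean(a & b))            # estimated P(both events)
print(np.mean(a) * np.mean(b))   # product of the marginals; agrees up to MC error
```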

This example shows that an independent complement is not unique, and that it may contain extra information not contained in \(M\). I think that is a bit strange, but that is what the definition allows…

Open questions

What is the connection between entropy, mutual information, and generated \(\sigma\)-algebras? It is clear that if \(X\in\sigma(Y)\) and \(Y\not\in\sigma(X)\), then \(Y\) carries more “information” than \(X\). But it is not clear to me how it all connects… Entropy and mutual information are defined on random variables with a common (joint) distribution/measure, whereas KL divergence is defined on different measures on the same outcome space.

If you know of any reading that disentangles all this, I would be very happy!

  1. \(\mathfrak P(\Omega )\) denotes the power set of \(\Omega\), which is a \(\sigma\)-algebra. 

  2. We use \(\mathcal B\) to denote the Borel sets, which form a \(\sigma\)-algebra. 
