Using multiplicative notation, we could have written
- $P(A \cap B) = P(A|B) P(B)$
instead.
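For a concrete illustration, take, say, $\Omega = \{1, \ldots, 6\}$ with the uniform distribution (a fair die), $A = \{2\}$ and $B = \{2, 4, 6\}$. Then
- $P(A|B) = \frac{P(A \cap B)}{P(B)} = \frac{1/6}{1/2} = \frac{1}{3}$,
and the multiplicative form is confirmed by $P(A|B) P(B) = \frac{1}{3} \cdot \frac{1}{2} = \frac{1}{6} = P(A \cap B)$.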
This definition is intuitive, since the following lemmata are satisfied:
Lemma 3.2:
Let $(\Omega, \mathcal{F}, P)$ be a probability space and $B \in \mathcal{F}$ with $P(B) > 0$. Then
- $P(\Omega|B) = 1$.
Lemma 3.3:
Let $(\Omega, \mathcal{F}, P)$ be a probability space and $B \in \mathcal{F}$ with $P(B) > 0$. If $A_1, A_2, \ldots \in \mathcal{F}$ are pairwise disjoint, then
- $P\left( \biguplus_{n \in \mathbb{N}} A_n \middle| B \right) = \sum_{n \in \mathbb{N}} P(A_n|B)$.
Each lemma follows directly from the definition and the axioms holding for $P$ (definition 2.1).
From these lemmata, we obtain that for each $B \in \mathcal{F}$ with $P(B) > 0$, $(\Omega, \mathcal{F}, P(\cdot|B))$ satisfies the defining axioms of a probability space (definition 2.1).
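As a brief illustration of the resulting probability space, consider again the fair die with $B = \{2, 4, 6\}$: conditioning on $B$ yields
- $P(\{2\}|B) = P(\{4\}|B) = P(\{6\}|B) = \frac{1/6}{1/2} = \frac{1}{3}$,
i.e. the uniform distribution on $B$; these conditional probabilities are nonnegative and sum to $1$, in accordance with the axioms of definition 2.1.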
With this definition, we have the following theorem:
Theorem 3.4:
Let $(\Omega, \mathcal{F}, P)$ be a probability space and let $A_1, \ldots, A_n \in \mathcal{F}$ with $P(A_1 \cap \cdots \cap A_{n-1}) > 0$. Then
- $P(A_1 \cap \cdots \cap A_n) = P(A_1) P(A_2|A_1) P(A_3|A_1 \cap A_2) \cdots P(A_n|A_1 \cap \cdots \cap A_{n-1})$.
Proof:
From the definition, we have
- $P(A \cap B) = P(A|B) P(B)$
for all $A, B \in \mathcal{F}$ with $P(B) > 0$. Thus, as $\mathcal{F}$ is an algebra and hence closed under finite intersections, we obtain by induction:
- $P(A_1 \cap \cdots \cap A_n) = P(A_n|A_1 \cap \cdots \cap A_{n-1}) P(A_1 \cap \cdots \cap A_{n-1}) = \cdots = P(A_1) \prod_{k=2}^{n} P(A_k|A_1 \cap \cdots \cap A_{k-1})$.
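For a concrete example, consider drawing twice without replacement from a deck of 52 cards, and let $A_1$ be the event that the first card is an ace and $A_2$ the event that the second card is an ace. Then
- $P(A_1 \cap A_2) = P(A_1) P(A_2|A_1) = \frac{4}{52} \cdot \frac{3}{51} = \frac{1}{221}$.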
Theorem 3.5 (Theorem of the total probability):
Let $(\Omega, \mathcal{F}, P)$ be a probability space, and assume
- $\Omega = \biguplus_{n \in \mathbb{N}} B_n$
(note that by using the $\biguplus$-notation, we assume that the union is disjoint), where the $B_n$ are all contained within $\mathcal{F}$ and satisfy $P(B_n) > 0$. Then for all $A \in \mathcal{F}$
- $P(A) = \sum_{n \in \mathbb{N}} P(A|B_n) P(B_n)$.
Proof:
- $P(A) = P(A \cap \Omega) = P\left(A \cap \biguplus_{n \in \mathbb{N}} B_n\right) = P\left(\biguplus_{n \in \mathbb{N}} (A \cap B_n)\right) = \sum_{n \in \mathbb{N}} P(A \cap B_n) = \sum_{n \in \mathbb{N}} P(A|B_n) P(B_n)$,
where we used that the sets $A \cap B_n$ are all disjoint, the distributive law of the algebra, the $\sigma$-additivity of $P$ and the multiplicative form of the definition of conditional probability.
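As an illustration, take, say, a box containing a fair coin and a two-headed coin; one of the two is picked uniformly at random and flipped. With $B_1$ = 'the fair coin was picked', $B_2$ = 'the two-headed coin was picked' and $A$ = 'the flip shows heads', the theorem (in its finite version) gives
- $P(A) = P(A|B_1) P(B_1) + P(A|B_2) P(B_2) = \frac{1}{2} \cdot \frac{1}{2} + 1 \cdot \frac{1}{2} = \frac{3}{4}$.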
Theorem 3.6 (Retarded Bayes' theorem):
Let $(\Omega, \mathcal{F}, P)$ be a probability space and $A, B \in \mathcal{F}$ with $P(A), P(B) > 0$. Then
- $P(B|A) = \frac{P(A|B) P(B)}{P(A)}$.
Proof:
- $P(B|A) = \frac{P(A \cap B)}{P(A)} = \frac{P(A \cap B)}{P(B)} \cdot \frac{P(B)}{P(A)} = \frac{P(A|B) P(B)}{P(A)}$.
This formula may look somewhat abstract, but it actually has a nice geometrical meaning. Suppose we are given two sets $A, B \in \mathcal{F}$, already know $P(A|B)$, $P(A)$ and $P(B)$, and want to compute $P(B|A)$. The situation is depicted in the following picture:
We know the ratio of the size of $A \cap B$ to that of $B$, but what we actually want to know is how $A \cap B$ compares to $A$. Hence, we change the 'comparand' by multiplying with $P(B)$, the old reference magnitude, and dividing by $P(A)$, the new reference magnitude.
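For a quick numerical check with the fair die from above, now take $A = \{2, 4, 6\}$ and $B = \{1, 2, 3, 4\}$, so that $P(A) = \frac{1}{2}$, $P(B) = \frac{2}{3}$ and $P(A|B) = \frac{2}{4} = \frac{1}{2}$. The theorem yields
- $P(B|A) = \frac{P(A|B) P(B)}{P(A)} = \frac{\frac{1}{2} \cdot \frac{2}{3}}{\frac{1}{2}} = \frac{2}{3}$,
which agrees with the direct computation $P(B|A) = \frac{P(A \cap B)}{P(A)} = \frac{2/6}{1/2} = \frac{2}{3}$.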
Theorem 3.7 (Bayes' theorem):
Let $(\Omega, \mathcal{F}, P)$ be a probability space, and assume
- $\Omega = \biguplus_{n \in \mathbb{N}} B_n$,
where the $B_n$ are all in $\mathcal{F}$ and satisfy $P(B_n) > 0$. Then for all $k \in \mathbb{N}$ and all $A \in \mathcal{F}$ with $P(A) > 0$
- $P(B_k|A) = \frac{P(A|B_k) P(B_k)}{\sum_{n \in \mathbb{N}} P(A|B_n) P(B_n)}$.
Proof:
From the basic version of the theorem (theorem 3.6), we obtain
- $P(B_k|A) = \frac{P(A|B_k) P(B_k)}{P(A)}$.
Using the formula of total probability (theorem 3.5) for the denominator, we obtain
- $P(B_k|A) = \frac{P(A|B_k) P(B_k)}{\sum_{n \in \mathbb{N}} P(A|B_n) P(B_n)}$.
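To illustrate, continue the two-coin example from above: if the flip shows heads, the probability that the fair coin was picked is
- $P(B_1|A) = \frac{P(A|B_1) P(B_1)}{P(A|B_1) P(B_1) + P(A|B_2) P(B_2)} = \frac{\frac{1}{2} \cdot \frac{1}{2}}{\frac{1}{2} \cdot \frac{1}{2} + 1 \cdot \frac{1}{2}} = \frac{1}{3}$.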