Bayesian credible interval and posterior distribution
When looking at the results of surveys or opinion polls, one may wonder whether enough people took part to give an accurate picture of the whole population. Bayesian inference offers an elegant way to quantify the error due to the limited number of people taking part in a poll (the sampling error). Other errors, which arise because the sample may not be drawn completely at random from the population, cannot be quantified so easily.

As a rough guide, the table below shows the sampling error as the half-width of a 95% credible interval, computed for the case where half of the respondents give the same answer. For example, if an exit poll uses 10 000 randomly chosen voters and finds that 3500 have voted for a specific party, then the party will have scored 35%±1% overall (with 95% certainty).
number of polled people | 10 | 100 | 1000 | 10 000 | 100 000 | 1 000 000 |
---|---|---|---|---|---|---|
sampling error ± | 27% | 10% | 3% | 1% | 0.3% | 0.1% |
- Source: computed in R as follows:

      $ R
      > n=10; k=0.5*n; 0.5*(qbeta(0.975,k+1,n-k+1)-qbeta(0.025,k+1,n-k+1))
      [1] 0.2662064
      ...
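The same numbers can be cross-checked without R. The following is a minimal sketch in Python that approximates the posterior on a grid instead of calling `qbeta`. The function name `credible_half_width` and the grid size are illustrative assumptions, not part of the original scripts.

```python
import math

def credible_half_width(n, k, level=0.95, grid=20_000):
    """Half-width of the equal-tailed credible interval for the 'yes' probability.

    Assumes a uniform prior, so the posterior is proportional to
    p**k * (1-p)**(n-k); quantiles are approximated on a grid.
    """
    step = 1.0 / grid
    # Cell midpoints avoid log(0) at p=0 and p=1.
    ps = [(i + 0.5) * step for i in range(grid)]
    # Work in log space so that large n does not underflow every weight to zero.
    log_w = [k * math.log(p) + (n - k) * math.log(1.0 - p) for p in ps]
    peak = max(log_w)
    weights = [math.exp(lw - peak) for lw in log_w]
    total = sum(weights)

    tail = (1.0 - level) / 2.0
    cumulative, lower, upper = 0.0, None, None
    for p, w in zip(ps, weights):
        cumulative += w / total
        if lower is None and cumulative >= tail:
            lower = p          # first grid point past the lower tail
        if cumulative >= 1.0 - tail:
            upper = p          # first grid point past the upper tail
            break
    return 0.5 * (upper - lower)

if __name__ == "__main__":
    for n in (10, 100, 1000, 10_000):
        print(n, round(credible_half_width(n, n // 2), 4))
```

The default grid is fine enough for polls of up to roughly 10 000 people. Larger polls give a much narrower posterior and would need a finer grid, or an exact Beta quantile function as in the R snippet.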
Opinion polls
- source: binomial.py
Vaccine efficacy
Vaccine efficacy is a measure of how well a vaccine protects from illness compared to not being vaccinated. It is defined as efficacy $= 1 - p_1/p_2$, where $p_1$ is the probability of falling ill when vaccinated and $p_2$ the probability of falling ill when not vaccinated.
- source: efficacy.py
Maths
$ \newcommand{\R}{\mathbb{R}} \newcommand{\N}{\mathbb{N}} \newcommand{\Co}{\mathrm{C}} \newcommand{\I}{\mathbb{I}} \newcommand{\dd}{\:\text{d}} \DeclareMathOperator{\Prob}{\mathbb{P}} $

Objective
We would like to estimate a parameter which cannot be directly observed, but we can perform an experiment whose outcome depends on the parameter. Bayesian inference works as follows:
- prior distribution: before the experiment make an assumption about the distribution of the unobservable parameter
- perform the experiment
- posterior distribution: update the prior distribution based on the outcome of the experiment using Bayes' theorem
Notation
- $\theta:\Omega\to\R^m$: random parameters (unobservable)
- $X:\Omega\to\R^n$: random variable/vector of the experiment
- $x\in\R^n$: specific observation of the random variable $X$
- $p\in\R^m$: specific realisation of the random parameter $\theta$
- $f_{X|\theta=p}:\R^n\to\R$: probability density function of $X$ given that the parameter value $\theta=p$ has been realised
Bayesian update
Bayes' theorem in terms of probability density functions can be stated as follows and is a direct result of the definition of conditional and joint densities:
\begin{equation}
\label{eq:bayes_update}
\boxed{
\underbrace{f_{\theta|X=x}(p)}_{\text{posterior}}
= \underbrace{f_{\theta}(p)}_{\text{prior}}
\underbrace{f_{X|\theta=p}(x)}_{\text{likelihood}}
\underbrace{\frac{1}{f_X(x)}}_{\text{normalising factor}}.
}
\end{equation}
- posterior: distribution of the unknown parameter $\theta$ after we have observed the experiment
- prior: assumption of the distribution of $\theta$ before the experiment (e.g. uniform)
- likelihood: definition of the experiment, i.e. how likely the observed outcome of the experiment is given a specific parameter $\theta=p$
- normalising factor: this effectively only ensures that the posterior density integrates to 1
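The update rule can be illustrated numerically. The sketch below evaluates prior times likelihood on an equally spaced grid and divides by a Riemann-sum approximation of $f_X(x)$. The helper name `bayes_update` and the example numbers (7 "yes" answers out of 10) are hypothetical and chosen only for illustration.

```python
import math

def bayes_update(grid, prior, likelihood):
    """Return (posterior values on the grid, normalising constant f_X(x)).

    `grid` is assumed to be equally spaced; integrals are approximated
    by a simple Riemann sum.
    """
    step = grid[1] - grid[0]
    unnormalised = [prior(p) * likelihood(p) for p in grid]
    evidence = sum(unnormalised) * step          # approximates f_X(x)
    posterior = [u / evidence for u in unnormalised]
    return posterior, evidence

# Hypothetical experiment: k = 7 "yes" answers out of n = 10.
n, k = 10, 7
grid = [(i + 0.5) / 1000 for i in range(1000)]   # midpoints of cells in [0, 1]
posterior, evidence = bayes_update(
    grid,
    prior=lambda p: 1.0,                         # uniform prior
    likelihood=lambda p: math.comb(n, k) * p**k * (1 - p)**(n - k),
)
print(evidence)                                  # approximation of f_X(k)
print(grid[posterior.index(max(posterior))])     # grid point of the posterior mode
```

Because the normalising factor does not depend on $p$, it only rescales the curve. The shape of the posterior is fully determined by the product of prior and likelihood.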
Example: Binomial distribution (opinion polls)
- experiment: perform a poll, i.e. ask a random sample of $n$ people a yes/no (or multiple-choice) question
- $\theta:\Omega\to[0,1]$: random parameter: probability of answering "yes"
- $X:\Omega\to\N$: random variable of the experiment: number of people in a poll answering "yes"
- prior: $f_{\theta}(p) = 1$, $\forall p\in[0,1]$
- likelihood: $f_{X|\theta=p}(k) = {n \choose k} p^k (1-p)^{n-k}$
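For this example the update can be carried out in closed form. As a worked step, using only the uniform prior and the binomial likelihood listed above, the normalising factor is
\begin{equation}
f_X(k) = \int_0^1 {n \choose k} p^k (1-p)^{n-k} \dd p = {n \choose k} \frac{k!\,(n-k)!}{(n+1)!} = \frac{1}{n+1},
\end{equation}
so the posterior becomes
\begin{equation}
f_{\theta|X=k}(p) = (n+1) {n \choose k} p^k (1-p)^{n-k}, \quad p\in[0,1].
\end{equation}
This is the density of the Beta distribution with parameters $k+1$ and $n-k+1$, which matches the arguments `k+1` and `n-k+1` passed to `qbeta` in the R snippet earlier on this page.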
Example: 2D-Binomial distribution (vaccine efficacy)
- experiment: perform a trial, i.e. give $n_1$ randomly selected people a vaccine and $n_2$ randomly selected people a placebo
- $\theta:\Omega\to[0,1]^2$: random parameter: probabilities of becoming ill with and without a vaccine (over a specified time frame)
- $X:\Omega\to\N^2$: random variable of the experiment: number of people in the trial who became ill in the vaccine group and in the placebo group
- prior: $f_{\theta}(p_1,p_2) = 1$, $\forall p_1,p_2\in[0,1]$
- likelihood: $f_{X|\theta=(p_1,p_2)}(k_1,k_2) = {n_1 \choose k_1} p_1^{k_1} (1-p_1)^{n_1-k_1} {n_2 \choose k_2} p_2^{k_2} (1-p_2)^{n_2-k_2}$
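Since the prior is uniform and the likelihood factorises into a term in $p_1$ and a term in $p_2$, the two probabilities are independent under the posterior, each following a Beta distribution. A credible interval for the efficacy $1-p_1/p_2$ can then be estimated by sampling. The following is a minimal Monte Carlo sketch using only the Python standard library. The function name `efficacy_interval` and the trial counts are hypothetical and are not taken from `efficacy.py` or from any real trial.

```python
import random

def efficacy_interval(n1, k1, n2, k2, level=0.95, draws=20_000, seed=0):
    """Monte Carlo credible interval for efficacy = 1 - p1/p2.

    Assumes a uniform prior on (p1, p2), so that a posteriori
    p1 ~ Beta(k1+1, n1-k1+1) and p2 ~ Beta(k2+1, n2-k2+1), independently.
    Returns (lower bound, median, upper bound).
    """
    rng = random.Random(seed)
    samples = sorted(
        1.0 - rng.betavariate(k1 + 1, n1 - k1 + 1)
        / rng.betavariate(k2 + 1, n2 - k2 + 1)
        for _ in range(draws)
    )
    tail = (1.0 - level) / 2.0
    lower = samples[int(tail * draws)]
    median = samples[draws // 2]
    upper = samples[int((1.0 - tail) * draws) - 1]
    return lower, median, upper

# Hypothetical trial: 10 of 10 000 vaccinated and 100 of 10 000 placebo
# participants became ill.
print(efficacy_interval(10_000, 10, 10_000, 100))
```

Sampling avoids having to integrate the two-dimensional posterior over the region where $1-p_1/p_2$ lies below a given value. The price is a small random error that shrinks as `draws` grows.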