Bayesian confidence intervals and posterior distribution

When looking at the results of surveys or opinion polls, one may wonder whether enough people took part in the survey to give an accurate picture of the whole population. Bayesian inference offers an elegant way to quantify the error due to the limited number of people taking part in a poll (the sampling error). Other errors, which arise because the sample may not be chosen completely at random from the population, cannot be quantified so easily.
As a rough guide, the table below shows the sampling error for a 95% confidence interval. For example, if an exit poll uses 10 000 randomly chosen respondents and finds that 3500 of them voted for a specific party, then the party will have scored 35% ± 1% overall (with 95% certainty).
number of polled people | 10 | 100 | 1000 | 10 000 | 100 000 | 1 000 000
sampling error (±) | 27% | 10% | 3% | 1% | 0.3% | 0.1%
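The table values can be roughly reproduced with a few lines of Python. This is only a sketch under stated assumptions, not necessarily how the table was computed: it assumes a uniform prior (so the posterior is Beta(k+1, n-k+1)), takes the worst case k = n/2 unless k is given, and approximates the 95% interval by 1.96 posterior standard deviations. The function name `sampling_error` is made up for this illustration.

```python
import math

def sampling_error(n, k=None, z=1.96):
    """Approximate half-width of the 95% interval for a poll of n people.

    Assumption: uniform prior, so the posterior is Beta(k+1, n-k+1);
    the interval is approximated by z posterior standard deviations.
    """
    if k is None:
        k = n / 2  # worst case: the group of interest is half the sample
    a, b = k + 1, n - k + 1
    variance = a * b / ((a + b) ** 2 * (a + b + 1))  # variance of Beta(a, b)
    return z * math.sqrt(variance)

for n in (10, 100, 1000, 10_000, 100_000, 1_000_000):
    print(f"{n:>9}: +/- {100 * sampling_error(n):.2g}%")
```

For small samples the posterior is noticeably non-normal, so this approximation is only indicative there.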
The forms below calculate the exact confidence intervals and posterior distribution for opinion polls as well as vaccine efficacy studies.

Opinion polls

number of people surveyed in the group of interest: k
total number of people surveyed: n

Vaccine efficacy

Vaccine efficacy is a measure of how well the vaccine protects against illness compared to not being vaccinated. It is defined as $\text{efficacy}=1-p_1/p_2$, where $p_1$ is the probability of falling ill when vaccinated and $p_2$ is the probability of falling ill when not vaccinated.
number of people in the trial | ill | total
vaccine group | k1 | n1
placebo group | k2 | n2


$ \newcommand{\R}{\mathbb{R}} \newcommand{\N}{\mathbb{N}} \newcommand{\Co}{\mathrm{C}} \newcommand{\I}{\mathbb{I}} \newcommand{\dd}{\:\text{d}} \DeclareMathOperator{\Prob}{\mathbb{P}} $


We would like to estimate a parameter $\theta$ which cannot be observed directly, but we can perform an experiment whose outcome $X$ depends on the parameter. Bayesian inference works as follows:


Bayesian update

Bayes' theorem in terms of probability density functions can be stated as follows and is a direct result of the definition of conditional and joint densities: \begin{equation} \label{eq:bayes_update} \boxed{ \underbrace{f_{\theta|X=x}(p)}_{\text{posterior}} = \underbrace{f_{\theta}(p)}_{\text{prior}} \underbrace{f_{X|\theta=p}(x)}_{\text{likelihood}} \underbrace{\frac{1}{f_X(x)}}_{\text{normalising factor}}. } \end{equation} If nothing is known about the problem a priori, it is very common to choose a uniform prior. In this case the update simplifies to \begin{equation*} \underbrace{f_{\theta|X=x}(p)}_{\text{posterior}} = \underbrace{c(x)}_{\text{normalising factor}} \underbrace{f_{X|\theta=p}(x)}_{\text{likelihood}}. \end{equation*}
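The update rule can be sketched numerically by evaluating prior and likelihood on a grid and normalising their product. The code below is a minimal illustration, not part of the forms above: the function name `bayes_update`, the grid size, and the example data (3 successes in 10 trials) are all assumptions chosen for the sketch, and the normalising factor is approximated with the trapezoidal rule.

```python
def bayes_update(grid, prior, likelihood):
    """Return posterior density values on an evenly spaced grid.

    posterior(p) = prior(p) * likelihood(p) / f_X(x), where the normalising
    factor f_X(x) is approximated with the trapezoidal rule.
    """
    unnormalised = [pr * lk for pr, lk in zip(prior, likelihood)]
    step = grid[1] - grid[0]
    evidence = step * (sum(unnormalised)
                       - 0.5 * (unnormalised[0] + unnormalised[-1]))
    return [u / evidence for u in unnormalised]

# Hypothetical example: 3 successes in 10 trials, uniform prior on [0, 1].
m = 1000
grid = [i / m for i in range(m + 1)]
prior = [1.0] * len(grid)
# The binomial coefficient is constant in p and cancels in the normalisation.
likelihood = [p**3 * (1 - p) ** 7 for p in grid]
posterior = bayes_update(grid, prior, likelihood)
mode = grid[posterior.index(max(posterior))]
print(f"posterior mode: {mode}")
```

With a uniform prior the posterior is just the rescaled likelihood, so its mode coincides with the maximum-likelihood estimate.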

Example: Binomial distribution (opinion polls)

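Suppose that $k$ of the $n$ people surveyed belong to the group of interest. Given $\theta=p$, the count $X$ is binomially distributed, so the likelihood is $f_{X|\theta=p}(k)=\binom{n}{k}p^k(1-p)^{n-k}$. With a uniform prior it follows directly that the posterior is a Beta distribution with parameters $\alpha=k+1$, $\beta=n-k+1$: \begin{equation*} \boxed{ \begin{aligned} \text{prior} & & f_{\theta}(p) & = \I_{p\in[0,1]},\\ \text{posterior} & & f_{\theta|X=k}(p) & = c(k) \, p^k (1-p)^{n-k}, \quad p\in[0,1],\\ & & \left(\theta|X=k\right) & \sim \text{Beta}(k+1,n-k+1). \end{aligned} } \end{equation*}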
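An interval for the opinion-poll case can be read off the Beta(k+1, n-k+1) posterior. The following is a sketch using only the Python standard library; it is not the code behind the form above. It assumes an equal-tailed interval, integrates the unnormalised posterior $p^k(1-p)^{n-k}$ on a midpoint grid, and works in log space so that large $n$ does not underflow. The function name `beta_credible_interval` and the grid size `m` are illustrative choices.

```python
import math

def beta_credible_interval(k, n, level=0.95, m=200_000):
    """Equal-tailed interval for theta | X=k ~ Beta(k+1, n-k+1)."""
    # Midpoints of m equal cells; this avoids log(0) at p = 0 and p = 1.
    grid = [(i + 0.5) / m for i in range(m)]
    log_post = [k * math.log(p) + (n - k) * math.log(1 - p) for p in grid]
    peak = max(log_post)  # subtract the peak so exp() does not underflow
    weights = [math.exp(lp - peak) for lp in log_post]
    total = sum(weights)
    tail = (1 - level) / 2
    lower = upper = None
    cumulative = 0.0
    for p, w in zip(grid, weights):
        cumulative += w / total  # running approximation of the posterior CDF
        if lower is None and cumulative >= tail:
            lower = p
        if cumulative >= 1 - tail:
            upper = p
            break
    return lower, upper

# The exit-poll example: 3500 of 10 000 respondents.
low, high = beta_credible_interval(3500, 10_000)
print(f"95% interval: {100 * low:.1f}% to {100 * high:.1f}%")
```

The resolution of the result is limited by the grid spacing 1/m, which is fine for poll-sized samples but would need refining for very large $n$.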

Example: 2D-Binomial distribution (vaccine efficacy)

Let $k_1$ of the $n_1$ people in the vaccine group and $k_2$ of the $n_2$ people in the placebo group fall ill, with the two counts independently binomially distributed given $\theta=(\theta_1,\theta_2)=(p_1,p_2)$. With a uniform prior it follows directly that the posterior is the product of two independent Beta densities:
\begin{equation*} f_{\theta|X=(k_1,k_2)}(p_1,p_2) = c(k_1,k_2) \, p_1^{k_1} (1-{p_1})^{n_1-k_1} p_2^{k_2} (1-{p_2})^{n_2-k_2} , \quad p_1,p_2\in[0,1]. \end{equation*}
The efficacy, defined as $\omega:=1-\frac{\theta_1}{\theta_2}$, is then a quantity derived from $\theta$. Its density can be written as a one-dimensional integral but has no simple closed form in general. However, in the limit $n_1,n_2\to\infty$ it can be shown that the posterior simplifies to a closed-form expression:
\begin{equation*} \boxed{ \begin{aligned}
\text{prior} & & f_{\theta}(p_1,p_2) & := \I_{p_1\in[0,1]}\I_{p_2\in[0,1]},\\
& & f_{\omega}(x) & = \begin{cases} 1/2, & 0\leq x \leq 1,\\ \frac{1/2}{(1-x)^2}, & x\leq 0, \end{cases} \\
\text{posterior} & & f_{\omega|\dots}(x) & = c \, (1-x)^{k_1} \int_0^{\min\{1,\frac{1}{1-x}\}} z^{k_1+k_2+1} \big(1-(1-x)z\big)^{n_1-k_1} (1-z)^{n_2-k_2} \dd z,\\
& & & \xrightarrow{n_1,n_2\to\infty} \frac{(k_1+k_2+1)! }{k_1! k_2!} \left(\frac{n_2}{n_1}\right)^{k_2+1} \frac{(1-x)^{k_1}}{\left(1+\frac{n_2}{n_1}-x\right)^{k_1+k_2+2}}, \quad x\leq 1,\\
& & \Prob(\omega\leq x|\dots) & \xrightarrow{n_1,n_2\to\infty} \text{BetaCdf}_{k_2+1,k_1+1} \left(\frac{\frac{n_2}{n_1}}{1+\frac{n_2}{n_1}-x}\right).
\end{aligned} } \end{equation*}
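Because the posterior of $(\theta_1,\theta_2)$ is a pair of independent Beta distributions, the distribution of the efficacy can also be explored by simulation, without the integral or the asymptotic formula. The sketch below swaps in plain Monte Carlo sampling as an alternative technique; it is not the method used by the form above. The trial numbers are hypothetical, and the helper names `efficacy_samples` and `quantile` are made up for this example.

```python
import random

def efficacy_samples(k1, n1, k2, n2, size=50_000, seed=0):
    """Draw sorted posterior samples of omega = 1 - theta1/theta2."""
    rng = random.Random(seed)
    samples = []
    for _ in range(size):
        theta1 = rng.betavariate(k1 + 1, n1 - k1 + 1)  # vaccine group
        theta2 = rng.betavariate(k2 + 1, n2 - k2 + 1)  # placebo group
        samples.append(1 - theta1 / theta2)
    return sorted(samples)

def quantile(sorted_samples, q):
    """Empirical quantile by nearest rank (no interpolation)."""
    index = min(int(q * len(sorted_samples)), len(sorted_samples) - 1)
    return sorted_samples[index]

# Hypothetical trial: 10 of 10 000 vaccinated and 100 of 10 000
# placebo participants fall ill.
samples = efficacy_samples(10, 10_000, 100, 10_000)
lower, median, upper = (quantile(samples, q) for q in (0.025, 0.5, 0.975))
print(f"efficacy: median {median:.3f}, 95% interval ({lower:.3f}, {upper:.3f})")
```

The Monte Carlo error shrinks only with the square root of the sample count, so the closed-form limit above remains the better choice when $n_1$ and $n_2$ are large and a precise tail probability is needed.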