C.3 Estimating the error in $\mathbf{u}^0$

To estimate the error in the best-fitting parameter values that we find, we assume $P(\mathbf{u} | \{ x_i, f_i, \sigma_i \})$ to be approximated by an $n_\mathrm{u}$-dimensional Gaussian distribution around $\mathbf{u}^0$. Taking a Taylor expansion of $L(\mathbf{u})$ about $\mathbf{u}^0$, we can write:

\begin{eqnarray}  L(\mathbf{u}) & = & L(\mathbf{u}^0) + \underbrace{ \sum_{i=0}^{n_\mathrm{u}-1} \left( u_i - u^0_i \right) \left.\frac{\partial L}{\partial u_i}\right|_{\mathbf{u}^0} }_{\textrm{Zero at }\mathbf{u}^0\textrm{ by definition}} + \nonumber \\ & & \sum_{i=0}^{n_\mathrm{u}-1} \sum_{j=0}^{n_\mathrm{u}-1} \frac{\left( u_i - u^0_i \right) \left( u_j - u^0_j \right)}{2} \left.\frac{\partial^2 L}{\partial u_i \partial u_j}\right|_{\mathbf{u}^0} + \mathcal{O}\left(\mathbf{u} - \mathbf{u}^0\right)^3 \label{eqa:L_taylor_expand} \end{eqnarray}

Since the logarithm of a Gaussian distribution is a parabola, the quadratic terms in the above expansion encode the Gaussian component of the probability distribution $P(\mathbf{u} | \{ x_i, f_i, \sigma_i \})$ about $\mathbf{u}^0$.\footnote{The use of this is called \textit{Gauss' Method}. Higher order terms in the expansion represent any non-Gaussianity in the probability distribution, which we neglect. See MacKay, D.J.C., \textit{Information Theory, Inference and Learning Algorithms}, CUP (2003).} We may write the sum of these terms, which we denote $Q$, in matrix form:

\begin{equation}  Q = \frac{1}{2} \left(\mathbf{u} - \mathbf{u}^0\right)^\mathbf{T} \mathbf{A} \left(\mathbf{u} - \mathbf{u}^0\right) \label{eqn:Q_vector} \end{equation}

\noindent where the superscript $^\mathbf{T}$ represents the transpose of the vector displacement from $\mathbf{u}^0$, and $\mathbf{A}$ is the Hessian matrix of $L$, given by:

\begin{equation}  A_{ij} = \nabla \nabla L = \left.\frac{\partial^2 L}{\partial u_i \partial u_j}\right|_{\mathbf{u}^0} \end{equation}

 \index{Hessian matrix} 
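
The second derivatives in this matrix can also be estimated numerically. The following Python sketch is purely illustrative and is not the internal implementation of the {\tt fit} command; the names {\tt log\_P} (the log-probability function $L$), {\tt u0} (the best-fit vector $\mathbf{u}^0$) and the step size {\tt h} are assumptions made for the example. It estimates $A_{ij}$ by central finite differences:

\begin{verbatim}
# Illustrative sketch only: estimate the Hessian A_ij of a
# log-probability function log_P at the best-fit point u0 by
# central finite differences.
import numpy as np

def hessian(log_P, u0, h=1e-5):
    n = len(u0)
    A = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            u = np.array(u0, dtype=float); u[i] += h; u[j] += h
            f_pp = log_P(u)
            u = np.array(u0, dtype=float); u[i] += h; u[j] -= h
            f_pm = log_P(u)
            u = np.array(u0, dtype=float); u[i] -= h; u[j] += h
            f_mp = log_P(u)
            u = np.array(u0, dtype=float); u[i] -= h; u[j] -= h
            f_mm = log_P(u)
            # four-point formula for d^2 L / (du_i du_j)
            A[i, j] = (f_pp - f_pm - f_mp + f_mm) / (4.0 * h * h)
    return A
\end{verbatim}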

This is the Hessian matrix which is output by the {\tt fit} command. In general, an $n_\mathrm{u}$-dimensional Gaussian distribution such as that given by Equation~(\ref{eqa:L_taylor_expand}) yields elliptical contours of equi-probability in parameter space, whose principal axes need not be aligned with our chosen coordinate axes, the variables $u_0 \ldots u_{n_\mathrm{u}-1}$. The eigenvectors $\mathbf{e}_i$ of $\mathbf{A}$ are the principal axes of these ellipses, and the corresponding eigenvalues $\lambda_i$ equal $1/\sigma_i^2$, where $\sigma_i$ is the standard deviation of the probability density function along the direction of these axes.
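
Continuing the illustrative sketch above, these principal axes and the standard deviations along them can be read off from an eigen-decomposition. Since $L$ has a maximum at $\mathbf{u}^0$, its Hessian is negative definite there, so the sketch decomposes $-\mathbf{A}$; with this sign convention its eigenvalues are $1/\sigma_i^2$:

\begin{verbatim}
# Continuing the sketch above, with A = hessian(log_P, u0).
# The Hessian of L is negative definite at a maximum, so decompose -A.
eigenvalues, eigenvectors = np.linalg.eigh(-A)   # -A is real and symmetric
sigma_axes = 1.0 / np.sqrt(eigenvalues)          # sigma_i along each principal axis
# eigenvectors[:, i] is the principal axis e_i paired with sigma_axes[i]
\end{verbatim}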

This can be visualised by imagining that we diagonalise $\mathbf{A}$, and expand Equation~(\ref{eqn:Q_vector}) in our diagonal basis. The resulting expression for $L$ is a sum of square terms; the cross terms vanish in this basis by definition. The equations of the equi-probability contours become the equations of ellipses:

\begin{equation}  Q = \frac{1}{2} \sum_{i=0}^{n_\mathrm{u}-1} A_{ii} \left(u_i - u^0_i\right)^2 = k \end{equation}

\noindent where $k$ is some constant. By comparison with the equation for the logarithm of a Gaussian distribution, we can associate $A_{ii}$ with $-1/\sigma_i^2$ in our eigenvector basis.
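
This comparison can be made explicit for a single parameter: the logarithm of a one-dimensional Gaussian distribution of standard deviation $\sigma_i$, centred on $u^0_i$, is

\begin{equation}  \log \left[ \frac{1}{\sigma_i \sqrt{2\pi}} \exp \left( -\frac{\left(u_i - u^0_i\right)^2}{2\sigma_i^2} \right) \right] = -\frac{\left(u_i - u^0_i\right)^2}{2\sigma_i^2} + \textrm{constant} \end{equation}

\noindent and matching the coefficient of $\left(u_i - u^0_i\right)^2$ here against the term $\frac{1}{2} A_{ii} \left(u_i - u^0_i\right)^2$ in the expansion above gives $A_{ii} = -1/\sigma_i^2$.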

The problem of evaluating the standard deviations of our variables $u_i$ is more complicated, however, as we are attempting to evaluate the width of these elliptical equi-probability contours in directions which are, in general, not aligned with their principal axes. To achieve this, we first convert our Hessian matrix into a covariance matrix.
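
As an illustration of that step, the Python sketch below, which again continues the example above rather than reproducing the internal code of the {\tt fit} command, forms the covariance matrix as the inverse of $-\mathbf{A}$ and reads the standard deviation of each parameter $u_i$ from the square root of the corresponding diagonal element:

\begin{verbatim}
# Continuing the sketch above, with A = hessian(log_P, u0).
# Under the Gaussian approximation, the covariance matrix is the
# inverse of the negative Hessian; the marginal standard deviation
# of each parameter is the square root of its diagonal element.
C = np.linalg.inv(-A)            # covariance matrix
sigma_u = np.sqrt(np.diag(C))    # one standard deviation per parameter u_i
\end{verbatim}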

