# Imaging geometry

With this chapter, we begin our analysis of the relationship between images and the geometry of the world from which they are formed. The structure of imaging geometry has been the subject of countless papers, but the modern treatment of camera geometry stems from work in the late 1980s relating concepts in projective geometry to camera imaging. While these ideas have brought new insights into camera structure, we will only touch on them, and otherwise rely on more direct algebraic ideas for much of our development.

One important concept that will be useful throughout this chapter is the notion of “projective space” and associated projective coordinates. We can illustrate these ideas using a basic pinhole model of projection. Recall that (pinhole) perspective projection maps a point $p = (x,y,z)^t$ to image coordinates $q = (u,v)^t$ as follows:

$u = f \frac{x}{z} \quad v = f \frac{y}{z}$

where $f$ is the focal length of the system. Note that all quantities should have consistent units — e.g. meters — in order for this to make sense!
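The projection equations above can be sketched in a few lines of NumPy. The function name `project_pinhole` and the numeric values are assumptions made for illustration; the point is taken to be expressed in the camera frame.

```python
import numpy as np

def project_pinhole(p, f):
    """Project a 3D point p = (x, y, z) to image coordinates (u, v).

    Assumes p is expressed in the camera frame, that f and p share the
    same units, and that z is nonzero.
    """
    x, y, z = np.asarray(p, dtype=float)
    if z == 0:
        raise ValueError("z must be nonzero for perspective projection")
    return np.array([f * x / z, f * y / z])

# Illustrative values (assumed): a point 4 m in front of a camera with f = 0.5 m.
q = project_pinhole([2.0, 1.0, 4.0], f=0.5)
```

Note that `q` comes out in the same units as `f`, which is why consistent units matter.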

As written, perspective projection is a non-linear transformation of $p$ to $q$. However, suppose we adopt what might seem an odd convention. Namely, we will embed vectors in $\Re(n)$ in $\Re(n+1)$ with the following convention. For a vector $\mathbf{x} = (x_1, x_2, \ldots, x_n)^t$, we will write $\tilde{\mathbf{x}} = (\lambda x_1, \lambda x_2, \ldots, \lambda x_n, \lambda)^t, \lambda \neq 0.$ Note in particular that if we choose $\lambda = 1$, $\tilde{\mathbf{x}}$ is just $\mathbf{x}$ with the value 1 appended. This special form is often referred to as “homogeneous coordinates” and is commonly used in spatial transformations (see the Appendix).
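A minimal sketch of this embedding, assuming NumPy. The helper names `to_homogeneous` and `from_homogeneous` are hypothetical and chosen only for this example.

```python
import numpy as np

def to_homogeneous(x, lam=1.0):
    """Embed an n-vector in n+1 dimensions as (lam * x, lam), with lam != 0."""
    if lam == 0:
        raise ValueError("lambda must be nonzero")
    x = np.asarray(x, dtype=float)
    return np.append(lam * x, lam)

def from_homogeneous(x_tilde):
    """Recover the n-vector by dividing by the last coordinate."""
    x_tilde = np.asarray(x_tilde, dtype=float)
    return x_tilde[:-1] / x_tilde[-1]

x = np.array([3.0, -2.0])
a = to_homogeneous(x)            # lambda = 1: x with a 1 appended
b = to_homogeneous(x, lam=5.0)   # a different representative of the same point
```

Both `a` and `b` map back to the same `x` under `from_homogeneous`, which is the sense in which the representation is defined only up to scale.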

There are many ways to view projective coordinates. However, for our purposes, one useful picture is that projective coordinates establish a relationship between a point on a plane (e.g. the plane $z = 1$) and a line in space. Think now of that plane as being the camera imaging plane, the origin being the focal point, and coordinates in the plane being the point $q$, above. Let us assume, for the moment, that $f = 1$. Then we can simply write this:

$\tilde q = p$

That is, the projective coordinates of the projection of the point $p$ are just $p$ itself! You should verify for yourself that this indeed works: to recover $q$ we just divide the first two coordinates ($x$ and $y$) by the third coordinate ($z$ in this case) and voila, we have our perspective projection. Furthermore, we have shown that the point $p$ lies on the ray defined by $\tilde q$. In short, $q$ is the projection of $p$ onto the imaging plane.

We left out the focal length. How do we add it back? Well, let’s now add an additional linear transformation to the mix. Namely, define the matrix $M_f$ as follows:

$M_f = \begin{bmatrix} 1 &0 &0 \\ 0 & 1 & 0 \\ 0 & 0 & 1/f \end{bmatrix}$

Now, if we write

$\tilde q = M_f p$

you should be able to show that we can describe our original perspective projection model.
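One way to carry out this check, writing $p = (x, y, z)^t$:

$M_f p = (x, y, z/f)^t$

Dividing the first two coordinates by the third gives

$u = \frac{x}{z/f} = f \frac{x}{z} \quad v = \frac{y}{z/f} = f \frac{y}{z}$

which is the perspective projection we started with.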

At this point, we should also take care to note that $M_f$, like everything else in projective space, is defined up to a scale factor. Thus, we could equivalently write

$M'_f = f M_f = \begin{bmatrix} f &0 &0 \\ 0 & f & 0 \\ 0 & 0 & 1 \end{bmatrix}$

and we get the same result! This will take some getting used to, but will make much of what we do later nicely compact. Indeed, anticipating this, we will also now lift the point $p$ to projective coordinates. If we rewrite $M_f$ as

$\tilde M_f = \begin{bmatrix} f &0 &0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & 1 & 0\end{bmatrix}$

we can now write

$\tilde q = \tilde M_f \tilde p$

You should once again convince yourself that this all works out correctly.
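A quick numerical check of this form, as a sketch with assumed values for $f$ and $p$. It also confirms that rescaling $\tilde p$ leaves the projected point unchanged.

```python
import numpy as np

f = 2.0  # assumed focal length, same units as the point coordinates

# The 3x4 matrix that maps projective 3D points to projective image points.
M_f_tilde = np.array([[f, 0, 0, 0],
                      [0, f, 0, 0],
                      [0, 0, 1, 0]], dtype=float)

p = np.array([1.0, 3.0, 4.0])       # assumed 3D point (x, y, z)
p_tilde = np.append(p, 1.0)         # lift to projective coordinates, lambda = 1

q_tilde = M_f_tilde @ p_tilde       # equals (f x, f y, z)
q = q_tilde[:2] / q_tilde[2]        # equals (f x / z, f y / z)

# A different representative of the same projective point gives the same q.
q_tilde_scaled = M_f_tilde @ (7.0 * p_tilde)
q_scaled = q_tilde_scaled[:2] / q_tilde_scaled[2]
```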

## Image Formation Models

There are three common image formation models: 1) perspective projection, 2) orthographic projection, and 3) affine projection. In addition, there is a fourth model, the projective camera, that includes all of these as special cases.

In order to understand these models, it is useful to realize that projection, in general, can be written as the product of three matrices:

$\tilde q = K \Pi H \tilde p$

where

• $K$ contains the camera intrinsic parameters
• $\Pi$ is a projection matrix
• $H$ is a homogeneous transform

Again, projective coordinates make it possible to write projection in this form. In general, $K$ takes the form:

$K = \begin{bmatrix} f s_x &0 &c_x \\  0 & f s_y & c_y \\ 0 & 0 & 1 \end{bmatrix}$

Here, $f$ is the focal length and $s_x$ and $s_y$ are the conversion factors from external units to pixel coordinates. For example, if the pitch of the CCD is $100$ pix/mm and the focal length is $12$ mm, then $f s_x = 1200$ pix. The values $c_x$ and $c_y$ describe the location of the optical center of the camera in the pixel grid.
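The sketch below builds $K$ from the numbers in this example and runs one point through the full chain $K \Pi H$. Several things here are assumptions made for illustration: the optical center at $(320, 240)$, the choice of $H$ as a pure translation from world to camera coordinates, and the use of the canonical projection $[I \mid 0]$ for $\Pi$ so that the focal length enters only through $K$.

```python
import numpy as np

f_mm = 12.0                 # focal length in mm (from the example in the text)
s_x = s_y = 100.0           # pixels per mm (from the example in the text)
c_x, c_y = 320.0, 240.0     # assumed optical center, in pixels

K = np.array([[f_mm * s_x, 0.0,        c_x],
              [0.0,        f_mm * s_y, c_y],
              [0.0,        0.0,        1.0]])

# Canonical perspective projection [I | 0]: an assumption made here so that
# the focal length is not counted twice (it already appears in K).
Pi = np.hstack([np.eye(3), np.zeros((3, 1))])

# Assumed homogeneous transform: the world origin sits 1000 mm in front of
# the camera, with no rotation.
H = np.eye(4)
H[:3, 3] = [0.0, 0.0, 1000.0]

p_world = np.array([100.0, 50.0, 0.0])            # assumed world point, in mm
q_tilde = K @ Pi @ H @ np.append(p_world, 1.0)
q = q_tilde[:2] / q_tilde[2]                      # pixel coordinates
```

With these values the point lands at pixel $(440, 300)$: the offset from the optical center is $1200 \cdot 100/1000 = 120$ pixels in $u$ and $1200 \cdot 50/1000 = 60$ pixels in $v$.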

The matrix $\Pi$ takes the form described above in the case of perspective projection. For orthographic projection it is of the form

$\Pi = \begin{bmatrix}  f & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$

Notice that the $z$ coordinate does not matter in this case.
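A short sketch of this property, using the orthographic matrix above with an assumed $f = 1$ and two assumed points that differ only in depth. The helper name `project` is hypothetical.

```python
import numpy as np

f = 1.0  # assumed scale factor

# Orthographic projection matrix in the form given above.
Pi_orth = np.array([[f, 0, 0, 0],
                    [0, f, 0, 0],
                    [0, 0, 0, 1]], dtype=float)

def project(Pi, p):
    """Apply a 3x4 projection to a 3D point and normalize the result."""
    q_tilde = Pi @ np.append(np.asarray(p, dtype=float), 1.0)
    return q_tilde[:2] / q_tilde[2]

near = project(Pi_orth, [1.0, 2.0, 5.0])    # depth z = 5
far = project(Pi_orth, [1.0, 2.0, 50.0])    # same x, y at depth z = 50
```

Both points project to the same image location, because the third column of the matrix is zero.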

The affine model is easiest to think of in terms of a linearization about some nominal point of projection. All three of these models are discussed online.
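To make the linearization concrete, here is a first-order sketch. The nominal point $p_0 = (x_0, y_0, z_0)^t$ is notation introduced here for illustration, and the expansion is an ordinary Taylor argument rather than something taken from the referenced material. Expanding $u = f x / z$ about $p_0$ and keeping first-order terms:

$u \approx f \frac{x_0}{z_0} + \frac{f}{z_0}(x - x_0) - f \frac{x_0}{z_0^2}(z - z_0) = f \frac{x}{z_0} - f \frac{x_0}{z_0^2}(z - z_0)$

and similarly for $v$ with $y$ in place of $x$. The result is affine in $(x, y, z)$, so it can be written as a $3 \times 4$ matrix acting on $\tilde p$ whose last row is $(0, 0, 0, 1)$.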

## Some Special Cases

There are two useful special cases that we will make use of in our subsequent developments. These special cases rely on the structure of projective homographies. A homography on $P(n)$ is described by a nonsingular $(n+1) \times (n+1)$ matrix up to scale. Thus, we can write

$\tilde p' = H \tilde p = \lambda H \tilde p$ for any $\lambda \neq 0,$ where equality is understood up to scale.

Notice that a homography on $P(2)$ (i.e. the image plane) has nine entries, but since we can pick an arbitrary scale (1 parameter), it only has eight free parameters.
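The scale ambiguity is easy to see numerically. In this sketch the matrix `H`, the test point, and the helper name `apply_homography` are all assumed for illustration.

```python
import numpy as np

def apply_homography(H, q):
    """Map a 2D point through a 3x3 homography and normalize the result."""
    q_tilde = H @ np.append(np.asarray(q, dtype=float), 1.0)
    return q_tilde[:2] / q_tilde[2]

# An arbitrary nonsingular 3x3 matrix (assumed values).
H = np.array([[1.0,   0.2,   3.0],
              [0.1,   0.9,  -1.0],
              [0.001, 0.002, 1.0]])

q = np.array([10.0, 20.0])
q1 = apply_homography(H, q)
q2 = apply_homography(-4.0 * H, q)   # same homography, different scale
```

Since `q1` and `q2` agree, only the eight ratios between the nine entries matter.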

It is easy to show the following two facts:

• The mapping from a plane in the world to its image is described by a homography.
• Two images taken by a camera undergoing pure rotation are related by a homography.
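A sketch of why the second fact holds, under assumptions not stated above: the camera center is the origin, $\Pi$ is the canonical projection $[I \mid 0]$ so that $\tilde q = K p$, and $R$ denotes the rotation (notation introduced here). Before and after the rotation we have

$\tilde q = K p \qquad \tilde q' = K R p$

Since $K$ is invertible, $p = K^{-1} \tilde q$, and therefore

$\tilde q' = K R K^{-1} \tilde q$

so the two images are related by the nonsingular $3 \times 3$ matrix $K R K^{-1}$. For the first fact, a similar sketch: choose world coordinates so that the plane is $z = 0$, and write the top three rows of the homogeneous transform as $[r_1 \; r_2 \; r_3 \; t]$ (again assumed notation). Then

$\tilde q = K [r_1 \; r_2 \; r_3 \; t] (x, y, 0, 1)^t = K [r_1 \; r_2 \; t] (x, y, 1)^t$

which is a $3 \times 3$ matrix acting on the projective coordinates of the point in the plane.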

### Estimating Homographies

There are a variety of methods for estimating homographies. Here is one.

Note that although we write $(U', V', W')^t = \tilde p' = H \tilde p,$ the image coordinates of interest are

$q = (u',v')^t = (U'/W', V'/W')^t$

If we write $H$ in terms of its row vectors, i.e.

$H = \begin{bmatrix} h_1 \\ h_2 \\ h_3 \end{bmatrix}$

we can substitute into the expression above and we get

$(u',v')^t = ((h_1 \tilde p)/(h_3 \tilde p), (h_2 \tilde p)/(h_3 \tilde p))^t$

Multiplying through by the denominator, we arrive at two equations

$u' h_3 \tilde p - h_1 \tilde p = 0$

$v' h_3 \tilde p - h_2 \tilde p = 0$

If we now define a 9-vector of unknowns $\mu = (h_1, h_2, h_3)^t$, we can write the above in a compact linear form:

$A(q,p) \mu = 0$
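Written out, one way to arrange the two equations is as a $2 \times 9$ matrix acting on $\mu$, with $\mu$ stacking the rows of $H$ into a column. Here $0^t$ denotes a row of three zeros, notation introduced for this sketch:

$A(q,p) = \begin{bmatrix} -\tilde p^t & 0^t & u' \tilde p^t \\ 0^t & -\tilde p^t & v' \tilde p^t \end{bmatrix}$

Multiplying the first row by $\mu$ gives $u' h_3 \tilde p - h_1 \tilde p$ and the second gives $v' h_3 \tilde p - h_2 \tilde p$, matching the two equations above.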

Recalling that we have 8 unknown parameters in $H$ and that each corresponding pair contributes two equations, if we have four corresponding pairs, we can “stack” the matrices and get

$\left[ \begin{matrix} A(q_1,p_1)\\ A(q_2,p_2)\\ A(q_3,p_3)\\ A(q_4,p_4) \end{matrix}\right] \mu = 0$

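This is a homogeneous system of 8 equations in 9 unknowns that can be solved using the SVD: up to scale, $\mu$ is the right singular vector associated with the smallest singular value of the stacked matrix.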
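The whole procedure can be sketched in NumPy as follows. The function names, the ground-truth matrix `H_true`, and the four test points are assumptions made for illustration. The row layout of each $2 \times 9$ block is one possible arrangement of the two equations above, with $\mu$ stacking the rows of $H$.

```python
import numpy as np

def apply_homography(H, p):
    """Map a 2D point through a 3x3 homography and normalize the result."""
    p_tilde = H @ np.append(np.asarray(p, dtype=float), 1.0)
    return p_tilde[:2] / p_tilde[2]

def constraint_rows(p, q):
    """Return the 2x9 block A(q, p) for one correspondence p -> q."""
    p_tilde = np.append(np.asarray(p, dtype=float), 1.0)
    u, v = q
    zero = np.zeros(3)
    return np.array([
        np.concatenate([-p_tilde, zero, u * p_tilde]),   # u' h3.p - h1.p = 0
        np.concatenate([zero, -p_tilde, v * p_tilde]),   # v' h3.p - h2.p = 0
    ])

def estimate_homography(src, dst):
    """Estimate H up to scale from four or more correspondences src -> dst."""
    A = np.vstack([constraint_rows(p, q) for p, q in zip(src, dst)])
    # With the default full_matrices=True, Vt is 9x9 and its last row is the
    # right singular vector for the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    H = Vt[-1].reshape(3, 3)
    return H / np.linalg.norm(H)    # fix the arbitrary scale

# Synthetic check with an assumed ground-truth homography.
H_true = np.array([[1.0,   0.2,   3.0],
                   [0.1,   0.9,  -1.0],
                   [0.001, 0.002, 1.0]])
src = np.array([[0.0, 0.0], [100.0, 0.0], [100.0, 100.0], [0.0, 100.0]])
dst = np.array([apply_homography(H_true, p) for p in src])

H_est = estimate_homography(src, dst)
```

Because `H_est` is only defined up to scale, compare it with `H_true` after dividing each by its bottom-right entry. With noisy measurements it is common to use more than four pairs and to normalize the point coordinates before building the stacked matrix; this sketch does neither.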

## Camera Calibration

We will consider two calibration methods: indirect calibration, described here: [camera calibration], and multiplane calibration, described in the paper Z. Zhang, “A flexible new technique for camera calibration,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(11):1330-1334, 2000, and implemented in Matlab as described in The Matlab Camera Calibration Toolkit.