The purpose of image registration is to compare or integrate data sets obtained from different measurements. In image registration, the input images (images of the same scene from different modalities) are compared and aligned to each other in order to measure the similarities (e.g., cross correlation and mutual information) between them. The resultant measurement must not be affected by noise and background changes [16].

Registration is done between a source image and a target image. Usually, important information is extracted from the source image and transformed by a transformation model so that it matches the corresponding information in the target image. The choice of transformation model depends on the type of variations present between the images. Variations are classified into three groups, namely, spatial variations, volumetric distortion variations, and variations of interest. In transforming an image, a transformation function is applied either to the entire image (called global) or to subsections of the image (called local). With a global transformation, a single transformation is computed on some volume of interest, whereas with a local transformation, at least two composite transformations are computed on sub-images that cannot generally be described by a global transformation [17].

### Types of image registration

Registration methods that can be applied include correlation and sequential based methods, Fourier based methods, point mapping with feedback, point mapping without feedback, basis function methods (interpolation and approximation), and elastic model based methods [17, 18].

#### Correlation and sequential based methods

The correlation and sequential based methods are early approaches used in image registration. Correlation is a basic statistical tool, mostly used for template matching or pattern recognition, that determines the degree of similarity between images by applying a cross-correlation function. These methods are relevant to images that are misaligned by a small rigid or affine transformation where only translations and rotations are allowed. The cross-correlation function measures the match between the source and target image at each candidate location of the source image, and is closely related to the sum of squared differences between them. The correlation reaches its peak only where the source image matches the target image. The parameter used to determine the degree of matching is called the correlation coefficient (*C*_{c}), as shown in equation 1.

$$ {C}_c=\frac{\mathrm{covariance}\left(T,S\right)}{\sigma_T{\sigma}_s}=\frac{\sum_x\sum_y\left(S\left(x,y\right)-{\mu}_s\right)\left(T\left(x-u,y-v\right)-{\mu}_T\right)}{\sqrt{\sum_x\sum_y{\left(T\left(x-u,y-v\right)-{\mu}_T\right)}^2\sum_x\sum_y{\left(S\left(x,y\right)-{\mu}_s\right)}^2}} $$

(1)

Where *μ*_{s} and *σ*_{s} are the mean and standard deviation of the source, *μ*_{T} and *σ*_{T} are the mean and standard deviation of the target, *x* and *y* are the pixel positions in the source, and *u* and *v* are the offsets at which the source image is placed over the target image. The resultant value of *C*_{c} measures the linear similarity between the source and target image [17, 19]. The computed *C*_{c} values lie between −1 and +1. When the *C*_{c} value is ≅ +1, the source and target images are highly similar and are accurately registered. However, when the source and target images are dissimilar and not properly registered, the computed *C*_{c} value would be ≅ −1 [20].
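As a minimal sketch of equation 1, the following Python code (assuming NumPy is available) computes *C*_{c} for a source template at every offset in a larger target and returns the best offset. The function names `correlation_coefficient` and `best_shift` are illustrative, and the offset (*u*, *v*) is taken here as the top-left corner of the window in the target, which is an indexing convention chosen for this sketch rather than one prescribed by the cited works.

```python
import numpy as np

def correlation_coefficient(source, target_patch):
    """Correlation coefficient C_c between two equally sized arrays."""
    s = source.astype(float) - source.mean()
    t = target_patch.astype(float) - target_patch.mean()
    denom = np.sqrt((s ** 2).sum() * (t ** 2).sum())
    if denom == 0:
        return 0.0  # flat patch: C_c is undefined, 0 is returned by convention here
    return float((s * t).sum() / denom)

def best_shift(source, target):
    """Slide `source` over `target` and return (u, v, C_c) of the best match."""
    sh, sw = source.shape
    th, tw = target.shape
    best = (0, 0, -np.inf)
    for v in range(th - sh + 1):          # row offset
        for u in range(tw - sw + 1):      # column offset
            c = correlation_coefficient(source, target[v:v + sh, u:u + sw])
            if c > best[2]:
                best = (u, v, c)
    return best

# Synthetic example: the template is cut out of the target at row 5, column 8.
rng = np.random.default_rng(0)
target = rng.random((32, 32))
source = target[5:15, 8:18].copy()
u, v, c = best_shift(source, target)
```

The exhaustive search is quadratic in the search range, which is one reason correlation is usually restricted to small misalignments.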

#### Fourier method

The Fourier method registers the source and target image by matching them using information in the frequency domain. This method of registration is more effective when applied to images obtained under different conditions of illumination and sensors. The source and target image are matched by the translation property of the Fourier transform using the phase angle.

The Fourier transform of an image *f* (*x*, *y*) is given by equation 2:

$$ F\left({w}_x,{w}_y\right)=\left|F\left({w}_x,{w}_y\right)\right|{e}^{i\varphi \left({w}_x,{w}_y\right)} $$

(2)

Where |*F*(*w*_{x}, *w*_{y})| is the magnitude of the Fourier transform and *φ*(*w*_{x}, *w*_{y}) is the phase angle, which determines the phase shift of the image at each frequency. The translation between the images is recovered from the phase angle. For two images (i.e., the source and target image) *f*_{1} and *f*_{2} that differ only by a displacement (*Δ*_{x}, *Δ*_{y}), equation 3 holds:

$$ {f}_2\left(x,y\right)={f}_1\left(x-{\Delta}_x,y-{\Delta}_y\right) $$

(3)

The Fourier transforms of the two images in equation 3 are related by

$$ {F}_2\left({w}_x,{w}_y\right)={F}_1\left({w}_x,{w}_y\right){e}^{-j\left({w}_x{\Delta}_x+{w}_y{\Delta}_y\right)} $$

(4)

Equation 4 shows that the two images have the same Fourier magnitude but differ by a phase factor \( {e}^{j\left({\varphi}_1-{\varphi}_2\right)} \) that is directly related to their displacement.

The two images are then registered by taking the inverse Fourier transform of the normalized cross-power spectrum in equation 5, which yields an impulse function whose location gives the displacement between the images.

$$ \frac{F_1\left({w}_x,{w}_y\right){F}_2^{\ast}\left({w}_x,{w}_y\right)}{\left|{F}_1\left({w}_x,{w}_y\right){F}_2^{\ast}\left({w}_x,{w}_y\right)\right|}={e}^{j\left({w}_x{\Delta}_x+{w}_y{\Delta}_y\right)} $$

(5)

Where *F** denotes the complex conjugate of *F* [17].
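The phase-based matching described above can be sketched with NumPy's FFT routines. This is an illustration under simplifying assumptions, not the implementation used in the cited works: the shift is assumed to be an integer number of pixels and cyclic (wrap-around), and the function name `phase_correlation` is hypothetical.

```python
import numpy as np

def phase_correlation(f1, f2):
    """Estimate (delta_y, delta_x) such that f2 equals f1 cyclically shifted by that amount."""
    F1 = np.fft.fft2(f1)
    F2 = np.fft.fft2(f2)
    cross = F1 * np.conj(F2)                           # cross-power spectrum
    cross = cross / np.maximum(np.abs(cross), 1e-12)   # keep only the phase
    impulse = np.fft.ifft2(cross).real                 # ideally a single peak
    peak = np.unravel_index(np.argmax(impulse), impulse.shape)
    shifts = []
    for p, n in zip(peak, impulse.shape):
        # With NumPy's FFT convention the peak index equals -delta modulo n.
        d = (-int(p)) % n
        if d > n // 2:                                 # map to a signed shift
            d -= n
        shifts.append(d)
    return tuple(shifts)

# Synthetic example: shift a random image by 3 rows and -5 columns.
rng = np.random.default_rng(0)
f1 = rng.random((64, 64))
f2 = np.roll(f1, shift=(3, -5), axis=(0, 1))
estimated = phase_correlation(f1, f2)
```

Because the magnitude is divided out, the estimate depends only on phase, which is consistent with the robustness to illumination differences noted above.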

#### Point mapping

Point mapping is one of the recent approaches used in image registration. It is mostly applied to images whose misalignment cannot be determined because the actual depth in the scene is uncertain. As a result of this uncertainty, a general transformation (global method) is used to match the source and target image with the aid of landmarks such as smooth surfaces or distortions present in both images. However, if the landmarks become more local, point mapping methods (i.e., point mapping with feedback and point mapping without feedback) are used to determine the misalignment. Image registration with point mapping using the global method involves three steps. In the first step, the features present in the images are computed. The second step is to assign control points to the computed feature points in the target image. Finally, a two-dimensional (2D) polynomial function is fitted to the control points to map the source image onto the target image.

For images whose misalignment is known (e.g., a small rigid or affine transformation), point mapping with feedback is used. Point mapping with feedback is used when feature detection and matching of the images are difficult to perform. The feedback is applied between the control point stages in order to determine the correct transformation to register the images. Point mapping with feedback becomes necessary when the features detected in the target image are ambiguous or when uncorrected variations are present between the source and target image. In these cases, point mapping feedback methods such as relaxation, hierarchical, cooperation, and clustering are used to register the source and target image. The relaxation method, for example, uses the translation properties of the transformation to register images, while the clustering method with feedback evaluates all pairs of feature matches. The transformation parameters are determined for every pair of feature matches, and the transformation that matches the largest number of points is then selected and used for the registration.
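The clustering idea can be illustrated with a deliberately simplified sketch that assumes the unknown transformation is a pure translation. Every source/target feature pair votes for the translation it implies, and the translation with the most votes is kept. The function name `cluster_translation` and the `bin_size` parameter are hypothetical choices for this illustration; practical clustering methods vote in a higher-dimensional parameter space.

```python
from collections import Counter
import numpy as np

def cluster_translation(src_pts, dst_pts, bin_size=1.0):
    """Return the translation supported by the most point pairs, and its vote count."""
    src = np.asarray(src_pts, dtype=float)
    dst = np.asarray(dst_pts, dtype=float)
    votes = Counter()
    for s in src:
        for d in dst:
            # Quantize the implied translation so that nearby votes share a bin.
            key = tuple(int(k) for k in np.round((d - s) / bin_size))
            votes[key] += 1
    (bx, by), count = votes.most_common(1)[0]
    return (bx * bin_size, by * bin_size), count

# Synthetic example: five features shifted by (4, -3) plus one spurious detection.
src = [(0, 0), (10, 2), (3, 7), (8, 15), (20, 5)]
dst = [(x + 4, y - 3) for x, y in src] + [(50, 50)]
translation, count = cluster_translation(src, dst)
```

No correspondences are supplied in advance: the correct pairs reinforce one bin, while incorrect pairs scatter their votes.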

Point mapping without feedback is most appropriate for registering images when the transformation required to align the source and target image is unknown. To generate a single transformation for point mapping without feedback, global methods based on point matching are employed. However, this is only possible when there are enough control points available to obtain the transformation parameters through approximation or interpolation [17].

#### Basis functions registration method

This type of registration is a mathematical process that registers images by building a model through interpolation or approximation without causing any physical or biological changes to the images.

#### Approximation

Least-squares regression analysis is one of the basis function approaches used to obtain transformation parameters through the approximation method, by finding the transformation that the matched points satisfy as nearly as possible. The method assumes that the matches contain local noise distortions that cannot be removed by the transformation because of differences of interest between the source and target image. This makes it difficult to find a transformation that matches the control points exactly. To overcome this difficulty, there must be enough statistical information available for approximation. This is achieved by ensuring that the number of matched points is greater than the number of transformation parameters. This helps to generate a reliable transformation that maps the source image onto the target image, because a single transformation is usually required to map one image onto the other [17]. Another basis function that is employed in the approximation method is the radial basis function (RBF). The RBF is evaluated between the interpolation point and the identification point. It registers images based on the identification points in the medical images by setting the radial basis approximation function to zero (0) [18].
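A minimal sketch of the least-squares approach, assuming a 2D affine transformation (six parameters) and more matched control points than parameters, is given below. The function name `fit_affine_least_squares` and the synthetic data are illustrative only.

```python
import numpy as np

def fit_affine_least_squares(src_pts, dst_pts):
    """Fit dst ≈ A @ src + t from N >= 3 matched 2D control points."""
    src = np.asarray(src_pts, dtype=float)
    dst = np.asarray(dst_pts, dtype=float)
    design = np.hstack([src, np.ones((len(src), 1))])        # rows are [x, y, 1]
    params, *_ = np.linalg.lstsq(design, dst, rcond=None)    # shape (3, 2)
    A = params[:2].T                                         # 2x2 linear part
    t = params[2]                                            # translation
    return A, t

# Synthetic example: 20 noisy matches for a known transformation.
rng = np.random.default_rng(1)
true_A = np.array([[1.1, 0.2], [-0.1, 0.9]])
true_t = np.array([5.0, -3.0])
src = rng.uniform(0, 100, size=(20, 2))
dst = src @ true_A.T + true_t + rng.normal(0, 0.01, size=(20, 2))
A_est, t_est = fit_affine_least_squares(src, dst)
```

With 20 matches and 6 unknowns the system is overdetermined, so the noise in individual matches is averaged out rather than reproduced by the fitted transformation.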

#### Interpolation

The interpolation method is more applicable to manually selected control points, where there are fewer but more accurate matches, so that the matches and their corresponding control points are exact. Here, interpolation uses polynomials to generate the transformation that best matches the source and target image. The coefficients of the polynomials are determined from a system of *N* equations obtained by mapping each of the *N* control points. To register the images, anatomic landmarks are first found manually in the source image and cross correlated with pixels near the corresponding landmarks in the target image. This process generates a set of matched control points. By using linear regression to fit a low-order polynomial, a transformation is generated and applied to map the source image onto the target image. Registration with interpolation is able to correct affine distortions (e.g., translation, rotation, scale, and shear) [17]. Alternatively, images can be registered with interpolation using iterative algorithms that maximize a similarity measure calculated from the voxel values. However, this approach is computationally expensive because interpolation is performed at every iteration. As a result, low-cost interpolation such as trilinear or nearest neighbor interpolation is usually recommended until the desired transformation is achieved. Iterative algorithms based on trilinear interpolation use low-pass filters that remove high spatial frequency components of the processed images, which minimizes registration errors [19].
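Trilinear interpolation, mentioned above as a low-cost choice, can be sketched as follows. The code assumes a 3D NumPy array indexed as (row, column, slice) and a query position that lies inside the volume; the function name `trilinear` is illustrative.

```python
import numpy as np

def trilinear(volume, r, c, z):
    """Trilinearly interpolate `volume` at the fractional index (r, c, z)."""
    r0, c0, z0 = int(np.floor(r)), int(np.floor(c)), int(np.floor(z))
    # Clamp the upper neighbours so that queries on the last plane stay in bounds.
    r1 = min(r0 + 1, volume.shape[0] - 1)
    c1 = min(c0 + 1, volume.shape[1] - 1)
    z1 = min(z0 + 1, volume.shape[2] - 1)
    fr, fc, fz = r - r0, c - c0, z - z0
    value = 0.0
    # Weighted sum over the 8 surrounding voxels.
    for ri, wr in ((r0, 1 - fr), (r1, fr)):
        for ci, wc in ((c0, 1 - fc), (c1, fc)):
            for zi, wz in ((z0, 1 - fz), (z1, fz)):
                value += wr * wc * wz * volume[ri, ci, zi]
    return float(value)

# Synthetic volume whose intensity is a linear function of the voxel index.
rr, cc, zz = np.indices((4, 4, 4))
volume = 2.0 * rr + 3.0 * cc - 1.0 * zz + 1.0
value = trilinear(volume, 1.25, 2.5, 0.75)
```

Each output value is a weighted average of eight neighbours, which is why the operation behaves like a mild low-pass filter on the resampled image.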

#### Elastic model

The elastic model of registration is a local method that develops a unique deformation transformation to map control points of the source image onto the target image by modeling distortions (e.g., 2D projections and 3D objects) present in the image as the deformation of an elastic material. Transformation functions such as weighted mean, piecewise linear, thin plate splines, and multiquadric are usually compared and applied during the elastic model process. The process of developing the registration transformation results in a small amount of local stretching of the images, which is used to correct local non-linear deformations in the processed images. The degree of stretching is used to determine the energy state of the elastic material. In simple terms, the image is modeled as an elastic body. The similarities of features or control points between the source and target image act as external forces that "stretch" the elastic body. The stretching is counteracted by opposing parameterized stiffness or smoothness constraints. The minimum energy state of the elastic body is then computed using iterative numerical methods following the deformation transformation, and it is this state that defines the registration features or control points of the source and target image. The images are then registered by matching important structures between the source and target image. Elastic registration is relevant for the correction of deformed images that result from the use of intravenous contrast agents in the process of assessing the anatomy, from patient movement, or from breathing [17, 21].
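Of the transformation functions listed above, the thin plate spline lends itself to a compact sketch. The code below is a plain NumPy illustration in 2D, not the formulation of the cited works: it solves the standard thin plate spline linear system with kernel *U*(*r*) = *r*² log *r*, and the `stiffness` parameter (a hypothetical name) is added to the diagonal to trade exact interpolation for smoothness.

```python
import numpy as np

def _tps_kernel(r):
    """Thin plate spline kernel U(r) = r^2 log r, with U(0) = 0."""
    out = np.zeros_like(r)
    mask = r > 0
    out[mask] = r[mask] ** 2 * np.log(r[mask])
    return out

def fit_tps(src_pts, dst_pts, stiffness=0.0):
    """Fit a 2D thin plate spline mapping src control points onto dst."""
    src = np.asarray(src_pts, dtype=float)
    dst = np.asarray(dst_pts, dtype=float)
    n = len(src)
    K = _tps_kernel(np.linalg.norm(src[:, None] - src[None, :], axis=2))
    K += stiffness * np.eye(n)                 # 0 gives exact interpolation
    P = np.hstack([np.ones((n, 1)), src])      # affine part: [1, x, y]
    L = np.zeros((n + 3, n + 3))
    L[:n, :n] = K
    L[:n, n:] = P
    L[n:, :n] = P.T
    rhs = np.vstack([dst, np.zeros((3, 2))])
    coeffs = np.linalg.solve(L, rhs)
    return src, coeffs[:n], coeffs[n:]         # control points, weights, affine

def apply_tps(model, pts):
    """Map arbitrary 2D points through a fitted thin plate spline."""
    src, w, a = model
    pts = np.asarray(pts, dtype=float)
    U = _tps_kernel(np.linalg.norm(pts[:, None] - src[None, :], axis=2))
    return a[0] + pts @ a[1:] + U @ w

# Synthetic example: four fixed corners and a centre point that is pulled aside.
src = [(0, 0), (1, 0), (0, 1), (1, 1), (0.5, 0.5)]
dst = [(0, 0), (1, 0), (0, 1), (1, 1), (0.6, 0.55)]
model = fit_tps(src, dst)
warped = apply_tps(model, src)
```

With `stiffness=0.0` the control points are matched exactly; larger values let the surface resist bending, which mirrors the stiffness constraint described in the text.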

### Steps in image fusion

#### Image registration

The first step in image fusion is image registration (discussed above). This requires identification and extraction of important points (also called detector points) within the image that convey relevant information about the scene. The important points can be identified by using the intensity coordinates (measured in millimeters) at a given voxel point *p* within the 3D images, usually expressed as *I*(*r*,*c*,*z*), without altering image noise, image contrast, or blurring. After extraction of the detector points, the next step is to estimate the similarities between them. Image descriptors (such as color, texture, shapes, color blobs, and corners) are used to represent the detector points, after which they are matched to obtain control points in the image. These control points represent the relevant features that are common to the processed images. The control points are extracted either statistically (e.g., mean and variance) or by texture (e.g., smoothness, coarseness, and regularity) such that they have distinct features.

In the registration process, one of the input images is marked as the source/moving image while the remaining images are called target/fixed images. The data sets representing the source image are spatially aligned with those of the target image. This is done by applying a parametric transformation model to the control points to estimate the geometric relation between the input images. The parametric transformation model geometrically maps the coordinates of the source image to the target coordinates using different degrees of freedom (DoF). The DoF defines the number of ways in which the transformation can be changed. A larger number of DoF is preferred because it permits a greater transformation scope for making one image match the other. Finally, the image composition step is applied to the geometric relation to combine the registered images into a larger image [16, 22].
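To make the notion of DoF concrete, the sketch below builds three common 2D parametric models as homogeneous matrices: rigid (3 DoF), similarity (4 DoF), and affine (6 DoF). The function names and parameterizations are illustrative assumptions; the cited works may use other models or 3D equivalents.

```python
import numpy as np

def rigid_2d(theta, tx, ty):
    """Rotation plus translation: 3 DoF."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, tx], [s, c, ty], [0.0, 0.0, 1.0]])

def similarity_2d(scale, theta, tx, ty):
    """Rigid transform plus uniform scaling: 4 DoF."""
    M = rigid_2d(theta, tx, ty)
    M[:2, :2] *= scale
    return M

def affine_2d(a, b, c, d, tx, ty):
    """General linear part plus translation: 6 DoF."""
    return np.array([[a, b, tx], [c, d, ty], [0.0, 0.0, 1.0]])

def transform_points(M, pts):
    """Apply a 3x3 homogeneous transform to an (N, 2) array of points."""
    pts = np.asarray(pts, dtype=float)
    homog = np.hstack([pts, np.ones((len(pts), 1))])
    return (homog @ M.T)[:, :2]

# Example: rotate control points by 90 degrees and translate by (1, 2).
control_points = np.array([[1.0, 0.0], [0.0, 0.0], [0.0, 2.0]])
moved = transform_points(rigid_2d(np.pi / 2, 1.0, 2.0), control_points)
```

Each added DoF enlarges the set of misalignments the model can represent, at the cost of more parameters to estimate from the control points.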

#### Feature extraction

In feature extraction, feature maps for each of the input images are produced as a result of the extraction of distinctive features of the registered images [16].

#### Decision operators

After feature extraction, decision operators are applied to label the feature maps to produce a set of decision maps. Where the decision maps do not refer to the same object or phenomenon, semantic equivalence is applied to connect these maps to a common object or phenomenon [16].

#### Radiometric calibration

After the decision operators comes radiometric calibration, where the feature maps are aligned with the input images to obtain input images of a common scale for image fusion. Finally, the input images are combined into a single output image. The final output image contains detailed information about the scene with improved visibility compared to any of the input images [16].
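The last two operations can be illustrated with a deliberately simple sketch. It assumes the input images are already registered, uses min-max rescaling as a crude stand-in for bringing images to a common scale, and combines them by a pixel-wise weighted average. Neither choice is prescribed by the cited works, and the function names `to_common_scale` and `fuse` are hypothetical.

```python
import numpy as np

def to_common_scale(image):
    """Rescale intensities to [0, 1] (assumed stand-in for calibration)."""
    img = np.asarray(image, dtype=float)
    lo, hi = img.min(), img.max()
    if hi == lo:
        return np.zeros_like(img)      # constant image carries no contrast
    return (img - lo) / (hi - lo)

def fuse(images, weights=None):
    """Pixel-wise weighted average of registered, rescaled images."""
    scaled = np.stack([to_common_scale(im) for im in images])
    if weights is None:
        weights = np.full(len(images), 1.0 / len(images))
    weights = np.asarray(weights, dtype=float)
    return np.tensordot(weights, scaled, axes=1)

# Two registered images with different intensity ranges.
image_a = np.array([[0.0, 10.0], [20.0, 40.0]])
image_b = np.array([[1.0, 1.0], [3.0, 5.0]])
fused = fuse([image_a, image_b])
```

More elaborate fusion rules (for example, selecting the maximum response per pixel or fusing in a transform domain) would replace the weighted average in `fuse`.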