
Table 1 Summary of the characteristics and the results of the studies about CT radiomics reproducibility

From: Radiomics reproducibility challenge in computed tomography imaging as a nuisance to clinical generalization: a mini-review

Reference

Methodology specifications

Parameters studied

Main results and conclusions

Mackin [20]

CCR phantom

NSCLC patient

2 CT vendors

IBEX radiomics software

mAs (250–30)

Noise

kVp

The pattern of feature variation in two GE and Toshiba CT scan units is similar

Smoothing does not affect the results of the features

Noise is the main factor in changing features

mAs seems to have the most significant impact on radiomics reproducibility among mAs, kVp, and pitch

Midya [21]

Anthropomorphic phantom (consists of five simulated tissue types)

In-house radiomics software

mAs (50, 100, 200, 300, 400, and 500 mA)

Noise index (NI) levels (12, 14, 16, 18, and 20)

Reconstruction (from FBP (0% ASIR) to 100% ASIR in increments of 10%)

Noise, tube current, and reconstruction algorithm significantly affect the reproducibility of radiomics results

By reducing image noise, the reproducibility of features increases

The FBP algorithm yields the highest reproducibility

By increasing the weight of the ASIR algorithm from 0 to 100%, the number of reproducible features decreased because the image noise gradually increased

8–25% of features are reproducible across mAs variation

19–25% of features are reproducible across noise variation

Berenguer [22]

Anthropomorphic pelvic phantom

CCR phantom

IBEX radiomics software

Test–retest (5 CT vendors)

Pitch

Reconstruction Kernel

mAs

kVp

FOV

The more limited the range of variation of the scan parameters, the higher the reproducibility

During intra-scanner analysis, changing the kernel has the greatest impact on reproducibility and changing the pitch has the least

During test–retest, about 91% of features were reproducible

During intra-scanner analysis, about 89% of features were reproducible with the change of pitch factor, and 43% were reproducible with the change of reconstruction algorithm

During inter-scanner analysis, about 85% of features were reproducible in wood (heterogeneous), and only 15% were reproducible in polyurethane (homogeneous)

In general, ten features out of 177 features remained reproducible after changing all parameters

Buch [23]

In-house Phantom

Two different CT brands

In-house radiomics software

kVp (80–140)

mAs (80–140)

pitch

Section thickness (0.625, 1.25, 2.5, 5 mm)

Acquisition mode (axial vs. helical)

The change of kVp and mA has less impact on reproducibility than other scan parameters

The features change significantly when the pitch, acquisition mode, or section thickness is changed

Fave [24]

20 NSCLC patients

IBEX radiomics software

mA (100–150–200–250)

kVp (80–100–120–140)

2D vs. 3D

Respiratory phase (10-time phase)

Changing the tube voltage has little impact on the value of the features, while changing the mA leads to significant changes in the value of the features

Reproducibility in intra-patient studies is higher than inter-patient studies

By adding Gaussian noise to the images, the values of the features do not change

By changing mA, about 43% of features (10 of 23) remain reproducible

By changing the respiratory phase (motion), about 65% of features (15 of 23) remain reproducible

By changing the dimensionality of ROI segmentation (2D vs. 3D), about 65% of features (15 of 23) remain reproducible

kVp does not influence the features significantly

Gao [25]

105 Pulmonary patients

Pyradiomics radiomics software

Dose (Low dose vs. Conventional dose CT)

CTDIvol of LDCT ~ 2 mGy

CTDIvol of CDCT ~ 12 mGy

When the radiation dose was changed, 45% of features extracted from solid nodules and 35% from ground-glass nodules remained reproducible

Li [26]

CCR Phantom

IBEX radiomics software

Inter-CT vs. intra-CT

mAs

kVp

Pitch

FOV

Kernel

Slice thickness

Reproducibility depends on the structure and texture of the material

Parameters related to image resolution, such as FOV, slice thickness, and kernel, have a more significant impact on reproducibility than scanning parameters (mAs, kVp, pitch)

The reproducibility of radiomics features depends on the noise level

Test–retest analysis shows ICC > 0.9

The highest reproducibility was for shape features (94% of features were reproducible). Even in the least reproducible case, 14% of the features were still stable

Changing the kernel (from bone to standard) significantly affects the reproducibility of features

Larue [27]

CCR phantom

In-house radiomics software

Inter-scanner

Slice thickness

gray-level discretization (bin widths ranging from 5 to 50 Hounsfield Units with a step size of 5 HU)

voxel resampling (resampling into voxel sizes of 1 × 1 × 3 mm³ using cubic, linear, and nearest-neighbor interpolation)

CT scanner, slice thickness, and bin width affected radiomics feature values

No impact of radiation exposure observed

Resampling images before feature extraction decreases the variability of radiomics features

'GLRLM – RLN' features in 1.5 mm and 3 mm slice thickness were more similar after resampling, which was not the case for the 'GLSZM – SAE' feature values

The test–retest analysis demonstrated that the feature 'GLRLM – RLN' is reproducible (CCC > 0.85)

Mackin [28]

20 NSCLC

CCR phantom

IBEX radiomics software

Inter-scanner (17 CT units)

Patient vs. Phantom

Variability was large relative to the inter-patient variation in the NSCLC tumors for some features

The variability in radiomics features extracted from CT images of the phantom was comparable in size to the variability observed in the same features extracted from CT images of NSCLC tumors

Reproducibility of radiomics features is low across CT scanners from different vendors but higher across different models from the same vendor

Ibrahim [29]

338 HCC patients with arterial and portal venous phases

RadiomiX radiomics software

Inter-scanner (9 CT units)

About 25% of the features were reproducible across the inter-scanner study

About 28% of features (42 of 167) are reproducible between the arterial and portal venous imaging phases

ComBat harmonization improved reproducibility by only 1%

Caramella [30]

Phantom

LIFEx radiomics software

Inter-scanner (8 CT units)

About 23% of features (8 of 34) exhibited high reproducibility

Zwanenburg [31]

Patient

Phantom

In-house radiomics software

Multicenter study (25 research teams)

The Image Biomarker Standardization Initiative produced and validated a set of consensus-based reference values for radiomics features

15% of features have good to excellent reproducibility in a validation dataset between patient and phantom

46% of features were reproducible in test–retest

Balagurunathan [32]

32 NSCLC patients

In-house radiomics software

Manual vs. automatic segmentation

2D vs. 3D

About 22% of features (48 of 219) across segmentation methods (2D vs. 3D and manual vs. automatic) were reproducible (CCC > 0.9), and 13% (29 features) were reproducible with CCC > 0.95

Fave 2015 [33]

CCR phantom & NSCLC patient

IBEX radiomics software

Inter-scanner vs. intra-scanner (19 CBCT units of Linac accelerator)

Noise (scatter)

Motion (or ROI identification)

About 54% of the features (37 out of 68) were reproducible in the intra-scanner test, but none were reproducible in the inter-scanner test

No feature can be reliably measured if the tumor motion is greater than 1 cm

With 4 mm of motion, 12 features from the entire volume and 14 from the center slice measurements were reproducible

Almost all features changed significantly when scatter material was added around the phantom. For the dense cork, 23 features passed in the thoracic scans and 11 in the head scans when the differences between one and two layers of scatter were compared

Lorena Escudero Sanchez [19]

43 HCC patients

Pyradiomics radiomics software

Gray Level (8, 16, 32, 64, 128, and 256)

Slice thickness (2 mm vs. 5 mm)

Feature values depend on slice thickness

Slice thickness does not affect the ROI segmentation

The optimal number of gray levels for high reproducibility is between 32 and 64

Shafiq‐ul‐Hassan [34]

CCR phantom

in-house radiomics software

Inter-scanner (8 CT units)

slice thicknesses

FOV

Pixel sizes (0.39 to 0.98 mm)

resampled (to a voxel size of 1 × 1 × 2 mm³ using linear interpolation)

Gray level (16, 32, 64, 128, and 256 GL)

70% (150 of 213 features) were reproducible across voxel size variation

Resampling and normalizing feature values by voxel size can improve reproducibility (resampling increases reproducibility to as much as 80%)

Seventeen texture features were dependent on the number of gray levels. This dependency can also be removed or reduced by normalizing the number of gray levels used

Mackin [35]

lung cancer patients

Phantom

IBEX radiomics software

Resampling

Filtering (with Butterworth)

Resampling and low-pass filtering of CT images could correct much of the variability in features due to inconsistent image pixel sizes

This correction may also reduce the variability introduced by other CT scan acquisition parameters

This correction reduces the dependence of features on pixel size from 80 to 10%

Solomon [36]

20 patients

In-house radiomics software

Reconstruction Algorithm (MBIR and ASIR vs. FBP)

Radiation Dose

Among the 23 imaging features assessed, radiation dose significantly affected 5, 3, and 4 of the features for liver lesions, lung nodules, and renal stones, respectively

ASIR reconstruction significantly affected 3, 1, and 1 features for liver lesions, lung nodules, and renal stones, respectively

MBIR reconstruction significantly affected 9, 11, and 15 features for liver lesions, lung nodules, and renal stones, respectively

Kim [37]

42 patients with lung tumors (contrast-enhanced CT scans)

in-house radiomics software

Reconstruction Algorithm (FBP vs. Iterative)

ROI segmentation (Inter-reader vs. intra-reader)

About 40% of features (6 of 15) were reproducible among reconstruction algorithms

Inter-reader variability was more significant than intra-reader or inter-reconstruction algorithm variability in 9 features

Inter-reconstruction algorithm variability was more significant than inter-reader variability for entropy, homogeneity, and GLCM-based features

Meyer [38]

75 liver patients

Radiomics version 1.0.9 radiomics software

Radiation Dose levels

section thicknesses

Kernels

Reconstruction algorithm

About 11% of features (12 of 106) were reproducible for any variation of the different technical parameters

Reconstructed section thickness had the most considerable impact on reproducibility (only 12% of features were stable)

Reconstruction kernel had a minor impact on the reproducibility (53% of features were stable)

Inter-reader variability induced by ROI segmentation was significantly higher than that induced by the reconstruction algorithm

The number of reproducible radiomics features in:

Kernels = 56 (52.8%)

Section thicknesses = 42 (39.6%)

Radiation Dose levels = 22 (20.8%)

Reconstruction algorithm = 13 (12.2%)

Huang lan He [39]

240 patients with a solitary pulmonary nodule

In-house radiomics software

Reconstruction slice thickness

Convolution kernel

Contrast-enhancement (non-contrast vs. contrast CT)

NECT-based radiomics demonstrated better discrimination and classification capability than CECT in both primaries

Thin-slice (1.25 mm) CT-based radiomics signature had better diagnostic performance than thick-slice CT (5 mm)

Standard convolution kernel-based radiomics signature had better diagnostic performance than lung convolution kernel-based CT

The radiomics signature based on non-contrast, thin-slice, standard convolution kernel CT was more informative for the differential diagnosis of SPN

Muenzfeld [40]

48 prostate cancer patients

Pyradiomics radiomics software

Kernel (two soft tissue kernels and one bone kernel)

11 of 86 features (12.8%) were highly reproducible, with CCC ≥ 0.85

Feature reproducibility was also impaired for most first-order features by applying the sharp-edge kernel

Bone kernel resulted in overall lower reproducibility compared to both soft tissue kernels

Haarburger [1]

Patients with liver, kidney, or lung lesions

Pyradiomics radiomics software

Manual vs. Automatic segmentation

Manual and automated segmentation approaches were highly correlated, with a Pearson correlation coefficient of r = 0.921

Features found to be unstable based on human annotations were also found to be unstable based on automated annotations

When a feature exhibited high reproducibility (i.e., ICC > 0.9) on one lesion, it also achieved high ICCs on others

Zwanenburg [41]

31 (NSCLC) patients

19 H&N SCC patients

in-house radiomics software

Adding perturbation as:

Noise addition (N)

Translation (T)

Rotation (R)

Volume growth/shrinkage (V)

Super voxel-based contour Randomization (C)

The reproducibility of NSCLC CT images under image perturbations (N, T, R, V, C) was higher

ICCs for H&N SCC CT images were generally lower

J Kalpathy-Cramer [42]

Patients with lung disease

in-house radiomics software

Manual vs. Automatic segmentation

68% of features were reproducible across segmentations with CCC > 0.75

Kelahan LC [43]

Segmentation

Inter-reader reproducibility is dependent on the ROI size

Groups of "large" and "small" lesions show different inter-reader reproducibility

Ying Li [44]

Lung Phantom

In-house radiomics software

mAs (25, 100, or 200)

pitch (0.9 or 1.2)

Slice thicknesses (0.75, 1.5, or 3 mm)

reconstruction kernels (medium or detail)

gray-level (3 ranges)

gray-level bin (11 sizes)

For the three gray-level ranges, 50% (44/88) of features were reproducible

For the 11 gray-level bin sizes, 33.3% (24/72) of features were reproducible

Feature calculating parameters (range and bin size) may have a greater influence than imaging parameters (effective dose, pitch, slice thickness, and filter) on the reproducibility of CT radiomics features

Jensen [46]

Homogeneous phantom

Pyradiomics software

Sphere-shaped ROIs of diameters 4, 8, and 16 mm, and 4, 8, and 16 pixels

70 CT-derived features were significantly different between ROI sizes

Many features showed significant differences, and only a few showed excellent agreement across varying ROI sizes

Jensen [47]

Homogeneous phantom

Pyradiomics software

sphere-shaped ROIs of diameters 4, 8, and 16 mm

parametric maps with a fixed voxel size of 4 mm³ were created

Fifty-five conventionally extracted and 8 parametric map-based features were significantly different between the VOI sizes

Only 3 of 93 parametric map-based features showed excellent agreement across varying ROI sizes

  1. CCR = credence cartridge radiomics phantom, NSCLC = non-small cell lung cancer, CT = computed tomography, IBEX = imaging biomarker explorer, FBP = filtered back projection, ASIR = adaptive statistical iterative reconstruction, ROI = region of interest, CTDI = CT Dose index, LDCT = low Dose CT, FOV = field of view, ICC = intraclass correlation coefficient, GLRLM-RLN = gray level run length matrix-run length non-uniformity, GLSZM-SAE = gray level size zone matrix-small area emphasis, CCC = concordance correlation coefficient, HCC = hepatocellular carcinoma, CBCT = cone beam CT, MBIR = model-based iterative reconstruction, GLCM = gray level co-occurrence matrix, CECT = contrast-enhanced CT, NECT = non-contrast CT, SPN = solitary pulmonary nodule, H&N SCC = head and neck squamous cell carcinoma
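
Several rows above count a feature as reproducible when its concordance correlation coefficient exceeds a cutoff (for example CCC > 0.85, CCC > 0.9, or CCC > 0.95). The following is a minimal sketch of how such a count could be computed, assuming Lin's CCC with population (divide-by-n) moments. The table does not state which estimator each study used, and the function names, feature names, and values here are hypothetical illustrations, not data from any cited study.

```python
def concordance_cc(x, y):
    """Lin's concordance correlation coefficient for two paired measurement lists.

    Assumes population (divide-by-n) variance and covariance; raises
    ZeroDivisionError if both inputs are constant and have equal means.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((v - mean_x) ** 2 for v in x) / n
    var_y = sum((v - mean_y) ** 2 for v in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    # Unlike Pearson's r, the denominator penalizes a systematic offset.
    return 2.0 * cov_xy / (var_x + var_y + (mean_x - mean_y) ** 2)


def reproducible_features(cond_a, cond_b, threshold=0.85):
    """Return the names of features whose CCC between two conditions exceeds threshold.

    cond_a and cond_b map a feature name to its values over the same
    ROIs, measured under two scan or processing conditions.
    """
    return [name for name in cond_a
            if concordance_cc(cond_a[name], cond_b[name]) > threshold]


# Hypothetical feature values for three ROIs under two conditions.
scan_1 = {"feature_stable": [1.0, 2.0, 3.0], "feature_shifted": [1.0, 2.0, 3.0]}
scan_2 = {"feature_stable": [1.0, 2.0, 3.0], "feature_shifted": [2.0, 3.0, 4.0]}

stable = reproducible_features(scan_1, scan_2, threshold=0.85)
fraction = len(stable) / len(scan_1)
```

In this toy input, the shifted feature is perfectly correlated between conditions but has a constant offset, so its CCC falls well below the cutoff. This is why agreement measures such as CCC or ICC, rather than plain correlation, are used for reproducibility.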