Radiomics reproducibility challenge in computed tomography imaging as a nuisance to clinical generalization: a mini-review

Jahanshahi, Amirreza; Soleymani, Yunus; Fazel Ghaziani, Mona; Khezerloo, Davood

doi:10.1186/s43055-023-01029-6

Egyptian Journal of Radiology and Nuclear Medicine

Table 1 Summary of the characteristics and the results of the studies about CT radiomics reproducibility


Reference	Methodology specifications	Parameters studied	Main results and conclusions
Mackin [20]	CCR phantom; NSCLC patients; 2 CT vendors; IBEX radiomics software	mAs (250–30); noise; kVp	The pattern of feature variation is similar in the GE and Toshiba CT units. Smoothing does not affect the feature results. Noise is the main factor in changing features. Among mAs, kVp, and pitch, mAs seems to have the most significant impact on radiomics reproducibility.
Midya [21]	Anthropomorphic phantom (five simulated tissue types); in-house radiomics software	mAs (50, 100, 200, 300, 400, and 500 mA); noise index (NI) levels (12, 14, 16, 18, and 20); reconstruction (from FBP (0% ASIR) to 100% ASIR in increments of 10%)	Noise, tube current, and reconstruction algorithm significantly affect the reproducibility of radiomics results. Reducing image noise increases feature reproducibility. The FBP algorithm gives the highest reproducibility. As the weight of the ASIR algorithm increased from 0 to 100%, the number of reproducible features decreased because the image noise gradually increased. 8–25% of features are reproducible across mAs variation, and 19–25% across noise variation.
Berenguer [22]	Anthropomorphic pelvic phantom; CCR phantom; IBEX radiomics software	Test–retest (5 CT vendors); pitch; reconstruction kernel; mAs; kVp; FOV	The more limited the range of variation of the scan parameters, the higher the reproducibility. During intra-scanner analysis, changing the kernel has the most and the pitch the least impact on reproducibility. During test–retest, about 91% of features were reproducible. During intra-scanner analysis, about 89% of features were reproducible with the change of pitch factor, and 43% were reproducible with the change of reconstruction algorithm. During inter-scanner analysis, about 85% of features were reproducible in wood (heterogeneous), and only 15% were reproducible in polyurethane (homogeneous). In general, ten of 177 features remained reproducible after changing all parameters.
Buch [23]	In-house phantom; two different CT brands; in-house radiomics software	kVp (80–140); mAs (80–140); pitch; section thickness (0.625, 1.25, 2.5, 5 mm); acquisition mode (axial vs. helical)	Changing kVp and mA has less impact on reproducibility than other scan parameters. The features change significantly with changes in pitch, acquisition mode, and section thickness.
Fave [24]	20 NSCLC patients; IBEX radiomics software	mA (100, 150, 200, 250); kVp (80, 100, 120, 140); 2D vs. 3D; respiratory phase (10 phases)	Changing the tube voltage has little impact on the feature values, while changing the mA leads to significant changes in the feature values. Reproducibility in intra-patient studies is higher than in inter-patient studies. Adding Gaussian noise to the images does not change the feature values. When mA changes, about 43% of features (10 of 23) remain reproducible. When the respiratory phase (motion) changes, about 65% of features (15 of 23) remain reproducible. When the dimensionality of ROI segmentation changes (2D vs. 3D), about 65% of features (15 of 23) remain reproducible. kVp does not influence the features significantly.
Gao [25]	105 pulmonary patients; Pyradiomics radiomics software	Dose (low-dose vs. conventional-dose CT); CTDI_vol of LDCT ~ 2 mGy; CTDI_vol of CDCT ~ 12 mGy	When the radiation dose changed, 45% of features extracted from solid nodules and 35% of features from ground-glass nodules remained reproducible.
Li [26]	CCR phantom; IBEX radiomics software	Inter-CT vs. intra-CT; mAs; kVp; pitch; FOV; kernel; slice thickness	Reproducibility depends on the structure and texture of the material. Parameters related to image resolution, such as FOV, slice thickness, and kernel, have a more significant impact on reproducibility than scanning parameters (mAs, kVp, pitch). The reproducibility of radiomics features depends on the noise level. Test–retest showed ICC > 0.9. The highest reproducibility was for shape features (94% of features were reproducible). Even in the least reproducible case, 14% of the features were still stable. Changing the kernel (from bone to standard) significantly affects the reproducibility of features.
Larue [27]	CCR phantom; in-house radiomics software	Inter-scanner; slice thickness; gray-level discretization (bin widths from 5 to 50 Hounsfield units (HU) in steps of 5 HU); voxel resampling (to a voxel size of 1 × 1 × 3 mm³ using cubic, linear, and nearest-neighbor interpolation)	CT scanner, slice thickness, and bin width affected radiomics feature values. No impact of radiation exposure was observed. Resampling images before feature extraction decreases the variability of radiomics features. 'GLRLM – RLN' feature values at 1.5 mm and 3 mm slice thickness were more similar after resampling, which was not the case for the 'GLSZM – SAE' feature values. The test–retest analysis demonstrated that the feature 'GLRLM – RLN' is reproducible (CCC > 0.85).
Mackin [28]	20 NSCLC patients; CCR phantom; IBEX radiomics software	Inter-scanner (17 CT units); patient vs. phantom	For some features, the variability was large relative to the inter-patient variation in the NSCLC tumors. The variability in radiomics features extracted from CT images of the phantom was comparable in size to the variability observed in the same features extracted from CT images of NSCLC tumors. The reproducibility of radiomics features across different CT vendors is low, but different scanners from the same vendor show higher reproducibility.
Ibrahim [29]	338 HCC patients with arterial and portal venous phases; RadiomiX radiomics software	Inter-scanner (9 CT units)	About 25% of the features were reproducible across the inter-scanner study. About 28% of features (42 of 167) were reproducible between the arterial and portal venous imaging phases. ComBat harmonization improved reproducibility by only 1%.
Caramella [30]	Phantom; Lifex radiomics software	Inter-scanner (8 CT units)	About 23% of features (8 of 34) exhibited high reproducibility.
Zwanenburg [31]	Patient; phantom; in-house radiomics software	Multicenter study (25 research teams)	The Image Biomarker Standardization Initiative produced and validated a set of consensus-based reference values for radiomics features. 15% of features have good to excellent reproducibility in a validation dataset between patient and phantom. 46% of features were reproducible in test–retest.
Balagurunathan [32]	32 NSCLC patients; in-house radiomics software	Manual vs. automatic segmentation; 2D vs. 3D	About 22% of features (48 of 219) were reproducible across segmentation methods (2D vs. 3D and manual vs. automatic) with CCC > 0.9, and 13% (29 features) were reproducible with CCC > 0.95.
Fave X 2015 [33]	CCR phantom and NSCLC patients; IBEX radiomics software	Inter-scanner vs. intra-scanner (19 CBCT units of linear accelerators); noise (scatter); motion (or ROI identification)	About 54% of the features (37 of 68) were reproducible in the intra-scanner test, but none were reproducible in the inter-scanner test. No feature can be reliably measured if the tumor motion is greater than 1 cm. With 4 mm of motion, 12 features from the entire volume and 14 from the center-slice measurements were reproducible. Almost all features changed significantly when scatter material was added around the phantom. For the dense cork, 23 features passed in the thoracic scans and 11 in the head scans when the differences between one and two layers of scatter were compared.
Lorena Escudero Sanchez [19]	43 HCC patients; Pyradiomics radiomics software	Gray levels (8, 16, 32, 64, 128, and 256); slice thickness (2 mm vs. 5 mm)	Feature values depend on slice thickness. Slice thickness does not affect the ROI segmentation. The optimal number of gray levels for high reproducibility is between 32 and 64.
Shafiq‐ul‐Hassan [34]	CCR phantom; in-house radiomics software	Inter-scanner (8 CT units); slice thickness; FOV; pixel size (0.39 to 0.98 mm); resampling (to a voxel size of 1 × 1 × 2 mm³ using linear interpolation); gray levels (16, 32, 64, 128, and 256)	70% of features (150 of 213) were reproducible across voxel size variation. Resampling and normalizing feature values by voxel size can improve reproducibility (resampling increases reproducibility up to 80%). Seventeen texture features depended on the number of gray levels. This dependency can also be removed or reduced by normalizing the number of gray levels used.
Mackin [35]	Lung cancer patients; phantom; IBEX radiomics software	Resampling; filtering (with Butterworth)	Resampling and low-pass filtering of CT images could correct much of the variability in features due to inconsistent image pixel sizes. This correction may also reduce the variability introduced by other CT scan acquisition parameters. This correction reduces the dependence of features on pixel size from 80 to 10%.
Solomon [36]	20 patients; in-house radiomics software	Reconstruction algorithm (MBIR and ASIR vs. FBP); radiation dose	Among the 23 imaging features assessed, radiation dose significantly affected 5, 3, and 4 of the features for liver lesions, lung nodules, and renal stones, respectively. ASIR reconstruction significantly affected 3, 1, and 1 features for liver lesions, lung nodules, and renal stones, respectively. MBIR reconstruction significantly affected 9, 11, and 15 features for liver lesions, lung nodules, and renal stones, respectively.
Kim [37]	42 patients with lung tumors (contrast-enhanced CT scans); in-house radiomics software	Reconstruction algorithm (FBP vs. iterative); ROI segmentation (inter-reader vs. intra-reader)	About 40% of features (6 of 15) were reproducible among reconstruction algorithms. Inter-reader variability was more significant than intra-reader or inter-reconstruction algorithm variability in 9 features. Inter-reconstruction algorithm variability was more significant than inter-reader variability for entropy, homogeneity, and GLCM-based features.
Meyer [38]	75 liver patients; Radiomics version 1.0.9 radiomics software	Radiation dose levels; section thicknesses; kernels; reconstruction algorithm	About 11% of features (12 of 106) were reproducible for any variation of the different technical parameters. Reconstructed section thickness had the most considerable impact on reproducibility (only 12% of features were stable). Reconstruction kernel had a minor impact on the reproducibility (53% of features were stable). Inter-reader variability induced by the ROI segmentation was significantly higher than that induced by the reconstruction algorithm. The number of reproducible radiomics features by parameter: kernels = 56 (52.8%), section thicknesses = 42 (39.6%), radiation dose levels = 22 (20.8%), reconstruction algorithm = 13 (12.2%).
Huang lan He [39]	240 patients with a solitary pulmonary nodule; in-house radiomics software	Reconstruction slice thickness; convolution kernel; contrast enhancement (non-contrast vs. contrast CT)	NECT-based radiomics demonstrated better discrimination and classification capability than CECT in both primaries. The thin-slice (1.25 mm) CT-based radiomics signature had better diagnostic performance than the thick-slice (5 mm) one. The standard convolution kernel-based radiomics signature had better diagnostic performance than the lung convolution kernel-based one. The radiomics signature based on non-contrast, thin-slice, standard convolution kernel CT was more informative for the differential diagnosis of SPN.
Muenzfeld [40]	48 prostate cancer patients; Pyradiomics radiomics software	Kernel (two soft tissue kernels and one bone kernel)	11 of 86 features (12.7%) were highly reproducible, with CCC ≥ 0.85. Applying the sharp-edge kernel also impaired the reproducibility of most first-order features. The bone kernel resulted in overall lower reproducibility than both soft tissue kernels.
Haarburger [1]	Patients with liver, kidney, or lung lesions; Pyradiomics radiomics software	Manual vs. automatic segmentation	Manual and automated segmentation approaches were highly correlated, with a Pearson correlation coefficient of r = 0.921. Features found to be unstable based on human annotations were also found to be unstable based on automated annotations. When a feature exhibited high reproducibility (i.e., ICC > 0.9) on one lesion, it also achieved high ICCs on others.
Zwanenburg [41]	31 NSCLC patients; 19 H&N SCC patients; in-house radiomics software	Adding perturbations: noise addition (N); translation (T); rotation (R); volume growth/shrinkage (V); supervoxel-based contour randomization (C)	The reproducibility of NSCLC CT images under image perturbations (N, T, R, V, C) was higher. Reproducibility of H&N SCC ICCs was generally lower.
J Kalpathy-Cramer [42]	Patients with lung disease; in-house radiomics software	Manual vs. automatic segmentation	68% of features were reproducible across segmentations with CCC > 0.75.
Kelahan LC [43]	–	Segmentation	Inter-reader reproducibility depends on the ROI size. Groups of "large" and "small" lesions show different inter-reader reproducibility.
Ying Li [44]	Lung phantom; in-house radiomics software	mAs (25, 100, or 200); pitch (0.9 or 1.2); slice thickness (0.75, 1.5, or 3 mm); reconstruction kernel (medium or detail); gray-level range (3 ranges); gray-level bin size (11 sizes)	For the three gray-level ranges, 50% (44/88) of features were reproducible. For the 11 gray-level bin sizes, 33.3% (24/72) of features were reproducible. Feature calculation parameters (range and bin size) may have a greater influence than imaging parameters (effective dose, pitch, slice thickness, and filter) on the reproducibility of CT radiomics features.
Jensen [46]	Homogeneous phantom; Pyradiomics software	Sphere-shaped ROIs of diameters 4, 8, and 16 mm, and 4, 8, and 16 pixels	70 CT-derived features were significantly different between ROI sizes. Many features showed significant differences, and only a few showed excellent agreement across varying ROI sizes.
Jensen [47]	Homogeneous phantom; Pyradiomics software	Sphere-shaped ROIs of diameters 4, 8, and 16 mm; parametric maps with a fixed voxel size of 4 mm³ were created	Fifty-five conventionally extracted and 8 parametric map-based features were significantly different between the VOI sizes. Only 3 of 93 parametric map-based features showed excellent agreement across varying ROI sizes.

CCR = credence cartridge radiomics phantom, NSCLC = non-small cell lung cancer, CT = computed tomography, IBEX = imaging biomarker explorer, FBP = filtered back projection, ASIR = adaptive statistical iterative reconstruction, ROI = region of interest, CTDI = CT Dose index, LDCT = low Dose CT, FOV = field of view, ICC = intraclass correlation coefficient, GLRLM-RLN = gray level run length matrix-run length non-uniformity, GLSZM-SAE = gray level size zone matrix-small area emphasis, CCC = concordance correlation coefficient, HCC = hepatocellular carcinoma, CBCT = cone beam CT, MBIR = model-based iterative reconstruction, GLCM = gray level co-occurrence matrix, CECT = contrast-enhanced CT, NECT = non-contrast CT, SPN = solitary pulmonary nodule, H&N SCC = head and neck squamous cell carcinoma
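
Several rows of Table 1 report the share of features that pass a CCC threshold (for example CCC > 0.85, > 0.9, or > 0.95). The following is a minimal sketch of how such a share could be computed, assuming Lin's concordance correlation coefficient with population (1/n) moments. The function names, the dictionary layout, and the sample values are hypothetical illustrations and are not taken from any of the cited studies, whose exact estimators and thresholds differ.

```python
def concordance_ccc(x, y):
    """Lin's concordance correlation coefficient for two paired series.

    Assumption: population (1/n) variance and covariance are used.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must be paired and contain at least 2 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    denom = var_x + var_y + (mean_x - mean_y) ** 2
    if denom == 0:
        raise ValueError("CCC is undefined for two identical constant series")
    return 2.0 * cov_xy / denom


def reproducible_fraction(scan_a, scan_b, threshold=0.85):
    """Fraction of features whose CCC between two scans exceeds threshold.

    scan_a, scan_b: dicts mapping a feature name to its per-sample values
    (hypothetical layout chosen for this sketch).
    """
    cccs = {name: concordance_ccc(scan_a[name], scan_b[name]) for name in scan_a}
    passed = [name for name, value in cccs.items() if value > threshold]
    return len(passed) / len(cccs), cccs


# Made-up values for two features measured on four samples in two scans.
scan_a = {"feature_1": [1.0, 2.0, 3.0, 4.0], "feature_2": [1.0, 2.0, 3.0, 4.0]}
scan_b = {"feature_1": [1.0, 2.0, 3.0, 4.0], "feature_2": [2.0, 3.0, 4.0, 5.0]}

fraction, cccs = reproducible_fraction(scan_a, scan_b, threshold=0.85)
print(fraction, cccs)
```

In this toy input, `feature_2` is perfectly correlated between the two scans but shifted by a constant offset, so its CCC falls below the threshold. This is why agreement measures such as CCC or ICC, rather than Pearson correlation alone, are commonly used for reproducibility.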
