Assessment of ILD involves not only accurate and early diagnosis but also evaluation of disease extent and severity, which should be integrated into the care provided to ILD patients. However, to date, there are no generally accepted or validated staging systems [9].
The qualitative visual evaluation of HRCT is the basis for detection and classification of the type of lung structural abnormalities. Nevertheless, this can be supplemented by visual scales for a semi-quantitative rating of the extent or severity of ILD [6]. This semi-quantitative visual assessment of disease extent may be reported in poorly defined terms such as mild, moderate, or severe, as a score from a scoring system, or as an estimate of the percentage of lung affected to the nearest 5%, 10%, or 25% [10].
In our work, parenchymal abnormalities on HRCT were coded and visually scored in all images according to Warrick et al. [11]. We found a fair but significant negative correlation between PFT and CT-based visual scoring (p value = 0.04 and r = − 0.190), indicating that a decrease in FEV1 values is associated with an increase in the visual score of lung abnormalities, which is logical and expected. Similarly, Sverzellati et al. [8] found that visual score was a significant predictor of functional impairment with good correlation (p < 0.05, r = 0.60, r2 = 0.38).
However, visual-based assessments are subjective, with large inter-reader and intra-reader variation. A further difficulty is the complexity of integrating the extent of the different components of abnormalities seen on several HRCT slices and deriving a quantitative measure of the total extent of lung abnormality. Moreover, visual-based scoring is not reliable for follow-up of ILD patients [12, 13].
These variabilities and difficulties motivate automation in an attempt to provide more consistent indices for assessment of ILD [14]. Multiple commercial software packages are available for lung densitometry and automated quantification of ILD, but because they are complicated or expensive they are not widely used, even among experienced thoracic radiologists, which confines automated image analysis of ILD mainly to research work. The introduction of free software and the development of open-source platforms have made access to and use of lung densitometry relatively easy and potentially free [14].
In our study, we investigated the structure-function relationship between PFT and an open-access computer-based CT quantification scoring (3D SLICER). The aim was to verify whether computer-based data could distinguish between patients who have normal lungs and ILD patients.
Global density histogram metrics of CT images (skewness, kurtosis, and mean lung density) are helpful for estimating ILD extent. Such metrics are sufficiently reproducible [5, 14].
Mean lung density (MLD) is the simplest measurement and is utilized especially for pulmonary fibrosis [6]. A threshold of − 900 HU, corresponding to attenuation values of a normal lung inflated by air, has been proposed [15]. Skewness is a measure of the lack of symmetry of the density histogram, whereas kurtosis is a measure of the degree to which the distribution is peaked relative to a normal distribution [6].
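As an illustration of these definitions, the following minimal sketch computes MLD, skewness, and kurtosis from a list of voxel attenuation values in HU. It is not the 3D SLICER implementation: the function name `histogram_metrics` and the toy voxel values are hypothetical, lung segmentation is assumed to have been done already, and kurtosis is assumed to follow the Pearson convention (a normal distribution gives 3), which may differ from the convention used by a given software package.

```python
import math


def histogram_metrics(hu_values):
    """Return (mean, skewness, kurtosis) of segmented-lung voxel values in HU.

    Assumption: moment-based (population) definitions, with Pearson
    kurtosis, i.e. a normal distribution has kurtosis 3.
    """
    n = len(hu_values)
    if n < 2:
        raise ValueError("need at least two voxel values")
    mean = sum(hu_values) / n
    # Central moments of order 2, 3, and 4.
    m2 = sum((v - mean) ** 2 for v in hu_values) / n
    if m2 == 0:
        raise ValueError("all voxel values are identical")
    m3 = sum((v - mean) ** 3 for v in hu_values) / n
    m4 = sum((v - mean) ** 4 for v in hu_values) / n
    skewness = m3 / math.pow(m2, 1.5)  # > 0: tail toward denser (higher HU) voxels
    kurtosis = m4 / (m2 * m2)          # larger: more sharply peaked histogram
    return mean, skewness, kurtosis


# Toy voxel sample (hypothetical values, not patient data): mostly
# well-aerated lung with a minority of denser voxels.
toy_voxels = [-900, -900, -900, -500]
mld, skew, kurt = histogram_metrics(toy_voxels)
print(f"MLD = {mld:.1f} HU, skewness = {skew:.2f}, kurtosis = {kurt:.2f}")
```

In this toy sample the few dense voxels form a tail toward higher attenuation, so the skewness is positive, while the MLD is pulled above the value of the well-aerated voxels.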
MLD for our patients was − 744.9 ± 37.1 HU, which is far above normal lung attenuation. Mean skewness and kurtosis values were 1.12 ± 0.43 and 5.97 ± 1.2, respectively. Mean CT-based visual scoring of studied patients was 42.5% ± 22.5%. In concordance with our results, Sverzellati et al. [8] found that MLD, skewness, and kurtosis on frequency histograms of their patients were on average − 732.1 ± 71 HU, 2 ± 0.5, and 8.4 ± 2.2, respectively. In their study, histogram features showed no statistically significant differences between UIP and non-UIP groups (p > 0.05), and the extent of interstitial disease on visual score was 39.5% ± 21.2%.
In lung fibrosis, as in our results, collagen deposition increases lung density and shifts the CT frequency histogram to the right, so that both skewness and kurtosis typically decrease [16].
We observed a good, significant correlation between computer-aided CT quantified MLD and FEV1 (p < 0.001, r = − 0.57). In agreement with our results, Salaffi et al. [17] found that computer-aided scores showed a moderate to high significant negative correlation with forced vital capacity (FVC) (r = − 0.490; p value < 0.0001), forced expiratory volume in 1 s (FEV1) (r = − 0.675; p value < 0.0001), and single breath carbon monoxide diffusing capacity of the lung (DLco) (r = − 0.653; p value < 0.0001). Shin et al. [18] reported that mean lung attenuation best correlated with reticulation extent (p < 0.001, r = 0.42). Best et al. [19] also found moderate correlations between histogram features and PFT results, with kurtosis showing the greatest degree of correlation with physiologic abnormality (p < 0.01, r = 0.53). Ash et al. [20] found that CT densitometric measures and visual fibrosis score were strongly correlated with FVC% as follows: mean lung attenuation (r = − 0.78), skewness (r = 0.76), and kurtosis (r = 0.71), with p values < 0.001.
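For readers less familiar with the correlation coefficients quoted above, the sketch below shows how a Pearson r between a densitometric measure and a PFT value can be computed. It assumes that Pearson correlation is the statistic in question, which the text does not state explicitly; the function name `pearson_r` and the paired values are hypothetical and are not data from any of the cited studies.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical paired measurements: MLD in HU and FEV1 in % predicted.
toy_mld = [-820, -790, -760, -740, -700]
toy_fev1 = [95, 88, 80, 70, 62]
r = pearson_r(toy_mld, toy_fev1)
print(f"r = {r:.2f}, r squared = {r * r:.2f}")
```

A negative r means that denser lungs (higher MLD) go with lower FEV1. Under a simple linear model, squaring r gives the share of variance explained, so an r of − 0.57 corresponds to roughly one third of the variance in FEV1.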
However, Sverzellati et al. [8] found a poor correlation between histogram features and functional data in the UIP group, which may be explained by the greater cystic dead space of honeycombing, but they confirmed that fibrosis ratio together with histogram features can differentiate fibrotic lung from normal lung [8]. This is supported by the finding, in the same study, that the computer-based measures correlated better with DLco than did visual score in patients with a predominant pattern of ground-glass and reticular opacities without honeycombing.
We also found that computer-aided CT quantification of lung density histograms showed a stronger, more significant correlation and higher performance than visual-based semi-quantitative CT scoring (p < 0.01, AUC = 0.85 versus 0.71). Direct comparison of CT densitometry with visual score in patients with pulmonary fibrosis showed that the former is more reproducible and more sensitive [21]. Jacob et al. [22] reported that baseline texture-based CT quantification of total disease extent was superior to visual scoring in clinical-functional models predictive of outcome in IPF (p = 0.002). Similarly, Salaffi et al. [4] found on ROC curve analysis that computer-aided scores had the highest performance (AUC = 0.861 versus 0.689; p = 0.011).
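The AUC values compared above can be read as the probability that a randomly chosen patient with disease receives a higher score than a randomly chosen subject without it. The sketch below computes the AUC directly from that pairwise definition (equivalent to the Mann-Whitney U statistic divided by the number of pairs). It is an illustration only, not the statistical procedure used in this or the cited studies; the function name `roc_auc` and the scores are hypothetical.

```python
def roc_auc(scores_diseased, scores_normal):
    """Area under the ROC curve from the pairwise (Mann-Whitney) definition.

    Each diseased/normal pair counts 1 if the diseased subject has the
    higher score, 0.5 if the scores tie, and 0 otherwise.
    """
    if not scores_diseased or not scores_normal:
        raise ValueError("both groups must be non-empty")
    wins = 0.0
    for d in scores_diseased:
        for h in scores_normal:
            if d > h:
                wins += 1.0
            elif d == h:
                wins += 0.5
    return wins / (len(scores_diseased) * len(scores_normal))


# Hypothetical scores where higher values suggest more disease.
toy_diseased = [2.0, 3.0]
toy_normal = [1.0, 2.5]
print(f"AUC = {roc_auc(toy_diseased, toy_normal):.2f}")  # prints "AUC = 0.75"
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation of the two groups, which is the scale on which 0.85 and 0.71 should be interpreted.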
The qualitative visual assessment of the lung must always be performed before densitometry (“eye first role”), since it is fundamental for diagnosis and classification and for decreasing the risk of false interpretation of lung density values [6].
Lung densitometry might replace semi-quantitative visual rating of the severity and extent of lung changes [6]. In fact, lung densitometry has several advantages over visual semi-quantitative assessment of diffuse lung alterations. The visual assessment is subjective and shows intra- and inter-observer variations. Thanks to the automatic segmentation of lung tissue and the negligible time needed for software computation of density, the time required for lung densitometry is generally shorter than that required for visual scoring.