Computer-aided analysis in evaluation and grading of interstitial lung diseases in correlation with CT-based visual scoring and pulmonary function tests

Interstitial lung diseases (ILDs) represent a large group of more than 200 different entities. High resolution computed tomography (HRCT) is accepted as the gold standard imaging modality in the diagnosis of ILD. Visual-based scoring offers an advantage in finding a specific type of ILD. The computer-aided CT attenuation histogram is another way of characterizing and quantifying diffuse lung disease. The histogram analysis (HIST) consists of calculating skewness, kurtosis, and mean lung density to quantify lung disease and monitor progression. The aim of our study was to investigate the value of computer-aided analysis of HRCT for interstitial lung diseases in correlation with visual scoring and pulmonary function tests. This prospective study included 50 patients with suspected ILD. The mean age of patients was 46.7 ± 12.5 years. Mean forced expiratory volume in the first second (FEV1) was 63.6 ± 20.9. HRCT examination was done for all patients, followed by CT-based visual scoring. Of the studied patients, 43.3% had a CT visual semi-quantitative score ranging between 40 and 64. CT-based lung density histograms (LDH) were obtained for all patients using the 3D Slicer software (Chest Imaging Platform). There was a significant difference between patient groups of different (mild, moderate, and severe) grades of ILD according to FEV1 regarding mean lung density (MLD), skewness, and kurtosis of the corresponding CT-based density histograms (p values < 0.001). A more significant and stronger correlation was observed between computer-aided CT-quantified MLD and FEV1 (p value < 0.001 and r = − 0.570). The ROC curve analysis demonstrated good performance for CT visual scoring with PFT (AUC = 0.71); a cutoff score of 15 or higher was associated with the best sensitivity (75%) and specificity (100%). Meanwhile, ROC curve analysis for MLD and FEV1 demonstrated an excellent performance for computer-based CT quantification (AUC = 0.85), with a cutoff value of − 769 HU giving a sensitivity of 65% and a specificity of 100%. Visual-based scoring techniques offer an advantage in finding a specific type of ILD. A computer-based quantification system could be a means for accurately monitoring disease progression or response to therapy.


Background
Interstitial lung diseases (ILDs) represent a very large group of more than 200 different entities, many of which are rare or "orphan" diseases. Much remains unknown or debatable for many of these ILDs, notably their prevalence, incidence, and mortality rates [1].
This group of diseases is associated with substantial morbidity and mortality. Thus, a multi-disciplinary approach including clinical, pathological, and radiological correlation is required to reach an accurate early diagnosis [2]. High resolution computed tomography (HRCT) of the chest has become accepted as the gold standard imaging modality in the diagnosis of ILD [3]. Visual-based scoring offers an advantage in finding a specific type of ILD or in distinguishing other causes of increased lung density, such as infection or neoplasm [4].
The computer-aided CT attenuation histogram is another way of characterizing and quantifying diffuse lung disease. The attenuation of a voxel is determined by the relative contribution of air and blood within that voxel. The relative frequency of voxels with particular attenuation values can be calculated and expressed as a histogram [5]. The attenuation histogram of normal lung tissue deviates from a Gaussian distribution; it is markedly skewed to the left and peaks at approximately − 900 HU. The histogram analysis (HIST) consists of calculating skewness, kurtosis, and mean lung density to quantify lung disease and monitor progression [6]. The aim of our study was to investigate the value of computer-aided analysis of HRCT for interstitial lung diseases in correlation with semi-quantitative visual scoring and pulmonary function tests.
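The histogram quantities named above can be illustrated with a short Python sketch. It is not the implementation used by any particular software package: the function names, the 10 HU bin width, the use of population (divide-by-n) moments, and the choice of non-excess kurtosis (a normal distribution gives 3) are all assumptions made for illustration.

```python
import math
from collections import Counter


def density_histogram(hu_values, bin_width=10):
    """Relative frequency of voxels per HU bin, keyed by the bin's lower edge.

    bin_width is an illustrative assumption, not a value from the study.
    """
    counts = Counter(math.floor(v / bin_width) * bin_width for v in hu_values)
    n = len(hu_values)
    return {edge: c / n for edge, c in sorted(counts.items())}


def histogram_metrics(hu_values):
    """Mean lung density, skewness, and kurtosis of a list of HU values.

    Uses population central moments; kurtosis is the non-excess form
    (m4 / m2^2). Both conventions are assumptions for this sketch.
    """
    n = len(hu_values)
    mean = sum(hu_values) / n
    m2 = sum((v - mean) ** 2 for v in hu_values) / n
    if m2 == 0:
        raise ValueError("all voxels have the same attenuation")
    m3 = sum((v - mean) ** 3 for v in hu_values) / n
    m4 = sum((v - mean) ** 4 for v in hu_values) / n
    return {
        "mld": mean,
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2,
    }


# Hypothetical voxel values: mostly aerated lung plus a few denser voxels.
voxels = [-920, -910, -900, -900, -890, -880, -850, -700, -500]
print(density_histogram(voxels, bin_width=50))
print(histogram_metrics(voxels))
```

A sample dominated by low-attenuation voxels with a tail of denser voxels, as in the hypothetical list, yields a positive skewness; as more voxels move toward higher attenuation, the mean rises and the tail becomes less pronounced.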

Methods
After approval from the institutional ethical committee and informing the individual patients, 50 patients were enrolled in this prospective clinical study. The patients were subjected to (1) history taking, including the type of work, bird and animal breeding, and drug history; (2) pulmonary function tests; and (3) HRCT of the chest without contrast.

Pulmonary function tests
Pulmonary function tests were performed using a 2130 spirometer (Vmax, SensorMedics, USA), which was calibrated daily. Results were obtained for forced vital capacity (FVC), forced expiratory volume in the first second (FEV1), and FEV1/FVC ratio. A restrictive ventilatory defect was defined by spirometric findings of FEV1/FVC ratio < 70% and FVC < 80% predicted [7].
The patients were classified functionally based on FEV1, the volume that a patient is able to expire in the first second of forced expiration. FEV1 categorization: ≥ 70 → mild, 50-69 → moderate, and ≤ 49 → severe.
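The functional grading can be written as a small lookup, shown here as a sketch. It assumes the mild band covers FEV1 values of 70 and above (so that the three bands do not overlap with the 50-69 moderate and ≤ 49 severe bands), and that non-integer values between 49 and 50 fall into the severe band; the function name is hypothetical.

```python
def fev1_grade(fev1):
    """Map an FEV1 value to a functional grade.

    Assumed bands: >= 70 mild, 50-69 moderate, below 50 severe.
    """
    if fev1 < 0:
        raise ValueError("FEV1 cannot be negative")
    if fev1 >= 70:
        return "mild"
    if fev1 >= 50:
        return "moderate"
    return "severe"


# Hypothetical FEV1 values for three patients.
for value in (82.0, 63.6, 41.5):
    print(value, fev1_grade(value))
```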

HRCT
HRCT exams were performed at the Radiology Department, and the scanning protocol included only unenhanced scans for all 50 patients with suspected ILD based on history, clinical examination, and pulmonary function tests. The examination was performed using a BrightSpeed 16-slice MDCT scanner (General Electric Medical Systems, Milwaukee, WI). Each patient was trained on how to hold their breath and how to listen to and follow the instructions from the recorded voice in the machine. Patients who were unable to hold their breath were instructed to breathe as shallowly as possible during the acquisition. The images were reviewed for the following: ground glass opacities, fibrotic changes, reticulations, bronchiectatic changes, honeycombing, and subpleural cysts.

CT-based visual scoring
CT-based semi-quantitative scoring was calculated according to the number of lung segments affected on both sides. The finding in each segment was given a score from 1 to 4 as follows: a score of 1 for ground glass opacities, 2 for reticulations and fibrotic changes, 3 for bronchiectatic changes, and 4 for honeycombing and subpleural cysts. If one segment had two or more findings, the score of the higher finding was used [8]. The extent of disease was obtained by counting the number of broncho-pulmonary segments involved for each abnormality. In each patient, a "severity of disease" score was then calculated as the total score, with a maximum total score of 80 for the whole lungs.
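The scoring rule can be sketched in Python as follows. The finding labels and function names are hypothetical, and the count of 20 segments is an assumption inferred from the stated maximum of 80 at up to 4 points per segment.

```python
# Score per finding, as described in the text (labels are illustrative).
FINDING_SCORES = {
    "ground_glass": 1,
    "reticulation": 2,
    "fibrosis": 2,
    "bronchiectasis": 3,
    "honeycombing": 4,
    "subpleural_cysts": 4,
}

N_SEGMENTS = 20  # assumption: maximum total of 80 divided by 4 per segment
MAX_TOTAL = 4 * N_SEGMENTS


def segment_score(findings):
    """Score of one segment: the highest-scoring finding, or 0 if unaffected."""
    return max((FINDING_SCORES[f] for f in findings), default=0)


def visual_score(segments):
    """Total score and percentage of the maximum for one patient.

    segments maps a segment name to the list of findings seen in it.
    """
    if len(segments) > N_SEGMENTS:
        raise ValueError("more segments than expected")
    total = sum(segment_score(f) for f in segments.values())
    return total, 100 * total / MAX_TOTAL


# Hypothetical patient with three affected segments.
patient = {
    "right lower lobe, posterior basal": ["reticulation", "honeycombing"],
    "left lower lobe, posterior basal": ["ground_glass", "bronchiectasis"],
    "right upper lobe, apical": ["ground_glass"],
}
print(visual_score(patient))
```

Expressing the total as a percentage of 80 matches the way scores are reported later in the text, for example 40/80 as 50%.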

Computer-aided analysis
HRCT images were analyzed using the open-source 3D Slicer software (Version 4.8.1) to create automated computer-based lung density histograms (LDH). 3D Slicer is a multi-platform, open-source software package for visualization, analysis, and post-processing of medical images. It was built with support from the National Institutes of Health. The Parenchyma Analysis module is part of the Chest Imaging Platform of 3D Slicer. It performs densitometry in chest CT scans by isolating the lung region and computing different phenotypes based on the histogram of the density measurements. First, an input CT image was selected. Then, a lung label map was selected for that input CT. Next, the filtering option was turned on to activate the filter, which can be applied in 2D or 3D. The filtering strength was selected as smooth, medium, or heavy; the slow method takes more time to finish but is more accurate. Lastly, "Apply" was chosen to start the parenchyma analysis.

Statistical analysis
SPSS version 20 was used for statistical analysis. The quantitative variables were tested for normality by the Kolmogorov-Smirnov test. The descriptive data were expressed as mean, median, and standard deviation (SD). A one-way ANOVA test was used for comparison between different groups. The correlation analyses were performed using Pearson's and Spearman's correlation tests. Statistical significance was defined as a P value < 0.05.
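For readers without SPSS, the two correlation coefficients can be reproduced with a few lines of standard-library Python. This is a sketch of the textbook definitions, not of SPSS internals; the function names and the sample values are hypothetical, and p values are not computed here.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical paired values: mean lung density (HU) and FEV1.
mld = [-820, -790, -770, -750, -720, -700]
fev1 = [85, 78, 66, 70, 52, 45]
print(pearson_r(mld, fev1), spearman_rho(mld, fev1))
```

A negative coefficient in such a comparison means that denser lungs (less negative HU) go with lower FEV1.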

Results
HRCT examination was done for all studied patients, followed by visual scoring of CT abnormalities. CT-based lung density histograms (LDH) were then obtained for all patients using the 3D Slicer software (Chest Imaging Platform).
The most common HRCT patterns of ILD were ground-glass opacification, reticulation, traction bronchiectasis, and honeycombing. Of the studied patients, 43.3% had a CT visual score ranging between 40/80 and 64/80, i.e., 50% and 80%. The distribution of visual scoring among different patients of the study is shown in Table 2.
There was a significant difference between patient groups of different (mild, moderate, and severe) grades of ILD according to FEV1 regarding MLD, skewness, and kurtosis of the corresponding CT-based density histograms (p value < 0.001).
There was a significant fair negative correlation between the CT-based visual scoring of studied patients and PFT, represented by forced expiratory volume in the first second (FEV1) (p value = 0.04 and r = − 0.190). A more significant and stronger correlation was observed between computer-aided CT-quantified mean lung density (MLD) and FEV1 (p value < 0.001 and r = − 0.570) (Table 4).
The ROC curve analysis demonstrated a good performance for CT visual scoring with PFT (p value < 0.01, AUC = 0.71); a cutoff score of 15/80 (18.75%) or higher was associated with the best sensitivity (75%) and specificity (100%). Meanwhile, ROC curve analysis for MLD and FEV1 demonstrated an excellent performance for computer-based CT quantification (p value < 0.001, AUC = 0.85), with a cutoff value of − 769 HU giving a sensitivity of 65% and a specificity of 100% (Fig. 1). Illustrative cases of studied patients having different degrees of severity of ILD are shown in Figs. 2, 3, and 4.
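The ROC quantities reported above can be illustrated with a minimal Python sketch. Several points are assumptions rather than details from the study: a test is called positive when the score is at or above the cutoff (which suits both the visual score and MLD, where higher values indicate more disease), the binary reference labels are hypothetical, AUC is computed by pairwise comparison of positive and negative cases, and the "best" cutoff is chosen by the Youden index (sensitivity + specificity − 1), which the text does not specify.

```python
def sens_spec(scores, labels, cutoff):
    """Sensitivity and specificity when score >= cutoff counts as positive.

    labels: True for functionally impaired cases (hypothetical reference).
    """
    tp = sum(1 for s, l in zip(scores, labels) if l and s >= cutoff)
    fn = sum(1 for s, l in zip(scores, labels) if l and s < cutoff)
    tn = sum(1 for s, l in zip(scores, labels) if not l and s < cutoff)
    fp = sum(1 for s, l in zip(scores, labels) if not l and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


def auc(scores, labels):
    """Area under the ROC curve as the fraction of positive/negative pairs
    in which the positive case has the higher score (ties count one half)."""
    pos = [s for s, l in zip(scores, labels) if l]
    neg = [s for s, l in zip(scores, labels) if not l]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def best_cutoff(scores, labels):
    """Cutoff among observed scores that maximizes the Youden index."""
    return max(sorted(set(scores)),
               key=lambda c: sum(sens_spec(scores, labels, c)) - 1)


# Hypothetical visual scores (out of 80) and reference labels.
scores = [5, 10, 15, 20, 25, 30, 35, 40]
labels = [False, False, True, False, True, True, True, True]
cut = best_cutoff(scores, labels)
print(cut, sens_spec(scores, labels, cut), auc(scores, labels))
```

For MLD, the same functions apply unchanged with HU values as scores, for example treating MLD ≥ − 769 HU as a positive test.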

Discussion
Assessment of ILD involves not only accurate and early diagnosis but also evaluation of disease extent and severity which should be integrated into the care provided to ILD patients. However, to date, there are no generally accepted or validated staging systems [9].
The qualitative visual evaluation of HRCT is the basis for detection and classification of the type of lung structural abnormalities. Nevertheless, this can be supplemented by visual scales for a semi-quantitative rating of the extent or severity of ILD [6]. This semi-quantitative visual assessment of disease extent may be reported in poorly defined terms like mild, moderate, or severe, or it can be reported as a scoring system or even as an estimate of the percentage of lung affected to the nearest 5%, 10%, or 25% [10]. In our work, parenchymal abnormalities on HRCT were coded and visually scored in all images according to Warrick et al. [11]. We found a fair negative significant correlation between PFT and CT-based visual scoring (p value = 0.04 and r = − 0.190), indicating that a decrease in FEV1 values is associated with an increase in visual scoring of lung abnormalities, which is logical and expected. Similarly, Sverzellati et al. [8] found that visual score was a significant predictor of functional impairment with good correlation (p < 0.05, r = 0.60, r2 = 0.38).
However, visual-based assessments are subjective, with large inter-reader and intra-reader variation. A further difficulty is the complexity of integrating the extent of the different components of abnormalities seen on several HRCT slices and deriving a quantitative measure of the total extent of lung abnormality. Moreover, visual-based scoring is not reliable for follow-up of ILD patients [12,13].
These variabilities and difficulties are a reason for automation in an attempt to provide more consistent indices for assessment of ILD [14]. Multiple commercial software packages are available for lung densitometry and automated quantification of ILD, but they are not widely used, even among experienced thoracic radiologists, because they are complicated or expensive, leaving automated image analysis of ILD confined mainly to research work. The introduction of free software and the development of open-source platforms have made access to and use of lung densitometry relatively easy and potentially free [14].
In our study, we attempted to investigate the structure-function relationship between PFT and an open-access computer-based CT quantification scoring (3D Slicer). This was to verify whether computer-based data could distinguish between patients who have normal lungs and ILD patients.
The global density histogram metrics of CT images (skewness, kurtosis, and mean lung density) are helpful for estimating ILD extent. Such metrics are sufficiently reproducible [5,14]. Mean lung density (MLD) is the simplest measurement and is utilized especially for pulmonary fibrosis [6]. A threshold of − 900 HU, corresponding to the attenuation values of a normal lung inflated by air, has been proposed [15]. Skewness is a measure of the lack of symmetry of the density histogram, whereas kurtosis is a measure of the degree to which the distribution is peaked relative to a normal distribution [6]. MLD for our patients was − 744.9 ± 37.1 HU, which is well above normal lung attenuation. Mean skewness and kurtosis values were 1.12 ± 0.43 and 5.97 ± 1.2, respectively. Mean CT-based visual scoring of studied patients was 42.5% ± 22.5%. In concordance with our results, Sverzellati et al. [8] found that MLD, skewness, and kurtosis on frequency histograms of their patients were on average − 732.1 ± 71 HU, 2 ± 0.5, and 8.4 ± 2.2, respectively. In their study, histogram features showed no statistically significant differences between the UIP and non-UIP groups (p > 0.05), and the extent of interstitial disease on visual score was 39.5% ± 21.2%.
In lung fibrosis (as in our results), collagen deposition increases lung density, with a subsequent rightward shift of the CT frequency histogram; both skewness and kurtosis typically decrease [16].
We observed a good significant correlation between computer-aided CT-quantified MLD and FEV1 (p < 0.001, r = − 0.57). In agreement with our results, Salaffi et al. [17] found that computer-aided scores showed a moderate to high significant negative correlation with   [18] reported that mean lung attenuation best correlated with reticulation extent (p < 0.001, r = 0.42). Best et al. [19] also found that moderate correlations existed between histogram features and PFT results, but kurtosis showed the greatest degree of correlation with physiologic abnormality (p < 0.01, r = 0.53).
Ash et al. [20] found that CT densitometric measures and visual fibrosis score were strongly correlated with FVC% as follows: mean lung attenuation (r = − 0.78), skewness (r = 0.76), and kurtosis (r = 0.71), with p values < 0.001. However, Sverzellati et al. [8] found a poor correlation between histogram features and functional data in the UIP group, which may be explained by the more cystic dead spaces of honeycombing, but they confirmed that fibrosis ratio together with histogram features can differentiate fibrotic lung from normal lung [8]. This is supported by the finding in the same study that histogram features correlated better with DLCO than the visual score did in patients with a predominant pattern of ground-glass and reticular opacities without honeycombing.
We also found that computer-aided CT quantification of lung density histograms showed a more significant and stronger correlation, and higher performance, than visual-based semi-quantitative CT scoring (p < 0.01, AUC = 0.85 versus 0.71). Direct comparison of CT densitometry with visual score in patients with pulmonary fibrosis showed that the former is more reproducible and more sensitive [21]. Jacob et al. [22] reported that baseline texture-based CT quantification of total disease extent was superior to visual scoring in clinic-functional models predictive of outcome in IPF (p = 0.002). Similarly, Salaffi et al. [4] found on ROC curve analysis that computer-aided scores had the highest performance (AUC = 0.861 versus 0.689; p = 0.011). Nevertheless, the qualitative visual assessment of the lung must always be performed before densitometry (the "eye first" role), since it is fundamental for diagnosis, classification, and decreasing the risk of false interpretation of lung density values [6].
Lung densitometry might replace semi-quantitative visual rating of the severity and extent of lung changes [6]. In fact, lung densitometry has several advantages over visual semi-quantitative assessment of diffuse lung alterations. The visual assessment is subjective and shows intra- and inter-observer variations. Thanks to the automatic segmentation of lung tissue and the negligible time for software computation of density, the time required for lung densitometry is generally shorter than that for visual scoring.

Conclusion
In conclusion, a computer-based quantification system is efficient and provides the best overall estimates of HRCT-measured lung disease, but it cannot be used alone. Visual-based scoring techniques offer an advantage in finding a specific type of ILD or in distinguishing other causes of increased lung density, such as infection or neoplasm. The visual and computer-based quantitative scoring systems are complementary rather than competitive. In combination with physiologic parameters, a computer-based quantification system could be a means for accurate monitoring of disease progression or response to therapy.
There are still some important topics that need to be addressed in the future. Quantitative CT (QCT) assessment of ILD might have a crucial role in the evaluation of prognosis as well as in mortality risk prediction models. It may have important implications for multi-center clinical trials that rely on accurate and reproducible quantitative analysis of CT images collected under varied conditions across multiple sites, scanners, and time points. We are aware of some limitations in our study. The diagnosis of pulmonary fibrosis was not based on histopathologic examination. Another is that lung volume variations due to different levels of inspiration may represent a major limitation of any density-based analysis of the lungs. Such a problem may be overcome by taking into account both lung volume and lung density, which was not done in our work.