Automated quantification of COVID-19 pneumonia severity in chest CT using histogram-based multi-level thresholding segmentation

Background Chest computed tomography (CT) has proven critically important in the detection, grading, and follow-up of lung affection in COVID-19 pneumonia. There is a close relationship between clinical severity and the extent of lung CT findings in this potentially fatal disease, and the extent of lung lesions on CT is an important indicator for risk stratification of patients with COVID-19 pneumonia. This study aims to explore automated histogram-based quantification of lung affection in COVID-19 pneumonia in volumetric CT images in comparison to conventional semi-quantitative severity scoring. This retrospective study enrolled 153 patients with proven COVID-19 pneumonia. Based on the severity of clinical presentation, the patients were divided into three groups: mild, moderate and severe. Based on the need for oxygenation support, two groups were identified: a common group, comprising patients with mild or moderate illness who did not need intubation, and a severe illness group, comprising patients who were intubated. An automated multi-level thresholding histogram-based quantitative analysis technique was used to evaluate lung affection in CT scans, together with conventional semi-quantitative severity scoring performed by two expert radiologists. The quantitative assessment included the volumes, percentages and densities of ground-glass opacities (GGOs) and consolidation in both lungs. The results of the two evaluation methods were compared, and the quantification metrics were correlated. Results Spearman's correlation coefficient between the semi-quantitative severity scoring and the automated quantification method was 0.934 (p < 0.0001). Conclusions The automated histogram-based quantification of COVID-19 pneumonia shows good correlation with conventional severity scoring. The quantitative imaging metrics show high correlation with the clinical severity of the disease.

The internationally adopted COVID-19 Reporting and Data System (CO-RADS), recommended by the Radiological Society of North America and other radiological societies [26,27], uses a scoring system from 0 to 5 to classify the suspicion of COVID-19 lung involvement in CT images from very unlikely to very likely, respectively. CO-RADS has shown very good performance in predicting the likelihood of COVID-19 infection, with substantial inter-observer agreement [28].
Most imaging studies since the outbreak of COVID-19 have focused on lung CT findings, with only a few studies addressing the quantitative analysis of these findings [29].
Unfortunately, quantification of these findings has been only reluctantly integrated into routine radiological practice, because it relies on the radiologist's semi-quantitative visual assessment, which is subjective, time-consuming and lacks inter-observer consistency. A semi-quantitative visual severity scoring (SS) system has been proposed for the assessment of lung affection in SARS [30] and for the evaluation of ARDS [31].
The scoring of disease severity is performed in each of the five lung lobes on a scale from 0 to 5, with 0 indicating no involvement, 1 indicating less than 5% involvement, 2 indicating 5-25% involvement, 3 indicating 26-49% involvement, 4 indicating 50-75% involvement, and 5 indicating more than 75% involvement. The sum of individual lobar scores represents the total SS that ranges from 0 (no involvement) to 25 (maximum involvement). This semi-quantitative lobar-based visual scoring system has been adopted in the assessment of the severity of COVID-19 lung affection [15,32,33].
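A minimal Python sketch of the lobar scoring rule above, assuming the percentage of involvement of each lobe has already been estimated. The published bands leave gaps (between 25% and 26%, and between 49% and 50%), so the sketch closes them by assumption; the lobe abbreviations and function names are hypothetical.

```python
# Hypothetical helpers illustrating the lobar-based visual severity score (SS).
LOBES = ("RUL", "RML", "RLL", "LUL", "LLL")  # the five lung lobes


def lobar_score(percent_involved: float) -> int:
    """Map one lobe's percentage of involvement to its 0-5 score.

    Assumption: the gaps in the published bands are closed by treating
    <=25% as score 2 and <50% as score 3.
    """
    if not 0 <= percent_involved <= 100:
        raise ValueError("percentage must lie in [0, 100]")
    if percent_involved == 0:
        return 0  # no involvement
    if percent_involved < 5:
        return 1
    if percent_involved <= 25:
        return 2
    if percent_involved < 50:
        return 3
    if percent_involved <= 75:
        return 4
    return 5  # more than 75% involvement


def total_severity_score(involvement: dict) -> int:
    """Sum the five lobar scores; the total ranges from 0 to 25."""
    missing = set(LOBES) - set(involvement)
    if missing:
        raise ValueError(f"missing lobes: {sorted(missing)}")
    return sum(lobar_score(involvement[lobe]) for lobe in LOBES)


print(total_severity_score({"RUL": 0, "RML": 3, "RLL": 30, "LUL": 60, "LLL": 80}))  # prints 13
```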
Quantitative CT analysis is superior to semi-quantitative visual SS in assessment of the severity of COVID-19 infection. Computerized quantitative segmentation methods could provide objective assessment of the percentage of the diseased part of the lung containing GGOs and consolidation to determine the disease burden [34,35].
The lung volumes measured by CT are well correlated with pulmonary function test results such as total lung capacities and forced vital capacities [36]. Quantitative indices have higher reproducibility than visual scoring and are significantly correlated with lung function and clinical parameters [37].
There is still no consensus on grading the severity of lung affection in CT images of COVID-19 pneumonia. Most radiologists commonly use simple, subjective descriptive terms such as mild, moderate, severe or critical to describe the severity of lung affection.
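This study aims to explore an automated histogram-based quantitative CT method together with the conventional semi-quantitative severity scoring for assessment of lung affection in COVID-19 pneumonia.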

Patients
This is a single-center retrospective study that included 153 patients with COVID-19 infection (79 male, 74 female; age range 19-78 years; mean age 53 ± 14 years) who presented to the emergency ward of Assiut University Hospitals between March and July 2020. COVID-19 was diagnosed based on a positive RT-PCR assay of pharyngeal swab specimens. Chest CT scanning was performed within 3-6 days after the onset of symptoms. The clinical and laboratory data of these patients were collected. The patient characteristics are summarized in Table 1.
Based on the clinical presentation, the patients were divided into three groups [38]:
• Mild illness: individuals who have any of the various signs and symptoms of COVID-19 (e.g., fever, cough, sore throat, malaise, headache, muscle pain, nausea, vomiting, diarrhea, loss of taste and smell) but who do not have shortness of breath, dyspnea, or abnormal chest imaging findings.
• Moderate illness: individuals who show evidence of lower respiratory tract disease during clinical assessment or imaging and who have an oxygen saturation (SpO2) ≥ 94% on room air.
• Severe illness: individuals who have SpO2 < 94% on room air, a ratio of arterial partial pressure of oxygen to fraction of inspired oxygen (PaO2/FiO2) < 300 mm Hg, a respiratory rate > 30 breaths/min, or lung infiltrates > 50%.
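The grouping criteria above can be expressed as a short decision rule. This is a hypothetical sketch, not the authors' software: the record fields are illustrative names, every patient is assumed to be symptomatic (the mild group requires symptoms), and dyspnea or abnormal chest imaging is assumed to count as evidence of lower respiratory tract disease, folded into a single flag.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Presentation:
    """Hypothetical record of the findings used for clinical grouping."""
    spo2_room_air: float                         # oxygen saturation on room air, %
    lower_respiratory_disease: bool              # clinical or imaging evidence (assumption: includes dyspnea)
    pao2_fio2: Optional[float] = None            # PaO2/FiO2, mm Hg
    respiratory_rate: Optional[float] = None     # breaths/min
    infiltrates_percent: Optional[float] = None  # % of lung with infiltrates


def clinical_group(p: Presentation) -> str:
    """Return 'mild', 'moderate' or 'severe' following the criteria above."""
    severe = (
        p.spo2_room_air < 94
        or (p.pao2_fio2 is not None and p.pao2_fio2 < 300)
        or (p.respiratory_rate is not None and p.respiratory_rate > 30)
        or (p.infiltrates_percent is not None and p.infiltrates_percent > 50)
    )
    if severe:
        return "severe"
    # Here SpO2 >= 94% and no other severe criterion is met.
    if p.lower_respiratory_disease:
        return "moderate"
    return "mild"  # symptomatic patient without dyspnea or abnormal imaging
```

Any one severe criterion is sufficient, which is why the rule checks them first with a logical "or".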

CT examination
Whole lung volumetric CT scanning was performed using a 16-row multi-detector CT scanner (BrightSpeed 16; General Electric Healthcare, Milwaukee, USA). Scanning was performed from the lung apices to the diaphragm during a single breath-hold at deep inspiration, using the following parameters: tube voltage 120 kVp; automatic tube-current modulation; gantry rotation speed of 0.5 s; and beam collimation of 16 × 0.625 mm. Thin-section CT data were reconstructed at 0.625 mm thickness using a standard filtered back-projection algorithm; iterative reconstruction algorithms were not used.

Infection control
The CT technologist and the attending nurse routinely wore personal protective equipment (PPE) while handling patients in the CT suite. The PPE included a transparent face shield, a surgical cap, a surgical mask, gloves, a fluid-resistant gown, and shoe covers. The CT machine was routinely decontaminated after the completion of CT scanning, according to infection control guidelines. Disposable sheets were used to cover the CT imaging table. The patients wore disposable masks and head caps before entering the CT examination room.

Semi-quantitative visual severity scoring assessment
The high-resolution computed tomography (HRCT) images were independently reviewed by two expert radiologists on a picture archiving and communication system (PACS) workstation using a lung parenchyma window (center −600 HU; width 1600 HU). Each radiologist independently assigned a lobar-based visual SS, and the two scores were averaged to obtain the mean total SS of COVID-19 lung affection. The scoring considered the overall extent of parenchymal abnormalities, including GGOs and consolidation (Co), using the definitions of the Fleischner Society glossary of terms for thoracic imaging [39]. Any co-existing reticular pattern (inter-lobular, intra-lobular and/or peri-bronchial thickening), other types of opacity (crazy paving or reversed halo), or pleural effusion was also documented alongside the severity scoring.
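For reference, a window with center −600 HU and width 1600 HU displays attenuation values from −1400 HU (black) to +200 HU (white). The sketch below assumes a simple linear mapping to 8-bit gray levels; the function name is hypothetical, and real viewers (including the linear VOI LUT function defined in the DICOM standard) differ in small details at the window edges.

```python
import numpy as np


def apply_window(hu, center=-600.0, width=1600.0):
    """Map HU values to 8-bit gray levels for a given window center/width.

    Values at or below center - width/2 become black (0), values at or
    above center + width/2 become white (255), and values in between are
    scaled linearly (a simplified mapping, not the exact DICOM formula).
    """
    hu = np.asarray(hu, dtype=float)
    lower = center - width / 2.0   # -1400 HU for the lung window
    scaled = (hu - lower) / width  # 0..1 inside the window
    return np.round(np.clip(scaled, 0.0, 1.0) * 255).astype(np.uint8)


print(apply_window([-2000, -1400, -600, 200, 1000]))
```

Everything denser than +200 HU (bone, contrast-free mediastinal soft tissue is already near the top) saturates to white, which is why this window suits parenchymal assessment rather than the mediastinum.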

Post-processing and quantitative CT measures
A commercially available computer workstation, Synapse 3D version 3.5 (Fujifilm Medical Systems, Tokyo, Japan), was used for quantitative analysis of CT images. The digital imaging and communications in medicine (DICOM) data of the CT scans of all patients were transferred to the workstation from the scanner. Whole lung extraction was automatically performed by eliminating the thoracic wall, mediastinum, large vessels, and tracheo-bronchial airway down to tertiary bronchi. The lung extraction process (according to vendor's data) uses both Hounsfield thresholding and anatomical knowledge-based algorithms.
An additional COVID-19 analysis dataset was added to the Synapse 3D workstation for analysis of COVID-19 pneumonia in CT images. The dataset consists of four density ranges:
• From −1024 to −950 HU (red): emphysema (low-attenuation areas, LAA).
• From −949 to −750 HU (yellow): healthy lung tissue.
• From −749 to −300 HU (blue): lung parts that are denser than healthy lung (high-attenuation areas, HAA); this range can be used to quantify ground-glass opacities.
• From −299 to +40 HU (violet): areas with a further increase in density, including semi-consolidation and consolidation.
For density-based quantitative analysis, 1.5 mm high-resolution slices were reconstructed with a sharp kernel. The lung analysis software of the workstation automatically generates a histogram of the density distribution of all voxels within the lung and calculates the mean of the distribution. In quantitative lung analysis, the following metrics were automatically extracted from the lung density histogram:
• The volume of each lung and the total volume of both lungs (TLV) in cubic centimeters (cc)
• The mean density of each lung and the mean density of both lungs (MLD) in HU
• The volume and percentage of GGOs (vol GGO and % GGO, respectively) and the volume and percentage of consolidation (vol Co and % Co, respectively) in both lungs
• The volumes and percentages of normal and hyperinflated parts of both lungs
The total volume of diseased lung was manually calculated as the sum of vol GGO and vol Co, and the total percentage of diseased lung, or total lesion load (TLL), representing the disease burden, was calculated as the sum of % GGO and % Co.
These numerical data were expressed as mean ± standard deviation (SD).
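A minimal NumPy sketch of how the four density ranges and the derived metrics (TLV, MLD, vol GGO, % GGO, vol Co, % Co, diseased volume and TLL) can be computed from the HU values of an already segmented lung. It is not the Synapse 3D implementation; the function name is hypothetical, and it assumes integer HU values and percentages expressed relative to all voxels in the lung mask.

```python
import numpy as np

# Density ranges of the COVID-19 analysis dataset (integer HU, inclusive).
# np.digitize (right=False) assigns edges[i-1] <= x < edges[i], so these edges
# reproduce -1024..-950, -949..-750, -749..-300 and -299..+40 HU.
EDGES = [-1024, -949, -749, -299, 41]
CLASSES = ("LAA", "healthy", "GGO", "Co")


def lung_density_metrics(lung_hu, voxel_volume_mm3):
    """Histogram-based metrics for the voxels inside a lung mask.

    lung_hu: HU values of every voxel inside the mask (any shape).
    voxel_volume_mm3: volume of one voxel (pixel area x slice interval).
    Assumption: voxels outside -1024..+40 HU count toward TLV but
    toward none of the four density classes.
    """
    hu = np.asarray(lung_hu, dtype=float).ravel()
    if hu.size == 0:
        raise ValueError("empty lung mask")
    voxel_cc = voxel_volume_mm3 / 1000.0   # 1 cc = 1000 mm^3
    codes = np.digitize(hu, EDGES)         # 1..4 = the four classes
    counts = np.bincount(codes, minlength=len(EDGES) + 1)[1:len(EDGES)]
    tlv = hu.size * voxel_cc
    metrics = {"TLV_cc": tlv, "MLD_HU": float(hu.mean())}
    for name, n in zip(CLASSES, counts):
        metrics[f"vol_{name}_cc"] = float(n * voxel_cc)
        metrics[f"pct_{name}"] = float(100.0 * n * voxel_cc / tlv)
    # Disease burden: diseased volume and total lesion load (TLL).
    metrics["vol_diseased_cc"] = metrics["vol_GGO_cc"] + metrics["vol_Co_cc"]
    metrics["TLL_pct"] = metrics["pct_GGO"] + metrics["pct_Co"]
    return metrics


demo = [-1000] + [-850] * 5 + [-500] * 3 + [-100]  # ten synthetic voxels
print(lung_density_metrics(demo, voxel_volume_mm3=1000.0))
```

In the synthetic example each voxel is 1 cc, so three GGO voxels and one consolidation voxel out of ten give a TLL of 40%. Computing TLL as % GGO plus % Co mirrors the summation described in the text.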

Statistical analysis
Statistical analysis was performed using MedCalc Statistical Software version 20 (MedCalc Software Ltd, Ostend, Belgium). Inter-observer agreement between the two radiologists who performed the semi-quantitative analysis of CT images was evaluated using the kappa test. Pearson correlation coefficients were calculated between the visual severity scoring and the automated quantitative measures, between the total disease burden calculated by quantitative CT analysis and the MLD, and between the % GGO and the MLD. A p value of less than 0.05 was considered statistically significant.

Results
The clinical and laboratory data of the patients included in the study are shown in Table 1. The CT features observed in the patients of the study were as follows (Table 2): most patients presented with ground-glass opacities (149 patients, 97.3%); alveolar consolidation was encountered in 67 patients (43.7%); crazy-paving pattern in 11 patients (16.8%); reversed halo sign in 4 patients (6.1%); and pleural effusion in 12 patients (18.2%). Both lungs were involved in most patients (145 patients, 94.8%), with multi-lobar affection (146 patients, 95.4%). The changes mainly affected the lower lobes in 103 patients (67.3%). The calculated quantitative metrics, including volumes, percentages and densities, are summarized in Table 3.
The inter-observer agreement (weighted kappa) of severity scoring between the two radiologists who performed the semi-quantitative evaluation of lung lesions was 0.7984 ± 0.0140, with a 95% confidence interval of 0.7708-0.8261. The concordance correlation coefficient between the two radiologists was 0.9615, with a 95% confidence interval of 0.9474-0.9719.
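The concordance correlation coefficient differs from Pearson's r in that it also penalizes systematic differences between the two raters. A minimal sketch of Lin's coefficient using population moments; the function name is hypothetical, it does not compute the confidence interval, and it is not MedCalc's implementation.

```python
import numpy as np


def lins_ccc(x, y):
    """Lin's concordance correlation coefficient between two raters.

    rho_c = 2*s_xy / (s_x^2 + s_y^2 + (mean_x - mean_y)^2),
    with population (1/n) variances and covariance.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape or x.size < 2:
        raise ValueError("need two equal-length samples")
    mx, my = x.mean(), y.mean()
    sxy = np.mean((x - mx) * (y - my))  # population covariance
    return 2 * sxy / (x.var() + y.var() + (mx - my) ** 2)
```

For example, scores [1, 2, 3] and [2, 3, 4] have a Pearson r of 1, but their CCC is 4/7 because one rater scores consistently one point higher.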
In this study, Spearman's correlation coefficient showed a strong positive correlation between the semi-quantitative CT severity scoring and the automated quantification method in all patients (r = 0.934, 95% confidence interval (CI) 0.9092-0.9512, p < 0.0001) (Fig. 1). There was also a strong positive correlation between the quantitatively derived disease burden and the MLD (r = 0.9544, 95% CI 0.9376-0.9667, p < 0.0001) (Fig. 2), and between the % GGO and the MLD (r = 0.9429, 95% CI 0.9222-0.9582, p < 0.0001) (Fig. 3).
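Confidence intervals for correlation coefficients are commonly obtained with Fisher's z-transformation. The sketch below assumes that method (the function name is hypothetical); with n = 153 it gives intervals that agree closely with those reported for the two MLD correlations, whereas applying it to Spearman's rho is only an approximation.

```python
import math
from statistics import NormalDist


def fisher_z_ci(r, n, level=0.95):
    """Approximate CI for a correlation coefficient r from n pairs.

    z = atanh(r) is roughly normal with standard error 1/sqrt(n - 3);
    the interval is built on the z scale and mapped back with tanh.
    """
    if not -1 < r < 1 or n <= 3:
        raise ValueError("need |r| < 1 and n > 3")
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    crit = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    return math.tanh(z - crit * se), math.tanh(z + crit * se)


for r in (0.9544, 0.9429):
    lo, hi = fisher_z_ci(r, 153)
    print(f"r = {r}: 95% CI {lo:.4f}-{hi:.4f}")
```

Because tanh compresses values near 1, the resulting interval is asymmetric around r, which is visible in the reported intervals.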
Representative quantitative findings in three cases of the study are shown in Figs. 4, 5 and 6, including axial HRCT images, 3D volume-rendered images of both lungs, density histograms and calculated metrics.

Discussion
The key to containment of the COVID-19 pandemic is early detection and early isolation [40]. CT plays an important role in COVID-19 diagnosis, monitoring, severity stratification, and evaluation of treatment response [6,41,42]. The overall CT picture of COVID-19 pneumonia depends on the severity of lung abnormalities and their distribution. The severity of COVID-19 pneumonia should be objectively stratified on the basis of quantitative data. The severity of lung affection is a critical metric in the treatment and prognosis of COVID-19 patients [19,25]. Severe abnormalities in lung CT at an early stage are suggestive of poor prognosis [40].
Fast, accurate, and reproducible quantitative analytical tools are especially needed for assessment of COVID-19 pneumonia in CT images because, in addition to being multi-focal, lung lesions often progress rapidly and change in pattern [33].
Semi-quantitative visual assessment of COVID-19 lesions is impractical in clinical routine because it is time-consuming, lacks reproducibility and suffers from inter-observer and even intra-observer variation. Objective assessment of the disease burden, expressed as the percentage of affected lung relative to the total lung volume, is a sensitive and specific metric for estimating disease progression and treatment response [15,34]. Computer-aided diagnosis (CAD) has become an important auxiliary diagnostic tool. Automated segmentation of lung lesions from volumetric 3D images allows calculation of the total burden of COVID-19 pneumonia as a percentage sub-volume of the total lung volume. Quantitative methods for determining the severity of lung affection in CT images of COVID-19 patients could improve diagnostic efficiency and reduce the workload of radiologists, allowing more timely and appropriate treatment decisions. Quantitative CT analysis provides a reproducible, fast, reliable and potentially predictive tool for assessing disease progression and response to treatment in COVID-19 pneumonia [43].
Another advantage of computer-assisted quantification of COVID-19 disease burden is its reproducibility, which allows more accurate comparison of data among different centers. Different patterns of lung abnormalities in COVID-19 pneumonia change the lung attenuation values and thus the shape of the lung density histogram. Histogram-based image analysis is the most basic and least computationally demanding segmentation technique; being simple, fast and reproducible, it is implemented in most PACS workstations [44].
Image thresholding is one of the most commonly used image segmentation techniques; it segments images according to the grayscale values in the image histogram. Multi-level thresholding CT densitometry techniques rely on choosing multiple cutoff values from the frequency distribution of lung attenuation values in the histogram, so that the pixels within an image are divided into multiple classes by multiple grayscale thresholds [45][46][47].
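This study used fixed, vendor-defined HU cutoffs. To illustrate how multiple cutoffs can instead be chosen from the histogram itself, the sketch below swaps in a different technique: a brute-force two-threshold version of Otsu's method, which picks the pair of thresholds that maximizes the between-class variance of the resulting three classes. The function name and bin count are assumptions.

```python
import numpy as np


def two_level_otsu(values, n_bins=128):
    """Choose two thresholds splitting a histogram into three classes
    by maximizing the between-class variance (multi-level Otsu)."""
    counts, edges = np.histogram(np.asarray(values, dtype=float), bins=n_bins)
    centers = (edges[:-1] + edges[1:]) / 2
    p = counts / counts.sum()
    cum_n = np.cumsum(counts)        # voxel counts, used to skip empty classes
    cum_p = np.cumsum(p)             # cumulative class weights
    cum_m = np.cumsum(p * centers)   # cumulative first moments
    mu = cum_m[-1]                   # global mean
    best_var, best = -1.0, None
    for i in range(n_bins - 2):              # class 0: bins 0..i
        for j in range(i + 1, n_bins - 1):   # class 1: bins i+1..j, class 2: the rest
            if cum_n[i] == 0 or cum_n[j] == cum_n[i] or cum_n[-1] == cum_n[j]:
                continue
            w = (cum_p[i], cum_p[j] - cum_p[i], cum_p[-1] - cum_p[j])
            m = (cum_m[i], cum_m[j] - cum_m[i], cum_m[-1] - cum_m[j])
            var_b = sum(wk * (mk / wk - mu) ** 2 for wk, mk in zip(w, m))
            if var_b > best_var:
                best_var, best = var_b, (edges[i + 1], edges[j + 1])
    return best


rng = np.random.default_rng(0)
hu = np.concatenate([rng.normal(-850, 30, 3000),   # synthetic aerated lung
                     rng.normal(-500, 40, 1000),   # synthetic GGO-like voxels
                     rng.normal(-100, 40, 500)])   # synthetic consolidation-like voxels
t1, t2 = two_level_otsu(hu)
labels = np.digitize(hu, [t1, t2])  # 0, 1 or 2 for each voxel
```

Data-driven thresholds adapt to each scan, whereas fixed HU ranges keep the classes comparable across patients and centers, which is one reason densitometry presets use fixed cutoffs.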
With the rapid increase of the infected population during the COVID-19 pandemic, conventional semi-quantitative CT scoring is challenging and impractical for overloaded radiology services.
In this study, the density-based multi-level thresholding technique was used to quantify COVID-19 lung affection in high-resolution, thin-section, volumetric whole-lung CT images. The findings of this study show that the extent of COVID-19 lesions visually scored by radiologists on HRCT images correlates significantly with the volumetric measurements obtained by the quantitative computer-assisted histogram-based automatic method.
The data obtained in this study show that semi-quantitative visual severity scoring and quantitative CT performed nearly equally, and their parameters correlated well with the clinical data of patients. Our findings indicate that the severity of COVID-19 pneumonia could be accurately stratified on the basis of objective CAD-based quantitative data of disease burden in CT images. This is consistent with the results of Li et al. [48], who described a high consistency between COVID-19 severity scores from visual semi-quantitative CT analysis and the clinical classification of COVID-19 pneumonia. In addition, Colombi et al. [24] reported a good correlation between the well-aerated lung volume and the patient's clinical outcome in COVID-19 pneumonia.
Lanza et al. [49] used density-based quantitative lung analysis to predict clinical outcome in COVID-19 pneumonia regarding the need for respiratory support and the risk of in-hospital mortality. They reported that the compromised lung volume was the most accurate metric in this regard. Bressem et al. [50] used density-based classification to assess the correlation between high-density lung volume (diseased lung) and the severity of COVID-19 pneumonia requiring intensive care unit (ICU) admission and assisted ventilation.
In a recent study, Salvatore et al. [51] used a computer-assisted density-based quantitative technique for automatic segmentation of CT images to calculate the volumes of GGOs, consolidation and the residual healthy lung in COVID-19 pneumonia. Their findings have shown that the results of these quantitative methods are good predictors of COVID-19 patient outcome.
In another recent study, Romanov et al. [52] used histogram-based analysis and HU thresholding to automatically extract CT imaging biomarkers in atypical pneumonia caused by COVID-19 and influenza viruses. The authors reported that the derived imaging biomarkers correlate with the clinical severity scale and the inflammatory laboratory markers.
Semi-quantitative scoring of the extent of COVID-19 pneumonia based on visual assessment has been shown to correlate with the duration of infection [15,52] as well as with disease severity [35,49]. In ARDS, a ratio of well-aerated lung volume to total lung capacity of less than 40% has been reported to be associated with a higher mortality risk [43].
Colombi et al. [24] used lung attenuation thresholds to quantify the well-aerated lung volume on admission CT images and stated that it can be used to predict the risk of adverse outcome in COVID-19 patients. The CT score of the diseased lung has been reported as a risk factor for mortality in ARDS [53].
Patients with COVID-19 lung affection frequently develop ARDS [54,55], which is the primary cause of death in COVID-19 pneumonia, especially in elderly patients with comorbidities [25].
In all cases of this study, regardless of clinical or radiological severity, the automatically extracted whole-lung attenuation histograms showed a blunted peak shifted to the right compared with normal lung in the presence of GGOs, whereas in the presence of alveolar consolidation they showed a high, sharp peak shifted further to the right than those of ground-glass opacities and normal lung. These findings are consistent with those described by Sumikawa et al. [44].
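The histogram changes described above can be summarized with a few standard descriptors: the position and height of the histogram peak, the mean, and the moment-based skewness and excess kurtosis. The sketch below is illustrative only; the function name, the 10 HU bin width and the synthetic "normal" and "GGO" samples are assumptions, not study data or the workstation's method. In this synthetic example the added GGO component lowers the peak and moves the mean and skewness toward higher attenuation.

```python
import numpy as np


def histogram_descriptors(lung_hu, hu_range=(-1024, 40), bin_width=10):
    """Peak position/height and moment-based shape of a lung density histogram."""
    hu = np.asarray(lung_hu, dtype=float).ravel()
    sd = hu.std()
    if sd == 0:
        raise ValueError("constant input has no histogram shape")
    n_bins = int(round((hu_range[1] - hu_range[0]) / bin_width))
    counts, edges = np.histogram(hu, bins=n_bins, range=hu_range)
    k = int(np.argmax(counts))
    centered = hu - hu.mean()
    return {
        "peak_HU": float((edges[k] + edges[k + 1]) / 2),   # histogram mode
        "peak_fraction": float(counts[k] / counts.sum()),  # peak height
        "mean_HU": float(hu.mean()),
        "skewness": float(np.mean(centered ** 3) / sd ** 3),
        "kurtosis": float(np.mean(centered ** 4) / sd ** 4 - 3.0),  # excess
    }


rng = np.random.default_rng(0)
normal_lung = rng.normal(-850, 40, 10_000)                   # synthetic, not study data
ggo_lung = np.concatenate([rng.normal(-850, 40, 6_000),      # residual aerated lung
                           rng.normal(-550, 120, 4_000)])    # synthetic GGO component
print(histogram_descriptors(normal_lung))
print(histogram_descriptors(ggo_lung))
```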

Conclusions
The results of this study show that the automated histogram-based quantification of COVID-19 disease burden is a rapid, reliable and reproducible method for objective estimation of lung affection in CT images. There is good correlation between the conventional semi-quantitative CT severity scoring and automated quantitative analysis methods. The automated quantitative methods are especially important in situations like the current COVID-19 pandemic in which radiology departments are overloaded with cases.
Some limitations exist in this study. First and foremost, disease progression was not assessed longitudinally in the enrolled patients. The study also included a relatively small number of patients.