Low-dose CT radiomics features-based neural networks predict lymphoma types

Fluorodeoxyglucose positron emission tomography (PET)–computed tomography (CT) is preferred for pretreatment staging and treatment planning in patients with lymphoma. This study aims to train and validate neural networks (NN) for predicting lymphoma types using low-dose CT radiomics. Only 119 radiomics features were stable in intraclass correlation coefficient and coefficient of variation analyses. Highly collinear features identified with the variance inflation factor were eliminated (n = 56). Twenty-four features were selected with least absolute shrinkage and selection operator regression for network training. The NN differentiating Hodgkin lymphoma from non-Hodgkin lymphoma had 75.76% predictive accuracy in the validation set and an area under the curve (AUC) of 0.73 (95% CI 0.55–0.91). The NN differentiating B-cell lymphoma from T-cell lymphoma had 78.79% predictive accuracy and an AUC of 0.81 (95% CI 0.63–0.99). In this study, which used the low-dose CT images of PET–CT scans, the neural network predictions were near the acceptable lower bound for discriminating Hodgkin from non-Hodgkin lymphoma and B-cell from T-cell lymphoma.


Background
Lymphomas are a heterogeneous group of malignancies affecting the lymphoid system [1]. Their imaging findings vary because of their different biological behaviors, their relatively low incidence, and the variety of organs they can involve. The final diagnosis is made through pathology. Tissue evaluation first differentiates Hodgkin lymphoma (HL) from non-Hodgkin lymphoma (NHL), and then B-cell lymphoma (BCL) from T-cell lymphoma (TCL). Characteristic translocations are subsequently investigated on the basis of cytogenetic and molecular evaluation [2]. F-18 fluorodeoxyglucose (FDG) positron emission tomography-computed tomography (PET-CT) is preferred for pretreatment staging and treatment planning in patients with lymphoma.
Radiomics is a developing tool that converts radiological images into multidimensional data suitable for data screening, which makes them more useful for diagnosis [3]. Using computer-assisted advanced techniques, it reveals hidden quantitative information in radiological images beyond what the naked eye can distinguish, including detailed quantitative tissue feature parameters [4]. Texture analysis is a statistical method for the quantitative evaluation of tissue images [5]. A radiomics workflow consists of acquiring consecutive images, defining the region of interest, preprocessing, and extracting and evaluating features [6].
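Texture analysis can be illustrated with the gray-level co-occurrence matrix (GLCM), one of the PyRadiomics feature classes. A GLCM counts how often pairs of gray levels occur at a fixed offset. The Python sketch below is a simplified illustration, not the PyRadiomics implementation. It assumes a single 2D array already discretized into integer gray levels and uses one horizontal offset. It computes only the GLCM contrast feature, and the function names are hypothetical.

```python
import numpy as np


def glcm(image: np.ndarray, levels: int, offset: tuple = (0, 1)) -> np.ndarray:
    """Symmetric, normalized gray-level co-occurrence matrix for a single offset.

    `image` must contain integer gray levels in the range [0, levels).
    """
    dr, dc = offset
    rows, cols = image.shape
    counts = np.zeros((levels, levels), dtype=float)
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r, c], image[r2, c2]] += 1
    counts = counts + counts.T  # count each neighbor pair in both directions
    total = counts.sum()
    return counts / total if total > 0 else counts


def glcm_contrast(p: np.ndarray) -> float:
    """GLCM contrast: sum over (i, j) of p(i, j) * (i - j)**2."""
    i, j = np.indices(p.shape)
    return float(np.sum(p * (i - j) ** 2))


checker = np.array([[0, 1], [1, 0]])  # horizontal neighbors always differ by one level
flat = np.ones((2, 2), dtype=int)     # horizontal neighbors never differ
print(glcm_contrast(glcm(checker, levels=2)))  # 1.0
print(glcm_contrast(glcm(flat, levels=2)))     # 0.0
```

A heterogeneous pattern yields high contrast and a uniform one yields zero. Texture features exploit this kind of difference, which is hard to quantify by eye.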
This study aims to train and validate neural networks (NN) for predicting lymphoma types using low-dose CT radiomics obtained from FDG PET-CT.

Ethical considerations
This retrospective model-development study was conducted after approval by the university's local ethics committee, which waived the requirement for written informed consent. The study was performed in accordance with the ethical standards laid down in the 1964 Declaration of Helsinki and its later amendments. The STARD statement was followed for reporting [7].

Patient selection
Of the 330 patients whose lymphoma diagnosis was confirmed by biopsy between January 2014 and June 2020, 110 patients who underwent an FDG PET-CT scan were included.
Two independent observers evaluated the low-dose CT data from the FDG PET-CT examinations with radiomics to differentiate HL from NHL and BCL from TCL. The pathology result of each case served as the "ground truth".
Vertex-to-upper-thigh imaging was performed on a Philips Gemini TF 64-slice PET-CT scanner after at least 8 h of fasting, using the FDG radiopharmaceutical and a CT setting of 60 mAs.

Inclusion criteria
Patients diagnosed with lymphoma at our institution between January 2014 and June 2020 who underwent PET-CT before treatment were included in the study, regardless of gender or age.

Exclusion criteria
The following patients were excluded from the study: patients whose FDG PET-CT was performed at another center, patients diagnosed at another center, patients with inaccessible imaging data or insufficient image quality, and patients with cutaneous lymphoma or without significant lymph node involvement suitable for segmentation. The exclusion criteria are presented in Fig. 1.

Segmentation of the target lymph nodes and extracting radiomics features
In each patient, the lymph node with the maximum standardized uptake value (SUVmax) was chosen as the target lymph node and segmented on each axial slice of the low-dose CT images. Target lymph nodes were involved cervical, axillary, paraaortic, mesenteric, pelvic, and inguinal nodes. Resampling (1.0 × 1.0 × 1.0 mm) and normalization (μ ± 3σ) were performed as described in the literature [8]. Two independent observers segmented the target lymph node on the axial slices using the "Segment Editor" module of 3D Slicer (open-source imaging software, Mac OS X version). The radiomics data were then obtained with "Radiomics", the 3D Slicer module of the "PyRadiomics" platform. All feature classes available in this module were included. Wavelet-based filters were applied so that every feature other than the shape features was computed from the original image and from eight wavelet-filtered images (HLL, LHL, LHH, LLH, HLH, HHH, HHL, and LLL). Each of the two observers recorded a total of 851 features from each segmentation.
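The total of 851 features per segmentation can be reconstructed arithmetically. Shape features are computed once from the mask. Every other class is computed on each of the nine image types: the original image plus the eight wavelet decompositions. The per-class counts below are an assumption based on the default-enabled features of recent PyRadiomics releases, and they can differ between versions and settings. The constant names are hypothetical.

```python
# Assumed default-enabled PyRadiomics feature counts (version-dependent).
SHAPE_3D_FEATURES = 14  # computed once per segmentation mask
PER_IMAGE_FEATURES = {  # computed on every image type
    "firstorder": 18,
    "glcm": 24,
    "glrlm": 16,
    "glszm": 16,
    "gldm": 14,
    "ngtdm": 5,
}
IMAGE_TYPES = ["original"] + [
    f"wavelet-{band}"
    for band in ("HLL", "LHL", "LHH", "LLH", "HLH", "HHH", "HHL", "LLL")
]

per_image = sum(PER_IMAGE_FEATURES.values())
total = SHAPE_3D_FEATURES + per_image * len(IMAGE_TYPES)
print(per_image, len(IMAGE_TYPES), total)  # 93 9 851
```

Under these assumptions, 14 + 93 × 9 = 851, which matches both the reported total and the 14 shape features whose reliability is reported in the results.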
The pathology data were binary-coded for Hodgkin lymphoma (positive/negative) and B-cell lymphoma (positive/negative).

Statistical evaluation
Statistical analyses were performed with Statistica (version 13.5.0; TIBCO Software, Palo Alto, CA) and 3D Slicer (open-source imaging software, version 4.10.2 for Mac OS X).

Inter-observer segmentation overlap evaluation
The Dice coefficient was used to analyze segmentation overlap. Overlap was rated "weak" if the Dice coefficient was below 0.50, "moderate" if it was 0.50–0.74, "good" if it was 0.75–0.89, and "excellent" if it was ≥ 0.90 [9].
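Below is a minimal NumPy sketch of the Dice coefficient, 2|A ∩ B| / (|A| + |B|), together with the agreement categories above. It assumes that both observers' segmentations are available as binary voxel masks on the same grid. The function names and the toy masks are hypothetical.

```python
import numpy as np


def dice(mask_a, mask_b) -> float:
    """Dice coefficient 2|A & B| / (|A| + |B|) for two binary masks."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    size_sum = int(a.sum() + b.sum())
    if size_sum == 0:
        return 1.0  # convention: two empty masks agree perfectly
    return float(2.0 * np.logical_and(a, b).sum() / size_sum)


def dice_category(d: float) -> str:
    """Agreement categories used in this study."""
    if d >= 0.90:
        return "excellent"
    if d >= 0.75:
        return "good"
    if d >= 0.50:
        return "moderate"
    return "weak"


# Two hypothetical 4-voxel masks that share 3 voxels.
observer_1 = np.zeros((4, 4), dtype=bool)
observer_1[0:2, 0:2] = True
observer_2 = observer_1.copy()
observer_2[0, 0] = False
observer_2[2, 0] = True
d = dice(observer_1, observer_2)
print(d, dice_category(d))  # 0.75 good
```

Under these categories, the mean Dice of 0.807 reported in the Fig. 2 caption falls in the "good" range.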

Radiomics feature stability evaluation
Stable features are repeatable and have low variance. In this study, repeatability at the segmentation level was tested with the Dice coefficient, and the inter-observer reliability of the radiomics features was evaluated with intraclass correlation coefficient (ICC) analysis [10]. Reliability was considered "good" if the ICC was 0.75–0.89 and "excellent" if it was > 0.89. Features with "good" or "excellent" reliability were retained, and features with "moderate" or "weak" reliability (ICC < 0.75) were eliminated [10]. Because low-variance radiomics features are more reproducible, features with a coefficient of variation (CoV) > 0.15 were also eliminated [11].
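The two stability filters can be sketched in Python with NumPy, using the thresholds stated above (ICC ≥ 0.75 and CoV ≤ 0.15). The text specifies neither the ICC form nor the exact CoV definition, so both are assumptions here. The ICC is taken as ICC(2,1) (two-way random effects, absolute agreement, single rater). The CoV is taken as the mean, over patients, of the standard deviation divided by the absolute mean of the two observers' values. All function names and data are hypothetical.

```python
import numpy as np


def icc_2_1(ratings) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `ratings` has shape (n_subjects, k_raters).
    """
    y = np.asarray(ratings, dtype=float)
    n, k = y.shape
    grand = y.mean()
    ss_rows = k * np.sum((y.mean(axis=1) - grand) ** 2)  # between subjects
    ss_cols = n * np.sum((y.mean(axis=0) - grand) ** 2)  # between raters
    ss_err = np.sum((y - grand) ** 2) - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return float((ms_rows - ms_err)
                 / (ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n))


def mean_within_patient_cov(values_1, values_2) -> float:
    """Mean over patients of SD / |mean| of the two observers' values (assumed CoV)."""
    pairs = np.column_stack([values_1, values_2]).astype(float)
    means = pairs.mean(axis=1)
    if np.any(means == 0):
        return float("inf")  # CoV undefined; treat the feature as unstable
    return float(np.mean(pairs.std(axis=1, ddof=1) / np.abs(means)))


def stable_features(obs_1, obs_2, names, icc_min=0.75, cov_max=0.15):
    """Keep features with ICC >= icc_min and CoV <= cov_max.

    obs_1, obs_2: arrays of shape (n_patients, n_features), one per observer.
    """
    kept = []
    for j, name in enumerate(names):
        icc = icc_2_1(np.column_stack([obs_1[:, j], obs_2[:, j]]))
        cov = mean_within_patient_cov(obs_1[:, j], obs_2[:, j])
        if icc >= icc_min and cov <= cov_max:
            kept.append(name)
    return kept


# Hypothetical values for 5 patients and 3 features.
obs_1 = np.array([[10, 10, 1], [20, 20, 2], [30, 30, 3], [40, 40, 4], [50, 50, 5]], float)
obs_2 = np.array([[10, 50, 2], [20, 40, 3], [30, 30, 4], [40, 20, 5], [50, 10, 6]], float)
print(stable_features(obs_1, obs_2, ["identical", "reversed", "offset"]))  # ['identical']
```

In this toy example, the "reversed" feature fails the ICC filter. The "offset" feature passes the ICC filter (about 0.83) but fails the CoV filter (about 0.25), which shows why the two filters are applied together.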

Radiomics feature selection process
Radiomics features found to be highly correlated in variance inflation factor (VIF) analysis were eliminated because of collinearity [12]. The remaining features were carried forward to the next step of the feature selection process.
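The VIF of feature j is 1 / (1 − R_j²), where R_j² comes from regressing feature j on all the other features. The text gives neither the VIF cutoff nor whether elimination was iterative. This NumPy sketch therefore assumes iterative removal of the feature with the largest VIF until all VIFs are at most 10. That cutoff is a common rule of thumb, and its use here is hypothetical, as are the function names and data.

```python
import numpy as np


def vif(X: np.ndarray) -> np.ndarray:
    """VIF_j = 1 / (1 - R_j^2), regressing column j on the others with an intercept."""
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
        out[j] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return out


def drop_collinear(X, names, threshold=10.0):
    """Iteratively remove the feature with the largest VIF until all VIFs <= threshold."""
    X = np.asarray(X, dtype=float)
    names = list(names)
    while X.shape[1] > 1:
        v = vif(X)
        worst = int(np.argmax(v))
        if v[worst] <= threshold:
            break
        X = np.delete(X, worst, axis=1)
        del names[worst]
    return X, names


rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
c = a + 0.01 * rng.normal(size=200)  # nearly a copy of feature "a"
_, kept = drop_collinear(np.column_stack([a, b, c]), ["a", "b", "c"])
print(kept)  # expected: "b" and exactly one of "a" and "c"
```

Removing one feature at a time matters because dropping a single member of a collinear pair usually brings the VIF of its partner back down.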
The least absolute shrinkage and selection operator (LASSO) method was proposed by Tibshirani [13]. LASSO was used to select the features most related to the outcome for the model. The data were analyzed separately for HL versus NHL and for BCL versus TCL with the LASSO Regression plugin. Fivefold cross-validation was used to tune the strength of the L1 regularization.
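The internals of the LASSO Regression plugin are not described. The NumPy sketch below is therefore only one way to implement the idea. It uses coordinate descent for least-squares LASSO on standardized features, with the binary outcome coded 0/1 (logistic LASSO is a common alternative). The penalty λ is chosen by fivefold cross-validated mean squared error. All names and data are hypothetical.

```python
import numpy as np


def soft_threshold(z: float, lam: float) -> float:
    return float(np.sign(z) * max(abs(z) - lam, 0.0))


def lasso_cd(X, y, lam, n_iter=500, tol=1e-8):
    """Coordinate descent for (1 / 2n) * ||y - X b||^2 + lam * ||b||_1.

    Assumes standardized columns of X and a centered y (no intercept).
    """
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = y.astype(float).copy()  # residual y - X b
    for _ in range(n_iter):
        max_step = 0.0
        for j in range(p):
            old = b[j]
            rho = X[:, j] @ r / n + col_sq[j] * old
            new = soft_threshold(rho, lam) / col_sq[j]
            if new != old:
                r -= X[:, j] * (new - old)
                b[j] = new
                max_step = max(max_step, abs(new - old))
        if max_step < tol:
            break
    return b


def fit_standardized(X, y, lam):
    """Standardize with training statistics, fit, and return (predict, coefficients)."""
    mu, sd = X.mean(axis=0), X.std(axis=0)
    y_mean = y.mean()
    b = lasso_cd((X - mu) / sd, y - y_mean, lam)
    return (lambda X_new: ((X_new - mu) / sd) @ b + y_mean), b


def cv_lasso(X, y, lambdas, k=5, seed=0):
    """Pick lambda by k-fold cross-validated MSE, then refit on all data."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    cv_mse = []
    for lam in lambdas:
        errs = []
        for fold in folds:
            train = np.setdiff1d(np.arange(len(y)), fold)
            predict, _ = fit_standardized(X[train], y[train], lam)
            errs.append(np.mean((y[fold] - predict(X[fold])) ** 2))
        cv_mse.append(np.mean(errs))
    best = lambdas[int(np.argmin(cv_mse))]
    _, coef = fit_standardized(X, y, best)
    return best, coef


# Hypothetical data: the binary outcome depends only on features 0 and 1.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 8))
y = (X[:, 0] - X[:, 1] + 0.3 * rng.normal(size=100) > 0).astype(float)
best_lam, coef = cv_lasso(X, y, np.logspace(-3, -0.5, 15))
print(best_lam, np.flatnonzero(coef))  # chosen lambda and nonzero (selected) feature indices
```

The soft-thresholding step sets weak coefficients exactly to zero, which is what makes LASSO a feature selector rather than only a shrinkage method.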

Artificial neural network structure
The radiomics features selected by LASSO regression were used to build neural networks for binary classification tasks (one-vs-all fashion). For each analysis, the software randomly split the data into three groups: "train" (60%), "test" (10%), and "validation" (30%). The test set was used for hyperparameter tuning with an early stopping algorithm, and the validation set was used as a holdout set. Multilayer perceptron (MLP) and radial basis function (RBF) networks were trained. These networks had two hidden layers and two bias neurons. The pipeline of the study is shown in Fig. 2. Graphs of coefficients against log lambda for feature selection with LASSO regression are provided in Additional files 1 and 2.
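Statistica performed the random sampling internally. The sketch below only illustrates a 60/10/30 split of patient identifiers using the Python standard library. The seed and function name are hypothetical, and rounding the set sizes down with integer division is an assumption.

```python
import random


def split_train_test_validation(patient_ids, seed=2020):
    """Randomly split IDs into train (60%), test (10%), and validation (about 30%) sets."""
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)
    n_train = len(ids) * 60 // 100
    n_test = len(ids) * 10 // 100
    train = ids[:n_train]
    test = ids[n_train:n_train + n_test]
    validation = ids[n_train + n_test:]  # the remainder
    return train, test, validation


train, test, validation = split_train_test_validation(range(110))
print(len(train), len(test), len(validation))  # 66 11 33
```

With the 110 included patients, this gives 66 training, 11 test, and 33 validation cases. These sizes are consistent with the reported accuracies: for example, 25 of 33 validation cases is 75.76%, and 10 of 11 test cases is 90.91%. The text itself does not state the exact set sizes.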

Demographic results
Of all cases in this study, 13 (11.8%) were children (< 18 years), 69 (62.7%) were adults (18–65 years), and 28 (25.5%) were elderly (> 65 years). The overall mean age was 50.75 ± 20.86 years; the mean age was 11.92 ± 2.81 years for children, 48.43 ± 12.51 years for adults, and 74.50 ± 5.87 years for the elderly. Regarding gender, 42 (38.2%) patients were female and 68 (61.8%) were male. The mean age was 48.33 ± 21.30 years for females and 52.25 ± 20.60 years for males. Detailed demographic data are presented in Table 1.
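The overall mean is the size-weighted average of the subgroup means, so the reported figures can be cross-checked. This small Python sketch uses only the counts and means stated above, and the function name is hypothetical. Standard deviations cannot be pooled this simply, because the between-group spread would also have to be included.

```python
def pooled_mean(groups):
    """Size-weighted mean of subgroup means; `groups` maps name -> (n, mean)."""
    n_total = sum(n for n, _ in groups.values())
    return sum(n * mean for n, mean in groups.values()) / n_total


by_age_group = {"children": (13, 11.92), "adults": (69, 48.43), "elderly": (28, 74.50)}
by_gender = {"female": (42, 48.33), "male": (68, 52.25)}
print(round(pooled_mean(by_age_group), 2))  # 50.75
print(round(pooled_mean(by_gender), 2))     # 50.75
```

Both groupings reproduce the reported overall mean age of 50.75 years.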

Patients' pathology results
Seventy-one patients (64.6%) were diagnosed with NHL and 39 (35.4%) with HL. Of the patients diagnosed with NHL, 60 (54.6%) had BCL and 11 (10%) had TCL. Of the HL cases, 18 (16.4%) were nodular sclerosis, 14 (12.7%) mixed cellularity, 2 (1.8%) lymphocyte-depleted, and only 1 (0.9%) lymphocyte-rich. The subtype of the remaining four patients was not specified in the pathology report (Table 2). All percentages are relative to the 110 included patients.
Fig. 1 Flowchart of the study. Patients who were eliminated from the study are presented with their reasons.
The inter-observer reliability of the 14 "shape" features ranged from 0.45 to 0.95. Only 33 (33.5%) of the features from the original image had "good" or "excellent" inter-observer reliability. Among the wavelet-filtered images, the proportion of features with "good" or "excellent" reliability was highest for the HLH filter (94%). Of the 851 radiomics features, 250 (29.38%) had an ICC < 0.75, which shows that most radiomics features were reproducible between observers. However, 482 of the remaining 601 features (56.64% of all 851 features) showed high variance in the CoV analysis and were eliminated.

Radiomics feature selection process results
After the ICC and CoV analyses, 119 stable radiomics features remained. Fifty-six of these were eliminated because of high collinearity. Finally, LASSO regression selected, from the remaining radiomics features, those most related to the outcome for each of the two classification tasks (Table 3).

Hodgkin/non-Hodgkin lymphoma type prediction results
Neural networks used to differentiate Hodgkin from non-Hodgkin lymphoma were trained with 22 selected radiomics features plus age and gender information. The selected network used the Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm and reached its most accurate results in the seventh cycle. Its error function was sum of squares ("SOS"), with "Logistic" hidden-layer activation and "Identity" output activation. Training set accuracy was 74.24%, and test set accuracy was 90.91%. In the validation set, the selected network had 75.76% predictive accuracy and an AUC of 0.73 (95% CI 0.55–0.91).
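The text does not state how the AUC and its 95% CI were computed. The sketch below shows one common approach. The AUC is computed as the Mann-Whitney probability that a randomly chosen positive case scores higher than a randomly chosen negative case, with ties counted as 0.5. The normal-approximation CI uses the Hanley-McNeil standard error. Both the method and the function names are assumptions, not necessarily what Statistica reported.

```python
import math


def auc_mann_whitney(labels, scores) -> float:
    """P(score of a positive case > score of a negative case), ties counted as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            wins += 1.0 if p > q else (0.5 if p == q else 0.0)
    return wins / (len(pos) * len(neg))


def hanley_mcneil_ci(auc: float, n_pos: int, n_neg: int, z: float = 1.96):
    """Normal-approximation CI using the Hanley-McNeil standard error of the AUC."""
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc ** 2 / (1.0 + auc)
    var = (auc * (1 - auc) + (n_pos - 1) * (q1 - auc ** 2)
           + (n_neg - 1) * (q2 - auc ** 2)) / (n_pos * n_neg)
    se = math.sqrt(max(var, 0.0))
    return max(0.0, auc - z * se), min(1.0, auc + z * se)


labels = [1, 1, 0, 0]
scores = [0.9, 0.2, 0.3, 0.1]  # hypothetical network outputs
auc = auc_mann_whitney(labels, scores)
print(auc)  # 0.75
print(hanley_mcneil_ci(auc, n_pos=2, n_neg=2))
```

With only about 33 validation cases, intervals computed this way are wide. This fits the width of the reported CIs, although the exact class counts in the validation sets are not given.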

B-cell/T-cell lymphoma type prediction results
Neural networks used to differentiate B-cell from T-cell lymphoma were trained with 22 selected radiomics features plus age and gender information. The selected network used the BFGS algorithm and reached its most accurate results in the fifth cycle. Its error function was "Entropy", with "Logistic" hidden-layer activation and "Softmax" output activation. Training set accuracy was 71.21%, and test set accuracy was 100%. In the validation set, the selected network had 78.79% predictive accuracy and an AUC of 0.81 (95% CI 0.63–0.99).
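The two selected networks differ in their error and output-activation functions. This NumPy sketch shows only a forward pass through a logistic hidden layer, followed by one of two options. The first is an identity output scored with a sum-of-squares (SOS) error, and the second is a softmax output scored with cross-entropy. Several details are illustrative assumptions: the single hidden layer of 8 units, the random weights, and the 24 inputs (22 radiomics features plus age and gender). Training, which Statistica performed with BFGS, is not shown.

```python
import numpy as np


def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def forward(X, W1, b1, W2, b2, output="identity"):
    """One logistic hidden layer, then an 'identity' or 'softmax' output activation."""
    hidden = logistic(X @ W1 + b1)  # b1 acts as the hidden-layer bias
    out = hidden @ W2 + b2          # b2 acts as the output-layer bias
    if output == "identity":
        return out
    if output == "softmax":
        return softmax(out)
    raise ValueError(f"unknown output activation: {output}")


def sos_error(targets, outputs) -> float:
    """Sum-of-squares error, paired with the identity output."""
    return float(np.sum((targets - outputs) ** 2))


def cross_entropy_error(targets, probs, eps=1e-12) -> float:
    """Cross-entropy error, paired with the softmax output."""
    return float(-np.sum(targets * np.log(probs + eps)))


rng = np.random.default_rng(0)
X = rng.normal(size=(5, 24))            # hypothetical standardized inputs
W1, b1 = rng.normal(scale=0.1, size=(24, 8)), np.zeros(8)
W2, b2 = rng.normal(scale=0.1, size=(8, 2)), np.zeros(2)
T = np.eye(2)[[0, 1, 0, 1, 1]]          # one-hot targets for two classes

print(sos_error(T, forward(X, W1, b1, W2, b2, "identity")))
print(cross_entropy_error(T, forward(X, W1, b1, W2, b2, "softmax")))
```

A softmax output with cross-entropy yields class probabilities that sum to 1. An identity output with SOS treats the task as regression on the one-hot targets. Both are standard pairings for classification networks.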

Discussion
In this study, we investigated whether HL could be differentiated from NHL, and BCL from TCL, with radiomics analysis of the low-dose CT images from FDG PET-CT. According to our findings, artificial neural networks could be applied to predict lymphoma type. We do not expect these and similar models to replace biopsy any time soon. Instead, such models may help triage patients under an increasing workload and provide a supportive second opinion for differential diagnosis.
A prior study investigated the role of mean density values in patients with lymphoma who underwent PET-CT [14]. All cases had a histopathological diagnosis and were classified as HL or NHL. The density of malignant lesions was statistically significantly higher than that of benign lesions, and 20 HU was determined as the cut-off for differentiating malignant from benign lesions [14]. This study showed that the mean density values of segmentations are not accurate, reliable, and reproducible enough. Similar to their approach, we used CT-based features; however, we used radiomics features, because radiomics enables sophisticated image analysis that provides additional diagnostic and prognostic power [15].
Parvez et al. investigated the effects of metabolic tumor parameters and radiomics features on treatment response and survival in aggressive BCL on FDG PET-CT [16]. They included PET-CT images acquired for BCL staging and examined the effects of whole-body metabolic tumor volume and radiomics features on disease-free and overall survival. Whole-body tumor volume correlated with treatment response. Texture features were insufficient for predicting treatment response but could be used successfully to predict residual mass and survival [16]. Therapy response and lymphoma outcomes can be evaluated with PET/CT [17]. Milgrom et al. investigated whether PET radiomics could predict refractory mediastinal HL [18]. They evaluated the PET images of patients with stage 1 and 2 HL, and variables such as metabolic tumor volume and total lesion glycolysis were also evaluated with a machine learning model. Using the five factors with the highest predictive power, the model predicted refractory disease with an AUC of 95.2%. Such a model could enable personalized treatment planning by stratifying early-stage HL. Their study used PET radiomics, and the evaluation included parameters related to FDG uptake [18]. Unlike the aforementioned studies, our study aimed to predict the pathological diagnosis from CT radiomics. In summary, this study showed that artificial neural networks could succeed using radiomics features alone, and that adding age information could enable good predictions. It also showed that radiomics features-based models could be used successfully for BCL-TCL differentiation.
In 2020, Ou et al. investigated whether SUV and radiomics features on PET-CT images could differentiate breast cancer from breast lymphoma [18]. Their results showed that SUV and radiomics features from PET images could differentiate breast cancer from lymphoma, with AUC values of 0.806 and 0.891. Few studies have addressed this topic, and published radiomics studies have focused on differentiating lymphoma from other pathologies, including breast cancer and glioblastoma [17,19]. Mayerhoefer et al. reported two findings from functional imaging with radiomics features in lymphoma [20]. First, central nervous system lymphoma could be differentiated from other tumors such as glioblastoma. Second, prognostic predictions could be made. They noted that image reconstruction, post-processing, and segmentation must be standardized before studies on this subject can be compared, and that the subject remains open to investigation. Lymphomas are very heterogeneous diseases in terms of radiomics features. Even so, our study showed that CT radiomics features from their PET-CT images could be used successfully to differentiate HL from NHL and BCL from TCL. Because studies on this subject are limited, the correlation of radiomics features with histological subgroups or molecular markers can be investigated in larger patient groups.
One recent study on the diagnostic use of radiomics was the 2020 meta-analysis by Wang et al. [17]. It showed that radiomics could serve as a diagnostic and prognostic indicator for lymphoma. It also emphasized the need for well-designed studies that better evaluate lesion selection, segmentation, and the effect of pathological patterns [17].

Study limitations
The main limitation of our study was the unequal distribution of HL, NHL, and their subtypes. Further limitations were that PET data were not evaluated with radiomics and that the number of patients was relatively small, particularly because the data were divided into three sets (training 60%, test 10%, and validation 30%). A larger sample may yield more generalizable results. In addition, patients were not categorized by body weight, which may have affected the outcomes by altering the received dose.

Conclusions
Histopathological examination is still considered the most valuable method for diagnosing lymphoma. HL-NHL and BCL-TCL differentiation showed acceptable performance with low-dose CT radiomics features from FDG PET-CT imaging. These tools will not replace biopsy in the near future; however, they may be used to determine a patient's priority for diagnosis, reporting, and treatment.

Fig. 2 The pipeline of the study. Among all lymph nodes in the whole body, the one with the highest SUVmax value was selected. Two observers segmented the lesions. Inter-observer agreement of the segmentations was evaluated with the Dice coefficient (mean Dice: 0.807). Inter-observer agreement of the features was evaluated with the ICC, and features with ICC < 0.75 were eliminated. LASSO regression was then performed for the final selection of features. Neural networks were trained and validated.