Discrimination between phyllodes tumor and fibro-adenoma: Does artificial intelligence-aided mammograms have an impact?

The inclusion of artificial intelligence (AI) has recently been considered in the work-up for the detection and diagnosis of breast cancer, through algorithms that could supply a diagnosis as the radiologist does. Unlike standard programming, which requires clear step-by-step instructions, the algorithm learns from a supervised and continuous input of large and new data sets. The aim of this study is to assess the ability of AI-scanned mammograms to aid ultrasound in the discrimination between phyllodes tumors and fibro-adenomas. This retrospective analysis included 374 proven phyllodes tumors (PT) and fibro-adenomas (FA). Digital mammography and breast ultrasound were performed for all cases, and each breast was given a “Breast Imaging Reporting and Data System” (BI-RADS) score. Included mammograms were scanned by AI, which produced a qualitative heatmap and a quantitative abnormality score expressed as a percentage of suspicion. The study included 164 PT (43.9%) and 210 FA (56.1%). BI-RADS category 2 was assigned in 40.1%, category 3 in 38.2%, category 4 in 18.5% and category 5 in 3.2%, with median AI abnormality scores of 23%, 44%, 65% and 90%, respectively. Sensitivity and specificity of conventional imaging were 59.2% and 75.8%, respectively. An AI abnormality score cut-off of 49.5% raised the sensitivity to 89.6% and the specificity to 94.8% in discriminating PT from FA masses. Artificial intelligence-aided mammograms could be used as a method to distinguish PT from FA detected on sono-mammogram. The color hue and the quantified abnormality scoring percentage could be used as a one-setting method for specification and so guide clinicians in their decision between conservative management and the choice of surgical procedure.


Background
It has been reported that phyllodes tumors are usually benign and consist of the stromal component and the epithelial component that may progress to malignancy [1].
There are histopathological similarities between fibroadenomas and benign and malignant phyllodes tumors; moreover, there is no pathognomonic radiological finding that distinguishes them [2].
Fibro-adenomas can be followed without the need for further work-up or be subjected to simple removal, but phyllodes tumors must be surgically removed with wide adequate margins to avoid local recurrence. The rates of local recurrence of the benign and borderline subtypes of phyllodes tumors are 20% and more than 25%, respectively, and approximately 25% of malignant phyllodes tumors metastasize [3].
Recently, advances in deep learning have made a valuable contribution to improving the performance of digital mammography in breast imaging [4].
The goal from introducing technology in medicine was to make algorithms able to make decisions like physicians [5].
The role of artificial intelligence (AI) in the work up for the detection of breast cancer is becoming clear [6].
Artificial intelligence aims to make computers and computer programs smart enough to imitate the behavior of the human mind. In machine learning, continuous exposure to new data improves and modulates the algorithm, so there is no need for explicit programming [7].
This article discusses the feasibility of using artificial intelligence (AI) as a supportive tool to discriminate between phyllodes tumor and fibro-adenoma clearly detected at conventional imaging (mammogram aided by ultrasound), and so guide physicians in their choice of the proper surgical procedure.

Patients
The current work evaluated a total of 374 masses in 350 female patients (a single breast was involved in 326 patients and the remaining 24 patients presented with bilateral breast involvement) over the period from August 2019 to July 2021. Included mammograms were supported by ultrasound assessment, and the masses proved to be phyllodes tumors (PT) in 164 lesions and fibro-adenomas (FA) in 210 lesions. Masses were palpable in 332 (88.8%) and non-palpable in the remaining 42 masses. Each mammogram included in the study presented one mass.
The study is a retrospective analysis that was approved by the ethics committee, which waived the requirement for informed consent from the included patients. The age of the included patients ranged between 30 and 54 years, with a mean age of 42.2 ± 6.6 (SD) years.

Inclusion criteria
Solid (purely solid or complex) breast masses proved by tissue core biopsy (14 G needle) and surgery to be either phyllodes tumors or fibro-adenomas, and clearly detected by the combined evaluation of mammogram and ultrasound.
Patients with benign or probably benign looking masses underwent biopsy and surgery at the request of the patient, for cosmetic reasons, or because of a recommendation of the referring physician to guide the type of excision.

Exclusion criteria
Masses that were shown only on the ultrasound and not obvious on the mammograms.
Masses with no available abnormality scoring percentage on mammograms.
Diagnosed tumors without pathological documentation.
No clear definitions of benign and malignant phyllodes in the pathology report.

Methods
All patients were imaged with a digital mammography machine (Amulet Innovality, Fujifilm Global Company, Japan), and the traditional cranio-caudal and medio-lateral oblique views were taken per breast.
Since some masses showed obscured borders or were not obviously seen on the mammogram, ultrasound examinations (HS60 Samsung ultrasound, Korea, 2019) were performed to support (confirm or upgrade) the "Breast Imaging Reporting and Data System" (BI-RADS) category for the detected masses.
Artificial intelligence software (Lunit Insight, FDA approved, version 2019), which is provided with an established algorithm to read mammogram images, scanned the included mammograms.

Image analysis
Mammograms were interpreted in individual sessions by three radiologists who were blinded to the pathology of the included masses. Two primary evaluators (each with 20 years of experience in breast imaging) assessed the mammograms and performed the complementary ultrasound. Afterwards, a third reader (35 years of experience) re-interpreted the mammograms to settle the decision in case of disagreement or to confirm the diagnosis in case of matched opinions.
Morphology descriptors of the included masses as elicited by the conventional imaging (digital mammogram and complementary ultrasound) were size, margin (circumscribed, lobulated, indistinct), shape (rounded/oval, irregular), internal structure (purely solid, complex with few cystic spaces, calcifications).
Following the ACR BI-RADS® Atlas, Breast Imaging Reporting and Data System [8], the included masses were assigned BI-RADS categories 2, 3, 4 or 5 based on the combined analysis of the digital mammogram and the hand-held ultrasound.
According to Plaza et al. [9], breast masses that displayed one or more of the following features were characteristic of PT rather than FA: large size (> 4 cm), lobulated margin, irregular shape, heterogeneous texture, presence of cystic spaces and absence of microcalcifications.
Mammograms were rescanned by the AI software, whose algorithm was developed based on deep convolutional neural networks (CNNs). ResNet-34, one of the most popular CNN architectures, was used as the backbone network. The algorithm training consists of two stages: patch-level training from scratch for learning low-level features (stage 1), followed by image-level fine-tuning from the stage-1 model for learning high-level context (stage 2).
Batch-instance normalization was performed to overcome variance in pixel-level characteristics, and a deconvolution module was additionally adopted to decrease the rate of false positives.
For an input mammogram image (i.e., one of the four views), the AI algorithm provided: (I) A qualitative parameter: pixel-level abnormality scores displayed as a "heatmap", where a color hue overlaid the masses (i.e., abnormalities) detected on the mammogram images. The colors ranged from cold colors such as blue or green for lesions with low suspicion to hot colors such as yellow, orange, and red for masses with high suspicion. (II) A quantitative parameter: abnormalities detected by the AI software on the mammogram were assigned a numerical value that represents the degree of confidence for the suspicion. The abnormality scores are floating-point values between 0 and 1, which are displayed as auto-applied percentages ranging from 1 to 100% (100% being the upper limit of suspicion).
The AI software used (Lunit INSIGHT MMG) provides four-view heatmaps and an abnormality score per breast (i.e., the maximum of the craniocaudal and mediolateral oblique abnormality scores) for each input mammogram. The performance of the AI algorithm was supervised and maintained by a specialist in AI informatics (10 years of experience).
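The per-breast score described above can be illustrated with a minimal Python sketch. It assumes only what the text states: each view carries a score between 0 and 1, the breast score is the maximum of the craniocaudal (CC) and mediolateral oblique (MLO) scores, and the result is shown as a percentage. The function name and the example values are hypothetical and are not part of the vendor software.

```python
from typing import Mapping


def breast_score_percent(view_scores: Mapping[str, float]) -> float:
    """Return the per-breast abnormality score as a percentage.

    view_scores maps a view name (e.g. "CC", "MLO") to a score in [0, 1].
    """
    for view, score in view_scores.items():
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"score for {view} must lie in [0, 1], got {score}")
    # The per-breast score is the maximum over the available views.
    return round(100.0 * max(view_scores.values()), 1)


# Hypothetical view scores for one breast (not taken from the study data).
example = {"CC": 0.42, "MLO": 0.56}
print(breast_score_percent(example))  # 56.0
```

Taking the maximum means that a lesion seen clearly in only one view still drives the score for that breast.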

Statistical analysis
Data were coded and entered using the statistical package SPSS (Statistical Package for the Social Sciences) version 26 (IBM Corp., Armonk, NY, USA). Data were summarized using mean, standard deviation, minimum and maximum for quantitative data, and using frequency (count) and relative frequency (percentage) for categorical data. Standard diagnostic indices including sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV) and diagnostic efficacy were calculated.
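For reference, the standard definitions of these indices are given below in terms of true positives (TP), false positives (FP), true negatives (TN) and false negatives (FN), with PT taken as the positive class. Reading "diagnostic efficacy" as overall accuracy is an assumption, since the text does not define the term.

```latex
\begin{align}
\text{Sensitivity} &= \frac{TP}{TP + FN}, &
\text{Specificity} &= \frac{TN}{TN + FP}, \\
\text{PPV} &= \frac{TP}{TP + FP}, &
\text{NPV} &= \frac{TN}{TN + FN}, \\
\text{Accuracy} &= \frac{TP + TN}{TP + TN + FP + FN}.
\end{align}
```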
The statistical dispersion was measured by the interquartile range to enhance the accuracy of the dataset statistics by dropping lower-contribution, outlying points. For comparing categorical data, the Chi-square test was performed and the P value was obtained; values less than 0.05 were considered statistically significant. The interobserver variability was measured using kappa indices (to quantify the measurement error intrinsic to between-observer differences).
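As an illustration of the agreement measure, the following minimal sketch computes Cohen's kappa for two readers. The text says only "kappa indices", so the two-reader Cohen form is an assumption, and the ratings below are invented for the example, not study data.

```python
from collections import Counter
from typing import Sequence


def cohen_kappa(reader_a: Sequence[str], reader_b: Sequence[str]) -> float:
    """Cohen's kappa for two readers rating the same cases."""
    if len(reader_a) != len(reader_b) or not reader_a:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(reader_a)
    # Observed agreement: fraction of cases with identical ratings.
    observed = sum(a == b for a, b in zip(reader_a, reader_b)) / n
    # Chance agreement from each reader's marginal category frequencies.
    count_a = Counter(reader_a)
    count_b = Counter(reader_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both readers used one identical category throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical ratings of eight masses by two readers.
a = ["PT", "PT", "FA", "FA", "PT", "FA", "FA", "FA"]
b = ["PT", "FA", "FA", "FA", "PT", "FA", "PT", "FA"]
print(round(cohen_kappa(a, b), 3))  # 0.467
```

Kappa corrects raw agreement for the agreement expected by chance, which is why 75% raw agreement in this example yields a kappa below 0.5.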
The cut-off value for the abnormality scoring percentage was determined with the aid of the "receiver operating characteristic" (ROC) curve. The confidence interval (CI %) was calculated for the range of the abnormality scoring values elicited by the AI software; the narrower the interval (between the upper and lower values), the more precise the AI estimate.
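The text does not state which point on the ROC curve was taken as the cut-off. The sketch below assumes one common choice, the threshold that maximizes Youden's J (sensitivity + specificity - 1), and uses invented scores purely to show the mechanics. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def roc_points(
    scores: Sequence[float], labels: Sequence[int]
) -> List[Tuple[float, float, float]]:
    """Return (threshold, sensitivity, specificity) per candidate cut-off.

    A case is called positive (PT) when its score is >= threshold.
    labels use 1 for PT and 0 for FA.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present")
    points = []
    for threshold in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
        points.append((threshold, tp / positives, tn / negatives))
    return points


def best_cutoff(
    scores: Sequence[float], labels: Sequence[int]
) -> Tuple[float, float, float]:
    """Pick the threshold with the largest Youden's J (assumed criterion)."""
    return max(roc_points(scores, labels), key=lambda p: p[1] + p[2] - 1.0)


# Invented abnormality scores (%) for six FA (0) and six PT (1) masses.
scores = [12, 23, 31, 44, 47, 52, 38, 56, 65, 71, 90, 100]
labels = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
threshold, sens, spec = best_cutoff(scores, labels)
print(threshold, round(sens, 3), round(spec, 3))  # 56 0.833 1.0
```

Other criteria, such as the point closest to the top-left corner of the ROC plot, can select a different threshold on the same data.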

Results
The mean size of the proven PT and FA masses was 3.55 cm (min. 0.60 cm, max. 10.00 cm) at surgery and 2.94 cm (min. 0.70 cm, max. 11.3 cm) on ultrasound.
Since the resultant AI image is an overlay scan of the mammogram, actual estimation of the lesion size was not possible with the AI system. The lesion size was therefore presented only with regard to conventional imaging and was not considered in the correlation between the breast imaging tools used and the AI-scanned mammograms.
PT comprised 43.9% and FA 56.1% of the studied masses. The included phyllodes tumors were benign in 152 masses, and the remaining twelve masses were malignant tumors (n = 12/374; 3.2%).
Suspicious morphology descriptors occurred with similar frequency in the PT and FA masses. These descriptors included the irregular margin and the indistinct shape. Irregular margin was more common in PT (34.7%) than in FA (29.5%), while indistinct shape was more frequently seen with FA (45%) than with PT (42%).
The heterogeneous texture and the complex appearance made up the internal features of 82.3% of the PT and only 42.8% of the FA masses. The various morphology descriptors of the PT and FA masses on conventional imaging (mammogram and ultrasound) are displayed in Table 1.
Masses were diagnosed on an imaging basis as likely benign in 150 masses (40.1%), probably benign in 143 masses (38.2%), probably malignant in 69 masses (18.5%) and likely malignant in 12 masses (3.2%). The median value of the AI abnormality scoring for each category was calculated. Table 2 displays the correlation between the BI-RADS category of the included PT and FA masses, as evaluated by mammogram and ultrasound, and the AI-assigned abnormality scoring percentage.
Reliability of the radiographic diagnosis of the included masses was better between readers when AI informatics was added to the sonomammogram in the evaluation (kappa = 0.59, range between 0.52 and 0.66) than when readers assessed masses only by mammogram and ultrasound (kappa = 0.93, range between 0.89 and 0.97).
According to the morphologic descriptors noted at the combined mammogram and ultrasound, readers diagnosed 277 FA and 97 PT masses. Resultant sensitivity and specificity of the conventional imaging to discriminate PT from FA masses were 59.2% and 75.8% respectively.
To determine a cut-off value for the abnormality scoring percentage that can suggest PT versus FA breast masses, the ROC curve was used. An AI abnormality score of 49.5% yielded a sensitivity of 89.6% and a specificity of 94.8% in the discrimination of PT from FA.
Since PT is the entity of greater concern and requires wide excision surgery, unlike FA, which could be followed up, PT masses were considered the positive cases. Table 3 shows the ability of the AI abnormality scoring to discriminate PT from FA in the studied masses.
In the current work, AI displayed 17 false negative and 11 false positive masses when correlated with the pathology outcome.
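As a consistency check, the reported sensitivity and specificity can be reproduced from the counts given in the text (164 PT, 210 FA, 17 false negatives, 11 false positives). The short calculation below is a sketch. The PPV, NPV and accuracy lines are derived here from the same counts and are not quoted from the paper's tables.

```python
# Counts reported in the text, with PT as the positive class.
n_pt, n_fa = 164, 210
false_negatives, false_positives = 17, 11

true_positives = n_pt - false_negatives   # 147 PT correctly flagged
true_negatives = n_fa - false_positives   # 199 FA correctly cleared

sensitivity = true_positives / n_pt
specificity = true_negatives / n_fa
# Derived from the counts above, not quoted from the paper:
ppv = true_positives / (true_positives + false_positives)
npv = true_negatives / (true_negatives + false_negatives)
accuracy = (true_positives + true_negatives) / (n_pt + n_fa)

print(f"sensitivity = {sensitivity:.1%}")  # 89.6%
print(f"specificity = {specificity:.1%}")  # 94.8%
print(f"PPV = {ppv:.1%}, NPV = {npv:.1%}, accuracy = {accuracy:.1%}")
```

The first two values match the 89.6% and 94.8% reported for the 49.5% cut-off.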

Discussion
Phyllodes tumors and fibro-adenomas are two benign breast pathologies that have a very close histology at the beginning, yet their patterns of growth later diverge. PT show more positivity for proliferating cell nuclear antigen, Ki-67 and androgen receptors than FA. Moreover, PT have a perforated capsule with finger-like projections, which is why a wide excision surgical procedure is required, unlike FA, where only simple excision may be needed [10]. Even with the aid of ultrasound, distinguishing the two from each other often presents a dilemma unless there is a history of rapid increase in the size of the mass.
In the current work, the performance of artificial intelligence was studied with regard to the diagnosis and discrimination of PT and FA that presented as masses on mammograms. Ultrasound evaluation was included since it is the descriptive modality that in most situations provides the proper distinction between the various breast pathologies, and since the margins of the masses may sometimes be obscured by the glandular tissue on mammograms.
Previous studies that had focused on the mammograms and AI were concerned mainly with the performance of the AI as a stand-alone screening strategy or as a complementary reading tool to mammogram for the detection of breast cancer in the screening practice not as a tool of disease discrimination [11].
A recent study by Mansour et al. [12] examined the diagnostic performance of AI-scanned mammograms in correlation with the traditionally used conventional breast imaging modalities (mammogram and ultrasound) with regard to the different breast entities. Another study assessed the impact of ultrasound artificial intelligence on the differentiation between benign and malignant breast lesions of BI-RADS 4A [13].
Also, there was specific work that considered the potential role of artificial intelligence in the distinction between phyllodes and fibro-adenoma using AI-aided ultrasound [14] or whole-slide images of core biopsies [15].
The current work was also concerned with the differentiation between phyllodes and fibro-adenoma; however, it is a leading work in discussing the assessment using AI-scanned mammograms. The mammograms were re-scanned by AI, which applied a focal color to target the breast mass (already proved as phyllodes or fibro-adenoma) on the mammogram, supported by an auto-applied abnormality scoring percentage for these masses. Then, a correlation between this numerical estimate (i.e., the abnormality scoring percentage) and the pathology results was performed.

[Figure caption, opening not recovered] There is a left breast upper inner focal rounded dense mass (arrow) that showed associated secondary signs suspicious of malignancy: increased density and coarsening of the surrounding parenchyma (suggestive of peri-tumoral edema) and focal areolar dermal thickening. B Tomosynthesis slices of the left breast (MLO view) showed that the mass had a circumscribed margin (circle), yet there is a related thickened Cooper's ligament (arrow). C Ultrasound images displayed a large, about 4 cm, solid mass with irregular margin, and the ipsilateral axilla displayed an average-sized, likely reactive lymph node (white circle). D AI-scanned mammogram image marked the left breast mass in a heatmap by a faint blue/green colour hue and displayed an abnormality score of 56% suspicion of malignancy. The mass was diagnosed as BI-RADS 4c on conventional breast imaging and malignancy was highly suspected. It proved to be a benign phyllodes tumor by tissue core biopsy, and the suspicious secondary signs (focal tissue edema, thickened Cooper's ligament and dermal thickening) were likely due to inflammatory changes from the rapid rate of growth. An AI abnormality score of more than 49.5% (i.e., 56%) suggested phyllodes rather than fibro-adenoma, and a benign rather than a malignant phyllodes since the value was less than 90%.
Duman et al. [3] suggested that FA could be differentiated from PT by the shape of the tumor, while other groups found no significant difference in shape between the two tumors, which tend to be oval or irregular; phyllodes tumors may grow more rapidly than fibroadenomas on follow-up ultrasonography, but they cannot be reliably differentiated by imaging [14,16,17].
In this work, PT presented mainly with the rounded / oval shape pattern (61% versus 55% for FA), while the irregular shape went more with the FA (45% versus 39% for PT).
Lee et al. [18] reported, in accordance with most findings in the literature, that PT presented predominantly with circumscribed margins. This is in concordance with the present work, as circumscribed margins were more common in PT (75.6% versus 61.9% for FA). However, these findings disagree with Duman et al. [3], who stated that circumscribed margins were significantly more common in FA than in PT.
In the study by Wiratkapun et al. [19], 85% of the included PT were complex/heterogeneous masses, as reported in other studies where FA commonly presented with homogeneous texture [3,19]. The current study also showed that FA was mainly homogeneous (57.2% versus 17.7% for PT). Many benign breast diseases show irregular hypoechoic masses that can mimic carcinoma on ultrasound [20].
Even when masses displayed an irregular shape, a low abnormality scoring percentage on the AI-scanned mammograms favors the benign nature of the mass, so follow-up could be recommended to the patient rather than biopsy or unnecessary surgical removal, Fig. 3.
PT commonly presents as a rapidly growing mass that could be associated with significant painful erythema and warmth of the overlying breast skin [21]. This is a very misleading feature that can delay the diagnosis and sometimes may suggest malignant pathology. However, the diagnosis of PT rather than FA was readily made in the current work when the abnormality score exceeded the cut-off value of 49.5%, Figs. 1 and 4.
Such AI-related probability had a positive impact, raising the sensitivity and the specificity of conventional breast imaging from 59.2% and 75.8% to 89.6% and 94.8%, respectively. Large tumor size at presentation or rapid growth raises the suspicion of a phyllodes tumor rather than a fibroadenoma [22]. Fibro-adenoma could show large size as well, which is usually encountered in pregnant or lactating women, as its growth is associated with an increase in estrogen, progesterone, and prolactin; in this situation it is termed giant fibro-adenoma. Giant fibro-adenomas could overlap with borderline or malignant phyllodes [23].
In the current study, the abnormality scoring of the AI algorithm helped the accurate diagnosis of large masses of more than 5 cm as PT or FA, Figs. 1 and 4.
Benign, borderline, and malignant PT of the breast have similar imaging features; some MRI findings can be used to determine the risk of malignancy, including non-circumscribed margins, peri-tumor edema, and low signal intensity on T2-weighted images. On the basis of conventional breast imaging, this task is a challenge [24].
A study performed in 2012 by Dheeba and Selvi [25] showed one of the highest sensitivities (96.9%) and specificities (92.9%) for the proposed AI algorithm in the detection of cancer in mammograms. In 2021, the study by Mansour and co-authors [12] showed a sensitivity of 96.8% and a specificity of 90.1% in the discrimination between benign and malignant breast lesions.
Distinction of the malignant PT masses included in this study was possible in 100% of these masses (n = 12/12) with the aid of the AI-mammogram combination. These masses presented on mammogram with high density and haziness of the tissue surrounding the tumor due to edema, and on the AI images with superimposition of a predominantly intense red color hue and a correlating high abnormality score (suspicion of malignancy) of more than 90%, Fig. 5.
According to Stavros et al. [26], AI feature-analytic algorithms may support subdividing BI-RADS category 4, and thus can prompt the use of BI-RADS-based structured reporting and encourage reconsideration of tissue sampling for these lesions.
This was the case with some masses that presented with suspicious features on the primary evaluation by mammogram and ultrasound, yet when these mammograms were scanned by AI, the masses were not overlaid by a color hue and the breast was eventually assigned a low scoring percentage of less than 10% (i.e., lesions of low significance), Figs. 1, 3 and 4. In these cases, if the AI pattern of interpretation were considered in the clinical setting, then biopsy could have been dismissed (especially in case of small-sized masses). Biopsy is warranted if masses are large; the interventional procedure in such a condition is needed to confirm the absence of malignant potential. Including the AI findings in the diagnostic decision would save the patient from unneeded panic and/or anxiety in large-sized benign-looking masses. The heatmap (i.e., color hue) elicited by AI on the scanned mammogram images is used to spot abnormal breast lesions, so it could guide future follow-ups or separate lesions that warrant biopsy from those that require further imaging. Biopsy is to be considered in case of a serial increase in the abnormality scoring percentage of masses under follow-up [12].

Fig. 4 Female patient 34 years old with right breast giant fibro-adenoma. A Digital mammogram of the right breast showed a lower inner and retroareolar large circumscribed dense mass compressing the overlying breast tissue (arrow). Phyllodes tumor was the more suitable diagnosis, and the mass was given a BI-RADS 4 category because of its large size and because borderline/malignant phyllodes could not be excluded. B AI-scanned mammogram showed a very faint turquoise color hue overlaying the mass and a low abnormality scoring percentage of 31%. The case was true negative on AI basis, being FA and not PT, whether benign, borderline or malignant subtypes.
The current experience showed that masses with a low scoring percentage on AI-scanned mammograms can then be subjected to interval supervision. Even in case of stationary morphologic features, AI could be considered as a parameter of follow-up by monitoring the changes in the intensity of the color hue (i.e., changes from cold light blue or green colors to intense hot colors such as yellow, orange, or red) on the mammogram and/or in the value of the abnormality scoring percentage (i.e., an increase in the value).
However, precise reports and proper recommendation of the management require continuous supervision of the AI performance by the radiologist [27].
In the current work, AI presented 17 false negative cases that were misdiagnosed as FA instead of PT. That assumption was based on the low abnormality scoring percentage, which sometimes presented a value of less than 10%, Fig. 6. The example case, Fig. 6, presented with a right breast mass that rapidly increased in size, and this was the point that supported the recommendation of biopsy; otherwise, the mass showed the classic morphologic features of benignity on the baseline as well as the follow-up mammograms. Also, the AI-scanned images showed a low abnormality score that did not exceed 25%. That is why it is important to keep direct communication with the patient, to be acquainted with the clinical history and the circumstances of the breast disease.
In breast densities of ACR a and b, masses are obvious with respect to the breast glandular tissue, so mammograms scanned with AI that showed no color hue demarcation and were given a low abnormality score of suspicion (i.e., less than 10%) could be considered a satisfactory modality of scanning. Yet, in case of breast densities assigned ACR c or d patterns, further scanning with ultrasound is required so as not to miss masses overlapped by the dense glandular tissue of the breast and to be sure about the BI-RADS category, whether normal (category 1) or benign (category 2), Fig. 7.

[Figure caption, opening not recovered] B Ultrasound image showed a smooth outline of the mass and a mainly homogeneous solid internal structure; the conventional imaging BI-RADS category was 3, and breast phyllodes versus complex adenoma was the suggestion. C AI-scanned mammograms showed an intense, mainly red colour hue applied to the mass and a very high abnormality score of 100% confidence of malignancy, matching the pathology outcome.

Fig. 6 Female patient 40 years old with right breast benign phyllodes tumor. A Digital mammogram (cranio-caudal and medio-lateral oblique views) showed a right breast lower inner quadrant rounded dense mass (arrow). B AI-scanned mammograms showed no colour hue applied to the mass and a low abnormality score (i.e., < 10%). The mass was benign looking and was given a BI-RADS 2 category; at this stage, FA was the expected diagnosis. C A comparable digital mammogram from 12 months earlier (available on the system) showed a smaller size of the right breast mass. At that time the mass displayed a faint blue color hue and a low abnormality score of 25% on the AI-scanned mammogram seen in (D). The mass in (C) and (D) was classical of FA, was given a BI-RADS 2 category and was subjected to interval follow-up instead of biopsy. The increase in the size of the mass was the reason for intervention. The AI missed the clinical history of the patient, which affected the proper evaluation by the AI algorithm and subjected the patient for a while to improper management.
The current experience showed a significant correlation between the BI-RADS category assigned by the mammogram and ultrasound combination and the abnormality scoring elicited by the AI scanning for the categories "likely benign, BI-RADS 2" (P value < 0.001), "probably malignant, BI-RADS 4" (P value 0.020) and "likely malignant, BI-RADS 5" (P value < 0.001).
There were limitations to the current work: (1) its retrospective nature limited the ability to determine the Doppler ultrasound details of all the included masses. The resistive index value had not been reported previously, and moreover it was not possible to determine whether the vascularity distribution was central or peripheral.

[Figure caption, number not recovered] Female patient 38 years old with right breast pericanalicular fibro-adenoma. A Digital mammogram (cranio-caudal view) displayed dense breasts (ACR c), and no obvious masses could be detected. B Breast ultrasound showed, in the right breast upper outer quadrant, an irregular heterogeneous purely solid mass in plane with the breast tissue. So the mammogram was assigned BI-RADS 0 and the ultrasound was given BI-RADS 4, and biopsy was requested. C AI-scanned mammogram image showed no localized abnormality by the colour hue, and the abnormality score for both breasts was less than 10%. The low scoring by the AI supported the benign suggestion of the right breast mass, yet the absence of color hue by the AI and the high breast density (the mass could not be visualized by the reader on the mammogram) would have given the breast a wrong BI-RADS score of 1 and consequently an improper management and recommendation.