Performance of AI-aided mammography in breast cancer diagnosis: Does breast density matter?

Breast cancer is one of the four most common malignancies affecting women worldwide. Breast density is an independent risk factor for breast cancer and also limits the sensitivity of screening mammography. Artificial intelligence (AI) tools can support radiologists' decisions, potentially reducing perceptual and interpretive errors, and can prioritize examinations by the likelihood of malignancy. The purpose of this study was to assess the impact of breast density on the performance of AI in mammography (MG) for the diagnosis of breast malignancy. In total, 110 patients with pathologically proven breast cancer were included in this retrospective study. All patients underwent full-field digital mammography, and the mammograms were exported to the AI software system, which generated heat maps highlighting the location of detected lesions and abnormality scores indicating the probability of malignancy (POM). The AI software also assigned a breast density category to each patient. The histopathological results were correlated with breast density and the AI category. The software and the radiologist agreed on breast density in 80.00% (N = 88) of the patients. The AI results showed a statistically highly significant correlation with the BI-RADS category assigned by the radiologist (P < 0.001). When the pathology results were correlated with the AI abnormality score, the AI showed a sensitivity of 93.64%, detecting 103 true positive lesions. AI sensitivity was 100% in both ACR A and ACR B, 94.74% in ACR C, and 76.47% in ACR D. False negative results accounted for 5.26% of the ACR C group and were highest, at 23.53%, in the ACR D group (P < 0.001).
The Pearson correlation coefficient (R = 0.27) indicated a weak correlation between breast density and the decrease in AI sensitivity. Our study showed a slight association between increasing breast density and a relative decline in AI's ability to detect malignant lesions, suggesting that AI can detect breast cancer effectively in breasts of different parenchymal densities, with the highest effectiveness in breasts with lower parenchymal density.


Background
The primary cause of cancer death for women in poor nations is breast cancer, while it is the second most common cancer among women in industrialized nations [1].
Breast density, or the amount of fibroglandular tissue within the breast, has been demonstrated in studies to be both a risk factor for breast cancer and a limiting factor for the sensitivity of screening mammography [2].
Women with high mammographic breast density are nearly four times more likely than women with fatty breasts to be diagnosed with breast cancer [3].
The risk of interval cancer was 17 times higher in women with the densest breast tissue, defined as "extremely dense" (> 75% density), than in those with fatty breasts [4].
Mammographically dense breasts have been linked to reduced mammographic sensitivity. Breast density is one of the key factors contributing to false negative mammography results, a threefold increase in recall rate, and false positive results that lower the test's specificity [5].
Breast density can be measured on full-field digital mammography (FFDM) images; in clinical practice it is typically assessed visually by assigning each patient to one of the four categories defined by the American College of Radiology (ACR) BI-RADS [6]. However, BI-RADS density assessment is highly subjective and lacks a quantitative, continuous measure of breast density, which would enable more accurate risk categorization and measurement of changes in breast density [7].
Qualitative assessment of breast density according to the 5th edition of the ACR Breast Imaging Reporting and Data System (BI-RADS) lexicon is the most commonly used tool in clinical practice for assessing mammographic density [5].
Breast cancer risk is increased in females with dense breasts, and mammographic sensitivity for the disease decreases dramatically with increasing breast density [8].
Interpretation of mammograms is challenging, especially in young women with dense breasts [9].
In recent years, sophisticated AI products have been developed for breast cancer detection on digital mammography, and retrospective studies comparing them with experienced breast radiologists show that these algorithms perform at levels comparable to those of humans [10].
AI-based computer-aided detection (CAD) highlights areas of concern on mammograms after screening triage. Combining human review with AI-based CAD would simultaneously lower recall rates and raise cancer detection rates. Through computer-aided diagnosis, the identified lesion is characterized and the need for biopsy is stratified [11].
Our study aimed to assess the effect of breast density on performance of AI-aided mammography for the detection of breast malignancy.

Methods
This retrospective study was conducted in our department from 1 March 2022 to 31 December 2022.
In total, 110 patients aged 29 to 83 years (mean age 47.89 ± 10.99 years) were enrolled in this study.
Inclusion criteria: female patients who underwent mammography and had a breast lesion classified as BI-RADS 4 or BI-RADS 5 that was pathologically proven to be malignant.
Exclusion criteria: patients with normal mammography and ultrasound (US) findings, patients with BI-RADS 2 or BI-RADS 3 breast lesions, and patients with contraindications to mammography (MMG), e.g., pregnancy.
All participating patients underwent breast US and FFDM, and their mammograms were supplied to the AI software system.
Tru-Cut core needle biopsy with a 14 G needle was the reference standard for confirming the final diagnosis of all suspicious or malignant abnormalities.
FFDM was performed with an Amulet Innovality unit (Fujifilm Global Company, Japan). Four standard views, cranio-caudal (CC) and mediolateral oblique (MLO) of each breast, were obtained for all participants.
Breast US was performed on a LOGIQ S8 unit (GE) with a high-frequency linear probe (7-12 MHz). Two experienced radiologists performed all real-time scans, with double-blind analysis and double-checking of the results.
Patients with suspected breast lesions were scheduled for ultrasound-guided biopsies.
Breast density was assessed according to the 2013 ACR BI-RADS Atlas. The ultrasound and mammographic BI-RADS category was determined for each breast according to the same atlas, guided by the clinical data and the ultrasound and mammographic findings, but blinded to the final pathologic outcome.
An AI software algorithm for scanning and reading mammograms was used (Lunit INSIGHT MMG ver. 1.10.2, Seoul, South Korea; FDA approved, version 2019). The four standard views (CC and MLO of each breast) were processed by the AI software, which generated heat maps highlighting the suspicious area(s) and provided an abnormality score reflecting the probability of malignancy (PoM) for each detected lesion, ranging from 1% (lowest suspicion) to 100% (highest suspicion). Each breast's AI category was established from the PoM score. In a study by Mansour et al., 97% of suspicious and malignant-looking lesions that readers classified as BI-RADS 4 or 5 and that were later confirmed as cancer (n = 623/642) had AI abnormality scores between 59% and 100%; we therefore used 59% as the cutoff value for malignancy in our study [12].
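The cutoff rule can be sketched as follows. This is a minimal illustration, not the vendor's implementation: the function name `ai_category` and the label strings are hypothetical, and treating a score of exactly 59% as positive (`>=`) is an assumption, since the text gives the cutoff without stating how the boundary is handled.

```python
# Hypothetical sketch of the 59% probability-of-malignancy (PoM) cutoff.
# Not Lunit INSIGHT MMG code; names and labels are illustrative only.

POM_CUTOFF = 59.0  # percent, cutoff value taken from the study


def ai_category(pom_score: float, cutoff: float = POM_CUTOFF) -> str:
    """Map a PoM abnormality score (1-100%) to a binary AI category.

    Assumption: a score equal to the cutoff counts as suspicious.
    """
    if not 1.0 <= pom_score <= 100.0:
        raise ValueError(f"PoM score out of range: {pom_score}")
    return "suspicious" if pom_score >= cutoff else "low suspicion"


if __name__ == "__main__":
    # 94% is the score shown in Figs. 1 and 4; the other values are arbitrary.
    for score in (94.0, 59.0, 30.0):
        print(score, ai_category(score))
```

Keeping the cutoff as a parameter makes it easy to test how sensitivity would shift under a different threshold.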
Correlation between the breast density and AI rating for each breast and histopathological outcomes was done.
Data were coded and entered using SPSS (Statistical Package for the Social Sciences), version 26 (IBM Corp., Armonk, NY, USA). Quantitative data were summarized using the mean, standard deviation, minimum and maximum, and categorical data using frequency (count) and relative frequency (%). Sensitivity, a standard diagnostic index, was calculated as described by Galen (1980). Categorical data were compared using the chi-square (χ2) test, or the exact test when the expected frequency was less than 5 (Chan, 2003). P values below 0.05 were considered statistically significant. The Pearson correlation coefficient was used to correlate two variables.
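The test-selection rule described above (chi-square unless an expected count falls below 5, then an exact test) can be sketched for a 2 × 2 table using only the Python standard library. This is an illustrative assumption, not the SPSS procedure the authors ran: the function names are hypothetical, the chi-square p-value uses the one-degree-of-freedom identity P = erfc(√(χ²/2)) without continuity correction, and larger tables such as the four density categories would need the general chi-square distribution.

```python
from math import comb, erfc, sqrt


def expected_counts(table):
    """Expected cell counts under independence for a 2 x 2 table [[a, b], [c, d]]."""
    row = [sum(r) for r in table]
    col = [table[0][j] + table[1][j] for j in range(2)]
    n = sum(row)
    return [[row[i] * col[j] / n for j in range(2)] for i in range(2)]


def fisher_exact_2x2(table):
    """Two-sided Fisher exact p-value: sum of tables no more likely than the observed one."""
    (a, b), (c, d) = table
    r1, r2, c1 = a + b, c + d, a + c
    n = r1 + r2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell with fixed margins.
        return comb(r1, x) * comb(r2, c1 - x) / comb(n, c1)

    p_obs = prob(a)
    support = range(max(0, c1 - r2), min(r1, c1) + 1)
    # Small tolerance so tables tied with the observed one are not dropped by rounding.
    return sum(p for p in map(prob, support) if p <= p_obs * (1 + 1e-7))


def compare_2x2(table):
    """Return (test name, p-value), switching to the exact test if any expected count < 5."""
    expected = expected_counts(table)
    if min(min(row) for row in expected) < 5:
        return "fisher", fisher_exact_2x2(table)
    chi2 = sum(
        (table[i][j] - expected[i][j]) ** 2 / expected[i][j]
        for i in range(2)
        for j in range(2)
    )
    return "chi-square", erfc(sqrt(chi2 / 2))  # chi-square survival function, df = 1


if __name__ == "__main__":
    print(compare_2x2([[3, 1], [1, 3]]))      # small counts -> exact test
    print(compare_2x2([[20, 30], [30, 20]]))  # larger counts -> chi-square
```

The tables in the usage lines are toy inputs, not study data.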

Results
This study involved 110 female participants examined between March and November 2022. FFDM and US were performed in our unit as diagnostic or screening procedures. Patients were aged 29 to 83 years (mean 45.5 ± 12.1 years, mean ± standard deviation).
Among all cases, ACR C was the most common mammographic density category, followed by ACR B and, to a lesser extent, ACR D and ACR A (Table 1).
The breast density assessed by the radiologist was compared with the density assigned by the AI software: the software and the radiologist agreed in 80.00% (N = 88) of the patients and disagreed in 20.00% (N = 22).
Agreement between AI and the radiologist was 71.43% in the ACR A group and 70.00% in the ACR B group. Agreement was highest in the ACR D group (91.67%), followed by the ACR C group (86.27%) (Table 3).
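Percent agreement, the statistic reported here, is the share of patients for whom both readers assign the same ACR category. A minimal sketch with a hypothetical function name and toy labels; only the 88-of-110 summary mentioned after the code comes from the text.

```python
def percent_agreement(reader_a, reader_b):
    """Return (agreements, disagreements, percent agreement) for paired ACR labels."""
    if len(reader_a) != len(reader_b) or not reader_a:
        raise ValueError("need two non-empty label lists of equal length")
    agree = sum(a == b for a, b in zip(reader_a, reader_b))
    n = len(reader_a)
    return agree, n - agree, 100 * agree / n


if __name__ == "__main__":
    radiologist = ["A", "B", "C", "D"]  # toy labels, not study data
    ai_software = ["A", "B", "C", "C"]
    print(percent_agreement(radiologist, ai_software))  # (3, 1, 75.0)
```

With the study's summary figures, 88 agreements among 110 patients leave 110 - 88 = 22 disagreements, and 100 × 88 / 110 = 80.00%.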
The most common suspicious mammographic finding was a mass, in 59.20% (N = 74) of cases, followed by suspicious calcifications in 24.80% (N = 31) and asymmetry and distortion in 16.00% (N = 20); no suspicious finding was seen in 12.00% (N = 15). Patients with no suspicious mammographic findings were included after a suspicious lesion was detected on ultrasound and biopsied.
Each breast was assigned a BI-RADS score according to the ACR BI-RADS lexicon. All lesions had a high BI-RADS score: 39 (35.45%) were BI-RADS 5 and 71 (64.55%) were BI-RADS 4.
The AI software assigned an abnormality score to each patient: 103 (93.64%) had scores suggesting a high probability of malignancy (score above 59%) and 7 (6.36%) had low malignancy scores.
When mammography results were correlated with biopsy results, 86.36% of cases were true positives and 13.64% were false negatives.
When the pathology results were correlated with the AI abnormality score, the AI showed a sensitivity of 93.64%, detecting 103 true positive lesions, with 7 (6.36%) false negative results.
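Because every patient in this cohort had pathologically proven cancer, there are no true negatives or false positives, so sensitivity, TP / (TP + FN), and its complement, the false negative rate, are the only accuracy indices that can be estimated; specificity cannot. A minimal sketch; the function name is hypothetical, and the mammography counts (95 true positives, 15 false negatives) are back-calculated from the reported 86.36% and 13.64% of 110 patients rather than stated in the text.

```python
def sensitivity_and_fnr(true_pos: int, false_neg: int) -> tuple[float, float]:
    """Sensitivity and false negative rate (both in %) from counts of proven cancers."""
    total = true_pos + false_neg
    if true_pos < 0 or false_neg < 0 or total == 0:
        raise ValueError("counts must be non-negative with at least one cancer")
    sens = 100 * true_pos / total
    return round(sens, 2), round(100 - sens, 2)


if __name__ == "__main__":
    print("AI:", sensitivity_and_fnr(103, 7))           # (93.64, 6.36)
    # Mammography counts are back-calculated from the reported percentages (assumption).
    print("Mammography:", sensitivity_and_fnr(95, 15))  # (86.36, 13.64)
```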
The AI results showed a statistically highly significant correlation with the BI-RADS category assigned on the basis of the mammographic and US findings (P < 0.001).
Given the proportion of each malignant subtype in the study group, AI sensitivity was also assessed by subtype: 95.45% (n = 21) for invasive lobular carcinoma (ILC), 92.86% (n = 65) for invasive ductal carcinoma (IDC), 92.86% (n = 13) for ductal carcinoma in situ (DCIS), and 100% (n = 4) for other breast malignancies.
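Per-subtype sensitivity is each subtype's detected count divided by that subtype's total. The totals are not stated in this paragraph; the sketch assumes they follow from the subtype proportions reported in the Discussion (63.64%, 20%, 12.73% and 3.64% of 110 patients, i.e., 70 IDC, 22 ILC, 14 DCIS and 4 other), so treat them as back-calculated. The function name is hypothetical.

```python
# (detected by AI, total with subtype); totals back-calculated from the Discussion percentages.
SUBTYPES = {
    "IDC": (65, 70),
    "ILC": (21, 22),
    "DCIS": (13, 14),
    "Other": (4, 4),
}


def subtype_sensitivity(counts):
    """Return {subtype: sensitivity %} rounded to two decimals."""
    return {name: round(100 * hit / total, 2) for name, (hit, total) in counts.items()}


if __name__ == "__main__":
    for name, sens in subtype_sensitivity(SUBTYPES).items():
        print(f"{name}: {sens}%")
    detected = sum(hit for hit, _ in SUBTYPES.values())
    cancers = sum(total for _, total in SUBTYPES.values())
    print(f"Overall: {detected}/{cancers}")  # 103/110
```

Summing the detected counts recovers the 103 true positives reported overall, which is a quick check that the subtype figures are internally consistent.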
Mammography sensitivity was also calculated by breast density category: it was highest in ACR A (100%) and lowest in ACR D (58.33%). The false negative rate was highest in ACR D (41.67%), followed by ACR C (17.65%) and ACR B (2.5%) (Table 4).
Within each breast density category, AI sensitivity for breast malignancy was assessed against the gold standard of pathology. AI showed 100% sensitivity in both ACR A and ACR B, 94.74% in ACR C, and 76.47% in ACR D. False negative results accounted for 5.26% of the ACR C group and were highest, at 23.53%, in the ACR D group (P < 0.001) (Table 5).
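The per-density figures can be reproduced from counts, but the Table 5 counts are not given in the text. The sketch below assumes counts back-calculated from the reported percentages and the overall total of 7 false negatives: 54 of 57 cancers detected in ACR C (94.74%) and 13 of 17 in ACR D (76.47%), leaving 36 cancers, all detected, in ACR A and B combined (their split is not recoverable). These counts are an inference, not published data, and the function name is hypothetical.

```python
# Back-calculated (detected, total) per density category in Table 5; an assumption.
DENSITY_COUNTS = {
    "ACR A+B": (36, 36),
    "ACR C": (54, 57),
    "ACR D": (13, 17),
}


def stratified_rates(counts):
    """Return {category: (sensitivity %, false negative rate %)}."""
    rates = {}
    for category, (detected, total) in counts.items():
        missed = total - detected
        rates[category] = (round(100 * detected / total, 2), round(100 * missed / total, 2))
    return rates


if __name__ == "__main__":
    for category, (sens, fnr) in stratified_rates(DENSITY_COUNTS).items():
        print(f"{category}: sensitivity {sens}%, false negatives {fnr}%")
    # Consistency check against the overall result (7 false negatives among 110 cancers).
    missed_total = sum(t - d for d, t in DENSITY_COUNTS.values())
    cancers = sum(t for _, t in DENSITY_COUNTS.values())
    print(missed_total, cancers)  # 7 110
```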
Comparing AI and mammography within each density category, both had 100% sensitivity in ACR A, and both had their lowest sensitivity in ACR D (76.47% for AI and 58.33% for mammography), where AI was more sensitive than mammography.
False negative results were most common in ACR D for both AI (23.53%) and mammography (41.67%), although AI had the lower false negative rate in that category.
The Pearson correlation coefficient (R = 0.27) indicated a weak correlation between breast density and the decrease in AI sensitivity: as breast density increased, a relative decrease in AI performance in detecting malignant lesions was observed.
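The text does not say which variables entered the Pearson calculation. One plausible setup, assumed here and not confirmed by the source, codes each patient's density as 1-4 (ACR A-D) and whether AI missed the cancer as 0/1; Pearson r on a binary variable is then the point-biserial correlation. The function name is hypothetical and the data are toy values, so the result does not reproduce the study's R = 0.27.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


if __name__ == "__main__":
    density = [1, 2, 3, 4]  # ACR A-D coded 1-4 (toy data)
    missed = [0, 0, 0, 1]   # 1 = false negative by AI (toy data)
    print(round(pearson_r(density, missed), 3))  # 0.775
```

Python 3.10 and later also provide `statistics.correlation`, which computes the same coefficient.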

Discussion
Although mammography screening is effective, it has several drawbacks: (1) false positive recalls lead to additional imaging and biopsies, which raise medical costs and cause patients psychological stress; (2) false negative results delay diagnosis; (3) exposure to radiation; and (4) overdiagnosis of tumors that may not be life-threatening, such as low-risk ductal carcinoma in situ [13].
Mammography remains the most economical tool for detecting breast cancer, but it is far from a perfect screening test: in women under 40 years of age its sensitivity is 76.5% and its specificity 87.1%, compared with 88.4% and 93.5%, respectively, in women aged 75 to 79 [8].
Mammography has advantages but also limitations, particularly in dense breasts. AI is one of the most recent innovations designed to overcome these limitations [14].
Dense breasts produce more false negative mammography results than less dense patterns. This is attributed to the masking effect, which reduces mammographic sensitivity and triples the recall rate [15]. This study assessed the efficiency of AI in detecting malignant breast lesions in relation to breast density, using pathology as the reference standard (Fig. 1).
In the current study, 110 patients with 116 proven malignant breast lesions were included. Their ages ranged from 28 to 83 years (mean 45.5 ± 12.1 years).
Regarding the pathological subtypes: IDC was seen in 63.64% of cases; ILC was 20%, DCIS was 12.73%, while other breast malignancies were 3.64% in our study.
These results are consistent with those of Mansour et al., whose study of 2169 malignant breast lesions found IDC in 76% of cases, ILC in 10.65%, DCIS in 3.9%, and other breast malignancies in 9.4% [15].
Each breast in the current study received an AI abnormality score and category; at a cutoff value of 59%, 94.55% of breast lesions were considered malignant and 5.45% were considered benign (Figs. 2 and 3).
On correlation with the final diagnosis, 103 lesions were true positives and 7 were false negatives (Fig. 4).
In the current study, AI was more sensitive than mammography in detecting malignant breast tumors: sensitivity was 93.64% for AI versus 86.36% for mammography, with false negative rates of 6.36% and 13.64%, respectively.
Our findings are consistent with a study by Kim et al. [16] of 170,230 mammography examinations from five institutions in South Korea, the USA, and the UK, which reported an overall AI sensitivity of 91% across the three validation datasets: 90% in the South Korean dataset, 93% in the USA dataset, and 91% in the UK dataset.
Another study, by Rodríguez-Ruiz et al. [17], of 240 examinations (100 cancers, 40 leading to false positive recalls, and 100 normal) found higher sensitivity with AI support than without it (86% vs 83%).
A study by Pacilè et al. [18] of 240 participants found that AI support improved sensitivity by an average of 0.033 (P = 0.021).
Raafat et al. [14] reported similar results: AI sensitivity was 96.6% with a false negative rate of 3.4%, whereas mammography sensitivity was 87.3% with a false negative rate of 12.7%.
Mansour et al. [12] found that, in distinguishing benign from malignant breast lesions, AI-aided mammography had a sensitivity of 96.8% and a specificity of 90.1%, whereas mammography combined with ultrasound had a sensitivity of 98.6% and a specificity of 91.6%.
Regarding breast density, we found fair agreement between the AI and radiologist density categories: they agreed in 80% of cases and differed in 20%.
These findings are consistent with a study by Le Boulc'h et al. [19] of 311 female patients, which showed substantial agreement on breast density between a senior radiologist and AI (κ = 0.79; 95% CI: 0.73-0.84).
Our findings also agree with those of Magni et al. [20], who reported an agreement of 90.4% and a reliability (Cohen κ) of 0.807 between AI breast density classification and radiologist readings.
In a study by Singh et al. [21] of 476 full-field digital mammography examinations, breast density estimated by fully automated software showed fair agreement with the readings of two radiologists (κ = 0.398 and 0.388, respectively).
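The studies cited above report Cohen's κ, whereas this study reports raw percent agreement. The two are not directly comparable, because κ discounts the agreement expected by chance from each reader's category frequencies: κ = (p_o - p_e) / (1 - p_e), where p_o is observed and p_e chance agreement. Computing κ for the present data would need the full radiologist-versus-AI cross-tabulation in Table 3. A standard-library sketch with a hypothetical function name and toy labels:

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters' paired category labels."""
    n = len(rater_a)
    if n == 0 or n != len(rater_b):
        raise ValueError("need two non-empty label lists of equal length")
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of each rater's marginal proportions, summed over categories.
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a.keys() | freq_b.keys()) / n**2
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    radiologist = ["A", "A", "B", "B"]  # toy labels, not study data
    ai_software = ["A", "B", "B", "B"]
    print(cohen_kappa(radiologist, ai_software))  # 0.5: 75% raw agreement, 50% by chance
```

Because ACR density categories are ordinal, a weighted κ is sometimes used instead; this sketch is unweighted.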
Although many studies have applied deep learning to mammography, most focus on detecting breast cancer and classifying lesions as benign or malignant. The present study differs in that it also examined how the sensitivity of AI in detecting malignant lesions varies with breast density. AI showed 100% sensitivity in both ACR A and ACR B, 94.74% in ACR C, and 76.47% in ACR D. False negative results were 5.26% in the ACR C group and 23.53% in the ACR D group (P < 0.001).
We also found a correlation coefficient of 0.27 between breast density and the sensitivity of AI in detecting malignant lesions, indicating a weak correlation between breast density and the efficiency of AI in detecting malignant breast lesions.
This agrees with Kim et al. [5], who demonstrated that breast density had less impact on AI's diagnostic performance than on radiologists' performance, resulting in a notable improvement in radiologists' AI-assisted performance in dense breasts.
These results are also concordant with those reported by Suh et al. [22], who noted that AI performance for malignancy diagnosis declined as breast density increased (density A, mean AUC = 0.984 vs density D, mean AUC = 0.902 with DenseNet-169).
Mansour et al. [15] also assessed AI performance in dense breasts (ACR C and ACR D) with 2169 malignant lesions; the AI algorithm showed a sensitivity of 88.29% (95% CI: 78.63-95.05%), a specificity of 96.34% (95% CI: 87.07-99.98%), and a diagnostic accuracy of 94.5% (95% CI: 88.24-99.15%) in evaluating dense breasts. They also noted that, compared with mammography plus ultrasound, the AI had higher specificity but lower sensitivity in dense breasts.
The key limitations of our work are the relatively small image dataset and the potential for bias in model training. In addition, our AI algorithm does not take clinical factors such as symptoms or family history into account, which may limit a thorough assessment.
Further studies with larger and more varied datasets, using AI models trained on data different from those used for evaluation, might help improve AI performance.

Conclusions
Our study showed that artificial intelligence can effectively assess digital mammograms of varied density for breast cancer. This efficiency may be greater in breasts with lower parenchymal density, as there was a weak correlation between increasing breast density and a relative decrease in AI performance in detecting malignant lesions. In dense breasts, AI showed a noticeable reduction in mammographic misdiagnosis.

Fig. 1 A
Fig. 1 A 43-year-old woman attending annual screening. a Digital mammography revealed heterogeneously dense breasts (ACR C) with no suspicious findings. b AI highlighted a suspicious area in the outer central region of the left breast with a 94% risk of malignancy. Pathology revealed DCIS. AI successfully detected a suspicious mass lesion in the left breast, showing better performance than mammography in this dense breast.

Fig. 4
Fig. 4 A 36-year-old woman with a positive family history presented with a right breast lump. a Digital mammography revealed heterogeneously dense breasts (ACR C) with a deeply seated dense mass with indistinct margins in the upper inner quadrant (UIQ) of the right breast. b AI highlighted the suspicious mass in the upper region of the right breast on the MLO view with a 94% risk of malignancy. Pathology revealed IDC. AI successfully detected the suspicious mass seen on mammography in a dense breast.

Table 1
ACR breast density given by radiologist

Table 2
ACR breast density given by AI software

Table 3
ACR density measured by Radiologist vs AI

Table 4
Sensitivity of Mammography in correlation with Breast density

Table 5
Sensitivity of AI in correlation with Breast density