A comparative efficacy study of diagnostic digital breast tomosynthesis and digital mammography in BI-RADS 4 breast cancer diagnosis

Purpose: Probability of malignancy for BI-RADS 4-designated breast lesions ranges from 2% to 95%, contributing to high false-positive biopsy rates. We compare clinical performance of digital breast tomosynthesis (DBT) versus digital mammography (2D) among our BI-RADS 4 population without prior history of breast cancer. Methods: We extracted retrospective data (clinical data, mammogram reports, and biopsy data) from electronic medical records across Houston Methodist’s nine hospitals for patients who underwent diagnostic examinations using both modalities (02/01/2015 – 09/30/2020). 2D and DBT cohorts were not intra-individual matched, and there was no direct mammogram evaluation. Using Student’s t test, Fisher’s exact test, and Chi-squared test, we evaluated the data to determine statistical significance of differences between modalities in BI-RADS 4 cases. We calculated adjusted odds-ratio between modalities for cancer detection rate (CDR) and biopsy-derived positive predictive value (PPV3). Results: There were 6,356 encounters (6,020 patients) in 2D and 5,896 encounters (5,637 patients) in DBT assessed as BI-RADS 4. Using Fisher’s exact test, cases examined with DBT were assessed as BI-RADS 4 significantly more often, by 5.66%, than those examined with 2D mammography, P = 0.0046 (1.0566; 95% CI: 1.0169–1.0977). The CDRs were 112.65 (2D) and 120.76 (DBT), adjusted odds-ratio: 1.04 (0.93, 1.16), P = 0.5029, while PPV3 were 14.41% (2D) and 15.99% (DBT), adjusted odds-ratio: 1.09 (0.97, 1.22), P = 0.1483; both were logistic regression-adjusted for all other factors. Conclusion: DBT did not achieve better performance and sensitivity in assigning BI-RADS 4 cases compared with 2D, showed no significant advantage in CDR and PPV3, and does not reduce false-positive biopsies among BI-RADS 4-assessed patients.


Introduction
Mammography plays a vital role in early breast cancer detection and diagnosis [1,2]. It has also been observed that race and socioeconomic determinants are important factors in breast cancer detection, including access to mammography, incidence, and prognosis. Caucasian women are more likely to develop breast cancer than other races (Blacks, Hispanics, and Asians) and are also expected to have the most mammograms performed while breast cancer mortality rates and prognosis are poorer among Black women [3]. Some of these differences in outcomes may be attributed to less access to mammography and lower quality medical care, as well as various lifestyle patterns associated with different ethnic groups [3][4][5].
The American College of Radiology developed the Breast Imaging Reporting and Data System (BI-RADS) lexicon [6][7][8] to standardize breast imaging reporting, evaluate risk of breast lesions, and facilitate biopsy decision-making. BI-RADS classifies lesions into seven assessment categories (zero to six), each implying specific management recommendations. However, significant intra- and inter-observer variability in applying the BI-RADS lexicon has resulted in considerable variation in the rate of biopsy across the US, with 55%-85% of breast biopsies ultimately found to be benign lesions [9,10]. Other studies have posited that of more than a million biopsies for breast lesions in the US annually, as many as 75% turn out to be benign [11,12]. Among the BI-RADS categories, BI-RADS 4, "suspicious findings with a recommendation for biopsy" [13], stands out for its enormous uncertainty, with a 2-95% likelihood of malignancy [8,14,15]. A biopsy is thus considered the most appropriate next course of action for BI-RADS 4 categorized lesions and serves as a quality metric and performance standard [10,16,17,18], resulting in a majority (69-95%) of BI-RADS 4 lesions being biopsied [19]. Over the decades, BI-RADS 4 tissue biopsy-proven positive predictive value (PPV3) [20,21] rates have not improved [8], and this translates to high false-positive rates of mammography. Currently, BI-RADS 4 PPV3 in the US is reported to be 21.1% [8]. Researchers have estimated that false-positive mammograms and breast cancer overdiagnoses in the United States cost approximately $4 billion per year [22].
Digital mammograms have been found to show up to 89.3% diagnostic accuracy at detecting breast cancer cases [23], but none of the various standard techniques used for breast imaging are 100% effective, especially for tiny tumors, diffusely infiltrating carcinomas like invasive lobular cancers, and ductal in-situ cancers without microcalcifications [24,25]. With advances in technology, digital breast tomosynthesis (DBT) compared to digital mammography (2D) has been touted to improve cancer detection rates and avoid unnecessary biopsies, despite its slightly higher radiation dose [26][27][28]. While many hospitals and health systems have transitioned fully to systemwide use of the DBT modality for all mammography examinations, several scientific questions remain unanswered. Some screening (prospective and retrospective) studies projected that a combination of 2D and DBT mammography improves cancer detection compared with stand-alone 2D digital mammography, but with conflicting recall rates [29][30][31][32][33][34][35]. However, this combination carries the risk of more radiation exposure for patients, increased interpretation time for radiologists, and greater data storage and management demands [36,37]. Furthermore, a published study suggested that there may not be a significant advantage in terms of cancer detection rates (CDR), although its DBT (2D + DBT) group had a higher proportion of invasive cancers than in-situ cancers [38].
In this study, we considered diagnostic mammogram examinations that were assessed as BI-RADS category 4 in patients with no history of breast cancer in our multi-hospital system over a period of five and a half years. We compared the clinical performance of DBT mammography (i.e., using DBT tomograms alone) with that of 2D mammography and examined whether DBT mammography confers an advantage in terms of better CDR and PPV3.

Study population
Our retrospective study was conducted on mammography data extracted from METEOR (Methodist Environment for Translational Enhancement and Outcomes Research), an enterprise-wide clinical data warehouse and analytics environment at our institution [39]. We compared the mammography data for patients who underwent diagnostic examinations (performed because of signs of possible breast cancer or as further evaluation of suspicious findings on screening imaging) using 2D digital mammography with the data for those who had DBT mammography. Examinations were performed between February 1, 2015, and September 30, 2020, and we focused on those given an initial assessment of BI-RADS category 4. Patients with a history of breast cancer were excluded so that prior disease would not affect the BI-RADS 4 designation or a possible cancer diagnosis. Our hospital's Institutional Review Board (IRB) approved the study protocol and granted a waiver of written informed consent.

Data collection
METEOR recorded the patients who underwent 2D and DBT as two different patient cohorts. The mammograms were performed using Hologic Lorad Selenia units for 2D, while Hologic Selenia Dimensions and GE Senographe Essential Tomosynthesis units were used for DBT at the Breast Centers throughout our hospital system. The following information was extracted from METEOR: patient demographics including age, gender, race; modality used (2D vs. DBT); presence of prior mammogram; final BI-RADS assessment category; biopsy type (image-guided core needle or surgical biopsy); pathology results within three months after the mammogram; tumor staging; and hormone receptor and growth-promoting protein expression, i.e., estrogen receptor (ER), progesterone receptor (PR), and growth-promoting protein (HER2) status. Available data on tumor staging as well as hormone receptor and growth-promoting protein status were limited, as not every patient had these records. Biopsy outcomes, i.e., benign or malignant lesions, determined by a review of pathology reports, served as cancer diagnosis status. There was no direct mammogram evaluation, and the 2D and DBT cohorts were not intra-individual matched.

Statistical methods
R statistical software [40] was used to analyze and compare the data between 2D and DBT. We determined the number of patients among the DBT population who had an initial 2D digital mammography and then subsequent DBT (2D + DBT) examinations. Using Student's t, Fisher's exact, and Chi-squared tests, we evaluated the collected data to determine statistical significance. Using Fisher's exact test, we determined the difference in BI-RADS 4 cases and the malignant diagnosis ratio in the BI-RADS 4 biopsy results. We adjusted for confounders such as prior mammograms, age, race, etc., and calculated the adjusted odds ratios between modalities for cancer detection rate (CDR) and biopsy-derived positive predictive value (PPV3). Overall, we tested if there was any significant difference between 2D and DBT mammography in BI-RADS 4 cases. A p-value of <0.05 was considered statistically significant.
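As an illustration of the kind of 2x2 comparison described above, the following is a minimal sketch of a two-sided Fisher's exact test. It is written in Python with the standard library only, whereas the study itself used R; the function name `fisher_exact_two_sided` and the example counts are hypothetical and are not taken from the study data. The two-sided rule used here (summing all tables no more probable than the observed one) is one common convention and is assumed, not confirmed by the source.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(k: int) -> float:
        # Probability that the top-left cell equals k given fixed margins.
        return comb(row1, k) * comb(row2, col1 - k) / denom

    k_min = max(0, col1 - row2)
    k_max = min(row1, col1)
    p_obs = prob(a)
    # A small relative tolerance guards against floating-point ties.
    return sum(
        p for p in (prob(k) for k in range(k_min, k_max + 1))
        if p <= p_obs * (1 + 1e-9)
    )


# Hypothetical counts, NOT from the study:
# rows = modality (2D, DBT), columns = (malignant, benign).
p_value = fisher_exact_two_sided(3, 1, 1, 3)
print(f"two-sided p = {p_value:.4f}")
```

The adjusted odds ratios reported in this study additionally require a logistic regression with the confounders as covariates, which this sketch does not attempt.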

Results
Over the study period of more than five years (February 2015 to September 2020), 6356 encounters (6020 unique patients) with 2D mammography examinations and 5896 encounters (5637 unique patients) with DBT examinations were assessed as BI-RADS 4. In total, 31,303 unique patients (43,485 encounters) and 29,538 unique patients (38,177 encounters) had 2D and DBT mammography, respectively, performed across the nine hospitals of our health system (Fig. 1). Using Fisher's exact test, the results show that BI-RADS 4 assessments were slightly, yet significantly, more frequent with DBT than with 2D mammography, by 5.66%, p-value = 0.004591 (1.0566; 95% CI: 1.0169-1.0977). Only 35 cases (0.59%) in the DBT group had a DBT examination following a 2D digital mammogram. For descriptive statistics, a series of preliminary analyses were conducted on the patient factors (age, race, gender, menopausal status, and presence of prior mammogram) and tumor characteristics (staging, ER, PR, and HER2 status) in both modalities. We then compared the CDR and PPV3 of patients scanned by 2D mammography with those of patients scanned by DBT mammography.
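The 5.66% figure can be checked against the encounter counts reported above. The sketch below (Python, standard library only) assumes that the figure is the ratio of the encounter-level BI-RADS 4 proportions in the two cohorts; that reading is an inference from the reported numbers, not something the text states explicitly, and the helper name `birads4_rate` is hypothetical. The sketch does not attempt to reproduce the reported p-value or confidence interval.

```python
def birads4_rate(birads4_encounters: int, total_encounters: int) -> float:
    """Share of diagnostic encounters that were assessed as BI-RADS 4."""
    return birads4_encounters / total_encounters


# Encounter counts as reported in the Results.
rate_2d = birads4_rate(6356, 43485)
rate_dbt = birads4_rate(5896, 38177)

# Relative frequency of BI-RADS 4 assessment, DBT versus 2D (assumed reading).
relative_rate = rate_dbt / rate_2d

print(f"2D:  {100 * rate_2d:.2f}% of encounters assessed as BI-RADS 4")
print(f"DBT: {100 * rate_dbt:.2f}% of encounters assessed as BI-RADS 4")
print(f"DBT/2D ratio: {relative_rate:.4f}")
```

Under this assumption the ratio rounds to 1.0566, matching the point estimate quoted with the confidence interval.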
There were no statistically significant differences between the two modalities (2D vs. DBT mammography) concerning age (P = 0.7092), sex (P = 0.05435), and menopausal status (P = 0.7031) in the BI-RADS 4 category. Table 1 shows the summary of these patient characteristics.
Statistical distributions and p values (for significant differences in comparisons between 2D and DBT modalities) for the above parameters in the entire patient cohorts (i.e., all BI-RADS categories) are detailed in Supplementary Table 1.

Biopsy outcome, staging, and ER, PR, and HER2 status
Available biopsy outcome records showed that 14.41% (716/4969) of 2D mammography cases and 15.99% (712/4452) of DBT cases were malignant among the BI-RADS 4 cohort who had a biopsy performed and results available. This difference was not significant based on Fisher's test (P = 0.0688). The difference in proportions of malignant BI-RADS 4 cases in the entire BI-RADS 4 population (biopsy or no biopsy) in 2D (11.26%) vs. DBT (12.07%) mammography was also not statistically significant (P = 0.2183). Distributions, proportions, and percentages of the malignant cases among biopsy cases as well as in the entire cohort (i.e., irrespective of whether they had a biopsy or not) for the various BI-RADS groups and a combination of BI-RADS 1 through 5, i.e., without category 0 (inconclusive result) and 6 (known biopsy-proven malignant lesions), are shown in Supplementary Table 2. A summary of tumor characteristics based on our available data is shown in Table 2. From the available tumor staging data, the distribution of staging based on 2D and DBT mammography in BI-RADS 4 cases had no significant difference using a Chi-squared test (P = 0.0678). Similarly, when we analyzed available data on the ER, PR, and HER2 status results (i.e., positive/negative in ER and PR and positive/negative/equivocal in HER2), Fisher's exact test showed no significant differences between 2D and DBT mammography (Table 3). Thus, tumor characteristics including staging, hormone receptor status (ER, PR), and growth-promoting protein (HER2) in our available data revealed no significant difference among BI-RADS 4 patients. For tumor characteristics among the entire patient cohort (all BI-RADS categories), between 2D and DBT modalities, please see Supplementary Table 3.

Cancer detection rate (CDR) and biopsy-derived positive predictive value (PPV3)
In the BI-RADS 4 category, the CDRs were 112.65 for 2D and 120.76 for DBT, with a logistic regression-adjusted odds ratio of 1.04 (0.93, 1.16), which was not significant (P = 0.5029). Fig. 2 shows the CDR and the logistic regression-adjusted odds ratios for BI-RADS 4. The PPV3 were 14.41% (2D) and 15.99% (DBT), and again, this difference was not significant, with an adjusted odds ratio of 1.09 (0.97, 1.22), P = 0.1483. Fig. 3 shows the PPV3 and the logistic regression-adjusted odds ratios for BI-RADS 4. In both cases, logistic regression was used to adjust for the impact of all other variables.
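The unadjusted CDR and PPV3 values can be recomputed from the counts reported in the Results. The sketch below (Python, standard library only) assumes that CDR is expressed as malignant cases per 1,000 BI-RADS 4 encounters and PPV3 as the percentage of biopsied cases that were malignant; the per-1,000 denominator is inferred from the reported numbers rather than stated in the text, and the names `per_thousand` and `percent` are hypothetical. The adjusted odds ratios are not reproduced, since they depend on the logistic regression covariates.

```python
def per_thousand(events: int, total: int) -> float:
    """Events per 1,000 units of the denominator."""
    return 1000 * events / total


def percent(events: int, total: int) -> float:
    """Events as a percentage of the denominator."""
    return 100 * events / total


# Counts reported in the Results for the BI-RADS 4 cohorts.
MALIGNANT = {"2D": 716, "DBT": 712}
BIOPSIED = {"2D": 4969, "DBT": 4452}
BIRADS4_ENCOUNTERS = {"2D": 6356, "DBT": 5896}

# Assumed definitions: CDR per 1,000 BI-RADS 4 encounters, PPV3 per biopsy.
cdr = {m: per_thousand(MALIGNANT[m], BIRADS4_ENCOUNTERS[m]) for m in MALIGNANT}
ppv3 = {m: percent(MALIGNANT[m], BIOPSIED[m]) for m in MALIGNANT}

for modality in MALIGNANT:
    print(f"{modality}: CDR = {cdr[modality]:.2f} per 1,000, "
          f"PPV3 = {ppv3[modality]:.2f}%")
```

Under these assumptions the computed values round to the CDR and PPV3 figures reported above.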

Discussion
In this study, we assessed the performance of 2D mammography in comparison with DBT (DBT tomograms only) among the cases that were given an initial assessment of BI-RADS 4 in diagnostic examinations. In our hospital system, DBT is not evaluated as a combination of synthetic 2D and DBT, i.e., s2D + DBT. There have been some studies on this subject, but they often assessed entire diagnostic cohorts, i.e., BI-RADS categories zero through six. Our study focuses on BI-RADS 4 designated lesions, which have a wide range of probability of malignancy and resultant overbiopsy, thus making improved cancer detection highly desirable in this category. The advent of DBT has certainly resulted in the production of better images and better portrayal of masses, asymmetries, and other anomalies by facilitating the separation of overlapping structures common with 2D mammography [41]. Also, early studies have reported that DBT detects as much as 40 percent more cancers than digital (2D) mammograms in breast cancer screening examinations [42].
Nevertheless, our findings indicate that once the assessment is made as BI-RADS 4, biopsy outcomes were comparable for both DM and DBT as malignancy rates, cancer detection rates, and biopsy-derived positive predictive values were similar, such that differences were not statistically significant. A slightly deeper dive into granular data (histologic staging, hormone receptor and growth-promoting protein status) also showed similar performance. However, of note is that our results in BI-RADS 4 do not discount the fact that overall or when certain other BI-RADS categories are considered, DBT mammography might exhibit a better performance over 2D mammography.
It is worth pointing out that BI-RADS 4 demographic distributions including age, gender, and race followed similar patterns in the 2D and DBT mammography cohorts in our data, and our data also mirror established national statistics. For instance, Caucasians, who have the highest overall incidence of breast cancer [43][44][45], are in the majority, followed by Blacks and then Asians in both cohorts. The racial distribution differed significantly between 2D and DBT mammography, yet there was no significant difference in the cancer detection rates. Although histologic workup is recommended for all BI-RADS 4 patients, this is often not the case, as seen in our health system, where 24-25% of biopsy data are unavailable, probably due to loss to follow-up, change of hospital, etc.
The application of DBT has increased since 2012, following FDA approval the prior year. The modality, which was initially used mainly as a special application to supplement 2D X-ray mammography in the case of suspicious findings, has become a standard of care in many organizations for breast evaluation, with some centers including ours already using DBT as first line for mammography service. While some studies suggest that DBT detects cancers better than 2D DM and results in lower recall rates, especially when combined with actual or synthesized 2D images [46][47][48][49], one study suggests that there is no real advantage, reporting similarity in cancer detection rate [38].
Hofvind et al. suggested that DBT and 2D synthetic mammography (SM), recreated from tomosynthesis images, increased the detection rate of histologically favorable tumors compared with that attained from DM evaluation alone [47]. Li et al. concluded that DBT exhibits higher diagnostic accuracy for benign calcifications, dense breasts, and both premenopausal and postmenopausal women, but has no advantage in non-dense breasts and malignant calcification cases when compared to 2D digital mammography (DM), and recommended DBT for breast cancer evaluation in young women with dense breasts [48]. Bahl and colleagues in an earlier work demonstrated that overall cancer detection rates were similar in both the DM and DM + DBT cohorts; however, the proportion of invasive cancers compared to in situ lesions as well as the PPV were significantly higher among the DM + DBT group compared to the DM group [38].
Our more recent study comparing 2D digital mammography with mostly initial DBT (only 0.59% DM + DBT) suggests that breast cancer evaluation with DBT may not have much comparative advantage over DM in clinical practice when it comes to detecting cancers or improving the PPV3 for BI-RADS 4-assessed lesions. DBT does not reduce unnecessary biopsies associated with BI-RADS 4, a finding that brings to the fore the need for tools that could help to mitigate the problem of unnecessary biopsies [50].

Limitations
This work was done using data from a single large multi-hospital health system made up of nine hospitals and several clinics in Houston, Texas, including outpatient breast imaging centers, so we do not rule out the possibility of certain bias in the results, even though Houston is one of the most diverse cities in the United States. Further studies are warranted on this subject matter, especially across a wide geographical area and several health systems. Gray screening could be a limitation; however, we used mammograms designated strictly as diagnostic in terms of procedure name and physician notes, and the data was well validated. Another possible limitation of our data is the non-availability of BI-RADS 4 sub-categorizations, i.e., 4a, 4b, and 4c. These are not reported in our hospital mammogram reports; it is noteworthy that this data is poorly documented, has low utilization prevalence, and is heterogeneous in use across the US. Though we assessed performance in some granular data, i.e., histologic staging, hormone receptor and growth-promoting protein status, available data was limited. We could not look at the impact of tumor grading or breast fibrodensity and its higher risk, due to unavailability of this data. However, our data is homogeneous, and demographic distributions including age and race are similar and balanced between the 2D and DBT cohorts. Also, an analysis of characteristics of histological types of cancers detected by DBT is a subject for future research. All of these could form the basis of a future extension of this work to investigate whether there are categories of patients that will benefit more from DBT modality evaluation if they are assigned as BI-RADS 4.

Conclusion
Among BI-RADS 4 assigned patients undergoing diagnostic examinations, DBT appears not to show statistically significant improvement in performance and sensitivity in assigning BI-RADS 4 cases when compared with DM, as patients undergoing DBT mammography were assessed as BI-RADS 4 slightly (5.66%) more often than those undergoing 2D mammography. Also, biopsy outcomes were comparable for both DM and DBT, as malignancy rates, cancer detection rates, and biopsy-derived positive predictive values were similar, and differences were not statistically significant between modalities. Thus, DBT does not reduce unnecessary biopsies associated with BI-RADS 4 assessed patients. Our research findings would benefit from more extensive investigation, especially using data with increased granularity and from a wider geographical coverage area.

Supplementary Material
Refer to Web version on PubMed Central for supplementary material.

Table 2 Summary of tumor characteristics in BI-RADS 4 population (2D vs. 3D).