Diagnostic performance of the Kaiser score for characterizing lesions on breast MRI with comparison to a multiparametric classification system

  • Aleksandr Istomin
    Kuopio University Hospital, Diagnostic Imaging Center, Department of Clinical Radiology, Kuopio, Finland
  • Amro Masarwah
    Kuopio University Hospital, Diagnostic Imaging Center, Department of Clinical Radiology, Kuopio, Finland
  • Ritva Vanninen
    Kuopio University Hospital, Diagnostic Imaging Center, Department of Clinical Radiology, Kuopio, Finland
    University of Eastern Finland, Cancer Center of Eastern Finland, Kuopio, Finland
    University of Eastern Finland, Institute of Clinical Medicine, School of Medicine, Kuopio, Finland
  • Hidemi Okuma¹
    Kuopio University Hospital, Diagnostic Imaging Center, Department of Clinical Radiology, Kuopio, Finland
  • Mazen Sudah¹
    Kuopio University Hospital, Diagnostic Imaging Center, Department of Clinical Radiology, Kuopio, Finland
    University of Eastern Finland, Cancer Center of Eastern Finland, Kuopio, Finland
    Corresponding author at: Department of Clinical Radiology, Breast Unit, Kuopio University Hospital, Puijonlaaksontie 2, P.O. Box 100, FI-70029, Kuopio, Finland.

    ¹ Shared authorship.

      Highlights

      • The Kaiser score has high diagnostic accuracy in breast magnetic resonance imaging.
      • The interobserver agreement for the Kaiser score was excellent.
      • The use of the Kaiser score may reduce the biopsy rate for true negative lesions.

      Abstract

      Purpose

      To determine the diagnostic performance of the Kaiser score and to compare it with the BI-RADS–based multiparametric classification system (MCS).

      Method

      Two breast radiologists, blinded to the clinical and pathological information, separately evaluated a database of 499 consecutive patients with structural 3.0 T breast MRI and 697 histopathologically verified lesions. The Kaiser scores and corresponding MCS categories were recorded. The sensitivity and specificity of the Kaiser score and the MCS categories to differentiate benign from malignant lesions were calculated. The interobserver reproducibility and receiver operating characteristic (ROC) parameters were analysed.

      Results

      The sensitivity and specificity of the MCS were 100 % and 12 %, respectively, and those of the Kaiser score were 98.7 % and 34.8 % for reader 1 and 98.5 % and 47.5 % for reader 2. The area under the ROC curve was 0.859 and 0.876 for readers 1 and 2, respectively. The interobserver intraclass correlation coefficient was excellent at 0.882. Reader 1 upgraded six lesions from BI-RADS 3 to a Kaiser score of >4, and reader 2 upgraded seven lesions. When applying the Kaiser score to 158 benign lesions, readers 1 and 2 would have reduced the biopsy rate by 22.8 % and 35.4 %, respectively.

      Conclusions

      The Kaiser score showed high diagnostic accuracy with excellent interobserver reproducibility. The MCS had perfect sensitivity but low specificity. Although the Kaiser score had slightly lower sensitivity, its specificity was 3–4 times greater than that of the MCS. Thus, the Kaiser score has the potential to considerably reduce the biopsy rate for true negative lesions.

      Abbreviations:

      ADC (apparent diffusion coefficient), AUC (area under the curve), BI-RADS (Breast Imaging Reporting and Data System), CI (confidence interval), EUSOMA (European Society of Breast Cancer Specialists), ICC (intraclass correlation coefficient), MCS (multiparametric classification system), MRI (magnetic resonance imaging), NME (non-mass enhancement), NPV (negative predictive value), PPV (positive predictive value), ROC (receiver operating characteristic)

      1. Introduction

      Dynamic contrast-enhanced magnetic resonance imaging (MRI) is one of the most sensitive breast imaging modalities, but it has a relatively low specificity [
      • Sardanelli F.
      • Boetes C.
      • Borisch B.
      • et al.
      Magnetic resonance imaging of the breast: recommendations from the EUSOMA working group.
      ]. Breast MRI findings are evaluated using the standardised American College of Radiology Breast Imaging Reporting and Data System (BI-RADS) lexicon, which uses multiple descriptors to characterise enhancing lesions [
      • D’Orsi C.J.
      • Sickles E.A.
      • Mendelson E.B.
      • Morris E.A.
      ACR BI-RADS® Atlas, Breast Imaging Reporting and Data System.
      ]. Morphological and kinetic parameters are then summed by an experienced reader to assign a BI-RADS category. However, because the features of benign and malignant lesions often overlap, the positive predictive values (PPV) for individual descriptors vary considerably, ranging from 17.9 % to 100 % [
      • Istomin A.
      • Masarwah A.
      • Okuma H.
      • Sutela A.
      • Vanninen R.
      • Sudah M.
      A multiparametric classification system for lesions detected by breast magnetic resonance imaging.
      ] resulting in high sensitivity at the expense of lower specificity. MRI was reported to detect additional lesions in 20 % of women with a PPV of 67 % and in the contralateral breast in 5.5 % of women with a PPV of 37 % [
      • Plana M.N.
      • Carreira C.
      • Muriel A.
      • et al.
      Magnetic resonance imaging in the preoperative assessment of patients with primary breast cancer: systematic review of diagnostic accuracy and meta-analysis.
      ].
      False-positive MRI findings might result in an increased mastectomy rate [
      • Houssami N.
      • Turner R.M.
      • Morrow M.
      Meta-analysis of pre-operative magnetic resonance imaging (MRI) and surgical treatment for breast cancer.
      ], an unfavourable harm-to-benefit ratio [
      • Houssami N.
      • Turner R.
      • Morrow M.
      Preoperative magnetic resonance imaging in breast cancer: meta-analysis of surgical outcomes.
      ], or a longer wait until surgery due to the additional diagnostic workup required [
      • Nessim C.
      • Winocour J.
      • Holloway D.P.
      • Saskin R.
      • Holloway C.M.
      Wait times for breast cancer surgery: effect of magnetic resonance imaging and preoperative investigations on the diagnostic pathway.
      ].
      To improve the specificity and diagnostic accuracy of MRI findings, researchers have explored the value of multiparametric scoring and evaluation schemes, with varying results [
      • Fischer U.
      • Kopka L.
      • Grabbe E.
      Breast carcinoma: effect of preoperative contrast-enhanced MR imaging on the therapeutic approach.
      ,
      • Tozaki M.
      • Igarashi T.
      • Matsushima S.
      • Fukuda K.
      High-spatial-resolution MR imaging of focal breast masses: interpretation model based on kinetic and morphological parameters.
      ,
      • Kawai M.
      • Kataoka M.
      • Kanao S.
      • et al.
      The value of lesion size as an adjunct to the BI-RADS-MRI 2013 descriptors in the diagnosis of solitary breast masses.
      ,
      • Ellmann S.
      • Wenkel E.
      • Dietzel M.
      • et al.
      Implementation of machine learning into clinical breast MRI: potential for objective and accurate decision-making in suspicious breast masses.
      ]. Furthermore, a recent meta-analysis of breast MRI studies concluded that diffusion-weighted imaging with a multiparametric protocol improved the specificity for the diagnosis of malignant lesions but it did not affect the sensitivity [
      • Zhu C.R.
      • Chen K.Y.
      • Li P.
      • Xia Z.Y.
      • Wang B.
      Accuracy of multiparametric MRI in distinguishing the breast malignant lesions from benign lesions: a meta-analysis.
      ]. More recently, a BI-RADS–based multiparametric classification system (MCS), which incorporates the apparent diffusion coefficient values (ADC) and T2-weighted signal intensity, was able to categorise lesions with PPVs within the recommended ranges for BI-RADS categories [
      • Istomin A.
      • Masarwah A.
      • Okuma H.
      • Sutela A.
      • Vanninen R.
      • Sudah M.
      A multiparametric classification system for lesions detected by breast magnetic resonance imaging.
      ].
      Baltzer et al. [
      • Baltzer P.A.
      • Dietzel M.
      • Kaiser W.A.
      A simple and robust classification tree for differentiation between benign and malignant lesions in MR-mammography.
      ] introduced a classification tree flowchart to aid the interpretation of enhancing breast MRI lesions based on a large database with 17 different variables. This classification tree, later termed the Kaiser score, includes five major diagnostic criteria (root sign, dynamic enhancement curve type, margins, internal enhancement pattern, and oedema) and its diagnostic accuracy was high. Since then, a few studies have validated the Kaiser score in selected patient populations and reported high diagnostic accuracies for differentiating between benign and malignant lesions detected by breast MRI [
      • Marino M.A.
      • Clauser P.
      • Woitek R.
      • et al.
      A simple scoring system for breast MRI interpretation: does it compensate for reader experience?.
      ,
      • Woitek R.
      • Spick C.
      • Schernthaner M.
      • et al.
      A simple classification system (the tree flowchart) for breast MRI can reduce the number of unnecessary biopsies in MRI-only lesions.
      ,
      • Cloete D.J.
      • Minne C.
      • Schoub P.K.
      • Becker J.H.R.
      Magnetic resonance imaging of fibroadenoma-like lesions and correlation with Breast Imaging-Reporting and Data System and Kaiser scoring system.
      ,
      • Wengert G.J.
      • Pipan F.
      • Almohanna J.
      • et al.
      Impact of the Kaiser score on clinical decision-making in BI-RADS 4 mammographic calcifications examined with breast MRI.
      ,
      • Milos R.I.
      • Pipan F.
      • Kalovidouri A.
      • et al.
      The Kaiser score reliably excludes malignancy in benign contrast-enhancing lesions classified as BI-RADS 4 on breast MRI high-risk screening exams.
      ,
      • Jajodia A.
      • Sindhwani G.
      • Pasricha S.
      • et al.
      Application of the Kaiser score to increase diagnostic accuracy in equivocal lesions on diagnostic mammograms referred for MR mammography.
      ,
      • Zhang B.
      • Feng L.
      • Wang L.
      • Chen X.
      • Li X.
      • Yang Q.
      Kaiser score for diagnosis of breast lesions presenting as non-mass enhancement on MRI.
      ].
      The aims of this study were to explore the diagnostic performance of the Kaiser score using a database of 697 histopathologically verified lesions in 499 patients, and to compare its performance with that of the BI-RADS–based MCS.

      2. Material and methods

      2.1 Patient selection

      Breast MRI examinations are performed in our institution for selected patients in accordance with national guidelines, which are in concordance with the European Society of Breast Cancer Specialists working group (EUSOMA) recommendations [
      • Sardanelli F.
      • Boetes C.
      • Borisch B.
      • et al.
      Magnetic resonance imaging of the breast: recommendations from the EUSOMA working group.
      ]. The specific indications for breast MRI were as follows: 1) Screening of patients at high-risk for breast cancer; 2) Occult primary breast cancer; 3) Preoperative staging, whenever the exact tumour size or extension cannot be confidently identified with an expected impact on treatment decisions; 4) As a problem solving modality for equivocal or discordant findings at conventional imaging; 5) Evaluation of response to neoadjuvant chemotherapy; 6) Patients with pathological nipple discharge where galactography was technically unsuccessful and 7) Patients with Paget’s disease of the nipple before breast conserving surgery.
      This study used a prospective database of consecutive breast MRI examinations performed in our hospital between 2011 and 2015. During this period, a total of 829 consecutive breast MRI examinations were performed on 613 patients. The exclusion criteria for this study were as follows: screening examinations with no findings (BI-RADS 1), examinations with only BI-RADS 2 findings, and repeated examinations without new suspicious findings. After these exclusions, the final study cohort comprised 697 histologically confirmed lesions in 499 patients. At the time of imaging, all detected BI-RADS 3–5 lesions were meticulously evaluated by specialised breast radiologists and all suspicious lesions were histologically verified.
      The MCS categories were retrospectively determined for the patient population, as previously reported [
      • Istomin A.
      • Masarwah A.
      • Okuma H.
      • Sutela A.
      • Vanninen R.
      • Sudah M.
      A multiparametric classification system for lesions detected by breast magnetic resonance imaging.
      ]. Briefly, using the BI-RADS 5th edition lexicon, the internal enhancement patterns, types of curves, shapes, and margins of masses, as well as the T2-weighted signal intensity were analysed and combined with the ADC values and kinetic curve assessment. For the purposes of the present study, we reanalysed the same group of patients by applying the Kaiser scoring system.
      The ethics committee of Kuopio University Hospital approved this study and waived the need for written informed consent due to the retrospective nature of the study. The study was conducted in accordance with the Declaration of Helsinki following all relevant national and international guidelines.

      2.2 MRI protocol

      The MRI protocol is summarised in Table 1. Breast MRI was performed using a 3.0 T MRI scanner (Philips Achieva TX, Philips N.V., Eindhoven, The Netherlands) with a dedicated seven-element phased-array bilateral breast coil.
      Table 1. Breast magnetic resonance imaging protocol.
      Sequence | TR/TE (ms) | In-plane resolution (mm) | Slice thickness (mm) | Scanning time
      T1-FFE | 4.57/2.3 | 0.48 × 0.48 | 0.7 | 6 min 11 s
      T2-TSE | 5000/120 | 0.6 × 0.6 | 2 | 3 min 20 s
      STIR | 5000/60 | 1 × 1 | 2 | 5 min 40 s
      T1 dynamic (a) | 4.67/2.31 | 0.96 × 0.96 | 1 | 58.5 s
      DWI (b) | 7168/95 | 1.15 × 1.15 | 4 | 4 min 8 s
      FFE = fast field echo.
      TSE = turbo spin echo.
      STIR = short tau inversion recovery.
      a eTHRIVE spectrally adiabatic inversion recovery (SPAIR) fat suppression; pre-contrast and six phases after the gadoterate meglumine (0.1 mL/kg, 3 mL/s) injection followed by a saline chaser.
      b DWI = diffusion-weighted echo planar imaging with five respective b factors (0, 200, 400, 600 and 800 s/mm²) routinely performed after contrast administration. The apparent diffusion coefficient maps were automatically calculated linearly using the manufacturer’s method.
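
      As an illustration of what a linear ADC calculation from the five b factors can look like, the sketch below fits the generic mono-exponential model S(b) = S0 · exp(−b · ADC) by least squares on ln S. This is an assumption made for illustration only: the manufacturer's method used in the study is not described in the text, and the function name, the synthetic signal values and the baseline signal of 1000 are hypothetical.

```python
import math

# b factors from the DWI protocol in Table 1 (s/mm^2).
B_VALUES = [0, 200, 400, 600, 800]


def fit_adc(b_values, signals):
    """Estimate the ADC (mm^2/s) as minus the least-squares slope of ln(S) versus b.

    Assumes the mono-exponential model S(b) = S0 * exp(-b * ADC); this is a
    generic textbook fit, not the vendor's implementation.
    """
    logs = [math.log(s) for s in signals]
    n = len(b_values)
    mean_b = sum(b_values) / n
    mean_log = sum(logs) / n
    sxy = sum((b - mean_b) * (l - mean_log) for b, l in zip(b_values, logs))
    sxx = sum((b - mean_b) ** 2 for b in b_values)
    return -sxy / sxx


# Hypothetical noise-free voxel whose true ADC equals the 0.69e-3 mm^2/s threshold.
TRUE_ADC = 0.69e-3
synthetic_signals = [1000.0 * math.exp(-b * TRUE_ADC) for b in B_VALUES]
estimated_adc = fit_adc(B_VALUES, synthetic_signals)
```

      With noise-free synthetic data the fit recovers the ADC that was used to generate the signals; real voxels would add noise and perfusion effects that this sketch ignores.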

      2.3 Image analysis

      2.3.1 Multiparametric classification system

      Two experienced breast radiologists with over 10 (AI) and 25 years (MS) of experience in multimodal breast imaging analysed the MRI images in chronological order. They were blinded to all clinical and pathological information. A modified lesion descriptors algorithm, as originally described by Maltez de Almeida et al. [
      • Maltez de Almeida J.R.
      • Gomes A.B.
      • Barros T.P.
      • Fahel P.E.
      • de Seixas Rocha M.
      Subcategorization of suspicious breast lesions (BI-RADS category 4) according to MRI criteria: role of dynamic contrast-enhanced and diffusion-weighted imaging.
      ], was used. The lesions were analysed for the presence of descriptors using the BI-RADS 5th edition lexicon and combined with ADC threshold values and kinetic curve assessment. Major criteria were the descriptors with the highest PPV in breast MRI: rim-enhancing or spiculated mass lesions; segmental, clumped or clustered ring non-mass enhancement (NME); a type 3 (washout) curve; and a low ADC (≤0.69 × 10⁻³ mm²/s). Intermediate descriptors were a round mass lesion; an irregular shape or margin; linear, regional or heterogeneous enhancement; intermediate or low T2 signal intensity; and a type 2 curve. Other descriptors were designated as minor.
      All lesions were then assigned a BI-RADS category based on the presence and number of descriptors [
      • Istomin A.
      • Masarwah A.
      • Okuma H.
      • Sutela A.
      • Vanninen R.
      • Sudah M.
      A multiparametric classification system for lesions detected by breast magnetic resonance imaging.
      ]. Lesions with two or more major findings were designated as category 5. Lesions with intermediate or only one major descriptor were designated as category 4. Lesions with only minor descriptors were designated as category 3.
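
      The category assignment described above can be written as a short decision rule. The sketch below is a simplified reading of the text, not the published MCS implementation: the function name and its two count arguments are hypothetical, and a lesion with one major descriptor plus intermediate descriptors is assumed to remain in category 4.

```python
def mcs_category(n_major: int, n_intermediate: int) -> int:
    """Return the BI-RADS category (3, 4 or 5) implied by the descriptor counts.

    Assumed rule, paraphrased from the text:
      - two or more major descriptors          -> category 5
      - one major or any intermediate descriptor -> category 4
      - only minor descriptors                 -> category 3
    """
    if n_major < 0 or n_intermediate < 0:
        raise ValueError("descriptor counts must be non-negative")
    if n_major >= 2:
        return 5
    if n_major == 1 or n_intermediate >= 1:
        return 4
    return 3


# Example: a spiculated mass with a washout curve (two major descriptors).
example_category = mcs_category(n_major=2, n_intermediate=0)
```

      The subcategories 4b and 4c that appear later in Table 3 are not derivable from this description and are therefore not modelled here.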

      2.3.2 Kaiser score

      Two specialised breast radiologists (AI and HO), each with over 10 years of experience in breast MRI and in its interpretation according to the BI-RADS lexicon, independently analysed the MRI images. Both radiologists were blinded to the clinical and pathological information. They assessed the MRI findings by applying the Kaiser score (available at http://www.meduniwien.ac.at/kaiser-score/), based on the five diagnostic criteria (lesion type, shape and margins, root sign, enhancement kinetics, and oedema). To evaluate the change in the readers’ performance over time and to eliminate any possible recall bias, both readers evaluated the MRI examinations in alphabetical order. There was a minimum period of approximately 10 months between the MCS and Kaiser score readings.
      Prior to this study, the readers had no experience of using the Kaiser score. Before the individual assessments, an extensive literature search on the Kaiser score was performed. This was followed by a joint training session on the interpretation algorithm using 20 breast MRI examinations in patients who were not included in the study cohort. This training was provided by a senior breast radiologist with over 25 years of experience in performing multimodal breast imaging.
      The independently assessed Kaiser scores as well as the corresponding MCS categories were recorded in a database. The final pathological diagnosis for each lesion was set as the reference standard.

      2.4 Statistical analysis

      The sensitivity, specificity, PPV and negative predictive values (NPV) with 95 % confidence intervals (CI) were calculated using VassarStats, an online tool for statistical computation (http://www.vassarstats.net). Other statistical analyses were performed using IBM SPSS Statistics for Windows, Version 26.0 (IBM Corp., Armonk, NY).
      Receiver operating characteristic (ROC) analysis was performed and the area under the curve (AUC) was calculated to determine the overall diagnostic accuracy. Learning curves were constructed for both readers separately by plotting the overall diagnostic accuracy over time for each set of 50 consecutive examinations within the overall cohort. We then applied multinomial logistic regression to assess and compare the two readers in terms of the changes in diagnostic accuracy, sensitivity, specificity, PPV and NPV over time. Interobserver reproducibility was evaluated using intraclass correlation coefficients (ICCs) according to the 11 categories of the Kaiser score. An ICC of 1.0 was considered to indicate perfect agreement, 0.81–0.99 near-perfect agreement, 0.61–0.80 substantial agreement, 0.41–0.60 moderate agreement, 0.21–0.40 fair agreement and ≤0.20 weak agreement. To distinguish between malignant and benign lesions, a cutoff value of >4 was used for the Kaiser score and a cutoff value of >3 was used for the MCS.
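
      The agreement bands and the two cutoffs above translate directly into code. The following is a minimal sketch with hypothetical function names; because the text does not say how values falling between two bands (for example 0.805) are handled, the lower bound of each band is assumed to act as the threshold.

```python
KAISER_CUTOFF = 4  # Kaiser score > 4 is read as suspicious for malignancy
MCS_CUTOFF = 3     # MCS (BI-RADS) category > 3 is read as suspicious


def agreement_label(icc: float) -> str:
    """Map an ICC to the verbal agreement band used in the text.

    Assumption: each band's lower bound is used as the threshold.
    """
    if icc >= 1.0:
        return "perfect"
    if icc >= 0.81:
        return "near-perfect"
    if icc >= 0.61:
        return "substantial"
    if icc >= 0.41:
        return "moderate"
    if icc >= 0.21:
        return "fair"
    return "weak"


def kaiser_suspicious(score: int) -> bool:
    """True when a Kaiser score (1-11) exceeds the cutoff of 4."""
    return score > KAISER_CUTOFF


def mcs_suspicious(category: int) -> bool:
    """True when an MCS category (3, 4 or 5) exceeds the cutoff of 3."""
    return category > MCS_CUTOFF


label_for_example = agreement_label(0.75)
```

      For instance, an ICC of 0.75 falls in the substantial band, and a lesion with a Kaiser score of 4 is read as benign whereas a score of 5 is read as suspicious.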

      3. Results

      A total of 697 lesions in 499 patients were analysed. The mean age of the patients was 57 years (range 24–88 years). A total of 555 mass (79.6 %) and 142 (20.4 %) non-mass lesions were diagnosed. The mean ± standard deviation size of the detected mass lesions was 19.2 ± 14.7 mm and that of non-mass lesions was 42.0 ± 25.3 mm. The final histopathology revealed that 76.2 % (n = 531) of the lesions were malignant, 22.7 % (n = 158) were benign and 1.1 % (n = 8) were high-risk lesions.

      3.1 Diagnostic performance of the Kaiser scores

      In the evaluation of the agreement between readers, the interobserver ICC for the Kaiser score across the 11 assignment categories was 0.882, indicating near-perfect agreement.
      The PPVs for malignancy for both readers using the Kaiser score are shown in Table 2. Both readers showed tendencies for increased probability of malignancy with scores ranging from 1 (lowest probability) to 11 (highest), except for scores 4 and 9.
      Table 2. Kaiser scores of detected lesions with positive predictive values according to the final histological diagnosis. R1 = reader 1; R2 = reader 2. Counts are given as n (%).
      Kaiser score | Lesions R1 | Lesions R2 | Malignant R1 | Malignant R2 | Benign R1 | Benign R2 | Risk lesion R1 | Risk lesion R2 | PPV (a), % (95 % CI) R1 | PPV (a), % (95 % CI) R2
      1 | 24 (3.4) | 30 (4.3) | 0 (0) | 2 (6.7) | 24 (100) | 28 (93.3) | 0 (0) | 0 (0) | 0 | 6.7 (1.2−23.5)
      2 | 20 (2.9) | 25 (3.6) | 4 (20.0) | 3 (12.0) | 16 (80.0) | 22 (88.0) | 0 (0) | 0 (0) | 20.0 (6.6−44.3) | 12.0 (3.2−32.3)
      3 | 8 (1.1) | 15 (2.2) | 2 (25.0) | 1 (6.7) | 6 (75.0) | 12 (80.0) | 0 (0) | 2 (13.3) | 25.0 (4.5−64.4) | 20.0 (5.3−48.6)
      4 | 10 (1.4) | 13 (1.9) | 1 (10.0) | 0 (0) | 9 (90.0) | 13 (100) | 0 (0) | 0 (0) | 10.0 (0.5−45.9) | 0
      5 | 27 (3.9) | 31 (4.4) | 12 (44.4) | 17 (54.8) | 15 (55.6) | 13 (41.9) | 0 (0) | 1 (3.2) | 44.4 (26.0−64.4) | 58.1 (39.3−74.9)
      6 | 75 (10.8) | 66 (9.5) | 42 (56.0) | 42 (63.6) | 30 (40.0) | 23 (34.8) | 3 (4.0) | 1 (1.5) | 60.0 (48.0−70.9) | 65.2 (52.3−76.2)
      7 | 115 (16.5) | 174 (25.0) | 80 (69.6) | 140 (80.5) | 32 (27.8) | 32 (18.4) | 3 (2.6) | 2 (1.1) | 72.2 (62.9−79.9) | 81.6 (74.9−86.9)
      8 | 12 (1.7) | 13 (1.9) | 11 (91.7) | 11 (84.6) | 1 (8.3) | 1 (7.7) | 0 (0) | 1 (7.7) | 91.6 (59.8−99.6) | 92.3 (62.1−99.6)
      9 | 73 (10.5) | 131 (18.8) | 61 (83.6) | 118 (90.1) | 12 (16.4) | 12 (9.2) | 0 (0) | 1 (0.8) | 83.6 (72.7−90.9) | 90.8 (84.2−95.0)
      10 | 165 (23.7) | 75 (10.8) | 156 (94.5) | 74 (98.7) | 8 (4.8) | 1 (1.3) | 1 (0.6) | 0 (0) | 95.2 (90.3−97.7) | 98.7 (91.8−99.9)
      11 | 168 (24.1) | 124 (17.8) | 162 (96.4) | 123 (99.2) | 5 (3.0) | 1 (0.8) | 1 (0.6) | 0 (0) | 97.0 (92.8−98.9) | 99.2 (94.9−99.9)
      Total (each reader): lesions 697 (100); malignant 531 (76.2); benign 158 (22.7); risk lesions 8 (1.1)
      CI = confidence interval.
      a PPV = positive predictive value, including risk lesions.
      Applying the Kaiser score to all lesions, the overall accuracy represented by the AUC was 0.859 and 0.876 for readers 1 and 2, respectively. When the mass and non-mass lesions were assessed separately, the AUCs for mass lesions were 0.888 and 0.905, and those for non-mass lesions were 0.742 and 0.749, for readers 1 and 2, respectively (Fig. 1).
      Fig. 1. Receiver operating characteristic curves for the diagnostic accuracy of the Kaiser scores for all lesions (left), mass lesions (middle), and non-mass lesions (right) as interpreted by both readers separately.
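
      For an ordinal score such as the Kaiser score, the AUC for all lesions can be re-derived from the per-score counts in Table 2 using the Mann-Whitney formulation, in which tied scores count as half. The sketch below is illustrative rather than a reproduction of the SPSS analysis: it assumes that risk lesions are grouped with malignant lesions as positives (as in the PPV footnote of Table 2), and the function and variable names are hypothetical.

```python
# Per-score counts for Kaiser scores 1..11, transcribed from Table 2.
# Positives = malignant + risk lesions (assumption); negatives = benign.
R1_POSITIVE = [0, 4, 2, 1, 12, 45, 83, 11, 61, 157, 163]
R1_BENIGN = [24, 16, 6, 9, 15, 30, 32, 1, 12, 8, 5]
R2_POSITIVE = [2, 3, 3, 0, 18, 43, 142, 12, 119, 74, 123]
R2_BENIGN = [28, 22, 12, 13, 13, 23, 32, 1, 12, 1, 1]


def empirical_auc(pos_counts, neg_counts):
    """Mann-Whitney AUC: P(score_pos > score_neg) + 0.5 * P(score_pos == score_neg)."""
    n_pos, n_neg = sum(pos_counts), sum(neg_counts)
    concordant = 0.0
    neg_below = 0  # negatives with a strictly lower score than the current one
    for pos, neg in zip(pos_counts, neg_counts):
        concordant += pos * (neg_below + 0.5 * neg)
        neg_below += neg
    return concordant / (n_pos * n_neg)


auc_reader1 = empirical_auc(R1_POSITIVE, R1_BENIGN)  # about 0.859
auc_reader2 = empirical_auc(R2_POSITIVE, R2_BENIGN)  # about 0.876
```

      Under these assumptions the empirical estimates agree with the reported AUCs of 0.859 and 0.876 to three decimal places. The separate AUCs for mass and non-mass lesions cannot be checked in this way because Table 2 does not split the counts by lesion type.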
      Seven (six mass and one non-mass) malignant lesions were misdiagnosed as benign according to the Kaiser scores (Fig. 2). One of those lesions was correctly classified as suspicious by the second reader. The characteristics of those lesions are presented in Table 3.
      Fig. 2. A 7-mm invasive ductal carcinoma with a false-negative Kaiser score that was classified as BI-RADS category 4c. The lesion is visible as a round, circumscribed mass (arrows) on T1-dynamic contrast-enhanced sequence (A) and subtraction (B) images. The mass is not visible on the short tau inversion recovery (STIR) image (C). The kinetic curve contains a central region with rapid wash-in and wash-out (D).
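      Table 3. Characteristics of lesions missed by the Kaiser score.
      N | R1 Kaiser score | R2 Kaiser score | Size (mm) | Mass/Non-mass | Root sign | Final PD | Multiparametric classification system
      1 | 3 | 1 | 8 | NM | no | DCIS, high grade | 4b
      2 | 2 | 2 | 7 | M | no | IDC, G1 | 4c
      3 | 4 | 4 | 7 | M | no | IDC, G2 | 4c
      4 | 2 | 3 | 7 | M | no | IDC, G2 | 5
      5 | 3 | 1 | 5 | M | no | DCIS, high grade | 4b
      6 | 2 | 2 | 5 | M | no | IDC, G2 | 4c
      7 | 4 | 8 | 9 | M | no | IDC, G1 | 4c
      R1: reader 1; R2: reader 2; PD: histopathological diagnosis; M: mass; NM: non-mass; DCIS: ductal carcinoma in situ; IDC: invasive ductal carcinoma.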
      The learning curves for both readers are presented separately in Fig. 3. The regression analyses yielded virtually identical results, showing clear learning curves for both readers with initial, middle and late phases. Neither reader showed measurable improvements in the diagnostic accuracy, sensitivity, specificity, PPV and NPV (p = 1.00).
      Fig. 3. The learning curves for readers 1 and 2.

      3.2 Comparison between the Kaiser scores and the BI-RADS–based MCS

      The sensitivity, specificity, PPV and NPV of the Kaiser score relative to the MCS are summarised in Table 4. Although the MCS showed perfect sensitivity, the Kaiser score showed higher specificity (Fig. 4).
      Table 4. Diagnostic performance of the Kaiser score with both readers compared with the multiparametric classification system. Values are percentages (95 % CI).
      Method | Sensitivity | Specificity | PPV | NPV
      Kaiser score, reader 1 | 98.7 (97.2−99.4) | 34.8 (27.5−42.8) | 83.8 (80.6−86.5) | 88.7 (77.5−95.0)
      Kaiser score, reader 2 | 98.5 (97.0−99.3) | 47.5 (39.5−55.5) | 86.5 (83.5−89.0) | 90.4 (81.4−95.4)
      MCS | 100 (99.1−100) | 12.0 (7.6−18.4) | 79.5 (76.2−82.4) | 100 (79.1−100)
      PPV = positive predictive value.
      NPV = negative predictive value.
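
      The point estimates for reader 1 in Table 4 can be retraced from the per-score counts in Table 2 by applying the cutoff of >4. The sketch below assumes that risk lesions are counted as positives together with malignant lesions; function names are hypothetical, and the confidence intervals are not recomputed because the exact interval method used by the online calculator is not stated in the text.

```python
# Reader 1 counts for Kaiser scores 1..11, transcribed from Table 2.
MALIGNANT = [0, 4, 2, 1, 12, 42, 80, 11, 61, 156, 162]
RISK = [0, 0, 0, 0, 0, 3, 3, 0, 0, 1, 1]
BENIGN = [24, 16, 6, 9, 15, 30, 32, 1, 12, 8, 5]
CUTOFF = 4  # Kaiser score > 4 is read as suspicious


def confusion_counts(malignant, risk, benign, cutoff):
    """Return (tp, fn, tn, fp), counting malignant and risk lesions as positives."""
    tp = fn = tn = fp = 0
    for score, (m, r, b) in enumerate(zip(malignant, risk, benign), start=1):
        if score > cutoff:
            tp += m + r
            fp += b
        else:
            fn += m + r
            tn += b
    return tp, fn, tn, fp


def diagnostic_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, PPV and NPV as percentages."""
    return {
        "sensitivity": 100.0 * tp / (tp + fn),
        "specificity": 100.0 * tn / (tn + fp),
        "ppv": 100.0 * tp / (tp + fp),
        "npv": 100.0 * tn / (tn + fn),
    }


counts_reader1 = confusion_counts(MALIGNANT, RISK, BENIGN, CUTOFF)
metrics_reader1 = diagnostic_metrics(*counts_reader1)
```

      With these counts the sketch yields 532 true positives, 7 false negatives, 55 true negatives and 103 false positives, which round to the 98.7 %, 34.8 %, 83.8 % and 88.7 % listed for reader 1.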
      Fig. 4. A 6-mm fibroadenoma with a true-negative Kaiser score classified as BI-RADS 4c. The lesion is visible as a round circumscribed mass (arrows) on T1-dynamic contrast-enhanced sequence (A) and subtraction (B) images. The mass has high signal intensity on the short tau inversion recovery (STIR) image (C). The kinetic curve has a central region with rapid wash-in and wash-out (D).
      A total of 42 and 63 lesions were downgraded by readers 1 and 2, respectively, from BI-RADS categories 4 or 5 to a benign Kaiser score of 1–4. However, reader 1 upgraded six benign lesions from a BI-RADS category of 3 to a Kaiser score of >4, and reader 2 upgraded seven lesions. Therefore, when the readers applied the Kaiser score to 158 benign lesions, the biopsy rate would have been reduced by 22.8 % (36/158) and 35.4 % (56/158), for readers 1 and 2, respectively.
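
      The net reduction quoted above is the number of downgraded lesions minus the number of upgraded benign lesions, divided by the 158 benign lesions. A short sketch of this arithmetic, with a hypothetical function name, is given below.

```python
BENIGN_TOTAL = 158  # benign lesions in the cohort


def biopsy_reduction(downgraded: int, upgraded: int, benign_total: int = BENIGN_TOTAL):
    """Return (net number of biopsies avoided, reduction as a percentage of benign lesions)."""
    net = downgraded - upgraded
    return net, 100.0 * net / benign_total


net_r1, pct_r1 = biopsy_reduction(downgraded=42, upgraded=6)  # reader 1
net_r2, pct_r2 = biopsy_reduction(downgraded=63, upgraded=7)  # reader 2
```

      This gives 36 and 56 avoided biopsies, that is, 22.8 % and 35.4 % of the benign lesions for readers 1 and 2, respectively.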

      4. Discussion

      The results of this study show that the Kaiser score has high diagnostic accuracy with near-perfect interobserver agreement when applied by experienced breast radiologists. Furthermore, while the MCS showed perfect sensitivity, its specificity was low at 12 %. By contrast, the Kaiser score showed greater specificity (34.8 % and 47.5 %) at the expense of a slight decrease in sensitivity. The use of the Kaiser score would have considerably reduced the biopsy rate for benign lesions within our database.
      Our results agree with those of previously published articles. Since its introduction in 2013, the Kaiser score has been validated in studies including a variety of selected patient populations. Marino et al. [
      • Marino M.A.
      • Clauser P.
      • Woitek R.
      • et al.
      A simple scoring system for breast MRI interpretation: does it compensate for reader experience?.
      ] evaluated 100 patients with 121 verified lesions (52 malignant, 68 benign) and reported high accuracy, with AUCs ranging from 0.889 to 0.943. Woitek et al. [
      • Woitek R.
      • Spick C.
      • Schernthaner M.
      • et al.
      A simple classification system (the tree flowchart) for breast MRI can reduce the number of unnecessary biopsies in MRI-only lesions.
      ] evaluated the Kaiser score in 454 patients with 469 breast lesions only visible on MRI (98 malignant, 371 benign). The sensitivity and specificity of the Kaiser score for differentiating between benign and malignant lesions (Kaiser score >4) were 80.6 % and 82.5 %, respectively, and the AUC was 0.873. Wengert et al. [
      • Wengert G.J.
      • Pipan F.
      • Almohanna J.
      • et al.
      Impact of the Kaiser score on clinical decision-making in BI-RADS 4 mammographic calcifications examined with breast MRI.
      ] evaluated 167 consecutive patients with suspicious mammographic calcifications (95 malignant, 72 benign) and reported sensitivities and specificities of 96.8 %–98.9 % and 58.3 %–65.3 %, respectively, and AUCs of 0.968–0.989. Milos et al. [
      • Milos R.I.
      • Pipan F.
      • Kalovidouri A.
      • et al.
      The Kaiser score reliably excludes malignancy in benign contrast-enhancing lesions classified as BI-RADS 4 on breast MRI high-risk screening exams.
      ] evaluated 41 malignant and 142 benign lesions in 159 patients with BI-RADS category 4 findings on MRI, and reported AUCs for three readers of 0.865–0.902, with sensitivities and specificities of 92.7 %–97.6 % and 45.1 %–72.5 %, respectively. More recently, Jajodia et al. [
      • Jajodia A.
      • Sindhwani G.
      • Pasricha S.
      • et al.
      Application of the Kaiser score to increase diagnostic accuracy in equivocal lesions on diagnostic mammograms referred for MR mammography.
      ] evaluated 316 patients with equivocal or inconclusive lesions on mammography (221 malignant, 95 benign). The AUC was 0.796 and the sensitivity and specificity were 94.5 % and 43.1 %, respectively. Thus, the values in our study (0.859–0.876 for the AUC, 98.5 %–98.7 % for sensitivity, and 34.8 %–47.5 % for specificity) are in line with the previously reported ranges.
      In our opinion, the differences in the reported diagnostic accuracies are probably related to the selected patient populations with different ratios of benign and malignant lesions, which may explain the low specificity in our study, and which can be deduced from the previous results. Cloete et al. [
      • Cloete D.J.
      • Minne C.
      • Schoub P.K.
      • Becker J.H.R.
      Magnetic resonance imaging of fibroadenoma-like lesions and correlation with Breast Imaging-Reporting and Data System and Kaiser scoring system.
      ] reported a sensitivity of only 50 % and a specificity of 84.6 % by evaluating 100 fibroadenoma-like benign lesions. Furthermore, of the lesions evaluated by Woitek et al. [
      • Woitek R.
      • Spick C.
      • Schernthaner M.
      • et al.
      A simple classification system (the tree flowchart) for breast MRI can reduce the number of unnecessary biopsies in MRI-only lesions.
      ], 79.1 % were benign and the resulting sensitivity was relatively low at 80.6 %. In that study, the sensitivity improved to 100 % for lesions with a Kaiser score >2, but the specificity decreased from 82.5 % to 27.8 %.
      The interobserver agreement for the Kaiser score varied between the earlier studies, ranging from fair to almost perfect agreement [
      • Marino M.A.
      • Clauser P.
      • Woitek R.
      • et al.
      A simple scoring system for breast MRI interpretation: does it compensate for reader experience?.
      ,
      • Woitek R.
      • Spick C.
      • Schernthaner M.
      • et al.
      A simple classification system (the tree flowchart) for breast MRI can reduce the number of unnecessary biopsies in MRI-only lesions.
      ,
      • Wengert G.J.
      • Pipan F.
      • Almohanna J.
      • et al.
      Impact of the Kaiser score on clinical decision-making in BI-RADS 4 mammographic calcifications examined with breast MRI.
      ,
      • Milos R.I.
      • Pipan F.
      • Kalovidouri A.
      • et al.
      The Kaiser score reliably excludes malignancy in benign contrast-enhancing lesions classified as BI-RADS 4 on breast MRI high-risk screening exams.
      ,
      • Jajodia A.
      • Sindhwani G.
      • Pasricha S.
      • et al.
      Application of the Kaiser score to increase diagnostic accuracy in equivocal lesions on diagnostic mammograms referred for MR mammography.
      ,
      • Dietzel M.
      • Krug B.
      • Clauser P.
      • et al.
      A multicentric comparison of apparent diffusion coefficient mapping and the Kaiser score in the assessment of breast lesions.
      ]. This indicates that, while the Kaiser score is simple and easy to use, there may be some challenging aspects. In our experience, the main challenge involves the interpretation of the root sign, the key parameter used for image interpretation. The root sign is not a strict BI-RADS descriptor and is sometimes difficult to assess, especially in small masses or non-mass lesions, which may lead to misinterpretation. Small masses and non-mass lesions are more challenging to evaluate on breast MRI and are associated with lower diagnostic performance, including when the Kaiser score is used [
      • Wengert G.J.
      • Pipan F.
      • Almohanna J.
      • et al.
      Impact of the Kaiser score on clinical decision-making in BI-RADS 4 mammographic calcifications examined with breast MRI.
      ,
      • Milos R.I.
      • Pipan F.
      • Kalovidouri A.
      • et al.
      The Kaiser score reliably excludes malignancy in benign contrast-enhancing lesions classified as BI-RADS 4 on breast MRI high-risk screening exams.
      ,
      • Jajodia A.
      • Sindhwani G.
      • Pasricha S.
      • et al.
      Application of the Kaiser score to increase diagnostic accuracy in equivocal lesions on diagnostic mammograms referred for MR mammography.
      ].
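Agreement labels such as "fair" or "almost perfect" are commonly attached to a kappa statistic. The sketch below assumes that agreement is summarised with Cohen's kappa on dichotomised readings; the statistic used in each cited study may differ, and the reader labels are hypothetical toy data.

```python
from collections import Counter


def cohen_kappa(reader_a, reader_b):
    """Cohen's kappa for two readers rating the same lesions.

    Assumes both lists have equal length and that the readers do not
    both assign every lesion to a single identical category.
    """
    n = len(reader_a)
    # Observed proportion of lesions on which the readers agree.
    observed = sum(a == b for a, b in zip(reader_a, reader_b)) / n
    # Agreement expected by chance from each reader's category frequencies.
    counts_a = Counter(reader_a)
    counts_b = Counter(reader_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    return (observed - expected) / (1 - expected)


# Hypothetical readings for 10 lesions (positive = biopsy suggested).
reader_a = ["pos", "pos", "pos", "pos", "neg", "neg", "neg", "neg", "neg", "neg"]
reader_b = ["pos", "pos", "pos", "neg", "neg", "neg", "neg", "neg", "neg", "pos"]

print(f"kappa = {cohen_kappa(reader_a, reader_b):.2f}")
```

On the widely used Landis and Koch scale, values of 0.21–0.40 are read as fair agreement and values of 0.81–1.00 as almost perfect agreement.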
      To the best of our knowledge, no previous studies have evaluated the learning curve for applying the Kaiser score. However, it has been reported that the Kaiser score was associated with an improvement in the diagnostic performance when applied by inexperienced readers [
      • Marino M.A.
      • Clauser P.
      • Woitek R.
      • et al.
      A simple scoring system for breast MRI interpretation: does it compensate for reader experience?.
      ], and its accuracy was consistently high, regardless of the MRI protocol or scanner being used [
      • Woitek R.
      • Spick C.
      • Schernthaner M.
      • et al.
      A simple classification system (the tree flowchart) for breast MRI can reduce the number of unnecessary biopsies in MRI-only lesions.
      ]. More importantly, both readers in our study provided consistent and comparable results throughout the study. Although both readers were experienced, they received a short training session to ensure they interpreted the readings consistently. This might explain the high diagnostic performance and the near-perfect interobserver agreement in this study, paralleling the study by Marino et al. [
      • Marino M.A.
      • Clauser P.
      • Woitek R.
      • et al.
      A simple scoring system for breast MRI interpretation: does it compensate for reader experience?.
      ].
      The MCS showed perfect sensitivity and categorized lesions with PPVs within the recommended ranges for the BI-RADS categories. In this study, the probability of malignancy mostly tended to increase with the Kaiser score for both readers. However, the PPV reached the range of BI-RADS category 5 only for Kaiser scores of 10 and 11, and it was not possible to subcategorise BI-RADS category 4 into 4a, 4b, and 4c. Another crucial difference between these two algorithms concerns the subset of lesions with a dynamic wash-out curve. In the MCS, this is considered to be a major descriptor, and lesions displaying this feature are assigned a minimum BI-RADS category of 4c. By contrast, a Kaiser score of 4 is applied to small mass lesions without a root sign that have a central or complete wash-out curve, and often centrifugal enhancement; this score is equivalent to a BI-RADS category of 2/3, and the same applies for NME. In this study, only two small lesions with a wash-out curve were missed using the Kaiser score, so this omission is likely to have a limited effect.
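To make the comparison between observed PPVs and BI-RADS categories concrete, the following minimal sketch maps a PPV to the category whose likelihood-of-malignancy range contains it. The ranges follow the BI-RADS 5th edition; the function name and the example PPVs are hypothetical and are not results from this study.

```python
def birads_category_for_ppv(ppv):
    """Return the BI-RADS category whose malignancy likelihood range contains ppv.

    ppv is a proportion between 0 and 1.
    """
    if not 0.0 <= ppv <= 1.0:
        raise ValueError("ppv must be a proportion between 0 and 1")
    if ppv == 0.0:
        return "2"   # essentially 0 % likelihood of malignancy
    if ppv <= 0.02:
        return "3"   # >0 % to <=2 %
    if ppv <= 0.10:
        return "4a"  # >2 % to <=10 %
    if ppv <= 0.50:
        return "4b"  # >10 % to <=50 %
    if ppv < 0.95:
        return "4c"  # >50 % to <95 %
    return "5"       # >=95 %


# Hypothetical PPVs observed per score, for illustration only.
example_ppv_by_score = {3: 0.01, 5: 0.30, 8: 0.90, 11: 0.97}
for score, ppv in example_ppv_by_score.items():
    print(score, birads_category_for_ppv(ppv))
```

A score can only be matched to a subcategory such as 4a, 4b, or 4c if its observed PPV falls consistently within the corresponding range.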
      Because the Kaiser score had greater specificity than the MCS in this study, its use would have correctly and substantially reduced the biopsy rate, consistent with previous studies. A reduction in the biopsy rate will allow hospitals to better allocate their resources and reduce the costs and delays associated with the diagnostic or treatment protocols. Nevertheless, some downgraded lesions, especially newly diagnosed lesions, may require follow-up to exclude subsequent growth of the lesions. According to the Kaiser score web-based algorithm, if a follow-up is required, it should be performed by ultrasound if possible. If the lesions are not visible on conventional imaging, then follow-up visits should be scheduled at 6 months for mass lesions or 12 months for non-mass lesions. This shift from immediate evaluation to a follow-up strategy may reduce the benefits of the Kaiser score.
      Empirical approaches have been advocated to reduce possible false-negative scores. Examples include upgrading the Kaiser score by 2 points in patients with suspicious mammographic microcalcifications, considering an additional criterion for benign lesions, especially if the ADC value exceeds 1.4 × 10⁻³ mm²/s, and taking the clinical context into account [
      • Dietzel M.
      • Baltzer P.A.T.
      How to use the Kaiser score as a clinical decision rule for diagnosis in multiparametric breast MRI: a pictorial essay.
      ]. All of these situations are continuously evaluated when interpreting MRI findings; therefore, in clinical practice, further histological examination may be recommended for some lesions, even if the Kaiser score is <5.
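The empirical adjustments mentioned above can be sketched as a small helper. This is an illustrative sketch only, not a clinical tool: the function name, the returned fields, the cap at the maximum score of 11, and the use of a score of 5 or more as the biopsy cutoff (inferred from the "<5" wording above) are assumptions. The text does not state how a high ADC value should change the score, so the sketch only flags it.

```python
ADC_BENIGN_THRESHOLD = 1.4e-3  # mm^2/s, threshold quoted in the text
BIOPSY_CUTOFF = 5              # assumption: scores of 5 or more prompt biopsy
MAX_KAISER_SCORE = 11          # assumption: adjusted scores are capped at 11


def review_kaiser_score(score, suspicious_microcalcifications=False, adc=None):
    """Apply the +2 upgrade for suspicious microcalcifications and flag a high ADC.

    adc is the apparent diffusion coefficient in mm^2/s, or None if unavailable.
    """
    adjusted = score
    if suspicious_microcalcifications:
        adjusted = min(score + 2, MAX_KAISER_SCORE)
    high_adc = adc is not None and adc > ADC_BENIGN_THRESHOLD
    return {
        "adjusted_score": adjusted,
        "biopsy_suggested": adjusted >= BIOPSY_CUTOFF,
        # Only a flag: the size of any downgrade is not given in the text.
        "high_adc_supports_benign": high_adc,
    }


print(review_kaiser_score(4, suspicious_microcalcifications=True))
print(review_kaiser_score(3, adc=1.6e-3))
```

In this sketch, a lesion with a score of 4 and suspicious microcalcifications moves to 6 and crosses the assumed cutoff, whereas the clinical context still has to be weighed by the reader.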
      Interestingly, only one NME was missed by the Kaiser score. The greatest diameter of the false-negative lesions in this study was <1 cm, so these lesions were presumably difficult to diagnose. Of the seven missed lesions, five were strongly suspected of being malignant based on the MCS (BI-RADS category 4c/5). Based on our results, we speculate that additional evaluation of suspicious lesions based on the MCS after applying the Kaiser score might help to upgrade some lesions. Conversely, applying the Kaiser score after the MCS might increase the specificity of the MCS. Further studies are needed to evaluate this possibility and the upgrading methods described above.
      The limitations of this study were its retrospective design and the relatively high ratio of malignant to benign lesions. Nevertheless, this study reflects day-to-day practice in a specialised tertiary hospital and is, to date, the largest Kaiser score validation study in which MRI was performed in accordance with international guidelines, under which breast MRI is not indicated for all patients. Indeed, all consecutive patients were included in this analysis to reduce bias.
      In conclusion, the Kaiser score provided high diagnostic accuracy with excellent reproducibility. After a short introductory training session, the diagnostic performance of both readers was constant in the initial, middle and late phases of their learning curves. The MCS had perfect sensitivity but low specificity. Although the Kaiser score had slightly lower sensitivity for both readers, the specificity was 3–4 times greater than that of the MCS. Thus, the Kaiser score has the potential to considerably reduce the biopsy rates for true-negative lesions.

      Authors’ contribution

      MS; HO; AI: Conceptualization.
      MS; AI; HO: Methodology.
      HO; AI: Validation.
      HO; AI: Formal analysis.
      AI; HO; MS: Investigation.
      MS; RV: Resources.
      All authors: Data Curation Management.
      AI; HO; MS: Writing - Original Draft.
      All authors: Writing - Review & Editing.
      All authors: Visualization.
      MS: Supervision.
      MS: Project administration.
      RV: Funding acquisition.
      All authors: Final approval.

      Funding

      This work was supported in part by grants (to AI) from Kuopio University Hospital (Special government funding (VTR), grant 5063542) and the Mauri and Sirkka Wiljasalo fund (to AI and HO). The authors declare no relationships with any companies whose products or services may be related to the content of this article. The funding sources were not involved in the study design, data collection or analysis, preparation of the manuscript, or the decision to submit the manuscript.

      Declaration of Competing Interest

      The authors report no declarations of interest.

      References

        • Sardanelli F.
        • Boetes C.
        • Borisch B.
        • et al.
        Magnetic resonance imaging of the breast: recommendations from the EUSOMA working group.
        Eur. J. Cancer. 2010; 46: 1296-1316
        • D’Orsi C.J.
        • Sickles E.A.
        • Mendelson E.B.
        • Morris E.A.
        ACR BI-RADS® Atlas, Breast Imaging Reporting and Data System.
        American College of Radiology, Reston, VA 2013
        • Istomin A.
        • Masarwah A.
        • Okuma H.
        • Sutela A.
        • Vanninen R.
        • Sudah M.
        A multiparametric classification system for lesions detected by breast magnetic resonance imaging.
        Eur. J. Radiol. 2020; 132: 109322
        • Plana M.N.
        • Carreira C.
        • Muriel A.
        • et al.
        Magnetic resonance imaging in the preoperative assessment of patients with primary breast cancer: systematic review of diagnostic accuracy and meta-analysis.
        Eur. Radiol. 2012; 22: 26-38
        • Houssami N.
        • Turner R.M.
        • Morrow M.
        Meta-analysis of pre-operative magnetic resonance imaging (MRI) and surgical treatment for breast cancer.
        Breast Cancer Res. Treat. 2017; 165: 273-283
        • Houssami N.
        • Turner R.
        • Morrow M.
        Preoperative magnetic resonance imaging in breast cancer: meta-analysis of surgical outcomes.
        Ann. Surg. 2013; 257: 249-255
        • Nessim C.
        • Winocour J.
        • Holloway D.P.
        • Saskin R.
        • Holloway C.M.
        Wait times for breast cancer surgery: effect of magnetic resonance imaging and preoperative investigations on the diagnostic pathway.
        J. Oncol. Pract. 2015; 11: e131-8
        • Fischer U.
        • Kopka L.
        • Grabbe E.
        Breast carcinoma: effect of preoperative contrast-enhanced MR imaging on the therapeutic approach.
        Radiology. 1999; 213: 881-888
        • Tozaki M.
        • Igarashi T.
        • Matsushima S.
        • Fukuda K.
        High-spatial-resolution MR imaging of focal breast masses: interpretation model based on kinetic and morphological parameters.
        Radiat. Med. 2005; 23: 43-50
        • Kawai M.
        • Kataoka M.
        • Kanao S.
        • et al.
        The value of lesion size as an adjunct to the BI-RADS-MRI 2013 descriptors in the diagnosis of solitary breast masses.
        Magn. Reson. Med. Sci. 2018; 17: 203-210
        • Ellmann S.
        • Wenkel E.
        • Dietzel M.
        • et al.
        Implementation of machine learning into clinical breast MRI: potential for objective and accurate decision-making in suspicious breast masses.
        PLoS One. 2020; 15: e0228446
        • Zhu C.R.
        • Chen K.Y.
        • Li P.
        • Xia Z.Y.
        • Wang B.
        Accuracy of multiparametric MRI in distinguishing the breast malignant lesions from benign lesions: a meta-analysis.
        Acta Radiol. 2020; https://doi.org/10.1177/0284185120963900
        • Baltzer P.A.
        • Dietzel M.
        • Kaiser W.A.
        A simple and robust classification tree for differentiation between benign and malignant lesions in MR-mammography.
        Eur. Radiol. 2013; 23: 2051-2060
        • Marino M.A.
        • Clauser P.
        • Woitek R.
        • et al.
        A simple scoring system for breast MRI interpretation: does it compensate for reader experience?.
        Eur. Radiol. 2016; 26: 2529-2537
        • Woitek R.
        • Spick C.
        • Schernthaner M.
        • et al.
        A simple classification system (the tree flowchart) for breast MRI can reduce the number of unnecessary biopsies in MRI-only lesions.
        Eur. Radiol. 2017; 27: 3799-3809
        • Cloete D.J.
        • Minne C.
        • Schoub P.K.
        • Becker J.H.R.
        Magnetic resonance imaging of fibroadenoma-like lesions and correlation with Breast Imaging-Reporting and Data System and Kaiser scoring system.
        SA J. Radiol. 2018; 22: 1532
        • Wengert G.J.
        • Pipan F.
        • Almohanna J.
        • et al.
        Impact of the Kaiser score on clinical decision-making in BI-RADS 4 mammographic calcifications examined with breast MRI.
        Eur. Radiol. 2020; 30: 1451-1459
        • Milos R.I.
        • Pipan F.
        • Kalovidouri A.
        • et al.
        The Kaiser score reliably excludes malignancy in benign contrast-enhancing lesions classified as BI-RADS 4 on breast MRI high-risk screening exams.
        Eur. Radiol. 2020; 30: 6052-6061
        • Jajodia A.
        • Sindhwani G.
        • Pasricha S.
        • et al.
        Application of the Kaiser score to increase diagnostic accuracy in equivocal lesions on diagnostic mammograms referred for MR mammography.
        Eur. J. Radiol. 2021; 134: 109413
        • Zhang B.
        • Feng L.
        • Wang L.
        • Chen X.
        • Li X.
        • Yang Q.
        Kaiser score for diagnosis of breast lesions presenting as non-mass enhancement on MRI.
        Nan Fang Yi Ke Da Xue Xue Bao. 2020; 40: 562-566
        • Maltez de Almeida J.R.
        • Gomes A.B.
        • Barros T.P.
        • Fahel P.E.
        • de Seixas Rocha M.
        Subcategorization of suspicious breast lesions (BI-RADS category 4) according to MRI criteria: role of dynamic contrast-enhanced and diffusion-weighted imaging.
        AJR Am. J. Roentgenol. 2015; 205: 222-231
        • Dietzel M.
        • Krug B.
        • Clauser P.
        • et al.
        A multicentric comparison of apparent diffusion coefficient mapping and the Kaiser score in the assessment of breast lesions.
        Invest. Radiol. 2020; https://doi.org/10.1097/RLI.0000000000000739
        • Dietzel M.
        • Baltzer P.A.T.
        How to use the Kaiser score as a clinical decision rule for diagnosis in multiparametric breast MRI: a pictorial essay.
        Insights Imaging. 2018; 9: 325-335