
Detection and PI-RADS classification of focal lesions in prostate MRI: Performance comparison between a deep learning-based algorithm (DLA) and radiologists with various levels of experience

Published: August 05, 2021
DOI: https://doi.org/10.1016/j.ejrad.2021.109894

      Highlights

      • Deep learning-based algorithm (DLA) can promisingly assign PI-RADS categories.
      • PI-RADS categories assigned by radiologists with different levels of experience varied.
      • The sensitivities and specificities of the DLA and expert were similar with PI-RADS ≥ 4.
      • The performance of DLA was similar to that of clinical reports in clinical practice.

      Abstract

      Purpose

      To compare the performance of lesion detection and Prostate Imaging-Reporting and Data System (PI-RADS) classification between a deep learning-based algorithm (DLA), clinical reports and radiologists with different levels of experience in prostate MRI.

      Methods

      This retrospective study included 121 patients who underwent prebiopsy MRI and prostate biopsy. More than five radiologists (Reader groups 1, 2: residents; Readers 3, 4: less-experienced radiologists; Reader 5: expert) independently reviewed biparametric MRI (bpMRI). The DLA results were obtained using bpMRI. The reference standard was based on pathologic reports. The diagnostic performance of the PI-RADS classification of DLA, clinical reports, and radiologists was analyzed using AUROC. Dichotomous analysis (PI-RADS cutoff value ≥ 3 or 4) was performed, and the sensitivities and specificities were compared using McNemar’s test.

      Results

      Clinically significant cancer [CSC, Gleason score ≥ 7] was confirmed in 43 patients (35.5%). The AUROC of the DLA (0.828) for diagnosing CSC was significantly higher than that of Reader 1 (AUROC, 0.706; p = 0.011), significantly lower than that of Reader 5 (AUROC, 0.914; p = 0.013), and similar to clinical reports and other readers (p = 0.060–0.661). The sensitivity of DLA (76.7%) was comparable to those of all readers and the clinical reports at a PI-RADS cutoff value ≥ 4. The specificity of the DLA (85.9%) was significantly higher than those of clinical reports and Readers 2–3 and comparable to all others at a PI-RADS cutoff value ≥ 4.

      Conclusions

      The DLA showed moderate diagnostic performance at a level between those of residents and an expert in detecting and classifying according to PI-RADS. The performance of DLA was similar to that of clinical reports from various radiologists in clinical practice.

      Keywords

      Abbreviations:

      AI (artificial intelligence), bpMRI (biparametric MRI), CSC (clinically significant prostate cancer), DLA (deep learning-based algorithm), mpMRI (multiparametric MRI), PI-RADS (Prostate Imaging-Reporting and Data System), PSA (prostate-specific antigen), TRUS (transrectal ultrasonography)

      1. Introduction

      Prostate MRI is beneficial for the diagnosis of prostate cancer in biopsy-naïve patients and patients with prior negative biopsy results. Multicentric, randomized studies have shown that prostate MRI before biopsy and MRI-targeted biopsy is superior to standard transrectal ultrasonography (TRUS)-guided biopsy in patients at clinical risk [Kasivisvanathan et al., MRI-Targeted Biopsy for Prostate-Cancer Diagnosis; Ahmed et al., Diagnostic accuracy of multi-parametric MRI and TRUS biopsy in prostate cancer (PROMIS): a paired validating confirmatory study]. A meta-analysis revealed that prebiopsy MRI is a more favorable diagnostic work-up than systematic biopsy for the diagnosis of clinically significant prostate cancer (CSC) [Drost et al., Prostate Magnetic Resonance Imaging, with or Without Magnetic Resonance Imaging-targeted Biopsy, and Systematic Biopsy for Detecting Prostate Cancer: A Cochrane Systematic Review and Meta-analysis].
      Because of increasing evidence and guideline recommendations, the demand for prostate MRI is growing, and radiologists face a substantial increase in the number of referrals. The Prostate Imaging-Reporting and Data System (PI-RADS) has suggested fundamental guidelines for the interpretation of prostate MRI [Weinreb et al., PI-RADS Prostate Imaging - Reporting and Data System: 2015, Version 2]; however, the PI-RADS score can be interpreted differently by different radiologists [Sonn et al., Prostate Magnetic Resonance Imaging Interpretation Varies Substantially Across Radiologists]. In particular, radiologists with less experience have greater inter-reader variability in PI-RADS scoring [Sonn et al., Prostate Magnetic Resonance Imaging Interpretation Varies Substantially Across Radiologists]. Machine learning-based automatic detection and classification of focal lesions in the prostate gland may be helpful for reducing radiologists’ reading time and reducing inter-reader variability [Padhani and Turkbey, Detecting Prostate Cancer with Deep Learning for MRI: A Small Step Forward].
      Recently, deep learning-based artificial intelligence (AI) algorithms have shown valuable performance in differentiating prostate cancer from normal tissues and in estimating the probability of prostate cancer [Schelb et al., Classification of Cancer at Prostate MRI: Deep Learning versus Clinical PI-RADS Assessment; Yoo et al., Prostate Cancer Detection using Deep Convolutional Neural Networks; Song et al., Computer-aided diagnosis of prostate cancer using a deep convolutional neural network from multiparametric MRI; Sumathipala et al., Prostate cancer detection from multi-institution multiparametric MRIs using deep convolutional neural networks; Ishioka et al., Computer-aided diagnosis of prostate cancer on magnetic resonance imaging using a convolutional neural network algorithm]. As PI-RADS is the standard of reporting on prostate MRI, a deep learning-based algorithm for both lesion detection and PI-RADS classification is necessary in clinical practice. Lately, a deep learning-based algorithm for PI-RADS classification has been demonstrated [Sanford et al., Deep-Learning-Based Artificial Intelligence for PI-RADS Classification to Assist Multiparametric Prostate MRI Interpretation: A Development Study]; however, this approach did not address the detection of prostate lesions and required manual lesion segmentation by a radiologist. Few studies have compared the performance of the machine to detect focal lesions and calculate cancer probability to the performance of radiologists’ PI-RADS classification [Schelb et al., Classification of Cancer at Prostate MRI: Deep Learning versus Clinical PI-RADS Assessment; Schelb et al., Simulated clinical deployment of fully automatic deep learning for clinical prostate MRI assessment]. To the best of our knowledge, no studies have compared the performance of AI and radiologists with various levels of experience in prostate MRI.
      The purpose of this study was to compare the diagnostic performance of lesion detection and PI-RADS classification between a deep learning-based algorithm (DLA), clinical reports and radiologists with different levels of experience in prostate MRI.

      2. Materials and methods

      This retrospective study was approved by the Institutional Review Board (KC19DISI0933), and informed consent was waived.

      2.1 Study population

      Patients with clinical suspicion of prostate cancer who underwent prebiopsy prostate MRI between November 2015 and February 2019 followed by TRUS-guided biopsy were eligible (n = 923). One of two experienced genitourinary radiologists performed TRUS-guided biopsy after review of MRI images and reports. Systematic biopsy with or without additional targeted biopsy was performed. Targeted biopsy was performed using cognitive fusion, and the location of the targeted lesion was recorded in the biopsy report. Only patients who underwent prostate MRI using the same 3-T MRI machine (MAGNETOM Verio, Siemens Healthcare, Erlangen, Germany) were included in this study. The exclusion criteria were 1) known prostate cancer before prostate MRI (n = 38), 2) MRI examination performed using a 1.5-T (n = 612) or other vendor 3-T MRI machine (n = 44), 3) MRI examination used in the development of the DLA (n = 100), 4) a case in which the DLA could not run because of a technical error during MRI scanning rather than an error of the DLA itself (n = 1), and 5) patients with very high PSA (>40 ng/mL, n = 7). Finally, a total of 121 patients were enrolled in this study (Fig. 1). Clinical information on age, prostate-specific antigen (PSA) level, use of 5α-reductase inhibitor, clinical report of prostate MRI, biopsy result, number of previous TRUS-guided biopsies before MRI, interval between MRI and biopsy and, for patients who underwent prostatectomy, the prostatectomy result and the interval between MRI and prostatectomy was collected from electronic medical records.
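The patient flow above can be checked with simple arithmetic. The following is a minimal sketch, assuming the exclusion categories are mutually exclusive (the text does not say so explicitly, but the counts only add up under that assumption); the dictionary keys are shorthand labels introduced here for illustration.

```python
# Patient flow from Section 2.1; counts are taken from the text.
# Assumption: each excluded patient falls into exactly one category.
eligible = 923

exclusions = {
    "known prostate cancer before MRI": 38,
    "1.5-T MRI machine": 612,
    "other-vendor 3-T MRI machine": 44,
    "used in DLA development": 100,
    "DLA did not run (scanning error)": 1,
    "PSA > 40 ng/mL": 7,
}

excluded = sum(exclusions.values())
enrolled = eligible - excluded

print(f"excluded = {excluded}, enrolled = {enrolled}")
```

Under this assumption the six exclusion counts sum to 802, which leaves the 121 enrolled patients reported in the text.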

      2.2 MR imaging techniques

      The MRI examinations in this study consisted of 58 biparametric MRIs (bpMRIs) and 63 multiparametric MRIs (mpMRIs) using a pelvic phased-array coil. The following sequences were acquired with the following parameters: axial, sagittal, and coronal T2-weighted images (T2WIs), repetition time [TR] > 3,200 ms; echo time [TE], 80–100 ms; matrix, 320 × 320; slice thickness, 3 mm; field of view [FOV], 200–220 mm; axial diffusion-weighted images (DWIs) with b-values of 0, 50, 500, and 1,000 sec/mm², slice thickness, 3 mm; matrix, 100 × 100; FOV, 200–220 mm. DWI with a b-value of 1,500 sec/mm² was obtained in some patients, but apparent diffusion-coefficient (ADC) maps were calculated from DWI with b-values of 50 and 1,000 sec/mm² to maintain consistency in the imaging protocol.
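The text does not state which signal model was used to compute the ADC maps. As an illustration only, the sketch below assumes the common mono-exponential model S(b) = S0 · exp(−b · ADC), under which two b-values give ADC = ln(S_low / S_high) / (b_high − b_low). The function name and the synthetic voxel values are hypothetical.

```python
import math

def adc_two_point(s_low, s_high, b_low=50.0, b_high=1000.0):
    """Two-point ADC estimate in mm^2/s, assuming mono-exponential decay.

    s_low, s_high: signal intensities at b_low and b_high (in s/mm^2).
    """
    if s_low <= 0 or s_high <= 0:
        raise ValueError("signal intensities must be positive")
    return math.log(s_low / s_high) / (b_high - b_low)

# Synthetic voxel with a known ADC of 1.0e-3 mm^2/s (hypothetical values).
true_adc = 1.0e-3
s0 = 1000.0
s50 = s0 * math.exp(-50.0 * true_adc)
s1000 = s0 * math.exp(-1000.0 * true_adc)

estimated_adc = adc_two_point(s50, s1000)
print(f"estimated ADC = {estimated_adc:.6f} mm^2/s")
```

Using b = 50 rather than b = 0 as the lower b-value is often chosen to reduce the perfusion contribution to the estimate; whether that was the motivation here is not stated in the text.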

      2.3 DLA for prostate MRI

      A non-commercially available, deep learning-based prototype software (Prostate AI version 1.2.1, build date 2019-11-27, Siemens Healthcare) was used. The algorithm was trained and validated with 2,170 bpMRIs from seven institutions, including ours [Yu et al., False Positive Reduction Using Multiscale Contextual Features for Prostate Cancer Detection in Multi-Parametric MRI Scans, 2020 IEEE 17th International Symposium on Biomedical Imaging (ISBI), 2020, pp. 1355-1359]. Our institution provided 100 cases for the development of the DLA, and these cases were excluded from the current study. The DLA was designed to evaluate bpMRI using axial T2WI and DWI with high b-values to detect suspicious prostatic lesions and categorize the lesions according to PI-RADS v2. Only T2WI and DWI were loaded into the DLA, even in patients who underwent mpMRI. The DLA displays abnormal areas using a suspicion map on T2-weighted images and shows the segmented abnormal lesion. It also automatically produces a draft radiologic report containing text information about the PI-RADS classification, size, and location of the lesions, ordered by higher PI-RADS score and larger size, for the user to review and edit. The DLA results provide PI-RADS scores and a snapshot of each of up to five lesions in each patient. Detailed information on the DLA is provided in the Supplementary material.

      2.4 Prostate MRI review by radiologists

      Two reader groups of radiology residents (Reader group 1, composed of four second-year residents, and Reader group 2, composed of four third-year residents) and three board-certified radiologists (Reader 3, 4.5 years of experience; Reader 4, 5 years of experience; and Reader 5, 10 years of experience in prostate imaging) who were blinded to the biopsy results independently reviewed bpMRI (three planes of T2WIs, axial DWIs [b = 0, 1,000 sec/mm²], and an ADC map). Given that the radiology residents were not familiar with prostate MRI, the four residents in each group, who had similar levels of experience in prostate MRI, split the cases. The combined reviews of the four residents in each group were treated as a single reader review (Reader groups 1 and 2). Reader 5 reviewed the MRIs more than four months before matching the other readers’ MRI reviews to the pathology results. All readers recorded the number, location, and size of suspicious prostate lesions and the PI-RADS version 2 score. For later comparison with the reference standard, all readers captured every detected prostate lesion (PI-RADS scores from 2 to 5) on axial images with indicators on the picture archiving and communication system. The index lesion was determined by the highest PI-RADS score or by the largest diameter if multiple lesions with the same PI-RADS score existed.
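The index-lesion rule described above (highest PI-RADS score, ties broken by largest diameter) can be sketched as follows. The `Lesion` class, its field names, and the example lesions are hypothetical and only illustrate the selection logic.

```python
from dataclasses import dataclass

@dataclass
class Lesion:
    location: str        # free-text location, e.g. "right PZ, base"
    pirads: int          # PI-RADS v2 score
    diameter_mm: float   # largest diameter in millimetres

def index_lesion(lesions):
    """Return the lesion with the highest PI-RADS score;
    among equal scores, the one with the largest diameter."""
    if not lesions:
        return None
    return max(lesions, key=lambda les: (les.pirads, les.diameter_mm))

# Hypothetical reading with three recorded lesions.
reading = [
    Lesion("left TZ, mid", 3, 14.0),
    Lesion("right PZ, base", 4, 9.0),
    Lesion("left PZ, apex", 4, 12.5),
]
chosen = index_lesion(reading)
print(chosen.location, chosen.pirads, chosen.diameter_mm)
```

Sorting on the tuple `(pirads, diameter_mm)` applies the diameter only as a tie-breaker, so a large PI-RADS 3 lesion never outranks a smaller PI-RADS 4 lesion.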

      2.5 Clinical report as a routine interpretation

      In the routine clinical process at our institution, six board-certified abdominal/genitourinary radiologists with at least six years of experience in prostate MRI reviewed prostate MRI and made radiological reports. More than 1,000 prostate MRIs are performed each year at our institution. The number, size, location, PI-RADS score, and snapshot of each of up to five lesions in each patient report were collected. In some reports made before the release of PI-RADS v2, the conclusion did not follow the guidelines. These reports were interpreted according to PI-RADS v2 by a radiologist who was not aware of the biopsy results: an indeterminate conclusion was interpreted as PI-RADS 3, and a definite conclusion of prostate cancer was interpreted as PI-RADS 4 or 5 according to tumor size and extracapsular extension.

      2.6 Reference standard for prostate cancer

      Two reference standards were used: all prostate cancers, regardless of Gleason grade group, and CSC, defined as Gleason grade group ≥ 2 (Gleason score ≥ 7 [3 + 4]). Reader 5 reviewed the pathologic results and the MRI results from the DLA and all other readers at least 4 months after Reader 5’s own image review and determined whether the lesions were cancers based on the pathology. For patients who underwent systematic and targeted biopsy, the reader assessed whether prostate cancer was confirmed in the targeted lesion. If prostate cancer was confirmed in systematic cores other than the targeted cores, the reader matched the biopsy results to the focal lesion on prostate MRI. In patients who underwent prostatectomy, the schematic diagram of the histopathology map that depicted the cancer area, instead of whole mount pathology, was the primary reference for prostate cancer location. When the Gleason score differed between biopsy and prostatectomy pathology in a patient, the prostatectomy result was used for evaluation.

      2.7 Statistical analysis

      The PI-RADS assessments from the DLA, clinical reports and each radiologist were compared with those of Reader 5 (the most experienced radiologist) using weighted Kappa statistics to analyze inter-reader agreement.
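For readers unfamiliar with the statistic, the following is a minimal sketch of a weighted kappa for two sets of ordinal PI-RADS scores. The text does not state whether linear or quadratic weights were used, so the weighting scheme is an assumption exposed as a parameter; the authors used SPSS and MedCalc, not this code, and the example ratings are hypothetical.

```python
def weighted_kappa(rater_a, rater_b, categories=(1, 2, 3, 4, 5), weights="linear"):
    """Cohen's weighted kappa for two raters on an ordinal scale.

    weights: "linear" or "quadratic" disagreement weights (an assumption;
    the study does not report which scheme was applied).
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    n = len(rater_a)

    # Observed joint proportions.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[index[a]][index[b]] += 1.0 / n

    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    power = 1 if weights == "linear" else 2
    disagreement_obs = 0.0
    disagreement_exp = 0.0
    for i in range(k):
        for j in range(k):
            w = (abs(i - j) / (k - 1)) ** power  # 0 on the diagonal, 1 at the extremes
            disagreement_obs += w * observed[i][j]
            disagreement_exp += w * row[i] * col[j]

    if disagreement_exp == 0.0:
        return 1.0  # both raters used one identical category throughout
    return 1.0 - disagreement_obs / disagreement_exp

# Hypothetical PI-RADS scores for six patients.
reader = [1, 2, 3, 4, 5, 4]
expert = [1, 2, 4, 4, 5, 3]
kappa = weighted_kappa(reader, expert)
print(f"weighted kappa = {kappa:.3f}")
```

Weighting matters for PI-RADS because a 4-versus-5 disagreement is clinically less important than a 1-versus-5 disagreement, which an unweighted kappa would treat identically.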
      The diagnostic performance of per-patient PI-RADS scores for all readings was analyzed using the area under the receiver operating characteristic (ROC) curve (AUROC) in diagnosing all prostate cancers or CSCs. The ROC curves were compared using DeLong’s method. We also performed dichotomous analysis of the diagnostic performance using either PI-RADS ≥ 3 or ≥ 4 as the cutoff value. Given that the decision to perform prostate biopsy is itself dichotomous, we considered that dichotomous analysis would make the findings easier to interpret. To calculate the sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and accuracy in detecting prostate cancer, we defined true or false positives/negatives for an index lesion at the per-patient level. For example, a true positive means that a reader (DLA, clinical report, or radiologist) correctly detected prostate cancer at the same location and categorized the lesion with at least the PI-RADS score cutoff value. Sensitivities and specificities of the clinical report and radiologists were compared to those of the DLA using McNemar’s test.
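The dichotomous measures defined above follow directly from the per-patient confusion counts. The sketch below recomputes them for the DLA at a PI-RADS cutoff value ≥ 4 with CSC as the reference standard, using the counts reported in Table 3 (33/43, 67/78, 33/44, 67/77). The McNemar function is an exact binomial version and is an assumption, since the text does not say whether an exact or chi-square form was used; the discordant-pair counts in its example are hypothetical because paired counts are not reported.

```python
from math import comb

def diagnostic_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, PPV, NPV and accuracy as proportions."""
    total = tp + fn + tn + fp
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / total,
    }

def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value.

    b, c: numbers of discordant pairs (correct by one reading only).
    """
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)

# DLA, PI-RADS >= 4, reference standard CSC (counts from Table 3).
dla = diagnostic_metrics(tp=33, fn=10, tn=67, fp=11)
dla_percent = {name: round(100 * value, 1) for name, value in dla.items()}
print(dla_percent)

# Hypothetical discordant pairs between two readings of the same patients.
p_value = mcnemar_exact(b=2, c=8)
print(f"exact McNemar p = {p_value:.4f}")
```

McNemar's test is appropriate here because every reading was made on the same 121 patients, so sensitivities and specificities are paired rather than independent proportions.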
      Statistical analysis was performed using SPSS software, version 23.0 (IBM, Armonk, NY, USA) and MedCalc version 19.2.0 (MedCalc Software, Mariakerke, Belgium). A p-value of < 0.05 was considered statistically significant. To compare diagnostic performance between the DLA and each of the clinical reports and the five radiologists, p-values were multiplied by 6 according to the Bonferroni correction.
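The Bonferroni step described above amounts to multiplying each raw p-value by the number of comparisons (six: clinical reports plus five readers, each against the DLA) and capping the result at 1. A minimal sketch follows; the raw p-values are hypothetical, and the display thresholds ("<0.001", ">0.99") are inferred from how corrected values appear in Tables 2 and 3.

```python
def bonferroni(p_raw, n_comparisons=6):
    """Bonferroni-corrected p-value, capped at 1.0."""
    return min(1.0, p_raw * n_comparisons)

def format_p(p):
    """Format a p-value the way corrected values appear in the tables
    (thresholds inferred from the tables, not stated in the text)."""
    if p < 0.001:
        return "<0.001"
    if p > 0.99:
        return ">0.99"
    return f"{p:.3f}"

# Hypothetical raw p-values.
for p_raw in (0.00001, 0.006, 0.5):
    print(p_raw, "->", format_p(bonferroni(p_raw)))
```

With this correction a raw p-value must be below about 0.0083 (0.05 / 6) for the corrected value to stay under the 0.05 significance threshold.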

      3. Results

      3.1 Patient characteristics

      Among the 121 patients (mean age 68.2 ± 8.5 years, range 47–85 years), 52 (43.0%) were diagnosed with prostate cancer, and 43 (35.5%) were confirmed to have CSC. The median prostate-specific antigen (PSA) level was 6.5 ng/mL (interquartile range 4.5–10.4 ng/mL). Twenty-three patients underwent radical prostatectomy. The demographic, clinical, and pathologic information of the patients is summarized in Table 1.
      Table 1. Clinical characteristics of the 121 patients.

      | Characteristics | Total (n = 121) |
      |---|---|
      | Mean age ± SD (range) (years) | 68.2 ± 8.5 (47–85) |
      | Median PSA (interquartile range) (ng/mL) | 6.5 (4.5–10.4) |
      | Use of 5α-reductase inhibitor | 4 (3.3%) |
      | Number of previous TRUS-guided biopsies before MRI | |
      |   None | 87 (71.9%) |
      |   One time | 29 (24.0%) |
      |   Two times | 3 (2.5%) |
      |   Three times | 1 (0.8%) |
      |   Four times | 1 (0.8%) |
      | Median time interval between MRI and biopsy (interquartile range) (days) | 17 (9–26) |
      | Median time interval between MRI and prostatectomy (interquartile range) (days) | 44 (31–67) |
      | Pathologically proven prostate cancers | 52 (43.0%) |
      | Pathologically proven CSCs | 43 (35.5%) |
      | Maximum Gleason score (percentage of the 52 cancers) | |
      |   6 (3 + 3) | 9 (17.3%) |
      |   7 (3 + 4) | 22 (42.3%) |
      |   7 (4 + 3) | 12 (23.1%) |
      |   8 (4 + 4) | 6 (11.5%) |
      |   9 (4 + 5) | 3 (5.8%) |

      CSC, clinically significant prostate cancer; PSA, prostate-specific antigen; TRUS, transrectal ultrasonography.

      3.2 PI-RADS assessment and inter-reader agreement for DLA, clinical reports, and radiologists

      The proportions of PI-RADS categories determined by the DLA, clinical reports, and radiologists varied. The detailed distribution of the PI-RADS scores by DLA, clinical reports, and Readers 1–5 is shown in Fig. 2. DLA assigned 60.3%, 0.8%, 9.9% and 27.3% of cases as PI-RADS 1, 3, 4, and 5, respectively. Inter-reader agreement for PI-RADS score was moderate (κ, 0.461) between DLA and Reader 5 and varied from poor to good agreement between the other readers and Reader 5 (κ, 0.340, 0.457, 0.467, 0.609, for Readers 1–4, respectively). Inter-reader agreement for the PI-RADS score between clinical reports and Reader 5 was moderate (κ, 0.422).
      Fig. 2. Proportion (%) of PI-RADS scores in DLA, clinical reports and Readers 1–5. The distribution of PI-RADS scores determined by DLA, clinical reports and Readers 1–5 was variable.

      3.3 Comparison of diagnostic performance using AUROC

      For diagnosing all prostate cancers, the AUROC of the DLA was 0.808, which was significantly higher than those of Reader 1 (AUROC, 0.698; p = 0.031) and clinical reports (AUROC, 0.687; p = 0.015) and was similar to those of Readers 2–5 (AUROC, 0.786, 0.729, 0.862 and 0.874; p = 0.623, 0.101, 0.174 and 0.098, respectively) (Fig. 3a). In the diagnosis of CSCs, the AUROC of DLA was 0.828, with no significant difference from that of Readers 2–4 and clinical reports (AUROC, 0.811, 0.754, 0.882, and 0.730; p = 0.661, 0.110, 0.122, and 0.060, respectively). The performance of the DLA was superior to that of Reader group 1 (AUROC, 0.706; p = 0.011) but inferior to that of Reader 5 (AUROC, 0.914; p = 0.013) (Fig. 3b).
      Fig. 3. Receiver operating characteristic (ROC) curves of the DLA, clinical reports and Readers 1–5. (a) In the diagnosis of all prostate cancers, the performance of the DLA is better than that of clinical reports (p = 0.015), and there are no significant differences in performance between the DLA and Readers 2–5. (b) In the diagnosis of clinically significant cancers, only Reader 5 shows better performance than the DLA.
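Because the per-patient PI-RADS score is an ordinal value from 1 to 5, an AUROC such as those reported above can be understood as the probability that a randomly chosen cancer patient receives a higher score than a randomly chosen non-cancer patient, with ties counted as one half. The sketch below computes this Mann-Whitney form of the AUROC; it does not implement DeLong's comparison of two curves, and the scores and labels are hypothetical.

```python
def auroc(scores, labels):
    """AUROC via the Mann-Whitney statistic; ties count as 0.5.

    scores: per-patient PI-RADS scores (higher = more suspicious).
    labels: 1 for pathologically proven cancer, 0 otherwise.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Hypothetical per-patient scores and reference labels.
scores = [5, 4, 3, 1, 2, 1]
labels = [1, 1, 0, 0, 1, 0]
auc = auroc(scores, labels)
print(f"AUROC = {auc:.3f}")
```

With only five possible score values, ties between cancer and non-cancer patients are frequent, which is why the half-credit for ties has a visible effect on the result.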

      3.4 Dichotomous analysis

      The sensitivities and specificities of the DLA, clinical reports, and radiologists in the diagnosis of prostate cancer (all prostate cancers or CSCs) varied widely (Table 2, Table 3). For both all prostate cancers and CSCs, no significant difference in sensitivity was noted between the DLA results and any of the readers or clinical reports for PI-RADS cutoff value of either ≥ 3 or 4 except for Reader 5 at a PI-RADS cutoff value ≥ 3. The DLA showed significantly higher specificity in diagnosing all prostate cancers and CSCs relative to any of the radiologists and clinical reports for a PI-RADS cutoff value ≥ 3. The sensitivities and specificities of the DLA and Reader 5 did not significantly differ when using a PI-RADS cutoff value ≥ 4 for diagnosing all prostate cancers and CSCs. The PPV also varied among the radiologists and clinical reports. The DLA showed a better PPV than all radiologists and clinical reports. The accuracy of the DLA was higher than those of Readers 1–3 and clinical reports, and comparable to those of Readers 4 and 5. Fig. 4, Fig. 5, and Supplementary Fig. 2, Fig. 3 show representative examples of cancer detection by the DLA and radiologists.
      Table 2. Dichotomous analysis of DLA, clinical reports and Readers 1–5; reference standard based on presence of any pathologically proven prostate cancer.

      | Reader | PI-RADS cutoff | Sensitivity, % | Corrected P value* | Specificity, % | Corrected P value* | Accuracy, % | PPV, % | NPV, % |
      |---|---|---|---|---|---|---|---|---|
      | DLA | ≥ 3 | 73.1 (38/52) | Reference | 87.0 (60/69) | Reference | 81.0 (98/121) | 80.9 (38/47) | 81.1 (60/74) |
      | DLA | ≥ 4 | 69.2 (36/52) | Reference | 88.4 (61/69) | Reference | 80.2 (97/121) | 81.8 (36/44) | 79.2 (61/77) |
      | Reader group 1 | ≥ 3 | 69.2 (36/52) | >0.99 | 40.6 (28/69) | <0.001 | 52.9 (64/121) | 46.8 (36/77) | 63.6 (28/44) |
      | Reader group 1 | ≥ 4 | 57.7 (30/52) | 0.654 | 68.1 (47/69) | 0.042 | 63.6 (77/121) | 57.7 (30/52) | 68.1 (47/69) |
      | Reader group 2 | ≥ 3 | 76.9 (40/52) | >0.99 | 49.3 (34/69) | <0.001 | 61.2 (74/121) | 53.3 (40/75) | 73.9 (34/46) |
      | Reader group 2 | ≥ 4 | 71.2 (37/52) | >0.99 | 65.2 (45/69) | 0.030 | 67.8 (82/121) | 60.7 (37/61) | 78.3 (45/60) |
      | Reader 3 | ≥ 3 | 82.7 (43/52) | >0.99 | 29.0 (20/69) | <0.001 | 52.1 (63/121) | 46.7 (43/92) | 69.0 (20/29) |
      | Reader 3 | ≥ 4 | 80.8 (42/52) | 0.876 | 50.7 (35/69) | <0.001 | 63.6 (77/121) | 55.3 (42/76) | 77.8 (35/45) |
      | Reader 4 | ≥ 3 | 90.4 (47/52) | 0.072 | 60.9 (42/69) | <0.001 | 73.6 (89/121) | 63.5 (47/74) | 89.4 (42/47) |
      | Reader 4 | ≥ 4 | 86.5 (45/52) | 0.072 | 79.7 (55/69) | >0.99 | 82.6 (100/121) | 76.3 (45/59) | 88.7 (55/62) |
      | Reader 5 | ≥ 3 | 92.3 (48/52) | 0.036 | 58.0 (40/69) | <0.001 | 72.7 (88/121) | 62.3 (48/77) | 90.9 (40/44) |
      | Reader 5 | ≥ 4 | 84.6 (44/52) | 0.234 | 81.2 (56/69) | >0.99 | 82.6 (100/121) | 77.2 (44/57) | 87.5 (56/64) |
      | Clinical reports | ≥ 3 | 84.6 (44/52) | >0.99 | 23.2 (16/69) | <0.001 | 49.6 (60/121) | 44.4 (44/99) | 72.7 (16/22) |
      | Clinical reports | ≥ 4 | 78.8 (41/52) | >0.99 | 36.2 (25/69) | <0.001 | 54.5 (66/121) | 47.1 (41/87) | 73.5 (25/34) |

      DLA, deep learning-based algorithm; PI-RADS, prostate imaging-reporting and data system; PPV, positive predictive value; NPV, negative predictive value.
      * Bonferroni corrected p-value; p-values were multiplied by 6.
      Table 3. Dichotomous analysis of DLA, clinical reports and Readers 1–5; reference standard based on presence of pathologically proven clinically significant prostate cancer.

      | Reader | PI-RADS cutoff | Sensitivity, % | Corrected P value* | Specificity, % | Corrected P value* | Accuracy, % | PPV, % | NPV, % |
      |---|---|---|---|---|---|---|---|---|
      | DLA | ≥ 3 | 81.4 (35/43) | Reference | 84.6 (66/78) | Reference | 83.5 (101/121) | 74.5 (35/47) | 89.2 (66/74) |
      | DLA | ≥ 4 | 76.7 (33/43) | Reference | 85.9 (67/78) | Reference | 82.6 (100/121) | 75.0 (33/44) | 87.0 (67/77) |
      | Reader group 1 | ≥ 3 | 76.7 (33/43) | >0.99 | 43.6 (34/78) | <0.001 | 55.4 (67/121) | 42.9 (33/77) | 77.3 (34/44) |
      | Reader group 1 | ≥ 4 | 62.8 (27/43) | 0.420 | 67.9 (53/78) | 0.066 | 66.1 (80/121) | 51.9 (27/52) | 76.8 (53/69) |
      | Reader group 2 | ≥ 3 | 83.7 (36/43) | >0.99 | 50.0 (39/78) | <0.001 | 62.0 (75/121) | 48.0 (36/75) | 84.8 (39/46) |
      | Reader group 2 | ≥ 4 | 79.1 (34/43) | >0.99 | 65.4 (50/77) | 0.036 | 71.2 (84/118) | 55.7 (34/61) | 87.7 (50/57) |
      | Reader 3 | ≥ 3 | 88.4 (38/43) | >0.99 | 30.8 (24/78) | <0.001 | 51.2 (62/121) | 41.3 (38/92) | 82.8 (24/29) |
      | Reader 3 | ≥ 4 | 86.0 (37/43) | >0.99 | 50.0 (39/78) | <0.001 | 62.8 (76/121) | 48.7 (37/76) | 86.7 (39/45) |
      | Reader 4 | ≥ 3 | 95.3 (41/43) | 0.420 | 57.7 (45/78) | <0.001 | 71.1 (86/121) | 55.4 (41/74) | 95.7 (45/47) |
      | Reader 4 | ≥ 4 | 90.7 (39/43) | 0.420 | 74.4 (58/78) | 0.468 | 80.2 (97/121) | 66.1 (39/59) | 92.5 (58/62) |
      | Reader 5 | ≥ 3 | 100.0 (43/43) | 0.048 | 56.4 (44/78) | <0.001 | 71.9 (87/121) | 55.8 (43/77) | 100.0 (44/44) |
      | Reader 5 | ≥ 4 | 93.0 (40/43) | 0.234 | 78.2 (61/78) | >0.99 | 83.5 (101/121) | 70.2 (40/57) | 95.3 (61/64) |
      | Clinical reports | ≥ 3 | 88.4 (38/43) | >0.99 | 24.4 (19/78) | <0.001 | 47.1 (57/121) | 38.4 (38/99) | 86.4 (19/22) |
      | Clinical reports | ≥ 4 | 83.7 (36/43) | >0.99 | 37.2 (29/78) | <0.001 | 53.7 (65/121) | 41.4 (36/87) | 85.3 (29/34) |

      DLA, deep learning-based algorithm; PI-RADS, prostate imaging-reporting and data system; PPV, positive predictive value; NPV, negative predictive value.
      * Bonferroni corrected p-value; p-values were multiplied by 6.
      Fig. 4. True positive lesion detected by the readers and the DLA. MRI of the prostate gland was performed in a 71-year-old male with elevated PSA (19.9 ng/mL). Axial T2-weighted image shows an ill-defined mass with low signal intensity (area with yellow dotted line) in both peripheral zones at the prostate base (a). The mass shows high signal intensity on diffusion-weighted image (b = 1,000 sec/mm²) (b) and low value on ADC map (c). DLA detected the same lesion and assigned PI-RADS category 5. DLA shows the abnormal area by using a suspicion map and presents the lesion as a pink area on T2-weighted image (d). Readers 2 and 3 missed this lesion and Readers 1, 4 and 5 detected the lesion. In the clinical report, the PI-RADS score was 1. Clinically significant cancer (Gleason score 7 [4 + 3]) was confirmed by biopsy. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
      Fig. 5. False negative result of DLA. MRI of the prostate gland was performed in an 80-year-old male with elevated PSA (15.0 ng/mL). Axial T2-weighted image shows a focal homogeneous lesion with low signal intensity (area with white dotted line) in the anterior aspect of the transition zone (a). The lesion shows high signal intensity on diffusion-weighted image (b = 1,000 sec/mm²) (b) and low value on ADC map (c). DLA could not detect the lesion, but all readers detected the same lesion as PI-RADS ≥ 4. In the clinical report, the lesion was also classified as PI-RADS 4. Clinically significant cancer (Gleason score 7 [3 + 4]) was confirmed by biopsy.

      4. Discussion

      The PI-RADS assignments and performance in diagnosing prostate cancer varied depending on radiologists’ experience in this study. The DLA showed moderate diagnostic performance on a level between that of residents and an expert for detecting and classifying PI-RADS. The diagnostic performance of the residents and less-experienced radiologists was not significantly better than that of the DLA. Moreover, the performance of DLA was also similar to that of clinical reports for diagnosing CSCs. Only the expert had significantly superior diagnostic performance to the DLA based on ROC curve analysis.
      The most important strength of the DLA in this study was its higher specificity than those of the radiologists and clinical reports while maintaining sensitivity. For both all prostate cancers and CSCs, the DLA showed significantly higher specificity than all readers for a PI-RADS cutoff value ≥ 3; the same was observed for a PI-RADS cutoff value ≥ 4 but without statistical significance. High specificity was a characteristic of the DLA in this study, not a general characteristic of other algorithms; the specificities of U-Net ranged from 24% to 55% in a previous study [
      • Schelb P.
      • Kohl S.
      • Radtke J.P.
      • Wiesenfarth M.
      • Kickingereder P.
      • Bickelhaupt S.
      • Kuder T.A.
      • Stenzinger A.
      • Hohenfellner M.
      • Schlemmer H.P.
      • Maier-Hein K.H.
      • Bonekamp D.
      Classification of Cancer at Prostate MRI: Deep Learning versus Clinical PI-RADS Assessment.
      ,
      • Schelb P.
      • Wang X.
      • Radtke J.P.
      • Wiesenfarth M.
      • Kickingereder P.
      • Stenzinger A.
      • Hohenfellner M.
      • Schlemmer H.P.
      • Maier-Hein K.H.
      • Bonekamp D.
      Simulated clinical deployment of fully automatic deep learning for clinical prostate MRI assessment.
      ]. Using a novel false positive reduction network in the pipeline, DLA succeeded in increasing specificity. Moreover, the DLA showed less reduction in specificity than the radiologists when the cutoff value was reduced from 4 to 3. This result may be caused by the difference in proportion of PI-RADS 3 scores between the DLA and radiologists; the proportion of PI-RADS 3 scores from the DLA was only 0.8%, but that for all readers and clinical reports ranged from 9.9% to 24.0%. Given the ambiguous meaning of PI-RADS 3, reducing the number of PI-RADS 3 scores may help reduce the number of unnecessary biopsies in clinical practice [
      • de Rooij M.
      • Israel B.
      • Tummers M.
      • Ahmed H.U.
      • Barrett T.
      • Giganti F.
      • Hamm B.
      • Logager V.
      • Padhani A.
      • Panebianco V.
      • Puech P.
      • Richenberg J.
      • Rouviere O.
      • Salomon G.
      • Schoots I.
      • Veltman J.
      • Villeirs G.
      • Walz J.
      • Barentsz J.O.
      ESUR/ESUI consensus statements on multi-parametric MRI for the detection of clinically significant prostate cancer: quality requirements for image acquisition, interpretation and radiologists' training.
      ]. The sensitivity and specificity of the DLA (PI-RADS cutoff value ≥ 4) were 76.7% and 85.9%, respectively, which were comparable to those of the expert and those obtained in a previous meta-analysis, which showed 79% pooled sensitivity and 88% pooled specificity for bpMRI [
      • Kang Z.
      • Min X.
      • Weinreb J.
      • Li Q.
      • Feng Z.
      • Wang L.
      Abbreviated Biparametric Versus Standard Multiparametric MRI for Diagnosis of Prostate Cancer: A Systematic Review and Meta-Analysis.
      ]. No significant difference in sensitivity was noted between DLA and clinical reports for a PI-RADS cutoff value of either ≥ 3 or 4, and the AUROC of DLA (0.828) was similar to that of clinical reports (AUROC, 0.730; p = 0.060).
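      The dichotomous analysis discussed above can be illustrated with a short sketch. The Python function below computes sensitivity and specificity when a PI-RADS score at or above a cutoff is treated as a positive call. The function name and the patient-level scores are hypothetical. The counts (33 of 43 CSC patients called positive, 67 of 78 non-CSC patients called negative) are back-calculated from the reported 76.7% and 85.9% and from the 43 of 121 patients with CSC; they are an assumption, not data taken from the study.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    scores: Sequence[int], has_csc: Sequence[bool], cutoff: int
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) when PI-RADS >= cutoff counts as positive."""
    if len(scores) != len(has_csc):
        raise ValueError("scores and has_csc must have the same length")
    tp = fn = tn = fp = 0
    for score, truth in zip(scores, has_csc):
        called_positive = score >= cutoff
        if truth:
            if called_positive:
                tp += 1
            else:
                fn += 1
        else:
            if called_positive:
                fp += 1
            else:
                tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one diseased and one non-diseased case")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical cohort: counts are back-calculated from the reported
# percentages (assumption), and the individual scores are invented.
scores = [4] * 33 + [1] * 10 + [1] * 67 + [4] * 11
has_csc = [True] * 43 + [False] * 78

sens, spec = sensitivity_specificity(scores, has_csc, cutoff=4)
print(f"sensitivity={sens:.1%}, specificity={spec:.1%}")
# prints "sensitivity=76.7%, specificity=85.9%"
```

      Applying the same function to each reader's scores at cutoffs of 3 and 4 would give the paired values that a test such as McNemar's compares.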
      With the DLA, PI-RADS 1 was assigned in 60.3% of all cases, and PI-RADS 2 was never assigned. In the DLA pipeline, a localization network first detects abnormal lesions by separating PI-RADS 1 and 2 from PI-RADS ≥ 3, and the detected lesions are then assigned PI-RADS scores from 3 to 5. Therefore, the distribution of PI-RADS scores 1 and 2 from the DLA differed from that of the radiologists. Despite this difference, the inter-reader agreement between the DLA and the expert was moderate and was not inferior to that of the other radiologists, except for Reader 4. In addition, discrimination between PI-RADS scores 1 and 2 is of little importance for diagnosing prostate cancer in clinical practice.
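      A minimal sketch of this two-stage logic shows why PI-RADS 2 never appears in the DLA output. It is not the DLA implementation: the function name is hypothetical, and taking the highest lesion grade as the patient-level category is an assumption made only for illustration.

```python
from typing import Sequence


def patient_category(lesion_grades: Sequence[int]) -> int:
    """Hypothetical patient-level PI-RADS category from detected lesions.

    Stage 1 (detection) yields zero or more lesions; stage 2 grades each
    detected lesion as 3, 4, or 5. A patient with no detected lesion is
    reported as 1, so category 2 can never be produced.
    """
    for grade in lesion_grades:
        if grade not in (3, 4, 5):
            raise ValueError("detected lesions are graded 3, 4, or 5 only")
    # Assumption: the patient-level category is the highest lesion grade.
    return max(lesion_grades, default=1)


print(patient_category([]))      # no lesion detected
print(patient_category([3, 5]))  # two lesions, highest grade wins
```

      Under these assumptions the only possible outputs are 1, 3, 4, and 5, which matches the score distribution described above.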
      Deep learning-based AI algorithms have shown valuable performance in differentiating prostate cancer from normal tissues [
      • Schelb P.
      • Kohl S.
      • Radtke J.P.
      • Wiesenfarth M.
      • Kickingereder P.
      • Bickelhaupt S.
      • Kuder T.A.
      • Stenzinger A.
      • Hohenfellner M.
      • Schlemmer H.P.
      • Maier-Hein K.H.
      • Bonekamp D.
      Classification of Cancer at Prostate MRI: Deep Learning versus Clinical PI-RADS Assessment.
      ,
      • Yoo S.
      • Gujrathi I.
      • Haider M.A.
      • Khalvati F.
      Prostate Cancer Detection using Deep Convolutional Neural Networks.
      ,
      • Song Y.
      • Zhang Y.D.
      • Yan X.
      • Liu H.
      • Zhou M.
      • Hu B.
      • Yang G.
      Computer-aided diagnosis of prostate cancer using a deep convolutional neural network from multiparametric MRI.
      ,
      • Sumathipala Y.
      • Lay N.
      • Turkbey B.
      • Smith C.
      • Choyke P.L.
      • Summers R.M.
      Prostate cancer detection from multi-institution multiparametric MRIs using deep convolutional neural networks.
      ,
      • Ishioka J.
      • Matsuoka Y.
      • Uehara S.
      • Yasuda Y.
      • Kijima T.
      • Yoshida S.
      • Yokoyama M.
      • Saito K.
      • Kihara K.
      • Numao N.
      • Kimura T.
      • Kudo K.
      • Kumazawa I.
      • Fujii Y.
      Computer-aided diagnosis of prostate cancer on magnetic resonance imaging using a convolutional neural network algorithm.
      ]. Few studies have shown good performance of the machine probability score to detect and classify a lesion compared to radiologist PI-RADS classification [
      • Schelb P.
      • Kohl S.
      • Radtke J.P.
      • Wiesenfarth M.
      • Kickingereder P.
      • Bickelhaupt S.
      • Kuder T.A.
      • Stenzinger A.
      • Hohenfellner M.
      • Schlemmer H.P.
      • Maier-Hein K.H.
      • Bonekamp D.
      Classification of Cancer at Prostate MRI: Deep Learning versus Clinical PI-RADS Assessment.
      ,
      • Schelb P.
      • Wang X.
      • Radtke J.P.
      • Wiesenfarth M.
      • Kickingereder P.
      • Stenzinger A.
      • Hohenfellner M.
      • Schlemmer H.P.
      • Maier-Hein K.H.
      • Bonekamp D.
      Simulated clinical deployment of fully automatic deep learning for clinical prostate MRI assessment.
      ]. In previous studies, algorithm performance was compared with clinical routine interpretations from 8 or 9 radiologists, but the radiologists’ reviews were not analyzed individually. In our study, more than five radiologists with various levels of experience independently reviewed prostate MRIs. As such, we could compare individual diagnostic performance between the DLA and radiologists with various levels of experience. We also analyzed clinical reports from routine interpretation, which allowed us to compare the DLA with a mixture of board-certified radiologists.
      The inter-reader agreements of PI-RADS scores between the radiologists and the expert varied according to experience. The DLA and clinical report showed moderate agreement with the expert, and the DLA resulted in slightly higher inter-reader agreement than the clinical reports. Previous studies indicated that radiologist inter-reader agreement for PI-RADS category assignment varied from poor to good [
      • Greer M.D.
      • Shih J.H.
      • Lay N.
      • Barrett T.
      • Bittencourt L.
      • Borofsky S.
      • Kabakus I.
      • Law Y.M.
      • Marko J.
      • Shebel H.
      • Merino M.J.
      • Wood B.J.
      • Pinto P.A.
      • Summers R.M.
      • Choyke P.L.
      • Turkbey B.
      Interreader Variability of Prostate Imaging Reporting and Data System Version 2 in Detecting and Assessing Prostate Cancer Lesions at Prostate MRI.
      ,
      • Muller B.G.
      • Shih J.H.
      • Sankineni S.
      • Marko J.
      • Rais-Bahrami S.
      • George A.K.
      • de la Rosette J.J.
      • Merino M.J.
      • Wood B.J.
      • Pinto P.
      • Choyke P.L.
      • Turkbey B.
      Prostate Cancer: Interobserver Agreement and Accuracy with the Revised Prostate Imaging Reporting and Data System at Multiparametric MR Imaging.
      ,
      • Choi M.H.
      • Kim C.K.
      • Lee Y.J.
      • Jung S.E.
      Prebiopsy Biparametric MRI for Clinically Significant Prostate Cancer Detection With PI-RADS Version 2: A Multicenter Study.
      ,
      • Smith C.P.
      • Harmon S.A.
      • Barrett T.
      • Bittencourt L.K.
      • Law Y.M.
      • Shebel H.
      • An J.Y.
      • Czarniecki M.
      • Mehralivand S.
      • Coskun M.
      • Wood B.J.
      • Pinto P.A.
      • Shih J.H.
      • Choyke P.L.
      • Turkbey B.
      Intra- and interreader reproducibility of PI-RADSv2: A multireader study.
      ,
      • Rosenkrantz A.B.
      • Ginocchio L.A.
      • Cornfeld D.
      • Froemming A.T.
      • Gupta R.T.
      • Turkbey B.
      • Westphalen A.C.
      • Babb J.S.
      • Margolis D.J.
      Interobserver Reproducibility of the PI-RADS Version 2 Lexicon: A Multicenter Study of Six Experienced Prostate Radiologists.
      ], with more experienced radiologists showing greater inter-reader agreement. Therefore, the moderate agreement between DLA and the expert radiologist seems promising and DLA-based PI-RADS categorization may help reduce inter-reader variability in clinical practice.
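      Agreement on an ordinal scale such as PI-RADS is commonly summarized with a weighted kappa. The sketch below assumes a linearly weighted Cohen's kappa; this section does not state which agreement statistic was used, so the choice of statistic, the function name, and the example ratings are all assumptions.

```python
from collections import Counter
from typing import Sequence


def linear_weighted_kappa(
    rater_a: Sequence[int],
    rater_b: Sequence[int],
    categories: Sequence[int] = (1, 2, 3, 4, 5),
) -> float:
    """Linearly weighted Cohen's kappa for two raters on an ordinal scale."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must score the same non-empty case list")
    n = len(rater_a)
    k = len(categories)
    index = {category: i for i, category in enumerate(categories)}
    pairs = Counter((index[a], index[b]) for a, b in zip(rater_a, rater_b))
    marginal_a = Counter(index[a] for a in rater_a)
    marginal_b = Counter(index[b] for b in rater_b)

    observed = 0.0  # weighted observed disagreement
    expected = 0.0  # weighted disagreement expected by chance
    for i in range(k):
        for j in range(k):
            weight = abs(i - j) / (k - 1)
            observed += weight * pairs[(i, j)] / n
            expected += weight * (marginal_a[i] / n) * (marginal_b[j] / n)
    if expected == 0.0:
        # Both raters used one identical category for every case.
        return 1.0
    return 1.0 - observed / expected


# Hypothetical ratings for six cases (not study data).
expert = [1, 1, 3, 4, 5, 5]
algorithm = [1, 1, 4, 4, 5, 3]
print(round(linear_weighted_kappa(expert, algorithm), 3))
```

      A value of 1 indicates perfect agreement and 0 indicates agreement no better than chance; the verbal labels (poor, moderate, good) depend on the convention a study adopts.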
      In a previous study, the positive predictive value of PI-RADS varied widely across 26 centers with members in the Society of Abdominal Radiology Prostate Cancer Disease-focused Panel [
      • Westphalen A.C.
      • McCulloch C.E.
      • Anaokar J.M.
      • Arora S.
      • Barashi N.S.
      • Barentsz J.O.
      • Bathala T.K.
      • Bittencourt L.K.
      • Booker M.T.
      • Braxton V.G.
      • Carroll P.R.
      • Casalino D.D.
      • Chang S.D.
      • Coakley F.V.
      • Dhatt R.
      • Eberhardt S.C.
      • Foster B.R.
      • Froemming A.T.
      • Fütterer J.J.
      • Ganeshan D.M.
      • Gertner M.R.
      • Gettle L.M.
      • Ghai S.
      • Gupta R.T.
      • Hahn M.E.
      • Houshyar R.
      • Kim C.
      • Kim C.K.
      • Lall C.
      • Margolis D.J.A.
      • McRae S.E.
      • Oto A.
      • Parsons R.B.
      • Patel N.U.
      • Pinto P.A.
      • Polascik T.J.
      • Spilseth B.
      • Starcevich J.B.
      • Tammisetti V.S.
      • Taneja S.S.
      • Turkbey B.
      • Verma S.
      • Ward J.F.
      • Warlick C.A.
      • Weinberger A.R.
      • Yu J.
      • Zagoria R.J.
      • Rosenkrantz A.B.
      ]. Even radiologists with a high level of experience in prostate MRI showed variable results. Therefore, greater dedication to training and developing a quality assurance program is necessary in prostate MRI [
      • Westphalen A.C.
      • McCulloch C.E.
      • Anaokar J.M.
      • Arora S.
      • Barashi N.S.
      • Barentsz J.O.
      • Bathala T.K.
      • Bittencourt L.K.
      • Booker M.T.
      • Braxton V.G.
      • Carroll P.R.
      • Casalino D.D.
      • Chang S.D.
      • Coakley F.V.
      • Dhatt R.
      • Eberhardt S.C.
      • Foster B.R.
      • Froemming A.T.
      • Fütterer J.J.
      • Ganeshan D.M.
      • Gertner M.R.
      • Gettle L.M.
      • Ghai S.
      • Gupta R.T.
      • Hahn M.E.
      • Houshyar R.
      • Kim C.
      • Kim C.K.
      • Lall C.
      • Margolis D.J.A.
      • McRae S.E.
      • Oto A.
      • Parsons R.B.
      • Patel N.U.
      • Pinto P.A.
      • Polascik T.J.
      • Spilseth B.
      • Starcevich J.B.
      • Tammisetti V.S.
      • Taneja S.S.
      • Turkbey B.
      • Verma S.
      • Ward J.F.
      • Warlick C.A.
      • Weinberger A.R.
      • Yu J.
      • Zagoria R.J.
      • Rosenkrantz A.B.
      ]. In our institution, abdominal/genitourinary radiologists interpreted prostate MRI as part of routine clinical practice. From this point of view, even though the performance of the DLA was not superior to that of the expert’s retrospective review, it makes sense that the performance of the DLA was similar to that of clinical reports made by multiple board-certified radiologists, not all of whom were experts in prostate imaging. In addition, the DLA showed sensitivity similar to, and specificity higher than, those of radiologists with various levels of experience in retrospective review. From these results, we believe that the DLA can assist radiologists as a second reader to reduce variability in PI-RADS assessment. In a previous study, a consensus of a deep convolutional neural network (DCNN) and the radiologist’s PI-RADS score showed better diagnostic performance than either the DCNN or the PI-RADS score alone [
      • Song Y.
      • Zhang Y.D.
      • Yan X.
      • Liu H.
      • Zhou M.
      • Hu B.
      • Yang G.
      Computer-aided diagnosis of prostate cancer using a deep convolutional neural network from multiparametric MRI.
      ]. Further research on the value of DLA as a decision support tool is needed.

      5. Limitations

      Our study has several limitations. First, both the study population and the number of readers were small. The DLA is prototype software that, during the study period, could analyze only images from single-vendor (Siemens Healthcare) 3 T MRI, so only 121 MRI examinations were enrolled. Given the relatively small number of readers, the comparison with DLA performance may not generalize. Second, selection bias may exist because patients who underwent biopsy are not representative of all patients with clinical suspicion of prostate cancer in the real world. However, the percentage of PI-RADS 1 or 2 scores assigned by Reader 5 in the study population was 36.0%, which was similar to the 33% reported in a previous prospective study [
      • Padhani A.R.
      • Barentsz J.
      • Villeirs G.
      • Rosenkrantz A.B.
      • Margolis D.J.
      • Turkbey B.
      • Thoeny H.C.
      • Cornud F.
      • Haider M.A.
      • Macura K.J.
      • Tempany C.M.
      • Verma S.
      • Weinreb J.C.
      PI-RADS Steering Committee: The PI-RADS Multiparametric MRI and MRI-directed Biopsy Pathway.
      ]. This study included patients from the period in which the usefulness of prebiopsy MRI was still being determined. Therefore, many patients underwent prostate biopsy even with negative prebiopsy MRI results. Third, the reference standard was mainly based on biopsy results. We tried to minimize this limitation by performing targeted biopsy and considering all systematic biopsy results. In addition, including only radical prostatectomy patients would have induced bias by excluding many patients who underwent active surveillance, systemic therapy, or focal therapy. Fourth, Reader groups 1 and 2 each consisted of four residents with similar experience in prostate imaging. Although this analysis cannot assess each resident’s diagnostic performance, we believe that the results reflect the overall diagnostic performance of residents with similar levels of experience. Fifth, our results did not strictly follow PI-RADS v2 because we used bpMRI rather than mpMRI. However, many previous studies have shown comparable diagnostic performance between bpMRI and mpMRI [
      • Kang Z.
      • Min X.
      • Weinreb J.
      • Li Q.
      • Feng Z.
      • Wang L.
      Abbreviated Biparametric Versus Standard Multiparametric MRI for Diagnosis of Prostate Cancer: A Systematic Review and Meta-Analysis.
      ,
      • Choi M.H.
      • Kim C.K.
      • Lee Y.J.
      • Jung S.E.
      Prebiopsy Biparametric MRI for Clinically Significant Prostate Cancer Detection With PI-RADS Version 2: A Multicenter Study.
      ,
      • Woo S.
      • Suh C.H.
      • Kim S.Y.
      • Cho J.Y.
      • Kim S.H.
      • Moon M.H.
      Head-to-Head Comparison Between Biparametric and Multiparametric MRI for the Diagnosis of Prostate Cancer: A Systematic Review and Meta-Analysis.
      ,
      • Junker D.
      • Steinkohl F.
      • Fritz V.
      • Bektic J.
      • Tokas T.
      • Aigner F.
      • Herrmann T.R.W.
      • Rieger M.
      • Nagele U.
      Comparison of multiparametric and biparametric MRI of the prostate: are gadolinium-based contrast agents needed for routine examinations?.
      ,
      • Niu X.K.
      • Chen X.H.
      • Chen Z.F.
      • Chen L.
      • Li J.
      • Peng T.
      Diagnostic Performance of Biparametric MRI for Detection of Prostate Cancer: A Systematic Review and Meta-Analysis.
      ,
      • Alabousi M.
      • Salameh J.P.
      • Gusenbauer K.
      • Samoilov L.
      • Jafri A.
      • Yu H.
      • Alabousi A.
      Biparametric vs multiparametric prostate magnetic resonance imaging for the detection of prostate cancer in treatment-naive patients: a diagnostic test accuracy systematic review and meta-analysis.
      ]. Sixth, evaluating MRIs acquired at one of the institutions that had provided cases (100/2,170 cases) for developing the DLA could overestimate the performance of the DLA in this study. Seventh and last, PI-RADS v2 recommends high b-values (≥ 1400 s/mm²). In our study, DWI with a highest b-value of 1000 s/mm² was used for review by the radiologists and for analysis by the DLA, which could reduce overall diagnostic performance for both the readers and the DLA. This was because the study included patients who underwent prostate MRI before the release of PI-RADS v2, when high b-values ≥ 1400 s/mm² were not routinely part of the scan protocol.

      6. Conclusion

      This study provides the first comparison between a DLA and radiologists with various levels of experience in PI-RADS classification. The DLA showed moderate diagnostic performance, at a level between that of the residents and that of the expert, for detecting lesions and classifying them according to PI-RADS. The performance of the DLA was similar to that of clinical reports from various radiologists in clinical practice.
      Funding
      This work was supported by the National Research Foundation of Korea (NRF) under Grant (2018R1D1A1B07050160).

      CRediT authorship contribution statement

      Seo Yeon Youn: Methodology, Formal analysis, Writing - review & editing. Moon Hyung Choi: Conceptualization, Methodology, Writing - review & editing, Supervision. Dong Hwan Kim: Investigation, Formal analysis. Young Joon Lee: Investigation, Methodology. Henkjan Huisman: Investigation. Evan Johnson: Investigation. Tobias Penzkofer: Investigation. Ivan Shabunin: Investigation. David Jean Winkel: Investigation. Pengyi Xing: Investigation. Dieter Szolar: Investigation. Robert Grimm: Data curation, Software. Heinrich von Busch: Software. Yohan Son: Software, Resources. Bin Lou: Software. Ali Kamen: Software.

      Declaration of Competing Interest

      Robert Grimm, Heinrich von Busch, Yohan Son, Bin Lou, and Ali Kamen are employees of Siemens Healthineers or Siemens Healthcare. The other authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

      References

        • Kasivisvanathan V.
        • Emberton M.
        • Moore C.M.
        MRI-Targeted Biopsy for Prostate-Cancer Diagnosis.
        N. Engl. J. Med. 2018; 379: 589-590
        • Ahmed H.U.
        • El-Shater Bosaily A.
        • Brown L.C.
        • Gabe R.
        • Kaplan R.
        • Parmar M.K.
        • Collaco-Moraes Y.
        • Ward K.
        • Hindley R.G.
        • Freeman A.
        • Kirkham A.P.
        • Oldroyd R.
        • Parker C.
        • Emberton M.
        Diagnostic accuracy of multi-parametric MRI and TRUS biopsy in prostate cancer (PROMIS): a paired validating confirmatory study.
        The Lancet. 2017; 389: 815-822
        • Drost F.J.H.
        • Osses D.
        • Nieboer D.
        • Bangma C.H.
        • Steyerberg E.W.
        • Roobol M.J.
        • Schoots I.G.
        Prostate Magnetic Resonance Imaging, with or Without Magnetic Resonance Imaging-targeted Biopsy, and Systematic Biopsy for Detecting Prostate Cancer: A Cochrane Systematic Review and Meta-analysis.
        Eur. Urol. 2020; 77: 78-94
        • Weinreb J.C.
        • Barentsz J.O.
        • Choyke P.L.
        • Cornud F.
        • Haider M.A.
        • Macura K.J.
        • Margolis D.
        • Schnall M.D.
        • Shtern F.
        • Tempany C.M.
        • Thoeny H.C.
        • Verma S.
        PI-RADS Prostate Imaging - Reporting and Data System: 2015, Version 2.
        Eur. Urol. 2016; 69: 16-40
        • Sonn G.A.
        • Fan R.E.
        • Ghanouni P.
        • Wang N.N.
        • Brooks J.D.
        • Loening A.M.
        • Daniel B.L.
        • To'o K.J.
        • Thong A.E.
        • Leppert J.T.
        Prostate Magnetic Resonance Imaging Interpretation Varies Substantially Across Radiologists.
        Eur. Urol. Focus. 2019; 5: 592-599
        • Padhani A.R.
        • Turkbey B.
        Detecting Prostate Cancer with Deep Learning for MRI: A Small Step Forward.
        Radiology. 2019; 293: 618-619
        • Schelb P.
        • Kohl S.
        • Radtke J.P.
        • Wiesenfarth M.
        • Kickingereder P.
        • Bickelhaupt S.
        • Kuder T.A.
        • Stenzinger A.
        • Hohenfellner M.
        • Schlemmer H.P.
        • Maier-Hein K.H.
        • Bonekamp D.
        Classification of Cancer at Prostate MRI: Deep Learning versus Clinical PI-RADS Assessment.
        Radiology. 2019; 293: 607-617
        • Yoo S.
        • Gujrathi I.
        • Haider M.A.
        • Khalvati F.
        Prostate Cancer Detection using Deep Convolutional Neural Networks.
        Sci. Rep. 2019; 9: 19518
        • Song Y.
        • Zhang Y.D.
        • Yan X.
        • Liu H.
        • Zhou M.
        • Hu B.
        • Yang G.
        Computer-aided diagnosis of prostate cancer using a deep convolutional neural network from multiparametric MRI.
        J. Magn. Reson. Imaging. 2018; 48: 1570-1577
        • Sumathipala Y.
        • Lay N.
        • Turkbey B.
        • Smith C.
        • Choyke P.L.
        • Summers R.M.
        Prostate cancer detection from multi-institution multiparametric MRIs using deep convolutional neural networks.
        J. Med. Imaging (Bellingham). 2018; 5: 044507
        • Ishioka J.
        • Matsuoka Y.
        • Uehara S.
        • Yasuda Y.
        • Kijima T.
        • Yoshida S.
        • Yokoyama M.
        • Saito K.
        • Kihara K.
        • Numao N.
        • Kimura T.
        • Kudo K.
        • Kumazawa I.
        • Fujii Y.
        Computer-aided diagnosis of prostate cancer on magnetic resonance imaging using a convolutional neural network algorithm.
        BJU Int. 2018; 122: 411-417
        • Sanford T.
        • Harmon S.A.
        • Turkbey E.B.
        • Kesani D.
        • Tuncer S.
        • Madariaga M.
        • Yang C.
        • Sackett J.
        • Mehralivand S.
        • Yan P.
        • Xu S.
        • Wood B.J.
        • Merino M.J.
        • Pinto P.A.
        • Choyke P.L.
        • Turkbey B.
        Deep-Learning-Based Artificial Intelligence for PI-RADS Classification to Assist Multiparametric Prostate MRI Interpretation: A Development Study.
        J. Magn. Reson. Imaging. 2020;
        • Schelb P.
        • Wang X.
        • Radtke J.P.
        • Wiesenfarth M.
        • Kickingereder P.
        • Stenzinger A.
        • Hohenfellner M.
        • Schlemmer H.P.
        • Maier-Hein K.H.
        • Bonekamp D.
        Simulated clinical deployment of fully automatic deep learning for clinical prostate MRI assessment.
        Eur. Radiol. 2020;
        • Yu X.
        • Lou B.
        • Shi B.
        • Winkel D.
        • Arrahmane N.
        • Diallo M.
        • Meng T.
        • von Busch H.
        • Grimm R.
        • Kiefer B.
        • Comaniciu D.
        • Kamen A.
        • Huisman H.
        • Rosenkrantz A.
        • Penzkofer T.
        • Shabunin I.
        • Choi M.H.
        • Yang Q.
        • Szolar D.
        False Positive Reduction Using Multiscale Contextual Features for Prostate Cancer Detection in Multi-Parametric MRI Scans.
        2020 IEEE 17th International Symposium on Biomedical Imaging (ISBI). 2020: 1355-1359
        • de Rooij M.
        • Israel B.
        • Tummers M.
        • Ahmed H.U.
        • Barrett T.
        • Giganti F.
        • Hamm B.
        • Logager V.
        • Padhani A.
        • Panebianco V.
        • Puech P.
        • Richenberg J.
        • Rouviere O.
        • Salomon G.
        • Schoots I.
        • Veltman J.
        • Villeirs G.
        • Walz J.
        • Barentsz J.O.
        ESUR/ESUI consensus statements on multi-parametric MRI for the detection of clinically significant prostate cancer: quality requirements for image acquisition, interpretation and radiologists' training.
        Eur. Radiol. 2020;
        • Kang Z.
        • Min X.
        • Weinreb J.
        • Li Q.
        • Feng Z.
        • Wang L.
        Abbreviated Biparametric Versus Standard Multiparametric MRI for Diagnosis of Prostate Cancer: A Systematic Review and Meta-Analysis.
        AJR Am. J. Roentgenol. 2019; 212: 357-365
        • Greer M.D.
        • Shih J.H.
        • Lay N.
        • Barrett T.
        • Bittencourt L.
        • Borofsky S.
        • Kabakus I.
        • Law Y.M.
        • Marko J.
        • Shebel H.
        • Merino M.J.
        • Wood B.J.
        • Pinto P.A.
        • Summers R.M.
        • Choyke P.L.
        • Turkbey B.
        Interreader Variability of Prostate Imaging Reporting and Data System Version 2 in Detecting and Assessing Prostate Cancer Lesions at Prostate MRI.
        AJR Am. J. Roentgenol. 2019; : 1-8
        • Muller B.G.
        • Shih J.H.
        • Sankineni S.
        • Marko J.
        • Rais-Bahrami S.
        • George A.K.
        • de la Rosette J.J.
        • Merino M.J.
        • Wood B.J.
        • Pinto P.
        • Choyke P.L.
        • Turkbey B.
        Prostate Cancer: Interobserver Agreement and Accuracy with the Revised Prostate Imaging Reporting and Data System at Multiparametric MR Imaging.
        Radiology. 2015; 277: 741-750
        • Choi M.H.
        • Kim C.K.
        • Lee Y.J.
        • Jung S.E.
        Prebiopsy Biparametric MRI for Clinically Significant Prostate Cancer Detection With PI-RADS Version 2: A Multicenter Study.
        AJR Am. J. Roentgenol. 2019; 212: 839-846
        • Smith C.P.
        • Harmon S.A.
        • Barrett T.
        • Bittencourt L.K.
        • Law Y.M.
        • Shebel H.
        • An J.Y.
        • Czarniecki M.
        • Mehralivand S.
        • Coskun M.
        • Wood B.J.
        • Pinto P.A.
        • Shih J.H.
        • Choyke P.L.
        • Turkbey B.
        Intra- and interreader reproducibility of PI-RADSv2: A multireader study.
        J. Magn. Reson. Imaging. 2019; 49: 1694-1703
        • Rosenkrantz A.B.
        • Ginocchio L.A.
        • Cornfeld D.
        • Froemming A.T.
        • Gupta R.T.
        • Turkbey B.
        • Westphalen A.C.
        • Babb J.S.
        • Margolis D.J.
        Interobserver Reproducibility of the PI-RADS Version 2 Lexicon: A Multicenter Study of Six Experienced Prostate Radiologists.
        Radiology. 2016; 280: 793-804
        • Westphalen A.C.
        • McCulloch C.E.
        • Anaokar J.M.
        • Arora S.
        • Barashi N.S.
        • Barentsz J.O.
        • Bathala T.K.
        • Bittencourt L.K.
        • Booker M.T.
        • Braxton V.G.
        • Carroll P.R.
        • Casalino D.D.
        • Chang S.D.
        • Coakley F.V.
        • Dhatt R.
        • Eberhardt S.C.
        • Foster B.R.
        • Froemming A.T.
        • Fütterer J.J.
        • Ganeshan D.M.
        • Gertner M.R.
        • Gettle L.M.
        • Ghai S.
        • Gupta R.T.
        • Hahn M.E.
        • Houshyar R.
        • Kim C.
        • Kim C.K.
        • Lall C.
        • Margolis D.J.A.
        • McRae S.E.
        • Oto A.
        • Parsons R.B.
        • Patel N.U.
        • Pinto P.A.
        • Polascik T.J.
        • Spilseth B.
        • Starcevich J.B.
        • Tammisetti V.S.
        • Taneja S.S.
        • Turkbey B.
        • Verma S.
        • Ward J.F.
        • Warlick C.A.
        • Weinberger A.R.
        • Yu J.
        • Zagoria R.J.
        • Rosenkrantz A.B.
        Variability of the Positive Predictive Value of PI-RADS for Prostate MRI across 26 Centers: Experience of the Society of Abdominal Radiology Prostate Cancer Disease-focused Panel.
        Radiology. 2020; 296: 76-84
        • Padhani A.R.
        • Barentsz J.
        • Villeirs G.
        • Rosenkrantz A.B.
        • Margolis D.J.
        • Turkbey B.
        • Thoeny H.C.
        • Cornud F.
        • Haider M.A.
        • Macura K.J.
        • Tempany C.M.
        • Verma S.
        • Weinreb J.C.
        PI-RADS Steering Committee: The PI-RADS Multiparametric MRI and MRI-directed Biopsy Pathway.
        Radiology. 2019; 292: 464-474
        • Woo S.
        • Suh C.H.
        • Kim S.Y.
        • Cho J.Y.
        • Kim S.H.
        • Moon M.H.
        Head-to-Head Comparison Between Biparametric and Multiparametric MRI for the Diagnosis of Prostate Cancer: A Systematic Review and Meta-Analysis.
        AJR Am. J. Roentgenol. 2018; 211: W226-W241
        • Junker D.
        • Steinkohl F.
        • Fritz V.
        • Bektic J.
        • Tokas T.
        • Aigner F.
        • Herrmann T.R.W.
        • Rieger M.
        • Nagele U.
        Comparison of multiparametric and biparametric MRI of the prostate: are gadolinium-based contrast agents needed for routine examinations?.
        World J. Urol. 2019; 37: 691-699
        • Niu X.K.
        • Chen X.H.
        • Chen Z.F.
        • Chen L.
        • Li J.
        • Peng T.
        Diagnostic Performance of Biparametric MRI for Detection of Prostate Cancer: A Systematic Review and Meta-Analysis.
        AJR Am. J. Roentgenol. 2018; 211: 369-378
        • Alabousi M.
        • Salameh J.P.
        • Gusenbauer K.
        • Samoilov L.
        • Jafri A.
        • Yu H.
        • Alabousi A.
        Biparametric vs multiparametric prostate magnetic resonance imaging for the detection of prostate cancer in treatment-naive patients: a diagnostic test accuracy systematic review and meta-analysis.
        BJU Int. 2019; 124: 209-220