Repeatability of diffusion-weighted MRI of the prostate using whole lesion ADC values, skew and histogram analysis

PURPOSE
To investigate the repeatability of diffusion-weighted imaging parameters, including ADC-derived histogram values, in prostate cancer.


METHODS
Ten patients with prostate cancer were prospectively recruited to a retest cohort. 3 T diffusion-weighted MRI of the prostate was acquired twice consecutively, with the patient getting off the scanner between studies. Tumour regions-of-interest defined by prostatectomy histopathology were outlined on ADC maps, and diffusion-weighted metrics, including histogram parameters, were calculated. The coefficient of repeatability (CoR) and Bland-Altman plots were used to assess repeatability.


RESULTS
10th centile, 90th centile, and median ADC showed good repeatability, with mean differences ranging from -0.005 to -0.025 × 10⁻³ mm² s⁻¹ and CoR ranging from 0.271 to 0.294 × 10⁻³ mm² s⁻¹ (… of scan 1 mean). Two measures of heterogeneity and simplified texture, IQR and mean local range, had only moderate repeatability. IQR had a mean difference of -0.032 × 10⁻³ mm² s⁻¹ between scans, with CoR 0.181 × 10⁻³ mm² s⁻¹ (56% of scan 1 mean). Mean local range had a mean difference of -0.008 × 10⁻³ mm² s⁻¹ between scans, with CoR 0.112 × 10⁻³ mm² s⁻¹ (37% of scan 1 mean). Bland-Altman plots showed good test-retest repeatability for median, percentile and mean local range values. All ADC values had good reliability regardless of whether the tumour border was included in quantitative analysis. ADC histogram skew, a dimensionless quantity, had poor repeatability, with CoR 0.78 (373% of scan 1 mean).


CONCLUSION
10th and 90th centile ADC demonstrated sufficient repeatability for clinical use. However, more advanced measures of heterogeneity such as histogram skew, IQR, or mean local range may be limited by their repeatability.


Introduction
Prostate cancer is the commonest cancer in men, with the incidence expected to double by 2030 mainly due to the ageing population [1,2]. The traditional work-up of prostate cancer with transrectal ultrasound (TRUS) biopsy is limited by random and systematic errors in sampling [3,4]. However, this practice is beginning to change, driven by multiparametric (mp) MRI which offers the potential to overcome many of these disadvantages [5,6].
The improved ability of mpMRI to detect lesions has mainly been due to the addition of functional sequences such as diffusion-weighted imaging (DWI) and dynamic contrast-enhanced (DCE) MRI. DWI images the diffusion of water molecules and provides information related to tumour cellularity and tissue composition. DWI-derived apparent diffusion coefficient (ADC) maps can provide a quantitative measure of the degree of restricted diffusion, with a number of studies showing that these values correlate inversely with Gleason grade [7][8][9]. Whilst potentially attractive as a surrogate marker of tumour aggressiveness, absolute ADC values can vary depending on the choice and number of b-values selected, and current guidelines therefore caution against the use of quantitative ADC measurements [10][11][12]. Another potential source of error is the reproducibility of the test itself, which may cause particular problems when assessing response to treatment and when determining meaningful change in patients on active surveillance.
ADC values have shown reasonable test-retest reproducibility in the body, with a variation of around 20% [13][14][15]. However, these evaluations have focused primarily on the mean or median ADC values. Similar studies in the prostate have assessed the reproducibility of mean ADC values within the same imaging session [16] or within 2 weeks [17], with variation ranging from 10-40%. ADC histogram-derived values have shown promising results in detecting and characterizing disease and in evaluating treatment response [18][19][20][21]. Rather than averaging out differences within a region-of-interest, histogram-derived values characterise the distribution of ADC values, and may therefore provide additional information on tumour heterogeneity, which may be increased within tumours. Indeed, the current version of the Prostate Imaging Reporting and Data System (PI-RADS 2) guidelines strongly supports the continued development of novel MRI sequences and analysis methods, such as multiple b-value assessment of fractional ADC and measures of cellular-level heterogeneity such as diffusion kurtosis imaging [10,22]. However, before either quantitative ADC metrics or histogram analysis can be used clinically, their repeatability and reliability need to be evaluated and quantified. Therefore, the purpose of this study was to evaluate, in the setting of prostate cancer, the reliability of ADC histogram-derived parameters and simplified textural analysis for selected regions of interest.

Methods and materials
The local institutional review board and ethics committee granted approval for this prospective study, and all participants gave written informed consent. Eleven patients with biopsy-proven intermediate- or high-risk prostate cancer underwent a dedicated research prostate MRI scan prior to treatment with radical prostatectomy. Exclusion criteria were previous treatment for prostate cancer or clinical contraindication to MRI. One patient was excluded due to significant susceptibility artefact on DW imaging, leaving 10 patients who completed the study.

MR imaging technique
All patients underwent 3 T MRI (Signa HDx, GE Healthcare, WI, USA) using an 8-channel cardiac phased-array coil. The protocol included axial T1-weighted images of the pelvis and T2-weighted images in the axial, sagittal and coronal planes. Multislice diffusion-weighted (DW) MRI was performed using a dual-spin-echo (DSE) EPI acquisition sequence to minimise eddy-current-related distortion (TE/TR = 78/4400 ms; FOV 30 × 30 cm²; acquisition matrix 128 × 128, reconstruction matrix 256 × 256; parallel imaging (ASSET) factor 2; 8 signal averages; b-values 150 and 1000 s/mm²). Slice thickness for the axial T2-weighted sequences was 3.5 mm with a 0.5 mm gap; the axial DW imaging was matched using a 4 mm slice thickness with no gap. Water-selective excitation was used for fat saturation. An identical DWI acquisition was performed twice, with the patient getting off the scanner bed between the scans. Slice locations were matched to the initial scan using anatomical landmarks. ADC maps were generated from the b = 150 and 1000 s/mm² images using in-house software programmed in Matlab (MathWorks, Natick, Mass).
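With two b-values, an ADC map reduces to a voxel-wise log-ratio. The following is a minimal Python/NumPy sketch, assuming a mono-exponential signal model S(b) = S0 · exp(-b · ADC); the in-house Matlab software is not described in detail, so the function name `adc_two_point` and the small-signal floor `eps` are illustrative assumptions rather than the study's implementation.

```python
import numpy as np


def adc_two_point(s_low, s_high, b_low=150.0, b_high=1000.0, eps=1e-6):
    """Voxel-wise ADC (mm^2/s) from two diffusion-weighted images.

    Assumes a mono-exponential model S(b) = S0 * exp(-b * ADC), so that
    ADC = ln(S_low / S_high) / (b_high - b_low); S0 cancels out.
    """
    s_low = np.asarray(s_low, dtype=float)
    s_high = np.asarray(s_high, dtype=float)
    # Floor the signals at a small positive value so the log is defined
    # in background or noise-only voxels (an assumption, not from the study).
    ratio = np.maximum(s_low, eps) / np.maximum(s_high, eps)
    return np.log(ratio) / (b_high - b_low)


if __name__ == "__main__":
    # Synthetic voxel with a known ADC of 1.2 x 10^-3 mm^2/s
    s0, true_adc = 1000.0, 1.2e-3
    s150 = s0 * np.exp(-150.0 * true_adc)
    s1000 = s0 * np.exp(-1000.0 * true_adc)
    print(adc_two_point(s150, s1000))  # approximately 0.0012
```

The same function works unchanged on whole 2-D or 3-D image arrays, because every operation is element-wise.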

Pathologic assessment and comparison with MR images
Following surgery, each ex-vivo prostate was measured in three dimensions and oriented by the location of the seminal vesicles, the flat posterior surface, and the position of the urethra. The apical end and basal cone were amputated, sliced left-right into 3-4 mm thick pieces, and placed in small cassettes preserving their order. The remaining gland was sliced into 5 mm sections in the horizontal plane from inferior to superior. Slices from each section were annotated by an experienced specialist uro-pathologist (XXAYW).

Image analysis
Regions of interest (ROIs) were drawn on the ADC maps by two authors (EML, TB) in consensus within the same session for each patient, with reference to T2-weighted MRI and whole-mount histopathology annotated for index lesion location (Fig. 1). ROIs drawn on the first acquisition were loaded onto the matching ADC maps of the second acquisition; the authors then reviewed all transferred ROIs and manually adjusted their locations when necessary. ADC values were recorded from each voxel within the ROI, and the values from all relevant slices were combined into a volume of interest. An ADC histogram was generated using in-house software programmed in Matlab (Fig. 2), and the following parameters were derived: (a) median ADC; (b) 10th and 90th centiles (the ADC values below which 10% and 90% of all voxel values lie, respectively); (c) interquartile range (IQR; a measure of the spread of the distribution); and (d) skewness (a measure of the asymmetry of the ADC distribution).
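The histogram parameters listed above can be sketched in a few lines. This is a minimal Python/NumPy sketch, not the study's Matlab code: it assumes NumPy's default linear-interpolation percentiles and the moment-based (biased) skewness estimator, and the function name `adc_histogram_metrics` and the synthetic volume are hypothetical.

```python
import numpy as np


def adc_histogram_metrics(voi_values):
    """Histogram parameters for the ADC values pooled from a volume of interest."""
    v = np.asarray(voi_values, dtype=float).ravel()
    # np.percentile uses linear interpolation by default; Matlab's prctile
    # may interpolate differently, so small numerical differences are expected.
    p10, q1, median, q3, p90 = np.percentile(v, [10, 25, 50, 75, 90])
    centred = v - v.mean()
    m2 = np.mean(centred ** 2)  # second central moment
    m3 = np.mean(centred ** 3)  # third central moment
    # Moment-based (biased) skewness; defined as 0 for a constant VOI
    skew = m3 / m2 ** 1.5 if m2 > 0 else 0.0
    return {
        "p10": p10,
        "median": median,
        "p90": p90,
        "iqr": q3 - q1,
        "skew": skew,
    }


if __name__ == "__main__":
    # Hypothetical 3-slice ADC volume and ROI mask; indexing with a boolean
    # mask pools the ROI voxels from all slices into one volume of interest.
    rng = np.random.default_rng(0)
    adc_volume = rng.normal(1.0e-3, 0.2e-3, size=(3, 64, 64))
    roi_mask = np.zeros(adc_volume.shape, dtype=bool)
    roi_mask[:, 20:30, 20:30] = True
    print(adc_histogram_metrics(adc_volume[roi_mask]))
```

Pooling all slices before computing the centiles mirrors the "volume of interest" step; computing the parameters per slice and averaging would generally give different values.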

ROI erosion
ROI erosion was performed to assess whether partial-volume effects affect the measured results; this may be particularly relevant when transposing an identical ROI between separately acquired studies. The morphological operation of erosion shrinks an ROI by removing voxels on its boundary. The original ROI is eroded by a 3 × 3 circular structuring element, which removes the border voxels whose neighbourhood does not lie entirely within the ROI and yields a smaller ROI (Supplemental Fig. 1). For the histogram-derived quantitative parameters, an additional set of central ROIs was obtained in Matlab by using this automated erosion tool to remove the border voxels.
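Binary erosion can be written directly from its definition: a voxel survives only if every voxel under the structuring element also lies in the ROI. The sketch below is a minimal NumPy version under stated assumptions: the "3 × 3 circular" element is taken to be the plus-shaped radius-1 disk (a full 3 × 3 square is the obvious alternative), erosion is applied slice by slice, and the names `DISK_3X3`, `erode_2d` and `erode_roi` are hypothetical.

```python
import numpy as np

# Hypothetical 3 x 3 "circular" element: a radius-1 disk on a square grid is
# the plus-shaped (4-connected) neighbourhood. Use np.ones((3, 3), bool)
# for a full square element; the exact element used in the study is assumed.
DISK_3X3 = np.array([[0, 1, 0],
                     [1, 1, 1],
                     [0, 1, 0]], dtype=bool)


def erode_2d(mask, structure=DISK_3X3):
    """Binary erosion of a 2-D ROI mask (one axial slice)."""
    mask = np.asarray(mask, dtype=bool)
    structure = np.asarray(structure, dtype=bool)
    ph, pw = structure.shape[0] // 2, structure.shape[1] // 2
    # Pad with False so voxels on the image edge count as outside the ROI
    padded = np.pad(mask, ((ph, ph), (pw, pw)), mode="constant",
                    constant_values=False)
    h, w = mask.shape
    out = np.ones_like(mask)
    for dy in range(structure.shape[0]):
        for dx in range(structure.shape[1]):
            if structure[dy, dx]:
                # Keep a voxel only if this neighbour is also inside the ROI
                out &= padded[dy:dy + h, dx:dx + w]
    return out


def erode_roi(mask3d, structure=DISK_3X3):
    """Apply the 2-D erosion slice by slice to a (slices, rows, cols) mask."""
    return np.stack([erode_2d(s, structure)
                     for s in np.asarray(mask3d, dtype=bool)])
```

For example, eroding a 5 × 5 square ROI with either element leaves its central 3 × 3 block, removing the one-voxel border described in the text.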

Local neighbourhood ROI filtering
Local range filtering, a texture filtering function available in Matlab, was used to assess local variability (Supplemental Fig. 2) and to produce the "mean range", a simplified texture feature. Two parameters are set: the neighbourhood to be considered and the filter to be used. Each pixel is filtered using a 3 × 3 neighbourhood; the range filter calculates the difference between the maximum and minimum local values, and the standard deviation filter calculates the standard deviation of the local neighbourhood. After filtering, the ROI is eroded using the method described above so that values influenced by voxels outside the ROI are excluded from subsequent analysis. The entire MR image was filtered, and the mean filtered value within each ROI was then determined.
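The "mean range" feature can be sketched as a 3 × 3 max-minus-min filter over the whole slice, followed by averaging inside the eroded ROI. This minimal NumPy sketch assumes edge-replicated padding at the image border and takes an already-eroded mask as input (for example from an erosion step like the one described above); `local_range_3x3` and `mean_local_range` are hypothetical names, not the Matlab function used in the study.

```python
import numpy as np


def local_range_3x3(image):
    """Max minus min over each pixel's 3 x 3 neighbourhood (one 2-D slice)."""
    img = np.asarray(image, dtype=float)
    h, w = img.shape
    # Edge-replicated padding at the image border (an assumption); border
    # pixels are discarded later by restricting to the eroded ROI anyway.
    padded = np.pad(img, 1, mode="edge")
    windows = np.stack([padded[dy:dy + h, dx:dx + w]
                        for dy in range(3) for dx in range(3)])
    return windows.max(axis=0) - windows.min(axis=0)


def mean_local_range(adc, core_mask):
    """Mean local range over the voxels of an eroded (central) ROI.

    adc and core_mask are either 2-D (rows, cols) or 3-D (slices, rows, cols);
    each slice is filtered in-plane and the core voxels are pooled.
    """
    adc = np.asarray(adc, dtype=float)
    core = np.asarray(core_mask, dtype=bool)
    if adc.ndim == 2:
        adc, core = adc[np.newaxis], core[np.newaxis]
    # Filter the whole slice first, then read values only inside the core ROI
    ranges = np.stack([local_range_3x3(s) for s in adc])
    return float(ranges[core].mean())
```

Unlike IQR, this feature depends on where values sit relative to each other: shuffling the voxel positions inside the ROI leaves the histogram unchanged but alters the mean local range.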

Statistical analysis
To evaluate the short-term repeatability of the quantitative DWI parameters, the difference (d) between the two baseline measurements of each parameter in the tumour region was determined for each patient, along with the mean difference for the study cohort.
The mean squared difference (msd) was calculated as:

$$\mathrm{msd} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} d_i^{2}}$$

where d_i is the difference between the two scans for patient i and n is the number of patients. The 95% confidence intervals (CI) for changes in the study cohort were determined using:

$$\mathrm{CI} = \pm\, 2.228 \times \frac{\mathrm{msd}}{\sqrt{n}}$$

where 2.228 is the two-tailed 95% t-value for 10 degrees of freedom. The coefficient of repeatability estimates the maximum absolute difference expected between two future measurements on 95% of occasions. The coefficient of repeatability (CoR) was determined using:

$$\mathrm{CoR} = 1.96 \times \mathrm{msd}$$

Mean difference and 95% CI were also calculated from the full ROI and central ROI values of scan 1 to analyse the reliability of the quantitative values depending on inclusion of the border voxels. All statistical analyses were performed in Matlab. Repeatability was also assessed on a lesion-by-lesion basis using Bland-Altman analysis [23].
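The repeatability statistics can be computed from paired scan values in a few lines. The following minimal NumPy sketch follows the msd, CI and CoR definitions given above; the sign convention (scan 2 minus scan 1) and the names `repeatability` and `cor_from_ci` are assumptions for illustration. The helper `cor_from_ci` inverts the CI formula as a consistency check on reported numbers.

```python
import numpy as np

T_95 = 2.228  # t-value used in the text (two-tailed 95%, 10 degrees of freedom)


def repeatability(scan1, scan2, t_value=T_95):
    """Test-retest statistics for one parameter measured in n patients."""
    x1 = np.asarray(scan1, dtype=float)
    x2 = np.asarray(scan2, dtype=float)
    d = x2 - x1  # per-patient difference; scan 2 minus scan 1 is assumed
    n = d.size
    msd = np.sqrt(np.mean(d ** 2))  # root of the mean squared difference
    cor = 1.96 * msd
    return {
        "mean_diff": float(d.mean()),
        "ci_half_width": float(t_value * msd / np.sqrt(n)),
        "cor": float(cor),
        # CoR relative to the magnitude of the scan 1 group mean
        "cor_pct": float(100.0 * cor / abs(x1.mean())),
    }


def cor_from_ci(ci_half_width, n, t_value=T_95):
    """Back-calculate CoR from a reported CI half-width (consistency check)."""
    msd = ci_half_width * np.sqrt(n) / t_value
    return 1.96 * msd


if __name__ == "__main__":
    # IQR in the Results: 95% CI of +/-0.065 x 10^-3 mm^2/s with n = 10
    print(round(cor_from_ci(0.065, 10), 3))  # 0.181, matching the reported CoR
```

The same back-calculation for mean local range (±0.04 with n = 10) gives about 0.111, consistent with the reported 0.112 once rounding of the CI is allowed for.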

Results
The results of the repeatability assessment for ADC 10th centile, median, 90th centile, IQR, skew, kurtosis, and mean local range are summarized in Table 1 for the full tumour ROIs, and the ADC maps and histograms for one patient are presented in Fig. 3. 10th centile, median, and 90th centile ADC were found to exhibit reasonable repeatability. They all had a small mean difference between the two scans, with the …, and the skew was -0.15 for study 1 and +0.12 for study 2. The mean local range, which was not derived from the histogram, was 0.350 × 10⁻³ mm² s⁻¹ for study 1 and 0.248 × 10⁻³ mm² s⁻¹ for study 2.

Table 1
Repeatability for ADC heterogeneity and texture parameters using the full tumour region. Data are given in units (as % of scan 1 mean, in parentheses). SD = Standard deviation; CI = Confidence interval; IQR = Interquartile range.

Table 2
Reliability for ADC heterogeneity parameters depending on tumour region evaluated. SD = Standard deviation; CI = Confidence interval; IQR = Interquartile range.

IQR of the ADC histogram had a group mean of 0.32 ± 0.115 × 10⁻³ mm² s⁻¹ for scan 1 and a mean difference (95% CI) of -0.032 (±0.065) × 10⁻³ mm² s⁻¹ between the two consecutive scans. The coefficient of repeatability was 0.181 × 10⁻³ mm² s⁻¹, which is 56% of the scan 1 mean. The mean local range of ADC had a group mean of 0.302 ± 0.086 × 10⁻³ mm² s⁻¹ for scan 1 and a mean difference (95% CI) of -0.008 (±0.04) × 10⁻³ mm² s⁻¹ between the two consecutive scans. The coefficient of repeatability was 0.112 × 10⁻³ mm² s⁻¹, which is 37% of the scan 1 mean. When the scan 1 values for the full tumour ROIs and the central tumour ROIs are compared, only minimal differences are found (Table 2). 10th centile ADC had a mean difference of +0.019 × 10⁻³ mm² s⁻¹ when the border was excluded, while median, 90th centile and IQR had mean differences of -0.04, -0.084 and -0.068 × 10⁻³ mm² s⁻¹, respectively. The mean difference for ADC skew between the full and central ROIs was very small at -0.0001, but it had a wide group 95% CI of ±0.18 due to the large standard deviation of the values.
Bland-Altman plots showed good test-retest repeatability for median, percentile and mean local range values (Fig. 4) and good agreement between eroded and full ROIs for all variables (Fig. 5), with the majority of points lying within the limits of agreement (mean difference ± 1.96 standard deviations).
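The quantities behind a Bland-Altman plot are the pairwise means (x-axis), the pairwise differences (y-axis), the bias, and the limits of agreement at bias ± 1.96 SD. The following is a minimal NumPy sketch, assuming the sample standard deviation of the differences; `bland_altman` is a hypothetical name, and the same function could compare scan 1 with scan 2 or full with eroded ROI values.

```python
import numpy as np


def bland_altman(a, b):
    """Bland-Altman quantities for paired measurements a (e.g. scan 1) and b."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    means = (a + b) / 2.0   # x-axis: mean of each pair
    diffs = b - a           # y-axis: difference of each pair
    bias = diffs.mean()
    sd = diffs.std(ddof=1)  # sample SD of the differences
    lower, upper = bias - 1.96 * sd, bias + 1.96 * sd
    inside = (diffs >= lower) & (diffs <= upper)
    return {
        "means": means,
        "diffs": diffs,
        "bias": float(bias),
        "limits": (float(lower), float(upper)),
        "fraction_within": float(inside.mean()),
    }
```

To draw the plot, scatter `means` against `diffs` and add horizontal lines at the bias and at both limits (for example with matplotlib's `axhline`); `fraction_within` summarises how many points fall inside the limits.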

Discussion
Diffusion-weighted MRI has been proposed as a non-invasive tool for prostate cancer characterization and treatment response assessment [24], with some researchers proposing that the adoption of quantitative ADC measurements can help improve the PI-RADS scoring system [25,26]. However, for a quantitative parameter to be used as a reliable measurement tool, it must demonstrate adequate repeatability. ADC mean and median values have shown good repeatability in previous studies [14,27-29]; however, the reliability of ADC histogram-derived values and of texture analysis of ADC maps is largely untested. In this study we quantified the repeatability of different ADC-derived metrics, including histogram analyses in tumour regions-of-interest, demonstrating reasonable short-term repeatability for median, 10th and 90th centile ADC values. In our retest cohort, the coefficients of repeatability for the 10th and 90th centiles were 0.294 × 10⁻³ mm² s⁻¹ and 0.291 × 10⁻³ mm² s⁻¹, respectively; this is the maximum absolute difference that would be expected between any two future measurements on 95% of occasions. A study of the repeatability of ADC centile values in 18 patients with colorectal liver metastases reported coefficients of repeatability of 0.260 × 10⁻³ mm² s⁻¹ for 10th centile and 0.280 × 10⁻³ mm² s⁻¹ for 90th centile ADC [30]. Histogram analyses have shown promise for the characterisation of prostate tumours, with the 10th centile demonstrating better correlation with Gleason grade than mean and median ADC values [9], and 90th centile Kapp values from diffusion kurtosis imaging outperforming other diffusion-weighted imaging metrics for differentiating lower- from higher-grade tumours [31]. Although no studies have used histogram analysis for the assessment of prostate cancer treatment response, Kyriazi et al. demonstrated the 25th ADC centile to be the best predictor of chemotherapy response in patients with metastatic ovarian and primary peritoneal cancer [20]. While this latter study did not quantify histogram repeatability, it showed that the coefficient of repeatability for mean ADC was 9.5%. MRI is increasingly playing a role in the follow-up of patients on active surveillance (AS) for prostate cancer [32].

Fig. 5. Bland-Altman difference plots comparing eroded regions-of-interest (ROIs) with full ROIs for the various measured parameters. The y-axis shows the difference between the test and re-test study, and the x-axis the mean parameter value.

T. Barrett et al. European Journal of Radiology 110 (2019) 22-29
The Response Evaluation Criteria in Solid Tumours (RECIST), used in oncology to define progression, can only be applied to lesions ≥ 10 mm [33]. However, many of the low-volume, low-grade tumours suitable for AS do not meet this threshold, and functional measures such as DWI might therefore be considered as surrogate markers of progression [34,35]. For DWI to be implemented as a measure of response that supplements the more established size criteria of RECIST, the reproducibility error of the test needs to be quantified in order to determine what constitutes meaningful change outside the range of normal variation. In patients on AS, Morgan et al. showed that a whole-gland ADC reduction of > 10% was associated with disease progression; however, non-progressors also showed a less marked decrease in ADC [35]. These differences are within the repeatability error demonstrated here and in other studies. It should be noted that the error we report applies to small tumours rather than the whole gland, where the error might be much smaller; however, comparing small lesions over time is the more typical clinical situation in active surveillance. This highlights the limitations of measurement reliability: a significant difference can often be established for large patient populations, but the difference is insufficient to allow prediction on an individual basis.
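A coefficient of repeatability can be turned into a simple per-patient decision rule: an observed change is only distinguishable from test-retest variation if its magnitude exceeds the CoR. The following minimal sketch uses the 10th centile CoR reported here; the baseline value of 1.00 × 10⁻³ mm² s⁻¹ and the 10% decrease are hypothetical illustrations, not data from this study or from Morgan et al., and the function names are likewise hypothetical.

```python
def exceeds_repeatability(baseline, follow_up, cor):
    """True if the absolute change is larger than the coefficient of repeatability."""
    return abs(follow_up - baseline) > cor


def min_detectable_fraction(baseline, cor):
    """Smallest fractional change distinguishable from test-retest variation."""
    return cor / abs(baseline)


if __name__ == "__main__":
    cor_p10 = 0.294             # 10th centile CoR from this study, x 10^-3 mm^2/s
    baseline = 1.00             # hypothetical lesion value, x 10^-3 mm^2/s
    follow_up = baseline * 0.90  # hypothetical 10% decrease
    print(exceeds_repeatability(baseline, follow_up, cor_p10))  # False
    print(min_detectable_fraction(baseline, cor_p10))           # 0.294
```

Under these assumptions a lesion would need to change by roughly 29% before the change could be attributed to anything other than measurement variation, which illustrates why a 10% change cannot be interpreted on an individual basis.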
In contrast to the results for the centile values, the repeatability of ADC heterogeneity and simplified texture analysis was more equivocal. First, we found that histogram skew, which evaluates the asymmetry of the distribution, had poor repeatability. More complex forms of heterogeneity and texture analysis are available, but two basic underlying characteristics are the dispersion of the voxel values present, as measured by the interquartile range, and the spatial variation of those values, as measured by the mean local range. We found that repeatability was better for mean local range than for IQR, with a coefficient of repeatability of 0.112 × 10⁻³ mm² s⁻¹ compared to 0.181 × 10⁻³ mm² s⁻¹ for IQR. Expressed as a percentage of the scan 1 mean, the coefficient of repeatability was 37% for mean local range of tumour ADC compared to 56% for IQR. Our results suggest that an evaluation of ADC heterogeneity and basic texture features might provide additional benefit in the analysis of prostate tumours; however, careful consideration of the repeatability and reliability of the measurements will be essential. This is important given the increasing interest in developing more advanced means of texture analysis to characterise disease at baseline and assess response in the follow-up setting [36]. Finally, we found that inclusion of the tumour border had only a minimal impact on the ADC heterogeneity values we assessed, suggesting that partial-volume effects and minor errors in ROI transposition did not substantially affect the results.
Our study has several limitations. First, although prostatectomy provides more definitive histology, there is a potential selection bias in only including patients fit to undergo surgery. Second, this study had a small sample size, and evaluation in a larger cohort is warranted, in particular for differences relating to tumour size, origin or grade. Third, we did not assess additional sources of error such as inter-observer or intra-observer variation. Intra-observer repeatability has been shown to be around 10% for DWI regardless of whether 2D or 3D ROIs are used [37], and inter-observer variation brings an additional subjective element of assessing lesion presence and conspicuity, which would be better assessed from blinded clinical reads rather than direct pathological correlation.
In summary, we found that 10th and 90th centile ADC had sufficient repeatability to be considered for clinical use. While more advanced heterogeneity assessment might be clinically beneficial, the use of values such as histogram IQR or mean local range could be limited by their repeatability, and further evaluation is warranted.