A multi-institution study: comparison of the heating patterns of five different MR-guided deep hyperthermia systems using an anthropomorphic phantom

Abstract
Introduction: Within the hyperthermia community, consensus exists that the clinical outcome of radiotherapy and/or chemotherapy plus hyperthermia (i.e. elevating tumor temperature to 40-44 °C) is related to the applied thermal dose; hence, treatment quality is crucial for the success of prospective multi-institution clinical trials. Currently, applicator quality assurance (QA) measurements are implemented independently at each institution using basic cylindrical phantoms. A multi-institution comparison of heating quality using magnetic resonance thermometry (MRT) and anatomically representative anthropomorphic phantoms provides a unique opportunity to obtain novel QA insights to facilitate multi-institution trial evaluation.
Objective: Perform a systematic QA procedure to compare the performance of MR-compatible hyperthermia systems in five institutions.
Methods and materials: Anthropomorphic phantoms, including pelvic and spinal bones, were produced. A clinically relevant power of 600 W was applied for ∼12 min to allow for 8 sequential MR scans. The 3D-heating distribution, steering capabilities, and presence of off-target heating were analyzed.
Results: The evaluated devices show comparable heating profiles for centric and eccentric targets. The differences observed in the 3D-heating profiles are the result of variations in the exact phantom positioning and applicator characteristics, whereby positioning of the phantom followed current ESHO-QA guidelines.
Conclusion: Anthropomorphic phantoms were used to perform QA measurements of MR-guided hyperthermia systems operating in MR scanners of different brands. Comparable heating profiles are shown for the five evaluated institutions. Subcentimeter differences in position substantially affected the results when evaluating the heating patterns. Integration of advanced phantoms and precise positioning in QA guidelines should be evaluated to guarantee the best quality patient care.


Introduction
Hyperthermia (HT), the elevation of tumor temperature to a supraphysiological level in the range of 40-44 °C, is an effective radiosensitizer and chemosensitizer. Randomized clinical trials have reported significant improvements in the clinical outcome when HT is added to standard oncological treatment regimens of radiation and/or chemotherapy [1]. The addition of HT has proven effective in melanoma [2], soft tissue sarcoma [3], pediatric tumors [4], head and neck [5], esophageal [6], recurrent breast [7,8], bladder [9] and cervical [10,11] cancers. In these clinical trials, QA measurements were performed independently in each institution and were non-comparable. The existence of a thermal dose-effect relationship [12-15] strongly supports identical QA measurements among institutions to determine whether clinical results using these different systems can be meaningfully combined in multi-institution clinical studies [16]. We recently demonstrated that magnetic resonance thermometry (MRT) measurements, as facilitated by MR-radiofrequency (RF) hyperthermia systems, provide unprecedented 3D QA capabilities [17]. However, an anthropomorphic phantom with a representative human shape has not yet been utilized to compare results of multiple clinically active MR-compatible systems.
Magnetic resonance (MR)-guided HT (MR-HT) is considered the latest HT technology, and the BSD-2000-3D MR-compatible applicators are the only radiofrequency-based applicators that are clinically used for the treatment of deep pelvic cancers and those of the extremities. MR-compatible ultrasound-based applicators are also clinically used for the treatment of deeply seated cancers. While ultrasound technology was initially proposed to ablate (>60 °C) small tumors, the introduction of both electrical and mechanical steering of the focus energy has facilitated adaptation of the systems to allow the application of hyperthermia to larger tumors. An extensive review of ultrasound technology can be found elsewhere [18]. The first radiofrequency-based 1.5 T MR-HT hybrid system (named 'Sigma Eye,' Pyrexar Medical Corp., UT, USA) was installed in 2000 at the Charité Berlin, tested extensively in phantom measurements, and subsequently clinically validated [19,20]. This early and extensive QA work on the MR-HT hybrid system in a single institute included the use of a novel 3D phantom, and has been published by Gellermann et al. [21,22] and Weihrauch et al. [23]. Based on this work, identical MR-HT hybrid systems were installed in the University Hospitals of Düsseldorf, Erlangen, and Tübingen between 2007 and 2011 to operate within a Siemens-Symphony MR-system (Siemens Healthineers, Erlangen). A newer version of this applicator was installed in the Erasmus MC Cancer Institute in 2014 to operate inside a GE-Optima-450W MR-system (GE, Boston, MA, US). The latest generation of these applicators, the 'Universal Applicator,' was installed in the University Hospital of Munich in 2017 to operate within a Philips Ingenia MR-system (Philips, Amsterdam, The Netherlands). Each MR-HT hybrid system provides 3D-steering of the heating pattern together with noninvasive MRT for monitoring of the achieved temperature distribution.
These MRT monitoring capabilities also provide an unmatched tool for 3D quality assurance (QA) [17]. Such novel QA procedures in hyperthermia treatments may provide new insights into the possible level of assurance of uniform quality of hyperthermia treatment delivery and monitoring among multiple institutions contributing to a clinical trial [16]. To make the first inventory of the performance of the five different MR-hybrid BSD-2000-3D systems installed in Europe, we initiated two studies. The first study focused on a quantitative evaluation of the differences between the temperatures measured by MRT and thermistor probes and is reported in Curto et al. [24]. The current (second) study focuses on the evaluation of the heating patterns generated by the different BSD-2000-3D MR-compatible applicators.
Wust et al. [25] evaluated the performance of the non-MR-compatible BSD-2000-2D Sigma-60 applicator (Pyrexar Medical Corp., UT, USA) in four European hyperthermia institutions using the lamp phantom. A frequency-dependent defocussing of heating patterns was detected in all four systems, and it was concluded that a universal procedure for quality control is required. Bruggmoser et al. [26,27] provided practical QA guidelines for the application of deep HT using non-MR-compatible systems and recommended verification of the functionality of the HT system using phantoms. Such QA guidelines have not yet been established for MR-HT systems.
Several authors have evaluated the performance of specific non-MR-compatible BSD-2000 [28,29] and BSD-2000-3D [24,30,31] applicators. Previously, the Charité Berlin hyperthermia group reported the results of their QA testing in an MR-HT hybrid system using a cylindrical phantom and an inhomogeneous elliptical phantom containing a skeleton. MRT profiles were compared with direct temperature measurements [21] and planning calculations [22,23]. Additionally, the influence of various parameters was discussed, such as a local drift of the static magnetic field B0 and the amplitude and phase at the antenna's feed point. The effect of anatomical body contour and shape on system performance has previously been evaluated by numerical models but experimental verification remains lacking [32,33]. In subsequent work, Wyatt et al. [34] evaluated the heating performance of the mini-annular phased array applicator (MAPA) and breast applicator with homogeneous and inhomogeneous phantoms. It was found that MR-system drifts can be appropriately corrected using oil-based references.
The primary goal of a QA hyperthermia program is to establish a minimum level of quality in hyperthermia treatments [35]. QA guidelines have been introduced for regional deep hyperthermia [26,27,36]. More recent QA guidelines have been formulated by the technical committee of the European Society for Hyperthermic Oncology (ESHO) [37] for superficial [35,38] and interstitial [39] hyperthermia. Current standards of QA as issued by the ESHO are based on measuring the temperature increase patterns in simple geometric homogenous phantoms [26,27,35,36,38,39]. The design of these new QA protocols is based on the principle that a minimum level of QA should be feasible with the components available in the commercial hyperthermia equipment. In practice, this means that QA measurements are at best obtained in 2D along the main axes if infrared thermography with split phantoms or direct E-field measurement systems are used [35]. When QA is performed with the standard hyperthermia equipment, the number of temperature sensors is limited (the minimum is 5 sensors); and when using the same phantom, the time interval between experiments is long (>10 h) since the phantom needs to stabilize its temperature after each experiment. Consequently, the precision (spatial and numerical) in assessing the quality assurance of a hyperthermia system is not at the same level as that in radiotherapy. Such a precision will most likely not be necessary, although discussion on the final requirements remains ongoing. Current state-of-the-art QA procedures, such as single-point thermometry or 2D-QA tools like lamp phantoms, LEDs, e-field sensors, or infrared thermometry, cannot provide 3D-visualisation of the heating profiles in anthropomorphic phantoms. Performing sequential measurements with the matrix containing a lamp, LEDs, or e-field sensors at different positions can provide 3D-visualisation only in homogeneous phantoms. Additionally, infrared thermometry is limited to split phantoms. 
In summary, no 2D-QA approach is suitable for describing or explaining clinically relevant features.
To facilitate QA control in multi-institution studies, we performed systematic and rigorous QA measurements in all currently clinically used MR-HT hybrid systems (University Hospitals of Düsseldorf, Erlangen, Munich, and Tübingen in Germany, and the Erasmus MC Cancer Institute, Rotterdam, The Netherlands), as shown in Figure 1. For this purpose, phantoms with a representative anthropomorphic shape were developed and new evaluation methods are presented. To evaluate the steering capabilities of the applicators, measurements were performed with the aim of centric and eccentric focusing in each of the institutions. The long-term aim of this work was to contribute to the development of QA guidelines for MR-guided hyperthermia; thus, the current manuscript should be seen from this perspective. It reports the first experiences of comparing the QA performance among the BSD-2000-3D MR-compatible applicators while following the current QA guidelines but using a 3D temperature measurement device (MR-thermometry).

Hyperthermia and magnetic resonance systems
The MR-compatible Sigma-Eye applicators have been designed and built following a custom-specific design to operate within Siemens and GE MRs. The MR-compatible Universal Applicator has been specifically designed to provide a standardized and reproducible applicator capable of operating within any 1.5 T MR-system [40]. Both hyperthermia applicators, the Sigma-Eye and the Universal Applicator, operate at 100 MHz and consist of three rings of antennas with four dipole-pairs per ring. Optimized amplitude and phase can be delivered to each of the twelve dipole-pairs to provide 3D-steering of energy. Dedicated filters are implemented to decouple the operating frequencies of the HT system (100 MHz) and the MR system (∼63.5 MHz), preventing system cross-talk as a basis for simultaneous heating and imaging [21]. Both applicators contain a bolus that is positioned between the antennas and the patient. The bolus is filled with deionized water for efficient transfer of electromagnetic energy from the antennas to the patient, in addition to cooling the patient surface.

Phantom development
A mannequin with a realistic male shape and dimensions and a wall thickness of 4 mm was used as a shell for the new anthropomorphically shaped phantom. The transverse dimensions of the mannequin were 352 × 213 mm. Artificial plastic pelvic bones, spinal bones and disks of a full-body anatomical skeleton (VM101, Vosmedisch, Amsterdam, The Netherlands) were fixed in the interior of the shell. The dielectric properties of the mannequin shell and artificial bones were a relative permittivity of 2.8 and an effective conductivity of 0 S/m at 100 MHz. Closed-tip sterile catheters were positioned inside and around the bone structure to facilitate the insertion of high-resistance thermistor probes, as shown in Figure 2. Further details of the phantom construction can be found in Curto et al. [24]. The phantom shell was filled with a mixture of sodium benzoate, agar, and deionized water (10 g sodium benzoate and 20 g agar per 1000 g deionized water). Sodium benzoate was used to adjust the conductivity and as a preservative agent. The dielectric properties of the phantom were measured using a Dielectric Assessment Kit (DAK, Speag, Zurich, Switzerland) at 21 °C. The measured values were a relative permittivity of 78.6 ± 0.03 and an effective conductivity of 0.41 ± 0.0002 S/m at 100 MHz. The relative permittivity of the phantom material was larger than the average value within the human pelvis, which can potentially reduce the focus size and amplify differences as compared with actual clinical treatments or variations among the systems. These relative permittivity and conductivity values are within the range used in previous studies [17,41]. A phantom with a lower conductivity value was also proposed in the latest ESHO QA guidelines for deep hyperthermia [27]. The center of the phantom was determined and labeled on the surface. For the experiments, the center of the phantom was aligned with the center of the applicator and MR-system.
Two phantoms were produced following the same procedure and using the same materials. The manual process of the production procedure may lead to a small difference between the two phantoms regarding the positioning of the bone structure within the mannequin shell. Only one measurement can be performed per phantom per day due to the need to cool down and stabilize the temperature of the phantom; therefore, two phantoms were produced. One of the phantoms was used to evaluate focus-steering for a centric target and the other phantom was used to evaluate focus-steering for an eccentric target [24].

QA measurements
The QA measurements aimed to assess the performance of the five hyperthermia applicators under clinically representative conditions, with the phantom measurements and conditions being as similar as possible in all institutions. We aimed for a positioning error to be as low as possible, preferably below 10-mm uncertainty, as is the requirement in the ESHO guidelines [27]. The water flow within the water bolus was deactivated to reduce artifacts due to water motion. The only residual parameter was the room temperature, which was slightly different among the five institutions; therefore, the phantoms were stored in the MR room the day prior to performing the measurements to avoid any residual temperature gradients between the phantom and the MR room. Samples of deionized water were collected from the water tank used to fill the water bolus in each institution. These samples were stored in sterilized 250-ml containers until triplicate measurements were performed on the same day with a DAK at 21 °C. The MR room and water bolus temperatures were recorded before starting each measurement.
In the Sigma-Eye applicator, the phantom was positioned in the hammock (Figure 1(A-D)). Using the Universal Applicator, the phantom was positioned on the applicator mattress (Figure 1(E)). The positioning was verified at the center of the phantom by high-resolution MR scanning.
All the systems are maintained by a service contract with the supplier (Dr. Sennewald Medizintechnik GmbH, Munich, Germany), and verification of the central focus is usually performed with a lamp phantom. Regular (bimonthly) QA procedures include phase and amplitude calibrations and heating tests. The power amplitude and phase calibration processes of the BSD-2000-3D MR-compatible applicators have built-in checks to help ensure calibration accuracy. The treatment control processes include additional checks to ensure that the system remains stable enough for use, including continuous monitoring of the measured phase and power levels. The design of the system for phase calibration includes the calibration of both the phase detection and phase shifters; therefore, the desired phase for each channel relies on these two separate functions. Selection of the phase uses the calibration of each of the phase shifters to determine the phase setting needed for each channel to achieve the desired output phase. Generally, the accuracy of the output phase is within 10° to comply with the ESHO QA guidelines [42]. If the lag phase error of all three channels on, for example, the anterior channel was 20° more than it should be, it would cause an effective steering error of 1 cm. Canters et al. determined that a steering error of 1 cm was acceptable for clinical work, since this malpositioning will create a maximum deviation of 5% in the hotspot SAR to tumor SAR quotient (HTQ) [43]. Lee et al. evaluated the power and phase stability of the BSD-2000-3D system over a duration of one year [44], showing that the variation in power was less than 0.22 dB (5.2%) and the variation in phase measurements was 1.18° or less. Nevertheless, it is important to note that phase errors of 10-20°, which may not impair a 2D-pattern in a homogeneous phantom, can cause relevant changes in patterns in a heterogeneous 3D-phantom [22,23,45].
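The two figures quoted above (0.22 dB corresponding to 5.2%, and a 20° phase lag corresponding to roughly 1 cm of steering error) can be sanity-checked with a short calculation. The sketch below is not taken from the cited works: the focus-shift estimate assumes two opposing plane waves in a homogeneous medium with the phantom's measured dielectric properties, which is a strong simplification of a twelve-channel applicator, and all function names are illustrative.

```python
import math

EPS0 = 8.8541878128e-12  # vacuum permittivity, F/m
C0 = 299_792_458.0       # speed of light in vacuum, m/s


def db_to_percent(db: float) -> float:
    """Percentage change in linear power corresponding to a ratio in dB."""
    return (10 ** (db / 10) - 1) * 100


def wavelength_m(freq_hz: float, eps_r: float, sigma: float) -> float:
    """Plane-wave wavelength in a lossy, non-magnetic medium."""
    loss_tan = sigma / (2 * math.pi * freq_hz * EPS0 * eps_r)
    factor = math.sqrt((math.sqrt(1 + loss_tan ** 2) + 1) / 2)
    return C0 / (freq_hz * math.sqrt(eps_r) * factor)


def focus_shift_cm(phase_err_deg: float, lam_m: float) -> float:
    """Shift of the constructive-interference point of two opposing plane
    waves when one lags by phase_err_deg: dx = dphi / (2k) = dphi * lam / (4 pi).
    Simplifying assumption, not the applicator's real field model."""
    return math.radians(phase_err_deg) * lam_m / (4 * math.pi) * 100


# Phantom properties from the text: eps_r = 78.6, sigma = 0.41 S/m at 100 MHz
lam = wavelength_m(100e6, 78.6, 0.41)
print(f"0.22 dB -> {db_to_percent(0.22):.1f} % power variation")
print(f"20 deg lag -> about {focus_shift_cm(20, lam):.2f} cm focus shift")
```

Under these assumptions the estimated shift comes out slightly below 1 cm, which is the same order as the steering error quoted in the text.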
The intention of the present study is to compare the similarity of the performance of the MR-HT systems as they are used in the clinic. During this work, the heating was applied according to the QA instructions of the local hospital. Focus-steering of the applicator was assessed for a centric target at a (0, 0) cm location and for an eccentric target at a (3, 0) cm location, as set at the system console. Measurements were performed by applying equal power to each of the twelve channels of the applicator, with a total clinically relevant power of 600 W.
Spoiled gradient recalled echo (GRE) sequences with two echo times (double echo) were used as provided by the manufacturer for clinical applications. A high-resolution scan was used to support the positioning of the phantom, detect air pockets inside the bolus, and facilitate segmentation of fat-like references attached inside the applicator to compensate for B0 drift [17]. Two MRT scans were performed as baseline without heating. Afterwards, eight sequential MRT scans were acquired with the power on, as shown in Figure 2(C). MRT was evaluated in the last scan with the power on. Table 1 shows the main parameters of the scans.

Data analysis
The proton resonance frequency shift (PRFS) method was used to calculate MR-based thermal maps [17,24,46-48], which are a dataset of MR temperatures that can be quantitatively compared with direct temperature measurements (in °C). MR temperatures are deduced from phase differences (corrected for B0 drift) between the baseline scan and the actual scan. The MR-based thermal maps were obtained from the PRFS signal using the DTE method as used by Dadakova et al. [49]. The DTE method corrects for phase changes due to temperature-induced changes in phantom conductivity, which would otherwise lead to an overestimation of the temperature change [17]. The Sigma Vision Advance software (Dr. Sennewald Medizintechnik GmbH, Munich, Germany), which is based on the PRFS method, was used to calculate the MR-based thermal maps. The prototype of this software package (later called Sigma Vision Advance) has previously been applied and described [21,50]. The PRFS method is the most widely used MRT method; it has been validated against temperature probes in phantoms and volunteers [21,34,51] and is used clinically [19,20,52]. The MR-based thermal maps were subsequently evaluated quantitatively in 3D using Matlab (R2016b) in the axial, sagittal, and coronal views at the end of the heating period. The MR temperature was normalized to the 98th percentile at the center of the applicator (axial image) to compensate for variation in system efficiency and reduce the impact of MR noise. This value provides the highest MR temperature within the 1.3 °C uncertainty value of repeatability in a cylindrical phantom [17] for a centric target. Iso-contours at 90%, 75%, and 50% of the normalized MR temperature were generated.
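The analysis chain described above (phase difference to temperature change, percentile normalization, iso-contour area) can be illustrated with a minimal sketch. This is not the Sigma Vision Advance or Matlab implementation used in the study: it uses the basic single-echo PRFS relation rather than the double-echo DTE correction, assumes a PRFS coefficient of about -0.01 ppm/°C, and takes the 98th percentile over a whole slice instead of a central region. Phase sign conventions differ between scanners, and all function names are hypothetical.

```python
import math

GAMMA_HZ_PER_T = 42.577e6  # proton gyromagnetic ratio divided by 2*pi
ALPHA_PER_C = -0.01e-6     # assumed PRFS coefficient (about -0.01 ppm/°C)


def prfs_delta_t(dphi_rad, te_s, b0_t=1.5):
    """Temperature change from a drift-corrected phase difference:
    dT = dphi / (2*pi * gamma * alpha * B0 * TE)."""
    return dphi_rad / (2 * math.pi * GAMMA_HZ_PER_T * ALPHA_PER_C * b0_t * te_s)


def percentile(values, q):
    """Nearest-rank percentile of a flat list of numbers."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]


def normalize_map(delta_t_map, q=98):
    """Scale a 2D map (list of rows) by its q-th percentile value."""
    ref = percentile([v for row in delta_t_map for v in row], q)
    return [[v / ref for v in row] for row in delta_t_map]


def iso_area_cm2(norm_map, level, pixel_area_cm2):
    """Area enclosed at or above a normalized level, by pixel counting."""
    n_pixels = sum(v >= level for row in norm_map for v in row)
    return n_pixels * pixel_area_cm2


# Toy 2 x 2 map of temperature increases in °C (made-up values)
toy = [[1.0, 2.0], [3.0, 4.0]]
norm = normalize_map(toy)
print(iso_area_cm2(norm, 0.5, pixel_area_cm2=1.0))
```

Pixel counting is only a coarse stand-in for contour-based area measurement; it is used here because it needs no plotting or image library.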

Results
Experimental conditions varied due to the different lengths of cables used to connect the RF amplifiers and antennas of each system, and consequently, the RF power lost in the cables differed among the systems. By normalizing, the variability in system efficiency does not have an impact on the heating performance. All scans were performed with the following parameters: 25 axial slices, 10-mm slice thickness, and an FOV of 50 × 50 cm.
The collected water samples presented a permittivity and effective conductivity standard deviation at 100 MHz of ± 0.1 and ± 8.3 mS/m, respectively. The room and water bolus temperatures prior to each measurement were 22.3 ± 1.5 °C and 19.9 ± 2.1 °C, respectively.
Measurements were performed successfully in all institutions and the results allowed comparison. Figure 3 shows the normalized MR-based thermal maps at the five institutions for centric targets. All maps featured comparable heating in the centric region and top part of the pelvic bones.
Figure 3. Normalized MRT distribution for centric targets. Axial, sagittal, and coronal cross-sections are plotted at the center of the field of view (white lines). Iso-contours at 90% (red), 75% (blue), and 50% (yellow) are shown. The center of the heating target is indicated with '×' in all cross-sections.
Besides these common features, the MRT maps for Institution 1 showed a centrally located focus within the pelvic bones in the axial view; a focus centrally located above the spine in the sagittal view; and two well-defined foci on both sides of the spine in the coronal view. The phantom in Institution 2 was shifted 9.8 mm in the ventral direction with respect to the average phantom position in all the experiments. The axial view indicates a centrally located focus; however, the sagittal view indicates a cranial shift in the focus. The coronal view indicates a preferential heating on the right side. Institution 3 showed a centric focus in all three planes. The system in Institution 4 exhibited centrally located and off-target heating in the dorsal area, which can be seen in both the axial and sagittal view. While the focus is centrally located in the coronal view, a slight shift toward the feet of the phantom can be seen in the sagittal view. The system in Institution 5 showed preferential heating toward the phantom's left in the axial and coronal view. Table 2 shows a quantitative comparison of the 50% iso-contours in the axial, sagittal, and coronal cross-sections. The clinical relevance of the differences in the 50% iso-contours, together with the visual assessment of the heating performance of the applicator, is described in Table 3.
Figure 4 shows the MR-based temperature distributions obtained for the eccentric heating target. In all thermal maps, the core heating pattern was located within the volume enclosed by the pelvic bones, with a clear focus toward the patient's left side. For all institutions, a secondary focus [...] Institutions 1 and 2. Institution 4 showed preferential heating toward the eccentric left target; however, off-target heating in the posterior of the phantom and above the left pelvic bone was also observed. Lastly, the MR temperature distribution for Institution 5 showed the highest focus shift toward the left side; additionally, an off-target secondary focus was found in the area of the os-pubis.
Table 2. Surface enclosed by the iso-contours for the axial, sagittal, and coronal views of the centric and eccentric targets.
Three types of difference are observed in the heating patterns: a shift of the primary heating focus, a change in the size of the primary focus, and the generation of a secondary focus. Scores for the quantitation of the observed differences are given according to: 3 (similar heating patterns); 2 (one type of difference between the heating patterns); 1 (two types of difference between the heating patterns); and 0 (three types of difference between the heating patterns). The observed differences are indicated in brackets.
Iso-contours at 90% (red), 75% (blue), and 50% (yellow) are shown. The center of the heating target is indicated with '×' in the axial and coronal cross-sections. Note that the sagittal cross-section is through the center of the field of view.
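The scoring rule stated in the table caption (3 for similar patterns, minus one point per observed type of difference) is simple enough to write down as a function. The sketch below only restates that rule; the string labels for the three difference types are hypothetical names chosen for illustration.

```python
# The three types of difference named in the text (labels are illustrative)
DIFFERENCE_TYPES = {"focus_shift", "focus_size", "secondary_focus"}


def similarity_score(observed):
    """Score a heating pattern against the reference:
    3 = similar, 2 = one type of difference, 1 = two types, 0 = all three."""
    observed = set(observed)
    unknown = observed - DIFFERENCE_TYPES
    if unknown:
        raise ValueError(f"unknown difference type(s): {sorted(unknown)}")
    return 3 - len(observed)


# Example: a pattern with a shifted focus and a secondary focus scores 1
print(similarity_score(["focus_shift", "secondary_focus"]))
```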

Discussion
In this work, the performance of clinically used MR-guided HT applicators was systematically evaluated in five European institutions in Germany and The Netherlands. While independent QA measurements are performed in each institution using basic cylindrical phantoms, no comparison of the different systems has been reported to date. The aim of the present study, therefore, was to investigate the system performance regarding heating capabilities under clinical patient conditions. To this end, new dedicated phantoms with a realistic human shape, including artificial bone structures, were developed. These phantoms were transported to the different institutions, where the same experimental setup was applied for the QA measurements performed by the same principal investigator supported by local investigators. Measurements were performed to evaluate the heating capabilities of the systems to treat tumors centrally and eccentrically located in the pelvic region. Thermal maps (Figures 3 and 4) show comparable heating distribution of all applicators. The location of the heating focus was generally well correlated with the set centric and eccentric targets. Off-target heating between the 'legs' was observed in all the institutions, which was caused by the fact that the water bolus could not fill the volume between the 'legs', in addition to potential artifacts resulting from FOV border effects. During clinical treatment, an extra water bolus is positioned between the legs of the patient to mitigate this effect. At Institution 4, off-target heating was observed at the dorsal part of the phantom for both the centric and eccentric target. As many interrelated factors may generate this effect, such as phase-induced errors [22,23,45], coupling between antennas, mismatching of antennas, or poor cable connections leading to an unexpected applicator performance, the origin of this secondary focus remains unexplained. For the centric as well as the eccentric target, a secondary focus adjacent to the left ala of the ilium (ala ossis ilii), near the left iliac spine (spina iliaca ant. sup.), was found, which was more pronounced for the eccentric target. This has also been previously described by Canters et al. [53] and can lead to patient complaints regarding that area.
Figure 5 shows the thermistor probe measurements for centric and eccentric targets. A linear temperature increase was observed for both target locations. For the centric target (Figure 5(A,B)), a similar temperature increase was measured for all thermistor probes (maximum difference in temperature increase between the probes was within 1 °C); however, for the eccentric target (Figure 5(C,D)), the temperature difference between the more distant probes (green and black curves) reached 4.1 °C, illustrating the steering capabilities of the system. A quantitative evaluation performed at the five institutions showed good agreement between MRT and thermistor probe measurements [24]. For all institutions, a linear relationship was found between MRT and thermistor probe measurements, with an R² (mean ± standard deviation) of 0.97 ± 0.03 and 0.97 ± 0.02 for centric and eccentric heating targets, respectively. The RMSE was found to be 0.52 ± 0.31 °C and 0.30 ± 0.20 °C, respectively. The Bland-Altman evaluation showed a mean difference of 0.46 ± 0.20 °C and 0.13 ± 0.08 °C, respectively.
Figure 5. Temporal evolution of the temperature increase determined by high-resistance thermistor probes for centric and eccentric heating targets. The positions of the probes in the phantom are indicated by numbers and colored squares in the MR image showing the cross-section of the phantom and the surrounding water bolus. The target location is indicated by a yellow circle. A comparison between MRT and thermistor probes for the different institutions can be found in Curto et al. [24].
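The agreement metrics quoted for MRT versus thermistor probes (R², RMSE, and the Bland-Altman mean difference) can be computed from paired readings as sketched below. This is a generic illustration, not the analysis code of Curto et al. [24]: R² is taken as the squared Pearson correlation of a simple linear fit, the 1.96 SD limits of agreement are the conventional Bland-Altman choice rather than something stated in the text, and the sample values are made up.

```python
import math
import statistics


def agreement_metrics(mrt, probe):
    """Return R^2, RMSE, Bland-Altman bias, and limits of agreement
    for paired temperature increases (same length, in °C)."""
    if len(mrt) != len(probe) or len(mrt) < 2:
        raise ValueError("need two paired sequences with at least 2 samples")
    n = len(mrt)
    diffs = [m - p for m, p in zip(mrt, probe)]
    rmse = math.sqrt(sum(d * d for d in diffs) / n)
    bias = statistics.fmean(diffs)
    sd = statistics.stdev(diffs)
    # Squared Pearson correlation, computed by hand to stay version-agnostic
    mm, mp = statistics.fmean(mrt), statistics.fmean(probe)
    sxy = sum((m - mm) * (p - mp) for m, p in zip(mrt, probe))
    sxx = sum((m - mm) ** 2 for m in mrt)
    syy = sum((p - mp) ** 2 for p in probe)
    r2 = sxy * sxy / (sxx * syy)
    return {"r2": r2, "rmse": rmse, "bias": bias,
            "loa": (bias - 1.96 * sd, bias + 1.96 * sd)}


# Made-up paired readings (°C) for illustration only
mrt_dt = [1.2, 2.1, 3.3, 4.0, 5.4]
probe_dt = [1.0, 2.0, 3.0, 4.1, 5.0]
print(agreement_metrics(mrt_dt, probe_dt))
```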
Although MR provides the best imaging approach, variations in phantom position or tilting can have a non-negligible impact on the obtained heating pattern of one of the measurement cross-sections; therefore, careful evaluation of the measured heating patterns is needed to differentiate between applicator and positioning effects. This is especially relevant for anthropomorphic phantoms, where a small position shift may have strong effects on the visualized heating patterns due to the differences in dielectric properties of the various materials. Such effects are normally not seen or are less pronounced in homogeneous phantoms. In clinical situations, these effects are generally averaged out by constant and small patient movements. An interpretation of the heating performance of the different applicators in comparison with the applicator in Institution 1 is provided in Table 3. Additionally, it is important to note that the anthropomorphic phantom used in the present study has sharp electrical boundaries between the bones and the phantom tissue mixture, which may be less pronounced in patients. The influence of electrical boundaries on the generated heating patterns has been identified in previous publications [29]. In a recent study [17], we performed QA measurements with an ideal set up using the same applicator and a homogeneous phantom. This study showed that despite the phantom being precisely positioned and fixed in wooden stands, there was a shift in the focus by 1 cm between different measurements. Previous publications [22,23] have demonstrated that phase and amplitude deviations at the antenna feed point and inaccuracies in positioning were important sources of error. The present study strongly suggests that the impact of phantom position accuracy should be further evaluated, and future standards should be developed to provide indications regarding how to compare the performance of different applicators. 
Moreover, if better control of phases and amplitudes is achieved at the antenna feed points, more accurate prediction of the heating pattern is possible, and hence, a better controlled treatment is obtained.
Different applicator settings can be implemented to counteract specific applicator performance. For example, at Institution 4, the power on the top and bottom antennas is decreased by 20-30% under clinical operation to reduce the heating on the top and lower parts of the patient. This compensation practice is guided by local QA testing and was purposely not used in the present study, in which the same power was applied to all antennas. Certain phase offsets can be predefined in the applicator to correct deviations in the obtained focus heating, and have been used for compensation in the current practice using the 3D-MR testing of gel-filled cylindrical phantoms. Verification of the effectiveness of these measures is the responsibility of the institutions and was not part of the present study.
The newly developed phantom with a realistic detailed human shape proved to be a valuable tool to periodically perform QA measurements validating the performance of the systems and to train the hyperthermia clinical team. The inclusion of bones and irregular shapes in the phantom is considered conducive to more clinically relevant heating patterns as compared with a homogeneous phantom [17]. The advantage of a more clinically relevant heating pattern is that a realistic evaluation can be made; however, the evaluation is more dependent on the accurate positioning of the phantom. While the same phantoms and measurement settings were applied, there is a series of uncertainties inherent to this evaluation. As previously described by Gellermann et al. [22] and Weihrauch et al. [23], important sources of uncertainty in these applicators are phase errors at the feed point of the antennas. While the positioning of the phantom was within the ±10 mm guidelines for non-MR-guided hyperthermia [27], except for the eccentric target in Institution 2, where the phantom was 11.7 mm from the average position, this work shows that a more accurate positioning procedure may be necessary, especially when evaluating the results with a high-resolution system such as an MR system. Although the results were compared along the three main axes (axial, sagittal, and coronal), further evaluations should consider a full 3D data analysis. To facilitate a full 3D evaluation of newly acquired data, an improved and more rigid phantom positioning should be implemented. In addition, because only one measurement per day could be performed in the same phantom (the phantom must cool down after an experiment), only a limited number of measurements were possible, and therefore only one QA measurement per setup was performed in each of the five institutions. 
However, multiple measurements for each phantom setup would be required in order to evaluate the reproducibility of the results [17]. MRT data were collected during the baseline and heating periods. Further work should also collect MRT data during the cooling-down period, to show the return to baseline temperature and thereby demonstrate the effectiveness of the procedure used to correct for field drift. Lastly, while the present study suggests that the Universal Applicator has a performance comparable with Sigma-Eye applicators, follow-up work should demonstrate whether this applicator will lead to higher QA comparability between systems.
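The positioning check discussed above (distance of each phantom from the average position, compared against the ±10 mm guideline) can be illustrated with a minimal sketch. This is not the analysis code used in the study: the site labels and coordinates are hypothetical, and the use of a 3D Euclidean distance from the mean position is an assumption, since the guideline could equally be applied per axis.

```python
import math

# ±10 mm guideline quoted in the text; applying it to the 3D Euclidean
# distance (rather than per axis) is an assumption of this sketch.
TOLERANCE_MM = 10.0


def mean_position(positions):
    """Return the component-wise mean of a list of (x, y, z) tuples in mm."""
    n = len(positions)
    return tuple(sum(p[i] for p in positions) / n for i in range(3))


def deviations_from_mean(positions):
    """Map each label to its Euclidean distance (mm) from the mean position."""
    centre = mean_position(list(positions.values()))
    return {label: math.dist(p, centre) for label, p in positions.items()}


def out_of_tolerance(positions, tolerance_mm=TOLERANCE_MM):
    """Return the sorted labels whose deviation exceeds the tolerance."""
    deviations = deviations_from_mean(positions)
    return sorted(label for label, d in deviations.items() if d > tolerance_mm)


# Hypothetical phantom positions (mm) relative to the applicator centre;
# these numbers are invented for illustration and are not study data.
example_positions = {
    "site_A": (0.0, 0.0, 0.0),
    "site_B": (2.0, -1.0, 3.0),
    "site_C": (-3.0, 2.0, -2.0),
    "site_D": (1.0, 1.0, 14.0),
    "site_E": (0.0, -2.0, 0.0),
}

if __name__ == "__main__":
    for label, d in deviations_from_mean(example_positions).items():
        print(f"{label}: {d:.1f} mm from mean position")
    print("outside tolerance:", out_of_tolerance(example_positions))
```

With these invented coordinates, only the measurement displaced along the third axis exceeds the tolerance. Tightening `tolerance_mm` to a subcentimeter value shows how quickly further measurements would be flagged.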
The present study is the first to evaluate the comparability of MR-compatible hyperthermia applicators using 3D-measurement of the MR temperature distribution in a realistic heterogeneous phantom. This work provides insight into the comparable performance of both MR-compatible hyperthermia systems for current clinical treatments, depending upon applicator-specific characteristics and precise patient positioning. The current study in representative phantoms indicates that unnoticed variation can be present in the applied hyperthermia between institutions, systems, different patients, and intra-individually between sequential treatment sessions. Hence, when compiling a 3D-visualisation obtained by multiple 2D measurements with associated sequential construction of the measurement, set-up errors noted in the present study will be strongly amplified and seriously affect the accuracy of the resulting 3D heating patterns. The value of the information made available in this manuscript lies in its ability to demonstrate that the experience obtained by historical 1D and 2D QA measurements cannot be directly translated to QA guidelines based on 3D measurements. In addition, the current study shows that if the hyperthermia society wants to progress to accurate quantitative comparison of the 3D temperature profiles obtained by MRT, a step has to be made toward more precise positioning of the phantom in the applicator. Moreover, an in-depth discussion is needed regarding how to translate clinical experience with variations in phases at the antenna feed point and antenna cross-talk. Modeling of these effects is not incorporated in current hyperthermia treatment, which undoubtedly adds to the uncertainty in translating the predicted energy to the clinical situation. This manuscript aimed to initiate such discussion toward a deeper evaluation of the clinically used systems and to reinforce the necessity of specific guidelines for deep MR-HT systems. 
Future efforts should incorporate this new knowledge when designing new QA guidelines and recommend the highest level QA tools available.

Conclusions
This investigation reports the first international multi-institution QA evaluation of MR-guided hyperthermia systems for the treatment of deep-seated tumors using radiofrequency applicators. A novel anthropomorphic phantom was developed to evaluate the performance of the systems. Comparable heating distributions were observed in MRT-derived thermal maps for both centric and eccentric targets. However, subcentimeter differences in positioning when performing QA measurements in heterogeneous phantoms were found to substantially affect the resulting heating patterns in the cross-sectional planes. This complicates the accurate evaluation of the 3D heating patterns and underlines the need for positioning accuracy. The integration of phase and amplitude evaluation at the feed points, advanced phantoms, and precise positioning into current and future QA procedures should be evaluated to guarantee the best possible quality of patient care.