The Impact of data selection and fitting on SAR estimation for magnetic nanoparticle heating

Magnetic fluid heating has great potential in the fields of thermal medicine and cryopreservation. However, variations among experimental parameters, analysis methods and experimental uncertainty make quantitative comparisons of results among laboratories difficult. Herein, we focus on the impact of calculating the specific absorption rate (SAR) using Time-Rise and Box-Lucas fitting. Time-Rise assumes adiabatic conditions, which is experimentally unachievable, but can be reasonably assumed (quasi-adiabatic) only for specific and limited evaluation times when heat loss is negligible compared to measured heating rate. Box-Lucas, on the other hand, accounts for heat losses but requires longer heating. Through retrospective analysis of data obtained from two laboratories, we demonstrate measurement time is a critical parameter to consider when calculating SAR. Volumetric SAR were calculated using the two methods and compared across multiple iron-oxide nanoparticles. We observed the lowest volumetric SAR variation from both fitting methods between 1 – 10 W/mL, indicating an ideal SAR range for heating measurements. Furthermore, our analysis demonstrates that poorly chosen fitting method can generate reproducible but inaccurate SAR. We provide recommendations to select measurement time for data analysis with either Modified Time-Rise or Box-Lucas method, and suggestions to enhance experimental precision and accuracy when conducting heating experiments.


Introduction
Magnetic nanoparticle or magnetic fluid hyperthermia (MFH) is a cancer treatment indicated as an adjunct treatment for recurrent glioblastoma with radiation therapy [1,2]. It comprises direct delivery of a magnetic fluid, i.e. aqueous suspension of iron oxide nanoparticles, into the tumor that generates heat when exposed to an alternating magnetic field [1,[3][4][5][6]. The therapeutic agent is heat produced by the nanoparticles, and thus potency or heating capability of the suspension is a critical parameter for treatment planning and execution [7][8][9]. Additionally, emerging clinical applications, such as neuromodulation and cryopreservation of tissues, exploit controlled heat production with magnetic nanoparticles [10][11][12]. A direct comparison of iron oxide nanoparticle (IONP) heating capability for MFH across laboratories has proven unreliable, thus challenging quantitative comparisons among IONP formulations to determine clinical suitability, and inhibiting development of universal standardized methodologies for clinical translation [13,14].
Hysteresis loss power heat generated by magnetic nanoparticles is often measured with simple constant pressure (atmospheric) calorimetric methods, from which normalized heating power is estimated. The calorimeter often comprises a thermally insulated vessel with thermometer that is placed within the induction coil. A known volume of sample, or nanoparticle suspension having a known concentration, is placed into the vessel. Heat generated by the nanoparticles with AMF is absorbed by the suspending medium (e.g. water), causing a temperature change (ΔT) which is measured. It is typically assumed that the temperature of the sample is reasonably homogeneous, and that the temperature probe measurement accurately reflects the temperature throughout the sample volume. Using the sample specific heat (i.e. typically assumed to be the medium), estimates of the energy absorbed are then performed from analysis of changes of temperature within a specified time interval. An inherent assumption with this methodology is that the temperature changes measured in the sample occur only from heat generated by the nanoparticles in response to the AMF. Heat transfer between the sample and the environment, if unaccounted for, will inevitably generate erroneous values of loss power heating. The method of analysis, therefore must be carefully chosen to incorporate the limitations of the measurement methods and/or apparatus, and the data collected. It is often the case that estimation of heating power implicitly assumes adiabatic conditions (Eq. 1). This condition may not be strictly possible given that the induction coil can be either heat source or heat sink, depending on specific properties of its operation (e.g. current load, voltage, cooling capacity of coolant, etc.). Further, this method is limited by errors within temperature measurements, which arise from limitations of equipment resolution, experimental setup (heat exchange with surroundings, field inhomogeneities/instabilities, chemical/morphological changes in the sample over time), execution, uncertainty in iron mass determination and thermometry measurement [15][16][17][18]. Additionally, various methods are used to fit and analyse data, further contributing to reduced consistency and reliability [8,18,19]. Often method(s) used to estimate normalized dissipated power from temperature-time measurements is/are fixed without considering the underlying assumptions for the calculations, and limitations of the apparatus used to measure heat generated. For iron containing magnetic nanoparticles, specific loss power (SLP) is often reported as a function of iron mass to indicate IONP heating capability (W/g Fe) [3,14]. Assuming no heat exchange with surroundings, no work performed, no chemical or physical (change of state) changes in the sample, and constant pressure, the relationship between SLP, power (P) and the change in temperature are: dT dt (1) where c = the specific heat (J/ g.°C), m s = sample mass, m Fe = the mass of iron (g), and dT/dt is the temperature change within a defined time interval, or first derivative of the temperature as a function of time (°C/s). Volumetric SAR can be described as power output normalized to sample volume (SAR v , W/mL): where, ρ is the sample density, c Fe is the iron concentration in the sample. While SAR v and SLP can be easily converted as shown in Equation (2), there are advantages to working with SAR v alone from the point of view of calorimetry and heat transfer. First, the sample density and specific heat (i.e. water, tissue or other) are largely unchanged for low concentrations of nanoparticles and can be directly measured prior to calorimetry. This means that SAR v can be reasonably estimated based on the dT/dt even without precise knowledge of iron content or SLP as long as our calorimetric assumptions hold. This means that estimation of error in fitting SAR v , a main feature of the present work, is more directly correlated to the calorimetry itself and not compounded by possible errors from iron concentration estimation [21,22,34,35]. In addition to being a concept that can be directly related to the calorimetric measurement it also becomes an estimate of heat that can be produced by iron oxide nanoparticles in complex systems (i.e. tissues/organs) for known IONP concentrations [8,[20][21][22]. This in turn can be used in computational heat transfer modelling efforts employing the bioheat transfer equation. We report results of comparisons between fitting methods and power output from heating measurements obtained for several samples in two different laboratories. For the evaluated datasets, sample density was consistent; however, IONP concentrations among samples varied. Therefore, we use SAR v for comparison because it correlates with power for the scope of the evaluated datasets.
Using the Time-Rise method, SLP, and SAR v are calculated from the slope (dT/dt) by fitting T vs t [18,19]. The Time-Rise calculation can be used during low heat loss conditions (i.e., a well-insulated system, short experimental evaluation times, and low SAR v ).
Box-Lucas analysis assumes a non-adiabatic system where the convective heat losses can be approximated as a loss term with the difference between the sample and ambient temperature. Wildeboer et. al. provide a derivation of the Box-Lucas equation based on this assumption which results in an exponential growth dependence of temperature as a function of time [14] ( Figure 1): where, L is the heat loss from the sample, and P is the power generated by the sample.
Multiplying the amplitude (P/L) and exponential factor (L/c) from fitting the Box-Lucas equation (Eq. 4) provides P/c is shown in Eqs. 1 & 2 to be equivalent to dT/dt and can be used to calculate SLP, and SAR v . We note that radiative heat losses can be linear and important when ΔT << T.
Several alternate calculation methods have been proposed to compensate for heat losses. Specifically, measurements of the temperature steady state occurring at the end of the heating phase of the experiment or the temperature decay during the cooling phase can be used to directly calculate the heating losses of the system. (Figure 1) [14-16, 23, 24]. Calculation of heat losses allows for the determination if linear heat losses are dominant [14]. For convenience, SAR v is often calculated from the heating phase of the experiment using either Time-Rise (i.e. initial slope), or Box-Lucas fitting. Due to the different assumptions, discrepancies between estimates of SAR v obtained from Time-Rise and Box-Lucas fitting can occur [18,19,25]. The many sources of error within the measuring equipment and physical sample can affect the accuracy and precision of SAR v [14,16,[24][25][26]. The selection of Time-Rise or Box-Lucas fitting to calculate SAR v will variably affect precision and accuracy depending on the selected measurement time interval and variance within the data. Herein, we perform a direct comparison between Box-Lucas and Time-Rise fitting methods by reanalysing previously published datasets to determine the appropriate conditions (optimal SAR v range, length of evaluation time and starting times) for using Box-Lucas and Time-Rise fitting methods.

Datasets:
These two datasets were collected from several different IONPs, under different conditions (frequency, magnetic field strength, IONP concentration), from two different laboratories (Johns Hopkins University & University of Minnesota). The acquisition time of the temperature-time datasets was sufficient to enable a comparison of variations in evaluation time.

Retrospective Data Analysis:
Retrospective data analysis was performed with MATLAB 2012b on temperature-time datasets published in Bordelon et. al. and Etheridge et. al [19,23]. Four different evaluation time scenarios were compared: (1) 30s; (2) 60s; (3) entire dataset (varied); (4) R 2 optimization. The R 2 optimization is similar to the Pearson correlation coefficient (R 2 ) analysis used in Etheridge et. al., however, instead of the use of conditional statements, the R 2 optimization used here was simplified to select only the evaluation time based on the highest R 2 value for both Time-Rise and Box-Lucas fitting [27]. We compared the effects of (1) heating start time; (2) 5s after heating start time; (3) following the criteria for the quasiadiabatic regime. The discrete derivative required to satisfy the quasi-adiabatic criteria, was observed to amplify noise within the measurement. Within our assessment, the identification of the quasi-adiabatic criteria was tested across a 10s timeframe and was met when the slope of the time change and the average of the first derivative were within 5%. Noise was removed from the first derivative using total-variation regularization [28].
Once the timeframe for evaluation was defined, Box-Lucas and performed six replicates for each measured condition allowing for the assessment of reproducibility. The evaluation times varied from 10 to 60 s and the start time ranged from 0 -10 s after heating began. The evaluation time was shorter than the total heating time, 300s. Only Time-Rise calculations were performed and the analysis timeframe for these datasets was selected based on a series of conditional criteria to minimize the fit error assessed by the Pearson correlation coefficient (R 2 ) [27].
Soetaert et. al. demonstrated the impact of analysis time frame on the calculated SLP [18].
The effect of evaluation time is intuitive based on the assumptions made for Time-Rise and Box-Lucas fitting and is discussed in further detail below. The selection of start time affects the calculation of SAR v , due to the observed non-linear response between starting the heating experiment and measurable changes in temperature. The cause of the non-linear response has been attributed to the response of inductive heating mechanisms, diffusion of heat within the sample, temperature probe delay, thermal mixing, colloidal dispersion of IONPs, field homogeneity, and heat transfer properties of the sample [14,15,18,24,[30][31][32].
Regardless of the cause, the non-linear response at the beginning of the heating experiment needs to be adjusted and reported. Soetaert et. al. [18] demonstrated a method to select the dataset within the 'quasi-adiabatic' regime (i.e. the time frame where temperature changes are approximately linear within a non-adiabatic system). The quasi-adiabatic regime is defined by comparing dT/dt calculated from the first derivative and the slope of the dataset [18].

Case 1: Bordelon et. al.[19]
The range of the SAR v and RMSE calculated with all start and evaluation time scenarios are shown in Figure 2, demonstrating the most stable results despite fitting method, for a SAR v between 1 and 10 W/mL. At low SAR v (<1 W/mL) an increase in RMSE and SAR v variation is observed, indicating the lower limit of SAR v measurement (see red circle, Figure  2). The comparison of fitting parameters was challenging when SAR v > 1.2 W/mL, due to the short duration (< 30s) removing the ability to evaluate the different evaluation times (30s, 60s, and full data range). The shortened timeframe results in a wider range in SAR v for Box-Lucas than Time-Rise. The SAR v range from 0.2 -1.2 W/mL was selected to compare multiple evaluation times and adjust the starting point, because more data allowed for a comparison among all evaluation times. SAR v did not show high variability (Supplemental Figures S1 and S2). Within this range of SAR v values, it is observed that the Time-Rise fitting method is more sensitive than the Box-Lucas fitting method to the chosen evaluation time. This is expected since with increasing evaluation time the effect of heat loss on the dataset becomes noticeable and the assumed adiabatic assumption needed for Time-Rise is violated. More specifically within the dataset ( Figures S1 & S2), we observed that the calculated SAR v decreased while RMSE increased with increasing evaluation time. Therefore, SAR v is underestimated when a long evaluation time which includes heat losses is selected. RMSE is more sensitive to heat losses caused by long evaluation times and can be used to select an appropriate evaluation time.

Figures 3.C and 3.D show
Box-Lucas fitting to be more sensitive to start time than Time-Rise fitting. While not observed within this comparison, if the selection of the start time creates a dataset that includes the initial heating non-linear response, the SAR v will be underestimated with Time-Rise fitting due to the decrease in the slope of the linear fit [14]. The Box-Lucas fitting method shows an increase in SAR v and a decrease in RMSE ( Figures  S1 & S2) caused by selecting a start time that does not include the initial heating non-linear response (i.e. analysis is performed 5 s after the experimental start time). Therefore, when the timeframe selected incorporates data from the initial 5 s, inaccurate results are acquired. For Box-Lucas fitting, including the initial non-linear heating response causes a decrease in the exponential factor to fit the curvature of the temperature change as a function of time.
For both Time-Rise and Box-Lucas fitting methods, the assumption is that heat is being deposited into the sample/system; the initial heating non-linear response indicates a timeframe where steady-state heat generation is not observed and should not be used for analysis. It is important to note the variation in SAR v is relatively low (< 10%) with changes in the evaluation time and start time; however, the effects of selecting the time frame of the dataset can be observed with the RMSE.
Finally, rigidly selected evaluation and start times were compared with automated selection methods for the evaluation time (R 2 optimization) and start time (quasi-adiabatic criteria). Both automated methods resulted in a reduction in RMSE ( Figure S2). For the Time-Rise fitting method, the calculated SAR v increased with increasing heat production when using R 2 optimization. The R 2 optimization was able to reduce the evaluation time to as short as 5 s, which allowed the dataset to comply with the adiabatic assumption of Time-Rise as experimental conditions (i.e. IONP, frequency, and magnetic field amplitudes) resulted in an increase of SAR v . Therefore, the observed increase in SAR v with a decrease in RMSE indicates a shortened dataset removing heat loss effects. R 2 optimization had negligible impact on SAR v or RMSE calculated through the Box-Lucas fitting method. Furthermore, the use of the quasi-adiabatic selection for start time, did not have a significant effect on SAR v or RMSE compared to the 5 s delay start time.  Figure S4) and RMSE to be over 350% higher than SAR v (Supplemental Figure S5). The large effect on Time-Rise SAR v and RMSE is caused by the 300 s evaluation time causing a violation of the adiabatic assumption.
Very few datasets were observed to have RMSE < 10% SAR v (Supplemental Figure S5); however, the standard deviation of replicate measurements was < 10% SARv with rigid time-frame selection (Figure 4). Although a very high RMSE (> 500% SAR v ) was calculated for many datasets, this did not impact the repeatability of using the Time-Rise fitting for analysis. It is important to recognize that a high RMSE indicates a deviation from the assumptions of the fitting method. For Time-Rise fitting, as the SAR v increases the heating losses become sufficient to violate the adiabatic assumption. Similarly, a violation of the linear heating losses assumption for Box-Lucas fitting should occur as SAR v increases, although this was not observed for the range of SAR v evaluated. Therefore, it is important to report RMSE to indicate the accuracy of the assumptions made with the fitting method used to calculate SAR v and to report the standard deviation as an indication of the ability to repeat measurements. Figure 4 demonstrates the impact evaluation time has the on repeatability of measurement, as well as the impact of using an automated method, such as the R 2 optimization for selecting the evaluation time. The selection of start time, resulted in the variations of the calculated SAR v , however, no trend based on the start time selected is visible (Supplemental Figure S6). Furthermore, automating the start point selection using the quasi-adiabatic method did not result in an increased variation in the repeatability of the measurements. Within the datasets, five data points were removed as outliers using the Q-test for comparing SAR v across the different fitting parameters. These outliers only occurred for automated fittings (both R 2 optimization and quasi-adiabatic selection). Additionally, the high noise within the discrete derivative was problematic and was removed by using a total-variation regularization [28]. The increase in this effect as SAR v decreases indicates automated methods for selecting the dataset timeframe become less robust with increasing noise. Table 1 summarizes recommendations for parameters to report for fitting with either linearrise or Box-Lucas, depending on the range of SAR v . Overall, the evaluation of RMSE, for a dataset and a fitting method (Time-Rise vs Box-Lucas) with its underlying assumptions, will help guide the selection of evaluation time and start time for that fitting method. Generally, while employing the Time-Rise fitting method, short time intervals (<30 s) that satisfy the quasi-adiabatic criteria are suitable for SAR v estimation (small RMSE). Box-Lucas estimation, on the other hand, is less influenced by the length of the evaluation time interval, if losses in the system are linear. However, high SAR v values (eg., SAR v >10 W/mL) can result in high RMSE using the Box-Lucas method because of non-linear losses in the measurement. In such cases, Time-Rise fit, constrained by the quasi-adiabatic criteria, can be a better fitting method. For both fitting methods, a target SAR v range of 1-10 W/mL (specific to the datasets evaluated here), appears to generate consistent SAR v values. An initial calibration of c(Fe) vs SAR v across a broad range of iron concentration (e.g. 1-10 mg Fe/mL) may be useful to ascertain an optimal SAR v range for a specific laboratory and setup. For both fitting methods, it is recommended to select a start time from the T vs t data that does not include the initial heating non-linear response (i.e. remove temperature data in the 0-5 s from the onset of heating in the fitting analysis). Additionally, reduction in peakto-peak noise in the temperature measurement to ≤ 0.06 °C is desirable to reduce its contribution to the RMSE. Calibration of the AMF coil, calorimeter, temperature probes and the use of appropriate insulation are essential to characterizing and reducing the peak-topeak noise.
The challenges of reliably comparing SAR v across laboratories are related to the complex nature of the heat losses involved with the heating system and the sample. Most research findings report field strength, field amplitude, and sample information; however, many details related to the experimental setup are necessary to understand the heat losses within the system. Calibration of calorimeter, field properties, and other components of the apparatus are rarely performed, presenting significant challenges for comparisons absent standard reference materials. Additionally, the precision and accuracy with which the iron content is quantified will affect the calculated SLP. While ICP-MS is the gold-standard for iron quantification due to its sensitivity (ng/L detection limits), less expensive spectrophotometric assays have been optimized to measure iron content in 0.1-1μg/mL range [34,35], which should be suitable for SLP evaluation. Finally, the sample volume used for measurements has been shown to critically affect the results, likely due to heat transfer between sample and environment [25,26]. We recommend researchers report at least the experimental details listed in Table 2 to indicate the sample preparation and heat losses within an experimental setup. Finally, it cannot be overlooked that each experimental setup has an optimal range of SAR v and adjustment of IONP concentration to be within that range can reduce variation results caused by fitting method.

Conclusion
Valid selection between Time-Rise and Box-Lucas analysis for heating experiments depends on the experimental setup and the systematic error introduced in the calorimetric measurement. If a heating range appropriate for the apparatus is selected, where variation in reported SAR v is minimal, the choice between parameters for analysis are within 10% of each other. For the experimental measurements evaluated here, this range was 1 -10 W/mL. Recommendations for selecting between Time-Rise or Box-Lucas parameters are provided in Table 1.
Outside of the optimal SAR v measurement range, a consistent analysis method can allow for repeatable SAR v results, although accuracy is questionable. Selection of analysis methods and parameters can hardly compensate for poor quality data obtained. Proper calibration of calorimeter, thermometers, AMF field and frequency, etc. are absolute requirements for conducting such measurements. The assumptions of Time-Rise and Box-Lucas fitting should be considered when selecting evaluation time and start time, although the underlying cause of this is presently unclear. A decrease in RMSE can act as a helpful metric for comparing the selection of evaluation time or start time; however, automated methods optimized on minimizing RMSE can produce higher variations in results if a measurement is noisy. In short, these anomalies should be considered as indications that systematic (or systemic) errors exist in the measurement methods or apparatus, bearing a re-examination. Schematic of adiabatic, and non-adiabatic heating curves during the heating (coil on) and cooling (coil off) phases of a heat experiment.  Effect of variation in SAR V and RMSE using Box-Lucas and Time-Rise fitting as a function of SAR V . Red circles indicate higher variation of SARv and RMSE caused by measurements performed in a sub-optimal heating range. The blue square indicates the range with a consistent SARv despite fitting method.   Figure 3C evaluates the percentage difference in SAR between (1) SAR V calculated excluding the first 5 s and (2) SAR V calculated including t=0-5 in the selected data range. Similarly, 3D evaluates the percentage difference in RMSE between these 2 cases. In this case, Box-Lucas fitting is observed to have larger variations in both SAR V and RMSE than variations observed with Time-Rise fitting.  Repeatability of SAR V measurement. Plot A shows the standard deviation of the replicate measurements as a function of evaluation time (using 5s start time) for SAR V < 0.1 W/mL. Plot B right shows the standard deviation of the replicate measurement as a function of evaluation time for SAR V > 0.1 W/mL. An increase in variation is observed at lower SAR V . Furthermore, the R 2 optimization results in more variable results than a statically set evaluation time.