Forensic profiling of non-volatile organic compounds in soil using ultra-performance liquid chromatography: a pilot study

Abstract Soil is of particular interest to the forensic community because it can be used as valuable associative evidence to link a suspect to a victim or a crime scene. Liquid chromatography is a powerful analytical tool for organic compound analysis. Recently, high-performance liquid chromatography (HPLC) has proven to be an efficient method for forensic soil analysis, especially in discriminating soils from proximity locations. However, ultra-performance liquid chromatography (UPLC), which is much more sensitive than HPLC, has never been explored in this context. This study proposed a UPLC method for profiling non-volatile organic compounds in three Malaysian soils (red, brown and yellowish-brown soils). The three soils were analysed separately to assess the effects of individual chromatographic parameters: (a) elution programme (isocratic vs. two gradient programmes); (b) flow rate (0.1 vs. 0.2 mL/min); (c) extraction solvent (acetonitrile vs. methanol) and (d) detection wavelength (230 vs. 254 nm). The injection volume and total run time were set to 5 µL and 35 min, respectively. Consequently, each soil sample gave 24 different chromatograms. Results showed that the most desirable chromatographic parameters were (a) isocratic elution; (b) flow rate at 0.2 mL/min and (c) acetonitrile extraction solvent. The proposed UPLC system is expected to be a feasible method for profiling non-volatile organic compounds in soil, and is more chemical-efficient than a comparable HPLC system.


Introduction
Soil is a complex matrix consisting of loose inorganic and organic materials. By nature, soil is primarily composed of mineral particles [1]. However, a multitude of human activities have directly or indirectly contributed varying amounts of extraneous materials to soils. For example, organic materials are introduced into soils through the excessive use of fertilizers and pesticides by farmers [2,3]. In contrast, heavy metal contamination in soils is typically caused by industrial activities, such as mining and mineral processing [4,5]. Therefore, soils originating from different locations, regions or countries tend to be unique in composition. In practice, the evidential value of soil strongly depends on the diversity of the soil composition [6].
Numerous instrumental techniques have been applied to soil analysis in agricultural and geoscience studies [7][8][9][10][11][12][13]; however, optimizing the analytical methods for a very small quantity of sample is seldom a prime consideration. In a forensic context, the amount of soil that may be readily recoverable from a victim or suspect could be minute [14][15][16][17]. Hence, analytical methods that can acquire information from a minimal quantity of soil are the most needed in forensic soil analysis.
Fundamentally, soil variability can be evaluated by physicochemical, chemical and biological properties [18][19][20]. Characterization of physicochemical properties, including pH, moisture content, particle size distribution, colour, soil density and humic acid content [21,22], is the simplest way to discriminate between soils. However, in practice, the comparison of soils by analysis of physicochemical properties alone seldom provides reliable discrimination [23]. To address this problem, various sophisticated instrumental techniques have been explored in the context of forensic soil science, mainly for studying the chemical composition of soil [24][25][26][27][28][29]. Recently, McCulloch et al. [30][31][32] published several articles regarding the feasibility of high-performance liquid chromatography (HPLC) in discriminating soils for forensic purposes. The authors concluded that HPLC has the potential to provide an accurate and practical method of differentiating soils originating from areas in close proximity.
To date, there has been no report on the merits of advanced HPLC (i.e. ultra-performance liquid chromatography (UPLC)) in forensic soil characterization. Technically, the particle size in a UPLC column (1.7 µm) is much smaller than that employed in HPLC systems (3-5 µm). Because of this difference, UPLC systems tend to outperform HPLC systems and show: (i) higher sensitivity, (ii) increased resolution, (iii) shorter analysis time and (iv) reduced consumption of eluent [33][34][35]. From this perspective, UPLC could be an alternative technique to HPLC for informative fingerprinting of soil and is especially advantageous when the amount of soil available for analysis is very small. Therefore, the present study proposed a UPLC method to profile the non-volatile organic compounds of three Malaysian soils (red, brown and yellowish-brown soils). The soils were collected from the fern garden of the National University of Malaysia (Universiti Kebangsaan Malaysia; UKM). Four different UPLC parameters (elution programme, flow rate, detection wavelength and extraction solvent) were studied using the soil samples. The best UPLC chromatographic conditions were determined and then validated with respect to repeatability of peak retention time. The discrimination power of the proposed method is not reported herein but will be the topic of future work.

Sample preparation
Three soil samples (red, brown and yellowish-brown soils) were collected from the fern garden of UKM, Malaysia. The soils were sampled following the grid-pattern procedure described by Pye [6], which was successfully applied by McCulloch et al. [30][31][32]. The procedure enables the researcher to assess the intra-location variability of the soil. First, locations showing one of the three soil colours were identified by a quick visual inspection. Then, a 1 m² square grid was placed on the ground, and topsoil (0-10 cm) from the four corners and centre point of the grid was collected using a stainless-steel spatula. The other two soil samples were collected using the same procedure. The collected soil samples were transferred to the laboratory in separate clean, labelled zip-lock plastic bags within a day of sampling.
The soil extraction method was based on the previous work of McCulloch et al. [30][31][32] with some modification. The samples were first dried in the laboratory at room temperature overnight. Each sample was then ground using a mortar and pestle and sieved through a stainless-steel analytical sieve (600 µm). The fraction that passed through the sieve was placed in a glass petri dish and dried in an oven at 60 °C for 3 h. Dried soil (0.5 g) and 1.0 mL of HPLC-grade acetonitrile (ACN) (Fisher Chemical, Maharashtra, India) were placed into a 1.5-mL microcentrifuge tube. The snap cap of the tube was closed tightly before the tube was sonicated for 20 min. After the extraction step, the tubes were centrifuged at 13 000 rpm for 15 min. The supernatant (containing non-volatile organic compounds extracted from the soil) was carefully collected using a syringe and then filtered through a 0.2-µm polytetrafluoroethylene syringe filter into an HPLC vial.
To assess the impact of extraction solvent, another batch of extracts was prepared by the same extraction procedure but with HPLC-grade methanol (Fisher Chemical) as the solvent.

UPLC analysis
UPLC was performed with a Waters ACQUITY UPLC™ system (Waters Corp., Milford, MA, USA) equipped with a binary solvent manager, autosampler and photodiode-array detector (PDA). Chromatographic separation was carried out on a Waters ACQUITY UPLC™ BEH C18 column (2.1 × 50 mm, 1.7 µm particle size). The mobile phase, consisting of (A) water containing 10% ACN and (B) HPLC-grade ACN (Fisher Chemical), was used in three different elution programmes as shown in Table 1. The gradient-2 (G2) programme was used by McCulloch et al. [30][31][32], whereas the isocratic (ISO) programme was proposed by Bommarito et al. [24]. The gradient-1 (G1) programme is an adaptation of the G2 programme and is first reported in this work.
Apart from comparisons of elution programme (ISO vs. G1 vs. G2 programme) and extraction solvent (ACN vs. methanol), this study also evaluated: (i) flow rate (0.1 vs. 0.2 mL/min) and (ii) detection wavelength (230 vs. 254 nm). In essence, this work has considered 24 different chromatographic conditions (3 elution programmes × 2 extraction solvents × 2 flow rates × 2 detection wavelengths). The injection volume and total run time were fixed at 5 µL and 35 min, respectively, for each analysis. To avoid carryover, the autosampler needle and loop were purged with 600 µL of 10% ACN (weak wash), followed by 200 µL of 100% ACN (strong wash) before sample injection.
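The count of 24 conditions follows from a full factorial combination of the four parameters described above. The following Python sketch enumerates them; the parameter levels are taken from the text, whereas the dictionary keys and variable names are illustrative assumptions and not part of the original workflow.

```python
from itertools import product

# Parameter levels as stated in the text; the key names are illustrative only.
levels = {
    "elution": ["ISO", "G1", "G2"],
    "flow_rate_mL_min": [0.1, 0.2],
    "solvent": ["ACN", "methanol"],
    "wavelength_nm": [230, 254],
}

# Full factorial design: one level of every parameter per condition.
conditions = [dict(zip(levels, combo)) for combo in product(*levels.values())]

print(len(conditions))  # 3 * 2 * 2 * 2 combinations
for condition in conditions[:3]:
    print(condition)
```

Each dictionary in `conditions` corresponds to one chromatogram recorded for a given soil sample.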
Data were collected using the built-in Empower™ 2 software (Waters Corp.). The peak height and area were integrated by manually setting the baseline using the software.

Statistical analysis
Principal component analysis (PCA) and hierarchical clustering analysis (HCA) are among the most popular multivariate exploratory tools. Both techniques reduce the high-dimensional chromatographic data and present variations of the data in graphical representations, such as score plots in PCA and dendrograms in HCA.
PCA constructs latent variables according to the correlation/covariance among the explanatory variables [36][37][38]. In this work, the latent variables were constructed by considering all the peak area values reported in the chromatograms. Then, the distribution of the samples (i.e. the 24 different chromatographic conditions for a particular soil sample) was inspected based on a two-dimensional score plot. Theoretically, similar chromatograms would be clustered together in the score plot, whereas dissimilar chromatograms would be located far apart from each other.
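The construction of PCA scores from a matrix of peak areas can be illustrated with a short Python sketch. It is not the R analysis used in this work: it is a minimal covariance-based PCA using power iteration, and the peak area values, function names and the choice of covariance (rather than correlation) scaling are all assumptions made for illustration.

```python
import math

def centre_columns(matrix):
    """Subtract the column mean from every value."""
    n = len(matrix)
    means = [sum(col) / n for col in zip(*matrix)]
    return [[v - m for v, m in zip(row, means)] for row in matrix]

def covariance(centred):
    """Sample covariance matrix (n - 1 denominator) of centred data."""
    n, p = len(centred), len(centred[0])
    return [[sum(r[i] * r[j] for r in centred) / (n - 1) for j in range(p)]
            for i in range(p)]

def leading_eigenpair(cov, iterations=500):
    """Power iteration; assumes the start vector is not orthogonal to the answer."""
    p = len(cov)
    v = [float(i + 1) for i in range(p)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(p)) for i in range(p))
    return eigenvalue, v

def pca_scores(matrix, n_components=2):
    """Return sample scores and the fraction of variance explained per component."""
    centred = centre_columns(matrix)
    cov = covariance(centred)
    total_variance = sum(cov[i][i] for i in range(len(cov)))
    components, explained = [], []
    for _ in range(n_components):
        eigenvalue, v = leading_eigenpair(cov)
        components.append(v)
        explained.append(eigenvalue / total_variance)
        # Deflation: remove the component just found before searching for the next.
        cov = [[cov[i][j] - eigenvalue * v[i] * v[j] for j in range(len(v))]
               for i in range(len(v))]
    scores = [[sum(x * c for x, c in zip(row, comp)) for comp in components]
              for row in centred]
    return scores, explained

# Hypothetical peak areas: rows are chromatograms, columns are peaks.
areas = [
    [10.0, 5.0, 1.0],
    [12.0, 6.5, 1.2],
    [30.0, 2.0, 0.9],
    [28.0, 2.5, 1.1],
]
scores, explained = pca_scores(areas)
for row in scores:
    print([round(value, 3) for value in row])
print([round(fraction, 3) for fraction in explained])
```

In this toy matrix the first two rows and the last two rows have similar peak areas, so they are expected to lie close together along the first principal component.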
In contrast, HCA employs multivariate distance metrics (e.g. Mahalanobis distance or Euclidean distance) to assess the similarities or dissimilarities among the samples [39]. The hierarchical relationship among the samples is typically illustrated in a dendrogram. Essentially, similar chromatograms are expected to be joined via a shorter distance in the dendrogram, whereas dissimilar chromatograms would be connected via a greater distance.
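The way a dendrogram joins similar chromatograms at shorter distances can be sketched in Python as follows. This is a minimal agglomerative clustering using Euclidean distance and single linkage; the linkage rule, the function names and the peak area values are assumptions for illustration, since the text does not state which linkage was used.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two peak-area vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def agglomerate(samples):
    """Merge clusters step by step; return (members_a, members_b, distance) per merge."""
    clusters = [[i] for i in range(len(samples))]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(euclidean(samples[a], samples[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((sorted(clusters[i]), sorted(clusters[j]), d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges

# Hypothetical peak areas: rows are chromatograms, columns are peaks.
areas = [
    [10.0, 5.0, 1.0],
    [12.0, 6.5, 1.2],
    [30.0, 2.0, 0.9],
    [28.0, 2.5, 1.1],
]
merges = agglomerate(areas)
for left, right, distance in merges:
    print(left, right, round(distance, 3))
```

The merge distances play the role of the joining heights in a dendrogram: the two pairs of similar rows merge first at short distances, and the two resulting groups are joined last at a much greater distance.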
In brief, PCA and HCA were employed to reveal information about (relative) variations of chromatograms for the four UPLC parameters of flow rate, extraction solvent, detection wavelength and elution programme. All statistical analysis was performed using the R software for statistical computing (R version 3.6.2, https://cran.r-project.org/bin/windows/base/old/3.6.2).

Results
The most desirable UPLC chromatographic conditions were first determined by visual inspection of the 24 chromatograms generated for each of the three soil samples. Then, the interpretation derived from the visual examinations was further verified by analysis of the PCA score plot and the HCA dendrogram. For the sake of brevity, only the chromatograms of the red soil are discussed in detail, whereas those of the other two soils are described briefly. In contrast, plots resulting from the two multivariate statistical tools are discussed for all three soil samples. The best UPLC chromatographic conditions were validated with respect to peak retention time repeatability.

Visual examination of chromatograms
The chromatograms for red soil obtained at flow rates of 0.1 and 0.2 mL/min are presented in Figures 1 and 2, respectively. The chromatograms of the brown and yellowish-brown soil samples as well as the chromatograms of the blank sample are available in Supplementary Figures S1-S5. Fundamentally, the blank sample did not contribute any significant peaks to the chromatograms of the soils. In addition, the quality of the peaks was not compromised by the rather small absolute response values presented by most of the chromatograms. Most of the peaks resembled typical Gaussian shapes [41].
In general, the chromatograms for red soil obtained using different elution programmes varied from each other, whereas flow rate, detection wavelength and extraction solvent caused only minor variations in the chromatograms. Thus, variations in absolute response value and in the overall chromatographic pattern were primarily influenced by the elution programme.
Although the absolute response values in the G1 chromatograms are rather high (0.10 AU), the respective chromatogram baselines were unstable (Figures 1A-D and 2A-D). In contrast, the lowest absolute response value was observed in the ISO chromatograms (0.005 AU). Nonetheless, baseline fluctuation was insignificant in the ISO chromatograms (Figures 1I-L and 2I-L). Theoretically, baseline drift is a common problem in gradient elution that is caused by poor column re-equilibration or differences in the absorbance properties of the mobile phases [41,42]. Fundamentally, the baseline refers to the part of the chromatogram recorded when only mobile phase passes through the detector. Therefore, the baseline should be a straight line without visible peaks. This means that an unsteady baseline may render chromatogram data analysis and interpretation unreliable or inaccurate. On that basis, isocratic elution is preferred over gradient elution.
Chromatograms obtained at different wavelengths showed similar chromatographic patterns, although the absolute response values and the number of minor peaks showed variation. In general, chromatograms obtained at 254 nm showed slightly higher absolute response values than the corresponding chromatograms obtained at 230 nm. However, chromatograms obtained at 254 nm tended to present fewer peaks than chromatograms recorded at 230 nm. Overall, we considered that both detection wavelengths are complementary to each other because each presented several peaks that are unique.
Varying the flow rate appeared to have little effect on overall chromatographic separation. In the ISO chromatograms, separation was faster at 0.2 mL/min than at 0.1 mL/min. This was expected because 0.2 mL/min allows more mobile phase to pass through the column within the same period. Therefore, the flow rate of 0.2 mL/min was selected in this study because it allows faster analysis while presenting a number of peaks and a chromatographic pattern similar to those obtained using 0.1 mL/min.
To examine the influence of the extraction solvent, only the ISO chromatograms were inspected carefully because they were considered the most stable. Generally, the ACN extracts were preferred over the respective methanol extracts because the former produced steadier baselines with less noise. ACN was therefore chosen as the extraction solvent.
Mobile phase rich in ACN was observed to have more eluting power, and was able to separate more peaks. This can be observed in the G1 chromatograms, in which most peaks were eluted in the region of 10-25 min (Figure 1A-D). Table 1 shows that this time interval used 100% ACN as the mobile phase. Thus, we considered that a mobile phase rich in ACN promotes the chromatographic separation of soil sample components.
The relative differences among the chromatograms of the brown and yellowish-brown soils were evaluated by referring to Supplementary Figures S1-S4. Overall, the variations caused by the parameters were similar to those observed in the chromatograms for red soil. Notably, the elution of non-volatile organic compounds in soil was improved by using mobile phase rich in ACN, especially when the ISO programme was adopted. However, the impact of flow rate was almost imperceptible except for the effect on total run time. Visual inspection suggested that the best chromatographic conditions for the three soil samples were isocratic elution, detection at 230 and 254 nm, ACN extract and a flow rate of 0.2 mL/min.
It is noteworthy that the chromatograms of the red, brown and yellowish-brown soils were different from each other. Red soil produced the most peaks, followed by brown soil, and the yellowish-brown soil showed the fewest peaks. However, at this point, we emphasize that this work was aimed at establishing the best UPLC chromatographic parameters for qualitative profiling of non-volatile organic compounds in Malaysian soils rather than discriminating the three soil types.

Multivariate exploratory examination
Typically, the purpose of multivariate exploratory tools like PCA and HCA is to discriminate or differentiate samples by origin or source [20,43,44]. However, in this work, PCA score plots and HCA dendrograms were employed to simultaneously illustrate the relative differences between the 24 chromatograms per soil sample in a two-dimensional space. This allowed us to verify the interpretation based on visual examination of the impacts of the four parameters (elution programme, extraction solvent, detection wavelength and flow rate) on the soil chromatograms. For this purpose, the 24 chromatograms of each soil sample were arranged in a matrix in which the rows and columns respectively represented the chromatograms and the peak area values. Figure 3 shows the score plots generated using peak area values of the 24 chromatograms for the three soil samples. The tight clusters in the primary score plots (Figure 3, left panels, pink square boxes) are expanded and shown in the right panels of Figure 3. The corresponding dendrograms are shown in Supplementary Figure S6. The total explained variations represented by the first two principal components (PC1 and PC2) were about 85%, 65% and 67% for red, brown and yellowish-brown soils, respectively. Each score plot illustrated the similarities among the chromatograms because each plot accounted for more than 60% of the total variance.
In general, the 24 chromatograms did not form clearly separated clusters for any of the soil samples. However, careful inspection of Figure 3 revealed that some of the chromatograms were clustered according to elution programme, with a few chromatograms of red soil also clustered according to flow rate. In contrast, it was difficult to identify clusters based on extraction solvent or detection wavelength. These findings are in line with the visual examination, which suggested that elution programme was the primary factor causing differences among the 24 chromatograms of a given soil.

Validation of best chromatographic conditions
The primary purpose of forensic soil analysis is to determine whether a particular soil shares a similar origin with a known soil. Typically, samples showing similar chromatographic profiles can be claimed to share a similar origin. Therefore, quantitation and identification of compounds present in the soil are rarely the prime concern of the forensic soil analyst. In any case, the chromatographic method intended for comparative analysis must be validated with respect to retention time repeatability, reproducibility and robustness [45][46][47][48]. As this is a pilot study, we have not performed a well-planned validation experiment but have simply assessed the retention time repeatability, whereas the robustness of the UPLC method has been interpreted based on the variations observed in the PCA score plots and HCA dendrograms.
The robustness of an analytical method refers to its capability to remain unaffected by small, intentional changes in the operational parameters of the method [45]. Typically, the development of a chromatographic method starts with the selection of column, eluent system and then the elution mode [42]. Because of technical constraints, this work considered only three different elution programmes using the same column. Therefore, the robustness of the three elution programmes was evaluated by varying the flow rate, detection wavelength and extraction solvent. As shown in Figure 3, the ISO chromatograms were often clustered together and concentrated in the centre (origin) of the score plots. This was especially true for yellowish-brown soil. However, the G1 and G2 chromatograms were scattered around in the score plots except for red soil samples. As a result, the ISO programme was considered more robust than the G1 and G2 programmes. The corresponding dendrograms (Supplementary Figure S6) suggest similar conclusions. In general, only chromatograms recorded via the ISO programme were connected via shorter distances. In contrast, chromatograms prepared using the gradient elution programmes, especially G1, were often joined via longer distances. All results suggested that the ISO programme was more robust than the two gradient programmes, and was unlikely to be affected by small changes in flow rate, detection wavelength and extraction solvent.
With the knowledge that the separation of non-volatile organic compounds in soil is promoted by a mobile phase rich in ACN, we performed another set of experiments utilizing an improved isocratic elution (i.e. ACN:H₂O (9:1, v/v)), and the result was encouraging (data not shown). The derived optimum parameters for UPLC analysis of soil are shown in Table 2.
The UPLC method was validated with respect to the repeatability of retention time. Relative standard deviation (RSD) is one of the most common metrics for evaluating retention time repeatability, and is mathematically defined by Equations (1)-(3):

\[ \mathrm{RSD}\,(\%) = \frac{s}{\bar{x}} \times 100 \tag{1} \]

\[ s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} \tag{2} \]

\[ \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i \tag{3} \]

where s is the standard deviation and \bar{x} is the mean, as computed from n measurement values x_i. A sequence of three consecutive injections was performed for each soil sample. As a result, each peak was represented by three measurement values. This allowed the assessment of the intra-day reliability (i.e. run-to-run repeatability) of the method based on the retention time of a peak in the soil sample.
To the best of our knowledge, previous studies have not reported the retention time repeatability [24,27,28,30-32]. Therefore, an accepted reference value of RSD in forensic soil analysis is unavailable in the literature. This prompted us to adopt the benchmark value reported in typical pharmaceutical quality control work (1%) [48]. The results of the precision experiments are presented in Tables 3 and 4, where the range of mean retention time of a peak was obtained based on the minimum and maximum values recorded for the red, brown and yellowish-brown soils. For all the components of the three soil samples, the RSD from three measurements was 0.02%-0.85%. With no RSD value above 1%, the UPLC method is regarded as highly repeatable with respect to retention time.
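The repeatability criterion described above can be sketched in a few lines of Python. The retention times below are hypothetical values chosen for illustration and are not taken from Tables 3 and 4; the function name and the threshold variable are likewise assumptions. The standard deviation uses the n - 1 denominator.

```python
from statistics import mean, stdev

def rsd_percent(values):
    """Relative standard deviation in percent (sample standard deviation / mean)."""
    return 100.0 * stdev(values) / mean(values)

# Hypothetical retention times (min) of one peak from three consecutive injections.
retention_times = [5.02, 5.03, 5.01]

# Benchmark adopted in the text from pharmaceutical quality control work.
benchmark_percent = 1.0

rsd = rsd_percent(retention_times)
print(f"RSD = {rsd:.2f}%")
print("within benchmark" if rsd <= benchmark_percent else "exceeds benchmark")
```

In practice the same function would be applied to every peak of every soil sample, and the largest RSD compared against the benchmark.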
The effect of baseline noise is another important consideration in determining the reliability of chromatographic methods. To investigate the degree of baseline noise and disturbance, a blank run (no sample was injected) was performed and examined. It is important that the magnitude of baseline noise is at or below the accepted level. Figure 4 illustrates the amplified chromatographic baseline of the start and end regions for the blank sample. In general, the amplitude of the baseline ranged from 2 × 10⁻⁵ to 3 × 10⁻⁵ AU, whereas the smallest signal in the chromatograms was about 1 × 10⁻³ AU (Figures 1I-L and 2I-L). Hence, the peaks detected in Figures 1 and 2 are unlikely to be baseline noise.
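The comparison between the smallest peak and the baseline noise can be expressed as a simple ratio. The sketch below uses the amplitudes quoted in the text; the variable names are illustrative, and the threshold of 10 is a commonly used signal-to-noise convention for quantitation that is assumed here rather than stated in this work.

```python
# Baseline amplitude range of the blank run and smallest peak signal (AU), from the text.
noise_amplitude_au = (2e-5, 3e-5)
smallest_peak_au = 1e-3

# Worst case uses the largest noise amplitude; best case uses the smallest.
worst_case_ratio = smallest_peak_au / max(noise_amplitude_au)
best_case_ratio = smallest_peak_au / min(noise_amplitude_au)

# Assumed convention: a signal-to-noise ratio of at least 10 for quantitation.
assumed_threshold = 10.0

print(f"signal-to-noise ratio: {worst_case_ratio:.0f} to {best_case_ratio:.0f}")
print(worst_case_ratio >= assumed_threshold)
```

Even in the worst case the smallest peak exceeds the baseline amplitude by more than an order of magnitude.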

Discussion
To illustrate the merits of the UPLC system over the HPLC system, we should have observed the empirical performance of the HPLC system using the three Malaysian soils. However, technical constraints did not allow the collection of such data. McCulloch et al. [32] have also demonstrated that replication of previous work can be difficult because an identical column may not always be available. To facilitate the comparison, we have summarized the analytical parameters/performances of HPLC [24,27,28,30-32] and UPLC (based on the current work) in Table 5.
With respect to the minimum quantity of soil, this work extracted the non-volatile organic compounds in soil by mixing 0.5 g of soil with 1.0 mL of ACN. Although McCulloch et al. [30][31][32] successfully produced reliable chromatograms with only 0.25 g of soil, the final concentrations of the extracts were similar to those observed in the current work (~0.5 g/mL). Therefore, we considered that the UPLC system was comparable with HPLC systems in terms of minimum soil sample size.
Although Reuland and Trinler [27] reported the use of injection volumes of 5-20 µL, the quality of the resulting chromatograms was not explicitly discussed. Therefore, we considered it reasonable to assume that the optimal injection volume for HPLC is 10-50 µL [24,30-32]. Hence, it can be concluded that UPLC is more sensitive than HPLC because a reliable UPLC fingerprint could be produced from an injection of less than 10 µL. Regarding run time, the current study found that 15 min could produce sufficient data for forensic soil analysis. In contrast, HPLC analysis required more than 15 min to obtain a good chromatogram from a soil sample [24,28,30-32]. Although Reuland and Trinler [27] reported that they needed only 15 min to complete chromatographic separation, details of retention time repeatability were not presented to the level demonstrated in the current study. As such, it is reasonable to conclude that the total run time to produce a reliable chromatogram is shorter for UPLC analysis than for HPLC analysis.
By minimizing injection volume and shortening total run time, the required volumes of extraction solvent and mobile phase are also reduced, which, in turn, decreases waste disposal costs. Although the purchase cost of a UPLC system (and column) is typically much higher than that of an HPLC system, from a long-term perspective, UPLC consumes less eluent and will be much more chemical- and cost-efficient than HPLC.
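The saving in mobile phase can be estimated as the product of flow rate and run time. In the Python sketch below, the UPLC values (0.2 mL/min, 15 and 35 min) come from this work, whereas the HPLC flow rate and run time are hypothetical placeholders chosen only to illustrate the calculation; they are not values reported here or in the cited studies.

```python
def mobile_phase_volume_ml(flow_rate_ml_min, run_time_min):
    """Volume of mobile phase consumed by one run, in mL."""
    return flow_rate_ml_min * run_time_min

# UPLC values taken from the text.
uplc_short_run = mobile_phase_volume_ml(0.2, 15)  # run time judged sufficient
uplc_full_run = mobile_phase_volume_ml(0.2, 35)   # total run time used in the study

# HYPOTHETICAL HPLC settings, for illustration only.
hplc_example = mobile_phase_volume_ml(1.0, 30)

print(f"UPLC, 15 min: {uplc_short_run:.1f} mL")
print(f"UPLC, 35 min: {uplc_full_run:.1f} mL")
print(f"HPLC (hypothetical): {hplc_example:.1f} mL")
```

Under these assumed HPLC settings, a single run would consume several times more mobile phase than the UPLC run; the actual ratio depends on the conditions of the HPLC method being compared.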
The choice of extraction solvent is also an important consideration in optimising UPLC methods. In many cases, methods use an extraction solvent that is also a component of the mobile phase (Table 5). Theoretically, equilibrium between the sample and the mobile phase can be instantaneous if the extraction solvent is part of the mobile phase. Furthermore, given the preference for ACN over methanol in the literature [49,50], only an ACN/water mixture was used as the mobile phase in this pilot study.
Although it is generally agreed that spatial characterization is an important aspect of any soil-based analysis [51], this work has not considered temporal and spatial variations as reported by Bommarito et al. [24] and McCulloch et al. [32]. These aspects will be considered when we assess the discrimination power of the method in a further empirical study.

Conclusion
This study describes the first application of a UPLC-PDA technique for the profiling of non-volatile organic compounds in Malaysian soils. When compared with HPLC, UPLC reduces analysis time, injection volume and solvent consumption. These savings can potentially double the productivity of the method. Isocratic elution employing an ACN-rich mobile phase is recommended for obtaining a high-quality chromatogram that provides good retention time repeatability. In addition, strong preference is given to an extraction solvent that also serves as a component of the mobile phase. In further work, larger and more diverse soil samples should be used to confirm the merits of the UPLC method, and spatial and temporal variations must be considered.