Nonparametric testing of the covariate significance for spatial point patterns under the presence of nuisance covariates

Determining the relevant spatial covariates is one of the most important problems in the analysis of point patterns. Parametric methods may lead to incorrect conclusions, especially when the model of interactions between points is wrong. Therefore, we propose a fully nonparametric approach to testing significance of a covariate, taking into account the possible effects of nuisance covariates. Our tests match the nominal significance level, and their powers are comparable with the powers of parametric tests in cases where both the model for intensity function and the model for interactions are correct. When the parametric model for the intensity function is wrong, our tests achieve higher powers. The proposed methods rely on Monte Carlo testing and take advantage of the newly introduced covariate-weighted residual measure. We also define a correlation coefficient between a point process and a covariate and a partial correlation coefficient quantifying the dependence between a point process and a covariate of interest while removing the influence of nuisance covariates.


Introduction 1.Motivation and overview
Spatial point patterns are often accompanied by spatial covariates.Determining the relevant covariates that influence the positions of points is certainly one of the most important questions of point pattern analysis.Applications include spatial epidemiology, spatial ecology, exploration geology, seismology, and many other fields.
In this paper, we mainly focus on this question.Our proposed methods use nonparametric tools.The second question that we are interested in is nonparametric quantifica-tion of the spatial dependence between a point process and a covariate, both without and with presence of nuisance covariates.We define a correlation coefficient and a partial correlation coefficient between a point process and a covariate.The second problem has not been studied before, to our knowledge.
The first problem is usually solved by parametric methods (Schoenberg, 2005;Waagepetersen and Guan, 2009;Kutoyants, 1998;Coeurjolly and Lavancier, 2013), see Section 2.1 for details.However, we show in our simulation study that even when the parametric model is selected correctly, these tests of covariate significance may lead to liberality.The parametric methods have even bigger problems when: 1) the parametric model for the intensity function is incorrect, or 2) the form of interactions between points is specified incorrectly.We propose here two tests of covariate significance, a fully nonparametric one which avoids both selecting the intensity function model and the interaction model, and a semiparametric one which does not assume an interaction model but uses the log-linear intensity function model as the one predominantly used in practice.These two proposed tests do not exhibit liberality, and their powers are comparable with the powers of parametric methods in cases with correctly specified models for the intensity function and the interactions.The proposed tests also have a higher power than the parametric ones when either the intensity function model or the interaction model is misspecified.
Since the proposed nonparametric tests do not need to choose a specific model and exhibit better properties than parametric methods, their use should become a standard practice in the analysis of point patterns.
For determining relevant covariates one can also use the lurking variable plots (Baddeley and Turner, 2005) or appropriate information critera (Choiruddin et al., 2021) but these do not provide formal tests.The only nonparametric method studying the dependence of a point process and a covariate without nuisance covariates was introduced in Dvořák et al. (2022).
Throughout the paper, we assume that the spatial covariates are continuous.The methodology is up to a certain extent also applicable for categorical covariates, as discussed in Section 7.

Motivational examples
To illustrate the relevance of the questions posed above, we consider a part of the tropical tree data set from the Barro Colorado Island plot (Condit, 1998).We focus on the positions of 3 604 trees of the Beilschmiedia pendula species in a rectangular 1 000 × 500 metre sampling plot, plotted in the top left panel of Figure 1.This part of the data set is available in the spatstat package.Below, we call it the BCI data set.
The intensity of point occurrence in the observation window is clearly nonconstant as the trees tend to prefer specific environmental conditions.The variation in the intensity of point occurrence may possibly be explained by the accompanying covariate information.The available covariates include the terrain elevation and gradient (available in the spatstat package) and the soil contents of mineralised nitrogen, phosphorus and potassium (Dalling et al., 2022), see Figure 1.Maybe all the covariates bring important information and should be used for inference.However, it is equally possible that some of the covariates bring redundant information (as could be expected from the nitrogen and potassium content in this data set, see the bottom left and bottom right panel of Figure 1) or that some of the covariates, in fact, do not influence the point process.It is important to determine with high degree of confidence which covariates influence the point process and should be included in the further steps of the inference.
In certain cases, a relevant parametric model can be specified based on the available expert knowledge.However, often no such parametric model is available, or we do not want to take a risk of model misspecification.Then nonparametric methods for covariate selection need to be used.
Furthermore, we consider the Castilla-La Mancha forest fire data set, again available in the spatstat package.We study the locations of 689 forest fires that occurred in this region in Spain in 2007, plotted in the left panel of Figure 2. Below we call it the CLM data set.The size of the region is approximately 400 by 400 kilometers.The intensity of point occurrence is nonconstant and may be influenced by the accompanying covariates (terrain elevation and gradient, see the middle and right panels of Figure 2).We aim at quantifying the strength of influence of the individual covariates on the point process and comparing it with the BCI data set.

Outline of the work
In order to achieve our objectives, we propose to employ the residual analysis (Baddeley et al., 2005) with respect to the model built from the nuisance covariates.The sample (Kendall's) correlation coefficient of the smoothed residual field and the interesting covariate then quantifies their dependence both without and with nuisance covariates.The latter defines the partial correlation.
The testing of covariate significance is proposed to be performed via a new test statistic, the covariate-weighted residual measure, and a Monte Carlo test.The residual analysis can be computed in the parametrical way, which defines our semiparametrical approach, or it can be computed nonparametrically using the nonparametrical estimate of the point pattern intensity (Baddeley et al., 2012) and it defines our completely nonparametrical approach.The nonparametric residuals are used for the first time in this work.
The replications in the Monte Carlo test are obtained through random shifts both with torus correction (Lotwick and Silverman, 1982) and variance correction (Mrkvička et al., 2021).The torus correction is a standard method whereas the variance correction was recently defined, and it allows to use nonrectangular windows and it better controls the level of the test than the torus correction.
The paper is organised as follows.Section 2 recalls all the concepts we need to define our procedures.Section 3 describes all new methods we are introducing in this work.That is, nonparametric residuals, spatial (partial) correlation coefficient, covariate-weighted residual measure, and tests of covariate significance with nuisance covariate.Section 4 contains a simulation study in which the exactness and power of our nonparametrical methods is compared with parametrical methods.Section 5 contains an example of the usage of our methods for nonparametric selection of relevant covariates.Section 6 contains an example of usage of our methods for comparison of dependence strength.Finally, Section 7 is left for conclusions and discussion.
The R codes providing an implementation of the proposed methods are available at https://msekce.karlin.mff.cuni.cz/~dvorak/software.htmland will be available in the planned package NTSS for R.

Notation and background
Let X be a point process on R 2 with the intensity function λ(u).Throughout this paper, we assume that the intensity function of X exists.Let C 1 , C 2 , . . ., C m+1 be the covariates in R 2 .Denote by W ⊂ R 2 a compact observation window with area |W | and n(X ∩ B) the number of points of the process X observed in the set B. We assume that the values of the covariates are available in all points of W , at least on a fine pixel grid.This can be achieved from a finite set of observations, e.g. by kriging techniques.

Covariate selection in parametric point process models
The dependence of the intensity function of a point process on the covariates C 1 , . . ., C m is often modelled parametrically, e.g. using the log-linear model (1) The standard approach to estimating the model parameters β i is to maximize the Poisson likelihood (Schoenberg, 2005;Waagepetersen and Guan, 2009).This corresponds to the maximum likelihood approach for Poisson models, while for non-Poisson models, this constitutes a first-order composite likelihood approach.For the log-linear model (1) the estimation is implemented in the ppm function from the popular spatstat package (Baddeley et al., 2015).
For Poisson or Gibbs processes, the ppm function also provides confidence intervals for the regression parameters β i and the p-values of the tests of the null hypothesis that β i = 0 for a given i, based on the asymptotic variance matrix (Kutoyants, 1998;Coeurjolly and Rubak, 2013).For cluster processes, the kppm function from the spatstat package provides means of model fitting.The regression parameters β i from (1) are again estimated using the ppm function, but the asymptotic variance matrix is determined according to Waagepetersen (2008), taking into account the attractive interactions between points.
The methods discussed above provide means for formal testing of the hypothesis that β i = 0 for a given i ∈ {1, . . ., m}, allowing one to select the set of relevant covariates to be included in the model.

Parametric residuals for point processes
Residuals can be used to check whether the fitted model for the intensity function is appropriate, see Baddeley et al. (2005) or Baddeley et al. (2015, Sec. 11.3).In the following we employ the version of residuals based on the intensity function, as suggested by R. Waagepetersen in the discussion to the paper Baddeley et al. (2005), rather than based on the conditional intensity function as discussed in the paper itself.Let β be the vector of the estimated regression parameters.The residual measure is defined as where B ⊆ W is a Borel set.The smoothed residual field is obtained as where e(u) = W k(u − v) dv is the edge-correction factor and k is a probability density function in R 2 .In fact, the first term in (3) gives the nonparametric kernel estimate of the intensity function, the covariates not being taken into account, while the second term gives the smoothed parametric estimate which incorporates the covariates.If the estimated model λ(v; β) describes the point process X well, the smoothed residual field s(u) is expected to fluctuate around 0. Its deviations from 0 indicate a disagreement between λ(v; β) and the true intensity function in the corresponding parts of the observation window.We remark that the residuals described above are the raw residuals of Baddeley et al. (2005), where scaled versions of the residuals are also considered.

Nonparametric estimation of the intensity function depending on covariates
As opposed to fitting a parametric model such as (1), the dependence of the intensity function on a set of covariates can be captured nonparametrically.Baddeley et al. (2012) assume that there is an unknown function ρ : R m → [0, ∞) such that λ(u) = ρ(C 1 (u), . . ., C m (u)).Assuming absolute continuity of the distribution of the vector of covariates (C 1 (u), . . ., C m (u)) on R m , the function ρ can be estimated using kernel smoothing in the space of covariate values, see Baddeley et al. (2012) or Baddeley et al. (2015, Sec. 6.6.3).This opens up the possibility to define the nonparametric residuals in Section 3.1.The estimation of ρ is implemented in the rhohat function from the spatstat package for m = 1 and in the rho2hat function for m = 2.We note that in these two cases, visualization of ρ is straightforward while it is not as easy for m > 2. In our simulation experiments in Section 4 we use the spatstat implementation, while in the analysis of the real data sets with higher number of covariates we use our implementation based on the ks package (Duong, 2007).

Monte Carlo tests
When the distribution of a test statistic is too complicated to be derived analytically but there is a way of obtaining replications (simulations, permutations, . . . ) of the data under the null hypothesis, it is possible to perform a formal test of the null hypothesis using the Monte Carlo approach (Davison and Hinkley, 1997).This approach relies on the exchangeability of the vector (T 0 , T 1 , . . ., T N ), where T 0 is the test statistic value computed from the observed data, and T 1 , . . ., T N are obtained from the replications.
The test is performed by determining how typical or extreme the value T 0 is in the whole sample T 0 , T 1 , . . ., T N .For univariate test statistics, this means determining the rank of T 0 , however, using functional test statistics is also possible if a suitable ranking of the functions from the most typical to the most extreme is available, as e.g. in Myllymäki et al. (2017).Excheangeability (invariance of the distribution with respect to permutations of the components) ensures that the Monte Carlo test matches the required significance level.

Random shift permutation strategy
Random shifts provide means of nonparametric testing of independence between a pair of spatial objects, such as a pair of random fields (Upton and Fingleton, 1985;Dale and Fortin, 2002) or a pair of point processes Lotwick and Silverman (1982).By randomly shifting one of the objects while keeping the other one fixed, any possible dependence between them is broken.At least one of the spatial objects must be assumed to be stationary.By performing a certain amount of shifts along randomly generated vectors, one obtains replications for performing a Monte Carlo test of independence.
Assume that the spatial objects are denoted by Φ and Ψ and we observe them in the window W .We denote the value of the test statistic computed directly from the observed data by T 0 = T (Φ, Ψ; W ).After producing N random shift vectors v 1 , . . ., v N we compute the value of the test statistic T i from Φ and Ψ shifted by v i , i.e.T i = T (Φ, Ψ + v i ; W ), i = 1, . . ., N .Clearly, some part of Ψ will be shifted outside of the observation window W and part of Ψ + v i will not overlap with Φ anymore.Hence, some form of correction is needed.

Torus correction
For a rectangular window W , one may identify its opposing edges, creating a toroidal geometry on W (Lotwick and Silverman, 1982;Upton and Fingleton, 1985).We denote by [Ψ + v i ] the version of Ψ shifted with respect to the toroidal geometry, as opposed to Ψ + v i which denotes Ψ shifted with respect to the Euclidean geometry.The replications T i are then obtained as As a result, all parts of the data are used for computing T i .On the other hand, artificial cracks appear in the correlation structure of the data, as parts of the data originally far away are now "glued together".This means that exchangeability is violated, which in turn introduces liberality of the random shift tests (Fortin and Payette, 2002;Mrkvička et al., 2021).However, simulation studies show that when the spatial autocorrelations in the data are not very strong, the tests match the nominal significance level quite closely (Mrkvička et al., 2021;Dvořák et al., 2022).Traditionally, the distribution of the random shift vectors is taken to be the uniform distribution on W , but other choices are also possible.

Variance correction
To remove the liberality of the torus correction, Mrkvička et al. (2021) proposed the variance correction.It uses shifts respecting the Euclidean geometry and discards those parts of the data that are shifted outside of W .No artificial cracks are introduced to the correlation structure of the data, removing the liberality of the random shift tests.Also, irregular observation windows can be considered.On the other hand, different amounts of data are dropped for different shift vectors v i and for typical choices of the test statistic the variance of T i varies greatly, making it impossible to perform the Monte Carlo test directly.Therefore, the variance of T i needs to be standardized before performing the test.
Formally, we denote by W i the smaller observation window where Φ and Ψ + v i overlap, i.e.
The values T 0 , T 1 , . . ., T N are then standardized to have zero mean and unit variance.This is achieved by subtracting the mean T = 1 N +1 N i=0 T i and dividing by the square root of the variance: S i = T i − T / var(T i ).The standardized values (S 0 , S 1 , . . ., S N ) are closer to exchangeability than (T 0 , T 1 , . . ., T N ) because their first two moments are the same.The standardized values are used to perform the Monte Carlo test.When a formula describing var(T i ) as a function of the size of W i is known, at least asymptotically, it can be directly used in the standardization.If such a formula is not available, Mrkvička et al. (2021) suggest a kernel regression approach to estimating var(T i ).
Simulation studies in Mrkvička et al. (2021); Dvořák et al. (2022) show that the random shift tests with variance correction match the nominal significance level even in the case of strong autocorrelation.In those papers, the shift vectors followed the uniform distribution on a disc with radius R centered at the origin.The choice of R is a compromise between two goals: longer shifts are more relevant for breaking the possible dependence between Φ and Ψ while shorter shifts mean that a larger amount of available data is used to compute T i .Choosing R so that |W i |/|W | ≥ 1/4 for all i turned out to provide satisfactory results.

Nonparametric testing of dependence between point process and a covariate
For nonparametric testing of the null hypothesis of independence between a point process X and a covariate C 1 the paper Dvořák et al. (2022) suggests to use the random shift test with the test statistic T = 1 n(X∩B) x i ∈X∩W C 1 (x i ), i.e. the mean covariate value observed at the points of the process.This test showed liberality (with torus correction) or slight conservativeness (with variance correction) in the simulation studies in Dvořák et al. (2022), with both versions having much higher power than the other tests considered there.
3 New methods

Nonparametric residuals for point processes
As discussed in Section 2.3, a nonparametric estimate of the intensity function λ(u) = ρ(C 1 (u), . . ., C m (u)) can be used to describe its dependence on the set of covariates.Using ρ, the nonparametric version of the residual measure (2) can be defined as The corresponding nonparametric smoothed residual field is then (5) Again, scaled versions of these residuals can be constructed as in Baddeley et al. (2005).
If ρ(C 1 (u), . . ., C m (u)) describes the intensity function of X well, meaning e.g. that no relevant covariate was left out, s(u) is expected to fluctuate around 0. Figure 3 illustrates that ρ is capable of capturing the correct form of dependence even without specifying a parametric model.

Correlation coefficient between a point process and a covariate
Assume now that no nuisance covariates are given (m = 0) and we want to investigate the strength of dependence between the intensity function of X and a given covariate C 1 .Without incorporating a possible effect of C 1 , the natural estimate of the intensity function is constant, λ = X(W )/|W |, and the smoothed residual field becomes s(u) = 1 e(u) If the covariate C 1 does not influence X, we expect C 1 and s to be independent.On the other hand, if C 1 influences the intensity function of X, s should capture the dependence structure and exhibit correlations with C 1 .This motivates us to quantify the strength of dependence between X and C 1 by some measure of dependence between the two random fields C 1 and s.
To this end, we consider Kendall's correlation coefficient (Nelsen, 2006, p.158) and let U 1 , U 2 be independent random vectors with uniform distribution in W . Denoting Figure 4: Left: plot of the correlation coefficient τ as a function of the parameter a for the example in Section 3.2.Right: plot of the correlation coefficient τ (black curve) and the partial correlation coefficient τp (red curve) as functions of the parameter a for the example in Section 3.3.
The empirical estimate of τ can be easily obtained if we consider a set of sampling points {y 1 , . . ., y n }, independently and uniformly distributed in W , independent of X and C 1 : where sgn is the sign function.Naturally, the values of the correlation coefficient are restricted to the interval [−1, 1] and allow direct comparison of the strength of dependence between different data sets.
To illustrate the use of this correlation coefficient in quantifying the strength of dependence between a point process and a covariate, we perform the following experiment.We consider the Poisson process with the intensity function proportional to exp{ax} in the observation window W = [0, 1] 2 , for a given value of a ∈ R, and with the expected number of points in W fixed at 200.The covariate of interest is C 1 ((x, y)) = x.The smoothed residual field s from ( 6) is obtained with a large bandwidth bw = 0.5 which reflects the fact that the true intensity function of the point process is very smooth.The value of τ is then computed according to (8).This is repeated for 500 independent realizations of the point process for each value of a from a fine grid, and the means of τ are plotted as a function of a in the left panel of Figure 4.The plot shows that τ increases in the absolute value with increasing strength of dependence, from 0 in case of independence (a = 0) to almost 1 or -1 in case of very strong dependence.It also correctly captures the form of dependence (positive or negative association).

Choice of sampling points
We stress that independent sampling points need to be used in this case instead of simply using the observed points of X ∩ W .In the latter case, the preferential sampling issues could arise, resulting in biased estimates of the properties of the two random fields (Diggle, 2010;Dvořák et al., 2022).Loosely speaking, if, for example, the sampling points {y 1 , . . ., y n } are more likely to be chosen in locations with high values of C 1 , the sample mean and sample variance of C 1 (y 1 ), . . ., C 1 (y n ) do not reflect well the true properties of C 1 .This negatively affects all subsequent steps of the analysis.

Choice of measure of dependence
Although different measures of dependence such as Pearson's or Spearman's correlation coefficients can be used, we suggest Kendall's correlation coefficient.It aligns well with the nonparametric spirit of this paper and has shown better performance in preliminary experiments not reported here and in previous studies on related topics (Dvořák et al., 2022).

Choice of bandwidth
For the construction of the smoothed residual field s(u) in ( 6) one has to select a specific kernel function k (a probability density function).The type of the kernel does not play an important role, and we use the Gaussian kernel.On the other hand, the choice of bandwidth (standard deviation of the kernel function) affects the properties of the estimates to a great extent.Traditional rules of thumb or more involved methods may be used for bandwidth selection in this case, see Baddeley et al. (2015, Section 6.5.1.2) or Cronie and Van Lieshout (2018).However, whenever available, expert knowledge about the specific problem at hand should guide the choice of bandwidth.

Partial correlation coefficient between a point process and a covariate
When several possibly correlated covariates are available, one might be interested in assessing the strength of dependence between the point process X and the covariate of interest C m+1 after removing the possible influence of the remaining (nuisance) covariates C 1 , . . ., C m , in the spirit of the partial correlation coefficient.The strength of dependence can be quantified by some measure of dependence between the covariate of interest C m+1 and the smoothed residual field s from (5) where the possible influence of the nuisance covariates C 1 , . . ., C m on X has been removed.When a parametric model for the intensity function of X is available, parametric residuals (3) may be used instead.
We suggest using Kendall's correlation coefficient to quantify the dependence.Again, we consider a set of sampling points {y 1 , . . ., y n }, independently and uniformly distributed in W , independent of X and C 1 , . . ., C m+1 , and define the sample version of the partial correlation coefficient as The population version can be defined in a similar way as in (7).Concerning the choice of the sampling points and the choice of the measure of dependence, comments from the previous section apply here, too.
To illustrate the use of the partial correlation coefficient in quantifying the strength of dependence between a point process and a covariate of interest, after removing the influence of nuisance covariates, we performed the following experiment.The point process model is the Poisson process from Section 3.2.Its intensity function depends in a log-linear way on the covariate C 1 ((x, y)) = x, now treated as a nuisance covariate.Specifically, the intensity function is proportional to exp{ax}.The covariate of interest is C 2 ((x, y)) = x + y.We consider 500 independent realizations of the point process for each value of a and compute the means of τ and τp .The correlation coefficient τ , again computed with bw = 0.5, correctly indicates that the point process depends on the covariate C 2 through the x−coordinate (black curve in the right panel of Figure 4).On the other hand, the partial correlation coefficient τp , computed with the adaptive choice of bandwidth described below, attains approx.zero values in this case (red curve in the right panel of Figure 4), implying that the influence of the nuisance covariate C 1 was successfully removed.

Choice of bandwidth
Construction of the smoothed residual field requires choosing a bandwidth for the smoothing kernel.Again, standard recommendations may be employed, or the available expert knowledge may be utilized.However, in our pilot experiments with a single nuisance covariate C 1 , we observed that the influence of C 1 was usually not completely removed from X during the construction of the smoothed residual field s(u), in the sense that the empirical Kendall's correlation coefficient of {(s(y j ), C 1 (y j )), j = 1, . . ., n} was nonzero.Its value was strongly influenced by the value of bandwidth.
To remove this effect, we suggest selecting the bandwidth value (from a given finite set of candidate values) that minimizes the absolute value of the empirical Kendall's coefficient of {(s(y j ), C 1 (y j )), j = 1, . . ., n}, denoted τ (s, C 1 ) in the following.In this way, we select the bandwidth value that removes the influence of C 1 on X the most successfully and it can be seen as a conservative version of the correlation coefficient.This is important mostly in cases where the nuisance covariate is correlated with the covariate of interest.For independent covariates, this procedure has very little effect on the performance of the random shift tests.We apply this approach to bandwidth selection in our simulation experiments below.When more than one nuisance covariate is available, this adaptive bandwidth procedure can be generalized by minimizing m i=1 τ (s, C i ) 2 .

Covariate-weighted residual measure
While τp is useful for quantifying the strength of dependence between X and the covariate of interest C m+1 after removing the influence of nuisance covariates C 1 , . . ., C m , the random shift test using τp as the test statistic turned out to have a rather low power in our simulation studies.The reason lies in the applied smoothing and the deliberate removal of the preferential sampling effects -the association between the points of X and the covariate C m+1 brings important information.
To overcome these issues, we define the following characteristic that we call the covariate-weighted residual measure of W : This can be viewed as a generalization of the test statistic T from Section 2.6 which also includes the sum of covariate values, but does not take into account possible nuisance covariates.By sampling the values of C m+1 at the points of X we take advantage of any possible preferential sampling effects, and no smoothing is performed when computing the value of CW R, hence we avoid the problem of bandwidth selection.The expectation of CW R is close to 0 if the covariates C 1 , . . ., C m capture all variation in λ(u), i.e. if ρ is close to λ, and will differ from 0 otherwise.This enables testing the significance of C m+1 after removing the influence of C 1 , . . ., C m .

Testing the covariate significance under the presence of nuisance covariates
Now we focus on the null hypothesis that X and C m+1 are independent, conditionally on C 1 , . . ., C m .We employ the random shift test described in Section 2.5, either with torus or variance correction.The test statistic can be τp in which case the two spatial objects to be shifted against each other are the two random fields Φ = s and Ψ = C m+1 .Alternatively, one can use the covariate-weighted residual measure of W as the test statistic.In this case Φ = R is a measure and Ψ = C m+1 is a random field.If v i is a shift vector, the shift of the random field Ψ should be interpreted in both cases as (Ψ + v i )(u) = Ψ(u − v i ).The choice of the correction factors for the variance correction is discussed in Appendix A, including Proposition 1 which studies the variance of CW R for Poisson processes and an empirical study for log-Gaussian Cox processes.
The assumption of stationarity of one of the spatial objects is discussed in detail in Section 7.

Simulation study
To assess the performance of the proposed tests, we present below a set of simulation experiments, both under the null hypothesis and under various alternatives.The models range from clustering through complete spatial randomness to regularity, even combining clustering and inhibition on different scales.The null hypothesis states that X and C m+1 are independent, conditionally on the nuisance covariates.For simplicity, we focus on the situation with a single nuisance covariate.The proposed nonparametric tests are compared with the parametric methods available in standard software represented by the spatstat package.

Simulation study design
The following notation and choices are used in all simulation experiments.Z 1 , Z 2 , . . .are independent, identically distributed Gaussian random fields, centered, unit variance, with exponential covariance function with scale 0.1.The observation window is W = [0, 1] 2 .The expected number of points in W is equal to exp{5} .= 148.4for Poisson and clustered models and approximately equal to exp{5} for models exhibiting regularity.
For each model, we simulate 5 000 independent realizations, and for each realization, we perform a set of tests on the 5% nominal significance level.In the tables of results we report the fractions of rejections for the individual tests, rounded to three decimals.To assess the liberality or conservativeness of the tests, one can compare the reported rejection rates (in experiments performed under the null hypothesis) with the interval based on the 2.5 % and 97.5 % quantiles of the binomial distribution with parameters n = 5 000 and p = 0.05, that is, with the interval [0.0440, 0.0562].
We investigate the performance of the random shift tests with either τp or CW R as the test statistic, with either parametric or nonparametric version of residuals (denoted by the symbol "p" or "n" in the tables of results) and with either torus or variance correction (denoted by "tor" or "var" in the tables of results).The values of τp are computed with the bandwidth selected by the adaptive procedure from Section 3.3.1 and with 100 sampling points chosen uniformly and independently in W .The random shift tests are compared with the parametric tests provided by the functions ppm (for Poisson, Strauss, and hardcore Strauss processes) and kppm (for log-Gaussian Cox processes, denoted LGCP in the following) from the spatstat package, see Section 2.1.
To mimic the practical issues with model specification, we consider these parametric tests both with the correct interaction model and with an incorrect interaction model of a similar type.Specifically, in addition to fitting the correct model to the LGCPs we also fit a Matérn cluster process; for the Strauss and hardcore Strauss processes we fit the models with the interaction distance fixed to either the correct or incorrect value, specified in the tables of results in the column "Variant".We also fit an inhomogeneous Poisson process to all data sets to investigate the effect of ignoring the interaction structure.On the other hand, we do not try fitting clustered models to clearly regular data sets and vice versa.
All the parametric tests assume the log-linear model for the intensity function (1), even though for some point process models we consider below this does not hold.This also illustrates possible issues with model misspecification in practice.

Significance level under independent covariates
In the following, we let the nuisance covariate influencing the intensity function of the point process be C 1 (u) = Z 1 (u) and the covariate of interest be C 2 (u) = Z 3 (u), which means that the covariate of interest C 2 is independent of the nuisance covariate C 1 and the point process X.For the construction of the LGCP models we also use the random field Z 2 , which is responsible for interactions in the point process rather than variation in its intensity function.For the Poisson and LGCP models, the covariate C 1 influences the intensity function directly.For the Strauss and hardcore Strauss models, it directly influences the trend function β(u).This influence is transformed to the intensity function in a nontrivial way.We consider the following models: (P 1 ) Poisson process with intensity function λ(u) = exp{4.5+ Z 1 (u)} (P 2 ) Poisson process with intensity function λ Note that in the shorthand notation for the models the letter represents the type of interaction in the point process while the subscript specifies whether the covariate C 1 influences the intensity function in a log-linear way (denoted by 1) or in a quadratic way (denoted by 2).Since the covariate of interest C 2 is independent of X, the tests should reject in 5 % of cases.Table 1 shows the fractions of rejection.We make the following observations: • The nonparametric tests match the nominal significance level correctly for all models, the tests based on CW R match it slightly more precisely than those based on τp .Both the torus correction and the variance correction perform well, with only a slight tendency toward liberality observed for the torus correction and the tests based on τp .
• Parametric tests assuming correct interaction structure and correct model for the intensity function (denoted by 1) match the nominal significance level correctly for the Poisson process (P) while being highly liberal for the LGCP (L) and hardcore Strauss process (H).They are slightly conservative for the Strauss process (S).
• Parametric tests assuming correct interaction structure and incorrect model for the intensity function (denoted by 2) may exhibit very strong liberality (P, H) or conservativeness (L).
Table 1: Size of the tests, independent covariates -fractions of rejection.For the H models the asterisk signifies that the correct hardcore distance was assumed in the given parametric test, whereas for the P and S models no hardcore distance is assumed.These observations illustrate that parametric tests are prone to perform poorly under model misspecification either in terms of the interaction structure or the intensity function.However, even when both of these model components are specified correctly, there is a risk of strong liberality of the parametric tests with the sample sizes considered here.From this point of view, the nonparametric tests are preferable as they match the nominal significance level correctly for all models in this study.

Significance level under dependent covariates
In this section, we consider the case of the covariate of interest C 2 being correlated with the nuisance covariate C 1 .We denote A smaller value of b implies a stronger correlation between the covariates.We consider the same point process models as in Section 4.2.In fact, we use the same realizations and simply construct the covariate C 1 in a different way.
We have performed the simulation experiments for the L 1 , L 2 , S 1 and S 2 models.However, we report the results only for the L 1 model (denoted L * 1 below to indicate that the covariates are correlated) since the observations made in all these cases are the same.
Table 2 shows the fractions of rejection for the L * 1 model with different choices of b together with the results for the original L 1 model from Section 4.2 which can be considered as the limiting case for b → ∞.The parametric tests exhibit the same rejection rates no matter the value of b due to the specific choice of the (linear) form of the covariates and the log-linear form of the intensity function -for all values of b the parametric tests in fact fit the same model by putting different weights on C 1 and C 2 .
The nonparametric tests show an increasing level of conservativeness with the increasing correlation between the covariates (with b going to 0).For CW R this is caused by the nature of the random shift test and the preferential sampling effects which reduce the variance of the test statistic computed from the observed data (with no shift) compared to the test statistic values computed from shifted data (where the preferential sampling effects are reduced or removed completely), as confirmed by simulation experiments not reported here.Similar conservativeness also appears for τp .We consider this conservativeness to be a smaller issue than liberality and conclude that this observation does not provide arguments against the use of the nonparametric tests.

Power under dependent covariates
In this section, we study the power of the tests in the situations where the covariate of interest C 2 influences the intensity function of X even after removing the effect of the nuisance covariate C 1 .We consider models similar to those in the previous sections and let C 1 (u) = Z 1 (u) and C 2 (u) = Z 1 (u) + 2Z 3 (u).We focus on the case with dependent covariates, which is more challenging for our proposed tests as they showed conservativeness in Section 4.3.The models depend on a parameter a > 0 that controls the strength of dependence between X and the covariate of interest C 2 .The value of a is chosen so that all the tests exhibit nontrivial powers, i.e. not close to 0.05 and not close to 1.00.The models are given as follows: (P p 1 ) Poisson process with intensity function λ(u) = exp{4.5+ Z 1 (u) + aZ 3 (u) − a 2 /2} with a = 1/4.
(S p 2 ) Strauss process with γ = 0.5, R = 0.05 and (H p 1 ) Strauss process with hardcore distance hc = 0.01, interaction parameter γ = 4, interaction distance R = 0.02 and trend and c is chosen for each realization so that the maximum of the given realization of Z(u) over W is 1.
Table 3 shows the fractions of rejections for the individual tests for the eight models specified above.We make the following observations: • For both τp and CW R, the versions of the test based on nonparametric residuals exhibit higher power than those based on parametric residuals.
• The tests based on τp have very low power due to the smoothing and removal of the preferential sampling effects.
• The tests based on CW R exhibit very high power comparable to the parametric tests with correct interaction model and correct model for the intensity function (for P, L, and H models) or even higher power (S).
• When the parametric tests are used with the correct interaction model and incorrect model for the intensity function, the nonparametric tests based on CW R have much higher power (L, S), slightly higher power (H), or direct comparison is not possible due to severe liberality of the parametric test (P).• The torus correction and the variance correction perform nearly equivalently for tests based on CW R, while for tests based on τp the torus correction shows slightly higher power, which can be explained by the small liberality of these tests observed in Section 4.2.
These observations indicate that the random shift tests based on CW R with nonparametric residuals and either torus or variance correction can be preferred in practice to parametric tests since the possible issues with model misspecification are avoided without compromising the power of the test.

Results of further simulation experiments
In the following, we comment on some observations made from further simulation experiments not reported here.First, the scaled versions of the residuals discussed in Baddeley et al. (2005) can be used instead of the raw residuals ( 2) and ( 4).We have investigated the performance of the nonparametric tests based on the inverse and Pearson residuals and compared it to the performance of the tests based on the raw residuals.In terms of rejection rates under the null hypothesis, we have found no significant differences between the three types of residuals.Concerning the power of the tests, the raw and Pearson residuals performed equally well, while the inverse residuals exhibited somewhat smaller power.Second, the nonparametric estimation of the intensity function depending on a covariate can be performed by the spatstat function rhohat using three types of estimators: "ratio", "reweight" and "transform" (Baddeley et al., 2012).In Sections 4.2 to 4.4 we reported the results for the default ratio estimator.The rejection rates for the ratio and the reweight estimators were comparable, while being somewhat higher for the transform estimator, showing slight liberality under the null hypothesis accompanied by slightly higher power under the alternatives.
Finally, the random shift tests were performed in the previous sections with shift vectors generated from the uniform distribution on a disc both for the torus correction and the variance correction to enable direct comparison.When the observation window W is rectangular and the torus correction is used, it might be more natural to consider the shift vectors generated from the uniform distribution on the whole W .In a smaller simulation experiment, the two versions of the random shift test with torus correction performed similarly, with a small tendency towards liberality for the version with shift vectors generated uniformly on W .

Nonparametric covariate selection for the BCI data set
To illustrate the possibility to use the proposed random shift tests for covariate selection, we consider now the BCI data set described in Section 1.2.Five covariates are available that possibly influence the intensity function of the point process.A possible way to select the set of covariates that have a significant effect on the intensity function is the backward selection procedure described in the following.The numerical results are given in Table 4.
We start in stage 1 with all five covariates, and for each of those we perform the random shift test where the given covariate is the covariate of interest and the remaining four covariates are considered to be the nuisance covariates.We use the test based on CW R with nonparametric residuals and torus correction, with 999 random shifts where the shift vectors have uniform distribution on a disc with radius 250 metres.The covariate with the highest p-value (potassium in this case, printed in italics in Table 4) is removed, and the procedure is repeated in stage 2 with the four covariates.In this stage, the nitrogen covariate is removed, then the gradient covariate, and finally in stage 4 where only two covariates are considered (elevation, phosphorus), both covariates are found significant on the 5% significance level, see Table 4.We conclude that these two covariates significantly affect the intensity function of the point process and should be included in the further steps of the inference.Other covariates can be disregarded without losing important information.
For comparison, we have also fitted the log-linear model (1) with the five covariates considered here, using the kppm function from the spatstat package as described in Section 2.1.We assume the Thomas type of interactions as suggested in Baddeley et al. (2015, Sec. 12.4.4).With this approach, three covariates are found significant on the 5% significance level: elevation, gradient and phosphorus, with p-values 0.019, , respectively.Two of these covariates were also found significant by the nonparametric procedure described above (elevation and phosphorus, see Table 4).On the other hand, the gradient covariate was found borderline significant by the parametric approach and not significant by the nonparametric procedure.
6 Nonparametric comparison of dependence strength for the CLM data set We now focus on the CLM data set described in Section 1.2.To assess the strength of dependence of the forest fire locations on the two available covariates we estimate the correlation coefficient τ using (8).The bandwidth is chosen as the default value from the spatstat function density.pppwhich is 50 kilometers in this case.For the elevation covariate the estimated value is 0.035, while for the gradient covariate it is 0.103.The positive signs of the estimated values indicate that the intensity of point occurrence tends to be higher in locations with high covariate values.However, the influence of elevation seems to be negligible and the influence of gradient appears to be very small.When looking at the partial correlation coefficients τp from (9), removing the influence of the other covariate, we obtain the value 0.031 for the elevation covariate and 0.103 for the gradient covariate, respectively.We may also estimate the correlation coefficients for the BCI data set and compare the strength of dependence between the point process and the covariates (elevation, gradient) between the two data sets (BCI vs. CLM).For the BCI data set we choose the bandwidth of 62.5 kilometers in the same way as above.For the elevation covariate the estimated value of τ is -0.048, for the gradient covariate it is 0.249.When removing the influence of all the remaining covariates, including the soil mineral contents, the estimated value of τ p for the elevation covariate is 0.083, for the gradient covariate it is 0.172.We conclude that the influence of the gradient on the point process, after removing the influence of the remaining available covariates, is nearly twice as strong in the BCI data set than in the CLM data set.The influence of elevation, as quantified by the correlation and partial correlation coefficients, is much weaker than the influence of gradient in both data sets.
We remark that in Section 5, the elevation covariate has been determined to have a stronger influence on the point process in the BCI dataset than the gradient covariate while the opposite observation has been made in this section.This can be attributed to the strong dependence between gradient and elevation and the conceptually different methods applied: the CW R test statistic uses the covariate values directly while the correlation coefficients only use the signs of the differences of covariate values.Also, smoothing is required for computation of the correlation coefficients while it is avoided for CW R.

Conclusions and discussion
The methods proposed in this paper allow quantification and testing of the significance of the correlations between a point process and a covariate of interest, possibly after removing the influence of nuisance covariates.We stress that the proposed methods can be used without specifying any model for the data.The simulation experiments reported in Section 4 show that the random shift tests based on τp or CW R match the nominal significance level correctly even in situations where parametric tests based on asymptotic distributions (assuming the correct form of interactions in the point process and the correct form of the intensity function) exhibit different degrees of liberality or conservativeness.Under model misspecification, the parametric tests may suffer from even more severe problems.Concerning power, the nonparametric tests based on CW R exhibit comparable or even higher power than parametric tests under the correct model, while showing higher power than parametric tests under incorrect models (where either the interaction or the intensity function is misspecified).This indicates the superiority of the CW R tests over parametric tests in practical applications where the true model is not known.Hence, using the proposed nonparametric CW R tests for covariate selection, e.g. as discussed in Section 5, provides more reliable results than the available parametric tests are able to provide, and the selected covariates can be used in the further steps of inference with greater confidence.
The only assumption of the proposed random shift tests is that at least one of the objects is stationary under the null hypothesis, so that its distribution is not affected by the shifts.Either the covariate of interest can be assumed to be stationary or the covariate-weighted residual measure or the smoothed residual field can be assumed to be close to stationarity if all the relevant covariates are used in construction of the residuals.
A natural question is whether the proposed methods are applicable also to categorical covariates.If one of the nuisance covariates is categorical, nonparametric estimation of the intensity function may be performed separately on the individual subregions of W determined by the categorical covariate, allowing all the proposed methods to be used as described above.If the categorical covariate is the covariate of interest, computing τ or τp is not relevant due to the ties in the data.However, the observation window W can be separated into subregions W 1 , . . ., W k determined by the values of the covariate of interest.The values V i = n(X ∩ W i ) − W i ρ(C 1 (u), . . ., C m (u)) du, i = 1, . . ., k, can be used to form a vector test statistic (V 1 , . . ., V k ) and the random shift test can be performed e.g. by means of the global envelope test (Myllymäki et al., 2017).This approach corresponds to the determination of differences between point process intensities in subregions W 1 , . . ., W k .Using Fubini's theorem and stationarity of X, we get the desired expression var S = K W λ(u) du.
The variance of S is proportional to W λ(u) du which is the expected number of points in W .In practical situations, this quantity is not known and can be estimated by the observed number of points n(X ∩ W ).
If the intensity function is bounded from above and from below by finite positive constants, W λ(u) du is of order |W | for large observation windows.We take advantage of this in the following simulation experiment where we determine the sample variance of CW R from a set of independent realizations and standardize it by |W | instead of n(X ∩ W ) which is different for individual realizations.Furthermore, following the ideas from Theorem 1 of Dvořák et al. (2022) it can be shown that the variance of the sum in (10) is of order |W | under reasonable assumptions.
We have performed the same simulation experiment as in Section A.1 for CW R. The sample variance of CW R divided |W a | is given in Figure 6, indicating that the variance correction factor |W | correctly captures the variability of CW R across different realizations.

Figure 1 :
Figure 1: The Barro Colorado Island data set.From left to right, top to bottom: locations of trees, terrain elevation, terrain gradient, the soil contents of nitrogen, phosphorus and potassium.

Figure 2 :
Figure 2: The Castilla-La Mancha data set.From left to right: locations of forest fires, terrain elevation, terrain gradient.

Figure 3 :
Figure 3: Left to right: realization of the Poisson process on [0, 1] 2 with intensity function λ(x, y) = 400(1 − 4(x − 1/2) 2 ), the nonparametric smoothed residual field s from (5) depending on the covariate x, the parametric smoothed residual field s from (3) with log-linear model depending on x, the parametric smoothed residual field s from (3) with log-linear model depending on x and x 2 .

Figure 5 :
Figure5: Sample variance of the test statistic τp , multiplied by the area of the observation window (vertical axis) plotted against the area of the observation window (horizontal axis).The scale parameter for the exponential correlation function of the random fields Z i is chosen to be 0.05 (solid line), 0.10 (dashed line) and 0.15 (dotted line), respectively.Left: for parametric residuals; right: for nonparametric residuals.

Figure 6 :
Figure6: Sample variance of the test statistic CW R, divided by the area of the observation window (vertical axis) plotted against the area of the observation window (horizontal axis).The scale parameter for the exponential correlation function of the random fields Z i is chosen to be 0.05 (solid line), 0.10 (dashed line) and 0.15 (dotted line), respectively.Left: for parametric residuals; right: for nonparametric residuals.

Table 2 :
Size of the tests, correlated covariates -fractions of rejection for the L * 1 model with different values of b.

Table 3 :
Power of the tests, correlated covariates -fractions of rejection.For the H models the asterisk signifies that the correct hardcore distance was assumed in the given parametric test, whereas for the P and S models no hardcore distance is assumed.

Table 4 :
Backward selection of covariates for the BCI data set.Individual cells show the p-values of the random shift tests.The row indicates the covariate of interest, while all other covariates considered in the given stage (column) are considered to be the nuisance