Plant community associations with morpho-topographic, geological and land use attributes in a semi-deciduous tropical forest of the Dominican Republic

ABSTRACT Despite being increasingly threatened by human-induced disturbances, dry forests remain the least studied and protected forest types in the Caribbean region. In contrast to many other forest systems in the world, we have little knowledge of the site-specific variation in vegetation communities within these forests nor understand how plant species distribution is determined by environmental variables, including geological attributes. Here, we aimed to provide evidence of the relationship between biodiversity and geodiversity, by assessing the associations between plant communities and habitat types in a semi-deciduous forest of the Dominican Republic. We collected vegetation data from 23 sites within the Ocoa river basin, which we classified into six groups with a Random Forest algorithm, using lithology, geomorphology, topography, and last decade history of forest loss as predictor variables. We established three main clusters: one group, which encompassed sites with forest over a limestone substrate, four groups of sites with forests over a marlstone substrate with varying degrees of steepness and forest loss history, and one group that gathered all sites with forest over an alluvial substrate. In order to measure the associations of plant communities with groups of sites, we used the indicator value index (IndVal), which indicates whether a plant species is found in one or multiple-habitat types and the phi coefficient of association, which measures species preferences for habitats. We found that 16 species of woody plants are significantly associated with groups of sites by means of their indices. Our findings suggest that the detection of plant species associations with our selection of environmental variables is possible using a combination of indices. We show that there is considerable variation in plant community composition within the semi-deciduous forest studied and suggest that conservation planning should focus on protection of this variation, while considering the significance and variability of geodiversity as well. In addition, we propose that our indicator groups facilitate vegetation mapping in nearby dry forests, where it is difficult to conduct thorough vegetation or environmental surveys. In short, our analyses hold potential for the development of site-specific management and protection measures for threatened semi-deciduous forests in the Caribbean.


Introduction
Tropical dry forests are amongst the most-threatened tropical ecosystems in the world [1,2], with 16% of the original area of dry forest remaining in South and Southeast Asia and 40% in Latin America [2,3]. In the Caribbean region, dry forests are usually considered relatively resilient to natural disturbances, due to high levels of biodiversity and a high proportion of root biomass, which aids a quick recovery of above-ground parts of plants by rapid absorption of water and nutrients [4]. However, human-induced disturbances decrease this natural resilience and allow for the invasion of alien species [5]. Despite these threats, dry forests remain the least studied and protected forest types in the Caribbean region [5][6][7].
Some of the key factors that contribute to the resilience of forest ecosystems threatened by anthropogenic activities, are the structural and compositional diversity of the vegetation, and the local presence of old-growth forest remnants [8,9]. Such diversity is, in turn, determined by the spatial turnover and distribution of vegetative communities, the heterogeneity of which is linked to spatial patterns among environmental factors [10,11]. For example, we know that geodiversity may explain richness and distribution of plant species, and by extension determine turnover among vegetative communities, at various spatial scales [12][13][14][15]. However, while efforts are being made to expand this link between biodiversity and geodiversity to conservation and ecosystem management [16,17], empirical evidence of this relationship remains incomplete, especially for (sub-)tropical ecosystems [18]. This also holds true for previous studies of dry forests in the Dominican Republic (hereafter, DR), which have hinted at, but failed to analyze, biodiversity-geodiversity relationships [19,20].
We examined to what extent various environmental factors, those related to geodiversity in particular, drive the spatial distribution of vegetation within a semideciduous tropical forest in the Dominican Republic. Specifically, we hypothesized that the spatial distribution of plant species corresponds to their associations with selected morpho-topographic and geological attributes, as well as past forest loss. And, by extension, that detection of associations of vegetation with environmental variables allows for typification of distinct habitat types within these dry forests; habitat types for which we may find indicator plant species [21]. An understanding of plant community associations with geodiversity and land use attributes would be instrumental for conservation and ecosystem management efforts in these logistically difficult to survey, and topographically complex, forests. For example, identification of indicator plant species may allow us to map the current and potentially changing distributions of communities of plants defined by their mutual associations with aforementioned environmental variables. And, remote-sensing derived morpho-topographic, geological, and land-use attributes are easily mapped across inaccessible areas, from which we may then infer associated vegetation characteristics [22][23][24][25][26][27][28]. All in all, we propose that our findings allow for a better understanding and protection of patterns and drivers of heterogeneity, and by extension diversity and resilience to change, in plant communities of tropical dry forests.

Location and sampling sites selection
We collected our samples within a Swietenia-Coccoloba semi-deciduous forest [20] in the Ocoa river basin of the Dominican Republic (18 � 31' N, 70 � 33' W) ( Figure 1). This forest ranges in altitude from almost 300 m to over 800 m, on slopes of varied inclination. The total annual precipitation is 1300 mm, and the mean annual temperature is 24 � C. The most common lithology is marlstone, but alluvial deposits and limestone are also common.
Using QGIS [29], we established 23 sampling sites by placing random points in digital vector polygons of different rock and landform-types, in proportions that were representative of the area of the Ocoa Basin covered by each rock/landform type (see Table A1). This allowed us to obtain samples from all rock and landform types found in the region but did not take forest loss into account as a factor of importance for our sampling design.  Table A1). The overlapping points are depicted slightly displaced from their actual positions to ensure that all become visible. The background is a color shaded-relief view (red-white is highland, green is lowland) based on a However, during field sampling, we obtained local information that there was considerable recent forest loss (after year 2000) in at least two plot locations, something we were able to confirm using a check of maps created by Hansen et al. [27,28]. These authors defined forest loss as a stand-replacement disturbance or a change from a forest to non-forest state, by assessing percent tree cover over time from Landsat imagery. We thereafter decided to utilize this useful-but not a priori considered at the onset of this studyinformation to conduct preliminary analyses on the impact of forest loss on vegetative community composition.
Due to the challenging conditions for accessing the forest, we adjusted the predefined location of some of the sites while keeping resemblance with traits of the original points. In addition, to assess the risk of spatial autocorrelation in our data, we conducted a spatial neighbourhood analysis using the actual coordinates of the ultimately selected locations. The average distance between the 23 points was ca. 7.1 km, the minimum was 39 m and the maximum was 19.3 km. It is noteworthy that, from 253 pairwise distances calculated between points, only 9 were below 500 m.
To identify rock types we used field recognition and consultation of geological maps (1:50,000 scale) [26], choosing between the following types: limestone (Jura Formation, Middle Eocene), marlstone/mudstone frequently sandy and intercalated with sandstone and boulders (Ocoa Formation, Upper Eocene), marlstone with boulders and sand (Ocoa Formation, Upper Eocene) and alluvial deposits such as boulders, gravels and sand (Quaternary).
Based on our field observations and aerial photography [25], we classified landforms qualitatively, choosing between one of the three types: slopes, which could be of high or low/medium steepness, river bank/margin, and ridge. As a complement, we also calculated the slope from a terrain processed SRTM 1 Arc-Second Global DEM [24]. To estimate the average angle of the slope representative of each entire site, we averaged the slope value of a 5 � 5 pixel moving window centered on each site.

Vegetation data collection
During two field campaigns (fall 2013 and summer/fall 2014), we collected our vegetation data in the 23 randomly placed sites using an adapted version of the Gentry transect protocol [30], based on Cámara & Díaz del Olmo [31]. In each of the 23 sites, we established one 50 m � 2 m (100 m 2 ) transect, sampling a total area of 2300 m 2 . In our adapted version, we recorded vines, trees and shrubs over 2.5 cm dbh and 1.5 m or more in height. We used height and branching as criteria for differentiating between trees, woody plants � 6 m high and shrubs, < 6 m and ramified from their base [32].
To digitize our vegetation data, we used LIBREOFFICE CALC worksheets [33]. In order to avoid the use of unaccepted names, duplicates and synonyms, we cleaned our data consulting international databases [34,35] and reviewing previous research [36,37].
An expert botanist from the National Botanical Garden of Santo Domingo aided us with the identification of the species. We conducted all fieldwork under a permit issued by the Ministry of Environment and Natural Resources and also requested permission to access private land orally and on-site when required.
Based on the environmental variables, we generated a distance matrix (1-proximity) using the machine learning algorithm Random Forest in unsupervised learning mode, a suitable method for variables of mixed types [41]. We configured the algorithm to sample cases with replacement growing 1000 trees and to use out-of-bag proximity estimation. Subsequently, we used the distance matrix as the source for AGNES agglomerative hierarchical clustering algorithm with Ward's method (agglomerative coefficient of 0.91), then we divided the tree into site groups and characterized them accordingly using environmental traits.
In order to assess the association of species with groups of sites, we calculated the following two indices: (1) Indicator value index (hereafter IndVal), which assesses the value of a species as a bioindicator. This index was originally proposed by Dufrêne and Legendre [49], as the product of quantity A, which measures the positive predictive value of a species as an indicator of the site group, and B, which measures the fidelity or sensitivity of the species with the site group. There are variants of the index for presenceabsence data and for individual-based data. Also, in the case of unequal group sizes, the authors suggest the equalization of the relative sizes of all site groups. As we collected plant abundance in field campaigns, and our groups are unequally sized, we opted to calculate an "individual-based group-equalized indicator value index" using the following formula [21]: ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi IndVal g ind q ¼ ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi A g ind � B pa q ¼ ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi where A g ind is the mean abundance of the species in the target site group divided by the sum of the mean abundance values over all groups (hereafter A ind ), B pa is the relative frequency of occurrence (presenceabsence) of the species inside the target site group, a p is the sum of the abundance values of the species within the target site group, n p is the number of occurrences of the species within the target site group, N p is the number of sites belonging to the target site group, k is the number of site groups, a k is the sum of the abundance values of the species in the kth site group, and N k is the number of sites belonging to the kth site group.
(2) Pearson's phi point-biserial correlation coefficient (hereafter r pb ), in its group-equalized variant, which is suitable for determining the degree of preference of a species for a specific site group among a set of alternative site groups, having individual-based data. We calculated this index according to the following formula (same notation from previous equation) [21]: ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi where l is the norm of the vector abundances of the species, N g As stated by De Cáceres and Legendre [21], an advantage of r pb is that it can take negative values, which suggests that species avoid particular environmental conditions. In such a scenario, absences outside the target group contribute to increase the strength of the association, in contrast to IndVal which assumes that having fewer or more absences of a particular species outside the target group is not taken into account for measuring the strength of the association.
Using the function multipatt from package indicspecies, we estimated both indices, IndVal and r pb . This function computes the value of each index for a site group and also for a combination of them. For each species, the function chooses the site group or the combination of them with the highest association value. Afterward, the best matching patterns are tested for statistical significance of the associations, by means of the permutation test and with α set at 0.05.
The permutation test compares an observed statistic with a distribution obtained by randomly reordering the data. Under the null hypothesis of no association, the statistic computed after randomly reassigning the occurrence or abundance values of sites, should be very close to those obtained from unpermuted data. The p-value of the test is the proportion of permutations that yielded the same association values than that observed for the unpermuted data. Therefore, we reject the null hypothesis of no association whether the p-value is lower or greater than the significance level [21].
The strategy of randomly reordering the samples varies according to the number of permutations and the random number generator used (the "seed"). Thus, we assessed the sensitivity of our approach by comparing the results of various tests with different combinations of numbers of permutations and seeds. First, we computed 20 tests, by combining 10 different seeds with 10 3 and 10 4 permutations each. Afterward, we computed 60 additional tests, by combining 20 different seeds with 10 3 , 10 4 and 10 5 permutations each. Overall, we computed 80 p-values, and subsequently we summarized them in a conservative approach by keeping the maximum p-value from the different permutation tests. Thus, we obtained a short list of species significantly associated with habitats, from which we kept only those species with four or more individuals, which at the same time were recorded in at least two sites.
Finally, as a means of validating our results, we performed a redundancy analysis (RDA), which is a type of canonical ordination aimed to extract the structure of a community matrix in relation to an environmental data set. For this, first we generated a Hellinger transformed matrix from our original community data. Afterwards, we fitted the transformed matrix to the environmental matrix using multiple-regression techniques, and summarized the results in "triplot" showing sites, species and environmental variables in a twodimensional space. Finally, we generated a PCA ordination using the Hellinger matrix, and subsequently performed a passive post hoc explanation method, by fitting the environmental variables onto the ordination. We assessed the significance of the environmental variables by means of permutation tests [11,50].

Results
During our field campaigns, we recorded a total of 2158 individuals belonging to 172 species, from which 69 are trees, 68 shrubs, 4 palms, 29 vines and 2 cacti. The vast majority are native (n = 130, 76%) and endemic species (n = 34, 20%). The eight most abundant species represented almost one-third of the abundance, which included Coccoloba diversifolia and Randia aculeata. We collected an average of 93 individuals per site, with a maximum of 175 individuals and a minimum of 47 individuals. The average numeric richness per site was 28 species, with the richest site reaching 44 species and the poorest just 16 species.
To assess the completeness of the sample, we used our abundance data to estimate the expected species richness with several asymptotic methods available in the SpadeR R package [51]. Species richness estimates for our plots ranged from 194 (parametric homogeneous model) to 277 species (uncorrected Chao1), which indicates that our observed richness reached between 62% and 89% of the estimated richness. Although we are aware that we sampled across a relatively small study area (few small plots), we were logistically restricted to this sampling design, and deemed our sampling sufficiently complete to justify further analyses.

Site groups
We classified our 23 sites in six groups, which we characterize below according to environmental traits (see Figure 2

Indicator value index, IndVal
IndVal calculations suggest that 11 species are significantly associated with groups of sites (Table 1). Five species showed significant association with only one group of sites, and the the other six species showed a strong association with combinations of groups of sites.
We highlight that Phyllostylon rhamnoides and Savia sessiliflora are suitable indicators for group A (forests on steep slopes consisting of limestone). Two species, Acacia skleroxyla and Randia aculeata, showed a strong association with forest growing on marlstones in different geomorphological positions. Also, we found that Eugenia foetida is a good indicator of forests on flat marlstone ridges (group E), and tests showed that  Table A1 for a description of the transects. Schaefferia frutescens is significantly associated with group B, which comprises forests on steep slopes consisting of sandy marlstone, intercalated with sandstone and boulders. Moreover, we found that Trichilia pallida is a suitable indicator of group F, which are forests on almost flat river banks or margins (locally steep slopes) consisting of alluvial deposits.

Pearson's phi point-biserial correlation coefficient,r pb
Ecological preference, measured by r pb , showed that 13 species are significantly associated with one group of sites or a combination of them (see Table 2). From this, seven species are shared between this list and that of the IndVal calculations, also showing preference for the same groups of sites with which they were associated through IndVal. In addition, we highlight Ateleia gummifera, Coccothrinax argentea and Leucaena leucocephala, associated with group D, which represent forests on marlstone with tree cover loss reported between 2001 and 2014. Finally, we highlight two other species, Hura crepitans and Picramnia pentandra, which showed ecological preference with group F.

Summary of species associated with site groups
Overall, 16 species are significantly associated with groups of sites by means of IndVal and/or r pb indices (see Table 3). We separated three sets of species based on whether the association to site groups was detected by both indices or only by one of them: (1) Eight species that are significantly associated with site groups by both indices: Ateleia gummifera, Coccoloba incrassata, Eugenia foetida, Exostema caribaeum, Phyllostylon rhamnoides, Pictetia sulcata, Schaefferia frutescens and Trichilia pallida. Thus, these species are indicators and also have significant preference for the habitat types represented in site groups, meaning that associations are likely to be strong.
(2) Three species that are associated with site groups by means of IndVal only: Acacia skleroxyla, Randia aculeata and Savia sessiliflora. These species are indicators of one or more of their associated site groups.
(3) Five species that are associated with site groups by means of r pb only: Coccothrinax argentea, Hura crepitans, Leucaena leucocephala, Picramnia pentandra and Samyda dodecandra. These species show an ecological preference to one or more of the habitats represented by their associated site groups.
Legend: IndVal and r pb : "X" indicates the species is significantly associated with one or more groups of sites, whether by means of only one of the indices or by both of them simultaneously. Species highlighted in boldface are associated to site groups by both indices, which suggests a strong association.

Canonical ordination analyses
We performed a redundancy analysis (RDA), a type of constrained community ordination that incorporates the explanatory variables directly in the ordination process. This technique is suitable for extracting the structure of the composition data set (e.g. the community matrix) that are related to the environmental variables. We found that one-third of the total variance explained by the RDA corresponds to the constrained fraction, which is a high value considering the high complexity of our community matrix.
To represent the RDA trends, we used a triplot, which is a graph that features three types of entities in a single ordination space: sites, response variables (e.g. species) and explanatory variables, which in this case are environmental variables (see Figure 3). The triplot resembles the six clusters identified by the AGNES agglomerative hierarchical Legend: Species: genus and species name in alphabetical order. A, B, C, D, E, F: "X" means r pb shows that the species is associated to the group of sites. r pb stands for phi point-biserial correlation coefficient. clustering algorithm. In addition, the species correlated with the sites are the same highlighted as associated by both IndVal and/or r pb indices to any of the groups previously identified. This is an expected result, since the RDA incorporates the environmental variables in the ordination process.
To test the significance of our environmental variables in relation to the community composition, we used a passive post hoc explanation fitting method by fitting the environmental matrix to a PCA ordination of the Hellinger transformed community matrix. The results of the permutation tests, suggest that both lithology and geomorphology are significant factors in our sample. However, forest loss history and terrain slope resulted in non-significant variables according to the tests (Table 4).

Discussion
We confirm that spatial heterogeneity in the composition of plant communities in semi-deciduous forests of the DR, can be linked to local spatial patterns in geodiversity and topography, with less evidence for the role of past land-use. Specifically, we found support for the hypothesis that plant species are significantly associated with sites characterized by two qualitative factors, lithology and geomorphology. Accordingly, our findings provide new evidence of the links between geodiversity and biodiversity in tropical dry forests, and support the hypothesis of association between forest communities and site groups characterized by lithological attributes found in the previous research [52].
Several studies support most of the species-habitats associations we detected in our study. For example, Phyllostylon rhamnoides (Ulmaceae) and Savia sessiliflora (Phyllanthaceae) often occur on calcareous soils within dry tropical forests [5,53,54]. Similarly, Acacia skleroxyla (Leguminosae) is considered a calciphilous species (i.e. adapted to life in calcium-rich and clayish soils typically developed on marlstone) and Randia aculeata (Rubiaceae) is commonly reported in fertile  Table A1 for a description of the transects.  [54,57,58]. Conversely, knowledge of these species-habitat associations allows us to quickly target high, or highly complementary, plant diversity when gazetting coreprotected areas or conducting restoration efforts. For example, high representation of diverse plant species in a given gazetted area could be achieved by focusing on selecting plots across gradients in lithology [59,60]. This would allow for high species turnover (i.e. unique species in plant communities across lithology types), low nestedness of plant communities (i.e. local plant communities are not merely a subset of the wider species pool), and thus a high beta diversity between locations and ultimately a high gamma diversity for the entire area. And, such an exercise could be facilitated by a focus on the presence of just a few indicator species such as P. rhamnoides (limestone), T. pallida (alluvial), and Acacia skleroxyla (marlstone); all significantly associated with one particular lithology type and distinctly plotted at the extremes of lithologyrelated gradients in the community ordination plot (Figure 3).
Although we found strong species-habitat associations, we acknowledge that factors other than lithology and geomorphology, play a significant role on the plant species distributions of the dry forest analyzed. Future research may address to what extent climate, drainage and soil nutrients at site-level, as well as biotic interactions (e.g. pollination and dispersal), determine vegetation communities in these forests [61][62][63]. These challenging goals may be addressed using fine resolution instrumentation (e.g. soil moisture loggers) and periodic monitoring.
Although we found no significant association between plants and forest loss following the passive post hoc explanation fitting method, we do recognize that both IndVal and rpb calculations suggest a strong association with forest loss for certain species. For example, Coccothrinax argentea (Arecaceae), an endemic palm, is strongly associated with sites that experienced recent tree cover loss (group D), which is in line with this species' needs for direct sunlight and welldrained soils [64]. Similarly, the presence of a species such as Leucaena leucocephala (Leguminosae) in this group-present at all groups of sites but particularly associated with group D-can easily be explained, as this agrees with the notion that this is an invasive species that quickly becomes abundant under open canopies (e.g. elsewhere it is known to invade pastures) [65][66][67][68]. More surprisingly, we also found some species that were not previously considered to be associated with forest disturbances or conditions of direct sunlight occurring in group D, such as Ateleia gummifera and Pictetia sulcata (both Leguminosae), which suggests that these species may be related to disturbed forests or calcium-rich soils [54].
We propose that our analyses hold potential for the development of site-specific management in these forests and may support their conservation by restoration practices and protection. First, we show that geodiversity metrics, which are relatively easy to obtain from remote sensing imagery and official maps, are useful predictors of plant communities in dry forests of the DR. Second, at relatively small-scales, we detected considerable variation in forest composition, a variation that could be related back to the presence of a small set of indicator species. Since it is logistically challenging to survey all dry forests in the DR or the wider Caribbean region, we might thus focus our attention on this suite of indicator species to protect a large and complementary range of biodiversity. And third, we showed the potential of Random Forest algorithms, association analyses based on IndVal and r pb indices, and ordination techniques, in filling in gaps in our understanding of the spatial clustering, association, and variation of vegetative communities of DR's dry forests. Taken together, these three arguments support the notion that our approach can be replicated across other forests and countries in the region and that our methods can be scaled up to aid regional conservation planning.

Disclosure statement
No potential conflict of interest was reported by the author(s).