Characterizing nature-based recreation preferences in a Mediterranean small island environment using crowdsourced data

ABSTRACT Nature-based recreation is a key ecosystem service that contributes to positive physical and mental welfare but, at the same time, nature-based recreational activities can increase human pressure and impacts on natural areas and biodiversity. Understanding people’s preferences for visiting natural settings is challenging due to data and methodological limitations. Social media data can be used to map nature-based recreation. However, variation in the popularity of platforms and limitations to data accessibility highlight the importance of exploring and using different data sources. We analyzed complementary crowdsourced data using an automated content analysis refined by manual identification to assess nature-based recreation ecosystem services across the Maltese archipelago. A content analysis of images uploaded to Flickr between 2015 and 2021 was performed using the Google Vision machine learning algorithm to identify nature-based interactions, and nature visitation patterns were modeled based on landscape characteristics, environmental variables and socio-economic parameters. Flickr data were compared and complemented with publicly available geolocated data from the iNaturalist platform. A significant difference was found between the spatial distributions of Flickr and iNaturalist data. Generalized linear models identified coastal areas, protected areas, natural habitats and accessibility via the road network as significant predictors of nature-based recreational visits. Localities with a higher percentage of people receiving old age and unemployment benefits were also positively correlated with users’ preference for nature-based recreation. Finally, we discuss how the low-resource methodology developed here to identify people’s nature-based recreational preferences can be used to assess which natural areas should be prioritized for ecological restoration efforts.


Introduction
Enjoyment of nature and engagement in nature-based recreation is recognized as an important cultural ecosystem service that provides a wide range of benefits for individuals and communities (Millennium Ecosystem Assessment 2005). Visiting natural areas and being outdoors can improve physical and psychological wellbeing, enhance social connections and generate revenues that contribute to local economies (Winter et al. 2019). Outdoor recreation and nature's contributions to beneficial health outcomes and quality of life became increasingly important during the restrictions and periods of isolation imposed by the COVID-19 pandemic (Fagerholm et al. 2021). Furthermore, nature-based tourism is a rapidly growing sector (Elmahdy et al. 2017), and many green areas and protected sites are becoming popular destinations that attract an increasing number of visitors every year. The growing trend in nature-based recreation participation is leading to an overuse of natural attractions with negative consequences for environmental resources and biodiversity (Buckley et al. 2016), including vegetation disturbance, soil erosion and, depending on the activity, impacts on air and water quality, noise and wildlife disturbance (Coombes et al. 2008; Steven et al. 2011; Monz et al. 2021). It has, therefore, become crucial to ensure a balance between the use of land for recreation and its ecological preservation (Thomas and Reed 2019).
Increasing mobility, overall wealthier living conditions and more leisure time are driving the demand for recreational ecosystem services. At the same time, pressures arising from the use of the same ecosystems for nature-based recreation can impact biodiversity and lead to ecosystem service trade-offs (Guo et al. 2010; Schägner et al. 2013). Integrating nature-based recreational services in landscape planning and management can guarantee the provision of benefits that support human well-being and help identify mitigating actions to reduce the anthropogenic impacts on ecosystems and biodiversity (Wood et al. 2020). The cultural values of an ecosystem or a landscape are irreplaceable but, despite their importance, these services are rarely considered in ecosystem service assessments, land management and policy-making (Plieninger et al. 2013; Tew et al. 2019; Gould et al. 2019). These limitations often depend on how specific cultural ecosystem services (CES) are defined and on the availability of data to cover different landscapes across spatial and temporal scales.
In recent years, preferences for nature-based recreation have been studied using social media data. Compared to traditional surveys of people's preferences, which can be time-consuming, resource-intensive and spatio-temporally bounded (Louviere et al. 2000; Ilieva and McPhearson 2018), these data sources can offer a high volume of freely available data to characterize people's preferences for nature-based recreation activities (Di Minin et al. 2015; Ghermandi et al. 2020; Sinclair et al. 2022). Previous studies have demonstrated that geotagged photographs uploaded to Flickr effectively represent visitation trends and correlate with observed visitation at recreational sites (Sessions et al. 2016; Salas-Olmedo et al. 2018; Ghermandi 2022). Crowdsourced data from social media platforms can also provide insight into the behavior, preferences or habits of visitors of diverse backgrounds and origins (Su et al. 2016; Hausmann et al. 2017; Levin et al. 2017). This methodological approach has opened new opportunities to assess less tangible cultural ecosystem services, particularly in multifunctional landscapes where too much emphasis is often given to a small number of services that are easily quantifiable and marketed (Van Berkel et al. 2018).
Recent limitations in the use of social media, such as the decreasing popularity of some platforms, including Flickr, and changes in data access policies, highlight an urgent need to refine new methodological approaches (Ghermandi et al. 2023). Combining different data sources, including user-generated content from citizen science platforms, can help overcome some of these limitations. This approach can also provide a more complete characterization of which landscape attributes drive people's preferences for engaging in nature-based recreation (Wartmann et al. 2018). The potential of a multi-source approach has been tested to investigate patterns of species distribution and abundance across large spatial and temporal scales (Liconti et al. 2022), to develop environmental indicators (Hartmann et al. 2022) and to map cultural ecosystem services related to biodiversity (Havinga et al. 2023). Content analysis of the large volume of social media data has also recently been facilitated by technological developments including deep learning computer vision (Huai et al. 2022). Despite this methodological progress, efforts to better characterize human-nature interactions depicted by social media and citizen science platforms in heterogeneous regions composed of a variety of ecosystems, including marine, freshwater, urban or agroecosystems, are still sparse (Sousa et al. 2016).
Complex interactions among diversified landscapes, ecosystem services and societies are especially evident in small islands (Wong et al. 2005), and island ecosystems directly and indirectly promote the welfare of island inhabitants and visitors by providing important regulating, provisioning and cultural ecosystem services (Balzan et al. 2018). Recreation and eco-tourism benefit the economy of local island communities and increase the flow of ecosystem services (Aretano et al. 2013). Nature-based recreation and tourism ecosystem services are well researched in island environments. However, most studies focus on mitigating the environmental pressures arising from human use of these services, while few evaluate the links between island biodiversity and ecosystem service capacities and flows to communities (Balzan et al. 2018). Without effective planning and management, tourism and nature-based recreation can further exacerbate the overexploitation of habitats and ecosystems (Scandurra et al. 2018; Balzan et al. 2018). Mapping community access and exposure to ecosystem services within an island environment can therefore be a useful tool for planners and policy-makers, as it allows the most highly valued landscapes or ecosystems to be located in order to improve ecosystem management (Kim et al. 2019).
The aim of this study was to test a new methodological approach to infer people's nature-based recreation preferences across a multifunctional landscape characterized by diversified land use. The central objectives of this study were to i) combine the use of an image recognition tool and manual classification to identify pictures depicting nature-based recreation from a large volume of Flickr data; ii) compare the spatial distribution of nature-based recreation patterns generated by Flickr and iNaturalist users; iii) compare the temporal variation in nature-based recreational patterns between the less touristic wet period (October to March) and the more touristic dry period (April to September); iv) test whether the potential difference between Flickr and iNaturalist is influenced by environmental and/or socio-economic parameters.

Study area
The Maltese archipelago includes three developed islands (Malta, Gozo and Comino) and several uninhabited islets. The archipelago (35°52'59.99"N, 14°26'59.99"E) is located in the Central Mediterranean Sea, directly south of Italy (Figure 1). The largest island (Malta) is 27 km long and 14.5 km wide, with a total land area of 246 km². The climate is characterized by hot, dry summers with temperatures ranging from 15°C to 31°C and cool winters. The annual average rainfall is approximately 476 mm (National Statistics Office Malta [NSO] 2014). Around 51% of the territory is characterized by agricultural land use, whilst urban and industrial areas cover more than 30% of the surface (Balzan et al. 2018). Besides more than 200 km of coastline, the heterogeneous landscape supports a variety of important and unique natural areas that provide opportunities for nature-based tourism and recreation to both locals and tourists (Balzan and Debono 2018). Across the Maltese Islands, 34 terrestrial Special Areas of Conservation have been declared under the EC Habitats Directive, as well as nine marine protected areas, all of which form part of the Natura 2000 network (ERA 2000). The archipelago has the highest population density in the European Union, with 1,595 persons per km² (EUROSTAT, 2021).

Crowdsourced data acquisition
Crowdsourced data were acquired from the photo-sharing website Flickr, one of the largest available datasets with over 92 million monthly active users and up to 25 million daily new uploads (Ding and Fan 2019). The user demographics of this social media platform are broader than those of other social media sites (Oteros-Rozas et al. 2018), and the platform is particularly popular among nature enthusiasts and professional photographers (Toivonen et al. 2019). Compared to other social media, photos and the available supplementary metadata are easier to extract from Flickr through the application programming interface (API), in accordance with the restrictions set by users. Approximately 200 million photographs on the platform include specific geographic information on where the picture was taken. We searched the Flickr API for all geolocated images that were taken within the Maltese archipelago and uploaded between 2015 and 2021 (January to October). We extracted geotagged and publicly available photo metadata including location, user ID, the date when each photo was taken, image Uniform Resource Locator (URL) and user information using the 'photosearcher' package in R (Fox et al. 2020). A total of 41,839 images uploaded by users between 2015 and 2021 were downloaded. Photos that were missing URLs or had null geographic coordinates (612) were excluded from the analysis. Images taken during the wet season (October to March) and the dry season (April to September) were grouped.
Data retrieved from the Flickr platform were complemented with information from iNaturalist, a popular online social network for documenting species observations with more than 1,460,000 users (Aristeidou et al. 2021). iNaturalist observations, which are usually a record of a single organism, often include metadata such as geographic coordinates, date, time, taxonomic identification and other user-defined data fields. We downloaded 3,551 publicly available observations taken between 2015 and 2021 from the iNaturalist API using the 'rinat' package (Barve et al. 2017) in R (R Core Team, 2020). Observations without geographic coordinates were removed from the dataset. After downloading data from both Flickr and iNaturalist, we clipped observations to the Malta boundary using ArcGIS version 10.4 (ESRI, 2016).

Crowdsourced data content analysis
A content analysis of all 41,227 photos retrieved from Flickr was performed with the Google Cloud Vision API during October 2021. The URL extracted from the metadata of each Flickr photo was analyzed by the Cloud Vision API Label Detection function using the R package 'RoogleVision'. The Google Cloud Vision algorithm returned a list of 10 labels for each image and a confidence score (between 0, no confidence, and 1, high confidence) associated with each label. The analysis was limited to labels with a score of 0.6 or higher. A total of 273,447 labels were retained for further analysis.
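The score-based filtering step described above can be illustrated with a minimal sketch. The study carried out this step in R; the Python below is only an illustration, and the function name `filter_labels`, the tuple layout and the example annotations are hypothetical. Only the 0.6 cutoff comes from the text.

```python
from collections import Counter

SCORE_THRESHOLD = 0.6  # cutoff stated in the text

def filter_labels(annotations, threshold=SCORE_THRESHOLD):
    """Keep (photo_id, label) pairs whose confidence score meets the threshold.

    `annotations` is an iterable of (photo_id, label, score) tuples; this
    layout is an assumption made for the sketch, not the API response format.
    """
    return [(photo_id, label)
            for photo_id, label, score in annotations
            if score >= threshold]

# Hypothetical label annotations for two photos.
annotations = [
    ("p1", "Water", 0.97),
    ("p1", "Sky", 0.55),
    ("p2", "Plant", 0.60),
    ("p2", "Building", 0.42),
]

retained = filter_labels(annotations)
# Frequency of each retained label across all photos.
label_counts = Counter(label for _, label in retained)
```

A label with a score exactly equal to the threshold is kept, matching the "0.6 or higher" rule.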
We manually classified each label using a two-level classification process. In the first level, each label was manually categorized as biotic nature or abiotic nature, as defined by two broad cultural services distinguished by CICES v5.1 (Haines-Young and Potschin 2018). Labels defining nature-based recreation (e.g. diving, hiking or people recreating in nature) were used to categorize images depicting nature-based recreational activities. All non-nature labels were excluded from the analysis. To obtain a label dataset of a computationally more manageable size, we used a second-level classification process. Labels used to describe similar categories, including amphibians, birds, butterflies, coastal and oceanic landforms, arthropods, marine wildlife, nature-based recreation, reptiles, other terrestrial animals, terrestrial landscape, terrestrial plants and water, were manually grouped (Table S1).
A manual content analysis was also performed by the same person on a random sample of about 10% of the images (about 4200 images) to evaluate the performance of Google Cloud Vision at assigning labels and classifying photos. Each image was manually classified according to the same types of CES defined above. The agreement between the Google Vision and manual classification of each ecosystem service group was quantified using Cohen's kappa. Inter-rater reliability was calculated using the R package 'irr'. The spatial distribution of label semantics was mapped to explore the precision and accuracy of the labels generated by Google Vision for deriving knowledge of people's preferences for nature-based recreation. A correlation matrix was computed to quantify the number of times that a pair of labels was used to describe the same picture. Data manipulation and visualization were performed using the R package 'corrplot'.
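As an illustration of the two measures named above, the following sketch computes Cohen's kappa for two categorical ratings and counts how often pairs of labels describe the same picture. The study used the R packages 'irr' and 'corrplot'; this Python version is a stand-in under stated assumptions, and the function names and example data are hypothetical.

```python
from collections import Counter
from itertools import combinations

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two equally long sequences of category labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed proportion of items on which both ratings agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each rater's marginal frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical category
    return (observed - expected) / (1.0 - expected)

def cooccurrence_counts(labels_by_photo):
    """Count how many photos each unordered pair of labels shares."""
    counts = Counter()
    for labels in labels_by_photo.values():
        for pair in combinations(sorted(set(labels)), 2):
            counts[pair] += 1
    return counts

# Hypothetical manual vs. automated classes for six images.
manual = ["biotic", "biotic", "abiotic", "non-nature", "abiotic", "biotic"]
automated = ["biotic", "abiotic", "abiotic", "non-nature", "abiotic", "biotic"]
kappa = cohens_kappa(manual, automated)

# Hypothetical label groups attached to two photos.
pairs = cooccurrence_counts({
    "p1": ["water", "bird"],
    "p2": ["bird", "water", "plant"],
})
```

Kappa corrects the raw agreement (5 of 6 images here) for the agreement expected by chance, which is why it is lower than the simple percentage agreement.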
Since we were interested in assessing which places most individuals use to engage in nature-based recreation activities, and to avoid bias linked to a small number of users posting a high number of photographs, each dataset was also filtered to include only one observation or image per person per location. We extracted the number of photo-user-days (PUDs) or unique iNaturalist observations (OUDs) by using a combination of user ID and date to retain only one photo/observation taken by each user every day, deleting multiple photos from the same user on the same day. Data manipulations were performed using the R package 'dplyr'.
Kernel density estimation was used to visualize the presence of spatial clusters of nature-based recreational hot spots (high values) and cold spots (low values). This data smoothing technique represents the intensity of individual observations over space by transforming georeferenced point data into a continuous surface (Bailey and Gatrell 1995).
Patterns and locations of nature-based recreational images were analyzed for the unique Flickr and iNaturalist observations and also for images taken during the wet period (October to March) and the dry period (April to September) to assess the temporal variation in nature-based recreational patterns. The ArcGIS Spatial Analyst module (McCoy and Johnston 2001) was used for the analysis. The default parameters, which are based on the spatial configuration and the number of input points, were selected for the search radius (bandwidth). The output cell size was derived from the cell size environment (our study area), the area unit scale factor was set to square map units, the output raster values were set as densities and the planar method was selected.
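The user-day filter described above can be sketched as follows. The study used 'dplyr' in R; this Python version is illustrative only, and the record fields `user_id`, `date` and `photo_id` are hypothetical names.

```python
def unique_user_days(records):
    """Keep the first record for each (user_id, date) combination.

    `records` is a list of dicts with at least 'user_id' and 'date' keys
    (field names assumed for this sketch). Input order is preserved.
    """
    seen = set()
    kept = []
    for record in records:
        key = (record["user_id"], record["date"])
        if key not in seen:
            seen.add(key)
            kept.append(record)
    return kept

# Hypothetical photo metadata: user u1 posts twice on the same day.
photos = [
    {"user_id": "u1", "date": "2019-05-01", "photo_id": 1},
    {"user_id": "u1", "date": "2019-05-01", "photo_id": 2},
    {"user_id": "u1", "date": "2019-05-02", "photo_id": 3},
    {"user_id": "u2", "date": "2019-05-01", "photo_id": 4},
]

pud_records = unique_user_days(photos)
```

Counting the retained records per grid cell or site then gives the photo-user-days for that unit.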
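To make the smoothing step concrete, the sketch below evaluates a planar kernel density surface on a regular grid. It is not the ArcGIS implementation: it assumes a quartic kernel with a fixed, user-supplied search radius, whereas the study relied on the software's default bandwidth. All function names, coordinates and parameter values are hypothetical.

```python
import math

def quartic_kernel_density(points, x, y, radius):
    """Density at (x, y) from a quartic kernel with the given search radius.

    Each point contributes (3 / pi) * (1 - (d / r)^2)^2 / r^2 when its planar
    distance d to (x, y) is smaller than r, and nothing otherwise.
    """
    r2 = radius * radius
    total = 0.0
    for px, py in points:
        d2 = (px - x) ** 2 + (py - y) ** 2
        if d2 < r2:
            total += (3.0 / math.pi) * (1.0 - d2 / r2) ** 2
    return total / r2

def density_grid(points, xmin, ymin, ncols, nrows, cell, radius):
    """Evaluate the density at the centre of every cell of a regular grid."""
    grid = []
    for row in range(nrows):
        yc = ymin + (row + 0.5) * cell
        grid.append([
            quartic_kernel_density(points, xmin + (col + 0.5) * cell, yc, radius)
            for col in range(ncols)
        ])
    return grid

# Hypothetical projected coordinates (metres) of three observations.
points = [(100.0, 100.0), (150.0, 120.0), (900.0, 900.0)]
surface = density_grid(points, xmin=0.0, ymin=0.0, ncols=4, nrows=4,
                       cell=250.0, radius=300.0)
```

Cells near clustered points receive high values (hot spots), while cells farther than the search radius from every point receive zero.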

Statistical analyses
An overlay grid of 250 m × 250 m square cells was generated across the study area, and the total number of unique uploads within each grid square was calculated to conduct spatial analysis. The spatial overlap of the datasets was compared by calculating the Jaccard index as described by Lehtomäki et al. (2015). The Jaccard index gauges the similarity between two sets and is computed by dividing the number of observations present in both datasets by the number present in either dataset. The index ranges between 0 and 1, with a value of 1 indicating complete overlap between the two sets and a value of 0 indicating that the two sets do not overlap.
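The overlap measure can be illustrated with a short sketch that assigns points to 250 m grid cells and compares the sets of occupied cells. Treating occupied cells as the set elements is an assumption made for this example; the function names and coordinates are hypothetical.

```python
import math

def cell_of(x, y, cell_size=250.0):
    """Column and row index of the square grid cell containing (x, y)."""
    return (math.floor(x / cell_size), math.floor(y / cell_size))

def occupied_cells(points, cell_size=250.0):
    """Set of grid cells that contain at least one point."""
    return {cell_of(x, y, cell_size) for x, y in points}

def jaccard(set_a, set_b):
    """Size of the intersection divided by the size of the union."""
    union = set_a | set_b
    if not union:
        return 0.0  # convention chosen here for two empty sets
    return len(set_a & set_b) / len(union)

# Hypothetical projected coordinates (metres) for the two platforms.
flickr_points = [(10.0, 10.0), (260.0, 10.0), (600.0, 600.0)]
inat_points = [(20.0, 40.0), (900.0, 900.0)]

overlap = jaccard(occupied_cells(flickr_points), occupied_cells(inat_points))
```

In this toy case the two platforms share one of four occupied cells, so the index is 0.25.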
Spatial variation of nature-based recreation inferred from social media was further analyzed to identify landscape characteristics, environmental variables and socio-economic parameters that might explain the nature-based visitation patterns. Datasets from both Flickr and iNaturalist were centered and scaled, producing standard Z-scores for each variable, and checked for multivariate normality. Since the scores were not normally distributed, the Flickr and iNaturalist Z-values were compared using the Wilcoxon Signed-Rank Test. Generalized Linear Models (GLMs) with a negative binomial distribution and principal component analysis (PCA) were adopted to identify the relationships between nature-based recreational patterns inferred from social media data and the identified landscape, environmental and socio-economic indicators within each grid cell. The PCA was used to reduce the dimensionality of the explanatory variable dataset, and subsequently the Flickr and iNaturalist scaled data were fitted onto the ordination using the 'envfit' function in the R package 'vegan' (Oksanen et al. 2022). Explanatory variables selected for the generalized linear model analysis and the PCA were the land use/land cover (LULC) type, road network, coastal zones, protected areas, population size and habitat distribution. The original 16 LULC classes were reclassified into the following 5 classes: (1) crops, (2) garrigue, (3) grassland, (4) urban development and (5) woodland. As potential socio-economic explanatory variables, we combined several measures to represent neighborhood characteristics and included data relating to population size and the number of beneficiaries receiving social benefits as part of social security, unemployment and benefits for elderly people (Table 1).
Three GLMs were used to assess the relationship with the identified variables for (1) the area of each land use/land cover category, (2) environmental characteristics and accessibility, including coastal area, number of protected habitats and area of protected areas, with accessibility measured using the number of roads and bus trips, and (3) socio-economic conditions, using data for social security, unemployment and elderly benefits (Table S2). For all models, a beyond-optimal model was fitted on all explanatory variables, and then backward stepwise regression with the Akaike Information Criterion (AIC) was used to select the best-fit model of the relationship between the response variables of social media nature-based recreation visits and the environmental and socio-economic explanatory variables. For the first model, which used land use/land cover data as independent variables, we combined garrigue and grassland cover into a single variable, as these are often observed together within the study area and the resulting model had lower AIC values than when they were included separately as independent variables. Subsequently, the most parsimonious model was selected using the stepAIC() function in R, starting from the full model. Spatial and statistical analyses were carried out using R statistical software (R Core Team, 2020).
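The centering and scaling step can be written out explicitly. The sketch below assumes the sample standard deviation (denominator n − 1) is used for scaling; the function name and the example counts are hypothetical.

```python
import statistics

def z_scores(values):
    """Center values on their mean and scale by the sample standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample SD, denominator n - 1
    if sd == 0:
        raise ValueError("cannot scale a constant variable")
    return [(v - mean) / sd for v in values]

# Hypothetical counts of unique uploads in eight grid cells.
counts = [2, 4, 4, 4, 5, 5, 7, 9]
scaled = z_scores(counts)
```

After scaling, the values have a mean of 0 and a sample standard deviation of 1, which puts counts from platforms with very different volumes on a common scale before they are compared.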
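The logic of backward selection by AIC can be sketched independently of the model-fitting code. The example below is not the stepAIC() implementation: it takes a caller-supplied function `aic_of` that returns the AIC of a model fitted on a given subset of variables, and a toy scoring function stands in for fitting a negative binomial GLM. All names and numbers are hypothetical.

```python
def backward_stepwise(variables, aic_of):
    """Drop one variable at a time while doing so lowers the AIC.

    `aic_of` maps a tuple of variable names to the AIC of the model fitted
    with those variables. Returns the retained variables and their AIC.
    """
    current = list(variables)
    best_aic = aic_of(tuple(current))
    while current:
        # AIC of every model obtained by removing a single variable.
        candidates = [
            (aic_of(tuple(v for v in current if v != dropped)), dropped)
            for dropped in current
        ]
        candidate_aic, dropped = min(candidates)
        if candidate_aic >= best_aic:
            break  # no single removal improves the model
        best_aic = candidate_aic
        current.remove(dropped)
    return current, best_aic

def toy_aic(subset):
    """Toy stand-in for a fitted model: deviance plus 2 * parameter count."""
    explained = {"coast": 40, "roads": 25}  # 'noise' explains nothing
    deviance = 200 - sum(explained.get(v, 0) for v in subset)
    return deviance + 2 * (len(subset) + 1)  # +1 for the intercept

selected, selected_aic = backward_stepwise(["coast", "roads", "noise"], toy_aic)
```

Starting from the full model, the uninformative variable is removed because dropping it lowers the AIC, and the search stops once every further removal would raise it.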

Crowdsourced data acquisition
We retrieved 41,227 Flickr photos taken by 770 unique photographers. More than 40% of the total images recovered depicted some form of nature-based interaction (abiotic nature, biotic nature or nature-based recreation). The median number of nature-based photographs taken by each unique user was 4, the maximum number of photographs uploaded by a single user was 3071 and 76.6% of users uploaded fewer than 20 photographs. User information was available for only 48.7% of the total users. Based on the available information, 46.5% of the Flickr users considered in this study were international tourists. Only a small percentage of users (2.2%) reported their home location as Malta. We retrieved 3551 iNaturalist observations taken by 1351 unique users. The median number of observations taken by each unique user was 30, the maximum number of observations uploaded by a single user was 116 and 55.5% of users uploaded fewer than 10 observations.

Crowdsourced data content analysis
The Google Vision API returned a total of 273,447 labels, including 436 unique labels. Most of the labels assigned by the Google Vision algorithm that were selected to identify 'cultural services' were related to abiotic characteristics of the landscape (e.g. water, terrestrial landscapes, coastal and oceanic forms) or to biotic features (e.g. tree, plant or grass) (Figure 2).
Cohen's kappa coefficient was 0.80, which indicated that the level of agreement between the manual classification and the automated classification performed by the Google Vision algorithm was 'substantial' (Richards and Tunçer 2018). Overall, the Google Vision algorithm reported a higher number of labels that we classified as 'non-nature'. For instance, photos with labels like 'sky' or 'sunset' were categorized as non-nature even if they may represent biotic, abiotic or nature-based recreational activities. Photos depicting artifacts representing animals (e.g. fountains with bird artifacts) were labeled with keywords that we classified as 'biotic nature'. The Google Vision algorithm also reported high proportions of photographs categorized as biotic (label = plant) that we manually classified as non-nature because they represented urban settings with plants in the background. Most labels (97) were grouped under the 'Terrestrial plants' group, followed by 'Terrestrial landscape' (50) and 'Marine wildlife' (49). Figure 3 shows examples of specific labels generated by the Google Vision algorithm for specific images. Some of the images representing elements of water or seascape were also misclassified by the Google Vision algorithm. For instance, images depicting water ecosystems and labeled as 'lake' needed to be reclassified since there are no lakes in Malta, and the label 'underwater lake' was used to describe sea caves (Figure 3).
Visual analysis of the spatial distribution of each group of labels revealed, as expected, that images identified with labels grouped as 'coastal and oceanic landforms' or 'water and marine wildlife' were geolocated near coastal areas (Figure 4). Images with labels in the 'nature-based recreation' group were located in both coastal and urban areas. Several images characterized by labels grouped as 'water/seascape' were located in inland areas. Manual content analyses revealed that some of these images represented artificial water structures (e.g. fountains) or terrestrial ecosystems (e.g. agricultural fields) that were misclassified by the Google algorithm.
Co-occurrence analysis highlighted that labels depicting nature-based recreation were frequently reported with keywords representing marine wildlife, coastal and oceanic landforms and birds. Labels representing terrestrial animals and birds often co-occurred with nature-based recreation labels (Figure 5). Labels grouped as water/seascape were also often paired with labels in the terrestrial plant group, probably because the Google algorithm misclassified some terrestrial landscapes as water features.

Kernel density estimation
The kernel density estimation identified concentrations of high or low values of images describing nature-based interactions within the dataset. Most observations depicting nature-based interactions were taken in coastal areas and within natural and wildlife reserves. A high concentration (red areas) of social media observations was also found within the dense urban area of the Valletta Grand Harbour region, in the green urban and peri-urban areas surrounding the capital (Figure 6). Spatial patterns of nature-based recreation differed among the data sources. The Jaccard similarity index between Flickr and iNaturalist hotspots and between distribution patterns of geotagged observations taken during wet and dry periods was 0.04, which indicates a relatively low spatial overlap between datasets. Similarly, when the Flickr and iNaturalist Z-scores within the grid cells were compared using the Wilcoxon Signed-Rank test, the distribution of records from the two platforms was significantly different (T = 202,018; p < 0.001). Visual analysis of the kernel density maps indicates a higher concentration of data uploaded during the wet season within urban areas, while during the dry season there is a high concentration of images in rural and less developed areas like Gozo and Comino (Figure 6c-d).

PCA analysis and generalized linear models
A PCA was used to visualize the association between the social media scores and the considered explanatory variables (Figure S1). Principal Component 1 (PC1) explained 42% of the total variance, while PC2 and PC3 explained 20.6% and 10% of the variance, respectively. Each of the other principal components explained 7% or less of the remaining variance. The PC scores indicate a strong positive association of cropland, and positive but weaker associations of garrigue and grassland LULC categories, coastal areas and protected sites, with PC1. A strong negative association of urban cover and road density was recorded with PC1, while weaker negative associations were recorded with the woodland category. When socio-economic variables were considered, population size and beneficiaries of unemployment benefits were negatively associated with PC1, while old age disability and social security beneficiaries were positively associated with PC1 (Figure S1).
The GLM analyses indicate that Flickr was significantly negatively associated with cropland and woodland LULC categories, while iNaturalist showed a significant positive association with urban and woodland LULC (Table 2, Figures S2 and S3). Both Flickr and iNaturalist scores were positively associated with protected habitats, protected areas and the road network, but only Flickr was found to be significantly positively associated with coastal areas (Table 2, Figures S2 and S3). Significant interactions were recorded between these variables: both coastal areas and protected areas showed a significant negative interaction with protected habitat richness for both Flickr and iNaturalist scores, while a positive interaction of coastal areas with road density was recorded for Flickr data (Table 3). Flickr scores were negatively associated with social security beneficiaries but positively associated with beneficiaries of elderly benefits, while iNaturalist scores were positively associated with beneficiaries receiving old age and unemployment benefits. In both cases, no significant direct effect of population size was recorded (Table 4, Figures S2 and S3).

Combining crowdsourced data for nature-based assessments
Using complementary crowdsourced data analyzed by combining an automated image recognition tool and manual clustering, we identified and mapped images depicting human-nature interactions. We developed spatial models to assess the relative importance of different landscape characteristics, environmental variables and socio-economic parameters associated with nature-based recreation opportunities, which are particularly relevant in areas, like the Maltese archipelago, characterized by complex spatial heterogeneity. Integrating CES in land use planning can be challenging in multifunctional landscapes that generate multiple ecosystem services, as more emphasis is usually placed on those ecosystem services that are more easily quantified (Plieninger et al. 2013; Queiroz et al. 2015). Our approach to assessing the spatial distribution of cultural ecosystem services can be a useful tool for policymakers and land-use planners, as it allows the identification of popular nature-based recreation areas in which to prioritize management efforts to improve recreational ecosystem services while preserving biodiversity.
The use of Google Cloud Vision enabled us to obtain information about people's preferences for engaging in nature-based recreation from a high volume of data. Although some manual labor was required to verify the accuracy of the labels generated by the Google Vision algorithm, we were able to process and classify more than 41,000 Flickr images in a few hours. Compared to other available artificial intelligence tools, Google Cloud Vision is a pre-trained machine learning model that is relatively easy to use and does not require high computational power (Lingua et al. 2022). Despite these advantages and the substantial level of agreement between the manual and automated content analysis, limitations associated with the use of Google Cloud Vision were also evident. Our results agree with previous findings (e.g. Richards and Tunçer 2018; Runge et al. 2020; Gosal and Ziv 2020) that the level of accuracy of this automatic classification tool is about 85%. By manually eliminating the non-relevant images and reclassifying the mislabeled keywords, we were able to increase the level of accuracy to about 95%.
We tested a methodology based on manual identification of thematic groups of photos to refine the automated content analysis. This approach was accurate and allowed us to downscale an initially large volume of photos to a manageable size. The majority of images unsuitable for the analysis were omitted, which provided more robust results about complex human-nature interactions. Co-occurrence analysis of labels grouped in different categories also allowed us to extract nature-based images based on semantic descriptors. Textual descriptions, tags or titles generated by users are usually used for co-occurrence analyses that are carried out to improve content-based retrieval (Garg and Gatica-Perez 2009; Bouchakwa et al. 2020). In this study, we used labels generated by Google Vision for the co-occurrence analysis, which further helped verify the content-based automatic annotation of the algorithm. By complementing the label co-occurrence analysis with the geographic distribution of labels, we were also able to further refine connections between labels, landscape characteristics and people's preferences for nature-based recreation.
Table 3. Results of the generalized linear model with a negative binomial distribution relating environmental and accessibility variables calculated for each grid cell to normalized values of social media nature-based visitation. The most parsimonious model (lowest Akaike information criterion with a second-order correction, AICc) for each response variable was selected as the best model.

Exploring complementary crowdsourced data to assess nature-based recreation
Our results suggest that Flickr and iNaturalist may represent different nature-based recreation preferences and, therefore, can provide complementary information. Based on the Jaccard Index, the spatial overlap between these datasets was relatively low, as different data types highlighted different hotspots. iNaturalist data hotspots were mostly concentrated in areas with high population density, while Flickr data hotspots were near coastal and protected areas. These results confirm that combining data from complementary social media networks can allow a more accurate assessment of people's heterogeneous preferences for nature-based recreation, as the use of a single platform may underestimate the contribution of natural attractions and landscapes. Exploring the use of complementary datasets to investigate cultural ecosystem services has become particularly relevant due to the recent limitations and decreased popularity of some social media platforms, including Flickr (Ghermandi et al. 2023). Both Flickr and iNaturalist provided relevant information about human-nature interactions. While Flickr has been extensively used in environmental research (Ghermandi and Sinclair 2019; Fox et al. 2020), the citizen science platform iNaturalist has gained more popularity in recent years, and data retrieved from the iNaturalist portal have been used in a large amount of research related to biodiversity (Mesaglio and Callaghan 2021; Aristeidou et al. 2021) but not to map nature-based recreational activities.
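To make the overlap measure concrete, the following minimal sketch computes the Jaccard Index between two sets of hotspot grid cells as the size of their intersection divided by the size of their union. How hotspot cells were delineated in the study is not restated here, so the cell identifiers and the function name `jaccard_index` are assumptions made purely for illustration.

```python
def jaccard_index(cells_a, cells_b):
    """Jaccard Index |A ∩ B| / |A ∪ B| for two collections of grid-cell identifiers."""
    a, b = set(cells_a), set(cells_b)
    union = a | b
    if not union:
        # Convention chosen for this sketch: two empty sets have no overlap.
        return 0.0
    return len(a & b) / len(union)


# Hypothetical hotspot cells, identified by (row, column) grid indices.
flickr_hotspots = {(0, 1), (0, 2), (1, 2), (3, 4)}
inat_hotspots = {(1, 2), (2, 2), (3, 4), (5, 5), (5, 6)}

overlap = jaccard_index(flickr_hotspots, inat_hotspots)
print(f"Jaccard Index: {overlap:.3f}")
```

A value close to 0 indicates that the two platforms highlight largely different cells, while a value of 1 would mean identical hotspot sets.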
In our study, we considered the occurrence of a photo depicting nature-based interaction as an indicator of people's interest in and use of a particular location. Caution is needed when interpreting these results, as they cannot be used as a surrogate for visitor counts. It should also be considered that the content of an image alone is not representative of the factors that drive people-nature interaction. For instance, it does not indicate whether the nature experience was positive or negative. To better inform policy and decision-makers, the analysis of social media should be strengthened and complemented with information about the personal or emotional value that can be retrieved from the image textual metadata (Fox et al. 2021), while other techniques, which more strongly involve stakeholders, may also be used to obtain an in-depth qualitative understanding of people's preferences, particularly those of locals. Surveys or public participation GIS (PPGIS) methods, involving in-depth discussions with respondents to identify and characterize sites of recreational interest, can be combined with social media analysis to reveal local people's preferences (Depietri et al. 2021).

Factors influencing nature-based recreation
Kernel density estimation was used to visualize and compare the spatial and temporal distribution of nature-based recreation hotspots generated by Flickr and iNaturalist users, while the statistical analyses were carried out using grid data. Results from the GLM aligned with the kernel density estimation in identifying coastal areas, protected areas, protected natural habitats and accessibility via the road network as important for a high appreciation of nature-based interactions. These results support previous studies which showed the importance of protected areas and coastal areas in providing nature-based recreation opportunities (Ruiz-Frau et al. 2020; Cheung et al. 2022). GLM results also showed significant negative interactions of coastal and protected areas with the number of habitats of conservation value and protected areas, representing a trade-off between these two variables in their impact on Flickr and iNaturalist scores and indicating lower visitation for sites with higher conservation value. These results may, therefore, show a negative impact of existing site management and recreational visitation on habitats of conservation value, but could also be a consequence of the distribution of habitats of conservation value, which are more predominant in coastal areas within the study area (Balzan et al. 2018). Localities characterized by a high percentage of beneficiaries receiving old age and unemployment benefits were positively associated with Flickr and iNaturalist activities. These findings imply that these neighborhoods at relative disadvantage are particularly important in generating nature-based visitations despite, as shown in previous research, having low ES capacity (Balzan et al. 2021).
Other studies have often documented conflicting observations, namely that low-income, disadvantaged communities are characterized by lower availability of and access to CES (Hamstead et al. 2018; Gourevitch et al. 2021). Results obtained here are likely to be influenced by the high availability of records from the localities around the Grand Harbour, which has some of the highest population densities within the case-study areas, and which previous results have shown to also have some of the highest ecosystem service flows per unit area within the study area (Balzan et al. 2018), including for recreational ecosystem services (Balzan and Debono 2018). Our results emphasize the importance of supporting ecological restoration and placemaking actions that foster nature-based interactions, particularly in localities at a relative socio-economic disadvantage, while ensuring that the quality of these spaces, their conservation value and ecosystem service capacities are maintained despite the high flows of nature-based recreation activities. This study contributes to recent research using multiple platforms to ensure a wider representation of human behaviors associated with the use of nature for recreational purposes, and provides evidence that social media data can corroborate the magnitude and spatial distribution of anthropogenic pressures in natural areas. Identification of nature-based visitation hotspots can help inform what landscape characteristics attract people and in which activities they engage. Within the context of multifunctional landscapes, like the Maltese archipelago where rural, urban and coastal ecosystems are interconnected, an understanding of the relationships between the different types of landscape characteristics, socio-economic parameters and cultural ecosystem services can be used to prioritize management actions and to improve the contribution of nature-based recreation to well-being while mitigating negative environmental impacts.
Our findings can help land use planners and decision makers to identify areas providing opportunities for human-nature interactions and where management of natural resources should be prioritized based on people's recreational preferences. Nature-based recreation is growing worldwide along with the availability of information through the use of social media and citizen science platforms (Winter et al. 2019). Developing a rapid and low-cost method to inform natural area management is important to guarantee sustainable development, especially when conservation authorities lack the resources to assess visitation and implement management practices.

Conclusions
This study presented a methodological approach that combines an automated image recognition tool with manual clustering, which improved the accuracy of the machine learning algorithm and provided more robust results on where and when people visit a specific location to engage in nature-based recreational activities. We tested the use of two complementary crowdsourced datasets to map and assess the relative importance of nature-based recreational activities on small Mediterranean islands. Sites that have high conservation value, like coastal areas and protected sites, provide opportunities for a plethora of nature-based activities. Road accessibility was also identified as a variable influencing nature-based recreational visits. Our results highlight some spatial overlap of areas rich in biodiversity with recreational hotspots. Monitoring people's visitation patterns and their characteristics in natural areas can inform how biodiversity and the benefits provided by nature-based recreation can be simultaneously supported. However, more research is needed to assess the impact of nature-based recreation within the framework of multifunctional island landscapes, as documented by the case-study of the Maltese archipelago, given that some sites of high conservation value are also highly frequented for recreation purposes.

Figure 1. Map of land use/land cover across the Maltese archipelago. Source: land use/land cover data from Balzan et al. (2018).

Figure 2. Bar plot of the 20 most frequent labels representing nature-based ecosystem services assigned to Flickr photographs in Malta by Google's Cloud Vision algorithm.

Figure 3. Examples of randomly selected photographs shared on Flickr and identified with specific labels within each of the similar groups. All images were uploaded to the Flickr database under a Creative Commons license allowing further non-commercial use.

Figure 4. Locations of different groups of labels characterizing Flickr images across the Maltese archipelago. a) Labels depicting nature-based recreation, terrestrial plants or terrestrial landscape; b) labels depicting amphibians, reptiles, other terrestrial animals, butterflies, birds or arthropods; c) labels depicting marine wildlife, coastal and oceanic landforms or water/seascape. Background map: National Geographic style map.

Figure 6. Kernel density maps representing the hotspots of photo user-days (PUD) or unique iNaturalist observations. The red color indicates the highest density of images/observations in the area, while the blue color indicates the lowest density. a) Photo user-days (PUD) taken by Flickr users; b) observation user-days (OUD) taken by iNaturalist users; c) photo user-days (PUD) taken during the dry season (April to September); d) photo user-days (PUD) taken during the wet season (October to March).
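The photo user-days (PUD) metric named in the Figure 6 caption is commonly defined as the number of unique user and day combinations recorded in a spatial unit, so that several photos taken by the same user on the same day count once. The sketch below assumes that definition and uses hypothetical grid-cell identifiers, user identifiers and dates; it is an illustration and not the code used to produce the maps.

```python
from collections import defaultdict
from datetime import date


def photo_user_days(records):
    """Count unique (user, day) combinations per grid cell.

    records: iterable of (cell_id, user_id, day) tuples (hypothetical input format).
    Returns a dict mapping cell_id to its photo user-days count.
    """
    seen = defaultdict(set)
    for cell_id, user_id, day in records:
        # A set keeps each user-day combination only once per cell.
        seen[cell_id].add((user_id, day))
    return {cell: len(pairs) for cell, pairs in seen.items()}


# Hypothetical geotagged photo records already assigned to grid cells.
records = [
    ("A", "u1", date(2019, 7, 1)),
    ("A", "u1", date(2019, 7, 1)),  # same user, same day: counted once
    ("A", "u1", date(2019, 7, 2)),
    ("A", "u2", date(2019, 7, 1)),
    ("B", "u2", date(2019, 7, 1)),
]

pud = photo_user_days(records)
print(pud)
```

The same counting logic would apply to observation user-days (OUD) if iNaturalist observations were supplied instead of photos.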

Table 1. Environmental and socio-economic explanatory variables considered in the multiple linear regression analysis. *Habitat of ecological importance according to the Habitats Directive ('The Habitats Directive'. Europa. European Commission. Retrieved 26 May 2023). https://msdi.data.gov.mt/

Table 2. Results of the generalized linear model with a negative binomial distribution for land use/land cover variables calculated for each grid cell and normalized values of social media nature-based visitations. The most parsimonious model (lowest Akaike information criterion with a second order correction, AICc) for each response variable was selected as the best model.

Table 4. Results of the generalized linear model with a negative binomial distribution for socio-economic variables calculated for each grid cell and normalized values of social media nature-based visitations. The most parsimonious model (lowest Akaike information criterion with a second order correction, AICc) for each response variable was selected as the best model.
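The model selection rule stated in the table captions can be illustrated with the standard small-sample correction, AICc = AIC + 2k(k + 1)/(n - k - 1), where AIC = 2k - 2 ln(L), k is the number of estimated parameters and n is the number of observations. The sketch below is only an illustration: the candidate model names, log-likelihoods, parameter counts and number of grid cells are invented, and the models are assumed to have been fitted elsewhere.

```python
def aicc(log_likelihood, k, n):
    """Akaike information criterion with second order (small-sample) correction.

    log_likelihood: maximized log-likelihood of the fitted model
    k: number of estimated parameters (for a negative binomial GLM this would
       typically include the dispersion parameter; an assumption of this sketch)
    n: number of observations (e.g. grid cells)
    """
    if n - k - 1 <= 0:
        raise ValueError("AICc is undefined unless n > k + 1")
    aic = 2 * k - 2 * log_likelihood
    return aic + (2 * k * (k + 1)) / (n - k - 1)


# Hypothetical candidate models: name -> (log-likelihood, number of parameters).
candidates = {
    "coast": (-520.4, 3),
    "coast + protected": (-512.9, 4),
    "coast + protected + roads": (-512.1, 5),
}
n_cells = 200  # hypothetical number of grid cells

scores = {name: aicc(ll, k, n_cells) for name, (ll, k) in candidates.items()}
best = min(scores, key=scores.get)
print(best, round(scores[best], 2))
```

In this invented example the third model has a slightly higher likelihood, but the penalty for its extra parameter outweighs the gain, so the two-variable model is retained as the most parsimonious.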