Assessing FAIRness of citizen science data in the context of the Green Deal Data Space

As part of the European Data Strategy, the European Commission is working on common European data spaces, including a Green Deal Data Space (GDDS) that covers issues such as climate change, circular economy, pollution, biodiversity, and deforestation. The successful development of the EU GDDS will depend on the availability of FAIR (findable, accessible, interoperable, and reusable) data sources, including FAIR citizen science data. While the importance of FAIR principles is increasingly acknowledged within the field of citizen science, sources of FAIR data outside the biodiversity domain are generally scarce. This scarcity is partly due to the lack of end-to-end technical solutions, readily available semantic resources to support data interoperability, and centralised data repositories suited to citizen science data. To investigate the current state of play with citizen science data FAIR compliance, we conducted a review to elicit platforms, tools and standards either used by or indicated as suitable for facilitating stages of the citizen science project lifecycle. We report on the results of our review and discuss gaps that still exist to achieve citizen science data FAIRness. We also examine three data aggregation platforms identified in our review which closely align with FAIR, namely: the Global Biodiversity Information Facility, OpenStreetMap, and Sensor.Community.


Introduction
To overcome the challenges of climate change and environmental degradation, the European Union has adopted the European Green Deal (EC 2020) as a way 'to transform the EU into a modern, resource-efficient and competitive economy, ensuring: no net emissions of greenhouse gases by 2050; economic growth decoupled from resource use; no person and no place left behind' (EC 2023).
As part of the European Data Strategy towards establishing a Single EU Market for data, the European Commission proposes European data spaces across several domains to enable easy data flow between countries and sectors (EC 2022). This includes a Green Deal Data Space (GDDS) covering issues such as climate change, circular economy, pollution, biodiversity, and deforestation (Farrell et al. 2023). To be useful to decision makers, the GDDS will need FAIR (findable, accessible, interoperable and reusable) data sources (INSPIRE 2022).
The FAIR principles focus on machine capability to automatically find, access, interoperate with and reuse assets, thereby promoting open science, which may be defined as: 'a collaborative culture enabled by technology that empowers the open sharing of data, information, and knowledge within the scientific community and the wider public to accelerate scientific research and understanding' (Ramachandran, Bugbee, and Murphy 2021).
While there is no full agreement in the scientific community on how FAIR should be evaluated in practice (Peng 2023), the Go FAIR initiative defines a FAIR assessment framework consisting of 10 principles and sub-principles, with a total of 15 criteria (GO FAIR 2016).

Citizen science
The term 'citizen science' primarily emerged from the field of biodiversity (Bonney et al. 2009); it can be defined as general public or non-expert participation in scientific processes to produce or enrich scientific knowledge (Eitzel et al. 2017). There is no single agreed definition of citizen science; a comprehensive list of definitions can be found in Haklay et al. (2021).
While the main purpose of citizen science projects varies, most collect data as a part of their activities. This data can play an important role in complementing official data sources (Fritz, Costa Fonte, and See 2017; Haklay, Mazumdar, and Wardlaw 2018; König et al. 2021; Sullivan et al. 2014; Sy et al. 2020), not least because of its currency and specificity (e.g. Ferri et al. 2020). While the quality of such data remains a concern (Aceves-Bueno et al. 2017; Stevenson, Merrill, and Burn 2021; See 2019), the development of consistent study protocols, advanced data collection and data visualisation tools, data standards and protocols, and machine learning for calibration and outlier detection can help address some data quality issues (Balázs et al. 2021; Fraisl et al. 2022; See 2019).
Active citizen engagement in science is now one of the European Research Area priority actions, as defined in the Pact for Research and Innovation (R&I) in Europe (Council of the EU 2021). The Open Science Policy of the European Commission recognises citizen science as one of its eight policy ambitions, stating that 'the general public should be able to make significant contributions and be recognised as valid European science knowledge producers' (EC 2019). These developments underline the importance of including citizen science data within the GDDS. However, to be fit for integration into any Data Space or be effectively and properly re-used outside of the project that collected the data, citizen science data needs to adhere to the FAIR data principles. This is supported by the '10 Principles of Citizen Science' developed by the European Citizen Science Association (ECSA) where Guideline 7 states that 'Citizen science project data and meta-data are made publicly available and where possible, results are published in an open access format' (ECSA 2015), implying adherence to FAIR principles.
The goal of this work is to investigate the current state of play regarding FAIRness of citizen science data. As such, we explore the following research questions: (1) What platforms, tools and standards are currently used by the citizen science community to collect, document and share citizen science data? (2) Can the tools used by the citizen science community effectively support the production and governance of FAIR data? (3) What gaps still exist to support FAIR citizen science data?
To address these research questions, we conducted a review of scientific publications, citizen science conference publications and major citizen science platforms to identify tools, platforms, standards, and standardised resources used by, or suitable for running citizen science projects. In this paper, we present the results of our review and examine three longstanding initiatives and data aggregation platforms that successfully apply open standards and tools to collect and share community-generated data and are relevant to the GDDS.
In Section 2, we outline the review method, while Section 3 discusses major citizen science project discovery platforms. Section 4 examines three initiatives which closely align with at least some of the FAIR principles. Section 5 discusses the tools that can support citizen science projects at different lifecycle stages and considers open standards that could support citizen science data FAIRness. Finally, Section 6 presents conclusions and discusses gaps that still exist for enabling FAIR citizen science data.

Method
The search for scientific papers was conducted in March 2024 in Scopus and Web of Science using the ('Citizen Science' AND FAIR) keywords in 'title', 'abstract', and 'author keywords'. The search returned 123 results (plus one collection of 28 papers related to FAIR in citizen science). After removing 41 duplicates and screening for relevance, the final set consisted of 32 publications; only those that directly focused on citizen science and discussed FAIR principles were retained.
We additionally conducted a review of (1) ECSA 2022 Conference Proceedings, (2) C*Sci 2023 Conference abstracts and poster presentations, and (3) information resources available on the EU-Citizen.Science, SciStarter, CitizenScience.gov, CSA, and ECSA platforms. This was done to capture current trends in citizen science project management since many projects may not have the resources, sufficient scientific results, or awareness to publish their work in peer-reviewed conferences and journals.
Each source was reviewed to elicit platforms, tools and standards either used by or indicated as suitable for facilitating stages of the citizen science project lifecycle: project hosting, data collection, data documentation, data storage, and data publication and sharing. Figure 1 presents the identified platforms, tools and standards. Table 1 lists discovered information resources relevant to open data and FAIR principles. Table 2 lists semantic resources identified in the review; these relate to the interoperability facet of FAIR. Detailed summaries of the results are available in Appendices 1 and 2.

Citizen science project discovery platforms
Citizen science project discovery platforms facilitate the discovery of citizen science projects hosted on the platform itself and/or other sites. Such platforms might be considered an obvious choice to search for data; however, at present, they primarily focus on project discovery by prospective participants or collaborators and on the provision of guidelines and resources, rather than on the curation of project data. Our review identified three project discovery platforms. Here, we focus on EU-Citizen.Science and SciStarter since CitizenScience.gov only supports US federal government projects.
EU-Citizen.Science 1 was established in 2019, initially funded by the EU Horizon 2020 programme, and now supported by a consortium of 14 partners and nine third parties.It primarily focusses on projects within the EU but is not exclusive to Europe.It contains 271 projects and, in addition to project discovery, offers 220 information resources, a Moodle Training Platform with 24 training courses, and a Swagger API for retrieving full project metadata.
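As a sketch of how such an API might be used to retrieve project metadata programmatically, the snippet below builds a query URL and extracts name/URL pairs from a JSON response. The base URL, query parameters, and response shape are assumptions for illustration only; the platform's Swagger documentation defines the actual endpoints.

```python
import json
from urllib.parse import urlencode

# Hypothetical illustration: the base URL and parameter names below are
# assumptions, not confirmed endpoints; consult the platform's Swagger
# documentation for the real interface.
BASE = "https://eu-citizen.science/api/projects/"

def project_query_url(keyword: str, page: int = 1) -> str:
    """Build a (hypothetical) project-search URL with encoded query parameters."""
    return BASE + "?" + urlencode({"search": keyword, "page": page})

def summarise(payload: str) -> list[tuple[str, str]]:
    """Extract (name, url) pairs from a JSON list of project metadata records."""
    records = json.loads(payload)
    return [(p["name"], p["url"]) for p in records]

# Illustrative response shape (invented for the sketch):
sample = '[{"name": "Air Quality Watch", "url": "https://example.org/aqw"}]'
print(project_query_url("air quality"))
print(summarise(sample))
```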
SciStarter 2 was founded in 2011 and is primarily supported by grants from the National Science Foundation, Institute for Museum and Library Services, Schmidt Futures, NASA, and National Library of Medicine. SciStarter is a global platform covering a range of thematic areas and is more popular among US-based projects. It contains 1528 registered projects, and 426 free and low-cost tools (e.g. designs for sensors and testing kits) for making observations, recording data, and processing samples. The platform offers data hosting, which allows users to submit their observations and permits the visualisation of observations on a map. This enables potential re-users to better evaluate project data for fitness-for-use; however, raw observation data is not accessible to download.
Project discovery platforms deliver the vital function of promoting citizen science projects and offering resources and training for citizen science practitioners. While such platforms continue to grow and evolve (as demonstrated by the large number of projects and resources listed on EU-Citizen.Science and SciStarter), they are unlikely to serve as centralised citizen science data hubs due to the lack of necessary technical resources and data licensing issues, as there is no obligation for the projects to provide open data.

Data aggregation platforms
Our review identified three large-scale initiatives and data aggregation platforms which closely align with at least some of the FAIR principles and are relevant to the GDDS: the Global Biodiversity Information Facility (GBIF), OpenStreetMap and Sensor.Community. We examine these initiatives, highlighting their approaches to achieving data FAIRness.
Before diving into the discussion, it is important to note two governance structures in public participatory science, namely the top-down and bottom-up approaches.
The top-down approach traditionally refers to the type of governance where a central governing body or funder seeks information from the public and makes executive decisions (Liu et al. 2021). This type of governance is also known as 'consultative' and 'functional' levels of participation (Conrad and Hilchey 2011). The benefits of this approach are standardised protocols and data formats that support interoperability: users and machines know what to expect. Drawbacks include a lack of flexibility and the challenges of adopting and implementing rigorous standards imposed by the governance body (Ceccaroni, Bowser, and Brenton 2017); in addition, valuable knowledge from contributors can be lost if its concepts are not captured by a strictly defined data model.
A bottom-up governance structure often results from a community response to a crisis, with the intention to initiate government action (Conrad and Daoust 2008). This type of governance is also referred to as transformative, community-based, grassroots, or advocacy (Conrad and Hilchey 2011; Wolff and Muñoz 2021). In a bottom-up approach, standards are loosely defined, and all members can participate equally in decision-making. There are views that this type of governance is more favourable and leads to more sustainable use of resources (Bradshaw 2003). The main benefits are flexibility and the natural shaping of standards from diverse community contributions. Flexibility can also be a disadvantage, since communal harmonisation and decision-making are slow, and a non-standardised approach affects interoperability and credibility (Bradshaw 2003) when it is impossible to know what to expect from data. Additionally, funding and platform stability can be challenging to maintain (Bradshaw 2003). While the two approaches sit at opposite ends of the spectrum, data collected using a bottom-up approach complements data collected under top-down participation governance (Elwood, Goodchild, and Sui 2012), and policymakers should seek a balance between the two (Marchezini et al. 2017).
The Global Biodiversity Information Facility (GBIF) 3 is an international network that promotes and facilitates free and open access to biodiversity data from across the globe. GBIF was established in 2001 through a Memorandum of Understanding between participating governments, and is now funded by agencies from national governments with voting rights. GBIF accepts data from diverse sources, including citizen science initiatives such as iNaturalist and eBird (also identified in our review). GBIF facilitates searching for species occurrence data, taxonomic information, and biodiversity datasets. The platform contains over 2.5 billion species occurrence records and over 90 thousand datasets. Data is available for download as a zip file in two formats: tab-delimited CSV (only data that has gone through interpretation and quality control), and Darwin Core Archive (DwC-A) (the original data as shared by the publisher(s) and the interpreted quality-controlled data). Each data download has a Digital Object Identifier (DOI) that, in accordance with the licence, must be cited when using the data; this increases transparency and reproducibility by recording the provenance of the data.
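Beyond bulk downloads, GBIF exposes its occurrence store through a public REST API (api.gbif.org/v1) that returns paginated JSON. The sketch below builds an occurrence-search URL and parses a response of the standard shape ({"count", "results", ...}); the sample payload is hard-coded so the example runs offline, and the fields shown are a small subset of a real record.

```python
import json
from urllib.parse import urlencode

def occurrence_search_url(scientific_name: str, country: str, limit: int = 20) -> str:
    """Build a GBIF occurrence-search request URL (api.gbif.org/v1)."""
    params = urlencode({"scientificName": scientific_name,
                        "country": country, "limit": limit})
    return f"https://api.gbif.org/v1/occurrence/search?{params}"

def parse_occurrences(payload: str) -> tuple[int, list[str]]:
    """Return the total matching record count and the species names on this page."""
    body = json.loads(payload)
    return body["count"], [r.get("species", "?") for r in body["results"]]

# Hard-coded sample response with the standard paginated shape:
sample = ('{"count": 2, "results": ['
          '{"species": "Apis mellifera"}, {"species": "Apis mellifera"}]}')
total, names = parse_occurrences(sample)
print(occurrence_search_url("Apis mellifera", "DK"))
print(total, names)
```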
While Darwin Core is the required format for GBIF published data, there is consensus that Darwin Core alone is not sufficient to support a variety of richer and more complex types of biodiversity data. GBIF provides Registered Extensions 4 and actively supports the initiative to evolve its biodiversity data model. 5 The success of GBIF in becoming the largest open biodiversity data provider lies not only in developing a stable software platform but also in providing standardised yet evolving data and metadata standards, best practice documents, and technical tools. The GBIF Darwin Core Archive Assistant, Validator Tool and Integrated Publishing Toolkit facilitate the structuring of data using the DwC-A format, the validation of datasets before uploading to GBIF, and the publishing of datasets through the GBIF network. These resources make the platform more accessible to a wide range of stakeholders, and ensure data openness, correctness, and interoperability.
GBIF, which includes citizen science observation data, is actively working towards observing FAIR principles and only accepts data contributions that align with FAIR. While GBIF is a potential biodiversity data source for the GDDS, some limitations should be noted. The DwC-A data model ensures that data consumers always know how to query data and what format to expect when downloading it, but it also results in the loss of valuable data that does not conform to the model's structure; such data needs to be hosted elsewhere, contributing to data fragmentation. Differentiating citizen science data on GBIF is not a straightforward task; data can be filtered by provider, but providers such as museums can contribute both official and citizen science observations.
OpenStreetMap (OSM) 6 is a collaborative platform and project that aims to create an editable, open-access map of the world from contributions by citizens. A community of volunteers from across the globe use GPS devices, aerial imagery, and local knowledge to map and verify various features, including roads, buildings, parks, rivers, and more. The platform is financed by regular donations, intermittent fundraising appeals, and OpenStreetMap Foundation membership, and is currently hosted with support from University College London and other partners.
An in-depth review aimed at readers with little knowledge of OSM is offered by Mooney and Minghini (2017).Here, we summarise the key features and most prominent services and tools that utilise OSM data.
A vast and ever-evolving range of third-party applications, tools, and services are developed using OSM data. 7 Commercial companies use OSM data for mapping services (Geofabrik), navigation (Mapbox, Mapzen, OSMAnd), live traffic updates and road conditions (MapQuest), and geospatial analytics (CampToCamp). Examples of prevalent free OSM-based services and applications include route planning and navigation for outdoor activities (Komoot), cycling infrastructure and cycling route planners (OpenCycleMap, BBBike), accessibility information for wheelchair users (Wheelmap), and support for humanitarian and disaster response and mapping of the most vulnerable and disaster-prone areas (The Humanitarian OpenStreetMap Team (HOT), Missing Maps). Successful applications of OSM data, not only in open source but also in commercial settings, demonstrate its high value as a re-usable resource.
The core function of OSM is to collect, maintain, and distribute an open global geospatial database, rather than to produce cartographic products and maps (Mooney and Minghini 2017). The OSM conceptual data model of the physical world consists of three basic elements 8 : nodes that define points in space, ways that define linear features and area boundaries (polygons and polylines), and relations that define logical groupings of elements. On creation, each element in OSM is assigned a unique identifier that is also linked to its subsequent versions. An element must contain at least one tag that describes its specific properties; this creates structured metadata and adds essential semantic meaning to each element in the database. There are many resources to guide users in identifying appropriate tags and understanding tag usage (e.g. TagInfo 9 ).
There are many ways in which OSM data can be accessed 10 : download of a complete copy of the OSM database (updated weekly) or the full OSM editing history 11 , regional datasets 12 , unfiltered raw data 8 , and data in GeoJSON format 13 . OSM offers a RESTful Editing API supporting developers and applications in creating, reading, updating, and deleting OSM data programmatically. Such services facilitate easy access to and interoperability of OSM data, a crucial aspect for seamless integration into the GDDS.
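The element model can be illustrated with a short parsing sketch. The XML below mirrors the representation served by the OSM Editing API for a single node (the coordinates and tags are invented for illustration and hard-coded so the example runs offline); the code extracts the persistent identifier, version, position, and the k/v tags that give the element its semantics.

```python
import xml.etree.ElementTree as ET

# Hard-coded XML in the shape returned by the OSM API for a node
# (e.g. GET /api/0.6/node/<id>); values are invented for illustration.
osm_xml = """
<osm version="0.6">
  <node id="240949599" lat="51.5074" lon="-0.1278" version="3">
    <tag k="amenity" v="drinking_water"/>
    <tag k="access" v="yes"/>
  </node>
</osm>
"""

def parse_node(xml_text: str) -> dict:
    """Extract id, version, position, and semantic tags from one OSM node."""
    node = ET.fromstring(xml_text).find("node")
    tags = {t.get("k"): t.get("v") for t in node.findall("tag")}
    return {"id": node.get("id"), "version": node.get("version"),
            "lat": float(node.get("lat")), "lon": float(node.get("lon")),
            "tags": tags}

print(parse_node(osm_xml))
```

The persistent `id` plus `version` pair is what lets downstream consumers trace the full edit history of an element, a property noted above as supporting the Findability facet of FAIR.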
The OSM initiative closely aligns with FAIR by observing good practices of open data. Successful applications of OSM data exist covering all the GDDS themes. Some examples include support for global climate resilience 14 , studies on urban heat islands (Dimitrov, Popov, and Iliev 2021), classification of local climate zones (Fonte et al. 2019), the OSM Circular Economy project (OSM 2022), environmental assessment studies (Kloog, Kaufman, and Hoogh 2018), research on habitat fragmentation and disturbances (Bista et al. 2021; Snell et al. 2020), crowdsourcing mapathons for detecting deforestation (Bratic and Brovelli 2022), and urban forest mapping (PlanIT 2023). OSM presents a valuable resource for inclusion in the GDDS, though an additional layer of applications and semantic resources will be required to facilitate data discovery and data integration with other sources.
Sensor.Community 15 , formerly Luftdaten.info, is an open-source, community-driven project aimed at building and deploying low-cost air quality sensors and providing real-time, high-resolution air quality data at the local level. Luftdaten.info was established by the Open Knowledge Lab (OK Lab) in Stuttgart in 2015 (re-branded as Sensor.Community in 2019) as a German air quality project, and quickly grew into a global citizen science community (although currently most sensors are concentrated in Europe). The project is supported by volunteers and voluntary donations.
Sensor.Community's goal is to raise awareness about air pollution and its potential health and environmental impacts, enable citizens to actively participate in monitoring and improving air quality in their communities, and create a comprehensive dataset that can be used for research, advocacy, and policymaking related to air quality improvement. Some example applications of Sensor.Community data include the Samen voor Zuivere Lucht 16 platform, which combines Sensor.Community data with official data sources; the Samen Meten 17 portal, which harvests Dutch data from the Sensor.Community database; and the hackAIR 18 platform, which uses Sensor.Community data to generate information on air quality, thermal comfort, and the probability of forest fires in Europe.
The sensor kits can be assembled to measure environmental factors (temperature, pressure, humidity), particulate matter pollutants (PM10 and PM2.5), and noise, and once configured, can be registered with the platform.The aggregated results are displayed on a live map from nearly 13,000 active sensors in 78 countries with over 23 billion data points.
Historic data from 2015 onwards can be downloaded from the Sensor.Community Archive 19 automatically by writing custom scripts. Aggregated daily readings for each sensor are served as CSV files with file names indicating the date, type of sensor, and sensor ID. Sensor kits can contain multiple sensors (environmental, pollutants, and/or noise); each of these sensors generates a separate CSV file in the historic database. Location information (latitude and longitude) can be used to identify sensors that belong to the same sensor kit. No standardised metadata is currently supplied to describe the sensor readings in the database.
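A minimal sketch of such a custom script is shown below: it derives an archive file URL from the date, sensor type, and sensor ID, and averages the PM10 readings in one daily file. The URL pattern and column names (e.g. 'P1' for PM10) are assumptions based on files observed in the archive rather than a formal specification, and the sample rows are hard-coded so the example runs offline.

```python
import csv
import io

def archive_url(date: str, sensor_type: str, sensor_id: int) -> str:
    """Build the assumed archive file URL for one sensor's daily CSV."""
    return (f"https://archive.sensor.community/{date}/"
            f"{date}_{sensor_type}_sensor_{sensor_id}.csv")

def mean_pm10(csv_text: str) -> float:
    """Average the PM10 ('P1') readings in a semicolon-delimited daily file."""
    rows = list(csv.DictReader(io.StringIO(csv_text), delimiter=";"))
    values = [float(r["P1"]) for r in rows if r["P1"]]
    return sum(values) / len(values)

# Hard-coded sample rows in the assumed archive layout:
sample = ("sensor_id;sensor_type;lat;lon;timestamp;P1;P2\n"
          "3659;SDS011;48.77;9.18;2023-01-01T00:02:00;12.4;6.1\n"
          "3659;SDS011;48.77;9.18;2023-01-01T00:05:00;14.0;6.7\n")
print(archive_url("2023-01-01", "sds011", 3659))
print(round(mean_pm10(sample), 2))
```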
Sensor.Community data is currently free and open, with clearly documented licence conditions. It fulfils a number of the FAIR principles, and has the potential to complement official environmental data sources at a local scale in the GDDS context of climate change and pollution. To accomplish successful inclusion within the GDDS, Sensor.Community data (like most sources of air quality information) could be semantically enriched using controlled vocabularies such as the DEFRA Air Pollution Glossary, the Eionet Data Dictionary, or others to ensure seamless integration with other sources. Additional APIs or service layers could facilitate data search (by sensor ID, date/time, location, etc.) and the aggregation of measurements from the same sensor kits (e.g. for calibration or data quality estimation).

Discussion: FAIRness in citizen science data
As discussed in the previous section, there are two main models of participation governance, the top-down and the bottom-up approach, which shape how data FAIRness can be achieved in citizen science. GBIF follows a top-down approach by specifying rigorous standards for data contribution (DwC-A and EML). It observes FAIR by setting metadata and data requirements and assigning DOIs (F), offering an API and machine-readable interface (REST + JSON) (A), using the Ecological Metadata Language (EML) and DwC-A (I), requiring Creative Commons data licences, and recording data provenance (R).
OSM and Sensor.Community are examples of bottom-up approaches where data structure and documentation emerged from community contributions. OSM free-text tagging has evolved into a database of community-accepted, commonly used tags. The use of persistent identifiers facilitates the recording of the full history of changes to the nodes (F); various applications provide data search and download capabilities (A); consistent data formats support interoperability (I); and the DbCL v1.0 licence ensures the traceability of data (re-)use (R). The structure of Sensor.Community data is defined by the specific sensors used to collect data, but as new sensor kits become available, new data fields will emerge. Interoperability and Reuse are facilitated by using a simple data format (CSV) and by offering data under the DbCL v1.0 licence. Further alignment with FAIR can be achieved by tagging with semantic resources (F, I) and developing a search and download API (A).

Tools that support citizen science
To achieve data FAIRness, projects must follow good practice from the project planning stage and produce (or adopt) a suitable Data Management Plan. However, many citizen science projects may struggle with finding and selecting a compatible set of tools, standards, and protocols to support them through all stages of the project lifecycle. Adding to this challenge, free and open-source tools typically carry several, but not all, functions needed to deliver a project end-to-end. For instance, the primary role of Zooniverse (discussed in Section 5.2) is project hosting, with additional facilities for basic project search and data storage (for active projects), but no support for data publishing. In this section, we discuss in detail some of the more prominent tools, resources and standards identified in the review to explore the functions such tools can offer; a full summary is in Appendices 1 and 2.

Planning and conception of data governance
The first step in achieving data FAIRness is a strong Data Management Plan that (among other things) considers participation consent, (meta)data formats, (meta)data standards and vocabularies, data structuring, data documentation, data licensing, data hosting, and data sharing.Most citizen science platforms, network websites, and online tools provide free supporting materials, guides, and/or training courses for citizen science project managers, educators, researchers, citizens, and other stakeholders.
EU-Citizen.Science, for instance, offers training courses including 'FAIR Data in Citizen Science Projects' and 'Doing Citizen Science as Open Science' (other resources summarised in Appendix 1).
Advanced search and filtering of such resources is not yet supported, so citizen science stakeholders either need to know what they are looking for or to manually inspect the resources that appear relevant.

Project hosting
Citizen science projects can either be hosted on a dedicated third-party platform or can develop their own infrastructure for participation and data collection.The latter can be resource-intensive, depending on the project ambition and the complexity of the platform required.Our review identified 12 project hosting platforms which are summarised in Appendix 2. Here, we discuss Anecdata, CitSci and Zooniverse, since at present these platforms support the largest number of citizen science projects.
Anecdata 20 is a free community science platform founded in 2014 by the Community Lab at the MDI Biological Laboratory in Bar Harbor, Maine. It is well suited for more complex biodiversity protocols such as recording absence data, water quality monitoring, litter recording and clean-up, and collection of non-biodiversity image observations. Anecdata allows project owners to create projects, define data sheets with multi-dimensional data, select participation mode, and share data publicly or keep it private to the project. The platform offers a free mobile app (iOS and Android) to collect observations from the field with support for geoprivacy. Either the Creative Commons Attribution 4.0 International License or the Open Data Commons Attribution License (ODC-By) v1.0 can be selected for the data collected via the Anecdata platform. The platform contains over 300 active projects, 15,500 users, 111,000 observations, and 74,000 photos and images.
Anecdata facilitates access to public observations in a tabular format or displayed on a map. Observation data can be filtered by project name, date range, user who submitted the observations, and location. There is no option to filter observations by tags, which limits the ability to obtain all observations for the required domain or topic.
CitSci 21 offers free tools for the entire citizen science project process, from project creation, management of participants, and building custom data sheets to collecting data, analysing data, sharing data, and gathering community feedback. Observations can be added via a web form or the CitSci mobile app (Android and iOS). Only project members can contribute data; memberships can be open (any registered user can join without owner approval) or closed (requests to join require owner approval). Project owners can permit project members or the public to view data in tabular format or on a map and download data in Excel or CSV format. Most projects choose to restrict data downloads to members only, or entirely disallow downloads. The platform hosts 1,133 projects and has 147,504 observations; most projects are located in the US. While projects created and hosted on CitSci can be automatically published to SciStarter for discovery, it is not possible to search or access data in a straightforward manner.
Zooniverse 22 , formerly Galaxy Zoo, is a free platform designed for projects that need volunteer support in classifying or annotating images, transcribing historical documents, identifying patterns in data, and other classification tasks. Projects can create tutorials, define workflows (sequences of tasks), set questions and drawing tasks, and more. Zooniverse lists 97 active, 243 paused, and 110 finished projects, but project search is limited to the domain and project name. Completed projects can publish aggregated results and reports; however, data downloads are not supported.
Anecdata and CitSci may appear as potential citizen science data sources for the GDDS; however, in practice, their main function is project hosting with limited data discovery for reuse.

Data collection
The requirements for data collection will vary based on the nature of the project. Observation tasks will typically require tools that support custom datasheets, multimedia upload, mobile apps or mobile-friendly web interfaces, and secure data transactions. For sensor data like air or water quality measurements, the management and retrieval of observations and metadata and the implementation of secure Internet of Things (IoT) protocols become essential. Enriching the data will involve annotation, classification, or workflows.
Our review identified 23 data collection platforms which are summarised in Appendix 2. Some of these platforms, e.g. iNaturalist, eBird, and GLOBE Observer, cannot be customised and can only be used to contribute data to their corresponding initiatives. While Natusfera 23 presents an example of interface customisation of iNaturalist, the data is contributed to the iNaturalist platform. Here, we discuss ODK and ArcGIS tools as examples of customisable free and commercial data collection platforms widely applied in citizen science.
Open Data Kit (ODK) 24 is designed for building custom data collection forms on mobile devices to support efficient and reliable data collection, especially in offline or low-connectivity environments. ODK is widely used in public health, humanitarian aid, environmental monitoring, and social research (Hartung et al. 2010; Tom-Aba et al. 2015; Campus et al. 2020). The three key components of the ODK platform are ODK Collect (an Android application for rendering data collection forms and capturing data in the field), ODK Build (a no-code, web-based survey designer tool for customised forms), and ODK Central (the ODK server that acts as a central repository).
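ODK forms are typically authored as XLSForms: spreadsheets with a survey sheet defining the questions and a choices sheet defining answer lists, which the ODK tools render as a mobile data collection form. A minimal sketch for a species observation form might look as follows (field names and choice lists are illustrative only):

```
survey sheet:
type                 name      label
text                 observer  Observer name
geopoint             location  Observation location
image                photo     Photo of the observation
select_one species   species   Species observed

choices sheet:
list_name  name    label
species    apis    Honey bee
species    bombus  Bumblebee
```

Standard question types such as geopoint and image give every record a location and supporting evidence, which in turn makes the collected data easier to document and reuse.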
ArcGIS 25 is a commercial cloud-based software toolkit for capturing, managing, analysing, and displaying geospatial data, which is used by a variety of citizen science projects (e.g. Hawthorne et al. 2015; Spear, Pauly, and Kaiser 2017; Chmielewski et al. 2018). ArcGIS Survey123 26 offers a fully customisable survey product for data collection via a web browser or mobile application. Data collection forms can include lines and polygons, images and audio files, and high-accuracy data capture. The ArcGIS QuickCapture 27 survey product for field observations allows capture of images and sensor information from devices on moving vehicles. The ArcGIS Community Science Solution 28 is specifically designed for collecting location-enabled plant and animal observations from citizen scientists and is primarily used by conservation organisations, natural resource departments, and other government agencies.
There are important tradeoffs to consider between cost and technical capacity: open-source tools such as ODK offer free data collection capabilities, but require technical competency and a private or cloud-based server to run the software code and store data. Commercial ArcGIS solutions provide flexible off-the-shelf data collection capabilities; however, these can be costly for small-scale citizen science projects.

Semantic resources
Semantic resources are essential for data FAIRness: (meta)data standards, controlled vocabularies, or other structured data descriptions (e.g. data tagging) facilitate data discovery, interoperability, (re)use, and integration (especially across domains). Such resources could be integrated within citizen science project hosting platforms to offer pre-populated lists of terms for creating datasheets (with an option for customisation), rather than every project defining its own vocabularies. For instance, CitSci does not endorse any semantic resources (datasheet templates are under development), which results in an unpredictable data structure, ultimately impacting interoperability.
Our review identified eight semantic resources relevant to the biodiversity, environment, bioinformatics, oceanography, and agriculture domains (summarised in Appendix 2). Here, we discuss Darwin Core, EnvO, and the NERC Vocabulary Server to exemplify functions that controlled semantic resources can offer to users.
Darwin Core 29 encompasses two functions: an evolving semantic resource, and a structural data standard for publishing, integrating, and sharing biodiversity information. Darwin Core contains a glossary of terms and 'is primarily based on taxa, their occurrence in nature as documented by observations, specimens, samples, and related information'. Since Darwin Core carries two functions, we will continue the discussion in Section 5.7.

The Natural Environment Research Council (NERC) Vocabulary Server (NVS) 30 is a collection of standardised and hierarchically structured controlled vocabularies, primarily covering oceanographic and related domains, with example applications in citizen science (Busch et al. 2016). The platform comprises vocabularies and thesauri stored as Linked Data in human- and machine-readable formats. NVS supports basic searches based on simple text matching, advanced searches for terms in specified vocabularies or across vocabulary collections, and interrogation of mappings between different vocabularies. An Interactive Query UI 31 provides a simple interface to query the NVS triplestore (the RDF database of all NVS vocabularies) using SPARQL queries. It also allows automatic encoding of SPARQL queries into a single-line string and decoding back into SPARQL query format.
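The single-line encoding of SPARQL queries mentioned above can be illustrated with standard URL percent-encoding. The sketch below assumes an endpoint URL for illustration (the actual address should be taken from the NVS documentation); the query itself is a generic SKOS label search.

```python
from urllib.parse import quote, unquote

# Assumed endpoint URL for illustration only; check the NVS
# documentation for the real SPARQL endpoint address.
SPARQL_ENDPOINT = "https://vocab.nerc.ac.uk/sparql/sparql"

QUERY = """\
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?concept ?label
WHERE {
  ?concept skos:prefLabel ?label .
  FILTER(CONTAINS(LCASE(STR(?label)), "temperature"))
}
LIMIT 10
"""

def encode_query(query: str) -> str:
    """Encode a multi-line SPARQL query as a single URL-safe string."""
    return quote(query, safe="")

def decode_query(encoded: str) -> str:
    """Recover the original SPARQL text from its encoded form."""
    return unquote(encoded)

# The encoded query can be embedded in a single request URL.
url = f"{SPARQL_ENDPOINT}?query={encode_query(QUERY)}"
```

Because percent-encoding is lossless, the encoded string can be shared as a URL and decoded back into the readable multi-line query, which is the behaviour the Interactive Query UI offers.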
The Environment Ontology (EnvO) 32 is a FAIR-compliant community ontology that offers concise, controlled description of environments from microscopic to intergalactic scales. EnvO was established in 2013 as a simple ontology and grew with the support of the ESIP Federation, UN Environment, IOC-UNESCO, and individual contributions. It contains over 7,000 classes and allows requests for new terms and synonyms, enhancements, or reporting of defects via a GitHub issue tracker. 33 Subsets of terms linked to EnvO Internationalised Resource Identifiers (IRIs) for traceability can be generated to suit particular needs. Subsets can be hosted for projects or communities on the EnvO GitHub on request. The ontology can be downloaded from the OntoBee web server 34 , the EBI Ontology Lookup Service repository 35 , or the EnvO GitHub repository. 36

Semantic resources are increasingly used by the scientific community (Leadbetter 2015; Magagna et al. 2021) but as yet are rarely considered by citizen science initiatives. One factor contributing to this is the limited awareness of available resources and their role in data interoperability (and the importance of interoperability itself). Certain semantic resources might be overly complex for citizen science initiatives, but relevant terms can be extracted into custom vocabularies or ontologies and referenced back to the original sources, e.g. using the Semantic Treehouse vocabulary hub (Van den Berg 2023). Tools like OntoPortal 37 can be used to support citizen science communities in building ontology repositories, annotating free text with vocabulary terms, identifying associations between terms, and offering recommendations on semantic resources.
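As an illustration of referencing custom vocabulary terms back to their original sources, the following sketch maps project datasheet fields to authoritative term IRIs. The IRIs follow the published Darwin Core and OBO identifier patterns but should be verified against the source ontologies before use; all field names are invented.

```python
# Sketch of a project-level vocabulary whose terms reference
# authoritative IRIs. The EnvO and Darwin Core IRIs below follow the
# published identifier patterns but should be verified against the
# source ontologies before use; the field names are invented.
PROJECT_VOCABULARY = {
    "habitat": {
        "label": "Habitat type of the observation site",
        "source": "EnvO",
        # Placeholder EnvO class IRI; substitute the verified term IRI.
        "iri": "http://purl.obolibrary.org/obo/ENVO_00000000",
    },
    "scientific_name": {
        "label": "Scientific name of the observed taxon",
        "source": "Darwin Core",
        "iri": "http://rs.tdwg.org/dwc/terms/scientificName",
    },
}

def annotate_record(record: dict, vocabulary: dict) -> dict:
    """Attach source IRIs to record fields found in the vocabulary."""
    return {
        field: {"value": value, "iri": vocabulary[field]["iri"]}
        if field in vocabulary
        else {"value": value, "iri": None}
        for field, value in record.items()
    }
```

Records annotated this way remain interoperable even when the project defines its own field names, because each field resolves to a shared, citable term.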

Data publishing and preservation
To ensure long-term value beyond the project that collected the data (and to ensure FAIRness), data needs to be hosted in an accessible manner. Those projects which do publish their data may use their own infrastructure, which makes the data difficult to discover. Schade and Tsinaraki (2016) revealed that the majority of surveyed projects host their data on a remote server (38%) or a local machine (16%) managed by a project member. However, it remains unclear whether this data is catalogued and discoverable elsewhere. Other projects collect data suited for contribution to larger initiatives that already provide open data capabilities, e.g. iNaturalist, eBird, GLOBE, and Sensor.Community. Ideally, data (or a reference to data) that does not fit domain-specific platforms should be published on a suitable platform so that it can be easily discovered, acquired, and (re)used. Our review identified two data repositories used by citizen science projects: Zenodo and Mendeley Data.
Zenodo 38 is a multidisciplinary open repository designed for research communities to deposit research datasets, software, reports, papers, and other digital research artefacts. The platform was launched in 2013 and is owned by the European Organization for Nuclear Research (CERN). Registered users can deposit research artefacts under closed, open, or embargoed access and at any stage of the research lifecycle, provided that they hold appropriate rights for the materials. Zenodo offers a RESTful API to support the deposit of research outputs, record search, and file upload and download. All uploads are assigned a DOI for traceability.
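As a sketch of how the Zenodo RESTful API could be used for record search, the following builds the request parameters and extracts titles and DOIs from a response. The query parameters and the response shape (hits nested within hits) follow the public Zenodo API documentation, but field availability should be checked against a live response.

```python
# Sketch of a record search against the Zenodo REST API. Parameter
# names and the response shape follow the public Zenodo API
# documentation; verify field availability against a live response.
BASE_URL = "https://zenodo.org/api/records"

def build_search(query: str, size: int = 10, page: int = 1) -> tuple:
    """Return the request URL and query parameters for a record search."""
    return BASE_URL, {"q": query, "size": size, "page": page}

def extract_records(response_json: dict) -> list:
    """Pull (title, doi) pairs out of a search response."""
    return [
        (hit["metadata"]["title"], hit.get("doi"))
        for hit in response_json.get("hits", {}).get("hits", [])
    ]

# With the `requests` library installed, a search could be issued as:
#   url, params = build_search("citizen science air quality")
#   records = extract_records(requests.get(url, params=params).json())
```

Separating request construction from response parsing keeps the example testable without network access and mirrors how a harvesting script for citizen science datasets might be structured.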
Zenodo integrates with other research platforms and services, including: GitHub, for automatic synchronisation between code repositories and associated research outputs; ORCID (Open Researcher and Contributor ID), to connect researchers' ORCID profiles to Zenodo, ensuring attribution and recognition for their deposited research outputs; DataCite, to provide persistent identifiers; OpenAIRE (Open Access Infrastructure for Research in Europe), to index Zenodo content in the OpenAIRE 39 database, enhancing discoverability and accessibility within the open science community; and the CERN Analysis Preservation (CAP) 40 infrastructure, enabling researchers to preserve and share their analysis workflows, code, and associated data in a FAIR manner.
Mendeley Data 41 is a free multidisciplinary open repository designed for long-term data storage. The platform is a product of Elsevier and was launched in 2016. Mendeley Data fully supports FAIR principles (Elsevier 2020); however, it is an institutional data repository and is only available to registered research institutions. All datasets (including the underlying assets and versions) benefit from deep indexing of both metadata and files; metadata is indexed in common search indexes, such as Google Dataset Search, DataCite Search, OpenAIRE (with OAI-PMH), and Share from the Open Science Framework. Artefacts can be deposited under closed, open, or embargoed access. Mendeley Data offers a Digital Commons Data API for managing and searching research artefacts. The platform supports standard metadata schemas such as Dublin Core and schema.org, controlled vocabularies for standard fields, and custom metadata fields which can be configured to use values from existing taxonomies for interoperability, discoverability, and reuse.
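Metadata schemas such as Dublin Core can be produced with very little tooling. The sketch below serialises a flat field-to-value mapping as simple Dublin Core XML using the standard element namespace; the dataset description itself is invented.

```python
import xml.etree.ElementTree as ET

# Dublin Core element namespace (the "simple" DC elements).
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

def dublin_core_record(fields: dict) -> str:
    """Serialise a flat field->value mapping as simple Dublin Core XML."""
    root = ET.Element("metadata")
    for name, value in fields.items():
        elem = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        elem.text = value
    return ET.tostring(root, encoding="unicode")

# Illustrative dataset description; the title and creator are invented.
xml_text = dublin_core_record({
    "title": "Community air quality observations, 2023",
    "creator": "Example Citizen Science Project",
    "type": "Dataset",
    "license": "https://creativecommons.org/licenses/by/4.0/",
})
```

Even this minimal record carries the fields most harvesters index (title, creator, type, licence), which is often enough to make a small project's dataset discoverable.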
Zenodo and Mendeley Data support advanced search by constructing complex text-based queries, though discovering new relevant geospatially tagged resources and datasets can be extremely challenging. Both platforms support dataset updates and DOI versioning, but it is impractical to generate an excessive number of versions. This is a potential limitation for hosting data from long-term or ongoing citizen science projects that generate continuously evolving datasets rather than static data snapshots or regular 'releases'.
Data standards

The Observations and Measurements (O&M) data model is fundamental as the core of OGC Sensor Web Enablement (SWE) standards such as the SensorThings API, WaterML 2.0, and the Sensor Observation Service (SOS). It defines a core set of properties for observing a phenomenon (Figure 2): Feature (an abstraction of a real-world phenomenon), Observation (the act of measuring or otherwise obtaining information about a phenomenon), Feature of Interest (the entity for which the observation is made), Observed Property (a characteristic, attribute, or property of the phenomenon being observed, e.g. particulate matter in measuring air quality), Procedure (the method or process used to make an observation, e.g. instruments, sensors, human observers), and Result (the data obtained from an observation, e.g. a single value, a time series, an image).
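The core O&M properties listed above can be mirrored in code. The following sketch expresses them as plain dataclasses to show how the entities relate; it models the conceptual structure only and is not an implementation of the standard's XML or JSON encodings.

```python
from dataclasses import dataclass
from datetime import datetime

# Minimal sketch of the O&M core properties as Python dataclasses.
# This mirrors the conceptual model only, not the OGC encodings.
@dataclass
class FeatureOfInterest:
    name: str          # entity the observation is about, e.g. a river site

@dataclass
class ObservedProperty:
    name: str          # characteristic observed, e.g. water temperature

@dataclass
class Procedure:
    description: str   # method or instrument used to observe

@dataclass
class Observation:
    feature_of_interest: FeatureOfInterest
    observed_property: ObservedProperty
    procedure: Procedure
    result: float      # a single value here; O&M also allows series, images
    phenomenon_time: datetime

# Illustrative observation; site and values are invented.
obs = Observation(
    feature_of_interest=FeatureOfInterest("River sampling site A"),
    observed_property=ObservedProperty("Water temperature"),
    procedure=Procedure("Handheld digital thermometer"),
    result=14.2,
    phenomenon_time=datetime(2023, 6, 1, 9, 30),
)
```

Keeping these roles as separate entities, rather than flat spreadsheet columns, is what lets downstream standards such as STA link many observations to one feature of interest or one procedure.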
The Environmental Monitoring Facilities (EMF) data model 43 is an example application of the O&M standard. EMF describes each facility as a spatial object in the context of INSPIRE 44 and links observations and measurements of environmental parameters to the facility; citizen science is included as one of the stakeholder initiatives for sharing public data.
OGC SensorThings API 1.1 (STA) 45 provides an open and unified way to interconnect heterogeneous Internet of Things (IoT) devices, data, and applications over the Web. Version 1.0 was published in 2016 and the latest version (1.1) in 2021, developed by the OGC Sensor Web for IoT Standards Working Group (SW-IoT SWG). The standard is designed for organisations that need web-based platforms to manage, store, share, and analyse IoT-based sensor observation data across domains.
The key entities specific to STA are the (Internet of) Thing, defined as 'an object of the physical world (e.g. device) or the information world (e.g. system) that is capable of being identified and integrated into communication networks' (ITU 2012), as well as the associated Location and Datastream (a collection of observations from a single sensor) (Figure 3). Entities such as FeatureOfInterest, ObservedProperty, and Observation are based on the OGC O&M model.
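The entity relationships described above can be illustrated with an STA-style JSON payload in which a Thing is created together with its Location and a Datastream (a pattern STA calls deep insert). Property names follow the SensorThings data model; the sensor, coordinates, and unit definition URI are illustrative.

```python
import json

# Sketch of an STA "deep insert" payload: a Thing created together
# with its Location and a Datastream, following the entity
# relationships in the SensorThings data model. The sensor,
# coordinates, and unit definition URI below are illustrative.
thing = {
    "name": "Low-cost air quality station",
    "description": "Citizen-operated particulate matter sensor",
    "Locations": [{
        "name": "Balcony mount",
        "description": "Fixed sensor location",
        "encodingType": "application/geo+json",
        "location": {"type": "Point", "coordinates": [4.35, 50.85]},
    }],
    "Datastreams": [{
        "name": "PM2.5 readings",
        "description": "Particulate matter, 2.5 micrometres",
        "observationType": (
            "http://www.opengis.net/def/observationType/"
            "OGC-OM/2.0/OM_Measurement"
        ),
        "unitOfMeasurement": {
            "name": "microgram per cubic metre",
            "symbol": "ug/m3",
            "definition": "https://example.org/ugm3",  # placeholder URI
        },
    }],
}

payload = json.dumps(thing)
```

A payload of this shape would typically be POSTed to an STA server's Things collection, after which individual Observations are appended to the Datastream as they are captured.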
STA is particularly relevant for the GDDS because of its increasing use in IoT platforms for environmental monitoring and smart cities, including the FROST Server open-source implementation of STA 46 , an STA-based INSPIRE download service 47 , and the adoption of STA by the French Geological Survey 48 for the national groundwater monitoring system and water quality database.
OGC SensorThings API Extension: STAplus 1.0 49 is an approved international standard and an extension of the STA data model based on requirements from the citizen science community. FAIR principles (in particular, Interoperable and Reusable) are reinforced by adding entities for ownership, licence, and project information when sharing observations. The extension also enables users to express explicit relations between observations and to create groups of observations that belong together.
The STAplus data model describes five entities in addition to the STA (Figure 4): Party (links a user to a Datastream or Group), License (specifies reuse conditions), Project (allows for organising a campaign or project), Group (allows individual Observations to be packaged as a bag or set), and Relation (supports relationships between Observations). The 'OGC Best Practice for using SensorThings API with Citizen Science' document 50 offers practical examples of applying the STAplus extension in the citizen science domain.

PPSR Core 51 is an open data and metadata standard that defines a common framework for describing citizen science projects. The PPSR Core initiative started in 2013, supported by the DataONE PPSR Working Group and SciStarter. It is now maintained by the Citizen Science Association's Data & Metadata Working Group with support from volunteers. The standard is still under development but is designed to enable the sharing of basic common information across databases that catalogue citizen science projects. It facilitates consistent project discovery between all major project discovery platforms, including SciStarter, CitSci, Atlas of Living Australia BioCollect, and CitizenScience.gov. The PPSR Core standard comprises four models: the Common Data Model (CDM) for aggregating citizen science projects into programs or campaigns within a common organising framework; the Project Metadata Model (PMM) for describing the purpose, responsible parties, participation and engagement, and other contextual information for citizen science projects; the Dataset Metadata Model (DMM) for describing collections of observations (e.g. protocols, temporal range, licence); and the Observation Data Model (ODM) for defining domain 'profiles', i.e. core sets of features that should be collected for a given study. The PMM includes some controlled vocabularies, but projects are welcome to adopt other semantic resources, provided that they are clearly referenced.
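In the spirit of the PMM, the sketch below shows project metadata as a simple mapping plus a minimal discoverability check. The field names are indicative only; the authoritative attribute list is defined by the PPSR Core standard, and the project itself is invented.

```python
# Illustrative sketch of project metadata in the spirit of the
# PPSR Core Project Metadata Model (PMM). Field names are indicative
# only; the authoritative attribute list is defined by PPSR Core.
# The project described here is invented.
project_metadata = {
    "name": "Neighbourhood Air Watch",
    "description": "Community monitoring of particulate matter.",
    "responsible_party": "Example Community Association",
    "participation": "Volunteers host low-cost sensors at home.",
    "keywords": ["air quality", "citizen science", "PM2.5"],
    "url": "https://example.org/air-watch",  # placeholder URL
}

def is_discoverable(metadata: dict) -> bool:
    """Check that minimum fields for cross-platform discovery exist."""
    required = {"name", "description", "responsible_party"}
    return required.issubset(metadata) and all(metadata[f] for f in required)
```

A shared minimum set of populated fields is what allows catalogues such as SciStarter and BioCollect to exchange project listings without bespoke mappings for every source.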
Foundational data standards, such as O&M and STA, can serve as the basis for tailored extensions to meet the needs of citizen science initiatives. Clear supporting documentation of best practice (e.g. OGC 2022) plays an important role in providing use cases and improving understanding of how standards can be applied in practice. Additionally, for services based on APIs (e.g. the STA FROST Server), tools similar to the NVS Interactive Query UI could be developed to offer a user-friendly interface for constructing complex API queries and encoding these as URLs.

Discussion and conclusions
The importance of FAIR is increasingly being acknowledged within the field of citizen science, as demonstrated by major citizen science initiatives promoting FAIR (EU-Citizen.Science 2021; ECSA 2023) and recent research into citizen science data FAIRification (Coché et al. 2021; Ramírez-Andreotta et al. 2021; Turicchia et al. 2021b; Alvarez et al. 2022). However, citizen science projects that operate independently from larger initiatives may still lack awareness of FAIR principles, struggle to select suitable standards and tools, or fail to recognise the value of sharing their data outside of the project.
Our review identified a number of tools that can facilitate different stages of the citizen science project lifecycle. Commercial solutions like ArcGIS and SPOTTERON offer a full suite of tools and applications to support an end-to-end data lifecycle; however, these may be costly for smaller-scale projects. Free and open-source platforms and tools are generally limited in the functionality required for end-to-end data lifecycle management in a FAIR way. Therefore, projects might need to select and combine different tools by purpose from different providers, resulting in more challenges to achieving a seamless flow of FAIR data.
On the face of it, it may seem paradoxical that commercially licensed software and platforms are discussed in the context of FAIR data. While 'FAIR' does not necessarily equate to 'open' (Jeffery 2021), FAIR data are required to have clear licence information, ideally in machine-readable form, and citizen science data governance involves obtaining and documenting the consent of contributors for their data to be used and re-used in specific contexts. Any technical tools which assist in this governance might be considered as assisting on the path towards FAIR data. This is also an important consideration in the EU GDDS, which of necessity will bring together commercial and private stakeholders with public sector players, requiring 'transparent but controlled accessibility of data and services' (Mons et al. 2017).
While a vast number of domain-specific controlled vocabularies and other semantic resources exist, our review identified only eight semantic resources used by citizen science projects. This may indicate that independent projects rarely apply standardised semantic tagging because they are either unaware of its importance or unsure which resources to choose from a confusing range. This presents a major gap in data discovery and interoperability, especially cross-domain. In addition, as demonstrated by Ramírez-Andreotta et al. (2021), projects may have to create custom semantic resources by combining subsets of controlled vocabularies and introducing new custom terms to fulfil their needs. Tools for selecting and extending semantic resources, similar to the Eco-Portal 52 tools which practically implement an OntoPortal for the ecological domain, need to be developed for a wide range of domains to support the citizen science community (de Sherbinin et al. 2021).
The availability of centralised data repositories for citizen science data presents another major challenge. Platforms created during time-limited research projects may not be accessible after project funding terminates. Open repositories such as Zenodo and Mendeley Data can be used to publish and share citizen science datasets, but search capabilities are limited (e.g. it is not straightforward to filter for citizen science data). Another limitation is that such platforms only facilitate the publishing of static datasets, which might suit completed projects; dynamic projects will need to publish periodic 'snapshots' of their data. The quality of data collected by citizen science projects may be in question when the methodology is not transparently documented or robust. If a data repository platform is tailored for citizen science, data quality can be improved by AI technologies and validated by expert knowledge, as practised in the iNaturalist and eBird platforms.
The successful development of the EU GDDS will depend on the availability of FAIR data sources, including FAIR citizen science data. Large, longstanding initiatives such as GBIF already offer FAIR data that can be easily integrated within the GDDS. Other large community platforms, such as OpenStreetMap and Sensor.Community, will require additional layers of tools and semantic resources to enable integration. Smaller-scale projects with limited resources may miss opportunities to offer their data for re-use in the absence of free end-to-end solutions to support the production and sharing of FAIR data. This presents a major challenge for the GDDS in delivering the ambition of establishing a Single EU Market for data and integrating citizen contributions as defined by the Open Science Policy of the EC.
FAIRification of citizen science data has significant importance beyond policy making and decision support. Adherence to FAIR principles can improve knowledge mobilisation, strengthening the capacity to conduct research using citizen science data. Production of FAIR data can also help empower communities by making their data more visible to, and accessible by, authorities. It can additionally increase community engagement; for example, a case study on flood monitoring (Wolff 2021) showed that community members highly value the ability to access and share their data.
There is potential for citizen science data to be integrated into environmental Research Infrastructures 53 or e-infrastructures that can serve as intermediaries connecting to the GDDS and supporting data sharing. In exchange for citizen science data, such infrastructures should increase technical and semantic services to facilitate citizen science projects in meeting high Technology Readiness Levels (Mankins 1995) in operational environments. Future citizen science project calls should include a strategic plan on how services developed during the project period will be sustained by connecting them with specific environmental Research Infrastructures such as LifeWatch ERIC, eLTER, or others from the environmental cluster of RIs.

Figure 1. Tools, platforms and standards identified in the review.

Figure 2. Basic structure of the OGC Observations and Measurements Model (adapted from Usländer, Coene, and Marchetti 2012).

Table 1. Information resources related to FAIR and open data identified in the review.

Resource | Source
FAIR Data in Citizen Science Projects | EU-Citizen.Science
Doing Citizen Science as Open Science | EU-Citizen.Science
Basic Regulations and Ethics for Citizen Science | EU-Citizen.Science
UK Environmental Observation Network (UKEOF) Resources | EU-Citizen.Science / UKEOF
Data Ethics for Practitioners | SciStarter

Table 2. Semantic resources identified in the review.

Table A1. Citizen science information resources and training materials. Note: topics listed here are extracted from the descriptions of the resources.

Tools that can support the citizen science project lifecycle.

Data storage: Google Forms is cloud-based survey software included as part of the free, web-based Google Docs Editors suite offered by Google.