Data trustworthiness and user reputation as indicators of VGI quality

ABSTRACT Volunteered geographic information (VGI) has entered a phase where a substantial amount of crowdsourced information is available and organizations show strong interest in using it. However, assessing the quality of VGI without resorting to a comparison with authoritative data remains an open challenge. This article first formulates the problem of quality assessment of VGI data. It then presents a model that measures the trustworthiness of information and the reputation of contributors by analyzing geometric, qualitative, and semantic aspects of edits over time. An implementation of the model is run on a small data-set for a preliminary empirical validation. The results indicate that the computed trustworthiness provides a valid approximation of VGI quality.


Introduction
Volunteered geographic information (VGI) (Goodchild 2007) is a form of user-generated content (UGC) that is primarily concerned with the survey, collection, and dissemination of spatial data. In recent years, VGI projects have gained increasing attention and, as a consequence, their user base has grown notably. Several studies showed that, while VGI coverage and quality are approaching those of authoritative data-sets in areas with a large number of volunteers, VGI quality is not homogeneously distributed in space (Haklay 2010; Mooney, Corcoran, and Winstanley 2010). Accordingly, new methods to assess VGI quality are highly needed (Elwood, Goodchild, and Sui 2013).
In general, data quality can be regarded as the level of fitness between single pieces of data and the parts of reality that they represent, i.e. the more precisely data correspond to reality, the higher their quality.
Thus, the only way to obtain the "true" quality value of data would be to compare it to reality, but this is impracticable because data and reality belong to two different spaces (conceptual and physical, respectively). In practice, data quality is typically evaluated with respect to four main dimensions: accuracy, completeness, consistency, and timeliness (Wand and Wang 1996).
There are two main approaches to assessing VGI data quality. The first assumes that authoritative data are quality data. Thus, it consists of comparing volunteered against authoritative data-sets and addresses three of the four dimensions mentioned above: accuracy, completeness, and consistency. One main drawback of this approach is that it requires access to authoritative data, which might not be possible because of limited data availability, licensing restrictions, or high procurement costs (Antoniou and Skopeliti 2015). Moreover, while authoritative data are regarded as quality data, there is in fact no absolute guarantee of their correctness and consistency, especially considering the discrepancies between data and reality that arise over time because of the slow update rate of authoritative data-sets. The second approach aims at assessing the quality of VGI by analyzing the evolution of the data itself, i.e. its history or provenance. This approach overcomes the limitations of the first one, as it does not require external data sources and can also take the timeliness dimension into account. However, it provides an approximation of quality rather than an accurate measurement. In other words, it provides a proxy measure for data quality that is referred to as trustworthiness (Dai et al. 2008).
This article presents a system to compute a trustworthiness score for each version of a geographic feature by analyzing its provenance and evolution. In the literature, provenance is typically defined as the historical evolution of data over time, but in the scope of this work we distinguish between data provenance and data evolution. Given a point in time, by provenance we mean the sequence of edits that a feature has undergone up to that time instant; by evolution we mean the edits that the feature will undergo in the future. The edit sequence of a feature over time is assumed to be implemented through versioning. Both provenance and evolution include authoring information. Accordingly, our system computes two scores: (1) reputation is a score associated with a volunteer that denotes her reliability; (2) trustworthiness is a score associated with each version of a geographic feature that depends on both the sequence of edits that the feature undergoes over time and the reputation of its author.
The main underlying idea is that, according to the many eyes principle (Raymond 1999), the quality of a feature improves with time and editing: the more volunteers contribute information about a feature, the higher the probability that errors and inconsistencies are spotted and corrected and, thus, the higher the probability that the feature is correctly mapped in the VGI system. This concept is captured by the notion of confirmation. Other approaches to encouraging crowdsourcing take people's intrinsic motivation into account, as in Martella, Kray, and Clementini (2015).
We distinguish three main effects that make up a confirmation: direct, indirect, and temporal. The direct effect corresponds to the case where a new version of a feature is contributed that leaves a subset of the attributes of the current feature version untouched. We assume that this indicates that those attribute values were already correct, so the overall trustworthiness of the feature must increase. The indirect effect covers the cases where a feature close enough to the feature being confirmed is edited. Since the volunteer modified only the nearby feature and left the confirmed one untouched, this can be considered an indication of the quality of the feature being confirmed. Finally, the temporal effect addresses the persistence of a feature version over time, i.e. the longer a version stays untouched, the higher the probability that it is correctly mapped.
Direct and indirect effects are further divided into three components, to more precisely account for the different aspects characterizing the editing preferences of contributors: thematic, geometric, and qualitative. The first two components have been included to model more precisely the skills and preferences of volunteers. For example, a volunteer with little or no surveying skill may be more inclined to contribute information only about the thematic component of a geographic feature and leave the geometric part untouched. This means that confirmations generated by such volunteers must be weighted differently for the geometric and the thematic aspects.
The qualitative aspect has been introduced to address properties of human spatial cognition. Findings in cognitive science (Mark 1993) have shown that humans conceptualize space in a relative and qualitative manner, i.e. mental representations of space are typically geometrically distorted, but some qualitative relations among objects (e.g. order, direction, and topology) are internalized correctly. This means that while a volunteer may be unable to notice a small geometric imprecision in a piece of data, she or he may much more easily notice a wrong qualitative spatial relation (e.g. two disconnected buildings that have been mapped as touching each other).
The rest of the article is structured as follows. In Section 2, we discuss background work on the subject in more detail. Section 3 is the core of our model: we discuss the types of changes that affect VGI data, we define trustworthiness and reputation as the main aspects that influence VGI data quality, and we distinguish among direct, indirect, and temporal effects. In Section 4, we describe a system architecture that implements our model: the resulting framework, called TandR (for Trustworthiness and Reputation), makes use of domain ontologies for data representation. In Section 5, we carry out an experiment with a selected portion of OpenStreetMap (OSM) data (http://www.openstreetmap.org), whose feature versions were compared to ground truth data to explore how trustworthiness and reputation indexes vary over time following data editing. Section 6 draws brief conclusions and highlights future improvements and extensions.

State of the art
While VGI is a very successful means of collecting geographic information at low cost (Goodchild 2007), it suffers from a major drawback: VGI comes with no assurance of quality (Goodchild and Li 2012). The issue is inherently related to the very nature of VGI, which is provided by volunteers in a relatively unconstrained manner. Indeed, most VGI systems currently in use impose no constraints on their contributors, apart from some guidelines to streamline the input data. So, according to the mainstream philosophy of VGI, no control is in place to guarantee the quality of the contributed information.
Recent surveys (Antoniou and Skopeliti 2015; Senaratne et al. 2017; Degrossi et al. 2018; Fonte et al. 2017) review quality measures and indicators for VGI. Specifically, Senaratne et al. (2017) distinguish among three forms of VGI systems: map-based (e.g. OSM), image-based (e.g. Flickr), and text-based (e.g. geo-blogs). They conclude that the first form is by far the most widespread. As quality measures for map-based VGI, they identify completeness, consistency, positional accuracy, temporal accuracy, and thematic accuracy, which are defined in ISO 19157. As quality indicators, they find trustworthiness, credibility, and reputation, among others. Accordingly, there exist two main methods to assess VGI quality. The first compares VGI data-sets against professionally surveyed ground truth data-sets; in this case, it is possible to assess the aforementioned quality measures. The second aims at deriving quality indicators only by analyzing the VGI data-sets themselves, without a comparison with external sources.
Several studies have been carried out in recent years to assess the quality of map-based VGI (mainly OSM) by comparison against professionally surveyed data-sets. Most studies focused on quality assessment of road networks in England (Haklay 2010), Germany (Zielstra and Zipf 2010; Neis, Zielstra, and Zipf 2012), and France (Girres and Touya 2010), among other countries. Other work focused on general features rather than street networks only (Helbich et al. 2012; Mooney, Corcoran, and Winstanley 2010; Fan et al. 2014). All such contributions came to similar conclusions: (1) coverage and accuracy of map-based VGI data-sets are approaching those of professionally surveyed data-sets and (2) VGI quality is not homogeneously distributed, with peaks in the most populated areas.
Goodchild and Li (2012) suggest three approaches to assure VGI quality without resorting to external data sources. The first is termed "crowdsourcing" and assumes that quality increases with the number of contributors. Basically, this is an adaptation of the so-called Linus's Law, named in honor of Linus Torvalds. Speaking about open-source software, Torvalds stated that "given enough eyeballs, all bugs are shallow" (Raymond 1999). When adapted to crowdsourcing, Linus's Law is sometimes referred to as the many eyes principle: "If something is visible to many people then, collectively, they are more likely to find errors in it. Publishing open data can therefore be a way to improve its accuracy and data quality, especially where a good interface for reporting errors is provided" (http://opendatahandbook.org/glossary/en/terms/many-eyes-principle/). However, Goodchild and Li (2012) conclude that the crowdsourcing approach does not properly apply to less well-known facts, such as geographic features located in sparsely populated areas, where the many eyes needed to assure quality are missing. The second approach mentioned by Goodchild and Li (2012) is named "social". It relies on the construction of a hierarchy of moderators, i.e. individuals who are reputed trustworthy in the community because of the quality of their contributions. The third approach is termed "geographic" and is the one best suited for full or partial automation: it decides on the quality of a feature by comparing it against geographical laws, e.g. a shoreline should have a fractal shape.
The many eyes principle supports the notion of collective intelligence, according to which a group of individuals performs better than its best members. Spielman (2014) analyzes the issue of VGI quality from this perspective. He argues that there are two approaches to quality assessment: validation-by-accuracy and validation-by-credibility. The former corresponds to measuring VGI quality by comparison against ground truth data-sets. The latter aims at deriving VGI quality indicators based on the credibility of the volunteer contributing a feature, which in turn depends on the volunteer's reputation, trustworthiness, and motivation (Bishr and Mantelas 2008). Spielman (2014) concludes that VGI systems should be designed to foster collective intelligence.

VGI quality indicators: trustworthiness and reputation
Bishr and Janowicz (2010), among others, suggest using data trustworthiness (also known as reliability) as a proxy measure for VGI quality. Arguably, the reliability of an object, a person, or an action corresponds to its degree of predictability. Reliability reflects this idea and corresponds to "a bet about the future contingent actions of others" (Sztompka 1999). Mezzetti (2004) provides a compatible definition of trustworthiness, asserting that an entity is trustworthy within a given context if it actually justifies reliance on the dependability of its behavior within that context. Additionally, Bishr and Janowicz (2010) stress the notion of people-object transitivity: the degree of reliability associated with a person propagates to the entities that are connected to her or him. Moreover, they argue that the reputation of a person indicates how reliable this person is considered within a community.
Reliability indicators relating to the spatial features contributed to a VGI system (trustworthiness) and to their contributors (reputation) seem to be valid approximations of VGI data quality. Lodigiani and Melchiori (2016) propose an adaptation of the PageRank algorithm for web pages to derive contributors' reputation. Barron, Neis, and Zipf (2014) introduce a framework that incorporates 25 quality indicators and corresponding methods for assessing OSM data quality. Finally, Keßler, Trame, and Kauppinen (2011) propose to derive reliability scores from the historical information of VGI items. They argue that the edit history of features provides all the information necessary to define trustworthiness scores for the spatial features in a VGI system and reputation scores for their contributors.

Data provenance
In order to use historical information to derive quality indicators, it is convenient to use an appropriate model. Keßler, Trame, and Kauppinen (2011) tackle the challenge by adapting the concept of data provenance (Hartig 2009) to VGI. More specifically, Keßler, Trame, and Kauppinen (2011) introduce the ontology in Figure 1 to represent OSM data provenance. They introduce the concept of "Editing Pattern": a sequence of editing actions from which it is possible to deduce useful information to derive a score for user reputation and data trustworthiness. The identified patterns are "Confirmations", "Corrections", and "Rollbacks", including the special cases of "SelfCorrections" and "SelfRollbacks". Each editing pattern has a different effect on reputation and trustworthiness. Keßler and Groot (2013) suggest that the trustworthiness of a feature should grow proportionally with the number of versions, contributors, and confirmations, and inversely with the number of corrections and rollbacks.

Modeling trustworthiness and reputation
This section details a novel model to derive data trustworthiness and user reputation from feature edit sequences. A first version of the model appeared in D'Antonio, Fogliaroni, and Kauppinen (2014). We drew inspiration from the work of Keßler, Trame, and Kauppinen (2011) and Keßler and Groot (2013) (see Section 2.2), but our model differs from other approaches in the following aspects. Our model: (1) can fit any map-based VGI system (rather than OSM only), provided that the system implements feature versioning and that differences between versions are ascribable to a series of atomic operations, namely creation, modification, and deletion; (2) associates a reputation score with each VGI author; (3) derives a trustworthiness score for each feature version (rather than for each feature); (4) accounts for the impact of changes between two versions according to their relevance; (5) derives trustworthiness and reputation scores also considering evolution information (i.e. future edits), rather than only provenance information.

Model overview
Our first goal was to design a model that derives trustworthiness and reputation scores from both data provenance and evolution. That is, rather than evaluating a feature contributed at a specific point in time only by analyzing the current and historical information in the system, we also exploit information that is yet to come. For this to be possible, two conditions must be satisfied: (1) the VGI system at hand implements versioning; (2) every time an event alters the state of the information system, the trustworthiness scores of the features affected directly or indirectly (see Section 3.5) by the change are also updated.
Versioning is a method that prevents data from being lost and defines a temporal order among feature states (as shown in Figure 2). The state of a feature $f$ is called a feature version and is simply a set of attributes describing the feature at a given point in time. We denote by $f_i$ the $i$-th version of a feature $f$. A new version is established every time a user of the VGI system creates or deletes a feature, or modifies a nonempty subset of the attributes of an existing feature. This user is the author of the version, and each version is associated with exactly one author. The time interval between a feature version and the next one is called the version lifetime. The creation of a feature starts a feature version lineage. Each new version extends the lineage, which terminates with the deletion of the feature.
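To make the versioning assumptions concrete, the following minimal Python sketch models a version lineage and version lifetimes. It is an illustration under stated assumptions, not the data model of the system described here: the class and method names (`FeatureVersion`, `Lineage`, `add_version`, `lifetime`) are hypothetical, time is a plain number, and the lifetime of the latest version is measured up to a caller-supplied `now`.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FeatureVersion:
    """One state f_i of a feature: its attributes, its single author, and its contribution time."""
    index: int                       # i in f_i (1-based)
    author: str                      # exactly one author per version
    created_at: float                # time instant at which the version was contributed
    attributes: Dict[str, str]
    is_deletion: bool = False        # special version that marks a deletion


@dataclass
class Lineage:
    """Version lineage of one feature, extended by each new version in temporal order."""
    feature_id: str
    versions: List[FeatureVersion] = field(default_factory=list)

    def add_version(self, author: str, created_at: float,
                    attributes: Dict[str, str], is_deletion: bool = False) -> FeatureVersion:
        if self.versions and created_at < self.versions[-1].created_at:
            raise ValueError("versions must be added in temporal order")
        version = FeatureVersion(len(self.versions) + 1, author, created_at,
                                 dict(attributes), is_deletion)
        self.versions.append(version)
        return version

    def lifetime(self, i: int, now: float) -> float:
        """Lifetime of f_i: time until the next version, or until `now` for the latest one."""
        start = self.versions[i - 1].created_at
        end = self.versions[i].created_at if i < len(self.versions) else now
        return end - start
```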

Model dynamics
Every time a new feature version $f_i$ is added to the VGI system, the version lineage of $f$ is extended, and we use the new bit of information to update trustworthiness and reputation for a subset of the system's features and authors. For example, assume that feature $f$, currently at version $f_i$, is modified, generating version $f_{i+1}$. The model derives a trustworthiness value for the new feature version by comparing it with previous versions. Concurrently, the reputation of the author of the new version is also updated. The new version also triggers an adjustment of the trustworthiness of previous versions of $f$, as there is now a new version against which they can be compared. Consequently, the reputations of the authors of these versions are adjusted as well. Finally, the newly introduced version indirectly affects the trustworthiness of currently alive versions of "nearby" features (see Section 3.5 for details) and the reputation of their authors. Each time a trustworthiness or reputation score is updated, it is assigned a validity timestamp, which denotes that, from that point in time, the score value accounts for all information available in the VGI system; it remains valid until the next update. Also, in order to support the analysis and monitoring of the system dynamics, scores are never deleted: newer scores simply supersede older ones.
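The bookkeeping rule above (timestamped scores that are superseded but never deleted) can be sketched as an append-only history. This is a hypothetical Python helper, `ScoreHistory`, assuming that updates arrive in temporal order; a query at time $t$ returns the score that is valid at $t$.

```python
import bisect
from typing import List, Optional, Tuple


class ScoreHistory:
    """Append-only history of one trustworthiness or reputation score.

    Each entry carries a validity timestamp; a newer entry supersedes older
    ones, but nothing is ever deleted, so past states remain queryable.
    """

    def __init__(self) -> None:
        self._times: List[float] = []
        self._values: List[float] = []

    def update(self, timestamp: float, value: float) -> None:
        if not 0.0 <= value <= 1.0:
            raise ValueError("scores range in [0, 1]")
        if self._times and timestamp < self._times[-1]:
            raise ValueError("updates must arrive in temporal order")
        self._times.append(timestamp)
        self._values.append(value)

    def at(self, t: float) -> Optional[float]:
        """Score valid at time t: the latest update issued at or before t (None if none yet)."""
        k = bisect.bisect_right(self._times, t)
        return self._values[k - 1] if k else None

    def history(self) -> List[Tuple[float, float]]:
        """All (timestamp, value) pairs, oldest first."""
        return list(zip(self._times, self._values))
```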

Types of edits
We call an edit any operation on the VGI system that generates a new feature version. Some previous approaches (Keßler, Trame, and Kauppinen 2011; Keßler and Groot 2013) focus on specific VGI systems (i.e. OSM) and aim at modeling the emergent editing patterns of the system at hand. Such an approach allows specific patterns to be treated more accurately: one exemplary editing pattern goes under the name of "edit wars" (Keßler and Groot 2013) or "tag wars" (Mooney and Corcoran 2012), where two or more users keep changing the attributes of a feature back and forth between the same states. Nevertheless, the approach delivers a rigid model, in the sense that it hardly applies to VGI systems exhibiting editing patterns different from those for which it has been designed. Conversely, we favored generality and flexibility over specialization and designed a model that responds to atomic editing operations.

Creation
A new feature is added to the VGI system, along with its first version. Creation starts a version lineage that is used at later points in time to derive trustworthiness scores. At creation time, though, there is no historical information available to derive the trustworthiness of the new feature version, so this is set equal to the reputation of its author.

Modification
A new feature version is generated by adding, altering, or deleting some attributes of the latest version of the feature. At modification time, the trustworthiness of the new version is derived, and the trustworthiness of previous versions is updated according to their similarity to all versions in the lineage, which has just been extended by the new version.

Deletion
A feature is removed from the VGI system. In fact, this creates a special version of the feature denoting the end of the feature lineage. The trustworthiness of this version is set equal to the reputation of its author, and the trustworthiness of previous versions is not updated. Indeed, a deletion may denote one of two things: either the feature no longer exists in reality or the information provided is totally wrong. In the first case, the trustworthiness of previous versions should not be affected, as these were established in the past, when the feature still existed. In the latter case, according to the many eyes principle, the wrong information will eventually be corrected by restoring the deleted feature. Restoration is treated as a modification, meaning that the trustworthiness associated with the "deleted" version will decrease drastically and the reputation of its author will decrease proportionally.
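The rules for the three edit types can be summarized in a small dispatch function. This Python sketch is illustrative only: `EditType`, `initial_trustworthiness`, `rescores_previous_versions`, and the `lineage_score` callback (standing in for the lineage comparison performed on a modification) are hypothetical names.

```python
from enum import Enum
from typing import Callable, Optional


class EditType(Enum):
    CREATION = "creation"
    MODIFICATION = "modification"   # includes restoring a deleted feature
    DELETION = "deletion"


def initial_trustworthiness(edit: EditType, author_reputation: float,
                            lineage_score: Optional[Callable[[], float]] = None) -> float:
    """Trustworthiness assigned to a version at the moment it is contributed."""
    if edit in (EditType.CREATION, EditType.DELETION):
        # No comparable history (creation) or no re-evaluation (deletion):
        # the version inherits its author's reputation.
        return author_reputation
    if lineage_score is None:
        raise ValueError("a modification is scored against the lineage")
    return lineage_score()   # hypothetical hook for the lineage comparison


def rescores_previous_versions(edit: EditType) -> bool:
    """Only a modification triggers re-scoring of earlier versions of the same feature."""
    return edit is EditType.MODIFICATION
```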

Impact and aspects of edits
In general, the trustworthiness of a feature at a given point in time is a function of the similarity among the versions in the feature lineage. We adopt the typical geographical information system (GIS) perspective, which considers a geographic feature as consisting of thematic and geometric attributes, but we further add a qualitative spatial level that takes into account more drastic changes that might be caused by small geometric changes. In fact, a small geometric change may drastically change the qualitative spatial relations holding between a feature and its neighbors, and vice versa.
Figure 2. Versioning establishes a temporal ordering on feature states.
Accordingly, for the computation of trustworthiness and reputation, we consider the aspects reported in the following subsections.

Thematic aspect
The thematic aspect describes the nongeometric characteristics of the feature. For example, the fact that the feature represents either a road, a traffic light, or a building with an orange facade. Thematic information is assumed to be represented by key-value pairs.
A simple approach to account for thematic edits is to count the differences between the key-value pairs of the new version and those of other versions in the lineage. A finer-grained alternative is to look at the semantic similarity between the values associated with the different versions. For example, this can be done by considering the lexical distance (i.e. the shortest path) between two terms on the WordNet graph (http://graphwords.com/) (Miller 1995). Alternatively, the shortest distance in an ontology that includes both the previous and the altered key-value pairs can be used.
For example, consider the case where the key "amenity" of feature version $f_i$, initially set to "Establishment", undergoes two thematic edits. The first edit changes its value to "University", and the second to "Educational Institution". If we only count thematic differences, both edits result in a score of one. Conversely, using the lexical distance on the WordNet graph (as shown in Figure 3), the edits can be characterized better: with respect to $f_i$, the first edit has a score of three and the second a score of six.
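The simple count-based measure can be sketched as follows; the finer-grained alternative would replace the 0/1 contribution of each key with a path length on WordNet or on an ontology, which is not shown. The function name `thematic_difference` is hypothetical, and thematic information is assumed to be a Python dict of string key-value pairs.

```python
from typing import Dict


def thematic_difference(a: Dict[str, str], b: Dict[str, str]) -> int:
    """Number of key-value pairs that differ between two feature versions.

    A key counts once if it was added, removed, or had its value changed.
    """
    keys = a.keys() | b.keys()          # union of the keys used by either version
    return sum(1 for k in keys if a.get(k) != b.get(k))
```

With this measure, both edits in the "amenity" example count as one difference, which is exactly the limitation that the lexical distance addresses.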

Geometric aspect
The geometric aspect describes the shape and position of a feature. The relevance of a geometric edit is evaluated with respect to a series of geometric properties, such as perimeter, area, and the number and position of vertices. Versions of a feature (as shown in Figure 4) are compared to detect differences in these geometric properties. For example, version $f_j$ has more vertices than version $f_i$, but a smaller area and perimeter.
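As a minimal sketch of comparing geometric properties between versions, the Python code below computes the perimeter, area (via the shoelace formula), and vertex count of a polygon and reports their change from one version to the next. It assumes simple polygons without holes in planar coordinates; the function names are hypothetical, and how such differences are turned into a score is not covered here.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def polygon_properties(ring: List[Point]) -> Dict[str, float]:
    """Perimeter, area and vertex count of a simple polygon given as an open ring of (x, y) vertices."""
    n = len(ring)
    if n < 3:
        raise ValueError("a polygon needs at least three vertices")
    perimeter = 0.0
    twice_area = 0.0
    for k in range(n):
        (x1, y1), (x2, y2) = ring[k], ring[(k + 1) % n]   # wrap around to close the ring
        perimeter += math.hypot(x2 - x1, y2 - y1)
        twice_area += x1 * y2 - x2 * y1                   # shoelace formula term
    return {"perimeter": perimeter, "area": abs(twice_area) / 2.0, "vertices": float(n)}


def geometric_differences(old: List[Point], new: List[Point]) -> Dict[str, float]:
    """Signed change of each property from one version to the next."""
    p, q = polygon_properties(old), polygon_properties(new)
    return {key: q[key] - p[key] for key in p}
```

For instance, cutting one corner off a square adds a vertex while reducing both area and perimeter, mirroring the relation between $f_i$ and $f_j$ described above.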

Qualitative spatial aspect
Considering the geometric and qualitative spatial aspects together allows spatial edits to be weighted more appropriately. A qualitative spatial change occurs when a modification to the geometric properties of a feature alters at least one qualitative spatial relation between the feature and its neighbors. The impact on trustworthiness is a function of the distance on the conceptual neighborhood graph between the relations holding before and after the qualitative spatial change.
As an example, let us consider topological relations only. Consider the scenario depicted in Figure 5, where two features $f$ and $g$, at versions $f_i$ and $g_j$ respectively, are very close to each other but topologically Disjoint. A geometric edit occurs that slightly modifies $f_i$ into $f_{i+1}$, as depicted on the right side of the figure.
Although the geometric change is very small, it changes the topological relation holding between the two features: they are no longer Disjoint but, rather, they Overlap, which, according to the theory of conceptual neighborhoods (Freksa 1991), is a significant change. The conceptual neighborhood graph for topological relations (Egenhofer and Mark 1995) (as shown in Figure 6) indicates how topological relations may evolve under a continuous transformation. The distance on the graph measures the qualitative spatial difference between two topological configurations. For example, the relations Disjoint and Overlap are at distance two, because there is an intermediate configuration (Meet) between them.
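The distance on the conceptual neighborhood graph can be computed with a breadth-first search. In this Python sketch the edge list for the eight region-region relations is an assumption based on one common form of the graph and may differ from Figure 6; with it, Disjoint and Overlap are at distance two, as stated above.

```python
from collections import deque
from typing import Dict, Set

# Assumed conceptual neighborhood graph for region-region relations (may differ from Figure 6).
NEIGHBORS: Dict[str, Set[str]] = {
    "Disjoint": {"Meet"},
    "Meet": {"Disjoint", "Overlap"},
    "Overlap": {"Meet", "CoveredBy", "Covers"},
    "CoveredBy": {"Overlap", "Inside", "Equal"},
    "Covers": {"Overlap", "Contains", "Equal"},
    "Inside": {"CoveredBy"},
    "Contains": {"Covers"},
    "Equal": {"CoveredBy", "Covers"},
}


def neighborhood_distance(a: str, b: str) -> int:
    """Length of the shortest path between two relations, found by breadth-first search."""
    if a not in NEIGHBORS or b not in NEIGHBORS:
        raise KeyError("unknown relation")
    seen = {a}
    queue = deque([(a, 0)])
    while queue:
        relation, dist = queue.popleft()
        if relation == b:
            return dist
        for nxt in NEIGHBORS[relation]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    raise ValueError("relations are not connected")
```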

Trustworthiness and reputation
Data trustworthiness ($T$) and user reputation ($R$) are discrete functions of time, $T, R : \{t_1, t_2, \ldots, t_n\} \to [0, 1]$, defined on the set $\{t_1, t_2, \ldots, t_n\}$ of time instants at which new feature versions that influence their value are contributed. Their values range in a continuum from 0 (i.e. minimum reliability) to 1 (i.e. maximum reliability). To keep the notation clean, in the following we do not make the time dependency of trustworthiness and reputation explicit: we write $T(f_i)$ and $R(a)$ instead of $T(f_i, t)$ and $R(a, t)$, respectively.
The reputation $R(a)$ of an author $a$ is the average of the trustworthiness values of all feature versions authored by $a$:
$$R(a) = \frac{1}{|F(a)|} \sum_{f_i \in F(a)} T(f_i)$$
where $F(a)$ is the set of feature versions contributed by $a$ up to this point in time, $|\cdot|$ denotes the cardinality of a set, and $T(f_i)$ is the trustworthiness currently associated with the $i$-th of these feature versions.
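The reputation formula translates directly into code. This is a minimal Python sketch, assuming that the current trustworthiness values of the author's versions are already available; the name `reputation` is hypothetical, and since the text does not define $R(a)$ for an author without versions, the sketch raises an error in that case.

```python
from typing import Iterable


def reputation(trust_of_authored_versions: Iterable[float]) -> float:
    """R(a): mean of the current trustworthiness T(f_i) over the set F(a) of versions authored by a."""
    values = list(trust_of_authored_versions)
    if not values:
        raise ValueError("R(a) is undefined for an author with no versions")
    return sum(values) / len(values)
```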
All the complexity of the model lies in the definition of trustworthiness as an implementation of the many eyes principle (Raymond 1999): the more authors contribute to a feature (i.e. the more versions it has), the higher the probability that errors are spotted and corrected, so that the contributed feature fits reality well. This concept is captured by the notion of confirmation (resp. contradiction): the more versions conform to (resp. differ from) each other with respect to the aspects described in Section 3.4, the higher (resp. lower) the trust in these versions.
We differentiate among three types of confirmation, each capturing a different effect on the overall trustworthiness score: direct effect ($T_{dir}$), indirect effect ($T_{ind}$), and temporal effect ($T_{tmp}$). Trustworthiness is defined as their weighted sum:
$$T(f_i) = w_{dir}\,T_{dir}(f_i) + w_{ind}\,T_{ind}(f_i) + w_{tmp}\,T_{tmp}(f_i)$$
where $w_{dir}$, $w_{ind}$, and $w_{tmp}$ are weights balancing the respective components. Since $T \in [0, 1]$, we have $w_{dir} + w_{ind} + w_{tmp} = 1$, with $T_{dir}, T_{ind}, T_{tmp} \in [0, 1]$.
Figure 5. A small geometric change may correspond to a qualitative spatial change.
Figure 6. Conceptual neighborhood graph of topological relations, as defined in the 9-Intersection model.
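The weighted sum and its constraints can be sketched as a small helper that checks the weights and components before combining them. The name `weighted_sum` and the explicit non-negativity check on the weights are assumptions of this Python sketch; the same helper would apply to the direct and indirect decompositions into thematic, geometric, and qualitative components.

```python
from typing import Sequence


def weighted_sum(weights: Sequence[float], components: Sequence[float], tol: float = 1e-9) -> float:
    """Combine effects as in T = w_dir*T_dir + w_ind*T_ind + w_tmp*T_tmp.

    Weights must be non-negative and sum to 1, and every component must lie
    in [0, 1]; the result then also lies in [0, 1].
    """
    if len(weights) != len(components):
        raise ValueError("one weight per component")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > tol:
        raise ValueError("weights must be non-negative and sum to 1")
    if any(not 0.0 <= c <= 1.0 for c in components):
        raise ValueError("components must lie in [0, 1]")
    return sum(w * c for w, c in zip(weights, components))
```

For example, with illustrative (not calibrated) weights (0.5, 0.3, 0.2) and components (0.8, 0.6, 1.0), the combined trustworthiness is 0.78.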

Direct effect
The direct effect ($T_{dir}$) expresses the similarity among all versions in the lineage of a feature. We call this effect "direct" because the information that each version of a feature conveys to the system affects the trustworthiness of all versions of the same feature; it is thus a direct confirmation (or contradiction) of the state of a feature. The direct trustworthiness of a feature version increases as its attribute-wise difference from the average feature version diminishes, where the average feature version is obtained by averaging all versions in the lineage of the feature attribute-wise.
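One way to read the "average feature version" rule is sketched below for numeric attributes such as area or perimeter. Both the relative-deviation measure and the mapping $1/(1+d)$ into $(0, 1]$ are assumptions of this Python sketch, not definitions from the model; the function names `average_version` and `direct_similarity` are hypothetical.

```python
from statistics import mean
from typing import Dict, List


def average_version(versions: List[Dict[str, float]]) -> Dict[str, float]:
    """Attribute-wise mean over all versions of a lineage (keys present in every version)."""
    if not versions:
        raise ValueError("empty lineage")
    keys = set.intersection(*(set(v) for v in versions))
    return {k: mean(v[k] for v in versions) for k in keys}


def direct_similarity(version: Dict[str, float], versions: List[Dict[str, float]]) -> float:
    """Hypothetical score in (0, 1]: 1 when the version equals the average version,
    decreasing as its mean relative deviation from that average grows."""
    avg = average_version(versions)
    if not avg:
        raise ValueError("versions share no numeric attributes")
    # Relative deviation per attribute; fall back to absolute deviation when the average is 0.
    deviations = [abs(version[k] - avg[k]) / (abs(avg[k]) or 1.0) for k in avg]
    return 1.0 / (1.0 + mean(deviations))
```

A version that agrees with most of its lineage scores higher than an outlier version, which is the qualitative behavior the text describes.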
According to the tripartite classification of the aspects of an edit presented in Section 3.4, the direct effect is regarded as consisting of three components: direct thematic effect ($T_{dir,them}$), direct geometric effect ($T_{dir,geom}$), and direct qualitative spatial effect ($T_{dir,qual}$). The overall direct effect is obtained as the weighted sum of the three components:
$$T_{dir}(f_i) = w_{dir,them}\,T_{dir,them}(f_i) + w_{dir,geom}\,T_{dir,geom}(f_i) + w_{dir,qual}\,T_{dir,qual}(f_i)$$
where $w_{dir,them}$, $w_{dir,geom}$, and $w_{dir,qual}$ are weights balancing the respective components. Since $T_{dir} \in [0, 1]$, we have $w_{dir,them} + w_{dir,geom} + w_{dir,qual} = 1$, with $T_{dir,them}, T_{dir,geom}, T_{dir,qual} \in [0, 1]$.

Indirect effect
The indirect effect ($T_{ind}$) is designed to account for the effect of indirect confirmations on the trustworthiness of a feature. The concept of indirect confirmation was first introduced in Keßler and De Groot (2013). It is based on the consideration that if an author contributes a new version of a feature but leaves nearby features untouched, then the latter are likely to be correctly mapped. Accordingly, their trustworthiness must increase. Rather than considering all nearby features, it would probably be more appropriate to consider nearby landmarks or, more generally, outstanding features in the local context. There is a large literature addressing the challenge of defining what a landmark is (see Richter and Winter (2014) for an extensive survey), but this falls outside the scope of this article. So, in this work we consider indirect confirmation based on spatial proximity and temporal co-occurrence of feature versions. That is, for a feature version $f_i$ to be indirectly confirmed by another feature version $g_j$, two conditions must be fulfilled: (a) (temporal co-occurrence) $f_i$ exists when $g_j$ is created; (b) (spatial proximity) $g_j$ is within a given range from $f_i$.
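The two conditions for indirect confirmation can be checked with a few lines of Python. In this sketch, `TimedVersion`, `exists_at`, and `indirectly_confirms` are hypothetical names, each version is reduced to a representative point (an assumption; the text does not specify how proximity is measured), and a version whose `end` is `None` is still alive.

```python
import math
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class TimedVersion:
    name: str
    start: float                     # creation time of the version
    end: Optional[float]             # time it was superseded; None while still alive
    position: Tuple[float, float]    # representative point, e.g. a centroid (assumption)


def exists_at(v: TimedVersion, t: float) -> bool:
    """True if version v is alive at time t."""
    return v.start <= t and (v.end is None or t < v.end)


def indirectly_confirms(new: TimedVersion, existing: TimedVersion, proximity_range: float) -> bool:
    """(a) `existing` is alive when `new` is created, and (b) `new` lies within range of it."""
    temporal = exists_at(existing, new.start)
    spatial = math.dist(existing.position, new.position) <= proximity_range
    return temporal and spatial
```

Applied to a scenario like the one in Figure 7, a new version $f_i$ confirms a nearby, co-existing $b_k$ but not a distant $a_j$, and it does not affect $a_{j+1}$, which does not yet exist when $f_i$ is created.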
The example in Figure 7 illustrates the indirect effect for versions $f_i$, $a_j$, and $b_k$ of three different features. Figure 7(a) represents the lifetimes of the feature versions and can be used to check temporal co-occurrence, while Figure 7(b) depicts spatial proximity, with the dotted line denoting the proximity range of feature $f$. The entities that satisfy temporal co-occurrence or spatial proximity are depicted in green. Only the trustworthiness of the entities that satisfy both criteria (green on both sides of the figure) is indirectly affected by $f_i$; those in gray are not affected. Version $f_i$ is created during the lifetimes of $a_j$ and $b_k$, and $b_k$ also falls within the proximity area. Both versions $a_j$ and $b_k$ satisfy the condition on temporal co-occurrence, but only $b_k$ satisfies the condition on spatial proximity. So, $b_k$ is indirectly confirmed by $f_i$, while $a_j$ is not. Note that the feature version $a_{j+1}$ is not affected by $f_i$, as it does not exist yet. Conversely, once $a_{j+1}$ comes into existence, it will influence the trustworthiness of $f_i$, assuming that the condition on spatial proximity is also satisfied.
Similarly to the direct effect, the indirect effect consists of three components: indirect thematic effect ($T_{ind,them}$), indirect geometric effect ($T_{ind,geom}$), and indirect qualitative spatial effect ($T_{ind,qual}$). The overall indirect effect is obtained as the weighted sum of the three components:
$$T_{ind}(f_i) = w_{ind,them}\,T_{ind,them}(f_i) + w_{ind,geom}\,T_{ind,geom}(f_i) + w_{ind,qual}\,T_{ind,qual}(f_i)$$
where $w_{ind,them}$, $w_{ind,geom}$, and $w_{ind,qual}$ are weights balancing the respective components. Since $T_{ind} \in [0, 1]$, we have $w_{ind,them} + w_{ind,geom} + w_{ind,qual} = 1$, with $T_{ind,them}, T_{ind,geom}, T_{ind,qual} \in [0, 1]$.

Temporal effect
The temporal effect ($T_{tmp}$) is designed to account for temporal confirmation: the longer a version persists in the system, the higher the probability that it is well mapped. Accordingly, we want the trustworthiness of a feature version to increase over time as long as it remains unaltered, converging to $T = 1$ as $t$ tends to infinity. Again, this is a derivation of the many eyes principle: a version with serious, noticeable errors is more likely to be modified sooner than a version with negligible errors.
The temporal effect determines the increase in trustworthiness up to the maximum possible level. As illustrated in Figure 8, the temporal effect increases asymptotically toward 1, as described by a function (Equation 5) of the lifetime $t(f_i)$ of feature version $f_i$, the overall lifetime $t(f)$ of feature $f$ (i.e. the sum of the lifetimes of all versions of $f$), and a parameter $c$ that can be adjusted to modify the curve slope.
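The exact form of Equation 5 is not reproduced in this text, so the following Python sketch uses a hypothetical stand-in with the two stated properties: it is 0 when a version is created and increases monotonically toward 1 as the version persists. We assume $T_{tmp}(f_i) = 1 - e^{-t(f_i)/c}$, with $c$ acting as a time constant; unlike the paper's curve, it omits the normalization by the feature lifetime $t(f)$, and its $c$ is not comparable to the value used later in the experiments.

```python
import math


def temporal_effect(t_version: float, c: float) -> float:
    """Hypothetical saturating curve for the temporal effect T_tmp.

    ASSUMPTION: T_tmp = 1 - exp(-t_version / c); this is not the paper's Equation 5.
    t_version is the lifetime of the version so far; c > 0 controls the slope.
    """
    if c <= 0:
        raise ValueError("c must be positive")
    return 1.0 - math.exp(-max(t_version, 0.0) / c)
```

A larger `c` makes trust accumulate more slowly; in every case the curve approaches 1 without exceeding it.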

A possible implementation of direct and indirect trustworthiness
In previous sections, we introduced our model for trustworthiness and reputation and gradually broke down trustworthiness into finer pieces. The finest level comprises thematic, geometric, and qualitative spatial aspects for both direct and indirect effects; specifically, direct thematic effect ($T_{dir,them}$), direct geometric effect ($T_{dir,geom}$), direct qualitative spatial effect ($T_{dir,qual}$), indirect thematic effect ($T_{ind,them}$), indirect geometric effect ($T_{ind,geom}$), and indirect qualitative spatial effect ($T_{ind,qual}$). In this section, we propose one possible implementation of these fine-grained components of trustworthiness. The empirical evaluation of the model presented in Section 5 is based on this implementation. We leave the task of deriving more complex implementations for future work.

Direct thematic effect
The direct thematic effect ($T_{dir,them}$) considers how thematic attributes vary among different versions. If a set of attributes is used more consistently throughout different versions of a feature, then the contribution of these attributes to the direct thematic effect is higher.
A possible behavior is expressed by Equation 6 in terms of the version under assessment $f_i$, the number $n$ of versions of feature $f$, and $noDiffThem(f_i)$, the number of versions of $f$ whose thematic attribute set differs from that of $f_i$. As mentioned before, more complex behaviors could take other metrics for the attributes into consideration, for example, not only counting the number of versions with different attribute sets, but also measuring the lexical distance among them.
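The behavior described above can be sketched in Python. We assume the form $T_{dir,them}(f_i) = 1 - noDiffThem(f_i)/n$, which matches the verbal description but is our assumption rather than a reproduction of Equation 6; we also assume that the thematic attribute set of a version is the set of its attribute keys (e.g. OSM tag keys).

```python
from typing import FrozenSet, Sequence


def no_diff_them(versions: Sequence[FrozenSet[str]], i: int) -> int:
    """Number of versions of f whose thematic attribute set differs from that of version i."""
    return sum(1 for attrs in versions if attrs != versions[i])


def direct_thematic_effect(versions: Sequence[FrozenSet[str]], i: int) -> float:
    """ASSUMED form: T_dir,them(f_i) = 1 - noDiffThem(f_i) / n."""
    n = len(versions)
    return 1.0 - no_diff_them(versions, i) / n


# Four versions of one feature: the name tag was added in the second version and kept afterwards.
lineage = [
    frozenset({"building"}),
    frozenset({"building", "name"}),
    frozenset({"building", "name"}),
    frozenset({"building", "name"}),
]
```

The latest version shares its attribute set with most of the lineage and scores 0.75, while the first version scores 0.25.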

Direct geometric effect
The direct geometric effect ($T_{dir,geom}$) addresses the difference in geometric properties between a feature version $f_i$ and the average feature version $f_{avg}$. Let $s_p(f_i)$ (Equation 7) be a measure of the similarity between $f_i$ and $f_{avg}$ with respect to the generic geometric property $p$, computed from $p_i$, the value of the property for $f_i$; $p_{avg}$, the average property value over the versions currently present in the feature version lineage; and $c$, a parameter that can be adjusted to modify the curve slope. Then, the direct geometric effect can be expressed as

$$T_{dir,geom}(f_i) = \frac{1}{|P|} \sum_{p \in P} s_p(f_i) \qquad (8)$$

where $P = \{Area, Perimeter, VertexNumber, VertexPosition\}$ is the set of considered geometric properties and $|P|$ denotes its cardinality. Note that the set of considered properties can be adapted to those available in the VGI at hand.
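The averaging over the property set $P$ can be sketched in Python as below. The per-property similarity is a hypothetical stand-in for Equation 7 (an exponential decay in the relative deviation from the average, with slope parameter `c`), and only scalar properties such as area, perimeter, and vertex count are handled, so VertexPosition is left out.

```python
import math
from statistics import mean
from typing import Dict, List


def property_similarity(p_i: float, p_avg: float, c: float) -> float:
    """ASSUMED stand-in for Equation 7: 1 at the average, decaying with relative deviation."""
    if p_avg == 0:
        return 1.0 if p_i == 0 else 0.0
    return math.exp(-c * abs(p_i - p_avg) / abs(p_avg))


def direct_geometric_effect(lineage: List[Dict[str, float]], i: int, c: float = 1.0) -> float:
    """Mean similarity of version i to the average version over the scalar properties P."""
    props = list(lineage[i])  # P: property names, e.g. "area", "perimeter", "vertices"
    sims = [property_similarity(lineage[i][p], mean(v[p] for v in lineage), c) for p in props]
    return sum(sims) / len(sims)


lineage = [
    {"area": 100.0, "perimeter": 40.0, "vertices": 4},
    {"area": 100.0, "perimeter": 40.0, "vertices": 4},
    {"area": 160.0, "perimeter": 52.0, "vertices": 6},
]
```

The outlying third version is farther from the average version and therefore receives a lower direct geometric score than the first two.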

Direct qualitative spatial effect
The direct qualitative spatial effect ($T_{dir,qual}$) considers the change in qualitative relations among different versions. For this effect, speaking of average values makes no sense, as qualitative relations generally range over a discrete set with no total order defined. Therefore, we count the number of times a specific qualitative spatial relation occurs between the feature version at hand $f_i$ and the other features in the system. To reduce the computational complexity, we suggest filtering the features to be considered. A simple filter can be achieved by considering only those features falling within a given distance $d$ from the feature version $f_i$. Alternatively, filtering can be based on more sophisticated criteria that only select features satisfying given thematic or geometric constraints. For example, one may select only buildings whose area is bigger than a given threshold.
For simplicity, let us consider only the current neighbors of feature version $f_i$, i.e. the set $N$ of feature versions that are alive when version $f_i$ is generated and whose distance from $f_i$ is not greater than $d$. We denote by $F = \{f_1, \dots, f_i\}$ the set of all versions of feature $f$, with $f_i$ being its latest version. Let

$$rel_{j,k} = rel(f_j, k), \quad j \in [1, i],\ k \in N \qquad (9)$$

be the qualitative spatial relation holding between the $j$-th version of $f$ and the $k$-th neighbor of $f_i$. Finally, using Iverson bracket notation, we define

$$m_k = \sum_{j=1}^{i-1} \left[\, rel_{j,k} = rel_{i,k} \,\right] \qquad (10)$$

as the number of times the relation holding between $f_i$ and the generic neighbor $k$ also holds for previous versions of $f$. Then, we can define direct qualitative spatial trustworthiness as

$$T_{dir,qual}(f_i) = \frac{1}{|N|} \sum_{k \in N} \frac{m_k}{n} \qquad (11)$$

where $n$ denotes the total number of versions of feature $f$, and $|N|$ denotes the number of neighbors of $f_i$. In other words, for each neighbor $k \in N$ of $f_i$ we first count the number of times the relation $rel_{i,k}$ also holds for past versions $f_j$ (with $j < i$) of $f$ and normalize this count by dividing it by the number of versions of $f$. Then, we average these values over all neighbors of $f_i$.
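Once the qualitative relations are known, the counting and averaging described above can be sketched directly in Python. We assume the relations have been precomputed (for instance as 9-intersection labels such as "disjoint" or "touches") and stored per version of $f$ as a dictionary keyed by a neighbor identifier, where the neighbors of the latest version $f_i$ define the set $N$. The text does not specify how to handle an empty $N$, so returning 0 is our own choice.

```python
from typing import Dict, List


def direct_qualitative_effect(rel: List[Dict[str, str]]) -> float:
    """T_dir,qual of the latest version f_i.

    rel[j][k] is the qualitative relation between version j of f and neighbor k of f_i;
    rel[-1] belongs to f_i itself, and its keys form the neighbor set N.
    A past version with no entry for k counts as a mismatch.
    """
    n = len(rel)          # total number of versions of f
    latest = rel[-1]
    if not latest:
        return 0.0        # ASSUMPTION: no neighbors means no qualitative evidence
    total = 0.0
    for k, rel_ik in latest.items():
        # m_k: Iverson-bracket count of earlier versions that share rel_{i,k}
        m_k = sum(1 for past in rel[:-1] if past.get(k) == rel_ik)
        total += m_k / n
    return total / len(latest)


relations = [
    {"a": "disjoint", "b": "touches"},
    {"a": "disjoint", "b": "disjoint"},
    {"a": "disjoint", "b": "touches"},  # latest version f_i
]
```

Here neighbor `a` contributes 2/3 and neighbor `b` contributes 1/3, so the average over the two neighbors is 0.5.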
Note that, although previous versions of neighbors are not considered here, they will affect the trustworthiness of $f_i$ indirectly, provided that the filtering function used to generate the set $N$ is commutative (as is the case for distance).

Indirect effects
A simple approach to modeling the indirect effects ($T_{ind,them}$, $T_{ind,geom}$, $T_{ind,qual}$) is to consider the reputation of the author $a$ who contributes the feature version $g_j$ that triggers an indirect confirmation of feature version $f_i$.
We defined the overall reputation of an author $a$ in Equation 1 as the average trustworthiness of the feature versions authored by $a$. Trustworthiness, as defined in Equation 2, consists of direct, indirect, and temporal effects. In turn, direct (Equation 3) and indirect (Equation 4) effects consist of thematic, geometric, and qualitative spatial aspects. By substituting Equations 3 and 4 into Equation 2 and applying basic algebraic transformations, we can refactor the overall trustworthiness in terms of aspects rather than effects (Equation 12), where the generic component $w_x \cdot T_x(f_i)$, $x \in \{them, geom, qual\}$, equals

$$w_x \cdot T_x(f_i) = w_{dir}\, w_{dir,x}\, T_{dir,x}(f_i) + w_{ind}\, w_{ind,x}\, T_{ind,x}(f_i) \qquad (13)$$

Equivalently, we can refactor the overall reputation of author $a$ in terms of aspects (Equation 14), where the generic component $R_x(a)$, $x \in \{them, geom, qual\}$, equals

$$R_x(a) = \frac{1}{|F(a)|} \sum_{f_i \in F(a)} w_x \cdot T_x(f_i) \qquad (15)$$

where $F(a)$ is the set of feature versions contributed by $a$ up to this point in time, $|\cdot|$ denotes the cardinality of a set, and $w_x \cdot T_x(f_i)$ is the trustworthiness aspect (Equations 12 and 13) currently associated with the $i$-th of these feature versions.
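Then, we can define the indirect trustworthiness effects as

$$T_{ind,them}(f_i) = R_{them}(a), \qquad (16)$$
$$T_{ind,geom}(f_i) = R_{geom}(a), \qquad (17)$$
$$T_{ind,qual}(f_i) = R_{qual}(a), \qquad (18)$$

where $a$ is the author of the feature version $g_j$ that indirectly confirms $f_i$. Figure 9 presents a summary of how trustworthiness is composed.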
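The per-aspect reputation and the resulting indirect components can be sketched in Python as follows. We assume that each of the confirming author's versions is stored together with its current weighted aspect terms $w_x \cdot T_x(f_i)$; the function names are hypothetical, and the zero default for an author without contributions is our own assumption, not part of the model.

```python
from typing import Dict, List

ASPECTS = ("them", "geom", "qual")


def aspect_reputation(author_versions: List[Dict[str, float]]) -> Dict[str, float]:
    """R_x(a): mean of w_x * T_x(f_i) over the versions F(a) contributed by author a."""
    if not author_versions:
        return {x: 0.0 for x in ASPECTS}  # ASSUMPTION: default for an author with no versions
    n = len(author_versions)
    return {x: sum(v[x] for v in author_versions) / n for x in ASPECTS}


def indirect_components(confirming_author_versions: List[Dict[str, float]]) -> Dict[str, float]:
    """T_ind,x(f_i) = R_x(a), where a authored the confirming version g_j."""
    return aspect_reputation(confirming_author_versions)


# Weighted aspect terms w_x * T_x for two versions contributed by author a.
contributions = [
    {"them": 0.2, "geom": 0.3, "qual": 0.1},
    {"them": 0.4, "geom": 0.1, "qual": 0.3},
]
```

The indirect components of any version confirmed by this author are then 0.3, 0.2, and 0.2 for the thematic, geometric, and qualitative aspects, respectively.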

System architecture
In Section 3, we presented a model to derive trustworthiness and reputation scores for VGI. This section presents and analyzes a system architecture that implements the model, which we call TandR, short for Trustworthiness and Reputation.

Requirements and features
The TandR framework takes feature evolution information as input. Its outputs consist of a trustworthiness score for each spatial feature version and a reputation score for each contributor. The framework can only manage VGI systems that adhere to the model of Section 3; that is, feature evolution information is organized in versions, where each version is the result of an atomic edit: a creation, a modification, or a deletion.
The framework must perform data analysis on multiple VGI system sources. It does not have to provide direct support for each specific system format, but it must be able to analyze data from different sources; such data have to be converted into an established format that is flexible and general enough to manage all the necessary information.
The problem of giving the right structure to feature evolution information can be solved by organizing data in versions and defining a common format for information sources. Provenance information was already adopted and properly structured in Keßler, Trame, and Kauppinen (2011), as discussed in Section 2.2. Figure 1 shows the ontology designed in that work, which correlates spatial data taken from the OSM VGI system with provenance information. We use this ontology as a starting point to introduce, in Section 4.2, a more general ontology that models VGI information.
The model introduced in Section 3 describes a basic strategy for score calculation; nonetheless, the framework can accommodate changes and extensions. Extensibility is implemented through a module-selection mechanism, with which the framework can handle different implementations of the trustworthiness and reputation model: each module manages a different implementation. The user selects a given module via a string passed in the framework configuration.
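The module-selection mechanism amounts to a registry keyed by the configuration string. TandR itself is written in Java; the following Python sketch only illustrates the pattern, and the registry, the decorator, the `basic` module, and the `scoreModule` configuration key are all hypothetical.

```python
from typing import Callable, Dict

ScoreModule = Callable[[dict], dict]

# Hypothetical registry mapping configuration strings to score-calculation modules.
SCORE_MODULES: Dict[str, ScoreModule] = {}


def register(name: str) -> Callable[[ScoreModule], ScoreModule]:
    """Decorator that makes a module selectable under `name`."""
    def decorator(module: ScoreModule) -> ScoreModule:
        SCORE_MODULES[name] = module
        return module
    return decorator


@register("basic")
def basic_module(version: dict) -> dict:
    """Placeholder module returning a fixed score (illustrative only)."""
    return {"trustworthiness": 0.5}


def select_module(config: Dict[str, str]) -> ScoreModule:
    """Pick the module named by the hypothetical 'scoreModule' configuration key."""
    name = config["scoreModule"]
    if name not in SCORE_MODULES:
        raise ValueError(f"unknown score module: {name!r}")
    return SCORE_MODULES[name]
```

Adding a new implementation of the model then only requires registering another module under a new name; the core framework code stays unchanged.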
In addition to the latest trustworthiness and reputation scores, the framework keeps a record of historic scores that show how trustworthiness and reputation evolve over time. The model implementation follows the event timeline; therefore, computation takes into account a series of trustworthiness and reputation scores, each representing the level of "reliability" that versions and users had at the moment the score calculation refers to.

Data models
Data are modeled following two ontologies: HVGI and TandR. The historical VGI (HVGI) ontology, shown in Figure 10, models the provenance information of VGI systems and is used to manage VGI feature versions. This ontology includes many concepts defined in external ontologies, notably the OSM provenance information ontology proposed by Keßler, Trame, and Kauppinen (2011), which defines the concept of FeatureState (corresponding to our feature version). It also includes the ontology defined by the Open Geospatial Consortium (OGC) to manage feature geometry, with the Geometry class and its subtypes Point, LineString, and Polygon. Finally, it includes the Friend of a Friend (FOAF) ontology to manage user details.
The TandR ontology (Figure 11) models the domain of trustworthiness and reputation scores. It allows us to represent the score data of the model introduced in Section 3. The ontology defines the Trustworthiness and Reputation concepts, linking them to the concept of Effect. An Effect can consist of one or more Aspects, as in the case of direct and indirect effects, which are broken down into semantic, geometric, and qualitative aspects. The concepts of Effect and Aspect provide flexibility to the model, in the sense that they are general and can be used to extend or modify the model proposed in this work. Note that Trustworthiness and Reputation are connected to the FeatureState and User concepts, respectively, both defined in the OSM provenance information ontology (Keßler, Trame, and Kauppinen 2011). TrustworthinessValue and ReputationValue group two pieces of information: the numerical score value and a timestamp indicating the point in time from which the score is valid. Thanks to this information, it is possible to reconstruct the evolution over time of the trustworthiness of a feature version and of the reputation of a user. The Effect concept is related to two more concepts. The EffectDescription provides the name of the effect and a human-readable description of it, and relates to the person who defined it. The EffectValue reports the numeric value of the effect and the date from which that value is valid. The Aspect concept mirrors the Effect concept.
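Because each TrustworthinessValue and ReputationValue carries a validity timestamp, the score in force at any moment can be recovered with a sorted lookup. The following Python sketch illustrates this under our own assumptions: it uses an in-memory list rather than the triple store, the class and method names are hypothetical, and each value is treated as valid from its timestamp until the next recorded one.

```python
import bisect
from datetime import datetime
from typing import List, Optional


class ScoreHistory:
    """Timestamped values of one trustworthiness or reputation series (hypothetical sketch)."""

    def __init__(self) -> None:
        self._times: List[datetime] = []   # kept sorted
        self._values: List[float] = []

    def record(self, valid_from: datetime, value: float) -> None:
        """Store a score valid from `valid_from` onward, keeping the series in time order."""
        idx = bisect.bisect_right(self._times, valid_from)
        self._times.insert(idx, valid_from)
        self._values.insert(idx, value)

    def value_at(self, when: datetime) -> Optional[float]:
        """Score valid at `when`, or None if no score had been computed yet."""
        idx = bisect.bisect_right(self._times, when) - 1
        return self._values[idx] if idx >= 0 else None

    def latest(self) -> Optional[float]:
        """Most recent score, i.e. the one computed with the most provenance information."""
        return self._values[-1] if self._values else None


history = ScoreHistory()
history.record(datetime(2010, 1, 1), 0.3)
history.record(datetime(2011, 6, 1), 0.5)
history.record(datetime(2010, 6, 1), 0.4)  # out-of-order insertion is kept sorted
```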

Framework overview
In this section, we describe the component diagram of the TandR framework (Figure 12). The system uses semantic technologies; hence, information persistence is managed by the Parliament triple store (http://parliament.semwebcentral.org/). Parliament uses the quadruple pattern (graph {subject predicate object}) and is spatially enabled via the GeoSPARQL standard (http://www.opengeospatial.org/standards/geosparql/). The system adopts the Parliament Java libraries, which provide an API to interact with the underlying triple store. As shown in Figure 12, the triple store manages three different components, each of which is a graph containing a set of related RDF (https://www.w3.org/RDF/) triples: the graph containing VGI data, the graph containing trustworthiness and reputation scores, and the graph containing the results of score validation.
The TandR framework is implemented in Java. To manage interactions with the triple store and process data, TandR uses two external components: the Java Topology Suite (JTS) (https://github.com/locationtech/jts/) and Jena (https://jena.apache.org/). The former provides support for spatial operations, while the latter works with Parliament to manage the connection to the triple store.
Data sources are of two kinds. The first, and most common, needs a layer that adapts data from the VGI system format to the application format, as specified by the HVGI ontology. The second kind of source provides data directly in the format required by the application. OSM is an example of the first kind of source: it requires a component that translates the OSM history file into the application format, which we called OSH2RDF. In Figure 12, the components in green have been implemented in our prototype, while the components in gray can be added in future developments.
The framework supervises the following activities:
(1) Installation. The installation process prepares a triple store to host data. More specifically, it carries out the creation and management of graphs, the import of VGI system data, and spatiotemporal indexing.
(2) Scores Computation. As already mentioned, multiple scores are provided for versions and users, each associated with a validity date. The latest available score is always the most reliable, since it was computed with a greater amount of provenance information. The obtained scores are structured and persisted so that they are properly differentiated by version date.
(3) Scores Validation. Scores validation compares the obtained scores against a ground truth data-set in order to assess their quality. We explore how validation works in Section 5.
In the framework configuration, the user can select which of the three activities to perform, or even more than one activity at a time.

Framework behavior
Events and edits are considered in temporal order. Therefore, the framework takes into account the order in which versions were generated and processes them accordingly. One way to achieve this is the following algorithm:

Retrieve all dates on which at least one feature version was created.
For each date:
    Retrieve all versions whose lifetime begins on that date.
    For each version:
        Delegate the score calculation to the selected module.
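The ordering algorithm above can be sketched in Python as follows, assuming that versions are given as (identifier, creation date) pairs and that `compute` stands for the selected module's score calculation; both are hypothetical stand-ins for the framework's own data access and module interface.

```python
from collections import defaultdict
from datetime import date
from typing import Callable, Dict, Iterable, List, Tuple


def process_in_temporal_order(versions: Iterable[Tuple[str, date]],
                              compute: Callable[[str], None]) -> List[str]:
    """Visit versions date by date and delegate each one to the selected score module."""
    by_date: Dict[date, List[str]] = defaultdict(list)
    for vid, created in versions:
        by_date[created].append(vid)
    order: List[str] = []
    for day in sorted(by_date):       # all dates on which at least one version was created
        for vid in by_date[day]:      # versions whose lifetime begins on that date
            compute(vid)              # delegate to the selected module
            order.append(vid)
    return order


processed: List[str] = []
order = process_in_temporal_order(
    [("b", date(2011, 1, 2)), ("a", date(2010, 5, 1)), ("c", date(2011, 1, 2))],
    processed.append,
)
```

Versions created on the same date are processed in the order in which they were retrieved.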
Reputation and trustworthiness scores may be calculated according to our model (Section 3.6). After a version is processed for the first time, its trustworthiness is updated whenever a new version of the same feature (direct effect) or a nearby version (indirect effect) is created; the temporal effect is updated every time either of the other effects is altered. See the following algorithm:
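A minimal sketch of these update rules, under our own assumptions, is shown below in Python: each version record carries its feature identifier, a lifetime, and a representative point, proximity is Euclidean distance, and the function only reports which existing versions need which effect recomputed. The record type and function names are hypothetical.

```python
from dataclasses import dataclass
from math import hypot
from typing import Dict, List, Optional, Set


@dataclass
class VersionRecord:
    """Hypothetical version record: owning feature, lifetime, representative point."""
    vid: str
    feature: str
    created: float
    ended: Optional[float]  # None while the version is current
    x: float
    y: float


def versions_to_update(new: VersionRecord, existing: List[VersionRecord],
                       proximity: float) -> Dict[str, Set[str]]:
    """Existing versions whose scores must be recomputed after `new` is created.

    direct:   versions of the same feature (its lineage has changed);
    indirect: other features' versions alive at new.created and within `proximity`;
    temporal: every version whose direct or indirect effect changed.
    """
    def alive(v: VersionRecord) -> bool:
        return v.created <= new.created and (v.ended is None or new.created < v.ended)

    direct = {v.vid for v in existing if v.feature == new.feature}
    indirect = {
        v.vid for v in existing
        if v.feature != new.feature and alive(v)
        and hypot(v.x - new.x, v.y - new.y) <= proximity
    }
    return {"direct": direct, "indirect": indirect, "temporal": direct | indirect}


existing = [
    VersionRecord("f1", "f", 0, 5, 0, 0),       # previous version of the same feature
    VersionRecord("b1", "b", 1, None, 1, 0),    # alive and nearby
    VersionRecord("c1", "c", 1, None, 50, 50),  # alive but far away
    VersionRecord("d1", "d", 0, 2, 0.5, 0),     # nearby but already replaced
]
updates = versions_to_update(VersionRecord("f2", "f", 5, None, 0, 0), existing, proximity=2.0)
```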

Empirical validation
To demonstrate the validity of the proposed model, we should show that feature versions with higher trustworthiness more closely resemble the features mapped in authoritative data-sets. In this section, we provide a qualitative validation of the model by (1) running our TandR Java framework on VGI data to derive trustworthiness and reputation scores and (2) selecting a feature $f$ and comparing its versions with the lowest ($f_{min}$), intermediate ($f_{med}$), and highest ($f_{max}$) trustworthiness against the corresponding feature $f_t$ from a ground truth data-set. The model proves valid if the similarity $\sigma$ of the three feature versions to the reference feature grows as trustworthiness grows. That is, the following system of inequalities should be satisfied:

$$\begin{cases} \sigma(f_{min}, f_t) < \sigma(f_{med}, f_t) \\ \sigma(f_{med}, f_t) < \sigma(f_{max}, f_t) \end{cases}$$

where the similarity function $\sigma$ accounts for the same aspects used by the TandR framework to compute trustworthiness and reputation scores. Note that the model itself does not require a comparison to a ground truth data-set; the comparison is made here only to demonstrate the validity of the model.
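The validity criterion can be checked mechanically once trustworthiness and similarity scores are available for every version of a feature. In this Python sketch the input format and function name are hypothetical, and $f_{med}$ is assumed to be the middle version by trustworthiness rank (the upper median when the number of versions is even).

```python
from typing import List, Tuple


def validate_feature(scores: List[Tuple[float, float]]) -> bool:
    """Check sigma(f_min, f_t) < sigma(f_med, f_t) < sigma(f_max, f_t).

    scores holds one (trustworthiness, similarity to f_t) pair per version of f.
    ASSUMPTION: f_med is the middle version by trustworthiness rank (upper median if even).
    """
    if len(scores) < 3:
        raise ValueError("at least three versions are needed")
    ranked = sorted(scores, key=lambda s: s[0])
    f_min, f_med, f_max = ranked[0], ranked[len(ranked) // 2], ranked[-1]
    return f_min[1] < f_med[1] < f_max[1]
```

For example, versions with (trustworthiness, similarity) pairs (0.2, 0.5), (0.5, 0.7), and (0.9, 0.95) satisfy the criterion, whereas swapping the similarity of the least trustworthy version to 0.9 violates it.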

Data sources
As the VGI data source we use the OSM full history dump file (http://planet.osm.org/planet/full-history/), which contains the evolution history of the project from its beginning to the present. The ground truth data-set was obtained from Open Government Wien (https://open.wien.gv.at/), which provides Austrian land cover data; Figure 13 shows a view of these government data, covering the municipality of Vienna in 2012.
For the experiment reported in this article, we restrict ourselves to the spatial window depicted in Figure 14 and to a temporal window of four years starting January 2010 and ending December 2013. The resulting data-set, obtained from the OSM full history dump, consists of 834 feature versions belonging to 552 different geographic features edited by 38 users.

Experiment setup and validation methodology
First, we ran the TandR framework on the data-set defined above with the following weights: For the slope parameters of the temporal effect (Equation 5) and the direct geometric aspect (Equation 7), we set the values 500 and 1, respectively. For the geometric aspect we adopted the measures $\{Area, Perimeter, VertexNumber\}$, and for the qualitative aspect we considered topological relations as defined in the 9-Intersection model (Egenhofer 1989). For the thematic aspect we used the formula in Equation 6.
As a second step, we performed a qualitative validation. The validation methodology associates each ground truth feature $f_t$ with the corresponding VGI feature $f$ and compares each version $f_v$ of $f$ against $f_t$. In other words, we treat the ground truth feature $f_t$ as if it were an extra version of the corresponding VGI feature. The comparison is then done by applying only the calculations used for computing the direct trustworthiness values (since the ground truth data-set does not evolve over time, it makes no sense to account for the temporal and indirect effects).
Since the ground truth data-set does not report thematic information, the comparison only considers the direct geometric and direct qualitative aspects. Accordingly, the first-level weights of the model (Equation 2) have been set as follows: $w_{dir} = 1$, $w_{ind} = w_{tmp} = 0$. The finer-grained weights for the direct effect (Equation 3) have been set to $w_{dir,geom} = w_{dir,qual} = 0.5$ and $w_{dir,them} = 0$. The high-level algorithm used to validate trustworthiness scores can be described as follows:

For each feature $f_t$ in the ground truth data-set:
    Retrieve the corresponding VGI feature $f$.
    For each version $f_v$ of the retrieved VGI feature:
        Compare the geometry of $f_v$ with the geometry of $f_t$.
        Apply the comparison logic to obtain the similarity score $\sigma(f_v, f_t)$.
To identify correspondences between the VGI and the ground truth data-sets, we consider the versions $f_v$ of a VGI feature $f$ whose geometry intersects $f_t$. Hence, for each feature $f_t$ in the ground truth data-set, we retrieve the VGI feature $f$ whose lineage contains the version $f_v$ with the largest intersection area with $f_t$. The similarity score $\sigma(f_v, f_t)$ is obtained by comparing the geometries of the VGI feature versions against those of the authoritative ones, as explained in Section 3.6.
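The correspondence rule (pick the VGI feature whose lineage contains the version with the largest intersection area) can be sketched in Python. To keep the example self-contained we assume axis-aligned rectangles as a stand-in for polygons; the framework itself uses JTS geometries, and the type alias and function names here are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (minx, miny, maxx, maxy): stand-in for a polygon


def intersection_area(a: Box, b: Box) -> float:
    """Area of the overlap of two axis-aligned rectangles (0 if they do not intersect)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return w * h if w > 0 and h > 0 else 0.0


def match_feature(ground_truth: Box, lineages: Dict[str, List[Box]]) -> Optional[str]:
    """VGI feature owning the version with the largest intersection with f_t (None if none)."""
    best_feature, best_area = None, 0.0
    for feature_id, versions in lineages.items():
        for geom in versions:
            area = intersection_area(ground_truth, geom)
            if area > best_area:
                best_feature, best_area = feature_id, area
    return best_feature
```

For instance, with $f_t$ = (0, 0, 10, 10), a feature whose second version covers (1, 1, 9, 9) wins over a feature whose only version merely overlaps a corner.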

Results
To showcase the results of our experiment, Figure 15 reports the computed trustworthiness values for OSM feature 32506491 over time. Note that, while edits are identified by the date and time they occurred, the x-axis is not scaled over time; that is, dates are nominal values in this plot. The different versions of the feature are shown in different colors. Also note that, although this specific feature consists of 16 versions (in the time range considered in our experiment), the number of plotted trustworthiness values is much larger. The first data point of each version (color) equals the reputation of the version's author at the time of insertion in the VGI system. All other data points in a feature version's life correspond to adjustments of the trustworthiness triggered by indirect confirmations.
In general, we can observe how the trustworthiness value swings quite smoothly up and down according to the impact of the effects and aspects discussed in previous sections. However, at the beginning of the time interval the trustworthiness value jumps quite abruptly from 0 to approximately 0.3. This might be related to the specific choice of the slope parameters governing the behavior of the temporal effect (Equation 5) and of the direct geometric aspect (Equation 7). The values of these parameters should be tuned in future experiments to obtain a smoother effect.
As an example, Tables 1 and 2 show the trustworthiness and similarity scores obtained for two polygons: OSM features 38838966 and 45275690, respectively. The trustworthiness scores are computed using all the effects and aspects discussed in Section 3.6. The similarity values are computed using only the direct geometric aspect and the direct qualitative spatial aspect, as explained in the previous section. Neither type of value, trustworthiness or similarity, has an absolute meaning. Rather, both have to be interpreted relatively, across the versions of the same feature. So, for example, we can read that version 4.0 of feature 38838966 is the most trustworthy among all of its versions, but we cannot infer anything about how reliable this version is with respect to other features.
All results in the selected area seem to indicate that our model of trustworthiness provides a valid measure of VGI data quality with respect to ground truth data. The collected data support this approach; nevertheless, the model needs to be improved and tested more exhaustively.

Conclusions and future work
VGI systems collect spatial information from Internet users and follow the principles behind crowdsourced information. The goal of map-based VGI systems, such as OSM, is to build a global map by collecting spatial information and to keep it freely accessible.
One key factor in developing and popularizing these systems is "web users" and the contributions they provide: the higher the number of contributors, the more information is collected, in terms of both amount and quality. The definition of a method that quantifies how good VGI is constitutes an active research topic. Among the proposed methods, we based our work on the approach taken by Keßler, Trame, and Kauppinen (2011) and Keßler and De Groot (2013). We presented a model to derive trustworthiness and reputation scores for geographic features and contributors of a VGI system, respectively. The model relies on provenance information and applies to all VGI systems that support versioning, where each version is the outcome of an atomic edit action: creation, modification, or deletion. The model provides a trustworthiness score for each feature version and a reputation score for each author.
The model was implemented as a Java application using semantic technologies: ontologies have been developed to represent the data, which are persisted in a triple store. The system architecture is designed as a framework that can be easily extended and modified by introducing new modules that implement different score calculation models.
The software framework was run on an OSM data-set (full history dump). The obtained scores were validated against a ground truth data-set provided by an authoritative source. For each authoritative feature, we selected the corresponding VGI feature and verified that versions with increasing trustworthiness scores have increasing similarity to the reference feature. In the validation, we applied only the direct geometric and qualitative spatial aspects and did not consider the thematic aspect and the temporal effect. The results met our expectations, since an increase in trustworthiness corresponds to an increase in similarity to the authoritative reference feature.
The current implementation of the framework can be extended in many directions. We plan to extend the model by also considering the typical region of contribution of individual users. That is, a user typically contributes information about the same geographic region(s). Consequently, it is reasonable to derive a reputation score for this user that is a proxy measure of her/his local knowledge. Arguably, over time the user's reputation will increase and, accordingly, so will the trustworthiness of the features authored by her/him. Yet, if the same user starts contributing information about a different area (perhaps because she/he moved to a different city or state), there is no reason to assume that she/he has good local knowledge about the new region. Thus, the reliability of her/his contributions should be rated accordingly. This is especially true for the thematic aspect of edits, but less so for her/his mapping and cognitive skills (i.e. the geometric and qualitative aspects, respectively).
The reputation of a user should also account for the frequency of contributions, that is, if a user becomes inactive, her/his reputation should diminish over time. This behavior might be modeled similarly to the temporal effect of trustworthiness. In this case, the curve should be monotonically decreasing and converging to 0.
The computation of trustworthiness scores might be improved by introducing a temporal window. Indeed, trustworthiness scores are currently updated by comparing each version in a feature lineage against the "average feature version". This might lead to very strange behavior if a feature disappears or changes drastically in the real world. For example, consider a building that is demolished. In this case, the older versions of the feature will be completely different from the newer versions. Over time, the trustworthiness of the newer versions will increase while the trustworthiness of the older versions will decrease, and the reputation of their authors will be altered accordingly. Yet, the deterioration of the reputation of the authors of old versions should be mitigated, as the building still existed at the time of contribution. To solve this issue, we suggest introducing a temporal window that, when trustworthiness scores are updated, selects only a subset of the versions in a feature lineage, leaving very old versions untouched and, consequently, the reputation of their authors.
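A minimal sketch of the proposed temporal window, under our own assumptions, is given below in Python: a version is kept if its lifetime overlaps the interval from `now - window` to `now`, and only the kept versions would then enter the average feature version and the score update. The window length, this overlap rule, and the function name are hypothetical choices, not part of the current framework.

```python
from typing import List, Optional, Tuple

Lineage = List[Tuple[str, float, Optional[float]]]  # (version id, created, ended or None)


def versions_in_window(lineage: Lineage, now: float, window: float) -> List[str]:
    """Versions whose lifetime overlaps [now - window, now]; older ones are left untouched.

    ASSUMPTION: overlap with the window (not creation time alone) decides inclusion,
    so a long-lived current version is never dropped.
    """
    start = now - window
    return [vid for vid, created, ended in lineage
            if created <= now and (ended is None or ended >= start)]
```

In the demolished-building example, the versions describing the building before demolition would drop out of the window once they are old enough, so later updates would no longer penalize their authors.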
Finally, the framework can be more thoroughly validated by considering other case studies, the thematic aspect, more domain ontologies, and a wider range of qualitative spatial relations (such as directional and distance operators).
Web. He is currently collaborating with the University of L'Aquila and working at the private company Thales Alenia Space on on-board software for space and avionic systems.