Contributors’ enrollment in collaborative online communities: the case of OpenStreetMap

Abstract
The number of people registering in an online community depends on two main factors: interest in, and awareness of, the project. Registering for a project does not, however, imply contributing to it, as lacking the knowledge and skills can be a barrier to participation. In order to identify the nature of events that might have facilitated or hindered enrollments in the OpenStreetMap (OSM) project over time, we analyzed the correlations between the number of new participants and the events that dotted its history. Four different metrics were defined to characterize participants’ behaviors: the daily number of registrations, the daily number of participants that made a first contribution, the delays between contributors’ registration and their first edits, and a daily contribution ratio built from the number of new contributors and the number of new registered members. Time series analyses were used to identify trends and outstanding variations in the number of participants. An inventory of events that took place along the OSM project’s history was created, and appreciable variations of the metrics were linked to events that seemed to be meaningful. Although a correlation does not imply causality, many of the explanations these correlations suggest are supported by the results of other studies, either directly or indirectly. This is the case, for instance, for the time participants spend as “lurkers” and for the nature of the contributions of early participants. In other cases, they suggest new explanations for the origin of the spam accounts that affect registration statistics, or for the decline in the proportion of registered members who actually become contributors.


Introduction
With the advent of the Web 2.0, contributing to an online community of interest has never been easier and the improvement of Web applications removed most of the barriers linked to physical distance or volunteers' availability (Bryant, Forte, and Bruckman 2005). These communities play an important role in today's society. They are increasingly considered a credible source of information and researchers are referring to these communities as both a valuable work force and an important data source (Riesch and Potter 2014; Kimura and Kinchy 2016; Michelucci and Dickinson 2016). Volunteers' motivations for contributing to online projects have been well studied in the scientific literature (Clary 1998; Ryan and Deci 2000; Penner 2002; Nov 2007; Borst 2010; Budhathoki 2010; Stebbins 2015) and, in summary, the number of people registering for a project depends on two main factors: interest and awareness.
First, the project must be of interest to potential contributors, which means that it must be perceived as being either relevant, appealing or both. A project is relevant when people expect it to meet their needs, desires, or aspirations, whether because of the nature of the task (Hemetsberger 2003; Houle 2005; Borst 2010), or because of the project's objectives (Nov, Arazy, and Anderson 2011; Aknouche and Shoan 2013). People will find a project appealing if they foresee that their participation will be enjoyable or even fun (Budhathoki, Nedovic-Budic, and Bruce 2010; Aknouche and Shoan 2013). Furthermore, registering for a project does not imply contributing to it, and the phenomenon of "lurkers" (i.e. members who do not immediately contribute) is well described in the literature (Preece, Nonnecke, and Andrews 2004; Schneider, von Krogh, and Jäger 2013; Sun, Rau, and Ma 2014). These lurkers may be new members that have been confronted with a reality that differs from their expectations, preventing them from contributing for various reasons. In the context of volunteered geographic information (VGI), the knowledge and skills required to contribute can certainly be an obstacle (DiBiase et al. 2006; Downs and DeSouza 2006). The above factors are considered internal to the project.

Second, potential contributors must be aware that the project exists. For the number of participants to grow, most of them must like their experience and share it with others to slowly expand the circle of participants among friends, colleagues or interest groups (Brown and Reingen 1987; Hemetsberger 2003; Rogers 2010). If this process is successful, the community will eventually reach people on a much larger scale through blogs, conferences, or even mass media, which can lead to the exponential growth (Tichenor, Donohue, and Olien 1970; Rogers 2010) that is typical of most successful online communities. These are identified as factors external to the project.
The number of participants that enroll in and contribute to an online project therefore depends on complex interactions between the project characteristics (e.g. objectives, infrastructure and community) and the participants' profile (e.g. motivation, expectations, knowledge and skills) as they evolve together over time. Understanding these interactions and their relative impacts on an online collaborative project could help those concerned decide what actions to take, or not to take, to allow these communities to grow and remain healthy. Unfortunately, little has been published about the actual effects such interactions have on the evolution of the number of contributors in an online project.
In order to understand the complex interactions between these factors, different metrics were used to assess the evolution of enrollments in a large VGI project. Since the factors that influenced the decision of individual participants to enroll are not known, the correlations between their enrollment and the events that dotted the history of the project were used as proxy indicators. Although correlation does not imply causality, many of the correlations found suggest explanations that are supported by the literature, while in other cases they suggest new explanations that will need to be explored further.
This paper presents four metrics used to assess the enrollment of participants in the OpenStreetMap (OSM) project over time. It describes the procedures elaborated to prepare and analyze the data and discusses the variations that affected the metrics and their correlations with the events that dotted the history of the project.

Materials and methods
OpenStreetMap is a project of general interest that aims at mapping the world using a Wiki approach. As in Wikipedia, participants decide what, when and where they contribute without any constraints, compliance with the community's guidelines being validated a posteriori by other participants or by bots (OpenStreetMap Wiki 2017: "Good practice" and "Editing Standards and Conventions" pages). With more than 4 million registered users, OSM has become the most successful VGI project on the web, even though the level of technical knowledge required to contribute is higher than in the average collaborative community.
Furthermore, the project is very well documented and the data are freely available. The history of the project (e.g. technical improvements, normative changes, social activities) can, in part, be reconstructed from the OSM blog (OpenStreetMap Foundation 2017) and the OSM documentation wiki (OpenStreetMap Wiki 2017). Information about individual OSM members is available through the OSM application programming interface (API) (OpenStreetMap Wiki 2017: "API v0.6" page). Their personal profiles provide, among other things, the username, the registration timestamp, the number of contributions made and an optional free text field that can be used to present the participant. Contributions to the project are made available on a regular basis through history dump files (OpenStreetMap Wiki 2017: "Complete OSM Data History" page). Those files contain all the edits made since the beginning of the project up to the release date of the dump files. In addition to the edits, the files also contain the virtual containers (changesets) that identify the content, the contributor and both the geographical and temporal extents of each editing session.

Metrics
The literature has proposed multiple metrics to study the OSM project, focusing on the nature of contributions (Neis and Zipf 2012; Corcoran, Mooney, and Bertolotto 2013; Rehrl et al. 2013; Steinmann et al. 2013), the quality of the data (Girres and Touya 2010; Keßler and de Groot 2013), the profiles of its contributors (Budhathoki, Nedovic-Budic, and Bruce 2010), or the interactions between them (Arsanjani et al. 2015).
We used four metrics to characterize participation in OSM from a temporal perspective. The first metric is the "daily number of new registered members", which aims at assessing variations in people's interest in and awareness of the project. The second metric is the "daily number of new contributors", which provides the number of registered members who made a first contribution and, by difference with the registrations, the number of those who have not contributed yet. The third and fourth metrics are derived from the previous two. The "contribution ratio" results from dividing the number of new contributors by the number of new registered members on a daily basis, and the "contribution delay" is the time elapsed between a contributor's registration and first contribution (i.e. the time spent as a lurker).

Information retrieval
As part of a larger project that started three years ago, a history dump file released on 1 September 2014 was downloaded from the OSM web site (OpenStreetMap Wiki 2017: "Complete OSM Data History" page). FME (Safe Software) workbenches were developed to extract the data from both the history dump file and queries made to the OSM API, and to load them into a PostgreSQL database. Statistical analyses and visualizations were carried out using R software. The observations used in this study were built from the timestamps of all contributors' first edits and an estimation of the registration timestamps of all OSM members at that time. The dates of contributors' first edits were obtained from the creation timestamp of their first changeset, and the daily count of new contributors was based on these dates.
Obtaining the daily count of registrations would have required querying the OSM API for over 2.3 million individual profiles (as of 1 September 2014). Instead, only contributors' profiles were retrieved and their registration timestamps were used to approximate those of the remaining members. These registration timestamps were linearly interpolated using R's "approx" procedure (R Core Team 2016) over the whole range of members' identifiers (ID) generated over that period (according to the IDs found in the history dump). The accuracy of the resulting timestamps was assessed over a sample of 3074 evenly distributed lurker profiles.
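The interpolation step can be illustrated with a short sketch. The study used R's "approx" procedure; the Python function below is only a stand-in for it, and the function name, the data layout (pairs of member ID and Unix timestamp, with strictly increasing IDs) and the choice to skip IDs outside the known range are assumptions made for illustration.

```python
from bisect import bisect_right


def interpolate_timestamps(known, target_ids):
    """Estimate registration timestamps by linear interpolation over member IDs.

    known: list of (member_id, unix_timestamp) pairs taken from contributors'
           profiles, sorted by strictly increasing member_id (assumed layout).
    target_ids: member IDs whose registration timestamp is unknown (lurkers).
    Returns {member_id: estimated_timestamp}.
    """
    ids = [member_id for member_id, _ in known]
    stamps = [stamp for _, stamp in known]
    estimates = {}
    for uid in target_ids:
        # ASSUMPTION: IDs outside the known range are skipped, not extrapolated.
        if uid < ids[0] or uid > ids[-1]:
            continue
        pos = bisect_right(ids, uid)
        if pos == len(ids):
            # uid equals the last known ID: nothing to interpolate.
            estimates[uid] = float(stamps[-1])
            continue
        x0, x1 = ids[pos - 1], ids[pos]
        y0, y1 = stamps[pos - 1], stamps[pos]
        estimates[uid] = y0 + (y1 - y0) * (uid - x0) / (x1 - x0)
    return estimates


# Hypothetical values, for illustration only.
contributors = [(10, 1000), (20, 2000), (40, 3000)]
lurkers = [15, 30, 50]
estimated = interpolate_timestamps(contributors, lurkers)
```

Because member IDs are assigned sequentially, a lurker whose ID falls between two contributors is assumed to have registered between them, which is what makes interpolation over IDs plausible.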
An inventory of the events that dotted the history of the OSM project was retrieved from the OSM Wiki pages (OpenStreetMap Wiki 2017: "History of OpenStreetMap", "Past Events", "OpenStreetMap in the media", "Development activity" pages) and some OSM mailing lists were consulted (i.e. the general (talk), development (dev) and legal mailing lists). Since building an event classification was outside the scope of this research, we adopted the event categories developed by the OSM community (OpenStreetMap Wiki 2017: "Current events" page) to include development milestones, media news, and internal announcements (i.e. blogs and mailing lists). Categories were grouped under internal and external factors. Internal factors are categories of events that set or change the project's characteristics and determine whether the project is relevant or appealing to an individual, such as new rules or application improvements. External factors are categories of events that affect the number of people that may be aware of the project (i.e. project visibility), the perception they may have about the project, or both, such as media coverage or conferences. Within the different categories (presented later in Table 1), the "Mapping" category is a special case combining activities that are inherent to the project but that mostly impacted its visibility (i.e. classified as external factors). Mapping parties (i.e. social gatherings oriented toward a mapping task) have increased the visibility of the OSM project by bringing in new participants (Haklay and Weber 2008; Mashhadi, Quattrone, and Capra 2015). Similarly, the mapping efforts made by the OSM community after natural disasters have also increased the visibility of the project to international relief organizations (Zook et al. 2010; Horita et al. 2013).

Invalid accounts removal
Online collaborative projects often see user accounts removed by administrators, either because the users were banned or because the accounts were created to spam the project. A stratified random sample of members' profiles was drawn to assess the proportion of these accounts over time and remove them from the registration statistics we generated using the whole range of available IDs. Two random profiles were retrieved for every 1000 sequential IDs. Instances for which the API did not return any profile were invalid accounts removed by the OSM administrators and were considered as such in our analysis. Three fields from the sample profiles might be used to identify spam accounts: the username, the content of the free personal text field, and the number of contributions made.
The free text field of 4604 sampled profiles was first analyzed to identify possible spam content. Anticipating username patterns in spam accounts, all usernames were compared considering whether the accounts were flagged as spam or not, whether they contributed to the project or not, and the time at which they registered. Identified patterns were translated into a regular expression to identify most of the invalid accounts from our sample, while minimizing erroneous identification of legitimate accounts. The proportion of invalid accounts was assessed over time using a moving average on a 101-sample window (i.e. covering about 50,000 consecutive IDs) and was set to constant values on the edges.
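The estimation and smoothing of the proportion of invalid accounts can be sketched as follows. This is a minimal Python illustration, not the authors' R code: the function names are hypothetical, the first helper mirrors Equation (1) given in the Results, and "constant values on the edges" is interpreted here as repeating the nearest fully computed average, which is an assumption.

```python
def bot_proportion(p_regex, p_spam, false_positive_rate=0.05):
    """Equation (1): share of lurker usernames matching the regular expression,
    corrected for false positives and floored at zero, plus the share of spam."""
    return max(p_regex - false_positive_rate, 0.0) + p_spam


def moving_average(values, window=101):
    """Centered moving average over an odd window.

    ASSUMPTION: positions closer than half a window to either end take the
    value of the nearest position that has a full window.
    """
    if window % 2 == 0 or window > len(values):
        raise ValueError("window must be odd and no longer than the series")
    half = window // 2
    n = len(values)
    smoothed = [0.0] * n
    for center in range(half, n - half):
        smoothed[center] = sum(values[center - half:center + half + 1]) / window
    for i in range(half):
        smoothed[i] = smoothed[half]
        smoothed[n - 1 - i] = smoothed[n - half - 1]
    return smoothed


# Hypothetical indicator series: 1 = sampled profile judged invalid, 0 = valid.
flags = [0, 0, 1, 1, 0, 0, 1]
smoothed_flags = moving_average(flags, window=3)
```

With two sampled profiles per 1000 sequential IDs, a 101-sample window spans roughly 50,000 consecutive IDs, as stated above.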

Time series analysis
Standard time series analyses postulate the presence of a stochastic process, dividing the process into a centered random component and deterministic trend and seasonal components (McLeod, Yu, and Mahdi 2011;Hyndman and Athanasopoulos 2014). The trend component is used to assess the long-term variations in rates of registrations and initial contributions. Turning points in these curves may result from changes in either the popularity of a project, the ease with which participants can contribute, or both. We expected these variations to correlate with events that had a long-term impact on the project. The seasonal component is expected to identify recurring events that modulate the rate of registrations and initial contributions. Finally, the random component should highlight their outstanding variations. The correlations with specific types of events may reveal clues about what affected participants' behavior, such as some downtime from servers, or the coverage of the project by mass media.
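The decomposition into trend, seasonal and random components can be illustrated with a short sketch. It assumes a multiplicative model (observed = trend × seasonal × random), consistent with the components being expressed as a proportion of the trend as described in this section. It is written in Python and is not a reimplementation of R's "decompose"; the function name and the rescaling of the seasonal factors to an average of 1 are assumptions.

```python
def decompose_multiplicative(series, period=365):
    """Split a daily series into trend, seasonal and remainder components."""
    n = len(series)
    half = period // 2
    if period % 2 == 0 or n < 2 * period:
        raise ValueError("expects an odd period and at least two full cycles")

    # Trend: centered moving average over one full cycle. The first and last
    # `half` positions have no value (182 days on each side for period=365).
    trend = [None] * n
    for center in range(half, n - half):
        trend[center] = sum(series[center - half:center + half + 1]) / period

    # Remove the trend by division, which yields a ratio to the trend.
    ratio = [series[i] / trend[i] if trend[i] else None for i in range(n)]

    # Seasonal: average ratio for each position in the cycle.
    cycle = []
    for phase in range(period):
        vals = [ratio[i] for i in range(phase, n, period) if ratio[i] is not None]
        if not vals:
            raise ValueError("no detrended value available for a cycle position")
        cycle.append(sum(vals) / len(vals))
    # ASSUMPTION: rescale so that the seasonal factors average to 1.
    centre = sum(cycle) / period
    cycle = [value / centre for value in cycle]
    # Repeat the cycle over the whole range of observations.
    seasonal = [cycle[i % period] for i in range(n)]

    # Remainder: what is left after removing trend and seasonal components.
    remainder = [
        ratio[i] / seasonal[i] if ratio[i] is not None and seasonal[i] else None
        for i in range(n)
    ]
    return trend, seasonal, remainder


# Hypothetical series: constant level of 10 with a repeating three-day pattern.
trend, seasonal, remainder = decompose_multiplicative([5, 10, 15] * 4, period=3)
```

In this toy series the remainder equals 1 wherever a trend value exists, since the data contain nothing beyond the level and the repeating pattern.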
The decomposition of the time series was performed using R's "decompose" procedure (R Core Team 2016). A yearly cycle was used as the time unit for seasonal variations, resulting in 365 observations (days) per unit. The determination of the "trend" components over a yearly cycle left 182 days without value on each side of the components. The results are expressed as an average number of participants. The seasonal components were computed by averaging observations over each day of a year after the trends were removed. In this case, trend components were removed by dividing observed values by the trends (creating a ratio) to take into account the dependency of the variances on the means for both distributions (Hyndman and Athanasopoulos 2014). The seasonal components were then expressed as a proportion of the trend and the 365 resulting values were duplicated as necessary over the whole range of observations. The random components result from removing both the trend and the seasonal components from the observed values and are also expressed as a proportion of the trend.
Internal factors comprised 1350 "Meeting", 135 "Upgrade", 52 "Forum" and 8 "License" events. External factors counted 725 "Mapping", 369 "Conference" and 939 "Media" events. With only a few exceptions, all potential explanatory events were found within a week or so from identified variations in participants' behavior.

Invalid account removal
Interpolated registration timestamps (i.e. lurker registration) proved to be accurate, with a standard deviation of 37 minutes and 95% of observations being within one hour of their actual timestamps. The resulting rates of registrations were compared to the rates of new contributors over time. While the latter increased steadily, the number of registered members exploded on 8 July 2012, rising from an average of 704 to 2259 registrations a day, a volume that remained high over most of the period covered by the data-set (Figure 1).
Analysis of sampled profiles revealed that after 8 July 2012, large proportions of new accounts were created with spam content in their text field. The examination of the text field revealed that on average 25% of all accounts created between July 2012 and September 2014 contained spam and, with only a few exceptions, spam content affected only lurkers. Spam content consisted mostly of random text, without obvious purpose, that may result from search engine optimization (SEO) procedures.

Contribution delays and contribution ratios
Contribution ratios were obtained by dividing the trend component of the initial contributions by the trend component of the registrations. The resulting daily ratios provided the proportions of registered members that contributed to the project over time. The contribution delays were obtained by computing the time span between contributors' registration and their first edits. Daily averages and medians of computed delays were plotted to understand how they evolved over years.
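As an illustration, both metrics can be computed along the following lines. This Python sketch uses hypothetical function names and input layouts; grouping the delays by registration day is an assumption, since the grouping key is not stated explicitly.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean, median


def contribution_ratios(new_contributor_trend, registration_trend):
    """Daily ratio of the two trend components; None where either is missing
    or where the registration trend is zero."""
    return [
        c / r if c is not None and r else None
        for c, r in zip(new_contributor_trend, registration_trend)
    ]


def daily_delays(records):
    """Average and median contribution delay, in days, per registration day.

    records: iterable of (registration, first_edit) datetime pairs.
    ASSUMPTION: delays are grouped by the day of registration.
    """
    by_day = defaultdict(list)
    for registered, first_edit in records:
        delay_days = (first_edit - registered).total_seconds() / 86400
        by_day[registered.date()].append(delay_days)
    return {day: (mean(d), median(d)) for day, d in sorted(by_day.items())}


# Hypothetical values, for illustration only.
ratios = contribution_ratios([10, None, 5], [20, 4, 0])
delays = daily_delays([
    (datetime(2009, 4, 1, 0, 0), datetime(2009, 4, 2, 0, 0)),
    (datetime(2009, 4, 1, 12, 0), datetime(2009, 4, 4, 12, 0)),
])
```

Using the trend components rather than the raw daily counts keeps the ratio from being dominated by day-to-day noise.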

Events associations
Abrupt variations in the metrics were correlated to events that made the history of the project. Outstanding variations (outliers) found in the seasonal and random components of the time series analyses were identified using R's "boxplot" procedure (R Core Team 2016), and major turning points in the remaining analysis results were identified manually. Potential explanatory events were searched within a few days from identified variations and a qualitative analysis of the events was used to select the most relevant ones. The analysis considered changes in the volume of participants before and after each event and the potential number of people reached, or affected, by these events.
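The outlier identification can be sketched with the usual boxplot convention, in which values lying more than 1.5 times the interquartile range beyond the quartiles are flagged. The Python function below is an illustration under that assumption; its name is hypothetical, and its quartile computation may differ slightly from the hinges used by R's boxplot statistics.

```python
from statistics import quantiles


def boxplot_outliers(values, coef=1.5):
    """Return (low, high): indices of values beyond the boxplot whiskers.

    Missing values (None), such as the days without a trend value at the
    edges of a decomposed series, are ignored.
    """
    observed = [(i, v) for i, v in enumerate(values) if v is not None]
    # ASSUMPTION: quartiles computed with the "inclusive" method.
    q1, _, q3 = quantiles([v for _, v in observed], n=4, method="inclusive")
    spread = coef * (q3 - q1)
    low = [i for i, v in observed if v < q1 - spread]
    high = [i for i, v in observed if v > q3 + spread]
    return low, high


# Hypothetical random component, expressed as a proportion of the trend.
component = [1.0, 1.1, 0.9, 1.0, None, 3.0, 1.05, 0.95, 0.2, 1.0]
low_days, high_days = boxplot_outliers(component)
```

Days flagged in `low_days` and `high_days` would then be compared with the event inventory within a window of a few days.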

Results
The event repository counted more than 3560 events that dotted the history of the OSM project from 2005 to September 2014. Events were classified into seven categories and two factors (Table 1).
Eight discontinuities (i.e. rupture points) were identified in the distribution of spam accounts, generating nine segments (i.e. periods) over which the rate of bot account creation was relatively constant. These rupture points were compared to the list of events from OSM history to find potential relationships and the results are presented in Table 2.
The results show two periods during which spam accounts were created on a larger scale. The first one spans from March to September 2010, a six-month period after the OSM legal working group (LWG) resolved the remaining problems around the ODbL license implementation. The second one started in July 2012, just before the data from those who did not agree to the ODbL license were removed from the database. The creation of spam accounts continued over the period covered by the analysis, with a few rupture points that matched some State of the Map (SOTM) conferences in Europe.
Table 2. Rupture points in spamming processes and potentially related events. "Date" refers to the rupture points, "Prior" and "After" show the proportion of accounts potentially derived from bot processes for each rupture point, and "Days" is the number of days between the rupture point and the most relevant event found in the list within the surrounding days. (a) No obvious link except that a similar conference (osc2010 tokyo/spring) was held a week before the spam began.
The characteristics of lurkers' usernames changed significantly over the periods the spamming processes were active. During these periods, 88% of sampled spam accounts showed specific patterns of English words and digits in their usernames. As anticipated, such patterns were rarely seen for contributors (5%), or for lurkers outside these periods (7%). Three distinct patterns were identified and combined in a regular expression to estimate the proportion of accounts created by spamming processes. The regular expression was applied to our samples, identifying 530 of the 603 spam accounts, a detection rate of 88%. Only 47 of the 940 sampled legitimate contributors were flagged as spam accounts, resulting in 5% false positives. Equation (1) was used to estimate the proportion of accounts generated by bots in our sample ($P_{bots}$):
$$P_{bots} = \max(P_{regex} - 0.05,\ 0) + P_{spam} \qquad (1)$$
where $P_{spam}$ is the proportion of spam accounts and $P_{regex}$ is the proportion of lurkers' usernames that matched the regular expression, excluding spam accounts. The proportion was adjusted to compensate for the 5% false positives resulting from the regular expression. The distribution of registration rates before and after the correction is shown in Figure 2.
Dark green segments indicate where both curves overlap (i.e. no bot accounts detected). Red segments illustrate removed bot accounts and the light-green segments show the actual number of registered members after the correction. Our results suggest that the spamming processes succeeded in seeding spam in 54% of the accounts they created during the period covered by the analysis and that OSM administrators were able to close about one-third of these accounts. In order to assess whether the spamming processes were still active, the accounts created beyond this period, until February 2017, were sampled. The result shows that the processes creating spam accounts were still active, with about half of newly registered members possibly not being legitimate. The 3299 sampled profiles revealed that 10% of the accounts contained spam, another 10% had been closed by the OSM administrators, and more than 30% of the profiles were lurkers having a username pattern that matched our regular expression.

Time series analysis
The data present two continuous sequences of discrete time-ordered observations that display increasing averages and variances with positive and negative peak events. Analysis results are presented in Figure 3. The "observed" components show the original distributions.

Date
Prior (  (OSS) conferences (16%), until the media (81%) took over after 2007. However, as expressed in Table 3, largest bursts of registrations were mostly correlated to external factors (e.g. Media, Mapping). Regarding the daily number of new contributors, 330 outliers were identified in which 199 days showed much smaller ratios, and 131 days much higher ones. The events that correlated with a small number of new contributions had similar explanations to the ones found for registrations. The events that correlated to high numbers of registration were dominated by upgrades (65%), mapping parties (12%) and forum threads (16%), the latest being mostly related to new editors and data importing tools. After 2007, the correlations shifted to media (43%), upgrades (34%), and mapping parties (11%). Largest bursts of new contributions are shown in Table 4.

Variations in trend components
The trend components (Figure 3, trend) show the cumulative effect of all the events that dotted the history of the project. Some of these events may have played an important role in the way the project evolved over time. Over a dozen turning points were identified independently for each curve. With one exception, all these points were paired with each other over the same dates. The five largest turning points were selected for discussion and are presented in Figure 4. The OSM project really started attracting new participants after March 2007 (A), two years after the first contributions were made. Over the preceding months, high-resolution images from Yahoo! have become available to contributors, the project moved to API 0.4, and a user-friendly editor (Potlatch 1) was set up in the "Edit" tab of the project's web page. Over the same time, the founder of the project, Steve Coast, published in the

Variations in seasonal components
Seasonal variations (Figure 3, seasonal) of registration rates follow an inverted U shape and is repeated annually over the studied period (Figure 3(a)). Average registrations are 10% above normal from April to October and 10% below normal from November to March, with a clear minimum in December (−30%). A similar pattern is seen for new contributors (Figure 3(b)). No relationship was found between seasonal peak events and known recurring statutory holiday or vacations, with the exception of Christmas Eves (minimums of both distributions). Most outstanding seasonal variations echoed large peaks of participation rather than recurring yearly variations because of the short history of the project. These peaks of participation influence the average value of recurring variations per time unit when the number of cycle is too low.

Variations in random components
Random variations (Figure 3, random) show numerous peaks on both distributions. These peaks identify specific days when an unusual (i.e. small or large) volume of participants registered or made a first contribution to the project. Regarding the daily number of new registered members, 123 outlier values were identified out of 3069 observations. Within these outliers, 22 days showed much smaller ratios while 101 days showed much higher ones. Low registration ratios happened mostly at the beginning of the project without obvious related events except for connection problems with the servers, while they all occurred on planned servers downtime later in the history of the project. The nature of the events that correlate to high registration rates has evolved over time. At the beginning of the project, burst of registrations often followed technical threads in OSM forums (29%), upgrades (18%), or Open Source Software agree with the license change even copied the entire database and started a similar project under the previous terms (i.e. "forked" the project). Finally, since only a small proportion of OSM members declined the ODbL license (Weait 2011), the OSMF called the case closed during a SOTM conference held in September 2011 (C) and decided to move ahead with the change.
The number of OSM participants then rapidly increased until the next turning point (D) in September 2012, which correlates with the Tokyo SOTM 2012 conference, when the OSMF board passed a resolution to implement the new license. The burst of new participants visible between points (C) and (D) correlated with the many online communities that changed their background maps from Google to OSM during this period. However, since it ends with the change to the ODbL license, it might also be related to calls made to the community to remap "tainted" data. The exercise aimed at remapping the data provided by those who did not agree with the ODbL license, before and after a redaction process (bot) removed all their data from the database.
Over the following months (D-E), a short burst of registrations seemed to be related to the announcement of a commercial users' summit to be held later. In May 2013 (E), a new OSM editor called ID was made available, leading to an increase in the number of participants.

Contribution delays
Generally, the average delay between users' registration and first contribution shortens gradually over time, while median delays shorten step by steps ( Figure 5).
The horizontal dotted line of Figure 5 represents a one-day delay. The graph shows that both distributions mostly overlapped until December 2006 (A). During this period, contributors waited on average 604 days before making a first edit. In April 2009 (E), the average delay OSM blog a thought about the need for the project in relation to the products offered by national mapping agencies.
The second points (B) show that both curves toppled around 2010. In October 2009, the OpenStreetMap Foundation (OSMF) board announced that its members (not OSM participants) were to vote for a license change. While the project had seen a steady increase of its participants (i.e. members and contributors) over the previous two years, the daily number of new contributors started declining at this time, followed by the number of registrations four months later, a month after the OSMF voted in support of the license change.
During this period and over the following two years, harsh discussions were held on different forums about the license change. Some of the members who did not   delays slightly dropped around May 2013 (I). Both changes occurred around the time of the arrival of a new OSM editor. The ID editor was made available on the web site in May 2013 and became the default OSM editor in August of the same year. Past this point, the average delay dropped rapidly to a point until it joined the median distribution. The latest measurements are affected by their proximity with the closing date of the history dump file used in this research. All those who signed up to the project prior that date and contributed after are not included in the graph. Only the quickest ones will have made a first contribution, which artificially shortens the delays as we move closer to the end of the data.

Contribution ratios
Figure 6 reveals that the ratios increased to reach almost 50% in 2009 before they declined to reach 27% at the end of the period covered by the analysis. The first turning point appears in May 2006 (A). According to the list of events, a first collaborative mapping weekend was held in Manchester (GBR), attracting new volunteers that were initiated to GPS and mapping operations. The events that matched the second (B) and third (C) turning points are likely linked with each other. In October 2008 (C), the OSM administrators opted out from a web site called BugMeNot.com and blocked the related OSM accounts. This site allows people to connect to web sites requiring personal accounts by making public the logins (i.e. username and password) of a few accounts. Using this clue, we found that just before the contribution ratios started lowering in May 2008 (B), someone named "bugmenot" inquired about mapping in an OSM blog. In June 2009 (D), the contribution ratios stabilized until October 2009 (E) when it started dropping. The event repository provided similar potential explanations over both turning points. In dropped below 31 days and the median delays stabilized below 20 min which may correspond to the implementation of the API 0.6.
The first drops in the median distribution (A and B) correlates with the arrival of the Yahoo! aerial imagery in OSM applet in December 2006, and then in JOSM (i.e. a popular OSM editor) eight months later. A similar effect was found in October 2007 (C) after which most median delays dropped below one day which corresponds to the time at which the API 0.5 was implemented, a move that, according to the forums content, made queries and edits easier. Another drop in on both distributions (D) appears in January 2009, with an important compression of the range of values. This change co-occurs with a new version of the Potlatch editor that brought more presets and a detailed coverage of England and Wales with public domain maps (i.e. to copy at will).
The release of the API 0.6 in mid-April 2009 (E) coincided with a significant break in both distributions. Afterward, the delays reached a minimum for a month and a half, until the distributions broke again toward higher values at the beginning of June 2009 (F). This jump to higher delays matches the announcement on the OSM blog that the OSM web site was now available in German and partly in French. Prior to that, the site was available only in English. Delays dropped again until late September 2009 (G), when they increased suddenly. This corresponds to the time at which the web site became available in 26 more languages. The effects on both distributions were similar to what happened at the time the web site was translated into German and French (F).
In December 2009 (H), the delays stabilized at the time the Potlatch 2 editor was released. After the advent of the Potlatch 2 editor, both distributions remained generally stable with trends toward shorter delays. However, this trend became stronger for average delays in 2013. At the same time, the mean and the variance of the median
First, an infrastructure development phase was identified, extending from 2004 to 2007. The project was initiated in 2004 but participants were able to contribute data only from 2005. The correlation found during this period suggests that the project was under construction, considering its unreliable infrastructure and missing or inadequate contribution tools. Trends in daily enrollments and initial contributions (Figure 4) show a limited increase in the number of participants until high-resolution images and user-friendly mapping applications were made available in 2007. Event association with the random components of time series analyses showed that most of the lowest enrollment and contribution rates correlated with downtime and poor server performance. At the same time, most of the highest rates were correlated with highly technical threads on OSM forums or OSS conferences. Furthermore, the most surprising characteristic of the phase is that more than half of those who enlisted during this period waited on average two years before contributing data (Figure 5). Registering to a project under construction and not being able to contribute data in the short term suggests that the primary motivation of these participants may not have been to contribute data. The correlations found suggest instead that their primary motivations were to contribute as developers or to support project objectives. In this context, the volume of participants for whom the project may have appeared as either relevant, appealing or both remained limited, considering the required knowledge and skills and the uncertainty of the project.
The second one is a consolidation phase that extended from 2007 to 2009. During this period, the daily enrollment rates grew by a factor of 10 and the initial contributions by a factor of 20. This large increase in initial contributions may result from the combination of contributions from new participants and from older ones who mostly waited until this period to contribute data. The lowest enrollment and contribution rates now coincided with planned downtime periods, while the highest rates correlated with external events or, for initial contributions, with upgrades. As the infrastructure and contribution tools were improving, the time the majority of new participants took to contribute (Figure 5) dropped from years to hours, and the proportion of them who contributed (Figure 6) reached almost 50%. In other words, at the end of this phase, about half of the people who enrolled in the project contributed data within an hour. These correlations suggest that the volume of participants exploded only after the infrastructure was properly settled and adequate contribution tools were provided. The project was now more appealing to those who were eager to fill blank areas on the map, even if contributing still brought some uncertainties.
The last one is an operational phase that started in 2009. By definition, such a phase consists of recurring maintenance operations, updates and tool improvements
June 2009 (D), the OSM web site was made available both in German and in French, and five months later (E) the site was available in 26 more languages, in an attempt to make the registration process more accessible to non-English-speaking contributors. Finally, the drop in contribution ratios paused between May 2010 (F) and January 2011 (G). In the first case (F), the event repository showed that, two weeks prior to this turning point, new OSM members had to accept the ODbL license to register from then on. The search for an explanatory event for the last point (G) was not successful. Neither the list of events nor the different mailing lists we consulted provided a meaningful event.
Analyzing the 3299 profiles sampled to assess current spamming processes, we estimated that between September 2014 and February 2017, the average proportion of new members that made a contribution was about 25%. This estimation was obtained using the number of legitimate accounts found in our sample (i.e. not spammed and not matching our regular expression) and the number of these accounts that had at least one contribution. This proportion is similar to the latest values obtained in September 2014.
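The estimation described above can be illustrated with a minimal sketch. It assumes each sampled profile carries a username, a manual spam flag and an edit count; the `Profile` fields, the sample data and the `SPAM_NAME_PATTERN` expression are all hypothetical, as the regular expression actually used in the study is not reproduced here.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical pattern standing in for the study's regular expression,
# which is not given in this section.
SPAM_NAME_PATTERN = re.compile(r"^[a-z]+\d{3,}$")


@dataclass
class Profile:
    username: str
    flagged_as_spam: bool  # assumed outcome of a manual profile assessment
    edit_count: int        # number of contributions made by the account


def contribution_ratio(profiles: List[Profile]) -> Optional[float]:
    """Share of legitimate accounts that made at least one contribution."""
    # Legitimate = not flagged as spam and not matching the name pattern.
    legitimate = [
        p for p in profiles
        if not p.flagged_as_spam and not SPAM_NAME_PATTERN.match(p.username)
    ]
    if not legitimate:
        return None  # the ratio is undefined without legitimate accounts
    contributors = sum(1 for p in legitimate if p.edit_count > 0)
    return contributors / len(legitimate)


# Illustrative (made-up) sample: four legitimate accounts, one contributor.
sample = [
    Profile("mapper_anna", False, 12),
    Profile("jo", False, 0),
    Profile("cheapseo2291", False, 0),  # excluded by the name pattern
    Profile("lee", True, 0),            # excluded by the spam flag
    Profile("hiker", False, 0),
    Profile("cartofan", False, 0),
]
ratio = contribution_ratio(sample)
```

Filtering before dividing matters: counting spam accounts in the denominator would understate the proportion of genuine new members who contribute.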

Discussion
The number of participants that register to a project depends on the number of people that are aware of the project and the interest it generates after they have discovered it. This interest is in turn determined by the perceptions individuals have about the project's relevance, its attractiveness or both, depending on how they heard about it. The correlations we found between significant variations in participants' behavior and some events that dotted the history of the project tell us a story about both the project's evolution and participants' motivations. Although correlation does not imply causality, it seemed a first step to link participants' behavior and the contexts in which they have enrolled in the project. Three distinct phases were identified according to participants' behaviors when enrolling in the project. These phases were found regarding both the project's development and the nature of the source that made participants aware of the project.

Enrollment and project's development
If the objectives of a project do not change much over time, its attractiveness may evolve according to, among other things, the complexity of the tasks and the knowledge and skills required to contribute. The latter are usually mitigated by improving documentation and applications. The more adequate they are, the greater the number of participants can be. Our results suggest that participants may have had different behaviors according to the development phases of the project and, as the project evolved, the needs and tasks required may not have attracted the same type of participants.
parties were external events related to bursts of new participants. These gatherings (Haklay and Weber 2008;Hristova et al. 2013) provided OSM participants an opportunity to introduce friends and colleagues to the project by witnessing mapping operations and appreciating the outcome on the map. At the same time, threads from OSM forums also correlated with bursts of new participants. These bursts happened either because these people, already aware of the project, enrolled after those threads, or because they used those threads to motivate friends and colleagues to enroll.
The second phase could be labeled as "seeking for authoritative approval", a period that matched the project's consolidation phase. Early media coverage appeared to have triggered multiple bursts of registration, but not all these events had an impact. Well-established media (electronic and conventional) triggered most of the largest bursts found during this period. For instance, when the BBC or Der Spiegel reported on OSM (Table 3), the burst of enrollments that followed may have resulted not only from the media's popularity but also from their authoritativeness. Authoritative sources cited in these media were also correlated with some peaks of registrations. After the BBC reported positive comments from the president of the British Cartographic Society, the second-highest peak of registrations we found appeared the following day and lasted for a week. A similar effect was found when the web site "slashdotted.com" linked to a story citing an authoritative researcher in the field (Goodchild 2007).
The latest one could be referred to as a "seeking for credibility" phase. Large peaks of enrollment happened after electronic media reported on large organizations that interacted with OSM. For instance, large bursts happened after electronic media reported that well-known online communities had replaced the Google map background of their applications with OSM data. The origin of the bursts may be explained from two perspectives regarding the credibility it may have brought to the project. On the one hand, the concerned communities may have brought credibility to the project, at least from their members' perspective. On the other hand, the credibility of Google as a map provider may have been reassigned to OSM when it was chosen as an alternative. A similar effect could be considered for a large burst of enrollment after electronic media reported that Google's workers were caught vandalizing OSM, sending the message that they may have felt threatened by the project. The interactions of these organizations with OSM may not only have provided credibility to the project but, in some of these cases, may also have brought large numbers of participants who were eventually interested in freely enhancing the background map of their favorite applications.
The OSM's humanitarian contributions (Ahmouda and Hochmair 2017;Poiani et al. 2016;Soden and Palen 2014) are important activities that were widely used to
made on a continuous basis. Except during the license change conflict, the daily rates of enrollment kept increasing faster, while initial contribution rates were growing at a constant pace (Figure 4). Delays before contributing were now short and constant (Figure 5). An unexpected behavior that characterizes this phase is a drop in the contribution ratios (Figure 6), which reached 27% in 2014 and then stabilized around this value. This drop correlated with an improvement to the registration process, when the interface was made available in multiple languages while the contribution tools and wiki pages (i.e. the documentation) were not translated simultaneously. Such a potential language barrier to contributing to the project must be evaluated to ensure that contributors from all over the world can share their local knowledge, especially if it results from an increase in the proportion of participants that come from developing countries. Nowadays, contributing to the project involves little uncertainty or risk. Those who wish to fill remaining blank areas on the map, or add details to their neighborhood, can easily contribute.

Enrollment and the sources of participants' awareness
The detailed investigation of the time series analyses revealed correlations between high rates of enrollment and external events (i.e. media, conference and mapping activities). By definition, external events reached people from outside the project and increased the number of participants when they triggered their interest. Less than 5% of registered external events correlated with bursts of new participants, but a few of these events may have had a large impact on participants' behavior. Furthermore, we observed that the nature of these significant events shifted over time from individual to collective, and from authoritative to social.
The literature has investigated the effects of "important others" on people's motivation to enroll in volunteer activities, either because of emotional links or because of their credibility (Fishbein and Ajzen 1975;Metzger 2010;Rogers 2010). OSM participants certainly influenced friends or colleagues to enroll or not in the project, but the private nature of these events excluded them from this analysis. However, the public nature of the events that dotted the history of the project enabled us to identify other types of "important others" from the influence they seemed to have had on participants' motivations to enroll. Three phases were identified according to the nature of the events that appeared to have motivated people to enroll as the project was developing.
The first one would be described as a "close encounter" phase that was parallel to the project's infrastructure development phase. During this phase, mapping
license change process struck the values of many contributors who then vigorously opposed the change or its process. Even if the number of opponents was low (0.002% did not agree to the new license), their harsh opposition, mostly expressed in OSM forums, seemed to have had an effect on both the enrollment and initial contribution rates. These forums have an important role in online communities, and in the context of peer production projects such as OSM. They build the community by sharing its values (von Krogh et al. 2012;Aknouche and Shoan 2013), by developing rules and norms (Fishbein and Ajzen 1975;Taylor and Todd 1995;Venkatesh et al. 2003) and by discussing community issues. During the conflict, individuals' values were questioned regarding the relevance of the license change or the validity of the process. It seems to have undermined people's perceptions about the actual values and beliefs of the community, which may have kept them from engaging until the situation was resolved. The fact that during this period many registered members refrained from making an initial contribution is also consistent with many lurkers probing the project to see if the community was healthy and could suit their expectations (Preece, Nonnecke, and Andrews 2004;Amichai-Hamburger et al. 2016). The effect of the conflict decreased as the proportion of people accepting the license increased, until the license change was finally approved.

Enrollment and the diffusion of innovations
Interestingly, participants' behaviors in each of these phases were also found to be very similar to those described by Rogers (2010) in characterizing people's behaviors in the early phases of the diffusion of an innovation. In these early phases, Rogers categorizes participants as "Innovators", "Early adopters" and "Early majority". The "innovators" are described as venturesome, highly skilled people who seek to be involved in the implementation of a new idea and can apply complex technical knowledge. This description matches the behavior of the participants who enrolled in the first phase of the project's development. "Innovators" are also known to develop diversified social relationships and friendships among other innovators, which could explain the origin of participants' awareness about the project in its "close encounter" phase. The "early adopters" are described as having a great degree of opinion leadership: they provide advice about an innovation and often serve as role models for other participants, and to keep this status they must make judicious decisions about an innovation. This characterizes the origin of the events that triggered most of the outstanding enrollments that happened during the second phase of the project, when people seem to have sought authoritative approval. Finally, the "early majority" is said to deliberate for some time before being involved but, once done, they follow with a "deliberate
publicize the project and give it credibility. Considering the large volume of media coverage reporting on these activities in the event repository, we were expecting these contributions to have triggered bursts of new participants over the years. However, no correlations were found that could be linked directly to these activities, except after a few media outlets reported the OSM community's involvement in relief operations following the Haiti earthquake.
These activities probably brought large numbers of new participants to the project but, unless they have immediately triggered their enrollment, their effects could not be measured, or correlated.

Enrollment and project's internal conflicts
The correlations found with the license change milestones may have revealed important characteristics of participants' behaviors during internal conflict: a potential retaliation of a few offended participants and the postponement of enrollment and contributions to the project.

Retaliation of offended participant
Most of today's digital communications are impacted by unsolicited junk information (Gyongyi and Garcia-Molina 2005;Chakraborty et al. 2016). Aggressive marketers use cheap SEO mechanisms to advertise, sell their products, or make a web site appear popular to search engines (Chakraborty et al. 2016). It is therefore not surprising that online collaboration sites, such as OpenStreetMap and Wikipedia, are affected by the creation of fake accounts containing spam content (Yamak, Saunier, and Vercouter 2016). However, according to our analyses, the fake accounts that spammed the OSM registrations for a couple of months in 2010 and since July 2012 look like the retaliation of one or a few offended participants. Vandalizing the OSM registration with low-quality SEO could have caused the site to be tagged as "spammy" by search engines such as Google (2011), lowering the odds that the site would appear in the first pages of the search results (Google 2012). Another consequence is that fake registrations required more resources (e.g. processing, disk space) and generated erroneous registration statistics. With about 50% of the accounts created since 2012 originating from a spamming process, more robust protection against registration bots (e.g. CAPTCHA) should be implemented, considering that the current email confirmation used by OSM has proven not to be sufficient.

Postponement of enrollment and initial contributions
Internal conflicts, such as the one triggered by the license change, have the potential to tear a community apart. Communities develop around perceived shared goals, values and beliefs, a unique ethos that brings people to identify themselves with a community (Stebbins 2015; Budhathoki, Nedovic-Budic, and Bruce 2010). The

Funding
This work was supported by a Natural Sciences and Engineering Research Council of Canada (NSERC) Discovery Grant; and Memorial University of Newfoundland.

Notes on contributors
Daniel Bégin is a PhD candidate at Memorial University where his research focuses on the behavior of voluntary geographic information contributors. Prior to that, he worked nearly 30 years for the Canadian mapping agency to develop data validation mechanisms and innovative approaches to topographic maps updating.
Rodolphe Devillers is a professor of Geography at Memorial University, specialized in GIS and Cartography, where he leads the Marine Geomatics Research Lab. His research focus includes spatial data quality, volunteered geographic information and marine geomatics. Devillers co-edited three books on spatial data quality and is an associate editor of several journals, including Marine Geodesy and Geomatica.

Daniel Bégin
http://orcid.org/0000-0002-6110-6613
willingness". This may characterize participants that enroll in the third phase of the project, both from the development perspective (they waited until everything was fully operational) and from the origin of their enrollment as they decided to move in after large organizations had provided some credibility to the project.

Conclusions
The research aimed to uncover some of the factors that affect the enrollment and the contribution to VGI projects. On the one hand, the scientific literature mostly studied online participants' motivation and interests through online surveys. However, surveys provide only time-specific information and usually offer no guarantee that those who chose to reply to the surveys are representative of the studied community. On the other hand, researchers have assessed participants' behavior regarding their lifespan, the volume of their contributions or the nature of content they provided. However, none addressed the way people enrolled and made a first contribution or how it evolved over time. A detailed analysis of the evolution of enrollments and initial contributions to the OSM project identifies trends in the nature of events that correlated with significant changes in new participants' behavior.
Our study showed important correlations between different types of events and the effects they may have had on the recruitment and participation of individuals in a large VGI project. Specifically, elements such as technological improvements to the infrastructure, media coverage, and recognition of the project by other communities were shown to correlate with direct increases in recruitment and participation. We also found that internal conflicts within a community can harm a project, even if they result only from a very small group of people. Furthermore, unpredictable consequences of such conflicts may affect the project in the long term, as shown by the spamming of the OSM registration process.
We finally established comparisons with the "Diffusion of innovation" theory, which indicate that the profile of participants that enroll in the project changes over time. As their profile changes, their behavior is expected to change as well, which should be considered when analyzing their contributions over time. According to the distribution of the different profiles proposed by Rogers (2010), our results suggest that OSM participants may still belong to an "early majority", which would predict a long life for the project before it exhausts the participants of the following phases. These findings can help online communities create strategies for growing and reinforcing their membership according to the profile of the participants as the project evolves, or by mitigating conflicts, all actions being oriented toward keeping contributors enthusiastic about their participation.