Web search volume as a surrogate of public interest in biodiversity: a case study of Japanese red list species

ABSTRACT Introduction: Citizen science has contributed substantially to the quantity of biodiversity data collections and is used as an essential monitoring scheme for global conservation studies. However, there remain large gaps in the geographic and taxonomic coverage of data collections, and different levels of interest in participation and uneven distribution of participants can result in biased data collection in citizen science programs. These issues must be addressed for more efficient use of citizen science-based biodiversity data. We compared web search volumes with citizen-collected occurrence data of Japanese Red List species. Outcomes: Uneven distribution of web search volumes with different taxa was correlated with the amount of data collected by citizen-volunteered programs. Moreover, the relative web search volumes and amounts of citizen-collected data showed similar spatial patterns at the province level. Conclusion: Overall, our results indicate that web search volume can indirectly reflect potential citizen participation or interest in particular species. Web search behavior can help project coordinators estimate potential citizen engagement and refine efficient citizen participation programs for biodiversity conservation.


Introduction
The demand for biodiversity data has increased greatly as ecological issues related to human impacts on global ecosystems and adaptation targets for sustainable conditions have become important topics in conservation ecology (Boakes et al. 2010; Theobald et al. 2015; Zapponi et al. 2017). In this framework, citizen science has been successfully adapted to fulfill broadening demands for species occurrence data on regional and global scales (Chandler et al. 2017). The number of citizen science projects and publications resulting from citizen science programs is increasing rapidly (Catlin-Groves 2012), and ecology and environmental studies related to biodiversity conservation constitute more than 40% of citizen science publications (cf. geography, history and philosophy of science, computer science, and astronomy; Kullenberg and Kasperowski 2016).
Citizen participation in scientific surveys has broadened the spatial and temporal scale of data collection and simultaneously contributed to enhancing the capacity of social communities to face diverse environmental problems (Cooper et al. 2007). Citizen science programs have positive influences on increasing science literacy and awareness of the regional environment by providing rich opportunities to interact with wildlife (Evans et al. 2005; Trumbull et al. 2000). Due to rapid urbanization and concentrated habitation in urban areas (United Nations 2018), opportunities to interact with wildlife have decreased in modern society (Imai, Nakashizuka, and Kohsaka 2018). Citizen science programs for biodiversity surveys can help participants maintain contact with wildlife and the natural environment, and further enhance public interest in conservation activities (Devictor, Whittaker, and Beltrame 2010; Miller 2006; Pocock et al. 2017).
There have been various methodological discussions on the efficient use and application of citizen science data, and several development issues and cautions for using citizen-collected data have been suggested (Kosmala et al. 2016; McKinley et al. 2017). Most studies have compared the consistency of observations between citizen participants and experienced scientists within various research topics and sampling techniques (Fitzpatrick et al. 2009; Galloway, Tudor, and Haegen 2006; Millar, Hazell, and Melles 2018). Statistical approaches to mitigate spatial or temporal bias of citizen science data have also been proposed (Bird et al. 2014; Johnston et al. 2018; Kelling et al. 2015). However, the possible influences of social aspects, including public interest among heterogeneous groups, on data collection in wildlife surveys are not well understood.
The ways in which citizens express interest in biodiversity work similarly in the real world and cyberspace (Figure 1). People participate in citizen science programs to obtain ecological knowledge of wildlife, learn about their local environments, and educate their children (Evans et al. 2005). In the real world, people generally perform relatively simple tasks, such as recording the occurrences of targeted wildlife (e.g., observing wildlife, taking photographs, or collecting samples), to contribute to citizen science projects. Meanwhile, in cyberspace, people express their interest in biodiversity by searching for information (e.g., text, images, or maps) on certain species and receive new information on species of interest as a reward for their web search behavior.
To understand the influence of public interest on citizen science programs, we focused on the conceptually similar patterns of public behavior related to the expression of interest in biodiversity in cyberspace and the real world. Culturomic data, including web search volume, have recently been used to compare public interest in diverse topics in conservation ecology (Kim et al. 2014; Ladle et al. 2016; Willson et al. 2007). The relative level of public interest can be estimated by comparing the search volume levels for different issues. We used the web search volumes of Red List species as an index of public interest in biodiversity and compared them with the quantity of data collected by a citizen survey program on wildlife distribution. We asked whether public participation in biodiversity programs can be understood through web search behavior in cyberspace. We hypothesized that (1) data collection during citizen science programs would be greater for more popular species (i.e., well-known species that are recognized by many people or groups), and (2) higher public interest in biodiversity would increase the level of citizen participation at the regional scale.

Red list species in Japan
We used the fourth version of the Japanese Red List published in May 2018, distributed by the Ministry of Environment. We downloaded the original list from the webpage of the Biodiversity Center of Japan (Biodiversity Center of Japan 2018). This Red List includes information on extinction categories, scientific names, and common names (Japanese) for 5,613 species found in the Japanese Archipelago. The original list of Red List species in Japan assesses the extinction risk of wildlife species into nine categories, including Extinct (EX), Extinct in the Wild (EW), Endangered Ⅰ (Critically Endangered and Endangered; CR+EN), Endangered Ⅰ-A (CR), Endangered Ⅰ-B (EN), Endangered Ⅱ (Vulnerable; VU), Near Threatened (NT), Data Deficient (DD), and Local Population (LP). We reorganized the species with the first seven categories from EX to NT (Table 1), and species with DD or LP categories were not included in the analysis.

Web search volume
Web search data of Red List species were acquired from the Google Trends dataset (Google 2018). Google has the highest web search share (72%) in Japan (StatCounter 2018), and it was assumed that their web search dataset would reflect the majority of public web-search behavior in Japan. Yahoo (22.7%) and Bing (4.5%) had the next highest market shares. Google Trends calculates the relative search rate of a specific term divided by the total searches in a specific time and region (Google 2018). These web search datasets make it possible to compare the public interest in groups of search terms using web search logs by Internet users.
We used the Japanese common names of 5,172 Red List species as input queries for Google Trends (Table 1). Species names were entered in groups of five and each group contained a word to rescale the results (Kim et al. 2014). The web search volumes of each group were rescaled based on the value of the same search term. The search period was set from January 2004 to December 2017, and the target area was all of Japan. For the top 10% of species based on web search volume, the search results were re-checked to see if the common name of the species was being used simultaneously with a different meaning (Correia 2018). In the case of a duplication, the web search volume was corrected by adding additional queries or based on the relative amount of web pages related to the species. The sum of monthly web search volume records for each species and their values for each prefecture was used for the following analysis. Species with a web search volume less than 0.1 were not included.
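The group-wise rescaling described above can be illustrated with a minimal Python sketch. It is not the authors' procedure: the function name `rescale_groups`, the term labels, and the numbers are hypothetical, and it assumes that every query group returns relative values that include the same shared reference term.

```python
def rescale_groups(groups, anchor):
    """Put several query groups on one common scale.

    groups: list of dicts mapping search term -> relative volume within
            that group; every group must contain the anchor term.
    anchor: the reference term shared by all groups (assumed to have a
            nonzero volume in each group).
    Returns a single dict of term -> volume, expressed on the scale of
    the first group.
    """
    base = groups[0].get(anchor, 0)
    if base <= 0:
        raise ValueError("anchor has no volume in the first group")
    merged = {}
    for group in groups:
        anchor_value = group.get(anchor, 0)
        if anchor_value <= 0:
            raise ValueError("anchor term missing or zero in a group")
        # Factor that maps this group's scale onto the first group's scale.
        factor = base / anchor_value
        for term, value in group.items():
            merged[term] = value * factor
    return merged


# Hypothetical values: each group holds species terms plus the anchor.
group_a = {"anchor": 50, "sp1": 100, "sp2": 10, "sp3": 5, "sp4": 1}
group_b = {"anchor": 100, "sp5": 40, "sp6": 20, "sp7": 2, "sp8": 1}
scaled = rescale_groups([group_a, group_b], "anchor")
```

After rescaling, the anchor has the same value in every group, so species queried in different groups can be compared directly. An anchor with very low volume would make the scaling factor unstable, which is one reason to choose a reference term with moderate, steady search volume.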
The relationships between the web search volumes of Red List species and other characteristics (i.e., Red List category, taxon, and length of species name) were examined by comparing their coefficients in a multiple linear regression model. Category NT of the Red List category and algae in the taxonomic group were used as the reference categories for the comparison of categorical variables. Coefficient estimates are reported with their means and 95% confidence intervals (2.5% and 97.5% bounds). The "lm" function in R (R Development Core Team 2016) was used for analysis of the regression model. The patterns of web search volumes for different taxa were compared graphically by calculating the species rank distributions. The rank of each species was calculated in decreasing order of web search volume.
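One way to write the regression model described above is sketched below. The symbols are assumptions introduced for illustration and do not appear in the source: V_i is the web search volume of species i, L_i is the length of its name, and the indicator terms encode Red List category and taxonomic group with NT and algae as the reference levels. The source does not state whether the response was transformed, so none is shown.

```latex
V_i = \beta_0
    + \sum_{c \neq \mathrm{NT}} \beta_c \, \mathbb{1}[\mathrm{cat}_i = c]
    + \sum_{t \neq \mathrm{algae}} \gamma_t \, \mathbb{1}[\mathrm{taxon}_i = t]
    + \delta \, L_i + \varepsilon_i
```

Under this coding, each \beta_c is the expected difference in web search volume between category c and the NT category, and each \gamma_t is the expected difference between taxon t and algae, with the other variables held constant.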

Species occurrence records from citizen science programs
Several nationwide citizen science projects that surveyed the distribution of wildlife were widely organized in the 2000s in Japan (Kobori et al. 2016). Species occurrence records from these survey projects are shared through their own webpages or online biodiversity platforms (e.g., Global Biodiversity Information Facility, Ikimono Log, etc.). Ikimono Log (English: records of living things) is an online biodiversity platform for sharing data from citizen science surveys and the results of government monitoring (Biodiversity Center of Japan 2018b). More than 2,468,000 observations of wildlife in Japan and 162,600 observations of Red List species have been deposited in this platform (records as of 1 July 2018).
Based on the same Red List species list used for the web search analysis, we organized the species occurrence records collected from citizen survey programs in the Ikimono Log platform. We downloaded the data in Darwin Core format, a standard for biodiversity informatics. These standard-format data included information such as Darwin Core ID, species name, taxonomic information, date of observation, locality, and project name. Only occurrence records made by citizens were collected and included in the analysis. Duplicated observations were further omitted and 3,162 records were retained. We summed the number of occurrence records of each Red List species for a pair-wise comparison with their web search volume.
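The de-duplication and per-species counting step can be sketched as follows. This is a minimal illustration under stated assumptions: the source does not say how duplicates were defined, so the sketch treats records with the same `scientificName`, `eventDate`, and `locality` (standard Darwin Core terms) as duplicates. The function name `count_occurrences` and the example records are hypothetical.

```python
from collections import Counter


def count_occurrences(records,
                      key_fields=("scientificName", "eventDate", "locality")):
    """Drop duplicate records, then count the remaining ones per species.

    records: iterable of dicts keyed by Darwin Core term names.
    key_fields: fields that together define a duplicate (an assumption;
                the source does not give the criterion that was used).
    Returns a Counter mapping scientificName -> number of unique records.
    """
    seen = set()
    counts = Counter()
    for rec in records:
        key = tuple(rec[field] for field in key_fields)
        if key in seen:
            continue  # same species, date, and place already counted
        seen.add(key)
        counts[rec["scientificName"]] += 1
    return counts


# Hypothetical records; the second entry duplicates the first.
records = [
    {"scientificName": "Species A", "eventDate": "2015-05-01", "locality": "Site 1"},
    {"scientificName": "Species A", "eventDate": "2015-05-01", "locality": "Site 1"},
    {"scientificName": "Species A", "eventDate": "2015-05-02", "locality": "Site 1"},
    {"scientificName": "Species B", "eventDate": "2016-08-10", "locality": "Site 2"},
]
counts = count_occurrences(records)
```

The resulting per-species counts are what would be paired with each species' web search volume in the comparison.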
Since the spatial records of Red List species at the prefecture level were limited, the total number of species occurrence records collected by the citizens in each prefecture was used as a regional indicator of citizen participation in biodiversity programs. The number of occurrence records in each prefecture was divided by population density to correct for the area of the prefecture and the population size, and then used as the relative effort of occurrence data collection at the prefecture level. The population density of each prefecture was downloaded from the Statistics Bureau of Japan (Statistics Bureau 2018). We also organized the number of environmental NGOs (Non-Governmental Organizations) working in each prefecture (Ministry of Environment 2017). The relationship between web search volume and the relative effort of data collection was analyzed with ordinary least squares regression analysis using the "lm" function in R and the sjPlot package. The average web search volume of Red List species at the prefecture level and the number of species occurrence records collected by citizen science projects were visualized with map projections using the "Spatial Join" tool in ArcMap (ver. 10.5; ESRI, USA).
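The relative effort of data collection described above is a simple ratio, sketched here in Python. The function name `relative_effort`, the prefecture labels, and all numbers are hypothetical, and the units or scaling of population density used in the study are not given in the source.

```python
def relative_effort(record_counts, population_density):
    """Occurrence records divided by population density, per prefecture.

    record_counts: dict of prefecture -> number of occurrence records.
    population_density: dict of prefecture -> population density.
    Prefectures without a positive density value are skipped.
    """
    return {
        pref: n / population_density[pref]
        for pref, n in record_counts.items()
        if population_density.get(pref, 0) > 0
    }


# Hypothetical values: prefecture X has more records but is far denser.
record_counts = {"X": 1000, "Y": 300}
population_density = {"X": 500.0, "Y": 100.0}
effort = relative_effort(record_counts, population_density)
```

In this example prefecture Y has the higher relative effort despite having fewer records. The reported results show the same kind of shift: Tokyo had the most records, while Miyagi had the highest relative effort.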

Web search volumes of red list species
The web search volumes of 4,537 Red List species were identified, of which 652 species had significant search volumes (> 0.1). We identified significant influences of Red List category, taxonomic group, and length of species name on web search volume, although these variables explained only a small part of the variability (R² = 0.048, p < 0.001; Table 2, Figure 2). The web search volume of the EX/EW category was higher than that of the NT category, and those of vertebrates were higher than that of the algae group. Nipponia nippon (Crested Ibis), belonging to the EW category, showed the highest web search volume among Red List species (Supplemental Table S1). The mean web search volumes were 57.09 ± 15.86 for the CR/EN category, 60.26 ± 16.34 for the VU category, and 23.42 ± 5.71 for the NT category (Figure 2(a)). Higher web search volumes tended to be found for higher taxonomic groups (Table 2; Figure 2(b)). The web search volumes of birds (186.36 ± 40.13) and fish (212.28 ± 67.41) were higher than those of plants (11.76 ± 1.83), invertebrates (13.30 ± 4.60), and insects (19.48 ± 8.50). Notably, one algae group species (Aegagropila linnaei, marimo) had a very high web search volume (4,895; Supplemental Table S1). The average web search volume of other algae species was 2.64 ± 1.68, which was the lowest among the taxonomic groups. Species name length also had a negative association with web search volume.
Different distribution patterns among the biological taxa were represented well by the results of the species rank distributions using web search volume (Figure 3). Web search volumes for endangered species within the top 10 species per taxa were very high, and the values tended to decrease rapidly thereafter. Algae showed the steepest decrease in web search volume among taxonomic groups. Plants and birds showed more gradual decreases in web search volume compared to other taxa. The rate of decrease in web search volume followed the order plants < birds < fish < insects < invertebrates < amphibians/reptiles < mammals < algae.
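A species rank distribution of the kind shown in Figure 3 can be built by ranking species within each taxon in decreasing order of web search volume. The sketch below is a minimal Python illustration. The function name `rank_by_taxon` and the example values are hypothetical, and tied volumes simply receive consecutive ranks in input order, which may differ from the authors' handling.

```python
from collections import defaultdict


def rank_by_taxon(species):
    """Rank species within each taxon by decreasing search volume.

    species: iterable of (name, taxon, volume) tuples.
    Returns {taxon: [(rank, name, volume), ...]} with rank 1 = highest.
    """
    by_taxon = defaultdict(list)
    for name, taxon, volume in species:
        by_taxon[taxon].append((name, volume))
    ranked = {}
    for taxon, items in by_taxon.items():
        # Stable sort: ties keep their input order.
        items.sort(key=lambda item: item[1], reverse=True)
        ranked[taxon] = [
            (rank, name, volume)
            for rank, (name, volume) in enumerate(items, start=1)
        ]
    return ranked


# Hypothetical volumes for two taxa.
species = [
    ("bird 1", "birds", 120.0),
    ("bird 2", "birds", 300.0),
    ("alga 1", "algae", 2.0),
    ("alga 2", "algae", 4000.0),
    ("bird 3", "birds", 90.0),
]
ranked = rank_by_taxon(species)
```

Plotting volume against rank for each taxon then shows how quickly search volume falls away from the most-searched species.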

Relationship between web search volumes and species occurrence records
The web search volumes of 161 species listed in the Red List were compared to the number of occurrence records collected by citizen survey projects (Figure 4). The results showed that the number of collected records increased as the web search volume of the species increased (β = 0.02, R² = 0.17, p < 0.001). The rate of increase was highest for amphibians/reptiles (8.63 × 10⁻²), followed by birds (4.46 × 10⁻²), plants (1.63 × 10⁻²), insects (1.10 × 10⁻²), and mammals (0.19 × 10⁻²). The number of occurrence records of invertebrates and fish was not influenced by web search volume.
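The slope (β) and R² reported above come from a simple linear regression of record counts on web search volume. The authors fitted their models with the "lm" function in R. The pure-Python sketch below is only a stand-in that shows how the two quantities are computed, with a hypothetical function name `simple_ols` and made-up data.

```python
def simple_ols(x, y):
    """Least-squares fit of y = intercept + slope * x.

    Returns (slope, intercept, r_squared). r_squared is NaN when y has
    no variance, because the statistic is undefined in that case.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0:
        raise ValueError("x has no variance; slope is undefined")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy ** 2) / (sxx * syy) if syy > 0 else float("nan")
    return slope, intercept, r_squared


# Made-up data: search volume (x) and number of occurrence records (y).
volume = [0.0, 10.0, 20.0, 30.0]
n_records = [1.0, 1.2, 1.4, 1.6]
slope, intercept, r_squared = simple_ols(volume, n_records)
```

Fitting the same function separately to the species of each taxon would give taxon-specific slopes comparable in form to the rates of increase listed above.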
The spatial patterns of web search volumes and the number of wildlife occurrence records by citizen survey programs were compared at the prefecture level (Figure 5). Web search volumes of 355 species on the Red List were available at the prefecture level (Figure 5(a)). The mean percentage of web search volume for Red List species was highest in Hokkaido prefecture (4.47%), followed by Chiba (4.07%), Tochigi (4.01%), Ibaraki (3.96%), and Saitama (3.91%) prefectures. Hokkaido prefecture and the middle-eastern part of Japan showed relatively higher web search volumes for Red List species compared to other regions. The number of wildlife occurrence records collected by citizen surveys was highest in Tokyo (9,986 records), followed by Kanagawa (5,656), Aichi (4,473), Chiba (3,311), Fukuoka (3,310), and Ibaraki (2,046) prefectures. The relative effort of occurrence data collection was highest in Miyagi (6.26), followed by Kagoshima (4.77), Niigata (4.49), and Ibaraki (4.27) prefectures (Figure 5(b)).
We further identified that increased web search volume for Red List species had a positive association with the effort of occurrence data collection at the prefecture level (R² = 0.14, p = 0.010; Figure 6). Miyagi, Kagoshima, Kumamoto, Yamanashi, Niigata, and Fukuoka prefectures had relatively high occurrence data collection efforts, although their web search volumes were lower than average. These prefectures tend to have a higher number of environmental NGOs. For example, Fukuoka (126)

Table 2. Estimated coefficients of the linear model assessing the effects of Red List category, taxonomic group, and length of species name on the web search volume of Red List species. Category "NT" of the Red List category and "algae" in the taxonomic group were used as the reference categories. P-values lower than 0.05 are in bold.

Discussion
Web search volume as a surrogate of public interest in biodiversity
Differences in public interest for different wildlife species have been reported in several studies (Jarić et al. 2016; Kim et al. 2014; Troumbis 2017). Our comparison of web search volumes of Red List species also showed unbalanced distributions among different Red List categories and biological taxa. However, our results did not fully explain the gaps in web search volumes among different species. Thus, other external factors (e.g., economic importance of the species, amount of conservation funding related to the species, or species charisma) besides Red List category, taxonomic group, or length of species name may have greater influences on web search volume. For instance, people are generally more interested in taxa with unique physical characteristics or with a close phylogenetic distance from humans (Proença, Pereira, and Vicente 2008; Willson et al. 2007). Biological taxa with large amateur interest groups (e.g., plant identification or bird watching groups) showed a more even distribution of web search volumes. Unbalanced interest in biological taxa was further reflected by the number of citizen science programs per taxon (Devictor, Whittaker, and Beltrame 2010). For instance, bird censuses constituted the largest portion of citizen science programs, and programs focused on iconic species generally had broad spatial coverage.
Interestingly, the same bird species (i.e., Crested Ibis, which had a government-based restoration program) had the highest web search volume in both Japan and Korea (Kim et al. 2014). The Crested Ibis was a very common species in local rice paddies in the nineteenth century, and it had a close cultural connection in both countries. The Crested Ibis became extinct in the wild in 1981 in Japan and 1979 in Korea, and the Ministry of Environment initiated a restoration program to release Crested Ibises into wild habitats (Toyoda 2017). Conservation issues, like restoration, can increase the number of public articles, scientific reports, and participation of local people (Jarić et al. 2018), elevating the social exposure of endangered species.

Increased tendency for data collection of popular species
The level of understanding or interest of society influences public participation in citizen science programs. A comparison of web search volume (i.e., a surrogate of public interest) and the number of occurrence records by public participation showed a positive relationship in various taxa. Species with higher web search volumes also tended to have more reported occurrences. However, the large deviations around the regression line suggest that the number of records may also have been influenced by the relative abundance of different species. Considering the positive relationship between the amount of data and public interest, voluntary participation in occurrence data collection for species with lower public interest may result in insufficient data collection. In addition, differences between data collected by citizen volunteers and by experts were larger in lesser-known taxa than in well-recognized taxa (Delaney et al. 2008; Fitzpatrick et al. 2009).
Some successful cases of citizen science have specified targeted species groups with a standardized method, reducing sampling errors at the individual level (Barlow et al. 2015; Crall et al. 2011; Devictor, Whittaker, and Beltrame 2010; Pocock and Evans 2014). Other groups have asked citizens to collect raw data (e.g., photographs of wildlife, location records, or raw soil samples), with further analysis (e.g., identification and counting) done by experienced experts (Suzuki-Ohno et al. 2017; Zapponi et al. 2017). Systems in which experts inspect all data have been adopted in diverse projects and verified for accuracy. This approach significantly reduces individual error, but its application is limited by the number of experts available. Considering the significant differences in social interest for different taxa, a more sophisticated research design is likely necessary when the target species attracts less public interest than other species groups. These findings emphasize that survey programs should be accompanied by citizen awareness and training programs to enhance the efficiency and accuracy of citizen data collection (Gardiner et al. 2012; Kosmala et al. 2016).
The use of citizen survey data has increased in conservation ecology with the development of species distribution models (Fournier et al. 2017). When comparing the distribution patterns of different species using citizen science data, it is necessary to consider potential bias related to the design of the survey program. The collection of species distribution data by citizen participation is influenced by the level of public interest as well as the population rarity, distribution range, and accessibility of the survey area (Geldmann et al. 2016; Lennon et al. 2004; Mair and Ruete 2016; Xue et al. 2016). For example, studies have reported spatial biases of citizen science data due to unbalanced observation patterns related to access proximity (Geldmann et al. 2016). In particular, Snäll et al. (2011) presented a possible bias related to participants' willingness to report. They found that a change in the willingness to report certain species groups resulted in different abundance trends in the monitoring results. Our data further suggest a possible influence of public interest on the observation effort for biodiversity surveys, as prefectures with higher web search volumes had higher efforts of occurrence data collection from citizen science programs. We also found that some prefectures with a large number of environmental NGOs (e.g., Fukuoka, Miyagi, and Niigata) tended to have higher efforts of species data collection relative to their level of web search volume. The social context and regional environmental conditions related to the level of local participation activities can also have a positive influence on successful data collection by citizen science programs.
Imai, Nakashizuka, and Kohsaka (2018) compared the temporal changes in public observations for 12 wildlife species and found a significant decrease in observation experience over 15 years. It was difficult to directly compare these results with our web search volume results, as the two studies had different temporal and spatial coverages (Supplemental Figure S1). However, the temporal patterns for various species (e.g., dandelions, swallows, cuckoos, and cabbage butterflies) were comparable in the results from the two different approaches. The findings imply that the relative experience and interest of the public in wildlife is not consistent over time and changes with sociological or environmental context. To maintain the sustainability of data collection in citizen science, it is important to consider the fluctuations of public interest in biodiversity, which would help increase the accuracy of long-term changes in the temporal abundance or distribution of wildlife populations derived from citizen science (Kamp et al. 2016). Thus, culturomic tools such as web search volume data can be used to estimate public interest trends before the actual research is done.

Conclusion
We demonstrated the applicability of web search volume data as a surrogate of public interest in biodiversity. We also showed that different levels of public interest in certain taxa or species can result in biased data collection in citizen science programs. Minimizing bias in data collection is one of the main issues in citizen research platforms. Citizen science must apply standardized methods to prevent spatial bias and reduce skewed reports for more popular species. It is important to include public awareness programs to minimize biased data collection due to an unbalanced understanding of or interest in certain species groups. Culturomic tools, including web search volume data, can be used as supporting tools to screen for efficient target species, which can enhance public participation. The design of citizen monitoring programs can be further refined by considering differences in public interest at the regional level.