Multiscale geovisual analysis of knowledge innovation patterns using big scholarly data

ABSTRACT Knowledge innovation is a key factor in industrial development and regional economic growth. Understanding regional knowledge innovation and its dynamic changes is one of the fundamental tasks of regional policy-makers and business decision-makers. Although many studies have sought to support the understanding of knowledge innovation patterns, data-driven and intuitive visual analysis of georeferenced knowledge innovation has not been sufficiently studied. In this work, we analysed knowledge innovation by visually exploring big georeferenced scholarly data. More specifically, we first applied network analysis and statistical methods to derive key measures (e.g., the number of publications and academic collaborations) of knowledge innovation at multiple spatial scales. We then designed geovisualizations to explicitly represent the multiscale spatiotemporal patterns and relations. We integrated the analytical methods and geovisualizations into an interactive tool to facilitate stakeholders’ visual learning and analysis of knowledge innovation with a spatial focus. Our work shows that geovisualizations have great potential in supporting complex geoinformation communication in knowledge innovation.


Introduction
Knowledge innovation is one of the fundamental factors in regional development. Supporting knowledge innovation and collaboration has become increasingly important in modern regional planning. Science and technology parks are good examples of how knowledge innovation can be encouraged. For instance, Silicon Valley in California, Hsinchu Science Park in Taiwan, Arabianranta in Helsinki and Bio M in Bavaria have gained popularity and have successfully led to continuous knowledge innovation and academia-industry interactions (Henriques, Sobreiro, and Kimura 2018). This, in turn, attracts ideas, knowledge, innovation, population growth and financial investment to the regions (Carrillo et al. 2014). Therefore, fostering knowledge innovation, collaboration and academic spin-offs are key strategies in regional planning and management (Fikirkoca and Saritas 2012; Friedmann and Yong 2004). Understanding knowledge innovation patterns and regional development is essential for policy and decision-making.
The growing field of urban analytics in big geodata may bring more evidence and insights into spatiotemporal patterns of various social activities, including knowledge innovation. Analytical methods have been proposed to measure and analyse many characteristics of a city, such as urban land functions, human mobility, life quality and place attractiveness. For example, Ye et al. (2021) identified the functions of urban areas using social media data. Brauer, Mäkinen, and Oksanen (2021) analysed the influence of cycling traffic on mobility using trajectory data. Sapena et al. (2021) predicted life quality in multiple dimensions by modeling socioeconomic indicators. Among a wide range of research topics, big scholarly data are widely used to understand knowledge innovation (Hoekman, Frenken, and Van Oort 2009; Galaso and Kovářík 2021). For instance, Wang et al. (2021) used big scholarly data to analyse the correlation between academic innovation and high-tech industry and found that international collaboration has become more important for local industry in recent years. Another study proposed that functions of different cities in a polycentric region can be revealed by analysing multiscalar academic collaboration (Li and Phelps 2018). The structural disparities and proximity mechanisms in academic collaboration can be measured by analysing the collaboration network (Chengliang, Qingchang, and Dezhong 2017; Li, Wei, and Wang 2015). In addition, analysing scholarly data can provide quantitative evidence to support decision-making. For instance, Imran and Jabeen (2020) analysed domestic and international scientific collaborations and found that international collaborations positively affect local knowledge innovation. Their findings provided some suggestions for research policies, such as establishing scholarships and investing in digital infrastructures.
While most existing studies have focused on applying statistical methods for strategic analysis and anticipating the future, effective visualizations that leverage the human ability of visual analysis play an important role in improving transdisciplinary knowledge among various stakeholders (Billger, Thuvander, and Wästberg 2017). As Kandt and Batty (2021) suggested, geovisual analysis and interactive interfaces are needed in smart cities and urban policy-making to facilitate communication between various stakeholders, such as investors and planners. The innovation capability of a region and its connections with other regions contribute greatly to economic development. Geovisualizations, illustrations of specific topics and the provision of geographic knowledge are beneficial tools for regional planning and decision-making. In general, it is challenging to design geovisualizations that represent the dynamic development of cities, especially when covering a long time frame to gain strategic insights (Kandt and Batty 2021). Maps are often used to visualize spatial information in urban analytics (Maantay and Ziegler 2006). For example, Sobral, Galvão, and Borges (2019) used flow maps to study traffic and pedestrian movement and supported the analysis of passenger dynamics, ridership and urban service reliability. Zhao et al. (2020) used machine learning methods based on zoning strategies to analyse urban population distribution and showed the results on choropleth and heat maps. Their results had high accuracy and helped in communicating regional inequalities. 3D visualization was also adopted to provide an intuitive presentation of urban space and help the general public participate in the urban planning process (Wu, He, and Gong 2010).
Many studies have also been conducted to visualize big scholarly data for the analysis of knowledge innovation patterns. However, most have focused on developing static visualizations to show individual perspectives of publications, such as popular scientific domains, geographic publication hotspots, co-authorship and citation networks, whereas the connections among these perspectives and the relations of scientific networks at multiple scales were not adequately presented. Interactive tools that integrate the visualizations and display the connections between them should be designed and developed to bridge this gap. However, designing such interactive tools poses several challenges. First, the design of visualizations to show multivariate and multiscale spatiotemporal data is difficult. To be specific, scholarly data has multiple variables, such as domains, journals, citations and publishers (Leydesdorff, Bornmann, and Wagner 2017), and can be analysed at multiple aggregation levels, such as author level, topic level, institute level and administration level, as well as multiple geographic coverages, such as local, municipal, regional, national and international (Galaso and Kovářík 2021; Frenken, Hardeman, and Hoekman 2009; Csomós and Lengyel 2020). Second, designing and developing interactive tools to support dynamic analysis of multi-level and multi-coverage networks requires interdisciplinary knowledge, spanning fields such as knowledge innovation studies, visual analytics, cartography, and human-computer interaction. Third, there are technical challenges related to the georeferencing of implicit spatial locations, because scholarly data is normally collected with implicit locations, for example, in the form of the name and address of an affiliation. Geospatial analysis therefore requires converting these names and addresses to geographic locations using geocoding techniques.
This study aims to combine analytical and visualization methods to uncover knowledge innovation patterns from big scholarly data. We explored knowledge innovation in multiple spatial scales over a long time span and collaboration networks in several selected research domains. Novel geovisualization methods were applied to illustrate and reveal the complex academic networks and their geographic distributions. The visualizations were further integrated into a web-based interactive analysis tool to support strategic decision-making.

Network analysis of knowledge innovation
Network analysis with a set of integrated techniques is widely applied in the social sciences to explore relations and interactions among actors in societies (Borgatti et al. 2009), such as friendship networks in organizations, follower networks in social media, spread networks of news and rumours, as well as co-authorship and citation networks in academia (Tabassum et al. 2018). In academic network analysis, network theories are widely applied with multiple levels of detail ranging from individual authors, to institutes and regional and global scales.
Fine-level knowledge networks are analysed based on the characteristics of individual nodes in the network. For instance, by analysing citation networks, Imran and Jabeen (2020) and Chen, Ibekwe-sanjuan, and Hou (2010) identified the influences of journals and institutes. Basole (2016) explored topological metrics of inter-firm collaboration networks, such as centrality and clustering coefficients, to describe the characteristics of nodes in the network. Similarly, Chen et al. (2017) calculated the metrics for directed networks, such as PageRank and connected component sizes, to describe the clusters of the network. Other work has detected the most influential authors by analysing metrics such as number of papers, network density and centralities.
The complex patterns of the relations between knowledge innovation and industry, urban structure and regional collaborations have also been analysed using network analysis methods. Wang et al. (2021) studied how the cooperation network supports innovation-driven industries. They found that interregional and international innovation collaboration are important driving forces for economic development. They compared statistics, including the total number, mean, standard deviation, minimum and maximum of regional and county-level networks. Innocenti, Capone, and Lazzeretti (2020) investigated the relationship between the structure of knowledge innovation and local industry at the firm level. Their results showed that knowledge networks are correlated with regional innovative capacity. Statistics of the networks, such as scale, density and average path, were measured in their analysis. Galaso and Kovářík (2021) compared the influence of different geographical levels of embeddedness on innovation and found that regional- and country-level innovation networks exerted different influences. They further analysed the correlation between the density of the co-patenting networks and the socioeconomic indexes, e.g. gross domestic product (GDP) per capita and human capital.
In general, analysing knowledge innovation in multiple aggregation levels, such as institutes or cities, and in multiple spatial coverages, such as local or international, is important for understanding knowledge innovation patterns. In addition, location and distance in the aforementioned studies are crucial for knowledge innovation and therefore demand analysis of knowledge innovation with a spatial focus.

Geovisualization of scholarly data
Visualization of big spatiotemporal data can reveal a large amount of information at a glance or emphasize the patterns of certain key features. It is an important tool for further analysis in decision-making (Hamdi et al. 2021). Many studies have been conducted to explore innovative, aesthetic and effective visualization designs to support the identification of significant locations and time periods. Seebacher et al. (2018) overlaid designed glyphs onto a map to show the temporal changes of certain species. Leite et al. (2020) designed donut charts which, when superimposed on a map, show economic networks. Li et al. (2020) designed glyphs to show the location and movement of different wild animals.
Various visualization methods and techniques have been developed to support understanding complex spatiotemporal trends in knowledge innovation. In the well-known scholarly analysis tool CiteSpace (Chen et al. 2006), different visualization methods were integrated for the analysis of data from multiple perspectives. For example, network visualization with weighted nodes and links was used to support cluster detection, coloured bars to visualize the temporal periods of certain keywords, flow maps to show geographic hotspots, and word clouds to illustrate the popularity of research domains.
In recent years, visualization methods have been increasingly applied in network analysis (Liu et al. 2018), and maps have been widely used as an instrument to show geographic patterns in networks. Examples include colour-coded flow maps that show the connections between nodes (Hennemann, Derudder, and Taylor 2015), contour lines that represent the degree of nodes (Zhao et al. 2015) and grid maps for spatial cluster identification (Piry et al. 2016). Furthermore, the complex patterns in collaboration networks can be represented by various geovisualizations, such as 3D symbol maps (Helbig et al. 2014), flow maps (Jenny et al. 2018) and treemap visualizations (Scheibel et al. 2020). In addition, Csomós used maps to show various perspectives of spatial patterns of publications. For instance, he used proportional symbol maps to show the hotspot of scientific output areas (Csomós 2018) and designed a flow map to show international scientific collaborations (Csomós and Lengyel 2020). Moreover, spatial information is represented in other novel visualizations. For example, Liu, Derudder, and Taylor (2014) applied alluvial diagrams to analyse international connections, and Hennemann, Derudder, and Taylor (2015) employed a chord diagram to visualize intercity networks.
Although various visualizations have been proposed to explore statistical attributes and spatial distributions of networks, few interactive tools have been designed and developed for visual analysis of networks and connections from different perspectives. For example, tools are needed to support the overview of networks at multiple scales and aid users in comparing the differences among networks in different scientific domains. To support such functions, relevant visualizations need to be integrated and linked into an interactive visual analytical tool, which should allow users to choose the content based on their own interests. Multiple factors should be considered in designing interactive visual analysis tools with respect to their target users, such as showing aggregated data in required geographic scales, designing the appropriate visualizations, arranging the visualizations logically and applying effective interactions (Few 2006; Zuo, Ding, and Meng 2020). Interactive tools have been designed to support various purposes, e.g., map-based interactive tools were designed to discover cyber traffic spatial patterns (Kodituwakku et al. 2020; McKenna et al. 2016) and support visual detection and monitoring of air quality. In big scholarly data analysis, Bach, Pietriga, and Fekete (2014) visualized the topology of collaboration among authors over time by using a space-time cube. However, more types of geovisualizations can be integrated into an interactive tool to support visual analyses of scientific collaboration patterns.

Big scholarly data
Big scholarly data is rapidly growing data about scientific resources. It usually includes information about papers, authors, institutes, journals and conferences and reflects scientific activities, personnel, research domains and scholarly networks (Xia et al. 2017). Scholarly data is easily accessible through many online libraries and platforms, such as digital libraries like Web of Science, academic search engines like Google Scholar, and academic social media like ResearchGate. Many studies have employed scholarly data to detect academic activities (Xia et al. 2017). For example, the impact of authors can be predicted by analysing citations (Dong, Johnson, and Chawla 2016), and interactions among research communities can be detected by analysing co-authorships (Mercorio et al. 2019). Similarly, keywords and citations can be utilized to identify influential publications and research frontiers (Yan and Li et al. 2020). The temporal changes of the popularity of research domains can be identified by analysing the number of publications and journals (Wang et al. 2020). In addition, scholarly data can also reflect academic communities, such as university groups and research institute clusters. In this study, we employ big scholarly data to explore the patterns of knowledge innovation.
We collected scholarly data from the ACM Digital Library via web crawling technologies. The ACM Digital Library (https://dl.acm.org/) is a public research platform with a large collection of literature in the field of computing. According to the ACM categories, journals, proceedings, letters and magazines reflect the emerging and established computing research, cutting-edge innovations, leading views and opinions on research, and up-to-date research activities, respectively [56]. In this study, we therefore collected publications in these four categories. The collected items are from January 2017 to December 2019, with information about the title of the publications, the name and affiliation of the author(s), the publication time, the research domain and the uniform resource locator (URL) of the webpages. The collected data set contains 137,818 publication items, 305,815 authors and 39,030 institutes worldwide. Among them, 85.0% (117,150 items) are proceedings, 10.6% (14,627 items) journals, 2.6% (3533 items) letters and 1.8% (2478 items) magazines. Table 1 shows an example of a publication item from the raw data. The names and affiliations of the authors are listed in corresponding order. The domain IDs follow the poly-hierarchical ACM domain ontology. There are 13 main domains, such as computing methodologies and human-centered computing, and each domain is further divided into subdomains.

Methodology
We proposed a geovisual analytical workflow to examine the spatiotemporal patterns of knowledge innovation. The workflow consists of four major modules, i.e., data processing, data modeling, geovisualization and the interactive tool. Figure 1 shows the modules and their major steps. Firstly, we preprocessed the scholarly data by cleaning it, georeferencing the affiliations to obtain the geographic coordinates and aggregating the publications onto geographic locations at multiple scales. Secondly, the co-authorship networks were established and analysed by different methods in the modeling module. The institutes, cities and research domains were analysed at multiple scales. Thirdly, novel visualizations were designed to show the patterns from various perspectives with a spatial focus. Finally, an interactive visual interface was designed by formulating the design goals and designing the layout and the interactions accordingly.

Data preprocessing
Before data visualization and analysis, the scholarly data needs to be cleaned, georeferenced and aggregated. In this work, data cleaning included data restructuring and text encoding. The data was restructured into four types of entities, namely paper, author, institute and domain, and these entities were connected with extra tables (as Figure 2 shows). Text encoding converted non-English letters and mathematical symbols into the UTF-8 encoding system. Next, we used the Nominatim open-source geocoding service to georeference the names and addresses of affiliations into longitudes and latitudes and then projected the coordinates of the affiliations onto multiple administrative divisions, such as cities, regions, or counties. The administrative boundary data were collected from the http://nnu.geodata.cn:8008/ geodata open data portal. Finally, the number of publications, co-authorship, and domain information were aggregated to the multiple administrative levels. All of the preprocessed data was stored in a PostgreSQL database.
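The projection of geocoded affiliations onto administrative divisions, followed by aggregation, can be illustrated with a minimal sketch. It assumes that each affiliation has already been geocoded to a longitude and latitude and that each division is available as a simple polygon ring. The ray-casting test, the function names `point_in_polygon` and `assign_division`, and all sample data are hypothetical illustrations and not the procedure used in the study.

```python
from collections import Counter


def point_in_polygon(lon, lat, polygon):
    """Ray-casting test: True if (lon, lat) lies inside the polygon ring."""
    inside = False
    n = len(polygon)
    for k in range(n):
        x1, y1 = polygon[k]
        x2, y2 = polygon[(k + 1) % n]
        # Only edges that straddle the horizontal line through the point matter.
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def assign_division(lon, lat, divisions):
    """Return the name of the first division containing the point, or None."""
    for name, polygon in divisions.items():
        if point_in_polygon(lon, lat, polygon):
            return name
    return None


# Hypothetical rectangles standing in for real administrative boundaries.
divisions = {
    "City A": [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)],
    "City B": [(2.0, 0.0), (4.0, 0.0), (4.0, 2.0), (2.0, 2.0)],
}

# Hypothetical geocoded affiliations: (institute, lon, lat, publication count).
affiliations = [
    ("Institute 1", 0.5, 0.5, 12),
    ("Institute 2", 1.5, 1.0, 3),
    ("Institute 3", 3.0, 1.0, 7),
    ("Institute 4", 9.0, 9.0, 2),  # falls outside every division
]

# Aggregate institute-level counts to the city level.
publications_per_city = Counter()
for _, lon, lat, n_pubs in affiliations:
    city = assign_division(lon, lat, divisions)
    if city is not None:
        publications_per_city[city] += n_pubs
```

Real boundary data contains multipolygons and holes, so a spatial database or a geometry library would normally take the place of the hand-written test.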

Data modeling
Social network research is dedicated to studying relationships between connected actors (van Steen 2010). An actor could be a person, an organization, a webpage, or a region. In a network, the actors are represented as nodes and their relationships are represented as links or edges. Network analysis may focus on different levels, e.g., node level, group level, or network level. According to Li, Wei, and Wang (2015), an academic network is a complex network with some unique features, such as a heavy tail in the degree distribution and a high clustering coefficient (Yang and Yang 2008).
In this study, we measure knowledge innovation by the number of scientific publications and research collaborations. We established academic networks at two representative levels, i.e., the institute level and the city level, and we analysed them locally, regionally and internationally. In the academic network at the institute level, nodes represent the institutes with which authors are affiliated, and the weights of nodes represent the number of publications of an institute. The links represent co-authorships between two institutes, and the weights of links represent the number of joint publications. For instance, if a publication had several authors with their affiliated institutes, each distinct affiliation of the publication was represented as one node, and this publication in turn contributed to the weight of each node. Links were established between every two involved affiliations, and each pair of authors with different affiliations contributed to the weight of the link. Similarly, the network was also established at the city level, with each city that had publications as a node and the intercity co-authorships as links. In addition, the weights of the nodes were calculated with integer counting, which means the affiliated institute or city receives full credit for each author of a publication. We consider the networks to be undirected because the communication among collaborating authors can be bidirectional.
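The construction rules for the institute-level network can be sketched in a few lines. The sketch follows one possible reading of the description: every distinct affiliation of a publication gains one unit of node weight, and every pair of authors with different affiliations adds one unit to the corresponding undirected link. The function name `build_institute_network` and the sample publications are hypothetical.

```python
from collections import Counter
from itertools import combinations


def build_institute_network(publications):
    """Build node and link weights from per-publication affiliation lists.

    Each publication is a list with one affiliation per author.
    """
    node_weights = Counter()
    link_weights = Counter()
    for author_affiliations in publications:
        # Assumption: each distinct affiliation receives full credit (+1).
        for institute in set(author_affiliations):
            node_weights[institute] += 1
        # Each author pair with different affiliations strengthens the link.
        for a, b in combinations(author_affiliations, 2):
            if a != b:
                # Sort the pair so that the link is undirected.
                link_weights[tuple(sorted((a, b)))] += 1
    return node_weights, link_weights


# Hypothetical publications, one affiliation per author.
publications = [
    ["Inst A", "Inst A", "Inst B"],
    ["Inst A", "Inst C"],
    ["Inst B"],
]

node_weights, link_weights = build_institute_network(publications)
```

The same function yields a city-level network if each affiliation is first replaced by the city in which it is located.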
To explore knowledge innovation patterns and to further support decision-making in regional analytics, analysis methods on the network level were applied. We selected scale, node, link, degree centrality and gravity centre to measure the academic networks. In addition, the geographic locations of the nodes were calculated, and the spatial distributions were analysed. The metric scale measures the number of nodes in a network, indicating the number of institutes or cities involved in the publications. The weights of nodes and links reflect the number of publications and collaborations. We further calculated the median and standard deviation of the weights to measure the statistical distributions of the publications and the collaborations at the institute and city levels. The degree centrality of a node was calculated as the number of links it has. We used it in the city-level network to measure the activeness of cities in intercity collaborations. The gravity centre is the weighted centre of the nodes, indicating the theoretical central location of a network. We calculated it for the city-level network to analyse the spatial distribution of the knowledge innovation in the region. It is calculated as
$$\mathrm{Longitude}_{GC} = \frac{\sum_{i} \mathrm{Weight}_{i} \times \mathrm{Longitude}_{i}}{\sum_{i} \mathrm{Weight}_{i}}, \qquad \mathrm{Latitude}_{GC} = \frac{\sum_{i} \mathrm{Weight}_{i} \times \mathrm{Latitude}_{i}}{\sum_{i} \mathrm{Weight}_{i}}$$
where GC is the gravity centre, Longitude_GC and Latitude_GC indicate the coordinates of the gravity centre, i is a node in a network, Longitude_i and Latitude_i indicate the coordinates of Node i, and Weight_i is the number of publications of Node i.
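The network-level measures can be sketched as follows. The sketch treats the gravity centre as the publication-weighted mean of the node coordinates, which is how the text describes it. The function names, the city names and all numbers are hypothetical, and the use of the sample standard deviation (`statistics.stdev`) is an assumption, since the text does not state which variant was used.

```python
from statistics import median, stdev


def degree_centrality(nodes, links):
    """Number of links attached to each node (isolated nodes get 0)."""
    degree = {name: 0 for name in nodes}
    for a, b in links:
        degree[a] += 1
        degree[b] += 1
    return degree


def gravity_centre(nodes):
    """Weighted mean longitude and latitude of the nodes."""
    total = sum(weight for _, _, weight in nodes.values())
    lon_gc = sum(lon * weight for lon, _, weight in nodes.values()) / total
    lat_gc = sum(lat * weight for _, lat, weight in nodes.values()) / total
    return lon_gc, lat_gc


# Hypothetical city-level network: name -> (longitude, latitude, publications).
nodes = {
    "City A": (120.0, 30.0, 10),
    "City B": (122.0, 32.0, 30),
    "City C": (121.0, 31.0, 40),
}
# Hypothetical undirected links: (city, city) -> number of co-authorships.
links = {("City A", "City B"): 4, ("City B", "City C"): 2}

scale = len(nodes)  # number of cities involved in the publications
weights = [weight for _, _, weight in nodes.values()]
median_weight = median(weights)
stdev_weight = stdev(weights)  # sample standard deviation (assumption)
degrees = degree_centrality(nodes, links)
lon_gc, lat_gc = gravity_centre(nodes)
```

With these sample values the gravity centre lies closer to City B and City C than to City A, because those two cities carry most of the weight.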

Geovisualization
To support an intuitive and in-depth analysis of the characteristics of the academic networks, we applied geovisualization methods to reveal the patterns at different aggregation levels and spatial coverages. In particular, maps were used extensively to support exploring and understanding spatial patterns and relationships. We identified four important types of spatial information in knowledge innovation: the location and number of publications, the location and number of collaborations, the proportion of publications in different domains, and the spatial distribution of specific domains. In this study, we selected and applied geovisualizations to show these features. Figure 3 illustrates the four selected visualization methods, i.e., a 3D symbol map, a flow map, a treemap and a chart map, each dedicated to the representation of one of these four types of data features. We applied the 3D symbol map to show the spatial locations of institutes and their number of publications, which can reflect innovation ability. As Figure 3(a) shows, each bar represents an institute, and the height of the bar represents the number of publications of the institute. The academic network at the city level reflects regional innovation and collaboration capabilities. We adopted the flow map to show the nodes and the links of the network at the city level. As Figure 3(b) shows, each city is represented as a circle, and the number of publications of a city is represented by the size of the circle. The arcs represent the collaborations among cities, and the width of an arc represents the number of co-authorships. To reflect the hierarchical structure of research domains and their relative proportions, we utilized the treemap as shown in Figure 3(c). Each square represents a domain or a subdomain. The entire square is divided into several regions. We visualized two levels of domains in the treemap: the first-level domains are represented by the squares with bold frames, and the subdomains by squares with light frames. The chart map was used to show the spatial distribution of several domains, so that the spatial distribution of individual domains can be revealed and compared. Figure 3(d) shows a chart map representing the number of publications of four subdomains in cities. The colour of a bar represents a specific academic domain, and the height of the bar shows the number of publications.
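The idea behind a two-level treemap, in which area is proportional to the number of publications, can be illustrated with a small sketch. It uses the simple slice-and-dice layout, which is an assumption made for brevity and not necessarily the layout algorithm behind Figure 3(c). The function name `slice_and_dice`, the domain names and the counts are hypothetical.

```python
def slice_and_dice(items, x, y, width, height, horizontal=True):
    """Split a rectangle into strips with areas proportional to the counts.

    Returns a dict mapping each name to (x, y, width, height).
    """
    total = sum(items.values())
    rects = {}
    offset = 0.0
    for name, count in items.items():
        share = count / total
        if horizontal:
            rects[name] = (x + offset * width, y, share * width, height)
        else:
            rects[name] = (x, y + offset * height, width, share * height)
        offset += share
    return rects


# Hypothetical publication counts for two domains and their subdomains.
domains = {
    "Domain X": {"Sub X1": 30, "Sub X2": 10},
    "Domain Y": {"Sub Y1": 40},
}

# First level: split the unit square among the domains (bold frames).
domain_totals = {name: sum(subs.values()) for name, subs in domains.items()}
top_level = slice_and_dice(domain_totals, 0.0, 0.0, 1.0, 1.0, horizontal=True)

# Second level: split each domain rectangle among its subdomains (light frames).
layout = {
    name: slice_and_dice(domains[name], *rect, horizontal=False)
    for name, rect in top_level.items()
}
```

In this example "Sub X1" covers 30 of 80 publications and therefore receives 37.5% of the total area. Squarified layouts produce rectangles that are closer to squares and are usually easier to compare.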

Interactive tool
With the support of visual and analytical functions, interactive tools can help decision-makers improve their understanding of knowledge innovation in various applications. Based on the aforementioned studies in Section 2.1, we formulated the design goals as follows:
A focus on spatial patterns. The tool should provide visualizations of the knowledge innovation network with geographic locations, so that users can identify spatial clusters and compare different areas intuitively.
Multi-level knowledge innovation exploration. The tool should support the visualization of networks at multiple levels, such as at the institute level and city level.
Multi-coverage knowledge innovation exploration. The tool should help users explore and relate local, regional and international knowledge innovation networks.
Research domain analysis. The tool should enable users to analyse the popularity of research domains and their spatial patterns.
Easy to use. The interface should be easy to understand and provide simple but effective interactions for users.
Driven by these goals, we designed the interactive tool by integrating the visualization methods mentioned in Section 3.2.3. Figure 4 shows the interface layout of the conceptual design, which consists of eight panels. The title panel shows the name of the tool. The information panel shows the data source and designers. There are two selector panels in the middle of the interface, allowing users to switch areas and types of maps. The treemap panel shows the popular research domains of a specific region. The remaining three map panels show the networks in the corresponding region on 3D symbol maps, flow maps and chart maps with local, domestic and international coverage, respectively.

Case study
In this section, we applied the analysis and visualization methods to a case study to explore the knowledge innovation patterns reflected by the ACM Digital Library data. More specifically, we analysed the patterns of knowledge innovation from the perspectives of academic organizations, spatial relations and influential research domains at multiple scales. In addition, an interactive tool was designed and developed for users to explore knowledge innovation patterns in a test area.

Test area
We chose the Yangtze River Delta (YRD) as our test area. It is located in the east of China and covers more than 358,000 square kilometres (about 3.7% of China). It is one of the economically leading regions of China, including four provinces and containing 41 cities. With about 16% of the Chinese population, the YRD contributes about 24% of Chinese GDP (China statistical yearbook 2020). Figure 5 shows the location of the YRD and the administrative boundaries of the provinces and cities.
Although the YRD has had an economic boom during recent decades, its economy is spatially unbalanced. YRD regional economic development is highly correlated with the industrial structure, and the collaborations between cities are largely related to factors such as spatial distance, population and the local economies (Ye et al. 2019). Therefore, analysing the structure of knowledge innovation of the YRD can benefit policy-makers and planners, e.g., by helping them understand the local spatial spillover effect (Wu et al. 2017) and supporting the formulation of research policy (Imran and Jabeen 2020; Diercks, Larsen, and Steward 2019).

Analysis of academic institutes
Analysis of individual research institutes and their collaborations is crucial for understanding sustainable innovation capability. We analysed the spatial distribution of institutes and the number of their publications. In this analysis, we selected worldwide institutes that have joint publications with institutes located within the YRD region. The spatial patterns were visualized in the 3D symbol maps as shown in Figure 6. We found that the number of publications increased significantly from 2017 to 2019, especially between 2018 and 2019. Several spatial clusters of institutes that have joint publications with the YRD region can be identified; they are mainly distributed in East China, Northern America and Western Europe. In addition, there was a large amount of collaboration with Singapore and Japan.
The statistics of the publications and their affiliated institutes are shown in Table 2. The statistical values also confirm the significant increasing trend of the publications and of the institutes that collaborated with the YRD from 2017 to 2019. Compared with inter-regional joint publications, the increase in intraregional publications within the YRD was more significant.

Analysis of spatial relations
Regional and international collaborations reflect the collaboration network from different perspectives and are important for resource integration in the YRD region. Li, Wei, and Wang (2015) found that the scientific network in China exhibits scale-free properties. In this section, we analyse the characteristics of the intraregional academic network based on prefectural cities and the international collaborations of the YRD. We calculated the nodes and links at the aggregated level as described in Section 3.2.2, and then calculated and analysed the attributes of the networks. In addition, we visualized the spatial information of the networks on flow maps as described in Section 3.2.3.
Figure 7 shows the spatial and temporal intraregional network patterns from 2017 to 2019. The networks were well connected in the centre but surrounded by some isolated nodes, and not all cities were involved in the networks. A few cities were very active in publications: Shanghai, Hangzhou, Nanjing and Hefei. The scale of the networks did not grow noticeably, but the number of publications and the connections in the active cities grew greatly. This suggests that few new research institutes appeared between 2017 and 2018, but that the academic activities in the existing institutes increased greatly.
Table 3 shows the statistics of the city-level network within the YRD. It confirms that the increase in publications and collaborations varied among cities. The number of cities involved in publication increased from 2017 to 2019. The average number of publications was much higher than the median, which indicates that a few cities accounted for most of the publications. Similarly, most intercity collaborations were between a few popular city pairs.
Figure 8 shows the histogram of the degrees of the city-level network, that is, the distribution of the number of collaborations between cities within the YRD. Intercity collaborations grew: some cities had more than 10 collaborating cities in 2019, whereas most cities had only small-scale intercity collaborations.
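The degree distribution discussed above can be illustrated with a minimal sketch. It assumes that the degree of a city is the number of distinct partner cities it has co-published with, and that collaborations are available as pairs of city names. The function names and the sample pairs are hypothetical and are not taken from our data or tool.

```python
from collections import Counter, defaultdict


def city_degrees(collaborations):
    """Return {city: number of distinct partner cities}."""
    partners = defaultdict(set)
    for city_a, city_b in collaborations:
        if city_a == city_b:
            continue  # a same-city collaboration adds no intercity link
        partners[city_a].add(city_b)
        partners[city_b].add(city_a)
    return {city: len(linked) for city, linked in partners.items()}


def degree_histogram(degrees):
    """Return {degree: number of cities with that degree}, sorted by degree."""
    return dict(sorted(Counter(degrees.values()).items()))


# Hypothetical intercity co-publication pairs, for illustration only.
sample_links = [
    ("Shanghai", "Hangzhou"),
    ("Shanghai", "Nanjing"),
    ("Shanghai", "Hefei"),
    ("Nanjing", "Hangzhou"),
    ("Shanghai", "Hangzhou"),  # repeated pair: counted once in the degree
    ("Suzhou", "Suzhou"),      # same-city pair: ignored
]

degrees = city_degrees(sample_links)
histogram = degree_histogram(degrees)
print(degrees)
print(histogram)
```

In this sketch, cities without any intercity link do not appear in the result, so isolated nodes would have to be added with degree 0 if they are to be shown in the histogram.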
By applying the same approach, we visualized the spatial and temporal patterns of international collaboration with the YRD from 2017 to 2019 in Figure 9. The number of countries involved in international collaboration increased greatly, whereas the number of publications per country did not change much from 2017 to 2019.
Table 3. Statistics of the academic networks at the city level within the YRD from 2017 to 2019.

Analysis of influential domains
Regional competitiveness of knowledge innovation can be partially reflected in the popular research domains of computing and information technology. We first visualized the proportion of the research domains and subdomains, and then selected four representative subdomains for deeper analysis with visual representations of their spatial distributions. We applied the treemap visualization method to show the proportion of various domains. In Figure 10, two levels of the domains from the publications during the years 2017 to 2019 were aggregated and visualized. The size of each rectangle represents the relative number of publications in the corresponding domain. The treemap is divided into 13 sections representing the 13 domains, each further divided into various subdomains. The subdomains within each domain have similar colours. The top level of Figure 10 shows that the two largest domains in the YRD were Computing methodologies and Information systems, whereas the two largest worldwide domains were Computing methodologies and Human-centred computing.
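The two-level aggregation behind such a treemap can be sketched as follows. This is a minimal illustration that assumes each record carries a domain, a subdomain and a publication count, and that the area of a rectangle is proportional to its share of all publications. The function name and the counts are hypothetical and do not reproduce the values in Figure 10.

```python
from collections import defaultdict


def treemap_shares(records):
    """Aggregate (domain, subdomain, count) rows into area shares.

    Returns (domain_share, subdomain_share); each value is the fraction
    of all publications that the corresponding rectangle would cover.
    """
    domain_total = defaultdict(int)
    subdomain_total = defaultdict(int)
    for domain, subdomain, count in records:
        domain_total[domain] += count
        subdomain_total[(domain, subdomain)] += count
    grand_total = sum(domain_total.values())
    if grand_total == 0:
        raise ValueError("no publications to aggregate")
    domain_share = {d: n / grand_total for d, n in domain_total.items()}
    subdomain_share = {k: n / grand_total for k, n in subdomain_total.items()}
    return domain_share, subdomain_share


# Hypothetical counts, for illustration only (not the data of this study).
sample = [
    ("Computing methodologies", "Artificial intelligence", 30),
    ("Computing methodologies", "Machine learning", 20),
    ("Information systems", "Information retrieval", 25),
    ("Human-centred computing", "Human-computer interaction", 25),
]

domain_share, subdomain_share = treemap_shares(sample)
largest = max(domain_share, key=domain_share.get)
print(largest, domain_share[largest])
```

The shares only determine the relative areas; the layout of the rectangles is left to the treemap algorithm of the chosen visualization library.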
We selected four representative subdomains from the popular subdomains for further spatial analysis: artificial intelligence (AI), human-computer interaction (HCI), life and medical sciences (LiMS), and machine learning (ML). Their spatiotemporal distributions at the city level are shown as bar charts in Figure 11. The number of publications in AI and ML increased rapidly in cities such as Shanghai, Nanjing, Hefei, and Hangzhou from 2017 to 2019, while the publications related to HCI and LiMS increased relatively slowly. In addition, a growing number of cities had publications in these four subdomains.
Gravity centre was used in a previous study on spatial scientific activity assessment (Ye et al. 2019), which showed that the YRD economic gravity centre had a northwards movement from 2003 to 2015. We calculated the weighted centres of publications of the four selected subdomains and marked the trajectories of the centroids from 2017 to 2019 in Figure 12. We can see that the centroids generally moved to the north in 2018 and to the south in 2019. The centroids in the subdomains of ML and HCI moved westwards in 2018 and eastwards in 2019, whereas AI and LiMS moved eastwards in both years. However, the movement trends may be better observed with a larger data set.
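The weighted centre used above can be sketched as the mean of the city coordinates weighted by publication counts. This is a minimal illustration under two assumptions: the weights are the yearly publication counts per city, and a plain mean of longitude and latitude is an acceptable approximation at the regional extent. The function names, city labels, coordinates and counts are hypothetical.

```python
def weighted_centre(points):
    """points: iterable of (lon, lat, weight). Returns the weighted mean (lon, lat)."""
    points = list(points)
    total = sum(weight for _, _, weight in points)
    if total <= 0:
        raise ValueError("total weight must be positive")
    lon = sum(x * weight for x, _, weight in points) / total
    lat = sum(y * weight for _, y, weight in points) / total
    return lon, lat


def centres_by_year(city_coords, counts_by_year):
    """Return {year: weighted centre of that year's publication counts}."""
    centres = {}
    for year, counts in counts_by_year.items():
        points = [
            (city_coords[city][0], city_coords[city][1], n)
            for city, n in counts.items()
        ]
        centres[year] = weighted_centre(points)
    return centres


def shift_direction(previous, current):
    """Describe the movement between two centres as (east/west, north/south)."""
    d_lon = current[0] - previous[0]
    d_lat = current[1] - previous[1]
    east_west = "east" if d_lon > 0 else "west" if d_lon < 0 else "none"
    north_south = "north" if d_lat > 0 else "south" if d_lat < 0 else "none"
    return east_west, north_south


# Hypothetical coordinates (lon, lat) and counts, for illustration only.
city_coords = {"city_a": (120.0, 30.0), "city_b": (118.0, 32.0)}
counts_by_year = {
    2017: {"city_a": 3, "city_b": 1},
    2018: {"city_a": 1, "city_b": 1},
}

centres = centres_by_year(city_coords, counts_by_year)
print(centres)
print(shift_direction(centres[2017], centres[2018]))
```

In this toy example the centre moves towards city_b as its relative share of publications grows, which is the kind of shift that the trajectories in Figure 12 trace for each subdomain.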

An interactive visual analysis tool
We developed an interactive tool that integrates the geovisualizations and provides stakeholders with a structured overview. The tool aims to support users' understanding and analysis of knowledge innovation patterns in terms of the spatial distribution of the institutes, the collaborations, and the popular domains. As described in Section 3.2.3 and shown in Figure 13, the tool is composed of eight panels: a title panel, an information panel, an area selector, a map selector, the research domains visualized in treemaps, and the spatial distribution of the academic network visualized in flow maps for the YRD region, the Chinese domestic area, and the international coverage. Users follow the natural reading order from top to bottom (from the general background to the details) and from left to right (from the innovation domains to their spatial distributions). Additionally, users can select focus regions by clicking on the area selector. If a region is selected, the maps adjust their content to the selected region. Users can select the flow map, 3D symbol map, or chart map to show knowledge innovation from multiple perspectives. The interactive tool is available at http://129.187.45.33/KnowledgeDash/. However, the interactions are not completely available in the online version: the Area Selector is functional, but the Map Selector has not yet been integrated into the current version.

Conclusion
This study proposes a geovisual analytical method that couples network analysis with geovisualizations for the multiscale analysis of big scholarly data. Spatial clusters and centroid movements were analysed and visualized at the institute and city levels and across local, regional, and international coverages. We designed and developed an interactive visual analysis tool to help stakeholders discover comprehensive patterns of knowledge innovation and investigate them from different perspectives.
This methodology was applied to analyse academic networks in the Yangtze River Delta region using scholarly data from 2017 to 2019 collected from the ACM digital library.
Our analytical results showed an increasing trend in the number of publications and research collaborations in the region. We further found large spatial variations in the distribution of the publications and their growth. Specifically, most publications were from a few hotspot cities such as Shanghai, Nanjing, Hefei and Hangzhou, and there were more collaborations among the hotspot cities. Inside the hotspot cities, a large part of the increase in publications originated from a few particularly active institutes. In contrast, there were few initiatives and research collaborations in under-developed cities. For instance, the cities of Anqing, Huangshan, Quzhou and Lishui should be better integrated into the regional academic network. Moreover, the temporal trend varied across domains. Some domains that are popular worldwide (AI and ML) also increased rapidly in the YRD, whereas other popular domains grew relatively slowly. In line with previous studies (Wang et al. 2021), international collaborations significantly impact regional innovation, for example in human-centred computing. Finding such domains that are highly valued worldwide but underdeveloped in the YRD can encourage further investigation into the causes.
Figure 13. Visual interface of the analysis tool.
Despite the aforementioned findings, we are aware of some limitations and offer recommendations for future work. The scholarly data from the ACM digital library used in our study is focused on science-based innovation. This may not comprehensively and adequately reflect knowledge innovation patterns. According to Håkansson and Waluszewski (2007), knowledge and innovation are embedded in various resources. To obtain a holistic picture of scientific developments, further data sources on research projects and innovation patents in various languages are required. In addition, as Castaneda and Cuellar (2020) stated, the process of innovation is facilitated by knowledge sharing and innovation. Distinguishing knowledge sharing and innovation in the publications and analysing their patterns separately might provide a deeper understanding of regional knowledge innovation development. Text mining methods can be integrated to separate the knowledge types. In the future, more analytical indexes, such as centrality, clustering coefficient and core-edge index, can be calculated and visualized for power users to explore more patterns of knowledge innovation. Last but not least, a usability test of the interactive tool should be carried out to improve its usefulness.